

The technical team works its magic on this process, using a powerful combination of artificial intelligence and machine learning technologies on large amounts of data to optimize the annotations. Our state-of-the-art methodologies are augmented by the linguistic expertise of our team. The resulting database is used by the ReadSpeaker TTS engine to convert text into speech spoken by the TTS voice: segments (units) of speech are selected and ‘glued’ together in such a way that high-quality synthetic speech is produced. This is how a new ReadSpeaker TTS voice persona is born. One of ReadSpeaker’s unique characteristics is our ongoing improvement process. Through a system of high-quality feedback and a thorough Quality Assurance process by native-speaker experts, imperfections are continuously corrected.
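
To make the idea of selecting and gluing units concrete, here is a minimal sketch of a textbook unit-selection search. It is not ReadSpeaker's actual engine: the `Unit` class, the single `pitch` feature, the cost functions and the `select_units` name are all assumptions made for illustration. Each position in the utterance has several candidate units; the search picks the sequence with the lowest combined target cost (how well a unit matches what is wanted) and join cost (how smoothly neighbouring units connect).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    """A hypothetical recorded speech segment described by one feature."""
    unit_id: str
    pitch: float  # illustrative acoustic feature, in Hz


def select_units(targets: List[float],
                 candidates: List[List[Unit]],
                 join_weight: float = 1.0) -> List[Unit]:
    """Viterbi search for the unit sequence with the lowest total cost.

    Target cost: |unit pitch - desired pitch| at each position.
    Join cost:   |pitch difference| between neighbouring units.
    Both cost definitions are simplifying assumptions.
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("need one non-empty candidate list per target")
    if any(not c for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[i][j] = (cheapest cost of a path ending in candidates[i][j],
    #               index of the previous unit on that path)
    best = [[(abs(u.pitch - targets[0]), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            target_cost = abs(u.pitch - targets[i])
            path_costs = [
                best[i - 1][k][0] + join_weight * abs(p.pitch - u.pitch)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(path_costs)), key=path_costs.__getitem__)
            row.append((path_costs[k_best] + target_cost, k_best))
        best.append(row)

    # Backtrack from the cheapest final unit to recover the sequence.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path


# Toy data: three positions, two candidate units each.
example_targets = [100.0, 110.0, 120.0]
example_candidates = [
    [Unit("a1", 100.0), Unit("a2", 130.0)],
    [Unit("b1", 150.0), Unit("b2", 112.0)],
    [Unit("c1", 118.0), Unit("c2", 90.0)],
]
chosen = select_units(example_targets, example_candidates)
print([u.unit_id for u in chosen])
```

In a real system the costs would be computed over many acoustic and linguistic features rather than a single pitch value, and the selected units would then be concatenated and smoothed at the joins.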

In parallel, ReadSpeaker creates so-called neural voices, using techniques based on deep learning. This revolutionary method involves mapping linguistic properties to acoustic features using Deep Neural Networks (DNNs). An iterative learning process minimizes objectively measurable differences between the predicted acoustic features and the observed acoustic features in the training set. One advantage of the DNN TTS method is that the acoustic database can be much smaller than for a USS voice: only a few hours of recorded speech are needed for a neural voice, compared to at least three times as many for a good-quality USS voice. The resulting speech is also generally smoother and even more human-like. This makes developing new, smart ReadSpeaker TTS voices with even more lifelike, expressive speech and customizable intonation faster than ever.
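
The iterative learning process described above can be sketched in a few lines. This is a simplified illustration, not ReadSpeaker's training code: a single linear layer stands in for a deep network, the two input features (stressed, phrase-final) and the target values are invented toy data, and the names `train_linear`, `lr` and `epochs` are assumptions. The loop repeatedly predicts an acoustic feature from linguistic features, measures the mean squared difference from the observed value, and adjusts the weights to shrink it.

```python
from typing import List, Tuple


def train_linear(inputs: List[List[float]],
                 targets: List[float],
                 lr: float = 0.1,
                 epochs: int = 2000) -> Tuple[List[float], float, List[float]]:
    """Fit y = w.x + b by gradient descent on mean squared error.

    Returns the weights, the bias, and the loss recorded at each epoch.
    A linear model is used here only as a stand-in for a deep network.
    """
    n = len(inputs)
    n_features = len(inputs[0])
    w = [0.0] * n_features
    b = 0.0
    history = []
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(inputs, targets):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y  # predicted minus observed acoustic feature
            loss += err * err
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj
            grad_b += 2.0 * err
        history.append(loss / n)
        # Step against the gradient to reduce the error.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, history


# Toy training set (invented): features are [is_stressed, is_phrase_final],
# the target is a normalised pitch value.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
observed = [1.0, 1.5, 0.7, 1.2]

weights, bias, losses = train_linear(features, observed)
print("first loss:", losses[0])
print("last loss:", losses[-1])
```

The printed losses show the error shrinking as training proceeds. A production neural voice replaces the linear layer with a deep network and the toy features with rich linguistic and acoustic representations, but the principle of minimising a measurable prediction error is the same.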
