On Achieving Auto-Pronounce

One of the things we wanted this dictionary to have from the very start was a feature that lets the user click and hear the audio pronunciation of any name they choose. This is a standard feature in most online dictionaries. Achieving it, however, posed a very tough problem: how do we get one person to pronounce tens of thousands of names, correctly, within the time available before the dictionary launches? A second problem: how do we pay for the storage space required to host such a huge body of audio data?

We acknowledged the dilemma: either find a way to host the audio on affordable storage, or teach the computer to pronounce the names without hosting human recordings at all. The solution, when it came, fell in between those two choices. It turns out that it is possible to have a human pronounce a finite set of sound and tonal segments, and then have the computer use this data to produce an unlimited number of words, through something that resembles match-making (the technical term is “concatenation”). This required knowledge of the phonology of the target language and a grasp of computer permutation procedures. Two of us working on this happened to be competent in these areas, one in each. (I expect Dadépọ̀ to write at some point in the future about how the technical side came together.)
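To make the idea of a finite set of segments concrete, here is a minimal Python sketch (not the team's actual code) that splits a written Yoruba name into the tone-marked syllables a voice talent might record. It assumes a simplified syllable structure (a lone vowel, a consonant plus vowel with “gb” counted as one consonant, or a syllabic n/m), reads acute as high tone, grave as low and unmarked as mid, and uses a hypothetical key format such as `kọ_H` for each recording. Nasal vowels like “un” are not handled.

```python
import unicodedata

# Combining marks as they appear after NFD decomposition (simplified orthography).
HIGH, LOW = "\u0301", "\u0300"   # acute = high tone, grave = low tone
UNDER = {"\u0323", "\u0329"}     # dot below or vertical line below (ẹ, ọ, ṣ)
VOWELS = set("aeiou")


def letters(word):
    """Split a word into (letter, tone) pairs; unmarked letters get mid tone 'M'."""
    out = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            if not out:
                continue  # stray mark with no base letter
            base, tone = out[-1]
            if ch == HIGH:
                tone = "H"
            elif ch == LOW:
                tone = "L"
            elif ch in UNDER:
                base += "\u0323"  # normalise both underdot styles to U+0323
            out[-1] = (base, tone)
        elif ch.isalpha():
            out.append((ch, "M"))
    return out


def syllables(word):
    """Group letters into (syllable, tone) units: V, CV ('gb' is one consonant),
    or a consonant standing alone (e.g. a syllabic nasal such as 'ń')."""
    ls = letters(word)
    units, i = [], 0
    while i < len(ls):
        base, tone = ls[i]
        if base[0] in VOWELS:
            units.append((base, tone))
            i += 1
            continue
        onset, j = base, i + 1
        if base == "g" and j < len(ls) and ls[j][0] == "b":
            onset, j = "gb", j + 1
        if j < len(ls) and ls[j][0][0] in VOWELS:
            vowel, tone = ls[j]  # the vowel carries the syllable's tone
            units.append((onset + vowel, tone))
            i = j + 1
        else:
            units.append((onset, tone))  # syllabic nasal or unpaired consonant
            i = j
    return [(unicodedata.normalize("NFC", s), t) for s, t in units]


def unit_keys(word):
    """Hypothetical file-naming scheme for recorded units, e.g. 'kọ_H'."""
    return [f"{s}_{t}" for s, t in syllables(word)]
```

For example, `unit_keys("Kọ́lá")` should give `["kọ_H", "la_H"]`: only two recordings are needed for that name, and the same two can be reused in every other name that contains those syllables with those tones.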


It took a couple of days to record the test audio, covering hundreds of sound segments in Yoruba in all their tonal variations, including every vowel paired with every consonant in the language. The next step is feeding this to a computer program that knows how to join these segments together when presented with a Yoruba word. As the software department found during last week's manual matching, this is feasible on a large scale.
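The joining step can be sketched as follows, again in Python and under loud assumptions: each recorded unit is stored as a list of mono float samples at one shared sample rate, in a hypothetical `bank` dictionary keyed like `kọ_H`, and neighbouring units are blended with a simple linear crossfade. Real concatenative systems usually use more careful joins, such as pitch-synchronous overlap-add, so treat this as an illustration of the idea only.

```python
def crossfade_concat(clips, overlap):
    """Join clips (lists of float samples) end to end, blending each seam
    with a linear crossfade over `overlap` samples."""
    if not clips:
        return []
    out = list(clips[0])
    for clip in clips[1:]:
        n = min(overlap, len(out), len(clip))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming clip rises across the seam
            out[start + k] = out[start + k] * (1 - w) + clip[k] * w
        out.extend(clip[n:])
    return out


def synthesize(keys, bank, overlap=64):
    """Look up each unit key in the hypothetical `bank` (key -> samples)
    and join the clips into one waveform."""
    missing = [k for k in keys if k not in bank]
    if missing:
        raise KeyError(f"no recording for units: {missing}")
    return crossfade_concat([bank[k] for k in keys], overlap)
```

Given a bank containing both units, `synthesize(["kọ_H", "la_H"], bank)` would return one continuous waveform for “Kọ́lá”, which could then be written to disk with Python's standard `wave` module. Failing loudly on a missing unit makes gaps in the recorded inventory easy to spot.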

The preliminary results from manual matching are here, and they blew my mind. Click here for the name “Bádé̩jọ” and here for “Kọ́lá”. Those are auto-pronounced names, made half by a human and half by the computer. It is a major breakthrough on which larger future work can be built, and through which any word in Yoruba (or any other language) could be realized by a computer through this process of concatenation.

The possibilities now seem limitless, although there are still other things to sort out in the coming weeks. As it turns out, this process (part of a larger field called “speech synthesis”) isn’t new at all, and it has driven a lot of progress in speech technology around the world. If we succeed, we will have removed a massive obstacle in the way of African language technology.

Next step: get a good professional female voice to do this again. I don’t think many people expect a male voice while browsing a dictionary 🙂