The prize for developing a successful speech recognition technology is enormous. Speech is the quickest and most efficient way for humans to communicate. Speech recognition has the potential of replacing writing, typing, keyboard entry, and the electronic control provided by switches and knobs. It just needs to work a little better to become accepted by the commercial marketplace. Progress in speech recognition will likely come from the areas of artificial intelligence and neural networks as much as through DSP itself. Don't think of this as a technical difficulty; think of it as a technical opportunity.

At present, we know of no established Ukrainian companies developing software based on speech recognition or synthesis, and there probably are none. What we do know of are scientific institutions doing research in the field of speech recognition and synthesis, and individual developers. The situation is gradually improving: in spring 2008, a synthesis module developed in Ukraine appeared in a foreign software product.

Speech recognition algorithms take this a step further by trying to recognize patterns in the extracted parameters. This typically involves comparing the segment information with templates of previously stored sounds, in an attempt to identify the spoken words. The problem is, this method does not work very well. It is useful for some applications, but is far below the capabilities of human listeners. To understand why speech recognition is so difficult for computers, imagine someone unexpectedly speaking the following sentence:

About a decade later, in 1951, Franklin Cooper and his associates developed the Pattern Playback synthesizer at the Haskins Laboratories (Klatt 1987, Flanagan et al. 1973). It reconverted recorded spectrogram patterns into sounds, either in original or modified form. The spectrogram patterns were recorded optically on a transparent belt (track ).

After the demonstration of the VODER, the scientific world became increasingly interested in speech synthesis. It had finally been shown that intelligible speech can be produced artificially. In fact, the basic structure and idea of the VODER are very similar to present systems that are based on the source-filter model of speech.

The PAT and OVE synthesizers sparked a debate over how the transfer function of the acoustic tube should be modeled: in parallel or in cascade. John Holmes introduced his parallel formant synthesizer in 1972 after studying these synthesizers for a few years. He tuned the synthesized sentence "I enjoy the simple life" (track ) by hand so well that the average listener could not tell the difference between the synthesized and the natural one (Klatt 1987). About a year later he introduced a parallel formant synthesizer developed with JSRU (Joint Speech Research Unit) (Holmes et al. 1990).
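The difference between the two arrangements can be illustrated with a small sketch. It is not the design of PAT, OVE, or the Holmes synthesizer; it only assumes that each formant is modeled by a generic second-order digital resonator, and the formant frequencies, bandwidths, branch amplitudes, and sample rate below are illustrative values chosen for the example. In the cascade form the resonators are chained, so the relative formant amplitudes follow from the chain itself; in the parallel form each resonator filters the same input and its output is weighted by an explicit amplitude before summing.

```python
import math

def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator, scaled to unity gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def cascade(signal, formants, sample_rate):
    """Chain the resonators: the output of one feeds the next."""
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return list(signal)

def parallel(signal, formants, amplitudes, sample_rate):
    """Feed the same input to every resonator and sum the weighted outputs."""
    out = [0.0] * len(signal)
    for (freq_hz, bandwidth_hz), amp in zip(formants, amplitudes):
        branch = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
        out = [o + amp * v for o, v in zip(out, branch)]
    return out

# Illustrative (assumed) values: three formants as (frequency, bandwidth) in Hz.
SAMPLE_RATE = 10000
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]
AMPLITUDES = [0.5, 0.3, 0.2]  # hypothetical per-branch weights

impulse = [1.0] + [0.0] * 199
cascade_response = cascade(impulse, FORMANTS, SAMPLE_RATE)
parallel_response = parallel(impulse, FORMANTS, AMPLITUDES, SAMPLE_RATE)
```

The parallel form needs one extra control parameter per formant (the amplitude), which is what makes hand tuning of the kind described above possible. This sketch simply adds the branches and ignores the question of their relative signs.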

The first device to be considered a speech synthesizer was the VODER (Voice Operating Demonstrator), introduced by Homer Dudley at the New York World's Fair in 1939 (Flanagan 1972, 1973, Klatt 1987). The VODER was inspired by the VOCODER (Voice Coder), developed at Bell Laboratories in the mid-thirties. The original VOCODER was a device for analyzing speech into slowly varying acoustic parameters that could then drive a synthesizer to reconstruct an approximation of the original speech signal. The VODER consisted of a wrist bar for selecting a voicing or noise source and a foot pedal to control the fundamental frequency. The source signal was routed through ten bandpass filters whose output levels were controlled by the fingers. It took considerable skill to play a sentence on the device. The speech quality and intelligibility were far from good, but the potential for producing artificial speech was well demonstrated. The speech quality of the VODER is demonstrated on the accompanying CD (track ).
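The signal path described above (a selectable voiced or noise source, a controllable fundamental frequency, and ten bandpass filters with individually set output levels) can be sketched digitally. This is only a rough illustration, not a model of the actual VODER circuitry: the sample rate, the band center frequencies, the bandwidth, and the use of a two-pole resonator as a stand-in for each bandpass filter are all assumptions made for the example.

```python
import math
import random

SAMPLE_RATE = 16000                                   # assumed
BAND_CENTERS_HZ = [300 * (k + 1) for k in range(10)]  # assumed: 300..3000 Hz

def make_source(voiced, f0_hz, n_samples, seed=0):
    """Wrist bar and pedal: pulse train at f0 if voiced, else white noise."""
    if voiced:
        period = max(1, round(SAMPLE_RATE / f0_hz))
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def band_filter(signal, center_hz, bandwidth_hz=200.0):
    """Two-pole resonator used here as a simple bandpass filter."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / SAMPLE_RATE)
    b2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (1.0 - r) * x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize(source, key_levels):
    """Finger keys: weight each band's output by its level and sum the bands."""
    assert len(key_levels) == len(BAND_CENTERS_HZ)
    out = [0.0] * len(source)
    for center_hz, level in zip(BAND_CENTERS_HZ, key_levels):
        band = band_filter(source, center_hz)
        out = [o + level * v for o, v in zip(out, band)]
    return out

# A voiced frame at 100 Hz with more energy in the lower bands (hypothetical levels).
source = make_source(voiced=True, f0_hz=100.0, n_samples=1600)
levels = [1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.0]
frame = synthesize(source, levels)
```

Changing `levels` over time plays the role of the operator's fingers, which is why producing a whole sentence on the device required so much skill.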