VoiceType Dictation: A Personal Opinion

By Jay Schwartz

I do not know exactly where to begin this piece, except to say that the reception of the VoiceType products has been extremely underwhelming. When I first saw the VoiceType Dictation product demonstrated running on OS/2, I was astounded. You could actually speak to your computer and it would understand you and transcribe your words. The product had another name then, but it was version 1, running on OS/2 Version 2.1. I was amazed that it cost only $1500. Now, in Warp, the latest version of VoiceType is included for FREE in every "specially marked box". And those boxes come in both full and upgrade sizes. And they include the "connect" stuff, and the "Windows 3.1" stuff, and the "Java" stuff, and even the BonusPak (yes, I do use it).

Somehow, it seems, the world at large will not discover voice recognition for another few years, when Bill Gates finally invents it. That is truly unfortunate for them, since many of us have already discovered the value of speech recognition.

Many OS/2 users are resisting the upgrade from Warp 3 to Warp 4. They have heard about problems. For me, however, the upgrade went very smoothly. (I never load Beta copies of OS/2 on my system.) In my case, I upgraded not to get the gaudy new colours, but rather to get my free copy of VoiceType. You may have seen this product demonstrated, but to really appreciate it, you need to live with it for a while.

As a short review, let me remind you that VoiceType really consists of two different but integrated products. The first of these is voice navigation. Navigation is speaker independent and uses continuous speech. It can be used to navigate your desktop -- launch applications, move windows around and select menu and button options. Although it is fun to play with, navigation is not really any faster or easier than using the mouse, and I personally do not use voice for this purpose. The second part of VoiceType is its dictation capability. This is where it earns its keep. Yes, there are some drawbacks. It does require training (approximately two hours), and you must learn to speak with a short pause between words. This pause may seem unnatural at first, but you can get used to it quite easily. I find the change in breathing more difficult to accommodate than the pauses in speech. In fact, I can dictate approximately as fast as I can think of things to say. ("Um" and "er" don't count.) While I am not a terrible typist, I certainly cannot type at that rate.

I find VoiceType remarkable in that it rarely makes a mistake. When I do see an incorrect word, it is usually because I have mispronounced it, mostly by running two words together. The program's vocabulary is surprisingly large and includes many computer acronyms and most Canadian place names. To use dictation, you open a special window called, appropriately enough, the Dictation Window. The Dictation Window shows all of your spoken words as text but also retains the recorded sound. You can follow the text on this screen if you wish, but it is not necessary: you can talk on as you like and the recognition engine will catch up to you. You can click on any word and have it played back. At this point you have several correction options. The program displays a list of reasonable alternatives to choose from, and if the right word is still not there, you can type it in. These corrections are remembered and used to amend your voice model, so the more you use VoiceType Dictation, the better it gets at recognizing the way you speak. Finally, when you are finished dictating, you can transfer (or cut and paste) the complete text to any word processor or text editor. Naturally, there is no font information, but the text is complete and ready for editing. Keep in mind that it is not necessary to run a spell check on the results of VoiceType. Although it may select an incorrect word, you can be sure the word it chooses is spelled correctly.

Up until now I have used the term "word" to mean the same as "utterance". Really, an utterance is any sound surrounded by short silences. It is possible to create macros that substitute any text for any utterance. Generally you want to select an utterance that is meaningful to you yet is not a normal word. The way to do this is to string two or three words together. For example, one macro I use is "my-name", which produces "Jay Schwartz".

Many macros are already provided in VoiceType, including punctuation, so you can ask for symbols such as * . , - $ ! : ; & %. Macros can be as elaborate as you like.

The real downside for me is that I cannot use VoiceType to do C programming. For recognition to work and be useful, your content must be in sentences, and words should outnumber punctuation symbols. For the right use, however, it is wonderful. For example, the first draft of this review was dictated.

My only other complaint is that when I am dictating, I must turn down the radio. Maybe if I get headphones for the radio, I can keep listening.

I should warn you that in addition to Warp 4, you need a reasonably powerful computer. I have heard figures ranging from a Pentium 75 to a Pentium 100 as the minimum. You also need a sound card that OS/2 recognizes. In my case I had no sound card at all, so I bought one specifically for VoiceType -- the cheapest Sound Blaster I could find. It cost a little over $100. Was it worth it? Well, I was ready to spend $1500 on the original package, so it is as if I won nearly $1400 in the lottery.

To my mind, speech recognition is the next "killer app". It has the ability to turn ordinary people into computer users (to do useful work, not play games).

Oh, someone should tell Bill Gates to stop trying. VoiceType is already available for Win95. The main difference from the OS/2 version is that the Win95 folks have to pay to get it!

Revised: 2001 Feb 18