Commodore User


Talking Heads

 
Published in Commodore User #25

Talking games have grabbed their fair share of chart toppers this year. Our American sound expert, Tom Jeffries, went to Berkeley, in California, to talk to the freelance sound specialists who put the chat into Ghostbusters, Impossible Mission, Beach Head II and Kennedy Approach.

The latest thing in game software today is speech synthesis. So many games on the market these days use synthesized voices that I decided to find out who was responsible for all this digital eloquence, why software companies are finding it worthwhile to include speech in their programs, and where it is all headed.

Back in the days when computers were enormous, expensive machines available only to people in large universities and corporations, the intellectual challenge of playing a game with a machine had to take the place of advanced features like graphics or sound. Computer time and memory space were far too expensive to fill up with such frills, so mainframe games were (and are) usually text only.

Home computers changed all that. Techno-freaks being what they are, it didn't take long before people started demanding arcade-style graphics on home computers, so special chips were added, and large amounts of memory were set aside just for graphics. Sound also got attention. At first outboard devices were required to create audible output, but soon ways were found to incorporate sound capability into computers. The Apple II and IBM PC both use one of the earliest and simplest forms of onboard sound: a speaker driven by a series of pulses sent by writing to a particular memory location. Programmers have created some amazingly complex sounds, including speech, using this primitive hardware.
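The one-bit speaker trick can be sketched in modern terms: the program flips a single output bit on and off, and the spacing of the flips sets the pitch. Here is a minimal Python simulation of the idea (the sample rate and frequency are illustrative choices, not taken from either machine):

```python
# Simulate the Apple II / IBM PC style one-bit speaker:
# the CPU toggles a single bit, and the toggle rate sets
# the pitch of the resulting square wave.

def square_wave_bits(freq_hz, sample_rate, n_samples):
    """Return a list of 0/1 speaker states forming a square wave."""
    half_period = sample_rate / (2 * freq_hz)  # samples between toggles
    bits = []
    state = 0
    next_toggle = half_period
    for i in range(n_samples):
        if i >= next_toggle:
            state ^= 1              # flip the speaker bit
            next_toggle += half_period
        bits.append(state)
    return bits

# One second of a 440 Hz tone "played" at 8000 samples per second:
# roughly 2 x 440 toggles, since the bit flips twice per cycle.
tone = square_wave_bits(440, 8000, 8000)
toggles = sum(tone[i] != tone[i - 1] for i in range(1, len(tone)))
print(toggles)
```

Everything more elaborate than a square wave, including speech, comes from varying the toggle timing carefully enough to approximate the desired waveform.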

As home computers progressed, both the graphics and sound capabilities got better and better. There has been a consistent push for greater realism in gameplay.

It Talks

So it won't surprise you that more and more computer games are including the sound of human (and non-human) voices. Pick up a copy of Kennedy Approach, Impossible Mission, Beach Head II, Jump Jet or Ghostbusters and you'll see what I mean. Not only is speech synthesis being used very widely, but the quality is amazingly clear, and improving all the time.

Good speech synthesis is very difficult; it didn't surprise me that most software houses have someone else do it for them. It did surprise me, however, to find out that all of the above-mentioned games, except for Jump Jet, had their speech provided by one company: Electronic Speech Systems of Berkeley, California. Since I only live a few miles away, it seemed like a good idea to run up there and see if I could find out the secret of their success.

ESS started in 1970 when Todd Mozer's father, Dr. Forrest Mozer, a space physicist at the University of California at Berkeley, developed a technique for speech synthesis based on playing back a digitized voice.

It had been assumed previously that this approach would use a prohibitive amount of memory, but Dr. Mozer found ways to encode the data and reduce its size as much as one hundred-fold. Other approaches rely on creating an elaborate mathematical model of the human voice, requiring either a special dedicated speech chip or a very fast, powerful (and expensive) central processor, and producing a rather mechanical sounding voice.

Dr. Mozer's algorithm keeps the natural inflections of the human voice, and in current implementations, can use any microprocessor.

At first Dr. Mozer concentrated on hardware implementations of his ideas. His technology was used in the first talking calculator for the blind and in a speech chip produced by National Semiconductor. As the limitations of this approach became clear, he and his associates began to concentrate on ways to synthesize speech in software with little or no added hardware, which led to the techniques used to reproduce the incredible laugh in Ghostbusters.

Currently ESS, in addition to providing blood-curdling sounds for computer games, is producing speech synthesis products for major electronic equipment manufacturers.

They've just finished a product for AT&T that will ring you up in case of a fire or burglary at your house when you are away, and tell you what the problem is; they are working with a major automobile manufacturer on a system that will tell you if your oil is low, and will tell you or your mechanic what the problem is when you break down. Wow!

How It's Done

The ESS system is protected by a dozen or so patents so the details remain secret, but basically it goes like this. They start out by making a high quality recording of the words they want to use, with a voice they feel is appropriate. (For example, for an educational program based on Kipling's The Jungle Book they used an Indian student of Dr. Mozer's.) Then they digitise the sound (convert it from analog tape-type sound to 1s and 0s that the computer can read) and, using a mini-computer, crunch the original down to a hundredth of its original size. This crunching is the heart of their system. It takes a considerable amount of effort to decide what information can be thrown away, and which information is essential to the sound. The original information usually involves about 10,000 complete sound samples per second; the finished product uses between 90 and 625 bytes per second.

On the Commodore 64, they normally use a rate of 375 bytes per second or less, so it's possible to pack quite a lot of speech into a program.
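A quick back-of-the-envelope check shows how well these figures hang together. Assuming one byte per raw sample (an assumption on my part; the article only gives samples per second), the sums in Python:

```python
# Back-of-the-envelope check of the figures quoted above,
# assuming (our assumption) one byte per raw sound sample.

RAW_SAMPLES_PER_SEC = 10_000          # "about 10,000 complete sound samples per second"
raw_bytes_per_sec = RAW_SAMPLES_PER_SEC * 1

for encoded in (90, 375, 625):        # bytes/sec after ESS's crunching
    ratio = raw_bytes_per_sec / encoded
    print(f"{encoded:3d} bytes/sec -> {ratio:5.1f}:1 compression")

# At the Commodore 64 rate of 375 bytes/sec, 8K of memory holds
# about 8192 / 375 = 21.8 seconds of speech.
print(round(8192 / 375, 1))
```

The 90 bytes/sec figure works out at about 111:1, which squares nicely with the "hundred-fold" reduction claimed for Dr. Mozer's encoding.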

To play back the speech on the Commodore 64, ESS uses the machine's own sound device, the SID chip, but in quite an unusual way. All of the registers of the SID are shut down except the volume control, which is varied up and down to recreate the original waveform.

Since there are only 16 possible settings, the resulting sound can never be as good as an ordinary tape deck, which has the capability of infinite variation, but it does produce easily intelligible speech.

ESS's technology can reproduce the accents and inflections of the original speaker quite accurately, like the Indian in Jungle Book, or can change them as needed so that the same vocabulary can produce a human and a robot voice.

Kennedy Approach

All of this technology is pretty impressive, but it's up to the software companies to put it to use. I asked George Geary of MicroProse software, publisher of Kennedy Approach, an air traffic control simulation, why MicroProse had decided to use speech synthesis in their program, and his answer was simple and to the point: "To enhance gameplay".

The voice from the airport control tower (you) alternates with the voices from the various airplanes in giving and receiving instructions and really does add a considerable amount of realism to the simulation. Listen carefully and you will notice that the voices of the different pilots are pitched differently - a subtle touch, but I found that even before I was aware that the voices were different, my ear knew the difference.

MicroProse, which has its speech digitising done by ESS, is so happy with the effect of speech in Kennedy Approach that it is currently adding a male and a female voice to Solo Flight so that they can re-release an enhanced version. They do plan to limit their use of speech synthesis to programs where the gameplay itself will be enhanced by the electronic voice.

Other uses of synthesized speech are more whimsical. No-one would argue that speech is a necessary part of Ghostbusters, but it certainly adds a distinctive and humorous touch. According to Brad Fregger, Director of Software Development at Activision, they want to "give the game the same feeling as the movie", and voice was one way of accomplishing this.

Activision considers voice to be "The icing on the cake - we wouldn't leave out the eggs in order to have the icing", but in this case there was room for both. Personally, I'm glad - what other game says, "He slimed me!" when I miss?

Likewise, the voices in Jump Jet and Impossible Mission, while adding to the enjoyment and character of the software, are not essential to the game.

Robert Botch, Epyx's Vice President of Marketing, said speech was put into Impossible Mission "to add something extra - some realism"; the cry that occurs as your character falls through one of the holes in the floor is certainly realistic enough.

A more serious use of speech synthesis is in educational programs. According to Todd Mozer, this is the area where ESS expects to see the greatest use of electronic voices in the future. He said, "There have been a lot of studies done about the effectiveness of speech in learning and the results have been extremely positive. Children will sit in front of a computer longer if it's giving them verbal feedback, and it provides a much more effective mechanism for teaching. I would expect that to be a realm where speech takes off." ESS has already produced speech for several educational programs including Talking Teacher by Imagic and Cave Of The Word Wizards by Timeworks.

The Future

What's the next step in the never-ending battle for greater realism and higher sales? The experts were nearly unanimous: before too long computers will be able to understand and respond to your speech. Speech recognition is extremely difficult to accomplish because of the complexities of the English language and the variations between voices, but several systems have been developed, including the Covox Voicemaster system for the Commodore 64. Mozer thinks that eventually computer manufacturers may include speech recognition capabilities as a part of the computer. It sounds like fun to me: I can think of quite a few things to say to that ghost that slimed me in Ghostbusters!

With built-in speech synthesis and speech recognition, you and your Commodore can sit down for a heart-to-heart chat or, more realistically, you will be able to use your home computer with a modem as an intelligent telephone answering machine. Not surprisingly, ESS is just putting the finishing touches to a system which does exactly that.

If there is any doubt about whether synthesized speech is here to stay or not, check the specifications for Commodore's new wonder machine, the Amiga. Speech synthesis is built into the Amiga, and software companies are rushing to put it to use. So get used to hearing your computer talk back.

Tom Jeffries