Benchmark of Sphinx2, Sphinx3, PocketSphinx

8 08 2007

I benchmarked Sphinx2, Sphinx3, and PocketSphinx to compare their memory usage, decoding time, and recognition errors (sentence errors and word errors).
The tests were run on an AMD Athlon(TM) XP 2000+ at 1670.608 MHz with 512 MB of memory.


Sphinx2

Memory used: 4.7% of 512 MB = 24.064 MB

root@controle03# time perl scripts_pl/decode/
MODULE: DECODE Decoding using models previously trained
Decoding 130 segments starting at 0 (part 1 of 1)
Using files: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Finished
SENTENCE ERROR: 78.462% (102/130) WORD ERROR RATE: 37.904% (293/773)

real 0m49.394s
user 0m46.599s
sys 0m0.296s


Sphinx3

Memory used: 5.8% of 512 MB = 29.696 MB

root@controle03# time perl scripts_pl/decode/
MODULE: DECODE Decoding using models previously trained
Decoding 130 segments starting at 0 (part 1 of 1)
SENTENCE ERROR: 80.8% (105/130) WORD ERROR RATE: 34.7% (268/773)

real 5m47.224s
user 5m36.393s
sys 0m1.932s


PocketSphinx

Memory used: 0.6% of 512 MB = 3.072 MB

root@controle03# time perl scripts_pl/decode/
MODULE: DECODE Decoding using models previously trained
Decoding 130 segments starting at 0 (part 1 of 1)
SENTENCE ERROR: 78.5% (102/130) WORD ERROR RATE: 37.9% (292/773)

real 0m0.211s
user 0m0.156s
sys 0m0.052s

This benchmark shows that Sphinx3 makes fewer word recognition errors (WORD ERROR RATE) than Sphinx2 and PocketSphinx, but it takes much longer to decode than either of them. Sphinx3 also uses more memory than Sphinx2 and PocketSphinx.
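The figures above can be rechecked with a few lines of Python. This is only a sketch: it assumes the three log blocks appear in the order Sphinx2, Sphinx3, PocketSphinx (the order used in the title, which also matches the conclusions), and the names `runs`, `memory_mb`, `error_rate`, and `summarize` are made up for illustration. Rates recomputed from the raw counts may differ in the last digit from what the decode script printed.

```python
# Figures copied from the logs above. The assignment of each log block
# to a decoder is an assumption based on the order in the post title.
TOTAL_MEMORY_MB = 512
SENTENCES = 130
WORDS = 773

runs = {
    "Sphinx2": {"mem_percent": 4.7, "sentence_errors": 102, "word_errors": 293},
    "Sphinx3": {"mem_percent": 5.8, "sentence_errors": 105, "word_errors": 268},
    "PocketSphinx": {"mem_percent": 0.6, "sentence_errors": 102, "word_errors": 292},
}


def memory_mb(percent, total_mb=TOTAL_MEMORY_MB):
    """Convert a percentage of total memory into megabytes."""
    return percent / 100.0 * total_mb


def error_rate(errors, total):
    """Return an error rate in percent from raw counts."""
    return 100.0 * errors / total


def summarize(runs):
    """Build one row of rounded metrics per decoder."""
    rows = {}
    for name, run in runs.items():
        rows[name] = {
            "memory_mb": round(memory_mb(run["mem_percent"]), 3),
            "sentence_error": round(error_rate(run["sentence_errors"], SENTENCES), 1),
            "word_error": round(error_rate(run["word_errors"], WORDS), 1),
        }
    return rows


summary = summarize(runs)
for name, row in summary.items():
    print(f"{name}: {row['memory_mb']} MB, "
          f"SER {row['sentence_error']}%, WER {row['word_error']}%")

# Decoder with the lowest word error rate among the three runs.
lowest_wer = min(summary, key=lambda n: summary[n]["word_error"])
print("Lowest word error rate:", lowest_wer)
```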

PocketSphinx uses the least memory and is the fastest decoder.
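To compare the decoding times directly, the `real` values printed by `time` can be converted to seconds. The sketch below again assumes the log blocks are in the order Sphinx2, Sphinx3, PocketSphinx; `parse_time` and `real_times` are illustrative names, not part of any Sphinx tool.

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '5m47.224s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# `real` times copied from the logs above (decoder order is an assumption).
real_times = {
    "Sphinx2": "0m49.394s",
    "Sphinx3": "5m47.224s",
    "PocketSphinx": "0m0.211s",
}

seconds = {name: parse_time(value) for name, value in real_times.items()}
fastest = min(seconds, key=seconds.get)

for name, elapsed in seconds.items():
    ratio = elapsed / seconds[fastest]
    print(f"{name}: {elapsed:.3f} s ({ratio:.1f}x the fastest run)")
```

A gap of several orders of magnitude between runs on the same 130 segments is worth double-checking before drawing conclusions from it.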

Gnome-Voice-Control is being modified to use PocketSphinx. Soon we’ll have release 0.3, which uses PocketSphinx.

Raphael Nunes




7 responses

8 08 2007

Sweet. I can’t wait for the next release. I had a lot of fun playing with 0.2.
The biggest thing I would like to see is the ability to make certain voice commands run whatever you want, such as setting the browser command to firefox, or whatever you want it to do.
But I know these things will probably have to wait until the more important kinks have been worked out.

9 08 2007

I think you have a very good point here, and I’m glad to see this comparison.
PocketSphinx actually seems to be much faster, as well as lighter.

Honestly, do you think these results are acceptable? I mean, did you expect such close results (2% difference is nothing) with such different time and memory consumption?

Great job, by the way!!

22 08 2007
David Huggins-Daines


It looks like PocketSphinx may actually have failed to run completely, which isn’t surprising since the decoding scripts are sometimes broken.

You should definitely see about 20% less runtime and memory consumption, but not something as dramatic as above 🙂

6 11 2007

I am using gnome-voice-control 0.2 as we speak and some thoughts come to mind. Does this program limit its options? What I mean is, if I say “run” does it have a list of installed programs and judge what I say based on that list, or on the complete dictionary? I think that if it had the list of programs it would be a lot more reliable. It might also help if it knew the name of the program rather than the command. Example: “open office writer” should run “ooffice -writer %U.”

Have you considered sphinx4? I know that it is said to be slow and speed is key, but inaccuracy is what makes people give up, so it would make sense to use the latest software.

Is there a list of possible commands that the user can see to help them use the program to the fullest? I have the “close window” command working almost without flaw, but I don’t know what other options I have. I also have no idea how to type. I have been trying “type hello” and nothing happens. Maybe I am doing the wrong thing.

Despite the criticisms, this is a great program and is the only program of its kind that I have been able to install. It is really easy to use and that is very important to me and new computer users.


24 12 2007


Great job on gnome-voice-control! I’ve used versions 0.2 and 0.3, and I thought I’d share my experiences in the hope that it might further development a little and help anyone who’s encountered the same (minor) problems that I had getting it up and running.

Regarding installation:

The 0.2 .deb package in the Ubuntu repos installed without a problem. It took me a minute or two to realise it was a panel applet but apart from that it was all good (changing the package description to specify “panel” might be a useful addition though?).

When compiling 0.3 from source, I had a little trouble finding some of the listed dependencies and found some of the instructions vague, but I think that was due to my level of experience as opposed to your instructions! I got it installed without too much trouble anyway so I’m obviously learning.

Regarding initial usage:

0.2: Again, because I installed the software from the repos, I had no instructions or any idea how to use it. A quick Google search revealed the YouTube screencast and that let me know what commands were supposed to work. I wasted some time repeating “run browser” in different tones/speeds of voice before doing some more searching and discovering the program opens epiphany by default, which I don’t have installed (perhaps you could implement a check and generate an error notification if expected software is not available?). Anyway, after I got firefox opening I set about trying to get the other commands recognised…

I had most success using the following approach:

1. Speaking clearly, and *slightly* exaggerating my pronunciation, e.g. with the word “next” emphasise like this – nnex-T. Or with “file” emphasise the “f” and “l” sounds like this – ffy-le.
2. Not pausing between words if both words are part of the same command.
3. Speaking at my normal speed/slightly faster, avoiding slowing down or “dumbing down” my voice.
4. Trying not to sound frustrated when repeating commands – finding a neutral tone that works and repeating it consistently gives the best results (easier said than done sometimes).

Comparing 0.2 and 0.3:

While I appreciated the additional commands added to 0.3, I found it to be much more “paranoid” than 0.2. What I mean by this is, 0.2 generally just sat dormant and only reacted when it recognised a command. It mostly ignored background noise and normal speech, and if it didn’t understand me it ignored me. However, 0.3 reacts much more to background sounds and best-guesses a command, any command. Just typing or coughing or moving your chair usually issues a command of some sort. Making random, silly noises will issue valid commands with 0.3 as well.

Regarding command list:

You might consider the following additional commands if they haven’t already been implemented:

shutdown computer
browse files
run messenger
run music player

That’s about all I can think of right now, except to say I’ll be keeping a close eye on this project and promoting it where I can. Keep up the good work!

10 03 2008

I have an acoustic model (triphone) and I use it in s3.6 (.cont.) and it works well.
What should I do to use it in pocketsphinx0.4?
It fails with the error: #codebooks (650) != 1 in “libpocketsphinx\s2_semi_mgau.c”, line 1150.

15 06 2008

Is it possible to make it work with the Gnome on-screen keyboard? I mean all you need to say is the letters, numbers, capital letters, and shift key + letter for starters. Is this more difficult?


For the physically impaired using the above keyboard.
For those using other languages with the English QWERTY keyboard, where voice support for their language may take a long time to come, or never come if it is a minor language.
Other things I have not thought of yet!
