Speech Research >> forced alignment recognition utility?

by loseyourmarbles » Sat, 26 Feb 2005 15:24:01 GMT

Hi,
I'm looking for software that can take a wav file plus the spoken words
recorded in it (including the specific phones that make them up) and
output the times at which the phones occurred. Does anyone know of any
free, preferably easy-to-use software to accomplish this? Thanks so
much in advance - this would be an amazing help to our lab.

Gabe
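
To make the request above concrete, here is a minimal Python sketch of the kind of output a forced aligner yields: one timed segment per phone. It is an illustration only, not the output format of any particular tool. The 10 ms frame shift, the `PhoneSegment` type and the `frames_to_segments` helper are all assumptions introduced for this example.

```python
from dataclasses import dataclass

FRAME_SHIFT_MS = 10  # assumed frame shift; real front ends may differ


@dataclass
class PhoneSegment:
    phone: str
    start_ms: int
    end_ms: int


def frames_to_segments(phones, frame_to_phone, frame_shift_ms=FRAME_SHIFT_MS):
    """Turn a per-frame alignment into timed phone segments.

    phones         -- the known phone sequence, e.g. ["HH", "AH", "L", "OW"]
    frame_to_phone -- for each audio frame, the index into `phones` it was
                      aligned to (non-decreasing, every phone used at least once)
    """
    segments = []
    for frame, idx in enumerate(frame_to_phone):
        start = frame * frame_shift_ms
        end = (frame + 1) * frame_shift_ms
        if segments and len(segments) - 1 == idx:
            # Same phone as the previous frame: extend the open segment.
            segments[-1].end_ms = end
        else:
            # First frame of the next phone: open a new segment.
            segments.append(PhoneSegment(phones[idx], start, end))
    return segments


if __name__ == "__main__":
    for seg in frames_to_segments(["HH", "AH", "L", "OW"], [0, 0, 1, 1, 1, 2, 3, 3]):
        print(seg.phone, seg.start_ms, seg.end_ms)
```

Indexing frames by position in the phone sequence, rather than by phone name, keeps two adjacent occurrences of the same phone from being merged into one segment.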



by James Salsman » Sat, 26 Feb 2005 17:10:55 GMT



Sphinx-2 in "forced alignment" mode:
http://cmusphinx.sourceforge.net/html/download.php

Sphinx-3 and -4 don't have "forced alignment" modes, but IIRC you can
give them a nonbranching grammar, if you want to try them too.

Cheers,
James
--
www.readsay.com - maker of the ReadSay PROnounce English literacy system
400 MHz PDA included: $499 -- http://www.readsay.com/PROnounce.html
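
The "nonbranching grammar" idea mentioned above can be sketched in a few lines: when the phone sequence is fixed, alignment reduces to a Viterbi search over a strictly left-to-right chain, where each frame either stays in the current phone or advances to the next one. The Python below is a toy illustration under that assumption, not Sphinx's implementation. The `force_align` function and the per-frame scores are hypothetical; in a real system the scores would come from an acoustic model, and each phone would typically span several HMM states.

```python
import math


def force_align(scores, n_states):
    """Viterbi alignment over a linear (nonbranching) chain of states.

    scores[t][s] is the log-likelihood of frame t under state s.
    Returns one state index per frame; the path starts in state 0,
    ends in the last state, and never skips or goes backwards.
    """
    n_frames = len(scores)
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")

    neg_inf = -math.inf
    best = [[neg_inf] * n_states for _ in range(n_frames)]
    advanced = [[False] * n_states for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the path must begin in the first state

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = best[t - 1][s]
            move = best[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                best[t][s] = move + scores[t][s]
                advanced[t][s] = True
            else:
                best[t][s] = stay + scores[t][s]

    # Backtrace from the final state at the final frame.
    path = [0] * n_frames
    s = n_states - 1
    for t in range(n_frames - 1, -1, -1):
        path[t] = s
        if advanced[t][s]:
            s -= 1
    return path


if __name__ == "__main__":
    # Toy scores: 0.0 where a frame matches a state, -5.0 elsewhere.
    truth = [0, 0, 1, 1, 1, 2]
    toy = [[0.0 if s == k else -5.0 for s in range(3)] for k in truth]
    print(force_align(toy, 3))
```

The resulting per-frame state indices are what gets converted into phone start and end times once multiplied by the frame shift.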


by Jerry W. » Mon, 28 Feb 2005 22:17:46 GMT


I can't speak for all versions of Sphinx-3, but the old, slow version
called "s3flat" has a forced-alignment program named "s3align", which I
use. I think that some of the more recent versions of Sphinx-3 may
also have such a program. See
http://cmusphinx.sourceforge.net/html/download.php#sphinx3.

Anyone attempting to use these open source recognizers will also need
an acoustic model. Fortunately there are some adult models available:
http://cmusphinx.sourceforge.net/html/system.php#models.

cheers,
jerry wolf


by David Huggins-Daines » Wed, 02 Mar 2005 00:44:39 GMT


Yes, such a program comes with Sphinx 3.5. It is simply called "align".
However, the Sphinx-II one seems to work better at the moment for what
you're trying to do.

The FestVox project (http://www.festvox.org/) has some scripts which do
exactly what you describe - they use Sphinx-II to segment speech files
for building concatenative TTS voices.