Speech Research >> Decide Speaker Threshold

by herism82 » Wed, 20 Apr 2005 10:40:24 GMT

I want to build a speaker recognition system. So far I have finished the
MFCC feature extraction and vector quantization parts. My question is
about pattern matching.

I have tried to do pattern matching by computing the distance between the
clustered input vectors and the nearest vectors in the database. The
problem is, how can I decide the speaker threshold? I found that the
distance changes from time to time, although the change is not large. Can
anyone tell me how the pattern matching should be done?

Thank you very much



by James Salsman » Wed, 20 Apr 2005 12:49:11 GMT


> I want to make a "Speaker Recognition". Currently, I have finished MFCC

I'm not sure what a speaker threshold is, but perhaps you would benefit
from reading these papers from the creators of what is currently the most
accurate speaker recognition system ("Save as..." PDF files):

http://www.speech.sri.com/cgi-bin/run-distill?papers/icassp2005-spkr-system.ps.gz
http://www.speech.sri.com/cgi-bin/run-distill?papers/icassp2005-spkr-phonelats.ps.gz
http://www.speech.sri.com/cgi-bin/run-distill?papers/icslp2004-snerfs.ps.gz
http://www.speech.sri.com/cgi-bin/run-distill?papers/odyssey2004-nerfs.ps.gz
http://www.speech.sri.com/cgi-bin/run-distill?papers/asru2003-spkrid.ps.gz
http://www.speech.sri.com/cgi-bin/run-distill?papers/eurospeech2003-spkrdur.ps.gz

Sincerely,
James Salsman
--
www.readsay.com - maker of the ReadSay PROnounce English literacy system
400 MHz PDA included: $499 -- http://www.readsay.com/PROnounce.html


by Olivier Galibert » Fri, 22 Apr 2005 18:13:37 GMT


[speaking of SRI]

On which task?

OG.


by Tomi Kinnunen » Sat, 23 Apr 2005 23:37:41 GMT

: is, how can I decide the speaker threshold? I found that the distance is
: changed time by time, although the change is not significant. Can anyone
: tell me how the pattern matching should be done?

A good way to tackle the variability of the absolute match score is to
normalize it. There are at least two simple options:

1. Zero-mean, unit-variance normalization.
2. Normalization relative to other speaker models.
   For this you can Google for 'world modeling',
   'cohort modeling', 'cohort normalization', etc.
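
For option 1, a minimal Python sketch. It assumes the mean and variance
are estimated from distances obtained by matching other speakers'
utterances against the same speaker model; the post does not say which
data to use, so that choice and the names `znorm` and
`impostor_distances` are only illustrative.

```python
import statistics

def znorm(raw_distance, impostor_distances):
    """Standardize a raw match distance to zero mean and unit variance.

    impostor_distances: distances from other speakers' utterances matched
    against the same model (an assumption of this sketch).
    """
    mean = statistics.fmean(impostor_distances)
    std = statistics.pstdev(impostor_distances)
    if std == 0.0:
        raise ValueError("impostor distances have zero variance")
    return (raw_distance - mean) / std
```

After this step, scores from different speaker models are on a
comparable scale, so one threshold can be shared between them.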

A simple example of option 2:

D_i' = D_i - max_{j != i} D_j

where D_i' is the normalized score and the maximum is taken over all
speakers j other than i.
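
The formula can be sketched in Python as follows. The function name
`cohort_normalize`, the dictionary layout, and the speaker names are
made up for illustration; the code simply applies the formula as written
to the raw scores of one test utterance against every speaker model.

```python
def cohort_normalize(distances):
    """Return D_i' = D_i - max_{j != i} D_j for every speaker i.

    distances: dict mapping speaker id -> raw match score D_i for one
    test utterance. At least two speaker models are required.
    """
    if len(distances) < 2:
        raise ValueError("need at least two speaker models")
    normalized = {}
    for speaker, d in distances.items():
        # Scores of all competing models, excluding speaker i itself.
        others = [v for k, v in distances.items() if k != speaker]
        normalized[speaker] = d - max(others)
    return normalized

# Hypothetical raw scores for one utterance.
scores = cohort_normalize({"alice": 1.2, "bob": 3.0, "carol": 2.5})
```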

Score normalization transforms the scores into a common range, and you can
then find a good threshold experimentally.
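
One possible way to find the threshold experimentally, as a rough Python
sketch: collect normalized scores from held-out trials where the claimed
speaker is correct ("genuine") and where it is not ("impostor"), then
try each observed score as a threshold. Minimizing the sum of false
acceptance and false rejection rates is just one criterion, chosen here
for illustration; the function name and the sample numbers are
hypothetical, and the scores are assumed to be distance-like (smaller
means a better match).

```python
def pick_threshold(genuine, impostor):
    """Pick the threshold that minimizes FAR + FRR on held-out trials.

    genuine:  normalized scores of trials where the claim is correct
    impostor: normalized scores of trials where the claim is false
    A trial is accepted when score <= threshold (distance-like scores).
    Returns (threshold, FAR + FRR); ties keep the smallest threshold.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s > t for s in genuine) / len(genuine)     # false rejections
        far = sum(s <= t for s in impostor) / len(impostor)  # false acceptances
        if best is None or far + frr < best[1]:
            best = (t, far + frr)
    return best

# Made-up normalized scores, for illustration only.
threshold, error = pick_threshold([-2.0, -1.5, -0.2], [-0.4, 0.3, 1.0])
```

If false acceptances are more costly than false rejections in your
application (or the other way round), the two rates can be weighted
differently before summing.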

Hope this helps.

Tomi