How to convert human voice into digital format?

I am working on a project that uses a biometric system for security. We plan to use the human voice to secure it.
The idea is to have a person say some words or sentences, which the system stores in digital format. The next time the person wants to enter the system, he or she has to speak some words, which may or may not be the same as the words used earlier.
We don’t want to match the words; we want to match the voice frequency.
I have read some research papers on such systems, but they give no implementation details.
So I want to know whether there is any software or API that can convert analog voice into digital format and also report the frequency of the voice.
Until now I have worked on ordinary web-based applications, so I know common APIs and platforms such as Java EE and C#, but I have no experience with this kind of application.
Please enlighten me!


Solution 1:

Solution 2:

This is as good a starting point as any:

It’s an open source software framework for audio processing. They’ve listed a number of projects that have used the framework in various ways, so you could probably draw inspiration from them. The Telligence project in particular seems closest to your needs, as it was used to classify audio by gender:

Solution 3:

I believe there are two steps in a project like this one:

The first step is to record the voice from an analog input into a digital format (let’s assume WAV PCM). For this you can use the DirectShow API from C#, or the standard Wav-In API as in this project: You may consider compressing your audio files later on. There are many options for this; on Windows you may consider the Windows Media Format SDK to avoid licensing issues with other formats.
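To make "digital format" concrete, here is a minimal sketch of what WAV PCM data looks like once it has been captured. It uses only Python's standard `wave` and `struct` modules rather than DirectShow or Wav-In, and it writes a synthetic tone instead of recording from a microphone. The 16 kHz rate, 16-bit mono layout, and the helper names `write_pcm_wav` and `read_pcm_wav` are assumptions chosen for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # assumed sample rate in Hz, chosen for illustration


def write_pcm_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write floats in [-1.0, 1.0] as 16-bit mono PCM to a path or file object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        frames = b"".join(
            # Clamp to the valid range, then scale to a signed 16-bit integer.
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav.writeframes(frames)


def read_pcm_wav(source):
    """Read 16-bit mono PCM and return (sample_rate, floats in [-1.0, 1.0))."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw)
    return rate, [v / 32768.0 for v in ints]


if __name__ == "__main__":
    # A 220 Hz test tone stands in for recorded speech.
    tone = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(1600)]
    buffer = io.BytesIO()          # in-memory file; a filename works as well
    write_pcm_wav(buffer, tone)
    buffer.seek(0)
    rate, samples = read_pcm_wav(buffer)
    print(rate, len(samples))
```

Whatever API does the actual recording, the result is the same kind of data: a sample rate plus a sequence of amplitude values, which is what any later analysis operates on.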

The second step is to build or use a voice recognition framework. If you want to build one, you will probably need to define a set of “features” for your sound fragments, then select and implement a recognition algorithm. There are many approaches available for this; IEEE and websites are usually good sources. If you want to use an existing framework, you may want to consider Nuance Recognizer (commercial) or (open source).
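As an illustration of what one such "feature" might be, the sketch below estimates the fundamental frequency (pitch) of a short frame of samples using plain autocorrelation. This is only one possible feature, not the method of any framework mentioned above. The function name `estimate_pitch_hz` and the 60 to 400 Hz search range are assumptions made for the example.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate the fundamental frequency of a frame by autocorrelation.

    Returns the frequency in Hz, or None for a silent (constant) frame.
    The min_hz/max_hz search range is an assumed range for speech.
    """
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]      # remove any DC offset
    if sum(s * s for s in centered) == 0:
        return None                             # nothing to measure

    # Convert the frequency range into a range of lags (in samples).
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(centered) - 1)

    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag`.
        score = sum(
            centered[i] * centered[i + lag]
            for i in range(len(centered) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None:
        return None
    return sample_rate / best_lag               # the period in samples, as Hz


if __name__ == "__main__":
    rate = 8000
    # A 200 Hz sine stands in for a voiced speech frame.
    frame = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    print(estimate_pitch_hz(frame, rate))
```

A single pitch value like this is unlikely to identify a speaker on its own. In practice it would be one entry in a larger feature vector, computed per frame and passed to whichever recognition algorithm is chosen.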

Hope this helps.