Calibrating the Formant Frequency Map in <tt>FFTVoiceDetection.h</tt>
This guide walks you through the process of calibrating the formant frequency map in the FFTVoiceDetection.h file. Calibration ensures that the coordinates for each viseme (mouth shape) match the formant frequencies for your specific speaker and setup.
What Are Formants?
Formants are resonant frequencies of the vocal tract that define the characteristics of vowel sounds. In this file:
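- F1 is inversely related to tongue height: the more open the vowel, the higher F1. Close vowels such as EE and OO have a low F1.
- F2 reflects how far forward the tongue is: front vowels such as EE have a high F2, and back vowels such as OO have a low F2.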
Each viseme has a pair of F1 and F2 frequencies that define its position in the frequency space.
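The following minimal sketch makes the idea of a frequency-space position concrete by showing one way such a map could be laid out in C++. The `Viseme` enum, `FormantPoint` struct, `kFormantMap` table, and `formantFor` helper are hypothetical names, not the actual contents of FFTVoiceDetection.h. Only the EE coordinates come from this guide; the zeros are placeholders for your own measurements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical viseme indices; VISEME_COUNT is the number of entries.
enum Viseme : std::size_t { EE, AE, UH, AR, ER, AH, OO, VISEME_COUNT };

// One point in the F1/F2 plane, labeled with the viseme name.
struct FormantPoint {
    const char* name;  // label, e.g. "EE"
    float f1;          // first formant in Hz
    float f2;          // second formant in Hz
};

// Only EE uses values quoted in this guide. The zeros are placeholders
// to be replaced with your own averaged measurements.
constexpr std::array<FormantPoint, VISEME_COUNT> kFormantMap = {{
    {"EE", 350.0f, 3200.0f},
    {"AE", 0.0f, 0.0f},
    {"UH", 0.0f, 0.0f},
    {"AR", 0.0f, 0.0f},
    {"ER", 0.0f, 0.0f},
    {"AH", 0.0f, 0.0f},
    {"OO", 0.0f, 0.0f},
}};

// Looks up the calibrated coordinate for a viseme.
inline const FormantPoint& formantFor(Viseme v) {
    assert(v < VISEME_COUNT);  // guard against out-of-range indices
    return kFormantMap[v];
}
```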
Prerequisites
- Spectrum Analyzer:
  - Install a spectrum analyzer app on your phone or computer. Examples:
- Microphone Setup:
  - Use a high-quality microphone placed in a quiet environment.
  - Ensure consistent distance and orientation between the microphone and the speaker.
- Sound Samples:
  - Prepare a list of vowel sounds (EE, AE, UH, AR, ER, AH, OO) to produce during calibration.
Calibration Process
Step 1: Gather Formant Data
- Open the Spectrum Analyzer:
  - Configure the analyzer to display frequency peaks.
  - Set the frequency range to 0–4000 Hz.
- Produce Each Sound:
  - Speak or produce each vowel sound (EE, AE, UH, AR, ER, AH, OO).
  - Observe and record the first peak (F1) and second peak (F2) frequencies for each sound.
  - Example: for EE, F1 might be 350 Hz and F2 might be 3200 Hz.
- Repeat for Accuracy:
  - Repeat each sound several times to average the observed F1 and F2 values.
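If you want to capture F1 and F2 in code instead of reading them off an analyzer app, the steps above can be sketched as follows: convert FFT bin indices to Hz, take the first two local maxima inside the 0–4000 Hz window, then average several takes. This assumes you already have a magnitude spectrum from an `fftSize`-point FFT sampled at `sampleRateHz`; `Formants`, `pickFirstTwoPeaks`, and `averageTakes` are hypothetical names. Naive peak picking on a raw FFT can land on individual voice harmonics rather than true formants, so treat this as a starting point; smoothing the spectrum or using LPC analysis is more robust.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: formant frequencies in Hz (0 means "not found").
struct Formants {
    float f1;
    float f2;
};

// Returns the frequencies of the first two local maxima between minHz and
// maxHz whose magnitude is at least minMagnitude. `mag` holds FFT bin
// magnitudes; bin i is centered near i * sampleRateHz / fftSize Hz.
Formants pickFirstTwoPeaks(const std::vector<float>& mag, float sampleRateHz,
                           std::size_t fftSize, float minHz, float maxHz,
                           float minMagnitude) {
    assert(fftSize > 0);
    const float binHz = sampleRateHz / static_cast<float>(fftSize);
    Formants out{0.0f, 0.0f};
    int found = 0;
    // Skip bin 0 (DC) and the last bin so both neighbors always exist.
    for (std::size_t i = 1; i + 1 < mag.size() && found < 2; ++i) {
        const float hz = static_cast<float>(i) * binHz;
        if (hz < minHz) continue;
        if (hz > maxHz) break;
        const bool isPeak = mag[i] > mag[i - 1] && mag[i] >= mag[i + 1] &&
                            mag[i] >= minMagnitude;
        if (!isPeak) continue;
        if (found == 0) {
            out.f1 = hz;  // lowest qualifying peak -> F1
        } else {
            out.f2 = hz;  // next qualifying peak -> F2
        }
        ++found;
    }
    return out;
}

// Averages F1 and F2 over repeated takes of one vowel, ignoring takes in
// which a peak was not found (reported as 0 Hz).
Formants averageTakes(const std::vector<Formants>& takes) {
    Formants sum{0.0f, 0.0f};
    std::size_t n1 = 0;
    std::size_t n2 = 0;
    for (const Formants& t : takes) {
        if (t.f1 > 0.0f) { sum.f1 += t.f1; ++n1; }
        if (t.f2 > 0.0f) { sum.f2 += t.f2; ++n2; }
    }
    return Formants{n1 ? sum.f1 / static_cast<float>(n1) : 0.0f,
                    n2 ? sum.f2 / static_cast<float>(n2) : 0.0f};
}
```

Bin width limits how precisely a single take can locate a peak. If you use the Teensy Audio Library's 1024-point FFT at 44.1 kHz, for example, each bin is about 43 Hz wide, which is one reason averaging several takes helps.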
Step 2: Update the <tt>FFTVoiceDetection.h</tt> File
- Locate the Coordinates:
- Update the Values:
- Save the File:
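Before recompiling, a quick sanity check on the edited values can catch transposed or missing entries. This minimal sketch assumes the map is an array of name/F1/F2 entries (a hypothetical `FormantPoint` layout, not necessarily how FFTVoiceDetection.h stores it). It flags entries where F1 is not positive, F1 is not below F2 (F1 is by definition the lower formant), or F2 falls outside the 0–4000 Hz analyzer range. `validateFormantMap` is a hypothetical helper; on the board you would likely route the message through `Serial` instead of `printf`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical map entry: viseme label plus its calibrated formants in Hz.
struct FormantPoint {
    const char* name;
    float f1;
    float f2;
};

// Prints every entry that fails the basic checks and returns how many failed.
std::size_t validateFormantMap(const FormantPoint* map, std::size_t count,
                               float maxHz) {
    assert(map != nullptr || count == 0);
    std::size_t bad = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const FormantPoint& p = map[i];
        const bool ok = p.f1 > 0.0f && p.f1 < p.f2 && p.f2 <= maxHz;
        if (!ok) {
            std::printf("check %s: F1=%.1f F2=%.1f\n", p.name, p.f1, p.f2);
            ++bad;
        }
    }
    return bad;
}
```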
Step 3: Verify Calibration
- Compile and Upload:
  - Recompile the code and upload it to your Teensy board using PlatformIO.
- Test Sounds:
  - Produce each vowel sound again and check the corresponding viseme probabilities using the PrintVisemes method.
  - The method assumes f1 and f2 have already been computed by prior calculations.
  - If no valid viseme is identified, no output is generated.
  - Example serial output from PrintVisemes (F1 in Hz, F2 in Hz, viseme):
    350.0,3200.0,EE
- Adjust Threshold (Optional):
  - If probabilities do not align as expected, tweak the threshold value:
    float threshold = 400.0f;
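This guide does not show how PrintVisemes matches a measurement to a viseme. One plausible scheme, sketched here purely as an assumption, is nearest-neighbor matching in the F1/F2 plane: compute the Euclidean distance from the measured (f1, f2) to each calibrated coordinate, accept the closest one only if it lies within `threshold` Hz, and derive a simple score that falls from 1 at an exact match to 0 at the threshold. `nearestViseme`, `printViseme`, `FormantPoint`, and the scoring formula are all hypothetical; the printed line follows the `F1,F2,NAME` shape of the PrintVisemes example in this guide.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical map entry: viseme label plus its calibrated formants in Hz.
struct FormantPoint {
    const char* name;
    float f1;
    float f2;
};

// Returns the index of the viseme closest to (f1, f2) if it lies within
// `threshold` Hz (Euclidean distance in the F1/F2 plane), otherwise -1.
// If scoreOut is non-null it receives 1 - distance / threshold (0 if no match).
int nearestViseme(const FormantPoint* map, std::size_t count, float f1,
                  float f2, float threshold, float* scoreOut) {
    assert(threshold > 0.0f);
    int best = -1;
    float bestDist = threshold;
    for (std::size_t i = 0; i < count; ++i) {
        const float d = std::hypot(f1 - map[i].f1, f2 - map[i].f2);
        if (d <= bestDist) {
            bestDist = d;
            best = static_cast<int>(i);
        }
    }
    if (scoreOut != nullptr) {
        *scoreOut = (best >= 0) ? 1.0f - bestDist / threshold : 0.0f;
    }
    return best;
}

// Prints "F1,F2,NAME" for the matched viseme; prints nothing if none matches.
void printViseme(const FormantPoint* map, std::size_t count, float f1,
                 float f2, float threshold) {
    const int idx = nearestViseme(map, count, f1, f2, threshold, nullptr);
    if (idx < 0) return;  // no valid viseme, no output
    std::printf("%.1f,%.1f,%s\n", f1, f2, map[idx].name);
}
```

Under this model, raising `threshold` accepts sounds farther from their calibrated point (more matches, but more misclassifications), and lowering it does the opposite.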
Troubleshooting
- Inconsistent Probabilities:
  - Ensure the microphone captures clear audio.
  - Verify the F1 and F2 frequencies during calibration.
- Low Probabilities:
- Overlap Between Visemes:
  - Adjust the coordinates for closer alignment with the actual formant frequencies.
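To spot overlapping visemes before testing by ear, you can check the calibrated coordinates pairwise. Suppose a viseme is accepted when a measurement lies within `threshold` Hz of its coordinate; this is an assumption about the matching rule, which this guide does not spell out. Each viseme's acceptance region is then a circle of radius `threshold`, and two circles overlap whenever their centers are less than 2 * threshold apart. `reportOverlaps` and `FormantPoint` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical map entry: viseme label plus its calibrated formants in Hz.
struct FormantPoint {
    const char* name;
    float f1;
    float f2;
};

// Prints and counts viseme pairs whose acceptance circles of radius
// `threshold` overlap, i.e. whose centers are closer than 2 * threshold.
std::size_t reportOverlaps(const FormantPoint* map, std::size_t count,
                           float threshold) {
    assert(threshold > 0.0f);
    std::size_t overlaps = 0;
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t j = i + 1; j < count; ++j) {
            const float d = std::hypot(map[i].f1 - map[j].f1,
                                       map[i].f2 - map[j].f2);
            if (d < 2.0f * threshold) {
                std::printf("%s and %s are %.0f Hz apart\n", map[i].name,
                            map[j].name, d);
                ++overlaps;
            }
        }
    }
    return overlaps;
}
```

Any pair it reports is a candidate for re-measuring, moving the coordinates apart, or lowering `threshold`.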
By following these steps, you can ensure accurate calibration of the formant frequency map for reliable viseme detection.