Voice Calibration
This guide walks you through calibrating the formant frequency map in FFTVoiceDetection.h for accurate viseme (mouth shape) detection.
What Are Formants?
Formants are resonant frequencies of the vocal tract that distinguish one vowel sound from another. ProtoTracer uses the first two:
- F1 (first formant) relates mainly to tongue height: the lower the tongue and the more open the mouth, the higher F1
- F2 (second formant) relates mainly to how far forward the tongue sits: the further forward, the higher F2
Each viseme is assigned an (F1, F2) pair of frequencies, which places it at a point in a two-dimensional formant space.
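As a rough illustration of what a position in this space means, the sketch below treats each viseme as a point in the F1/F2 plane and picks the one closest to a measured (F1, F2) pair. It assumes a plain Euclidean nearest-point rule; `FormantPoint`, `kVisemeMap`, and `nearestViseme` are hypothetical names, and the actual code in FFTVoiceDetection.h computes per-viseme probabilities that may weigh distances differently. The coordinates are the default values from the formant map in FFTVoiceDetection.h.

```cpp
#include <array>
#include <cmath>
#include <string>

// Hypothetical stand-in for a formant coordinate: f1 and f2 in Hz.
struct FormantPoint {
    float f1;
    float f2;
};

struct VisemeEntry {
    const char* label;
    FormantPoint point;
};

// Default coordinates copied from the formant map in FFTVoiceDetection.h.
constexpr std::array<VisemeEntry, 7> kVisemeMap = {{
    {"EE", {350.0f, 3200.0f}},
    {"AE", {500.0f, 2700.0f}},
    {"UH", {1100.0f, 2700.0f}},
    {"AR", {850.0f, 850.0f}},
    {"ER", {1000.0f, 1000.0f}},
    {"AH", {900.0f, 2400.0f}},
    {"OO", {600.0f, 600.0f}},
}};

// Euclidean distance between two points in the F1/F2 plane, in Hz.
float formantDistance(FormantPoint a, FormantPoint b) {
    return std::hypot(a.f1 - b.f1, a.f2 - b.f2);
}

// Returns the label of the viseme whose coordinates lie closest to the measurement
// (assumed nearest-point rule, not necessarily ProtoTracer's scoring).
std::string nearestViseme(FormantPoint measured) {
    const VisemeEntry* best = &kVisemeMap[0];
    float bestDistance = formantDistance(measured, best->point);
    for (const auto& entry : kVisemeMap) {
        const float d = formantDistance(measured, entry.point);
        if (d < bestDistance) {
            bestDistance = d;
            best = &entry;
        }
    }
    return best->label;
}
```

For example, a measurement of (360, 3150) Hz lands nearest EE, and (1050, 950) Hz lands nearest ER.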
Prerequisites
1. Spectrum Analyzer
Install a spectrum analyzer app on your phone or computer:
- Spectroid (Android)
- Room EQ Wizard (Desktop)
2. Microphone Setup
- Use a high-quality microphone in a quiet environment
- Keep the distance and angle between your mouth and the microphone consistent
3. Sound Samples
Prepare to produce these vowel sounds during calibration:
- EE (as in “see”)
- AE (as in “cat”)
- UH (as in “but”)
- AR (as in “car”)
- ER (as in “her”)
- AH (as in “father”)
- OO (as in “boot”)
Calibration Process
Step 1: Gather Formant Data
1. Open the Spectrum Analyzer
   - Configure it to display frequency peaks
   - Set the frequency range to 0–4000 Hz
2. Produce Each Sound
   - Speak each vowel sound
   - Record the frequencies of the first (F1) and second (F2) formant peaks; the lowest peak on the display may be the pitch of your voice rather than F1
   - Example: for EE, F1 might be 350 Hz and F2 might be 3200 Hz
3. Repeat for Accuracy
   - Repeat each sound several times and average the observed F1 and F2 values
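The averaging in the last step can be done by hand, but a small helper can also report how much the readings vary. This is a minimal sketch; `FormantReading`, `FormantSummary`, and `summarizeReadings` are hypothetical names, and it uses the population standard deviation as the measure of spread.

```cpp
#include <cmath>
#include <vector>

// One reading taken from the spectrum analyzer for a single vowel.
struct FormantReading {
    float f1;  // first formant, Hz
    float f2;  // second formant, Hz
};

// Mean and population standard deviation of F1 and F2 across repeated readings.
struct FormantSummary {
    float meanF1 = 0.0f;
    float meanF2 = 0.0f;
    float stdF1 = 0.0f;
    float stdF2 = 0.0f;
};

FormantSummary summarizeReadings(const std::vector<FormantReading>& readings) {
    FormantSummary s;
    if (readings.empty()) return s;  // nothing to average; all fields stay 0
    const float n = static_cast<float>(readings.size());

    for (const auto& r : readings) {
        s.meanF1 += r.f1;
        s.meanF2 += r.f2;
    }
    s.meanF1 /= n;
    s.meanF2 /= n;

    float varF1 = 0.0f;
    float varF2 = 0.0f;
    for (const auto& r : readings) {
        varF1 += (r.f1 - s.meanF1) * (r.f1 - s.meanF1);
        varF2 += (r.f2 - s.meanF2) * (r.f2 - s.meanF2);
    }
    s.stdF1 = std::sqrt(varF1 / n);
    s.stdF2 = std::sqrt(varF2 / n);
    return s;
}
```

For example, three illustrative EE readings of (350, 3200), (370, 3100) and (360, 3150) Hz average to (360, 3150) Hz, with a spread of about 8 Hz in F1 and 41 Hz in F2. A large spread is a hint to repeat the sound before trusting the average.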
Step 2: Update FFTVoiceDetection.h
1. Locate the Coordinates
   In FFTVoiceDetection.h, find the Vector2D definitions for each viseme:
   ```cpp
   // Original formant map
   Vector2D visEE = Vector2D(350.0f, 3200.0f); ///< Coordinates for "EE"
   Vector2D visAE = Vector2D(500.0f, 2700.0f); ///< Coordinates for "AE"
   Vector2D visUH = Vector2D(1100.0f, 2700.0f); ///< Coordinates for "UH"
   Vector2D visAR = Vector2D(850.0f, 850.0f); ///< Coordinates for "AR"
   Vector2D visER = Vector2D(1000.0f, 1000.0f); ///< Coordinates for "ER"
   Vector2D visAH = Vector2D(900.0f, 2400.0f); ///< Coordinates for "AH"
   Vector2D visOO = Vector2D(600.0f, 600.0f); ///< Coordinates for "OO"
   ```
2. Update the Values
   Replace them with your calibrated averages:
   ```cpp
   // Updated formant map
   Vector2D visEE = Vector2D(360.0f, 3150.0f); ///< Updated for "EE"
   Vector2D visAE = Vector2D(510.0f, 2650.0f); ///< Updated for "AE"
   Vector2D visUH = Vector2D(1120.0f, 2750.0f); ///< Updated for "UH"
   // ... continue for all visemes
   ```
3. Save the File
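If you calculate the averages in code, you can also print the replacement lines directly instead of editing each value by hand. This is a minimal sketch, assuming you paste its output into FFTVoiceDetection.h yourself; `formatVisemeLine` is a hypothetical helper that mimics the declaration style of the formant map.

```cpp
#include <cstdio>
#include <string>

// Formats one formant map entry in the style used by FFTVoiceDetection.h, e.g.
// Vector2D visEE = Vector2D(360.0f, 3150.0f); ///< Updated for "EE"
std::string formatVisemeLine(const std::string& label, float f1, float f2) {
    char buffer[128];
    // "%.1f" prints one decimal place; the literal 'f' after it adds the float suffix.
    std::snprintf(buffer, sizeof(buffer),
                  "Vector2D vis%s = Vector2D(%.1ff, %.1ff); ///< Updated for \"%s\"",
                  label.c_str(), f1, f2, label.c_str());
    return std::string(buffer);
}
```

For example, `formatVisemeLine("EE", 360.0f, 3150.0f)` returns `Vector2D visEE = Vector2D(360.0f, 3150.0f); ///< Updated for "EE"`.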
Step 3: Verify Calibration
1. Compile and Upload
   Recompile the firmware and upload it to your Teensy using PlatformIO.
2. Test Sounds
   Produce each vowel sound and check the corresponding viseme probabilities using the PrintVisemes method:
   ```cpp
   // Serial output format from PrintVisemes:
   // <f1>,<f2>,<viseme_label>
   // Example: 350.0,3200.0,EE
   ```
3. Adjust Threshold (Optional)
   If the probabilities don’t line up with the sounds you produce, adjust the threshold value:
   ```cpp
   float threshold = 400.0f; ///< Adjust as needed
   ```
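When testing, it can help to capture the serial output on a computer and compare each line with the vowel you produced. The sketch below parses one line in the `<f1>,<f2>,<viseme_label>` format that PrintVisemes prints. `VisemeSample` and `parseVisemeLine` are hypothetical names, and reading from the serial port itself is left to whatever serial monitor or library you use.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// One line of PrintVisemes output: "<f1>,<f2>,<viseme_label>".
struct VisemeSample {
    float f1;
    float f2;
    std::string label;
};

// Parses a line such as "350.0,3200.0,EE"; returns std::nullopt if the line is malformed.
std::optional<VisemeSample> parseVisemeLine(const std::string& line) {
    const auto firstComma = line.find(',');
    if (firstComma == std::string::npos) return std::nullopt;
    const auto secondComma = line.find(',', firstComma + 1);
    if (secondComma == std::string::npos) return std::nullopt;

    try {
        VisemeSample sample;
        sample.f1 = std::stof(line.substr(0, firstComma));
        sample.f2 = std::stof(line.substr(firstComma + 1, secondComma - firstComma - 1));
        sample.label = line.substr(secondComma + 1);
        // Drop a trailing carriage return in case the line ended with "\r\n".
        if (!sample.label.empty() && sample.label.back() == '\r') sample.label.pop_back();
        if (sample.label.empty()) return std::nullopt;
        return sample;
    } catch (const std::exception&) {
        // std::stof throws std::invalid_argument or std::out_of_range on bad numbers.
        return std::nullopt;
    }
}
```

For example, `parseVisemeLine("350.0,3200.0,EE")` yields F1 = 350, F2 = 3200 and the label `EE`, while a malformed line yields `std::nullopt`.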
Troubleshooting
Inconsistent Probabilities
- Ensure the microphone captures clear audio
- Re-check the F1 and F2 frequencies you recorded during calibration
Low Probabilities
- Increase the threshold value slightly
- Check the peakDetection parameters for sensitivity:
   ```cpp
   PeakDetection<peakCount> peakDetection = PeakDetection<peakCount>(8, 2.0f, 0.5f);
   ```
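The three PeakDetection arguments are not explained here. A common design with this shape is the smoothed z-score algorithm, whose parameters are a lag (window length), a threshold (in standard deviations), and an influence (how much a detected peak feeds back into the baseline). Assuming PeakDetection follows that scheme, which you should confirm in its source, the sketch below shows how the parameters act on a sequence of spectrum magnitudes. `smoothedZScore` is a hypothetical standalone function, not ProtoTracer's class.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Smoothed z-score peak detection over a sequence of magnitudes (for example FFT bins).
// Returns +1 where a sample rises more than `threshold` standard deviations above the
// mean of the previous `lag` samples, -1 where it drops that far below, and 0 elsewhere.
std::vector<int> smoothedZScore(const std::vector<float>& data, std::size_t lag,
                                float threshold, float influence) {
    std::vector<int> signals(data.size(), 0);
    if (lag == 0 || data.size() <= lag) return signals;  // not enough samples for a baseline

    // Copy of the input in which detected peaks are damped by `influence`, so a peak
    // does not fully inflate the baseline that later samples are compared against.
    std::vector<float> filtered(data);

    for (std::size_t i = lag; i < data.size(); ++i) {
        // Mean and population standard deviation of the previous `lag` filtered samples.
        float mean = 0.0f;
        for (std::size_t j = i - lag; j < i; ++j) mean += filtered[j];
        mean /= static_cast<float>(lag);

        float variance = 0.0f;
        for (std::size_t j = i - lag; j < i; ++j) {
            const float diff = filtered[j] - mean;
            variance += diff * diff;
        }
        const float stddev = std::sqrt(variance / static_cast<float>(lag));

        if (std::fabs(data[i] - mean) > threshold * stddev) {
            signals[i] = data[i] > mean ? 1 : -1;
            filtered[i] = influence * data[i] + (1.0f - influence) * filtered[i - 1];
        }
        // Otherwise filtered[i] keeps the raw sample.
    }
    return signals;
}
```

Under this assumption, lowering the threshold flags smaller peaks (more sensitive detection), while an influence near 0 keeps a detected peak from raising the baseline that later samples are compared against.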
Overlap Between Visemes
- Adjust the coordinates so they align more closely with your measured formant frequencies
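Coordinates that sit close together in the F1/F2 plane are likely candidates for overlap. One quick way to spot them is to list every pair of visemes closer than some minimum distance. This sketch uses the default map from FFTVoiceDetection.h; `Viseme`, `ClosePair`, `findClosePairs`, `kDefaultMap`, and the 400 Hz cutoff are illustrative choices, not ProtoTracer settings.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct Viseme {
    const char* label;
    float f1;  // Hz
    float f2;  // Hz
};

struct ClosePair {
    std::string a;
    std::string b;
    float distance;  // Euclidean distance in the F1/F2 plane, Hz
};

// Lists every pair of visemes whose coordinates are closer than `minSeparation` Hz.
template <std::size_t N>
std::vector<ClosePair> findClosePairs(const std::array<Viseme, N>& map, float minSeparation) {
    std::vector<ClosePair> pairs;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = i + 1; j < N; ++j) {
            const float d = std::hypot(map[i].f1 - map[j].f1, map[i].f2 - map[j].f2);
            if (d < minSeparation) pairs.push_back({map[i].label, map[j].label, d});
        }
    }
    return pairs;
}

// Default formant map values from FFTVoiceDetection.h.
const std::array<Viseme, 7> kDefaultMap = {{
    {"EE", 350.0f, 3200.0f}, {"AE", 500.0f, 2700.0f}, {"UH", 1100.0f, 2700.0f},
    {"AR", 850.0f, 850.0f},  {"ER", 1000.0f, 1000.0f}, {"AH", 900.0f, 2400.0f},
    {"OO", 600.0f, 600.0f},
}};
```

With a 400 Hz cutoff, the default map flags UH/AH (about 361 Hz apart), AR/ER (about 212 Hz) and AR/OO (about 354 Hz).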
By following these steps, you can calibrate the formant frequency map to your own voice for more reliable viseme detection in your Protogen.