Home
>
Software Development
>
Creating Voice-Jaw Data with a Processing Script – InApps 2025

March 21, 2022 by Phu Nguyen

Creating Voice-Jaw Data with a Processing Script – InApps 2025

Main Contents:

Creating Voice-Jaw Data with a Processing Script – InApps is an article under the topic Software Development Many of you are most interested in today !! Today, let’s InApps.net learn Creating Voice-Jaw Data with a Processing Script – InApps in today’s post !

Key Summary

Overview: The article provides a guide on generating voice jaw animation data using a processing script, focusing on its application in animation or gaming, as presented by InApps Technology in 2022.
Key Points:
- Purpose: Voice jaw data is used to synchronize character mouth movements (jaw animations) with audio inputs, enhancing realism in games, virtual avatars, or animated films.
- Core Concept:
  - Analyze audio (e.g., speech) to detect phonemes or amplitude changes.
  - Map audio features to jaw movement parameters (e.g., open/close, intensity).
  - Output data for animation tools (e.g., Unity, Blender).
- Processing Script:
  - Typically written in a language like Python or Processing (a Java-based platform for visual arts).
  - Leverages audio processing libraries to extract relevant features.
- Steps to Create Voice Jaw Data:
  - Audio Input:
    - Load an audio file (e.g., WAV, MP3) using libraries like librosa (Python) or Minim (Processing).
  - Feature Extraction:
    - Analyze audio for amplitude, frequency, or phoneme patterns.
    - Example: Use Fast Fourier Transform (FFT) to detect speech intensity.
  - Mapping to Jaw Movement:
    - Convert audio features to jaw parameters (e.g., higher amplitude = wider jaw opening).
    - Define thresholds for open/close states or smooth transitions.
  - Data Output:
    - Generate animation data (e.g., JSON, CSV) with timestamps and jaw positions.
    - Example output: { time: 0.1s, jaw_angle: 30° }.
  - Integration:
    - Import data into animation tools or game engines for real-time or pre-rendered use.
- Sample Tools and Libraries:
  - Python: librosa for audio analysis, numpy for data processing.
  - Processing: Built-in audio libraries for real-time visualization.
  - Other: Praat for phoneme analysis, Audacity for audio preprocessing.

Example Workflow (Python-based):
import librosa

import numpy as np

# Load audio

audio, sr = librosa.load(“speech.wav”)

# Extract amplitude envelope

envelope = np.abs(librosa.amplitude_to_db(audio))

# Map to jaw movement (simplified)

jaw_data = [{“time”: t/sr, “jaw_open”: min(e/80, 1.0)} for t, e in enumerate(envelope)]

# Save as JSON

import json

with open(“jaw_data.json”, “w”) as f:

json.dump(jaw_data, f)

Integration with Animation:
- Unity: Import JSON/CSV and use scripts to animate character models.
- Blender: Use Python API to apply jaw data to rigged models.

Use Cases:
- Real-time lip-sync for virtual assistants or chatbots.
- Character animation in video games or animated films.
- Interactive installations requiring audio-driven visuals.
Benefits:
- Automates lip-sync, reducing manual animation effort.
- Enhances realism in character interactions.
- Flexible for various audio inputs and animation platforms.
Challenges:
- Accurate phoneme detection requires complex algorithms or ML models.
- Limited precision for nuanced expressions (e.g., emotions).
- Processing high-quality audio can be computationally intensive.
- Integration with animation tools may require custom scripting.
Conclusion: In 2022, creating voice jaw data with a processing script, as outlined by InApps Technology, streamlines audio-driven animation for games and films using tools like Python and Processing, offering automation and realism, though advanced applications demand sophisticated audio analysis and integration expertise.

We’ll Need Some Phrases

For now, the tech talk “act” will be “scripted” between Hedley and myself. It will give the illusion of a conversation. Of course, that’s assuming I can remember MY lines. It should be a great effect and lay the foundation for getting a much more sophisticated artificial intelligence (AI) conversational process up and running. Everything is going to AI, you know.

For a scripted act, we’ll certainly need a few canned audio responses. Naturally, Hedley should sound like a robot. A logical choice for this job is to use eSpeak, which is easily integrated into Linux scripts. It will even send the audio to a .wav file by using the “-w” option. We covered this text-to-speech program a few weeks ago.

Here’s an example.

robnotebook% espeak “My name is Hedley, please help me welcome Doctor Torq to the stage” -v en-us -p 30 -s 200 -k 20 -w intro-drtorq2.wav

robnotebook% espeak “My name is Hedley, please help me welcome Doctor Torq to the stage” –v en–us –p 30 –s 200 –k 20 –w intro–drtorq2.wav

The “-v” option sets the voice to US-English, a pitch (“-p” option) to 30, speed (“-s” option) to 200 and capitalization emphasis (“-k”) to 20.

You should definitely play around with the voices and other settings to find a combination that gives your robot a distinctive and interesting sound.

Down the line, I’ll need to be able to take a text response or some kind of data from an application and convert it into sounds that will come out of the speaker inside Hedley’s mouth. At the same time, the audio will have to be analyzed and the resulting data sent to Hedley’s Arduino-jaw servo subsystem.

I generated several .wav files with various responses for testing purposes. With the .wav files ready, lets now turn our attention to the Processing-based audio analysis program.

Turning a .wav File into Data

The Processing programming language, with its rich set of libraries, makes it easy to play a .wav audio file and analyze the waveforms, in near real-time. This is important because any lag between the sound and the jaw movement ruins the “talking skull” effect. There are a bunch of different functions in Processing for all kinds of sophisticated audio and visual effects.

Also, Processing closely mirrors the code layout for the Arduino. Why not use essentially the same code structure for the Arduino and your visual-audio code on a Linux notebook or Raspberry Pi?

I chose an “amplitude modulation” function to capture the prominent points of the audio wave profile. It gives reasonably realistic jaw movement. You’ll definitely need to tweak settings both on the Processing data analysis side and the Arduino-jaw servo programs for the best effect, with your project.

If you want to get really tricky, you might explore the Fast Fourier Transform (FFT) function, to grab specific parts of the audio waveform. It’s there if you want to give it a try. FFT analyzes the audio according to frequency, so you can precisely tailor your output data for specific sounds. The Wee Little Talker board does a similar thing, although much of the work is done in hardware. Hedley said he was happy speaking with the amplitude function for now.

Here’s the Processing code.

import processing.sound.*; import processing.serial.*; Serial myPort; SoundFile sample1; SoundFile sample2; Amplitude rms; float scale=8; float smooth_factor=.2; float sum; int valpos; int sendpos; int i = 0; int execloop = 1; public void setup() { size(640,360); myPort = new Serial(this, “/dev/ttyUSB0”, 115200); } public void draw() { background(125,255,125); noStroke(); fill(255,0,150); if (execloop == 1) { sample1 = new SoundFile(this, “/home/rob/hedleytheskull/intro-torq1.wav”); sample1.rate(.45); sample1.play(); rms = new Amplitude(this); rms.input(sample1); execloop = 0; } else if (execloop == 2) { sample2 = new SoundFile(this, “/home/rob/hedleytheskull/intro-torq2.wav”); sample2.rate(.45); sample2.play(); rms = new Amplitude(this); rms.input(sample2); execloop = 0; } sum += (rms.analyze() – sum) * smooth_factor; myPort.write(str(int(map((sum * 700), 30, 85, 8, 3)))); myPort.write(‘n’); }

import processing.sound.*;

import processing.serial.*;

Serial myPort;

SoundFile sample1;

SoundFile sample2;

Amplitude rms;

float scale=8;

float smooth_factor=.2;

float sum;

int valpos;

int sendpos;

int i = 0;

int execloop = 1;

public void setup() {

size(640,360);

myPort = new Serial(this, “/dev/ttyUSB0”, 115200);

}

public void draw() {

background(125,255,125);

noStroke();

fill(255,0,150);

if (execloop == 1) {

sample1 = new SoundFile(this, “/home/rob/hedleytheskull/intro-torq1.wav”);

sample1.rate(.45);

sample1.play();

rms = new Amplitude(this);

rms.input(sample1);

execloop = 0;

}

else if (execloop == 2) {

sample2 = new SoundFile(this, “/home/rob/hedleytheskull/intro-torq2.wav”);

sample2.rate(.45);

sample2.play();

rms = new Amplitude(this);

rms.input(sample2);

execloop = 0;

}

sum += (rms.analyze() – sum) * smooth_factor;

myPort.write(str(int(map((sum * 700), 30, 85, 8, 3))));

myPort.write(‘n’);

}

At the top are the usual library and variable initializations. Then the serial line is started, so we can send the resultant analysis data out to move the jaw. The real magic happens down in the draw loop where the audio files are played and then examined with the Amplitude function. The output data is smoothed out a bit and then proportionally mapped into the zero through nine range expected by the Arduino-jaw servo subsystem. Each data point terminates with a newline.

Notice that I used two different .wav files. I simply chose one or the other using the “execloop” variable, at runtime. My plan is to put the audio response file names in a text file and then step through the list as Hedley and I talk. I’ll add a new push button to my wired slide clicker and use that as the trigger when I want to jump to the next phrase. A fake antique microphone will go on top of the clicker but won’t be connected to anything at this point. With a little practice, it should look like we are talking with each other.

I’ve tested Hedley on several friends and family members and they thought his “talking skull” effect looked pretty cool and realistic.

Going Further

There is much to do as we get closer to the upcoming Embedded System Conference (ESC) show at the end of October. Hedley is getting anxious to be back up on stage.

The .wav file list function is next and then I’ll program the extra button to advance through the phrases. Hedley and I will also have to put together several conversations we’ll use for our act. And, of course, there will be lots of rehearsal… so I remember my lines. Our shtick will need to be woven in with the slides.

By-the-way, we’ll be running the slides using LibreOffice from Hedley’s onboard Raspberry Pi 3. Maybe I should start calling him my “skulltop computer.” Talk about your case mod!

Feature image via Pixabay.

InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Torq.

Source: InApps.net

Rate this post

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise in software development to emerging tech trends like AI and IoT—my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.