Because the previous vocoding attempt lacked depth, I tried a different approach this time: vocoding with an audio file in another language. I also plan to fine-tune the various effects and vocoder parameters so that the illusion of the tombstone or banknote actually speaking becomes more convincing.
First, I needed to take scenes featuring Itō Hirobumi from media such as dramas and films, extract their audio, and remove the background noise. Deep learning-based separation tools make it relatively easy to split a human voice from background sounds. Such a model typically converts the audio into a spectrogram by cutting it into short frames and analyzing the frequency content of each one, then estimates which parts of that time-frequency picture belong to the voice and which belong to the background. Using this, I was able to keep only the lines spoken by the actor portraying Itō Hirobumi and discard the interfering background noise.
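The masking idea behind this kind of separation can be sketched in plain Python. This is not how any particular separation tool works internally: a trained model would predict the mask from the mixture alone, whereas this sketch computes an "ideal ratio mask" from known voice and background references as a stand-in. All function names, the frame size, and the test tones are hypothetical, and the sketch uses a naive DFT on non-overlapping rectangular frames only to stay dependency-free.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(N^2), fine for a sketch)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping only the real part because the signals are real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def frames(signal, size):
    """Split a signal into non-overlapping frames (rectangular window, no overlap)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def separate_with_mask(mixture, voice_ref, noise_ref, size=64):
    """Rebuild the voice by weighting each time-frequency bin of the mixture.

    A trained separator would predict this mask from the mixture alone; here an
    ideal ratio mask is computed from known references purely for illustration.
    """
    out = []
    for mix_f, v_f, n_f in zip(frames(mixture, size), frames(voice_ref, size),
                               frames(noise_ref, size)):
        mix_s, v_s, n_s = dft(mix_f), dft(v_f), dft(n_f)
        # Mask near 1 where the voice dominates a bin, near 0 where the noise does.
        mask = [abs(v) ** 2 / (abs(v) ** 2 + abs(b) ** 2 + 1e-12) for v, b in zip(v_s, n_s)]
        out.extend(idft([m * x for m, x in zip(mask, mix_s)]))
    return out


# Hypothetical test signals: a 5-cycles-per-frame tone as the "voice"
# and a quieter 20-cycles-per-frame tone as the "background".
size = 64
voice = [math.sin(2 * math.pi * 5 * t / size) for t in range(size * 4)]
background = [0.8 * math.sin(2 * math.pi * 20 * t / size) for t in range(size * 4)]
mixture = [v + b for v, b in zip(voice, background)]
estimate = separate_with_mask(mixture, voice, background, size)
```

Because both test tones fall exactly on DFT bins, the mask removes the background cleanly here. Real voices and noise overlap in frequency, which is why practical tools learn the mask and use overlapping, windowed frames.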

For this attempt, I put more effort into precise vocoding. First, I ran the original audio through a compressor to even out its dynamics. Next, I used an equalizer to cut unnecessary low-end frequencies and adjust the levels of the other frequency ranges. Only then did I apply the vocoder, aiming for a cleaner, more polished result. Shaping the amplitude with the compressor and the frequency balance with the equalizer was necessary because unstable elements in the original recording could slip past the vocoder's filters and produce unintended artifacts. That interference would make it hard to impose the intended pitch and tonal character of the voice, so this step was crucial. Finally, I used a utility effect to fine-tune the overall gain.
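A rough sketch of the same signal chain (compressor, EQ low cut, vocoder, utility gain) in plain Python follows. Every function name and parameter value here (threshold, ratio, attack and release times, the 100 Hz cutoff, the five band centers, the +6 dB output gain, and the test signals) is a hypothetical illustration, not the actual plugin settings, and the one-pole high-pass and five-band channel vocoder are simplified stand-ins for the DAW effects.

```python
import math


def compress(x, sr, threshold_db=-18.0, ratio=4.0, attack_ms=5.0, release_ms=80.0):
    """Feed-forward compressor driven by a peak envelope follower."""
    att = math.exp(-1.0 / (sr * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (sr * release_ms / 1000.0))
    env, out = 0.0, []
    for s in x:
        level = abs(s)
        coeff = att if level > env else rel  # rise fast, fall slowly
        env = coeff * env + (1.0 - coeff) * level
        over_db = 20.0 * math.log10(max(env, 1e-9)) - threshold_db
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0 else 0.0
        out.append(s * 10.0 ** (gain_db / 20.0))
    return out


def high_pass(x, sr, cutoff_hz=100.0):
    """One-pole RC high-pass, standing in for the EQ's low cut."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / sr)
    out, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        prev_y = a * (prev_y + s - prev_x)
        prev_x = s
        out.append(prev_y)
    return out


def band_pass(x, sr, center_hz, q=4.0):
    """Biquad band-pass (RBJ Audio EQ Cookbook, 0 dB peak gain)."""
    w0 = 2.0 * math.pi * center_hz / sr
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        y = b0 * s + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, s, y1, y
        out.append(y)
    return out


def vocode(modulator, carrier, sr, centers=(200, 400, 800, 1600, 3200), smooth_ms=10.0):
    """Channel vocoder: each carrier band is scaled by the modulator's envelope in that band."""
    k = math.exp(-1.0 / (sr * smooth_ms / 1000.0))
    out = [0.0] * min(len(modulator), len(carrier))
    for fc in centers:
        mod_band = band_pass(modulator, sr, fc)
        car_band = band_pass(carrier, sr, fc)
        env = 0.0
        for i in range(len(out)):
            env = k * env + (1.0 - k) * abs(mod_band[i])  # rectify, then smooth
            out[i] += env * car_band[i]
    return out


def utility_gain(x, gain_db):
    """Static gain stage, like a DAW utility plugin."""
    g = 10.0 ** (gain_db / 20.0)
    return [s * g for s in x]


def voice_chain(voice, carrier, sr, out_gain_db=6.0):
    """Compressor -> EQ low cut -> vocoder -> utility gain, in the order described above."""
    cleaned = high_pass(compress(voice, sr), sr)
    return utility_gain(vocode(cleaned, carrier, sr), out_gain_db)


# Hypothetical test signals: an amplitude-modulated tone standing in for speech,
# and a 110 Hz sawtooth standing in for the carrier sound.
sr = 8000
speech = [math.sin(2 * math.pi * 440 * n / sr) * (0.5 + 0.5 * math.sin(2 * math.pi * 3 * n / sr))
          for n in range(4000)]
carrier = [2.0 * ((110 * n / sr) % 1.0) - 1.0 for n in range(4000)]
output = voice_chain(speech, carrier, sr)
```

The ordering is the point of the sketch: because compression and the low cut run before the vocoder, level spikes and low-frequency rumble are tamed before the modulator's band envelopes are measured, so they cannot leak into the carrier as artifacts.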
The sound produced this time turned out quite well. It felt engaging rather than monotonous, and it resembled human speech more closely. I also applied reverb to the sound generated from the money, adding a sense of spatial depth. Since I am quite satisfied with the result, I plan to discuss the next steps with the other collaborators.
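To illustrate how a reverb adds spatial depth, here is a classic Schroeder-style design: parallel feedback comb filters feeding series all-pass filters. This is a generic textbook structure, not the reverb plugin used for this sound, and the function names, delay times, feedback amount, and wet/dry mix are all assumptions chosen for illustration.

```python
import math


def comb(x, delay, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (feedback * y[n - delay] if n >= delay else 0.0)
    return y


def allpass(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y


def schroeder_reverb(x, sr, wet=0.3):
    """Mix the dry signal with a diffuse tail built from combs and all-passes."""
    comb_ms = (29.7, 37.1, 41.1, 43.7)  # hypothetical, mutually non-multiple delays
    combs = [comb(x, max(1, int(sr * ms / 1000)), 0.77) for ms in comb_ms]
    tail = [sum(vals) / len(combs) for vals in zip(*combs)]  # average the combs
    for ms, g in ((5.0, 0.7), (1.7, 0.7)):  # all-passes add echo density
        tail = allpass(tail, max(1, int(sr * ms / 1000)), g)
    return [(1.0 - wet) * d + wet * w for d, w in zip(x, tail)]


# Feed a unit impulse (padded with silence so the tail has room) through the reverb.
sr = 8000
impulse = [1.0] + [0.0] * (sr - 1)
response = schroeder_reverb(impulse, sr)
```

The output is as long as the input, so the input must be padded with silence for the tail to ring out; a real plugin handles this automatically.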
For now, I will create a test video by syncing this sound with the footage, then gather feedback to evaluate its overall impact. Based on that feedback, I will refine the direction of the sound design.