This is Animalese.
KK Slider: [unintelligible babble]
It's the pseudo-language of Animal Crossing. This is what
it sounds like in the Japanese version:
KK Slider: [unintelligible babble]
And here's what it sounds like in the
English version:
KK Slider: [slightly deeper unintelligible babble]
They sound different, which is weird because it's supposed to
be nonsense, right?
So... why does Nintendo dub KK Slider?
To understand, we need to
do a taxonomy of all the ways games have tried to represent -- or avoid representing --
human speech.
In the beginning, there was the Word, and the Word was:
Oop! That... was supposed to say voice synthesis.
Early attempts at adding audio to games were a
mix of pre-recorded voices and genuine voice synthesizers. But they were mostly
gimmicky, expensive add-ons. Voice chips made more sense in arcade machines
because they were already a huge investment of space and money -- but it was
still a technical struggle to get them to work. Take Q*bert, known for his mad ups
and foul mouth: this drop of Tang was originally supposed
to speak English instead of:
Q*bert: [garbled synthetic phonemes]
but audio engineer David Thiel couldn't get the voice chip to
produce the sounds he was hoping for. So instead of continuing to mess with it,
he just said [Q*bert garble curse] and had it string together some incoherent phonemes instead.
Thiel, like many designers that followed, came to the conclusion that
human voices just weren't worth the fuss. Other developers opted for a style
that's entirely unique to video games, and I looked around but I couldn't find
a single definitive phrase used to describe this style -- which I think speaks
to how much we take it for granted, even though it is super weird.
I'm talking about using nonsensical sound effects to stand in for language, or simply put
[slow, low beeps that appear in time with the words]
[very high-pitched piercing beeps that appear in time with the words]
[sharp, high-pitch beeps that appear in time with the words]
for the purpose of this video -- and because it's cool to name things --
I'm going to call this beep speech.
The earliest examples of beep speech I could
find were in JRPGs like Star Arthur Legend - Planet Mephius [short pattern of beeps]
and The Legend of Zelda [mid-pitch beeps]
Some American games used a similar trope of mimicking on-screen text, but
it's not meant to stand in for a voice, so it's not quite the same.
That distinction is important because of beep speech's peculiar function; games that
use beep speech slowly reveal text and accompany each word with audio, which
makes the player process information as if they were really listening to somebody speak.
It's not a straight info-dump; it replicates the act of listening,
which makes it easier to stay engaged with the written text. That's
assuming you enjoy listening to bebe bebe be bebeep, which is a great weakness
of the beep speech of the cartridge era. Because audio capabilities were still
limited, most games used the same beep for every character in every situation. Later
games - including Animal Crossing - could pitch the beeps higher or lower, and that
really helped spice things up.
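If you want to hear that mechanic for yourself, here's a rough Python sketch of a beep-speech generator: one short tone per visible character, a lower or higher base pitch per speaker, and a small per-character pitch wobble. The frequencies, timings, and wobble rule are arbitrary choices for illustration, not taken from any actual game, and it writes WAV files instead of driving a live text box.

```python
# A minimal beep-speech sketch (illustration only, not any specific game's code):
# one short sine beep per visible character, with a per-speaker base pitch and a
# small per-character pitch wobble. Writes WAV files instead of playing live audio.
import wave
import numpy as np

SAMPLE_RATE = 22050   # samples per second
BEEP_LEN = 0.04       # seconds of tone per character
GAP_LEN = 0.03        # seconds of silence between characters

def beep(freq_hz: float, seconds: float) -> np.ndarray:
    """One short sine beep with a quick fade in/out to avoid clicks."""
    t = np.linspace(0.0, seconds, int(SAMPLE_RATE * seconds), endpoint=False)
    tone = 0.4 * np.sin(2.0 * np.pi * freq_hz * t)
    fade = np.minimum(1.0, np.minimum(t, seconds - t) / 0.005)
    return tone * fade

def beep_speech(text: str, base_freq: float = 440.0) -> np.ndarray:
    """One beep per visible character; whitespace becomes a pause."""
    gap = np.zeros(int(SAMPLE_RATE * GAP_LEN))
    chunks = []
    for ch in text:
        if ch.isspace():
            chunks.append(np.zeros(int(SAMPLE_RATE * BEEP_LEN)))
        else:
            # Nudge the pitch a little per character so it feels "spoken".
            freq = base_freq * (1.0 + 0.05 * ((ord(ch) % 7) - 3))
            chunks.append(beep(freq, BEEP_LEN))
        chunks.append(gap)
    return np.concatenate(chunks)

def write_wav(path: str, samples: np.ndarray) -> None:
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit PCM
        f.setframerate(SAMPLE_RATE)
        f.writeframes((samples * 32767).astype(np.int16).tobytes())

# Same line, two "voices": a low-pitched villager and a squeaky one.
write_wav("villager_low.wav", beep_speech("It's a beautiful day!", base_freq=220.0))
write_wav("villager_high.wav", beep_speech("It's a beautiful day!", base_freq=660.0))
```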
Then there were games like Star Fox, which gave each
character a different kind of "voice" so you could easily distinguish your kind
friend Slippy [synthetic sounds similar to frog croaks]
from that no-good hotshot Falco [deeper babble]
These were synthetic voices and
total nonsense with no real association with the text. Another strategy was to
use vocal grunts -- things like sighs and yells and other non-language forms
of communication. These were great for adding variety,
conveying emotion, and giving a character a voice without giving them language.
Although they use different strategies, Star Fox and Ocarina of Time have two
weird things in common:
first of all, both have friendly frogs that never get their due.
[rhythmic frog croaking]
Second of all, both have English-language lines even in the original Japanese versions.
Navi in Ocarina of Time: Hello!
The [GOOD LUCK] and [HEY, LISTEN] were the same in every
version of the games. And that points to one of the biggest strengths of beep speech
and vocal grunts: you DON'T have to translate them. A shiver is a shiver
in every language.
Link in Ocarina of Time: [shivers]
Localizing a game was - and is - a huge expenditure of time and
money, which makes these non-voice options the perfect replacement for
voice lines. Quality localization is basically a requirement for most games
now, but the 90s and early 2000s were a dark time for translations and voice
acting alike, leaving us with such gems as:
Dracula in Castlevania: What is a man?!
Barry in Resident Evil: A Jill Sandwich!
And that's when localization happened at all - sorry Earthbound fans. During this period,
beep speech was usually a stand-in for a real language. But Banjo-Kazooie made a
huge innovation in that its gibberish was... just what it was.
Bottles: [a gentle honking]
Like Star Fox, the characters
had distinct voices - but they weren't synthetic. They were powered by
real human pipes, which is wild because it's human voices replicating a
synthetic style that was made to replace human voices, like an aural ouroboros.
An auralboros.
Plenty of games of this era had full voice acting... but they weren't
on the N64. Nintendo's insistence on using cartridges would continue to
restrict their options for speech representation. On other consoles, games
became more invested in the cinematic experience of having characters say real shit.
Which meant a greater investment in voice acting. That caused a split in style where beep speech,
previously just fine for serious stories, came to represent a more lighthearted
cartoonish feel.
Mushi's Mama in Ōkami: [soft mid-tone beeps]
By the early 2000s, a trend emerged of
entirely fictional spoken languages. Whereas beep speech stood in for the
player's native speech, these constructed languages were more about making certain
characters and settings appear foreign -- while still empowering the player to
understand what they're saying. It's during this period that
both Animal Crossing and The Sims arrived.
Simlish actually predates The Sims; it first
appeared in SimCopter. But for The Sims, the team at Maxis knew they'd need
something more elaborate. Because the game was so much about the human
condition, they wanted to communicate emotions which would encourage players
to connect with their creations. Plus there were practical considerations: anything that's
comprehensible can become repetitive, and having a huge scroll of dialogue meant
writing, translating, and redubbing a huge scroll of dialogue. Following the style
of Banjo-Kazooie, they captured the real human voices of two improvisers and then
spent a year remixing that audio into the perfect blend of nonsense.
Sim 1: Dag dag aulf
Sim 2: Anamana blastamana
But that strategy can't work with every franchise. Animal Crossing had different
intentions and different styles, and so they needed a different approach. When
you hear Animalese for the first time, it sounds a lot like standard
voice synthesis. But KK Slider is actually saying REAL WORDS.
Here he is slowed down:
KK Slider: [deeper and slower than normal, words that match the text box identifiable]
The synthetic voice doesn't exactly nail the pronunciation of each word, but
that works to its advantage; once it's sped back up, it's even harder to tell
that KK is speaking English. Dōbutsu no Mori, the original N64 Japanese
version of Animal Crossing, features Animalese in Japanese. Region-specific
Animalese is also the default language in New Leaf... but not Wild World or City
Folk. Instead, they use a pretty standard-sounding voice synthesis called Bebebese.
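Here's a toy sketch of that "real words, sped up" trick in Python. It's a fan-style recreation, not Nintendo's actual pipeline: it assumes you have one short recording per letter (a hypothetical letters/a.wav through letters/z.wav), and for each letter of the real dialogue it plays that clip fast enough that the pitch jumps and the English stops registering.

```python
# A toy recreation of the Animalese idea (my assumptions, not Nintendo's code):
# assume one short recording per letter, e.g. letters/a.wav ... letters/z.wav,
# stored as 16-bit mono WAVs. Spell out the real dialogue letter by letter, then
# resample each clip to play faster -- which also raises the pitch, so the output
# is built from real words but hard to recognize at full speed.
import wave
from pathlib import Path
import numpy as np

SAMPLE_RATE = 22050
SPEED = 2.5  # >1 = faster and higher-pitched, like KK sped back up

def read_wav(path: Path) -> np.ndarray:
    """Load a 16-bit mono WAV as floats in [-1, 1]."""
    with wave.open(str(path), "rb") as f:
        raw = f.readframes(f.getnframes())
    return np.frombuffer(raw, dtype=np.int16).astype(np.float64) / 32768.0

def speed_up(samples: np.ndarray, factor: float) -> np.ndarray:
    """Naive resampling: shorter and higher-pitched, like a tape played fast."""
    positions = np.arange(0, len(samples), factor)
    return np.interp(positions, np.arange(len(samples)), samples)

def animalese(text: str, letter_dir: str = "letters") -> np.ndarray:
    """Concatenate sped-up letter clips for each letter of the real dialogue."""
    chunks = []
    for ch in text.lower():
        clip = Path(letter_dir) / f"{ch}.wav"
        if ch.isalpha() and clip.exists():
            chunks.append(speed_up(read_wav(clip), SPEED))
        else:
            chunks.append(np.zeros(int(SAMPLE_RATE * 0.03)))  # pause for spaces etc.
    return np.concatenate(chunks)

def write_wav(path: str, samples: np.ndarray) -> None:
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit PCM
        f.setframerate(SAMPLE_RATE)
        f.writeframes((np.clip(samples, -1, 1) * 32767).astype(np.int16).tobytes())

write_wav("kk_line.wav", animalese("hello how are you today"))
```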
That's because Animal Crossing was never intended to be localized. In fact,
Nintendo didn't localize the first version of Animal Crossing; the American
release was based on the updated GameCube game Dōbutsu no Mori+.
Members of the Nintendo Treehouse had to advocate for it to be translated,
partially because they had already gotten addicted to playing it. Because
they never intended to localize the game, Nintendo included a lot of specific
Japanese cultural elements, including of course the language. All of those had to
be changed in the American version because the style of translation at the
time called for completely eradicating any hint of a foreign culture. The
prevailing notion was that American audiences didn't want anything that had
what cultural theorist Kōichi Iwabuchi called "cultural odor,"
a phrase I hate to say out loud but have to respect the usefulness of. The localizers for the
first Animal Crossing did an amazing job replacing content and adding new
events for American audiences -- so much so that their game was actually localized
back into Japanese and released for the GameCube as Dōbutsu no Mori e+.
So when it came time to make Animal Crossing: Wild World, Nintendo needed a
localization strategy from the start.
And that strategy was to make a game with no regionality at ALL.
No cherry blossoms.
No Halloween.
And no regional Animalese.
The Bebebese of Wild World stuck around for City Folk,
but by New Leaf, Animalese had made a triumphant,
multilingual return. Why?
Well!
I don't know.
But my theory is this! City Folk got
a lot of criticism for being too similar to earlier entries in the franchise. The
next game had to distinguish itself significantly to avoid another letdown.
Aya Kyogoku, who co-directed New Leaf alongside Isao Moro,
viewed the game as a tool to communicate with both animal characters and other
players. So it made sense that the in-game communication would be more
elaborate than Bebebese.
But doing full voice acting would have
been exorbitantly expensive;
City Folk had a huge script: around 640,000 words.
For perspective, Infinite Jest clocks in at about 483,000 words,
so this cute little game about bugs and letters has it beat
by over 150,000 words, and that means
it's better.
On top of that, it's just plain science that when creatures speak in adorable baby talk,
they're cute and you just want to squish their widdle faces.
All of that probably