  • Twitter was set up to support 140 characters. And in the English alphabet, that's easy to

  • understand: a character is a letter, number, space or punctuation mark. People more or

  • less agree with computers there. And if it was twenty years ago, that's exactly how the

  • system would work. That far, no further.

  • But now, we have Unicode.

  • Mind you, it's still fairly straightforward in some languages. In East Asian languages, for

  • example -- Chinese, Japanese, Korean -- "one character" is a glyph, a number, a space,

  • or a punctuation mark. But since the language is denser -- each of these characters encodes

  • more information than an English character -- you can fit almost twice as much information

  • into each tweet.

  • And then, it gets complicated.

  • Take Arabic, for example. What counts as an Arabic letter? First of all, the shape of

  • Arabic letters changes significantly depending on where they are in a word. Watch what happens

  • as I take the Arabic for "Arabic alphabet", and hit backspace. Arabic's right to left,

  • remember. The characters change in order to be consistent with the rules of the written

  • language, and the diacritics disappear separately to the letters they're next to.

  • In Vietnamese, on the other hand? Each of those counts as one character.

  • Backspace, and away they go.

  • It's at this point that most British programmers, myself included, throw up their hands in defeat

  • and just use existing code by some other generous soul who's already worked the problem out.

  • Or if they're lazy, they just say, well, no-one's going to use this who doesn't speak English,

  • so we don't need to worry about it.

  • (MOUTHS) Yes you do.

  • Hmm. Unicode has a single character for some English ligatures, like "ffi" -- notice how

  • the letters there are smushed together to make them look better to the eye. Some programs

  • will automatically add those in for you. So you copy and paste your text from that into

  • Twitter, and suddenly you're saving characters.

  • People would count that as three characters. Unicode, and therefore Twitter, and pretty

  • much every computer program? Just one. The greatest example of this I could find is the

  • Arabic for "peace be upon him". Unicode has a single character for this, and Twitter will

  • treat it as counting for just 1 of your 140. Which is handy, if you're a devout Muslim

  • and want to talk about the prophets on Twitter.

  • So. What counts as a character? Well, it's complicated. Computers see things differently

  • to people. And let's be honest: unless you have a professor who's setting their essays

  • by character count instead of word count, the only time it'll really matter for most people...

  • is when they're trying to tweet.

  • [Translating these subtitles? Add your name here!]
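
The difference between what a person counts and what a computer counts can be sketched in a few lines of Python. This is only an illustration using the standard `unicodedata` module, not a description of how Twitter actually counts; the helper names `code_points` and `expanded` are made up for this example, and the sample strings are chosen to match the cases mentioned above (the "ffi" ligature, the single-character Arabic phrase, a Vietnamese letter with two marks, and an Arabic letter with a diacritic).

```python
import unicodedata


def code_points(text: str) -> int:
    """Number of Unicode code points (this is what Python's len() reports)."""
    return len(text)


def expanded(text: str) -> str:
    """Expand compatibility characters such as ligatures (NFKC normalization)."""
    return unicodedata.normalize("NFKC", text)


FFI_LIGATURE = "\ufb03"          # the single-character "ffi" ligature
ARABIC_PHRASE = "\ufdfa"         # single character for "peace be upon him"
VIETNAMESE_E = "e\u0323\u0302"   # e + dot below + circumflex, typed as three code points
ARABIC_BEH_FATHA = "\u0628\u064e"  # Arabic letter beh followed by the diacritic fatha

# The ligature is one code point, but a reader sees three letters.
print(code_points(FFI_LIGATURE), expanded(FFI_LIGATURE))

# The Arabic phrase is one code point that expands to a whole string of letters.
print(code_points(ARABIC_PHRASE), code_points(expanded(ARABIC_PHRASE)))

# Vietnamese: the base letter and its marks compose into one precomposed character.
composed = unicodedata.normalize("NFC", VIETNAMESE_E)
print(code_points(VIETNAMESE_E), code_points(composed))

# Arabic: the diacritic stays a separate code point, so it can be deleted on its own.
print(code_points(unicodedata.normalize("NFC", ARABIC_BEH_FATHA)))
```

Which of these counts a given service uses is a design decision; counting code points after some form of normalization is one common choice, but that is an assumption here rather than something the talk specifies.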

Why You Can Tweet More In Japanese: What Counts As A Character?

Posted by Samuel on 2018/01/12