  • MALE SPEAKER: Today we're very pleased, very happy, to have

  • Luis Von Ahn here today, from Carnegie Mellon University.

  • His talk is on human computation.

  • Luis is a very new assistant professor in computer science

  • at the School of Computer Science at Carnegie Mellon

  • University.

  • He received his Ph.D. in 2005, and I'm told he was the

  • hottest new graduate on the market, with offers from just

  • about every university out there, including corporate

  • offers, too.

  • He received his B.S. from Duke University.

  • He received a Microsoft Research Fellowship Award.

  • His research interests include encouraging people to work for

  • free, as well as catching and thwarting cheaters in online

  • environments.

  • His work has appeared in over a hundred news publications

  • around the world.

  • New York Times, CNN, USA Today, BBC, and

  • the Discovery Channel.

  • Luis holds four patent applications and has licensed

  • technology to major internet companies.

  • Please join me in welcoming Luis Von Ahn.

  • [APPLAUSE]

  • LUIS VON AHN: Can you hear me now?

  • OK.

  • So, I want to start by asking a question to the people in

  • the audience.

  • How many of you have had to fill out a registration form

  • for something?

  • Like Yahoo, Hotmail, or Gmail, or some sort of web form where

  • you've been asked to read a distorted sequence of

  • characters or a distorted word such as this one?

  • How many of you found it annoying?

  • Awesome.

  • OK, well, that was part of my thesis.

  • That thing is called a CAPTCHA, and the reason it's

  • there is to make sure that you, the entity filling out

  • the web form, are actually a human, and not some sort of

  • computer program that was written to submit the form

  • millions and millions of times.

  • The reason it works is because humans--

  • at least non-visually impaired humans--

  • have no trouble reading distorted characters, whereas

  • computer programs simply can't do it as well yet.

  • More generally, a CAPTCHA is just a program that can tell

  • whether its user is a human or a computer.

  • OK, let me say that another way.

  • A CAPTCHA is a program that can generate and grade tests

  • that most humans can pass, but current computer

  • programs cannot.

  • Notice the paradox here.

  • A CAPTCHA is a program that can generate and grade tests

  • that it itself cannot pass.

  • So in that way, CAPTCHAs are a lot like some professors.

  • [LAUGHTER]

  • Just to make things crystal clear, let me give you an

  • example of one of these programs that can generate and

  • grade tests that most humans can pass, but current computer

  • programs cannot.

  • Here's how the program works.

  • First, the program picks a random string of letters.

  • O-A-M-G, in this case.

  • Then the program renders the string into a randomly

  • distorted image, and then the program generates a test,

  • which consists of the randomly distorted image and the

  • question, "What are the characters in this image?"
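
The three steps described above (pick a random string, render it distorted, pose the question) can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the names `Challenge`, `generate_challenge`, and `grade` are hypothetical, and the image rendering is replaced by a list of random per-character distortion parameters, since actual drawing would need an imaging library.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class Challenge:
    answer: str        # secret string, kept on the server side
    distortion: list   # stand-in for the image: (rotation, offset) per character
    question: str


def generate_challenge(length=4, rng=None):
    """Pick a random string and random distortion parameters for it."""
    if rng is None:
        rng = random.Random()
    # Step 1: pick a random string of letters.
    answer = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
    # Step 2 (assumption): instead of drawing pixels, record how each
    # character would be rotated and shifted in the rendered image.
    distortion = [(rng.uniform(-30, 30), rng.randint(-5, 5)) for _ in answer]
    # Step 3: the test is the distorted image plus the question.
    return Challenge(answer, distortion, "What are the characters in this image?")


def grade(challenge, response):
    """Grade a response by comparing it with the stored secret string."""
    return response.strip().upper() == challenge.answer
```

Note that grading only compares against the stored string; the program never has to read its own distorted image, which is why it can grade a test it could not pass.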

  • CAPTCHAs are used all over the place, for all kinds of

  • things, and I could spend the next hour talking about all

  • the different applications of CAPTCHAs.

  • But since I don't want to do that, I want to illustrate one

  • of the applications through a little story.

  • So a few years ago, Slashdot--

  • which is a very popular website--

  • put up this poll on their site, asking which is the best

  • computer science graduate school in the United States?

  • This is a very dangerous question to ask over the web.

  • As with most online polls, IP addresses of voters were

  • recorded to make sure that each person could only vote,

  • at most, once.

  • However, as soon as the poll went up, students at CMU wrote

  • a program that voted for CMU thousands and

  • thousands of times.

  • The next day, students at MIT wrote their own program.

  • And a few days later, the poll had to be taken down with CMU

  • and MIT having, like, a gazillion votes and every

  • other school having less than 1,000.

  • I guess the poll worked in this case.

  • [LAUGHTER]

  • I'm just kidding.

  • But in general, this is a huge problem.

  • You simply cannot trust the results of an online poll,

  • because anybody could just write a program to vote for

  • their favorite option thousands and

  • thousands of times.

  • One solution is to use a CAPTCHA to make sure that only

  • humans can vote.

  • CAPTCHAs have many, many other applications.

  • Another one is in free email services.

  • For instance, there are several companies that offer

  • free email services--

  • Yahoo, Microsoft, Google--

  • and up until a few years ago, all of them were suffering

  • from a very specific type of attack.

  • It was people who wrote programs to obtain millions of

  • email accounts every day, and the people who wrote these

  • programs were usually spammers.

  • So if you're a spammer and you want to send spam from, say,

  • Yahoo, you run into the problem that each Yahoo

  • account only allows you to send, like,

  • 100 messages a day.

  • So if you want to send millions of messages a day

  • from Yahoo accounts, you have to own

  • millions of Yahoo accounts.

  • And this is why spammers wrote programs to obtain millions of

  • Yahoo accounts.

  • And the solution--

  • or one solution-- and this is what we originally suggested

  • to Yahoo-- was to use a CAPTCHA to make sure that only

  • humans can obtain free email accounts.

  • Now, since CAPTCHAs are used all over the place to stop

  • spammers from doing bad things, spammers have started

  • coming up with all kinds of dirty hacks to get around the

  • CAPTCHAs that are being used in practice.

  • So let me explain a couple of them.

  • Here's one.

  • I'm sure a lot of you have heard of this.

  • CAPTCHA sweatshops.

  • Spam companies actually are hiring people to solve

  • CAPTCHAs all day long.

  • And they are usually being hired in other countries where

  • the minimum wage is a lot lower, and this

  • is currently happening.

  • But there's at least two consolations.

  • First, it's at least costing them some.

  • So whereas before, they could get the accounts for free, now

  • it costs them a fraction of a cent per account, so they

  • can't get that many.

  • Second, CAPTCHAs are actually generating jobs in

  • underdeveloped countries.

  • [LAUGHTER]

  • So this is one dirty hack.

  • There's an even dirtier hack, and I'm sure a lot of you have

  • heard of it, and this is what some porn companies are

  • allegedly doing.

  • And I'm going to emphasize the word "allegedly." So, porn

  • companies also want to send spam.

  • They also want to break CAPTCHAs, and here's how they

  • are allegedly doing it.

  • They write a program that fills out the entire registration

  • form, say, at Yahoo.

  • And whenever the program gets to the CAPTCHA,

  • it can't solve it.

  • So what it does is it copies the CAPTCHA

  • back to the porn page.

  • Now, back at the porn page, there's a lot of people

  • looking at porn.

  • And suddenly, one of them gets this screen saying, "If you

  • want to see the next picture, you got to tell me what word

  • is in the box below." And you know what people do?

  • They type the word as fast as possible.

  • [LAUGHTER]

  • And by doing so, they are effectively solving the

  • CAPTCHA for the porn company bot.

  • That is, they're effectively obtaining a free

  • email account for them.

  • So pornographers, they're really, really smart.

  • So CAPTCHAs take advantage of human processing power in

  • order to differentiate humans from computers, and it turns

  • out that being able to do so has some very, very nice

  • applications in practice.

  • Now that I've told you about CAPTCHAs, now I can tell you

  • what this talk really is about.

  • This talk is not about CAPTCHAs.

  • This talk is about human computation.

  • Sort of the flipside of CAPTCHAs.

  • The idea is there's a lot of things that humans can easily

  • do that computers cannot yet do.

  • I want to show you how we can solve some of these problems

  • by just making good use of human processing power.

  • And I think the best way to introduce the rest of the talk

  • is with a little statistic, and the statistic is that over

  • 9 billion human hours of Solitaire were played in 2003.

  • 9 billion.

  • Now, some people talk about wasted computer cycles.

  • What about wasted human cycles?

  • Just to give you an idea of how large this number really

  • is, let me give you two other numbers.

  • First is the number of human hours that it took to build

  • the Empire State Building.

  • Turns out it took 7 million human hours to build the

  • entire Empire State Building.

  • That's equivalent to about 6.8 hours of people playing

  • Solitaire around the world.

  • Now, in case you don't think the Empire State Building is a

  • monumental enough task, let me give you another number.

  • The Panama Canal.

  • It turns out it took 20 million human hours to build

  • the entire Panama Canal, and that's equivalent to a little

  • less than a day of people playing Solitaire around the world.
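
The two comparisons above can be checked with a short calculation. This sketch assumes the 9 billion hours were spread evenly over a 365-day year, which the talk does not state; the function name is made up for illustration.

```python
SOLITAIRE_HOURS_PER_YEAR = 9_000_000_000  # figure quoted in the talk for 2003
CLOCK_HOURS_PER_YEAR = 365 * 24           # assumption: play spread evenly over the year


def solitaire_clock_hours(project_human_hours):
    """Clock hours of worldwide Solitaire play that add up to a project's human-hours."""
    human_hours_per_clock_hour = SOLITAIRE_HOURS_PER_YEAR / CLOCK_HOURS_PER_YEAR
    return project_human_hours / human_hours_per_clock_hour


empire_state = solitaire_clock_hours(7_000_000)    # about 6.8 hours
panama_canal = solitaire_clock_hours(20_000_000)   # about 19.5 hours, roughly 0.81 days
print(round(empire_state, 1), round(panama_canal / 24, 2))
```

Under that assumption, roughly one million human-hours of Solitaire are played every clock hour, which is where both figures come from.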

  • I want to show how we can make good use of these

  • wasted human cycles.

  • And that is what I mean by human computation.

  • In this talk, we're going to consider the human brain as an

  • extremely advanced processing unit that can solve problems

  • that computers cannot yet solve.

  • Even more, we're going to consider all of humanity as an

  • extremely advanced and large scale distributed processing

  • unit that can solve large scale problems that computers

  • cannot yet solve.

  • I claim that the current relationship between humans

  • and computers is extremely parasitic.

  • We're parasites of computers.

  • What I want to advocate for in this talk is more of a

  • symbiotic relationship, a symbiosis.

  • One in which humans solve some problems, computers solve some

  • other problems, and together we work to

  • create a better world.

  • [LAUGHTER]

  • OK, I'm getting freaky.

  • But more seriously, I want to talk about some problems that

  • computers cannot yet solve, and I want to show you how we

  • can easily solve a lot of these problems by just making

  • good use of human processing power.

  • The first problem that I'm going to talk about is that of

  • labeling images with words.

  • So the problem is as follows.

  • When inputting an arbitrary image, we want to output a set

  • of key words that properly and correctly describe this image.

  • [LAUGHTER]

  • As you should all probably know, this is still a

  • completely open problem in computer vision and artificial

  • intelligence, in the sense that computer programs simply

  • can't do this.

  • However, a method that could accurately label images with

  • words would have several applications, one of which

  • you've probably already seen, and that is image

  • search on the web.

  • So Google, for instance, has Google Images.

  • You can go there, type a word like "dog," and get back a lot

  • of images related to the word "dog." Now, it is the case

  • that there's no computer program out there that can

  • tell you whether an arbitrary image from the web contains a

  • dog or not, so the way Google Images works-- and image

  • search on the web works, roughly--

  • is by using file names and HTML text.

  • So if you search for "dog," you get back a lot of images

  • named dog.jpg or dog.gif, or that have the word

  • "dog" very near them.
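
A toy version of this kind of text-based matching might look like the sketch below. It is an illustration only, not how any real search engine is implemented: the functions `keywords_for_image` and `search` and the sample pages are all invented for the example.

```python
import re


def keywords_for_image(filename, nearby_text):
    """Collect lowercase words from the file name (without extension) and nearby text."""
    stem = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return set(re.findall(r"[a-z]+", (stem + " " + nearby_text).lower()))


def search(query, pages):
    """Return file names whose keywords contain the query word."""
    q = query.lower()
    return [name for name, text in pages if q in keywords_for_image(name, text)]


# Hypothetical sample data: (image file name, text found near the image).
pages = [
    ("dog.jpg", "My pet"),
    ("img_001.gif", "A rabbit chasing the dog next door"),
    ("cat.png", "A sleepy cat"),
]
results = search("dog", pages)  # ['dog.jpg', 'img_001.gif']
```

The second hit is a picture of a rabbit: it matches only because the word "dog" appears in the text near it, which is exactly the kind of error this approach is prone to.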

  • Of course, the problem with this method is that it doesn't

  • always work very well.

  • For instance, this is not any more, but it used to be the

  • first page of results for the query "dog" on Google Images.

  • There is an image of a rabbit, there.

  • There's a guy in a blue suit.

  • What the hell?

  • But if we have methods such that for every image on the