MALE SPEAKER: Today we're very pleased, very happy, to have Luis Von Ahn here today, from Carnegie Mellon University. His talk is on human computation. Luis is a very new assistant professor in computer science at the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in 2005, and I'm told he was the hottest new graduate on the market, with offers from just about every university out there, including corporate offers, too. He received his B.S. from Duke University. He received a Microsoft Research Fellowship Award. His research interests include encouraging people to work for free, as well as catching and thwarting cheaters in online environments. His work has appeared in over a hundred news publications around the world. New York Times, CNN, USA Today, BBC, and the Discovery Channel. Luis holds four patent applications and has licensed technology to major internet companies. Please join me in welcoming Luis Von Ahn.
[APPLAUSE]
LUIS VON AHN: Can you hear me now? OK. So, I want to start by asking a question to the people in the audience. How many of you have had to fill out a registration form for something? Like Yahoo, Hotmail, or Gmail, or some sort of web form where you've been asked to read a distorted sequence of characters or a distorted word such as this one? How many of you found it annoying? Awesome. OK, well, that was part of my thesis.
That thing is called a CAPTCHA, and the reason it's there is to make sure that you, the entity filling out the web form, are actually a human, and not some sort of computer program that was written to submit the form millions and millions of times. The reason it works is because humans-- at least non-visually impaired humans-- have no trouble reading distorted characters, whereas computer programs simply can't do it as well yet. More generally, a CAPTCHA is just a program that can tell whether its user is a human or a computer.
OK, let me say that another way. A CAPTCHA is a program that can generate and grade tests that most humans can pass, but current computer programs cannot. Notice the paradox here. A CAPTCHA is a program that can generate and grade tests that it itself cannot pass. So in that way, CAPTCHAs are a lot like some professors. [LAUGHTER]
Just to make things crystal clear, let me give you an example of one of these programs that can generate and grade tests that most humans can pass, but current computer programs cannot. Here's how the program works. First, the program picks a random string of letters. O-A-M-G, in this case. Then the program renders the string into a randomly distorted image, and then the program generates a test, which consists of the randomly distorted image and the question, "What are the characters in this image?"
CAPTCHAs are used all over the place, for all kinds of things, and I could spend the next hour talking about all the different applications of CAPTCHAs. But since I don't want to do that, I want to illustrate one of the applications through a little story. So a few years ago, Slashdot-- which is a very popular website-- put up this poll in their site, asking which is the best computer science graduate school in the United States? This is a very dangerous question to ask over the web. As with most online polls, IP addresses of voters were recorded to make sure that each person could only vote, at most, once. However, as soon as the poll went up, students at CMU wrote a program that voted for CMU thousands and thousands of times. The next day, students at MIT wrote their own program. And a few days later, the poll had to be taken down with CMU and MIT having, like, a gazillion votes and every other school having less than 1,000. I guess the poll worked in this case. [LAUGHTER] I'm just kidding. But in general, this is a huge problem.
You simply cannot trust the results of an online poll, because anybody could just write a program to vote for their favorite option thousands and thousands of times. One solution is to use a CAPTCHA to make sure that only humans can vote.
CAPTCHAs have many, many other applications. Another one is in free email services. For instance, there are several companies that offer free email services-- Yahoo, Microsoft, Google-- and up until a few years ago, all of them were suffering from a very specific type of attack. It was people who wrote programs to obtain millions of email accounts every day, and the people who wrote these programs were usually spammers. So if you're a spammer and you want to send spam from, say, Yahoo, you run into the problem that each Yahoo account only allows you to send, like, 100 messages a day. So if you want to send millions of messages a day from Yahoo accounts, you have to own millions of Yahoo accounts. And this is why spammers wrote programs to obtain millions of Yahoo accounts. And the solution-- or one solution-- and this is what we originally suggested to Yahoo-- was to use a CAPTCHA to make sure that only humans can obtain free email accounts.
Now, since CAPTCHAs are used all over the place to stop spammers from doing bad things, spammers have started coming up with all kinds of dirty hacks to get around the CAPTCHAs that are being used in practice. So let me explain a couple of them. Here's one. I'm sure a lot of you have heard of this. CAPTCHA sweatshops. Spam companies actually are hiring people to solve CAPTCHAs all day long. And they are usually being hired in other countries where the minimum wage is a lot lower, and this is currently happening. But there's at least two consolations. First, it's at least costing them some. So whereas before, they could get the accounts for free, now it costs them a fraction of a cent per account, so they can't get that many.
Second, CAPTCHAs are actually generating jobs in underdeveloped countries. [LAUGHTER] So this is one dirty hack.
There's an even dirtier hack, and I'm sure a lot of you have heard of it, and this is what some porn companies are allegedly doing. And I'm going to emphasize the word "allegedly." So, porn companies also want to send spam. They also want to break CAPTCHAs, and here's how they are allegedly doing it. They write a program that fills out the entire registration form, say, at Yahoo. And whenever the program gets to the CAPTCHA, it can't solve it. So what it does is it copies the CAPTCHA back to the porn page. Now, back at the porn page, there's a lot of people looking at porn. And suddenly, one of them gets this screen saying, "If you want to see the next picture, you got to tell me what word is in the box below." And you know what people do? They type the word as fast as possible. [LAUGHTER] And by doing so, they are effectively solving the CAPTCHA for the porn company bot. That is, they're effectively obtaining a free email account for them. So pornographers, they're really, really smart.
So CAPTCHAs take advantage of human processing power in order to differentiate humans from computers, and it turns out that being able to do so has some very, very nice applications in practice. Now that I've told you about CAPTCHAs, now I can tell you what this talk really is about. This talk is not about CAPTCHAs. This talk is about human computation.
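[EDITOR'S NOTE: The generate-and-grade procedure described earlier in the talk (pick a random string of letters, render it into a distorted image, ask "What are the characters in this image?") might be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation. The names `Challenge`, `render_distorted`, `generate`, and `grade` are hypothetical, the rendering step is only a stub because real distortion needs an image library, and the case-insensitive grading rule is an assumption.]

```python
import hmac
import secrets
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    answer: str    # kept on the server; never shown to the user
    image: bytes   # what the user would actually see
    question: str


def render_distorted(text: str) -> bytes:
    """Stub for the rendering step (hypothetical helper).

    A real CAPTCHA would draw `text` into an image and warp it randomly.
    This stub returns opaque random bytes so the flow runs without an
    image library; the bytes deliberately do not encode the answer.
    """
    return secrets.token_bytes(16)


def generate(length: int = 4) -> Challenge:
    # Step 1: pick a random string of letters (O-A-M-G in the talk).
    answer = "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))
    # Step 2: render the string into a randomly distorted image.
    image = render_distorted(answer)
    # Step 3: the test is the image plus the question.
    return Challenge(answer, image, "What are the characters in this image?")


def grade(challenge: Challenge, response: str) -> bool:
    # Assumption: ignore case and surrounding whitespace when grading.
    given = response.strip().upper()
    return hmac.compare_digest(given.encode(), challenge.answer.encode())
```

[The program never has to read the image itself. It grades by comparing the response with the string it chose in step 1, which is how it can grade a test it could not pass.]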