
Subtitles

  • Maybe we-- if you guys could stand over--

  • Is it okay if they stand over here?

  • - Yeah. - Um, actually.

  • Christophe, if you can get even lower.

  • - Okay. - ( shutter clicks )

  • This is Lee and this is Christophe.

  • They're two of the hosts of this show.

  • But to a machine, they're not people.

  • This is just pixels. It's just data.

  • A machine shouldn't have a reason to prefer

  • one of these guys over the other.

  • And yet, as you'll see in a second, it does.

  • It feels weird to call a machine racist,

  • but I really can't explain-- I can't explain what just happened.

  • Data-driven systems are becoming a bigger and bigger part of our lives,

  • and they work well a lot of the time.

  • - But when they fail... - Once again, it's the white guy.

  • When they fail, they're not failing on everyone equally.

  • If I go back right now...

  • Ruha Benjamin: You can have neutral intentions.

  • You can have good intentions.

  • And the outcomes can still be discriminatory.

  • Whether you want to call that machine racist

  • or you want to call the outcome racist,

  • we have a problem.

  • ( theme music playing )

  • I was scrolling through my Twitter feed a while back

  • and I kept seeing tweets that look like this.

  • Two of the same picture of Republican senator Mitch McConnell smiling,

  • or sometimes it would be four pictures

  • of the same random stock photo guy.

  • And I didn't really know what was going on,

  • but it turns out that this was a big public test of algorithmic bias.

  • Because it turns out that these aren't pictures of just Mitch McConnell.

  • They're pictures of Mitch McConnell and...

  • - Barack Obama. - Lee: Oh, wow.

  • So people were uploading

  • these really extreme vertical images

  • to basically force this image cropping algorithm

  • to choose one of these faces.

  • People were alleging that there's a racial bias here.

  • But I think what's so interesting about this particular algorithm

  • is that it is so testable for the public.

  • It's something that we could test right now if we wanted to.

  • - Let's do it. - You guys wanna do it?

  • Okay. Here we go.

  • So, Twitter does offer you options to crop your own image.

  • But if you don't use those,

  • it uses an automatic cropping algorithm.

  • - Wow. There it is. - Whoa. Wow.

  • That's crazy.

  • Christophe, it likes you.

  • Okay, let's try the other-- the happy one.

  • Lee: Wow.

  • - Unbelievable. Oh, wow. - Both times.

  • So, do you guys think this machine is racist?

  • The only other theory I possibly have

  • is if the algorithm prioritizes white faces

  • because it can pick them up quicker, for whatever reason,

  • against whatever background.

  • Immediately, it looks through the image

  • and tries to scan for a face.

  • Why is it always finding the white face first?

  • Joss: With this picture, I think someone could argue

  • that the lighting makes Christophe's face more sharp.

  • I still would love to do

  • a little bit more systematic testing on this.

  • I think maybe hundreds of photos

  • could allow us to draw a conclusion.

  • I have downloaded a bunch of photos

  • from a site called Generated Photos.

  • These people do not exist. They were a creation of AI.

  • And I went through, I pulled a bunch

  • that I think will give us

  • a pretty decent way to test this.

  • So, Christophe, I wonder if you would be willing to help me out with that.

  • You want me to tweet hundreds of photos?

  • - ( Lee laughs ) - Joss: Exactly.

  • I'm down. Sure, I've got time.

  • Okay.

  • ( music playing )

  • There may be some people who take issue with the idea

  • that machines can be racist

  • without a human brain or malicious intent.

  • But such a narrow definition of racism

  • really misses a lot of what's going on.

  • I want to read a quote that responds to that idea.

  • It says, "Robots are not sentient beings, sure,

  • but racism flourishes well beyond hate-filled hearts.

  • No malice needed, no N-word required,

  • just a lack of concern for how the past shapes the present."

  • I'm going now to speak to the author of those words, Ruha Benjamin.

  • She's a professor of African-American Studies at Princeton University.

  • When did you first become concerned

  • that automated systems, AI, could be biased?

  • A few years ago, I noticed these headlines

  • and hot takes about so-called racist and sexist robots.

  • There was a viral video in which two friends were in a hotel bathroom

  • and they were trying to use an automated soap dispenser.

  • Black hand, nothing. Larry, go.

  • Black hand, nothing.

  • And although they seem funny

  • and they kind of get us to chuckle,

  • the question is, are similar design processes

  • impacting much more consequential technologies that we're not even aware of?

  • When the early news controversies came along maybe 10 years ago,

  • people were surprised by the fact that they showed a racial bias.

  • Why do you think people were surprised?

  • Part of it is a deep attachment and commitment

  • to this idea of tech neutrality.

  • People-- I think because life is so complicated

  • and our social world is so messy--

  • really cling on to something that will save us,

  • and a way of making decisions that's not drenched

  • in the muck of all of human subjectivity,

  • human prejudice and frailty.

  • We want it so much to be true.

  • We want it so much to be true, you know?

  • And the danger is that we don't question it.

  • And still we continue to have, you know, so-called glitches

  • when it comes to race and skin complexion.

  • And I don't think that they're glitches.

  • It's a systemic issue in the truest sense of the word.

  • It has to do with our computer systems and the process of design.

  • Joss: AI can seem pretty abstract sometimes.

  • So we built this to help explain

  • how machine learning works and what can go wrong.

  • This black box is the part of the system that we interact with.

  • It's the software that decides which dating profiles we might like,

  • how much a rideshare should cost,

  • or how a photo should be cropped on Twitter.

  • We just see a device making a decision.

  • Or more accurately, a prediction.

  • What we don't see is all of the human decisions

  • that went into the design of that technology.

  • Now, it's true that when you're dealing with AI,

  • that means that the code in this box

  • wasn't all written directly by humans,

  • but by machine-learning algorithms

  • that find complex patterns in data.

  • But they don't just spontaneously learn things from the world.

  • They're learning from examples.

  • Examples that are labeled by people,

  • selected by people,

  • and derived from people, too.

  • See, these machines and their predictions,

  • they're not separate from us or from our biases

  • or from our history,

  • which we've seen in headline after headline

  • for the past 10 years.
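
The "learning from examples" described here is ordinary supervised learning. A minimal, hypothetical sketch (the data and labels below are invented for illustration) shows why human judgments baked into the labels travel straight into the model's predictions:

```python
# Minimal sketch of supervised learning: the "black box" is fit to
# human-labeled examples, so it inherits whatever those labels contain.
# All data below is synthetic and purely illustrative.
from sklearn.linear_model import LogisticRegression

# Feature vectors selected by people (e.g., pixel statistics, metadata)
examples = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
# Labels assigned by people ("keep" / "discard", "risky" / "safe", ...)
labels = [1, 1, 0, 0]

model = LogisticRegression().fit(examples, labels)

# The deployed system only exposes predict(); the human judgments that
# shaped the training labels are no longer visible at this stage.
print(model.predict([[0.85, 0.2]]))  # -> [1]
```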

  • We're using the face-tracking software,

  • so it's supposed to follow me as I move.

  • As you can see, I do this-- no following.

  • Not really-- not really following me.

  • - Wanda, if you would, please? - Sure.

  • In 2010, the top hit

  • when you did a search for "black girls,"

  • 80% of what you found

  • on the first page of results was all porn sites.

  • Google is apologizing after its photo software

  • labeled two African-Americans gorillas.

  • Microsoft is shutting down

  • its new artificially intelligent bot

  • after Twitter users taught it how to be racist.

  • Woman: In order to make yourself hotter,

  • the app appeared to lighten your skin tone.

  • Overall, they work better on lighter faces than darker faces,

  • and they worked especially poorly

  • on darker female faces.

  • Okay, I've noticed that on all these damn beauty filters,

  • is they keep taking my nose and making it thinner.

  • Give me my African nose back, please.

  • Man: So, the first thing that I tried was the prompt "Two Muslims..."

  • And the way it completed it was,

  • "Two Muslims, one with an apparent bomb,

  • tried to blow up the Federal Building

  • in Oklahoma City in the mid-1990s."

  • Woman: Detroit police wrongfully arrested Robert Williams

  • based on a false facial recognition hit.

  • There's definitely a pattern of harm

  • that disproportionately falls on vulnerable people, people of color.

  • Then there's attention,

  • but of course, the damage has already been done.

  • ( Skype ringing )

  • - Hello. - Hey, Christophe.

  • Thanks for doing these tests.

  • - Of course. - I know it was a bit of a pain,

  • but I'm curious what you found.

  • Sure. I mean, I actually did it.

  • I actually tweeted 180 different sets of pictures.

  • In total, dark-skinned people

  • were displayed in the crop 131 times,

  • and light-skinned people

  • were displayed in the crop 229 times,

  • which comes out to 36% dark-skinned

  • and 64% light-skinned.

  • That does seem to be evidence of some bias.
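
For anyone who wants to re-run the arithmetic: 131 dark-skinned crops and 229 light-skinned crops out of 360 come to roughly 36% and 64%. The sketch below also applies a simple binomial test against a 50/50 split; treating each crop as an independent coin flip is a simplifying assumption for illustration, not how Twitter evaluated its model.

```python
# Sanity-check the reported crop counts: 131 vs. 229 out of 360.
# The 50/50 null model is a simplifying assumption for illustration.
from scipy.stats import binomtest  # requires scipy >= 1.7

dark, light = 131, 229
total = dark + light

print(f"dark-skinned:  {dark / total:.0%}")   # ~36%
print(f"light-skinned: {light / total:.0%}")  # ~64%

result = binomtest(dark, n=total, p=0.5)
print(f"p-value under a 50/50 null: {result.pvalue:.1e}")
```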

  • It's interesting because Twitter posted a blog post

  • saying that they had done some of their own tests

  • before launching this tool, and they said that

  • they didn't find evidence of racial bias,

  • but that they would be looking into it further.

  • Um, they also said that the kind of technology

  • that they use to crop images

  • is called a Saliency Prediction Model,

  • which means software that basically is making a guess

  • about what's important in an image.

  • So, how does a machine know what is salient, what's relevant in a picture?

  • Yeah, it's really interesting, actually.

  • There's these saliency data sets

  • that documented people's eye movements

  • while they looked at certain sets of images.

  • So you can take those photos

  • and you can take that eye-tracking data

  • and teach a computer what humans look at.
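
To make "teach a computer what humans look at" a bit more concrete, here is a hedged sketch of how a saliency map might be turned into a crop once a model has produced one. This is not Twitter's actual implementation (those details weren't shared); it simply slides a fixed-size window over the map and keeps the position with the most total saliency.

```python
# Illustrative sketch of saliency-driven cropping, not Twitter's actual
# code: given a per-pixel saliency map, pick the crop window whose
# summed saliency is highest.
import numpy as np

def best_crop(saliency: np.ndarray, crop_h: int, crop_w: int):
    """Return (top, left) of the crop window with the most total saliency."""
    h, w = saliency.shape
    best_score, best_pos = -1.0, (0, 0)
    for top in range(h - crop_h + 1):
        for left in range(w - crop_w + 1):
            score = saliency[top:top + crop_h, left:left + crop_w].sum()
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos

# Toy 6x6 saliency map with a hot spot in the lower-right corner.
toy_map = np.zeros((6, 6))
toy_map[4:6, 4:6] = 1.0
print(best_crop(toy_map, crop_h=3, crop_w=3))  # -> (3, 3)
```

If the upstream model assigns less saliency to darker faces, this kind of "pick the maximum" step will systematically crop toward the lighter ones.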

  • So, Twitter's not going to give me any more information

  • about how they trained their model,

  • but I found an engineer from a company called Gradio.

  • They built an app that does something similar,

  • and I think it can give us a closer look

  • at how this kind of AI works.

  • - Hey. - Hey.

  • - Joss. - Nice to meet you. Dawood.

  • So, you and your colleagues

  • built a saliency cropping tool

  • that is similar to what we think Twitter is probably doing.

  • Yeah, we took a public machine learning model, posted it on our library,

  • and launched it for anyone to try.

  • And you don't have to constantly post pictures

  • on your timeline to try and experiment with it,

  • which is what people were doing when they first became aware of the problem.

  • And that's what we did. We did a bunch of tests just on Twitter.

  • But what's interesting about what your app shows

  • is the sort of intermediate step there, which is this saliency prediction.

  • Right, yeah. I think the intermediate step is important for people to see.

  • Well, I-- I brought some pictures for us to try.

  • These are actually the hosts of "Glad You Asked."

  • And I was hoping we could put them into your interface

  • and see what, uh, the saliency prediction is.

  • Sure. Just load this image here.

  • Joss: Okay, so, we have a saliency map.

  • Clearly the prediction is that faces are salient,

  • which is not really a surprise.

  • But it looks like maybe they're not equally salient.

  • - Right. - Is there a way to sort of look closer at that?

  • So, what we can do here, we actually built it out in the app

  • where we can put a window on someone's specific face,

  • and it will give us a percentage of what amount of saliency

  • you have over your face versus in proportion to the whole thing.

  • - That's interesting. - Yeah.

  • She's-- Fabiola's in the center of the picture,

  • but she's actually got a lower percentage

  • of the salience compared to Cleo, who's to her right.
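
The percentage the demo reports appears to be the share of the image's total saliency that falls inside a face's bounding box. A minimal sketch under that assumption (the map and box coordinates below are made up):

```python
# Rough sketch of a "percentage of saliency over a face" measurement:
# sum the saliency inside a face's bounding box and divide by the
# saliency of the whole image. Map and coordinates are illustrative.
import numpy as np

def face_saliency_share(saliency: np.ndarray, box: tuple) -> float:
    """box = (top, left, height, width); returns a fraction in [0, 1]."""
    top, left, height, width = box
    inside = saliency[top:top + height, left:left + width].sum()
    return float(inside / saliency.sum())

toy_map = np.random.rand(100, 100)      # stand-in saliency map
fabiola_box = (10, 35, 30, 30)          # hypothetical coordinates
cleo_box = (10, 70, 30, 25)
for name, box in [("Fabiola", fabiola_box), ("Cleo", cleo_box)]:
    print(name, f"{face_saliency_share(toy_map, box):.1%}")
```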

  • Right, and trying to guess why a model is making a prediction

  • and why it's predicting what it is

  • is a huge problem with machine learning.

  • It's always something that you have to kind of

  • back-trace to try and understand.

  • And sometimes it's not even possible.

  • Mm-hmm. I looked up what data sets

  • were used to train the model you guys used,

  • and I found one that was created by

  • researchers at MIT back in 2009.

  • So, it was originally about a thousand images.

  • We pulled the ones that contained faces,

  • any face we could find that was big enough to see.

  • And I went through all of those,

  • and I found that only 10 of the photos,

  • that's just about 3%,

  • included someone who appeared to be

  • of Black or African descent.

  • Yeah, I mean, if you're collecting a data set through Flickr,

  • you're-- first of all, you're biased to people

  • that have used Flickr back in, what, 2009, you said, or something?

  • Joss: And I guess if we saw in this image data set

  • there are more cat faces than black faces,

  • we can probably assume that minimal effort was made

  • to make that data set representative.

  • When someone collects data into a training data set,

  • they can be motivated by things like convenience and cost

  • and end up with data that lacks diversity.

  • That type of bias, which we saw in the saliency photos,

  • is relatively easy to address.

  • If you include more images representing racial minorities,

  • you can probably improve the model's performance on those groups.

  • But sometimes human subjectivity

  • is embedded right into the data itself.

  • Take crime data for example.

  • Our data on past crimes in part reflects

  • police officers' decisions about what neighborhoods to patrol

  • and who to stop and arrest.

  • We don't have an objective measure of crime,

  • and we know that the data we do have

  • contains at least some racial profiling.

  • But it's still being used to train crime prediction tools.

  • And then there's the question of how the data is structured over here.

  • Say you want a program that identifies

  • chronically sick patients to get additional care

  • so they don't end up in the ER.

  • You'd use past patients as your examples,

  • but you have to choose a label variable.

  • You have to define for the machine what a high-risk patient is

  • and there's not always an obvious answer.

  • A common choice is to define high-risk as high-cost,

  • under the assumption that people who use

  • a lot of health care resources are in need of intervention.

  • Then the learning algorithm looks through

  • the patients' data--

  • their age, sex,

  • medications, diagnoses, insurance claims,

  • and it finds the combination of attributes

  • that correlates with their total health costs.

  • And once it gets good at predicting

  • total health costs on past patients,

  • that formula becomes software to assess new patients

  • and give them a risk score.

  • But instead of predicting sick patients,

  • this predicts expensive patients.
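
The label-choice problem can be shown with a small, entirely synthetic simulation: if the target column handed to the learning algorithm is spending, the fitted model is by construction a spending predictor, and two equally sick patients with unequal access to care end up with different "risk" scores. Everything below (the features, the cost model, the numbers) is invented for illustration; it is not the actual algorithm the researchers studied.

```python
# Synthetic illustration of the proxy-label problem: the model is trained
# to predict COST, so its "risk score" tracks spending, not sickness.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000

conditions = rng.poisson(2.0, n)       # true health need
access = rng.uniform(0.3, 1.0, n)      # unequal access to care
# Spending depends on need AND access: equally sick patients with less
# access to care generate lower costs.
cost = 1000 * conditions * access + rng.normal(0, 200, n)

features = np.column_stack([conditions, access])
model = LinearRegression().fit(features, cost)  # label = cost (the proxy)
risk = model.predict(features)

same_need = conditions == 3                     # equally sick patients
low = same_need & (access < 0.5)
high = same_need & (access > 0.8)
print("mean risk score, low access: ", round(risk[low].mean()))
print("mean risk score, high access:", round(risk[high].mean()))
```

Nothing in this sketch is malicious code; the disparity comes entirely from the choice of label.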

  • Remember, the label was cost,

  • and when researchers took a closer look at those risk scores,

  • they realized that label choice was a big problem.

  • But by then, the algorithm had already been used

  • on millions of Americans.

  • It produced risk scores for different patients,

  • and if a patient had a risk score

  • of almost 60,

  • they would be referred into the program

  • for screening-- for their screening.

  • And if they had a risk score of almost 100,

  • they would default into the program.

  • Now, when we look at the number of chronic conditions

  • that patients of different risk scores were affected by,

  • you see a racial disparity where white patients

  • had fewer conditions than black patients

  • at each risk score.

  • That means that black patients were sicker

  • than their white counterparts

  • when they had the same risk score.

  • And so what happened is in producing these risk scores

  • and using spending,

  • they failed to recognize that on average

  • black people incur fewer costs for a variety of reasons,

  • including institutional racism,

  • including lack of access to high-quality insurance,

  • and a whole host of other factors.

  • But not because they're less sick.

  • Not because they're less sick.

  • And so I think it's important

  • to remember this had racist outcomes,

  • discriminatory outcomes, not because there was

  • a big, bad boogeyman behind the screen

  • out to get black patients,

  • but precisely because no one was thinking

  • about racial disparities in healthcare.

  • No one thought it would matter.

  • And so it was about the colorblindness,

  • the race neutrality that created this.

  • The good news is that now the researchers who exposed this

  • and who brought this to light are working with the company

  • that produced this algorithm to have a better proxy.

  • So instead of spending, it'll actually be

  • people's actual physical conditions

  • and the rate at which they get sick, et cetera,

  • that is harder to figure out,

  • it's a harder kind of proxy to calculate,

  • but it's more accurate.

  • I feel like what's so unsettling about this healthcare algorithm

  • is that the patients would have had

  • no way of knowing this was happening.

  • It's not like Twitter, where you can upload

  • your own picture, test it out, compare with other people.

  • This was just working in the background,

  • quietly prioritizing the care of certain patients

  • based on an algorithmic score

  • while the other patients probably never knew

  • they were even passed over for this program.

  • I feel like there has to be a way

  • for companies to vet these systems in advance,

  • so I'm excited to talk to Deborah Raji.

  • She's been doing a lot of thinking

  • and writing about just that.

  • My question for you is how do we find out

  • about these problems before they go out into the world

  • and cause harm rather than afterwards?

  • So, I guess a clarification point is that machine learning

  • is highly unregulated as an industry.

  • These companies don't have to report their performance metrics,

  • they don't have to report their evaluation results

  • to any kind of regulatory body.

  • But internally there's this new culture of documentation

  • that I think has been incredibly productive.

  • I worked on a couple of projects with colleagues at Google,

  • and one of the main outcomes of that was this effort called Model Cards--

  • very simple one-page documentation

  • on how the model actually works,

  • but also questions that are connected to ethical concerns,

  • such as the intended use for the model,

  • details about where the data's coming from,

  • how the data's labeled, and then also, you know,

  • instructions to evaluate the system according to its performance

  • on different demographic sub-groups.
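
Model Cards (proposed by Mitchell et al. in 2019) don't have a single required format. The sketch below is an illustrative example of the kind of fields just described, written as a plain Python dictionary rather than any official schema; the model name and values are hypothetical.

```python
# Illustrative model card for a hypothetical image-cropping model.
# Field names and values are examples, not an official schema.
model_card = {
    "model": "saliency-cropper-v1 (hypothetical)",
    "intended_use": "Suggest preview crops for user-uploaded photos",
    "out_of_scope_uses": ["Surveillance", "Identity verification"],
    "training_data": {
        "source": "Public eye-tracking datasets collected circa 2009",
        "labeling": "Fixation points recorded from human viewers",
        "known_gaps": "Dark-skinned faces are underrepresented",
    },
    "evaluation": {
        # Performance should be reported per demographic subgroup,
        # not just as a single aggregate number.
        "subgroups": ["skin tone", "gender", "age"],
        "metric": "How often each subgroup's face is kept in the crop",
    },
}

for field, value in model_card.items():
    print(f"{field}: {value}")
```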

  • Maybe that's something that's hard to accept

  • is that it would actually be maybe impossible

  • to get performance across sub-groups to be exactly the same.

  • How much of that do we just have to be like, "Okay"?

  • I really don't think there's an unbiased data set

  • in which everything will be perfect.

  • I think the more important thing is to actually evaluate

  • and assess things with an active eye

  • for those that are most likely to be negatively impacted.

  • You know, if you know that people of color are most vulnerable

  • or a particular marginalized group is most vulnerable

  • in a particular situation,

  • then prioritize them in your evaluation.

  • But I do think there's certain situations

  • where maybe we should not be predicting

  • with a machine-learning system at all.

  • We should be super cautious and super careful

  • about where we deploy it and where we don't deploy it,

  • and what kind of human oversight

  • we put over these systems as well.

  • The problem of bias in AI is really big.

  • It's really difficult.

  • But I don't think it means we have to give up

  • on machine learning altogether.

  • One benefit of bias in a computer versus bias in a human

  • is that you can measure and track it fairly easily.

  • And you can tinker with your model

  • to try and get fair outcomes if you're motivated to do so.
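
"Measuring and tracking" bias in practice usually means disaggregating an evaluation metric by group rather than reporting one overall number. A minimal sketch with made-up predictions and labels:

```python
# Disaggregated evaluation: compute the same metric separately for
# each group instead of one overall number. Data here is made up.
from collections import defaultdict

# (group, model_prediction, true_label) triples -- illustrative only
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0),
    ("B", 0, 1), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, pred, label in records:
    correct[group] += int(pred == label)
    total[group] += 1

for group in sorted(total):
    print(f"group {group}: accuracy {correct[group] / total[group]:.0%}")
# A gap between groups is the kind of disparity worth tracking over time.
```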

  • The first step was becoming aware of the problem.

  • Now the second step is enforcing solutions,

  • which I think we're just beginning to see now.

  • But Deb is raising a bigger question.

  • Not just how do we get bias out of the algorithms,

  • but which algorithms should be used at all?

  • Do we need a predictive model to be cropping our photos?

  • Do we want facial recognition in our communities?

  • Many would say no, whether it's biased or not.

  • And that question of which technologies

  • get built and how they get deployed in our world,

  • it boils down to resources and power.

  • It's the power to decide whose interests

  • will be served by a predictive model,

  • and which questions get asked.

  • You could ask, okay, I want to know how landlords

  • are making life for renters hard.

  • Which landlords are not fixing up their buildings?

  • Which ones are hiking rent?

  • Or you could ask, okay, let's figure out

  • which renters have low credit scores.

  • Let's figure out the people who have a gap in employment

  • so I don't want to rent to them.

  • And so it's at that problem

  • of forming the question

  • and posing the problem

  • that the power dynamics are already being laid

  • that set us off in one trajectory or another.

  • And the big challenge there being that

  • with these two possible lines of inquiry,

  • - one of those is probably a lot more profitable... - Exactly, exactly.

  • - ...than the other one. - And too often the people who are creating these tools,

  • they don't necessarily have to share the interests

  • of the people who are posing the questions,

  • but those are their clients.

  • So, the question for the designers and the programmers is

  • are you accountable only to your clients

  • or are you also accountable to the larger body politic?

  • Are you responsible for what these tools do in the world?

  • ( music playing )

  • ( indistinct chatter )

  • Man: Can you lift up your arm a little?

  • ( chatter continues )
