Placeholder Image

Subtitles section Play video

  • Hi, I'm Jabril and welcome back to CrashCourse AI.

  • Algorithms are just math and code, but algorithms are created by people and use our data, so

  • biases that exist in the real world are mimicked or even exaggerated by AI systems.

  • This idea is called algorithmic bias.

  • Bias isn't inherently a terrible thing.

  • Our brains try to take shortcuts by finding patterns in data.

  • So if you've only seen small, tiny dogs, you might see a Great Dane and be likeWhoa

  • that dog is unnatural

  • This doesn't become a problem unless we don't acknowledge exceptions to patterns

  • or unless we start treating certain groups of people unfairly.

  • As a society, we have laws to prevent discrimination based on certainprotected classes” (like

  • gender, race, or age) for things like employment or housing.

  • So it's important to be aware of the difference between bias, which we all have, and discrimination,

  • which we can prevent.

  • And knowing about algorithmic bias can help us steer clear of a future where AI are used

  • in harmful, discriminatory ways.

  • INTRO

  • There are at least 5 types of algorithmic bias we should pay attention to.

  • First, training data can reflect hidden biases in society.

  • For example, if an AI was trained on recent news articles or books, the wordnurse

  • is more likely to refer to a “woman,” while the wordprogrammeris more likely

  • to refer to a “man.”

  • And you can see this happening with a Google image search: “nurseshows mostly women,

  • whileprogrammermostly shows mostly men.

  • We can see how hidden biases in the data gets embedded in search engine AI.

  • Of course, we know there are male nurses and female programmers and non-binary people doing

  • both of these jobs!

  • For example, an image search forprogrammer 1960” shows a LOT more women.

  • But AI algorithms aren't very good at recognizing cultural biases that might change over time,

  • and they could even be spreading hidden biases to more human brains.

  • t's also tempting to think that if we just don't collect or use training data that

  • categorizes protected classes like race or gender, then our algorithms can't possibly

  • discriminate.

  • But, protected classes may emerge as correlated features, which are features that aren't

  • explicitly in data but may be unintentionally correlated to a specific prediction.

  • For example, because many places in the US are still extremely segregated, zip code can

  • be strongly correlated to race.

  • A record of purchases can be strongly correlated to gender.

  • And a controversial 2017 paper showed that sexual orientation is strongly correlated

  • with characteristics of a social media profile photo.

  • Second, the training data may not have enough examples of each class, which can affect the

  • accuracy of predictions.

  • For example, many facial recognition AI algorithms are trained on data that includes way more

  • examples of white peoples' faces than other races.

  • One story that made the news a few years ago is a passport photo checker with an AI system

  • to warn if the person in the photo had blinked.

  • But the system had a lot of trouble with photos of people of Asian descent.

  • Being asked to take a photo again and again would be really frustrating if you're just

  • trying to renew your passport, which is already sort of a pain!

  • Or, let's say, you got a cool gig programming a drone for IBMbut it has trouble recognizing

  • your face because your skin's too darkfor example.

  • Third, it's hard to quantify certain features in training data.

  • There are lots of things that are tough to describe with numbers.

  • Like can you really rate a sibling relationship with a number?

  • It's complicated!

  • You love them, but you hate how messy they are, but you like cooking together, but you

  • hate how your parents compare you...

  • It's so hard to quantify all that!

  • In many cases, we try to build AI to evaluate complicated qualities of data, but sometimes

  • we have to settle for easily measurable shortcuts.

  • One recent example is trying to use AI to grade writing on standardized tests like SATs

  • and GREs with the goal to save human graders time.

  • Good writing involves complex elements like clarity, structure, and creativity, but most

  • of these qualities are hard to measure.

  • So, instead, these AI focused on easier-to-measure elements like sentence length, vocabulary,

  • and grammar, which don't fully represent good writingand made these AIs easier

  • to fool.

  • Some students from MIT built a natural language program to create essays that made NO sense,

  • but were rated highly by these grading algorithms.

  • These AIs could also potentially be fooled by memorizing portions oftemplateessays

  • to influence the score, rather than actually writing a response to the prompt, all because

  • of the training data that was used for these scoring AI.

  • Fourth, the algorithm could influence the data that it gets, creating a positive feedback

  • loop.

  • A positive feedback loop basically meansamplifying what happened in the past”… whether or

  • not this amplification is good.

  • An example is PredPol's drug crime prediction algorithm, which has been in use since 2012

  • in many large cities including LA and Chicago.

  • PredPol was trained on data that was heavily biased by past housing segregation and past

  • cases of police bias.

  • So, it would more frequently send police to certain neighborhoods where a lot of racial

  • minority folks lived.

  • Arrests in those neighborhoods increased, that arrest data was fed back into the algorithm,

  • and the AI would predict more future drug arrests in those neighborhoods and send the

  • police there again.

  • Even though there might be crime in neighborhoods where police weren't being sent by this

  • AI, because there weren't any arrests in those neighborhoods, data about them wasn't fed

  • back into the algorithm.

  • While algorithms like PredPol are still in use, to try and manage these feedback effects,

  • there is currently more effort to monitor and adjust how they process data.

  • So basically, this would be like a new principal who was hired to improve the average grades

  • of a school, but he doesn't really care about the students who already have good grades.

  • He creates a watchlist of students who have really bad grades and checks up on them every

  • week, and he ignores the students who keep up with good grades.

  • If any of the students on his watchlist don't do their homework that week, they get punished.

  • But all of the students NOT on his watchlist can slack on their homework, and get away

  • with it based onwhat happened in the past.”

  • This is essentially what's happening with PredPol, and you can be the judge if you believe

  • it's fair or not.

  • Finally, a group of people may mess with training data on purpose.

  • For example, in 2014, Microsoft released a chatbot named Xiaoice in China.

  • People could chat with Xiaoice so it would learn how to speak naturally on a variety

  • of topics from these conversations.

  • It worked great, and Xiaoice had over 40 million conversations with no incidents.

  • In 2016, Microsoft tried the same thing in the U.S. by releasing the Twitterbot Tay.

  • Tay trained on direct conversation threads on Twitter, and by playing games with users

  • where they could get it to repeat what they were saying.

  • In 12 hours after its release, after a “coordinated attack by a subset of peoplewho biased

  • its data set, Tay started posting violent, sexist, anti-semitic, and racist Tweets.

  • This kind of manipulation is usually framed asjokingortrolling,” but the

  • fact that AI can be manipulated means we should take algorithmic predictions with a grain

  • of salt.

  • This is why I don't leave John-Green-Bot alone online

  • The common theme of algorithmic bias is that AI systems are trying to make good predictions,

  • but they make mistakes.

  • Some of these mistakes may be harmless or mildly inconvenient, but others may have significant

  • consequences.

  • To understand the key limitations of AI in our current society, let's go to the Thought

  • Bubble.

  • Let's say there's an AI system called HireMe! that gives hiring recommendations

  • to companies.

  • HireMe is being used by Robots Weekly, a magazine where John-Green-bot applied for an editorial

  • job.

  • Just by chance, the last two people namedJohngot fired from Robots Weekly and

  • another threeJohnsdidn't make it through the hiring process.

  • So, when John-Green-Bot applies for the job, HireMe! predicts that he's only 24% likely

  • to be employed by the company in 3 years.

  • Seeing this prediction, the hiring manager at Robots Weekly rejects John-Green-bot, and

  • this data gets added to the HireMe!

  • AI system.

  • John-Green-Bot is just anotherJohnthat got rejected, even though he may have

  • been the perfect robot for the job!

  • Now, futureJohnshave an even lower chance to be hired.

  • It's a positive feedback loop, with some pretty negative consequences for John-Green-Bot.

  • Of course, being namedJohnisn't a protected class, but this could apply to

  • other groups of people.

  • Plus, even though algorithms like HireMe!

  • Are great at establishing a link between two kinds of data, they can't always clarify

  • why they're making predictions.

  • For example, HireMe! may find that higher age is associated with lower knowledge of

  • digital technologies, so the AI suggests hiring younger applicants.

  • Not only is this illegally discriminating against the protected class ofage,”

  • but the implied link also might not be true.

  • John-Green-bot may be almost 40, but he runs a robot blog and is active in online communities

  • like Nerdfighteria!

  • So it's up to humans interacting with AI systems like HireMe! to pay attention to recommendations

  • and make sure they're fair, or adjust the algorithms if not.

  • Thanks, Thought Bubble!

  • Monitoring AI for bias and discrimination sounds like a huge responsibility, so how

  • can we do it?

  • The first step is just understanding that algorithms will be biased.

  • It's important to be critical about AI recommendations, instead of just accepting thatthe computer

  • said so.”

  • This is why transparency in algorithms is so important, which is the ability to examine

  • inputs and outputs to understand why an algorithm is giving certain recommendations.

  • But that's easier said than done when it comes to certain algorithms, like

  • deep learning methods.

  • Hidden layers can be tricky to interpret.

  • Second, if we want to have less biased algorithms, we may need more training data on protected

  • classes like race, gender, or age.

  • Looking at an algorithm's recommendations for protected classes may be a good way to

  • check it for discrimination.

  • This is kind of a double-edged sword, though.

  • People who are part of protected classes may (understandably) be worried about handing

  • over personal information.

  • It may feel like a violation of privacy, or they might worry that algorithms will be misused

  • to target rather than protect them.

  • Even if you aren't actively working on AI systems, knowing about these algorithms and

  • staying informed about artificial intelligence are really important as we shape the future

  • of this field.

  • Anyone, including you, can advocate for more careful, critical interpretation of algorithmic

  • outputs to help protect human rights.

  • Some people are even advocating that algorithms should be clinically tested and scrutinized

  • in the same way that medicines are.

  • According to these opinions, we should know if there areside effectsbefore integrating

  • AI in our daily lives.

  • There's nothing like that in the works yet.

  • But it took over 2400 years for the Hippocratic Oath to transform into current medical ethics

  • guidelines.

  • So it may take some time for us to come up with the right set of practices.

  • Next time, we have a lab and I'll demonstrate how there are biases in even simple things

  • like trying to adopt a cat or a dog.

  • I'll see ya then.

  • Speaking of understanding how bias and misinformation spread, you should check out this video on Deep Fakes

  • I did with Above the Noise -- another PBSDS channel that gets into the research behind controversial issues.

  • Head over to the video in the description to find out how detect deep fakes.

  • Tell them Jabril sent you!

  • Crash Course AI is produced in association with PBS Digital Studios!

  • If you want to help keep all Crash Course free for everybody, forever, you can join

  • our community on Patreon.

  • And if you want to learn more about prejudice and discrimination in humans, you can check

  • out this episode of Crash Course Sociology.

Hi, I'm Jabril and welcome back to CrashCourse AI.

Subtitles and vocabulary

Operation of videos Adjust the video here to display the subtitles

B1 CrashCourse ai data bias green bot training data

Algorithmic Bias and Fairness: Crash Course AI #18

  • 0 0
    林宜悉 posted on 2020/03/30
Video vocabulary