Hi, I'm Adriene Hill, and this is Crash Course Statistics. Welcome to a world of probabilities, paradoxes and p-values. There will be games. And thought experiments. And coin flipping. A lot of coin flipping. Statisticians love to talk about coin flipping. By the time we finish the course, you'll know why we use statistics. And how. And what questions you ought to be asking when you run across statistics in the world. Which is ALL THE TIME.

Statistics can help you make a guess about whether or not you're going to be accepted to Harvard. Marketers use them to sell us gold-lamé pants. Netflix uses stats to predict what show we might want to watch next. You use statistics when you look at the weather forecast and decide what to wear--dress or jeans. Policy makers use them to decide whether or not to invest in more early childhood education, whether or not to spend more on mental health services. Statistics is all about making sense of data--and figuring out how to put that information to use. Today, we're going to answer the question “What IS Statistics?”

INTRO

The legend says that during a late 1920s English tea at Cambridge, a woman claimed that a cup of tea with milk added last tasted different than tea where the milk was added first. The brilliant minds of the day immediately began to think of ways to test her claim. They organized eight cups of tea in all sorts of patterns to see if she really could tell the difference between the milk-first and tea-first cups. But even after they had seen her guesses, how could they really decide? Because she'd get about half the cups right just by randomly guessing either milk or tea. And even if she really could tell the difference, it's completely possible that she would miss a cup or two. So how could you tell if this woman was actually a tea-savant? What is the line between lucky tea guesser and tea supertaster?
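One rough way to put numbers on "lucky guesser" is to ask how often pure guessing would do at least as well as she did. The sketch below assumes each of the eight cups is an independent 50/50 guess, which is a simplification for illustration and not necessarily how the tea party experiment was actually set up. The function name `prob_at_least` is made up for this example.

```python
from math import comb


def prob_at_least(correct: int, cups: int = 8, p_guess: float = 0.5) -> float:
    """Chance of getting `correct` or more cups right by pure guessing.

    Assumes every cup is an independent guess that is right with
    probability `p_guess` (a simplifying assumption, not from the video).
    """
    return sum(
        comb(cups, k) * p_guess**k * (1 - p_guess) ** (cups - k)
        for k in range(correct, cups + 1)
    )


# How surprising would each score be if she were only guessing?
for needed in (4, 6, 8):
    chance = prob_at_least(needed)
    print(f"P(at least {needed} of 8 right by luck) = {chance:.4f}")
```

Under that assumption, getting at least half right is unremarkable, while a perfect score would happen by luck only about 1 time in 256. Where to draw the line between those two is exactly the kind of decision statistics helps with.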
As fate would have it, future super-statistician and part-time potato scientist Ronald A. Fisher was in attendance. During his lifetime, Fisher began work that set the stage for a large portion of Statistics which is the focus of this series. These statistics can help us make decisions in uncertain situations, tea-taste-tests and beyond. Fisher's insights into experimental design helped turn statistics into its own scientific discipline. And, although Fisher didn't publish results of this tea-test... the story has it... the woman sorted all the tea cups correctly. Just in case you were curious.

At this point, it's worth mentioning that there are two related--but separate--meanings of the word statistics. We can refer to the field of statistics... which is the study and practice of collecting and analyzing data. And we can talk about statistics as in facts about... or summaries... of data. To answer the question “What is statistics?”, we should first... ask the question “What can statistics do?”

Let's say you wake up at your desk after a long evening studying for finals with a cheeseburger wrapper stuck to your face. And you wonder... "Why do I eat this stuff? Is fast food controlling my life?" But then you tell yourself, "No. It's just super convenient." But you're worried. You're thinking about how great it is that McDonald's serves breakfast all day RIGHT NOW. But maybe that's normal. Finals are this week, after all. So you google “Fast Food consumption” and you find the results of a fast food survey.

The first thing you might do is start asking questions that interest you. For example, you could ask: Why do people eat fast food? Do people eat more fast food on the weekend than on weekdays? Does eating fast food stress me out? Now that we have some interesting questions, we need to ask ourselves an even more important one: Can these questions be answered by statistics?
Like I mentioned earlier, statistics are tools for us to use, but they can't do all the heavy lifting. To answer the question about why people eat fast food, you can ask them to fill out a questionnaire, but you can't know whether their answers truly represent what they're thinking. Maybe they answer dishonestly because they don't want to admit that they scarf McDonald's because they're too tired to cook dinner, or because they are ashamed to admit they think Del Taco is delicious, or because none of the given answers represented their reasons. Or they may not really know why they eat fast food.

Armed with the results of the survey, you could say that the most common reason that people reported eating fast food was convenience, or that the average number of meals they eat out each week is five. But you're not truly measuring why people eat so much fast food. You're measuring what we call a “proxy”, something that is related to what we want to measure, but isn't exactly what we want to measure.

To answer whether people eat more fast food on the weekends, or whether eating it more than twice a week increases stress, we'd not only need to know how much fast food people are eating, which our questionnaire asked, but also which days they eat it. And we'd need an additional measure of “stress”. You can use statistics to give a good answer about whether you're going through the drive-thru more on the weekend, but even the question of whether eating fast food is associated with higher levels of stress is hard to answer directly. What is stress and how can we measure it? And are people eating fast food because they are stressed? Or does eating all those calories make them stressed?

It's often the case that some of the most interesting questions are the ones that can't be directly answered by statistics--like why people eat fast food. Instead we find questions that we can answer--like whether people who eat fast food often work more than eighty hours a week.
The tools we use to answer these questions are statistics--plural--and there are two main types: Descriptive and Inferential. Descriptive statistics, well... they describe what the data show! Descriptive statistics usually include things like where the middle of the data is--what statisticians call measures of central tendency--and measures of how spread out the data are. They take huge amounts of information that may not make much intuitive sense to us, and compress and summarize them to... hopefully... give us more useful information.

Let's go to the Thought Bubble. You've been working for two years in the local waffle factory. Day in and day out, you create the golden-browny-iest, tastiest frozen waffles ever created. The holes are perfectly spaced. Screaming for syrup. And now you want a raise. You deserve a raise. No one can make a waffle as well as you can. But how much do you ask for? An extra thousand dollars? An extra five thousand dollars? You know you're valuable, but have no idea what other waffle makers get paid. So you dig around online and find there's an entire subreddit devoted to waffle makers. And someone with the username “waffleleaks” has posted a spreadsheet of waffle maker salaries.

Now with a quick glance at this huge list of numbers, you can see whether the woman who works a similar job at the rival frozen waffle company makes more than you. You can see how much more you are making than the new guy, who's just now learning to mix batter. But you still don't know much about the paychecks of your waffle company as a whole. Or the industry. 'Cause it turns out there are thousands of waffle makers out there. And all you see is a list of data points, not patterns that can help you learn more about how much you might be able to convince the boss to pay you. Here is where descriptive statistics come in. You could calculate the average salary at your company as well as how spread out everyone's salaries are around that average.
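To make "average" and "spread" concrete, here is a minimal sketch using Python's built-in `statistics` module. The salary list is entirely made up for illustration (the leaked spreadsheet's numbers aren't given), with one very large paycheck standing in for the boss.

```python
import statistics

# Hypothetical yearly salaries in dollars; these values are invented
# for illustration and do not come from the video.
salaries = [31000, 32500, 34000, 35500, 36000,
            38000, 41000, 44000, 52000, 250000]

# Measures of central tendency: where is the "middle" of the data?
mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# Measures of spread: how far apart are the paychecks?
spread = statistics.stdev(salaries)          # sample standard deviation
low, high = min(salaries), max(salaries)

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
print(f"stdev:  {spread:,.0f}")
print(f"range:  {low:,} to {high:,}")
```

In this made-up list the mean (59,400) sits well above the median (37,000), because a single huge salary drags the average up. That gap is one reason to look at more than one summary number before walking into the boss's office.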
You'd be able to see whether the CEO's paycheck is relatively close to the entry-level batter makers' paychecks, or incredibly far away. And how your salary compares to both of their salaries. You could calculate the average salary of everyone in the industry with your job title. And see the high and low end of that pay. And then, armed with those descriptive statistics, you could confidently walk into the waffle boss's office and demand to be paid for your talents. Thanks, Thought Bubble.

While descriptive statistics can be great, they only tell us the basics. Inferential statistics allow us to make... inferences. (Clever namers, those statisticians.) Inferential statistics allow us to make conclusions that extend beyond the data we have in hand.

Imagine you have a candy barrel full of salt water taffy. Some pink, some white, some yellow. If you wanted to know how many of each color you have, you could count them. One by one by one. That'd give you a set of descriptive statistics. But who has time for all that? Or, you could grab a giant handful of taffy, and count just those you have pulled out, which would be using descriptive statistics. If your candy was, in fact, mixed pretty evenly throughout the barrel, and you got a big enough handful, you could use inferential statistics on that “sample” to estimate the content of the entire taffy stash.

We ask inferential statistics to do all sorts of much more complicated work for us. Inferential statistics let us test an idea or a hypothesis. Like answering whether people in the US under the age of 30 eat more fast food than people over 30. We don't survey EVERY person to answer that question.

Let's say someone tells you that their new brain vitamin--Smartie-vite--improves your IQ. Do you rush out and buy it? What if they told you that the average IQ increase for Group A--twenty people who took Smartie-vite for a month--was two IQ points, and the average IQ increase for Group B--twenty people who took nothing--was one IQ point.
How about now? Still not sure? It is a pretty small difference, right? Inferential statistics give you the ability to test how likely it is that the two populations we sampled actually have different IQ increases.
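One way such a test could look is a permutation test, sketched below. This is just one illustrative technique, not necessarily the method the series has in mind. The individual IQ changes are invented so that the group averages match the two-point and one-point increases described above, and the function name `permutation_p_value` is made up for this example. The idea: if the vitamin did nothing, the group labels would be arbitrary, so we shuffle them many times and see how often a gap at least as big as the observed one appears by chance.

```python
import random

# Hypothetical individual IQ changes, invented for illustration.
# Group A (took Smartie-vite) averages +2; Group B (took nothing) averages +1.
group_a = [5, -3, 2, 7, 0, 4, -1, 3, 6, -2, 1, 8, -4, 2, 3, 0, 5, -1, 4, 1]
group_b = [1, -2, 4, 0, 3, -3, 2, 5, -1, 1, 0, 6, -4, 2, 3, -2, 1, 4, -1, 1]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Return (observed difference in means, two-sided permutation p-value)."""
    rng = random.Random(seed)          # fixed seed so reruns give the same answer
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)            # pretend the group labels were random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_shuffles


observed_diff, p_value = permutation_p_value(group_a, group_b)
print(f"observed difference in average IQ change: {observed_diff:.1f}")
print(f"share of shuffles with a gap at least that big: {p_value:.3f}")
```

With these made-up numbers, a one-point gap turns up in a sizable share of the shuffles, so on this evidence alone you couldn't rule out that Smartie-vite does nothing. Different data could easily give a different answer.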