[MUSIC PLAYING]

DAVID MALAN: This is CS50.

[MUSIC PLAYING]

DAVID MALAN: Hello, world. This is the CS50 podcast. My name is David Malan.

BRIAN YU: And my name is Brian Yu. And today, we thought we'd discuss academic honesty in CS50. Every year in CS50, we have some number of cases of academic dishonesty, where students submit work that isn't their own, either by copying homework from a friend or by looking something up online and using a solution they find there as part of their own. And so this is something that CS50 has had to deal with for years now, in terms of how best to address this type of situation and how best to prevent academic dishonesty in general.

DAVID MALAN: Indeed, this was-- when I first took over the course myself back in 2007, it was really an end-of-semester process. After the teaching fellows would evaluate students' work and provide feedback throughout the semester, I would finally, all too often by semester's end, carve out some time to cross-compare all of the submissions from that semester, looking for statistically unlikely similarities between students' work. Indeed, what a student might sometimes unfortunately do is copy the work of another student or lean too heavily on some resource online, copying more than a reasonable number of lines of code. And so by cross-comparing all submissions with software, we notice which lines of code are in both student A's and student B's work, and then conclude, ultimately, that statistically this was unlikely to happen by chance.

BRIAN YU: Now, how exactly do you draw those conclusions? Because I'm thinking about a programming language like C, where there are only so many parts of the language. There are for loops and there are conditions. And everyone's solutions to similar problems probably have these sorts of elements. So what exactly do you look for in this process?

DAVID MALAN: Yeah, it's quite fair.
If we relied on this kind of cross-comparison for programs like Hello, World, everyone would appear to have written exactly the same code. But as soon as we get into CS50's second and third weeks, where the programs students write in C tend to get a little longer, there does end up being more opportunity for creativity, for different stylistic choices by students. And so students' code does start to drift. Even though at the end of the day the solutions might still be using for loops and while loops and conditions and so forth, students might format their code slightly differently. They might write slightly different comments. And so what tends to happen over time, as the programs exceed maybe 10, 20, 30 lines of code, is that there is enough variation. And indeed, unfortunately, what we often notice is not even necessarily that the code is identical because, as you know, that in and of itself might just be a coincidence. Especially nowadays, when we have 800 students, it is absolutely going to be the case that two students write, by chance, very similar code. But unfortunately, the kinds of things we tend to notice are when students have the same typographical errors, or they use precisely the same variable names, or they make precisely the same mistake in precisely the same location. And at that point, our instincts start to kick in, and we look at code like this and start to realize that, while this may have happened by chance, at scale the fact that it happened in this line and in this line and in this line between two students' code is more likely than not better explained by some deliberate act.

BRIAN YU: So at Harvard at least, when there are cases of academic dishonesty, they're usually referred to some administrative body, which now is called the Honor Council here at Harvard.
And I think you've pointed out, and a couple of other people have pointed out, that CS50, though it is the largest course at the university, does refer far more people to the Honor Council than any other class on campus. Do you think that has to do with something about computer science or introduction to computer science? Or why do you think that might be?

DAVID MALAN: No, I don't. And that's certainly an unfortunate distinction that we've long had, save for one or two years where there were issues in other departments. No, I don't think that computer science students are any less honest than their classmates in other fields. I don't think students in CS50 are any less honest than students in other computer science courses. I think it really boils down to, one, you and I and educators in computer science are perhaps somewhat uniquely positioned with tools-- with software tools via which to detect it. And in a large introductory course like CS50, I think it's important not only out of fairness to those students who are behaving honestly throughout the term, but also because one of our goals in this course should be to teach students the ethical application of computer science. We should be holding students to those same expectations as are prescribed in great detail in the course's syllabus. And so I think it's really a function of our, one, looking for it and, two, following through on it that ends up explaining the large numbers.

BRIAN YU: Yeah, so I'm looking here at the data from past years in CS50, and it does seem that there's also a fair amount of fluctuation in terms of what percentage of students in the course end up being referred to the Honor Council. Like, in 2009 for example, it looks like nobody was referred to the Honor Council. And in other years, like 2010 and 2012, there's like 1% or 2% of students. But in other years, like 2015, it's up to 5%, and 2016 is up to 10%.
What do you think accounts for that fluctuation? Because that's a pretty big difference between one year and another.

DAVID MALAN: Yeah, there really has been, as you say, from 0% to 10% depending on the year. I think it's a few things. Part of it, I think, is just a function of how much time I or we put into the process. In 2009, when there was 0%, I did look for worrisome instances that particular year, but admittedly, in retrospect, I probably spent less time that year than the subsequent year. Because the subsequent year it went up to 2%. With that said, it might have been, by chance, just a group of students who exhibited this pattern of behavior with far less frequency than others. So I think that's certainly possible as well. But I think the uptick in more recent years, for instance 10% in 2016 and roughly 4% or 5% since then, which is where we've been rather in equilibrium the past few years, is also a function of just how much time we invest in it. So back in 2008, and for a few years thereafter, it was only me who was engaged in this process. I would run the software by myself. I would look at students' submissions side by side. And I would ultimately decide which to refer forward to Harvard's Honor Council, and then ultimately document all those cases. But in more recent years, we have involved more of CS50's senior staff in the process. The upside of which is that we can now, one, analyze the submissions roughly on a week-to-week basis, which means that we can provide the Honor Council with the details far more quickly. Students themselves, though it's never a pleasant process, at least know sooner rather than later, rather than getting to the very end of the semester and then realizing just how many times or how often they crossed some line. But two, the fact that we have multiple human eyes on it means that we do allocate more time week to week to each of the individual submissions and the cross-comparisons thereof.
The upside, though, of those multiple humans (we now have two or three of us who ultimately vote on whether or not a case should move forward to the Honor Council) is that I at least, and hopefully all of us, have much more comfort in sending a case to the Honor Council, because not one pair of eyes but two or three have all adjudicated it to be a clear indication of a line having been crossed.

BRIAN YU: Can you tell me a little more about that process? You've talked about how there are now a couple of pairs of eyes all looking at the submissions, but you've also talked about software being involved too. So what is the interplay there between the role that software plays in trying to detect this sort of thing and the role that people play in trying to detect academic dishonesty?

DAVID MALAN: Yeah, I should first emphasize that it is not software that is ultimately disciplining students or referring them to Harvard's Honor Council. It is rather just a tool that we use as a first pass, given that we have, nowadays, some 800 students, each of whom is submitting 10 homework problems over the course of the semester. This is a big O of n squared problem, times 10 or so. So it's a huge number of comparisons that need to be made, and it just couldn't practically be done by hand or by eye alone.
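
[Editor's illustration, not part of the recording.] To give a rough sense of the scale described here, the following is a minimal sketch that counts the comparisons involved. It assumes each student's submission is compared exactly once with every other student's submission for each problem, and it uses the round figures mentioned above (800 students, 10 problems). The function name `pairwise_comparisons` is hypothetical, and the course's actual tooling may well work differently, for example by also comparing against other sources.

```python
from math import comb


def pairwise_comparisons(students: int, problems: int) -> int:
    """Count submission pairs to check, assuming every pair of students
    is compared once per problem (an assumption for illustration)."""
    # comb(n, 2) = n * (n - 1) / 2 pairs, which grows on the order of n squared.
    return comb(students, 2) * problems


# Round figures from the conversation: about 800 students, 10 problems.
per_problem = pairwise_comparisons(800, 1)
per_semester = pairwise_comparisons(800, 10)
print(f"{per_problem:,} pairs per problem, {per_semester:,} per semester")
```

Under these assumptions that is 319,600 pairs per problem and about 3.2 million over a semester, which suggests why software is used only as a first pass before human review.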