  • [MUSIC PLAYING]

  • DAVID MALAN: This is CS50.

  • [MUSIC PLAYING]

  • DAVID MALAN: Hello, world.

  • This is the CS50 podcast.

  • My name is David Malan.

  • BRIAN YU: And my name is Brian Yu.

  • And today, we thought we'd discuss academic honesty in CS50.

  • And so every year in CS50, we always have some number of cases

  • of academic dishonesty where some number of students

  • submit work that isn't their own, either by copying homework from a friend

  • or by looking something up online and using a solution they

  • find online as part of their solution.

  • And so this is something that CS50 has had to deal with for years

  • now in terms of how best to address this type of situation,

  • and how best to prevent academic dishonesty in general.

  • DAVID MALAN: Indeed this was-- when I first took over the course

  • myself back in 2007, it was really an end-of-semester process.

  • After the teaching fellows would evaluate students' work

  • and provide feedback throughout the semester,

  • I would finally, all too often by semester end,

  • carve out some time in order to then cross compare all of the submissions

  • from that semester looking for statistically unlikely similarities

  • between students' work.

  • Indeed, what a student might sometimes unfortunately do

  • is copy the work of another student, lean too heavily

  • on some resource online, copying more than a reasonable number

  • of lines of code.

  • And so by cross comparing all submissions with software

  • itself, do we then notice which lines of code

  • are in both student A's and student B's work, and then conclude ultimately,

  • that statistically this was unlikely to happen.
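
As a rough illustration of the kind of cross-comparison described above, the following is a minimal sketch, not CS50's actual software (which the discussion does not name). It assumes submissions are plain strings, ignores whitespace differences, and measures what fraction of distinct lines two submissions share. The function names are hypothetical.

```python
def normalized_lines(source: str) -> set:
    """Return the set of non-blank lines with all whitespace removed."""
    return {"".join(line.split()) for line in source.splitlines() if line.strip()}


def shared_line_ratio(a: str, b: str) -> float:
    """Jaccard similarity: lines found in both submissions / lines found in either."""
    lines_a, lines_b = normalized_lines(a), normalized_lines(b)
    if not lines_a and not lines_b:
        return 0.0  # nothing to compare
    return len(lines_a & lines_b) / len(lines_a | lines_b)


if __name__ == "__main__":
    student_a = 'int main(void)\n{\n    printf("hi");\n}\n'
    student_b = 'int main(void)\n{\n  printf("hi");\n}\n'   # same code, different indentation
    student_c = 'int main(void)\n{\n    puts("hi");\n}\n'   # one line differs
    print(shared_line_ratio(student_a, student_b))  # 1.0
    print(shared_line_ratio(student_a, student_c))  # 0.6
```

A high ratio on a short program proves little, which is why a score like this could only ever be a first pass ahead of human review.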

  • BRIAN YU: Now, how exactly do you draw those conclusions?

  • Because I'm thinking about a programming language like C,

  • there are only so many parts of the language.

  • There are for loops and there are conditions.

  • And probably everyone's solutions to similar problems

  • have these sorts of elements.

  • So what exactly do you look for in this process?

  • DAVID MALAN: Yeah, it's quite fair.

  • If we relied on this kind of cross comparison

  • for programs like Hello, World, everyone would appear

  • to have written exactly the same code.

  • But as soon as we get into CS50's second and third weeks

  • where the programs they write in C tend to get a little longer,

  • there does end up being more opportunity for creativity,

  • for different stylistic decisions by students.

  • And so students' code does start to drift.

  • Even though at the end of the day the solutions

  • might still be using for loops and while loops and conditions and so forth,

  • students might format their code slightly differently.

  • They might write slightly different comments.

  • And so what tends to happen over time, as the programs exceed

  • maybe 10, 20, 30 lines of code, is there's enough variation.

  • And indeed, unfortunately, what we often notice

  • is not even necessarily that the code is identical, because as you know,

  • that in and of itself might just be a coincidence.

  • Especially when nowadays we have 800 students,

  • it is absolutely going to be the case that two students write,

  • by chance, very similar code.

  • But unfortunately, the kinds of things we tend to notice

  • are when students have the same typographical errors,

  • or they use precisely the same variable names,

  • or they make precisely the same mistake in precisely the same location.

  • And at that point, our instincts start to kick in

  • and we look at code like this and start to realize,

  • while this may have happened by chance, at scale,

  • the fact that it happened in this line and in this line

  • and in this line between two students' code is,

  • more likely than not, better explained by some deliberate act.

  • BRIAN YU: So at Harvard at least, when there are cases of academic dishonesty,

  • they're usually referred to some administrative body, which

  • now is called the Honor Council here at Harvard.

  • And I think you've pointed out and a couple other people

  • have pointed out that CS50, though it is the largest course at the university,

  • does refer far more people to the Honor Council than any other class on campus.

  • Do you think that has to do with something about computer science

  • or introduction to computer science?

  • Or why do you think that might be?

  • DAVID MALAN: No, I don't.

  • And that's certainly an unfortunate distinction that we've long had,

  • save for one or two years where there were issues in other departments.

  • No, I don't think that computer science students are any less honest

  • than their classmates in other fields.

  • I don't think students in CS50 are any less honest than students

  • in other computer science courses.

  • I think it really boils down to one, you and I and educators in computer science

  • are perhaps somewhat uniquely positioned with tools--

  • with software tools via which to detect it.

  • And in a large introductory course like CS50,

  • I think it's important not only out of fairness to those students

  • who are behaving honestly throughout the term, but also because one of our goals

  • should be in this course, to teach students

  • the ethical application of computer science.

  • That we should be holding students to those same expectations as

  • are prescribed in great detail in the course's syllabus.

  • And so I think it's really a function of our, one, looking for it

  • and, two, following through on it that really ends up explaining the large numbers.

  • BRIAN YU: Yeah, so I'm looking here at the data from past years in CS50,

  • and it does seem that there's also a fair amount of fluctuation

  • in terms of what percentage of students in the course end

  • up being referred to the Honor Council.

  • Like, in 2009 for example, it looks like nobody

  • was referred to the Honor Council.

  • And in other years like 2010, 2012, there's like 1% or 2% of students.

  • But in other years like 2015, it's up to 5%, 2016 is up to 10%.

  • What do you think accounts for that fluctuation

  • because that's a pretty big difference between one year and another?

  • DAVID MALAN: Yeah, there really has been as you say, from 0% to 10%

  • depending on the year.

  • I think it's a few things.

  • Part of it I think is just a function of how much time

  • I or we put into the process.

  • I think in 2009, when it was 0%, I did look for worrisome instances

  • in that particular year, but admittedly, in retrospect, I probably

  • spent less time that year than the subsequent year.

  • Because the subsequent year it went up to 2%.

  • With that said, it might have been by chance,

  • just a group of students who exhibited this pattern of behavior

  • with far less frequency than others.

  • So I think that's certainly possible as well.

  • But I think the uptick in more recent years for instance, 10% in 2016

  • and roughly 4% or 5% then, which is where

  • we've been rather in equilibrium the past few years,

  • I think is also a function of just how much time we invest in it.

  • So back in 2008, and for a few years thereafter,

  • it was only me who was engaged in this process.

  • I would run the software by myself.

  • I would look at students' submissions side by side.

  • And I would ultimately decide which to refer forward

  • to Harvard's Honor Council.

  • And then ultimately, document all those cases.

  • But in more recent years, we have involved more of CS50's senior staff

  • in the process.

  • The upside of which is that we can now one, analyze the submissions roughly

  • on a week to week basis.

  • The upside of which is that we can provide the Honor Council

  • with the details far more quickly.

  • Students themselves, though it is never a pleasant

  • process, at least know sooner rather than later, rather

  • than getting to the entire end of the semester

  • and then realizing just how many or how often they crossed some line.

  • But two, the fact that we have multiple human eyes on it

  • means that we do allocate more time week to week

  • on each of the individual submissions and the cross-comparisons thereof.

  • The upside, though, of those multiple humans

  • (we now have two or three of us who ultimately vote on whether or not

  • a case should move forward to the Honor Council) is that I at least,

  • and hopefully all of us, have much more comfort in sending a case to the Honor

  • Council because not one pair of eyes, but two or three

  • have all adjudicated it to be a clear indication of a line

  • having been crossed.

  • BRIAN YU: Can you tell me a little more about that process?

  • You've talked about how there are now

  • a couple of pairs of eyes that are all looking at the submissions,

  • but you've also talked about software being involved too.

  • So what is the interplay there between the role that software

  • plays in trying to detect this sort of thing and the role

  • that people play in trying to detect academic dishonesty?

  • DAVID MALAN: Yeah, I should first emphasize

  • that it is not software that is ultimately

  • disciplining students or referring them to Harvard's Honor Council.

  • It is rather just a tool that we use as a first pass.

  • Given that we have some, nowadays, 800 students, each of whom

  • is submitting 10 homework problems over the course of the semester,

  • this is a big O of n squared problem, times 10 or so.

  • So it's a huge number of comparisons that need to be made,

  • and it just couldn't practically be done by hand or by eye alone.
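
To put a rough number on the "n squared, times 10" estimate above, here is a small sketch. It assumes every submission for a problem is compared once against every other submission for that same problem, and it uses the figures from the discussion (800 students, 10 problems). The function name is hypothetical.

```python
from math import comb


def pairwise_comparisons(students: int, problems: int) -> int:
    """Number of unordered submission pairs, summed over all problems."""
    pairs_per_problem = comb(students, 2)  # n * (n - 1) / 2, which grows as O(n^2)
    return pairs_per_problem * problems


if __name__ == "__main__":
    print(pairwise_comparisons(800, 1))   # 319600 pairs for a single problem
    print(pairwise_comparisons(800, 10))  # 3196000 pairs across the semester
```

Roughly 3.2 million side-by-side comparisons per semester is why software has to serve as the first pass.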