  • (smooth electronic music)

  • - Hello, everybody, thank you very much for coming.

  • Good morning.

  • Hope you are doing well.

  • I'd like to talk today about conflict resolution

  • in distributed systems, that is if several

  • people change some data independently of each other,

  • what happens, how do we resolve those conflicts

  • that'll occur.

  • My background is I'm a researcher at the University

  • of Cambridge.

  • I was previously in industry

  • at a bunch of internet startups.

  • So I worked at LinkedIn for a couple of years,

  • for example.

  • At the moment, I'm working on this research project

  • called Trve Data.

  • Spelled T-R-V-E

  • and what we're trying to do here

  • is to bring end to end encryption

  • to a larger range of applications.

  • So think something like Google Docs

  • where several people can edit a document

  • at the same time online

  • but without having to trust Google servers

  • because what we want to do is

  • to be able to put data on various servers in the cloud

  • but not have to worry about what happens

  • if they get compromised and so on.

  • So that's kind of the background of all of this.

  • I'm not talking about the encryption

  • and the security protocols today,

  • I'm only focusing on one little piece of that whole project

  • which is what happens if several people edit data

  • at the same time and how do we resolve that.

  • So I'd like to start with a scenario

  • that will probably be familiar to you,

  • which is you, a little blue stick figure here,

  • are hacking on some code on your computer

  • and at some point you decide that this code is done

  • and you commit it using your favorite

  • version control system.

  • I'll just use Git as an example here

  • and so at this point, you'd put the code

  • in the repository and then maybe you push it somewhere

  • so that other people can see that code as well.

  • So in the case of Git, maybe you'll push

  • it to a repository on GitHub

  • and this is now the communication mechanism

  • for people on your team.

  • So if there's somebody else, say,

  • this little red stick figure

  • who is also hacking on code,

  • then well you can synchronize up through

  • the central repository.

  • This is all very familiar, this is what

  • we do everyday

  • and so the little red person here

  • might independently at the same time

  • also be working on the same code base

  • and also do a commit. And

  • what happens if this person now fetches

  • from GitHub? Well, they'll have to either

  • do a merge or a rebase, if that's how your workflow goes,

  • or something along those lines.

  • So somehow these changes are going

  • to have to be combined together

  • and as you've probably experienced,

  • if people change different files in the same repository,

  • that's no problem, they will just get merged cleanly.

  • If one person changes the beginning of a file

  • and another person changes the end of the file,

  • that's probably okay because the version control system

  • will merge them automatically.

  • If people change the same part of the same file,

  • then you're going to have to resolve the merge conflict

  • yourself and then we have these tools

  • for doing three-way merges, for copying patches

  • from one side to another and figuring out

  • what the results should be.
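
To make the three cases above concrete, here is a minimal sketch of a line-by-line three-way merge in Python. It is not how Git itself is implemented: the function name `three_way_merge` is made up for this illustration, and it assumes all three versions have the same number of lines (edits only replace lines), whereas real tools first align the versions with a diff.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, comparing line by line.

    Simplifying assumption (not how real tools work): all three versions
    have the same number of lines, so line i always corresponds to line i.
    Returns (merged_lines, conflict_indexes); conflicting lines are None.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")

    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            # Neither side changed the line, or both made the same change.
            merged.append(o)
        elif o == b:
            # Only the other person changed this line: take their version.
            merged.append(t)
        elif t == b:
            # Only we changed this line: keep our version.
            merged.append(o)
        else:
            # Both sides changed the same line differently: a human decides.
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts
```

If one person edits the first line and the other edits the last line, the merge is clean; if both edit the same line in different ways, that line index is reported as a conflict to be resolved by hand.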

  • So you've probably had to fight with things like this.

  • This is exactly the kind of problem I'm talking about.

  • But this problem happens not only in software development.

  • It's a very general purpose problem.

  • So imagine you're a lawyer working in a law firm

  • and maybe there's a contract being negotiated

  • with, you've got a client on one side

  • and the other company's law firm on the other side

  • and everybody is sending these versions

  • of contracts back and forth

  • and the contracts are probably Microsoft Word documents

  • because that's how lawyers work

  • and they send these things by email

  • and so you've got one person making changes

  • to these Word documents and then hit save

  • and then at the same time, maybe somebody else,

  • maybe at a different company

  • is also updating the same document.

  • Changes it and now you email these changes to each other

  • and so this is actually very much the same data flow

  • and I actually just reused the same diagram

  • and changed the labels.

  • You've got the email as the communication path

  • and at some point these changes are going

  • to have to be merged together

  • and now for Microsoft Word,

  • I'm not sure there is even this kind

  • of nice user interface for three-way merges.

  • I know you can compare two documents

  • but I think what people end up doing

  • is manually copying the changes

  • from one version of the document to another

  • and so performing this merge really manually.

  • So that's kind of this oh crap situation here.

  • So in this case, it's kind of best to have like

  • an informal lock where one person says,

  • okay I'm going to be editing the document now.

  • Please don't change it for the next day.

  • I'll send it to you and then you can edit it.

  • So people try to sequence their updates

  • through this crude manual communication.

  • What happens in another case?

  • Let's look at the third example.

  • Let's look at a to-do list.

  • So this is maybe a shared

  • to-do list where me and my wife together

  • have this shopping list where I can add stuff

  • and she can add stuff and then whoever next

  • goes to the shop can buy those things.

  • So here buying milk is added

  • to the to-do list and let's say this to-do list

  • is stored on a central server.

  • So that's what allows us to communicate

  • and so I add buy milk to the to-do list

  • and press the OK button and so it does a request,

  • like maybe an HTTP POST request

  • over the network, stores it there, it comes back

  • and says okay that was added to the to-do list

  • and at the same time, maybe my wife goes,

  • oh you need to water the plants, so I just remembered.

  • So adding that to the to-do list

  • also does this POST to the server

  • and comes back okay

  • and so in this case, actually what has happened

  • is if this central server stores its data

  • in a database and it uses something like transactions.

  • If this is a relational database say,

  • then we actually have serialization going on

  • and that is these updates are actually applied

  • in a sequential order, in a serial order.

  • That's where serializable comes from

  • in database transactions.

  • Which means they're applied one at a time.

  • So you don't actually have the same concurrency problem

  • as we had with the code editing

  • and with the Word documents being sent back and forth.

  • Because actually, there's one primary copy of the data

  • and that lives on the server

  • and that's being updated one transaction

  • at a time in sequence.

  • So in this case, you don't

  • get this conflict resolution problem.
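
As a rough illustration of the single primary copy described above, the following Python sketch uses an in-memory SQLite database to stand in for the server's relational database. The table name `todo` and the helpers `make_server_db`, `add_item` and `list_items` are hypothetical names for this example, not anything from the talk; the point is only that each request becomes one transaction and the server applies them one after another.

```python
import sqlite3


def make_server_db():
    """Create the server's one primary copy of the to-do list (in memory)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE todo ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "item TEXT NOT NULL)"
    )
    return conn


def add_item(conn, item):
    """Handle one 'add to list' request as a single transaction."""
    # `with conn` commits if the block succeeds and rolls back on an error.
    with conn:
        conn.execute("INSERT INTO todo (item) VALUES (?)", (item,))


def list_items(conn):
    """Return the items in the order the server applied them."""
    rows = conn.execute("SELECT item FROM todo ORDER BY id")
    return [row[0] for row in rows]
```

Calling `add_item(conn, "buy milk")` and then `add_item(conn, "water the plants")` leaves both items in the list in the order the server processed them; neither update overwrites the other, so there is nothing to merge.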

  • So this seems nice but on the other hand

  • you have a problem which is well,

  • what if I don't have signal on my mobile phone right now

  • or if the network is interrupted for some other reason?

  • I can press the button to save

  • and I'll just get a spinning wait indicator

  • and nothing will happen.

  • So here we have this, it's kind of obvious,

  • this problem if you don't have an internet connection

  • then you can't reach the central server

  • so you can't store any data.

  • You can't edit the data in any way.