
  • KRZYSZTOF OSTROWSKI: My name is Chris.

  • I'm leading the TensorFlow Federated team in Seattle.

  • And I'm here to tell you about federated learning

  • and the platform we've built to support it.

  • There are two parts of this talk.

  • First I'll talk about federated learning, how it works and why,

  • and then I'll switch to talk about the platform.

  • All right, let's do it.

  • And this is a machine learning story,

  • so it begins with data, of course.

  • And today, the most exciting data out there

  • is born decentralized on billions of personal devices,

  • like cell phones.

  • So how can we create intelligence and better

  • products from data that's decentralized?

  • Traditionally, what we do is that there's

  • a server in the cloud that is hosting the machine learning

  • model in TensorFlow, and clients all talk to it

  • to make predictions on their behalf.

  • And as they do, the client data accumulates on the server

  • right next to the model.

  • So model and data, that's all in one place.

  • Very easy.

  • We can use traditional techniques that we all know.

  • And what's also great about this scenario

  • is that the same model is exposed to data

  • from all clients--

  • possibly millions of clients-- and so it's very efficient.

  • All right.

  • If it's so good, why change that, right?

  • Well actually, in some applications,

  • it's not so great.

  • First it doesn't work offline.

  • There's high latency, so applications that need

  • fast turnaround may not work.

  • All this network communication consumes battery life

  • and bandwidth.

  • And some data is too sensitive, so collecting it is not

  • an option-- or too large.

  • Some sensitive data could be large.

  • OK.

  • What can we do?

  • Maybe we go to the complete other extreme.

  • So ditch the server in the cloud.

  • Now each client is in its own little bubble, right?

  • It has its own TensorFlow run time, its own model.

  • And it's training.

  • It's grinding over its data to train

  • and doesn't communicate with anything.

  • So now, of course, nothing leaves the device.

  • None of the concerns from the preceding slide

  • apply, but you have other problems.

  • A single client just doesn't have enough data, very often.

  • It doesn't have enough data to create a good model on its own.

  • So this doesn't always work.

  • What if we bring the server back,

  • but the clients are actually only receiving data

  • from the server?

  • Could that work?

  • So if you have some proxy data on the server that's

  • similar to the on-device data, you could use it.

  • You could pre-train the model on the server,

  • then deploy it to clients, and then let

  • it potentially evolve further.

  • So that could work.

  • Except, very often, there's no good proxy data or not enough

  • of it for the kinds of on-device data you're interested in.

  • A second problem is that this here,

  • the intelligence we're creating is

  • kind of frozen in time, in the sense that, as I mentioned,

  • clients won't be able to do a whole lot on their own.

  • And why does it matter?

  • And here's one concrete example from an actual production

  • application.

  • Consider a smart keyboard that's trying

  • to learn to autocomplete.

  • If you train a model on the server

  • and deploy it, and then suddenly millions of people

  • start using a new word, what happens?

  • You'd think, hey, it's a strong signal, millions of people.

  • But if you're not one of those millions,

  • your phone has no clue, right?

  • And so it could take a lot of punching

  • into that phone to make it notice that something new has

  • happened, right?

  • So yeah, this is not what we want.

  • We really need the clients to somehow contribute back

  • towards the common good so they can all benefit.

  • Federated learning is one way to do that.

  • Here we start with an initial model provided by the server.

  • This one is not pre-trained.

  • We don't assume we have proxy data.

  • It doesn't matter.

  • It can be just 0s.

  • So we send it to the client.

  • The client now trains it locally on its own data.

  • And this is more than just one step of gradient descent,

  • but it's also not training to convergence.

  • Typically, you would just make a few passes

  • over the data on the clients and then produce

  • a locally trained model and send it to the server.

  • And now all the clients are training independently,

  • but they all use the same initial model to start with.

  • And the server's job is to orchestrate this process

  • to make it happen and produce the same--

  • feed the same initial model to all the clients.

  • So now once the server collects the locally trained models

  • from clients, it aggregates them into

  • a so-called federated model.

  • And typically what we do is simply average the model

  • parameters across all clients.

  • So the server just adds the numbers and that's it.
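
To make the aggregation step concrete, here is a rough NumPy sketch with made-up numbers; the example-count weighting follows common federated averaging practice and is an assumption, not something stated in the talk:

    import numpy as np

    # Hypothetical locally trained parameters from three clients, weighted
    # by how many local examples each client trained on.
    client_models = [np.array([0.1, 0.5]),
                     np.array([0.3, 0.2]),
                     np.array([0.2, 0.4])]
    num_examples = [120, 80, 200]

    # The server just combines the numbers and discards the inputs.
    federated_model = np.average(client_models, axis=0, weights=num_examples)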

  • So this federated model, it has been influenced by data

  • from our clients, right?

  • Because it's been influenced by the client models, and those,

  • in turn, have been influenced by client data.

  • So we do get those benefits of scale in this scenario,

  • so that's great.

  • But there's one question.

  • What happens to privacy?

  • So let's look at this closely.

  • First, client data never left the device.

  • Only the models trained on this data were shared.

  • So next, the server does not retain or store

  • any of the client models.

  • It simply adds them up and then throws them away.

  • It deletes them, right?

  • So they are ephemeral.

  • But you might be asking how you know that this

  • is what the server is doing.

  • Maybe the server is secretly, somehow,

  • logging something on the side.

  • So there are cryptographic protocols

  • that we can use to ensure that that's all legit.

  • So with those protocols, the server

  • will only see the final result of the aggregation

  • and will not have access to any individual client contributions.

  • And we use those in practice-- so hopefully

  • that puts your mind at rest.

  • So the server only ever sees the final aggregate.

  • You can still wonder how we know that that doesn't contain

  • anything sensitive.

  • So this is where you would use differential privacy.

  • In a nutshell, each client clips its updates

  • and adds a little bit of noise.

  • So once the final aggregate emerges on the server,

  • there's enough noise to sort of mask out any

  • of the individual contributions, but there is still

  • enough signal to make progress.

  • So not to get too much into the detail,

  • but this is also a technique we use in production.

  • Differential privacy is an established and commonly used

  • way to provide anonymity.
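
To make the clip-and-noise idea concrete, here is a small illustrative NumPy sketch (not a production differential privacy mechanism; the constants are assumptions): no single contribution survives the noise, but the mean of many does:

    import numpy as np

    rng = np.random.default_rng(0)
    CLIP_NORM, NOISE_STDDEV = 1.0, 0.5  # assumed constants

    def clip_and_noise(update):
        # Bound each client's influence, then mask it with Gaussian noise.
        update = update * min(1.0, CLIP_NORM / np.linalg.norm(update))
        return update + rng.normal(scale=NOISE_STDDEV, size=update.shape)

    updates = [rng.normal(size=4) for _ in range(1000)]
    aggregate = np.mean([clip_and_noise(u) for u in updates], axis=0)
    # Noise of stddev 0.5 hides any one update, but in the mean of 1,000
    # contributions its effect shrinks to about 0.5 / sqrt(1000), or 0.016.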

  • If you have any more concerns, I'll

  • be happy to discuss them offline.

  • So how does it work in practice?

  • Firstly, it's not enough to just do it once.

  • So once you produce a federated model,

  • you feed it back into the server as the initial model

  • for the next round, then execute many thousands

  • of rounds, potentially.

  • That's how long it takes to converge.

  • And so in this scenario, both clients and server have a role.

  • Clients are doing all the learning.

  • That's where all the machine learning sits.

  • And the server is just orchestrating the process,

  • aggregating and also providing continuity to this process

  • as we move from one round to another,

  • because the server is what carries

  • the state between rounds.

  • And to drill into this a little bit more,

  • in practical applications, clients

  • are not all available at the same time.

  • You remember those concerns I mentioned about consuming

  • battery life and bandwidth.

  • We want to give users a good experience,

  • so we want it to be non-disruptive.

  • So we will only perform training when the device is connected

  • to a power source, on a Wi-Fi network, and idle,

  • so that the user is not negatively affected.

  • And so that means, out of the billions of clients out there,

  • only a small fraction are available at any given

  • time for training.

  • And this is illustrated on this diagram here.

  • This is from an actual production system.

  • We can kind of see, when you track the number of rounds

  • per hour the server completes across time,

  • you can see it kind of maxes out at night,

  • when everyone's asleep and their phone

  • is connected, and dips at lunch when everybody's punching

  • into their phone while eating.

  • So yeah.

  • So the clients keep coming and going.

  • And so that means that, as we move across rounds

  • from one round to another, the set of participating clients

  • will change for various reasons, including that some of them

  • lose connectivity.

  • So clients can, in general, drop out at any time.

  • So that's why, in an actual production system when

  • it's deployed, there's always a client selection

  • phase where the exact set of participants is chosen.

  • And there are many factors that go into it, including

  • concerns about bias.

  • But for this talk, all that's important to remember

  • is that the set of clients in each round is different.

  • So in a nutshell, the characteristics

  • of production scenarios-- at a glance,

  • there are many of them-- millions, billions.

  • They don't talk to each other.

  • These are cell phones, so no peer to peer connectivity.

  • Communication is the bottleneck in the whole system.

  • Clients want to be anonymous.

  • And, for the most part, they're interchangeable in the sense

  • that, in the grand scheme of things,

  • whether a particular device contributed data or not,

  • it doesn't really affect the result in any way.

  • And the clients are mostly unavailable, and they

  • can drop out at any time.

  • Therefore, we have to effectively consider them

  • as stateless.

  • Even if they have some memory, there's

  • no guarantee when they'll be back.

  • So we treat them as stateless low compute nodes.

  • And finally, the distribution of data in clients

  • is very non-uniform, because people differ.

  • So is it only for mobile devices?

  • No, not at all.

  • You could use federated learning for things

  • like a group of hospitals wanting

  • to learn something together or a group

  • of financial institutions.

  • So the general approach is the same.

  • Of course, the details will differ a little bit.

  • In this case, clients are very reliable, potentially

  • very capable.

  • But there are fewer of them.

  • So for some of the cryptographic protocols that we're using,

  • they work better when there are more clients.

  • And so here, you may have to work harder or use some more

  • specialized protocols.

  • So how well does it work in practice?

  • We've deployed it at Google in several applications,

  • including the smart keyboard that I mentioned.

  • So it runs in production on millions of devices.

  • And when you compare the performance

  • of an autocomplete model that learns on federated data,

  • it's clearly better-- higher accuracy.

  • We get more user clicks than the former model trained

  • on the server.

  • This is illustrated in some of these diagrams here.

  • We can see on the right side, the federated model

  • stabilizes with a better performance.

  • And the reason for that is that the on-device data

  • is the good data-- higher quality

  • data than the proxy data on the server.

  • Also, I mentioned before that non-federated models

  • were limited, and they wouldn't necessarily

  • be able to adapt to changes in the environment

  • and pick up changes over time.

  • And so here, we demonstrate the federated model can actually

  • learn new words that were not initially in the vocabulary

  • and notice that people are using them, and include them.

  • So it's worth pointing out, this is definitely

  • one example of an application where we want to use

  • differential privacy to make sure

  • that the only things you're learning are common things

  • and that nothing sensitive gets through.

  • So it worked at Google.

  • Of course, what you really want to know

  • is if it will work for your application.

  • So some rough guidelines here.

  • Mostly common sense stuff-- like if the on-device

  • data is high quality, or if it's sensitive or large, that's a good reason

  • to use federated learning.

  • Of course, you also need the labels for training.

  • And so we can't pay someone to go and label the data,

  • because it's on-device.

  • We can't access it.

  • So in some cases, the labels are just part of the data.

  • Like in the smart keyboard, you know, all the characters

  • you're trying to predict, people will eventually

  • type those characters.

  • And so that's what the labels are.

  • In some cases, you will have to work harder

  • to wire up additional signal into your application

  • to have those labels.

  • But other than that--

  • FL is a new area of active research.

  • Many variants, many extensions exist

  • in lots of publications-- hundreds of publications.

  • Several workshops just this year,

  • one of them organized at Google.

  • So you see a little picture of us in this workshop.

  • Yeah, so it's not guaranteed that any particular solution

  • will immediately work for you.

  • You have to just try things out and see what works.

  • And what we have to all collectively

  • do to advance this area, this promising field,

  • is to explore together.

  • And so that's why we've built TensorFlow Federated.

  • And so let's get to it.

  • All right.

  • So TensorFlow Federated.

  • What is it?

  • It's a development environment designed specifically for

  • federated learning, although it's also

  • applicable to more general kinds of computations

  • that I will get to in a minute.

  • It provides a new programming language.

  • The interface is embedded in Python,

  • so you kind of don't notice it.

  • But there is actually a programming language

  • underneath that combines TensorFlow and distributed

  • communication.

  • In that language, we have implemented a number

  • of federated algorithms.

  • And so we provide everything you need for simulations.

  • So the runtimes, data sets, and everything is there.

  • It's part of TensorFlow, and it's on GitHub,

  • so everything is open source and modifiable.

  • Who is it for, though?

  • Two main audiences.

  • One is the researchers.

  • Here what we want to enable is for people

  • to very quickly get started.

  • And so we provide this pseudocode-like

  • language with high-level abstractions

  • so that it's very easy for you to express your ideas

  • in a way that's super compact and you

  • can see what you're doing.

  • Also, a number of things you can copy, paste, and fork

  • and modify.

  • So that includes the federated learning implementations,

  • but also-- and this is still kind of emerging--

  • we'll have full end-to-end examples of research

  • we've produced, with scripts you can run and modify

  • and do whatever you want with them.

  • And data sets and also the simulation infrastructure

  • is designed to be modular, so that whatever kind of resources

  • you might have, whether it's a cluster in a basement

  • or something else, you can configure things in such a way

  • that it works on your hardware.

  • The second equally important audience is practitioners.

  • And so we want to be able to take all the latest research

  • and immediately use it in production.

  • Assuming this is all implemented in TFF,

  • hopefully that will happen.

  • And so we've made a number of decisions to support that.

  • One is the language that I keep mentioning.

  • The abstractions are designed in such a way

  • that we're thinking of production deployment

  • from day one.

  • Even though production deployment options were not

  • something we provided on day one,

  • they've been on our minds from day one.

  • Also, we designed the system in such a way

  • that whatever code you're writing

  • in TFF to run in a simulation, you

  • can take the same code, without any changes,

  • and move it into production.

  • I'll get to that later.

  • And also the system is composable,

  • so that you can pick the things you want and compose them

  • together and make it work and modify

  • whatever you want using the pseudocode-like language,

  • because the code is in a form that you

  • should be able to actually read and understand.

  • And perhaps most importantly, we're

  • actually eating our own dog food and using it at Google.

  • So we are investing our resources

  • to make sure the project evolves

  • in a way that's relevant for production deployment.

  • All right.

  • So I keep mentioning a new language.

  • Why do we need a new language for federated learning?

  • The reason for that is that federated programs

  • are distributed, right?

  • So they include clients and server and everything

  • in between.

  • So communication is an essential part of the program.

  • It's not just some systems concern that's an afterthought.

  • And so it is kind of expected that-- just as in TensorFlow,

  • you are expected to be engineering your model

  • architectures and tinkering with models

  • and adding new operators here and there.

  • Same for federated learning, except now your data

  • flow diagram kind of spans the entire network, right?

  • And so obviously, communication is also something

  • that you should be able to engineer and play with.

  • And we want to give you programming language

  • abstractions that make it super easy to do that.

  • And things like point-to-point messaging

  • or taking and restoring checkpoints,

  • we've tried to use those.

  • That's what our initial implementations

  • of federated learning were like.

  • It was unreadable.

  • It was very, very difficult to work with.

  • So we've designed a new system with higher-level

  • abstractions as a basis.

  • And hopefully, you see how this is done in TFF

  • and that you like it.

  • Why stress portability between research and production?

  • You know, when we think about it,

  • in an idealized federated learning environment,

  • you can't look at the data, so a lot of things

  • that we take for granted become more interesting.

  • Like, you know, you can't just look at the data,

  • so it may not be easy to see where the outliers are

  • or debug problems with your predictions

  • or try out various models.

  • There are ways to do some of those things,

  • but they're not obvious.

  • And so, for example, you may want

  • to just go ahead and deploy your model into a live system.

  • Run it on real devices, maybe in dry mode

  • so nothing gets affected, but it kind of runs there.

  • You can see how well it's doing and iterate in this manner.

  • So the kind of traditional boundary

  • between production versus research,

  • all this gets a little bit more fuzzy.

  • You sometimes may have to experiment in production.

  • And so because of that, and the general desire

  • to transfer new research into production ASAP,

  • it's essential, in our mind, to provide

  • this kind of portability.

  • So you write one version of code, and it works.

  • Whether it's research or simulation or production,

  • it's the same code.

  • And a number of decisions in TFF reflect that, like the fact

  • that everything is kind of language agnostic and platform

  • agnostic.

  • And everything is expressed declaratively,

  • so that you can compile it into different kind of execution

  • environments.

  • OK.

  • So where do you start?

  • The basis of building programs in TFF

  • is the federated computation.

  • This is a generalization of federated learning, so--

  • we have clients that have sensitive data.

  • There are very many of them.

  • They do all the training.

  • Server orchestrates these computations

  • and provides continuity over time.

  • The clients want to be anonymous,

  • so whatever operations we do have to be in aggregate.

  • That's, in essence, what defines a federated computation.

  • How do we create those?

  • So now let's go through the various abstractions

  • that we have in TFF one by one.

  • Values.

  • This is a set of clients.

  • Let's say each of them has a temperature sensor that

  • produces some readings, let's say a floating point number.

  • We're going to refer to the collective of all those numbers

  • as a single federated value.

  • So a federated value can be a multi-set

  • of those individual contributions from clients,

  • right?

  • These federated values also have federated types.

  • In this case, it's going to be a federated float at clients.

  • The curly braces indicate that it's a multi-set.

  • A federated type consists of the type

  • of the individual constituents and what

  • we call a placement, which is essentially

  • the identity of the group of system participants.

  • There's a little more to placements in TFF.

  • I won't get into it.

  • But for starters, clients and server

  • are the only ones you would use.

  • Now, suppose we have a server.

  • And there's a number on the server.

  • Let's say it's also some float.

  • We can also call it a federated value in this case.

  • It's not a multi-set because there's just one sample of it.

  • So it's a float at the server.
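
In TFF's Python API, these two federated types can be written down directly; a minimal sketch:

    import tensorflow as tf
    import tensorflow_federated as tff

    # A multi-set of floats, one per client.
    readings_type = tff.FederatedType(tf.float32, tff.CLIENTS)
    print(readings_type)   # {float32}@CLIENTS

    # A single float placed at the server.
    threshold_type = tff.FederatedType(tf.float32, tff.SERVER)
    print(threshold_type)  # float32@SERVER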

  • Now let's get to operators.

  • Suppose there is a distributed aggregation protocol that

  • is picking up numbers from the clients and depositing,

  • let's say, the average or something

  • like that on the server.

  • Here in TFF, you can think of it as a function.

  • Unlike in a programming language like Python,

  • in this case, the inputs to the function

  • are in different places than the output.

  • But that's OK because TFF is essentially

  • a programming framework for creating distributed systems.

  • This is a little distributed system.

  • And so you can model this as a function, in fact.

  • And you can even give it a functional type.

  • And this function takes a float at clients

  • and produces a float at server.

  • In TFF, we also have a little library

  • of commonly used functions.

  • Like federated mean will take a federated float at clients

  • and produce the average of those on the server.

  • And others are available.

  • Now, with all that I've introduced,

  • you can actually start writing programs.

  • So let's write a very, very simple,

  • potentially the simplest possible,

  • federated computation.

  • It goes like this.

  • First, TFF is a strongly typed programming language.

  • And so you always start by defining the types of things.

  • I mentioned, we have a federated float at clients.

  • And so there it goes.

  • Next you're going to actually write a computation.

  • And so TFF code is not Python code.

  • But you express it in Python.

  • It's really the same idea as what you have in TensorFlow.

  • TensorFlow code is not Python code.

  • It's TensorFlow.

  • These are TensorFlow things that are

  • executed by the TensorFlow runtime.

  • But you can express them in Python.

  • Python is the language in which you construct it.

  • It's the same idea here.

  • So you write a little Python function.

  • You decorate it as a sort of federated computation.

  • You specify the federated type of the inputs.

  • Now, in the body of this Python function,

  • the sensor readings parameter represents the federated float

  • that came in as the input.

  • And now we can use federated operators in TFF to slice and dice

  • the value.

  • In this case, we just call federated mean, and that's it.

  • We hit Return.
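
Put together, that simplest computation is only a few lines; this sketch follows the form used in the TFF tutorials:

    import tensorflow as tf
    import tensorflow_federated as tff

    READINGS_TYPE = tff.FederatedType(tf.float32, tff.CLIENTS)

    @tff.federated_computation(READINGS_TYPE)
    def get_average_temperature(sensor_readings):
      # A single federated operator does all the work.
      return tff.federated_mean(sensor_readings)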

  • So now, what happens here is that, just as in TensorFlow,

  • the Python function gets traced.

  • And we construct a little federated computation

  • representation in a serialized form

  • and store it underneath that symbol.

  • It's kind of the same idea as a TF function getting traced

  • and a TensorFlow graph getting stored

  • in a serialized form behind it.

  • So that's just what happens.

  • So when I say that TFF programs are not Python,

  • that's what it means.

  • The get average temperature symbol now

  • represents a serialized representation

  • of code in TFF.

  • And the reason that's important is because, again, we want

  • to run those things on devices.

  • And so they're not going to be interpreted by regular Python.

  • So now let's look at something slightly larger.

  • Let's say you have a set of clients.

  • Each of them has a temperature sensor.

  • And an analyst on the server wants

  • to know what fraction of the clients

  • have temperature readings over some threshold.

  • So we have two inputs here, the red and the blue.

  • Data is sensitive.

  • We can't collect it.

  • And so what we do instead, we use a federated broadcast

  • operator to move the threshold from the server to the clients.

  • Now every client has both the threshold and its own reading,

  • and they can compare them-- run a little block of TensorFlow

  • to produce one if it's over the threshold and zero otherwise.

  • And so you can think of this as like a map step in MapReduce.

  • And we provide a federated map operator for those kinds

  • of things as well.

  • Then finally, what emerges is a federated float

  • composed of ones and zeros, which you can feed as an input

  • to the federated mean operator and produce

  • the fraction on the server.

  • So that's it.

  • That's the whole program in a diagram form.

  • And now if you want to write code,

  • it kind of looks the same, except--

  • it's code.

  • So you start by defining your Python function decorated

  • as a TFF computation.

  • You specify all the inputs as formal parameters.

  • And so you see the readings input--

  • these are the temperatures-- and the threshold on the server.

  • The inputs can be anywhere, whether it's clients or server.

  • Just list all of them here.

  • And now in the body of this function,

  • you can again use federated operators to slice

  • and dice those things.

  • So you see the broadcast here again.

  • You see the map and mean and so on.

  • The client side processing, I mentioned it was in TensorFlow.

  • So in this case, the parameter to the map function

  • that represents this processing is implemented

  • in ordinary TensorFlow code.

  • And that's it.

  • You just slap the types on top of it

  • to make sure that everything is strongly typed,

  • because TFF likes things to be strongly typed

  • and turns them into a type object for you.

  • And that's it.

  • That's the whole program.

  • You can go and run it.

  • And I think we have a version of this in the tutorials as well.
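
A sketch of that program, close to the version in the TFF custom algorithms tutorial:

    @tff.tf_computation(tf.float32, tf.float32)
    def exceeds_threshold(reading, threshold):
      # Client-side TensorFlow: 1.0 if over the threshold, 0.0 otherwise.
      return tf.cast(reading > threshold, tf.float32)

    @tff.federated_computation(
        tff.FederatedType(tf.float32, tff.CLIENTS),
        tff.FederatedType(tf.float32, tff.SERVER))
    def get_fraction_over_threshold(readings, threshold):
      # Broadcast the threshold, map the comparison, average the 1s and 0s.
      return tff.federated_mean(
          tff.federated_map(exceeds_threshold,
                            [readings, tff.federated_broadcast(threshold)]))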

  • So now that we're on a roll, let's try federated training.

  • And I'm going to show just a small example of what

  • we have described in a tutorial on the TensorFlow website.

  • And I'm going to focus just on the computation that represents

  • a single round of federated averaging,

  • just like what we have discussed at the very beginning

  • of this presentation.

  • So this computation takes three parameters.

  • There's a model on the server that the server

  • wants to feed to the clients.

  • There's a learning rate.

  • Let's make it interesting.

  • And there is a set of on-device data.

  • So the first thing we'll do is, just as

  • before, we broadcast the model and the learning rate

  • from the server to the clients.

  • Now that the clients have everything-- model,

  • learning rate, and their own slice of data--

  • they can perform their client-side training.

  • And likewise, just as in the previous example,

  • we use the federated map operator for that.

  • And the local train function would be another computation,

  • presumably implemented in TensorFlow,

  • that I won't show.

  • It would look as it always does.

  • And finally, so the map function produces

  • a set of client-side models, locally trained models.

  • And now we just call the federated mean operator to average them out.

  • You can apply that operator to any kind of value,

  • including structured values.

  • So that's it.

  • The output is the average of client side models.

  • And that's the algorithm that we have.

  • And so that's the whole program.

  • And in the version of it in the tutorial,

  • you can see how that actually runs and works.
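
A sketch of that round, close to the tutorial version; MODEL_TYPE, LOCAL_DATA_TYPE, and the local_train tf_computation are assumed to be defined as they are there:

    @tff.federated_computation(
        tff.FederatedType(MODEL_TYPE, tff.SERVER),
        tff.FederatedType(tf.float32, tff.SERVER),
        tff.FederatedType(LOCAL_DATA_TYPE, tff.CLIENTS))
    def federated_train(model, learning_rate, data):
      # Broadcast model and learning rate, train locally, average the results.
      return tff.federated_mean(
          tff.federated_map(local_train, [
              tff.federated_broadcast(model),
              tff.federated_broadcast(learning_rate),
              data
          ]))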

  • So this was, of course, a very simplified example.

  • How can we start extending it and making it more interesting?

  • Just two very short examples I'm going to show.

  • One common thing to do is wanting

  • to inject compression in various places

  • to address various kinds of systems concerns.

  • And so, for example, if you want to compress data

  • during broadcast, apply encoding on the server

  • before you broadcast, and then use a federated map

  • function to decode on the clients after broadcast.

  • And so you can see how basically two lines of code

  • get you what you want, with the decode and encode presumably

  • being implemented in TensorFlow.
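
A sketch of those two lines, with encode and decode as hypothetical tf_computations (early TFF used tff.federated_apply for server-placed values; newer releases accept tff.federated_map at either placement):

    compressed = tff.federated_map(encode, model)             # on the server
    model_at_clients = tff.federated_map(
        decode, tff.federated_broadcast(compressed))          # on the clients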

  • Second example, if you want differential privacy,

  • very easy.

  • Before you call federated mean to average your values,

  • you just apply a federated map operator to the arguments

  • to add some clipping and noise--

  • I'm representing it here symbolically,

  • but that's something that you would normally just

  • write in TensorFlow.

  • So again, one line of change for a change like this.

  • And you can sort of imagine other modifications

  • you can do like this.
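
A sketch of that change, assuming a flat two-element update for brevity; local_models stands for the client values produced earlier, and the constants are assumptions:

    CLIP_NORM, NOISE_STDDEV = 1.0, 0.1  # assumed constants

    @tff.tf_computation(tff.TensorType(tf.float32, [2]))
    def clip_and_noise(update):
      # Ordinary TensorFlow: bound the update's norm, then add Gaussian noise.
      clipped = tf.clip_by_norm(update, CLIP_NORM)
      return clipped + tf.random.normal(tf.shape(clipped), stddev=NOISE_STDDEV)

    # The one-line change: map over the client values before averaging.
    noisy_models = tff.federated_map(clip_and_noise, local_models)
    federated_model = tff.federated_mean(noisy_models)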

  • So how can you run it?

  • Even though I mentioned TFF code is not Python,

  • you can call it in Python like a function.

  • And it runs in Python.

  • What happens under the hood, we spawn a little runtime for TFF

  • and run a simulation there and return the numbers in Python

  • so it works seamlessly as if it were Python.

  • So in this case, if you, let's say,

  • want to run five rounds of training,

  • this is how we'd write it.

  • It's just kind of what you would expect.

  • And a full version of it is, again, in the tutorial.

  • So you just call the computation and get the numbers back.

  • And the model is represented as a NumPy structure.
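
A minimal sketch of such a loop, with initial_model and federated_train_data assumed to be defined as in the tutorial:

    model = initial_model  # e.g., all zeros
    learning_rate = 0.1
    for round_num in range(5):
      # Each call runs one simulated round and returns NumPy structures.
      model = federated_train(model, learning_rate, federated_train_data)
      print('finished round', round_num + 1)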

  • Where do you get data for simulations?

  • You can, of course, make your own.

  • But we also provide a couple of data sets, and many more are

  • on the way, in the tff.simulation.datasets module.

  • Each of these has a load data function.

  • When you call it, you get a pair of Python objects

  • that represent training and test data.

  • And these objects allow you to inspect them.
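
For example, the federated EMNIST data set ships with the package:

    # Each returned object is keyed by simulated client IDs.
    emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()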

  • Now, I mentioned before, TFF computations

  • don't let you deal with individual clients

  • or their IDs.

  • So things like inspecting

  • which clients are in my data set-- that's something

  • that you can only do when orchestrating

  • your simulation in Python.

  • You cannot do it in TFF, for privacy reasons.

  • So in this case, you can look at the client IDs,

  • for example, so that you can simulate

  • what I discussed previously, the client selection.

  • So here you're taking all the clients,

  • and this is picking a random sample of them.

  • Those are my clients for this round.

  • And now I call the trainer object

  • to construct a tf.data dataset-- this is an eager dataset

  • in TensorFlow-- for that particular client

  • and apply whatever pre-processing you want

  • using the regular tf.data APIs.

  • And once you create a list of those-- those are my clients,

  • those are my data sets--

  • you can feed it as an argument into the computation

  • just as I've shown before.

  • And you continue fleshing out your little Python loop.

  • So it's very easy, very natural to do.
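
A sketch of that loop body; preprocess is a hypothetical helper built from the ordinary tf.data APIs (batching, mapping, and so on):

    import random

    # Simulated client selection: a random sample of clients for this round.
    sampled_ids = random.sample(emnist_train.client_ids, 10)

    # One tf.data.Dataset per selected client.
    federated_train_data = [
        preprocess(emnist_train.create_tf_dataset_for_client(client_id))
        for client_id in sampled_ids
    ]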

  • If you don't want to implement everything from scratch, as we

  • sort of did in this tutorial, you

  • might use one of the canned APIs,

  • like the tff.learning module.

  • So for example, here's one function

  • that constructs federated training computations.

  • It's easiest to use with Keras if you have Keras.

  • You don't have to use Keras, but it's much easier if you do.

  • So if you have a Keras model, you just

  • call a one-liner function to convert it into a form

  • that TFF can absorb.

  • And then the one-liner calls shown here take that model

  • and construct computations that you can use

  • for training and evaluation.

  • And you use them in the same way.

  • You write little Python loops as those that you've seen before.

  • So the trainer object has

  • a pair of computations.

  • Initialize creates state on the server, the initial state

  • for the first round.

  • And then the next computation represents a single round

  • of training.

  • So it will take the initial state before the round started

  • and produce new state after the round completed.

  • And that state includes the model as well as

  • various kinds of counters and things like that.

  • In each round, as you saw before,

  • we can perform client selection and simulate

  • various kinds of system behavior and things like that.

  • So it's very easy to use.

  • And same for evaluation.

  • You can take that final state after training,

  • extract the model out of it, and feed it

  • to the evaluation computation.

  • So the eval is a computation.

  • Again, you just call it like a Python function.

  • And that gets you the metric back, and things like this.
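
A sketch of the whole flow with the canned APIs; exact signatures have shifted across TFF releases, and create_keras_model plus the data sets here are assumed:

    def model_fn():
      keras_model = create_keras_model()  # hypothetical Keras model builder
      return tff.learning.from_keras_model(
          keras_model,
          input_spec=federated_train_data[0].element_spec,
          loss=tf.keras.losses.SparseCategoricalCrossentropy(),
          metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

    trainer = tff.learning.build_federated_averaging_process(
        model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.02))
    evaluation = tff.learning.build_federated_evaluation(model_fn)

    state = trainer.initialize()  # state for the first round
    for _ in range(5):
      state, metrics = trainer.next(state, federated_train_data)  # one round
    eval_metrics = evaluation(state.model, federated_test_data)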

  • So by default, when you just invoke computations

  • like functions, as I've shown, it kind of all just

  • runs on your machine in your process.

  • There are various ways to speed it up.

  • We provide a helpful framework for constructing simulation

  • runtimes.

  • Right now there is one ready-to-use solution.

  • If you want to run multi-threaded simulations,

  • with this snippet of code that I'm showing here-- one line--

  • you create a local executor that has multiple threads in it

  • and then make it the default. And then whatever you type

  • will run in that.
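
The one-liner looked roughly like this at the time of the talk; these tff.framework helpers have since been reorganized under tff.backends in newer releases, so treat this as an assumption about the era's API:

    # Build a local executor backed by multiple worker threads and make it
    # the default for everything invoked afterwards.
    tff.framework.set_default_executor(
        tff.framework.create_local_executor(num_clients=10))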

  • If you want something more powerful, not long from now

  • we'll have a kind of all-inclusive, ready-to-use

  • solution for running things on Google Cloud and Kubernetes

  • in a multi-machine setting.

  • If you don't want to wait for that,

  • you can actually just go and stitch it up yourself.

  • Because all the components are basically there

  • in that tff.framework namespace.

  • And those include various kinds of little executors

  • you can stack up together in an executor

  • stack that you can use to construct

  • the various multi-machine architectures

  • with multiple tiers of aggregation, support for GPUs,

  • and things like that.

  • And it's designed to be extensible

  • so that people can plug in various kinds of components

  • into it.

  • Now, if you want to go beyond just running simulations,

  • it is also possible.

  • For that, the options are still emerging.

  • But the two that already exist are on the table.

  • It may involve a bit of effort, but it's possible.

  • One is you can actually plug in your physical devices

  • into the simulation framework.

  • So for example, you can implement a simple GRPC backend

  • interface that we supply, say, to run on your Arduino device

  • or something.

  • And then you can plug that as a little worker node

  • into a simulation framework.

  • And now you can run on your physical devices.

  • That's not something you would use for a large scale

  • production setting.

  • But it's certainly doable for smaller scale experiments.

  • And also, we have an emerging set of compiler tools

  • that take TFF computations and transform them

  • into a form that's more amenable for execution

  • in a particular kind of backend.

  • So for example, there is a body of code emerging that

  • supports MapReduce-like systems, that

  • takes computations and makes them look like MapReduces so

  • that we can run it on Hadoop or something.

  • It's usable, not quite finished, but somewhat usable.

  • If you're interested in pursuing either of those options,

  • I'd be happy to discuss them.

  • And more deployment options are on the way.

  • I can't really talk about them.

  • But stay tuned for updates.

  • If you need something that we haven't provided,

  • this is intended to be an open framework and a community

  • project.

  • So by all means, please contribute.

  • Just implement it and send us pull requests,

  • so that everyone can benefit.

  • There are many ways you can contribute.

  • If you're a modeler, you can contribute models and data sets

  • and things like that.

  • If you're interested in machine learning-- federated learning

  • algorithms, you can contribute algorithms to the framework

  • or help us re-architect it to make it easier to use.

  • Contribute core abstractions, also new types of backends.

  • As I mentioned, this backend support

  • for actually deploying things is emerging.

  • And if you have ideas, perhaps you can contribute them to TFF.

  • That's all I have.

  • Thank you very much.

  • [APPLAUSE]

  • AUDIENCE: So this sort of changes the way

  • that you create a model.

  • I have two questions about that.

  • When you start with a model, do you start with some [INAUDIBLE]

  • data to create an initial value that you will then

  • start the clients with?

  • And then secondly, do you ever re-deploy the average model

  • back to the clients?

  • Or do clients sort of spin off on their own--

  • CREW: Sorry to interrupt.

  • Do you mind starting over?

  • AUDIENCE: So the two questions are,

  • when clients start learning on their own data

  • and then you have an averaged model on the server,

  • do you ever send the averaged model back

  • to the clients for performance boost?

  • Or do clients just spin off on their own afterwards?

  • And then the second question is, how do you start the model?

  • Do you use proxy data initially?

  • And how do you iterate with your model's accuracy and things

  • like that?

  • KRZYSZTOF OSTROWSKI: Yeah.

  • So for the first question, in a system

  • we have running in production, the way it works--

  • and that's different from TFF.

  • That's just a deployed platform.

  • And so there are many ways you can engineer this.

  • But just talking about the particular example,

  • our production system, the clients periodically

  • come back to the server.

  • So every time clients get involved

  • in a new round of training, they automatically

  • get that new model.

  • So that's one way you can arrange for this to happen.

  • That's probably the easiest.

  • So you're kind of contributing as well as benefiting

  • by getting the latest.

  • And the other question was how do you

  • get started on building models.

  • And so, if you do have proxy data and you think it's useful,

  • then it certainly helps to play with it.

  • At least you can get some idea of what

  • model architectures are good.

  • You can never be sure because proxy data is only so good.

  • And if you never looked at the on-device data,

  • you'd never really know for sure how good

  • your proxy data might be.

  • So you might use proxy data.

  • But you might also choose not to.

  • You can simply try different model architectures,

  • deploy them on devices, as I mentioned, in dry mode.

  • So it would be kind of running on devices

  • and getting evaluated but not affecting

  • anything other than consuming a bit of resources.

  • You could deploy hundreds of those at the same time

  • on different subsets of the clients

  • and see which are the most promising.

  • That second route would be more of a pure approach that

  • applies to any kind of on-device data,

  • including when you have absolutely no idea

  • where to get proxy data.

  • Like some weird sensor data might look like that.

  • And both are possible.

  • AUDIENCE: So first question I have is, does the TFF library--

  • does it integrate with TF Lite?

  • And the second question I have is,

  • since it's language platform agnostic,

  • are you able to use it in non-Python--

  • can I use it in a language that's not Python?

  • KRZYSZTOF OSTROWSKI: OK.

  • Let me start from the second one.

  • So TFF computations are not Python.

  • I think I had a link on the slide.

  • If not, I can follow up later.

  • There's a protocol definition that describes

  • what a TFF computation is.

  • And it's a data structure that has absolutely no relationship

  • to Python.

  • So yeah, you could take it and you could execute it

  • in a completely different environment that

  • has nothing to do with Python.

  • And TensorFlow code inside of that computation

  • is represented as GraphDefs, TensorFlow GraphDefs.

  • So if you were to run it on a different kind of TensorFlow

  • runtime, to the extent you can take those GraphDefs

  • and convert them for that other runtime,

  • maybe converting the ops or whatever,

  • that's also an option.

  • So TFF itself doesn't integrate with TF Lite because TFF

  • itself does not include a platform

  • for on-device execution.

  • TFF is more like--

  • the best way to think of it is more like a compiler framework

  • in a dev environment.

  • But yes, you could use it with TF Lite.

  • So you could define your computations

  • and maybe apply some conversion tools

  • to convert all the TensorFlow computations into a form

  • that TF Lite can absorb and then

  • arrange for it to be executed.

  • AUDIENCE: Thank you.

  • AUDIENCE: Good talk.

  • Thank you.

  • I had a couple of questions.

  • So does the client--

  • do the models train until convergence?

  • KRZYSZTOF OSTROWSKI: Say that again.

  • Clients--

  • AUDIENCE: The clients, do they train until convergence?

  • Do they, or--

  • KRZYSZTOF OSTROWSKI: No.

  • Typically, you would make a few passes over the client data

  • sets.

  • Because you don't have to train for convergence.

  • You're going to run 10,000 rounds anyway.

  • So doesn't matter.

  • AUDIENCE: And when the average model doesn't have access

  • to the data, how do you measure its performance and how

  • do you know it's good enough to now deploy--

  • send it back to all the clients?

  • KRZYSZTOF OSTROWSKI: Sorry.

  • If average model is--

  • AUDIENCE: So the average model is on your local server.

  • And then you don't have access to the data.

  • How do you measure the performance

  • of the average model?

  • How do you know when to deploy that model back?

  • KRZYSZTOF OSTROWSKI: Yeah.

  • So I did not describe federated evaluation.

  • But basically, it's like the temperature sensor example.

  • You can take that model, broadcast it to the clients.

  • Now the clients have the model and the data.

  • They can evaluate.

  • Each produces some accuracy metric,

  • average those out or compute a distribution.

  • And there you go.

  • So federated evaluation is kind of the same idea, just simpler.

  • AUDIENCE: OK.

  • And another question was, is there a way in federated

  • learning in TensorFlow where you can share parts of--

  • for example, the clients--

  • KRZYSZTOF OSTROWSKI: Sorry.

  • Share what?

  • AUDIENCE: So the clients have different labels, let's assume,

  • but they have similar data.

  • Is there a mechanism where the clients share most

  • of the model, but have a couple of layers of their own?

  • Maybe the last layer of the network

  • is specific to the client but not shared across clients.

  • Or does the entire model have to be shared across all clients?

  • KRZYSZTOF OSTROWSKI: Yeah.

  • It's not a capability that we include at the moment.

  • But it sounds like conceivably something we could do.

  • Maybe you can follow up with that.

  • Maybe you can contribute.

  • AUDIENCE: Thanks.

  • AUDIENCE: So one question that I had was,

  • when you kind of aggregate all of these models

  • into a central server, it seems like one of the problems

  • that federated learning solves is, I guess,

  • distributing computation.

  • But when you get to like a million people using the Google

  • keyboard, or a lot more actually,

  • it seems like either the server is

  • going to have to reject some gradient computations,

  • or there is some hierarchical aggregation

  • system where you aggregate the models upstream or whatever.

  • So I'm wondering if the second is true.

  • Are there latency issues with gradients reaching

  • the central model by the time that the model's changed

  • so much that it might corrupt it a little bit?

  • KRZYSZTOF OSTROWSKI: So a couple of things.

  • First, this is not the same as gradient descent in the sense

  • that each client does a whole bunch of computation.

  • It trains for a while.

  • So what clients send to the server are not gradients.

  • They're updates, differences between trained models

  • and initial models that reflect a whole bunch of local

  • training.

  • That's just one thing.

  • The second one, with respect to which

  • clients have to participate in computation, so not

  • all clients.

  • If you, say, have 1 million clients,

  • you could pick a sample of 1,000 clients.

  • And first make an iteration of the model on the first 1,000

  • clients.

  • Then make an iteration on another 1,000 clients.

  • You don't have to include all the clients at once.

  • The only thing that matters is that eventually most clients

  • participate, so that most clients have a chance

  • to influence the training process at some point.

  • But they don't have to simultaneously be present.

  • But with respect to hierarchical aggregations, that's also true.

  • So both are true.

  • You do have hierarchical aggregations in our system

  • because you don't want a single server to be

  • talking to 10,000 machines.

  • But you also don't have to include the entire population

  • in training.

  • I think I answered all of them.

  • All right.

  • Thank you.

  • [APPLAUSE]
