
  • SUJITH RAVI: Hi, everyone.

  • I'm Sujith.

  • I lead a few machine learning teams in Google AI.

  • We work a lot on--

  • how do you do deep networks and build machine learning systems

  • that scale on the cloud with minimal supervision?

  • We work on language understanding, computer vision,

  • multi-modal applications.

  • But also we do things on the edge.

  • That means, how do you take all these algorithms

  • and fit them to compute- and memory-constrained devices

  • on the edge?

  • And I'm here today with my colleague.

  • DA-CHENG JUAN: Hi, I'm Da-Cheng.

  • And I'm working with Sujith on neural structured learning

  • and all the related topics.

  • SUJITH RAVI: So let's get started.

  • You guys have the honor of being here for the last session

  • of the last day.

  • So kudos and bravo--

  • you're really dedicated.

  • So let us begin.

  • So we are very excited to talk to you

  • about neural structured learning, which

  • is a new framework in TensorFlow that

  • allows you to train neural networks

  • with structured signals.

  • But first, let's go over some basics.

  • If you are in this room and you know about deep learning

  • and you care deeply about deep learning,

  • you know how typical neural networks work.

  • So if you want to take, for example, a neural network

  • and train it to recognize images and distinguish

  • between concepts like cats and dogs, what would you do?

  • You feed images like this on the left side, which

  • looks like a dog, and give the label "dog,"

  • and feed it to the network.

  • And the process by which it works

  • is you adjust weights in the network such

  • that the network learns to distinguish and discriminate

  • between different concepts and correctly tag the image

  • and convert the pixels to a category.

  • This is all great.

  • All of you in this room probably have built a network.

  • How many of you have actually built a neural network?

  • Great!

  • And this is the last session, last day--

  • but still, we're very happy that you're all with me here.

  • So it's all great.

  • We have a lot of fancy algorithms,

  • very fancy networks.

  • What is the one core ingredient that we

  • need when we build a network?

  • So for almost all of the applications that we work on,

  • we require labeled data, annotated data.

  • So it's not one image that you're feeding to this network.

  • You're actually taking a bunch of images paired

  • with their labels, cats and dogs in this case,

  • but of course it could be whatever,

  • depending on the application.

  • But we feed it thousands or hundreds

  • of thousands or even millions of examples into the network

  • to train a good classifier, right?

  • Today, we're going to introduce neural structured learning,

  • which is a framework.

  • We're happy to say it's support an in TensorFlow 2.0 and Keras.

  • And it allows you to train better and more robust

  • neural networks by leveraging structure in the data.

  • So the core idea behind this framework

  • is that we're going to take neural networks

  • and feed it, in addition to feature

  • inputs, structured signals.

  • So think of the abstract image that I showed you earlier.

  • Now in addition to these images paired with labels,

  • you're going to feed it connections or relationships

  • between the samples themselves.

  • I will get to what these relationships might mean.

  • But you have these structured signals and the labels.

  • And you feed both into the network.

  • You might ask, what do you mean by structure, right?

  • Structure is everywhere.

  • In the example that I showed you earlier,

  • if you look at images--

  • just take a look at this graph here.

  • The images that are connected via edges in this picture

  • here basically represent that there's some visual similarity

  • between them.

  • So it is actually pretty easy to construct structured signals

  • from day-to-day sources of data.

  • So in the case of images here, it's visual similarity.

  • But you could think of--

  • what if you tag your images and created

  • albums that represent some specific concepts?

  • So everything within an album or a photo album

  • has some sort of a connection or interaction or relationship

  • between them.

  • So that represents another type of structure.

  • It's not just for images.

  • We can go to more advanced or completely different

  • applications, like, if you want to take scientific publications

  • or news articles and you want to tag them with their topic--

  • one simple thing.

  • Take biomedical literature.

  • All the papers that are published,

  • whether it's Nature or any of the conferences,

  • they have references and citations to other papers.

  • That represents another type of structure or link.

  • So these are the kind of structures

  • that we're talking about here, relationships

  • that are exhibited or modeled between different types

  • of objects.

  • In the natural language space, this occurs everywhere.

  • If you're talking about doing Search,

  • everybody has heard of Knowledge Graph, which

  • is a rich source of information, which captures relationships

  • between entities.

  • So if I talk about the concept Paris and France,

  • the relationship is one is the capital of the other.

  • So these sorts of relationships--

  • it's not typical to capture them and feed them

  • into a neural network.

  • But these are the kind of relationships

  • which are already existing in day-to-day data sources.

  • So why not leverage them?

  • So that is what we try to do with neural structured learning.

  • And the key advantages--

  • before we talk about what it does and how we do it,

  • why do you even want to care about it?

  • What is the benefit?

  • So one of them is, as I mentioned earlier,

  • it allows you to take this structure

  • and use it to train neural networks with less

  • labeled data.

  • And that's the costly process, right?

  • So every application that you want to train,

  • if you had to collect a lot of rich annotated data

  • at scale for millions of examples,

  • it's going to be a tedious task.

  • Instead, if you're able to use a framework,

  • like neural structured learning, that automatically captures

  • the data's structure and relationships, then

  • with minimal supervision you're able to train

  • classifiers or prediction systems with the same accuracy.

  • That would be a huge boon.

  • Who wouldn't want that, right?

  • That's one type of a benefit.

  • Another one is that, typically, when you deploy these systems

  • in practice, in real-world applications,

  • you want the systems or networks to be robust.

  • That means, once you train them, you don't want--

  • if the input distribution changes or the data suddenly

  • changes or somebody corrupts the images with adversarial

  • attacks--

  • the network to suddenly flip its predictions and go bonkers.

  • So this is another benefit where,

  • if you use neural structured learning,

  • you can actually improve the quality of your network

  • and also the robustness of the network.

  • So let me dive a little deeper and give

  • you a little more insight into the first scenario.

  • Take document classification as an example.

  • So I'll give you an example that you probably

  • have at your home.

  • Imagine you have a catalog or a library of books in your home.

  • And these are digitized content.

  • And you want to categorize them and neatly arrange them

  • into specific topics or categories.

  • Now one person might want to categorize them

  • based on the genre.

  • A different person might say, oh,

  • I want it to belong to the same period.

  • A third person might say, oh, I want

  • to capture the books that have the same kind of content

  • or phrases or words, or that are by the same author,

  • or based on some certain aspect--

  • like a particular plot twist

  • that is captured in different books.

  • And I want to arrange them based on that.

  • So you can think of--

  • not everybody's needs are the same.

  • So are you going to collect and annotate

  • enough labeled data for each of those tasks

  • to create a network that, with very high accuracy,

  • distinguishes and classifies them into these genres?

  • Probably not.

  • So on the one hand, you have plenty of data.

  • The raw book content is available to you.

  • Or raw news articles are available to you.

  • There's plenty of raw text available to you.

  • But it's hard to construct this labeled annotated data.

  • So this is where Neural Structured Learning, or NSL, comes in--

  • going back to the previous example,

  • we can model the relationships between these different inputs

  • or samples, using structure.

  • Again, I will tell you what the structure means.

  • But pair it with a few label examples

  • and then train a network that is almost as

  • good as the network that is trained

  • on millions of examples.

  • So imagine you could, for your application, only use 5% or 10%

  • of the labeled data and train as good

  • a classifier or prediction system.

  • So that's what we're trying to do

  • and help you do with neural structured learning.

  • And I said, structure-- how do you

  • come up with this structure?

  • Or how do you do this for document classification?

  • I'm going to give you a forward reference

  • here that Da-Cheng is going to talk a little bit more

  • about as well.

  • So there is a hands-on tutorial for the exact example

  • that I told you about, document classification with a graph.

  • And you can go to the Neural Structured Learning TensorFlow

  • website and try this out for yourself,

  • all within just a few lines of code.

  • All you need to do is construct the data in the right format.

  • And with a few lines of code, you

  • should be able to run the system end-to-end.

  • Switching to the second scenario,

  • it's great to have high-quality models.

  • We have seen that neural networks have the capacity

  • to train really, really good quality models,

  • especially as you increase the number of parameters.

  • But robustness is an important concept and, actually,

  • an important criterion when we deploy these

  • to real-world scenarios.

  • For example, if you have an image recognition system

  • and suddenly the network flips its prediction,

  • because the images are corrupted,

  • this is not a good thing.

  • And this is not just for image classification.

  • It could be with text or any kind of sensor data.

  • Here is an example.

  • Take the image on the left.

  • What do you think it is?

  • It's a panda.

  • Take the image on the right.

  • What do you think it is?

  • Anyone who disagrees it's a panda?

  • Basically, your neural network is

  • saying that both of these images are pandas.

  • But that's not what happens if you actually train

  • a neural network, like, for example, a ResNet or whatever

  • the state-of-the-art network is,

  • and you try to apply it to the two images.

  • The first one would be correctly recognized as a panda.

  • The second one would be recognized

  • by a completely different concept,

  • in this case, a gibbon.

  • And the reason it happens is, if you zoom in really close,

  • the second image is actually an adversarial example.

  • It's actually created by adding some noise

  • to the original image.

  • But these changes are so tiny,

  • they're imperceptible to the human eye.

  • But the network, based on these changes in the pixels,

  • flips its prediction completely.

  • Imagine this happening in a live system.

  • So you don't want these things to happen.

  • So you want your networks to be robust.

  • So this is where NSL comes in again.

  • And again, we're going to use the same concept--

  • use the structure and the data to train more robust models.

  • And here, the structure is slightly different from the one

  • that I mentioned earlier.

  • Earlier, we were talking about explicit relationships or model

  • as a graph.

  • Here, we are going to construct the structure.

  • So we take the original image.

  • And we also generate a perturbed image,

  • or an adversarial example for that original image.

  • Now these two images are joined via a link.

  • So there is a relationship between them.

  • The difference is, this is dynamically

  • generated during the learning process,

  • as opposed to the earlier case where somebody gives you

  • a knowledge graph or these structured signals

  • or you construct them from some data source.

  • And what we try to do here is, using this structure,

  • we already know the label for the first image is a panda.

  • We're trying to force the network

  • to learn that the perturbed image also should

  • be classified as a panda.

  • So that's, at a very high level, how this works.

  • Again, if you want to try this out for any of your networks

  • or in any application, there's a tutorial

  • which allows you to do this-- like, use the API.

  • So NSL enables both the neural graph type of learning

  • and also the adversarial learning.

  • And you can just go to the website

  • and run through the code example, all

  • with just a few lines of code.

  • You will see more details later in the talk.

  • So this is at a high level why we

  • would want a framework like NSL and the power of using

  • it to enable more robust networks

  • and also build networks that can be trained

  • with minimal supervision.

  • These are very, very handy when you want to build applications

  • on the fly and very, very custom applications that

  • do not fit the regular mold.

  • Let us now dive a little deeper into how we do this--

  • what the framework is doing.

  • So in the first paradigm, as I said,

  • structure is going to be modeled and used as an input

  • to the network.

  • So we call this paradigm Neural Graph Learning.

  • So the core idea is that, in addition to

  • these feature inputs that you're familiar with, like pixels

  • for image classification or word features

  • or phrase features or sentence features for document

  • classification-- in addition to those feature inputs,

  • you're going to pass in structured signals modeled

  • as a graph.

  • You might ask, by the way, at this point,

  • where's the graph coming from in this setting, right?

  • In some cases, it might be given to you, as I said,

  • like the citation graph, knowledge graph.

  • In other cases, you can actually construct a graph.

  • We're very happy to say that we provide tools--

  • again, you will hear more about this--

  • that allow you to construct these graphs from sources

  • of data like word embeddings or image embeddings.

  • So now the goal here in neural graph learning is--

  • the network is going to be forced to jointly optimize

  • both the feature input and the structured signal

  • simultaneously.

  • Let's see how that happens, diving in deeper.

  • If you're trying to look at what exactly

  • the network is learning, every network

  • is trying to optimize some loss.

  • So in image classification, what is the loss

  • when you take the pixels, pass through the network,

  • get some prediction-- what is the error incurred

  • between the predictions and the true label?

  • So in NSL, in the Neural Graph Learning setting--

  • we call networks trained in this mode Neural Graph

  • Machines.

  • What we're trying to optimize is two components.

  • One is the standard loss, which in image classification

  • is the loss incurred when you pass

  • the pixels through the network, get the predictions,

  • and measure the error with the true labels.

  • The other component is going to be

  • based on the structured signal or the graph that you provided.

  • And where that comes in is--

  • if I have an image that looks like a pit bull dog,

  • that's labeled as a pit bull.

  • If I have a different image, which

  • through my structured signal has an edge

  • with the original image, then the network

  • is forced to learn that the source

  • image and its neighbor in the graph

  • should learn similar representations.

  • That means, you're trying to say,

  • respect the structure that you provide as input

  • and, also, try to optimize the supervised loss.
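
A sketch of this joint objective in symbols, following the Neural Graph Machines formulation the speakers reference (the multiplier alpha, edge weights w_uv, and distance d are notation assumed here for illustration, not shown in the talk):

    \mathcal{L}(\theta) = \sum_{i \in \mathcal{D}} \ell\big(y_i, f_\theta(x_i)\big)
        + \alpha \sum_{(u,v) \in \mathcal{E}} w_{uv}\, d\big(h_\theta(x_u),\, h_\theta(x_v)\big)

The first term is the usual supervised loss; the second penalizes graph neighbors whose hidden representations h_theta drift apart.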

  • Now this is very flexible, as you can imagine.

  • You can use the API to change the formulations.

  • That means, instead of a supervised loss,

  • if you want to do unsupervised learning or a very different

  • kind of loss, you can actually change the first component

  • very easily.

  • And just another quick note--

  • we don't have time for that.

  • But we're happy to answer more questions at the end.

  • The losses themselves, the type of losses,

  • are also customizable.

  • You can use L2 loss, cross entropy,

  • depending on the kind of applications.

  • You can even use ranking losses, if you will.

  • So this now makes it very, very easy for you

  • to train a wide range of applications

  • in a different learning setting, whether it's unsupervised,

  • supervised, or ranking, or classification.

  • But at the same time, you'll be able to pass in some structure

  • in a seamless manner.

  • Here is an example.

  • So for NSL neural graph learning--

  • take image classification.

  • You start with some samples, as I said--

  • the pixels.

  • And you also have a structure.

  • In this case, the images are connected

  • in the graph, based on some user interaction signal

  • or, basically, for example, as I showed you,

  • they belong to the same album.

  • Or there's some structure tying them together.

  • Assuming this is given to you, we

  • pass this through the network.

  • Both the sample and its neighbors

  • are passed simultaneously through the same network.

  • And the network is learning to optimize within each layer--

  • and this is also configurable, by the way--

  • to push the embedding for neighbors closer to each other.

  • That means two images that are connected in the graph

  • should learn similar embeddings when

  • passed through the network.

  • Simultaneously, you should also optimize so that they

  • learn the correct predictions.

  • So if one of them was labeled as a panda,

  • then you also want the prediction error to be minimal.

  • So both of these parts are being optimized jointly.

  • OK-- so hopefully this gives you an idea

  • of how we use neural graph learning

  • and enable this in the neural structured learning

  • framework.

  • As I mentioned, structure can come in different forms.

  • That was an explicit structure we provided as a graph input.

  • But we can also do implicit structures.

  • And this is where the adversarial learning

  • type of paradigms are enabled, using the NSL framework.

  • And here again, we're going to jointly optimize

  • features and structure.

  • Except the difference is the structure

  • is now induced during the learning process,

  • by constructing adversarial examples from the original input.

  • So if you have x_i as an input, you create

  • x_i', which is an adversarial version of that.

  • And these two are connected with some sort of weight--

  • this is configurable.

  • And this structure is now passed through the network.

  • And the network is forced to optimize both of them

  • to the same embeddings or representations inside.
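
One common way to generate such a perturbed neighbor, consistent with the gradient-based description here, is the fast-gradient-sign construction (the step size epsilon is an assumed hyperparameter):

    x_i' = x_i + \epsilon \cdot \operatorname{sign}\big(\nabla_{x_i}\, \ell\big(y_i, f_\theta(x_i)\big)\big)

The perturbation points in the direction that increases the loss the most, so forcing x_i and x_i' toward the same representation trains the network against exactly those worst-case changes.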

  • These are all great.

  • So as I mentioned, this opens up a host

  • of new kind of application or training scenarios.

  • The best part about this is, if you're thinking, oh, now

  • how does this work with Transformers or ResNets

  • or different kinds of networks--

  • network structure doesn't matter here.

  • You can use this with any type of network structure.

  • That's the best part--

  • RNNs, Transformers, ResNets, convolutions.

  • You can have combinations of CNNs, LSTMs.

  • It doesn't matter.

  • These are learning strategies.

  • You can actually build a network but enable

  • NSL, both in the adversarial and in the neural graph setting

  • very easily with very few lines of code in TF 2.0.

  • And to tell you more about that, I'm

  • handing it over to Da-Cheng.

  • DA-CHENG JUAN: All right, thank you, Sujith.

  • Next, we are going to introduce the libraries, tools,

  • and trainers provided by the neural structured learning framework.

  • Everything here is compatible with TensorFlow 2.0.

  • So you could train the neural nets with structured signals

  • while enjoying all the great features from TensorFlow 2.0.

  • This is the training workflow we just mentioned previously.

  • Every segment in red here is a new step

  • introduced to the workflow to train with structured signals.

  • And neural structured learning provides libraries and tools

  • for these steps.

  • Let's first take a look at the left part of the workflow.

  • The training samples and neighbors

  • from the same neighborhood are packed to form the new batch.

  • Notice that, in the batch, each training sample

  • is extended to include the neighborhood information.

  • To achieve this in a neural structured learning framework,

  • we provide standalone tools, such as build_graph

  • and pack_nbrs, that a user could invoke directly.

  • We also provide functions that users could integrate

  • into their own custom pipeline.

  • And you may notice, build_graph and pack_nbrs

  • here are listed both as binaries and functions.

  • This is not a typo.

  • This means they can be invoked either as a binary

  • or as a function.
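
As a rough illustration of the binary form (the file paths are hypothetical, and the exact flags should be checked against the NSL documentation):

    python -m neural_structured_learning.tools.build_graph \
        --similarity_threshold=0.8 embeddings.tfr graph.tsv

    python -m neural_structured_learning.tools.pack_nbrs \
        --max_nbrs=3 train.tfr '' graph.tsv nsl_train.tfr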

  • Next, let's take a look at the right part of our figure.

  • Again, we provide libraries for these new steps, introduced

  • to enable graph regularization.

  • Both the training sample and its neighbor

  • will be fed to the neural network.

  • And unpack_neighbor_features is for this purpose.

  • The model in this illustration is a convolutional neural net.

  • But it can be any type of neural network,

  • not just limited to the convolutional neural net.

  • Then the difference between the sample and its neighbor

  • embedding is calculated and added

  • to the final loss as the regularization term.

  • In addition, we also provide libraries

  • to generate adversarial neighbors,

  • as implicit structured signals for regularization.

  • Finally, we also provide Keras APIs

  • for a user to easily build Keras trainers

  • with graph_regularization or adversarial_regularization.

  • The Keras API from neural structured learning

  • supports all three types of model building,

  • either via sequential, via functional API,

  • or via subclassing.

  • This is just a subset of tools and libraries

  • we provided in the neural structured learning framework.

  • Please visit our website to learn more

  • about the tools and APIs in neural structured learning.

  • The first step, if you want to use neural structured learning,

  • is to do a pip install.
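
The package name on PyPI is neural-structured-learning:

    pip install neural-structured-learning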

  • Here, we provide a code example, demonstrating the API

  • from the Neural Structured Learning library.

  • We first need to read the training data.

  • Note that the data here are pre-processed by the tools

  • or functions to incorporate the graphs into training samples.

  • Next, the user could build a custom model

  • and treat it as the base model.

  • The user could build this base model

  • using any of their favorite Keras APIs,

  • like we just mentioned-- sequential, functional,

  • or subclassing.

  • After the base model is built, we

  • use the API to wrap around the base model

  • to enable graph_regularization.

  • There are several hyperparameters

  • we need to configure.

  • For example, we need to specify the maximum number of neighbors

  • considered during our regularization.

  • Also, for each hyperparameter, we

  • provide default values

  • that we know, empirically, work well.

  • After we enable graph_regularization

  • in the Keras model, the rest is just a standard Keras workflow--

  • compile, fit, and then eval.

  • That's it.

  • Within five lines, we are able to enable graph_regularization.

  • And those five lines actually include one line

  • that's a comment, not actual logic.
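
A minimal sketch of those five lines, assuming train_dataset and test_dataset have already been packed with neighbor features as described above (the base model and its shapes are illustrative only):

    import tensorflow as tf
    import neural_structured_learning as nsl

    # Any Keras model works as the base: sequential, functional, or subclassed.
    base_model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(256,)),
        tf.keras.layers.Dense(2, activation='softmax'),
    ])

    # Wrap the base model to enable graph regularization.
    # max_neighbors bounds how many neighbors per sample enter the loss.
    graph_config = nsl.configs.make_graph_reg_config(max_neighbors=2,
                                                     multiplier=0.1)
    graph_model = nsl.keras.GraphRegularization(base_model, graph_config)

    # From here on, the standard Keras workflow: compile, fit, eval.
    graph_model.compile(optimizer='adam',
                        loss='sparse_categorical_crossentropy',
                        metrics=['accuracy'])
    graph_model.fit(train_dataset, epochs=5)
    graph_model.evaluate(test_dataset)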

  • Here, let us show some results of a model trained

  • with structured signals.

  • The task is to conduct the sentiment analysis

  • on the IMDB movie reviews.

  • We want to point out that this result is just from one

  • of our internal experiments.

  • Your actual mileage may vary from task

  • to task, from data to data, or from model to model.

  • The x-axis here represents the amount

  • of supervision, which could be converted

  • into the number of labeled examples.

  • And y-axis here represents the model accuracy.

  • The left figure shows the performance

  • of a bi-directional LSTM.

  • And the right figure shows the performance

  • of a feed forward neural net.

  • As you can see, when we have lots of training examples,

  • when the amount of supervision is high,

  • there is actually not much performance difference.

  • But as soon as the amount of supervision

  • drops to 5% or even 1%, training

  • with structured signals leads to more accurate models.

  • Usually, the improvement is more than 10%.

  • If you are interested in more results, please,

  • refer to our paper.

  • So training with structured signals sounds really great.

  • But sometimes, we do not have a structure.

  • We do not have a graph to begin with.

  • So what should we do?

  • Neural structured learning provides two methods.

  • The first one is to construct the graph

  • or to construct the structure via data pre-processing.

  • And the second one is to construct such structure

  • via adversarial neighbors.

  • Let's focus on the data pre-processing one first.

  • Again, let's take document classification as an example.

  • Given a sample document, how do we

  • know if another document is similar enough

  • to be a neighbor document?

  • These documents will be projected

  • to the embedding space.

  • For example, we could use the pre-trained BERT

  • embeddings mentioned in the earlier TensorFlow talk

  • to project all these documents

  • into the embedding space.

  • Documents that are closer in the embedding space

  • are assumed to have similar semantics.

  • Next, we examine the similarity between two embeddings--

  • cosine similarity or another metric could be used here.

  • If the similarity is higher than a predefined threshold,

  • we treat these two documents as similar enough.

  • And therefore, we add an edge between these two documents

  • to make them neighbors.

  • By repeating this process, we could construct a structure

  • or construct a graph among all the data via the data

  • pre-processing.
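
Conceptually, this pre-processing amounts to something like the following sketch (a naive all-pairs loop; faster approximate methods exist, as noted in the Q&A at the end):

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine of the angle between two embedding vectors.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def build_edges(embeddings, threshold=0.8):
        # Connect every pair of documents whose embedding similarity
        # clears the threshold, making them neighbors in the graph.
        edges = []
        for i in range(len(embeddings)):
            for j in range(i + 1, len(embeddings)):
                if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                    edges.append((i, j))
        return edges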

  • After we have the graph, the rest of the training flow

  • is exactly the same as we mentioned before,

  • as in the case where the graph is given.

  • Let's again take a look at the actual code example.

  • We first load the training data and test samples

  • from the IMDB data set.

  • Next, we load the pre-trained embedding model

  • from the TF Hub.

  • The embedding model we use here is a Swivel model.

  • But feel free to replace that with

  • your favorite pre-trained embedding

  • model, such as BERT.

  • Next, we project the text or the document

  • of each review from IMDB to the embedding,

  • so we could calculate a similarity between two reviews

  • in the embedding space.

  • Remember, when two reviews are closer in the embedding space,

  • we assume they share similar semantics.

  • After we project the text to embeddings,

  • we use the build_graph function provided

  • by neural structured learning

  • to construct the graph.

  • When invoking this function, we also

  • need to provide a similarity threshold, which

  • is 0.8 in this case.

  • After we have the graph, we call the pack_nbrs function

  • to incorporate the neighbor samples into each training

  • sample.

  • Here, for each sample, three neighbors are considered.
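
In code, those two steps look roughly like this (the file paths are hypothetical placeholders):

    import neural_structured_learning as nsl

    # Add an edge between any two reviews whose embedding similarity
    # is at least 0.8.
    nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],
                          '/tmp/imdb/graph.tsv',
                          similarity_threshold=0.8)

    # Augment each labeled training example with up to 3 neighbors.
    nsl.tools.pack_nbrs('/tmp/imdb/train_data.tfr',
                        '',  # no separate unlabeled examples here
                        '/tmp/imdb/graph.tsv',
                        '/tmp/imdb/nsl_train_data.tfr',
                        add_undirected_edges=True,
                        max_nbrs=3)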

  • After we augment the training data with graph signals,

  • everything is just like the first code example we showed.

  • Read the data.

  • Build a base model via either the sequential,

  • functional, or subclassing API-- feels familiar.

  • Then we use the neural structured

  • learning API to wrap around the base model to enable the graph

  • regularization.

  • Again, the rest of the workflow is just a standard Keras flow--

  • compile, fit, and eval.

  • So we also provide a hands-on step-by-step tutorial

  • on our website.

  • So feel free to visit the website

  • and try it out yourself.

  • The second method to construct a structure

  • or construct a graph signal is to build a graph dynamically

  • by adding adversarial neighbors.

  • For each training sample, we find a malicious perturbation

  • based on the reverse gradient direction.

  • In other words, that perturbation

  • is designed to confuse the model the most, which

  • means to maximize the loss.

  • Then this malicious perturbation is

  • added to the original training sample

  • to create an adversarial neighbor.

  • Again, the adversarial neighbor is designed

  • to confuse the model the most,

  • which is to maximize the loss.

  • Then we add an edge between this adversarial neighbor

  • and the original training example.

  • And therefore, we have constructed a graph

  • or constructed a structure.

  • This is the code example, using adversarial Keras model

  • from neural structured learning.

  • Again-- feel familiar?

  • In addition to these three lines,

  • everything else follows the same workflow

  • introduced before.
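
A minimal sketch of those three NSL-specific lines, assuming in-memory arrays x_train and y_train (the base model and hyperparameter values are illustrative only):

    import tensorflow as tf
    import neural_structured_learning as nsl

    base_model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(28 * 28,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

    # Configure and enable adversarial regularization.
    adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2,
                                                 adv_step_size=0.05)
    adv_model = nsl.keras.AdversarialRegularization(base_model,
                                                    label_keys=['label'],
                                                    adv_config=adv_config)

    # Features and labels travel together in one dict, so the wrapper
    # can compute gradients of the loss with respect to the inputs.
    adv_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
    adv_model.fit({'feature': x_train, 'label': y_train},
                  batch_size=32, epochs=5)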

  • Neural structured learning has been widely used

  • in many products and services at Google--

  • for example, learning image semantic embeddings.

  • Here, we provide six examples, two

  • for each semantic granularity, to illustrate the difference

  • from coarse to ultra-fine granularity.

  • The object on the right is the Golden Gate Bridge.

  • The Golden Gate Bridge is a red steel bridge.

  • But not all red steel bridges,

  • such as the one in the middle image, are the Golden Gate Bridge.

  • Learning such embedding is a challenging task, partly

  • due to the large variations seen among images

  • that belong to the same class or to the same category.

  • Learning image embedding to capture fine-grained semantics,

  • however, is the core of many image-related applications,

  • such as image search, either querying

  • by traditional keywords or by an example query image.

  • This is the overall neural architecture used

  • to learn the image embedding.

  • And again-- feel familiar?

  • This is exactly the same workflow

  • we introduced again and again in this talk.

  • And since this talk focuses on neural structured learning,

  • we will not go into detail about other techniques,

  • such as the sampled softmax used to train the model.

  • If you are interested, please refer to our paper.

  • Let's zoom in on the structure part.

  • The graph used here is a co-occurrence graph.

  • Essentially, the co-occurrence graph

  • is trying to answer the following question.

  • Given that one image is selected, what

  • are the other images that are sufficiently similar

  • that they will also be selected?

  • Say, the query is the white English bulldog.

  • If two images co-occur many, many times,

  • we add an edge between these two images, making them neighbors.

  • So here are some experimental results.

  • For each query image, we provide a top-three nearest neighbors,

  • based on the image embedding learned.

  • The images colored green are rated

  • as strongly similar to the query image by human raters,

  • whereas the images colored red are not so similar.

  • For example, in the left figure, when the image query

  • is a white scroll, all the white scrolls

  • can be correctly retrieved by using

  • an embedding learned from the neural structured

  • learning framework.

  • In other words, learning with structure

  • is able to capture image semantics much closer

  • to actual human perception.

  • So to recap-- training with structure is very useful.

  • Less labeled data is required to effectively train a model.

  • Also, learning with structured signals

  • leads to more robust models.

  • And neural structured learning provides APIs

  • and tools for Keras models.

  • And also, it works for all types of neural nets,

  • whether feed-forward, convolutional,

  • or recurrent, or any custom neural net you design.

  • This is probably the most informative slide of this talk.

  • You can learn about the tools, libraries, and hands-on tutorials in detail

  • on our website.

  • Also, please star our GitHub.

  • We do take GitHub issues.

  • And we would love to hear from you.

  • We are looking forward to developing this framework

  • with all of you, to make this framework more comprehensive.

  • We will be waiting for your pull request on GitHub.

  • Thank you.

  • [APPLAUSE]

  • Do you have time for questions?

  • OK, so I think we still have some time for questions.

  • SPEAKER 1: You may have to use the mic.

  • Like--

  • AUDIENCE: The build_graph function

  • that you mentioned, does it do pairwise comparison

  • across all the items or--

  • DA-CHENG JUAN: All the samples-- not only training-- yeah.

  • AUDIENCE: I see.

  • So for--

  • DA-CHENG JUAN: Both labeled and un-labeled samples.

  • AUDIENCE: I see.

  • So for IMDB-- it does it--

  • the user does it--

  • how long did it take to build a graph?

  • [INAUDIBLE]--

  • DA-CHENG JUAN: So it heavily depends

  • on what machines you are using, right?

  • So since the IMDB reviews are a relatively small data set,

  • building the graph will not take too long.

  • AUDIENCE: I see.

  • SUJITH RAVI: One addition to that-- so it turns out,

  • you don't really have to do an all-pairs comparison.

  • There are much faster techniques.

  • AUDIENCE: Yeah.

  • SUJITH RAVI: So stay tuned.

  • I think we are in the process of releasing other tools which

  • will make it much faster, even on single machines,

  • without having to do--

  • if you have a million examples, you don't need to do

  • a million-cross-million comparison, right?

  • AUDIENCE: Sure-- sure.

  • We actually tried to use that.

  • And it took a lot of time.

  • So instead of that, we used Faiss [INAUDIBLE].

  • SUJITH RAVI: Great, so you're probably

  • going to be the first user for something

  • that we're going to release soon then.

  • AUDIENCE: Yeah-- OK.

  • Thank you.

  • AUDIENCE: Very nice.

  • Just so I think I can understand what you're doing--

  • it looks like you're taking images and then GAN-generated

  • images from the generator--

  • associating them and, thereby, negating the ability of GANs

  • to deceive a neural network.

  • Is that correct?

  • SUJITH RAVI: It does not necessarily

  • have to be the GANs structure.

  • The idea is, for any network that you're trying to learn,

  • you're using the gradients that are backpropagated

  • and reversing the gradients to construct a noisy example,

  • if you will.

  • And the idea-- the reason to do this--

  • and then while doing the training--

  • is that, next time the network sees this noisy example,

  • it will still learn to correctly identify it, rather

  • than flipping the predictions.

  • AUDIENCE: And then--

  • SUJITH RAVI: And so we're trying to make

  • the intermediate layers and also the predictions robust.

  • AUDIENCE: Sure.

  • In that loss function, it looked like it

  • had something from a discriminator

  • plus a generator or something-- that notation.

  • Or maybe it was just a coincidence.

  • SUJITH RAVI: The-- which--

  • DA-CHENG JUAN: Which equation?

  • AUDIENCE: That loss function that you had that you

  • were trying to minimize.

  • SUJITH RAVI: Oh, so the loss function just

  • has two components.

  • The first one is the supervised loss or whatever loss

  • your application has.

  • The second one is that we're factorizing

  • the loss over neighbors, like source images and neighbors.

  • And in the case of adversarial, that neighbor image

  • is basically constructed.

  • And the weight is basically--

  • AUDIENCE: Yes.

  • SUJITH RAVI: --whatever weight that you're assigning

  • to that in the field image.

  • AUDIENCE: Very good.

  • Thank you.

  • AUDIENCE: Very neat idea--

  • thanks.

  • But I think on page 34, the graph looked a bit weird to me.

  • Can you open the--

  • DA-CHENG JUAN: Page 34?

  • AUDIENCE: 34-- the one you compared with the standard.

  • DA-CHENG JUAN: This one?

  • AUDIENCE: Yeah, this one.

  • Yeah-- this graph looked a bit strange to me.

  • How many samples per point did you generate

  • for the error bars here?

  • DA-CHENG JUAN: So basically, the error bar

  • is from the test data set.

  • It's not from--

  • AUDIENCE: No, no, no.

  • DA-CHENG JUAN: --the training--

  • AUDIENCE: Why I'm on--

  • how many trials--

  • SUJITH RAVI: --training trials did--

  • AUDIENCE: Yeah, how many times you start from a different seed

  • and--

  • DA-CHENG JUAN: Oh, we trained--

  • AUDIENCE: --trained the networks and then get these results?

  • DA-CHENG JUAN: Oh, how many training trials?

  • I do believe we have five trials per--

  • AUDIENCE: Each point is five samples.

  • DA-CHENG JUAN: Yeah.

  • AUDIENCE: Average up to five samples--

  • OK.

  • SUJITH RAVI: Yeah, technically, like I said,

  • it depends on the network that you're training.

  • Typically, what you would expect is,

  • even if the network is really powerful,

  • for the gap to increase-- though here,

  • in the 2% case, you will see, the gap

  • is lower than what the 5% is.

  • But that's just based on this data set.

  • So typically what you see is the gap

  • increases as the supervision ratio goes lower.

  • AUDIENCE: Yeah, this is what I would expect.

  • But if you look at it for 2% and then 1/2 a percent--

  • SUJITH RAVI: Yeah, yeah-- so--

  • AUDIENCE: --they're coinciding [INAUDIBLE]----

  • SUJITH RAVI: Yeah, so this is just one example.

  • We would recommend that you try it

  • on your own data set, like, on one of the networks

  • that you build.

  • AUDIENCE: Hi, how you doing?

  • I was curious how you guys might apply this to, say,

  • segmentation or to video classification.

  • SUJITH RAVI: I think there are a couple of different ways.

  • Like in the video classification,

  • you can look at videos that are sort of related,

  • like, let's say, from the same channel, for example.

  • And you have similar kind of content.

  • Or the metadata kind of matches--

  • you can actually now create these links

  • between these different videos.

  • And assuming that you have a way of processing

  • through [INAUDIBLE] that puts all

  • these frames into some representation,

  • you're going to apply this regularization in the NSL

  • framework to say that, hey, these related videos all

  • should be optimized to learn the same prediction.

  • There are many other ways.

  • We can talk offline about that.

  • AUDIENCE: OK-- thank you.

  • AUDIENCE: If we're talking about--

  • sorry.

  • SUJITH RAVI: Last question-- yeah.

  • DA-CHENG JUAN: Yeah-- last question.

  • AUDIENCE: Yeah-- I have two actually.

  • Sorry.

  • So if we're talking about data augmentation,

  • do you compare the classical principles, like distortion

  • and blurring, stretching, and all this kind of stuff,

  • compared to the adversarial-generated samples?

  • Is there any benefit of using one

  • versus another in this case?

  • SUJITH RAVI: So--

  • AUDIENCE: Like, why adversarial-generated

  • samples versus, let's say, some classical principles?

  • SUJITH RAVI: Because the distortion and rotation

  • are predefined transformation functions that we have, right?

  • The adversarial-- it depends on the data and the network.

  • And you're using the gradients that

  • are backpropagated to create an adversarial example,

  • a hard example, if you will.

  • Like for transformations, everybody

  • knows, hey, this is the rotation.

  • You apply rotation or you're blurring.

  • But what if somebody attacks and says, it's not a rotated image.

  • But instead, I take pixels here, pixels there,

  • and I transform them or blur them.

  • That's a very different kind of transformation function.

  • So here, you're trying to do this across the board,

  • using the learning process and the gradients that

  • are being learned for the network at that layer.

  • AUDIENCE: Mm-hm.

  • SUJITH RAVI: Does that make sense?

  • AUDIENCE: And the second question--

  • when the network sees the feedback from the graph

  • that, actually, the adversarial neighbor gives you

  • a completely low score on the example, but the network itself

  • is pretty sure that the class is there-- so how does it resolve

  • at this level, when you have a pretty

  • convinced network versus the adversarial neighbor?

  • SUJITH RAVI: I think the clarification is maybe referring

  • to the gibbon--

  • DA-CHENG JUAN: Sure.

  • SUJITH RAVI: --example mentioned.

  • So just to clarify, the adversarial example

  • does not necessarily have that wrong label

  • during the learning process.

  • We're constructing it on the fly.

  • And we're forcing the network to--

  • the network would just say, hey, that is a gibbon, if you didn't

  • apply the adversarial learning.

  • AUDIENCE: Yeah, but you have the feedback

  • from the graph that says that the classification is wrong

  • here.

  • So you try to regularize it.

  • And by this moment, the network is already

  • pretty much convinced that it's not a gibbon, for example.

  • But the graph still says the class is wrong.

  • So how does this resolve?

  • DA-CHENG JUAN: So a quick add on this

  • is, actually, in some experiments,

  • we applied a correct label to the adversarial example.

  • In a previous slide, actually, the adversarial example--

  • we would have the [INAUDIBLE] labeled as a gibbon.

  • AUDIENCE: All right-- OK.

  • OK.

  • DA-CHENG JUAN: So you could think it this way--

  • AUDIENCE: Yes, yes.

  • DA-CHENG JUAN: This sample is very confusing to the model.

  • But we still supervise model to learn the correct label.

  • And therefore, later on--

  • AUDIENCE: All right.

  • DA-CHENG JUAN: --the neural network will be more robust.

  • AUDIENCE: OK-- clear.

  • Thank you very much.
