Name: Optical Character Recognition (OCR) - Computerphile
Uploaded: 2020-03-27T18:22:55.000Z
Duration: 14 min 16 s
Description: Thousands of YouTube videos with English-Chinese subtitles! Now you can learn to understand native speakers, expand your vocabulary, and improve your pronunciation...

no si r is optical character recognition, and the key point about OCR is I think about it as a compression.

Adverbs of manner

You're basically taking an image in two dimensions with some depth, and you're going through a number of stages of compression to extract only the information out of here.

That's of interest to the person looking at it.

And so people talk about this is being basically turning it into the founder of the text that's there, and that is important.

That's what I would call metadata about it.

But the first and foremost thing you're doing is compressing information that's out there into information that could be used either by another machine or by a human.

I've used this largely in system types of engineering where, for example, I worked with Jerome Better Clay, who's from a P F ail in Switzerland.

He wanted to be able to take American signs and turned them into French so that he didn't have to worry about what they said in English.

And that gets complicated in the states because there's brown signs which you're talking about historic places or museums.

There's green signs which are giving you information.

There's blue signs which are giving you other types of information.

And then, of course, there's red and yellow signs which are very, you know, partner, and you better stop.

You better do this and that and it can be daunting if you're coming from another country.

And so it was a really good OCR application where you were taking some type of an image, converting it into the tax said it was associated with it and then for him functionally translating that into another language.

And I think that's very important for people to recognize.

OCR is not just okay, I've got a document, I'm scanning it and it converts it into the text form.

People don't actually do that very much anymore.

People have a supercomputer in their pocket called a mobile phone, and they want to be able to use that for OCR.

So we're going to start off with is the very first stage of compression in the most important one.

Because if you don't do the job right in the first stage, the rest of it is toast.

And so I got to work with one of the world's preeminent experts on OCR over the course of about five years when he was at HP Labs Ray Smith, who's currently with Google.

He has taken the HP Labs OCR work, which he did, and open sourced it.

It's called Tesseract, and I recommend it to people when they start with OCR because it's free.

You're not putting a lot of cost into it other than your own time, and it does work pretty well.

And he and Ray has been sort of guiding that overtime to enable you to be able to use OCR for other languages.

And he's also been a good proponent of what I call med algorithm mix, where you're taking multiple OCR engines and using them to create better.

So let's start off with threshold ing, and I'm gonna do a little drawing here.

And so this is going to go from 0 to 2 55 and basically what this is is looking at the density that we have for an image and so to 55 is going to be pure white.

Unless you're on some Apple systems, which will be the opposite, will be black, but zero is gonna be full black.

And so in between, at about 1 28 you'll have half gray, which will be like this.

And so what you'll get is a hist a gram, which is basically a graph of how many of each of these types of black and white occur.

And most of these grafts are gonna look something like this in an ideal state.

And so when I get a graph like this, I need to be able to do what's called Finalization and bynder.

Ization means that I'm going to turn this from a rich panoply of values from 0 to 2 55 to just zero in a one.

And so you can probably see from this graph.

And so there's a number of methods that actually do bynder ization, the most famous of which are probably the out Sue and then the Hitler at all method.

Okay, so what I've talked about here is a simplification.

What I've done is giving you a global bynder ization method.

There are also local methods that will handle, For example, when you've got Blur when you've got a little you know, Joyner's et cetera, look at those briefly and kind of show you the impact that will have on the text you get.

This is difficult to kind of show by drawing, but I'll do my best.

Suppose you've got, for example, the letter t here and around it is some noise that you've captured in the image.

If I do some type of a global threshold, what I'll get out of this is something that looks like this, and so that's starting to look like a T.

But if I do a better job with this, if I do some local filtering, so this is a global threshold and this is a local financial, that'll taking that into accounts, I may be able to do some trimming so that the T looks more like this, and that's actually what I want out of this.

The character that I've got now again is buying Arise where I'm showing the blue here of the black here.

I've actually got information, all of this stuff is the background which has been threshold it out.

And so now this is a single what we call a connected component character and the connected component character is the next stage for OCR.

And so what I'll do now is start to represent things as if I've drawn an outline around these.

And so this is the outline around that connected component, and those would all use those air basically shapes or objects that I've collected from the connected component.

And so let's take a look at that and we'll talk about the next stage of OCR, which is where we're actually forming the characters.

So this is what I have now and remember, there's no metadata associated with this.

If the original was, for example, in red green blue color, it had 24 bits.

This is one bit, and so I've already done the compression by a factor of up to 24 from what I started with Now I have to figure out what that is.

You and I can look at this and in the context of a Latin language, will know that that's a T.

We don't know that it's English yet we don't know that it's Italian.

We also don't know that it's a Latin language.

So if it was a Cyrillic language of it was an Arabic language et cetera, we might have to look for other things that give us cues.

So the next thing we actually do is collectively look at a bunch of characters and let's say we've got the word, they're in English.

We look at those and we look for characteristics about those that allow us to basically assign the language.

And so work that was done in this area was led by Larry Spits.

Larry Spitz was able to identify what language it was off of.

Just the character set that you got from these connected components.

Let's consider that done science or other people working on this Ray Smith himself has worked on this other people in the various optical character recognition Zorro see our vendors, which include Abby, which include nuance, which include a wide variety of other folks over the years they will be able to ascertain based on the character, said a fairly large character said.

Usually what languages is with a lot of confidence, there's also a default.

If you don't know what it is and you don't have a lot, you start off thinking it's English, right?

So that's kind of a common language that you start with or you think of the local language.

If you're tied into, for example, GPS and we know you're in Forenza, we're going to assume that it's Italian until proven otherwise.

So there's a number of ways of doing that.

I don't have enough time to go into all of those details.

But the bottom line is I now have a character set.

I now have a pretty good idea of what language it is Now.

I have to actually do the downstream matching of those characters to what they are, and then also potentially finding out what font it is so that I can reproduce this in the correct font for the final.

I'm converting this from, let's say, Italian and English and English into Italian.

It's gonna be the default fund on the display of the device.

I'm reading this off, but we'll just go there So again, very wide set of applications that come off for this.

I apologize for going into so many will try to keep it simple.

So from this, the next thing we need to do is actually classifications.

And what we're doing there is we're classifying by the alphabet.

And so if I know that it's English, I'm going to try to do anything from something as in elegance as pattern matching.

Pattern matching is brutal because if I don't have the right font, it might not be a good match.

So if I've got this little teacher, the ideal t that I have might be this.

If I know which font it is, I might have a much better tea that will match against this one much better.

So a lot of trade offs here when I actually do the classification and when I actually do the font identification and most modern systems, there is a meta structure around that that allows me to speculate on which character said it might be, and then also on which font it might be within those characters, so a large case, it might not just declassification by alphabet.

It might also be either simultaneously or sequentially or recursive.

Lee Classification buys the fonts, and so the key thing there is that I'm going to be doing font matching, and you want to move this into classifications because there you're able to bank on a lot of other good work that's been done outside of the field of OCR.

Subtitles ListPlay Video

Optical Character Recognition (OCR) - Computerphile

process

figure

pattern

boost