Hello.
It's been a while since I've done one of these videos, but this video is eight bits of image processing you should know.
And you might be thinking, well, why do I need to know anything about image processing?
Well, images are just 2D arrays of data, and the algorithms that we apply to this data can shape it in useful ways.
Obviously, some of the applications involve images and cameras and video footage.
But there are also other ways of manipulating 2D data to your advantage, for example, in things like procedural generation.
On the whole, I think most programmers should have an awareness of image processing.
It is a very useful tool to have in your toolbox.
So let's get started.
Before I start, I'm going to show you the video that I've created to demonstrate the eight bits, and it's quite nice because it allows us to quickly compare the algorithms.
So here it's going to show the first bit, which will be thresholding, and we can choose different numbers to look at the different algorithms and see what their effects are. Because it's working with video, you can see here's a live feed of my arm waving around.
I think it makes quite a nice interactive tool, which is great for learning.
But this video is going to be a bit different to some of my others; I'm not going to go through it line by line from scratch.
I've already created the code, and what I really want to emphasize is what is the image processing that is going on?
And how does it work?
Bit one: thresholding.
This is the process of binarising an image.
Here I have an image.
I'm going to assume it's a grayscale image, so the pixels go from black to white.
Thresholding involves taking an individual pixel and classifying it as being above or below a threshold.
If it's above the threshold, you output a one.
If it's below the threshold, you output a zero.
This green line represents a single row in our image.
If I take this row and plot its x position against the brightness of that pixel, I might get something that looks like this.
Thresholding involves specifying an appropriate value to act as a cut-off, so any pixels above that value will get classified as one, and any below it will get classified as zero.
The red dashed line represents my threshold value.
So now with my blue pen, I can indicate what the binary equivalent of this might be.
So it starts down here at zero, then it goes above the threshold to one, below the threshold, above the threshold again, and we've binarised our image.
To demonstrate these programs, I'm using a Pixel Game Engine application that I've already created, and I feel it's necessary to give you a brief overview of what this application is before we get stuck into the algorithm code, just so it makes some sort of sense.
Fundamentally, it's based on the idea of a frame, which is a fixed 2D array of pixels, in this case 320 by 240, and the pixels are of floating point type.
So instead of working with RGB values, I'm taking the RGB from the camera and converting it to a floating point value between zero and one. Converting to the floating point domain from the integer domain allows me to avoid complexities such as integer division.
This simple frame class has some accessors, get and set, which will do boundary checks for me, so I can quite happily set a pixel's value beyond the edges of the image.
And if I get something from beyond the image, it just returns a black pixel.
So zero is black and one is white.
My frame class also overrides the assignment operator, so I can have multiple frames in my application, and I can transfer the contents of one frame to the other.
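To make that concrete, here is a minimal sketch of what such a frame class might look like; the names, the fixed 320 by 240 size, and the storage layout are assumptions for illustration, not the exact code from the video.

```cpp
// Sketch of a fixed-size frame of floating point pixels (illustrative only)
struct Frame
{
    static const int nWidth = 320;
    static const int nHeight = 240;
    float pixels[nWidth * nHeight] = { 0.0f };

    // Reads beyond the image boundary simply return black (0.0f)
    float get(int x, int y) const
    {
        if (x < 0 || x >= nWidth || y < 0 || y >= nHeight) return 0.0f;
        return pixels[y * nWidth + x];
    }

    // Writes beyond the image boundary are silently ignored
    void set(int x, int y, float p)
    {
        if (x < 0 || x >= nWidth || y < 0 || y >= nHeight) return;
        pixels[y * nWidth + x] = p;
    }

    // Assignment transfers the contents of one frame to another
    Frame& operator=(const Frame& f)
    {
        for (int i = 0; i < nWidth * nHeight; i++) pixels[i] = f.pixels[i];
        return *this;
    }
};
```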
For this video, I'm not going to dwell on the image capture side of things.
I've already done that in other videos, and it's enough to say that we simply use the escapi library to capture a frame from a webcam.
So in OnUserCreate, the webcam is initialized, and in OnUserUpdate, I capture the image from the webcam per frame, convert the pixels to floating point, and store them in a frame called input.
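As a rough sketch only, the per-frame capture and conversion could look something like this, assuming the escapi library's SimpleCapParams/doCapture/isCaptureDone interface and the Frame type sketched above; the function name, the channel averaging, and the assumption that initCapture was already called in OnUserCreate are mine, not taken from the video.

```cpp
#include "escapi.h"

SimpleCapParams capture; // assumed to be set up with initCapture() in OnUserCreate,
                         // with mWidth/mHeight matching the Frame dimensions

void CaptureAndConvert(Frame& input)
{
    doCapture(0);                     // request a frame from camera 0
    while (isCaptureDone(0) == 0) { } // wait until the frame is ready

    for (int y = 0; y < Frame::nHeight; y++)
        for (int x = 0; x < Frame::nWidth; x++)
        {
            int c = capture.mTargetBuf[y * Frame::nWidth + x]; // packed 32-bit 0x00RRGGBB
            float r = (float)((c >> 16) & 0xFF);
            float g = (float)((c >> 8) & 0xFF);
            float b = (float)(c & 0xFF);
            // Collapse RGB to a single grey value in the floating point domain 0..1
            input.set(x, y, (r + g + b) / 3.0f / 255.0f);
        }
}
```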
This program shows eight different algorithms, and so the bulk of the code shown here handles the selection of which algorithm is currently being demonstrated.
The algorithms also have a degree of user input, which allows the user to change values, play with the algorithm, and see how it responds under different circumstances.
For example, when the user presses the one key on the keyboard, it changes the current algorithm being demonstrated to threshold.
So let's continue looking at that algorithm.
Here it is, and you'll see this on most of the algorithms: we do a little bit of user input if there are values to change, and then we actually perform the algorithm under demonstration.
Thresholding is very simple.
For all of the pixels in the frame, we read the input value of a pixel for that location.
Compare it with a threshold value, which will give us a one or a zero in response.
And then we write that to an output frame.
At the end of the program, I then draw the input and output frames.
Hopefully, you can see thresholding is very simple indeed.
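A minimal sketch of that loop, assuming the Frame type sketched earlier (the function wrapper and names are illustrative):

```cpp
// Binarise the input: 1.0f if at or above the threshold, 0.0f otherwise
void Threshold(const Frame& input, Frame& output, float fThresholdValue)
{
    for (int y = 0; y < Frame::nHeight; y++)
        for (int x = 0; x < Frame::nWidth; x++)
            output.set(x, y, input.get(x, y) >= fThresholdValue ? 1.0f : 0.0f);
}
```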
So let's take a look at it.
This is thresholding now.
My webcam has some automatic gain correction, which is what you saw then as the image sort of changed and faded. I can't override those settings using the API for the camera, but for this video it doesn't really matter.
I'm in threshold mode now, and we can see the input image here on the left is in grayscale, but the output image here on the right is in black and white.
It's been binarised, and as it says here, I can use the Z and X keys to change the value of the threshold.
Currently it's 0.5.
It's halfway between the minimum and maximum intensities for the grayscale.
As I increase the threshold value, we see fewer pixels being attributed to a binary one.
As I decrease it, we see the opposite.
Thresholding is essentially the coarsest of filters, and this is usually the first step in removing as much rubbish from an image as you can.
For example, here you can see on the notebook the text "One Lone Coder" comes through quite clearly, but the lines and the slight greyness of the page don't.
So if we were then to go on and extract this text, for example, it's much easier now that we're not contaminated with this spatial background noise.
With threshold done, it is on to bit two: motion.
For this video, I'm assuming the simplest kind of motion; we won't be able to get any direction information from this.
The word motion implies that something has moved, and for something to move takes time.
So to detect motion in an image we need to allow time to have elapsed.
Fortunately, with video this is quite simple, because a video camera returns successive frames in time, which means we have a built-in delta time between each frame.
Alongside movement in time, motion also implies movement in space.
The object was in one location, and now it's in another.
But for this bit, let's not think of objects as being the things we're looking at.
Instead, we're looking at pixel grayscale values.
So over time, if something is moving in the image, a particular pixel value is also changing.
So we can identify that motion has occurred by looking at the difference of pixel values between successive frames of video input.
And so on this graph, we can see that the difference between A and B is related to the change in that grayscale value.
The end result of this could be signed, and in some applications that's a useful thing.
It gives you additional information, but for our application, I'm just going to take the absolute value of this to tell us that motion for that pixel is likely to have occurred.
The code for motion detection is equally as simple as thresholding.
I'm going to go through every single pixel in my frames with these nested for loops, and I'm going to look at the difference between the current frame and the previous frame by subtracting them, taking the absolute value of that result, setting that in the corresponding location in the output frame, and then drawing the input and output frames.
I update the previous input frame before I acquire a new image in the input frame.
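A minimal sketch of that motion-detection pass, again assuming the Frame type from earlier; prev is the previous input frame, updated before the next image is acquired:

```cpp
#include <cmath>

// Output the absolute per-pixel difference between successive frames
void DetectMotion(const Frame& input, const Frame& prev, Frame& output)
{
    for (int y = 0; y < Frame::nHeight; y++)
        for (int x = 0; x < Frame::nWidth; x++)
            output.set(x, y, std::fabs(input.get(x, y) - prev.get(x, y)));
}
```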
Here's the algorithm running, and it's looking at a reasonably static scene, but as soon as things start to move, I'll bring my hand into the scene.
We're looking at the difference between one frame and the previous frame, but we only see illumination in the output where there has been change.
So that signifies that motion has occurred in those locations because the frame rate of my camera is reasonably quick.
It's about 20 frames per second.
I get what looks like an edge around the object that's moving, but don't be fooled by this, it's not strictly an edge, although you can use it as an edge.
It is just the difference between the two frames.
Motion detection like this is usually a foundation algorithm.
It is used to guide your decisions in subsequent algorithms that you apply to the image.
For example, I might want a system to shut down if nothing in this scene is moving, I mean, why bother taking more images if nothing has changed?
So I could detect that by accumulating the sum of all of the pixels in the output image and then checking that against a threshold value to tell me: has there been enough motion in the image for the system to switch on?
Bit three: low-pass temporal filtering.
As we've just seen in bit two, the value of a pixel changes over time.
And if we look over a longer period of time, we might see that the pixels change values quite rapidly between frames.
This is called noise because sensors aren't perfect.
Lighting conditions, electronics and all sorts of things could influence the value of a pixel.
This noise can cause problems, because what we actually want to see is the real value of the pixel change over time, as indicated by this green line.
We can approximate that it's somewhere in between all of these noisy values.
Noise can become a problem if you do things such as thresholding, because the noise might just tip you above or below the threshold inappropriately.
We effectively want to run the grayscale value of the pixel through a low-pass temporal filter, so the low frequency component of the pixel is allowed through, and the high frequency components are removed.
We can approximate this with a very simple equation.
For a given pixel value P, we're going to update that pixel value by looking at the difference between the input pixel value and the current pixel value, and multiplying that by a constant.
Fundamentally, if this difference is small, then the change in our output pixel is small, and if it's large, then the change in our output pixel is large, but we can regulate that change with this constant.
In engineering, this is also known as an RC filter, and its implementation is very simple.
In the low-pass section of the program, I'm doing some user input so I can change the value of this temporal constant.
And then I iterate through all of the pixels in a frame.
I look at the difference between the input and the output.
I scale the difference with our temporal coefficient, and then I accumulate that difference back into the output frame with this plus symbol.
For this algorithm, the output frame is persistent between updates of the video camera feed, meaning that output pixels are only changed by a small amount, depending on how large the change was in the input.
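A minimal sketch of that filter, assuming the Frame type from earlier; the important detail is that the output persists between frames and only accumulates a scaled fraction of the difference on each update.

```cpp
// Low-pass temporal filter: small fTemporalConstant = slow response,
// larger fTemporalConstant = faster response (output must persist between frames)
void LowPassTemporal(const Frame& input, Frame& output, float fTemporalConstant)
{
    for (int y = 0; y < Frame::nHeight; y++)
        for (int x = 0; x < Frame::nWidth; x++)
        {
            float fDelta = (input.get(x, y) - output.get(x, y)) * fTemporalConstant;
            output.set(x, y, output.get(x, y) + fDelta);
        }
}
```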
So here in the program I'm now running bit three, the low-pass temporal filter, and the two images look very similar.
It might not even be that possible to see on the YouTube video, but the input image on the left actually has quite a lot of per pixel noise.
But the output image on the right has no temporal noise visible to the naked eye.
If I move my hand into the scene, since this is a particularly slow filter, I can make rapid changes by wiggling my fingers around here, but we can see that the output image doesn't change very much.
It's ignoring those fast changes, only allowing the really slow changes.
If I leave my hand in a fixed position, eventually it fades into the image, so this is exaggerated in a way.
I can use the Z and X keys to change the value of this constant, and I can make it very slow indeed, which might not immediately seem a useful thing to do.
But if you wanted to do some background subtraction algorithms over moving images, this is quite a nice way to do it.
You can accumulate the background of an image over time and then use that as a way to isolate things in the foreground.
If I increase the value of the constant, it becomes far more lively.
Let's keep going a bit until the two images look very similar indeed.
But if you get this constant too high, you'll start seeing the per-pixel camera noise coming back into the output image. So low-pass temporal filtering is a great way to filter noise, and it also looks all ghostly and cool.
Bit four: convolution.
Whereas the previous two bits have looked at filtering things in the time domain, convolution looks at filtering things in the spatial domain.
Fundamentally, we're going to decide what to do with a pixel by looking at its neighborhood.
For this example, I'm going to look at the immediate three by three neighborhood of our target pixel, and this neighborhood is called a kernel.
And you can think of a kernel as a template of coefficients that is used in a dot product of the neighboring pixels in that region and the values of the kernel, to give us a result for the central pixel.
So my kernel might be defined here as a three by three matrix of values.
These values are overlaid over the corresponding pixel in that location, and we also include the central value, which is the target pixel.
I can give these kernel values location information to identify their relationship to the target pixel.
I work out my final pixel value by performing the dot product between the kernel coefficients and the grayscale values of the pixels at those locations.
So this component, for example, is this component of the kernel multiplied by this pixel value.
And we go on to go through all of the kernel locations.
And so what effect do you think this kernel might have, then? Well, we can see it as being regions of influence: we're strongly influenced by the target pixel, the one in the middle.
But we're also a little bit influenced by our immediate north, south, east and west neighbors.
Conveniently, in this kernel all of these values add up to one.
This is quite deliberate.
So we take the bulk of our pixel's value from what it already is.
But then we take a little bit from its neighbors.
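A minimal sketch of such a 3x3 convolution, assuming the Frame type from earlier; the kernel shown is illustrative (centre-weighted, with values summing to one) rather than the exact coefficients from the video.

```cpp
// Convolve the input with a 3x3 kernel; Frame::get returns 0 beyond the edges,
// so the image border needs no special casing
void Convolve3x3(const Frame& input, Frame& output)
{
    const float kernel[3][3] =
    {
        { 0.0f,   0.125f, 0.0f   },
        { 0.125f, 0.5f,   0.125f },
        { 0.0f,   0.125f, 0.0f   }
    };

    for (int y = 0; y < Frame::nHeight; y++)
        for (int x = 0; x < Frame::nWidth; x++)
        {
            // Dot product of the kernel with the 3x3 neighbourhood of (x, y)
            float fSum = 0.0f;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++)
                    fSum += input.get(x + i, y + j) * kernel[j + 1][i + 1];
            output.set(x, y, fSum);
        }
}
```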