An Introduction to Machine Vision

By James Matthews

Source: http://www.generation5.org/

Machine vision is an incredibly difficult task - something that seems relatively trivial to humans is enormously complex for a computer to perform. This essay provides a simple introduction to computer vision and the sort of obstacles that have to be overcome.

Data size

We will be looking at the picture at the right throughout the essay. We will be making a few changes though - we will say that the picture is a 640x480 image with 8 bits per colour channel (not the 200x150, 24-bit image it actually is), since this is the "standard" size and colour depth of a computer image.

Why is this important? Well, the first consideration/problem of vision systems is the sheer size of the data they have to deal with. Doing the math, we have 640x480 pixels to begin with (307,200). This is multiplied by three to account for the red, green and blue (RGB) channels, one byte each (921,600 bytes). So, with just one image we are looking at roughly 900K of data!

So, if we are looking at video of this resolution we would be dealing with 23MB/sec (or 27MB/sec in the US, where the frame rate is higher) of information! The solution to this is fairly obvious - we simply cannot deal with this sort of resolution at this speed at this colour depth! Most vision systems will work with greyscale video at a resolution of 200x150. This greatly reduces the data rate - from 23MB/sec to 0.72MB/sec! Most modern-day computers can manage this sort of rate very easily.
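
To make the arithmetic concrete, here is a small Python sketch that reproduces the figures above. The frame rates (25 frames/sec for Europe, 30 frames/sec for the US) are assumptions made to match the quoted numbers:

def frame_bytes(width, height, channels):
    # Bytes per frame at 8 bits per channel.
    return width * height * channels

rgb_frame = frame_bytes(640, 480, 3)     # 921,600 bytes - roughly 900K
grey_frame = frame_bytes(200, 150, 1)    # 30,000 bytes

print(rgb_frame * 25 / 1e6)    # ~23 MB/sec at 25 frames/sec
print(rgb_frame * 30 / 1e6)    # ~27.6 MB/sec at 30 frames/sec (US)
print(grey_frame * 25 / 1e6)   # ~0.75 MB/sec (about 0.72 in 1024-based megabytes)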

Of course, receiving the data is the smallest problem that vision systems face - it is processing it that takes the time. So how can we simplify the data down further? I'll present two simple methods - edge detection and prototyping.

Edge Detection

Most vision systems need to determine where and what something is, and for the most part, detecting the edges of the various shapes in the image should be sufficient to help us on our way. Let us look at two edge detections of our picture:

The left picture is generated by Adobe Photoshop's "Edge Detection" filter, and the right picture is generated by Generation5's ED256 program. You can see that both programs picked out the same features, although Photoshop has done a better job of accentuating more prominent features.

The process of edge detection is surprisingly simple. You merely look for large changes in intensity between the pixel you are studying and the surrounding pixels. This is achieved by using a filter matrix. The two most common edge detection matrices are called the Laplacian and Laplacian Approximation matrices. I'll use the Laplacian matrix here since the numbers are all integers. The Laplacian matrix looks like this:

1    1    1
1   -8    1
1    1    1
Now, let us imagine we are looking at a pixel that is in a region bordering a black-to-white block. So the pixel and its surrounding 8 neighbours would have the following values:
255  255  255
255  255  255
0    0    0
Where 255 is white and 0 is black. We then multiply each value by the corresponding entry in the filter matrix:
255   255  255
255 -2040  255
0       0    0
We then add all of the values together (5 x 255 - 2040 = -765) and take the absolute value, giving us 765. Now, if this value is above our threshold (normally around 20-30, so this is way above the threshold!) then we say that point denotes an edge. Try the above calculation with a matrix that consists only of 255s - the result is 0, correctly indicating no edge. Experiment with the ED256 program, which allows you to play with either the Laplacian or Laplacian Approximation matrices, or even create your own.
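
As an illustration, here is a minimal Python sketch of this filter pass, assuming the image is simply a list of rows of 0-255 grey values. The function and variable names are my own, not taken from ED256:

LAPLACIAN = [[1,  1, 1],
             [1, -8, 1],
             [1,  1, 1]]

def detect_edges(image, threshold=25):
    # Returns a map with 255 where an edge is found and 0 elsewhere.
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):            # skip the one-pixel border
        for x in range(1, width - 1):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    total += LAPLACIAN[dy + 1][dx + 1] * image[y + dy][x + dx]
            if abs(total) > threshold:        # big intensity change = edge
                edges[y][x] = 255
    return edges

# The black-to-white example from the text: |5*255 - 8*255| = 765, an edge.
sample = [[255, 255, 255],
          [255, 255, 255],
          [  0,   0,   0]]
print(detect_edges(sample)[1][1])             # prints 255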

Prototyping

Prototyping came about through a data classification technique called competitive learning. Competitive learning is employed throughout different fields of AI, especially in neural networks - more specifically, self-organizing networks. Competitive learning creates x prototypes from a given data set; these prototypes are meant to be approximations of groups of data within that set.

Somebody thought it would be neat to apply this sort of technique to an image to see if there are data patterns within it. Obviously the result is different for every image, but on the whole, areas of the image can be classified very well using this technique. Here is a more specific overview of the algorithm:

Prototyping Algorithm

  1. Take x samples of the image (x is a high number, like 1000). In our case, these samples would consist of small regions of the image (perhaps 15x15 pixels).
  2. Create y prototypes (y is normally a smaller number, like 9). Again, these prototypes would consist of 15x15 groups of pixels.
  3. Initialize these prototypes to random values (noisy images).
  4. Cycle through our samples, and for each one find the prototype that is closest to it. Then alter that prototype to be a little closer to the sample. This is normally done with a weighted average - ED256 brings the chosen prototype 10% closer to the sample.
  5. Do this many times - around 5000. You will find that the prototypes now actually represent groups of pixels that are predominant in the image.
  6. Now, you can create a simpler image made up of only y colours by classifying each pixel according to the prototype it is closest to.
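
Here is a rough Python sketch of steps 3 to 6 above, treating each sample and prototype as a flat list of grey values. The nine prototypes, 5000 iterations and 10% step follow the numbers quoted in the list; everything else (the names and the squared-distance measure) is just an illustrative assumption, not ED256's actual code:

import random

def closest_prototype(prototypes, sample):
    # Index of the prototype with the smallest squared distance to the sample.
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, sample))
    return min(range(len(prototypes)), key=lambda i: dist(prototypes[i]))

def train_prototypes(samples, num_prototypes=9, iterations=5000, rate=0.1):
    size = len(samples[0])
    # Step 3: start from random (noisy) prototypes.
    prototypes = [[random.uniform(0, 255) for _ in range(size)]
                  for _ in range(num_prototypes)]
    for _ in range(iterations):
        sample = random.choice(samples)               # step 4: pick a sample
        winner = closest_prototype(prototypes, sample)
        # Move the winning prototype 10% of the way towards the sample.
        prototypes[winner] = [p + rate * (s - p)
                              for p, s in zip(prototypes[winner], sample)]
    return prototypes

def classify(prototypes, sample):
    # Step 6: label a region by the prototype it is closest to.
    return closest_prototype(prototypes, sample)
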
Here is our picture in greyscale and another that has been fed through the prototyping algorithm built into ED256. We use greyscale to make prototyping a lot simpler. I've also enlarged the prototypes and their corresponding colours to help you visualize the process:


Notice how green corresponds to pixels with predominantly white surroundings; most pixels are red because they are similar to the "brick" prototype. Very dark areas (look at the far right window frame) are classified as dark red.

For another example, look at this picture of an F-22 Raptor. Notice how the red corresponds to the edges on the right wing (and the left too, for some reason!) and the dark green to the left trailing edges/intakes and right vertical tail. Dark blue is for horizontal edges, purple for the dark aircraft body and black for the land.

Conclusion

How do these techniques really help machine vision systems? It all boils down to simplifying the data that the computer has to deal with. The less data there is, the more time can be spent extracting features. The trade-off is between data size and retaining the features within the image. For example, with the prototyped image we would have no trouble spotting the buildings in the picture, but the tree and the car are a lot harder to differentiate. The same applies to a computer.

In general, edge detection helps when you need to fit a model to a picture - for example, spotting people in a scene. Prototyping helps to classify images by picking out their prominent features. Prototyping has many uses, since it can "spot" traits of an image that humans do not.