Bloomberg D.


Название статьи: Overview of image scaling
Источник: http://www.leptonica.com/scaling.html


Текст статьи

Image scaling is important for many situations in both image processing and image analysis. Consequently, a number of different scaling functions are provided in Leptonica.

Binary images at high resolution (300 to 400 ppi) can be used to make low resolution binary images for analysis or display on a low resolution device (such as a monitor). To make a version of a page image that is readable on a grayscale display, the scale-to-gray function can be used to produce a grayscale image (of at least 4 bpp) at about 100 ppi. Scale-to-gray can also be used to convert a binary image to a very low resolution grayscale page icon thumbnail at about 15 ppi. At such resolution, a 10 x 7 inch image shrinks to 150 x 105 pixels, which appears on a 100 ppi high resolution monitor as an icon of size 1.5 x 1.05 inches.

When scaling a grayscale or RGB color image, two things must be kept in mind. First, if the image is significantly reduced, it is useful to first apply a lowpass filter whose size is roughly the same as the subsampling factor, in order to reduce aliasing. Second, if the image is significantly enlarged, it is useful (though more expensive) to apply a linear interpolation to the original image, in order to avoid constructing blocks of equivalent pixels that are visibly "jagged" when simple pixel replication is used.

Finally, there is the upscale-to-binary operation, in which a grayscale image is upscaled to a higher resolution binary image. This is the inverse of scale-to-gray, and can be thought of as an example of image restoration; namely, restoring a higher resolution binary image, that had been previously downscaled to gray. We don't provide any estimation routines for this explicitly, but the operation can be performed with good results by a succession of two operations: grayscale upscaling using linear interpolation, followed by binarization using a threshold or dithering. (We provide several efficient composite operations of these types.)

Aliasing occurs when a signal is (sub)sampled at less than twice the wavelength of the highest frequency. To avoid aliasing, the high frequencies in the signal are removed with a lowpass filter before sampling. The sinc filter is ideal in that its representation in the fourier domain has constant value up to the cutoff frequency, where it goes to zero. Thus, it will remove aliasing with the smallest amount of blurring of the result. However, the sinc has infinite extension, and is impractical for efficient image processing. For our purposes, a reasonable compromise between acceptable blur and computational efficiency is to use a constant height convolution filter. The function pixBlockconv() has a computational time independent of the size of the filter, and is used for some of the lowpass filtering.

We enumerate and comment on these (four) operations in the next section, and then go into detail after that. See the table at the end of this page for an overview.

What types of image scaling are useful?

Note: in the following, when we refer to grayscale images, we intend to include color images as well. The color images used in Leptonica are either colormapped grayscale (where each gray value refers to a specific RGB color), or they are full 24 bpp RGB images. Likewise, the images in Leptonica that are strictly grayscale are either colormapped grayscale (where each RGB value in the colormap is actually gray, with R=G=B), or grayscale without a colormap. RGB images can be handled as three separate 8 bpp grayscale component images. Leptonica provides functions for separating the component images out so they can be processed as grayscale 8 bpp images, and combining the components back to form a color RGB image. Some of the fastest color operations do not perform this separation, but instead operate on the R, G and B components in situ, pulling them out of the 32 bit pixels and replacing them after the operations are completed. Leptonica also provides functions for converting a colormapped 2, 4 or 8 bpp image to grayscale or full RGB color images. For all scaling operations except simple subsampling or pixel replication, it is necessary to convert a colormapped image to either grayscale or full RGB before applying the scaling operation.

There are four main classes of image scaling:

  1. grayscale --> grayscale
  2. binary --> binary
  3. binary --> grayscale (typically downscaling)
  4. grayscale --> binary (typically upscaling)

Class 1 (grayscale --> grayscale) can be done crudely by sampling, or more accurately by low-pass filtering followed by sampling, by area mapping, or by resampling with linear interpolation. The issues are explored in detail below.

Class 2 (binary --> binary) can be performed with different goals for image reduction (downscaling). If the goal is to have an image without aliasing, a lowpass filter must be used. But if the goal is to analyze the texture, or to prepare for morphological analysis at lower resolution, a variety of nonlinear filters can be used before subsampling. The special case of a power of 2 reduction can be implemented very efficiently with word parallel operations. Of most interest are situations where the filtering and subsampling are combined, so that a filtered image at full resolution does not need to be produced before subsampling. When the image resolution is increased (upscaling), it may be desirable to smooth the boundaries to avoid large, visible stair-steps. The details are given below.

Class 3 (binary --> grayscale) is also known as "scale-to-gray." It is typically used when the image is being dowscaled from a high resolution binary scan for viewing on a lower resolution grayscale or color display, and for typical scan and display resolutions, the 3x and 4x reductions are the most useful. We provide fast scale-to-gray reductions for 2x, 3x, 4x, 8x and 16x reduction, all implemented efficiently with lookup tables. For best results at intermediate scale factors, use binary upscaling before fast scale-to-gray reduction. See below for details.

Class 4 (grayscale --> binary), which is the inverse of scale-to-gray, can be called "upscale-to-binary". It is useful when a lower resolution (e.g., 100 ppi) grayscale image is to be printed on a higher resolution binary device, such as a laser printer. If we were to use pixel replication, the blocks representing the original pixels would be easily visible on the printed page. However, the blocks are much less noticeable if the edges are smoothed, using an interpolated grayscale scaling routine followed by binarization. See below for details