NEXT-GENERATION WEB SEARCHES FOR VISUAL CONTENT

This article describes some of the features of Web search for visual content.

Source: Next-Generation Web Searches for Visual Content


Michael S. Lew

Although visual media account for fully 73 percent of the Web's content, search engines such as ImageScape are only now beginning to sort through these images efficiently.

Major search engines such as HotBot (http://www.hotbot.com) help us find text on the Web, but typically have few or no capabilities for finding visual media. Yet many Web users—such as magazine editors or professional Web site designers—need to find images using just a few global features. With hundreds of millions of sites to search through, and 73 percent of the Web devoted to images, finding exactly the image you need can be a daunting task.
My colleagues and I developed a prototype system called ImageScape (http://skynet.liacs.nl) to find visual media over intranets and the Web. The system integrates technologies such as vector-quantization-based compression of the image database and k-d trees for fast searching over high-dimensional spaces. ImageScape allows queries for images using
•   keywords,
•   semantic icons, and
•   user-drawn sketches.
Keyword queries offer perhaps the most intuitive query method because they directly relate to the user's vocabulary. Further, HTML provides the ALT field to specify descriptive text. For example, in the following HTML tag, the image of a whale is referenced by the filename, whale.jpg, and the ALT text.
<IMG SRC="whale.jpg" ALT="A Humpback Whale">
However, images frequently lack descriptive text, which eliminates the possibility of text-based searching. In this situation, only content-based methods—those that directly use an image's pictorial information—are feasible.
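Such ALT-based keyword indexing is straightforward to prototype. The following is a minimal sketch (the article does not describe ImageScape's actual indexer); it uses Python's standard html.parser module to collect filename/ALT pairs that a keyword index could then store.

from html.parser import HTMLParser

class AltTextIndexer(HTMLParser):
    """Collect (filename, ALT text) pairs from IMG tags for a keyword index."""
    def __init__(self):
        super().__init__()
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":                      # html.parser lowercases tag names
            a = dict(attrs)
            src, alt = a.get("src"), a.get("alt")
            if src and alt:                   # no ALT text -> cannot index by keyword
                self.entries.append((src, alt.lower()))

indexer = AltTextIndexer()
indexer.feed('<IMG SRC="whale.jpg" ALT="A Humpback Whale">')
print(indexer.entries)                        # [('whale.jpg', 'a humpback whale')]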

PICTORIAL-CONTENT-BASED QUERIES


In the early to mid-1990s, IBM's highly influential QBIC system conducted visual searches for similar images on picture databases. This paradigm, shown in Figure 1, displays an initial set of images. The user selects an image, then the search engine ranks the database images by similarity to the selected image with respect to color, texture, shape, or all of these criteria, as Figure 2 shows. This approach requires minimal specialized knowledge from the user, a significant advantage.
Web media search engines such as WebSEEk, PicToSeek, and ImageRover use the query-by-similar-images paradigm. However, they differ in how they find the initial set of images. In particular, WebSEEk and ImageRover use text queries to narrow the initial set of images, and PicToSeek asks the user to supply an initial image.
As with any query paradigm, query by similar images has its share of problems. First, it does not let the user run searches based on only part of an image. Suppose, for example, the image contains a person on a beach underneath a sunset. When the user clicks on the image, the system doesn't know whether the user wants to focus on the person, the beach, or the sunset. Further, the current generation of query-by-similar-image systems uses feature vectors based on global color schemes, texture, and shape. Unfortunately, images that have the same global features can have different picture content. Using local features can overcome this problem and help detect visual concepts such as faces and beaches.
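To make the global-feature ranking concrete, here is a minimal sketch. It assumes an 8-bin intensity histogram as the global feature and Euclidean distance as the similarity measure; real systems use richer color, texture, and shape vectors.

import math

def histogram(pixels, bins=8):
    """Global intensity histogram, normalized so images of different
    sizes are comparable (a stand-in for the color/texture/shape
    feature vectors the article describes)."""
    h = [0.0] * bins
    for p in pixels:                    # p: intensity in 0..255
        h[p * bins // 256] += 1
    n = len(pixels)
    return [c / n for c in h]

def distance(f, g):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))

def rank_by_similarity(query_pixels, database):
    """database: name -> pixel list; returns names, most similar first."""
    q = histogram(query_pixels)
    return sorted(database, key=lambda name: distance(q, histogram(database[name])))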
In contrast, ImageScape does not touch upon the query-by-similar-images paradigm. It focuses on techniques for learning visual concepts so that it can use the query-by-icons paradigm. In this paradigm, the user places the icons on a canvas in the position where they should appear in the goal image. Doing so allows the user to explicitly create a query for images of people under a sky, for example. In this context, the database images must be preprocessed for the locations of the available object or concept associated with each icon. The system then returns those database images most similar to the content of objects and concepts specified in the iconic user query. The query-by-icons paradigm has the advantages that users can make a query using their own vocabulary and they can specify the importance of local pictorial features.
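A minimal sketch of how iconic queries might be matched against preprocessed images follows. It assumes the canvas and each image are divided into a grid and that preprocessing has already labeled each grid cell with the concepts detected there; the scoring function is illustrative, not ImageScape's actual measure.

def icon_query_score(query_icons, image_labels):
    """Score one preprocessed database image against an iconic query.

    query_icons:  list of (concept, row, col) -- icons the user placed
                  on a canvas divided into grid cells.
    image_labels: (row, col) -> set of concepts detected in that cell
                  during preprocessing, e.g. {'sky'} or {'face', 'sand'}.
    Returns the fraction of icons whose concept was detected in the
    corresponding cell.
    """
    hits = sum(1 for concept, r, c in query_icons
               if concept in image_labels.get((r, c), set()))
    return hits / len(query_icons) if query_icons else 0.0

# Query: people under a sky -- 'sky' in the top row, 'person' below it.
query = [("sky", 0, 1), ("person", 2, 1)]
labels = {(0, 1): {"sky"}, (2, 1): {"person", "sand"}}
print(icon_query_score(query, labels))        # 1.0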
We also investigated the query-by-sketch paradigm. In this paradigm, the user creates a query by drawing a rough sketch of the goal image, with the assumption that the sketch will correspond to the object edges and contours. The system then returns the database images with shapes that most closely resemble the user sketch. Sketch queries thus allow the user to directly specify which part of the image is important. Making effective sketch-based queries requires a robust shape matcher.

IMAGESCAPE SYSTEM OVERVIEW


In the ImageScape system, we chose to focus on text and visual media because they are the Web's dominant media. Figure 3 shows the system overview, including the relationships between server, client, and the Web. Continuously sending agents to the Web, the ImageScape system retrieves text, image, and video information.

When ImageScape brings an image to the server, pattern recognition algorithms detect features such as faces, sand, water, and so on, that pertain to the semantic icons and computer sketches. The analysis module creates a thumbnail, a low-resolution copy of the image requiring minimal storage space, and stores the feature vectors in an optimized representation for searching. When a user sends an image query from a Web-based Java browser or client program to the server, the matcher module compares the sketches or semantic icons to the feature database and sends the best-ranked images back to the browser. The primary modules consist of

•   vector-quantization-based database compression,
•   sketch queries and computer-generated sketches from images,
•   visual-concept detection,
•   matching of the icons or sketches with the database images, and
•   Java client connection to the host server for visual query input and processing, and the collection and indexing of the media from the Web.

WEB-BASED MEDIA COLLECTION, INDEXING, AND STORAGE


We can visualize the Web as a graph in which the nodes are Web sites and the edges are hyperlinks at those sites. ImageScape's search procedure performs a priority-based breadth-first search on the hyperlinks found from an initial set of Web sites. The priority is proportional to the site's rate of change and query rate.

Sites that are more likely to have changed since the last visit receive greater priority for a revisit. Further, sites that appear more frequently in the query results also receive greater priority. The Robot Exclusion Protocol also constrains Web searches by specifying the directories the robot can download.
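The crawl strategy can be sketched with a priority queue, as below. The equal priority weights and the fetch_links helper are assumptions for illustration; the robots.txt check uses Python's standard urllib.robotparser.

import heapq
from urllib import robotparser

def priority(change_rate, query_rate, w_change=0.5, w_query=0.5):
    # Priority proportional to the site's rate of change and its query
    # rate; the equal weights are an illustrative assumption.
    return w_change * change_rate + w_query * query_rate

def crawl(seed_sites, stats, fetch_links):
    # stats: site URL -> (change_rate, query_rate)
    # fetch_links: hypothetical helper returning the hyperlinks at a site
    heap = [(-priority(*stats.get(s, (0.0, 0.0))), s) for s in seed_sites]
    heapq.heapify(heap)
    visited = set(seed_sites)
    while heap:
        _, site = heapq.heappop(heap)             # most promising site first
        rp = robotparser.RobotFileParser(site + "/robots.txt")
        rp.read()                                 # Robot Exclusion Protocol
        if not rp.can_fetch("ImageScapeBot", site):
            continue                              # off-limits to robots
        for link in fetch_links(site):
            if link not in visited:
                visited.add(link)
                heapq.heappush(heap, (-priority(*stats.get(link, (0.0, 0.0))), link))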
When the robot downloads the images, the system reduces them to thumbnails and stores them in a compressed, vector-quantization-based database. The system stores similar image blocks with pointers instead of copies.
Storing the media in a compressed database offers the dual advantages of lower storage costs and faster reads from magnetic storage devices. The feature vectors used for indexing the images are stored in k-d trees. These trees are binary-tree representations of the feature space that have near-logarithmic search performance for finding nearest neighbors or similar images in high-dimensional spaces.
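For illustration, a minimal k-d tree with nearest-neighbor search might look like the following sketch. A production system would use an optimized implementation, but the structure—split on one dimension per level, prune subtrees that cannot contain a closer point—is what yields the near-logarithmic behavior.

def build_kdtree(points, depth=0):
    """Binary-tree partition of the feature space: split on one
    coordinate per level, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Nearest-neighbor search; prunes a subtree when the splitting
    plane is farther away than the best match found so far."""
    if node is None:
        return best
    point, left, right = node
    dist = sum((a - b) ** 2 for a, b in zip(point, target))
    if best is None or dist < best[0]:
        best = (dist, point)
    axis = depth % len(target)
    near, far = (left, right) if target[axis] < point[axis] else (right, left)
    best = nearest(near, target, depth + 1, best)
    if (target[axis] - point[axis]) ** 2 < best[0]:   # plane may hide a closer point
        best = nearest(far, target, depth + 1, best)
    return best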

SKETCH QUERIES


Our sketch search engine compresses the user sketch at the Java client, sends it to the shape-matching engine, decompresses it, then compares it to each image in the database based on shape similarity. Consequently, the most similar database images are returned to the Java client at the Web browser. The prevalent question is how to measure the shape similarity between the user sketch and a database image. Our starting point for shape comparison was the theory of invariant moments. We derived the moment invariants from shape-statistical moments by normalizing first by the centroid and then by the shape area. Moment invariants have proven useful in two-dimensional shape recognition and can be implemented in real time. However, they can be sensitive to small changes in the shape contour, which we refer to as the local-shape-matching problem.
To solve this problem, we turned to the theory of active contours. Specifically, an active contour is a spline that deforms to fit the particular image based on internal and external forces. The internal forces of the active contour hold the active contour together (elasticity forces) and keep it smooth (bending forces). The external forces guide the active contour toward image features such as high-intensity gradients or edges. The optimal contour position is computed to minimize total energy. The deformation energy is the total energy required to move the active contour from its initial position to its final position. We use the deformation energy to measure the shape similarity between closely matching shapes.
In summary, we split the shape-matching process into two parts: We address global shape matching with moment invariants and measure local shape matching by elastic deformation energy, as Figure 4 shows.
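As an illustration of the global part, the following sketch computes the first two Hu moment invariants of a binary shape, normalizing the central moments by the centroid and then by the shape area as described above; the article does not specify which invariants ImageScape actually uses.

def moment_invariants(shape):
    """First two Hu moment invariants of a binary shape given as a
    list of (x, y) pixel coordinates."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    def mu(p, q):                 # central moment: normalized by the centroid
        return sum((x - cx) ** p * (y - cy) ** q for x, y in shape)
    def eta(p, q):                # then normalized by the shape area, mu(0, 0)
        return mu(p, q) / mu(0, 0) ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return (phi1, phi2)           # invariant to translation, scale, rotation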

Figure 5 shows examples of sketch queries and results for the letter V and for a question mark symbol. The results for the V show a variety of database images with roughly similar shapes. For the question mark symbol sketch, the query found several different question mark symbols.

LEARNING VISUAL CONCEPTS AND SEMANTIC QUERIES


Visual-concept detection is essential for the ImageScape search engine because it lets the computer understand our notion of an object or concept. For example, to find an image with a beach under a blue sky, most systems require the user to translate the concept of beach to a particular color and texture. In our system, the user has access to icons that represent concepts such as blue sky and beach. The user can place these icons spatially on a canvas to create a query for a beach under blue sky.
Rosalind Picard reported promising results in classifying blocks in an image into "at a glance" categories, which people can classify without logically analyzing the content. Picard's method exploits the strengths of multiple feature models. More recently, Aditya Vailaya, Anil Jain, and Hong Jiang Zhang reported success in classifying images as city versus landscape. They found that edge direction features are effective because city images typically have long lines along the buildings and streets. Natural scenes can be separated because the edges typically curve or consist of short lines from the contours of hills, trees, or grass. Regarding object detection, the recent surge in face recognition research has motivated the development of robust methods for face detection in complex scenery. These methods use techniques such as positive and negative face clusters, neural networks, and information theory.
As an example of human face detection, we use a method that finds human faces in complex backgrounds, then we extend the method to include color, texture, and shape. The Kullback relative information is generally regarded as one of the canonical methods of measuring discriminatory power—how effectively a feature discriminates between two classes. Specifically, we formulated the problem as discriminating between the classes of face and nonface, as Figure 6 shows, and used the Kullback relative information to measure the class separation, which is the distance between the classes in feature space.

For each pixel, we calculate the Kullback relative information based on the class intensity distributions. The brighter pixels have greater relative information or class separation. The greater the class separation, the easier it is to discriminate between classes. In Figure 6, the image on the right shows that the eye regions have greater discriminatory power than the nose region.

Detecting the faces begins by passing a window over multiple scales—copies of the image at different resolutions—and classifying the window's contents as face or nonface. We perform the classification by using a minimum-distance classifier in the feature space defined by the most discriminatory features found from the Kullback relative information.
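The detection loop itself can be sketched as follows, assuming the class means and the set of most discriminatory pixel positions have already been estimated from training data; the 19-pixel window size is an illustrative assumption.

def classify_window(window, face_mean, nonface_mean, selected):
    """Minimum-distance classification of one window, comparing only
    the pixels selected for high Kullback relative information."""
    d_face = sum((window[i] - face_mean[i]) ** 2 for i in selected)
    d_nonface = sum((window[i] - nonface_mean[i]) ** 2 for i in selected)
    return d_face < d_nonface                 # True -> face

def detect_faces(scales, face_mean, nonface_mean, selected, win=19):
    """Pass a win x win window over every scale (copies of the image
    at different resolutions) and collect windows classified as faces."""
    hits = []
    for s, image in enumerate(scales):        # image: 2-D list of intensities
        rows, cols = len(image), len(image[0])
        for r in range(rows - win + 1):
            for c in range(cols - win + 1):
                window = [image[r + i][c + j]
                          for i in range(win) for j in range(win)]
                if classify_window(window, face_mean, nonface_mean, selected):
                    hits.append((s, r, c))    # (scale, row, col) of a face
    return hits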

GENERALIZING TO MULTIPLE MODELS


In face detection, we used pixels, which have the greatest class separation or discriminatory power. Instead of finding the pixels that maximize the class separation, we found the color, texture, and shape features that maximize class separation and minimize the correlation between features. We define this set of features as a discriminatory model. Note that minimizing correlation between features is important because the minimum-distance classifier assumes that the features are independent. In summary, we define the visual learning algorithm shown in Figure 7 as follows:
1.   Assume that there are M scalar features, each of which has been normalized to 0 to 255.
2.   Measure the distribution of the positive examples, F[x, y], x = 1 to M; y = 0 to 255.
3.   Measure the distribution of the negative examples, G[z, v], z = 1 to M; v = 0 to 255.
4.   Calculate the Kullback relative information, KI, from F and G (a sketch of these four steps follows the list).
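Here is a minimal sketch of steps 1 through 4, assuming each example is a vector of M feature values in 0 to 255; the asymmetric form of the relative information is an assumption, since the article does not give the exact formula.

import math

def kullback_relative_information(pos_examples, neg_examples, M, eps=1e-9):
    """Histogram each of the M features over the positive and negative
    examples (steps 2 and 3), then compute, per feature x,
    KI[x] = sum over y of F[x, y] * log(F[x, y] / G[x, y])  (step 4)."""
    def distributions(examples):
        h = [[eps] * 256 for _ in range(M)]   # eps avoids log(0)
        for vector in examples:               # vector: M values in 0..255
            for x, value in enumerate(vector):
                h[x][value] += 1
        out = []
        for row in h:                         # normalize to a distribution
            total = sum(row)
            out.append([c / total for c in row])
        return out
    F = distributions(pos_examples)           # positive (face) examples
    G = distributions(neg_examples)           # negative (nonface) examples
    return [sum(f * math.log(f / g) for f, g in zip(F[x], G[x]))
            for x in range(M)]                # KI per feature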

References


[1]  E. Chang, K.-T. Cheng, W.-C. Lai, C.-T. Wu, C.-W. Chang, and Y.-L. Wu. PBIR — a system that learns subjective image query concepts. Proceedings of ACM Multimedia, http://www.mmdb.ece.ucsb.edu/~demo/corelacm/, pages 611-614, October 2001.
[2]  E. Chang and B. Li. Mega — the maximizing expected generalization algorithm for learning complex query concepts (extended version). Technical Report, http://www-db.stanford.edu/~echang/mega-extended.pdf, November 2000.
[3]  A. Gersho and R. Gray. Vector Quantization and Signal Compression. Kluwer Academic, 1991.
[4]  B. Li, E. Chang, and C.-S. Li. Learning image query concepts via intelligent sampling. Proceedings of IEEE Multimedia and Expo, August 2001.
[5]  Y. Rui, T. S. Huang, and S.-F. Chang. Image retrieval: Current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation, March 1999.
[6]  S. Tong and E. Chang. Support vector machine active learning for image retrieval. Proceedings of ACM International Conference on Multimedia, pages 107-118, October 2001.
