Survey of treemap techniques

by Thomas Kerwin

Источник: http://www.cse.ohio-state.edu/~kerwin/treemap-survey.html

The Beginning

The original invention of the treemap visual layout was developed by Ben Schniderman as a way to more easily get a feel for the directory structure on his computer. The traditional approach to displaying tree data utilizes a symbol for each node and lines connecting the nodes for each edge in the tree. This does work, and works very well for small trees. However, when the same approach is used for large trees with hundreds of nodes, the display quickly becomes cluttered with nodes and lines. There are two obvious solutions to this problem, zoom out on the nodes so all of them can fit on the display, or let the user pan around a full size tree. Neither of these are very satisfactory.

Schniderman came up with a different way to show hierarchical data. The treemap starts with a rectangular space and subdivides it recursively. The initial rectangle represents the root node of the tree. That space is divided into a number of horizontally aligned subrectangles, each representing a child of the root node. On each recursive call, the direction of subdivision is rotated: horizontal, then vertical, then horizontal.

This way of representing a tree gives two main benefits. The first is that the main connectivity relationship of the tree, ancestry, is shown with unambiguous graphical relationships. Children of a node are enclosed within that node and parents of a node contain that node. The other benefit is that there is little wasted space in the diagram. The entire original rectangle has been filled with smaller rectangles.

The treemap is an especially good visualization for tree structures in which each leaf node has a weight value associated with it. A weight value is any numeric value that all the nodes share and can be aggregated. This might be file size, in the case of Schniderman's original problem, or physical weight, monetary value, sugar content, etc. The weight of the nodes can be used to influence their size on the screen. A low weight node might be displayed as a very small rectangle next to a high weight sibling. And just like the traditional method of displaying trees, the color and style of the rectangle can be used to convey information about the node.

Subdivision Algorithms

After the initial development of the treemap, many extensions were added to the basic idea. One class of improvements regards the nature of the subdivisions that are used to section the rectangles. Standard treemaps often produce many elongated rectangles, especially when there are many children of a particular node. This was seen as a problem by Mark Bruls, Kees Huizing, and Jarke J. van Wijk, the developers of the squarified treemap subdivision algorithm. Thin rectangles can be difficult to interact with: clicking on a particular one with a mouse takes considerable manual dexterity and comparison of the sizes of two rectangles is difficult when the aspect ratios are significantly different. The squarified treemap attempts to fix these problem by dividing the space into regions having an aspect ratio close to one. In the same paper, the authors introduce frames around nodes as a way to enhance the structure of the treemap. This is needed when using squarified treemaps since child nodes are no longer laid out in a simple, linear manner. This manner provides users of the original treemap layout with cues for identifying sibling relationships. The frame approach gives explicit information that replaces these cues.

Fig.1 – An example of the original treemap algorithm

The squarified treemap does fix the problems with thin rectangles but does not necessarily preserve the natural ordering of the original dataset. In situations where keeping the natural order of the data in the visual display is needed, a squarified view might not be a good choice. In response, Schniderman developed two related subdivision algorithms that give lower aspect ratios than his original algorithm but also maintain ordering information. They do not give average aspect ratios as low as squarified treemaps, but they can provide a useful alternative when the order of the children of a node conveys some relevant information.

Fig.2 – An example of the squarified algorithm, using the same data as above

Visual Improvements

In an attempt to enhance both user satisfaction with the appearance of treemaps and user intuition about hierarchy information, Jarke J. van Wijk and Huub van de Wetering developed a shaded treemap display they called the cushion treemap. The cushion treemap was developed to help users distinguish detail in deeply nested hierarchies. When treemaps become very deep, it becomes increasingly difficult to identify the different levels of the treemap visually. Also, levels become difficult to distinguish in squarified treemaps of well-balanced trees.

The cushion treemap algorithm is an illumination algorithm. At each subdivision level, a height function is added to the smaller rectangles. This height function is accumulated through many subdivisions, with each subdivision providing a higher frequency but lower amplitude component. When no more subdivisions can be done, the node is drawn on the screen with per-pixel illumination. The height function provides the normal for each pixel in the leaf node.

The developers of the cushion treemap have a freely downloadable implementation of the technique called SequoiaView. It can be used to view the file distribution on a Windows system.

Fig.3 – Cushion treemap

A more dramatic change to the rendering of treemaps is seen in the SunBurst visualization. SunBurst was developed by John Stasko, and provides a radial, rather than planar, layout of the tree. A benefit of this technique is that it provides much more room to display information about non-leaf nodes than the traditional treemap display does. The radial layout shows the hierarchy of the tree more explicitly than does the traditional treemap. A line from any node to the center of the SunBurst map will intersect all ancestors of that node. In the rectangular treemap, hierarchical information is only displayed implicitly in the nesting of the nodes. Although siblings can be easy to identify with rectangular treemaps, it is often difficult to find a file given a certain ancestry path. This approach also makes it very easy to identify leaf nodes, since they appear at the extremities of the radial layout. However, this does not make as efficient use of space as the rectangular treemaps, for two reasons. One is that much space is taken up with displaying non-leaf nodes close to the root. This could be useful if extra data about those nodes can be displayed in the space, but if there is no use for this space, it is wasted. The second reason is simply that computer displays are rectangular. As a consequence, when viewing any circular diagram, either the diagram is clipped or there is wasted space in the corners of the output device.

Three-Dimensional Extensions

There have been two main attempts to extend the two-dimensional display of a treemap into three dimensions. Displaying in three dimensions certainly has some inherent advantages, such as more intuitive roaming and zooming.

One extension into 3D is the information cube designed by Jun Rekimoto and Mark Green. The information cube uses a hierarchy of nested rectangular prisms to represent the tree instead of nested rectangles. The benefit of this representation over two-dimensional treemaps is data density. In a three-dimensional space,

The nested box approach of the information cube comes with significant problems. The most basic is occlusion. Rendering solid prisms prevents the user from viewing child nodes, since they are within an opaque object. However, rendering wireframe cubes proves too confusing. The authors' solution is to render the prisms as semi-transparent objects. Another problem involves selecting nodes in the tree. Picking an object in three-dimensions with a two dimensional pointing device has inherent ambiguities. In the information cube, these ambiguities can not be easily rectified by simple transformation of the cube and must be addressed using more complex techniques.

Another three-dimensional extension of the treemap is StepTree, developed by Thomas Bladh, David A. Carr, and Jeremiah Scholl. In StepTree, the algorithm starts with a large, flat rectangular prism which represents the root node. To display children nodes, the prism is subdivided into blocks using a variant of the squarified algorithm. However, the new prisms are translated in the direction perpendicular to the large face on the root prism. In this way, a stairstep-like geometry is created.

Fig.4 – StepTree

StepTree does not give the data density that the information cube approach does, since it is an extrusion of a two-dimensional layout into three dimensions. However, the occlusion problems seen in the information cube is less of a factor with this technique. Occlusion issues can be solved by rotating and translating the display and an unambiguous view for picking an object can always be found. In addition, it can give some additional utility to users. Distinguishing nodes at high levels of the tree from those at low levels of the tree becomes much easier. Thomas Bladh has a implemenation of the algorithm free to download on his site.

Application of Treemaps

Treemaps have been used for a wide array of hierarchical data. NewsMap is an online application that takes news topics from the front page of Google News and displays them in a treemap format. Topics are divided into different regions by category and their size is controled by the number of related articles reported by Google.

Fig.5 – Google News

A similar shallow treemap is used by Smartmoney.com in their "Map of the Market" application. In this application, different stocks are divided into different regions based on the sector of the market that they are in. The size of each stock on the screen is controled by the market capitalization of the company. Finally, the current daily change in value of the stock is shown by the color of the node. The colors are in a gradient scale from red (meaning negative change in value) to green (meaning positive change in value). In this way, users can get a good idea about performance of different market sectors and visually identify anomalous behavior, such as a stock doing very poorly in a sector of good performers.

Stand-alone applications like the previously mentioned SequoiaView and StepTree often use treemaps for the visualization of hard drive space. This is common because the treemap layout was originally designed to help users with this particular task. I personally have found this application of treemaps very useful in finding large, now unneeded files on my disk. Shniderman's research group has put out a implementation of treemap called simply Treemap free for non-commercial use. Treemap has disk space viewing functionality and the ability to view various generic types of tree files.

Shniderman's group also has a specialized version of treemap called PhotoMesa. This application is designed to aid navigation through directories of photos on disk and group them according to different user-supplied criteria. The software is available for purchase, but a trial version is available as well.

Panopticon is a software suite and SDK designed for use in the business world. It has tools to create treemaps based on Excel spreadsheets, but is also designed to give a platform for programmers to create business-related treemap applications.

There are several other general treemap platforms available for programmers. TreeMap Java Library and JTreemap are both written in java and provide treemap widgets and data structures. The InfoVis Toolkit includes treemap displays along with the traditional node and edge tree display and a number of other information visualization tools.

Conclusion

Treemaps have a low learning curve, both for users of applications that take advantage of them and for programmers who implement them. The basic idea is easy to understand. They can add to insights in viewing either shallow or deep trees. They can also provide users with an overview of a large and complex tree structure more quickly than a traditional node and edge display can. The efficient use of space lessens the need for panning and zooming. Treemaps are very good at displaying trees where the data allows aggregation of child weights into the weight of the parent node.

Although the basic treemap is easy to implement, there are many small variations on the theme that can be performed, either in data subdivision or data display. This makes it an interesting and rich area of further research. Trees are a common structure of data in the world, whether it's business data, filesystem data, or biological data and treemaps will continue to be a popular method to visualize those data.