What is Virtual Reality

Author: Jerry Isdale

Source: http://www.viz.tamu.edu/faculty/parke/ends489f00/notes/what_vr.html

I. What is Virtual Reality

The term Virtual Reality (VR) is used by many different people with many meanings. There are some people to whom VR is a specific collection of technologies, namely a Head Mounted Display, Glove Input Device and Audio. Some other people stretch the term to include conventional books, movies or pure fantasy and imagination. The NSF taxonomy mentioned in the introduction can cover these as well. However, as a matter of personal preference and for the purposes of this paper, I restrict VR to computer-mediated systems. The best definition of Virtual Reality I have seen to date comes from the book "The Silicon Mirage":

"Virtual Reality is a way for humans to visualize, manipulate and interact with computers and extremely complex data"

The visualization part refers to the computer generating visual, auditory or other sensory outputs to the user of a world within the computer. This world may be a CAD model, a scientific simulation, or a view into a database. The user can interact with the world and directly manipulate objects within the world. Some worlds are animated by other processes, perhaps physical simulations, or simple animation scripts. Interaction with the virtual world, at least with near real time control of the viewpoint, is in my opinion a critical test for a 'virtual reality'.

Some people object to the term "Virtual Reality", saying it is an oxymoron. Other terms that have been used are Synthetic Environments, Cyberspace, Artificial Reality, Simulator Technology, etc. VR is the most common and sexiest. It has caught the attention of the media.

The applications being developed for VR run a wide spectrum, from games to architectural and business planning. Many applications are worlds that are very similar to our own, like CAD or architectural modeling. Some applications provide ways of viewing from an advantageous perspective not possible with the real world, like scientific simulators, telepresence systems and air traffic control systems. Other applications are much different from anything we have ever directly experienced before. These latter applications may be the hardest and most interesting systems: visualizing the ebb and flow of the world's financial markets, navigating a large corporate information base, etc.

I.1. Types of VR Systems

A major distinction of VR systems is the mode with which they interface to the user. This section describes some of the common modes used in VR systems.

I.1.1. Window on World Systems (WoW)

Some systems use a conventional computer monitor to display the visual world. This is sometimes called Desktop VR or a Window on a World (WoW). This concept traces its lineage back through the entire history of computer graphics. In 1965, Ivan Sutherland laid out a research program for computer graphics in a paper called "The Ultimate Display" that has driven the field for nearly thirty years.

"One must look at a display screen," he said, "as a window through which one beholds a virtual world. The challenge to computer graphics is to make the picture in the window look real, sound real and the objects act real." [quoted from Computer Graphics V26#3]

I.1.2. Video Mapping

A variation of the WoW approach merges a video input of the user's silhouette with a 2D computer graphic. The user watches a monitor that shows his body's interaction with the world. Myron Krueger has been a champion of this form of VR since the late 1960s. He has published two books on the subject: "Artificial Reality" and "Artificial Reality II". At least one commercial system uses this approach, the Mandala system. This system is based on a Commodore Amiga with some added hardware and software. A version of the Mandala is used by the cable TV channel Nickelodeon for a game show (Nick Arcade) to put the contestants into what appears to be a large video game.

I.1.3. Immersive Systems

The ultimate VR systems completely immerse the user's personal viewpoint inside the virtual world. These "immersive" VR systems are often equipped with a Head Mounted Display (HMD). This is a helmet or a face mask that holds the visual and auditory displays. The helmet may be free ranging, tethered, or it might be attached to some sort of a boom armature.

A nice variation of the immersive systems uses multiple large projection displays to create a 'Cave' or room in which the viewer(s) stand. An early implementation was called "The Closet Cathedral" for the ability to create the impression of an immense environment within a small physical space. The Holodeck used in the television series "Star Trek: The Next Generation" is a far-term extrapolation of this technology.

I.1.4. Telepresence

Telepresence is a variation on visualizing complete computer generated worlds. This technology links remote sensors in the real world with the senses of a human operator. The remote sensors might be located on a robot, or they might be on the ends of WALDO-like tools. Fire fighters use remotely operated vehicles to handle some dangerous conditions. Surgeons are using very small instruments on cables to do surgery without cutting a major hole in their patients. The instruments have a small video camera at the business end. Robots equipped with telepresence systems have already changed the way deep sea and volcanic exploration is done. NASA plans to use telerobotics for space exploration. There is currently a joint US/Russian project researching telepresence for space rover exploration.

I.1.5. Mixed Reality

Merging the Telepresence and Virtual Reality systems gives the Mixed Reality or Seamless Simulation systems. Here the computer generated inputs are merged with telepresence inputs and/or the user's view of the real world. A surgeon's view of a brain surgery is overlaid with images from earlier CAT scans and real-time ultrasound. A fighter pilot sees computer generated maps and data displays inside his fancy helmet visor or on cockpit displays.

I.1.6. Fish Tank Virtual Reality

The phrase "fish tank virtual reality" was used to describe a Canadian VR system reported in the 1993 InterCHI proceedings. It combines a stereoscopic monitor display using LCD shutter glasses with a mechanical head tracker. The resulting system is superior to simple stereo-WoW systems due to the motion parallax effects introduced by the head tracker. (See INTERCHI '93 Conference Proceedings, ACM Press/Addison Wesley, ISBN 0-201-58884-6.)

I.2. VR Hardware

There are a number of specialized types of hardware devices that have been developed or used for Virtual Reality applications.

I.2.1. Image Generators

One of the most time consuming tasks in a VR system is the generation of the images. Fast computer graphics opens a very large range of applications aside from VR, so there has been a market demand for hardware acceleration for a long while. There are currently a number of vendors selling image generator cards for PC level machines; many of these are based on the Intel i860 processor. These cards range in price from about $2,000 up to $6,000 or $10,000. Silicon Graphics Inc. has made a very profitable business of producing graphics workstations. SGI boxes are some of the most common processors found in VR laboratories and high end systems. SGI boxes range in price from under $10,000 to over $100,000. The simulator market has produced several companies that build special purpose computers designed expressly for real time image generation. These computers often cost several hundreds of thousands of dollars.

I.2.2. Manipulation and Control Devices

One key element for interaction with a virtual world is a means of tracking the position of a real world object, such as a head or hand. There are numerous methods for position tracking and control. Ideally a technology should provide 3 measures of position (X, Y, Z) and 3 measures of orientation (roll, pitch, yaw). One of the biggest problems for position tracking is latency, or the time required to make the measurements and preprocess them before input to the simulation engine.

The simplest control hardware is a conventional mouse, trackball or joystick. While these are two dimensional devices, creative programming can use them for 6D controls. There are a number of 3 and 6 dimensional mice/trackball/joystick devices being introduced to the market at this time. These add some extra buttons and wheels that are used to control not just the XY translation of a cursor, but its Z dimension and rotations in all three directions. The Global Devices 6D Controller is one such 6D joystick. It looks like a racquetball mounted on a short stick. You can pull and twist the ball in addition to the left/right & forward/back of a normal joystick. Other 3D and 6D mice, joysticks and force balls are available from Logitech and Mouse Systems Corp., among others.

One common VR device is the instrumented glove. The use of a glove to manipulate objects in a computer is covered by a basic patent in the USA. Such a glove is outfitted with sensors on the fingers as well as an overall position/orientation tracker. There are a number of different types of sensors that can be used. VPL (holders of the patent) made several DataGloves, mostly using fiber optic sensors for finger bends and magnetic trackers for overall position. Mattel manufactured the PowerGlove for a short time for use with the Nintendo game system. This device is easily adapted to interface to a personal computer. It provides some limited hand location and finger position data using strain gauges for finger bends and ultrasonic position sensors. The gloves are getting rare, but some can still be found at Toys R Us and other discount stores. Anthony Clifton recently posted this suggestion for a "very good resource for PowerGloves etc.: small children. A friend's son had gotten a glove a couple years ago and almost NEVER used it, so I bought it off the kid. Remember children like money more than toys they never use."

The concept of an instrumented glove has been extended to other body parts. Full body suits with position and bend sensors have been used for capturing motion for character animation, control of music synthesizers, etc. in addition to VR applications.

I.2.3. Position Tracking

Mechanical armatures can be used to provide fast and very accurate tracking. Such armatures may look like a desk lamp (for basic position/orientation) or they may be highly complex exoskeletons (for more detailed positions). The drawbacks of mechanical sensors are the encumbrance of the device and its restrictions on motion. Exos Systems builds one such exoskeleton for hand control. It also provides force feedback. Shooting Star system makes a low cost armature system for head tracking. Fake Space Labs and LEEP Systems make much more expensive and elaborate armature systems for use with their display systems.

Ultrasonic sensors can be used to track position and orientation. A set of emitters and receivers are used with a known relationship between the emitters and between the receivers. The emitters are pulsed in sequence and the time lag to each receiver is measured. Triangulation gives the position. Drawbacks to ultrasonics are low resolution, long lag times and interference from echoes and other noises in the environment. Logitech and Transition State are two companies that provide ultrasonic tracking systems.
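
The distance-based position calculation described above (often called trilateration) can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of any particular product: the speed of sound is taken as a constant 343 m/s, the four emitters are assumed to sit at the origin and at known offsets a, b and c along the X, Y and Z axes, and the function names are hypothetical. It recovers the position of a single receiver only; orientation would need several receivers.

```python
import math

# Assumed constant: speed of sound in air at roughly 20 degrees C, in m/s.
SPEED_OF_SOUND = 343.0


def distances_from_time_lags(time_lags_s):
    """Convert measured emitter-to-receiver time lags (seconds) to distances (metres)."""
    return [SPEED_OF_SOUND * t for t in time_lags_s]


def locate_receiver(d0, d1, d2, d3, a, b, c):
    """Locate a receiver from its distances to four emitters.

    Assumed emitter layout (hypothetical): E0 at the origin, E1 at (a, 0, 0),
    E2 at (0, b, 0), E3 at (0, 0, c). Subtracting the sphere equation of E0
    from that of each other emitter leaves one linear equation per axis.
    """
    x = (d0 ** 2 - d1 ** 2 + a ** 2) / (2.0 * a)
    y = (d0 ** 2 - d2 ** 2 + b ** 2) / (2.0 * b)
    z = (d0 ** 2 - d3 ** 2 + c ** 2) / (2.0 * c)
    return (x, y, z)


# Usage: simulate the time lags for a known receiver position, then recover it.
emitters = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
true_position = (0.3, 0.4, 0.5)
time_lags = [math.dist(true_position, e) / SPEED_OF_SOUND for e in emitters]
estimate = locate_receiver(*distances_from_time_lags(time_lags), a=1.0, b=1.0, c=1.0)
print(estimate)
```

In a real system the measured lags are noisy and include echoes, which is one source of the low resolution mentioned above.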

Magnetic trackers use sets of coils that are pulsed to produce magnetic fields. The magnetic sensors determine the strength and angles of the fields. Limitations of these trackers are a high latency for the measurement and processing, range limitations, and interference from ferrous materials within the fields. However, magnetic trackers seem to be one of the preferred methods. The two primary companies selling magnetic trackers are Polhemus and Ascension.

Optical position tracking systems have been developed. One method uses a ceiling grid of LEDs and a head mounted camera. The LEDs are pulsed in sequence and the camera's image is processed to detect the flashes. Two problems with this method are limited space (grid size) and lack of full motion (rotations). Another optical method uses a number of video cameras to capture simultaneous images that are correlated by high speed computers to track objects. Processing time (and cost of fast computers) is a major limiting factor here. One company selling an optical tracker is Origin Instruments.

Inertial trackers have been developed that are small and accurate enough for VR use. However, these devices generally only provide rotational measurements. They are also not accurate for slow position changes.

I.2.4. Stereo Vision

Stereo vision is often included in a VR system. This is accomplished by creating two different images of the world, one for each eye. The images are computed with the viewpoints offset by the equivalent distance between the eyes. There are a large number of technologies for presenting these two images. The images can be placed side-by-side and the viewer asked (or assisted) to cross their eyes. The images can be projected through differently polarized filters, with corresponding filters placed in front of the eyes. Anaglyph images use red/blue glasses to provide a crude (no color) stereovision.
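
The viewpoint offset can be illustrated with a small sketch. It assumes an eye separation of 0.065 m (a typical adult value, used here only as a default) and a unit vector pointing to the viewer's right; the function name and parameters are hypothetical.

```python
def eye_positions(head_position, right_vector, eye_separation=0.065):
    """Return (left_eye, right_eye) viewpoints for rendering a stereo pair.

    head_position is the point midway between the eyes; right_vector is
    assumed to be a unit vector pointing to the viewer's right.
    """
    half = eye_separation / 2.0
    left_eye = tuple(h - half * r for h, r in zip(head_position, right_vector))
    right_eye = tuple(h + half * r for h, r in zip(head_position, right_vector))
    return left_eye, right_eye


# Usage: a viewer standing at the origin with eyes 1.7 m above the floor.
left_eye, right_eye = eye_positions((0.0, 1.7, 0.0), (1.0, 0.0, 0.0))
print(left_eye, right_eye)
```

Each frame is then rendered twice, once from each returned viewpoint.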

The two images can be displayed sequentially on a conventional monitor or projection display. Liquid Crystal shutter glasses are then used to shut off alternate eyes in synchronization with the display. When the brain receives the images in rapid enough succession, it fuses the images into a single scene and perceives depth. A fairly high display swapping rate (min. 60 Hz) is required to avoid perceived flicker. A number of companies made low cost LC shutter glasses for use with TVs (Sega, Nintendo, Toshiba, etc.). There are circuits and code for hooking these up to a computer available on many of the On-line systems, BBSs and Internet FTP sites mentioned later. However, locating the glasses themselves is getting difficult as none are still being made or sold for their original use. Stereographics sells a very nice commercial LC shutter system called CrystalEyes.

Another alternative method for creating stereo imagery on a computer is to use one of several split screen methods. These divide the monitor into two parts and display left and right images at the same time. One method places the images side by side and conventionally oriented. It may not use the full screen or may otherwise alter the normal display aspect ratio. A special hood viewer is placed against the monitor which helps position the eyes correctly and may contain a divider so each eye sees only its own image. Most of these hoods, such as the one for the V5 of Rend386, use Fresnel lenses to enhance the viewing. An alternative split screen method orients the images so the top of each points out the side of the monitor. A special hood containing mirrors is used to correctly orient the images. A very nice low cost (under $200) unit of this type is the Cyberscope available from Simsalabim.

I.2.5. Head Mounted Display (HMD)

One hardware device closely associated with VR is the Head Mounted Display (HMD).

These use some sort of helmet or goggles to place small video displays in front of each eye, with special optics to focus and stretch the perceived field of view. Most HMDs use two displays and can provide stereoscopic imaging. Others use a single larger display to provide higher resolution, but without the stereoscopic vision.

Most lower cost HMDs ($3,000-10,000 range) use LCD displays, while others use small CRTs, such as those found in camcorders. The more expensive HMDs ($60,000 and up) use special CRTs mounted alongside the head or optical fibers to pipe the images from non-head mounted displays. A HMD requires a position tracker in addition to the helmet. Alternatively, the display can be mounted on an armature for support and tracking (a Boom display).

I.2.6. Health Hazards from Stereoscopic Displays

There was an article supplement with CyberEdge Journal issue #17 entitled "What's Wrong with your Head Mounted Display". It is a summary report on the findings of a study done by the Edinburgh Virtual Environment Lab, Dept. of Psychology, Univ. of Edinburgh on the eye strain effects of stereoscopic Head Mounted Displays. There have been a number of anecdotal reports of stress with HMDs and other stereoscopic displays, but few, if any, good clinical studies. This study was done very carefully and the results are a cause for some concern.

The basic test was to put 20 young adults on a stationary bicycle and let them cycle around a virtual rural road setting using a HMD (VPL LX EyePhone and a second HMD LEEP optic equipped system). After 10 minutes of light exercise, the subjects were tested...

"The results were alarming: measures of distance vision , binocular fusion and convergence displayed clear signs of binocular stress in a significant number of the subjects. Over half the subjects also reported symptoms of such stress, such as blurred vision."

The article goes on to describe the primary reason for the stress - the difference between the image focal depth and the disparity. Normally, when your eyes look at a close object they focus (accommodate) close and also rotate inward (converge). When they accommodate on a far object, the eyes also diverge. However, a stereoscopic display keeps the effective focal plane fixed (it is set by the optics) while the disparity depth varies with the scene. The eyes strain to decouple the signals.

The article discusses some potential solutions, but notes that most of them (dynamic focal/disparity) are difficult to implement. It mentions monoscopic HMDs only to say that while they would seem to avoid the problems, they were not tested. The article does not discuss non-HMD stereoscopic devices at all, but I would extrapolate that they should show some similar problems. The full article is available from CyberEdge Journal for a small fee.

There has been a fair bit of discussion ongoing in the sci.virtual-worlds newsgroup (check the Sept./Oct. 93 archives) about this and some other studies. One contributor, Dipl.-Ing. Olaf H. Kelle, University of Wuppertal, Germany, reported only 10% of his users showing eye strain. His system is set up with a focal depth of 3m which seems to be a better, more comfortable viewing distance. Others have noted that long duration monitor use often leads to the user staring or not blinking. It is common for VDT users to be cautioned to look away from the screen occasionally to adjust their focal depth and to blink. Another contributor, John Nagle, provided the following list of other potential problems with HMDs: electrical safety, falling/tripping over real world objects, simulator sickness (disorientation due to conflicting motion signals from eyes and inner ear), eye strain, and induced post-HMD accidents ("[with] some flight simulators, usually those for military fighter aircraft, it's been found necessary to forbid simulator users to fly or drive for a period of time after flying the simulator").

I.3. Levels of VR Hardware Systems

The following defines a number of levels of VR hardware systems. These are not hard levels, especially towards the more advanced systems.

I.3.1. Entry VR (EVR)

The 'Entry Level' VR system takes a stock personal computer or workstation and implements a WoW system. The system may be based on an IBM clone (MS-DOS/Windows) machine or an Apple Macintosh, or perhaps a Commodore Amiga. The DOS type machines (IBM PC clones) are the most prevalent. There are Mac based systems, but few very fast rendering ones. Whatever the base computer, it includes a graphic display, a 2D input device like a mouse, trackball or joystick, the keyboard, hard disk & memory.

I.3.2. Basic VR (BVR)

The next step up from an EVR system adds some basic interaction and display enhancements. Such enhancements would include a stereographic viewer (LCD Shutter glasses) and an input/control device such as the Mattel PowerGlove and/or a multidimensional (3D or 6D) mouse or joystick.

I.3.3. Advanced VR (AVR)

The next step up the VR technology ladder is to add a rendering accelerator and/or frame buffer and possibly other parallel processors for input handling, etc. The simplest enhancement in this area is a faster display card. For the PC class machines, there are a number of new fast VGA and SVGA accelerator cards. These can make a dramatic improvement in the rendering performance of a desktop VR system. Other more sophisticated image processors based on the Texas Instruments TI34020 or Intel i860 processor can make even more dramatic improvements in rendering capabilities. The i860 in particular is in many of the high end professional systems. The Silicon Graphics Reality Engine uses a number of i860 processors in addition to the usual SGI workstation hardware to achieve stunning levels of realism in real time animation.

An AVR system might also add a sound card to provide mono, stereo or true 3D audio output. Some sound cards also provide voice recognition. This would be an excellent additional input device for VR applications.

I.3.4. Immersion VR (IVR)

An Immersion VR system adds some type of immersive display system: a HMD, a Boom, or multiple large projection type displays (Cave).

An IVR system might also add some form of tactile, haptic and touch feedback interaction mechanisms. The area of Touch or Force Feedback (known collectively as Haptics) is a very new research arena.

I.3.5. Cockpit Simulators

A common variation on VR is to use a Cockpit or Cab compartment to enclose the user. The virtual world is viewed through some sort of view screen and is usually either projected imagery or a conventional monitor. The cockpit simulation is very well known in aircraft simulators, with a history dating back to the early Link Flight Trainers (1929?). The cockpit is often mounted on a motion platform that can give the illusion of a much larger range of motion. Cabs are also used in driving simulators for ships, trucks, tanks and 'battle mechs'. The latter are fictional walking robotic devices (e.g. those in the Star Wars films). The BattleTech location based entertainment (LBE) centers use this type of system.

I.3.6. SIMNET, Defense Simulation Internet

One of the biggest VR projects is the Defense Simulation Internet. This project is a standardization being pushed by the USA Defense Department to enable diverse simulators to be interconnected into a vast network. It is an outgrowth of the Defense Advanced Research Projects Agency (DARPA) SIMNET project of the late 1980s. SIMNET was/is a collection of tank simulators (Cab type) that are networked together to allow unit tactical training. Simulators in Germany can operate in the same virtual world as simulators in the USA, partaking of the same battle exercise.

The basic Distributed Interactive Simulation (DIS) protocol has been defined by the Orlando Institute for Simulation & Training. It is the basis for the next generation of SIMNET, the Defense Simulation Internet (DSI). (love those acronyms!) An accessible, if somewhat dark, treatment of SIMNET and DSI can be found in the premier issue of WIRED magazine (January 1993) entitled "War is Virtual Hell" by Bruce Sterling.

I.4. Available VR Software Systems

There are currently quite a number of different efforts to develop VR technology. Each of these projects has different goals and approaches to the overall VR technology. Large and small University labs have projects underway (UNC, Cornell, U.Rochester, etc.). ARPA, NIST, National Science Foundation and other branches of the US Government are investing heavily in VR and other simulation technologies. There are industry supported laboratories too, like the Human Interface Technologies Laboratory (HITL) in Seattle and the Japanese NTT project. Many existing and startup companies are also building and selling world building tools (Autodesk, IBM, Sense8, VREAM).

There are two major categories for the available VR software: toolkits and authoring systems. Toolkits are programming libraries, generally for C or C++, that provide a set of functions with which a skilled programmer can create VR applications. Authoring systems are complete programs with graphical interfaces for creating worlds without resorting to detailed programming. These usually include some sort of scripting language in which to describe complex actions, so they are not really non-programming, just much simpler programming. The programming libraries are generally more flexible and have faster renderers than the authoring systems, but you must be a very skilled programmer to use them.

I.5. Aspects of A VR Program

Just what is required of a VR program? The basic parts of the system can be broken down into an Input Process, a Simulation Process, a Rendering Process, and a World Database. All these parts must consider the time required for processing. Every delay in response time degrades the feeling of 'presence' and reality of the simulation.

I.5.1. Input Processes

The Input Processes of a VR program control the devices used to input information to the computer. There are a wide variety of possible input devices: keyboard, mouse, trackball, joystick, 3D & 6D position trackers (glove, wand, head tracker, body suit, etc.). A networked VR system would add inputs received from the network. A voice recognition system is also a good augmentation for VR, especially if the user's hands are being used for other tasks. Generally, the input processing of a VR system is kept simple. The object is to get the coordinate data to the rest of the system with minimal lag time. Some position sensor systems add some filtering and data smoothing processing. Some glove systems add gesture recognition. This processing step examines the glove inputs and determines when a specific gesture has been made. Thus it can provide a higher level of input to the simulation.
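
The two optional processing steps mentioned above, smoothing and gesture recognition, might look roughly like the following sketch. Everything here is an assumption made for illustration: exponential smoothing is just one simple filter, the finger bend values are assumed to be normalized from 0 (straight) to 1 (fully bent) in the order thumb, index, middle, ring, little, and the class and function names are hypothetical.

```python
class PositionSmoother:
    """Exponential smoothing of tracker samples (one simple filtering choice)."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest sample
        self.state = None   # last smoothed position, None until the first sample

    def update(self, sample):
        """Blend a new (x, y, z) sample into the smoothed position and return it."""
        if self.state is None:
            self.state = tuple(sample)
        else:
            self.state = tuple(
                self.alpha * new + (1.0 - self.alpha) * old
                for new, old in zip(sample, self.state)
            )
        return self.state


def recognize_gesture(finger_bends, threshold=0.8):
    """Map five normalized finger bends to a gesture name, or None."""
    thumb, index, middle, ring, little = finger_bends
    if all(bend >= threshold for bend in finger_bends):
        return "fist"
    others_bent = all(bend >= threshold for bend in (middle, ring, little))
    if index <= 1.0 - threshold and others_bent:
        return "point"
    return None


# Usage: smooth two tracker samples and classify two glove readings.
smoother = PositionSmoother(alpha=0.5)
first = smoother.update((0.0, 0.0, 0.0))
second = smoother.update((2.0, 4.0, 6.0))
print(second, recognize_gesture([0.9, 0.1, 0.9, 0.9, 0.9]))
```

Note the trade-off: a smaller alpha removes more jitter but makes the reported position trail the real one, which works against the goal of minimal lag.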

I.5.2. Simulation Process

The core of a VR program is the simulation system. This is the process that knows about the objects and the various inputs. It handles the interactions, the scripted object actions, simulations of physical laws (real or imaginary) and determines the world status. This simulation is basically a discrete process that is iterated once for each time step or frame. A networked VR application may have multiple simulations running on different machines, each with a different time step. Coordination of these can be a complex task.

It is the simulation engine that takes the user inputs along with any tasks programmed into the world such as collision detection, scripts, etc. and determines the actions that will take place in the virtual world.
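
As a rough illustration of a simulation iterated once per time step, the sketch below advances a tiny world containing one ball under gravity, with a floor collision check and a user input applied each frame. The world contents, the names, the 1/20 second step and the bounce factor are all assumptions chosen for the example, not part of any particular VR package.

```python
from dataclasses import dataclass, field

GRAVITY = -9.81  # m/s^2, the "physical law" simulated in this sketch


@dataclass
class Body:
    height: float          # metres above the floor
    velocity: float = 0.0  # vertical velocity in m/s


@dataclass
class World:
    bodies: dict = field(default_factory=dict)  # stands in for the world database
    time: float = 0.0


def simulation_step(world, inputs, dt):
    """Advance the world by one discrete time step of dt seconds."""
    # User inputs: here each input is an upward velocity kick for a named body.
    for name, kick in inputs.items():
        world.bodies[name].velocity += kick
    for body in world.bodies.values():
        body.velocity += GRAVITY * dt
        body.height += body.velocity * dt
        if body.height < 0.0:  # collision detection against the floor
            body.height = 0.0
            body.velocity = -0.5 * body.velocity  # assumed lossy bounce
    world.time += dt


def run(world, input_frames, dt=1.0 / 20.0):
    """Iterate the simulation once per frame, yielding the state for the renderers."""
    for inputs in input_frames:
        simulation_step(world, inputs, dt)
        yield {name: body.height for name, body in world.bodies.items()}


# Usage: one second of simulated time with no user input.
world = World(bodies={"ball": Body(height=1.0)})
frames = list(run(world, [{}] * 20))
print(frames[0], frames[-1])
```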

I.5.3. Rendering Processes

The Rendering Processes of a VR program are those that create the sensations that are output to the user. A network VR program would also output data to other network processes. There would be separate rendering processes for visual, auditory, haptic (touch/force), and other sensory systems. Each renderer would take a description of the world state from the simulation process or derive it directly from the World Database for each time step.

I.5.3.1. Visual Renderer

The visual renderer is the most common process and it has a long history from the world of computer graphics and animation. The reader is encouraged to become familiar with various aspects of this technology.

The major consideration of a graphic renderer for VR applications is the frame generation rate. It is necessary to create a new frame every 1/20 of a second or faster. 20 frames per second (fps) is roughly the minimum rate at which the human brain will merge a stream of still images and perceive a smooth animation. 24fps is the standard rate for film, 25fps is PAL TV, 30fps is NTSC TV. 60fps is Showscan film rate. This requirement eliminates a number of rendering techniques such as raytracing and radiosity. These techniques can generate very realistic images but often take hours to generate single frames.
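
The frame rates quoted above translate directly into a time budget per frame. The short sketch below does that arithmetic; the dictionary and function names are hypothetical.

```python
# Frame rates quoted in the text, in frames per second.
FRAME_RATES_FPS = {
    "VR minimum": 20,
    "film": 24,
    "PAL TV": 25,
    "NTSC TV": 30,
    "Showscan": 60,
}


def frame_budget_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for name, fps in FRAME_RATES_FPS.items():
    print(f"{name:>10}: {fps:2d} fps -> {frame_budget_ms(fps):6.2f} ms per frame")
```

At 20 fps everything (input, simulation and rendering) has to fit in 50 ms, which is why a technique that needs hours per frame is ruled out for interactive use.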

Visual renderers for VR use other methods such as a 'painter's algorithm', a Z-Buffer, or other Scanline oriented algorithm. There are many areas of visual rendering that have been augmented with specialized hardware. The Painter's algorithm is favored by many low end VR systems since it is relatively fast, easy to implement and light on memory resources. However, it has many visibility problems. For a discussion of this and other rendering algorithms, see one of the computer graphics reference books listed in a later section.
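
A minimal sketch of the painter's algorithm follows. To keep it short it assumes axis-aligned rectangles with a single depth value each, drawn into a small character grid; real systems sort polygons, and the data layout and names here are hypothetical.

```python
def painters_render(rects, width, height, background="."):
    """Draw rectangles back to front so that nearer ones overwrite farther ones."""
    frame = [[background] * width for _ in range(height)]
    # Sort by depth, farthest first: this ordering is the whole algorithm.
    for rect in sorted(rects, key=lambda r: r["depth"], reverse=True):
        for y in range(rect["y0"], rect["y1"]):
            for x in range(rect["x0"], rect["x1"]):
                if 0 <= x < width and 0 <= y < height:
                    frame[y][x] = rect["symbol"]
    return ["".join(row) for row in frame]


# Usage: a near rectangle "N" partly covering a far rectangle "F".
scene = [
    {"symbol": "N", "depth": 1.0, "x0": 2, "y0": 1, "x1": 6, "y1": 3},
    {"symbol": "F", "depth": 5.0, "x0": 0, "y0": 0, "x1": 4, "y1": 2},
]
image = painters_render(scene, width=8, height=4)
for line in image:
    print(line)
```

Because each shape gets only one depth value, shapes that intersect or overlap each other cyclically cannot be ordered correctly, which is one source of the visibility problems noted above. A Z-Buffer avoids this by comparing depth per pixel, at the cost of extra memory.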

The visual rendering process is often referred to as a rendering pipeline. This refers to the series of sub-processes that are invoked to create each frame. A sample rendering pipeline starts with a description of the world, the objects, lighting and camera (eye) location in world space. A first step would be to eliminate all objects that are not visible to the camera. This can be quickly done by clipping the object bounding box or sphere against the viewing pyramid of the camera. Then the remaining objects have their geometries transformed into the eye coordinate system (eye point at origin). Then the hidden surface algorithm and actual pixel rendering is done.
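
The culling step can be sketched as a bounding sphere test against the planes of the viewing pyramid. The sketch makes several simplifying assumptions that are not from the text: the camera looks along +Z, the pyramid is symmetric with the same half angle horizontally and vertically, the world-to-eye transform is a pure translation (no camera rotation), and for brevity the test is done in eye coordinates. All names are hypothetical.

```python
import math


def frustum_planes(half_angle_deg, near, far):
    """Planes of a symmetric viewing pyramid in eye space, camera looking along +Z.

    Each plane is (unit_normal, offset); a point p is inside when
    dot(unit_normal, p) + offset >= 0.
    """
    s = math.sin(math.radians(half_angle_deg))
    c = math.cos(math.radians(half_angle_deg))
    return [
        ((0.0, 0.0, 1.0), -near),  # near plane
        ((0.0, 0.0, -1.0), far),   # far plane
        ((c, 0.0, s), 0.0),        # left side
        ((-c, 0.0, s), 0.0),       # right side
        ((0.0, c, s), 0.0),        # bottom side
        ((0.0, -c, s), 0.0),       # top side
    ]


def to_eye_space(point, camera_position):
    """Translate a world point so the eye sits at the origin (rotation omitted)."""
    return tuple(p - c for p, c in zip(point, camera_position))


def sphere_visible(center, radius, planes):
    """Conservative test: reject only spheres entirely outside some plane."""
    for normal, offset in planes:
        distance = sum(n * x for n, x in zip(normal, center)) + offset
        if distance < -radius:
            return False
    return True


# Usage: cull a small scene for a camera placed at (0, 0, -5).
planes = frustum_planes(half_angle_deg=45.0, near=0.1, far=100.0)
camera = (0.0, 0.0, -5.0)
objects = {"ahead": ((0.0, 0.0, 5.0), 1.0), "behind": ((0.0, 0.0, -15.0), 1.0)}
visible = [
    name
    for name, (center, radius) in objects.items()
    if sphere_visible(to_eye_space(center, camera), radius, planes)
]
print(visible)
```

The test is deliberately conservative: a sphere near a corner of the pyramid may be kept even though it is just outside, which costs a little rendering time but never drops a visible object.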

The pixel rendering is also known as the 'lighting' or 'shading' algorithm. There are a number of different methods that are possible depending on the realism and calculation speed available. The simplest method is called flat shading and simply fills the entire area with the same color. The next step up provides some variation in color across a single surface. Beyond that is the possibility of smooth shading across surface boundaries, adding highlights, reflections, etc.
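
Flat shading can be sketched as computing one color per polygon. The text does not say how that color is chosen; the sketch below assumes a common choice, scaling a base color by the cosine of the angle between the surface normal and the direction to a light (Lambert's cosine law). The function names are hypothetical and the triangle is assumed to be non-degenerate.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))  # assumes v is not the zero vector
    return tuple(x / length for x in v)


def flat_shade(triangle, direction_to_light, base_color):
    """Return a single RGB color used to fill the whole triangle."""
    v0, v1, v2 = triangle
    normal = normalize(cross(sub(v1, v0), sub(v2, v0)))
    intensity = max(0.0, dot(normal, normalize(direction_to_light)))
    return tuple(round(channel * intensity) for channel in base_color)


# Usage: a triangle in the XY plane, lit from in front and from behind.
triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
lit = flat_shade(triangle, (0.0, 0.0, 1.0), (200, 100, 50))
unlit = flat_shade(triangle, (0.0, 0.0, -1.0), (200, 100, 50))
print(lit, unlit)
```

Smooth shading would instead compute the intensity at each vertex (or each pixel) and interpolate across the surface.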

An effective short cut for visual rendering is the use of "texture" or "image" maps. These are pictures that are mapped onto objects in the virtual world. Instead of calculating lighting and shading for the object, the renderer determines which part of the texture map is visible at each visible point of the object. The resulting image appears to have significantly more detail than is otherwise possible. Some VR systems have special 'billboard' objects that always face towards the user. By mapping a series of different images onto the billboard, the user can get the appearance of moving around the object.
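
The billboard idea can be sketched as two small calculations: the rotation that turns the billboard toward the viewer, and the choice of which image in the series to show for that viewing direction. The sketch assumes Y is the up axis, that a yaw of zero means facing +Z, and that the images were captured at equal angular steps around the object; the function names are hypothetical.

```python
import math


def billboard_yaw(billboard_position, viewer_position):
    """Yaw angle (radians, about the Y axis) that turns the billboard to face the viewer."""
    dx = viewer_position[0] - billboard_position[0]
    dz = viewer_position[2] - billboard_position[2]
    return math.atan2(dx, dz)


def select_image(yaw, image_count):
    """Pick the image whose capture angle is nearest to the viewing direction."""
    fraction = (yaw % (2.0 * math.pi)) / (2.0 * math.pi)
    return int(fraction * image_count + 0.5) % image_count


# Usage: walk a viewer around a billboard at the origin that has 8 images.
for viewer in [(0.0, 0.0, 5.0), (5.0, 0.0, 0.0), (0.0, 0.0, -5.0), (-5.0, 0.0, 0.0)]:
    yaw = billboard_yaw((0.0, 0.0, 0.0), viewer)
    print(viewer, select_image(yaw, 8))
```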

I need to correct my earlier statement that radiosity cannot be used for VR systems due to the time requirements. There have recently been at least two radiosity renderers announced for walkthrough type systems - Lightscape from Lightscape Graphics Software of Canada and Real Light from Atma Systems of Italy. These packages compute the radiosity lighting beforehand in a long, time consuming process. The user can interactively control the camera view but cannot interact with the world. An executable demo of the Atma product is available for SGI systems from ftp.iunet.it (192.106.1.6) in the directory ftp/vendor/Atma.

I.5.3.2. Auditory Rendering

A VR system is greatly enhanced by the inclusion of an audio component. This may produce mono, stereo or 3D audio. The latter is a fairly difficult proposition. It is not enough to do stereo-pan effects as the mind tends to locate these sounds inside the head. Research into 3D audio has shown that there are many aspects of our head and ear shape that affect the recognition of 3D sounds. It is possible to apply a rather complex mathematical function (called a Head Related Transfer Function or HRTF) to a sound to produce this effect. The HRTF is a very personal function that depends on the individual's ear shape, etc. However, there has been significant success in creating generalized HRTFs that work for most people and most audio placement. There remain a number of problems, such as the 'cone of confusion' wherein sounds behind the head are perceived to be in front of the head.
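
In the time domain, applying an HRTF amounts to convolving the mono sound with one impulse response per ear. The sketch below shows only that mechanism. The two impulse responses are made-up placeholders (a real system would use measured data, one pair per source direction), and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def spatialize(mono, response_left, response_right):
    """Filter one mono signal into a (left, right) pair of ear signals."""
    return convolve(mono, response_left), convolve(mono, response_right)


# Placeholder responses, NOT measured data: the right ear hears the sound
# two samples later and at half the level, as if the source were to the left.
RESPONSE_LEFT = [1.0, 0.0, 0.0]
RESPONSE_RIGHT = [0.0, 0.0, 0.5]

left, right = spatialize([1.0, 2.0], RESPONSE_LEFT, RESPONSE_RIGHT)
print(left, right)
```

A pure delay and level difference like this placeholder is little more than a stereo pan; measured responses also encode the direction-dependent filtering by the head and outer ear, which is what lets the sound be perceived outside the head.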

Sound has also been suggested as a means to convey other information, such as surface roughness. Dragging your virtual hand over sand would sound different than dragging it through gravel.

I.5.3.3. Haptic Rendering

Haptics is the generation of touch and force feedback information. This area is a very new science and there is much to be learned. There have been very few studies done on the rendering of true touch sense (such as liquid, fur, etc.). Almost all systems to date have focused on force feedback and kinesthetic senses. These systems can provide good clues to the body regarding the touch sense, but are considered distinct from it. Many of the haptic systems thus far have been exo-skeletons that can be used for position sensing as well as providing resistance to movement or active force application.

I.5.3.4. Other Senses

The sense of balance and motion can be served to a fair degree in a VR system by a motion platform. These are used in flight simulators and some theaters to provide some motion cues that the mind integrates with other cues to perceive motion. It is not necessary to recreate the entire motion perfectly to fool the mind into a willing suspension of disbelief.

The sense of temperature has seen some technology developments. There exist very small electrical heat pumps that can produce the sensation of heat and cold in a localized area. These systems are fairly expensive.

Other senses such as taste, smell, pheromone, etc. are beyond our ability to render rapidly and effectively. Sometimes, we just don't know enough about the functioning of these other senses.

I.6. World Space

The virtual world itself needs to be defined in a 'world space'. By its nature as a computer simulation, this world is necessarily limited. The computer must put a numeric value on the locations of each point of each object within the world. Usually these 'coordinates' are expressed in Cartesian dimensions of X, Y, and Z (length, height, depth). It is possible to use alternative coordinate systems such as spherical but Cartesian coordinates are the norm for almost all applications. Conversions between coordinate systems are fairly simple (if time consuming).
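As a rough illustration of such a conversion, the sketch below maps between spherical and Cartesian coordinates. The axis convention (Y up, azimuth measured in the X-Z plane from +X, elevation measured up from that plane) and the function names are assumptions made for this example; other systems use different conventions.

```python
import math

def spherical_to_cartesian(radius, azimuth, elevation):
    """Convert (radius, azimuth, elevation), angles in radians, to (x, y, z).

    Assumed convention: Y is up, azimuth lies in the X-Z plane measured
    from +X, elevation is measured up from that plane.
    """
    horizontal = radius * math.cos(elevation)
    x = horizontal * math.cos(azimuth)
    y = radius * math.sin(elevation)
    z = horizontal * math.sin(azimuth)
    return (x, y, z)

def cartesian_to_spherical(x, y, z):
    """Inverse of spherical_to_cartesian under the same assumed convention."""
    radius = math.sqrt(x * x + y * y + z * z)
    if radius == 0.0:
        return (0.0, 0.0, 0.0)  # angles are undefined at the origin
    azimuth = math.atan2(z, x)
    # Clamp to guard against rounding slightly outside [-1, 1].
    elevation = math.asin(max(-1.0, min(1.0, y / radius)))
    return (radius, azimuth, elevation)
```

Each conversion costs several trigonometric calls per point, which is where the time goes when a whole world has to be converted.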

I.6.1. World Coordinates

A major limitation on the world space is the type of numbers used for the coordinates. Some worlds use floating point coordinates. This allows a very large range of numbers to be specified, with some precision lost on large numbers. Other systems use fixed point coordinates, which provide uniform precision over a more limited range of values. The choice of fixed versus floating point is often based on speed as well as the desire for a uniform coordinate field.
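The trade-off can be seen in a small experiment. The sketch below assumes a 16.16 fixed point format (16 integer bits, 16 fraction bits) and 32-bit floats; both choices, and the helper names, are illustrative assumptions rather than the format of any particular VR package.

```python
import struct

FRACTION_BITS = 16           # assumed 16.16 fixed point layout
SCALE = 1 << FRACTION_BITS   # 65536 steps per world unit

def to_fixed(value):
    """Encode a coordinate as a fixed point integer."""
    return int(round(value * SCALE))

def from_fixed(fixed):
    """Decode a fixed point integer back to a Python float."""
    return fixed / SCALE

def as_float32(value):
    """Round a value to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

def fixed_error(value):
    return abs(from_fixed(to_fixed(value)) - value)

def float32_error(value):
    return abs(as_float32(value) - value)
```

Comparing `fixed_error` and `float32_error` for a coordinate near 1.0 and for one near 30000.0 shows the pattern described above: the float is more precise close to the origin, while the fixed point value keeps the same step size everywhere within its range.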

I.6.2. A World Divided: Separation of Environments

One method of dealing with the limitations on the world coordinate space is to divide a virtual world up into multiple worlds and provide a means of transiting between the worlds. This allows fewer objects to be computed both for scripts and for rendering. There should be multiple stages (aka rooms, areas, zones, worlds, multiverses, etc.) and a way to move between them (Portals).

I.7. World Database

The storage of information on objects and the world is a major part of the design of a VR system. The primary things that are stored in the World Database (or World Description Files) are the objects that inhabit the world, scripts that describe actions of those objects or the user (things that happen to the user), lighting, program controls, and hardware device support.

I.7.1. Storage Methods

There are a number of different ways the world information may be stored: a single file, a collection of files, or a database. The multiple file method is one of the more common approaches for VR development packages. Each object has one or more files (geometry, scripts, etc.) and there is some overall 'world' file that causes the other files to be loaded. Some systems also include a configuration file that defines the hardware interface connections.

Some systems load the entire database during program startup; others read only the files that are currently needed. A real database system helps tremendously with the latter approach. An Object Oriented Database would be a great fit for a VR system, but I am not aware of any projects currently using one.

The data files are most often stored as ASCII (human readable) text files. However, in many systems these are replaced by binary computer files. Some systems have all the world information compiled directly into the application.

I.7.2. Objects

Objects in the virtual world can have geometry, hierarchy, scripts, and other attributes. The capabilities of objects have a tremendous impact on the structure and design of the system. In order to retain flexibility, a list of named attribute/value pairs is often used. Thus attributes can be added to the system without requiring changes to the object data structures.

These attribute lists would be addressable by name (e.g. cube.mass => mass of the cube object). They may be a scalar, vector, or expression value. They may be addressable from within the scripts of their object. They might be accessible from scripts in other objects.
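A minimal sketch of such a named attribute list follows. The class name WorldObject, the lookup function and the "object.attribute" path syntax are hypothetical, chosen only to mirror the cube.mass example; expression values are modeled here as Python callables.

```python
class WorldObject:
    """An object whose attributes live in a name -> value table."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}  # scalar, vector (tuple), or expression (callable)

    def set(self, attr, value):
        self.attributes[attr] = value

    def get(self, attr):
        value = self.attributes[attr]
        # An expression attribute is evaluated against its owning object.
        return value(self) if callable(value) else value

world = {}  # object name -> WorldObject

def lookup(path):
    """Resolve a dotted name such as 'cube.mass'."""
    object_name, attr = path.split(".", 1)
    return world[object_name].get(attr)

cube = WorldObject("cube")
cube.set("mass", 2.0)                                     # scalar
cube.set("size", (1.0, 1.0, 1.0))                         # vector
cube.set("weight", lambda obj: obj.get("mass") * 9.81)    # expression
world["cube"] = cube
```

New attributes can be added with `set` at any time without touching the class definition, which is the flexibility the attribute list is meant to provide.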

I.7.2.1. Position/Orientable

An object is positionable and orientable. That is, it has a location and orientation in space. Most objects can have these attributes modified by applying translation and rotation operations. These operations are often implemented using methods from vector and matrix algebra.
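One common way to express these operations is with 4x4 homogeneous matrices. The sketch below uses plain Python lists and assumes column vectors (a point is multiplied on the right of the matrix); the function names are illustrative.

```python
import math

def translation(tx, ty, tz):
    """4x4 matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def rotation_z(angle):
    """4x4 matrix that rotates about the Z axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0],
            [s,  c, 0, 0],
            [0,  0, 1, 0],
            [0,  0, 0, 1]]

def multiply(a, b):
    """Matrix product a * b; b is applied to the point first."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_point(m, point):
    """Apply matrix m to a 3D point."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# Rotate a quarter turn about Z, then move 5 units along X.
placement = multiply(translation(5, 0, 0), rotation_z(math.pi / 2))
moved = transform_point(placement, (1, 0, 0))
```

Combining the operations into one matrix means each point of the object is transformed with a single multiplication, however many translations and rotations were composed.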

I.7.2.2. Hierarchy

An object may be part of an object part HIERARCHY with a parent, sibling, and child objects. Such an object would inherit the transformations applied to its parent object and pass these on to its siblings and children. Hierarchies are used to create jointed figures such as robots and animals. They can also be used to model other things like the sun, planets and moons in a solar system.
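The solar system case can be sketched as follows. To keep the example short it works in two dimensions, with each node storing an offset from its parent and a rotation angle; the Node class and its fields are assumptions for illustration only.

```python
import math

class Node:
    """A node in an object hierarchy (2D for brevity)."""

    def __init__(self, name, offset=(0.0, 0.0), angle=0.0, parent=None):
        self.name = name
        self.offset = offset    # position relative to the parent
        self.angle = angle      # rotation relative to the parent, radians
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def world_transform(self):
        """Return (world_angle, (world_x, world_y)), inherited from ancestors."""
        if self.parent is None:
            return self.angle, self.offset
        parent_angle, (px, py) = self.parent.world_transform()
        c, s = math.cos(parent_angle), math.sin(parent_angle)
        ox, oy = self.offset
        # The parent's rotation swings the child's offset around the parent.
        return parent_angle + self.angle, (px + c * ox - s * oy,
                                           py + s * ox + c * oy)

sun = Node("sun")
planet = Node("planet", offset=(10.0, 0.0), parent=sun)
moon = Node("moon", offset=(1.0, 0.0), parent=planet)

sun.angle = math.pi / 2          # rotating the sun carries planet and moon along
_, moon_position = moon.world_transform()
```

Only the sun's angle was changed, yet the moon's world position moves because it inherits the transformation through the planet.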

I.7.2.3. Bounding Volume

Additionally, an object should include a BOUNDING VOLUME. The simplest bounding volume is the Bounding Sphere, specified by a center and radius. Another simple alternative is the Bounding Cube. This data can be used for rapid object culling during rendering and trigger analysis. Objects whose bounding volume is completely outside the viewing area need not be transformed or considered further during rendering. Collision detection with bounding spheres is very rapid. It could be used alone, or as a method for culling objects before more rigorous collision detection algorithms are applied.
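The bounding sphere test is fast because it needs only a squared distance and no square root. The sketch below shows the test and its use as a culling pass; the function names and the (name, center, radius) tuple layout are assumptions for this example.

```python
def spheres_overlap(center_a, radius_a, center_b, radius_b):
    """True if two bounding spheres touch or intersect."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    reach = radius_a + radius_b
    return dist_sq <= reach * reach

def candidate_pairs(objects):
    """Return the name pairs whose bounding spheres overlap.

    objects is a list of (name, center, radius) tuples. Only the pairs
    returned here would need a more rigorous collision test.
    """
    pairs = []
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            name_a, center_a, radius_a = objects[i]
            name_b, center_b, radius_b = objects[j]
            if spheres_overlap(center_a, radius_a, center_b, radius_b):
                pairs.append((name_a, name_b))
    return pairs

scene = [
    ("ball", (0.0, 0.0, 0.0), 1.0),
    ("wall", (1.5, 0.0, 0.0), 1.0),
    ("lamp", (10.0, 0.0, 0.0), 1.0),
]
close_pairs = candidate_pairs(scene)
```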

I.7.3. Object Geometry

The modeling of object shape and geometry is a large and diverse field. Some approaches seek to very carefully model the exact geometry of real world objects. Other methods seek to create simplified representations. Most VR systems sacrifice detail and exactness in favor of simplicity, for the sake of rendering speed.

The simplest objects are single dimensional points. Next come the two dimensional vectors. Many CAD systems create and exchange data as 2D views. This information is not very useful for VR systems, except for display on a 2D surface within the virtual world. There are some programs that can reconstruct a 3D model of an object, given a number of 2D views.

The sections below discuss a number of common geometric modeling methods. The choice of method used is closely tied to the rendering process used. Some renderers can handle multiple types of models, but most use only one, especially for VR use. The modeling complexity is generally inversely proportional to the rendering speed. As the model gets more complex and detailed, the frame rate drops.

I.7.3.1. 3D PolyLines & PolyPoints

The simplest 3D objects are known as PolyPoints and PolyLines. A PolyPoint is simply a collection of points in space. A PolyLine is a set of vectors that form a continuous line.

I.7.3.2. Polygons

The most common objects used in VR systems are based on flat polygons. A polygon is a planar, closed multi-sided figure. Polygons may be convex or concave, but some systems require convex polygons. The use of polygons often gives objects a faceted look. This can be offset by more advanced rendering techniques such as the use of smooth shading and texture mapping.

Some systems use simple triangles or quadrilaterals instead of more general polygons. This can simplify the rendering process, as all surfaces have a known shape. However, it can also increase the number of surfaces that need to be rendered.

Polygon Mesh Format (aka Vertex Join Set) is a useful form of polygonal object. For each object in a Mesh, there is a common pool of Points that are referenced by the polygons for that object. Transforming these shared points reduces the calculations needed to render the object. A point at the corner of a cube is processed only once, rather than once for each of the three polygons that reference it. The PLG format used by REND386 is an example of a Polygonal Mesh, as is the BYU format used by the 'ancient' MOVIE.BYU program.
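A unit cube stored this way might look like the sketch below. This is a generic illustration of the shared point pool, not the actual PLG or BYU file layout; the variable and function names are assumptions, and no consistent winding order is implied.

```python
# Shared pool of points for a unit cube; index = 4*x + 2*y + z.
vertices = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# Each polygon is a tuple of indices into the pool.
faces = [
    (0, 1, 3, 2),  # x = 0 side
    (4, 6, 7, 5),  # x = 1 side
    (0, 4, 5, 1),  # y = 0 side
    (2, 3, 7, 6),  # y = 1 side
    (0, 2, 6, 4),  # z = 0 side
    (1, 5, 7, 3),  # z = 1 side
]

def translate_mesh(points, offset):
    """Transform every shared point exactly once."""
    return [tuple(p + o for p, o in zip(point, offset)) for point in points]

def face_points(points, face):
    """Look up the coordinates of one polygon from the pool."""
    return [points[i] for i in face]

moved = translate_mesh(vertices, (10, 0, 0))
```

Moving the cube touches 8 points; storing each of the 6 quadrilaterals with its own copy of its corners would mean transforming 24.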

The geometry format can support precomputed polygon and vertex normals. Both polygons and vertices should be allowed a color attribute. Different renderers may use or ignore these and possibly more advanced surface characteristics. Precomputed polygon normals are very helpful for backface polygon removal. Vertices may also have texture coordinates assigned to support texture or other image mapping techniques.
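Backface removal with a precomputed normal reduces to one dot product per polygon. The sketch below assumes normals point outward from the object and that the test is done in world space against the eye position; the function name is illustrative.

```python
def is_backfacing(normal, point_on_polygon, eye):
    """True if the polygon faces away from the eye and can be skipped.

    normal is the precomputed (outward) polygon normal,
    point_on_polygon is any vertex of the polygon.
    """
    to_eye = tuple(e - p for e, p in zip(eye, point_on_polygon))
    facing = sum(n * t for n, t in zip(normal, to_eye))
    return facing <= 0.0

# A polygon in the Z = 0 plane whose normal points along +Z.
front_normal = (0.0, 0.0, 1.0)
corner = (0.0, 0.0, 0.0)
```

For a closed convex object this test alone discards roughly half of the polygons before any further rendering work is done.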

I.7.3.3. Primitives

Some systems provide only Primitive Objects, such as cubes, cones, and spheres. Sometimes, these objects can be slightly deformed by the modeling package to provide more interesting objects.

I.7.3.4. Solid Modeling & Boolean Operations

Solid Modeling (aka Constructive Solid Geometry, CSG) is one form of geometric modeling that uses primitive objects. It extends the concept by allowing various addition, subtraction, Boolean and other operations between these primitives. This can be very useful in modeling objects when you are concerned with doing physical calculations, such as center of mass, etc. However, this method does incur some significant calculations and is not very useful for VR applications. It is possible to convert a CSG model into polygons. Polygonal models of varying complexity (number of polygons) could be made from a single high resolution 'metaobject' of a CSG type.

I.7.3.5. Curves & Patches

Another advanced form of geometric modeling is the use of curves and curved surfaces (aka patches). These can be very effective in representing complex shapes, like the curved surface of an automobile, ship or beer bottle. However, there is significant calculation involved in determining the surface location at each pixel, thus curve based modeling is not used directly in VR systems. It is possible, however, to design an object using curves and then compute a polygonal representation of those curved patches. Polygonal models of varying complexity could be made from a single high resolution 'metaobject'.

I.7.3.6. Dynamic Geometry (aka morphing)

It is sometimes desirable to have an object that can change shape. The shape might simply be deformed, such as a bouncing ball or the squash/stretch used in classical animation ('toons'), or it might actually undergo metamorphosis into a completely different geometry. The latter effect is commonly known as 'morphing' and has been extensively used in films, commercials and television shows. Morphing can be done in the image domain (2D morph) or in the geometry domain (3D morph). The latter is applicable to VR systems. The simplest method of doing a 3D morph is to precompute the various geometries and step through them as needed. A system with significant processing power can handle real time object morphing.
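A minimal sketch of the precompute-and-step approach follows. It assumes the start and end shapes have the same number of vertices in matching order and uses straight linear interpolation between them; real morphs often need a correspondence step that is not shown here. The function name is hypothetical.

```python
def precompute_morph(start_vertices, end_vertices, steps):
    """Return steps + 1 vertex lists blending from start to end."""
    if len(start_vertices) != len(end_vertices):
        raise ValueError("both shapes need the same number of vertices")
    frames = []
    for step in range(steps + 1):
        t = step / steps  # 0.0 at the start shape, 1.0 at the end shape
        frame = [tuple(a + (b - a) * t for a, b in zip(v0, v1))
                 for v0, v1 in zip(start_vertices, end_vertices)]
        frames.append(frame)
    return frames

start_shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
end_shape = [(0.0, 2.0, 0.0), (1.0, 4.0, 0.0)]
morph_frames = precompute_morph(start_shape, end_shape, 4)
```

At run time the system only has to pick `morph_frames[i]` for the current step, so the cost of the morph is paid once, before the simulation starts.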

I.7.3.7. Swept Objects & Surface of Revolution

Two common methods for creating objects are known as Sweeping and Surfaces of Revolution. These methods use an outline or template curve and a backbone. The template is swept along the backbone creating the object surface (or rotated about a single axis to create a surface of revolution). This method may be used to create either curved surfaces or polygonal objects. For VR applications, the sweeping would most likely be performed during the object modeling (creation) phase, and the resulting polygonal object stored for real time use.
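The surface of revolution case can be sketched as a modeling-time routine that turns a profile into a polygonal mesh. The sketch assumes the profile is a list of (radius, height) pairs rotated about the Y axis and that the result is stored as a shared vertex pool plus quadrilaterals; the function name and parameters are illustrative.

```python
import math

def revolve(profile, segments):
    """Rotate a profile about the Y axis and return (vertices, quads).

    profile is a list of (radius, height) pairs, segments is the number
    of steps around the axis. Each profile point becomes a ring of
    vertices; neighbouring rings are joined with quadrilaterals.
    """
    vertices = []
    for radius, height in profile:
        for s in range(segments):
            angle = 2.0 * math.pi * s / segments
            vertices.append((radius * math.cos(angle),
                             height,
                             radius * math.sin(angle)))
    quads = []
    for ring in range(len(profile) - 1):
        for s in range(segments):
            nxt = (s + 1) % segments          # wrap around to close the ring
            quads.append((ring * segments + s,
                          ring * segments + nxt,
                          (ring + 1) * segments + nxt,
                          (ring + 1) * segments + s))
    return vertices, quads

# A simple vase-like outline, 8 segments around.
vase_vertices, vase_quads = revolve([(1.0, 0.0), (1.5, 1.0), (0.5, 2.0)], 8)
```

Raising `segments` gives a smoother object at the price of more polygons, so the same profile can yield several versions of differing complexity.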

I.7.3.8. Texture Maps & Billboard Objects

As mentioned in the section on rendering, texture maps (images) can be used to provide the appearance of more geometric complexity without the geometric calculations. Using flat polygonal objects that maintain an orientation towards the eye/camera (billboards) and multiple texture maps can extend this trick even further. Texture maps, even without billboard objects, are an excellent way to increase apparent scene complexity. Variations on the image mapping concept are also used to simulate reflections, etc.

I.7.4. Lights

Lighting is a very important part of a virtual world (if it is visually rendered). Lights can be ambient (everywhere), or located. Located lights have position and may have orientation, color, intensity and a cone of illumination. The more complex the light source, the more computation is required to simulate its effect on objects.

I.7.5. Cameras

Cameras or viewpoints may be described in the World Database. Generally, each user has only one viewpoint at a time (ok, two closely spaced viewpoints for stereoscopic systems). However, it may be useful to define alternative cameras that can be used as needed. An example might be an overhead camera that shows a schematic map of the virtual world and the user's location within it (You Are Here.)

I.7.6. Scripts and Object Behavior

A virtual world consisting only of static objects is only of mild interest. Many researchers and enthusiasts of VR have remarked that interaction is the key to a successful and interesting virtual world. This requires some means of defining the actions that objects take on their own and when the user (or other objects) interact with them. This I refer to generically as the World Scripting. I divide the scripts into three basic types: Motion Scripts, Trigger Scripts and Connection Scripts.

Scripts may be textual or they might be actually compiled into the program structure. The use of visual programming languages for world design was pioneered by VPL Research with their Body Electric system. This Macintosh based language used 2D blocks on the screen to represent inputs, objects and functions. The programmer would connect the boxes to indicate data flow.

There is no common scripting language used in today's VR products. The commercial authoring packages, such as VR Studio, VREAM and Superscape, all contain some form of scripting language. Autodesk's CDK has the "Cyberspace Description Format" (CDF) and the Distributed Shared Cyberspace Virtual Representation (DSCVR) database. These are only partially implemented in the current release. They are derived from the Linda distributed programming language/database system. ("Coordination Languages and their Significance", David Gelernter and Nicholas Carriero, Communications of the ACM, Feb 1992 V35N2). On the homebrew/freeware side, some people are experimenting with several Object Oriented interpretive languages such as BOB ("Your own tiny Object-Oriented Language", David Betz, Dr. Dobb's Journal, Sept 1991). Object Orientation, although perhaps not in the conventional class-inheritance mechanism, is very nicely suited to world scripting. Interpretive languages are faster for development, and often more accessible to 'non-programmers'.

I.7.6.1. Motion Scripts

Motion scripts modify the position, orientation or other attributes of an object, light or camera based on the current system tick. A 'tick' is one advancement of the simulation clock. Generally, this is equivalent to a single frame of visual animation. (VR generally uses Discrete Simulation methods.)

For simplicity and speed, only one motion script should be active for an object at any one instant. Motion scripting is a potentially powerful feature, depending on how complex we allow these scripts to become. Care must be exercised since the interpretation of these scripts will require time, which affects both the frame rate and the response delay.

Additionally, a script might be used to attach or detach an object from a hierarchy. For example, a script might attach the user to a CAR object when he wishes to drive around the virtual world. Alternatively, the user might 'pick up' or attach an object to himself.
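A motion script can be sketched as a function called once per tick for each object. In the sketch below the SceneObject class, the single motion_script slot and the spin example are all assumptions made for illustration; they follow the one-active-script rule described above.

```python
import math

class SceneObject:
    def __init__(self, name):
        self.name = name
        self.rotation_z = 0.0       # orientation about Z, radians
        self.motion_script = None   # at most one active script per object

def make_spin_script(seconds_per_turn, ticks_per_second):
    """Build a script that turns an object once every seconds_per_turn."""
    step = 2.0 * math.pi / (seconds_per_turn * ticks_per_second)

    def script(obj, tick):
        # Derive the angle from the tick so skipped ticks do not drift.
        obj.rotation_z = (tick * step) % (2.0 * math.pi)

    return script

def run_tick(objects, tick):
    """Advance the simulation clock by evaluating every active script."""
    for obj in objects:
        if obj.motion_script is not None:
            obj.motion_script(obj, tick)

cube = SceneObject("cube")
cube.motion_script = make_spin_script(seconds_per_turn=4, ticks_per_second=10)
for tick in range(21):              # ticks 0..20 cover two seconds
    run_tick([cube], tick)
```

After two seconds of ticks the cube has made half a turn. Because every script runs inside the tick loop, any time spent in a script is time taken from rendering the frame.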

I.7.6.2. Physical or Procedural Modeling and Simulation

A complex simulation could be used that models the interactions of the real physical world. This is sometimes referred to as Procedural Modeling. It can be a very complex and time consuming application. The mathematics required to solve the physical interaction equations can also be fairly complex. However, this method can provide a very realistic interaction mechanism. (For more on Physical Simulation, see the book by Ronen Barzel listed in the Computer Graphics Books section.)

I.7.6.3. Simple Animation

A simpler method of animation is to use simple formulas for the motion of objects. A very simple example would be "Rotate about Z axis once every 4 seconds". This might also be represented as a fixed increment per frame, such as "Rotate about Z 10 degrees each frame" (the same motion at nine frames per second).

A slightly more advanced method of animation is to provide a 'path' for the object with controls on its speed at various points. These controls are sometimes referred to as "slow in-out". They provide a much more realistic motion than simple linear motion.

If the motion is fixed, some systems can precompute the motion and provide a 'channel' of data that is evaluated at each time instance. This may be a simple lookup table with exact values for each frame, or it may require some sort of simple interpolation.
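The sketch below combines the two ideas: it precomputes a channel of values with a slow in-out profile and then reads it back, interpolating when the requested time falls between stored frames. The cubic easing curve used here is one common choice assumed for the example, and the function names are hypothetical. It assumes at least two frames and a non-negative frame position.

```python
def slow_in_out(t):
    """Ease curve on [0, 1]: starts slowly, speeds up, ends slowly."""
    return t * t * (3.0 - 2.0 * t)

def build_channel(start, end, frame_count):
    """Precompute one value per frame for a fixed motion."""
    channel = []
    for frame in range(frame_count):
        t = frame / (frame_count - 1)
        channel.append(start + (end - start) * slow_in_out(t))
    return channel

def sample_channel(channel, frame_position):
    """Read the channel, interpolating linearly between stored frames."""
    low = int(frame_position)
    if low >= len(channel) - 1:
        return channel[-1]
    fraction = frame_position - low
    return channel[low] + (channel[low + 1] - channel[low]) * fraction

# Move from x = 0 to x = 10 over 11 frames.
x_channel = build_channel(0.0, 10.0, 11)
```

The first and last steps in `x_channel` are much smaller than the steps in the middle, which is what gives the motion its more natural start and stop.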

I.7.6.4. Trigger Scripts

Trigger Scripts are invoked when some trigger event occurs, such as collision, proximity or selection. The VR system needs to evaluate the trigger parameters at each TICK. For proximity detectors, this may be a simple distance check from the object to the 3D eye or effector object (aka virtual human). Collision detection is a more involved process. It is desirable but may not be practical without offloading the rendering and some UI tasks from the main processor.
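A proximity trigger of this kind can be sketched as a distance check that runs once per tick. The ProximityTrigger class and its on_enter callback are assumptions for illustration; the sketch fires only when the effector first comes within range, not on every tick it stays there.

```python
class ProximityTrigger:
    """Calls on_enter when the effector first comes within radius."""

    def __init__(self, center, radius, on_enter):
        self.center = center
        self.radius = radius
        self.on_enter = on_enter
        self.inside = False         # state from the previous tick

    def evaluate(self, effector_position):
        """Run once per tick with the current eye or effector position."""
        dist_sq = sum((c - p) ** 2
                      for c, p in zip(self.center, effector_position))
        now_inside = dist_sq <= self.radius * self.radius
        if now_inside and not self.inside:
            self.on_enter()
        self.inside = now_inside

events = []
door = ProximityTrigger(center=(0.0, 0.0, 0.0), radius=2.0,
                        on_enter=lambda: events.append("door opens"))

# Effector positions on three consecutive ticks: far, near, still near.
for position in [(5.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]:
    door.evaluate(position)
```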
 

I.7.6.5. Connection Scripts

Connection scripts control the connection of input and output devices to various objects. For example, a connection script may be used to connect a glove device to a virtual hand object. The glove movements and position information are used to control the position and actions of the hand object in the virtual world. Some systems build this function directly into the program. Other systems are designed such that the VR program is almost entirely a connection script.

I.7.7. Interaction Feedback

The user must be given some indication of interaction feedback when the virtual cursor selects or touches an object. Crude systems have only the visual feedback of seeing the cursor (virtual hand) penetrate an object. The user can then grasp or otherwise select the object. The selected object is then highlighted in some manner. Alternatively, an audio signal could be generated to indicate a collision. Some systems use simple touch feedback, such as a vibration in the joystick, to indicate collision, etc.

I.7.8. Graphical User Interface/Control Panels

A VR system often needs to have some sort of control panels available to the user. The world database may contain information on these panels and how they are integrated into the application. Alternatively, they may be a part of the program code.

There are several ways to create these panels. There could be 2D menus that surround a WoW display, or are overlaid onto the image. An alternative is to place control devices inside the virtual world. The simulation system must then note user interaction with these devices as providing control over the world.

One primary area of user control is control of the viewpoint (moving around within the virtual world). Some systems use the joystick or similar device to move. Others use gestures from a glove, such as pointing, to indicate a motion command.

The user interface to the VW might be restricted to direct interaction in the 3D world. However, this is extremely limiting and requires lots of 3D calculations. Thus it is desirable to have some form of 2D graphical user interface to assist in controlling the virtual world. These 'control panels' would appear to occlude portions of the 3D world, or perhaps the 3D world would appear as a window or viewport set in a 2D screen interface. The 2D interactions could also be represented as a flat panel floating in 3D space, with a 3D effector controlling them.

I.7.8.1. Two Dimensional Controls

There are four primary types of 2D controls and displays: Buttons, Sliders, Gauges and Text. (Controls cause changes in the virtual world; displays show some measurement on the VW.) Buttons may be menu items with either icons or text identifiers. Sliders are used for more analog control over various attributes. A variation of a slider is the dial, but these are harder to implement as 2D controls. Gauges are graphical depictions of the value of some attribute(s) of the world. Text may be used for both control and display. The user might enter text commands to some command parser. The system may use text displays to show the various attributes of the virtual world.

An additional type of 2D display might be a map or locator display. This would provide a point of reference for navigating the virtual world.

The VR system needs a definition for how the 2D cursor affects these areas. It may be desirable to have a notion of a 'current control' that is the focus of the activity (button pressed, etc.) for the 2D effector. Perhaps the arrow keys on the keyboard could be used to change the current control, instead of using the mouse (which might be part of the 3D effector at present).

I.7.8.2. Three Dimensional Controls

Some systems place the controls inside the virtual world. These are often implemented as a floating control panel object. This panel contains the usual 2D buttons, gauges, menu items, etc. perhaps with a 3D representation and interaction style.

There have also been some published articles on 3D control Widgets. These are interaction methods for directly controlling the 3D objects. One method implemented at Brown University attaches control handles to the objects. These handles can be grasped, moved, twisted, etc. to cause various effects on an object. For example, twisting one handle might rotate the object, while a 'rack' widget would provide a number of handles that can be used to deform the object by twisting its geometry.

I.7.9. Hardware Control & Connections

The world database may contain information on the hardware controls and how they are integrated into the application. Alternatively, they may be a part of the program code. Some VR systems put this information into a configuration file. I consider this extra file simply another part of the world database.

The hardware mapping section would define the input/output ports, data speeds, and other parameters for each device. It would also provide for the logical connection of that device to some part of the virtual world. For example, a position tracker might be associated with the viewer's head or hand.

I.7.10. Room/Stage/Area Descriptions

If the system supports the division of the virtual world into different areas, the world database would need multiple scene descriptions. Each area description would give the names of the objects in the scene and a stage description (e.g. size, backgrounds, lighting, etc.). There would also be some method of moving between the worlds, such as entering a doorway, that would most likely be expressed in object scripts.

I.8. World Authoring versus Playback

A virtual world can be created, modified and experienced. Some VR systems may not distinguish between the creation and experiencing aspects. However, there is currently a much larger body of experience to draw upon for designing the world from the outside. This method may use techniques borrowed from architectural and other forms of Computer Aided Design (CAD) systems. Also the current technologies for immersive VR systems are fairly limiting in resolution, latency, etc. They are not nearly as well developed as those for more conventional computer graphics and interfaces.

For many VR systems, it makes a great deal of sense to have an Authoring mode and a Playback mode. The authoring mode may be a standard text editor and compiler system, or it may include 3D graphic and other tools. Such a split mode system makes it easier to create a stand alone application that can be delivered as a product.

An immersive authoring ability may be desirable for some applications and some users. For example, an architect might have the ability to move walls, etc. when immersed, while the clients with him, who are not as familiar with the system, are limited to player status. That way they can't accidentally rearrange the house by leaning on a wall.