OpenGL 3 & DirectX 11: The War Is Over

Author: Fedy Abi-Chahla

Source: http://www.tomshardware.com/...

1. Introduction

Given the prevalence of DirectX nowadays, we tend to forget that 10 years ago an all-out war was being waged between Microsoft and Silicon Graphics in the field of 3D APIs. The two companies were both trying to win over developers, with Microsoft using its financial muscle and SGI relying on its experience and its reputation in the field of real-time 3D. In this modern David-versus-Goliath battle, the “little guy” won a precious ally in one of the most famous game developers: John Carmack. In part due to the success of the Quake engine, solid support for OpenGL became important enough to motivate makers of 3D cards to provide complete drivers. In fact, it gave 3dfx one of its early advantages and knocked ATI to the back of the pack as it struggled with its OpenGL support.

Meanwhile, Microsoft was starting from scratch, and the learning curve was steep. So, for several years, Direct3D’s capabilities were behind the curve, with an interface that many programmers found a lot more confusing than OpenGL’s. But nobody can accuse Microsoft of being easily discouraged. With each new version of Direct3D, it gradually began to catch up with OpenGL. The engineers in Redmond worked very hard to bring performance up to its rival API’s level.

A turning point was reached with DirectX 8, released in 2001. For the first time, Microsoft’s API did more than just copy from SGI. It actually introduced innovations of its own like support for vertex and pixel shaders. SGI, whose main source of revenue was the sale of expensive 3D workstations, was in a bad position, having failed to foresee that the explosion of 3D cards for gamers would prompt ATI and Nvidia to move into the professional market with prices so low (due to economies of scale) that SGI couldn’t keep up. OpenGL’s development was also handicapped by bitter disputes among its proponents. Since the ARB—the group in charge of ratifying the API’s development—included many different, competing companies, it was hard to reach agreement on the features to be added to the API. Instead, each company promoted its own agenda. Conversely, Microsoft was working solely with ATI and Nvidia, using its weight to cast a deciding vote if there was disagreement.

With DirectX 9, Microsoft managed to strike a decisive victory, imposing its API on developers. Only John Carmack and those who insisted on portability remained faithful to OpenGL. But their ranks dwindled. And yet a reversal of fortunes was still possible. It had happened with Web browsers, after all. Even when a company has maneuvered itself into a near monopoly, if it rests on its laurels, it’s not all that rare for a competitor to rise from its ashes. So when the Khronos group took over OpenGL two years ago, many hopes were rekindled, with all eyes on the upcoming SIGGRAPH conference that year.

Last month, Khronos was to have announced OpenGL 3, a major revision of the API that’s supposed to catch up with Microsoft, which was also scheduled to launch its next-gen DirectX 11 API. But things didn’t really go as planned.

2. OpenGL 3 (3DLabs And The Evolution Of OpenGL)

To fully understand the controversy that surrounded the announcement of OpenGL 3, we have to go back a few years to 2002. At that time, as we said in our introduction, OpenGL was losing ground. Up until that point, DirectX had simply copied the capabilities of OpenGL. This time, however, SGI’s API had been overtaken. With DirectX 9, Microsoft introduced support for a high-level shader language, HLSL, and OpenGL had nothing comparable. It is important to note that OpenGL’s origins lie with IRIS GL, an API initially created by SGI to expose the functionality of its hardware. For a long time, ATI and Nvidia simply followed SGI’s rendering model, which meant that OpenGL was especially well-suited to the makers’ cards even before they were released. But with the introduction of shaders, the new GPUs moved away from the traditional rendering pipeline.

At the time, one company realized how urgently OpenGL needed to evolve if the API had any hope of being applied to modern GPUs: 3DLabs. That’s not surprising, because 3DLabs abandoned gaming cards after its Permedia 2 was EoLed to concentrate on the professional market, where OpenGL is the standard. 3DLabs presented a multi-point plan for bringing OpenGL into a new era. First was inclusion of a high-level shader language: GLSL. Then it called for a complete revision of the API. Many of its features no longer made sense on modern 3D cards, but the need for backward compatibility required GPU manufacturers to support them, at least at the software level. Not only does that make writing drivers more complex, increasing the occurrence of bugs, but the legacy capabilities also made the API confusing for new programmers.

So 3DLabs wanted to expose a subset of functionality that would guarantee efficient execution by the GPU and eliminate outmoded or redundant features. This subset was called OpenGL 2.0 Pure and was intended for developers of new applications. For backward compatibility, the full set of extensions in OpenGL 1.x was available in OpenGL 2.0.

Unfortunately, after interminable discussions within the ARB, the plan was rejected. And when OpenGL 2.0 finally became available, all it did was to add support for GLSL to the API. All of 3DLabs’ other proposals ended up in the trash, leaving OpenGL still lagging behind the Microsoft API.

3. A Need For Change

Another example demonstrates the ARB’s inability to make rapid, efficient decisions. For a long time, OpenGL relied on a technique called pbuffers to render to textures. All programmers agreed that the technique was very poorly conceived, difficult to use, and yielded poor performance. So, ATI proposed an extension to replace it—über-buffers. This extension was very ambitious. Beyond rendering to a texture, ATI wanted to make it possible to render to an array of vertices, along with other advanced capabilities. It may have all been a bit too ambitious, since the extension took too long to define, programmers got impatient, and Nvidia and 3DLabs finally made a competing proposal to at least enable rendering to a texture efficiently, without the generic approach taken by ATI’s solution. It ended up taking several years to see results from all these efforts—in the form of an extension called framebuffer_object, just to offer a basic feature already in DirectX 9!

So, in 2005, OpenGL had caught up with Microsoft’s API launched three years earlier. All of the major players (ATI, Nvidia, 3Dlabs, and the software developers) agreed that things couldn’t go on this way, or else OpenGL would sink into oblivion little by little due to obsolescence. In this agitated context, the ARB passed the baton to Khronos in 2006, putting the future of OpenGL into the group’s hands. ATI and Nvidia both pledged that they would rise above their own rivalry and collaborate effectively so that OpenGL could finally enter the 21st century. Developers were enthusiastic, since the Khronos group had shown itself to be very effective in managing OpenGL ES, the 3D API for mobile peripherals.

Very quickly the Khronos group began issuing communications about the future of OpenGL. Again the plan was based on a reworking of the API in two stages. The first revision, Longs Peak, would offer an R300/NV30 level of functionality on par with Shader Model 2 and a new, more flexible programming model. A little like OpenGL 2.0 Pure, which 3DLabs had proposed years before, the Khronos group planned to drop aspects of the API that were considered obsolete and focus on a small number of modern functions. This subset was called OpenGL Lean and Mean. The second major revision, codenamed Mount Evans, was to take the new API, correct any faults that had appeared in the meantime, and add R600/G80 (Shader Model 4) features. The draft timetable was very tight, calling for the arrival of Mount Evans less than six months after Longs Peak. But the members of Khronos seemed confident.

In another change from the ARB, Khronos decided to communicate more openly. An informational newsletter was made available on the OpenGL site, to begin educating developers about the new API and let them give their impressions of it. Everything seemed to be going well until the end of 2007. Whereas the final specification for Longs Peak was expected in September, the Khronos group announced that due to problems, it would be delayed—without providing any details. The effort at more open communication of a few months earlier was forgotten and the Khronos group continued its work behind a total blackout. No more newsletter—in fact, there was no news at all about the new API’s progress.

4. The Revelation

No more was heard of OpenGL 3 until August 2008 at the SIGGRAPH conference. But while some people were expecting a pleasant surprise, Khronos had a serious disillusionment in store for fans of OpenGL. Not only was the API nearly a year late, but to top it all off, most of the new aspects of Longs Peak had been completely abandoned. After the OpenGL 2.0 fiasco, which really delivered only an OpenGL 1.6 with a different name, this OpenGL 3.0 was beginning to look like no more than version 2.2. The unpleasant surprise, coupled with the absence of communication for several months, resulted in some very aggressive reactions toward the Khronos group on forums everywhere. Faced with the storm of reaction, Khronos responded on the official OpenGL forum through Barthold Lichtenbelt of Nvidia. His highly detailed response at least provided a little insight into what had been going on in the wings. We learned, for example, that certain points of implementation weren’t decided on in time, and that in parallel, a lot of people felt that it had become urgent to enable OpenGL support for the latest GPUs. So the plan was modified in order to extend OpenGL 2 to include Direct3D 10 functionality.

Even if the argument holds up, Khronos can still be criticized for not trying to put out the fire immediately rather than suddenly cutting off all communication with the outside world. And the similarity with what happened six years earlier with OpenGL 2.0 doesn’t really inspire optimism for the future. After two promises to rewrite the API—both of them failures—how are we supposed to have faith in the future of OpenGL? Finally, a comment by John Carmack at the latest QuakeCon didn’t really help the situation. Asked about the status of OpenGL 3, he answered in terms that were a lot less politically correct than Mr. Lichtenbelt’s statement.

According to Carmack, OpenGL 3’s falling short of what it was supposed to be is mainly the fault of certain CAD software developers who weren’t really favorable to Longs Peak. They were afraid of compatibility problems for their applications due to the disappearance of certain older functions. That version was tacitly confirmed by Lichtenbelt: “During the Longs Peak design phase, we ran into disagreement over what features to remove from the API... The disagreements happened because of different market needs... We discovered we couldn’t do one API to serve all.”

So in the end, OpenGL 3 is nothing more than an incremental update. The API hasn’t really been changed. Khronos has simply marked certain capabilities as being deprecated and created a context in which using those functions will cause errors. That’s a far cry from what was promised (driver developers still need to provide support), but it is a step forward since it allows developers to prepare for future versions that may finally offer a true Lean and Mean mode. OpenGL 3 also introduces the notion of profiles. For the moment there’s only one profile, but the plan calls for creating a profile for games and another for CAD, for example, with each profile supporting a different subset of functions.

Aside from that, the features offered by OpenGL 3 are pretty much the same as what Direct3D 10 offers, except for Geometry Shaders and Geometry Instancing, which have been added to the API as an extension. But some features of Direct3D 10.1, like independent blending modes for MRTs, are also supported.

5. Direct3D 11

With Direct3D 10, Microsoft made the most sweeping revision of its API since its creation. Admittedly, all those years of compatibility were beginning to handicap the evolution of the API, and the goal was to provide a sound foundation for future developments. Yet, the new programming interface received a mixed reception from gamers and developers alike.

Microsoft is largely to blame. After hyping the merits of its API for several years before it was actually available, surely it had to expect a certain amount of discontent when gamers realized that the actual product didn’t really change much for them. Add to that the fact that the new API was written exclusively for Vista, and it was enough to generate animosity toward what had been presented as nothing less than a small revolution. As for developers, things were even more complicated. By associating Direct3D 10 and Vista, Microsoft greatly limited the number of existing computers that would be able to run a game using the API.

Further—and this is no secret for anybody—the PC as a gaming platform has lost ground in recent years with the emergence of the new consoles, to which several major developers from the PC world have now switched. id Software, Epic, and Lionhead are now all working on multi-platform projects, if not developing exclusively for consoles. And since both HD consoles on the market use a DirectX 9 GPU, developers have all the motivation they need to stick with the previous MS API.

So why are we talking about Direct3D 11 now? First of all, because Microsoft has finally lifted the veil from its API and because, after all, it’s still a newsworthy event—one that’ll give us an idea of what to expect from next year’s hardware. And what’s more, there’s a good chance that Direct3D 11 will prove to be a more important page in the history of the API than version 10 was. While Direct3D 10 was a complete revision, with all the risks that entails, Microsoft has now put enough distance between that overhaul and this new version to correct the problems raised by the first major rework of its API. So you could call Direct3D 11 a major update, albeit an incremental one. It re-uses all the concepts that were introduced with Direct3D 10, and is compatible with the preceding version and with the preceding generation’s hardware. And finally, it’ll be available not only on Windows 7, but also on Vista. So Microsoft has corrected the biggest problems with the preceding version, and it’s being whispered among developers that some of them are skipping Direct3D 10 and moving directly to version 11 for their future games.

That rationale holds water for several reasons. A typical game’s development phase is between two and four years. So by the time a game that is just now starting its development phase is released, Direct3D 11 will already be well established for PCs, since it’ll run on all PCs shipped with Windows 7 and work on the great majority of PCs running Vista. And, it seems very probable that regardless of their release dates, future consoles will use Direct3D 11-compatible GPUs (or something close, like the Xenos in the Xbox 360, which is a superset of DirectX 9). Consequently, aiming at that level of functionality will enable developers to get the jump on the next generation of consoles. But we aren’t here to do a market study. What does the new API bring with it from a technical point of view?

6. Multi-Threaded Rendering

Multi-threaded rendering? "But," you’re saying, "we’ve had multi-core CPUs for several years now and developers have learned to use them. So multi-threading their rendering engines is nothing new with Direct3D 11." Well, this may come as a surprise to you, but current engines still use only a single thread for rendering. The other threads are used for sound, decompression of resources, physics, etc. But rendering is a heavy user of CPU time, so why not thread it, too? There are several reasons, some of them related to the way GPUs operate and others to the 3D API. So Microsoft set about solving the latter and working around the former.

First of all, threading the rendering process seems attractive at first, but when you look at it a little closer, you realize that, after all, there’s only one GPU (even when several of them are connected via SLI or CrossFire, their purpose is to create the illusion that there’s only a single, virtual GPU) and consequently only one command buffer. When a single resource is shared by several threads, mutual exclusion (mutex) is used to prevent several threads from writing commands simultaneously and stepping on each other’s toes. That means that all the advantages of using several threads are canceled out by the critical section, which serializes code. No API can solve this problem—it’s inherent in the way the CPU and GPU communicate. But Microsoft is offering an API that can try to work around it. Direct3D 11 introduces secondary command buffers that can be saved and used later.

So, each thread has a deferred context, where the commands written are recorded in a display list that can then be inserted into the main processing stream. Obviously, when a display list is called by the main thread (the “Execute” in the “Multi-threaded Submission” diagram below), the application has to make sure that the thread filling it has finished. So there’s still synchronization, but this execution model at least allows some of the rendering work to be parallelized, even if the resulting acceleration won’t be ideal.
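The recording/execution split described above can be sketched on the CPU side in a few lines of Python. This is a conceptual simulation only, not the real Direct3D API: `DeferredContext`, `record`, and `execute` are hypothetical stand-ins for deferred contexts, command-list recording, and `ExecuteCommandList`-style submission.

```python
import threading

# Conceptual sketch of deferred contexts: each worker thread records draw
# commands into its own private list (no lock needed while recording), and
# only the final "execute" step on the main thread touches the single
# shared command buffer. All names here are illustrative, not real D3D calls.

class DeferredContext:
    def __init__(self):
        self.commands = []          # the recorded display list

    def draw(self, object_id):
        self.commands.append(("draw", object_id))

def record(ctx, objects):
    # Runs on a worker thread: recording is lock-free because the
    # context is private to this thread.
    for obj in objects:
        ctx.draw(obj)

main_command_buffer = []            # stands in for the one real GPU stream

def execute(ctx):
    # Main thread only: the single synchronization point. The worker
    # must have finished recording before this is called.
    main_command_buffer.extend(ctx.commands)

contexts = [DeferredContext() for _ in range(4)]
workers = [threading.Thread(target=record, args=(c, [i * 10, i * 10 + 1]))
           for i, c in enumerate(contexts)]
for w in workers:
    w.start()
for w in workers:
    w.join()                        # ensure each display list is complete
for c in contexts:
    execute(c)                      # serial submission, parallel recording

print(len(main_command_buffer))     # 8 commands total
```

The `join()` calls are the residual synchronization the article mentions: recording parallelizes, but submission into the single command buffer stays serial.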

Another problem with the previous Direct3D versions has to do with the creation of resources—textures, for example. In the current versions of the API (9 and 10), resource creation has to take place in the rendering thread. Developers get around the problem by creating a thread that reads and decompresses the texture from the disk and fills the resource (the Direct3D object), which itself is created in the main thread.

But as you can see, a large share of the workload was still on the main thread, which was already overloaded. That doesn’t provide the load balance needed for good execution times. So, Microsoft has introduced a new interface with Direct3D 11: a programmer can create one Device object per thread, which will be used to load resources. Synchronization within the functions of a Device is more finely managed than in Direct3D 10 and is much more economical with CPU time.
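The difference can be illustrated with a small Python sketch. Here the `Texture` class and `load_texture` function are hypothetical stand-ins, not real Direct3D calls; the point is simply that each worker both loads and creates its resource, instead of handing bytes back to an overloaded render thread.

```python
import concurrent.futures

# Conceptual sketch of Direct3D 11-style multithreaded resource creation:
# each worker can both load *and* create its resource. In D3D9/10 the
# creation step (the Texture() call below) would have had to run on the
# render thread. Names and data are illustrative only.

class Texture:
    def __init__(self, name, data):
        # In D3D11 this creation step is thread-safe on the device,
        # so it no longer has to be funneled through the render thread.
        self.name = name
        self.data = data

def load_texture(name):
    data = f"decompressed:{name}"   # stands in for disk I/O + decompression
    return Texture(name, data)      # creation happens on the worker itself

names = ["grass", "rock", "sky"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    textures = list(pool.map(load_texture, names))

print([t.name for t in textures])
```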

7. Tessellation

The major new feature of Direct3D 10 was the appearance of Geometry Shaders, which finally made it possible to create or destroy vertices on the GPU. But the role of this unit has been somewhat misinterpreted. Rather than being used for massive expansion of geometry, it’s better suited to implementing more flexible Point Sprites, managing Fur Shading, or calculating the silhouette of an object for shadow volume algorithms. Nothing is better than a dedicated unit for doing tessellation. Initially planned for Direct3D 10 (which explains its presence in the Radeon HD series), it seems that Microsoft, ATI, and Nvidia weren’t able to reach an agreement in time, and so it disappeared from the specifications, only to return with a vengeance with Direct3D 11. So, tessellation is the big new feature of Direct3D 11—or at least the one that’s easiest to sell for non-specialists.

So, Microsoft has introduced three new stages in its rendering pipeline: the Hull Shader, the Tessellator, and the Domain Shader.

Unlike other stages of the pipeline, these don’t operate with triangles as primitives, but use patches. The Hull Shader takes the control points for a patch as input and determines certain parameters of the Tessellator, such as TessFactor, which indicates the degree of fineness of the tessellation. The Tessellator is a fixed-function unit, so the programmer does not control how tessellation is calculated. The unit sends the points generated to the Domain Shader, which can apply operations to them. An example should make all this clearer. Let’s take a case that has come up in each generation since Matrox’s Parhelia—Displacement Mapping.

As input to the vertex shader, we have the control points of the patch. The programmer can manipulate these as he or she sees fit, since they aren’t numerous. To simplify, they are a very coarse version of the final mesh. These transformed points are then passed to the Hull Shader, which determines how many times to subdivide each side of the patch (for example, as a function of the size of the patch in pixels on the display). The Tessellator handles tessellation as such—that is, the creation of the geometry, which is passed to the Domain Shader. The Domain Shader transforms the points generated into the appropriate space (the points exiting the Tessellator are in the space of the patch), producing classic vertices that it can displace according to a texture, thereby performing Displacement Mapping.

The potential is enormous. Thanks to tessellation, it becomes possible to do without the normal map and implement a level of detail directly on the GPU, allowing the use of very detailed models (several million polygons instead of 10,000 or so with current games)—at least in theory. In practice, tessellation raises some problems that have kept the technique from taking off so far. Can Direct3D 11 and compatible cards avoid these pitfalls and come up with a functional version? It’s too early to say, but in any event not everybody is convinced, and id Software is working on solving the same geometry problem with a completely different approach based on ray casting with voxels.

8. Compute Shader And Texture Compression

We mentioned this open secret in the conclusion of our article on CUDA. Microsoft wasn’t about to let the GPGPU market get away and now has its own language for using the GPU to crunch other tasks besides drawing pretty pictures. And guess what? The model they chose, like OpenCL, appears to be quite similar to CUDA, confirming the clarity of Nvidia’s vision. The advantage over the Nvidia solution lies in portability—a Compute Shader will work on an Nvidia or ATI GPU and on the future Larrabee—plus it offers better integration with Direct3D, even if CUDA already has a certain amount of support. But we won’t spend any more time on this subject, even if it is a huge one. Instead, we’ll look at all this in more detail in a few months with a story on OpenCL and Compute Shaders.

Improved Texture Compression

First included with DirectX 6 10 years ago, DXTC texture compression quickly spread to GPUs and has been used massively by developers ever since. Admittedly, the technology developed by S3 Graphics was effective, and the hardware cost was modest, which no doubt explains its success. But now needs have changed. DXTC wasn’t designed with compressing HDR image sources or normal maps in mind. So Direct3D 11’s goal was twofold: enabling compression of HDR images and limiting the “blockiness” of traditional DXTC modes. To do that, Microsoft introduced two new modes: BC6 for HDR images and BC7 for improving the quality of compression for LDR images.
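The arithmetic behind these formats is simple to sketch: every scheme stores each 4x4 pixel block in a fixed number of bits. The classic DXTC/BC1 mode uses 64 bits per block; BC6 and BC7 both use 128 bits per block, trading a lower compression ratio for HDR support and quality. A short Python sketch of that accounting:

```python
# Fixed-rate block compression accounting: bits per 4x4 block.
BITS_PER_BLOCK = {"BC1": 64, "BC6": 128, "BC7": 128}

def compressed_bytes(width, height, mode):
    # Dimensions are rounded up to whole 4x4 blocks.
    blocks = ((width + 3) // 4) * ((height + 3) // 4)
    return blocks * BITS_PER_BLOCK[mode] // 8

# A 1024x1024 RGBA8 texture is 4 MiB uncompressed...
uncompressed = 1024 * 1024 * 4
print(uncompressed // compressed_bytes(1024, 1024, "BC1"))  # 8:1 ratio
print(uncompressed // compressed_bytes(1024, 1024, "BC7"))  # 4:1 ratio
```

The fixed rate is also why the hardware cost the article mentions stays modest: any texel's block can be located with pure arithmetic, no index required.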

9. Shader Model 5

With Shader Model 5, Microsoft applies certain concepts of object-oriented programming to its shader language, HLSL. Unlike preceding versions, which introduced new capabilities (Dynamic Branching, integer support, etc.), the purpose here is to facilitate programmers’ work by solving a common problem in current game engines: namely the upsurge in the number of shaders due to the large number of permutations. We’ll take a concrete example: Suppose that an engine manages two types of materials, plastic and metal, and two types of light: spot and omni. A programmer has to write four shaders in order to handle all cases:

renderPlasticSpot() { … }  // rendering plastic using a spot light
renderPlasticOmni() { … }  // rendering plastic using an omni light
renderMetalSpot()   { … }  // rendering metal using a spot light
renderMetalOmni()   { … }  // rendering metal using an omni light

This example is very simple since there are only two materials and two types of light, but in practice there can be several dozen. Obviously doing it this way can quickly become unmanageable. There’s a tremendous amount of code duplication and each time a bug is corrected somewhere it has to be corrected in all the other shaders. To solve this problem, programmers use what’s commonly called an über-shader, which brings together all the combinations:

Render()
{
#ifdef METAL
    // code specific to metal material
#elif PLASTIC
    // code specific to plastic material
#endif
#ifdef SPOT
    // code specific to spot light
#elif OMNI
    // code specific to omni light
#endif
}

This solution solves the problem by generating shaders on the fly from common code fragments. The downside is that it makes reading the shaders difficult and requires additional effort to be sure that all the fragments are inserted where they need to be. But with Direct3D 11, it’s now possible to make your code much more legible by using interfaces and derived classes:

Light myLight;
Material myMaterial;

Render()
{
    myMaterial.render();
    myLight.shade();
}

Light and Material are interfaces, and the code is contained in the derived classes OmniLight and SpotLight, PlasticMaterial and MetalMaterial. So the code is all in a single place, which makes bug correction easier. At the same time, legibility doesn’t suffer thanks to the organization of the code, which resembles the concept of virtual functions in object-oriented languages. This feature will be welcomed by programmers, but won’t have any real impact for gamers.
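The same organization can be mirrored in Python, where the virtual dispatch that HLSL’s interfaces emulate is built into the language. This is a conceptual sketch only: the class and method names echo the article’s example, not any real shader runtime.

```python
# One render routine written against two interfaces; the "permutation"
# is chosen by which concrete objects are supplied, not by duplicated code.

class Material:
    def render(self):
        raise NotImplementedError

class PlasticMaterial(Material):
    def render(self):
        return "plastic"

class MetalMaterial(Material):
    def render(self):
        return "metal"

class Light:
    def shade(self):
        raise NotImplementedError

class SpotLight(Light):
    def shade(self):
        return "spot"

class OmniLight(Light):
    def shade(self):
        return "omni"

def render(material, light):
    # A single body covers every material x light combination, so a bug
    # fix here fixes all permutations at once.
    return f"{material.render()}+{light.shade()}"

print(render(MetalMaterial(), SpotLight()))    # metal+spot
print(render(PlasticMaterial(), OmniLight()))  # plastic+omni
```

With N materials and M lights, the hand-written approach needs N×M shaders; this structure needs N+M classes and one render body.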

Miscellaneous

As you can imagine, we’ve only skimmed the surface of Direct3D 11’s new features and Microsoft hasn’t even released all the details yet. Among the topics we haven’t mentioned are the increase in maximum texture size from 4K x 4K to 16K x 16K and the possibility of limiting the number of mipmaps loaded in VRAM. There’s also the possibility of changing the depth value of a pixel without disabling functionality like early Z checking, support for double-precision floating-point types, scatter memory writes, etc.

10. Conclusion

We were expecting a lot from OpenGL 3, and as you can tell by reading this article, we’re disappointed—both in the API itself (with the disappearance of promised features) and in the way it’s been handled (a year-long delay and a lack of clear communication on the part of the Khronos group). With this version, OpenGL barely keeps up with Direct3D 10, and at a time when Microsoft has chosen to publicize the first details of version 11 of its own API.

There’s nothing revolutionary from Microsoft either, but unlike OpenGL, Direct3D already underwent a major revision of its architecture two years ago. There were some rough stretches of road, but today Microsoft can reap the benefits of the efforts made then to rebuild the API on a sound foundation.

So, it’s undeniable that Redmond is looking to the future, whereas one gets the impression that Khronos is content with just supporting current GPUs. Here’s hoping it’ll prove us wrong by speeding up the evolution of OpenGL 3, since it is the only API available for multi-platform development. But too many letdowns up until now have certainly shaken our faith in the organization.