faster rendering using relief textures?

Authored by Owen Pearn
  • 2 September 2005 - edited
  • 27 January 2000 - minor edits

what a difference five years makes

Relief texture mapping can now be done in realtime on the video card with shaders.

Watch a video showing Relief Mapping in DOOM 3 running in real time on a GeForce 6800 GT.

This is from "Rendering Surface Details in Games with Relief Mapping Using a Minimally Invasive Approach" by Fabio Policarpo and Manuel M. Oliveira. They generated depth maps from the DOOM 3 normal maps.

There is example shader source code in nVidia's CineFX 4.0 Technical Brief.


historical stuff below

what are relief textures? 

Relief textures are a new image-based rendering technique developed by Manuel Oliveira and Gary Bishop at the University of North Carolina at Chapel Hill.

It appears that relief textures have the potential to significantly increase visual realism of rendered geometry while keeping system load constant. Briefly, a preprocessing stage replaces geometry with texture, in such a way that it looks like the geometry is still there, even from viewpoints almost parallel to the texture. Sort of like bump-mapping, only better.

"Figure 1. Scene rendered using three conventionally texture-mapped quadrilaterals (left). Same scene rendered with three Relief Textures and five quadrilaterals (right). Note the dormers." more

A relief texture is rendered by doing a "computation" followed by a normal texture map (done by the graphics card). If the computation can be done in less time than the time saved by not processing the replaced geometry, then there will be a net performance win.

This is where you come in.

I am searching for ways to do the computation "fast" (ideally by the graphics card). This may involve the application of some Extra Cunning.


got code?

Source code and much, much more by Sergey Parilov.


can relief textures plug into existing code?

From an OpenGL pipeline's point of view, relief textures are dynamic (procedural) textures. This means that they can be selectively applied where they make the most difference, without requiring a complete rewrite of the entire pipeline to fit a new paradigm.
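As a concrete (and purely illustrative) sketch, assuming the pre-warp is done on the cpu into an RGBA buffer: plugging the result into an existing pipeline is then just an ordinary texture update. prewarp_relief_texture(), the texture size and the buffer layout below are assumptions made up for the example; the two GL calls are standard OpenGL 1.1.

/* Hypothetical sketch: feed a freshly pre-warped relief texture into an
 * existing OpenGL pipeline as a dynamic texture.  Only glBindTexture and
 * glTexSubImage2D are real API; everything else is made up for illustration. */
#include <GL/gl.h>

#define TEX_W 256
#define TEX_H 256

/* assumed cpu-side pre-warp: fills an RGBA buffer for the current viewpoint */
extern void prewarp_relief_texture(unsigned char *rgba_out, int w, int h);

static GLuint relief_tex;                        /* created elsewhere         */
static unsigned char warped[TEX_W * TEX_H * 4];  /* RGBA, 4 bytes per texel   */

void update_relief_texture(void)
{
    prewarp_relief_texture(warped, TEX_W, TEX_H);

    /* replace the texels in place; the rest of the pipeline just
     * texture-maps the supporting quadrilateral as usual */
    glBindTexture(GL_TEXTURE_2D, relief_tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, TEX_W, TEX_H,
                    GL_RGBA, GL_UNSIGNED_BYTE, warped);
}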


how do relief textures work?

A relief texture is a texture with an associated heightfield (a height per texel).

In a preprocessing step, geometry is rendered from a number of viewpoints using orthographic projection. The resultant images become the textures, and the height at each texel is the distance from a supporting plane to the surface sample that projects onto the pixel, forming the heightfield.

To render, the texture undergoes a "pre-warp" operation per texel, creating a new texture. This new texture is then texture-mapped in the usual way. Effectively, the heightfield replaces the geometry. The goal is to do the pre-warp with less cost (or at least no additional cost) than it would take to render the original geometry.
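A minimal sketch of the data this implies (the names and layout are illustrative, not taken from the paper):

/* A relief texture is an ordinary image plus one displacement per texel.    */
typedef struct {
    unsigned char r, g, b, a;    /* color sample                             */
    float         displacement;  /* height above the supporting plane        */
} ReliefTexel;

typedef struct {
    int          width, height;
    ReliefTexel *texels;         /* width * height samples                   */
} ReliefTexture;

/* Per frame, per visible relief texture:
 *   1. pre-warp: ReliefTexture -> flat RGBA image for the current viewpoint
 *   2. texture-map that image onto the supporting quadrilateral as usual    */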

Manuel Oliveira explains it in his paper (.pdf).


can relief textures help realtime gaming?

If some geometry can be replaced by relief textures with a net performance win, then there will be some extra time to put in more stuff (i.e. geometry, textures, puppies).

If some geometry can be replaced by relief textures without a net performance win, but without increasing total load, visual realism might be increased anyway.

Take today's state-of-the-art game, Quake3Arena. All geometry lives in a binary-space-partitioning tree, which the cpu (not the graphics card) uses for culling, as well as for in-game visibility checks and collision detection. Each frame, only the visible geometry is sent over the bus to the graphics card for rendering. Most of the textures remain resident in graphics card RAM. A few dynamic textures are sent each frame.

The target machine is something like a Pentium II with a 300 MHz cpu and an ATI Rage 128 or Nvidia RIVA TNT graphics card. These graphics cards have a street price today of about US $120. The graphics pipeline is OpenGL. Brian Hook reports (.ppt) 1.5 million triangles per second through this pipeline on a 500 MHz Pentium III driving an ATI Rage 128 graphics card.

The graphics cards above don't have on-board geometry engines, so the cpu is doing the transform for each triangle (in the OpenGL client-side driver) and sending each transformed triangle over the bus to the graphics card for rasterization (texture mapping). Note that Quake3Arena does its own lighting using lightmaps (blended textures) and does not use the OpenGL lighting path. Reportedly, between 50% and 75% of the time is spent in the OpenGL client-side driver, authored by the manufacturer of the graphics card. The balance is spent doing culling, visibility, collision detection, networking, input, sound and other necessary things.

Typically, folks want to run this type of game at a frame rate of (at least) 30 frames per second, which gives a per-frame triangle budget of (1.5 million / 30) 50,000, used for visible dungeon, aliens, environment, various damage-inducing-projectiles and stunt puppies.

The triangles are put into strips and/or fans to reduce average vertexes per triangle from 3. Let's assume this goes to 1.5 vertexes per triangle on average, which gives (1.5 * 50,000) 75,000 vertexes to transform and send per frame.

Each vertex is about 32 bytes which gives (32 * 75,000) 2,400,000 bytes per frame, which at 30 fps, puts (30 * 2,400,000) 72 MB/sec over the bus (PCI bus peak is 132 MB/sec, AGP peak is 264 MB/sec, AGP 2X is 532 MB/sec).

The TNT graphics card has a reported peak of 6 million rendered triangles per second.

The bottleneck appears to be the cpu having to transform 75,000 vertexes per frame and send them to the graphics card. A 500 MHz cpu has (500,000,000 / 30) 16,666,666 cycles per frame to spend, and assuming we can use at most 75% of that for vertex processing, that's (75% * 16,666,666) 12,500,000 cycles, which for 75,000 vertexes, is (12,500,000 / 75,000) 166 cpu cycles per vertex.
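For concreteness, the same arithmetic as a tiny C program (the inputs are just the assumptions already stated above):

/* Back-of-the-envelope budget: 1.5 M tri/sec pipeline, 30 fps,
 * 1.5 vertexes per triangle, 32 bytes per vertex, 500 MHz cpu with
 * 75% of its cycles available for vertex work. */
#include <stdio.h>

int main(void)
{
    double tris_per_sec   = 1.5e6;
    double fps            = 30.0;
    double verts_per_tri  = 1.5;
    double bytes_per_vert = 32.0;
    double cpu_hz         = 500e6;
    double cpu_share      = 0.75;

    double tris_per_frame  = tris_per_sec / fps;               /* 50,000     */
    double verts_per_frame = tris_per_frame * verts_per_tri;   /* 75,000     */
    double bytes_per_frame = verts_per_frame * bytes_per_vert; /* 2,400,000  */
    double bus_mb_per_sec  = bytes_per_frame * fps / 1e6;      /* 72 MB/sec  */
    double cycles_per_vert = (cpu_hz / fps) * cpu_share / verts_per_frame; /* ~166 */

    printf("%.0f tris/frame, %.0f verts/frame, %.0f bytes/frame,\n"
           "%.0f MB/sec over the bus, %.0f cpu cycles per vertex\n",
           tris_per_frame, verts_per_frame, bytes_per_frame,
           bus_mb_per_sec, cycles_per_vert);
    return 0;
}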

So today, this game appears to be geometry-bound. The key point here is that we have no choice but to send all visible geometry each frame.

(Note: I know that tomorrow's cards will have geometry transform engines on them, and I know that single-instruction-multiple-data client-side drivers (MMX, KNI, 3DNow) will reduce the cpu load, and I know that dynamic level-of-detail schemes could be implemented.)

Some notes on optimizing OpenGL drivers for Quake3.


observations

  • Only visible relief textures are candidates for updating each frame. 
  • Not every candidate may need updating each frame (we have a choice as to which relief texture(s) get updated each frame, and how). This may be visually "okay", because OpenGL will still have a texture to map (it just won't be the theoretically correct texture). We need only send a "very small" amount of geometry to get the (albeit "incorrect") texture rendered. 
  • "Small" changes in viewpoint may not require rerendering, especially for textures "far away" from the camera. 
  • Give priority to candidates that are being viewed from a viewpoint that is "very different" from their last rendered viewpoint and/or "big-on-screen" (a toy scoring sketch follows this list). 
  • For a candidate that we have chosen to update, we may need only update those texels which will undergo the most perceivable change, given the change in viewpoint. 
  • More than one pre-warped texture per relief texture could be cached on the card, corresponding to more than one viewpoint. 
  • Rather than a displacement per texel, we may only need a displacement per "group" of texels. 
  • Relief textures do not need to be preprocessed for viewpoints that will never be seen. 
  • There may be a more compact set of preprocessed viewpoints than a bounding box (or part thereof). 
  • Collision detection gets harder. (Is it enough to collision detect against the polygon only, or do we have to collision detect against the heightfield as well?) 
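As an illustration of the update-priority idea above (the struct, the weights and the scoring are all made up for the sketch; they are not from the paper):

/* Toy heuristic: score each visible relief-texture candidate by how far the
 * viewpoint has moved since its last pre-warp and by its size on screen,
 * then re-warp the highest-scoring candidates first. */
typedef struct {
    float last_view_dir[3];   /* unit view direction at the last pre-warp    */
    float cur_view_dir[3];    /* unit view direction this frame              */
    float screen_area;        /* projected area as a fraction of the screen  */
} ReliefCandidate;

static float dot3(const float a[3], const float b[3])
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

/* higher score = more urgent to re-warp this frame */
float update_priority(const ReliefCandidate *c)
{
    float view_change = 1.0f - dot3(c->last_view_dir, c->cur_view_dir);
    return 0.7f * view_change + 0.3f * c->screen_area;   /* arbitrary weights */
}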


questions

  • Is there any combination of OpenGL primitives that could do the pre-warp in hardware (even if those primitives are unlikely to be "fast" on today's consumer-level hardware)? 
  • Is there an existing OpenGL extension that could do the pre-warp (like ARB_imaging), or could a new extension be defined? 
  • Could the pre-warp be approximated by existing OpenGL primitives? 
  • How do folks currently do procedural textures "fast"? 
  • Can the heightfield be encoded somehow in a texture and somehow blended with the image texture to do the pre-warp? 
  • Can we approximate the heightfield with a parametric spline and use OpenGL evaluators to do the pre-warp? (does this need to be done on-card to be done "fast"?) A small evaluator sketch follows this list. 

    " ... All polynomial or rational polynomial splines of any degree (up to the maximum degree supported by the GL implementation) can be described using evaluators. These include almost all splines used in computer graphics, including B-splines, Bezier curves, Hermite splines, and so on. ..." more

" ... parametric curves and surfaces are both common (particularly in CAD and 3D modeling applications) and quite computationally expensive. It makes sense for OpenGL to off-load the floating-point intensive task of evaluating these curves and surfaces to fast 3D transformation engines when available ..." more
  • Do pbuffers help? 

    "... Pbuffers are allocated sections of framebuffer memory for offscreen rendering. ..." more

  • Could a regular mesh be defined on the relief texture (separate from the heightfield) and the pre-warp expressed as a warping of that mesh? 
  • What's a texture matrix? 
  • Does lighting get harder? 
  • What is the actual reduction in the number of vertexes that have to be transformed and sent over the bus? 
  • What is the actual net increase or decrease in bus traffic? 
  • What will break, given that the z buffer will have values for the polygon onto which the relief texture is mapped, rather than values for the surface that the relief texture is representing? 
  • Any issues that are showstoppers?
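On the evaluator question above, here is a minimal sketch of what pushing a heightfield patch through OpenGL 1.1 evaluators looks like. Fitting the Bezier control points to the heightfield (the hard part) is not shown, and whether any of this helps with the pre-warp is exactly the open question; the GL calls themselves are standard.

/* Sketch: draw a bicubic Bezier patch with OpenGL evaluators.  The control
 * points are assumed to approximate one heightfield patch; filling them in
 * is left out of the sketch. */
#include <GL/gl.h>

static GLfloat ctrl[4][4][3];   /* 4x4 grid of (x, y, z) control points */

void draw_heightfield_patch(void)
{
    glMap2f(GL_MAP2_VERTEX_3,
            0.0f, 1.0f, 3, 4,    /* u range, stride (floats), order */
            0.0f, 1.0f, 12, 4,   /* v range, stride (floats), order */
            &ctrl[0][0][0]);
    glEnable(GL_MAP2_VERTEX_3);

    glMapGrid2f(16, 0.0f, 1.0f, 16, 0.0f, 1.0f);  /* 16 x 16 evaluation grid    */
    glEvalMesh2(GL_FILL, 0, 16, 0, 16);           /* the GL evaluates and draws */
}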


pre-warp example source

"Figure 7. Pseudocode for left-to-right warp and construction of one pixel with u (or v) index = I, color = C and displacement D." more
get Iin, Cin, Din                             /* input index, color, displacement */

Inext = Equation_10a(Iin, Din)                /* warped (output) index            */

for (Iout = integer(Iprev + 1); Iout <= Inext; Iout++)
{
    linearly interpolate Cout between Cprev and Cin
    linearly interpolate Dout between Dprev and Din
    put Iout, Cout, Dout                      /* write one output texel           */
}

Iprev = Inext; Cprev = Cin; Dprev = Din
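A small C sketch of that pass over one row, for illustration only. It assumes equation 10a has the rational form Iout = (Iin + k1 * D) / (1 + k3 * D), with k1 and k3 derived from the viewing configuration as in the paper; the bounds check and the starting value of Iprev are simplifications that are not in the original pseudocode.

/* One left-to-right pass of the 1-D pre-warp over a single row of texels.   */
typedef struct { float r, g, b; } Color;

static float equation_10a(float i_in, float d, float k1, float k3)
{
    return (i_in + k1 * d) / (1.0f + k3 * d);   /* assumed form of eq. 10a */
}

static float lerp(float a, float b, float t) { return a + (b - a) * t; }

void warp_row_left_to_right(const Color *c_in, const float *d_in, int width,
                            Color *c_out, float *d_out, float k1, float k3)
{
    float i_prev = 0.0f;
    float d_prev = d_in[0];
    Color c_prev = c_in[0];

    for (int i = 0; i < width; i++) {
        float i_next = equation_10a((float)i, d_in[i], k1, k3);

        /* fill every integer output column crossed since the previous sample */
        for (int i_out = (int)(i_prev + 1.0f); i_out <= (int)i_next; i_out++) {
            if (i_out < 0 || i_out >= width)
                continue;                        /* clip to the output row    */
            float t = (i_next > i_prev)
                          ? ((float)i_out - i_prev) / (i_next - i_prev)
                          : 0.0f;
            c_out[i_out].r = lerp(c_prev.r, c_in[i].r, t);
            c_out[i_out].g = lerp(c_prev.g, c_in[i].g, t);
            c_out[i_out].b = lerp(c_prev.b, c_in[i].b, t);
            d_out[i_out]   = lerp(d_prev, d_in[i], t);
        }

        i_prev = i_next;
        c_prev = c_in[i];
        d_prev = d_in[i];
    }
}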


references - OpenGL warping, image processing

  • "... This extension defines a new depth texture format. An important application of depth texture images is shadow casting, but separating this from the shadow extension allows for the potential use of depth textures in other applications such as image-based rendering or displacement mapping. ..." more 
  • "... Or: render into a texture (SGI has an extension to do render directly to texture memory; what hardware are you using?) and then draw a grid with the texture mapped onto it, with texture coordinates set to give you the correct warping. Texture mapping is a great way of approximating ANY image transformation. ..." more 
  • "... With OpenGL, procedurally modifying a triangle mesh's UV and mapping a (possibly dynamically generated) texture to it is the canonical way to implement distorted/stained/simple mirrors, flowing water/lava surfaces, heated air, predator invisibility effects, Q1 like sky textures etc ..." more 
  • "... UAV shows an unusual use of projective texturing and shadow testing for accelerated image orthorectification. ..." more 
  • " ... Image warping or dewarping may be implemented using texture mapping by defining a correspondence between a uniform polygonal mesh and a warped mesh. The points of the warped mesh are assigned the corresponding texture coordinates of the uniform mesh and the mesh is texture mapped with the original image. Using this technique simple transformations such as zoom, rotation, or shearing can be efficiently implemented. The technique also easily extends to much higher order warps such as those needed to correct distortion in satellite imagery. ..." more 
  • "... 2. The bottleneck in rendering graphics is not in the pipeline that manipulates pixels, it is in sending the information about the scene across the bus (for hardware that works on systems with a bus - PCs). This is due to the relatively verbose nature of graphics APIs such as OpenGL and Direct3D. 

    3. A great deal of image manipulation can already be done by treating the image as a texture, mapping it to a polygon, and rendering that. OK, this doesn't give you warping operators for depth images, but virtually everything else is supported. ..." more

  • " ... The best choice depends on the set of operations that are accelerated by your graphics system, as well as the precision you need for the final result. On many current PC cards, it's likely that convolution, accumulation buffer operations, and glCopyTexImage are either unavailable or slow, in which case you're stuck with the glReadPixels/filter/glDrawPixels solution. However, the first thing I'd try is the approach using glCopyTexImage followed by blending multiple textured quadrilaterals into the framebuffer; my guess is that the basic operations are more likely to be accelerated in that case than in any other. ..." more 
  • "... This has nothing to do with perspective-correct interpolation of texture coordinates, which is why changing the perspective correction hint has no effect. It's more a problem of approximating a nonlinear model with a linear one. ..." more 
  • " ... 11:00 D. Image Processing (Grantham)

  • 1. OpenGL Image Processing
    2. Image Warping with Textures
    3. Accum Buffer Convolution
    4. Antialiasing with Accumulation Buffer
    5. Texture Synthesis & Procedural Texturing ..." more
  • Advanced Graphics Programming Techniques Using OpenGL more 
  • The Accumulation Buffer: Interpolation and Extrapolation more 
  • Angus Dorbie's Performer Pages 


references - geometry load

  • "... Larger polygon counts mean that geometry bandwidth will become increasingly important over the next couple of generations of hardware.

    ...
    For a lit, smoothly shaded, textured triangle under the standard lighting model (diffuse + specular + ambient):
    Vertex = XYZ + NxNyNz + RGB * 3 + UV = 17 scalars/vertex.
    Typical choices are float for XYZ and normal, ubyte for color, and short for texture coordinates ->
    Vertex = (3 * 4) + (3 * 4) + (3 * 1) * 3 + 2 * 2 = 37 bytes/vertex
    3 vertices/triangle = 111 bytes.
    ...
    Furthermore, increasing numbers of applications depend on host-based culling, morphing, dynamics, and other preprocessing techniques which mean that almost every vertex *does* change on every frame." more
  • "... Why is the bandwidth of AGP significant? Everyone insists on focusing on texture swapping, texture uploading, and direct execute-mode texturing. What is completely neglected in this hoopla is the fact that triangle data can take just as much(and soon to be far more) bandwidth as texture data. ..." more 
  • "... Each triangle requires at least (X,Y,U,V,Q) for each vertex. Each component will be 4 bytes (assuming floats), which means each vertex takes at least 20 bytes -- so a triangle requires 60 bytes to be supplied, minimum. With lighting and Z(W)-buffering, you also have to add (R,G,B,Z) to every vertex, which would probably add up to at least another 8 bytes/vtx (Z(W)=float, R,G,B = 1 byte each, plus pad byte or alpha). So that's 84 bytes per triangle. ..." more 
  • "... We've measured PCI bus transfers to the video card of 90MB/sec on our Pentium Pro 200MHz Dell. That's enough bandwidth to get 2.8M vertices/sec over the bus, or between, say, 930K and 2.8M tri/sec depending on whether they are disjoint or in long strips. ..." more 
  • "... But the assertion that the bus is not the limiting factor in any realistic example is correct. ..." more 
  • "... reaching for 7 digits of drawn triangles/sec in your application for next year is not at all unreasonable, but it does take considerable amounts of work in streamlining your pipeline. ..." more 


references - OpenGL texture management

  • "... Nope, the big deal is being able to render into a texture efficiently or you spend all of your time switching contexts and copying or uploading textures (plus lots of state changes) ..." more 
  • "... you're best bet (and the real OpenGL solution) is a glCopyTexSubImage call ..." more 
  • "... I've been looking all over the place to try and find out how to use the card to render something, then use the results as a texture." more 
  • "... I want to use hardware to render into a texture ..." more 
  • "... One of the settings for glCopyTexSubImage() is forcing you down that path and if you get the state set up correctly you won't have that problem. ..." more 
  • " ... Paging textures ..." more 
  • Texture Subloading more 
  • " ... every new design I've seen in quite a while is capable of doing texture readback. If the drivers handle CopyTexSubImage in the way they should, then there should be no need to transfer subimages back to host memory. ..." more 
  • "... This is an excellent application of texgen and 1D textures. You can create a 1-dimensional texture map with contour lines or coloured contour bands. Then create a triangle mesh from the height field as Paul suggests. Then set up texture coordinate generation, ..." more 
  • nvidia game developer's conference powerpoint presentations:
    http://www.nvidia.com/Marketing/Developer/Pages.nsf/pages/devTechStuffFR_top
    gdc99_mgprojtex.ppt (michael gold projective textures)
    gdc99_mgtexgen.ppt (michael gold gltexgen)
    gdc99_mgtexmng.ppt (michael gold texture management)
    GDC-Texture.ppt (D3D textures)
  • OpenGL extension list currently at: http://reality.sgi.com/ljp_engr/registry/SGIX/pbuffer.txt 
  • opengl performance tuning 
owen pearn home page