faster rendering using relief textures?
Authored by Owen Pearn
-
2 September 2005 - edited
-
27 January 2000 - minor edits
what a difference five years makes
Relief texture mapping can now be done in realtime on the video card with
shaders.
Watch a video
showing Relief Mapping in DOOM 3 running in real time on a GeForce 6800
GT.
This is from "Rendering Surface Details in Games with Relief Mapping
Using a Minimally Invasive Approach" by Fabio
Policarpo and Manuel
M. Oliveira. They generated depth maps from the DOOM 3 normal maps.
Example shader source code in nVidia's CineFX
4.0 Technical Brief.
historical stuff below
what are relief textures?
Relief textures
are a new image-based rendering
technique developed by Manuel
Oliveira and Gary Bishop at
the University of North Carolina at Chapel Hill.
Relief textures appear to have the potential to significantly increase
the visual realism of rendered geometry while keeping system load roughly
constant. Briefly, a preprocessing stage replaces geometry with texture,
in such a way that it looks like the geometry is still there, even from
viewpoints almost parallel to the texture. Sort of like bump-mapping, only
better.
"Figure 1. Scene rendered using three conventionally texture-mapped
quadrilaterals (left). Same scene rendered with three Relief Textures and
five quadrilaterals (right). Note the dormers." more
 
A relief texture is rendered by doing a "computation" followed by a
normal texture map (done by the graphics card). If the computation can
be done in less time than the time saved by not processing the replaced
geometry, then there will be a net performance win.
This is where you come in.
I am searching for ways to do the computation "fast" (ideally on the
graphics card). This may involve the application of some Extra Cunning.
got code?
Source code and much, much more by Sergey
Parilov.
can relief textures plug into existing code?
From an OpenGL pipeline's point of view, relief textures are dynamic (procedural)
textures. This means that they can be selectively applied where they make
the most difference, without requiring a rewrite of the entire pipeline
to fit a new paradigm.
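For the curious, here is a minimal sketch (mine, not the paper's) of what
"plugging in" could look like on the application side: the pre-warped result
is pushed into an ordinary resident texture each frame with glTexSubImage2D.
The texture object, dimensions and pixel buffer are placeholder assumptions.

#include <GL/gl.h>

#define TEX_W 256                          /* placeholder texture size */
#define TEX_H 256

static GLuint relief_tex;                  /* texture object created at startup */
static GLubyte warped[TEX_W * TEX_H * 3];  /* output of the cpu-side pre-warp */

/* Push this frame's pre-warped texels into the already-resident texture.
   No new texture object and no format change, so the driver can keep the
   same card-resident storage. */
void upload_prewarped_texture(void)
{
    glBindTexture(GL_TEXTURE_2D, relief_tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, TEX_W, TEX_H,
                    GL_RGB, GL_UNSIGNED_BYTE, warped);
}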
how do relief textures work?
A relief texture is a texture with an associated heightfield (a height
per texel).
In a preprocessing step, geometry is rendered from a number of viewpoints
using orthographic projection. The resultant images become the textures,
and the height at each texel is the distance from a supporting plane to
the surface sample that projects onto that texel, forming the heightfield.
To render, the texture undergoes a per-texel "pre-warp" operation, creating
a new texture. This new texture is then texture-mapped in the usual way.
Effectively, the heightfield replaces the geometry. The goal is to do the
pre-warp at a lower cost than (or at worst the same cost as) rendering
the original geometry.
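As a rough illustration only (the field names and 8-bit quantisation are my
assumptions, not the paper's layout), the per-texel data looks something like
this:

typedef struct {
    unsigned char r, g, b;   /* colour sampled under orthographic projection   */
    unsigned char displ;     /* quantised distance from the supporting plane   */
} ReliefTexel;

typedef struct {
    int width, height;
    ReliefTexel *texels;     /* width * height samples: image plus heightfield */
} ReliefTexture;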
Manuel Oliveira explains it in his paper
(.pdf).
can relief textures help realtime gaming?
If some geometry can be replaced by relief textures with a net performance
win, then there will be some extra time to put in more stuff (e.g. geometry,
textures, puppies).
If some geometry can be replaced by relief textures without a net performance
win, but without increasing total load, visual realism might be increased
anyway.
Take today's state-of-the-art game, Quake3Arena.
All geometry lives in a binary-space-partitioning
tree, which the cpu (not the graphics card) uses for culling, as well
as for in-game visibility checks and collision detection. Each frame,
only the visible geometry is sent over the bus to the graphics card for
rendering. Most of the textures remain resident in graphics card RAM. A
few dynamic textures are sent each frame.
The target machine is something like a Pentium II with a 300 MHz cpu
and an ATI Rage 128 or Nvidia RIVA TNT graphics card. These graphics cards
have a street price today of about US $120. The graphics pipeline is OpenGL.
Brian Hook reports
(.ppt) 1.5 million triangles per second through this pipeline on a 500
MHz Pentium III driving an ATI Rage 128 graphics card.
The graphics cards above don't have on-board geometry engines, so the
cpu does the transform for each triangle (in the OpenGL client-side
driver) and sends each transformed triangle over the bus to the graphics
card for rasterization (texture mapping). Note that Quake3Arena does its
own lighting using lightmaps (blended textures) and does not use the OpenGL
lighting path. Reportedly, between 50% and 75% of the time is spent in
the OpenGL client-side driver, authored by the manufacturer of the graphics
card. The balance is spent doing culling, visibility, collision detection,
networking, input, sound and other necessary things.
Typically, folks want to run this type of game at a frame rate of (at
least) 30 frames per second, which gives a per-frame triangle budget of
(1.5 million / 30) 50,000, used for visible dungeon, aliens, environment,
various damage-inducing projectiles and stunt puppies.
The triangles are put into strips and/or fans to reduce average vertexes
per triangle from 3. Let's assume this goes to 1.5 vertexes per triangle
on average, which gives (1.5 * 50,000) 75,000 vertexes to transform and
send per frame.
Each vertex is about 32 bytes, which gives (32 * 75,000) 2,400,000 bytes
per frame, which at 30 fps puts (30 * 2,400,000) 72 MB/sec over the bus
(PCI bus peak is 132 MB/sec, AGP peak is 264 MB/sec, AGP 2X is 532 MB/sec).
The TNT graphics card has a reported
peak of 6 million rendered triangles per second.
The bottleneck appears to be the cpu having to transform 75,000 vertexes
per frame and send them to the graphics card. A 500 Mhz cpu has (500,000,000
/ 30) 16,666,666 cycles per frame to spend, and assuming we can use at
most 75% of that for vertex processing, that's (75% * 16,666,666) 12,500,000
cycles, which for 75,000 vertexes is (12,500,000 / 75,000) about 166 cpu
cycles per vertex.
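For readers who want to poke at the numbers, here is the same back-of-the-envelope
budget as a tiny C program; every constant is an assumption quoted from the
paragraphs above, not a measurement.

#include <stdio.h>

int main(void)
{
    const double tris_per_sec   = 1.5e6;  /* reported pipeline throughput    */
    const double fps            = 30.0;
    const double verts_per_tri  = 1.5;    /* strips/fans instead of 3        */
    const double bytes_per_vert = 32.0;
    const double cpu_hz         = 500e6;
    const double cpu_share      = 0.75;   /* fraction available for vertices */

    double tris_per_frame  = tris_per_sec / fps;                           /* 50,000 */
    double verts_per_frame = tris_per_frame * verts_per_tri;               /* 75,000 */
    double bus_mb_per_sec  = verts_per_frame * bytes_per_vert * fps / 1e6; /* 72     */
    double cycles_per_vert = (cpu_hz / fps) * cpu_share / verts_per_frame; /* ~166   */

    printf("%.0f tris/frame, %.0f verts/frame, %.0f MB/sec, %.0f cycles/vertex\n",
           tris_per_frame, verts_per_frame, bus_mb_per_sec, cycles_per_vert);
    return 0;
}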
So today, this game appears to be geometry-bound. The key point here
is that we have no choice but to send all visible geometry each frame.
(Note: I know that tomorrow's cards will have geometry transform engines
on them, and I know that single-instruction-multiple-data client-side drivers
(MMX, KNI, 3DNow) will reduce the cpu load, and I know that dynamic level-of-detail
schemes could be implemented.)
Some notes on optimizing
OpenGL drivers for Quake3.
observations
-
Only visible relief textures are candidates for updating each frame.
-
Not every candidate may need updating each frame (we have a choice as to
which relief texture(s) get updated each frame, and how). This may be visually
"okay", because OpenGL will still have a texture to map (it just won't
be the theoretically correct texture). We need only send a "very small"
amount of geometry to get the (albeit "incorrect") texture rendered.
-
"Small" changes in viewpoint may not require rerendering, especially for
textures "far away" from the camera.
-
Give priority to candidates that are being viewed from a viewpoint that
is "very different" from their last rendered viewpoint, and/or that are
"big on screen".
-
For a candidate that we have chosen to update, we may only need to update
those texels which will undergo the most perceivable change, given the
change in viewpoint.
-
More than one pre-warped texture per relief texture could be cached on
the card, corresponding to more than one viewpoint.
-
Rather than a displacement per texel, we may only need a displacement per
"group" of texels.
-
Relief textures do not need to be preprocessed for viewpoints that will
never be seen.
-
There may be a more compact set of preprocessed viewpoints than a bounding
box (or part thereof).
-
Collision detection gets harder. (Is it enough to collision detect against
the polygon only, or do we have to collision detect against the heightfield
as well?)
questions
-
Is there any combination of OpenGL primitives that could do the pre-warp
in hardware (even if those primitives are unlikely to be "fast" on today's
consumer-level hardware)?
-
Is there an existing OpenGL extension that could do the pre-warp (like
ARB_imaging), or could a new extension be defined?
-
Could the pre-warp be approximated by existing OpenGL primitives?
-
How do folks currently do procedural textures "fast"?
-
Can the heightfield be encoded somehow in a texture and somehow blended
with the image texture to do the pre-warp?
-
Can we approximate the heightfield with a parametric spline and use OpenGL
evaluators to do the pre-warp? (Does this need to be done on-card to be
done "fast"? A sketch of the evaluator mechanics follows after the quotes
below.)
" ... All polynomial or rational polynomial splines of any degree (up
to the maximum degree supported by the GL implementation) can be described
using evaluators. These include almost all splines used in computer graphics,
including B-splines, Bezier curves, Hermite splines, and so on. ..." more
" ... parametric curves and surfaces are both common (particularly
in CAD and 3D modeling applications) and quite computationally expensive.
It makes sense for OpenGL to off-load the floating-point intensive task
of evaluating these curves and surfaces to fast 3D transformation engines
when available ..." more
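To make the evaluator question above concrete, here is a minimal sketch of
the mechanics. The control points are placeholders, and this only shows how
the GL evaluates a bicubic patch fitted to a heightfield; whether the
evaluator path can be bent into the actual image pre-warp is exactly the
open question.

#include <GL/gl.h>

static GLfloat ctrl[4][4][3];   /* 4x4 control net fitted to the heightfield (placeholder) */

void draw_heightfield_patch(void)
{
    /* Hand the bicubic control net to the evaluator. */
    glMap2f(GL_MAP2_VERTEX_3,
            0.0f, 1.0f, 3, 4,    /* u range, stride, order */
            0.0f, 1.0f, 12, 4,   /* v range, stride, order */
            &ctrl[0][0][0]);
    glEnable(GL_MAP2_VERTEX_3);

    /* Ask the GL to tessellate and draw it as a 16x16 mesh. */
    glMapGrid2f(16, 0.0f, 1.0f, 16, 0.0f, 1.0f);
    glEvalMesh2(GL_FILL, 0, 16, 0, 16);
}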
pre-warp example source
"Figure 7. Pseudocode for left-to-right warp and construction of
one pixel with u (or v) index = I, color = C and displacement D." more
get Iin, Cin, Din
Inext = Equation_10a(Iin, Din)
for (Iout = integer(Iprev + 1); Iout <= Inext; Iout++)
    linearly interpolate Cout between Cprev and Cin
    linearly interpolate Dout between Dprev and Din
    put Iout, Cout, Dout
Iprev = Inext; Cprev = Cin; Dprev = Din
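Below is my own C rendition of that pseudocode for one left-to-right row,
offered as a sketch only: equation_10a() stands in for the paper's Equation
10a (the warped output coordinate as a function of input coordinate and
displacement) and is not reproduced here.

typedef struct { float r, g, b; } Color;

extern float equation_10a(float i_in, float d_in);  /* placeholder for the paper's equation */

/* Forward-warp one row: for each input texel, compute its warped output
   coordinate, then fill every integer output position passed over since the
   previous texel by linearly interpolating colour and displacement. */
void warp_row(const Color *c_in, const float *d_in, int width,
              Color *c_out, float *d_out)
{
    float i_prev = -1.0f, d_prev = 0.0f;
    Color c_prev = { 0.0f, 0.0f, 0.0f };

    for (int i = 0; i < width; i++) {
        float i_next = equation_10a((float)i, d_in[i]);

        for (int j = (int)(i_prev + 1.0f); j <= (int)i_next && j < width; j++) {
            float t = (i_next > i_prev) ? (j - i_prev) / (i_next - i_prev) : 1.0f;
            c_out[j].r = c_prev.r + t * (c_in[i].r - c_prev.r);
            c_out[j].g = c_prev.g + t * (c_in[i].g - c_prev.g);
            c_out[j].b = c_prev.b + t * (c_in[i].b - c_prev.b);
            d_out[j]   = d_prev   + t * (d_in[i]   - d_prev);
        }
        i_prev = i_next;  c_prev = c_in[i];  d_prev = d_in[i];
    }
}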
references - OpenGL warping, image processing
-
"... This extension defines a new depth texture format. An important application
of depth texture images is shadow casting, but separating this from the
shadow extension allows for the potential use of depth textures in other
applications such as image-based rendering or displacement mapping. ..."
more
-
"... Or: render into a texture (SGI has an extension to do render directly
to texture memory; what hardware are you using?) and then draw a grid with
the texture mapped onto it, with texture coordinates set to give you the
correct warping. Texture mapping is a great way of approximating ANY image
transformation. ..." more
-
"... With OpenGL, procedurally modifying a triangle mesh's UV and mapping
a (possibly dynamically generated) texture to it is the canonical way to
implement distorted/stained/simple mirrors, flowing water/lava surfaces,
heated air, predator invisibility effects, Q1 like sky textures etc ..."
more
-
"... UAV shows an unusual use of projective texturing and shadow testing
for accelerated image orthorectification. ..." more
-
" ... Image warping or dewarping may be implemented using texture mapping
by defining a correspondence between a uniform polygonal mesh and a warped
mesh. The points of the warped mesh are assigned the corresponding texture
coordinates of the uniform mesh and the mesh is texture mapped with the
original image. Using this technique simple transformations such as zoom,
rotation, or shearing can be efficiently implemented. The technique also
easily extends to much higher order warps such as those needed to correct
distortion in satellite imagery. ..." more
-
"... 2. The bottleneck in rendering graphics is not in the pipeline that
manipulates pixels, it is in sending the information about the scene across
the bus (for hardware that works on systems with a bus - PCs). This is
due to the relatively verbose nature of graphics APIs such as OpenGL and
Direct3D.
3. A great deal of image manipulation can already be done by treating
the image as a texture, mapping it to a polygon, and rendering that. OK,
this doesn't give you warping operators for depth images, but virtually
everything else is supported. ..." more
-
" ... The best choice depends on the set of operations that are accelerated
by your graphics system, as well as the precision you need for the final
result. On many current PC cards, it's likely that convolution, accumulation
buffer operations, and glCopyTexImage are either unavailable or slow, in
which case you're stuck with the glReadPixels/filter/glDrawPixels solution.
However, the first thing I'd try is the approach using glCopyTexImage followed
by blending multiple textured quadrilaterals into the framebuffer; my guess
is that the basic operations are more likely to be accelerated in that
case than in any other. ..." more
-
"... This has nothing to do with perspective-correct interpolation of texture
coordinates, which is why changing the perspective correction hint has
no effect. It's more a problem of approximating a nonlinear model with
a linear one. ..." more
-
" ... 11:00 D. Image Processing (Grantham)
1. OpenGL Image Processing
2. Image Warping with Textures
3. Accum Buffer Convolution
4. Antialiasing with Accumulation Buffer
5. Texture Synthesis & Procedural Texturing ..." more
-
Advanced Graphics Programming Techniques Using OpenGL more
-
The Accumulation Buffer: Interpolation and Extrapolation more
-
Angus Dorbie's Performer Pages
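The textured-grid idea quoted above (drawing a regular mesh whose texture
coordinates carry the warp) can be sketched like this; warp_u() and warp_v()
are placeholder functions standing in for whatever mapping the pre-warp
requires, and the source image is assumed to be bound as the current texture.

#include <GL/gl.h>

#define GRID 32                          /* mesh resolution (assumption) */

extern float warp_u(float u, float v);   /* placeholder warp functions   */
extern float warp_v(float u, float v);

/* Draw a regular GRID x GRID mesh over the unit square; the warp lives
   entirely in the texture coordinates, so the card does the resampling. */
void draw_warped_grid(void)
{
    for (int j = 0; j < GRID; j++) {
        glBegin(GL_QUAD_STRIP);
        for (int i = 0; i <= GRID; i++) {
            float u = (float)i / GRID;
            for (int dj = 0; dj <= 1; dj++) {
                float v = (float)(j + dj) / GRID;
                glTexCoord2f(warp_u(u, v), warp_v(u, v));
                glVertex2f(u, v);
            }
        }
        glEnd();
    }
}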
references - geometry load
-
"... Larger polygon counts mean that geometry bandwidth will become increasingly
important over the next couple of generations of hardware.
...
For a lit, smoothly shaded, textured triangle under the standard lighting
model (diffuse + specular + ambient):
Vertex = XYZ + NxNyNz + RGB * 3 + UV = 17 scalars/vertex.
Typical choices are float for XYZ and normal, ubyte for color, and
short for texture coordinates ->
Vertex = (3 * 4) + (3 * 4) + (3 * 1) * 3 + 2 * 2 = 37 bytes/vertex
3 vertices/triangle = 111 bytes.
...
Furthermore, increasing numbers of applications depend on host-based
culling, morphing, dynamics, and other preprocessing techniques which mean
that almost every vertex *does* change on every frame." more
-
"... Why is the bandwidth of AGP significant? Everyone insists on focusing
on texture swapping, texture uploading, and direct execute-mode texturing.
What is completely neglected in this hoopla is the fact that triangle data
can take just as much (and soon to be far more) bandwidth as texture data.
..." more
-
"... Each triangle requires at least (X,Y,U,V,Q) for each vertex. Each
component will be 4 bytes (assuming floats), which means each vertex takes
at least 20 bytes -- so a triangle requires 60 bytes to be supplied, minimum.
With lighting and Z(W)-buffering, you also have to add (R,G,B,Z) to every
vertex, which would probably add up to at least another 8 bytes/vtx (Z(W)=float,
R,G,B = 1 byte each, plus pad byte or alpha). So that's 84 bytes per triangle.
..." more
-
"... We've measured PCI bus transfers to the video card of 90MB/sec on
our Pentium Pro 200MHz Dell. That's enough bandwidth to get 2.8M vertices/sec
over the bus, or between, say, 930K and 2.8M tri/sec depending on whether
they are disjoint or in long strips. ..." more
-
"... But the assertion that the bus is not the limiting factor in any realistic
example is correct. ..." more
-
"... reaching for 7 digits of drawn triangles/sec in your application for
next year is not at all unreasonable, but it does take considerable amounts
of work in streamlining your pipeline. ..." more
references - OpenGL texture management
-
"... Nope, the big deal is being able to render into a texture efficiently
or you spend all of your time switching contexts and copying or uploading
textures (plus lots of state changes) ..." more
-
"... you're best bet (and the real OpenGL solution) is a glCopyTexSubImage
call ..." more
-
"... I've been looking all over the place to try and find out how to use
the card to render something, then use the results as a texture." more
-
"... I want to use hardware to render into a texture ..." more
-
"... One of the settings for glCopyTexSubImage() is forcing you down that
path and if you get the state set up correctly you won't have that problem.
..." more
-
" ... Paging textures ..." more
-
Texture Subloading more
-
" ... every new design I've seen in quite a while is capable of doing texture
readback. If the drivers handle CopyTexSubImage in the way they should,
then there should be no need to transfer subimages back to host memory.
..." more
-
"... This is an excellent application of texgen and 1D textures. You can
create a 1-dimensional texture map with contour lines or coloured contour
bands. Then create a triangle mesh from the height field as Paul suggests.
Then set up texture coordinate generation, ..." more
-
nvidia game developer's conference powerpoint presentations:
http://www.nvidia.com/Marketing/Developer/Pages.nsf/pages/devTechStuffFR_top
gdc99_mgprojtex.ppt (michael gold projective textures)
gdc99_mgtexgen.ppt (michael gold gltexgen)
gdc99_mgtexmng.ppt (michael gold texture management)
GDC-Texture.ppt (D3D textures)
-
OpenGL extension list currently at: http://reality.sgi.com/ljp_engr/registry/SGIX/pbuffer.txt
-
opengl performance tuning
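Following up the glCopyTexSubImage quotes in the list above, here is a
minimal sketch of that path (sizes and the drawing call are placeholders):
render the pre-warp result into the framebuffer, then copy it straight into
a resident texture without a round trip through host memory.

#include <GL/gl.h>

#define TEX_W 256
#define TEX_H 256

extern void draw_prewarp_into_framebuffer(void);  /* assumed to render TEX_W x TEX_H pixels */

void update_texture_from_framebuffer(GLuint tex)
{
    draw_prewarp_into_framebuffer();

    /* Copy the lower-left TEX_W x TEX_H pixels of the framebuffer into
       level 0 of the bound texture; a good driver keeps this on the card. */
    glBindTexture(GL_TEXTURE_2D, tex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, TEX_W, TEX_H);
}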