The Inner Product, March 2003
Jonathan Blow (jon@number-none.com)
Last updated 16 January 2004
 
Unified Rendering Level-of-Detail, Part 1
Related articles:
Rendering 
Level-of-Detail Forecast 
(August 2002)
Unified Rendering LOD, Part 2 
(April 2003)
Unified Rendering LOD, Part 3 
(May 2003)
Unified Rendering LOD, Part 4 
(June 2003)
Unified Rendering LOD, Part 5 
(July 2003)
Last year, in the article "Rendering Level-of-Detail Forecast" (August 
2002), I discussed a few of the most popular methods of LOD management. I 
opined that most LOD schemes are too complicated for the benefit they provide, 
and I gave a high-level overview of the method I would choose to implement LOD 
management over large, generalized scenes.
Now, I'm going to put my money where my mouth is, and implement such a scheme. 
It's a big project; the resulting source code will be the biggest this column 
has seen, by far.
Goals of the System
We would like the system to reduce the number of triangles used to render 
distant objects, where precise detail is not necessary. Some algorithms, like 
the Lindstrom-Koller and ROAM descendants, do this by modifying the world mesh 
at a granularity of individual triangles. These algorithms are unacceptably slow 
at high detail levels. Also, many of these systems try to utilize frame 
coherence, which makes them unsuitable for interactive systems. (This may sound 
like a controversial statement because frame coherence is a standard tool in the 
graphics guy's optimization toolbox; but that is mainly because the culture of 
graphics optimization has its origin in tools for batch rendering, for film and 
similar applications. In games, we unquestioningly accepted frame coherence 
exploitation as a good thing; but it inherently produces unstable frame rates, 
and frame rate stability is a major priority for smooth interactivity. Look for 
an explanation of this, and a corresponding rant, in a future column.)
So, we want the system to operate at a large granularity, raising or lowering 
the detail of big chunks of geometry all at once. Thatcher Ulrich's 
appropriately-named "Chunked LOD" system works like this, and it runs very 
quickly. However, it uses the "binary triangle tree" tessellation, the same one 
that ROAM uses. This tessellation is very inefficient about how it allocates its 
triangles (though you won't find a serious efficiency analysis in any research 
paper promoting binary triangle trees!). Also, it means that regular-grid height 
fields are the only kind of geometry on which the algorithm can be easily used. 
With a lot of work, the algorithm can be extended to other topologies. But the 
amount of work necessary is large, it complicates the runtime system 
tremendously, and the system still won't handle arbitrary input meshes.
Thatcher chose the BTT topology to make it easy to fill the cracks between 
neighboring chunks of terrain. This is important because cracks look awful. We 
want to fill cracks correctly, and quickly, but without restricting the mesh 
topologies. Arbitrary mesh topology is important because we want to create a 
method of _unified_ LOD management, which can be used for expansive outdoor 
terrains, or small enclosed dungeons, or for animated characters. 
Aside from topology, we want the system to place a minimal number of other 
constraints on mesh storage and rendering. An example of such constraints is 
seen with progressive meshes. It is a complicated and somewhat inefficient task 
to render a progressive mesh as a series of triangle strips; it is nigh 
impossible to render it as a series of vertex-cache-optimized triangle strips. 
If we introduce too many constraints, then when we come along to add new 
features to our engine, like stencil shadows or normal-mapping geometry 
enhancement, we may find that constraints of the new features clash with the old 
LOD constraints. This forces us to eliminate the new feature, or to dump our LOD 
system and make a new one. Either outcome is bad.
Finally, we want the system not to exhibit popping from one LOD to another. 
Popping looks ugly and is distracting to players; often we are looking for 
movement in our surroundings, and LOD popping creates false movement. Because we 
are choosing to adjust detail at a large mesh granularity, we will naturally get 
a lot of popping unless we spend significant effort to avoid it.
To summarize, I'll list all the main goals now as bullet points:
Reducing the detail of distant geometry
My overall approach will be to cut the world geometry into pieces at preprocess 
time, and generate several detail levels for each piece. At runtime, I will 
choose an appropriate detail level for each piece and then render it, filling in 
the cracks. 
To generate the various levels of detail, I will use Garland-Heckbert Error 
Quadric Simplification, or "EQS" (see For More Information). The input to EQS is 
a mesh of arbitrary topology, and it produces a mesh of arbitrary topology. We 
like that since it doesn't restrict us. 
When reducing the mesh, I use what Garland and Heckbert call "subset placement": 
I pick a vertex, then drag it to the position of a nearby vertex, then remove 
all the triangles that have become degenerate. The alternative is "optimal 
placement", where you solve for the position of a new vertex, then move two 
vertices into that new position. Subset placement is easier to program than 
optimal placement, but it probably produces a lower-quality output 
triangulation. Nevertheless, I chose subset placement because of its extreme 
generality. The output vertices are a subset of the input vertices, which means 
that they comprise valid and undistorted mesh data. With optimal placement, you 
must interpolate and extrapolate to produce the values for each vertex (not just 
position, but texture coordinates, perhaps tangent frames, bone blending weights 
for animated meshes, or other arbitrary vertex-associated data). Linear 
interpolation might not be good enough, or might require specialized 
renormalization; extrapolated values might need to be bounded in 
application-specific ways. It is definitely possible to do these things, but 
there's still no guarantee that the machine-generated coordinates will look 
good. It costs a lot of extra work and software complexity, to gain a 
potentially small visual benefit. Also, since the work is highly dependant on 
the type of data stored at each vertex, it's difficult to make a generalized 
tool that operates in a "hands-off" manner.
To be clinical: the choice to use optimal placement creates extra dependencies 
between the mesh simplification tool and the data format. Since dependencies are 
the bane of software engineering and project management, it is best to use 
subset placement in almost all cases.
Dividing the geometry into pieces
For now, in order to make it easy to divide the world into blocks, and to 
simplify other tasks like crack-filling, we'll limit ourselves to rendering 
height fields. By the end of this series, though, we'll be operating on triangle 
soups. There's a fine line to walk here -- I want to make the system simple 
enough that I can progress quickly programming it, so I start with just height 
fields; but at the same time, I don't want this to be a permanent restriction, 
so I must be careful not to rely on techniques that cannot be extended beyond 
height fields. In other words, for every simplification I impose now, I need to 
have a believable story about how that piece of the system will be upgraded in 
the future. It's a sort of algorithmic bootstrapping, if you will.
With a height field, it's easy to divide the world: you have some big array of 
height samples for the entire world, and you copy out rectangular subsections of 
that array. Because the array is evenly-sampled, we can easily match up the 
vertices along the edges of the blocks, which is necessary for crack-filling 
(discussed later). Note that there is no limit on the size or aspect ratio of 
these subsections; you can choose them arbitrarily based on your needs. This 
contrasts starkly with the binary triangle tree algorithms, which want your 
blocks to be square and power-of-two-plus-one in size, which is often 
inconvenient.
In a future upgrade, to handle unrestricted input meshes that extend arbitrarily 
into all three dimensions, I'll clip the meshes against axis-aligned planes to 
divide them into a bunch of cube-shaped regions. When clipping triangles against 
planes, we create all the new vertices ourselves, and in fact we create them in 
pairs. By saving the information about which vertices correspond, we will make 
it easy to perform crack-filling. But that's a subject for a future article.
Operating at a large granularity
Suppose we choose subsections of the height field that are 21x21 samples. That 
gives us 20x20 quads, or 800 triangles per block. Suppose we use mesh 
simplification to generate low-res versions of the block at 400 triangles, 200, 
100, 50, and 25. Based on the distance from each block to the camera, we choose 
one of these detail levels and render it. The result is shown in Figure 1a. It's 
bad because most of the rendered blocks consist of a small number of triangles. 
Current graphics hardware and drivers do not like that very much; the game will 
run slowly.
To solve this problem, we can hierarchically combine terrain blocks before 
putting them into the EQS routine to reduce them. Since we're working on a 
height field, I chose to combine 4 blocks at a time, prior to simplification. If 
the original blocks are 800 triangles each, I combine 4 of them to get a 
3200-triangle block; then I simplify tha mesh back down to 800 triangles. Thus 
all the rendered blocks have the same number of triangles, regardless of scale, 
as you see in Figure 1b. This simplifies some of the math we'll look at later 
on.
With fully 3D input geometry, I would be combining 8 blocks into 1, requiring a 
more extreme reduction ratio. You may wish to think about the implications.
|  |  | 
| Figure 1a: The squares represent blocks of our world mesh; the numbers denote how many triangles make up each block. If we merely reduce the detail levels of a fixed set of blocks, we will inefficiently render blocks that contain small numbers of triangles. | Figure 1b: By combining blocks, we can eliminate this problem. Note that the total number of triangles in 1b is higher than in 1a; we will worry about this in a future article. | 
Filling Cracks
After rendering two blocks, we need to fill the gap between them by rendering an 
appropriate set of connecting triangles. I will call this set of triangles a 
"seam". Because all of our blocks are precomputed, we can also precompute the 
seams. At runtime we only render precomputed arrays, which is very fast. 
Most people think of the crack-filling problem in a way that makes it needlessly 
difficult. They think about how, when given two arbitrary-LOD blocks, to match 
up the vertices along their edges. This is a difficult problem. Fortunately, 
though, we don't need to solve it. Instead, we can first build seams between the 
highest-LOD blocks. This is trivial; you just iterate along the array and 
generate a row of triangles (see Figure 2b). For each vertex of each seam 
triangle, you store an integer telling you which block the vertex is connected 
to, and another integer telling you the index of the vertex inside that block. 
So we're only storing indices, not spatial coordinates. That's important for the 
next step.
Suppose we reduce one of the blocks, as in Figure 2c. All we need is for the EQS 
routine to tell us which vertex in the source mesh corresponds to each vertex in 
the destination mesh. That's especially easy since we used subset placement; 
when we collapse a vertex into another vertex, we simply record the index of 
where it went. When detail reduction is complete, we return an array that tells 
us "For each index in the source mesh, here is the index of the corresponding 
vertex in the output mesh". Now we use this array to re-map the indices of the 
seam triangles, and we throw away any degenerate triangles. Now we have a valid 
precomputed seam for two blocks at differing resolutions. We can repeat this 
process as often as we want, reducing either block as much as we want. When we 
combine blocks together, we merge their seams. (Seams that end up interior to 
the block are tossed in with the block's regular geometry; seams on the borders 
are merged together to make bigger seams).
The seams between the highest-resolution blocks, which we generated from the 
original height field sampling pattern, are not necessary for rendering. If we 
were to render them, all their triangles would be collapsed to 0 area; Figure 2 
is drawn with the blocks artifically pulled apart to make the filling pattern 
clear. So we throw those seams away after the preprocess, keeping only the seams 
that involve detail-reduced blocks.
 
|  | |
| Figure 2a: Two blocks of terrain, connected by a seam. The blocks are drawn with an exaggerated gap between them; in actual game rendering, they would be touching. The seam is drawn with intentionally strange coloring to show the individual triangles. | |
|  |  | 
| Figure 2b: A wireframe version of 2a. | Figure 2c: The right-hand block has been reduced to 1/4th its original triangle count. The seam is created by cross-referencing the seam of 2b through an index map provided by the mesh simplifier. | 
Now the question arises: how many of these seams do we need to precompute, and 
how do we organize them in memory? Clearly we need an appropriate seam between a 
given block and any of its possible neighbors. To reduce the number of 
possibilities, I decided to place a restriction on the LODs of rendered blocks: 
two neighboring blocks are not allowed to differ by more than one level of 
detail. So for any given block, all his neighbors will either be the same size, 
one-half his size (measuring just the length of an edge, not area), or double 
his size. 
If every block stored all the seams for all its neighbors, we'd be storing every 
seam twice: both block A and block B would store the seam that attaches A to B. 
Instead, I store only seams for neighbors in the +X and +Y directions. 
In the end, each block has eight seam pointers. There are 4 for the +X direction 
and 4 for the +Y. The four pointers are: seam to lower-resolution block, seam to 
same-resolution block, and two seams to the two higher-resolution neighbor 
blocks.
After all this seam-filling, there will still be some small holes in the 
terrain, where the corners of blocks meet, as seen in Figure 3. For the case of 
a height map, these can be filled very quickly at runtime. I won't explain the 
details now, since the procedure will change significantly when we adapt the 
algorithm to generalized meshes.
|  | 
| Figure 3: Three terrain blocks (gray with black borders) and the seam fills between them (red with dotted borders). Note the hole in the middle. These blocks are drawn with exaggerated gaps; the actual hole would be very small. | 
Sample Code
This month's sample code renders a height field with filled seams. It also does 
some work to eliminate popping. Next month we'll discuss popping in detail, and 
look at methods of choosing which LOD to use for a given block.