The Inner Product,
Jonathan Blow (email@example.com)
Last updated 12 January 2004
Scalar Quantization (June 2002)
Arithmetic Coding, Part 1 (August 2003)
Arithmetic Coding, Part 2 (September 2003)
Compression Across Unreliable Networks (October 2003)
Piecewise Linear Data Models (November 2003)
Note: This article is currently missing its figures.
We will attempt to remedy this situation in the future.
This month's article continues the discussion of packing values into small spaces, which is useful for network communications and save-games. Previously, I discussed integers and scalars; this month I'll talk about vectors.
We most commonly deal with two different categories of vectors. The first category is unconstrained vectors, with "unconstrained" meaning that none of the vector's coordinates can be written as an equation of its other coordinates. An example is the position vector of an entity in the game world; we may set limits on the magnitudes of the x, y, or z components, but otherwise, they are independent. The second category is unit vectors, which are spherically constrained. Examples are surface normals, direction vectors (perhaps representing which way a player is looking), and rotations (which are unit vectors in 4 dimensions).
We'll start by looking at a 2D position vector. The most obvious, and most common, way to quantize the vector is to treat the coordinates as two separate scalars and encode them using a method from last month, probably the vanilla scalar quantization. This is equivalent to dividing the map up into a bunch of grid squares, finding which square contains your entity, and using the center of that square to represent the entity's position.
The resolution of that grid controls the quality of our encoding; if the grid squares are too big, we won't be able to accurately represent positions. Usually we choose a grid resolution by first determining how much error we can tolerate; then we find the square size that gives us that much error at maximum. The most an entity's position can be wrong, via this encoding, is the distance from the center of the square to one of its corners (Figure 1).
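As a concrete sketch of that scheme (the function names here are mine, not from the sample code), in Python:

```python
import math

def quantize_position(x, y, cell_size):
    """Quantize a 2D position to its grid square; transmit the integer
    cell indices, and decode as the center of that square."""
    ix = math.floor(x / cell_size)
    iy = math.floor(y / cell_size)
    center = ((ix + 0.5) * cell_size, (iy + 0.5) * cell_size)
    return (ix, iy), center

def max_grid_error(cell_size):
    """Worst case: the entity sits in a corner of the square,
    but we decode the center."""
    return cell_size * math.sqrt(2.0) / 2.0

cell, decoded = quantize_position(13.7, 4.2, 1.0)
# cell is (13, 4); decoded is (13.5, 4.5); worst-case error ~0.707
```

The cell indices are what you would actually pack into the network message; the decode step is shared knowledge between sender and receiver.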
This technique has some disadvantages. One problem is that the accuracy of transmitted entity positions is highly anisotropic. If an entity is moving along a diagonal of the grid, its average error will be much higher than an entity moving along an axis. Anisotropy is nice in rendering, but here it is a bad thing. It's bad because, if we make the grid squares small enough that the error in diagonal directions is acceptable, then we get more accuracy than we need in the axis-aligned directions. That means we've wasted some bandwidth.
We can reduce this effect by quantizing into shapes that are more round than squares. Wargamers will be familiar with the tiling of the plane by hexagons (Figure 2). A square of area 1 has extremal points about .707 units away from the center; a hexagon of area 1 has extremal points about .620 units from the center. That's a little more than 10% closer. This implies, via a little bit of math, that you need 30% more squares than hexagons to tile an area with equal error thresholds. So when using hexagons, you save about .4 bits per transmitted position. That might sound small; but if you have a bandwidth bill the size of a successful MMORPG's, you've saved perhaps $2,500 a month.
(Statistically-minded readers can compute the covariance matrices for the square and hexagon and observe the magnitudes of the eigenvalues, which tells you their relative compactness).
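The numbers above are easy to check directly. A small verification in Python (the formulas for the two shapes are standard geometry, nothing from the sample code):

```python
import math

# Square of area 1: side length 1, and the farthest point from the
# center is half the diagonal.
square_extremal = math.sqrt(2.0) / 2.0          # ~0.707

# Regular hexagon of area 1: area = (3*sqrt(3)/2) * s^2, and the
# farthest point from the center is the circumradius, which equals
# the side length s.
s = math.sqrt(2.0 / (3.0 * math.sqrt(3.0)))
hex_extremal = s                                 # ~0.620

# Scale both shapes to the same worst-case error; the square then
# covers less area, so you need ~30% more squares than hexagons.
area_ratio = (square_extremal / hex_extremal) ** 2   # ~1.30
bits_saved = math.log2(area_ratio)                   # ~0.38 bits
```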
We might try to do better than hexagons by seeding the plane with a bunch of unstructured points, and assigning all input values to the closest point. This is equivalent to tessellating the plane by the Voronoi diagram of those points, and choosing the Voronoi region in which an entity lies. We would want the points to be as evenly-spaced as possible; we could precompute them by running an energy relaxation program that causes points to repel each other until they reach equilibrium. I have included such a program in this month's sample code. When you run it, you get ... hexagons.
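A toy version of that relaxation might look like the following (this is a sketch of the idea, not the actual sample-code program; the torus wrapping and force clamp are my choices to keep the toy stable):

```python
import math
import random

def relax_points(points, iterations=200, step=0.002):
    """Toy energy relaxation on the unit torus (coordinates wrap at 1.0):
    every point repels every other point with an inverse-square force.
    With enough points, the equilibrium approaches a hexagonal lattice."""
    pts = [list(p) for p in points]
    for _ in range(iterations):
        forces = [[0.0, 0.0] for _ in pts]
        for i in range(len(pts)):
            for j in range(len(pts)):
                if i == j:
                    continue
                # Shortest wrapped displacement from point j to point i.
                dx = (pts[i][0] - pts[j][0] + 0.5) % 1.0 - 0.5
                dy = (pts[i][1] - pts[j][1] + 0.5) % 1.0 - 0.5
                d2 = dx * dx + dy * dy + 1e-9
                forces[i][0] += dx / d2
                forces[i][1] += dy / d2
        for i, (fx, fy) in enumerate(forces):
            m = math.hypot(fx, fy)
            if m > 5.0:                      # clamp the force for stability
                fx, fy = fx * 5.0 / m, fy * 5.0 / m
            pts[i][0] = (pts[i][0] + step * fx) % 1.0
            pts[i][1] = (pts[i][1] + step * fy) % 1.0
    return [tuple(p) for p in pts]

rng = random.Random(1)
initial = [(rng.random(), rng.random()) for _ in range(16)]
pts = relax_points(initial)
```

All forces are computed against the old positions before any point moves, so the update doesn't depend on iteration order.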
It turns out that the problem of tessellating the plane into congruent shapes is equivalent to the circle packing problem. (At a coarse level, you can think about the relationship this way: we are trying to tessellate the 2D plane with compact shapes, and a circle is the most compact 2D shape; we want to minimize the amount of the plane that ends up between the circles, as these portions produce higher-than-ideal error.) A lot of smart people have already spent a lot of time thinking about circle packing; that's good for us. It's been proven that, in 2D, the optimal way to pack circles is in a hexagonal lattice (see the References). Translated to the task at hand, this means that there is no better way to evenly quantize the 2D plane than by using hexagonal tessellation (where when I say "better" I am judging with respect to the error metric outlined earlier).
At this point we may wish to think about why crystals like to form in hexagonal shapes, or why bees build honeycombs as a hexagonal tiling. This kind of thing helps maintain my faith that game programming is teaching me deep and important things.
Expanding to 3D
Suppose we want to quantize 3D space. We can turn again to the sphere packing guys for an answer to this. As it happens, there are two equally good packings in 3D, both of which are proven to be optimal among regular lattice packings, and very strongly believed to be optimal as general packings as well. One of them is a 3D extension of the 2D hexagonal tiling. The other one, which I find easier to visualize, is called the "face-centered cubic packing". You can find out lots of relevant things about sphere packing just by typing "face-centered cubic packing" into your favorite search engine (Google).
However, direct 3D quantization is probably overkill for most games. For example, most MMORPG worlds are predominantly 2-dimensional, and most things in the world are standing or resting on a solid surface. So a more efficient way of transmitting position would be to use the 2D hexagonal tiling to communicate the entity's (x, y) position; then use a vertical raycast to compute a small integer index for which surface the entity is standing on. Though when you do this, you will have to provide some handling for exceptional cases, when the triangle that the entity is standing on does not extend all the way to the (x, y) of the quantized position.
You can Huffman code these vertical triangle indices; so when something is supported by a very popular triangle, the z information will require only 1 bit to transmit.
When an entity is flying or swimming, you can use a fallback method of transmitting position that consumes more bandwidth in order to explicitly communicate z.
"Independent" Coordinates are Related By Our Problem Domain
So you see that before we even introduce constraints, the two dimensions are tied together somehow -- we can't really treat them independently without suffering a bandwidth hit. The dimensions are intertwined by a requirement of our problem domain: that we want some kind of relativity with respect to orientation. That is, we want isotropic behavior, so that the reference frame can rotate and our world should not behave much differently. At some level I think this is a deep thing, though when stated like this, it's sort of a "well duh" kind of thing.
Mathematically, this comes down to the fact that we're using the Euclidean norm as our distance metric, and the Euclidean norm ties dimensions together.
Next I'll discuss constraints, which tie things together even more. First, though, I'll clarify that the general process we are performing here, mapping an input vector to some vector in a fixed dictionary, is known as _vector quantization_, or VQ. VQ has been studied extensively, and there's a lot of literature out there about it. We're not going to philosophize about general VQ here; we're just looking at a couple of very specific cases.
The most popular kind of constrained vector in games is the unit vector, either in 3D (to represent things like aiming directions or surface normals) or in 4D (to represent rotations). We'll start with 3D.
The most common ad hoc method of quantizing 3D unit vectors is to convert them into azimuth/elevation form and uniformly quantize each parameter separately. This technique sucks. When you use it, you get Figure 3. In order for the patches near the equator to be acceptably small, there needs to be a big excess of patches near the poles. We're wasting bandwidth.
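A quick way to see how bad the waste is: a patch's area on the sphere is proportional to the difference in sin(elevation) across its band, so patches shrink drastically toward the poles. A back-of-the-envelope sketch in Python (the 32-band resolution here is an arbitrary choice of mine):

```python
import math

def band_area(elev_lo, elev_hi):
    """Relative area on the unit sphere of one azimuth/elevation patch:
    a fixed azimuth step times the width of the band in sin(elevation)."""
    return math.sin(elev_hi) - math.sin(elev_lo)

n = 32                               # elevation steps from equator to pole
h = (math.pi / 2.0) / n
equator_patch = band_area(0.0, h)
pole_patch = band_area(math.pi / 2.0 - h, math.pi / 2.0)
ratio = equator_patch / pole_patch   # roughly 40x
```

So with 32 elevation bands, an equator patch covers roughly 40 times the area of a pole patch; to make the equator patches small enough, you burn a huge number of codes on the poles.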
The set of all 3D unit vectors is equivalent to the set of points on the unit sphere in 3D. So to perform VQ, we want a dictionary consisting of points evenly spaced on the unit sphere.
In 2D, it was easier to visualize things by solving the equivalent problem of tessellating the plane. But on the unit sphere in 3D, things are less intuitive. We could start solving some math about how to create a maximally uniform tessellation of the curved surface of the sphere (or hitting the math books, as I'm sure several people have worked this one out before), but that would be a big pain. We could also settle for some shape that seems reasonable, like a soccer ball; but this doesn't give us any control over how many dictionary vectors we get to use (the resolution of our encoding), and we also don't know how close to optimal it would be for a given dictionary size.
The simplest solution for us is just to run another relaxation program, using the dot products between vectors as a metric that tells us how much they repel each other. Such a program has also been included in this month's sample code. The number of vectors you desire is passed in as a command-line parameter.
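The shape of such a program might look like this (a sketch of the idea, not the actual sample code; my variant pushes vectors apart in 3D and renormalizes, rather than working with dot products directly):

```python
import math
import random

def relax_on_sphere(n, iterations=200, step=0.01, seed=0):
    """Spread n unit vectors over the 3D sphere: push each vector away
    from all the others, then renormalize back onto the sphere."""
    rng = random.Random(seed)
    def rand_unit():
        # Gaussian components give a uniformly distributed direction.
        while True:
            v = [rng.gauss(0.0, 1.0) for _ in range(3)]
            m = math.sqrt(sum(c * c for c in v))
            if m > 1e-6:
                return [c / m for c in v]
    vs = [rand_unit() for _ in range(n)]
    for _ in range(iterations):
        new = []
        for i, v in enumerate(vs):
            f = [0.0, 0.0, 0.0]
            for j, w in enumerate(vs):
                if i == j:
                    continue
                d = [v[k] - w[k] for k in range(3)]
                d2 = sum(c * c for c in d) + 1e-9
                for k in range(3):
                    f[k] += d[k] / d2      # inverse-square repulsion
            moved = [v[k] + step * f[k] for k in range(3)]
            m = math.sqrt(sum(c * c for c in moved))
            new.append([c / m for c in moved])
        vs = new
    return vs

dictionary = relax_on_sphere(16)
```

Renormalizing after each step cancels the radial part of the force, so only the tangential part actually moves vectors around the sphere.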
Sometimes, quantization is part of a preprocess; so you can quantize your input vector just by comparing it against each of the dictionary vectors in sequence, and outputting the one with the highest dot product. This is appealing because of its simplicity.
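A minimal sketch of that brute-force encoder (the six-vector axis dictionary below is a toy of mine, for illustration):

```python
def encode_unit_vector(v, dictionary):
    """Brute-force VQ: return the index of the dictionary vector with
    the highest dot product against the input unit vector."""
    best_i, best_dot = 0, -2.0
    for i, d in enumerate(dictionary):
        dot = sum(a * b for a, b in zip(v, d))
        if dot > best_dot:
            best_i, best_dot = i, dot
    return best_i

# Toy dictionary: the six axis directions.
axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# A vector leaning mostly along +z quantizes to index 4.
i = encode_unit_vector((0.1, 0.2, 0.97), axes)
```

The transmitted value is just the index; the decoder holds the same dictionary and looks the vector back up.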
If you need more speed than that, you can BSP the set of dictionary vectors. To do this, first identify the nearby vectors for each vector in the dictionary, and for each pair of vectors, create a separating plane exactly halfway between them. Now construct a BSP tree from these separating planes. Mark each leaf node of the tree as belonging to whichever dictionary vector lies inside it. Now you're ready to encode very quickly.
We want to encode rotations in a manner that is isotropic in rotation space. Most ad hoc methods decompose the rotations into various parameters, like Euler angles or an axis/angle representation, and then encode each coordinate separately. This causes the same anisotropy problems we just saw with 3D unit vectors, but now they're more complex and harder to visualize.
We are going to encode rotations by leveraging the fact that quaternions are the natural, undistorted representation of 3D rotations. Justifying that statement is beyond the scope of this month's article, but I'll do it in a future issue sometime. For now, I'll just say that it's related to the fact that constrained quaternion interpolation gives you the minimal-torque transition between two rotations, and refer you to the standard quaternion references on the Web.
We could quantize the quaternion (x, y, z, w) components separately, taking into account that the 4th component of a unit quaternion is mostly determined by the other 3. But this is not a good idea, since we want our quantization to be regular in curved space, not linear space.
The most bandwidth-efficient method is to treat the quaternion, a 4D unit vector, in exactly the way we just treated 3D unit vectors. We use a relaxation scheme to precompute a set of dictionary vectors, then match the input quaternion against that dictionary.
The biggest drawback with this method is that, because 4D space is big, we may need a lot of dictionary vectors to achieve the required resolution. If we have too many dictionary vectors, keeping a BSP tree of those vectors in RAM becomes cumbersome. We can start playing tricks to reduce this storage, by e.g. generating vectors that cover only 1/16 of the unit sphere, and requiring the generated set of vectors to tile with itself to cover the sphere. The tiling constraint will introduce a small amount of inefficiency, but it's not too bad. Try visualizing the equivalent trick in 3D space: imagine generating vectors for only one octant of the 3D unit sphere, and attaching that octant to copies of itself in order to cover the whole sphere.
With quaternions, we shouldn't forget that we do not actually want to cover the whole sphere, because quaternions that face in opposite directions represent the same rotation. So even if we generate vectors for a whole sphere, we then want to cull about half of them; if the input quaternion does not lie within the hemisphere we have covered, we try its opposite.
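A sketch of that hemisphere convention (the sign-of-w rule is the usual choice; the tie-breaking order for w = 0 here is mine):

```python
def canonicalize_quaternion(q):
    """q and -q represent the same rotation, so flip each quaternion
    into one canonical hemisphere (w >= 0, breaking ties on z, y, x)
    before matching it against the dictionary."""
    x, y, z, w = q
    for c in (w, z, y, x):
        if c > 0:
            return q
        if c < 0:
            return (-x, -y, -z, -w)
    return q
```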
If you still need too much resolution for this approach to be practical, then I recommend using an axis/angle encoding; use the previously discussed 3D unit vector encoder for the axis, and scalar quantization for the angle. To make this work best, you need to balance how much resolution goes into the angle quantization versus the vector quantization. I haven't worked this out, but the easiest thing to do would be to run a numerical optimizer that tries various possibilities until it achieves the best balance.
Often the vectors we want to transmit will be biased somehow. For example, in an FPS or MMORPG, the direction a player is looking will usually be along the horizon.
You might take advantage of this by distributing the 3D unit dictionary vectors so that many of them point toward the equator of the sphere, and few of them point toward the poles. This allows you to achieve a low mean error for the directions that you transmit, but your maximum error will rise, and this is probably not good. As an analogy, think of frame rate. If you are optimizing a rendering system to achieve high average frame rate, you don't want to do things that sometimes cause the frame rate to drop very low. Those drops will be very objectionable, even though the average is high.
I think the best method of exploiting this kind of pattern is to keep the quantization size fixed as we did this month; then, use a statistical encoding scheme, like Huffman or arithmetic coding, on the output.
Handbook of Discrete and Computational Geometry, Jacob E. Goodman and Joseph O'Rourke, eds, CRC Press, 1997. Chapter 50, "Sphere Packing and Coding Theory", by J.A. Rush.
"Circle Packing", Eric Weisstein's World of Mathematics. http://mathworld.wolfram.com/CirclePacking.html