New Page 1

The Inner Product, February 2004
Jonathan Blow (jon@number-none.com)
Last updated 13 May 2004

Designing the Programming Language

"Lerp", Part 2

Related articles:
Predicate Logic (December 2003)
Lerp 1 (January 2004)
Lerp 3 (March 2004)

Two months ago I looked at predicate logic as a possible way to help simplify game programming. Last month I introduced a programming language called Lerp; the paradigm behind Lerp is a fusion between traditional imperative programming, and the declarative style of predicate logic.

Toward the end of last month's article, I implemented C-style structs as syntactic sugar over predicate logic databases. This means that the programming language has very good introspection support, and all sorts of interesting pattern-matching queries can be performed on fundamental data structures.

All the time, random people propose high-level programming features that sound cool at first but, when applied to real problems, are not actually helpful. (In fact this seems to be the norm for new language features). Not only have I proposed a feature unproven in the world of games, I have used it to replace struct, the traditional workhorse of the imperative language. Given all this, the onus is upon me to show that the database features are useful for nontrivial real-world problems.

A Simple Real-World App

I thought it would be fun to choose a program from an earlier installment of The Inner Product and port it from C++ to Lerp. After a bit of consideration, I chose the app from “Interactive Profiling, Part 1” (December 2002). The program is relatively simple; you have a standard mouselook and keyboard-walk interface for moving around the world; the ground is flat, and there are about 100 crates scattered around. The crates are rotated at different orientations, and there are a number of different textures on them. (Originally, the point of this program was to provide a simple world for a profiler to run on. To help keep things simple, since profiling is not the point of this exercise, in the port I reduced the profiling information to a traditional frame counter.)

This app has been a good crucible for the early design of the language. I started with the vague idea “I want to base a language on predicate logic” and selectively refined the language based on the demands made by concrete programs like this one – hopefully keeping myself grounded in reality.

Nature of the Test Program

One interesting aspect of this test program is that it draws all those crates at a very low level. It uses glVertex(), glTexCoord() and glColor() to render each crate at one vertex at a time. Lerp is intended to be a scripting language, and in a real game, you’d rarely use the scripting language to render at this low a level. Instead you would use a render_mesh() function, which would be implemented in fast and efficient C++ code.

In fact, this test app is so low-level that it even computes the lighting for the faces itself, by dynamically computing surface normals and dot-producting them with the light direction. If the program were trying to do less work, it could store surface normals for each entity and tell OpenGL to do the lighting itself.

As a result of all this, the program is actually much more stressful than necessary to meet our domain requirements. So if it can be made to run quickly, we’re in good shape.

Program Fundamentals

The test app uses a Vector3 class to represent points, and it does a bunch of math on those points, passing the results to the aforementioned OpenGL calls to render the world.

Naturally I need to implement a way to call OpenGL functions from Lerp. That’s straightforward enough, I can just code wrappers in C++ that are callable from Lerp.

I could also implement a Vector3 class in C++, manipulated via wrappers, but that would be a cop-out. The whole point of this program is to stress-test Lerp’s data structure handling mechanisms, and the Vector3 is the fundamental data structure of this program. So the Vector3 needs to be implemented using the database features.

Implementing Vector3

How to implement a Vector3 as a database? I could take the approach we tend to use in C++, which is just to have some named slots x, y, and z, with a floating-point value for each. But lately I am not very attached to these named coordinates, as they don’t generalize very well to higher dimensions – for the 4^th dimension, sure, we can use w, but beyond that the letters just get arbitrary. And if Lerp is going to be a high-level language, the code I write for Vector3 should just be re-usable for Vector4 or Vector11 or any other dimension. So, whereas I might introduce the ability to use names as aliases for the dimensions at some later date, for now I am just going to number them 1, 2, 3. This is in keeping with the Differential Forms and Clifford Algebra traditions of naming dimensions e₁, e₂, … except I am leaving off the e. Also, it introduces uniformity between the way we reference vectors and matrices (we use integer subscripts for matrices).

So a Vector3 will just be a database containing facts that tell me what the coordinates are. Each coordinate will be represented by one fact, perhaps like this: [‘coordinate 1 0.337], meaning “The value of coordinate 1 is 0.337”. But on second thought, every fact stored in a Vector is going to have ‘coordinate as the first entry, so we can just eliminate that redundancy. Hence the facts will look like [1 0.337] or [3 -1.562]. From now on I will also call facts “tuples”, since really what we are looking at here are tuples of arbitrary data elements.

For a Vector3 to be valid, the facts inside it should be two elements long, the first element being an integer, the second element being a floating-point number. (There’s another important constraint that we’ll talk about later). It would be nice to have some error checking, so the program can tell us when we make mistakes. So I created the ability to declare a database schema inside a struct declaration. The schema controls which facts can be stored in a database, and it provides error-checking as well as program documentation. For a Vector3 it looks like this:

struct Vector3 {
[?Integer ?Float];
}

The syntax for a schema uses square brackets just like a fact tuple would. This schema indicates that the first slot can be any Integer and the second can be any Float. But we ought to be more specific than this; the first coordinate of a Vector3 can’t be just any integer, it has to be between 1 and 3. So I added some new syntax to declare this in a compact way:

struct Vector3 {
[(1..3) ?Float];
}

At this point, it’s easy to see that some optimized version of Lerp could look at this declaration and know to store the Vector3 floating-point values compactly into an array. That’s good to keep in the back of our minds, but we won’t obsess on it now.

Since I added this concept of ‘..’ indicating a range of integers, I made it work in imperative expressions also. You can say each (1..n) {…} and the code will loop n times over the block. This is used in the example code to initialize the crates.

Syntactic Sugar for Vector3

Using the syntax introduced last month, we can manipulate this new Vector3 type. Supposing we have a Vector3 called v, we can find its x coordinate by evaluating v.[1 ??], in other words, “For the tuple with 1 in the first slot, what is the value in the second slot?” But what’s interesting – and very hard to do in C++ -- is we can also perform queries that go backwards from the usual direction. So we can say things like: each v.[?? 0.0], in other words, “Give me a list of which coordinates are zero.”

Still, the v.[1 ??] seems cumbersome, where in C we would just say v.x. Also, we need to think about initializing the vector. In C we can say v.x = foo. As of last month’s sample code, we used the operator ‘.+’ to add things to databases (and ‘.-’ to remove them); so to set coordinate 1 of the vector, we’d say v .+ [1 foo].

But we have a consistency problem here that can cause more tedium. Not only must each tuple of v be well-formed, but there should be only one tuple for each spatial coordinate. If v has two different entries with 1 in the first slot, we’re in trouble. So before saying v .+ [1 foo] above, we need to say v .- [1 ?], in other words, “Remove any tuple with 1 in the first slot”. If we don’t remember to do that, we create a glaring data inconsistency. That’s annoying and it encourages errors. Furthermore, there’s no way for the system to catch the errors, because the compiler doesn’t know about this constraint.

To help improve error checking, and to make programming nicer, I added the idea of a domain/range relationship to the struct schema declaration. The power of the database approach is that we can query the data in a freeform way, we’re not restricted; but often, as with the Vector3, we are modeling functions (not arbitrary relations). In these cases, one direction of query is “forward” and another direction is “backward”. We want to make it easy to talk about the “forward” direction, since that will be the most common case.

So in the schema, you can use the symbol ‘|’ to indicate a division in a tuple; everything to the left is the domain, everything to the right is the range. The definition of Vector3 becomes:

struct Vector3 {
[(1..3) | ?Float];
}

(Sometimes the domain of a tuple won’t be on the left-hand side. In those cases I provide an alternative syntax for declaring it; see the sample code if you’re interested.)

Now at least if we assert two tuples for the same coordinate, the system can signal an error. But I also added some syntactic sugar to imperative expressions, using C-style array subscripting syntax. If you say v[1] as an rvalue, that’s equivalent to the expression v.[1 ??] (since the compiler knows by looking at the schema that the domain of v is one element wide). If you use v[1] as an lvalue, as in v[1] = foo, the compiler removes from v any old tuple that matches [1 ?], before adding the new tuple [1 foo].

Using this nice brief syntax, we can define basic vector operations like addition, scalar multiplication and the dot product. Listing 1 shows two examples. I think they are pretty neat, because they are brief and they work on sparse vectors of arbitrary size! (The sparsity part requires some small language additions that are beyond the scope of this article; also notice that I have generalized in the Listing from Vector3 to Vector. Thinking about implementing these features is, for now, an exercise for the interested reader). I like that I can define a versatile vector class so simply and clearly.

Listing 1: Basic Vector operations written in Lerp.

proc dot_product(Vector a, Vector b) {
return + each a[?i] a[i] * b[i];
}

proc *(Vector v, Float factor) {
    Vector result;
    each v[?i] result[i] = v[i] * factor;
    return result;
}

The Matrix4

Writing a simple 3D app involves using some matrices as well as vectors, so let’s look at a matrix definition:

struct Matrix4 {
[(1..4) (1..4) | ?Float];
}

It’s very much like a vector, except the tuple is 3 elements wide because there are two indices. Now suppose I have some matrix m and I want to extract the first column vector from it. Nicely, the database query operation just provides the right answer for us – we don’t need wrapper functions or anything. We just evaluate m.[? 1 ?] and the result is a database consisting of length-2 tuples that fill in the question marks, in other words, the values from all tuples that have a 1 in the second slot. This has the same form as a Vector4 would, but for the purposes of typechecking, can the compiler know it’s supposed to be a Vector4? There’s a sense in which it can; if you write down the schema for this resulting database, it’s just the schema for Matrix4 without the second slot: [(1..4) | ?Float]. That matches the schema for Vector4, so if we allow anonymous databases to type-match based on their schemas, we achieve some pretty reasonable type-safety.

Here’s another trick: the trace of a matrix m is just + each m[?i ?i]. Note that we’re using the array subscript notation for brevity, like we did with vectors; because we’ve specified the same variable name for both index slots, the values in the two slots must be equal, so the ‘each’ iteration only travels along the diagonal of the matrix. (Math operators in Lerp automatically expand across ‘each’ just like function calls. For example, + each (1..10) evaluates to 55, * each (1..5) evaluates to 120.)

I get warm fuzzy feelings from this. These expressions are so short and simple that they probably don’t justify creating a wrapper function in many cases – when you want to get a column vector from m, you may not want to call some get_column_vector() function, when you can just say m.[? n ?]. This is interesting because it may allow compound math expressions to merge more organically than they might otherwise.

But How Does It Run?

When you try out this month’s sample code, you’ll find that it runs at a reasonable speed. On my 1.5GHz Pentium-M laptop, it clocks in at 30 frames per second. Keep in mind that the language system is almost entirely unoptimized – the bytecode is bloated, the function interfaces are slow, the code garbage collects every frame, and all the databases are still implemented as linked lists! Despite all this, the application runs at a smooth frame rate. If nothing else, this goes to show that computers are really fucking fast these days.

Conclusion

I’ve demonstrated that, despite its foundations in wacky language features, Lerp can accomplish real-world tasks. I encourage you to download the sample code and play with the interpreter yourself. Next month we’ll look at some all new pattern-matching language features.