Sometimes the simplest of things have complex answers. In programming we like to think of simple data types, but in reality there aren’t any. While it is nice to consider things as pure data, it is possible that such things don’t actually exist.
Here I’d like to demonstrate the complexity of simple things by exploring what seems to be a simple data type: a point. There is no conclusion drawn here, rather just a bunch of details. These results however are important when we consider what an object-oriented language should be providing in terms of data types — an article which I intend on writing later.
What is a Point?
Beyond perhaps a basic integer, or floating point number, a two dimensional Point is one of the simplest data structures we’ll encounter. The purpose of a point is to locate a single position on a surface. Its definition can be simply an x and y value specifying the coordinates on a plane. In C this might look like this:
[sourcecode]
struct Point
{
float x,y;
};
[/sourcecode]
We’ve already made a few assumptions, or simplifications. That “float” could easily be an integer value instead, which is common for work on a computer screen with integral pixel locations. Or perhaps it could be a higher precision “double” if we are working in a virtual space. In any case it still satisfies our notion of a point, just with a different level of precision.
I also mentioned a “plane”, which most of us assume to be a Cartesian plane. This is our typical x-axis goes right and y-axis goes up plane. Now, depending on what you’ve programmed before you might already disagree and say the y-axis goes down on the screen, not up. That is, the origin is the top-left corner and the y-axis points to the bottom. Both are used, in fact even more combinations are used. This tells us that our simple Point class, on its own, is not enough to identify a location, it is only valid in combination with a given coordinate space.
Why did we choose an x,y representation however? There is another common way to represent a point. We could specify it in polar coordinates: the distance from the origin and a radial position. Logically these are both Point data types, as they both indicate the same position on a surface. The following functions have the same meaning for both types of Points:
[sourcecode]
Point translate( Point pt, Point origin );
float distance_between( Point a, Point b );
Point rotate_around_origin( Point pt, float radians );
[/sourcecode]
The implementation varies depending on the point, but the meaning stays the same. This has implications for object oriented programming: should your point class represent a logical point, or a concrete data type? It’s easy for one point representation to expose itself as the other, and then you would only need one version of the above functions. Doing so would however not be very efficient, as there are different optimal forms for each point representation.
Beyond 3d
To move into three dimensions we can add a third component to our Point, a z-axis value. Now we can locate a position within a volume. The choice of axes and origins starts to become more relevant now. In the world of 3d you’ll find virtually every combination used.
Choosing the names x,y,z is common and historical, but perhaps not the best approach. How do we extend this convention beyond three dimensions? You could protest now that very few of us work with string theory, thus N-dimensional space is not interesting. However, 4d representations are actually quite common when working with 3d. Adding one dimension simplifies some math — if you pick up any graphics algorithm book you’ll notice they tend to use 4-value vectors and 4×4 matrices.
Wait, where did the “vector” come from? Well, points aren’t really anything more than an ordered collection of values, which is what a vector is. Looking at it this way is required to understand the math involved in rendering objects. It also gets us something more, we don’t have to worry about what letter comes after “z” in our naming scheme. Thus, in code, a point may be nothing more than a typedef for an array (a fixed size vector). The below are actually all common definitions you might see in source code:
[sourcecode]
typedef int[2] Point;
typedef float[3] Point;
typedef double[4] Point;
[/sourcecode]
Okay, you might not see the last one as often. Just to make things more interesting the 4-vector form often fixes the last value at 1. Since it is always 1 you don’t actually need to store it, so while your calculations do use all 4 values, your saved points may use only 3. This is a further difficulty in deciding what actually is a concrete, logical, or abstract data type.
Relative Points
If you’ve done any OpenGL, or 3d animation/cad work you’ll also have seen something else at this point. When describing an object, the points tend to be described relative to that object. For example, if we have a cube, the points on that cube are all described as though the origin is the center of that cube. Often it goes even further saying all objects are normalized to a size of one (meaning the corners fo the cube are +/-1 in each dimension). When we put the cube in a scene, we’ll then translate its origin an its size appropriately.
Take even a simpler example, the description of a rectangle. Logically we know what a rectangle is, but how do we represent that? There are a few common ways: have a point for the upper-left and lower-right corner; have a point for the upper-left corner and another set of values for the size, or have a point for the middle and a set of values for the size. That “set of values for the size” has a concrete data type equivalent to the point itself (it is an ordered collection of values, the same length as the point).
You should now be able to see that the point data on its own is almost meaningless. While our C-structure seems quite concrete, it doesn’t in fact let us locate a position as we had hoped. Nor does it come anywhere close to letting us render full objects (even basic ones like a rectangle). We need context. We need to assign external meaning to these values.
Alternate Points
In programming we like to avoid redundancy. This includes avoiding specialized types when they are not needed. To this end using a simple array for points could make more sense than a specialized class. Somehow it feels like points however have special operations, functions which aren’t needed on all vectors. We would like to think of our Point class as something special.
It turns out the Point is not special; most operations on a point have meaning on other logical data types. Consider first a complex number: a number with one value for its real and one for its imaginary component. This sounds very similar to the point in 2d and indeed visualizing a complex number as a location on a cartesian plane is common. For both cases there are common vector operations. For a point vector addition represents translation, whereas for a complex number it represent numeric addition.
[sourcecode]
/* Using distinct types */
Point translate( Point pt, Point off ) { return Point( pt.x + off.x, pt.y + off.y ); }
Complex add( Complex a, Complex b ) { return Complex( a.real + b.real, a.imag + b.imag; }
/* With an array */
array add( array a, array ) { return array( a[0] + b[0], a[1] + b[1] ); }
[/sourcecode]
It seems kind of silly to maintain distinct code for each type. While this example is quite trivial, the same pattern holds for more complex operations. Consider that the rotation of a Point is very similar to the multiplication of complex numbers (and the use of polar notation is also interesting again).
Let’s move to 3d though for another example. Think about a color, representing as a red, green, and blue component (as they often are). Now, suppose you wish to blend two colors. If you simply combine individual components you’ll have issues with luminosity. If you instead consider the color as a coordinate in a 3-space, like a 3d Point, you get a better solution: the blended color is merely the middle point between the two other points. If you start looking more into colors, you’ll actually notice that things like luminosity, hue, color-space conversion, and matching tend to be expressable as vector and matrix operations.
Where does this leave our Point data type? While logically we perform different operations on points, complex numbers, and colors, underneath they are all the same vector operations. Does this mean our Point class, and functions, are nothing more than different names for the vector operations?
To The Point
As I said in the beginning, I’m not striving for a conclusion here, simply interested in exploring the complexity involved in a very basic data type. We’ve seen here that different types can be represented by the same data structures (point and complex number). Functions can have the same meaning even if the representation changes (rotating a cartesian versus polar coordinate). Types from different domains can actually share implementations (colors and 3d points). We’ve also seen that our simple point class also requires a context to really mean anything at all.
While reading programming literature I often see terms like “concrete”, “logical” and “abstract” data type bandied about without clear consideration of what they actually mean. It seems that one’s perspective on the data could change the definitions. Are we considering the operations on the data, or the underlying binary representation? Does our result exist in a virtual space, or does it represent a concrete location on a device?
All these questions are perhaps not significant for one program. In the general realm of object oriented programming and language design they become very relevant. It’s not an easy topic, nor do I think we’re done exploring it; I’ll certainly come back to it. For now though, each time you code a new data structure, just think about our one question, “What is a point?”