Programming

A game of what’d they press? Hit testing in a 2d/3d layout

A simple tap of the screen invokes a staggering amount of calculations. That screen is actually big tree of UI elements, and we need to navigate it quickly to find the right one. Making matters worse are the various transformations, including 3D ones, that can be applied at each node. We might even have to traverse a viewport, which has a new camera entirely.

In this article I give an overview of how hit testing works in Fuse

Brute-force basics

I like starting with the simplest approach to understand the problem. In the case of hit-testing we scan through all of the elements on the display and check if the point is inside of them. All we need to know is where each element is located, and what part is actually hitable — the part that is supposed to react to the user touching it.

The layout system gives each element a position and size, called it’s placement, but this isn’t enough to figure out where it really is. There can be transformations applied, such as rotations and translations. For simplicity these transforms and the position are combined into a single local transform. This describes its position relative to its parent. We can construct a full local-to-world transform, but for the case of hit-testing we’ll actually work the other way.

At the root element the coordinates in the pointer event match the local coordinates. For each child we convert these coordinates to the child’s local coordinates. This is the opposite conversion that the local transform does, which converts from the child’s coordinates to the parent’s. No problem, we simply invert the matrix to go the other direction.

As with all things graphics in nature, linear algebra is one of our key tools. If we limited a framework to only translations, we could hide this: 2×2 matrices tend to become just inline equations. By adding rotations and skew, we force ourselves to 3×3 matrices. Add in 3D rotations, we we bump ourselves up to 4×4 matrices.

Each child repeats this process for its own children as well as check if there is a local hit. An element that draws something, like a Rectangle, checks if the coordinates are within that drawing. These are basic geometry tests, custom to each drawing.

The element which is actually hit must be the element the user actually sees. This is the one at the top of the drawing stack. In the true brute force approach we’d track the distance to an object and keep checking the rest of the nodes in the tree. The first optimization is to not do this. The children are sorted according to Z-order. If we start at the top-most ones first we can simply stop looking when we find the first hit.

A tidbit, for this to work it means a node must checks its children prior to its own visual. The children are on top of the local visual. Usually at least. Adjust accordingly.

Bounds optimization

There are several techniques to improve this approach. The first is the obvious one of not scanning children that aren’t visible. At any given moment most of the UI isn’t visible. Unfortunately there are two meanings of visible here. There is a strict Visibility flag which is easy to check:

1
2
3
4
public void HitTest(HitTestContext htc)
{
    if (!IsVisible) 
        return;

But we also have logical visibility: is the element visible by the user on the screen?

The logic here parallels that used by drawing: there is no point drawing something which is not on the screen. In drawing we check each child’s render bounds to the current drawing area. Originally we used the render bounds for hit testing as well, but as features evolved that became wrong, and potentially inefficient. Instead we have a parallel hit test bounds system.

The hit test bounds of a node are the bounds of it’s local visual, and the hit test bounds of all it’s children.While scanning the tree we check if the local pointer position is within these bounds. If it isn’t then we skip over the child entirely. This part is fast…

…calculating the bounds is slow. It needs to be the merged bounds for all of the descendents of the node. It’s not something we wish to calculate often so we cache it. Unless something changes we can keep using the cached version. Of course, with animation things can change quite often, but rarely more than just for a part of the UI tree.

Cached calculations are a major pain to deal with. It makes it vitally important for any code that might change the calculation to invalidate the cache. It’s easy to miss a spot, and also easy to invalidate too often. Programming language support for this is nowhere to be found. I think I’ll write an article dedicated to this topic.

Until 3D comes along

At some point we decided 3D transforms, like rotating back into the screen, would be a nice visual effect. This complicates hit testing. The crux of the problem is that a 2D point has no meaning in 3D space. We can’t simply take the mouse coordinates and transform them into local coordinates.

3D hit testing uses a world ray instead of a point. The pointer coordinates are instead interpreted as a ray from the the surface of the screen into the 3D world. If we were dealing with a lot of 3D objects an optimized algorithm would diverge from ours. But we may have only a couple of actual 3D objects, like a cube or sphere, and mostly we just have 3D transforms of flat objects. The panel may be rotated into the screen, but the panel itself is still flat.

The ray calculations are more expensive than the simple matrix multiplication required for flat objects. To avoid this calculation our bounds type, used for both hit test and rendering, tracks a 3d volume bounds. During hit testing we call the IsFlat function and can optimize which calculation to perform.

When we do a ray calculation we end up back in 2D space quite quickly. This is what that bit of code looks like now:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
float2 localPoint;
bool hit;
if (bounds.IsFlat && HitTestTransform == HitTestTransformMode.LocalPoint)
{
    if (!TryParentToLocal(htc.LocalPoint, out localPoint))
        return;
    hit = bounds.ContainsPoint(localPoint);
}
else
{
    //find intersection of ray with Z=0 plane
    var world = Viewport.PointToWorldRay(htc.WindowPoint);
    var local = Viewport.WorldToLocalRay(Viewport, world, this);
    localPoint = ViewportHelpers.LocalPlaneIntersection(local);

    hit = bounds.IsFlat ? bounds.ContainsPoint(localPoint) : bounds.IntersectsRay(local);
} 

The added HitTestTransform test allows the Viewport to force the ray calculation. It’s one of those 2D/3D bridge classes, so while the Viewport itself is flat, it’s contents may not be. This is a virtual function, thus this will be the Viewport type, and the Viewport virtual getter will also resolve to the Viewport class.

Happy tapping

So the next time you tap your mobile device, please pause, and give thanks. Without hundreds of years of linear algebra research, by individuals like Cayley and Gauss, we’d not be able to figure out where you tapped!

Categories: Programming, Use Case

Tagged as: , , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s