My previous article explains how I draw a rounded rectangle using a shader instead of tessellation. The biggest remaining issue is that it takes eight draw calls; one draw call would be much nicer. Here I show how I merge all those draw calls into one and upload a minimal amount of data to the GPU.

### Linear parameterization

The first step isn’t directly about combining calls, but it does make combining easier. I want to avoid uploading the vertex data to the GPU each time I draw a rectangle. The previous article shows how to calculate all of the vertices required for a rounded rectangle. Obviously those move around depending on the radius of each corner and the size of the rectangle; the vertex set can’t just be static.

Looking at the calculations I realize they are all fairly straightforward linear combinations of the input values. There are only 9 variables that control the shape of the rectangle:

- CornerRadius 0…3
- Size X,Y
- Extend X,Y
- Mn = Min(Size.X/2, Size.Y/2)
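As a sketch, these nine values can be packed into a single array; here is a hypothetical Python rendition (the function name and example numbers are illustrative, not the article’s actual code):

```python
# Pack the nine shape parameters into a single "Input" vector.
# The order matters: every per-vertex coefficient vector assumes it.
def make_input(corner_radius, size, extend):
    # corner_radius: (r0, r1, r2, r3); size: (w, h); extend: (ex, ey)
    mn = min(size[0] / 2, size[1] / 2)  # Mn = Min(Size.X/2, Size.Y/2)
    return [*corner_radius, *size, *extend, mn]

inp = make_input((4, 4, 8, 8), (100, 60), (2, 2))
# inp == [4, 4, 8, 8, 100, 60, 2, 2, 30.0]
```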

I didn’t show the `Extend` variable in my last article. This is an additional size added to the rectangle which is not part of the rectangle itself. It’s used to expand the edges of the triangles to extend over the region where a stroke will be drawn.

The odd one here is probably `Mn`. This is a calculation derived from the `Size` variables and could be done on the GPU. It is however good to avoid conditionals whenever possible, though `Min` is likely a cheap conditional operation. More importantly, it makes the next calculations easier.

### Dot product

Every vertex is just a simple linear combination of that input vector. Let the input vector be called `Input`; it contains the list of values above in order. We can take the dot product of that vector with another vector to get the value of each vertex component.

For example, consider the vertex at `CornerRadius[0], Size.Y + Extend.Y`. `CornerRadius[0]` can be expressed instead as `Input · [1,0,0,0, 0,0, 0,0, 0]`. The second expression is `Input · [0,0,0,0, 0,1, 0,1, 0]`.
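As a quick sketch, here is that dot-product encoding in Python (the `Input` values are made-up example numbers, not from the article):

```python
# Input layout: [CR0, CR1, CR2, CR3, SizeX, SizeY, ExtendX, ExtendY, Mn]
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

Input = [4, 4, 8, 8, 100, 60, 2, 2, 30]

# Vertex at (CornerRadius[0], Size.Y + Extend.Y):
x = dot(Input, [1, 0, 0, 0, 0, 0, 0, 0, 0])  # selects CornerRadius[0]
y = dot(Input, [0, 0, 0, 0, 0, 1, 0, 1, 0])  # selects Size.Y + Extend.Y
# (x, y) == (4, 62)
```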

Expressing each vertex with an array of numbers like that would be quite awful from a code perspective. The next guy looking at my code would hate me: it’d be unreadable. Instead I tried to define this more procedurally. I defined each component of the input vector on its own first:

```
...
var CornerRadius3 = new float[]{ 0,0,0,1, 0,0, 0,0, 0 };
var SizeX = new float[]{ 0,0,0,0, 1,0, 0,0, 0 };
var SizeY = new float[]{ 0,0,0,0, 0,1, 0,0, 0 };
var ExtendX = new float[]{ 0,0,0,0, 0,0, 1,0, 0 };
...
```

Then I defined a few simple vector operations on those arrays: `sub`, `add` and `neg`. That same vertex from before, `CornerRadius[0], Size.Y + Extend.Y`, can now be expressed as `CornerRadius0, add(SizeY, ExtendY)`. That’s a lot easier to read than a bunch of 0’s and 1’s. Here’s a sample of several vertex definitions:

```
sub( SizeX, CornerRadius1 ), sub( SizeY, CornerRadius1 ),
add( SizeX, ExtendX ), sub( SizeY, CornerRadius1 ),
Mn, sub( SizeY, Mn ),
sub( SizeX, Mn ), sub( SizeY, Mn ),
```

This is one place where a language that supports custom operators would be helpful: I’d be able to retain the exact syntax from before and work on custom types instead. Nevertheless, I find the above solution quite readable.
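For illustration, those helpers can be sketched minimally in Python; the names follow the text but the implementation here is an assumption, not the article’s actual code:

```python
# Coefficient vectors over the 9-element Input layout:
# [CR0, CR1, CR2, CR3, SizeX, SizeY, ExtendX, ExtendY, Mn]
CornerRadius0 = [1, 0, 0, 0, 0, 0, 0, 0, 0]
CornerRadius1 = [0, 1, 0, 0, 0, 0, 0, 0, 0]
SizeX = [0, 0, 0, 0, 1, 0, 0, 0, 0]
SizeY = [0, 0, 0, 0, 0, 1, 0, 0, 0]
ExtendY = [0, 0, 0, 0, 0, 0, 0, 1, 0]

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]
def neg(a):    return [-x for x in a]

# The vertex (CornerRadius[0], Size.Y + Extend.Y) as coefficient vectors:
vx = CornerRadius0
vy = add(SizeY, ExtendY)  # == [0,0,0,0, 0,1, 0,1, 0]
```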

This is a one-time setup step. All these vertices are then stored in a buffer on the GPU; they are constant data. When I draw a rectangle I use them as vertex attributes, specify the `Input` as uniforms, and use a dot product to get each vertex.

### Distance function

I need all the distance functions to have the same form. In the multiple draw approach the corners had one function, and each side had its own specialized function. They must all be merged into a single form. I’m using the circle form as the basis:

```
distance_to_edge = vector.length( pixel_position, circle_center ) - radius
```

This can be generalized to refer to an arbitrary edge:

```
distance_to_edge = vector.length( pixel_position, edge_position ) - edge_offset
```

Where `edge_position` is a vertex attribute that specifies where an “edge” of this triangle is, and `edge_offset` is a value that says how far away this “edge” is from the real shape edge. It is an inverted calculation.

For a corner triangle the `edge_position` is the same for all vertices: it is the single position that defines the center of the circle. For the sides it is the line through the center of the rectangle that defines the edge.

For each side vertex I provide the horizontal position along that line in the vertex attributes. The GPU will interpolate along that line for each point in the triangle.
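As an illustration, the unified form can be evaluated on the CPU like this (a Python sketch of what the fragment shader computes; all the concrete numbers are made up):

```python
import math

def distance_to_edge(pixel_position, edge_position, edge_offset):
    # vector.length(p, q): distance between the two points
    dx = pixel_position[0] - edge_position[0]
    dy = pixel_position[1] - edge_position[1]
    return math.hypot(dx, dy) - edge_offset

# Corner: edge_position is the circle center, edge_offset the corner radius.
d_corner = distance_to_edge((1, 4), (4, 4), 4)  # -1: one pixel inside

# Side: edge_position is interpolated along the line through the rectangle's
# center; edge_offset is the distance from that line to the real edge.
d_side = distance_to_edge((10, 58), (10, 30), 30)  # -2: two pixels inside
```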

All the distance calculations now have the same form, so they can be combined into a single shader program. The actual values are of course distinct per rectangle. Again I can express them as linear combinations of the same `Input` uniforms I used before. I create a second vertex buffer for these (this is just easier to manage from code; obviously all attributes could be combined into one larger buffer).

### One program

All that together results in a single draw call with a unified shader program. And it only requires sending that small set of uniforms, `Input`, for each rectangle. The vertex buffers are a one-time initialization.

I’ll do one more follow-up article looking at a few niceties that emerge from my rectangle drawing: anti-aliasing and edge normals.

Categories: Programming, Use Case

Hi– I’m fascinated by your elegant approach to drawing rounded rectangles. I’m a relative newbie to OpenGL and have only used tessellation to do this so I’m a bit lost on how to write the shader program for this. Any chance you’d share your shader code?

The code is part of the Fuse product. It has a very compact and odd form that’d be hard to present in a useful fashion; the article is my best attempt to describe what happens.

I was stumped on how to find “pixel_position” as you call it. One approach is to use gl_FragCoord in the shader, but it gives coordinates in window space which causes messy calculations for any kind of transformation. I finally found a nice solution– use texture coordinates with a UV buffer on quads with the typical 0 to 1 values at the corners. Same setup as if you were going to texture a quad with an image, but you skip the bitmap & sampling stuff. That eliminates the need to do all that complex breaking up of the rectangular parts of the rounded rectangle to get a distance. Instead you just query the components of the vec2 UV value in the shader to know how close to an edge you are. Works great in the corners as well since you are always working with a unit circle regardless of the actual size of the rounded rectangle you are creating. Also, in contrast to gl_FragCoord, the coordinates are local which simplifies things quite a bit.

Sorry, I guess I didn’t explain too well where pixel_position comes from — I don’t use gl_FragCoord. The rectangle vertices are expressed in virtual coordinates and I have the density of the output buffer. I just multiply to get the actual pixel position. Your solution is likely similar to this now.

I’m not sure how you avoid the breakup, since you need to at least split the calculation for the corners and the straight segments (or perhaps you have an application where you don’t, but I need to draw strokes everywhere). Mine is also complicated by allowing a distinct corner radius for each corner. My breakdown minimizes the logic in the shader, avoiding any costly conditionals.

Yeah I’m curious how each approach would benchmark against each other. I took the idea to its extreme conclusion by using just 2 triangles in a single quad with all the logic in the shader. I first determine if the fragment falls within one of the four corner quadrants, and, if so, then test its distance against that corner’s “center” point that I precalculate and upload to the shader as a uniform. I’m just using a solid color fill, but strokes/borders can be added by just adding more bounds checking. I know best practices say to limit/eliminate conditionals in shaders, but I wonder if there is much performance impact for such a simple object as a rounded rectangle? I’ll post again if I notice problems. At any rate, using a vector/distance approach at the pixel level to render circular corners makes a ton more sense to me than using a bunch of vertices in a triangle fan like I saw so many suggest before I ran across your elegant solution.

Conditionals in the pixel shader are quite costly. On some of the mobile devices we’ve tested the addition or removal of a single branch can change the speed quite significantly (it depends on their particular calculation pipeline). Some devices seem to prefer more triangles as well. From my testing on my game it appears that, with the pixel count remaining about equal, all counts less than 400 triangles perform roughly the same on many devices. So it makes sense to move calculations into the vertex shader instead.

Thanks so much for sharing this excellent idea for rounded rectangles. I was looking for something like this to avoid passing all the tessellated vertices to the GPU, but I wasn’t sure how to approach it. Your article has been super helpful.

Similar to Chad’s question, I’m wondering…if all you’re passing to the vertex shader is the 9 inputs you described, how do you specify the transformation matrix per vertex? I’m very new to OpenGL so I may be missing something. My idea was just to add an additional uniform matrix that has the mvp calculated by the cpu and then multiply each vertex by that in the vertex shader.

I also pass a view-projection matrix to the shader, just like you’d use in any other draw call. That combined with the uniforms is enough to create the vertices for the rectangle.

Thanks! Also, I didn’t understand your reply to Chad about the pixel_position calculation. I am aware of gl_FragCoord and interpolating a vec2 from the vertex shader as two ways to know the pixel position of each fragment in the fragment shader, but is there some other way? In particular, I didn’t understand when you said “I have the density of the output buffer”. Could you explain in more detail?

When I calculate the size/position of the rectangle I’m able to do this in pixel coordinates: my vertices will align perfectly with the pixels on screen. The actual vertex values are those pixel coordinates. It is the view projection that can convert these values into the clip space values.

When the GPU interpolates my vertex values it gives me the pixel location directly. This is because it interpolates the value per pixel on the screen, and my vertex values are expressed in pixel coordinates. Thus I don’t need gl_FragCoord, since the interpolation already produces the pixel value.
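What this amounts to can be sketched as follows (hypothetical Python; an orthographic view-projection mapping pixel coordinates to clip space):

```python
def pixel_to_clip(p, viewport_w, viewport_h):
    # Equivalent to an orthographic view-projection for pixel-space vertices:
    # x: [0, w] -> [-1, 1]; y: [0, h] -> [1, -1] (flipped so y grows downward)
    x = p[0] / viewport_w * 2 - 1
    y = 1 - p[1] / viewport_h * 2
    return (x, y)

# The vertex values themselves stay in pixel coordinates, so their
# per-fragment interpolation directly yields pixel_position.
clip = pixel_to_clip((400, 300), 800, 600)  # center of an 800x600 buffer
# clip == (0.0, 0.0)
```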