
Ultimately, any program must communicate with the outside world. Whether it is showing data to the user or speaking a text-based protocol, the need to format character data is ever present. All languages have some way of doing this, and in general they range from nearly useless to somewhat acceptable. It always ends up as some mix of core features and standard library functions. No single solution is likely to fit all needs, so the best approach is to support many different options.

Use Cases

There are many different reasons to format data. A client program needs to display numbers on forms. A server needs to produce a JSON response. Web pages need to be written in HTML. A statistics package needs to write out data in CSV. Communication with the database needs to be done in SQL. Some programs even need to emit code in another language.

Even though that barely scratches the surface, we can already see some general themes emerging. In some cases precise control over the formatting of individual fields is required. Other cases require large templates where bits and pieces will be replaced. And yet other cases have structured data in memory that needs to be written in a fixed format. To repeat from before, it is highly unlikely that a single solution will meet all needs. Therefore it makes sense to create generic language features that allow libraries to provide specific solutions.

Implications

The C-style printf is a very popular option. It always proves useful at some point. The primary problems with printf are its lack of type safety and serious issues with automatic type conversions. A language must find some way to allow a type-safe printf. In the general case this means tying a string format specifier to a variable argument list. This inevitably leads, however, to letting custom code tell the compiler about additional parameter semantics, since surely nobody will agree that the formatters in printf are enough. This seemingly simple goal can be a large driver in the design of the language.
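To make the idea of a type-safe printf concrete, here is a minimal sketch in C++ using variadic templates. The names safe_format, format_into and Point are invented for this illustration and do not come from any library. Every argument is written through its own operator<<, so a type with no formatter fails to compile, and a user-defined type opts in simply by supplying that operator. The sketch only checks the number of placeholders at run time; matching that at compile time as well is where the deeper language support comes in.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical user-defined type, used only to show the extension point.
struct Point { int x; int y; };

// Custom code teaches the formatter about Point by providing operator<<.
inline std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// Base case: no arguments left, so no "{}" placeholder may remain.
inline void format_into(std::ostringstream& out, const std::string& fmt,
                        std::size_t pos) {
    if (fmt.find("{}", pos) != std::string::npos)
        throw std::invalid_argument("more placeholders than arguments");
    out << fmt.substr(pos);
}

// Recursive case: fill the next "{}" with the first remaining argument.
// The argument's static type selects the operator<< overload, so no
// printf-style specifier can disagree with the value actually passed.
template <typename T, typename... Rest>
void format_into(std::ostringstream& out, const std::string& fmt,
                 std::size_t pos, const T& value, const Rest&... rest) {
    const std::size_t hole = fmt.find("{}", pos);
    if (hole == std::string::npos)
        throw std::invalid_argument("more arguments than placeholders");
    out << fmt.substr(pos, hole - pos) << value;
    format_into(out, fmt, hole + 2, rest...);
}

// Entry point: returns the formatted string.
template <typename... Args>
std::string safe_format(const std::string& fmt, const Args&... args) {
    std::ostringstream out;
    format_into(out, fmt, 0, args...);
    return out.str();
}
```

With these definitions, safe_format("{} + {} = {}", 1, 2, 3) produces "1 + 2 = 3" and safe_format("at {}", Point{3, 4}) produces "at (3, 4)", while passing a type that has no operator<< is rejected by the compiler.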

For large template output, a significant problem is keeping track of the format of each string and applying the correct escaping. Functional-style notation for this type of output is terrible; either a template or a stream-based approach is better. For that to be type safe, the compiler must be aware of the structure of the document. For example, if an HTML attribute is required at a given point in the stream, whatever value is passed will be converted to an HTML attribute. Furthermore, should that value not be convertible, it must produce a compile-time error to satisfy static typing needs. Like the first goal of formatting, this is a very lofty goal.
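A rough sketch of the stream-based idea, again in C++, might look like the following. HtmlWriter, HtmlAttr, to_html_attr and escape_html are all hypothetical names made up for this example. The position in the document is encoded only by which method is called, which is a much weaker guarantee than a compiler that truly understands the document structure, but it does show the two properties described above: values are converted and escaped for the slot they land in, and a value with no conversion is a compile-time error.

```cpp
#include <string>

// Escape the characters that are unsafe in HTML text and attribute values.
inline std::string escape_html(const std::string& raw) {
    std::string out;
    for (char c : raw) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// A value that is already safe to place inside a quoted attribute.
struct HtmlAttr { std::string escaped; };

// Catch-all: any type without an explicit conversion below is rejected
// at compile time rather than silently converted.
template <typename T>
HtmlAttr to_html_attr(const T&) = delete;

// The conversions this sketch chooses to allow (an assumption of the example).
inline HtmlAttr to_html_attr(const std::string& s) { return {escape_html(s)}; }
inline HtmlAttr to_html_attr(const char* s) { return {escape_html(s)}; }
inline HtmlAttr to_html_attr(int n) { return {std::to_string(n)}; }

// Stream-style writer: each method corresponds to a position in the document.
class HtmlWriter {
public:
    // Opens <tag name="value">, converting value for the attribute slot.
    template <typename T>
    HtmlWriter& open(const std::string& tag, const std::string& name,
                     const T& value) {
        out_ += "<" + tag + " " + name + "=\"" + to_html_attr(value).escaped + "\">";
        return *this;
    }
    // Writes escaped character data.
    HtmlWriter& text(const std::string& s) { out_ += escape_html(s); return *this; }
    // Writes </tag>.
    HtmlWriter& close(const std::string& tag) { out_ += "</" + tag + ">"; return *this; }
    const std::string& str() const { return out_; }
private:
    std::string out_;
};
```

Here a call such as w.open("td", "colspan", 2) compiles because an int conversion exists, whereas w.open("td", "colspan", 2.5) does not, since the only match for a double is the deleted catch-all.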

This second implication further implies a kind of data sub-typing requirement. It isn't enough for a string to just be a string. Additional metadata about its character set and encoding is required. Traditionally this might be handled with generic types and specializations. That, however, quickly becomes onerous as the number of type combinations explodes. This is best left to another discussion.
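For reference, the traditional generic-type approach could be sketched in C++ roughly as below. EncodedString, the Utf8 and Latin1 tags, and the two functions are assumptions of this example, not an existing API. The encoding lives in the type, so handing a Latin-1 string to a function that expects UTF-8 does not compile until it has been transcoded.

```cpp
#include <cstddef>
#include <string>

// Tag types naming an encoding; they carry no data.
struct Utf8 {};
struct Latin1 {};

// A byte string whose encoding is part of its static type.
template <typename Encoding>
struct EncodedString {
    std::string bytes;
};

// Transcode Latin-1 to UTF-8. Every Latin-1 byte is the code point
// U+0000..U+00FF, which needs at most two bytes in UTF-8.
inline EncodedString<Utf8> to_utf8(const EncodedString<Latin1>& in) {
    EncodedString<Utf8> out;
    for (unsigned char c : in.bytes) {
        if (c < 0x80) {
            out.bytes += static_cast<char>(c);
        } else {
            out.bytes += static_cast<char>(0xC0 | (c >> 6));
            out.bytes += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Accepts UTF-8 only; an EncodedString<Latin1> argument is a type error.
inline std::size_t utf8_byte_length(const EncodedString<Utf8>& s) {
    return s.bytes.size();
}
```

The explosion mentioned above shows up as soon as a second property, such as an escaping state, is added as another template parameter: every function then needs a version for each combination of encoding and escaping it supports.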
