Ideal Language

The Ideal Language has a way to format strings

Ultimately, any program must communicate with the outside world. Whether it is showing data to the user or speaking a text-based protocol, the need to format character data is ever present. All languages have some way of doing this, and in general they range from nearly useless to somewhat acceptable. The result is always some mix of core language features and standard library functions. There is unlikely to be an ideal solution that fits all needs; therefore, the ideal language must support many different options.

Use Cases

There are many different reasons to format data. A client program needs to display numbers on forms. A server needs to produce a JSON response. Web pages need to be written in HTML. A statistics package needs to write out data in CSV. Communication with the database needs to be done in SQL. Some programs even need to emit code in another language.

Even though that barely scratches the surface, we can already see some general themes emerging. In some cases precise control over the formatting of individual fields is required. Other cases require large templates in which bits and pieces will be replaced. Still other cases have structured data in memory that needs to be written in a fixed format. To repeat from before, it is highly unlikely that a single solution will meet all needs. Therefore it makes sense to create generic language features that allow libraries to provide specific solutions.

Implications

The C-style printf is a very popular option, and it always proves useful at some point. Its primary problems are a lack of type safety and serious issues with automatic type conversions: printf("%d", 2.5) compiles, yet its behavior is undefined, because the variable argument list carries no type information. A language must find some way to allow a type-safe printf. In the general case this means tying a format string to a variable argument list. However, this inevitably leads to letting custom code tell the compiler about additional parameter semantics, since surely nobody will agree that printf's built-in formatters are enough. This seemingly simple goal can be a large driver in the design of the language.
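
As an illustration only, here is a minimal sketch in Rust, whose standard format! macro is checked by the compiler against its arguments. The Money type and the invoice_line function are hypothetical names invented for this example. It shows both halves of the goal above: the format string is tied to the argument list at compile time, and custom code (the Display implementation) decides what a format flag such as the field width means for its own type.

```rust
use std::fmt;

/// Hypothetical user-defined type: a non-negative amount of money in cents.
pub struct Money {
    pub cents: u64,
}

impl fmt::Display for Money {
    // Custom formatting code: the type decides how it renders, and it can
    // read the caller's format flags (here, the field width) from `f`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let text = format!("${}.{:02}", self.cents / 100, self.cents % 100);
        match f.width() {
            // Money is always right-aligned in this sketch; any alignment
            // flag supplied by the caller is ignored.
            Some(w) => write!(f, "{:>w$}", text, w = w),
            None => f.write_str(&text),
        }
    }
}

/// Builds one fixed-width line of a hypothetical invoice.
pub fn invoice_line(item: &str, qty: u32, price: &Money) -> String {
    // The format string is checked at compile time: a missing or extra
    // argument, or an argument type that does not implement Display,
    // is rejected by the compiler instead of misbehaving at run time.
    format!("{:<8}|{:>3}|{:>8}", item, qty, price)
}
```

For example, invoice_line("Widget", 3, &Money { cents: 1999 }) produces "Widget  |  3|  $19.99". Dropping an argument from the format! call, or passing a type with no Display implementation, becomes a compile error, which is the guarantee C's printf cannot give.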

For large templated output, a significant problem is keeping track of the format of each string and applying the correct escaping. Functional-style notation is terrible for this type of output; either a template-based or a stream-based approach is better. For that to be type safe, the compiler must be aware of the structure of the document. For example, if an HTML attribute is expected at the current point in the stream, whatever value is passed must be converted to an HTML attribute, with the appropriate escaping. Furthermore, if the value cannot be converted, the compiler must report an error to satisfy static typing. Like the first goal, this is a very lofty one.
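
A small stream-style sketch of that idea, again in Rust and under loudly stated assumptions: HtmlWriter, AttrValue and escape_into are hypothetical names, and the writer only knows two contexts (attribute value and text content). Each method applies the escaping rules of the context it writes into, and the trait bound on the attribute value means an unsupported type is rejected at compile time rather than written out unescaped.

```rust
/// Escapes `s` into `out`. In attribute context double quotes are escaped
/// too, because the writer always wraps attribute values in "...".
fn escape_into(s: &str, out: &mut String, in_attr: bool) {
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' if in_attr => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
}

/// Types that may be written into an HTML attribute value. Passing any
/// other type to `HtmlWriter::start` is a compile-time error.
pub trait AttrValue {
    fn write_attr(&self, out: &mut String);
}

impl AttrValue for &str {
    fn write_attr(&self, out: &mut String) {
        escape_into(self, out, true);
    }
}

impl AttrValue for u32 {
    fn write_attr(&self, out: &mut String) {
        // Decimal digits never need escaping.
        out.push_str(&self.to_string());
    }
}

/// Hypothetical minimal writer: each method knows which HTML context it
/// writes into and applies that context's escaping rules.
pub struct HtmlWriter {
    out: String,
}

impl HtmlWriter {
    pub fn new() -> Self {
        HtmlWriter { out: String::new() }
    }

    /// Writes `<tag name="value">`. Tag and attribute names are
    /// `&'static str` (in practice, literals), not runtime data.
    pub fn start<V: AttrValue>(&mut self, tag: &'static str, name: &'static str, value: V) -> &mut Self {
        self.out.push('<');
        self.out.push_str(tag);
        self.out.push(' ');
        self.out.push_str(name);
        self.out.push_str("=\"");
        value.write_attr(&mut self.out);
        self.out.push_str("\">");
        self
    }

    /// Writes text content, escaped for that context.
    pub fn text(&mut self, s: &str) -> &mut Self {
        escape_into(s, &mut self.out, false);
        self
    }

    /// Writes the closing tag.
    pub fn end(&mut self, tag: &'static str) -> &mut Self {
        self.out.push_str("</");
        self.out.push_str(tag);
        self.out.push('>');
        self
    }

    pub fn into_string(self) -> String {
        self.out
    }
}

// w.start("img", "width", 1.5_f64);
// ^ would not compile: f64 does not implement AttrValue.
```

A real template system would need many more contexts (URLs, inline scripts, CSS), which is exactly why the compiler, or at least the type system, has to know where in the document each value lands.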

This second implication actually implies a further requirement: a kind of data subtyping. It isn't enough for a string to just be a string. Additional metadata about its character set and encoding is required. Traditionally this might be handled with generic types and specializations. That, however, quickly becomes onerous as the number of type combinations explodes. This is best left to another discussion.
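
To make the generic-types approach concrete, here is a minimal Rust sketch using phantom type parameters, assuming only two hypothetical formats (PlainText and Html); Tagged and emit_html are likewise invented names. The format is part of the string's type, so handing plain text to a function that expects HTML fails to compile.

```rust
use std::marker::PhantomData;

/// Hypothetical marker types recording what a string is fit for.
pub struct PlainText;
pub struct Html;

/// A string whose format is part of its type. The marker is zero-sized,
/// so it costs nothing at run time.
pub struct Tagged<F> {
    text: String,
    _format: PhantomData<F>,
}

impl<F> Tagged<F> {
    pub fn as_str(&self) -> &str {
        &self.text
    }
}

impl Tagged<PlainText> {
    pub fn new(s: &str) -> Self {
        Tagged { text: s.to_string(), _format: PhantomData }
    }

    /// Because the fields are private, outside this module escaping is
    /// the only way to turn plain text into HTML.
    pub fn to_html(&self) -> Tagged<Html> {
        let mut text = String::new();
        for c in self.text.chars() {
            match c {
                '&' => text.push_str("&amp;"),
                '<' => text.push_str("&lt;"),
                '>' => text.push_str("&gt;"),
                '"' => text.push_str("&quot;"),
                _ => text.push(c),
            }
        }
        Tagged { text, _format: PhantomData }
    }
}

/// Accepts only HTML-tagged strings; passing a Tagged<PlainText> is a
/// compile-time type error.
pub fn emit_html(out: &mut String, s: &Tagged<Html>) {
    out.push_str(s.as_str());
}
```

With two formats this stays manageable, but each new format or character set adds marker types and conversion functions between them, which is where the combinations start to multiply.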

3 replies

  1. In Ada you have concatenation, so there is no need for string formatting. An enumeration value can be printed as its name with the Image attribute, and numbers also have an Image attribute to convert them to strings.

    I think the string formatting from C is really unnecessary.

    Put_Line ("Something " & Integer'Image (-54) & " " & Fruit_Enum_Type'Image (Orange));

    Would display: Something -54 ORANGE.

    • While I do use concatenation sometimes, for any non-trivial situation a formatting function is preferable. I need the ability to specify how to format numbers, and if there are multiple fields it is nicer to see the whole string at once.

  2. The D programming language solves the trivial case (plugging arbitrary formatted types into a template string) in a type-safe way, using its ridiculously powerful metaprogramming capabilities to scan the format string at compile time and generate a template function that takes only the correct number and types of arguments.
