Defective Language

A Failed Experiment with Python Type Annotations

I like Python, but wish it had static typing. The added safety would go a long way to improving quality and reducing development time. So today I tried to make use of type annotations and a static type-checker called mypy.

After a few basic tests, I was excited. But my glee turned to disappointment rather quickly. There are two fundamental issues that make it an unusable solution.

  • You can’t have self-referential classes in the type annotations, thus no containers
  • You can’t have inferred return type values, thus requiring extensive wasteful annotations

Let’s look at both problems.

Self-Referential

I’m trying to render some articles for Interview.Codes with my MDL processor. The parser uses a Node class to create a parse tree. This class contains Node children as part of a tree structure. Logically, that means I’d have functions like below.

from typing import Sequence

class Node(object):

    def add_sub( self, sub : Node ):
        ...

    def get_subs( self ) -> Sequence[Node]:
        ...

mypy has no trouble understanding this, but it’s unfortunately not valid Python code. You can’t refer to Node within the Node class.

The workaround suggested is using a TypeVar.

from typing import Sequence, TypeVar

NodeT = TypeVar( 'NodeT', bound='Node' )
class Node(object):
    def add_sub( self, sub : NodeT ):
        ...

    def get_subs( self ) -> Sequence[NodeT]:
        ...

This is ugly. I’m reminded of C++’s _t pattern. Part of my attraction to Python is the simplified syntax. Having to decorate classes like this makes it far less appealing. Plus, it’s boiler-plate code adding overhead for later understanding.

The limitation in Python comes from Node not yet being in the symbol table. It doesn’t make it into the symbol table until after the class is processed, meaning you can’t use Node within the class. This is a limitation of the compiler. There’s no reason this needs to be this way, except perhaps for backwards compatibility with screwy old code.
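The name-binding order described above can be seen in a minimal sketch. It uses a plain class-body lookup rather than an annotation, and the `alias` attribute is purely illustrative. On interpreters that evaluate annotations eagerly, an unquoted `sub : Node` annotation triggers the same lookup and fails the same way.

```python
failure = None

try:
    class Node:
        # The class body runs before the name Node is bound,
        # so looking the name up here fails.
        alias = Node
except NameError as err:
    failure = err

print(type(failure).__name__, failure)
```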

Perhaps we can’t use the class name. But we could have a Self or Class symbol that refers to the enclosing class.

No Inferred Return Types

One of the great values of Python is not having to put types everywhere. You can write functions like below.

def get_value():
    return 123

Now, if you’re using TypeScript or C++ the compiler can happily infer the return type of functions. For unknown reasons, mypy chooses not to infer the return types of functions. Instead, if there is no type annotation it assumes the function returns type Any.

This means I must annotate all functions with information the static type checker already knows. It’s redundant and messy.

You’re additionally forced to learn the names and structure of all types. Ones you could otherwise safely ignore.

def get_iter():
    return iter(sequence)

def get_closure(self):
    return lambda q : self.op(q)

Why should I have to know the type that iter returns to write this function? Or do you have any idea what type get_closure returns? I know how to use the return, and can even reason it’s a function, but I’d have no idea how to specify its type. Knowing the myriad of types isn’t feasible. You’ll end up spending more time trying to tweak types than using the code.

This complexity helped drive the introduction of the auto keyword to C++. There are many situations where writing the type information isn’t workable. This is especially true when dealing with parametric container classes.

Inferring return types is an essential feature.

Avoiding it for now

These two problems repeat throughout my codebase. I’m okay when there’s a limitation that occasionally affects the code, but this is fundamental. To use type checking, I’d have to add the redundant class declarations to every container-like class. To use type checking at all, I’d have to annotate the return value of all functions.

Static type checking should not be a tradeoff and there’s no fundamental reason these limitations can’t be lifted. When these are fixed, I’ll happily come back and use type annotations.


Image Credit: Mari Carmen

24 replies »

  1. Just use ‘Node’ as annotation. And of course, you don’t have to annotate self.

    For your second problem, of course those trivial oneliners are easy to infer. But in most real life functions, it’s much harder. Note that Python doesn’t have disjoint types, it has a class hierarchy. Many times it’s not even clear to humans what the narrowest type is. (Even C++, in which it is enormously easier, had auto many years before it had return type inference.)

    • I don’t understand what you mean by saying to use `Node` as annotation. I was unable to refer to it inside the `Node` class.

      I’m okay if not all cases can be inferred. But I don’t see it as an unsolvable problem. It ends up becoming a big disjoint type. There’s nothing wrong with that as the inferred type. I can specify the type in complex cases, it’s only the basic cases that bother me.

    • @mortoray, @veky meant not `Node`, but `’Node’`, (he omitted backticks to show code and meant the given quotes literally).

      This way it is just a string, so Python doesn’t fall over it, but mypy and other tooling evaluates it lazily. No need for the `TypeVar`, I haven’t seen that suggested before but I have seen and used the quotes ‘trick’.
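A minimal sketch of the quoted form described in this reply, applied to the article's `Node` class. The `_subs` list and the method bodies are assumptions added only to make the example runnable.

```python
from typing import List, Sequence


class Node:
    def __init__(self) -> None:
        self._subs: List["Node"] = []

    # "Node" is an ordinary string at runtime; a type checker resolves it later.
    def add_sub(self, sub: "Node") -> None:
        self._subs.append(sub)

    def get_subs(self) -> Sequence["Node"]:
        return self._subs


root = Node()
child = Node()
root.add_sub(child)
```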

    • I’ve had a few cases where it was difficult to pin down a type – you can leave those unhinted and let mypy just check what you do specify.

      Also, and I hope this doesn’t come across rude – if the problem is simply and efficiently solvable – why not solve it and contribute a PR? Python can always do with more contributors :)

  2. “The limitation in Python comes from Node not yet being in the symbol table”

    This would mean Python does not allow recursive structures which is unlikely.

    • Python is dynamic, meaning it can do recursive structures without needing to refer to the symbols involved.

      The limitation appears because the annotations now need to refer to the symbols. So yes, the type annotations do not allow recursion.

  3. I was thinking in that direction and at this stage I am considering super-classing the type system by using meta classes and kind of a messaging/callback mechanics. It is a shitty approach to the issue since you end up reflecting the entire programming concept of python backwards, but I can’t think of other options. Building more complicated abstractions in python is hell to maintain, especially when you are rushing a project and lack the luxury to take a week and rethink the diagram so that the new feature clicks in without drilling holes.

    At this stage it is just an idea, since I have a couple of projects pending in the queue, but I will definitely take a shot at it.

    • Do you mean you’d override the calling mechanism of Python? I’ve seen these types of super abstractions done in C++ often… they usually fail. Not saying yours will, but the more one fights the language, the more troublesome it becomes. :/

      I’m still hoping I can get the annotations working to some degree. I like Python, but this lack of typing makes it unsuitable to anything over a few thousand lines of code.

      Rethinking designs is hell, since without that type safety every change becomes a chore of hunting down the affected code pieces.

    • Yes, I was pointed to that from another thread and I’ll be checking it out. Solving the self-referential types would be a significant step. It was the more problematic of the two issues.

    • In a way yes. My particular case is a web back-end cms framework, so what I actually need is closer to a stream processing on top of a database and not a classical standalone application. I am already overriding most of the calling mechanics by introducing handler classes that serve as endpoints for the post calls coming from the front end.

      I am, however, interested in the reasons why such solutions fail (apart from the obvious complexity overhead during both development and maintenance). I am also interested in the reasons behind such an approach in C++. The C language family comes with a robust type management system which, along with the scope controls, leaves you with a pretty smart compiler (unless you decide to abuse pointers :) ). There is the entire architecture approach that is somewhat similar, but it focuses more on the transaction side of the application models, compared to the ‘language-in-a-sandbox’ model that is expanded upon in all modern scripting/prototyping languages.

  4. What would be really good is if you submit some patches to help move things in the right direction. No programming language is everything to everyone but with help from people that are as passionate as you a programming language can go a long way. Python needs people with your passion to contribute features to help improve its functionality.

  5. from __future__ import annotations

    Now you can self reference classes. It’ll be a standard feature in python 4.0.

  6. Self-reference is easy. If you are using Python 3.7, you can ‘from __future__ import annotations’, and Python will perform delayed analysis of the annotations. If you aren’t using Python 3.7, you can put quotes around the class name.
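A short sketch of the `from __future__ import annotations` route mentioned here, assuming Python 3.7 or later. The import has to be the first statement in the module, and the method bodies and the `_subs` attribute are illustrative additions.

```python
from __future__ import annotations

from typing import List, Sequence


class Node:
    def __init__(self) -> None:
        self._subs: List[Node] = []

    # With the future import, annotations are stored as strings and not
    # evaluated while the class body runs, so the bare name Node is accepted.
    def add_sub(self, sub: Node) -> None:
        self._subs.append(sub)

    def get_subs(self) -> Sequence[Node]:
        return self._subs


root = Node()
leaf = Node()
root.add_sub(leaf)
```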

  7. Everything in this post is wrong.

    One, as an above poster mentioned you can stringify the type name. Ie make it ‘Node’ not Node. This works fine in 3.6 and above. This will become the default in python 3.8.

    Also you’re full of crap about C++. You cannot use auto to elide return types.. those must be made explicitly.

    Auto is for type declarations only.

  8. You can have self-referential types; I do it all the time. Just enclose the type in quotes. Works in PyCharm.

  9. Other commenters have already pointed you to the mypy docs that describe your forward reference problem and how to fix it.

    Implicit return types have been proposed before and will probably be implemented at some point behind a flag, but your argument in favor of the feature because no one could be expected to know the correct return types themselves doesn’t make sense to me. Have you read the docs for the typing module? Without knowing anything else about your code, I can minimally guess that get_iter() returns an Iterator or an Iterable, and get_closure() returns a Callable.

    As an aside, your comment “I like Python, but this lack of typing makes it unsuitable to anything over a few thousand lines of code” is a pretty wild claim considering the existence of websites like Reddit and Instagram, and applications like Dropbox.
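One way the two functions from the article could be annotated along the lines this comment suggests. This is only a sketch: the `sequence` list, the `Worker` class, and the `int` element types are assumptions, since the article does not show them.

```python
from typing import Callable, Iterator, List

# Hypothetical stand-in for the article's `sequence`.
sequence: List[int] = [1, 2, 3]


def get_iter() -> Iterator[int]:
    return iter(sequence)


class Worker:  # hypothetical owner of op() and get_closure()
    def op(self, q: int) -> int:
        return q * 2

    def get_closure(self) -> Callable[[int], int]:
        # A callable taking one int and returning an int.
        return lambda q: self.op(q)
```

The annotations name the protocol the caller relies on (something iterable once, something callable) rather than the concrete type `iter` or `lambda` happens to produce.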

  10. Sorry if this has been said before, but you can reference Node inside Node in function annotations starting with python3.8. They did exactly what you suggested and changed the class evaluation such that it now is possible.
    If you want to use this feature in older versions of python use
    ‘from __future__ import annotations’

  11. Check out nim! Based on your complaints, it sounds like a language that would be beneficial to your programming habits and it plays well with C/C++ code.

  12. For the first problem, put quotes around the type.

    For the second, just declare the return type. How is that unusable? Python uses duck typing, so how would the type checker know which protocol you’re intending to return?
