Am I, as a programmer, in danger of being obsoleted by a program of my own creation? A recent BBC article gives programmers an 8% chance of being automated, whereas an NPR guide puts it at 48%. This is quite a contrast. The second number is so shocking that I felt the need to dig a bit deeper.
The first bit of hilarity
Making predictions about the future of an occupation is highly subjective. There are a lot of variables to consider, and a lot of possibilities of what automation entails. It’s not surprising that two studies would end up with very different results.
Except both of these articles are based on the same research by Frey and Osborne!
The source material lists “Computer Programmers” as job #293 with a probability of 0.48 of being automated. It also lists “Software Developers, Applications” as job #130 with a probability of 0.042 and “Software Developers, Systems Software” as #181 with a probability of 0.13. Just for fun there’s also #208 which includes “Web Developers” and has a 0.21 probability.
At this point alone I think one is justified in being suspicious of the source material. It’s hard to imagine how those positions are differentiated to the point of having such a huge variance in automation probability.
It’s not clear how the BBC has derived their value of 8.5% for programming. The only job listed with that value is #164, “Fitness Trainers and Aerobics Instructors”. Perhaps they grouped together several listings instead, or did their own analysis on UK data?
The second nonsense
Looking through the results, a couple of other jobs catch my eye: #109 “Network and Computer Systems Administrators” and #110 “Database Administrators”, both with a value of 0.03! That’s a lot lower than programmers; indeed, those are some of the lowest values in the whole study.
This doesn’t make any sense though. It only takes a casual glance at Amazon Web Services, Azure, or any other platform-as-a-service provider to note these two jobs have already been highly automated. With the push of a button I can deploy a Ruby or Python app on Heroku with zero knowledge of how networks or databases even work.
What went wrong?
These few spot checks point to something being wrong. Reading the study, it appears the researchers used a form of machine learning (ML) to categorize jobs. The form of ML they used requires training data: they mark several jobs as automatable or not, and then let the algorithm determine the rest.
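To make that workflow concrete, here is a toy sketch of the same idea: represent each job as a vector of numeric facts, hand-label a few of them, and let a classifier score the rest. This is not the study’s actual method (they used a far more sophisticated probabilistic classifier); the features, values, and labels below are all invented for illustration, and the classifier is a simple nearest-neighbour vote.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Invented feature vectors: (repetitiveness, social interaction, creativity),
# each scaled 0..1. These numbers are illustrative, not from O*Net.
labeled = {
    "Cooks, Fast Food":        ((0.9, 0.3, 0.1), 1),   # hand-labeled: automatable
    "Chefs and Head Cooks":    ((0.4, 0.6, 0.8), 0),   # hand-labeled: not automatable
    "Telemarketers":           ((0.95, 0.5, 0.05), 1),
    "Surgeons":                ((0.2, 0.8, 0.7), 0),
}

unlabeled = {
    "Computer Programmers":      (0.6, 0.3, 0.7),
    "Database Administrators":   (0.7, 0.2, 0.4),
}

def automation_score(features, k=3):
    """Fraction of the k nearest hand-labeled jobs that were marked automatable."""
    nearest = sorted(labeled.values(), key=lambda fl: dist(features, fl[0]))[:k]
    return sum(label for _, label in nearest) / k

for job, features in unlabeled.items():
    print(f"{job}: {automation_score(features):.2f}")
```

The point of the sketch is structural: everything downstream hinges on those few hand-assigned 0/1 labels, which is exactly where I’ll poke at the study below.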
The source of the data is O*Net, an online service from the Department of Labor in the US. Though the database provides a lot of detail about a great many jobs, it was not intended for this type of analysis, something pointed out by the researchers themselves. I did check a few of the jobs though, and the descriptions and “facts” in the database are reasonable.
A questionable step was the researchers’ decision to use only a few of the details from the DB. They hand-identified certain facts that determine how automatable a job is and appear to have used only those in their analysis. To me this seems a bit odd, and defeats the purpose of machine learning: if we already know which details determine automation, why exactly are we doing the analysis? It seems like it’d make more sense to use all of the details in the DB and let ML figure out which ones are pertinent.
In any case, to train the ML algorithm several of the jobs must be marked as either automatable or not. In the researchers’ own words: “Our label assignments were based on eyeballing the O∗NET tasks and job description of each occupation.” So I decided to check a few of their labels:
- “Chefs and Head Cooks” are marked as 0 (not automatable) whereas “Cooks, Fast Food” are marked as 1 (automatable). This seems highly questionable. I understand the roles tend to differ in a restaurant, but to say one can absolutely be automated and the other absolutely cannot doesn’t seem valid.
- “Waiters and Waitresses” are assigned a 0 (not automatable). Given the existence of food automats, and indeed several automated restaurants, this also seems like an invalid assignment.
- “Technical Writers” is given a surprising value of 1 (automatable). This seems doubtful, since it would require a computer to understand the profession it is documenting. It would also have to form coherent sentences, which, judging by the current state of machine translation, is a long way off.
- “Concierges” are assigned 0 (not automatable), but I have trouble identifying even one task they do that hasn’t already been put on a self-service website.
A machine learning panacea
The scare of jobs being obsoleted by machines is a sensational topic. How robots and people will coexist in the future is a big concern for many. It’s not surprising that NPR and the BBC latch onto such research (just as I’m riding the wave with my own writing). They might, however, have done a few checks on the source material first; I expected more of these two organizations. Publishing the raw, unverified output of a machine learning algorithm doesn’t seem wise.
Doing research and presenting the potential for automation, as the original study does, is totally reasonable. I certainly find fault with the study, but that is the whole purpose of research: the authors lay out their method and data, and I can point out what I consider flaws. In this case I think they’ve restricted the input to their ML too much, and have made some training errors.
Despite questioning this research, I actually hope a lot more programming will be automated. I find myself repeating the same tasks far too often. Automation is a goal that all programmers should be striving towards. It doesn’t mean we’ll be obsoleted, it just means we get to work more on the interesting bits and less on the boring stuff.