Microsoft is absolutely at fault for WannaCry

Microsoft played a significant role in the damage caused by the WannaCry ransomware. Certainly the proximate cause lies with the malware’s authors, and they should be held accountable. The complacent NSA is also culpable for its role in creating or discovering the exploit yet failing to report it. We can even say that users must share part of the blame for not keeping their systems up-to-date. But in no uncertain terms, it is the design of Microsoft’s Windows operating system that allowed the attack to happen.

Remote code execution

WannaCry uses an exploit in the SMB (file server) subsystem of Windows. It executes arbitrary code and takes control of the machine. This raises a vital question:

Why is a component that is responsible for file sharing capable of taking over the machine?

If we look at Windows’ architecture, it’s not hard to explain why this is possible. These subsystems are treated as privileged users and given extensive access to the computer. This is the core of the problem. There is a lack of privilege separation and an assumption that components are well-behaved.

If we had no alternative to this design we could cut Microsoft some slack. But we do have ways to mitigate such attacks, and it appears Microsoft has chosen not to implement them. Thus they must bear a significant part of the responsibility for the WannaCry ransomware.

Windows 10 does not appear to have been hit by the malware. If this is due to actual architectural changes, as I describe here, then great! That’s a solid reason to upgrade. But it’s not clear if that is the case; the security bulletin indicates they patched remote code execution on Windows 10 as well.

Injecting code

Let’s assume for a moment that all software has defects, ones that would allow an attacker to compromise security. Given our known history this isn’t a bad assumption to make. Yet we continue to ignore this while writing software. We are still coding as though the system is impenetrable, which is a terrible practice.

We need to be defensive. Obviously the first line of defense is safer coding and execution: buffer protection, safe types, address randomization, etc. There’s lots of work in this direction, but it isn’t perfect, so we have to assume we’ll continue to fail here.

The second line of defense is not allowing an attacker to run arbitrary code. It sounds so obvious, so why isn’t it done? The WannaCry attacker injected their own code via the SMB system.

CPUs have no-execute and read-only flags for memory. An OS can separate executable code and data memory from each other. Had this been done the attack vector would not have worked. The attacker would still be able to corrupt the data memory, but there would be no way for them to jump into that code.
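To make the no-execute idea concrete, here is a minimal sketch, assuming a POSIX system (it relies on the `prot` argument of Python’s `mmap`, which is not available on Windows). It maps a page that is readable and writable but not executable, writes some stand-in bytes into it, and then tries to call the page as if it were a function. The child script and the `try_to_run_data_page` helper are hypothetical names made up for this illustration; none of this says anything about how Windows or its SMB component actually lays out memory.

```python
import subprocess
import sys
import textwrap

# Child program, run in a separate process so that the expected crash
# does not take down the caller. It maps one page as read+write (no
# PROT_EXEC), writes stand-in "injected" bytes, then jumps to the page.
CHILD = textwrap.dedent("""
    import ctypes, mmap
    page = mmap.mmap(-1, mmap.PAGESIZE, prot=mmap.PROT_READ | mmap.PROT_WRITE)
    page.write(b"\\xc3" * 16)  # placeholder bytes; with NX they are never decoded
    addr = ctypes.addressof(ctypes.c_char.from_buffer(page))
    ctypes.CFUNCTYPE(None)(addr)()  # attempt to execute the data page
    print("executed injected bytes")
""")


def try_to_run_data_page():
    """Run the child and return its CompletedProcess (assumed helper name)."""
    return subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )


if __name__ == "__main__":
    result = try_to_run_data_page()
    print("child return code:", result.returncode)
    print("child stdout:", result.stdout.strip() or "<nothing>")
```

On a system that enforces the no-execute flag, the child should be killed by the kernel at the jump, so the final `print` in the child never runs. The data page can still be corrupted freely; it just cannot be run directly.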

CPUs didn’t always have the no-execute ability, but it’s been around for over 15 years now. Is Windows not using this feature? And if it is, how exactly was the code injected? (It’s kind of understandable if WinXP didn’t support this feature, as it wasn’t widely available when that OS was released.)

Privilege escalation

Let’s extend our assumption to distrusting software entirely. A typical downloaded application cannot take over the system on its own, so why can the SMB component?

Consider some of the features of file sharing: we need access to a particular set of files, not the entire filesystem; we need some way to authenticate users; we need a way to access the internet. These are all well definable interfaces that an operating system can provide. By partitioning privileges the OS can limit what an application is capable of doing.
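As a rough illustration of “a particular set of files, not the entire filesystem”, the following sketch confines every requested path to a share root. The function name `resolve_in_share` is an assumption made up for this example. Note that this is only an application-level check: it shows the shape of the interface, but the partitioning argued for here would have to be enforced by the OS, since a check inside a compromised process can simply be bypassed.

```python
import os


def resolve_in_share(share_root, requested):
    """Return the real path of `requested` if it stays inside `share_root`.

    Raises PermissionError for anything that would escape the share,
    including `..` traversal, absolute paths, and symlinks pointing outside.
    """
    root = os.path.realpath(share_root)
    # realpath collapses `..` segments and follows symlinks before we compare.
    target = os.path.realpath(os.path.join(root, requested))
    if os.path.commonpath([root, target]) != root:
        raise PermissionError(f"{requested!r} escapes the shared directory")
    return target
```

A request such as `resolve_in_share("/srv/share", "docs/report.txt")` resolves normally, while `"../../etc/passwd"` or `"/etc/passwd"` raises `PermissionError`. The path `/srv/share` is just a placeholder.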

Yet it seems the WannaCry malware has gained full control of the system. This is only possible if the SMB component is not segregated. We know from Samba that this protocol can run as isolated software. There are also numerous technologies on other OSs that further segregate and isolate components. Were none of those employed in Windows SMB?

I understand changing the structure of an OS is a phenomenal amount of work, but I have to assume Microsoft has the resources. Maybe they are doing this and it just isn’t working. Why did the exploit gain so much access to the system?

And on and on

Assume the worst, that all our protections have failed. Surely we can still protect the user’s data somehow. Isn’t the sudden change of many files something Windows Defender could detect? Even if it didn’t, why isn’t there a rollback mechanism?
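To show what “detecting the sudden change of many files” might look like, here is a minimal sketch of a sliding-window heuristic. It is not how Windows Defender works; the class name `MassChangeDetector` and the threshold values are assumptions chosen for illustration, and a real detector would need far more signal to avoid false alarms from backups or builds.

```python
from collections import deque


class MassChangeDetector:
    """Flags bursts in which many distinct files are modified in a short time."""

    def __init__(self, max_changes=100, window_seconds=10.0):
        self.max_changes = max_changes        # distinct files tolerated per window
        self.window_seconds = window_seconds  # length of the sliding window
        self._events = deque()                # (timestamp, path), oldest first

    def record(self, path, timestamp):
        """Record one modification; return True if the burst looks suspicious."""
        self._events.append((timestamp, path))
        cutoff = timestamp - self.window_seconds
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        distinct_paths = {p for _, p in self._events}
        return len(distinct_paths) > self.max_changes
```

A monitor would call `record` for each file-modification event it observes and could pause the offending process, or trigger a snapshot, once the method returns `True`.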

In fairness, Windows has options for making versioned backups. Not enabling them is user error, but there’s obviously something preventing people from doing so. I’m also not sure if these backup files would be protected from WannaCry.

We need to stop assuming our computers are safe and instead design assuming they will be compromised. This is a core tenet of secure server design, so why isn’t it applied to desktop systems?

Microsoft puts a lot of effort into security, but this doesn’t absolve them of blame in the WannaCry affair. Their system design has allowed this attack to happen, despite there being known techniques that could have prevented it, or at least mitigated the severity.

8 replies »

  1. The patch has been out since March. This exploit went zero-day in February. Those who were affected had not patched since at least March and performed no mitigations during the month of February (blocking ports, disabling SMBv1).

    Port 445 should not be open to the outside world, yet on many machines it was.

    Microsoft has recommended disabling SMBv1 for some time, which takes a couple of minutes to deploy with a GPO. A couple of years ago it was deprecated.

    There is a lot of burden as well on those who don’t patch, or those who don’t easily approve patches, or where SMBv1 was never disabled when recommended, or no mitigations are taken when possible against zero-days.

    • I don’t want to play favourites here. If this can cause the same problem on other platforms it means they are doing something equally wrong. There’s no reason that Samba should be able to get root access on any platform.

      I believe SELinux would prevent this flaw from escalating. I think it’s the direction all OSs will have to take.

      Reading some of the reports it also appears there is a `noexec` option for filesystems, which could be used for the samba write share. This would be another good, and standard approach that would make sense.

  2. I believe that Microsoft’s fault in the virus is only indirect. Yes, they left holes, but something tells me that they did it on “someone’s request”!

  3. I think that they should be sent to court for this attack. I wonder if any company hit by the virus did it…

    • They published the patch well before the outbreak and set the computers’ default options to try to patch daily. They blog about each month’s patches. They have a well-known patch cycle (Patch Tuesday). I’m pretty sure even a judge who has a clerk print out his/her emails would throw this out. Maybe they need to dispatch the Geek Squad proactively!

  4. I tend to agree that SMB should run as a separate service user. Still, I guess a lot might have to be rewritten, because SMB not only needs access to the filesystem but also requires user information for authentication etc. I don’t know how separated these systems are in Windows, but it is definitely possible. In addition, the same service has to access files from different users.

    Regarding non-executable memory I also agree, and I am confident this is already implemented. However, there are methods of overcoming this (see ROP attacks).

    Last but not least, this kind of ransomware attack doesn’t need access to the whole system; access to the files is enough. Backups and versions are also accessible by the SMB service, so this is another obstacle.

    In the end, the users’ boxes have to be patched and should not expose port 445 to the internet. This is already a simple but fairly good protection measure.

  5. Talking about non-executable memory in a post-ROP world feels out of touch and/or naive. I’ve never seen a single line of Windows source code but I could guarantee the SMB binary marks all the RAM it works with as non-executable. That’s simply not a meaningful deterrent to a skilled attacker at this point. I’d suggest reading at least the Wikipedia page on the subject, if not the original paper: https://dl.acm.org/citation.cfm?doid=1315245.1315313

    Note there have been follow-up papers that showed techniques for automatically generating these kinds of attacks *without* any access to source code.

    And isolation is a great theory but we’re also living in a world where skilled attackers are writing exploits to break out of fully isolated VMs into the hypervisor level, let alone programs running within an OS.

    Security is really, really hard. The defender has to make every decision perfectly right, while the attacker only needs one opening to make an exploit. I think the idea that anyone can write non-trivial code that’s anything more than “hard to break” is a pipe dream. We have to accept that all programs are flawed and it’s only a matter of time before those flaws are exploited. That’s why we have patches and mitigation strategies like limiting your exposed attack area — things that the victims of WannaCry didn’t do. Software companies take this stuff really seriously and do all they can, but people need to realize they also need good system admins for critical systems because things are just really complicated in today’s world.
