Defective Language

# The bitwise complement, or not operator, is unsafe

I came upon a problem with bitwise complement (logical not): it isn’t safe. It’s nothing new, but it really registered while implementing the logical operations in Leaf. The issue is the result depends on the size of the type. This leads to generating invalid results when combined with operators like `and`, `or`, `xor`. I don’t see an easy fix for this problem.

### The problem

The bitwise complement inverts the bits of a binary value: a logical not operation on each bit. This operation only makes sense if we’re speaking of a fixed bit-size. This means numbers have different complements depending solely on the size of their type. Consider the value `1`:

• `¬1` = `0`
• `¬01` = `10`
• `¬001` = `110`
• `¬0001` = `1110`

The complement’s value depends on the size of `1`. That almost sounds silly, and slightly disconcerting. But it’s the reality in programming. And it’s problematic.

 ```1 2 3 4``` ```var a : binary 32bit = pine() var b : binary 64bit = cone() var c = b and not a ```

At first glance this seems fine, but it most likely doesn’t produce the desired result. `not a` only produces a complement for the 32 bits in `a`. It will then be 0 extended to 64-bits. The top 32-bits will always be cleared, which is not likely the intent.

Something curious happens when we use a signed value. If the top-most bit of a signed 2’s complement value is `1`, the value is negative. When negative values are extended, they are sign-extended: the new bits take the value of the highest most bit in the input. This keeps negative values negative.

This makes the not operator magically work in a lot of common situations. A positive number is negative in its complement form, and will be bit extended “correctly”.

C also promotes all small integers to the normal integer size before doing any bitwise operations. Given a `int8_t a` the result of `~a` is a 32-bit value. This again, makes some operations work as expected but is confusing when you start dealing with larger integers.

This leads to results that aren’t as immediately obvious as one might hope (the result is in the comment on each line):

 ```1 2 3 4 5 6 7 8 9``` ```printf( "%lX\n", ~int(0) & ~int64_t(1) ); //FFFFFFFFFFFFFFFE printf( "%lX\n", ~int64_t(0) & ~int64_t(1) ); //FFFFFFFFFFFFFFFE printf( "%lX\n", ~uint64_t(0) & ~uint64_t(1) ); //FFFFFFFFFFFFFFFE printf( "%lX\n", ~unsigned(0) & ~uint64_t(1) ); //FFFFFFFE printf( "8bit: %X\n", ~int(0) & 0x7F ); //7F printf( "8bit: %X\n", ~int(0) & int8_t(0x7F) ); //7F printf( "8bit: %X\n", ~int(0) & 0x80 ); //80 printf( "8bit: %X\n", ~int(0) & int8_t(0x80) ); //FFFFFF80 ```

It’s these small unexpected situations that can create hard to track defects in code.

### Leaf complexity

Leaf allows for literal rational types. This is to ensure values are not truncated prematurely, or assigned to the wrong type. It solves the 1/2 = 0 problem. It however magnifies the bitwise complement problem.

Literal values may be declared, for convenience, without an explicit type:

 ```1 2``` ```literal flag_a = 0x1 literal flag_b = 0x2 ```

That is very much different than:

 ```1 2``` ```literal flag_a : binary = 0x1 literal flag_b : binary = 0x2 ```

Which is also different than explicit bit-sizes:

 ```1 2``` ```literal flag_a : binary 16bit = 0x1 literal flab_b : binary 128bit = 0x1 ```

The `not` of these values is a different scalar depending on the size of the type. Without care it can be error-prone to use it. I have yet to think of a good general solution, but have some ideas.

#### Material nonimplication operator

One key location where this causes a problem is in the expression `a and not b`. It requires `not b` be evaluated without knowing the size of `a`, thus resulting in an incorrect complement.

On closer look though, this operation is actually the material nonimplication operation. If instead we had a single operator the problem would go away. With `a ↛ b` the two types can be unified in size first, and then the operation done. This compiles to the standard `a and not b` but with `b` having the correct bit size.

The operator `↛` works in Leaf, but on the odd chance your keyboard doesn’t have that symbol, I’ve given it a second one: `and_not`. I’m uncertain of the name, but certainly more programmers will understand that than material nonimplication. Possibly `exclude` is another name for it.

#### Removing not

I thought about removing bitwise complement entirely, but that seemed somewhat radical. Though not always safe, it is a useful operation.

I’ve also added several other operators to see if it helps: xor ↮, nor ↓, nand ↑. I won’t know if this is helpful until more Leaf code is created.

The other option is take a C-like approach and special encode the complement. Not as a negative, but perhaps a special complement marker. This would be an easy addition to rational numbers, but for binary values it would require an actual new type marker to encode. It sounds like more trouble than it’s worth.

`not` isn’t safe and I don’t know what to do about it!

### 6 replies »

1. Matthew says:

“The bitwise complement inverts the bits of a binary value: a logical not operation on each bit.”

¬1 = 0
¬01 = 10
¬001 = 100
¬0001 = 1000

¬001 = 110
¬0001 = 1110

Or am I missing something?

• You’re correct. I’ve fixed my mistake now, thank you.

2. Jona Stubbe says:

It seems all this “unsafety” would be avoided if you just didn’t allow implicit integer widening. Or am I wrong?

• Yes, I failed to mention that as one of the options. It seems to make a lot of sense to disallow this for `binary` types. I’m going to think about this as an option.

I guess the problem also affects other operators, even something like `and`. Consider a `var a : 16bit` where all bits are set. If you and another value with this you only get 16bits retained, though you might expect all bits, since that’s what you assigned to `a`.

3. Scott says:

~x is the same as -x-1, so there’s an easy way to even extend to big integers.

That also demonstrates, though, that this is no more “unsafe” than addition. 0u8-1 and 0u16-1 also give different results.

I think what you’re actually finding here is that you need an integer overflow story…

• I’ve now limited the bitwise operations to the `binary` type only (like `unsigned` in C). I will probably also reject automatic widening. These both make certain bitwise code a bit more cumbersome, but with the advantage of not getting incorrect values.

Defining ~x as `-x-1` doesn’t work for non-fixed size types. In Leaf that would mean `~0b0010 = -1` which is highly unlikely what the user wants. Constant numbers in Leaf don’t have a fixed size (or even type) until they are coerced into an actual type (they are tracked as rationals, or 1024-bit precision floats during compilation). This is part of why bitwise operations make no sense on non-binary types in Leaf.