Key questions
There is an ancient Mayan saying that computers can solve a lot of problems that we wouldn't have to solve without them. Today, we can sink our teeth into a problem just like that. It pairs really well with this lightning talk from 2012 called Wat.
My own Wat-moment started with the Buffer class. Let's get two of them right away.
$ node
> data1 = Buffer.from([0xf5, 0xcf, 0xe2, 0xf0, 0xef])
<Buffer f5 cf e2 f0 ef>
> data2 = Buffer.from([0xfe, 0x99, 0x88, 0xeb, 0xd9])
<Buffer fe 99 88 eb d9>
It's clearly visible to the naked eye that these are indeed two different buffers, but Node.js can also confirm it for us:
> data1 === data2
false
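A small caveat: === on two objects only compares references, so it would say false even for two buffers holding exactly the same bytes. A minimal sketch of a comparison by content, using the equals method of Buffer (copy1 is a hypothetical extra buffer added only for illustration):

```javascript
const data1 = Buffer.from([0xf5, 0xcf, 0xe2, 0xf0, 0xef]);
const data2 = Buffer.from([0xfe, 0x99, 0x88, 0xeb, 0xd9]);
// Hypothetical extra buffer: a copy holding the same bytes as data1.
const copy1 = Buffer.from(data1);

// === compares object identity, not contents.
const sameReference = data1 === copy1;        // false: two distinct objects
// equals() compares the bytes.
const sameBytes = data1.equals(copy1);        // true: identical contents
const differentBytes = !data1.equals(data2);  // true: the bytes differ

console.log({ sameReference, sameBytes, differentBytes });
```

So the two buffers in our example really are different, both as objects and byte by byte.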
That's all nice and shiny, but let's look at another example.
> container = {}
{}
> container[data1] = 'foo'
'foo'
> container[data2]
???
What will be the value of the last expression?
a) null
b) undefined
c) it creates a black hole in place of the node interpreter
d) nothing
Maybe a lot of people would go with b. Maybe someone who knows Node.js a bit better would pick c. But the right answer is so terrible that it's not even an option.
> container[data2]
'foo'
What happens behind the scenes? An object key can only be a string (or a symbol), not a Buffer, so JavaScript automatically calls the toString method on it. For a Buffer, toString takes an optional encoding parameter, but if it doesn't get one it goes with utf8 by default.
Our good-looking byte array doesn't know anything about behaving as a well-formed UTF-8 string (that's why it's in our example), so every one of its bytes is replaced with the Unicode replacement character (U+FFFD), which looks like this: �.
Both of our buffers are ignorant in this regard, so at the end of the conversion they both turn into the same string of five replacement characters.
> data1.toString() === data2.toString()
true
> container
{ '�����': 'foo' }
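The same thing can be spelled out step by step in a script. This is only a sketch of the coercion described above; the names key1, key2 and codePoints are made up for the illustration.

```javascript
const data1 = Buffer.from([0xf5, 0xcf, 0xe2, 0xf0, 0xef]);
const data2 = Buffer.from([0xfe, 0x99, 0x88, 0xeb, 0xd9]);

// Using an object as a property key converts it to a string first.
// For a Buffer that means toString() with the default 'utf8' encoding.
const key1 = String(data1);
const key2 = String(data2);

// Look at the code points of the key: fffd is the replacement character.
const codePoints = Array.from(key1, (ch) => ch.codePointAt(0).toString(16));

const container = {};
container[data1] = 'foo';

console.log(codePoints);                          // five entries, each 'fffd'
console.log(key1 === key2);                       // true
console.log(Object.keys(container)[0] === key1);  // true
console.log(container[data2]);                    // 'foo'
```

The original bytes are gone by the time the key is stored, so nothing in the container can tell the two buffers apart anymore.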
After all this, it seems reasonable that we get back the value stored for the first buffer when we use the second buffer as the key. Now imagine this situation deep down in an in-memory cache layer, where the only symptom you see is that sometimes, maybe once in a hundred thousand cases, the data from the cache is not right. It's a really fun experience.
What could we do about this? Maybe we are better off not using a Buffer as a key at all, but if we really need to, we can call toString with an explicit encoding parameter that keeps every byte distinguishable. The examples below all work in this case:
> data1.toString('hex') === data2.toString('hex')
false
> data1.toString('base64') === data2.toString('base64')
false
> data1.toString('binary') === data2.toString('binary')
false
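Applied to the cache scenario, this could look roughly like the sketch below. It assumes a Map keyed by the hex string of the bytes; keyOf and cache are hypothetical names, not part of any library. Using the Buffer itself as the Map key would not help, because a Map compares object keys by identity rather than by content.

```javascript
const data1 = Buffer.from([0xf5, 0xcf, 0xe2, 0xf0, 0xef]);
const data2 = Buffer.from([0xfe, 0x99, 0x88, 0xeb, 0xd9]);

// Hypothetical helper: derive a lossless string key from the bytes.
const keyOf = (buf) => buf.toString('hex');

const cache = new Map();
cache.set(keyOf(data1), 'foo');

// Same bytes in a different Buffer object: still a hit.
const hit = cache.get(keyOf(Buffer.from(data1)));
// Different bytes: a miss, as it should be.
const miss = cache.get(keyOf(data2));

console.log(keyOf(data1)); // 'f5cfe2f0ef'
console.log(hit);          // 'foo'
console.log(miss);         // undefined
```

Of the three encodings, hex and base64 are the easiest to read in logs. In Node.js, binary is an alias for latin1, which maps each byte to exactly one character, so it is lossless as well.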