Using ipset the wrong way

Sometimes you can find a key-value store in the most unexpected places

ipset
Python

25 March 2022

One night, lying on my back and staring sleeplessly at the ceiling, I had a strange thought. With ipset, we can store IP addresses (and a few other things) that can later be used to simplify our iptables rules.

But what is an IP address, if not 4 bytes of arbitrary data? At least in the case of IPv4. So, in theory, we could take a longer text, convert it to IP addresses, and store those in ipset, turning it into a generic key-value store.

What real-world use would this have? Probably none, but the idea sounded interesting enough to investigate further. Maybe I can learn a thing or two from it.

Text to IP address

Let's start at the beginning. We have some good-looking text, and we want to transform it into IPv4 addresses.

'notebook'

First, we need to split it into 4-byte chunks because that's what we can store in an IP address.

['note', 'book']

Then we convert every character into a number.

[[110, 111, 116, 101], [98, 111, 111, 107]]

Finally, we join each group with dots to get the IP addresses.

['110.111.116.101', '98.111.111.107']

Funny little fact: the note belongs to a Chinese telecommunications company, and the book belongs to Verizon.

If the length of the text is not divisible by four (who would be so evil as to write text like that?), we fill the missing bytes with zeros, which we have to cut off when converting the addresses back to text.

'pencil'
    => [112, 101, 110, 99, 105, 108]
        => ['112.101.110.99', '105.108.0.0']
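Python's standard socket module already knows how to turn exactly four bytes into a dotted-quad string and back, so the same conversion, padding included, can be sketched with socket.inet_ntoa and socket.inet_aton. This is an alternative to the manual version shown later, not the approach used in the rest of the post; the function names are hypothetical, and the text is assumed to be UTF-8.

```python
import socket
from typing import List


def text_to_ip_inet(text: str) -> List[str]:
    # Hypothetical helper: encode the text, then pad it with zero bytes
    # until its length is a multiple of four
    data = text.encode('utf-8')
    data += b'\x00' * (-len(data) % 4)
    # inet_ntoa turns exactly four raw bytes into a dotted-quad string
    return [socket.inet_ntoa(data[i:i + 4]) for i in range(0, len(data), 4)]


def ip_to_text_inet(addresses: List[str]) -> str:
    # inet_aton is the inverse: dotted quad back to four raw bytes
    data = b''.join(socket.inet_aton(addr) for addr in addresses)
    # Only the padding at the end is removed before decoding
    return data.rstrip(b'\x00').decode('utf-8')


print(text_to_ip_inet('notebook'))  # ['110.111.116.101', '98.111.111.107']
print(text_to_ip_inet('pencil'))    # ['112.101.110.99', '105.108.0.0']
```

Working on bytes rather than characters also means non-ASCII text survives the round trip, since a multi-byte character is simply spread over several octets.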

Storing the IP addresses

Naturally, we will use ipset for that, but the name already hints at some complications. It's a set, so an IP address can be in it only once (we could not store the text gomugomu, because both of its halves map to the same address), and the order of the members isn't guaranteed (we might get back booknote instead of notebook).

By default, an ipset can hold at most 65536 entries (its maxelem parameter), so we could use the first two bytes of an address (exactly 65536 different values) as a serial number and the other two bytes as data. This solves both the ordering and the uniqueness problem, but it halves the available storage.
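The serial-number layout from the previous paragraph could look something like this, assuming a big-endian 16-bit index in the first two octets and two bytes of UTF-8 text in the last two. The function names are hypothetical, and this is only a sketch of the idea that the post then improves on.

```python
from typing import List


def text_to_serial_ips(text: str) -> List[str]:
    data = text.encode('utf-8')
    data += b'\x00' * (len(data) % 2)  # pad to an even number of bytes
    chunks = [data[i:i + 2] for i in range(0, len(data), 2)]
    if len(chunks) > 65536:
        raise ValueError('only 65536 serial numbers fit into two bytes')
    addresses = []
    for serial, chunk in enumerate(chunks):
        # First two octets: serial number; last two octets: payload
        octets = [serial >> 8, serial & 0xFF, chunk[0], chunk[1]]
        addresses.append('.'.join(str(o) for o in octets))
    return addresses


def serial_ips_to_text(addresses: List[str]) -> str:
    parsed = [[int(o) for o in addr.split('.')] for addr in addresses]
    # The set may return members in any order, so sort by the serial number
    parsed.sort(key=lambda o: (o[0] << 8) | o[1])
    data = bytes(b for o in parsed for b in o[2:])
    return data.rstrip(b'\x00').decode('utf-8')


print(text_to_serial_ips('notebook'))
# ['0.0.110.111', '0.1.116.101', '0.2.98.111', '0.3.111.107']
```

Repeated text such as gomugomu now yields distinct addresses, but each address carries only two bytes of payload, which is exactly the halved capacity.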

Luckily for us, ipset can store other things besides IP addresses, for example, IP address and port pairs. And there are 65536 different ports. What a pleasant surprise! So we can use the port as a serial number and keep the whole address for data.

In practice, this is how storing the notebook value under the drawer key would look:

# ipset create drawer hash:ip,port
# ipset add drawer 110.111.116.101,0
# ipset add drawer 98.111.111.107,1
# ipset list drawer
Name: drawer
Type: hash:ip,port
Revision: 5
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 216
References: 0
Number of entries: 2
Members:
98.111.111.107,tcp:1
110.111.116.101,tcp:0

Let's see some code

Converting back and forth won't be a big surprise. We already discussed the method earlier.

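from typing import List


def text_to_ip(text: str) -> List[str]:
    # One decimal string per byte of the UTF-8 encoded text
    parts = [str(c) for c in text.encode()]
    # Pad with zeros so the number of bytes is divisible by four
    remainder = len(parts) % 4
    if remainder > 0:
        parts += ['0'] * (4 - remainder)

    # Every group of four bytes becomes one dotted-quad address
    addresses = []
    for i in range(0, len(parts), 4):
        addresses.append('.'.join(parts[i:i + 4]))

    return addresses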

We encode the text into bytes and turn each byte into its decimal string so that the join will work later. Next, we pad with zeros until the length is divisible by four, and finally, we join each group of four into an IP address.

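def ip_to_text(addresses: List[str]) -> str:
    data = bytearray()
    for addr in addresses:
        # Each part of the address is one byte of the encoded text
        data += bytes(int(c) for c in addr.split('.'))

    # Remove only the zero padding at the end, then decode the UTF-8 bytes
    return data.rstrip(b'\x00').decode('utf-8')

Converting it back to text is even easier. We turn every part of each address back into a byte, join them into one byte string, cut off the padding zeros from the end, and decode the result as UTF-8. Decoding the bytes as a whole, rather than calling chr on each number, keeps multi-byte characters intact.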

Storing the addresses is a bit more challenging. Of course, we could use the subprocess module to call the ipset command hundreds of times to save a single value, but that feels neither elegant nor efficient.
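A middle ground, which this post does not use and which has not been benchmarked here, is to feed all the commands to a single `ipset restore` process on standard input; the ipset man page allows most commands, including create, flush and add, in restore mode. The sketch below assumes the ipset binary is on PATH and the script runs as root. build_restore_script and store_with_restore are hypothetical helpers that take already converted addresses.

```python
import subprocess
from typing import List


def build_restore_script(key: str, addresses: List[str]) -> str:
    # One command per line, with the same arguments as the ipset CLI
    lines = [f'create {key} hash:ip,port', f'flush {key}']
    # The port is the serial number of each address
    lines += [f'add {key} {ip},{port}' for port, ip in enumerate(addresses)]
    return '\n'.join(lines) + '\n'


def store_with_restore(key: str, addresses: List[str]) -> None:
    # A single ipset process reads every command from stdin;
    # -exist makes "create" a no-op if the set already exists
    subprocess.run(
        ['ipset', '-exist', 'restore'],
        input=build_restore_script(key, addresses),
        text=True,
        check=True,
    )
```

Only build_restore_script is pure; store_with_restore needs a system where ipset is installed and the caller is allowed to modify sets.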

Another option is libipset, which ships with ipset, combined with Python's ctypes module. It's a bit more complicated, but in this case it's also ten times faster than using subprocess.

First, we will need something to talk to the libipset library.

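from ctypes import cdll, c_int, POINTER, c_char_p, CFUNCTYPE, c_void_p
from typing import List


class IpSet:
    # Collects everything libipset prints while running one command
    __output = b''

    def __init__(self):
        # The soname (.13 here) depends on the installed ipset version
        self.__library = cdll.LoadLibrary('libipset.so.13')
        self.__library.ipset_load_types()
        # ipset_init() returns a pointer; without a restype, ctypes would
        # assume a C int and could truncate it on 64-bit systems
        self.__library.ipset_init.restype = POINTER(c_int)
        self.__ipset = self.__library.ipset_init()
        # None for the two error callbacks, our own function for normal output
        self.__library.ipset_custom_printf(
            self.__ipset,
            None, None, self.__ipset_print_outfn,
            None
        )

    def __del__(self):
        self.__library.ipset_fini(self.__ipset)

    def run(self, command: List[str]):
        IpSet.__output = b''
        # Like main(), ipset_parse_argv expects the program name as argv[0]
        command = ['ipset'] + command

        self.__library.ipset_parse_argv(
            self.__ipset,
            len(command),
            (c_char_p * len(command))(*[
                c_char_p(arg.encode()) for arg in command
            ])
        )

        return IpSet.__output

    # libipset declares this callback as printf-like (session, p, fmt, ...);
    # this assumes the formatted buffer arrives as the first extra argument
    @staticmethod
    @CFUNCTYPE(c_int, POINTER(c_int), c_void_p, c_char_p, c_char_p)
    def __ipset_print_outfn(session, p, fmt, outbuf):
        IpSet.__output += outbuf
        return 0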

We need to load the library and call a few of its functions so that it is initialized properly. Of course, we also have to juggle C types, so there is "a bit" of extra code because of that, but in the end, we can successfully run the command.

Coming up with that class wasn't an easy ride. It took a considerable amount of time, and I had to go through the ctypes documentation and the relevant parts of the ipset source code, and of course, Google helped a lot too. In the end, I managed to glue all the pieces together without constantly getting segmentation faults. It works, but (considering my slight incompetence in this area) it might not be the perfect solution.

From this point, it's a smooth ride to write the client to our new key-value store.

import re


class IpSetKeyValueStore:
    def __init__(self, ipset: IpSet):
        self.__ipset = ipset
        # Matches save lines such as "add drawer 110.111.116.101,tcp:0"
        self.__ip_pattern = re.compile(r'(\d+\.\d+\.\d+\.\d+),.*:(\d+)')

    def __del__(self):
        del self.__ipset

    def get(self, key: str) -> str:
        result = self.__ipset.run(['list', '-output', 'save', key])
        data = self.__ip_pattern.findall(result.decode('utf-8'))

        # Members come back in no particular order, so sort them by port
        addresses = [ip for ip, _ in sorted(data, key=lambda x: int(x[1]))]
        return ip_to_text(addresses)

    def set(self, key: str, value: str) -> None:
        self.__ipset.run(['create', '-exist', key, 'hash:ip,port'])
        self.__ipset.run(['flush', key])

        # The port doubles as the serial number of each address
        for port, ip in enumerate(text_to_ip(value)):
            self.__ipset.run(['add', key, f'{ip},{port}'])

    def delete(self, key: str) -> None:
        self.__ipset.run(['destroy', key])

There are many ways to improve and extend this further. For example, ipset supports timeouts, so expiring keys can be easily added, or the storage capacity can be significantly increased using IPv6 addresses. I leave these as an exercise for the reader.
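For a head start on the IPv6 exercise: only the chunk size changes, sixteen bytes per address instead of four, so the same 65536 ports give four times the capacity. A sketch using socket.inet_ntop and socket.inet_pton; the function names are hypothetical, and storing the result would assume a set created with `family inet6` (for example, `ipset create drawer6 hash:ip,port family inet6`).

```python
import socket
from typing import List


def text_to_ipv6(text: str) -> List[str]:
    data = text.encode('utf-8')
    data += b'\x00' * (-len(data) % 16)  # pad to a multiple of 16 bytes
    # inet_ntop formats 16 raw bytes as a (compressed) IPv6 address
    return [
        socket.inet_ntop(socket.AF_INET6, data[i:i + 16])
        for i in range(0, len(data), 16)
    ]


def ipv6_to_text(addresses: List[str]) -> str:
    # inet_pton parses each address back into its 16 raw bytes
    data = b''.join(socket.inet_pton(socket.AF_INET6, a) for a in addresses)
    return data.rstrip(b'\x00').decode('utf-8')


print(text_to_ipv6('notebook'))  # ['6e6f:7465:626f:6f6b::']
```

The trailing zero padding shows up as the `::` shorthand, which inet_pton expands back to zero bytes.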

It is also worth mentioning that we could add a comment when we store an IP address, making it so much easier to store any data in the ipset. But where's the fun in that?

deadlime