Good old UNIX bashing
I’m a Fedora user, and Fedora is a GNU/Linux system. While GNU stands for “GNU’s Not UNIX”, it is definitely a UNIX-like system and Linux is definitely a UNIX-like kernel. There are other UNIX-like systems in the wild, and there is even POSIX: the Portable Operating System Interface. Don’t ask me where the X fits in that acronym; my guess would be that it’s there to rhyme with UNIX, since POSIX is a UNIX-like family of standards that still carries stupid UNIX limitations from a different time in computing. POSIX has had decades to fix those limitations, so I blame POSIX for the one I will cover today.
But before that, let’s digress a bit.
My stupid dial-up modem
I already mentioned that my dial-up modem is built on Fedora, and coming from me that sounds amazing. The sad reality is that, like most hardware, my modem’s firmware is crap.
Bullshitware
I have no problem seeing the difference between hardware and software. It’s a rare occurrence for technical terms, but I find the French equivalent even better: the French word “logiciel” is constructed like “logicware”, which to me is more expressive than software. But software works for me regardless.
Then you have different “kinds” of software, like free software or open source as defined respectively by the Free Software Foundation (FSF) and the Open Source Initiative (OSI). The rest to me is bullshitware.
I suppose it was a trend from the last century to name varieties of software somethingware: freeware (of course not to be confused with free software, I mean, who would fall for that?) or shareware. Then there are more legitimate entries that still register high on my bullshit-o-meter: malware, ransomware or even bloatware.
Finally, my personal (least) favorite is firmware. Firmware is software that controls hardware. I would probably need a whole post to explain why firmware is bullshit, but in one sentence it’s simply software that pretends to be special.
My modem’s firmware
My modem was unusable when I unpacked it. It failed to fulfill its main job, which was to connect me to the internet. I had to call the manufacturer, only to be redirected to their unmaintained (likely on purpose) support system, where I was provided with a firmware update.
The firmware binary is hosted along with sources and build instructions. Everything is present except the vendor’s web interface. You can see severely outdated and vulnerable packages, and you can see the build instructions, which is where it gets really ugly for me: you need Fedora 5 to cross-compile the firmware.
When that particular model was released, I can assure you that Fedora already had two digits in its version numbers. So that is likely an unmaintained tool chain that they keep reusing from model to model. And don’t get me wrong, I’m not complaining about the act of reusing.
Stupid self-hosting
I’m going to need a new firmware update soon, and sadly there is nothing I can do besides hoping for the vendor to provide a patched firmware. My gut feeling tells me that this is a bug I could fix myself if I had access to the source code and the tool chain.
I wanted to self-host a service that, as it turns out, also earns a high stupidity rating, but one full-blown digression is enough. After all, this is a post about something POSIX; I could spend ages before getting to the point.
After some effort and a lot of complaining I finally have a hosting setup that is almost satisfying. I try to access it from a remote location, and it looks fine. When I try from my laptop at home, I get an invalid TLS certificate.
I configured the modem to forward TCP traffic on ports 80 and 443 to the host where my service is installed, so what went wrong?
Stupid firmware
Much like any “modern” modem or set-top box, the modem can be configured via a web interface available on the gateway’s address. From there I can configure the DSL connection, DNS resolvers, and some aspects of the router like port forwarding or DHCP leases.
The DHCP customizations offered are way too simplistic and I had to configure the server with a fixed IP address, opting out of DHCP, to even enable port forwarding reliably.
The thing I didn’t expect was that port forwarding wasn’t based on the IP address but on which side of the modem the request comes from. So inside my network any request to the public IP address is handled as if the gateway IP address was used. I checked the packets with tcpdump and Wireshark.
So when I try to reach my self-hosted service from inside my private network I’m presented with the TLS certificate of the modem’s web interface.
Working around stupidity
As I am waiting for my new shiny support ticket to find a solution, I worked around the problem with iptables. It’s not pretty, but it works. It works, but I don’t want to manually enable or disable the workaround every time I switch to a different network.
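I won’t reproduce my exact rules here, but for illustration the workaround can be as simple as a DNAT rule in the nat table’s OUTPUT chain on the laptop, rewriting locally generated traffic aimed at the public IP address so it goes straight to the server instead of hitting the modem’s web interface (the addresses below are made up):
# Hypothetical addresses: 203.0.113.7 is the public IP, 192.168.1.10 the server.
# Enable the workaround.
iptables -t nat -A OUTPUT -d 203.0.113.7 -p tcp -m multiport --dports 80,443 \
    -j DNAT --to-destination 192.168.1.10
# Disable the workaround by deleting the same rule.
iptables -t nat -D OUTPUT -d 203.0.113.7 -p tcp -m multiport --dports 80,443 \
    -j DNAT --to-destination 192.168.1.10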
It was a no-brainer for me that the solution would be a shell script that I would manage as a systemd unit. Since systemd has a tight integration with DBus, I thought it would be possible to have a unit triggered by DBus messages. I couldn’t find anything of the sort, but I didn’t search hard.
Then I looked at how to integrate with DBus from a shell script that would monitor certain messages: if systemd can’t trigger executions for me I might as well have a simple service where the shell script listens to the events I’m interested in. What I could find looked too complicated for my taste, so instead of listening to NetworkManager’s DBus API I simply used nmcli.
I’m also not too fond of nmcli; for example, nmcli connection monitor NAME won’t tell me when I connect to or disconnect from NAME. It tells me nothing when disconnected and only tells me this three times when the connection is established:
NAME: connection profile changed
Not very useful for what I’m trying to do…
Poking at nmcli
I finally find a combination of two commands to capture the events I’m interested in. First, nmcli connection show --active lets me know the current status, and nmcli device monitor wlp2s0 gives me connected and disconnected events for my wireless device.
At this point I’m still playing in my terminal and it’s trivial to process the output of both commands:
# wireless (dis)connected events
nmcli device monitor wlp2s0 | grep connected
# active NAME connection
nmcli connection show --active | grep -q '^NAME\s'
Now that it works I put a script together for my systemd service:
nmcli device monitor wlp2s0 |
grep connected |
while read line
do
    : # disable iptables workaround here
    if nmcli connection show --active | grep -q '^NAME\s'
    then
        : # enable iptables workaround here
    fi
done
I start the service and play with my connection while monitoring iptables rules and then… crickets. End of the longer-than-usual digression.
Interactive buffering
Finally, we can look at my POSIX peeve. Why is nothing happening? And why did it work in my terminal emulator?
It turns out to be a fairly easy answer that I knew but forgot after years of not running into this use case. Writing to the standard output is buffered by default: fully buffered in the general case, and line-buffered when writing to a tty. So what I was seeing from my terminal was very responsive, while the script feeding the while loop was waiting for grep to fill its buffer. It could have waited for a long, long time…
Conditional buffering
It’s rather easy to reproduce in a terminal. I can create a small file and feed it to grep along with my terminal input:
$ cat >test.txt
a
b
c
d
^D
$ cat test.txt - | grep .
a
b
c
d
^D # hangs until I close cat's standard input
$ exit
Now, if the output of the grep command is not a tty, events happen in a different order:
$ cat test.txt - | grep . | sed ''
^D # hangs until I close cat's standard input
a
b
c
d
$ exit
The grep command will perform full buffering since its output goes to the sed command’s input.
Standard buffering
At this point I could come to the conclusion that grep is a stupid utility since it assumes that only a tty implies an interactive workload. My problem with this assumption is that my workaround script sits in limbo because of that.
That conclusion would be slightly incorrect because this assumption is made by the C runtime. There is a function called setvbuf() to control the buffering behavior, and the defaults for the stdout stream are line-buffered for a tty and fully buffered otherwise.
I can even verify it with a simple program that will pipe stdin to stdout line by line:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
    char *line = NULL;
    size_t n = 0;
    while (getline(&line, &n, stdin) >= 0)
        printf("%s", line);
    free(line);
    if (ferror(stdin)) {
        perror("getline");
        return (EXIT_FAILURE);
    }
    return (EXIT_SUCCESS);
}
I will call this program stupidcat to stay on brand, and it will behave like grep regarding output buffering:
$ c99 -o stupidcat stupidcat.c
$ cat test.txt - | ./stupidcat
a
b
c
d
^D # hangs until I close cat's standard input
$ cat test.txt - | ./stupidcat | sed ''
^D # hangs until I close cat's standard input
a
b
c
d
$ exit
So as it stands, my stupidcat is as stupid as grep, but I could add an interactive mode that would enforce a specific buffering mode and not rely on defaults. That would slightly increase the amount of code in stupidcat.c but nothing scary or groundbreaking.
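For illustration only, here is a minimal sketch of what that could look like, with a hypothetical -i flag (my own invention, not an option of any real tool) that forces line buffering on stdout through setvbuf():
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
main(int argc, char *argv[])
{
    char *line = NULL;
    size_t n = 0;
    /* hypothetical -i flag: force line buffering instead of the default */
    if (argc > 1 && strcmp(argv[1], "-i") == 0)
        setvbuf(stdout, NULL, _IOLBF, 0);
    while (getline(&line, &n, stdin) >= 0)
        printf("%s", line);
    free(line);
    if (ferror(stdin)) {
        perror("getline");
        return (EXIT_FAILURE);
    }
    return (EXIT_SUCCESS);
}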
When sensible defaults turn stupid
My stupidcat is stupid by design, but it’s only here to illustrate that this behavior is not grep’s fault. So all I need now is to tell grep to do line buffering for my non-tty interactive needs.
And that’s where things become really unpleasant. There is no such option, and to be fair not only grep but also sed and cat have this problem. Of all the standard utilities I have used so far, only cat has a -u option (which I assume stands for unbuffered).
Oddly enough, GNU cat defaults to unbuffered mode, and as a result the -u option is only supported to be ignored:
$ cat test.txt - | cat | cat
a
b
c
d
^D # hangs until I close the leftmost cat's standard input
$ exit
While I tend to be unhappy about GNU extensions because they hurt portability, I’m also not surprised that GNU grep has a --line-buffered option, nor am I surprised that GNU sed has a -u option (sounds familiar) aliased to --unbuffered (sounds familiar). In this case I have to side with GNU, though to be fair more often than not I am in GNU’s camp anyway, but I digress.
What appalls me is that POSIX 2017 still doesn’t have such an option for grep and sed, and that the FUTURE DIRECTIONS section of their manuals simply says None. There might be other pipeline-friendly commands lacking the ability to change their buffering behavior; there is one that I know of that I will mention later.
A pseudo solution
Another solution would be to have a program that pretends to be a tty to make other commands buffer on a line-by-line basis. Such a program could openpty or forkpty a pseudoterminal and answer yes to the existential isatty question.
In other words my script could instead look like this:
nmcli device monitor wlp2s0 |
pty grep connected | # pty now wraps grep, the command that needs fooling
while read line
do
    : # disable iptables workaround here
    if nmcli connection show --active | grep -q '^NAME\s'
    then
        : # enable iptables workaround here
    fi
done
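No such standard utility exists, but just to show the idea, here is a minimal sketch of what that hypothetical pty wrapper could look like, built on openpty (a BSD/Linux extension, not POSIX; link with -lutil on Linux). It keeps stdin untouched so the wrapped command can stay in the middle of a pipeline, and it skips everything a real implementation would need: terminal settings (the pty will translate \n into \r\n on output), signal forwarding and exit status propagation.
#include <sys/wait.h>
#include <pty.h>    /* openpty(); lives in <util.h> or <libutil.h> on the BSDs */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/* build on Linux with something like: cc -o pty pty.c -lutil */
int
main(int argc, char *argv[])
{
    char buf[BUFSIZ];
    int master, slave;
    ssize_t n;
    pid_t pid;
    if (argc < 2) {
        fprintf(stderr, "usage: pty command [argument...]\n");
        return (EXIT_FAILURE);
    }
    if (openpty(&master, &slave, NULL, NULL, NULL) == -1) {
        perror("openpty");
        return (EXIT_FAILURE);
    }
    switch (pid = fork()) {
    case -1:
        perror("fork");
        return (EXIT_FAILURE);
    case 0:
        /* child: keep stdin as it is, make stdout the pty slave */
        close(master);
        dup2(slave, STDOUT_FILENO);
        close(slave);
        execvp(argv[1], argv + 1);
        perror("execvp");
        _exit(127);
    }
    /* parent: relay everything written to the pty to our own stdout */
    close(slave);
    while ((n = read(master, buf, sizeof(buf))) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);
    close(master);
    waitpid(pid, NULL, 0);
    return (EXIT_SUCCESS);
}
With something like that in front of it, grep believes its standard output is a terminal, switches to line buffering, and the while loop sees the events as they happen.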
It turns out though that such solutions exist, like GNU stdbuf or the unbuffer command from the expect project, but suddenly you need a TCL runtime. I’m not saying that those programs solve this by pretending to be a tty, but I’m definitely saying that I didn’t find a standard POSIX utility to solve this problem.
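For the record, and assuming a GNU userland, the non-portable fixes look something like this (stdbuf relies on an LD_PRELOAD trick, so as far as I know it only affects dynamically linked programs that use stdio):
# GNU grep can be told explicitly to line-buffer its output
nmcli device monitor wlp2s0 | grep --line-buffered connected
# GNU coreutils' stdbuf can force a line-buffered stdout on another command
nmcli device monitor wlp2s0 | stdbuf -oL grep connected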
To solve this stupid problem, I can use grep --line-buffered connected and be done with it, but what if I want a portable solution to the general problem of interactive workloads not involving a tty?
An awk(ward) solution
A discussion on shell utilities mentioning grep and sed wouldn’t be complete if awk wasn’t somehow also covered, given the overlap that can exist between the three commands, so let me promptly remedy that.
As promised, I found another utility that suffers from full buffering when its output is not a tty, and that command is awk. While awk is very different from grep, and less different (but still quite different) from sed, my solution was to emulate my grep command with a small awk script.
The WTF moment
A quick search led me to the GNU awk documentation, and as usual you can always count on awk when a utility lets you down. According to the docs, all my problems can be solved with the fflush() function:
fflush([filename])
Flush any buffered output associated with filename, which is either a file opened for writing or a shell command for redirecting output to a pipe or coprocess.
[…]
POSIX standardizes fflush() as follows: if there is no argument, or if the argument is the null string (""), then awk flushes the buffers for all open output files and pipes.
Wondering why searching for the “buf” pattern in the POSIX awk manual yields nothing this time, I look for “flush” instead: still no result. I’m not sure where the GNU awk folks got the idea that this was in POSIX; maybe it was discussed and strongly considered, but it’s not in the POSIX 2017 docs.
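For what it’s worth, the non-portable fix suggested by the GNU awk documentation would look like this; it does work with gawk, but POSIX 2017 gives no guarantee that other implementations support fflush():
awk '/connected/ {print; fflush()}'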
More awkwardness
Since I was in the neighborhood I kept reading to find which knobs I had for controlling the standard output, and quickly came across a very awkward section:
Output Statements
Both print and printf statements shall write to standard output by default. The output shall be written to the location specified by output_redirection if one is supplied, as follows:
> expression
>> expression
| expression
In all cases, the expression shall be evaluated to produce a string that is used as a pathname into which to write (for '>' or ">>") or as a command to be executed (for '|').
Despite the weird formatting on the POSIX end, I sensed that I was onto something, so I checked the GNU awk and nawk manuals and came to the conclusion that I could emulate my grep statement like this:
awk '/connected/ {print | "cat"}'
In other words, awk forks a shell to execute the cat command the first time that pipe is used, each line from the nmcli output flows through cat (which GNU coreutils ships unbuffered), and whatever is left gets flushed when cat exits. And with that the cat is out of the bag: there is no global solution with awk, and if your script has multiple print statements you’ll need to pipe them all! (Or wrap that in a function, as sketched below.)
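Just to illustrate that last parenthesis, a sketch of such a wrapper (the function name is mine, pick any):
awk '
function say(line) {
    # every caller shares the same pipe, and therefore the same cat
    print line | "cat"
}
/connected/ {say($0)}
'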
Closing words
I can’t remember when I last ran into buffering problems with a shell script; it must have been at least a few years ago. The difference between then and now is that nowadays I try to stick to portable code even when portability is not a requirement, because I can always learn or rediscover something.
It’s a lot of waste to fork a shell and a cat process to emulate fflush(3), but that doesn’t bother me. The shell script doesn’t run a tight while loop, and connecting to or disconnecting from a wireless network is not something that happens continuously. I also love how it sounds when I casually mention that, to solve a silly networking problem, I forked a cat from a shell in an awk script: it sounds like nonsensical plain English you could find in something like Alice in Wonderland, but I digress.
The GNU awk manual also states that forking a shell should be enough to flush:
awk '/connected/ {print; system("")}'
Even though nawk behaves similarly, and GNU awk flushes without forking a shell for maximum efficiency, I see nothing of the sort in the POSIX awk manual, so I’ll stick to my guns and ignore this claim.
Lessons learned: non-free firmware is to be avoided like the plague (and much like the plague it’s kinda hard to avoid), you can’t always gawk at a manual and trust what it says, and POSIX 2017 failed to identify a decades-old common pitfall that can trap even experienced shell script authors.