Good old UNIX bashing
I’m a Fedora user, and Fedora is a GNU/Linux system. While GNU stands for “GNU’s Not UNIX”, it is definitely a UNIX-like system and Linux is definitely a UNIX-like kernel. There are other UNIX-like systems in the wild, and there is even POSIX: the Portable Operating System Interface. Don’t ask me where the X fits in that acronym; my guess would be that it’s there to rhyme with UNIX, since POSIX is a UNIX-like family of standards that still carries stupid UNIX limitations from a different time in computing. POSIX has had decades to fix those limitations, so I blame POSIX for the one I will cover today.
But before that, let’s digress a bit.
My stupid dial-up modem
I already mentioned that my dial-up modem is built on Fedora, and coming from me that sounds amazing. The sad reality is that, like most hardware, my modem’s firmware is crap.
Bullshitware
I have no problem seeing the difference between hardware and software. It’s a rare occurrence for technical terms, but I find the French equivalent even better: the French word “logiciel” is constructed like “logicware”, which to me is more expressive than software. But software works for me regardless.
Then you have different “kinds” of software, like free software or open source as defined respectively by the Free Software Foundation (FSF) and the Open Source Initiative (OSI). The rest to me is bullshitware.
I suppose it was a trend from the last century to name varieties of software somethingware: freeware (of course not to be confused with free software, I mean, who would fall for that?) or shareware. Then there are more legitimate entries that still register high on my bullshit-o-meter: malware, ransomware or even bloatware.
Finally, my personal (least) favorite is firmware. Firmware is software that controls hardware. I would probably need a whole post to explain why firmware is bullshit, but in one sentence it’s simply software that pretends to be special.
My modem’s firmware
My modem was unusable when I unpacked it. It failed to fulfill its main job, which was to connect me to the internet. I had to call the manufacturer, only to be redirected to their unmaintained (likely on purpose) support system, where I was provided with a firmware update.
The firmware binary is hosted along with sources and build instructions. Everything is present except the vendor’s web interface. You can see severely outdated and vulnerable packages, and you can see the build instructions, which is where it gets really ugly for me: you need Fedora 5 to cross-compile the firmware.
When that particular model was released, I can assure you that Fedora already had two digits in its version numbers. So that is likely an unmaintained tool chain that they keep reusing from model to model. And don’t get me wrong, I’m not complaining about the act of reusing.
Stupid self-hosting
I’m going to need a new firmware update soon, and sadly there is nothing I can do besides hoping for the vendor to provide a patched firmware. My gut feeling tells me that this is a bug I could fix myself if I had access to the source code and the tool chain.
I wanted to self-host a service that, as it turns out, also earns a high stupidity rating, but one full-blown digression is enough. After all, this is a post about something POSIX; I could spend ages before getting to the point.
After some effort and a lot of complaining I finally have a hosting setup that is almost satisfying. I try to access it from a remote location, and it looks fine. When I try from my laptop at home, I get an invalid TLS certificate.
I configured the modem to forward TCP traffic on ports 80 and 443 to the host where my service is installed, so what went wrong?
Stupid firmware
Much like any “modern” modem or set-top box, the modem can be configured via a web interface available on the gateway’s address. From there I can configure the DSL connection, DNS resolvers, and some aspects of the router like port forwarding or DHCP leases.
The DHCP customizations offered are way too simplistic and I had to configure the server with a fixed IP address, opting out of DHCP, to even enable port forwarding reliably.
The thing I didn’t expect was that port forwarding wasn’t based on the IP address but on which side of the modem the request comes from. So inside my network any request to the public IP address is handled as if the gateway IP address was used. I checked the packets with tcpdump and Wireshark.
So when I try to reach my self-hosted service from inside my private network I’m presented with the TLS certificate of the modem’s web interface.
Working around stupidity
As I am waiting for my new shiny support ticket to find a solution, I worked around the problem with iptables. It’s not pretty, but it works. It works, but I don’t want to manually enable or disable the workaround every time I switch to a different network.
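I won’t reproduce my exact rules here, but for illustration the workaround can be as simple as a DNAT rule in the nat table’s OUTPUT chain on the laptop, rewriting locally generated traffic aimed at the public IP address so it goes straight to the server instead of hitting the modem’s web interface (the addresses below are made up):
# Hypothetical addresses: 203.0.113.7 is the public IP, 192.168.1.10 the server.
# Enable the workaround.
iptables -t nat -A OUTPUT -d 203.0.113.7 -p tcp -m multiport --dports 80,443 \
    -j DNAT --to-destination 192.168.1.10
# Disable the workaround by deleting the same rule.
iptables -t nat -D OUTPUT -d 203.0.113.7 -p tcp -m multiport --dports 80,443 \
    -j DNAT --to-destination 192.168.1.10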
It was a no-brainer for me that the solution would be a shell script that I would manage as a systemd unit. Since systemd has a tight integration with DBus, I thought it would be possible to have a unit triggered by DBus messages. I couldn’t find anything of the sort, but I didn’t search hard.
Then I looked at how to integrate with DBus from a shell script that would monitor certain messages: if systemd can’t trigger executions for me I might as well have a simple service where the shell script listens to the events I’m interested in. What I could find looked too complicated for my taste, so instead of listening to NetworkManager’s DBus API I simply used nmcli.
I’m also not too fond of nmcli; for example, nmcli connection monitor NAME won’t tell me when I connect to or disconnect from NAME. It tells me nothing when disconnected and only tells me this three times when the connection is established:
NAME: connection profile changed
Not very useful for what I’m trying to do…
Poking at nmcli
I finally find a combination of two commands to capture the events I’m interested in. First, nmcli connection show --active lets me know the current status, and nmcli device monitor wlp2s0 gives me connected and disconnected events for my wireless device.
At this point I’m still playing in my terminal and it’s trivial to process the output of both commands:
# wireless (dis)connected events
nmcli device monitor wlp2s0 | grep connected
# active NAME connection
nmcli connection show --active | grep -q '^NAME\s'
Now that it works I put a script together for my systemd service:
nmcli device monitor wlp2s0 |
grep connected |
while read line
do
    : # disable iptables workaround here
    if nmcli connection show --active | grep -q '^NAME\s'
    then
        : # enable iptables workaround here
    fi
done
I start the service and play with my connection while monitoring iptables rules and then… crickets. End of the longer-than-usual digression.
Interactive buffering
Finally, we can look at my POSIX peeve. Why is nothing happening? And why did it work in my terminal emulator?
It turns out to be a fairly easy answer that I knew but forgot after years of not running into this use case. Writing to the standard output is buffered by default: fully buffered in the general case, and line-buffered when writing to a tty. So what I was seeing from my terminal was very responsive, while the script feeding the while loop was waiting for grep to fill its buffer. It could have waited for a long, long time…
Conditional buffering
It’s rather easy to reproduce in a terminal. I can create a small file and feed it to grep along with my terminal input:
$ cat >test.txt
a
b
c
d
^D
$ cat test.txt - | grep .
a
b
c
d
^D # hangs until I close cat's standard input
$ exit
Now, if the output of the grep command is not a tty, events happen in a different order:
$ cat test.txt - | grep . | sed ''
^D # hangs until I close cat's standard input
a
b
c
d
$ exit
The grep command will perform full buffering since its output goes to the sed command’s input.
Standard buffering
At this point I could come to the conclusion that grep is a stupid utility since it assumes that only a tty implies an interactive workload. My problem with this assumption is that my workaround script sits in limbo because of that.
That conclusion would be slightly incorrect because this assumption is made by the C runtime. There is a function called setvbuf() to control the buffering behavior, and the defaults for the stdout stream are line-buffered for a tty and fully buffered otherwise.
I can even verify it with a simple program that will pipe stdin to stdout line by line:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
    char *line = NULL;
    size_t n = 0;
    while (getline(&line, &n, stdin) >= 0)
        printf("%s", line);
    free(line);
    if (ferror(stdin)) {
        perror("getline");
        return (EXIT_FAILURE);
    }
    return (EXIT_SUCCESS);
}
I will call this program stupidcat to stay on brand, and it will behave like grep regarding output buffering:
$ c99 -o stupidcat stupidcat.c
$ cat test.txt - | ./stupidcat
a
b
c
d
^D # hangs until I close cat's standard input
$ cat test.txt - | ./stupidcat | sed ''
^D # hangs until I close cat's standard input
a
b
c
d
$ exit
So as it stands, my stupidcat is as stupid as grep, but I could add an interactive mode that would enforce a specific buffering mode and not rely on defaults. That would slightly increase the amount of code in stupidcat.c but nothing scary or groundbreaking.
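For illustration only, here is a minimal sketch of what that could look like, with a hypothetical -i flag (my own invention, not an option of any real tool) that forces line buffering on stdout through setvbuf():
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
main(int argc, char *argv[])
{
    char *line = NULL;
    size_t n = 0;
    /* hypothetical -i flag: force line buffering instead of the default */
    if (argc > 1 && strcmp(argv[1], "-i") == 0)
        setvbuf(stdout, NULL, _IOLBF, 0);
    while (getline(&line, &n, stdin) >= 0)
        printf("%s", line);
    free(line);
    if (ferror(stdin)) {
        perror("getline");
        return (EXIT_FAILURE);
    }
    return (EXIT_SUCCESS);
}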
When sensible defaults turn stupid
My stupidcat is stupid by design, but it’s only here to illustrate that this behavior is not grep’s fault. So all I need now is to tell grep to do line buffering for my non-tty interactive needs.
And that’s where things become really unpleasant. There is no such option, and to be fair not only grep but also sed and cat have this problem. Of all the standard utilities I have used so far, only cat has a -u option (which I assume stands for unbuffered).
Oddly enough, GNU cat defaults to unbuffered mode, and as a result the -u option is only supported to be ignored:
$ cat test.txt - | cat | cat
a
b
c
d
^D # hangs until I close the leftmost cat's standard input
$ exit
While I tend to be unhappy about GNU extensions because they hurt portability, I’m also not surprised that GNU grep has a --line-buffered option, nor am I surprised that GNU sed has a -u option (sounds familiar) aliased to --unbuffered (sounds familiar). In this case I have to side with GNU, though to be fair more often than not I am in GNU’s camp anyway, but I digress.
What appalls me is that POSIX 2017 still doesn’t have such an option for grep and sed, and that the FUTURE DIRECTIONS section of their manuals simply says None. There might be other pipeline-friendly commands lacking the ability to change their buffering behavior; there is one that I know of that I will mention later.
A pseudo solution
Another solution would be to have a program that pretends to be a tty to make other commands buffer on a line-by-line basis. Such a program could openpty or forkpty a pseudoterminal and answer yes to the existential isatty question.
In other words my script could instead look like this:
nmcli device monitor wlp2s0 |
pty grep connected | # pty now wraps grep, the command that needs fooling
while read line
do
    : # disable iptables workaround here
    if nmcli connection show --active | grep -q '^NAME\s'
    then
        : # enable iptables workaround here
    fi
done
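No such standard utility exists, but just to show the idea, here is a minimal sketch of what that hypothetical pty wrapper could look like, built on openpty (a BSD/Linux extension, not POSIX; link with -lutil on Linux). It keeps stdin untouched so the wrapped command can stay in the middle of a pipeline, and it skips everything a real implementation would need: terminal settings (the pty will translate \n into \r\n on output), signal forwarding and exit status propagation.
#include <sys/wait.h>
#include <pty.h>    /* openpty(); lives in <util.h> or <libutil.h> on the BSDs */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/* build on Linux with something like: cc -o pty pty.c -lutil */
int
main(int argc, char *argv[])
{
    char buf[BUFSIZ];
    int master, slave;
    ssize_t n;
    pid_t pid;
    if (argc < 2) {
        fprintf(stderr, "usage: pty command [argument...]\n");
        return (EXIT_FAILURE);
    }
    if (openpty(&master, &slave, NULL, NULL, NULL) == -1) {
        perror("openpty");
        return (EXIT_FAILURE);
    }
    switch (pid = fork()) {
    case -1:
        perror("fork");
        return (EXIT_FAILURE);
    case 0:
        /* child: keep stdin as it is, make stdout the pty slave */
        close(master);
        dup2(slave, STDOUT_FILENO);
        close(slave);
        execvp(argv[1], argv + 1);
        perror("execvp");
        _exit(127);
    }
    /* parent: relay everything written to the pty to our own stdout */
    close(slave);
    while ((n = read(master, buf, sizeof(buf))) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);
    close(master);
    waitpid(pid, NULL, 0);
    return (EXIT_SUCCESS);
}
With something like that in front of it, grep believes its standard output is a terminal, switches to line buffering, and the while loop sees the events as they happen.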
It turns out though that such solutions exist, like GNU stdbuf or the unbuffer command from the expect project, but suddenly you need a TCL runtime. I’m not saying that those programs solve this by pretending to be a tty, but I’m definitely saying that I didn’t find a standard POSIX utility to solve this problem.
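For the record, and assuming a GNU userland, the non-portable fixes look something like this (stdbuf relies on an LD_PRELOAD trick, so as far as I know it only affects dynamically linked programs that use stdio):
# GNU grep can be told explicitly to line-buffer its output
nmcli device monitor wlp2s0 | grep --line-buffered connected
# GNU coreutils' stdbuf can force a line-buffered stdout on another command
nmcli device monitor wlp2s0 | stdbuf -oL grep connected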
To solve this stupid problem, I can use grep --line-buffered connected and be done with it, but what if I want a portable solution to the general problem of interactive workloads not involving a tty?
An awk(ward) solution
A discussion on shell utilities mentioning grep and sed wouldn’t be complete if awk wasn’t somehow also covered, given the overlap that can exist between the three commands, so let me promptly remedy that.
As promised, I found another utility that suffers from full buffering when its output is not a tty, and that command is awk. While awk is very different from grep, and less different (but still quite different) from sed, my solution was to emulate my grep command with a small awk script.
The WTF moment
A quick search led me to the GNU awk documentation, and as usual you can always count on awk when a utility lets you down. According to the docs, all my problems can be solved with the fflush() function:
fflush([filename])
Flush any buffered output associated with filename, which is either a file opened for writing or a shell command for redirecting output to a pipe or coprocess.
[…]
POSIX standardizes fflush() as follows: if there is no argument, or if the argument is the null string (""), then awk flushes the buffers for all open output files and pipes.
Wondering why searching for the “buf” pattern in the POSIX awk manual yields nothing this time, I look for “flush” instead: still no result. I’m not sure where the GNU awk folks got the idea that this was in POSIX; maybe it was discussed and strongly considered, but it’s not in the POSIX 2017 docs.
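For what it’s worth, the non-portable fix suggested by the GNU awk documentation would look like this; it does work with gawk, but POSIX 2017 gives no guarantee that other implementations support fflush():
awk '/connected/ {print; fflush()}'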
More awkwardness
Since I was in the neighborhood I kept reading to find which knobs I had for controlling the standard output, and quickly came across a very awkward section:
Output Statements
Both print and printf statements shall write to standard output by default. The output shall be written to the location specified by output_redirection if one is supplied, as follows:
> expression
>> expression
| expression
In all cases, the expression shall be evaluated to produce a string that is used as a pathname into which to write (for '>' or ">>") or as a command to be executed (for '|').
Despite the weird formatting on the POSIX end, I sensed that I was onto something, so I checked the GNU awk and nawk manuals and came to the conclusion that I could emulate my grep statement like this:
awk '/connected/ {print | "cat"}'
In other words, awk forks a shell to execute the cat command the first time that pipe is used, each line from the nmcli output flows through cat (which GNU coreutils ships unbuffered), and whatever is left gets flushed when cat exits. And with that the cat is out of the bag: there is no global solution with awk, and if your script has multiple print statements you’ll need to pipe them all! (Or wrap that in a function, as sketched below.)
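Just to illustrate that last parenthesis, a sketch of such a wrapper (the function name is mine, pick any):
awk '
function say(line) {
    # every caller shares the same pipe, and therefore the same cat
    print line | "cat"
}
/connected/ {say($0)}
'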
Closing words
I can’t remember when I last ran into buffering problems with a shell script; it must have been at least a few years ago. The difference between then and now is that nowadays I try to stick to portable code even when portability is not a requirement, because I can always learn or rediscover something.
It’s a lot of waste to fork a shell and a cat process to emulate fflush(3), but that doesn’t bother me. The shell script doesn’t run a tight while loop, and connecting to or disconnecting from a wireless network is not something that happens continuously. I also love how it sounds when I casually mention that, to solve a silly networking problem, I forked a cat from a shell in an awk script: it sounds like nonsensical plain English you could find in something like Alice in Wonderland, but I digress.
The GNU awk manual also states that forking a shell should be enough to flush:
awk '/connected/ {print; system("")}'
Even though nawk behaves similarly, and GNU awk flushes without forking a shell for maximum efficiency, I see nothing of the sort in the POSIX awk manual, so I’ll stick to my guns and ignore this claim.
Lessons learned: non-free firmware is to be avoided like the plague (and much like the plague it’s kinda hard to avoid), you can’t always gawk at a manual and trust what it says, and POSIX 2017 failed to identify a decades-old common pitfall that can trap even experienced shell script authors.