Updating my Hare tool chain

Given how little spare time I can allocate to my various projects, I decided early on to maintain my own hare package, and only that one, so I could learn Hare by rewriting my long-gestating tsess project. Both qbe and harec were already available on Fedora, the latter as a snapshot.

Only recently did we finally get the 0.24.0 release of hare and harec, along with the long-awaited 1.2 release of qbe. It was qbe that kept my hare package on an ancient commit for so long. With limited time to track local packages, I had to draw the line somewhere, or maintaining them would regularly eat too much of my spare time.

Ahead of the formal Hare tool chain releases, a release candidate of harec landed in Fedora Rawhide (currently f40). It relied on a qbe snapshot because the features harec needed were not yet part of a proper qbe release. Luckily, both the qbe and harec packages installed just fine on Fedora 39, and after updating my local hare packaging I was ready to port tsess to “modern” Hare.

Cross compiling Hare code

Cross compilation is really easy with the hare build command. When I started learning Hare, it was only a matter of adding a -t option to select the target architecture. Early on, I built my automake configuration on top of autoconf’s --host option. The configure script for tsess looks for a GCC tool chain for that architecture and falls back to a regular C compiler.

It does not matter since the Hare tool chain produces static binaries, right?

I already covered this earlier…

Dynamic linking of Hare programs

Hare offers C interoperability, which sounds like a requirement in the systems programming landscape. It makes it possible to link with libc and use a variety of the services it provides, from its memory allocator to the resolution of domain or service names. As far as I know, tsess benefits from none of these services today, but linking with libc still creates new opportunities.

Dynamic linking makes it possible, for example, to find memory leaks with Valgrind, since its Memcheck tool tracks allocations by replacing libc’s malloc and free. Unfortunately, I found some leaks, but I haven’t studied them yet.

It is rather easy to compile a dynamically linked tsess:

$ ./configure HAREFLAGS=-lc >/dev/null
$ make
  CCLD     gen_cmd
  GEN      bin/tsess/cli/gen/cmd.ha
  CCLD     gen_env
  GEN      bin/tsess/config/gen/env_vars.ha
  CCLD     gen_prop
  GEN      bin/tsess/config/gen/prop_defs.ha
  CCLD     tsess
7/7 tasks completed (100%)
  GEN      man/tsess.1
  GEN      man/tsess.5
$ file * | awk -F '[:,] *' '$2 ~ "ELF" {printf "%s: %s, %s, %s\n", $1, $3, $5, $6}'
gen_cmd: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
gen_env: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
gen_prop: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
tsess: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2

Cross compilation of tsess with dynamic linking is a little more involved, but not difficult on Fedora.

Cross compilation on Fedora

It is amazing to see how much easier today’s tooling makes cross compilation. Between package managers and emulation, it does not take much effort to get a working cross compilation environment on Fedora.

The first thing needed for cross compilation is a tool chain. As I stated before, by default the configure script falls back to a regular C compiler. That works for Clang, which can target other architectures out of the box, but not for GCC, which needs a separate build for each target:

$ ./configure --host=aarch64-unknown-linux-gnu >/dev/null
$ grep ^CC= config.log

After installing the gcc-aarch64-linux-gnu package, the configure script finds the right compiler out of the box:

$ ./configure --host=aarch64-unknown-linux-gnu >/dev/null
$ grep ^CC= config.log
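The probing described earlier can be sketched in plain shell: look for a GCC tool chain matching the --host triplet, and otherwise fall back to a regular C compiler. This is a hypothetical function, find_cross_cc, and not the actual logic of the tsess configure script. The only grounded detail is the vendor-less aarch64-linux-gnu- prefix used by Fedora’s gcc-aarch64-linux-gnu package:

```shell
# Hypothetical sketch, not the tsess configure script: pick a C compiler
# for a --host triplet such as aarch64-unknown-linux-gnu.
find_cross_cc() {
    host=$1                  # e.g. aarch64-unknown-linux-gnu
    cpu=${host%%-*}          # e.g. aarch64
    sys=${host#*-*-}         # e.g. linux-gnu (drops the vendor field)
    # Try the full triplet first, then the vendor-less triplet used by
    # Fedora's cross GCC packages (aarch64-linux-gnu-gcc).
    for cc in "$host-gcc" "$cpu-$sys-gcc"; do
        if command -v "$cc" >/dev/null 2>&1; then
            printf '%s\n' "$cc"
            return 0
        fi
    done
    # No cross GCC found: fall back to the regular C compiler.
    printf '%s\n' cc
}
```

With gcc-aarch64-linux-gnu installed, find_cross_cc aarch64-unknown-linux-gnu should print aarch64-linux-gnu-gcc. Without that package, it prints cc.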

In order to link with glibc, I need it installed in the tool chain’s sysroot, and the cross compiler has one configured by default:

$ gcc --print-sysroot
$ aarch64-linux-gnu-gcc --print-sysroot

If you are wondering what the package ships in its sysroot, the answer is nothing:

$ rpm -ql gcc-aarch64-linux-gnu | grep sys-root

The sys-root/ directory is empty, so I need to populate it.

Creating a Fedora sysroot on Fedora

At the heart of Fedora lie several critical components, some of which make up its software distribution tool chain. Fedora is an RPM-based Linux distribution, and the actual software distribution is handled by DNF, which introduces the notion of remote package repositories. I would have installed the gcc-aarch64-linux-gnu package with DNF if it weren’t already a dependency of my homemade hare package.

On top of that, another package called mock can be used to “take source RPMs and build them in a chroot”. Mock does a lot more, but that is literally how it describes itself today:

$ rpm -q --qf '%{description}\n' mock
Mock takes an SRPM and builds it in a chroot.

Mock can help set up a sysroot for a limited number of architectures: as of today, aarch64, i686, ppc64le, s390x and, of course, x86-64. As the description suggests, mock is backed by the chroot facility (I am not sure whether that means the chroot(2) system call or the chroot(1) program). Or rather, it used to be: chroot is now known as its “simple” isolation backend. Mock now defaults to systemd-nspawn, and it can bootstrap from an OCI image with podman instead of installing all the packages from scratch.

In addition to this, DNF can work with packages on different architectures with the help of QEMU.

In other words, I can create a Fedora sysroot with a single command line thanks to mock. In fact, I can even create sysroots for other RPM-based distributions:

$ ls /etc/mock/templates/ | awk -F- '{print $1}' | sort | uniq

I can only imagine how much work it took to create and maintain a sysroot before we had such well-integrated tools.

For the sake of the exercise, I wanted to link tsess to both libc and libcurl. Nothing in tsess requires libcurl, but it shows how easily and how quickly an extra library can be brought in:

$ sudo rm -rf /var/*/mock/fedora-39-aarch64*/
$ time mock -r fedora-39-aarch64 --install glibc-devel curl-devel
INFO: Unable to build arch aarch64 natively on arch x86_64. Setting forcearch to use software emulation.
Finish: run

real	4m59.646s
user	4m45.281s
sys	0m13.513s

I couldn’t find out how to really remove the OCI image with podman: it still appeared to be present after removing and pruning it. Regardless, I got a working sysroot in about five minutes, most of which was spent downloading metadata and packages over a DSL connection.

I ended up with a sysroot after next to no manual intervention.

The only problem I have with the package management tool chain on Fedora is its lack of good, meaningful names. RPM is fine: it used to carry meaning before being turned into a backronym, but by that time it was, well… well known? Then Yum, meaningless, but established. Why did DNF (is meaninglesser a word?) not keep the Yum name? Why is it still called Yum on RHEL? I don’t need an answer; that was a rhetorical question. And mock? Who calls a piece of software that has no relation to software testing mock? Probably somebody who wants to make sure it does not show up in search engine results.

I get it, though: naming things is pretty damn hard. It is still a real shame, because, as demonstrated, mock is a powerful tool.

Then again, meaningless names seem to be the norm when it comes to package managers. Either that, or I don’t get the references; the outcome is the same. See for example apt/aptitude, sbuild, pacman and zypper…

Cross compiling tsess

Now, to compile tsess for aarch64 on my x86-64 laptop, I need to shoehorn my sysroot into the build system somehow. I could do it at configure time, and I actually documented how to do this, but my preferred solution is to populate /usr/aarch64-linux-gnu/sys-root directly.

However, I treat /usr as immutable, so I don’t want to add anything to it. The simplest solution I found requires privileges, but is otherwise frictionless:

$ ls -1 /usr/aarch64-linux-gnu/sys-root
$ sudo mount --bind /var/lib/mock/fedora-39-aarch64/root /usr/aarch64-linux-gnu/sys-root
$ ls -1 /usr/aarch64-linux-gnu/sys-root

I’m now ready to build a tsess dynamically linked to both libc and libcurl, even though only libc is relevant:

$ ./configure --host=aarch64-unknown-linux-gnu HAREFLAGS='-lc -lcurl' >/dev/null
$ make
  CCLD     gen_cmd
  GEN      bin/tsess/cli/gen/cmd.ha
  CCLD     gen_env
  GEN      bin/tsess/config/gen/env_vars.ha
  CCLD     gen_prop
  GEN      bin/tsess/config/gen/prop_defs.ha
  CCLD     tsess
13/13 tasks completed (100%)
  GEN      man/tsess.1
  GEN      man/tsess.5
$ file * | awk -F '[:,] *' '$2 ~ "ELF" {printf "%s: %s, %s, %s\n", $1, $3, $5, $6}'
gen_cmd: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
gen_env: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
gen_prop: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
tsess: ARM aarch64, dynamically linked, interpreter /lib/ld-linux-aarch64.so.1

Only tsess is built for the aarch64 target. The other programs generate source files during the build (the GEN steps above), so they have to run on the build machine.
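The architecture reported by file comes from the e_machine field of the ELF header. The hypothetical elf_machine function below is a rough sketch of that check, reading the field with od. It assumes a little-endian ELF file inspected on a little-endian host, and it only knows the two machines involved here:

```shell
# Hypothetical sketch: print the architecture recorded in an ELF header.
# Assumes a little-endian ELF file read on a little-endian host.
elf_machine() {
    # Every ELF file starts with the 4-byte magic number 0x7f 'E' 'L' 'F'.
    if [ "$(head -c 4 "$1")" != "$(printf '\177ELF')" ]; then
        echo "$1: not an ELF file" >&2
        return 1
    fi
    # e_machine is a 16-bit field at byte offset 18 of the header.
    machine=$(od -An -tu2 -j18 -N2 "$1" | tr -d ' \n')
    case $machine in
        62) echo x86-64 ;;     # EM_X86_64
        183) echo aarch64 ;;   # EM_AARCH64
        *) echo "unknown ($machine)" ;;
    esac
}
```

Running it on gen_cmd, gen_env, gen_prop and tsess should agree with the file output above: x86-64 for the generators and aarch64 for tsess.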

We can easily check that the Hare flags were honored:

$ readelf --dynamic tsess | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libcurl.so.4]
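The same check could run from a build script by extracting the NEEDED entries from the readelf output. Here is a minimal sketch with two hypothetical helpers, needed_libs and check_needed, which assume the output format shown above, with library names between square brackets:

```shell
# Extract library names from `readelf --dynamic` output on stdin, e.g.
# "... (NEEDED)  Shared library: [libc.so.6]" becomes "libc.so.6".
needed_libs() {
    awk 'index($0, "(NEEDED)") {
        sub(/.*\[/, "")   # drop everything up to the opening bracket
        sub(/].*/, "")    # drop the closing bracket and what follows
        print
    }'
}

# Hypothetical helper: fail unless FILE declares every LIB as NEEDED.
check_needed() {
    file=$1
    shift
    libs=$(readelf --dynamic "$file" | needed_libs)
    for lib in "$@"; do
        if ! printf '%s\n' "$libs" | grep -Fqx -e "$lib"; then
            echo "$file: missing $lib" >&2
            return 1
        fi
    done
}
```

For example, check_needed tsess libc.so.6 libcurl.so.4 should succeed for the binary built above. It should fail with a message if HAREFLAGS was lost along the way.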

Wrapping up

In the end, I find it amazing how easy it is to cross compile a dynamically linked tsess, so easy it feels like cheating. It really boils down to a handful of commands:

  • mock
  • mount
  • configure
  • make

And that’s really it! It could have been a very short post, but I couldn’t help sprinkling a few digressions here and there.

This apparent simplicity builds upon a huge pile of complications:

  • the autotools (autoconf, automake and libtool in this case)
  • years of RPM history (the whole tool chain, including pieces I omitted)
  • years of packaging refinement for cross GCC tool chains
  • years of Linux kernel history (everything podman relies on)

The effective simplicity comes from some degree of cohesion between all the moving parts. Catch me on a bad day and I will describe it as a high degree of coupling between poorly cobbled-together components, but the cohesion is there regardless.

For example, on my machine RPM ships with a %configure macro that applies system-wide configure flags understood by any package relying on the autotools. I strongly suspect that the packaging of cross compilers evolved to follow autoconf’s conventions, and probably influenced autoconf in return.
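RPM can expand a macro on demand with rpm --eval, which is a convenient way to see those system-wide flags. The hypothetical configure_flags helper below keeps only the long options from the expansion. The exact list depends on the distribution and release, so no output is reproduced here:

```shell
# Hypothetical helper: list the long options that RPM's %configure
# macro passes to ./configure on this machine (requires rpm).
configure_flags() {
    rpm --eval '%configure' |
        tr ' \t\\' '\n\n\n' |   # one word per line; continuations vanish
        grep -e '^--'           # keep the --option[=value] words
}
```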

The bottom line is that for this use case I only need to configure the bare minimum at each step:

  • tell mock to install my build dependencies in an aarch64 root
  • bind the aarch64 root directory to the cross compiler sysroot
  • tell the configure script to target aarch64
  • tell hare to link with libc (and for fun, libcurl)
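The four steps above fit in a short shell function. This is only a sketch: the cross_build and run names and the DRY_RUN variable are hypothetical. The mock configuration name and the paths are the Fedora 39 ones used in this post:

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: cross_build ARCH HAREFLAGS PACKAGE...
cross_build() {
    arch=$1
    hareflags=$2
    shift 2
    chroot_name=fedora-39-$arch
    sysroot=/usr/$arch-linux-gnu/sys-root

    # Install the build dependencies in a foreign-architecture root.
    run mock -r "$chroot_name" --install "$@" || return
    # Expose that root as the cross compiler's default sysroot.
    run sudo mount --bind "/var/lib/mock/$chroot_name/root" "$sysroot" || return
    # Target the architecture and pass the link flags down to hare.
    run ./configure --host="$arch-unknown-linux-gnu" HAREFLAGS="$hareflags" || return
    run make
}
```

For the build above, DRY_RUN=1 cross_build aarch64 '-lc -lcurl' glibc-devel curl-devel prints the commands, and the same call without DRY_RUN runs them. In dry-run mode, the printed HAREFLAGS value is not re-quoted. Running the function twice stacks a second bind mount on the sysroot, so checking with mountpoint -q first would avoid that.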

And once again, that’s really it!

It’s amazing to see how our tools evolved, and to see the concrete effect of iterative refinements. The Hare tool chain is still young, but it is already quite usable and shows that lessons from the past were learned. It also didn’t take much effort to find a crowbar and retrofit it into my familiar ecosystem. Just like that, I can rebuild tsess in mere minutes and deploy it to the ARM board I occasionally SSH into.

And that’s a wrap for a yak that won’t need shaving for a while.