Updating my Hare tool chain
Considering the amount of spare time I can allocate to various projects, I
decided early on to build my own hare package, and only that, to learn Hare by
rewriting my long-gestated tsess project. Both qbe and harec were available on
Fedora, the latter as a snapshot.
Only recently did we finally get a 0.24.0 release of hare and harec, and the
long-awaited 1.2 release of qbe. And qbe is the reason why I was stuck for so
long on an ancient commit for my hare package.
With limited time to track local packages, I had to draw the line somewhere or
I would otherwise end up eating too much of my spare time on a regular basis.
Before the formal releases for the Hare tool chain, a release candidate of
harec landed in Fedora Rawhide (currently f40), but it relied on a snapshot of
qbe, in the absence of a proper release with the features needed by harec.
Luckily both qbe and harec packages would install just fine on Fedora 39, and
after updating my local hare packaging I was ready to update tsess to “modern”
Hare.
Cross compiling Hare code
Cross compilation is really easy when it comes to the hare build command.
When I started learning Hare it was only a matter of adding a -t option to
define the target architecture. Early on, I made my automake configuration
build upon autoconf’s --host option. The configure script for tsess would
look for a GCC tool chain for that architecture and fall back to a regular C
compiler.
It does not matter since the Hare tool chain produces static binaries, right?
I already covered this earlier…
Dynamic linking of Hare programs
Hare offers C interoperability, which sounds like a requirement in the systems
programming landscape. In that case it becomes possible to link with libc
and use a variety of services it provides, ranging from its memory allocator
to the resolution of domain or service names. To my knowledge, tsess would
benefit from none of these services as of today, but linking with libc still
creates new opportunities.
With dynamic linking it becomes possible, for example, to find memory leaks with Valgrind. Unfortunately, I found some, but I haven’t studied them yet.
It is rather easy to compile a dynamically linked tsess:
$ ./configure HAREFLAGS=-lc >/dev/null
$ make
CCLD gen_cmd
GEN bin/tsess/cli/gen/cmd.ha
CCLD gen_env
GEN bin/tsess/config/gen/env_vars.ha
CCLD gen_prop
GEN bin/tsess/config/gen/prop_defs.ha
CCLD tsess
7/7 tasks completed (100%)
GEN man/tsess.1
GEN man/tsess.5
$ file * | awk -F '[:,] *' '$2 ~ "ELF" {printf "%s: %s, %s, %s\n", $1, $3, $5, $6}'
gen_cmd: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
gen_env: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
gen_prop: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
tsess: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
Cross compilation of tsess with dynamic linking is a little more involved,
but not difficult on Fedora.
Cross compilation on Fedora
It is amazing to see how today’s tooling is making cross compilation so much easier. From package managers to emulation, it does not take much effort to get a working cross compilation environment on Fedora.
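The first thing needed for cross compilation is a tool chain. As I stated before, by default the configure script will fall back to a regular C compiler, which would work with Clang (a single Clang binary can typically target several architectures) but not with GCC (each GCC binary is built for one target):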
$ ./configure --host=aarch64-unknown-linux-gnu >/dev/null
$ grep ^CC= config.log
CC='gcc'
After installing the gcc-aarch64-linux-gnu package, the configure script will
find the right compiler out of the box:
$ ./configure --host=aarch64-unknown-linux-gnu >/dev/null
$ grep ^CC= config.log
CC='aarch64-linux-gnu-gcc'
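The lookup and its fallback can be sketched in a few lines of shell. This is not the actual logic of the tsess configure script, which is not reproduced in this post: the function names cross_prefix and pick_cc are made up for illustration, and dropping the "unknown" vendor field is my assumption, inferred only from the two results shown above.

```shell
# Hypothetical helpers, NOT taken from the tsess configure script.

# Assumption: the vendor field "unknown" is dropped from the host triplet,
# e.g. aarch64-unknown-linux-gnu becomes aarch64-linux-gnu.
cross_prefix() {
    printf '%s\n' "$1" | sed 's/-unknown-/-/'
}

# Print the cross GCC for a host triplet when it is found in PATH,
# otherwise fall back to the regular gcc.
pick_cc() {
    prefix=$(cross_prefix "$1")
    if command -v "$prefix-gcc" >/dev/null 2>&1; then
        printf '%s\n' "$prefix-gcc"
    else
        printf 'gcc\n'
    fi
}
```

Calling pick_cc aarch64-unknown-linux-gnu prints the cross compiler name only once the cross GCC package is installed, and gcc otherwise.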
In order to link with glibc, I need it installed in the tool chain’s sysroot, and the cross compiler has one configured by default:
$ gcc --print-sysroot
$ aarch64-linux-gnu-gcc --print-sysroot
/usr/aarch64-linux-gnu/sys-root
If you are wondering what the package ships in its sysroot, the answer is nothing:
$ rpm -ql gcc-aarch64-linux-gnu | grep sys-root
/usr/aarch64-linux-gnu/sys-root
The sys-root/ directory is empty, so I need to populate it.
Creating a Fedora sysroot on Fedora
At the heart of Fedora we find several critical components, and several of
them constitute its software distribution tool chain. Fedora is an RPM-based
Linux distribution, and the actual software distribution is operated by DNF.
DNF introduces the notion of remote package repositories. I would install the
gcc-aarch64-linux-gnu package with DNF if it wasn’t already a dependency of my
homemade hare package.
On top of it, another package called mock can be used to “take source RPMs
and build them in a chroot”. Mock does a lot more, but that’s literally how
it is described today:
$ rpm -q --qf '%{description}\n' mock
Mock takes an SRPM and builds it in a chroot.
Mock can help set up a sysroot for a limited number of architectures. As of
today that would be aarch64, i686, ppc64le, s390x and of course x86-64. As the
description suggests, mock is backed by the chroot facility (I am not sure
which one, between the chroot(2) system call and the chroot(1) program). Or
rather, it was backed by chroot, also known as its “simple” isolation backend.
It now defaults to systemd-nspawn and can perform the bootstrap from an OCI
image with podman (instead of installing all the packages from scratch).
In addition to this, DNF can work with packages on different architectures with the help of QEMU.
In other words, I can create a Fedora sysroot with a single command line
thanks to mock. In fact, I can even create sysroots for other RPM-based
distributions:
$ ls /etc/mock/templates/ | awk -F- '{print $1}' | sort | uniq
almalinux
amazonlinux
anolis
centos
circlelinux
custom
epel
eurolinux
fedora
mageia
navy
openeuler
openmandriva
opensuse
oraclelinux
rhel
rocky
I can imagine the amount of work it used to require to create and maintain a sysroot when we didn’t have such well-integrated tools.
For the sake of the exercise, I wanted to link tsess to both libc and
libcurl. There is nothing in tsess requiring anything from libcurl, but it
shows how easy it is to bring up a sysroot, and how fast it happens:
$ sudo rm -rf /var/*/mock/fedora-39-aarch64*/
$ time mock -r fedora-39-aarch64 --install glibc-devel curl-devel
INFO: Unable to build arch aarch64 natively on arch x86_64. Setting forcearch to use software emulation.
[...]
INFO:
Finish: run
real 4m59.646s
user 4m45.281s
sys 0m13.513s
I couldn’t figure out how to really remove the OCI image with podman: it
appeared to still be present after removing and pruning it. Regardless, in a
few minutes I had a working sysroot. Most of the time was spent downloading
metadata and packages over a DSL connection.
I ended up with a sysroot after next to no manual operations.
The only problem I have with the package management tool chain on Fedora is its lack of good and meaningful names. RPM is fine, it used to carry meaning before being turned into a backronym, but by that time it was well… well known? Then Yum, meaningless, but established. Why did DNF (is meaninglesser a word?) not keep the Yum name? Why is it still called Yum on RHEL? I don’t need an answer, those were rhetorical questions. And mock? Who calls a piece of software that has no relation to software testing mock? Probably somebody who wants to make sure it does not show up in search engine results.
I get it though, naming things is pretty damn hard, but it’s really a shame because as demonstrated, mock is a powerful tool.
Then again, it seems that meaningless names are the norm when it comes to package managers. Either that, or I don’t have the reference. The outcome is the same. See for example apt/aptitude, sbuild, pacman and zypper…
Cross compiling tsess
Now if I want to compile tsess for aarch64 on my x86-64 laptop, I need to
shoehorn my sysroot into my build system somehow. I could do it at configure
time, and I actually documented how to do this. My preferred solution is to
populate /usr/aarch64-linux-gnu/sys-root directly.
However, I treat /usr as immutable, so I don’t want to add anything inside.
The simplest solution I found requires privileges, but is otherwise free of
friction:
$ ls -1 /usr/aarch64-linux-gnu/sys-root
$ sudo mount --bind /var/lib/mock/fedora-39-aarch64/root /usr/aarch64-linux-gnu/sys-root
$ ls -1 /usr/aarch64-linux-gnu/sys-root
afs
bin
boot
builddir
dev
etc
home
installation-homedir
lib
lib64
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
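Since the bind mount hides whatever was in the directory and does not survive a reboot, I find it handy to check what state the sysroot is in before mounting. Here is a minimal sketch; the function name sysroot_state is hypothetical, and it assumes the mountpoint program from util-linux is available (it degrades to a plain emptiness check when it is not).

```shell
# Hypothetical helper: report the state of a sysroot directory as one of
# "missing", "mounted", "empty" or "populated".
sysroot_state() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo missing
    elif command -v mountpoint >/dev/null 2>&1 && mountpoint -q "$dir"; then
        # Something is already mounted here, for instance the mock root.
        echo mounted
    elif [ -z "$(ls -A "$dir")" ]; then
        echo empty
    else
        echo populated
    fi
}
```

Running sysroot_state /usr/aarch64-linux-gnu/sys-root should print empty before the bind mount and mounted after it. To undo the mount, sudo umount /usr/aarch64-linux-gnu/sys-root is enough.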
I’m now ready to build a tsess dynamically linked to both libc and libcurl,
even though only libc is relevant:
$ ./configure --host=aarch64-unknown-linux-gnu HAREFLAGS='-lc -lcurl' >/dev/null
$ make
CCLD gen_cmd
GEN bin/tsess/cli/gen/cmd.ha
CCLD gen_env
GEN bin/tsess/config/gen/env_vars.ha
CCLD gen_prop
GEN bin/tsess/config/gen/prop_defs.ha
CCLD tsess
13/13 tasks completed (100%)
GEN man/tsess.1
GEN man/tsess.5
$ file * | awk -F '[:,] *' '$2 ~ "ELF" {printf "%s: %s, %s, %s\n", $1, $3, $5, $6}'
gen_cmd: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
gen_env: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
gen_prop: x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
tsess: ARM aarch64, dynamically linked, interpreter /lib/ld-linux-aarch64.so.1
Only tsess is built for the aarch64 target because the other programs are
executed during the build, on the x86-64 build machine.
We can easily check that the Hare flags were honored:
$ readelf --dynamic tsess | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [libcurl.so.4]
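If I wanted to script that check, the library names can be extracted from the readelf output with sed. A small sketch, assuming the output format shown above; the function name needed_libs is made up for the occasion.

```shell
# Hypothetical helper: read `readelf --dynamic` output on stdin and print
# one shared library name per NEEDED entry.
needed_libs() {
    # Keep only the text between the square brackets of NEEDED lines.
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}
```

It would be used as readelf --dynamic tsess | needed_libs, and the result can then be fed to grep -x to look for a specific library.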
Wrapping up
In the end, I find it amazing how easy it is to cross compile a dynamically
linked tsess, so easy it feels like cheating. It really boils down to a
handful of commands:
- mock
- mount
- configure
- make
And that’s really it! It could have been a very short post, but I couldn’t help sprinkling a few digressions here and there.
This apparent simplicity builds upon a huge pile of complications:
- the autotools (autoconf, automake and libtool in this case)
- years of RPM history (the whole tool chain, including pieces I omitted)
- years of packaging refinement for cross GCC tool chains
- years of Linux kernel history (everything podman relies on)
The effective simplicity comes from some degree of cohesion between all the moving parts. Catch me on a bad day and I will describe it as a high degree of coupling between components poorly cobbled together, but the cohesion is still there regardless.
For example, on my machine RPM ships with a %configure macro that takes care
of applying system-wide configure flags recognized by any package relying on
the autotools. I strongly suspect that the packaging of cross compilers evolved
following autoconf’s conventions, and probably influenced autoconf in return.
The bottom line is that for this use case I only need to configure the bare minimum at each step:
- tell mock to install my build dependencies in an aarch64 root
- bind the aarch64 root directory to the cross compiler sysroot
- tell the configure script to target aarch64
- tell hare to link with libc (and for fun, libcurl)
And once again, that’s really it!
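Put together, the four steps fit in a short script. This is only a sketch: the commands and values are the ones used earlier in this post, but the run and cross_build functions and the DRY_RUN variable are hypothetical names of mine. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Values copied from the commands used in this post.
MOCK_ROOT=fedora-39-aarch64
SYSROOT=/usr/aarch64-linux-gnu/sys-root
HOST=aarch64-unknown-linux-gnu

# Hypothetical wrapper: print the command unless DRY_RUN=0 is set.
# The printed form loses the shell quoting; it is meant for reading only.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# The four steps, each one running only if the previous one succeeded.
cross_build() {
    run mock -r "$MOCK_ROOT" --install glibc-devel curl-devel &&
    run sudo mount --bind "/var/lib/mock/$MOCK_ROOT/root" "$SYSROOT" &&
    run ./configure --host="$HOST" "HAREFLAGS=-lc -lcurl" &&
    run make
}
```

Calling cross_build prints the four commands, and DRY_RUN=0 cross_build would actually run them from the tsess source tree.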
It’s amazing to see how our tools evolved and to see the concrete effect of
iterative refinements. The Hare tool chain is still young but it is already
quite usable and it shows that lessons from the past were effectively learned.
It also didn’t take too much effort from me to find a crowbar and retrofit it
in my familiar ecosystem, and just like that I can rebuild tsess in mere
minutes and deploy it to the ARM board I occasionally SSH into.
And that’s a wrap for a yak that won’t need a shaving for a while.