Previously on Digressive Developer

In my previous post introducing my new tsess project I spent much more time exploring the Hare language and its reference implementation tool chain than on the actual introduction.

This was not an in-depth exploration of Hare, but more of a wide surface look at everything that stuck in my mind during my tsess journey. One thing in particular failed, and if I had paid attention it wouldn’t have.

Sorting out dependencies

In order to find dependencies the script contained the following function:

find_mods() {
	echo rt # always built
	hare deps -T +linux+x86_64 -d "$1" |
	awk '$2 == "->" {print $3 " " $1}' | # "foo"; "bar"
	tr -d '";' | # foo bar
	grep -Fv .ha | # only modules
	grep -wv rt | # already built first
	tsort
}

It lists dependencies in the dot format and arranges the output to be processed by tsort. With this simple trick I can build dependencies in the right order.
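
To make the trick concrete, here is a minimal sketch that does not need hare at all. The three edge lines are made up for illustration and only assume the "dependent" -> "dependency"; shape that the awk filter expects; sample_edges and build_order are hypothetical names, not part of the script.

```shell
# Hypothetical edges in the shape the awk filter expects:
# "dependent" -> "dependency";
sample_edges() {
	printf '%s\n' \
		'"strings" -> "encoding::utf8";' \
		'"encoding::utf8" -> "types";' \
		'"strings" -> "types";'
}

# Print each edge as "dependency dependent": tsort lists the left-hand
# item of a pair first, so dependencies come out before their dependents.
build_order() {
	awk '$2 == "->" {print $3 " " $1}' |
	tr -d '";' |
	tsort
}

sample_edges | build_order
# types
# encoding::utf8
# strings
```

Swapping $3 and $1 in the awk program yields the opposite order, dependents first.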

Notice however how the dependency order is reversed, with the awk script printing the dependency first and the dependent next. I reversed it because my first attempt, which printed the pairs in the natural order and reversed the tsort output with tac, ran into a missing dependency:

find_mods() {
	echo rt # always built
	hare deps -T +linux+x86_64 -d "$1" |
	awk '$2 == "->" {print $1 " " $3}' | # "foo" "bar";
	tr -d '";' | # foo bar
	grep -Fv .ha | # only modules
	grep -wv rt | # already built first
	tsort |
	tac
}

And this would result in this output:

$ ./hare_build.sh hello-print.ha && ./hello-print
Building mod rt
Building mod types
Building mod bytes
Building mod encoding::utf8
Building mod math
Building mod strings
Could not open module 'encoding::utf8': typedef variable $HARE_TD_encoding::utf8 not set

That’s when I noticed that I was exporting HARE_TD_encoding_utf8, and that trying to export the expected variable would not work. I was also not too fond of using tac, so I ended up reversing the tsort input with awk instead of its output with tac.

So when I got this result:

$ ./hare_build.sh hello-print.ha
Building mod rt
Building mod encoding::utf8
Building mod errors
Building mod types
Building mod math
Building mod bytes
Building mod time
Could not open module 'linux::vdso': typedef variable $HARE_TD_linux::vdso not set

My reaction should not have been to conclude that I had run into the wrong environment variable being exported (HARE_TD_linux_vdso) but really to ask myself where the devil this module comes from and why it wasn’t listed as a dependency.

I should have known, because I tried to submit a tsort format for the hare deps subcommand, but it got postponed until after a large update to the build driver. I knew then that the dependencies listing is not recursive, and I was planning to send a patch series with it eventually.
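
Since the listing is not recursive, the full set of modules has to be collected by walking the graph one module at a time. A minimal sketch of such a walk follows, assuming a made-up direct_deps table in place of one hare deps call per module; direct_deps, walk and visited are hypothetical names, and the dependencies in the table are for illustration only.

```shell
# Hypothetical direct dependencies, standing in for one
# non-recursive listing per module.
direct_deps() {
	case $1 in
	time) echo "linux::vdso" ;;
	linux::vdso) echo "linux format::elf" ;;
	*) ;; # leaf modules have no dependencies
	esac
}

visited=" "

# Depth-first walk that prints a module only after all of its
# dependencies, and never prints the same module twice.
walk() {
	case $visited in
	*" $1 "*) return ;; # already handled
	esac
	visited="$visited$1 "
	for dep in $(direct_deps "$1")
	do
		walk "$dep"
	done
	echo "$1"
}

walk time
# linux
# format::elf
# linux::vdso
# time
```

A module such as linux::vdso only shows up once its dependent is visited, which is why a single listing for the program cannot reveal it.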

I tweaked the script to incrementally add missing dependencies as I was trying to sort them out and eventually managed to manually link my hello-print.ha program. One thing that bothered me, though, was the somewhat specific order of .a files required on the ld command line to get a result.

Exporting problems

One thing that is not possible in a POSIX shell but that can be done from a POSIX shell is exporting variables like HARE_TD_encoding::utf8 to the environment.

It can’t be done in the shell’s environment, but it can be done for harec’s environment. I attempted just that with the help of the env command, and it worked.
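
The difference can be shown without harec. In this minimal sketch the inner env stands in for harec and simply prints the environment it receives; td_name, shell_can_export and child_env are hypothetical names, and encoding_utf8.td is just an example value.

```shell
td_name='HARE_TD_encoding::utf8'

# "::" is not valid in a shell variable name, so the shell's own export
# fails. The subshell keeps the failing built-in from ending the script.
shell_can_export() {
	(export "$td_name=encoding_utf8.td") 2>/dev/null
}

# env(1) hands NAME=value operands to the child as they are.
# The inner env stands in for harec and prints what it received.
child_env() {
	env "$td_name=encoding_utf8.td" env | grep -F "$td_name="
}

shell_can_export || echo "rejected by the shell"
child_env
# HARE_TD_encoding::utf8=encoding_utf8.td
```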

I ended up using a construct I’m not fond of:

env $HARE_TD harec -o "$td".ssa -N $(mod_ns "$1") -t $td.td $(mod_src "$1" ha)
[...]
HARE_TD="$HARE_TD HARE_TD_$1=$td.td"

But it had the merit of working.
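
Here is a minimal sketch of how that string accumulates and gets split again, with the inner env standing in for harec. The helper names add_td and show_child_env are hypothetical, and the module and file names are example values.

```shell
HARE_TD=

# Append one NAME=value word per module to the HARE_TD string.
add_td() {
	HARE_TD="$HARE_TD HARE_TD_$1=$2"
}

add_td rt rt.td
add_td encoding::utf8 encoding_utf8.td

# $HARE_TD is left unquoted on purpose: the shell splits it on
# whitespace into one operand per module before env sees it.
show_child_env() {
	env $HARE_TD env | grep '^HARE_TD_' | sort
}

show_child_env
# HARE_TD_encoding::utf8=encoding_utf8.td
# HARE_TD_rt=rt.td
```

The construct relies on word splitting, so it would break on a typedef path containing whitespace, which is a fair reason not to be fond of it.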

Putting everything together

At this point I wasn’t sure whether compiling all modules would be enough to successfully link the program. The dependencies listing ended up looking like this:

find_mods() {
	for mod
	do
		hare deps -T +linux+x86_64 -d "$mod"
	done |
	awk '$2 == "->" {print $1 " " $3}' | # "foo" "bar";
	tr -d '";' | # foo bar
	tsort |
	grep -Fv "$1" # only dependencies
}

Instead of giving rt special treatment here, the only thing to filter out is the module itself, the $1 parameter. The topological sort as populated by awk is back to the “natural” order.

The complete script looks like this:

#!/bin/sh

set -e
set -u

tag_filter() {
	grep -Pv '\+(?!linux|x86_64)'
}

mod_dir() {
	printf %s "/usr/src/hare/stdlib/$1" |
	sed 's|::|/|g'
}

mod_ns() {
	printf %s "$1" |
	sed 's|::|.|g'
}

mod_td() {
	printf %s "$1" |
	sed 's|::|_|g'
}

mod_src() {
	dir=$(mod_dir "$1")
	ext=$2
	find "$dir" -name "*.$ext" -type f |
	grep -v "$dir/[^+-].*/" | # do not enter submodules
	tag_filter
}

find_mods() {
	for mod
	do
		hare deps -T +linux+x86_64 -d "$mod"
	done |
	awk '$2 == "->" {print $1 " " $3}' | # "foo" "bar";
	tr -d '";' | # foo bar
	tsort |
	grep -Fv "$1" # only dependencies
}

build_mod() {
	td=$(mod_td "$1")

	test -f "$td.td" &&
	return

	for mod in $(find_mods "$1")
	do
		build_mod "$mod"
	done

	echo "Building mod $1"
	td=$(mod_td "$1")

	env $HARE_TD harec -o "$td".ssa -N $(mod_ns "$1") -t $td.td $(mod_src "$1" ha)
	qbe -o "$td".s "$td".ssa
	as -o "$td".o "$td".s

	for s in $(mod_src "$1" s)
	do
		as -o "$td-$(basename "$s" .s).o" "$s"
	done
	ar -r -c "$td".a "$td"*.o

	HARE_TD="$HARE_TD HARE_TD_$1=$td.td"
}

rm -f -- *.td *.ssa *.s *.o *.a

HARE_TD=
for mod in $(find_mods "$1") rt
do
	build_mod "$mod"
done

prog=$(basename "$1" .ha)

env $HARE_TD harec -o "$prog".ssa "$1"
qbe -o "$prog".s "$prog".ssa
as -o "$prog".o "$prog".s

ld.lld -T /usr/src/hare/stdlib/rt/hare.sc -o "$prog" "$prog".o *.a
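
The small helpers of the script can be tried in isolation. The sketch below copies mod_ns, mod_td and tag_filter from the script and feeds them example input; sample_files is a hypothetical name and the file names it prints are made up for illustration. Like the script, it assumes a grep that supports -P, such as GNU grep.

```shell
# Copied from the script: a module name becomes a namespace
# for harec -N or a prefix for the generated files.
mod_ns() { printf %s "$1" | sed 's|::|.|g'; }
mod_td() { printf %s "$1" | sed 's|::|_|g'; }

# Copied from the script: drop any path holding a +tag
# other than +linux or +x86_64.
tag_filter() {
	grep -Pv '\+(?!linux|x86_64)'
}

# Made-up file names, for illustration only.
sample_files() {
	printf '%s\n' \
		rt/abort.ha \
		rt/+linux/syscalls.ha \
		rt/+freebsd/syscalls.ha \
		rt/+x86_64/arch_jmp.ha \
		rt/+aarch64/arch_jmp.ha
}

mod_ns encoding::utf8; echo # encoding.utf8
mod_td encoding::utf8; echo # encoding_utf8
sample_files | tag_filter
# rt/abort.ha
# rt/+linux/syscalls.ha
# rt/+x86_64/arch_jmp.ha
```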

To solve the *.a ordering problem I tried other linkers. The default was bfd, gold had the same problem, and lld appeared to do just fine, so I picked lld and never tried to find the right command line options to ask the other linkers to unconfuse themselves.

Lo and behold, it works:

$ ./hare_build.sh hello-print.ha
Building mod rt
Building mod errors
Building mod types
Building mod bytes
Building mod encoding::utf8
Building mod strings
Building mod io
Building mod memio
Building mod path
Building mod format::elf
Building mod linux
Building mod types::c
Building mod linux::vdso
Building mod math
Building mod time
Building mod fs
Building mod bufio
Building mod os
Building mod ascii
Building mod strconv
Building mod fmt
$ ./hello-print
hello

When I linked the program manually I was able to get away with fewer modules, but that made a difference of only 1 kB on the final binary. If I add the --gc-sections option I get the exact same file size for both manual and scripted linking, but the programs are not identical.

I will not go down this rabbit hole, but I’m happy to report that it is possible to emulate the Hare tool chain with a shell script (and all the transparency that comes with it). I could probably also emulate hare deps instead of invoking it but I was merely interested in the build pipeline here.

Simple and accessible.