Previously on Digressive Developer
In my previous post introducing my new tsess project I spent much more time exploring the Hare language and reference implementation tool chain than the actual introduction.
This was not an exploration in depth of Hare, but more like a wide surface
look at everything that stuck in my mind in my tsess journey. One thing in
particular failed, and if I had paid attention it shouldn’t have.
Sorting out dependencies
In order to find dependencies the script contained the following function:
find_mods() {
    echo rt # always built
    hare deps -T +linux+x86_64 -d "$1" |
    awk '$2 == "->" {print $3 " " $1}' | # "foo"; "bar"
    tr -d '";' | # foo bar
    grep -Fv .ha | # only modules
    grep -wv rt | # already built first
    tsort
}
It lists dependencies in the dot format and arranges the output to be
processed by tsort. With this simple trick I can build dependencies in the
right order.
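To make the data flow concrete, here is a minimal sketch of the same pipeline fed with hand-written input instead of hare deps. The sample_deps function and its fmt, strings, bytes chain are hypothetical; they only mimic the "foo" -> "bar"; shape that the awk script expects.

```shell
# Hypothetical stand-in for `hare deps -d`: prints dot-style edges.
sample_deps() {
    cat <<'EOF'
digraph deps {
    "fmt" -> "strings";
    "strings" -> "bytes";
}
EOF
}

# Same steps as find_mods: keep the edge lines, print "dependency dependent",
# strip quotes and semicolons, then sort topologically.
order=$(
    sample_deps |
    awk '$2 == "->" {print $3 " " $1}' |
    tr -d '";' |
    tsort
)
echo "$order"
```

Since tsort prints the left item of each pair before the right one, putting the dependency on the left yields a list where every module comes after what it needs.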
Notice however how the dependency order is reversed with the awk script
printing the dependency first and the dependent next. I reversed this order
because initially I ran into a missing dependency doing it in the right order
and I used tac to reverse the order:
find_mods() {
    echo rt # always built
    hare deps -T +linux+x86_64 -d "$1" |
    awk '$2 == "->" {print $1 " " $3}' | # "foo" "bar";
    tr -d '";' | # foo bar
    grep -Fv .ha | # only modules
    grep -wv rt | # already built first
    tsort |
    tac
}
And this would result in this output:
$ ./hare_build.sh hello-print.ha && ./hello-print
Building mod rt
Building mod types
Building mod bytes
Building mod encoding::utf8
Building mod math
Building mod strings
Could not open module 'encoding::utf8': typedef variable $HARE_TD_encoding::utf8 not set
That’s when I noticed that I was exporting HARE_TD_encoding_utf8 and trying
to export the expected variable would not work. I was also not too fond of
using tac so I ended up reversing the tsort input with awk instead of its
output with tac.
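Both approaches give a valid build order, because reversing a topological order of a graph gives a topological order of the reversed graph. A small sketch with made-up module names, assuming a single dependency chain so that only one order is possible; reverse_lines is a hypothetical portable stand-in for tac.

```shell
# Hypothetical "dependent dependency" pairs, as printed by awk with $1 " " $3.
pairs='fmt strings
strings bytes'

# Portable stand-in for tac: print the lines of stdin in reverse order.
reverse_lines() {
    awk '{line[NR] = $0} END {for (i = NR; i > 0; i--) print line[i]}'
}

# Variant 1: sort the pairs as they are, then reverse the output.
with_tac=$(printf '%s\n' "$pairs" | tsort | reverse_lines)

# Variant 2: swap the two columns before sorting.
with_awk=$(printf '%s\n' "$pairs" | awk '{print $2 " " $1}' | tsort)

echo "$with_tac"
echo "$with_awk"
```

With more than one valid order, the two variants may print different but equally usable sequences.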
So when I got this result:
$ ./hare_build.sh hello-print.ha
Building mod rt
Building mod encoding::utf8
Building mod errors
Building mod types
Building mod math
Building mod bytes
Building mod time
Could not open module 'linux::vdso': typedef variable $HARE_TD_linux::vdso not set
My reaction should not have been to conclude that I had run into the wrong
environment variable being exported (HARE_TD_linux_vdso) but really to ask
myself where the devil this module comes from and why it wasn’t listed as a
dependency.
I should have known that, because I tried to submit a tsort format for the
hare deps subcommand but it got postponed until after a large update to the
build driver. And I knew then that the dependencies listing is not recursive;
I was planning to send a patch series eventually with it.
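Since the listing only covers direct dependencies, a complete build order needs a recursive walk. The sketch below shows the idea in plain POSIX shell; direct_deps is a hypothetical stand-in for one hare deps invocation and the module graph is made up.

```shell
# Hypothetical table of direct dependencies (one level only).
direct_deps() {
    case $1 in
    fmt) echo "strings io" ;;
    strings) echo "bytes" ;;
    io) echo "bytes" ;;
    esac
}

seen=" "

# Print $1 after all of its transitive dependencies, each module once.
walk() {
    case $seen in
    *" $1 "*) return ;; # already visited
    esac
    seen="$seen$1 "
    # The word list is expanded once, so the recursion cannot disturb it,
    # and "$1" is private to each call even without local variables.
    for dep in $(direct_deps "$1")
    do
        walk "$dep"
    done
    echo "$1"
}

order=$(walk fmt | paste -s -d ' ' -)
echo "$order"
```

This is a depth-first post-order traversal, which is another way to obtain a topological order without going through tsort.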
I tweaked the script to incrementally add missing dependencies as I was trying
to sort them out and eventually managed to manually link my hello-print.ha
program. One thing that bothered me though, was the somewhat specific order of
.a files on the ld command line to get a result.
Exporting problems
One thing that is not possible in a POSIX shell, but that can be done from
one, is exporting variables like HARE_TD_encoding::utf8 to the environment.
It can’t be done in the shell’s environment, but it can be done for harec’s
environment. I attempted just that with the help of the env command, and it
worked.
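The difference can be observed without harec. In this minimal sketch the variable value is made up and a second env invocation plays the role of the child process, printing the environment it received.

```shell
name='HARE_TD_encoding::utf8'

# A shell variable name cannot contain colons, so export is expected to
# fail. The subshell keeps a fatal builtin error from ending the script.
if (export "$name=encoding_utf8.td") 2>/dev/null
then
    echo "shell accepted $name"
else
    echo "shell rejected $name"
fi

# env hands the NAME=value pair to the child process untouched.
child_env=$(env "$name=encoding_utf8.td" env | grep -F "$name=")
echo "$child_env"
```

The environment itself is just a list of strings, which is why a name that the shell refuses can still reach the child.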
I ended up using a construct I’m not fond of:
env $HARE_TD harec -o "$td".ssa -N $(mod_ns "$1") -t $td.td $(mod_src "$1" ha)
[...]
HARE_TD="$HARE_TD HARE_TD_$1=$td.td"
But it had the merit of working.
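The construct relies on the shell splitting the unquoted $HARE_TD into one NAME=value argument per module. A sketch of the accumulation on its own, with a made-up build order and a second env standing in for harec:

```shell
# Same mangling as mod_td in the full script: "::" becomes "_".
mod_td() {
    printf %s "$1" | sed 's|::|_|g'
}

HARE_TD=
for mod in rt encoding::utf8 linux::vdso # made-up build order
do
    td=$(mod_td "$mod")
    HARE_TD="$HARE_TD HARE_TD_$mod=$td.td"
done

# $HARE_TD is deliberately unquoted: field splitting turns it into
# separate arguments before env sees them.
exported=$(env $HARE_TD env | grep '^HARE_TD_' | sort)
echo "$exported"
```

The obvious limitation is that a typedef path containing whitespace would be split as well, which is fine as long as the files live in the current directory under mangled module names.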
Putting everything together
At this point I wasn’t sure whether compiling all modules would be enough to successfully link the program. The dependencies listing ended up looking like this:
find_mods() {
    for mod
    do
        hare deps -T +linux+x86_64 -d "$mod"
    done |
    awk '$2 == "->" {print $1 " " $3}' | # "foo" "bar";
    tr -d '";' | # foo bar
    tsort |
    grep -Fv "$1" # only dependencies
}
Instead of giving rt a special treatment here, the only thing to filter out
is the module itself, the $1 parameter. The topological sort as populated by
awk is back to the “natural” order.
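One caveat with grep -Fv "$1" is that it drops every line containing the argument as a substring, not only the module itself. Whether this matters depends on the module names involved. The sketch below uses a made-up list of names to compare it with the -x option, which only matches whole lines.

```shell
# Made-up list of module names, one per line as tsort would print them.
sorted='errors
bufio
io'

substring=$(printf '%s\n' "$sorted" | grep -Fv io)   # also drops bufio
whole_line=$(printf '%s\n' "$sorted" | grep -Fxv io) # drops only io

echo "substring match keeps: $substring"
echo "whole-line match keeps: $whole_line"
```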
The complete script looks like this:
#!/bin/sh

set -e
set -u

# Drop paths carrying a tag other than +linux or +x86_64.
tag_filter() {
    grep -Pv '\+(?!linux|x86_64)'
}

# foo::bar -> /usr/src/hare/stdlib/foo/bar
mod_dir() {
    printf %s "/usr/src/hare/stdlib/$1" |
    sed 's|::|/|g'
}

# foo::bar -> foo.bar (namespace passed to harec -N)
mod_ns() {
    printf %s "$1" |
    sed 's|::|.|g'
}

# foo::bar -> foo_bar (base name of the generated files)
mod_td() {
    printf %s "$1" |
    sed 's|::|_|g'
}

# List the sources of module $1 with extension $2.
mod_src() {
    dir=$(mod_dir "$1")
    ext=$2
    find "$dir" -name "*.$ext" -type f |
    grep -v "$dir/[^+-].*/" | # do not enter submodules
    tag_filter
}

find_mods() {
    for mod
    do
        hare deps -T +linux+x86_64 -d "$mod"
    done |
    awk '$2 == "->" {print $1 " " $3}' | # "foo" "bar";
    tr -d '";' | # foo bar
    tsort |
    grep -Fv "$1" # only dependencies
}

build_mod() {
    td=$(mod_td "$1")
    test -f "$td.td" &&
    return # already built

    for mod in $(find_mods "$1")
    do
        build_mod "$mod"
    done

    echo "Building mod $1"
    td=$(mod_td "$1") # overwritten by the recursive calls above
    env $HARE_TD harec -o "$td".ssa -N $(mod_ns "$1") -t $td.td $(mod_src "$1" ha)
    qbe -o "$td".s "$td".ssa
    as -o "$td".o "$td".s
    for s in $(mod_src "$1" s)
    do
        as -o "$td-$(basename "$s" .s).o" "$s"
    done
    ar -r -c "$td".a "$td"*.o
    HARE_TD="$HARE_TD HARE_TD_$1=$td.td"
}

rm -f -- *.td *.ssa *.s *.o *.a

HARE_TD=
for mod in $(find_mods "$1") rt
do
    build_mod "$mod"
done

prog=$(basename "$1" .ha)
env $HARE_TD harec -o "$prog".ssa "$1"
qbe -o "$prog".s "$prog".ssa
as -o "$prog".o "$prog".s
ld.lld -T /usr/src/hare/stdlib/rt/hare.sc -o "$prog" "$prog".o *.a
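The three name-mangling helpers can be exercised on their own, without a Hare installation. A quick sketch using encoding::utf8 as the sample module; the helper bodies are copied from the script above.

```shell
mod_dir() { printf %s "/usr/src/hare/stdlib/$1" | sed 's|::|/|g'; }
mod_ns() { printf %s "$1" | sed 's|::|.|g'; }
mod_td() { printf %s "$1" | sed 's|::|_|g'; }

mod=encoding::utf8
dir=$(mod_dir "$mod") # where the sources are looked up
ns=$(mod_ns "$mod")   # namespace given to harec -N
td=$(mod_td "$mod")   # base name of the .td, .ssa, .s, .o and .a files

printf '%s\n' "$dir" "$ns" "$td"
```

Only the environment variable keeps the original :: spelling, which is the reason for the env detour described earlier.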
To solve the *.a ordering problem I tried other linkers. The default was
bfd, gold had the same problem, and lld appeared to do just fine so I
picked the latter and never tried to find the right command line options to
ask the other linkers to unconfuse themselves.
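For the GNU linkers, the usual remedy for order-sensitive archives is to wrap them in --start-group and --end-group, which asks the linker to rescan the group until no new undefined symbol gets resolved. Whether that is enough for this particular build is an assumption. The sketch below only prints the command line instead of running it; link_cmd is a hypothetical helper and the archive names are made up.

```shell
# Print (not run) a link command with the archives wrapped in a group.
# $1: program name; remaining arguments: archives, in any order.
link_cmd() {
    out=$1
    shift
    echo ld -T /usr/src/hare/stdlib/rt/hare.sc -o "$out" "$out.o" \
        --start-group "$@" --end-group
}

cmd=$(link_cmd hello-print rt.a fmt.a strings.a)
echo "$cmd"
```

Groups make the linker do more work on large archives, so picking a linker that does not care about the order remains a reasonable shortcut.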
Lo and behold, it works:
$ ./hare_build.sh hello-print.ha
Building mod rt
Building mod errors
Building mod types
Building mod bytes
Building mod encoding::utf8
Building mod strings
Building mod io
Building mod memio
Building mod path
Building mod format::elf
Building mod linux
Building mod types::c
Building mod linux::vdso
Building mod math
Building mod time
Building mod fs
Building mod bufio
Building mod os
Building mod ascii
Building mod strconv
Building mod fmt
$ ./hello-print
hello
When I linked the program manually I was able to get away with fewer modules,
but that made a difference of only 1kB on the final binary. If I add the
--gc-sections option I get the exact same file size for both manual and
scripted linking, but the programs are not identical.
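This kind of comparison can be done with wc and cmp. A sketch on two made-up files that stand in for the two binaries:

```shell
# Two made-up files of equal size but different content.
dir=$(mktemp -d)
printf 'abcd' > "$dir/manual"
printf 'abdc' > "$dir/scripted"

size_manual=$(wc -c < "$dir/manual" | tr -d ' ')
size_scripted=$(wc -c < "$dir/scripted" | tr -d ' ')

# cmp -s is silent and only reports through its exit status.
if cmp -s "$dir/manual" "$dir/scripted"
then
    identical=yes
else
    identical=no
fi

echo "sizes: $size_manual $size_scripted, identical: $identical"
rm -r "$dir"
```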
I will not go down this rabbit hole, but I’m happy to report that it is
possible to emulate the Hare tool chain with a shell script (and all the
transparency that comes with it). I could probably also emulate hare deps
instead of invoking it but I was merely interested in the build pipeline here.
Simple and accessible.