Best wishes for the new year(s)
2024 is finally here, and a new entry in this blog was long overdue. It’s not that I had nothing to write about; I always have something silly in store I could mention here. I simply had little time to spare, and chose to do other things in my spare time.
In these last three and a half years I have kept up to date with many things, which is an implicit requirement when choosing Fedora as an operating system, but one thing I failed to uphold was the maintenance of this blog. It was a little bit painful to deal with updates to Hugo and the Jane theme, but I somehow succeeded.
Since January is the time for new year’s resolutions, this year I made up two for myself:
- migrate my development workflow to tsess
- write at least one blog post per year
I’m so good at upholding resolutions that I have already succeeded at both. I can go back to extreme procrastination with a good conscience. See you next year.
Introducing tsess?
In 2023 I discovered the Hare programming language when a FOSDEM presentation was brought to my attention, although admittedly I don’t remember how. I suspect it was from LWN because I was lagging far behind on this feed, and nowadays I’m not even trying to follow it, but I digress.
My first reaction was of course to compare it to Rust, a language I have been following since version 0.4 as a serious contender to C. The philosophy behind Hare matches many opinions I forged over time, more of them, as it turned out, than Rust does. So I started reading about Hare and playing with it until it was time, six months later, to have a “real” project to effectively learn it.
I started musing about this tsess project when the tmuxinator package was orphaned in Fedora and eventually removed from the repositories. In late 2022, after two years of maintaining my own RPMs for tmuxinator and its dependencies outside of Fedora, I started tsess in Rust and failed in the context of self-imposed constraints. In August 2023 I decided to try again in Hare.
In 2023 I also authored my first Go project for work and among other things it involved crafting a simple parser, which is most of what I have done in Rust so far. The Hare tool chain shows similarities with Go’s and the Hare language itself looks like C enhanced by concepts found in both Rust and Go. Rewriting tsess in Hare became all the more interesting for me in that context.
And in the common Internet tradition, I have to explain why Vim is better than Emacs, or the other way around.
The Rust-Go-Hare showdown
Rust
I have been following Rust’s evolution for years now but never worked on a real project doing actual work in production somewhere. I really love Rust so all the negative things I’m going to say about it must be framed in this context: I really love Rust.
In a nutshell, Rust is to me a language that encodes resource management in its type system (and to some extent its standard library) and effectively turns the compiler into a static analyzer. For forcing me to think harder about ownership I am forever grateful.
Rust also has relatively straightforward C interoperability, but despite starting as a C-like language like so many others, it embraces functional programming constructs. The standard library is wide by design, providing a lot of variants for a lot of features. Likewise, the syntax sometimes offers several radically different ways to achieve the same thing, and interestingly the Clippy linter will steer you towards certain constructs.
Rust forces you to handle return values, elegantly generalizes null handling with its Option type, and offers pleasant higher-level error management with the Result type. It is also expression-oriented, which is refreshing coming from C.
In comparison, C is a simple statement-oriented imperative language. The simplicity of C is downright impressive, but the language is sometimes too simplistic and overall plagued with undefined behavior. Rust on the other hand is too complicated. I call it an imperacryptive language and my inability to fit it in my head is a problem, even assisted by tools. Mind you, there are bits of C that are equally cryptic, like the rules for struct layouts or the mere ability to come up with Duff’s device, but the things difficult to remember (for me) about C are things I rarely need or actively avoid (unfortunately you can’t just look away from undefined behavior).
I’m also not in line with the Rust community’s philosophy of having many crates “doing one thing and doing it well” à la UNIX™. I don’t like that cargo is both a dependency manager and a build system; this is the main ingredient in the recipe for supply chain attacks and generally a lack of separation of two critical concerns. Cargo also composes poorly with polyglot projects, like any language’s self-centered all-encompassing tool chain. This is true of Go and Hare as well, but to a lesser extent for the latter. I became familiar with this problem in a past life working on Java projects built with the Maven build system, but don’t get me started on this one or I will dial digressions up to eleven.
Go
Go in comparison is much simpler than Rust, and has an overall pleasant look and feel. Rust has a rather heavily ceremonial syntax; Go is very straightforward.
Like Rust, Go has very interesting design choices, but my main problem is the amount of magic that goes on behind the scenes. Rust is certainly not devoid of magic constructs, but in general I know what I’m doing or trying to do. With Go, most of the time all bets are off, as far as I’m concerned. It should be noted that I never tried to formally learn either Rust or Go, unlike Hare. In Go’s defense, before this first work project I had spent much less time looking at it than, say, Rust.
Error handling in Go is interesting too, but not to my liking. Still, I tried to adhere to the coding standard and do things the Go way because I was going to ultimately hand over this code to someone else for future maintenance, but then I ran into exotic error handling in the text/scanner package I relied on for my parser. Navigating the standard library was not always trivial either. For example, do I use the net package to work with CIDRs? What about the net/netip package?
The Go tool chain is clearly the worst of the three in my ranking. The way Go manages dependencies as Git repositories is horrendous; at least Cargo works with actual “crates” repositories. I understand some of the constraints behind the source-only compilation of mostly-statically-linked programs for both Rust and Go (and Hare!) but it doesn’t mean that I approve. At least Rust solves source code incompatibilities with an interesting system of editions. In Go, the resulting dependency hell appears to be universally solved with vendoring, and well, to me this only adds one more circle to this inferno. It gives software maintainers license not to carefully design interfaces. Can’t just upgrade to the latest version of X, or need a specific patch? Lock your dependency or vendor it.
Oh, and I don’t like ending up with a hello world weighing as much as a web browser once compiled. I may be exaggerating a little here, but man, Go programs are anything but lean.
Hare
Now Hare is a beast of its own, but one I found incredibly easy to tame. Hare looks like C, with Rust-like type decomposition, a Go-like tool chain, and everything pretty much toned down.
Hare is expression-oriented like Rust, and unlike Go. While the syntax looks ceremonial, with for example code blocks delimited by curly braces requiring a semi-colon after the closing brace, once you realize the block is just another expression it makes sense. So far Hare has the most consistent syntax of the three.
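To make it concrete, here is a minimal sketch, assuming standard Hare semantics and made-up values, of a block yielding a value into a binding:

use fmt;

export fn main() void = {
	// A block is just another expression, hence the trailing
	// semi-colon; it can even yield a value into a binding.
	let area: int = {
		const width = 6;
		const height = 7;
		yield width * height;
	};
	fmt::printfln("area: {}", area)!;
};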
In fact, Hare comes with a formal specification that I spent a lot of time reading on an on-and-off basis. In the roughly six months between the time I discovered Hare and the time I started writing tsess, I had probably read half of the document. I’m not sure whether this is me being more mature, the Hare specification being more accessible (than, say, the ISO C standards) or a case of all of the above, but it was instrumental in helping me fit Hare in my brain.
Between the specification and overall consistency of the language design, Hare managed to enter my slow brain faster than any other programming language I can remember learning. I found one minor inconsistency not worth mentioning, but I may revisit it in the future.
While consistency probably plays a role, my favorite aspect of Hare is the emphasis on simplicity. I feel that nowadays simplicity is underappreciated in the software landscape. Making and keeping a non-trivial system simple is no small feat. In that regard Rust failed utterly: it kept the core language small and simple, but that core is not the language we see, it’s the internal representation, the one the compiler “desugars” actual Rust code to. I think I eventually had a sugar overdose that distanced me from Rust. At the other end of the spectrum, C is so simple it tips over to being too simplistic in some areas. Meanwhile, Go has too much magic and I wasn’t able to get a good grasp on all the code I wrote.
Hare appears to draw inspiration from Rust in the error handling department. But instead of forcing an arbitrary Result type down our throat, it enables any type to become an error by marking it with a ! character. Like Rust, Hare requires error handling. Unlike Hare, Rust is explicit about handling everything (I mentioned Option already). Suffixing an expression with ! tells the compiler that it should be fail-safe, or otherwise abort. The ? suffix tells the compiler to return from the function if the return value is an error.
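To illustrate, here is a minimal sketch with a made-up failure type and a made-up fallible() function, showing both suffixes at work (none of this is from tsess):

use fmt;

// A type becomes an error when its declaration is marked
// with ! (failure and fallible() are made up for the example).
type failure = !void;

fn fallible(ok: bool) (int | failure) = {
	if (!ok) {
		return failure;
	};
	return 42;
};

// ? returns early with the error if fallible() fails.
fn twice(ok: bool) (int | failure) = fallible(ok)? * 2;

export fn main() void = {
	// ! asserts that the expression cannot fail, or aborts.
	fmt::printfln("{}", twice(true)!)!;
};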
One thing I would like to see added is a _ suffix to ignore the value of an expression, and by extension the error cases, effectively casting it to void:
if (something == wrong) {
	fmt::fprint(os::stderr, "Something went wrong!\n")_;
	os::exit(STATUS_SOMETHING_WRONG);
};
Ignoring a potential failure to write the error message would ensure that the program exits with the desired status, instead of aborting.
This can be done like this:
let ign = fmt::fprint(os::stderr, "Something went wrong!\n");
But I’m not sure whether the specification will evolve in a stricter direction where, for example, you can’t have unused variables. The problem with the PDF specification draft is the lack of a git commit reference to review changes since the revision I downloaded. I can probably date mine to roughly a year back.
It also becomes unwieldy to ignore multiple expressions in the same scope, something that a _ suffix could elegantly solve while still being explicit. Something you would catch in a code review to dispute or argue.
Now here is something to watch: I just had an idea and I need to follow my intuition right now. I will be back soon.
I’m back, thank you for your patience. Complaining about the unwieldiness of the construct above and my suggestion to make _ cast away to void gave me the idea to try just that:
fmt::fprint(os::stderr, "Something went wrong!\n"): void;
And well, it does what I need the way I suggested it without polluting the code, so I think I have no further complaints about error handling.
Damn.
Moving on, another “problem” with Hare is that it’s still under development, so things change. I have been there since Rust 0.4, but I’m also stuck on an ancient version of Hare because I decided to use the qbe package available on Fedora.
One concrete example is @noreturn functions, or in more recent versions of Hare, functions returning never:
fn some_function(arg: some_type) (void | error) = {
	let obj = foo::allocate(arg);
	defer foo::finish(&obj);
	bar::do_some(&obj)?;
	bar::do_more(&obj)?;
	bar::prepare(&obj)?;
	os::exec(&obj.cmd);
};
The defer keyword (inspired by Go?) ensures that obj is freed if one of the three functions from the bar module fails. The problem is that it also ensures that it is freed before calling os::exec(), which is not supposed to return. I haven’t been able to ponder this much, but I can see cases where a defer expression should be evaluated even when the program will not return, but I don’t know what kind of syntax could express both cases in a meaningful way. Maybe an always keyword to match the never type?
defer foo::rm_tmp_dir(): always;
One thing I love in Hare is the parity between logical and bitwise boolean operators. There are gaps like this one, or the lack of explicit endianness support in the C standards, that I find surprising to still see to this day. There is however one boolean operator that does not have a counterpart:
if (res is error) {
	// do something
};
There is a match keyword similar to Rust’s, not as ergonomic, but still very usable. Sometimes it is simpler to go with the is operator, and match is arguably overkill. I understand that is composes just fine with ! but for this very case I would arbitrate in favor of a little duplication:
if (!(this is less_readable)) {
	// do something
};

// vs

if (expr not convoluted) {
	// do something
};
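For reference, the match counterpart of the first test above would look like this, with the same hypothetical res and error names:

match (res) {
case error =>
	void; // do something
case =>
	void; // carry on
};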
I also really like that the break and continue keywords only operate on loops, and it’s probably for the best that falling through switch or match cases is not possible. The multi-modal for loop is bold, but so far there has always been a mode that fits my needs. There’s just one thing that irritates me about Rust, Go and Hare alike: what is wrong with the do-while style of loops? I’d love to see a single-mode do expression for ( expression ) kind of loop.
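In the meantime, the closest approximation I know is the single-condition for mode with a bottom test; a quick sketch, with a hypothetical finished condition:

// Emulating do-while: the body runs at least once and the
// "test" happens at the bottom of the loop.
for (true) {
	// do the work...
	if (finished) {
		break;
	};
};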
While I could complain about the Hare tool chain for being of the kind I do not like, I would rather point out that the hare command, unlike go and cargo, is a build driver without dependency management. It will only discover Hare modules and not endeavor to fetch them or introduce a new “standard” form of repository. And this is not by accident or lack of implementation so far, this is a stated design choice:
Dependencies
- We have a module system!
- But no package manager
- Use your distro’s package manager
- Choose your dependencies conservatively
- stdlib gets you most of the way there
How refreshing!
If I had to pick one word to summarize Hare, it would be thoughtful. I could go on praising or complaining about Hare, but at this rate I would never say a thing about tsess.
After this small digression it shouldn’t come as a surprise that Hare wins this showdown by a wide margin. I still love Rust and never really liked Go, but Hare was a major discovery for me in 2023, as big as Git and Varnish a little more than a decade ago.
Introducing tmuxinator
Now let’s rewind back forever ago, after I started using Linux as my main operating system. It was Ubuntu, and it was a mixture of pleasantness and frustration because I never really liked the apt ecosystem. I knew about Fedora but so many things were missing, a different kind of frustration, and yet I eventually bit the bullet. I became a Fedora contributor the day I realized I had all the skills needed to maintain some packages I was lacking, but I digress…
At some point in my Linux journey, I settled on a terminal emulator called terminator that I think was unmaintained at the time. It appears to exist as Gnome Terminator today. The whole deal is to have multiple terminals in the same window, and do interesting things with them.
At some point in my current day job, a former colleague tried to convince me to use tmux by repeatedly telling me to use it. I eventually got tired of it and started looking, and that’s when I ran into tmuxinator. I can only assume the name was a reference to both tmux and terminator but I can’t tell for sure. After setting up tmux key bindings to manage tmux panes almost like terminator terminals (I couldn’t use the Alt key for keyboard shortcuts) I had a close enough setup to replace my previous one. And on top of that my colleague would no longer bother me about tmux.
Once the muscle memory was successfully rewired, it was an overall win.
What does tmuxinator bring to the table? The management of tmux sessions from a declarative configuration read from YAML files that happen to also be ERB templates, via a command line interface.
I had been a happy tmuxinator user until it disappeared from the Fedora repositories. To bring it back on my machine I had to make an RPM and two more for Ruby gems that also went away. I have no Ruby programming experience and felt really unfit for maintenance on behalf of Fedora (even though it proved to be relatively low maintenance).
I’m not particularly interested in Ruby. Interpreted languages are generally not my vibe, I run away from systems prone to “magic” constructs, and my understanding is that Ruby is a very dynamic language that encourages magic syntax tricks.
Meeting the hare tool chain
The hare command I already mentioned mainly coordinates operations performed by other utilities, including the harec compiler that turns Hare code into QBE’s intermediate language. The hare command maintains a cache of build artifacts, its way of dealing with incremental rebuilds.
The pipeline goes like this:
- harec translates .ha files to a .ssa file
  - it may also produce a “typedef” .td file
- qbe translates the .ssa file to a .s file
- as compiles the .s file to a .o object file
  - the module may ship .s files too
- ar may collect object files into a .a archive
- rinse next module and repeat
- ld links .o files and .a archives
The .td file could be compared to a .h file in C: it’s just a Hare file containing the module’s exported types and symbols.
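As a purely hypothetical sketch, the .td file of a made-up foo module would only carry its exported declarations, along these lines:

// Hypothetical foo.td contents: exported types and symbols
// only, much like a C header.
export type error = !void;
export type config = struct {
	name: str,
	root: str,
};
export fn load(path: str) (config | error);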
If the Hare program is linking to shared objects, cc is used in place of ld for linking.
And finally, harec may compile multiple .ha files into a single .ssa (and .td) file because a translation unit in Hare is a module, so all the module files must be compiled at once. The hare program does a little bit more than just the above: it also manages a system of tags similar to Rust’s conditional compilation with the #[cfg_attr] attribute or cfg! macro, except that it takes the form of directories named after tag names, or files (Hare or assembly) suffixed with tag names.
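For example, a hypothetical module could combine both forms, with made-up file names:

foo/foo.ha            # always built
foo/io+libc.ha        # only with the +libc tag
foo/+linux/pipe.ha    # only when building for +linux
foo/+freebsd/pipe.ha  # only when building for +freebsd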
To illustrate the process, let’s write a hare_build.sh script with hard-coded tags:
#!/bin/sh

set -e
set -u

tag_filter() {
	grep -Pv '\+(?!linux|x86_64)'
}

mod_dir() {
	printf %s "/usr/src/hare/stdlib/$1" |
	sed 's|::|/|g'
}

mod_ns() {
	printf %s "$1" |
	sed 's|::|.|g'
}

mod_td() {
	printf %s "$1" |
	sed 's|::|_|g'
}

mod_src() {
	dir=$(mod_dir "$1")
	ext=$2
	find "$dir" -name "*.$ext" -type f |
	grep -v "$dir/[^+-].*/" | # do not enter sub-modules
	tag_filter
}

build_mod() {
	td=$(mod_td "$1")
	harec -o "$1".ssa -N $(mod_ns "$1") -t $td.td $(mod_src "$1" ha)
	qbe -o "$1".s "$1".ssa
	as -o "$1".o "$1".s
	for s in $(mod_src "$1" s)
	do
		as -o "$1-$(basename "$s" .s).o" "$s"
	done
	ar -r -c "$1".a "$1"*.o
	eval "export 'HARE_TD_$td=$td.td'"
}

find_mods() {
	echo rt # always built
	hare deps -T +linux+x86_64 -d "$1" |
	awk '$2 == "->" {print $3 " " $1}' | # "foo"; "bar"
	tr -d '";' | # foo bar
	grep -Fv .ha | # only modules
	grep -wv rt | # already built first
	tsort
}

rm -f -- *.td *.ssa *.s *.o *.a

for mod in $(find_mods "$1")
do
	echo "Building mod $mod"
	build_mod "$mod"
done

prog=$(basename "$1" .ha)
harec -o "$prog".ssa "$1"
qbe -o "$prog".s "$prog".ssa
as -o "$prog".o "$prog".s
ld -T /usr/src/hare/stdlib/rt/hare.sc -o "$prog" "$prog".o *.a
I still relied on hare deps to collect modules, but otherwise tried to decompose the logic to better illustrate how the build system is glued together, for the static linking case. It doesn’t honor the HAREPATH and probably a lot of other things expected from hare build.
Let’s see it in action with a basic program that does nothing, possibly the smallest legal Hare program:
$ cat hello-void.ha
export fn main() void = void;
$ ./hare_build.sh hello-void.ha
Building mod rt
$ ./hello-void && echo OK
OK
Next, let’s try a program that writes hello to its standard output:
$ cat hello-write.ha
use rt;
let msg: []u8 = ['h', 'e', 'l', 'l', 'o', '\n'];
export fn main() void = rt::write(1, &msg[0], len(msg))!: void;
$ ./hare_build.sh hello-write.ha && ./hello-write
Building mod rt
hello
And finally, let’s see it in action with the fmt module:
$ cat hello-print.ha
use fmt;
export fn main() void = fmt::print("hello\n")!: void;
$ ./hare_build.sh hello-print.ha
Building mod rt
Building mod encoding::utf8
Building mod errors
Building mod types
Building mod math
Building mod bytes
Building mod time
Could not open module 'linux::vdso': typedef variable $HARE_TD_linux::vdso not set
This shell experiment only goes so far. For modules below the top level it appears that harec expects environment variables with :: in their names, which is not legal for shell variables, at least not with a portable POSIX shell script. Expecting HARE_TD_linux_vdso would be cleaner, but it would also conflict with a top-level module called linux_vdso. I don’t see an obvious solution to this besides forbidding underscores in module names.
Overall the tool chain is very lean. I can build the QBE compiler backend in less than a minute on my machine, including the time needed to browse to its web site and download a release archive.
Most of the stack is standard off-the-shelf tools, and the hare command takes care of coordinating everything. The directory nature of translation units makes this harder to integrate in a Makefile, compared to the POSIX way of building C programs and libraries for example.
I even contributed a couple minor patches as I was playing with it and building my own RPM for hare.
Still, I proudly managed to commit the unforgivable in that department.
The GNU build system
The Autotools are a widely-hated set of tools that make up the GNU build system, and operate on top of the venerable make command. This is a radical departure from the general simplicity of the Hare ecosystem.
I wouldn’t be surprised if the autotools weren’t welcome in the Hare community after witnessing hostility towards them left and right over the last decade.
While I share a lot of grievances regarding the autotools, I stand by my choice. Even good old make has its share of problems, but to this day make is what offers the best level of abstraction for me. And automake has the best integration with its underlying make backend.
On top of that, it offers many services, ranging from the installation of files of different natures to a decent test driver and a form of standardization over a set of make targets. This is also the only build system I know that erases itself for distribution, requiring only a shell and make on top of the package’s tool chain for downstream consumers. Such downstream consumers can be your operating system’s package manager; for example, it integrates out of the box in an RPM spec.
I just wish it wasn’t oh so complicated underneath.
When I wrote my first toy project in Rust I decided to use the autotools and I was happy with the result, but I was also constantly fighting the Rust tool chain. For tsess I decided to throw in the towel and use cargo; I’d figure out the non-Rust bits later, but I never reached that point.
When I rebooted tsess in Hare, I ended up using the unholy trinity:
Autoconf
There isn’t much to say here. The Hare tool chain only brings a handful of dependencies, and I bring automake, libtool, hare and scdoc.
I wrote my own Autoconf macros for Hare configuration, but nothing worth dwelling on, besides maybe the creation of a HARE_ENV variable for the hare and harec commands, and the injection of the VPATH into HAREPATH.
One thing that is not easy to do is deriving the default HAREPATH from the hare command. For the ancient Hare tool chain I’m using, hare version -v does not inspire confidence, and I considered submitting a hare config subcommand inspired by pkg-config to probe the Hare tool chain:
DEFAULT_HAREPATH=$(hare config harepath)
But I will wait until QBE 1.2 is released and the qbe/harec packages are updated on Fedora. I’m not interested in maintaining more packages locally.
Libtool
Likewise, there isn’t much to say about Libtool except that once included it changes how Automake rules are put together, and incorporating the Libtool behemoth is what made the rest simpler.
How ironic.
Automake
Normally one would declare a list of programs or binaries, then exhaustively list their sources, and in theory things just work. Things may only just work when languages are known to Autoconf, which is obviously not the case here.
I managed to keep the complications for tsess relatively under control:
HARE_BUILD = env $(HARE_ENV) $(HARE) build
LINK = $(HARE_BUILD) $(HAREFLAGS) -t $(host_cpu) $($(@)_HAREFLAGS) -o $@

bin_PROGRAMS = tsess

tsess_SOURCES = \
	bin/tsess/attach/attach.ha \
	bin/tsess/cli/cli.ha \
	bin/tsess/config/env.ha \
	bin/tsess/config/error.ha \
	bin/tsess/config/list.ha \
	bin/tsess/config/prop.ha \
	bin/tsess/config/sess.ha \
	bin/tsess/lexer/lex.ha \
	bin/tsess/lexer/tok.ha \
	bin/tsess/parser/parse.ha \
	bin/tsess/parser/rule.ha \
	bin/tsess/text/pos.ha \
	bin/tsess/text/scan.ha \
	bin/tsess/text/text.ha \
	bin/tsess/tmux/proc.ha \
	bin/tsess/main.ha

nodist_tsess_SOURCES = \
	bin/tsess/cli/gen/cmd.ha \
	bin/tsess/config/gen/env_vars.ha

tsess_DEPENDENCIES = \
	$(tsess_SOURCES) \
	$(nodist_tsess_SOURCES)

tsess_HAREFLAGS = \
	-D 'TSESS_STRING="$(PACKAGE_STRING)"' \
	-D 'TSESS_VERSION="$(PACKAGE_VERSION)"'

tsess_LDADD = $(srcdir)/bin/tsess/main.ha
With Libtool the build is normally driven by COMPILE and LINK variables, and in turn they rely on the configured CC and CCLD variables. But if tsess_SOURCES contained .c files instead, it would implicitly populate tsess_OBJECTS with their .o counterparts.
As illustrated earlier, a Hare build is not as straightforward as turning individual C files into object files and linking them together, so nothing picks up the .ha files. This is why tsess_DEPENDENCIES has to reference the sources explicitly, to trigger $(LINK) commands.
And so only the LINK variable is overridden to invoke hare build, and it appears in the output as a CCLD operation, which is not inaccurate. This means that a Makefile.am file cannot contain rules for both Hare code and another language managed by the autotools like C, but it is still possible to have both in the same project in separate Makefiles.
It would be that simple if it weren’t for the nodist_tsess_SOURCES. Hare does not have a macro system like C or Rust and encourages code generation instead, so I decided to find a reason to justify having some.
VPATH hell
At this point it should be mentioned that many things that are not strictly necessary in tsess are there for learning purposes. For code generation, it’s not so clear cut, as I wanted to avoid disconnecting the documentation from the code.
With Automake the make VPATH contains the build directory and the source directory. They can be the same directory, which is usually the case when ./configure is executed. When the configure script is invoked from a different directory like ../tsess/configure, it results in $srcdir having a different value than $builddir, ../tsess in this example. This is why $(srcdir) is added escaped to the HAREPATH, so it’s expanded by make.
I don’t know what $builddir can be besides . and I suspect it only exists for parity. There are also “abs” and “top” variants of the VPATH components.
This means that for a single source tree there can be multiple concurrent build trees. For example one building with GCC and another with Clang. It also means that a “dist” archive, once unpacked, can be mounted read-only and not dirtied in a continuous integration environment (or fail the build if that happens).
Unfortunately there are some gotchas in the VPATH department. For example, consider the following suffix rule:

SUFFIXES = .1.scd .1
.1.scd.1:
	$(SCDOC) <$< >$@
If the source lives in the VPATH outside of the current directory, gmake will create the target in the current directory, but bmake will create it in the same directory as the source. If the source tree is read-only, or in general with Automake’s model, bmake’s behavior is a problem.
This is also a problem with tools that don’t support searching files from a PATH. In scdoc’s case there isn’t an include feature in the first place, so I used Autoconf instead to reconstruct the manuals from multiple fragments.
The downside of this approach is that, as far as make is concerned, the .scd files are always seen as updated after the configure step, so manual pages always need to be rebuilt. So even when building from a “dist” archive, scdoc fails to erase itself like the rest of the autotools, and there is no actual benefit to shipping the man pages.
It’s still possible for end users to erase scdoc at configure time:
$ tar xf tsess-0.1.tar.gz
$ cd tsess-0.1
$ ./configure SCDOC=/bin/true
Or at least it would be if scdoc didn’t strictly work with standard input and output: the result of $(SCDOC) <$< >$@ would be an empty manual. If it were possible to run $(SCDOC) -o $@ $< instead, this trick would have worked. I think this is a concrete example of too much simplicity making a system simplistic.
To work around these limitations, I wrote a script called mkmissing that rebuilds targets based on the checksums of their sources. So unless some scd file is patched, scdoc should never be needed during a downstream build from a release archive.
And I didn’t just wing it, I carefully crafted mkmissing. For some reason I really enjoy shell scripting, I can’t help it…
Other tools like hare can work with VPATH builds, but the way HAREPATH works adds a limitation: a module cannot be split across multiple components of the HAREPATH. This is why generated files land in gen sub-modules:
nodist_tsess_SOURCES = \
	bin/tsess/cli/gen/cmd.ha \
	bin/tsess/config/gen/env_vars.ha
Thankfully hare doesn’t enforce that sub-modules be sub-directories of their “parent” modules. There doesn’t seem to be a relationship beyond a logical hierarchy.
The idea here is to generate the model for environment variables and the command line interface based on what the manual says. This way the manual is authoritative and changes to the documentation are immediately effective, or at least actionable. The getopt module (as far as my ancient Hare goes) has poor ergonomics but an excellent design, so I was able to implement a thin wrapper on top to get my desired result:
$ tsess -h
Usage: tsess [-h] [-q|-v]
List available subcommands.
Options:
  -h  Print usage and exit.
  -q  Quiet mode.
  -v  Verbose mode.
Subcommands:
  attach   Attach to a session.
  list     List available sessions.
  version  Display the version number.
$ tsess -h attach
Usage: tsess attach [-L|-S] <name>
Attach to a session.
Options:
  -L socket-name  Alternative socket name of the tmux server.
  -S socket-path  Alternative socket path to the tmux server.
Parameters:
  name  The name of the session.
Throwaway Hare programs generate the Hare sources from the manual fragments, and that became a problem for cross compilation.
Cross compiling tsess
Hare currently supports Linux and FreeBSD, and apparently OpenBSD is being worked on. It also targets the three architectures supported by QBE: x86_64, aarch64 and riscv64.
Cross compiling tsess is incredibly easy:
$ ./configure --host=riscv64-unknown-linux-gnu >/dev/null
$ make
CCLD gen_cmd
GEN bin/tsess/cli/gen/cmd.ha
CCLD gen_env
GEN bin/tsess/config/gen/env_vars.ha
CCLD tsess
GEN man/tsess.1
GEN man/tsess.5
$ file gen_cmd gen_env tsess
gen_cmd: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, with debug_info, not stripped
gen_env: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, with debug_info, not stripped
tsess: ELF 64-bit LSB executable, UCB RISC-V, double-float ABI, version 1 (SYSV), statically linked, with debug_info, not stripped
That alone is quite the payoff.
I think it is even easier than cross compiling a C project with a mix of programs for the build and host architectures.
Part of this is solved by my local RPM packaging of Hare. In order to build Hare, a config.mk file is needed to specify the various commands for normal and cross builds. In the %prep step of the RPM build I generate the right configuration on the fly, based on the target architecture, with a Lua macro:
%prep
%autosetup -p1 -n %{name}-%{shortcommit}
tee config.mk <<'EOF'
PREFIX = %{_prefix}
BINDIR = %{_bindir}
MANDIR = %{_mandir}
SRCDIR = %{_usrsrc}
STDLIB = $(SRCDIR)/hare/stdlib
HAREPATH = $(SRCDIR)/hare/stdlib:$(SRCDIR)/hare/third-party
PLATFORM = %{_host_os}
ARCH = %{_arch}
HAREC = harec
HAREFLAGS =
QBE = qbe
AS = %{__as}
AR = %{__ar}
LD = %{__ld}
SCDOC = scdoc
HARECACHE = .cache
BINOUT = .bin
%{lua:
for arch in string.gmatch(macros.hare_arches, '%S+') do
	local host = arch.."-linux-gnu-"
	local host_as = "as"
	local host_ar = "ar"
	local host_cc = "cc"
	local host_ld = "ld"
	if arch == macros._arch then
		host = ""
		host_as = macros.__as
		host_ar = macros.__ar
		host_cc = macros.__cc
		host_ld = macros.__ld
	end
	print(string.upper(arch).."_AS = "..host..host_as.."\n")
	print(string.upper(arch).."_AR = "..host..host_ar.."\n")
	print(string.upper(arch).."_CC = "..host..host_cc.."\n")
	print(string.upper(arch).."_LD = "..host..host_ld.."\n")
end}
EOF
So all I need is to target the host architecture when “linking” Hare programs:
LINK = $(HARE_BUILD) $(HAREFLAGS) -t $(host_cpu) $($(@)_HAREFLAGS) -o $@
And code generators can override it with their hare flags:
gen_cmd_HAREFLAGS = -t $(build_cpu)
This way, there’s no need to worry about treating code generation like manual page generation. The Hare tool chain is a requirement anyway, unlike scdoc, which I made optional with the help of mkmissing.
Testing
Hare, like Go and Rust, comes with native support for unit testing, in the form of a @test attribute that is part of the language specification. I’m not planning to use the @test attribute in this project.
As a matter of fact, the 0.1 release of tsess ships with no automated tests. I could spend an entire post on why I don’t engage in unit testing and prefer to focus on functional testing, and why the proverbial test pyramid should be considered harmful, but I’m already spending an entire post on introducing tsess and I believe that so far it still hasn’t been formally introduced.
The bottom line is that I actually have plans for testing and I have figured out about fifty percent of the testing architecture. What’s certain is that it will rely on Automake’s default test driver.
For all its flaws, I’m convinced that adopting the GNU build system for tsess brought more value than liabilities.
From zero to zero dot one
The development of tsess happened in small bursts. It took me much less than a week’s worth of full time work to get to the first release. Except of course that if I had tried to do everything in one go it would likely have taken more time. Between bursts I would read or reread parts of the spec, read more code from the standard library or contribute trivial patches to the project. So I had time to let whatever I learned sink in or perform some experiments before putting them in practice on the tsess code base:
- August 23rd, one evening, built a working lexer
- August 27th, one evening, built a working parser
- September 2nd, one evening, ended up shaving the mkmissing yak
- October 5th, a train ride, started modeling configuration
- November 23rd, one evening, more work on configuration, first codegen
- December 4th, one evening, command line interface, more codegen
- December 14th, one afternoon, added a tmux controller
- December 15th, one morning, a minimum viable tmuxinator replacement
- December 31st, tsess 0.1 released
Well, most of it happened in December, and of course this project had to start with a parser.
Parsing the configuration file
I enjoy writing parsers. For my day job I worked on several occasions on the VCL parser and my first Rust project was a VCL preprocessor to experiment with the VCL syntax.
Why make up my own configuration file format when I could reuse an established format like tmuxinator did? The answer is two-fold: I wanted a configuration format really tailored to the tsess user experience, and it was a very good opportunity to evaluate Hare, since I have worked on more parsers than I can remember.
Here is for example the tmuxinator configuration of my Fedora “workspace”:
name: fedora
root: ~/fedora
windows:
<% for pkg in [
    'haproxy',
    'libslz',
    'makeself',
    'numatop',
    'python-funcparserlib',
    'python-webcolors',
    'ShellCheck',
    'tiptop',
    'vcsh',
    'vmod-querystring',
] %>
  - <%= "#{pkg}" %>:
      layout: main-horizontal
      root: ~/fedora/<%= "#{pkg}" %>
      panes:
        - vim <%= "#{pkg}" %>.spec
        - git status
<% end %>
Reminder, it’s an ERB template producing YAML.
In tsess my configuration file looks like this instead:
name = fedora
root = "~/fedora"
[@pkg@]
layout = main-horizontal
root = "@pkg@"
pane.editor.send-keys = << EOF
vim @pkg@.spec
EOF
pane.prompt.send-keys = << EOF
git status
EOF
[pkg:haproxy]
[pkg:libslz]
[pkg:makeself]
[pkg:numatop]
[pkg:python-funcparserlib]
[pkg:python-webcolors]
[pkg:ShellCheck]
[pkg:tiptop]
[pkg:vcsh]
[pkg:vmod-querystring]
Looking at my needs, I took the problem from the other end. Instead of enabling a programmatic definition of my repetitive windows and panes, I opted for a window template called pkg and applied the template to all my windows. It just so happens that I don’t need to further configure individual windows, but I could do something like this for example:
[pkg:special]
pane.prompt.split = above
In theory I should be able to write this and still send an implicit C-m key at the end:
pane.editor.send-keys = "vim @pkg@.spec"
But the distinction between single- and multi-line properties is missing from the 0.1 release. So instead I’m using the here-document syntax. It’s also possible to use single-quoted strings and here-documents to disable template expansion.
This is arguably over-engineered, but again, learning purposes… Also, it’s overall much less engineering than ERB templating (which can embed Ruby code) and, as long as it is not YAML, probably anything is better.
The configuration format is decently documented in the tsess(5) manual. The tsess(1) manual documents ~/.config/tsess/*.tsess as the default location for configuration files, but the tsess/ directory is currently omitted.
Working with tmux
Another thing that inspired tsess a long time ago happened by accident one day, innocently running ps. There, I found quite the unexpected line, with a single process from the ps output filling my entire terminal. Its command line started like this:
sh -c #!/bin/bash # Clear rbenv variables before starting tmux
After collecting the actual shell script with its newline characters present, I could witness the translation of the YAML configuration into tmux commands and shell if statements. I started looking at the os::exec module and it didn’t take long to get my bearings. Like the getopt module it is very well designed, but unlike the getopt module it also has excellent ergonomics, so I was ready to move on in no time.
When it was finally time to start interacting with a tmux server, I read the manual and found the control mode, which uses a simple text protocol to issue commands. Emphasis on simple!
Hare is one of those languages, like Rust, that only work with UTF-8 strings, so using tmux in control mode would be really no fun if there wasn’t a -u option to guarantee UTF-8 encoding. Thanks to that, it becomes really trivial to work with off-the-shelf goodies from the strings and bufio modules.
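As a rough sketch of what this looks like, here is a line-based loop reading from standard input instead of an actual tmux -C process, assuming the bufio::scanline() interface of the Hare version I’m pinned to:

use bufio;
use fmt;
use io;
use os;

export fn main() void = {
	for (true) {
		// scanline() allocates one line at a time,
		// without its line feed delimiter.
		match (bufio::scanline(os::stdin)!) {
		case io::EOF =>
			break;
		case let line: []u8 =>
			defer free(line);
			io::writeall(os::stdout, line)!;
			fmt::println()!;
		};
	};
};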
Working with strings
Another reason to start my tsess project was that I was increasingly itching to work on a parser that would not need to duplicate strings per token. I gave up quickly when I wrote my VCL preprocessor, as a Rust beginner. Writing my parser in Rust was incredibly pleasant, besides this failure.
To explain what I mean by not duplicating strings, consider the following contents to parse:
property = 'value'
The self-imposed constraint I insisted on keeping was to load the contents just once ("property = 'value'\n") and reference tokens in place. In C that’s impossible with null-terminated char * strings.
In Varnish Cache, we use the following data structure:
typedef struct {
	const char *b;
	const char *e;
} txt;
A token has some data, and a txt field to track the base and end of its text in the loaded source. In other words, a token would reference a substring of the source being parsed, in place. The important thing is of course that a token must not outlive its source material.
While C leaves me to my own devices, Rust on the other hand has the ability to express lifetimes of types in relation to others. So I decided that I wanted to parse tsess’s configuration in Rust with a single heap allocation.
I hit my head pretty hard several times on the advanced lifetimes wall, but I made progress. Approximately a year ago, I read this documentation page on advanced lifetimes and that was the breakthrough I needed.
Here is the commit message I wrote to myself back then:
lexer: Let lexer::Eval reference its text
The reason why the first attempt failed was the single lexer lifetime. Working with a &&str reference allows a distinction between the string lifetime and the shorter lexer lifetime.
In fact, an Eval<’s> should now be able to outlive the lexer that made it in the first place, since it’s only bound to the source’s lifetime.
The Eval implementation of the Display trait is replaced with Debug, and now that it carries its own text, it can escape it by itself.
As a result the main loop now looks like what it was meant to look like from the beginning, this time with great success.
And here is the diff of the main loop in this commit:
fn main() {
- for eval in lexer::Lexer::from(SRC) {
- match eval.input {
- Ok(_) => {
- print!("{} '", eval);
- for c in eval.text(SRC).chars() {
- print!("{}", c.escape_default())
- }
- print!("'\n");
- }
- Err(_) => println!("{}", eval),
- };
+ for eval in lexer::Lexer::from(&SRC) {
+ println!("{eval:?}")
}
}
Thank you, past me, for holding the hand of your future self. I couldn’t remember this without you. And I agree with you, the main loop became slightly more readable.
So after becoming a master of 'a, 'b: 'a lifetimes, I completed the lexical analyzer and the parser. Eventually, when it was time to load configuration files and stop working with a hard-coded str, I could not figure out how to get a &&str from a Box<str> to apply my mighty advanced lifetime scheme.
I tried many things and they all failed, and that’s when I heard about Hare for the first time. Hare is not as nice as Rust when it comes to massaging strings to do things with them, but Hare also doesn’t have half a dozen types to represent strings under various circumstances like Rust. I’m exaggerating on purpose here, but this is typically the kind of complication that makes Rust hard to fit into one brain without daily practice.
Hare works with UTF-8 strings. The str type is just a []u8 (slice of 8-bit unsigned integers) in disguise, with the promise that it is valid UTF-8. Hare insists for the most part on working with iterators when it comes to strings. I’m surprised, because it was my understanding that UTF-8 ensures that you always know whether you are at the boundary of a code point, but using indices to get substrings is not common practice. At least the [] operator is not available for str, but we can find byte-wise and rune-wise indexing in the strings module.
The preferred way to work with strings appears to be iterators. I wrote my own text module to implement functions like C’s strcspn() on top of iterators, and in general whatever convenience would make sense for the parser. I used it for the code generators as well.
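As an illustration, a strcspn() lookalike built on the strings iterator could look like this sketch; cspn() is a made-up name, it counts runes where C counts bytes, and it assumes the (rune | void) signature of strings::next() from my ancient tool chain:

use strings;

// Length of the initial segment of s that contains no rune
// from the reject set, in runes rather than bytes.
fn cspn(s: str, reject: str) size = {
	let it = strings::iter(s);
	let n = 0z;
	for (true) {
		match (strings::next(&it)) {
		case void =>
			break;
		case let r: rune =>
			if (strings::contains(reject, r)) {
				break;
			};
			n += 1;
		};
	};
	return n;
};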
The first release
I struggled with a couple of things, like Vim’s rendering being mangled during startup until a pane is resized; it looks like a SIGWINCH was enough. When I first successfully attached to my fedora session, it was only a matter of minutes to migrate my other sessions from tmuxinator to tsess.
The reason why I took care of a couple of TODOs in the documentation and performed the release on December 31st was simply to be able to say that this journey to publishing my first Hare project belonged exclusively to 2023. Yes, I’m quite vain.
I’m not expecting anyone to use it, but I will continue tsess’ development and reach the scope I have in mind before cutting a 1.0 release. Just in case, you may find it on Sourcehut.
Closing words
As for Hare, I find it to be an interesting language, striking an excellent balance on a spectrum going from C to Rust. I find the cursor between safety and agency to be where I want it most of the time, possibly where I should want it if I knew better. I just regret for now the lack of support for macros, because code generation is one thing, but there are other use cases; macros are for one how generics may be shoehorned into C, but I digress. Just like Rust, I will keep an eye on Hare, but unlike Rust I may play with Hare on a more regular basis, starting with exploring tsess until I can agree to make a 1.0 release.
I still haven’t formally introduced tsess, but I will assume that you can get the gist of it after seeing a configuration example and the command line usage. You can also Read The Fancy Manuals to learn more, but not much more.