Best wishes for the new year(s)
2024 is finally here, and a new entry in this blog was long overdue. It’s not that I had nothing to write about; I always have something silly in store I could mention here. I simply had little time to spare, and chose to do other things in my spare time.
In these last three and a half years I have kept up to date with many things, which is an implicit requirement when choosing Fedora as an operating system, but one thing I failed to uphold was the maintenance of this blog. It was a little bit painful to deal with updates to Hugo and the Jane theme, but I somehow succeeded.
Since January is the time for new year’s resolutions, this year I made up two for myself:
- migrate my development workflow to tsess
- write at least one blog post per year
I’m so good at upholding resolutions that I have already succeeded at both. I can go back to extreme procrastination with a good conscience. See you next year.
Introducing tsess?
In 2023 I discovered the Hare programming language when a FOSDEM presentation was brought to my attention, although admittedly I don’t remember how. I suspect it was from LWN because I was lagging far behind on this feed, and nowadays I’m not even trying to follow it, but I digress.
My first reaction was of course to compare it to Rust, a language I have been following since version 0.4 as a serious contender to C. The philosophy behind Hare matches many opinions I forged over time, more of them, as it turned out, than Rust does. So I started reading about Hare and playing with it until it was time, six months later, to have a “real” project to effectively learn it.
I started musing about this tsess project when the tmuxinator package was orphaned in Fedora and eventually removed from the repositories. In late 2022, after two years of maintaining my own RPMs for tmuxinator and its dependencies outside of Fedora, I started tsess in Rust and failed in the context of self-imposed constraints. In August 2023 I decided to try again in Hare.
In 2023 I also authored my first Go project for work and among other things it involved crafting a simple parser, which is most of what I have done in Rust so far. The Hare tool chain shows similarities with Go’s and the Hare language itself looks like C enhanced by concepts found in both Rust and Go. Rewriting tsess in Hare became all the more interesting for me in that context.
And in the common Internet tradition, I have to explain why Vim is better than Emacs, or the other way around.
The Rust-Go-Hare showdown
Rust
I have been following Rust’s evolution for years now but never worked on a real project doing actual work in production somewhere. I really love Rust so all the negative things I’m going to say about it must be framed in this context: I really love Rust.
In a nutshell, Rust is to me a language that encodes resource management in its type system (and to some extent its standard library) and effectively turns the compiler into a static analyzer. For forcing me to think harder about ownership I am forever grateful.
Rust also has relatively straightforward C interoperability, but despite starting as a C-like language like so many others, it embraces functional programming constructs. The standard library is wide by design, providing a lot of variants for a lot of features. Likewise, the syntax sometimes offers several radically different ways to achieve the same thing, and interestingly the Clippy linter will steer you towards certain constructs.
Rust forces you to handle return values, elegantly generalizes null handling with its Option type, and offers pleasant higher-level error management with the Result type. It is also expression-oriented, which is refreshing coming from C.
In comparison, C is a simple statement-oriented imperative language. The simplicity of C is downright impressive, but the language is sometimes too simplistic and overall plagued with undefined behavior. Rust on the other hand is too complicated. I call it an imperacryptive language and my inability to fit it in my head is a problem, even assisted by tools. Mind you, there are bits of C that are equally cryptic, like the rules for struct layouts or the mere ability to come up with Duff’s device, but the things difficult to remember (for me) about C are things I rarely need or actively avoid (unfortunately you can’t just look away from undefined behavior).
I’m also not in line with the Rust community’s philosophy of having many crates “doing one thing and doing it well” à la UNIX™. I don’t like that cargo is both a dependency manager and a build system; this is the main ingredient in the recipe for supply chain attacks and generally a lack of separation of two critical concerns. Cargo also composes poorly with polyglot projects, like any language’s self-centered all-encompassing tool chain. This is true of Go and Hare as well, but to a lesser extent for the latter. I became familiar with this problem in a past life working on Java projects built with the Maven build system, but don’t get me started on this one or I will dial digressions up to eleven.
Go
Go in comparison is much simpler than Rust, and has an overall pleasant look and feel. Rust has a rather heavily ceremonial syntax; Go is very straightforward.
Like Rust, Go has very interesting design choices, but my main problem is the amount of magic that goes on behind the scenes. Rust is certainly not devoid of magic constructs, but in general I know what I’m doing or trying to do. With Go, most of the time all bets are off, as far as I’m concerned. It should be noted that I never tried to formally learn either Rust or Go, unlike Hare. In Go’s defense, before this first work project I had spent much less time looking at it than, say, Rust.
Error handling in Go is interesting too, but not to my liking. Still, I tried to adhere to the coding standard and do things the Go way because I was going to ultimately hand over this code to someone else for future maintenance, but then I ran into exotic error handling in the text/scanner package I relied on for my parser. Navigating the standard library was not always trivial either. For example, do I use the net package to work with CIDRs? What about the net/netip package?
The Go tool chain is clearly the worst of the three in my ranking. The way Go manages dependencies as Git repositories is horrendous; at least Cargo works with actual “crates” repositories. I understand some of the constraints behind the source-only compilation of mostly-statically-linked programs for both Rust and Go (and Hare!) but it doesn’t mean that I approve. At least Rust solves source code incompatibilities with an interesting system of editions. In Go, the resulting dependency hell appears to be universally solved with vendoring, and well, to me this only adds one more circle to this inferno. It gives software maintainers license not to carefully design interfaces. Can’t just upgrade to the latest version of X, or need a specific patch? Lock your dependency or vendor it.
Oh, and I don’t like ending up with a hello world weighing as much as a web browser once compiled. I may be exaggerating a little here, but man, Go programs are anything but lean.
Hare
Now Hare is a beast of its own, but one I found incredibly easy to tame. Hare looks like C, with Rust-like type decomposition, a Go-like tool chain, and everything pretty much toned down.
Hare is expression-oriented like Rust, and unlike Go. While the syntax looks ceremonial, with for example code blocks delimited by curly braces requiring a semi-colon after the closing brace, once you realize the block is just another expression it makes sense. So far Hare has the most consistent syntax of the three.
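To make it concrete, here is a minimal sketch, assuming standard Hare semantics and made-up values, of a block yielding a value into a binding:

use fmt;

export fn main() void = {
	// A block is just another expression, hence the trailing
	// semi-colon; it can even yield a value into a binding.
	let area: int = {
		const width = 6;
		const height = 7;
		yield width * height;
	};
	fmt::printfln("area: {}", area)!;
};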
In fact, Hare comes with a formal specification that I spent a lot of time reading on an on-and-off basis. In the roughly six months between the time I discovered Hare and the time I started writing tsess, I had probably read half of the document. I’m not sure whether this is me being more mature, the Hare specification being more accessible (than, say, the ISO C standards) or a case of all of the above, but it was instrumental in helping me fit Hare in my brain.
Between the specification and overall consistency of the language design, Hare managed to enter my slow brain faster than any other programming language I can remember learning. I found one minor inconsistency not worth mentioning, but I may revisit it in the future.
While consistency probably plays a role, my favorite aspect of Hare is the emphasis on simplicity. I feel that nowadays simplicity is underappreciated in the software landscape. Making and keeping a non-trivial system simple is no small feat. In that regard Rust failed utterly: it kept the core language small and simple, but that core is not the language we see, it’s the internal representation, the one the compiler “desugars” actual Rust code to. I think I eventually had a sugar overdose that distanced me from Rust. At the other end of the spectrum, C is so simple it tips over to being too simplistic in some areas. Meanwhile, Go has too much magic and I wasn’t able to get a good grasp on all the code I wrote.
Hare appears to draw inspiration from Rust in the error handling department. But instead of forcing an arbitrary Result type down our throat, it enables any type to become an error by marking it with a ! character. Like Rust, Hare requires error handling. Unlike Hare, Rust is explicit about handling everything (I mentioned Option already). Suffixing an expression with ! tells the compiler that it should be fail-safe, or otherwise abort. The ? suffix tells the compiler to return from the function if the return value is an error.
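To illustrate, here is a minimal sketch with a made-up failure type and a made-up fallible() function, showing both suffixes at work (none of this is from tsess):

use fmt;

// A type becomes an error when its declaration is marked
// with ! (failure and fallible() are made up for the example).
type failure = !void;

fn fallible(ok: bool) (int | failure) = {
	if (!ok) {
		return failure;
	};
	return 42;
};

// ? returns early with the error if fallible() fails.
fn twice(ok: bool) (int | failure) = fallible(ok)? * 2;

export fn main() void = {
	// ! asserts that the expression cannot fail, or aborts.
	fmt::printfln("{}", twice(true)!)!;
};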
One thing I would like to see added is a _ suffix to ignore the value of an expression, and by extension the error cases, effectively casting it to void:
if (something == wrong) {
	fmt::fprint(os::stderr, "Something went wrong!\n")_;
	os::exit(STATUS_SOMETHING_WRONG);
};
Ignoring a potential failure to write the error message would ensure that the program exits with the desired status, instead of aborting.
This can be done like this:
let ign = fmt::fprint(os::stderr, "Something went wrong!\n");
But I’m not sure whether the specification will evolve in a stricter direction where, for example, you can’t have unused variables. The problem with the PDF specification draft is the lack of a git commit reference to review changes since the revision I downloaded. I can probably date mine to roughly a year back.
It also becomes unwieldy to ignore multiple expressions in the same scope, something that a _ suffix could elegantly solve while still being explicit. Something you would catch in a code review to dispute or argue.
Now here is something to watch: I just had an idea and I need to follow my intuition right now. I will be back soon.
I’m back, thank you for your patience. Complaining about the unwieldiness of the construct above and my suggestion to make _ cast away to void gave me the idea to try just that:
fmt::fprint(os::stderr, "Something went wrong!\n"): void;
And well, it does what I need the way I suggested it without polluting the code, so I think I have no further complaints about error handling.
Damn.
Moving on, another “problem” with Hare is that it’s still under development, so things change. I have been there since Rust 0.4, but I’m also stuck on an ancient version of Hare because I decided to use the qbe package available on Fedora.
One concrete example is @noreturn functions, or in more recent versions of Hare, functions returning never:
fn some_function(arg: some_type) (void | error) = {
	let obj = foo::allocate(arg);
	defer foo::finish(&obj);
	bar::do_some(&obj)?;
	bar::do_more(&obj)?;
	bar::prepare(&obj)?;
	os::exec(&obj.cmd);
};
The defer keyword (inspired by Go?) ensures that obj is freed if one of the three functions from the bar module fails. The problem is that it also ensures that it is freed before calling os::exec(), which is not supposed to return. I haven’t been able to ponder this much, but I can see cases where a defer expression should be evaluated even when the program will not return, but I don’t know what kind of syntax could express both cases in a meaningful way. Maybe an always keyword to match the never type?
defer foo::rm_tmp_dir(): always;
One thing I love in Hare is the parity between logical and bitwise boolean operators. There are gaps like this one, or the lack of explicit endianness support in the C standards, that I find surprising to still see to this day. There is however one boolean operator that does not have a counterpart:
if (res is error) {
	// do something
};
There is a match keyword similar to Rust’s, not as ergonomic, but still very usable. Sometimes it is simpler to go with the is operator, and match is arguably overkill. I understand that is composes just fine with ! but for this very case I would arbitrate in favor of a little duplication:
if (!(this is less_readable)) {
	// do something
};

// vs

if (expr not convoluted) {
	// do something
};
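For reference, the match counterpart of the first test above would look like this, with the same hypothetical res and error names:

match (res) {
case error =>
	void; // do something
case =>
	void; // carry on
};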
I also really like that the break and continue keywords only operate on loops, and it’s probably for the best that falling through switch or match cases is not possible. The multi-modal for loop is bold, but so far there has always been a mode that fits my needs. There’s just one thing that irritates me about Rust, Go and Hare alike: what is wrong with the do-while style of loops? I’d love to see a single-mode do expression for ( expression ) kind of loop.
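In the meantime, the closest approximation I know is the single-condition for mode with a bottom test; a quick sketch, with a hypothetical finished condition:

// Emulating do-while: the body runs at least once and the
// "test" happens at the bottom of the loop.
for (true) {
	// do the work...
	if (finished) {
		break;
	};
};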
While I could complain about the Hare tool chain for being of the kind I do not like, I would rather point out that the hare command, unlike go and cargo, is a build driver without dependency management. It will only discover Hare modules and not endeavor to fetch them or introduce a new “standard” form of repository. And this is not by accident or lack of implementation so far, this is a stated design choice:
Dependencies
- We have a module system!
- But no package manager
- Use your distro’s package manager
- Choose your dependencies conservatively
- stdlib gets you most of the way there
How refreshing!
If I had to pick one word to summarize Hare, it would be thoughtful. I could go on praising or complaining about Hare, but at this rate I would never say a thing about tsess.
After this small digression it shouldn’t come as a surprise that Hare wins this showdown by a wide margin. I still love Rust and never really liked Go, but Hare was a major discovery for me in 2023, as big as Git and Varnish a little more than a decade ago.
Introducing tmuxinator
Now let’s rewind back forever ago, after I started using Linux as my main operating system. It was Ubuntu, and it was a mixture of pleasantness and frustration because I never really liked the apt ecosystem. I knew about Fedora but so many things were missing, a different kind of frustration, and yet I eventually bit the bullet. I became a Fedora contributor the day I realized I had all the skills needed to maintain some packages I was lacking, but I digress…
At some point in my Linux journey, I settled on a terminal emulator called terminator that I think was unmaintained at the time. It appears to exist as Gnome Terminator today. The whole deal is to have multiple terminals in the same window, and do interesting things with them.
At some point in my current day job, a former colleague tried to convince me to use tmux by repeatedly telling me to use it. I eventually got tired of it and started looking, and that’s when I ran into tmuxinator. I can only assume the name was a reference to both tmux and terminator but I can’t tell for sure. After setting up tmux key bindings to manage tmux panes almost like terminator terminals (I couldn’t use the Alt key for keyboard shortcuts) I had a close enough setup to replace my previous one. And on top of that my colleague would no longer bother me about tmux.
Once the muscle memory was successfully rewired, it was an overall win.
What does tmuxinator bring to the table? The management of tmux sessions from a declarative configuration read from YAML files that happen to also be ERB templates, via a command line interface.
I had been a happy tmuxinator user until it disappeared from the Fedora repositories. To bring it back on my machine I had to make an RPM and two more for Ruby gems that also went away. I have no Ruby programming experience and felt really unfit for maintenance on behalf of Fedora (even though it proved to be relatively low maintenance).
I’m not particularly interested in Ruby. Interpreted languages are generally not my vibe, I run away from systems prone to “magic” constructs, and my understanding is that Ruby is a very dynamic language that encourages magic syntax tricks.
Meeting the hare tool chain
The hare command I already mentioned mainly coordinates operations performed by other utilities, including the harec compiler that turns Hare code into QBE’s intermediate language. The hare command maintains a cache of build artifacts, its way of dealing with incremental rebuilds.
The pipeline goes like this:
- harec translates .ha files to a .ssa file
  - it may also produce a “typedef” .td file
- qbe translates the .ssa file to a .s file
- as compiles the .s file to a .o object file
  - the module may ship .s files too
- ar may collect object files into a .a archive
- rinse next module and repeat
- ld links .o files and .a archives
The .td file could be compared to a .h file in C: it’s just a Hare file containing the module’s exported types and symbols.
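As a purely hypothetical sketch, the .td file of a made-up foo module would only carry its exported declarations, along these lines:

// Hypothetical foo.td contents: exported types and symbols
// only, much like a C header.
export type error = !void;
export type config = struct {
	name: str,
	root: str,
};
export fn load(path: str) (config | error);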
If the Hare program is linking to shared objects, cc is used in place of ld for linking.
And finally, harec may compile multiple .ha files into a single .ssa (and .td) file because a translation unit in Hare is a module, so all the module files must be compiled at once. The hare program does a little bit more than just the above: it also manages a system of tags similar to Rust’s conditional compilation with the #[cfg_attr] attribute or cfg! macro, except that it takes the form of directories named after tag names, or files (Hare or assembly) suffixed with tag names.
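For example, a hypothetical module could combine both forms, with made-up file names:

foo/foo.ha            # always built
foo/io+libc.ha        # only with the +libc tag
foo/+linux/pipe.ha    # only when building for +linux
foo/+freebsd/pipe.ha  # only when building for +freebsd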
To illustrate the process, let’s write a hare_build.sh script with hard-coded tags:
#!/bin/sh

set -e
set -u

tag_filter() {
	grep -Pv '\+(?!linux|x86_64)'
}

mod_dir() {
	printf %s "/usr/src/hare/stdlib/$1" |
	sed 's|::|/|g'
}

mod_ns() {
	printf %s "$1" |
	sed 's|::|.|g'
}

mod_td() {
	printf %s "$1" |
	sed 's|::|_|g'
}

mod_src() {
	dir=$(mod_dir "$1")
	ext=$2
	find "$dir" -name "*.$ext" -type f |
	grep -v "$dir/[^+-].*/" | # do not enter sub-modules
	tag_filter
}

build_mod() {
	td=$(mod_td "$1")
	harec -o "$1".ssa -N $(mod_ns "$1") -t $td.td $(mod_src "$1" ha)
	qbe -o "$1".s "$1".ssa
	as -o "$1".o "$1".s
	for s in $(mod_src "$1" s)
	do
		as -o "$1-$(basename "$s" .s).o" "$s"
	done
	ar -r -c "$1".a "$1"*.o
	eval "export 'HARE_TD_$td=$td.td'"
}

find_mods() {
	echo rt # always built
	hare deps -T +linux+x86_64 -d "$1" |
	awk '$2 == "->" {print $3 " " $1}' | # "foo"; "bar"
	tr -d '";' | # foo bar
	grep -Fv .ha | # only modules
	grep -wv rt | # already built first
	tsort
}

rm -f -- *.td *.ssa *.s *.o *.a

for mod in $(find_mods "$1")
do
	echo "Building mod $mod"
	build_mod "$mod"
done

prog=$(basename "$1" .ha)
harec -o "$prog".ssa "$1"
qbe -o "$prog".s "$prog".ssa
as -o "$prog".o "$prog".s
ld -T /usr/src/hare/stdlib/rt/hare.sc -o "$prog" "$prog".o *.a
I still relied on hare deps to collect modules, but otherwise tried to decompose the logic to better illustrate how the build system is glued together, for the static linking case. It doesn’t honor the HAREPATH and probably a lot of other things expected from hare build.
Let’s see it in action with a basic program that does nothing, possibly the smallest legal Hare program:
$ cat hello-void.ha
export fn main() void = void;
$ ./hare_build.sh hello-void.ha
Building mod rt
$ ./hello-void && echo OK
OK
Next, let’s try a program that writes hello to its standard output:
$ cat hello-write.ha
use rt;
let msg: []u8 = ['h', 'e', 'l', 'l', 'o', '\n'];
export fn main() void = rt::write(1, &msg[0], len(msg))!: void;
$ ./hare_build.sh hello-write.ha && ./hello-write
Building mod rt
hello
And finally, let’s see it in action with the fmt module:
$ cat hello-print.ha
use fmt;
export fn main() void = fmt::print("hello\n")!: void;
$ ./hare_build.sh hello-print.ha
Building mod rt
Building mod encoding::utf8
Building mod errors
Building mod types
Building mod math
Building mod bytes
Building mod time
Could not open module 'linux::vdso': typedef variable $HARE_TD_linux::vdso not set
This shell experiment only goes so far. For modules below the top level it appears that harec expects environment variables with :: in their names, which is not legal for shell variables, at least not with a portable POSIX shell script. Expecting HARE_TD_linux_vdso would be cleaner, but it would also conflict with a top-level module called linux_vdso. I don’t see an obvious solution to this besides forbidding underscores in module names.
Overall the tool chain is very lean. I can build the QBE compiler backend in less than a minute on my machine, including the time needed to browse to its web site and download a release archive.
Most of the stack is standard off-the-shelf tools, and the hare command takes care of coordinating everything. The directory nature of translation units makes this harder to integrate in a Makefile, compared to the POSIX way of building C programs and libraries for example.
I even contributed a couple minor patches as I was playing with it and building my own RPM for hare.
Still, I proudly managed to commit the unforgivable in that department.
The GNU build system
The Autotools are a widely-hated set of tools that make up the GNU build system, and operate on top of the venerable make command. This is a radical departure from the general simplicity of the Hare ecosystem.
I wouldn’t be surprised if the autotools weren’t welcome in the Hare community after witnessing hostility towards them left and right over the last decade.
While I share a lot of grievances regarding the autotools, I stand by my choice. Even good old make has its share of problems, but to this day make is what offers the best level of abstraction for me. And automake has the best integration with its underlying make backend.
On top of that, it offers many services, ranging from the installation of files of different natures to a decent test driver and a form of standardization over a set of make targets. This is also the only build system I know that erases itself for distribution, requiring only a shell and make on top of the package’s tool chain for downstream consumers. Such downstream consumers can be your operating system’s package manager; for example, it integrates out of the box in an RPM spec.
I just wish it wasn’t oh so complicated underneath.
When I wrote my first toy project in Rust I decided to use the autotools and I was happy with the result, but I was also constantly fighting the Rust tool chain. For tsess I decided to throw in the towel and use cargo; I’d figure out the non-Rust bits later, but I never reached that point.
When I rebooted tsess in Hare, I ended up using the unholy trinity:
Autoconf
There isn’t much to say here. The Hare tool chain only brings a handful of dependencies, and I bring automake, libtool, hare and scdoc.
I wrote my own Autoconf macros for Hare configuration, but nothing worth dwelling on, besides maybe the creation of a HARE_ENV variable for the hare and harec commands, and the injection of the VPATH into HAREPATH.
One thing that is not easy to do is deriving the default HAREPATH from the hare command. For the ancient Hare tool chain I’m using, hare version -v does not inspire confidence, and I considered submitting a hare config subcommand inspired by pkg-config to probe the Hare tool chain:
DEFAULT_HAREPATH=$(hare config harepath)
But I will wait until QBE 1.2 is released and the qbe/harec packages are updated on Fedora. I’m not interested in maintaining more packages locally.
Libtool
Likewise, there isn’t much to say about Libtool except that once included it changes how Automake rules are put together, and incorporating the Libtool behemoth is what made the rest simpler.
How ironic.
Automake
Normally one would declare a list of programs or binaries, then exhaustively list their sources, and in theory things just work. Things may only just work when languages are known to Autoconf, which is obviously not the case here.
I managed to keep the complications for tsess relatively under control:
HARE_BUILD = env $(HARE_ENV) $(HARE) build
LINK = $(HARE_BUILD) $(HAREFLAGS) -t $(host_cpu) $($(@)_HAREFLAGS) -o $@

bin_PROGRAMS = tsess

tsess_SOURCES = \
	bin/tsess/attach/attach.ha \
	bin/tsess/cli/cli.ha \
	bin/tsess/config/env.ha \
	bin/tsess/config/error.ha \
	bin/tsess/config/list.ha \
	bin/tsess/config/prop.ha \
	bin/tsess/config/sess.ha \
	bin/tsess/lexer/lex.ha \
	bin/tsess/lexer/tok.ha \
	bin/tsess/parser/parse.ha \
	bin/tsess/parser/rule.ha \
	bin/tsess/text/pos.ha \
	bin/tsess/text/scan.ha \
	bin/tsess/text/text.ha \
	bin/tsess/tmux/proc.ha \
	bin/tsess/main.ha

nodist_tsess_SOURCES = \
	bin/tsess/cli/gen/cmd.ha \
	bin/tsess/config/gen/env_vars.ha

tsess_DEPENDENCIES = \
	$(tsess_SOURCES) \
	$(nodist_tsess_SOURCES)

tsess_HAREFLAGS = \
	-D 'TSESS_STRING="$(PACKAGE_STRING)"' \
	-D 'TSESS_VERSION="$(PACKAGE_VERSION)"'

tsess_LDADD = $(srcdir)/bin/tsess/main.ha
With Libtool the build is normally driven by COMPILE and LINK variables, and in turn they rely on the configured CC and CCLD variables. But if tsess_SOURCES contained .c files instead, it would implicitly populate tsess_OBJECTS with their .o counterparts.
As illustrated earlier, a Hare build is not as straightforward as turning individual C files into object files and linking them together, so nothing picks up the .ha files. This is why tsess_DEPENDENCIES has to reference the sources explicitly, to trigger $(LINK) commands.
And so only the LINK variable is overridden to invoke hare build, and it appears in the output as a CCLD operation, which is not inaccurate. This means that a Makefile.am file cannot contain rules for both Hare code and another language managed by the autotools like C, but it is still possible to have both in the same project in separate Makefiles.
It would be that simple if it weren’t for the nodist_tsess_SOURCES. Hare does not have a macro system like C or Rust and encourages code generation instead, so I decided to find a reason to justify having some.
VPATH hell
At this point it should be mentioned that many things that are not strictly necessary in tsess are there for learning purposes. For code generation, it’s not so clear cut, as I wanted to avoid disconnecting the documentation from the code.
With Automake the make VPATH contains the build directory and the source directory. They can be the same directory, which is usually the case when ./configure is executed. When the configure script is invoked from a different directory like ../tsess/configure, it results in $srcdir having a different value than $builddir, ../tsess in this example. This is why $(srcdir) is added escaped to the HAREPATH, so it’s expanded by make.
I don’t know what $builddir can be besides . and I suspect it only exists for parity. There are also “abs” and “top” variants of the VPATH components.
This means that for a single source tree there can be multiple concurrent build trees. For example one building with GCC and another with Clang. It also means that a “dist” archive, once unpacked, can be mounted read-only and not dirtied in a continuous integration environment (or fail the build if that happens).
Unfortunately there are some gotchas in the VPATH department. For example, consider the following suffix rule:

SUFFIXES = .1.scd .1
.1.scd.1:
	$(SCDOC) <$< >$@
If the source lives in the VPATH outside of the current directory, gmake will create the target in the current directory, but bmake will create it in the same directory as the source. If the source tree is read-only, or in general with Automake’s model, bmake’s behavior is a problem.
This is also a problem with tools that don’t support searching files from a PATH. In scdoc’s case there isn’t an include feature in the first place, so I used Autoconf instead to reconstruct the manuals from multiple fragments.
The downside of this approach is that, as far as make is concerned, the .scd files are always seen as updated after the configure step, so manual pages always need to be rebuilt. So even when building from a “dist” archive, scdoc fails to erase itself like the rest of the autotools, and there is no actual benefit to shipping the man pages.
It’s still possible for end users to erase scdoc at configure time:
$ tar xf tsess-0.1.tar.gz
$ cd tsess-0.1
$ ./configure SCDOC=/bin/true
Or at least it would be if scdoc didn’t strictly work with standard input and output: the result of $(SCDOC) <$< >$@ would be an empty manual. If it were possible to run $(SCDOC) -o $@ $< instead, this trick would have worked. I think this is a concrete example of too much simplicity making a system simplistic.
To work around these limitations, I wrote a script called mkmissing that rebuilds targets based on the checksums of their sources. So unless some scd file is patched, scdoc should never be needed during a downstream build from a release archive.
And I didn’t just wing it, I carefully crafted mkmissing. For some reason I really enjoy shell scripting, I can’t help it…
Other tools like hare can work with VPATH builds, but the way HAREPATH works adds a limitation: a module cannot be split across multiple components of the HAREPATH. This is why generated files land in gen sub-modules:
nodist_tsess_SOURCES = \
	bin/tsess/cli/gen/cmd.ha \
	bin/tsess/config/gen/env_vars.ha
Thankfully hare doesn’t enforce that sub-modules be sub-directories of their “parent” modules. There doesn’t seem to be a relationship beyond a logical hierarchy.
The idea here is to generate the model for environment variables and the command line interface based on what the manual says. This way the manual is authoritative and changes to the documentation are immediately effective, or at least actionable. The getopt module (as far as my ancient Hare goes) has poor ergonomics but an excellent design, so I was able to implement a thin wrapper on top to get my desired result:
$ tsess -h
Usage: tsess [-h] [-q|-v]
List available subcommands.
Options:
  -h  Print usage and exit.
  -q  Quiet mode.
  -v  Verbose mode.
Subcommands:
  attach   Attach to a session.
  list     List available sessions.
  version  Display the version number.
$ tsess -h attach
Usage: tsess attach [-L|-S] <name>
Attach to a session.
Options:
  -L socket-name  Alternative socket name of the tmux server.
  -S socket-path  Alternative socket path to the tmux server.
Parameters:
  name  The name of the session.
Throwaway Hare programs generate the Hare sources from the manual fragments, and that became a problem for cross compilation.
Cross compiling tsess
Hare currently supports Linux and FreeBSD, and apparently OpenBSD is being worked on. It also targets the three architectures supported by QBE: x86_64, aarch64 and riscv64.
Cross compiling tsess is incredibly easy:
$ ./configure --host=riscv64-unknown-linux-gnu >/dev/null
$ make
CCLD gen_cmd
GEN bin/tsess/cli/gen/cmd.ha
CCLD gen_env
GEN bin/tsess/config/gen/env_vars.ha
CCLD tsess
GEN man/tsess.1
GEN man/tsess.5
$ file gen_cmd gen_env tsess
gen_cmd: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, with debug_info, not stripped
gen_env: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, with debug_info, not stripped
tsess: ELF 64-bit LSB executable, UCB RISC-V, double-float ABI, version 1 (SYSV), statically linked, with debug_info, not stripped
That alone is quite the payoff.
I think it is even easier than cross compiling a C project with a mix of programs for the build and host architectures.
Part of this is solved by my local RPM packaging of Hare. In order to build Hare, a config.mk file is needed to specify the various commands for normal and cross builds. In the %prep step of the RPM build I generate the right configuration on the fly, based on the target architecture, with a Lua macro:
%prep
%autosetup -p1 -n %{name}-%{shortcommit}
tee config.mk <<'EOF'
PREFIX = %{_prefix}
BINDIR = %{_bindir}
MANDIR = %{_mandir}
SRCDIR = %{_usrsrc}
STDLIB = $(SRCDIR)/hare/stdlib
HAREPATH = $(SRCDIR)/hare/stdlib:$(SRCDIR)/hare/third-party
PLATFORM = %{_host_os}
ARCH = %{_arch}
HAREC = harec
HAREFLAGS =
QBE = qbe
AS = %{__as}
AR = %{__ar}
LD = %{__ld}
SCDOC = scdoc
HARECACHE = .cache
BINOUT = .bin
%{lua:
for arch in string.gmatch(macros.hare_arches, '%S+') do
	local host = arch.."-linux-gnu-"
	local host_as = "as"
	local host_ar = "ar"
	local host_cc = "cc"
	local host_ld = "ld"
	if arch == macros._arch then
		host = ""
		host_as = macros.__as
		host_ar = macros.__ar
		host_cc = macros.__cc
		host_ld = macros.__ld
	end
	print(string.upper(arch).."_AS = "..host..host_as.."\n")
	print(string.upper(arch).."_AR = "..host..host_ar.."\n")
	print(string.upper(arch).."_CC = "..host..host_cc.."\n")
	print(string.upper(arch).."_LD = "..host..host_ld.."\n")
end}
EOF
So all I need is to target the host architecture when “linking” Hare programs:
LINK = $(HARE_BUILD) $(HAREFLAGS) -t $(host_cpu) $($(@)_HAREFLAGS) -o $@
And code generators can override it with their hare flags:
gen_cmd_HAREFLAGS = -t $(build_cpu)
This way, there’s no need to worry about treating code generation like manual page generation. The Hare tool chain is a requirement anyway, unlike scdoc, which I made optional with the help of mkmissing.
Testing
Hare, like Go and Rust, comes with native support for unit testing, in the form of a @test attribute that is part of the language specification. I’m not planning to use the @test attribute in this project.
As a matter of fact, the 0.1 release of tsess ships with no automated tests. I could spend an entire post on why I don’t engage in unit testing and prefer to focus on functional testing, and why the proverbial test pyramid should be considered harmful, but I’m already spending an entire post on introducing tsess and I believe that so far it still hasn’t been formally introduced.
The bottom line is that I actually have plans for testing and I have figured out about fifty percent of the testing architecture. What’s certain is that it will rely on Automake’s default test driver.
For all its flaws, I’m convinced that adopting the GNU build system for tsess brought more value than liabilities.
From zero to zero dot one
The development of tsess happened in small bursts. It took me much less than a week’s worth of full time work to get to the first release. Except of course that if I had tried to do everything in one go it would likely have taken more time. Between bursts I would read or reread parts of the spec, read more code from the standard library or contribute trivial patches to the project. So I had time to let whatever I learned sink in or perform some experiments before putting them in practice on the tsess code base:
- August 23rd, one evening, built a working lexer
- August 27th, one evening, built a working parser
- September 2nd, one evening, ended up shaving the mkmissing yak
- October 5th, a train ride, started modeling configuration
- November 23rd, one evening, more work on configuration, first codegen
- December 4th, one evening, command line interface, more codegen
- December 14th, one afternoon, added a tmux controller
- December 15th, one morning, a minimum viable tmuxinator replacement
- December 31st, tsess 0.1 released
Well, most of it happened in December, and of course this project had to start with a parser.
Parsing the configuration file
I enjoy writing parsers. For my day job I worked on several occasions on the VCL parser and my first Rust project was a VCL preprocessor to experiment with the VCL syntax.
Why make up my own configuration file format when I could reuse an established format like tmuxinator did? The answer is two-fold: I wanted a configuration format really tailored to the tsess user experience, and it was a very good opportunity to evaluate Hare, since I have worked on more parsers than I can remember.
Here is for example the tmuxinator configuration of my Fedora “workspace”:
name: fedora
root: ~/fedora
windows:
<% for pkg in [
    'haproxy',
    'libslz',
    'makeself',
    'numatop',
    'python-funcparserlib',
    'python-webcolors',
    'ShellCheck',
    'tiptop',
    'vcsh',
    'vmod-querystring',
] %>
  - <%= "#{pkg}" %>:
      layout: main-horizontal
      root: ~/fedora/<%= "#{pkg}" %>
      panes:
        - vim <%= "#{pkg}" %>.spec
        - git status
<% end %>
Reminder, it’s an ERB template producing YAML.
In tsess my configuration file looks like this instead:
name = fedora
root = "~/fedora"
[@pkg@]
layout = main-horizontal
root = "@pkg@"
pane.editor.send-keys = << EOF
vim @pkg@.spec
EOF
pane.prompt.send-keys = << EOF
git status
EOF
[pkg:haproxy]
[pkg:libslz]
[pkg:makeself]
[pkg:numatop]
[pkg:python-funcparserlib]
[pkg:python-webcolors]
[pkg:ShellCheck]
[pkg:tiptop]
[pkg:vcsh]
[pkg:vmod-querystring]
Looking at my needs, I took the problem from the other end. Instead of enabling a programmatic definition of my repetitive windows and panes, I opted for a window template called pkg and applied the template to all my windows. It just so happens that I don’t need to further configure individual windows, but I could do something like this for example:
[pkg:special]
pane.prompt.split = above
In theory I should be able to write this and still send an implicit C-m key at the end:
pane.editor.send-keys = "vim @pkg@.spec"
But the distinction between single- and multi-line properties is missing from the 0.1 release. So instead I’m using the here-document syntax. It’s also possible to use single-quoted strings and here-documents to disable template expansion.
This is arguably over-engineered, but again, learning purposes… Also, it’s overall much less engineering than ERB templating (which can embed Ruby code) and, as long as it is not YAML, probably anything is better.
The configuration format is decently documented in the tsess(5) manual. The tsess(1) manual documents ~/.config/tsess/*.tsess as the default location for configuration files, but the tsess/ directory is currently omitted.
Working with tmux
Another thing that inspired tsess a long time ago happened by accident one day, innocently running ps. There, I found quite the unexpected line, with a single process from the ps output filling my entire terminal. Its command line started like this:
sh -c #!/bin/bash # Clear rbenv variables before starting tmux
After collecting the actual shell script with its newline characters present, I could witness the translation of the YAML configuration into tmux commands and shell if statements. I started looking at the os::exec module and it didn’t take long to get my bearings. Like the getopt module it is very well designed, but unlike the getopt module it also has excellent ergonomics, so I was ready to move on in no time.
When it was finally time to start interacting with a tmux server, I read the manual and found the control mode, which uses a simple text protocol to issue commands. Emphasis on simple!
Hare is one of those languages, like Rust, that only work with UTF-8 strings, so using tmux in control mode would be really no fun if there wasn’t a -u option to guarantee UTF-8 encoding. Thanks to that, it becomes really trivial to work with off-the-shelf goodies from the strings and bufio modules.
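As a rough sketch of what this looks like, here is a line-based loop reading from standard input instead of an actual tmux -C process, assuming the bufio::scanline() interface of the Hare version I’m pinned to:

use bufio;
use fmt;
use io;
use os;

export fn main() void = {
	for (true) {
		// scanline() allocates one line at a time,
		// without its line feed delimiter.
		match (bufio::scanline(os::stdin)!) {
		case io::EOF =>
			break;
		case let line: []u8 =>
			defer free(line);
			io::writeall(os::stdout, line)!;
			fmt::println()!;
		};
	};
};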
Working with strings
Another reason to start my tsess project was that I was increasingly itching to work on a parser that would not need to duplicate strings per token. I gave up quickly when I wrote my VCL preprocessor, as a Rust beginner. Writing my parser in Rust was incredibly pleasant, besides this failure.
To explain what I mean by not duplicating strings, consider the following contents to parse:
property = 'value'
The self-imposed constraint I insisted on keeping was to load the contents just once ("property = 'value'\n") and reference tokens in place. In C that’s impossible with null-terminated char * strings.
In Varnish Cache, we use the following data structure:
typedef struct {
	const char *b;
	const char *e;
} txt;
A token has some data, and a txt field to track the base and end of its text in the loaded source. In other words, a token would reference a substring of the source being parsed, in place. The important thing is of course that a token must not outlive its source material.
While C leaves me to my own devices, Rust on the other hand has the ability to express lifetimes of types in relation to others. So I decided that I wanted to parse tsess’s configuration in Rust with a single heap allocation.
I hit my head pretty hard several times on the advanced lifetimes wall, but I made progress. Approximately a year ago, I read this documentation page on advanced lifetimes and that was the breakthrough I needed.
Here is the commit message I wrote to myself back then:
lexer: Let lexer::Eval reference its text
The reason why the first attempt failed was the single lexer lifetime. Working with a &&str reference allows a distinction between the string lifetime and the shorter lexer lifetime.
In fact, an Eval<’s> should now be able to outlive the lexer that made it in the first place, since it’s only bound to the source’s lifetime.
The Eval implementation of the Display trait is replaced with Debug, and now that it carries its own text, it can escape it by itself.
As a result the main loop now looks like what it was meant to look like from the beginning, this time with great success.
And here is the diff of the main loop in this commit:
fn main() {
- for eval in lexer::Lexer::from(SRC) {
- match eval.input {
- Ok(_) => {
- print!("{} '", eval);
- for c in eval.text(SRC).chars() {
- print!("{}", c.escape_default())
- }
- print!("'\n");
- }
- Err(_) => println!("{}", eval),
- };
+ for eval in lexer::Lexer::from(&SRC) {
+ println!("{eval:?}")
}
}
Thank you, past me, for holding the hand of your future self. I couldn’t remember this without you. And I agree with you, the main loop became slightly more readable.
So after becoming a master of 'a, 'b: 'a lifetimes, I completed the lexical analyzer and the parser. Eventually, when it was time to load configuration files and stop working with a hard-coded str, I could not figure out how to get a &&str from a Box<str> to apply my mighty advanced lifetime scheme.
I tried many things and they all failed, and that’s when I heard about Hare for the first time. Hare is not as nice as Rust when it comes to massaging strings to do things with them, but Hare also doesn’t have half a dozen types to represent strings under various circumstances like Rust. I’m exaggerating on purpose here, but this is typically the kind of complication that makes Rust hard to fit into one brain without daily practice.
Hare works with UTF-8 strings. The str type is just a []u8 (slice of 8-bit unsigned integers) in disguise, with the promise that it is valid UTF-8. Hare insists for the most part on working with iterators when it comes to strings. I’m surprised, because it was my understanding that UTF-8 ensures that you always know whether you are at the boundary of a code point, but using indices to get substrings is not common practice. At least the [] operator is not available for str, but we can find byte-wise and rune-wise indexing in the strings module.
The preferred way to work with strings appears to be iterators. I wrote my own text module to implement functions like C’s strcspn() on top of iterators, and in general whatever convenience would make sense for the parser. I used it for the code generators as well.
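As an illustration, a strcspn() lookalike built on the strings iterator could look like this sketch; cspn() is a made-up name, it counts runes where C counts bytes, and it assumes the (rune | void) signature of strings::next() from my ancient tool chain:

use strings;

// Length of the initial segment of s that contains no rune
// from the reject set, in runes rather than bytes.
fn cspn(s: str, reject: str) size = {
	let it = strings::iter(s);
	let n = 0z;
	for (true) {
		match (strings::next(&it)) {
		case void =>
			break;
		case let r: rune =>
			if (strings::contains(reject, r)) {
				break;
			};
			n += 1;
		};
	};
	return n;
};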
The first release
I struggled with a couple of things, like Vim’s rendering being mangled during startup until a pane is resized; it looks like a SIGWINCH was enough. When I first successfully attached to my fedora session, it was only a matter of minutes to migrate my other sessions from tmuxinator to tsess.
The reason why I took care of a couple of TODOs in the documentation and performed the release on December 31st was simply to be able to say that this journey to publishing my first Hare project belonged exclusively to 2023. Yes, I’m quite vain.
I’m not expecting anyone to use it, but I will continue tsess’ development and reach the scope I have in mind before cutting a 1.0 release. Just in case, you may find it on Sourcehut.
Closing words
As for Hare, I find it to be an interesting language, striking an excellent balance on a spectrum going from C to Rust. I find the cursor between safety and agency to be where I want it most of the time, possibly where I should want it if I knew better. I just regret for now the lack of support for macros, because code generation is one thing, but there are other use cases; macros are for one how generics may be shoehorned into C, but I digress. Just like Rust, I will keep an eye on Hare, but unlike Rust I may play with Hare on a more regular basis, starting with exploring tsess until I can agree to make a 1.0 release.
I still haven’t formally introduced tsess, but I will assume that you can get the gist of it after seeing a configuration example and the command line usage. You can also Read The Fancy Manuals to learn more, but not much more.