Planet GNOME

Ruben Vermeersch: Go: debugging multiple response.WriteHeader calls

Fri, 26/01/2018 - 4:11pm

Say you’re building an HTTP service in Go and suddenly it starts giving you these:

http: multiple response.WriteHeader calls

Horrible when that happens, right?

It’s not always very easy to figure out why you get them and where they come from. Here’s a hack to help you trace them back to their origin:

type debugLogger struct{}

func (d debugLogger) Write(p []byte) (n int, err error) {
	s := string(p)
	if strings.Contains(s, "multiple response.WriteHeader") {
		debug.PrintStack()
	}
	return os.Stderr.Write(p)
}

// Now use the logger with your http.Server (this needs the log, net/http,
// os, runtime/debug and strings imports):
logger := log.New(debugLogger{}, "", 0)
server := &http.Server{
	Addr:     ":3001",
	Handler:  s,
	ErrorLog: logger,
}
log.Fatal(server.ListenAndServe())

This will output a nice stack trace whenever it happens. Happy hacking!


Christian Schaller: An update on Pipewire – the multimedia revolution

Fri, 26/01/2018 - 3:35pm

We launched PipeWire last September with this blog entry. I thought it would be interesting for people to hear about the latest progress on what I believe is going to be a gigantic step forward for the Linux desktop. So I caught up with PipeWire creator Wim Taymans during DevConf 2018 in Brno, where Wim is giving a talk about PipeWire, and we discussed the current state of the code; Wim also demonstrated a few of the things that PipeWire can now do.

Christian Schaller and Wim Taymans testing PipeWire with Cheese

Priority number 1: video handling

So, as we said when we launched, the top priority for PipeWire is to address our needs on the video side of multimedia. This is critical due to the more secure nature of Wayland, which makes the old methods for screen sharing no longer work, and the emergence of desktop containers in the form of Flatpak. Thus we need PipeWire to help us provide application and desktop developers with a new method for doing screen sharing, and also to provide a secure way for applications inside a container to access audio and video devices on the system.

There are three major challenges PipeWire wants to solve for video. First, device sharing, meaning that multiple applications can share the same video hardware device. Second, it wants to do so in a secure manner, ensuring your video streams are not hijacked by a rogue process. And finally, it wants to provide an efficient method for sharing multimedia between applications, for instance fullscreen capture from your compositor (like GNOME Shell) to a video conferencing application running in your browser, like Google Hangouts, Blue Jeans or Pexip.

So the first thing Wim showed me in action was the device sharing. We launched the GNOME photobooth application Cheese, which gets PipeWire support for free thanks to the PipeWire GStreamer plugin. And this is an important thing to remember: thanks to so many Linux applications using GStreamer these days, we don’t need to port each one of them to PipeWire; instead the PipeWire GStreamer plugin does the ‘porting’ for us. We then launched a gst-launch command line pipeline in a terminal. The result is two applications sharing the same webcam input without one of them blocking access for the other.

As you can see from the screenshot above, it worked fine. This was actually done on my Fedora Workstation 27 system, and the only thing we had to do was start the ‘pipewire’ process in a terminal before starting Cheese and the gst-launch pipeline; GStreamer autoplugging took care of the rest. So feel free to try this out yourself if you are interested, but be aware that you will find bugs quickly if you try things like on-the-fly resolution changes or switching video devices. This is still tech-preview-level software in Fedora 27.

The plan is for Wim Taymans to sit down with the web browser maintainers at Red Hat early next week and see if we can make progress on supporting PipeWire in Firefox and Chrome, so that conferencing software like the ones mentioned above can start working fully under Wayland.

Since security was one of the drivers for the move to Wayland from X Windows, we of course also put a lot of emphasis on not recreating the security holes of X in the compositor. So the way PipeWire now works is that if an application wants to do fullscreen capture, it checks with the compositor through a D-Bus API (a portal, in Flatpak and Wayland terminology), which only allows the permitted application to do the screen capture, so the stream can’t be hijacked by a random rogue application or process on your computer. This also works from within a sandboxed setting like Flatpaks.

JACK Support

Another important goal of PipeWire was to bring all Linux audio and video together, which means PipeWire needed to be an as-good-or-better replacement for JACK for the pro-audio use case. This is a tough use case to satisfy, so while getting the video part working has been the top development priority, Wim has also worked on verifying that the design allows for the low latency and control needed for pro-audio. To do this, Wim has implemented the JACK protocol on top of PipeWire.

Carla, a JACK application, running on top of PipeWire.

Through that work he has now verified that he is able to achieve the low latency needed for pro-audio with PipeWire, and that he will be able to run JACK applications unchanged on top of PipeWire. So above you see a screenshot of Carla, a JACK-based application, running on top of PipeWire with no JACK server running on the system.

ALSA/Legacy applications

Another item Wim has written the first code for, and verified will work well, is the ALSA emulation. The goal of this piece of code is to allow applications using the ALSA userspace API to output to PipeWire without needing special porting or application developer effort. At Red Hat we have many customers with older bespoke applications using this API, so it has been of special interest for us to ensure this works just as well as the native ALSA output. It is also worth noting that PipeWire does mixing too, so sound being routed through ALSA will get seamlessly mixed with audio coming through the JACK layer.

Bluetooth support

The last item Wim has spent some time on since last September is making sure Bluetooth output works, and he demonstrated this to me while we were talking during DevConf. The PipeWire Bluetooth module plugs directly into the BlueZ Bluetooth framework, meaning that things like the GNOME Bluetooth control panel just work with it without any porting needed. And while the code is still quite young, Wim demonstrated pairing and playing music over Bluetooth using it.

What about PulseAudio?

As you probably noticed, one thing we didn’t mention above is how to deal with PulseAudio applications. Handling this use case is still on the todo list, and the plan is to, at least initially, just keep PulseAudio running on the system, outputting its sound through PipeWire. That said, we are a bit unsure how many applications would actually use this path because, as mentioned above, all GStreamer applications for instance would be PipeWire-native automatically through the PipeWire GStreamer plugin. And for legacy applications, the PipeWire ALSA layer would replace the current PulseAudio ALSA layer as the default ALSA output, meaning that the only applications left are those outputting to PulseAudio directly themselves. The plan is also to keep the PulseAudio ALSA device around, so people who want to use things like PulseAudio’s networked audio functionality can select the PA ALSA device manually and keep doing so.
Over time the goal would of course be to not have to keep the PulseAudio daemon around, but dropping it completely is likely to be a multi-year process with current plans, so it is somewhat like XWayland on top of Wayland.


So you might read this and think: hey, with all this work we are almost done, right? Well, unfortunately no. The components mentioned here are good enough for us to verify the design and features, but they still need a lot of maturing and testing before they will be in a state where we can consider switching Fedora Workstation over to using them by default. So there are many warts that need to be cleaned up still, but a lot of things have become much more tangible now than when we last spoke about PipeWire in September. The video handling we hope to enable in Fedora Workstation 28 as mentioned, while the other pieces we will work towards enabling in later releases as the components mature.
Of course, the more people interested in joining the PipeWire community to help us out, the quicker we can mature these different pieces. So if you are interested, please join us in #pipewire on IRC or just clone the code from GitHub and start hacking. You can find the details for IRC and git here.

Bastian Ilsø Hougaard: GNOME at FOSDEM 2018 – with socks and more!

Fri, 26/01/2018 - 1:50am

Sunrise over Hobart seen from Mt Wellington, Tasmania (CC-BY-SA 4.0).

It’s been a while, huh? The past six months kept me busy traveling and studying abroad in Australia, but I’m back! With renewed energy, and lots and lots of GNOME socks for everyone. Like in previous years, I’m helping out at GNOME’s booth at the FOSDEM 2018 conference.

FOSDEM 2016. (CC-BY-SA 4.0)

I have arranged for a whopping 420 pairs of GNOME socks to be produced, hopefully arriving before my departure. Baby socks, ankle socks, regular socks and even knee socks – maybe I should order an extra suitcase to fill up. Even so, I estimate I can probably bring 150 pairs at most (last year my small luggage held 55 pairs…). Because of the large quantity, I’ve designed them to be fairly neutral and “simple” (well, actually the pattern is rather complicated).

Sample sock made prior to production.

Breakdown of the horizontally repeatable sock pattern.

I plan to bring them to FOSDEM 2018, Open Source Days in Copenhagen, FOSS North and GUADEC. However, we have also talked about getting some socks shipped to the US or Asia, although a box of 100 socks weighs a lot, resulting in expensive shipping. So if anyone is going to any of the aforementioned conferences and can keep some pairs in their luggage, let me know!

Apart from GNOME booth staffing, I am also helping to organize small newcomer workshops at FOSDEM! If you are coming to FOSDEM and are interested in mentoring one or two newcomers in your project, let us know on the Newcomer workshop page (more details here too). Most of all, I look forward to meeting fellow GNOME people again, as I feel I have been gone quite a long time. I miss you!

Adrien Plazas: GTK+ Apps on Phones

Thu, 25/01/2018 - 4:42pm

As some of you may already know, I recently joined Purism to help develop GTK+ apps for the upcoming Librem 5 phone.

Purism and GNOME share a lot of ideas and values, so the GNOME HIG and GNOME apps are what we will focus on primarily: we will do all we can not to fork nor reinvent the wheel, but to help existing GTK+ applications work on phones.

How Fit are Existing GTK+ Apps?

Phones are very different from laptops and even tablets: their screen is very small and their main input method is a single thumb on a touchscreen. Luckily, many GNOME applications are touch-friendly and fit for small screens. Many applications present you a tree of information you can browse, and I see two main layouts used by GNOME applications to let you navigate it.

A first kind of layout is found in applications like Documents; I'll call it stack UI. It uses all the available space to display the collection of information sources (in that case, documents); clicking an element from the collection focuses on it, displaying its content stacked on top of the collection and letting you go back to the collection with a Back button. Applications sporting this layout are the most phone-ready ones, as phone apps typically follow a similar layout. Some polish may be needed to make them shine on a phone, but overall not much. Other applications using this layout are Music, Videos, Games, Boxes…

A second kind of layout is found in applications like Contacts; I'll call it panel UI. It displays all the levels of information side by side in panels: the closer the information is to the left, the closer it is to the root, with each selected node of the information tree highlighted. This is nice if you have enough window space to display all this information and the user doesn't need to focus on the leaves, as it allows you to quickly jump to other elements of the collection. Unfortunately, window space is rare on phones, so these applications would need to be adjusted to fit their screens. Other applications using this layout are Settings, Geary, Polari, FeedReader…

Of course, other layouts exist and are used, but I won't cover these here.

Stack UIs respond to size changes by displaying more or less of the current level of information, but panel UIs tend to seemingly arbitrarily limit the minimum size of the window to keep displaying all the levels of information, even though some may not be needed. The responsibility of handling the layout and sizes to display more or less of some levels of information is sometimes offloaded to the user via the usage of GtkPaned: the user then has to manually adjust which information to hide or display by changing the width of columns every time they need access to other information or when the window's size changes. A notable example of hurtful GtkPaned usage is Geary, which can be a bit of a pain to use half-maximized on a 1920×1080 screen.

Responsive GTK+ Apps

Panel UIs need to be smarter and should decide, depending on the window's size, which information is relevant to the user and should be displayed. As we don't want to replace the current experience but to extend it, the UIs need to respond both to window size changes and to explicit focus change requests.

One way of doing this would be to stack the panels one on top of the other, showing only one at a time and adding extra Back buttons as needed, effectively switching the UI between panels and a stack.

Another one would be to have floating panels like on KDE Discover. I am not a fan of this method, but on the other hand I'm not a designer.

I expect that to make applications like Geary easier to use even on laptops.

Implementing GtkResponsiveBox

I will try to implement a widget I call GtkResponsiveBox. It contains two children, displayed side by side when the box's size is above a given threshold, and only one of them when the size is below it.

I expect this widget to look like a weird mix of GtkPaned and GtkStack, to be orientable and to have the following sizes:

  • minimal size = max (widget 1 minimal size, widget 2 minimal size)
  • natural size = widget 1 natural size + widget 2 natural size
  • threshold size = widget 1 minimal size + widget 2 minimal size

I am not completely sure yet how to implement it, nor whether this widget is a good idea overall. Don't expect anything working soon, as it's the first time I've subclassed GtkContainer. I'll let you know how implementing the widget goes; in the meantime, any comment is welcome!
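The three sizing rules above can be made concrete with a tiny sketch of the arithmetic. This is purely illustrative (the real widget would of course be written in C against GTK+, and boxSizes/child are hypothetical names, not GTK+ API):

```go
package main

import "fmt"

// child carries the minimal and natural sizes a GTK+ child widget would
// report along one orientation.
type child struct{ minimal, natural int }

// boxSizes computes the proposed GtkResponsiveBox sizes from its two children.
func boxSizes(a, b child) (minimal, natural, threshold int) {
	// minimal: below the threshold only one child is shown, so the box
	// needs the larger of the two minimums.
	minimal = a.minimal
	if b.minimal > minimal {
		minimal = b.minimal
	}
	// natural: both children shown side by side at their natural sizes.
	natural = a.natural + b.natural
	// threshold: the smallest size at which both children still fit.
	threshold = a.minimal + b.minimal
	return
}

func main() {
	min, nat, thr := boxSizes(child{minimal: 100, natural: 200}, child{minimal: 150, natural: 300})
	fmt.Println(min, nat, thr) // 150 500 250
}
```

Note that the threshold is always at least the minimal size, so the switch between the one-child and two-child presentations happens at a size the box can actually be allocated.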

Thanks Philip for your guide detailing how to implement a custom container, it helps me a lot!

Michael Catanzaro: Announcing Epiphany Technology Preview

Thu, 25/01/2018 - 12:09am

If you use macOS, the best way to use a recent development snapshot of WebKit is surely Safari Technology Preview. But until now, there’s been no good way to do so on Linux, short of running a development distribution like Fedora Rawhide.

Enter Epiphany Technology Preview. This is a nightly build of Epiphany, on top of the latest development release of WebKitGTK+, running on the GNOME master Flatpak runtime. The target audience is anyone who wants to assist with Epiphany development by testing the latest code and reporting bugs, so I’ve added the download link to Epiphany’s development page.

Since it uses Flatpak, there are no host dependencies aside from Flatpak itself, so it should work on any system that can run Flatpak. Thanks to the Flatpak sandbox, it’s far more secure than the version of Epiphany provided by your operating system. And of course, you enjoy automatic updates from GNOME Software or any software center that supports Flatpak.


(P.S. If you want to use the latest stable version instead, with all the benefits provided by Flatpak, get that here.)

Ismael Olea: I'm going to FOSDEM 2018

Wed, 24/01/2018 - 8:51pm

Yeah. I finally decided I’m going to FOSDEM this year. 2018 is the year I’m retaking my life as I like it, and the right way to start is meeting all those friends and colleagues I missed during those years of exile. I plan to attend the beer event as soon as I arrive in Brussels.

If you want to talk to me about GUADEC 2018, Fedora Flock 2018 or whatever please reach me by Twitter (@olea) or Telegram (@IsmaelOlea).

BTW, there are a couple of relevant FOSDEM-related Telegram groups:

General English Telegram group:

Spanish-speaking one:

PS: A funny thing about FOSDEM is… this is the place where Spanish (or indeed Madrileño) open source enthusiasts can all meet once a year… in Brussels!

Michael Meeks: 2018-01-24 Wednesday

Wed, 24/01/2018 - 6:37pm
  • Poked mail, chat with Miklos, poked at partner mail and tasks. Sync with Noel & Eloy, customer call, out to see the dentist.
  • Pleased to discover from Pranav that although the Linux Skype client inexplicably (and globally) eats the Ctrl-Alt-Shift-D key combination (without seemingly doing anything with it) – I can still select an online window and do map._docLayer.toggleTileDebugMode().

Nuritzi Sanchez: Meet Shobha Tyagi from GNOME.Asia Summit 2016

Wed, 24/01/2018 - 2:09am

This month’s community spotlight is on Shobha Tyagi, one of the volunteer organizers of GNOME.Asia Summit 2016.

Courtesy of Shobha Tyagi

Shobha’s history with GNOME began when she participated in the Outreach Program for Women (OPW) internship in December 2013, with GNOME as her mentoring organization. She attended her first GUADEC in 2014 while she was an OPW intern, and met Emily Chen, who introduced her to the GNOME.Asia Summit.

Passionate about helping to spread GNOME throughout Asia, Shobha was resolute in rising to the challenge of bringing GNOME.Asia Summit to her home in Delhi, India. Fast-forward two years: Shobha is proudly leading the local organizing team of GNOME.Asia, which is ready to lift its curtain in Delhi on April 21, 2016.

We chatted with Shobha about GNOME and her experience organizing GNOME.Asia.

Why did you choose to work with GNOME for your OPW internship?

To be honest, I thought that since GNOME organizes OPW, I would receive the most productive mentoring from GNOME. Sure enough, that happened! I decided to make my initial contribution to Documentation, and after that I met my guru and mentor, Ekaterina Gerasimova.

Courtesy of Shobha Tyagi

Do you have a favorite thing about GNOME?

My favorite thing about GNOME is its people. The same people who create it, maintain it, and use it – they are what makes GNOME really great. I really enjoy committing my patches directly to the upstream repositories and meeting the contributors in person. I also get great satisfaction whenever I tell people about GNOME and let them know how they can also contribute.

You submitted the winning bid to host GNOME.Asia Summit 2016; do you have any tips for those who are interested in bidding for upcoming GNOME conferences?

Sure! It does help if you have attended a GNOME conference in the past, but once you have made up your mind to bid, have faith in yourself and just write your proposal.

Can you describe a challenge you faced while organizing the GNOME.Asia Summit and how you overcame it?

There are many challenges, especially when you are the only one who knows the ins and outs of the event and have a limited amount of time. I’m surrounded by very supportive people. Even so, people expect more from the person who lays the initial groundwork. I thank the summit committee members for their tremendous help and persistence through countless IRC meetings and discussions, without which it would have been impossible to overcome all of the small obstacles throughout the entire planning experience.

What’s the most exciting part about being an organizer?

The most exciting part is learning new things! Writing sponsorship documents, calling for presentations, picking up basic web development skills, identifying keynote speakers, chief guests and sponsors, amongst other things. I learned first-hand what goes into designing logos, posters, and stickers. There were also other tasks that I wouldn’t have had to do in a normal situation, like arranging a day tour to the Taj Mahal for a big group.

Life after GNOME.Asia Summit Delhi; what is going to be your next project?

After the GNOME.Asia Summit, I would like to focus my efforts on establishing a GNOME user group in Delhi.

Any advice for eager newcomers and first-time contributors?

My advice for them is to come and join GNOME! GNOME enables you and me to contribute, and when we contribute, we help each other improve our lives. If you are committed, you can commit patches too.

And now, some fun questions. What is your favorite color?

Yellow.

Favorite food?

All vegetarian Indian food.

What is your spirit animal?

Cow! They have a calm demeanor, and symbolize abundance and fertility since they represent both earth and sky.

Finally, and this one is important; what do you think cats dream about?

Cats dream about being loved, cared for and pampered by their master.

Shobha is helping to organize the 2016 GNOME.Asia Summit while working as an Assistant Professor at Manav Rachna International University, and pursuing a doctorate in Software Engineering. She has been a Foundation member since 2014, and has previously contributed to the Documentation team.

Thank you so much, Shobha, for sparing some of your time to talk to us! We wish you a successful Summit!

Interviewed by Adelia Rahim. 

Nuritzi Sanchez: Giving Spotlight | Meet Øyvind Kolås, GEGL maintainer extraordinaire

Wed, 24/01/2018 - 2:09am

Last month, we had the pleasure of interviewing Øyvind Kolås, aka “pippin,” about his work on GEGL — a fundamental technology enabling GIMP and GNOME Photos.

GIMP Stickers, CC-BY-SA Michael Natterer

This interview is part of a “Giving Spotlight” series we are doing on some long-time GNOME contributors who have fundraising campaigns. The goal is to help GNOME users understand the importance of the technologies, get to know the maintainers, and learn how to support them.

Without further ado, we invite you to get to know Øyvind and his work on GEGL!

The following interview was conducted over email. 

Getting to know Øyvind

Where are you from and where are you based?

I’m from the town of Ørsta – at the end of a fjord in Norway – but since high school I’ve been quite migratory: studying fine art in Oslo and Spain, doing color science research at a color lab, lecturing on multimedia CD-ROM authoring in south-eastern Norway, and working on GNOME technologies like Clutter and cairo for OpenedHand and Intel in London, followed by half a decade of low-budget backpacking. At the moment I am based in Norway – and try to keep in touch with a few people and places – among others England, Germany, and Spain.

Øyvind “pippin” Kolås, CC BY-NC-ND Ross Burton

What do you do and how does it relate to GNOME?

I like tinkering with code – frequently code that involves graphics or UI. This results in sometimes useful, at other times odd, but perhaps interesting, tools, infrastructure, or other software artifacts. Through the years and combined with other interests, this has resulted in contributions to cairo and Clutter, as well as being the maintainer of babl and GEGL, which provide pixel handling and processing machinery for GIMP 2.9, 2.10 and beyond.

How did you first get involved in GNOME?

I attended the second GUADEC, which happened in Copenhagen in 2001. This was my first in-person meeting with the people behind the nicknames in #gimp, as well as with GIMP developers and power users, and the wider community around it, including the GNOME project.

Why is your fundraising campaign important?

I want GIMP to improve and continue being relevant in the future, and I want a powerful graph-based framework for other imaging tasks. I hope that my continued maintainership of babl/GEGL will enable many new and significant workflows in GIMP and related software, as well as provide a foundation for implementing and distributing experimental image processing filters.

Wilber Week 2017, a hackathon for GEGL and GIMP, CC-BY-SA Debarshi Ray

Getting to know GEGL

How did your project originate?

GEGL’s history starts in 1997 with Rhythm & Hues Studios, a Californian visual effects and animation company. They were experimenting with a 16-bit/high bit depth fork of GIMP known as filmgimp/cinepaint. Rhythm & Hues succeeded in making GIMP work on high bit depth images, but the internal architecture was found to be lacking – and they started GEGL as a more solid future basis for high bit depth non-destructive editing in GIMP. Their funding/management interest waned, and GEGL development went dormant. GIMP, however, continued considering GEGL to be its future core.

How did you start working on GEGL?

I’ve been making and using graphics-related software since the early ’90s. In 2003–2004 I made a video editor for my own use in a hobby music video/short film collaboration. This video editing project was discontinued and salvaged for spare parts – like babl and a large set of initial operations – when I took up maintainership and development of GEGL.

What are some of the greatest challenges that you’ve faced along the way?

When people learn that I am somehow involved in the development of the GIMP project, they expect me to be in control of, and responsible for, how the UI currently is. I have removed some GIMP menu items in attempts to clean things up and reduce technical debt, but most improvements I can take credit for, now and in the future, are indirect – like how moving to GEGL enables higher bit depths and on-canvas previews instead of postage-stamp-sized previews in dialogs.

What are some of your greatest successes?

Bringing GEGL from a Duke-Nukem-Forever state, where GIMP was waiting on GEGL for all future enhancements, to GEGL waiting for GIMP to adopt it. The current development series of GIMP (2.9.x) is close to being released as 2.10, which will be the new stable: a featureful version with feature parity with 2.8 but a new engine under the hood. I am looking forward to seeing where GIMP will take GEGL in the future.

What are you working on right now?

One of the things I am working on – and playing with – at the moment is experiments in color separation. I’m making algorithms that simulate the color mixing behavior of inks and paints. That might be useful in programs like GIMP for tasks ranging from soft-proofing spot-colors to preparing photos or designs for multi-color silk-screening, for instance for textiles.

Which projects depend on your project? What’s the impact so far?

There are GIMP and GNOME Photos, as well as imgflo, a visual front-end provided by the visual programming environment NoFlo. GEGL (and babl, a companion library) are designed to be generally useful and do not have any APIs that could be considered useful only for GIMP. GEGL itself also contains various example and experimental command-line and graphical interfaces for image and video processing.

How can I get involved? 

GEGL continues to need, and thankfully to get, contributions: new filters, fixes to old filters, improvements to infrastructure, improved translations, and documentation. Making more projects use GEGL is also a good way of attracting more contributors. With funds raised through Liberapay and Patreon, I find it easier to allocate time and energy towards making the contribution experience of others smoother.

And now a few questions just for fun…

What is your favorite place on Earth?

Tricky – I have traveled a lot and not found a single place that is a definitive favorite. Places I’ve found to my liking are near the equator, with little seasonal variation, and at sufficiently high altitude to cool down to a comfortable daily high of roughly 25 degrees Celsius.

Favorite ice cream?

Could I have two scoops in a waffle cone, one mango sorbet, one coconut please?

Finally, our classic question: what do you think cats dream about?

Some cats probably dream about being able to sneak through walls.

Øyvind Kolås, CC BY-NC-ND Ross Burton



Thank you, Øyvind, for your answers. We look forward to seeing your upcoming work on GEGL this year and beyond!

Please consider supporting Øyvind through his GEGL Liberapay or GEGL Patreon campaigns. 

Michael Meeks: 2018-01-23 Tuesday

Tue, 23/01/2018 - 10:00pm
  • Mail chew, admin, meeting prep, partner cal, lunch. Installed LibreOffice build tools on HP / Ryzen 5 laptop; upgraded Windows 10 endlessly. Build ESC stats, commercial call.
  • Went out with J. for the first time in some months; just lovely to spend time alone with her out of the house.

Ismael Olea: Opensource gratitude

Tue, 23/01/2018 - 8:11pm

Some weeks ago I read somewhere on Twitter about how good it would be to adopt and share the practice of thanking the open source developers of the tools you use and love. I don’t remember who or where, so I’m probably stealing the method they proposed. Personally, I’m getting used to visiting the project’s development site and, if no better method is available, opening an issue with a text like this:

I’m opening this issue just to thank you for the tool you wrote. It’s nice, useful, and saves a lot of my time.


PS: please don’t close the issue, so other people can vote on it to express their gratitude too.

As an example, I’ve just written one for the CuteMarkEd editor:

Hope this brings a little dose of endorphins to the people who, with their effort, are building the infrastructure of the digital society. Think about it.

Richard Hughes: GCab and CVE-2018-5345

Tue, 23/01/2018 - 2:35pm

tl;dr: Update GCab from your distributor.

Longer version: Just before Christmas I found a likely exploitable bug in the libgcab library. Various security teams have been busy with slightly more important issues, so it’s taken a lot longer than usual to get verified and assigned a CVE. The issue I found was that libgcab attempted to read a large chunk into a small buffer, overwriting lots of interesting things past the end of the buffer. ASLR and SELinux save us in nearly all cases, so it’s not the end of the world. Almost a textbook C buffer overflow (rust, yada, whatever), so it was easy to fix.

Some key points:

  • This only affects libgcab, not cabarchive or libarchive
  • All gcab versions less than 0.8 are affected
  • Anything that links to gcab is affected, so gnome-software, appstream-glib and fwupd at least
  • Once you install the fixed gcab you need to restart anything that’s using it, e.g. fwupd
  • There is no silly branded name for this bug
  • The GCab project is incredibly well written, and I’ve been hugely impressed with the code quality
  • You can test whether your GCab has been fixed by attempting to decompress this file; if the program crashes, you need to update

With Marc-André’s blessing, I’ve released version v0.8 of gcab with this fix. I’ve also released v1.0, which has this fix (and many nice API additions), switches the build system to Meson, and cleans up a lot of leaks using g_autoptr(). If you’re choosing a version to update to, the answer is probably 1.0, unless you’re building for something more sedate like RHEL 5 or 6. You can get the Fedora 27 packages here, or they’ll be on the mirrors tomorrow.

Didier Roche: Welcome To The (Ubuntu) Bionic Age: Nautilus, a LTS and desktop icons

Mar, 23/01/2018 - 10:54pd
Nautilus, Ubuntu 18.04 LTS and desktop icons: upstream and downstream views.

If you are following the news on various tech websites closely, one of the latest hot topics in the community was Nautilus removing desktop icons. Let’s try to clarify some points to ensure the various discussions around it have enough background information, rather than reacting on emotion alone, as could be seen lately. You will have both the downstream (mine) and upstream (Carlos) perspectives here.

Why upstream Nautilus developers are removing the desktop icons

First, I wasn’t personally really surprised by the announcement. Let’s be clear: GNOME, since its 3.0 release, doesn’t have any icons on the desktop by default. There was an option in Tweaks to turn it back on, but let’s be honest: this wasn’t really supported.

The proof is that this code was never really maintained for 7 years and didn’t transition to newer view technologies like the ones Nautilus is migrating to. Having patched this code myself many years ago for Unity (moving desktop icons to the right depending on the icon size, in intellihide mode, which thus doesn’t reserve a workarea STRUT), I can testify that this code was getting old. Consequently, it became old and rotten while backing something not even used in the default upstream GNOME experience! It would be somewhat ironic to keep it that way.

I’m reading a lot of comments saying “just keep it as an option, the answer is easy”. Let me disagree with this. As already stated during my artful blog post series, and for the same reason that we keep Ubuntu Dock with a very small set of supported options, any added option has a cost:

  • It’s another code path to test (manually, most of the time, unfortunately), and the exploding combination of options which can interact badly with each other just produces an unfinished project, where you have to be careful not to enable this and that option together, or it crashes or causes side effects… People who have played enough with Compiz Config Settings Manager should know what I’m talking about.
  • Not only that, but more code means more bugs, and if you have to transition to a newer technology, you have to modify that code as well. Working on it is detrimental to other bug fixes, features, tests or documentation that could benefit the project. So, this piece of code that you keep and don’t use has a very negative impact on your whole project. Worse, it indirectly impacts even users who stick to the defaults, as they don’t benefit from planned enhancements in other parts of the project, due to maintainers’ time constraints.

So, yeah, there is never “just an option”.

In addition to that argument, which I used to defend upstream’s position (even in front of the French Ubuntu community), I also want to highlight that the plan to remove desktop icons was really well executed in terms of communication. However, the feedback the upstream developers got when following this communication plan, which takes time, doesn’t motivate them to do it again, whereas in my opinion it should be the standard for any important (or perceived as important) decision:

  • Carlos blogged about it on planet GNOME by the end of December. He didn’t only explain the context, why this change, who are impacted by this, possible solutions, but he also presented some proposals. So, there is complete what/why/who/how!
  • In addition to this, there is a very good abstract, more technically oriented, in an upstream bug report.
  • A GNOME Shell proof-of-concept extension was even built to show that the long-term solution for users who want icons on the desktop is feasible. It wasn’t just a throwaway experiment, as clear goals and targets were defined for what needs to be done to move from a PoC to a working extension. And yes, it was built by the exact same group who are removing desktop icons from Nautilus; I guess that means a lot!

Consequently, that is the detailed information users need to understand why this change is happening and what its long-term consequences will be. It is the foundation for good comments and exchanges on the various striking news blog posts. Very well done, Carlos! I hope that more and more such decisions, on any free software project, will be presented and explained as well as this one. ;)

A word from Nautilus upstream maintainer

Now that I’ve said what I wanted to tell about my view on the upstream changes, and before detailing what we are going to do for Ubuntu 18.04 LTS, let me introduce you Nautilus upstream maintainer already many times mentioned: Carlos Soriano.

Hello Ubuntu and GNOME community,

Thanks Didier for the detailed explanation and giving me a place in your blog!

I’m writing here because I wanted to clarify some details about the interaction with downstreams, in this case Ubuntu and Canonical developers. When I wrote the blog post with all the details, I explained only the part that purely refers to upstream Nautilus, and that was actually quite well received. However, two weeks after my blog post, some websites explained the change in a not very good way (the clickbait magic).

Usually that is not very important; those who want factual information know that we are on IRC all the time and that we usually write blog posts about upcoming changes in our blog aggregation at Planet GNOME, or here in Didier’s blog for Ubuntu. This time though, some people in our closer communities (both GNOME and Ubuntu) got splashed with misconceptions, and I wanted to address that. And what better way to do it than with Didier? :)

One misconception was that Ubuntu and Canonical were ‘yet again’ using an older version of software just because. Well, maybe you are surprised now: my recommendation for Ubuntu and Canonical was actually to stay on Nautilus 3.26. With an LTS version coming, that is by far the most reasonable option. While for a regular user the upstream recommendation is to try out nemo-desktop (by the way, this is another misconception: we said nemo-desktop, not Nemo the app; for a user those are in practice two different things), for a distribution that needs to support and maintain all kinds of requests and stability promises for years, staying with a single codebase that they have already worked with is the best option.

Another misconception I saw these days is that it seems we take decisions in a rush. In short, I became Nautilus maintainer 3 years and 4 months ago. Exactly 3 years and one month ago I realized that we needed to remove that part from Nautilus. It has been quite hard to accept during these 3 years that an option upstream did not consider the experience we wanted to provide was holding back most of the major work on Nautilus, including driving away contributions from new contributors, given the poor state of the desktop part, which unfortunately impacted the whole code of the application itself. In all this time, downstreams like Ubuntu were a major reason for me to keep this code. Discussions about this decision happened all this time with the other developers of Nautilus.

And the last misconception was that it looks like GNOME devs and Ubuntu devs are in completely separate niches where no one communicates with the other. While we are usually focused on our personal tasks, when a change is going to happen we do communicate. In this case, I reached out to the desktop team at Canonical before taking the final decision, providing a draft of the blog post so they could assess the impact, and giving possible options for the LTS release of Ubuntu and beyond.

In summary, the takeaway is that while we might have slightly different visions, at the end of the day we just want to provide the best experience to users, and believe me, we do the best we can for that.

In case you have any questions, you can always reach us, Nautilus upstream, in the #nautilus IRC channel or on our mailing list.

Hope you enjoyed this read, and hopefully the benefits of this work will show soon. Thanks again Didier!

What does this mean for Ubuntu?

We thought about this as the Ubuntu Desktop team. Our next release is an LTS (Long-Term Support) version, meaning that Ubuntu 18.04 (currently named Bionic Beaver during its development) will have 5 years of support in terms of bug fixes and security updates.

It also means that most of our user audience will upgrade from our last Ubuntu 16.04 LTS to Ubuntu 18.04 LTS (or even 14.04 -> 16.04 -> 18.04!). The changes over those last 2 years are quite large in terms of software updates and new features. On top of this, those users will experience the Unity -> GNOME Shell transition for the first time, and we want to give them a feeling of comfort and familiar landmarks in our default experience, despite the huge changes underneath.

On the Ubuntu desktop, we ship a Dock, visible by default. Consequently, the desktop view itself, without any application on top, is more important in our user experience than it is in the upstream GNOME Shell default one. We think that shipping icons on the desktop is still relevant for our user base.

Where does this leave us regarding those changes? Thinking about the problem, we came to approximately the same conclusions that upstream Nautilus developers have:

  • Staying, for the LTS release, on Nautilus 3.26: the pro is that it’s battle-tested code that we already know we can support (it shipped in 17.10). This matches the fact that the LTS is a very important and strong commitment for us. The con is that it won’t be the latest and greatest upstream Nautilus release by release date, and some integrations with other parts of the GNOME 3.28 code might require more downstream work from us.
  • Using an alternative file manager for the desktop, like Nemo. Shipping entirely new code in an LTS, having to support two file managers (Nautilus for normal file browsing and Nemo for the desktop), and ensuring that the integration between those two and all other applications works well quickly ruled out that solution.
  • Upgrading to Nautilus 3.28 and shipping the PoC GNOME Shell extension, contributing to it as much as possible before release. The issue (despite this being the long-term solution) is that, as in the previous solution, we would ship entirely new code, and the extension needs new APIs from Nautilus which aren’t fully shaped yet (and maybe won’t be ready for GNOME 3.28 at all). Also, we planned long in advance (end of September for 18.04 LTS) the features and work needed for the next release, and we still have a lot to do for Ubuntu 18.04 LTS, some of it being GNOME upstream code. Consequently, rushing into coding this extension, with our Feature Freeze deadline approaching on March 1st, would mean that we either drop some initially planned features or fix fewer bugs and do less polish, which would be detrimental to our overall release quality.

As in every release, we decide on a component-by-component basis what to do (upgrade to latest or not), weighing the pros and cons and trying to take the best decision for our end-user audience. We think that most GNOME components will be upgraded to 3.28. However, in this particular instance, we decided to keep Nautilus at version 3.26 in Ubuntu 18.04 LTS. You can read the discussion that took place during our weekly Ubuntu Desktop meeting on IRC, leading to that decision.

Another pro of that decision is that it gives flavors shipping Nautilus by default, like Ubuntu Budgie and Edubuntu, a little more time to find a solution matching their needs, as they don’t run GNOME Shell and thus can’t use that extension.

The experience will thus be: desktop icons (with Nautilus 3.26) in the default Ubuntu session. The vanilla GNOME session I have talked about many times will still run Nautilus 3.26 (as we can only have one version of a piece of software in the archive and installed on users’ machines with traditional deb packages), but with icons on the desktop disabled, as we did on Ubuntu Artful. I think some motivated users will build Nautilus 3.28 in a PPA, but it won’t receive official security and bug fix support, of course.

Meanwhile, we will start contributing to the longer-term plan around this new GNOME Shell extension with the Nautilus developers: shaping a proper API, getting good drag-and-drop support and so on, progressively… This will give better long-term code, and we hope that following Ubuntu releases will be able to move to it once we reach the minimal set of features we want from it (and consequently, update to the latest Nautilus version!).

I hope that sheds some light on both the GNOME upstream and Ubuntu decisions, showing the two perspectives, why those actions were taken, and the long-term plan. Hopefully, posts like these explaining a little bit of the context will lead to informed and constructive comments as well!

Alberto Ruiz: GUADEC 2017: GNOME’s Renaissance

Mar, 23/01/2018 - 1:35pd

NOTE: This is a blog post I kept as a draft right after GUADEC to reflect on it and the GNOME project but failed to finish and publish until now. Forgive any outdated information though I think the post is mostly relevant still.

I’m on my train back to London from Manchester, where I just spent 7 amazing days with my fellow GNOME community members. Props to the local team for an amazing organization: everything went smoothly, people seemed extremely pleased with the setup as far as I can tell, and the venues worked extremely well. I mostly want to reflect on a feeling I have that GNOME is experiencing a renaissance, both in the energy and focus of the community and in the broader interest from other players.

Source: kitty-kat @ flickr CC BY-SA 4.0

Peak attendance and sponsorship

Our attendance numbers have gone up considerably from recent years: approximately 260 registrations, minus a bunch of people who could not make it in the end. That is an increase of ~50-60 attendees, which is really encouraging.

On top of that, this year’s sponsorships went up, both in the number of companies sponsoring and in their support of the event. This is really encouraging, as it shows that there is increased interest in the project and acknowledgement that GUADEC


There were two comebacks that were particularly encouraging: first, Canonical and Ubuntu community members are back, and it was great to see them participating; second, members of the Solaris Desktop team showed up too. Also, having Andrew Walton from VMware around was a good sign.

Balance was brought back to the force

Historically Red Hat would have a prominent presence at GUADEC and in GNOME in general. It was really amazing to see that Endless is now a Foundation AdBoard member, but also that the number of Endless crew at GUADEC matched that of Red Hat.

Contrary to what people may think, Red Hat does not generally enjoy being the only player in any given free software community; however, since Nokia and Sun/Oracle withdrew their heavy investment in GNOME, Red Hat was probably the most prominent player in the community until now. While Red Hat is still investing as heavily as ever in GNOME, we’re not the only major player anymore, and that is something to celebrate and to look forward to seeing expanded to other organizations.

It was particularly encouraging to see the design sessions packed with people from Endless, Canonical and Red Hat as well as many other interested individuals.


It feels to me that Flatpak, and especially the current efforts around Flathub, have helped focus the community towards a common vision for the developer and user story, and you can feel the shared excitement that we’re onto something, with implications across the stack and our infrastructure.

Obviously, not everybody shares this enthusiasm about Flatpak itself, but the broad consensus is that the model around it is worth pursuing and that it has the potential to considerably raise the viability of the Free Software desktop and personal devices, not to mention that it opens a route towards monetization of free software and Linux apps.

GitLab, Meson and BuildStream

Another batch of modernization in our stack and infrastructure. First and foremost, the GitLab migration, which we believe will not only improve the experience of newcomers and early testers but also improve our Continuous Integration pipeline.

Consensus around Meson and leaving autotools behind is another big step, and many other relevant free software projects are jumping on the bandwagon. And last but not least, Tristan and Codethink are leading an effort to consolidate GNOME Continuous and JHBuild into BuildStream, a better way to build a collection of software from multiple source repositories.

Looking ahead

I think that the vibe at GUADEC and the current state of GNOME are really exciting, and there is a lot to look forward to. My main takeaway is that the project is in an incredibly healthy state, with a rich ecosystem of companies committing entire products to what the GNOME community writes, and a commitment to solve some of the very hard remaining problems needed to make the Free Desktop a viable contender to what the industry offers right now.

Promising times ahead.

Sebastian Dröge: Speeding up RGB to grayscale conversion in Rust by a factor of 2.2 – and various other multimedia related processing loops

Dje, 21/01/2018 - 2:54md

In the previous blog post I wrote about how to write a RGB to grayscale conversion filter for GStreamer in Rust. In this blog post I’m going to write about how to optimize the processing loop of that filter, without resorting to unsafe code or SIMD instructions by staying with plain, safe Rust code.

I also tried to implement the processing loop with faster, a Rust crate for writing safe SIMD code. It looks very promising, but unless I missed something in the documentation it currently is missing some features to be able to express this specific algorithm in a meaningful way. Once it works on stable Rust (waiting for SIMD to be stabilized) and includes runtime CPU feature detection, this could very well be a good replacement for the ORC library used for the same purpose in GStreamer in various places. ORC works by JIT-compiling a minimal “array operation language” to SIMD assembly for your specific CPU (and has support for x86 MMX/SSE, PPC Altivec, ARM NEON, etc.).

If someone wants to prove me wrong and implement this with faster, feel free to do so and I’ll link to your solution and include it in the benchmark results below.

All code below can be found in this GIT repository.

Table of Contents
  1. Baseline Implementation
  2. First Optimization – Assertions
  3. First Optimization – Assertions Try 2
  4. Second Optimization – Iterate a bit more
  5. Third Optimization – Getting rid of the bounds check finally
  6. Summary
  7. Addendum: slice::split_at
Baseline Implementation

This is how the baseline implementation looks.

pub fn bgrx_to_gray_chunks_no_asserts(
    in_data: &[u8],
    out_data: &mut [u8],
    in_stride: usize,
    out_stride: usize,
    width: usize,
) {
    let in_line_bytes = width * 4;
    let out_line_bytes = width * 4;

    for (in_line, out_line) in in_data
        .chunks(in_stride)
        .zip(out_data.chunks_mut(out_stride))
    {
        for (in_p, out_p) in in_line[..in_line_bytes]
            .chunks(4)
            .zip(out_line[..out_line_bytes].chunks_mut(4))
        {
            let b = u32::from(in_p[0]);
            let g = u32::from(in_p[1]);
            let r = u32::from(in_p[2]);
            let x = u32::from(in_p[3]);

            let grey = ((r * RGB_Y[0]) + (g * RGB_Y[1]) + (b * RGB_Y[2]) + (x * RGB_Y[3]))
                / 65536;
            let grey = grey as u8;
            out_p[0] = grey;
            out_p[1] = grey;
            out_p[2] = grey;
            out_p[3] = grey;
        }
    }
}

This basically iterates over each line of the input and output frame (outer loop), and then for each BGRx chunk of 4 bytes in each line it converts the values to u32, multiplies with a constant array, converts back to u8 and stores the same value in the whole output BGRx chunk.

Note: This is only doing the actual conversion from linear RGB to grayscale (and in BT.601 colorspace). To do this conversion correctly you need to know your colorspaces and use the correct coefficients for conversion, and also do gamma correction. See this about why it is important.
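The RGB_Y coefficient table itself is not shown in this excerpt. Judging from the multipliers that appear in the assembly later (19595, 38470, 7471), it is presumably the BT.601 luma coefficients scaled by 65536, something like the following sketch:

```rust
// Presumed definition (not part of this excerpt): the BT.601 luma
// weights 0.299 R + 0.587 G + 0.114 B, scaled by 65536 so the whole
// conversion can be done in integer arithmetic, stored in R, G, B, x
// order to match the multiplication order in the function above.
const RGB_Y: [u32; 4] = [19595, 38470, 7471, 0];

// Sanity check helper: the coefficients sum to exactly 65536, so a
// pure white BGRx pixel maps to grey 255 and black maps to 0.
fn grey_of(b: u8, g: u8, r: u8, x: u8) -> u8 {
    ((u32::from(r) * RGB_Y[0]
        + u32::from(g) * RGB_Y[1]
        + u32::from(b) * RGB_Y[2]
        + u32::from(x) * RGB_Y[3])
        / 65536) as u8
}
```

The division by 65536 then undoes the scaling, which is why it compiles down to a simple shift.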

So what can be improved on this? For starters, let’s write a small benchmark for this so that we know whether any of our changes actually improve something. This is using the (unfortunately still) unstable benchmark feature of Cargo.

#![feature(test)]
#![feature(exact_chunks)]

extern crate test;

pub fn bgrx_to_gray_chunks_no_asserts(...) {
    [...]
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;
    use std::iter;

    fn create_vec(w: usize, h: usize) -> Vec<u8> {
        iter::repeat(0).take(w * h * 4).collect::<_>()
    }

    #[bench]
    fn bench_chunks_1920x1080_no_asserts(b: &mut Bencher) {
        let i = test::black_box(create_vec(1920, 1080));
        let mut o = test::black_box(create_vec(1920, 1080));

        b.iter(|| bgrx_to_gray_chunks_no_asserts(&i, &mut o, 1920 * 4, 1920 * 4, 1920));
    }
}

This can be run with cargo bench and then prints the number of nanoseconds each iteration of the closure was taking. To really measure only the processing itself, allocations and initializations of the input/output frames happen outside of the closure. We’re not interested in times for that.

First Optimization – Assertions

To actually start optimizing this function, let’s take a look at the assembly that the compiler is outputting. The easiest way of doing that is via the Godbolt Compiler Explorer website. Select “rustc nightly” and use “-C opt-level=3” for the compiler flags, and then copy & paste your code in there. Once it compiles, to find the assembly that corresponds to a line, simply right-click on the line and “Scroll to assembly”.

Alternatively you can use cargo rustc --release -- -C opt-level=3 --emit asm and check the assembly file that is output in the target/release/deps directory.

What we see then for our inner loop is something like the following

.LBB4_19:
    cmp r15, r11
    mov r13, r11
    cmova r13, r15
    mov rdx, r8
    sub rdx, r13
    je .LBB4_34
    cmp rdx, 3
    jb .LBB4_35
    inc r9
    movzx edx, byte ptr [rbx - 1]
    movzx ecx, byte ptr [rbx - 2]
    movzx esi, byte ptr [rbx]
    imul esi, esi, 19595
    imul edx, edx, 38470
    imul ecx, ecx, 7471
    add ecx, edx
    add ecx, esi
    shr ecx, 16
    mov byte ptr [r10 - 3], cl
    mov byte ptr [r10 - 2], cl
    mov byte ptr [r10 - 1], cl
    mov byte ptr [r10], cl
    add r10, 4
    add r8, -4
    add r15, -4
    add rbx, 4
    cmp r9, r14
    jb .LBB4_19

This is already quite optimized. For each loop iteration the first few instructions are doing some bounds checking and if they fail jump to the .LBB4_34 or .LBB4_35 labels. How to understand that this is bounds checking? Scroll down in the assembly to where these labels are defined and you’ll see something like the following

.LBB4_34:
    lea rdi, [rip + .Lpanic_bounds_check_loc.D]
    xor esi, esi
    xor edx, edx
    call core::panicking::panic_bounds_check@PLT
    ud2
.LBB4_35:
    cmp r15, r11
    cmova r11, r15
    sub r8, r11
    lea rdi, [rip + .Lpanic_bounds_check_loc.F]
    mov esi, 2
    mov rdx, r8
    call core::panicking::panic_bounds_check@PLT
    ud2

Also if you check (with the colors, or the “scroll to source” feature) which Rust code these correspond to, you’ll see that it’s the first and third access to the 4-byte slice that contains our BGRx values.

Afterwards in the assembly, the following steps are happening: 0) incrementing of the “loop counter” representing the number of iterations we’re going to do (r9), 1) actual reading of the B, G and R values and conversion to u32 (the 3 movzx; note that the reading of the x value is optimized away as the compiler sees that it is always multiplied by 0 later), 2) the multiplications with the array elements (the 3 imul), 3) combining of the results and division (i.e. shift) (the 2 add and the shr), 4) storing of the result in the output (the 4 mov). Afterwards the slice pointers are increased by 4 (rbx and r10) and the lengths (used for bounds checking) are decreased by 4 (r8 and r15). Finally there’s a check (cmp) to see if r9 (our loop counter) is at the end of the slice, and if not we jump back to the beginning and operate on the next BGRx chunk.

Generally what we want to do for optimizations is to get rid of unnecessary checks (bounds checking), memory accesses, conditions (cmp, cmov) and jumps (the instructions starting with j). These are all things that are slowing down our code.

So the first thing that seems worth optimizing here is the bounds checking at the beginning. It definitely doesn’t seem useful to do two checks instead of one for the two slices (the checks cover both slices at once, but Godbolt does not detect that and believes it’s only the input slice). And ideally we could teach the compiler that no bounds checking is needed at all.

As I wrote in the previous blog post, often this knowledge can be given to the compiler by inserting assertions.

To prevent two checks and just have a single check, you can insert an assert_eq!(in_p.len(), 4) at the beginning of the inner loop, and the same for the output slice. Now we only have a single bounds check left per iteration.
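Concretely, the assertions go right at the top of the inner loop. Here is a self-contained sketch, reduced to a single line of pixels for brevity (the RGB_Y table is assumed to be the BT.601 coefficients scaled by 65536, which is not shown in this excerpt):

```rust
// Assumed definition of the coefficient table (not in this excerpt).
const RGB_Y: [u32; 4] = [19595, 38470, 7471, 0];

fn gray_line(in_line: &[u8], out_line: &mut [u8]) {
    for (in_p, out_p) in in_line.chunks(4).zip(out_line.chunks_mut(4)) {
        // One check per slice instead of one per indexing operation:
        // after these, the compiler knows that every in_p[i] / out_p[i]
        // access below is in bounds and can drop the per-index checks.
        assert_eq!(in_p.len(), 4);
        assert_eq!(out_p.len(), 4);

        let grey = ((u32::from(in_p[2]) * RGB_Y[0])
            + (u32::from(in_p[1]) * RGB_Y[1])
            + (u32::from(in_p[0]) * RGB_Y[2])
            + (u32::from(in_p[3]) * RGB_Y[3]))
            / 65536;
        let grey = grey as u8;
        out_p[0] = grey;
        out_p[1] = grey;
        out_p[2] = grey;
        out_p[3] = grey;
    }
}
```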

As a next step we might want to try to move this knowledge outside the inner loop so that there is no bounds checking at all in there anymore. We might want to add assertions like the following outside the outer loop then to give all knowledge we have to the compiler

assert_eq!(in_data.len() % 4, 0);
assert_eq!(out_data.len() % 4, 0);
assert_eq!(out_data.len() / out_stride, in_data.len() / in_stride);
assert!(in_line_bytes <= in_stride);
assert!(out_line_bytes <= out_stride);

Unfortunately adding those has no effect at all on the inner loop, but having them outside the outer loop for good measure is not the worst idea so let’s just keep them. At least it can be used as some kind of documentation of the invariants of this code for future readers.

So let’s benchmark these two implementations now. The results on my machine are the following

test tests::bench_chunks_1920x1080_no_asserts ... bench: 4,420,145 ns/iter (+/- 139,051)
test tests::bench_chunks_1920x1080_asserts    ... bench: 4,897,046 ns/iter (+/- 166,555)

This is surprising, our version without the assertions is actually faster by a factor of ~1.1 although it had fewer conditions. So let’s take a closer look at the assembly at the top of the loop again, where the bounds checking happens, in the version with assertions

.LBB4_19:
    cmp rbx, r11
    mov r9, r11
    cmova r9, rbx
    mov r14, r12
    sub r14, r9
    lea rax, [r14 - 1]
    mov qword ptr [rbp - 120], rax
    mov qword ptr [rbp - 128], r13
    mov qword ptr [rbp - 136], r10
    cmp r14, 5
    jne .LBB4_33
    inc rcx
    [...]

While this indeed has only one jump as expected for the bounds checking, the number of comparisons is the same, and even worse: 3 memory writes to the stack happen right before the jump. If we follow the .LBB4_33 label we will see that the assert_eq! macro is going to do something with core::fmt::Debug. This is setting up the information needed for printing the assertion failure, the “expected X equals to Y” output. This is certainly not good and the reason why everything is slower now.

First Optimization – Assertions Try 2

All the additional instructions and memory writes were happening because the assert_eq! macro is outputting something user friendly that actually contains the values of both sides. Let’s try again with the assert! macro instead

test tests::bench_chunks_1920x1080_no_asserts ... bench: 4,420,145 ns/iter (+/- 139,051)
test tests::bench_chunks_1920x1080_asserts    ... bench: 4,897,046 ns/iter (+/- 166,555)
test tests::bench_chunks_1920x1080_asserts_2  ... bench: 3,968,976 ns/iter (+/- 97,084)

This already looks more promising. Compared to our baseline version this gives us a speedup of a factor of 1.12, and compared to the version with assert_eq! 1.23. If we look at the assembly for the bounds checks (everything else stays the same), it also looks more like what we would’ve expected

.LBB4_19:
    cmp rbx, r12
    mov r13, r12
    cmova r13, rbx
    add r13, r14
    jne .LBB4_33
    inc r9
    [...]

One cmp less, only one jump left. And no memory writes anymore!

So keep in mind that assert_eq! is more user-friendly but quite a bit more expensive even in the “good case” compared to assert!.

Second Optimization – Iterate a bit more

This is still not very satisfying though. No bounds checking should be needed at all, as each chunk is going to be exactly 4 bytes. We’re just not able to convince the compiler that this is the case. While it may be possible (let me know if you find a way!), let’s try something different. The zip iterator is done when the shorter of the two iterators is done, and there are optimizations implemented specifically for zipped slice iterators. Let’s try that and replace the grayscale value calculation with

let grey = in_p.iter()
    .zip(RGB_Y.iter())
    .map(|(i, c)| u32::from(*i) * c)
    .sum::<u32>() / 65536;

If we run that through our benchmark after removing the assert!(in_p.len() == 4) (and the same for the output slice), these are the results

test tests::bench_chunks_1920x1080_asserts_2 ... bench:  3,968,976 ns/iter (+/- 97,084)
test tests::bench_chunks_1920x1080_iter_sum  ... bench: 11,393,600 ns/iter (+/- 347,958)

We’re actually 2.9 times slower! Even when adding back the assert!(in_p.len() == 4) assertion (and the same for the output slice) we’re still slower

test tests::bench_chunks_1920x1080_asserts_2 ... bench:  3,968,976 ns/iter (+/- 97,084)
test tests::bench_chunks_1920x1080_iter_sum  ... bench: 11,393,600 ns/iter (+/- 347,958)
test tests::bench_chunks_1920x1080_iter_sum_2 ... bench: 10,420,442 ns/iter (+/- 242,379)

If we look at the assembly of the assertion-less variant, it’s a complete mess now

.LBB0_19:
    cmp rbx, r13
    mov rcx, r13
    cmova rcx, rbx
    mov rdx, r8
    sub rdx, rcx
    cmp rdx, 4
    mov r11d, 4
    cmovb r11, rdx
    test r11, r11
    je .LBB0_20
    movzx ecx, byte ptr [r15 - 2]
    imul ecx, ecx, 19595
    cmp r11, 1
    jbe .LBB0_22
    movzx esi, byte ptr [r15 - 1]
    imul esi, esi, 38470
    add esi, ecx
    movzx ecx, byte ptr [r15]
    imul ecx, ecx, 7471
    add ecx, esi
    test rdx, rdx
    jne .LBB0_23
    jmp .LBB0_35
.LBB0_20:
    xor ecx, ecx
.LBB0_22:
    test rdx, rdx
    je .LBB0_35
.LBB0_23:
    shr ecx, 16
    mov byte ptr [r10 - 3], cl
    mov byte ptr [r10 - 2], cl
    cmp rdx, 3
    jb .LBB0_36
    inc r9
    mov byte ptr [r10 - 1], cl
    mov byte ptr [r10], cl
    add r10, 4
    add r8, -4
    add rbx, -4
    add r15, 4
    cmp r9, r14
    jb .LBB0_19

In short, there are now various new conditions and jumps for short-circuiting the zip iterator in the various cases. And because of all the noise added, the compiler was not even able to optimize the bounds check for the output slice away anymore (.LBB0_35 cases). While it was able to unroll the iterator (note that the 3 imul multiplications are not interleaved with jumps and are actually 3 multiplications instead of yet another loop), which is quite impressive, it couldn’t do anything meaningful with that information it somehow got (it must’ve understood that each chunk has 4 bytes!). This looks like something going wrong somewhere in the optimizer to me.

If we take a look at the variant with the assertions, things look much better

.LBB3_19:
    cmp r11, r12
    mov r13, r12
    cmova r13, r11
    add r13, r14
    jne .LBB3_33
    inc r9
    movzx ecx, byte ptr [rdx - 2]
    imul r13d, ecx, 19595
    movzx ecx, byte ptr [rdx - 1]
    imul ecx, ecx, 38470
    add ecx, r13d
    movzx ebx, byte ptr [rdx]
    imul ebx, ebx, 7471
    add ebx, ecx
    shr ebx, 16
    mov byte ptr [r10 - 3], bl
    mov byte ptr [r10 - 2], bl
    mov byte ptr [r10 - 1], bl
    mov byte ptr [r10], bl
    add r10, 4
    add r11, -4
    add r14, 4
    add rdx, 4
    cmp r9, r15
    jb .LBB3_19

This is literally the same as the assertion version we had before, except that the reading of the input slice, the multiplications and the additions are happening in iterator order instead of being batched all together. It’s quite impressive that the compiler was able to completely optimize away the zip iterator here, but unfortunately it’s still many times slower than the original version. The reason must be the instruction-reordering. The previous version had all memory reads batched and then the operations batched, which is apparently much better for the internal pipelining of the CPU (it is going to perform the next instructions without dependencies on the previous ones already while waiting for the pending instructions to finish).

It’s also not clear to me why the LLVM optimizer is not able to schedule the instructions the same way here. It apparently has all the information it needs for that if no iterator is involved, and both versions lead to exactly the same assembly except for the order of instructions. This also seems fishy.

Nonetheless, we still have our manual bounds check (the assertion) left here and we should really try to get rid of that. No progress so far.

Third Optimization – Getting rid of the bounds check finally

Let’s tackle this from a different angle now. Our problem is apparently that the compiler is not able to understand that each chunk is exactly 4 bytes.

So why don’t we write a new chunks iterator that always yields exactly the requested number of items, instead of potentially fewer for the very last iteration. And instead of panicking if there are leftover elements, it seems useful to just ignore them. That way we have an API that is functionally different from the existing chunks iterator and provides behaviour that is useful in various cases. It’s basically the slice equivalent of the exact_chunks iterator of the ndarray crate.

By having it functionally different from the existing one, and not just an optimization, I also submitted it for inclusion in Rust’s standard library, and it’s nowadays available as an unstable feature in nightly, like all newly added API. Nonetheless, the same can also be implemented inside your code with basically the same effect; there are no dependencies on standard library internals.
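To make the mechanism concrete, a minimal read-only version of such an iterator could be sketched as follows (a simplified illustration, not the actual standard library implementation; a mutable variant would work the same way via split_at_mut):

```rust
// Simplified sketch of an exact-chunks iterator: it always yields
// chunks of exactly `chunk_size` elements and silently ignores any
// leftover elements at the end of the slice.
struct ExactChunks<'a, T: 'a> {
    remainder: &'a [T],
    chunk_size: usize,
}

impl<'a, T> Iterator for ExactChunks<'a, T> {
    type Item = &'a [T];

    fn next(&mut self) -> Option<&'a [T]> {
        if self.remainder.len() < self.chunk_size {
            // Not enough elements left for a full chunk: stop here
            return None;
        }
        let (chunk, rest) = self.remainder.split_at(self.chunk_size);
        self.remainder = rest;
        Some(chunk)
    }
}

fn exact_chunks<T>(slice: &[T], chunk_size: usize) -> ExactChunks<T> {
    assert!(chunk_size != 0);
    ExactChunks { remainder: slice, chunk_size }
}

fn main() {
    let data = [1u8, 2, 3, 4, 5, 6, 7, 8, 9];
    let chunks: Vec<&[u8]> = exact_chunks(&data, 4).collect();
    // Two full chunks; the trailing element 9 is ignored
    assert_eq!(chunks, [&[1u8, 2, 3, 4][..], &[5, 6, 7, 8][..]]);
}
```

Because every yielded chunk has the length by construction, the compiler can rely on it without any runtime checks.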

So, let’s use our new exact_chunks iterator that is guaranteed (by API) to always give us exactly 4 bytes. In our case this is exactly equivalent to the normal chunks as by construction our slices always have a length that is a multiple of 4, but the compiler can’t infer that information. The resulting code looks as follows

pub fn bgrx_to_gray_exact_chunks(
    in_data: &[u8],
    out_data: &mut [u8],
    in_stride: usize,
    out_stride: usize,
    width: usize,
) {
    assert_eq!(in_data.len() % 4, 0);
    assert_eq!(out_data.len() % 4, 0);
    assert_eq!(out_data.len() / out_stride, in_data.len() / in_stride);

    let in_line_bytes = width * 4;
    let out_line_bytes = width * 4;

    assert!(in_line_bytes <= in_stride);
    assert!(out_line_bytes <= out_stride);

    for (in_line, out_line) in in_data
        .exact_chunks(in_stride)
        .zip(out_data.exact_chunks_mut(out_stride))
    {
        for (in_p, out_p) in in_line[..in_line_bytes]
            .exact_chunks(4)
            .zip(out_line[..out_line_bytes].exact_chunks_mut(4))
        {
            assert!(in_p.len() == 4);
            assert!(out_p.len() == 4);

            let b = u32::from(in_p[0]);
            let g = u32::from(in_p[1]);
            let r = u32::from(in_p[2]);
            let x = u32::from(in_p[3]);

            let grey = ((r * RGB_Y[0]) + (g * RGB_Y[1]) + (b * RGB_Y[2])
                + (x * RGB_Y[3])) / 65536;
            let grey = grey as u8;
            out_p[0] = grey;
            out_p[1] = grey;
            out_p[2] = grey;
            out_p[3] = grey;
        }
    }
}

It’s exactly the same as the previous version with assertions, except for using exact_chunks instead of chunks, and the same for the mutable iterator. The resulting benchmark of all our variants now looks as follows

test tests::bench_chunks_1920x1080_no_asserts ... bench:   4,420,145 ns/iter (+/- 139,051)
test tests::bench_chunks_1920x1080_asserts    ... bench:   4,897,046 ns/iter (+/- 166,555)
test tests::bench_chunks_1920x1080_asserts_2  ... bench:   3,968,976 ns/iter (+/- 97,084)
test tests::bench_chunks_1920x1080_iter_sum   ... bench:  11,393,600 ns/iter (+/- 347,958)
test tests::bench_chunks_1920x1080_iter_sum_2 ... bench:  10,420,442 ns/iter (+/- 242,379)
test tests::bench_exact_chunks_1920x1080      ... bench:   2,007,459 ns/iter (+/- 112,287)

Compared to our initial version this is a speedup of a factor of 2.2, compared to our version with assertions a factor of 1.98. This seems like a worthwhile improvement, and if we look at the resulting assembly there are no bounds checks at all anymore

.LBB0_10:
	movzx edx, byte ptr [rsi - 2]
	movzx r15d, byte ptr [rsi - 1]
	movzx r12d, byte ptr [rsi]
	imul r13d, edx, 19595
	imul edx, r15d, 38470
	add edx, r13d
	imul ebx, r12d, 7471
	add ebx, edx
	shr ebx, 16
	mov byte ptr [rcx - 3], bl
	mov byte ptr [rcx - 2], bl
	mov byte ptr [rcx - 1], bl
	mov byte ptr [rcx], bl
	add rcx, 4
	add rsi, 4
	dec r10
	jne .LBB0_10

Also due to this the compiler is able to apply some more optimizations and we only have one loop counter for the number of iterations r10 and the two pointers rcx and rsi that are increased/decreased in each iteration. There is no tracking of the remaining slice lengths anymore, as in the assembly of the original version (and the versions with assertions).


So overall we got a speedup of a factor of 2.2 while still writing very high-level Rust code with iterators and not falling back to unsafe code or using SIMD. The optimizations the Rust compiler is applying are quite impressive and the Rust marketing line of zero-cost abstractions is really visible in reality here.

The same approach should also work for many similar algorithms, and thus many similar multimedia related algorithms where you iterate over slices and operate on fixed-size chunks.

Also the above shows that as a first step it’s better to write clean and understandable high-level Rust code without worrying too much about performance (assuming the compiler can optimize well), and only afterwards take a look at the generated assembly and check which instructions should really go away (like bounds checking). In many cases this can be achieved by adding assertions in strategic places, or, like in this case, by switching to a slightly different abstraction that is closer to the actual requirements (however, I believe the compiler should be able to produce the same code with the help of assertions and the normal chunks iterator; making that possible probably requires improvements to the LLVM optimizer).
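As a tiny illustration of that assertion trick (a hypothetical stand-alone example, not code from the post): one up-front assertion gives the compiler enough length information to elide the bounds checks of the individual accesses that follow:

```rust
// Hypothetical example: the single assertion up front proves to the
// compiler that indices 0..4 are in bounds, so the four indexing
// operations below need no individual bounds checks.
fn sum_first_four(data: &[u8]) -> u32 {
    assert!(data.len() >= 4);
    u32::from(data[0]) + u32::from(data[1]) + u32::from(data[2]) + u32::from(data[3])
}

fn main() {
    let buf = [10u8, 20, 30, 40, 50];
    assert_eq!(sum_first_four(&buf), 100);
}
```

The assertion still panics on short input, but the panic path moves out of line and the hot path loses four checks.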

And if all that does not help, there’s still the escape hatch of unsafe (for using functions like slice::get_unchecked() or going down to raw pointers) and the possibility of using SIMD instructions (via faster or stdsimd directly). But in the end this should be a last resort for those little parts of your code where optimizations are needed and the compiler can’t easily be convinced to do it for you.
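For completeness, an unchecked variant of the per-pixel conversion could look roughly like this (a hypothetical sketch: the function name and weights are assumptions based on the post’s RGB_Y table, and the caller must guarantee that both slices have at least 4 elements or the behaviour is undefined):

```rust
// Hypothetical unchecked variant of the per-pixel conversion.
// Safety: the caller must ensure in_p.len() >= 4 and out_p.len() >= 4.
unsafe fn bgrx_to_gray_pixel_unchecked(in_p: &[u8], out_p: &mut [u8]) {
    let b = u32::from(*in_p.get_unchecked(0));
    let g = u32::from(*in_p.get_unchecked(1));
    let r = u32::from(*in_p.get_unchecked(2));

    // Integer weights as in the post's assembly (x channel assumed 0)
    let grey = ((r * 19595) + (g * 38470) + (b * 7471)) / 65536;
    let grey = grey as u8;
    for i in 0..4 {
        *out_p.get_unchecked_mut(i) = grey;
    }
}

fn main() {
    let in_p = [0x10u8, 0x20, 0x30, 0x00]; // B, G, R, X
    let mut out_p = [0u8; 4];
    unsafe { bgrx_to_gray_pixel_unchecked(&in_p, &mut out_p) };
    assert_eq!(out_p, [34; 4]);
}
```

This trades the compiler-verified safety of the iterator versions for a manual proof obligation, which is exactly why it should stay a last resort.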

Addendum: slice::split_at

User newpavlov suggested on Reddit to use repeated slice::split_at in a while loop for similar performance.

This would for example look like

pub fn bgrx_to_gray_split_at(
    in_data: &[u8],
    out_data: &mut [u8],
    in_stride: usize,
    out_stride: usize,
    width: usize,
) {
    assert_eq!(in_data.len() % 4, 0);
    assert_eq!(out_data.len() % 4, 0);
    assert_eq!(out_data.len() / out_stride, in_data.len() / in_stride);

    let in_line_bytes = width * 4;
    let out_line_bytes = width * 4;

    assert!(in_line_bytes <= in_stride);
    assert!(out_line_bytes <= out_stride);

    for (in_line, out_line) in in_data
        .exact_chunks(in_stride)
        .zip(out_data.exact_chunks_mut(out_stride))
    {
        let mut in_pp: &[u8] = in_line[..in_line_bytes].as_ref();
        let mut out_pp: &mut [u8] = out_line[..out_line_bytes].as_mut();
        assert!(in_pp.len() == out_pp.len());

        while in_pp.len() >= 4 {
            let (in_p, in_tmp) = in_pp.split_at(4);
            let (out_p, out_tmp) = { out_pp }.split_at_mut(4);
            in_pp = in_tmp;
            out_pp = out_tmp;

            let b = u32::from(in_p[0]);
            let g = u32::from(in_p[1]);
            let r = u32::from(in_p[2]);
            let x = u32::from(in_p[3]);

            let grey = ((r * RGB_Y[0]) + (g * RGB_Y[1]) + (b * RGB_Y[2])
                + (x * RGB_Y[3])) / 65536;
            let grey = grey as u8;
            out_p[0] = grey;
            out_p[1] = grey;
            out_p[2] = grey;
            out_p[3] = grey;
        }
    }
}

Performance-wise this brings us very close to the exact_chunks version

test tests::bench_exact_chunks_1920x1080 ... bench:   1,965,631 ns/iter (+/- 58,832)
test tests::bench_split_at_1920x1080     ... bench:   2,046,834 ns/iter (+/- 35,990)

and the assembly is also very similar

.LBB0_10:
	add rbx, -4
	movzx r15d, byte ptr [rsi]
	movzx r12d, byte ptr [rsi + 1]
	movzx edx, byte ptr [rsi + 2]
	imul r13d, edx, 19595
	imul r12d, r12d, 38470
	imul edx, r15d, 7471
	add edx, r12d
	add edx, r13d
	shr edx, 16
	movzx edx, dl
	imul edx, edx, 16843009
	mov dword ptr [rcx], edx
	lea rcx, [rcx + 4]
	add rsi, 4
	cmp rbx, 3
	ja .LBB0_10

Here the compiler even optimizes the storing of the value into a single write operation of 4 bytes, at the cost of an additional multiplication and zero-extend register move.
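The magic constant 16843009 is 0x01010101: multiplying the grey byte by it broadcasts the byte into all four lanes of a 32-bit word, which can then be stored with one 4-byte write. Written out manually in Rust, the trick looks like this (an illustrative sketch, not code from the post):

```rust
// Broadcast one byte into all four bytes of a u32 by multiplying
// with 0x01010101, allowing a single 4-byte store instead of four
// 1-byte stores.
fn broadcast_grey(grey: u8) -> [u8; 4] {
    let word = u32::from(grey) * 0x0101_0101;
    word.to_ne_bytes()
}

fn main() {
    assert_eq!(broadcast_grey(0xAB), [0xAB; 4]);
}
```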

Overall this code performs very well too, but in my opinion it looks rather ugly compared to the versions using the different chunks iterators. Also this is basically what the exact_chunks iterator does internally: repeatedly calling slice::split_at. In theory both versions could lead to the very same assembly, but the LLVM optimizer currently handles the two slightly differently.

Christian Hergert: Builder happenings for January

Sht, 20/01/2018 - 12:37md

I’ve been very busy with Builder since returning from the holidays. As mentioned previously, we’ve moved to gitlab. I’m very happy about it. I can see how this is going to improve engagement and communication within our existing community and help us retain new contributors.

I made two releases of Builder so far this month. That included both a new stable build (which flatpak users are already using) and a new snapshot for those on developer operating systems like Fedora Rawhide.

The vast majority of my work this month has been on stabilization efforts. Builder is already a very large project. Every moving part we add makes this Rube Goldberg machine just a bit more difficult to maintain. I’ve tried to focus my time on things that are brittle and either improve or replace the designs. I’ve also fixed a good number of memory leaks and safety issues. However, the memory overhead of clang sort of casts a large shadow on all that work. We really need to get clang out of process one of these days.

Over the past couple years, our coding style evolved thanks to new features like g_autoptr() and friends. Every time I come across old-style code during my bug hunts, I clean it up.

Builder learned how to automatically install Flatpak SDK Extensions. These can save you a bunch of time when building your application if you have a complex stack. Things like Rust and Mono can just be pulled in and copied into your app rather than compiled from source on your machine. In doing so, every app that uses the technology can share objects in the OSTree repository, saving disk space and network transfer.

That allowed me to create a new template, a GNOME C♯ template. It uses the Mono SDK extension and gtk-sharp for 3.x. If you want to help here, work on an omni-sharp language server plugin for us!

A new C++ template using Gtkmm was added. Given that I don’t have a lot of recent experience with Gtkmm, it’d be nice to have someone from that community come in and make sure things are in good shape.

I also did some cleanup on our code-indexer to avoid threading in our API. Creating plugins on threads turned out to be rather disastrous, so now we try extra hard to keep things on the main thread with the typical async/finish function pairs.

I created a new messages panel to elevate warnings to the user without them having to run Builder from a terminal. If you want an easy project to work on, we need to go find interesting calls to g_warning() and use ide_context_warning() instead.

Our flatpak plugin now tries extra hard to avoid downloads. Those were really annoying for people when opening builder. It took some troubleshooting in flatpak-builder, and that is fixed now.

In the process of fixing the extraneous downloading I realized we could start bundling flatpak-builder with Builder. After a couple of fixes to flatpak-builder Builder Nightly no longer requires flatpak-builder on the host. That’s one less thing to go wrong for people going through the newcomers work-flow.

We just landed the beginning of a go-langserver plugin. It seems like the language server for Go is pretty new though. We only have a symbol resolver thus far.

I found a fun bug in Vala that caused const gchar * const * parameters to async functions to turn into gchar **, int. It was promptly fixed upstream for us (thanks Rico).

Some 350 commits have landed this month so far, most of them around stabilizing Builder. It’s a good time to start playing with the Nightly branch if you’re into that.

Oh, and after some 33 years on Earth, I finally needed glasses. So I look educated now.

Jim Hall: Programming with ncurses

Pre, 19/01/2018 - 12:33pd
Over at Linux Journal, I am writing an article series about programming on Linux. While graphical user interfaces are very cool, not every program needs to run with a point-and-click interface. So in my "Getting started with ncurses" article series, I discuss how to write programs using the ncurses library functions.

Maybe you aren't familiar with curses or ncurses, but I guarantee you've run programs that use this library. Many programs that run in "terminal" mode, including the vi editor, use the curses set of functions to draw to the screen. The curses functions allow you to put text anywhere on the screen, or read from the keyboard.

My article series starts with a simple example that demonstrates how to put characters and text on the screen. My example program is a chaos game iteration of Sierpinski's Triangle, which is a very simple program (only 73 lines).

Follow-up articles in the series will include a "Quest" program to demonstrate how to query the screen and use the arrow keys, and how to add colors.

Linux Journal has posted the second part of my article series: Creating an adventure game in the terminal using ncurses. Soon to come: The same adventure game, using colors!

Morten Welinder: Security From Whom, Indeed

Enj, 18/01/2018 - 3:05md

So Spectre and Meltdown happened.

That was completely predictable, so much so that I, in fact, did predict that side-channel attacks, including those coming via javascript run in a browser, were the thing to look out for. (This was in the context of pointing out that pushing Wayland as a security improvement over plain old X11 was misguided.)

I recall being told that such attacks were basically nation-state level due to cost, complexity, and required target information. How is that prediction working out for you?

Philip Chimento: Announcing Flapjack

Mër, 17/01/2018 - 9:54md

Here’s a post about a tool that I’ve developed at work. You might find it useful if you contribute to any desktop platform libraries that are packaged as a Flatpak runtime, such as GNOME or KDE.

Flatpak is a system for delivering desktop applications that was pioneered by the GNOME community. At Endless, we have jumped aboard the Flatpak train. Our product Endless OS is a Linux distribution, but not a traditional one in the sense of being a collection of packages that you install with a package manager; it’s an immutable OS image, with atomic updates delivered through OSTree. Applications are sandboxed-only and Flatpak-only.

Flatpak makes life much easier for application developers who want to get their applications to users without having to care which Linux distribution those users use. It means that as an application developer, I don’t have to fire up three different virtual machines and email five packaging contributors whenever I make a release of my application. (Or, in theory it would work that way if I would stop using deprecated libraries in my application!)

This is what flapjacks are in the UK, Ireland, Isle of Man, and Newfoundland. Known as “granola bars” or “oat bars” elsewhere. By Alistair Young, CC BY 2.0,

On my work computer I took the leap and now develop everything on an immutable OSTree system just like it would be running in production. I now develop everything inside a Flatpak sandbox. However, while Flatpak works great when packaging some code that already exists, it is a bit lacking in the developer experience.

For app developers, Carlos Soriano has written a tool called flatpak-dev-cli based on a workflow designed by Thibault Saunier of the Pitivi development team. This has proven very useful for developing Flatpak apps.

But a lot of the work I do is not on apps, but on the library stack that is used by apps on Endless OS. In fact, my team’s main product is a Flatpak runtime. I wanted an analogue of flatpak-dev-cli for developing the libraries that live inside a Flatpak runtime.


…while this is what flapjacks are everywhere else in Canada, and in the US. Also known as “pancakes.” By Belathee Photography, CC BY-SA 3.0,

Flapjack is that tool. It’s a wrapper around Flatpak-builder that is intended to replace JHBuild in the library developer’s toolbox.

For several months I’ve been using it in my day-to-day work, on a system running Endless OS (which has hardly any developer tools installed by default). It only requires Flatpak-builder, Git, and Python.

In Flapjack’s README I included a walkthrough for reproducing Tristan’s trick from his BuildStream talk at GUADEC 2017 where he built an environment with a modified copy of GTK that showed all the UI labels upside-down.

That walkthrough is pretty much what my day-to-day development workflow looks like now. As an example, a recent bug required me to patch eos-knowledge-lib and xapian-glib at the same time, which are both components of Endless’s Modular Framework runtime. I did approximately this:

flapjack open xapian-glib
flapjack open eos-knowledge-lib
cd checkout/xapian-glib
# ... make changes to code ...
flapjack test xapian-glib
# ... keep changing and repeating until the tests pass!
cd ../eos-knowledge-lib
# ... make more changes to code ...
flapjack test eos-knowledge-lib
# ... keep changing and repeating until the tests pass!
flapjack build
# ... keep changing and repeating until the whole runtime builds!
flapjack run com.endlessm.encyclopedia.en
# run Encyclopedia, which is an app that uses this runtime, to check
# that my fix worked
git checkout -b etc. etc.
# create branches for my work and push them

I also use Flapjack’s “devtools manifest” to conveniently provide developer tools that aren’t present in Endless OS’s base OSTree layer. In Flapjack’s readme I gave an example of adding the jq tool to the devtools manifest, but I also have cppcheck, RR, and a bunch of Python modules that I added with flatpak-pip-generator. Whenever I need to use any of these tools, I just open flapjack shell and they’re available!

Questions you might ask

Why is it called Flapjack?

The working title was jokingly chosen to mess up your muscle memory if you were used to typing flatpak, but it stuck and became the real name. If it does annoy you, you can alias it to fj or something.

Flatpak-builder is old news, why does Flapjack not use BuildStream?

I would like it if that were the case! I suspect that BuildStream would solve my main problem with Flapjack, which is that it is slow. In fact I started out writing Flapjack as a wrapper around BuildStream, instead of Flatpak-builder. But at the time BuildStream just didn’t have enough documentation for me to get my head around it quickly enough. I hear that this is changing and I would welcome a port to BuildStream!

As well, it was not possible to allow --socket=x11 during a build like you can with Flatpak-builder, so I couldn’t get it to run unit tests for modules that depended on GTK.

Why are builds with Flapjack so slow?

The slowest parts are caching each build step (I suspect here is where using BuildStream would help a lot) and exporting the runtime’s debug extension to the local Flatpak repository. For the latter, this used to be even slower, before my colleague Emmanuele Bassi suggested to use a “bare-user” repository. I’m still looking for a way to speed this up. I suspect it should be possible, since for Flapjack builds we would probably never care about the Flatpak repository history.

Can I use Flapjack to develop system components like GNOME Shell?

No. There still isn’t a good developer story for working on system components on an immutable OS! At Endless, the people who work on those components will generally replace their OSTree file system with a mutable one. This isn’t a very good strategy because it means you’re developing on a system that is different from what users are running in production, but I haven’t found any better way so far.


Thanks to my employer Endless for allowing me to reserve some time to write this tool in a way that it would be useful for the wider Flatpak community, rather than just internally.

That’s about it! I hope Flapjack is useful for you. If you have any other questions, feel free to ask me.

Where to find it

Flapjack’s page on PyPI:
The code on GitHub:
Report bugs and request features:

Andy Wingo: instruction explosion in guile

Mër, 17/01/2018 - 11:30pd

Greetings, fellow Schemers and compiler nerds: I bring fresh nargery!

instruction explosion

A couple years ago I made a list of compiler tasks for Guile. Most of these are still open, but I've been chipping away at the one labeled "instruction explosion":

Now we get more to the compiler side of things. Currently in Guile's VM there are instructions like vector-ref. This is a little silly: there are also instructions to branch on the type of an object (br-if-tc7 in this case), to get the vector's length, and to do a branching integer comparison. Really we should replace vector-ref with a combination of these test-and-branches, with real control flow in the function, and then the actual ref should use some more primitive unchecked memory reference instruction. Optimization could end up hoisting everything but the primitive unchecked memory reference, while preserving safety, which would be a win. But probably in most cases optimization wouldn't manage to do this, which would be a lose overall because you have more instruction dispatch.

Well, this transformation is something we need for native compilation anyway. I would accept a patch to do this kind of transformation on the master branch, after version 2.2.0 has forked. In theory this would remove most all high level instructions from the VM, making the bytecode closer to a virtual CPU, and likewise making it easier for the compiler to emit native code as it's working at a lower level.

Now that I'm getting close to finished I wanted to share some thoughts. Previous progress reports on the mailing list.

a simple loop

As an example, consider this loop that sums the 32-bit floats in a bytevector. I've annotated the code with lines and columns so that you can correspond different pieces to the assembly.

   0       8   12     19
 +-v-------v---v------v-
 |
1| (use-modules (rnrs bytevectors))
2| (define (f32v-sum bv)
3|   (let lp ((n 0) (sum 0.0))
4|     (if (< n (bytevector-length bv))
5|         (lp (+ n 4)
6|             (+ sum (bytevector-ieee-single-native-ref bv n)))
7|         sum)))

The assembly for the loop before instruction explosion went like this:

L1:
  17    (handle-interrupts)             at (unknown file):5:12
  18    (uadd/immediate 0 1 4)
  19    (bv-f32-ref 1 3 1)              at (unknown file):6:19
  20    (fadd 2 2 1)                    at (unknown file):6:12
  21    (s64<? 0 4)                     at (unknown file):4:8
  22    (jnl 8)                         ;; -> L4
  23    (mov 1 0)                       at (unknown file):5:8
  24    (j -7)                          ;; -> L1

So, already Guile's compiler has hoisted the (bytevector-length bv) and unboxed the loop index n and accumulator sum. This work aims to simplify further by exploding bv-f32-ref.

exploding the loop

In practice, instruction explosion happens in CPS conversion, as we are converting the Scheme-like Tree-IL language down to the CPS soup language. When we see a Tree-IL primcall (a call to a known primitive), instead of lowering it to a corresponding CPS primcall, we inline a whole blob of code.

In the concrete case of bv-f32-ref, we'd inline it with something like the following:

(unless (and (heap-object? bv)
             (eq? (heap-type-tag bv) %bytevector-tag))
  (error "not a bytevector" bv))

(define len (word-ref bv 1))
(define ptr (word-ref bv 2))

(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))

(f32-ref ptr idx)

As you can see, there are four branches hidden in the bv-f32-ref: two to check that the object is a bytevector, and two to check that the index is within range. In this explanation we assume that the offset idx is already unboxed, but actually unboxing the index ends up being part of this work as well.

One of the goals of instruction explosion was that by breaking the operation into a number of smaller, more orthogonal parts, native code generation would be easier, because the compiler would only have to know about those small bits. However without an optimizing compiler, it would be better to reify a call out to a specialized bv-f32-ref runtime routine instead of inlining all of this code -- probably whatever language you write your runtime routine in (C, rust, whatever) will do a better job optimizing than your compiler will.

But with an optimizing compiler, there is the possibility of removing possibly everything but the f32-ref. Guile doesn't quite get there, but almost; here's the post-explosion optimized assembly of the inner loop of f32v-sum:

L1:
  27    (handle-interrupts)
  28    (tag-fixnum 1 2)
  29    (s64<? 2 4)                     at (unknown file):4:8
  30    (jnl 15)                        ;; -> L5
  31    (uadd/immediate 0 2 4)          at (unknown file):5:12
  32    (u64<? 2 7)                     at (unknown file):6:19
  33    (jnl 5)                         ;; -> L2
  34    (f32-ref 2 5 2)
  35    (fadd 3 3 2)                    at (unknown file):6:12
  36    (mov 2 0)                       at (unknown file):5:8
  37    (j -10)                         ;; -> L1

good things

The first thing to note is that unlike the "before" code, there's no instruction in this loop that can throw an exception. Neat.

Next, note that there's no type check on the bytevector; the peeled iteration preceding the loop already proved that the bytevector is a bytevector.

And indeed there's no reference to the bytevector at all in the loop! The value being dereferenced in (f32-ref 2 5 2) is a raw pointer. (Read this instruction as, "sp[2] = *(float*)((byte*)sp[5] + (uintptr_t)sp[2])".) The compiler does something interesting; the f32-ref CPS primcall actually takes three arguments: the garbage-collected object protecting the pointer, the pointer itself, and the offset. The object itself doesn't appear in the residual code, but including it in the f32-ref primcall's inputs keeps it alive as long as the f32-ref itself is alive.

bad things

Then there are the limitations. Firstly, instruction 28 tags the u64 loop index as a fixnum, but never uses the result. Why is this here? Sadly it's because the value is used in the bailout at L2. Recall this pseudocode:

(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))

Here the error ends up lowering to a throw CPS term that the compiler recognizes as a bailout and renders out-of-line; cool. But it uses idx as an argument, as a tagged SCM value. The compiler untags the loop index, but has to keep a tagged version around for the error cases.

The right fix is probably some kind of allocation sinking pass that sinks the tag-fixnum to the bailouts. Oh well.

Additionally, there are two tests in the loop. Are both necessary? Turns out, yes :( Imagine you have a bytevector of length 1025. The loop continues until the last ref at offset 1024, which is within bounds of the bytevector, but there's only one byte available at that point, so we need to throw an exception there. The compiler did as good a job as we could expect it to do.

is it worth it? where to now?

On the one hand, instruction explosion is a step sideways. The code is more optimal, but it's more instructions. Because Guile currently has a bytecode VM, that means more total interpreter overhead. Testing on a 40-megabyte bytevector of 32-bit floats, the exploded f32v-sum completes in 115 milliseconds compared to around 97 for the earlier version.

On the other hand, it is very easy to imagine how to compile these instructions to native code, either ahead-of-time or via a simple template JIT. You practically just have to look up the instructions in the corresponding ISA reference, is all. The result should perform quite well.

I will probably take a whack at a simple template JIT first that does no register allocation, then ahead-of-time compilation with register allocation. Getting the AOT-compiled artifacts to dynamically link with runtime routines is a sufficient pain in my mind that I will put it off a bit until later. I also need to figure out a good strategy for truly polymorphic operations like general integer addition; probably involving inline caches.

So that's where we're at :) Thanks for reading, and happy hacking in Guile in 2018!