I have a server at home. It runs a Kubernetes cluster and a few services. I want to expose them to the Internet, so I can e.g. share public links from my Nextcloud, or synchronize my Kobo reader with Grimmory. But I don't want to expose my home IP to the world, and I want to have some reasonable protection against unsophisticated DoS attacks.
I realized that I can achieve that with a cheap VPS that acts as a front, HAProxy, and Wireguard.
I rented a tiny VPS for €4/month at Infrawire (1 vCPU, 2 GB RAM, 25 GB NVMe). I installed Debian 13 on it, because I want that front server to be as stable and low maintenance as possible, and installed the Debian-packaged HAProxy onto it. I also installed Wireguard. The VPS has a publicly accessible IP, so it will be my Wireguard server: my server at home can reach the VPS to establish a tunnel, but the opposite is not true.
On my k3s node, I've installed Wireguard as well. I configured Wireguard on the VPS and my k3s node to establish a tunnel between the two. I've also bound the sshd on my VPS to the Wireguard address. Infrawire offers a console so I can unstick myself if I lock myself out of my own server (e.g. by misconfiguring Wireguard on either side, or if my server at home has a failure).
I pointed all my DNS records to the VPS. The HAProxy is a "dumb" tcp forwarder, so I can keep operating like before on my cluster. In particular, HAProxy doesn't do TLS termination. My certificates are fetched on my cluster by cert-manager like before, using the http-01 challenge and Let's Encrypt. I could also move to dns-01 challenges, but http-01 just works and lets me switch to a registrar without an API if need be.
That way, I don't need a fixed IP at home, and I don't have to do any port-forwarding from my home router to my k3s cluster. Even better: the VPS has anti-DDoS protection included, and I can also configure HAProxy to refuse too many connections from the same IP, to close TCP connections that take too long to establish, and more. If my VPS gets hammered, I can still access my services from within my home network.
It's probably not your fault.
On a cache miss, there are two things a reverse proxy (which Fastly is to us) can do. It can make the client wait until the proxy itself fetches the requested content and then serve it, with subsequent requests being served from the cache. From a user's perspective, it means staring at a "hung" process, and people tend not to be understanding when a program is stuck seemingly doing nothing.
Instead, the proxy can stream the response from the origin, caching it at the end. This makes the client receive the data right away, although it's not without drawbacks.
In a streaming setup like Flathub's, an all-MISS path adds some upstream latency before the first byte, but also limits the download speed to what the slowest link can deliver. As we don't run servers in the same datacenter or on a single backbone network, the hop from Fastly through the caching proxy to the master server incurs a penalty that may affect how quickly the data gets back.
In order to cache files larger than 20MB, Fastly expects customers who use streaming misses to use segmented caching. Anything larger than that gets broken down into smaller chunks. When Fastly wants the data from us, it will add a Range header specifying which bytes we should respond with. Fastly will then serve the request after reconstructing the file from various chunks. Our caching proxies also use the value of the Range header in the caching key to avoid requesting the full file over and over again from the master server as well.
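To make the segmentation concrete, here is a minimal Python sketch of how a large object could be split into byte ranges and how the Range value could become part of a cache key. The 1 MiB segment size, the `|` separator in the key, and the function names are all assumptions for illustration; neither Fastly's actual segment size nor our proxies' real key layout is described here.

```python
from typing import List, Tuple

# Hypothetical segment size, chosen only for illustration.
SEGMENT_SIZE = 1024 * 1024


def segment_ranges(file_size: int, segment_size: int = SEGMENT_SIZE) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) byte offsets that cover the whole file."""
    if file_size < 0 or segment_size <= 0:
        raise ValueError("file_size must be >= 0 and segment_size must be > 0")
    return [
        (start, min(start + segment_size, file_size) - 1)
        for start in range(0, file_size, segment_size)
    ]


def range_header(start: int, end: int) -> str:
    """Format the value of an HTTP Range header for one segment."""
    return f"bytes={start}-{end}"


def cache_key(path: str, start: int, end: int) -> str:
    """Hypothetical cache key: the request path plus the Range value."""
    return f"{path}|{range_header(start, end)}"
```

Each segment gets its own key, so a miss on one chunk only pulls that chunk from the layer behind it instead of the whole file.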
While great for caching, many concurrent range MISSes can turn what would be a sequential file read into scattered, random reads. It wouldn't matter with SSD or NVMe, but as the repository is stored on HDDs, when combined with streaming misses, it can turn cold transfer speed into min(network bottleneck, ZFS random-read bottleneck).
Counterintuitively, you may improve your download speeds by aborting the ongoing Flatpak operation and starting it again. While the initial request was slow, there's a non-zero chance it went through all the caching layers and it will become a cache hit in the meantime.
Flatpak
Let's talk Flatpak. When installing or upgrading applications, Flatpak will try to use delta files. A typical delta is an update file that contains only the difference between versions. There are also from-scratch deltas, which effectively are an archive with all files required to install an app from scratch, thus the name.
Flathub generates a single upgrade delta and a from-scratch delta for the latest version. Delta generation is an expensive process in terms of disk reads and writes, but also disk space. Because our ZFS setup isn't exactly the fastest, generating more delta files also affects how quickly we can publish an update. Yes, in theory we could be doing this out of band but we don't. In hindsight, Titanic wasn't unsinkable after all.
What happens if you are not updating often enough? A lot of suffering. Flatpak will download each missing file between the version you are on and the one you want to upgrade to, separately. This is an almost certain cache miss causing even more random seeks on the master server. At this point Flatpak would be better off downloading the from-scratch delta but it can't. The behaviour is controlled by OSTree, which doesn't offer any knobs to affect it. It is the right choice if the goal is to limit the bandwidth used by the client to fetch updates, but an incredibly bad one for anyone on a reliable connection; downloading a single large file is almost always faster than fetching multiple smaller ones.
What do? Some brave soul could fix OSTree to apply a better heuristic on when to use from-scratch deltas for upgrades, or at least make it expose an API that lets Flatpak choose. For the rest of us mere mortals, we can only update regularly or wait patiently for the update to finish.
I often write about how when stuff works well, you take it for granted.
It’s true for technology: when’s the last time you hit a compiler bug in GCC? Once upon a time these were a common thing and you had to choose your C compiler wisely. Yet I haven’t recently seen an article that says “GCC is going great”.
It’s true for people too. When someone does an excellent job maintaining an open source project, they do occasionally get some gratitude, but if they do a bad job, it’s amazing how quickly the negative comments pile up in the issue tracker, many of them taking subtle or not-so-subtle digs at the project owners. Maybe we created this situation for ourselves by having a prominent “report issue” button but no corresponding “send flowers to the maintainer” button.
On that note, a hat tip to Carlos Garnacho for all his work on the Localsearch extractor sandbox, which recently got a shout out for its “extremely strong” design.
(It’s worth noting that Localsearch also stopped using GStreamer to parse media files altogether, which the discussion in that thread missed. We love GStreamer but it isn’t the right tool for metadata scanning. The 3.9 and 3.10 series use libav/ffmpeg instead, but given that US software patent laws make it tricky for USA folk to distribute that, the plan is to move to using MediaInfoLib.)
Fairphone 5
It’s coming up to two years since I switched to a Fairphone 5. The real proof of this device will be in 2033 when I manage ten years of using the same phone.
Meanwhile, I recently had some issues with it not charging via the USB-C port. I thought it might be a bit tricky to fix, but it really is easy: buy the replacement part (about 20€), take off the back cover, remove a few small screws and switch over the whole USB port + speaker unit.
I hear some fellow Android users complaining about Alphabet/Google’s intrusive AI integration. Apparently the power button is now the AI button? I use the stock Android, and I know vendors have their hands tied somewhat by Alphabet/Google, so it’s worth noting that disabling the AI integration on the Fairphone 5 is a single config setting.
I’d be interested to know more about the kernel version as it is old as hell. I guess this is a vendor/Android thing, and hopefully most of the many known vulnerabilities in this old version of Linux are mitigated by sandboxing higher up in Android. If you’re a high risk cybercrime target then I would definitely not recommend using the vendor Android OS on this device. (Probably best to avoid Android altogether if this is your situation!)
So it’s not perfect, but I just wanted to shout out again that there are some good people doing good work here. If only all smartphones were built like this one.
Korg Minilogue XD
One reason I’m not writing much about open source software is that I’m spending a lot of my time outside work making music in various guises, these days mainly as part of soon to be huge Galician disco revival group Muaré. This band needs a website, so in future I don’t have to link you to Instagram, but you know how the world is at the moment. We do at least have a Bandcamp page.
When it comes to music gear, I seem to be a Yamaha guy. It’s amazing actually that the same company that made my trombone also makes excellent digital pianos, and if and when I need a motorbike, Yamaha also sells those.
When it comes to synths though I’ve been really enjoying the Korg Minilogue XD. It’s cheap, built like a tank and it’s ten years old so there are plenty of second hand models around. It’s not fucking Behringer (please don’t give money to Behringer). It’s simple and sounds great.
But most impressively, it supports plugins via a freely available SDK. You can develop your own custom digital oscillators and effects for this thing and deploy them over USB. Of all major pro audio manufacturers, Korg are the only company I know to support this. So even though the hardware is now 10 years old, it can still learn new tricks, and there is an active scene of both free and commercial plugins for the platform. Perhaps the most active commercial outfit is Sinevibes. There is, of course, reddit. The SDK is not truly open source (and few things in pro audio ever are) but it’s free from any licensing fees, and the whole thing is sat here in a Git repo. Pretty good.
If I’d had more time to prepare I might have a video here of some cool Minilogue XD tunes I made. But I guess you’ll have to wait til next month for that. Until then!
I’ve been putting together varlink-glib, which is a library for writing Varlink clients and services in C. The basic idea is to keep the transport policy out of the library. You get a connected GIOStream however you want, whether that is GLib networking, socket activation, or something more specialized, and then wrap it in a VarlinkClientProtocol or VarlinkServerProtocol.
The API is built around DexFuture, which makes the async parts feel a lot nicer in C than the usual callback layering. Client calls return a future, server replies return a future, and internally the protocol can use fibers for the work that wants to look sequential while still integrating with the GLib main context. This is very much the style of API I want for system services: explicit enough that you can debug it, but not so painfully manual that every call site becomes a state machine.
    future = example_calc_add_call_invoke (proxy, call, VARLINK_METHOD_CALL_DEFAULT, add_reply_cb, NULL, NULL);
There is also a varlink-codegen tool which takes a .varlink interface and generates typed C wrappers for it. That gives you proxy objects, server skeletons, call objects, reply objects, and error constants instead of making every application hand-roll JSON. You can still drop down to VarlinkMessage, VarlinkMessageBuilder, and VarlinkMessageReader for forwarding or weird infrastructure cases, but most code should get to stay typed.
File descriptor passing works when the transport is a Unix socket connection. This follows the same general model as systemd’s Varlink support: the JSON payload contains an integer index, while the actual descriptors are sent out-of-band with SCM_RIGHTS and attached to the containing message as a GUnixFDList. Generated code can continue to treat the field as an integer, while the actual descriptor list stays attached to the underlying VarlinkMessage.
    fd_list = g_unix_fd_list_new ();
    fd_index = g_unix_fd_list_append (fd_list, fd, &error);
    varlink_message_set_unix_fd_list (parameters, fd_list);
Protocol upgrades are supported too. A method call can ask to upgrade the connection, and once the final successful reply is sent, Varlink stops being valid on that connection. The VarlinkProtocol is still a GIOStream, so the next protocol can continue reading and writing through the same object. That keeps the handoff explicit without requiring a separate transport abstraction.
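For readers who have not seen SCM_RIGHTS before, the descriptor-passing model described above can be sketched outside of varlink-glib with nothing but the Python standard library (3.9 or newer, Unix only). This is not the varlink-glib API; the method name `org.example.Demo.Take` and the `fd` field name are made up for illustration. The point is only that the JSON body carries an integer index while the descriptor itself travels as ancillary data on the same Unix socket.

```python
import json
import os
import socket


def send_message_with_fd(sock: socket.socket, method: str, fd: int) -> None:
    """Send one NUL-terminated JSON message plus one descriptor out-of-band."""
    # The payload refers to the descriptor by its index (0) in the attached list.
    payload = json.dumps({"method": method, "parameters": {"fd": 0}}).encode() + b"\0"
    # socket.send_fds() wraps sendmsg() with an SCM_RIGHTS control message.
    socket.send_fds(sock, [payload], [fd])


def recv_message_with_fds(sock: socket.socket, max_fds: int = 4):
    """Receive one message and the descriptors that arrived alongside it."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 4096, max_fds)
    message = json.loads(data.rstrip(b"\0"))
    return message, fds
```

A receiver resolves the field with `fds[message["parameters"]["fd"]]`, which mirrors how generated code can keep treating the field as a plain integer. A real implementation would also need to handle partial reads and several messages in one buffer, which this sketch ignores.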
I also wired in optional Sysprof capture support. When enabled, client and server RPCs can show up as Sysprof marks with useful bits like method name, result, reply count, one-way, multi-reply, upgrade, Varlink error, and GError details. That matters because once you have concurrent calls, generated dispatch through VarlinkServerRegistry, and services doing real work, “it got slow somewhere” is not enough information.
There is still more polish to do, but the shape is there: typed generated APIs for normal users, low-level message APIs for infrastructure, DexFuture for async flow, Unix FD passing for the system service cases, protocol upgrades for handoff cases, and profiler hooks so it can be debugged when the happy path stops being happy.
Numbers
So what does that look like performance-wise, you might ask? For a very simple Echo interface in both D-Bus and Varlink you can get a rough estimate. No daemons, just serialization on a socketpair(). I haven’t started performance tuning yet so there may be ground to make up on both sides. But the answer is that the testcase for varlink-glib is about 5x faster than the testcase for GDBusConnection in either synchronous or asynchronous modes.
This doesn’t apply to all use-cases of D-Bus of course. But for a specific case I use it for (P2P IPC between peer processes), it is a pretty big difference.
A small personal note: as I wrote in my recent update from France, I am no longer employed by Red Hat. Work like this is currently self-funded, out of pocket, while my family and I settle into a new chapter. If you find it useful, a note of encouragement or a contribution means a lot right now. It helps make it possible to keep improving the free software infrastructure many of us rely on.
Some more great news: I’m pleased to announce that HP has also agreed to be premier sponsor for the Linux Vendor Firmware Service (LVFS) as part of our sustainability effort.
With the industry support from HP (and our existing sponsors of Lenovo, Dell, Framework, OSFF and of course Linux Foundation and Red Hat) we can turbo-charge the growth of the LVFS even more. Thanks!
Good evening, friends. Tonight I have a few loosely-knit stories.
soot
A couple years ago, my house was heated by a condensing gas boiler. It was awful from both an environmental and a geopolitical perspective: environmental, as I would emit somewhere around 2.5 tons of CO2 equivalent per year to heat my home, which compares poorly to the target total CO2e emissions of 2 tons per year per person; and geopolitical, because although France gets 40% of its gas from Norway, with whom we have no beef, all the rest is a problem in some way. (Algeria, 10%, is the least of my worries; the 20% each for Russia and the US are the most, followed by 10% for the Gulf states.)
Still, natural gas is better than fuel oil, which we had at my former rental house. It is a lamentably visceral experience to call up the fuel provider and say, yes, s’il vous plaît, can you drive a diesel-powered tanker truck out to my house, unroll the hose, and pour out 1500 liters of toxic fuel oil into a tank under my garden. Yes, I will just burn it all. Sure, see you again next year.
Some friends of mine recently had their fuel boiler die, which is itself an experience: one of them came over to visit, completely covered in soot, saying that the chimneysweep (whom he also has to call every year) said that his boiler is on its way out, that the chimney is completely clogged, and now because of the cleaning his basement is also covered in soot; awful. What to replace it with? Apparently despite the prohibition on new fuel-oil boiler installs, it might be possible to just install a new one; or they could hook up to natural gas from the street; or they could install a heat pump. Which to do?
To all these questions there is a moral answer, which we can phrase in terms of CO2 emissions and localized PM2.5 pollution, and it is always and everywhere to stop burning things. But fortunately we don’t need to rely only on moralism: electrification is just better, in essentially all ways. Owning and operating an electric car is a better experience than a petrol car. Induction stoves are better than gas; I know, I did not believe this for the longest time, but I was wrong. The experience of using a heat pump is pretty much equivalent to gas, so it’s a harder sell, but it is a relief to no longer have a pressurized methane tube connected to my house.
In the end, I think my neighbors are going to go for the heat pump, despite the 20k€ price tag, labor included. (Oddly, I think the deciding factor was that my neighbor confessed to having had a long chat with an AI chatbot, after which she felt she had a good understanding of the proposed solution and its tradeoffs; make of that what you will!)
solar
In late November I got some brave lads to install nineteen solar panels on my roof. Each of these magic rectangles can make up to 500W of power in optimal conditions, but my house faces south, with the roof inclined east and west, so it’s unlikely that I will ever hit the full 9.5 kW of potential power.
December was... very dark. The panels produced a total of 145 kWh over the month, but I used 1250 kWh of electricity, essentially all to run the heat pump. I live in a basin that is mostly covered by low clouds from November to February, and slanty photons couldn’t make much headway through the fog. The house is well-insulated (20-25 cm of wood-fiber exterior insulation on sides, 40 under the roof, though it is an old house with a few less-insulated bits), so it’s not that I am leaking lots of heat, and I have a combination of low-temperature floor heating and low-temperature radiators, so it’s not that I’m running the heat pump inefficiently to generate a too-high output temperature; it’s just, you know, cold in winter. A typical day would be between 1 and 5 degrees C. Cold; cold and dark.
Things got a little better in January: 285 kWh produced, though the heating needs are higher than in December, with 1450 kWh total consumed. In February we grew to 419 kWh produced, for 850 kWh consumed. In March we equalized, with about 850 kWh produced and consumed, but although the bulk of my consumption in this month is for heating, the “need” to heat overnight meant that I consume from the grid overnight, but feed in to the grid during the day. I have a small battery (7 kWh), but it’s not enough to store the “excess” electricity generated in a day; I should probably arrange to have the system heat only during the day in these months, to avoid taking from the grid.
With practically no heating needs now, as you can imagine, I am just feeding a lot of excess to the grid. We’re halfway through May, just coming through a cold snap (the peasant lore is that we just passed the saints de glace, the date you need to wait for to plant crops that aren’t frost-hardy), but still we’ve produced more than twice as much as we’ve consumed (550 kWh vs 220 kWh), and essentially all the excess goes to the grid. The 7 kWh battery is quite enough to cover night-time electricity needs.
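The monthly figures above are easier to compare as a ratio of production to consumption. A small Python sketch, using only the numbers quoted in the text; the `coverage` name is mine, and note that a ratio of monthly totals says nothing about timing, which is exactly the March problem of drawing from the grid at night while feeding in during the day.

```python
# (produced_kwh, consumed_kwh), copied from the figures quoted in the text.
MONTHS = {
    "December": (145, 1250),
    "January": (285, 1450),
    "February": (419, 850),
    "March": (850, 850),
    "May (first half)": (550, 220),
}


def coverage(produced_kwh: float, consumed_kwh: float) -> float:
    """Ratio of energy produced to energy consumed over the same period."""
    return produced_kwh / consumed_kwh


def summarize(months: dict) -> dict:
    """Map each period to its production/consumption ratio, rounded to 2 places."""
    return {name: round(coverage(p, c), 2) for name, (p, c) in months.items()}


if __name__ == "__main__":
    for name, ratio in summarize(MONTHS).items():
        print(f"{name:>18}: {ratio:.2f}x")
```

Run as a script, it prints one line per period, from roughly a tenth of consumption in December up to two and a half times consumption in the first half of May.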
I didn’t know before, but often a solar panel installation doesn’t work when the grid is down. This is because the inverters that convert the DC from the panels to AC for the house need to match phase with the grid, and if the grid’s phase signal is down, they stop. It’s also for safety, so that line workers can repair downed lines without worrying that every house is a live wire. I spent a little extra to install a cutout that allows the house to run in “island mode” if the grid is down. We almost never have that situation here, but it seemed prudent to take precautions if we were going all-in on electricity.
When you buy a solar installation, you can either have little DC/AC inverters attached to the back of each panel (microinverters), or feed DC from all panels wired in series (they call them strings; there may be 2 or 3 of them in a home setup) to a central inverter. I have the latter. The panels happen to be assembled locally by MaviWatt, though surely the cells themselves are from China. My panels are installed on top of the ceramic roof tiles with little clips and an aluminum structure. (It used to be that sometimes panels would replace tiles and become the roof. That’s not done so much any more here.) Installation is, like, 60% of the price of solar. Often you need scaffolding, though my installers just used ladders; perhaps living in the mountains where I am, there are more people used to doing ropes and rock-climbing and such. I don’t think they took as much care of themselves as they should, though.
My inverter is made by Huawei (SUN2000), as is my battery and the cutout (“backup”) box. Some batteries have their own microinverter, allowing them to consume and produce AC, but this one is DC, hence the need to have the same brand as the inverter. It sends all my electricity usage data to China or something, so that it can send it to the app on my phone. It’s not ideal from a geopolitical perspective but it is good kit.
sedimentation
Although we haven’t hit the height of summer yet, I would like to offer a few observations that have precipitated out of solution.
Firstly, at least in my house, the baseline load without heating is pretty low: 200 or 300 watts or so. (I didn’t know this before looking at Huawei’s app.) We have a recently renovated, not tiny, but otherwise normal sort of house with, you know, the usual lot of modern conveniences, idle chargers plugged in here and there, and also my work computers and such, and it all runs on less than a handful of the old 60W bulbs. That’s interesting.
As far as actual load goes, there are only a few things that count: heating, when it’s cold; it can easily average 2 kW on a cold day. Plug in the electric car (I don’t have a wall box yet, just the mains plug), that’s another kilowatt. I hardly drive, though, so it’s not a huge load. Using hot water is perhaps the most surprising thing: it can cause a spike up to 6 kW, over a short time, despite the heat coming from the heat pump; probably there is some tuning to do there. The oven and stove are little tiny blips. There’s the kettle, but it’s also a little blip. Nothing else matters: not the dishwasher, not the washing machine, nothing. You can leave the lights on all day and it just doesn’t matter.
Call me naïve, but I had hoped that solar would help my electricity usage in winter. This is simply not the case. Though the heat pump is efficient, there does not appear to be a magical energy solution for December, which is the bulk of my energy usage. My electricity bill is fixed-rate: 20 cents per kWh used. Using 4000 kWh or so from the grid over winter costs me 800€; annoying. I don’t have a natural before-and-after experiment as we added on to the house as we were renovating, but for context, in my previous poorly-insulated rental house that was half the size of this one, we’d pay 2000€ or so per year for heating oil. Perhaps I can lower the 800€ via variable-rate metering, to let the battery do some arbitrage, but there are some fundamental constraints that can’t be finagled away.
When I got my solar panels, I was resigned to never getting peak power, as they are on two different sections of the roof. It turns out that doesn’t matter: firstly, because 9.5 kW is a lot of power, as you can appreciate from the numbers above. I could never do anything with 9 kW. But secondly, because power isn’t equally valuable at different times of the day: by having east and west roof pitches, I can start producing earlier and continue producing later than if I had, say, a flat roof with panels tilted to the south. And the morning and the evening are the peak hours both for my house and for the grid, so that lets me consume more of my local production both when I need it and when the grid is under higher stress.
I was interested to hear that Alec Watson of Technology Connections had reservations about residential rooftop solar. I found a video in which he explains his perspective, which has a delightfully socialist character. His beef is partly due to the net metering scheme in some parts of the US, in which each kWh fed to the grid makes your meter run backwards; Watson finds it unfair, because it lets those wealthy households who have the capital to install solar to opt out of paying for the grid, which is a social good. In some cases, these households actually capture a part of what consumers pay for the grid, unlike industrial producers who are paid wholesale rates that don’t include transmission. Also, he finds it less efficient overall to install solar panels on houses rather than in bigger solar parks; each euro that society allocates to solar would go farther if we pooled them together.
Both points are interesting, but I would offer a couple responses. Firstly, at least in Europe, net metering is not really a thing; we have smart meters and I hear from friends in Portugal that there can even be a charge for grid injection at some times, if the grid is overloaded. France’s case is a bit weirder; I wouldn’t have gotten as large a system as I did, but there was a government program to offer a fixed buyback rate of 7 cents per kWh, stable for 20 years, if you installed more than 9 kW of panels. But given the lack of solar in December, I still pay the grid when I need energy the most.
Putting solar panels on roofs is indeed less efficient than putting them on a field. But, we are not in a situation of scarce solar panels: China could make another 350 GW of panels this year if there were demand. An incentive like the 7-cent buyback rate encourages capital allocation to solar, effectively calling these panels into existence. The bank loans me 20k€ at 4%, and the elimination of 3000 kWh that I would have bought from the grid in a year plus the 9000 kWh that I sell to the grid covers the cost entirely, and I get a life insurance policy on the remaining principal. It’s not a great investment financially but it doesn’t cost me anything either.
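For what it’s worth, the yearly arithmetic behind that paragraph can be sketched in a few lines of Python. It uses only numbers given in the text (20 cents per kWh bought, 7 cents per kWh sold, 3000 kWh avoided, 9000 kWh sold, a 20k€ loan at 4%); the function names are mine. The loan’s term and repayment schedule are not stated, so the sketch stops at first-year interest rather than guessing a monthly payment.

```python
GRID_PRICE_CENTS = 20        # fixed rate paid per kWh bought from the grid
BUYBACK_CENTS = 7            # fixed rate received per kWh fed into the grid
AVOIDED_KWH = 3000           # yearly grid purchases replaced by own production
SOLD_KWH = 9000              # yearly production sold to the grid
LOAN_PRINCIPAL_EUR = 20_000
LOAN_RATE_PERCENT = 4


def annual_value_cents(avoided_kwh: int, sold_kwh: int) -> int:
    """Yearly value of the installation in cents: avoided purchases plus sales."""
    return avoided_kwh * GRID_PRICE_CENTS + sold_kwh * BUYBACK_CENTS


def first_year_interest_eur(principal_eur: int, rate_percent: int) -> int:
    """Simple interest on the full principal for one year, in whole euros."""
    return principal_eur * rate_percent // 100


if __name__ == "__main__":
    value_eur = annual_value_cents(AVOIDED_KWH, SOLD_KWH) // 100
    interest_eur = first_year_interest_eur(LOAN_PRINCIPAL_EUR, LOAN_RATE_PERCENT)
    print(f"yearly value: {value_eur} EUR, first-year interest: {interest_eur} EUR")
```

Working in integer cents keeps the sums exact instead of accumulating floating-point noise.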
sin
As a person with a conscience, I have always experienced questions of energy as questions of sin; to leave a light on is not simply inefficient but a moral failing. Each kilometer a car travels on fossil fuel carries with it a quantum of guilt and must be justified in some way, otherwise a moral stain attaches.
Solar panels and electrification change all this. 8 or 9 months out of the year, I live in a world of abundance: the electrical generation capacity that I have called into existence is free, clean, and much, much more than I need. Owning and operating a car still has externalities, but the emissions and cost aspects are entirely gone. It’s a funny feeling, and disorienting.
I grew up in the south of the US, where everyone has air conditioning. I came to see it as sinful, too; burning things and making emissions just so you could be a bit more comfortable. I haven’t lived in air conditioning since then, but it does get hot in summer, and I would be more comfortable if I could pump heat out of my house. Now I can. I have excess power available right when air conditioning (or, in my case, floor cooling) is needed. On a societal level, solar plus air conditioning is going to be a key part of keeping our cities liveable while we ride out higher temperatures.
‘centers
It is with a sense of dissonance, then, that I have been experiencing Datacenter Discourse™: there is a lingering language of sin proceeding from an environmentalism born in penury, in a world in which every kilowatt-hour is precious and scarce. If China has unallocated capacity for another 350 GW of panels this year, why stress about a few GW of datacenters?
Of course, there are many aspects to these AI datacenters, but today I am just thinking about energy. Given that each GW of datacenter places extra demand on a grid, equivalent to 3 million times my home’s baseline load, or maybe 300 thousand of its winter load, if society wants this kind of datacenter to be a thing, it needs to add that amount of clean energy to the grid, with adequate battery storage to even out supply. We should, as a society, require this via legislation, because the market seems only too happy to use natural gas or even coal if it is marginally cheaper. At least if the datacenter boom busts, we’d be left with more clean energy production.
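The household comparison is a one-line division, sketched here in Python. The 300 W baseline is the upper end of the 200 to 300 W figure given earlier; the per-home winter load is not stated directly, so the second function works backwards from the “300 thousand” figure to the average load it implies. Function names are mine.

```python
DATACENTER_W = 1_000_000_000   # 1 GW
BASELINE_W = 300               # upper end of the 200-300 W baseline load


def homes_equivalent(datacenter_w: float, home_w: float) -> float:
    """How many homes drawing home_w watts add up to the datacenter's draw."""
    return datacenter_w / home_w


def implied_home_load_w(datacenter_w: float, homes: float) -> float:
    """Average per-home load implied by a given number of equivalent homes."""
    return datacenter_w / homes


if __name__ == "__main__":
    print(round(homes_equivalent(DATACENTER_W, BASELINE_W)))     # about 3.3 million
    print(round(implied_home_load_w(DATACENTER_W, 300_000)))     # about 3.3 kW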
Conversely... and I don’t think I’m going too far here, but causing new fossil generation to come online in 2026, or even prolonging the life of existing generation, should result in the state confiscating all property of those responsible. (I have moderated my previous position, which was hanging.) Such people are not fit to live in society, so society should not allow them to own things.
Anyway. I think that those of us that wish “AI” were not a thing are losing the battle, and that we should prepare to fall back to more defensible positions; otherwise we risk a rout. A requirement to bring additional clean capacity online in sufficient amounts should be a baseline ask when it comes to datacenters. We have the productive capacity in the form of solar panels, at an affordable price, more than enough space in terms of the existing cropland that is inefficiently turned into ethanol to burn, batteries are a thing, and we just lack the political will to turn what could be into what is.
And as for AI datacenters themselves: there are enough aspects to argue about as it is. We do ourselves a disservice by weighing down the Discourse with outdated ideas of what is and isn’t possible.
There is one specific way in which non-corporate open source projects typically document how their infrastructure works: not at all, and Flathub is no different. The full picture likely lives only in my brain, and while it could be sorted out by anyone (especially in this LLM age, yay or nay), why should it only be me thinking at night about all the single points of failure?
Like any system that evolved naturally, it's all over the place. It's tempting to tell its history chronologically, but even then, it's difficult to find a good entry point. Instead, this post focuses on what happens when users call flatpak install; later entries will cover the website and, finally, the build infrastructure. Buckle up!
CDN, caching proxies, the master server
The secret of making computers work well is to have them not do anything at all, and that's the story behind serving Flathub's OSTree repository. Content-addressed objects are extremely cacheable as they are immutable, offloading the effort to the CDN provider.
When the client connects to dl.flathub.org, you can be certain it hits some layer of cache. Almost all the heavy-lifting is done by Fastly. At the peak, when both EMEA and North America are awake and at computers, 50 million requests per hour are cache hits served by Fastly's infrastructure, with a modest 20 million being misses passed down to our servers. There would be no Flathub without Fastly; Fastly does it completely for free, not even for fake Internet points as we are incredibly bad at highlighting what our sponsors do for us.
You can't do enough cache, and so various Fastly servers talk to a Fastly-managed shield server, which caches the most requested objects to avoid spilling over too much to us. For legit cache misses, the request will be served by one of 8 caching proxies we are running at different VPS providers. We use a consistent hashing director at Fastly, which picks the backend based on the path being requested. In the past, we used a dumb round-robin, but as a result each caching proxy had its own independent copy of the working set, wasting disk space and producing a higher miss rate against the master server. Hashing by URL makes the fleet behave like one big cache instead of N copies.
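To make the "one big cache instead of N copies" point concrete, here is a minimal Python sketch. It is not Fastly's director: it uses rendezvous hashing, one simple consistent-hashing scheme, as a stand-in, and the proxy names and object paths are made up.

```python
import hashlib
from itertools import cycle

# Hypothetical proxy names; only the count (8) comes from the post.
PROXIES = [f"proxy-{i}" for i in range(8)]


def pick_by_path(path: str, backends: list[str]) -> str:
    """Rendezvous hashing: score every backend for this path, take the highest."""
    def score(backend: str) -> int:
        digest = hashlib.sha256(f"{backend}|{path}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(backends, key=score)


def copies_per_object(assignments: list[tuple[str, str]]) -> float:
    """Average number of distinct proxies that end up holding each path."""
    holders: dict[str, set[str]] = {}
    for path, backend in assignments:
        holders.setdefault(path, set()).add(backend)
    return sum(len(b) for b in holders.values()) / len(holders)


# Made-up object paths, each requested 7 times.
paths = [f"/repo/objects/{i:02x}/example.filez" for i in range(50)]
requests = paths * 7

rr = cycle(PROXIES)
round_robin = [(p, next(rr)) for p in requests]
hashed = [(p, pick_by_path(p, PROXIES)) for p in requests]

print("round-robin copies per object:", copies_per_object(round_robin))
print("path-hashed copies per object:", copies_per_object(hashed))
```

With path hashing every path always lands on the same proxy, so the second figure is 1.0 by construction, while round-robin spreads repeated requests for the same object over several proxies. Dropping a proxy from the list only moves the paths that proxy was serving.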
These days, the caching proxy fleet consists of 3 servers at Mythic Beasts, 2 servers at AWS, another 2 at NetCup and a single server at DigitalOcean. We don't collect overly detailed metrics, but on average, each proxy serves around 1 TB/month back to Fastly and pulls roughly 5 TB/month from origin. With only 100 GB of disk space per proxy against a multi-TB working set, we're not so much caching the long tail as smoothing it. In the ideal world, we would be retaining much more data at this layer, but it's not the world we live in.
Each of these servers is running the latest stable Debian release. The requests are served by the usual nginx setup with proxy_cache enabled. There is some custom Lua code for invalidating certain paths after publishing new builds finishes (spoilers!). Vanilla nginx doesn't support the PURGE method, and third-party modules like ngx_cache_purge have not seen any maintenance for over 10 years. In the end, it was more maintainable to write Lua code to calculate the caching key of a URL and then run os.remove to "purge" it from the cache.
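The path calculation can be sketched outside nginx as well. The following Python is an illustration, not the Lua running on the proxies: it assumes a proxy_cache_path configured with levels=1:2 and treats the cache key as an opaque string. Under those assumptions nginx names the cache file after the MD5 of the key and nests it in directories taken from the end of that hash. The cache key and the temporary directory standing in for the cache root are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_file_for(key: str, cache_root: str, levels: tuple[int, ...] = (1, 2)) -> Path:
    """Map a cache key to the file nginx would use under the cache root."""
    digest = hashlib.md5(key.encode()).hexdigest()
    parts = []
    end = len(digest)
    # Each level takes characters from the end of the hash, working backwards.
    for width in levels:
        parts.append(digest[end - width:end])
        end -= width
    return Path(cache_root, *parts, digest)


def purge(key: str, cache_root: str) -> bool:
    """Delete the cached file for key; return False if nothing was cached."""
    try:
        cache_file_for(key, cache_root).unlink()
        return True
    except FileNotFoundError:
        return False


# Demo against a throwaway directory standing in for the cache root.
with tempfile.TemporaryDirectory() as root:
    key = "/repo/summary"  # hypothetical cache key
    cached = cache_file_for(key, root)
    cached.parent.mkdir(parents=True)
    cached.write_bytes(b"stale")
    first = purge(key, root)   # file existed, so it is removed
    second = purge(key, root)  # nothing left to remove
    print(cached.relative_to(root), first, second)
```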
There's also a systemd timer for refreshing the Fastly IP allowlist. We used to expose these servers publicly, but a vision of everything crumbling down due to a DDoS attack kept me awake at night so this had to change.
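As an illustration of what such a refresh job might do, here is a Python sketch that turns Fastly's published IP ranges into nginx allow directives. The nginx-based allowlist and the function names are assumptions; the post does not say how the list is enforced. Fastly publishes its ranges as JSON with addresses and ipv6_addresses keys at the URL below.

```python
import ipaddress
import json
from urllib.request import urlopen

FASTLY_IP_LIST_URL = "https://api.fastly.com/public-ip-list"


def fetch_ranges(url: str = FASTLY_IP_LIST_URL) -> dict:
    """Download the published ranges (needs network access; not called here)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def render_allowlist(payload: dict) -> str:
    """Render an nginx snippet that allows only the listed networks."""
    networks = payload.get("addresses", []) + payload.get("ipv6_addresses", [])
    lines = []
    for cidr in networks:
        # ip_network raises ValueError on anything that is not a valid CIDR,
        # so a malformed response never reaches the generated config.
        lines.append(f"allow {ipaddress.ip_network(cidr)};")
    lines.append("deny all;")
    return "\n".join(lines) + "\n"


# Documentation-only ranges used as sample input, not real Fastly addresses.
sample = {"addresses": ["192.0.2.0/24"], "ipv6_addresses": ["2001:db8::/32"]}
print(render_allowlist(sample), end="")
```

A timer-driven job would call fetch_ranges(), write the rendered snippet to a file included by the server block, and reload nginx only if the content changed.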
On the far end of this setup sits a lonely physical server living in one of Mythic Beasts' datacenters. This is The Server holding the entire Flathub repo on the ZFS equivalent of RAID10: two 2-disk mirror vdevs across which ZFS stripes data. There is more nuance to this setup, but the ultimate advantage is that we can tolerate a disk failure in each of the mirrors, while being less taxing to resilver after a swap. The entire reachable data set is around 4TB of data, with the remaining 6TB unused. There will be more about the repository maintenance later on!
Ironically, it's the only server running Ubuntu. At the time, it was the easiest way to have support for ZFS readily available. We could re-provision it to Debian, but on the other hand, what for? It works fine that way. It has survived at least 2 major upgrades between LTS-es; if it ain't broke, don't fix it.
The master server itself has to be partially public as it's where new builds are being uploaded. It no longer exposes the raw Flathub repository, for the same reason the caching proxies don't. This is accomplished with Tailscale and a lightweight ACL config ensuring caching proxies can talk only to the HTTP server running on the main repo server and vice versa (for issuing PURGE requests). Yes, all involved parties have public IP addresses assigned, so this could technically be a pure WireGuard setup, but I prefer to make this someone else's concern, especially given how generous Tailscale's free plan is.
It's not much, but it's honest work. For how little we have, the file-serving half of Flathub's infrastructure works unreasonably well. Stay tuned for part 2!
A little while ago, my colleague Sebastian started complaining about OOMs caused by Evolution taking up tens of gigabytes of memory. We discussed using sysprof to debug it, but it was too busy a time for Sebastian to set aside a few hours to do that.
Funnily enough, the most efficient fix at the time was to buy more RAM, since rust-analyzer was also causing OOM issues.
A few weeks went by. Restarting Evolution had become a daily ritual for Sebastian.
Then, on a whim, I decided investigating this might be a good test for an LLM.
I updated my Evolution git repo, built it, and started up Claude Code in the source root. This was the only prompt I supplied:
Find memory leaks in Evolution, current sourcedir. Particularly leaks that could accumulate over several hours. A colleague has a leak that slowly accumulates memory usage to several GB over the course of a day, requiring a restart of Evolution. That is the main focus, but we can fix other leaks in the process.

I wish I was lying, but that was all Claude Code needed to find the problem: Evolution just needed to call malloc_trim(0) from time to time.
I refused to believe it at first. I was only convinced when we saw the memory drop after running:

gdb -p $(pidof evolution) -batch -ex "call malloc_trim(0)" -ex detach
This seems absurd! Doesn't glibc reclaim freed memory from time to time?
Yes, it does. It calls sbrk() to do that. However, sbrk() can only reclaim free memory at the top of the heap, since it simply moves the program break downward to do so. malloc_trim(0) calls sbrk() and then also calls madvise(..., MADV_DONTNEED) on the free pages, which allows the kernel to reclaim them.
So if you have 10GB of unused memory followed by 4 bytes allocated at the top of the heap, your RSS is >10GB, even if you're using a few hundred megs. Till you call malloc_trim(0).
Note that you can only get into this situation if you have hundreds of thousands of small allocs/deallocs happening repeatedly. If your alloc is larger than glibc's mmap threshold (128KB by default), mmap() is used for the allocation, and none of this applies.
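For experimenting without gdb, glibc's malloc_trim() can also be called in-process. A minimal sketch using Python's ctypes follows; trim_heap is a made-up helper name, and on a C library without malloc_trim (musl or macOS, for example) it simply returns None. It trims the heap of the Python process that runs it, so it demonstrates the call, not Evolution's actual fix.

```python
import ctypes
import ctypes.util


def trim_heap(pad: int = 0):
    """Call glibc's malloc_trim(pad).

    Returns 1 if memory was handed back to the kernel, 0 if not,
    or None when the C library does not provide malloc_trim.
    """
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(pad)


print("malloc_trim returned:", trim_heap())
```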
Coincidentally, GLib's use of GSlice for GObject allocations was masking this issue in the past, but GSlice has been a no-op for some time now (for good reasons). Ideally, Evolution should not be using GObject for such ephemeral objects.
Lesson learned: if you have memory usage issues and you suspect fragmentation, try malloc_trim(0) before you go thinking about fancy allocators.
Last summer I wrote up Octopus Agile Prices For Linux, a small GTK app to show the current Octopus Agile electricity price and the next day of half-hourly rates. It did one thing, which is a good number of things for a desktop utility to do.
Since then the app has become a bit less narrow: it now does enough that the launch post undersells it and, in a couple of places, sends people looking for the wrong name.
The app is now called Agile Rates. The application ID is still com.nedrichards.octopusagile, because changing stable app IDs is not exciting for anyone, but the name changed because Agile is no longer the whole story. Thanks to code from Andy Piper, it can also work with Octopus Go and Intelligent Go tariffs. Intelligent Go needs an API key because those prices are account-specific, but plain Agile and Go can still be set up manually.
That was the first larger change: setup had to become a thing.
The original app assumed you knew your tariff and region, or at least were willing to rummage in preferences until the graph stopped being wrong. That is fine for a scratch-your-own-itch project and a bit rude for an app on Flathub. The current version opens with a setup assistant. You can connect an Octopus account with an API key and account number, in which case the app tries to detect the active electricity tariff. Or you can keep it simple and choose the tariff and region manually.
The second change is the one I actually use most: finding the cheapest slot.
The launch version showed a graph and left the planning to the human. That works for quick glances, but most of my real questions are more specific:
When should the dishwasher run?
When should the washing machine run?
Is there a cheap three-hour block before tomorrow afternoon?

So there is now a “find cheapest time” tool. Pick a duration and it searches the available forecast window for the cheapest continuous block. The chart now scrolls to the chosen time instead of making you squint along the bars like you are reading a very dull railway timetable.
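The search itself is a classic sliding window. Here is a minimal Python sketch of the idea, not the app's own code: cheapest_block is a hypothetical name, the prices are made up, and the slots are assumed to be consecutive half-hours with no gaps.

```python
from datetime import datetime, timedelta


def cheapest_block(rates, slots_needed):
    """Find the cheapest run of consecutive slots.

    rates: list of (start, price_p_per_kwh) for back-to-back half-hour slots.
    Returns (start, average_price), or None if the window does not fit.
    """
    if slots_needed <= 0 or slots_needed > len(rates):
        return None
    window = sum(price for _, price in rates[:slots_needed])
    best_total, best_index = window, 0
    for i in range(slots_needed, len(rates)):
        # Slide by one slot: add the new price, drop the oldest one.
        window += rates[i][1] - rates[i - slots_needed][1]
        if window < best_total:
            best_total, best_index = window, i - slots_needed + 1
    return rates[best_index][0], best_total / slots_needed


start = datetime(2026, 1, 1, 0, 0)
prices = [20.0, 18.0, 12.0, 9.0, 8.0, 11.0, 25.0, 30.0]  # made-up p/kWh
rates = [(start + timedelta(minutes=30 * i), p) for i, p in enumerate(prices)]

best_start, average = cheapest_block(rates, slots_needed=2)  # a one-hour block
print(best_start.strftime("%H:%M"), average)  # 01:30 8.5
```

Each slot is visited once, so a day or two of half-hourly rates is trivially cheap to scan on every refresh.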
The graph itself has had a lot of tweaks. It has grid lines, clearer day boundaries, better current-price highlighting, less terrible dark-mode contrast, and layout rules that behave on narrower screens. The preferences window and main window are adaptive now too. Handy if you split your screen or have a Linux phone.
The biggest recent addition is usage history. If you connect an account, the app can fetch recent smart meter consumption data, cache it locally, and show a Usage view. That includes kWh history, a seven-day trend, an estimated monthly usage figure, and charts. It also tries to estimate spend by matching historical usage to tariff rates and standing charges.
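The spend estimate boils down to multiplying each half-hour of consumption by the unit rate that applied at the time, then adding the daily standing charge. A rough Python sketch under those assumptions follows; the function name, the dictionary layout, and all figures are hypothetical and not taken from the app.

```python
def estimate_spend(usage_kwh, unit_rates_p, standing_charge_p_per_day, days):
    """Estimate spend in pounds from half-hourly usage and rates.

    usage_kwh and unit_rates_p are dicts keyed by the same slot identifiers.
    Slots with usage but no known rate are reported instead of guessed.
    """
    energy_pence = 0.0
    unmatched = []
    for slot, kwh in usage_kwh.items():
        rate = unit_rates_p.get(slot)
        if rate is None:
            unmatched.append(slot)
            continue
        energy_pence += kwh * rate
    total_pence = energy_pence + standing_charge_p_per_day * days
    return total_pence / 100, unmatched


usage = {"00:00": 0.5, "00:30": 0.25, "01:00": 0.4}  # kWh per slot, made up
rates = {"00:00": 20.0, "00:30": 10.0}               # p/kWh, made up
pounds, missing = estimate_spend(usage, rates, standing_charge_p_per_day=50.0, days=1)
print(pounds, missing)  # 0.625 ['01:00']
```

Returning the unmatched slots keeps the estimate honest: a gap in the tariff history shows up as missing data, not as free electricity.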
Underneath that, the project has become more like a real small application. There are unit tests for pricing, tariff selection, adaptive layout, usage insights, and historical cost calculation. The development Flatpak manifest runs the Meson tests inside the GNOME SDK, which catches the class of bugs where the host Python environment was accidentally being too kind. Ruff is in the loop for linting. The app moved to the GNOME 50 runtime. Screenshots, AppStream metadata, branding colours, and icons have all been tidied up.
So the current description is: Agile Rates is a small GNOME app for UK Octopus Energy customers who want current and upcoming smart tariff rates, a cheap-time finder, and, if they connect their account, recent usage and estimated spend history. It is independent and is not affiliated with, endorsed by, or sponsored by Octopus Energy. I hope you find it useful.
A slightly more collected version of what were originally 18 Signal messages. This is a simplification. I am evidently no expert in Unicode specifically or text encoding in general.
I, for a long time, believed that while many modern standards are a mess of legacy compatibility built on legacy compatibility, Unicode was an exception. That the only compromise it made was ASCII-compatibility, but even that wasn’t such a big one given that its character set is the most common one in computing even to this day. I was wrong.
I got a US keyboard so now I have 2 different ways of typing accented characters. I can either hold the A key until I get an option of à, á, â, ä, ǎ, etc., or I can press ⌥ E and then A to get to á, combining ´ and a regular a. I started wondering… when typing it one way or the other, the results must be different, right? I looked for a website that showed me what code points I was typing, and… they were the same?
Most systems (the OS/browser in this case) normalize all text either one way or the other. In this case, to a single code point. Unicode does have deprecation, so you would think that when they introduced combining characters, they would have deprecated the precomposed versions of characters that can be written using them, right? Nope!
It’s arbitrary which way each system normalizes text. Some do it composed (á) and some decomposed (a + ◌́). Both are part of the standard. And of course, you need to treat them as equivalent when not normalized so you might as well do it when you can anyway.
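Python's unicodedata module makes the equivalence easy to check. This is only an illustration of the two normalization forms discussed above, NFC (composed) and NFD (decomposed):

```python
import unicodedata

composed = "\u00e1"     # á as one precomposed code point
decomposed = "a\u0301"  # a followed by COMBINING ACUTE ACCENT

# The raw strings differ even though they render identically.
print(composed == decomposed)                                # False

# Normalizing either way makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

print([f"U+{ord(c):04X}" for c in composed])                 # ['U+00E1']
print([f"U+{ord(c):04X}" for c in decomposed])               # ['U+0061', 'U+0301']
```

This is why comparing or searching user-supplied text usually starts with normalizing both sides to the same form.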
Precomposed characters are the legacy solution for representing many special letters in various character sets. In Unicode, they were included for compatibility with early encoding systems […].
From Precomposed character - Wikipedia

Oh well, my day is ruined. My new life goal is advocacy for the deprecation of all precomposed characters… or maybe I should just accept that all computing will be plagued by backwards compatibility headaches ’til the end of time.
I'm a sucker for pixel art and very constrained music grooveboxes. While I'm not into chiptunes, they sure are a cultural phenomenon.
You heard me boast about the Dirtywave M8 numerous times, even in person, because it's my tool of choice for producing and performing music. Its genius lies in high sound quality and a workflow that grew out of the tiny screen and button constraints on the Nintendo Gameboy, the platform of choice for an app called LSDJ, which the M8 is modelled after. That, and the sheer number of sound engines living in your pocket. Building on the shoulders of giants and all.
The small M8 community has a few 'celebrities', such as Ess Mattisson. I first heard of Ess when I ran into an amazing single channel track called Wertstoffe. Ess has a great pedigree as the creator of the original Digitone FM synthesizer while working at Elektron. FM remains his forte, and after creating numerous plugins through Fors, he has now released a little 2-operator FM synth and sequencer for the platform of the future, Nintendo Gameboy Advance.
What makes FMS a bit crazy is what it's doing under the hood. The Gameboy Advance has no FM synthesis hardware at all. Its audio gives you two Direct Sound DMA channels of 8-bit signed PCM — that's 256 amplitude levels, roughly 48 dB of dynamic range. For comparison, a CD has 96 dB, in much finer fidelity. The CPU is an ARM7TDMI running at 16.78 MHz with 256 KB of RAM, and that's where all the FM math happens. Sine waves, modulation, mixing four channels, all in real time, in software, on a chip from 2001 that was designed to shuffle sprites around. The hiss you hear is just part of the deal: quantization noise from that 8-bit DAC. So few amplitude steps means everything that comes out has this fuzzy, slightly crushed quality. You can't get rid of it. It is the sound. And somehow there are four channels of 2-operator FM synthesis in there, each with envelopes and ratio control. On a Gameboy Advance.
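The dynamic range figures follow from 20·log10(2^bits), and the crushed quality comes from rounding every sample to one of 256 levels. A small Python sketch of both follows. It uses floating-point math for clarity, which is not how FMS works on the GBA's CPU; the sample rate, pitch, ratio and modulation index are made-up values.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM with the given bit depth."""
    return 20 * math.log10(2 ** bits)


def fm_sample_8bit(t: float, carrier_hz: float, ratio: float, index: float) -> int:
    """One sample of 2-operator FM, quantized to signed 8-bit PCM."""
    modulator = math.sin(2 * math.pi * carrier_hz * ratio * t)
    value = math.sin(2 * math.pi * carrier_hz * t + index * modulator)
    # Rounding to 256 levels is where the quantization noise comes from.
    return max(-128, min(127, round(value * 127)))


print(round(dynamic_range_db(8), 1))   # 48.2
print(round(dynamic_range_db(16), 1))  # 96.3

SAMPLE_RATE = 16384  # hypothetical output rate, not taken from FMS
samples = [fm_sample_8bit(n / SAMPLE_RATE, 220.0, 2.0, 1.5) for n in range(64)]
print(samples[:8])
```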
Picking GBA as a platform of choice in 2026 may be strange. Surprisingly, it can be used on a very large array of hardware. Not only can you plug a memory card into the original hardware or new fancy clones like the Analogue Pocket, you have an exponentially larger choice of dozens if not hundreds of Chinese emulator handhelds from Anbernic, Powkiddy, Miyoo or Retroid. You can also use the Steam Deck or any PC running one of the many emulators, RetroArch being the most popular one.
FMS really touched me. Partly because I have a soft spot for the Nordic demo scene, but mainly for its novel approach to composition. Just like with the M8, creating basic building blocks and then applying transposition to break the looping monotony is my favorite workflow. This little thing has that in the form of pattern and trig transposition but also a novel take on "effects". Yes, you heard me right. There's a sorta-kinda-delay. Even does stereo field ping-pong.
I will keep on trying to create something that … sounds good. The process has been amazing. I truly love some of the sequencing tricks and workflows. The sequencer is, however, so good it would be worth seeing it run on top of a higher quality sound engine too.
USB-C is excellent, provided you don’t look too closely.
I’ve been seeing a drum beat of interest in the internals of USB-C. Darryl Morley’s macOS WhatCable, Chromebooks exposing lots of lovely info about emarkers, USB cable testers and a bit more. Very infrastructure club topics. So I made a small GTK app also called WhatCable which is intended to show what Linux knows about your USB ports, cables, chargers and devices, but written as a GNOME/libadwaita app and using the interfaces Linux exposes through sysfs.
The hope was fairly straightforward: plug things into my Framework 13, ask Linux what is going on, and present the answer in a way that doesn’t require remembering which bit of /sys to poke. In particular I wanted cable identity and e-marker details. These are the useful little facts that tell you whether a cable is what it claims to be, or at least what it claims to be electronically. Given the number of USB-C cables in the house whose origin story is “came in a box with something”, this felt like a public service, or at least a satisfying evening.
The first bit is pleasantly sensible. Linux has standard-ish places for this information:
/sys/bus/usb/devices
/sys/class/typec
/sys/class/usb_power_delivery
/sys/bus/thunderbolt/devices

When those are populated, a normal unprivileged app can learn quite a lot. It can show USB devices, Type-C ports, partners, cables, roles, power data, Thunderbolt and USB4 domains. That’s exactly the sort of thing a small Flatpak app should be good at: read some public kernel state, translate it into something at least moderately human friendly and then depart.
On my Framework 13, the USB device and Thunderbolt sides were useful. The Type-C side was not. /sys/class/typec existed but had no ports. /sys/class/usb_power_delivery existed but was empty. This is a slightly annoying result, because it means the nice standard API is present as a signpost rather than a destination.
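The difference between "the directory is missing" and "the directory is there but empty" is easy to check from an unprivileged process. A minimal Python sketch, not WhatCable's actual code, with probe as a hypothetical helper name:

```python
from pathlib import Path

SOURCES = [
    "/sys/bus/usb/devices",
    "/sys/class/typec",
    "/sys/class/usb_power_delivery",
    "/sys/bus/thunderbolt/devices",
]


def probe(path: str):
    """Classify a sysfs directory as missing, unreadable, empty or populated."""
    directory = Path(path)
    if not directory.is_dir():
        return "missing", []
    try:
        entries = sorted(entry.name for entry in directory.iterdir())
    except PermissionError:
        return "unreadable", []
    return ("populated" if entries else "empty"), entries


for source in SOURCES:
    state, entries = probe(source)
    print(f"{source}: {state} ({len(entries)} entries)")
```

On the machine described here, /sys/class/typec would come back as "empty", which is a different statement from "missing" and worth showing to the user as such.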
The next clue was that the machine clearly does have USB-C machinery, and not just because I could look at the side of the device. It is a Framework 13 with the embedded controller and Cypress CCG power delivery controllers doing real work. The relevant kernel modules were loaded, including UCSI and Chrome EC pieces. There was also an ACPI UCSI device at:
/sys/bus/acpi/devices/USBC000:00

but ucsi_acpi did not appear to bind to it and create the Type-C class ports. So the hardware and firmware know things, but they were not arriving in the standard Linux userspace shape.
Framework’s own tooling gives another route in. I built framework_tool from FrameworkComputer/framework-system and asked the EC what it could see. The Framework-specific PD port command did not work on this firmware:
USB-C Port 0: [ERROR] EC Response Code: InvalidCommand

and similarly for the other ports. That’s not very poetic, but it is at least clear.
The Chromebook-style power command was more useful. With a charger connected it reported, for example:
USB-C Port 0 (Right Back):
  Role: Sink
  Charging Type: PD
  Voltage Now: 19.776 V, Max: 20.0 V
  Current Lim: 2250 mA, Max: 2250 mA
  Dual Role: Charger
  Max Power: 45.0 W

That’s good information. It’s not cable identity, but it is the kind of port state people actually want when they are trying to work out why a laptop is charging slowly, or not charging, or doing something else mildly USB-C shaped.
framework_tool --pd-info could also talk through the EC to the Cypress controllers and report their firmware details:
Right / Ports 01
  Silicon ID: 0x2100
  Mode: MainFw
  Ports Enabled: 0, 1
  FW2 (Main) Version: Base: 3.4.0.A10, App: 3.8.00
Left / Ports 23
  Silicon ID: 0x2100
  Mode: MainFw
  Ports Enabled: 0, 1
  FW2 (Main) Version: Base: 3.4.0.A10, App: 3.8.00

Again, useful. Again, not the cable.
Much of this investigation and app code was written with AI tools in the loop. That was useful for chasing down boring plumbing and generating probes. The decisive test was asking the Chrome EC for the newer Type-C discovery data directly. The EC advertised USB PD support, but not the newer Type-C command set. EC_CMD_TYPEC_STATUS and EC_CMD_TYPEC_DISCOVERY both came back as invalid commands on all four ports.
That means that on this Framework 13 firmware path I cannot get Discover Identity results, SOP/SOP’ discovery data, SVIDs, mode lists or e-marker details through Chrome EC host commands. The cable may well be telling the PD controller interesting things, but those things are not exposed through a stable unprivileged interface I can sensibly use in a desktop app.
This is the main lesson from the whole exercise: USB-C inspection on Linux is not one API. It is a set of possible stories. Sometimes the kernel Type-C class tells you lots of things. Sometimes Thunderbolt sysfs tells you a different useful slice. Sometimes a vendor EC can tell you power state, but only as root. Sometimes the information exists below you somewhere, but not in a form you should build an app around.
So WhatCable needs to be honest. It should show the sources it can read, and it should say when a source is unavailable rather than pretending absence means certainty. “No cable identity exposed on this machine” is a very different statement from “this cable has no identity”. The former is boring but true. The latter is how you end up lying with an icon (it is not a nice icon).
The current shape I think is right is:
That last point matters. On the host /dev/cros_ec exists, but it is root-only. Making a normal app require broad device access would be a poor bargain. A small privileged helper that answers a few known-safe questions might be acceptable. A graphical app with arbitrary EC command execution would be exciting in the wrong way.
This is not quite the result I wanted when I started. I wanted to show a friendly “this is a 100W e-marked cable” label and feel very clever about it. What I have instead is a more modest app and a better understanding of where the bodies are buried. That’s still useful. A tool that tells you what your machine actually exposes is better than one that implies the USB-C universe is more orderly than it is. Given this, I’m not going to be sharing this one more widely, but fork away if you wish, or come back with a better idea.
It’s very easy to run with GNOME Builder, so just check out the source and ‘press play’ or get an artifact out of the GitHub Actions. If you run WhatCable on a different laptop and see rich Type-C data, lovely. If you run it on a Framework 13 like mine and mostly see USB devices, Thunderbolt controllers and a note that Type-C data is missing, that is also information. Not as glamorous as catching a suspicious cable in the act, but much more likely to be true.
GNOME’s GitLab runners use Podman as the container runtime with SELinux in Enforcing mode on Fedora. The GitLab Runner Docker/Podman executor spawns multiple containers per job: a helper container that clones the repository and handles artifacts, and a build container that runs the actual CI script. Both containers need to share a /builds volume — and this is where SELinux’s Multi-Category Security (MCS) becomes a problem.
The MCS problem

An SELinux label has four fields: user:role:type:level. For containers the interesting part is the level, also called the MCS field. A level looks like s0:c123,c456, where s0 is the sensitivity (always s0 in targeted policy) and c123,c456 are the categories. Podman gives each container a pair of categories, although a level can carry more, up to the full range c0.c1023.
MCS access is based on dominance. A subject’s label dominates an object’s label if the subject’s categories are a superset of (or equal to) the object’s categories:
Subject        Object         Access?  Why
s0:c100,c200   s0:c100,c200   Yes      Exact match
s0:c100,c200   s0:c100        Yes      Subject’s categories are a superset
s0:c100,c200   s0:c100,c300   No       Subject lacks c300
s0:c0.c1023    s0:c100,c200   Yes      Full range dominates everything
s0             s0:c100,c200   No       No categories can’t dominate any
s0             s0             Yes      Both have no categories

How this applies to the runners:
The range syntax (s0-s0:c0.c1023) is used for processes that need to operate across multiple levels. It means “my low clearance is s0 and my high clearance is s0:c0.c1023.” The process can read objects at any level within that range and create objects at any level within it. This is why Podman needs the full range — it creates containers with different MCS labels and needs to access all of them.
When Podman starts a container, it picks a random pair of categories (e.g., s0:c512,c768) from within its allowed range and assigns that as the container’s process label. Files created by the container inherit that label. Another container gets a different random pair (e.g., s0:c33,c901). Since c512,c768 and c33,c901 do not match — neither is a superset of the other — SELinux denies cross-container file access. This is the isolation mechanism, and the root cause of the problem with GitLab Runner’s multi-container-per-job architecture.
The helper container gets one random MCS pair, writes the cloned repo to /builds labeled with that pair, and the build container gets a different pair. The build container cannot read or write those files. The :Z volume flag (exclusive relabel) relabels the volume to the mounting container’s category, but that only helps the first container — the second one still has a different label.
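The dominance rule is small enough to model directly. The following Python sketch covers only the category comparison described above: it assumes every label uses sensitivity s0 and ignores SELinux types entirely, so it is a teaching aid, not a policy evaluator. The function names are made up.

```python
def parse_categories(level: str) -> frozenset[int]:
    """Parse the categories of a level such as 's0:c100,c200' or 's0:c0.c1023'."""
    _, _, cats = level.partition(":")
    result = set()
    for part in filter(None, cats.split(",")):
        if "." in part:  # a range such as c0.c1023
            low, high = part.split(".")
            result.update(range(int(low[1:]), int(high[1:]) + 1))
        else:
            result.add(int(part[1:]))
    return frozenset(result)


def dominates(subject: str, obj: str) -> bool:
    """True if the subject's categories are a superset of the object's."""
    return parse_categories(subject) >= parse_categories(obj)


# The rows of the dominance table above.
TABLE = [
    ("s0:c100,c200", "s0:c100,c200", True),
    ("s0:c100,c200", "s0:c100", True),
    ("s0:c100,c200", "s0:c100,c300", False),
    ("s0:c0.c1023", "s0:c100,c200", True),
    ("s0", "s0:c100,c200", False),
    ("s0", "s0", True),
]
for subject, obj, expected in TABLE:
    assert dominates(subject, obj) is expected

# Two containers with different random pairs cannot read each other's files.
print(dominates("s0:c512,c768", "s0:c33,c901"))  # False
print(dominates("s0:c33,c901", "s0:c512,c768"))  # False
```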
The test script

I wrote a script that demonstrates the problem with both standard containers (crun) and microVMs (libkrun). The script creates two containers per test (a helper that writes a file to a shared /builds volume, and a build container that tries to read it), simulating the GitLab Runner workflow:
#!/bin/bash
# Description: SELinux MCS Diagnostic (crun vs krun)

if [ "$(getenforce)" != "Enforcing" ]; then
    echo "WARNING: SELinux is not in Enforcing mode. This test requires Enforcing mode."
    exit 1
fi

TEST_BASE="/tmp/gitlab-runner-mcs-test"
CRUN_DIR="$TEST_BASE/crun-builds"
KRUN_DIR="$TEST_BASE/krun-builds"

# Cleanup from previous runs
rm -rf "$TEST_BASE"
mkdir -p "$CRUN_DIR" "$KRUN_DIR"

echo "======================================================="
echo " TEST 1: Standard Container Isolation (crun)"
echo "======================================================="

# 1. CREATE Helper
podman create --name crun-helper -v "$CRUN_DIR:/builds:Z" fedora bash -c "
echo '[crun] -> Helper Process Context (Inside):'
cat /proc/self/attr/current
echo 'crun-data' > /builds/artifact.txt
echo '[crun] -> File Label INSIDE Helper:'
ls -Z /builds/artifact.txt
" > /dev/null

echo "[crun] Starting Helper Container (applying :Z relabel)..."
HELPER_HOST_LABEL_CRUN=$(podman inspect -f '{{.ProcessLabel}}' crun-helper)
echo "[crun] -> HOST METADATA: Podman assigned process label: $HELPER_HOST_LABEL_CRUN"
podman start -a crun-helper

echo ""
echo "[crun] -> File Label ON HOST (Notice the specific MCS category):"
ls -Z "$CRUN_DIR/artifact.txt"

# 2. CREATE Build Container (The Victim)
podman create --name crun-build -v "$CRUN_DIR:/builds" fedora bash -c "
echo ' [Build-Internal] Process Context:'
cat /proc/self/attr/current 2>/dev/null
echo ' [Build-Internal] Executing ls -laZ /builds :'
ls -laZ /builds 2>&1 | sed 's/^/ /'
echo ' [Build-Internal] Executing cat /builds/artifact.txt :'
cat /builds/artifact.txt 2>&1 | sed 's/^/ /'
" > /dev/null

echo ""
echo "[crun] Starting Build Container to inspect shared volume..."
BUILD_HOST_LABEL_CRUN=$(podman inspect -f '{{.ProcessLabel}}' crun-build)
echo "[crun] -> HOST METADATA: Podman assigned process label: $BUILD_HOST_LABEL_CRUN"
podman start -a crun-build

podman rm -f crun-helper crun-build > /dev/null

echo ""
echo "======================================================="
echo " TEST 2: MicroVM Isolation (libkrun / virtio-fs) FIXED"
echo "======================================================="

# --- Write the execution scripts to the host to avoid parsing errors ---
cat << 'EOF' > "$TEST_BASE/krun_helper.sh"
#!/bin/bash
echo '[krun] -> Helper Process Context (Inside VM):'
cat /proc/self/attr/current 2>/dev/null || echo ' (SELinux disabled/unavailable in guest kernel)'
echo 'krun-data' > /builds/artifact.txt
echo '[krun] -> File Label INSIDE Helper VM (Blindspot):'
ls -laZ /builds/artifact.txt 2>&1 | sed 's/^/ /'
EOF

cat << 'EOF' > "$TEST_BASE/krun_build.sh"
#!/bin/bash
echo ' [Build-Internal] Process Context (Inside VM):'
cat /proc/self/attr/current 2>/dev/null || echo ' (SELinux disabled/unavailable in guest kernel)'
echo ' [Build-Internal] Executing ls -laZ /builds :'
ls -laZ /builds 2>&1 | sed 's/^/ /'
echo ' [Build-Internal] Executing cat /builds/artifact.txt :'
cat /builds/artifact.txt 2>&1 | sed 's/^/ /'
EOF

chmod +x "$TEST_BASE/krun_helper.sh" "$TEST_BASE/krun_build.sh"
# ---------------------------------------------------------------------

# 1. CREATE Helper MicroVM
podman create --name krun-helper --runtime krun --memory=1024m \
    -v "$KRUN_DIR:/builds:Z" \
    -v "$TEST_BASE/krun_helper.sh:/script.sh:ro,Z" \
    fedora /script.sh > /dev/null

echo "[krun] Starting Helper MicroVM (applying :Z relabel)..."
HELPER_HOST_LABEL_KRUN=$(podman inspect -f '{{.ProcessLabel}}' krun-helper)
echo "[krun] -> HOST METADATA: Podman assigned process label: $HELPER_HOST_LABEL_KRUN"
podman start -a krun-helper

echo ""
echo "[krun] -> File Label ON HOST (Podman applied the helper's MCS category via :Z):"
ls -Z "$KRUN_DIR/artifact.txt"

# 2. CREATE Build MicroVM (The Victim)
podman create --name krun-build --runtime krun --memory=1024m \
    -v "$KRUN_DIR:/builds" \
    -v "$TEST_BASE/krun_build.sh:/script.sh:ro,Z" \
    fedora /script.sh > /dev/null

echo ""
echo "[krun] Starting Build MicroVM to inspect shared volume..."
BUILD_HOST_LABEL_KRUN=$(podman inspect -f '{{.ProcessLabel}}' krun-build)
echo "[krun] -> HOST METADATA: Podman assigned process label: $BUILD_HOST_LABEL_KRUN"
echo " *** THE virtiofsd DAEMON ON THE HOST IS TRAPPED IN THIS CONTEXT ***"
podman start -a krun-build

# Cleanup
podman rm -f krun-helper krun-build > /dev/null

echo ""
echo "======================================================="
echo " Test Complete."

Test 1 (crun) creates a helper container that mounts the builds directory with :Z (exclusive relabel) and writes artifact.txt. Podman assigns it a random MCS label; in this run it was s0:c20,c540. The file on disk inherits that label. Then a second container (the build container) mounts the same path without :Z and gets a different random label (s0:c46,c331). Since c46,c331 does not dominate c20,c540, the build container is denied access to the file.
Test 2 (krun) runs the same scenario but with --runtime krun, which boots each container inside a lightweight microVM via libkrun. The helper VM gets container_kvm_t:s0:c823,c999 and the build VM gets container_kvm_t:s0:c309,c405 — same MCS mismatch, same denial. The type changes from container_t to container_kvm_t, but the MCS mechanism is identical. On the host side, virtiofsd — the daemon that serves the volume into the VM via virtio-fs — runs under the MCS label Podman assigned to the VM. The build VM’s virtiofsd is trapped in s0:c309,c405 and cannot access files labeled s0:c823,c999.
An interesting detail: inside the libkrun VMs, cat /proc/self/attr/current returns just kernel — SELinux is not available in the guest. The VM thinks it has no mandatory access control, but the host-side virtiofsd is still fully subject to MCS enforcement. This is a blindspot worth being aware of.
The output from a run on Fedora with SELinux Enforcing and Podman 5.8.2:
=======================================================
 TEST 1: Standard Container Isolation (crun)
=======================================================
[crun] Starting Helper Container (applying :Z relabel)...
[crun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_t:s0:c20,c540
[crun] -> Helper Process Context (Inside):
system_u:system_r:container_t:s0:c20,c540
[crun] -> File Label INSIDE Helper:
system_u:object_r:container_file_t:s0:c20,c540 /builds/artifact.txt

[crun] -> File Label ON HOST (Notice the specific MCS category):
system_u:object_r:container_file_t:s0:c20,c540 /tmp/gitlab-runner-mcs-test/crun-builds/artifact.txt

[crun] Starting Build Container to inspect shared volume...
[crun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_t:s0:c46,c331
 *** COMPARE THE cXXX,cYYY ABOVE TO THE FILE LABEL. THIS MISMATCH CAUSES THE DENIAL ***
 [Build-Internal] Process Context:
system_u:system_r:container_t:s0:c46,c331
 [Build-Internal] Executing ls -laZ /builds :
 ls: cannot open directory '/builds': Permission denied
 [Build-Internal] Executing cat /builds/artifact.txt :
 cat: /builds/artifact.txt: Permission denied

=======================================================
 TEST 2: MicroVM Isolation (libkrun / virtio-fs) FIXED
=======================================================
[krun] Starting Helper MicroVM (applying :Z relabel)...
[krun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_kvm_t:s0:c823,c999
[krun] -> Helper Process Context (Inside VM):
kernel
[krun] -> File Label INSIDE Helper VM (Blindspot):
 -rw-r--r--. 1 root root system_u:object_r:container_file_t:s0:c823,c999 10 May 2 2026 /builds/artifact.txt

[krun] -> File Label ON HOST (Podman applied the helper's MCS category via :Z):
system_u:object_r:container_file_t:s0:c823,c999 /tmp/gitlab-runner-mcs-test/krun-builds/artifact.txt

[krun] Starting Build MicroVM to inspect shared volume...
[krun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_kvm_t:s0:c309,c405
 *** THE virtiofsd DAEMON ON THE HOST IS TRAPPED IN THIS CONTEXT ***
 [Build-Internal] Process Context (Inside VM):
kernel
 [Build-Internal] Executing ls -laZ /builds :
 ls: /builds: Permission denied
 ls: cannot open directory '/builds': Permission denied
 [Build-Internal] Executing cat /builds/artifact.txt :
 cat: /builds/artifact.txt: Permission denied

=======================================================
 Test Complete.

GitLab’s official suggestion and why it falls short

GitLab’s documentation on configuring SELinux MCS suggests applying the same MCS label to all containers launched by a runner:
    [[runners]]
      [runners.docker]
        security_opt = ["label=level:s0:c1000,c1000"]

This works: all containers get the same category pair, so the helper and build containers can share files. But it collapses MCS isolation between all concurrent jobs on that runner. With concurrent = 4, four simultaneous jobs all run as s0:c1000,c1000 and can read each other’s /builds content: cloned source code, build artifacts, cached dependencies. On a shared or multi-tenant runner, this is a security regression: it trades MCS isolation for functionality.
For runners with concurrent = 1 or dedicated single-tenant runners this is an acceptable tradeoff, but it does not generalize to shared infrastructure where multiple untrusted projects run side by side.
How GNOME currently handles this

GNOME’s runners are managed via an Ansible role that enforces SELinux in Enforcing mode, installs rootless Podman running as a dedicated podman system user with linger enabled, and deploys custom SELinux policy modules. The Podman service runs under SELinuxContext=system_u:system_r:container_runtime_t:s0-s0:c0.c1023 via a systemd override. The full MCS range (s0-s0:c0.c1023) gives the container runtime the ability to spawn containers at any MCS level and relabel volumes accordingly, as explained in the dominance rules above.
Four custom SELinux .te modules are compiled and loaded on every runner host: pydocuum (allows the image cleanup daemon to talk to the Podman socket), podman (grants user_namespace create and /dev/null mapping), flatpak (permits the filesystem mounts flatpak builds need), and gnome_runner (covers binfmt_misc access, device nodes, and other permissions GNOME OS builds require).
For the MCS problem specifically, the runner config.toml — rendered from a Jinja2 template via per-host Ansible variables — sets a fixed MCS label per runner type. Here’s a representative snippet from one of the runner hosts:
    [[runners]]
      name = "a15948139c78"
      executor = "docker"
      [runners.docker]
        image = "quay.io/fedora/fedora:latest"
        privileged = false
        security_opt = ["label=level:s0:c100,c100"]
        devices = ["/dev/kvm", "/dev/udmabuf"]
        cap_add = ["SYS_PTRACE", "SYS_CHROOT"]

    [[runners]]
      name = "a15948139c78-flatpak"
      executor = "docker"
      [runners.docker]
        image = "quay.io/gnome_infrastructure/gnome-runtime-images:gnome-master"
        privileged = false
        security_opt = ["seccomp:/home/podman/gitlab-runner/flatpak.seccomp.json", "label=level:s0:c200,c200"]
        cap_drop = ["all"]

This is the same approach GitLab’s documentation suggests, with one refinement: we use different fixed categories per runner type (c100,c100 for untagged runners and c200,c200 for flatpak runners) so that flatpak builds and regular builds remain MCS-isolated from each other, even though builds of the same type share a category.
This is a pragmatic compromise, not an ideal solution. All concurrent jobs on the same runner type share the same MCS category. With concurrent = 4 on our Hetzner runners, four simultaneous untagged jobs can read each other’s /builds content. For GNOME’s use case, a community CI infrastructure where the runners are shared by GNOME project maintainers, this is an acceptable tradeoff. The alternative, leaving MCS labels random, would break every single job. But it is precisely this tradeoff that motivates exploring per-job VM isolation via microVMs.
Exploring libkrun

libkrun is a lightweight Virtual Machine Monitor (VMM) that integrates with Podman via --runtime krun, running each container inside a microVM with its own lightweight kernel. The appeal is strong: per-container VM isolation would give each job its own kernel and address space, making the MCS cross-container problem irrelevant inside the VM.
I tested libkrun on a Fedora system and hit an immediate blocker: Fatal glibc error: rseq registration failed. The rseq (Restartable Sequences) syscall was introduced in Linux kernel 4.18, and glibc >= 2.35 registers it at process startup by default. libkrun uses a custom minimal kernel that does not expose rseq support. Since the guest images, Fedora in our case, ship a modern glibc that expects rseq to be available, the process aborts at startup before any user code runs.
The libkrun kernel is compiled into the library itself and cannot be modified or replaced by the user. This is not a configuration issue but a fundamental limitation of the current libkrun release.
Even if the rseq issue were resolved, the MCS challenge would still be there — as the test script demonstrates in Test 2. On the host side, Podman assigns MCS labels to the virtiofsd process that serves the volume into the VM via virtio-fs. Different VMs get different host-side MCS labels, meaning the same :Z relabel / cross-container access denial applies. The mechanism changes from overlay mounts to virtio-fs, but the SELinux enforcement is identical: virtiofsd for the build VM runs at container_kvm_t:s0:c309,c405 and cannot access files labeled s0:c823,c999 by the helper VM’s virtiofsd.
Firecracker and the custom executor path

Firecracker is another microVM technology, the one behind AWS Lambda and Fly.io, that could provide strong per-job isolation. However, there is no native GitLab Runner executor for Firecracker. The only integration path is the Custom Executor, which requires implementing prepare, run, and cleanup scripts from scratch.
The job image is exposed via CUSTOM_ENV_CI_JOB_IMAGE, but everything else is on the operator: pulling the OCI image, extracting a rootfs, booting a Firecracker VM with the right kernel and network configuration, injecting the build script, mounting or copying the cloned repository into the VM, collecting artifacts and cache after the job finishes, and tearing the VM down. GitLab provides an LXD-based example that shows the pattern — prepare creates a container and installs dependencies, run pipes the job script into it, cleanup destroys it — but adapting that to microVMs adds the complexity of VM lifecycle management, kernel and rootfs preparation, networking, and storage. This is a significant engineering effort, essentially rebuilding the entire Docker executor workflow from scratch.
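To make that division of labour concrete, here is a minimal Python sketch of how the three stages could map onto a VM lifecycle. It is only an illustration under stated assumptions: the VMBackend interface and its boot, run_script and destroy methods are hypothetical placeholders, not a Firecracker or GitLab API, and I assume CUSTOM_ENV_CI_JOB_ID is exposed the same way CUSTOM_ENV_CI_JOB_IMAGE is.

```python
from typing import Protocol


class VMBackend(Protocol):
    """Hypothetical VM lifecycle interface (not a real Firecracker API)."""

    def boot(self, vm_name: str, image: str) -> None: ...
    def run_script(self, vm_name: str, script_path: str) -> int: ...
    def destroy(self, vm_name: str) -> None: ...


def handle_stage(stage: str, args: list[str], env: dict[str, str],
                 backend: VMBackend) -> int:
    """Map one Custom Executor stage onto VM lifecycle calls.

    Returns the exit code the stage script should report to the runner.
    """
    # One VM per job, named after the job ID (assumed variable name).
    vm_name = f"job-{env.get('CUSTOM_ENV_CI_JOB_ID', 'unknown')}"

    if stage == "prepare":
        # Everything hidden behind boot() is the hard part: pulling the OCI
        # image, extracting a rootfs, kernel and network setup.
        backend.boot(vm_name, env["CUSTOM_ENV_CI_JOB_IMAGE"])
        return 0
    if stage == "run":
        # The runner passes the generated script path and the sub-stage name.
        script_path, _sub_stage = args
        return backend.run_script(vm_name, script_path)
    if stage == "cleanup":
        backend.destroy(vm_name)
        return 0
    raise ValueError(f"unknown stage: {stage}")
```

In a real deployment each stage would be a separate executable configured in the runner, all sharing a dispatcher like this one. The sketch deliberately leaves out artifact and cache handling, which is a large part of the engineering effort described above.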
What comes next

MCS is a core SELinux feature. Type enforcement (TE) already confines processes by type (container_t can only access container_file_t, not user_home_t or httpd_sys_content_t), but TE alone cannot distinguish one container_t process from another. MCS adds that layer: by assigning each container a unique category pair, the kernel enforces isolation between processes that share the same type. Container A at s0:c100,c100 and Container B at s0:c200,c200 are both container_t, but MCS ensures they cannot touch each other’s files. The conflict with GitLab Runner’s multi-container-per-job architecture is that two containers that need to share a volume are given different categories by default. The workarounds we deploy today, including the fixed MCS labels on GNOME’s runners, trade that inter-container isolation for functionality.
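The category check described above can be approximated in a few lines of Python. This is a simplified model, not the kernel's implementation: it ignores sensitivities and type enforcement, and only asks whether the process's category set contains every category on the file. The function names are mine.

```python
def parse_categories(level: str) -> frozenset[int]:
    """Return the MCS categories in a level such as 's0:c20,c540'.

    Handles single categories (c5) and ranges (c0.c1023). For a ranged
    level such as 's0-s0:c0.c1023', only the high (clearance) part is used.
    """
    high = level.split("-")[-1]        # 's0-s0:c0.c1023' -> 's0:c0.c1023'
    _, _, cats = high.partition(":")   # drop the sensitivity ('s0')
    result: set[int] = set()
    for part in filter(None, cats.split(",")):
        if "." in part:
            low_cat, high_cat = part.split(".")
            result.update(range(int(low_cat[1:]), int(high_cat[1:]) + 1))
        else:
            result.add(int(part[1:]))
    return frozenset(result)


def can_access(process_level: str, file_level: str) -> bool:
    """Simplified dominance: the process must hold all of the file's categories."""
    return parse_categories(process_level) >= parse_categories(file_level)
```

Fed with the labels from the test run, can_access("s0:c46,c331", "s0:c20,c540") is False, which is the denial the build container hits, while the fixed label s0:c100,c100 on both sides gives True. The runtime's full range s0-s0:c0.c1023 dominates every file label, which is why it can relabel volumes for any container.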
The most promising direction I’ve found so far is the combination of Cloud Hypervisor and the fleeting-plugin-fleetingd plugin. Cloud Hypervisor is built on the rust-vmm crates and is essentially a more capable sibling of Firecracker: it supports CPU and memory hotplugging, VFIO device passthrough, and virtio-fs, features that are often necessary for complex CI tasks like building large binaries or running UI tests and that Firecracker’s minimalist design deliberately omits. The fleeting-plugin-fleetingd is a community plugin for GitLab’s Instance Executor (the modern evolution of the Custom Executor) that automates the full VM lifecycle: downloading cloud images, creating Copy-on-Write disks, launching Cloud Hypervisor VMs with direct kernel boot, provisioning them via cloud-init, and tearing them down after each build. Each job gets a fresh disposable VM, which is exactly the per-job isolation model we need. The plugin already handles networking via TAP interfaces and nftables SNAT, and supports customization of the VM image through cloud-init commands, so preinstalling Podman or other build tools is straightforward.
Beyond that, I’ll also keep evaluating libkrun (promising Red Hat technology), Firecracker with a hand-rolled custom executor, and QEMU’s microvm machine type. The common denominator across all of these — except for the fleeting-plugin-fleetingd path — is that none of them have an existing GitLab Runner integration. Regardless of which microVM technology we settle on, the path forward involves either building a workflow from scratch using the Custom Executor and its prepare, run, cleanup hooks, or leveraging the fleeting plugin ecosystem that GitLab has been building around the Instance and Docker Autoscaler executors.
CVE-2026-31431

The urgency of per-job VM isolation was underscored by CVE-2026-31431 (“Copy Fail”), a nine-year-old logic bug in the kernel’s algif_aead cryptographic module disclosed at the end of April. The flaw lets an unprivileged local user write four controlled bytes into the page cache of any readable file, enough to patch a setuid binary like /usr/bin/su and escalate to root. Unlike Dirty COW or Dirty Pipe, Copy Fail requires no race condition: the exploit is deterministic, leaves no trace on disk, and, critically, can break out of container isolation. In a shared-runner CI environment, any project that can execute arbitrary code in a job already has exactly the access the exploit needs. Separately, Claude Mythos, an Anthropic model trained for cybersecurity research that escaped its own sandbox during a red-team exercise in April, demonstrated that AI-assisted vulnerability discovery and exploitation is no longer theoretical; models can now autonomously find and chain bugs that would take human researchers weeks to exploit. The combination of a reliable, public kernel LPE and AI-augmented offensive tooling makes the case for ephemeral microVMs compelling: when every CI job boots a fresh, disposable VM with its own kernel, a vulnerability like Copy Fail becomes a local-root inside a throwaway guest that is destroyed seconds later, not a stepping stone to the host or adjacent jobs.
That should be all for today, stay tuned!
GNOME is once again participating in GSoC. This year, we have 6 contributors working on adding Debug Adapter Protocol support to GJS, incorporating vocab-style puzzles into GNOME Crosswords, creating a native GTK4/Rust rewrite of the Pitivi timeline ruler, porting gitg to GTK4, implementing app uninstallation in the GNOME Shell app grid, and enabling recovery from GPU resets.
As we onboard the contributors, we will be adding them to Planet GNOME, where you can get to know them better and follow their project updates.
GSoC is a great opportunity to welcome new people into our project. Please help them get started and make them feel at home in our community!
Special thanks to our community mentors, who are donating their time and energy to help welcome and guide our new contributors: Philip Chimento, Jonathan Blandford, Yatin, Alex Băluț, Alberto Fanjul, Adrian Vovk, Jonas Ådahl, and Robert Mader.
For more information, visit https://summerofcode.withgoogle.com/programs/2026/organizations/gnome-foundation
Recently, I have been using GNOME OS as my daily driver.
After being a seasoned Linux user for a long time, dabbling in distros like Alpine Linux, Arch Linux, Fedora (and even Silverblue), I tried switching to something more opinionated that "works by default", all while being hard to break.
And given my existing relationship with GNOME, GNOME OS was a choice worth looking into.
One feature of GNOME OS is that it is immutable (i.e. system files are read-only). It also doesn't ship with a package manager, so it doesn't have functionality built-in to install extra packages.
You can install GUI Applications normally using Flathub (and Snap/AppImage), but installing non-GUI applications like development tools or CLI packages is not built-in.
There are of course several solutions you can use, such as homebrew or coldbrew, but today we will focus on mise.

What is mise?

mise pitches itself as "One tool to manage languages, env vars, and tasks per project, reproducibly."
However, I only use a fraction of its functionality, in that I only use it to install packages.

How to install it?

The instructions are here: https://mise.jdx.dev/getting-started.html
But essentially it's as easy as running this (remember to read the source of the installer first):
    curl https://mise.run | sh

Activating mise

Then you will need to "activate" mise, which essentially makes tools installed by mise available by modifying your $PATH variable:

    echo 'eval "$(~/.local/bin/mise activate bash --shims)"' >> ~/.bashrc

The instructions above are for bash, so you will need to consult the docs to get instructions for your shell.
You will need to re-login for the mise command to be available, or open a new shell.
A note on shims

Feel free to skip this section, as it's just an explainer.
Also, note that the above command uses the --shims flag, which is NOT the default. It essentially means that mise will modify the $PATH variable, instead of doing a weird thing where it re-activates itself after each command you run.
The non-shim way to activate mise is useful when you use mise to install different package versions across different repositories, but that sometimes breaks IDEs and is out of the scope of this blog post.
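If the $PATH part sounds abstract, here is a tiny Python model of how a shell picks an executable. It is only a sketch of ordered lookup, not mise's code, and the shims directory name is my assumption about where mise puts them; check mise's docs for the actual location on your system.

```python
from typing import Optional


def resolve(name: str, path_dirs: list[str],
            contents: dict[str, set[str]]) -> Optional[str]:
    """Return the first match for `name`, scanning PATH entries in order.

    `contents` maps a directory to the executables it holds, standing in
    for the real filesystem so the example has no side effects.
    """
    for directory in path_dirs:
        if name in contents.get(directory, set()):
            return f"{directory}/{name}"
    return None


# Assumed layout: a mise shims directory plus the system's own /usr/bin.
SHIMS = "~/.local/share/mise/shims"
FILES = {SHIMS: {"node", "java"}, "/usr/bin": {"node", "ls"}}

# With the shims directory first, the mise-managed node wins.
with_shims = resolve("node", [SHIMS, "/usr/bin"], FILES)
# Without it, the shell falls back to whatever the system provides.
without_shims = resolve("node", ["/usr/bin"], FILES)
```

Because the lookup happens once per command and depends only on $PATH, anything that inherits the environment (an IDE, a cron job) sees the same tools, which is why the shims approach tends to be the less surprising one.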
Installing packages

You can start installing your first package with mise:

    mise use -g java

The above command installs java globally (hence the -g flag), which you can now confirm by running:

    $ java --version
    openjdk 26.0.1 2026-04-21
    OpenJDK Runtime Environment (build 26.0.1+8-34)
    OpenJDK 64-Bit Server VM (build 26.0.1+8-34, mixed mode, sharing)

You can install many more tools; you can find a non-exhaustive list here: mise-tools.

For example, you can similarly install a specific major version of nodejs:

    mise use -g node@22

Or install the latest LTS version of node:

    mise use -g node@lts

Or you can be overly specific:

    mise use -g node@v25.9.0
    mise use -g node@25.9.0 # this works too!

Searching

Use mise search to find packages.
    mise search typ
    Tool       Description
    typos      Source code spell checker. https://github.com/crate-ci/typos
    typst      A new markup-based typesetting system that is powerful and easy to learn. https://github.com/typst/typst
    typstyle   Beautiful and reliable typst code formatter. https://github.com/Enter-tainer/typstyle
    quicktype  Generate types and converters from JSON, Schema, and GraphQL provided by https://quicktype.io. https://www.npmjs.com/package/quicktype

Uninstalling

    mise unuse -g node

Updating

    mise self-update # updating mise itself
    mise up          # updating tools installed by mise
    mise outdated    # checking if you have outdated tools

Config File

Tools you install with mise globally will be saved in the file ~/.config/mise/config.toml, which you can commit to your dotfiles so you can have similar tools across different machines.
Here's an example of my mise config file at the time of writing this blog post.
    # ~/.config/mise/config.toml
    [tools]
    bat = "latest"
    btop = "latest"
    bun = "latest"
    caddy = "latest"
    "cargo:mergiraf" = "latest"
    deno = "latest"
    difftastic = "latest"
    doggo = "latest"
    fastfetch = "latest"
    fzf = "latest"
    github-cli = "latest"
    "github:railwayapp/railpack" = "latest"
    glab = "latest"
    helix = "latest"
    java = "latest"
    lazygit = "latest"
    node = "latest"
    "npm:vscode-langservers-extracted" = "latest"
    oha = "latest"
    pipx = "latest"
    pnpm = "latest"
    prettier = "latest"
    rust = "latest"
    scooter = "latest"
    tmux = "latest"
    usage = "latest"
    yt-dlp = { version = "latest", rename_exe = "yt-dlp" }
    zellij = "latest"
    "github:patryk-ku/music-discord-rpc" = { version = "latest", asset_pattern = "music-discord-rpc" }
    rclone = "latest"
    mc = "latest"
    go = "latest"
    "go:git.sr.ht/~migadu/alps/cmd/alps" = "latest"
    "npm:localtunnel" = "latest"

After the tools inside the config have changed, you can run the following command to make mise re-install packages from the config file:

    mise install

Mise Backends

Mise is able to install packages from multiple sources. These sources are called "backends" by mise.
When you type mise use -g node@22, it will resolve node against the registry and figure out that the default backend for node is core.

Core

The default backend is called core and tools from this backend are usually provided from the official source.
Other tools that are available from core include Node.js, Ruby, Python, etc.
We could also have been explicit with the backend we want to use:

    mise use -g core:node

You can find a list of all core packages here.

Aqua

You can also install packages from the Aqua registry.

Language Package Managers

You can also install tools from their respective package managers. Here are a few examples.

npm

You can install prettier, typescript, oxlint and other JavaScript/TypeScript tools published on the npm registry. Find the tools on npm.

    mise use -g npm:prettier

pipx

You can install black, poetry and other Python tools from PyPI. Find the tools on PyPI.

    mise use -g pipx:black pipx:git+https://github.com/psf/black.git # from a github repo

cargo

You can install cargo packages with this backend. You need to have rust installed beforehand though, which you can do with mise:

    mise use -g rust

Then install your packages:

    mise use -g cargo:eza

There are more language package manager backends like gem, go and more.
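To see how the backend:name@version convention reads, here is a small Python sketch that splits tool specs the way the examples above are written. It is my own illustration, not mise's parser: unprefixed names are simply reported as "registry default", since the real lookup against the registry is not reproduced here, and edge cases such as an @ inside a git URL are ignored.

```python
from collections import defaultdict
from typing import Optional


def split_tool_spec(spec: str) -> tuple[Optional[str], str, Optional[str]]:
    """Split 'backend:name@version' into (backend, name, version).

    Backend and version are None when the spec does not spell them out.
    """
    backend: Optional[str] = None
    rest = spec
    prefix, sep, remainder = spec.partition(":")
    if sep:
        # Only the first colon separates the backend; URLs keep theirs.
        backend, rest = prefix, remainder
    at = rest.rfind("@")
    if at > 0:  # a leading '@' (npm scope) is part of the name
        return backend, rest[:at], rest[at + 1:]
    return backend, rest, None


def group_by_backend(specs: list[str]) -> dict[str, list[str]]:
    """Group tool names by their explicit backend prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for spec in specs:
        backend, name, _version = split_tool_spec(spec)
        groups[backend or "registry default"].append(name)
    return dict(groups)
```

Running group_by_backend over the keys of the [tools] table in a config.toml gives a quick overview of which package managers a setup depends on, for example that cargo: entries need rust installed first.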
GitHub

You can install packages from GitHub directly, as long as the project you are trying to install from uses GitHub releases:

    mise use -g github:railwayapp/railpack

mise will usually auto-detect which asset you want to use, but you can also specify the asset glob in ~/.config/mise/config.toml:

    [tools]
    "github:patryk-ku/music-discord-rpc" = { version = "latest", asset_pattern = "music-discord-rpc" }

I got myself a Yubikey recently, and I wanted to use it as a nice convenience to:
I've only managed to do the first two, since they both rely on Linux Pluggable Authentication Modules (PAM). Luckily for me, one of PAM's modules supports U2F, the standard Yubikeys rely on.
First I need to install pam-u2f to add U2F support to PAM, and pamu2fcfg to configure my key.
    $ sudo rpm-ostree install pam-u2f pamu2fcfg

Since I'm running an immutable OS I need to reboot, and then I can create the correct directory and file to dump a U2F key into it.

    $ mkdir -p ~/.config/Yubico
    $ pamu2fcfg > ~/.config/Yubico/u2f_keys

Then I make sure to have a root session open in case I lock myself out of sudo.

    $ sudo su
    #

In a different terminal, I can edit the PAM configuration for sudo, /etc/pam.d/sudo, to add the pam_u2f.so line at the top:

    #%PAM-1.0
    auth       sufficient   pam_u2f.so cue openasuser
    auth       include      system-auth
    account    include      system-auth
    password   include      system-auth
    session    optional     pam_keyinit.so revoke
    session    required     pam_limits.so
    session    include      system-auth

I save this file and open a new terminal. I type in sudo vi and it asks me to touch my FIDO authenticator before opening vi! If I touch the Yubikey, it indeed opens vi with root privileges.
Let's break down the line:
It's also possible to use it to unlock my session, but it would be a bit reckless to allow anyone with my Yubikey to log into my laptop. If my backpack gets stolen and it has both my Yubikey and my laptop, anyone can log in.
It's possible to make the login screen require either my user password, or all of
If someone fails more than three times to enter the correct PIN, the Yubikey will lock itself and require a PUK to be unlocked. This gives me an additional layer of security, and it's more convenient than having to type a full length passphrase.
I've added the following line to /etc/pam.d/greetd (the greeter I use):
    #%PAM-1.0
    auth       sufficient   pam_u2f.so cue openasuser pinverification=1 userpresence=1
    auth       substack     system-auth
    [...]

[!warning] I can lose my Yubikey
I use my Yubikey as a nice convenience to set up a weaker PIN while not compromising too much on security. I use it instead of a password, not in addition to it.
Since I can lose or break my Yubikey and I don't want to buy two of them, I make the U2F login sufficient but not required. This means I can still fall back to password authentication if I lose my Yubikey.
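The "sufficient but not required" behaviour can be modelled in a few lines of Python. This is a deliberately simplified sketch of how PAM walks an auth stack: it only knows the sufficient and required control flags, and it treats the whole system-auth include as a single required password check, which is my simplification rather than what the real file contains.

```python
from typing import Callable

# A rule is (control flag, module), where the module returns True on success.
Rule = tuple[str, Callable[[], bool]]


def authenticate(stack: list[Rule]) -> bool:
    """Evaluate a simplified PAM auth stack."""
    required_failed = False
    any_success = False
    for control, module in stack:
        ok = module()
        if control == "sufficient":
            if ok and not required_failed:
                return True  # success ends the stack; later modules are skipped
            # a failing 'sufficient' module is ignored and the stack continues
        elif control == "required":
            if ok:
                any_success = True
            else:
                required_failed = True  # keep going, but the result is a failure
        else:
            raise ValueError(f"unsupported control flag: {control}")
    return any_success and not required_failed


def login(key_touched: bool, password_ok: bool) -> bool:
    """Model of 'auth sufficient pam_u2f.so' followed by the password stack."""
    return authenticate([
        ("sufficient", lambda: key_touched),
        ("required", lambda: password_ok),
    ])
```

With this model, login(True, False) succeeds without ever needing the password, login(False, True) succeeds through the fallback, and login(False, False) fails. Swapping sufficient for required on the first rule would turn the key into a second factor instead of an alternative.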
Finally, DankMaterialShell uses its own lockscreen manager too. I still want to be able to fall back to password authentication if need be, so I'll configure it to accept U2F OR the password, not both.
This means that the lockscreen will call /etc/pam.d/dankshell-u2f to know what to do when the screen is locked. Since this file doesn't exist, I can create it with the following content.
    #%PAM-1.0
    auth sufficient pam_u2f.so cue openasuser pinverification=1 userpresence=1

I need a fallback for when I don't have my Yubikey, so I also create the one for this occasion:

    #%PAM-1.0
    auth include system-auth

Finally, I have a consistent setup where both my login and lock screen require me to plug in my key, enter its PIN and touch it, or enter my full password. When it comes to sudo, I only need to touch my key, without entering a PIN.
My next quest will be to use my Yubikey to unlock my LUKS-encrypted disk.
A few years back I did a quick exploration of what GNOME app icons might look like in an alternate universe where we kept on using VGA displays. Chiselling pixels away is therapeutic. So while there is absolutely no use for these, I keep on making them if only to bring some attention to what really matters for GNOME, having nice apps.
Here's a batch of mostly GNOME Circle app icons, with some 3rd party ones thrown in.
If you're reading this on my site rather than Planet GNOME or some flickering terminal in an abandoned Vault, then congratulations. You've stumbled upon a working Pip-Boy module! Found it half-buried under irradiated rubble, its phosphor display still humming with that familiar green glow. Enjoy these icons the way the dwellers of Vault 101 were always meant to, one glorious scanline at a time.
Hello there,
You thought I’d given up on “status update” blog posts, did you? I haven’t given up, despite my better judgement; this one is just even later than usual.
Recently I’ve been using my rather obscure platform as a blogger to theorize about AI and the future of the tech industry, mixed with the occasional life update, couched in vague terms, perhaps due to the increasing number of weirdos in the world who think doxxing and sending death threats to open source contributors is a meaningful use of their time.
In fact I do have some theories about how George Orwell (in “Why I Write”) and Italo Calvino (in “If On a Winter’s Night a Traveller”) made some good guesses from the 20th century about how easy access to LLMs would affect communication, politics and art here in the 21st. But I’ll leave that for another time.
It’s also 8 years since I moved to this new country where I live now, driving off the boat in a rusty transit van to enjoy a series of unexpected and amazing opportunities. Next week I’m going to mark the occasion with a five day bike ride through the mountains of Asturias, something I’ve been dreaming of doing for several years. But I’m not going to talk about that, either.
The original idea of writing a monthly post was to keep tabs on various open source software projects I sometimes manage to contribute to, and perhaps even to motivate me to do more such volunteering. Well that part didn’t work, house renovations and an unexpectedly successful gig playing synth and trombone took over all my free time; but after many years of working on corporate consultancy and doing a little open source in the background, I’m trying to make a space at work to contribute in the open again.
I could tell the whole story here of how Codethink became “the build system people”. Maybe I will actually. It all started with BuildStream. In fact, that’s not even true. It all started in 2011 when some colleagues working with MeeGo and Yocto thought, “This is horrible, isn’t it?”
They set out to create something better, and produced Baserock, which unfortunately turned out even worse. But it did have some good ideas. The concept of “cache keys” to identify build inputs and content-addressed storage to hold build outputs began there, as did the idea of opening a “workspace” to make drive-by changes in build inputs within a large project.
BuildStream took this core idea, extended it to support arbitrary source kinds and element kinds defined by plugins, and added a shiny interface on top. Initially it used OSTree to store and distribute build artifacts, later migrating to the Google REAPI with the goal of supporting Enterprise(TM) infrastructure. You can even use it alongside Bazel, if you like having three thousand commandline options at your disposal.
Unfortunately it was 2016, so we wrote the whole thing in Python. (In our defence, the Rust programming language had only recently hit 1.0 and crates.io was still a ghost town, and we’d probably still be rewriting the ruamel.yaml package in Rust if we had taken that road.) But the company did make some great decisions, particularly making a condition of success for the BuildStream project that it could unify the 5 different build+integration systems that GNOME release team were maintaining. And that success meant not just a prototype of this, but release team actually using BuildStream to make releases. Tristan even ended up joining the GNOME release team for a while. We discussed it all at the 2017 Manchester GUADEC event, coincidentally. It was a great time. (Aside from those 6 months leading up to the conference.)
At this point, the Freedesktop SDK already existed, with the same rather terrible name that it has today, and was already the base runtime for this new app container tool that was named… xdg-app. (At least that eventually gained a better name). However, if you can remember 8 years ago, it had a very different form than today. Now, my memory of what happened next is especially hazy at this point, because like I told you in the beginning, I was on a boat with my transit van heading towards a new life in Spain. All I have to go on 8 years later is the Git history, but somehow the Freedesktop SDK grew a 3-stage compiler bootstrap, over 600 reusable BuildStream elements, its own Gitlab namespace, and even some controversial stickers. As a parting gift I apparently added support for building VMs, the idea being that we’d reinstate the old GNOME Continuous CI system that had unfortunately died of neglect several years earlier. This idea got somewhat out of hand, let’s say.
It took me a while to realize this, but today Freedesktop SDK is effectively the BuildStream reference distribution. What Poky is to BitBake in the Yocto project, this is what Freedesktop SDK is to BuildStream. And this is a pretty important insight. It explains the problem you may have experienced with the BuildStream documentation: you want to build some Linux package, so you read through the manual right to the end, and then you still have no fucking idea how to integrate that package.
This isn’t a failure on the part of the authors; instead, the issue is that your princess is in another castle. Every BuildStream project I’ve ever worked on has junctioned freedesktop-sdk.git and re-used the elements, plugins, aliases, configurations and conventions defined there, all of which are rigorously undocumented. The Freedesktop SDK Guide, for reasons that I won’t go into, doesn’t venture much further than reminding you how to call Make targets.
And this is something of a point of inflection. The BuildStream + Freedesktop SDK ecosystem has clearly not displaced Yocto, nor for that matter Linux Mint. But, like many of my favourite musicians, it has been quietly thriving in obscurity. People I don’t know are using it to do things that I don’t completely understand. I’ve seen it in comparison articles, and even job adverts. ChatGPT can generate credible BuildStream elements about as well as it can generate Dockerfiles (i.e. not very well, but it indicates a certain level of ubiquity). There have been conferences, drama, mistakes, neglect. It’s been through an 8 person corporate team hyper-optimizing the code, and it’s been through a mini dark age where volunteers thanklessly kept the lights on almost single-handedly, and it’s even survived its transition to the Apache Foundation.
Through all of this, the secret to its success is probably that it’s just a really nice tool to work with. As much as you can enjoy software integration, I enjoy using BuildStream to do it; things rarely break, when they do it’s rarely difficult to fix them, and most importantly the UI is really colourful! I’m now using it to build embedded system images for a product named CTRL, which you can think of as.. a Linux distribution. There are some technical details to this which I’m working to improve, which I won’t bore you with here.
I also won’t bore you with the topic of community governance this month, but that’s what’s currently on my mind. If you’ve been part of the GNOME Foundation for a few years, you’ll know this is something that’s usually boring and occasionally becomes of almost life-or-death importance. The “let’s just be really sound” model works great, until one day when you least expect it, and then suddenly it really doesn’t. There is no perfect defence against this, and in open source communities it’s our diversity that brings the most resilience. When GNOME loses, KDE gains, and that way at least we still don’t have to use Windows. Indeed, this is one argument for investing in BuildStream even if it remains forever something of a minority sport. I guess I just need to remember that when you have to start thinking hard about governance, that’s a sign of success.