Planet GNOME

Ruben Vermeersch: Go: debugging multiple response.WriteHeader calls

Fri, 26/01/2018 - 4:11pm

Say you’re building an HTTP service in Go and suddenly it starts giving you these:

http: multiple response.WriteHeader calls

Horrible when that happens, right?

It’s not always very easy to figure out why you get them and where they come from. Here’s a hack to help you trace them back to their origin:

type debugLogger struct{}

func (d debugLogger) Write(p []byte) (n int, err error) {
	s := string(p)
	if strings.Contains(s, "multiple response.WriteHeader") {
		debug.PrintStack()
	}
	return os.Stderr.Write(p)
}

// Now use the logger with your http.Server (this needs the log, net/http,
// os, runtime/debug and strings imports):
logger := log.New(debugLogger{}, "", 0)
server := &http.Server{
	Addr:     ":3001",
	Handler:  s,
	ErrorLog: logger,
}
log.Fatal(server.ListenAndServe())

This will output a nice stack trace whenever it happens. Happy hacking!


Christian Schaller: An update on Pipewire – the multimedia revolution

Fri, 26/01/2018 - 3:35pm

We launched PipeWire last September with this blog entry. I thought it would be interesting for people to hear about the latest progress on what I believe is going to be a gigantic step forward for the Linux desktop. So I caught up with PipeWire creator Wim Taymans during DevConf 2018 in Brno, where Wim is giving a talk about PipeWire, and we discussed the current state of the code; Wim also demonstrated a few of the things that PipeWire can now do.

Christian Schaller and Wim Taymans testing PipeWire with Cheese

Priority number 1: video handling

So, as we said when we launched, the top priority for PipeWire is to address our needs on the video side of multimedia. This is critical due to the more secure nature of Wayland, which makes the old methods for screen sharing no longer work, and the emergence of desktop containers in the form of Flatpak. Thus we need PipeWire to help us provide application and desktop developers with a new method for doing screen sharing, and also to provide a secure way for applications inside a container to access audio and video devices on the system.

There are three major challenges PipeWire wants to solve for video. First, device sharing, meaning that multiple applications can share the same video hardware device. Second, it wants to do so in a secure manner, ensuring your video streams are not hijacked by a rogue process. And finally, it wants to provide an efficient method for sharing multimedia between applications, for instance fullscreen capture from your compositor (like GNOME Shell) to a video conferencing application running in your browser, like Google Hangouts, Blue Jeans or Pexip.

So the first thing Wim showed me in action was the device sharing. We launched the GNOME photobooth application Cheese, which gets PipeWire support for free thanks to the PipeWire GStreamer plugin. And this is an important thing to remember: thanks to so many Linux applications using GStreamer these days, we don’t need to port each one of them to PipeWire; instead the PipeWire GStreamer plugin does the ‘porting’ for us. We then launched a gst-launch command line pipeline in a terminal. The result is two applications sharing the same webcam input without one of them blocking access for the other.

As you can see from the screenshot above, it worked fine. This was actually done on my Fedora Workstation 27 system, and the only thing we had to do was start the ‘pipewire’ process in a terminal before starting Cheese and the gst-launch pipeline; GStreamer autoplugging took care of the rest. So feel free to try this out yourself if you are interested, but be aware that you will find bugs quickly if you try things like on-the-fly resolution changes or switching video devices. This is still tech-preview-level software in Fedora 27.

The plan is for Wim Taymans to sit down with the web browser maintainers at Red Hat early next week and see if we can make progress on supporting PipeWire in Firefox and Chrome, so that conferencing software like the ones mentioned above can start working fully under Wayland.

Since security was one of the drivers for the move to Wayland from X Windows, we of course also put a lot of emphasis on not recreating the security holes of X in the compositor. So the way PipeWire now works is that if an application wants to do fullscreen capture, it checks with the compositor through a D-Bus API (a portal, in Flatpak and Wayland terminology), which only allows the permitted application to do the screen capture, so the stream can’t be hijacked by a random rogue application or process on your computer. This also works from within a sandboxed setting like Flatpaks.

JACK Support

Another important goal of PipeWire was to bring all Linux audio and video together, which means PipeWire needed to be an as-good-or-better replacement for JACK for the pro-audio use case. This is a tough use case to satisfy, so while getting the video part working has been the top development priority, Wim has also worked on verifying that the design allows for the low latency and control needed for pro-audio. To do this, Wim has implemented the JACK protocol on top of PipeWire.

Carla, a JACK application, running on top of PipeWire.

Through that work he has now verified that he is able to achieve the low latency needed for pro-audio with PipeWire, and that he will be able to run JACK applications unchanged on top of PipeWire. So above you see a screenshot of Carla, a JACK-based application, running on top of PipeWire with no JACK server running on the system.

ALSA/Legacy applications

Another item Wim has written the first code for, and verified will work well, is the ALSA emulation. The goal of this piece of code is to allow applications using the ALSA userspace API to output to PipeWire without needing special porting or application developer effort. At Red Hat we have many customers with older bespoke applications using this API, so it has been of special interest for us to ensure this works just as well as the native ALSA output. It is also worth noting that PipeWire does mixing too, so sound being routed through ALSA will get seamlessly mixed with audio coming through the JACK layer.

Bluetooth support

The last item Wim has spent some time on since last September is making sure Bluetooth output works, and he demonstrated this to me while we were talking during DevConf. The PipeWire Bluetooth module plugs directly into the BlueZ Bluetooth framework, meaning that things like the GNOME Bluetooth control panel just work with it without any porting needed. And while the code is still quite young, Wim demonstrated pairing and playing music over Bluetooth using it.

What about PulseAudio?

As you probably noticed, one thing we didn’t mention above is how to deal with PulseAudio applications. Handling this use case is still on the todo list, and the plan is to, at least initially, just keep PulseAudio running on the system, outputting its sound through PipeWire. That said, we are a bit unsure how many applications would actually use this path because, as mentioned above, all GStreamer applications for instance would be PipeWire-native automatically through the PipeWire GStreamer plugin. And for legacy applications, the PipeWire ALSA layer would replace the current PulseAudio ALSA layer as the default ALSA output, meaning that the only applications left are those outputting to PulseAudio directly themselves. The plan is also to keep the PulseAudio ALSA device around, so people who want to use things like PulseAudio’s networked audio functionality can select the PA ALSA device manually and keep doing so.
Over time the goal would of course be to not have to keep the PulseAudio daemon around, but dropping it completely is likely to be a multi-year process with current plans, so it is somewhat like XWayland on top of Wayland.


So you might read this and think: hey, with all this work we are almost done, right? Well, unfortunately no. The components mentioned here are good enough for us to verify the design and features, but they still need a lot of maturing and testing before they will be in a state where we can consider switching Fedora Workstation over to using them by default. So there are many warts that need to be cleaned up still, but a lot of things have become much more tangible now than when we last spoke about PipeWire in September. The video handling we hope to enable in Fedora Workstation 28 as mentioned, while the other pieces we will work towards enabling in later releases as the components mature.
Of course, the more people interested in joining the PipeWire community to help us out, the quicker we can mature these different pieces. So if you are interested, please join us in #pipewire on IRC or just clone the code from GitHub and start hacking. You can find the details for IRC and git here.

Bastian Ilsø Hougaard: GNOME at FOSDEM 2018 – with socks and more!

Fri, 26/01/2018 - 1:50am

Sunrise over Hobart seen from Mt Wellington, Tasmania (CC-BY-SA 4.0).

It’s been a while, huh? The past six months kept me busy traveling and studying abroad in Australia, but I’m back! With renewed energy, and lots and lots of GNOME socks for everyone. Like in previous years, I’m helping out at GNOME’s booth at the FOSDEM 2018 conference.

FOSDEM 2016. (CC-BY-SA 4.0)

I have arranged for a whopping 420 pairs of GNOME socks to be produced, hopefully arriving before my departure. Baby socks, ankle socks, regular socks and even knee socks – maybe I should order an extra suitcase to fill up. Even so, I estimate I can probably bring 150 pairs at most (last year my small luggage held 55 pairs…). Because of the large quantity, I’ve designed them to be fairly neutral and “simple” (well, actually the pattern is rather complicated).

Sample sock made prior to production.

Breakdown of the horizontally repeatable sock pattern.

I plan to bring them to FOSDEM 2018, Open Source Days in Copenhagen, FOSS North and GUADEC. However, we have also talked about getting some socks shipped to the US or Asia, although a box of 100 socks weighs a lot, resulting in expensive shipping. So if anyone is going to any of the aforementioned conferences and can keep some pairs in their luggage, let me know!

Apart from GNOME booth staffing, I am also helping to organize small newcomer workshops at FOSDEM! If you are coming to FOSDEM and are interested in mentoring one or two newcomers in your project, let us know on the Newcomer workshop page (more details here too). Most of all, I look forward to meeting fellow GNOME people again, as I feel I have been gone quite a long time. I miss you!

Adrien Plazas: GTK+ Apps on Phones

Thu, 25/01/2018 - 4:42pm

As some of you may already know, I recently joined Purism to help develop GTK+ apps for the upcoming Librem 5 phone.

Purism and GNOME share a lot of ideas and values, so the GNOME HIG and GNOME apps are what we will focus on primarily: we will do all we can not to fork nor reinvent the wheel, but to help existing GTK+ applications work on phones.

How Fit are Existing GTK+ Apps?

Phones are very different from laptops and even tablets: their screen is very small and their main input method is a single thumb on a touchscreen. Luckily, many GNOME applications are touch-friendly and fit for small screens. Many applications present you a tree of information you can browse, and I see two main layouts used by GNOME applications to let you navigate it.

A first kind of layout is found in applications like Documents; I'll call it stack UI. It uses all the available space to display the collection of information sources (in that case, documents); clicking an element from the collection focuses on it, displaying its content stacked on top of the collection and letting you go back to the collection with a Back button. Applications sporting this layout are the most phone-ready ones, as phone apps typically follow a similar layout. Some polish may be needed to make them shine on a phone, but overall not much. Other applications using this layout are Music, Videos, Games, Boxes…

A second kind of layout is found in applications like Contacts; I'll call it panel UI. It displays all the levels of information side by side in panels: the closer the information is to the left, the closer it is to the root, with each selected node of the information tree highlighted. This is nice if you have enough window space to display all this information and the user doesn't need to focus on the leaves, as it allows you to quickly jump to other elements of the collection. Unfortunately, window space is rare on phones, so these applications would need to be adjusted to fit their screens. Other applications using this layout are Settings, Geary, Polari, FeedReader…

Of course, other layouts exist and are used, but I won't cover these here.

Stack UIs respond to size changes by displaying more or less of the current level of information, but panel UIs tend to seemingly arbitrarily limit the minimum size of the window to keep displaying all the levels of information, even though some may not be needed. The responsibility of handling the layout and sizes to display more or less of some levels of information is sometimes offloaded to the user via the usage of GtkPaned: the user then has to manually adjust which information to hide or display by changing the width of columns every time they need access to other information or when the window's size changes. A notable example of hurtful GtkPaned usage is Geary, which can be a bit of a pain to use half-maximized on a 1920×1080 screen.

Responsive GTK+ Apps

Panel UIs need to be smarter and should decide, depending on the window's size, which information is relevant to the user and should be displayed. As we don't want to replace the current experience but to extend it, the UIs need to respond both to window size changes and to explicit focus change requests.

One way of doing this would be to stack the panels one on top of the other, showing only one at a time and adding extra Back buttons as needed, effectively switching the UI between panels and a stack.

Another one would be to have floating panels like on KDE Discover. I am not a fan of this method, but on the other hand I'm not a designer.

I expect that to make applications like Geary easier to use even on laptops.

Implementing GtkResponsiveBox

I will try to implement a widget I call GtkResponsiveBox. It contains two children, displayed side by side when the box's size is above a given threshold, and only one of them when the size is below it.

I expect this widget to look like a weird mix of GtkPaned and GtkStack, to be orientable and to have the following sizes:

  • minimal size = max (widget 1 minimal size, widget 2 minimal size)
  • natural size = widget 1 natural size + widget 2 natural size
  • threshold size = widget 1 minimal size + widget 2 minimal size

I am not completely sure yet how to implement it, nor whether this widget is a good idea overall. Don't expect anything working soon, as it's the first time I've subclassed GtkContainer. I'll let you know how implementing the widget goes; in the meantime, any comment is welcome!
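The three sizing rules above can be made concrete with a tiny sketch of the arithmetic. This is purely illustrative (the real widget would of course be written in C against GTK+, and boxSizes/child are hypothetical names, not GTK+ API):

```go
package main

import "fmt"

// child carries the minimal and natural sizes a GTK+ child widget would
// report along one orientation.
type child struct{ minimal, natural int }

// boxSizes computes the proposed GtkResponsiveBox sizes from its two children.
func boxSizes(a, b child) (minimal, natural, threshold int) {
	// minimal: below the threshold only one child is shown, so the box
	// needs the larger of the two minimums.
	minimal = a.minimal
	if b.minimal > minimal {
		minimal = b.minimal
	}
	// natural: both children shown side by side at their natural sizes.
	natural = a.natural + b.natural
	// threshold: the smallest size at which both children still fit.
	threshold = a.minimal + b.minimal
	return
}

func main() {
	min, nat, thr := boxSizes(child{minimal: 100, natural: 200}, child{minimal: 150, natural: 300})
	fmt.Println(min, nat, thr) // 150 500 250
}
```

Note that the threshold is always at least the minimal size, so the switch between the one-child and two-child presentations happens at a size the box can actually be allocated.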

Thanks Philip for your guide detailing how to implement a custom container, it helps me a lot!

Michael Catanzaro: Announcing Epiphany Technology Preview

Thu, 25/01/2018 - 12:09am

If you use macOS, the best way to use a recent development snapshot of WebKit is surely Safari Technology Preview. But until now, there’s been no good way to do so on Linux, short of running a development distribution like Fedora Rawhide.

Enter Epiphany Technology Preview. This is a nightly build of Epiphany, on top of the latest development release of WebKitGTK+, running on the GNOME master Flatpak runtime. The target audience is anyone who wants to assist with Epiphany development by testing the latest code and reporting bugs, so I’ve added the download link to Epiphany’s development page.

Since it uses Flatpak, there are no host dependencies aside from Flatpak itself, so it should work on any system that can run Flatpak. Thanks to the Flatpak sandbox, it’s far more secure than the version of Epiphany provided by your operating system. And of course, you enjoy automatic updates from GNOME Software or any software center that supports Flatpak.


(P.S. If you want to use the latest stable version instead, with all the benefits provided by Flatpak, get that here.)

Ismael Olea: I'm going to FOSDEM 2018

Wed, 24/01/2018 - 8:51pm

Yeah. I finally decided I’m going to FOSDEM this year. 2018 is the year I’m retaking my life as I like it, and the right way to start is meeting all those friends and colleagues I missed during those years of exile. I plan to attend the beer event as soon as I arrive in Brussels.

If you want to talk to me about GUADEC 2018, Fedora Flock 2018 or whatever please reach me by Twitter (@olea) or Telegram (@IsmaelOlea).

BTW, there are a couple of relevant FOSDEM-related Telegram groups:

General English Telegram group:

Spanish-speaking one:

PS: A funny thing about FOSDEM is… this is the place where Spanish (or indeed Madrileño) open source enthusiasts can all meet once a year… in Brussels!

Michael Meeks: 2018-01-24 Wednesday

Wed, 24/01/2018 - 6:37pm
  • Poked mail, chat with Miklos, poked at partner mail and tasks. Sync with Noel & Eloy, customer call, out to see the dentist.
  • Pleased to discover from Pranav that although the Linux Skype client inexplicably (and globally) eats the Ctrl-Alt-Shift-D key combination (without seemingly doing anything with it) – I can still select an online window and do map._docLayer.toggleTileDebugMode().

Nuritzi Sanchez: Meet Shobha Tyagi from GNOME.Asia Summit 2016

Wed, 24/01/2018 - 2:09am

This month’s community spotlight is on Shobha Tyagi, one of the volunteer organizers of GNOME.Asia Summit 2016.

Courtesy of Shobha Tyagi

Shobha’s history with GNOME began when she participated in the Outreach Program for Women (OPW) internship in December 2013, with GNOME as her mentoring organization. She attended her first GUADEC in 2014 while she was an OPW intern, and met Emily Chen, who introduced her to the GNOME.Asia Summit.

Passionate about helping to spread GNOME throughout Asia, Shobha was resolute in rising to the challenge of bringing GNOME.Asia Summit to her home in Delhi, India. Fast-forward two years: Shobha is proudly leading the local organizing team of GNOME.Asia, which is ready to lift its curtain in Delhi on April 21, 2016.

We chatted with Shobha about GNOME and her experience organizing GNOME.Asia.

Why did you choose to work with GNOME for your OPW internship?

To be honest, I thought that since GNOME organizes OPW, I would receive the most productive mentoring from GNOME. Sure enough, that happened! I decided to make my initial contribution to Documentation, and after that I met my guru and mentor, Ekaterina Gerasimova.

Courtesy of Shobha Tyagi

Do you have a favorite thing about GNOME?

My favorite thing about GNOME is its people. The same people who create it, maintain it, and use it – they are what makes GNOME really great. I really enjoy committing my patches directly to the upstream repositories and meeting the contributors in person. I also get great satisfaction whenever I tell people about GNOME and let them know how they can also contribute.

You submitted the winning bid to host GNOME.Asia Summit 2016; do you have any tips for those who are interested in bidding for upcoming GNOME conferences?

Sure! It does help if you have attended a GNOME conference in the past, but once you have made up your mind to bid, have faith in yourself and just write your proposal.

Can you describe a challenge you faced while organizing the GNOME.Asia Summit and how you overcame it?

There are many challenges, especially when you are the only one who knows the ins and outs of the event and have a limited amount of time. I’m surrounded by very supportive people. Even so, people expect more from the person who lays the initial groundwork. I thank the summit committee members for their tremendous help and persistence through countless IRC meetings and discussions, without which it would have been impossible to overcome all of the small obstacles throughout the entire planning experience.

What’s the most exciting part about being an organizer?

The most exciting part is learning new things! Writing sponsorship documents, calling for presentations, picking up basic web development skills, identifying keynote speakers, chief guests and sponsors, amongst other things. I learned first-hand what goes into designing logos, posters, and stickers. There were also other tasks that I wouldn’t have had to do in a normal situation, like arranging a day tour to the Taj Mahal for a big group.

Life after GNOME.Asia Summit Delhi; what is going to be your next project?

After the GNOME.Asia Summit, I would like to focus my efforts on establishing a GNOME user group in Delhi.

Any advice for eager newcomers and first-time contributors?

My advice for them is to come and join GNOME! GNOME enables you and me to contribute, and when we contribute, we help each other improve our lives. If you are committed, you can commit patches too.

And now, some fun questions. What is your favorite color?

Yellow.

Favorite food?

All vegetarian Indian food.

What is your spirit animal?

Cow! They have a calm demeanor, and symbolize abundance and fertility since they represent both earth and sky.

Finally, and this one is important; what do you think cats dream about?

Cats dream about being loved, cared for and pampered by their master.

Shobha is helping to organize the 2016 GNOME.Asia Summit while working as an Assistant Professor at Manav Rachna International University, and pursuing a doctorate in Software Engineering. She has been a Foundation member since 2014, and has previously contributed to the Documentation team.

Thank you so much, Shobha, for sparing some of your time to talk to us! We wish you a successful Summit!

Interviewed by Adelia Rahim. 

Nuritzi Sanchez: Giving Spotlight | Meet Øyvind Kolås, GEGL maintainer extraordinaire

Wed, 24/01/2018 - 2:09am

Last month, we had the pleasure of interviewing Øyvind Kolås, aka “pippin,” about his work on GEGL — a fundamental technology enabling GIMP and GNOME Photos.

GIMP Stickers, CC-BY-SA Michael Natterer

This interview is part of a “Giving Spotlight” series we are doing on some long-time GNOME contributors who have fundraising campaigns. The goal is to help GNOME users understand the importance of the technologies, get to know the maintainers, and learn how to support them.

Without further ado, we invite you to get to know Øyvind and his work on GEGL!

The following interview was conducted over email. 

Getting to know Øyvind

Where are you from and where are you based?

I’m from the town of Ørsta – at the end of a fjord in Norway – but since high school I’ve been quite migratory: studying fine art in Oslo and Spain, doing color science research at a color lab, lecturing on multimedia CD-ROM authoring in south-eastern Norway, and working on GNOME technologies like Clutter and cairo for OpenedHand and Intel in London, followed by half a decade of low-budget backpacking. At the moment I am based in Norway – and try to keep in touch with a few people and places – among others England, Germany, and Spain.

Øyvind “pippin” Kolås, CC BY-NC-ND Ross Burton

What do you do and how does it relate to GNOME?

I like tinkering with code – frequently code that involves graphics or UI. This results in sometimes useful, at other times odd, but perhaps interesting, tools, infrastructure, or other software artifacts. Through the years and combined with other interests, this has resulted in contributions to cairo and Clutter, as well as being the maintainer of babl and GEGL, which provide pixel handling and processing machinery for GIMP 2.9, 2.10 and beyond.

How did you first get involved in GNOME?

I attended the second GUADEC, which happened in Copenhagen in 2001. This was my first in-person meeting with the people behind the nicknames in #gimp, as well as with GIMP developers and power users, and the wider community around it, including the GNOME project.

Why is your fundraising campaign important?

I want GIMP to improve and continue being relevant in the future, and I want a powerful graph-based framework for other imaging tasks. I hope that my continued maintainership of babl/GEGL will enable many new and significant workflows in GIMP and related software, as well as provide a foundation for implementing and distributing experimental image processing filters.

Wilber Week 2017, a hackathon for GEGL and GIMP, CC-BY-SA Debarshi Ray

Getting to know GEGL

How did your project originate?

GEGL’s history starts in 1997 with Rhythm & Hues Studios, a Californian visual effects and animation company. They were experimenting with a 16-bit/high bit depth fork of GIMP known as filmgimp/cinepaint. Rhythm & Hues succeeded in making GIMP work on high bit depth images, but the internal architecture was found to be lacking – and they started GEGL as a more solid future basis for high bit depth non-destructive editing in GIMP. Their funding/management interest waned, and GEGL development went dormant. GIMP, however, continued considering GEGL to be its future core.

How did you start working on GEGL?

I’ve been making and using graphics-related software since the early ’90s. In 2003–2004 I made a video editor for my own use in a hobby music video/short film collaboration. This video editing project was discontinued and salvaged for spare parts – like babl and a large set of initial operations – when I took up maintainership and development of GEGL.

What are some of the greatest challenges that you’ve faced along the way?

When people learn that I am somehow involved in the development of the GIMP project, they expect me to be in control of, and responsible for, how the UI currently is. I have removed some GIMP menu items in attempts to clean things up and reduce technical debt, but most improvements I can take credit for, now and in the future, are indirect – like how moving to GEGL enables higher bit depths and on-canvas previews instead of postage-stamp-sized previews in dialogs.

What are some of your greatest successes?

Bringing GEGL from a Duke-Nukem-Forever state, where GIMP was waiting on GEGL for all future enhancements, to GEGL waiting for GIMP to adopt it. The current development series of GIMP (2.9.x) is close to being released as 2.10, which will be the new stable: a featureful version with feature parity with 2.8 but a new engine under the hood. I am looking forward to seeing where GIMP will take GEGL in the future.

What are you working on right now?

One of the things I am working on – and playing with – at the moment is experiments in color separation. I’m making algorithms that simulate the color mixing behavior of inks and paints. That might be useful in programs like GIMP for tasks ranging from soft-proofing spot-colors to preparing photos or designs for multi-color silk-screening, for instance for textiles.

Which projects depend on your project? What’s the impact so far?

There are GIMP and GNOME Photos, as well as imgflo, a visual front-end provided by the visual programming environment NoFlo. GEGL (and babl, a companion library) are designed to be generally useful and do not have any APIs that could be considered useful only for GIMP. GEGL itself also contains various example and experimental command-line and graphical interfaces for image and video processing.

How can I get involved? 

GEGL continues to need, and thankfully to get, contributions: new filters, fixes to old filters, improvements to infrastructure, improved translations, and documentation. Making more projects use GEGL is also a good way of attracting more contributors. With funds raised through Liberapay and Patreon, I find it easier to allocate time and energy towards making the contribution experience of others smoother.

And now a few questions just for fun…

What is your favorite place on Earth?

Tricky – I have traveled a lot and not found a single place that is a definitive favorite. Places I’ve found to my liking are near the equator, with little seasonal variation, and at sufficiently high altitude to cool down to a comfortable daily high of roughly 25 degrees Celsius.

Favorite ice cream?

Could I have two scoops in a waffle cone, one mango sorbet, one coconut please?

Finally, our classic question: what do you think cats dream about?

Some cats probably dream about being able to sneak through walls.

Øyvind Kolås, CC BY-NC-ND Ross Burton



Thank you, Øyvind, for your answers. We look forward to seeing your upcoming work on GEGL this year and beyond!

Please consider supporting Øyvind through his GEGL Liberapay or GEGL Patreon campaigns. 

Michael Meeks: 2018-01-23 Tuesday

Tue, 23/01/2018 - 10:00pm
  • Mail chew, admin, meeting prep, partner cal, lunch. Installed LibreOffice build tools on HP / Ryzen 5 laptop; upgraded Windows 10 endlessly. Build ESC stats, commercial call.
  • Went out with J. for the first time in some months; just lovely to spend time alone with her out of the house.

Ismael Olea: Opensource gratitude

Tue, 23/01/2018 - 8:11pm

Some weeks ago I read somewhere on Twitter about how good it would be to adopt and share the practice of thanking the open source developers of the tools you use and love. I don’t remember who or where, so I’m probably stealing the method they proposed. Personally, I’m getting used to visiting the project’s development site and, if no better method is available, opening an issue with a text like this:

I’m opening this issue just to thank you for the tool you wrote. It’s nice, useful, and saves a lot of my time.


PS: please don’t close the issue, so other people can vote on it to express their gratitude too.

As an example, I’ve just written one for the CuteMarkEd editor:

Hope this brings a little dose of endorphins to the people who, with their effort, are building the infrastructure of the digital society. Think about it.

Richard Hughes: GCab and CVE-2018-5345

Tue, 23/01/2018 - 2:35pm

tl;dr: Update GCab from your distributor.

Longer version: Just before Christmas I found a likely exploitable bug in the libgcab library. Various security teams have been busy with slightly more important issues, so it’s taken a lot longer than usual to get verified and assigned a CVE. The issue I found was that libgcab attempted to read a large chunk into a small buffer, overwriting lots of interesting things past the end of the buffer. ASLR and SELinux save us in nearly all cases, so it’s not the end of the world. Almost a textbook C buffer overflow (rust, yada, whatever), so it was easy to fix.

Some key points:

  • This only affects libgcab, not cabarchive or libarchive
  • All gcab versions less than 0.8 are affected
  • Anything that links to gcab is affected, so gnome-software, appstream-glib and fwupd at least
  • Once you install the fixed gcab you need to restart anything that’s using it, e.g. fwupd
  • There is no silly branded name for this bug
  • The GCab project is incredibly well written, and I’ve been hugely impressed with the code quality
  • You can test whether your GCab has been fixed by attempting to decompress this file; if the program crashes, you need to update

With Marc-André’s blessing, I’ve released version v0.8 of gcab with this fix. I’ve also released v1.0, which has this fix (and many nice API additions), switches the build system to Meson, and cleans up a lot of leaks using g_autoptr(). If you’re choosing a version to update to, the answer is probably 1.0, unless you’re building for something more sedate like RHEL 5 or 6. You can get the Fedora 27 packages here, or they’ll be on the mirrors tomorrow.

Didier Roche: Welcome To The (Ubuntu) Bionic Age: Nautilus, a LTS and desktop icons

Mar, 23/01/2018 - 10:54pd
Nautilus, Ubuntu 18.04 LTS and desktop icons: upstream and downstream views.

If you are following the news on various tech websites closely, one of the latest hot topics in the community was Nautilus removing desktop icons. Let’s try to clarify some points to ensure the various discussions around it have enough background information, rather than reacting on emotion alone, as could be seen lately. You will have both the downstream (mine) and upstream (Carlos) perspectives here.

Why upstream Nautilus developers are removing the desktop icons

First, I wasn’t personally really surprised by the announcement. Let’s be clear: GNOME, since its 3.0 release, doesn’t have any icons on the desktop by default. There was an option in Tweaks to turn it back on, but let’s be honest: this wasn’t really supported.

The proof is that this code was never really maintained for 7 years and didn’t transition to newer view technologies like the ones Nautilus is migrating to. Having patched this code myself many years ago for Unity (moving desktop icons to the right depending on the icon size, in intellihide mode, which thus doesn’t reserve a workarea STRUT), I can testify that this code was getting old. Consequently, it became old and rotten while backing something not even used in the default upstream GNOME experience! It would be somewhat ironic to keep it that way.

I’m reading a lot of comments saying “just keep it as an option, the answer is easy”. Let me disagree with this. As already stated during my artful blog post series, and for the same reason that we keep Ubuntu Dock with a very small set of supported options, any added option has a cost:

  • It’s another code path to test (manually, most of the time, unfortunately), and the exploding combination of options which can interact badly with each other just produces an unfinished project, where you have to be careful not to enable this and that option together, or it crashes or causes side effects… People who have played enough with Compiz Config Settings Manager should know what I’m talking about.
  • Not only that, but more code means more bugs, and if you have to transition to a newer technology, you have to modify that code as well. Working on it is detrimental to other bug fixes, features, tests or documentation that could benefit the project. So, this piece of code that you keep and don’t use has a very negative impact on your whole project. Worse, it indirectly impacts even users who stick to the defaults, as they don’t benefit from planned enhancements in other parts of the project, due to maintainers’ time constraints.

So, yeah, there is never “just an option”.

In addition to that argument, which I used to defend upstream’s position (even in front of the French Ubuntu community), I also want to highlight that the plan to remove desktop icons was really well executed in terms of communication. However, the feedback the upstream developers got when following this communication plan, which takes time, doesn’t motivate them to do it again, whereas in my opinion it should be the standard for any important (or perceived as important) decision:

  • Carlos blogged about it on planet GNOME by the end of December. He didn’t only explain the context, why this change, who are impacted by this, possible solutions, but he also presented some proposals. So, there is complete what/why/who/how!
  • In addition to this, there is a very good abstract, more technically oriented, in an upstream bug report.
  • A GNOME Shell proof-of-concept extension was even built to show that the long-term solution for users who want icons on the desktop is feasible. It wasn’t just a throwaway experiment, as clear goals and targets were defined for what needs to be done to move from a PoC to a working extension. And yes, it was built by the exact same group who are removing desktop icons from Nautilus; I guess that means a lot!

Consequently, that is the detailed information users need to understand why this change is happening and what its long-term consequences will be. It is the foundation for good comments and exchanges on the various striking news blog posts. Very well done, Carlos! I hope that more and more such decisions, on any free software project, will be presented and explained as well as this one. ;)

A word from Nautilus upstream maintainer

Now that I’ve said what I wanted to tell about my view on the upstream changes, and before detailing what we are going to do for Ubuntu 18.04 LTS, let me introduce you Nautilus upstream maintainer already many times mentioned: Carlos Soriano.

Hello Ubuntu and GNOME community,

Thanks Didier for the detailed explanation and giving me a place in your blog!

I’m writing here because I wanted to clarify some details about the interaction with downstreams, in this case Ubuntu and Canonical developers. When I wrote the blog post with all the details, I explained only the part that purely refers to upstream Nautilus, and that was actually quite well received. However, two weeks after my blog post, some websites explained the change in a not very good way (the clickbait magic).

Usually that is not very important; those who want factual information know that we are on IRC all the time and that we usually write blog posts about upcoming changes in our blog aggregation at Planet GNOME, or here in Didier’s blog for Ubuntu. This time though, some people in our closer communities (both GNOME and Ubuntu) got splashed with misconceptions, and I wanted to address that. And what better way to do it than with Didier? :)

One misconception was that Ubuntu and Canonical were ‘yet again’ using an older version of software just because. Well, maybe you are surprised now: my recommendation for Ubuntu and Canonical was actually to stay on Nautilus 3.26. With an LTS version coming, that is by far the most reasonable option. While for a regular user the upstream recommendation is to try out nemo-desktop (by the way, this is another misconception: we said nemo-desktop, not Nemo the app; for a user those are in practice two different things), for a distribution that needs to support and maintain all kinds of requests and stability promises for years, staying with a single codebase that they have already worked with is the best option.

Another misconception I saw these days is that it seems we take decisions in a rush. In short, I became Nautilus maintainer 3 years and 4 months ago. Exactly 3 years and one month ago I realized that we needed to remove that part from Nautilus. It has been quite hard to accept during these 3 years that an option upstream did not consider the experience we wanted to provide was holding back most of the major work on Nautilus, including driving away contributions from new contributors, given the poor state of the desktop part, which unfortunately impacted the whole code of the application itself. In all this time, downstreams like Ubuntu were a major reason for me to keep this code. Discussions about this decision happened all this time with the other developers of Nautilus.

And the last misconception was that it looks like GNOME devs and Ubuntu devs are in completely separate niches where no one communicates with the other. While we are usually focused on our personal tasks, when a change is going to happen we do communicate. In this case, I reached out to the desktop team at Canonical before taking the final decision, providing a draft of the blog post so they could assess the impact, and giving possible options for the LTS release of Ubuntu and beyond.

In summary, the takeaway is that while we might have slightly different visions, at the end of the day we just want to provide the best experience to users, and believe me, we do the best we can for that.

In case you have any questions, you can always reach us, Nautilus upstream, in the #nautilus IRC channel or on our mailing list.

Hope you enjoyed this read, and hopefully the benefits of this work will show soon. Thanks again Didier!

What does this mean for Ubuntu?

We thought about this as the Ubuntu Desktop team. Our next release is an LTS (Long-Term Support) version, meaning that Ubuntu 18.04 (currently named Bionic Beaver during its development) will have 5 years of support in terms of bug fixes and security updates.

It also means that most of our user audience will upgrade from our last Ubuntu 16.04 LTS to Ubuntu 18.04 LTS (or even 14.04 -> 16.04 -> 18.04!). The changes over those last 2 years are quite large in terms of software updates and new features. On top of this, those users will experience the Unity -> GNOME Shell transition for the first time, and we want to give them a feeling of comfort and familiar landmarks in our default experience, despite the huge changes underneath.

On the Ubuntu desktop, we ship a Dock, visible by default. Consequently, the desktop view itself, without any application on top, is more important in our user experience than it is in the upstream GNOME Shell default one. We think that shipping icons on the desktop is still relevant for our user base.

Where does this leave us regarding those changes? Thinking about the problem, we came to approximately the same conclusions that upstream Nautilus developers have:

  • Staying, for the LTS release, on Nautilus 3.26: the pro is that it’s battle-tested code that we already know we can support (it shipped in 17.10). This matches the fact that the LTS is a very important and strong commitment for us. The con is that it won’t be the latest and greatest upstream Nautilus release by release date, and some integrations with other parts of the GNOME 3.28 code might require more downstream work from us.
  • Using an alternative file manager for the desktop, like Nemo. Shipping entirely new code in an LTS, having to support two file managers (Nautilus for normal file browsing and Nemo for the desktop), and ensuring that the integration between those two and all other applications works well quickly ruled out that solution.
  • Upgrading to Nautilus 3.28 and shipping the PoC GNOME Shell extension, contributing to it as much as possible before release. The issue (despite this being the long-term solution) is that, as in the previous solution, we would ship entirely new code, and the extension needs new APIs from Nautilus which aren’t fully shaped yet (and maybe won’t be ready for GNOME 3.28 at all). Also, we planned long in advance (end of September for 18.04 LTS) the features and work needed for the next release, and we still have a lot to do for Ubuntu 18.04 LTS, some of it being GNOME upstream code. Consequently, rushing into coding this extension, with our Feature Freeze deadline approaching on March 1st, would mean that we either drop some initially planned features or fix fewer bugs and do less polish, which would be detrimental to our overall release quality.

As in every release, we decide on a component-by-component basis what to do (upgrade to latest or not), weighing the pros and cons and trying to take the best decision for our end-user audience. We think that most GNOME components will be upgraded to 3.28. However, in this particular instance, we decided to keep Nautilus at version 3.26 in Ubuntu 18.04 LTS. You can read the discussion that took place during our weekly Ubuntu Desktop meeting on IRC, leading to that decision.

Another pro of that decision is that it gives flavors shipping Nautilus by default, like Ubuntu Budgie and Edubuntu, a little more time to find a solution matching their needs, as they don’t run GNOME Shell and thus can’t use that extension.

The experience will thus be: desktop icons (with Nautilus 3.26) in the default Ubuntu session. The vanilla GNOME session I have talked about many times will still run Nautilus 3.26 (as we can only have one version of a piece of software in the archive and installed on users’ machines with traditional deb packages), but with icons on the desktop disabled, as we did on Ubuntu Artful. I think some motivated users will build Nautilus 3.28 in a PPA, but it won’t receive official security and bug fix support, of course.

Meanwhile, we will start contributing to the longer-term plan around this new GNOME Shell extension with the Nautilus developers: shaping a proper API, getting good drag-and-drop support and so on, progressively… This will give better long-term code, and we hope that following Ubuntu releases will be able to move to it once we reach the minimal set of features we want from it (and consequently, update to the latest Nautilus version!).

I hope that sheds some light on both the GNOME upstream and Ubuntu decisions, showing the two perspectives, why those actions were taken, and the long-term plan. Hopefully, posts like these explaining a little bit of the context will lead to informed and constructive comments as well!

Alberto Ruiz: GUADEC 2017: GNOME’s Renaissance

Mar, 23/01/2018 - 1:35pd

NOTE: This is a blog post I kept as a draft right after GUADEC to reflect on it and the GNOME project but failed to finish and publish until now. Forgive any outdated information though I think the post is mostly relevant still.

I’m on my train back to London from Manchester, where I just spent 7 amazing days with my fellow GNOME community members. Props to the local team for an amazing organization: everything went smoothly, people seemed extremely pleased with the setup as far as I can tell, and the venues worked extremely well. I mostly want to reflect on a feeling I have that GNOME is experiencing a renaissance, both in the energy and focus of the community and in the broader interest from other players.

Source: kitty-kat @ flickr CC BY-SA 4.0

Peak attendance and sponsorship

Our attendance numbers have gone up considerably from recent years: approximately 260 registrations, minus a bunch of people who could not make it in the end. That is an increase of ~50-60 attendees, which is really encouraging.

On top of that, this year’s sponsorships went up, both in the number of companies sponsoring and in their support of the event. This is really encouraging, as it shows that there is increased interest in the project and acknowledgement that GUADEC


There were two comebacks that were particularly encouraging: first, Canonical and Ubuntu community members are back, and it was great to see them participating; second, members of the Solaris Desktop team showed up too. Also, having Andrew Walton from VMware around was a good sign.

Balance was brought back to the force

Historically Red Hat would have a prominent presence at GUADEC and in GNOME in general. It was really amazing to see that Endless is now a Foundation AdBoard member, but also that the number of Endless crew at GUADEC matched that of Red Hat.

Contrary to what people may think, Red Hat does not generally enjoy being the only player in any given free software community; however, since Nokia and Sun/Oracle withdrew their heavy investment in GNOME, Red Hat was probably the most prominent player in the community until now. While Red Hat is still investing as heavily as ever in GNOME, we’re not the only major player anymore, and that is something to celebrate and to look forward to seeing expanded to other organizations.

It was particularly encouraging to see the design sessions packed with people from Endless, Canonical and Red Hat as well as many other interested individuals.


It feels to me that Flatpak, and especially the current efforts around Flathub, have helped focus the community towards a common vision for the developer and user story, and you can feel the shared excitement that we’re onto something, with implications across the stack and our infrastructure.

Obviously, not everybody shares this enthusiasm about Flatpak itself, but the broad consensus is that the model around it is worth pursuing and that it has the potential to considerably raise the viability of the Free Software desktop and personal devices, not to mention that it opens a route towards monetization of free software and Linux apps.

GitLab, Meson and BuildStream

Another batch of modernization in our stack and infrastructure. First and foremost, the GitLab migration, which we believe will not only improve the experience of newcomers and early testers but also improve our Continuous Integration pipeline.

Consensus around Meson and leaving autotools behind is another big step, and many other relevant free software projects are jumping on the bandwagon. And last but not least, Tristan and Codethink are leading an effort to consolidate GNOME Continuous and JHBuild into BuildStream, a better way to build a collection of software from multiple source repositories.

Looking ahead

I think that the vibe at GUADEC and the current state of GNOME are really exciting, and there is a lot to look forward to. My main takeaway is that the project is in an incredibly healthy state, with a rich ecosystem of companies committing entire products to what the GNOME community writes, and a commitment to solve some of the very hard remaining problems needed to make the Free Desktop a viable contender to what the industry offers right now.

Promising times ahead.

Sebastian Dröge: Speeding up RGB to grayscale conversion in Rust by a factor of 2.2 – and various other multimedia related processing loops

Dje, 21/01/2018 - 2:54md

In the previous blog post I wrote about how to write a RGB to grayscale conversion filter for GStreamer in Rust. In this blog post I’m going to write about how to optimize the processing loop of that filter, without resorting to unsafe code or SIMD instructions by staying with plain, safe Rust code.

I also tried to implement the processing loop with faster, a Rust crate for writing safe SIMD code. It looks very promising, but unless I missed something in the documentation it currently is missing some features to be able to express this specific algorithm in a meaningful way. Once it works on stable Rust (waiting for SIMD to be stabilized) and includes runtime CPU feature detection, this could very well be a good replacement for the ORC library used for the same purpose in GStreamer in various places. ORC works by JIT-compiling a minimal “array operation language” to SIMD assembly for your specific CPU (and has support for x86 MMX/SSE, PPC Altivec, ARM NEON, etc.).

If someone wants to prove me wrong and implement this with faster, feel free to do so and I’ll link to your solution and include it in the benchmark results below.

All code below can be found in this GIT repository.

Table of Contents
  1. Baseline Implementation
  2. First Optimization – Assertions
  3. First Optimization – Assertions Try 2
  4. Second Optimization – Iterate a bit more
  5. Third Optimization – Getting rid of the bounds check finally
  6. Summary
  7. Addendum: slice::split_at
Baseline Implementation

This is how the baseline implementation looks.

pub fn bgrx_to_gray_chunks_no_asserts(
    in_data: &[u8],
    out_data: &mut [u8],
    in_stride: usize,
    out_stride: usize,
    width: usize,
) {
    let in_line_bytes = width * 4;
    let out_line_bytes = width * 4;

    for (in_line, out_line) in in_data
        .chunks(in_stride)
        .zip(out_data.chunks_mut(out_stride))
    {
        for (in_p, out_p) in in_line[..in_line_bytes]
            .chunks(4)
            .zip(out_line[..out_line_bytes].chunks_mut(4))
        {
            let b = u32::from(in_p[0]);
            let g = u32::from(in_p[1]);
            let r = u32::from(in_p[2]);
            let x = u32::from(in_p[3]);

            let grey = ((r * RGB_Y[0]) + (g * RGB_Y[1]) + (b * RGB_Y[2]) + (x * RGB_Y[3]))
                / 65536;
            let grey = grey as u8;
            out_p[0] = grey;
            out_p[1] = grey;
            out_p[2] = grey;
            out_p[3] = grey;
        }
    }
}

This basically iterates over each line of the input and output frame (outer loop), and then for each BGRx chunk of 4 bytes in each line it converts the values to u32, multiplies with a constant array, converts back to u8 and stores the same value in the whole output BGRx chunk.

Note: This is only doing the actual conversion from linear RGB to grayscale (and in BT.601 colorspace). To do this conversion correctly you need to know your colorspaces and use the correct coefficients for conversion, and also do gamma correction. See this about why it is important.
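The RGB_Y coefficient table itself is not shown in this excerpt. Judging from the multipliers that appear in the assembly later (19595, 38470, 7471), it is presumably the BT.601 luma coefficients scaled by 65536, something like the following sketch:

```rust
// Presumed definition (not part of this excerpt): the BT.601 luma
// weights 0.299 R + 0.587 G + 0.114 B, scaled by 65536 so the whole
// conversion can be done in integer arithmetic, stored in R, G, B, x
// order to match the multiplication order in the function above.
const RGB_Y: [u32; 4] = [19595, 38470, 7471, 0];

// Sanity check helper: the coefficients sum to exactly 65536, so a
// pure white BGRx pixel maps to grey 255 and black maps to 0.
fn grey_of(b: u8, g: u8, r: u8, x: u8) -> u8 {
    ((u32::from(r) * RGB_Y[0]
        + u32::from(g) * RGB_Y[1]
        + u32::from(b) * RGB_Y[2]
        + u32::from(x) * RGB_Y[3])
        / 65536) as u8
}
```

The division by 65536 then undoes the scaling, which is why it compiles down to a simple shift.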

So what can be improved on this? For starters, let’s write a small benchmark for this so that we know whether any of our changes actually improve something. This is using the (unfortunately still) unstable benchmark feature of Cargo.

#![feature(test)]
#![feature(exact_chunks)]

extern crate test;

pub fn bgrx_to_gray_chunks_no_asserts(...) {
    [...]
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;
    use std::iter;

    fn create_vec(w: usize, h: usize) -> Vec<u8> {
        iter::repeat(0).take(w * h * 4).collect::<_>()
    }

    #[bench]
    fn bench_chunks_1920x1080_no_asserts(b: &mut Bencher) {
        let i = test::black_box(create_vec(1920, 1080));
        let mut o = test::black_box(create_vec(1920, 1080));

        b.iter(|| bgrx_to_gray_chunks_no_asserts(&i, &mut o, 1920 * 4, 1920 * 4, 1920));
    }
}

This can be run with cargo bench and then prints the number of nanoseconds each iteration of the closure was taking. To really measure only the processing itself, allocations and initializations of the input/output frames happen outside of the closure. We’re not interested in times for that.

First Optimization – Assertions

To actually start optimizing this function, let’s take a look at the assembly that the compiler is outputting. The easiest way of doing that is via the Godbolt Compiler Explorer website. Select “rustc nightly” and use “-C opt-level=3” for the compiler flags, and then copy & paste your code in there. Once it compiles, to find the assembly that corresponds to a line, simply right-click on the line and “Scroll to assembly”.

Alternatively you can use cargo rustc --release -- -C opt-level=3 --emit asm and check the assembly file that is output in the target/release/deps directory.

What we see then for our inner loop is something like the following

.LBB4_19:
    cmp r15, r11
    mov r13, r11
    cmova r13, r15
    mov rdx, r8
    sub rdx, r13
    je .LBB4_34
    cmp rdx, 3
    jb .LBB4_35
    inc r9
    movzx edx, byte ptr [rbx - 1]
    movzx ecx, byte ptr [rbx - 2]
    movzx esi, byte ptr [rbx]
    imul esi, esi, 19595
    imul edx, edx, 38470
    imul ecx, ecx, 7471
    add ecx, edx
    add ecx, esi
    shr ecx, 16
    mov byte ptr [r10 - 3], cl
    mov byte ptr [r10 - 2], cl
    mov byte ptr [r10 - 1], cl
    mov byte ptr [r10], cl
    add r10, 4
    add r8, -4
    add r15, -4
    add rbx, 4
    cmp r9, r14
    jb .LBB4_19

This is already quite optimized. For each loop iteration the first few instructions are doing some bounds checking and if they fail jump to the .LBB4_34 or .LBB4_35 labels. How to understand that this is bounds checking? Scroll down in the assembly to where these labels are defined and you’ll see something like the following

.LBB4_34:
    lea rdi, [rip + .Lpanic_bounds_check_loc.D]
    xor esi, esi
    xor edx, edx
    call core::panicking::panic_bounds_check@PLT
    ud2
.LBB4_35:
    cmp r15, r11
    cmova r11, r15
    sub r8, r11
    lea rdi, [rip + .Lpanic_bounds_check_loc.F]
    mov esi, 2
    mov rdx, r8
    call core::panicking::panic_bounds_check@PLT
    ud2

Also if you check (with the colors, or the “scroll to source” feature) which Rust code these correspond to, you’ll see that it’s the first and third access to the 4-byte slice that contains our BGRx values.

Afterwards in the assembly, the following steps are happening: 0) incrementing of the “loop counter” representing the number of iterations we’re going to do (r9), 1) actual reading of the B, G and R values and conversion to u32 (the 3 movzx; note that the reading of the x value is optimized away as the compiler sees that it is always multiplied by 0 later), 2) the multiplications with the array elements (the 3 imul), 3) combining of the results and division (i.e. shift) (the 2 add and the shr), 4) storing of the result in the output (the 4 mov). Afterwards the slice pointers are increased by 4 (rbx and r10) and the lengths (used for bounds checking) are decreased by 4 (r8 and r15). Finally there’s a check (cmp) to see if r9 (our loop counter) is at the end of the slice, and if not we jump back to the beginning and operate on the next BGRx chunk.

Generally what we want to do for optimizations is to get rid of unnecessary checks (bounds checking), memory accesses, conditions (cmp, cmov) and jumps (the instructions starting with j). These are all things that are slowing down our code.

So the first thing that seems worth optimizing here is the bounds checking at the beginning. It definitely doesn’t seem useful to do two checks instead of one for the two slices (the checks cover both slices at once, but Godbolt does not detect that and believes it’s only the input slice). And ideally we could teach the compiler that no bounds checking is needed at all.

As I wrote in the previous blog post, often this knowledge can be given to the compiler by inserting assertions.

To prevent two checks and just have a single check, you can insert an assert_eq!(in_p.len(), 4) at the beginning of the inner loop, and the same for the output slice. Now we only have a single bounds check left per iteration.
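Concretely, the assertions go right at the top of the inner loop. Here is a self-contained sketch, reduced to a single line of pixels for brevity (the RGB_Y table is assumed to be the BT.601 coefficients scaled by 65536, which is not shown in this excerpt):

```rust
// Assumed definition of the coefficient table (not in this excerpt).
const RGB_Y: [u32; 4] = [19595, 38470, 7471, 0];

fn gray_line(in_line: &[u8], out_line: &mut [u8]) {
    for (in_p, out_p) in in_line.chunks(4).zip(out_line.chunks_mut(4)) {
        // One check per slice instead of one per indexing operation:
        // after these, the compiler knows that every in_p[i] / out_p[i]
        // access below is in bounds and can drop the per-index checks.
        assert_eq!(in_p.len(), 4);
        assert_eq!(out_p.len(), 4);

        let grey = ((u32::from(in_p[2]) * RGB_Y[0])
            + (u32::from(in_p[1]) * RGB_Y[1])
            + (u32::from(in_p[0]) * RGB_Y[2])
            + (u32::from(in_p[3]) * RGB_Y[3]))
            / 65536;
        let grey = grey as u8;
        out_p[0] = grey;
        out_p[1] = grey;
        out_p[2] = grey;
        out_p[3] = grey;
    }
}
```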

As a next step we might want to try to move this knowledge outside the inner loop so that there is no bounds checking at all in there anymore. We might want to add assertions like the following outside the outer loop then to give all knowledge we have to the compiler

assert_eq!(in_data.len() % 4, 0);
assert_eq!(out_data.len() % 4, 0);
assert_eq!(out_data.len() / out_stride, in_data.len() / in_stride);
assert!(in_line_bytes <= in_stride);
assert!(out_line_bytes <= out_stride);

Unfortunately adding those has no effect at all on the inner loop, but having them outside the outer loop for good measure is not the worst idea so let’s just keep them. At least it can be used as some kind of documentation of the invariants of this code for future readers.

So let’s benchmark these two implementations now. The results on my machine are the following

test tests::bench_chunks_1920x1080_no_asserts ... bench: 4,420,145 ns/iter (+/- 139,051)
test tests::bench_chunks_1920x1080_asserts    ... bench: 4,897,046 ns/iter (+/- 166,555)

This is surprising, our version without the assertions is actually faster by a factor of ~1.1 although it had fewer conditions. So let’s take a closer look at the assembly at the top of the loop again, where the bounds checking happens, in the version with assertions

.LBB4_19:
    cmp rbx, r11
    mov r9, r11
    cmova r9, rbx
    mov r14, r12
    sub r14, r9
    lea rax, [r14 - 1]
    mov qword ptr [rbp - 120], rax
    mov qword ptr [rbp - 128], r13
    mov qword ptr [rbp - 136], r10
    cmp r14, 5
    jne .LBB4_33
    inc rcx
    [...]

While this indeed has only one jump as expected for the bounds checking, the number of comparisons is the same, and even worse: 3 memory writes to the stack happen right before the jump. If we follow the .LBB4_33 label we will see that the assert_eq! macro is going to do something with core::fmt::Debug. This is setting up the information needed for printing the assertion failure, the “expected X equals to Y” output. This is certainly not good and the reason why everything is slower now.

First Optimization – Assertions Try 2

All the additional instructions and memory writes were happening because the assert_eq! macro is outputting something user friendly that actually contains the values of both sides. Let’s try again with the assert! macro instead

test tests::bench_chunks_1920x1080_no_asserts ... bench: 4,420,145 ns/iter (+/- 139,051)
test tests::bench_chunks_1920x1080_asserts    ... bench: 4,897,046 ns/iter (+/- 166,555)
test tests::bench_chunks_1920x1080_asserts_2  ... bench: 3,968,976 ns/iter (+/- 97,084)

This already looks more promising. Compared to our baseline version this gives us a speedup of a factor of 1.12, and compared to the version with assert_eq! 1.23. If we look at the assembly for the bounds checks (everything else stays the same), it also looks more like what we would’ve expected

.LBB4_19:
    cmp rbx, r12
    mov r13, r12
    cmova r13, rbx
    add r13, r14
    jne .LBB4_33
    inc r9
    [...]

One cmp less, only one jump left. And no memory writes anymore!

So keep in mind that assert_eq! is more user-friendly but quite a bit more expensive even in the “good case” compared to assert!.

Second Optimization – Iterate a bit more

This is still not very satisfying though. No bounds checking should be needed at all, as each chunk is going to be exactly 4 bytes. We’re just not able to convince the compiler that this is the case. While it may be possible (let me know if you find a way!), let’s try something different. The zip iterator is done when the shorter of the two iterators is done, and there are optimizations implemented specifically for zipped slice iterators. Let’s try that and replace the grayscale value calculation with

let grey = in_p.iter()
    .zip(RGB_Y.iter())
    .map(|(i, c)| u32::from(*i) * c)
    .sum::<u32>() / 65536;

If we run that through our benchmark after removing the assert!(in_p.len() == 4) (and the same for the output slice), these are the results

test tests::bench_chunks_1920x1080_asserts_2 ... bench:  3,968,976 ns/iter (+/- 97,084)
test tests::bench_chunks_1920x1080_iter_sum  ... bench: 11,393,600 ns/iter (+/- 347,958)

We’re actually 2.9 times slower! Even when adding back the assert!(in_p.len() == 4) assertion (and the same for the output slice) we’re still slower

test tests::bench_chunks_1920x1080_asserts_2 ... bench:  3,968,976 ns/iter (+/- 97,084)
test tests::bench_chunks_1920x1080_iter_sum  ... bench: 11,393,600 ns/iter (+/- 347,958)
test tests::bench_chunks_1920x1080_iter_sum_2 ... bench: 10,420,442 ns/iter (+/- 242,379)

If we look at the assembly of the assertion-less variant, it’s a complete mess now

.LBB0_19:
    cmp rbx, r13
    mov rcx, r13
    cmova rcx, rbx
    mov rdx, r8
    sub rdx, rcx
    cmp rdx, 4
    mov r11d, 4
    cmovb r11, rdx
    test r11, r11
    je .LBB0_20
    movzx ecx, byte ptr [r15 - 2]
    imul ecx, ecx, 19595
    cmp r11, 1
    jbe .LBB0_22
    movzx esi, byte ptr [r15 - 1]
    imul esi, esi, 38470
    add esi, ecx
    movzx ecx, byte ptr [r15]
    imul ecx, ecx, 7471
    add ecx, esi
    test rdx, rdx
    jne .LBB0_23
    jmp .LBB0_35
.LBB0_20:
    xor ecx, ecx
.LBB0_22:
    test rdx, rdx
    je .LBB0_35
.LBB0_23:
    shr ecx, 16
    mov byte ptr [r10 - 3], cl
    mov byte ptr [r10 - 2], cl
    cmp rdx, 3
    jb .LBB0_36
    inc r9
    mov byte ptr [r10 - 1], cl
    mov byte ptr [r10], cl
    add r10, 4
    add r8, -4
    add rbx, -4
    add r15, 4
    cmp r9, r14
    jb .LBB0_19

In short, there are now various new conditions and jumps for short-circuiting the zip iterator in the various cases. And because of all the noise added, the compiler was not even able to optimize the bounds check for the output slice away anymore (.LBB0_35 cases). While it was able to unroll the iterator (note that the 3 imul multiplications are not interleaved with jumps and are actually 3 multiplications instead of yet another loop), which is quite impressive, it couldn’t do anything meaningful with that information it somehow got (it must’ve understood that each chunk has 4 bytes!). This looks like something going wrong somewhere in the optimizer to me.

If we take a look at the variant with the assertions, things look much better

.LBB3_19:
    cmp r11, r12
    mov r13, r12
    cmova r13, r11
    add r13, r14
    jne .LBB3_33
    inc r9
    movzx ecx, byte ptr [rdx - 2]
    imul r13d, ecx, 19595
    movzx ecx, byte ptr [rdx - 1]
    imul ecx, ecx, 38470
    add ecx, r13d
    movzx ebx, byte ptr [rdx]
    imul ebx, ebx, 7471
    add ebx, ecx
    shr ebx, 16
    mov byte ptr [r10 - 3], bl
    mov byte ptr [r10 - 2], bl
    mov byte ptr [r10 - 1], bl
    mov byte ptr [r10], bl
    add r10, 4
    add r11, -4
    add r14, 4
    add rdx, 4
    cmp r9, r15
    jb .LBB3_19

This is literally the same as the assertion version we had before, except that the reading of the input slice, the multiplications and the additions are happening in iterator order instead of being batched all together. It’s quite impressive that the compiler was able to completely optimize away the zip iterator here, but unfortunately it’s still many times slower than the original version. The reason must be the instruction-reordering. The previous version had all memory reads batched and then the operations batched, which is apparently much better for the internal pipelining of the CPU (it is going to perform the next instructions without dependencies on the previous ones already while waiting for the pending instructions to finish).

It’s also not clear to me why the LLVM optimizer is not able to schedule the instructions the same way here. It apparently has all the information it needs for that if no iterator is involved, and both versions lead to exactly the same assembly except for the order of instructions. This also seems fishy.

Nonetheless, we still have our manual bounds check (the assertion) left here and we should really try to get rid of that. No progress so far.

Third Optimization – Getting rid of the bounds check finally

Let’s tackle this from a different angle now. Our problem is apparently that the compiler is not able to understand that each chunk is exactly 4 bytes.

So why don’t we write a new chunks iterator that always yields exactly the requested number of items, instead of potentially fewer for the very last iteration. And instead of panicking if there are leftover elements, it seems useful to just ignore them. That way we have an API that is functionally different from the existing chunks iterator and provides behaviour that is useful in various cases. It’s basically the slice equivalent of the exact_chunks iterator of the ndarray crate.

By having it functionally different from the existing one, and not just an optimization, I also submitted it for inclusion in Rust’s standard library, and it’s nowadays available as an unstable feature in nightly, like all newly added API. Nonetheless, the same can also be implemented inside your code with basically the same effect; there are no dependencies on standard library internals.
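To make the mechanism concrete, a minimal read-only version of such an iterator could be sketched as follows (a simplified illustration, not the actual standard library implementation; a mutable variant would work the same way via split_at_mut):

```rust
// Simplified sketch of an exact-chunks iterator: it always yields
// chunks of exactly `chunk_size` elements and silently ignores any
// leftover elements at the end of the slice.
struct ExactChunks<'a, T: 'a> {
    remainder: &'a [T],
    chunk_size: usize,
}

impl<'a, T> Iterator for ExactChunks<'a, T> {
    type Item = &'a [T];

    fn next(&mut self) -> Option<&'a [T]> {
        if self.remainder.len() < self.chunk_size {
            // Not enough elements left for a full chunk: stop here
            return None;
        }
        let (chunk, rest) = self.remainder.split_at(self.chunk_size);
        self.remainder = rest;
        Some(chunk)
    }
}

fn exact_chunks<T>(slice: &[T], chunk_size: usize) -> ExactChunks<T> {
    assert!(chunk_size != 0);
    ExactChunks { remainder: slice, chunk_size }
}

fn main() {
    let data = [1u8, 2, 3, 4, 5, 6, 7, 8, 9];
    let chunks: Vec<&[u8]> = exact_chunks(&data, 4).collect();
    // Two full chunks; the trailing element 9 is ignored
    assert_eq!(chunks, [&[1u8, 2, 3, 4][..], &[5, 6, 7, 8][..]]);
}
```

Because every yielded chunk has the length by construction, the compiler can rely on it without any runtime checks.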

So, let’s use our new exact_chunks iterator that is guaranteed (by API) to always give us exactly 4 bytes. In our case this is exactly equivalent to the normal chunks as by construction our slices always have a length that is a multiple of 4, but the compiler can’t infer that information. The resulting code looks as follows

pub fn bgrx_to_gray_exact_chunks(
    in_data: &[u8],
    out_data: &mut [u8],
    in_stride: usize,
    out_stride: usize,
    width: usize,
) {
    assert_eq!(in_data.len() % 4, 0);
    assert_eq!(out_data.len() % 4, 0);
    assert_eq!(out_data.len() / out_stride, in_data.len() / in_stride);

    let in_line_bytes = width * 4;
    let out_line_bytes = width * 4;

    assert!(in_line_bytes <= in_stride);
    assert!(out_line_bytes <= out_stride);

    for (in_line, out_line) in in_data
        .exact_chunks(in_stride)
        .zip(out_data.exact_chunks_mut(out_stride))
    {
        for (in_p, out_p) in in_line[..in_line_bytes]
            .exact_chunks(4)
            .zip(out_line[..out_line_bytes].exact_chunks_mut(4))
        {
            assert!(in_p.len() == 4);
            assert!(out_p.len() == 4);

            let b = u32::from(in_p[0]);
            let g = u32::from(in_p[1]);
            let r = u32::from(in_p[2]);
            let x = u32::from(in_p[3]);

            let grey = ((r * RGB_Y[0]) + (g * RGB_Y[1]) + (b * RGB_Y[2])
                + (x * RGB_Y[3])) / 65536;
            let grey = grey as u8;
            out_p[0] = grey;
            out_p[1] = grey;
            out_p[2] = grey;
            out_p[3] = grey;
        }
    }
}

It’s exactly the same as the previous version with assertions, except for using exact_chunks instead of chunks, and the same for the mutable iterator. The resulting benchmark of all our variants now looks as follows

test tests::bench_chunks_1920x1080_no_asserts ... bench:   4,420,145 ns/iter (+/- 139,051)
test tests::bench_chunks_1920x1080_asserts    ... bench:   4,897,046 ns/iter (+/- 166,555)
test tests::bench_chunks_1920x1080_asserts_2  ... bench:   3,968,976 ns/iter (+/- 97,084)
test tests::bench_chunks_1920x1080_iter_sum   ... bench:  11,393,600 ns/iter (+/- 347,958)
test tests::bench_chunks_1920x1080_iter_sum_2 ... bench:  10,420,442 ns/iter (+/- 242,379)
test tests::bench_exact_chunks_1920x1080      ... bench:   2,007,459 ns/iter (+/- 112,287)

Compared to our initial version this is a speedup of a factor of 2.2, compared to our version with assertions a factor of 1.98. This seems like a worthwhile improvement, and if we look at the resulting assembly there are no bounds checks at all anymore

.LBB0_10:
	movzx edx, byte ptr [rsi - 2]
	movzx r15d, byte ptr [rsi - 1]
	movzx r12d, byte ptr [rsi]
	imul r13d, edx, 19595
	imul edx, r15d, 38470
	add edx, r13d
	imul ebx, r12d, 7471
	add ebx, edx
	shr ebx, 16
	mov byte ptr [rcx - 3], bl
	mov byte ptr [rcx - 2], bl
	mov byte ptr [rcx - 1], bl
	mov byte ptr [rcx], bl
	add rcx, 4
	add rsi, 4
	dec r10
	jne .LBB0_10

Also due to this the compiler is able to apply some more optimizations and we only have one loop counter for the number of iterations r10 and the two pointers rcx and rsi that are increased/decreased in each iteration. There is no tracking of the remaining slice lengths anymore, as in the assembly of the original version (and the versions with assertions).


So overall we got a speedup of a factor of 2.2 while still writing very high-level Rust code with iterators and not falling back to unsafe code or using SIMD. The optimizations the Rust compiler is applying are quite impressive and the Rust marketing line of zero-cost abstractions is really visible in reality here.

The same approach should also work for many similar algorithms, and thus many similar multimedia related algorithms where you iterate over slices and operate on fixed-size chunks.

Also the above shows that as a first step it’s better to write clean and understandable high-level Rust code without worrying too much about performance (assuming the compiler can optimize well), and only afterwards take a look at the generated assembly and check which instructions should really go away (like bounds checking). In many cases this can be achieved by adding assertions in strategic places, or, like in this case, by switching to a slightly different abstraction that is closer to the actual requirements (however, I believe the compiler should be able to produce the same code with the help of assertions and the normal chunks iterator; making that possible probably requires improvements to the LLVM optimizer).
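As a tiny illustration of that assertion trick (a hypothetical stand-alone example, not code from the post): one up-front assertion gives the compiler enough length information to elide the bounds checks of the individual accesses that follow:

```rust
// Hypothetical example: the single assertion up front proves to the
// compiler that indices 0..4 are in bounds, so the four indexing
// operations below need no individual bounds checks.
fn sum_first_four(data: &[u8]) -> u32 {
    assert!(data.len() >= 4);
    u32::from(data[0]) + u32::from(data[1]) + u32::from(data[2]) + u32::from(data[3])
}

fn main() {
    let buf = [10u8, 20, 30, 40, 50];
    assert_eq!(sum_first_four(&buf), 100);
}
```

The assertion still panics on short input, but the panic path moves out of line and the hot path loses four checks.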

And if all that does not help, there’s still the escape hatch of unsafe (for using functions like slice::get_unchecked() or going down to raw pointers) and the possibility of using SIMD instructions (via faster or stdsimd directly). But in the end this should be a last resort for those little parts of your code where optimizations are needed and the compiler can’t easily be convinced to do it for you.
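For completeness, an unchecked variant of the per-pixel conversion could look roughly like this (a hypothetical sketch: the function name and weights are assumptions based on the post’s RGB_Y table, and the caller must guarantee that both slices have at least 4 elements or the behaviour is undefined):

```rust
// Hypothetical unchecked variant of the per-pixel conversion.
// Safety: the caller must ensure in_p.len() >= 4 and out_p.len() >= 4.
unsafe fn bgrx_to_gray_pixel_unchecked(in_p: &[u8], out_p: &mut [u8]) {
    let b = u32::from(*in_p.get_unchecked(0));
    let g = u32::from(*in_p.get_unchecked(1));
    let r = u32::from(*in_p.get_unchecked(2));

    // Integer weights as in the post's assembly (x channel assumed 0)
    let grey = ((r * 19595) + (g * 38470) + (b * 7471)) / 65536;
    let grey = grey as u8;
    for i in 0..4 {
        *out_p.get_unchecked_mut(i) = grey;
    }
}

fn main() {
    let in_p = [0x10u8, 0x20, 0x30, 0x00]; // B, G, R, X
    let mut out_p = [0u8; 4];
    unsafe { bgrx_to_gray_pixel_unchecked(&in_p, &mut out_p) };
    assert_eq!(out_p, [34; 4]);
}
```

This trades the compiler-verified safety of the iterator versions for a manual proof obligation, which is exactly why it should stay a last resort.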

Addendum: slice::split_at

User newpavlov suggested on Reddit to use repeated slice::split_at in a while loop for similar performance.

This would for example look like

pub fn bgrx_to_gray_split_at(
    in_data: &[u8],
    out_data: &mut [u8],
    in_stride: usize,
    out_stride: usize,
    width: usize,
) {
    assert_eq!(in_data.len() % 4, 0);
    assert_eq!(out_data.len() % 4, 0);
    assert_eq!(out_data.len() / out_stride, in_data.len() / in_stride);

    let in_line_bytes = width * 4;
    let out_line_bytes = width * 4;

    assert!(in_line_bytes <= in_stride);
    assert!(out_line_bytes <= out_stride);

    for (in_line, out_line) in in_data
        .exact_chunks(in_stride)
        .zip(out_data.exact_chunks_mut(out_stride))
    {
        let mut in_pp: &[u8] = in_line[..in_line_bytes].as_ref();
        let mut out_pp: &mut [u8] = out_line[..out_line_bytes].as_mut();
        assert!(in_pp.len() == out_pp.len());

        while in_pp.len() >= 4 {
            let (in_p, in_tmp) = in_pp.split_at(4);
            let (out_p, out_tmp) = { out_pp }.split_at_mut(4);
            in_pp = in_tmp;
            out_pp = out_tmp;

            let b = u32::from(in_p[0]);
            let g = u32::from(in_p[1]);
            let r = u32::from(in_p[2]);
            let x = u32::from(in_p[3]);

            let grey = ((r * RGB_Y[0]) + (g * RGB_Y[1]) + (b * RGB_Y[2])
                + (x * RGB_Y[3])) / 65536;
            let grey = grey as u8;
            out_p[0] = grey;
            out_p[1] = grey;
            out_p[2] = grey;
            out_p[3] = grey;
        }
    }
}

Performance-wise this brings us very close to the exact_chunks version

test tests::bench_exact_chunks_1920x1080 ... bench:   1,965,631 ns/iter (+/- 58,832)
test tests::bench_split_at_1920x1080     ... bench:   2,046,834 ns/iter (+/- 35,990)

and the assembly is also very similar

.LBB0_10:
	add rbx, -4
	movzx r15d, byte ptr [rsi]
	movzx r12d, byte ptr [rsi + 1]
	movzx edx, byte ptr [rsi + 2]
	imul r13d, edx, 19595
	imul r12d, r12d, 38470
	imul edx, r15d, 7471
	add edx, r12d
	add edx, r13d
	shr edx, 16
	movzx edx, dl
	imul edx, edx, 16843009
	mov dword ptr [rcx], edx
	lea rcx, [rcx + 4]
	add rsi, 4
	cmp rbx, 3
	ja .LBB0_10

Here the compiler even optimizes the storing of the value into a single write operation of 4 bytes, at the cost of an additional multiplication and zero-extend register move.
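The magic constant 16843009 is 0x01010101: multiplying the grey byte by it broadcasts the byte into all four lanes of a 32-bit word, which can then be stored with one 4-byte write. Written out manually in Rust, the trick looks like this (an illustrative sketch, not code from the post):

```rust
// Broadcast one byte into all four bytes of a u32 by multiplying
// with 0x01010101, allowing a single 4-byte store instead of four
// 1-byte stores.
fn broadcast_grey(grey: u8) -> [u8; 4] {
    let word = u32::from(grey) * 0x0101_0101;
    word.to_ne_bytes()
}

fn main() {
    assert_eq!(broadcast_grey(0xAB), [0xAB; 4]);
}
```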

Overall this code performs very well too, but in my opinion it looks rather ugly compared to the versions using the different chunks iterators. Also this is basically what the exact_chunks iterator does internally: repeatedly calling slice::split_at. In theory both versions could lead to the very same assembly, but the LLVM optimizer currently handles the two slightly differently.

Christian Hergert: Builder happenings for January

Sht, 20/01/2018 - 12:37md

I’ve been very busy with Builder since returning from the holidays. As mentioned previously, we’ve moved to gitlab. I’m very happy about it. I can see how this is going to improve engagement and communication within our existing community and help us retain new contributors.

I made two releases of Builder so far this month. That included both a new stable build (which flatpak users are already using) and a new snapshot for those on developer operating systems like Fedora Rawhide.

The vast majority of my work this month has been on stabilization efforts. Builder is already a very large project. Every moving part we add makes this Rube Goldberg machine just a bit more difficult to maintain. I’ve tried to focus my time on things that are brittle and either improve or replace the designs. I’ve also fixed a good number of memory leaks and safety issues. However, the memory overhead of clang sort of casts a large shadow on all that work. We really need to get clang out of process one of these days.

Over the past couple years, our coding style evolved thanks to new features like g_autoptr() and friends. Every time I come across old-style code during my bug hunts, I clean it up.

Builder learned how to automatically install Flatpak SDK Extensions. These can save you a bunch of time when building your application if you have a complex stack. Things like Rust and Mono can just be pulled in and copied into your app rather than compiled from source on your machine. In doing so, every app that uses the technology can share objects in the OSTree repository, saving disk space and network transfer.

That allowed me to create a new template, a GNOME C♯ template. It uses the Mono SDK extension and gtk-sharp for 3.x. If you want to help here, work on an omni-sharp language server plugin for us!

A new C++ template using Gtkmm was added. Given that I don’t have a lot of recent experience with Gtkmm, it’d be nice to have someone from that community come in and make sure things are in good shape.

I also did some cleanup on our code-indexer to avoid threading in our API. Creating plugins on threads turned out to be rather disastrous, so now we try extra hard to keep things on the main thread with the typical async/finish function pairs.

I created a new messages panel to elevate warnings to the user without them having to run Builder from a terminal. If you want an easy project to work on, we need to go find interesting calls to g_warning() and use ide_context_warning() instead.

Our flatpak plugin now tries extra hard to avoid downloads. Those were really annoying for people when opening builder. It took some troubleshooting in flatpak-builder, and that is fixed now.

In the process of fixing the extraneous downloading I realized we could start bundling flatpak-builder with Builder. After a couple of fixes to flatpak-builder Builder Nightly no longer requires flatpak-builder on the host. That’s one less thing to go wrong for people going through the newcomers work-flow.

We just landed the beginning of a go-langserver plugin. It seems like the language server for Go is pretty new though. We only have a symbol resolver thus far.

I found a fun bug in Vala that caused const gchar * const * parameters to async functions to turn into gchar **, int. It was promptly fixed upstream for us (thanks Rico).

Some 350 commits have landed this month so far, most of them around stabilizing Builder. It’s a good time to start playing with the Nightly branch if you’re into that.

Oh, and after some 33 years on Earth, I finally needed glasses. So I look educated now.

Jim Hall: Programming with ncurses

Pre, 19/01/2018 - 12:33pd
Over at Linux Journal, I am writing an article series about programming on Linux. While graphical user interfaces are very cool, not every program needs to run with a point-and-click interface. So in my "Getting started with ncurses" article series, I discuss how to write programs using the ncurses library functions.

Maybe you aren't familiar with curses or ncurses, but I guarantee you've run programs that use this library. Many programs that run in "terminal" mode, including the vi editor, use the curses set of functions to draw to the screen. The curses functions allow you to put text anywhere on the screen, or read from the keyboard.

My article series starts with a simple example that demonstrates how to put characters and text on the screen. My example program is a chaos game iteration of Sierpinski's Triangle, which is a very simple program (only 73 lines).

Follow-up articles in the series will include a "Quest" program to demonstrate how to query the screen and use the arrow keys, and how to add colors.

Linux Journal has posted the second part of my article series: Creating an adventure game in the terminal using ncurses. Soon to come: The same adventure game, using colors!

Morten Welinder: Security From Whom, Indeed

Enj, 18/01/2018 - 3:05md

So Spectre and Meltdown happened.

That was completely predictable, so much so that I, in fact, did predict that side-channel attacks, including those coming via javascript run in a browser, were the thing to look out for. (This was in the context of pointing out that pushing Wayland as a security improvement over plain old X11 was misguided.)

I recall being told that such attacks were basically nation-state level due to cost, complexity, and required target information. How is that prediction working out for you?

Philip Chimento: Announcing Flapjack

Mër, 17/01/2018 - 9:54md

Here’s a post about a tool that I’ve developed at work. You might find it useful if you contribute to any desktop platform libraries that are packaged as a Flatpak runtime, such as GNOME or KDE.

Flatpak is a system for delivering desktop applications that was pioneered by the GNOME community. At Endless, we have jumped aboard the Flatpak train. Our product Endless OS is a Linux distribution, but not a traditional one in the sense of being a collection of packages that you install with a package manager; it’s an immutable OS image, with atomic updates delivered through OSTree. Applications are sandboxed-only and Flatpak-only.

Flatpak makes life much easier for application developers who want to get their applications to users without having to care which Linux distribution those users use. It means that as an application developer, I don’t have to fire up three different virtual machines and email five packaging contributors whenever I make a release of my application. (Or, in theory it would work that way if I would stop using deprecated libraries in my application!)

This is what flapjacks are in the UK, Ireland, Isle of Man, and Newfoundland. Known as “granola bars” or “oat bars” elsewhere. By Alistair Young, CC BY 2.0,

On my work computer I took the leap and now develop everything on an immutable OSTree system just like it would be running in production. I now develop everything inside a Flatpak sandbox. However, while Flatpak works great when packaging some code that already exists, it is a bit lacking in the developer experience.

For app developers, Carlos Soriano has written a tool called flatpak-dev-cli based on a workflow designed by Thibault Saunier of the Pitivi development team. This has proven very useful for developing Flatpak apps.

But a lot of the work I do is not on apps, but on the library stack that is used by apps on Endless OS. In fact, my team’s main product is a Flatpak runtime. I wanted an analogue of flatpak-dev-cli for developing the libraries that live inside a Flatpak runtime.


…while this is what flapjacks are everywhere else in Canada, and in the US. Also known as “pancakes.” By Belathee Photography, CC BY-SA 3.0,

Flapjack is that tool. It’s a wrapper around Flatpak-builder that is intended to replace JHBuild in the library developer’s toolbox.

For several months I’ve been using it in my day-to-day work, on a system running Endless OS (which has hardly any developer tools installed by default). It only requires Flatpak-builder, Git, and Python.

In Flapjack’s README I included a walkthrough for reproducing Tristan’s trick from his BuildStream talk at GUADEC 2017 where he built an environment with a modified copy of GTK that showed all the UI labels upside-down.

That walkthrough is pretty much what my day-to-day development workflow looks like now. As an example, a recent bug required me to patch eos-knowledge-lib and xapian-glib at the same time, which are both components of Endless’s Modular Framework runtime. I did approximately this:

flapjack open xapian-glib
flapjack open eos-knowledge-lib
cd checkout/xapian-glib
# ... make changes to code ...
flapjack test xapian-glib
# ... keep changing and repeating until the tests pass!
cd ../eos-knowledge-lib
# ... make more changes to code ...
flapjack test eos-knowledge-lib
# ... keep changing and repeating until the tests pass!
flapjack build
# ... keep changing and repeating until the whole runtime builds!
flapjack run com.endlessm.encyclopedia.en
# run Encyclopedia, which is an app that uses this runtime, to check
# that my fix worked
git checkout -b etc. etc.
# create branches for my work and push them

I also use Flapjack’s “devtools manifest” to conveniently provide developer tools that aren’t present in Endless OS’s base OSTree layer. In Flapjack’s readme I gave an example of adding the jq tool to the devtools manifest, but I also have cppcheck, RR, and a bunch of Python modules that I added with flatpak-pip-generator. Whenever I need to use any of these tools, I just open flapjack shell and they’re available!

Questions you might ask

Why is it called Flapjack?

The working title was jokingly chosen to mess up your muscle memory if you were used to typing flatpak, but it stuck and became the real name. If it does annoy you, you can alias it to fj or something.

Flatpak-builder is old news, why does Flapjack not use BuildStream?

I would like it if that were the case! I suspect that BuildStream would solve my main problem with Flapjack, which is that it is slow. In fact I started out writing Flapjack as a wrapper around BuildStream, instead of Flatpak-builder. But at the time BuildStream just didn’t have enough documentation for me to get my head around it quickly enough. I hear that this is changing and I would welcome a port to BuildStream!

As well, it was not possible to allow --socket=x11 during a build like you can with Flatpak-builder, so I couldn’t get it to run unit tests for modules that depended on GTK.

Why are builds with Flapjack so slow?

The slowest parts are caching each build step (I suspect here is where using BuildStream would help a lot) and exporting the runtime’s debug extension to the local Flatpak repository. For the latter, this used to be even slower, before my colleague Emmanuele Bassi suggested to use a “bare-user” repository. I’m still looking for a way to speed this up. I suspect it should be possible, since for Flapjack builds we would probably never care about the Flatpak repository history.

Can I use Flapjack to develop system components like GNOME Shell?

No. There still isn’t a good developer story for working on system components on an immutable OS! At Endless, the people who work on those components will generally replace their OSTree file system with a mutable one. This isn’t a very good strategy because it means you’re developing on a system that is different from what users are running in production, but I haven’t found any better way so far.


Thanks to my employer Endless for allowing me to reserve some time to write this tool in a way that it would be useful for the wider Flatpak community, rather than just internally.

That’s about it! I hope Flapjack is useful for you. If you have any other questions, feel free to ask me.

Where to find it

Flapjack’s page on PyPI:
The code on GitHub:
Report bugs and request features:

Andy Wingo: instruction explosion in guile

Mër, 17/01/2018 - 11:30pd

Greetings, fellow Schemers and compiler nerds: I bring fresh nargery!

instruction explosion

A couple years ago I made a list of compiler tasks for Guile. Most of these are still open, but I've been chipping away at the one labeled "instruction explosion":

Now we get more to the compiler side of things. Currently in Guile's VM there are instructions like vector-ref. This is a little silly: there are also instructions to branch on the type of an object (br-if-tc7 in this case), to get the vector's length, and to do a branching integer comparison. Really we should replace vector-ref with a combination of these test-and-branches, with real control flow in the function, and then the actual ref should use some more primitive unchecked memory reference instruction. Optimization could end up hoisting everything but the primitive unchecked memory reference, while preserving safety, which would be a win. But probably in most cases optimization wouldn't manage to do this, which would be a lose overall because you have more instruction dispatch.

Well, this transformation is something we need for native compilation anyway. I would accept a patch to do this kind of transformation on the master branch, after version 2.2.0 has forked. In theory this would remove most all high level instructions from the VM, making the bytecode closer to a virtual CPU, and likewise making it easier for the compiler to emit native code as it's working at a lower level.

Now that I'm getting close to finished I wanted to share some thoughts. Previous progress reports on the mailing list.

a simple loop

As an example, consider this loop that sums the 32-bit floats in a bytevector. I've annotated the code with lines and columns so that you can correspond different pieces to the assembly.

   0       8   12     19
 +-v-------v---v------v-
 |
1| (use-modules (rnrs bytevectors))
2| (define (f32v-sum bv)
3|   (let lp ((n 0) (sum 0.0))
4|     (if (< n (bytevector-length bv))
5|         (lp (+ n 4)
6|             (+ sum (bytevector-ieee-single-native-ref bv n)))
7|         sum)))

The assembly for the loop before instruction explosion went like this:

L1:
  17    (handle-interrupts)             at (unknown file):5:12
  18    (uadd/immediate 0 1 4)
  19    (bv-f32-ref 1 3 1)              at (unknown file):6:19
  20    (fadd 2 2 1)                    at (unknown file):6:12
  21    (s64<? 0 4)                     at (unknown file):4:8
  22    (jnl 8)                         ;; -> L4
  23    (mov 1 0)                       at (unknown file):5:8
  24    (j -7)                          ;; -> L1

So, already Guile's compiler has hoisted the (bytevector-length bv) and unboxed the loop index n and accumulator sum. This work aims to simplify further by exploding bv-f32-ref.

exploding the loop

In practice, instruction explosion happens in CPS conversion, as we are converting the Scheme-like Tree-IL language down to the CPS soup language. When we see a Tree-IL primcall (a call to a known primitive), instead of lowering it to a corresponding CPS primcall, we inline a whole blob of code.

In the concrete case of bv-f32-ref, we'd inline it with something like the following:

(unless (and (heap-object? bv)
             (eq? (heap-type-tag bv) %bytevector-tag))
  (error "not a bytevector" bv))

(define len (word-ref bv 1))
(define ptr (word-ref bv 2))

(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))

(f32-ref ptr idx)

As you can see, there are four branches hidden in the bv-f32-ref: two to check that the object is a bytevector, and two to check that the index is within range. In this explanation we assume that the offset idx is already unboxed, but actually unboxing the index ends up being part of this work as well.

One of the goals of instruction explosion was that by breaking the operation into a number of smaller, more orthogonal parts, native code generation would be easier, because the compiler would only have to know about those small bits. However without an optimizing compiler, it would be better to reify a call out to a specialized bv-f32-ref runtime routine instead of inlining all of this code -- probably whatever language you write your runtime routine in (C, rust, whatever) will do a better job optimizing than your compiler will.

But with an optimizing compiler, there is the possibility of removing possibly everything but the f32-ref. Guile doesn't quite get there, but almost; here's the post-explosion optimized assembly of the inner loop of f32v-sum:

L1:
  27    (handle-interrupts)
  28    (tag-fixnum 1 2)
  29    (s64<? 2 4)                     at (unknown file):4:8
  30    (jnl 15)                        ;; -> L5
  31    (uadd/immediate 0 2 4)          at (unknown file):5:12
  32    (u64<? 2 7)                     at (unknown file):6:19
  33    (jnl 5)                         ;; -> L2
  34    (f32-ref 2 5 2)
  35    (fadd 3 3 2)                    at (unknown file):6:12
  36    (mov 2 0)                       at (unknown file):5:8
  37    (j -10)                         ;; -> L1

good things

The first thing to note is that unlike the "before" code, there's no instruction in this loop that can throw an exception. Neat.

Next, note that there's no type check on the bytevector; the peeled iteration preceding the loop already proved that the bytevector is a bytevector.

And indeed there's no reference to the bytevector at all in the loop! The value being dereferenced in (f32-ref 2 5 2) is a raw pointer. (Read this instruction as, "sp[2] = *(float*)((byte*)sp[5] + (uintptr_t)sp[2])".) The compiler does something interesting; the f32-ref CPS primcall actually takes three arguments: the garbage-collected object protecting the pointer, the pointer itself, and the offset. The object itself doesn't appear in the residual code, but including it in the f32-ref primcall's inputs keeps it alive as long as the f32-ref itself is alive.

bad things

Then there are the limitations. Firstly, instruction 28 tags the u64 loop index as a fixnum, but never uses the result. Why is this here? Sadly it's because the value is used in the bailout at L2. Recall this pseudocode:

(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))

Here the error ends up lowering to a throw CPS term that the compiler recognizes as a bailout and renders out-of-line; cool. But it uses idx as an argument, as a tagged SCM value. The compiler untags the loop index, but has to keep a tagged version around for the error cases.

The right fix is probably some kind of allocation sinking pass that sinks the tag-fixnum to the bailouts. Oh well.

Additionally, there are two tests in the loop. Are both necessary? Turns out, yes :( Imagine you have a bytevector of length 1025. The loop continues until the last ref at offset 1024, which is within bounds of the bytevector, but there's only one byte available at that point, so we need to throw an exception there. The compiler did as good a job as we could expect it to do.

is it worth it? where to now?

On the one hand, instruction explosion is a step sideways. The code is more optimal, but it's more instructions. Because Guile currently has a bytecode VM, that means more total interpreter overhead. Testing on a 40-megabyte bytevector of 32-bit floats, the exploded f32v-sum completes in 115 milliseconds compared to around 97 for the earlier version.

On the other hand, it is very easy to imagine how to compile these instructions to native code, either ahead-of-time or via a simple template JIT. You practically just have to look up the instructions in the corresponding ISA reference, is all. The result should perform quite well.

I will probably take a whack at a simple template JIT first that does no register allocation, then ahead-of-time compilation with register allocation. Getting the AOT-compiled artifacts to dynamically link with runtime routines is a sufficient pain in my mind that I will put it off a bit until later. I also need to figure out a good strategy for truly polymorphic operations like general integer addition; probably involving inline caches.

So that's where we're at :) Thanks for reading, and happy hacking in Guile in 2018!