Planet GNOME

Jussi Pakkanen: C is dead, long live C (APIs)

Sun, 21/04/2024 - 3:39 PM

In the 80s and 90s, the software development landscape was quite different from today (or so I have been told). Everything that needed performance was written in C and things that did not were written in Perl. Because computers of the time were really slow, almost everything was in C. If you needed both performance and fast development, you could write a C extension to Perl.

As C was the only game in town, anyone could use pretty much any other library directly. The number of dependencies available was minuscule compared to today, but you could use all of them fairly easily. Then things changed, as they have a tendency to do. First Python took over Perl. Then more and more languages started eroding C's dominant position. This led to a duplication of effort. For example, if you were using Java and wanted to parse XML (which was the coolness of its day), you'd need an XML parser written in Java. Just dropping libxml into your Java source tree would not cut it (you could still use native code libs, but most people chose not to).

The number of languages and ecosystems kept growing and nowadays we have dozens of them. But suppose you want to provide a library that does something useful and you'd like it to be usable by as many people as possible. This is especially relevant for providing closed source libraries but the same applies to open source libs as well. You especially do not want to rewrite and maintain multiple implementations of the code in different languages. So what do you do?

Let's start by going through a list of programming languages and seeing what sort of dependencies they can use natively (i.e. the toolchain or stdlib provides this support out of the box rather than requiring an addon, code generator, IDL tool or the like):

  • C: C
  • Perl: Perl and C
  • Python: Python and C
  • C++: C++ and C
  • Rust: Rust and C
  • Java: Java and C
  • Lua: Lua and C
  • D: D, subset of C++ and C
  • Swift: Swift, Objective C, C++ (eventually?) and C
  • PrettyMuchAnyNewLanguage: itself and C

The message is quite clear. The only thing in common is C, so that is what you have to use. The alternative is maintaining an implementation per language, leaving languages you explicitly do not support out in the cold.

So even though C as a language is (most likely) going away, C APIs are not. In fact, designing C APIs is a skill that might even see a resurgence as the language ecosystem fractures even further. Note that providing a library with a C API does not mean having to implement it in C. All languages have ways of providing libraries whose external API is compatible with C. As an extreme example, Visual Studio's C runtime libraries are nowadays written in C++.
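As a minimal illustration (with made-up names, not CapyPDF's actual header), a library implemented in C++ can still publish a plain C header; the usual extern "C" guard keeps the symbols unmangled so any language with a C FFI can call them:

/* some_library.h - the public face stays C, whatever the implementation language */
#ifdef __cplusplus
extern "C" {
#endif

int some_library_do_something(const char *input);

#ifdef __cplusplus
}
#endif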

CapyPDF's design and things picked up along the way

One of the main design goals of CapyPDF was that it should provide a C API and be usable from any language. It should also (eventually) provide a stable API and ABI. This means that the ground truth of the library's functionality is the C header. This turns out to have design implications for the library's internals that might be difficult to add in after the fact.

Hide everything

Perhaps the most important declaration in widely usable C headers is this.

typedef struct _someObject SomeObject;

In C parlance this means "there is a struct type _someObject somewhere; create an alias to it called SomeObject". This means that the caller can create pointers to structs of type SomeObject but do nothing else with them. This leads to the common "opaque structs" C API way of doing things:

SomeObject *o = some_object_new();
some_object_do_something(o, "hello");
some_object_destroy(o);

This permits you to change the internal representation of the object while still maintaining a stable public API and ABI. Avoid exposing the internals of structs whenever possible, because once made public they can never be changed.

Objects exposed via pointers must never move in memory

This one is fairly obvious when you think about it. Unfortunately it means that if you want to give users access to objects that are stored in an std::vector, you can't do it with pointers, which is the natural way of doing things in C. Pushing more entries into the vector will eventually cause the capacity to be exceeded, so the storage will be reallocated and the entries moved to a new backing store. This invalidates all pointers.

There are several solutions to this, but the simplest one is to access those objects via type safe indices instead. They are defined like this:

typedef struct { int32_t id; } SomeObjectId;

This struct behaves "like an integer" in that you can pass it around as an int but it does not implicitly convert to any other "integer" type.
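For illustration, here is a sketch (with hypothetical names, not the actual CapyPDF functions) of how such indices replace pointers in the public API:

#include <stdint.h>

typedef struct _someContainer SomeContainer;
typedef struct { int32_t id; } SomeObjectId;

/* The container hands out ids instead of pointers, so its internal
 * storage (an std::vector, say) is free to reallocate without
 * invalidating anything the caller holds on to. */
SomeObjectId some_container_add_object(SomeContainer *c);
void some_container_use_object(SomeContainer *c, SomeObjectId id);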

Objects must be destructible in any order

It is easy to write into documentation that "objects of type X must be destroyed before any object Y that they use". Unfortunately, garbage collected languages do not read your docs and thus provide no guarantee whatsoever on object destruction order. When used in this way, any object must be destructible at any time regardless of the state of any other object.

This is the opposite of how modern languages want to work. In CapyPDF's case, page draw contexts in particular were done in an RAII style where they would submit their changes upon destruction. For an internal API this is nice and usable, but for a public C API it is not. The implicit action had to be replaced with an explicit function to add the page, which takes both object pointers (the draw context and the document) as arguments. This ensures that both must exist and be valid at the point of call.
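A sketch of what that explicit call might look like, with hypothetical names rather than CapyPDF's real entry points:

#include <stdint.h>

typedef struct _capyDocument CapyDocument;
typedef struct _capyDrawContext CapyDrawContext;
typedef int32_t ErrorCode; /* stand-in for the library's error type */

/* Both objects are passed explicitly, so both must exist and be valid
 * at the point of call; nothing is committed implicitly on destruction. */
ErrorCode capy_document_add_page(CapyDocument *doc, CapyDrawContext *ctx);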

Use transactionality whenever possible

It would be nice if all objects were immutable, but sadly that would mean that you can't actually do anything. A library must provide ways for end users to create, mutate and destroy objects. When possible, try to do this with a builder object. That is, the user creates a "transactional change" that they want to make. They can call setters and such as much as they want, but they don't affect the "actual document". All of this new state is isolated in the builder object. Once the user is finished they submit the change to the main object, which is then validated and either rejected or accepted as a whole. The builder object then becomes an empty shell that can be either reused or discarded.
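In code the pattern might look roughly like this (hypothetical names, not CapyPDF's API):

/* All mutations accumulate in the builder; the document is untouched
 * until the change is submitted. */
PageBuilder *builder = page_builder_new();
page_builder_set_media_box(builder, 0, 0, 595, 842);
page_builder_add_text(builder, "Hello");

/* Submission validates the change as a whole: on success it is committed
 * atomically, on failure the document is left exactly as it was and the
 * builder can be fixed up and resubmitted, or discarded. */
ErrorCode rc = document_submit_page(doc, builder);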

CapyPDF is an append only library. Once something has been "committed" it can never be taken out again. This is also something to strive towards, because removing things is a lot harder than adding them.

Prefer copying to sharing

When the library is given some piece of data, it makes a private copy of it. Otherwise it would need to coordinate the life cycle of the shared piece of data with the caller. This is where bugs lie. Copying does cost some performance but makes a whole class of difficult bugs just go away. In the case of CapyPDF the performance hit turned out not to be an issue since most of the runtime is spent compressing the output with zlib.

Every function call can fail, even those that can't

Every function in the library returns an error code, even those that have no way of failing, because circumstances can change in the future. Maybe some input that previously could be anything now needs to be validated, and you can't change the function definition as that would break the API. Thus every function returns an error code (except the function that converts an error code into an error string). Sadly this means that all "return values" must be handled via out parameters.

ErrorCode some_object_new(SomeObject **out_ptr);

This is not great, but such is life. 
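On the caller's side the convention looks roughly like this (error_code_to_string is a made-up name standing in for the code-to-string function mentioned above):

SomeObject *o = NULL;
ErrorCode rc = some_object_new(&o);
if (rc != 0) {
    fprintf(stderr, "Object creation failed: %s\n", error_code_to_string(rc));
    return rc;
}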

Think of C APIs as "in-process RPC"

When designing the API of CapyPDF it was helpful to think of it like a call to a remote endpoint somewhere out there on the Internet. This makes you want to design functions that are as high level as possible and try to ignore all implementation details you can, almost as if the C API was a slightly cumbersome DSL.

Peter Hutterer: udev-hid-bpf: quickstart tooling to fix your HID devices with eBPF

Thu, 18/04/2024 - 6:17 AM

For the last few months, Benjamin Tissoires and I have been working on and polishing a little tool called udev-hid-bpf [1]. This is the scaffolding required to quickly and easily write, test and eventually fix your HID input devices (mouse, keyboard, etc.) via a BPF program instead of a full-blown custom kernel driver or a semi-full-blown kernel patch. To understand how it works, you need to know two things: HID and BPF [2].

Why BPF for HID?

HID is the Human Interface Device standard and the most common way input devices communicate with the host (HID over USB, HID over Bluetooth, etc.). It has two core components: the "report descriptor" and "reports", both of which are byte arrays. The report descriptor is a fixed burnt-in-ROM byte array that (in rather convoluted terms) tells us what we'll find in the reports. Things like "bits 16 through to 24 are the delta x coordinate" or "bit 5 is the binary button state for button 3 in degrees Celsius". The reports themselves are sent at (usually) regular intervals and contain the data in the described format, as the device perceives reality. If you're interested in more details, see Understanding HID report descriptors.

BPF, or more correctly eBPF, is a Linux kernel technology to write programs in a subset of C, compile them and load them into the kernel. The magic thing here is that the kernel will verify them, so once loaded, the program is "safe". And because it's safe it can be run in kernel space, which means it's fast. eBPF was originally written for network packet filters but as of kernel v6.3, and thanks to Benjamin, we have BPF in the HID subsystem. HID actually lends itself really well to BPF because, well, we have a byte array and to fix our devices we need to do complicated things like "toggle that bit to zero" or "swap those two values".

If we want to fix our devices we usually need to do one of two things: fix the report descriptor to enable/disable/change some of the values the device pretends to support. For example, we can say we support 5 buttons instead of the supposed 8. Or we need to fix the report by e.g. inverting the y value for the device. This can be done in a custom kernel driver but a HID BPF program is quite a lot more convenient.

HID-BPF programs

For illustration purposes, here's the example program to flip the y coordinate. HID BPF programs are usually device-specific; we need to know that, e.g., the y coordinate is 16 bits and sits in bytes 3 and 4 (little endian):

SEC("fmod_ret/hid_bpf_device_event") int BPF_PROG(hid_y_event, struct hid_bpf_ctx *hctx) { s16 y; __u8 *data = hid_bpf_get_data(hctx, 0 /* offset */, 9 /* size */); if (!data) return 0; /* EPERM check */ y = data[3] | (data[4] << 8); y = -y; data[3] = y & 0xFF; data[4] = (y >> 8) & 0xFF; return 0; } That's it. HID-BPF is invoked before the kernel handles the HID report/report descriptor so to the kernel the modified report looks as if it came from the device.

As said above, this is device-specific because where the coordinates are in the report depends on the device (the report descriptor will tell us). In this example we want to ensure the BPF program is only loaded for our device (vid/pid of 04d9/a09f), and for extra safety we also double-check that the report descriptor matches.

// The bpf.o will only be loaded for devices in this list
HID_BPF_CONFIG(
    HID_DEVICE(BUS_USB, HID_GROUP_GENERIC, 0x04D9, 0xA09F)
);

SEC("syscall")
int probe(struct hid_bpf_probe_args *ctx)
{
    /*
     * The device exports 3 interfaces.
     * The mouse interface has a report descriptor of length 71.
     * So if report descriptor size is not 71, mark as -EINVAL
     */
    ctx->retval = ctx->rdesc_size != 71;
    if (ctx->retval)
        ctx->retval = -EINVAL;

    return 0;
}

Obviously the check in probe() can be as complicated as you want.

This is pretty much it, the full working program only has a few extra includes and boilerplate. So it mostly comes down to compiling and running it, and this is where udev-hid-bpf comes in.

udev-hid-bpf as loader

udev-hid-bpf is a tool to make the development and testing of HID BPF programs simple, and to collect HID BPF programs. You basically run meson compile and meson install and voila, whatever BPF program applies to your devices will be auto-loaded next time you plug those in. If you just want to test a single bpf.o file you can run udev-hid-bpf install /path/to/foo.bpf.o and it will install the required udev rule for it to get loaded whenever the device is plugged in. If you don't know how to compile, you can grab a tarball from our CI and test the pre-compiled bpf.o. Hooray, even simpler.

udev-hid-bpf is written in Rust, but you don't need to know Rust, it's just the scaffolding. The BPF programs are all in C. Rust just gives us a relatively easy way to provide a static binary that will work on most testers' machines.

The documentation for udev-hid-bpf is here. So if you have a device that needs a hardware quirk or just has an annoying behaviour that you always wanted to fix, well, now's the time. Fixing your device has never been easier! [3].

[1] Yes, the name is meh but you're welcome to come up with a better one and go back in time to suggest it a few months ago.
[2] Because I'm lazy the terms eBPF and BPF will be used interchangeably in this article. Because the difference doesn't really matter in this context, it's all eBPF anyway but nobody has the time to type that extra "e".
[3] Citation needed

Matthias Clasen: Graphics offload revisited

Wed, 17/04/2024 - 7:39 PM

We first introduced support for dmabufs and graphics offload last fall, and it is included in GTK 4.14. Since then, some improvements have happened, so it is time for an update.

Improvements down the stack

The GStreamer 1.24 release has improved support for explicit modifiers, and the GStreamer media backend in GTK has been updated to request dmabufs from GStreamer.

Another thing that happens on the GStreamer side is that dmabufs sometimes come with padding: in that case GStreamer will give us a buffer with a viewport and expect us to only show that part of the buffer. This is sometimes necessary to accommodate stride and size requirements of hardware decoders.

GTK 4.14 supports this when offloading, and only shows the part of the dmabuf indicated by the viewport.

Improvements inside GTK

We’ve merged new GSK renderers for GTK 4.14. The new renderers support dmabufs in the same way as the old gl renderer. In addition, the new Vulkan renderer produces dmabufs when rendering to a texture.

In GTK 4.16, the GtkGLArea widget will also provide dmabuf textures if it can, so you can put it in a GtkGraphicsOffload widget to send its output directly to the compositor.

You can see this in action in the shadertoy demo in gtk4-demo in git main.

Shadertoy demo with golden outline around offloaded graphics

Improved compositor interaction

One nice thing about graphics offload is that the compositor may be able to pass the dmabuf to the KMS APIs of the kernel without any extra copies or compositing. This is known as direct scanout and it helps reduce power consumption since large parts of the GPU aren't used.

The compositor can only do this if the dmabuf is attached to a fullscreen surface and has the right dimensions to cover it fully. If it does not cover it fully, the compositor needs some assurance that it is ok to leave the outside parts black.

One way for clients to provide that assurance is to attach a specially constructed black buffer to a surface below the one that has the dmabuf attached. GSK will now do this if it finds a black color node in the rendernode tree, and the GtkGraphicsOffload widget will put that color there if you set the "black-background" property. This should greatly increase the chances that you can enjoy the benefits of direct scanout when playing fullscreen video.

Offloaded content with fullscreen black background
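As a rough sketch of how an application opts in (assuming GTK 4.16; GtkGraphicsOffload and the "black-background" property are the API discussed above, while GtkVideo and the file path are just stand-ins for whatever produces your dmabuf-backed content):

#include <gtk/gtk.h>

static GtkWidget *
make_offloaded_player (void)
{
  GtkWidget *video = gtk_video_new_for_filename ("/path/to/video.mp4");
  GtkWidget *offload = gtk_graphics_offload_new (video);

  /* Promise the compositor that the area around the dmabuf may stay
   * black, which makes fullscreen direct scanout possible. */
  g_object_set (offload, "black-background", TRUE, NULL);

  return offload;
}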

In implementing this for GTK 4.16, we found some issues with mutter’s support for single-pixel buffers, but these have been fixed quickly.

To see graphics offload and direct scanout in action in a GTK4 video player, you can try the Light Video Player.

If you want to find out if graphics offload works on your system or debug why it doesn’t, this recent post by Benjamin is very helpful.

Summary

GTK 4 continues to improve its support for efficient video playback, and drives improvements in this area up and down the stack.

A big thank you for pushing all of this forward goes to Robert Mader.

Philippe Normand: From WebKit/GStreamer to rust-av, a journey on our stack’s layers

Tue, 16/04/2024 - 10:15 PM

In this post I’ll try to document the journey starting from a WebKit issue and ending up improving third-party projects that WebKitGTK and WPEWebKit depend on.

I’ve been working on WebKit’s GStreamer backends for a while. Usually some new feature needed on the WebKit side would trigger work on GStreamer. That’s quite common and healthy actually: by improving GStreamer (bug fixes or implementing new features) we make the whole stack stronger (hopefully). It’s not hard to imagine other web engines, such as Servo for instance, leveraging fixes made in GStreamer in the context of WebKit use-cases.

Sometimes though we have to go deeper and this is what this post is about!

Since version 2.44, WebKitGTK and WPEWebKit ship with a WebCodecs backend. That backend leverages the wide range of GStreamer audio and video decoders/encoders to give low-level access to encoded (or decoded) audio/video frames to Web developers. I delivered a lightning talk at gst-conf 2023 about this topic.

There are still some issues to fix regarding performance, and some W3C web platform tests are still failing. The AV1 decoding tests were flagged early on while I was working on WebCodecs; I didn’t have time back then to investigate the failures further, but a couple of weeks ago I went back to those specific issues.

The WebKit layout tests harness is executed by various post-commit bots, on various platforms. The WebKitGTK and WPEWebKit bots run on Linux. The WebCodec tests for AV1 currently make use of the GStreamer av1enc and dav1ddec elements. We currently don’t run the tests using the modern and hardware-accelerated vaav1enc and vaav1dec elements because the bots don’t have compatible GPUs.

The decoding tests were failing, this one for instance (the ?av1 variant). In that test both encoding and decoding are tested, but decoding was failing, for a couple reasons. Rabbit hole starts here. After debugging this for a while, it was clear that the colorspace information was lost between the encoded chunks and the decoded frames. The decoded video frames didn’t have the expected colorimetry values.

The VideoDecoderGStreamer class basically takes encoded chunks and notifies decoded VideoFrameGStreamer objects to the upper layers (JS) in WebCore. A video frame is basically a GstSample (Buffer and Caps) and we have code in place to interpret the colorimetry parameters exposed in the sample caps and translate those to the various WebCore equivalents. So far so good, but the caps set on the dav1ddec element didn’t have that information! I thought the dav1ddec element could be fixed, “shouldn’t be that hard”, and I knew that code because I wrote it in 2018 :)

So let’s fix the GStreamer dav1ddec element. It’s a video decoder written in Rust, relying on the dav1d-rs bindings of the popular C libdav1d library. The dav1ddec element basically feeds encoded chunks of data to dav1d using the dav1d-rs bindings. In return, the bindings provide the decoded frames using a Dav1dPicture Rust structure and the dav1ddec GStreamer element basically makes buffers and caps out of this decoded picture. The dav1d-rs bindings are quite minimal, we implemented API on a per-need basis so far, so it wasn’t very surprising that… colorimetry information for decoded pictures was not exposed! Rabbit hole goes one level deeper.

So let’s add colorimetry API in dav1d-rs. When working on (Rust) bindings of a C library, if you need to expose additional API the answer is quite often in the C headers of the library. Every Dav1dPicture has a Dav1dSequenceHeader, in which we can see a few interesting fields:

typedef struct Dav1dSequenceHeader {
    ...
    enum Dav1dColorPrimaries pri; ///< color primaries (av1)
    enum Dav1dTransferCharacteristics trc; ///< transfer characteristics (av1)
    enum Dav1dMatrixCoefficients mtrx; ///< matrix coefficients (av1)
    enum Dav1dChromaSamplePosition chr; ///< chroma sample position (av1)
    ...
    uint8_t color_range;
    ...
} Dav1dSequenceHeader;

After sharing a naive branch with rust-av co-maintainers Luca Barbato and Sebastian Dröge, I came up with a couple pull-requests that eventually were shipped in version 0.10.3 of dav1d-rs. I won’t deny matching primaries, transfer, matrix and chroma-site enum values to rust-av‘s Pixel enum was a bit challenging :P Anyway, with dav1d-rs fixed up, rabbit hole level goes up one level :)

Now with the needed dav1d-rs API, the GStreamer dav1ddec element could be fixed. Again, matching the various enum values to their GStreamer equivalents was an interesting exercise. The merge request was merged, but to this date it’s not shipped in a stable gst-plugins-rs release yet. There’s one more complication here: the ABI broke between dav1d 1.2 and 1.4. The dav1d-rs 0.10.3 release expects the latter. I’m not sure how we will cope with that in terms of gst-plugins-rs release versioning…

Anyway, WebKit’s runtime environment can be adapted to ship dav1d 1.4 and a development version of the dav1ddec element, which is what was done in this pull request. The rabbit is getting out of its hole.

The WebCodec AV1 tests were finally fixed in WebKit, by this pull request. Beyond colorimetry handling a few more fixes were needed, but luckily those didn’t require any fixes outside of WebKit.

Wrapping up, if you’re still reading this post, I thank you for your patience. Working on inter-connected projects can look a bit daunting at times, but eventually the whole ecosystem benefits from cross-project collaborations like this one. Thanks to Luca and Sebastian for the help and reviews in dav1d-rs and the dav1ddec element. Thanks to my fellow Igalia colleagues for the WebKit reviews.

Sonny Piers: Retro v2

Tue, 16/04/2024 - 1:36 PM

Retro, the customizable clock widget, is now available on Flathub in v2.

This new release comes with:

Support for both 12h and 24h clock formats. It follows the GNOME Date & Time preference while remaining sandboxed, thanks to libportal's new API for the settings portal.

Energy usage has been improved by using a more efficient method to get the time and by making use of the magic GtkWindow.suspended property to stop updating the clock when the window is not visible.

Better support for round clocks. The new GTK renderer fixed the visual glitch on transparent corners caused by large border radius. Retro now restores window dimensions and disables the border radius on maximize to make it look good, no matter the shape.

Controls have been moved to a floating header bar to stay out of the way and prevent interference with customizations.

There are further improvements to make, but I decided to publish early because Retro was using the GNOME 43 runtime, which is end-of-life, and I have limited time to spend on it.

Help welcome https://github.com/sonnyp/Retro/issues

Benjamin Otte: Making GTK graphics offloading work

Sun, 14/04/2024 - 6:37 PM

(I need to put that somewhere because people ask about it and having a little post to explain it is nice.)

What’s it about?
GTK recently introduced the ability to offload graphics rendering, but it needs rather recent everything to work well for offloading video decoding.

So, what do you need to make sure this works?

First, you of course need a video to test. On a modern desktop computer, you want a 4k 60fps video or better to have something that pushes your CPU to the limits so you know when it doesn’t work. Of course, the recommendation has to be Big Buck Bunny at the highest of qualities – be aware that the most excellent 4000×2250 @ 60fps encoding is 850MB. On my Intel TigerLake, that occasionally drops frames when I play that with software decoding, and I can definitely hear the fan turn on.
When selecting a video file, keep in mind that the format matters.

Second, you need hardware decoding. That is provided by libva and can be queried using the vainfo tool (which comes in the `libva-utils` package in Fedora). If that prints a long list of formats (it’s about 40 for me), you’re good. If it doesn’t, you’ll need to go hunt for the drivers – due to the patent madness surrounding video formats that may be more complicated than you wish. For example, on my Intel laptop on Fedora, I need the intel-media-driver package which is hidden in the nonfree RPMFusion repository.
If you look at the list from vainfo, the format names give some hints – usually VP9 and MPEG2 exist. H264 and HEVC aka H265 are the patent madness, and recent GPUs can sometimes do AV1. The Big Buck Bunny video from above is H264, so if you’re following along, make sure that works.

Now you need a working video player. I’ll be using gtk4-demo (which is in the gtk4-devel-tools package, but you already have that installed of course) and its video player example because I know it works there. A shoutout goes out to livi which was the first non-demo video player to have a release that supports graphics offloading. You need GTK 4.14 and GStreamer 1.24 for this to work. At the time of writing, this is only available in Fedora rawhide, but hopefully Fedora 40 will gain the packages soon.

If you installed new packages above, now is a good time to check if GStreamer picked up all the hardware decoders. gst-inspect-1.0 va will list all the elements with libva support. If it didn’t pick up decoders for all the formats it should have (there should be a vah264dec listed for H264 if you want to decode the video above), then the easiest way to get them is to delete GStreamer’s registry cache in ~/.cache/gstreamer-1.0.

If you want to make sure GStreamer does the right thing, you can run the video player with GST_DEBUG=GST_ELEMENT_FACTORY:4. It will print out debug messages about all the elements it is creating for playback. If that includes a line for an element from the previous list (like `vah264dec` in our example) things are working. If it picks something else (like `avdec_h264` or `openh264dec`) then they are not.

Finally you need a compositor that supports YUV formats. Most compositors do – gnome-shell does since version 45 for example – but checking can’t hurt: If wayland-info (in the wayland-utils package in Fedora) lists the NV12 format, you’re good.

And now everything works.
If you have a 2nd monitor you can marvel at what goes on behind the scenes by running the video player with GDK_DEBUG=dmabuf,offload and GTK will tell you what it does for every frame, and you can see it dynamically switching between offloading or not as you fullscreen (or not), click on the controls (or not) and so on. Or you could have used it previously to see why things didn’t work.
You can also look at top and the gputop variant of your choice and you will see that the video player takes a bit of CPU to drive the video decoding engine and inform the compositor about new frames, and the compositor takes a bit of CPU telling the 3D engine to composite things and send them to the monitor. With the video above it’s around 10% CPU usage each on my laptop, and about 20% GPU usage.

And before anyone starts complaining that this is way too complicated: If you read carefully, all of this should work out of the box in the near future. This post just lists the tools to troubleshoot what went wrong while developing a fast video player.

Felix Häcker: Fragments 3.0

Sun, 07/04/2024 - 8:51 PM

It has finally happened! The long awaited major update of Fragments is now available, which includes many exciting new features.

The most important addition is support for torrent files. It is now possible to select the files you want to download from a torrent. The files can be searched and sorted, and individual files can be opened directly from Fragments.

https://blogs.gnome.org/haeckerfelix/files/2024/04/Screencast-from-2024-04-07-19-53-28.webm

Further new features

    • Added torrents can now be searched
    • In addition to magnet links, *.torrent links in the clipboard are now also recognized
    • Prevent system from going to sleep when torrents are active
    • New torrents can be added via drag and drop
    • Automatic trashing of *.torrent files after adding them
    • Stop downloads when a metered network gets detected

Improvements

    • When controlling remote sessions, the local Transmission daemon no longer gets started
    • Torrents are automatically restarted if an incorrect location has been fixed
    • Torrents can now also be added via CLI
    • Clipboard toast notification is no longer displayed multiple times
    • Reduced CPU/resource consumption through adaptive polling interval
    • Improved accessibility of the user interface
    • Modernized user interface through the use of new Adwaita widgets
    • Update from Transmission 3.0.5 to 4.0.5

Thanks to Maximiliano and Tobias for once again helping with this release. As usual this release contains many other improvements, fixes and new translations thanks to all the contributors and upstream projects.

Also a big shoutout to the Transmission project, without which Fragments would not be possible, for their fantastic 4.0 release!

The new Fragments release can be downloaded and installed from Flathub: