
Feed aggregator

Colin Walters: LLMs and core software: human driven

Planet GNOME - Wed, 18/03/2026 - 9:17 PM

It’s clear LLMs are one of the biggest changes in technology ever. The rate of progress is astounding: recently, due to a configuration mistake, I accidentally used Claude Sonnet 3.5 (released ~2 years ago) instead of Opus 4.6 for a task, looked at the output, and thought “what is this garbage?”

But now, daily, Opus 4.6 is able to generate reasonable PoC-level Rust code for complex tasks for me. It’s not perfect – it’s a combination of exhausting and exhilarating to find the 10% of absolutely bonkers/broken code that still makes it past subagents.

So yes, I use LLMs every day, but I will be clear: if I could push a button to “un-invent” them I absolutely would, because I think the long-term issues for society at large (not being able to trust any media, and many of the things from Dario’s recent blog, etc.) will outweigh the benefits.

But since we can’t un-invent them, here’s my opinion on how they should be used. As a baseline, I agree with a lot from this doc from Oxide about LLMs. What I want to talk about here are some of the norms and tools that I see as important for LLM use, following principles similar to those.

On framing: there’s “core” software vs “bespoke”. A genuinely new capability is, for example, a nontechnical restaurant owner using an LLM to generate (“vibe code”) a website (hopefully excepting online ordering and payments!). I’m not overly concerned about this.

Whereas “core” software is what organizations/businesses provide and maintain for others. I work for a company (Red Hat) that produces a lot of this. I am sure no one would want to run in production an operating system, cluster filesystem, web browser, monitoring system, etc. that was primarily “vibe coded”.

And while I respect people and groups that are trying to entirely ban LLM use, I don’t think that’s viable for at least my space.

Hence the subject of this blog is my perspective on how LLMs should be used for “core” software: not vibe coding, but using LLMs responsibly and intelligently – and always under human control and review.

Agents should amplify and be controlled by humans

I think most of the industry would agree we can’t give responsibility to LLMs. That means they must be overseen by humans. If they’re overseen by a human, then I think they should be amplifying what that human thinks/does as a baseline – intersected with the constraints of the task of course.

On “amplification”: Everyone using an LLM to generate content should inject their own system prompt (e.g. AGENTS.md) or equivalent. Here’s mine – notice I turn off all the emoji etc. and try hard to tone down bulleted lists, because that’s not my style. This is a truly baseline thing to do.

Now most LLM generated content targeted for core software is still going to need review, but just ensuring that the baseline matches what the human does helps ensure alignment.
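As a purely illustrative sketch (this is an invented fragment, not the actual AGENTS.md linked above), such a personal baseline might look like:

```markdown
# AGENTS.md (hypothetical personal baseline)

## Style
- No emoji in code, commit messages, or review comments.
- Prefer prose paragraphs; use bulleted lists only for truly parallel items.

## Behavior
- Honor the project's existing formatting, lint, and test configuration.
- When unsure about an API or a fact, say so explicitly rather than guessing.
```

The point is less the specific rules than that the agent's default output starts from the human's own style rather than the model's.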

Pull request reviews

Let’s focus on a very classic problem: pull request reviews. Many projects have wired up a flow such that when a PR comes in, it gets reviewed by a model automatically. Many projects and tools pitch this. We use one on some of my projects.

But I want to get away from this because in my experience these reviews are a combination of:

  • Extremely insightful and correct things (there’s some amazing fine-tuning and tool use that must have happened to find some issues pointed out by some of these)
  • Annoying nitpicks that no one cares about (not handling spaces in a filename in a shell script used for tests)
  • Broken stuff like getting confused by things that happened after its training cutoff (e.g. Gemini especially seems to get confused by referencing the current date, and also is unaware of newer Rust features, etc)

In practice, of course, we only want the first category.

How I think it should work:

  • A pull request comes in
  • It gets auto-assigned to a human on the team for review
  • A human contributing to that project is running their own agents (wherever: could be local or in the cloud) using their own configuration (but of course still honoring the project’s default development setup and the project’s AGENTS.md etc)
  • A new containerized/sandboxed agent may be spawned automatically, or perhaps the human needs to click a button to do so – or perhaps the human sees the PR come in and thinks “this one needs a deeper review, didn’t we hit a perf issue with the database before?” and adds that to a prompt for the agent.
  • The agent prepares a draft review that only the human can see.
  • The human reviews/edits the draft PR review, and has the opportunity to remove confabulations, add their own content etc. And to send the agent back to look more closely at some code (i.e. this part can be a loop)
  • When the human is happy they click the “submit review” button.
  • Goal: it is 100% clear what parts are LLM generated vs human generated for the reader.
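The provenance goal in the last step can be met with nothing fancier than labeled sections in the review body. A minimal shell sketch (the section headings are an invented convention, and the `gh pr review` line at the end is just one possible way to submit):

```shell
#!/bin/sh
# Assemble a PR review body that makes LLM vs. human provenance explicit.
# The "## Agent-drafted" / "## Human-written" markers are an illustrative
# convention, not a standard.
compose_review() {
  agent_draft="$1"   # text the agent drafted (already edited by the human)
  human_notes="$2"   # text the human wrote themselves
  printf '## Agent-drafted (human-reviewed)\n\n%s\n\n## Human-written\n\n%s\n' \
    "$agent_draft" "$human_notes"
}

compose_review \
  "The loop in src/db.rs re-reads the schema on every row." \
  "We hit a similar perf issue here before; worth a second look." \
  > review.md

# The human would then submit it explicitly, e.g.:
#   gh pr review 123 --comment --body-file review.md
```

Keeping submission as a separate, manual step preserves the property that nothing reaches the PR without a human pressing the button.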

I wrote this agent skill to try to make this work well, and if you search you can see it in action in a few places, though I haven’t truly tried to scale this up.

I think the above matches the vision of LLMs amplifying humans.

Code Generation

There’s no doubt that LLMs can be amazing code generators, and I use them every day for that. But for any “core” software I work on, I absolutely review all of the output – not just superficially, and changes to core algorithms very closely.

At least in my experience, the reality is that there’s still a percentage of the time when the agent decides to reimplement base64 encoding for no reason, or disables the tests claiming “the environment didn’t support it”, etc.

And to me it’s still a baseline for “core” software to require another human review to merge (per above!) with their own customized LLM assisting them (ideally a different model, etc).

FOSS vs closed

Of course, my position here is biased a bit by working on FOSS – I still very much believe in that, and working in a FOSS context can be quite different than working in a “closed environment” where a company/organization may reasonably want to (and be able to) apply uniform rules across a codebase.

While for sure LLMs allow organizations to create their own Linux kernel filesystems or bespoke Kubernetes forks or virtual machine runtimes or whatever – it’s not clear to me that it is a good idea for most to do so. I think shared (FOSS) infrastructure that is productized by various companies, provided as a service and maintained by human experts in that problem domain still makes sense. And how we develop that matters a lot.

UK Plans To Require Labels On AI-Generated Content

Slashdot - Wed, 18/03/2026 - 9:00 PM
An anonymous reader quotes a report from Reuters: Britain plans to consider requiring labels on AI-generated content to protect consumers from disinformation and deepfakes, the government said on Wednesday, as it outlined other areas of focus to tackle the evolving global challenge. Technology minister Liz Kendall stressed the need to strike the right balance between protecting the creative industries and allowing the AI sector to innovate, saying in a statement that the government would take time to "get this right." The next phase of the government's work on copyright and AI would also look at the harms posed by digital replicas without consent, ways for creators to control their work online, and support for independent creative organizations, she said. [...] Louise Popple, a copyright expert at law firm Taylor Wessing, noted that the government had not ruled out a broad exception that would allow AI developers to train on copyright works. "That's a subtle difference of approach and could be interpreted to mean that everything is still up for grabs," she said. "It feels very much like the hard issues are being kicked down the road by the government." In 2024, Britain proposed easing copyright rules to let developers train models on lawfully accessed material, with creators able to reserve their rights. On Wednesday, Kendall said that having engaged with creatives, AI firms, industry bodies, unions and academics, the government had concluded it "no longer has a preferred option." "We will help creatives control how their work is used. This sits at the heart of our ambition for creatives -- including independent and smaller creative organizations -- to be paid fairly," she said.

Read more of this story at Slashdot.

Meta Is Shutting Down VR Social Platform Horizon Worlds

Slashdot - Wed, 18/03/2026 - 8:00 PM
Meta is shutting down its VR social platform Horizon Worlds, which was once a key piece of the pivot to the metaverse. The company said the app will be taken off the Quest store at the end of March, and fully removed from Quest headsets by June 15. After that date, it will shift to a standalone "mobile-only experience." CNBC reports: The shift for Horizon Worlds, which was once a central part of the company's push into virtual reality, comes weeks after Meta cut over 1,000 employees from Reality Labs, the unit responsible for the metaverse. [...] The social platform has never drawn more than a couple hundred thousand active users a month, CNBC previously reported. The virtual 3D social network where avatars could interact and play games with other users officially launched in late 2021. It operated exclusively on the Quest VR platform until Meta launched a mobile app version in September 2023. The mobile version of Horizon Worlds was built to provide an entry point for users without VR headsets, functioning similarly to Roblox.


SaaS Apocalypse Could Be OpenSource's Greatest Opportunity

Slashdot - Wed, 18/03/2026 - 7:00 PM
Longtime Slashdot reader internet-redstar writes: Nearly a trillion dollars has been wiped from software stocks in 2026, with hedge funds making billions shorting Salesforce, HubSpot, and Atlassian. At FOSDEM 2026, cURL maintainer Daniel Stenberg shut down his bug bounty program after AI-generated slop overwhelmed his team. A new article on HackerNoon argues that most commercial SaaS could inevitably become OpenSource, not out of ideology but economics. The author points to Proxmox replacing VMware at enterprise scale and startups like Holosign replicating DocuSign at $19/month flat as evidence. The catch, the article claims, is that maintainers who refuse to embrace AI tools risk being forked, or simply replicated from scratch, by those who do.


2026 Turing Award Goes To Inventors of Quantum Cryptography

Slashdot - Wed, 18/03/2026 - 6:00 PM
Dave Knott shares a report from the New York Times: On Wednesday, the Association for Computing Machinery, the world's largest society of computing professionals, said Drs. Charles Bennett and Gilles Brassard had won this year's Turing Award for their work on quantum cryptography and related technologies. The Turing Award, which was introduced in 1966, is often called the Nobel Prize of computing, and it includes a $1 million prize, which the two scientists will share. [...] The two met in 1979 while swimming in the Atlantic just off the north shore of Puerto Rico. They were taking a break while attending an academic conference in San Juan. Dr. Bennett swam up to Dr. Brassard and suggested they use quantum mechanics to create a bank note that could never be forged. Collaborating between Montreal and New York, they applied Dr. Bennett's idea to subway tokens rather than bank notes. In a research paper published in 1983, they showed that their quantum subway tokens could never be forged, even if someone managed to steal the subway turnstile housing the elaborate hardware needed to read them. This led to quantum cryptography. After describing their new form of encryption in a research paper published in 1984, they demonstrated the technology with a physical experiment five years later. Called BB84, their system used photons -- particles of light -- to create encryption keys used to lock and unlock digital data. Thanks to the laws of quantum mechanics, the behavior of a photon changes if someone looks at it. This means that if anyone tries to steal the keys, he or she will leave a telltale sign of the attempted theft -- a bit like breaking the seal on an aspirin bottle.


next-20260318: linux-next

Kernel Linux - Wed, 18/03/2026 - 5:49 PM
Version: next-20260318 (linux-next), released 2026-03-18

n8n 1.122.0 Critical RCE Auth Bypass Exploit CVE-2025-68613

LinuxSecurity.com - Wed, 18/03/2026 - 5:29 PM
n8n, affected by CVE-2025-68613, is an open-source automation tool used to connect APIs, databases, and SaaS apps into workflows. It is commonly used to move data between systems, trigger jobs, and tie services together, and in many environments it also holds credentials and access to internal services.

Federal Cyber Experts Called Microsoft's Cloud 'a Pile of Shit', Yet Approved It Anyway

Slashdot - Wed, 18/03/2026 - 5:00 PM
ProPublica reports that federal cybersecurity reviewers had serious, yearslong concerns about Microsoft's GCC High cloud offering, yet they approved it anyway because the product was already deeply embedded across government. As one member of the team put it: "The package is a pile of shit." From the report: In late 2024, the federal government's cybersecurity evaluators rendered a troubling verdict on one of Microsoft's biggest cloud computing offerings. The tech giant's "lack of proper detailed security documentation" left reviewers with a "lack of confidence in assessing the system's overall security posture," according to an internal government report reviewed by ProPublica. For years, reviewers said, Microsoft had tried and failed to fully explain how it protects sensitive information in the cloud as it hops from server to server across the digital terrain. Given that and other unknowns, government experts couldn't vouch for the technology's security. Such judgments would be damning for any company seeking to sell its wares to the U.S. government, but it should have been particularly devastating for Microsoft. The tech giant's products had been at the heart of two major cybersecurity attacks against the U.S. in three years. In one, Russian hackers exploited a weakness to steal sensitive data from a number of federal agencies, including the National Nuclear Security Administration. In the other, Chinese hackers infiltrated the email accounts of a Cabinet member and other senior government officials. The federal government could be further exposed if it couldn't verify the cybersecurity of Microsoft's Government Community Cloud High, a suite of cloud-based services intended to safeguard some of the nation's most sensitive information. 
Yet, in a highly unusual move that still reverberates across Washington, the Federal Risk and Authorization Management Program, or FedRAMP, authorized the product anyway, bestowing what amounts to the federal government's cybersecurity seal of approval. FedRAMP's ruling -- which included a kind of "buyer beware" notice to any federal agency considering GCC High -- helped Microsoft expand a government business empire worth billions of dollars. "BOOM SHAKA LAKA," Richard Wakeman, one of the company's chief security architects, boasted in an online forum, celebrating the milestone with a meme of Leonardo DiCaprio in "The Wolf of Wall Street." It was not the type of outcome that federal policymakers envisioned a decade and a half ago when they embraced the cloud revolution and created FedRAMP to help safeguard the government's cybersecurity. The program's layers of review, which included an assessment by outside experts, were supposed to ensure that service providers like Microsoft could be entrusted with the government's secrets. But ProPublica's investigation -- drawn from internal FedRAMP memos, logs, emails, meeting minutes, and interviews with seven former and current government employees and contractors -- found breakdowns at every juncture of that process. It also found a remarkable deference to Microsoft, even as the company's products and practices were central to two of the most damaging cyberattacks ever carried out against the government.


Apple Can Delist Apps 'With Or Without Cause,' Judge Says In Loss For Musi App

Slashdot - Wed, 18/03/2026 - 4:00 PM
An anonymous reader quotes a report from Ars Technica: Musi, a free music streaming app that had tens of millions of iPhone downloads and garnered plenty of controversy over its method of acquiring music, has lost an attempt to get back on Apple's App Store. A federal judge dismissed Musi's lawsuit against Apple with prejudice and sanctioned Musi's lawyers for "mak[ing] up facts to fill the perceived gaps in Musi's case." Musi built a streaming service without striking its own deals with copyright holders. It did so by playing music from YouTube, writing in its 2024 lawsuit against Apple that "the Musi app plays or displays content based on the user's own interactions with YouTube and enhances the user experience via Musi's proprietary technology." Musi's app displayed its own ads but let users remove them for a one-time fee of $5.99. Musi claimed it complied with YouTube's terms, but Apple removed it from the App Store in September 2024. Musi does not offer an Android app. Musi alleged that Apple delisted its app based on "unsubstantiated" intellectual property claims from YouTube and that Apple violated its own Developer Program License Agreement (DPLA) by delisting the app. Musi was handed a resounding defeat yesterday in two rulings from US District Judge Eumi Lee in the Northern District of California. Lee found that Apple can remove apps "with or without cause," as stipulated in the developer agreement. Lee wrote (PDF): "The plain language of the DPLA governs because it is clear and explicit: Apple may 'cease marketing, offering, and allowing download by end-users of the [Musi app] at any time, with or without cause, by providing notice of termination.' Based on this language, Apple had the right to cease offering the Musi app without cause if Apple provided notice to Musi. The complaint alleges, and Musi does not dispute, that Apple gave Musi the required notice. Therefore, Apple's decision to remove the Musi app from the App Store did not breach the DPLA."

