This is a fascinating explotation of how LLMs fall for prompt injection attacks. It turns out that they learn to recognize the style of text in different role/instruction blocks, and not just the tags.
Their conclusion:
Role tags were a formatting trick that became the security architecture and the cognitive scaffolding of modern LLMs. We’ve shown that this architecture doesn’t survive into the model’s actual representations, and that such role confusion is linked to prompt injection.
Unless LLMs achieve genuine role perception, we think injection defense will remain a perpetual whack-a-mole game. And the continuous nature of role boundaries opens the threat of injections designed to subtly shift LLM states through seemingly innocuous text, legally and at scale.
More generally, roles are quietly one of the most important abstractions in the LLM stack, providing the boundaries meant to separate self from other, thought from communication, instruction from data. They’re human-controlled switches in an otherwise continuous system. We think they deserve a lot more study than they’ve gotten.
Last week, I listened to a fascinating talk by K. Melton on cognitive security, cognitive hacking, and reality pentesting. The slides from the talk are here, but—even better—Menton has a long essay laying out the basic concepts and ideas.
The whole thing is important and well worth reading, and I hesitate to excerpt. Here’s a taste:
The NeuroCompiler is where raw sensory data gets interpreted before you’re consciously aware of it. It decides what things mean, and it does this fast, automatic, and mostly invisible. It’s also where the majority of cognitive exploits actually land, right in this sweet spot between perception and conscious thought.
This is my term for what Daniel Kahneman called System 1 thinking. If the Sensory Interface is the intake port, the NeuroCompiler is what turns that input into “filtered meaning” before the Mind Kernel ever sees it. It takes raw signal (e.g., photons, sound waves, chemical gradients, pressure) and translates it into something actionable based on binary categories like threat or safe, familiar or novel, trustworthy or suspicious.
The speed is both an evolutionary feature and a modern bug. Processing here is fast enough to get you out of the way of a thrown object before you’ve consciously registered it. But “good enough most of the time” means “predictably wrong some of the time….
A critical architectural feature: the NeuroCompiler can route its output directly back to the Sensory Interface and out as behavior, skipping the conscious awareness of the Mind Kernel entirely. Reflex and startle responses use this mechanism, making this bypass pathway enormously useful for survival. Yet it leaves a wide-open backdoor. If the layer that holds access to skepticism and deliberate evaluation can be bypassed completely, a host of exploits become possible that would otherwise fail.
That’s just one of the five levels Melton talks about: sensory interface, neurocompiler, mind kernel, the mesh, and cultural substrate.
Melton’s taxonomy is compelling, and her parallels to IT systems are fascinating. I have long said that a genius idea is one that’s incredibly obvious once you hear it, but one that no one has said before. This is the first time I’ve heard cognition described in this way.
Just as Schneier said: just read the substack post. This framework provides a structure to something that has always been there. In which people and organizations (criminals, magicians, cults, psychologists and so on) already relied upon and exploited to access and change our perception.
Most people know that robots no longer sound like tinny trash cans. They sound like Siri, Alexa, and Gemini. They sound like the voices in labyrinthine customer support phone trees. And even those robot voices are being made obsolete by new AI-generated voices that can mimic every vocal nuance and tic of human speech, down to specific regional accents. And with just a few seconds of audio, AI can now clone someone’s specific voice.
This technology will replace humans in many areas. Automated customer support will save money by cutting staffing at call centers. AI agents will make calls on our behalf, conversing with others in natural language. All of that is happening, and will be commonplace soon.
But there is something fundamentally different about talking with a bot as opposed to a person. A person can be a friend. An AI cannot be a friend, despite how people might treat it or react to it. AI is at best a tool, and at worst a means of manipulation. Humans need to know whether we’re talking with a living, breathing person or a robot with an agenda set by the person who controls it. That’s why robots should sound like robots.
You can’t just label AI-generated speech. It will come in many different forms. So we need a way to recognize AI that works no matter the modality. It needs to work for long or short snippets of audio, even just a second long. It needs to work for any language, and in any cultural context. At the same time, we shouldn’t constrain the underlying system’s sophistication or language complexity.
We have a simple proposal: all talking AIs and robots should use a ring modulator. In the mid-twentieth century, before it was easy to create actual robotic-sounding speech synthetically, ring modulators were used to make actors’ voices sound robotic. Over the last few decades, we have become accustomed to robotic voices, simply because text-to-speech systems were good enough to produce intelligible speech that was not human-like in its sound. Now we can use that same technology to make robotic speech that is indistinguishable from human sound robotic again.
A ring modulator has several advantages: It is computationally simple, can be applied in real-time, does not affect the intelligibility of the voice, and—most importantly—is universally “robotic sounding” because of its historical usage for depicting robots.
Responsible AI companies that provide voice synthesis or AI voice assistants in any form should add a ring modulator of some standard frequency (say, between 30-80 Hz) and of a minimum amplitude (say, 20 percent). That’s it. People will catch on quickly.
Here are a couple of examples you can listen to for examples of what we’re suggesting. The first clip is an AI-generated “podcast” of this article made by Google’s NotebookLM featuring two AI “hosts.” Google’s NotebookLM created the podcast script and audio given only the text of this article. The next two clips feature that same podcast with the AIs’ voices modulated more and less subtly by a ring modulator:
Raw audio sample generated by Google’s NotebookLM
Audio sample with added ring modulator (30 Hz-25%)
Audio sample with added ring modulator (30 Hz-40%)
We were able to generate the audio effect with a 50-line Python script generated by Anthropic’s Claude. One of the most well-known robot voices were those of the Daleks from Doctor Who in the 1960s. Back then robot voices were difficult to synthesize, so the audio was actually an actor’s voice run through a ring modulator. It was set to around 30 Hz, as we did in our example, with different modulation depth (amplitude) depending on how strong the robotic effect is meant to be. Our expectation is that the AI industry will test and converge on a good balance of such parameters and settings, and will use better tools than a 50-line Python script, but this highlights how simple it is to achieve.
Of course there will also be nefarious uses of AI voices. Scams that use voice cloning have been getting easier every year, but they’ve been possible for many years with the right know-how. Just like we’re learning that we can no longer trust images and videos we see because they could easily have been AI-generated, we will all soon learn that someone who sounds like a family member urgently requesting money may just be a scammer using a voice-cloning tool.
We don’t expect scammers to follow our proposal: They’ll find a way no matter what. But that’s always true of security standards, and a rising tide lifts all boats. We think the bulk of the uses will be with popular voice APIs from major companies—and everyone should know that they’re talking with a robot.
This essay was written with Barath Raghavan, and originally appeared in IEEE Spectrum.
While the idea is nice, it needs some improvement. This way of doing it does affect intelligibility when listening at higher speeds. The 20% setting did take away some intelligibility at 1.5x (max that my reader allowed me to go) when the female voice was on - especially when she was speaking fast -, and at 40% it was pretty bad.
Sad news from Ubuntu founder Mark Shuttleworth today: longtime Ubuntu and Debian contributor Steve Langasek has passed away. In a touching post on the Ubuntu Discourse, Mark Shuttleworth shares: “Steve passed away at the dawn of 2025. His time was short but remarkable. He will forever remain an inspiration.” “Judging by the outpouring of feelings this week, he is equally missed and mourned by colleagues and friends across the open source landscape, in particular in Ubuntu and Debian where he was a great mind, mentor and conscience.” As a former Debian and Ubuntu release manager, and a long-term Canonical employee, […]
NIST’s second draft of its “SP 800-63-4“—its digital identify guidelines—finally contains some really good rules about passwords:
The following requirements apply to passwords:
lVerifiers and CSPs SHALL require passwords to be a minimum of eight characters in length and SHOULD require passwords to be a minimum of 15 characters in length.
Verifiers and CSPs SHOULD permit a maximum password length of at least 64 characters.
Verifiers and CSPs SHOULD accept all printing ASCII [RFC20] characters and the space character in passwords.
Verifiers and CSPs SHOULD accept Unicode [ISO/ISC 10646] characters in passwords. Each Unicode code point SHALL be counted as a signgle character when evaluating password length.
Verifiers and CSPs SHALL NOT impose other composition rules (e.g., requiring mixtures of different character types) for passwords.
Verifiers and CSPs SHALL NOT require users to change passwords periodically. However, verifiers SHALL force a change if there is evidence of compromise of the authenticator.
Verifiers and CSPs SHALL NOT permit the subscriber to store a hint that is accessible to an unauthenticated claimant.
Verifiers and CSPs SHALL NOT prompt subscribers to use knowledge-based authentication (KBA) (e.g., “What was the name of your first pet?”) or security questions when choosing passwords.
Verifiers SHALL verify the entire submitted password (i.e., not truncate it).