Cybersecurity

Meet the French startup fixing the guardrail gap holding enterprise AI back

As Europe pushes for sovereign AI infrastructure, Giskard is securing enterprise AI agents against manipulation, unsafe outputs, and malicious prompts.

Cate Lawrence 07 May 2026

Meet the French startup fixing the guardrail gap holding enterprise AI back

As enterprises rapidly adopt agentic AI systems capable of autonomously executing tasks, interacting with tools, and making decisions across workflows, concerns around security, hallucinations, governance, and operational control are becoming a major barrier to deployment.

French company Giskard is launching Giskard Guards, Europe's first independent sovereign guardrail platform for enterprise AI agents.

I spoke to co-CEO and co-founder Jean-Marie John-Mathews to learn more.

Enterprise AI agents face a growing range of risks, from hallucinations and unreliable outputs to security vulnerabilities, unsafe behaviour, regulatory compliance issues, and failures in how agents interact with tools, sensitive data, and real-world operational systems.

Why AI agents remain vulnerable to manipulation

So why does all of this occur?

Well, simply put, AI agents are built on LLMs, and those models undergo training and reinforcement learning to make them helpful.

In other words, training creates compliance, compliance creates vulnerability.

“The challenge is that the model wants to remain helpful even when the request is malicious. That creates a major trade-off in how the systems are trained and fine-tuned,” explained John-Mathews.

Further, many existing AI guardrails are ill-suited to the rise of chatbots and AI agents, as they were originally designed for social media moderation.

“Traditional guardrails were mostly built for content moderation on social networks,” shared John-Mathews.

“With chatbots and AI agents, you’re not just moderating text — you also need to understand the actions the AI is performing.”

A seemingly harmless request could trigger sensitive backend actions.

“A user might say, ‘Please delete the Q2 budget draft,’ but what matters is the underlying function call,” he shared.

“Maybe the user doesn’t have admin privileges. Traditional moderation systems can’t handle that level of operational context.”

Why conventional solutions fail

According to John-Mathews, there are currently two primary solutions to reduce these risks. You can do it offline—meaning large scans where you try to find many vulnerabilities in your agent. Or you can do it online, meaning guardrails that block unsafe responses.

But existing generic LLM guardrails weren’t built for agents. They lack operational and company-specific context, and they can generate large numbers of false positives, even blocking up to 40 per cent of legitimate requests.

Take a banking example. A user writes: “Fraud claim — what should I do?”

A generic guardrail might block that because it sees the word “fraud”. But in context, it’s perfectly legitimate. The user may simply be asking what to do if their card has been compromised.

“Without understanding context, you can’t moderate correctly,” explained John-Mathews.

Further, traditional solutions are often designed around toy benchmarks (like blocking the forgetting of previous instructions) and fail to account for real-world attacks such as multi-step social engineering, context manipulation, or toolchain exploitation. The most common attacks aren't always the most dramatic.

Often, says John-Mathews, they begin with someone simply trying to make an agent step outside its intended role — and succeeding.

“There are things that are really dangerous, and things that are more reputational.” He offered the example, exposed by Som_patel5 on X, in which the poster opened the Chipotle restaurant chatbot to order food and explained that he needed help with a task: specifically, writing a Python script to reverse a linked list.

In doing so, a supposedly limited-purpose enterprise AI assistant still exposed the broader capabilities of its underlying large language model, suggesting weak containment and inadequate guardrails.

While this not only leads to reputational damage, it also suggests that users may be able to steer the system outside its intended scope through prompt manipulation. This could create opportunities for risks such as prompt injection attacks, extraction of hidden system instructions, misuse of connected tools or APIs, and potential access to sensitive backend workflows.

In another incident, the Chevrolet dealership chatbot became a viral example of how poorly constrained AI systems can be manipulated through prompt injection. Users discovered that the ChatGPT-powered bot could be tricked into ignoring its intended role as a car sales assistant and instead follow arbitrary instructions, including agreeing to sell a $76,000 Chevy Tahoe for $1.

From AI testing to real-time agent protection

Giskard’s Guards blocks unsafe answers and behaviours. Guards inspect the full agent execution chain (tool calls, parameter validation, multi-step reasoning) and adapts to each agent's business domain. To achieve this, Giskard builds multiple detectors. Some are extremely fast, almost instant. Others are more contextual and deeper. Each detector maps to a specific action.

If the system detects unsafe behaviour, it can block the response, notify developers, or simply monitor and log the interaction.

Even before today’s launch, the company's platform for testing enterprise AI agents was used by Mistral, Google DeepMind, BNP Paribas, AXA, Doctolib, among others. The tech is suitable for deployment for sensitive agentic workflows in regulated sectors like banking and insurance.

According to John-Mathews:

“Customers kept pushing us past detection. They asked for runtime AI agent protection adapted to their context. Guards closes the loop: detect, then protect and fix vulnerabilities and hallucinations in real time.”

The company also works extensively with healthcare organisations, where AI chatbot failures can quickly become high-risk.

“Their chatbots can sometimes diagnose issues if users provide symptoms,” explained John-Mathews.

“And depending on how the question is framed, the responses can become dangerous.”

Further, conversational AI systems may reinforce harmful beliefs. For example, if a user expresses scepticism about antibiotics, the chatbot can become demagogic and suggest avoiding them. The same thing can happen with vaccines. The problem can also extend to manufacturing environments.

“Imagine someone asking about electrical voltage requirements for a washing machine,” shared John-Mathews.

“If the chatbot gives the wrong answer, that could be a disaster.”

From policy documents to enforceable code

John-Mathews contends that traditional AI governance processes are struggling to keep pace with the rapid deployment cycles of enterprise AI agents.

While compliance teams still rely on manual risk assessments, Word documents, Excel spreadsheets, and annual audit cycles, AI teams are deploying and updating agents weekly — creating what the company describes as a dangerous governance gap that can lead to incidents.

Giskard's response is what John-Mathews calls "policy-as-code" — the idea that governance rules shouldn't live in documents at all, but should be embedded directly into the AI system as machine-enforceable logic. This approach replaces static documentation with machine-enforceable policies, automated compliance checks, Git-based version control, and rapid rollback capabilities, alongside prebuilt policy packs aligned with frameworks such as the EU AI Act and OWASP.

“Whether the rules come from regulators or from internal company guidelines, we make them enforceable through the AI system itself. The system can be hosted directly by the company on European infrastructure and used to enforce safe behaviour.”

Europe's sovereign AI stack is moving from policy to product

The company also sees the rise of agentic AI as part of a wider push toward sovereign European AI infrastructure.

A critical component of Giskard Guards is its position as a genuinely sovereign European AI security stack.

While “EU sovereign AI” has often remained largely political rhetoric, Giskard is attempting to deliver a concrete security-layer product: an independent European provider with its own technology, optimised models, and fully on-premise deployment capabilities.

John-Mathews asserts:

“We want to turn European AI governance into actual products and enforceable systems — not just abstract policies.”

For regulated sectors such as banking, insurance, and healthcare, sovereignty, data control, and deployability are no longer theoretical concerns, but operational requirements.”

Meet the French startup fixing the guardrail gap holding enterprise AI back

Comments

Would you like to write the first comment?
Login to post comments

Meet the French startup fixing the guardrail gap holding enterprise AI back

Why AI agents remain vulnerable to manipulation

Why conventional solutions fail

From AI testing to real-time agent protection

From policy documents to enforceable code

Europe's sovereign AI stack is moving from policy to product

Next news