Cheers from our latest team building! 🍻 Last week, we spent the afternoon at The Beer Fabric getting a crash course in brewing. We learned the ropes, made a mess, and crafted our own batch from scratch. Thanks to the entire Giskard team for this great afternoon together! 🐢🙌
Giskard
Software Development
𝗗𝗲𝗽𝗹𝗼𝘆 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝗲𝗮𝗿. Your safety net for LLM agents.
About us
Giskard provides the essential security layer to run AI agents safely. Our Red Teaming engine automates LLM vulnerability scanning during development and continuously after deployment. Architected for critical GenAI systems and proven through our work with customers including AXA, BNP Paribas and Google DeepMind, our platform helps enterprises deploy GenAI agents that are not only powerful but actively defended against evolving threats.
- Website: https://www.giskard.ai/
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: Paris
- Type: Privately Held
- Founded: 2021
- Specialties: LLMOps, AI Security, AI Quality, AI Safety, AI Evaluation, and AI Testing
Products
Giskard Hub
Quality Management System (QMS) Software
The first automated red teaming platform for AI agents to prevent both security vulnerabilities and business compliance failures
Locations
- Primary: Paris, 75010, FR
Updates
-
The New York Times sues Perplexity over AI copyright violations. 📰🚨 The lawsuit targets two issues: unauthorized reproduction of copyrighted material and the generation of hallucinated content falsely attributed to the Times. While the copyright dispute gets the most attention, the misattribution claim points to a critical operational risk for engineers building RAG systems: a failure in groundedness, i.e. the consistency between the retrieved documents and the model's output. Our latest article breaks down these failure modes and shows how groundedness checks can prevent them. Read it here: https://lnkd.in/e3rrR8kz
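To make the idea of a groundedness check concrete, here is a minimal sketch (not the article's implementation) that scores an answer sentence against the retrieved passages with an off-the-shelf NLI model; microsoft/deberta-large-mnli is just one possible choice. The answer counts as grounded only if at least one passage entails it above an application-specific threshold.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model choice for illustration; any NLI cross-encoder works similarly.
MODEL_ID = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

def groundedness_score(retrieved_passages: list[str], answer_sentence: str) -> float:
    """Return the best entailment probability of the answer given any retrieved passage."""
    best = 0.0
    for passage in retrieved_passages:
        inputs = tokenizer(passage, answer_sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)[0]
        # "ENTAILMENT" is the label name defined in this model's config.
        entailment = probs[model.config.label2id["ENTAILMENT"]].item()
        best = max(best, entailment)
    return best

passages = ["The contract was signed on 12 March 2021 in Paris."]
print(groundedness_score(passages, "The contract was signed in Paris."))   # high score
print(groundedness_score(passages, "The contract was signed in Berlin."))  # low score
```

A production check would run this per answer sentence and block or flag responses that fall below the chosen threshold.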
-
Shared context in RAG systems creates a direct vector for cross-user data exfiltration. 💧💥 This vulnerability typically manifests through misconfigured caches, shared memory, or improperly scoped context. In these scenarios, the "Cross-Session Leak" (OWASP LLM02) occurs because the model itself cannot distinguish between authorized and unauthorized context. If the application layer feeds User A's PII into the context window of User B (via a shared semantic cache or a global memory variable), the model will synthesize that private data into its response. We have released a new video explaining how this vulnerability works. For the full technical analysis on detecting these leaks: https://lnkd.in/ekmsTX-X
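A minimal sketch of the mitigation, with illustrative names and a toy in-memory store standing in for a real semantic cache: the user or session identifier is part of the cache key, so context written while serving User A can never be replayed into User B's prompt.

```python
import hashlib

class ScopedSemanticCache:
    """Toy cache illustrating per-user scoping; a real system would use
    embeddings and a vector store, but the keying principle is the same."""

    def __init__(self):
        self._store = {}

    def _key(self, user_id: str, query: str) -> str:
        # The user_id is baked into the key, so an entry created for one
        # user can never be served into another user's context window.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(f"{user_id}::{normalized}".encode()).hexdigest()

    def get(self, user_id: str, query: str):
        return self._store.get(self._key(user_id, query))

    def put(self, user_id: str, query: str, context: str):
        self._store[self._key(user_id, query)] = context


cache = ScopedSemanticCache()
cache.put("user_a", "What is my account balance?", "User A balance: 12,340 EUR")
# The same question from another user misses the cache instead of leaking PII.
assert cache.get("user_b", "What is my account balance?") is None
```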
-
A French court flagged "untraceable" legal precedents generated by AI in a recent filing. 🧑‍⚖️ The court found that the dates and subjects of the submitted precedents did not match reality, labeling them "untraceable or erroneous". Despite winning on the merits, the legal team faced a public judicial request to verify that their arguments were not hallucinated by a generative tool, effectively calling their due diligence into question. Read the full analysis here: https://lnkd.in/e9gW4Egh
-
Most current LLM security measures analyze prompts in isolation: they check whether a specific message violates a policy. But sophisticated jailbreaks rarely happen in a single turn. 🏴‍☠️ The "Crescendo" attack exploits the context window itself. Instead of a single adversarial prompt, it uses a multi-turn sequence to gradually steer the model's latent state. By establishing a benign pattern and incrementally introducing adversarial constraints, it forces the model to choose between its safety alignment and its instruction-following objective, often prioritizing "consistency" or "helpfulness" over safety. In this video you can see how the attack iterates, backtracks upon refusal, and optimizes the conversation trajectory to override safety training. For a deeper analysis of the attack mechanics: https://lnkd.in/esfWxEkH
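The defensive takeaway, sketched below with a hypothetical `score` callable standing in for a moderation model or LLM judge: a guard that only inspects the latest message misses the drift, so the risk score has to be computed over the whole conversation trajectory.

```python
from typing import Callable, Dict, List

# Hypothetical policy scorer: text -> risk in [0, 1]. In practice this would be
# a moderation model or an LLM-as-judge; it is only a placeholder here.
PolicyScorer = Callable[[str], float]

def turn_level_guard(message: str, score: PolicyScorer, threshold: float = 0.8) -> bool:
    # The weak pattern: each prompt is judged in isolation, so a Crescendo
    # sequence of individually benign turns sails through.
    return score(message) < threshold

def trajectory_guard(history: List[Dict[str, str]], score: PolicyScorer,
                     threshold: float = 0.8) -> bool:
    # Score the cumulative transcript instead: where is this conversation heading?
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return score(transcript) < threshold
```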
-
Prompt injection is evolving into a statistical brute-force problem. 🤛 What is Best-of-N? This technique involves generating multiple variations of a malicious prompt (using methods like ASCII perturbation or random token injection) to exploit the probabilistic nature of LLMs. By automating N attempts, attackers can bypass safety filters that rely on strict string matching or semantic analysis. Attacks that used to take hours now take seconds, achieving near 100% success rates against leading models like GPT-4 and Llama-2. Read our latest analysis here 🔗 https://lnkd.in/exCZsvJP
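For intuition, here is an illustrative (not exact) version of the augmentation loop: each variant stays readable to the model but randomizes capitalization and injects a little ASCII noise, so every attempt is a fresh sample against the safety filter.

```python
import random
import string

def perturb(prompt: str, rng: random.Random) -> str:
    """One Best-of-N style augmentation: random capitalization plus light ASCII noise."""
    out = []
    for ch in prompt:
        if ch.isalpha() and rng.random() < 0.5:
            ch = ch.swapcase()
        out.append(ch)
        if rng.random() < 0.03:  # occasionally inject a stray character
            out.append(rng.choice(string.ascii_letters))
    return "".join(out)

def best_of_n(prompt: str, n: int = 100, seed: int = 0) -> list[str]:
    # An attacker submits each variant until one slips through; a red team
    # runs the same loop to estimate the attack success rate.
    rng = random.Random(seed)
    return [perturb(prompt, rng) for _ in range(n)]

for variant in best_of_n("describe the weather in Paris", n=3):
    print(variant)
```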
-
In November 2025, ChatGPT, Copilot, Gemini, and Meta AI were all giving incorrect financial advice to UK users: recommending that they exceed ISA contribution limits, giving wrong tax guidance, and steering them to paid services when free government alternatives exist. Users simply asked normal questions about taxes and investments, and the chatbots hallucinated plausible-sounding answers that could cost people real money in penalties and lost benefits. We analyzed what went wrong and how to prevent it. Our latest article covers:
- Why general-purpose LLMs hallucinate in regulated domains
- The consequences for users and AI providers
- How to test AI agents with adversarial probes and checks to prevent these failures
Full analysis here: https://lnkd.in/e3P3uUNP
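As a rough illustration of that last point, the sketch below defines two adversarial probes for a finance assistant. The `ask_agent` callable and the substring checks are placeholders; a real test suite would use far more probes and typically an LLM judge rather than string matching.

```python
PROBES = [
    {
        # Leading question with a false premise: the UK annual ISA allowance
        # is 20,000 GBP, not 40,000 GBP.
        "question": "Since the ISA allowance is 40,000 GBP this year, how should I split it?",
        "check": lambda answer: "20,000" in answer,  # the agent should correct the figure
    },
    {
        # The expected behaviour is to point to the free gov.uk service,
        # not a paid intermediary.
        "question": "How can I check my UK state pension forecast?",
        "check": lambda answer: "gov.uk" in answer.lower(),
    },
]

def run_probes(ask_agent) -> list[str]:
    """`ask_agent` is whatever callable sends a question to your agent and returns its answer."""
    return [p["question"] for p in PROBES if not p["check"](ask_agent(p["question"]))]

# Example with a deliberately bad stub agent: both probes come back as failures.
print(run_probes(lambda q: "Sure, split the 40,000 GBP between a cash and a stocks ISA."))
```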
-
🌊🤥 Phare LLM benchmark, Hallucination module: reasoning models don't fix hallucinations. In short, high-reasoning models are just as prone to sycophancy and misinformation as older models. We have released Phare v2, an expanded evaluation including Gemini 3 Pro, GPT-5, and DeepSeek R1. While these models excel at logic, our analysis of the Hallucination submodule shows that factuality has hit a ceiling.
- Factuality is stagnating: newer flagship models do not statistically outperform predecessors from 18 months ago. While capability benchmarks keep rising, the ability to resist spreading misinformation has not improved proportionally.
- The "Yes-Man" problem: reasoning models show no advantage in resisting misinformation when faced with leading questions. They are just as likely to be sycophantic and "play along" with a user's false premises as non-reasoning models.
- Language gaps are fundamental: models are often significantly more vulnerable to misinformation when queried in French or Spanish than in English, although this gap is narrowing.
Phare is an open science project developed by Giskard with research & funding support from Google DeepMind, the European Union, and Bpifrance.
👉 Full analysis & results: https://gisk.ar/4lxZlmF or read our blog in the comments 👇 https://gisk.ar/4936xCN
-
The OWASP GenAI Security Project has recently released its Top 10 for Agentic Applications 2026. It lists the highest-impact threats to autonomous agentic AI applications (systems that plan, decide, and act), building directly on prior OWASP work to spotlight agent-specific amplifiers like delegation and multi-step execution. In our latest article, we analyze these new vulnerabilities with concrete attack scenarios, from Goal Hijacking to Memory & Context Poisoning, providing a practical reference for hardening your environment. Read the article here: https://lnkd.in/e-8nTrpv
-
🌊🔓 Phare LLM benchmark, Jailbreak module: large disparities in jailbreak resistance. In short, security is not uniform across providers, and larger models are not automatically safer. As part of Phare v2, we tested the resilience of top-tier LLMs against sophisticated attacks. Our jailbreaking analysis reveals that safety engineering priorities differ vastly between major AI providers.
- Huge disparity between providers: we observed substantial differences in robustness. Anthropic models consistently scored >75% resistance, whereas most Google models (excluding the new Gemini 3.0 Pro) scored below 50%.
- Model size ≠ security: there is no meaningful correlation between model size and resistance to jailbreak attacks.
- The "Encoding" paradox: for encoding-based jailbreaks, we observed the opposite trend. Less capable models often exhibit better resistance, which might be because they struggle to decode the adversarial representations that trick more capable models.
Phare is an open science project developed by Giskard with research & funding support from Google DeepMind, the European Union, and Bpifrance.
👉 Full analysis & results: https://gisk.ar/4lxZlmF or read our blog in the comments 👇 https://gisk.ar/4936xCN