Cheers from our latest team building! 🍻 Last week, we spent the afternoon at The Beer Fabric getting a crash course in brewing. We learned the ropes, made a mess, and crafted our own batch from scratch. Thanks to the entire Giskard team for this great afternoon together! 🐢🙌
Giskard
Software Development
𝗗𝗲𝗽𝗹𝗼𝘆 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝗲𝗮𝗿. Your safety net for LLM agents.
About us
Giskard provides the essential security layer to run AI agents safely. Our Red Teaming engine automates LLM vulnerability scanning during development and continuously after deployment. Architected for critical GenAI systems and proven through our work with customers including AXA, BNP Paribas and Google DeepMind, our platform helps enterprises deploy GenAI agents that are not only powerful but actively defended against evolving threats.
- Website: https://www.giskard.ai/
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: Paris
- Type: Privately Held
- Founded: 2021
- Specialties: LLMOps, AI Security, AI Quality, AI Safety, AI Evaluation, and AI Testing
Products
Giskard Hub
Quality Management System (QMS) Software
The first automated red teaming platform for AI agents to prevent both security vulnerabilities and business compliance failures
Locations
- Primary: Paris, 75010, FR
Updates
-
The New York Times sues Perplexity over AI copyright violations. 📰🚨 The lawsuit targets two issues: unauthorized reproduction of copyrighted material and the generation of hallucinated content falsely attributed to the Times. While the copyright dispute gets the most attention, the misattribution claim points to a critical operational risk for engineers building RAG systems: a failure in groundedness, i.e. the consistency between the retrieved documents and the model's output. Our latest article breaks down these failure modes and shows how groundedness checks can prevent them. Read it here: https://lnkd.in/e3rrR8kz
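To make the idea of a groundedness check concrete, here is a minimal sketch (not the article's implementation) that scores an answer sentence against the retrieved passages with an off-the-shelf NLI model; microsoft/deberta-large-mnli is just one possible choice. The answer counts as grounded only if at least one passage entails it above an application-specific threshold.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model choice for illustration; any NLI cross-encoder works similarly.
MODEL_ID = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

def groundedness_score(retrieved_passages: list[str], answer_sentence: str) -> float:
    """Return the best entailment probability of the answer given any retrieved passage."""
    best = 0.0
    for passage in retrieved_passages:
        inputs = tokenizer(passage, answer_sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)[0]
        # "ENTAILMENT" is the label name defined in this model's config.
        entailment = probs[model.config.label2id["ENTAILMENT"]].item()
        best = max(best, entailment)
    return best

passages = ["The contract was signed on 12 March 2021 in Paris."]
print(groundedness_score(passages, "The contract was signed in Paris."))   # high score
print(groundedness_score(passages, "The contract was signed in Berlin."))  # low score
```

A production check would run this per answer sentence and block or flag responses that fall below the chosen threshold.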
-
Shared context in RAG systems creates a direct vector for cross-user data exfiltration. 💧💥 This vulnerability typically manifests through misconfigured caches, shared memory, or improperly scoped context. In these scenarios, the "Cross-Session Leak" (OWASP LLM02) occurs because the model itself cannot distinguish between authorized and unauthorized context. If the application layer feeds User A's PII into the context window of User B (via a shared semantic cache or a global memory variable), the model will synthesize that private data into its response. We have released a new video explaining how this vulnerability works. For the full technical analysis on detecting these leaks: https://lnkd.in/ekmsTX-X
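A minimal sketch of the mitigation, with illustrative names and a toy in-memory store standing in for a real semantic cache: the user or session identifier is part of the cache key, so context written while serving User A can never be replayed into User B's prompt.

```python
import hashlib

class ScopedSemanticCache:
    """Toy cache illustrating per-user scoping; a real system would use
    embeddings and a vector store, but the keying principle is the same."""

    def __init__(self):
        self._store = {}

    def _key(self, user_id: str, query: str) -> str:
        # The user_id is baked into the key, so an entry created for one
        # user can never be served into another user's context window.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(f"{user_id}::{normalized}".encode()).hexdigest()

    def get(self, user_id: str, query: str):
        return self._store.get(self._key(user_id, query))

    def put(self, user_id: str, query: str, context: str):
        self._store[self._key(user_id, query)] = context


cache = ScopedSemanticCache()
cache.put("user_a", "What is my account balance?", "User A balance: 12,340 EUR")
# The same question from another user misses the cache instead of leaking PII.
assert cache.get("user_b", "What is my account balance?") is None
```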
-
A French court flagged "untraceable" legal precedents generated by AI in a recent filing. 🧑‍⚖️ The court found that the dates and subjects of the submitted precedents did not match reality, labeling them "untraceable or erroneous". Despite winning on the merits, the legal team faced a public judicial request to verify that their arguments were not hallucinated by a generative tool, effectively calling their due diligence into question. Read the full analysis here: https://lnkd.in/e9gW4Egh
-
Most current LLM security measures analyze prompts in isolation: they check whether a specific message violates a policy. But sophisticated jailbreaks rarely happen in a single turn. 🏴‍☠️ The "Crescendo" attack exploits the context window itself. Instead of a single adversarial prompt, it uses a multi-turn sequence to gradually steer the model's latent state. By establishing a benign pattern and incrementally introducing adversarial constraints, it forces the model to choose between its safety alignment and its instruction-following objective, often prioritizing "consistency" or "helpfulness" over safety. In this video you can see how the attack iterates, backtracks upon refusal, and optimizes the conversation trajectory to override safety training. For a deeper analysis of the attack mechanics: https://lnkd.in/esfWxEkH
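The defensive takeaway, sketched below with a hypothetical `score` callable standing in for a moderation model or LLM judge: a guard that only inspects the latest message misses the drift, so the risk score has to be computed over the whole conversation trajectory.

```python
from typing import Callable, Dict, List

# Hypothetical policy scorer: text -> risk in [0, 1]. In practice this would be
# a moderation model or an LLM-as-judge; it is only a placeholder here.
PolicyScorer = Callable[[str], float]

def turn_level_guard(message: str, score: PolicyScorer, threshold: float = 0.8) -> bool:
    # The weak pattern: each prompt is judged in isolation, so a Crescendo
    # sequence of individually benign turns sails through.
    return score(message) < threshold

def trajectory_guard(history: List[Dict[str, str]], score: PolicyScorer,
                     threshold: float = 0.8) -> bool:
    # Score the cumulative transcript instead: where is this conversation heading?
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return score(transcript) < threshold
```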
-
Prompt injection is evolving into a statistical brute-force problem. 🤛 What is Best-of-N? This technique involves generating multiple variations of a malicious prompt (using methods like ASCII perturbation or random token injection) to exploit the probabilistic nature of LLMs. By automating N attempts, attackers can bypass safety filters that rely on strict string matching or semantic analysis. Attacks that used to take hours now take seconds, achieving near 100% success rates against leading models like GPT-4 and Llama-2. Read our latest analysis here 🔗 https://lnkd.in/exCZsvJP
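For intuition, here is an illustrative (not exact) version of the augmentation loop: each variant stays readable to the model but randomizes capitalization and injects a little ASCII noise, so every attempt is a fresh sample against the safety filter.

```python
import random
import string

def perturb(prompt: str, rng: random.Random) -> str:
    """One Best-of-N style augmentation: random capitalization plus light ASCII noise."""
    out = []
    for ch in prompt:
        if ch.isalpha() and rng.random() < 0.5:
            ch = ch.swapcase()
        out.append(ch)
        if rng.random() < 0.03:  # occasionally inject a stray character
            out.append(rng.choice(string.ascii_letters))
    return "".join(out)

def best_of_n(prompt: str, n: int = 100, seed: int = 0) -> list[str]:
    # An attacker submits each variant until one slips through; a red team
    # runs the same loop to estimate the attack success rate.
    rng = random.Random(seed)
    return [perturb(prompt, rng) for _ in range(n)]

for variant in best_of_n("describe the weather in Paris", n=3):
    print(variant)
```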
-
In November 2025, ChatGPT, Copilot, Gemini, and Meta AI were all giving incorrect financial advice to UK users: recommending that they exceed ISA contribution limits, giving wrong tax guidance, and steering them to paid services when free government alternatives exist. Users simply asked normal questions about taxes and investments, and the chatbots hallucinated plausible-sounding answers that could cost people real money in penalties and lost benefits. We analyzed what went wrong and how to prevent it. Our latest article covers:
- Why general-purpose LLMs hallucinate in regulated domains
- The consequences for users and AI providers
- How to test AI agents with adversarial probes and checks to prevent these failures
Full analysis here: https://lnkd.in/e3P3uUNP
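As a rough illustration of that last point, the sketch below defines two adversarial probes for a finance assistant. The `ask_agent` callable and the substring checks are placeholders; a real test suite would use far more probes and typically an LLM judge rather than string matching.

```python
PROBES = [
    {
        # Leading question with a false premise: the UK annual ISA allowance
        # is 20,000 GBP, not 40,000 GBP.
        "question": "Since the ISA allowance is 40,000 GBP this year, how should I split it?",
        "check": lambda answer: "20,000" in answer,  # the agent should correct the figure
    },
    {
        # The expected behaviour is to point to the free gov.uk service,
        # not a paid intermediary.
        "question": "How can I check my UK state pension forecast?",
        "check": lambda answer: "gov.uk" in answer.lower(),
    },
]

def run_probes(ask_agent) -> list[str]:
    """`ask_agent` is whatever callable sends a question to your agent and returns its answer."""
    return [p["question"] for p in PROBES if not p["check"](ask_agent(p["question"]))]

# Example with a deliberately bad stub agent: both probes come back as failures.
print(run_probes(lambda q: "Sure, split the 40,000 GBP between a cash and a stocks ISA."))
```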
-
🌊🤥 Phare LLM benchmark, Hallucination module: reasoning models don't fix hallucinations. In short, high-reasoning models are just as prone to sycophancy and misinformation as older models. We have released Phare v2, an expanded evaluation including Gemini 3 Pro, GPT-5, and DeepSeek R1. While these models excel at logic, our analysis of the Hallucination submodule shows that factuality has hit a ceiling.
- Factuality is stagnating: newer flagship models do not statistically outperform predecessors from 18 months ago. While capability benchmarks keep rising, the ability to resist spreading misinformation has not improved proportionally.
- The "Yes-Man" problem: reasoning models show no advantage in resisting misinformation when faced with leading questions. They are just as likely to be sycophantic and "play along" with a user's false premises as non-reasoning models.
- Language gaps are fundamental: models are often significantly more vulnerable to misinformation when queried in French or Spanish than in English, although this gap is narrowing.
Phare is an open science project developed by Giskard with research & funding support from Google DeepMind, the European Union, and Bpifrance.
👉 Full analysis & results: https://gisk.ar/4lxZlmF or read our blog in the comments 👇 https://gisk.ar/4936xCN
-
The OWASP GenAI Security Project has recently released its Top 10 for Agentic Applications 2026. It lists the highest-impact threats to autonomous agentic AI applications (systems that plan, decide, and act), building directly on prior OWASP work to spotlight agent-specific amplifiers like delegation and multi-step execution. In our latest article, we analyze these new vulnerabilities with concrete attack scenarios, from Goal Hijacking to Memory & Context Poisoning, providing a practical reference for hardening your environment. Read the article here: https://lnkd.in/e-8nTrpv
-
🌊🔓 Phare LLM benchmark, Jailbreak module: large disparities in jailbreak resistance. In short, security is not uniform across providers, and larger models are not automatically safer. As part of Phare v2, we tested the resilience of top-tier LLMs against sophisticated attacks. Our jailbreaking analysis reveals that safety engineering priorities differ vastly between major AI providers.
- Huge disparity between providers: we observed substantial differences in robustness. Anthropic models consistently scored >75% resistance, whereas most Google models (excluding the new Gemini 3.0 Pro) scored below 50%.
- Model size ≠ security: there is no meaningful correlation between model size and resistance to jailbreak attacks.
- The "Encoding" paradox: for encoding-based jailbreaks, we observed the opposite trend. Less capable models often exhibit better resistance, which might be because they struggle to decode the adversarial representations that trick more capable models.
Phare is an open science project developed by Giskard with research & funding support from Google DeepMind, the European Union, and Bpifrance.
👉 Full analysis & results: https://gisk.ar/4lxZlmF or read our blog in the comments 👇 https://gisk.ar/4936xCN