Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
21 views15 pages

Report Task2

The document outlines the OWASP Top 10 vulnerabilities related to Large Language Models (LLMs), detailing risks such as prompt injection, insecure output handling, and model theft, along with prevention strategies for each. It also compares Guardrail AI and Nemo Guardrail, tools designed to enhance LLM security, highlighting their methodologies, integration capabilities, and scopes. The document emphasizes the importance of robust security measures and guidelines to mitigate risks associated with LLM applications.

Uploaded by

NIKI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views15 pages

Report Task2

The document outlines the OWASP Top 10 vulnerabilities related to Large Language Models (LLMs), detailing risks such as prompt injection, insecure output handling, and model theft, along with prevention strategies for each. It also compares Guardrail AI and Nemo Guardrail, tools designed to enhance LLM security, highlighting their methodologies, integration capabilities, and scopes. The document emphasizes the importance of robust security measures and guidelines to mitigate risks associated with LLM applications.

Uploaded by

NIKI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 15

Problem statement- Research on how to guard the problem related to

LLM.

OWASP(Open Web Application Security Project)-

● It aims to provide practical, actionable, and concise security guidance to help


professionals navigate the complex and evolving terrain of LLM application
security.
● The group’s goals include exploring how conventional vulnerabilities may pose
different risks or be exploited in novel ways within LLMs and how developers
must adapt traditional remediation strategies for applications utilizing LLMs.

OWASP Top 10 for LLM Applications:


LLM01: Prompt Injection -
This manipulates a large language model (LLM) through crafty inputs, causing
unintended actions by the LLM. Direct injections overwrite system prompts, while
indirect ones manipulate inputs from external sources.

1. Description:
● Direct Prompt Injection: Attackers manipulate system prompts, exploiting
insecure functions and data stores.
● Indirect Prompt Injection: Attackers control external inputs, embedding prompt
injections, leading to manipulated conversations.

2. Prevention and Mitigation:

● Enforce privilege control on LLM access to the backend system.


● Incorporate human approval for privileged actions to mitigate unauthorized
behavior.
● Segregate and denote untrusted content from user prompts.
● Establish trust boundaries and maintain user control over decision-making.
● Periodically monitor LLM input and output for unexpected behavior.

LLM02: Insecure Output Handling -


This vulnerability occurs when an LLM output is accepted without scrutiny, exposing
backend systems. Misuse may lead to severe consequences like XSS, CSRF, SSRF,
privilege escalation, or remote code execution.

1. Description:

● Inadequate validation, sanitization, and handling of LLM-generated outputs


before passing them downstream.
● Similar to providing indirect access to additional functionality through prompt
input.
● Risks include XSS, CSRF, SSRF, privilege escalation, and remote code execution.

2. Prevention and Mitigation:

● Treat LLM outputs with a zero-trust approach, validate and sanitize before use.
● Follow OWASP ASVS guidelines for effective input validation.
● Encode LLM output to mitigate undesired code execution.
LLM03: Training Data Poisoning-

This occurs when LLM training data is tampered, introducing vulnerabilities or biases
that compromise security, effectiveness, or ethical behavior. Sources include Common
Crawl, WebText, OpenWebText, & books.

1. Description:

● Manipulation of pre-training, fine-tuning, or embedding data to introduce


vulnerabilities, backdoors, or biases.
● Risks include compromised security, degraded performance, and reputational
damage.
● Attack vectors include falsified documents, biased inputs, and unverified data
sources

2. Prevention and Mitigation:

1. Verify supply chain of training data, maintain ML-BOM attestations, and verify
model cards.
2. Legitimize data sources and content throughout pre-training, fine-tuning, and
embedding stages.
3. Craft separate models for different use cases to ensure accuracy and granularity.
4. Implement network controls and strict vetting/filtering for training data.
5. Use adversarial robustness techniques and auto poisoning approaches.
6. Test and detect poisoning attacks during training, monitoring, and auditing model
behavior.

LLM04: Model Denial of Service-


Attackers cause resource-heavy operations on LLMs, leading to service degradation or
high costs. The vulnerability is magnified due to the resource-intensive nature of LLMs
and unpredictability of user inputs.

Description:

● Attack methods consume excessive resources, leading to service decline for


users.
● Emergent concern: manipulation of LLM context window, impacting system
responsiveness.
● Context window: critical for LLM understanding and processing text.

Prevention and Mitigation:

1. Implement input validation and sanitization.


2. Cap resource use per request or step.
3. Enforce API rate limits and restrict queued actions.
4. Continuously monitor resource utilization.
5. Set strict input limits based on the context window.
6. Promote awareness and provide guidelines for secure LLM implementation.

LLM05: Supply Chain Vulnerabilities-


LLM application lifecycle can be compromised by vulnerable components or services,
leading to security attacks. Using third-party datasets, pre- trained models, and plugins
can add vulnerabilities.

Description:

● Vulnerabilities in LLM supply chain impact training data, ML models, and


deployment platforms.
● Risks: biased outcomes, security breaches, system failures.
● Includes third-party package vulnerabilities, vulnerable pre-trained models, and
unclear T&Cs.

Prevention and Mitigation:

1. Vet data sources and suppliers, ensure alignment with data protection policies.
2. Use reputable plugins, test against LLM-specific insecure plugin design aspects.
3. Apply OWASP A06:2021 controls for vulnerable components.
4. Maintain up-to-date inventory using SBOMs, implement model and code signing.
5. Use anomaly detection and adversarial robustness tests.
6. Implement sufficient monitoring and patching policies, review supplier security
posture.

LLM06: Sensitive Information Disclosure-


LLMs may inadvertently reveal confidential data in its responses, leading to
unauthorized data access, privacy violations, and security breaches. It's crucial to
implement data sanitization and strict user policies to mitigate this.

Description:

● LLM applications can inadvertently expose sensitive data or proprietary


algorithms.
● Risks: unauthorized access, privacy violations, intellectual property breaches.
● Users should be cautious about inputting sensitive data that might be returned in
LLM output.

Prevention and Mitigation:

1. Employ robust data sanitization techniques to prevent user data from entering
the training model.
2. Implement stringent input validation to filter out potential malicious inputs.
3. Apply the rule of least privilege when enriching the model with sensitive data.
● Avoid training the model on information accessible to higher-privileged
users but displayed to lower-privileged ones.
● Limit access to external data sources and maintain a secure supply chain.

LLM07: Insecure Plugin Design


LLM plugins can have insecure inputs and insufficient access control. This lack of
application control makes them easier to exploit and can result in consequences like
remote code execution.

Description:

● LLM plugins are extensions called automatically during user interactions, often
with no control over execution.
● Lack of validation in plugins can lead to a wide range of undesired behaviors,
including remote code execution.
● Insufficient access controls and blind trust between plugins exacerbate the risk
of harmful consequences.

Prevention and Mitigation:


1. Enforce strict parameterized input with type and range checks, or introduce a
second layer of typed calls for validation.
2. Apply OWASP's ASVS recommendations for input validation and sanitization.
3. Thoroughly inspect and test plugins using SAST, DAST, and IAST in development
pipelines.
4. Minimize impact of insecure input exploitation following OWASP ASVS Access
Control Guidelines.
5. Use appropriate authentication identities like OAuth2 for effective authorization
and access control.
6. Require manual user authorization and confirmation for actions by sensitive
plugins.
7. Apply OWASP Top 10 API Security Risks - 2023 recommendations for REST APIs
typically used by plugins.

LLM08: Excessive Agency


LLM-based systems may undertake actions leading to unintended consequences. The
issue arises from excessive functionality, permissions, or autonomy granted to the
LLM-based systems.

Description:

● LLM-based systems are granted a degree of agency to interface with other


systems and undertake actions based on prompts.
● Excessive Agency vulnerability enables damaging actions in response to
unexpected or ambiguous LLM outputs, often due to excessive functionality,
permissions, or autonomy.
● It differs from Insecure Output Handling, which focuses on insufficient scrutiny of
LLM outputs.

Prevention and Mitigation:

1. Limit available plugins to essential functions only, excluding unnecessary


capabilities.
2. Ensure plugins implement only necessary functions, avoiding excessive features.
3. Avoid open-ended functions; use granular plugins with specific functionalities.
4. Restrict permissions granted to plugins to minimize potential damage.
5. Track user authorization and scope, ensuring actions are context-specific and
minimal.
6. Incorporate human-in-the-loop control for user approval before executing
actions.
7. Implement authorization in downstream systems to validate actions against
security policies.
8. Log and monitor plugin and downstream system activities for identifying
undesirable actions.
9. Implement rate-limiting to reduce the impact of undesirable actions and facilitate
detection.

LLM09: Overreliance
Systems or people overly dependent on LLMs without oversight may face
misinformation, miscommunication, legal issues, and security vulnerabilities due to
incorrect or inappropriate content generated by LLMs.

Description:

Overreliance occurs when an LLM provides erroneous information in an authoritative


manner, leading to security breaches, misinformation, legal issues, and reputational
damage. This vulnerability arises when users or systems trust the LLM's output without
proper oversight or confirmation, especially when it generates inaccurate or unsafe
content.

Prevention and Mitigation Strategies:

1. Monitor and review LLM outputs regularly, using self-consistency or voting


techniques to filter out inconsistent text.
2. Cross-check LLM output with trusted external sources to ensure accuracy and
reliability.
3. Enhance the model with fine-tuning or embeddings to improve output quality,
reducing the likelihood of inaccurate information.
4. Implement automatic validation mechanisms to cross-verify generated output
against known facts or data.
5. Break down complex tasks into manageable subtasks to reduce the chances of
hallucinations and hold agents accountable.
6. Clearly communicate risks and limitations associated with LLMs to users to help
them make informed decisions.
7. Build APIs and user interfaces that encourage responsible and safe use of LLMs,
including content filters and warnings about potential inaccuracies.
8. Establish secure coding practices and guidelines when using LLMs in
development environments to prevent integration of vulnerabilities.
LLM10: Model Theft
This involves unauthorized access, copying, or exfiltration of proprietary LLM models.
The impact includes economic losses, compromised competitive advantage, and
potential access to sensitive information.

Description:

● Model Theft: Unauthorized access and exfiltration of LLM models, which are
valuable intellectual property, leading to economic loss, brand reputation
damage, and unauthorized usage/access to sensitive information.
● Models can be compromised, physically stolen, copied, or their weights and
parameters extracted to create functional equivalents.

Prevention and Mitigation Strategies:

1. Implement strong access controls, authentication mechanisms, and RBAC.


2. Restrict LLM's access to network resources and internal services.
3. Monitor and audit access logs regularly to detect suspicious behavior.
4. Automate deployment with governance workflows to tighten access controls.
5. Mitigate prompt injection techniques and implement adversarial robustness
training.
6. Implement watermarking frameworks into the embedding and detection stages
of LLMs lifecycle.
Guardrail AI and Nemo Guardrail-

Difference between Guardrail AI , Nemo guardrail and OWASP

Nemo Guardrail Guardrail AI OWASP


Methodology Programmable Automated tool for Framework outlining
guardrails for LLMs to vulnerability detection common
ensure safety, security, and mitigation. vulnerabilities and
and topical focus mitigation strategies

Integration Integrates with Can be integrated Not a specific tool, but


LangChain, supports anywhere within an rather a set of
various LLMs existing application, guidelines for securing
supports various LLMs LLM applications.

Scope Comprehensive toolkit Comprehensive toolkit Comprehensive


for LLM security, for LLM security, guidelines for securing
includes content includes content LLM applications
moderation, topic moderation, topic against common
guidance, guidance, vulnerabilities like
hallucination hallucination Supply chain
prevention, response prevention, backdoor vulnerabilities,
shaping, attacks, trojan attacks, sensitive information
prompt tampering, and more disclosure, insecure
data leakage, model plugin design,
inversion attacks excessive agency,
model theft

Development Developed by NVIDIA Developed by Developed by OWASP


Guardrails AI

Guardrail AI-
● Guardrails Hub
Guardrails Hub is a collection of pre-built measures of specific types of risks
(called 'validators').

● Validators-
Validators are basic Guardrails components that are used to validate an aspect of an
LLM workflow. Validators can be used to prevent end-users from seeing the results
of faulty or unsafe LLM responses.
Ex- Guardrails Hub | Guardrails AI
● What is Guardrails?

- Guardrails runs Input/Output Guards in your application that detect, quantify and
mitigate the presence of specific types of risks. To look at the full suite of risks,
check out Guardrails Hub.
- Guardrails help you generate structured data from LLMs.

Guardrail_______
|
|
________________________________
| |
| |
Guard Rail
What is a Guard?

The guard object is the main interface for GuardRails. It is seeded with a RailSpec, and
then used to run the GuardRails AI engine. It is the object that accepts changing
prompts, wraps LLM prompts, and keeps track of call history (Serving as a lightweight
wrapper,)

RAIL Specification- RAIL


is a language-agnostic and human-readable format for
specifying specific rules and corrective actions for LLM outputs.
Link

Nemo- Guardrail:-

● NeMo Guardrails will help ensure smart applications powered by large


language models (LLMs) are accurate, appropriate, on topic and secure.

● NeMo Guardrails enables developers to set up three kinds of boundaries:


1. Topical guardrails prevent apps from veering off into undesired areas.
For example, they keep customer service assistants from answering
questions about the weather. (NOT TO GO OFF-TOPIC)
2. Safety guardrails ensure apps respond with accurate, appropriate
information. They can filter out unwanted language and enforce that
references are made only to credible sources.
3. Security guardrails restrict apps to making connections only to external
third-party applications known to be safe.

● Since NeMo Guardrails is open source, it can work with all the tools that
enterprise app developers use.
For example, it can run on top of LangChain, Zapier
Using NeMo, South Korea’s leading mobile operator built an intelligent assistant
that’s had 8 million conversations with its customers. A research team in Sweden
employed NeMo to create LLMs that can automate text functions for the
country’s hospitals, government and business offices.

● New rules can be created simply by few lines of code- video


Ex-
A Simple chatbot which builds on LLM it’s connected to a knowledge base of
Nvedia’s HR benefits.
While asking a financial question, the LLM might make up an answer or it might
give correct information but since this is a chatbot for HR topics, it shouldn’t
provide this information.

Need to add a Topical Guardrail here


● NeMo Guardrails is built on Colang, a modeling language and runtime developed
by NVIDIA for conversational AI. The goal of Colang is to provide a readable and
extensible interface for users to define or control the behavior of conversational
bots with natural language. Website

References-

https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/O

WASP-Top-10-for-LLMs-2023-v1_1.pdf

https://www.guardrailsai.com/docs/how_to_guides/llm_api_wrappers

https://hub.guardrailsai.com/

https://www.linkedin.com/pulse/large-language-models-ashutosh-kumar-i0udf
https://developer.nvidia.com/blog/nvidia-enables-trustworthy-safe-and-secure-large-
language-model-conversational-systems/?ncid=prsy-552511#cid=dl28_prsy_en-us

https://youtu.be/Hg2KibOvnLM

https://developer.nvidia.com/blog/nvidia-enables-trustworthy-safe-and-secure-large-
language-model-conversational-systems/

You might also like