AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges

Ranjan Sapkota∗‡, Konstantinos I. Roumeliotis†, Manoj Karkee∗‡
∗ Cornell University, Department of Environmental and Biological Engineering, USA
† Department of Informatics and Telecommunications, University of the Peloponnese, 22131 Tripoli, Greece
‡ Corresponding authors: [email protected], [email protected]
arXiv:2505.10468v1 [cs.AI] 15 May 2025

Abstract—This review critically distinguishes between AI Agents and Agentic AI, offering a structured conceptual taxonomy, application mapping, and challenge analysis to clarify their divergent design philosophies and capabilities. We begin by outlining the search strategy and foundational definitions, characterizing AI Agents as modular systems driven by LLMs and LIMs for narrow, task-specific automation. Generative AI is positioned as a precursor, with AI Agents advancing through tool integration, prompt engineering, and reasoning enhancements. In contrast, Agentic AI systems represent a paradigmatic shift marked by multi-agent collaboration, dynamic task decomposition, persistent memory, and orchestrated autonomy. Through a sequential evaluation of architectural evolution, operational mechanisms, interaction styles, and autonomy levels, we present a comparative analysis across both paradigms. Application domains such as customer support, scheduling, and data summarization are contrasted with Agentic AI deployments in research automation, robotic coordination, and medical decision support. We further examine unique challenges in each paradigm, including hallucination, brittleness, emergent behavior, and coordination failure, and propose targeted solutions such as ReAct loops, RAG, orchestration layers, and causal modeling. This work aims to provide a definitive roadmap for developing robust, scalable, and explainable AI-driven systems.

Index Terms—AI Agents, Agentic AI, Autonomy, Reasoning, Context Awareness, Multi-Agent Systems, Conceptual Taxonomy, Vision-Language Model

Fig. 1: Global Google search trends showing rising interest in "AI Agents" and "Agentic AI" since November 2022 (ChatGPT Era).

I. INTRODUCTION

Prior to the widespread adoption of AI agents and agentic AI around 2022 (Before ChatGPT Era), the development of autonomous and intelligent agents was deeply rooted in foundational paradigms of artificial intelligence, particularly multi-agent systems (MAS) and expert systems, which emphasized social action and distributed intelligence [1], [2]. Notably, Castelfranchi [3] laid critical groundwork by introducing ontological categories for social action, structure, and mind, arguing that sociality emerges from individual agents' actions and cognitive processes in a shared environment, with concepts like goal delegation and adoption forming the basis for cooperation and organizational behavior. Similarly, Ferber [4] provided a comprehensive framework for MAS, defining agents as entities with autonomy, perception, and communication capabilities, and highlighting their applications in distributed problem-solving, collective robotics, and synthetic world simulations. These early works established that individual social actions and cognitive architectures are fundamental to modeling collective phenomena, setting the stage for modern AI agents. This paper builds on these insights to explore how social action modeling, as proposed in [3], [4], informs the design of AI agents capable of complex, socially intelligent interactions in dynamic environments.

These systems were designed to perform specific tasks with predefined rules, limited autonomy, and minimal adaptability to dynamic environments. Agent-like systems were primarily reactive or deliberative, relying on symbolic reasoning, rule-based logic, or scripted behaviors rather than the learning-driven, context-aware capabilities of modern AI agents [5], [6]. For instance, expert systems used knowledge bases and inference engines to emulate human decision-making in domains like medical diagnosis (e.g., MYCIN [7]). Reactive agents, such as those in robotics, followed sense-act cycles based on hardcoded rules, as seen in early autonomous vehicles like the Stanford Cart [8]. Multi-agent systems facilitated coordination among distributed entities, exemplified by auction-based resource allocation in supply chain management [9], [10]. Scripted AI in video games, like NPC behaviors in early RPGs, used predefined decision trees [11]. Furthermore, BDI (Belief-Desire-Intention) architectures enabled goal-directed behavior in software agents, such as those in air traffic control simulations [12], [13]. These early systems lacked the generative capacity, self-learning, and environmental adaptability of modern agentic AI, which leverages deep learning, reinforcement learning, and large-scale data [14].

Recent public and academic interest in AI Agents and Agentic AI reflects this broader transition in system capabilities. As illustrated in Figure 1, Google Trends data demonstrates a significant rise in global search interest for both terms following the emergence of large-scale generative models in late 2022. This shift is closely tied to the evolution of agent design from the pre-2022 era, where AI agents operated in constrained, rule-based environments, to the post-ChatGPT period marked by learning-driven, flexible architectures [15]–[17]. These newer systems enable agents to refine their performance over time and interact autonomously with unstructured, dynamic inputs [18]–[20]. For instance, while pre-modern expert systems required manual updates to static knowledge bases, modern agents leverage emergent neural behaviors to generalize across tasks [17]. The rise in trend activity reflects increasing recognition of these differences. Moreover, applications are no longer confined to narrow domains like simulations or logistics, but now extend to open-world settings demanding real-time reasoning and adaptive control. This momentum, as visualized in Figure 1, underscores the significance of recent architectural advances in scaling autonomous agents for real-world deployment.

The release of ChatGPT in November 2022 marked a pivotal inflection point in the development and public perception of artificial intelligence, catalyzing a global surge in adoption, investment, and research activity [21]. In the wake of this breakthrough, the AI landscape underwent a rapid transformation, shifting from the use of standalone LLMs toward more autonomous, task-oriented frameworks [22]. This evolution progressed through two major post-generative phases: AI Agents and Agentic AI. Initially, the widespread success of ChatGPT popularized Generative Agents, which are LLM-based systems designed to produce novel outputs such as text, images, and code from user prompts [23], [24]. These agents were quickly adopted across applications ranging from conversational assistants (e.g., GitHub Copilot [25]) and content-generation platforms (e.g., Jasper [26]) to creative tools (e.g., Midjourney [27]), revolutionizing domains like digital design, marketing, and software prototyping throughout 2023.

Building on this generative foundation, a new class of systems known as AI Agents emerged. These agents enhanced LLMs with capabilities for external tool use, function calling, and sequential reasoning, enabling them to retrieve real-time information and execute multi-step workflows autonomously [28], [29]. Frameworks such as AutoGPT [30] and BabyAGI (https://github.com/yoheinakajima/babyagi) exemplified this transition, showcasing how LLMs could be embedded within feedback loops to dynamically plan, act, and adapt in goal-driven environments [31], [32]. By late 2023, the field had advanced further into the realm of Agentic AI: complex, multi-agent systems in which specialized agents collaboratively decompose goals, communicate, and coordinate toward shared objectives. Architectures such as CrewAI demonstrate how these agentic frameworks can orchestrate decision-making across distributed roles, facilitating intelligent behavior in high-stakes applications including autonomous robotics, logistics management, and adaptive decision support [33]–[36].

As the field progresses from Generative Agents toward increasingly autonomous systems, it becomes critically important to delineate the technological and conceptual boundaries between AI Agents and Agentic AI. While both paradigms build upon LLMs and extend the capabilities of generative systems, they embody fundamentally different architectures, interaction models, and levels of autonomy. AI Agents are typically designed as single-entity systems that perform goal-directed tasks by invoking external tools, applying sequential reasoning, and integrating real-time information to complete well-defined functions [17], [37]. In contrast, Agentic AI systems are composed of multiple, specialized agents that coordinate, communicate, and dynamically allocate sub-tasks within a broader workflow [14], [38]. This architectural distinction underpins profound differences in scalability, adaptability, and application scope.

Understanding and formalizing the taxonomy between these two paradigms (AI Agents and Agentic AI) is scientifically significant for several reasons. First, it enables more precise system design by aligning computational frameworks with problem complexity, ensuring that AI Agents are deployed for modular, tool-assisted tasks, while Agentic AI is reserved for orchestrated multi-agent operations. Moreover, it allows for appropriate benchmarking and evaluation: performance metrics, safety protocols, and resource requirements differ markedly between individual-task agents and distributed agent systems. Additionally, a clear taxonomy reduces development inefficiencies by preventing the misapplication of design principles, such as assuming inter-agent collaboration in a system architected for single-agent execution. Without this clarity, practitioners risk both under-engineering complex scenarios that require agentic coordination and over-engineering simple applications that could be solved with a single AI Agent.

The field of artificial intelligence has seen significant advancements, particularly in the development of AI Agents and Agentic AI. These terms, while related, refer to distinct concepts with different capabilities and applications. This article aims to clarify the differences between AI Agents and Agentic AI, providing researchers with a foundational understanding of these technologies. The objective of this study is to formalize the distinctions, establish a shared vocabulary, and provide a structured taxonomy between AI Agents and Agentic AI that informs the next generation of intelligent agent design across academic and industrial domains, as illustrated in Figure 2.

This review provides a comprehensive conceptual and architectural analysis of the progression from traditional AI Agents to emergent Agentic AI systems. Rather than organizing the study around formal research questions, we adopt a sequential, layered structure that mirrors the historical and technical evolution of these paradigms. Beginning with a detailed description of our search strategy and selection criteria, we first establish the foundational understanding of AI Agents by analyzing their defining attributes, such as autonomy, reactivity, and tool-based execution. We then explore the critical role of foundational models, specifically LLMs and Large Image Models (LIMs), which serve as the core reasoning and perceptual substrates that drive agentic behavior.
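The plan-act-observe feedback loop that frameworks such as AutoGPT and BabyAGI popularized, as discussed earlier in this introduction, can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not any framework's actual implementation: the `scripted_llm` stand-in, the `search_web` placeholder tool, and the `finish` signal are hypothetical names invented for this sketch.

```python
# Minimal sketch of an AutoGPT-style feedback loop: an LLM repeatedly
# chooses a tool call (plan), the tool is executed (act), and the result
# is stored for the next planning step (observe). The LLM is stubbed
# with a scripted function; all tool names here are hypothetical.

def scripted_llm(goal, history):
    """Stand-in for an LLM call: picks the next action from the goal
    and the observations gathered so far."""
    if not history:
        return {"tool": "search_web", "arg": goal}   # gather information first
    return {"tool": "finish", "arg": history[-1]}    # then answer from it

TOOLS = {
    "search_web": lambda q: f"result for '{q}'",     # placeholder tool
}

def run_agent(goal, llm, max_steps=5):
    history = []                                     # observation buffer (working memory)
    for _ in range(max_steps):
        action = llm(goal, history)                  # plan: choose the next action
        if action["tool"] == "finish":               # terminate with an answer
            return action["arg"]
        observation = TOOLS[action["tool"]](action["arg"])  # act: run the tool
        history.append(observation)                  # observe: feed result back
    return None                                      # step budget exhausted
```

The essential difference from a plain generative model is the loop itself: each tool observation re-enters the planning step, so the system can pursue a goal across multiple steps instead of producing a single prompted response.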
Subsequent sections examine how generative AI systems have served as precursors to more dynamic, interactive agents, setting the stage for the emergence of Agentic AI. Through this lens, we trace the conceptual leap from isolated, single-agent systems to orchestrated multi-agent architectures, highlighting their structural distinctions, coordination strategies, and collaborative mechanisms. We further map the architectural evolution by dissecting the core system components of both AI Agents and Agentic AI, offering comparative insights into their planning, memory, orchestration, and execution layers. Building upon this foundation, we review application domains spanning customer support, healthcare, research automation, and robotics, categorizing real-world deployments by system capabilities and coordination complexity. We then assess key challenges faced by both paradigms, including hallucination, limited reasoning depth, causality deficits, scalability issues, and governance risks. To address these limitations, we outline emerging solutions such as retrieval-augmented generation, tool-based reasoning, memory architectures, and simulation-based planning. The review culminates in a forward-looking roadmap that envisions the convergence of modular AI Agents and orchestrated Agentic AI in mission-critical domains. Overall, this paper aims to provide researchers with a structured taxonomy and actionable insights to guide the design, deployment, and evaluation of next-generation agentic systems.

Fig. 2: Mindmap of Research Questions relevant to AI Agents and Agentic AI. Each color-coded branch represents a key dimension of comparison: Architecture, Mechanisms, Scope/Complexity, Interaction, and Autonomy.

A. Methodology Overview

This review adopts a structured, multi-stage methodology designed to capture the evolution, architecture, applications, and limitations of AI Agents and Agentic AI. The process is visually summarized in Figure 3, which delineates the sequential flow of topics explored in this study. The analytical framework was organized to trace the progression from basic agentic constructs rooted in LLMs to advanced multi-agent orchestration systems. Each step of the review was grounded in rigorous literature synthesis across academic sources and AI-powered platforms, enabling a comprehensive understanding of the current landscape and its emerging trajectories.

Fig. 3: Methodology pipeline from foundational AI agents to Agentic AI systems, applications, limitations, and solution strategies. [Pipeline stages: Hybrid Literature Search; Foundational Understanding of AI Agents; LLMs as Core Reasoning Components; Emergence of Agentic AI; Architectural Evolution: Agents → Agentic AI; Applications of AI Agents & Agentic AI; Challenges & Limitations (Agents + Agentic AI); Potential Solutions: RAG, Causal Models, Planning.]

The review begins by establishing a foundational understanding of AI Agents, examining their core definitions, design principles, and architectural modules as described in the literature. These include components such as perception, reasoning, and action selection, along with early applications like customer service bots and retrieval assistants. This foundational layer serves as the conceptual entry point into the broader agentic paradigm.

Next, we delve into the role of LLMs as core reasoning components, emphasizing how pre-trained language models underpin modern AI Agents. This section details how LLMs, through instruction fine-tuning and reinforcement learning from human feedback (RLHF), enable natural language interaction, planning, and limited decision-making capabilities. We also identify their limitations, such as hallucinations, static knowledge, and a lack of causal reasoning.

Building on these foundations, the review proceeds to the emergence of Agentic AI, which represents a significant conceptual leap. Here, we highlight the transformation from tool-augmented single-agent systems to collaborative, distributed ecosystems of interacting agents. This shift is driven by the need for systems capable of decomposing goals, assigning subtasks, coordinating outputs, and adapting dynamically to changing contexts: capabilities that surpass what isolated AI Agents can offer.

The next section examines the architectural evolution from AI Agents to Agentic AI systems, contrasting simple, modular agent designs with complex orchestration frameworks. We describe enhancements such as persistent memory, meta-agent coordination, multi-agent planning loops (e.g., ReAct and Chain-of-Thought prompting), and semantic communication protocols. Comparative architectural analysis is supported with examples from platforms like AutoGPT, CrewAI, and LangGraph.

Following the architectural exploration, the review presents an in-depth analysis of application domains where AI Agents and Agentic AI are being deployed. This includes six key application areas for each paradigm, ranging from knowledge retrieval, email automation, and report summarization for AI Agents, to research assistants, robotic swarms, and strategic business planning for Agentic AI. Use cases are discussed in the context of system complexity, real-time decision-making, and collaborative task execution.

Subsequently, we address the challenges and limitations inherent to both paradigms. For AI Agents, we focus on issues like hallucination, prompt brittleness, limited planning ability, and lack of causal understanding. For Agentic AI, we identify

higher-order challenges such as inter-agent misalignment, error propagation, unpredictability of emergent behavior, explainability deficits, and adversarial vulnerabilities. These problems are critically examined with references to recent experimental studies and technical reports.

Finally, the review outlines potential solutions to overcome these challenges, drawing on recent advances in causal modeling, retrieval-augmented generation (RAG), multi-agent memory frameworks, and robust evaluation pipelines. These strategies are discussed not only as technical fixes but as foundational requirements for scaling agentic systems into high-stakes domains such as healthcare, finance, and autonomous robotics.

Taken together, this methodological structure enables a comprehensive and systematic assessment of the state of AI Agents and Agentic AI. By sequencing the analysis across foundational understanding, model integration, architectural growth, applications, and limitations, the study aims to provide both theoretical clarity and practical guidance to researchers and practitioners navigating this rapidly evolving field.

1) Search Strategy: To construct this review, we implemented a hybrid search methodology combining traditional academic repositories and AI-enhanced literature discovery tools. Specifically, twelve platforms were queried: academic databases such as Google Scholar, IEEE Xplore, ACM Digital Library, Scopus, Web of Science, ScienceDirect, and arXiv; and AI-powered interfaces including ChatGPT, Perplexity.ai, DeepSeek, Hugging Face Search, and Grok. Search queries incorporated Boolean combinations of terms such as "AI Agents," "Agentic AI," "LLM Agents," "Tool-augmented LLMs," and "Multi-Agent AI Systems."

Targeted queries such as "Agentic AI + Coordination + Planning" and "AI Agents + Tool Usage + Reasoning" were employed to retrieve papers addressing both conceptual underpinnings and system-level implementations. Literature inclusion was based on criteria such as novelty, empirical evaluation, architectural contribution, and citation impact. The rising global interest in these technologies, illustrated in Figure 1 using Google Trends data, reinforces the urgency of synthesizing this emerging knowledge space.

II. FOUNDATIONAL UNDERSTANDING OF AI AGENTS

AI Agents are autonomous software entities engineered for goal-directed task execution within bounded digital environments [14], [39]. These agents are defined by their ability to perceive structured or unstructured inputs [40], reason over contextual information [41], [42], and initiate actions toward achieving specific objectives, often acting as surrogates for human users or subsystems [43]. Unlike conventional automation scripts, which follow deterministic workflows, AI agents demonstrate reactive intelligence and limited adaptability, allowing them to interpret dynamic inputs and reconfigure outputs accordingly [44]. Their adoption has been reported across a range of application domains, including customer service automation [45], [46], personal productivity assistance [47], internal information retrieval [48], [49], and decision support systems [50], [51].

1) Overview of Core Characteristics of AI Agents: AI Agents are widely conceptualized as instantiated operational embodiments of artificial intelligence designed to interface with users, software ecosystems, or digital infrastructures in pursuit of goal-directed behavior [52]–[54]. These agents distinguish themselves from general-purpose LLMs by exhibiting structured initialization, bounded autonomy, and persistent task orientation. While LLMs primarily function as reactive prompt followers [55], AI Agents operate within explicitly defined scopes, engaging dynamically with inputs and producing actionable outputs in real-time environments [56].

Fig. 4: Core characteristics of AI Agents (autonomy, task-specificity, and reactivity) illustrated with symbolic representations for agent design and operational behavior.

Figure 4 illustrates the three foundational characteristics that recur across architectural taxonomies and empirical deployments of AI Agents. These include autonomy, task-specificity, and reactivity with adaptation. First, autonomy denotes the agent's ability to act independently post-deployment, minimizing human-in-the-loop dependencies and enabling large-scale, unattended operation [46], [57]. Second, task-specificity encapsulates the design philosophy of AI agents being specialized for narrowly scoped tasks, allowing high-performance optimization within a defined functional domain such as scheduling, querying, or filtering [58], [59]. Third, reactivity refers to an agent's capacity to respond to changes in its environment, including user commands, software states, or API responses; when extended with adaptation, this includes feedback loops and basic learning heuristics [17], [60].

Together, these three traits provide a foundational profile for understanding and evaluating AI Agents across deployment scenarios. The remainder of this section elaborates on each characteristic, offering theoretical grounding and illustrative examples.

• Autonomy: A central feature of AI Agents is their ability to function with minimal or no human intervention after deployment [57]. Once initialized, these agents are capable of perceiving environmental inputs, reasoning over contextual data, and executing predefined or adaptive actions in real-time [17]. Autonomy enables scalable deployment in applications where persistent oversight is impractical, such as customer support bots or scheduling assistants [46], [61].

• Task-Specificity: AI Agents are purpose-built for narrow, well-defined tasks [58], [59]. They are optimized to execute repeatable operations within a fixed domain, such as email filtering [62], [63], database querying [64], or calendar coordination [38], [65]. This task specialization allows for efficiency, interpretability, and high precision in automation tasks where general-purpose reasoning is unnecessary or inefficient.

• Reactivity and Adaptation: AI Agents often include basic mechanisms for interacting with dynamic inputs, allowing them to respond to real-time stimuli such as user requests, external API calls, or state changes in software environments [17], [60]. Some systems integrate rudimentary learning [66] through feedback loops [67], [68], heuristics [69], or updated context buffers to refine behavior over time, particularly in settings like personalized recommendations or conversation flow management [70]–[72].

These core characteristics collectively enable AI Agents to serve as modular, lightweight interfaces between pretrained AI models and domain-specific utility pipelines. Their architectural simplicity and operational efficiency position them as key enablers of scalable automation across enterprise, consumer, and industrial settings. While limited in reasoning depth compared to more general AI systems, their high usability and performance within constrained task boundaries have made them foundational components in contemporary intelligent system design.

2) Foundational Models: The Role of LLMs and LIMs: The foundational progress in AI agents has been significantly accelerated by the development and deployment of LLMs and LIMs, which serve as the core reasoning and perception engines in contemporary agent systems. These models enable AI agents to interact intelligently with their environments, understand multimodal inputs, and perform complex reasoning tasks that go beyond hard-coded automation.

LLMs such as GPT-4 [73] and PaLM [74] are trained on massive datasets of text from books, web content, and dialogue corpora. These models exhibit emergent capabilities in natural language understanding, question answering, summarization, dialogue coherence, and even symbolic reasoning [75], [76]. Within AI agent architectures, LLMs serve as the primary decision-making engine, allowing the agent to parse user queries, plan multi-step solutions, and generate naturalistic responses. For instance, an AI customer support agent powered by GPT-4 can interpret customer complaints, query backend systems via tool integration, and respond in a contextually appropriate and emotionally aware manner [77].

Large Image Models (LIMs) such as CLIP [78] and BLIP-2 [79] extend the agent's capabilities into the visual domain. Trained on image-text pairs, LIMs enable perception-based tasks including image classification, object detection, and vision-language grounding. These capabilities are increasingly vital for agents operating in domains such as robotics [80], autonomous vehicles [81], [82], and visual content moderation [83], [84].

Fig. 5: An AI agent–enabled drone autonomously inspects an orchard, identifying diseased fruits and damaged branches using vision models, and triggers real-time alerts for targeted horticultural interventions.

For example, as illustrated in Figure 5, an autonomous drone agent tasked with inspecting orchards can use a LIM to identify diseased fruits or damaged branches by interpreting live aerial imagery. Upon detection, the system autonomously triggers predefined intervention protocols, such as notifying horticultural staff or marking the location for targeted treatment, without requiring human intervention [17], [57]. This workflow exemplifies the autonomy and reactivity of AI agents in agricultural environments, and recent literature underscores the growing sophistication of such drone-based AI agents. Chitra et al. [85] provide a comprehensive overview of AI algorithms foundational to embodied agents, highlighting the integration of computer vision, SLAM, reinforcement learning, and sensor fusion. These components collectively support real-time perception and adaptive navigation in dynamic environments. Kourav et al. [86] further emphasize the role of natural language processing and large language models in generating drone action plans from human-issued queries, demonstrating how LLMs support naturalistic interaction and mission planning. Similarly, Natarajan et al. [87] explore deep learning and reinforcement learning for scene understanding, spatial mapping, and multi-agent coordination in aerial robotics. These studies converge on the critical importance of AI-driven autonomy, perception, and decision-making in advancing drone-based agents.

Importantly, LLMs and LIMs are often accessed via inference APIs provided by cloud-based platforms such as OpenAI (https://openai.com/), Hugging Face (https://huggingface.co/), and Google Gemini (https://gemini.google.com/app). These services abstract away the complexity of model training and fine-tuning, enabling developers to rapidly build and deploy agents equipped with state-of-the-art reasoning and perceptual abilities. This composability accelerates prototyping and allows agent frameworks like LangChain [88] and AutoGen [89] to orchestrate LLM and LIM outputs across task workflows. In short, foundational models give modern AI agents their basic understanding of language and visuals. Language models help them reason with words, and image models help them understand pictures; working together, they allow AI to make smart decisions in complex situations.

3) Generative AI as a Precursor: A consistent theme in the literature is the positioning of generative AI as the foundational precursor to agentic intelligence. These systems primarily operate on pretrained LLMs and LIMs, which are optimized to synthesize novel content (text, images, audio, or code) based on input prompts. While highly expressive, generative models fundamentally exhibit reactive behavior: they produce output only when explicitly prompted and do not pursue goals autonomously or engage in self-initiated reasoning [90], [91].

Key Characteristics of Generative AI:

• Reactivity: As non-autonomous systems, generative models are exclusively input-driven [92], [93]. Their operations are triggered by user-specified prompts, and they lack internal states, persistent memory, or goal-following mechanisms [94]–[96].

• Multimodal Capability: Modern generative systems can produce a diverse array of outputs, including coherent narratives, executable code, realistic images, and even speech transcripts. For instance, models like GPT-4 [73], PaLM-E [97], and BLIP-2 [79] exemplify this capacity, enabling language-to-image, image-to-text, and cross-modal synthesis tasks.

• Prompt Dependency and Statelessness: Generative systems are stateless in that they do not retain context across interactions unless explicitly provided [98], [99]. Their design lacks intrinsic feedback loops [100], state management [101], [102], and multi-step planning, all requirements for autonomous decision-making and iterative goal refinement [103], [104].

Despite their remarkable generative fidelity, these systems are constrained by their inability to act upon the environment or manipulate digital tools independently. For instance, they cannot search the internet, parse real-time data, or interact with APIs without human-engineered wrappers or scaffolding layers. As such, they fall short of being classified as true AI Agents, whose architectures integrate perception, decision-making, and external tool use within closed feedback loops.

The limitations of generative AI in handling dynamic tasks, maintaining state continuity, or executing multi-step plans led to the development of tool-augmented systems, commonly referred to as AI Agents [105]. These systems build upon the language processing backbone of LLMs but introduce additional infrastructure such as memory buffers, tool-calling APIs, reasoning chains, and planning routines to bridge the gap between passive response generation and active task completion. This architectural evolution marks a critical shift in AI system design: from content creation to autonomous utility [106], [107]. The trajectory from generative systems to AI agents underscores a progressive layering of functionality that ultimately supports the emergence of agentic behaviors.

A. Language Models as the Engine for AI Agent Progression

The emergence of AI Agents as a transformative paradigm in artificial intelligence is closely tied to the evolution and repurposing of large-scale language models such as GPT-3 [108], Llama [109], T5 [110], Baichuan 2 [111], and GPT3mix [112]. A substantial and growing body of research confirms that the leap from reactive generative models to autonomous,

by combining reasoning (Chain-of-Thought prompting) and action (tool use), with LLMs alternating between internal cognition and external environment interaction.
goal-directed agents is driven by the integration of LLMs
3) Illustrative Examples and Emerging Capabilities: Tool-
as core reasoning engines within dynamic agentic systems.
augmented LLM agents have demonstrated capabilities across
These models, originally trained for natural language pro-
a range of applications. In AutoGPT [30], the agent may
cessing tasks, are increasingly embedded in frameworks that
plan a product market analysis by sequentially querying the
require adaptive planning [113], [114], real-time decision-
web, compiling competitor data, summarizing insights, and
making [115], [116], and environment-aware behavior [117].
generating a report. In a coding context, tools like GPT-
1) LLMs as Core Reasoning Components:
Engineer combine LLM-driven design with local code exe-
LLMs such as GPT-4 [73], PaLM [74], Claude
cution environments to iteratively develop software artifacts
https://www.anthropic.com/news/claude-3-5-sonnet, and
[127], [128]. In research domains, systems like Paper-QA
LLaMA [109] are pre-trained on massive text corpora using
[129] utilize LLMs to query vectorized academic databases,
self-supervised objectives and fine-tuned using techniques
grounding answers in retrieved scientific literature to ensure
such as Supervised Fine-Tuning (SFT) and Reinforcement
factual integrity.
Learning from Human Feedback (RLHF) [118], [119]. These
models encode rich statistical and semantic knowledge, These capabilities have opened pathways for more robust
allowing them to perform tasks like inference, summarization, behavior of AI agents such as long-horizon planning, cross-
code generation, and dialogue management. In agentic tool coordination, and adaptive learning loops. Nevertheless,
contexts, however, their capabilities are repurposed not the inclusion of tools also introduces new challenges in or-
merely to generate responses, but to serve as cognitive chestration complexity, error propagation, and context window
substrates interpreting user goals, generating action plans, limitations all active areas of research.The progression toward
selecting tools, and managing multi-turn workflows. AI Agents is inseparable from the strategic integration of
Recent work identifies these models as central LLMs as reasoning engines and their augmentation through
to the architecture of contemporary agentic structured tool use. This synergy transforms static language
systems. For instance, AutoGPT [30] and BabyAGI models into dynamic cognitive entities capable of perceiving,
https://github.com/yoheinakajima/babyagi use GPT-4 as planning, acting, and adapting setting the stage for multi-agent
both a planner and executor: the model analyzes high-level collaboration, persistent memory, and scalable autonomy.
objectives, decomposes them into actionable subtasks, invokes Figure 6 illustrates a representative case: a news query agent
external APIs as needed, and monitors progress to determine that performs real-time web search, summarizes retrieved
subsequent actions. In such systems, the LLM operates in a documents, and generates an articulate, context-aware answer.
loop of prompt processing, state updating, and feedback-based Such workflows have been demonstrated in implementations
correction, closely emulating autonomous decision-making. using LangChain, AutoGPT, and OpenAI function-calling
2) Tool-Augmented AI Agents: Enhancing Functionality: paradigms.
To overcome limitations inherent to generative-only systems
such as hallucination, static knowledge cutoffs, and restricted
interaction scopes, researchers have proposed the concept of
tool-augmented LLM agents [120] such as Easytool [121],
Gentopia [122], and ToolFive [123]. These systems integrate
external tools, APIs, and computation platforms into the
agent’s reasoning pipeline, allowing for real-time information
access, code execution, and interaction with dynamic data
environments.
Tool Invocation. When an agent identifies a need that
cannot be addressed through its internal knowledge such as
querying a current stock price, retrieving up-to-date weather
information, or executing a script, it generates a structured
function call or API request [124], [125]. These calls are
typically formatted in JSON, SQL, or Python, depending on
the target service, and routed through an orchestration layer
that executes the task.
Result Integration. Once a response is received from the Fig. 6: Workflow of an AI Agent performing real-time news
tool, the output is parsed and reincorporated into the LLM’s search, summarization, and answer generation, as commonly
context window. This enables the agent to synthesize new described in the literature (e.g., Author, Year).
reasoning paths, update its task status, and decide on the next
step. The ReAct framework [126] exemplifies this architecture
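The tool-invocation and result-integration cycle described above can be sketched as a minimal ReAct-style loop. This is an illustrative sketch only: the model call is stubbed with a scripted policy, and the `weather` tool is a hypothetical stand-in for a real API wrapper.

```python
# Minimal sketch of a ReAct-style "Tool Invocation -> Result Integration" loop.
# The LLM is stubbed with a scripted policy; tool names are hypothetical.

def scripted_llm(context: str) -> str:
    """Stand-in for an LLM call: decide the next step from the context."""
    if "Observation:" not in context:
        return 'Action: weather("Paris")'          # need external data first
    return "Final Answer: It is 18C in Paris."     # enough context to answer

TOOLS = {
    "weather": lambda city: f"18C and clear in {city}",  # fake API wrapper
}

def run_agent(task: str, max_steps: int = 5) -> str:
    context = f"Task: {task}"
    for _ in range(max_steps):
        step = scripted_llm(context)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Tool Invocation: parse the structured call, e.g. weather("Paris")
        name, arg = step.removeprefix("Action: ").split("(", 1)
        arg = arg.rstrip(")").strip('"')
        observation = TOOLS[name](arg)
        # Result Integration: feed the tool output back into the context
        context += f"\n{step}\nObservation: {observation}"
    return "Gave up."

print(run_agent("What is the weather in Paris?"))
```

Production frameworks replace `scripted_llm` with an actual model call and format the action as structured JSON rather than free text, but the loop structure, generate, parse, invoke, reincorporate, is the same.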
III. THE EMERGENCE OF AGENTIC AI FROM AI AGENT FOUNDATIONS

While AI Agents represent a significant leap in artificial intelligence capabilities, particularly in automating narrow tasks through tool-augmented reasoning, recent literature identifies notable limitations that constrain their scalability in complex, multi-step, or cooperative scenarios [130]–[132]. These constraints have catalyzed the development of a more advanced paradigm: Agentic AI. This emerging class of systems extends the capabilities of traditional agents by enabling multiple intelligent entities to collaboratively pursue goals through structured communication [133]–[135], shared memory [136], [137], and dynamic role assignment [14].

1) Conceptual Leap: From Isolated Tasks to Coordinated Systems: AI Agents, as explored in prior sections, integrate LLMs with external tools and APIs to execute narrowly scoped operations such as responding to customer queries, performing document retrieval, or managing schedules. However, as use cases increasingly demand context retention, task interdependence, and adaptability across dynamic environments, the single-agent model proves insufficient [138], [139].

Agentic AI systems represent an emergent class of intelligent architectures in which multiple specialized agents collaborate to achieve complex, high-level objectives. As defined in recent frameworks, these systems are composed of modular agents, each tasked with a distinct subcomponent of a broader goal and coordinated through either a centralized orchestrator or a decentralized protocol [16], [134]. This structure signifies a conceptual departure from the atomic, reactive behaviors typically observed in single-agent architectures, toward a form of system-level intelligence characterized by dynamic inter-agent collaboration.

A key enabler of this paradigm is goal decomposition, wherein a user-specified objective is automatically parsed and divided into smaller, manageable tasks by planning agents [38]. These subtasks are then distributed across the agent network. Multi-step reasoning and planning mechanisms facilitate the dynamic sequencing of these subtasks, allowing the system to adapt in real time to environmental shifts or partial task failures. This ensures robust task execution even under uncertainty [14].

Inter-agent communication is mediated through distributed communication channels, such as asynchronous messaging queues, shared memory buffers, or intermediate output exchanges, enabling coordination without necessitating continuous central oversight [14], [140]. Furthermore, reflective reasoning and memory systems allow agents to store context across multiple interactions, evaluate past decisions, and iteratively refine their strategies [141]. Collectively, these capabilities enable Agentic AI systems to exhibit flexible, adaptive, and collaborative intelligence that exceeds the operational limits of individual agents.

A widely accepted conceptual illustration in the literature delineates the distinction between AI Agents and Agentic AI through the analogy of smart home systems. As depicted in Figure 7, the left side represents a traditional AI Agent in the form of a smart thermostat. This standalone agent receives a user-defined temperature setting and autonomously controls the heating or cooling system to maintain the target temperature. While it demonstrates limited autonomy, such as learning user schedules or reducing energy usage during absence, it operates in isolation, executing a singular, well-defined task without engaging in broader environmental coordination or goal inference [17], [57].

In contrast, the right side of Figure 7 illustrates an Agentic AI system embedded in a comprehensive smart home ecosystem. Here, multiple specialized agents interact synergistically to manage diverse aspects such as weather forecasting, daily scheduling, energy pricing optimization, security monitoring, and backup power activation. These agents are not just reactive modules; they communicate dynamically, share memory states, and collaboratively align actions toward a high-level system goal (e.g., optimizing comfort, safety, and energy efficiency in real time). For instance, a weather forecast agent might signal upcoming heatwaves, prompting early pre-cooling via solar energy before peak pricing hours, as coordinated by an energy management agent. Simultaneously, the system might delay high-energy tasks or activate surveillance systems during occupant absence, integrating decisions across domains. This figure embodies the architectural and functional leap from task-specific automation to adaptive, orchestrated intelligence. The AI Agent acts as a deterministic component with limited scope, while Agentic AI reflects distributed intelligence characterized by goal decomposition, inter-agent communication, and contextual adaptation, hallmarks of modern agentic AI frameworks.

Fig. 7: Comparative illustration of AI Agent vs. Agentic AI, synthesizing conceptual distinctions found in the literature (e.g., Author, Year). Left: A single-task AI Agent. Right: A multi-agent, collaborative Agentic AI system.

2) Key Differentiators between AI Agents and Agentic AI: To systematically capture the evolution from Generative AI to AI Agents and further to Agentic AI, we structure our comparative analysis around a foundational taxonomy where Generative AI serves as the baseline. While AI Agents and Agentic AI represent increasingly autonomous and interactive systems, both paradigms are fundamentally grounded in generative architectures, especially LLMs and LIMs. Consequently, each comparative table in this subsection includes Generative AI as a reference column to highlight how agentic behavior diverges from and builds upon generative foundations.

A set of fundamental distinctions between AI Agents and Agentic AI, particularly in terms of scope, autonomy, architectural composition, coordination strategy, and operational complexity, are synthesized in Table I, derived from close analysis of prominent frameworks such as AutoGen [89] and ChatDev [142]. These comparisons provide a multi-dimensional view of how single-agent systems transition into coordinated, multi-agent ecosystems. Through the lens of generative capabilities, we trace the increasing sophistication in planning, communication, and adaptation that characterizes the shift toward Agentic AI.

While Table I delineates the foundational and operational differences between AI Agents and Agentic AI, a more granular taxonomy is required to understand how these paradigms
emerge from and relate to broader generative frameworks. Specifically, the conceptual and cognitive progression from static Generative AI systems to tool-augmented AI Agents, and further to collaborative Agentic AI ecosystems, necessitates an integrated comparative framework. This transition is not merely structural but also functional, encompassing how initiation mechanisms, memory use, learning capacities, and orchestration strategies evolve across the agentic spectrum. Moreover, recent studies suggest the emergence of hybrid paradigms such as "Generative Agents," which blend generative modeling with modular task specialization, further complicating the agentic landscape. In order to capture these nuanced relationships, Table II synthesizes the key conceptual and cognitive dimensions across four archetypes: Generative AI, AI Agents, Agentic AI, and inferred Generative Agents. By positioning Generative AI as a baseline technology, this taxonomy highlights the scientific continuum that spans from passive content generation to interactive task execution and finally to autonomous, multi-agent orchestration. This multi-tiered lens is critical for understanding both the current capabilities and future trajectories of agentic intelligence across applied and theoretical domains.

To further operationalize the distinctions outlined in Table I, Tables III and IV extend the comparative lens to encompass a broader spectrum of agent paradigms including AI Agents, Agentic AI, and emerging Generative Agents. Table III presents key architectural and behavioral attributes that highlight how each paradigm differs in terms of primary capabilities, planning scope, interaction style, learning dynamics, and evaluation criteria. AI Agents are optimized for discrete task execution with limited planning horizons and rely on supervised or rule-based learning mechanisms. In contrast, Agentic AI systems extend this capacity through multi-step planning, meta-learning, and inter-agent communication, positioning them for use in complex environments requiring autonomous goal setting and coordination. Generative Agents, as a more recent construct, inherit LLM-centric pretraining capabilities and excel in producing multimodal content creatively, yet they lack the proactive orchestration and state-persistent behaviors seen in Agentic AI systems.

The second table (Table IV) provides a process-driven comparison across three agent categories: Generative AI, AI Agents, and Agentic AI. This framing emphasizes how functional pipelines evolve from prompt-driven single-model inference in Generative AI, to tool-augmented execution in AI Agents, and finally to orchestrated agent networks in Agentic AI. The structure column underscores this progression: from single LLMs to integrated toolchains and ultimately to distributed multi-agent systems. Access to external data, a key operational requirement for real-world utility, also increases
in sophistication, from absent or optional in Generative AI to modular and coordinated in Agentic AI. Collectively, these comparative views reinforce that the evolution from generative to agentic paradigms is marked not just by increasing system complexity but also by deeper integration of autonomy, memory, and decision-making across multiple levels of abstraction.

Furthermore, to provide a deeper multi-dimensional understanding of the evolving agentic landscape, Tables V through IX extend the comparative taxonomy to dissect five critical dimensions: core function and goal alignment, architectural composition, operational mechanism, scope and complexity, and interaction-autonomy dynamics. These dimensions serve not only to reinforce the structural differences between Generative AI, AI Agents, and Agentic AI, but also to introduce an emergent category, Generative Agents, representing modular agents designed for embedded subtask-level generation within broader workflows. Table V situates the three paradigms in terms of their overarching goals and functional intent: while Generative AI centers on prompt-driven content generation, AI Agents emphasize tool-based task execution, and Agentic AI systems orchestrate full-fledged workflows. This functional expansion is mirrored architecturally in Table VI, where the system design transitions from single-model reliance (in Generative AI) to multi-agent orchestration and shared memory utilization in Agentic AI. Table VII then outlines how these paradigms differ in their workflow execution pathways, highlighting the rise of inter-agent coordination and hierarchical communication as key drivers of agentic behavior.

Furthermore, Table VIII explores the increasing scope and operational complexity handled by these systems, ranging from isolated content generation to adaptive, multi-agent collaboration in dynamic environments. Finally, Table IX synthesizes the varying degrees of autonomy, interaction style, and decision-making granularity across the paradigms. These tables collectively establish a rigorous framework to classify and analyze agent-based AI systems, laying the groundwork for principled evaluation and future design of autonomous, intelligent, and collaborative agents operating at scale.

Each of the comparative tables presented from Table V through Table IX offers a layered analytical lens to isolate the distinguishing attributes of Generative AI, AI Agents, and Agentic AI, thereby grounding the conceptual taxonomy in concrete operational and architectural features. Table V, for instance, addresses the most fundamental layer of differentiation: core function and system goal. While Generative AI is narrowly focused on reactive content production conditioned on user prompts, AI Agents are characterized by their ability to perform targeted tasks using external tools. Agentic AI, by contrast, is defined by its ability to pursue high-level goals through the orchestration of multiple subagents, each addressing a component of a broader workflow. This shift from output generation to workflow execution marks a critical inflection point in the evolution of autonomous systems.

In Table VI, the architectural distinctions are made explicit, especially in terms of system composition and control logic. Generative AI relies on a single model with no built-in capability for tool use or delegation, whereas AI Agents combine language models with auxiliary APIs and interface mechanisms to augment functionality. Agentic AI extends this further by introducing multi-agent systems where collaboration, memory persistence, and orchestration protocols are central to the system's operation. This expansion is crucial for enabling intelligent delegation, context preservation, and dynamic role assignment, capabilities absent in both generative and single-agent systems. Likewise, Table VII dives deeper into how these systems function operationally, emphasizing differences in execution logic and information flow. Unlike Generative AI's linear pipeline (prompt → output), AI Agents implement procedural mechanisms to incorporate tool responses mid-process. Agentic AI introduces recursive task reallocation and cross-agent messaging, thus facilitating emergent decision-making that cannot be captured by static LLM outputs alone. Table VIII further reinforces these distinctions by mapping each system's capacity to handle task diversity, temporal scale, and operational robustness. Here, Agentic AI emerges as uniquely capable of supporting high-complexity goals that demand adaptive, multi-phase reasoning and execution strategies.

Furthermore, Table IX brings into sharp relief the operational and behavioral distinctions across Generative AI, AI Agents, and Agentic AI, with a particular focus on autonomy levels, interaction styles, and inter-agent coordination. Generative AI systems, typified by models such as GPT-3 [108] and DALL·E https://openai.com/index/dall-e-3/, remain reactive, generating content solely in response to prompts without maintaining persistent state or engaging in iterative reasoning. In contrast, AI Agents, such as those constructed with LangChain [88] or MetaGPT [143], exhibit a higher degree of autonomy, capable of initiating external tool invocations and adapting behaviors within bounded tasks. However,

TABLE I: Key Differences Between AI Agents and Agentic AI
Feature | AI Agents | Agentic AI
Definition | Autonomous software programs that perform specific tasks. | Systems of multiple AI agents collaborating to achieve complex goals.
Autonomy Level | High autonomy within specific tasks. | Higher autonomy with the ability to manage multi-step, complex tasks.
Task Complexity | Typically handle single, specific tasks. | Handle complex, multi-step tasks requiring coordination.
Collaboration | Operate independently. | Involve multi-agent collaboration and information sharing.
Learning and Adaptation | Learn and adapt within their specific domain. | Learn and adapt across a wider range of tasks and environments.
Applications | Customer service chatbots, virtual assistants, automated workflows. | Supply chain management, business process optimization, virtual project managers.
TABLE II: Taxonomy Summary of AI Agent Paradigms: Conceptual and Cognitive Dimensions
Conceptual Dimension | Generative AI | AI Agent | Agentic AI | Generative Agent (Inferred)
Initiation Type | Prompt-triggered by user or input | Prompt or goal-triggered with tool use | Goal-initiated or orchestrated task | Prompt or system-level trigger
Goal Flexibility | (None) fixed per prompt | (Low) executes specific goal | (High) decomposes and adapts goals | (Low) guided by subtask goal
Temporal Continuity | Stateless, single-session output | Short-term continuity within task | Persistent across workflow stages | Context-limited to subtask
Learning/Adaptation | Static (pretrained) | (Might in future) Tool selection strategies may evolve | (Yes) Learns from outcomes | Typically static; limited adaptation
Memory Use | No memory or short context window | Optional memory or tool cache | Shared episodic/task memory | Subtask-local or contextual memory
Coordination Strategy | None (single-step process) | Isolated task execution | Hierarchical or decentralized coordination | Receives instructions from system
System Role | Content generator | Tool-using task executor | Collaborative workflow orchestrator | Subtask-level modular generator

their autonomy is typically confined to isolated task execution, lacking long-term state continuity or collaborative interaction. Agentic AI systems mark a significant departure from these paradigms by introducing internal orchestration mechanisms and multi-agent collaboration frameworks. For example, platforms like AutoGen [89] and ChatDev [142] exemplify agentic coordination through task decomposition, role assignment, and recursive feedback loops. In AutoGen, one agent might serve as a planner while another retrieves information and a third synthesizes a report, each communicating through shared memory buffers and governed by an orchestrator agent that monitors dependencies and overall task progression. This structured coordination allows for more complex goal pursuit and flexible behavior in dynamic environments. Such architectures fundamentally shift the locus of intelligence from single-model outputs to emergent system-level behavior, wherein agents learn, negotiate, and update decisions based on evolving task states. Thus, the comparative taxonomy not only highlights increasing levels of operational independence but also illustrates how Agentic AI introduces novel paradigms of communication, memory integration, and decentralized control, paving the way for the next generation of autonomous systems with scalable, adaptive intelligence.

TABLE III: Key Attributes of AI Agents, Agentic AI, and Generative Agents
Aspect | AI Agent | Agentic AI | Generative Agent
Primary Capability | Task execution | Autonomous goal setting | Content generation
Planning Horizon | Single-step | Multi-step | N/A (content only)
Learning Mechanism | Rule-based or supervised | Reinforcement/meta-learning | Large-scale pretraining
Interaction Style | Reactive | Proactive | Creative
Evaluation Focus | Accuracy, latency | Engagement, adaptability | Coherence, diversity

TABLE IV: Comparison of Generative AI, AI Agents, and Agentic AI
Feature | Generative AI | AI Agent | Agentic AI
Core Function | Content generation | Task-specific execution using tools | Complex workflow automation
Mechanism | Prompt → LLM → Output | Prompt → Tool Call → LLM → Output | Goal → Agent Orchestration → Output
Structure | Single model | LLM + tool(s) | Multi-agent system
External Data Access | None (unless added) | Via external APIs | Coordinated multi-agent access
Key Trait | Reactivity | Tool-use | Collaboration

A. Architectural Evolution: From AI Agents to Agentic AI Systems

While both AI Agents and Agentic AI systems are grounded in modular design principles, Agentic AI significantly extends the foundational architecture to support more complex, distributed, and adaptive behaviors. As illustrated in Figure 8, the transition begins with core subsystems, Perception, Reasoning, and Action, that define traditional AI Agents. Agentic AI enhances this base by integrating advanced components such as Specialized Agents, Advanced Reasoning & Planning, Persistent Memory, and Orchestration. The figure further emphasizes emergent capabilities including Multi-Agent Collaboration, System Coordination, Shared Context, and Task Decomposition, all encapsulated within a dotted boundary that signifies the shift toward reflective, decentralized, and goal-driven system architectures. This progression marks a fundamental inflection point in intelligent agent design. This section synthesizes findings from empirical frameworks such as LangChain [88], AutoGPT [89], and TaskMatrix [144], highlighting this progression in architectural sophistication.

1) Core Architectural Components of AI Agents: Foundational AI Agents are typically composed of four primary subsystems: perception, reasoning, action, and learning. These
TABLE V: Comparison by Core Function and Goal
Feature | Generative AI | AI Agent | Agentic AI | Generative Agent (Inferred)
Primary Goal | Create novel content based on prompt | Execute a specific task using external tools | Automate complex workflow or achieve high-level goals | Perform a specific generative sub-task
Core Function | Content generation (text, image, audio, etc.) | Task execution with external interaction | Workflow orchestration and goal achievement | Sub-task content generation within a workflow

TABLE VI: Comparison by Architectural Components
Component | Generative AI | AI Agent | Agentic AI | Generative Agent (Inferred)
Core Engine | LLM / LIM | LLM | Multiple LLMs (potentially diverse) | LLM
Prompts | Yes (input trigger) | Yes (task guidance) | Yes (system goal and agent tasks) | Yes (sub-task guidance)
Tools/APIs | No (inherently) | Yes (essential) | Yes (available to constituent agents) | Potentially (if sub-task requires)
Multiple Agents | No | No | Yes (essential; collaborative) | No (is an individual agent)
Orchestration | No | No | Yes (implicit or explicit) | No (is part of orchestration)
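As a rough, purely illustrative sketch (not drawn from any surveyed framework), the compositional contrast in Table VI can be expressed as type signatures: only the Agentic AI configuration carries multiple constituent agents plus an orchestration field, while the single-agent case adds tools to a lone model.

```python
# Illustrative type sketches of the three compositions from Table VI.
# All field values are placeholders, not a real framework API.
from dataclasses import dataclass, field

@dataclass
class GenerativeAI:
    core_engine: str = "LLM/LIM"        # single model; no tools, no agents

@dataclass
class AIAgent:
    core_engine: str = "LLM"
    tools: list = field(default_factory=lambda: ["search_api"])  # tools are essential

@dataclass
class AgenticAI:
    agents: list = field(default_factory=lambda: [AIAgent(), AIAgent()])
    orchestration: str = "implicit or explicit"  # coordination layer over agents

print(len(AgenticAI().agents))
```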

TABLE VII: Comparison by Operational Mechanism
Mechanism | Generative AI | AI Agent | Agentic AI | Generative Agent (Inferred)
Primary Driver | Reactivity to prompt | Tool calling for task execution | Inter-agent communication and collaboration | Reactivity to input or sub-task prompt
Interaction Mode | User → LLM | User → Agent → Tool | User → System → Agents | System/Agent → Agent → Output
Workflow Handling | Single generation step | Single task execution | Multi-step workflow coordination | Single step within workflow
Information Flow | Input → Output | Input → Tool → Output | Input → Agent1 → Agent2 → ... → Output | Input (from system/agent) → Output
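The three information-flow patterns in Table VII can be mimicked with stubbed functions; the `llm` and `tool` bodies here are placeholders for real model and API calls, so this is a structural sketch rather than a working agent.

```python
# Sketch of the three information-flow patterns (stubbed model and tool).

def llm(prompt: str) -> str:
    return f"answer({prompt})"          # stand-in for a model call

def tool(query: str) -> str:
    return f"data({query})"             # stand-in for an external API

# Generative AI: Input -> Output
def generative(user_input: str) -> str:
    return llm(user_input)

# AI Agent: Input -> Tool -> Output
def ai_agent(user_input: str) -> str:
    observation = tool(user_input)      # tool call incorporated mid-process
    return llm(f"{user_input} | {observation}")

# Agentic AI: Input -> Agent1 -> Agent2 -> ... -> Output
def agentic(user_input: str, agents=(ai_agent, ai_agent)) -> str:
    result = user_input
    for agent in agents:
        result = agent(result)          # each agent's output feeds the next
    return result

print(agentic("query"))
```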

TABLE VIII: Comparison by Scope and Complexity
Aspect | Generative AI | AI Agent | Agentic AI | Generative Agent (Inferred)
Task Scope | Single piece of generated content | Single, specific, defined task | Complex, multi-faceted goal or workflow | Specific sub-task (often generative)
Complexity | Low (relative) | Medium (integrates tools) | High (multi-agent coordination) | Low to Medium (one task component)
Example (Video) | Chatbot | Tavily Search Agent | YouTube-to-Blog Conversion System | Title/Description/Conclusion Generator

TABLE IX: Comparison by Interaction and Autonomy
Feature | Generative AI | AI Agent | Agentic AI | Generative Agent (Inferred)
Autonomy Level | Low (requires prompt) | Medium (uses tools autonomously) | High (manages entire process) | Low to Medium (executes sub-task)
External Interaction | None (baseline) | Via specific tools or APIs | Through multiple agents/tools | Possibly via tools (if needed)
Internal Interaction | N/A | N/A | High (inter-agent) | Receives input from system or agent
Decision Making | Pattern selection | Tool usage decisions | Goal decomposition and assignment strategy | Best sub-task generation
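The goal-decomposition and role-assignment behavior attributed to Agentic AI in these tables, and the planner/retriever/synthesizer coordination discussed earlier, can be sketched as a minimal orchestrator sequencing specialist agents over a shared memory dict. The roles and their outputs are hypothetical placeholders, not a real multi-agent framework.

```python
# Minimal sketch of hierarchical orchestration: a planner decomposes a goal,
# specialists write results to shared memory, and an orchestrator sequences
# the steps. All roles and outputs are hypothetical placeholders.

def planner(goal: str, memory: dict) -> None:
    memory["subtasks"] = [f"research {goal}", f"summarize {goal}"]

def retriever(goal: str, memory: dict) -> None:
    memory["notes"] = [f"fact about {task}" for task in memory["subtasks"]]

def synthesizer(goal: str, memory: dict) -> None:
    memory["report"] = f"Report on {goal}: " + "; ".join(memory["notes"])

def orchestrate(goal: str) -> str:
    memory: dict = {}                                # shared context for all agents
    for agent in (planner, retriever, synthesizer):  # fixed dependency order
        agent(goal, memory)
    return memory["report"]

print(orchestrate("solar pricing"))
```

A production orchestrator would additionally monitor dependencies, retry failed subtasks, and let agents exchange messages asynchronously rather than run in a fixed sequence.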
Fig. 8: Illustrating architectural evolution from traditional AI Agents to modern Agentic AI systems. It begins with core
modules Perception, Reasoning, and Action and expands into advanced components including Specialized Agents, Advanced
Reasoning & Planning, Persistent Memory, and Orchestration. The diagram further captures emergent properties such as Multi-
Agent Collaboration, System Coordination, Shared Context, and Task Decomposition, all enclosed within a dotted boundary
signifying layered modularity and the transition to distributed, adaptive agentic AI intelligence.

subsystems form a closed-loop operational cycle, commonly referred to as "Understand, Think, Act" from a user interface perspective, or "Input, Processing, Action, Learning" in systems design literature [14], [145].

• Perception Module: This subsystem ingests input signals from users (e.g., natural language prompts) or external systems (e.g., APIs, file uploads, sensor streams). It is responsible for preprocessing data into a format interpretable by the agent's reasoning module. For example, in LangChain-based agents [88], [146], the perception layer handles prompt templating, contextual wrapping, and retrieval augmentation via document chunking and embedding search.
• Knowledge Representation and Reasoning (KRR) Module: At the core of the agent's intelligence lies the KRR module, which applies symbolic, statistical, or hybrid logic to input data. Techniques include rule-based logic (e.g., if-then decision trees), deterministic workflow engines, and simple planning graphs. Reasoning in agents like AutoGPT [30] is enhanced with function-calling and prompt chaining to simulate thought processes (e.g., "step-by-step" prompts or intermediate tool invocations).
• Action Selection and Execution Module: This module translates inferred decisions into external actions using an action library. These actions may include sending messages, updating databases, querying APIs, or producing structured outputs. Execution is often managed by middleware like LangChain's "agent executor," which links LLM outputs to tool calls and observes responses for subsequent steps [88].
• Basic Learning and Adaptation: Traditional AI Agents feature limited learning mechanisms, such as heuristic parameter adjustment [147], [148] or history-informed context retention. For instance, agents may use simple memory buffers to recall prior user inputs or apply scoring mechanisms to improve tool selection in future iterations.

Customization of these agents typically involves domain-specific prompt engineering, rule injection, or workflow templates, distinguishing them from hard-coded automation scripts by their ability to make context-aware decisions. Systems like ReAct [126] exemplify this architecture, combining reasoning and action in an iterative framework where agents simulate internal dialogue before selecting external actions.

2) Architectural Enhancements in Agentic AI: Agentic AI systems inherit the modularity of AI Agents but extend their architecture to support distributed intelligence, inter-agent communication, and recursive planning. The literature documents a number of critical architectural enhancements that differentiate Agentic AI from its predecessors [149], [150].

• Ensemble of Specialized Agents: Rather than operating as a monolithic unit, Agentic AI systems consist of multiple agents, each assigned a specialized function, e.g., a summarizer, a retriever, or a planner. These agents interact via communication channels (e.g., message queues, blackboards, or shared memory). For instance, MetaGPT [143] exemplifies this approach by modeling agents after corporate departments (e.g., CEO, CTO, engineer), where roles are modular, reusable, and role-bound.
• Advanced Reasoning and Planning: Agentic systems embed recursive reasoning capabilities using frameworks such as ReAct [126], Chain-of-Thought (CoT) prompting [151], and Tree of Thoughts [152]. These mechanisms allow agents to break down a complex task into multiple reasoning stages, evaluate intermediate results, and re-plan actions dynamically. This enables the system to respond adaptively to uncertainty or partial failure.
• Persistent Memory Architectures: Unlike traditional agents, Agentic AI incorporates memory subsystems to persist knowledge across task cycles or agent sessions [153], [154]. Memory types include episodic memory (task-specific history) [155], [156], semantic memory (long-term facts or structured data) [157], [158], and vector-based memory for retrieval-augmented generation (RAG) [159], [160]. For example, AutoGen [89] agents maintain scratchpads for intermediate computations, enabling stepwise task progression.
• Orchestration Layers / Meta-Agents: A key innovation in Agentic AI is the introduction of orchestrators: meta-agents that coordinate the lifecycle of subordinate agents, manage dependencies, assign roles, and resolve conflicts. Orchestrators often include task managers, evaluators, or moderators. In ChatDev [142], for example, a virtual CEO meta-agent distributes subtasks to departmental agents and integrates their outputs into a unified strategic response.

These enhancements collectively enable Agentic AI to support scenarios that require sustained context, distributed labor, multi-modal coordination, and strategic adaptation. Use cases range from research assistants that retrieve, summarize, and draft documents in tandem (e.g., AutoGen pipelines [89]) to smart supply chain agents that monitor logistics, vendor performance, and dynamic pricing models in parallel.

The shift from isolated perception–reasoning–action loops to collaborative and reflective multi-agent workflows marks a key inflection point in the architectural design of intelligent systems. This progression positions Agentic AI as the next stage of AI infrastructure, capable not only of executing predefined workflows but also of constructing, revising, and managing complex objectives across agents with minimal human supervision.

IV. APPLICATION OF AI AGENTS AND AGENTIC AI

To illustrate the real-world utility and operational divergence between AI Agents and Agentic AI systems, this study synthesizes a range of applications drawn from recent literature, as visualized in Figure 9. We systematically categorize and analyze application domains across two parallel tracks: conventional AI Agent systems and their more advanced Agentic AI counterparts. For AI Agents, four primary use cases are reviewed: (1) Customer Support Automation and Internal Enterprise Search, where single-agent models handle structured queries and response generation; (2) Email Filtering and Prioritization, where agents assist users in managing high-volume communication through classification heuristics; (3) Personalized Content Recommendation and Basic Data Reporting, where user behavior is analyzed for automated insights; and (4) Autonomous Scheduling Assistants, which interpret calendars and book tasks with minimal user input. In contrast, Agentic AI applications encompass broader and more dynamic capabilities, reviewed through four additional categories: (1) Multi-Agent Research Assistants that retrieve, synthesize, and draft scientific content collaboratively; (2) Intelligent Robotics Coordination, including drone and multi-robot systems in fields like agriculture and logistics; (3) Collaborative Medical Decision Support, involving diagnostic, treatment, and monitoring subsystems; and (4) Multi-Agent Game AI and Adaptive Workflow Automation, where decentralized agents interact strategically or handle complex task pipelines.

1) Applications of AI Agents:

1) Customer Support Automation and Internal Enterprise Search: AI Agents are widely adopted in enterprise environments for automating customer support and facilitating internal knowledge retrieval. In customer service, these agents leverage retrieval-augmented LLMs interfaced with APIs and organizational knowledge bases to answer user queries, triage tickets, and perform actions like order tracking or return initiation [46]. For internal enterprise search, agents built on vector stores (e.g., Pinecone, Elasticsearch) retrieve semantically relevant documents in response to natural language queries. Tools such as Salesforce Einstein (https://www.salesforce.com/artificial-intelligence/), Intercom Fin (https://www.intercom.com/fin), and Notion AI (https://www.notion.com/product/ai) demonstrate how structured input processing and summarization capabilities reduce workload and improve enterprise decision-making.
A practical example (Figure 10a) of this dual functionality can be seen in a multinational e-commerce company deploying an AI Agent-based customer support and internal search assistant. For customer support, the AI Agent integrates with the company's CRM (e.g., Salesforce) and fulfillment APIs to resolve queries such as "Where is my order?" or "How can I return this
Fig. 9: Categorized applications of AI Agents and Agentic AI across eight core functional domains.

item?" Within milliseconds, the agent retrieves contextual data from shipping databases and policy repositories, then generates a personalized response using retrieval-augmented generation. For internal enterprise search, employees use the same system to query past meeting notes, sales presentations, or legal documents. When an HR manager types "summarize key benefits policy changes from last year," the agent queries a Pinecone vector store embedded with enterprise documentation, ranks results by semantic similarity, and returns a concise summary along with source links. These capabilities not only reduce ticket volume and support overhead but also minimize time spent searching for institutional knowledge. The result is a unified, responsive system that enhances both external service delivery and internal operational efficiency using modular AI Agent architectures.

2) Email Filtering and Prioritization: Within productivity tools, AI Agents automate email triage through content classification and prioritization. Integrated with systems like Microsoft Outlook and Superhuman, these agents analyze metadata and message semantics to detect urgency, extract tasks, and recommend replies. They apply user-tuned filtering rules, behavioral signals, and intent classification to reduce cognitive overload. Autonomous actions, such as auto-tagging or summarizing threads, enhance efficiency, while embedded feedback loops enable personalization through incremental learning [61].
Figure 10b illustrates a practical implementation of AI Agents in the domain of email filtering and prioritization. In modern workplace environments, users are inundated with high volumes of email, leading to cognitive overload and missed critical communications. AI Agents embedded in platforms like Microsoft Outlook or Superhuman act as intelligent intermediaries that classify, cluster, and triage incoming messages. These agents evaluate metadata (e.g., sender, subject line) and semantic content to detect urgency, extract actionable items, and suggest smart replies. As depicted, the AI agent autonomously categorizes emails into tags such as "Urgent," "Follow-up," and "Low Priority," while also offering context-aware summaries and reply drafts. Through continual feedback loops and usage patterns, the system adapts to user preferences, gradually refining classification thresholds and improving prioritization accuracy. This automation offloads decision fatigue, allowing users to focus on high-value tasks, while maintaining efficient communication management in fast-paced, information-dense environments.

3) Personalized Content Recommendation and Basic Data Reporting: AI Agents support adaptive personalization for news, product, or media recommendations. Platforms like Amazon, YouTube, and Spotify deploy these agents to infer user preferences via collaborative filtering, intent detection, and content ranking. Simultaneously, AI Agents in analytics systems (e.g., Tableau Pulse, Power BI Copilot) enable natural-language data queries and automated report generation by converting prompts to structured database queries and visual summaries, democratizing business intelligence access.
A practical illustration (Figure 10c) of AI Agents in personalized content recommendation and basic data reporting can be found in e-commerce and enterprise analytics systems. Consider an AI agent deployed on a retail platform like Amazon: as users browse, click, and purchase items, the agent continuously monitors interaction patterns such as dwell time, search queries, and purchase sequences. Using collaborative filtering and content-based ranking, the agent infers user intent and dynamically generates personalized product suggestions that evolve over time. For example, after purchasing gardening tools, a user may be recommended compatible soil sensors or relevant books. This level of personalization enhances customer engagement, increases conversion rates, and supports long-term user retention. Simultaneously, within a corporate setting, an AI agent integrated into Power BI Copilot allows non-technical staff to request insights using natural language, for instance, "Compare Q3 and Q4 sales in the Northeast." The agent translates the prompt into structured SQL queries, extracts patterns from the database, and outputs a concise visual summary or narrative report. This application reduces dependency on data analysts and empowers broader business decision-making through intuitive, language-driven interfaces.

4) Autonomous Scheduling Assistants: AI Agents integrated with calendar systems autonomously manage meeting coordination, rescheduling, and conflict resolution. Tools like x.ai and Reclaim AI interpret vague scheduling commands, access calendar APIs, and identify optimal time slots using learned user preferences. They minimize human input while adapting to dynamic availability constraints. Their ability to interface with enterprise systems and respond to ambiguous instructions highlights the modular autonomy of contemporary scheduling agents.
A practical application of autonomous scheduling agents can be seen in corporate settings, as depicted in Figure 10d, where employees manage multiple overlapping responsibilities across global time zones. Consider an executive assistant AI agent integrated with Google Calendar and Slack that interprets a command like "Find a 45-minute window for a follow-up with the product team next week." The agent parses the request, checks availability for all participants, accounts for time zone differences, and avoids meeting conflicts or working-hour violations. If it identifies a conflict with a previously scheduled task, it may autonomously propose alternative windows and notify affected attendees via Slack integration. Additionally, the agent learns from historical user preferences, such as avoiding early Friday meetings, and refines its suggestions over time. Tools like Reclaim AI and Clockwise exemplify this capability, offering calendar-aware automation that adapts to evolving workloads. Such assistants reduce coordination overhead, increase scheduling efficiency, and enable smoother team workflows by proactively resolving ambiguity and optimizing calendar utilization.

Fig. 10: Applications of AI Agents in enterprise settings: (a) Customer support and internal enterprise search; (b) Email filtering and prioritization; (c) Personalized content recommendation and basic data reporting; and (d) Autonomous scheduling assistants. Each example highlights modular AI Agent integration for automation, intent understanding, and adaptive reasoning across operational workflows and user-facing systems.

TABLE X: Representative AI Agents (2023–2025): Applications and Operational Characteristics
Model / Reference | Application Area | Operation as AI Agent
ChatGPT Deep Research Mode, OpenAI (2025), Deep Research OpenAI | Research Analysis / Reporting | Synthesizes hundreds of sources into reports; functions as a self-directed research analyst.
Operator, OpenAI (2025), Operator OpenAI | Web Automation | Navigates websites, fills forms, and completes online tasks autonomously.
Agentspace: Deep Research Agent, Google (2025), Google Agentspace | Enterprise Reporting | Generates business intelligence reports using Gemini models.
NotebookLM Plus Agent, Google (2025), NotebookLM | Knowledge Management | Summarizes, organizes, and retrieves data across Google Workspace apps.
Nova Act, Amazon (2025), Amazon Nova | Workflow Automation | Automates browser-based tasks such as scheduling, HR requests, and email.
Manus Agent, Monica (2025), Manus Agent (https://manus.im/) | Personal Task Automation | Executes trip planning, site building, and product comparisons via browsing.
Harvey, Harvey AI (2025), Harvey | Legal Automation | Automates document drafting, legal review, and predictive case analysis.
Otter Meeting Agent, Otter.ai (2025), Otter | Meeting Management | Transcribes meetings and provides highlights, summaries, and action items.
Otter Sales Agent, Otter.ai (2025), Otter sales agent | Sales Enablement | Analyzes sales calls, extracts insights, and suggests follow-ups.
ClickUp Brain, ClickUp (2025), ClickUp Brain | Project Management | Automates task tracking, updates, and project workflows.
Agentforce, Agentforce (2025), Agentforce | Customer Support | Routes tickets and generates context-aware replies for support teams.
Microsoft Copilot, Microsoft (2024), Microsoft Copilot | Office Productivity | Automates writing, formula generation, and summarization in Microsoft 365.
Project Astra, Google DeepMind (2025), Project Astra | Multimodal Assistance | Processes text, image, audio, and video for task support and recommendations.
Claude 3.5 Agent, Anthropic (2025), Claude 3.5 Sonnet | Enterprise Assistance | Uses multimodal input for reasoning, personalization, and enterprise task completion.

2) Applications of Agentic AI:

1) Multi-Agent Research Assistants: Agentic AI systems are increasingly deployed in academic and industrial research pipelines to automate multi-stage knowledge work. Platforms like AutoGen and CrewAI assign specialized roles to multiple agents (retrievers, summarizers, synthesizers, and citation formatters) under a central orchestrator. The orchestrator distributes tasks, manages role dependencies, and integrates outputs into coherent drafts or review summaries. Persistent memory allows for cross-agent context sharing and refinement over time. These systems are being used for literature reviews, grant preparation, and patent search pipelines, outperforming single-agent systems such as ChatGPT by enabling concurrent sub-task execution and long-context management [89].
For example, a real-world application of agentic AI, as depicted in Figure 11a, is in the automated drafting of grant proposals. Consider a university research group preparing a National Science Foundation (NSF) submission. Using an AutoGen-based architecture, distinct agents are assigned: one retrieves prior funded proposals and extracts structural patterns; another scans recent literature to summarize related work; a third agent aligns proposal objectives with NSF solicitation language; and a formatting agent structures the document per compliance guidelines. The orchestrator coordinates these agents, resolving dependencies (e.g., aligning methodology with objectives) and ensuring stylistic consistency across sections. Persistent memory modules store evolving drafts, feedback from collaborators, and funding agency templates, enabling iterative improvement over multiple sessions. Compared to traditional manual processes, this multi-agent system significantly accelerates drafting time, improves narrative cohesion, and ensures regulatory alignment, offering a scalable, adaptive approach to collaborative scientific writing in academia and R&D-intensive industries.

2) Intelligent Robotics Coordination: In robotics and automation, Agentic AI underpins collaborative behavior in multi-robot systems. Each robot operates as a task-specialized agent, such as a picker, transporter, or mapper, while an orchestrator supervises and adapts workflows. These architectures rely on shared spatial memory, real-time sensor fusion, and inter-agent synchronization for coordinated physical actions. Use cases include warehouse automation, drone-based orchard inspection, and robotic harvesting [143]. For instance, agricultural drone swarms may collectively map tree rows, identify diseased fruits, and initiate mechanical interventions. This dynamic allocation enables real-time reconfiguration and autonomy across agents facing uncertain or evolving environments.
For example, in commercial apple orchards (Figure 11b), Agentic AI enables a coordinated multi-robot system to optimize the harvest season. Here, task-specialized robots such as autonomous pickers, fruit classifiers, transport bots, and drone mappers operate as agentic units under a central orchestrator. The mapping drones first survey the orchard and use vision-language models (VLMs) to generate high-resolution yield maps and identify ripe clusters. This spatial data is shared via a
Fig. 11: Illustrative Applications of Agentic AI Across Domains: Figure 11 presents four real-world applications of agentic AI
systems. (a) Automated grant writing using multi-agent orchestration for structured literature analysis, compliance alignment,
and document formatting. (b) Coordinated multi-robot harvesting in apple orchards using shared spatial memory and task-
specific agents for mapping, picking, and transport. (c) Clinical decision support in hospital ICUs through synchronized agents
for diagnostics, treatment planning, and EHR analysis, enhancing safety and workflow efficiency. (d) Cybersecurity incident
response in enterprise environments via agents handling threat classification, compliance analysis, and mitigation planning.
In all cases, central orchestrators manage inter-agent communication, shared memory enables context retention, and feedback
mechanisms drive continual learning. These use cases highlight agentic AI’s capacity for scalable, autonomous task coordination
in complex, dynamic environments across science, agriculture, healthcare, and IT security.
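The orchestrator-plus-shared-memory pattern that recurs across the Fig. 11 scenarios can be sketched as follows. The agent roles and their outputs are stubbed assumptions; in a deployed system each function would wrap an LLM call or a robot controller, and the shared "blackboard" would be a persistent store rather than an in-memory dictionary.

```python
# Hedged sketch of central orchestration over a shared context store.
# Agent names, pipeline order, and stubbed outputs are illustrative only.

class Blackboard(dict):
    """Shared context that persists results across agent invocations."""

def retriever(board: Blackboard) -> None:
    # Stub: a real retriever agent would query a vector store or sensors.
    board["sources"] = ["prior proposal A", "related work B"]

def summarizer(board: Blackboard) -> None:
    # Reads what the retriever wrote to the shared context.
    board["summary"] = f"{len(board['sources'])} sources summarized"

def formatter(board: Blackboard) -> None:
    board["draft"] = f"Draft based on: {board['summary']}"

def orchestrate(pipeline, board: Blackboard) -> Blackboard:
    # The orchestrator sequences agents; a richer version would also
    # resolve dependencies, retry failures, and reassign roles.
    for agent in pipeline:
        agent(board)
    return board

result = orchestrate([retriever, summarizer, formatter], Blackboard())
print(result["draft"])  # -> Draft based on: 2 sources summarized
```

Replacing the fixed pipeline with dynamic scheduling (e.g., reassigning a failed picker's zone to a neighbor) is exactly the adaptive behavior the orchard example describes.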

centralized memory layer accessible by all agents. Picker robots are assigned to high-density zones, guided by path-planning agents that optimize routes around obstacles and labor zones. Simultaneously, transport agents dynamically shuttle crates between pickers and storage, adjusting tasks in response to picker load levels and terrain changes. All agents communicate asynchronously through a shared protocol, and the orchestrator continuously adjusts task priorities based on weather forecasts or mechanical faults. If one picker fails, nearby units autonomously reallocate workload. This adaptive, memory-driven coordination exemplifies Agentic AI's potential to reduce labor costs, increase harvest efficiency, and respond to uncertainties in complex agricultural environments, far surpassing the rigid programming of legacy agricultural robots [89], [143].

3) Collaborative Medical Decision Support: In high-stakes clinical environments, Agentic AI enables distributed medical reasoning by assigning tasks such as diagnostics, vital monitoring, and treatment planning to specialized agents. For example, one agent may retrieve patient history, another validates findings against diagnostic guidelines, and a third proposes treatment options. These agents synchronize through shared memory and reasoning chains, ensuring coherent, safe recommendations. Applications include ICU management, radiology triage, and pandemic response. Real-world pilots show improved efficiency and decision accuracy compared to isolated expert systems [87].
For example, in a hospital ICU (Figure 11c), an agentic AI system supports clinicians in managing complex patient cases. A diagnostic agent continuously analyzes vitals and lab data for early detection of sepsis risk. Simultaneously, a history retrieval agent accesses electronic health records (EHRs) to summarize comorbidities and recent procedures. A treatment planning agent cross-references current symptoms with clinical guidelines (e.g., Surviving Sepsis Campaign), proposing antibiotic regimens or fluid protocols. The orchestrator integrates these insights, ensures consistency, and surfaces conflicts for human review. Feedback from physicians is stored in a persistent memory module, allowing agents to refine their reasoning based on prior interventions and outcomes. This coordinated system enhances clinical workflow by reducing cognitive load, shortening decision times, and minimizing oversight risks. Early deployments in critical care and oncology units have demonstrated increased diagnostic precision and better adherence to evidence-based protocols, offering a scalable solution for safer, real-time collaborative medical support.

4) Multi-Agent Game AI and Adaptive Workflow Automation: In simulation environments and enterprise systems, Agentic AI facilitates decentralized task execution and emergent coordination. Game platforms like AI Dungeon deploy independent NPC agents with goals, memory, and dynamic interactivity to create emergent narratives and social behavior. In enterprise workflows, systems such as MultiOn and Cognosys use agents to manage processes like legal review or incident escalation, where each step is governed by a specialized module. These architectures exhibit resilience, exception handling, and feedback-driven adaptability far beyond rule-based pipelines.
For example, in a modern enterprise IT environment (as depicted in Figure 11d), Agentic AI systems are increasingly deployed to autonomously manage cybersecurity incident response workflows. When a potential threat is detected, such as abnormal access patterns or unauthorized data exfiltration, specialized agents are activated in parallel. One agent performs real-time threat classification using historical breach data and anomaly detection models. A second agent queries relevant log data from network nodes and correlates patterns across systems. A third agent interprets compliance frameworks (e.g., GDPR or HIPAA) to assess the regulatory severity of the event. A fourth agent simulates mitigation strategies and forecasts operational risks. These agents coordinate under a central orchestrator that evaluates collective outputs, integrates temporal reasoning, and issues recommended actions to human analysts. Through shared memory structures and iterative feedback, the system learns from prior incidents, enabling faster and more accurate responses in future cases. Compared to traditional rule-based security systems, this agentic model reduces decision latency, reduces false positives, and supports proactive threat containment in large-scale organizational infrastructures [89].

V. CHALLENGES AND LIMITATIONS IN AI AGENTS AND AGENTIC AI

To systematically understand the operational and theoretical limitations of current intelligent systems, we present a comparative visual synthesis in Figure 12, which categorizes challenges and potential remedies across both AI Agents and Agentic AI paradigms. Figure 12a outlines the four most pressing limitations specific to AI Agents, namely: lack of causal reasoning, inherited LLM constraints (e.g., hallucinations, shallow reasoning), incomplete agentic properties (e.g., autonomy, proactivity), and failures in long-horizon planning and recovery. These challenges often arise due to their reliance on stateless LLM prompts, limited memory, and heuristic reasoning loops.
In contrast, Figure 12b identifies eight critical bottlenecks unique to Agentic AI systems, such as inter-agent error cascades, coordination breakdowns, emergent instability, scalability limits, and explainability issues. These challenges stem from the complexity of orchestrating multiple agents across distributed tasks without standardized architectures, robust communication protocols, or causal alignment frameworks.
Figure 13 complements this diagnostic framework by synthesizing ten forward-looking design strategies aimed at mitigating these limitations. These include Retrieval-Augmented Generation (RAG), tool-based reasoning [120], [121], [123], agentic feedback loops (ReAct [126]), role-based multi-agent orchestration, memory architectures, causal modeling, and governance-aware design. Together, these three panels offer a consolidated roadmap for addressing current pitfalls and accelerating the development of safe, scalable, and context-aware autonomous systems.

1) Challenges and Limitations of AI Agents: While AI Agents have garnered considerable attention for their ability to automate structured tasks using LLMs and tool-use interfaces, the literature highlights significant theoretical and practical
Fig. 12: Challenges and Solutions Across Agentic Paradigms. (a) Key limitations of AI Agents, including causality deficits and shallow reasoning. (b) Amplified coordination and stability challenges in Agentic AI systems.
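One of the remedial patterns discussed in this section, validating an agent's draft output against retrieved evidence before accepting it, can be sketched as below. The generator is a stub standing in for an LLM call, and the citation scheme is a hypothetical convention; the point is only that the guard refuses ungrounded answers instead of passing hallucinations downstream.

```python
# Hedged sketch of a grounding check: accept a draft answer only if every
# citation it makes exists in the retrieved evidence set; otherwise retry,
# and fail closed after a bounded number of attempts.

def generate_answer(question: str, attempt: int) -> dict:
    # Stub for an LLM call. The first attempt "hallucinates" a source that
    # was never retrieved; the second cites real evidence.
    if attempt == 0:
        return {"text": "Answer citing [doc9]", "citations": ["doc9"]}
    return {"text": "Answer citing [doc1]", "citations": ["doc1"]}

def grounded_answer(question: str, evidence: set, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        draft = generate_answer(question, attempt)
        if all(c in evidence for c in draft["citations"]):
            return draft["text"]            # grounded: accept the draft
    return "ESCALATE: no grounded answer"   # fail closed rather than guess

print(grounded_answer("What does doc1 say?", {"doc1", "doc2"}))  # -> Answer citing [doc1]
```

Bounding the retry count also caps the LLM-invocation cost per step, which speaks to the latency concern raised below.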

limitations that inhibit their reliability, generalization, and long-term autonomy [126], [150]. These challenges arise from both the architectural dependence on static, pretrained models and the difficulty of instilling agentic qualities such as causal reasoning, planning, and robust adaptation. The key challenges and limitations (Figure 12a) of AI Agents are summarized in the following five points:

1) Lack of Causal Understanding: One of the most foundational challenges lies in the agents' inability to reason causally [164], [165]. Current LLMs, which form the cognitive core of most AI Agents, excel at identifying statistical correlations within training data. However, as noted in recent research from DeepMind and conceptual analyses by TrueTheta, they fundamentally lack the capacity for causal modeling, i.e., distinguishing between mere association and cause-effect relationships [166]–[168]. For instance, while an LLM-powered agent might learn that visiting a hospital often co-occurs with illness, it cannot infer whether the illness causes the visit or vice versa, nor can it simulate interventions or hypothetical changes.
This deficit becomes particularly problematic under distributional shifts, where real-world conditions differ from the training regime [169], [170]. Without such grounding, agents remain brittle, failing in novel or high-stakes scenarios. For example, a navigation agent that excels in urban driving may misbehave in snow or construction zones if it lacks an internal causal model of road traction or spatial occlusion.
2) Inherited Limitations from LLMs: AI Agents, particularly those powered by LLMs, inherit a number of intrinsic limitations that impact their reliability, adaptability, and overall trustworthiness in practical deployments [171]–[173]. One of the most prominent issues is the tendency to produce hallucinations: plausible but factually incorrect outputs. In high-stakes domains such as legal consultation or scientific research, these hallucinations can lead to severe misjudgments and erode user trust [174], [175]. Compounding this is the well-documented prompt sensitivity of LLMs, where even minor variations in phrasing can lead to divergent behaviors. This brittleness hampers reproducibility, necessitating meticulous manual prompt engineering and often requiring domain-specific tuning to maintain consistency across interactions [176].
Furthermore, while recent agent frameworks adopt reasoning heuristics like Chain-of-Thought (CoT) [151], [177] and ReAct [126] to simulate deliberative processes, these approaches remain shallow in semantic comprehension. Agents may still fail at multi-step inference, misalign task objectives, or make logically inconsistent conclusions despite the appearance of structured reasoning [126]. Such shortcomings underscore the absence of genuine understanding and generalizable planning capabilities.
Another key limitation lies in computational cost and latency. Each cycle of agentic decision-making, particularly in planning or tool-calling, may require several LLM invocations. This not only increases runtime latency but also scales resource consumption, creating practical bottlenecks in real-world deployments and cloud-based inference systems. Furthermore, LLMs have a static knowledge cutoff and cannot dynamically integrate new information unless explicitly augmented via retrieval or tool plugins. They also reproduce the biases of their training datasets, which can manifest as culturally insensitive or skewed responses [178], [179]. Without rigorous auditing and mitigation strategies,
TABLE XI: Representative Agentic AI Models (2023–2025): Applications and Operational Characteristics

Model / Reference | Application Area | Operation as Agentic AI
Auto-GPT [30] | Task Automation | Decomposes high-level goals, executes subtasks via tools/APIs, and iteratively self-corrects.
GPT Engineer, Open Source (2023) | Code Generation | Builds entire codebases: plans, writes, tests, and refines based on output.
MetaGPT [143] | Software Collaboration | Coordinates specialized agents (e.g., coder, tester) for modular multi-role project development.
BabyAGI, Nakajima (2024) | Project Management | Continuously creates, prioritizes, and executes subtasks to adaptively meet user goals.
Voyager, Wang et al. (2023) [161] | Game Exploration | Learns in Minecraft, invents new skills, sets subgoals, and adapts strategy in real time.
CAMEL, Liu et al. (2023) [162] | Multi-Agent Simulation | Simulates agent societies with communication, negotiation, and emergent collaborative behavior.
Einstein Copilot, Salesforce (2024) | Customer Automation | Automates full support workflows, escalates issues, and improves via feedback loops.
Copilot Studio (Agentic Mode), Microsoft (2025) | Productivity Automation | Manages documents, meetings, and projects across Microsoft 365 with adaptive orchestration.
Atera AI Copilot, Atera (2025) | IT Operations | Diagnoses/resolves IT issues, automates ticketing, and learns from evolving infrastructures.
AES Safety Audit Agent, AES (2025) | Industrial Safety | Automates audits, assesses compliance, and evolves strategies to enhance safety outcomes.
DeepMind Gato (Agentic Mode), Reed et al. (2022) [163] | General Robotics | Performs varied tasks across modalities; dynamically learns, plans, and executes.
GPT-4o + Plugins, OpenAI (2024) | Enterprise Automation | Manages complex workflows, integrates external tools, and executes adaptive decisions.

these issues pose serious ethical and operational risks, particularly when agents are deployed in sensitive or user-facing contexts.
3) Incomplete Agentic Properties: A major limitation of current AI Agents is their inability to fully satisfy the canonical agentic properties defined in foundational literature, such as autonomy, proactivity, reactivity, and social ability [135], [173]. While many systems marketed as "agents" leverage LLMs to perform useful tasks, they often fall short of these fundamental criteria in practice. Autonomy, for instance, is typically partial at best. Although agents can execute tasks with minimal oversight once initialized, they remain heavily reliant on external scaffolding such as human-defined prompts, planning heuristics, or feedback loops to function effectively [180]. Self-initiated task generation, self-monitoring, or autonomous error correction are rare or absent, limiting their capacity for true independence.
Proactivity is similarly underdeveloped. Most AI Agents require explicit user instruction to act and lack the capacity to formulate or reprioritize goals dynamically based on contextual shifts or evolving objectives [181]. As a result, they behave reactively rather than strategically, constrained by the static nature of their initialization. Reactivity itself is constrained by architectural bottlenecks. Agents do respond to environmental or user input, but response latency caused by repeated LLM inference calls [182], [183], coupled with narrow contextual memory windows [153], [184], inhibits real-time adaptability.
Perhaps the most underexplored capability is social ability. True agentic systems should communicate and coordinate with humans or other agents over extended interactions, resolving ambiguity, negotiating tasks, and adapting to social norms. However, existing implementations exhibit brittle, template-based dialogue that lacks long-term memory integration or nuanced conversational context. Agent-to-agent interaction is often hardcoded or limited to scripted exchanges, hindering collaborative execution and emergent behavior [96], [185]. Collectively, these deficiencies reveal that while AI Agents demonstrate functional intelligence, they remain far from meeting the formal benchmarks of intelligent, interactive, and adaptive agents. Bridging this gap is essential for advancing toward more autonomous, socially capable AI systems.
4) Limited Long-Horizon Planning and Recovery: A persistent limitation of current AI Agents lies in their inability to perform robust long-horizon planning, especially in complex, multi-stage tasks. This constraint stems from their foundational reliance on stateless prompt-response paradigms, where each decision is made without an intrinsic memory of prior reasoning steps unless externally managed. Although augmentations such as the ReAct framework [126] or Tree-of-Thoughts [152] introduce pseudo-recursive reasoning, they remain fundamentally heuristic and lack true internal models of time, causality, or state evolution. Consequently, agents often falter in tasks requiring extended temporal consistency or contingency planning. For example, in domains such as clinical triage or financial portfolio management, where decisions depend on prior context and dynamically unfolding outcomes, agents may exhibit repetitive behaviors, such as endlessly querying tools, or fail to adapt when sub-tasks fail or return ambiguous results. The absence of systematic recovery mechanisms or error detection leads to brittle workflows and error propagation. This shortfall severely limits agent deployment in mission-critical environments
where reliability, fault tolerance, and sequential coherence are essential.
5) Reliability and Safety Concerns: AI Agents are not yet safe or verifiable enough for deployment in critical infrastructure [186]. The absence of causal reasoning leads to unpredictable behavior under distributional shift [165], [187]. Furthermore, evaluating the correctness of an agent's plan, especially when the agent fabricates intermediate steps or rationales, remains an unsolved problem in interpretability [104], [188]. Safety guarantees, such as formal verification, are not yet available for open-ended, LLM-powered agents. While AI Agents represent a major step beyond static generative models, their limitations in causal reasoning, adaptability, robustness, and planning restrict their deployment in high-stakes or dynamic environments. Most current systems rely on heuristic wrappers and brittle prompt engineering rather than grounded agentic cognition. Bridging this gap will require future systems to integrate causal models, dynamic memory, and verifiable reasoning mechanisms. These limitations also set the stage for the emergence of Agentic AI systems, which attempt to address these bottlenecks through multi-agent collaboration, orchestration layers, and persistent system-level context.

2) Challenges and Limitations of Agentic AI: Agentic AI systems represent a paradigm shift from isolated AI agents to collaborative, multi-agent ecosystems capable of decomposing and executing complex goals [14]. These systems typically consist of orchestrated or communicating agents that interact via tools, APIs, and shared environments [18], [38]. While this architectural evolution enables more ambitious automation, it introduces a range of amplified and novel challenges that compound the existing limitations of individual LLM-based agents. The current challenges and limitations of Agentic AI are as follows:
1) Amplified Causality Challenges: One of the most critical limitations in Agentic AI systems is the magnification of causality deficits already observed in single-agent architectures. Unlike traditional AI Agents that operate in relatively isolated environments, Agentic AI systems involve complex inter-agent dynamics, where each agent's action can influence the decision space of others. Without a robust capacity for modeling cause-effect relationships, these systems struggle to coordinate effectively and adapt to unforeseen environmental shifts. A key manifestation of this challenge is inter-agent distributional shift, where the behavior of one agent alters the operational context for others. In the absence of causal reasoning, agents are unable to anticipate the downstream impact of their outputs, resulting in coordination breakdowns or redundant computations [189]. Furthermore, these systems are particularly vulnerable to error cascades: a faulty or hallucinated output from one agent can propagate through the system, compounding inaccuracies and corrupting subsequent decisions. For example, if a verification agent erroneously validates false information, downstream agents such as summarizers or decision-makers may unknowingly build upon that misinformation, compromising the integrity of the entire system. This fragility underscores the urgent need for integrating causal inference and intervention modeling into the design of multi-agent workflows, especially in high-stakes or dynamic environments where systemic robustness is essential.
2) Communication and Coordination Bottlenecks: A fundamental challenge in Agentic AI lies in achieving efficient communication and coordination across multiple autonomous agents. Unlike single-agent systems, Agentic AI involves distributed agents that must collectively pursue a shared objective, necessitating precise alignment, synchronized execution, and robust communication protocols. However, current implementations fall short in these aspects. One major issue is goal alignment and shared context, where agents often lack a unified semantic understanding of overarching objectives. This hampers sub-task decomposition, dependency management, and progress monitoring, especially in dynamic environments requiring causal awareness and temporal coherence.
In addition, protocol limitations significantly hinder inter-agent communication. Most systems rely on natural language exchanges over loosely defined interfaces, which are prone to ambiguity, inconsistent formatting, and contextual drift. These communication gaps lead to fragmented strategies, delayed coordination, and degraded system performance. Furthermore, resource contention emerges as a systemic bottleneck when agents simultaneously access shared computational, memory, or API resources. Without centralized orchestration or intelligent scheduling mechanisms, these conflicts can result in race conditions, execution delays, or outright system failures. Collectively, these bottlenecks illustrate the immaturity of current coordination frameworks in Agentic AI and highlight the pressing need for standardized communication protocols, semantic task planners, and global resource managers to ensure scalable, coherent multi-agent collaboration.
3) Emergent Behavior and Predictability: One of the most critical limitations of Agentic AI lies in managing emergent behavior: complex system-level phenomena that arise from the interactions of autonomous agents. While such emergence can potentially yield adaptive and innovative solutions, it also introduces significant unpredictability and safety risks [145], [190]. A key concern is the generation of unintended outcomes, where agent interactions result in behaviors that were not explicitly programmed or foreseen by system designers. These behaviors may diverge from task objectives, generate misleading outputs, or even enact harmful actions, particularly in high-stakes domains like healthcare, finance,
or critical infrastructure.
As the number of agents and the complexity of their interactions grow, so too does the likelihood of system instability. This includes phenomena such as infinite planning loops, action deadlocks, and contradictory behaviors emerging from asynchronous or misaligned agent decisions. Without centralized arbitration mechanisms, conflict resolution protocols, or fallback strategies, these instabilities compound over time, making the system fragile and unreliable. The stochasticity and opacity of large language model-based agents further exacerbate this issue, as their internal decision logic is not easily interpretable or verifiable. Consequently, ensuring the predictability and controllability of emergent behavior remains a central challenge in designing safe and scalable Agentic AI systems.
4) Scalability and Debugging Complexity: As Agentic AI systems scale in both the number of agents and the diversity of specialized roles, maintaining system reliability and interpretability becomes increasingly complex [191], [192]. A central limitation stems from the black-box chains of reasoning characteristic of LLM-based agents. Each agent may process inputs through opaque internal logic, invoke external tools, and communicate with other agents, all of which occurs through multiple layers of prompt engineering, reasoning heuristics, and dynamic context handling. Tracing the root cause of a failure thus requires unwinding nested sequences of agent interactions, tool invocations, and memory updates, making debugging non-trivial and time-consuming.
Another significant constraint is the system's non-compositionality. Unlike traditional modular systems, where adding components can enhance overall functionality, introducing additional agents in an Agentic AI architecture often increases cognitive load, noise, and coordination overhead. Poorly orchestrated agent networks can result in redundant computation, contradictory decisions, or degraded task performance. Without robust frameworks for agent role definition, communication standards, and hierarchical planning, the scalability of Agentic AI does not necessarily translate into greater intelligence or robustness. These limitations highlight the need for systematic architectural controls and traceability tools to support the development of reliable, large-scale agentic ecosystems.
5) Trust, Explainability, and Verification: Agentic AI systems pose heightened challenges in explainability and verifiability due to their distributed, multi-agent architecture. While interpreting the behavior of a single LLM-powered agent is already non-trivial, this complexity is multiplied when multiple agents interact asynchronously through loosely defined communication protocols. Each agent may possess its own memory, task objective, and reasoning path, resulting in compounded opacity where tracing the causal chain of a final decision or failure becomes exceedingly difficult. The lack of shared, transparent logs or interpretable reasoning paths across agents makes it nearly impossible to determine why a particular sequence of actions occurred or which agent initiated a misstep.
Compounding this opacity is the absence of formal verification tools tailored for Agentic AI. Unlike traditional software systems, where model checking and formal proofs offer bounded guarantees, there exists no widely adopted methodology to verify that a multi-agent LLM system will perform reliably across all input distributions or operational contexts. This lack of verifiability presents a significant barrier to adoption in safety-critical domains such as autonomous vehicles, finance, and healthcare, where explainability and assurance are non-negotiable. To advance Agentic AI safely, future research must address the foundational gaps in causal traceability, agent accountability, and formal safety guarantees.
6) Security and Adversarial Risks: Agentic AI architectures introduce a significantly expanded attack surface compared to single-agent systems, exposing them to complex adversarial threats. One of the most critical vulnerabilities lies in the presence of a single point of compromise. Since Agentic AI systems are composed of interdependent agents communicating over shared memory or messaging protocols, the compromise of even one agent, through prompt injection, model poisoning, or adversarial tool manipulation, can propagate malicious outputs or corrupted state across the entire system. For example, a fact-checking agent fed with tampered data could unintentionally legitimize false claims, which are then integrated into downstream reasoning by summarization or decision-making agents.
Moreover, inter-agent dynamics themselves are susceptible to exploitation. Attackers can induce race conditions, deadlocks, or resource exhaustion by manipulating the coordination logic between agents. Without rigorous authentication, access control, and sandboxing mechanisms, malicious agents or corrupted tool responses can derail multi-agent workflows or cause erroneous escalation in task pipelines. These risks are exacerbated by the absence of standardized security frameworks for LLM-based multi-agent systems, leaving most current implementations defenseless against sophisticated multi-stage attacks. As Agentic AI moves toward broader adoption, especially in high-stakes environments, embedding secure-by-design principles and adversarial robustness becomes an urgent research imperative.
7) Ethical and Governance Challenges: The distributed and autonomous nature of Agentic AI systems introduces profound ethical and governance concerns, particularly in terms of accountability, fairness, and value alignment. In multi-agent settings, accountability gaps emerge when multiple agents interact to produce an outcome, making it difficult to assign responsibility
for errors or unintended consequences. This ambiguity complicates legal liability, regulatory compliance, and user trust, especially in domains such as healthcare, finance, or defense. Furthermore, bias propagation and amplification present a unique challenge: agents individually trained on biased data may reinforce each other's skewed decisions through interaction, leading to systemic inequities that are more pronounced than in isolated models. These emergent biases can be subtle and difficult to detect without longitudinal monitoring or audit mechanisms.
Additionally, misalignment and value drift pose serious risks in long-horizon or dynamic environments. Without a unified framework for shared value encoding, individual agents may interpret overarching objectives differently or optimize for local goals that diverge from human intent. Over time, this misalignment can lead to behavior that is inconsistent with ethical norms or user expectations. Current alignment methods, which are mostly designed for single-agent systems, are inadequate for managing value synchronization across heterogeneous agent collectives. These challenges highlight the urgent need for governance-aware agent architectures, incorporating principles such as role-based isolation, traceable decision logging, and participatory oversight mechanisms to ensure ethical integrity in autonomous multi-agent systems.
8) Immature Foundations and Research Gaps: Despite rapid progress and high-profile demonstrations, Agentic AI remains in a nascent research stage with unresolved foundational issues that limit its scalability, reliability, and theoretical grounding. A central concern is the lack of standard architectures. There is currently no widely accepted blueprint for how to design, monitor, or evaluate multi-agent systems built on LLMs. This architectural fragmentation makes it difficult to compare implementations, replicate experiments, or generalize findings across domains. Key aspects such as agent orchestration, memory structures, and communication protocols are often implemented ad hoc, resulting in brittle systems that lack interoperability and formal guarantees.
Equally critical is the absence of causal foundations, as scalable causal discovery and reasoning remain unsolved challenges [193]. Without the ability to represent and reason about cause-effect relationships, Agentic AI systems are inherently limited in their capacity to generalize safely beyond narrow training regimes [170], [194]. This shortfall affects their robustness under distributional shifts, their capacity for proactive intervention, and their ability to simulate counterfactuals or hypothetical plans, core requirements for intelligent coordination and decision-making.
The gap between functional demos and principled design thus underscores an urgent need for foundational research in multi-agent system theory, causal inference integration, and benchmark development. Only by addressing these deficiencies can the field progress from prototype pipelines to trustworthy, general-purpose agentic frameworks suitable for deployment in high-stakes environments.

VI. POTENTIAL SOLUTIONS AND FUTURE ROADMAP

The potential solutions (as illustrated in Figure 13) to these challenges and limitations of AI Agents and Agentic AI are summarized in the following points:
1) Retrieval-Augmented Generation (RAG): For AI Agents, Retrieval-Augmented Generation mitigates hallucinations and expands static LLM knowledge by grounding outputs in real-time data [195]. By embedding user queries and retrieving semantically relevant documents from vector databases like FAISS or Pinecone, agents can generate contextually valid responses rooted in external facts. This is particularly effective in domains such as enterprise search and customer support, where accuracy and up-to-date knowledge are essential.
In Agentic AI systems, RAG serves as a shared grounding mechanism across agents. For example, a summarizer agent may rely on the retriever agent to access the latest scientific papers before generating a synthesis. Persistent, queryable memory allows distributed agents to operate on a unified semantic layer, mitigating inconsistencies due to divergent contextual views. When implemented across a multi-agent system, RAG helps maintain shared truth, enhances goal alignment, and reduces inter-agent misinformation propagation.
2) Tool-Augmented Reasoning (Function Calling): AI Agents benefit significantly from function calling, which extends their ability to interact with real-world systems [159], [196]. Agents can query APIs, run local scripts, or access structured databases, thus transforming LLMs from static predictors into interactive problem-solvers [125], [154]. This allows them to dynamically retrieve weather forecasts, schedule appointments, or execute Python-based calculations, all beyond the capabilities of pure language modeling.
For Agentic AI, function calling supports agent-level autonomy and role differentiation. Agents within a team may use APIs to invoke domain-specific actions, such as querying clinical databases or generating visual charts, based on assigned roles. Function calls become part of an orchestrated pipeline, enabling fluid delegation across agents [197]. This structured interaction reduces ambiguity in task handoff and fosters clearer behavioral boundaries, especially when integrated with validation protocols or observation mechanisms [14], [18].
3) Agentic Loop: Reasoning, Action, Observation: AI Agents often suffer from single-pass inference limitations. The ReAct pattern introduces an iterative loop in which agents reason about tasks, act by calling tools or APIs, and then observe results before continuing.
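The reason-act-observe cycle described here reduces to a plain control loop. The sketch below is illustrative only: `call_llm` and the `TOOLS` registry are hypothetical stand-ins for a real model call and real tool bindings, not any specific framework's API.

```python
# Minimal sketch of a ReAct-style loop (assumed interfaces, not a real API):
# the agent alternates Reason -> Act -> Observe until the model emits a
# final answer or the step budget runs out.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; immediately returns a final answer here."""
    return "FINAL: stubbed answer"

TOOLS = {
    "search": lambda query: f"top results for {query!r}",  # hypothetical tool
}

def react_loop(task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = call_llm(transcript)                # Reason: propose the next move
        if step.startswith("FINAL:"):              # the model decided it is done
            return step[len("FINAL:"):].strip()
        tool_name, _, arg = step.partition(" ")    # Act: e.g. "search solar flares"
        tool = TOOLS.get(tool_name, lambda a: "error: unknown tool")
        observation = tool(arg)
        transcript += f"\n{step}\nObservation: {observation}"  # Observe: feed back
    return "stopped: step budget exhausted"

print(react_loop("find recent papers on agentic AI"))
```

The bounded `max_steps` budget matters in practice: as the latency discussion above notes, every iteration is another LLM invocation, so unbounded loops translate directly into cost and runtime.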
[Figure 13 diagram: ten solution boxes: Retrieval-Augmented Generation (RAG); Tool-Augmented Reasoning (Function Calling); Agentic Loop: Reasoning, Action, Observation; Memory Architectures (Episodic, Semantic, Vector); Multi-Agent Orchestration with Role Specialization; Reflexive and Self-Critique Mechanisms; Programmatic Prompt Engineering Pipelines; Causal Modeling and Simulation-Based Planning; Monitoring, Auditing, and Explainability Pipelines; Governance-Aware Architectures (Accountability + Role Isolation).]

Fig. 13: Ten emerging architectural and algorithmic solutions, such as RAG, tool use, memory, orchestration, and reflexive mechanisms, addressing reliability, scalability, and explainability across both paradigms
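In code, the function-calling pattern from point 2 amounts to parsing a structured call emitted by the model and dispatching it to a registered function. This is a schematic sketch under assumed formats: the JSON call shape and the `get_weather` tool are hypothetical, not any particular vendor's API.

```python
import json

# Hypothetical tool: in practice this would call a real weather API.
def get_weather(city: str) -> str:
    return f"18 degrees C and cloudy in {city}"

# Registry mapping tool names (as exposed to the model) to callables.
REGISTRY = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Run the structured tool call emitted by the model (assumed JSON shape)."""
    call = json.loads(model_output)      # e.g. {"name": ..., "arguments": {...}}
    func = REGISTRY[call["name"]]        # look up the registered function
    return func(**call["arguments"])     # invoke with model-supplied arguments

# A function-calling model would emit something like:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

Keeping the registry explicit is one way to enforce the behavioral boundaries discussed above: an agent can only invoke tools it has been granted.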

This feedback loop allows for more deliberate, context-sensitive behaviors. For example, an agent may verify retrieved data before drafting a summary, thereby reducing hallucination and logical errors. In Agentic AI, this pattern is critical for collaborative coherence. ReAct enables agents to evaluate dependencies dynamically, reasoning over intermediate states, re-invoking tools if needed, and adjusting decisions as the environment evolves. This loop becomes more complex in multi-agent settings, where each agent's observation must be reconciled against others' outputs. Shared memory and consistent logging are essential here, ensuring that the reflective capacity of the system is not fragmented across agents [126].
4) Memory Architectures (Episodic, Semantic, Vector): AI Agents face limitations in long-horizon planning and session continuity. Memory architectures address this by persisting information across tasks [198]. Episodic memory allows agents to recall prior actions and feedback, semantic memory encodes structured domain knowledge, and vector memory enables similarity-based retrieval [199]. These elements are key for personalization and adaptive decision-making in repeated interactions.
Agentic AI systems require even more sophisticated memory models due to distributed state management. Each agent may maintain local memory while accessing shared global memory to facilitate coordination. For example, a planner agent might use vector-based memory to recall prior workflows, while a QA agent references semantic memory for fact verification. Synchronizing memory access and updates across agents enhances consistency, enables context-aware communication, and supports long-horizon system-level planning.
5) Multi-Agent Orchestration with Role Specialization: In AI Agents, task complexity is often handled via modular prompt templates or conditional logic. However, as task diversity increases, a single agent may become overloaded [200], [201]. Role specialization, splitting tasks into subcomponents (e.g., planner, summarizer), allows lightweight orchestration even within single-agent systems by simulating compartmentalized reasoning. In Agentic AI, orchestration is central. A meta-agent or orchestrator distributes tasks among specialized agents, each with distinct capabilities. Systems like MetaGPT and ChatDev exemplify this: agents emulate roles such as CEO, engineer, or reviewer, and interact through structured messaging. This modular approach enhances interpretability, scalability, and fault isolation, ensuring that failures in one agent do not cascade without containment mechanisms from the orchestrator.
6) Reflexive and Self-Critique Mechanisms: AI Agents often fail silently or propagate errors. Reflexive mechanisms introduce the capacity for self-evaluation [202], [203]. After completing a task, agents can critique their own outputs using a secondary reasoning pass, increasing robustness and reducing error rates. For example, a legal assistant agent might verify that its drafted clause matches prior case law before submission. For Agentic AI, reflexivity extends beyond self-critique to inter-agent evaluation. Agents can review each other's outputs, e.g., a verifier agent auditing a summarizer's work. Reflexion-like mechanisms ensure collaborative quality control and enhance trustworthiness [204]. Such patterns also support iterative improvement and adaptive replanning, particularly when integrated with memory logs or feedback queues [205], [206].
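The self-critique pass in point 6 can be sketched as a draft-review-revise cycle. All three steps below are stubbed placeholders for what would, in a real system, be separate LLM or agent calls; the function names and the "missing citation" check are purely illustrative.

```python
# Sketch of a reflexive draft -> critique -> revise cycle. Each step stands
# in for a separate LLM/agent call (assumed interfaces, not a real API).

def draft(task: str) -> str:
    return f"Draft answer for: {task}"

def critique(text: str) -> list[str]:
    # A reviewer agent would return concrete issues; stubbed single check here.
    return ["missing citation"] if "citation" not in text else []

def revise(text: str, issues: list[str]) -> str:
    return f"{text} [revised to address: {'; '.join(issues)}]"

def reflexive_answer(task: str, max_rounds: int = 2) -> str:
    answer = draft(task)
    for _ in range(max_rounds):
        issues = critique(answer)
        if not issues:                   # reviewer is satisfied: release it
            break
        answer = revise(answer, issues)  # fold the critique back in and retry
    return answer

print(reflexive_answer("summarize the indemnity clause"))
```

In an inter-agent variant, `critique` would be served by a distinct verifier agent rather than a second pass of the same model, which is what distinguishes Agentic AI reflexivity from single-agent self-critique.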
7) Programmatic Prompt Engineering Pipelines: Man- agents, and workflows. Role isolation prevents rogue
ual prompt tuning introduces brittleness and reduces agents from exceeding authority, while accountability
reproducibility in AI Agents. Programmatic pipelines mechanisms assign responsibility for decisions and trace
automate this process using task templates, context causality across agents. Compliance protocols, ethical
fillers, and retrieval-augmented variables [207], [208]. alignment checks, and agent authentication ensure safety
These dynamic prompts are structured based on task in collaborative settings paving the way for trustworthy
type, agent role, or user query, improving generalization AI ecosystems.
and reducing failure modes associated with prompt
variability. In Agentic AI, prompt pipelines enable scal- AI Agents are projected to evolve significantly through
able, role-consistent communication. Each agent type enhanced modular intelligence focused on five key domains as
(e.g., planner, retriever, summarizer) can generate or depicted in Figure 14 as : proactive reasoning, tool integration,
consume structured prompts tailored to its function. By causal inference, continual learning, and trust-centric opera-
automating message formatting, dependency tracking, tions. The first transformative milestone involves transitioning
and semantic alignment, programmatic prompting pre- from reactive to Proactive Intelligence, where agents initiate
vents coordination drift and ensures consistent reasoning tasks based on learned patterns, contextual cues, or latent
across diverse agents in real time [14], [159]. goals rather than awaiting explicit prompts. This advancement
8) Causal Modeling and Simulation-Based Planning: AI depends heavily on robust Tool Integration, enabling agents to
Agents often operate on statistical correlations rather dynamically interact with external systems, such as databases,
than causal models, leading to poor generalization under APIs, or simulation environments, to fulfill complex user
distribution shifts. Embedding causal inference allows tasks. Equally critical is the development of Causal Reasoning,
agents to distinguish between correlation and causation, which will allow agents to move beyond statistical correlation,
simulate interventions, and plan more robustly. For supporting inference of cause-effect relationships essential
instance, in supply chain scenarios, a causally aware for tasks involving diagnosis, planning, or prediction. To
agent can simulate the downstream impact of shipment maintain relevance over time, agents must adopt frameworks
delays. In Agentic AI, causal reasoning is vital for safe for Continuous Learning, incorporating feedback loops and
coordination and error recovery. Agents must anticipate how their actions impact others, requiring causal graphs, simulation environments, or Bayesian inference layers. For example, a planning agent may simulate different strategies and communicate likely outcomes to others, fostering strategic alignment and avoiding unintended emergent behaviors.

9) Monitoring, Auditing, and Explainability Pipelines: AI Agents lack transparency, which complicates debugging and trust. Logging systems that record prompts, tool calls, memory updates, and outputs enable post-hoc analysis and performance tuning. These records help developers trace faults, refine behavior, and ensure compliance with usage guidelines, which is especially critical in enterprise or legal domains. For Agentic AI, logging and explainability are exponentially more important: with multiple agents interacting asynchronously, audit trails are essential for identifying which agent caused an error and under what conditions. Explainability pipelines that integrate across agents (e.g., timeline visualizations or dialogue replays) are key to ensuring safety, especially in regulatory or multi-stakeholder environments.

10) Governance-Aware Architectures (Accountability and Role Isolation): AI Agents currently lack built-in safeguards for ethical compliance or error attribution. Governance-aware designs introduce role-based access control, sandboxing, and identity resolution to ensure agents act within scope and their decisions can be audited or revoked. These structures reduce risks in sensitive applications such as healthcare or finance. In Agentic AI, governance must scale across roles,

episodic memory to adapt their behavior across sessions and environments. Lastly, to build user confidence, agents must prioritize Trust & Safety mechanisms through verifiable output logging, bias detection, and ethical guardrails, especially as their autonomy increases. Together, these pathways will redefine AI Agents from static tools into adaptive cognitive systems capable of autonomous yet controllable operation in dynamic digital environments.

Agentic AI, as a natural extension of these foundations, emphasizes collaborative intelligence through multi-agent coordination, contextual persistence, and domain-specific orchestration. Future systems (Figure 14, right side) will exhibit Multi-Agent Scaling, enabling specialized agents to work in parallel under distributed control for complex problem-solving and mirroring team-based human workflows. This necessitates a layer of Unified Orchestration, where meta-agents or orchestrators dynamically assign roles, monitor task dependencies, and mediate conflicts among subordinate agents. Sustained performance over time depends on Persistent Memory architectures, which preserve semantic, episodic, and shared knowledge so that agents can coordinate longitudinal tasks and retain state awareness. Simulation Planning is expected to become a core feature, allowing agent collectives to test hypothetical strategies, forecast consequences, and optimize outcomes before real-world execution. Moreover, Ethical Governance frameworks will be essential to ensure responsible deployment, defining accountability, oversight, and value alignment across autonomous agent networks. Finally, tailored Domain-Specific Systems will emerge in fields like law, medicine, and supply chains, leveraging contextual specialization to outperform generic agents.
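The audit-trail mechanism of item 9 and the role-isolation safeguards of item 10 can be sketched together in a few lines. The sketch below is a minimal illustration, not an API from any existing agent framework: the class names, the roles, and the `ALLOWED_TOOLS` policy are all hypothetical assumptions chosen for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Any

# Role-based access policy (item 10): each role may call only these tools.
# The roles and tool names here are purely illustrative.
ALLOWED_TOOLS = {
    "researcher": {"web_search", "read_doc"},
    "writer": {"read_doc"},
}

@dataclass
class AuditEvent:
    """One append-only record (item 9): who did what, and whether it was in scope."""
    agent_id: str
    role: str
    tool: str
    args: dict[str, Any]
    allowed: bool
    ts: float = field(default_factory=time.time)

class GovernedToolGateway:
    """Mediates every tool call: enforces role scope and keeps an audit trail."""
    def __init__(self) -> None:
        self.trail: list[AuditEvent] = []

    def call(self, agent_id: str, role: str, tool: str, **args: Any) -> bool:
        allowed = tool in ALLOWED_TOOLS.get(role, set())
        self.trail.append(AuditEvent(agent_id, role, tool, args, allowed))
        # A real gateway would dispatch the tool here; we only report the verdict.
        return allowed

    def violations(self) -> list[AuditEvent]:
        """Post-hoc analysis: which agent stepped outside its role, and when."""
        return [e for e in self.trail if not e.allowed]

# Usage: the writer agent tries a tool outside its role; the trail attributes it.
gw = GovernedToolGateway()
gw.call("agent-1", "researcher", "web_search", query="crop yields")
gw.call("agent-2", "writer", "web_search", query="off-scope lookup")
bad = gw.violations()
print(bad[0].agent_id)  # -> agent-2
```

Because every call passes through one gateway, the same record serves both purposes discussed above: post-hoc fault attribution (which agent, which tool, what arguments, what time) and revocable, role-scoped authorization.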
Fig. 14: Mindmap visualization of the future roadmap for AI Agents and Agentic AI. The AI Agents branch covers Proactive Intelligence, Tool Integration, Causal Reasoning, Continuous Learning, and Trust & Safety; the Agentic AI branch covers Multi-Agent Scaling, Unified Orchestration, Persistent Memory, Simulation Planning, Ethical Governance, and Domain-Specific Systems.

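The Unified Orchestration pathway described above (a meta-agent that assigns roles, monitors task dependencies, and sequences subordinate agents) can be illustrated with a small scheduling sketch. Everything below is an assumption for illustration: the agent names, roles, and the dependency-resolution loop are hypothetical, not a reference implementation of any orchestration framework.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    required_role: str
    deps: list[str] = field(default_factory=list)

class Orchestrator:
    """Toy meta-agent: assigns tasks to role-matched agents in dependency order."""
    def __init__(self, agents: dict[str, str]) -> None:
        # agent name -> the role that agent is specialized for
        self.agents = agents

    def schedule(self, tasks: list[Task]) -> list[tuple[str, str]]:
        """Return (agent, task) assignments that respect task dependencies."""
        plan: list[tuple[str, str]] = []
        done: set[str] = set()
        pending = tasks[:]
        while pending:
            progressed = False
            for task in list(pending):
                if all(d in done for d in task.deps):
                    agent = next((a for a, r in self.agents.items()
                                  if r == task.required_role), None)
                    if agent is None:
                        raise ValueError(f"no agent for role {task.required_role}")
                    plan.append((agent, task.name))
                    done.add(task.name)
                    pending.remove(task)
                    progressed = True
            if not progressed:
                raise ValueError("cyclic or unsatisfiable dependencies")
        return plan

# Usage: retrieval must finish before drafting, and drafting before review.
orch = Orchestrator({"searcher": "retrieval", "writer": "drafting", "critic": "review"})
plan = orch.schedule([
    Task("draft", "drafting", deps=["gather"]),
    Task("gather", "retrieval"),
    Task("check", "review", deps=["draft"]),
])
print(plan)  # gather is scheduled first, then draft, then check
```

A production orchestrator would add the concerns discussed in the surrounding text (conflict mediation, persistent shared memory, runtime monitoring), but the core contract is the same: a dependency-aware assignment of tasks to role-specialized agents.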
This future positions Agentic AI not merely as a coordination layer on top of AI Agents, but as a new paradigm for collective machine intelligence, with adaptive planning, recursive reasoning, and collaborative cognition at its core.

VII. CONCLUSION

In this study, we presented a comprehensive literature-based evaluation of the evolving landscape of AI Agents and Agentic AI, offering a structured taxonomy that highlights foundational concepts, architectural evolution, application domains, and key limitations. Beginning with a foundational understanding, we characterized AI Agents as modular, task-specific entities with constrained autonomy and reactivity. Their operational scope is grounded in the integration of LLMs and LIMs, which serve as core reasoning modules for perception, language understanding, and decision-making. We identified generative AI as a functional precursor, emphasizing its limitations in autonomy and goal persistence, and examined how LLMs drive the progression from passive generation to interactive task completion through tool augmentation.

This study then explored the conceptual emergence of Agentic AI systems as a transformative evolution from isolated agents to orchestrated, multi-agent ecosystems. We analyzed key differentiators such as distributed cognition, persistent memory, and coordinated planning that distinguish Agentic AI from conventional agent models. This was followed by a detailed breakdown of architectural evolution, highlighting the transition from monolithic, rule-based frameworks to modular, role-specialized networks facilitated by orchestration layers and reflective memory architectures. Additionally, this study surveyed the application domains in which these paradigms are deployed. For AI Agents, we illustrated their role in automating customer support, internal enterprise search, email prioritization, and scheduling. For Agentic AI, we demonstrated use cases in collaborative research, robotics, medical decision support, and adaptive workflow automation, supported by practical examples and industry-grade systems. Finally, this study provided a deep analysis of the challenges and limitations affecting both paradigms. For AI Agents, we discussed hallucinations, shallow reasoning, and planning constraints, while for Agentic AI, we addressed amplified causality issues, coordination bottlenecks, emergent behavior, and governance concerns. These insights offer a roadmap for the future development and deployment of trustworthy, scalable agentic systems.

ACKNOWLEDGEMENT

This work was supported by the National Science Foundation and the United States Department of Agriculture, National Institute of Food and Agriculture, through the "Artificial Intelligence (AI) Institute for Agriculture" Program under Awards AWD003473 and AWD004595, Accession Number 1029004, "Robotic Blossom Thinning with Soft Manipulators".

DECLARATIONS

The authors declare no conflicts of interest.
STATEMENT ON AI WRITING ASSISTANCE

ChatGPT and Perplexity were utilized to enhance grammatical accuracy and refine sentence structure; all AI-generated revisions were thoroughly reviewed and edited for relevance. Additionally, ChatGPT-4o was employed to generate realistic visualizations.
Reviews, 2024.
[24] A. Zhang, Y. Chen, L. Sheng, X. Wang, and T.-S. Chua, “On generative
R EFERENCES agents in recommendation,” in Proceedings of the 47th international
ACM SIGIR conference on research and development in Information
[1] E. Oliveira, K. Fischer, and O. Stepankova, “Multi-agent systems: Retrieval, pp. 1807–1817, 2024.
which research for which applications,” Robotics and Autonomous [25] S. Peng, E. Kalliamvakou, P. Cihon, and M. Demirer, “The impact
Systems, vol. 27, no. 1-2, pp. 91–106, 1999. of ai on developer productivity: Evidence from github copilot,” arXiv
[2] Z. Ren and C. J. Anumba, “Multi-agent systems in construction–state preprint arXiv:2302.06590, 2023.
of the art and prospects,” Automation in Construction, vol. 13, no. 3, [26] J. Li, V. Lavrukhin, B. Ginsburg, R. Leary, O. Kuchaiev, J. M. Cohen,
pp. 421–434, 2004. H. Nguyen, and R. T. Gadde, “Jasper: An end-to-end convolutional
[3] C. Castelfranchi, “Modelling social action for ai agents,” Artificial neural acoustic model,” arXiv preprint arXiv:1904.03288, 2019.
intelligence, vol. 103, no. 1-2, pp. 157–182, 1998. [27] A. Jaruga-Rozdolska, “Artificial intelligence as part of future practices
[4] J. Ferber and G. Weiss, Multi-agent systems: an introduction to in the architect’s work: Midjourney generative tool as part of a process
distributed artificial intelligence, vol. 1. Addison-wesley Reading, of creating an architectural form,” Architectus, no. 3 (71, pp. 95–104,
1999. 2022.
[5] R. Calegari, G. Ciatto, V. Mascardi, and A. Omicini, “Logic-based [28] K. Basu, “Bridging knowledge gaps in llms via function calls,” in
technologies for multi-agent systems: a systematic literature review,” Proceedings of the 33rd ACM International Conference on Information
Autonomous Agents and Multi-Agent Systems, vol. 35, no. 1, p. 1, 2021. and Knowledge Management, pp. 5556–5557, 2024.
[6] R. C. Cardoso and A. Ferrando, “A review of agent-based programming [29] Z. Liu, T. Hoang, J. Zhang, M. Zhu, T. Lan, J. Tan, W. Yao, Z. Liu,
for multi-agent systems,” Computers, vol. 10, no. 2, p. 16, 2021. Y. Feng, R. RN, et al., “Apigen: Automated pipeline for generating
[7] E. Shortliffe, Computer-based medical consultations: MYCIN, vol. 2. verifiable and diverse function-calling datasets,” Advances in Neural
Elsevier, 2012. Information Processing Systems, vol. 37, pp. 54463–54482, 2024.
[8] H. P. Moravec, “The stanford cart and the cmu rover,” Proceedings of [30] H. Yang, S. Yue, and Y. He, “Auto-gpt for online decision
the IEEE, vol. 71, no. 7, pp. 872–884, 1983. making: Benchmarks and additional opinions,” arXiv preprint
[9] B. Dai and H. Chen, “A multi-agent and auction-based framework and arXiv:2306.02224, 2023.
approach for carrier collaboration,” Logistics Research, vol. 3, pp. 101– [31] I. Hettiarachchi, “Exploring generative ai agents: Architecture, applica-
120, 2011. tions, and challenges,” Journal of Artificial Intelligence General science
[10] J. Grosset, A.-J. Fougères, M. Djoko-Kouam, and J.-M. Bonnin, (JAIGS) ISSN: 3006-4023, vol. 8, no. 1, pp. 105–127, 2025.
“Multi-agent simulation of autonomous industrial vehicle fleets: To- [32] A. Das, S.-C. Chen, M.-L. Shyu, and S. Sadiq, “Enabling synergistic
wards dynamic task allocation in v2x cooperation mode,” Integrated knowledge sharing and reasoning in large language models with
Computer-Aided Engineering, vol. 31, no. 3, pp. 249–266, 2024. collaborative multi-agents,” in 2023 IEEE 9th International Conference
[11] R. A. Agis, S. Gottifredi, and A. J. Garcı́a, “An event-driven behavior on Collaboration and Internet Computing (CIC), pp. 92–98, IEEE,
trees extension to facilitate non-player multi-agent coordination in 2023.
video games,” Expert Systems with Applications, vol. 155, p. 113457, [33] Z. Duan and J. Wang, “Exploration of llm multi-agent applica-
2020. tion implementation based on langgraph+ crewai,” arXiv preprint
[12] A. Guerra-Hernández, A. El Fallah-Seghrouchni, and H. Soldano, arXiv:2411.18241, 2024.
“Learning in bdi multi-agent systems,” in International Workshop on [34] R. Sapkota, Y. Cao, K. I. Roumeliotis, and M. Karkee, “Vision-
Computational Logic in Multi-Agent Systems, pp. 218–233, Springer, language-action models: Concepts, progress, applications and chal-
2004. lenges,” arXiv preprint arXiv:2505.04769, 2025.
[13] A. Saadi, R. Maamri, and Z. Sahnoun, “Behavioral flexibility in belief- [35] R. Sapkota, K. I. Roumeliotis, R. H. Cheppally, M. F. Calero, and
desire-intention (bdi) architectures,” Multiagent and grid systems, M. Karkee, “A review of 3d object detection with vision-language
vol. 16, no. 4, pp. 343–377, 2020. models,” arXiv preprint arXiv:2504.18738, 2025.
[14] D. B. Acharya, K. Kuppan, and B. Divya, “Agentic ai: Autonomous [36] R. Sapkota and M. Karkee, “Object detection with multimodal large
intelligence for complex goals–a comprehensive survey,” IEEE Access, vision-language models: An in-depth review,” Available at SSRN
2025. 5233953, 2025.
[15] M. Z. Pan, M. Cemri, L. A. Agrawal, S. Yang, B. Chopra, R. Tiwari, [37] B. Memarian and T. Doleck, “Human-in-the-loop in artificial intel-
K. Keutzer, A. Parameswaran, K. Ramchandran, D. Klein, et al., “Why ligence in education: A review and entity-relationship (er) analysis,”
do multiagent systems fail?,” in ICLR 2025 Workshop on Building Trust Computers in Human Behavior: Artificial Humans, vol. 2, no. 1,
in Language Models and Applications, 2025. p. 100053, 2024.
[16] L. Hughes, Y. K. Dwivedi, T. Malik, M. Shawosh, M. A. Albashrawi, [38] P. Bornet, J. Wirtz, T. H. Davenport, D. De Cremer, B. Evergreen,
I. Jeon, V. Dutot, M. Appanderanda, T. Crick, R. De’, et al., “Ai agents P. Fersht, R. Gohel, S. Khiyara, P. Sund, and N. Mullakara, Agentic
and agentic systems: A multi-expert analysis,” Journal of Computer Artificial Intelligence: Harnessing AI Agents to Reinvent Business,
Information Systems, pp. 1–29, 2025. Work and Life. Irreplaceable Publishing, 2025.
[17] Z. Deng, Y. Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y. Xiang, [39] F. Sado, C. K. Loo, W. S. Liew, M. Kerzel, and S. Wermter, “Ex-
“Ai agents under threat: A survey of key security challenges and future plainable goal-driven agents and robots-a comprehensive review,” ACM
pathways,” ACM Computing Surveys, vol. 57, no. 7, pp. 1–36, 2025. Computing Surveys, vol. 55, no. 10, pp. 1–41, 2023.
[18] M. Gridach, J. Nanavati, K. Z. E. Abidine, L. Mendes, and C. Mack, [40] J. Heer, “Agency plus automation: Designing artificial intelligence into
“Agentic ai for scientific discovery: A survey of progress, challenges, interactive systems,” Proceedings of the National Academy of Sciences,
and future directions,” arXiv preprint arXiv:2503.08979, 2025. vol. 116, no. 6, pp. 1844–1850, 2019.
[19] T. Song, M. Luo, X. Zhang, L. Chen, Y. Huang, J. Cao, Q. Zhu, D. Liu, [41] G. Papagni, J. de Pagter, S. Zafari, M. Filzmoser, and S. T. Koeszegi,
B. Zhang, G. Zou, et al., “A multiagent-driven robotic ai chemist “Artificial agents’ explainability to support trust: considerations on
enabling autonomous chemical research on demand,” Journal of the timing and context,” Ai & Society, vol. 38, no. 2, pp. 947–960, 2023.
American Chemical Society, vol. 147, no. 15, pp. 12534–12545, 2025. [42] P. Wang and H. Ding, “The rationality of explanation or human
[20] M. M. Karim, D. H. Van, S. Khan, Q. Qu, and Y. Kholodov, “Ai capacity? understanding the impact of explainable artificial intelligence
agents meet blockchain: A survey on secure and scalable collaboration on human-ai trust and decision performance,” Information Processing
for multi-agents,” Future Internet, vol. 17, no. 2, p. 57, 2025. & Management, vol. 61, no. 4, p. 103732, 2024.
[43] E. Popa, “Human goals are constitutive of agency in artificial intelli- [64] R. Khan, S. Sarkar, S. K. Mahata, and E. Jose, “Security threats in
gence (ai),” Philosophy & Technology, vol. 34, no. 4, pp. 1731–1750, agentic ai system,” arXiv preprint arXiv:2410.14728, 2024.
2021. [65] C. G. Endacott, “Enacting machine agency when ai makes one’s day:
[44] M. Chacon-Chamorro, L. F. Giraldo, N. Quijano, V. Vargas-Panesso, understanding how users relate to ai communication technologies for
C. González, J. S. Pinzón, R. Manrique, M. Rı́os, Y. Fonseca, scheduling,” Journal of Computer-Mediated Communication, vol. 29,
D. Gómez-Barrera, et al., “Cooperative resilience in artificial intel- no. 4, p. zmae011, 2024.
ligence multiagent systems,” IEEE Transactions on Artificial Intelli- [66] Z. Pawlak and A. Skowron, “Rudiments of rough sets,” Information
gence, 2025. sciences, vol. 177, no. 1, pp. 3–27, 2007.
[45] M. Adam, M. Wessel, and A. Benlian, “Ai-based chatbots in customer [67] P. Ponnusamy, A. Ghias, Y. Yi, B. Yao, C. Guo, and R. Sarikaya,
service and their effects on user compliance,” Electronic Markets, “Feedback-based self-learning in large-scale conversational ai agents,”
vol. 31, no. 2, pp. 427–445, 2021. AI magazine, vol. 42, no. 4, pp. 43–56, 2022.
[46] D. Leocádio, L. Guedes, J. Oliveira, J. Reis, and N. Melão, “Customer [68] A. Zagalsky, D. Te’eni, I. Yahav, D. G. Schwartz, G. Silverman,
service with ai-powered human-robot collaboration (hrc): A literature D. Cohen, Y. Mann, and D. Lewinsky, “The design of reciprocal
review,” Procedia Computer Science, vol. 232, pp. 1222–1232, 2024. learning between human and artificial intelligence,” Proceedings of the
[47] T. Cao, Y. Q. Khoo, S. Birajdar, Z. Gong, C.-F. Chung, Y. Moghaddam, ACM on Human-Computer Interaction, vol. 5, no. CSCW2, pp. 1–36,
A. Xu, H. Mehta, A. Shukla, Z. Wang, et al., “Designing towards 2021.
productivity: A centralized ai assistant concept for work,” The Human [69] W. J. Clancey, “Heuristic classification,” Artificial intelligence, vol. 27,
Side of Service Engineering, p. 118, 2024. no. 3, pp. 289–350, 1985.
[48] Y. Huang and J. X. Huang, “Exploring chatgpt for next-generation in- [70] S. Kapoor, B. Stroebl, Z. S. Siegel, N. Nadgir, and A. Narayanan, “Ai
formation retrieval: Opportunities and challenges,” in Web Intelligence, agents that matter,” arXiv preprint arXiv:2407.01502, 2024.
vol. 22, pp. 31–44, SAGE Publications Sage UK: London, England, [71] X. Huang, J. Lian, Y. Lei, J. Yao, D. Lian, and X. Xie, “Recommender
2024. ai agent: Integrating large language models for interactive recommen-
[49] N. Holtz, S. Wittfoth, and J. M. Gómez, “The new era of knowledge dations,” arXiv preprint arXiv:2308.16505, 2023.
retrieval: Multi-agent systems meet generative ai,” in 2024 Portland In- [72] A. M. Baabdullah, A. A. Alalwan, R. S. Algharabat, B. Metri, and N. P.
ternational Conference on Management of Engineering and Technology Rana, “Virtual agents and flow experience: An empirical examination
(PICMET), pp. 1–10, IEEE, 2024. of ai-powered chatbots,” Technological Forecasting and Social Change,
[50] F. Poszler and B. Lange, “The impact of intelligent decision-support vol. 181, p. 121772, 2022.
systems on humans’ ethical decision-making: A systematic literature
[73] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman,
review and an integrated framework,” Technological Forecasting and
D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al., “Gpt-4
Social Change, vol. 204, p. 123403, 2024.
technical report,” arXiv preprint arXiv:2303.08774, 2023.
[51] F. Khemakhem, H. Ellouzi, H. Ltifi, and M. B. Ayed, “Agent-based
[74] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts,
intelligent decision support systems: a systematic review,” IEEE Trans-
P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, et al., “Palm: Scaling
actions on Cognitive and Developmental Systems, vol. 14, no. 1,
language modeling with pathways,” Journal of Machine Learning
pp. 20–34, 2020.
Research, vol. 24, no. 240, pp. 1–113, 2023.
[52] R. V. Florian, “Autonomous artificial intelligent agents,” Center for
Cognitive and Neural Studies (Coneural), Cluj-Napoca, Romania, [75] H. Honda and M. Hagiwara, “Question answering systems with deep
2003. learning-based symbolic processing,” IEEE Access, vol. 7, pp. 152368–
[53] T. Hellström, N. Kaiser, and S. Bensch, “A taxonomy of embodiment 152378, 2019.
in the ai era,” Electronics, vol. 13, no. 22, p. 4441, 2024. [76] N. Karanikolas, E. Manga, N. Samaridi, E. Tousidou, and M. Vassi-
[54] M. Wischnewski, “Attributing mental states to non-embodied au- lakopoulos, “Large language models versus natural language under-
tonomous systems: A systematic review,” in Proceedings of the Ex- standing and generation,” in Proceedings of the 27th Pan-Hellenic
tended Abstracts of the CHI Conference on Human Factors in Com- Conference on Progress in Computing and Informatics, pp. 278–290,
puting Systems, pp. 1–8, 2025. 2023.
[55] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, [77] A. S. George, A. H. George, T. Baskar, and A. G. Martin, “Revolu-
“Not what you’ve signed up for: Compromising real-world llm- tionizing business communication: Exploring the potential of gpt-4 in
integrated applications with indirect prompt injection,” in Proceedings corporate settings,” Partners Universal International Research Journal,
of the 16th ACM Workshop on Artificial Intelligence and Security, vol. 2, no. 1, pp. 149–157, 2023.
pp. 79–90, 2023. [78] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal,
[56] Y. Talebirad and A. Nadiri, “Multi-agent collaboration: Harnessing G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., “Learning transferable
the power of intelligent llm agents,” arXiv preprint arXiv:2306.03314, visual models from natural language supervision,” in International
2023. conference on machine learning, pp. 8748–8763, PmLR, 2021.
[57] A. I. Hauptman, B. G. Schelble, N. J. McNeese, and K. C. Madathil, [79] J. Li, D. Li, S. Savarese, and S. Hoi, “Blip-2: Bootstrapping language-
“Adapt and overcome: Perceptions of adaptive autonomous agents image pre-training with frozen image encoders and large language
for human-ai teaming,” Computers in Human Behavior, vol. 138, models,” in International conference on machine learning, pp. 19730–
p. 107451, 2023. 19742, PMLR, 2023.
[58] N. Krishnan, “Advancing multi-agent systems through model con- [80] S. Sontakke, J. Zhang, S. Arnold, K. Pertsch, E. Bıyık, D. Sadigh,
text protocol: Architecture, implementation, and applications,” arXiv C. Finn, and L. Itti, “Roboclip: One demonstration is enough to learn
preprint arXiv:2504.21030, 2025. robot policies,” Advances in Neural Information Processing Systems,
[59] H. Padigela, C. Shah, and D. Juyal, “Ml-dev-bench: Comparative vol. 36, pp. 55681–55693, 2023.
analysis of ai agents on ml development workflows,” arXiv preprint [81] M. Elhenawy, H. I. Ashqar, A. Rakotonirainy, T. I. Alhadidi, A. Jaber,
arXiv:2502.00964, 2025. and M. A. Tami, “Vision-language models for autonomous driving:
[60] M. Raees, I. Meijerink, I. Lykourentzou, V.-J. Khan, and K. Papangelis, Clip-based dynamic scene understanding,” Electronics, vol. 14, no. 7,
“From explainable to interactive ai: A literature review on current p. 1282, 2025.
trends in human-ai interaction,” International Journal of Human- [82] S. Park, M. Lee, J. Kang, H. Choi, Y. Park, J. Cho, A. Lee, and D. Kim,
Computer Studies, p. 103301, 2024. “Vlaad: Vision and language assistant for autonomous driving,” in
[61] P. Formosa, “Robot autonomy vs. human autonomy: social robots, Proceedings of the IEEE/CVF Winter Conference on Applications of
artificial intelligence (ai), and the nature of autonomy,” Minds and Computer Vision, pp. 980–987, 2024.
Machines, vol. 31, no. 4, pp. 595–616, 2021. [83] S. H. Ahmed, S. Hu, and G. Sukthankar, “The potential of vision-
[62] C. S. Eze and L. Shamir, “Analysis and prevention of ai-based phishing language models for content moderation of children’s videos,” in
email attacks,” Electronics, vol. 13, no. 10, p. 1839, 2024. 2023 International Conference on Machine Learning and Applications
[63] D. Singh, V. Patel, D. Bose, and A. Sharma, “Enhancing email market- (ICMLA), pp. 1237–1241, IEEE, 2023.
ing efficacy through ai-driven personalization: Leveraging natural lan- [84] S. H. Ahmed, M. J. Khan, and G. Sukthankar, “Enhanced multimodal
guage processing and collaborative filtering algorithms,” International content moderation of children’s videos using audiovisual fusion,”
Journal of AI Advancements, vol. 9, no. 4, 2020. arXiv preprint arXiv:2405.06128, 2024.
[85] P. Chitra and A. Saleem Raja, “Artificial intelligence (ai) algorithm and [106] Y. Liu, H. Du, D. Niyato, J. Kang, Z. Xiong, Y. Wen, and D. I. Kim,
models for embodied agents (robots and drones),” in Building Embod- “Generative ai in data center networking: Fundamentals, perspectives,
ied AI Systems: The Agents, the Architecture Principles, Challenges, and case study,” IEEE Network, 2025.
and Application Domains, pp. 417–441, Springer, 2025. [107] C. Guo, F. Cheng, Z. Du, J. Kiessling, J. Ku, S. Li, Z. Li, M. Ma,
[86] S. Kourav, K. Verma, and M. Sundararajan, “Artificial intelligence T. Molom-Ochir, B. Morris, et al., “A survey: Collaborative hardware
algorithm models for agents of embodiment for drone applications,” and software design in the era of large language models,” IEEE Circuits
in Building Embodied AI Systems: The Agents, the Architecture Prin- and Systems Magazine, vol. 25, no. 1, pp. 35–57, 2025.
ciples, Challenges, and Application Domains, pp. 79–101, Springer, [108] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal,
2025. A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language mod-
[87] G. Natarajan, E. Elango, B. Sundaravadivazhagan, and S. Rethinam, els are few-shot learners,” Advances in neural information processing
“Artificial intelligence algorithms and models for embodied agents: systems, vol. 33, pp. 1877–1901, 2020.
Enhancing autonomy in drones and robots,” in Building Embodied [109] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux,
AI Systems: The Agents, the Architecture Principles, Challenges, and T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al., “Llama:
Application Domains, pp. 103–132, Springer, 2025. Open and efficient foundation language models,” arXiv preprint
[88] K. Pandya and M. Holia, “Automating customer service using arXiv:2302.13971, 2023.
langchain: Building custom open-source gpt chatbot for organizations,” [110] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena,
arXiv preprint arXiv:2310.05421, 2023. Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning
[89] Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, with a unified text-to-text transformer,” Journal of machine learning
S. Zhang, J. Liu, et al., “Autogen: Enabling next-gen llm applications research, vol. 21, no. 140, pp. 1–67, 2020.
via multi-agent conversation,” arXiv preprint arXiv:2308.08155, 2023. [111] A. Yang, B. Xiao, B. Wang, B. Zhang, C. Bian, C. Yin, C. Lv, D. Pan,
D. Wang, D. Yan, et al., “Baichuan 2: Open large-scale language
[90] L. Gabora and J. Bach, “A path to generative artificial selves,” in EPIA
models,” arXiv preprint arXiv:2309.10305, 2023.
Conference on Artificial Intelligence, pp. 15–29, Springer, 2023.
[112] K. M. Yoo, D. Park, J. Kang, S.-W. Lee, and W. Park, “Gpt3mix:
[91] G. Pezzulo, T. Parr, P. Cisek, A. Clark, and K. Friston, “Generating Leveraging large-scale language models for text augmentation,” arXiv
meaning: active inference and the scope and limits of passive ai,” preprint arXiv:2104.08826, 2021.
Trends in Cognitive Sciences, vol. 28, no. 2, pp. 97–112, 2024. [113] D. Zhou, X. Xue, X. Lu, Y. Guo, P. Ji, H. Lv, W. He, Y. Xu, Q. Li,
[92] J. Li, M. Zhang, N. Li, D. Weyns, Z. Jin, and K. Tei, “Generative ai and L. Cui, “A hierarchical model for complex adaptive system: From
for self-adaptive systems: State of the art and research roadmap,” ACM adaptive agent to ai society,” ACM Transactions on Autonomous and
Transactions on Autonomous and Adaptive Systems, vol. 19, no. 3, Adaptive Systems, 2024.
pp. 1–60, 2024. [114] H. Hao, Y. Wang, and J. Chen, “Empowering scenario planning with
[93] W. O’Grady and M. Lee, “Natural syntax, artificial intelligence and artificial intelligence: A perspective on building smart and resilient
language acquisition,” Information, vol. 14, no. 7, p. 418, 2023. cities,” Engineering, 2024.
[94] X. Liu, J. Wang, J. Sun, X. Yuan, G. Dong, P. Di, W. Wang, [115] Y. Wang, J. Zhu, Z. Cheng, L. Qiu, Z. Tong, and J. Huang, “Intelligent
and D. Wang, “Prompting frameworks for large language models: A optimization method for real-time decision-making in laminated cool-
survey,” arXiv preprint arXiv:2311.12785, 2023. ing configurations through reinforcement learning,” Energy, vol. 291,
[95] E. T. Rolls, “The memory systems of the human brain and generative p. 130434, 2024.
artificial intelligence,” Heliyon, vol. 10, no. 11, 2024. [116] X. Xiang, J. Xue, L. Zhao, Y. Lei, C. Yue, and K. Lu, “Real-
[96] K. Alizadeh, S. I. Mirzadeh, D. Belenko, S. Khatamifard, M. Cho, C. C. time integration of fine-tuned large language model for improved
Del Mundo, M. Rastegari, and M. Farajtabar, “Llm in a flash: Efficient decision-making in reinforcement learning,” in 2024 International Joint
large language model inference with limited memory,” in Proceedings Conference on Neural Networks (IJCNN), pp. 1–8, IEEE, 2024.
of the 62nd Annual Meeting of the Association for Computational [117] Z. Li, H. Zhang, C. Peng, and R. Peiris, “Exploring large language
Linguistics (Volume 1: Long Papers), pp. 12562–12584, 2024. model-driven agents for environment-aware spatial interactions and
[97] D. Driess, F. Xia, M. S. Sajjadi, C. Lynch, A. Chowdhery, A. Wahid, conversations in virtual reality role-play scenarios,” in 2025 IEEE
J. Tompson, Q. Vuong, T. Yu, W. Huang, et al., “Palm-e: An embodied Conference Virtual Reality and 3D User Interfaces (VR), pp. 1–11,
multimodal language model,” 2023. IEEE, 2025.
[98] P. Denny, J. Leinonen, J. Prather, A. Luxton-Reilly, T. Amarouche, [118] T. R. McIntosh, T. Susnjak, T. Liu, P. Watters, and M. N. Halgamuge,
B. A. Becker, and B. N. Reeves, “Prompt problems: A new pro- “The inadequacy of reinforcement learning from human feedback-
gramming exercise for the generative ai era,” in Proceedings of the radicalizing large language models via semantic vulnerabilities,” IEEE
55th ACM Technical Symposium on Computer Science Education V. 1, Transactions on Cognitive and Developmental Systems, 2024.
pp. 296–302, 2024. [119] S. Lee, G. Lee, W. Kim, J. Kim, J. Park, and K. Cho, “Human strategy
[99] C. Chen, S. Lee, E. Jang, and S. S. Sundar, “Is your prompt detailed learning-based multi-agent deep reinforcement learning for online team
enough? exploring the effects of prompt coaching on users’ percep- sports game,” IEEE Access, 2025.
tions, engagement, and trust in text-to-image generative ai tools,” in [120] Z. Shi, S. Gao, L. Yan, Y. Feng, X. Chen, Z. Chen, D. Yin, S. Ver-
Proceedings of the Second International Symposium on Trustworthy berne, and Z. Ren, “Tool learning in the wild: Empowering language
Autonomous Systems, pp. 1–12, 2024. models as automatic tool agents,” in Proceedings of the ACM on Web
Conference 2025, pp. 2222–2237, 2025.
[100] A. Pan, E. Jones, M. Jagadeesan, and J. Steinhardt, “Feedback loops
[121] S. Yuan, K. Song, J. Chen, X. Tan, Y. Shen, R. Kan, D. Li, and D. Yang,
with language models drive in-context reward hacking,” arXiv preprint
“Easytool: Enhancing llm-based agents with concise tool instruction,”
arXiv:2402.06627, 2024.
arXiv preprint arXiv:2401.06201, 2024.
[101] K. Nabben, “Ai as a constituted system: accountability lessons from [122] B. Xu, X. Liu, H. Shen, Z. Han, Y. Li, M. Yue, Z. Peng, Y. Liu, Z. Yao,
an llm experiment,” Data & policy, vol. 6, p. e57, 2024. and D. Xu, “Gentopia: A collaborative platform for tool-augmented
[102] P. J. Pesch, “Potentials and challenges of large language models (llms) llms,” arXiv preprint arXiv:2308.04030, 2023.
in the context of administrative decision-making,” European Journal [123] H. Lu, X. Li, X. Ji, Z. Kan, and Q. Hu, “Toolfive: Enhancing tool-
of Risk Regulation, pp. 1–20, 2025. augmented llms via tool filtering and verification,” in ICASSP 2025-
[103] C. Wang, Y. Deng, Z. Lyu, L. Zeng, J. He, S. Yan, and B. An, “Q*: 2025 IEEE International Conference on Acoustics, Speech and Signal
Improving multi-step reasoning for llms with deliberative planning,” Processing (ICASSP), pp. 1–5, IEEE, 2025.