RAG-based Explainable Prediction
The prediction of road user behaviors in the context of autonomous driving has attracted considerable attention from the scientific community in recent years. Most works focus on predicting behaviors based on kinematic information alone, a simplification of reality since road users are humans, and as such they are highly influenced by their surrounding context. In addition, a large plethora of research works rely on powerful Deep Learning techniques, which exhibit high-performance metrics in prediction tasks but may lack the ability to fully understand and exploit the contextual semantic information contained in the road scene, not to mention their inability to provide explainable predictions that can be understood by humans. In this work, we propose an explainable road users’ behavior prediction system that integrates the reasoning abilities of Knowledge Graphs (KG) and the expressiveness capabilities of Large Language Models (LLM) by using Retrieval Augmented Generation (RAG) techniques. For that purpose, Knowledge Graph Embeddings (KGE) and Bayesian inference are combined to allow the deployment of a fully inductive reasoning system that enables the issuing of predictions that rely on legacy information contained in the graph, as well as on current evidence gathered in real-time by onboard sensors. Two use cases have been implemented following the proposed approach: 1) Prediction of pedestrians’ crossing actions; and 2) Prediction of lane change maneuvers. In both cases, the performance attained exceeds the current state-of-the-art in terms of anticipation and F1 score, showing a promising avenue for future research in this field.

Keywords: Road users’ behaviors; Explainable predictions; Pedestrian crossing actions; Lane change maneuvers; Autonomous driving

✩ This research has been funded by the HEIDI project of the European Commission under Grant Agreement: 101069538.
∗ Corresponding author.
Authors: M.M. Hussien, A.N. Melo, A.L. Ballardini, C.S. Maldonado, R. Izquierdo, M.Á. Sotelo.
1 These authors contributed equally to this work.
https://doi.org/10.1016/j.eswa.2024.125914
Received 29 July 2024; Received in revised form 19 November 2024; Accepted 22 November 2024
Available online 29 November 2024
0957-4174/© 2024 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
M.M. Hussien et al. Expert Systems With Applications 265 (2025) 125914

1. Introduction

Despite the significant progress that the world has experienced in the last years in terms of road safety, road traffic deaths continue to represent a global health crisis, according to the World Health Organization report on road safety (WHO, 2023), especially for Vulnerable Road Users (VRUs), which are involved in 53% of all road traffic fatalities. The same report highlights the fact that 23% of fatal accidents involve pedestrians. As a matter of fact, pedestrians are the most vulnerable road user group also on European Union roads, being involved in 20% of road traffic fatalities (Slootmans, 2021). Similarly, statistics published in 2023 by the National Highway Traffic Safety Administration (NHTSA) reveal an increase in the number of deaths from motor vehicle traffic accidents in the United States of America in 2021 compared to 2020, and a 17.3% increase compared to 2019 (Stewart, 2023), lane change maneuvers being one of the main causes of vehicle crashes, as the same report indicates that 33% of all road crashes take place during a lane change maneuver. These statistics underscore the need to develop technologies that aim to improve road safety by endowing automated vehicles with the ability to anticipate pedestrian and driver behaviors and motion patterns, such as road crossing actions (for pedestrians) and lane change maneuvers (for drivers).

In the context of autonomous driving, a large number of road user behavior estimation algorithms have been developed to predict the next actions of pedestrians (Kotseruba, Rasouli, & Tsotsos, 2021), cyclists (Pool, Kooij, & Gavrila, 2021), and drivers (Izquierdo et al., 2023), in an attempt to understand and anticipate their behaviors. However, there is a missing component in the literature, namely a holistic view of road users’ behavior and decision-making to identify the extent of factors that affect their behaviors and to explain in what ways they are interrelated. This gap exists because the majority of research works in
the literature disregard the theoretical findings of traffic interaction and treat the problem as dealing with rigid dynamic objects rather than social beings (Schulz & Stiefelhagen, 2015). Consequently, they focus primarily on observable kinematic numerical data, such as positions, velocities, and accelerations, without fully incorporating the social and contextual cues that are crucial for accurate prediction. Moreover, these models often act as “black boxes”, making it difficult to reason about and interpret their outputs. This issue is significant because the lack of interpretability poses challenges in explaining the models and their predictions to others who may not be familiar with the underlying algorithms. Therefore, explainability is crucial in the automotive industry. We need models that can not only predict behaviors but also provide clear explanations for their predictions. This is important for the validation and standardization processes that autonomous vehicles must undergo. By offering transparent decision-making, we can help ensure safety and build trust among users, which is essential for the acceptance and deployment of autonomous vehicles.

Building on this foundation, in this work, we focus on developing a holistic behavior understanding and prediction system that integrates comprehensive contextual information to anticipate the actions of both pedestrians and drivers. Specifically, we provide linguistic contextual information to a knowledge graph and then utilize Bayesian inference as a downstream task on top of the learned embeddings to provide predictions. We then employ retrieval augmented generation to provide explanations of the obtained predictions based on the given contextual inputs. This integrated approach not only enhances prediction accuracy but also provides interpretability and explainability, contributing to the development of trustworthy systems.

For instance, Fig. 1 depicts a situation involving several vehicles driving on a three-lane highway. The green vehicle, which drives along the middle lane, represents the vehicle of interest, also referred to as the target vehicle, while the blue vehicles represent the vehicles around the target vehicle, being denoted as Left Following (LF, i.e., the vehicle driving behind the target vehicle on the left adjoining lane), Right Following (RF, i.e., the vehicle driving behind the target vehicle on the right adjoining lane), and Preceding (P, i.e., the vehicle preceding the target vehicle along its ego-lane, which in this example is the middle lane), respectively. The figure shows the situation at the current time (in solid colors) as well as several future positions of all vehicles (in semi-transparent colors) according to the most likely predictions. In this scenario, the target vehicle is quickly and dangerously approaching the preceding vehicle due to a high difference in velocity between them, thus there is a high risk of collision with P based on the analysis of the estimated time to collision (TTC). Given this context, the target vehicle has two possible courses of action: A) stay on the same lane while decreasing velocity abruptly to avoid a collision with P; B) execute a left lane change maneuver, which seems to be the safer maneuver for all the actors involved in the scene. Action B (left lane change) appears to be a natural behavior that can be executed in a soft, organic manner in coordination with the three surrounding vehicles. Consequently, the behavior understanding and prediction system of LF would regard the left lane change maneuver of the target vehicle as a very likely future behavior and would anticipate and prepare its ego-actions accordingly to accommodate such a potential maneuver of the target vehicle.

Our model therefore uses linguistic contextual information about the surrounding traffic situation, such as high-risk Time-to-Collision (TTC) with the preceding vehicle and the right following vehicle, and low-risk TTC with the left following vehicle. This information is then provided to the knowledge graph. After that, Bayesian inference is carried out on top of the KGE to give the final prediction of the maneuver that the target vehicle will execute (a left lane change in this case). Finally, retrieval augmented generation explains this prediction: “The target vehicle is changing lanes to the left to avoid potential collisions due to high-risk TTC with vehicles ahead and to the right, while the left lane offers a safer option with low-risk TTC”.

Overall, accounting for contextual information is essential for understanding behavior in this situation. This context-based reasoning approach can be extended and applied to many other similar situations, involving pedestrians and drivers, where contextual information is the key for understanding and anticipating behavior.

2. Literature and research foundation

2.1. Literature review

A large number of research works have dealt with road users’ behavior prediction in the context of autonomous driving. Regarding the prediction of lane change maneuvers, Su, Muelling, Dolan, Palanisamy, and Mudalige (2018) used a Long Short-Term Memory (LSTM) model to predict vehicle lane changes by considering the vehicle’s past trajectory and neighbors’ states. In the work presented by Benterki, Boukhnifer, Judalet, and Choubeila (2019), two machine learning models were utilized to predict lane changes of surrounding vehicles on highways. The inputs were longitudinal/lateral velocities, longitudinal/lateral accelerations, distance to left/right lane markings, yaw angle, and yaw rate related to the road. Support Vector Machines (SVM) and Artificial Neural Networks (ANN) models were trained and tested on these inputs. Izquierdo, Quintanar, Parra, Fernández-Llorca, and Sotelo (2019a) predicted lane change intentions of surrounding vehicles using two different methodologies and by only considering the visual information provided by the PREVENTION dataset (Izquierdo, Quintanar, Parra, Fernández-Llorca, & Sotelo, 2019b). The first method was Motion History Image - Convolutional Neural Network (MHI-CNN), where temporal and visual information was obtained from the MHI, and then fed to the CNN model. The second model was the GoogleNet-LSTM model, in which a feature vector was obtained from a GoogleNet CNN model and then fed to the LSTM model to learn temporal patterns. Laimona, Manzour, Shehata, and Morgan (2020) trained LSTM and Recurrent Neural Networks (RNN) models on the PREVENTION dataset to predict surrounding vehicles’ lane changing intentions by tracking the vehicles’ positions (centroid of the bounding box). Sequences of 10, 20, 30, 40, and 50 frames of (X, Y) coordinates of the target vehicle were considered for comparison. It was concluded that RNN models performed better on short sequence lengths and the LSTM model outperformed RNN at long sequences. The work implemented by Xue, Xing, and Lu (2022) utilized eXtreme Gradient Boosting (XGBoost) and LSTM for vehicle lane change decision prediction and trajectory prediction, respectively, in scenarios in the HighD dataset (Krajewski, Bock, Kloeker, & Eckstein, 2018). The models were based on the traffic flow (traffic density) level, the type of vehicle, and the relative trajectory between the target vehicle and surrounding vehicles. Vehicle trajectory predictions were issued based on historical trajectories and the predicted lane change decisions. In the work by Gao et al. (2023), a dual Transformer model was proposed. The first Transformer was intended for lane change prediction, while the second one was used for trajectory prediction.

Similarly, the prediction of pedestrians’ crossing actions is another task that has been intensively targeted by the research community, focusing on forecasting whether or not a target pedestrian will cross the road at some point in the near future (typically in the next 1–5 s). This task has been addressed through a diverse range of algorithms and architectures. Among these approaches, it is particularly noteworthy to highlight several methods such as SingleRNN based on Recurrent Neural Networks (RNNs) (Kotseruba, Rasouli, & Tsotsos, 2020), CapFormer which uses a self-attention alternative based on transformer architecture (Lorenzo et al., 2021), a 3D convolutional model (C3D) based on spatiotemporal feature learning (Tran, Bourdev, Fergus, Torresani, & Paluri, 2015), a stacked multilevel fusion RNN (SFRNN) (Rasouli, Kotseruba, & Tsotsos, 2020), and convolutional LSTM (ConvLSTM) (Shi et al., 2015). Despite the abundance of models and research focused on pedestrian crossing predictions, only a limited
Fig. 1. The target vehicle (green) will most likely make a left lane change maneuver based on the risk assessment of the surrounding (blue) vehicles.
number of them provide insights into explainability or are specifically developed within the context of explainability. For instance, the research by Achaji, Moreau, Aioun, and Charpillet (2023) highlights that Transformers offer an advantage in terms of interpretability, due to their attention mechanism. Moreover, the utilization of pedestrian location and body keypoints as features in predicting pedestrian actions results in more human-like behavior. Muscholl, Klusch, Gebhard, and Schneeberger (2021) propose a dynamic Bayesian network model that takes into account the influence of interaction and social signals. This system leverages visual means and employs various inference methods to provide explanations for its predictions, with a specific focus on determining the relative importance of each feature in influencing the probability of pedestrian actions.

While Deep Learning (DL) techniques have been reasonably successful in solving road users’ behavior prediction tasks, they may lack the ability to fully understand and exploit the interdependences between road users and the semantic relations implicit in a road scene. Indeed, in real-world applications it is impractical and inefficient to learn all facts and data patterns from scratch, especially when prior and linguistic knowledge is available. As an alternative, neuro-symbolic learning can exploit such information to further improve the ability to really understand road scenes by utilizing well-formed axioms and rules that can guarantee explainability, both in terms of asserted and inferred knowledge (Yi et al., 2018). In neuro-symbolic systems, abstract knowledge extraction is first carried out by means of neural DL techniques that transform reality into symbols, while logic (or symbolic) reasoning is then performed on the grounds of such symbols. This human-like reasoning approach is interpretable and disentangled, while allowing for compositional, accurate, and generalizable reasoning in rich, complex contexts, such as road scene understanding and autonomous driving, that require identifying and reasoning about entities (road users, road context, and events) that are bundled together by means of spatial, temporal, social, and semantic relations. Knowledge-infused techniques, such as Knowledge Graphs (KG) (Hogan et al., 2021), enable the deployment of neuro-symbolic reasoning given their capacity for representing knowledge and interactions by means of directed graphs that can represent multiple and heterogeneous relations among entities. In addition, Knowledge Graph Embeddings (KGE) (Wickramarachchi, Henson, & Sheth, 2020) is a machine learning task that aims at learning a latent continuous vector space representation (namely, embeddings) of the nodes and edges of a KG, where the nodes represent the road users, the road context, and events, and the edges represent the semantic relations among them. Knowledge completion with KGEs can be used for predicting missing entities (e.g. occluded pedestrians) or relations (e.g. lane change intention) in road scenes that may have been missed by purely data-driven techniques.

2.2. Research gap

Despite advances in behavior prediction models for road users, existing approaches predominantly rely on deep learning techniques that focus on numerical kinematic data such as positions, velocities, and accelerations. As a result, these models often function as “black boxes”, lacking interpretability and the ability to incorporate the rich contextual and semantic information present in road scenes. Furthermore, they typically do not fully exploit the interdependencies between road users, the environment, and the implicit semantic relations that influence behavior. Moreover, traditional models struggle to utilize prior knowledge or linguistic information that could enhance prediction accuracy and explainability. This limitation arises because it is generally impractical to learn all facts and data patterns from scratch, especially in complex, real-world scenarios where understanding subtle cues and relationships is crucial. It hinders their ability to generalize across diverse environments and to provide transparent reasoning behind their predictions. Therefore, there is a need for models that can integrate comprehensive linguistic contextual information (including human factors, social interactions, and environmental conditions) and that can reason about this information in an interpretable manner. Such models should not only predict behaviors but also provide explanations for these predictions, enhancing trust and facilitating the adoption of autonomous systems.

2.3. Research objective

In this work, we aim to address the need to understand and predict road users’ behaviors by incorporating contextual features into a knowledge-based representation that can also encode other sources of information, such as human knowledge representing driving experience. For that purpose, we propose a neuro-symbolic approach that combines expressive features (representing road users’ context) and human experience (in the form of linguistic descriptions and/or rules) in a Knowledge Graph (KG) representation.

1. Integration of Linguistic Contextual Information through Knowledge Graphs: We construct a knowledge graph that encodes linguistic contextual information beyond mere kinematic data. This allows the model to capture complex relationships and dependencies among various factors influencing road users’ behaviors, enabling generalization across different road environments involving both vehicles and pedestrians.
2. We perform behavior prediction by applying Bayesian inference as a downstream task on the grounds of the learned embeddings derived from the knowledge graph, which enables us to deploy a fully inductive reasoning system based on KGE. This principled approach leverages the rich contextual information captured in the embeddings to provide probabilistic predictions about future behaviors.
3. To address the interpretability challenge, we use retrieval augmented generation to provide explanations for the obtained predictions based on the given contextual inputs. This step makes the model’s reasoning transparent and understandable, enhancing trustworthiness by allowing users to comprehend why certain predictions were made. The explanations developed in this work are intended to justify the decisions of AVs and support the traceability required for AV standardization processes. Additionally, these explanations can enhance the decision-making module itself, enabling it to produce more grounded and reasoned decisions.

The rest of this article is organized as follows: Section 3.1 describes the procedure followed to build a Knowledge Graph that encodes road users’ behavioral models; Section 3.2 presents the road users’ behavior prediction system using KGE and Bayesian inference; Section 3.3 dives into the details of how to achieve explainability with the proposed approach; Section 4 introduces the implementation and experimental results and Section 5 describes the conclusions and future work.

3. Proposed method

This section introduces the proposed method for providing an explainable prediction of road user behaviors. Our method incorporates three main components: (1) a knowledge representation of the real world, (2) a road user behavior prediction approach, and (3) an explainability component. The section first provides context, detailing feature extraction and transformation, and explains how these are used to design an ontology for two use cases focusing on pedestrians and drivers. Next, the proposed prediction approach is detailed in all its phases, and finally, the explainability component is presented from multiple perspectives.

3.1. Modeling road users’ behaviors using knowledge graphs

In the realm of knowledge representation, the Knowledge Graph (KG) stands as a key tool encoding triples that reveal real-world facts and semantic connections (Peng, Xia, Naseriparsa, & Osborne, 2023). A triple, the fundamental building block within the KG, comprises three elements: subject, predicate, and object (alternatively termed as head, relation, and tail). Conceptually, a KG manifests itself as a graph where the edges represent relations and the nodes denote entities. As previously mentioned, one of the applications of KGs focuses on transforming them into low-dimensional vectors that encode entities and relationships, a technique known as Knowledge Graph Embedding (KGE). The resultant vector is employed for learning and reasoning within embedding-based machine learning models, which can rely on distance-based measures or similarity-based scoring (Choudhary, Luthra, Mittal, & Singh, 2021). In this work, we employ two different models for KGEs: a distance-based model known as TransE (Bordes, Usunier, Garcia-Duran, Weston, & Yakhnenko, 2013) and a similarity-based model named ComplEx (Trouillon, Welbl, Riedel, Gaussier, & Bouchard, 2016).

It is important to highlight that the KG provides a generalizable approach applicable across various road scenarios and road user behaviors. By enabling knowledge representation based on either data or expert insights, this approach captures real-world scenarios through entities and relationships. For our study, we evaluate the effectiveness of our approach through two real-world use cases focusing on different road user behaviors, including both pedestrians and drivers.

The KG design process centers on modeling road user behavior by leveraging available features and contextual understanding of each scenario. Each use case was carefully evaluated, drawing from a collection of real-world scenarios that pedestrians and drivers encounter daily. From these scenarios, we identified a list of features that offer meaningful information about the context. These features were then systematically incorporated into the KG design to ensure a comprehensive representation of road user behavior.

3.1.1. Pedestrian use case

The pedestrian use case focuses on predicting whether a pedestrian will cross the road in the next 30 frames. The entire pipeline has been trained and tested using two datasets:

• Joint Attention for Autonomous Driving (JAAD). This dataset comprises 348 short video clips, each extensively annotated to depict various road actors and scenarios across diverse driving locations, traffic, and weather conditions. The dataset annotations encompass spatial, behavioral, contextual, and pedestrian information. Details of the dataset are described in Rasouli, Kotseruba, and Tsotsos (2017). The JAAD dataset is publicly available at: https://data.nvision2.eecs.yorku.ca/JAAD_dataset/.
• Pedestrian Situated Intent (PSI). The dataset comprises 104 training videos, 34 validation videos, and 48 testing videos, collectively covering 196 scenes. It includes bounding box annotations for traffic objects and agents, which are accompanied by text descriptions and reasoning explanations. Details of the dataset are described in Chen et al. (2022). The PSI dataset is publicly available at: http://pedestriandataset.situated-intent.net/.

Both datasets provide valuable scenarios captured in varied locations with different pedestrians, offering significant insight into pedestrian street behaviors. However, they present some limitations due to the imbalance between pedestrians who cross the street and those who do not. In both datasets, approximately 72% of pedestrians actively cross the street, while the remaining 28% do not. Using the data as originally provided leads to overfitting towards pedestrian crossing behavior, with most predictions indicating crossing. To improve the reliability and quality of the experiments, the data were balanced before implementation, as detailed in Section 4.1.

The process of modeling pedestrian behavior first considers feature extraction and linguistic transformation. From the mentioned datasets, a set of pedestrian features is extracted using a deep learning approach, and then they are transformed from numerical to linguistic values, as detailed in Melo, Herrera-Quintero, Salinas, and Sotelo (2024). The features extracted for each annotated pedestrian include:

• Motion Activity: States the motion activity of the pedestrian.
• Proximity to the road: Transformed from an assessment of road segmentation and pedestrian location to a linguistic representation indicating the pedestrian’s proximity to the road in three levels, based on their closeness to it.
• Distance: Transformed from an estimated distance in meters to a linguistic representation that indicates the pedestrian’s proximity to the ego-vehicle.
• Body Orientation: Transformed from an angle ranging from 0° to 360° to a linguistic representation that encodes the pedestrian’s body posture from the perspective of the ego-vehicle.
• Gaze: Transformed from a binary value to a linguistic indicator that denotes whether the pedestrian is observing the ego-vehicle.

Once features are extracted and transformed, a pedestrian behavior ontology, referred to as PedFeatKG in this study, is designed from pedestrian features, as outlined in Table 1. It encompasses the classes (entities in the KG), their descriptions, the instances of each class representing linguistic values, and the potential relations associated with each class. In the PedFeatKG ontology, each pedestrian from the dataset’s training set was represented by a class with a unique ID (noted as Pedestrian ID). At the same time, the Pedestrian ID was associated with a specific pedestrian instance at a particular frame (noted as Pedestrian instance ID). This latter class comprised the pedestrian dataset ID and the frame number. For instance, if a
Table 1
Pedestrian behavior ontology.

| Class | Class description | Instance | Possible relation |
|---|---|---|---|
| Pedestrian | Generic entity pointing to every child pedestrian | Pedestrian | Any |
| Pedestrian ID | Individual Pedestrian ID | Ped1 | HAS_CHILD |
| Pedestrian instance ID | ID for a pedestrian at a particular frame | Ped1–30 | INSTANCE_OF, PREVIOUS, NEXT |
| Motion Activity | Pedestrian motion activity | Stand, Walk, Wave, Run, Na | MOTION |
| Proximity | Pedestrian closeness to the road | NearFromCurb, MiddleDisFromCurb, FarFromCurb | LOCATION |
| Distance | Pedestrian closeness to the ego-vehicle | TooNearToEgoVeh, NearToEgoVeh, MiddleDisToEgoVeh, FarToEgoVeh, TooFarToEgoVeh | EGO_DISTANCE |
| Orientation | Pedestrian body orientation | VehDirection, LeftDirection, OppositeVehDirection, RightDirection | ORIENTATION |
| Gaze | Pedestrian attention | Looking, NotLooking | ATTENTION |
| Cross Action | Crossing behavior of the pedestrian | crossRoad, noCrossRoad | ACTION |
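
As an illustration of the ontology in Table 1, the following minimal Python sketch assembles the triples for a single pedestrian instance. The relation and instance names come from Table 1, and the feature links follow the ⟨pedestrian-instance-ID, FEATURE_RELATION, value⟩ format used in this section. The function names, the dictionary keys, the ASCII hyphen in the instance ID, and the direction of the HAS_CHILD, INSTANCE_OF, PREVIOUS, and NEXT links are assumptions made for the example; this is not the implementation used in this work.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Relation names are taken from Table 1; the dictionary keys are hypothetical.
FEATURE_RELATIONS: Dict[str, str] = {
    "motion": "MOTION",
    "proximity": "LOCATION",
    "distance": "EGO_DISTANCE",
    "orientation": "ORIENTATION",
    "gaze": "ATTENTION",
    "action": "ACTION",
}


def instance_id(ped_id: str, frame: int) -> str:
    """Build a pedestrian instance ID such as 'ped1-30' (ASCII hyphen assumed)."""
    return f"{ped_id}-{frame}"


def pedestrian_triples(
    ped_id: str,
    frame: int,
    features: Dict[str, str],
    prev_frame: Optional[int] = None,
    next_frame: Optional[int] = None,
) -> List[Triple]:
    """Return the triples describing one pedestrian at one frame."""
    inst = instance_id(ped_id, frame)
    # Structural links; the edge directions below are an assumption.
    triples: List[Triple] = [
        ("Pedestrian", "HAS_CHILD", ped_id),
        (inst, "INSTANCE_OF", ped_id),
    ]
    # Temporal links to the neighbouring instances of the same pedestrian.
    if prev_frame is not None:
        triples.append((inst, "PREVIOUS", instance_id(ped_id, prev_frame)))
    if next_frame is not None:
        triples.append((inst, "NEXT", instance_id(ped_id, next_frame)))
    # Feature and action links: <instance-ID, FEATURE_RELATION, value>.
    for name, value in features.items():
        triples.append((inst, FEATURE_RELATIONS[name], value))
    return triples


# Hypothetical pedestrian 'ped1' at frame 30, followed by frame 32.
example_triples = pedestrian_triples(
    "ped1",
    30,
    {
        "motion": "Walk",
        "proximity": "NearFromCurb",
        "distance": "NearToEgoVeh",
        "orientation": "LeftDirection",
        "gaze": "Looking",
        "action": "crossRoad",
    },
    next_frame=32,
)
for triple in example_triples:
    print(triple)
```

Keeping the triples as plain tuples makes the sketch independent of any particular graph or embedding library; a list like this could then be passed to whichever KGE toolkit is used for training.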
Fig. 2. (a) One KG instance where the vehicle has zero lateral acceleration, medium TTC risk with the preceding vehicle, and high TTC risk with the left following vehicle. (b) PedFeatKG from explainable features with 1 instance.
pedestrian has the ID “ped1”, there will be as many classes as frames considered in the following structure: “ped1–30”, “ped1–32”, “ped1–34”, and so on. Linking the pedestrian ID with the pedestrian instance ID enabled the association of all pedestrian instances, indicating to the KG that they represent the same pedestrian across different frames. Additionally, each pedestrian instance ID was linked with its previous and next pedestrian instance ID, thus providing temporal association information regarding pedestrian behavior in a road scene within the KG. Likewise, all pedestrian ID classes extracted were associated with a generalization class called Pedestrian, enabling any specific pedestrian to be linked to a general one. This linkage is considered a path reification link. On the other hand, each pedestrian instance ID was subsequently linked with the five pedestrian features that represent the pedestrian’s state in the following triple format: ⟨pedestrian-instance-ID, FEATURE_RELATION, value⟩. In addition, the pedestrian instance ID was also linked with a crossing behavior, delineated by two possible class values: crossRoad or noCrossRoad. Fig. 2(b) shows a generated KG instance from the PedFeatKG ontology. This example represents the state of the pedestrian with ID 0_12_57b in frame 40, its features, and its future crossing action.

3.1.2. Drivers use case

In the lane change prediction use case scenario, the HighD dataset is used (the dataset is publicly available through the following link: https://levelxdata.com/highd-dataset). It is a German dataset that was recorded using a camera mounted on a drone, providing a collection of naturalistic top-view scenes of vehicle movements and interactions on German highways. Details of the dataset are described in Krajewski et al. (2018).

The process of modeling lane change behavior first considers feature extraction and linguistic transformation. The inputs that will be used to construct the KG are vehicle lateral velocity and acceleration, the target vehicle intention, TTC with the preceding vehicle, and TTC with the left/right preceding and following vehicles. These inputs are extracted from the HighD dataset in numerical format. Then, they are converted to linguistic categories. For example, the lateral acceleration numerical value is converted to a category from a set of linguistic categories like accelerating left, zero lateral acceleration, and accelerating right. To divide each numerical feature into linguistic categories, a set of thresholds is determined. Following the structure proposed in Manzour, Ballardini, Izquierdo, and Sotelo (2024), we used the normal distribution of the lateral velocity and acceleration to determine the separation thresholds. Regarding the other TTC numerical variables, the thresholds are based on the studies in Manzour et al. (2024), Saffarzadeh, Nadimi, Naseralavi, and Mamdoohi (2013) and Ramezanı khansarı, Moghadas Nejad, and Moogeh (2021).

Once features are extracted and transformed, a driver behavior ontology, referred to as DriverKG in this study, is based on the reification of nodes and relationships obtained from the HighD dataset, to get reified triples. For example, if the vehicle is not accelerating laterally in either direction and the TTC risk with the left following vehicle is high, then the reified triples will be ⟨vehicle, LATERAL_ACCELERATION_IS, zeroAcceleration⟩ and ⟨vehicle, TTC_WITH_LEFT_FOLLOWING_VEHICLE_IS, highRiskLeftFollowing⟩. Table 2 shows the KG ontology for the lane change prediction case. The table is divided into four columns. The first column shows the possible classes in the KG. The description of each class is indicated in the second column. The third column shows the possible reified instances that can be assigned to that class, given that the class can take only one instance at a frame. The last column shows the relation that points to that class. In this ontology, a generic entity named vehicle is linked to various child vehicles via the HAS_CHILD
M.M. Hussien et al. Expert Systems With Applications 265 (2025) 125914
Table 2
Vehicle behavior ontology.
Class | Class description | Possible instances | Relation
intention | Lane changing intention of the vehicle | LLC (Left Lane Change), LK (Lane Keep), RLC (Right Lane Change) | INTENTION_IS
latVelocity | Vehicle lateral velocity | movingLeft, movingStraight, movingRight | LATERAL_VELOCITY_IS
latAcceleration | Vehicle lateral acceleration | leftAcceleration, zeroAcceleration (no lateral acceleration), rightAcceleration | LATERAL_ACCELERATION_IS
ttcPreceding | TTC with the preceding (front) vehicle | highRiskPreceding, mediumRiskPreceding, lowRiskPreceding | TTC_WITH_PRECEDING_VEHICLE_IS
ttcLeftPreceding | TTC with the left preceding (front) vehicle | highRiskLeftPreceding, mediumRiskLeftPreceding, lowRiskLeftPreceding | TTC_WITH_LEFT_PRECEDING_VEHICLE_IS
ttcRightPreceding | TTC with the right preceding (front) vehicle | highRiskRightPreceding, mediumRiskRightPreceding, lowRiskRightPreceding | TTC_WITH_RIGHT_PRECEDING_VEHICLE_IS
ttcLeftFollowing | TTC with the left following (rear) vehicle | highRiskLeftFollowing, mediumRiskLeftFollowing, lowRiskLeftFollowing | TTC_WITH_LEFT_FOLLOWING_VEHICLE_IS
ttcRightFollowing | TTC with the right following (rear) vehicle | highRiskRightFollowing, mediumRiskRightFollowing, lowRiskRightFollowing | TTC_WITH_RIGHT_FOLLOWING_VEHICLE_IS
vehicleID | Child vehicle ID, which changes every frame | vehicle ID number (e.g. ‘741’) | HAS_CHILD
vehicle | Generic entity pointing to every child vehicle | – | Any
relation. Each child vehicle is assigned a unique ID (known as vehicleID) for each frame. It is important to note that even if it is the same physical vehicle across different frames, it will receive a new vehicleID in each frame. Consequently, while both IDs in reality refer to the same vehicle, they are treated as distinct vehicles within the ontology when generating triples and constructing the KG.
Fig. 2(a) shows a generated KG instance based on the previously mentioned ontology. In this instance, the vehicle with ID 741 is a child of the generic entity vehicle. This can be described in a triple with format ⟨targetVehicle, HAS_CHILD, 741⟩. This child has the latAcceleration class assigned to the zeroAcceleration instance. Also, 741 has mediumRiskPreceding and highRiskLeftFollowing TTC. Finally, vehicle 741 INTENTION_IS LK.

3.2. Road users’ behavior prediction approach
Both road users’ behavior prediction use cases leverage the proposed architecture based on feature extraction, KGs and their associated ontologies. The overall workflow, depicted in Fig. 3, comprises three main phases: (1) KG Generation, (2) KGE Learning, and (3) Bayesian Inference and Prediction. This section provides the details of the three phases.

3.2.1. Phase 1: KG generation using all types of knowledge
To capture and encapsulate the data concerning road users’ behaviors, the initial step involved extracting the data and features that describe each scene from both the driver’s and pedestrian’s perspectives. The subsequent step was to convert the extracted features into linguistic values. Following this, utilizing the Ampligraph 2.0.0 library (Costabello et al., 2019), the KG is constructed in the form of
Fig. 4. Calculating the lane change probabilities in the Bayesian Inference and Prediction phase.
triples, where a set of triples represents the scene in a frame. Building the KG is a process executed based on a KG ontology that generalizes the data applicable to each road scene. This knowledge can originate from various sources and formats, including annotations in datasets, fuzzy rules, and textual explanations concerning road user behavior.

3.2.2. Phase 2: KGE learning
In the second phase, Ampligraph 2.0.0 was employed to construct a KGE model using the KG generated in the previous phase. We used the ScoringBasedEmbeddingModel to implement a neural architecture that encodes concepts from a KG into low-dimensional vectors, using a scoring layer such as ComplEx or TransE. The training and validation of the KGE model are conducted using the Ampligraph library. As a result of this phase, optimal embeddings representing the KG are obtained. These embeddings are then utilized in the subsequent phase for inference and prediction over the KG.

3.2.3. Phase 3: Bayesian inference & prediction
Our approach is designed to leverage inductive reasoning by incorporating specific structures, known as reifications, into the knowledge graph by leveraging the properties of the ontology. This allows us to perform Bayesian inference (phase 3 in Fig. 3) on the embeddings derived from the KG learning phase (phase 2). These embeddings are trained on the contextual and linguistic data extracted from the datasets, as detailed in Section 3.1.1 and Section 3.1.2. Once these embeddings are obtained, it is then possible to calculate the probabilities of reified triples $P(h, r, t)$ using the evaluation method from the AmpliGraph library. Then, the Bayes rule in Eq. (1) is used to compute the probability of a hypothesis given some evidence, denoted as $P(h \mid e)$, where $h$ represents the hypothesis (such as a pedestrian’s intention to cross the road) and $e$ stands for the evidence, which in this context is data measured by onboard sensors at a specific moment. The datasets (like JAAD for pedestrians and HighD for vehicles) provide this sensory data.

\[ P(h \mid e) = \frac{P(h)\, P(e \mid h)}{P(e)} \qquad (1) \]

For instance, if the hypothesis is that a pedestrian intends to cross the road and the evidence includes observations such as (i) the pedestrian’s attention state is looking and (ii) the pedestrian’s location is near to the vehicle, then the probability of the hypothesis $P(h)$ is determined by evaluating a reified triple, e.g., reifying the intention of a pedestrian to cross the road into the triple ⟨pedestrian, INTENTION_IS, crossRoad⟩. Concerning the calculation of $P(e)$, which involves multiple pieces of evidence, we employed Eq. (2) since each piece of evidence is considered independent. Additionally, each element $e_i$ is reified from the graph. For example, assuming that the evidence is composed of two elements $e_1$ and $e_2$ indicating respectively that the pedestrian is looking at the ego-vehicle and that the pedestrian is near the ego-vehicle, the associated probabilities are given by the reification of the following two triplets: ⟨pedestrian, action, looking⟩ and ⟨pedestrian, egoDistance, nearToEgoVeh⟩.

\[ P(e) = P(e_1) \times \cdots \times P(e_n) \qquad (2) \]

The probability of the evidence given the hypothesis $P(e \mid h)$ is calculated based on Eq. (3). It is computed as the product of the probabilities of all pieces of evidence assuming the condition that the hypothesis is true. These conditioned pieces of evidence are also reified. For example, calculating the probability of a pedestrian near the vehicle given the hypothesis that this pedestrian will cross the road can be reified as ⟨nearToEgoVeh, INTENTION_IS, crossRoad⟩. The computation of this conditional probability implies that we take for granted that the object entity is a pedestrian who will cross the road. Under these conditions, the likelihood of the pedestrian being close to the vehicle is calculated. After that, all computed conditioned probabilities are multiplied together to determine $P(e \mid h)$. Finally, with all these probabilities available from the graph through the embeddings, the probability of a hypothesis given the evidence $P(h \mid e)$ can be calculated using the Bayes rule in Eq. (1).

\[ P(e \mid h) = P(e_1, \ldots, e_n \mid h) = P(e_1 \mid h) \times \cdots \times P(e_n \mid h) \qquad (3) \]

By analogy, the same concept applies to vehicle lane change prediction. For example, let us suppose that we are interested in calculating the probability that a vehicle will make a left lane change given that the vehicle is moving straight while having a high TTC risk with the preceding vehicle. For better illustration, Fig. 4 shows how Bayesian inference operates over the KG to calculate the lane change probabilities. The probability $P(h)$ is computed by evaluating the triple ⟨targetVehicle, INTENTION_IS, LLC⟩ from the KGE. Then, the probability $P(e)$ is calculated by multiplying the two evaluated triplets (1) ⟨targetVehicle, LATERAL_VELOCITY_IS, movingStraight⟩ and (2) ⟨targetVehicle, TTC_WITH_PRECEDING_VEHICLE_IS, highRiskPreceding⟩. Similarly, the probability $P(e \mid h)$ is computed by the multiplication of the two evaluated triplets (1) ⟨movingStraight, INTENTION_IS, LLC⟩ and (2) ⟨highRiskPreceding, INTENTION_IS, LLC⟩. Finally, $P(h \mid e)$ is calculated using Eq. (1). This process combines structured knowledge representation with probabilistic Bayesian inference to make predictions based on observed data. This process is repeated for each label, so the probability of LLC is computed given the generated linguistic
inputs, the same computation is done for LK and RLC, and the label with the highest probability is the model’s prediction. In the pedestrian use case, the probabilities of crossRoad and noCrossRoad are computed given the generated linguistic inputs, and the label with the highest probability is taken as the model’s prediction.
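The inference loop of Eqs. (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary TRIPLE_PROB is a hypothetical stand-in for the triple probabilities that the trained KGE model would return, all numeric values are invented, and the function names posterior and predict are assumptions introduced here.

```python
from math import prod

# Hypothetical triple probabilities standing in for the scores a trained KGE
# model would return; every number here is invented for illustration.
TRIPLE_PROB = {
    # Priors P(h): <targetVehicle, INTENTION_IS, label>
    ("targetVehicle", "INTENTION_IS", "LLC"): 0.20,
    ("targetVehicle", "INTENTION_IS", "LK"): 0.70,
    ("targetVehicle", "INTENTION_IS", "RLC"): 0.10,
    # Evidence P(e_i): <targetVehicle, RELATION, value>
    ("targetVehicle", "LATERAL_VELOCITY_IS", "movingStraight"): 0.80,
    ("targetVehicle", "TTC_WITH_PRECEDING_VEHICLE_IS", "highRiskPreceding"): 0.10,
    # Likelihoods P(e_i | h): <value, INTENTION_IS, label>
    ("movingStraight", "INTENTION_IS", "LLC"): 0.50,
    ("highRiskPreceding", "INTENTION_IS", "LLC"): 0.40,
    ("movingStraight", "INTENTION_IS", "LK"): 0.90,
    ("highRiskPreceding", "INTENTION_IS", "LK"): 0.05,
    ("movingStraight", "INTENTION_IS", "RLC"): 0.50,
    ("highRiskPreceding", "INTENTION_IS", "RLC"): 0.10,
}

# Linguistic inputs observed in the current frame, as (relation, value) pairs.
EVIDENCE = [
    ("LATERAL_VELOCITY_IS", "movingStraight"),
    ("TTC_WITH_PRECEDING_VEHICLE_IS", "highRiskPreceding"),
]


def posterior(triple_prob, hypothesis, evidence, subject="targetVehicle"):
    """Return P(h|e) from reified triple probabilities, following Eqs. (1)-(3)."""
    # P(h): probability of the reified hypothesis triple.
    p_h = triple_prob[(subject, "INTENTION_IS", hypothesis)]
    # Eq. (2): P(e) as a product of independent pieces of evidence.
    p_e = prod(triple_prob[(subject, rel, val)] for rel, val in evidence)
    # Eq. (3): P(e|h) as a product of the conditioned, reified evidence triples.
    p_e_given_h = prod(
        triple_prob[(val, "INTENTION_IS", hypothesis)] for _, val in evidence
    )
    # Eq. (1): Bayes rule.
    return p_h * p_e_given_h / p_e


def predict(triple_prob, labels, evidence, subject="targetVehicle"):
    """Score every label and return the most probable one with all scores."""
    scores = {
        label: posterior(triple_prob, label, evidence, subject) for label in labels
    }
    return max(scores, key=scores.get), scores


label, scores = predict(TRIPLE_PROB, ("LLC", "LK", "RLC"), EVIDENCE)
print(label, scores)
```

Because the pieces of evidence are treated as independent, the three scores need not sum to one; only their ranking is used to select the predicted label.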
3.3. Explainability
Fig. 7. RAG workflow (the numbers show the arrangement of the RAG process flow throughout the figure).
Table 3
Pedestrian feature extraction.
Feature | Extraction type | Description
Motion activity | Neural network | We implemented a transformer architecture that processes the 2D body pose and outputs the pedestrian action.
Proximity to the road | Neural network and estimation | The drivable road area segmentation and lane detection were obtained from YOLOPv2 (Han, Zhao, Zhang, Chen, Zhang, & Yuan, 2022). Based on an experimentally determined minimum distance, it is estimated whether the pedestrian is near the road or not.
Distance | Estimation | Estimated using the triangle similarity.
Orientation | Neural network | Using PedRecNet (Burgermeister & Curio, 2022), the joint positions of the human body and the body orientation from the azimuthal angle 𝜑 were obtained.
Gaze | Estimation | We used the 2D body pose detection and the positions of the nose, left eye, and right eye keypoints.
Table 4
Number of triples in the experimental setup.
Dataset | Ontology | Triples
PSI | PedFeatKG | 238 795
PSI | PedFeatRulesKG | 302 574
JAAD | PedFeatKG | 139 624
JAAD | PedFeatRulesKG | 197 381

Table 5
Comparing the pedestrian behavior predictor with various methods. The table includes the available results.
(a) JAAD test set
Model | F1 | Precision | Recall | Accuracy
C3D | 0.65 | 0.57 | 0.75 | 0.84
PCPA | 0.68 | – | – | 0.85
Decision Tree | 0.78 | 0.78 | 0.78 | 0.78
Fuzzy Logic | 0.75 | 0.69 | 0.81 | 0.69
PedFeatKG | 0.86 | 0.77 | 0.96 | 0.79
PedFeatRulesKG | 0.87 | 0.86 | 0.88 | 0.83
(b) PSI test set
Model | F1 | Precision | Recall | Accuracy
eP2P | 0.66 | – | – | 0.76
Ours Black Box | 0.75 | 0.74 | 0.75 | 0.62
Decision Tree | 0.63 | 0.63 | 0.63 | 0.63
Fuzzy Logic | 0.72 | 0.74 | 0.70 | 0.59
PedFeatKG | 0.81 | 0.75 | 0.89 | 0.69
PedFeatRulesKG | 0.84 | 0.75 | 0.94 | 0.72

depending on the quantity of triples and the dataset. Additionally, we set learningRate = 0.0005, batchSize = 10 000, and implemented an early stopping criterion using the Mean Reciprocal Rank (MRR) as described in Melo et al. (2024). MRR is used to evaluate the effectiveness of embeddings by averaging the reciprocal positions of the first correct answer in a list of queries. We observed that higher MRR scores for the embeddings directly correlate with improved prediction accuracy.
From the datasets selected in Section 3.1.1 and the ontologies described before (PedFeatKG and PedFeatRulesKG), two KGs were generated, each composed of a specific number of triples and entities, as detailed in Table 4.
Finally, the third segment was focused on RAG implementation. To accomplish this explainability module, we utilized pedestrian features extracted from the JAAD and PSI datasets to generate a human-readable document that incorporates a basic explanation of why pedestrians have or do not have the intention to cross the road, for instance: ‘‘The pedestrian will not cross the street because, the pedestrian is looking, is oriented to the left, is running, is at a moderate distance from the road and the vehicle is too far’’. Then, we utilized this document containing all descriptions in the RAG module, based on the LangChain framework. Within this module, we segmented the document into chunks and transformed them into embeddings using the OpenAI model. The embeddings were then stored in the Chroma vector database. Subsequently, the final response was generated using the OpenAI GPT-4 Large Language Model (LLM), based on a prompt tailored for the pedestrian use case and a query derived from the pedestrian features within the prediction frame, as detailed in Fig. 8.
Regarding the experimental setup, we selected datasets as outlined in Section 3.1.1, using a set of videos for both training and testing. To ensure experiment reliability, we initially chose videos that met specific criteria for visibility and high quality. For JAAD, 136 videos were used for training and 35 for testing, while for PSI, 104 videos were allocated to training and 48 to testing. Additionally, during training, videos were prioritized based on quality and relevance, with the most representative videos positioned at the start of the training set. The performance evaluation of the proposed pipeline was carried out using precision, recall, and F1-score metrics. Precision is defined as the ratio of correctly predicted positive cases to the total predicted positives, recall as the ratio of correctly predicted positive cases to the total actual positives, and F1-score as the harmonic mean of precision and recall.

4.2. Implementation details — drivers use case
This section addresses two main branches regarding the driver use case. The first one focuses on training the generated KG obtained from phase 1. The training utilizes the Ampligraph library, which creates the KGE from the HighD dataset. The second branch discusses the generation of explanations using the RAG technique. The dataset was divided based on tracks, to ensure a clear distinction between training, validation, and testing data, taking into account the vehicles’ behaviors across different tracks. This division was crucial to prevent the overlap of behaviors, especially since vehicles on the same track could exhibit similar behaviors, such as vehicles moving to the right because there is an exit on the right at the end of the road. Consequently, the dataset was organized so that the first 48 tracks (80% of the data) were allocated for the training and validation phases. The remaining 12 tracks (20% of the data) were reserved exclusively for testing. A variety of triple counts were explored for validation, including 500, 1000, 2000, 4000, and 10 000 triples. Despite this range, the evaluation score during testing remained consistent across these different triple counts. Therefore, a decision was made to proceed with 2000 triples for validation, leveraging the train_test_split_no_unseen function provided by the Ampligraph library to perform the split. The final distribution of triples for the dataset was established as 351 736 for training, 2000 for validation, and 12 222 for testing.
Two distinct scoring models are compared: TransE and ComplEx. To ensure a fair comparison, the training parameters are fixed. This includes setting the embedding size (k) to 100, employing the Adam optimizer with learning rate = 0.0005, utilizing the SelfAdversarialLoss, generating five negative triples for every positive triple by corrupting both the subject and object, and setting a batch size of 10 000. Additionally, validation parameters were specified, with a burn-in period and frequency both set at five, alongside a validation batch size of 100. An early stopping criterion is also used to monitor the MRR metric during validation, with a patience threshold of five validation epochs. F1-score is the evaluation metric used to choose the best model. Also, precision and recall metrics are used to compare results with other works. The machine used to carry out this experiment is a Lenovo Legion laptop with Windows 11, i9-13900HX CPU, 32 GB of RAM, and NVIDIA GeForce RTX 4090.
Regarding the RAG section, the data is divided into chunks with a size of 384 tokens. Each chunk represents a KG of one sample in the form of triples. Chunks were transformed into embedding vectors using the all-MiniLM-L6-v2 Hugging Face embedding model and stored in the Chroma vector database. After that, the OpenAI GPT-4 LLM was used in the generation module to generate the final response. The query is formed by extending the linguistic inputs which are fed to the KGE and Bayesian inference model with the lane change prediction output obtained from that model. Then, this query is fed to the RAG model. Fig. 8 shows the system prompt used to implement few-shot learning with an example where the query is provided together with the type of expected answer so that the LLM can follow similar behavior while generating the responses.

4.3. KG-based prediction results — pedestrian use case
The performance of our knowledge-based predictor approach was evaluated over the test set of each dataset. The results provided were compared with the following methods:

• In JAAD:
– Convolutional 3D (C3D): it is a state-of-the-art model that utilizes RGB frames and a fully-connected (fc) layer to generate the final prediction (Kotseruba et al., 2021; Tran, Bourdev, Fergus, Torresani, & Paluri, 2014).
– Pedestrian crossing prediction with attention (PCPA): it is a state-of-the-art model that integrates a 3D convolutional branch for encoding visual information, alongside individual RNNs to process various features (Kotseruba et al., 2021).
– Decision Tree: to evaluate this technique, we used the simple implementation of Decision Trees provided by the KNIME Analytics Platform, using the Gini Index as a quality measure, minimum records per node set as 4, and with the Minimal Description Length (MDL) pruning method activated.
– Fuzzy logic: as mentioned in Section 3.3.1, we utilized the IVTURS-FARC method to extract a set of fuzzy rules and membership functions. Subsequently, employing a Takagi–Sugeno (TS) inference system implemented in Python, we generated predictions.

• In PSI:
– Pedestrian Trajectory Prediction (eP2P): it is a state-of-the-art model that leverages context features and LSTM encoder–decoder modules to forecast pedestrian intentions and trajectories (Chen et al., 2022).
– Ours Black Box: we developed the black box to participate in the IEEE ITSS Student Competition on Pedestrian Behavior Prediction, which took place in 2023 at ITSC2023. This black box employed transformer encoding blocks, pedestrian features, and a many-to-one attention layer to predict crossing intention.
– Decision Tree: we used the process described above to generate predictions through this approach.
– Fuzzy Logic: we used the process described above to generate predictions through the fuzzy logic approach.

According to the results presented in Table 5, experiments conducted on both datasets, PSI and JAAD, demonstrate that both KG models outperform other methods focusing on ‘‘black box’’ strategies or explainability. Specifically, in the case of JAAD, the KG models significantly enhance performance in terms of F1-score, with PedFeatRulesKG showing a 22% improvement compared to C3D and a 19% improvement compared to PCPA. Similarly, compared to the decision tree and fuzzy logic approach, our method demonstrates improvements of 9% and 12%, respectively. While improvements in precision and recall are also evident in our approach, accuracy values in ‘‘black box’’ methods are higher. In the case of PSI, improvements are evident in terms of F1 scores, precision, and recall, while accuracy remains higher with the ‘‘black box’’ approach. Specifically, PedFeatRulesKG shows improvements in the F1-score compared to eP2P, our black box method, the decision tree, and the fuzzy logic approach, by 18%, 9%, 21%, and 12%, respectively. In addition, in both datasets, both KG models yield similar results. However, the best performance is achieved by the KG incorporating pedestrian features and fuzzy rules (PedFeatRulesKG). This highlights the importance of integrating various sources of information into the KG to enhance pedestrian behavior predictions. Moreover, the inclusion of fuzzy rules enhances the robustness of the KG and provides additional evidence, which is included in the Bayesian inference process, offering clues that differentiate between crossing and non-crossing
For this use case, explainability can be explored from two perspec-
tives: KG Models and RAG models. When it comes to KG models, the
PedFeatKG ontology only activates the pedestrian features, which could
provide a possible explanation for the prediction. On the other hand,
if we use the PedFeatRulesKG ontology, the pedestrian features are
supported by fuzzy rules, offering additional insight into the prediction.
For the RAG models, the pedestrian features representing the pedestrian
state were queried to the retrieval module to explain why the prediction
was made and whether the pedestrian would cross the road or not.
The example depicted in Fig. 9 showcases two predictions derived
from JAAD videos, encompassing the prediction outcome, the frame
of prediction, pedestrian features, activated fuzzy rules, and the RAG
explanation. These examples underscore the significance of pedestrian
body orientation and proximity to the road as crucial factors in explain-
ing why pedestrians choose to cross or not cross the road. In the case
of video 044, two fuzzy rules were activated, explaining the prediction
of crossing the road due to: (1) the pedestrian is near the road and
(2) the pedestrian is oriented to the left and is walking fast. Similarly,
in the instance of video 262 where the pedestrian will not cross, one
fuzzy rule was activated, indicating that pedestrians are oriented in the
same direction as the vehicle. This explanation is further enhanced by
the RAG, which also considers the distance between the pedestrian and
the ego-vehicle.
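The retrieval step that turns pedestrian features into an explanation request can be sketched as follows. This is only a toy illustration under stated assumptions: a bag-of-words cosine similarity replaces the embedding model and vector database of the actual RAG module, the stored explanation sentences in CHUNKS are invented, and the helper names (bow, cosine, features_to_query, retrieve) are hypothetical. The resulting prompt would then be passed to the LLM, which is not called here.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector; a toy stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def features_to_query(features, prediction):
    """Serialize the predicted outcome and the pedestrian features as text."""
    parts = ", ".join(f"{name} is {value}" for name, value in features.items())
    return f"The pedestrian {prediction}. Observed features: {parts}."


def retrieve(query, chunks, k=1):
    """Return the k stored explanation chunks most similar to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]


# Invented explanation chunks, in the spirit of the human-readable document.
CHUNKS = [
    "The pedestrian will cross the road because the pedestrian is near the "
    "road, is oriented to the left and is walking fast.",
    "The pedestrian will not cross the road because the pedestrian is oriented "
    "in the same direction as the vehicle and is far from the road.",
]

features = {
    "orientation": "left",
    "motion": "walking fast",
    "proximity": "near the road",
}
query = features_to_query(features, "will cross the road")
context = retrieve(query, CHUNKS, k=1)

# Prompt that would be sent to the generation model (LLM call omitted).
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query + " Explain why."
print(prompt)
```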
Table 7
Comparison with other models using the F1-score (%) metric at different prediction times.
Algorithm | 0.5 s | 1.0 s | 1.5 s | 2.0 s
Xue et al. | 98.20 | 97.10 | 96.61 | 95.19
Gao et al. | 99.18 | 98.98 | 97.56 | 91.76
Ours | 97.72 | 97.86 | 98.11 | 97.95
Table 8
Different multimedia for results visualization in the pedestrians and drivers use cases.
Use Case Link
Pedestrians https://www.youtube.com/playlist?list=PLAeK3AuwxenEqDvdJAk8X9Ysn5egmGvKO
Drivers https://www.youtube.com/playlist?list=PLAeK3AuwxenFsZslUIYk1CitWKAeAddgt
Fig. 12. Examples of lane change prediction explainability based on the discussed scene from HighD dataset.
LK and RLC, and the prediction with the highest probability will be the model’s prediction. The model uses the KGE to get all the triples’ probabilities after reification, as mentioned earlier in Section 3.2.3 and Fig. 3. At this instant, the model prediction is LK, as it has a higher probability than LLC and RLC. Two seconds later, the risk associated with the left following vehicle decreases from high to low, while the medium risk with a center preceding vehicle remains. This change prompts the model to predict an LLC, represented by a red arrow pointing to the left of the vehicle. After that, in the third captured frame, shown in Fig. 10c, the target vehicle starts to accelerate in the left direction, moving with lateral velocity in the left direction as well. So, the vehicle started moving to merge and was about to change lane. By the final frame, 0.5 s after the lane change, the vehicle is merging into the left lane, accelerating right while still moving left. The preceding vehicle that was a high risk before the lane change is now the right preceding vehicle, and the left preceding vehicle has shifted positions to be directly ahead. These changes significantly affect the TTC values. Table 8 contains links to multimedia videos that provide results of different scenes, including the scene discussed in this section. Regarding the explainability of the driver use case using RAG, Fig. 12 shows the RAG explanation for the first two instances in the scene described in Fig. 10. The model gives clear, reasonable, and precise explanations. The computational time for the RAG explanation is between two and three seconds per query. It is important to note that it is not necessary to query the RAG model with every frame. Instead, we can limit queries to instances where there is an accident, or a collision, or when an explanation is needed for either the predicted crossing intention or the predicted lane change maneuver.

5. Conclusions and future work
In this work, a context-based road users’ behavior prediction system has been developed using Knowledge Graphs, as the main structure for representing knowledge, and Bayesian inference with graph reifications as a means to implement a fully inductive reasoning system as a downstream task. Two use cases have been targeted following this predictive approach: (1) pedestrian crossing actions; and (2) vehicle lane change maneuvers. In both cases, the proposed KG-based solution provides superior performance with respect to the state of the art both in terms of anticipation and F1 metric. Especially relevant is the demonstrated capability for predicting road users’ behaviors in the absence of relevant kinematic clues, given the ability of the proposed system to account for contextual information. Different types of information sources have been integrated, including datasets and rules, as proof of the capability to deal with numerical and linguistic information in a harmonized knowledge representation format using Knowledge Graphs. This feature endows the system with the capacity to incorporate human knowledge in the form of linguistic descriptions representing experience and/or rules. Finally, explainable descriptions of the behavioral predictions have been implemented using Retrieval Augmented Generation Techniques (RAG), as a means to combine the reasoning ability of Knowledge Graphs and the expressive capacity of Large Language Models. Despite the progress exhibited in the current work, several improvements are envisaged to extend and test the predictive capabilities in new use cases, such as (1) near-miss (or crash) lane change maneuvers, and (2) occluded pedestrians in urban scenarios. Similarly, further research is necessary for understanding road users’ behaviors more holistically, especially in cross-cultural settings. For that purpose, new data will be gathered in regions of
the world with different social rules, such as the MENA (Middle East and North Africa) region, Southeast Asia, and Latin America. Finally, the proposed predictive system will be integrated with the behavior planner of an Autonomous Vehicle in order to make AVs behave in a more human-like fashion.

CRediT authorship contribution statement
Mohamed Manzour Hussien: Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Angie Nataly Melo: Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Augusto Luis Ballardini: Software, Validation, Formal analysis, Investigation, Writing – original draft, Visualization. Carlota Salinas Maldonado: Supervision. Rubén Izquierdo: Supervision. Miguel Ángel Sotelo: Conceptualization, Methodology, Supervision, Project administration.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment
This research has been funded by the HEIDI project of the European Commission under Grant Agreement: 101069538.

Data availability
Data will be made available on request.

References
Hogan, A., Blomqvist, E., Cochez, M., D’amato, C., Melo, G. D., Gutierrez, C., et al. (2021). Knowledge graphs. ACM Computing Surveys, 54, http://dx.doi.org/10.1145/3447772.
Ishibuchi, H., & Nakashima, T. (2001). Effect of rule weights in fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems, 9, 506–515. http://dx.doi.org/10.1109/91.940964.
Izquierdo, R., Quintanar, A., Parra, I., Fernández-Llorca, D., & Sotelo, M. A. (2019a). Experimental validation of lane-change intention prediction methodologies based on CNN and LSTM. In 2019 IEEE intelligent transportation systems conference (pp. 3657–3662). http://dx.doi.org/10.1109/ITSC.2019.8917331.
Izquierdo, R., Quintanar, A., Parra, I., Fernández-Llorca, D., & Sotelo, M. A. (2019b). The prevention dataset: A novel benchmark for prediction of vehicles intentions. In 2019 IEEE intelligent transportation systems conference (pp. 3114–3121). http://dx.doi.org/10.1109/ITSC.2019.8917433.
Izquierdo, R., Quintanar, D. F., Daza, I. G., Hernández, N., Parra, I., & Sotelo, M. Á. (2023). Vehicle trajectory prediction on highways using bird eye view representations and deep learning. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies, 53, 8370–8388. http://dx.doi.org/10.1007/s10489-022-03961-y.
Kotseruba, I., Rasouli, A., & Tsotsos, J. K. (2020). Do they want to cross? Understanding pedestrian intention for behavior prediction. In 2020 IEEE intelligent vehicles symposium (pp. 1688–1693). http://dx.doi.org/10.1109/IV47402.2020.9304591.
Kotseruba, I., Rasouli, A., & Tsotsos, J. K. (2021). Benchmark for evaluating pedestrian action prediction. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 1258–1268).
Krajewski, R., Bock, J., Kloeker, L., & Eckstein, L. (2018). The highd dataset: A drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems. In 2018 21st international conference on intelligent transportation systems (pp. 2118–2125). http://dx.doi.org/10.1109/ITSC.2018.8569552.
Laimona, O., Manzour, M. A., Shehata, O. M., & Morgan, E. I. (2020). Implementation and evaluation of an enhanced intention prediction algorithm for Lane-changing scenarios on highway roads. In 2020 2nd novel intelligent and leading emerging sciences conference (pp. 128–133). http://dx.doi.org/10.1109/NILES50944.2020.9257983.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in neural information processing systems: Vol. 33, (pp. 9459–9474). Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf.
Lorenzo, J., Alonso, I. P., Izquierdo, R., Ballardini, A. L., Saz, A. H., Llorca, D. F., et al. (2021). CAPformer: Pedestrian crossing action prediction using transformer. Sensors, 21, http://dx.doi.org/10.3390/s21175694.
Achaji, L., Moreau, J., Aioun, F., & Charpillet, F. (2023). Analysis over vision-based Manzour, M., Ballardini, A., Izquierdo, R., & Sotelo, M. A. (2024). Vehicle Lane
models for pedestrian action anticipation. In 2023 IEEE 26th international conference change prediction based on knowledge graph embeddings and Bayesian inference.
on intelligent transportation systems (pp. 5846–5851). http://dx.doi.org/10.1109/ In 2024 IEEE intelligent vehicles symposium (pp. 1893–1900). http://dx.doi.org/10.
ITSC57777.2023.10422283. 1109/IV55156.2024.10588599.
Alcala-Fdez, J., Alcala, R., & Herrera, F. (2011). A fuzzy association rule-based Melo, A. N., Herrera-Quintero, L. F., Salinas, C., & Sotelo, M. A. (2024). Knowledge-
classification model for high-dimensional problems with genetic rule selection and based explainable pedestrian behavior predictor. In 2024 IEEE intelligent vehicles
lateral tuning. IEEE Transactions on Fuzzy Systems, 19, 857–872. http://dx.doi.org/ symposium (pp. 3348–3355). http://dx.doi.org/10.1109/IV55156.2024.10588605.
10.1109/TFUZZ.2011.2147794. Melo, A. N., Salinas, C., & Sotelo, M. A. (2023). Experimental insights towards
Benterki, A., Boukhnifer, M., Judalet, V., & Choubeila, M. (2019). Prediction of explainable and interpretable pedestrian crossing prediction. arXiv:2312.02872.
surrounding vehicles Lane change intention using machine learning. In 2019 10th Muscholl, N., Klusch, M., Gebhard, P., & Schneeberger, T. (2021). EMIDAS: Explainable
IEEE international conference on intelligent data acquisition and advanced computing social interaction-based pedestrian intention detection across street. In Proceedings
systems: Technology and applications (pp. 839–843). http://dx.doi.org/10.1109/ of the 36th annual ACM symposium on applied computing (pp. 107–115). New
IDAACS.2019.8924448. York, NY, USA: Association for Computing Machinery, http://dx.doi.org/10.1145/
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). 3412841.3441891.
Translating embeddings for modeling multi-relational data. In C. Burges, L. Bottou, Peng, C., Xia, F., Naseriparsa, M., & Osborne, F. (2023). Knowledge graphs: Op-
M. Welling, Z. Ghahramani, & K. Weinberger (Eds.), Advances in neural information portunities and challenges. Artificial Intelligence Review, 56, 13071–13102. http:
processing systems: Vol. 26, Curran Associates, Inc., URL https://proceedings.neurips. //dx.doi.org/10.1007/s10462-023-10465-9.
cc/paper_files/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf. Pool, E. A. I., Kooij, J. F. P., & Gavrila, D. M. (2021). Crafted vs learned representations
Burgermeister, D., & Curio, C. (2022). PedRecNet: Multi-task deep neural network for in predictive models—A case study on cyclist path prediction. IEEE Transactions on
full 3D human pose and orientation estimation. In 2022 IEEE intelligent vehicles Intelligent Vehicles, 6, 747–759. http://dx.doi.org/10.1109/TIV.2021.3064253.
symposium (pp. 441–448). http://dx.doi.org/10.1109/IV51971.2022.9827202. Ramezanı khansarı, E., Moghadas Nejad, F., & Moogeh, S. (2021). Comparing time to
Chen, T., Jing, T., Tian, R., Chen, Y., Domeyer, J., Toyoda, H., et al. (2022). PSI: collision and time headway as safety criteria. Pamukkale Üniversitesi Mühendislik
A pedestrian behavior dataset for socially intelligent autonomous car. arXiv:2112. Bilimleri Dergisi, 27, 669–675.
02604. Rasouli, A., Kotseruba, I., & Tsotsos, J. K. (2017). Are they going to cross? A benchmark
Choudhary, S., Luthra, T., Mittal, A., & Singh, R. (2021). A survey of knowledge graph dataset and baseline for pedestrian crosswalk behavior. In Proceedings of the IEEE
embedding and their applications. arXiv:2107.07842. international conference on computer vision workshops.
Costabello, L., Bernardi, A., Janik, A., Pai, S., Van, C. L., McGrath, R., et al. (2019). Rasouli, A., Kotseruba, I., & Tsotsos, J. K. (2020). Pedestrian action anticipation using
AmpliGraph: A library for representation learning on knowledge graphs. http: contextual feature fusion in stacked RNNs. arXiv:2005.06582.
//dx.doi.org/10.5281/zenodo.2595043. Saffarzadeh, M., Nadimi, N., Naseralavi, S., & Mamdoohi, A. R. (2013). A general
Gao, K., Li, X., Chen, B., Hu, L., Liu, J., Du, R., et al. (2023). Dual transformer formulation for time-to-collision safety indicator. 166, (pp. 294–304). http://dx.
based prediction for Lane change intentions and trajectories in mixed traffic doi.org/10.1680/tran.11.00031,
environment. IEEE Transactions on Intelligent Transportation Systems, 24, 6203–6216. Sanz, J. A., Fernández, A., Bustince, H., & Herrera, F. (2013). IVTURS: A linguistic fuzzy
http://dx.doi.org/10.1109/TITS.2023.3248842. rule-based classification system based on a new interval-valued fuzzy reasoning
Han, C., Zhao, Q., Zhang, S., Chen, Y., Zhang, Z., & Yuan, J. (2022). YOLOPv2: Better, method with tuning and rule selection. IEEE Transactions on Fuzzy Systems, 21,
faster, stronger for panoptic driving perception. arXiv:2208.11434. 399–411. http://dx.doi.org/10.1109/TFUZZ.2013.2243153.
M.M. Hussien et al. Expert Systems With Applications 265 (2025) 125914
Mohamed Ahmed Abdelaziz Manzour Hussien obtained his bachelor’s degree in mechatronics from the German University in Cairo (GUC) in 2019. During his undergraduate studies, he had the opportunity to conduct his bachelor’s thesis in the field of machine learning at the IFS (Institut für Schienenfahrzeuge) institute in Germany. After that, he worked as a lecturer assistant in the GUC till 2022. During this period, he completed his master’s degree in the field of Intelligent Transportation Systems (ITS) at the Multi-Robot Systems (MRS) research group in the GUC in 2022. He focused on pedestrian behavior prediction. In 2023, he started his Ph.D. journey in the field of ITS at the INVETT (Intelligent Vehicles and Traffic Technologies) research group, University of Alcala, Spain. He is focusing on vehicle behavior prediction. In 2023, he won the IEEE ITSS student competition in the track of Driver Decision Prediction.

Angie Nataly Melo Castillo obtained a bachelor’s Degree in Informatics in 2013 from the University Catolica in Colombia and a Master’s Degree in Intelligent Transportation Systems in 2016 from Czech Technical University in Prague. She started her work in the INVETT Research Group in September 2022, where she is currently pursuing a PhD degree in Information and Communications Technologies with the Computer Engineering Department, developing explainable prediction systems in the context of pedestrian behavior.

Augusto Luis Ballardini was born in Buenos Aires, Argentina, in 1984. He completed his M.Sc. and Ph.D. degrees in Computer Science from the Università degli Studi di Milano-Bicocca, Italy, in 2012 and 2017, respectively. Following his post-doctoral activities in the IRALAB Research Group for two years, Dr. Ballardini joined the INVETT Research Group at the Universidad de Alcalá, Spain, in 2019. During his time at INVETT, he was awarded a Marie Skłodowska-Curie Actions research grant and a research grant within the Maria Zambrano/NextGenerationEU project from the Spanish Ministry of Science, Innovation, and Universities. His research focuses on developing advanced systems for autonomous vehicle localization and data fusion, using heterogeneous data sources such as digital maps, LiDAR, and image data, combined with cutting-edge computer vision and machine learning algorithms.

Carlota Salinas Maldonado earned her Ph.D. degree in engineering and automatics from the Universidad Complutense de Madrid in 2015. She is currently an assistant professor at the Computer Engineering Department, University of Alcalá, Alcalá de Henares, Madrid, 28801, Spain. Her research interests include autonomous vehicle navigation, data fusion systems, lidar, computer vision, and machine learning algorithms.

Rubén Izquierdo Gonzalo received a Bachelor’s degree in electronics and industrial automation engineering in 2014, an M.S. in industrial engineering in 2016, and a Ph.D. degree in information and communication technologies in 2020 from the University of Alcalá (UAH). He is currently an Assistant Professor at the Department of Computer Engineering of the UAH. His research interest is focused on the prediction of vehicle behaviors and control algorithms for highly automated and cooperative vehicles. His work has developed a predictive ACC and AES system for cut-in collision avoidance successfully tested in Euro NCAP tests. He was awarded the Best Ph.D. thesis on Intelligent Transportation Systems by the Spanish Chapter of the ITSS in 2022, and the outstanding award for his Ph.D. thesis by the UAH in 2021. He also received the XIII Prize from the Social Council of the UAH for the University-Society Knowledge Transfer in 2018 and the Prize to the Best Team with Full Automation in GCDC 2016.

Miguel Ángel Sotelo received a degree in Electrical Engineering in 1996 from the Technical University of Madrid, a Ph.D. degree in Electrical Engineering in 2001 from the University of Alcalá (Alcalá de Henares, Madrid), Spain, and a Master in Business Administration (MBA) from the European Business School in 2008. He is currently a Full Professor at the Department of Computer Engineering of the University of Alcalá (UAH). His research interests include Self-driving cars, Prediction Systems, and Traffic Technologies. He is the author of more than 300 publications in journals, conferences, and book chapters. He has been the recipient of the Best Research Award in the domain of Automotive and Vehicle Applications in Spain in 2002 and 2009, and the 3M Foundation Awards in the category of eSafety in 2004 and 2009. Miguel Ángel Sotelo has served as Project Evaluator, Rapporteur, and Reviewer for the European Commission in the field of ICT for Intelligent Vehicles and Cooperative Systems in FP6 and FP7. He was Editor-in-Chief of the IEEE Intelligent Transportation Systems Magazine (2014–2016), Associate Editor of IEEE Transactions on Intelligent Transportation Systems (2008–2014), member of the Steering Committee of the IEEE Transactions on Intelligent Vehicles (since 2015), and a member of the Editorial Board of The Open Transportation Journal (2006–2015). He has served as General Chair of the 2012 IEEE Intelligent Vehicles Symposium (IV’2012) that was held in Alcalá de Henares (Spain) in June 2012. He was the recipient of the IEEE ITS Outstanding Research Award in 2022, the IEEE ITS Outstanding Application Award in 2013, and the Prize to the Best Team with Full Automation in GCDC 2016. He is a Former President of the IEEE Intelligent Transportation Systems Society.