Glossary of Terms
AUC-ROC (Area Under the Receiver Operating Characteristic Curve): A performance measurement for classification models at various threshold settings. AUC represents the degree of separability between classes.
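A minimal sketch of computing AUC-ROC with scikit-learn; the labels and scores below are illustrative, and any arrays of true labels and predicted probabilities work the same way:

```python
from sklearn.metrics import roc_auc_score

# True binary labels and the model's predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# 1.0 means perfect separation of the classes; 0.5 means no better than chance
print(roc_auc_score(y_true, y_score))
```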
Accuracy: The proportion of correct predictions made by a model out of the total number of predictions.
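In terms of confusion-matrix counts, this is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)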
Agentic AI: Artificial intelligence systems that can make autonomous decisions based on goals, feedback, and context, similar to how a human agent would operate.
Bias (in AI): A systematic error in a model that leads to unfair outcomes for certain groups, often caused by historical data or skewed training sets.
Bootstrapping: A resampling method used to estimate the uncertainty of a statistic by repeatedly sampling from the original dataset (with replacement) to create many new datasets. The statistic (e.g., mean or standard deviation) is calculated for each resample, allowing a probability distribution to be built and uncertainty to be assessed.
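A minimal NumPy sketch of bootstrapping the mean; the sample values and the number of resamples are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4])

# Draw 10,000 resamples (with replacement) and compute the mean of each
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(10_000)]

# The spread of the resampled means approximates the uncertainty of the estimate
print(np.mean(boot_means), np.percentile(boot_means, [2.5, 97.5]))
```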
Confusion Matrix: A table used to describe the performance of a classification model, showing the true positives, true negatives, false positives, and false negatives.
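A minimal sketch with scikit-learn and illustrative labels; for binary classes, scikit-learn places true negatives in the top-left:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```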
Credit Utilization: The ratio of a borrower's current credit card balances to their credit limits, often used as an indicator of credit risk.
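For example, a borrower carrying $3,000 in balances against $10,000 in total limits has a utilization of 3,000 / 10,000 = 30%.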
Data Imputation: The process of replacing missing or incomplete data with substituted values to maintain the integrity of the dataset.
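A minimal sketch using scikit-learn's SimpleImputer to fill missing values with the column median; the data and the choice of strategy are illustrative:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Replace each NaN with the median of its column
imputer = SimpleImputer(strategy="median")
print(imputer.fit_transform(X))
```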
Decision Tree: A machine learning model that splits data into branches to reach a decision based on input variables. It's valued for its interpretability.
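A minimal sketch that fits a shallow tree and prints its rules as readable if/else text, which is where the interpretability comes from; the dataset and depth are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree keeps the rule set small enough to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=load_iris().feature_names))
```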
Delinquency: The failure to make required debt payments on time, typically used in credit risk assessments.
Demographic parity: A fairness metric that is satisfied if the results of a model's classification are not dependent on a given sensitive attribute.
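Formally, for a predicted label Ŷ and a sensitive attribute A, demographic parity requires:

P(Ŷ = 1 | A = a) = P(Ŷ = 1 | A = b) for all groups a and b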
Disparate impact: A measure of how evenly positive outcomes are distributed across different groups. If one group gets positive results much less often than another, it may suggest unfair treatment or bias.
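Disparate impact is often quantified as the ratio of positive-outcome rates between groups; a common (though context-dependent) rule of thumb flags ratios below 0.8. A minimal NumPy sketch with illustrative arrays:

```python
import numpy as np

# 1 = positive outcome (e.g., loan approved); group labels are illustrative
outcomes = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rate_a = outcomes[group == "A"].mean()
rate_b = outcomes[group == "B"].mean()

# Ratio of the less-favored group's rate to the other group's rate
print(min(rate_a, rate_b) / max(rate_a, rate_b))
```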
EDA (Exploratory Data Analysis): The process of analyzing datasets to summarize their main characteristics and uncover patterns before applying formal modeling.
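A minimal first-pass EDA sketch with pandas; the file name is hypothetical and the checks shown are typical starting points rather than a fixed recipe:

```python
import pandas as pd

df = pd.read_csv("accounts.csv")  # hypothetical input file

# Shape, column types, summary statistics, and missingness
print(df.shape)
print(df.dtypes)
print(df.describe())
print(df.isna().sum())
```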
F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics for evaluating classification models.
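Equivalently:

F1 = 2 · (Precision · Recall) / (Precision + Recall)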
Fairness (in AI): Ensuring that AI systems do not discriminate against individuals or groups and that decisions are equitable.
Hyperparameter Tuning: The process of optimizing a model's settings to improve its performance, such as adjusting tree depth or learning rate.
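A minimal sketch of a grid search over tree depth and learning rate with scikit-learn; the estimator, the grid values, and the scoring metric are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Try every combination of the listed settings with 5-fold cross-validation
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 4], "learning_rate": [0.05, 0.1]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```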
Imbalanced Data: A situation where the distribution of classes in a dataset is highly skewed, with one class significantly outnumbering the other(s).
Imputation: In statistics, the process of replacing missing values with substituted values.
Logistic Regression: A statistical model used for binary classification tasks, predicting the probability of one of two outcomes.
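The model passes a linear combination of the inputs through the logistic (sigmoid) function to produce a probability:

P(y = 1 | x) = 1 / (1 + e^−(β₀ + β₁x₁ + … + βₙxₙ))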
Missing Data: Instances where no data value is stored for a variable in an observation. These gaps can impact model quality. There are three different missing data mechanisms:
- Missing Completely at Random (MCAR): Data is considered MCAR when the reason for the missing values is unrelated to any other data in the dataset. The missingness happens by pure chance, without any pattern.
- Missing at Random (MAR): Data is considered MAR when the reason values are missing is related to other information in the dataset that isn't missing. If we know the values of some complete variables, we can explain why other values are missing.
- Missing Not at Random (MNAR): Data is considered MNAR when the reason it's missing is related to the missing value itself. In other words, the missingness depends on information we don't have.
Monte Carlo simulation: A mathematical technique that simulates the range of possible outcomes for an uncertain event.
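A minimal NumPy sketch that builds the distribution of an uncertain quantity by repeated random sampling; the default probability and loss distribution here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 100,000 scenarios: each of 50 accounts defaults with 5% probability,
# and each default costs an amount drawn from a lognormal distribution
defaults = rng.random((100_000, 50)) < 0.05
losses = (defaults * rng.lognormal(mean=8.0, sigma=0.5, size=(100_000, 50))).sum(axis=1)

# The simulated distribution gives the expected loss and the tail risk
print(losses.mean(), np.percentile(losses, 99))
```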
Precision: The percentage of true positive predictions among all positive predictions made by the model.
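In confusion-matrix terms:

Precision = TP / (TP + FP)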
Predictive Modeling: Using historical data and algorithms to forecast future outcomes, such as customer delinquency.
Recall: The percentage of actual positive cases that were correctly identified by the model.
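In confusion-matrix terms:

Recall = TP / (TP + FN)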
SHAP (Shapley Additive Explanations): A tool used to explain the output of machine learning models by assigning each feature an importance value.
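A minimal sketch using the shap package's TreeExplainer on a fitted tree-based model; the model and data are illustrative, and shap must be installed separately:

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# shap_values holds one importance value per feature for each prediction;
# together with a base value they add up to the model's output for that row
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
```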
Synthetic Data: Artificially generated data that mimics real-world patterns and distributions, used when actual data is limited or sensitive.