Process Mining
Dr. Hans Weigand
Overview
• What is process mining?
• What can process mining do for BPM and auditing?
• What is a formal definition of a process and how does the process mining
algorithm work?
• Appendix Case: Process Mining in logistics (thesis project Ruud van
Cruchten) [not for the exam]
Before we start: what is a process?
What is process mining?
• Extracting knowledge from event
data
• Building a process model inductively
• E.g., from cases A-B-C and A-D-C we
derive (mine) a process A –(B xor D) –C
• Comparing mined process model
with process description
• Visualizing mined models for
analysis
Background
/TU Aachen
Difference process mining and data mining
• Data mining is the automated process of discovering patterns in large
data sets
• Patterns e.g., in the form of linear regression functions
• Patterns relate variables in the data set
• Process mining aims at discovering the process
• Not variables, but events and event relationships
• Not only the “main stream” or “happy path” but also all secondary paths, variants
• Also commonalities between data mining and process mining: inductive
algorithms, overfitting, visualization, …
Process mining visualization
Process mining and process management
• The notion of Business Process has become very important (esp. in ’90s) as
complementary to the traditional focus on functions and tasks.
• The traditional way of business process management is to design (or
reengineer) processes using formal modeling tools, then implement (e.g., ERP).
• Formal modeling tools are typically based on Petri Net models (see below).
• This has been extended with process monitoring and optimization.
• However, in many cases, a process description
• is not present at all
• is present but has never been implemented
• has been implemented but is not up to date
• Result: A complete fact-based process model is missing
• So the first value of process mining: providing insight, based on facts
Three main types of process mining (van der Aalst)
From event log to process model
(ordered in time)
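As a minimal sketch of the first step from event log to process model: the events are grouped per case and ordered in time, which yields one trace per case. The record layout (case ID, activity, timestamp) and all values below are assumptions for illustration, not taken from a real log.

```python
from collections import defaultdict

# Hypothetical event records: (case_id, activity, timestamp).
events = [
    ("case2", "A", "2024-01-02T09:00"),
    ("case1", "B", "2024-01-01T10:00"),
    ("case1", "A", "2024-01-01T09:00"),
    ("case2", "D", "2024-01-02T10:00"),
    ("case1", "C", "2024-01-01T11:00"),
    ("case2", "C", "2024-01-02T11:00"),
]

def build_traces(events):
    """Group events by case and order each group by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))
    # ISO 8601 timestamps in one fixed format sort correctly as plain strings.
    return {
        case_id: [activity for _, activity in sorted(rows)]
        for case_id, rows in by_case.items()
    }

traces = build_traces(events)
```

The resulting traces (A-B-C and A-D-C here) are the input a discovery algorithm works on.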
Visualization – different levels needed
Process Mining and BPM
Process Mining as a mirror
Insight into Business Rules
Example RABO bank: Local branches have different processes
Local bank X Local bank Y
Some recent trends in Process Mining and BPM
• Integrate PM results with dashboards (see SAP Celonis exercise).
• Root cause analysis: given some hiccup or elephant path (shortcut) in the process,
what can be the cause?
• Enhancement: how to improve the process?
• Use real-time IoT data
• Event Data Management – making event data available is already half of
the success of PM
Process mining and auditing
Process mining and auditing (Jans, Vasarhelyi)
• Does the process conform to the audit rules?
• Built-in controls are not always present in the implementation.
• Sometimes built-in controls are switched off for business purposes
• Ex-post or ex-ante insight into a process being changed
• What is the added value?
• Entire population is analyzed, not a sample
• Independent meta-data is added (external control) [next sheet]
• Walkthroughs
• Actual performance rather than assumed performance
• Specific analyses, e.g., social relationships
Event log
• Event log/audit trail in ERP system contains more data (meta-data), like
time-stamp
• Importantly, the event log is created by the IT system, not under the
control of the auditee (e.g. the manager of the sales department)
Narratives
Closer look
Event logs and ERP data tables
Petri Nets are formal process models
• Places (states)
• Transitions
• Tokens
• Choice, merge
• Parallel Split, join
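The token game behind these notions can be sketched in a few lines. This is an illustrative toy, not a full Petri net library: the net below (a choice between B and D after A) and the place names `start`, `p1`, `p2`, `end` are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical net: each transition maps to (input places, output places).
transitions = {
    "A": (["start"], ["p1"]),
    "B": (["p1"], ["p2"]),   # B and D compete for the token in p1 (choice)
    "D": (["p1"], ["p2"]),   # both put a token in p2 (merge)
    "C": (["p2"], ["end"]),
}

def enabled(marking, transitions):
    """Return the transitions whose input places all hold enough tokens."""
    result = []
    for name, (inputs, _) in transitions.items():
        needed = Counter(inputs)
        if all(marking[place] >= n for place, n in needed.items()):
            result.append(name)
    return sorted(result)

def fire(marking, transitions, name):
    """Consume one token per input place and produce one per output place."""
    if name not in enabled(marking, transitions):
        raise ValueError(f"transition {name} is not enabled")
    inputs, outputs = transitions[name]
    new_marking = Counter(marking)
    new_marking.subtract(inputs)
    new_marking.update(outputs)
    return +new_marking  # unary plus drops places with zero tokens

# A marking is a Counter of tokens per place.
initial = Counter({"start": 1})
after_a = fire(initial, transitions, "A")
```

Calling `enabled(after_a, transitions)` answers the question of which transitions can fire in a given marking.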
Petri Net example
Which transition(s) can fire?
Will this PN reach an end state?
Short exercise
Can you model this with a Petri Net?
Formal definition of algorithm
• If L is an event log, then a process discovery algorithm is a function that
maps L onto a process model such that the model is “representative” for
the behavior in the event log.
• More specifically: a function that maps L onto a marked Petri Net P such
that P is sound and all traces in L correspond to firing sequences in P.
sound: “that the process can reach end state”
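The requirement that all traces in L correspond to firing sequences in P can be checked by replaying each trace on the net. The sketch below assumes a hypothetical net for A –(B xor D) –C with made-up place names, and it only tests whether traces fit; it does not prove soundness of the net.

```python
from collections import Counter

# Hypothetical marked Petri net; transition -> (input places, output places).
NET = {
    "A": ({"start"}, {"p1"}),
    "B": ({"p1"}, {"p2"}),
    "D": ({"p1"}, {"p2"}),
    "C": ({"p2"}, {"end"}),
}
INITIAL = Counter({"start": 1})
FINAL = Counter({"end": 1})

def replays(trace, net=NET, initial=INITIAL, final=FINAL):
    """True if the trace is a firing sequence from the initial to the final marking."""
    marking = Counter(initial)
    for activity in trace:
        if activity not in net:
            return False  # activity unknown to the model
        inputs, outputs = net[activity]
        if any(marking[place] < 1 for place in inputs):
            return False  # transition not enabled: the trace does not fit
        marking.subtract(inputs)
        marking.update(outputs)
    return +marking == final  # the end state must be reached exactly

log = [["A", "B", "C"], ["A", "D", "C"]]
all_fit = all(replays(trace) for trace in log)
```

A trace such as A-C fails the replay because C is not enabled after A, so a model that is representative for a log containing that trace would need a different structure.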
Quality criteria
1. So the mined model is not necessarily unique
2. Multiple models (levels) are needed
Alpha algorithm – basic idea
1. Derive event-event relations
• X > Y iff for some case X is directly followed by Y
• X → Y iff X > Y and not Y > X (causality)
• X || Y iff X > Y and Y > X (parallel)
• X # Y iff not X > Y and not Y > X
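The four relations can be derived mechanically from the traces. The sketch below covers only this first step of the Alpha algorithm, using the two cases A-B-C and A-D-C from the introduction as input; the label `<-` for reversed causality is an addition for readability and is not part of the definitions above.

```python
from itertools import product

def footprint(traces):
    """Derive the relations >, ->, || and # from a list of traces."""
    activities = sorted({a for trace in traces for a in trace})
    # X > Y: in some case X is directly followed by Y.
    follows = {(x, y) for trace in traces for x, y in zip(trace, trace[1:])}
    relation = {}
    for x, y in product(activities, repeat=2):
        xy, yx = (x, y) in follows, (y, x) in follows
        if xy and not yx:
            relation[(x, y)] = "->"   # causality
        elif yx and not xy:
            relation[(x, y)] = "<-"   # reversed causality (Y -> X)
        elif xy and yx:
            relation[(x, y)] = "||"   # parallel
        else:
            relation[(x, y)] = "#"    # never directly follow each other
    return follows, relation

traces = [["A", "B", "C"], ["A", "D", "C"]]
follows, relation = footprint(traces)
```

For these traces B # D holds while A → B and A → D, which is the pattern that the second step matches with a choice (xor) construct.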
Our example
Alpha algorithm (2)
1. Derive event-event relations
2. Match with basic Petri Net constructs
Alpha algorithm
• Formal definition is much more complicated
• Alpha algorithm is a proof-of-concept, not the best algorithm, not useful
for large data sets
• Alpha algorithm does not deal with noise in the data
• Another approach is Genetic Mining (Ana Karla Alves de Medeiros et al.)
that generates several alternatives and (after some rounds) selects the
best one
• Today, many commercial implementations
Conclusion
• Process Mining has been surprisingly useful in auditing and business
process analysis.
• Still many technical and application challenges
• How to deal with very large processes? (decomposition)
• How to represent uncertainty about the result? (fuzzy Petri nets; responsible data science)
• How to map ERP booking events into a decent event log?
• How to combine process mining and RPA?
Appendix:
Case Ruud van Cruchten
Introduction
• MSc Information Management (Tilburg University)
& MSc IT Management (University of Turku)
• Started working as BI consultant / Data Scientist
• Building dashboards in Qlik and Power BI
• Consulting on becoming a data-driven organization:
BI adoption (change management meets BI)
• Data Science project: predictive modelling and process mining
• PhD on Data Quality in Process Mining
Process Mining Project Approach
PM2: Process Mining Project Methodology (van Eck, Lu, Leemans, van der Aalst, 2015)
• Iterative approach
• RQ driven, first iteration often
explorative to derive more specific
questions
• Different roles required
• IT/Data expert
• Process Miner
• Business expert
80/20 rule applies:
80% of time spent on data preparation
20% of time spent on actual analysis
van Eck, M. L., Lu, X., Leemans, S. J. J., & van der Aalst, W. M. P. (2015). PM2: A
process mining project methodology. International Conference on Advanced
Information Systems Engineering, 297–313.
https://doi.org/10.1007/978-3-319-19069-3_19
ASML case: mining a logistics process
• Low volume – high value production
• Highly customizable machines
• 70 physical locations worldwide
12,000 SAP locations worldwide
• Low transparency in flow of
materials
• “Inefficient” inventory 300M p/y
Problem to be solved:
Reduce inefficient inventory
How: by creating more transparency in
material flow → process mining not the goal
but the means to an end
Step 1: Scope and goal of the analysis
Several iterations of refining the scope and goal of the analysis.
First results:
Step 1: Scope and goal of the analysis
Several iterations of refining the scope and goal of the analysis.
Final results:
Step 2.1: Data Extraction
• Identify what data to gather → consult business experts
• Identify where to find the data → consult IT/Data experts
What: Log file of material locations
Where: Data spread over several tables;
integration & transformation required
Step 2.2: Data Preparation
First results were a “spaghetti model”
Reason: too many unique location names
and movement codes
59 movement codes in dataset
→ mapped to their function:
send, receive or send&receive
Constructed unique case ID
Manually mapped SAP location to “readable” location names (function)
963 SAP locations in dataset:
→ mapped to their function, e.g. analysis,
repair, resulting in 41 “locations”
Greatly reduced complexity
Step 2.3: Data Transformation
The data was not in the right format to show a sequence of locations; instead it showed the
actions performed to move materials (send & receive bookings)
→ Data transformation required to distill locations
→ A lot of domain knowledge required to perform the correct transformation
Consult with business experts!
→ Conclusion: process mining provides multiple perspectives on material flow:
what locations and what actions used for movement
Step 3: Interpreting results
Several iterations of refining the scope by filtering on certain materials,
timeframe, subprocesses etcetera.
Able to provide clear overview on the flow of materials through locations,
but how does this affect inventory?
Step 3: Interpreting results
Problem: current KPIs focused on single locations
→ result: no process overview of execution and
performance
Process Mining provided:
• Insight into the flow between locations
• Accurate performance measurement (throughput
times)
• Impact of behavior in process
(8 vs 30 days throughput time: +22 days)
14 days is stock reorder horizon!
Business expert determined that this behavior was
not acceptable!
Is it always this hard to prepare data?
No, example: social housing organisation,
administrative process with a perfect workflow
log.
Defined a linear workflow in the ERP system;
however, in reality it is not linear…
Over 500 variants in process execution
• Identified bottlenecks
• Identified improvements in forcing correct
workflow execution
• Created monitoring dashboard in Power BI for
continuous improvement; no process model
overview required
→ Defined a KPI to measure performance and
variance
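Counting variants, i.e. distinct activity sequences over all cases, is a simple way to see how far execution deviates from a linear workflow. The sketch below uses made-up activity names and four hypothetical cases; the share of cases following the most common variant is one possible variance KPI, not necessarily the one used in the project.

```python
from collections import Counter

# Hypothetical traces per case; activity names are illustrative only.
traces = {
    "case1": ["register", "check", "approve"],
    "case2": ["register", "check", "approve"],
    "case3": ["register", "approve", "check"],
    "case4": ["register", "check", "check", "approve"],
}

def variant_counts(traces):
    """Count how many cases follow each distinct activity sequence (variant)."""
    return Counter(tuple(trace) for trace in traces.values())

variants = variant_counts(traces)
most_common_variant, frequency = variants.most_common(1)[0]
# Share of cases that follow the most frequent variant.
share = frequency / len(traces)
```

A low share combined with a high number of variants signals that the defined workflow is often not followed as intended.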
Lessons learned
• Start small, increase in scope always possible
• Don’t underestimate time needed to prepare/clean data
• Analyze with a goal/research question in mind, otherwise you get lost in
the results
• Some variance in process execution is normal
• Always validate results with business experts
• Be critical; process mining provides 100% transparency; not everybody
is happy with that…