Big Data in Transport
Data is increasingly significant in the management and use of transport systems. This Insight explores big data management and best practice, and considers the challenges and developments ahead for those responsible for big data in a transport environment.
2. Big Data Management

2.1 General

The big data process includes data acquisition, processing, aggregation and delivery. Data acquisition in transport relates to the collection of a high volume of data from specific data sources, e.g. presence detection data, tolling and passenger transaction data. Data acquisition in a big data environment is characterised by a high volume of semi-structured or unstructured raw data ready for processing (e.g. traffic speed).

Data processing involves cleansing (e.g. anonymisation), the application of unique IDs to records and the identification of errors. Clean data from multiple data sources is then made available for aggregation.

Big data aggregation is achieved by organising and processing data from an unstructured to a structured state. For example, vehicle presence detections are used to establish characteristics of traffic, such as flow or occupancy, which are used to establish congestion or delay data. Similarly, train departure data is used to predict delays. Aggregated data may or may not be moved from its original location. Data sets may be aggregated into one big data set, which can then be processed using intensive analytics to identify relations, trends and insight. This is then available for analysis and dissemination.

The infrastructure also needs to support flexible and dynamic data structures. This means that all data that is available relating to a data item needs to be transmitted and stored, even if it is not used by its primary application. For example, all elements of a ticketing transaction need to be transmitted and stored so that they can be used effectively, even if the initial ticketing application only needs to associate one barrier passage with a user ID on a database.

Data reliability is dependent on accuracy and precision (for measured quantities and associated metadata, such as time stamping). This means that the resolution of measured quantities needs to be as high as possible and communications error rates need to be low. Operational demands for safety drive precision and accuracy in monitoring and control in all modes. This encourages the use of redundancy and multiple data sources. Safety is not compromised in the event of failures, but service levels often suffer. So, failures to detect train movements or control railway signals result in service disruptions, rather than higher accident rates. These requirements demand high data rates and fast processing, which necessitates ever greater investment in detection, communications and processing infrastructure.
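The aggregation step described in Section 2.1, in which raw vehicle presence detections are turned into structured traffic characteristics such as flow and occupancy, might be sketched as follows. This is a minimal illustration only: the `Detection` record, the `aggregate` function and the sample values are hypothetical and are not taken from any particular detector system.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One vehicle passage over a hypothetical presence detector."""
    start_s: float  # time the vehicle arrived over the detector (seconds)
    end_s: float    # time the vehicle cleared the detector (seconds)


def aggregate(detections, period_start_s, period_len_s):
    """Return (flow in vehicles/hour, occupancy as a fraction) for one period."""
    period_end_s = period_start_s + period_len_s
    count = 0
    occupied_s = 0.0
    for d in detections:
        # A vehicle is counted in the period in which it arrived.
        if period_start_s <= d.start_s < period_end_s:
            count += 1
        # Only the part of the detection that overlaps the period is added.
        overlap = min(d.end_s, period_end_s) - max(d.start_s, period_start_s)
        if overlap > 0:
            occupied_s += overlap
    flow_veh_per_h = count * 3600.0 / period_len_s
    occupancy = occupied_s / period_len_s
    return flow_veh_per_h, occupancy


# Illustrative raw detections (assumed values, in seconds).
raw = [
    Detection(2.0, 2.5),
    Detection(10.0, 10.5),
    Detection(31.0, 32.0),
    Detection(58.0, 61.0),  # straddles the end of the period
]
flow, occupancy = aggregate(raw, period_start_s=0.0, period_len_s=60.0)
print(flow, occupancy)
```

Rising occupancy combined with falling flow is one common indicator of congestion, which is the kind of derived characteristic the aggregation stage is intended to expose.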
It is notable that as well as increasing the demands on infrastructure provision directly, the need to monitor, transmit, process and store all data elements also increases the need to manage data privacy and security, which has an impact on data infrastructure provision.

This means that if an in-house storage solution is adopted by a transport service provider, it requires significant capital expenditure on data storage. Cloud storage provides an alternative, with the capital and operational risk transferred to a third party, but with higher operational costs manifest in service charges. Similarly, communications services that support transport operators used to be implemented and delivered by the operators themselves. This is less common now, with collaborative and third party provision becoming more popular.

An Australian analysis of sensor options provides an indicator of how to match functions to detectors:

• Inductive loops (presence, count and speed)
• Piezo-electric strips (counts, pressure, speed)
• Pneumatic tubes (counts, speed)
• Cameras (counts, classification, speed, presence)
• Infrared sensors (counts, speed, classification)
• Passive acoustic (counts, speed)
• Microwave (counts, speed, presence)
• RFID (presence, counts, classification).

In the maritime arena, big data and analytics have been identified in a recent report addressing their application in commercial shipping and naval applications. It recognises the proliferation of big data solutions enabled by wireless communications, novel sensor technologies and the creation of ad hoc networks, with widespread applications including meteorological, oceanographic, traffic, material and machinery performance, cargo and accident data.

Connected vehicle developments are expected to promote changes in the way data is acquired for highway applications. The timeliness, availability and accuracy of data will be much higher than for existing techniques and it can be expected to contain more telemetry, rather than alerts. However, its value is dependent on penetration rates, which are currently low, but expected to increase.

Mobile-sourced data also provides data acquisition opportunities, but with a different set of performance challenges to traditional detectors:

• GPS/mobile (speed, presence, count)
• Bluetooth (speed, presence, count).

As part of a Technology Strategy Board study with Deloitte, Imperial College London and INRIX, Transport for London (TfL) also compared three datasets from existing detector sources (off-call mobile phone data provided by INRIX, ANPR journey time data provided by the TfL LCAP system and TfL iBus bus journey time data, based on GPS vehicle tracking). It was found that mobile phone-sourced data quality depends on the context (e.g. time of day, type of user and speed).

The exploitation of mobile data to infer traffic flows in urban environments is limited by its lack of flexibility in measuring flows on demand on a specific path, but it might be aggregated with other sources to improve its performance. Also, mobile technology is subject to potential biases (for example, some age or social groups might have a greater tendency to use bicycles or trains instead of road-based motor vehicles). Unknown vehicle occupancy also increases the level of uncertainty when sourcing traffic flow data from mobile technology.

The Global Marine Technology Trends (GMTT) 2030 report highlights the need to integrate a range of emerging technologies as a critical factor in developing robust, reliable and efficient solutions to exploit data from a wide range of sources in varying and constantly changing structures and architectures. This includes the need for robust, high bandwidth, secure communications supported by sophisticated analytics to augment highly skilled operators.
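The matching of functions to detectors listed in this section can be expressed as a simple lookup. The sketch below copies the detector capabilities from the two lists in the text; the dictionary name `DETECTOR_FUNCTIONS` and the helper `detectors_for` are illustrative names introduced here, not part of the Australian analysis.

```python
# Detector capabilities as listed in the text (traditional and mobile-sourced).
DETECTOR_FUNCTIONS = {
    "inductive loop": {"presence", "count", "speed"},
    "piezo-electric strip": {"count", "pressure", "speed"},
    "pneumatic tube": {"count", "speed"},
    "camera": {"count", "classification", "speed", "presence"},
    "infrared sensor": {"count", "speed", "classification"},
    "passive acoustic": {"count", "speed"},
    "microwave": {"count", "speed", "presence"},
    "RFID": {"presence", "count", "classification"},
    "GPS/mobile": {"speed", "presence", "count"},
    "Bluetooth": {"speed", "presence", "count"},
}


def detectors_for(required):
    """Return the detector types that provide every required function."""
    required = set(required)
    matches = [
        name
        for name, functions in DETECTOR_FUNCTIONS.items()
        if required <= functions  # subset test: all requirements are met
    ]
    return sorted(matches, key=str.lower)


print(detectors_for({"classification", "speed"}))
print(detectors_for({"presence", "classification"}))
```

A lookup of this kind only captures which quantities a detector can report. It says nothing about accuracy, sample bias or penetration rate, which the text identifies as the distinguishing challenges for mobile-sourced data.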
This illustrates the challenges in using a wide variety of data sources. It is essential to know and understand the quality of the data available. It highlights that big data processing and aggregation needs to apply a data quality assessment, otherwise outputs will be inaccurate, so users’ decisions will carry a higher risk that their needs will not be met and data providers’ reputations will suffer accordingly.

Transport network and service operators need to manage data to ensure it is available, reliable, accurate and true. This means that relevant data standards should be established and incoming data needs to be monitored, controlled and refined. For example, specific data quality requirements for a transport network operator might include:

• Spatial granularity (line, station, urban road network, link, junction, etc.)
• Temporal granularity (minutes, hours, days, annual, etc.)
• Direction discrimination
• Modal discrimination
• Sample size within provided spatial and temporal quantities
• Bias (bias-free or biased).

Other sources of data that are expected to contribute to big data in transport include live feeds from social media (e.g. Twitter – particularly for public transport), traffic data and weather. Many historic data sets are becoming available. These can be analysed to identify the performance of particular train services, for example, over the preceding 6 months, which can be used to increase the confidence of customers or the speed of operational decision making.

Big data is expected to play an increasing role for transport infrastructure owners and operators in managing their assets. BIM (Building Information Modelling) generates asset information as soon as an asset is designed. Maintenance and service functions add to that data so that infrastructure owners are able to develop a clear picture about the state of their assets, how they need to be managed and what resources might be needed to preserve their capability.

2.3 Data Processing

The value of big data increases as latency decreases, i.e. the faster data is delivered, the more value it provides to users. Performance improvements lead to qualitatively better analysis outputs (e.g. closer to real-time). The challenge, then, is to deliver data as fast as possible.

This is supported by standardisation. DATEX II provides a set of specifications for the exchange of traffic information in a standard format between separate systems. DATEX II is a structured data model that utilises UML, is platform independent and seeks to harmonise the exchange of traffic and travel information across the EU. Processing data within these standards is the challenge.
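The data quality requirements listed in this section (spatial granularity, temporal granularity and sample size within each spatial and temporal bin) could be monitored on incoming data along the following lines. This is a sketch under stated assumptions: observations are taken to be `(link_id, minutes_since_midnight)` pairs, and the function name and thresholds are hypothetical.

```python
from collections import Counter


def undersampled_bins(observations, bin_minutes, min_samples):
    """Return (link_id, time_bin) pairs that fall below the sample-size requirement.

    observations: iterable of (link_id, minutes_since_midnight) pairs.
    Only bins with at least one observation are reported; a bin with no
    observations at all would need a separate check against the expected bins.
    """
    counts = Counter(
        (link_id, minute // bin_minutes) for link_id, minute in observations
    )
    return sorted(key for key, n in counts.items() if n < min_samples)


# Illustrative observations on two hypothetical links, 15-minute bins.
observations = [
    ("A1", 481), ("A1", 485), ("A1", 489),  # 08:00-08:15 on link A1
    ("A1", 502),                            # 08:15-08:30 on link A1
    ("B7", 483),                            # 08:00-08:15 on link B7
]
print(undersampled_bins(observations, bin_minutes=15, min_samples=3))
```

Flagging sparse bins before aggregation gives downstream users an explicit indication of where outputs rest on too few samples to be trusted.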
Data standards are also developing for maritime application. The Automatic Identification System (AIS) is an automatic tracking system used on ships and by Vessel Traffic Services (VTS) for identifying and locating vessels by electronically exchanging data with other nearby ships, AIS base stations and satellites. The National Marine Electronics Association (NMEA) standard uses two primary sentences for AIS data: one for data received from other vessels and one for the own vessel’s information.

Transport operators need to consider whether current server storage has sufficient capacity to handle data within desired time parameters. Requirements for time parameters, storage solutions, etc. will need to be considered and assessed. Cloud computing solutions may be a viable option, subject to careful feasibility analysis.

2.4 Aggregation

The infrastructure required for organising big data must be able to manipulate and process data at the original storage location and manage high throughput as part of the big data processing step, in addition to being able to handle data of varying types and convert unstructured data to structured data.

Transport service operators cannot always extrapolate meaningful outputs from original source data (e.g. mobile phone data) because of a lack of expertise or investment in systems. Third party intervention is available to process data into a meaningful and usable format. For example, sample bias can inhibit analysis: mobile data might not be representative of the travelling population, and additional analysis and aggregation with other data sets might be necessary to create useful inputs for operators to use. As a result, transport operators will become more reliant upon third parties to process and aggregate the data necessary for their own analysis and delivery.

The accuracy of data is likely to be an increasingly significant factor in data quality so that the value of information in a big data solution can be enhanced. This is particularly relevant in predictive applications and it can be difficult to achieve. The challenge for transport service providers is that when travellers are presented with forecasts about their journey, they expect them to be fulfilled. However, the only certainties that transport operators can offer are records of past events.

For example, journey time postings on motorways are based on the measured journey times of recent travellers. If this is perceived as travel time prediction, it will be correct only as long as the traffic and highway conditions do not change. Similar challenges exist in all public transport operations. Big data can provide more realistic predictions by comparing current conditions with historic data and by assigning confidence levels and tolerances to predictions.

The presentation of predictive information to travellers raises more challenges. For example, travellers might rely on variable message signs, real time passenger information or platform displays when there are no disruptions, but a mobile solution might be more appropriate for dissemination of disruption information.

If appropriate standards are used for data exchange (such as DATEX II), metadata will be available that can be used to speed up searches. For example, timestamping can be used to filter historic data according to day, date or age, which enhances the quality of predictions for traffic information. It also speeds up batch processing by narrowing down the data set for analysis. This is important for some operational applications, such as incident detection, where confidence levels can be raised by rapid analysis of multiple data sets using narrow time and location parameters.
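The idea discussed in this section of filtering historic data by timestamp and attaching a confidence level and tolerance to a journey time prediction might look like the sketch below. It assumes a history of `(timestamp, journey_minutes)` records; the function name, the choice of the median as the estimate and the way confidence is computed are illustrative assumptions rather than an established method.

```python
from datetime import datetime
from statistics import median


def predict_journey_time(history, when, tolerance_min=2.0):
    """Predict a journey time from historic records for the same weekday and hour.

    Returns (expected_minutes, confidence, sample_count), where confidence is
    the share of comparable historic journeys that fell within the tolerance
    of the expected value. Returns None if no comparable records exist.
    """
    # The timestamp metadata narrows the data set before any analysis is done.
    similar = [
        minutes
        for ts, minutes in history
        if ts.weekday() == when.weekday() and ts.hour == when.hour
    ]
    if not similar:
        return None
    expected = median(similar)
    within = sum(1 for m in similar if abs(m - expected) <= tolerance_min)
    return expected, within / len(similar), len(similar)


# Illustrative history (assumed values): four records one week apart at
# around 08:00, plus two records that the filter should exclude.
history = [
    (datetime(2016, 5, 2, 8, 10), 20.0),
    (datetime(2016, 5, 9, 8, 40), 22.0),
    (datetime(2016, 5, 16, 8, 5), 21.0),
    (datetime(2016, 5, 23, 8, 30), 30.0),
    (datetime(2016, 5, 3, 8, 15), 50.0),   # different weekday
    (datetime(2016, 5, 2, 14, 0), 12.0),   # different hour
]
result = predict_journey_time(history, datetime(2016, 5, 30, 8, 20))
print(result)
```

Publishing the confidence and tolerance alongside the estimate makes it clearer to travellers that the figure is a forecast based on past events rather than a guarantee.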
2.5 Information Delivery

The value of big data needs to be challenged because big data analysis (e.g. fusion and mining) might not produce the ‘truth’. Analysis could identify patterns where none exist because they might emerge if data is analysed for long enough. Conversely, trends can be lost when data is combined. This indicates that skills and expertise are likely to be important in big data processing.

If there is no understanding of context, it can be lost within a big data set. Advising motorists to switch modes and catch a train to avoid congestion will be fruitless if there are no parking spaces at the station. Finding ways to convert information into simple messages can be a significant challenge, particularly if the output media have constraints (for example, Variable Message Signs - VMS).

Also, it is important to ensure a single source of truth and that third party users do not corrupt or misuse data.

The challenge of maintaining control of the data can be met with appropriate agreements, which underlines the need to work collaboratively with third parties (such as mobile apps and traffic information service providers) in what is essentially an unregulated market.

Organisations such as TfL, Network Rail and Highways England have pursued an open data approach, making data and reports freely available to third parties and the public. This commonly involves removal of restrictions on commercial usage of data in a bid to increase information availability and dissemination. TfL provides an Application Programming Interface (API) for web and app developers for journey planning, live travel disruptions and underground and bus service information.

Transport operators need to ensure that a single source of truth can still be maintained if big data is made freely available. They also need to ensure that third parties do not reduce the quality of this data and that it is used in pursuance of the operator’s transport obligations.
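The warning in Section 2.5 that trends can be lost when data is combined can be shown with a small worked example. The figures below are invented purely for illustration: on each of two hypothetical rail lines, average delay rises with passenger load, yet the pooled data suggests the opposite because the busier line has lower delays overall.

```python
def slope(points):
    """Least-squares slope of y against x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# (passenger load in hundreds per hour, average delay in minutes); assumed values.
line_a = [(1, 10), (2, 11), (3, 12)]  # lightly used line with longer delays
line_b = [(7, 2), (8, 3), (9, 4)]     # busy line with shorter delays

print(slope(line_a))           # trend within line A
print(slope(line_b))           # trend within line B
print(slope(line_a + line_b))  # trend after the two data sets are combined
```

Within each line the slope is positive, but the combined slope is negative. An analyst who sees only the pooled data set would draw the wrong conclusion, which is why context, such as which line a record came from, needs to be retained through aggregation.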
For example, some Jaguar Land Rover (JLR) vehicles collect data about problems in the highway surface (e.g. potholes), which is subsequently used to create warnings for other JLR drivers. Ideally, this data should be available to highway authorities for asset management purposes and to other drivers as part of traffic information, and JLR is working to that end.

3.2 Data Processing

Building a legacy big data environment should be avoided because of the risk of potential disruptive changes such as new data types, hardware and programming approaches. This means that standards and commercial off the shelf (COTS) solutions should be used wherever possible.

3.3 Aggregation

3.4 Information Delivery

UTMC (Urban Traffic Management and Control) enables traffic management applications to share and communicate information amongst themselves, e.g. VMS with ANPR.

DATEX II is a structured data model that utilises UML, is platform independent and seeks to harmonise the exchange of traffic and travel information across the EU. TransXChange is a Department for Transport (DfT)-sponsored national standard for bus information exchange with other systems, and SIRI (Service Interface for Real Time Information) is a European standard for the exchange of current, planned and predicted real-time public transport information between systems.
4.1 Data Access

Network operators want travel information to be distributed effectively so that travellers can make effective journey decisions. This is commonly achieved by making data freely available to third parties for processing and onward dissemination. The commercial models that support the third party providers involve app purchases, subscriptions or advertising revenue.

4.2 Data Quality

As long as travellers perceive a benefit, the business model is sustainable, but any disruptions or changes in perceived benefits might challenge revenue streams. For example, if all motorists are advised to take an alternative route to avoid an incident, significant delays might ensue. Similarly, if motorists are advised to change modes and take the train, the credibility of the advice will be undermined if there is no parking availability at the station and all the trains are late or full.

This means that even without external business disruptions, it is important to maintain and improve the quality of data and associated advice.

Perceptions of privacy are also likely to influence the value of big data, particularly where it relates to the use of personal data derived from mobile or ANPR sources. This can be anonymised so that there is no risk to privacy, but the perception that authorities are tracking individuals is likely to continue. This puts at risk the ability to acquire data because individuals will not trust authorities or app suppliers.

Geographic location is associated with a device through a relevant identification (e.g. GPS coordinates, Internet Protocol (IP) address, RFID, or Wi-Fi positioning system). For safety applications (such as eCall), regulations have been developed to protect privacy.

It is notable that individuals are prepared to share their location data if they perceive a clear benefit in doing so, which explains why, for example, fitness apps are popular. The challenge of this approach for big data is that inputs to mobile-sourced data might not be representative because subscribers will have particular motivations for permitting access to their location data. Demographic attitudes towards sharing personal data might also favour younger generations.
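One possible approach to the anonymisation of ANPR-derived data mentioned in the privacy discussion is to replace each number plate with a keyed hash before it is stored, so that journey times can still be matched between cameras without retaining the plate itself. The sketch below is an assumption about how this might be done, not a description of any operator's system. Strictly it is pseudonymisation rather than full anonymisation, and the function name and key handling are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymise(plate, key):
    """Return a keyed SHA-256 token for a number plate.

    The same plate and key always give the same token, so two camera sites
    sharing the key can match a vehicle without storing the plate itself.
    """
    normalised = plate.replace(" ", "").upper().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


# A fresh random key per day (an assumed policy) stops tokens being linked
# across days once the old key has been discarded.
daily_key = secrets.token_bytes(32)

token_at_site_1 = pseudonymise("AB12 CDE", daily_key)
token_at_site_2 = pseudonymise("ab12cde", daily_key)
print(token_at_site_1 == token_at_site_2)  # the two sightings can be matched
```

Discarding the key at the end of each period is what limits re-identification. While the key exists, anyone holding it could test candidate plates against stored tokens, so key management would need to be part of the published privacy policy.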
Principles of proportionality and minimisation, with transparent processes, policies and strategies, are likely to be important in retaining user confidence. Ideally, these should be publicised alongside the benefits of sharing data.

4.4 Interoperability

Big data solutions are more likely to succeed if the data is interoperable, enabling systems to process data from any source. The key to integration is standardisation and an open architecture. Regulation is needed to deliver this when the market cannot.

Standardisation through regulation can be seen in the application of Telematics Applications for Passenger Services Technical Specifications for Interoperability (TAP TSI). This defines European-wide procedures and interfaces between all types of railway stakeholders. TAP TSI supports interoperable and cost-efficient information exchange for high quality journey information and ticketing. Similarly, specifications have been adopted to improve interoperability of real-time highway status and traffic data, to be made accessible in a standardised format (DATEX II) as part of the ITS Directive.

4.5 Business Case

Mobile data provides significant opportunities for big data deployment. However, rapid development in technology and services creates uncertainty for big data investment. Business cases could be undermined if solutions are obsolete before they reach maturity, so flexibility needs to be built into the delivery.

For example, cloud-based deployment provides some protection against obsolescence by partially offsetting capital risk against ongoing service costs.

4.6 Skills

Big data is likely to affect skill sets in the transport industry in the future. As operations become more complex, the drive for improvements in services and efficiencies can be expected to increase the dependence on systems and data. Over time, system processes will develop to perform better than their human counterparts in such a scenario, which will reduce network managers’ reliance on operators’ skills and knowledge. However, this trend is likely to increase the dependency on data specialist skills to manage performance.

Application programming interfaces (APIs) are also expected to create dependencies on skills for big data deployment. APIs provide the building blocks of protocols, tools and routines for the interaction of software components in order to create applications, particularly when developing graphical user interfaces (GUIs). The challenge for suppliers, authorities and managers will be to ensure that skills are available at the right level and in sufficient quantity to support big data solutions.

4.7 Internet of Things

The internet of things is expected to create significant data content as more and more devices become connected. This is likely to provide opportunities for big data application developers and cloud service providers to innovate.
This IET Transport Sector Insight was written by Matthew Clarke, ATKINS Transportation.
Image of driverless pod courtesy of the Transport Systems Catapult.
www.theiet.org/transport
The Institution of Engineering and Technology (IET) is working to engineer a better world. We inspire, inform and influence the global engineering community, supporting technology
innovation to meet the needs of society. The Institution of Engineering and Technology is registered as a Charity in England and Wales (No. 211014) and Scotland (No. SC038698).