
IET Sector Insights

Big Data in Transport
Data is increasingly significant in the
management and use of transport systems.
This Insight will explore big data management,
best practice and consider the challenges and
developments ahead for those responsible for
big data in a transport environment.

1. Introduction

Data sets can be so large and complex that they become difficult to process using traditional data processing applications and existing data management tools. As a result, capturing, storing, searching, sharing, transferring and analysing the data sets can be a significant challenge.

The trend to larger data sets is a result of the plethora of information that can now be derived from the analysis of a single large set of related data. Increasingly, data is gathered by information-sensing mobile devices, remote sensing, software logs, cameras, microphones and wireless sensor networks. Global technological information per-capita capacity has approximately doubled every 40 months since the 1980s. Some predictions show that data production will be 44 times greater in 2020 than it was in 2009.

In transport, the increase in data is manifest in the availability of traffic information, particularly through sat nav applications. Similar passenger information applications provide departure information for public transport users. Payment for transport (ticketing and tolling) is increasingly reliant on data-dependent technology, applications and services.

In all modes of transport, there is now a considerable quantity and diversity of data available for operators to improve performance, efficiency, service provision, safety and security. Data also enables operators to manage demand conflicts, customer service, environmental impacts and innovation. This can be seen in traffic signal co-ordination, trains reporting track defects, on-line flight check-ins and cargo tracking.

So data is increasingly significant in the management and use of transport systems. It is important to identify the needs for data and its capabilities and constraints. This will help to determine the impact of big data on transport and the innovations that are expected or necessary (such as the internet of things).

1.1. Definition

Big data uses data sets with sizes beyond the capability of traditionally-used software applications to capture, store, manage and process data within acceptable time frames. A single big data set size can range from a few dozen terabytes to several petabytes. Gartner research defined big data challenges and opportunities as being three-dimensional (3Vs model):

- Volume (increasing amount of data)
- Velocity (speed of data in and out)
- Variety (range of data types and their sources).

New forms of processing are required to enable enhanced insight, decision making and process optimisation to address the characteristics of 3Vs. In transport, the volume of data has increased because of growth in the amount of traffic (all modes) and detectors. Also, travellers, goods and vehicles generate more data from mobile devices and tracking transponders (including trains, ships and aircraft). Infrastructure, environmental and meteorological monitoring also produces data that is related to transport operations and users.

The velocity of data has increased in transport due to improved communications technology and media (particularly fibre optic cabling) and increased processing power and speed for monitoring and processing. Some applications have experienced a step change in data velocity as technology has changed. For example, ticketing and tolling transactions that use smart cards or tags are now immediately reported, whereas paper-based ticketing depends on human processing to acquire data from the transactions.

The variety of transport-related data has increased significantly. Modern trains and aircraft report internal system telemetry in real time from anywhere in the world and it is possible to acquire information about all crew members and passengers. In past years, only static data might have been available about the rolling stock, aircraft and crew and once outside human monitoring (e.g. signalman observations or radar range), its progress or position was unknown.


2. Big Data Management

2.1. General

The big data process includes data acquisition, processing, aggregation and delivery. Data acquisition in transport relates to the collection of a high volume of data from specific data sources e.g. presence detection data, tolling and passenger transaction data. Data acquisition in a big data environment is characterised by a high volume of semi/unstructured raw data ready for processing (e.g. traffic speed).

Data processing involves cleansing (e.g. anonymization), the application of unique IDs to records and identification of errors. Clean data from multiple data sources is then made available for aggregation.

Big data aggregation is achieved by organising and processing data from an unstructured to a structured state. For example, vehicle presence detections are used to establish characteristics of traffic, such as flow or occupancy, which is used to establish congestion or delay data. Or train departure data is used to predict delays. Aggregated data may or may not be moved from its original location. Data sets may be aggregated into one big data set, which can then be processed using intensive analytics to identify relations, trends and insight. This is then available for analysis and dissemination.

Information delivery uses advanced statistics and data modelling to join disparate data so that it is organised for presentation to end users. For a rail freight operator, this might be a dashboard of progress, performance and predictions. For a bus passenger, it might be an expected arrival time for a service at a stop.

2.2 Data Acquisition

The infrastructure that is required to support big data acquisition needs to handle high volumes of transactions and to deliver low, predictable latency capture and processing. Mobile network providers depend on significant numbers of base stations to meet the service levels demanded by their customers. Transport network managers use mobile networks to acquire and deliver data. Transport operators also use bespoke communications services such as fibre optic networks to ensure they can have immediate access to traffic and travel data and so they can regulate and control traffic flows for normal and incident operations. This applies to all modes.

The infrastructure also needs to support flexible and dynamic data structures. This means that all data that is available relating to a data item needs to be transmitted and stored, even if it is not used by its primary application. For example, all elements of a ticketing transaction need to be transmitted and stored so that it can be used effectively, even if its initial ticketing application only needs to associate one barrier passage with a user ID on a database.

Data reliability is dependent on accuracy and precision (for measured quantities and associated metadata, such as time stamping). This means that the resolution of measured quantities needs to be as high as possible and comms error rates need to be low. Operational demands for safety drive precision and accuracy in monitoring and control in all modes. This encourages the use of redundancy and multiple data sources. Safety is not compromised in the event of failures, but service levels often suffer. So, failures to detect train movements or control railway signals result in service disruptions, rather than higher accident rates. These requirements demand high data rates and fast processing, which necessitates ever greater investment in detection, communications and processing infrastructure.
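
As a rough illustration of the aggregation step described in section 2.1, the sketch below turns raw presence detections from one detector into flow and occupancy over a fixed period. The `Detection` record layout, the `aggregate` function and the 60-second period are assumptions made for this example and are not taken from any detector standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One vehicle passing a presence detector (hypothetical record layout)."""
    start_s: float  # time the detector switched on, seconds from period start
    end_s: float    # time the detector switched off


def aggregate(detections, period_s):
    """Return (flow in vehicles per hour, occupancy as a fraction of the period)."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    count = len(detections)
    occupied_s = sum(d.end_s - d.start_s for d in detections)
    flow_veh_per_hour = count * 3600.0 / period_s
    occupancy = occupied_s / period_s
    return flow_veh_per_hour, occupancy


# Three vehicles in one minute; the third is slow and covers the detector for 2 s.
detections = [Detection(1.0, 1.5), Detection(10.0, 10.5), Detection(30.0, 32.0)]
flow, occupancy = aggregate(detections, period_s=60.0)
print(flow, occupancy)
```

High occupancy combined with low flow is the kind of pattern that a congestion or delay estimate would then be built on.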


It is notable that as well as increasing the demands on infrastructure provision directly, the need to monitor, transmit, process, and store all data elements also increases the need to manage data privacy and security, which has an impact on data infrastructure provision.

This means that if an in-house storage solution is adopted by a transport service provider, it requires significant capital expenditure on data storage. Cloud storage provides an alternative, with the capital and operational risk transferred to a third party, but with higher operational costs manifest in service charges. Similarly, communications services that support transport operators used to be implemented and delivered by the operators themselves. This is less common now, with collaborative and third party provision becoming more popular.

An Australian analysis of sensor options provides an indicator of how to match functions to detectors:

- Inductive loops (presence, count and speed)
- Piezo-electric strips (counts, pressure, speed)
- Pneumatic tubes (counts, speed)
- Cameras (counts, classification, speed, presence)
- Infrared sensors (counts, speed, classification)
- Passive acoustic (counts, speed)
- Microwave (counts, speed, presence)
- RFID (presence, counts, classification).

In the maritime arena, big data and analytics have been identified in a recent report addressing their application in commercial shipping and naval applications. It recognises the proliferation of big data solutions enabled by wireless communications, novel sensor technologies and the creation of ad hoc networks, with widespread applications including meteorological, oceanographic and traffic data, material and machinery performance data, cargo data and accident data.

Mobile-sourced data also provides data acquisition opportunities, but with a different set of performance challenges to traditional detectors:

- GPS/mobile (speed, presence, count)
- Bluetooth (speed, presence, count).

As part of a Technology Strategy Board study with Deloitte, Imperial College London and INRIX, Transport for London (TfL) also compared three datasets from existing detector sources (off-call mobile phone data provided by INRIX, ANPR journey time data provided by the TfL LCAP system and TfL iBus bus journey time data, based on GPS vehicle tracking). It was found that mobile phone-sourced data quality depends on the context (e.g. time of day, type of user and speed).

The exploitation of mobile data to infer traffic flows in urban environments is limited by its lack of flexibility in measuring flows on-demand on a specific path, but it might be aggregated with other sources to improve its performance. Also, mobile technology is subject to potential biases (for example, some age or social groups might have a greater tendency to use bicycles or trains instead of road-based motor vehicles). Unknown vehicle occupancy also increases the level of uncertainty when sourcing traffic flow data from mobile technology.

The Global Marine Technology Trends (GMTT) 2030 report highlights the need to integrate a range of emerging technologies as a critical factor in developing robust, reliable and efficient solutions to exploit data from a wide range of sources in varying and constantly changing structures and architectures. This includes the need for robust, high bandwidth, secure communications supported by sophisticated analytics to augment highly skilled operators.

Connected vehicle developments are expected to promote changes in the way data is acquired for highway applications. The timeliness, availability and accuracy of data will be much higher than for existing techniques and it can be expected to contain more telemetry, rather than alerts. However, its value is dependent on penetration rates, which are currently low, but expected to increase.


This illustrates the challenges in using a wide variety of data sources. It is essential to know and understand the quality of the data available. It highlights that big data processing and aggregation needs to apply a data quality assessment, otherwise outputs will be inaccurate, so users’ decisions will carry a higher risk that their needs will not be met and data providers’ reputations will suffer accordingly.

Transport network and service operators need to manage data to ensure it is available, reliable, accurate and true. This means that relevant data standards should be established and incoming data needs to be monitored, controlled and refined. For example, specific data quality requirements for a transport network operator might include:

- Spatial granularity (line, station, urban road network, link, junction etc)
- Temporal granularity (minutes, hours, days, annual, etc)
- Direction discrimination
- Modal discrimination
- Sample size within provided spatial and temporal quantities
- Bias (free or biased).

Other sources of data that are expected to contribute to big data in transport include live feeds from social media (e.g. Twitter – particularly for public transport), traffic data and weather. Many historic data sets are becoming available.

These can be analysed to identify performance of particular train services, for example, over the preceding 6 months, which can be used to increase the confidence of customers or the speed of operational decision making.

Big data is expected to play an increasing role for transport infrastructure owners and operators in managing their assets. BIM (Building Information Modelling) generates asset information as soon as an asset is designed. Maintenance and service functions add to that data so that infrastructure owners are able to develop a clear picture about the state of their assets, how they need to be managed and what resources might be needed to preserve their capability.

2.3 Data Processing

The value of big data increases as latency decreases, i.e. the faster data is delivered, the more value it provides to users. Performance improvements lead to qualitatively better analysis outputs (e.g. closer to real-time). The challenge, then, is to deliver data as fast as possible.

This is supported by standardisation: DATEX II provides a set of specifications for exchange of traffic information in a standard format between separate systems. DATEX II is a structured data model that utilises UML, is platform independent and seeks to harmonise the exchange of traffic and travel information across the EU. Processing data within these standards is the challenge.


Data standards are also developing for maritime application. The Automatic Identification System (AIS) is an automatic tracking system used on ships and by Vessel Traffic Services (VTS) for identifying and locating vessels by electronically exchanging data with other nearby ships, AIS base stations and satellites. The National Marine Electronics Association (NMEA) standard uses two primary sentences for AIS data: one for data received from other vessels (!AIVDM) and one for the own vessel’s information (!AIVDO).

Transport operators need to consider if current server storage has sufficient capacity to handle data within desired time parameters. Requirements for time parameters, storage solutions etc. will need to be considered and assessed. Cloud computing solutions may be a viable option, subject to careful feasibility analysis.

2.4 Aggregation

The infrastructure required for organising big data must be able to manipulate and process data at the original storage location and manage high throughput as part of the big data processing step, in addition to being able to handle data of varying types and convert unstructured data to structured data.

Transport service operators cannot always extrapolate meaningful outputs from original source data (e.g. mobile phone data) because of lack of expertise or investment in systems. Third party intervention is available to process data into a meaningful and usable format. For example, sample bias can inhibit analysis: mobile data might not be representative of the travelling population, so additional analysis and aggregation with other data sets might be necessary to create useful inputs for operators to use. As a result, transport operators will become more reliant upon third parties to process and aggregate the data necessary for their own analysis and delivery.

Careful management is needed, therefore, to ensure data quality is maintained. This is likely to benefit from a collaborative approach with reciprocal arrangements in place for the two way exchange of data. Third party service providers can process data to a required quality performance level in exchange for free access to input data so that they can add value for subscription customers.

The accuracy of data is likely to be an increasingly significant factor in data quality so that the value of information in a big data solution can be enhanced. This is particularly relevant in predictive applications and it can be difficult to achieve. The challenge for transport service providers is that when travellers are presented with forecasts about their journey, they expect them to be fulfilled. However, the only certainties that transport operators can offer are records of past events.

For example, journey time postings on motorways are based on the measured journey times of recent travellers. If this is perceived as travel time prediction, it will be correct as long as the traffic and highway conditions do not change. Similar challenges exist in all public transport operations. Big data can provide more realistic predictions by comparing current conditions with historic data and by assigning confidence levels and tolerances to predictions.

The presentation of predictive information to travellers raises more challenges. For example, travellers might rely on variable message signs, real time passenger information or platform displays when there are no disruptions, but a mobile solution might be more appropriate for dissemination of disruption information.

If appropriate standards are used for data exchange (such as DATEX II), metadata will be available that can be used to speed up searches. For example, timestamping can be used to filter historic data according to day, date or age, which enhances the quality of predictions for traffic information. It also speeds up batch processing by narrowing down the data set for analysis. This is important for some operational applications, such as incident detection, where confidence levels can be raised by rapid analysis of multiple data sets using narrow time and location parameters.
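
The use of timestamps to narrow historic data, and the idea of attaching a tolerance to a journey time prediction, can be sketched as follows. This is a minimal illustration only: the `(timestamp, journey_time_s)` record shape, the function names, the equal weighting of current and historic values and the use of the interquartile range as a tolerance band are all assumptions made for the example, not a method described in this Insight.

```python
from datetime import datetime, timedelta
from statistics import median, quantiles


def similar_history(records, now, hour_window=1):
    """Keep historic (timestamp, journey_time_s) records from the same weekday
    and within hour_window hours of the current hour (no midnight wrap-around)."""
    return [
        (ts, jt) for ts, jt in records
        if ts.weekday() == now.weekday() and abs(ts.hour - now.hour) <= hour_window
    ]


def predict(records, now, current_s, weight=0.5):
    """Blend the latest measured journey time with the historic median and
    return the historic interquartile range as a tolerance band."""
    history = [jt for _, jt in similar_history(records, now)]
    if len(history) < 4:
        return current_s, None  # too little history to quote a tolerance
    q1, _, q3 = quantiles(history, n=4)
    estimate = weight * current_s + (1 - weight) * median(history)
    return estimate, (q1, q3)


# Made-up data: five earlier journeys on the same weekday at the same time.
now = datetime(2016, 6, 6, 8, 15)
times = [600, 620, 640, 660, 700]
records = [(now - timedelta(weeks=k + 1), jt) for k, jt in enumerate(times)]
records.append((now - timedelta(days=1), 300))            # different weekday
records.append((now - timedelta(weeks=1, hours=5), 310))  # same weekday, 03:15
estimate, band = predict(records, now, current_s=800)
print(estimate, band)
```

Filtering first keeps the statistics tied to comparable conditions and shrinks the data set that has to be scanned, which is the batch-processing benefit mentioned above.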


2.5 Information Delivery

The value of big data needs to be challenged because big data analysis (e.g. fusion and mining) might not produce the ‘truth’. Analysis could identify patterns where none exist because they might emerge if data is analysed for long enough. Conversely, trends can be lost when data is combined. This indicates that skills and expertise are likely to be important in big data processing.

If there is no understanding of context, it can be lost within a big data set. Diverting motorists to switch modes and catch a train to avoid congestion will be fruitless if there are no parking spaces at the station. Finding ways to convert information into simple messages can be a significant challenge, particularly if the output media have constraints (for example, Variable Message Signs, or VMS).

Also, it is important to ensure a single source of truth and that third party users do not corrupt data and misuse it.

The challenge of maintaining control of the data can be met with appropriate agreements, which underlines the need to work collaboratively with third parties (such as mobile apps and traffic information service providers) in what is essentially an unregulated market.

Organisations such as TfL, Network Rail and Highways England have pursued an open data approach, making data and reports freely available to third parties and the public. This commonly involves removal of restrictions on commercial usage of data in a bid to increase information availability and dissemination. TfL provides Application Programming Interfaces (APIs) for web and app developers for journey planning, live travel disruptions and underground and bus service information.

Transport operators need to ensure that a single source of truth can still be maintained if big data is made freely available. They also need to ensure that third parties do not reduce the quality of this data and that it is used in pursuance of the operator’s transport obligations.


3. Best Practice

3.1 Data Acquisition

Relational (SQL) databases are often poorly suited to the collection and storage of big data because of the variety of data formats within big data sets. This means that big data needs to be processed by converting data from an unstructured to a structured form, or by using approaches that do not rely on relational databases, such as NoSQL.

Using a systems engineering approach, data can be seen in the context of business requirements driving operational requirements. User needs can be viewed in the context of an architecture that is governed by standards. The choice of data sources should be driven by business needs. For example, Highways England’s business objectives are driven by safety, performance and customer service. Data from traffic and environmental sensors and CCTV cameras is used to meet operational objectives for the levels, severity and resolution of accidents, congestion, journey time reliability and traveller information.

The Digital Railway concept is building on the principles of systems engineering to improve performance and increase efficiency, with expectations that it will support future integration to achieve multimodal travel, bringing together airlines, buses, trams and taxis.

For urban highways, Urban Traffic Management and Control (UTMC) is likely to be a useful way of obtaining data from a range of compatible systems and equipment (e.g. VMS and ANPR). Highways England is implementing a new control system for motorways (CHARM), which is primarily intended for operational control but it will use significant data input from traffic detection, which will be used in parallel by the National Traffic Information Service (NTIS). Developments in mobile data processing provide new opportunities for data acquisition (for example, TomTom®).

Automotive suppliers provide services that acquire data from the vehicles they have supplied whilst their customers are using them. This is subsequently processed to provide useful information for all of their customers.


For example, some Jaguar Land Rover (JLR) vehicles collect data about problems in the highway surface (e.g. potholes), which is subsequently used to create warnings for other JLR drivers. Ideally, this data should be available to highway authorities for asset management purposes and to other drivers as part of traffic information and JLR is working to that end.

3.2 Data Processing

Building a legacy big data environment should be avoided because of the risk of potential disruptive changes such as new data types, hardware and programming approaches. This means that standards and commercial off the shelf (COTS) solutions should be used wherever possible.

DATEX II is a European data interoperability standard that includes a range of data types and delivery mechanisms associated with traffic and travel information delivery and exchange.

It is now adopted by Highways England as its primary information protocol. DATEX II is a structured data model that utilises UML, is platform independent and seeks to harmonise the exchange of traffic and travel information across the EU. Highways England believes adoption of DATEX II will improve data quality and timeliness.

3.3 Aggregation

Packages are available that provide an open-source software framework designed for large-scale processing and storage of data sets on clusters of commodity hardware (e.g. Hadoop, Apache Spark). This allows big data to be processed and organised whilst data is stored on an original database.

Once data has been processed and aggregated, the aggregated data set can provide multiple reporting processes or reports with a source of data i.e. one big data set can be reused for multiple activities.

TomTom® uses data aggregated from its subscribers and inputs from network operators to provide traffic information. Also, Twitter® feeds can be monitored to identify hot spots of activity, which might indicate public transport delays. The growth of non-relational database processing is likely to continue because it brings unstructured data sources into big data solutions.

3.4 Information Delivery

UTMC enables traffic management applications to share and communicate information amongst themselves e.g. VMS with ANPR.

DATEX II is a structured data model that utilises UML, is platform independent and seeks to harmonise the exchange of traffic and travel information across the EU. TransXChange is a Department for Transport (DfT)-sponsored national standard for bus information exchange with other systems and SIRI (Service Interface for Real Time Information) is an EU standard for exchange of current, planned and predicted real time public transport information between systems.
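
Section 3 recommends either converting unstructured data to a structured form or using non-relational storage. A minimal sketch of the first option is shown below: messages with different layouts are mapped onto a fixed set of columns while the original document is kept alongside, so that elements not needed by the first application are not lost. The message layouts, field names and the `normalise` function are hypothetical and are not drawn from DATEX II or any other standard.

```python
import json

# Hypothetical raw messages from three sources, each with its own layout.
RAW = [
    '{"source": "loop", "site": "A1", "speed_kph": 87}',
    '{"source": "anpr", "camera": "A1", "journey_s": 340, "plate_hash": "9f2c"}',
    '{"source": "gps", "lat": 51.5, "lon": -0.12, "speed_mph": 30}',
]

MPH_TO_KPH = 1.609344


def normalise(message):
    """Map one raw JSON message onto fixed columns and keep the full original."""
    doc = json.loads(message)
    speed = doc.get("speed_kph")
    if speed is None and "speed_mph" in doc:
        speed = doc["speed_mph"] * MPH_TO_KPH  # convert to a common unit
    return {
        "source": doc["source"],
        "location": doc.get("site") or doc.get("camera"),  # None if neither given
        "speed_kph": speed,                                 # None if not measured
        "raw": doc,                                         # untouched original
    }


rows = [normalise(m) for m in RAW]
for row in rows:
    print(row["source"], row["location"], row["speed_kph"])
```

The fixed columns can be loaded into a relational table for reporting, while the `raw` field is the kind of content a document-oriented (NoSQL) store would hold directly.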


4. Big Data Challenges and Developments

4.1 Data Access

Network operators want travel information to be distributed effectively so that travellers can make effective journey decisions. This is commonly achieved by making data freely available to third parties for processing and onward dissemination. The commercial models that support the third party providers involve app purchases, subscriptions or advertising revenue.

4.2 Data Quality

Whilst travellers perceive a benefit, the business model is sustainable, but any disruptions or changes in perceived benefits might challenge revenue streams. For example, if all motorists are advised to take an alternative route to avoid an incident, significant delays might ensue. Similarly, if motorists are advised to change modes and take the train, the credibility of the advice will be undermined if there is no parking availability at the station and all the trains are late or full.

This means that even without external business disruptions, it is important to maintain and improve the quality of data and associated advice.

4.3 Privacy

Perceptions of privacy are also likely to influence the value of big data, particularly where it relates to the use of personal data derived from mobile or ANPR sources. This can be anonymised so that there is no risk to privacy, but the perception that authorities are tracking individuals is likely to continue. This puts at risk the ability to acquire data because individuals will not trust authorities or app suppliers.

Geographic location is associated with a device through a relevant identification (e.g. GPS coordinates, Internet Protocol (IP) address, RFID, or Wi-Fi positioning system). For safety applications (such as eCall), regulations have been developed to protect privacy.

It is notable that individuals are prepared to share their location data if they perceive a clear benefit in doing so, which explains why, for example, fitness apps are popular. The challenge of this approach for big data is that inputs to mobile-sourced data might not be representative because subscribers will have particular motivations for permitting access to their location data. Demographic attitudes towards sharing personal data might also favour younger generations.
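
Where identifiers from ANPR or mobile sources need to be matched between detections without storing the raw value, one widely used technique is a keyed hash. The sketch below is an illustration under stated assumptions: the `pseudonymise` function, the plate format and the idea of rotating the key are choices made for this example, not a description of how any operator named here processes data. Strictly, this is pseudonymisation rather than full anonymisation, because anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Replace an identifier (e.g. a number plate) with a keyed SHA-256 hash.
    The same plate and key always give the same token, so journey times can
    still be matched between cameras without storing the plate itself."""
    normalised = identifier.replace(" ", "").upper().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


# Placeholder keys for illustration; real keys would come from a secrets store.
key_today = b"example-key-day-1"
key_tomorrow = b"example-key-day-2"

a = pseudonymise("AB12 CDE", key_today)
b = pseudonymise("ab12cde", key_today)     # same plate, different formatting
c = pseudonymise("AB12 CDE", key_tomorrow)

print(a == b)  # tokens match within one key period
print(a == c)  # tokens differ once the key is rotated
```

Rotating the key limits how long any one vehicle can be followed, which fits the proportionality and minimisation principles that apply to personal data.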


Principles of proportionality and minimisation, with transparent processes, policies and strategies are likely to be important in retaining user confidence. Ideally, this should be publicised alongside the benefits in sharing data.

4.4 Interoperability

Big data solutions are more likely to succeed if the data is interoperable, enabling systems to process data from any source. The key to integration is standardisation and an open architecture. Regulation is needed to deliver this when the market cannot.

Standardisation through regulation can be seen in the application of Telematics Applications for Passenger Services Technical Specifications for Interoperability (TAP TSI). This defines European-wide procedures and interfaces between all types of railway stakeholders. TAP TSI supports interoperable and cost-efficient information exchange for high quality journey information and ticketing. Similarly, specifications have been adopted to improve interoperability of real-time highway status and traffic data to be made accessible in a standardised format (DATEX II) as part of the ITS Directive.

4.5 Business Case

Mobile data provides significant opportunities for big data deployment. However, rapid development in technology and services creates uncertainty for big data investment. Business cases could be undermined if solutions are obsolete before they reach maturity, so flexibility needs to be built into the delivery.

For example, cloud-based deployment provides some protection against obsolescence by partially offsetting capital risk against ongoing service costs.

4.6 Skills

Big data is likely to affect skill sets in the transport industry in the future. As operations become more complex, the drive for improvements in services and efficiencies can be expected to increase the dependence on systems and data. Over time, system processes will develop to perform better than their human counterparts in such a scenario, which will reduce network managers’ reliance on operators’ skills and knowledge. However, this trend is likely to increase the dependency on data specialist skills to manage performance.

Application programming interfaces (APIs) are also expected to create dependencies on skills for big data deployment. APIs provide the building blocks of protocols, tools and routines for the interaction of software components in order to create applications, particularly when developing graphical user interfaces (GUIs). The challenge for suppliers, authorities and managers will be to ensure that skills are available at the right level and in sufficient quantity to support big data solutions.

4.7 Internet of Things

The internet of things is expected to create significant data content as more and more devices become connected. This is likely to provide opportunities for big data application developers and cloud service providers to innovate.

This IET Transport Sector Insight was written by Matthew Clarke, ATKINS Transportation.
Image of driverless pod courtesy of the Transport Systems Catapult.

Visit our website for all the latest news and information from the IET Transport Sector: www.theiet.org/transport

The Institution of Engineering and Technology (IET) is working to engineer a better world. We inspire, inform and influence the global engineering community, supporting technology
innovation to meet the needs of society. The Institution of Engineering and Technology is registered as a Charity in England and Wales (No. 211014) and Scotland (No. SC038698).

E6D16016/PDF/0616
