
shuxingcheng/transdim

 
 


transdim

MIT License Python 3.7 GitHub stars

Machine learning models have driven important advances in spatiotemporal data modeling, such as forecasting the near-future traffic states of road networks. But what happens when these models are built on the incomplete data commonly collected in real-world systems?

About the Project

In the transdim (transportation data imputation) project, we create machine learning models to help address some of the toughest challenges of spatiotemporal data modeling -- from missing data imputation to time series prediction.

In a hurry? Use the table of contents below to jump to a section.

Table of Contents
Strategic Aim
Tasks and Challenges
What we do just now!
Overview
Selected References
Our Publications
License

Strategic Aim

Creating accurate and efficient solutions for the spatiotemporal traffic data imputation and prediction tasks.

Tasks and Challenges

  • Missing data imputation

    • Random missing: Each sensor loses observations completely at random. (★★★)
    • Non-random missing: Each sensor loses its observations for several whole days. (★★★★)
  • Rolling traffic prediction

    • Forecasting without missing values. (★★★)
    • Forecasting with incomplete observations. (★★★★★)

What we do just now!

  • add a framework indicating overall studies;

Framework: The tensor completion task, covering data organization and tensor completion, in which traffic measurements are only partially observed.

  • define the problems clearly;

    • Example: Traffic forecasting using matrix factorization models.

Real experiment setting: Observations from the first 56 days, with 0%, 20%, and 40% fiber missing rates, are treated as stationary inputs. Meanwhile, rolling inputs are used to forecast traffic speed over the last 5 days (Monday to Friday) in a rolling manner.

  • describe the core challenges intuitively;
  • list main contributions of these studies.
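
The rolling forecasting setting in the example above can be sketched as follows. This is a minimal illustration, not the project's model: it assumes a 61-day series with 144 time slots per day (the layout of the Guangzhou data set described in this README), a one-step-ahead horizon, placeholder random speeds, and a naive "repeat the last observation" forecaster standing in for a matrix factorization model.

```python
import numpy as np

n_segments, n_days, n_slots = 214, 61, 144   # assumed data layout
train_days, test_days = 56, 5                # split stated in the experiment setting

# Placeholder speeds (hypothetical values, not the real data set).
rng = np.random.default_rng(0)
speeds = rng.uniform(10.0, 70.0, size=(n_segments, n_days * n_slots))

split = train_days * n_slots                 # number of stationary time steps
stationary = speeds[:, :split]               # first 56 days: stationary input
forecasts = np.empty((n_segments, test_days * n_slots))

for step in range(test_days * n_slots):
    t = split + step
    # Everything observed before time t: stationary input plus rolling input so far.
    history = speeds[:, :t]
    # Naive stand-in for a real model: repeat the most recent observation.
    forecasts[:, step] = history[:, -1]

actual = speeds[:, split:]                   # last 5 days: forecasting targets
rmse = float(np.sqrt(np.mean((actual - forecasts) ** 2)))
print(stationary.shape, forecasts.shape, rmse)
```

In the incomplete-data settings (20% and 40% fiber missing), the stationary input would additionally contain missing entries, which the forecaster has to tolerate; the sketch leaves that part out.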

What we care about!

  • Best algebraic structure for data imputation.
  • The context of urban transportation.
  • Data noise avoidance.
  • Competitive imputation and prediction performance.
  • Capability to handle various missing data scenarios.

Overview

With the development and application of intelligent transportation systems, large quantities of urban traffic data are collected continuously from various sources, such as loop detectors, cameras, and floating vehicles. These data sets capture the underlying states and dynamics of transportation networks and of the system as a whole, and they benefit many traffic operation and management applications, including routing, signal control, and travel time prediction. However, missing data are inevitable when collecting traffic data from intelligent transportation systems.

Publicly available at our Zenodo repository!

(a) Time series of actual and estimated speed within two weeks from August 1 to 14.

(b) Time series of actual and estimated speed within two weeks from September 12 to 25.

The imputation performance of BGCP (CP rank r=15, missing rate α=30%) under the fiber missing scenario with a third-order tensor representation, with the estimated result for road segment #1 selected as an example. In both panels, red rectangles mark fiber missing (i.e., speed observations lost for a whole day).

Machine learning models

  • Missing data imputation

The urban traffic speed data set (Guangzhou data set, Gdata) contains traffic speed data from 214 road segments over two months (61 days, from August 1 to September 30, 2016) in Guangzhou, China. We organize the raw data into a time series matrix of size (214, 8784), where 8784 = 61 days × 144 time slots per day. Tensor-based models take a third-order tensor of size (214, 61, 144) as input; matrix-based models are tested on the (214, 8784) time series matrix.
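
The relationship between the two representations can be sketched with NumPy. The sketch assumes the matrix columns are ordered day by day (all 144 slots of day 1, then day 2, and so on), which the text does not state explicitly, and it uses random placeholder speeds rather than the real data.

```python
import numpy as np

# Dimensions stated in the text: 214 road segments, 61 days, 144 slots per day.
n_segments, n_days, n_slots = 214, 61, 144

# Placeholder data standing in for the real speed matrix (hypothetical values).
rng = np.random.default_rng(0)
speed_matrix = rng.uniform(10.0, 70.0, size=(n_segments, n_days * n_slots))

# Matrix -> tensor: each row of 8784 readings is split into 61 days of 144 slots.
# Assumes day-major column ordering.
speed_tensor = speed_matrix.reshape(n_segments, n_days, n_slots)

# Tensor -> matrix: the inverse reshape recovers the original matrix exactly.
recovered = speed_tensor.reshape(n_segments, n_days * n_slots)

print(speed_matrix.shape, speed_tensor.shape)
```

Under this ordering, entry `speed_tensor[i, d, s]` is the same reading as `speed_matrix[i, d * 144 + s]`, so imputations produced in either form can be compared entry by entry.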

We consider two common missing data scenarios: random missing (RM) and non-random missing (NM). For RM, we randomly remove a given fraction of the observed entries in the matrix and use the removed entries as ground truth to evaluate RMSE. For NM, we run a correlated fiber missing experiment: we randomly choose a given fraction (e.g., 40%) of the (location, day) combinations and remove the whole time series of each chosen combination.
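
A minimal sketch of the two scenarios and the RMSE evaluation is given below. The helper names (`random_missing_mask`, `fiber_missing_mask`, `rmse`) are hypothetical and not taken from the project code, the data are fully observed random placeholders, and the mean-fill estimate is only a trivial baseline used to exercise the metric.

```python
import numpy as np

def _keep_vector(n_items, missing_rate, rng):
    """Boolean vector (True = keep) with a fixed number of items dropped."""
    n_drop = int(round(missing_rate * n_items))
    keep = np.ones(n_items, dtype=bool)
    keep[rng.choice(n_items, size=n_drop, replace=False)] = False
    return keep

def random_missing_mask(shape, missing_rate, rng):
    """RM: individual entries are removed at random. True = observed."""
    return _keep_vector(int(np.prod(shape)), missing_rate, rng).reshape(shape)

def fiber_missing_mask(shape, missing_rate, rng):
    """NM: whole (location, day) fibers are removed. True = observed."""
    n_locations, n_days, n_slots = shape
    keep = _keep_vector(n_locations * n_days, missing_rate, rng)
    keep = keep.reshape(n_locations, n_days)
    # Repeat each (location, day) decision across all time slots of that day.
    return np.repeat(keep[:, :, np.newaxis], n_slots, axis=2)

def rmse(truth, estimate, held_out):
    """RMSE computed only on the held-out (removed) entries."""
    diff = truth[held_out] - estimate[held_out]
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
shape = (214, 61, 144)
dense = rng.uniform(10.0, 70.0, size=shape)  # placeholder speeds, not real data

rm_mask = random_missing_mask(shape, 0.4, rng)
nm_mask = fiber_missing_mask(shape, 0.4, rng)

# Trivial baseline: fill every entry with the mean of the observed entries.
estimate = np.full(shape, dense[nm_mask].mean())
print(rmse(dense, estimate, ~nm_mask))
```

An imputation model would receive only the entries where the mask is `True` and would be scored on the entries where it is `False`.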

Selected References

Our Publications

  • Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies, 104: 66-77. [preprint] [slide] [data] [Matlab code]

  • Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [preprint] [doi] [data] [Matlab code] [Python code]

  • Xinyu Chen, Zhaocheng He, Jiawei Wang (2018). Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77. [doi] [data]

    Please consider citing our papers if they help your research.

Our Blog Posts (in Chinese)

License

This work is released under the MIT license.
