pseudotensor/temporal_autoencoder
What: Temporal Autoencoder for Predicting Video

How: TensorFlow version of CNN to LSTM to uCNN
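
A minimal sketch of that pipeline in today's Keras API, assuming a per-frame CNN encoder, a convolutional LSTM over the sequence, and a transposed-convolution ("uCNN") decoder; the clip shape and filter counts are illustrative guesses, not the repo's actual hyperparameters:

    # Minimal CNN -> ConvLSTM -> transposed-conv decoder, in Keras.
    # Shapes and filter counts are illustrative, not the repo's values.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    SEQ_LEN, H, W, C = 10, 32, 32, 3  # hypothetical clip shape

    inp = layers.Input(shape=(SEQ_LEN, H, W, C))
    # CNN encoder applied to every frame
    x = layers.TimeDistributed(
        layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"))(inp)
    x = layers.TimeDistributed(
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"))(x)
    # Convolutional LSTM over the encoded sequence (cf. arXiv:1506.04214)
    x = layers.ConvLSTM2D(32, 3, padding="same", return_sequences=False)(x)
    # "uCNN" decoder: transposed convolutions back to one predicted frame
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same",
                               activation="relu")(x)
    out = layers.Conv2DTranspose(C, 3, strides=2, padding="same",
                                 activation="sigmoid")(x)

    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")  # L2 loss on the predicted frame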

Why:

Inspired by papers:

  • https://arxiv.org/abs/1506.04214 (Conv LSTM)
  • http://www.jmlr.org/proceedings/papers/v2/sutskever07a/sutskever07a.pdf
  • https://arxiv.org/abs/1411.4389
  • https://arxiv.org/abs/1504.08023
  • https://arxiv.org/abs/1511.06380
  • https://arxiv.org/abs/1511.05440
  • https://arxiv.org/abs/1605.08104
  • http://file.scirp.org/pdf/AM20100400007_46529567.pdf
  • https://arxiv.org/abs/1607.03597
  • http://web.mit.edu/vondrick/tinyvideoa
  • https://arxiv.org/abs/1605.07157
  • https://arxiv.org/abs/1502.04681
  • http://www.ri.cmu.edu/pub_files/2014/3/egpaper_final.pdf

Uses parts of (or inspired by) the following repos:

  • https://github.com/tensorflow/models/blob/master/real_nvp/real_nvp_utils.py
  • https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py
  • https://github.com/machrisaa/tensorflow-vgg
  • https://github.com/loliverhennigh/
  • https://coxlab.github.io/prednet/
  • https://github.com/tensorflow/models/tree/master/video_prediction
  • https://github.com/yoonkim/lstm-char-cnn
  • https://github.com/anayebi/keras-extra
  • https://github.com/tgjeon/TensorFlow-Tutorials-for-Time-Series
  • https://github.com/jtoy/awesome-tensorflow
  • https://github.com/aymericdamien/TensorFlow-Examples

Inspired by the following articles:

http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/deep-learning-ai-listens-to-machines-for-signs-of-trouble?adbsc=social_20170124_69611636&adbid=823956941219053569&adbpl=tw&adbpr=740238495952736256

http://www.theverge.com/2016/8/4/12369494/descartes-artificial-intelligence-crop-predictions-usda

https://devblogs.nvidia.com/parallelforall/exploring-spacenet-dataset-using-digits/

And inspired to a lesser extent by the following papers:

  • https://arxiv.org/abs/1508.01211
  • https://arxiv.org/abs/1507.08750
  • https://arxiv.org/abs/1505.00295
  • www.ijcsi.org/papers/IJCSI-8-4-1-139-148.pdf
  • cs231n.stanford.edu/reports2016/223_Report.pdf

Program Requirements:

  • TensorFlow and related Python packages
  • OpenCV

Post-Processing Requirements:

  • avconv, mencoder, MP4Box, smplayer

How to run:

python main.py

Post-processing: making model vs. predicted video:

sh mergemov.sh

smplayer out_all.mp4 or smplayer out_all2_fast.mp4

Some training results:

  • Balls, slow movie: [video]

  • Balls, fast movie: [video]

  • Training curve in TensorFlow (norm order 80): [image]

  • Wheel, slow movie: [video]

  • Wheel, fast movie: [video]

  • Training curve in TensorFlow (norm order 40): [image]

Notes for wheel case:

  • Training on longer frame sequences improves prediction further into the future

  • The loss seems to need to span at least one full rotation for the model to predict well multiple frames into the future (see the rule-of-thumb sketch below)

  • The central part of the wheel diffuses even when the rest does well, likely due to lack of resolution
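
As a rough rule of thumb for the second note: if omega is the per-frame rotation rate in radians (an assumption; balls.py may use different units), the loss window should span at least 2*pi/omega frames.

    # Frames needed for the loss window to cover one full wheel rotation.
    # Assumes omega is radians per frame; balls.py's actual units may differ.
    import math

    omega = 0.1                                  # hypothetical value
    frames_per_rotation = 2 * math.pi / omega    # ~63 frames for omega = 0.1
    print(math.ceil(frames_per_rotation))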

Parameters:

  1. In main.py:
  • Choose global flags
  • In main():
    • Choose whether to use checkpoints (if they exist) or not: continuetrain
    • Type of model: modeltype
    • Number of balls: num_balls
  2. In balls.py (a hypothetical generator sketch follows this list):
  • SIZE: size of a ball's bounding box in pixels
  • omega: angular frequency of rotation for modeltype=1 (wheel type)
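
For reference, a NumPy sketch of a balls.py-style frame generator; num_balls and SIZE are named in the list above, while the dynamics and rendering details here are illustrative, not the repo's actual code:

    # Hypothetical sketch of a bouncing-ball frame generator (NumPy only).
    import numpy as np

    def make_ball_frames(num_frames=100, num_balls=2, size=8, res=32, seed=0):
        rng = np.random.RandomState(seed)
        pos = rng.uniform(size / 2, res - size / 2, (num_balls, 2))
        vel = rng.uniform(-1.0, 1.0, (num_balls, 2))
        frames = np.zeros((num_frames, res, res), dtype=np.float32)
        yy, xx = np.mgrid[0:res, 0:res]
        for t in range(num_frames):
            # render each ball as a filled circle
            for p in pos:
                mask = (xx - p[0]) ** 2 + (yy - p[1]) ** 2 <= (size / 2) ** 2
                frames[t][mask] = 1.0
            pos += vel
            # approximate elastic bounce off the walls
            for d in range(2):
                hit = (pos[:, d] < size / 2) | (pos[:, d] > res - size / 2)
                vel[hit, d] *= -1.0
        return frames

    frames = make_ball_frames()  # shape (100, 32, 32), values in {0, 1}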

Ideas and Future Work:

  • Test on other models

  • Try more filters

  • Try L2 loss not only on the final image but also on the hidden states. This should approximate adversarial networks, which keep the image and the hidden latent variables more smoothly connected (i.e. avoid a fractured manifold). A loss sketch follows this list.

  • Try different hyperparameters

  • Try multi-scale for space

  • Try multi-scale for time (to capture periods over long times)

  • Try Stacked Conv/Deconv LSTMs (https://arxiv.org/pdf/1506.04214v2.pdf and https://arxiv.org/pdf/1605.07157v4.pdf)

  • Try skip connections (https://arxiv.org/pdf/1605.07157v4.pdf)

  • Try temporal convolution

  • Try other LSTM architectures (C-peek, bind forget-recall, GRU, etc.)

  • Try adversarial loss:

https://github.com/carpedm20/DCGAN-tensorflow
http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/
https://arxiv.org/pdf/1511.05644v2.pdf

  • Try more depth in time

  • Train with geodesic acceleration (can't currently be done from Python in TensorFlow)

  • Try homogeneous LSTM/CNN architecture

  • Include depth in the CNN even when the data are not explicitly 3D, to avoid overlapping objects in pixel space causing diffusion

  • Estimate a velocity field in RGB, to avoid the most likely state at collisions averaging to no motion, which happens because L2 error averages over the two possible outcomes

  • Use the entropy generation rate to train attention toward regions where prediction works best

  • Try rotation, faces, and ultimately real video.
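
Here is a minimal sketch of the hidden-state L2 idea from the list above; the tensors and the lambda_h weight are hypothetical stand-ins, not the repo's code:

    # L2 on the predicted image plus L2 on the latent/hidden state.
    # target_hidden would come from encoding the ground-truth next frame,
    # pred_hidden from the LSTM's prediction; lambda_h is a made-up weight.
    import tensorflow as tf

    def combined_l2_loss(target_frame, pred_frame,
                         target_hidden, pred_hidden, lambda_h=0.1):
        image_loss = tf.reduce_mean(tf.square(target_frame - pred_frame))
        hidden_loss = tf.reduce_mean(tf.square(target_hidden - pred_hidden))
        return image_loss + lambda_h * hidden_loss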
