This repository was archived by the owner on Dec 9, 2024. It is now read-only.

Description
Hi @tfboyd,
Problem: tf_cnn_benchmarks.py training speed using TFRecords on an SSD is only half of the speed with synthetic data.
Question: how can I identify and reduce the software bottleneck when training from TFRecords on an SSD?
The attached image shows training speed and GPU/CPU utilization for ImageNet training using TFRecords on the SSD (upper picture) and synthetic data (lower picture), based on the tf_cnn_benchmarks.py command from https://github.com/tensorflow/benchmarks
This is not a hardware bottleneck: the same PC with PyTorch 1.0.1.post2 achieves 320 img/sec (100% GPU utilization) for ResNet-50 training reading JPEGs from the SSD. The PyTorch training code is taken from https://github.com/pytorch/examples/tree/master/imagenet
python main.py -a resnet50 /N/data/ILSVRC2012/partition/imagenet-data/imagenet_data
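One quick way to separate raw SSD read speed from TensorFlow-side decoding cost is to stream the TFRecord shards with plain Python, skipping TensorFlow entirely: if raw reads are fast, the bottleneck is in the decode/preprocess pipeline rather than the disk. Below is a minimal self-contained sketch of the TFRecord on-disk framing (little-endian uint64 length, 4-byte length CRC, payload, 4-byte payload CRC); the file name and record sizes are illustrative, and the writer zeroes the CRCs, which is fine for a throughput probe but not for files TensorFlow will actually consume.

```python
import os
import struct
import time


def write_tfrecord(path, payloads):
    """Write records in TFRecord framing with zeroed CRC fields.

    Good enough for a read-throughput probe; TensorFlow itself
    verifies the masked CRC32 checksums and would reject these files.
    """
    with open(path, "wb") as f:
        for p in payloads:
            f.write(struct.pack("<Q", len(p)))  # uint64 payload length
            f.write(b"\x00" * 4)                # length CRC (zeroed)
            f.write(p)                          # record payload
            f.write(b"\x00" * 4)                # payload CRC (zeroed)


def iter_tfrecords(path):
    """Yield raw record payloads from a TFRecord file, skipping CRCs."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                return
            (length,) = struct.unpack("<Q", header)
            f.read(4)                 # skip length CRC
            payload = f.read(length)
            f.read(4)                 # skip payload CRC
            yield payload


if __name__ == "__main__":
    # ~50 MB of synthetic records, roughly JPEG-sized payloads.
    path = "probe.tfrecord"
    write_tfrecord(path, [os.urandom(100 * 1024) for _ in range(512)])
    start, nbytes = time.time(), 0
    for payload in iter_tfrecords(path):
        nbytes += len(payload)
    dt = time.time() - start
    print("read %.1f MB at %.0f MB/s" % (nbytes / 1e6, nbytes / 1e6 / dt))
    os.remove(path)
```

On a 970 EVO this should report well over 1 GB/s once the file is on disk (or far more if it is still in the page cache); a number in that range would point the blame at JPEG decode and preprocessing on the CPU rather than at the SSD.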
System info:
Ubuntu 18.04
Samsung SSD 970 EVO
TensorFlow: 1.14
Model: mobilenet
Dataset: imagenet
Mode: training
SingleSess: False
Batch size: 192 global (192 per device)
Num batches: 600548
Num epochs: 90.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
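For reference, a tf_cnn_benchmarks invocation that should reproduce the configuration logged above (the --data_dir path is a placeholder; --datasets_num_private_threads is an existing benchmark flag worth sweeping when the input pipeline cannot keep the GPU fed):

```shell
python tf_cnn_benchmarks.py \
    --model=mobilenet \
    --data_name=imagenet \
    --data_dir=/path/to/imagenet-tfrecords \
    --batch_size=192 \
    --num_epochs=90 \
    --num_gpus=1 \
    --data_format=NCHW \
    --variable_update=parameter_server \
    --datasets_num_private_threads=8
```

Dropping --data_dir makes the script fall back to synthetic data, which is how the two curves in the attached image can be generated from the same command.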
