This repository was archived by the owner on Dec 9, 2024. It is now read-only.

tf_cnn_benchmarks.py training speed using TFRecord on an SSD drive is only half of that using synthetic data #378

@VincentChong123

Description


Hi @tfboyd,

Problem: tf_cnn_benchmarks.py training speed using TFRecord on an SSD drive is only half of the speed achieved with synthetic data.
Question: how can I identify and reduce the software bottleneck when training from TFRecord files on an SSD drive?

The attached image shows training speed and GPU/CPU utilization for ImageNet training using TFRecord on an SSD drive (upper picture) and synthetic data (lower picture), based on the tf_cnn_benchmarks.py commands in https://github.com/tensorflow/benchmarks

This does not appear to be a hardware bottleneck: on the same PC, PyTorch 1.0.1.post2 achieves 320 img/sec (100% GPU utilization) for ResNet-50 training reading JPEGs from the SSD. The PyTorch training code was taken from https://github.com/pytorch/examples/tree/master/imagenet:
python main.py -a resnet50 /N/data/ILSVRC2012/partition/imagenet-data/imagenet_data
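One way to confirm that the input pipeline (rather than the GPU) is the bottleneck is to time the data iterator alone, with no model attached, and compare its images/sec against the observed training speed. Below is a minimal stdlib-only sketch of such a measurement; the function name, the warmup count, and the idea of wrapping the TFRecord iterator this way are my own, not part of tf_cnn_benchmarks:

```python
import time
from itertools import islice

def measure_throughput(batch_iterable, num_batches, batch_size, warmup=5):
    """Time iteration over `num_batches` batches and return images/sec.

    Pass the raw data-loading iterator (e.g. one yielding decoded
    TFRecord batches) to see how fast the input pipeline alone runs.
    """
    it = iter(batch_iterable)
    # Warm up so one-time startup cost (file opens, buffer fills) is excluded.
    for _ in islice(it, warmup):
        pass
    start = time.perf_counter()
    count = 0
    for _ in islice(it, num_batches):
        count += 1
    elapsed = time.perf_counter() - start
    return count * batch_size / elapsed
```

If the pipeline alone delivers well under the ~190 img/sec the synthetic run sustains, the bottleneck is in reading/decoding, not in the model step.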

System info:
Ubuntu 18.04
Samsung SSD 970 EVO

TensorFlow:  1.14
Model:       mobilenet
Dataset:     imagenet
Mode:        training
SingleSess:  False
Batch size:  192 global (192 per device)
Num batches: 600548
Num epochs:  90.00
Devices:     ['/gpu:0']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   parameter_server
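For reference, a real-data run with the configuration above would look roughly like the following. The flag names are taken from the tf_cnn_benchmarks README; the data path and the private-thread count are placeholder values to experiment with, since raising the number of dataset threads is one common way to relieve a TFRecord input bottleneck:

```shell
# Sketch only: adjust --data_dir to the actual TFRecord location and tune
# --datasets_num_private_threads (here a guess) while watching img/sec.
python tf_cnn_benchmarks.py \
  --model=mobilenet \
  --data_name=imagenet \
  --data_dir=/path/to/imagenet-tfrecords \
  --batch_size=192 \
  --num_gpus=1 \
  --datasets_num_private_threads=8
```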

[Screenshot: training speed and GPU/CPU utilization, TFRecord (upper) vs. synthetic data (lower)]
