Closed
Labels
- comp:gpu (GPU related issues)
- stat:awaiting response (Status - Awaiting response from author)
- type:performance (Performance Issue)
Description
OS:
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS release 7.4.1708
- TensorFlow installed from (source or binary): From source
- Python version: 2.7.13
- Bazel version: 0.6.1
- CUDA/cuDNN version: CUDA 8.0/cuDNN 6.0.21
- GPU model and memory: GeForce GTX 950M, memory 4GB
output of tf_env_collect.sh
== cat /etc/issue ===============================================
Linux zhanghao 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
VERSION="7 (Core)"
VERSION_ID="7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
== are we in docker =============================================
No
== compiler =====================================================
c++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
Copyright © 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== uname -a =====================================================
Linux zhanghao 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
== check pips ===================================================
== check for virtualenv =========================================
False
== tensorflow import ============================================
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: No module named tensorflow
== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset
== nvidia-smi ===================================================
Tue Oct 10 16:36:08 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 950M Off | 00000000:0A:00.0 Off | N/A |
| N/A 45C P0 N/A / N/A | 0MiB / 4044MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
== cuda libs ===================================================
== cat /etc/issue ===============================================
Linux zhanghao 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
VERSION="7 (Core)"
VERSION_ID="7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
== are we in docker =============================================
No
== compiler =====================================================
c++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
Copyright © 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== uname -a =====================================================
Linux zhanghao 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
== check pips ===================================================
numpy (1.12.1)
protobuf (3.4.0)
tensorflow (1.4.0rc0)
tensorflow-tensorboard (0.4.0rc1)
== check for virtualenv =========================================
False
== tensorflow import ============================================
tf.VERSION = 1.4.0-rc0
tf.GIT_VERSION = v1.3.0-rc1-3111-g4196d6d
tf.COMPILER_VERSION = v1.3.0-rc1-3111-g4196d6d
Sanity check: array([1], dtype=int32)
== env ==========================================================
LD_LIBRARY_PATH /usr/local/cuda/lib64/:/usr/local/cuda/lib64/stubs/:/usr/local/cuda/extras/CUPTI/lib64/:/usr/local/cuda/nvvm/lib64/:/usr/lib64/nvidia/:/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2017.4.196/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64/gcc4.7:/opt/intel/debugger_2017/iga/lib:/opt/intel/debugger_2017/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2017.4.196/linux/daal/../tbb/lib/intel64_lin/gcc4.4
DYLD_LIBRARY_PATH is unset
== nvidia-smi ===================================================
Tue Oct 10 16:36:37 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 950M Off | 00000000:0A:00.0 Off | N/A |
| N/A 45C P0 N/A / N/A | 0MiB / 4044MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
== cuda libs ===================================================
/usr/local/cuda-8.0/doc/man/man7/libcudart.so.7
/usr/local/cuda-8.0/doc/man/man7/libcudart.7
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudart.so.8.0.61
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudart_static.a
output of python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
('v1.3.0-rc1-3111-g4196d6d', '1.4.0-rc0')
Describe the problem
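SVD on GPU is slower than SVD on CPU. The script below, which runs tf.svd ten times on a 1024x1024 float32 matrix, takes about 29.7 s of wall-clock time on the GPU (GeForce GTX 950M) and about 7.2 s on the CPU.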
Source code / logs
file main.py
import tensorflow as tf
import numpy as np
import sys

D = 1024
dA = np.random.normal(size=(D,D))
# No command-line argument: place the op on the GPU; any argument: use the CPU.
dev = "/gpu:0" if len(sys.argv)==1 else "/cpu:0"
with tf.device(dev):
    A = tf.placeholder(shape=(D,D),dtype=tf.float32)
    S, U, V = tf.svd(A)
config = tf.ConfigProto()
config.log_device_placement = True
config.graph_options.optimizer_options.global_jit_level=tf.OptimizerOptions.ON_1
sess = tf.Session(config=config)
# Run the decomposition ten times on the same input.
for _ in xrange(10):
    dS, dU, dV = sess.run((S, U, V), feed_dict={A:dA})
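The script is timed end to end with `time`, so the totals also include importing TensorFlow, device initialisation, and the first `sess.run` call. A minimal sketch of how the first call could be timed separately from the later ones is shown here. The helper name `time_calls` and its parameters are hypothetical and not part of TensorFlow; `timeit.default_timer` is used because it is available in both Python 2.7 and Python 3.

```python
from timeit import default_timer


def time_calls(fn, warmup=1, repeats=10):
    """Time fn() per call, keeping warm-up calls apart from the rest.

    Hypothetical helper for illustration only.
    Returns (warmup_times, steady_times), both lists of seconds.
    """
    warmup_times = []
    for _ in range(warmup):
        start = default_timer()
        fn()
        warmup_times.append(default_timer() - start)

    steady_times = []
    for _ in range(repeats):
        start = default_timer()
        fn()
        steady_times.append(default_timer() - start)

    return warmup_times, steady_times
```

With the script above, this could be called as `time_calls(lambda: sess.run((S, U, V), feed_dict={A: dA}))`. Comparing the two lists would show whether the gap between GPU and CPU comes from one-time setup or from every SVD call.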
run on GPU
time python main.py
2017-10-10 16:28:49.047703: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-10-10 16:28:49.048176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:0a:00.0
totalMemory: 3.95GiB freeMemory: 3.91GiB
2017-10-10 16:28:49.048205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0
2017-10-10 16:28:49.064960: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0
Svd: (Svd): /job:localhost/replica:0/task:0/device:GPU:0
2017-10-10 16:28:49.067234: I tensorflow/core/common_runtime/placer.cc:874] Svd: (Svd)/job:localhost/replica:0/task:0/device:GPU:0
Placeholder: (Placeholder): /job:localhost/replica:0/task:0/device:GPU:0
2017-10-10 16:28:49.067302: I tensorflow/core/common_runtime/placer.cc:874] Placeholder: (Placeholder)/job:localhost/replica:0/task:0/device:GPU:0
2017-10-10 16:28:49.074053: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x488e860
python main.py 27.50s user 2.30s system 100% cpu 29.658 total
run on CPU
time python main.py -
2017-10-10 16:29:53.252138: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-10-10 16:29:53.252572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:0a:00.0
totalMemory: 3.95GiB freeMemory: 3.91GiB
2017-10-10 16:29:53.252600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0
2017-10-10 16:29:53.269242: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0
Svd: (Svd): /job:localhost/replica:0/task:0/device:CPU:0
2017-10-10 16:29:53.271505: I tensorflow/core/common_runtime/placer.cc:874] Svd: (Svd)/job:localhost/replica:0/task:0/device:CPU:0
Placeholder: (Placeholder): /job:localhost/replica:0/task:0/device:CPU:0
2017-10-10 16:29:53.271544: I tensorflow/core/common_runtime/placer.cc:874] Placeholder: (Placeholder)/job:localhost/replica:0/task:0/device:CPU:0
python main.py - 34.33s user 10.68s system 621% cpu 7.241 total
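The two `time` summary lines can be compared directly. The sketch below parses them and derives the wall-clock ratio and the average number of busy CPU cores; it assumes the summary format printed in the logs above, and the names `parse_time_line`, `slowdown`, `gpu_cores_busy`, and `cpu_cores_busy` are illustrative.

```python
import re

# Matches summaries such as "27.50s user 2.30s system 100% cpu 29.658 total".
TIME_LINE = re.compile(
    r"([\d.]+)s user ([\d.]+)s system (\d+)% cpu ([\d.]+) total"
)


def parse_time_line(line):
    """Extract user, system, and wall-clock seconds from a time summary."""
    match = TIME_LINE.search(line)
    if match is None:
        raise ValueError("unrecognised time summary: %r" % line)
    user, system, _percent, wall = match.groups()
    return {"user": float(user), "system": float(system), "wall": float(wall)}


# Summary lines copied from the two runs in this report.
gpu = parse_time_line(
    "python main.py 27.50s user 2.30s system 100% cpu 29.658 total")
cpu = parse_time_line(
    "python main.py - 34.33s user 10.68s system 621% cpu 7.241 total")

# Wall-clock ratio of the GPU run to the CPU run.
slowdown = gpu["wall"] / cpu["wall"]
# Average number of busy CPU cores: (user + system) / wall.
gpu_cores_busy = (gpu["user"] + gpu["system"]) / gpu["wall"]
cpu_cores_busy = (cpu["user"] + cpu["system"]) / cpu["wall"]

print("GPU run / CPU run wall time: %.2f" % slowdown)
print("busy cores, GPU run: %.2f" % gpu_cores_busy)
print("busy cores, CPU run: %.2f" % cpu_cores_busy)
```

On these numbers the GPU run takes roughly four times as long in wall-clock terms. The CPU run keeps about six cores busy on average, while the GPU run keeps about one host core busy for its whole duration.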