ARROW-1744: [Plasma] Provide TensorFlow operator to transfer Tensors between Plasma and TensorFlow #2046
Conversation
Force-pushed from bd9aa43 to 5d0174a
Codecov Report
@@ Coverage Diff @@
## master #2046 +/- ##
==========================================
+ Coverage 86.28% 86.31% +0.03%
==========================================
Files 242 244 +2
Lines 41042 41192 +150
==========================================
+ Hits 35413 35556 +143
- Misses 5629 5636 +7
Continue to review full report at Codecov.
Force-pushed from 4a09f87 to 3cc3362
Just a bit curious: this sounds like a translation layer between TensorFlow data and Plasma objects. Is the purpose of this change to let TensorFlow use Plasma as its store?

@zhijunfu This op is meant to transfer data between Plasma and TensorFlow. Supporting machine learning workloads seamlessly is very important for us, and we want the interop between TensorFlow and Plasma to be as high-performance as possible. The op will be fully optional: a flag that is off by default determines whether it gets built and included, so users who don't want the op will have no dependency on TensorFlow. You are right that the op interacts with Plasma through the client interface. [This is work in progress] Also cc'ing @concretevitamin, who is the author of the op.

@pcmoritz Got it. Thanks for the explanation.
Force-pushed from 8644be8 to ae2e7ca
Force-pushed from e3c44db to 6a44a5b
Force-pushed from 6a44a5b to c705d53
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
include_directories(${TensorFlow_INCLUDE_DIRS})
include_directories(${TensorFlow_INCLUDE_DIRS}/external/nsync/public)
Hmm, (1) is nsync not included in the tf.sysconfig output? (2) Orthogonal to the last question, is this needed here? I never needed it when compiling by hand.
This is only needed for Python 2.7, and yes, it's not included in the tf.sysconfig output. See sadeepj/crfasrnn_keras#19, which describes the workaround I'm putting in.
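For reference, a minimal CMake sketch of the setup being discussed. The execute_process query and the variable name are illustrative (not necessarily what this PR's CMake does); tf.sysconfig.get_include() is the real TensorFlow API:

```cmake
# Query TensorFlow's include directory at configure time
# (assumes the python on PATH has TensorFlow installed).
execute_process(
  COMMAND python -c "import tensorflow as tf; print(tf.sysconfig.get_include())"
  OUTPUT_VARIABLE TensorFlow_INCLUDE_DIRS
  OUTPUT_STRIP_TRAILING_WHITESPACE)
include_directories(${TensorFlow_INCLUDE_DIRS})
# The nsync headers are not part of tf.sysconfig's output, so they are added
# explicitly; per the comment above, only needed for Python 2.7 builds.
include_directories(${TensorFlow_INCLUDE_DIRS}/external/nsync/public)
```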
cpp/src/plasma/tf_plasma_op.cc
Outdated
}
}

// #define PLASMA_CLIENT_DOES_NOT_EXIST 3
This commented-out block can be removed entirely.
doing this now
REGISTER_KERNEL_BUILDER(Name("TensorToPlasma").Device(DEVICE_CPU),
                        TensorToPlasmaOp<CPUDevice>);
REGISTER_KERNEL_BUILDER(Name("TensorToPlasma").Device(DEVICE_GPU),
Please wrap these two GPU registrations in a GOOGLE_CUDA guard.
doing this now
REGISTER_KERNEL_BUILDER(Name("PlasmaToTensor").Device(DEVICE_CPU),
                        PlasmaToTensorOp<CPUDevice>);
REGISTER_KERNEL_BUILDER(Name("PlasmaToTensor").Device(DEVICE_GPU),
ditto
doing this now
cpp/src/plasma/tf_plasma_op.cc
Outdated
const int64_t size_in_bytes = object_buffer.data->size();
TensorShape shape({size_in_bytes / sizeof(float)});
// LOG(INFO) << "Output TensorShape: " << shape.DebugString();
All commented-out code can be removed.
doing this now
cpp/src/plasma/tf_plasma_op.cc
Outdated
OP_REQUIRES_ASYNC(context, success,
                  errors::Internal("D2H memcpy failed to be enqueued."), done);
}
// TODO(zongheng): does std::move() give better performance?
can remove
yep
cpp/src/plasma/tf_plasma_op.cc
Outdated
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/mutex.h"
#ifdef GOOGLE_CUDA
Group this with the gpu_event_mgr include.
ok
robertnishihara left a comment
Nice work @pcmoritz @concretevitamin!
std::string plasma_store_socket_name_;
std::string plasma_manager_socket_name_;

mutex mu_;
Please document all of the private fields. In particular, mu_ guards more than just client_, right?
sess.run(to_plasma)
# NOTE(zongheng): currently it returns a flat 1D tensor.
# So reshape manually.
If we serialize data as Arrow tensors, it should be easy to get rid of this limitation.
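A minimal numpy sketch of the reshape workaround the test has to resort to (numpy arrays stand in for TensorFlow tensors; the shapes are illustrative):

```python
import numpy as np

# The op currently hands back a flat 1D float32 vector, so the caller must
# remember the original shape and reshape manually. Serializing as an Arrow
# tensor would carry the shape along and remove this step.
original = np.arange(12, dtype=np.float32).reshape(3, 4)
flat = original.ravel()        # what the op currently returns
restored = flat.reshape(3, 4)  # manual reshape on the caller's side
assert np.array_equal(restored, original)
```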
if (std::is_same<Device, CPUDevice>::value) {
  for (int i = 0; i < num_tensors; ++i) {
    const auto& input_tensor = context->input(i);
    std::memcpy(static_cast<void*>(data + offsets[i] / sizeof(float)),
Let's serialize data as Arrow tensors.
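To illustrate the pointer arithmetic above: offsets are accumulated in bytes, but the destination pointer is float*, so each offset is divided by sizeof(float) before indexing. A small numpy sketch (variable names are illustrative; numpy slicing stands in for std::memcpy):

```python
import numpy as np

SIZEOF_FLOAT = 4  # sizeof(float) in the C++ code

# One byte offset per input tensor, accumulated like the C++ code does.
tensors = [np.ones(3, dtype=np.float32), np.full(2, 2.0, dtype=np.float32)]
offsets = [0]
for t in tensors:
    offsets.append(offsets[-1] + t.nbytes)

# Convert each byte offset to a float-element index before copying.
dest = np.empty(sum(t.size for t in tensors), dtype=np.float32)
for t, off in zip(tensors, offsets):
    start = off // SIZEOF_FLOAT
    dest[start:start + t.size] = t  # stands in for std::memcpy
assert dest.tolist() == [1.0, 1.0, 1.0, 2.0, 2.0]
```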
plasma_manager_socket_name="")

def FromPlasma():
  return plasma.tf_plasma_op.plasma_to_tensor(
Can you add a page documenting the API and giving code examples for how to use it?
cpp/src/plasma/tf_plasma_op.cc
Outdated
const int64_t size_in_bytes = object_buffer.data->size();
TensorShape shape({size_in_bytes / sizeof(float)});
// LOG(INFO) << "Output TensorShape: " << shape.DebugString();
// LOG(INFO) << "size_in_bytes of the plasma object: " << size_in_bytes;
Let's remove all the dead code.
}

const int64_t size_in_bytes = object_buffer.data->size();
TensorShape shape({size_in_bytes / sizeof(float)});
We should make sure we support different dtypes and preserve the tensor type.
This code looks like it might fail for tensors whose dtype is smaller than 4 bytes.
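The concern is easy to demonstrate with numpy: dividing the byte size by sizeof(float) only recovers the element count when the stored dtype is 4 bytes wide (the dtypes below are illustrative):

```python
import numpy as np

arr16 = np.zeros(6, dtype=np.float16)  # 2-byte dtype, 12 bytes total
arr32 = np.zeros(6, dtype=np.float32)  # 4-byte dtype, 24 bytes total

# Shape inferred as size_in_bytes / sizeof(float), mirroring the op:
assert arr32.nbytes // 4 == 6  # correct for float32
assert arr16.nbytes // 4 == 3  # wrong: the float16 tensor has 6 elements
```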
It'd be nice to ship GPU and CPU versions of the op with pyarrow. Is that feasible?
TF's core may change across versions. For example, I've had to rebuild some of my ops and Horovod after a TF upgrade. Therefore it's probably not feasible to ship the op as a binary like the rest of pyarrow; it'll have to be built at install time.
I guess we can build against CUDA on Travis but not test with a real GPU. We should ensure that we can build all artefacts within Travis, otherwise it will be hard to make any binary releases. I'm also wondering whether this functionality would be better off as a separate Python package, since it may often need to be compiled from scratch. I would expect that we should be able to ship most parts of
As we ship wheels with
Replaced by #2104.
…between Plasma and TensorFlow

This is based off of #2046 from @pcmoritz and @concretevitamin, who wrote the original TensorFlow op with help from @ericl. In addition to the previous PR, this supports serialization of tensors of arbitrary type.

To get it working, Arrow has to be compiled in the following way:

```
cmake -DARROW_PLASMA=on -DARROW_TENSORFLOW=on -DARROW_PYTHON=on ..
make -j8
sudo make install
```

And pyarrow like this:

```
PYARROW_WITH_PLASMA=1 PYARROW_WITH_TENSORFLOW=1 python setup.py develop
```

The PR also includes utilities that should be generally useful for converting between TensorFlow and Arrow. It is currently meant for transferring weights in and out of TensorFlow, but hopefully in the future there will also be support for data tensors.

Author: Philipp Moritz <[email protected]>
Author: Peter Schafhalter <[email protected]>

Closes #2104 from pschafhalter/tensor-serialization and squashes the following commits: 50c6635 <Philipp Moritz> don't import tensorflow on manylinux1 3b7bbc7 <Philipp Moritz> skip tests that use tensorflow inside of manylinux1 container 097d48d <Philipp Moritz> Merge branch 'master' into tensor-serialization df44982 <Philipp Moritz> Merge branch 'master' into tensor-serialization 13e4db3 <Philipp Moritz> remove setting the deleted variable to nullptr b79c710 <Philipp Moritz> more fixes 200ad89 <Philipp Moritz> move build flags earlier b4b4134 <Philipp Moritz> tensorflow namespace b0731ad <Philipp Moritz> fixes 10e369a <Philipp Moritz> more fixes 01e580c <Philipp Moritz> more cleanups 027d683 <Philipp Moritz> fix e494f28 <Philipp Moritz> clean up code 1ef52d0 <Philipp Moritz> fix 90757c1 <Philipp Moritz> fix gpu code d5d3740 <Philipp Moritz> fix d137fcf <Philipp Moritz> compile pyarrow without cxx-11 ABI too 81f34ff <Philipp Moritz> fix b4e02e1 <Philipp Moritz> fix test 749e21e <Philipp Moritz> cleanups and test fb8786e <Philipp Moritz> fix bc3a5bb <Philipp Moritz> cleanups 56c9cf1 <Philipp Moritz> fix c6066ab <Philipp Moritz> fix 2fe3523 <Philipp Moritz> cleanups 4dc4ff3 <Philipp Moritz> fix TensorFlow test 3264efe <Philipp Moritz> test acc1f0b <Philipp Moritz> fix test 372551f <Philipp Moritz> pkg-config 64ee050 <Philipp Moritz> fix test d83b9ae <Philipp Moritz> fix test 2385e7c <Philipp Moritz> fix af4eb2f <Philipp Moritz> cleanups 6a34556 <Philipp Moritz> move things around afa5dba <Philipp Moritz> fix b29bf39 <Philipp Moritz> linting 14c9e50 <Philipp Moritz> fix linting eb1a576 <Philipp Moritz> fix 47a21c9 <Philipp Moritz> add license 1c8c5fa <Philipp Moritz> Merge branch 'master' into tensor-serialization 1442be0 <Philipp Moritz> test more dtypes ead1a2e <Philipp Moritz> test af68371 <Philipp Moritz> debug2 a2131ad <Philipp Moritz> debug 531f4a7 <Philipp Moritz> fix be83a20 <Philipp Moritz> try datatype 077266b <Philipp Moritz> fixes 1164d5a <Philipp Moritz> fixes 4f641c3 <Philipp Moritz> fix warnings a674720 <Philipp Moritz> fix 140a2b9 <Philipp Moritz> needs debugging 8b1dd8c <Philipp Moritz> get write path working d7ead2b <Philipp Moritz> fix d291f64 <Philipp Moritz> fix f461b38 <Philipp Moritz> fix linting 3b01abe <Philipp Moritz> fix 232ed6f <Philipp Moritz> add tensor utilities f7466de <Peter Schafhalter> Formatting 3ed436b <Peter Schafhalter> Formatting f4ad8e5 <Peter Schafhalter> Check that all tensors have the same dtype f601bb0 <Peter Schafhalter> Convert Tensor type to Arrow 4167d1d <Peter Schafhalter> Fix tensor serialization e261f2c <Philipp Moritz> fixes for the op d381ff3 <Philipp Moritz> fixes c705d53 <Philipp Moritz> fix for python 2.7 22b5564 <Philipp Moritz> fix wheel building ae2e7ca <Philipp Moritz> adapt code for arrow f6cef78 <Philipp Moritz> add initial tensorflow op written by @concretevitamin