
Conversation

@pcmoritz
Contributor

@pcmoritz pcmoritz commented May 15, 2018

written by @concretevitamin

@pcmoritz pcmoritz force-pushed the tensorflow-plasma-op branch 11 times, most recently from bd9aa43 to 5d0174a Compare May 16, 2018 23:17
@codecov-io

codecov-io commented May 17, 2018

Codecov Report

Merging #2046 into master will increase coverage by 0.03%.
The diff coverage is 95.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2046      +/-   ##
==========================================
+ Coverage   86.28%   86.31%   +0.03%     
==========================================
  Files         242      244       +2     
  Lines       41042    41192     +150     
==========================================
+ Hits        35413    35556     +143     
- Misses       5629     5636       +7
| Impacted Files | Coverage Δ | |
|----------------|------------|---|
| python/pyarrow/plasma.py | 86.27% <77.77%> (-1.83%) | ⬇️ |
| python/pyarrow/tests/test_plasma_tf_op.py | 91.42% <91.42%> (ø) | |
| cpp/src/plasma/tf_plasma_op.cc | 98.19% <98.19%> (ø) | |
| cpp/src/arrow/util/memory.h | 100% <0%> (ø) | ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f319bca...e261f2c. Read the comment docs.

@pcmoritz pcmoritz force-pushed the tensorflow-plasma-op branch 4 times, most recently from 4a09f87 to 3cc3362 Compare May 17, 2018 04:41
@zhijunfu
Contributor

Just a bit curious: this sounds like a translation layer between TF data and Plasma objects. Is the purpose of the change to let TensorFlow use Plasma as its store?
Also, the code seems to mainly use the Plasma client to access the store, so functionality-wise it doesn't look like it needs to be part of Plasma, if I understand correctly. Would you mind explaining the considerations for including this as part of Plasma? Thanks!

@pcmoritz
Contributor Author

pcmoritz commented May 17, 2018

@zhijunfu This op is meant to transfer data between Plasma and TensorFlow. Supporting machine learning workloads seamlessly is very important for us, and we want the interop between TensorFlow and Plasma to be as high performance as possible. The op will be fully optional: a build flag that is off by default determines whether it gets built and included, so users who don't want the op will have no dependency on TensorFlow.

You are right that the op interacts with plasma through the client interface.

[This is work in progress]

Also cc'ing @concretevitamin, who is the author of the op.
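
To make the intended workflow concrete, here is a rough sketch of how the op would be used from Python. The op and parameter names are taken from this diff (`tensor_to_plasma`, `plasma_to_tensor`, the socket-name keyword arguments); the exact signatures and how the op library gets loaded may still change, and the socket path and object ID handling below are purely illustrative.

```python
import numpy as np
import tensorflow as tf
import pyarrow.plasma as plasma

# Assumption: a plasma store is already running at this socket path.
store_socket = "/tmp/plasma"

# 20-byte object ID under which the tensor contents will be stored (illustrative).
object_id = np.random.bytes(20)

data = tf.constant(np.random.randn(100).astype(np.float32))

# Copy the TF tensor's bytes into a Plasma object ...
to_plasma = plasma.tf_plasma_op.tensor_to_plasma(
    [data], object_id,
    plasma_store_socket_name=store_socket,
    plasma_manager_socket_name="")

# ... and read them back as a flat float32 TF tensor.
from_plasma = plasma.tf_plasma_op.plasma_to_tensor(
    object_id,
    plasma_store_socket_name=store_socket,
    plasma_manager_socket_name="")

with tf.Session() as sess:
    sess.run(to_plasma)
    result = sess.run(from_plasma)
```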

@zhijunfu
Contributor

@pcmoritz Got it. Thanks for the explanations.

@pcmoritz pcmoritz force-pushed the tensorflow-plasma-op branch 5 times, most recently from 8644be8 to ae2e7ca Compare May 19, 2018 21:33
@pcmoritz pcmoritz force-pushed the tensorflow-plasma-op branch 3 times, most recently from e3c44db to 6a44a5b Compare May 20, 2018 07:11
@pcmoritz pcmoritz force-pushed the tensorflow-plasma-op branch from 6a44a5b to c705d53 Compare May 20, 2018 07:27
@pcmoritz pcmoritz changed the title from [WIP] Tensorflow plasma op to ARROW-1744: [Plasma] Provide TensorFlow operator to transfer Tensors between Plasma and TensorFlow May 20, 2018
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
include_directories(${TensorFlow_INCLUDE_DIRS})
include_directories(${TensorFlow_INCLUDE_DIRS}/external/nsync/public)


Hmm, (1) nsync is not included in the tf.sysconfig output? (2) Orthogonal to the last question, is this needed here? I never needed it when compiling by hand.

Contributor Author


This is only needed for Python 2.7, and yes, it's not included in the tf.sysconfig output. See sadeepj/crfasrnn_keras#19, which is what I'm putting in.
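
For reference, the TensorFlow include directory normally comes from tf.sysconfig; the nsync headers are not reported there, so the path is appended by hand, mirroring the CMake line above. A small sketch (the layout under external/ is what the CMake code assumes, not something tf.sysconfig guarantees):

```python
import tensorflow as tf

# Header directory for building custom ops against the installed TensorFlow.
tf_include = tf.sysconfig.get_include()

# nsync headers are not part of the tf.sysconfig output, so they are added
# manually, mirroring the include_directories line above.
nsync_include = tf_include + "/external/nsync/public"

print(tf_include)
print(nsync_include)
```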

}
}

// #define PLASMA_CLIENT_DOES_NOT_EXIST 3


This commented-out block can all be removed.

Contributor Author

@pcmoritz pcmoritz May 20, 2018


doing this now


REGISTER_KERNEL_BUILDER(Name("TensorToPlasma").Device(DEVICE_CPU),
TensorToPlasmaOp<CPUDevice>);
REGISTER_KERNEL_BUILDER(Name("TensorToPlasma").Device(DEVICE_GPU),


Please wrap these two GPU registrations with a GOOGLE_CUDA guard.

Contributor Author

@pcmoritz pcmoritz May 20, 2018


doing this now


REGISTER_KERNEL_BUILDER(Name("PlasmaToTensor").Device(DEVICE_CPU),
PlasmaToTensorOp<CPUDevice>);
REGISTER_KERNEL_BUILDER(Name("PlasmaToTensor").Device(DEVICE_GPU),


ditto

Contributor Author


doing this now


const int64_t size_in_bytes = object_buffer.data->size();
TensorShape shape({size_in_bytes / sizeof(float)});
// LOG(INFO) << "Output TensorShape: " << shape.DebugString();


All commented-out code can be removed.

Contributor Author


doing this now

OP_REQUIRES_ASYNC(context, success,
errors::Internal("D2H memcpy failed to be enqueued."), done);
}
// TODO(zongheng): does std::move() give better performance?


can remove

Contributor Author


yep

#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/mutex.h"
#ifdef GOOGLE_CUDA


Group this with the gpu_event_mgr include.

Contributor Author


ok

Contributor

@robertnishihara robertnishihara left a comment


std::string plasma_store_socket_name_;
std::string plasma_manager_socket_name_;

mutex mu_;
Contributor


Please document all of the private fields. In particular, mu_ guards more than just client_, right?


sess.run(to_plasma)
# NOTE(zongheng): currently it returns a flat 1D tensor.
# So reshape manually.
Contributor


If we serialize data as Arrow tensors, it should be easy to get rid of this limitation.
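
For context, the manual reshape mentioned in the NOTE above amounts to something like this (pure numpy, with a made-up shape just for illustration):

```python
import numpy as np

# The op currently hands back a flat 1-D float32 tensor, so the caller has to
# remember the original shape and restore it by hand.
original = np.random.randn(10, 10).astype(np.float32)

flat = original.ravel()                    # what plasma_to_tensor conceptually returns
restored = flat.reshape(original.shape)    # manual reshape on the consumer side

assert np.array_equal(restored, original)
```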

if (std::is_same<Device, CPUDevice>::value) {
for (int i = 0; i < num_tensors; ++i) {
const auto& input_tensor = context->input(i);
std::memcpy(static_cast<void*>(data + offsets[i] / sizeof(float)),
Contributor


Let's serialize data as Arrow tensors.
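
For reference, writing a tensor into a Plasma object as an Arrow Tensor (which preserves both dtype and shape) looks roughly like this from Python. The socket path is illustrative, and pyarrow's plasma/ipc APIs have shifted a bit across versions, so treat the exact calls as a sketch rather than what this PR would use:

```python
import numpy as np
import pyarrow as pa
import pyarrow.plasma as plasma

# Assumption: a plasma store is already running at this socket path.
client = plasma.connect("/tmp/plasma")

arr = np.random.randn(4, 5).astype(np.float32)
tensor = pa.Tensor.from_numpy(arr)  # Arrow Tensor keeps dtype and shape metadata

# Allocate a Plasma object large enough for the serialized tensor, write it, seal it.
object_id = plasma.ObjectID(np.random.bytes(20))
buf = client.create(object_id, pa.ipc.get_tensor_size(tensor))
pa.ipc.write_tensor(tensor, pa.FixedSizeBufferWriter(buf))
client.seal(object_id)

# Reading it back recovers the data together with its shape and dtype.
[data] = client.get_buffers([object_id])
restored = pa.ipc.read_tensor(pa.BufferReader(data)).to_numpy()
assert restored.shape == (4, 5) and restored.dtype == np.float32
```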

plasma_manager_socket_name="")

def FromPlasma():
return plasma.tf_plasma_op.plasma_to_tensor(
Contributor


Can you add a page documenting the API and giving code examples for how to use it?

const int64_t size_in_bytes = object_buffer.data->size();
TensorShape shape({size_in_bytes / sizeof(float)});
// LOG(INFO) << "Output TensorShape: " << shape.DebugString();
// LOG(INFO) << "size_in_bytes of the plasma object: " << size_in_bytes;
Contributor


Let's remove all the dead code.

}

const int64_t size_in_bytes = object_buffer.data->size();
TensorShape shape({size_in_bytes / sizeof(float)});
Contributor


We should make sure we support different dtypes and preserve the tensor type.

This code looks like it might fail for tensors whose dtype is smaller than 4 bytes: for example, a uint8 tensor with 6 elements has size_in_bytes = 6, so size_in_bytes / sizeof(float) truncates to 1.

@robertnishihara
Contributor

It'd be nice to ship GPU and CPU versions of the op with pyarrow. Is that feasible?

@pcmoritz
Contributor Author

I don't know if we can build against CUDA on Travis and/or get a GPU machine for building. @wesm @xhochy Any thoughts? Moving forward, I think this will become more important for the project.

@ppwwyyxx

TF's core may change across versions. For example, I've had to rebuild some of my ops and horovod after a TF upgrade.
Some of TF's build flags also affect binary compatibility, for example "GLIBCXX_USE_CXX11_ABI".

Therefore it's probably not feasible to ship the op as a binary like the rest of pyarrow. It'll have to be built during pip install.

@xhochy
Member

xhochy commented May 21, 2018

I guess we can build against CUDA on Travis but not test with a real GPU. We should ensure that we can build all artefacts within Travis, otherwise it will be hard to make any binary releases.

I'm wondering whether it may be better to make this functionality available as a separate Python package, since it may often need to be compiled from scratch. I would expect that we can ship most parts of pyarrow precompiled on conda-forge and PyPI so that they support a wide range of installations, but the TensorFlow op may be a bit more picky. Even as a separate package, I would keep it in the Arrow source tree and always release it as part of an Arrow release, so it profits directly from all Arrow enhancements and the supporting build infrastructure.

Some of TF's build flags will affect binary compatibility. For example "GLIBCXX_USE_CXX11_ABI".

pyarrow wheels are always built with GLIBCXX_USE_CXX11_ABI=0 as required by the manylinux1 standard.

Therefore it's probably not feasible to ship the op as a binary like the rest of pyarrow. It'll have to be built during pip install.

As we ship wheels with pip, it cannot be compiled during pip install pyarrow: for a wheel there is no real install/compile step involved, it is more or less simply extracted.

@pcmoritz
Contributor Author

replaced by #2104

@pcmoritz pcmoritz closed this Jun 21, 2018
robertnishihara pushed a commit that referenced this pull request Jul 17, 2018
…between Plasma and TensorFlow

This is based off of #2046 from @pcmoritz and @concretevitamin, who wrote the original TensorFlow Op with help from @ericl.

In addition to the previous PR, this supports serialization of arbitrary type tensors.

To get it working, arrow has to be compiled in the following way:

```
cmake -DARROW_PLASMA=on -DARROW_TENSORFLOW=on -DARROW_PYTHON=on ..
make -j8
sudo make install
```

And pyarrow like this:

```
PYARROW_WITH_PLASMA=1 PYARROW_WITH_TENSORFLOW=1 python setup.py develop
```

The PR also includes utilities that should be generally useful for converting between TensorFlow and Arrow. It is currently intended to be used to transfer weights in and out of TensorFlow, but hopefully in the future there will also be support for data tensors.

Author: Philipp Moritz <[email protected]>
Author: Peter Schafhalter <[email protected]>

Closes #2104 from pschafhalter/tensor-serialization and squashes the following commits:

50c6635 <Philipp Moritz> don't import tensorflow on manylinux1
3b7bbc7 <Philipp Moritz> skip tests that use tensorflow inside of manylinux1 container
097d48d <Philipp Moritz> Merge branch 'master' into tensor-serialization
df44982 <Philipp Moritz> Merge branch 'master' into tensor-serialization
13e4db3 <Philipp Moritz> remove setting the deleted variable to nullptr
b79c710 <Philipp Moritz> more fixes
200ad89 <Philipp Moritz> move build flags earlier
b4b4134 <Philipp Moritz> tensorflow namespace
b0731ad <Philipp Moritz> fixes
10e369a <Philipp Moritz> more fixes
01e580c <Philipp Moritz> more cleanups
027d683 <Philipp Moritz> fix
e494f28 <Philipp Moritz> clean up code
1ef52d0 <Philipp Moritz> fix
90757c1 <Philipp Moritz> fix gpu code
d5d3740 <Philipp Moritz> fix
d137fcf <Philipp Moritz> compile pyarrow without cxx-11 ABI too
81f34ff <Philipp Moritz> fix
b4e02e1 <Philipp Moritz> fix test
749e21e <Philipp Moritz> cleanups and test
fb8786e <Philipp Moritz> fix
bc3a5bb <Philipp Moritz> cleanups
56c9cf1 <Philipp Moritz> fix
c6066ab <Philipp Moritz> fix
2fe3523 <Philipp Moritz> cleanups
4dc4ff3 <Philipp Moritz> fix TensorFlow test
3264efe <Philipp Moritz> test
acc1f0b <Philipp Moritz> fix test
372551f <Philipp Moritz> pkg-config
64ee050 <Philipp Moritz> fix test
d83b9ae <Philipp Moritz> fix test
2385e7c <Philipp Moritz> fix
af4eb2f <Philipp Moritz> cleanups
6a34556 <Philipp Moritz> move things around
afa5dba <Philipp Moritz> fix
b29bf39 <Philipp Moritz> linting
14c9e50 <Philipp Moritz> fix linting
eb1a576 <Philipp Moritz> fix
47a21c9 <Philipp Moritz> add license
1c8c5fa <Philipp Moritz> Merge branch 'master' into tensor-serialization
1442be0 <Philipp Moritz> test more dtypes
ead1a2e <Philipp Moritz> test
af68371 <Philipp Moritz> debug2
a2131ad <Philipp Moritz> debug
531f4a7 <Philipp Moritz> fix
be83a20 <Philipp Moritz> try datatype
077266b <Philipp Moritz> fixes
1164d5a <Philipp Moritz> fixes
4f641c3 <Philipp Moritz> fix warnings
a674720 <Philipp Moritz> fix
140a2b9 <Philipp Moritz> needs debugging
8b1dd8c <Philipp Moritz> get write path working
d7ead2b <Philipp Moritz> fix
d291f64 <Philipp Moritz> fix
f461b38 <Philipp Moritz> fix linting
3b01abe <Philipp Moritz> fix
232ed6f <Philipp Moritz> add tensor utilities
f7466de <Peter Schafhalter> Formatting
3ed436b <Peter Schafhalter> Formatting
f4ad8e5 <Peter Schafhalter> Check that all tensors have the same dtype
f601bb0 <Peter Schafhalter> Convert Tensor type to Arrow
4167d1d <Peter Schafhalter> Fix tensor serialization
e261f2c <Philipp Moritz> fixes for the op
d381ff3 <Philipp Moritz> fixes
c705d53 <Philipp Moritz> fix for python 2.7
22b5564 <Philipp Moritz> fix wheel building
ae2e7ca <Philipp Moritz> adapt code for arrow
f6cef78 <Philipp Moritz> add initial tensorflow op