Codestin Search App

ssnl · 2018-03-17T18:53:18Z

This builds upon #5855 , and is the second of three PRs that #5537 will be split into.

ezyang · 2018-03-19T19:41:40Z

Forgot to check in a file?

ssnl · 2018-03-19T19:43:27Z

@ezyang This should be built upon #5855 , but it doesn't include the commits from that. I didn't know how to split a PR. :P I'll fix this.

Sign in to view

ssnl · 2018-03-23T17:14:35Z

Let me add enough text/comments on this PR and ping you afterwards. :)

…

On Fri, Mar 23, 2018 at 13:12 Edward Z. Yang ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In aten/src/ATen/native/cuda/SpectralOps.cu <#5856 (comment)>: > +#include <thrust/unique.h> +#include <cufft.h> +#include <cufftXt.h> +#include <cmath> +#include <numeric> +#include <iostream> + +namespace at { namespace native { + +__forceinline__ +static bool is_pow_of_two(long long int x) { + return (x & (x - 1)) == 0; +} + +// counting_iterator => index to fill +struct cnt_to_dst_idx_functor : public thrust::unary_function<int64_t, int64_t> What does this struct do — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#5856 (review)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFaWZaoA9J8E0E59RDHeznKHKsA__oLMks5thS0ZgaJpZM4Su66h> .

ssnl · 2018-03-23T17:16:26Z

Cufft uses long long int. I don’t think int64_t is equivalent with long long int. Maybe I’m wrong?

…

On Fri, Mar 23, 2018 at 13:15 Edward Z. Yang ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In aten/src/ATen/native/cuda/SpectralOps.cu <#5856 (comment)>: > + IntList output_sizes) { + Tensor input = self; + + bool is_half = input.type().scalarType() == ScalarType::Half; + + // cuFFT requires input and output data pointers to complex type aligned + // our allocated output tensor is always 256 bytes aligned so it is fine, but + // we need to check input tensor to make sure that it is not unaligned, e.g. + // from a slicing. + auto complex_size_bytes = 2 * input.type().elementSizeInBytes(); + if (reinterpret_cast<std::uintptr_t>(input.data_ptr()) % complex_size_bytes != 0) { + input = self.clone(); + } + + // check input batch size + long long int batch = input.size(0); int64_t seems more idiomatic here — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#5856 (review)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFaWZTvUdTicBUWsRk9_cjBfG2ft50Haks5thS2XgaJpZM4Su66h> .

Sign in to view

ssnl · 2018-03-23T17:33:22Z

Yes. One can use multiple gpu to do large FFT. But we don’t do it here so it’s fine.

…

On Fri, Mar 23, 2018 at 13:32 Edward Z. Yang ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In aten/src/ATen/native/cuda/SpectralOps.cu <#5856 (comment)>: > + auto data = device_ptr(input.data<cuda_scalar_t>()); + dst_iterator dsts(data, dst_idxs); + src_iterator srcs(dst_idxs, dst_idx_to_src_functor<cuda_scalar_t>(input)); + thrust::copy_n(policy, srcs, n, dsts); + }); +} + +// cufft +Tensor _fft_cufft(const Tensor& self, int64_t signal_ndim, + bool complex_input, bool complex_output, + bool inverse, IntList checked_signal_sizes, + bool normalized, bool onesided, + IntList output_sizes) { + Tensor input = self; + + bool is_half = input.type().scalarType() == ScalarType::Half; I am reading the half restrictions and they state "More than one GPU is not supported". Do you know what this means? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#5856 (review)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFaWZa1dENEDaz5aG1KealWc8f0Ythi0ks5thTGbgaJpZM4Su66h> .

Sign in to view

ssnl · 2018-03-23T17:41:07Z

It is in the pass through wrapper function.

…

On Fri, Mar 23, 2018 at 13:39 Edward Z. Yang ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In aten/src/ATen/native/SpectralOps.cpp <#5856 (comment)>: > + const bool onesided) { + + if (signal_ndim < 1 || signal_ndim > 3) { + std::ostringstream ss; + ss << "Expected signal_ndim to be 1, 2, or 3, but got signal_ndim=" + << signal_ndim; + throw std::runtime_error(ss.str()); + } + if (!at::isFloatingType(self.type().scalarType())) { + std::ostringstream ss; + ss << "Expected an input tensor of floating types, but got input " + << self.type() << self.sizes(); + throw std::runtime_error(ss.str()); + } + + auto signal_tensor_ndim = signal_ndim + (int) complex_input; // add complex dim Also, you're using a contiguous inner dimension to simulate complex numbers. Is the contiguity of the inner dimension being checked somewhere? I couldn't find it. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#5856 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFaWZY8YFGooePV_S0CNHIGhJ8t1TePgks5thTMbgaJpZM4Su66h> .

Sign in to view

ezyang · 2018-03-23T17:56:32Z

Still haven't reviewed the mkl. Just cufft and the wrapper.

ssnl · 2018-03-23T21:42:43Z

.... omg why doesn't github email reply directly reply to the review comment.

Sign in to view

+  // We need to clone the input tensor in these two cases.
+  auto complex_size_bytes = 2 * input.type().elementSizeInBytes();
+  if ((reinterpret_cast<std::uintptr_t>(input.data_ptr()) % complex_size_bytes != 0 ||
+      (complex_input && !complex_output && input.stride(signal_ndim) != 2))) {


Sign in to view

+      std::ostringstream ss;
+      ss << "cuFFT doesn't support signals of half type with compute "
+         << "capability less than SM_53, but the device containing input half "
+         << "tensor only has SM_" << dev_prop->major << dev_prop->minor;


Sign in to view

+// including cuFFT and MKL.
+//
+// cuFFT doc: http://docs.nvidia.com/cuda/cufft/index.html#multi-dimensional
+// MKL doc: https://software.intel.com/en-us/mkl-developer-reference-c-dfti-complex-storage-dfti-real-storage-dfti-conjugate-even-storage#CONJUGATE_EVEN_STORAGE


ssnl · 2018-03-28T15:03:02Z

@ezyang Ready for round 2 review :)

Sign in to view

+// signals, but only contains half (onesided) of the values.
+// This function modifies inplace.
+__forceinline__
+static void _fft_fill_with_conjugate_symmetry_(Tensor& input,


Sign in to view

+  bool need_contiguous = input.stride(signal_ndim) == 0;
+  if (complex_input) {
+    // Real/imag dimension must be like complex type. Need to make the input
+    // tensor conitugous if this dimension is not contiguous.


Sign in to view

+  // need to check input and output size and strides
+  // be careful about complex domain, where the stride needs to be divided by 2
+  // only need to test upper bound MKL_LONG_MAX as these values are non-negative
+  if (sizeof(MKL_LONG) < sizeof(int64_t)) {


Sign in to view

+  });
+}
+
+// Returns undefined tensor if the parameter range is greater than MKL_LONG.


* implement fft ifft rfft irfft * add tests for fft ifft rfft irfft

onnxbot-worker-3 mentioned this pull request Mar 17, 2018

[auto] pytorch-pr-5856 onnxbot/onnx-fb-universe#1183

Closed

ssnl force-pushed the fft_fwd branch 4 times, most recently from 5871d40 to 1e3e1fd Compare March 21, 2018 21:33

ezyang mentioned this pull request Mar 23, 2018

[fft] [3 of 3] Implements backward of fft ifft rfft irfft #5537

Merged