Thanks to visit codestin.com
Credit goes to github.com

Skip to content
This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-349] Histogram Operator#10931

Merged
piiswrong merged 5 commits into
apache:masterfrom
haojin2:histogram
Jun 25, 2018
Merged

[MXNET-349] Histogram Operator#10931
piiswrong merged 5 commits into
apache:masterfrom
haojin2:histogram

Conversation

@haojin2
Copy link
Copy Markdown
Contributor

@haojin2 haojin2 commented May 14, 2018

Description

This PR implements histogram operator.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Implementation of internal _histogram operator
  • User-facing Python interface
  • Unit tests

Comments

The user-facing interface resembles basic behaviors of Numpy's histogram: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.histogram.html.
Three versions are supported:
histogram(x, bins=7) - divided into 7 equal bins in the range of [x.min, x.max]
histogram(x, bins=7, range=(a,b)) - divided into 7 equal bins in the range of [a, b], anything outside the range is ignored
histogram(x, bins=NDArray) - bin bounds are specified by the bins NDArray.

@haojin2 haojin2 requested a review from szha as a code owner May 14, 2018 08:18
@haojin2 haojin2 force-pushed the histogram branch 2 times, most recently from a1bf838 to 49a9ae4 Compare May 14, 2018 08:31
Comment thread python/mxnet/ndarray/ndarray.py Outdated
return _internal._histogram(data=a, bins=bins)
elif isinstance(bins, int):
if range_ is None:
range_ = (float(a.min().asnumpy()[0]), float(a.max().asnumpy()[0]))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't do this in the frontend.

Comment thread python/mxnet/ndarray/ndarray.py Outdated
elif isinstance(bins, int):
if range_ is None:
range_ = (float(a.min().asnumpy()[0]), float(a.max().asnumpy()[0]))
return _internal._histogram(data=a, bins=array([]), bin_cnt=bins, range=range_)
Copy link
Copy Markdown
Contributor

@piiswrong piiswrong May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why make an invalid array/

Copy link
Copy Markdown
Contributor Author

@haojin2 haojin2 May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bins is not used in this version of _histogram, but it's one of the inputs so I need a mock array.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define a mode variable in HistogramParam for distinguishing the cases of 1 and 2 ndarray inputs. When register the op, use set_num_inputs([](const NodeAttrs& attrs) to define the number of inputs based upon the mode.
See example:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/convolution.cc#L471

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/operator/tensor/histogram.cc Outdated
.add_argument("bins", "NDArray-or-Symbol", "Input ndarray")
.add_arguments(HistogramParam::__FIELDS__());

NNVM_REGISTER_OP(_backward_histogram)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historgram op is not differentiable. Don't register backward op.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will get rid of it.

@haojin2 haojin2 force-pushed the histogram branch 3 times, most recently from 7c44e7f to b17585a Compare May 14, 2018 20:50
Comment thread python/mxnet/ndarray/ndarray.py Outdated
return NDArray(handle=_new_alloc_handle(shape, ctx, False, dtype))


def histogram(a, bins=10, range_=None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why underscore suffix for range?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint complains if we override the keyword range

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piiswrong numpy.histogram has parameter range. Should we disable pylint check here to make mx.nd.histogram parameter naming consistent with numpy?

"""

# pylint: disable= no-member, protected-access
if isinstance(bins, NDArray):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check whether bins is a 1d array?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's checked inside the op

Comment thread python/mxnet/ndarray/ndarray.py Outdated
# pylint: disable= no-member, protected-access
if isinstance(bins, NDArray):
return _internal._histogram(data=a, bins=bins)
elif isinstance(bins, int):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace int with integer_types.

Comment thread tests/python/unittest/test_operator.py Outdated
def f(x, bins=10, range=None):
return np.histogram(x, bins, range=range)

shape = (3, 3, 3)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Randomize shape and test ndarrays from 1D-5D.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread tests/python/unittest/test_operator.py Outdated
x = rand_ndarray(shape, stype='default')
mx_bins = mx.nd.array([-1.0, 0.5, 2.0, 4.5, 50.0])
np_bins = mx_bins.asnumpy()
bin_cnt = 5
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should test more settings of bin_cnt.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread python/mxnet/ndarray/ndarray.py Outdated
elif isinstance(bins, int):
if range_ is None:
range_ = (float(a.min().asnumpy()[0]), float(a.max().asnumpy()[0]))
return _internal._histogram(data=a, bins=array([]), bin_cnt=bins, range=range_)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define a mode variable in HistogramParam for distinguishing the cases of 1 and 2 ndarray inputs. When register the op, use set_num_inputs([](const NodeAttrs& attrs) to define the number of inputs based upon the mode.
See example:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/convolution.cc#L471

Comment thread src/operator/tensor/histogram-inl.h Outdated
SHAPE_ASSIGN_CHECK(*out_attrs, 1, in_attrs->at(1));
}

return out_attrs->at(0).ndim() == 1U && out_attrs->at(0).Size() != 0U &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call shape_is_none to simplify the code. ndim == 1 should have been guaranteed before this line.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/operator/tensor/histogram-inl.h Outdated

TYPE_ASSIGN_CHECK(*out_attrs, 0, mshadow::kInt64);
TYPE_ASSIGN_CHECK(*out_attrs, 1, in_attrs->at(0));
return out_attrs->at(0) != -1 && out_attrs->at(1) != -1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call !type_is_none(out_attrs->at(i))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@haojin2 haojin2 force-pushed the histogram branch 4 times, most recently from 26a9d38 to c2a9f1d Compare May 30, 2018 20:38
@haojin2 haojin2 changed the title [MXNET-349] [WIP] Histogram Operator [MXNET-349] Histogram Operator May 30, 2018
@haojin2 haojin2 force-pushed the histogram branch 4 times, most recently from 240eecc to 55d6ccb Compare June 2, 2018 01:38
@haojin2
Copy link
Copy Markdown
Contributor Author

haojin2 commented Jun 2, 2018

@reminisce @piiswrong @anirudh2290 @rahul003 Unit tests added, this is ready, please give a review if you have time. Thanks!

Comment thread python/mxnet/ndarray/ndarray.py Outdated
res, bin_bounds = np.histogram(a.asnumpy(), bins=bins)
return array(res), array(bin_bounds)
return _internal._histogram(data=a, bin_cnt=bins, range=range)
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why return None? Shouldn't this cause an error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I should throw an exception?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an exception for the illegal input case.

@haojin2 haojin2 force-pushed the histogram branch 2 times, most recently from 39fe74e to c53e477 Compare June 4, 2018 23:47
Comment thread src/operator/tensor/histogram.cc Outdated
.set_attr<nnvm::FInferShape>("FInferShape", HistogramOpShape)
.set_attr<nnvm::FInferType>("FInferType", HistogramOpType)
.set_attr<FCompute>("FCompute<cpu>", HistogramOpForward<cpu>)
.set_attr<nnvm::FInplaceOption>("FInplaceOption",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to register this if it's empty.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/operator/tensor/histogram.cc Outdated
}
}

template<typename cpu>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a template specialization? Same for other places.

template<>
void HistogramForwardImpl<cpu>()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

template<>
void HistogramForwardImpl<cpu>(const OpContext& ctx,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this back to using OpContext? If using Stream for dispatching, you can get rid of the templates.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not using OpContext, I'm just getting rid of the stream argument here.

@haojin2 haojin2 force-pushed the histogram branch 3 times, most recently from 8689ebd to ee38e0c Compare June 6, 2018 20:39
@haojin2
Copy link
Copy Markdown
Contributor Author

haojin2 commented Jun 7, 2018

@piiswrong Should be ready for merge, please take a look when you have time, thanks!

@piiswrong
Copy link
Copy Markdown
Contributor

So symbolic is not supported?

@haojin2
Copy link
Copy Markdown
Contributor Author

haojin2 commented Jun 12, 2018

@piiswrong There's no backward pass for this function, I wonder if symbolic is necessary in this case?

@haojin2
Copy link
Copy Markdown
Contributor Author

haojin2 commented Jun 22, 2018

@piiswrong symbol is added for histogram.

@haojin2 haojin2 force-pushed the histogram branch 2 times, most recently from 8bc8e18 to 4595fd9 Compare June 23, 2018 16:24
@piiswrong piiswrong merged commit ed7e360 into apache:master Jun 25, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
* implementation of histogram operator

* address code reviews and code re-design

* add exception for invalid inputs

* address code reviews

* add symbol and symbolic forward check for histogram
@haojin2 haojin2 deleted the histogram branch March 25, 2019 08:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants