-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG] Raise exception on providing complex data to estimators #9551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Fixes rendering of docstring examples - Instead of importing cross_val_score in example, cross_validate is imported
…into complex_data merging changes from the master branch
…into complex_data Merging with upstream
sklearn/utils/validation.py
Outdated
if not hasattr(dtype_orig, 'kind'): | ||
# not a data type (e.g. a column named dtype in a pandas DataFrame) | ||
dtype_orig = None | ||
|
||
if dtype_orig is not None and dtype_orig.kind == "c": | ||
raise ValueError("Complex data is not supported\n{}\n".format(array)) | ||
elif isinstance(array, list) or isinstance(array, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the best way. We might also not catch everything with it (how about dataframes that contain complex data?). I'm not sure this is easy to catch later if dtype
is not None, though. With your addition we're converting into an array twice: here and in 411. It would be great if we could use the conversion at 411, but as I said, it's not entirely clear to me how to do that.
I just realized that we are not doing anything with complex data, as we pass anything that's not
Looking into the
I guess that's in accordance with the documentation, though. If dtype was numeric, we should check after the array creation, as we do for "O" data in 425. Can you please also add a test to |
Thank you @amueller . Let me make changes as you have suggested. |
@amueller |
Did you mean |
We have common tests for all estimators, implemented in
check_estimators.py. That those tests work correctly is tested in
test_estimator_checks.py
…On 16 August 2017 at 07:35, pravarmahajan ***@***.***> wrote:
Can you please also add a test to check_estimators.py?
Did you mean test_estimator_checks.py in sklearn/utils/tests?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#9551 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6_3kdz0XqyPjQE8mgFTSj7OZDcTHks5sYg81gaJpZM4O2t8W>
.
|
Why do we even allow dtype to be list when we are discarding everything
other than the first element?
You missed the part where we *accepted* the data as is if its dtype is in
that list. Otherwise we *try* to convert the data with the first dtype.
|
@jnothman |
Answering 2): |
…into complex_data
…into complex_data
Thanks @amueller |
@@ -433,6 +433,44 @@ def test_check_array_min_samples_and_features_messages(): | |||
assert_array_equal(y, y_checked) | |||
|
|||
|
|||
def test_check_array_complex_data_error(): | |||
X = np.array([[1 + 2j, 3 + 4j, 5 + 7j], [2 + 3j, 4 + 5j, 6 + 7j]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe instead of repeating the content every time create a list of lists in the beginning and then create everything else from that, so it's more obvious what is tested? Don't have a strong opinion though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought of that, but it becomes very messy for some of the cases. For example converting list of list to tuple of tuple, or to tuple of np-arrays. Maybe I can add one line comment for what case is being tested.
warnings.simplefilter('error', ComplexWarning) | ||
array = np.array(array, dtype=dtype, order=order, copy=copy) | ||
except ComplexWarning: | ||
raise ValueError("Complex data not supported\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this branch is covered, but I don't see where. We're never passing dtype
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never passing dtype to check_array? Even if we don't pass it to check_array, the value of dtype changes within this function. For example lines 381-386. The ComplexWarning comes only if we are setting dtype to some one of the real types. However, if dtype is "None", no conversion takes place and no warning is produced, so we need to check that case again in subsequentl lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One example where dtype is explicitly passed to the function is that of svm decision function
LGTM, I think unfortunately this is the easiest we can get away with. I'm a bit confused about when we hit the ComplexWarning branch, though. |
…into complex_data
array = np.array(array, dtype=dtype, order=order, copy=copy) | ||
except ComplexWarning: | ||
raise ValueError("Complex data not supported\n" | ||
"{}\n".format(array)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, at this point the array has been converted to float or int and the error message will be confusing. We need to keep a reference to the original input with complex values to report it in this error message (and del original_array
otherwise to let the gabage collector free the memory as soon as possible otherwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe:
with warnings.catch_warnings():
try:
warnings.simplefilter('error', ComplexWarning)
new_array = np.array(array, dtype=dtype, order=order, copy=copy)
except ComplexWarning:
raise ValueError("Complex data not supported\n"
"{}\n".format(array))
array = new_array
del new_array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum sorry my previous comments are wrong. Your code is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 once my comment is addressed
…-learn#9551) * Modifies model_selection.cross_validate docstring (scikit-learn#9534) - Fixes rendering of docstring examples - Instead of importing cross_val_score in example, cross_validate is imported * raise error on complex data input to estimators * Raise exception on providing complex data to estimators * adding checks to check_estimator for complex data * removing some unnecessary parts * autopep8 changes * removing ipdb, restoring some autopep8 fixes * removing ipdb, restoring some autopep8 fixes * adding documentation for complex data handling * adding one line explanation for each test case
…-learn#9551) * Modifies model_selection.cross_validate docstring (scikit-learn#9534) - Fixes rendering of docstring examples - Instead of importing cross_val_score in example, cross_validate is imported * raise error on complex data input to estimators * Raise exception on providing complex data to estimators * adding checks to check_estimator for complex data * removing some unnecessary parts * autopep8 changes * removing ipdb, restoring some autopep8 fixes * removing ipdb, restoring some autopep8 fixes * adding documentation for complex data handling * adding one line explanation for each test case
…-learn#9551) * Modifies model_selection.cross_validate docstring (scikit-learn#9534) - Fixes rendering of docstring examples - Instead of importing cross_val_score in example, cross_validate is imported * raise error on complex data input to estimators * Raise exception on providing complex data to estimators * adding checks to check_estimator for complex data * removing some unnecessary parts * autopep8 changes * removing ipdb, restoring some autopep8 fixes * removing ipdb, restoring some autopep8 fixes * adding documentation for complex data handling * adding one line explanation for each test case
Stupid question, why did we not use |
@lesteve
The |
OK fair enough ... I feel like this could be simplified but it will require a bit more thoughts. |
Reference Issue
Fixes #9528
What does this implement/fix? Explain your changes.
An exception should be raised on giving complex data as input to the estimators. The changed code
does the same, via the function check_array. A new test function has been added to test_validation.py
corresponding to the code changes.
Any other comments?