Thanks to visit codestin.com
Credit goes to github.com

Skip to content

DOC Ensures that load_boston passes numpydoc validation #22247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 24, 2022

Conversation

purna135
Copy link
Contributor

Reference Issues/PRs

Addresses #21350

What does this implement/fix? Explain your changes.

This PR ensures that sklearn.datasets._base.load_boston passes numpydoc validation

  • Removed the function from FUNCTION_DOCSTRING_IGNORE_LIST
  • Verify that all tests are passed

Any other comments?

My first PR to my favorite Library! : )
Thank You!

cc: (pair programming partner) @Ramanand-Yadav

…NORE_LIST and ensures that it passes numpydoc validation
@purna135
Copy link
Contributor Author

Hi @thomasjpfan

I followed the instructions given in Addresses #21350

I removed sklearn.datasets._base.load_boston from FUNCTION_DOCSTRING_IGNORE_LIST

After runing "pytest sklearn/tests/test_docstrings.py -k sklearn.datasets._base.load_boston"
I got the below result

platform win32 -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: C:\Users\Purna\Downloads\OpenSource\scikit-learn, configfile: setup.cfg
plugins: cov-3.0.0
collected 0 items / 2 skipped

======================== 2 skipped in 0.09s ========================

I also run "pytest sklearn" and got the below result

image

Since there is no error message I created this PR and now I am getting this error
Could you please help me with this
Thanks

@purna135
Copy link
Contributor Author

Hi @thomasjpfan
I was missing some important packages but now I got them installed.
Now I am getting the desired error messages and am I working on them.

Thanks

Hi @thomasjpfan

I followed the instructions given in Addresses #21350

I removed sklearn.datasets._base.load_boston from FUNCTION_DOCSTRING_IGNORE_LIST

After runing "pytest sklearn/tests/test_docstrings.py -k sklearn.datasets._base.load_boston" I got the below result

platform win32 -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 rootdir: C:\Users\Purna\Downloads\OpenSource\scikit-learn, configfile: setup.cfg plugins: cov-3.0.0 collected 0 items / 2 skipped

======================== 2 skipped in 0.09s ========================

I also run "pytest sklearn" and got the below result

image

Since there is no error message I created this PR and now I am getting this error Could you please help me with this Thanks

@jmloyola
Copy link
Member

Thanks for the PR, @purnachandramansingh!
It's great that you found what the cause error was.
Now, you should see what are the problems with the documentation.
Let me know if you need a hand.

@purna135
Copy link
Contributor Author

purna135 commented Jan 21, 2022

Hi @jmloyola
Thanks for your response.

After installing all the dependencies properly I found there are 3 problems with the documentation. i.e

  • GL03: Double line break found; please use only one blank line to separate sections or paragraphs, and do not leave blank lines at the end of docstrings
  • GL09: Deprecation warning should precede extended summary
  • RT03: Return value has no description

I fixed GL03 & RT09 error codes successfully but now got stuck at GL09: Deprecation warning should precede extended summary.
It is in sklearn.datasets._base.load_boston
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/datasets/_base.py

Line: 1180

I also referred to the numpydoc documentation
https://numpydoc.readthedocs.io/en/latest/format.html#sections
https://numpydoc.readthedocs.io/en/latest/format.html#deprecation-warning
still, I am trying my best to fix this message asap but don't know what am I missing.

I will be very grateful if you can guide me on this case.
Thanks

@jmloyola
Copy link
Member

jmloyola commented Jan 21, 2022

You can read more about the problem with GL09 here.

You can do something similar as they did in that PR:

  • remove the .. deprecated directive;
  • add the missing information from that directive into the @deprecate decorator ("See the warning message below for further details regarding the alternative datasets.").

We are showing a deprecation message two times here. After you do that, run pytest again to check that everything is OK. You can also build the documentation and see how it looks.

In the future, when you face an error, print the message here so it is easier for us to help you ^^

@purna135
Copy link
Contributor Author

Thanks for the help @jmloyola

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

In the future, when you face an error, print the message here so it is easier for us to help you ^^

@ogrisel
Copy link
Member

ogrisel commented Jan 21, 2022

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

It's better to have a focused discussion on your github PR itself rather than a generic channel if you need still help related to this PR. Feel free to ask more precise questions if @jmloyola answers above are not enough.

@ogrisel
Copy link
Member

ogrisel commented Jan 21, 2022

Note: there is also a failure related to black:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=37188&view=logs&j=32e2e1bb-a28f-5b18-6cfc-3f01273f5609&t=fc67071d-c3d4-58b8-d38e-cafc0d3c731a

Run the black command locally to fix the formatting of this file, then commit and push again:

pip install black==21.6b0
black sklearn/datasets/_base.py

@purna135
Copy link
Contributor Author

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

It's better to have a focused discussion on your github PR itself rather than a generic channel if you need still help related to this PR. Feel free to ask more precise questions if @jmloyola answers above are not enough.

Sure @ogrisel

@ogrisel
Copy link
Member

ogrisel commented Jan 21, 2022

@thomasjpfan I opened #22264 to fix the problem with the check-changelog linter.

@purna135
Copy link
Contributor Author

You can also build the documentation and see how it looks.

Is there any way to build documentation for a single file only (the file I changed) ?

  • remove the .. deprecated directive;
  • add the missing information from that directive into the @deprecate decorator ("See the warning message below for further details regarding the alternative datasets.").

I did the same and now it passed pytest, flake8 and black.
So pushing them again.

@jmloyola
Copy link
Member

Is there any way to build documentation for a single file only (the file I changed) ?

You can not do that (as far as I know). Remember that the documentation has hyperlinks connecting each module together.

I did the same and now it passed pytest, flake8 and black.

I would leave the warning below the description of the function. Something like this:

@deprecated(r"""`load_boston` is deprecated in 1.0 and will be removed in 1.2.""")
def load_boston(*, return_X_y=False):
    r"""Load and return the Boston house-prices dataset (regression).

    ==============   ==============
    Samples total               506
    Dimensionality               13
    Features         real, positive
    Targets           real 5. - 50.
    ==============   ==============

    Read more in the :ref:`User Guide <boston_dataset>`.

    .. warning::
        The Boston housing prices dataset has an ethical problem: as
        investigated in [1]_, the authors of this dataset engineered a
        non-invertible variable "B" assuming that racial self-segregation had a
        positive impact on house prices [2]_. Furthermore the goal of the
        research that led to the creation of this dataset was to study the
        impact of air quality but it did not give adequate demonstration of the
        validity of this assumption.

        The scikit-learn maintainers therefore strongly discourage the use of
        this dataset unless the purpose of the code is to study and educate
        about ethical issues in data science and machine learning.

        In this special case, you can fetch the dataset from the original
        source::

            import pandas as pd  # doctest: +SKIP
            import numpy as np

            data_url = "http://lib.stat.cmu.edu/datasets/boston"
            raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
            data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
            target = raw_df.values[1::2, 2]

        Alternative datasets include the California housing dataset [3]_
        (i.e. :func:`~sklearn.datasets.fetch_california_housing`) and Ames
        housing dataset [4]_. You can load the datasets as follows::

            from sklearn.datasets import fetch_california_housing
            housing = fetch_california_housing()

        for the California housing dataset and::

            from sklearn.datasets import fetch_openml
            housing = fetch_openml(name="house_prices", as_frame=True)  # noqa

        for the Ames housing dataset.

    Parameters
    ----------

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now πŸ€“.

Also, I used "Boston" for description of the function instead of "boston".

@purna135
Copy link
Contributor Author

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now πŸ€“.

Also, I used "Boston" for description of the function instead of "boston".

Thanks, @jmloyola I am doing these changes now.

@purna135
Copy link
Contributor Author

purna135 commented Jan 21, 2022

I would leave the warning below the description of the function. Something like this:

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now πŸ€“.

Also, I used "Boston" for description of the function instead of "boston".

All done 😊
Please have a look @jmloyola

@jmloyola
Copy link
Member

Now the test test_load_boston_warning is failing :(

We forgot to run pytest there to catch it.

@ogrisel, @thomasjpfan, @jjerphan, what do you think? How should we handle this? Beforehand, the information was duplicated. One time in the @deprecated decorator and once again in the .. warning directive.

Should we change test_load_boston_warning, leave the warning information in the decorator, or duplicate the warnings (leave as it was before). I don't like the second option because the description of the functions is not at the top of the page.

@thomasjpfan
Copy link
Member

I think the warning message is important enough to remain in both the FutureWarning and the docstring.

Moving forward, I think we can remove the .. warning directive and keep the content in @deprecated.
Looking at the render on main, the whole docstring became the warning because of how @deprecated auto-inserted the warning message.

@purna135
Copy link
Contributor Author

Hi @thomasjpfan, @jmloyola
As per your instruction I kept the message in both @deprecated decorator and .. warning directive.
After this I had generated the html file and below is the output, kindly have look on this and kindly let me know if it's ok ro merge or I need to do any modification.
Thanks

image
image
image

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jmloyola
Copy link
Member

LGTM

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit 6bdfb9a into scikit-learn:main Jan 24, 2022
@purna135
Copy link
Contributor Author

Thank you so much @thomasjpfan 😊
It's my first contribution my most favorite library scikit-learn ❀
It was a great experience and learned a lot about how opensource, GitHub, and code base works.
Thank you so much to open this opportunity for new contributors like me.
also a lot of thanks to all the friendly members of this community specially @jmloyola, @jjerphan @ogrisel, @atharvapatil123, @genvalen, you all helped me a lot and cleared all my doubts.

As per the rule mentioned in #21350 I have successfully completed 3 PRs. could you please suggest any other issue to contribute or shall I continue my contribution to #21350?
Thanks πŸ™

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants