@HGWright
Contributor

🚀 Pull Request

Description

I have brought the internal Dask best practice advice and examples into the Dask documentation, replacing information specific to internal use with more generic language that is better suited to the documentation.

This should be linked to #4959, but does not fully close the issue.


Consult Iris pull request check list

@HGWright HGWright requested a review from lbdreyer March 10, 2023 11:26
@codecov

codecov bot commented Mar 10, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (48e3a86) 89.37% compared to head (aac31cd) 89.37%.

❗ Current head aac31cd differs from pull request most recent head 95540f3. Consider uploading reports for the commit 95540f3 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5190   +/-   ##
=======================================
  Coverage   89.37%   89.37%           
=======================================
  Files          89       89           
  Lines       22419    22419           
  Branches     5380     5380           
=======================================
  Hits        20036    20036           
  Misses       1637     1637           
  Partials      746      746           

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@HGWright HGWright removed the request for review from lbdreyer March 10, 2023 12:28
@lbdreyer lbdreyer self-assigned this Mar 16, 2023
Member

@lbdreyer lbdreyer left a comment

Looking good, thanks @HGWright !

A few general comments:

  • There are a few terms that need tidying up. I think we were quite lazy in the draft Dask best practices docs, but we need to be more careful here if this is to be included in the Iris docs. In particular, the following need to be updated:
    numpy -> NumPy
    netcdf -> netCDF (I don't believe this needs to be capitalised)
    Also dask should always be capitalised Dask

  • We have used CPU's in quite a few places, but the apostrophe is incorrect, so that should just be CPUs.

  • You have used the term "multiprocessing system", but I think a term like "computing cluster" would be more appropriate.

  • There are a few examples of MO-specific sections that need generalising a bit more; I have added specific comments where this is required.

Member

@lbdreyer lbdreyer left a comment

This is looking very close to being ready to merge!
There are just two outstanding issues that I can see:

  • Malformed tables causing docs tests to fail. I suspect this could just be solved by adding the extra spaces that you lost when you changed CPU's -> CPUs. See my suggestions below:
  • "This branch is out-of-date with the base branch" I've never seen this error before. It might be alluding to a merge conflict? But maybe something else!

@pp-mo pp-mo mentioned this pull request Mar 30, 2023
7 tasks
@pp-mo
Member

pp-mo commented Mar 30, 2023

  • "This branch is out-of-date with the base branch"

This is not an error.
I think it is just something that GitHub has started offering as an option -- basically, an automatic merge-back from the target branch, or rebase onto it.
I don't really understand the benefit of these, if there are no conflicts, but possibly it enables you to see more exactly what the result will be when merged back (e.g. docs builds).

@pp-mo
Member

pp-mo commented Mar 30, 2023

How open are we to further modifying this now?
I'm just re-reading some of the content and clearly some things could be improved.
Obviously this content is quite old now, and besides what may have changed, I think in some places our understanding may also have improved since it was written.

@pp-mo
Member

pp-mo commented Mar 30, 2023

How open are we to further modifying this

Some random things I have noticed as I re-read it (but we can still address afterwards/elsewhere):

  • in the "PP and Fieldsfiles" section (under Chunking), we should probably say that the same applies to GRIB too.
  • in the "Netcdf Files" section (under Chunking), we might usefully explain that...
    • one can somewhat adjust how Iris chunks netcdf data by setting the netcdf target chunk-size,
      e.g. dask.config.set(**{'array.chunk-size': '250 MiB'}), but that ...
    • sometimes the default choice will not suit your usage (e.g. the access to vertical slices in the Parallelising a Loop of Multiple Calls to a Third Party Library section), that
    • there is no direct control over input chunking (at present), and especially that
    • rechunking cannot fix how data is fetched from files.
  • in the "Dask bags and greedy parallelism" example
    we should probably mention that
    • (A) Bags use a process-scheduler by default,
    • (B) Iris lazy computation does not function with a process-based scheduler, but that doesn't matter here because Iris loading only constructs Dask arrays, and never computes them -- I think?
    • (C) use of distributed is often/usually advised as better than 'processes' (as already noted in this section)
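
To make the chunk-size and scheduler points above concrete, here is a minimal sketch using plain Dask only. It does not load anything with Iris: the array shape, the file names and the `describe` helper are hypothetical stand-ins, and the lazy `da.zeros` arrays merely illustrate how the `array.chunk-size` setting changes automatically chosen chunks. It assumes Dask's default configuration is otherwise in effect.

```python
import dask
import dask.array as da
import dask.bag as db

SHAPE = (4000, 4000, 40)  # hypothetical array shape, for illustration only

# These arrays are lazy: no memory is allocated until they are computed.
default_chunked = da.zeros(SHAPE, dtype="f8", chunks="auto")

# Raise Dask's target chunk size for the duration of this block only.
with dask.config.set({"array.chunk-size": "250MiB"}):
    large_chunked = da.zeros(SHAPE, dtype="f8", chunks="auto")

# A larger target size gives bigger chunks, and therefore fewer of them.
print(default_chunked.chunksize, default_chunked.npartitions)
print(large_chunked.chunksize, large_chunked.npartitions)


def describe(name):
    """Hypothetical stand-in for per-file work, such as loading a file."""
    return name.upper()


bag = db.from_sequence(["a.nc", "b.nc", "c.nc"], npartitions=3).map(describe)

# Bags default to a process-based scheduler; it can be overridden per call.
results = bag.compute(scheduler="threads")
print(results)
```

Note that this only changes how chunks are chosen when an array is created. Calling `rechunk` afterwards rearranges the task graph in memory, which is consistent with the point above that rechunking cannot change how data is fetched from files.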

@lbdreyer
Member

How open are we to further modifying this now? I'm just re-reading some of the content and clearly some things could be improved. Obviously this content is quite old now, and besides what may have changed, I think in some places our understanding may also have improved since it was written.

I'd be in support of improving things before this gets added to a release.

Are we intending this to go in Iris 3.5?

@pp-mo
Member

pp-mo commented Mar 30, 2023

Are we intending this to go in Iris 3.5?

I had thought so, but I see it's not actually on the board.
@ESadek-MO can you explain ?

@HGWright
Contributor Author

HGWright commented Jun 2, 2023

@pp-mo & @lbdreyer. Given that I have lost some momentum with this, my preference would be to bank this and then I will open a new issue to make improvements as this is already quite a big PR. Then at least the information is out there.

I think this should be good to go if that's what we are doing.

Member

@lbdreyer lbdreyer left a comment

Looks good to me, just one final small change and then this should be ready to merge. Could you also create an issue to capture the extra work that @pp-mo suggests?

@lbdreyer
Member

Great work @HGWright ! Please remember to create a new issue to address these points

@lbdreyer lbdreyer merged commit 18d24a9 into SciTools:main Jun 12, 2023
@HGWright
Contributor Author

Thanks @lbdreyer, great to finally get this across the line. For the new issue, please see #5344.
