Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@kannandeepti
Copy link
Contributor

@kannandeepti kannandeepti commented May 26, 2022

proposed BUGFIX: changed raise ValueError in align_track_with_cooler to just a warning

Previously, the align_track_with_cooler() function raised a Value
Error if the track values of a given region in the viewframe were
all NaN. However, this check is not necessary because NaN values
are anyways already masked.

kannandeepti and others added 2 commits May 26, 2022 18:18
… warning

Previously, the align_track_with_cooler() function raised a Value
Error if the track values of a given region in the viewframe were
all NaN. However, this check is not necessary because NaN values
are anyways already masked.
Ratified 2022-06-01.
@agalitsyna
Copy link
Member

Hey, @gfudenberg. Is there a reason why empty regions in align track with cooler give an error rather than a warning? (Authorship deduced from git blame)

I'm thinking about Deepti's case: why would one chromosome having all NaNs in the dataframe before assignment constitute an issue in the downstream analysis?

@gfudenberg
Copy link
Member

Hi-- I don't know if I quite understand Deepti's case, but I think we discussed this in
#271
#180 (comment)
and the idea was that you should pass a view that does not have the offending chromosome, since it is one line. It may have helped streamline assumptions in some downstream applications as well (saddles?)

@agalitsyna
Copy link
Member

Thanks for getting back to it! I think the non-trivial thing here is that align_track_with_cooler does not distinguish absent bins in the track and NaN values in the track, and it's not entirely clear why.
Imagine that some chromosome has all NaNs in the track (e.g. some mark is completely absent on this chromosome). It's expected that the bins from this chromosome do not contribute to aggregation, but why would this cause an error?

The solution might be to raise an error only if the chromosome is completely absent form the track (or some bins of it, as you propose in #271).

@gfudenberg
Copy link
Member

Maybe it is the use in functions like cis_eig? E.g. if there is this all NaN chromosome, maybe this would throw an error when you try to correlate?

phasing_track_region_values = phasing_track_region["value"].values

…track_with_cooler. NaNs and truly unassigned values are now distinguishable. There are now two options for align_track_with_cooler: ignore missing values or not. This parameter is propagated to saddles (because this check does not affect the results comparing to removal of chromosomes with all NaNs), but set to default True in eigdecomp (because at least one value per chromosome is strictly needed for phasing).
@agalitsyna
Copy link
Member

Okay, these are all good points that we discuss here. I introduced the parameter for ignoring missing values in alignment, and add it as a control option for saddle (because the chromosomes with all NaNs do not really affect the output). For eigdecomp you are right, @gfudenberg , there should be at least some value per each chromosome, otherwise phasing does not make sense. I set this new parameter to default True in eigdecomp.

To sum up, this is small update of align_track_with_cooler, but might be important to those practicing various types of saddles.

@agalitsyna
Copy link
Member

Illustration of the problem before and after the fix:
image

@agalitsyna agalitsyna requested a review from gfudenberg June 22, 2022 20:42
Copy link
Member

@gfudenberg gfudenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe consider
drop_track_nas --> drop_track_na

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants