Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Updated violin plot example as per suggestions in issue #7251 #7360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 31, 2016
Prev Previous commit
Next Next commit
Replaced use of zip with call to transpose
  • Loading branch information
DrNightmare committed Oct 30, 2016
commit 03946d410bae14ba2d294f84700d3207d39c3e06
5 changes: 3 additions & 2 deletions examples/statistics/customized_violin_demo.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,8 +62,9 @@ def set_axis_style(ax, labels):
pc.set_edgecolor('black')
pc.set_alpha(1)

medians = np.percentile(data, 50, axis=1)
inter_quartile_ranges = list(zip(*(np.percentile(data, [25, 75], axis=1))))
tmp = (np.percentile(data, [25, 50, 75], axis=1)).T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually need a transpose here or would untransposed be the same and then: median = quantiles[1], inter quartile range = qauntiles[[0,2]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without using the transpose would work for the medians. For the inter quartile ranges we do need a transpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes no sense. Either the rows are turned into columns or they aren't. I could be doing the selection wrong on the rows for inter quartile range though. What's the shape of quantiles before the transpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list inter_quartile_ranges is supposed to be a list of pairs.
Before the transpose, the shape of quantiles would be 3 x (some N). Let's say these three rows are numbered 0, 1, 2. Row 1 would be the medians, no problem there. If we do
inter_quartile_ranges = tmp[[0, 2]]
(without the initial transpose), it's shape would be 2 x (some N). We need it's shape to be (some N) x (2) for it to be a list of pairs.
Please correct me if I'm wrong.

Copy link
Member

@story645 story645 Oct 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, then maybe save the transpose to inter_quartile_ranges = tmp[[0,2]].T? And I think tmp should be named qaurtiles since that's what they are. That being said, if you go the vlines approach, it simplifies to quartile1, median, quartile3 = np.percentile(data, [25, 50, 75], axis=1) where qaurtile1 is iqrmin, etc.
Also, currently iqr is getting computed both here and in adjacent values and that should probably be cleaned up. My first pass suggestion is change the function sig of adjacent_values to adjacent_vals(vals, q1, q3), drop the first two lines in adjacent_vals, and the call is
whiskers = [adjacent_values(sorted_array, q1, qr ) for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense. Firstly, I have changed the variable name to 'quartiles' and altered the transpose calculation, as suggested. I'm working on the vlines approach and the adjacent_values() function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@story645 I could do something like this, as per your code in the original issue and your suggestions above:

quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
inter_quartile_ranges = np.vstack([quartile1, quartile3]).T
whiskers = [
    adjacent_values(sorted_array, q1, q3)
    for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]
whiskersMin, whiskersMax = list(zip(*whiskers))
# plot medians as points,
# whiskers as thin lines, quartiles as fat lines
inds = np.arange(1, len(medians) + 1)
ax2.scatter(inds, medians, marker='o', color='white', s=30, zorder=3)
ax2.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5)
ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)

I think this looks good. Any comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I committed the changes for now. If you have any suggestions, please put them in a review, thanks!

medians = tmp[:, 1]
inter_quartile_ranges = tmp[:, [0, 2]]
whiskers = [adjacent_values(sorted_array) for sorted_array in data]

# plot whiskers as thin lines, quartiles as fat lines,
Expand Down