Facet grid FigureFactory #731

Merged
Kully merged 41 commits into master from facet_grid
Jun 12, 2017
Conversation

Contributor

@Kully Kully commented Apr 12, 2017

will be making a notebook for a proper showcase of these charts

@Kully Kully changed the title from "Facet grid" to "Facet grid FigureFactory" Apr 12, 2017
num_of_rows = len(data[facet_row].unique())

if facet_col:
    num_of_cols = len(data[facet_col].unique())
Member

Thoughts about adding binning for numerical columns?

Contributor Author

I wasn't thinking about it but I can after I get something working. I'm not finished with the code here, so I can ping you when I do

@@ -16,3 +16,4 @@
from plotly.figure_factory._table import create_table
from plotly.figure_factory._trisurf import create_trisurf
from plotly.figure_factory._violin import create_violin
from plotly.figure_factory._facet_grid import create_facet_grid
Member

imports in alphabetical order

@jackparmer
Contributor

How 1-1 is this figure factory with the ggplot2 facet geom? Would be great to see some screenshot output examples when this gets closer.

@Kully
Contributor Author

Kully commented Apr 13, 2017

How 1-1 is this figure factory with the ggplot2 facet geom?

pretty close, just missing some variables right now. Yeah, I'm gonna put together either screenshots or a notebook presentation or both.

@Kully
Contributor Author

Kully commented Apr 17, 2017

Progress:
[four screenshots of facet grid output, taken 2017-04-16]

@Kully
Contributor Author

Kully commented Apr 19, 2017

@jackparmer Here's my progress so far.
@cldougl Ready for a review.
@chriddyp Here's a Jupyter Notebook of some examples!

Things that we could add:

  • binning for numerical columns (Chris' suggestion and exists for scatterplot_matrix FigureFactory)
  • axis titles should be in middle of each axis

Feel free to make suggestions. 😄

@jackparmer
Contributor

@Kully This looks very cool! Can you please make a Jupyter notebook that uses these dataframes and recreates these examples using your FigureFactory?

http://ggplot2.tidyverse.org/reference/facet_grid.html

I think this will be a nice way to be sure we're covering canonical faceting usage.

@Kully
Contributor Author

Kully commented Apr 20, 2017

Can you please make a Jupyter notebook that uses these dataframes and recreates these examples using your FigureFactory?

For sure!

@@ -3,6 +3,7 @@
from plotly.exceptions import PlotlyError

import plotly.tools as tls
import plotly.figure_factory as ff
Member

thanks for updating all of these @Kully !

                           **kwargs)

     def test_unequal_data_label_length(self):
         kwargs = {'hist_data': [[1, 2]], 'group_labels': ['group', 'group2']}
-        self.assertRaises(PlotlyError, tls.FigureFactory.create_distplot,
+        self.assertRaises(PlotlyError, ff.create_distplot,
Member

same- one line?

@@ -32,24 +33,24 @@ def test_wrong_histdata_format(self):
         # will fail)

         kwargs = {'hist_data': [1, 2, 3], 'group_labels': ['group']}
-        self.assertRaises(PlotlyError, tls.FigureFactory.create_distplot,
+        self.assertRaises(PlotlyError, ff.create_distplot,
Member

looks like this could be one line now (with **kwargs)

                           **kwargs)

         kwargs = {'hist_data': [[1, 2], [1, 2, 3]], 'group_labels': ['group']}
-        self.assertRaises(PlotlyError, tls.FigureFactory.create_distplot,
+        self.assertRaises(PlotlyError, ff.create_distplot,
Member

same - one line?

                           **kwargs)

     def test_simple_distplot_prob_density(self):

         # we should be able to create a single distplot with a simple dataset
         # and default kwargs

-        dp = tls.FigureFactory.create_distplot(hist_data=[[1, 2, 2, 3]],
+        dp = ff.create_distplot(hist_data=[[1, 2, 2, 3]],
Member

fix spacing w/ lines below

from plotly.figure_factory import utils
import plotly.colors as colors

from plotly.graph_objs import graph_objs
Member

you can group all from plotly... imports together

)

return annotation_dict

Member

you can add 2 new lines between defs

fig['layout']['annotations'] = annotations

return fig

Member

2 spaces between defs

colorscale=None, color_dict=None, title='facet grid',
height=600, width=600, **kwargs):
"""
Returns data for a facet grid.
Member

this returns a figure right? not just the data object

Contributor Author

changed 'data' -> 'figure'

    """
    if not pd:
        raise exceptions.ImportError(
            "'pandas' must be imported for this FigureFactory."
Member

since we changed the syntax I would probably say figure_factory or figure type or chart type

@cldougl
Member

cldougl commented Apr 25, 2017

@Kully looks like a weird thing is going on when toggling the traces:
[image: tracebg]

@Kully
Contributor Author

Kully commented May 3, 2017

@Kully looks like a weird thing is going on when toggling the traces:

Yeah, it's pretty wack.

For some reason when a subplot is empty, the background just disappears. I think it has to do with all the customization I'm using.

Are you a fan of having all those duplicate trace icons, or is one per category enough?

colormap will be treated as categorical (True) or sequential (False).
Default = False.
:param (bool) widen_frame: if set to True, all points in each subplot
are strickly contained in the region of the subplot by increasing the
Member

spelling: strictly

:param (bool) widen_frame: if set to True, all points in each subplot
are strickly contained in the region of the subplot by increasing the
maximum and minimum range values by 1. Setting to False doesn't do
anything.
Member

maybe "If False, then points may be plotted on the edge of the frame."

anything.
Default = False
:param (str|dict) facet_row_labels: set to either 'name' or a dictionary
of all the values in the facetting row mapped to some text to show up
Member

spelling: faceting

Member

and it's all of the "unique values" not just "values", right?

Contributor Author

correct

Default = False
:param (str|dict) facet_row_labels: set to either 'name' or a dictionary
of all the values in the facetting row mapped to some text to show up
in the label annotations. If None, labelling works like usual.
Member

spelling: labeling

:param (int) width: the width of the facet grid figure.
:param (int) size: point size in pixels.
:param (str) trace_type: decides the type of plot to appear in the
facet grid. The options are 'scatter' and 'scattergl'.
Member

although presumably you could pass in something like histogram and a set of kwargs to customize that histogram right?

Member

we could even skip applying the marker and mode options if type is not in ['scatter', 'scattergl'], allowing users to create valid facet_grids with heatmaps, histograms, 2dhistograms, etc

Member

In order for histogram to work, you'd need to also make x and y optional too

Contributor Author

@Kully Kully Jun 5, 2017

For this pass, can we not include trace types besides scatter and scattergl? mode for example is not a valid key for a bar chart, and would result in an error.

I don't think turning off validation for the figure is a good idea, but the way I'd implement the option to add other trace types is to have a dictionary of keys that would go in the trace dict for each specific trace
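
A rough sketch of the "dictionary of keys per trace type" idea, for discussion only. Nothing here is in the PR: `TRACE_STYLE_KEYS` and `build_trace` are hypothetical names, the key lists are illustrative rather than exhaustive, and plain dicts stand in for plotly trace objects. Style keys that do not apply to the requested trace type are dropped, so figure validation can stay on.

```python
# Hypothetical mapping from trace type to the style keys that type accepts.
TRACE_STYLE_KEYS = {
    'scatter': ('mode', 'marker'),
    'scattergl': ('mode', 'marker'),
    'bar': ('marker',),        # 'mode' is not a valid key for a bar chart
    'histogram': ('marker',),
}


def build_trace(trace_type, x=None, y=None, **style):
    """Build a trace dict, keeping only the style keys valid for its type."""
    if trace_type not in TRACE_STYLE_KEYS:
        raise ValueError('unsupported trace type: %r' % (trace_type,))
    trace = {'type': trace_type}
    # x and y are optional so that a histogram can be given only one of them
    if x is not None:
        trace['x'] = list(x)
    if y is not None:
        trace['y'] = list(y)
    for key in TRACE_STYLE_KEYS[trace_type]:
        if key in style:
            trace[key] = style[key]
    return trace


bar = build_trace('bar', x=[1, 2], y=[3, 4], mode='markers', marker={'size': 8})
print(bar)  # no 'mode' key, so it would pass validation as a bar trace
```

Silently dropping keys is one option; raising on an unsupported key would be stricter and may be preferable for users who pass `mode` by mistake.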

Member

@chriddyp chriddyp left a comment

Looking pretty good so far! This is going to be really great, I can already see how I'll use this in Dash :) My main comments are around:

  • Numerical binning instead of using the unique numerical values
  • Making the code DRYer

I'll finish up reviewing this evening!

@chriddyp
Member

chriddyp commented Jun 2, 2017

Some other thoughts as I play around with this:

  • I'm curious if it can be generalized a little bit to be able to plot any chart type, not just scatter. I find myself frequently wanting to make a faceted histogram, so this could be a big win. It seems like it could with a few small changes:
    • making x and y optional: in a histogram you would only supply x (or, if you wanted horizontal histogram, y)
    • skipping the marker assignments if the type isn't scatter or scattergl
  • Sizing - I think that we should make the height a multiple of the number of rows and the width a multiple of the number of columns. Something like height=max(600, 175 * n_rows). That way, if you have 20 unique values, your graph isn't 15px tall. If you have only two unique values, then we keep the default height (like 600 or w/e it is)
  • Text - We should add an optional text column_name that adds text to the grouped values
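
The sizing bullet could look roughly like this. It is an illustration only, not what the PR implements: `default_figure_size` is a hypothetical helper, and the 600px default and 175px-per-facet figures are taken from the comment. It uses `max` so that 600 acts as a floor: small grids keep the default size and large grids grow.

```python
def default_figure_size(n_rows, n_cols, base=600, px_per_facet=175):
    """Hypothetical helper: scale the figure with the number of facets.

    `base` is a lower bound, so grids with few facets keep the default size.
    """
    height = max(base, px_per_facet * n_rows)
    width = max(base, px_per_facet * n_cols)
    return height, width


print(default_figure_size(2, 1))   # small grid: stays at the default size
print(default_figure_size(20, 3))  # 20 rows: the figure gets much taller
```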

@chriddyp
Member

chriddyp commented Jun 2, 2017

Is there a requirement that the graphs need to look like ggplot2 by default? I would prefer that we keep the styling more aligned to the plotly defaults to be consistent with the rest of the library. There are also a few things that bother me about the ggplot styling:

  • The points are pure black which is really stark. Without opacity, it's hard to tell if they overlap - here's an article about avoiding pure blacks in design: https://ianstormtaylor.com/design-tip-never-use-black/
  • The points are really small.
  • I personally find the grey background and grey background shapes kind of old looking
  • I think the tick placement is much heavier than it needs to be

Some people like the theme however, and so we can keep it in with maybe a theme=ggplot2 argument.

Here is a little redesign:
[image: redesigned facet grid]
Compared with:
[image: current ggplot2-styled facet grid]

Here is the code I'm using to convert the facet_grid styles:

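import copy
import plotly.figure_factory as ff

# mpg: a pandas DataFrame holding ggplot2's mpg dataset (assumed already loaded)
gg = ff.create_facet_grid(mpg, 'cty', 'hwy', 'class')
p = copy.deepcopy(gg)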
for trace in p['data']:
    # Overwrite marker styles: 
    # - slightly bigger than usual points
    # - opacity in the points
    # - a small border around the points
    trace['marker'] = {
        'color': 'rgba(31, 119, 180, 0.5)',
        'size': 8,
        'line': {'color': 'darkgrey', 'width': 1}
    }

# removing the dark grey background
del p['layout']['shapes']

# making the plot height proportional to the number of rows
p['layout']['height'] = len(p['data']) * 150


# .items() works on both Python 2 and 3 (.iteritems() is Python 2 only)
for k, v in p['layout'].items():
    if 'axis' in k:
        # Reverting to default grid styles
        del v['gridcolor']
        del v['gridwidth']
        del v['tickfont']
        del v['tickwidth']
        del v['dtick']
        # Except the ticks: removing the ticks
        v['ticklen'] = 0

# Removing the grey background
del p['layout']['plot_bgcolor']

# Adding a slightly off-white margin background
# This makes it easier to distinguish one subplot from the other
p['layout']['paper_bgcolor'] = 'rgb(251, 251, 251)'

# Update hovering mode for scatter plots
p['layout']['hovermode'] = 'closest'

kwargs.pop('marker', None)

# make sure dataframe index starts at 0
df.index = range(len(df))
Member

I'm not sure we want to be doing this: this is going to be modifying the user's dataframe without them knowing.

For example:
[image]

Contributor Author

Damn, you're right. I can fix that with some rewriting.
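
One possible way to avoid the side effect, sketched under the assumption that the function only needs a zero-based index internally: `DataFrame.reset_index(drop=True)` returns a new frame by default, so the caller's index is left alone. The helper name `with_zero_based_index` is made up for illustration and is not what the PR ended up using.

```python
import pandas as pd


def with_zero_based_index(df):
    """Return a re-indexed copy; the caller's DataFrame is not modified."""
    # reset_index returns a new DataFrame unless inplace=True is passed
    return df.reset_index(drop=True)


user_df = pd.DataFrame({'x': [1, 2, 3]}, index=[10, 20, 30])
local_df = with_zero_based_index(user_df)

print(list(user_df.index))   # the user's original index is preserved
print(list(local_df.index))  # the internal copy starts at 0
```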

if key not in facet_row_labels.keys():
    unique_keys = df[facet_row].unique().tolist()
    raise exceptions.PlotlyError(
        "If you are using a dictioanry for custom labels for "
Member

spelling: "dictionary"

if key not in facet_col_labels.keys():
    unique_keys = df[facet_col].unique().tolist()
    raise exceptions.PlotlyError(
        "If you are using a dictioanry for custom labels for "
Member

spelling: "dictionary".
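
Since the row and column checks are nearly identical, they could share one helper, which also fits the "DRYer" comment. This is a sketch under assumptions, not the PR's code: `check_label_dict` is a hypothetical name, it takes the unique values as a plain list instead of reading them from the DataFrame, and it raises `ValueError` where the PR raises `exceptions.PlotlyError`.

```python
def check_label_dict(unique_values, labels, facet_name):
    """Raise if a custom label dict does not cover every unique facet value."""
    missing = [value for value in unique_values if value not in labels]
    if missing:
        raise ValueError(
            "If you are using a dictionary for custom labels for %s, "
            "every unique value must be a key. Missing: %r"
            % (facet_name, missing)
        )


# Passes silently: both unique values have a label.
check_label_dict(['a', 'b'], {'a': 'Group A', 'b': 'Group B'}, 'facet_row')

# Raises: 'b' has no label.
try:
    check_label_dict(['a', 'b'], {'a': 'Group A'}, 'facet_col')
except ValueError as err:
    print(err)
```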

max_range = math.ceil(max_range)
if widen_frame:
    min_range -= 1
    max_range += 1
Member

Good call in fixing the range across all subplots so that it's easy to compare values.

I think we can be a bit smarter about setting the autorange though and then make it the default and remove this widen_frame argument:

  • Adding/subtracting 1 like this won't work if the numerical ranges are really big or if the numerical ranges are within a small range like between 0.5 and 0.6 (in which case it'll end up making the range too large)
  • Instead, why don't we just make it say 5% bigger than the absolute range. I believe this is how plotly.js does it. Something like: range = [min - (max - min) * 0.05, max + (max - min) * 0.05]. You could ask the #plotly_js folks exactly what value they use
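
A minimal sketch of the proportional padding described above, for illustration only. `padded_range` is a hypothetical helper, and the 5% figure is the value suggested in this comment; whether plotly.js uses exactly that value is not confirmed here. The fallback for a zero-width range is an extra assumption.

```python
def padded_range(values, pad_fraction=0.05):
    """Return [low, high], widened by a fraction of the data span."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # Assumption: fall back to the value's magnitude (or 1) for flat data
        span = abs(low) or 1.0
    pad = span * pad_fraction
    return [low - pad, high + pad]


print(padded_range([0, 100]))       # padding scales with a wide range
print(padded_range([0.5, 0.6]))     # and stays small for a narrow range
```

Unlike adding or subtracting 1, the padding here is 5 units for data spanning 0 to 100 and about 0.005 for data spanning 0.5 to 0.6.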

Contributor Author

Great idea Chris

'Blackbody': ['rgb(0,0,0)', 'rgb(160,200,255)'],
'Earth': ['rgb(0,0,130)', 'rgb(255,255,255)'],
'Electric': ['rgb(0,0,0)', 'rgb(255,250,220)'],
'Viridis': ['#440154', '#fde725']
Member

These colorscales seem a little off - aren't there supposed to be more than 2 colors in them?

Contributor Author

Yeah that colorscale list is not correct. It's being used by _scatterplot.py for colorscales, so I eventually want to make a PR to replace the colorscales in utils with plotly.colors.PLOTLY_SCALES (the proper one) and rewrite the way scatterplot handles colorscales.

Contributor Author

@Kully Kully Jun 2, 2017

Will handle in a separate PR with a new issue: #769

@Kully
Contributor Author

Kully commented Jun 8, 2017

The points are pure black which is really stark. Without opacity, it's hard to tell if they overlap - here's an article about avoiding pure blacks in design: https://ianstormtaylor.com/design-tip-never-use-black/

Good read. It's funny because the "black" that they pit against the examples of nearly-black in app layouts are actually not #000000 either. The website may be swiping a filter over the whole page

@Kully
Contributor Author

Kully commented Jun 9, 2017

@jackparmer @chriddyp @cldougl
A few words about this PR. I want to get this thing merged ASAP (today if possible).

  1. Can one of you do a quick review of the facet grid? CHANGELOG and version number are already bumped for the next pip package.
  2. I want to move the remaining things/ideas for this figure factory - numerical binning/custom cuts for binning/other trace-type support - to another PR, as they are not canonical features.

Sounds good?

@jackparmer
Contributor

I want to move the remaining things/ideas for this figure factory - numerical binning/custom cuts for binning/other trace-type support - to another PR, as they are not canonical features.

SGTM!

@Kully
Contributor Author

Kully commented Jun 10, 2017

@jackparmer @chriddyp 💃 ?

@chriddyp
Member

  • numerical binning/custom cuts for binning to another PR

Yeah, that sounds good. In the meantime, we can document a workflow where the user creates their own bins in their dataframe and facets off of that. For example:

>> df = pd.DataFrame({'x': [1, 2, 3, 4, 1, 2, 3, 4], 'y': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})
>> df['group'] = pd.cut(df['x'], 5)
>> df
   x  y         group
0  1  a  (0.997, 1.6]
1  2  b    (1.6, 2.2]
2  3  c    (2.8, 3.4]
3  4  d      (3.4, 4]
4  1  e  (0.997, 1.6]
5  2  f    (1.6, 2.2]
6  3  g    (2.8, 3.4]
7  4  h      (3.4, 4]

or, with custom labels:

>> df['group'] = pd.cut(df['x'], 5, labels=['first group', 'second group', 'third group', 'fourth group', 'fifth group'])
>> df
   x  y         group
0  1  a   first group
1  2  b  second group
2  3  c  fourth group
3  4  d   fifth group
4  1  e   first group
5  2  f  second group
6  3  g  fourth group
7  4  h   fifth group

@chriddyp
Member

thanks for making all of those changes. this feature is really great now. 💃 !

@Kully Kully merged commit f52c5db into master Jun 12, 2017
@Kully Kully deleted the facet_grid branch June 12, 2017 15:20
4 participants