16 Client-side linking

16.1 Graphical queries

This section focuses on a particular approach to linking views known as graphical (database) queries using the R package plotly. With plotly, one can write R code to pose graphical queries that operate entirely client-side in a web browser (i.e., no special web server or callback to R is required). In addition to teaching you how to pose queries with the highlight_key() function, this section shows you how to control how queries are triggered and visually rendered via the highlight() function.

Figure 16.1 shows a scatterplot of the relationship between weight and miles per gallon of 32 cars. It also uses highlight_key() to assign the number of cylinders to each point so that when a particular point is ‘queried’ all points with the same number of cylinders are highlighted (the number of cylinders is displayed with text just for demonstration purposes). By default, a mouse click triggers a query, and a double-click clears the query, but both of these events can be customized through the highlight() function. By typing help(highlight) in your R console, you can learn more about what events are supported for turning graphical queries on and off.

library(plotly)
mtcars %>%
  highlight_key(~cyl) %>%
  plot_ly(
    x = ~wt, y = ~mpg, text = ~cyl, mode = "markers+text", 
    textposition = "top", hoverinfo = "x+y"
  ) %>%
  highlight(on = "plotly_hover", off = "plotly_doubleclick")

FIGURE 16.1: A visual depiction of how highlight_key() attaches metadata to graphical elements to enable graphical database queries. Each point represents a different car and the number of cylinders (cyl) is assigned as metadata so that when a particular point is queried all points with the same number of cylinders are highlighted. For the interactive, see https://plotly-r.com/interactives/link-intro.html

Generally speaking, highlight_key() assigns data values to graphical marks so that when graphical mark(s) are directly manipulated through the on event, it uses the corresponding data values (call it $SELECTION_VALUE) to perform an SQL query of the following form.

SELECT * FROM mtcars WHERE cyl IN $SELECTION_VALUE

For a more useful example, lets use graphical querying to pose interactive queries of the txhousing dataset. This data contains monthly housing sales in Texan cities acquired from the TAMU real estate center and made available via the ggplot2 package. Figure 16.2 shows the median house price in each city over time which produces a rather busy (spaghetti) plot. To help combat the overplotting, we could add the ability to click a particular a given point on a line to highlight that particular city. This interactive ability is enabled by simply using highlight_key() to declare that the city variable be used as the querying criteria within the graphical querying framework.

One subtlety to be aware of in terms of what makes Figure 16.2 possible is that every point along a line may have a different data value assigned to it. In this case, since the city column is used as both the visual grouping and querying variable, we effectively get the ability to highlight a group by clicking on any point along that line. Section 16.4.1 has examples of using different grouping and querying variables to query multiple related groups of visual geometries at once, which can be a powerful technique.²³

# load the `txhousing` dataset
data(txhousing, package = "ggplot2")

# declare `city` as the SQL 'query by' column
tx <- highlight_key(txhousing, ~city)

# initiate a plotly object
base <- plot_ly(tx, color = I("black")) %>% 
  group_by(city)

# create a time series of median house price
base %>%
  group_by(city) %>%
  add_lines(x = ~date, y = ~median)

FIGURE 16.2: Graphical query of housing prices in various Texas cities. The query in this particular example must be triggered through clicking directly on a time series. For the interactive, see https://plotly-r.com/interactives/txmissing.html

Querying a city via direct manipulation is somewhat helpful for focusing on a particular time series, but it’s not so helpful for querying a city by name and/or comparing multiple cities at once. As it turns out, plotly makes it easy to add a selectize.js powered dropdown widget for querying by name (aka indirect manipulation) by setting selectize = TRUE.²⁴ When it comes to comparing multiple cities, we want to be able to both retain previous selections (persistent = TRUE) as well as control the highlighting color (dynamic = TRUE). This videos explains how to use these features in Figure 16.3 to compare pricing across different cities.

highlight(
  time_series, 
  on = "plotly_click", 
  selectize = TRUE, 
  dynamic = TRUE, 
  persistent = TRUE
)

Brush color

FIGURE 16.3: Using a selectize dropdown widget to search for cities by name and comparing multiple cities through persistent selection with a dynamic highlighting color. For a visual and audio explanation, see https://bit.ly/txmissing-modes.

By querying a few different cities in Figure 16.3, one obvious thing we can learn is that not every city has complete pricing information (e.g., South Padre Island, San Marcos, etc). To learn more about what cities are missing information as well as how that missingness is structured, Figure 16.4 links a view of the raw time series to a dot-plot of the corresponding number of missing values per city. In addition to making it easy to see how cities rank in terms of missing house prices, it also provides a way to query the corresponding time series (i.e., reveal the structure of those missing values) by brushing cities in the dot-plot. This general pattern of linking aggregated views of the data to more detailed views fits the famous and practical information visualization advice from Shneiderman (1996): “Overview first, zoom and filter, then details on demand”.

# remember, `base` is a plotly object, but we can use dplyr verbs to
# manipulate the input data 
# (`txhousing` with `city` as a grouping and querying variable)
dot_plot <- base %>%
  summarise(miss = sum(is.na(median))) %>%
  filter(miss > 0) %>%
  add_markers(
    x = ~miss, 
    y = ~forcats::fct_reorder(city, miss), 
    hoverinfo = "x+y"
  ) %>%
  layout(
    xaxis = list(title = "Number of months missing"),
    yaxis = list(title = "")
  ) 

subplot(dot_plot, time_series, widths = c(.2, .8), titleX = TRUE) %>%
  layout(showlegend = FALSE) %>%
  highlight(on = "plotly_selected", dynamic = TRUE, selectize = TRUE)

Brush color