
Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model


Shishi Xiao, Suizi Huang, Yue Lin, Yilin Ye, Wei Zeng

arXiv:2304.14630v1 [cs.AI] 28 Apr 2023

[Figure 1: panels (a)–(c); the embedded chart titles include "Date of Cherry Blossom in High Park", "Area of the Kubuqi Desert and the Restoration", and "Glacier Mass Balance".]

Fig. 1: Pictorial visualizations created by ChartSpark. (a) A line chart depicting the date of cherry blossom in High Park each year, embedded with a tree branch while preserving the trend. (b) A pie chart showing the area of the Kubuqi desert and its restoration in 2021, with the three types of land embedded in the corresponding sectors in a consistent style. (c) A line chart showing the amount of glacier mass balance per year, coherently presented with a background glacier structure that complies with the data trend.

Abstract— Pictorial visualization seamlessly integrates data and semantic context into visual representation, conveying complex information in a manner that is both engaging and informative. Extensive studies have been devoted to developing authoring tools that simplify the creation of pictorial visualizations. However, mainstream works mostly follow a retrieving-and-editing pipeline that heavily relies on visual elements retrieved from a dedicated corpus, which often compromises data integrity. Text-guided generation methods are emerging, but they may have limited applicability due to their predefined sets of recognized entities. In this work, we propose ChartSpark, a novel system that embeds semantic context into charts based on a text-to-image generative model. ChartSpark generates pictorial visualizations conditioned on both the semantic context conveyed in textual inputs and the data information embedded in plain charts. The method is generic for both foreground and background pictorial generation, satisfying the design practices identified in an empirical study of existing pictorial visualizations. We further develop an interactive visual interface that integrates a text analyzer, an editing module, and an evaluation module to enable users to generate, modify, and assess pictorial visualizations. We experimentally demonstrate the usability of our tool, and conclude with a discussion of the potential of using text-to-image generative models combined with interactive interfaces for visualization design.
Index Terms—pictorial visualization, generative model, authoring tool

1 INTRODUCTION

In the contemporary era of data explosion, visualization technology has become increasingly ubiquitous due to its ability to present complex data in a clear and vivid manner. Pictorial visualization is one of the essential techniques for embedding semantic context into charts, enhancing the visual representation hidden in the data [43]. The advantages of pictorial visualization are manifold, including improvements to long-term recall, user engagement and enjoyment, and information acquisition [3, 17, 18, 31]. These aesthetic and practical values have made the technique widely adopted across a diverse range of applications, including advertisement, education, and entertainment [2, 4, 5, 21, 30].

Nevertheless, creating a visually appealing and informative chart requires a certain level of design expertise. Novice users may inadvertently include unnecessary and distracting elements in their charts, such as excessive gridlines, 3D effects, and decorative elements, commonly referred to as "chartjunk" [14, 46]. To address this issue, authoring tools that assist users in creating pictorial visualizations are emerging. Common tools can generally be divided into two categories. Rule-based approaches, such as Piktochart [1] and DataShot [48], provide pre-designed templates or certain rules, along with interactive interfaces that allow for easy customization of pictorial visualizations. However, rule-based approaches have been criticized for being too rigid and limiting users' creativity. On the other hand, example-based approaches, such as TimeLine [9], apply machine learning techniques to extract design practices from existing visualizations and apply them to new designs. However, example-based approaches require a large number of well-designed examples to be effective.

Recently, some studies have approached the problem with more involved techniques that allow users to select graphic elements and then bind data to them. These approaches typically follow a retrieving-and-editing pipeline, which involves retrieving a suitable image from a large corpus and then adapting it to the visualization. In the retrieving stage, the methods retrieve appropriate visual elements from a large image corpus, using guidance that can be specified by a data file [11] or an image [52]. In the editing stage, the retrieved visual elements are then semi-automatically composed together, with user refinements enabled by an interactive interface. For instance, Infomages [11] retrieves an image containing the target visual style, and then applies filling or overlaying techniques to adapt the data to fit the image. Alternatively, the retrieved image can serve as a reference whose style is mimicked for new content [34, 43]. For instance, Vistylist [43] decomposes the style of a reference chart into a tuple of color, font, and icon to guide the creation of new designs. In the process, deep learning techniques (e.g., [9, 34]) can be employed to facilitate the maintenance of the visual structure in the reference chart while extending it with new information.
Nonetheless, these methods share some common issues. First, retrieving an appropriate visual design that accurately aligns with the data requires a large-scale and high-quality image corpus, which is challenging to construct. Studies in this direction often create new datasets, resulting in various datasets (e.g., [12, 25, 28, 33, 43]) that do not meet each other's requirements. Second, the retrieval process requires certain guidance, which can be difficult for novice users to manipulate. With the increasing popularity of language models, numerous works (e.g., [12, 39, 47]) have explored the automatic generation of charts from text. However, these methods are often limited by a predefined set of recognized entities and specific data types, such as proportion-related statistics [12]. Additionally, editing the image to bind it with data is an error-prone task due to the trade-off between fitting the chart into the image and adapting the chart to the image.

To alleviate these limitations, we propose ChartSpark, a novel pictorial visualization authoring tool that embeds semantic context into charts based on a text-to-image generative model, while preserving important visual attributes such as trend. We first extract the target visualized object through a text analyzer. The generation process is then guided by the object and its corresponding description provided by the user. As an initial step towards utilizing generation techniques for creating visualization charts, we conducted a preliminary study that examines the different display formats of pictorial visualization. Drawing from this study, we categorized the embedded objects into foreground and background, with two types of generation methods for each, namely conditional and unconditional. Unlike their definition in the natural image domain, we determine whether a method is conditional based on whether it takes chart information into account. In addition, we offer an editing module for modifying and adding information, as well as an estimation module that provides users with feedback on distortion. By utilizing generation techniques instead of searching for images, we gain three advantages over previous methods: 1) eliminating the tediousness of searching and the possibility of not finding suitable images; 2) allowing a more natural combination of semantic information and charts through robust generative ability; and 3) covering more visual elements and supporting flexible text instructions. Therefore, our approach enables text-guided generation and reduces the complexity of creating a pictorial visualization.

In all, our contributions are three-fold:

• We create pictorial visualizations based on a generative model that can be conditioned on both semantic context and structural information.

• We construct an interface with a text analyzer, an editing module, and an evaluation module to assist users in the creation process.

• A case study and expert interviews demonstrate the effectiveness of our method.

2 RELATED WORKS

Pictorial Visualization. Pictorial visualization incorporates dedicated visual embellishment to deliver complex data details in an attention-grabbing manner. There are various types of expressive visual forms, including icon-based graphs [43], proportion-related charts [34], and timeline diagrams [9]. Compared to plain charts, the aesthetic appeal of pictorial visualization can create a strong first impression, increase engagement [18, 21, 30], and enhance information absorption [3, 31] and memorability [2, 4, 17]. Bateman et al. [2] revealed that embellished charts could significantly increase long-term recall by quantifying memorability attributes, including data-ink ratios and visual densities, without sacrificing interpretation accuracy. However, some visualization minimalists [14, 46] argue that visual representations should be devoted to displaying data information maximally, and that visually embellished charts can distract users and reduce chart readability. These claims were supported by studies [10, 24] that estimate the amount of quantitative information extracted through graphical perception.

Despite the ongoing debate on "chartjunk", studies (e.g., [4, 14]) generally recognize the inherent benefits of visual embellishments, as long as they do not cause distractions or misinterpretations. Borkin et al.'s [4] "beyond memorability" study concludes that properly used pictograms do not interfere with understanding and can improve recognition. The vast design space for pictorial visualization allows for a great deal of creativity but also poses challenges. Some studies [6, 32] focus on expanding the design space and propose various design dimensions as guidelines for practice. Recently, more efforts have been devoted to developing authoring tools for pictorial visualizations. In line with this research direction, our goal is to develop an authoring tool that simplifies the process of creating plausible pictorial visualizations.

Visualization Authoring Tool. Numerous authoring tools have been developed to streamline the process of creating pictorial visualizations. Previous methods [23, 49, 50] have focused on mapping visual elements to data using suitable visual encoding channels, but the design of visual elements still requires the expertise of a professional designer. To reduce the effort required to design visual elements from scratch, many approaches implement a retrieving-and-editing pipeline to draw inspiration from a vast repository of dedicated resources. There are two types of designs based on the retrieved objects: content-based design and example-based design. Content-based design [11, 52] involves searching for an image that shares semantic context with the original chart, and then merging its visual representation with the chart. For instance, Infomages [11] retrieves an image that contains the target visualization object and subsequently applies filling or overlay techniques to adapt the chart to fit the image. These methods not only involve time-consuming operations to merge retrieved images and charts but also often compromise data integrity due to error-prone manual collation. Example-based design [9, 34, 43] focuses on retrieving a well-crafted chart as a reference and emulating its style when creating new content. Vistylist [43] disentangles the font style, color, and icon from a retrieved example and applies them to curate a new chart. Despite their apparent simplicity, both content-based and example-based design approaches still require a certain level of familiarity with visualization design principles to avoid mistakes, which can be challenging for novice users.

Some recent works have addressed this limitation by leveraging language models with text to guide the creation of proportion-related visualizations [12] or to highlight the intended information [47]. Nonetheless, such methods heavily depend on a predetermined textual set of recognized entities and specific data types, such as only being able to recognize proportion-related statistics [12]. This work takes a different approach by utilizing a text-to-image generative model, instead of language models relying on structured statements, to enhance pictorial visualization design. With the generative model, users have the flexibility to provide textual inputs, and can avoid the challenges associated with finding appropriate images and the potential for error when manually editing the visual design.

Text-guided Image Generation. Text-to-image diffusion models have demonstrated great success in producing high-quality images, surpassing previous mainstream GAN models [16, 22] and autoregressive models [38, 51]. Pilot works [13, 20, 44] are constructed using a hierarchy of denoising autoencoders that exhibit desirable properties, such as a stationary training objective and good generalization, which provide a solid foundation for subsequent research. To enable text conditions, Rombach et al. [40] incorporate cross-attention layers and operate in latent space to address the issue of costly inference. Stable Diffusion [37] achieves zero-shot performance through the utilization of CLIP's multi-modality space [36], while Imagen [42] finds that pretrained language models can serve as effective text encoders for generation purposes. However, plain text-guided generation lacks control, which has led to the development of models that emphasize controllable manipulation. Some studies [15, 41] use a set of images to fine-tune generative models and maintain a consistent style in newly synthesized images. Other studies [8, 19] dig into the cross-attention mechanism and inject semantic information throughout the diffusion process. Recent advancements in conditional models [27, 53], which incorporate text as well as additional grounding concepts such as bounding boxes and keypoints, have made it feasible to tackle the issues of layout and composition in image generation.

The aforementioned methods focus on natural images, while we are pioneering the use of text-to-image generative models in the realm of visualization. This is nevertheless a challenging task, as the generated pictorial visualizations must maintain data integrity while also accurately reflecting the semantic context. To address this challenge, we integrate chart information into the textual guidance through the use of attention mechanisms.
[Figure 2: (a) workflow stages — Visualize Raw Data, Extract Semantic Context, Retrieve Visual Element, Binding, Estimation — annotated for general users, artists, and vis experts; (b) design-pattern taxonomy with per-chart-type percentages.]

Fig. 2: Analysis of the preliminary study. (a) The general workflow of creating pictorial visualizations contains five stages; participants with different backgrounds encountered difficulties at the indicated stages. (b) Taxonomy of common design patterns for different chart types, showing the percentage of each design pattern for each chart type in the corpus. Several types of candy are used to illustrate the representation of each design pattern.

3 PRELIMINARY STUDY

We conducted a preliminary study to comprehend the design process involved in crafting a pictorial representation. From the formative interviews, we gained a better understanding of the design workflow (Sect. 3.1), and received concerns and expectations that are summarized as design requirements. We also collected a corpus of pictorial visualizations and examined the design patterns presented in them (Sect. 3.2).

3.1 Formative Interviews

To study a typical workflow and ensure the authoring tool can be useful and accessible to a wide range of users, we conducted formative interviews with people from three different backgrounds: two artists (A1, A2) who mostly use Adobe Illustrator and P5.js in their daily work, two visualization experts (V1, V2) who have around three years of data analysis and dashboard design experience, and two individuals (P1, P2) without relevant training in art or visualization. Each participant was interviewed individually, and the length of each interview varied from 30 minutes to an hour. The interview consisted of three stages. First, we introduced the concept of pictorial visualization and presented several examples to the participants. During the second stage, participants were provided with a dataset of the global change in a desert area, which included an x-axis for time, a y-axis for area, and a title. They were then instructed to describe their design workflow using rough sketches while vocalizing their thought process. Finally, we asked the participants the following questions: 1) What are the key steps involved in creating a pictorial visualization? 2) Which step in the pictorial visualization creation process do you find the most challenging, and why? 3) What expectations and concerns would you have if a generative model were involved in creating pictorial visualizations?

3.1.1 General Workflow

The participants' workflows in creating pictorial visualizations revealed a general pattern to the process, as shown in Fig. 2 (a). Firstly, participants recognized the theme and data features by reading the title and observing the trend to quickly comprehend the data. Next, they drew or retrieved some visual elements reflecting the theme of the data. Once the visual elements were established, participants attempted to bind the data to these elements using techniques such as rotation, scaling, deformation, and stretching. At the end of the creation process, some participants evaluated the final design to ensure the accuracy of the data binding and to avoid errors from manual manipulation.

Based on the interviews, we found that all participants mentioned that it was tedious and challenging to bind data with visual elements. "I need to repeat the same operation for each single element", A2 commented, "If I want to change visual materials, I have to start my work from scratch. It is not good for creation iteration". Meanwhile, P1 noted that "The image cutout and anti-aliasing need professional software that is hard to use. It is also complicated to deform the visual element to adjust the shape of charts". Furthermore, both P1 and P2 stated that it was difficult to find images that matched the theme. "I can barely find any concrete elements, especially for abstract vocabulary", mentioned P2. However, neither the artists nor the visualization experts thought that would be a potential issue. As for evaluating visualization performance, the visualization experts raised concerns about visual distortion and data integrity compared with the original chart.

3.1.2 Design Requirements

Besides the discussion about the conventional workflow, participants also expressed their expectations and concerns about involving generative models to assist the process of creating pictorial visualizations. All participants believed that it was essential to capture appropriate semantic context and significant data features before starting to design, especially in the context of AI generation. They sketched a chart that reflected the data at the beginning of their design process, which provided a solid base for adding visual elements with relevant semantic context. When told that generative models would be involved in the creation process, they emphasized the necessity of a raw data preview so that they could evaluate whether the portrayal of data by the generative model was faithful. Participants hoped that the generative model could allow flexible customization in design. Specifically, some participants wanted a controllable generative result, including its shape and color, while others wanted a large variety of styles in visual elements. Moreover, participants expected an integration of visual attributes from charts while generating images. "Recent text-to-picture models work well in general", V2 acknowledged, "However, particularly for pictorial visualization, we must consider information encoded by visual channels like the trend for a line plot and height for a bar chart". All participants anticipated an evaluation module to validate the quality of their work once the visualization was completed, providing the visual distortion from the original data, including height, area, and angle.

Following the design workflow, there are four design requirements based on participants' concerns (R1 and R4) and expectations (R2 and R3), respectively.

R1. Preview data and theme. Visualize the raw data and obtain a semantic description before starting the visualization design.

R2. Personalize visual elements. Customize the pictorial visualization, such as color and shape, through controllable manipulation, while expanding the design space with various styles of visual elements.

R3. Embed semantic context into the chart. Integrate the visual elements and data automatically and naturally, while supporting flexible embedding methods for the semantic context.

R4. Evaluate the performance. Evaluate the visualization design in terms of visual distortion, which indicates the loss of data integrity.

3.2 Pictorial Visualization Corpus

Our preliminary study also unveils systematic patterns for integrating semantic context with data in pictorial visualizations. To collect the sample, we drew upon datasets provided by prior research [11, 26, 43] and manually selected some typical charts, resulting in 587 charts that constitute part of our data. As some of the collected pictographs overly focused on a specific type, such as icon-based, we also retrieved additional examples from Pinterest and Google to supplement our corpus.
After removing redundant visualizations, our final corpus comprised 863 samples. We then classified the images based on embedding and representation. To minimize individual judgment bias, each visualization was appraised and categorized by two authors, with a double-check process. We analyzed the collected data and summarized the common design patterns into a taxonomy. As depicted in Fig. 2 (b), we employ various shapes of candies as content to represent this taxonomy. Overall, the design patterns concerning the embedded object can be divided into foreground and background. For the foreground, there are various forms of data representation, such as encoding as an internal element of the chart itself or locating it externally, and filling the element in a single or multiple manner. In contrast, the background refers to the overall visual context in which the data is presented.

• Foreground Internal Single (394, 45.6%). The semantic context is embedded in the foreground, encoding the visual element in a single manner as the chart itself. Examples include rectangular candies for bars in a bar chart and round candies for points in a scatter plot.

• Foreground Internal Multiple (299, 34.7%). The semantic context is embedded in the foreground, encoding the visual element in a multiple manner as the chart itself. For example, a stack of the same candies forms a bar in a bar chart.

• Foreground External (104, 12.1%). The semantic context is embedded in the foreground, encoding the visual element in a single manner and locating it externally. For instance, a candy is placed next to each sector in a pie chart.

• Background (66, 7.7%). The semantic context is embedded by visual elements in the background. Some backgrounds may also reflect the data information through the contours of visual elements.

While the visual representations for embedding semantic context into charts can be diverse, we observe a significant phenomenon in both the foreground and background: whether the data can be reflected within the displayed visual element. If so, the semantic information and data information will be encoded together within the element, and the element will comply with the data's inherent trend or magnitude.

4 METHOD

4.1 Overview

[Figure 3]

Fig. 3: The three-stage framework of ChartSpark. The extraction stage provides users with data features and semantic context. The generation stage produces visual elements according to the input prompt and selected method, with a final refinement. The evaluation stage evaluates the generated visualization based on distortion.

The proposed ChartSpark framework's workflow is depicted in Fig. 3, comprising three primary stages. In the initial stage, data features are visualized and the semantic context is extracted from the raw data, offering users a visual preview and thematic topic to enhance their comprehension of the data (R1). Subsequently, users employ prompt-driven generation to acquire visual elements, embedding the semantic context into the foreground or background in either a conditional or unconditional manner (R2, R3). Ultimately, the evaluation module furnishes a suggestion mechanism to indicate data distortion (R4). In comparison to the workflow depicted in Fig. 2, ChartSpark streamlines the process by supplanting the retrieval and data binding stages with generation, which mitigates the inconvenience of retrieving visual elements and integrating them with charts. Instead, users provide prompts regarding their preferred style and embedding technique to steer the automatic generation. The ChartSpark framework features a sandwich structure, with the first and last stages ensuring optimization of the middle stage's performance and augmenting the faithfulness of the data and the expressiveness of the visual representation. The preview presented in the initial stage enables users to intuitively discern the potential distortion of the generated chart, while the evaluation in the final stage delivers more accurate values and explicitly visualizes the error.

4.2 Feature Extraction

Since pictorial visualization fuses both numerical data and textual information, we extract the data feature and the semantic context, respectively.

Data feature. To facilitate users' comprehension of the overall appearance of the raw data, we offer a variety of chart types from which they can choose to visualize the data (Fig. 5 (A1)). Users can efficiently identify patterns and trends concealed within the data in the data preview (Fig. 5 (A2)). This data preview also functions as an indicator to detect any deviations in the subsequent generation process. Furthermore, we extract the data annotations, encompassing the x and y axes and the title, into an SVG file, making them available for subsequent user editing.
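To make this step concrete, the sketch below shows one plausible implementation of the preview-and-export behavior using matplotlib; the function, file names, and JSON schema are hypothetical illustrations, not ChartSpark's actual code.

```python
# Hypothetical sketch (not ChartSpark's code): render a raster preview of the
# raw data, then hide the marks and export only the annotations (axes, ticks,
# title) to SVG so each text element remains editable downstream.
import json
import matplotlib.pyplot as plt

def preview_and_export(data_file: str, chart_type: str = "line") -> None:
    with open(data_file) as f:
        data = json.load(f)  # assumed schema: {"title": str, "x": [...], "y": [...]}

    fig, ax = plt.subplots(figsize=(6, 4))
    if chart_type == "bar":
        ax.bar(data["x"], data["y"])
    else:
        ax.plot(data["x"], data["y"])
    ax.set_title(data.get("title", ""))

    fig.savefig("preview.png")  # the data preview shown to the user (Fig. 5 (A2))
    for artist in list(ax.lines) + list(ax.patches):
        artist.set_visible(False)  # keep only axes, ticks, and title
    fig.savefig("annotations.svg", format="svg")
```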
Semantic context. For the semantic context extraction, we employ a two-step approach, namely keyword extraction and relevant word retrieval. Initially, we extract the keyword using MPNet [45], a pretrained model featuring a sentence transformer architecture. However, since the keyword may not be sufficiently explicit to inspire a concrete visual element, we also provide relevant words to stimulate users' creativity, particularly for individuals with limited design expertise, in accordance with Fig. 2. This is accomplished by estimating word similarity through Word2Vec [29], which converts words into vectors. To do so, we employ the English Wikipedia Dump from November 2021, comprising a corpus of 199,430 words, each represented by a 300-dimensional vector. In our experiments, we discovered that some of the most similar words are closely associated with proper nouns, which could decrease the recognition capacity in visualization. Under the assumption that the occurrence frequency of a word in the corpus is related to its usage in everyday life, we integrate frequency and similarity as two criteria to rank the relevant semantic context, yielding the top 7 words as a result (Fig. 5 (A3)).
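This two-step extraction can be wired together with off-the-shelf libraries, as in the hedged sketch below; the specific MPNet checkpoint, the candidate list, the frequency table, and the equal weighting of similarity and frequency are assumptions for illustration rather than details given in the paper.

```python
# Illustrative sketch of keyword extraction + relevant-word retrieval.
# Model choice, weighting, and the frequency table are assumptions.
from sentence_transformers import SentenceTransformer, util
from gensim.models import KeyedVectors

encoder = SentenceTransformer("all-mpnet-base-v2")  # an MPNet-based encoder

def extract_keyword(title: str, candidates: list) -> str:
    """Pick the candidate word most semantically similar to the chart title."""
    title_emb = encoder.encode(title, convert_to_tensor=True)
    cand_embs = encoder.encode(candidates, convert_to_tensor=True)
    return candidates[int(util.cos_sim(title_emb, cand_embs)[0].argmax())]

def relevant_words(keyword: str, w2v: KeyedVectors, freq: dict,
                   alpha: float = 0.5, k: int = 7) -> list:
    """Rank Word2Vec neighbours by a blend of similarity and corpus frequency,
    discounting rare words (which are often proper nouns)."""
    neighbours = w2v.most_similar(keyword, topn=50)   # (word, similarity) pairs
    max_f = max(freq.get(w, 1) for w, _ in neighbours)
    scored = sorted(
        neighbours,
        key=lambda t: alpha * t[1] + (1 - alpha) * freq.get(t[0], 1) / max_f,
        reverse=True)
    return [w for w, _ in scored[:k]]
```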
4.3 Generation

Based on the preliminary study, we identify four embedding types, along with foreground and background as the two main layers for embedding objects. Our key observation is that some embedding types only require visual embellishment containing semantic context, while others need to comply with the inherent data to become part of the chart itself. In light of this observation, we devise a generation methodology that employs both unconditional and conditional approaches. The fundamental distinction between these approaches hinges on whether the chart information is factored into the generation process. As illustrated in Fig. 4 (a), the generation stage consists of three core modules. The unconditional module adopts the fundamental text-to-image diffusion model, subsequently generating a corresponding visual element. For the conditional module, we inject the chart image into the attention map, serving as guidance for the ensuing generation process, as shown in Fig. 4 (b). Lastly, the modification module is tasked with replication and refinement, to accommodate the four embedding types and enhance the details.

4.3.1 Unconditional Generation

Diffusion-based generation methods outperform previous generation methods in the quality of generated images and in semantic comprehension. In this work, we develop our framework based on the frozen Latent Diffusion Model (LDM) [40]. Below, we outline the core structure and generation process of LDM to provide some preliminary knowledge. Similar to previous diffusion models [13, 20, 44], LDM follows a forward process that incorporates Gaussian noise via a Markov process, and a reverse process that denoises to recover the original distribution by reconstructing the image. However, LDM distinguishes itself from other diffusion models by employing a compressed, low-dimensional latent space instead of pixel-based diffusion, thereby reducing the computational cost of inference. The LDM architecture includes an autoencoder that converts pixel images into latent representations and a UNet that predicts noise and performs denoising in the latent space. To enable text-guided generation, LDM enhances the underlying UNet backbone with a cross-attention mechanism, facilitating the integration of multimodal guidance.
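For reference, the standard LDM formulation can be restated as follows; these equations are reproduced from [40] rather than from the text above. The encoder E maps an image x to a latent z_0 = E(x), the forward Markov process gradually adds Gaussian noise, and the UNet ε_θ is trained to predict that noise under the text condition τ_θ(y) injected via cross-attention:

$$ z_0 = \mathcal{E}(x), \qquad q(z_t \mid z_{t-1}) = \mathcal{N}\!\left(z_t;\ \sqrt{1-\beta_t}\, z_{t-1},\ \beta_t \mathbf{I}\right), $$

$$ L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[\left\|\epsilon - \epsilon_\theta\!\left(z_t,\, t,\, \tau_\theta(y)\right)\right\|_2^2\right], $$

where β_t is the noise schedule and τ_θ is the text encoder.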
[Figure 4: example prompts include "A cherry blossom, white and pink", "A tree branch, slender and grows horizontally", "The lighthouse by the sea", "A bowl of cherries", and "A cute fluffy cat in the sofa"; the attention score is shown on a 0–1 color scale.]

Fig. 4: The process of unconditional and conditional generation for the foreground and background. (a) Given a prompt, the generative model generates relevant visual elements by the unconditional or conditional method; the generated visualization can then be edited through replication and refinement. (b) Internal mechanism of incorporating an image and textual input into the attention map.
Foreground. In our preliminary analysis of existing pictorial visualizations, the foreground is the most common object in which to embed the semantic context, and it can exhibit various representations. Unconditional generation can produce visual elements to embellish the chart that closely match the semantics but do not contain information about the underlying data. We achieve this by utilizing a prompt-driven method. The text prompt P provided by the user consists of an object P_obj and its corresponding description P_des. In Fig. 4 (a), the terms with an underline represent P_obj, while the terms without an underline represent P_des. Given the generated image I_g, our objective is to extract the visual element related to the semantic context P_obj from the image. To accomplish this, we use cross-attention between the object P_obj and I_g to locate the target region, and then remove the background to obtain I_obj. As shown at the top of Fig. 4 (b), we obtain the visual feature map V of the generated image from the autoencoder, along with the embedding of P_obj. Next, we use linear projections to transform them into Q and K. We then multiply Q and K to obtain the attention score, which is subsequently multiplied with V to generate the final attention map. In summary, the process can be described as follows:

$$ A(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{T}}{\sqrt{d}}\right) \cdot V, \qquad (1) $$

where d represents the dimension of the latent projections Q and K, and the Softmax function is utilized to normalize the attention score. As shown at the bottom of Fig. 4 (b), the attention score is directly proportional to the strength of the relevance between the image and the text. As a result, we can extract the object of interest from a cluttered background by comparing pixel differences. To accomplish this, we first calculate a threshold to distinguish the object from the background, obtaining a mask. Next, we perform a pixel-wise comparison at the corresponding positions in I_g to obtain a rough object region, denoted as I_obj. Lastly, to achieve a more refined result, we utilize ISNet [35], a state-of-the-art segmentation neural network, to eliminate redundant information. The process can be described through the following equations:

$$ M = \mathbb{I}\!\left[A_{ij} > \frac{\sum_{i,j} A_{ij}}{N^{2}}\right], \qquad (2) $$

$$ I_{obj} = f_{upsample}(M) \odot I_g, \qquad (3) $$

$$ I'_{obj} = R(I_{obj}), \qquad (4) $$

where I[·] is the element-wise indicator function on the matrix, N² represents the total number of pixels in the attention map A, and M is a matrix with a value of 1 at the object's location and 0 elsewhere. Since M has the same dimensions as A, we employ an upsampling technique to resize it to match the shape of I_g. R represents the redundant-information removal operation. The symmetric and hierarchical structure of the UNet involves both downsampling and upsampling, resulting in cross-attention layers being present at different resolutions. In our experiments, we observed that the middle layer exhibited better performance, and we empirically set N to 16.
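A minimal NumPy rendering of Eqs. (2)–(3) is sketched below; thresholding by the mean of A is exactly the ∑A_ij/N² criterion of Eq. (2), and the ISNet refinement R(·) of Eq. (4) is treated as an external step and omitted. This is an illustration of the thresholding-and-masking idea, not the authors' code.

```python
# Sketch of Eqs. (2)-(3): threshold the N x N cross-attention map by its mean,
# upsample the binary mask to image resolution, and mask the generated image.
# The ISNet refinement step R(.) of Eq. (4) is omitted here.
import numpy as np
from PIL import Image

def extract_object(attention: np.ndarray, generated: np.ndarray) -> np.ndarray:
    """attention: (N, N) map for P_obj; generated: (H, W, C) image array."""
    mask = (attention > attention.mean()).astype(np.uint8)      # Eq. (2)

    h, w = generated.shape[:2]
    mask_img = Image.fromarray(mask * 255).resize((w, h), Image.NEAREST)
    mask_up = (np.asarray(mask_img) > 127)[..., None]           # f_upsample(M)

    return generated * mask_up                                  # Eq. (3)
```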
Background. In unconditional background generation, the aim is to incorporate semantic context without extracting objects. To achieve this, we employ straightforward text-to-image generation with the fundamental diffusion model.

4.3.2 Conditional Generation

Compared with unconditional generation, conditional generation involves integrating the chart image I_c so that the generated visual element complies with the data information, such as trend and contour. There are two principal challenges that require addressing: 1) Enhancing generational diversity. We introduce an augmentation module to expand the possible fusion directions. However, we have discovered that conventional augmentation operations used in natural image domains, such as cropping and flipping, are inappropriate for charts and may ultimately jeopardize the data integrity of the chart. 2) Integrating the semantic context and the chart. This entails determining how to condition the generation process by merging the attention map containing the semantic context with the chart's data information.

Foreground. Conditional foreground generation emphasizes the integration of semantic context into the visual marks of the chart, while adhering to the data represented within the chart. Intuitively, the semantic context needs to be integrated into the rectangle, line, sector, and bubble for the bar chart, line chart, pie chart, and scatter plot, respectively. Initially, we randomly augment I_c with various manipulations, including Gaussian blur, dynamic blur, and image warp, as depicted in Fig. 4 (a). The augmentation module aug is established based on the principle of enhancing the diversity of chart element shapes while maintaining data integrity. Then, we obtain the attention map A concerning P_obj from the generation process (Eq. 1). To infuse the data information from I_c into the attention map, we utilize I_c as a mask, ensuring the attention map possesses the same shape as the element in I_c. To maximize the fused image I_fuse by including as much semantic context as possible, we employ two common affine transformations, scaling and rotating. The optimization function can be expressed as:

$$ I_{fuse} = \max_{\theta,\, s}\left[\, aug(I_c) \odot \varphi\!\left(f_{upsample}(A),\, \theta,\, s\right) \right], \qquad (5) $$

where φ is the affine transformation parameterized by the scaling parameter s and the rotation parameter θ. Finally, taking as input I_fuse, which integrates semantic context and chart information, we regenerate to obtain I_g.
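One plausible realization of the Eq. (5) search is a plain grid search over the rotation θ and the scale s, keeping the transform that maximizes the overlap between the augmented chart mask and the transformed attention map. The grids, interpolation choices, and the crop/pad helper below are illustrative assumptions, not ChartSpark's exact procedure.

```python
# Illustrative grid search for Eq. (5); parameter grids and helpers are
# assumptions rather than the paper's exact optimization strategy.
import numpy as np
from scipy import ndimage

def _fit_to(a: np.ndarray, shape: tuple) -> np.ndarray:
    """Center-crop or zero-pad `a` to `shape` (illustrative helper)."""
    out = np.zeros(shape, dtype=a.dtype)
    h, w = min(a.shape[0], shape[0]), min(a.shape[1], shape[1])
    ay, ax = (a.shape[0] - h) // 2, (a.shape[1] - w) // 2
    oy, ox = (shape[0] - h) // 2, (shape[1] - w) // 2
    out[oy:oy + h, ox:ox + w] = a[ay:ay + h, ax:ax + w]
    return out

def fuse_foreground(chart_mask: np.ndarray, attn_up: np.ndarray) -> np.ndarray:
    """Maximize overlap of aug(I_c) and the affine-transformed attention map."""
    best_score, best_fused = -np.inf, None
    for theta in range(0, 360, 15):                        # rotation grid
        rotated = ndimage.rotate(attn_up, theta, reshape=False, order=1)
        for s in (0.6, 0.8, 1.0, 1.2, 1.4):                # scaling grid
            scaled = _fit_to(ndimage.zoom(rotated, s, order=1), chart_mask.shape)
            fused = chart_mask * scaled                    # element-wise product
            if fused.sum() > best_score:
                best_score, best_fused = fused.sum(), fused
    return best_fused
```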
Background. The background serves not only as a container for semantic context but also as a part of the chart that conveys data information. To this end, we fuse the features of I_c into the background generation process. We also leverage the augmentation module to improve the diversity of generation. Unlike the augmentation for the foreground, which focuses on element distortion, the augmentation for the background necessitates the seamless integration of the chart's features with the background component. In practice, we define a set encompassing various interaction methods between the chart elements and the chart edges (the axes and border of the chart image) to establish a closed shape, particularly in the case of line charts, as the other three chart types possess ample space to encode information. In the process of merging the semantic context and the chart, we utilize a blending approach, calculating a weighted average of the attention map A and the augmented chart image aug(I_c) to facilitate this integration. This can be expressed as:

$$ I_{fuse} = \rho A + (1 - \rho)\, aug(I_c), \qquad (6) $$

where ρ ∈ [0, 1]; we set ρ to 0.6 empirically. We then inject I_fuse into the generation procedure to achieve reconstruction, yielding I_g.
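Eq. (6) itself reduces to a one-line blend; in the sketch below, `augment` stands in for the chart-preserving augmentation module described above, whose implementation is assumed.

```python
# Eq. (6) as code: weighted average of the attention map and the augmented
# chart image; rho = 0.6 follows the paper's empirical setting.
def fuse_background(attention_map, chart_img, augment, rho: float = 0.6):
    return rho * attention_map + (1.0 - rho) * augment(chart_img)
```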
4.3.3 Modification

Upon producing individual components, it is essential to adjust them to accomplish the ultimate composition. At the element level, we reuse the generated elements to encode other visual marks in the chart, enhancing reproducibility and adaptability. At the chart level, we refine the image details to create a more harmonious overall appearance, particularly when merging independently generated visual elements.

Replication. To apply a visual element to the other visual marks in a chart, traditional tools present two challenges: the tedious task of copying each individual element one by one, and the risk of element distortion. We propose a warp-and-merge strategy to overcome these challenges, taking a bar chart as a case in point. Initially, we generate the fundamental visual component by employing the tallest bar as a reference point. This reduces the problem to examining how to shorten the element to correspond with the shorter bars. To elaborate, we partition the visual component into five equally tall sections and compute the structural similarity (SSIM) between each pair. Based on the proportion of the bar, we cut the most similar part and concatenate the remaining parts together. However, the concatenated image might have artifacts and rigid junctions. To address this issue, we optimize these local details using the reconstruction capabilities of the generative model.
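The warp-and-merge step can be sketched with scikit-image's SSIM as follows; the five-section split follows the paper, while the details of the cut policy (cropping sections to a common height, removing rows from the most self-similar region) are illustrative assumptions.

```python
# Illustrative warp-and-merge for bars: split the tallest-bar element into
# five horizontal sections, find the most mutually similar adjacent pair via
# SSIM, and remove rows there to match a shorter bar. Assumes the amount to
# remove fits inside the chosen section.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def shorten_bar(element: np.ndarray, target_ratio: float) -> np.ndarray:
    """element: (H, W) grayscale visual component; target_ratio in (0, 1]."""
    h = element.shape[0]
    sections = np.array_split(element, 5, axis=0)
    m = min(s.shape[0] for s in sections)        # crop to a common height
    scores = [ssim(sections[i][:m], sections[i + 1][:m],
                   data_range=float(element.max() - element.min()))
              for i in range(4)]
    cut = int(np.argmax(scores))                 # most redundant region
    start = sum(s.shape[0] for s in sections[:cut])
    rows = int(h * (1.0 - target_ratio))
    return np.delete(element, slice(start, start + rows), axis=0)
```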
Refinement. During the generation process, users may integrate multiple embedding methods and experiment with various generations, yielding several independent generation results. For instance, as depicted in Fig. 5, the tree branch and the cherry blossom are generated separately, employing the unconditional and conditional foreground modes, respectively. Distinct generated visual elements can give rise to numerous issues, such as incoherent concatenation and inconsistent styles. Refinement based on image-to-image generation can supplement details to counteract incoherent concatenation and harmonize the style of the image, while preserving its layout and semantic context.

4.4 Evaluation

As stated in requirement R4, it is essential to provide an evaluation module to inform users of any potential distortions that affect data integrity when creating pictorial visualizations.

To assess distortion, we ascertain the disparity between the generated visual element and the original plain chart (Fig. 5 (C1)). Given that each chart type employs distinct visual channels to encode data, we tailor our methodology accordingly to guarantee a precise assessment for each chart type. For bar charts, we concentrate primarily on the height of each bar as an indicator of distortion. For line charts, the portrayal of the trend is of paramount importance in the evaluation. For pie charts, we measure the angle of each sector to gauge distortion. For scatter plots, we estimate the displacement of the centroid of each point to assess distortion. In contrast to the approach presented in [11], our system not only displays global numeric distortion values to users but also identifies local regions with high errors and presents them to users in the visual context to facilitate modifications (Fig. 5 (C1)).
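These per-chart-type measures can be expressed as small comparisons between values decoded from the generated image and the original data. The sketch below assumes that the decoding (reading bar heights, sector angles, or point centroids from pixels) happens upstream, and the specific error formulas are illustrative rather than the paper's exact definitions.

```python
# Illustrative distortion measures per chart type; decoding values from the
# generated image is assumed to happen upstream.
import numpy as np

def bar_distortion(orig_h, gen_h):
    """Relative per-bar height error: returns (global mean, worst bar index)."""
    o, g = np.asarray(orig_h, float), np.asarray(gen_h, float)
    err = np.abs(g - o) / np.maximum(o, 1e-9)
    return float(err.mean()), int(err.argmax())

def line_distortion(orig_y, gen_y):
    """Trend preservation: 1 - Pearson correlation of the two series."""
    return 1.0 - float(np.corrcoef(orig_y, gen_y)[0, 1])

def pie_distortion(orig_angles, gen_angles):
    """Mean absolute sector-angle error."""
    return float(np.abs(np.asarray(gen_angles, float)
                        - np.asarray(orig_angles, float)).mean())

def scatter_distortion(orig_pts, gen_centroids):
    """Mean displacement between original points and generated centroids."""
    d = np.asarray(orig_pts, float) - np.asarray(gen_centroids, float)
    return float(np.linalg.norm(d, axis=1).mean())
```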
[Figure 5: the interface with panels A1–A3, B1–B3, and C1–C3.]

Fig. 5: User interface of ChartSpark. It consists of a central canvas for the manipulation and composition of visual elements, and three main modules corresponding to the processes of feature extraction (A), generation (B), and evaluation and editing (C).

[Figure 6: panels (a)–(f).]

Fig. 6: Pictorial visualizations generated by ChartSpark. (a) A bar chart showing the transportation types used by visitors in New Zealand. (b) A scatter plot showing Messi's total goals scored and average goals scored for FC Barcelona per season. (c) A bar chart showing the global agricultural land used for cereal. (d) A bar chart showing the average daily reading time per capita in the US in 2021. (e) A pie chart showing a juice composition recipe. (f) A line chart showing the number of accidental fires in India per year.
5 INTERFACE

The user interface (Fig. 5) is divided into three core modules that correspond to the processes of feature extraction (A), generation (B), and evaluation and editing (C). The central canvas of the interface allows users to flexibly manipulate elements.

Feature extraction. To begin, the user uploads a data file in either JSON or CSV format and selects a desired chart type and aspect ratio in the settings panel (A1). The raw data is displayed in A2, and the relevant semantic context is shown in A3. The data annotation information is also rendered on the canvas. Both the visual representation of the raw data and the semantic context aid users in understanding the data and supply a design precedent before creation. They are displayed throughout the creation process, enabling users to compare the generated visualizations with the original data (R1).

Generation. The generation module comprises three components: generation options (B1), a gallery (B2), and modification (B3). In B1, users can customize the generation object and generation styles using text boxes, and select the generation target and method through two sets of buttons (R2, R3). The generation results are displayed in B2, labeled with their generation options. Users can choose whether to preserve or discard a generated element in B2, and regeneration can be performed from B1. The B2 view is closely linked to B1, as the content of the text boxes and the state of the generation options are replaced with the labels of the selected row in B2. The first two buttons of B3 serve as modifications in the generation process, corresponding to the replication and refinement functions described in Section 4.3.3.

Evaluation and editing. C1 offers users a data distortion evaluation with both an explicit quantitative value and an error visualization (R4). C2 comprises several tools allowing users to edit their visualizations, including fonts (type, size, and bolding), basic shapes, effects, and strokes. Moreover, users can flexibly transform their work using scaling and rotating. C3 displays the layers of elements, and provides the ability to adjust the layer order and visibility.

6 EVALUATION

To comprehensively evaluate the efficacy of ChartSpark, we performed three distinct analyses: example applications, a user study, and an expert interview.

6.1 Example Applications

Based on the taxonomy used in the preliminary study, we utilized ChartSpark to generate the four primary categories of pictorial visualizations. As illustrated in Fig. 6, we present the basic chart for each pictograph in the lower left corner, while showcasing the alternative visual elements generated on the right side.

Foreground External. As shown in Fig. 6 (a), the shape of the visual elements in this pictorial visualization is not constrained by the data. Instead, they function as adornments to the chart. To generate these elements, we adopted the unconditional foreground mode, which enables us to extract the target through P_obj; using the same P_des, which we defined as "a well-designed sticker", ensures consistency in the style of the different transportation representations.

Foreground Internal Single. Fig. 6 (b) and (e) show the use of generated visual elements to replace the original visual marks. In (b), the generated pictograph employs the same encoding as the original chart, with height signifying the goals scored during a specific season and size indicating the average goals. Football elements replace the bubbles, and a football pitch is generated to complement the foreground. Fig. 6 (e) displays cross-sections of each fruit and vegetable supplanting the original fan-shaped segments in the pie chart.

Foreground Internal Multiple. Fig. 6 (d) involves multiple replication units within a single bar. ChartSpark provides various visually diverse elements that fit the height of the bar using the same prompt, "the pile of books". Each book in the element is unique, circumventing monotonous duplication. Moreover, each bar is filled with the element naturally and automatically, without manual manipulation.

Background. In Fig. 6 (c) and (f), the chart has been merged into the image in a more natural manner under the conditional background mode, which encodes both the semantic and the data information. In case (c), as the data describes the land use of wheat, the prompt used is "aerial view of wheat field". In case (f), the fire trend in the generated image aligns with the original chart.

6.2 User Study

We conducted a user study to evaluate the usability of our tool and the effectiveness of our approach in facilitating the creation of pictorial visualizations. Our user study consists of a survey about usability and effectiveness, succeeded by a semi-structured interview to obtain more in-depth qualitative feedback.

Participants: We recruited 8 participants aged between 21 and 28, who are visualization users interested in pictorial visualization. The participants are mainly graduate students majoring in different disciplines, from computer science, business, and art and design, to architecture. All participants have used or designed pictorial visualizations.
Procedure: The study was conducted in a one-on-one, face-to-face manner. First, we showed some examples of pictorial visualization and introduced to the participants the basic concepts, including the embedding objects (foreground and background) and the embedding techniques (conditional and unconditional). Next, we presented the interface to the participant and introduced the functionalities. We then guided the participant through a step-by-step process of creating a pictorial visualization. Finally, we encouraged them to independently explore the authoring tool for 10 minutes to gain further familiarity. Afterwards, participants were asked to make a pictorial visualization from given data files. We carefully observed their creation process and documented their inquiries and remarks. Upon finishing the creation, we started the survey, where the participant answered four 5-point Likert scale questions about usability and effectiveness. The survey was followed by a short interview to gather additional feedback. Each study lasted about 50 minutes. The participant was compensated with a gift of $10 after they completed the study.

[Figure 7: (a) stacked Likert ratings, Strongly Disagree to Strongly Agree, with means/SDs — Enjoyment 4.00 (0.50), Flexibility 4.75 (0.43), Effectiveness 4.50 (0.71), Ease of use 4.38 (0.70); (b) radar chart over six metrics.]

Fig. 7: Results of the user study and expert interview. (a) The proportion of participants' ratings for enjoyment, flexibility, effectiveness, and ease of use, with the mean and standard deviation of each. (b) Six metrics comparing the performance of ChartSpark with DataQuilt and Infomages.

Result Analysis: We report our survey results here; see Fig. 7 (a).
• Enjoyment: Notably, the novel creative experience provided by our tool proved delightful for the majority of participants (mean = 4.00, SD = 0.50). P1 stated, "I really enjoy engaging with the AI and seeing all the interesting possibilities generated by the tool." "I have tried many popular text-to-image tools; this is the first that enables me to employ AI in visualization. It has heightened my engagement in the creative process", commented P8.

• Flexibility: The majority of participants praised the flexibility of our tool (mean = 4.75, SD = 0.43). P3 said, "I could freely switch between foreground elements and background elements and combine them to create a variety of interesting pictorial visualizations." P7 added, "This tool driven by an AI generative model helped me find design materials of diverse styles which are not easily available in the traditional design pipeline, where I had to spend a lot of time searching for references of different styles."

• Effectiveness: All participants acknowledged the effectiveness of our system for creating pictorial visualizations (mean = 4.50, SD = 0.71). "As I am not adept at using professional graphic design software like Adobe Illustrator, this tool has made it possible for many people like me", P6 remarked. "I am amazed that the generated elements can correspond with the data trend, and there is a wide variety of options to choose from".

• Ease of use: Most participants agreed that the authoring tool is easy to use (mean = 4.38, SD = 0.70). "Most of the functions are intuitive, easy to understand and use", P5 commented. "The whole creation process is smooth because of the simple design of the interface and the convenient functions supported by AI, such as the replication button", P1 said. "However, I think the object and description could be put into one text entry box to make it more convenient," P2 suggested.

Feedback: We summarize the feedback from the interviews as follows.

• One-shot generation vs. more intermediate control: Some novice users may prefer quick generation without much human intervention. For example, P4 said, "It would be really efficient if I only need to import the data and the tool just generates a ready-made pictorial visualization for me." However, users who have design experience or care about the details favor more controllability of the intermediate process and the results. "Even though AI can provide impressive inspiration, it is better for me if I can still pick what materials I want to use, like in this tool. But I suppose it would be even better if I could have more control, such as by sketch."

• Faithful evaluation: The users also talked about ChartSpark supporting reliable data presentation. P3 said, "I worry that pictorial visualizations may overly emphasize aesthetics, sacrificing readability. Thus, I value the design featuring editable data annotations, saving me the trouble of adding ticks and labels manually." P5 mentioned that "The evaluation view displays a precise error location, allowing me to easily identify problematic areas, which is far more intuitive than simply presenting numerical values."

• Integrated tool: Users also pointed out the importance of an integrated tool. "Nowadays many AI applications do not come with post-editing functions. As a result, I often have to export the AI-generated result to an external tool like Photoshop for editing. So I appreciate the effort to add several editing functions, but I also suggest adding more advanced functions like those in Photoshop, or developing a plugin for Photoshop", P1 commented.

6.3 Expert Interview

To conduct a thorough comparison of the performance of ChartSpark with previous tools, including DataQuilt [52] and Infomages [11], we engaged four designers and experts with extensive experience in information design to evaluate these tools. Designer 1 (D1) has expertise in graphic design and has been working in the information design field across both print and digital media for over six years. Designer 2 (D2) specializes in UX/UI design and has been working in this field for more than three years. Designer 3 (D3) focuses on data storytelling and interactive art, and is familiar with AI-driven techniques for creating engaging visual content. Expert 1 (E1) has been engaged in both academic research and industry projects for more than four years, with a focus on the development and evaluation of visualization techniques and tools. We established six metrics that comprehensively cover the creation of pictorial visualizations to help ensure a clear and detailed assessment of each tool.

• Data Integrity. This metric evaluates the system's response to data errors and the tool's ability to communicate the errors to users.

• Generalization. This metric examines the tool's ability to support a broad range of methods for incorporating semantic context into charts.

• Fitness. This metric concentrates on the successful integration of visual elements and charts, taking into account aspects such as trend.

• Customization. This metric evaluates the tool's capacity to tailor visual elements and offer a diverse set of customization options.

• Accessibility. This metric focuses on the ease with which visual elements can be obtained for design purposes.

• Manipulation. This metric measures the tool's effectiveness in allowing users to alter visual elements for the specific needs of charts.

Of the six metrics, the first focuses on faithfulness, the second to fourth focus on the expressiveness of pictorial visualization, and the last two evaluate the usability of the tools. To begin the evaluation process, we presented several cases to each participant, revealing embedding techniques that merge semantic context and data information in pictorial visualizations. Next, we showed a video of each tool, demonstrating the overall operation process and main function modules, in a randomized order. We then presented additional cases to the participants and asked them how they would create a pictorial visualization if they used these tools, following the think-aloud protocol. After completing the evaluation process, each participant provided a score for each tool based on the six evaluation metrics. The results are shown in Fig. 7 (b), which provides a comprehensive overview of the performance and usability of each tool.
On the other hand, D3 pointed out that “ChartSpark offers both numerical evaluation results and shows the error location, which can guide users on how to improve the visualization.”
Expressiveness. Regarding expressiveness, all participants agreed that ChartSpark is more versatile in embedding semantic context into both the foreground and background, whereas Infomages focuses on the background and DataQuilt on the foreground. D2 commented that “Infomages uses an image as the main component, which makes customizability weak and compatibility with chart types limited”. D1 appreciated Infomages’ design and found the merge of elements and charts very natural; however, she also mentioned that “it is difficult to find an image that has the same trend as the plain chart.” Participant D4 commented on the embedding methods of DataQuilt and Infomages, describing them as “unidirectional and full of compromises”: “DataQuilt extracts visual elements from images and adapts them to fit the height or area of the original chart, while Infomages alters the chart’s shape to accommodate the semantic image”. In contrast, ChartSpark’s generation results represent a bilateral merge, reshaping the visual elements intrinsically. D3 said that “there is not much room for me to change the shape or color of elements in Infomages and DataQuilt”. Participant D2 praised ChartSpark’s customization capabilities, noting that “if I want to change the number or color of visual elements, I can achieve it by modifying the description prompt. I find the alternative elements in the gallery important for design iteration”. However, D2 expressed concern that generated elements may not meet high-resolution requirements, in which case she would rather upload the visual elements herself.
Usability. All participants mentioned that “it is challenging to retrieve an element containing the exact same encoding as the given data, particularly for Infomages”, emphasizing that the overlay performance of Infomages “heavily relies on the retrieved image”. Participant D4 expressed appreciation for ChartSpark, stating that “it eliminates the need to search for images and allows them to generate as many visualizations as desired”. However, Participant D1 raised concerns about potential copyright issues arising from AI-generated content. Regarding the manipulation supported by these tools, Participant D2 said that “Infomages requires the least manual intervention among the tools but meanwhile limits the ways of combining the visual elements and chart. DataQuilt primarily automates batch operations for mapping elements to data, but users still need to consider how to adapt the visualizations. ChartSpark, on the other hand, incorporates data considerations when providing visual elements”. Furthermore, Participant D3 pointed out that data annotation and axis automation are available in ChartSpark, which could decrease the workload during the creation process.

7 DISCUSSION
By utilizing the generative model, we are able to embed both semantic context and data information in crafting a pictorial visualization. However, our system still has several limitations that need to be addressed in future work.

7.1 Limitations
Diversity of supported visualization. Currently, ChartSpark only supports a limited set of chart types, including bar charts, line plots, pie charts, and scatter plots. This limitation can be problematic when users require more diverse chart types to communicate their data effectively.
Controllability of generative model. The results generated by our tool may be unexpected for users due to various factors. Prompt design is crucial for achieving specific requirements, which limits the controllability of generation: even minor edits to the prompt can significantly alter the generated output. Furthermore, inconsistencies in style may arise because the visual elements are generated separately and independently. Since we mainly focus on the generation process and offer only limited manipulation of generated elements, users may find it challenging to make minor adjustments. For example, a user may want to modify the color of a generated cherry blossom slightly, but maintaining its shape can be difficult. In addition, the copyright of pictorial visualizations generated by our tool can be a controversial issue, as the training of generative models uses a significant amount of material obtained from the Internet.
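To make the prompt sensitivity concrete, the following minimal sketch (not part of ChartSpark; the checkpoint name and prompts are placeholders) fixes the sampler seed in an off-the-shelf diffusion pipeline, so that any difference between the two outputs is attributable to the one-word prompt edit alone.

import torch
from diffusers import StableDiffusionPipeline

# Load an off-the-shelf latent diffusion model (placeholder checkpoint).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def generate(prompt: str, seed: int = 42):
    # Reusing the same seed keeps the initial noise identical, so any
    # change in the output comes from the prompt edit, not from sampling.
    g = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt, generator=g).images[0]

generate("a watercolor cherry blossom branch").save("branch.png")
generate("a watercolor cherry blossom twig").save("twig.png")

Even under a fixed seed, such a small edit can noticeably change the composition and style of the result, which is precisely the behavior users find hard to anticipate.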
Faithfulness in data expression. The faithfulness of data expression can be affected in the process of creating a pictorial visualization. Although we integrate chart images as a condition for conditional generation, there is an inevitable degree of visual distortion that can cause the resulting visual elements to deviate from the original data. This creates a trade-off between maintaining data integrity and crafting a visually appealing, contextually relevant design. To ensure effective communication of information, users may need to add data annotations to enhance the readability of the visualization. To address this issue, we offer editable data annotation formats and an editing module equipped with various tools. However, this can increase the workload and impose design limitations on users, as it requires manual adjustment of ticks and labels to keep them consistent with the overall style of the visualization.
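A minimal sketch of how such deviation can be quantified is given below. It assumes the rendered size of each element (e.g., bar height or sector area) can be measured from the generated image; the function name and tolerance are illustrative and do not correspond to ChartSpark’s actual API.

import numpy as np

def locate_deviations(data_values, rendered_sizes, tol=0.05):
    # Parallel sequences: one source value and one rendered size per
    # element. Both are normalized so that only relative proportions are
    # compared; returns (index, deviation) pairs above the tolerance.
    src = np.asarray(data_values, dtype=float)
    ren = np.asarray(rendered_sizes, dtype=float)
    src = src / src.max()
    ren = ren / ren.max()
    deviation = np.abs(src - ren)
    return [(i, round(float(d), 3)) for i, d in enumerate(deviation) if d > tol]

# Example: the third bar is rendered noticeably too tall.
print(locate_deviations([10, 20, 30, 40], [0.25, 0.5, 0.9, 1.0]))  # [(2, 0.15)]

Reporting the offending indices, rather than a single aggregate score, is what allows an evaluation view to point users to the exact problematic elements.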
7.2 Future Work
Enrich the design space. To enhance the practicality of ChartSpark, we should consider integrating more types of infographics. Furthermore, the embedding object can be subdivided into more granular, targeted categories for each specific type of chart. Currently, our augmentation modules apply the same manipulation techniques to all types of charts; because visual encodings differ among chart types, a more targeted design is necessary to achieve effective data communication and stronger visual expressiveness.
Improve controllability of generative model. To improve the controllability of the generation process, we can incorporate different forms of prior knowledge and allow for editing and iteration of generated results. In addition to prompt-based input, other description types, such as sketches and semantic maps, can be included to provide more flexible and diverse information, leading to a more controllable generation process. Furthermore, we can enhance the iteration process by allowing users to select a satisfactory generated image as a baseline for future generations, instead of generating independently each time. To address the trade-off between achieving a more natural semantic context and preserving data integrity, we can add a slider to the user interface that allows users to adjust the level of emphasis on each aspect according to their specific needs and preferences. Additionally, we can enable users to edit the generated visual elements through an instruction-based approach, as demonstrated in [7]. This not only reduces the need to regenerate the entire visualization but also enables fine-grained editing for localized changes in the visual elements.
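One plausible realization of both ideas is sketched below with the publicly available InstructPix2Pix pipeline [7] from the diffusers library; the checkpoint name, instruction, and parameter values are illustrative assumptions rather than ChartSpark’s implementation.

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# InstructPix2Pix [7] edits an existing image according to a textual
# instruction, so a satisfactory result can be refined in place instead
# of regenerating the whole visualization from scratch.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

chart = Image.open("generated_chart.png")  # a previously selected result

edited = pipe(
    "make the cherry blossoms a lighter pink",
    image=chart,
    num_inference_steps=20,
    # image_guidance_scale plays the role of the proposed slider: higher
    # values stay closer to the input image (data integrity), lower
    # values follow the instruction more freely (semantic context).
    image_guidance_scale=1.8,
    guidance_scale=7.0,
).images[0]
edited.save("edited_chart.png")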
Enhance faithfulness in data expression. In order to enhance the faithfulness of ChartSpark, two potential areas of improvement should be considered. One is to enhance the evaluation module in the current system. Although numerical values and error locations are provided to users, they may not be explicit enough to guide users in rectifying visual elements; more detailed inferences and specific suggestions should therefore be provided. The other is to mitigate potential distortions that may occur during the creation process. It is important to conduct thorough detection and analysis at each stage of the generation process, which can help to identify issues or errors that negatively impact its performance. For instance, when target objects are not appropriately identified, element extraction may fail, resulting in poor-quality visualizations. Similarly, background removal may perform poorly if the removal is incomplete or if the wrong part of the image is removed. By leveraging the results of such detection and analysis, we can provide users with better generated images by avoiding the various types of failures that can compromise the overall quality and accuracy of the visualizations.
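As an example of such stage-wise detection, the sketch below flags suspicious background-removal results with a simple coverage heuristic. It uses the third-party rembg library as a stand-in for the dichotomous segmentation model [35] employed in our pipeline, and the coverage thresholds are illustrative assumptions.

import numpy as np
from PIL import Image
from rembg import remove  # third-party matting library, used as a stand-in

def background_removal_ok(path, min_fg=0.02, max_fg=0.90):
    # After matting, the extracted foreground should cover a plausible
    # fraction of the canvas: coverage near 0 suggests the element was
    # removed along with the background, while coverage near 1 suggests
    # the background was barely removed at all.
    rgba = remove(Image.open(path)).convert("RGBA")
    alpha = np.asarray(rgba)[:, :, 3] / 255.0
    coverage = float((alpha > 0.5).mean())
    return (min_fg < coverage < max_fg), coverage

ok, cov = background_removal_ok("generated_element.png")
if not ok:
    print(f"Suspicious foreground coverage ({cov:.0%}); flag for regeneration.")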
8 CONCLUSION
In this paper, we propose a novel framework named ChartSpark to create pictorial visualizations. The framework employs a text-to-image generative model to integrate both the semantic context and chart information. ChartSpark is flexible and compatible with both unconditional and conditional methods to fuse the semantic context with charts. It is also versatile and can be applied to both foreground and background visualizations. Additionally, we have developed a user interface that integrates feature extraction, generation, and evaluation to facilitate the creation process. To evaluate the effectiveness of ChartSpark, we present several cases covering the four main forms of pictorial visualizations. The feedback from user studies and expert interviews demonstrates the framework’s effectiveness. The source code of the model and interface, as well as the pictorial visualization corpus, are released at https://github.com/SerendipitysX/ChartSpark to promote future work in this direction.
REFERENCES
[1] Piktochart. https://piktochart.com/, last accessed on 22/02/2023. 1
[2] S. Bateman, R. L. Mandryk, C. Gutwin, A. Genest, D. McDine, and C. Brooks. Useful junk? the effects of visual embellishment on comprehension and memorability of charts. In Proc. ACM CHI, pp. 2573–2582, 2010. 1, 2
[3] R. Borgo, A. Abdul-Rahman, F. Mohamed, P. W. Grant, I. Reppa, L. Floridi, and M. Chen. An empirical study on using visual embellishments in visualization. IEEE Trans. Vis. Comput. Graph., 18(12):2759–2768, 2012. 1, 2
[4] M. A. Borkin, Z. Bylinskii, N. W. Kim, C. M. Bainbridge, C. S. Yeh, D. Borkin, H. Pfister, and A. Oliva. Beyond memorability: Visualization recognition and recall. IEEE Trans. Vis. Comput. Graph., 22(1):519–528, 2015. 1, 2
[5] M. A. Borkin, A. A. Vo, Z. Bylinskii, P. Isola, S. Sunkavalli, A. Oliva, and H. Pfister. What makes a visualization memorable? IEEE Trans. Vis. Comput. Graph., 19(12):2306–2315, 2013. 1
[6] J. Boy, A. V. Pandey, J. Emerson, M. Satterthwaite, O. Nov, and E. Bertini. Showing people behind data: Does anthropomorphizing visualizations elicit more empathy for human rights data? In Proc. ACM CHI, pp. 5462–5474, 2017. 2
[7] T. Brooks, A. Holynski, and A. A. Efros. InstructPix2Pix: Learning to follow image editing instructions. arXiv preprint arXiv:2211.09800, 2022. 9
[8] H. Chefer, Y. Alaluf, Y. Vinker, L. Wolf, and D. Cohen-Or. Attend-and-excite: Attention-based semantic guidance for text-to-image diffusion models. arXiv preprint arXiv:2301.13826, 2023. 2
[9] Z. Chen, Y. Wang, Q. Wang, Y. Wang, and H. Qu. Towards automated infographic design: Deep learning-based auto-extraction of extensible timeline. IEEE Trans. Vis. Comput. Graph., 26(1):917–926, 2019. 1, 2
[10] W. S. Cleveland and R. McGill. Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387):531–554, 1984. 2
[11] D. Coelho and K. Mueller. Infomages: Embedding data into thematic images. Comput. Graph. Forum, 39(3):593–606, 2020. 1, 2, 3, 6, 8
[12] W. Cui, X. Zhang, Y. Wang, H. Huang, B. Chen, L. Fang, H. Zhang, J.-G. Lou, and D. Zhang. Text-to-viz: Automatic generation of infographics from proportion-related natural language statements. IEEE Trans. Vis. Comput. Graph., 26(1):906–916, 2019. 2
[13] P. Dhariwal and A. Nichol. Diffusion models beat GANs on image synthesis. In Proc. NIPS, vol. 34, pp. 8780–8794, 2021. 2, 4
[14] S. Few and P. Edge. The chartjunk debate. Visual Business Intelligence Newsletter, pp. 1–11, 2011. 1, 2
[15] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022. 2
[16] R. Gal, O. Patashnik, H. Maron, A. H. Bermano, G. Chechik, and D. Cohen-Or. StyleGAN-NADA: CLIP-guided domain adaptation of image generators. ACM Trans. Graph., 41(4):1–13, 2022. 2
[17] S. Haroz, R. Kosara, and S. L. Franconeri. Isotype visualization: Working memory, performance, and engagement with pictographs. In Proc. ACM CHI, pp. 1191–1200, 2015. 1, 2
[18] L. Harrison, K. Reinecke, and R. Chang. Infographic aesthetics: Designing for the first impression. In Proc. ACM CHI, pp. 1187–1190, 2015. 1, 2
[19] A. Hertz, R. Mokady, J. Tenenbaum, K. Aberman, Y. Pritch, and D. Cohen-Or. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626, 2022. 2
[20] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Proc. NIPS, vol. 33, pp. 6840–6851, 2020. 2, 4
[21] J. Hullman, E. Adar, and P. Shah. Benefitting InfoVis with visual difficulties. IEEE Trans. Vis. Comput. Graph., 17(12):2213–2222, 2011. 1, 2
[22] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In Proc. CVPR, pp. 4401–4410, 2019. 2
[23] N. W. Kim, E. Schweickart, Z. Liu, M. Dontcheva, W. Li, J. Popovic, and H. Pfister. Data-driven guides: Supporting expressive design for information graphics. IEEE Trans. Vis. Comput. Graph., 23(1):491–500, 2016. 2
[24] S. M. Kosslyn. Understanding charts and graphs. Applied Cognitive Psychology, 3(3):185–225, 1989. 2
[25] X. Lan, Y. Shi, Y. Zhang, and N. Cao. Smile or scowl? looking at infographic design through the affective lens. IEEE Trans. Vis. Comput. Graph., 27(6):2796–2807, 2021. 2
[26] X. Lan, Y. Shi, Y. Zhang, and N. Cao. Smile or scowl? looking at infographic design through the affective lens. IEEE Trans. Vis. Comput. Graph., 27(6):2796–2807, 2021. 3
[27] Y. Li, H. Liu, Q. Wu, F. Mu, J. Yang, J. Gao, C. Li, and Y. J. Lee. GLIGEN: Open-set grounded text-to-image generation. In Proc. CVPR, 2023. 2
[28] M. Lu, C. Wang, J. Lanir, N. Zhao, H. Pfister, D. Cohen-Or, and H. Huang. Exploring visual information flows in infographics. In Proc. ACM CHI, pp. 1–12, 2020. 2
[29] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013. 4
[30] A. V. Moere and H. Purchase. On the role of design in information visualization. Information Visualization, 10(4):356–371, 2011. 1, 2
[31] A. V. Moere, M. Tomitsch, C. Wimmer, B. Christoph, and T. Grechenig. Evaluating the effect of style in information visualization. IEEE Trans. Vis. Comput. Graph., 18(12):2739–2748, 2012. 1, 2
[32] L. Morais, Y. Jansen, N. Andrade, and P. Dragicevic. Showing data about people: A design space of anthropographics. IEEE Trans. Vis. Comput. Graph., 28(3):1661–1679, 2020. 2
[33] L. Morais, Y. Jansen, N. Andrade, and P. Dragicevic. Showing data about people: A design space of anthropographics. IEEE Trans. Vis. Comput. Graph., 28(3):1661–1679, 2022. 2
[34] C. Qian, S. Sun, W. Cui, J.-G. Lou, H. Zhang, and D. Zhang. Retrieve-then-adapt: Example-based automatic generation for proportion-related infographics. IEEE Trans. Vis. Comput. Graph., 27(2):443–452, 2020. 1, 2
[35] X. Qin, H. Dai, X. Hu, D.-P. Fan, L. Shao, and L. Van Gool. Highly accurate dichotomous image segmentation. In Proc. ECCV, pp. 38–56, 2022. 5
[36] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models from natural language supervision. In Proc. ICML, pp. 8748–8763, 2021. 2
[37] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022. 2
[38] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever. Zero-shot text-to-image generation. In Proc. ICML, pp. 8821–8831, 2021. 2
[39] M. M. Rashid, H. K. Jahan, A. Huzzat, R. A. Rahul, T. B. Zakir, F. Meem, M. S. H. Mukta, and S. Shatabda. Text2Chart: A multi-staged chart generator from natural language text. In Proc. PAKDD, pp. 3–16, 2022. 2
[40] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In Proc. CVPR, pp. 10684–10695, 2022. 2, 4
[41] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. arXiv preprint arXiv:2208.12242, 2022. 2
[42] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022. 2
[43] Y. Shi, P. Liu, S. Chen, M. Sun, and N. Cao. Supporting expressive and faithful pictorial visualization design with visual style transfer. IEEE Trans. Vis. Comput. Graph., 29(1):236–246, 2022. 1, 2, 3
[44] J. Song, C. Meng, and S. Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020. 2, 4
[45] K. Song, X. Tan, T. Qin, J. Lu, and T.-Y. Liu. MPNet: Masked and permuted pre-training for language understanding. In Proc. NIPS, vol. 33, pp. 16857–16867, 2020. 4
[46] E. R. Tufte. The visual display of quantitative information. The Journal for Healthcare Quality, 7(3):15, 1985. 1, 2
[47] Y. Wang, Z. Hou, L. Shen, T. Wu, J. Wang, H. Huang, H. Zhang, and D. Zhang. Towards natural language-based visualization authoring. IEEE Trans. Vis. Comput. Graph., 29(1):1222–1232, 2022. 2
[48] Y. Wang, Z. Sun, H. Zhang, W. Cui, K. Xu, X. Ma, and D. Zhang. DataShot: Automatic generation of fact sheets from tabular data. IEEE Trans. Vis. Comput. Graph., 26(1):895–905, 2020. 1
[49] Y. Wang, H. Zhang, H. Huang, X. Chen, Q. Yin, Z. Hou, D. Zhang, Q. Luo, and H. Qu. InfoNice: Easy creation of information graphics. In Proc. ACM CHI, pp. 1–12, 2018. 2
[50] H. Xia, N. Henry Riche, F. Chevalier, B. De Araujo, and D. Wigdor. DataInk: Direct and creative data-oriented drawing. In Proc. ACM CHI, pp. 1–13, 2018. 2
[51] J. Yu, Y. Xu, J. Y. Koh, T. Luong, G. Baid, Z. Wang, V. Vasudevan, A. Ku, Y. Yang, B. K. Ayan, et al. Scaling autoregressive models for content-rich text-to-image generation. arXiv preprint arXiv:2206.10789, 2022. 2
[52] J. E. Zhang, N. Sultanum, A. Bezerianos, and F. Chevalier. DataQuilt: Extracting visual elements from images to craft pictorial visualizations. In Proc. ACM CHI, pp. 1–13, 2020. 1, 2, 8
[53] L. Zhang and M. Agrawala. Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543, 2023. 2