2024 IEEE Conference on Artificial Intelligence (CAI)
Visualize Music Using Generative Arts
Brian Man-Kit Ng, Samantha Rose Sudhoff, Haichang Li, Joshua Kamphuis, Tim Nadolsky,
Yingjie Chen, Kristen Yeon-Ji Yun, and Yung-Hsiang Lu
Purdue University, West Lafayette, Indiana, USA.
{ng118, ssudhoff, li4560, jpkamphu, tnadolsk, victorchen, yun98, yunglu}@purdue.edu
Abstract—Music is one of the most universal forms of communication and entertainment across cultures. This can largely be credited to the sense of synesthesia, or the combining of senses. Based on this concept of synesthesia, we want to explore whether generative AI can create visual representations for music. The aim is to inspire the user's imagination and enhance the user experience when enjoying music. Our approach has the following steps: (a) Music is analyzed and classified along multiple dimensions (including instruments, emotion, tempo, pitch range, harmony, and dynamics) to produce textual descriptions. (b) The descriptions form the inputs of machine models that predict the genre of the input audio. (c) The resulting prompts are the inputs of generative machine models that create visual representations. The visual representations are continuously updated as the music plays, ensuring that the visual effects aptly mirror the musical changes. A comprehensive user study with 88 users confirms that our approach is able to generate visual art reflecting the music pieces. From a list of images covering both abstract and realistic styles, users considered our system-generated images better representations of the music than human-chosen images. This suggests that generative art can become a promising method to enhance users' listening experience. Our method provides a new approach to visualize music and to enjoy music through generative art.

Index Terms—Visualize Music; Generative Models of Artificial Intelligence

I. INTRODUCTION

Fig. 1: The proposed method has three steps: Music Analysis, Prompt Generation, and Image Generation. The images change as the music is played. Non-AI Image Source: [1]

Music is a multifaceted form of expression and can be felt through multiple human senses. Listening to music also engages the visual sense: the color spectrum is related to music [2]. It is a common notion that music can express imagery, either through composition techniques or through the addition of lyrics that tell a story. Classical composition has even been linked to the visualization of figurative art [3]. Pairing music with the right visuals (and vice versa) can often lead to a more holistic entertainment experience [4], as is done in many forms of media (live performances, karaoke, music TV, cinema, live orchestras, etc.).

Using generative models to produce art from music has several advantages. First, the process can be customized to users' preferences: users may add or remove words interactively to produce different visual effects that better match the mood of the performer and the theme of the music. Second, generative art can be produced quickly and inexpensively. As a result, this can give musicians a more flexible way to design the performing stage and give audiences a richer experience while enjoying the show.

The original contributions of this study are the following: (a) the creation of a software system that autonomously generates representative images from music audio using generative artificial intelligence (GAI) methods, and (b) a comprehensive user evaluation of the generated images compared with human-chosen images. We convert music to visuals in three steps, as illustrated in Figure 1: (a) analyze the music based on multiple factors (such as instruments, tempo, pitch, and dynamics); (b) create textual descriptions using Spotify's Basic Pitch transcription model [5] and audEERING's openSMILE feature extractor [6]; (c) generate art based on the textual descriptions using pre-trained diffusion models [7]. The visual representations can be updated in real time while the music is played. Music often goes through multiple phases with different characteristics. For example, a symphony usually has four movements, and each movement can have sections with different rhythmic and melodic patterns that express various emotions and scenes. The generated images should reflect these dynamic changes in the music.

We used human subjects to evaluate the effectiveness of our system and examine two aspects: (1) Do these generated images reflect the music? (2) Do users prefer the images generated by our system? We generated both abstract and realistic images and compared these generated images with manually selected images. We used an online survey to examine whether the users prefer the system-generated images or the manually selected images. The survey was open for one month and 88 people participated. Among their selections, 58% favored images generated by our system. This is significantly higher than the 35% of selections that went to images not generated by the system; the remaining 7% of selections chose no image. The notable difference (23%), along with a p-value of less than 0.01 determined by a chi-squared test, indicates that generative art offers a promising way to improve users' enjoyment while listening to music. The survey is available at https://ai4musicians.org/visualize.html.
II. RELATED WORK

A. Generative Artificial Intelligence

Diffusion models have driven recent developments in computer vision [8]; image generation is one of their most common applications. Stable Diffusion [9] has been widely used for AI-generated images. The model primarily uses text prompts as inputs, and these prompts allow images to be adjusted after the fact [10].

The visual notion of music has been investigated in several studies. Bragança et al. [11] evaluate the cross-modal association of sensations and their relationship to musical perception, with a focus on synesthesia. Actis-Grosso et al. [3] explore similarities between music and visual arts. Modem Works [12] uses Stable Diffusion and Teenage Engineering's OP-Z track sequencer and synthesizer to translate music into imagery. Cowles [13] experiments with pairing audio and visual stimuli; correlations were found between subjects choosing certain selected images and music. Gayen et al. [14] find common trends in painted depictions of music with contrasting emotional tones. Wehner [15] uses paintings and music by Paul Klee to evaluate people's ability to correlate paintings with music. Inspired by such prior works showing the close relationship between visual art and music, this paper further uses generative machine models to produce visual representations based on input music.

B. Visualizing Music

Identifying music with a generative model can be done in several ways, depending on how the music data is interpreted. The common forms of music data are MIDI (Musical Instrument Digital Interface) files and signal-processing representations such as Mel spectrograms [16]. The former represents music as a digitized pattern of notes, and the latter represents music as a non-linear transformation of the frequency scale of an audio file. MusicBERT [17] uses MIDI to develop a "symbolic music representation" for analyzing music through patterns of notes. Riffusion [18] (a fine-tuned Stable Diffusion model) uses Mel spectrograms to analyze music as images, training a convolutional neural network (CNN) to match existing spectrograms. Such tools and their models can be effectively trained to classify digitized audio inputs into music genres; however, an issue arises when expanding these classifications into descriptive image generation. The use of prompts as descriptive tags that apply equally to auditory and visual experiences reintroduces the concept of synesthesia [11]. The subjective nature of synesthetic perception acts as an abstract association in achieving seamless audio-to-image generation.

C. Comparisons

Several methods have used AI models to generate images from music. Modem's OP-Z/Stable Diffusion [12] uses prompt engineering to produce imagery solely from MIDI inputs. Using MIDI captures basic musical elements but misses broader details such as genre, instrumentation, or contextual clues from chord progressions. As such, the results are mostly abstract images that lack a contextual connection with the music. Liu et al. [19] create "Generative Disco", which uses human-chosen prompts to generate images. This method takes a text-to-image approach rather than music-to-image and focuses on user inputs and lyrics as the medium for determining prompts. It is labor intensive and hard to use for creating images in real time. Betin [20] stylizes existing images based on an audio input in real time. The method serves primarily as an abstract image adjustment: it builds on an existing image's structure and changes the color styling based on the physical elements of a Mel spectrogram. Hence, the result is not full image generation but rather image alteration. Table I compares the proposed method with existing methods. Our goal is to create imagery that is more connected to the music, improving the user experience.

TABLE I: Comparison of Methods.
Method        Approach                      Features
Modem [12]    Prompt Generation             MIDI-Generated Images
Liu [19]      Prompt Utilization (lyrics)   Specialized Text-to-Image
Betin [20]    Signal Processing             Image Alteration
This paper    Prompt Generation             Real-Time Music-to-Image

III. VISUALIZE MUSIC BY GENERATIVE ARTIFICIAL INTELLIGENCE

Our approach entails interpreting musical elements and incorporating additional features, such as chord analysis, informed by the styles of existing music. To generate images from music, text prompts serve as an intermediary bridging the two mediums (sound and visual). The overall flow is shown in Figure 2 and is discussed in the following subsections.

Fig. 2: The process of generating images from music. It starts with music analysis. A neural network predicts music genre, tempo, and emotional values. A prompt is generated from the prediction and passed into Stable Diffusion for image generation.
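For readers who want a concrete picture of this flow, the following minimal Python sketch strings the three stages together. The helper callables (analyze, build_prompt, generate_image, show) are placeholders for the components described in Sections III-A through III-D; this is an illustration of the loop in Figure 2, not the released implementation.

```python
# Minimal sketch of the three-stage loop in Figure 2. The callables passed in
# stand for the components of Sections III-A through III-D; their names are
# illustrative, not an actual API from this work.
from typing import Any, Callable, Iterable

def visualize_stream(clips: Iterable[Any],
                     analyze: Callable[[Any], dict],
                     build_prompt: Callable[[dict], str],
                     generate_image: Callable[[str], Any],
                     show: Callable[[Any], None]) -> None:
    """Turn consecutive audio clips into images while the music plays."""
    for clip in clips:                       # e.g., consecutive 10-second windows
        features = analyze(clip)             # Section III-A/B: audio + MIDI features
        prompt = build_prompt(features)      # Section III-C: genre/emotion prompt words
        show(generate_image(prompt))         # Section III-D: diffusion-model output
```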
A. Music Analysis

We start by analyzing several metrics from the music's audio recording and MIDI file. Using spectrogram analysis, we calculate temporal and physical statistics about the audio, such as root-mean-square (RMS) amplitude, spectral width, and spectral centroid, as well as musical data such as pitch, overall chord patterns, and tempo. We use Spotify's Basic Pitch [5] to extract MIDI features (chords and pitch) and openSMILE [6] to extract audio features.
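As an illustration of this step (not the exact configuration used in this work), the audio and MIDI features can be gathered with the openSMILE and Basic Pitch Python packages together with librosa. The file path, the chosen feature set, and the summary statistics below are assumptions.

```python
# Sketch of the feature-extraction step (Section III-A), assuming the openSMILE
# and Basic Pitch Python packages plus librosa. Feature choices are illustrative.
import librosa
import numpy as np
import opensmile
from basic_pitch.inference import predict

def extract_features(audio_path: str) -> dict:
    # Low-level audio statistics: RMS amplitude, spectral centroid/width, tempo.
    y, sr = librosa.load(audio_path)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    stats = {
        "rms": float(np.mean(librosa.feature.rms(y=y))),
        "spectral_centroid": float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))),
        "spectral_width": float(np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr))),
        "tempo": float(tempo),
    }

    # openSMILE functionals: one row of summary features for the whole clip.
    smile = opensmile.Smile(feature_set=opensmile.FeatureSet.eGeMAPSv02,
                            feature_level=opensmile.FeatureLevel.Functionals)
    stats["opensmile"] = smile.process_file(audio_path).iloc[0].to_dict()

    # Basic Pitch transcription: the returned MIDI gives pitches for chord analysis.
    _, midi_data, _ = predict(audio_path)
    stats["pitches"] = [note.pitch for inst in midi_data.instruments for note in inst.notes]
    return stats
```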
B. Emotion/Genre Analysis

We then feed these calculated metrics into fully connected, feed-forward neural networks that estimate the genre of the music piece and its valence-arousal emotion values. Emotions are measured in terms of valence (how positive or negative an emotion feels) and arousal (how intensely the emotion is felt) via the Valence-Arousal Model [21]. These values can be visualized as positive and negative coordinates on a two-dimensional graph.
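A minimal PyTorch sketch of such a feed-forward estimator is shown below. The layer sizes, the shared trunk, and the five-genre output are illustrative assumptions; the paper does not specify the exact architecture.

```python
# Illustrative feed-forward estimator for genre and valence-arousal values
# (Section III-B). Layer sizes and the number of genres are assumptions.
import torch
import torch.nn as nn

class GenreEmotionNet(nn.Module):
    def __init__(self, num_features: int, num_genres: int = 5):
        super().__init__()
        self.trunk = nn.Sequential(                  # shared fully connected layers
            nn.Linear(num_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        self.genre_head = nn.Linear(64, num_genres)  # genre logits
        self.va_head = nn.Linear(64, 2)              # (valence, arousal)

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.genre_head(h), torch.tanh(self.va_head(h))  # VA in [-1, 1]

# Example: one 100-dimensional feature vector -> genre logits and (valence, arousal).
model = GenreEmotionNet(num_features=100)
genre_logits, va = model(torch.randn(1, 100))
```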
Fig. 3: Examples of generated images from the system. Beethoven Symphony No. 5 is depicted with imagery of a thunderstorm or a bird on fire, while the images for the more mellow Mozart Violin Sonata No. 21 indicate the violin instrumentation and an overall brighter color palette.

C. Prompt Generation

Based on these estimates, we use k-nearest neighbors to assign a set of prompt words to the music (such as the genre, emotional words, and colors), with k = 1 because the prompt features are relatively distinct. We would like these initial prompts to relate to the lighting and colors in the generated artwork. For example, when an emotion like "anger" is detected (an emotion with negative valence and high arousal), the generated image should use saturated colors such as vibrant reds, dark purples, and black. The subject of the artwork is also based on the genre of the input music. In the case of Figure 3, the first passages of Beethoven Symphony No. 5 are classified with the emotional prompts "angry", "aggressive", and "violent". This results in images with a theme of red or black hues. Additional analysis of the MIDI chords and Mel spectrograms identifies the piece as a classical work, which contributes to the painted texture of the images. Further adjustment of the prompts through "prompt modifiers" [10] can help generate specific details and variations in the images. We produce images using different prompts for each genre, including solo performances, chamber music, symphony orchestras (including concertos), choirs (accompanied by piano or orchestra), and operas/ballets.
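The 1-nearest-neighbor lookup can be sketched as follows with scikit-learn. The reference valence-arousal coordinates and the prompt-word lists are invented for illustration and are not the actual table used by the system.

```python
# Illustrative 1-NN mapping from estimated (valence, arousal) values to prompt
# words (Section III-C). Reference points and word lists are made up examples.
from sklearn.neighbors import KNeighborsClassifier

reference_va = [[-0.6,  0.8],   # anger:   negative valence, high arousal
                [ 0.7,  0.7],   # joy
                [-0.6, -0.5],   # sadness
                [ 0.5, -0.4]]   # calm
labels = ["anger", "joy", "sadness", "calm"]

prompt_words = {
    "anger":   ["vibrant red", "dark purple", "black", "stormy"],
    "joy":     ["bright", "golden", "warm light"],
    "sadness": ["muted blue", "grey", "soft shadows"],
    "calm":    ["pastel", "gentle light", "open sky"],
}

knn = KNeighborsClassifier(n_neighbors=1).fit(reference_va, labels)

def emotion_prompt(valence: float, arousal: float) -> list:
    emotion = knn.predict([[valence, arousal]])[0]
    return prompt_words[emotion]

print(emotion_prompt(-0.5, 0.9))  # -> ['vibrant red', 'dark purple', 'black', 'stormy']
```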
D. Image Generation

Finally, once these prompts are generated, we introduce some random image-related words into the prompt (such as camera angle, movement, and framing) to add variation to the resulting images. Large language models (LLMs) can interpret valence-arousal emotion values and provide feedback on the represented emotions, so the initially obtained valence-arousal values are also passed to the LLM. Once the fundamental elements of the prompt are assembled, the GPT-4 LLM [22] assists with prompt engineering for more detailed image generation. Throughout this process, the LLM is instructed to consistently maintain the alignment of the emotions conveyed by the pictures and the music. After we have the final prompt, we feed it to a diffusion-type image-generating model to obtain our set of images.
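For illustration, the final step can be sketched with the Hugging Face diffusers library and a public Stable Diffusion checkpoint. The model ID, prompt text, and sampler settings below are assumptions, and the GPT-4 prompt-refinement step is omitted here.

```python
# Illustrative final step (Section III-D): feed the assembled prompt to a
# diffusion model via the diffusers library. Model ID and settings are assumed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("oil painting of a thunderstorm, vibrant red and black hues, "
          "dramatic lighting, symphony orchestra mood")
images = pipe(prompt, num_images_per_prompt=6, num_inference_steps=30).images
for i, img in enumerate(images):
    img.save(f"generated_{i}.png")   # six candidate images per music clip
```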
IV. HUMAN-SUBJECT EVALUATION AND STUDY RESULTS

To evaluate the efficacy of our method, we conduct an online human-subject study to answer two questions: "Can generative visual arts reflect the rich expressions of music?" and "Do audiences like the generated visuals?". In the study, we evaluated the visual arts generated from different pieces of music. After hearing a piece of music, a user selects the image that best reflects the music. The options include three types of images: (1) generated by our system, (2) chosen by humans (members of this research team), and (3) generated from other pieces of music. If the majority of users prefer our system-generated images, our system can effectively produce visual representations reflecting the music.

A. User Profiles

We sent emails to students and faculty at Purdue and collected 88 responses. Among them, 62.5% are male and 31.8% are female. Most subjects (84.1%) are within the age range of 18-24. Many of our participants are either student musicians (35.2%) or play an instrument for leisure (33.0%).

B. Music

This study uses 15 pieces of classical music, each 10 seconds long. The pieces are chosen from five major classical music genres: choir, opera and ballet, chamber music, solo performance, and larger ensembles (orchestra or band), with three pieces per genre. The pieces are well known and representative of their categories, e.g., Beethoven's 9th Symphony (choir) and Bach's Cello Suite No. 1 Prelude (solo). When selecting the pieces, we considered a diverse set of musical features so that our system can generalize broadly.

C. Visual Representations of Music

For each music piece, our system generates six images (per trial). For comparison, musicians on our team select six images manually from three online image repositories: Pexels, Pixabay, and Unsplash. These images also reflect the music pieces, based on the musicians' judgement. The manually selected images are used for comparison against the system-generated images. If users prefer the system-generated images to the human-chosen images, it suggests that our system can generate images that are closer to the music than the manually selected ones. This in turn suggests the viability of generated images for accurately representing music by human standards. Also, to ensure that users can select images that truly represent the specific piece of music, we include a system-generated image from a different piece of music (a distraction). This image does not reflect the current music. The distraction aims to confirm that users can distinguish whether an image represents the music. In total, thirteen images are available for each piece of music.

This study considers images of different styles to avoid possible preference bias due to style. We classify the images into abstract and realistic. Realistic art depicts the subject matter with a high degree of fidelity to its real-world appearance; abstract art uses colors, shapes, lines, and forms to convey emotions, ideas, or concepts. A user may have a strong preference for one style. To ensure that we compare similar styles of images, we categorize each image as either realistic or abstract. Figure 3 shows several examples. The survey includes 82 photos or realistic images and 113 abstract images, 195 images in total.

Fig. 4: A sample question. The user was asked to choose the image that best fits the music. Non-AI Image sources: [23], [24].

D. Questionnaire

We designed 15 questions. During the survey, a user receives 10 random questions plus one additional question that measures the user's preference regarding subjectivity (11 questions in total). Figure 4 is an example of a question. Each question includes a 10-second music clip. The user clicks a button to play the music. The system selects four images that may be generated by our system (trial, also called system-generated) or human-chosen. Additionally, one distraction image is included to detect style bias. The user may also select "None of the images".

E. Result and Analysis

TABLE II: Proportion of Images Chosen and Expected Values.
Subjectivity Level:          Realistic   Abstract
System Expected %            40.2%       50.4%
System User Chosen %         53.0%       69.0%
Non-System Expected %        54.9%       39.8%
Non-System User Chosen %     47.0%       29.6%
Distraction Expected %       4.9%        9.7%
Distraction User Chosen %    0.0%        1.4%
P-Value                      < 0.01      < 0.01

Figure 5 shows users' preferences between system-generated and human-chosen images as representations of the given music clips, as well as their subjectivity-level preferences. If users had selected images randomly, the numbers of system-generated and non-system-generated images chosen would have followed the proportions of the 195 total images included in the survey. However, the percentage of system-generated images chosen by users is much higher than the actual percentage of such images in the survey. Figures 5 (a) and (b) show the percentages of selections and options for abstract images. The generated images make up 50.4% of all image options but account for 69.0% of the users' selections. In contrast, the other 49.6% of images account for only 31.0% of the selections. Similarly, for realistic images, users prefer system-generated images (45.8% of the options, 52.3% of the selections). Chi-square analysis (Table II) shows a statistically significant preference for the trial images in both the realistic and abstract categories. The p-values for both realistic and abstract images are less than 0.01. Consequently, this suggests that users perceive the images generated by our system as better representations of the music than the human-chosen images.

For triangulation, we also examined whether users are able to identify images that do not reflect the music. Each question contains one distraction image among the five images shown. If users chose an image randomly, we would expect the proportion of distraction images selected to be slightly lower than 20% (due to the "None of the Above" option available to users). However, the total percentage of distraction images chosen during the survey was less than 1%, signifying that users can tell which images do not reflect the music.

Overall, the total percentage of system images chosen in the survey is 58%, the percentage of human-chosen images is 35%, and the remaining percentage consists of "None of the Above" choices. The total number of selections by users is 7 + 150 + 349 + 183 + 206 + 61 (None of the Above) = 956. Users select generated images 349 + 206 = 555 times; the ratio is 555/956 = 58%. Users select non-system images 150 + 183 = 333 times; the ratio is 333/956 = 35%. The p-value across both subjectivity levels is less than 0.01. This signifies that our system creates effective visual representations of music that users prefer. Additionally, the distraction-image test shows that users can tell which images do not correspond to the music clips. This suggests that the system-generated images are preferred over the human-selected images not because of their type, but because of their meaningful representation of the music.
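The overall preference analysis above can be reproduced in a few lines with SciPy. The sketch below uses the selection counts reported in the text; the expected shares under random choice are stated only as an assumed example (the per-category expected proportions appear in Table II).

```python
# Sketch of the chi-squared goodness-of-fit check described in Section IV-E,
# using the overall selection counts from the text. The expected shares under
# random choice are assumed for illustration; this is not the authors' script.
import numpy as np
from scipy.stats import chisquare

observed = np.array([555, 333, 68])       # system, non-system, distraction + "none"
total = observed.sum()                    # 956 selections
print(observed / total)                   # ~[0.58, 0.35, 0.07]

expected_share = np.array([0.46, 0.46, 0.08])   # hypothetical shares of the options
stat, p = chisquare(f_obs=observed, f_exp=expected_share * total)
print(f"chi2 = {stat:.1f}, p = {p:.2g}")  # a small p rejects random selection
```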
Fig. 5: The survey results: (a) abstract, selected; (b) abstract, options; (c) realistic, selected; (d) realistic, options. (a) Users select system-generated images 349 times (68.97%) and images not generated by our system 150 times (29.64%). (b) Only 50.4% of the images are system-generated. (c) Users select system-generated images 206 times (52.2%). (d) Only 45.8% of the images are system-generated. Users selected "None of the images" 61 times, which is not represented in the pie charts.

Fig. 6: Percentages of system-generated images chosen by users for different composers. The figure shows 7 of the 15 composers in our survey.

We further examined all 15 music pieces used in this survey. Each piece receives a different level of preference for system-generated images, as shown in Figure 6. The piece with the highest proportion of system-generated images (best system performance) is Albeniz's Asturias, where 50/65 = 76.9% of the images selected by users are system-generated. The piece with the lowest proportion (worst system performance) is Tchaikovsky's Piano Concerto No. 1, with 22/62 = 35.5% of the images chosen by users. The large difference between the largest and smallest percentages chosen across pieces suggests that our system may not be able to visualize different types of music equally well.

V. DISCUSSION

A. Limitations

The p-values for both the abstract and realistic subjectivity levels are less than 0.01. We conclude that there is a statistically meaningful preference for system-generated images over human-chosen images. However, there are several limitations, both in the selected user base for our survey and in the organization of our survey questions. Also, our system's performance seems to vary across different music. Whether there is a systematic difference (i.e., it always performs worse on certain types of music) or just random error still needs further investigation.

The majority of our users fall into the age range of 18-25 (84.1%) because of the place (a university) where this study was conducted. Additionally, the majority of our users are either White or Asian (91.0%), and the majority (69.3%) have played musical instruments. Our future work may analyze the relationships between user demographics and musical experience, along with defining a concrete qualitative evaluation of results with a more diverse study group. This study considers only classical music. A future study should consider other types of music, such as jazz, rock, and pop.

Fig. 7: Our system in a live cello performance.

B. Applications

There is a great opportunity for image generation in entertainment and in enhancing the user experience when listening to music. Real-time implementations can decorate a space used for social events (e.g., karaoke, clubs, parties) as a more immersive substitute for music videos, ambient lighting, or still images. Musicians can efficiently provide a visual experience for a performance that surpasses what they could produce on their own. The generated images can also give users with hearing impairments a visual outlet to enjoy music. Other works have shown these possibilities, such as Liu's "Generative Disco" [19] and Betin's "Visualizing Sound with AI" [20]. Our method can provide human-interpreted image quality in these applications.
Recently, we used our system in a live performance (Fig. 7, https://www.youtube.com/watch?v=LF172wWu2jU). The system ran smoothly and saved the performer a lot of effort in choosing images for the background visual effects of the music. The performer and the audience felt that the generated images in the background largely reflected the nature and characteristics of the music.

VI. CONCLUSION

This paper presents a study using generative artificial intelligence to visualize music. Our system analyzes multiple elements of the music, such as instruments, tempo, emotion, and pitch, and generates text prompts. The prompts are then input to diffusion models to produce images. A user study indicates that this approach can effectively reflect the rich expression of music.

ACKNOWLEDGMENTS

We appreciate the support from the sponsors and the people who participated in the survey. This work is supported in part by NSF IIS-2326198 and by the CREATE program of Purdue. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.

REFERENCES

[1] Cellist man clipart, music vintage. https://openverse.org/image/7962407e-1be8-4123-a3d7-7b1449f65c3b.
[2] Charles Spence and Nicola Di Stefano. Coloured hearing, colour music, colour organs, and the search for perceptually meaningful correspondences between colour and sound. i-Perception, 13(3):20416695221092802, 2022. PMID: 35572076.
[3] Rossana Actis-Grosso, Carlotta Lega, Alessandro Zani, Olga Daneyko, Zaira Cattaneo, and Daniele Zavagno. Can music be figurative? Exploring the possibility of crossmodal similarities between music and visual arts. Psihologija, 50:285–306, 2017.
[4] Mats B. Küssner and Tuomas Eerola. The content and functions of vivid and soothing visual imagery during music listening: Findings from a survey study. Psychomusicology: Music, Mind, and Brain, 29:90, 2019.
[5] Rachel M. Bittner, Juan José Bosch, David Rubinstein, Gabriel Meseguer-Brocal, and Sebastian Ewert. A lightweight instrument-agnostic model for polyphonic note transcription and multipitch estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2022.
[6] Florian Eyben, Martin Wöllmer, and Björn Schuller. openSMILE: The Munich versatile and fast open-source audio feature extractor. In ACM International Conference on Multimedia, pages 1459–1462, 2010.
[7] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models, March 2022. arXiv:2112.10741 [cs].
[8] Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10850–10869, September 2023.
[9] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022.
[10] Jonas Oppenlaender. A taxonomy of prompt modifiers for text-to-image generation. Behaviour & Information Technology, pages 1–14, November 2023.
[11] Guilherme Francisco F. Bragança, João Gabriel Marques Fonseca, and Paulo Caramelli. Synesthesia and music perception. Dementia & Neuropsychologia, 9:16–23, 2015.
[12] Modem. OP-Z Stable Diffusion. https://modemworks.com/projects/op-z-stable-diffusion/, January 2023.
[13] John T. Cowles. An experimental study of the pairing of certain auditory and visual stimuli. Journal of Experimental Psychology, 18(4):461–469, 1935.
[14] Pinaki Gayen, Junmoni Borgohain, and Priyadarshi Patnaik. The Influence of Music on Image Making: An Exploration of Intermediality Between Music Interpretation and Figurative Representation, pages 285–293. June 2021.
[15] Walter L. Wehner. The relation between six paintings by Paul Klee and selected musical compositions. Journal of Research in Music Education, 14(3):220–224, 1966.
[16] Hugo B. Lima, Carlos G. R. Dos Santos, and Bianchi S. Meiguins. A survey of music visualization techniques. ACM Computing Surveys, 54(7):143:1–143:29, July 2021.
[17] Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, and Tie-Yan Liu. MusicBERT: Symbolic music understanding with large-scale pre-training, June 2021. arXiv:2106.05630 [cs].
[18] Seth Forsgren and Hayk Martiros. Riffusion - Stable Diffusion for real-time music generation. https://github.com/riffusion/riffusion, 2022.
[19] Vivian Liu, Tao Long, Nathan Raw, and Lydia Chilton. Generative Disco: Text-to-video generation for music visualization, 2023. arXiv:2304.08551 [cs].
[20] Vasily Betin. Visualizing sound with AI. Medium, May 2020.
[21] Saikat Basu, Nabakumar Jana, Arnab Bag, Mahadevappa M, Jayanta Mukherjee, Somesh Kumar, and Rajlakshmi Guha. Emotion recognition based on physiological signals using valence-arousal model. In International Conference on Image Information Processing, pages 50–55, 2015.
[22] Josh Achiam et al. GPT-4 technical report. Technical report, OpenAI, 2023. arXiv:2303.08774 [cs].
[23] Pixabay. Light sun cloud japan. https://www.pexels.com/photo/light-sun-cloud-japan-45848/, February 2016.
[24] Prawny. Abstract painting country golden. https://pixabay.com/illustrations/abstract-painting-country-golden-5985987/, February 2021.