2024 IEEE Conference on Artificial Intelligence (CAI)
Visualize Music Using Generative Arts
Brian Man-Kit Ng, Samantha Rose Sudhoff, Haichang Li, Joshua Kamphuis, Tim Nadolsky,
Yingjie Chen, Kristen Yeon-Ji Yun, and Yung-Hsiang Lu
Purdue University, West Lafayette, Indiana, USA.
{ng118, ssudhoff, li4560, jpkamphu, tnadolsk, victorchen, yun98, yunglu}@purdue.edu
Abstract—Music is one of the most universal forms of communication and entertainment across cultures. This can largely be credited to the sense of synesthesia, or the combining of senses. Based on this concept of synesthesia, we want to explore whether generative AI can create visual representations for music. The aim is to inspire the user's imagination and enhance the user experience when enjoying music. Our approach has the following steps: (a) Music is analyzed and classified along multiple dimensions (including instruments, emotion, tempo, pitch range, harmony, and dynamics) to produce textual descriptions. (b) The descriptions form the inputs of machine models that predict the genre of the input audio. (c) The resulting prompts are the inputs of generative machine models that create visual representations. The visual representations are continuously updated as the music plays, ensuring that the visual effects aptly mirror the musical changes. A comprehensive user study with 88 users confirms that our approach is able to generate visual art reflecting the music pieces. From a list of images covering both abstract and realistic styles, users considered our system-generated images better representations of the music than human-chosen images. This suggests that generative art can become a promising method to enhance users' listening experience. Our method provides a new approach to visualize music and to enjoy music through generative art.

Index Terms—Visualize Music; Generative Models of Artificial Intelligence

I. INTRODUCTION

Fig. 1: The proposed method has three steps: Music Analysis, Prompt Generation, and Image Generation. The images change as the music is played. Non-AI Image Source: [1]

Music is a multifaceted form of expression and can be felt through multiple human senses. Listening to music also engages the visual sense: the color spectrum is related to music [2]. It is a common notion that music can express imagery, either through composition techniques or through the addition of lyrics that tell a story. Classical composition has even been linked to the visualization of figurative art [3]. Pairing music with the right visuals (and vice versa) can often lead to a more holistic entertainment experience [4], as is done in many forms of media (live performances, karaoke, music TV, cinema, live orchestras, etc.).

Using generative models to produce art from music has several advantages. First, the process can be customized to users' preferences: users may add or remove words interactively to produce different visual effects that better match the mood of the performer and the theme of the music. Second, generative art can be produced quickly and inexpensively. As a result, this can give musicians a more flexible way to design the performing stage and give audiences a richer experience while enjoying the show.

The original contributions of this study are the following: (a) the creation of a software system that autonomously generates representative images from music audio using generative artificial intelligence (GAI) methods, and (b) a comprehensive user evaluation of the generated images compared with human-chosen images. We convert music to visuals in three steps, as illustrated in Figure 1: (a) analyze the music based on multiple factors (such as instruments, tempo, pitch, and dynamics); (b) create textual descriptions using Spotify's Basic Pitch transcription model [5] and audEERING's openSMILE feature extractor [6]; (c) generate art based on the textual descriptions using pre-trained diffusion models [7]. The visual representations can be updated in real time while the music is played. Music often goes through multiple phases with different characteristics. For example, a symphony usually has four movements, and each movement can have sections with different rhythmic and melodic patterns that express various emotions and scenes. The generated images should reflect these dynamic changes in the music.

We used human subjects to evaluate the effectiveness of our system and examine two aspects: (1) Do these generated images reflect the music? (2) Do users prefer the images generated by our system? We generated both abstract and realistic images and compared these generated images with manually selected images. We used an online survey to examine whether the users prefer the system-generated images or the manually selected images. The survey was open for one month and 88 people participated. Among their selections, 58% favored images generated by our system. This is significantly higher than the 35% of selections that went to images not generated by the system; the remaining 7% of selections chose no image. The notable difference (23%), along with a p-value of less than 0.01 determined by a chi-squared test, indicates that generative art offers a promising way to improve users' enjoyment while listening to music. The survey is available at https://ai4musicians.org/visualize.html.
II. RELATED WORK

A. Generative Artificial Intelligence

Diffusion models have driven recent developments in computer vision [8]; image generation is one of their most common applications. Stable Diffusion [9] has been widely used for AI-generated images. The model primarily uses text prompts as inputs, and these prompts allow images to be adjusted after the fact [10].

The visual notion of music has been investigated in several studies. Bragança et al. [11] evaluate the cross-modal association of sensations and their relationship to musical perception, with a focus on synesthesia. Actis-Grosso et al. [3] explore similarities between music and visual arts. Modem Works [12] uses Stable Diffusion and Teenage Engineering's OP-Z track sequencer and synthesizer to translate music into imagery. Cowles [13] experiments with pairing audio and visual stimuli; correlations were found between subjects choosing certain selected images and music. Gayen et al. [14] find common trends in painted depictions of music with contrasting emotional tones. Wehner [15] uses paintings and music by Paul Klee to evaluate people's ability to correlate paintings with music. Inspired by such prior works showing the close relationship between visual art and music, this paper further uses generative machine models to produce visual representations based on input music.

B. Visualizing Music

Identifying music with a generative model can be done in several ways, depending on how the music data is interpreted. The common forms of music data are MIDI (Musical Instrument Digital Interface) files and signal-processing representations such as Mel spectrograms [16]. The former represents music as a digitized pattern of notes, and the latter represents music as a non-linear transformation of the frequency scale of an audio file. MusicBERT [17] uses MIDI to develop a "symbolic music representation" for analyzing music through patterns of notes. Riffusion [18] (a fine-tuned Stable Diffusion model) uses Mel spectrograms to analyze music as images, training a convolutional neural network (CNN) to match existing spectrograms. Such tools and their models can be effectively trained to classify digitized audio inputs into music genres; however, an issue arises when expanding these classifications into descriptive image generation. The use of prompts as descriptive tags that apply equally to auditory and visual experiences reintroduces the concept of synesthesia [11]. The subjective nature of synesthetic perception acts as an abstract association in achieving seamless audio-to-image generation.

C. Comparisons

Several methods have used AI models to generate images from music. Modem's OP-Z/Stable Diffusion [12] uses prompt engineering to produce imagery solely from MIDI inputs. Using MIDI captures basic musical elements but misses broader details such as genre, instrumentation, or contextual clues from chord progressions. As such, the results are mostly abstract images that lack a contextual connection with the music. Liu et al. [19] create "Generative Disco", which uses human-chosen prompts to generate images. This method takes a text-to-image approach rather than music-to-image and focuses on user inputs and lyrics as the medium for determining prompts. It is labor intensive and hard to use for creating images in real time. Betin [20] stylizes existing images based on an audio input in real time. The method serves primarily as an abstract image adjustment: it builds on an existing image's structure and changes the color styling based on the physical elements of a Mel spectrogram. Hence, the result is not full image generation but rather image alteration. Table I compares the proposed method with existing methods. Our goal is to create imagery that is more connected to the music, improving the user experience.

TABLE I: Comparison of Methods.
Method        Approach                      Features
Modem [12]    Prompt Generation             MIDI-Generated Images
Liu [19]      Prompt Utilization (lyrics)   Specialized Text-to-Image
Betin [20]    Signal Processing             Image Alteration
This paper    Prompt Generation             Real-Time Music-to-Image

III. VISUALIZE MUSIC BY GENERATIVE ARTIFICIAL INTELLIGENCE

Our approach entails interpreting musical elements and incorporating additional features, such as chord analysis, informed by the styles of existing music. To generate images from music, text prompts serve as an intermediary bridging the two mediums (sound and visual). The overall flow is shown in Figure 2 and is discussed in the following subsections.

Fig. 2: The process of generating images from music. It starts with music analysis. A neural network predicts music genre, tempo, and emotional values. A prompt is generated from the prediction and passed into Stable Diffusion for image generation.
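For readers who want a concrete picture of this flow, the following minimal Python sketch strings the three stages together. The helper callables (analyze, build_prompt, generate_image, show) are placeholders for the components described in Sections III-A through III-D; this is an illustration of the loop in Figure 2, not the released implementation.

```python
# Minimal sketch of the three-stage loop in Figure 2. The callables passed in
# stand for the components of Sections III-A through III-D; their names are
# illustrative, not an actual API from this work.
from typing import Any, Callable, Iterable

def visualize_stream(clips: Iterable[Any],
                     analyze: Callable[[Any], dict],
                     build_prompt: Callable[[dict], str],
                     generate_image: Callable[[str], Any],
                     show: Callable[[Any], None]) -> None:
    """Turn consecutive audio clips into images while the music plays."""
    for clip in clips:                       # e.g., consecutive 10-second windows
        features = analyze(clip)             # Section III-A/B: audio + MIDI features
        prompt = build_prompt(features)      # Section III-C: genre/emotion prompt words
        show(generate_image(prompt))         # Section III-D: diffusion-model output
```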
A. Music Analysis

We start by analyzing several metrics from the music's audio recording and MIDI file. Using spectrogram analysis, we calculate temporal and physical statistics about the audio, such as root-mean-square (RMS) amplitude, spectral width, and spectral centroid, as well as musical data such as pitch, overall chord patterns, and tempo. We use Spotify's Basic Pitch [5] to extract MIDI features (chords and pitch) and openSMILE [6] to extract audio features.
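As an illustration of this step (not the exact configuration used in this work), the audio and MIDI features can be gathered with the openSMILE and Basic Pitch Python packages together with librosa. The file path, the chosen feature set, and the summary statistics below are assumptions.

```python
# Sketch of the feature-extraction step (Section III-A), assuming the openSMILE
# and Basic Pitch Python packages plus librosa. Feature choices are illustrative.
import librosa
import numpy as np
import opensmile
from basic_pitch.inference import predict

def extract_features(audio_path: str) -> dict:
    # Low-level audio statistics: RMS amplitude, spectral centroid/width, tempo.
    y, sr = librosa.load(audio_path)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    stats = {
        "rms": float(np.mean(librosa.feature.rms(y=y))),
        "spectral_centroid": float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))),
        "spectral_width": float(np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr))),
        "tempo": float(tempo),
    }

    # openSMILE functionals: one row of summary features for the whole clip.
    smile = opensmile.Smile(feature_set=opensmile.FeatureSet.eGeMAPSv02,
                            feature_level=opensmile.FeatureLevel.Functionals)
    stats["opensmile"] = smile.process_file(audio_path).iloc[0].to_dict()

    # Basic Pitch transcription: the returned MIDI gives pitches for chord analysis.
    _, midi_data, _ = predict(audio_path)
    stats["pitches"] = [note.pitch for inst in midi_data.instruments for note in inst.notes]
    return stats
```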
B. Emotion/Genre Analysis

We then feed these calculated metrics into fully connected, feed-forward neural networks that estimate the genre of the music piece and its valence-arousal emotion values. Emotions are measured in terms of valence (how positive or negative an emotion feels) and arousal (how intensely the emotion is felt) via the Valence-Arousal Model [21]. These values can be visualized as positive and negative coordinates on a two-dimensional graph.
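A minimal PyTorch sketch of such a feed-forward estimator is shown below. The layer sizes, the shared trunk, and the five-genre output are illustrative assumptions; the paper does not specify the exact architecture.

```python
# Illustrative feed-forward estimator for genre and valence-arousal values
# (Section III-B). Layer sizes and the number of genres are assumptions.
import torch
import torch.nn as nn

class GenreEmotionNet(nn.Module):
    def __init__(self, num_features: int, num_genres: int = 5):
        super().__init__()
        self.trunk = nn.Sequential(                  # shared fully connected layers
            nn.Linear(num_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        self.genre_head = nn.Linear(64, num_genres)  # genre logits
        self.va_head = nn.Linear(64, 2)              # (valence, arousal)

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.genre_head(h), torch.tanh(self.va_head(h))  # VA in [-1, 1]

# Example: one 100-dimensional feature vector -> genre logits and (valence, arousal).
model = GenreEmotionNet(num_features=100)
genre_logits, va = model(torch.randn(1, 100))
```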
Fig. 3: Examples of generated images from the system. Beethoven Symphony No. 5 is depicted with imagery of a thunderstorm or a bird on fire, while the images for the more mellow Mozart Violin Sonata No. 21 indicate the violin instrumentation and an overall brighter color palette.

C. Prompt Generation

Based on these estimates, we use k-nearest neighbors to assign a set of prompt words to the music (such as the genre, emotional words, and colors), with k = 1 because the prompt features are relatively distinct. We would like these initial prompts to relate to the lighting and colors in the generated artwork. For example, when an emotion like "anger" is detected (an emotion with negative valence and high arousal), the generated image should use saturated colors such as vibrant reds, dark purples, and black. The subject of the artwork is also based on the genre of the input music. In the case of Figure 3, the first passages of Beethoven Symphony No. 5 are classified with the emotional prompts "angry", "aggressive", and "violent". This results in images with a theme of red or black hues. Additional analysis of the MIDI chords and Mel spectrograms identifies the piece as a classical work, which contributes to the painted texture of the images. Further adjustment of the prompts through "prompt modifiers" [10] can help generate specific details and variations in the images. We produce images using different prompts for each genre, including solo performances, chamber music, symphony orchestras (including concertos), choirs (accompanied by piano or orchestra), and operas/ballets.
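The 1-nearest-neighbor lookup can be sketched as follows with scikit-learn. The reference valence-arousal coordinates and the prompt-word lists are invented for illustration and are not the actual table used by the system.

```python
# Illustrative 1-NN mapping from estimated (valence, arousal) values to prompt
# words (Section III-C). Reference points and word lists are made up examples.
from sklearn.neighbors import KNeighborsClassifier

reference_va = [[-0.6,  0.8],   # anger:   negative valence, high arousal
                [ 0.7,  0.7],   # joy
                [-0.6, -0.5],   # sadness
                [ 0.5, -0.4]]   # calm
labels = ["anger", "joy", "sadness", "calm"]

prompt_words = {
    "anger":   ["vibrant red", "dark purple", "black", "stormy"],
    "joy":     ["bright", "golden", "warm light"],
    "sadness": ["muted blue", "grey", "soft shadows"],
    "calm":    ["pastel", "gentle light", "open sky"],
}

knn = KNeighborsClassifier(n_neighbors=1).fit(reference_va, labels)

def emotion_prompt(valence: float, arousal: float) -> list:
    emotion = knn.predict([[valence, arousal]])[0]
    return prompt_words[emotion]

print(emotion_prompt(-0.5, 0.9))  # -> ['vibrant red', 'dark purple', 'black', 'stormy']
```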
D. Image Generation

Finally, once these prompts are generated, we introduce some random image-related words into the prompt (such as camera angle, movement, and framing) to add variation to the resulting images. Large language models (LLMs) can interpret valence-arousal emotion values and provide feedback on the represented emotions, so the initially obtained valence-arousal values are also passed to the LLM. Once the fundamental elements of the prompt are assembled, the GPT-4 LLM [22] assists with prompt engineering for more detailed image generation. Throughout this process, the LLM is instructed to consistently maintain the alignment of the emotions conveyed by the pictures and the music. After we have the final prompt, we feed it to a diffusion-type image-generating model to obtain our set of images.
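For illustration, the final step can be sketched with the Hugging Face diffusers library and a public Stable Diffusion checkpoint. The model ID, prompt text, and sampler settings below are assumptions, and the GPT-4 prompt-refinement step is omitted here.

```python
# Illustrative final step (Section III-D): feed the assembled prompt to a
# diffusion model via the diffusers library. Model ID and settings are assumed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("oil painting of a thunderstorm, vibrant red and black hues, "
          "dramatic lighting, symphony orchestra mood")
images = pipe(prompt, num_images_per_prompt=6, num_inference_steps=30).images
for i, img in enumerate(images):
    img.save(f"generated_{i}.png")   # six candidate images per music clip
```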
IV. HUMAN-SUBJECT EVALUATION AND STUDY RESULTS

To evaluate the efficacy of our method, we conduct an online human-subject study to answer two questions: "Can generative visual arts reflect the rich expressions of music?" and "Do audiences like the generated visuals?". In the study, we evaluated the visual arts generated from different pieces of music. After hearing a piece of music, a user selects the image that best reflects the music. The options include three types of images: (1) generated by our system, (2) chosen by humans (members of this research team), and (3) generated from other pieces of music. If the majority of users prefer our system-generated images, our system can effectively produce visual representations reflecting the music.

A. User Profiles

We sent emails to students and faculty at Purdue and collected 88 responses. Among them, 62.5% are male and 31.8% are female. Most subjects (84.1%) are within the age range of 18-24. Many of our participants are either student musicians (35.2%) or play an instrument for leisure (33.0%).

B. Music

This study uses 15 pieces of classical music, each 10 seconds long. The pieces are chosen from five major classical music genres: choir, opera and ballet, chamber music, solo performance, and larger ensembles (orchestra or band), with three pieces per genre. The pieces are well known and representative of their categories, e.g., Beethoven's 9th Symphony (choir) and Bach's Cello Suite No. 1 Prelude (solo). When selecting the pieces, we considered a diverse set of musical features so that our system can generalize broadly.

C. Visual Representations of Music

For each music piece, our system generates six images (per trial). For comparison, musicians on our team select six images manually from three online image repositories: Pexels, Pixabay, and Unsplash. These images also reflect the music pieces, based on the musicians' judgement. The manually selected images are used for comparison against the system-generated images. If users prefer the system-generated images to the human-chosen images, it suggests that our system can generate images that are closer to the music than the manually selected ones. This in turn suggests the viability of generated images for accurately representing music by human standards. Also, to ensure that users can select images that truly represent the specific piece of music, we include a system-generated image from a different piece of music (a distraction). This image does not reflect the current music. The distraction aims to confirm that users can distinguish whether an image represents the music. In total, thirteen images are available for each piece of music.

This study considers images of different styles to avoid possible preference bias due to style. We classify the images into abstract and realistic. Realistic art depicts the subject matter with a high degree of fidelity to its real-world appearance; abstract art uses colors, shapes, lines, and forms to convey emotions, ideas, or concepts. A user may have a strong preference for one style. To ensure that we compare similar styles of images, we categorize each image as either realistic or abstract. Figure 3 shows several examples. The survey includes 82 photos or realistic images and 113 abstract images, 195 images in total.

Fig. 4: A sample question. The user was asked to choose the image that best fits the music. Non-AI Image sources: [23], [24].

D. Questionnaire

We designed 15 questions. During the survey, a user receives 10 random questions plus one additional question that measures the user's preference regarding subjectivity (11 questions in total). Figure 4 is an example of a question. Each question includes a 10-second music clip. The user clicks a button to play the music. The system selects four images that may be generated by our system (trial, also called system-generated) or human-chosen. Additionally, one distraction image is included to detect style bias. The user may also select "None of the images".

E. Result and Analysis

TABLE II: Proportion of Images Chosen and Expected Values.
Subjectivity Level:          Realistic   Abstract
System Expected %            40.2%       50.4%
System User Chosen %         53.0%       69.0%
Non-System Expected %        54.9%       39.8%
Non-System User Chosen %     47.0%       29.6%
Distraction Expected %       4.9%        9.7%
Distraction User Chosen %    0.0%        1.4%
P-Value                      < 0.01      < 0.01

Figure 5 shows users' preferences between system-generated and human-chosen images as representations of the given music clips, as well as their subjectivity-level preferences. If users had selected images randomly, the numbers of system-generated and non-system-generated images chosen would have followed the proportions of the 195 total images included in the survey. However, the percentage of system-generated images chosen by users is much higher than the actual percentage of such images in the survey. Figures 5 (a) and (b) show the percentages of selections and options for abstract images. The generated images make up 50.4% of all image options but account for 69.0% of the users' selections. In contrast, the other 49.6% of images account for only 31.0% of the selections. Similarly, for realistic images, users prefer system-generated images (45.8% of the options, 52.3% of the selections). Chi-square analysis (Table II) shows a statistically significant preference for the trial images in both the realistic and abstract categories. The p-values for both realistic and abstract images are less than 0.01. Consequently, this suggests that users perceive the images generated by our system as better representations of the music than the human-chosen images.

For triangulation, we also examined whether users are able to identify images that do not reflect the music. Each question contains one distraction image among the five images shown. If users chose an image randomly, we would expect the proportion of distraction images selected to be slightly lower than 20% (due to the "None of the Above" option available to users). However, the total percentage of distraction images chosen during the survey was less than 1%, signifying that users can tell which images do not reflect the music.

Overall, the total percentage of system images chosen in the survey is 58%, the percentage of human-chosen images is 35%, and the remaining percentage consists of "None of the Above" choices. The total number of selections by users is 7 + 150 + 349 + 183 + 206 + 61 (None of the Above) = 956. Users select generated images 349 + 206 = 555 times; the ratio is 555/956 = 58%. Users select non-system images 150 + 183 = 333 times; the ratio is 333/956 = 35%. The p-value across both subjectivity levels is less than 0.01. This signifies that our system creates effective visual representations of music that users prefer. Additionally, the distraction-image test shows that users can tell which images do not correspond to the music clips. This suggests that the system-generated images are preferred over the human-selected images not because of their type, but because of their meaningful representation of the music.
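The overall preference analysis above can be reproduced in a few lines with SciPy. The sketch below uses the selection counts reported in the text; the expected shares under random choice are stated only as an assumed example (the per-category expected proportions appear in Table II).

```python
# Sketch of the chi-squared goodness-of-fit check described in Section IV-E,
# using the overall selection counts from the text. The expected shares under
# random choice are assumed for illustration; this is not the authors' script.
import numpy as np
from scipy.stats import chisquare

observed = np.array([555, 333, 68])       # system, non-system, distraction + "none"
total = observed.sum()                    # 956 selections
print(observed / total)                   # ~[0.58, 0.35, 0.07]

expected_share = np.array([0.46, 0.46, 0.08])   # hypothetical shares of the options
stat, p = chisquare(f_obs=observed, f_exp=expected_share * total)
print(f"chi2 = {stat:.1f}, p = {p:.2g}")  # a small p rejects random selection
```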
Fig. 5: The survey results: (a) abstract, selected; (b) abstract, options; (c) realistic, selected; (d) realistic, options. (a) Users select system-generated images 349 times (68.97%) and images not generated by our system 150 times (29.64%). (b) Only 50.4% of the images are system-generated. (c) Users select system-generated images 206 times (52.2%). (d) Only 45.8% of the images are system-generated. Users selected "None of the images" 61 times, which is not represented in the pie charts.

Fig. 6: Percentages of system-generated images chosen by users for different composers. The figure shows 7 of the 15 composers in our survey.

We further examined all 15 music pieces used in this survey. Each piece receives a different level of preference for system-generated images, as shown in Figure 6. The piece with the highest proportion of system-generated images (best system performance) is Albeniz's Asturias, where 50/65 = 76.9% of the images selected by users are system-generated. The piece with the lowest proportion (worst system performance) is Tchaikovsky's Piano Concerto No. 1, with 22/62 = 35.5% of the images chosen by users. The large difference between the largest and smallest percentages chosen across pieces suggests that our system may not be able to visualize different types of music equally well.

V. DISCUSSION

A. Limitations

The p-values for both the abstract and realistic subjectivity levels are less than 0.01. We conclude that there is a statistically meaningful preference for system-generated images over human-chosen images. However, there are several limitations, both in the selected user base for our survey and in the organization of our survey questions. Also, our system's performance seems to vary across different music. Whether there is a systematic difference (i.e., it always performs worse on certain types of music) or just random error still needs further investigation.

The majority of our users fall into the age range of 18-25 (84.1%) because of the place (a university) where this study was conducted. Additionally, the majority of our users are either White or Asian (91.0%), and the majority (69.3%) have played musical instruments. Our future work may analyze the relationships between user demographics and musical experience, along with defining a concrete qualitative evaluation of results with a more diverse study group. This study considers only classical music. A future study should consider other types of music, such as jazz, rock, and pop.

Fig. 7: Our system in a live cello performance.

B. Applications

There is a great opportunity for image generation in entertainment and in enhancing the user experience when listening to music. Real-time implementations can decorate a space used for social events (e.g., karaoke, clubs, parties) as a more immersive substitute for music videos, ambient lighting, or still images. Musicians can efficiently provide a visual experience for a performance that surpasses what they could produce on their own. The generated images can also give users with hearing impairments a visual outlet to enjoy music. Other works have shown these possibilities, such as Liu's "Generative Disco" [19] and Betin's "Visualizing Sound with AI" [20]. Our method can provide human-interpreted image quality in these applications.
Recently, we used our system in a live performance (Fig. 7, https://www.youtube.com/watch?v=LF172wWu2jU). The system ran smoothly and saved the performer a lot of effort in choosing images for the background visual effects of the music. The performer and the audience felt that the generated images in the background largely reflected the nature and characteristics of the music.

VI. CONCLUSION

This paper presents a study using generative artificial intelligence to visualize music. Our system analyzes multiple elements of the music, such as instruments, tempo, emotion, and pitch, and generates text prompts. The prompts are then input to diffusion models to produce images. A user study indicates that this approach can effectively reflect the rich expression of music.

ACKNOWLEDGMENTS

We appreciate the support from the sponsors and the people who participated in the survey. This work is supported in part by NSF IIS-2326198 and by the CREATE program of Purdue. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.

REFERENCES

[1] Cellist man clipart, music vintage. https://openverse.org/image/7962407e-1be8-4123-a3d7-7b1449f65c3b.
[2] Charles Spence and Nicola Di Stefano. Coloured hearing, colour music, colour organs, and the search for perceptually meaningful correspondences between colour and sound. i-Perception, 13(3):20416695221092802, 2022. PMID: 35572076.
[3] Rossana Actis-Grosso, Carlotta Lega, Alessandro Zani, Olga Daneyko, Zaira Cattaneo, and Daniele Zavagno. Can music be figurative? Exploring the possibility of crossmodal similarities between music and visual arts. Psihologija, 50:285–306, 2017.
[4] Mats B. Küssner and Tuomas Eerola. The content and functions of vivid and soothing visual imagery during music listening: Findings from a survey study. Psychomusicology: Music, Mind, and Brain, 29:90, 2019.
[5] Rachel M. Bittner, Juan José Bosch, David Rubinstein, Gabriel Meseguer-Brocal, and Sebastian Ewert. A lightweight instrument-agnostic model for polyphonic note transcription and multipitch estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2022.
[6] Florian Eyben, Martin Wöllmer, and Björn Schuller. openSMILE: The Munich versatile and fast open-source audio feature extractor. In ACM International Conference on Multimedia, pages 1459–1462, 2010.
[7] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models, March 2022. arXiv:2112.10741 [cs].
[8] Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10850–10869, September 2023.
[9] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022.
[10] Jonas Oppenlaender. A taxonomy of prompt modifiers for text-to-image generation. Behaviour & Information Technology, pages 1–14, November 2023.
[11] Guilherme Francisco F. Bragança, João Gabriel Marques Fonseca, and Paulo Caramelli. Synesthesia and music perception. Dementia & Neuropsychologia, 9:16–23, 2015.
[12] Modem. OP-Z Stable Diffusion. https://modemworks.com/projects/op-z-stable-diffusion/, January 2023.
[13] John T. Cowles. An experimental study of the pairing of certain auditory and visual stimuli. Journal of Experimental Psychology, 18(4):461–469, 1935.
[14] Pinaki Gayen, Junmoni Borgohain, and Priyadarshi Patnaik. The Influence of Music on Image Making: An Exploration of Intermediality Between Music Interpretation and Figurative Representation, pages 285–293. June 2021.
[15] Walter L. Wehner. The relation between six paintings by Paul Klee and selected musical compositions. Journal of Research in Music Education, 14(3):220–224, 1966.
[16] Hugo B. Lima, Carlos G. R. Dos Santos, and Bianchi S. Meiguins. A survey of music visualization techniques. ACM Computing Surveys, 54(7):143:1–143:29, July 2021.
[17] Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, and Tie-Yan Liu. MusicBERT: Symbolic music understanding with large-scale pre-training, June 2021. arXiv:2106.05630 [cs].
[18] Seth Forsgren and Hayk Martiros. Riffusion - Stable Diffusion for real-time music generation. https://github.com/riffusion/riffusion, 2022.
[19] Vivian Liu, Tao Long, Nathan Raw, and Lydia Chilton. Generative Disco: Text-to-video generation for music visualization, 2023. arXiv:2304.08551 [cs].
[20] Vasily Betin. Visualizing sound with AI. Medium, May 2020.
[21] Saikat Basu, Nabakumar Jana, Arnab Bag, Mahadevappa M, Jayanta Mukherjee, Somesh Kumar, and Rajlakshmi Guha. Emotion recognition based on physiological signals using valence-arousal model. In International Conference on Image Information Processing, pages 50–55, 2015.
[22] Josh Achiam et al. GPT-4 technical report. Technical report, OpenAI, 2023. arXiv:2303.08774 [cs].
[23] Pixabay. Light sun cloud japan. https://www.pexels.com/photo/light-sun-cloud-japan-45848/, February 2016.
[24] Prawny. Abstract painting country golden. https://pixabay.com/illustrations/abstract-painting-country-golden-5985987/, February 2021.