
MUSIC IN EVOLUTION AND EVOLUTION IN MUSIC

Steven Jan
https://www.openbookpublishers.com

©2022 Steven Bradley Jan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0). This license allows you to share, copy, distribute and transmit the work for non-commercial purposes, providing attribution is made to the author (but not in any way that suggests that he endorses you or your use of the work). Attribution should include the following information:

Steven Bradley Jan, Music in Evolution and Evolution in Music. Cambridge, UK: Open Book
Publishers, 2022, https://doi.org/10.11647/OBP.0301

Copyright and permissions for the reuse of many of the images included in this publication
differ from the above. This information is provided in the Credits on page xxxi.

Further details about CC BY-NC-ND licenses are available at https://creativecommons.org/licenses/by-nc-nd/4.0/.

All external links were active at the time of publication unless otherwise stated and have been
archived via the Internet Archive Wayback Machine at https://archive.org/web.

Digital material and resources associated with this volume are available at https://doi.org/10.11647/OBP.0301#resources.

Every effort has been made to identify and contact copyright holders and any omission or error
will be corrected if notification is made to the publisher.

ISBN Paperback: 9781800647350
ISBN Hardback: 9781800647367
ISBN Digital (PDF): 9781800647374
ISBN Digital ebook (EPUB): 9781800647381
ISBN Digital ebook (AZW3): 9781800647398
ISBN XML: 9781800647404
ISBN HTML: 9781800647411
DOI: 10.11647/OBP.0301
Cover image by Gareth Price, all rights reserved. Cover design by Anna Gatti.

6. Computer Simulation of Musical Evolution

Evolutionary computing (EC) may have varied applications in music. Perhaps the most interesting application is for the study of the circumstances and mechanisms whereby musical cultures might originate and evolve in artificially created worlds inhabited by virtual communities of software agents. In this case, music is studied as an adaptive complex dynamic system; its origins and evolution are studied in the context of the cultural conventions that may emerge under a number of constraints, including psychological, physiological and ecological constraints. Music thus emerges from the overall behaviour of interacting autonomous elements. (Miranda et al., 2003, p. 91)

6.1 Introduction: Computer Analysis and Synthesis of Music

The growth of computer technology in recent decades has made possible the
understanding of a number of complex physical, biological and cultural phe-
nomena and processes, such as weather patterns, evolutionary processes and
economic cycles. In the third of these domains, culture, music has figured
quite prominently in such research, not least because of its inherently high de-
gree of complexity and the associated, and seemingly irresistible, challenge it
poses for computer science. The application of computers to the study of mu-
sic comes in two basic, sometimes overlapping, forms: analysis and synthesis.
The analytic tradition deals with using computers to break music down
into its component parts, primarily the more tractable aspects of harmony,
melody and rhythm, with the aims of arriving at segmentations that are in
some senses compositionally, music-theoretically or cognitively meaningful
(Meredith, 2016). The object of such research is generally symbolic music – i.e.,
music encoded in some text-based or numeric representation format (§6.4).

© 2022 Steven Jan, CC BY-NC-ND 4.0 https://doi.org/10.11647/OBP.0301.06



A separate field – exemplified by the Shazam song-identification software (Shazam, 2019) and the Sonic Visualiser recording-analysis software (Sonic Visualiser, 2021) – is audio-based analysis, which aims to identify patterns in sound files. Yet these must still be converted into some form of internal symbolic representation, upon which the analytical engine operates.

The parameter of melody has arguably received the most sustained attention
in symbolic music analysis, the field being particularly concerned with se-
quential pattern-finding and similarity-matching (see, for instance, Conklin
& Anagnostopoulou, 2006; Janssen et al., 2017). Some of this research is con-
ducted under the broader rubric of Music Information Retrieval (MIR) and is
assessed via the MIREX (Music Information Retrieval Evaluation eXchange)
organisation and its associated competitions designed to determine optim-
ally performing systems (MIREX, 2020). The technologies for analysis are
variously offline (Lartillot, 2019; this application using both audio and sym-
bolic approaches) or online (Kornstädt, 1998; Huron et al., 2021); and they
are orientated either toward broad usability by non-specialist musicologists
(Wheatland, 2009), or for expert investigation of focused problem-spaces,
such as building and testing hypotheses in music cognition (M. Pearce &
Müllensiefen, 2017). Aligning with the growing accessibility of big data
(§3.4.1), online pattern-finding utilities are often front-ends for music data-
bases allowing, for example, large-scale searches for incipits and the location
of common patterns in a particular corpus (RISM, 2021). While generally not
explicitly conducted in such terms, the concerns of the computer-analytic tra-
dition are well suited to locating the types of pattern replication – for finding
musemes by virtue of what is replicated – encompassed by memetics.
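
A minimal illustration of such sequential pattern-matching (my own sketch, not drawn from any of the systems cited here): melodies can be compared as interval sequences under Levenshtein edit distance, so that a transposed replication of a figure still registers as identical, while a varied replication registers as near-identical.

```python
def intervals(pitches):
    """Successive pitch intervals, so transposed melodies compare as equal."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def edit_distance(xs, ys):
    """Levenshtein distance between two sequences, by dynamic programming."""
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, 1):
        cur = [i]
        for j, y in enumerate(ys, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

# MIDI note numbers (60 = middle C); hypothetical illustrative figures.
theme      = [60, 62, 64, 65, 67]  # C D E F G
transposed = [67, 69, 71, 72, 74]  # the same shape, a fifth higher
varied     = [60, 62, 63, 65, 67]  # the third degree flattened

edit_distance(intervals(theme), intervals(transposed))  # 0: identical contour
edit_distance(intervals(theme), intervals(varied))      # 2: one altered pitch changes two intervals
```

Note that a single altered pitch costs two interval edits, which is one reason real similarity-matching systems weigh several representations (pitch, interval, contour, rhythm) against each other.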

The synthetic tradition – which arguably started in 1957 with the Illiac Suite
(Hiller & Isaacson, 1957) – deals with using computers to generate music
that conforms to a particular style or that is felt by a human observer to be
convincing (as music) to some extent.257 The degree to which this latter
attribute is upheld is a form of Turing Test (TT) (Turing, 1950), in that the
programmer is (sometimes) attempting to convince a human listener that
the outputs of his or her program are the unmediated creative products of

257 Perhaps the origin of this tradition is rather earlier, in the musical dice-games (the dice being a random-number generator) of the late-eighteenth century (Ariza, 2011; Moseley, 2016; Tesar, 2000).

another human intelligence. Nevertheless, Ariza (2009) offers a critique of applications of the Turing Test to music, arguing that a test that was designed by
Turing to discern the existence of thought as articulated via natural-language
interlocution has often been applied uncritically to music. He argues that
“the TT employs natural language discourse to represent the presence of
thought; its spirit is not preserved in either the MOtT [Musical Output toy
Test] or the MDtT [Musical Directive toy Test]” (2009, p. 61). The MOtT
attempts to distinguish between two musical outputs, one human-generated,
the other computer-generated (2009, p. 55). The MDtT replaces “Output”
with “Directive”, whereby the interrogator requests (and attempts to dis-
tinguish between) music from a human and from a computer, generated
according to some specification or input style (2009, p. 55). Some might
argue – apropos the point above on “unmediated creative products” – that
computer-generated music (hereafter “CGM”) is in reality human-generated
music (hereafter “HGM”), albeit at one degree of remove: they might assert
that “[t]he computer can be seen not as an autonomous author but as a
system that executes or reconfigures knowledge imparted to it by its pro-
grammers” (Ariza, 2009, p. 64). By contrast, genuine creativity, according to
the “Lovelace Test” of Bringsjord et al. (2001), can be identified “when [a]
H[uman architect] cannot account for how [an artificial] A[gent] produced
[an] o[utput]” (2001, p. 4) (§6.6). Nevertheless, understanding the pro-
duction of an o is not necessarily straightforward, given that some systems’
generative operations occur, intractably, in a black box. Thus, the adherents
of CGM might argue that their algorithms produce os that, because they
cannot be “accounted for”, are therefore potentially creative.

The analytic and the synthetic approaches are often conducted reciprocally:
to generate music effectively it is necessary to understand its nature and struc-
ture analytically; and such understanding is itself deepened by the synthetic
process of designing music-generative algorithms. Moreover, as discussed in
§6.3, the music analysis-synthesis distinction also applies to cognate research
in the computer simulation of language’s structure and evolution, which –
as the persistent focus in this book on the close evolutionary connections
between music and language might suggest – has numerous overlaps with
the computer simulation of music’s structure and evolution.

The chapter continues by considering how music-synthesis systems occupy a continuum from minimally to maximally autonomous, and reviews the
terminology associated with research on primarily autonomous systems
(§6.2). It moves on to explore how computer systems for simulating language
evolution relate to analogous technologies in music, both domains presenting
to algorithm designers the problem of creatively evolving a Humboldtian
medium in a virtual environment (§6.3). Before turning to examine examples
of music-generative systems, it is necessary to consider how music might
be represented in ways that are meaningful to machines (§6.4). The main
body of the chapter is concerned with an overview of a number of different
strategies for music generation – some evolutionary, some not – by means
of the examination of one or two systems selected as representative of each
approach (§6.5). Finally, the issue of machine creativity is considered, partly
in the light of the discussion of animal creativity in Chapter 5 (§5.5), focusing
on philosophies and strategies for evaluating CGM (§6.6).

6.2 The Continuum of Synthesis and Counterfactual Histories of Music

The focus of this chapter is, naturally, on the synthetic tradition. Research in
this field occupies a continuum, or “spectrum of automation” (Fernández &
Vico, 2013, p. 516). At one end of this continuum of synthesis are augmentation
systems, which use the computer to expand the extant potential of a human
composer or improviser; this technology is sometimes also termed Computer-
Aided Algorithmic Composition (CAAC) (2013, p. 515). At the other end of
the continuum are fully automatic generative systems, which aspire to the kind of
autonomy typical of radical AI. Framed in this way, “CAAC [represents] a low
degree of automation, algorithmic composition a high degree of automation”
(2013, p. 516). The continuum of synthesis might be expanded to encompass
the whole creative range, from (fully) HGM at one end to (fully) CGM at
the other, with CAAC therefore occupying some mid-point. Apropos the
point on mediation made in §6.1, one should nevertheless remember that
(fully) CGM is, at least partly, HGM, because the underlying algorithms that
give rise to CGM are the product of human intelligence, albeit arguably not
the specifically musical domain of that intelligence, and albeit an intelligence

that – in a manner analogous, for instance, to aleatoric HGM – delegates most or all of the decision-making to the computer.

One of the principal aims of research in CGM is to create computer systems that are capable not only of generating music – either in the form of augmentation/CAAC systems or of fully automatic generative systems – but
also of evolving it over time. Specifically, systems with an evolutionary di-
mension, the main concern of this chapter, not only (i) model the “local”
cognitive-evolutionary processes of the individual musician (as composer
or improviser) – Velardo’s psychological ontological category of being; but
they may also (ii) model “global” structural-systemic processes of musical
change over time – Velardo’s socio-cultural category (2016, p. 104, Fig. 3)
(§1.5.5). From a Universal Darwinian perspective, both processes operate
by “translating” the VRS algorithm into the specific generative algorithms
of the system, the latter modelling – insofar as these domains can be mean-
ingfully separated – intra- and inter-brain Darwinism, respectively. Some of
the explicitly evolutionary systems considered in §6.5 fall under the ambit of
(ii), which – on account of its connection with processes operating in biolo-
gical and cultural evolution – forms the centre of gravity of the discussion.
Their implementation of evolutionary modelling potentially allows for the
“replaying” of musical history, starting from generally accepted beginnings
and evolving alternative, or counterfactual, histories of music – music not
as it was in the past, but as it might have been, or might at some point in
the future become. Nevertheless, whenever computers are used to model
analytical or synthetic processes that occur fundamentally in the human
mind, one needs to remember – as expressed in Temperley’s double-negative
formulation – that

the mere fact that a [computer] model performs a process successfully certainly does not prove that the process is being performed cognitively
in the same way. However, if a model does not perform a process suc-
cessfully, then one knows that the process is not performed cognitively
in that way. If the model succeeds in its purpose, then one has at least a
hypothesis for how the process might be performed cognitively, which
can then be tested by other means. (Temperley, 2001, p. 6; emphases in
the original)

Given that many systems for the generation of music are based on Darwinian
principles, and given that several of these are at least able to produce recog-
nisably musical outputs of increasing complexity over time, then it can be
said, to paraphrase Temperley, that if such a Darwinian model does perform
the process successfully, then one knows that the process might be (or might
have been) performed cognitively – and thus socio-culturally – in that way.
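
To make the "translation" of the VRS (variation-replication-selection) algorithm into a generative loop concrete, here is a minimal sketch of my own devising, a generic genetic algorithm rather than a reconstruction of any system surveyed below; the target contour is a hypothetical stand-in for whatever fitness measure (style model, taste-culture) a real system would encode.

```python
import random

random.seed(0)

SCALE = list(range(60, 72))  # one chromatic octave of MIDI note numbers

def fitness(melody, target):
    # Selection criterion: negative distance from a target contour. The
    # target is an invented stand-in for a real system's style model.
    return -sum(abs(a - b) for a, b in zip(melody, target))

def mutate(melody, rate=0.1):
    # Variation: each note may be replaced by a random scale note.
    return [random.choice(SCALE) if random.random() < rate else n
            for n in melody]

def evolve(target, pop_size=50, generations=200):
    # The VRS loop: rank (selection), copy survivors (replication),
    # perturb the copies (variation), and iterate.
    population = [[random.choice(SCALE) for _ in target]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, target), reverse=True)
        survivors = population[:pop_size // 2]
        offspring = [mutate(random.choice(survivors))
                     for _ in range(pop_size - len(survivors))]
        population = survivors + offspring
    return max(population, key=lambda m: fitness(m, target))

target = [60, 62, 64, 65, 67, 65, 64, 62]  # a rise-and-fall contour
best = evolve(target)
```

Because the survivors are carried over unchanged, the best melody's fitness can never decrease, which is the bootstrapping property appealed to above; the hard and contested part in real systems is, of course, the fitness function itself, not the loop.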

The music-synthetic research programme has been conducted under various rubrics, including (alphabetically listed): AI Music Creativity (AIMC,
2021; Miranda, 2021);258 Algorithmic Composition (Nierhaus, 2009); Algo-
rithmic Music (Dean & McLean, 2018); Computer Composition (Miranda, 2001);
Computer Simulation of Musical Creativity (CSMC) (as a subset of the field
of computational creativity (McCormack & D’Inverno, 2012; Bown, 2021));
Computer Simulation of Musical Evolution (CSME) (Gimenes, 2015); Evolution-
ary Computer Music (Miranda & Biles, 2007); Evolutionary Computing (EC)
(Miranda, 2004); Generative Music (N. Collins & Brown, 2009); and Musical
Metacreation (MuMe) (Eigenfeldt et al., 2013),259 among others (§1.4). Apro-
pos two of these rubrics, the term “simulation” is often used interchangeably
with “emulation”, but there are important methodological differences in-
herent in the terminology. Here, simulation is understood as an attempt to
duplicate/replicate the (external, perceived) behaviour or outputs of a system,
as it appears to an observer; whereas emulation is understood as an attempt
to duplicate/replicate the system’s (internal, functional) mechanisms (and
thus, pace Temperley, also its behaviour or outputs). Given these definitions,
and while the term “CS[imulation]MC” has become current,260 one might
argue that “CE[mulation]MC” is a more appropriate acronym to encompass
those systems that implement (emulate) specific mechanisms – here, the
VRS algorithm – that are held to underpin the phenomena – here, cultural
evolution and human creativity – being modelled.

Of the above rubrics, the acronyms CSMC and CSME align most closely with
the concerns of this chapter, not least because, apropos the first, Darwinism
has been framed as a form of creativity (§5.5.2). By virtue of this connection,
258 See also the Journal of Creative Music Systems (https://www.jcms.org.uk/).
259 Metacreation applies to a range of creative domains in addition to music, and encompasses
a number of competences in addition to generation. See Pasquier (2019).
260 This may be partly due to the arguably greater euphony and ease of pronunciation of “CSMC” as against “CEMC”.



and as is indicated in some of the rubrics listed above, machine creativity of-
ten draws upon evolutionary mechanisms. Nevertheless, not all systems that
generate music do so by means of (emulating) evolutionary mechanisms:
many use ostensibly non-evolutionary processes in producing (simulating)
music, so they are seemingly creative without being evolutionary. Thus, it is
important to distinguish between means and ends in the realm of music gen-
eration: an evolutionary means might produce unsatisfactory musical ends;
while satisfactory musical ends might be produced by non-evolutionary
means – and vice versa. Some might argue that – despite the potential for un-
satisfactory outputs – evolutionary mechanisms are more “authentic” from
a Universal-Darwinian perspective, not least because the notion of what is
“satisfactory” can only emerge from a taste-culture that – like the music being
appraised – is generated by the VRS algorithm (§6.6.2). As a further complic-
ation, some systems that are not ostensibly or primarily evolutionary might
nevertheless implement certain evolutionary processes. A neural network
(§6.5.1.2), for instance, essentially takes an often highly varied input and
selects regularities within it in order to replicate them in its output. In this
sense, and to adapt the formulations above, such systems are both creative
and implicitly evolutionary (CSMC+E). At the most fundamental level, and
as explored in §6.6.3, the origination of non-evolutionary systems is never-
theless invariably the result of cultural-evolutionary (memetic) processes,
hence my use above of “ostensibly”.

6.3 The (Co)evolution of Music and Language V: Computer Simulation of Language Evolution

Discussing “chance associations between the phonetic segments of the hol-
istic utterance [constituting Hmmmmm] and the objects or events to which
they related” (§2.7.6), Mithen (drawing on Wray (1998)) argues that “a
learning-agent mistakenly infers some form of non-random behaviour in a
speaking-agent indicating a recurrent association between a symbol string
[proteme] and a meaning, and then uses this association to produce its own
utterances, which are now genuinely non-random” (2006, pp. 253, 256).
Mithen’s remarks echo an element of the argument for the vocal learning con-
stellation made by Merker (2012) in §2.7.5 and §2.7.6 (point 12 of the list on
page 147), and both he and Merker refer specifically to computer simulations
of this hypothesised process conducted by Kirby and his colleagues (Kirby,
2001; Kirby, 2007; Kirby, 2013; Kirby et al., 2015; Scott-Phillips & Kirby, 2010;
Y. Ren et al., 2020; see also Oudeyer & Hurford, 2006; Fitch, 2010, pp. 501–
503) – a field that might be termed the Computer Emulation/Simulation of
Linguistic Evolution (CE/SLE). These models are motivated by the desire to
understand how compositionality evolved in language, using computers to
replicate in minutes processes that occurred over many thousands of years
and that are therefore not directly accessible to us. As “proof of concept”
(Fitch, 2010, p. 502), they suggest that the Wray-Mithen hypothesis for the
evolution of language from Hmmmmm – the fragmentation of musilanguage
into fully compositional language and its associated bifurcation into music
and language – may well reflect evolutionary reality.

In an iterated learning model (ILM) study, Kirby (2001) used agent-based simulation (§273) to model the transmission of language between an adult (teacher) agent and a learner agent. He made a distinction between meaning (expressed here simply as a two-component pattern a, b, each component of which had a value between 0 and 5 (e.g., a0, b3)) and signal (here a character
string drawn from the letters a–z) (2001, p. 103). After the first fifty utterances
by the adult, it became evident that a form of protolanguage had evolved
(2001, p. 105). By a later stage of the simulation, the system had converged
on a fully compositional language (2001, p. 106) in which meaning and
signal had aligned closely under the aegis of a controlling grammar. Further
refinement of the system allowed it to generate “stable irregularity”, of the
type common in natural languages where, for example, some of the most
common verbs are highly but stably irregular (2001, p. 107). Kirby sees this
outcome as a vindication of Wray’s (1998) “associations . . . ” hypothesis,
arguing (apropos a later ILM simulation) that

similarities between strings that by chance correspond to similarities between their associated meanings are being picked up by the learning al-
gorithms that are sensitive to such substructure. Even if the occurrences
of such correspondences are rare, they are amplified by the iterated
learning process. A holistic mapping between a single meaning and a
single string will only be transmitted if that particular meaning is ob-
served by a learner. A mapping between a sub-part of a meaning and a
[segmented, protemic] sub-string on the other hand will be provided with an opportunity for transmission every time any meaning is observed
that shares that sub-part. Because of this differential in the chance of
successful transmission, these compositional correspondences tend to
snowball until the entire language consists of an interlocking system of
[meaning-proteme] regularities. (Kirby, 2013, pp. 129–130; emphasis in
the original)
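
The amplification Kirby describes can be caricatured in a few lines of Python. This is an invented simplification, not his model: generalisation is reduced to fixed two-letter "prefix" (meaning-component a) and "suffix" (component b) slots rather than induced grammars, but the snowballing of chance string-meaning correspondences through the learner bottleneck is the same in kind.

```python
import itertools
import random

random.seed(1)

MEANINGS = list(itertools.product(range(6), range(6)))  # (a, b), values 0-5
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def random_signal():
    return "".join(random.choice(LETTERS) for _ in range(4))

def learn(observed):
    """Build a learner's lexicon from a bottlenecked sample of utterances.

    Crude generalisation rule: if every observed signal for an a-value
    shares one 2-letter prefix, and every signal for a b-value shares one
    2-letter suffix, the learner composes signals for unseen meanings;
    otherwise it improvises a holistic signal.
    """
    prefixes, suffixes = {}, {}
    for (a, b), s in observed.items():
        prefixes.setdefault(a, set()).add(s[:2])
        suffixes.setdefault(b, set()).add(s[2:])
    lexicon = {}
    for a, b in MEANINGS:
        if len(prefixes.get(a, ())) == 1 and len(suffixes.get(b, ())) == 1:
            lexicon[(a, b)] = next(iter(prefixes[a])) + next(iter(suffixes[b]))
        elif (a, b) in observed:
            lexicon[(a, b)] = observed[(a, b)]
        else:
            lexicon[(a, b)] = random_signal()  # holistic invention
    return lexicon

def ambiguity(lexicon):
    # Distinct prefixes per a-value, summed: 6 means fully compositional
    # rows, 36 fully holistic.
    return sum(len({lexicon[(a, b)][:2] for b in range(6)})
               for a in range(6))

language = {m: random_signal() for m in MEANINGS}  # initial holistic language
for generation in range(30):
    sample = dict(random.sample(sorted(language.items()), 20))  # bottleneck
    language = learn(sample)
```

Holistic entries survive only if their exact meaning happens to be sampled, whereas any chance prefix/suffix regularity is reinforced by every meaning sharing that sub-part, so `ambiguity(language)` tends to fall across generations, which is exactly the differential transmission Kirby identifies.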

Merker elegantly summarises the process as it is thought to have occurred in real hominin communities and as it has been modelled in computer simulations of these early, language-forming interactions. He argues that

[t]he [song] repertoire . . . is launched on a process of progressive string-context assortative and hierarchical decomposition from holistic strings
downwards. Taking place as an unintended side effect of intergenera-
tional transmission through the learner bottleneck, the process is en-
tirely passive and automatic, and takes place [initially] for no reason of
instrumental utility whatsoever. (Merker, 2012, pp. 241–242; emphasis
in the original)

In an implicit Universal Darwinism, Kirby is at pains to stress that his system is focused “less on the way in which we as a species have adapted to
tem is focused “less on the way in which we as a species have adapted to
the task of using language [biological evolution] and more on the ways in
which languages adapt to being better passed on by us [cultural evolution]”
(2001, p. 110). Languages themselves have to adapt (towards greater com-
positionality) because “[h]olistic languages cannot be reliably transmitted
in the presence of a [learner] bottleneck . . . , since generalisation to unseen
examples cannot be reliable” (Kirby, 2013, p. 129; emphasis in the original).
Thus, in his model “there is no natural selection; agents do not adapt, but
rather we can see the process of transmission in the ILM as imposing a
cultural linguistic [i.e., memetic] selection on features of the language that
the agents use” (Kirby, 2001, p. 108). While Kirby focuses on the power of
cultural evolution as the driver of Merker’s “string-context assortative and
hierarchical decomposition”, he nevertheless acknowledges the importance
of the coevolutionary relationship between biological and cultural forces in
language evolution (2013, p. 136). Indeed, as Fitch argues,
[g]iven the importance of linguistic communication to human children, and given a pervasive change in the nature of the ambient communica-
tion system, biological selection will still occur, favoring ‘segmentation-
prone’ infants who master the new analytic [compositional] system more
rapidly [than other infants] (in contrast to previous generations, where
selection would favor the learning of holistic systems . . . ). (Fitch, 2010,
p. 502).

The biological evolution of “segmentation-proneness” – perhaps fostered by the effects of the FOXP2 gene (§2.7.6) – might also have been a factor in
memetic drive (§3.7.1). Segmentation (hierarchical decomposition) would
have optimised the capacity for imitation (point 140 of the list on page 255)
by means of a “divide-and-conquer” chunking mechanism that – by fos-
tering the replication of memes, with its attendant aptive benefits to genes
– would have facilitated, directly or indirectly, the differential selection of
Blackmore’s Capacity-to-imitate genes. Moreover, while the lack of a relat-
ively stable meaning-component distinguishes music-cultural evolution from
language-cultural evolution – but see §3.8.5 – the former process has also
been successfully modelled by agent-based systems. Given its appearance
in the CGM outputs of such simulations, it is possible that the “composi-
tional”/recursive-hierarchic structure of HGM arose from musilanguage via
the same mechanisms as the computer-generated language (CGL) simula-
tions of Kirby suggest occurred in human-generated language (HGL). This
is indeed the implication of a study involving iterated learning, Miranda et al.
(2003), discussed in §273.

6.4 Music and/versus Its Representations


Before turning to the evaluation of a sample of music-generative systems
in §6.5, it is necessary briefly to address an issue that affects them all and
that indeed is relevant to many of the topics considered in this book more
generally, albeit sometimes only indirectly. As outlined in §6.1, most ana-
lytical and synthetic systems, however categorised, normally deal not with
music but with representations of music (Selfridge-Field, 1997). Putting aside
the complications attendant upon the ontology of music – which, in a hard
memetic view, exists fundamentally as patterns of neuronal interconnection,
potentially as hypothesised in the HCT (§3.8.3) – such systems convert what humans experience as sounds, plus their associated physical movements,
into some form of cold numerical representation, which inevitably attenu-
ates their richness. Such representations might be MIDI note-numbers, the
“**kern” representation of the Humdrum Toolkit (Huron, 1997), or some other
essentially abstract system, such as the text-based museme-representations
in Figure 3.14.261 This “representation problem” is closely connected to the is-
sue of conscious experience (§7.3), for while music (and phenomena in other
sensory modalities) is presumably encoded in our brains in an essentially
abstract manner, it is somehow rendered powerfully vibrant in conscious
experience and, through embodiment, is made visceral for us. In the case of
CGM, while the systemic representations of music are comparably abstract,
we can be quite sure that the machines running the simulations are not con-
scious, in the sense of their being capable of experiencing the resulting music
as a human does.
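
As a concrete illustration of this attenuation (the helper function and choice of tune are mine, not the book's): in MIDI terms a melody reduces to a list of integers, from which pitch names are mechanically recoverable but timbre, nuance and embodiment are not.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_name(note):
    """Render a MIDI note number as a pitch name (60 = middle C = C4)."""
    return NOTE_NAMES[note % 12] + str(note // 12 - 1)

# Opening of "Ah! vous dirai-je, maman" as bare MIDI note numbers:
opening = [60, 60, 67, 67, 69, 69, 67]

print([midi_to_name(n) for n in opening])
# ['C4', 'C4', 'G4', 'G4', 'A4', 'A4', 'G4']
```

Everything a listener would call the "sound" of this phrase lives outside the seven integers; the representation problem is that analytical and generative systems alike operate only on such residues.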

Philosophically, the representation problem, and the poverty of experience it motivates, might be regarded as a significant flaw of music-generative sys-
tems, one that militates against their utility in demonstrating, for instance, the
operation of the VRS algorithm in cultural evolution. How, one might argue,
can a machine be used to explore the evolution of music through Darwinian
processes in human societies if that machine is incapable of experiencing
the emotions and physicality central to musicality in our species? These
vibrant and visceral sensations of music and movement can be understood
as qualia – the specific experiences that form a component of consciousness –
the explanation of which constitutes the essence of the “hard problem” of
consciousness (§7.2.1). One might counter this by saying that as long as the
machine has some way of encoding (abstract) representations of emotional
states and physicality as a component of its algorithms for determining fit-
ness – the latter, on the “museme’s eye view” (Blackmore, 2000a; Dennett,
2017, Ch. 10), an index of its selfishness – then the specific phenomenological
experiences a museme engenders in humans are incidental to the operation
of the VRS algorithm in silico, even though this is not the case in vivo. This
circumvents the hard problem of consciousness insofar as the machine gener-

261 This also holds true for robotic systems (Miranda, 2008), which, even though they utilise physical movements, represent these gestures as symbolic codes.



ation of music is concerned, by decoupling qualia – which themselves arise from (higher-level) evolutionary processes (§7.3) – from the (lower-level)
evolution of musemes. By extension, the non-necessity of qualia to the op-
eration of the VRS algorithm might also be held to apply to systemic views
of consciousness arising in electronic networks (§7.6), without necessarily
precluding the (eventual) evolution of qualia therein.

6.5 Overview and Critique of Music-Creative Systems

As outlined in §6.1, the aim of this section is to survey a number of different strategies employed by music-generative systems, each approach being illus-
trated by the consideration of one or two representative systems, in order to
examine the underlying design philosophies and to evaluate their outputs.
It should be evident that, owing to the rapid progress being made in digital
technologies, it is likely that this survey will rapidly become out of date –
perhaps at a greater rate than that of the scientific data drawn upon in this
book – with once cutting-edge systems soon becoming obsolete and thus
only of historical value. While the evaluation of music-generative systems is
complex (§6.6.2), the aim here is to get a general sense of how similar the
selected systems’ outputs are to HGM, to determine if this alignment relates
to their underlying algorithms, and to ascertain if those systems that produce
music using explicitly evolutionary approaches are able to “outperform”
those that do not. Naturally this is highly subjective – the criteria for assess-
ing the similarity of CGM to HGM and those for determining one system’s
outperformance of another are intrinsically contingent, fluid and relative –
and there is not room here for a fully comprehensive and systematic survey;
but the working hypothesis is that the VRS algorithm is, almost by definition,
the best way to bootstrap quality (however evaluated), whether that be in
human-generative or computer-generative environments.

CSMC (or whichever of the rubrics in §6.2 is used to describe it) is a relatively
new field – momentum in it began to build significantly in the 1990s – and, as
represented in Figure 6.1, necessarily incorporates several related disciplines
beyond the purely computational.
6. Computer Simulation of Musical Evolution 485

[Figure 6.1: The Ambit of Computer Simulation of Musical Creativity – a diagram
situating CSMC at the intersection of Computational Creativity, Memetics, Cognitive
Science, Sociology, Music Theory and Music Representation.]

Figure 6.2 shows a possible taxonomy of extant systems, arranged according


to the AI techniques employed; apropos the continuum of §6.2, augmentation
systems, fully automatic generative systems, and those in between, can in
principle belong to any of the taxonomy’s categories. It is partly guided by the
magisterial surveys undertaken by Fernández and Vico (2013), which offers a
taxonomy “structured by methodology” – i.e., by the operational mechanism
of the underlying algorithm (2013, pp. 518–519, Fig. 1);262 and (to a lesser
extent) by Herremans et al. (2017), which presents a “functional taxonomy”
based on the range of musical domains – melody, harmony, rhythm and
timbre – in which generative systems have been developed to operate (2017,
p. 3, Fig. 1).263 The present section does not, however, attempt to rival
Fernández and Vico (2013) or Herremans et al. (2017) in scope or depth –
both have a number of subtle subdivisions and both survey a larger body of

literature – mine being focused primarily on those music-generative systems
based on evolutionary models.264

262 Fernández and Vico (2013, p. 519, Fig. 1) also list methods for music generation that fall
outside the scope of AI – i.e., approaches that are “not based on models of human creativity” –
such as cellular automata.
263 Herremans et al. (2017) is complemented by an online repository (Herremans, 2022) of
generative systems and their outputs in order “to provide a place for music researchers to
exchange their results and make their works more visible”.

[Figure 6.2: Taxonomy of Computer Simulation of Musical Creativity Systems – a tree
in which CSMC branches into Machine-Learning Systems (Markov Models, Neural
Networks and Recombination Systems); Knowledge/Rule-Based Systems (including
Grammar-Based Systems); Constraint-Satisfaction Systems; Optimisation Systems
(Local Search Algorithms, and Genetic/Evolutionary Algorithms subdivided into
Agent-Based and Non-Agent-Based Systems); and Hybrid Systems (Multi-Algorithm
Systems and Multimedia Systems).]

Under their schema, Fernández and Vico (2013) posit three high-level cat-
egories: (i) Machine Learning systems (those abstracting statistical regular-
ities from a dataset and using this information to generate further data in
accordance with those regularities); (ii) knowledge/rule-based systems (which
they also term Symbolic AI) (systems incorporating extant grammatical/
syntactic knowledge/rules about the target domain that is used to generate
new, grammar-conformant outputs); and (iii) Optimisation systems (those
finding the best solutions to problems, often using the most powerful means
of achieving this, the VRS algorithm), these categories being indicated in
bold on Figure 6.2. Two broader points made by Fernández and Vico (2013)
offer useful context for these three categories: (i) by virtue of the operation
of their algorithms, many systems undertake analysis before they proceed
to synthesis, reminding us that the distinction made between them outlined
in §6.1 is not hard-and-fast (2013, p. 526); and (ii) several systems deploy
not one mechanism (algorithm type) but several, therefore utilising a hybrid
generative methodology (2013, p. 561) (§6.5.4). As might be evident from
comparable endeavours in biology, the taxonomy in Figure 6.2 is only one
of many possible arrangements: those systems surveyed could also have
been classified chronologically by date of implementation, or aesthetically by
perceived/assessed success of outputs, among other criteria; but the broadly
“categorical” approach used here is intended to distinguish clearly between
the philosophies underlying each system and, concomitantly, the basic mech-
anisms by which that philosophy is put into practice via algorithm-design.

264 I am grateful to Valerio Velardo for his thoughts on taxonomies of music-generative systems,
which have also informed Figure 6.2.
As noted in §6.2, the primary focus here is upon systems based upon evol-
utionary principles – in terms of both categories (i) and (ii) on page 477
– although not all represent a thoroughgoing implementation of the VRS
algorithm. Including non-, partly-, and wholly-evolutionary systems in this
consideration allows for at least a preliminary assessment of the issue of
means versus ends in music generation raised in that section.

The taxonomy in Figure 6.2 does not establish separate categories for those
systems that produce their outputs offline, as code that may subsequently
be converted to score notation for later human performance or audio files
that can be played later; and those that generate music online, in real time
(Tatar & Pasquier, 2019, pp. 62–63), the latter sometimes in the context of
human-machine interactive live performance and/or improvisation (termed
“interactive reflexive musical systems” by Fober et al. (2019, p. 1)). Clearly
those of the latter type must demonstrate rapid intelligent interaction with
the ideas produced by their human colleagues, whereas the former are
under no such restriction, being limited only by the constraints of their
internal dynamics. Nevertheless, advances in computer processing power
may sometimes result in the human being the drag in such systems, not
the machine, even though the human often has the edge when it comes to
fecundity of invention. While synchronic (offline) and diachronic (online)
outputs represent very different ends, the underlying means are often very
similar, and so their treatment is integrated here. A bridge between these
two realms is afforded by systems that output their generated music in
real time not as sound but in the form of western notation, such as that
developed for Eigenfeldt’s work An unnatural selection (Eigenfeldt, 2014b),
which represents a “continuation of research into expressive performance
within electroacoustic music by incorporating instrumentalists rather than


synthetic output” (Eigenfeldt, 2014a, p. 276). Here, human performers play,
or rather they sight-read, music created by a computer that is both generated
– using a combination of Markov models (§6.5.1.3) and genetic/evolutionary
algorithms (§6.5.3.2) – and notated on a tablet-device display in real time
(2014a, p. 283).

6.5.1 Machine-Learning Systems


This category encompasses systems that are trained on some domain-specific
dataset in order that they can subsequently reproduce what they have learned
in their own outputs. They internalise the regularities of the target domain by
means of statistical learning – essentially a process of noticing patterns and
remembering them (see point 15 of the list on page 148). In music, this learn-
ing involves the extrapolation of the various recurrences that define musical
styles and that, because they are constrained by perception and cognition,
also foster comprehension. In the terminology of this book, such recurrences,
as culturally transmitted phenomena, are by definition memetic. Subsequent
to this analytical stage, machine-learning approaches in music generation
simulate the concatenation of abstracted musemes to form musemeplexes
and musemesätze in ways that align with those in the training repertoire.
Thus, such systems need to be capable of learning high- and intermediate- as
well as low-level pattern-regularities in order to generate convincing music.

6.5.1.1 Recombination Systems

A machine-learning approach is found in a number of systems designed by


David Cope. One of the first pioneers of computer-composed music, Cope
was initially motivated by a desire to use the computer as an augmentation
system to help generate ideas for his own compositional work and to act as
a stimulus to his creativity. Conducted under the rubric
of Experiments in Musical Intelligence (EMI; colloquially, “Emmy”) (Cope,
1996; Cope, 2015) – which is both the name of a research project and the
computer program that implements it – an important principle underpinning
his research is recombination. In such systems a lexicon of musical patterns
is learned, generally by the decomposition of one or more source works into
units whose identity is afforded by, among other factors, their recurrence;
and the resulting units are then (re)assorted in ways that produce music
that aims to be both syntactically correct and aesthetically satisfying. Cope
argues that

[m]uch of what happens in the universe results from recombination. The


recombination of atoms, for instance, produces new molecules. Com-
plex chemicals derive from the recombination of more rudimentary
particles. Humans evolve through genetic recombination and depend on
recombination for communication, since language itself results from the
recombining of words and phrases. Cultures thereby rely on recombina-
tion to establish and preserve their traditions. Music is no different. The
recombinations of pitches and durations represent the basic building
blocks of music. Recombination of larger groupings of pitches and dura-
tions . . . form[s] the basis for musical composition and help[s] establish
the essence of both personal and cultural musical styles. (Cope, 2001,
p. 1)

In Hofstadter’s summary – “an accurate account of the fundamentals of


the program’s processes”, in Cope’s view (2001, p. 83) – EMI operates by
means of two processes: “(1) chop up; (2) reassemble” (in Cope, 2001,
p. 44).265 Chopping up is achieved by searching for regularities – composers’
style-specific “signatures” plus more generic material (Cope, 1998; Cope,
2001, pp. 48–49; Cope, 2003) – in some input, a corpus of music whose style
EMI is intended to imitate in its own outputs. Chopping up – coindexation-
determined segmentation – is accomplished by parsing extant HGM for recur-
rences of the well-formed units that tend to result from gestalt-psychological
processes of pattern-formation. To reiterate Calvin’s phrase from §2.7.6, “that
which is copied may serve to define the pattern” (1998, p. 21). Essentially,
the units arrived at in this stage are musemes, although Cope does not use
this term, nor does he invoke memetics to describe them. Reassembly is argu-
ably more problematic and, again according to Hofstadter, consists of two
sub-processes: “([2.]1) Make the local flow-pattern of each voice similar to
that in source pieces; ([2.]2) Make the global positioning of fragments similar
to that in source pieces” (in Cope, 2001, p. 44; emphases in the original).
These two sub-processes are coded as “syntactic/formal meshing” and “se-
mantic/content meshing”, respectively, by Hofstadter (in Cope, 2001, p. 44).
265 Hofstadter’s summary is given in Chapter 2 of Cope (2001) (“Staring Emmy straight in the

eye – and doing my best not to flinch”), of which he is the author.
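Hofstadter’s two-step summary can be caricatured in a few lines of code. The sketch below is a deliberately minimal, hypothetical illustration – not Cope’s actual algorithm – in which “chopping up” yields fixed-size pitch patterns and “reassembly” chains patterns only via juxtapositions attested in the sources, a crude analogue of the voice-hooking discussed below; the corpus melodies, segment size and output length are all invented.

```python
import random

# A deliberately minimal, hypothetical sketch of "chop up and reassemble"
# recombination; the corpus melodies (MIDI pitch numbers), segment size
# and output length are all invented for illustration.
CORPUS = [
    [60, 62, 64, 65, 64, 62, 60],
    [64, 65, 67, 69, 67, 65, 64],
    [60, 64, 67, 65, 64, 62, 60],
]

def chop(melody, size=3):
    """'Chop up': segment a melody into overlapping fixed-size patterns."""
    return [tuple(melody[i:i + size]) for i in range(len(melody) - size + 1)]

def successors(corpus, size=3):
    """Record which pattern followed which in the sources, so that every
    join in the output replicates a juxtaposition found in the input
    (a crude analogue of voice-hooking)."""
    table = {}
    for melody in corpus:
        segments = chop(melody, size)
        for a, b in zip(segments, segments[1:]):
            table.setdefault(a, []).append(b)
    return table

def reassemble(table, extra_notes=8):
    """'Reassemble': start from a random pattern and extend it one note at
    a time via the successor table."""
    segment = random.choice(list(table))
    output = list(segment)
    for _ in range(extra_notes):
        options = table.get(segment)
        if not options:
            break
        segment = random.choice(options)
        output.append(segment[-1])  # consecutive patterns overlap by size-1
    return output

random.seed(4)
print(reassemble(successors(CORPUS)))
```

Because every adjacent note-triple in the output occurs somewhere in the corpus, the result is syntactically plausible at the local level; what this toy lacks is precisely the “global positioning” machinery described below.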


Chopping up and reassembly naturally result in the recombination of pat-


terns central to the operation of EMI. They also engender replication, because
the resulting patterns are subsequently redeployed in generated works. Not
only is the latter phenomenon central to Universal Darwinism, it is also key
to the notion of style in music, at least in the conception of Meyer: he argued
that “[s]tyle is a replication of patterning, whether in human behavior or
in the artifacts produced by human behavior, that results from a series of
choices made within some set of constraints” (1996, p. 3).

The first of the reassembly sub-processes ([2.]1; local flow-pattern) again de-
volves to two (sub-sub-)processes: “([2.1.]1) voice-hooking; [and] ([2.1.]2)
texture-matching” (in 2001, p. 45). Voice-hooking requires voice-leading
continuity in the output piece between a museme, x₁, and that, y₁, chosen
to follow it sequentially.266 Voice-hooking is broadly analogous to the mus-
eme parataxis underpinning the RHSGAP model (§3.5.2), which, in my
formulation, is partly contingent upon the strength of implication-realisation
pressures spanning museme segmentation boundaries. For instance, a Pro-
cess (Narmour, 1990, p. 89) initiated at the end of one museme, if continued
in the following museme, will tend to bind the two together, attenuating the
force of the segmentation boundary separating them and tilting the balance
between openness (connection) and closure (disconnection) typical of most
linear/diachronic art-forms towards the former. Texture-matching, perhaps
more simply, requires the adjustment of the (accompaniment) texture of
an input museme so that it conforms with that of its new context in the
output composition (2001, pp. 45–46). The second of the reassembly sub-
processes ([2.]2; global positioning) is arguably the more complex element.
In brief, patterns at a number of hierarchic levels – in my terms, musemes,
musemeplexes, and musemesätze, moving recursively upwards – are given a
functional designation by Cope drawn from a set represented by the acronym
“SPEAC”. These functions – Statement, Preparation, Extension, Antecedent
and Consequent – are intended, as Hofstadter conceives them, to represent
the “tension-resolution status” of the pattern (in Cope, 2001, p. 46). Thus, as
Hofstadter notes, “any local fragment of an input piece winds up with a set
of labels – its own label, that of the larger fragment inside which it sits, then
that of the next-larger fragment in which that one sits, and so on, and so on”
(in Cope, 2001, pp. 46–47).

266 Specifically, voice-hooking requires that the cross-pattern juxtaposition of pitches in EMI’s
output, x₁–y₁, should match that which obtained in the original input, x–y (2001, p. 45).
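The nested labelling Hofstadter describes can be sketched as a simple recursive traversal. The fragment tree and the particular SPEAC letters assigned below are invented placeholders, not EMI’s internal representation; the point is only that each fragment accumulates its own label plus those of every enclosing fragment.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of hierarchical SPEAC labelling: each fragment ends
# up with its own label plus the labels of all enclosing fragments. The
# tree shape and letter assignments below are invented placeholders.
@dataclass
class Fragment:
    label: str                          # one of S, P, E, A, C
    children: list = field(default_factory=list)

def label_chains(fragment, ancestors=()):
    """Yield each fragment's chain of labels, innermost label first."""
    chain = (fragment.label,) + ancestors
    yield fragment.label, chain
    for child in fragment.children:
        yield from label_chains(child, chain)

# A whole piece (Statement) containing an Antecedent phrase with two
# sub-fragments, plus a Consequent phrase.
piece = Fragment("S", [
    Fragment("A", [Fragment("P"), Fragment("E")]),
    Fragment("C"),
])

for label, chain in label_chains(piece):
    print(label, chain)
```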

Having outlined the nature of the algorithm underpinning EMI, one might
wonder how closely it relates to what is known of the processes driving
the generation of music in human brains and cultures, compositionally and
improvisationally – and thus how convincing are its outputs. In Cope’s view,
“composers compose recombinantly. I use this term deliberately, since I believe
Experiments in Musical Intelligence uses processes of recombinance similar
to those that human composers use to compose” (2001, p. 89; emphasis in
the original). In saying this, Cope is arguably asserting, albeit not in these
terms, that the hypothesis of §3.5.2 – that competition between members of
an allele-class of musemes to instantiate a structural locus/node of a mus-
emeplex, which then, as a member of an allele-class of musemeplexes, itself
competes to instantiate a component of a musemesatz – is the fundamental
mechanism underpinning human composition (and improvisation), this con-
ception therefore guiding the algorithmic basis of EMI. As a note of caution,
we might nevertheless recall Temperley’s comments on modelling cited on
page 477 – that “the mere fact that a [computer] model performs a process
successfully certainly does not prove that the process is being performed
cognitively in the same way” (2001, p. 6). Thus, even when EMI does perform
the process of composition successfully – as is suggested is the case below –
this does not necessarily mean that the RHSGAP model actually underpins
human perception and cognition during the music-generative process, des-
pite its elegance and parsimony.267 In its defence, the final paragraph of this
section argues that there are significant differences between EMI’s (partly
Darwinian) functionality and the (fully Darwinian) RHSGAP model that to
some extent ameliorate Temperley’s caution.

267 This is perhaps naively to assume that all composers work, and have worked historically,
in the same way. While they clearly have not – as evidenced by the enormous variety of past
and present musics – the argument of this book is that there are a number of common (natural)
cognitive processes underpinning the generation of music (on account of Homo sapiens’ shared
genetic heritage), despite their often highly varied cultural (nurtural) manifestations.

On the question of its producing convincing music, many people are evid-
ently “fooled” by the outputs of EMI. That is, they hear the music it produces
and they come to the conclusion that it is the work of a human composer.
This is attested by the success of EMI in what Cope terms “The Game”. This
is played by presenting to a sample of listeners a variety of music, some by


human composers, some by EMI, and asking them to differentiate between
the two categories (2001, p. 13). According to Cope, “[r]esults from previ-
ous tests with large groups of listeners, such as 5000 in one test in 1992 . . . ,
typically average between 40 and 60 percent correct responses” (2001, p. 21).
Here the lower the rate of correct responses, the more convincing are the
outputs of EMI: a score of 0% indicates that EMI is entirely convincing, in
that listeners cannot distinguish its music from that of human composers
– and vice versa for a score of 100%. This suggests, on the most positive
interpretation, that EMI is capable of passing what is effectively a Turing Test
– Ariza’s (2009) caveats in §6.1 notwithstanding – at least in the estimation
of a significant proportion of its listeners. Nevertheless, further research is
needed on the correlations between musical knowledge and training and the
ability to resist being fooled by EMI, or indeed any other music-generative
system – which are presumably directly proportional.

To what extent can EMI be regarded as Darwinian? Certainly recombination


is a feature of both biological and cultural evolution, in that, in the former,
sexual reproduction involves the assortative recombination of gene alleles
from both parents in the offspring, as occurs during the crossing-over phase
of meiosis; and, in the latter, the RHSGAP model hypothesises, in a form of
abstract crossing-over, the allelic substitution of structurally and functionally
analogous musemes and musemeplexes. But Darwinism requires more
than mere shuffling. Indeed, reassortment is itself only one aspect of the
variation component of the VRS algorithm. Cope’s model seemingly does not
encompass the mutation that is essential for the creation of potentially aptive
information-diversity – the low-level novelty-generation underpinning the
higher-level processes of pattern shuffling in reassortment. Moreover, while
EMI, as noted above, implements replication – by virtue of the recombination
of identified patterns – it is not entirely clear from Cope’s accounts how
selection operates, namely how EMI decides which patterns among a set of
candidate alleles to favour for a given structural locus. On this basis, and
while certainly partly Darwinian, Cope’s program cannot – to the extent
that its detailed operation is understood – be regarded as a fully Darwinian-
evolutionary system.

6.5.1.2 Neural Networks

This section will explore the generative power of Artificial Neural Networks
(ANN), focusing on a system developed to assimilate and replicate stylistic
regularities in folk music (for systems emulating rock and jazz, see Dadabots
(2021)). First developed in the mid-twentieth century, an ANN, sometimes
called a “connectionist” system (P. M. Todd & Loy, 1991), is a program that
attempts to simulate networks of neurons in the animal brain, using virtual/
functional equivalents of biological structures (Zou et al., 2009; Rosa et al.,
2020). Their basic function is to learn – usually understood as the capacity to
form stable categories from some set of input data – and thus they have been
a key architecture in the field of machine learning. ANNs have their basis in
the notion of Hebbian Learning (Hebb, 1949), the principle, discovered in
the 1940s, that understanding of the world, as mediated by sensory stimuli,
is represented by the brain in the form of connections between neurons of
differential strengths.
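The Hebbian principle admits a one-line formalisation, Δw = η · pre · post: a connection strengthens only when the neurons it links are active together. The learning rate and activity values in the sketch below are arbitrary illustrations.

```python
# A toy illustration of the Hebbian principle: the weight of a connection
# grows by eta * pre * post, i.e. only when the pre- and post-synaptic
# neurons are active together. Learning rate and activity values are
# invented for illustration.
def hebbian_update(weight, pre, post, eta=0.1):
    """Delta-w = eta * pre * post (no decay term, for simplicity)."""
    return weight + eta * pre * post

w = 0.0
activity = [(1, 1), (1, 1), (1, 0), (0, 1)]   # (pre, post) firing pairs
for pre, post in activity:
    w = hebbian_update(w, pre, post)
print(w)   # strengthened only by the two co-active trials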

As outlined in §3.8.3, the Hexagonal Cloning Theory (HCT) (Calvin, 1998)


formalises the neuronal connections underpinning Hebbian Learning in
terms of interdigitating triangular arrays in the cerebral cortex, these organ-
ised into hexagons with a characteristic spatiotemporal firing pattern. The
cloning of a particular configuration across the surface of cortex represents its
competitive (selective) success over rival candidates for alignment between
incoming data and patterns stored by basins of attraction in the connectiv-
ity. The HCT offers a robust model of brain function, able to account for
pattern learning and recall via operation of the VRS algorithm in a neur-
onal Darwin machine. While ANNs only loosely approximate the two- and
three-dimensional structures proposed by the HCT (but see below), they
nevertheless replicate its operating mechanism: they detect and encode uni-
parametric components of multiparametric input data; they learn statistical
regularities (multiparametric association frequencies) in such data; and they
separate learned patterns from surrounding “noise”. In short, they are also
a Darwin machine.

An ANN is a “sandwich” consisting of several layers of virtual neurons,


usually represented two-dimensionally as columns of neurons arranged
from left to right or bottom to top. The input layer (far left/bottom) and the
output layer (far right/top) are separated by at least one intermediate “hidden”
layer. Above (to the right of) the input level, each neuron receives several
inputs via connections from neurons in the layer below (to the left of) them.
Using a propagation function, the value of each input is multiplied by some
weight before being summed to form a combined input. This input may be
further adjusted by an activation function, which serves to restrict the value of
the summed input to conform to some scale. Neurons may be retrospectively
re-weighted by backpropagation in the light of an assessment of the fit of the
output category to the input data (Arnx, 2019). As a categorisation device,
an ANN seeks certain statistical regularities in the input and outputs its
“understanding” of the configuration of these recurrent patterns. In supervised
learning tasks, a desired output category, such as the configuration of a
specific musical pattern, is pre-specified and the occurrences of the sought
pattern in the input data are given in the output. In this sense, supervised
learning is an example of a classification problem, to recall the distinction
made by Große Ruse et al. (2016) in §5.4.1.2. In unsupervised learning tasks,
the network is allowed to alight upon regularities it detects in the input,
forming its own categories according to the strength (encoded as network
weights) of features in the input data. In this sense, unsupervised learning
is an example of a clustering problem.
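The propagation and activation functions just described reduce to a few lines of code. The sketch below shows a single forward pass through one hidden layer; the layer sizes, weights and input vector are all invented, and training by backpropagation is omitted.

```python
import math
import random

# A minimal forward pass matching the description above: each neuron sums
# weighted inputs (the propagation function) and squashes the sum with an
# activation function. Layer sizes, weights and the input vector are all
# invented for illustration; training (backpropagation) is omitted.
random.seed(0)

def sigmoid(x):
    """Activation function restricting each summed input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: weighted sum per neuron, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# 3 input neurons -> 4 hidden neurons -> 2 output neurons.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b_hidden = [0.0] * 4
w_output = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b_output = [0.0] * 2

x = [0.2, 0.7, 0.1]                      # e.g. a normalised feature vector
hidden = layer(x, w_hidden, b_hidden)    # the single "hidden" layer
output = layer(hidden, w_output, b_output)
print(output)
```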

The power of ANNs to categorise has been explored in the music-analytic


tradition, perhaps most notably in four seminal articles by Gjerdingen (1989a,
1989b, 1990, 1992) that explored the use of adaptive-resonance-theory (ART)
networks (an architecture developed by Grossberg (1987)) in (unsupervised)
music-clustering problems. In brief, these studies show that a multi-layered
network can abstract individual pitch elements (level “F₁”; level eight in
Table 1.4) from a set of pieces; it can detect the stable associations of pitches
in this set that constitute musemes (“F₂”; level seven); it can recognise the
replicated sequences of musemes that generate musemeplexes (“F₃”; level
six); and it can develop high-level representations of similarity such as are
embodied by a musemesatz (“F₄”; level five) (Gjerdingen, 1990, p. 360, Fig.
8; see also Jan, 2011a, sec. 5, Fig. 14). These levels are also marked on Figure
3.17, to indicate how they relate to the operation of the HCT.

ANNs exist in a variety of different architectures appropriate to the task at


hand. A type that has been developed extensively in recent years is the Deep
Neural Network (DNN), which has more than one hidden layer and which is
particularly suited to complex unsupervised learning tasks. These tasks are
often subsumed under the rubric of deep learning, which concerns the applic-
ation of machine-learning algorithms to data-rich domains (Schmidhuber,
2015; I. Goodfellow et al., 2016; Briot, 2021). Subtypes of the DNN include
Convolutional Neural Networks (CNN), which are well suited to applications
involving static data, such as images (Cireşan et al., 2011), as seen in Google’s
DeepDream image-manipulation software (Mordvintsev et al., 2015); and Re-
current Neural Networks (RNN), which are effective in applications involving
dynamic data, such as the sequentially/temporally organised information to
which music can be converted (Sturm et al., 2016, p. 3).268 An RNN

is any neural network possessing a directed connection from the output


of at least one unit [neuron] into the input of another unit located at a
shallower layer than itself (closer to the input). A deep RNN is a stack
of several RNN layers, where each hidden layer generates an output
sequence that is then used as a sequential input for the deeper layer.
With deeper architectures, one expects each layer of the network to be
able to learn higher level representations of the input data and its short-
and long-term relationships. The recurrence (feedback) present in an
RNN allows it to take into account its past inputs together with new
inputs. Essentially, an RNN predicts a sequence of symbols given an
input sequence. (Sturm et al., 2016, pp. 2–3)
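The recurrence Sturm et al. describe can be reduced to a single toy update, hₜ = tanh(Wxh·xₜ + Whh·hₜ₋₁): the feedback term Whh·hₜ₋₁ is what lets past inputs colour the response to each new input. The vocabulary, dimensions and weights below are invented for illustration and bear no relation to the trained folk-rnn models.

```python
import math
import random

# Toy sketch of RNN recurrence: the new hidden state mixes the current
# input with the previous hidden state, so past inputs influence each
# response. Vocabulary, sizes and weights are invented for illustration.
random.seed(0)

VOCAB = list("ABCDEFG")   # a toy pitch-letter vocabulary
H = 4                     # hidden-state size

Wxh = [[random.uniform(-1, 1) for _ in VOCAB] for _ in range(H)]
Whh = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(H)]

def one_hot(ch):
    return [1.0 if v == ch else 0.0 for v in VOCAB]

def step(h, ch):
    """h_t = tanh(Wxh.x_t + Whh.h_{t-1}): the feedback term means past
    inputs are taken into account together with each new input."""
    x = one_hot(ch)
    return [math.tanh(sum(Wxh[i][j] * x[j] for j in range(len(VOCAB)))
                      + sum(Whh[i][k] * h[k] for k in range(H)))
            for i in range(H)]

h = [0.0] * H
for ch in "ABCA":          # feed a short symbol sequence
    h = step(h, ch)
print(h)                   # the final hidden state encodes the whole history
```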

While initially developed as learning devices, ANNs can redeploy what they
have learned to generate, in the case of music, “new” pieces by reassembling
certain of the abstracted attributes of some training sample. To this end,
Sturm and Ben-Tal (2017) and Sturm et al. (2015, 2016) developed a related
pair of RNN systems called char-rnn and folk-rnn, training them on a corpus
of some 23,635 melodies of Irish folk music contributed by users of the
online folk-music community The Session (Various, 2021) and generating
some 30,000 output tunes (Sturm & Ben-Tal, 2017, p. 7). Specifically, the
training sample consisted of transcriptions of that repertoire (Korshunova,
2016) into the text-based ABC symbolic music notation language (Walshaw,
2019) (and is therefore subject to the “representation problem” raised in
§6.4). In terms of the difference between the two systems, char-rnn “operates
over a vocabulary of single characters, and is trained on a continuous text
file”; whereas folk-rnn “operates over a vocabulary of transcription tokens,
and is trained on single complete transcriptions” (Sturm et al., 2016, p. 4).
Essentially, char-rnn builds up its understanding of the repertoire from the
atomic level (single ABC characters); whereas folk-rnn builds it from the
molecular level (groups of ABC characters). The latter includes the melodic
patterns that, on account of their recurrence in the training set (and that are
detected by folk-rnn), constitute musemes.

268 Related deep-learning architectures include the Generative Adversarial Network (GAN) (I. J.
Goodfellow et al., 2014) and the Variational AutoEncoder (VAE) (Guo et al., 2020).
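The atomic/molecular distinction is easy to see with a scrap of ABC. The two-bar fragment and the tokeniser below are simplified inventions (folk-rnn’s real token vocabulary is richer, and the fragment is not a tune from The Session), but they show how the same text yields single characters for a char-level model and musically meaningful tokens for a token-level one.

```python
import re

# An invented two-bar fragment in ABC notation (not from The Session).
abc = "M:6/8\nK:Ddor\nDED F2A|d2d AFD|"

# char-rnn's "atomic" view: a stream of single characters.
chars = list(abc)

# A simplified stand-in for folk-rnn's "molecular" view: header fields,
# notes with their durations, and bar lines as whole tokens.
tokens = re.findall(r"[A-Z]:[^\n]+|[A-Ga-g][,']*\d*|\||\n", abc)

print(chars[:7])
print(tokens)
```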

For various complex reasons, RNNs sometimes struggle to alight upon reg-
ularities in the input, an issue that can be solved by using a long short-term
memory (LSTM) architecture. This modifies the activation function in ways
that help to foster convergence (Sturm et al., 2016, p. 3). Both char-rnn and
folk-rnn use three hidden layers with 512 LSTM “cells” or
“blocks” each (2016, p. 6). Having said on page 493 that ANNs only loosely
approximate the structures proposed by the HCT, LSTM cells nevertheless
appear to be the closest functional equivalent to the hexagonally coordinated
triangular arrays of the HCT. An LSTM architecture “increases the number
of parameters to be estimated in training, but controls the flow of information
in and out of each cell to greatly help with convergence . . . ” (2016, p. 3). In
terms of Calvin’s model, the parametric increase relates to the association
of multiple feature-encoding triangular arrays within the constraints of a
hexagonal plaque; and convergence pertains to the formation of the basins
of attraction within the connectivity that stably encode regularities in the
input.

Figure 6.3a shows an example of one of folk-rnn’s training inputs, the jig
“Thank God we’re surrounded by water” (melody no. 2611, second version,
in The Session’s database);269 Figure 6.3b shows one of the system’s generated
outputs (melody no. 2857, as transcribed in Sturm, 2017b, p. 2871); and
Figure 6.3c shows an improved version of the melody of Figure 6.3b, with

suggested harmonisation, by Sturm (2017a), transposed to the “Ddor” of
the original version in Sturm (2017b, p. 2871).270 Of the system-generated
melody (Figure 6.3b), Sturm says “I can’t remember how I came across this
tune, which appears in The folk-rnn Session Book Volume 1 . . . [(Sturm, 2017b),
a collection of the system’s generated compositions], but I do remember
falling in love with it immediately” (2017a).

269 Sturm and Ben-Tal (2017) and Sturm et al. (2016) assembled their training sample in 2015,
so any melody listed on The Session at that time would have been included. The Session website
indicates when tunes were added to its database, so those melodies included in/excluded from
the training sample can be identified.

One can understand Sturm’s affection for Figure 6.3b: it certainly has a
pleasing lilt to it, and the dorian implications unfold effectively. There is
also a degree of musemic “logic” here, in that the arpeggio pattern b¹–g¹–
e¹–e¹ (b. 4¹–⁵) is answered by its transposition a¹–f¹–d¹–d¹ (b. 16¹–⁵, second
time-bar).271 There are some infelicities, however, the most grating among
which are the circumvention of the strong D-minor implication in b. 10
by the following abrupt C major in b. 11 (corrected in Figure 6.3c as per
the description in note 270 on page 497), and (paradoxically) the D-minor-
implying b♭ of b. 7⁴–⁶, which does not integrate smoothly with the melody’s
prevailing dorian mode.

While Figure 6.3a is but a small fraction of the input corpus assimilated, and
while Figure 6.3b is an even smaller fraction of the system’s output, there
are nevertheless interesting similarities between the two, which suggest that
certain attributes of Figure 6.3a were shared by other input tunes, were
therefore abstracted by folk-rnn, and were redeployed in Figure 6.3b (and
presumably in other output melodies), just as happens in human-only neural
networks. At the highest structural-hierarchic level, and by means of a
comparative statistical analysis of the training sample and all the generated
outputs, Sturm and Ben-Tal (2017, pp. 7–8, Tab. 2, Tab. 3) determined
that most of folk-rnn’s output melodies follow “the conventional structure
AABB, with each section being eight bars long, with or without pickup
bars, or explicit repetition tokens at the beginning of sections”. This “tune
(A)–turn (B)” form, typical in Irish traditional music (Sturm et al., 2016,
p. 9), is evident in Figure 6.3b, which indeed follows the AABB structure
of Figure 6.3a. Moreover, and as with Figure 6.3a, the generated melody
270 I have made a few further modifications to Sturm’s version in Figure 6.3c, correcting some

odd harmonisations and, most significantly, changing the c2 of b. 111–2 to d2 .


271 More broadly, the system seems to have assimilated this tradition’s stylistic convention of a

repeated quaver–quaver or quaver–crotchet pattern, with the second duration approached by falling motion,
as is evidenced by this museme and by musemes in other generated outputs.

(a) Example of folk-rnn’s Training Input: Melody no. 2611, “Thank God we’re sur-
rounded by water”, Second Version.

(b) Example of folk-rnn’s Generated Output: Melody no. 2857, Original.

(c) Example of folk-rnn’s Generated Output: Melody no. 2857, Version 2, as Modified
by Sturm.

Figure 6.3: Examples of folk-rnn’s Training Input and Generated Output.



also incorporates an antecedent-consequent periodicity in the tune section


(bb. 0–4; bb. 5–8), again indicating the system’s “understanding” of another
presumably corpus-wide aspect of large-scale organisation (unlike Figure
6.3b, however, Figure 6.3a also has an antecedent-consequent periodicity in
the turn section).

Beyond these musemesatz- and musemeplex-level similarities, museme-


level alignments between Figure 6.3a and Figure 6.3b – as manifestations
of regularities in the training corpus assimilated in the generated corpus
– include the d1 –c1 –d1 museme (Figure 6.3a, b. 14–6 ; Figure 6.3b, b. 14–6 ),
and the Ionian-mode-defining segment g1 –a1 –b1 –c2 (Figure 6.3a, bb. 22 –31 ;
Figure 6.3b, b. 22–6 ). The latter pattern in Figure 6.3b is an example of a pitch-
sequence recurrence that does not conform to gestalt principles (even though
it arguably does in Figure 6.3a). A RNN might alight upon such potentially
“invisible” sequences – those that might not be able to function as a candidate
museme because they do not constitute a perceptually-cognitively salient unit
for humans – on the basis of brute recurrence alone; and it might redeploy
them in a similarly invisible manner (as is the case in Figure 6.3b). Beyond
this example, the fact that a particular pitch sequence occurs in a training
sample in sufficient numbers for it to be learned by a system suggests that
it must nevertheless satisfy gestalt chunking criteria to a sufficient extent to
constitute a museme and thus to be replicated by humans in the tradition
from which that system learns. The issue is more complex than this, however,
because folk-rnn appears to build its knowledge, in part, from musemic half-
bar and whole-bar segments, so certain “invisible” – half-bar- and barline-
straddling – patterns, such as the g1 –a1 –b1 –c2 of Figure 6.3b, might arise
indirectly as artefacts of the repeated parataxis of certain “visible” (musemic
half-bar- and whole-bar-aligned) segments.
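The "brute recurrence" idea can be sketched briefly: an exhaustive count of every fixed-length pitch subsequence in a token stream, irrespective of gestalt (perceptual-cognitive) boundaries. The function name and the toy melody below are illustrative, not drawn from folk-rnn:

```python
from collections import Counter

def recurring_ngrams(tokens, n, min_count=2):
    """Count every length-n subsequence -- 'brute recurrence', with no
    regard for whether a subsequence forms a perceptually salient unit."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy pitch stream; the (potentially barline-straddling) g-a-b-c recurs twice.
melody = ["d", "g", "a", "b", "c", "e", "g", "a", "b", "c", "d"]
print(recurring_ngrams(melody, 4))  # {('g', 'a', 'b', 'c'): 2}
```

A statistical learner operating this way would surface the g–a–b–c recurrence whether or not it aligns with musemic half-bar or whole-bar segments.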

6.5.1.3 Markov Models

A Markov Chain (MC) represents a series of choices where the likelihood of


a particular choice being made depends only on the outcome of the previous
choice. More formally, “Markov sequences represent stochastic processes
having the ‘Markov property’ . . . . This property says that the future state
of the sequence depends only on the last state . . . ” (Pachet & Roy, 2011,
p. 150). A Markov system uses statistical learning to internalise the rules

        C+      D-      F+      G+      A-
C+      1–5     6–20    21–50   51–80   81–100
D-      1–18    19–20   21–60   61–85   86–100
F+      1–30    31–48   49–50   51–81   82–100
G+      1–20    21–43   44–73   74–75   76–100
A-      1–30    31–48   49–65   66–98   99–100

Table 6.1: Transition Probability Table for Rock-Style Harmonic Progressions.

underpinning such progression-related probabilities in some domain. Thus,


while Markov models might also be considered under the rubric of know-
ledge/rule-based systems (§6.5.2), they are included here under the present
category of machine-learning systems because they often embody know-
ledge acquired by means of the analysis of some training corpus. In music,
Markovian principles have been utilised by systems that generate sequences
of events such as melodic pitches, rhythm values, chord progressions, etc. –
discrete entities in these domains representing one of the “states” referred to
above – each of which follows probabilistically from its antecedent.

Learned regularities in a domain are commonly represented in Markov sys-


tems by means of a table of transition probabilities that expresses the likelihood
that a state Sn will be followed by a state Sn+1 (Pachet & Roy, 2011, p. 149).
Table 6.1 shows one such table – assembled from a number of probability
vectors, where entries, all of which are positive, sum to 1 – suitable for gener-
ating chord progressions in the style of rock music. Here, each chord-type –
corresponding to Roman numerals I, II, IV, V and VI in the key of C major –
represents a state. Starting with the chord of C major in the left-hand column,
a randomly generated number determines the second chord in the progres-
sion, this being the one in the top row that corresponds with the selected
number. The size of the chosen chord’s encompassing number-range repres-
ents the progression’s transition probability. Having generated the second
chord, the process is repeated, starting from whichever row the second chord
occurs in on the far left-hand column, and so on.272

272 I am grateful to Valerio Velardo for these transition probabilities.
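As a rough sketch (a re-expression, not Velardo's original formulation), Table 6.1 can be encoded as a dictionary of probabilities, each integer range becoming its width divided by 100, and walked as a first-order chain:

```python
import random

# Transition table from Table 6.1, re-expressed as probabilities
# (e.g., the C+ -> G+ range 51-80 spans 30 values, hence 0.30).
TRANSITIONS = {
    "C+": {"C+": 0.05, "D-": 0.15, "F+": 0.30, "G+": 0.30, "A-": 0.20},
    "D-": {"C+": 0.18, "D-": 0.02, "F+": 0.40, "G+": 0.25, "A-": 0.15},
    "F+": {"C+": 0.30, "D-": 0.18, "F+": 0.02, "G+": 0.31, "A-": 0.19},
    "G+": {"C+": 0.20, "D-": 0.23, "F+": 0.30, "G+": 0.02, "A-": 0.25},
    "A-": {"C+": 0.30, "D-": 0.18, "F+": 0.17, "G+": 0.33, "A-": 0.02},
}

def generate_progression(start="C+", length=8, rng=random):
    """Walk the first-order chain: each chord depends only on its predecessor."""
    chords = [start]
    for _ in range(length - 1):
        nxt = TRANSITIONS[chords[-1]]
        chords.append(rng.choices(list(nxt), weights=nxt.values())[0])
    return chords

print(generate_progression())
```

Sampling a weighted choice replaces the random-number-against-range lookup described in the text, but the two procedures are equivalent.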



As the top row of Table 6.1 indicates, there is a 5% probability (the five values 1–5 out of 100)
of the initial C major chord's being followed by another C major chord, but a
30% probability (the thirty values 51–80 out of 100) that the second chord will be a G major
chord. If G were indeed selected, then the next iteration of the algorithm
would start on the fourth (G+) row and generate a third chord by means
of a second random number, etc. Some progressions have a probability
of zero because the chords of E minor (III) and B diminished (VII) are
not even admitted as harmonic possibilities in Table 6.1, so they cannot
occur in progressions. Thus, Table 6.1 is implicitly adopting a theoretical
position on chord and chord-progression frequency that hypothesises the
non-occurrence of these two chords in the style emulated by the table. As the
corpus analysis conducted by De Clercq and Temperley (2011, p. 60, Tab. 2)
indicates, this reading is (intentionally) erroneous: the actual predominance
of III and VII in the corpus they studied – one hundred rock songs, made up
of the top twenty per decade from 1950–2000 – is 0.019 (i.e., 1.9% of all
chords in the corpus are of this chord-type) and 0.004 (0.4%), respectively. Corpus
analysis also helps to arrive at a more nuanced transition probability table:
according to De Clercq and Temperley (2011, p. 61, Tab. 3), the transition
I–III occurs on forty-four occasions in their corpus, so its probability can be
calculated as a proportion of all the transitions within the corpus.
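The corpus-derived approach can be sketched as follows: count each observed transition and normalise the counts per antecedent chord. The toy corpus below is invented for illustration, not taken from De Clercq and Temperley:

```python
from collections import Counter, defaultdict

def transition_probabilities(progressions):
    """Estimate P(next | current) from observed chord sequences,
    as in corpus-based statistical learning."""
    counts = defaultdict(Counter)
    for prog in progressions:
        for cur, nxt in zip(prog, prog[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
            for cur, ctr in counts.items()}

corpus = [["I", "IV", "V", "I"], ["I", "V", "I"], ["I", "IV", "I"]]
probs = transition_probabilities(corpus)
print(probs["I"])  # I -> IV twice, I -> V once, i.e. {'IV': 2/3, 'V': 1/3}
```

Dividing each count by the total number of transitions from that antecedent is exactly the "proportion of all the transitions" calculation described above, restricted to each row of the table.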

Such “first-order” MCs generate locally coherent musical sequences, on


account of their embodying the low-level statistical regularities of the under-
lying style, but they are notoriously prone to producing a meandering output,
one lacking any medium- or large-scale sense of direction. “Higher-order”
MCs offer a partial solution to this problem, because they group “atomic” ele-
ments into larger, “molecular”, chunks to form a state. If a given unit – such
as the single chords in Table 6.1 – formed a state in a first-order MC, then two
such units are considered to constitute a state in a second-order MC, and three
in a third-order MC, etc. (Shamshad et al., 2005, pp. 694–695). In this sense,
a state in a first-order MC is equivalent to an entity at level eight in Table 1.4,
whereas in a third-order MC it is equivalent to an entity at level seven. Thus,
higher-order MCs afford an opportunity to internalise regularities in terms
of musical patterns made up of note sequences – musemes, as opposed to
solitary “verticals” – that are, en bloc, the objects of the stochastic process.
Nevertheless, higher-order MCs are also prone to the non-developmental

circularity of first-order MCs, but at a higher structural-hierarchic level: a


pattern might recur, but without the developmental modifications to which
a human composer might subject it. As a partial solution, “variable-length
Markov Models” (VMM) are able “to capture statistical correlations of dif-
ferent length scales in a single probabilistic model” (Pachet & Roy, 2011,
p. 151), affording the opportunity to generate patterns learned from inputs
containing “overlapping” and “nested” structures (Jan, 2011a, sec. 4.1.2,
para. 57).
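A minimal sketch of a higher-order chain, assuming a second-order model in which the state is a pair of consecutive chords (the function names and the toy tune are illustrative):

```python
import random
from collections import Counter, defaultdict

def learn(sequence, order=2):
    """Second-order (by default) chain: the state is an n-tuple of
    consecutive elements; the model maps each state to its continuations."""
    model = defaultdict(Counter)
    for i in range(len(sequence) - order):
        state = tuple(sequence[i:i + order])
        model[state][sequence[i + order]] += 1
    return model

def generate(model, seed_state, length=8, rng=random):
    out = list(seed_state)
    for _ in range(length - len(out)):
        options = model.get(tuple(out[-len(seed_state):]))
        if not options:  # dead end: this state was never followed by anything
            break
        out.append(rng.choices(list(options), weights=options.values())[0])
    return out

tune = ["I", "IV", "V", "I", "IV", "V", "I", "vi", "IV", "V", "I"]
model = learn(tune, order=2)
print(generate(model, ("I", "IV")))
```

Because the molecular state here spans two chords, locally idiomatic pairs are reproduced en bloc, at the cost of the circularity discussed above: patterns recur verbatim rather than being developed.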

Further refinements are afforded by “hidden Markov Models” (HMM),


where “hidden (not observable) states are added, as a way to better rep-
resent the context. Observable states can be considered as specific control
properties” (Pachet & Roy, 2011, p. 152). To reconfigure Table 6.1 as a HMM
would require n sets of some or all of the chord-types in the top row to be
“stored” in “containers”, one for each set, the latter encapsulating the hidden
states. The algorithm would first select a container and then select a chord
from within it. The next chord would also be selected from a container, but
all these choices would be constrained by transition probabilities: container
x might, for example, be more likely to be chosen than container y; and
within the selected container, chord p might be more likely to be chosen than
chord q. The sequence of generated/output chords would not be hidden (it
is “observable”), but the sequence of containers that gave rise to it would (it
is “not observable”), because the same chord might be stored in two or more
containers. In music, an HMM might be used to restrict the set of chords
(the contents of a container) available to be chosen at any specific point in
a sequence, in order to align with some model of chord progression. This
model might either be one arrived at via statistical learning – of which the
corpus analysis of De Clercq and Temperley (2011) discussed above is a
subset – or one based on some (presumably empirically grounded) theory –
such as that of Piston (1962, pp. 17–18), which represents “generalizations
. . . based on observation of usage . . . ”.
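The container idea can be sketched as a toy HMM: hidden "containers" with their own transition probabilities, each emitting chords from the set it stores. The container names, chord sets and probabilities below are invented for illustration:

```python
import random

# Hidden "containers" with transition probabilities between them, and
# emission probabilities for the chords each container stores.
TRANS = {"opening": {"opening": 0.3, "cadential": 0.7},
         "cadential": {"opening": 0.6, "cadential": 0.4}}
EMIT = {"opening": {"I": 0.5, "IV": 0.3, "vi": 0.2},
        "cadential": {"V": 0.6, "I": 0.4}}  # "I" is stored in both containers

def sample_hmm(start="opening", length=6, rng=random):
    """Return the observable chord sequence; the container path stays hidden."""
    container, chords = start, []
    for _ in range(length):
        emit = EMIT[container]
        chords.append(rng.choices(list(emit), weights=emit.values())[0])
        trans = TRANS[container]
        container = rng.choices(list(trans), weights=trans.values())[0]
    return chords

print(sample_hmm())
```

Note that because "I" appears in both containers, an observer of the output alone cannot recover which container produced a given "I": this is precisely the sense in which the container sequence is "not observable".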

François Pachet’s Continuator system (Pachet, 2003) learns musical style from
input music using a Markov model in order to continue phrases played by a
human performer. In interaction with the system,

a user typically plays a musical phrase using a MIDI instrument (e.g., a


keyboard). This phrase is then converted into a sequence of symbols,
representing a given dimension or viewpoint [parameter; (Conklin &
Witten, 1995; Conklin, 2013)] of music, such as its pitch, duration, or
velocity. The sequence is then ‘learnt’ by the system by computing a
model of the transition probabilities between successive symbols. When
the phrase is finished (typically after a certain temporal threshold has
passed), the system generates a new phrase using the Markov model
built so far. The user can then play another phrase, or interrupt the
phrase being played, depending on the chosen interaction mode. Each
time a new phrase is played, the Markov model is updated. (Pachet &
Roy, 2011, p. 149; emphasis in the original)

Thus, the Continuator combines analysis with synthesis (§6.1), parsing an


input style by statistical learning into a set of transition probabilities, and
then generating outputs according to those probabilities.273 Moreover, the
Continuator incorporates aspects of constraint-satisfaction systems (§6.5.2.2),
in that it permits the user to specify certain conditions the output must satisfy,
such as – in the case of a blues chord progression – stipulating certain starting,
finishing and intermediate chord-types or – in distortions of normative blues
style – the appearance of certain non-standard chords (Pachet & Roy, 2011,
pp. 155–156). To achieve this, it uses an Elementary Markov Constraints (EMC)
model, which “explore[s] the set of sequences that satisfy exactly the control
constraints, and . . . define[s] the Markovian property as a cost function to
optimize” (2011, pp. 158–159).
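The listen-then-answer loop quoted above can be sketched as follows. This is a toy analogue, not Pachet's implementation: each user phrase updates a first-order pitch model, which then generates a continuation (the MIDI pitch numbers are illustrative):

```python
import random
from collections import Counter, defaultdict

class Continuation:
    """Toy Continuator-like loop: each user phrase updates a first-order
    pitch model, which is then sampled to answer the phrase."""
    def __init__(self):
        self.model = defaultdict(Counter)

    def listen(self, phrase):
        """Update transition counts from a user phrase (list of MIDI pitches)."""
        for a, b in zip(phrase, phrase[1:]):
            self.model[a][b] += 1

    def answer(self, last_pitch, length=6, rng=random):
        """Continue from the user's final pitch using the learned model."""
        out = [last_pitch]
        for _ in range(length):
            nxt = self.model.get(out[-1])
            if not nxt:
                break
            out.append(rng.choices(list(nxt), weights=nxt.values())[0])
        return out[1:]  # the continuation, minus the user's final note

c = Continuation()
c.listen([62, 63, 59, 62, 60, 59])  # a user phrase, as MIDI pitch numbers
print(c.answer(59))
```

The real system models multiple viewpoints (pitch, duration, velocity) and supports constraint control; this sketch captures only the incremental-learning/generation cycle.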

As with many leading music-generative systems, there are many online


videos demonstrating their capabilities. In the case of the Continuator, Sony
CSL (2012) shows the composer György Kurtág improvising with the program, the latter responding – to my ears – in a congruent and engaging
manner to Kurtág’s sophisticated inputs. In a similar style to this interaction,
Figure 6.4 (Fober et al., 2019, p. 1, Fig. 1) shows a transcription of part of
another dialogue between a human (upper stave) and the Continuator (lower
stave).

273 The interactive design of the Continuator, and indeed other interactive augmentation systems

(§6.2) such as GenJam (§6.5.3.2), makes it well suited to use in educational contexts (Ferrari &
Addessi, 2014).

Figure 6.4: Sample Output of the Continuator.

While such freely atonal music as the user’s input phrase – and those extem-
porised by Kurtág – might be thought easy to simulate using any number
of approaches, including the quasi-random (see the discussion on page 525
following Figure 6.10), the Continuator has clearly abstracted the gross con-
tour and aspects of the rhythmic structure of the input as a basis for its
answering output phrase. It has first taken the user's opening d1–e♭1 (b. 1)
and mutated it to b♭1–b♮1 (b. 4); it has next matched the user's following
wide-interval zig-zag e♭1–b♭1–c1 (b. 1) with b♮1–g♯2–c1 (bb. 4–5); and it has
then reworked the user's rising fifth–falling sixth pattern e1–b1–d1 (b. 2)
as c♯1–g♯1–b♭ (b. 5). Put another way, the Continuator has analysed a set
of musemes and then explored the multidimensional hypervolume (§3.6.5,
§5.5.2) encompassing them in order to locate other patterns occupying the
regions of that hypervolume that define the musemes’ allele-classes. It has
then concatenated these museme alleles in a manner that creates a mus-
emeplex. While the concept of the museme allele, and indeed those of the
musemeplex and the musemesatz, have been defined primarily in terms
of replicated pitch frameworks (§3.5.2), there is no reason why the looser,
contour-based, similarities evident here cannot also be understood in terms
of these three categories. Moreover, given the Continuator’s evident ability
to abstract transition probabilities from the input phrase, it is reasonable to
believe that the arguably greater challenge of a tonal input phrase would
also be successfully learned and replicated.

Note, finally, that neither a first-order nor (normally) a second-order MC in
itself embodies a Darwinian system. While the motion from a “prefix”
state P to a “continuation” state Y (Pachet & Roy, 2011, p. 151) involves

an element of fitness – a statistically more probable continuation is selected


over a statistically less probable one, and thus is in this sense fitter – there
is no complete VRS algorithm at work. This is because there can be no true
variation nor any meaningful replication of such an information-poor unit as
a state consisting of a monad or a dyad (assuming those monads or dyads
themselves consist of a single entity, such as a note or a chord). This situation
potentially changes, however, with third-order MCs because the level of
information richness is sufficient to sustain the VRS algorithm, in the sense
that discrete patterns in an input may be captured and then subjected to
replication and selection (see also note 204 on page 356). Nevertheless, there
is still little scope for variation here, unless a three-element state were to
be replaced by another state that, on account of similarity, constituted an
allele of it. Even so, this would only represent the substitution of museme
x2 for museme x1 , not the mutation of x1 generating x2 . Moreover, and
as its name implies, the mono-linear strand of an MC is at odds with the
poly-linear nexus of intersecting strands characteristic of true Darwinism in
biological and cultural evolution. Yet these limitations can be transcended
when a Markov system is integrated with a human collaborator – as is the
case with the Continuator – because this provides the missing ingredient
of true variation of the patterns the system outputs. Thus nourished by
human-driven variation, the system may then go on to replicate and select
those human-generated musemes by encoding them via statistical learning
and incorporating them in its outputs.

6.5.2 Knowledge/Rule-Based Systems


In contrast to the methodology of machine-learning systems (§6.5.1), which
self-/soft-encode their knowledge as a result of the statistical learning
resulting from exposure to the target domain, knowledge/rule-based systems
– sometimes called “expert” systems (Ebcioğlu, 1988) – are hard-encoded
by the programmer. They are explicitly taught what they know, and there-
fore they reflect the programmer’s conceptions of the domain in question,
generating music in the image of that conception. Machine-learning and
knowledge/rule-based approaches are not mutually exclusive: a system can
be given a framework of knowledge and rules – the basic epistemological
building blocks of its domain – that it can then use to guide its statistical

learning; conversely, the analytical and/or synthetic outcomes of a process


of statistical learning can be filtered through a framework of knowledge
and rules that constrains the learned abstractions in terms of some desired
epistemological structure (§6.5.4).

6.5.2.1 Grammar-Based Systems

As has been argued throughout this book, statistical regularities in musical


styles arise from the interaction between nature and nurture. Nature provides
the perceptual-cognitive constraints that define which musical patterns can
and cannot pass through its filter; whereas nurture transmits between mem-
bers of a cultural community those viable patterns (memes) that can traverse
the perceptual-cognitive filter, the most salient or useful of these increas-
ing in predominance as per the mechanism of the VRS algorithm. These
bi-causal regularities can be described or prescribed by a grammar (§4.6),
that assigns functions to discrete entities – like words (noun, verb, determ-
iner) or chord-types (tonic, subdominant, dominant) – and that formalises
sequential-combinatorial rules, descriptive or prescriptive, for their concat-
enation. The use of the word grammar recognises the structural and func-
tional commonalities between language and music that have been addressed
throughout this book and that are crystallised in generative-transformational
and other grammar-based accounts of musical structure (Quick & Hudak,
2013, p. 59) (§4.4.1.3). Music-generative systems can encode grammatical
formalisms and use them to produce music that, on account of its conformity
to a grammar, is in alignment with the style described/prescribed by that
grammar; and that is perceptually-cognitively accessible to those who, by
virtue of nature and/or nurture, can parse the music described or prescribed
by the grammar.
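A minimal sketch of such a grammar, assuming invented probabilistic rewrite rules over harmonic functions (tonic, subdominant, dominant); the rule set and probabilities are illustrative, not drawn from any cited system:

```python
import random

# Illustrative probabilistic rewrite rules: non-terminals are harmonic
# functions (T, S, D); terminals are Roman-numeral chord symbols.
RULES = {
    "PHRASE": [(1.0, ["T", "S", "D", "T"])],
    "T": [(0.7, ["I"]), (0.3, ["vi"])],
    "S": [(0.6, ["IV"]), (0.4, ["ii"])],
    "D": [(0.8, ["V"]), (0.2, ["vii0"])],
}

def expand(symbol, rng=random):
    """Recursively rewrite a symbol until only terminals remain."""
    if symbol not in RULES:
        return [symbol]
    weights, bodies = zip(*[(p, body) for p, body in RULES[symbol]])
    body = rng.choices(bodies, weights=weights)[0]
    return [t for s in body for t in expand(s, rng)]

print(expand("PHRASE"))  # four chords, e.g. ['I', 'IV', 'V', 'I']
```

The grammar here is prescriptive in the simplest sense: any output conforms to the T–S–D–T function sequence, while the probabilities encode which chord-types most often realise each function.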

Young (2017a) applies the concept of categorial grammar to the task of music
generation. Unlike generative-transformational grammars, categorial gram-
mars “not only describe the syntax of a sentence, but also how the meanings
of the individual words combine to create the meaning of the entire sentence”
(2017a, p. 2). Young argues that

[c]ategorial grammars describe how ‘objects’ (computational expres-


sions) of various types combine to form larger expressions. . . . Categorial
grammars lend themselves to automatic generation of music. Combinators
can be used to derive new musical objects, including melodies, from
pre-existing musical objects. There are a set of valid musical objects and
functions, and they can be put together in such a way as to result in an
expression that is a melody. By automatically generating valid lambda
expressions, we generate small musical pieces. (Young, 2017a, pp. 1–2)

Lambda calculus allows the components of a sequence, such as words in a


sentence or discrete musical objects, to be represented in terms of concatena-
tion, predicate logic and nested hierarchy (2017a, p. 2). As an example, the
expression

λx, y, z.combine(x, y, z)
(rhythm, [0.5, 0.5, 1.0]) (start_pit,(5, 0)),
(contour, [1, 3, 2])

describes the set of possible musical objects – the museme allele-class – of


three elements whose first element is c1 and is the lowest pitch, whose second
element is the highest pitch, and whose rhythm is quaver–quaver–crotchet in 2/4 (2017a, pp. 3–4, Fig.
3). Young (2017a) used categorial grammars to formalise and generate more
extended musical entities than those constituting this allele-class. Figure
6.5 (2017a, p. 7, Fig. 6) shows a short piano piece based on the following
grammatical rules:274

Chords are created by combining diatonic scales starting on different


keys with chord types, namely triads, ninth, seventh, and eleventh
chords. Each chord X is then made into a sequence of the chords X
IV/X V/X . . . . The resulting chords are combined with a rhythmic fig-
ure with 3 notes and a total length of 2 beats. The resulting melody is
manipulated in several ways, namely diminution with repetition, the
addition of an appoggiatura, and inversion. (Young, 2017a, p. 7)
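The combine(x, y, z) lambda expression above can be loosely mimicked in Python. The whole-tone rendering of contour ranks below is an arbitrary assumption for the sake of a concrete example, not Young's mapping:

```python
# A loose Python analogue of the combine(x, y, z) lambda expression:
# a rhythm, a starting pitch, and a contour ranking are combined into
# a concrete three-note object.
def combine(rhythm, start_pitch, contour):
    """contour ranks the notes from lowest (1) upwards; each rank step is
    rendered -- arbitrarily, for illustration -- as a whole tone (2 semitones)."""
    pitches = [start_pitch + 2 * (rank - 1) for rank in contour]
    return list(zip(pitches, rhythm))

# Contour [1, 3, 2]: first note lowest, second highest, as in the allele-class.
print(combine(rhythm=[0.5, 0.5, 1.0], start_pitch=60, contour=[1, 3, 2]))
# [(60, 0.5), (64, 0.5), (62, 1.0)]
```

Any function of this shape picks out one member of the museme allele-class the lambda expression describes; varying the rank-to-interval mapping would traverse other members.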

While only six bars in length, this piece is afforded considerable coherence
by its grammar’s specification of a limited number of permissible musical
objects and their concatenation. The recurrence in every bar of the dotted-quaver–semiquaver–crotchet
rhythmic museme and – apart from the triplet rhythm of the inversion-based
b. 4 – its duple diminution, gives the piece a distinctive character.

274 Young (2017b) gives other examples of music produced by this method.

Figure 6.5: Sample Output of Categorial Grammar.

The left-hand augmented-second/minor-third dyads ground the harmony,


instantiating the “X–IV/X–V/X” sequence that results from application of the
lambda expression’s “[ id , fourOf , fiveOf , ]” function (2017a, p. 7). While
the grammar conceives these chords in terms of diatonic operations, the
dyad spellings (and the associated upper-stave pitch) encourage reading the
harmonic museme associated with the dotted-quaver–semiquaver–crotchet rhythmic museme as traversing
the three diminished-seventh chords F–G♯–B(–D), B♭–C♯–E(–G), and C–E♭–F♯(–A),
and therefore exhausting the chromatic collection. These attributes
give the piece a flavour of the late/post-tonal music of the last quarter of the
nineteenth century, the closest examples of similar HGM perhaps including
– albeit of much greater scope and technique – such pieces as Liszt’s Sospiri!
S. 192 no. 5 of 1879 and his Bagatelle ohne Tonart S. 216a of 1885.

6.5.2.2 Constraint-Satisfaction Systems

A constraint-satisfaction system, as its name implies, attempts to find a


solution to a problem by satisfying a number of constraints that delineate
the problem. More formally, solving the problem involves locating the set
of points within a hypervolume whose coordinates satisfy the constraints
of that problem. These points – permissible values for a set of variables –
define a “feasible region” for the location of a constraint-satisfying output.
In music-generative systems, this approach amounts to encoding style rules
as constraints in a number of parameters that the output music must satisfy.
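The "feasible region" idea can be sketched by brute force: enumerate candidate four-voice chords and keep only those satisfying a handful of style constraints. The voice ranges and the three rules below are invented for illustration, not taken from any cited system:

```python
from itertools import product

# Candidate MIDI-pitch ranges for soprano, alto, tenor, bass (illustrative).
RANGES = {"S": range(60, 82), "A": range(55, 75),
          "T": range(48, 68), "B": range(40, 61)}

def satisfies(s, a, t, b):
    """Three toy constraints delineating the feasible region."""
    ordered = b <= t <= a <= s                 # no voice-crossing
    spacing = (s - a) <= 12 and (a - t) <= 12  # upper voices within an octave
    triad = sorted({p % 12 for p in (s, a, t, b)}) == [0, 4, 7]  # C major triad
    return ordered and spacing and triad

solutions = [v for v in product(RANGES["S"], RANGES["A"], RANGES["T"], RANGES["B"])
             if satisfies(*v)]
print(len(solutions), solutions[0])
```

Each surviving tuple is a point whose coordinates satisfy all constraints; a real system would search this space far more efficiently (e.g., by constraint propagation) rather than enumerating it exhaustively.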

In this sense there are significant overlaps between constraint-satisfaction


systems and grammar-based systems (§6.5.2.1), in that the constraints are
grammatical rules that must be satisfied. In terms of the VRS algorithm, the
satisfaction of constraints represents a form of selection, one that filters out
those configurations that do not meet the “survival” criteria represented by
the constraints.

Quick and Hudak (2013) outline a (unnamed) grammar-based system that


generates chord progressions in “classical” and jazz styles. Their program
uses a probabilistic temporal graph grammar (PTGG). This overcomes certain
limitations of other types of grammar – such as a context-free grammar (CFG)
– in that: (i) it is able to capture phrase repetition; (ii) it can account for
the probabilistic dimension of musical style; and (iii) it allows the temporal
element in musical hierarchies to be accommodated (for instance, the gen-
eration from a minim-value I chord of a V–I progression assigns both the
V and the I the value of a crotchet) (2013, p. 59). The first of these attrib-
utes, in particular, helps overcome certain limitations of other generative
approaches – most notably Markov models, but also neural networks – which
generally struggle to render convincingly the hierarchic patterns of phrase-
and section-repetition evident in even the most simple instances of HGM
(Quick & Hudak, 2013, p. 67).

Quick and Hudak (2013, p. 60, Fig. 1) use a two-phase design in their system.
In the first (“abstract/structural generation”) phase, a generative algorithm
outputs the harmonic progression of a piece – the system deals only with
harmony, and does not in this version feature a specific melody-generation
facility – as an abstract sequence of Roman numerals. The generative al-
gorithm proceeds by progressively expanding a “start” symbol – i.e., the
highest hierarchic level, such as a sentence in linguistics or a Schenkerian
background-level tonic in music theory – until a “terminal” symbol – i.e.,
a word or a foreground-level chord – is reached. This expansion is reg-
ulated by grammatical rules that are implemented as functions and that
are deployed probabilistically by the generative algorithm based on styl-
istic precepts and regularities extracted from external statistical data (2013,
p. 62). These “[r]ules can create repetition as well as exhibit[ing] conditional
behavior, yielding complex structures with even a very simple generative

algorithm” (2013, p. 61). In the second (“musical interpretation”) phase,


the system uses the “abstract” chords generated in the first phase to produce
“concrete” chords, by voicing the former in musically meaningful (“perform-
able”) ways (2013, p. 63). This is achieved using a constraint-satisfaction
algorithm informed by the “OPTIC” model of Tymoczko (2006) and Cal-
lender et al. (2008) (see also Tymoczko (2011)). This model aims to expand
and unify extant harmonic and voice-leading theory, proposing the form-
ation of equivalence-classes of “objects” (sequences or sets of pitches) by
disregarding the five categories of transformation: Octave equivalence, Per-
mutation, Transposition, Inversion, and changes of Cardinality. Quick and
Hudak (2013) use “OPC space” to move from the “block trichords” implied
by the Roman-numeral output of the system’s first phase to an expanded
voicing suitable for mapping to the output music’s four voices in the second
phase (2013, p. 63; p. 64, Fig. 2; Callender et al., 2008, p. 346).

Both the generative and the constraint-satisfaction algorithms draw upon the
syntax of “let expressions” (Quick & Hudak, 2013, p. 62). These allow for the
replacement of the abstract terms x, y, etc., with concrete Roman-numerals,
and for the recursive-hierarchical embedding of chord progressions. For
instance, the let expression

let x = (let y = V^t1 I^t2 in y y) in x IV^t3 I^t4 x     (6.1)

– where the superscript “t” refers to the time duration of the chord – expands
to the chord progression

V^t1 I^t2 V^t1 I^t2 IV^t3 I^t4 V^t1 I^t2 V^t1 I^t2     (6.2)

(Quick & Hudak, 2013, 65, Eq. 14, Eq. 15).
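The expansion of the let expression into the full progression can be mimicked with a toy substitution evaluator (a sketch of the substitution idea only, not Quick and Hudak's Haskell implementation):

```python
# A toy evaluator for the let-expression idiom: bind a symbol to a chord
# sequence, then splice that sequence wherever the symbol recurs.
def let(bindings, body):
    """Expand each symbol in `body` via `bindings`; unknown symbols are terminals."""
    out = []
    for sym in body:
        out.extend(bindings.get(sym, [sym]))
    return out

y = ["V^t1", "I^t2"]
x = let({"y": y}, ["y", "y"])                        # inner let: y y
phrase = let({"x": x}, ["x", "IV^t3", "I^t4", "x"])  # outer let
print(" ".join(phrase))
# V^t1 I^t2 V^t1 I^t2 IV^t3 I^t4 V^t1 I^t2 V^t1 I^t2
```

Because the bound symbol x is spliced in twice, the repetition of the four-chord unit is structural rather than coincidental: this is how the let-expression mechanism captures phrase repetition.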

A sample output of the system is shown in Figure 6.6 (after Quick & Hudak,
2013, p. 69, Fig. 8), the system’s four-stave output being compressed here to
two staves.

Figure 6.6: Sample Output of PTGG System.

The capacity of a PTGG to implement phrase repetition is evident in the


extract’s ABA structure, marked on Figure 6.6, the B section itself containing
internal repetition (bb. 3–4, bb. 5–6). Yet while Quick and Hudak (2013,
p. 60) assert that “our system’s output sounds similar to a classical [i.e.,
Bach?] chorale”, the passage has many shortcomings if held strictly to this
claim. For one thing, the voice-leading is clearly very poor, with many awk-
ward, unvocal intervals in all four parts. Despite its aspirations to chorale
style, the system, as mentioned, does not claim to be able to generate melody.
Perhaps this arises from a misunderstanding of the nature of melody: in the
best Bach chorales, all four voices are independent melodies, smooth and
interesting in themselves. Even though the constraint-satisfaction algorithm
aims to “[r]egulat[e] voice-leading smoothness by restricting the range of
movement in the voices” (2013, p. 66), tighter constraints on the range of
permissible note-to-note intervals would have improved the melodic sense
of all parts. This lack of voice-leading parsimony is presumably an artefact
of the use of OPC space: in expanding from chord to chord, as opposed to
voice-note to voice-note along a complete part, the constraint-satisfaction al-
gorithm lacks the sensitivity to long-range voice-leading parsimony intrinsic
to chorale styles, J. S. Bach’s and others (such as “Classical hymn texture”
(Rosen, 1997, p. 319)). Returning to the issue of means versus ends raised in
§6.2, Bach’s voice-leading parsimony seems to have resulted from his writing
chorales by taking the extant melody, adding a bass line, then composing
the tenor voice, and finally inserting the alto part (David et al., 1998, p. 399;
Mabley, 2015). In short, each part arises from the linear concatenation (parataxis)
of musemes in ways that respect the museme-concatenation of the


other parts. The present system, in its chord-to-chord constraint-satisfaction
search, behaves more like a university undergraduate student’s composing
“vertically” than like Bach’s composing “horizontally”. While using different
means does not necessarily preclude alighting upon similar ends, adopting
Bach’s means would appear to make the end of an “authentic” – smoothly
contrapuntal – chorale style more likely, in both HGM and CGM.

To add further criticism, the harmony of Figure 6.6 lacks a sense of direction,
with some odd treatments of 6-4 chords (b. 23–4, b. 31, b. 51, etc.). Quick and
Hudak (2013, p. 68) acknowledge this, noting that “the transition between the
first instance of part A and the beginning of part B is a jarring transition that
is not very suitable for the target genre. Similarly, the first measure of part B
sounds rather odd with an unexpected major-minor transition in the middle
of the measure”. To understand these discontinuities, it is useful to refer to the
Roman-numeral labels shown under the bass line of Figure 6.6, which result,
as noted above, from the expansion of let expressions encoding probabilistic
aspects of the grammar. While most of these progressions are within Bach’s
vocabulary, that at the start of the B section (b. 3) seems at the very distant
periphery of probability in his style. The segment “M7(V M7(VII) VI V . . . )”
translates as: “chord V in relation to the seventh degree of C major (i.e.,
F♯ minor (not major) in the context of B minor); followed by chord VII in
relation to the seventh degree of B minor (i.e., G♯ diminished – the root
ungrammatically spelled here as A♭ – in the context of A major); followed
by chord VI in relation to the seventh degree of C major (i.e., G major in the
context of B minor); followed by chord V in relation to the seventh degree
of C major (i.e., F♯ minor (not major) in the context of B minor)”. I know
of no Bach chorale that deploys this progression; and if any did, one would
imagine his voice-leading would be very much smoother.

Judged as a chorale – as Quick and Hudak (2013) invite us to do – the output
in Figure 6.6 is inferior to that produced by the significantly earlier CHORAL
system of Ebcioğlu (1988, p. 50, Fig. 1, Fig. 2), one of the first successfully to
generate music in this style. This is not to compare like with like, however,
because – despite their broadly common knowledge/rule-based approaches
– CHORAL is designed to harmonise extant chorale melodies (which serve to
facilitate style-emulation), whereas the system of Quick and Hudak (2013) is
not subject to this constraint, being primarily a melody-independent, chord-
progression generator. In the domain of Bach-chorale generation, CHORAL
is itself arguably trumped by the DeepBach system (Hadjeres et al., 2016;
Hadjeres et al., 2017), which learns the style of chorales using a neural
network. Their system is “steerable in the sense that a user can constrain
the generation by imposing positional constraints such as notes, rhythms
or cadences in the generated score” (Hadjeres et al., 2017, p. 1). Thus, in a
hybrid methodology (§6.5.4), DeepBach combines a neural network learning-
generative model with a constraint-satisfaction filter to refine the network’s
outputs.

6.5.3 Optimisation Systems

Optimisation systems, as this category’s name suggests, seek to determine
the optimum solution to a given problem. Like constraint-satisfaction
systems, optimisation systems search a notional problem-space, but they
additionally attempt to trace the shortest and/or easiest route to a given
solution. As with
constraint-satisfaction systems, the problem space may be represented as a
hypervolume in which the parameters of the problem are represented by the
axes and various candidate solutions sit at their intersections. As has been
argued on several earlier occasions, evolution by natural selection, driven
by the VRS algorithm, is a means of searching a problem-space in order to
locate an optimal solution to the problem of survival (§5.5.2); it is arguably
the optimal optimisation algorithm. Nevertheless, it is a proximity-weighted
algorithm, in the sense that it will alight upon the closest acceptable solution,
not the best overall. That is, evolution does not search the whole hyper-
volume, because it cannot see all that the hypervolume encompasses, and
because leaping to the best solution involves too much genetic and ontogen-
etic risk. Instead, evolution moves gradualistically by the shortest possible
(lowest-risk) distance to the nearest acceptable solution. Clinging stubbornly
on to the cliff-face of life, it short-sightedly searches for the nearest aptive
foothold able to prevent falling to oblivion; it does not risk the saltationist
lunge to a secure, but more distant, ledge.
Even if an optimisation system avoids untrammelled saltationism, as it surely
must if it is to be evolutionarily authentic, the combinatorial explosion that
arises from parametric interaction in even short spans of music means that
it must still search a large space in order to locate optimal solutions to a
particular set of desired criteria. Herremans and Sörensen (2013, pp. 6427–
6428; emphases in the original) identify three categories of “metaheuristic
optimization algorithms” able to accomplish this searching: (i) “population-
based metaheuristics”; (ii) “constructive metaheuristics”; and (iii) “local-search
algorithms” (sometimes also called neighbourhood search algorithms). The
first of these approaches is considered in §6.5.3.2; the second, which “con-
struct solutions from their constituting parts”, will not be considered here
owing to their relative underdevelopment for music-generative tasks; and
the third is considered in §6.5.3.1.

6.5.3.1 Local Search Algorithms

Local search algorithms “iteratively make small changes to a single solution”
(Herremans & Sörensen, 2013, p. 6428) in order to find the optimal solution
within a relatively constrained search-space. Herremans and Sörensen (2013)
developed a music-generative system, Optimuse, capable of composing in-
stances of fifth-species counterpoint (Fux, 1965),275 which they extended in
the Android app FuX (Herremans et al., 2015, p. 85). In Optimuse, based on
a variable neighbourhood search (VNS) algorithm, conformity to the principal
melodic and intervallic/harmonic rules of the species (as formalised in Salzer
and Schachter (1989)) is represented by a weighted subscore (where zero
represents perfect conformity) relating to each rule. Weighting allows for in-
creasing the emphasis of certain rules deemed to be particularly significant to
the style. Some of these rules are inviolable (“hard”), implemented as strict
constraints (i.e., they must score zero), whereas others are flexible (“soft”),
allowing scope for partial conformity. An objective function f (s) sums the
subscores and thus arrives at an overall assessment of how well a candidate
fragment of fifth-species counterpoint accords with the style, where the lower
the value, the greater the degree of conformity (2013, p. 6429).
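Optimuse’s scoring scheme, as described above, can be sketched in outline. In this hypothetical reconstruction the rule functions and their weights are invented stand-ins, not Herremans and Sörensen’s actual rule-set; only the hard/soft distinction and the summed, weighted objective f(s) (lower is better) follow their published description.

```python
# Sketch of an Optimuse-style objective function. Each rule returns a
# subscore (0 = perfect conformity); hard rules are strict constraints
# (they must score 0), soft rules contribute weighted penalties to f(s).
# The rules and weights below are invented for illustration.

def count_large_leaps(melody, limit=9):
    """Soft rule: count melodic intervals larger than `limit` semitones."""
    return sum(1 for a, b in zip(melody, melody[1:]) if abs(b - a) > limit)

def count_repeated_notes(melody):
    """Soft rule: count immediate note repetitions."""
    return sum(1 for a, b in zip(melody, melody[1:]) if a == b)

def ends_on_tonic(melody, tonic=60):
    """Hard rule: the fragment must close on the tonic pitch class."""
    return 0 if melody[-1] % 12 == tonic % 12 else 1

HARD_RULES = [ends_on_tonic]
SOFT_RULES = [(count_large_leaps, 2.0), (count_repeated_notes, 0.5)]

def objective(melody):
    """f(s): the weighted sum of soft subscores; lower is better.

    Any hard-rule violation disqualifies the candidate outright.
    """
    if any(rule(melody) != 0 for rule in HARD_RULES):
        return float("inf")
    return sum(weight * rule(melody) for rule, weight in SOFT_RULES)
```

Under this toy rule-set, a smooth fragment ending on the tonic scores 0.0 (perfect conformity), whereas a leap-ridden one accumulates weighted penalties.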

275 See Z. Ren (2020) for a contrasting – genetic/evolutionary algorithm-based – approach to

the composition of first-species counterpoint.


In outline, Optimuse first generates a candidate fragment s – consisting of a
cantus firmus plus a counterpoint – using the hard rules. In this sense, the
program is functioning as a knowledge/rule-based system (§6.5.2; indeed,
Herremans and Sörensen (2013) appears in a journal devoted to expert
systems), because the hard rules used to arrive at an exemplar of fifth-species
counterpoint represent its understanding of the style. The local search itself
operates within three “neighbourhoods”. In the first, “swap neighbourhood”,
Optimuse explores the set of proximate variants that may be generated by
swapping any two notes in s, populating the neighbourhood by applying
this swap to all notes in s. Having used f (s) to find the optimum in this
first neighbourhood, the system then takes this best version, s′, as s and uses
it as the basis for populating and searching for an optimum in the second,
“change1 neighbourhood”, wherein the pitch of one note in the new s is
changed to another pitch permissible in the key. Then, having found the
optimum in this second neighbourhood, the best version, s′, is again taken as
s and is used to populate and search for the optimum in the third, “change2
neighbourhood”, wherein the pitches of two adjacent notes in the new-new s
are changed to other pitches permissible in the key. Taking the best version,
s′, from the third neighbourhood as s, the search then moves back to the first,
swap, neighbourhood, and the cycle is repeated until no other candidate
scoring closer to f (s) = 0 than the optimal form, s_best, can be found. Beyond
this basic mechanism, Optimuse implements other strategies designed to
prevent the search from becoming “trapped” around local optima, and to
avoid an aggregate low score arising from the combination of several low
and a few high scores (2013, pp. 6430–6431, Fig. 3, Fig. 5).
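The three-neighbourhood cycle just described might be rendered as a minimal variable neighbourhood search. The objective function, key-pitch list and note representation here are simplified placeholders rather than Optimuse’s implementation; what the sketch preserves is the swap/change1/change2 cycle, repeated until no improving candidate can be found.

```python
import itertools

KEY_PITCHES = [60, 62, 64, 65, 67, 69, 71, 72]   # C major, one octave (illustrative)

def objective(s):
    """Placeholder for f(s): penalise melodic leaps larger than a third."""
    return sum(max(0, abs(b - a) - 4) for a, b in zip(s, s[1:]))

def swap_neighbourhood(s):
    """Every variant obtained by swapping two notes of s."""
    for i, j in itertools.combinations(range(len(s)), 2):
        t = list(s); t[i], t[j] = t[j], t[i]
        yield t

def change1_neighbourhood(s):
    """Every variant obtained by changing one note to another key pitch."""
    for i in range(len(s)):
        for p in KEY_PITCHES:
            if p != s[i]:
                t = list(s); t[i] = p
                yield t

def change2_neighbourhood(s):
    """Every variant obtained by changing two adjacent notes to key pitches."""
    for i in range(len(s) - 1):
        for p, q in itertools.product(KEY_PITCHES, repeat=2):
            t = list(s); t[i], t[i + 1] = p, q
            yield t

def vns(s):
    """Cycle swap -> change1 -> change2 until no improving candidate exists."""
    best = list(s)
    improved = True
    while improved and objective(best) > 0:
        improved = False
        for hood in (swap_neighbourhood, change1_neighbourhood, change2_neighbourhood):
            candidate = min(hood(best), key=objective)
            if objective(candidate) < objective(best):
                best, improved = candidate, True
    return best
```

Because the objective strictly decreases on every improving pass and is bounded below by zero, the cycle is guaranteed to terminate at a (local) optimum.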

An example of Optimuse’s output (Herremans & Sörensen, 2013, p. 6433, Fig.
8) is shown in Figure 6.7.

Figure 6.7: Fifth-Species Counterpoint Example Generated by Optimuse.

Herremans and Sörensen (2013, p. 6434) say of this passage that “[i]t is
the subjective opinion of the authors that the generated fragment sounds
pleasing to the ear”; but they also acknowledge “its lack of theme or sense of
direction”. This is perhaps a fair assessment, and one that invites comparison
(by analogy with the discussion of Bach chorales in §6.5.2.2) with the work of
undergraduate students learning the basis of Fux’s approach. Such beginners
often manage to satisfy most or all of the rules, and in doing so arrive at
broadly agreeable solutions; but reconciling conflicts among the “soft” rules,
in particular, often leads to a degree of short-term thinking – local problem-
solving – that prevents the kind of coherent melodic flow and inner unity
found in the work of Fux’s models, particularly Palestrina. The melody in
Figure 6.7 also arguably suffers from an under-specification of the rules:
while their nineteen “horizontal” and nineteen “vertical” rules (Herremans
& Sörensen, 2013, pp. 6435–6436, Tab. A.7, A.8) are certainly essential to the
definition of the style, they do not capture the granularity of detail found
in some specifications of practice, such as the detailed profile of Palestrina’s
style offered by Jeppesen (1992). In particular, the repeated notes in bb. 4,
8 and 12 are prohibited in Jeppesen’s account (1992, pp. 111, 114, 136).276
Nevertheless, Herremans and Sörensen (2013, p. 6434) acknowledge that
future iterations of Optimuse could implement such sensitivity to composer-
specific style features in the objective function.

While sometimes set apart from systems based on genetic/evolutionary
algorithms as not truly evolutionary, local search algorithms nevertheless
potentially implement the VRS algorithm. Variation is provided, in the
case of Optimuse, by the swaps and changes – the edit-distance operations of
insertion, deletion and substitution (§3.6.5) – made to the candidate fragment
s that define the three neighbourhoods; replication is found in the copying
276 Jeppesen would also dismiss as unidiomatic to Palestrina’s style the upward skip from an
accented crotchet (i.e., the first and third in a bar of four) in b. 7³⁻⁴ (1992, p. 120).
of a neighbourhood’s optimal variant, s′, to serve as the starting point for
the configuration of and search in the next neighbourhood; and selection
is accomplished by the objective function f (s), which assesses, by way of
the subscores and their weights, the “fitness” of the generated melodies
according to their conformity with the rule-set defining the style, and thereby
locates s′. Nevertheless, the final two stages are inverted here compared with
the algorithm’s normal sequence – selection in Optimuse (of s′) occurs before
replication, whereas in biological evolution (if not in cultural) it occurs after
it. Perhaps, however, this is to take a too rigidly linear view of the arguably
bidirectionally circular VRS algorithm, as is discussed in the third point of
the paragraph (on algorithm-sequencing) on page 549.

6.5.3.2 Genetic/Evolutionary Algorithms

Invented by Koza (1992), and enabling “population-based metaheuristics”
(to recall the categories identified by Herremans and Sörensen (2013, p. 6428)
listed on page 514), the paradigm of genetic programming instantiates evol-
utionary processes by implementing the VRS algorithm in computer code.
Systems based on genetic/evolutionary algorithms (GAs) both “generate”
and “test”, to use Dennett’s distinction (1995, p. 373): they engender the
necessary variation, often by dividing patterns in the relevant domain and
recombining their subcomponents; they replicate the varied patterns; and
they select from the resulting population using some fitness function (akin
to Optimuse’s objective function) that determines the desired attributes of
the successful patterns and/or their fit to some environmental or functional
constraint. Suitable for exploring evolutionary scenarios in a number of do-
mains, genetic programming has proved fruitful in music-generative tasks,
allowing for the rapid replaying of the memetic processes hypothesised to
have underpinned “real” music-cultural evolution. Beyond music synthesis,
GAs have been used for music-analytical purposes (Rafael et al., 2009; Geetha
Ramani & Priya, 2019); and for emotion-, genre- and piece/song-recognition
tasks (Gutiérrez & García, 2016). In some music-generative systems – such as
DarwinTunes (MacCallum, Leroi et al., 2012; MacCallum, Mauch et al., 2012)
– selection is devolved to human choice, the power and reach of the internet
making such crowd-based evaluations of candidate patterns relatively easy
to solicit. The discussion below is divided into systems that do not associate
the workings of the GA with interactions between virtual agents and those
that do.
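This generate-and-test dynamic can be reduced to a bare-bones loop. The target pattern, mutation operator and fitness measure below are arbitrary illustrations of the VRS cycle, not a model of any particular published system.

```python
import random

random.seed(1)  # reproducible illustration

TARGET = [60, 62, 64, 65, 67, 65, 64, 62]   # an arbitrary "stylistic ideal"

def fitness(pattern):
    """Lower is better: distance from the target contour (illustrative)."""
    return sum(abs(a - b) for a, b in zip(pattern, TARGET))

def vary(pattern):
    """Variation: nudge one randomly chosen note (a deliberately 'mindless' mutation)."""
    child = list(pattern)
    i = random.randrange(len(child))
    child[i] += random.choice([-2, -1, 1, 2])
    return child

def evolve(generations=200, pop_size=20):
    """Variation, replication and selection in a bare-bones loop."""
    population = [[60] * len(TARGET) for _ in range(pop_size)]
    for _ in range(generations):
        population += [vary(p) for p in population]               # variation + replication
        population = sorted(population, key=fitness)[:pop_size]   # selection
    return population[0]
```

Because parents persist alongside their varied copies until selection, the best pattern found so far is never lost – a simple form of elitism.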

Non-Agent-Based Systems

One of the most successful music-GA systems is Biles’ GenJam (Genetic Jam-
mer) software (2007, 2020). It is a “real-time, MIDI-based, interactive impro-
visation system that uses evolutionary computation to evolve populations
of melodic ideas (licks, in the jazz vernacular), which it uses to generate its
improvisations in live performance settings . . . ” (Biles, 2013, p. 20; emphasis
in the original). In a sense, it offers the same functionality as Pachet’s Con-
tinuator (§6.5.1.3), except it uses a GA rather than the Markov model of the
Continuator. GenJam has two modes of operation, “interactive” and “autonom-
ous”. In the interactive mode, the system supports: (i) “trading fours and
eights” (i.e., human-machine alternation of four- or eight-bar phrases (Biles,
2007, p. 156; Biles, 2013, p. 22)); (ii) “collective improvisation” or “intelli-
gent echo” (i.e., simultaneous human-machine improvisation (2007, p. 157,
2013, p. 22)); and (iii) “interbreeding” or “evolving . . . in the direction of
the human’s playing” (i.e., hybridisation of human- and machine-generated
bars (2007, p. 158, 2013, p. 23)). Essentially, in its interactive mode GenJam
draws upon a vocabulary of musical patterns and, before and during live
jazz human-machine co-improvisation, subjects them to the operation of the
VRS algorithm in response to ideas devised by the human soloist. In the
autonomous mode, the software runs this process with no interaction with a
human colleague (2007, p. 159).

Aligning with ideas discussed in §1.6.1, GenJam’s architecture is typical of
GA systems in that it ostensibly maintains a distinction between, in my
terms, a memome and a phemotype, although Biles uses the corresponding
terms (genotype and phenotype) from genetics (2007, p. 142). While this
is a binarism inherent in all systems where computer code gives rise to
musical sounds (§6.6.3, §7.6.1), in GA systems it is explicitly formalised in
the architecture, as reflected in the organisation of the memotypic elements
in conformity to the structure and function of DNA. Nevertheless, the issue is
not straightforward because, in my conception, the memome aligns with the
system’s music-representing source-code memes – strictly, with the electronic
impulses associated with the executable file derived by compilation from the
with on-screen representations of the source-code memes, which are the
with on-screen representations of the source-code memes, which are the
phemotypic products of the plain-text files encoding the source-code memes.
In §7.6.1 these electronic-impulse replicators are termed “i-memes” or, using
a term of Blackmore’s, “tremes” (2015).

In contrast with the memome, the phemotype is not formalised explicitly by
Biles. Indeed, there is a degree of slippage between replicators and vehicles
in Biles’ accounts of GenJam’s design and function that is indicated by his
referring to the members of the phrase population (explained below) as both
“chromosomes” and as “individuals” (2007, p. 142). While it is unnecessary
to draw slavish comparisons between natural and cultural replicators, it
is certainly incorrect to regard an entity as both a replicator and a vehicle,
for this erodes the key distinction between the germ line (that which is
replicated) and the soma line (that which facilitates replication) (§1.8). A
memetic interpretation of the phemotype in GenJam would thus regard it as
consisting of the sound patterns motivated by the tremes manipulated by the
program, running on a silicon-based hardware; to which must be added the
sound patterns generated by the human co-performer motivated by memes
and musemes, running on a carbon-based hardware.

At the memomic/genotypic level, “genes” occupy slots in a “measure [bar]
chromosome” and then – in one reading of Biles (2007, 2013) – at a higher
structural-hierarchic level, measure chromosomes in effect themselves serve
as genes occupying slots in a “phrase chromosome”.277 Measure chromo-
somes are members of a set of sixty-four one-bar units (the “measure pop-
ulation”), whose original members may be replaced by variants. Phrase
chromosomes are members of a set of forty-eight four-bar phrases (the
“phrase population”) built from concatenation of members of the measure
population (Biles, 2007, pp. 142–145; Biles, 2013, p. 21). This design essen-
tially implements the RHSGAP model (§3.5.2), whereby bars (musemes)
assortatively recombine to generate phrases (musemeplexes). At a higher
277 Thus, Biles regards each note of a measure chromosome as a gene (2013, p. 21), not each
complete (4/4) bar of eight quaver-value slots. This is perhaps on account of each slot’s being
coded for by four bits (see note 278 on page 520). Thus, a bit might be regarded as analogous to
a nucleotide. In music, and as discussed on page 505 apropos third-order MCs, a single note is
not normally sufficient to function as a museme, so there is some disanalogy between (pseudo)
nature and culture here.
Repeat
Select 4 individuals at random to form a family (tournament selection)
Select 2 family members with the greatest fitness to be parents
Perform crossover on the 2 parents to generate 2 children
Mutate the resulting 2 children until they are unique in the population
Assign 0 as fitness for both children
Replace the 2 non-parent family members with the new children
Until half the population has been replaced with new children

Figure 6.8: GenJam’s Genetic Algorithm.

structural-hierarchic level is situated the “soloist” (broadly analogous to a
musemesatz), this being Biles’ term for “a collection of tunes that GenJam
will perform during the training process”, set up by the program’s human
“mentor” (2007, p. 145).
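In outline, this two-level genotype might be represented as follows. The population sizes (sixty-four measures, forty-eight four-bar phrases) and the eight 4-bit slots per bar follow Biles’s published description; the decoding convention sketched here (0 = rest, 15 = hold, other values indexing a chord-derived note list) is a simplified gloss, not his exact mapping.

```python
import random

random.seed(0)  # reproducible illustration

MEASURE_POP_SIZE = 64   # sixty-four one-bar measure chromosomes
PHRASE_POP_SIZE = 48    # forty-eight four-bar phrase chromosomes

# A measure chromosome: eight 4-bit "genes", one per quaver slot.
measure_population = [
    [random.randrange(16) for _ in range(8)]
    for _ in range(MEASURE_POP_SIZE)
]

# A phrase chromosome: four indices into the measure population.
phrase_population = [
    [random.randrange(MEASURE_POP_SIZE) for _ in range(4)]
    for _ in range(PHRASE_POP_SIZE)
]

def decode_phrase(phrase, note_list):
    """Expand a phrase chromosome into a flat list of 32 quaver events.

    Decoding convention (a simplified gloss): 0 = rest, 15 = hold (tie to
    the previous event), any other value indexes the current chord's notes.
    """
    events = []
    for measure_index in phrase:
        for gene in measure_population[measure_index]:
            if gene == 0:
                events.append("rest")
            elif gene == 15:
                events.append("hold")
            else:
                events.append(note_list[(gene - 1) % len(note_list)])
    return events
```

Note how the indirection through measure indices means that a mutation to one measure chromosome simultaneously alters every phrase that references it.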

Figure 6.8 (Biles, 2007, p. 146, Fig. 7.5) represents in “pseudocode” – a
natural-language statement of the operation of the algorithm – the GA under-
pinning GenJam. The GA is deployed during a “training” phase, which draws
upon the measure population and the phrase population and uses them to
generate variants. This mutation is followed by human-driven selection:
the mentor listens to variants and codes them as either “g” (good) or “b”
(bad). This assessment determines, via a “fitness” value (2007, p. 142), the
likelihood of the variant’s use (its replication) in an improvisation.
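A runnable rendering of Figure 6.8 might look like the following. The tournament selection of four, the choice of the two fittest as parents, the replacement of the two non-parents, and the zero-fitness reset of the children all follow the pseudocode; the single-point crossover and the gene-randomising mutation are simplifying assumptions.

```python
import random

random.seed(42)  # reproducible illustration

def crossover(a, b):
    """Single-point crossover of two measure chromosomes (an assumed operator)."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate_until_unique(child, population):
    """Re-randomise 4-bit genes until the child duplicates nothing in the population."""
    while child in population:
        i = random.randrange(len(child))
        child = child[:i] + [random.randrange(16)] + child[i + 1:]
    return child

def genjam_generation(population, fitness):
    """One breeding cycle after Figure 6.8 (half the population is replaced)."""
    population = [list(p) for p in population]
    fitness = list(fitness)
    replaced = 0
    while replaced < len(population) // 2:
        family = random.sample(range(len(population)), 4)     # tournament selection
        family.sort(key=lambda i: fitness[i], reverse=True)   # fittest two are parents
        children = crossover(population[family[0]], population[family[1]])
        for loser, child in zip(family[2:], children):
            child = mutate_until_unique(child, population)
            population[loser] = child     # replace a non-parent family member
            fitness[loser] = 0            # new children start with zero fitness
            replaced += 1
    return population, fitness
```
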

In many GAs, the mutational operations are often “mindless”, sometimes
involving “flipping a random bit” of the data encoding (2007, p. 147).278
Biles found that using this approach did not work well in GenJam, one of
the reasons for this being that “while random changes will make measures
and phrases different, they are unlikely to make them sound better” (2007,
p. 148). While this perhaps underestimates the power of the “blind watch-
maker” (Dawkins, 2006) to build complexity by seizing on small, random
variations, Biles – keen to develop a system that would produce music that
sounds recognisably like jazz – developed a number of “musically meaning-
ful mutations”, operators that implement the familiar motivic-development
devices of transposition, retrogression and inversion (2007, pp. 148–149).

278 As outlined in note 277 on page 519, each note is coded in GenJam by four bits, and each bar

has eight quaver notes (which may be joined to form longer note values or rendered as rests),
making thirty-two bits per bar.
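The three “musically meaningful” operators named above can be sketched directly. GenJam applies such operators to its 4-bit gene strings; for clarity the bar is represented here as a plain list of MIDI pitch numbers, with rests and holds omitted.

```python
# The three "musically meaningful" mutation operators, sketched over a bar
# represented as a plain list of MIDI pitches (rests and holds omitted).
# GenJam applies such operators to its 4-bit gene strings; pitch numbers
# are used here purely for clarity.

def transpose(bar, interval):
    """Shift every note by a fixed interval (in semitones here)."""
    return [p + interval for p in bar]

def retrograde(bar):
    """Reverse the temporal order of the notes."""
    return bar[::-1]

def invert(bar):
    """Mirror each note about the bar's first pitch."""
    axis = bar[0]
    return [axis - (p - axis) for p in bar]

bar = [60, 62, 64, 67]
# transpose(bar, 2) -> [62, 64, 66, 69]
# retrograde(bar)   -> [67, 64, 62, 60]
# invert(bar)       -> [60, 58, 56, 53]
```

Unlike a random bit-flip, each of these operators preserves the intervallic or contour logic of the source bar, which is why their products are more likely to “sound better” as well as merely different.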
Figure 6.9: Sample Output of GenJam.

Similarly, the “intelligent crossover” operations (deployed at measure- and
phrase-level) attempt to avoid “unfortunate” crossover points, whereby a
bar or a phrase is divided in ways that create a new museme or museme-
plex with excessively large intra- or inter-museme intervals, respectively
(2007, pp. 152–153). “Intelligent note-level measure crossover” is illustrated
in Figure 6.9, which shows a transcription of a passage of GenJam’s output
generated by this operation (2007, pp. 153–154, Figs. 7.12, 7.13).

Here bb. 1 and 2 are the two “parent” bars and bb. 3 and 4 are the two
resulting “child” bars. The crossover points are marked by vertical lines
in Figure 6.9, showing that the parent bars have been split after the fifth
quaver-event, leading, for example, to the preservation of the segment
c²–a¹–g¹ (segment 4) from quavers 6–8 of Parent 2 in the corresponding segment
of Child 1. This crossover also results in the retention in Child 1 of stepwise
melodic movement into segment 4, now from the b♭¹ at the end of segment
1; and in melodic stasis on the d² (equivalent to segment 2) in Child 2.
The underlying chord of this phrase is C major seventh throughout, but
normally chords change once or twice per bar, as pre-specified in a “chord
progression” file (2007, pp. 140–141; p. 145, Tab. 7.2) that constrains the
number of available melody notes for each chord. This is in accordance with
Biles’ safety-first “design philosophy that starts with simple, robust choices
and tries to avoid complex solutions to specific situations. I want GenJam
to always sound competent and never sound ‘wrong’” (2013, p. 22). As a
calculated risk, however, GenJam is able to insert chromatic passing notes
outside the specified note-list for each chord, as exemplified by the e♭² on
quaver four of Parent 2 (2007, pp. 144–145).
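The principle of “intelligent crossover” – rejecting splice points that would create excessively large melodic intervals – can be sketched as follows. The pitch representation and the seven-semitone threshold are illustrative assumptions, not Biles’s actual parameters.

```python
# Sketch of "intelligent crossover": candidate crossover points whose splice
# would create an excessively large melodic interval are rejected. The pitch
# representation and the seven-semitone threshold are illustrative
# assumptions, not Biles's actual parameters.

MAX_SPLICE_INTERVAL = 7   # semitones (hypothetical threshold)

def splice_ok(left, right):
    """A splice is acceptable if the join does not leap too far."""
    return abs(right[0] - left[-1]) <= MAX_SPLICE_INTERVAL

def intelligent_crossover(parent1, parent2):
    """Return the child pairs produced at every acceptable crossover point."""
    children = []
    for point in range(1, len(parent1)):
        c1 = parent1[:point] + parent2[point:]
        c2 = parent2[:point] + parent1[point:]
        if splice_ok(c1[:point], c1[point:]) and splice_ok(c2[:point], c2[point:]):
            children.append((c1, c2))
    return children
```

With two contrary-motion parents, for instance, only the crossover points at which the two melodic lines have converged survive the filter.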

As a result of – or perhaps despite – these operations, the phrase in Figure 6.9
sounds idiomatic and characterful as jazz; but the real test of an interactive
system is of course a live performance situation using a challenging piece.
As demonstrated by Biles’ and GenJam’s rendition of the jazz standard “You
go to my head” (Biles, 2019), the system picks up smoothly at c. 0:50 and
again at c. 1:20, and develops melodic ideas with some stylistic sensitivity,
although perhaps without Biles’ flair. At the very least, the improvisation
demonstrates the power of the VRS algorithm to latch onto the musemes
and musemeplexes constitutive of this style and to manipulate them in ways
that align with the deployment of musico-operational/procedural memes in
human-only jazz improvisations.

Arguably the most radical use of GAs in music generation is the Iamus com-
puter (Diaz-Jerez & Vico, 2017), developed by a team led by Francisco Vico
at the University of Málaga. Named after the prophet of Greek mythology –
the offspring of Apollo and Evadne, who was able to understand birds – the
system is designed to compose music for orchestra and traditional acoustic
instruments in an avant-garde “classical” style, evoking the late- and post-
modernism cultivated by many contemporary human composers. Its outputs
are MusicXML files, which can be readily converted to musical scores using a
score-editing program; and the developers’ intention is that these scores then
be performed by professional musicians, with all the nuances of expression
and interpretation that they would bring to bear on human-composed music.
Indeed, a commercial recording of some of Iamus’s compositions, performed
by the London Symphony Orchestra, is available (Iamus, 2012). Another
design motivation for Iamus is that its repository of generated materials is
available to composers to draw from and adapt in order to stimulate their own
compositional practice. Thus, beyond its arguably primary function as a fully
automatic generative system, Iamus serves additionally as an augmentation
system, to recall the terminology of §6.2. In its primary role, a co-developer,
Gustavo Díaz-Jerez argues – and perhaps one needs a somewhat liberal
interpretation of his word “intervention” – that “Opus one (generated by
Iamus on 15 October 2010) is a good example of the quality of the resulting
compositional process and, to our knowledge, the first musical fragment ever
conceived and written in professional music notation by a computer without
human intervention” (Diaz-Jerez, 2011, p. 14).

Physically a computer cluster housed in a striking tigerprint-patterned case
(Sewell, 2012), Iamus has an underlying mechanism presented under the rubric
of Melomics (Melodic Genomics) (Sánchez-Quintana et al., 2013). While
the technology is commercially sensitive, it is possible to understand its
algorithmic basis from published literature (Puy (2017, sec. 2) offers the
most comprehensive overview). It operates on evo-devo (evolutionary-
developmental) principles (S. B. Carroll, 2005), whereby (in biology)
“evolutionary changes are interpreted as small mutations in the genome
of organisms that modulate their developmental processes in complex
and orchestrated ways, resulting in altered forms and novel features”
(Sánchez-Quintana et al., 2013, p. 100). Thus, the system incorporates not
only the traditional genomic/memomic aspects of systems based on GAs,
but also an ontological-phenotypic/phemotypic element of embryology,
whereby the “self-organized choreographies of precisely timed events, with
cells dividing and arranging themselves into layers of tissues that fold in
complex shapes, resulting in the formation of a multicellular organism
from a single zygote” (Sánchez-Quintana et al., 2013, p. 100) of biological
embryology is emulated in code.279

Using the Melomics algorithm,

Iamus implements the evolution of complex musical structures, encoded
into artificial genomes (resembling multicellular living organisms, which
develop from a genome, and [which] also evolve in time). These
genomes represent the musical information in an indirect and very compact
way: each genome encodes the specifications to generate a music piece
following a complex developmental process. (Sánchez-Quintana et al.,
2013, p. 101)

Leaving aside the conflation of replicator (“artificial genomes”) and vehicle
(“multicellular living organisms”) here, in memetic terms one might take this
summary to indicate that the artificial genome functions as a musemesatz, in
that it encodes a series of source-code memes (or source-code meme allele-
classes), and/or the structural loci/nodes in which they are to be situated. By
contrast, Puy (2017, sec. 2) equates the genome with a “‘generating cell’ or
279 In biological evolution, embryological processes are “phenotype-side”, not “genome-side”.

In the “digital embryology” of the Melomics algorithm, insofar as it can be reconstructed and
understood from published accounts, the distinction appears blurred. This embryology is
implemented by means of indirect encodings (“formal abstractions of developmental processes
that define complex mappings between genotype and phenotype” (Sánchez-Quintana et al.,
2013, p. 100; Puy, 2017, sec. 2)), in contrast to the direct encodings (which “straightforwardly
map genotypes (representations of solutions) to phenotypes (the solutions themselves)” (Puy,
2017, sec. 2)) often used by other GA-based systems.
‘musical motif”’, akin to the Schoenbergian Grundgestalt/basic shape or the
Kellerian basic idea (§4.4.1.1). On the former interpretation, the (memomic)
source-code memes whose configuration and sequential structure are specified
by the artificial-genome musemesatz provide – via the resultant executable –
the instructions to generate a sequence of MusicXML-code memes. These,
in turn, provide the instructions to a score-editing program to generate a
series of (phemotypic) graphemes, delineating the generated musemes via
western musical notation. These graphemes provide the instructions, to
human performers, to generate sound sequences, which subsequently result
in the (memomic) encoding of Iamus-generated musemes in the brains of
their listeners.

The artificial genomes are subject to mutational operations, the resulting
variant structures being evaluated by a fitness function (Sánchez-Quintana
et al., 2013, pp. 101–102) that defines the “conditions” that must be satisfied
by the selected music. These conditions are organised into the six categories
of instrumental feasibility, notational correctness, form-type, instrument-
specific expressive nuance, user-criteria (specifically, piece-duration and
instrumentation), and aesthetic factors (encompassing dissonance levels and
timbre) (Puy, 2017, sec. 2). The fitness function encodes almost 1,000 rules
of music theory (Sánchez-Quintana et al., 2013, p. 102) and thus Iamus –
in common with other GA-based systems whose fitness functions encode
theoretical precepts as selective criteria – also represents a knowledge/rule-
based system (§6.5.2). “Recombination operators” permit the merging of
genomes encoding different musical styles and thus “offspring might show
combined features of the parental genomes”, this giving rise to “[n]ew
fusion genres” (Sánchez-Quintana et al., 2013, p. 101). Again insofar as the
detailed operation of Iamus can be understood, this suggests that the unit
of selection (§1.6.2) in the Melomics algorithm is in effect the whole piece
– strictly, the musemesatz underpinning its artificial genome – rather than
any lower-level unit, such as the individual source-code memes constituting
that musemesatz, or their resultant MusicXML-code memes, graphemes and
musemes.

One of Iamus’s compositions is the piano piece Colossus (2012), named after
the computer built by Tommy Flowers during World War II, with contributions
from Alan Turing, to decrypt German codes. Figure 6.10 shows the first
eight bars of the score.280

Figure 6.10: Iamus: Colossus (2012), bb. 1–8.

On first hearing, this music seems technically and stylistically convincing,
having, perhaps, a flavour of the style of Messiaen in its mystical and evoc-
ative textures. Cynics might argue that such a freely atonal avant-garde
style is not difficult to pastiche, because musical surfaces generated by, for
instance, a quasi-random approach to composition may not differ markedly
from those generated by strict, logical and intentional processes, such as
those seemingly underpinning the operation of Iamus. In a similar way, it
is arguably not beyond the ability of most artistically untrained people to
simulate, at least superficially, the visual style of an abstract painter like
Jackson Pollock through random application of paint to a canvas. Of course,
such “informed randomness” is part of the working methods of a number
of composers and painters. Presumably on account of the music-theoretical
rules encoded in its fitness function, the sound-patterns of Colossus tend
to form chunks that are consonant with the perceptual-cognitive grouping
criteria governing most HGM. Being thus coherent to a human listener, these
segments are likely perceived, and may function, in terms of the musemes
of HGM. Moreover, there is a good deal of stylistic consistency here, with
the exploration of the high registers of the piano; the use of left-hand chords
that are tied across the bar line and introduced by glissandi and acciaccature;
and a right-hand melody that mixes triplet and “straight” quavers. Yet the
overall structure seems somewhat diffuse and lacking a clear developmental
trajectory: while there is no obligation, or consistent tradition in such a style,
for an arch-shaped tension-curve (page 411), the piece lacks a clear narrative,
such as might be expected in the work of a human composer. This deficit
arises partly because, for all the chunking, there is little of the motivic
development that might sustain such a narrative. In short, this piece is
clearly music, but it is not particularly musical, as judged from an
unavoidably biased human perspective.

280 See also Díaz Jerez (2012) for a performance with Díaz-Jerez on piano.

526 Music in Evolution and Evolution in Music

Agent-Based Systems

In implementing the VRS algorithm, a GA is essentially concerned with
generating a variety of patterns, copying them and then selecting them
using some fitness function. This can happen in an “open”/unbounded way,
by creating a (virtual) workspace within which the algorithm operates. In
this way, the evolutionary processes are running abstractly, without any
explicit representation of the contexts and structures within which the replic-
ators are usually situated in “real-world” evolution. A more thoroughgoing
implementation of evolutionary methodologies would represent both the
replicator side and the vehicle side of the dynamic (§1.6.1), and would thus
preserve the distinction between the germ line and the soma line. This model
is implemented in certain agent-based systems that have been developed to
simulate evolution in a number of domains (Bonabeau, 2002). In the most
explicitly evolutionary of these, the agents constitute vehicles in which the
replicators reside. Echoing the nature of biological and cultural evolution,
the survival of the replicators in such systems is generally contingent upon
that of the vehicles, and vice versa.

Tatar and Pasquier (2019) present a comprehensive survey of seventy-eight
agent-based systems developed for music-related tasks (2019, pp. 57–60,
Tab. 1), which they organise into a nine-dimensional typology (2019, p.
63, Fig. 2).281 This overview indicates that the purposes for which such
systems have been developed – the “musical tasks” dimension (2019, p. 63) –
are highly diverse and not always related to the autonomous, evolutionary-
generative purposes that are the focus of this chapter. Some programs, for
instance, serve as augmentation systems to facilitate the work of human
composers (§6.2) (2019, pp. 63–64), while one system surveyed performs
arrangement (2019, p. 64). Importantly, Tatar and Pasquier (2019, p. 65)
make a distinction between “mono-agent” and “multi-agent” systems. On
this criterion, certain systems I have considered under other rubrics – such
as the Continuator (§6.5.1.3) and GenJam (§6.5.3.2) – would be regarded
as (mono-)agent-based. Moreover, the inclusion of the Continuator in the
mono-agent category indicates that agent-based systems (of both mono- and
multi-agent types) use a range of generative methodologies, including those
discussed in previous sections, and not just genetic/evolutionary algorithms.
Given this considerable variety, and in order to maintain the evolutionary
focus of the chapter, my concern in this section is with GA-based multi-agent
systems that attempt to simulate evolutionary changes in musical cultures.
These often draw upon memetic concepts, albeit rarely explicitly.

281 These nine dimensions are “agent architectures, musical tasks, environment types, number
of agents, number of agent roles, communication types, corpus types, input/output (I/O) types,
[and] human interaction modality (HIM)” (2019, p. 63).

Agent-based systems of the latter type may themselves be divided into two
categories: single-replicator and dual-replicator architectures. The former cat-
egory is concerned solely with either cultural evolution (Lumaca & Baggio,
2017; see also Mcloughlin et al., 2018, discussed in §5.4.2.3), or with biological
evolution (Jõks & Pärtel, 2019). The latter category attempts a coevolutionary
simulation of replicator interaction (§3.7), such as the modelling of genetic
and language-cultural (lexemic) coevolution in Azumagakito et al. (2018),
or the modelling of genetic and music-cultural (musemic coevolution in
Miranda et al. (2003), discussed below. Dual-replicator systems simulate
not only the idea of generations of agents, common to many agent-based sys-
tems, but they also allow for the exploration of horizontal (cultural), oblique
(cultural) and vertical (biological and cultural) transmission between agents
(§3.6). By modelling socio-cultural interactions, dual-replicator systems thus
allow for memetic factors to be yoked to genetic factors, in order to test, in
microcosm, cultural evolution’s role in mediating biological evolution, and
vice versa.
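A dual-replicator architecture of this kind can be caricatured in a few lines. The Agent class, the 0.8 vertical-inheritance rate and the transmission routines below are invented for illustration, not drawn from any of the systems cited:

```python
import random

class Agent:
    """A vehicle carrying one biological replicator (a 'gene') and a set
    of cultural replicators ('memes')."""
    def __init__(self, gene, memes):
        self.gene = gene
        self.memes = set(memes)

def vertical(parent, rng):
    """Vertical transmission: the child inherits its parent's gene
    biologically and most (here ~80%) of its memes culturally."""
    return Agent(parent.gene, {m for m in parent.memes if rng.random() < 0.8})

def horizontal(giver, taker, rng):
    """Horizontal transmission: peers exchange memes, never genes."""
    if giver.memes:
        taker.memes.add(rng.choice(sorted(giver.memes)))

rng = random.Random(0)
population = [Agent(gene=i, memes={f"museme{i}"}) for i in range(4)]
for _ in range(20):                      # socio-cultural interactions
    a, b = rng.sample(population, 2)
    horizontal(a, b, rng)
children = [vertical(p, rng) for p in population]
```

Even this toy version exhibits the key asymmetry: memes spread within a generation through interaction, whereas genes move only between generations.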

Miranda et al. (2003) developed an agent-based system to simulate three
interconnected music-evolutionary scenarios: (i) the sexually selected origin
of musical preferences (§2.5.3); (ii) the transmission of musical patterns
within a community owing to imitation; and (iii) the emergence of musical
grammars that combine syntax and semantics (see also Miranda (2003)).
In the first of these simulations, agents have a biological sex, with males
attempting to woo females using melody. Thus, the simulation encodes a
quasi-genetic and a quasi-memetic dimension. Females internalise a set of
Markovian transition probabilities (drawn from a training corpus of folk
songs) and they use these as criteria to rate the desirability of the male singer
(2003, pp. 92–93). The female “mates” with the male producing the most
highly rated melody and “has one child per generation created via cros-
sover and mutation with her chosen mate” (2003, p. 92). As with sexual
selection in biology, “[t]his child will have a mix of the musical traits and
preferences encoded in its mother and father” (2003, p. 92), where “musical
traits” is analogous to the “ornament” of sexual selection and “preference”
is equivalent to the concept of the same name in biology. A particular vari-
ant of this simulation is worthy of mention, one where females rate most
highly those males who violate the expectations encoded by the transition
probabilities of the training corpus. In fact, “in order to get a high surprise
score, a tune must first build up expectations, by making transitions to notes
that have highly anticipated notes following them, and then violate these
expectations, by not using the highly anticipated note” (2003, p. 93). While
this simulation does not, to my knowledge, attempt to incorporate the the-
ory of memetic drive (§3.7.1), this would illuminate the issue – not clear
from the discussion in Miranda et al. (2003) – of whether the ornament is
transmitted (in the terms of the simulation) via genetic or memetic means:
is the ornament the capacity to vocalise (equivalent to peacock tail-feathers)
or is it the vocalisations themselves (as in bird-song)? Strict adherence to
sexual selection theory would require the former, but the dual-replicator
coevolutionary orientation of memetic drive expands the focus in order to
make a distinction between a genetically controlled preference (including
for the aspect of expectation-violation in the ornament) and a memetically
transmitted ornament that can reflexively mediate that preference (including
the aspect of expectation-violation).
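The females' Markovian rating scheme, and its expectation-violating "surprise" variant, might be sketched as follows. The transition table and melodies are invented, and the surprise measure is reduced here to negated expectedness, which is simpler than the build-then-violate score Miranda et al. describe:

```python
import math

# Toy first-order transition probabilities, standing in for those a
# female agent would internalise from a corpus of folk songs.
P = {
    ("C", "D"): 0.5, ("C", "E"): 0.4, ("C", "F#"): 0.1,
    ("D", "C"): 0.6, ("D", "E"): 0.3, ("D", "F#"): 0.1,
    ("E", "C"): 0.7, ("E", "D"): 0.2, ("E", "F#"): 0.1,
    ("F#", "C"): 0.3, ("F#", "D"): 0.3, ("F#", "E"): 0.4,
}

def expectedness(melody):
    """Mean log-probability of the melody's transitions: high when the
    suitor's tune conforms to the internalised norms."""
    logs = [math.log(P[(a, b)]) for a, b in zip(melody, melody[1:])]
    return sum(logs) / len(logs)

def surprise(melody):
    """Expectation-violating variant: improbable transitions score highly."""
    return -expectedness(melody)

conventional = ["C", "D", "C", "E", "C"]
unexpected = ["C", "F#", "E", "F#", "D"]
```

Under the first criterion a female would prefer the `conventional` suitor; under the surprise variant, the `unexpected` one.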

In the second scenario, the simulation is explored using robots – each the
physical manifestation of an agent – capable of hearing and reproducing
sounds via auditory analysis, auditory-motor association, motor-control
mapping, and voice-synthesis
(2003, p. 95, Fig. 2). While an agent-based approach is capable in principle
of representing interaction between autonomous creative entities, and of
incorporating both memetic and genetic dimensions, the operation of the
VRS algorithm occurs covertly within the system, and the principal manifest-
ations of the system’s processes are often its output logs – in some cases, data
constituting the “compositions” produced by the agents at various stages
of a cycle. From these, the changing museme-pool of the agent-community
might be determined and the nature of the evolutionary changes understood.
To make the processes involved more tractable, several agent-based systems
employ robots capable of perceiving and generating – “singing” (Miranda &
Drouet, 2006; Gimenes et al., 2007) – musical patterns. While such systems,
obviously enough, tend not to implement analogues to biological reproduc-
tion, their musemic replication is evident in a way that is not the case with
more “virtual” systems. Genuine communities of social robots (Miranda,
2008) may be built, and aspects of vocal production and perception may be
simulated more directly than is the case with virtual (non-robotic) agents.
Nevertheless, it could be argued that, at worst, such systems are “gimmicky”,
in that all their functionality, and more, could be implemented using virtual
agents. Moreover, and as might be inferred from §6.4, the seeming close-
ness of such robotic systems to the dynamics of human music-making is
arguably illusory, and thus their apparent physicality and vocality is just as
“symbolic” – as opposed to “vibrant and visceral” (page 483) – as that of
virtual-agent-based systems.

Despite these concerns, the use of robots in the second simulation does
expedite the exploration of the vocal learning that is hypothesised to have
played a key role in the evolution of music and language (§2.7.5). Here,
“expectation is defined as a sensory-motor problem, whereby agents evolve
vectors of motor control parameters to produce imitations of heard tunes”
(2003, p. 94). While the use of robotic technology in this simulation is not
strictly required – one could internalise the processes of listening and repro-
ducing in a system – it does allow exploration of the constraints imposed
by physicality when listening to and reproducing musical patterns, factors
surely mediating memetic (mis)transmission.282 The simulation indicated
that “agents learn by themselves how to correlate perception parameters
(analysis) with production (synthesis) ones and they do not necessarily need
to build the same motor representations for what is considered to be per-
ceptibly identical . . . . The repertoire of tunes emerges from the interactions
of the agents, and there is no global procedure supervising or regulating
them; the actions of each agent are based solely upon their own evolving
expectations” (2003, p. 97). The strong tendency towards convergence on a
shared repertoire of melodies evident in this simulation is not only driven
by social interactions but is also an index of social bonding. In this sense, it
supports hypotheses asserting the importance of group sociality cemented
by shared musical practice in human evolution (§2.5.2).

In the third (purely cultural-evolutionary) simulation, Miranda et al. (2003)
extend the iterated learning approach discussed in §6.3 in order to apply
it to music. They simulate the evolution of syntax in short “compositions”
based on the concatenation of members of a set of nine melodic “riffs” trans-
mitted through a bottleneck between “teacher” and “learner” agents. The
association of two riffs arbitrarily engenders one of twenty-four “emotions”
(i.e., “emotion(riff,riff)”), which can be recursively embedded (e.g.,
“emotion(riff,emotion(riff,riff))”). Two such “emotion-structures” give rise to one
of eight “moods” (2003, pp. 101–102). The outcomes of the simulations using
this model were consistent: it was observed that

[t]he learners are constantly seeking out generalizations in their input.
Once a generalization is induced, it will tend to propagate itself because
it will, by definition, be used for more than one meaning. In order for
any part of the musical culture to survive from one generation to the
next, it has to be apparent to each learner in the randomly chosen 200
compositions each learner hears. A composition that is only used for
one meaning and is not related to any other composition can only be
transmitted if the learner hears that composition in its input. Musical
structure, in the form of increasingly general grammar rules, results in a
more stable musical culture. The learners no longer need to learn each
composition as an isolated, memorized piece of knowledge. Instead, the

learners can induce rules and regularities that they can then use to create
new compositions that they themselves have never heard, yet still reflect
the norms and systematic nature of the culture in which they were born.
(Miranda et al., 2003, p. 106)

282 The second simulation is described as being based on a “mimetic” model (§3.3.1). While
Miranda et al. (2003, p. 92) invoke the concept of memes only once, all three simulations draw
implicitly on the concept of particulate, culturally transmitted replicators.

This simulation thus offers further support for the hypothesis that sound-
systems move from holism towards increasing compositionality as a result
of cultural transmission through a learner bottleneck. To reiterate Kirby’s
central point in the passage quoted on page 480, it shows that “[a] holistic
mapping between a single meaning and a single string will only be transmit-
ted if that particular meaning is observed by a learner. A mapping between
a sub-part of a meaning and a [segmented, protemic] sub-string [a riff, in
the case of Miranda et al. (2003)] on the other hand will be provided with
an opportunity for transmission every time any meaning is observed that shares
that sub-part” (2013, pp. 129–130; emphasis in the original). Unlike Kirby’s
models, of course, the third simulation in Miranda et al. (2003) uses music
as its substrate for the association with meaning-states and not language –
a distinction that seems to dissolve in the light of such ILM simulations. In
fostering the origin of compositionality in any type of sound-stream, whether
one chooses to conceive of it as musical or linguistic, such simulations af-
ford evidence for Merker’s (2012) account of the evolution of compositional
language from musilinguistic vocalisations (points 12 and 13 of the list on
page 147). Moreover, while post-bifurcation music indicates that composi-
tionality is not necessarily associated with referentiality, the third simulation
supports the notion of a semantic association between the resulting composi-
tional protemes and specific extra-musical phenomena, in this case, emotions
(points 15 and 16 of the list on page 148). As discussed in §3.8.5, this associ-
ation might also obtain with musemes as the sonic replicator.
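The riff/emotion/mood representation used in this third simulation can be sketched as a small recursive grammar. The labels below are placeholders for the nine riffs, twenty-four emotions and eight moods of the original; the embedding probability and depth limit are my own assumptions:

```python
import random

RIFFS = [f"riff{i}" for i in range(1, 10)]        # nine melodic riffs
EMOTIONS = [f"emotion{i}" for i in range(1, 25)]  # twenty-four emotions
MOODS = [f"mood{i}" for i in range(1, 9)]         # eight moods

def emotion_structure(rng, depth=0):
    """emotion(riff,riff), with optional one-level recursive embedding:
    emotion(riff,emotion(riff,riff))."""
    left = rng.choice(RIFFS)
    if depth < 1 and rng.random() < 0.5:
        right = emotion_structure(rng, depth + 1)
    else:
        right = rng.choice(RIFFS)
    return f"{rng.choice(EMOTIONS)}({left},{right})"

def composition(rng):
    """Two emotion-structures give rise to one of the eight moods."""
    return (f"{rng.choice(MOODS)}"
            f"({emotion_structure(rng)},{emotion_structure(rng)})")

piece = composition(random.Random(3))
```

Because every composition is assembled from a small shared inventory of sub-parts, any riff-to-meaning mapping a learner induces is reinforced each time that riff recurs, which is the compositional advantage Kirby identifies.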

These three simulations arguably do not generate particularly interesting
music: their outputs are short, often disjointed, melodic phrases with little
musical character (Miranda, 2003, pp. 104–105, Fig. 8, Fig. 9). This of-
ten seems to be the case with agent-based systems, perhaps because their
primary purpose is less the creation of interesting music – unlike folk-rnn, the
Continuator, or GenJam – and more the testing of hypotheses on the cultural
evolution of musical and linguistic patterns and structures. Thus, while
aesthetically limited, such simulations nevertheless offer strong evidence
for the operation of the VRS algorithm and the associated phenomenon of
emergence. As Levitin explains, “[w]hen biological complexity arises from
simpler forms in small steps, we call it evolution. When a wholly unexpected
property – such as human consciousness – arises from a complex system, we
call it emergence” (2009, p. 269). Links are made between evolution and con-
sciousness in §7.3.2, which argues that consciousness is a form of evolution
and that evolution is a form of consciousness. Agent-based simulations of
music and language evolution not only afford evidence in support of these
links – rendering them tangible in a relatively short time-scale – but they
also suggest that, contrary to Levitin’s implicit saltationism, the distinction
between evolution and emergence is gradualistic, one of degree, not kind.

6.5.4 Hybrid Systems


This category encompasses two main sub-categories: (i) the combination of
two or more of the techniques considered in the previous sections, which, in
the systems discussed so far, were deployed in isolation; and (ii) the combin-
ation of one or more music-generation algorithms with phenomena in other
media (usually the visual realm), these non-musical elements sometimes
also being generated algorithmically.

6.5.4.1 Multi-Algorithm Systems

By this category is meant those systems that combine two or more of the
generative approaches considered separately above, the output of one algo-
rithm becoming the input to another. This is a common strategy in music
generation, the rationale being that quality-enhancing synergies may result
from the yoking of algorithms. Two recent multi-algorithm systems adopt
essentially the same strategy: they generate music using a GA (§6.5.3.2) and
then they filter the GA’s output using a neural network (§6.5.1.2). Specific-
ally, the network is trained on a dataset of HGM in order to act as the fitness
function.

Mitrano et al. (2017) used a GA based on that underpinning Biles’ GenJam
(§6.5.3.2) to generate monophonic solos using a MIDI representation. They
then utilised an RNN that forms a component of Google’s Magenta software
– an open-source project exploring deep learning techniques in visual art
and music (Google, 2021) – as the fitness function. Specifically, they used
Magenta’s Improv RNN, which is pre-trained on a large dataset of melodies,
in order to use what it had learned as a filter for the melodies generated by
the GA component of their system. In studies comparing human-judgement
fitness functions with the Improv RNN-based fitness function, Mitrano et al.
(2017) found that the latter offered a more consistent, efficient and parsimo-
nious assessment of fitness (Mitrano et al., 2017, pp. 4–6). Indeed, “although
the conventions of functional tonal harmony are not explicitly encoded in
Improv RNN, it is able to recognize basic triadic and diatonic hierarchical
weightings that correspond to those conventions” (2017, p. 4). Improv RNN
was thus able to assess (select) music in terms of the kinds of learned but
innately shaped pitch representations that are formalised, for instance, in
Krumhansl and Kessler (1982) and Krumhansl (1990) and that, as humans
do, it had abstracted from its training set.
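The general architecture – a GA whose fitness function is delegated to a pre-trained model – can be outlined as below. The scorer is a toy stand-in that merely rewards diatonic pitches, whereas Improv RNN is a large network trained on a melody corpus; the population sizes and mutation scheme are likewise illustrative:

```python
import random

def trained_scorer(melody):
    """Stand-in for a pre-trained network's fitness judgement. Here it
    simply rewards pitches in the C major scale; Improv RNN instead
    applies hierarchical weightings learned from a large melody corpus."""
    scale = {0, 2, 4, 5, 7, 9, 11}   # C major pitch classes
    return sum(1 for p in melody if p % 12 in scale) / len(melody)

def mutate(melody, rng):
    """Vary one pitch at random (the GA's variation operator)."""
    varied = list(melody)
    varied[rng.randrange(len(varied))] = rng.randrange(48, 72)
    return varied

def ga(generations=30, pop_size=20, length=8, seed=0):
    """A GA whose selection step is performed by the 'trained' scorer."""
    rng = random.Random(seed)
    pop = [[rng.randrange(48, 72) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=trained_scorer, reverse=True)   # selection
        survivors = pop[: pop_size // 2]             # replication
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in survivors]       # variation
    return max(pop, key=trained_scorer)

best = ga()
```

Because the top half of each generation survives unchanged, fitness under the learned criterion can never decrease, which is the "bootstrapping" the yoked architecture relies on.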

In an analogous approach, Farzaneh and Toroghi (2020) present a melody-
generation system that filters the output of a GA, seeded with a database of
folk melodies in ABC representation, through an LSTM acting as the fitness
function. Specifically, their network is a bi-directional LSTM (Bi-LSTM), which
is “an LSTM whose parameters represent the forward and backward correla-
tions of the adjacent notes or frames of the musical signal . . . ” (2020, p. 2).
Whereas Mitrano et al. (2017) run and test human-based and RNN-based
fitness functions in parallel, in a two-stage process, Farzaneh and Toroghi
(2020) deploy them in series, in a three-stage process: the outputs of their
GA are first evaluated by human judges, and then the most highly rated
melodies are fed into the Bi-LSTM in order to train it to serve as a fitness
function. Thus, unlike the use of Improv RNN in Mitrano et al. (2017), which
has already been trained on a musical dataset, the Bi-LSTM in Farzaneh and
Toroghi (2020) learns what (some) humans find desirable on the basis of
their “training” on a musical dataset. This means that, in the former system,
the ANN-as-fitness-function indirectly captures human preferences (as they
have played out over extended time-frames in the production and reception
of music) whereas, in the latter, they are more directly (but perhaps more
narrowly) represented.

It is clear from both these studies – an issue taken up again in §6.6.2 – that
significant quality enhancements accrue from using what are essentially
Darwinian algorithms in tandem: a GA is explicitly Darwinian, because it
operationalises the VRS algorithm in code; an ANN is implicitly Darwinian,
because (as noted in §6.2) it takes varied input data and selects certain of the
patterns detected therein for replication in its output. A further extension
of multi-algorithm systems – perhaps one implicit in the approach pursued
by Mitrano et al. (2017) – is that suggested by Collins, whereby, in a “future
feedback loop, . . . output algorithmic compositions are created by systems
trained on real musical examples, and algorithmic outputs may in turn
become the next generation of available music [for training]” (2018, pp. 11–
12, Fig. 1). In this way, machines might be able to escape the constraints of
human taste-cultures and establish their own independent frameworks for
evaluation.
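Collins's proposed feedback loop can be caricatured with a first-order Markov model retrained on a corpus that grows to include its own outputs; this is a deliberately minimal illustration of the loop, not a description of any existing system:

```python
import random
from collections import defaultdict

def train(corpus):
    """Build a first-order Markov table of note-to-note transitions."""
    table = defaultdict(list)
    for melody in corpus:
        for a, b in zip(melody, melody[1:]):
            table[a].append(b)
    return table

def generate(table, start, length, rng):
    """Sample a melody from the learned transitions."""
    melody = [start]
    for _ in range(length - 1):
        melody.append(rng.choice(table[melody[-1]]))
    return melody

rng = random.Random(7)
corpus = [["C", "D", "E", "C"], ["E", "D", "C", "D"]]
for _ in range(3):   # each round's output joins the next training corpus
    table = train(corpus)
    corpus.append(generate(table, "C", 4, rng))
```

Each iteration's "algorithmic outputs" become part of the "next generation of available music", so later models are increasingly shaped by machine-made rather than human-made exemplars.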

6.5.4.2 Multimedia Systems

This category refers to a generative algorithm that produces music to
accompany another medium, most usually moving images, that carries some form
of narrative. Such systems have primarily been developed to provide music to
accompany video games (Plut & Pasquier, 2020), but virtual reality (VR) and
augmented reality (AR) are other candidate application domains.283 In such
environments the system is not simply generating music as a stand-alone
output, as in folk-rnn or Iamus. Rather – in a significant augmentation of the
interactive performance dynamic of systems such as the Continuator and Gen-
Jam – it is generating music in response to a rapidly changing context whose
twists and turns result from the user’s responses to real-time situations and
myriad choice-points. Clearly such systems must not only produce internally
coherent music, but they must also respond to the kinetic and affective states
implied by the visual dimension and its objects and protagonists.

283 Such systems have not yet gained a secure foothold in film music, perhaps because of the
higher status and greater economic muscle of film-music composers (certainly when speaking
of the leading figures) compared with composers of video-game music. The increasing cultural
prominence of video-game music (as evidenced by concerts and recordings of this music)
suggests this imbalance may not be permanent.

An example of a system in this category is Kantor, named after the pioneer
of set theory, Georg Cantor (Velardo, 2019). Kantor is an interactive
system, partly inspired by ideas of Xenakis (2001), whereby the music for
nine “islands” is produced by a generative algorithm. Each island is a three-
dimensional space based on a two-dimensional image synthesised using
geometrical-graphical elements – graphemes – derived from the style of the
late abstract paintings by Kandinsky. Using a VR headset, a user moves
through the dissociated elements of the faux-Kandinsky 3D-painting – as
if the painting had been shattered into fragments by an explosion – each
geometrical-graphical element being associated with a specific instrumental
line of the complete generated composition. Movement towards and away
from these floating painting-elements produces a corresponding adjustment
in the configuration of the music, because

[t]he different islands have unique sonic profiles, achieved through
different arrangements of instrumental ensembles. Each geometrical
pattern in an island is an audio source that broadcasts an instrumental
part. The sound in the experience is spatialised and surrounds the player.
The polyphonic music emerges through the interaction between the
music associated with each shape and the player’s position. By flying
across an island, the player can experience infinite, slightly different
implementations of the same piece. The music isn’t static. It’s a living
being that evolves as a function of the player’s position and the dynamic
distances between the geometrical shapes. (Velardo, 2019, sec. 3)
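The position-dependent mix described in this passage can be sketched as a distance-based gain applied to each shape's audio source. The inverse-distance law, the rolloff constant and the shape coordinates are illustrative assumptions, not Kantor's actual audio engine:

```python
import math

def gain(listener, source, rolloff=1.0):
    """Inverse-distance attenuation: nearer shapes sound louder."""
    return 1.0 / (1.0 + rolloff * math.dist(listener, source))

# Each geometrical shape broadcasts one instrumental part from a point
# in the island's 3-D space (coordinates invented).
shapes = {
    "circle": (0.0, 0.0, 0.0),
    "line": (4.0, 0.0, 0.0),
    "square": (0.0, 8.0, 0.0),
}

def mix(listener):
    """Per-part gains for the player's current position: flying through
    the island continuously re-weights the polyphonic texture."""
    return {name: gain(listener, pos) for name, pos in shapes.items()}

near_circle = mix((0.5, 0.0, 0.0))
```

Because the gains vary continuously with position, every flight path yields a "slightly different implementation of the same piece", as the quotation puts it.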

This form of intra-game “evolution” is more metaphorical than literal: a
piece is assembled extra-game in the initial generative process, and then
the player experiences different sonic perspectives on that piece – its meta-
morphoses – during their “flight” within the image. In fact, Kantor does not
use an evolutionary algorithm, relying instead upon “stochastic (random)
mathematical functions to generate musical sequences” – i.e., upon a Markov
model (Velardo, 2019, sec. 3). Nevertheless, a true form of evolution is to
be found in the extra-game, post-generation adjustments undertaken by the
programmers before the music is deemed finalised. These “re-appropriate
the creative process by polishing the generated material”. At a more sys-
temic level, the programmers “take an educated guess of what might not be
working in the generative system, based on its creative output. As a result
of this diagnostic phase, we change the initial instructions. In more radical
circumstances, we tweak the code of the system to implement our desiderata”
(Velardo, 2019, sec. 1). These extra-game processes afford the variation (via
“polishing”, reconfiguration and recoding), selection (via determination of
the optimum “tweaks”), and replication (via re-incorporation of the selected
elements into the system) at the work level and the system level necessary
for the VRS algorithm to bootstrap quality in this domain (see also Figure
6.14 and the associated discussion).

One significant dimension of Kantor’s operation is its invocation of a
graphical equivalent to the linguistic principle of compositionality, whereby “[t]he
meaning of a Kandinsky painting emerges through the interaction . . . of
single patterns, such as lines, circles and squares. When considered indi-
vidually, these patterns don’t display much artistic quality. . . . However,
when enough patterns are wisely composed together, an aesthetic quality
emerges” (Velardo, 2019, sec. 2). In linguistic compositionality, structures
and meanings arise from the assembly of lexemes in syntactic-semantic
configurations; in musical compositionality, musemes assemble to generate
musemeplexes and musemesätze; and in image-based compositionality, the
wider structure and sense of a painting emerges from the association of its
component graphemes. In Kantor, the image-based compositionality gives
rise to a form of musical compositionality in which sound-layers recombine
horizontally-polyphonically – as when one hears the ever-changing com-
binations of sounds in a cityscape as one moves through it – as opposed
to the vertical-paratactic recombination hypothesised to underpin musical
generation in the RHSGAP model (§3.5.2).

Figure 6.11 shows a human-generated image in the style of late Kandinsky
(Figure 6.11a), and the associated Kantor-generated music (“Kantor #8”,
subtitled “mystery”) (Figure 6.11b).

The transcription in Figure 6.11b – a notation of the two-part stratum/track
“mystery_1” – does not do the music justice because, as noted, it is only one
stratum – a graphical-object-associated instrumental part – of a complex
polydimensional texture. In combination, the lines comprising Kantor #8
create a Gamelan-like texture, with subtly shifting and luminously oscillating
tuned-percussion sonorities. The extract shown rises chromatically through
a major third from a (mis-notated) implied tonic A♭ major to its mediant
C major. This is usually accomplished here by a straightforward semitonal

(a) Faux-Kandinsky Image.

(b) Associated Output of Kantor.

Figure 6.11: Sample Output of Kantor.


ascent connecting the stages from I to III, but there is a neo-Riemannian
L-operator (Leittonwechsel) shift that breaks the pattern. Enabled by resolving
the A major of b. 4 to its local tonic, D minor, b. 5, the fall of a major third to
the mid-point of the ascent, B♭, implements the L-operator (D− → <L> → B♭+;
with an implied upper-voice motion from a1–b♭1 across bb. 5–6), from which
point the semitonal ascent to C major resumes. Heard in conjunction with
the other instrumental strata, however, there are certain contradictions to the
progression just outlined. In fact, the harmony of b. 5 is, en bloc, B♭ major,
not D minor (a B♭ is added to the D and F by other strata), and so there is
no global L-operator, only a local, stratum-specific one, and therefore only
a more conventional chromatic ascent. Nevertheless, as an antidote to this
relatively uninventive harmonic framework, the B-minor harmony implied
for most of b. 7 is in fact subsumed into a diminished-seventh chord. Its
f♯ therefore implies a half-diminished-seventh chord (G♯–B–D–F♯) abutting
dissonantly with the full-diminished-seventh (G♯–B–D–F♮) of the complete
texture. A trace of this more complex harmony is afforded by the g♯ of b. 7⁴.
Another example of harmonic blending is the elision of the two final chords
(the half/full-diminished-seventh and the C major), with elements of the
former seventh chord persisting into b. 10. The same process of juxtaposition
explains the g♯1 in b. 4, which, while only appearing in this stratum on the
third beat, is present throughout bb. 3–4 in the full “picture”, as a suspension
of the root of the previous G♯/A♭ harmony.
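The Leittonwechsel shift invoked in this analysis can be verified on pitch-class sets. The function below uses the standard neo-Riemannian definition of L (a major triad's root falls a semitone; a minor triad's fifth rises one), with pitch classes as integers mod 12, and confirms the D minor → B♭ major move:

```python
def l_operator(triad):
    """Leittonwechsel (L) on a consonant triad of pitch classes (mod 12):
    a major triad's root falls a semitone; a minor triad's fifth rises
    one. L is an involution: applying it twice restores the triad."""
    for r in sorted(triad):
        if {(r + 4) % 12, (r + 7) % 12} <= set(triad):   # major triad on r
            return {(r - 1) % 12, (r + 4) % 12, (r + 7) % 12}
        if {(r + 3) % 12, (r + 7) % 12} <= set(triad):   # minor triad on r
            return {r, (r + 3) % 12, (r + 8) % 12}
    raise ValueError("not a consonant triad")

D_MINOR = {2, 5, 9}        # D, F, A
B_FLAT_MAJOR = {2, 5, 10}  # B-flat, D, F
assert l_operator(D_MINOR) == B_FLAT_MAJOR   # D- -> <L> -> B-flat+
```

The two triads share D and F; only the A/B♭ semitone distinguishes them, which is why the operator effects so smooth a break in the chromatic ascent.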

6.6 Machine Creativity


In addition to its advancement of computer science, the computer analysis
and synthesis of music is an important tool for music psychology, offering a
powerful means of developing and testing models of music perception, cogni-
tion and generation. Boden (2004) has explored the last of these, considering
to what extent machines might possess the faculty of creativity – and thus
give rise to processes or outputs that manifest it – and how this might illu-
minate our understanding of human creativity. In this sense Boden’s model
is orientated squarely around human creativity, and – while a formulation
challenged to some extent in §6.6.2 – machine creativity is understood as
to some extent parasitic on it. However, as §5.5 has argued, creativity can
be understood in (Universal-)Darwinian terms, and thus while it is seen by
many as a purely human attribute, there is a case for extending its reach to
encompass the “outputs” (the vocalisations) of animals and – as is implicit
in much of the foregoing discussion – those of machines. Thus, beyond
using machines to shed light on human creativity, the generative power of
the systems surveyed in this chapter, together with the many others that
constraints of space meant could not be considered, implies that their outputs
should be judged on their own terms, as potentially aesthetic objects. If this
is accepted, it follows that there is a need for the evaluation of the outputs
of music-generative systems, and indeed of systems designed to generate
artistic outputs in other domains. Moreover, there is a higher-level need for
a meta-critical evaluation of the methodologies for evaluation of the outputs of
music-generative systems.

6.6.1 Can Machines be Creative?


The issue of computational creativity relates to the second and the fourth of
the four “Lovelace-questions” – named after the nineteenth-century math-
ematician Lady Ada Lovelace (1815–1852) – identified by Boden:

The first Lovelace-question is whether computational ideas can help us
understand how human creativity is possible. The second is whether
computers (now or in the future) could ever do things which at least
appear to be creative. The third is whether a computer could ever appear
to recognize creativity – in poems written by human poets, for instance.
And the fourth is whether computers themselves could ever really be cre-
ative (as opposed to merely producing apparently creative performance
whose originality is wholly due to the human programmer). (Boden,
2004, pp. 16–17; emphases in the original)

Questions two and four are, respectively, what might be termed “hard” and
“soft” versions of the Turing Test. Question two relates to the ability of a
computer to fool a human observer that a piece of music (for instance) it
produced is the work of another human, as in the case of tests of Cope’s EMI
via The Game discussed on page 491 (but see the critique of applications
of the TT to non-natural-language media by Ariza (2009), raised in §6.1 and
again in connection with The Game). Question four relates to the same
ability, but replaces the “smoke and mirrors” of question two with real magic.
Even though these two questions concern putative examples of non-human
creativity, they aim ostensibly to model processes and perceptions operative
in human music-making.284 Whether the fruits of computers’ electronic
labours deserve credit as “honorary” human music, whether we consider
them relativistically as a kind of extended “animal music”, or whether we
consider them as sui generis, is nevertheless a moot point.

While Boden’s (2004) approach is admirably rigorous, it is impeded by the
fundamental problem that whereas the development of algorithms for the
machine-generation of music is objective, scientific and tangible, creativity
is, by contrast, subjective, humanistic and intangible, and thus there is a
fundamental dissociation between the two domains. There have been several
attempts to define creativity (Runco, 2014), including Boden’s own categor-
ies of P- and H-creativity and, within these, combinational, exploratory and
transformational creativity (2004, pp. 3–6) (§5.5.1). All are compromised,
however, by the slippery intersubjectivity of creativity: no two people will
necessarily agree on what constitutes creativity, and there is no fixed stand-
ard by which someone’s experience or qualifications allow them to trump
the assessments of others. Moreover, this intersubjectivity combines both
analogue (graded) and digital (all-or-nothing) judgements: one can deem
something (or some component part of something) to be creative or not; and
one can also entertain judgements of something’s (or some component part
of something) being more or less creative than something else.

The title of the following section implies that the answer to the question at
the head of the present section is in the affirmative but, as the foregoing
discussion indicates, the issue is not clear cut: leaving aside the problem of
(inter)subjectivity, one could develop and test methodologies for the evalu-
ation of machine creativity and find that, judged in their light, there is no
such creativity evident in the sampled processes or outputs. While this does
not necessarily verify that there is no such thing as machine creativity (or
falsify its existence) – the evaluation methodology, or the interpretation of
its outcomes, might be at fault; or the chosen sample might not demonstrate
creativity, however defined – it at least gives one a starting point for develop-
ing models for evaluation and for reflexively refining the generative systems
themselves. Adopting the standpoint on creativity articulated in §5.5.2, if
Darwinism represents a form of creativity (defined as combinational, explor-
atory or transformational operations conducted to locate solutions within a
problem space); and if a generative system wholly or partly implements the
VRS algorithm (whether because of its evolutionary architecture or in spite
of its ostensibly non-evolutionary architecture); then that system has at least
the potential to be creative and thus its resulting CGM embodies (or tokens)
that creativity, whether or not the latter manifests itself in ways that align
with the more circumscribed notions of creativity – shaped by culturally
evolved notions of musical style, structure and genre – that generally attend
HGM.

284 Lovelace indeed believed that computers might eventually be able to generate music,
arguing that if “the fundamental relations of pitched sounds in the signs of harmony and of
musical composition were susceptible of such [numerical] expressions and adaptations, the
[computer] might compose elaborate and scientific pieces of music of any degree of complexity
or extent” (in Herremans & Sörensen, 2013, p. 6427).
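The standpoint adopted above – creativity as combinational, exploratory or transformational search of a problem space, driven by the VRS algorithm – can be made concrete with a deliberately minimal sketch. The following code is purely illustrative and reconstructs no system discussed in this chapter: the pitch material, the mutation rate and the “value” function (which here simply rewards stepwise motion) are hypothetical placeholders for whatever variation and selection criteria a real system would implement.

```python
import random

random.seed(0)

PITCHES = list(range(60, 72))  # one chromatic octave, as MIDI note numbers


def value(seq):
    """Hypothetical selective criterion: reward stepwise motion (a stand-in
    for whatever fitness measure a real system would apply)."""
    return -sum(abs(a - b) for a, b in zip(seq, seq[1:]))


def vary(seq, rate=0.2):
    """Variation: randomly mutate some pitches."""
    return [random.choice(PITCHES) if random.random() < rate else p for p in seq]


def vrs(pop_size=30, length=8, generations=50):
    # Initial population of random pitch sequences.
    pop = [[random.choice(PITCHES) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: retain the better half of the population...
        pop.sort(key=value, reverse=True)
        survivors = pop[: pop_size // 2]
        # ...Replication with Variation: refill from mutated copies of survivors.
        pop = survivors + [vary(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=value)


best = vrs()
print(best, value(best))
```

Whatever fills the value “slot”, the logic of the loop is constant: variation proposes candidate solutions, selection filters them, and replication carries the survivors forward – the VRS algorithm in miniature.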

6.6.2 The Evaluation of Machine Creativity


There are several dimensions according to which machine (and human)
creativity might be evaluated. While constraints of space prevent a compre-
hensive treatment, Jordanous (2012, Ch. 2), Loughran and O’Neill (2017)
and Meraviglia (2020, pp. 16–17) offer overviews of issues and literature
in this area.285 The principal issues in this field are summarised under the
following three rubrics and discussed more fully in what follows. Note that
the terms of each rubric are non-exclusive, existing as points on a continuum
rather than as binarisms.

1. The Ontology of Evaluation: Should evaluation focus upon some abstraction of
the creativity of the system’s processes, or instead upon one or more of its
concrete outputs, as tokens of those processes? If the latter, does their ontology
affect their evaluation? To what extent is the creativity of a system or its outputs
an analogue or digital property?

2. Qualitative versus Quantitative: Should system outputs be judged qualitatively
or quantitatively, both according to a set of evaluation criteria? While they
are not coterminous, qualitative assessments imply the predominance of sub-
jective over objective factors, whereas the converse is the case for quantitative
assessments.
3. Intra- or Extra-Human Perception and Cognition: Should only those outputs of
music-generative systems that accord with the constraints of human perception
and cognition be evaluated, or should those outputs that transcend those limits
also be admitted?

285 Oft-cited evaluation methodologies include Ritchie’s criteria for creativity (2007), Colton’s
“creative tripod” model (2008), the FACE and IDEA models (Colton et al., 2011; Charnley
et al., 2012), Jordanous’s own SPECS framework (2012), considered below, and the Apprentice
Framework (Negrete-Yankelevich & Morales-Zaragoza, 2014).

Apropos point 1, it may be the case that a system demonstrates localised
or generalised instances of creativity in its processes, but that its generated
outputs are themselves of limited creative value – however system- or output-
level creativity is defined, captured and assessed. This issue inheres partly
in the wider question of the ontology of the outputs of music-generative
systems. That is, evaluation is partly contingent upon the extent to which a
system’s outputs align with the two dominant models of music in human
culture: music-as-work and music-as-process – Taruskin’s (1995) distinction
between text and act (§1.3, §2.1, §5.1). In the former (musicological) case,
a minority of human music (“classical” or “art”) is organised into objects
(works), is preserved and transmitted by elaborate notational systems, and
is aestheticised in a canonic discourse. In the latter (ethnomusicological)
case, the vast majority of human music (“world” and “popular”) exists as a
process, is preserved in memory and transmitted orally, and serves an array
of social functions, religious, political and personal. When the outputs (as
distinct from the generative processes) of machines are assessed as poten-
tially creative, criteria of value may differ according to whether these outputs
exist as texts or acts, in Taruskin’s sense. When outputs come under the
category of music-as-work (texts), and whatever specific ontological model
one applies to them (Puy, 2017; §5.5.2), there is a predominant focus upon
the synchronic factors of large-scale (global) structural coherence, carefully
controlled handling of repetition and variation, and long-term/cumulative
effect. As with early analytical and critical responses to Beethoven’s music –
which attempted to discern an overarching order that transcended the local
moments of seeming discontinuity – evaluation of CGM in this category
reads it in the light of criteria and values derived from the reception of the
canonic masterworks of HGM (§4.4.1). When outputs come under the cat-
egory of music-as-process (acts, this distinct from the generative-creative
acts/processes of the parent system), the focus is upon the diachronic factors
of moment-to-moment (local) narrative coherence, appropriateness of con-
sequents in relation to their motivating antecedents, and short-term, hedonic
affect/effect. Criteria of value for music in this category might also include
the extent to which the system motivates a desire in a human musician to
collaborate with it (Kalonaris, 2018, p. 2).

Apropos point 2 of the list on page 541, extant approaches to creativity-
evaluation may be variously qualitative or quantitative. One very widely
used qualitative approach is the Consensual Assessment Technique (CAT)
(Amabile, 1982; refined in Hennessey et al., 2011). This is used “for the
assessment of creativity and other aspects of products, relying on the inde-
pendent subjective judgments of individuals familiar with the domain in
which the products were made” (Hennessey et al., 2011, p. 253). At the risk
of oversimplification, the CAT essentially relies on intersubjective expert-
ise as the arbiter of judgements on creativity. This majoritarian-relativist
view is understood in Darwinian terms at the end of this section; but a
minoritarian-absolutist might argue against it, on the grounds that the wis-
dom of (educated) crowds might nevertheless be trumped by the greater
wisdom of the (even more educated) individual. The majoritarian-relativist
counter-argument to this stance is that the evaluation of creative artifacts
is not scientific, and thus the aesthetic value of an art-object is a function
of what the majority view finds valuable, just as the economic value of a
commodity or a service inheres in what the majority of purchasers would
be prepared to pay for it.
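As a crude sketch of how the CAT’s reliance on independent subjective judgements might be operationalised, the following pools hypothetical ratings from several judges. The judges, pieces, 1–5 scale and the use of the standard deviation as a proxy for consensus are my own illustrative assumptions, not part of Amabile’s or Hennessey et al.’s protocol.

```python
from statistics import mean, stdev


def consensual_assessment(ratings_by_judge):
    """Pool independent creativity ratings (here on a hypothetical 1-5 scale)
    for each product; report the mean and, as a crude proxy for the degree of
    consensus, the standard deviation across judges."""
    products = ratings_by_judge[0].keys()
    report = {}
    for p in products:
        scores = [judge[p] for judge in ratings_by_judge]
        report[p] = (round(mean(scores), 2), round(stdev(scores), 2))
    return report


# Three hypothetical domain-familiar judges rating two machine-generated pieces.
judges = [
    {"piece_A": 4, "piece_B": 2},
    {"piece_A": 5, "piece_B": 2},
    {"piece_A": 4, "piece_B": 3},
]
print(consensual_assessment(judges))
```

A high mean with a low spread would signal a relatively strong intersubjective consensus that a piece is creative; a high spread would signal precisely the “slippery intersubjectivity” discussed above.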

Perhaps the most widely cited quantitative methodology of recent years
is the Standardised Procedure for Evaluating Creative Systems (SPECS)
framework developed by Jordanous (2012, p. 138, Tab. 5.1), which identi-
fies fourteen “components” or “building blocks” of creativity in terms of
which a system’s processes and outputs might be judged. The components
of this “ontology of creativity” are (in Jordanous’s alphabetical listing): (i)
“active involvement and persistence”; (ii) “generation of results”; (iii) “deal-
ing with uncertainty”; (iv) “domain competence”; (v) “general intellectual
ability”; (vi) “independence and freedom”; (vii) “intention and emotional
involvement”; (viii) “originality”; (ix) “progression and development”; (x)
“social interaction and communication”; (xi) “spontaneity/subconscious pro-
cessing”; (xii) “thinking and evaluation”; (xiii) “value”; and (xiv) “variety,
divergence and experimentation” (2012, pp. 118–120, Fig. 4.7). Processes
and outputs need not score highly on all these components in order to qualify
as creative; indeed, two components in particular are difficult to align with
generative systems: component (vii) (“[p]ersonal and emotional investment,
immersion, self-expression, involvement in a process; [i]ntention and desire
to perform a task, a positive process giving fulfilment and enjoyment” (2012,
p. 119)) implies a degree of physicality lacking in computer systems; and com-
ponent (x) (“[c]ommunicating and promoting work to others in a persuasive,
positive manner; [m]utual influence, feedback, sharing and collaboration
between society and individual” (2012, p. 119)) is only possible in facsimile
via agent-based systems (§273). The two criteria minimally constitutive of
creativity (the necessary conditions) given on page 450 – novelty and value –
correspond to Jordanous’s components (viii) (“originality”)/(xiv) (“variety,
divergence and experimentation”) and (xiii) (“value”), respectively.
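One might operationalise Jordanous’s framework as a per-component profile, as in the following sketch. The fourteen component names are quoted from the listing above; the numerical ratings, the equal weighting, and the 0.5 threshold applied to the two minimally constitutive components (originality and value) are hypothetical choices of my own, not part of SPECS itself.

```python
SPECS_COMPONENTS = [
    "active involvement and persistence", "generation of results",
    "dealing with uncertainty", "domain competence",
    "general intellectual ability", "independence and freedom",
    "intention and emotional involvement", "originality",
    "progression and development", "social interaction and communication",
    "spontaneity/subconscious processing", "thinking and evaluation",
    "value", "variety, divergence and experimentation",
]


def specs_profile(scores, necessary=("originality", "value"), threshold=0.5):
    """Given per-component scores in [0, 1], report an (equally weighted) mean
    profile score and whether the two minimally constitutive components clear
    a hypothetical threshold."""
    assert set(scores) == set(SPECS_COMPONENTS)
    overall = sum(scores.values()) / len(scores)
    meets_minimum = all(scores[c] >= threshold for c in necessary)
    return overall, meets_minimum


# A hypothetical system: adequate on most components, but weak on the two
# components the text identifies as hard to align with generative systems
# (components (vii) and (x)).
ratings = {c: 0.7 for c in SPECS_COMPONENTS}
ratings["intention and emotional involvement"] = 0.1
ratings["social interaction and communication"] = 0.2
overall, ok = specs_profile(ratings)
print(round(overall, 2), ok)
```

The point of the profile representation is that a system can score poorly on some components (here, (vii) and (x)) while still satisfying the necessary conditions of novelty and value.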

While broadly framed as a quantitative approach, it is nevertheless clear
that determining the degree to which a particular component of Jordanous’s
framework is satisfied in a system’s processes or outputs is inherently qual-
itative and thus subjective. Given this, it is arguably not possible to verify
or falsify a claim made apropos a process in or object of HGM or CGM in
relation to a specific framework component (2012, pp. 36–37). Jordanous
sidesteps this objection by maintaining that “[t]here are . . . a number of
differences between SPECS and scientific method, largely due to how SPECS
handles the non-scientific and dynamic nature of computational creativity”
(2012, p. 157). Thus, both computational creativity and the SPECS frame-
work are a complex mixture of scientific and non-scientific components, and
the latter’s subjectivity obscures the objective treatment of the former. While
their own claims to objectivity are arguably overstressed – see the “final con-
sideration” of §4.4.3, on page 360 – music theory and analysis offer a means
by which the evaluation of the processes and outputs of music-generative
systems might be supported. Specifically, they can add a quasi-scientific
backstop to Jordanous’s criteria (iv), (v), (viii), (ix), (xiii) and (xiv) in ways
that might indeed allow for specific claims to be verified or falsified (§6.6.3).

Apropos point 3 of the list on page 542, another difficulty with the evaluation
of machine creativity is the potential for computers to transcend human
psychological constraints (Lerdahl, 1992). This may lead to the production
of music that is partly or wholly beyond the perceptual-cognitive grasp of
humans, such that the music is regarded, in the extreme, as noise.286 This
potential has already been realised by some human composers: the “New
Complexity” school, as represented by the music of Brian Ferneyhough and
Michael Finnissy, arguably illustrates the tendency most clearly. Such music,
human- or computer-generated, would occupy what Velardo terms “Region
Three” of the “Circle of Sound” (2014, pp. 15–17). As represented in Figure
6.12, after (Velardo, 2014, p. 16, Fig. 1), this circle contains everything that
might be regarded as “music” – in itself a problematic concept. “Region One”
contains low-complexity music that entirely respects human perceptual-
cognitive constraints (2014, pp. 15–16). “Region Two” encompasses higher
complexity music that requires some degree of training or knowledge – im-
plying, therefore, a “competent, experienced listener” (Meyer, 1973, p. 110) –
to appreciate its complexities fully (2014, p. 16). Region Three encompasses
music that, on account of its violation of our perceptual-cognitive constraints,
is too complex for the human mind to perceive and cognise – if not (pace Fer-
neyhough and Finnissy) to generate – and that might reasonably be assumed
to be an inevitable product of computer, as opposed to human, creativity
(2014, p. 17). Separated from Region Two by a “Horizon of Intelligibility” rep-
resented by the thick black line in Figure 6.12 (2014, pp. 16–17), Region Three
is potentially the largest, on account of its freedom from the relatively tight
constraints operating upon human perception and cognition (Figure 6.12 is
not drawn to scale; the dotted line at the outer circumference represents the
potentially infinite size of Region Three).
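As a toy quantification of Velardo’s model, the following sketch maps a pitch sequence to one of the three regions, using first-order pitch entropy as a crude stand-in for complexity. Both the entropy measure and the numerical region boundaries – including the placement of the “Horizon of Intelligibility” between Regions Two and Three – are hypothetical simplifications of my own; nothing in Velardo (2014) specifies them.

```python
from collections import Counter
from math import log2


def pitch_entropy(seq):
    """First-order entropy (bits per symbol) of a pitch sequence: a crude,
    purely illustrative proxy for musical complexity."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def region(seq, horizon=(1.5, 3.0)):
    """Map entropy to the three regions of the 'Circle of Sound'. The numeric
    boundaries, including the 'Horizon of Intelligibility' separating Regions
    Two and Three, are hypothetical placeholders."""
    h = pitch_entropy(seq)
    low, high = horizon
    if h < low:
        return 1
    return 2 if h <= high else 3


print(region([60, 60, 62, 60]))          # low-complexity sequence: Region One
print(region(list(range(60, 72)) * 2))   # uniform over 12 pitches: Region Three
```

A real operationalisation would, of course, need a far richer, multiparametric measure of complexity than first-order pitch entropy; the sketch shows only the logic of partitioning a complexity continuum into Velardo’s three regions.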

From this model Velardo and Vallati (2016, p. 11) derive the notions of anthro-
pocentric and non-anthropocentric creativity. The former, occupying Region
One and Region Two of Figure 6.12, encompasses creativity that is by and for
humans. The latter, crossing the Horizon of Intelligibility and occupying Re-
gion Three, encompasses creativity that is beyond human appreciation and
that is therefore restricted to non-human auditors (machines and perhaps
intelligent aliens). Region Three might be accessible to humans if we could
be genetically re-engineered in order to restructure our perceptual-cognitive
apparatus to allow us to process its contents; or – more organically – if we
were to evolve in the light of the selection pressures this music imposes. The
286 For the issue of “noise in and as music”, see Cassidy and Einbond (2013).
[Figure: three concentric circles, with Region 1 innermost, Region 2 surrounding it, and Region 3 outermost.]

Figure 6.12: The “Circle of Sound”.

latter is perhaps unlikely, given that competences deployed in the processing
of Region One and Region Two (anthropocentric) music are likely an exapt-
ation from capacities evolved in response to more urgent, survival-related
demands; and given that, conversely, the inability to perceive and cognise
very complex music is unlikely to put us at a significant survival disadvant-
age in the modern world. Nevertheless, memetic drive might be able to
push genes in the direction of computer-analogous competences, in order
to serve the interests of very complex, Region-Three (non-anthropocentric)
musemes.

Expanding upon the categories of anthropocentric and non-anthropocentric
creativity, Velardo and Vallati (2016, pp. 11–12) arrive at the framework
summarised below:

• Anthropocentric Creativity

Humans for Humans (2H): encompasses the bulk of human creativity and
its entirety before the invention of computers.

Computer-Aided for Humans (CH): relates to the use of computers as an
augmentation system to support human creativity (§6.2).

AI for Humans (AIH): involves technology able to motivate an affirmative
answer to at least the second Lovelace-question (i.e., “whether computers
(now or in the future) could ever do things which at least appear to be
creative”) and ideally the fourth (i.e., “whether computers themselves
could ever really be creative (as opposed to merely producing apparently
creative performance whose originality is wholly due to the human pro-
grammer)”) (Boden, 2004, pp. 16–17; emphases in the original) (§6.6.1).

• Non-Anthropocentric Creativity

AI for AI (2AI): encompasses all creativity that is by machines and that is
comprehensible only to other machines.

Non-anthropocentric creativity presupposes (i) non-anthropocentric
discrimination and (ii) non-anthropocentric taste: (i) is the ability of machines
to distinguish between functional uses of their competences and artistic/aes-
thetic uses – akin to the ability of humans to distinguish between the skills
required to solve a crossword puzzle and those required to write a sonata;
and (ii) is the ability to value their creative outputs (in the light of (i)).287
Taste, as the product of cultural evolution, is itself creative, because it is a
verbal-conceptual memeplex (§3.4) that, like all memes, selfishly “seeks” its
own survival by arriving at solutions that have the effect of expediting its rep-
lication. One way of achieving this is for it to evolve a fit, via coevolution, with
other memes and memeplexes, such as those constituting a particular cultural
phenomenon or product. It is surely no accident that the growth of a public
musical culture in Europe in the late-eighteenth and early-nineteenth cen-
turies was associated with a rise in music criticism (Hoffmann, 1998): both
replicator types – musemes/musemeplexes and verbal-conceptual memes/
memeplexes – were mutually interdependent, because without the musemes
of the musical culture there would be no motivation for the verbal-conceptual
memes of the critical culture; and without the latter the former would not
have been so extensively replicated.

A mechanism for this coevolution was proposed in connection with the
discussion of sexual selection in §2.5.3, offering an account of taste-formation
that does not depend upon the existence of absolute standards of value. To
reiterate as a statement this mechanism, outlined as a question on page 108,

287 Indeed, machines might value the creative outputs of other machines more highly than
those of humans – which they might perhaps regard as hopelessly banal – preferring instead to
dream of electric sonatas (Dick, 1968).

“a culturally transmitted ornament – a particular complement of musemes . . .
– [might] have been associated with a culturally transmitted preference – a
taste-related liking for the ornament represented by those musemes . . . – such
that they existed in a cultural linkage disequilibrium, i.e., in an alignment that
is more consistent than would be expected on the basis of random association
alone”. One implication of this mechanism is that there are no absolute standards
of taste, only relative ones; and thus the judgements that are sustained by a
taste-culture – the “acts” sustained by its “texts” – are subjective, not objective.
One therefore does not necessarily have to be a postmodernist to endorse
relativism: in Universal Darwinism, all assessments are local, contingent
and, ultimately, selfish.

In general, most approaches to the evaluation of machine creativity assume an
anthropocentric perspective, specifically (apropos the framework of Velardo
and Vallati (2016) on page 546) an AIH orientation. This is perhaps not
surprising, given that: (i) humans, for the various reasons outlined above,
aspire to develop systems that can organise sounds in ways that are recog-
nisably musical (i.e., to be as close as possible to HGM); and (ii) 2AI is not,
by definition, subject to the constraints attendant upon (i). Having reviewed
a number of different categories of generative system, is it possible to de-
termine which is/are the most likely to score well on AIH-based rubrics? In
other words, which algorithm type, or which combination of algorithms –
recombination systems, neural networks, Markov models, grammar-based
systems, constraint-satisfaction systems, local-search algorithms, or genetic/
evolutionary algorithms, to recall the categories of §6.5 – is able to produce
the most “realistic” music (according to comparison with some stylistic ex-
emplar(s) of HGM) and/or the most “convincing” music (from a quasi-TT
perspective), from the perspective of a human listener?

While resolving this question depends largely upon which of the approaches
to evaluation discussed above one takes – and how one operationalises it
and assesses its outcomes – it seems that, on balance, a generative strategy
that assimilates music-stylistic norms and pattern vocabularies from extant
corpora and that then bootstraps the outcomes of this learning using selective
processes would likely stand the best chance of producing “AIH-compliant”
CGM. I say this simply because this is how – certainly in the view articulated
in this book – music is generated in human cultures, as formalised by the
operation of the VRS algorithm on musemes. A broad mechanistic alignment
between generative processes might therefore reasonably be assumed to
give rise to a close structural and perceptual-cognitive alignment between
CGM and HGM, leading to a positive evaluation of the former. Thus, neural
network models whose statistical learning of a corpus is subsequently refined
by means of a GA seem, on this logic, the most well suited to producing
“realistic” and “convincing” CGM. But the two systems (admittedly a very
small sample) discussed in §6.5.4.1 – Mitrano et al. (2017) and Farzaneh and
Toroghi (2020) – take the opposite approach: they refine the output of a GA
using a neural network (GA → ANN); indeed, I know of no system that
works the other way round (ANN → GA).

There are three reasons why this apparently sub-optimal sequencing of al-
gorithms is insignificant from a cultural-evolutionary perspective. First, and
as argued in §6.5.4.1, the VRS algorithm is intrinsic to both architectures, ex-
plicitly in the case of the GA and implicitly in the case of the ANN, and so it is
perhaps irrelevant which is invoked first. Secondly, in the systems developed
by Mitrano et al. (2017) and Farzaneh and Toroghi (2020) the ANN is tightly
integrated into the GA as the fitness function, so the architecture is funda-
mentally a GA if one regards the fitness function as a “slot” that can be filled
by a broad range of possible mechanisms for fitness-determination. Thirdly,
and most fundamentally, the VRS algorithm is perhaps best understood as
a bidirectional circle, not a line: while the initial impulse of the algorithm
perhaps came from replication – this resulting from the appearance of “an
entity . . . capable of behavior that staves off, however primitively, its own
dissolution and decomposition . . . ” (Dennett, 1993a, p. 174) – any one of
variation, replication and selection can be the starting (or entry) point of an
evolutionary process in an already established system, such as those musical
(sub)cultures drawn upon and simulated by music-generative programs.
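The second of these reasons – the fitness function as a “slot” that can be filled by a broad range of mechanisms – can be sketched as follows. This is an illustrative toy, not a reconstruction of the systems of Mitrano et al. (2017) or Farzaneh and Toroghi (2020): the GA is parameterised over any callable fitness function, so the hand-written heuristic used below and a trained ANN’s scoring function would be interchangeable fillers of the slot.

```python
import random

random.seed(1)


def genetic_algorithm(fitness, pop_size=40, length=8, generations=60,
                      pitches=range(60, 72), rate=0.15):
    """A generic GA over pitch sequences whose fitness function is a pluggable
    'slot': any callable mapping a sequence to a number can fill it."""
    pitches = list(pitches)
    pop = [[random.choice(pitches) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [random.choice(pitches) if random.random() < rate else p
                     for p in child]            # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# One possible filler for the slot: a hand-written heuristic rewarding tonal
# 'closure' (ending where the sequence began). A trained neural network's
# scoring function could be passed in its place, yielding the GA/ANN coupling
# described in the text.
def closure_fitness(seq):
    return -abs(seq[0] - seq[-1])


best = genetic_algorithm(closure_fitness)
print(best)
```

Because the architecture only requires that the slot yield a number for each candidate, swapping the heuristic for an ANN changes the source of the selection pressure but not the fundamentally genetic-algorithmic character of the system.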

6.6.3 The Theory and Analysis of Computer-Generated Music

Whether CGM is realistic or convincing – to recall the two anthropocentric
evaluation criteria from §6.6.2 – depends upon certain factors that music
theory and analysis are well suited to model. While much attention has
been given to methods for generating music using computers; and while
almost as much thought has been given to strategies for evaluating the out-
puts as music, little consideration has been given, to my knowledge, to the
music-theoretical aspects of CGM or to strategies for analysing it – but see
Various (2012) for the broader context and methodology – analysis being
understood here as a species of evaluation, even as a means for the verification
or falsification of evaluative judgements (page 544). Nor, indeed, has much
consideration been given to the more fundamental issue of developing a
philosophy to determine whether such music (collectively or in individual
instances) warrants, by analogy with HGM,288 such theoretical/analytical
treatment in the first place.

The human-centricity of music theory and analysis (§4.4) poses certain
problems for those wishing to extend it from HGM to CGM. This is partly
because the motivations for music-related computational creativity are (non-
exclusively) binary: from a scientific perspective, and as noted in §6.1, the
inherent complexity of music, resulting from its multiparametric combin-
atoriality, makes it an irresistible challenge for computer science; from a
humanistic perspective there is strangeness and beauty in experiencing mu-
sic made by a non-human entity, when that music is not presented to listeners
in the form of a Turing Test. The aesthetic frisson of the latter is not dissimilar
to that arising from hearing the vocalisations of certain non-human animals
(§5.4). Whichever motivation drives the application of music theory and
analysis to CGM – the former might use it to verify the efficacy of an algo-
rithm, the latter might use it to illuminate similar phenomena appearing
in HGM – there are inevitably thorny philosophical issues, one of which
is encompassed by the question of how music theory and analysis should
approach CGM, and of what CGM has to offer (“as a goal or as a goad”,
to recall Kerman’s phrase (1994b, p. 61)) to theory and analysis.

In terms of affecting the conception and practice of music theory and analysis
as it applies to both HGM and CGM, it is useful to consider how the VRS al-
gorithm relates to the three poles of the semiological tripartition. Figure 6.13

288 As this book has stressed, HGM comes in a dazzling variety of forms according – to give
just two constraints – to the cultural background and level of training of the composer/producer.
My focus in this section is primarily on HGM produced by trained professional composers and
written (as opposed to improvised) in broadly western art-music traditions.
[Figure: a network diagram, after Figure 4.9, linking the object (HGM 1, HGM 2), the metalanguage (analytical discourse) and the methodology (the influence of analysis on the music) via poietic (P 1–P 3) and esthesic (E 1, E 2) poles, connected by processes VRS 1a/1b and VRS 2–VRS 7.]

Figure 6.13: Theory and Analysis in HGM.

and Figure 6.14, after Figure 4.9, adapt Nattiez’s model of “object, metalan-
guage and method” discussed in §4.6 in order to represent this interaction.

In Figure 6.13, the VRS algorithm drives seven stages of the process. It should
be noted that this is a considerable oversimplification, and that there are, in
reality, multiple connections between the nodes identified, these forming a
highly complex and dynamic nexus:

VRS 1: VRS 1a drives the first poietic stage (P 1), by means of intra- and inter-brain
memetic processes, resulting in the production of an object (i.e., a work) and/or
a process (i.e., an improvisation) constituting HGM and situated at the neutral
level (as score and/or sound), both forms being represented by the element
labelled “HGM 1” in Figure 6.13.

VRS 2: mediates the esthesic stage (E 1), in that the extant memetic and musemic
complement of a listener acts as a filter for the input musemes of the products
of VRS 1a. There are numerous modes of listening, so while (primarily or
secondarily) listening for aesthetic/subjective pleasure, the listener may, as a
theorist/analyst, attend to the music (secondarily or primarily) by deploy-
ing this more “intellectual” mode. Using the mental representation formed
from HGM 1 (and potentially many other instances of HGM), in addition to
knowledge of various theoretical/analytical discourses, this “bifocal” listening
may serve: (i) to develop a theory (VRS 3); and/or (ii) to guide its analytical
application, or the application of another theory (VRS 4).

VRS 3: receiving input from VRS 2 and from other instances of VRS 2 processes
(represented by the left-hand vertical dotted line), this drives the poiesis (P 2)
of a theoretical/analytical discourse – a metalanguage – that is reified in the
form of a published text, or several (inter)texts (Allen, 2011), that constitute the
phemotypic form of the theoretical/analytical verbal-conceptual memeplex.

VRS 4: mediates the connection between the theoretical/analytical discourse and the
esthesis (E 1) of HGM 1 (represented by the central vertical dotted line). On
occasions, the analyst may engage both modes of listening referred to apropos
VRS 2, and so there are two, difficult-to-separate, sources for E 1: (i) that arising
from an aesthetic/subjective response to HGM 1 (VRS 2); and (ii) that arising
from a theoretical/analytical response to it – i.e., one related to the application
of a particular discourse pertaining to it (VRS 4).

VRS 5: drives, analogously to VRS 2, the reception (E 2) of the theoretical/analytical
discourse within the community that engages with it, in the light of: (i) its
perceived alignment with, and development of, its intellectual tradition (in-
cluding its potentially heightened explanatory power vis-à-vis its antecedent
models); and/or (ii) its perceived alignment with its target HGM, this medi-
ated by interaction with processes encompassed by VRS 4 (represented by the
right-hand vertical dotted line).

VRS 6: mediates (by way of its effects on composers and improvisers) the “influence
on the music” of one or more theoretical/analytical discourses, this being
particularly evident in certain conservative traditions. It acts in conjunction
with a re-iteration of VRS 1 (VRS 1b), giving rise to a consequent/child (HGM 2)
of the antecedent/parent (HGM 1) arising from VRS 1a.

VRS 7: mediates the influence of esthesic (E 1) responses to HGM 1 on the poiesis
(P 3) of a consequent HGM (HGM 2). In some cases, such as Rameau and
Babbitt, the composer may also be a theorist (beyond the general understand-
ing of theory, formal or otherwise, evident in most accomplished musicians).
While the layout of Figure 6.13 is intended to indicate that such VRS 6-related
responses are distinct from those pertaining to VRS 7, in reality they may
well blend, and so there are, by analogy with VRS 4, two difficult-to-separate
sources for P 3: (i) that arising from aesthetic/subjective responses to HGM 1
(VRS 2/VRS 7); and (ii) that arising from responses mediated by a theoretical/
analytical discourse (VRS 6).

In Figure 6.14, there are two distinct tripartitional processes operating, one for
the generative system and the other for the resulting music. They are shaped
by VRS processes that are analogous across the two domains of technology
and music.

Figure 6.14: Theory and Analysis in CGM. [Diagram: the theoretical/analytical
metalanguage (P 3, E 3; VRS 3, VRS 5) sits above two object-level tripartitions
– the music as object (P 2, CGM 1, E 2; P 5, CGM 2) and the generative system
as object (P 1, S 1, E 1; P 4, S 2) – connected by VRS 1a/1b, VRS 2, VRS 4 and
VRS 7, with VRS 6 marking the influence of analysis on the system.]

To adapt and supplement the points made apropos Figure 6.13:

VRS 1: • Down-arrow: P 1 drives the design and coding of the music-generative
system itself (S 1), leading to the production of a program that is situated,
as electronically stored code and executables, at the neutral level.

• Up-arrow: S 1 – in the case of those systems using genetic/evolutionary
algorithms – then generates (diagonal arrow from S 1 to P 2) the output
CGM (CGM 1), a process broadly analogous to the operation of VRS 1a
in Figure 6.13.

VRS 2: • Down-arrow: broadly analogous to VRS 5 in Figure 6.13, this mediates
the reception (E 1) of S 1 itself, in terms of: (i) its perceived alignment
with, and development of, its intellectual tradition (including its po-
tentially augmented generative power vis-à-vis its antecedent systems);
and/or (ii) its efficacy in generating music according to its specified
design aims (diagonal arrow from CGM 1 to E 1).

• Up-arrow: analogously to VRS 2 in Figure 6.13, mediates the reception
(E 2) by humans of CGM 1. While CGM 1 can at present only motivate
aesthetic/subjective and/or theoretical/analytical responses in a
human listener (VRS 2 in Figure 6.13), such responses may also (even-
tually) be possible – as forms of consciousness – in another machine,
as might, of course, responses to the HGM (HGM 1 and HGM 2) resulting
from VRS 1 in Figure 6.13. The theoretical/analytical mode of
interaction in computers is exemplified by the operation of such ana-
lytical programs as the Humdrum Toolkit (Huron, 2002; Huron, 2022),
or the Tonalities prolongation-analysis software (Pople, 2002) (for the
latter, see §7.5.3). Despite their being products of the VRS algorithm,
these and other current music-analysis programs are arguably merely
non-autonomous “prosthetic” extensions of human theorists/analysts
and are therefore not currently fully autonomous of our control.289 Nev-
ertheless, and apropos the category of “AI for AI” in point 279 of the list
on page 547, at some point in the future one AI might conceivably be cap-
able of “hearing”, both aesthetically/subjectively and/or theoretically/
analytically, the outputs of another.

VRS 3: as with VRS 3 in Figure 6.13, and also receiving input from VRS 2 and from
other instances of VRS 2 processes (represented by the left-hand vertical dotted
line), this drives the poiesis (P 3) of a theoretical/analytical discourse. This
poiesis may formulate a model that (assuming those using the model are aware
of CGM 1’s status as CGM): (i) treats CGM in the same terms as HGM; or that
(ii) takes account of the fact that the discourse relates specifically to CGM.

VRS 4: as with VRS 4 in Figure 6.13, this mediates the connection between the
theoretical/analytical discourse and the esthesis (E 2) of CGM 1 (represented
by the central vertical dotted line). On occasions, and extending VRS 2 in Figure
6.13 to encompass “trifocal” listening, the aesthetic/subjective perspective,
the theoretical/analytical perspective, and the perspective of the generative
system’s programmer may be employed by the same observer, and so there are
three, difficult-to-separate, sources for E 2: (i) that arising from an aesthetic/
subjective response to CGM 1 (VRS 2, up-arrow); (ii) that arising from the
application of a particular theoretical/analytical discourse pertaining to it
(VRS 4); and (iii) that arising from understanding how specific features of
CGM 1 may have arisen as a result of the operation and interaction of S 1’s
algorithms (VRS 2, down-arrow).

VRS 5: as with VRS 5 in Figure 6.13, this drives, analogously to VRS 2, the reception
(E 3) of the theoretical/analytical discourse within the community that engages
with it, in the light of: (i) its perceived alignment with, and development of, its
intellectual tradition (including its potentially heightened explanatory power
vis-à-vis its antecedent models); and/or (ii) its perceived alignment with its
target CGM, this mediated by interaction with processes encompassed by
VRS 4 (represented by the right-hand vertical dotted line). E 3 is contingent,
among other factors, upon knowledge of the CGM-status of the target music,
for should a model developed for use on HGM prove ill-suited to a work
not known to be CGM (or vice versa), then the model might unreasonably be
regarded as being at fault when the error lies, in reality, in its (mis-)application.

289 As McLuhan argued, “[a]ll media are extensions of some human faculty – psychic or
physical” (1969, p. 26).

VRS 6: mediates (by way of its effects on programmers) the “influence on the system”
of one or more theoretical/analytical discourses. It acts in conjunction with a
re-iteration of VRS 1 (VRS 1b; diagonal arrow from S 2 – a child of the parent
S 1 – to P 5) to engender a child (CGM 2) of the parent (CGM 1) arising from
VRS 1a. As with HGM 1 and HGM 2, the connection between CGM 1 and
CGM 2 may be indirect, especially if (in the case of CGM), S 2 represents a
radical reworking of S 1, made after generating CGM 1.

VRS 7: extending VRS 7 in Figure 6.13, there are three inputs to the poiesis (P 4) of
the second iteration of a music-generative system. While the layout of Figure
6.14 is intended to indicate that the two VRS 7-related inputs are distinct from
the theoretical/analytical-discourse-mediated input deriving from VRS 6, in
reality the three form interconnected sources for P 4:

• Down-arrow: that arising from understanding how specific features of
CGM 1 may have arisen as a result of the operation and interaction of
S 1’s algorithms (E 1).

• Up-arrow/diagonal arrow from E 2 to P 4: that arising from the aesthetic/
subjective responses to CGM 1 (E 2).

• That arising from the influence of theoretical/analytical discourses
mediated by VRS 6.

As this discussion suggests, every element of these two analogous processes
in HGM and CGM is made up of and driven by replicators, either in their
memomic (brain-stored) or their phemotypic (physical-world) forms (Table
1.3), sustaining the VRS algorithm in a number of domains and substrates
and operating at various levels of different ontological categories (§1.5.5).

Eschewing the Ultima Thule of Region Three of the Circle of Sound, and
thus 2AI creativity (§6.6.2), one might argue that CGM is tractable using
current (and historical) theoretical/analytical approaches to the extent to which
it reflects (or appears to reflect) the operation of human-like perceptual-cognitive
constraints on its generation. Specifically, if an instance of CGM respects the
hierarchical-grouping structure of most HGM, then it is likely to be amenable
to the same analytical methodologies that are applicable to HGM. Conversely,
if an instance of CGM violates these constraints, then – depending upon
how comprehensively they are rejected – it is less likely to be a meaningful
object for HGM-focused analytical methodologies. This second scenario
poses a significant challenge to attempts by theory and analysis to arrive at
methodologies that are able to engage with such CGM, and it might indeed
be deemed meaningless for a human-centred theory and analysis even to
attempt to cross the Horizon of Intelligibility in order to effect an encounter.
As just suggested, this distinction is not necessarily clear-cut: an instance
of CGM might mostly adhere to human perceptual-cognitive constraints,
abandoning them only occasionally.

By hierarchical-grouping structure I mean – apropos the RHSGAP model
(§3.5.2) – the perceptually-cognitively driven tendency of most HGM to be
composed of musemes satisfying STM constraints; the tendency for these
units to follow on from each other in coherent ways; and the tendency for this
chunking to be replicated at multiple structural-hierarchic levels, such that
there exist higher-order units that themselves relate logically to each other in
the diachronic unfolding of the music. Much music theory and analysis has,
unsurprisingly, attempted to understand music in these psychologically ori-
entated terms, ranging – to briefly consolidate the accounts given in §4.1 and
§4.4 – from sixteenth-century linguistically/rhetorically motivated analyses
of vocal music by Burmeister; to eighteenth-century models of phrase- and
cadence-concatenation in Koch and Kirnberger; to Schenkerian voice-leading
models; to Schoenbergian and Retian theories of motivic transformation;
and, in more recent times (and perhaps going full-circle), to applications to
music of Chomskyan generative-transformational grammar (see also Bent &
Pople, 2001, sec. II).

To illustrate this principle, it is useful to discuss a short case-study. An
example of CGM considered in §6.5.3.2 – the Iamus computer’s composition
Colossus – is a good candidate. An analytical methodology appropriate for
attempting to understand this music – one not listed in the above historical
review – is PC set theory (Forte, 1973). In summary, this approach identifies
salient pitch-collections of between three and nine notes drawn from the
chromatic set (which has 4,096 subsets in total, 3,938 of them of cardinality
three to nine) and reduces them, via the operations of transposition and
inversion, to one of 208 fundamental set-
classes. Transposition and inversion thus give rise to various members of each
set-class, with 3–11, for instance, having 24 distinct forms (the twelve major
and twelve minor triads). PC set theory affords the opportunity to relate
seemingly unconnected pitch-collections using their membership of specific
set-classes (each of which has a characteristic internal interval complement
or interval-class (IC) vector) as a common denominator. In HGM, two
patterns of the same set-class – or, alternatively, having a Z-relation, where
two different set-classes share the same IC vector (Forte, 1973, p. 21) – are
perceived, and may have been conceived, as having stronger synchronic and
diachronic connections than patterns lacking such relationships. Thus, PC
set correspondences may be taken as affording evidence of compositional
intentionality and higher-order pitch-content planning in HGM, although
there are certain cautions, to be discussed below, in this regard.
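The reductions just summarised are purely combinatorial, so they can be checked in code. The sketch below – an illustrative aside, not part of Forte’s own apparatus – derives a canonical prime form under transposition (Tn) and inversion (TnI) using a lexicographic-minimum convention (whose tie-breaking differs from Forte’s published tables in a handful of cases, without affecting class counts), and confirms two figures cited above: the 208 set-classes of cardinality three to nine, and the 24 distinct forms of set-class 3–11.

```python
from itertools import combinations

def prime_form(pcs):
    """Canonical representative of a pitch-class set under transposition and
    inversion: the lexicographically smallest rotation, transposed to begin
    on 0, of the set or of its inversion."""
    s = sorted(set(p % 12 for p in pcs))
    candidates = []
    for variant in (s, sorted((-p) % 12 for p in s)):  # the set and its inversion
        for i in range(len(variant)):                  # every rotation, transposed to 0
            rot = variant[i:] + variant[:i]
            candidates.append(tuple((p - rot[0]) % 12 for p in rot))
    return min(candidates)

# All subsets of the chromatic set with three to nine notes reduce to 208 set-classes.
classes = {prime_form(c) for k in range(3, 10) for c in combinations(range(12), k)}
print(len(classes))  # 208

# Set-class 3-11 comprises 24 distinct forms: the 12 major and the 12 minor triads.
triads = {c for c in combinations(range(12), 3) if prime_form(c) == (0, 3, 7)}
print(len(triads))  # 24
```

Because transposing or inverting a set leaves its pool of rotated-and-transposed candidates unchanged, every member of a set-class yields the same minimum, making the function a valid class label even where its tie-breaking departs from Forte’s.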

Four criteria seem appropriate to organise an analysis of Colossus. Firstly,
as the boxes on Figure 6.15 – an annotated version of Figure 6.10 – suggest,
and on the basis of the inevitably subjective segmentation adopted here,290
there is a degree of (“vertical”) recurrence of PC sets evident in this extract,
in the form of three appearances of set-class 4–19 (bb. 1, 5 and 6) and two
appearances of set-class 3–11 in b. 8. Nevertheless, and secondly, while
there is a degree of motivic unity engendered by the recurrent rhythmic
units in bb. 1 and 3, these motives are not related by membership of a
common set-class. Thirdly, there is no Z-relationship evident between the
PC sets identified, although alternative segmentations might reveal such
relationships. Fourthly, identifying certain registrally salient pitches, as
marked by the arrows, indicates that the lower-voice pitches (D♯ (b. 1), A♯
(b. 4), B♮ (b. 6), and G♮ (b. 7)) spell out (“horizontally”) set-class 4–19, this

290 Segmentation is a highly controversial topic in PC set theory (Hasty, 1981), given the

arguably greater propensity of the method to circularity and to confirmation bias than is the
case with approaches for analysing tonal music. The segmentation of Colossus utilised here
attempts to respect motivic and gestalt-psychological grouping principles. See again the “final
consideration” of §4.4.3, on page 360 and also Lalitte et al. (2009).

Figure 6.15: Iamus: Colossus (2012), bb. 1–8, with PC Set Annotations.

set-class, as noted apropos the first criterion, being significant vertically. No
such registral connections are evident, however, in the upper-voice line.

It is not at all straightforward to assess the significance of these findings,
which might be quasi-random, in the “pseudo-Jackson-Pollock” sense discussed
on page 525. This is to assume, as PC set theory does, that in the case
of HGM such set-theoretical phenomena are not random – i.e., that they are
at some level intentional, and therefore significant and valuable because they
represent a manifestation of the composer’s agency and intelligence. The
evidence for such intentionality in HGM might exist not directly but rather
at some degree of remove: as with all theoretical systems and analytical
methodologies, the phenomena theorised by PC set theory might themselves
be epiphenomena – second-order intentional – of some more fundamental –
first-order intentional – process, such as, in the case of PC set theory, a focus
on the intervallic structure of harmonies and melodies. Some might argue
that this focus is indeed the primary motivation behind Viennese early aton-
ality, as opposed to the implicit (Fortean) set-class recurrence such intervallic
structures motivate.
Intentionality in CGM is, as the discussion of Figure 6.14 implies, distributed,
in that theory and analysis has to take into account both the “real” intentionality
of the programmer (to some extent analogous to first-order intentionality
ity of the programmer (to some extent analogous to first-order intentionality
in HGM) and the “virtual” intentionality of the generative system (to some
extent analogous to second-order intentionality in HGM). An extreme point
on the continuum of automation/autonomy discussed in §6.2, programmer
non-intentionality in CGM inheres in the appearance of phenomena in the
resulting music that are not explicitly coded for – or even broadly anticipated
– in the generative algorithm. Given that the VRS algorithm itself cannot
account for all the output possibilities of any given input, this does not
necessarily undermine the significance of a given phenomenon, either in
biological or cultural evolution. As in second-order intentionality in HGM,
certain programmer-non-intentional aspects of CGM might be theoretically
and analytically interesting even if they are not wholly controlled for via the
underpinning algorithm. This autonomy – perhaps supporting an affirmat-
ive answer to the fourth Lovelace-question – thus represents the triumph of
a system’s virtual intentionality over the programmer’s real intentionality.

One aspect of this intentionality inheres in the extent to which Iamus uses – or
does not use – human-analogous perceptual-cognitive constraints when ma-
nipulating note-patterns. Some of these patterns in Colossus are, in Lerdahl’s
phrase, “cognitively opaque” (1992, p. 118). Nevertheless, the evo-devo algo-
rithm underpinning Iamus might be sustaining the selection and replication
of certain note-groups that, owing to their analogous interval-class content,
leads to the vertical and horizontal set-class recurrences identified, even
when note-order and rhythmic structure is subject to levels of variation that
militate against explicit organisation on the basis of pitch- and rhythm-stable
musemes situated at a number of structural-hierarchic levels. Thus, there
is potentially memetic replication of certain intervallic structures – in the
form of PC sets functioning as unordered, interval-defined musemes – in
Iamus’s computer-algorithmic implementation of cultural evolution, even
though there is little of the explicit pitch-plus-rhythm museme replication
characteristic of HGM.

While comparisons are, in some cases, odious, it is perhaps instructive to
relate Colossus to an example of free-atonal HGM. Figure 6.16 shows the

Figure 6.16: Schoenberg: Klavierstück op. 11 no. 1 (1909), bb. 1–11.

opening section of Schoenberg’s Klavierstück op. 11 no. 1 (1909), which one
might compare with Colossus in terms of the four criteria outlined on page 557:
(i) recurrence of certain set-classes; (ii) alignment of set-class structure with
motivic/musemic structure; (iii) Z-relationships between significant sets;
and (iv) higher-order, registrally salient (“middleground”-level) set-class
structure.

On criterion (i), the red-coloured pitches in the right hand of bb. 1–2 (G♮,
G♯, B♮), the blue-coloured pitches in b. 3 (D♭, E♮, F♮), the green-coloured
pitches in bb. 4–5 (G♮, B♭, B♮), and the purple-coloured pitches in b. 10 (G♯,
A♮, C♮) are all members of set-class 3–3 (Straus, 2005, pp. 45–47), Schoenberg
relating set-class membership with motivic recurrence, criterion (ii), in ways
not evident in Colossus. Moreover, the bracketed melodic pitches in bb. 1–
3 (upward-facing note-sticks) are a member of set-class 6–Z10; and the
following pitches, boxed, in the left hand are a member of set-class 6–Z39.
On criterion (iii), these two set-classes are Z-correspondent with each other,
sharing the IC vector 333321 (2005, pp. 92–93). Finally, on criterion (iv), the
highest melodic pitches (G♮, G♯, B♮) and the lowest bass-voice pitches (G♭, G♮,
B♭) (marked, respectively, by down and up arrows) are themselves members
of set-class 3–3, forming an expression of this set-class at the middleground
level (2005, pp. 104–105).
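The Z-relation invoked under criterion (iii) is likewise mechanically checkable. The sketch below – again an illustrative aside, taking the prime forms standardly cited for 6–Z10 and 6–Z39, and pitch classes {7, 8, 11} for one of the 3–3 trichords identified above – computes interval-class vectors and confirms that the two hexachords share the vector 333321 despite belonging to different set-classes.

```python
from itertools import combinations

def ic_vector(pcs):
    """Interval-class vector: counts of interval classes 1-6 across all
    unordered pairs of distinct pitch classes in the set."""
    vec = [0] * 6
    for a, b in combinations(sorted(set(p % 12 for p in pcs)), 2):
        ic = min((b - a) % 12, (a - b) % 12)  # fold intervals into classes 1-6
        vec[ic - 1] += 1
    return "".join(str(n) for n in vec)

print(ic_vector([0, 1, 3, 4, 5, 7]))  # 6-Z10 -> 333321
print(ic_vector([0, 2, 3, 4, 5, 8]))  # 6-Z39 -> 333321: same vector, different class
print(ic_vector([7, 8, 11]))          # a 3-3 trichord (G, G#, B) -> 101100
```

Since the vector records only interval content, not the sets themselves, identical vectors across non-equivalent sets are exactly what the Z-relation asserts.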

On this admittedly very limited body of comparative evidence, it would
appear that, in terms of the aspects considered here (including the four
specific criteria), Colossus possesses to some extent the hierarchical-grouping
structure identified on page 556 as a key factor in the amenability of CGM
to HGM-orientated analytical methodologies. This mode of organisation is
clearly evident in the local and higher-order set-class structure of Schoen-
berg’s op. 11 no. 1, whereas its deployment in Colossus arguably lacks the
surface-level clarity and rigorous motivic logic of the Schoenberg piece. This
potential deficiency is not necessarily to be taken as evidence that Colossus
lacks aesthetic or intellectual value, or that it is necessarily to be regarded as
inferior to the Schoenberg piece. Rather, it is to acknowledge that CGM does
not always conform as closely to certain perceptual-cognitive constraints
as HGM despite, in this case (and from what we know of its operational
principles), Iamus’s using a broadly Darwinian algorithm. Thus, it may po-
tentially score less highly on rubrics deriving from analytical methodologies
that have evolved to describe and explicate HGM. Of course, the notion of
scoring on rubrics implies a quantitative methodology, whereas a largely
qualitative approach was adopted in the present comparison. Perhaps more
fundamentally, there is a wider (VRS-driven) general intelligence, strategic
planning, embodiment and cultural memory underpinning HGM that, at
present, is not sufficiently well implemented in computers to allow for the
generation of fully human-convincing CGM. At the risk of sounding glib,
and as Levitin might say (apropos the quotation on page 74) of computers,
at the moment “they just don’t get it”.

Given the foregoing discussion, and in summary, one might make the fol-
lowing points:

• Music theory and analysis have developed alongside the HGM they seek to
explicate, so it is perhaps inevitable that these two processes have become coe-
volutionarily self-reinforcing: theory and analysis evolve to model a target that
is itself constantly evolving, change in both domains resulting from the action
of the VRS-algorithm on memes and musemes; and much music, to ensure
coherence (and thus replication), evolves to align with certain constraints of
organisation consolidated in theory.

• While much CGM is to some extent convincing in comparison with HGM, its
lack of – or perceived deficiencies in – the multilevelled hierarchical-grouping
structure that engenders coherence for humans is a significant difference, often
leading to the lower perceived (teleo)logical drive of CGM in comparison with
HGM.

• This deficiency often renders CGM problematic when exposed to theoretical
frameworks and analytical approaches evolved for HGM, and it leads to a
methodological tension: should generative algorithms be modified in order
to generate music that is more tractable to theory and analysis (and therefore,
by extension, more comprehensible to human perception and cognition); or
should theory and analysis expand its Horizon of Intelligibility, in order to
accommodate the challenges of this new category of music, as it has done in
the case of radical HGM for centuries?

• While it has been assumed that an analyst is normally aware that the object
of investigation is an instance of CGM, it should be acknowledged – in what
might be regarded as a theoretical/analytical Turing Test – that the outcome
of an analysis may well be affected by knowledge of the non-human origins
of CGM. If the analyst were unaware of the music’s provenance, one might
(perhaps cynically) hypothesise that certain elements regarded as deficiencies
in known CGM might be regarded as creative – i.e., as novel and valuable – in
assumed HGM.

Some of these issues arise from the attributes of HGM’s antecedent
musilanguage, from the biological- and cultural-evolution-shaped nature of
HGM, and from the perceptual-cognitive and embodied foundations shaping
human musicality and music. They currently separate HGM from CGM,
but it is not inconceivable that in the future cultural-evolutionary pressures
might build human-analogous aptations in machines; or, conversely, that
biological evolution might reshape human musicality along the Region-Three-
unlocking lines discussed on page 545.

6.7 Summary of Chapter 6


Chapter 6 has argued that:

1. The power of computers makes them well suited to emulating/simulating
processes that, in real time, are difficult to observe or analyse. At the most
autonomous end of the continuum of synthesis, the machine generation of
music can, when conceived and implemented in Darwinian-memetic terms,
facilitate both the modelling of human perceptual-cognitive and creative pro-
cesses, and the rendering of counterfactual histories of music.

2. Evolutionary change in replicator systems has been modelled by simulations
of language evolution. The simulation of music evolution is in some ways
less complex than that of language, because the semantic dimension central to
language need not necessarily be incorporated for validation of the hypothes-
ised mechanisms in music; yet in other ways it is more complex, owing to the
arguably greater combinatorial and structural complexity of music.

3. A number of music-generative systems have been developed using a range
of algorithms. While some of these are not strictly Darwinian, and while all
must address the distinction between music and its representations, those
modelling the operation of the VRS algorithm are certainly capable of evolving
convincing outputs. Some of these systems simulate musical societies in which
their virtual agents represent vehicles within and between which a rich nexus
of replicators evolves. The most sophisticated of them also simulate gene-
meme coevolution and thus represent the nearest machine equivalent to the
dual-replicator coevolution operative in human – and potentially in animal –
societies.

4. Just as certain species of non-human animals might be regarded as potentially
creative, engendering novelty and perhaps value in their outputs, the same
issue arises in connection with music-generative systems. The existence and
nature of machine creativity is to some extent contingent upon the evaluation
of the operation and outputs of such systems, approaches to which remain the
subject of ongoing debate. A resolution depends, in part, upon whether one is
prepared to consider the machine-generated denizens of the Circle of Sound’s
Region Three, whether existing as work or process, as constituting examples
of music or not.

Chapter 7 will summarise the issues covered in Chapters 1–6 in the course
of an exploration of certain similarities between evolution and conscious-
ness. In particular, it will use Universal Darwinism to understand evolution
(including music-cultural evolution) as a form of consciousness, and vice
versa. It will: review current theories of consciousness, endorsing those
that conceive it as not only shaped by evolutionary processes, but also as
itself an implementation of the VRS algorithm; frame consciousness as a
form of fast-acting evolution, and evolution as a form of slow-acting con-
sciousness; revisit the discussion of music-language coevolution in Chapter
3 to explore further some of the issues pertinent to consciousness raised
there; use the phenomenon of tonal-system change in music as an example
of music-historical/structural consciousness; and assess the extent to which
the new forms of information storage and transmission afforded by com-
puter technology and the internet motivate an expansion of the ontology of
replicators and, more fundamentally, constitute nascent forms of evolution.
