jasper-zheng/nn_terrain

Latent Terrain: Coordinates-to-Latents Generator for Neural Audio Autoencoders

New documentation page: https://jasper-zheng.github.io/nn_terrain/

Latent terrain is a coordinates-to-latents mapping model for neural audio autoencoders (such as RAVE, Music2Latent). A terrain is a surface map for the autoencoder's latent space, taking coordinates in a control space as inputs, and producing continuous latent vectors in real-time.
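To make the coordinates-to-latents idea concrete, here is a minimal sketch in plain Python. It is not the nn.terrain~ implementation: it only loosely follows the Fourier-feature approach of Tancik et al. (2020) that the project cites, the weights are random rather than trained, and every name and default (`make_terrain`, `latent_dim`, `n_freqs`, `hidden`, `scale`) is hypothetical.

```python
import math
import random


def make_terrain(latent_dim=8, n_freqs=16, hidden=32, scale=3.0, seed=0):
    """Build a toy coordinates-to-latents function with random (untrained) weights."""
    rng = random.Random(seed)
    # Random Fourier projection B: each row maps (x, y) to one frequency.
    B = [[rng.gauss(0.0, scale) for _ in range(2)] for _ in range(n_freqs)]
    # Two dense layers; the first takes the sin and cos of every projection.
    W1 = [[rng.gauss(0.0, 1.0 / math.sqrt(2 * n_freqs)) for _ in range(2 * n_freqs)]
          for _ in range(hidden)]
    W2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
          for _ in range(latent_dim)]

    def terrain(x, y):
        # Fourier feature mapping: [sin(2*pi*B v), cos(2*pi*B v)] for v = (x, y).
        proj = [2.0 * math.pi * (bx * x + by * y) for bx, by in B]
        feats = [math.sin(p) for p in proj] + [math.cos(p) for p in proj]
        # Hidden layer with tanh, then a linear read-out to the latent vector.
        h = [math.tanh(sum(w * f for w, f in zip(row, feats))) for row in W1]
        return [sum(w * a for w, a in zip(row, h)) for row in W2]

    return terrain


terrain = make_terrain()
z = terrain(0.25, -0.5)  # one latent vector for one control-space point
```

Because every operation is smooth, nearby coordinates give nearby latent vectors, which is the property that makes a terrain playable from a continuous controller. In actual use, the latent vector would be passed to the autoencoder's decoder, and the mapping would be trained rather than random.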

Latent terrain aims to open up the creative possibilities of latent space navigation, allowing one to adapt an autoencoder to easier-to-navigate interfaces (such as gestural controllers, stylus and tablets, XY-pads, and more), and build new musical instruments that compose and interact with AI audio generators.

An example latent space walk with Music2Latent:

terrain-walk.mp4

For all documentation, including installation and build instructions, please see the new documentation site: https://jasper-zheng.github.io/nn_terrain/

Change Logs

Oct. 2025 v1.5.6.1 and v1.6.0.1

  • nn.terrain~:
    • Terrain training is now multi-threaded; the train and plot_interval messages no longer block the main thread.
    • The last argument of plot_interval now defines which latent dimension to sample; a new attribute, plot_multi_channel, has been added for coloured plots.
  • nn.terrain.encode:
    • Added support for autoencoders with stereo I/O channels (i.e., stable-audio-open-1.0 is now supported);
    • Default encoder_batch_size changed from 64 to 16;
    • Fixed the faulty argument list.
  • Cleaned up the codebase: Fourier-CPPN, Dataset classes moved to backend.cpp.
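As a rough illustration of what "sampling one latent dimension" for a greyscale plot can mean, the sketch below evaluates a stand-in terrain on a regular grid and scales a single latent dimension to 0..255. It does not reproduce the actual arguments or behaviour of plot_interval; `toy_terrain`, `sample_plane`, and their parameters are hypothetical names used only for this example.

```python
import math


def toy_terrain(x, y):
    """Hypothetical stand-in for a trained terrain: (x, y) -> 4 latent values."""
    return [math.sin(x + k) * math.cos(y * (k + 1)) for k in range(4)]


def sample_plane(terrain, x_range, y_range, step, dim):
    """Sample one latent dimension on a regular grid and scale it to 0..255."""
    (x0, x1), (y0, y1) = x_range, y_range
    nx = int(round((x1 - x0) / step)) + 1
    ny = int(round((y1 - y0) / step)) + 1
    # One row per y step, one column per x step.
    values = [[terrain(x0 + i * step, y0 + j * step)[dim] for i in range(nx)]
              for j in range(ny)]
    lo = min(min(row) for row in values)
    hi = max(max(row) for row in values)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat plane
    return [[int(round(255 * (v - lo) / span)) for v in row] for row in values]


image = sample_plane(toy_terrain, (-1.0, 1.0), (-1.0, 1.0), 0.5, dim=0)
```

Changing `dim` selects a different latent dimension to visualise. A coloured plot could, under the same assumptions, assign three sampled dimensions to the red, green, and blue channels.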

Aug. 2025 v1.5.6 and v1.6.0

  • [BREAKING CHANGE] Changed the order of arguments for the plot_interval method in nn.terrain~ to align with the value_boundaries attribute in nn.terrain.gui.
  • [BREAKING CHANGE] All attribute names updated.
  • Mouse behaviours in nn.terrain.gui updated: the output coordinates dictionary will be automatically updated whenever a "mouse up" is performed.
  • Fixed the playhead updates in the play mode in nn.terrain.gui.
  • [HELP FILES] Added a JS script for patch cords scripting.

May 2025 v1.5.6-beta and v1.6.0-beta

  • The first release.

Build Instructions

Please refer to https://jasper-zheng.github.io/nn_terrain/compile/.

TODOs

  • [✕︎] Load and run inference with scripted mapping models exported by TorchScript.
  • [✔︎] Display terrain visualisation.
    • [✔︎] Greyscale (one-channel)
    • [✔︎] Multi-channel (yes but no documentation atm)
  • [✔︎] Interactive training of terrain models in Max MSP.
  • [✔︎] Customised configuration of Fourier-CPPNs (Tancik et al., 2020).
  • [✔︎] Example patches, tutorials...
  • [✕︎] PureData

Get in touch

Hi, this is Shuoyang (Jasper). nn.terrain~ is part of my ongoing PhD work on Discovering Musical Affordances in Neural Audio Synthesis, supervised by Anna Xambó Sedó and Nick Bryan-Kinns, and part of the work has been (will be) on putting AI audio generators into the hands of composers/musicians.

Therefore, I would love to have you involved in it. If you have any feedback, a feature request, or a demo, a device, or anything else made with nn.terrain, I would love to hear about it. If you would like to collaborate on anything, please leave a message in this feedback form.

Acknowledgements

  • Shuoyang Zheng, the author of this work, is supported by UK Research and Innovation [EP/S022694/1].

  • This is built on top of acids-ircam's nn_tilde, with a lot of reused code including the cmakelists templates, backend.cpp, circular_buffer.h, and the model performing loop in nn.terrain_tilde.cpp.

  • Caillon, A., Esling, P., 2022. Streamable Neural Audio Synthesis With Non-Causal Convolutions. https://doi.org/10.48550/arXiv.2204.07064

  • Tancik, M., Srinivasan, P.P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., Ramamoorthi, R., Barron, J.T., Ng, R., 2020. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. NeurIPS.

  • Vigliensoni, G., Fiebrink, R., 2023. Steering latent audio models through interactive machine learning, in: Proceedings of the 14th International Conference on Computational Creativity.

About

Latent Terrain - Dissecting the Latent Space of Neural Audio Autoencoders
