Generate ambient music from text. Locally. No GPU required.
```python
import latentscore as ls

ls.render("warm sunset over water").play()
```

That's it. One line. You get audio playing on your speakers.
⚠️ Alpha — under active development. API may change between versions. Read more about how it works.
Requires Python 3.10–3.12. If you don't have it: `brew install python@3.10` (macOS) or `pyenv install 3.10`.
```bash
pip install latentscore
```

Or with conda:

```bash
conda create -n latentscore python=3.10 -y
conda activate latentscore
pip install latentscore
```

```bash
latentscore doctor                    # check setup and model availability
latentscore demo                      # render and play a sample
latentscore demo --duration 30        # 30-second demo
latentscore demo --output ambient.wav # save to file
```

```python
import latentscore as ls

audio = ls.render("warm sunset over water", duration=10.0)
audio.play()             # plays on your speakers
audio.save("output.wav") # save to WAV
```

```python
ls.render("jazz cafe at midnight").play()
ls.render("thunderstorm on a tin roof").play()
ls.render("lo-fi study beats").play()
```

Build a config directly with human-readable labels:
```python
import latentscore as ls

config = ls.MusicConfig(
    tempo="slow",
    brightness="dark",
    space="vast",
    density=3,
    bass="drone",
    pad="ambient_drift",
    melody="contemplative",
    rhythm="minimal",
    texture="shimmer",
    echo="heavy",
    root="d",
    mode="minor",
)

ls.render(config, duration=10.0).play()
```

Start from a vibe and override specific parameters:
```python
import latentscore as ls

audio = ls.render(
    "morning coffee shop",
    duration=10.0,
    update=ls.MusicConfigUpdate(
        brightness="very_bright",
        rhythm="electronic",
    ),
)
audio.play()
```

`Step(+1)` moves one level up the scale, `Step(-1)` moves one down. Saturates at the boundaries.
```python
import latentscore as ls
from latentscore.config import Step

audio = ls.render(
    "morning coffee shop",
    duration=10.0,
    update=ls.MusicConfigUpdate(
        brightness=Step(+2),  # two levels brighter
        space=Step(-1),       # one level less spacious
    ),
)
audio.play()
```

Chain vibes together with smooth crossfade transitions:
```python
import latentscore as ls

stream = ls.stream(
    "morning coffee",
    "afternoon focus",
    "evening wind-down",
    duration=60,     # 60 seconds per vibe
    transition=5.0,  # 5-second crossfade
)
stream.play()

# Or collect and save
stream.collect().save("session.wav")
```

For dynamic, interactive use (games, installations, adaptive UIs), use a generator to feed vibes and steer the music in real time:
```python
import asyncio
from collections.abc import AsyncIterator

import latentscore as ls
from latentscore.config import Step


async def my_set() -> AsyncIterator[str | ls.MusicConfigUpdate]:
    yield "warm jazz cafe at midnight"
    await asyncio.sleep(8)
    # Absolute override: switch to bright electronic
    yield ls.MusicConfigUpdate(tempo="fast", brightness="very_bright", rhythm="electronic")
    await asyncio.sleep(8)
    # Relative nudge: dial brightness back down, add more echo
    yield ls.MusicConfigUpdate(brightness=Step(-2), echo=Step(+1))


session = ls.live(my_set(), transition_seconds=2.0)
session.play(seconds=30)
```

Sync generators work too: use `Iterator` instead of `AsyncIterator` and `time.sleep` instead of `await asyncio.sleep`.
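For example, a synchronous version of the set above could look like this (a sketch following the note above; the 8-second pauses are arbitrary):

```python
import time
from collections.abc import Iterator

import latentscore as ls
from latentscore.config import Step


def my_set() -> Iterator[str | ls.MusicConfigUpdate]:
    yield "warm jazz cafe at midnight"
    time.sleep(8)
    # Same absolute override as the async version
    yield ls.MusicConfigUpdate(tempo="fast", brightness="very_bright", rhythm="electronic")
    time.sleep(8)
    # Same relative nudge as the async version
    yield ls.MusicConfigUpdate(brightness=Step(-2), echo=Step(+1))


session = ls.live(my_set(), transition_seconds=2.0)
session.play(seconds=30)
```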
For web servers and async apps:
```python
import asyncio

import latentscore as ls


async def main() -> None:
    audio = await ls.arender("neon city rain")
    audio.save("neon.wav")


asyncio.run(main())
```

Use any LLM through LiteLLM: OpenAI, Anthropic, Google, Mistral, Groq, and 100+ others. LiteLLM is included with latentscore.
```python
import latentscore as ls

# Gemini (free tier available)
ls.render("cyberpunk rain on neon streets", model="external:gemini/gemini-3-flash-preview").play()

# Claude
ls.render("cozy library with rain outside", model="external:anthropic/claude-sonnet-4-5-20250929").play()

# GPT
ls.render("space station ambient", model="external:openai/gpt-4o").play()
```

API keys are read from environment variables automatically (`GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`).
External models return rich metadata alongside audio:
```python
import latentscore as ls

audio = ls.render("cyberpunk rain", model="external:gemini/gemini-3-flash-preview")
if audio.metadata is not None:
    print(audio.metadata.title)     # e.g. "Neon Rain Drift"
    print(audio.metadata.thinking)  # the LLM's reasoning
    print(audio.metadata.config)    # the MusicConfig it chose
    for palette in audio.metadata.palettes:
        print([c.hex for c in palette.colors])
```

Note: LLM models are slower than the default `fast` model (network round-trips) and can occasionally produce invalid configs. The built-in `fast` model is recommended for production use.
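If you want an external model but need a render to always succeed, one possible pattern is to fall back to the built-in model when the external call fails. This is only a sketch: it assumes a failed external render raises an exception, which this README does not specify, and the helper name is ours.

```python
import latentscore as ls


def render_with_fallback(vibe: str):
    # Try an external LLM first; fall back to the built-in fast model on any failure.
    try:
        return ls.render(vibe, model="external:gemini/gemini-3-flash-preview")
    except Exception:
        # Assumption: a failed external render raises. The built-in model needs no API key.
        return ls.render(vibe)


render_with_fallback("cyberpunk rain").play()
```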
You give LatentScore a vibe (a short text description) and it generates ambient music that matches.
The default `fast` model uses embedding-based retrieval: your vibe text is embedded with a sentence transformer, then matched against a curated library of 10,000+ music configurations using cosine similarity. The best-matching config drives a real-time audio synthesizer.
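Conceptually, that retrieval step looks something like the sketch below. This is illustrative only: the encoder name, the toy config descriptions, and the scoring code are stand-ins, not LatentScore internals.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder, not the one LatentScore ships

# Toy stand-in for the curated library of 10,000+ configs.
config_descriptions = ["slow dark vast drone pad", "fast bright electronic rhythm"]
library = encoder.encode(config_descriptions, normalize_embeddings=True)


def best_config_index(vibe: str) -> int:
    query = encoder.encode([vibe], normalize_embeddings=True)[0]
    scores = library @ query       # cosine similarity (embeddings are normalized)
    return int(np.argmax(scores))  # index of the best-matching config


print(best_config_index("warm sunset over water"))
```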
An alternative `fast_heavy` model uses LAION-CLAP audio embeddings to match text against what the configs actually sound like. It scores higher on automated CLAP benchmarks but requires a heavier dependency (`laion-clap`).
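Selecting it presumably follows the same `model=` pattern used for the other models in this README; the `"fast_heavy"` identifier below is an assumption, and `laion-clap` must already be installed.

```python
import latentscore as ls

# Assumption: the heavier retrieval model is chosen via model="fast_heavy".
ls.render("warm sunset over water", model="fast_heavy").play()
```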
Both approaches are fast (~2 s), 100% reliable (no LLM hallucinations), and require no API keys. In our CLAP benchmarks, embedding retrieval outperformed Claude Opus 4.5 and Gemini 3 Flash at mapping vibes to music configurations.
All audio produced by LatentScore follows this contract:
- Format: `float32`, mono
- Sample rate: `44100` Hz
- Range: `[-1.0, 1.0]`
- Shape: `(n,)` numpy array
```python
import numpy as np
import latentscore as ls

audio = ls.render("deep ocean")
samples = np.asarray(audio)  # NDArray[np.float32]
```

Every `MusicConfig` field uses human-readable labels. Full reference:
| Field | Labels |
|---|---|
| `tempo` | `very_slow` `slow` `medium` `fast` `very_fast` |
| `brightness` | `very_dark` `dark` `medium` `bright` `very_bright` |
| `space` | `dry` `small` `medium` `large` `vast` |
| `motion` | `static` `slow` `medium` `fast` `chaotic` |
| `stereo` | `mono` `narrow` `medium` `wide` `ultra_wide` |
| `echo` | `none` `subtle` `medium` `heavy` `infinite` |
| `human` | `robotic` `tight` `natural` `loose` `drunk` |
| `attack` | `soft` `medium` `sharp` |
| `grain` | `clean` `warm` `gritty` |
| `density` | `2` `3` `4` `5` `6` |
| `root` | `c` `c#` `d` ... `a#` `b` |
| `mode` | `major` `minor` `dorian` `mixolydian` |
Layer styles:
| Layer | Styles |
|---|---|
| `bass` | `drone` `sustained` `pulsing` `walking` `fifth_drone` `sub_pulse` `octave` `arp_bass` |
| `pad` | `warm_slow` `dark_sustained` `cinematic` `thin_high` `ambient_drift` `stacked_fifths` `bright_open` |
| `melody` | `procedural` `contemplative` `rising` `falling` `minimal` `ornamental` `arp_melody` `contemplative_minor` `call_response` `heroic` |
| `rhythm` | `none` `minimal` `heartbeat` `soft_four` `hats_only` `electronic` `kit_light` `kit_medium` `military` `tabla_essence` `brush` |
| `texture` | `none` `shimmer` `shimmer_slow` `vinyl_crackle` `breath` `stars` `glitch` `noise_wash` `crystal` `pad_whisper` |
| `accent` | `none` `bells` `pluck` `chime` `bells_dense` `blip` `blip_random` `brass_hit` `wind` `arp_accent` `piano_note` |
Not recommended. The default `fast` and `fast_heavy` models are faster, more reliable, and produce higher-quality results. Expressive mode exists for experimentation only.
Runs a 270M-parameter Gemma 3 LLM locally. On macOS Apple Silicon, inference uses MLX (~5–15s per render). On CPU-only Linux/Windows, it uses `transformers` (30–120s per render). The local model can produce invalid configs, and our benchmarks showed it barely outperforms a random baseline.
```bash
pip install 'latentscore[expressive]'
latentscore download expressive
```

```python
import latentscore as ls

ls.render("jazz cafe at midnight", model="expressive").play()
```

The `data_work/` folder contains the full research pipeline: data preparation, LLM-based config generation, SFT/GRPO training on Modal, CLAP benchmarking, and model export.
See `data_work/README.md` and `docs/architecture.md` for details.
See `CONTRIBUTE.md` for environment setup and contribution guidelines.
See `docs/coding-guidelines.md` for code style requirements.