Thanks to visit codestin.com
Credit goes to sourceforge.net

Showing 136 open source projects for "audio generation"

View related business solutions
  • See Everything. Miss Nothing. 30-day free trial Icon
    See Everything. Miss Nothing. 30-day free trial

    Don’t let IT surprises catch you off guard. PRTG keeps an eye on your whole network, so you don’t have to.

    As the IT backbone of your company, you can’t afford to miss a thing. PRTG monitors every device, application, and connection - on-premise and in the cloud. You get clear dashboards, smart alerts, and mobile access, so you’re always in control, wherever you are. No more guesswork or manual checks. PRTG’s powerful automation and easy setup mean you spend less time firefighting and more time moving your business forward. Discover how simple and reliable IT monitoring can be.
    Try PRTG 30-day full access trial
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 1
    Text Generation Web UI

    Text Generation Web UI

    A gradio web UI for running Large Language Models like LLaMA

    A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Dropdown menu for switching between models. Notebook mode that resembles OpenAI's playground. Chat mode for conversation and role playing. Instruct mode compatible with Alpaca and Open Assistant formats. Nice HTML output for GPT-4chan. Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS). Very...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 2
    Coqui TTS

    Coqui TTS

    A deep learning toolkit for Text-to-Speech, battle-tested in research

    TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pre-trained models, tools for measuring dataset quality and is already used in 20+ languages for products and research projects. High-performance Deep Learning models for Text2Speech tasks. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings...
    Downloads: 34 This Week
    Last Update:
    See Project
  • 3
    AudioCraft

    AudioCraft

    Audiocraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 4
    HunyuanCustom

    HunyuanCustom

    Multimodal-Driven Architecture for Customized Video Generation

    HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for identity...
    Downloads: 8 This Week
    Last Update:
    See Project
  • The All-in-One Commerce Platform for Businesses - Shopify Icon
    The All-in-One Commerce Platform for Businesses - Shopify

    Shopify offers plans for anyone that wants to sell products online and build an ecommerce store, small to mid-sized businesses as well as enterprise

    Shopify is a leading all-in-one commerce platform that enables businesses to start, build, and grow their online and physical stores. It offers tools to create customized websites, manage inventory, process payments, and sell across multiple channels including online, in-person, wholesale, and global markets. The platform includes integrated marketing tools, analytics, and customer engagement features to help merchants reach and retain customers. Shopify supports thousands of third-party apps and offers developer-friendly APIs for custom solutions. With world-class checkout technology, Shopify powers over 150 million high-intent shoppers worldwide. Its reliable, scalable infrastructure ensures fast performance and seamless operations at any business size.
    Learn More
  • 5
    JUCE

    JUCE

    JUCE is an open-source cross-platform C++ application framework

    JUCE is an open-source cross-platform C++ application framework for creating high-quality desktop and mobile applications, including VST, VST3, AU, AUv3, RTAS and AAX audio plug-ins. JUCE can be easily integrated with existing projects via CMake, or can be used as a project generation tool via the Projucer, which supports exporting projects for Xcode (macOS and iOS), Visual Studio, Android Studio, Code::Blocks and Linux Makefiles as well as containing a source code editor. JUCE projects can...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 6
    Podcastfy.ai

    Podcastfy.ai

    Transforming Multimodal Content into Captivating Multilingual Audio

    Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, youtube videos as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources enabling customization...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Transformers

    Transformers

    State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX

    Transformers provides APIs and tools to easily download and train state-of-the-art pre-trained models. Using pre-trained models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities. Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages. Images, for tasks like image...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 8
    Scribbletune

    Scribbletune

    Create music with JavaScript

    Scribbletune is a JavaScript library for creating music and sequences using a simple and intuitive syntax, allowing developers to generate MIDI files and integrate music composition into their applications.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    Cookbook (Google Gemini)

    Cookbook (Google Gemini)

    Examples and guides for using the Gemini API

    The Gemini Cookbook is an official repository of examples and guides for using Google’s Gemini API. It provides a structured learning path with quick-start tutorials for beginners and practical examples for advanced users. The repository covers a wide range of Gemini capabilities, including text, images, video, speech, robotics, and multimodal interactions. It highlights newly introduced features such as Gemini 2.5 models (Flash and Pro), Gemini’s native image generation, Veo for video...
    Downloads: 5 This Week
    Last Update:
    See Project
  • Cloud data warehouse to power your data-driven innovation Icon
    Cloud data warehouse to power your data-driven innovation

    BigQuery is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.

    BigQuery Studio provides a single, unified interface for all data practitioners of various coding skills to simplify analytics workflows from data ingestion and preparation to data exploration and visualization to ML model creation and use. It also allows you to use simple SQL to access Vertex AI foundational models directly inside BigQuery for text processing tasks, such as sentiment analysis, entity extraction, and many more without having to deal with specialized models.
    Try for free
  • 10
    HunyuanVideo

    HunyuanVideo

    HunyuanVideo: A Systematic Framework For Large Video Generation Model

    HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU memory...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 11
    AirGap Vault

    AirGap Vault

    Installed on a smartphone that has no connection to any network

    AirGap is a crypto wallet system that lets you secure cypto assets with one secret on an offline device. The AirGap Vault application is installed on a dedicated device that has no connection to any network, thus it is air-gapped. The AirGap Wallet is installed on your everyday smartphone. AirGap Vault is responsible for secure key generation. Entropy from audio, video, touch and accelerometer are used together with the output of the hardware random number generator. The generated secret...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds state...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    HunyuanVideo-Foley

    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Diffusers

    Diffusers

    State-of-the-art diffusion models for image and audio generation

    Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions. State-of-the-art diffusion pipelines that can be run in inference with just a few...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    Audiogen Codec

    Audiogen Codec

    48khz stereo neural audio codec for general audio

    AGC (Audiogen Codec) is a convolutional autoencoder based on the DAC architecture, which holds SOTA. We found that training with EMA and adding a perceptual loss term with CLAP features improved performance. These codecs, being low compression, outperform Meta's EnCodec and DAC on general audio as validated from internal blind ELO games. We trained (relatively) very low compression codecs in the pursuit of solving a core issue regarding general music and audio generation, low acoustic quality...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    MeloTTS

    MeloTTS

    High-quality multi-lingual text-to-speech library by MyShell.ai

    MeloTTS is an open-source text-to-speech (TTS) system that generates natural-sounding speech from text input. It utilizes advanced machine-learning models to produce high-quality audio outputs.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    OpenAI .NET

    OpenAI .NET

    The official .NET library for the OpenAI API

    ... or configuration rather than hardcoding. Every synchronous method has an async counterpart, and the library offers convenient streaming primitives for chat completions so you can process tokens as they arrive. It supports tool/function calling, structured outputs via JSON schema, audio input/output, image generation, embeddings, Whisper transcription, and assistants with retrieval augmented generation.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    OpenAI Go

    OpenAI Go

    The official Go library for the OpenAI API

    OpenAI Go is the official Go client library for accessing the OpenAI API. It enables developers to integrate OpenAI’s models and features into Go applications with a clean and idiomatic interface. The library provides support for a wide range of API endpoints including chat completions, assistants, embeddings, image generation, audio processing, and batch jobs. It includes built-in tools for handling authentication, managing API requests, and parsing structured responses. The repository also...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    GLM-4-Voice

    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    AudioKit

    AudioKit

    Swift audio synthesis, processing, & analysis platform

    ... AudioKit code repository. We accept and encourage Github sponsorship of the people who spend a lot of time supporting AudioKit. We want to inspire the next generation of audio app developers and we do that by highlighting AudioKit-powered apps and by creating our own apps under the "AudioKit Pro" brand including the world's most downloaded synth "AudioKit Synth One" and a host of other AudioKit Pro apps.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    miniaudio

    miniaudio

    Audio playback and capture library written in C,

    miniaudio is written in C with no dependencies except the standard library and should compile cleanly on all major compilers without the need to install any additional development packages. All major desktop and mobile platforms are supported. miniaudio gives you complete flexibility. With the low-level API, just initialize a connection to the device and send or receive raw audio data. The modular design of miniaudio allows you to use the low-level API without compromising your ability to make...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    gm

    gm

    R Package for Music Score and Audio Generation

    Create music easily, and show musical scores and audio files in R Markdown documents, R Jupyter Notebooks and RStudio.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for ML security

    Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, sci-kit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    Daptin

    Daptin

    Daptin - Backend As A Service - GraphQL/JSON-API Headless CMS

    ... with 3rd party APIs. Multilingual tables support, supports Accept-Language header. Cloud storage sync like gdrive, dropbox, b2, s3 and more. Asset column to hold file and blob data, backed by storage. Multiple websites under separate sub-domain/sub-paths. Connect with external APIs by using extension points. Data View Streams. Flexible data import (auto create new tables and automated schema generation).
    Downloads: 1 This Week
    Last Update:
    See Project