Stars
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
My investigation of Voxtral Mini-3B's capabilities.
Voxtral: Convert Mistral into an end-to-end SpeechLM. No information bottleneck, preserves prosody, and learns interruptions from data. Unlike GPT-4o (closed) or Moshi (complex), it is open, simple, and natural.
Open-source, Rust-based AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization. 100% local processing; no cloud required. Meetily (Me…
Docker Automated Build Repository for siomiz/chrome -- Google Chrome via VNC (or via Chrome Remote Desktop)
Efficient Inference of Transformer models
Command-line client for WebSockets, like netcat (or curl) for ws:// with advanced socat-like functions
A gdbstub for connecting GDB to a RISC-V Debug Module
Template for Xilinx Vivado projects
starwaredesign / vivado-docker
Forked from BBN-Q/vivado-docker
Dockerfile with Vivado for CI
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Port of OpenAI's Whisper model in C/C++
Minimal HW-based demo of GPUDirect RDMA on NVIDIA Jetson AGX Xavier running L4T
The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge
Implements an SDK for the ESP32 with no dependencies (e.g. on ESP-IDF or Arduino-ESP32).
A digital logic designer and circuit simulator.
Intel I225/I226 igc driver for Synology Kernel 4.4.180
Robust Speech Recognition via Large-Scale Weak Supervision
Hardware implementation of the SHA-256 cryptographic hash function
ASR/NLP/TTS deep learning inference library for NVIDIA Jetson using PyTorch and TensorRT