- Demo
- Description
- Features
- Available models
- Project Structure
- Setup
- Database
- Usage Modes
- Error Handling
- Checklist
- Requirements
- AWS Lambda Usage with S3
- To Do
## Demo
Check how it works in this demo: YouTube.
## Description
App for analyzing conversation audio from various campaigns (such as support or sales) and verifying whether it meets defined requirements, such as script tracking (checklist), comments or phrases that should not be used, tone, audio quality, and more, using analytics and AI services.
## Features
- Support for multiple campaigns, each with its own checklist and audios.
- Two execution modes: single run (processes and exits) or continuous (watches for new audios).
- Transcribe audio using different models (OpenAI Whisper, Mistral Voxtral, or any model compatible with the OpenAI SDK).
- Analyze audio in detail with justification (adherence to a predefined conversation script, emotional and tone analysis, audio quality, and a compliance summary).
- Process using different AI models or services (for example: OpenAI Whisper for transcription and Mistral for anonymization and analysis).
- Inserts the results in a structured manner into a database for further analysis.
- Data protection: Does not include personal information (PII) or sensitive data (e.g., credit card numbers, social security numbers) in the analysis, and hides this information in the JSON response (replacing it with [SENSITIVE]). To do this, the transcription is anonymized with AI before being sent for analysis.
- Parallel processing: Processes multiple audios and campaigns in parallel.
- Language support: Supports multiple languages (Spanish, English, French, Portuguese, German, Italian, Dutch, Hindi).
- Failure recovery: Moves problematic audios to a `failed` folder for manual review, without stopping the process.
- Modular and scalable structure.

The analysis includes:
- Transcription
- Anonymization (if enabled, using AI)
- Script tracking (checklist)
- Comments or phrases that should not be used (do not checklist)
- Tone
- Audio quality
- Compliance summary
- Feedback and improvement areas
- Strengths
- Objective/goal achievement
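For illustration, a structured result might look like the sketch below; the field names are assumptions, not the app's exact schema:

```js
// Illustrative only: field names are assumed, not the exact output schema.
const exampleResult = {
  campaign: "sales_q3",
  audio: "call_0142.mp3",
  language: "es",
  transcription: "Good morning, my card number is [SENSITIVE]...",
  checklist: { "Initial greeting": true, "Polite farewell": true },
  doNotChecklist: { "Ask for password": false },
  tone: "calm and professional",
  audioQuality: "good",
  complianceSummary: "Script followed; no forbidden phrases detected.",
  strengths: ["Clear company introduction"],
  improvementAreas: ["Confirm the customer number earlier in the call"],
  objectiveAchieved: true
};
```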
## Available models
- OpenAI and any OpenAI SDK compatible model: For transcription and analysis.
  - Whisper: For transcription.
  - Any OpenAI SDK compatible model: For transcription and analysis.
  - Any Whisper compatible model: For transcription using the OpenAI SDK.
- Mistral: For transcription and analysis.
  - Any Mistral API compatible model: For transcription and analysis.
- Huggingface: For transcription and analysis, with any available provider.
- Databases: Supabase, SQL Server (and any other compatible database engine, including Azure SQL), PostgreSQL (and any other compatible engine), and MongoDB (and any other compatible engine).
- Cloud providers: AWS Lambda with S3 for files.
- Supports over 60 languages for audio analysis (more coming soon).
- Support for a specific language is determined by the AI model (some models support just a few languages, while others support over 100).
- To check all the supported languages, see the file `src/utils/languages.js`.
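As an assumption about its shape (check the actual file for the real list), the mapping might look like:

```js
// Assumed shape of src/utils/languages.js; the real file may differ.
module.exports = {
  es: "Spanish",
  en: "English",
  fr: "French",
  pt: "Portuguese",
  de: "German",
  it: "Italian",
  nl: "Dutch",
  hi: "Hindi",
  // ...over 60 codes in total
};
```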
## Project Structure
```
audioanalyzer/
├── index.js
├── src/                --> Contains the core logic of the application.
├── package.json
├── .env
├── README.md
├── campaigns/
│   └── [campaign_name]/
│       ├── checklist.txt
│       └── audios/
└── processed/
    └── [campaign_name]/
        ├── [processed_audio]
        ├── [result.txt]
        └── failed/
            ├── [failed_audio]
            └── [error.log]
```
- `src/`: System logic, modularized.
- `campaigns/`: Contains each campaign's folder, with its `checklist.txt` (if using local no-db mode) and `audios/` folder.
- `processed/`: Stores processed audios and their results (if using local no-db mode). Includes a `failed/` subfolder for audios that couldn't be processed.
## Setup
- Clone the repository and install dependencies:

  ```bash
  git clone <repo-url>
  cd audioanalyzer
  npm install
  ```
- Set up your keys: create a `.env` file with the following content:

  ```
  AI_TRANSCRIBER_SERVICE=openai|mistral|huggingface
  AI_ANALYZER_SERVICE=openai|mistral|huggingface
  AI_ANONYMIZER_SERVICE=openai|mistral|huggingface
  ANONYMIZE_TRANSCRIPTION=true|false

  # Maximum number of campaigns to process in parallel (see the concurrency sketch after this list)
  CAMPAIGN_CONCURRENCY_LIMIT=
  # Maximum number of audios to be processed in parallel per campaign
  AUDIO_CONCURRENCY_LIMIT=

  # If OpenAI is used
  OPENAI_API_KEY=your_api_key_here
  # If empty, the OpenAI default is used
  OPENAI_BASE_URL=
  OPENAI_MODEL=

  # If Whisper is used (in a service other than OpenAI)
  # If empty, the OpenAI default is used
  WHISPER_BASE_URL=
  WHISPER_MODEL=whisper-1
  # It could be the same as OPENAI_API_KEY
  WHISPER_API_KEY=

  # If Mistral is used
  MISTRAL_API_KEY=your_mistral_api_key
  MISTRAL_AUDIO_MODEL=voxtral_model
  # The text model is used for anonymizing the transcription
  MISTRAL_TEXT_MODEL=mistral-model
  MISTRAL_ENDPOINT=https://api.mistral.ai/v1

  # If Huggingface is used
  HUGGINGFACE_API_KEY=
  HUGGINGFACE_PROVIDER_AUDIO=
  HUGGINGFACE_PROVIDER_TEXT=
  HUGGINGFACE_AUDIO_MODEL=
  HUGGINGFACE_TEXT_MODEL=

  # If you want to use a database (otherwise, if empty, results are only saved in the text file)
  DB_ENGINE=supabase|sqlserver|postgresql|mongodb

  # Supabase Configuration
  SUPABASE_URL=
  SUPABASE_ANON_KEY=
  SUPABASE_CAMPAIGN_TABLE_NAME=
  SUPABASE_RESULTS_TABLE_NAME=

  # SQL Server, Azure SQL, or PostgreSQL Configuration
  DBSERVER_USER=
  DBSERVER_PASSWORD=
  DBSERVER_SERVER=
  DBSERVER_DATABASE=
  DBSERVER_RESULTS_TABLE_NAME=
  DBSERVER_CAMPAIGN_TABLE_NAME=
  DBSERVER_SSL=true # For PostgreSQL
  DBSERVER_PORT=5432 # For PostgreSQL

  # MongoDB Configuration
  MONGODB_CONNECTION_STRING=
  MONGODB_DATABASE=
  MONGODB_RESULTS_TABLE_NAME=
  MONGODB_CAMPAIGN_TABLE_NAME=
  ```
- Create a campaign folder: `campaigns/[campaign_name]/`, with its `checklist.txt` and an `audios/` subfolder (see Project Structure above).
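The two concurrency variables above cap parallel work. As a rough sketch of how they might be applied, assuming the `p-limit` package and a hypothetical `processAudio` helper (the actual implementation may differ):

```js
const pLimit = require("p-limit"); // assumes p-limit v3, which is CommonJS

// Hypothetical sketch: cap concurrent campaigns globally and audios per campaign.
async function processCampaigns(campaigns) {
  const campaignLimit = pLimit(Number(process.env.CAMPAIGN_CONCURRENCY_LIMIT) || 2);

  await Promise.all(
    campaigns.map((campaign) =>
      campaignLimit(async () => {
        // A fresh limiter per campaign, so the audio cap applies per campaign.
        const audioLimit = pLimit(Number(process.env.AUDIO_CONCURRENCY_LIMIT) || 4);
        await Promise.all(
          campaign.audios.map((audio) =>
            audioLimit(() => processAudio(campaign, audio)) // processAudio is illustrative
          )
        );
      })
    )
  );
}
```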
## Database
For more details on how to use the database and configure other engines, see the specific README.
## Usage Modes
There are two ways to run the application.

### Single run
Processes all pending audios once and then exits.
- Process all campaigns:

  ```bash
  npm start
  ```

- Process only a specific campaign:

  ```bash
  npm start [campaign_name]
  ```
### Watch mode
The script stays active and automatically processes any new audio added to the `campaigns/*/audios/` folders.

- Activate watch mode:

  ```bash
  npm start -- --watch
  ```

  (The `--` is important to pass the flag to the script through npm.)
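Conceptually, watch mode keeps a file watcher on every campaign's `audios/` folder. A minimal sketch, assuming the `chokidar` package and a hypothetical `processAudioFile` entry point (the project's actual watcher may differ):

```js
const chokidar = require("chokidar");

// Watch all campaign audio folders; fire once per file that appears.
const watcher = chokidar.watch("campaigns/*/audios/*", {
  ignoreInitial: false, // also pick up audios already pending at startup
});

watcher.on("add", async (filePath) => {
  try {
    await processAudioFile(filePath); // illustrative processing entry point
  } catch (err) {
    console.error(`Failed to process ${filePath}:`, err.message);
  }
});
```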
## Error Handling
If an audio cannot be processed (due to a network error, API error, etc.), the system is robust:
- The problematic audio is moved to `processed/[campaign_name]/failed/`.
- A `.log` file is created next to the audio with error details.
- The script continues processing other audios without interruption.

This allows safe re-execution at any time to process pending audios.
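A minimal sketch of this recovery behavior, assuming plain `fs` operations (the names are illustrative):

```js
const fs = require("fs/promises");
const path = require("path");

// Move a failed audio to processed/<campaign>/failed/ and write an error log next to it.
async function quarantineAudio(campaignName, audioPath, err) {
  const failedDir = path.join("processed", campaignName, "failed");
  await fs.mkdir(failedDir, { recursive: true });

  const fileName = path.basename(audioPath);
  await fs.rename(audioPath, path.join(failedDir, fileName));
  await fs.writeFile(
    path.join(failedDir, `${fileName}.log`),
    `${new Date().toISOString()} ${err.stack || err.message}\n`
  );
}
```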
## Checklist
If `DB_ENGINE` is empty, `checklist.txt` will be used to check the do and don't lists when analyzing the audio.
If `DB_ENGINE` is not empty, the checklist needs to be in the database (campaign table).
To select the language (the first line in the checklist text file), use one of the supported language codes, available in `src/utils/languages.js`.

`checklist.txt` format:
```
es
OBJECTIVE: The objective/goal of the campaign (sell a specific product, etc.)
# DO
Initial greeting
Company introduction
Request for customer number
Polite farewell
# DONT
Ask for password
```
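Given that format, parsing is straightforward: the first line is the language code, an `OBJECTIVE:` line sets the goal, and `# DO` / `# DONT` open the two lists. A sketch (the real parser may differ):

```js
const fs = require("fs");

// Parse checklist.txt into { language, objective, do, dont }.
function parseChecklist(filePath) {
  const lines = fs
    .readFileSync(filePath, "utf8")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);

  const checklist = { language: lines[0], objective: "", do: [], dont: [] };
  let section = null;

  for (const line of lines.slice(1)) {
    if (line.startsWith("OBJECTIVE:")) checklist.objective = line.slice(10).trim();
    else if (line === "# DO") section = "do";
    else if (line === "# DONT") section = "dont";
    else if (section) checklist[section].push(line);
  }
  return checklist;
}
```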
## Requirements
- Node.js >= 16
- OpenAI account and API key with access to Whisper and GPT.
- Or a Mistral account and API key to use Voxtral to analyze the audio instead of OpenAI.
- Or any OpenAI compatible API to use as transcriber and analyzer.
## AWS Lambda Usage with S3
This project can now run as a Lambda function, automatically processing audio files uploaded to an S3 bucket.
- Audios should be uploaded to: `campaigns/[campaign_name]/audios/[file]`
- The checklist should be in the database or at: `campaigns/[campaign_name]/checklist.txt`
- Results will be in the database or in the text file, and processed files will be saved in: `processed/[campaign_name]/`
- Package the code (including `node_modules`) into a zip file.
- Upload the zip as a Lambda function.
- Set the environment variables.
- Create an S3 trigger for the Lambda function:
  - Event: `PUT`
  - Prefix: `campaigns/`
  - Suffix: (empty or restricted to audio extensions)
- Ensure the Lambda has permissions to read and write to the S3 bucket.
- Processing and saving results now happen entirely in S3 and `/tmp` (Lambda's temp directory).
- The Lambda policy template for S3 is in the file `configuration/aws/LambdaPolicyForS3.json`.
- Watcher mode and local processing remain available for use outside Lambda.
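For reference, the handler's shape under a standard S3 PUT event could look like this sketch; `processS3Audio` is a hypothetical entry point, not the project's actual export:

```js
// Illustrative Lambda handler: derive the campaign from the uploaded key.
exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // S3 keys arrive URL-encoded, with spaces as "+".
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Keys follow: campaigns/[campaign_name]/audios/[file]
    const [, campaignName] = key.split("/");
    await processS3Audio(bucket, key, campaignName); // hypothetical
  }
};
```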
## To Do
- [x] Decoupling of AI services between transcriber and analyzer.
- [x] Automatic anonymization of sensitive data (PII, credit cards, etc.) using AI and LLMs.
- [x] Integration with more AI engines and the Huggingface SDK.
- [x] Batch and parallel processing for audios and campaigns.
- [x] Integration with more database engines.
- [x] AWS Lambda usage with S3.
- [x] Objective/goal achievement analysis included in the result.
- [ ] Integration with Cloud providers (Azure, AWS, Google Cloud, and others).
- [ ] Calculate costs incurred for each audio and at campaign level.
- [ ] Web app to manage campaigns and results.
- [ ] Dashboard with interactive visualizations of the analyses.
- [ ] REST API so that other systems can use the analysis.
- [ ] Administration panel for user management and billing.
- [ ] Product usage analytics.
- [ ] Customization of analysis according to the sector (education, healthcare, call centers, etc.).
- [ ] Identification of best practices and model agents.
- [ ] Integration with CRMs and ticketing systems (Salesforce, Zendesk, HubSpot).
- [ ] Automatic alerts to supervisors in case of critical situations (upset customer, potential termination, etc.).