- Demo
- Description
- Features
- Available models
- Project Structure
- Setup
- Database
- Usage Modes
- Error Handling
- Checklist
- Requirements
- AWS Lambda Usage with S3
- To Do
## Demo
Check how it works in this demo: YouTube.
## Description
App for analyzing conversation audio from various campaigns (such as support or sales) and verifying whether it meets defined requirements, such as script tracking (checklist), comments or phrases that should not be used, tone, audio quality, and more, using analytics and AI services.
## Features
- Support for multiple campaigns, each with its own checklist and audios.
- Two execution modes: single run (processes and exits) or continuous (watches for new audios).
- Transcribe audio using different models (OpenAI Whisper, Mistral Voxtral, or any model compatible with the OpenAI SDK).
- Analyze audio in detail with justification (adherence to a predefined conversation script, emotional and tone analysis, audio quality, and a compliance summary).
- Process using different AI models or services (for example: OpenAI Whisper for transcription and Mistral for anonymization and analysis).
- Inserts the results in a structured manner into a database for further analysis.
- Data protection: Does not include personal information (PII) or sensitive data (e.g., credit card numbers, social security numbers) in the analysis, and hides this information in the JSON response (replacing it with [SENSITIVE]). To do this, the transcription is anonymized with AI before being sent for analysis.
- Parallel processing: Processes multiple audios and campaigns in parallel.
- Language support: Supports multiple languages (Spanish, English, French, Portuguese, German, Italian, Dutch, Hindi).
- Failure recovery: Moves problematic audios to a `failed` folder for manual review, without stopping the process.
- Modular and scalable structure.

The analysis includes:
- Transcription
- Anonymization (if enabled, using AI)
- Script tracking (checklist)
- Comments or phrases that should not be used (do not checklist)
- Tone
- Audio quality
- Compliance summary
- Feedback and improvement areas
- Strengths
- Objective/goal achievement
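For illustration, a structured result might look like the sketch below; the field names are assumptions, not the app's exact schema:

```js
// Illustrative only: field names are assumed, not the exact output schema.
const exampleResult = {
  campaign: "sales_q3",
  audio: "call_0142.mp3",
  language: "es",
  transcription: "Good morning, my card number is [SENSITIVE]...",
  checklist: { "Initial greeting": true, "Polite farewell": true },
  doNotChecklist: { "Ask for password": false },
  tone: "calm and professional",
  audioQuality: "good",
  complianceSummary: "Script followed; no forbidden phrases detected.",
  strengths: ["Clear company introduction"],
  improvementAreas: ["Confirm the customer number earlier in the call"],
  objectiveAchieved: true
};
```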
## Available models
- OpenAI and any OpenAI SDK compatible model: For transcription and analysis.
  - Whisper: For transcription.
  - Any OpenAI SDK compatible model: For transcription and analysis.
  - Any Whisper compatible model: For transcription using the OpenAI SDK.
- Mistral: For transcription and analysis.
  - Any Mistral API compatible model: For transcription and analysis.
- Huggingface: For transcription and analysis, with any available provider.
- Databases: Supabase, SQL Server (and any other compatible database engine, including Azure SQL), PostgreSQL (and any other compatible engine), and MongoDB (and any other compatible engine).
- Cloud providers: AWS Lambda with S3 for files.
- Supports over 60 languages for audio analysis (more coming soon).
- Support for a specific language is determined by the AI model (some models support just a few languages, while others support over 100).
- To check all the supported languages, see the file `src/utils/languages.js`.
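As an assumption about its shape (check the actual file for the real list), the mapping might look like:

```js
// Assumed shape of src/utils/languages.js; the real file may differ.
module.exports = {
  es: "Spanish",
  en: "English",
  fr: "French",
  pt: "Portuguese",
  de: "German",
  it: "Italian",
  nl: "Dutch",
  hi: "Hindi",
  // ...over 60 codes in total
};
```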
## Project Structure
```
audioanalyzer/
├── index.js
├── src/                --> Contains the core logic of the application.
├── package.json
├── .env
├── README.md
├── campaigns/
│   └── [campaign_name]/
│       ├── checklist.txt
│       └── audios/
└── processed/
    └── [campaign_name]/
        ├── [processed_audio]
        ├── [result.txt]
        └── failed/
            ├── [failed_audio]
            └── [error.log]
```
- `src/`: System logic, modularized.
- `campaigns/`: Contains each campaign's folder, with its `checklist.txt` (if using local no-db mode) and `audios/` folder.
- `processed/`: Stores processed audios and their results (if using local no-db mode). Includes a `failed/` subfolder for audios that couldn't be processed.
## Setup
- Clone the repository and install dependencies:

  ```bash
  git clone <repo-url>
  cd audioanalyzer
  npm install
  ```
- Set up your keys: create a `.env` file with the following content:

  ```
  AI_TRANSCRIBER_SERVICE=openai|mistral|huggingface
  AI_ANALYZER_SERVICE=openai|mistral|huggingface
  AI_ANONYMIZER_SERVICE=openai|mistral|huggingface
  ANONYMIZE_TRANSCRIPTION=true|false

  # Maximum number of campaigns to process in parallel (see the concurrency sketch after this list)
  CAMPAIGN_CONCURRENCY_LIMIT=
  # Maximum number of audios to be processed in parallel per campaign
  AUDIO_CONCURRENCY_LIMIT=

  # If OpenAI is used
  OPENAI_API_KEY=your_api_key_here
  # If empty, the OpenAI default is used
  OPENAI_BASE_URL=
  OPENAI_MODEL=

  # If Whisper is used (in a service other than OpenAI)
  # If empty, the OpenAI default is used
  WHISPER_BASE_URL=
  WHISPER_MODEL=whisper-1
  # It could be the same as OPENAI_API_KEY
  WHISPER_API_KEY=

  # If Mistral is used
  MISTRAL_API_KEY=your_mistral_api_key
  MISTRAL_AUDIO_MODEL=voxtral_model
  # The text model is used for anonymizing the transcription
  MISTRAL_TEXT_MODEL=mistral-model
  MISTRAL_ENDPOINT=https://api.mistral.ai/v1

  # If Huggingface is used
  HUGGINGFACE_API_KEY=
  HUGGINGFACE_PROVIDER_AUDIO=
  HUGGINGFACE_PROVIDER_TEXT=
  HUGGINGFACE_AUDIO_MODEL=
  HUGGINGFACE_TEXT_MODEL=

  # If you want to use a database (otherwise, if empty, results are only saved in the text file)
  DB_ENGINE=supabase|sqlserver|postgresql|mongodb

  # Supabase Configuration
  SUPABASE_URL=
  SUPABASE_ANON_KEY=
  SUPABASE_CAMPAIGN_TABLE_NAME=
  SUPABASE_RESULTS_TABLE_NAME=

  # SQL Server, Azure SQL, or PostgreSQL Configuration
  DBSERVER_USER=
  DBSERVER_PASSWORD=
  DBSERVER_SERVER=
  DBSERVER_DATABASE=
  DBSERVER_RESULTS_TABLE_NAME=
  DBSERVER_CAMPAIGN_TABLE_NAME=
  DBSERVER_SSL=true # For PostgreSQL
  DBSERVER_PORT=5432 # For PostgreSQL

  # MongoDB Configuration
  MONGODB_CONNECTION_STRING=
  MONGODB_DATABASE=
  MONGODB_RESULTS_TABLE_NAME=
  MONGODB_CAMPAIGN_TABLE_NAME=
  ```
- Create a campaign folder: `campaigns/[campaign_name]/`, with its `checklist.txt` and an `audios/` subfolder (see Project Structure above).
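The two concurrency variables above cap parallel work. As a rough sketch of how they might be applied, assuming the `p-limit` package and a hypothetical `processAudio` helper (the actual implementation may differ):

```js
const pLimit = require("p-limit"); // assumes p-limit v3, which is CommonJS

// Hypothetical sketch: cap concurrent campaigns globally and audios per campaign.
async function processCampaigns(campaigns) {
  const campaignLimit = pLimit(Number(process.env.CAMPAIGN_CONCURRENCY_LIMIT) || 2);

  await Promise.all(
    campaigns.map((campaign) =>
      campaignLimit(async () => {
        // A fresh limiter per campaign, so the audio cap applies per campaign.
        const audioLimit = pLimit(Number(process.env.AUDIO_CONCURRENCY_LIMIT) || 4);
        await Promise.all(
          campaign.audios.map((audio) =>
            audioLimit(() => processAudio(campaign, audio)) // processAudio is illustrative
          )
        );
      })
    )
  );
}
```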
## Database
For more details on how to use the database and configure other engines, see the specific README.
## Usage Modes
There are two ways to run the application.

### Single run
Processes all pending audios once and then exits.
- Process all campaigns:

  ```bash
  npm start
  ```

- Process only a specific campaign:

  ```bash
  npm start [campaign_name]
  ```
### Watch mode
The script stays active and automatically processes any new audio added to the `campaigns/*/audios/` folders.

- Activate watch mode:

  ```bash
  npm start -- --watch
  ```

  (The `--` is important to pass the flag to the script through npm.)
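Conceptually, watch mode keeps a file watcher on every campaign's `audios/` folder. A minimal sketch, assuming the `chokidar` package and a hypothetical `processAudioFile` entry point (the project's actual watcher may differ):

```js
const chokidar = require("chokidar");

// Watch all campaign audio folders; fire once per file that appears.
const watcher = chokidar.watch("campaigns/*/audios/*", {
  ignoreInitial: false, // also pick up audios already pending at startup
});

watcher.on("add", async (filePath) => {
  try {
    await processAudioFile(filePath); // illustrative processing entry point
  } catch (err) {
    console.error(`Failed to process ${filePath}:`, err.message);
  }
});
```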
## Error Handling
If an audio cannot be processed (due to a network error, API error, etc.), the system is robust:
- The problematic audio is moved to `processed/[campaign_name]/failed/`.
- A `.log` file is created next to the audio with error details.
- The script continues processing other audios without interruption.

This allows safe re-execution at any time to process pending audios.
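A minimal sketch of this recovery behavior, assuming plain `fs` operations (the names are illustrative):

```js
const fs = require("fs/promises");
const path = require("path");

// Move a failed audio to processed/<campaign>/failed/ and write an error log next to it.
async function quarantineAudio(campaignName, audioPath, err) {
  const failedDir = path.join("processed", campaignName, "failed");
  await fs.mkdir(failedDir, { recursive: true });

  const fileName = path.basename(audioPath);
  await fs.rename(audioPath, path.join(failedDir, fileName));
  await fs.writeFile(
    path.join(failedDir, `${fileName}.log`),
    `${new Date().toISOString()} ${err.stack || err.message}\n`
  );
}
```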
## Checklist
If `DB_ENGINE` is empty, `checklist.txt` will be used to check the do and don't lists when analyzing the audio.
If `DB_ENGINE` is not empty, the checklist needs to be in the database (campaign table).
To select the language (the first line in the checklist text file), use one of the supported language codes, available in `src/utils/languages.js`.

`checklist.txt` format:
```
es
OBJECTIVE: The objective/goal of the campaign (sell a specific product, etc.)
# DO
Initial greeting
Company introduction
Request for customer number
Polite farewell
# DONT
Ask for password
```
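Given that format, parsing is straightforward: the first line is the language code, an `OBJECTIVE:` line sets the goal, and `# DO` / `# DONT` open the two lists. A sketch (the real parser may differ):

```js
const fs = require("fs");

// Parse checklist.txt into { language, objective, do, dont }.
function parseChecklist(filePath) {
  const lines = fs
    .readFileSync(filePath, "utf8")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);

  const checklist = { language: lines[0], objective: "", do: [], dont: [] };
  let section = null;

  for (const line of lines.slice(1)) {
    if (line.startsWith("OBJECTIVE:")) checklist.objective = line.slice(10).trim();
    else if (line === "# DO") section = "do";
    else if (line === "# DONT") section = "dont";
    else if (section) checklist[section].push(line);
  }
  return checklist;
}
```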
## Requirements
- Node.js >= 16
- OpenAI account and API key with access to Whisper and GPT.
- Or a Mistral account and API key to use Voxtral to analyze the audio instead of OpenAI.
- Or any OpenAI compatible API to use as transcriber and analyzer.
## AWS Lambda Usage with S3
This project can now run as a Lambda function, automatically processing audio files uploaded to an S3 bucket.
- Audios should be uploaded to: `campaigns/[campaign_name]/audios/[file]`
- The checklist should be in the database or at: `campaigns/[campaign_name]/checklist.txt`
- Results will be in the database or in the text file, and processed files will be saved in: `processed/[campaign_name]/`
- Package the code (including `node_modules`) into a zip file.
- Upload the zip as a Lambda function.
- Set the environment variables.
- Create an S3 trigger for the Lambda function:
  - Event: `PUT`
  - Prefix: `campaigns/`
  - Suffix: (empty or restricted to audio extensions)
- Ensure the Lambda has permissions to read and write to the S3 bucket.
- Processing and saving results now happen entirely in S3 and `/tmp` (Lambda's temp directory).
- The Lambda policy template for S3 is in the file `configuration/aws/LambdaPolicyForS3.json`.
- Watcher mode and local processing remain available for use outside Lambda.
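For reference, the handler's shape under a standard S3 PUT event could look like this sketch; `processS3Audio` is a hypothetical entry point, not the project's actual export:

```js
// Illustrative Lambda handler: derive the campaign from the uploaded key.
exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // S3 keys arrive URL-encoded, with spaces as "+".
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Keys follow: campaigns/[campaign_name]/audios/[file]
    const [, campaignName] = key.split("/");
    await processS3Audio(bucket, key, campaignName); // hypothetical
  }
};
```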
## To Do
- [x] Decoupling of AI services between transcriber and analyzer.
- [x] Automatic anonymization of sensitive data (PII, credit cards, etc.) using AI and LLMs.
- [x] Integration with more AI engines and the Huggingface SDK.
- [x] Batch and parallel processing for audios and campaigns.
- [x] Integration with more database engines.
- [x] AWS Lambda usage with S3.
- [x] Objective/goal achievement analysis included in the result.
- [ ] Integration with Cloud providers (Azure, AWS, Google Cloud, and others).
- [ ] Calculate costs incurred for each audio and at campaign level.
- [ ] Web app to manage campaigns and results.
- [ ] Dashboard with interactive visualizations of the analyses.
- [ ] REST API so that other systems can use the analysis.
- [ ] Administration panel for user management and billing.
- [ ] Product usage analytics.
- [ ] Customization of analysis according to the sector (education, healthcare, call centers, etc.).
- [ ] Identification of best practices and model agents.
- [ ] Integration with CRMs and ticketing systems (Salesforce, Zendesk, HubSpot).
- [ ] Automatic alerts to supervisors in case of critical situations (upset customer, potential termination, etc.).