# Audio transcript task config file.
#########################################
# Choose the transcription implementation
#########################################
# Default implementation uses Vosk transcription on local CPU (slow and medium quality).
# We include small portable models for 'en' and 'pt-BR'. If you want to use a different language model,
# you should download it from https://alphacephei.com/vosk/models and put it in the 'models/vosk/[lang]' folder.
#implementationClass = iped.engine.task.transcript.VoskTranscriptTask
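# Example (illustrative only; the language code and folder name are assumptions, not shipped defaults):
# a German model downloaded from the link above could be extracted to 'models/vosk/de' and selected with 'language = de'.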
# Uses a local wav2vec2 implementation for transcription. Accuracy is much better than most Vosk models.
# This is up to 10x slower than Vosk on high end CPUs. Using a good GPU is highly recommended!
# Please check the installation steps: https://github.com/sepinf-inc/IPED/wiki/User-Manual#wav2vec2
# If you enable this, you must set 'huggingFaceModel' param below.
#implementationClass = iped.engine.task.transcript.Wav2Vec2TranscriptTask
# Uses a local Whisper implementation for transcription. Accuracy is better than wav2vec2, depending on the model.
# This is up to 4x slower than wav2vec2, depending on the models compared. Using a high end GPU is strongly recommended!
# Please check the installation steps: https://github.com/sepinf-inc/IPED/wiki/User-Manual#whisper
# If you enable this, you must set 'whisperModel' param below.
implementationClass = iped.engine.task.transcript.WhisperTranscriptTask
# Uses a remote service for transcription.
# The remote service is useful if you have a central server/cluster with many GPUs to be shared among processing nodes.
# Please check the steps on https://github.com/sepinf-inc/IPED/wiki/User-Manual#remote-transcription
# If you enable this, you must set 'remoteServiceAddress' param below.
#implementationClass = iped.engine.task.transcript.RemoteTranscriptionTask
# If you want to use the Microsoft Azure service implementation, comment the implementation enabled above and uncomment the line below.
# You MUST include the Microsoft client-sdk.jar in the plugins folder.
# Download it from https://csspeechstorage.blob.core.windows.net/maven/com/microsoft/cognitiveservices/speech/client-sdk/1.19.0/client-sdk-1.19.0.jar
# You must pass your subscription key using the command line parameter -XazureSubscriptionKey=XXXXXXXX
#implementationClass = iped.engine.task.transcript.MicrosoftTranscriptTask
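# Command line sketch (hypothetical evidence/output names; assumes the -X parameter syntax shown above):
# java -jar iped.jar -d evidence.E01 -o output_folder -XazureSubscriptionKey=XXXXXXXX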
# If you want to use the Google service implementation, comment the implementation enabled above and uncomment the line below.
# You must include google-cloud-speech-1.22.5.jar AND ITS DEPENDENCIES in the plugins folder.
# You can download an all-in-one jar from https://gitlab.com/iped-project/iped-maven/-/blob/master/com/google/cloud/google-cloud-speech/1.22.5-shaded/google-cloud-speech-1.22.5-shaded.jar
# Finally, you must set the environment variable GOOGLE_APPLICATION_CREDENTIALS pointing to your credential file.
#implementationClass = iped.engine.task.transcript.GoogleTranscriptTask
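# Shell sketch for setting the credential variable (the file path is an assumption):
# Linux:   export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credential.json
# Windows: set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\credential.json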
#########################################
# Global options
#########################################
# Language model(s) to use when processing audios. 'auto' uses the 'locale' set on LocalConfig.txt.
# You can specify one or two languages separated by ; if Microsoft or Google is used.
# The Vosk implementation accepts just one language for now.
# The Wav2Vec2 local and remote implementations don't use this; you must set 'huggingFaceModel' below.
# Setting more than one language model can result in wrong language detection.
language = detect
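# Illustrative values (the language codes below are examples, not defaults):
# language = pt-BR;en     <- two candidate languages, only honored by the Microsoft/Google implementations
# language = en           <- single language, e.g. for Vosk, matching a folder under 'models/vosk'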
# Command to convert audios to wav before transcription. Do not change the $INPUT or $OUTPUT params.
convertCommand = mplayer -benchmark -vo null -vc null -srate 16000 -af format=s16le,resample=16000,channels=1 -ao pcm:fast:file=$OUTPUT $INPUT
# Mime types or supertypes to process. If you want to add videos, use ; as separator and update 'convertCommand' (a hedged conversion sketch follows below).
mimesToProcess = audio/3gpp; audio/3gpp2; audio/vnd.3gpp.iufp; audio/x-aac; audio/x-aiff; audio/amr; audio/amr-wb; audio/amr-wb+; audio/mp4; audio/ogg; audio/vorbis; audio/x-oggflac; audio/x-oggpcm; audio/opus; audio/speex; audio/qcelp; audio/vnd.wave; audio/x-caf; audio/x-ms-wma; audio/x-opus+ogg; audio/ilbc
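# Hedged sketch for also processing videos (the extra mime type and the ffmpeg command are assumptions, not tested defaults):
# append '; video/mp4' to 'mimesToProcess' above and swap 'convertCommand' for a converter that drops the video stream, e.g.:
# convertCommand = ffmpeg -y -i $INPUT -vn -ac 1 -ar 16000 -c:a pcm_s16le -f wav $OUTPUT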
# Skip known audios found in the hash lookup database.
skipKnownFiles = true
# Minimum number of seconds to wait for each audio transcription.
minTimeout = 180
# Number of seconds to wait per second of audio being transcribed. The 'minTimeout' param above is added to this.
timeoutPerSec = 3
#########################################
# VoskTranscriptTask options
#########################################
# Minimum word score to include a word in the transcription result. Applies just to the Vosk implementation.
# If you don't want to see * in transcription results, set this to 0.
minWordScore = 0.5
#########################################
# Local Wav2Vec2TranscriptTask options
#########################################
# HuggingFace model card to be used by Wav2Vec2TranscriptTask. You must uncomment one of them.
# Small models for portuguese, ~23-24% WER on tested data sets
# huggingFaceModel = lgris/bp_400h_xlsr2_300M
# huggingFaceModel = Edresson/wav2vec2-large-xlsr-coraa-portuguese
# Small models for other languages, WER not evaluated
# huggingFaceModel = jonatasgrosman/wav2vec2-large-xlsr-53-english
# huggingFaceModel = jonatasgrosman/wav2vec2-large-xlsr-53-german
# huggingFaceModel = jonatasgrosman/wav2vec2-large-xlsr-53-italian
# huggingFaceModel = jonatasgrosman/wav2vec2-large-xlsr-53-spanish
# huggingFaceModel = jonatasgrosman/wav2vec2-large-xlsr-53-french
# Bigger models, better on tested pt-BR data sets, but they use more RAM and are ~2x slower.
# The portuguese model has ~19% WER on tested data sets.
# huggingFaceModel = jonatasgrosman/wav2vec2-xls-r-1b-portuguese
# huggingFaceModel = jonatasgrosman/wav2vec2-xls-r-1b-english
# huggingFaceModel = jonatasgrosman/wav2vec2-xls-r-1b-german
# huggingFaceModel = jonatasgrosman/wav2vec2-xls-r-1b-italian
# huggingFaceModel = jonatasgrosman/wav2vec2-xls-r-1b-spanish
# huggingFaceModel = jonatasgrosman/wav2vec2-xls-r-1b-french
#########################################
# Local WhisperTranscriptTask options
#########################################
# Possible values: tiny, base, small, medium, large-v3, dwhoelz/whisper-large-pt-cv11-ct2
# large-v3 is much better than medium, but 2x slower and uses 2x more memory.
# If you know the language you want to transcribe, please set the 'language' option above.
# 'language = auto' uses the 'locale' set on LocalConfig.txt
# 'language = detect' uses auto detection, but it can cause mistakes
whisperModel = medium
# Which device to use: 'cpu' or 'gpu'. For 'gpu', please see the official faster-whisper documentation
# for the installation requirements: https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#gpu
device = cpu
# Compute type precision. This affects accuracy, speed and memory usage.
# Possible values: float32 (better), float16 (recommended for GPU), int8 (faster)
precision = int8
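# Example GPU combination (a sketch following the hints above, not a benchmarked recommendation):
# device = gpu
# precision = float16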
# Batch size (number of parallel transcriptions). If you have a GPU with enough memory,
# increasing this value to e.g. 16 can speed up transcribing long audios up to 10x.
# Test which value works best for your GPU before hitting OOM errors.
# This only works if you are using the whisperx library instead of faster_whisper.
batchSize = 1
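# Example for whisperx on a GPU with plenty of memory (the value is an assumption to be tuned):
# batchSize = 16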
#########################################
# RemoteTranscriptionTask options
#########################################
# IP:PORT of the service/central node used by the RemoteTranscriptionTask implementation.
# remoteServiceAddress = 127.0.0.1:11111
#########################################
# MicrosoftTranscriptTask options
#########################################
# Specific for Microsoft service. Replace with your own subscription region identifier from here: https://aka.ms/speech/sdkregion
serviceRegion = brazilsouth
# Depending on your subscription, Microsoft limits the number of max concurrent requests.
maxConcurrentRequests = 100
#########################################
# GoogleTranscriptTask options
#########################################
# Depending on your subscription, Google limits your request rate (per minute or per second).
requestIntervalMillis = 67
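# For reference, 67 ms between requests corresponds to roughly 15 requests per second (about 900 per minute).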
# Set the Google transcription model to be used.
# Some possible values: default, phone_call, video, latest_short, latest_long
# For more information, see https://cloud.google.com/speech-to-text/docs/transcription-model
googleModel = latest_long