
dmcc/speech-language-processing


Speech and Natural Language Processing

A curated list of speech and natural language processing resources. Other awesome lists can be found in the awesome-awesomeness list. If you want to contribute to this list (please do), send me a pull request. All sub-categories are listed in alphabetical order.

Finite State Toolkits and Regular Expressions

  • AT&T FSM Library
  • dk.brics.automaton Java toolkit for FSAs and regular expressions.
  • Fare Fare is a finite-state and regular expression library for the .NET framework, written in C#.
  • Foma Finite-state compiler and C library
  • fsa Toolkit used in the RWTH ASR engine.
  • fsm2.0 Thomas Hanneforth's fsm 2.0 library, written in C++, has a few nice operations such as three-way composition.
  • fstrain A toolkit for training finite-state models
  • Kleene programming language High level finite state programming language built on top of OpenFst.
  • MIT FST Toolkit
  • MoMs-for-StochasticLanguages Spectral and other training algorithms for WFSAs.
  • Noam "Noam is a JavaScript library for working with automata and formal grammars for regular and context-free languages". Also has some pretty cool examples using viz.js.
  • OpenFst
  • openfst-utils Nice set of utilities for OpenFst, including an implementation of categorial semirings.
  • openlat Toolkit for manipulating word lattices, built on top of OpenFst. Includes support for reading and writing HTK-compatible lattices.
  • SFST - Stuttgart Finite State Transducer Tools "SFST is a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology."
  • Treba "Treba is a basic command-line tool for training, decoding, and calculating with weighted (probabilistic) finite state automata (PFSA) and Hidden Markov Models (HMMs)."

Many of the tools in the machine translation section also implement interesting graph and semiring operations.
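Much of what the weighted-automaton tools above compute comes down to summing weights over paths. As a rough illustration only, here is a minimal pure-Python sketch of the forward algorithm for a probabilistic FSA in the probability semiring. The dictionary-based representation, the toy automaton and the `string_probability` name are hypothetical and are not the API of OpenFst, Treba or any other listed library.

```python
from collections import defaultdict

def string_probability(arcs, start, final, symbols):
    """Total probability of `symbols`, summed over every accepting path.

    arcs[(state, symbol)] -> list of (next_state, probability)   (hypothetical format)
    final[state]          -> probability of stopping in that state
    """
    forward = {start: 1.0}  # probability mass of reaching each state on the prefix read so far
    for sym in symbols:
        nxt = defaultdict(float)
        for state, mass in forward.items():
            for dest, p in arcs.get((state, sym), []):
                nxt[dest] += mass * p  # semiring "plus" is +, "times" is *
        forward = nxt
    # Multiply in the stopping probability of every state we can end in.
    return sum(mass * final.get(state, 0.0) for state, mass in forward.items())

# Toy two-state PFSA: for each state, outgoing arc and stop probabilities sum to 1.
arcs = {(0, "a"): [(0, 0.3), (1, 0.3)], (1, "b"): [(1, 0.5)]}
final = {0: 0.4, 1: 0.5}
# Only one accepting path for "ab": 0 -a-> 1 -b-> 1, then stop, i.e. 0.3 * 0.5 * 0.5.
print(string_probability(arcs, 0, final, "ab"))
```

Replacing the sum with a max gives the Viterbi (best-path) score instead; libraries such as OpenFst generalize this idea by parameterizing their algorithms over a semiring.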

Language Modelling Toolkits

  • BerkeleyLM
  • Bigfatlm Provides Hadoop training of Kneser-Ney language models, written in Java.
  • CSLM "Continuous Space Language Model toolkit. CSLM toolkit is open-source software which implements the so-called continuous space language model."
  • DALM Double array language model.
  • KenLM Kenneth Heafield's language model toolkit, uses a very fast and low memory representation.
  • lwlm lwlm is an exact, full Bayesian implementation of the Latent Words Language Model (Deschacht and Moens, 2009).
  • Maximum Entropy Modeling Le Zhang has a comprehensive set of links related to MaxEnt models.
  • Maximum entropy language models: SRILM extension "This patch adds the functionality to train and apply maximum entropy (MaxEnt) language models to the SRILM toolkit. Currently, only N-gram features are supported"
  • mitlm My personal favourite LM toolkit, super fast and seems to get slightly higher accuracy.
  • MSRLM "This scalable language-model tool is used to build language models from large amounts of data. It supports modified absolute discounting and Kneser-Ney smoothing."
  • OpenGrm Language modelling toolkit for use with OpenFst.
  • cpyp C++ library for modeling with Pitman-Yor processes
  • RandLM Bloom-filter-based randomized language models.
  • RNNLM Recurrent neural network language model toolkit.
  • Refr Re-ranking framework from the Johns Hopkins workshop on confusion-based language modelling.
  • SRILM
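Several of the toolkits above (e.g. Bigfatlm and MSRLM) mention Kneser-Ney smoothing. The sketch below is a textbook interpolated Kneser-Ney bigram model with a single fixed discount, assumed here to be 0.75. Production toolkits typically use higher orders, modified (multiple-discount) variants and much more compact data structures, so treat `train_kn_bigram` as a hypothetical teaching helper, not how any listed toolkit is implemented.

```python
from collections import Counter, defaultdict

def train_kn_bigram(sentences, discount=0.75):
    """Return prob(w, v) ~ P_KN(w | v) for an interpolated Kneser-Ney bigram model.

    Assumes 0 < discount <= 1 so every seen bigram keeps positive mass.
    """
    bigrams = Counter()
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))

    history_count = Counter()         # c(v): how often v occurs as a history
    followers = defaultdict(set)      # distinct words seen after v
    predecessors = defaultdict(set)   # distinct words seen before w
    for (v, w), c in bigrams.items():
        history_count[v] += c
        followers[v].add(w)
        predecessors[w].add(v)
    num_bigram_types = len(bigrams)

    def prob(w, v):
        # Continuation probability: in how many distinct contexts does w appear?
        p_cont = len(predecessors.get(w, ())) / num_bigram_types
        c_v = history_count[v]
        if c_v == 0:  # unseen history: fall back to the continuation distribution
            return p_cont
        # lambda(v) hands back exactly the mass removed by discounting.
        lam = discount * len(followers[v]) / c_v
        return max(bigrams[(v, w)] - discount, 0) / c_v + lam * p_cont

    return prob

p = train_kn_bigram([["a", "b"], ["a", "c"]])
print(p("b", "a"), p("</s>", "a"))
```

For a seen history, the discounted counts plus the interpolated continuation term sum to one over the vocabulary, which is the property the smoothing is designed to preserve.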

Speech Recognition

  • Bavieca New open source toolkit featuring static and dynamic decoders.
  • CMU Sphinx Open source toolkit for speech recognition from Carnegie Mellon University.
  • HTK "The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models."
  • Kaldi Modern open source toolkit led by Dan Povey, featuring many state-of-the-art techniques.
  • Phonetisaurus Josef Novak's super fast WFST-based phoneticizer; the site also has some really nice tutorial slides.
  • SCARF: A Segmental CRF Toolkit for Speech Recognition "SCARF is a toolkit for doing speech recognition with segmental conditional random fields."
  • trainc David Rybach and Michael Riley's tool for direct construction of context-dependency transducers (Interspeech best paper).
  • RASR RWTH ASR - The RWTH Aachen University Speech Recognition System
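HTK, CMU Sphinx, Kaldi and RASR all build on hidden Markov models, and decoding ultimately means finding the most likely state sequence. Below is a minimal Viterbi sketch for a discrete-observation HMM in log space. The two-state "sil"/"voiced" model, its probabilities and the `viterbi` helper are made up for illustration; real recognizers score continuous acoustic features and use pruning and large search graphs rather than this toy loop.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best state path, its log-probability) for a discrete HMM."""
    # delta[s]: best log-prob of any state path that ends in s after the current prefix
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            delta[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        backpointers.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

def logs(d):
    return {k: math.log(v) for k, v in d.items()}

states = ["sil", "voiced"]
log_start = logs({"sil": 0.8, "voiced": 0.2})
log_trans = {"sil": logs({"sil": 0.7, "voiced": 0.3}),
             "voiced": logs({"sil": 0.2, "voiced": 0.8})}
log_emit = {"sil": logs({"lo": 0.9, "hi": 0.1}),
            "voiced": logs({"lo": 0.2, "hi": 0.8})}
path, score = viterbi(["lo", "hi", "hi"], states, log_start, log_trans, log_emit)
print(path)  # ['sil', 'voiced', 'voiced']
```

Working in log space turns products into sums and avoids numerical underflow on long utterances.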

Machine Translation

  • Berkeley Aligner "...a word alignment software package that implements recent innovations in unsupervised word alignment."
  • cdec "Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms"
  • Jane "Jane is RWTH's open source statistical machine translation toolkit. Jane supports state-of-the-art techniques for phrase-based and hierarchical phrase-based machine translation."
  • Joshua Hierarchical and syntax-based machine translation decoder written in Java.
  • Moses Standard open source machine translation toolkit.
  • alignment-with-openfst <https://github.com/ldmt-muri/alignment-with-openfst>
  • zmert Nice Java MERT implementation by Omar F. Zaidan.
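Several tools above learn word alignments without supervision. The simplest classic model of this kind is IBM Model 1, trained with expectation maximization. The sketch below is a toy version on a made-up three-sentence corpus; the `ibm_model1` helper is hypothetical and this is not the method of any particular aligner listed here.

```python
from collections import defaultdict

NULL = "<null>"

def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 translation probabilities t[(f, e)] = p(f | e).

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in pairs:
            es = [NULL] + list(es)  # a foreign word may also be generated by NULL
            for f in fs:
                # E-step: posterior over which English word generated f.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts for each English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

pairs = [("das Haus".split(), "the house".split()),
         ("das Buch".split(), "the book".split()),
         ("ein Buch".split(), "a book".split())]
t = ibm_model1(pairs)
print(round(t[("Haus", "house")], 3), round(t[("das", "house")], 3))
```

Because "das" co-occurs with "the" in two sentence pairs, EM gradually attributes it to "the", which leaves "house" free to explain "Haus".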

Machine Learning

  • sofia-ml Fast incremental learning algorithms for classification, regression, ranking from Google.
  • Spearmint Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper "Practical Bayesian Optimization of Machine Learning Algorithms" by Jasper Snoek, Hugo Larochelle and Ryan P. Adams, Advances in Neural Information Processing Systems, 2012.
  • BIDData BIDMat is a matrix library intended to support large-scale exploratory data analysis. Its sister library BIDMach implements the machine learning layer.

Deep Learning

Natural Language Processing

  • SEAL Set expander for any language, described in this paper.
  • BLLIP reranking parser "BLLIP Parser is a statistical natural language parser including a generative constituent parser (first-stage) and discriminative maximum entropy reranker (second-stage)."

Other Tools

  • GraphViz.sty Really handy tool for adding DOT language directly to a LaTeX document, useful for tweaking the small colorized WFST figures in papers and presentations.

Blogs

Books
