langid is a fast and efficient language identification tool, serving as a rewrite of the original langid/py3langid. This library provides a robust solution for detecting languages in text data. It is designed for developers, researchers, and anyone needing to quickly identify the language of a given text.
- Speed: Optimized for quick detection, making it suitable for large datasets.
- Accuracy: High accuracy rates for a wide range of languages.
- Lightweight: Minimal dependencies for easy integration.
- Flexible: Works with various input formats and sizes.
- Easy to Use: Simple API for quick implementation.
This repository covers a variety of topics relevant to language detection:
- detect-language
- detect-languages
- langid
- language-detection
- language-detection-lib
- language-detection-library
- language-detector
- language-identification
- language-recognition
- nlp
- whatlang
To install langid, you can use pip. Run the following command in your terminal:
pip install langid
Here's a simple example of how to use langid in your Python project:
import langid
text = "Bonjour tout le monde"
language, confidence = langid.classify(text)
print(f"Detected language: {language} with confidence {confidence}")
This code snippet detects the language of the input text and reports a confidence score.
- Parameters:
  - text (str): The text you want to analyze.
- Returns:
  - A tuple containing the detected language code and the confidence score.
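The tuple returned by classify makes it easy to summarize a batch of texts. The following is a minimal sketch, not part of langid itself: `language_counts` and `fake_classify` are hypothetical names introduced for illustration, and `fake_classify` is only a stand-in so the example runs without langid installed. It assumes nothing about the classifier beyond the `(language, confidence)` return value described above.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# A classifier that follows the contract described above:
# it takes a string and returns (language_code, confidence).
Classifier = Callable[[str], Tuple[str, float]]


def language_counts(texts: Iterable[str], classify: Classifier) -> Counter:
    """Count how many texts were assigned to each language code."""
    counts: Counter = Counter()
    for text in texts:
        language, _confidence = classify(text)  # unpack the documented tuple
        counts[language] += 1
    return counts


def fake_classify(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for langid.classify (assumption, not the real model)."""
    return ("fr", 1.0) if "Bonjour" in text else ("en", 1.0)


if __name__ == "__main__":
    sample = ["Bonjour tout le monde", "Hello, world", "Bonjour"]
    print(language_counts(sample, fake_classify))
```

With langid installed, passing `langid.classify` in place of `fake_classify` should work the same way, provided it returns the tuple described above.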
langid supports a wide range of languages. Here are a few examples:
- English (en)
- French (fr)
- Spanish (es)
- German (de)
- Chinese (zh)
- And many more...
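When showing results to users, it can help to turn the detected code into a readable name. The sketch below is illustrative only: `LANGUAGE_NAMES` and `language_name` are hypothetical helpers, not part of langid, and the table covers just the five codes listed above.

```python
# Display names for the language codes listed above.
# Illustrative only: this is not the full set of languages langid supports.
LANGUAGE_NAMES = {
    "en": "English",
    "fr": "French",
    "es": "Spanish",
    "de": "German",
    "zh": "Chinese",
}


def language_name(code: str) -> str:
    """Return a display name for a code, or the code itself if it is not in the table."""
    return LANGUAGE_NAMES.get(code, code)


if __name__ == "__main__":
    print(language_name("fr"))  # known code
    print(language_name("xx"))  # unknown code falls back to the code itself
```

Falling back to the raw code keeps the helper usable for languages outside this short table.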
You can find more examples in the examples directory.
To run tests for the langid library, use the following command:
pytest
Make sure you have pytest installed. You can install it using pip:
pip install pytest
Contributions are welcome! If you want to contribute to langid, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them.
- Push your branch to your forked repository.
- Create a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
You can find the latest releases of langid in the Releases section of the repository. Download the appropriate version and follow its installation instructions.
For any questions or suggestions, feel free to open an issue in the repository or contact the maintainer.
Thanks to the contributors and the community for their support in making langid a better tool.
If you find this library useful, please consider giving it a star on GitHub!
For more detailed information on language identification and natural language processing, consider the following resources:
- Natural Language Processing with Python
- Deep Learning for Natural Language Processing
- The Stanford NLP Group
To get started, simply clone the repository and run the example scripts:
git clone https://github.com/eliangonde/langid.git
cd langid
python example.py
Join our community on GitHub and contribute to the project. We welcome discussions, suggestions, and feedback.
For the latest updates, visit the Releases section.