YAKE! - Yet Another Keyword Extractor

YAKE! is a lightweight, unsupervised, automatic keyword extraction method that uses text statistical features to select the most important keywords from a document.
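To illustrate how per-term statistics become a single keyword score, the sketch below combines precomputed term scores using the candidate-scoring formula from the Information Sciences paper listed under References: S(kw) = prod(S(t)) / (TF(kw) * (1 + sum(S(t)))). It assumes the term scores and the candidate's frequency are already known. The helper name keyword_score is hypothetical, the term scores are made up, and the library's special handling of stopwords inside multi-word candidates is deliberately left out.

```python
import math


def keyword_score(term_scores: list[float], candidate_tf: int) -> float:
    """Combine per-term scores into one candidate score (lower = more relevant).

    Simplified from the YAKE! paper:
        S(kw) = prod(S(t)) / (TF(kw) * (1 + sum(S(t))))
    Hypothetical helper; stopword-aware weighting is omitted.
    """
    if candidate_tf <= 0:
        raise ValueError("candidate_tf must be positive")
    product = math.prod(term_scores)  # product of the individual term scores
    total = sum(term_scores)          # sum of the individual term scores
    return product / (candidate_tf * (1 + total))


# Illustrative term scores (invented for the example, not produced by YAKE!)
print(keyword_score([0.2, 0.5], candidate_tf=2))
```

Note that a higher candidate frequency enlarges the denominator, so a phrase that recurs in the document receives a lower (better) score.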

Key Features

Unsupervised: No training data required, making it easy to use out-of-the-box
Language Independent: Works across different languages with built-in support for multiple languages
Domain Independent: Effective for various types of content including news articles, scientific papers, and web content
Single-Document: Designed to extract keywords from individual documents without needing a corpus
Customizable: Offers multiple parameters to fine-tune extraction for specific needs
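Because YAKE! works on a single document, every statistic it relies on is computed from that document alone. One such statistic is term co-occurrence within a small window, which the window_size parameter controls. The sketch below counts, for each word, the words appearing up to window_size positions before it. The function name, the whitespace tokenization, and the exact counting scheme are illustrative assumptions, not the library's own code.

```python
from collections import Counter, defaultdict


def cooccurrence(text: str, window_size: int = 1) -> dict[str, Counter]:
    """Map each word to counts of words seen up to window_size positions to its left.

    Hypothetical helper: tokenizes on whitespace only and ignores sentence boundaries.
    """
    words = text.lower().split()
    left_context: dict[str, Counter] = defaultdict(Counter)
    for i, word in enumerate(words):
        # Look back at most window_size words (fewer at the start of the text).
        for j in range(max(0, i - window_size), i):
            left_context[word][words[j]] += 1
    return dict(left_context)


counts = cooccurrence("google is acquiring kaggle and google hosts kaggle", window_size=1)
print(counts["kaggle"])
```

Larger windows count more distant neighbours as context, so the same text yields denser co-occurrence counts.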

Installation

pip install git+https://github.com/LIAAD/yake

This project uses uv for dependency management.

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Install the package

uv sync

Install in development mode

uv pip install -e ".[dev]"

Basic Usage

Python API

import yake

text = "Sources tell us that Google is acquiring Kaggle, a platform that hosts data science and machine learning competitions. Details about the transaction remain somewhat vague, but given that Google is hosting its Cloud Next conference in San Francisco this week, the official announcement could come as early as tomorrow."

# Simple extraction with default parameters
kw_extractor = yake.KeywordExtractor()
keywords = kw_extractor.extract_keywords(text)

for kw, score in keywords:
    print(f"{kw} ({score})")

Custom Parameters

# Configure the extractor with custom parameters
custom_kw_extractor = yake.KeywordExtractor(
    lan="en",              # Language code
    n=3,                   # Maximum n-gram size (words per keyword)
    dedup_lim=0.9,         # Similarity above which a candidate is dropped as a near-duplicate
    dedup_func="seqm",     # Deduplication similarity function: "seqm", "leve", or "jaro"
    window_size=1,         # Co-occurrence window size
    top=20,                # Number of keywords to return
)

keywords = custom_kw_extractor.extract_keywords(text)
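The dedup_lim and dedup_func parameters control how near-duplicate keywords are pruned. The sketch below shows the general idea: candidates are visited from best (lowest) to worst score, and each is kept only if its similarity to every keyword already kept does not exceed dedup_lim. It uses Python's difflib.SequenceMatcher ratio as a stand-in similarity measure, which is an assumption and not necessarily how the library computes "seqm", "leve", or "jaro". The names similarity and deduplicate are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not YAKE!'s own seqm."""
    return SequenceMatcher(None, a, b).ratio()


def deduplicate(candidates, dedup_lim=0.9, top=20):
    """Keep the best-scoring candidates, dropping near-duplicates.

    candidates: iterable of (keyword, score) pairs; lower score = more relevant.
    """
    kept = []
    for kw, score in sorted(candidates, key=lambda pair: pair[1]):
        # Reject kw if it is too similar to any keyword already kept.
        if all(similarity(kw.lower(), k.lower()) <= dedup_lim for k, _ in kept):
            kept.append((kw, score))
        if len(kept) >= top:
            break
    return kept


candidates = [("machine learning", 0.09), ("machine-learning", 0.12), ("kaggle", 0.03)]
print(deduplicate(candidates, dedup_lim=0.9))
```

Here "machine-learning" is dropped because its similarity to the better-scoring "machine learning" exceeds 0.9; raising dedup_lim to 1.0 keeps all three candidates.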

Command Line

yake -ti "Your text goes here" -l en -n 3 -v

Options:

    -ti, --text_input TEXT              Input text
    -i, --input_file TEXT               Path to an input text file
    -l, --language TEXT                 Language code (e.g. en, pt)
    -n, --ngram-size INTEGER            Maximum n-gram size
    -df, --dedup-func [leve|jaro|seqm]  Deduplication function
    -dl, --dedup-lim FLOAT              Deduplication threshold
    -ws, --window-size INTEGER          Window size
    -t, --top INTEGER                   Number of keyphrases to extract
    -v, --verbose                       Show scores in output
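When calling the command-line tool from a script, it can help to build its arguments from the same settings used with the Python API. The sketch below maps the keyword arguments from the custom-parameters example onto the flags listed above. The helper build_yake_argv is hypothetical, and the one-to-one correspondence between keyword arguments and flags is an assumption based on their matching names. The code only assembles the argument list; actually running it requires the yake command to be installed.

```python
import shlex

# Assumed correspondence between KeywordExtractor keyword arguments and CLI flags.
FLAG_FOR_PARAM = {
    "lan": "-l",
    "n": "-n",
    "dedup_func": "-df",
    "dedup_lim": "-dl",
    "window_size": "-ws",
    "top": "-t",
}


def build_yake_argv(text: str, verbose: bool = True, **params) -> list[str]:
    """Build an argv list for the yake CLI (hypothetical helper)."""
    argv = ["yake", "-ti", text]
    for name, value in params.items():
        if name not in FLAG_FOR_PARAM:
            raise ValueError(f"unsupported parameter: {name}")
        argv += [FLAG_FOR_PARAM[name], str(value)]
    if verbose:
        argv.append("-v")  # show scores in output
    return argv


argv = build_yake_argv("Your text goes here", lan="en", n=3)
print(shlex.join(argv))
# To run it (requires the yake CLI on PATH):
# import subprocess; subprocess.run(argv, check=True)
```

Keeping the argument list as a Python list, rather than a single shell string, avoids quoting problems when the input text contains spaces or quotes.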

Example Output

The lower the score, the more relevant the keyword is:

google (0.026580863364597897)
kaggle (0.0289005976239829)
san francisco (0.048810837074825336)
machine learning (0.09147989238151344)
data science (0.097574333771058)
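Because lower scores mean more relevant keywords, ranking comes down to sorting the (keyword, score) pairs by ascending score, which is the order of the list above. The minimal sketch below uses the scores from that example output as plain data, with no call to YAKE!. The helper rank_keywords is hypothetical.

```python
def rank_keywords(pairs, top=None):
    """Sort (keyword, score) pairs so the most relevant (lowest score) come first."""
    ranked = sorted(pairs, key=lambda pair: pair[1])
    return ranked if top is None else ranked[:top]


# Scores copied from the example output above, deliberately shuffled.
results = [
    ("data science", 0.097574333771058),
    ("google", 0.026580863364597897),
    ("machine learning", 0.09147989238151344),
    ("kaggle", 0.0289005976239829),
    ("san francisco", 0.048810837074825336),
]

for kw, score in rank_keywords(results, top=3):
    print(f"{kw} ({score:.4f})")
```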

Multilingual Support

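YAKE! supports multiple languages. Set the lan parameter to the language code of the text, for example "pt" for Portuguese:

# Portuguese example
portuguese_text = "A ciência de dados combina estatística, programação e conhecimento do domínio."
custom_kw_extractor = yake.KeywordExtractor(lan="pt")
keywords = custom_kw_extractor.extract_keywords(portuguese_text)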

References

Please cite the following works when using YAKE!:

Published at the Information Sciences Journal

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., and Jatowt, A. (2020). YAKE! Keyword Extraction from Single Documents using Multiple Local Features. In Information Sciences Journal. Elsevier, Vol. 509, pp. 257-289.

Conference papers at ECIR

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A. M., Nunes, C., and Jatowt, A. (2018). A Text Feature Based Automatic Keyword Extraction Method for Single Documents. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds). Advances in Information Retrieval. ECIR 2018 (Grenoble, France, March 26-29). Lecture Notes in Computer Science, Vol. 10772, pp. 684-691.

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A. M., Nunes, C., and Jatowt, A. (2018). YAKE! Collection-independent Automatic Keyword Extractor. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds). Advances in Information Retrieval. ECIR 2018 (Grenoble, France, March 26-29). Lecture Notes in Computer Science, Vol. 10772, pp. 806-810. (Best Short Paper Award)

Contributing

Please refer to the CONTRIBUTING.rst file for details.
