pagerank

A lightweight, well-tested Python implementation of Google's PageRank algorithm and TextRank for keyword extraction.

Features

Lightweight PageRank: A clean implementation of the PageRank algorithm using power iteration.
TextRank for SEO: Extract meaningful keywords from text documents to understand topics and improve SEO.
Graph Flexibility: Works with graphs represented as dictionaries or lists of lists.
Customizable: Tweak parameters like damping factor, convergence tolerance, and max iterations.
Well-Tested: High test coverage to ensure reliability.
Typed: Fully type-hinted for better code quality and editor support.

Installation

This project uses uv for dependency management.

# Install only the required dependencies for production use
uv sync

# Install all development dependencies for contributing
uv sync --all-extras

You will also need to download NLTK data for TextRank. A helper script is provided:

uv run python download_nltk_data.py

Quick Start

PageRank Example

Calculate PageRank scores for a small weighted graph. Each score measures the relative "importance" of a node; in the output shown here the scores sum to 1.

from pagerank import power_iteration

# Define a graph: each key is a source node, and its value maps every
# node it links to onto the weight of that link
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"C": 1},
    "C": {"A": 1},
}

# Calculate PageRank scores
scores = power_iteration(graph)
print(scores)
# A    0.443029
# C    0.354423
# B    0.202548
# dtype: float64
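
The Features list says graphs can also be given as lists of lists. The exact layout the library expects is not documented here, so the following is only a sketch under the assumption that a list of lists is a square weight matrix whose row `i` holds the outgoing link weights of node `i`. The helper `matrix_to_dict` and the `labels` list are hypothetical names introduced for illustration; they are not part of the package.

```python
def matrix_to_dict(matrix, labels):
    """Convert a square weight matrix into a dict-of-dicts graph.

    Assumption: matrix[i][j] is the weight of the link from node i to node j,
    and a weight of 0 means "no link".
    """
    return {
        labels[i]: {labels[j]: w for j, w in enumerate(row) if w != 0}
        for i, row in enumerate(matrix)
    }


labels = ["A", "B", "C"]
matrix = [
    [0, 1, 1],  # A links to B and C
    [0, 0, 1],  # B links to C
    [1, 0, 0],  # C links to A
]

# Same structure as the dictionary used in the PageRank example.
graph = matrix_to_dict(matrix, labels)
print(graph)
```

Converting to the dictionary form first keeps node names explicit, which may be easier to read than positional indices.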

TextRank Example

Extract the most relevant keywords from a piece of text.

from textrank import textrank

document = """
Natural language processing is a subfield of linguistics, computer science,
and artificial intelligence concerned with the interactions between computers
and human language.
"""

# Score candidate keywords; the result is sorted descending,
# so head() shows the five highest-scoring words
keyword_scores = textrank(document)
print(keyword_scores.head())
# language       0.239396
# computer       0.177059
# intelligence   0.155705
# subfield       0.134703
# linguistics    0.134703
# dtype: float64

Interactive Demo

For a more detailed walkthrough with visualizations, check out the Jupyter Notebook demo.

uv run jupyter notebook demo.ipynb

This demo covers:

Basic and advanced PageRank examples.
Keyword extraction with TextRank.
Visualizing results with matplotlib and seaborn.

API Reference

`pagerank.power_iteration`

power_iteration(transition_weights, rsp=0.15, epsilon=0.00001, max_iterations=1000)

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `transition_weights` | `dict` or `list` | Graph representation. | Required |
| `rsp` | `float` | Random surfer probability (1 - damping). | `0.15` |
| `epsilon` | `float` | Convergence threshold. | `0.00001` |
| `max_iterations` | `int` | Max iterations to run. | `1000` |

Returns: pandas.Series with nodes as keys and PageRank scores as values.
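
To make the roles of `rsp`, `epsilon`, and `max_iterations` concrete, here is a minimal pure-Python sketch of textbook power iteration. It is not the library's implementation: the function name `pagerank_sketch`, the handling of dangling nodes, and the use of an L1 distance for the convergence check are all assumptions, and it returns a plain `dict` rather than a `pandas.Series`. Because such details can differ, its scores need not match the library's output exactly.

```python
def pagerank_sketch(graph, rsp=0.15, epsilon=1e-5, max_iterations=1000):
    """Illustrative power iteration (hypothetical helper, not the package's code)."""
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(max_iterations):
        # Every node receives an equal share of the random-surfer probability.
        new_scores = {node: rsp / n for node in nodes}
        for source in nodes:
            targets = graph.get(source, {})
            total = sum(targets.values())
            if total == 0:
                # Assumption: a node without outgoing links spreads its score evenly.
                for node in nodes:
                    new_scores[node] += (1 - rsp) * scores[source] / n
            else:
                # Pass the source's score along its links, proportional to weight.
                for target, weight in targets.items():
                    new_scores[target] += (1 - rsp) * scores[source] * weight / total
        # Stop once the total change between iterations falls below epsilon.
        delta = sum(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if delta < epsilon:
            break
    return scores


graph = {"A": {"B": 1, "C": 1}, "B": {"C": 1}, "C": {"A": 1}}
scores = pagerank_sketch(graph)
print(scores)
```

A smaller `epsilon` trades more iterations for tighter convergence, while `max_iterations` caps the work if the threshold is never reached.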

`textrank.textrank`

textrank(document, window_size=2, rsp=0.15, relevant_pos_tags=["NN", "NNP", "ADJ"])

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `document` | `str` | Text to analyze. | Required |
| `window_size` | `int` | Co-occurrence window size. | `2` |
| `rsp` | `float` | Random surfer probability. | `0.15` |
| `relevant_pos_tags` | `list[str]` | POS tags to consider as keywords. | `["NN", "NNP", "ADJ"]` |

Returns: pandas.Series with words as keys and TextRank scores as values, sorted descending.
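
The `window_size` parameter controls which words count as co-occurring. The sketch below shows one common way to turn a word sequence into a weighted co-occurrence graph that could then be ranked. It is an illustration only: `cooccurrence_graph` is a hypothetical helper, the window definition (each word is linked to the next `window_size - 1` words) is an assumption about how the library behaves, and the word list is hand-written instead of coming from a tokenizer and POS tagger.

```python
from collections import defaultdict


def cooccurrence_graph(words, window_size=2):
    """Link each word to the words that follow it within the window."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        # Look at the next window_size - 1 words after position i.
        for other in words[i + 1 : i + window_size]:
            if other != word:
                # Undirected co-occurrence: count the edge in both directions.
                graph[word][other] += 1
                graph[other][word] += 1
    # Convert nested defaultdicts to plain dicts for readability.
    return {word: dict(neighbors) for word, neighbors in graph.items()}


# Hypothetical, already-filtered candidate words (no real tagging happens here).
words = ["language", "processing", "subfield", "linguistics", "language"]
graph = cooccurrence_graph(words, window_size=2)
print(graph["language"])
```

The result has the same dictionary-of-dictionaries shape as the graph in the PageRank example, so the two steps compose naturally.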

Development

To set up a development environment, clone the repo and install the dependencies.

git clone https://github.com/ashkonf/PageRank.git
cd PageRank
uv sync --all-extras

This project uses pre-commit hooks for quality checks. Install them with:

uv run pre-commit install

Key Development Commands

Run tests: `uv run pytest`
Check formatting: `uv run ruff format --check .`
Check for linting errors: `uv run ruff check .`
Run type checks: `uv run pyright .`

Contributing

Contributions are welcome! Please feel free to open an issue or submit a pull request.

1. Fork the repository.
2. Create your feature branch (`git checkout -b feature/my-new-feature`).
3. Commit your changes (`git commit -am 'Add some feature'`).
4. Push to the branch (`git push origin feature/my-new-feature`).
5. Open a new Pull Request.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

For more on the TextRank algorithm, see the original paper by Mihalcea and Tarau.
