Parsett

Parsett is a flexible and powerful toolkit for parsing and transforming torrent titles. It provides a robust mechanism to define custom parsing handlers and transformers, making it ideal for extracting meaningful information from torrent file names.

Important

This library is a Python port of the parse-torrent-title library from TheBeastLT and heavily modified to fit the needs of RTN.

Features

User-Friendly Interface: Effortlessly parse torrent titles with an intuitive interface.
Custom Handlers & Transformers: Easily define and integrate your own handlers and transformers.
Comprehensive Default Handlers: Leverage built-in handlers for common torrent title patterns.
Highly Extensible: Customize and extend the toolkit to fit your specific needs.

Installation

To install parsett, you can use pip:

pip install parsett

Quick Start

Basic Usage

To parse a torrent title using the default handlers, simply call parse_title():

from PTT import parse_title

result = parse_title("The Simpsons S01E01 1080p BluRay x265 HEVC 10bit AAC 5.1 Tigole")
print(result)

By default, languages are 2-char ISO 639-1 Standardized codes, not full names. To get the full names, you can add the argument translate_languages to parse_title():

result = parse_title("The.Walking.Dead.S06E07.SUBFRENCH.HDTV.x264-AMB3R.mkv", translate_languages=True)
print(result)

Would result in a languages field with the value ["French"] instead of ["fr"].

Examples

Here are some examples of parsed torrent titles:

Example 1

Title: The Simpsons S01E01 1080p BluRay x265 HEVC 10bit AAC 5.1 Tigole

Parsed Result:

{
    "title": "The Simpsons",
    "seasons": [1],
    "episodes": [1],
    "languages": [],
    "resolution": "1080p",
    "quality": "BluRay",
    "codec": "hevc",
    "bit_depth": "10bit",
    "audio": ["AAC"],
    "channels": ["5.1"]
}

Example 2

Title: www.Tamilblasters.party - The Wheel of Time (2021) Season 01 EP(01-08) [720p HQ HDRip - [Tam + Tel + Hin] - DDP5.1 - x264 - 2.7GB - ESubs]

Parsed Result:

{
    "title": "The Wheel of Time",
    "year": 2021,
    "seasons": [1],
    "episodes": [1, 2, 3, 4, 5, 6, 7, 8],
    "languages": ["Hindi", "Telugu", "Tamil"],
    "quality": "HDRip",
    "resolution": "720p",
    "codec": "avc",
    "audio": ["Dolby Digital Plus"],
    "channels": ["5.1"],
    "site": "www.Tamilblasters.party",
    "size": "2.7GB",
    "trash": True
}

Example 3

Title: The.Walking.Dead.S06E07.SUBFRENCH.HDTV.x264-AMB3R.mkv

Parsed Result:

{
    "title": "The Walking Dead",
    "seasons": [6],
    "episodes": [7],
    "languages": ["French"],
    "quality": "HDTV",
    "codec": "avc",
    "group": "AMB3R",
    "extension": "mkv",
    "container": "mkv"
}

Supported Fields

Here are the fields that are currently supported by the default handlers, along with their types:

title: str
resolution: str
date: str
year: int
ppv: bool
trash: bool
edition: str
extended: bool
convert: bool
hardcoded: bool
proper: bool
repack: bool
retail: bool
remastered: bool
unrated: bool
region: str
quality: str
bit_depth: str
hdr: list[str]
codec: str
audio: list[str]
channels: list[str]
group: str
container: str
volumes: list[int]
seasons: list[int]
episodes: list[int]
episode_code: str
complete: bool
languages: list[str]
dubbed: bool
site: str
extension: str
subbed: bool
documentary: bool
upscaled: bool

Advanced Usage

You can create and customize your own parser instance if needed:

from PTT import Parser, add_defaults

# Create a new parser instance
parser = Parser()

# Add default handlers
add_defaults(parser)

# Parse a torrent title
result = parser.parse("The Simpsons S01E01 1080p BluRay x265 HEVC 10bit AAC 5.1 Tigole")
print(result)

Adding Custom Handlers

parsett allows you to add custom handlers to extend the parsing capabilities. Here’s how you can do it:

Define a Custom Handler

A handler is a function that processes a specific pattern in the input string. Here’s an example of a custom handler that extracts hashtags from a string:

import regex
from PTT.parse import Parser

def hashtag_handler(input_string):
    hashtags = regex.findall(r"#(\w+)", input_string)
    return {"hashtags": hashtags}

# Create a new parser instance
parser = Parser()

# Add the custom handler
parser.add_handler("hashtags", regex.compile(r"#(\w+)"), hashtag_handler)

# Parse a string
result = parser.parse("This is a test string with #hashtags and #morehashtags.")
print(result)

Built-in Transformers

The parsett library offers a variety of built-in transformers to help you manipulate and standardize the extracted data. Here’s a rundown of the available transformers:

none: Leaves the input value unchanged.
value: Substitutes the input value with a predefined value.
integer: Converts the input value into an integer.
boolean: Returns True if the input value is truthy, otherwise False.
lowercase: Transforms the input value to lowercase.
uppercase: Transforms the input value to uppercase.
date: Parses and formats dates according to specified format(s).
range_func: Extracts and parses a range of numbers from the input string.
year_range: Extracts and parses a range of years from the input string.
array: Encapsulates the input value within a list.
uniq_concat: Appends unique values to an existing list.
transform_resolution: Standardizes resolution values to a consistent format.

Example Usage of Transformers

from parsett.transformers import lowercase, uppercase

# Add a handler with a transformer
parser.add_handler("lowercase_example", regex.compile(r"[A-Z]+"), lowercase)
parser.add_handler("uppercase_example", regex.compile(r"[a-z]+"), uppercase)

result = parser.parse("This is a MIXED case STRING.")
print(result)

Options in add_handler

The add_handler function allows you to specify options to control the behavior of the handler. The available options are:

default_options = {
    "skipIfAlreadyFound": True,
    "skipFromTitle": False,
    "skipIfFirst": False,
    "remove": False,
}

Option Details

skipIfAlreadyFound: If True, the handler will not process the input if the field has already been found.
skipFromTitle: If True, the matched pattern will be excluded from the title.
skipIfFirst: If True, the handler will not process the input if it is the first handler.
remove: If True, the matched pattern will be removed from the input string.

Example Usage of Options

parser.add_handler("custom_handler", regex.compile(r"\bexample\b", regex.IGNORECASE), lambda x: "example_value", {
    "skipIfAlreadyFound": False,
    "skipFromTitle": True,
    "skipIfFirst": True,
    "remove": True,
})

Extending the Parser

To extend the parser with additional functionality, you can create new transformers and handlers.

Creating a New Transformer

A transformer is a function that processes the extracted value. Here’s an example of a custom transformer that reverses a string:

def reverse(input_value):
    return input_value[::-1]

# Add a handler with the custom transformer
parser.add_handler("reverse_example", regex.compile(r"\w+"), reverse)

result = parser.parse("Reverse this string.")
print(result)

Creating a Custom Handler for Torrent Titles

Let's create a custom handler to extract the uploader name from a torrent title:

def uploader_handler(input_string):
    match = regex.search(r"Uploader: ([\w\s]+)", input_string)
    if match:
        return {"uploader": match.group(1)}
    return {}

# Add the custom handler
parser.add_handler("uploader", regex.compile(r"Uploader: ([\w\s]+)"), uploader_handler)

# Parse a string
result = parser.parse("Anatomia De Grey - Temporada 19 [HDTV][Cap.1905][Castellano][www.AtomoHD.nu].avi Uploader: JohnDoe")
print(result)

Development

To get started with development, clone the repository and install the dependencies with poetry:

poetry install

Contributing

Contributions are welcome! If you have ideas for new features or improvements, feel free to open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 175 Commits
.github/workflows		.github/workflows
PTT		PTT
tests		tests
.coveragerc		.coveragerc
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
pyrightconfig.json		pyrightconfig.json
pytest.ini		pytest.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Parsett

Features

Installation

Quick Start

Basic Usage

Examples

Example 1

Example 2

Example 3

Supported Fields

Advanced Usage

Adding Custom Handlers

Define a Custom Handler

Built-in Transformers

Example Usage of Transformers

Options in add_handler

Option Details

Example Usage of Options

Extending the Parser

Creating a New Transformer

Creating a Custom Handler for Torrent Titles

Development

Contributing

License

About

Uh oh!

Releases 42

Uh oh!

Contributors 8

Languages

License

dreulavelle/PTT

Folders and files

Latest commit

History

Repository files navigation

Parsett

Features

Installation

Quick Start

Basic Usage

Examples

Example 1

Example 2

Example 3

Supported Fields

Advanced Usage

Adding Custom Handlers

Define a Custom Handler

Built-in Transformers

Example Usage of Transformers

Options in add_handler

Option Details

Example Usage of Options

Extending the Parser

Creating a New Transformer

Creating a Custom Handler for Torrent Titles

Development

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 42

Uh oh!

Contributors 8

Languages