Parsett is a flexible and powerful toolkit for parsing and transforming torrent titles. It provides a robust mechanism to define custom parsing handlers and transformers, making it ideal for extracting meaningful information from torrent file names.
Important: This library is a Python port of the parse-torrent-title library by TheBeastLT, heavily modified to fit the needs of RTN.
- User-Friendly Interface: Effortlessly parse torrent titles with an intuitive interface.
- Custom Handlers & Transformers: Easily define and integrate your own handlers and transformers.
- Comprehensive Default Handlers: Leverage built-in handlers for common torrent title patterns.
- Highly Extensible: Customize and extend the toolkit to fit your specific needs.
To install parsett, you can use pip:
pip install parsett
To parse a torrent title using the default handlers, call parse_title():
from PTT import parse_title
result = parse_title("The Simpsons S01E01 1080p BluRay x265 HEVC 10bit AAC 5.1 Tigole")
print(result)
By default, languages are returned as two-letter ISO 639-1 codes, not full names.
To get the full names, pass translate_languages=True to parse_title():
result = parse_title("The.Walking.Dead.S06E07.SUBFRENCH.HDTV.x264-AMB3R.mkv", translate_languages=True)
print(result)
This yields a languages field with the value ["French"] instead of ["fr"].
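Conceptually, the translation step is a lookup from ISO 639-1 codes to English names. The following is a minimal sketch of that idea, not the library's implementation: the `LANGUAGE_NAMES` table and the `translate` helper are hypothetical names, and the table is a small hand-written sample.

```python
# Hypothetical lookup table for illustration; parsett uses its own mapping.
LANGUAGE_NAMES = {
    "en": "English",
    "fr": "French",
    "hi": "Hindi",
    "ta": "Tamil",
    "te": "Telugu",
}


def translate(codes):
    """Map ISO 639-1 codes to names, leaving unknown codes unchanged."""
    return [LANGUAGE_NAMES.get(code, code) for code in codes]


print(translate(["fr"]))  # ['French']
```

Keeping the codes as the default and translating only on request makes the parsed output easier to compare and store, since codes are stable while display names are not.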
Here are some examples of parsed torrent titles (language names appear as they would with translate_languages=True):
Title: The Simpsons S01E01 1080p BluRay x265 HEVC 10bit AAC 5.1 Tigole
Parsed Result:
{
    "title": "The Simpsons",
    "seasons": [1],
    "episodes": [1],
    "languages": [],
    "resolution": "1080p",
    "quality": "BluRay",
    "codec": "hevc",
    "bit_depth": "10bit",
    "audio": ["AAC"],
    "channels": ["5.1"]
}
Title: www.Tamilblasters.party - The Wheel of Time (2021) Season 01 EP(01-08) [720p HQ HDRip - [Tam + Tel + Hin] - DDP5.1 - x264 - 2.7GB - ESubs]
Parsed Result:
{
    "title": "The Wheel of Time",
    "year": 2021,
    "seasons": [1],
    "episodes": [1, 2, 3, 4, 5, 6, 7, 8],
    "languages": ["Hindi", "Telugu", "Tamil"],
    "quality": "HDRip",
    "resolution": "720p",
    "codec": "avc",
    "audio": ["Dolby Digital Plus"],
    "channels": ["5.1"],
    "site": "www.Tamilblasters.party",
    "size": "2.7GB",
    "trash": True
}
Title: The.Walking.Dead.S06E07.SUBFRENCH.HDTV.x264-AMB3R.mkv
Parsed Result:
{
    "title": "The Walking Dead",
    "seasons": [6],
    "episodes": [7],
    "languages": ["French"],
    "quality": "HDTV",
    "codec": "avc",
    "group": "AMB3R",
    "extension": "mkv",
    "container": "mkv"
}
Here are the fields currently supported by the default handlers, along with their types:
- title: str
- resolution: str
- date: str
- year: int
- ppv: bool
- trash: bool
- edition: str
- extended: bool
- convert: bool
- hardcoded: bool
- proper: bool
- repack: bool
- retail: bool
- remastered: bool
- unrated: bool
- region: str
- quality: str
- bit_depth: str
- hdr: list[str]
- codec: str
- audio: list[str]
- channels: list[str]
- group: str
- container: str
- volumes: list[int]
- seasons: list[int]
- episodes: list[int]
- episode_code: str
- complete: bool
- languages: list[str]
- dubbed: bool
- site: str
- extension: str
- subbed: bool
- documentary: bool
- upscaled: bool
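As the examples earlier show, a field only appears in the result when something matched (the first example has no year, for instance), so code that consumes the result should not assume every key is present. A small sketch of reading the list-valued fields defensively follows; the `episode_labels` helper is hypothetical, and the sample dictionary is copied from the first example rather than produced by calling the library.

```python
# Sample result copied from the "The Simpsons" example; in practice this
# dictionary would come from parse_title().
result = {
    "title": "The Simpsons",
    "seasons": [1],
    "episodes": [1],
    "languages": [],
    "resolution": "1080p",
}


def episode_labels(parsed):
    """Build 'SxxEyy' labels from the list-valued seasons and episodes fields."""
    seasons = parsed.get("seasons", [])
    episodes = parsed.get("episodes", [])
    return [f"S{s:02d}E{e:02d}" for s in seasons for e in episodes]


# Fields that were not matched may be absent, so read them with .get().
print(result["title"], episode_labels(result), result.get("year"))
```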
You can create and customize your own parser instance if needed:
from PTT import Parser, add_defaults
# Create a new parser instance
parser = Parser()
# Add default handlers
add_defaults(parser)
# Parse a torrent title
result = parser.parse("The Simpsons S01E01 1080p BluRay x265 HEVC 10bit AAC 5.1 Tigole")
print(result)
parsett allows you to add custom handlers to extend the parsing capabilities. Here’s how you can do it:
A handler is a function that processes a specific pattern in the input string. Here’s an example of a custom handler that extracts hashtags from a string:
import regex
from PTT.parse import Parser
def hashtag_handler(input_string):
    hashtags = regex.findall(r"#(\w+)", input_string)
    return {"hashtags": hashtags}
# Create a new parser instance
parser = Parser()
# Add the custom handler
parser.add_handler("hashtags", regex.compile(r"#(\w+)"), hashtag_handler)
# Parse a string
result = parser.parse("This is a test string with #hashtags and #morehashtags.")
print(result)
The parsett library offers a variety of built-in transformers to help you manipulate and standardize the extracted data. Here’s a rundown of the available transformers:
- none: Leaves the input value unchanged.
- value: Substitutes the input value with a predefined value.
- integer: Converts the input value into an integer.
- boolean: Returns True if the input value is truthy, otherwise False.
- lowercase: Transforms the input value to lowercase.
- uppercase: Transforms the input value to uppercase.
- date: Parses and formats dates according to specified format(s).
- range_func: Extracts and parses a range of numbers from the input string.
- year_range: Extracts and parses a range of years from the input string.
- array: Encapsulates the input value within a list.
- uniq_concat: Appends unique values to an existing list.
- transform_resolution: Standardizes resolution values to a consistent format.
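To make the descriptions above concrete, here is a rough standard-library sketch of what a few of these transformations do to a matched value. These are simplified stand-ins written for illustration: the function names and signatures (`to_integer`, `to_array`, `append_unique`, `expand_range`) are hypothetical, and the real transformers in the library may be structured differently.

```python
import re


def to_integer(value):
    """Like 'integer': convert the matched text to an int."""
    return int(value)


def to_array(value):
    """Like 'array': wrap a single value in a list."""
    return [value]


def append_unique(existing, value):
    """Like 'uniq_concat': add value only if it is not already present."""
    return existing if value in existing else existing + [value]


def expand_range(text):
    """Like 'range_func': expand the first 'start-end' pair into a list of ints."""
    match = re.search(r"(\d+)\s*-\s*(\d+)", text)
    if not match:
        return []
    start, end = int(match.group(1)), int(match.group(2))
    return list(range(start, end + 1))


print(to_integer("2021"))              # 2021
print(to_array("AAC"))                 # ['AAC']
print(append_unique(["5.1"], "5.1"))   # ['5.1']
print(expand_range("EP(01-08)"))       # [1, 2, 3, 4, 5, 6, 7, 8]
```

The built-in transformers themselves are attached to a handler like this: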
from PTT.transformers import lowercase, uppercase
# Add a handler with a transformer
parser.add_handler("lowercase_example", regex.compile(r"[A-Z]+"), lowercase)
parser.add_handler("uppercase_example", regex.compile(r"[a-z]+"), uppercase)
result = parser.parse("This is a MIXED case STRING.")
print(result)
The add_handler function accepts options that control the behavior of the handler. The available options and their defaults are:
default_options = {
    "skipIfAlreadyFound": True,
    "skipFromTitle": False,
    "skipIfFirst": False,
    "remove": False,
}
- skipIfAlreadyFound: If True, the handler will not process the input if the field has already been found.
- skipFromTitle: If True, the matched pattern will be excluded from the title.
- skipIfFirst: If True, the handler will not process the input if it is the first handler.
- remove: If True, the matched pattern will be removed from the input string.
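A common way to handle such options is to overlay the caller's values on the defaults, so that only the keys being changed need to be passed. The sketch below illustrates that pattern under the assumption that add_handler behaves this way, which this document does not state explicitly; `resolve_options` is a hypothetical helper, not a parsett function.

```python
DEFAULT_OPTIONS = {
    "skipIfAlreadyFound": True,
    "skipFromTitle": False,
    "skipIfFirst": False,
    "remove": False,
}


def resolve_options(overrides=None):
    """Overlay caller-supplied options on the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    # Later keys win, so overrides replace the matching defaults.
    return {**DEFAULT_OPTIONS, **overrides}


print(resolve_options({"remove": True}))
```

Rejecting unknown keys catches misspelled option names such as "skipIfFound" early instead of silently ignoring them. A handler that overrides every option looks like this: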
parser.add_handler("custom_handler", regex.compile(r"\bexample\b", regex.IGNORECASE), lambda x: "example_value", {
    "skipIfAlreadyFound": False,
    "skipFromTitle": True,
    "skipIfFirst": True,
    "remove": True,
})
To extend the parser with additional functionality, you can create new transformers and handlers.
A transformer is a function that processes the extracted value. Here’s an example of a custom transformer that reverses a string:
def reverse(input_value):
    return input_value[::-1]
# Add a handler with the custom transformer
parser.add_handler("reverse_example", regex.compile(r"\w+"), reverse)
result = parser.parse("Reverse this string.")
print(result)
Let's create a custom handler to extract the uploader name from a torrent title:
def uploader_handler(input_string):
    match = regex.search(r"Uploader: ([\w\s]+)", input_string)
    if match:
        return {"uploader": match.group(1)}
    return {}
# Add the custom handler
parser.add_handler("uploader", regex.compile(r"Uploader: ([\w\s]+)"), uploader_handler)
# Parse a string
result = parser.parse("Anatomia De Grey - Temporada 19 [HDTV][Cap.1905][Castellano][www.AtomoHD.nu].avi Uploader: JohnDoe")
print(result)
To get started with development, clone the repository and install the dependencies with poetry:
poetry install
Contributions are welcome! If you have ideas for new features or improvements, feel free to open an issue or submit a pull request on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.