A bot designed to detect and handle (dumb) spam on jschan imageboards.
Due to its nature, both false positives and false negatives are possible. If a user posts like a spammer (e.g., writes `l*i*k*e t*h*i*s` while sharing URLs), they will be considered one. Read the "How it works" section to understand these and other limitations, or use the provided script to test whether a string or a post is considered spam.
Note that the current implementation should not be considered "production ready". Although it works and has been running for a few months at the time of writing, it was initially implemented as a proof of concept.
In short, the bot listens for new posts or threads through the global management socket. Posts and threads are handled similarly, and the terms are used interchangeably.
New posts are first checked against a set of rules designed to reduce false positives. A post is ignored if it has no files, has no message, has a capcode, has a geo-flag from a whitelisted country, or contains no URLs.
If the post is not considered safe at this point, the message (excluding URLs) is evaluated. Currently, two modes of "spam detection" are supported: threshold and entries.
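The pre-filter and URL stripping described above might look roughly like the following minimal Python sketch. It is not the bot's actual code: the post field names (`files`, `nomarkup`, `capcode`, `country.code`), the URL regex, and the `WHITELISTED_COUNTRIES` set are assumptions made for illustration.

```python
import re

# Hypothetical URL pattern; the bot's real pattern may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

# Hypothetical whitelist of country codes.
WHITELISTED_COUNTRIES = {"US", "GB"}


def is_safe(post: dict) -> bool:
    """Return True if any pre-filter rule matches, so the post can be skipped."""
    message = post.get("nomarkup") or ""  # assumed field holding the raw message
    country = post.get("country") or {}   # assumed geo-flag object, may be None
    return (
        not post.get("files")                              # no files attached
        or not message.strip()                             # no message
        or bool(post.get("capcode"))                       # staff capcode present
        or country.get("code") in WHITELISTED_COUNTRIES    # whitelisted geo-flag
        or not URL_RE.search(message)                      # no URLs in the message
    )


def strip_urls(message: str) -> str:
    """Remove URLs so that only the surrounding text is evaluated."""
    return URL_RE.sub("", message)
```

A post is skipped as soon as any rule matches; only posts that survive every rule have their URL-free message passed to one of the detection modes.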
In threshold mode, the bot computes the ratio of tokens to message size and flags the post as spam if that ratio exceeds a threshold. A token is a character typically used by spammers to obfuscate a message (e.g., `*` or `#`).
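A minimal sketch of threshold mode, assuming "message size" means the number of characters in the URL-free message. The token set (`TOKENS`) and the default `threshold=0.2` are hypothetical; the bot's real values come from its own configuration.

```python
# Hypothetical token set; the real list is part of the bot's configuration.
TOKENS = frozenset("*#$")


def token_ratio(message: str, tokens: frozenset = TOKENS) -> float:
    """Ratio of token characters to the total number of characters."""
    if not message:
        return 0.0  # avoid division by zero on empty messages
    token_count = sum(1 for ch in message if ch in tokens)
    return token_count / len(message)


def is_spam_threshold(message: str, threshold: float = 0.2,
                      tokens: frozenset = TOKENS) -> bool:
    """Flag the message when its token ratio exceeds the threshold."""
    return token_ratio(message, tokens) > threshold
```

For example, `b*u*y n*o*w` has 4 tokens in 11 characters, a ratio of about 0.36, so it exceeds the hypothetical 0.2 threshold, while `buy now` has a ratio of 0.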
In entries mode, the bot counts consecutive entries. An entry is a word character (as defined by Python regular expressions) followed by a token (e.g., `a*`). For example, with the tokens `*`, `#`, and `$`, the message `a*a#b$` contains three consecutive entries.
If the count exceeds a configurable limit, the post is considered spam.
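Entries mode can be sketched with a single regular expression that matches runs of word-character/token pairs. This sketch assumes that "consecutive entries" means the longest uninterrupted run in the message and that no token is itself a word character; `TOKENS`, `longest_entry_run`, and the default `max_entries=5` are hypothetical.

```python
import re

# Hypothetical token set; assumed to contain no word characters.
TOKENS = "*#$"


def longest_entry_run(message: str, tokens: str = TOKENS) -> int:
    """Length of the longest run of entries (word character followed by a token)."""
    # re.escape keeps characters such as ']' or '-' literal inside the class.
    pattern = re.compile(r"(?:\w[" + re.escape(tokens) + r"])+")
    # Each entry is exactly two characters, so a run's entry count is len // 2.
    return max((len(m.group()) // 2 for m in pattern.finditer(message)), default=0)


def is_spam_entries(message: str, max_entries: int = 5, tokens: str = TOKENS) -> bool:
    """Flag the message when the longest run exceeds the configured limit."""
    return longest_entry_run(message, tokens) > max_entries
```

`longest_entry_run("a*a#b$")` returns 3, matching the example above, and `longest_entry_run("l*i*k*e t*h*i*s")` also returns 3, because the space after `e` breaks the run.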
Finally, if a post is considered spam, the configured moderation action is performed, and the post is saved as a JSON file (currently experimental and not thoroughly tested).
The moderation action is performed by the account configured in the .env file, and the username is hidden by default. The following moderation actions are supported: ban and delete. The ban action also deletes the post.
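The rule that a ban also deletes the post, together with saving flagged posts as JSON, can be illustrated with the hypothetical helpers below. They do not call the jschan API; `planned_operations`, `save_post`, the `board` and `postId` fields, and the file naming scheme are assumptions for illustration only.

```python
from __future__ import annotations

import json
from pathlib import Path


def planned_operations(action: str) -> list[str]:
    """Map the configured action to the operations it implies."""
    if action == "ban":
        return ["ban", "delete"]  # a ban also deletes the post
    if action == "delete":
        return ["delete"]
    raise ValueError(f"unsupported moderation action: {action!r}")


def save_post(post: dict, directory: Path) -> Path:
    """Write the flagged post to <directory>/<board>-<postId>.json (hypothetical scheme)."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{post['board']}-{post['postId']}.json"
    path.write_text(json.dumps(post, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

For example, `planned_operations("ban")` returns `["ban", "delete"]`, and any other action name raises `ValueError`.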
For reasons explained in the "How it works" section, you will need to provide the bot with the credentials of an account that has global staff permissions and does not have 2FA enabled (2FA is not supported). Ideally, the account should be a global moderator, since no other account type has been tested.
- (Optional) Create and activate a new virtual environment by running `python -m venv path/to/venv` and `source path/to/venv/bin/activate`, respectively.
- Use `pip install -r requirements.txt` to install the dependencies.
- Copy `.env.example` to `.env` and update it.
- Copy `example.ini` to `config.ini` and tweak it.
- (Optional) Activate the virtual environment with `source path/to/venv/bin/activate`.
- Run `python yacam/yacam.py` to start the bot.
You can test a string or a post against the rules by running `python yacam/is_spam.py <mode> <file_path>`.
Use `python yacam/is_spam.py --help` for more information.
The mode can be either `string` or `post`; the latter requires a JSON file containing the post data. The JSON content must follow the format provided by the jschan API through the global management socket or the `.../recents.json` endpoint.