Pabulib (.pb) format file: Checker

A Python library for validating files in the .pb (Pabulib) format, ensuring compliance with the standards described at pabulib.org/format.

```
pip install git+https://github.com/pabulib/checker.git
```

TODO

- Make sure pycountry gets installed.
- Run tests before deployment (CI/CD).
- Show the correct field order (what to change it to), not the actual one.

Overview

The Checker is a utility for processing and validating .pb files. It performs a wide range of checks to ensure data consistency across the meta, projects, and votes sections. Code suggestions and changes are very welcome.

Features

Key Functions

Budget Validation: Ensures that project costs align with the defined budget and checks for overages.
Vote and Project Count Validation: Cross-verifies counts in metadata against actual data.
Vote Length Validation: Validates that each voter’s submissions comply with minimum and maximum limits.
Duplicate Votes Detection: Identifies repeated votes within individual submissions.
Project Selection Validation: Ensures compliance with defined selection rules, such as Poznań or greedy algorithms.
Field Structure Validation: Verifies field presence, order, types, and constraints in metadata, projects, and votes.
Date Range Validation: Checks that metadata contains a valid date range.
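
To make two of the checks above concrete, here is a minimal sketch of vote length and duplicate detection. It is not the library's implementation: the function name `check_vote`, its parameters, and the default limits are hypothetical, and it assumes a vote is stored as a comma-separated string of project IDs.

```python
def check_vote(vote, min_length=1, max_length=3):
    """Return a list of problems found in a single vote (hypothetical helper)."""
    # Assumption: a vote is a comma-separated string of project IDs.
    project_ids = [p.strip() for p in vote.split(",") if p.strip()]
    problems = []
    if len(project_ids) > max_length:
        problems.append("max vote length exceeded")
    if len(project_ids) < min_length:
        problems.append("min vote length not met")
    # A set drops repeats, so a size mismatch means some project was listed twice.
    if len(set(project_ids)) != len(project_ids):
        problems.append("duplicated projects in a vote")
    return problems


print(check_vote("1,2,3"))    # no problems
print(check_vote("1,1,2,4"))  # too long and contains a duplicate
```

In a real file the limits would presumably come from the metadata rather than from default arguments.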

Results Structure

The results from the validation process include three main sections:

1. Metadata

Tracks the overall processing statistics:

processed: Total number of files processed.
valid: Count of valid files.
invalid: Count of invalid files.

2. Summary

Provides aggregated error and warning counts by type for all processed files. Example:

```
{
  "empty lines": 3,
  "comma in float!": 2,
  "budget exceeded": 1
}
```

3. File Results

Details the outcomes for each processed file. Includes:

webpage_name: Name generated from the file's metadata.
results: Either
- the string "File looks correct!" if there are no errors or warnings, or
- a dict of detailed errors and warnings if issues are found.

Example Output

Valid File

```
{
  "metadata": {
    "processed": 1,
    "valid": 1,
    "invalid": 0
  },
  "summary": {},
  "file1": {
    "webpage_name": "Country_Unit_Instance_Subunit",
    "results": "File looks correct!"
  }
}
```

Invalid File

```
{
  "metadata": {
    "processed": 1,
    "valid": 0,
    "invalid": 1
  },
  "summary": {
    "empty lines": 1,
    "comma in float!": 1
  },
  "file1": {
    "webpage_name": "Country_Unit_Instance_Subunit",
    "results": {
      "errors": {
        "empty lines": {
          1: "contains empty lines at: [10, 20]"
        },
        "comma in float!": {
          1: "in budget"
        }
      },
      "warnings": {
        "wrong projects fields order": {
          1: "projects wrong fields order: ['name', 'cost', 'selected']."
        }
      }
    }
  }
}
```
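
A results dict shaped like the examples above can be flattened into one row per issue, which is handy for logging or further filtering. The helper below is only a sketch: `list_issues` is a hypothetical name, and it assumes every key other than "metadata" and "summary" is a per-file entry with the structure shown above.

```python
def list_issues(results):
    """Flatten results into (file, level, issue_type, message) tuples."""
    issues = []
    for file_name, file_result in results.items():
        if file_name in ("metadata", "summary"):
            continue  # aggregate sections, not per-file entries
        details = file_result["results"]
        if isinstance(details, str):
            continue  # plain string such as "File looks correct!"
        for level in ("errors", "warnings"):
            for issue_type, occurrences in details.get(level, {}).items():
                for message in occurrences.values():
                    issues.append((file_name, level, issue_type, message))
    return issues


# Hand-written sample mirroring the invalid-file example.
sample = {
    "metadata": {"processed": 1, "valid": 0, "invalid": 1},
    "summary": {"empty lines": 1, "comma in float!": 1},
    "file1": {
        "webpage_name": "Country_Unit_Instance_Subunit",
        "results": {
            "errors": {
                "empty lines": {1: "contains empty lines at: [10, 20]"},
                "comma in float!": {1: "in budget"},
            },
            "warnings": {
                "wrong projects fields order": {
                    1: "projects wrong fields order: ['name', 'cost', 'selected']."
                }
            },
        },
    },
}

for issue in list_issues(sample):
    print(issue)
```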

Possible Issues

Errors

Critical issues that need to be fixed:

Empty Lines: contains empty lines at: [line_numbers]
Comma in Float: comma in float value at {field}
Project with No Cost: project: {project_id} has no cost!
Single Project Exceeded Whole Budget: project {project_id} has exceeded the whole budget!
Budget Exceeded: Budget exceeded by selected projects
Fully Funded Flag Discrepancy: fully_funded flag different than 1!
Unused Budget: Unused budget could fund project: {project_id}
Different Number of Votes: votes number in META: {meta_votes} vs counted from file: {file_votes}
Different Number of Projects: projects number in META: {meta_projects} vs counted from file: {file_projects}
Vote with Duplicated Projects: duplicated projects in a vote: {voter_id}
Vote Length Exceeded: Voter ID: {voter_id}, max vote length exceeded
Vote Length Too Short: Voter ID: {voter_id}, min vote length not met
Different Values in Votes: file votes vs counted votes mismatch for project: {project_id}
Different Values in Scores: file scores vs counted scores mismatch for project: {project_id}
No Votes or Scores in Projects: No votes or scores found in PROJECTS section
Invalid Field Value: field '{field_name}' has invalid value
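
As an illustration of how three of the budget-related errors above could arise, here is a minimal sketch. It is an assumption about the logic, not the library's code: `check_budget` and the shape of the `projects` dict are hypothetical, and the messages simply reuse the templates listed above.

```python
def check_budget(budget, projects):
    """Return budget-related error messages (hypothetical helper).

    projects maps project_id -> {"cost": number, "selected": 0 or 1}.
    """
    errors = []
    selected_cost = 0
    for project_id, project in projects.items():
        cost = project["cost"]
        if cost == 0:
            errors.append(f"project: {project_id} has no cost!")
        elif cost > budget:
            errors.append(f"project {project_id} has exceeded the whole budget!")
        if project.get("selected") == 1:
            selected_cost += cost
    # The selected projects together must fit in the budget.
    if selected_cost > budget:
        errors.append("Budget exceeded by selected projects")
    return errors


projects = {
    "1": {"cost": 600, "selected": 1},
    "2": {"cost": 500, "selected": 1},
    "3": {"cost": 0, "selected": 0},
    "4": {"cost": 1500, "selected": 0},
}
for message in check_budget(1000, projects):
    print(message)
```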

Warnings

Non-critical issues that should be reviewed:

Wrong Field Order: {section_name} contains fields in wrong order: {fields_list}
Poznań Rule Not Followed: Projects not selected but should be: {project_ids}
Greedy Rule Not Followed: Projects selected but should not: {project_ids}
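
The exact selection rules are not spelled out here, so the following is only a sketch of one common reading of a greedy rule: go through projects in descending order of votes and fund each one that still fits in the remaining budget. The function name, the input shape, and the tie-breaking by project ID are all assumptions; the Poznań rule is not covered.

```python
def greedy_selection(budget, projects):
    """Return the set of project IDs a simple greedy rule would fund.

    projects maps project_id -> (cost, votes). Hypothetical helper.
    """
    remaining = budget
    funded = set()
    # Most votes first; ties broken by project ID (an assumption).
    ordered = sorted(projects.items(), key=lambda item: (-item[1][1], item[0]))
    for project_id, (cost, _votes) in ordered:
        if cost <= remaining:
            funded.add(project_id)
            remaining -= cost
    return funded


projects = {"A": (600, 50), "B": (500, 40), "C": (300, 30)}
expected = greedy_selection(1000, projects)
marked_selected = {"A", "B"}  # what the file claims was selected

print("selected but should not:", sorted(marked_selected - expected))
print("not selected but should be:", sorted(expected - marked_selected))
```

Comparing the computed set with the projects marked as selected in the file gives the two lists of project IDs that a rule-not-followed warning would report.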

How to Use

Installation

Ensure all dependencies are installed:
- Python 3.8+
- Required modules:
  - pycountry
```
pip install -r requirements.txt
```
Alternatively, install as a Python package directly from GitHub:
```
pip install git+https://github.com/pabulib/checker.git
```

Usage

Import the Checker class:
```
from pabulib.checker import Checker
```
Instantiate the Checker class:
```
checker = Checker()
```
Process files: use the process_files method, which takes a list of file paths or raw file contents.
```
files = ["path/to/file1.pb", "raw content of file2"]
results = checker.process_files(files)
```
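
When many files live in one directory, the list of paths can be built with the standard library. This is a small sketch: `collect_pb_files` is a hypothetical helper, not part of the Checker, and the commented-out call uses only the `process_files` method described above.

```python
from pathlib import Path


def collect_pb_files(directory):
    """Return sorted paths (as strings) of all .pb files directly in directory."""
    return sorted(str(path) for path in Path(directory).glob("*.pb"))


# Usage with the Checker (requires the package to be installed):
# results = checker.process_files(collect_pb_files("path/to/directory"))
```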

Get the results: at the moment, results is a Python dict.
```
import json

# Summary: error counts across all files
print(json.dumps(results["summary"], indent=4))

# Processing metadata: how many files were processed, valid, and invalid
print(json.dumps(results["metadata"], indent=4))

# Full details for every file
print(results)

# Details for a single file: replace the placeholder with a key present in results
file_name = "file1"
print(results[file_name])
```

Running Example Files

You can process example .pb files using the script examples/run_examples.py. This script demonstrates how to use the Checker to validate files.

Example files are located in the examples/ directory:
- example_valid.pb: A valid .pb file.
- example_invalid.pb: A .pb file containing errors.

Run the script:
```
python examples/run_examples.py
```

The results for both valid and invalid files will be printed in JSON format.

Customization

To add new validation rules or checks:

- Define a new method in the Checker class.
- Call it from the run_checks method so that it runs in sequence with the other checks.
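
The pattern can be sketched with a stand-in class. `SketchChecker` below is not the real Checker: its attributes, method names other than `run_checks`, and the `run_checks` signature are assumptions made for illustration, since the real internals are only documented in the source code.

```python
class SketchChecker:
    """Stand-in for the real Checker; only the pattern matters here."""

    def __init__(self):
        self.errors = {}

    def check_empty_lines(self, lines):
        # An "existing" check: report 1-based numbers of blank lines.
        empty = [i for i, line in enumerate(lines, start=1) if not line.strip()]
        if empty:
            self.errors["empty lines"] = f"contains empty lines at: {empty}"

    def check_comma_in_budget(self, lines):
        # The "new" check: flag a comma inside the budget value.
        for line in lines:
            key, _, value = line.partition(";")
            if key == "budget" and "," in value:
                self.errors["comma in float!"] = "in budget"

    def run_checks(self, lines):
        # Checks run one after another; a new check is added as one more call.
        self.check_empty_lines(lines)
        self.check_comma_in_budget(lines)
        return self.errors


lines = ["META", "key;value", "budget;1000,50", "", "PROJECTS"]
print(SketchChecker().run_checks(lines))
```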

Additional Information

For detailed examples or advanced usage, refer to the comments in the source code.
