Thank you for your interest in improving datamend. Every contribution matters — whether it is a bug report, a repair strategy, a new drift algorithm, documentation, or a test.
| Type | What to do |
|---|---|
| Bug report | Open an issue with a minimal reproducible example |
| Feature request | Open a discussion before opening a PR |
| New repair strategy | Implement BaseRepairPlugin and open a PR |
| New validator | Implement BaseValidatorPlugin and open a PR |
| New drift detector | Implement BaseDriftDetectorPlugin and open a PR |
| New tracer | Implement BaseTracerPlugin and open a PR |
| Documentation | Edit files in docs/ and open a PR |
| Tests | Add tests in tests/ targeting uncovered code paths |
To set up a local development environment:

```bash
git clone https://github.com/vignesh2027/datamend.py.git
cd datamend.py
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```

To run the test suite:

```bash
pytest                        # Run all tests with coverage
pytest tests/test_repair.py   # Run a specific test file
pytest -k "test_repair"       # Run tests matching a pattern
```

Coverage must remain above 90% on all PRs.
datamend uses ruff for linting and formatting, and mypy for type checking:

```bash
ruff check datamend/    # Lint
ruff format datamend/   # Format
mypy datamend/          # Type check
```

All public functions, classes, and methods must have Google-style docstrings with `Args` and `Returns` sections, and all public APIs must have full type hints.
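As an illustration of the expected style, here is a hypothetical helper (`fraction_missing` is not part of datamend) with full type hints and a Google-style docstring. The `Raises` section is not required by the rule above, but Google style conventionally includes it when a function raises.

```python
from typing import Optional, Sequence


def fraction_missing(values: Sequence[Optional[float]]) -> float:
    """Return the fraction of entries in ``values`` that are ``None``.

    Args:
        values: A sequence of numeric values, where ``None`` marks a missing entry.

    Returns:
        The share of missing entries, between 0.0 and 1.0.

    Raises:
        ValueError: If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    # Count the None entries and divide by the total length.
    missing = sum(1 for v in values if v is None)
    return missing / len(values)
```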
To add a new repair strategy, subclass `BaseRepairPlugin` and implement `repair`:

```python
from typing import List, Tuple

import pandas as pd

from datamend.core.repair import RepairAction
from datamend.plugins.base import BaseRepairPlugin


class MyRepairPlugin(BaseRepairPlugin):
    name = "my_repair"
    description = "Short description of what this repairs."
    version = "0.1.0"
    author = "Your Name"

    def repair(self, df: pd.DataFrame) -> Tuple[pd.DataFrame, List[RepairAction]]:
        # 1. Copy the DataFrame: never mutate the input in place
        df = df.copy()
        actions: List[RepairAction] = []
        # 2. Detect the issue
        # 3. Fix the issue
        # 4. Log a RepairAction for every change
        return df, actions
```

Plugins must:

- Never mutate the input DataFrame in place
- Return a `RepairAction` for every change they make
- Handle errors gracefully (do not crash the pipeline)
- Have unit tests in `tests/`
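The rules above can be seen together in a small sketch of a plugin that fills missing values in numeric columns with the column median. This page does not document the fields of `RepairAction` or any requirements of `BaseRepairPlugin`, so the sketch defines hypothetical stand-ins for both (a `RepairAction` with `column`, `action`, and `count` fields) so that it runs on its own. In a real plugin, import both from datamend as in the template and adapt the action fields to the actual class. The `test_...` function at the end shows the shape of a unit test that would live in `tests/`.

```python
import warnings
from dataclasses import dataclass
from typing import List, Tuple

import pandas as pd


# Hypothetical stand-ins so this sketch runs without datamend installed.
# In a real plugin, import BaseRepairPlugin and RepairAction from datamend instead.
@dataclass
class RepairAction:
    column: str
    action: str
    count: int


class BaseRepairPlugin:
    name: str = ""
    description: str = ""


class MedianFillPlugin(BaseRepairPlugin):
    """Fill missing values in numeric columns with the column median."""

    name = "median_fill"
    description = "Fills missing numeric values with the column median."

    def repair(self, df: pd.DataFrame) -> Tuple[pd.DataFrame, List[RepairAction]]:
        # Work on a copy so the caller's DataFrame is never mutated.
        df = df.copy()
        actions: List[RepairAction] = []
        for column in df.select_dtypes(include="number").columns:
            try:
                n_missing = int(df[column].isna().sum())
                if n_missing == 0:
                    continue
                median = df[column].median()  # skips NaN by default
                if pd.isna(median):
                    # Every value is missing: nothing sensible to fill with.
                    continue
                df[column] = df[column].fillna(median)
                # Log one RepairAction per column that was changed.
                actions.append(
                    RepairAction(column=column, action=f"filled with median {median}", count=n_missing)
                )
            except Exception as exc:
                # Warn and skip the column rather than crash the whole pipeline.
                warnings.warn(f"median_fill skipped column {column!r}: {exc}")
        return df, actions


def test_median_fill_does_not_mutate_input() -> None:
    original = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", None, "z"]})
    snapshot = original.copy()
    repaired, actions = MedianFillPlugin().repair(original)
    pd.testing.assert_frame_equal(original, snapshot)  # input left untouched
    assert repaired["a"].tolist() == [1.0, 2.0, 3.0]
    assert len(actions) == 1 and actions[0].column == "a"
```

Returning a fresh copy plus an explicit action list keeps the plugin free of side effects, which is what lets the pipeline report every change it made.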
To add a new drift detector, subclass `BaseDriftDetectorPlugin` and implement `detect`:

```python
from typing import Any, Dict

import pandas as pd

from datamend.plugins.base import BaseDriftDetectorPlugin


class MyDriftPlugin(BaseDriftDetectorPlugin):
    name = "my_drift"
    description = "Custom drift detector."

    def detect(self, reference: pd.Series, current: pd.Series, column: str) -> Dict[str, Any]:
        drift_score = ...  # your logic
        return {
            "drift_score": drift_score,  # 0–100
            "drifted": drift_score > 20,
            "method": "my_method",
            "details": {},
        }
```

To submit a contribution:

- Fork the repository and create a branch: `git checkout -b feature/my-repair-plugin`
- Write your code, tests, and docstrings
- Run `pytest` and confirm coverage is above 90%
- Run `ruff check datamend/` and `mypy datamend/`
- Open a PR against `main` with a clear description of what the change does and why
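For reference, the drift-detector template could be filled in as follows. This sketch uses the two-sample Kolmogorov–Smirnov statistic (the largest gap between the two empirical CDFs, which lies in [0, 1]) and multiplies it by 100 to match the 0–100 `drift_score` range. The choice of KS, the `ks` method name, the `details` keys, and the stand-in base class are assumptions for illustration, not datamend's built-in behaviour.

```python
from typing import Any, Dict

import numpy as np
import pandas as pd


class BaseDriftDetectorPlugin:
    """Hypothetical stand-in; a real plugin imports this from datamend.plugins.base."""

    name: str = ""
    description: str = ""


class KSDriftPlugin(BaseDriftDetectorPlugin):
    name = "ks_drift"
    description = "Two-sample Kolmogorov-Smirnov drift detector for numeric columns."

    def detect(self, reference: pd.Series, current: pd.Series, column: str) -> Dict[str, Any]:
        # Drop missing values and sort each sample so searchsorted can build ECDFs.
        ref = np.sort(reference.dropna().to_numpy(dtype=float))
        cur = np.sort(current.dropna().to_numpy(dtype=float))
        if ref.size == 0 or cur.size == 0:
            return {
                "drift_score": 0.0,
                "drifted": False,
                "method": "ks",
                "details": {"column": column, "reason": "empty sample"},
            }
        # Evaluate both empirical CDFs at every observed value; the KS statistic
        # is the largest absolute difference between them.
        points = np.concatenate([ref, cur])
        cdf_ref = np.searchsorted(ref, points, side="right") / ref.size
        cdf_cur = np.searchsorted(cur, points, side="right") / cur.size
        ks_stat = float(np.max(np.abs(cdf_ref - cdf_cur)))
        drift_score = 100.0 * ks_stat  # scale [0, 1] to the 0–100 range
        return {
            "drift_score": drift_score,
            "drifted": drift_score > 20,
            "method": "ks",
            "details": {"column": column, "ks_statistic": ks_stat},
        }
```

Identical samples score 0, fully separated samples score 100, and the default threshold of 20 flags drift once the CDFs differ by more than 0.2 at some point.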
If your plugin is domain-specific (e.g. medical, financial), publish it as a standalone package and declare a datamend entry point in its `pyproject.toml`:

```toml
[project.entry-points."datamend.plugins"]
my_plugin = "my_package.plugins:MyRepairPlugin"
```

datamend will auto-discover it when the package is installed.
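Entry-point discovery of this kind is typically built on the standard library's `importlib.metadata`. The sketch below is not datamend's actual loader; the `discover_plugins` function is hypothetical and only illustrates how an installed package that declares the `datamend.plugins` group becomes visible at runtime.

```python
import warnings
from importlib.metadata import entry_points
from typing import Any, Dict


def discover_plugins(group: str = "datamend.plugins") -> Dict[str, Any]:
    """Load every object registered under ``group`` by installed packages.

    Args:
        group: The entry-point group to scan.

    Returns:
        A mapping from entry-point name (e.g. ``my_plugin``) to the loaded object.
    """
    all_eps = entry_points()
    if hasattr(all_eps, "select"):  # Python 3.10+
        selected = all_eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict keyed by group name
        selected = all_eps.get(group, [])
    plugins: Dict[str, Any] = {}
    for ep in selected:
        try:
            # For the example above, this imports my_package.plugins
            # and returns the MyRepairPlugin class.
            plugins[ep.name] = ep.load()
        except Exception as exc:
            # One broken plugin should not stop discovery of the others.
            warnings.warn(f"Skipping plugin {ep.name!r}: {exc}")
    return plugins
```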
Be respectful and constructive. datamend is a community project and everyone is welcome.