sAwMIL (Sparse Aware Multiple-Instance Learning) is an open-source Python library providing a collection of Support Vector Machine (SVM) classifiers for multiple-instance learning (MIL). It builds on ideas from the earlier misvm package, adapting them to recent Python versions and introducing new models.
In Single-Instance Learning (SIL), the dataset consists of pairs of an instance and a label:

$$\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}, \qquad \mathbf{x}_i \in \mathbb{R}^d.$$

In binary settings, the label is $y_i \in \{-1, +1\}$.

In Multiple-Instance Learning (MIL), the dataset consists of bags of instances paired with a single bag-level label:

$$\mathcal{D} = \{(B_i, Y_i)\}_{i=1}^{M}, \qquad B_i = \{\mathbf{x}_{i1}, \dots, \mathbf{x}_{i n_i}\}, \qquad Y_i \in \{-1, +1\}.$$
To solve this problem, we can use the NSK or sMIL models.
In some cases, each bag, along with its instances and label, may also contain an intra-bag mask that specifies which instances are likely to carry the signal related to the bag-level label.
To solve this problem, one can use the sAwMIL model.
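To make these settings concrete, here is a small illustrative sketch using plain NumPy. The container layout below (lists of arrays plus a separate mask list) is only for illustration and is not the data format sawmil uses internally; see `generate_dummy_bags` further down for the library's own helper.

```python
# Illustrative sketch only: a toy MIL dataset written out with plain NumPy.
# It mirrors the definitions above; it is NOT the container format used by sawmil.
import numpy as np

# Two bags of 2-D instances, each with a single bag-level label in {-1, +1}.
bags = [
    np.array([[0.5, 1.2], [2.1, 0.3], [1.8, 1.9]]),  # bag 1: 3 instances
    np.array([[-1.0, 0.4], [-2.2, -0.7]]),           # bag 2: 2 instances
]
bag_labels = np.array([+1, -1])

# Optional intra-bag masks: 1 marks instances that are likely to carry the
# signal related to the bag-level label (the setting addressed by sAwMIL).
masks = [
    np.array([1, 0, 1]),
    np.array([0, 0]),
]
```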
sawmil supports three QP backends: Gurobi, OSQP, and DAQP.
By default, the base package installs without any solver; pick one (or several) via extras.
```bash
pip install sawmil
# it installs numpy>=1.22 and scikit-learn>=1.7.0
```

Gurobi is commercial software. You’ll need a valid license (academic or commercial); refer to the official website.

```bash
pip install "sawmil[gurobi]"
# in addition to the base packages, it installs gurobi>12.0.3
```

```bash
pip install "sawmil[osqp]"
# in addition to the base packages, it installs osqp>=1.0.4 and scipy>=1.16.1
```

```bash
pip install "sawmil[daqp]"
# in addition to the base packages, it installs daqp>=0.5 and scipy>=1.16.1
```

```bash
pip install "sawmil[full]"
```
```python
from sawmil import SVM, RBF

k = RBF(gamma=0.1)
# solver="osqp" (default is "gurobi")
# SVM is for single instances
clf = SVM(C=1.0,
          kernel=k,
          solver="osqp").fit(X, y)
```
```python
from sawmil.data import generate_dummy_bags
import numpy as np
rng = np.random.default_rng(0)
ds = generate_dummy_bags(
    n_pos=300, n_neg=100, inst_per_bag=(5, 15), d=2,
    pos_centers=((+2, +1), (+4, +3)),
    neg_centers=((-1.5, -1.0), (-3.0, +0.5)),
    pos_scales=((2.0, 0.6), (1.2, 0.8)),
    neg_scales=((1.5, 0.5), (2.5, 0.9)),
    pos_intra_rate=(0.25, 0.85),
    ensure_pos_in_every_pos_bag=True,
    neg_pos_noise_rate=(0.00, 0.05),
    pos_neg_noise_rate=(0.00, 0.20),
    outlier_rate=0.1,
    outlier_scale=8.0,
    random_state=42,
)
```
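A quick sanity check on the synthetic dataset; this assumes only that the returned object exposes the bag labels as `ds.y`, which is also what the scoring calls below rely on:

```python
# Inspect the generated bag labels (ds.y is also used for scoring below).
import numpy as np

labels, counts = np.unique(ds.y, return_counts=True)
print("number of bags:", len(ds.y))
print("bag-label counts:", dict(zip(labels, counts)))
```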
Load a kernel:

```python
from sawmil.kernels import get_kernel, RBF

k1 = get_kernel("rbf", gamma=0.1)
k2 = RBF(gamma=0.1)
# k1 == k2
```
Fit the NSK model:

```python
from sawmil.nsk import NSK

clf = NSK(C=1, kernel=k2,
          # bag kernel settings
          normalizer='average',
          # solver params
          scale_C=True,
          tol=1e-8,
          verbose=False).fit(ds, None)

y = ds.y
print("Train acc:", clf.score(ds, y))
```
Fit the sMIL model:

```python
from sawmil.smil import sMIL

k = get_kernel("linear")  # base (single-instance) kernel
clf = sMIL(C=0.1,
           kernel=k,
           scale_C=True,
           tol=1e-8,
           verbose=False).fit(ds, None)
```
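As with NSK above, the fitted sMIL model can be evaluated on the training bags with the same `score` pattern:

```python
# Evaluate the fitted sMIL model on the training bags (same pattern as NSK above).
print("Train acc:", clf.score(ds, ds.y))
```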
See more examples in the example.ipynb notebook.

```python
from sawmil.kernels import Product, Polynomial, Linear, RBF, Sum, Scale
from sawmil.sawmil import sAwMIL

k = Sum(Linear(),
        Scale(0.5,
              Product(Polynomial(degree=2), RBF(gamma=1.0))))

clf = sAwMIL(C=0.1,
             kernel=k,
             solver="gurobi",
             eta=0.95)  # here eta is high, since all items in the bag are relevant
clf.fit(ds)
print("Train acc:", clf.score(ds, ds.y))
```

If you use the sawmil package in academic work, please cite:
Savcisens, G. & Eliassi-Rad, T. sAwMIL: Python package for Sparse Multiple-Instance Learning (2025).
```bibtex
@software{savcisens2025sawmil,
  author = {Savcisens, Germans and Eliassi-Rad, Tina},
  title = {sAwMIL: Python package for Sparse Multiple-Instance Learning},
  year = {2025},
  doi = {10.5281/zenodo.16990499},
  url = {https://github.com/carlomarxdk/sawmil}
}
```

If you want to reference a specific version of the package, find the correct DOI here.