📜 SSDA 2023 – Historical Image Explorers

Welcome to the official repository of the SSDA 2023 Challenge by the Historical Image Explorers team from the University of Fribourg, Switzerland. This project explores advanced document image recognition methods tailored for low-resource historical German manuscripts.

📖 Project Overview

This repository is dedicated to researchers and developers working on small-scale historical datasets, especially those involving Fraktur scripts and degraded handwritten documents. The core objective is to empower the community with lightweight yet effective models that perform well on noisy, aged, and low-data historical corpora.

We focus on:

- Script-specific preprocessing (e.g., binarization for noise reduction)
- Comparative experiments with RGB vs. binary inputs
- Training strategies designed for small-data regimes
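
To make the binarization step mentioned in the focus list concrete, here is a minimal sketch of global thresholding with Otsu's method. This README does not state which binarization algorithm the team used, so Otsu is an assumption chosen for illustration, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold for an 8-bit grayscale image (uint8)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of the dark class up to each level
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to each level
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; levels that leave one class empty score 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map pixels above the Otsu threshold to 255 (paper), the rest to 0 (ink)."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

A single global threshold is the simplest option. On unevenly lit or stained pages, a locally adaptive method may work better, at the cost of extra parameters to tune.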

🔍 Visual Example

Figure 1: Sample historical German text line from the SSDA 2023 dataset.

Figure 2: Binarized version of the text line, showing cleaner contours and improved contrast for recognition.

⚙️ Key Features

- 🧪 **Sample Code** Hands-on Jupyter notebooks that showcase end-to-end pipelines for historical text recognition.
- 📦 **Specialized Mini Datasets** Compact, curated datasets for efficient experimentation in low-resource settings.
- 🛠️ **Training Best Practices** Guidance on working with limited data, including input transformation, augmentation, and model tuning tips.
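
As an illustration of the kind of light augmentation that can help in small-data regimes, the sketch below shifts a binarized text line by a few pixels and flips a small fraction of pixels. The specific transformations, the function name `augment_textline`, and the parameters `max_shift` and `noise_frac` are assumptions for this example. The augmentation actually used in the notebooks is not described here.

```python
import numpy as np


def augment_textline(binary, rng, max_shift=2, noise_frac=0.01):
    """Randomly translate a binary text line and add salt-and-pepper noise.

    binary: 2D uint8 array with 0 = ink and 255 = background.
    rng:    a numpy.random.Generator, so results are reproducible.
    """
    h, w = binary.shape
    dy = int(rng.integers(-max_shift, max_shift + 1))
    dx = int(rng.integers(-max_shift, max_shift + 1))

    # Translate by (dy, dx), filling uncovered pixels with background.
    shifted = np.full_like(binary, 255)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = binary[src_y, src_x]

    # Flip roughly noise_frac of the pixels (ink <-> background).
    flip = rng.random(binary.shape) < noise_frac
    shifted[flip] = 255 - shifted[flip]
    return shifted
```

Keeping the shifts small and the noise sparse matters for text lines: larger distortions can cut off ascenders or descenders and change what the line reads as.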

📁 Repository Structure

- **Binary-TROCR.ipynb** Fine-tuning example using binarized image inputs with the TrOCR architecture.
- **RGB-TROCR.ipynb** Counterpart notebook for experiments using the original RGB document images.
- **Binary_train_with_AUG.ipynb** Demonstrates training on binarized inputs with basic augmentation for robustness.
- **pre-processing.zip** Scripts for converting raw document scans into model-ready input formats (e.g., binarization, cropping).
- **README.txt** Notes and technical remarks on the challenges of training with constrained computational resources.
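
The cropping step mentioned for the preprocessing scripts can be sketched as trimming a binarized text line to the bounding box of its ink pixels. This is not the code shipped in `pre-processing.zip`. The function name `crop_to_ink` and the `margin` parameter are hypothetical, and the sketch assumes ink is encoded as 0 on a 255 background.

```python
import numpy as np


def crop_to_ink(binary: np.ndarray, margin: int = 2) -> np.ndarray:
    """Crop a binary image (0 = ink, 255 = background) to its ink bounding box."""
    ink = binary == 0
    if not ink.any():
        # Nothing to crop around: return the image unchanged.
        return binary.copy()
    rows = np.flatnonzero(ink.any(axis=1))   # row indices containing ink
    cols = np.flatnonzero(ink.any(axis=0))   # column indices containing ink
    top = max(int(rows[0]) - margin, 0)
    bottom = min(int(rows[-1]) + margin + 1, binary.shape[0])
    left = max(int(cols[0]) - margin, 0)
    right = min(int(cols[-1]) + margin + 1, binary.shape[1])
    return binary[top:bottom, left:right].copy()
```

A small margin keeps strokes from touching the image border, which some recognition models handle poorly after resizing.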

🚀 Getting Started

We welcome contributions from the broader community. Whether you're improving model accuracy, proposing new preprocessing steps, or contributing additional datasets, feel free to open an issue or submit a pull request.

To get started:

git clone https://github.com/your-repo/SSDA2023-Historical-Image-Explorers.git
cd SSDA2023-Historical-Image-Explorers

Repository: back-kh/SSDA-Historical-German-Manuscripts-Recognitions

Folders and files

Latest commit

History

Repository files navigation

📜 SSDA 2023 – Historical Image Explorers

📖 Project Overview

🔍 Visual Example

⚙️ Key Features

📁 Repository Structure

🚀 Getting Started

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages