samalloing
  • KB - National Library of the Netherlands
  • Den Haag

64 stars written in HTML

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

HTML 14,853 2,133 Updated Nov 3, 2025

glTF – Runtime 3D Asset Delivery

HTML 7,578 1,169 Updated Nov 3, 2025

Schema.org - schemas and supporting software

HTML 5,803 883 Updated Nov 7, 2025

Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1

HTML 4,115 31,210 Updated Mar 30, 2021

Smallest possible syntactically valid files of different types

HTML 2,275 193 Updated Jul 18, 2024

website at https://raft.github.io

HTML 1,437 224 Updated Oct 19, 2025

EPUB 3 Sample Documents

HTML 467 164 Updated Jul 7, 2023

Shared workspace for EPUB 3 specifications.

HTML 344 63 Updated Oct 31, 2025

Data Curator - share usable open data

HTML 276 38 Updated Nov 25, 2021

EthOn - The Ethereum Ontology

HTML 247 44 Updated Oct 29, 2018

Documents produced by the CSV on the Web Working Group

HTML 165 48 Updated Nov 2, 2022

Specifications developed and maintained by the Webrecorder community.

HTML 136 17 Updated Oct 16, 2025

Centralised repository for WARC usage specifications.

HTML 118 32 Updated Oct 12, 2025

W3C Web Publications

HTML 81 19 Updated Feb 12, 2025

A collection of EPUB documents to systematically test EPUB Reading System conformance

HTML 80 23 Updated Dec 13, 2021

Perseus Treebank Data

HTML 75 46 Updated Jun 19, 2024

OCR evaluation brought to you by University of Alicante

HTML 66 27 Updated Sep 1, 2022

The Oxford Common File Layout (OCFL) specifications and website

HTML 63 14 Updated Aug 14, 2025

ICA Records in Contexts-Ontology (ICA RiC-O) GitHub repository web pages

HTML 60 20 Updated Jun 25, 2025

OxGarage is a web, and RESTful, service to manage the transformation of documents between a variety of formats. The majority of transformations use the Text Encoding Initiative format as a pivot f…

HTML 53 22 Updated Sep 18, 2015

Legacy Repository: TEI SimplePrint now merged into TEI Repository. Originally TEI Simple aimed to define a new highly-constrained and prescriptive subset of the Text Encoding Initiative (TEI) Guide…

HTML 49 11 Updated Dec 6, 2016

EPUB 3 Community Group Repository

HTML 48 16 Updated Apr 26, 2022

All ontologies used in NIF 2.0 (NIF-Core + vocabulary modules + helper modules)

HTML 37 7 Updated Jun 22, 2017

Loader software for automated imaging of optical media with Nimbie disc robot

HTML 36 3 Updated Mar 10, 2025

Java based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and HOCR.

HTML 35 9 Updated May 25, 2023

A SKOS extension for statistical classifications

HTML 35 8 Updated Jul 18, 2024

Repo for the open standards for data guidebook

HTML 29 5 Updated Aug 21, 2025

The DDI Discovery Vocabulary, an RDF vocabulary for data description and discovery based on DDI

HTML 25 7 Updated May 5, 2023