This repo builds a knowledge graph from UK legislation and provides an application for exploring the graph.
Originally prototyped by @livadlivesey and @GavEdwards.
This repo is the code used to build Lex Graph from scratch. If you wish to simply access the produced knowledge graph, Lex Graph can be downloaded from i.AI's Hugging Face Datasets.
To build the Lex Graph from scratch, please follow these steps:
- Clone the repo
git clone https://github.com/i-dot-ai/lex-graph-build.git
- Install poetry if you don't have it
pip install poetry
- Install the dependencies and create a virtual environment
poetry install
A dump of the latest XML versions of legislation is available from the new Legislation Research website from the National Archives. At the time of publishing this is in beta. If you are interested in gaining access to the raw data to build the graph from scratch, please contact the Legislation Data Team ([email protected]).
Once you have access, download the Legislative Texts Enacted CLML data and unzip it into data/raw.
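If you would rather script the unzip step, the following is a minimal sketch using only the Python standard library. The function name `unzip_into_raw` and the archive file name in the usage note are hypothetical; substitute the name of the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def unzip_into_raw(archive_path: str, raw_dir: str = "data/raw") -> list[str]:
    """Extract every member of the archive into raw_dir and return the member names."""
    target = Path(raw_dir)
    target.mkdir(parents=True, exist_ok=True)  # create data/raw if it does not exist yet
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For example, `unzip_into_raw("clml_enacted.zip")` (an assumed file name) would extract the archive into data/raw, the default input path of the preprocessing script.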
The graph build process consists of two steps:
- Pre-processing the raw data
- Building the graph
The processed data is saved in the data/processed directory and the graph is saved in the data/graph directory.
Process a single test file
poetry run python scripts/preprocess.py --test
Process a custom file
poetry run python scripts/preprocess.py --file <file_path>
Process a subset of files
poetry run python scripts/preprocess.py --year <year> --type <type>
Process all files
poetry run python scripts/preprocess.py --all
Use different input and output paths (default input is data/raw, default output is data/processed)
poetry run python scripts/preprocess.py --input_path <input_path> --output_path <output_path>
You can also use a YAML configuration file instead of, or alongside, the command line arguments
poetry run python scripts/preprocess.py --config configs/preprocess_config.yaml
Build graph from a single test file
poetry run python scripts/build_graph.py --test
Build graph from a custom file
poetry run python scripts/build_graph.py --file <file_path>
Build graph from a subset of files
poetry run python scripts/build_graph.py --year <year> --type <type>
Build graph from all files
poetry run python scripts/build_graph.py --all
You can also use a YAML configuration file instead of, or alongside, the command line arguments
poetry run python scripts/build_graph.py --config configs/graph_config.yaml
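Because the graph build reads the output of the preprocessing step, the two scripts are normally run in order for the same subset. The sketch below chains them from Python using only the `--year` and `--type` flags documented above. The helper names `build_commands` and `run_pipeline` are hypothetical and not part of this repo, and the sketch assumes it is run from the repo root with Poetry installed.

```python
import subprocess


def build_commands(year: str, leg_type: str) -> list[list[str]]:
    """Return the preprocessing and graph-build commands, in order, for one subset."""
    subset = ["--year", year, "--type", leg_type]
    return [
        ["poetry", "run", "python", "scripts/preprocess.py", *subset],
        ["poetry", "run", "python", "scripts/build_graph.py", *subset],
    ]


def run_pipeline(year: str, leg_type: str) -> None:
    """Run preprocessing, then the graph build, stopping at the first failure."""
    for command in build_commands(year, leg_type):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a step exits with a non-zero status,
        # so the graph build never runs on a failed preprocessing step.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("<year>", "<type>")` with values the scripts accept is then equivalent to running the two subset commands by hand.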
The Streamlit app in the demo folder provides an interactive interface for exploring and visualising the UK legislation graph. Its main entry point is the app.py file in the demo directory. See the README in the demo folder for more details.
This is a prototype and does not guarantee accurate data. The codebase and features are subject to change. Some functionality may be experimental and require further testing and validation.
- Data Coverage: This prototype currently processes UK legislation data from the National Archives, but may not capture all legislative documents or their complete revision history. Some older or specialised documents might be missing or incompletely processed.
- Graph Completeness: The relationships between legislative documents are primarily based on explicit references found in the XML files. Implicit connections, contextual relationships, or references using non-standard formats may be missed.
- Data Accuracy: While we strive for accuracy, the automated parsing and graph construction process may contain errors, particularly when handling:
  - Complex nested legislative structures
  - Unusual formatting or non-standard XML structures
  - Cross-references using ambiguous or incomplete citations
  - Amendments and repeals that are conditionally applied
- Performance Considerations: Processing the complete legislative dataset can be computationally intensive and time-consuming. On a well-powered laptop (e.g., Apple M3 MacBook Pro), we have found it takes about 30 minutes in total: roughly 15 minutes to preprocess the full set of XML files and roughly 15 minutes to build the graph. Users working with the full dataset should ensure adequate system resources are available.
- Visualisation Constraints: The Streamlit visualisation interface may experience performance limitations when displaying very large subgraphs or handling complex queries on the full dataset.
- Legal Disclaimer: This tool is intended for research and analysis purposes only. It should not be relied upon for legal advice or as an authoritative source of legislation. Users should always refer to official sources for current and accurate legislative information.
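To make the Graph Completeness point above concrete, here is a minimal, illustrative sketch of what relying on explicit references means. The XML snippet is a simplified, hypothetical fragment and not the real CLML schema, and `explicit_edges` is not this repo's parser; it only shows that a marked-up citation yields an edge while an unmarked textual reference does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: one marked-up citation and one purely textual reference.
SAMPLE = """
<Section id="section-1">
  <Text>See <Citation URI="http://www.legislation.gov.uk/id/ukpga/2004/34">the Housing Act 2004</Citation>
  and the rules made under the earlier housing legislation.</Text>
</Section>
"""


def explicit_edges(xml_text: str, source_id: str) -> list[tuple[str, str]]:
    """Return (source, target) pairs for every Citation element that carries a URI attribute."""
    root = ET.fromstring(xml_text.strip())
    return [
        (source_id, element.attrib["URI"])
        for element in root.iter("Citation")
        if "URI" in element.attrib
    ]
```

Here `explicit_edges(SAMPLE, "doc-a")` finds only the marked-up citation. The phrase "the earlier housing legislation" carries no markup, so no edge is created for it.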
This project builds upon and was inspired by the work of the Graphie team at King’s Quantitative and Digital Law Lab (QuantLaw), King's College London. Their original project Graphie demonstrated innovative approaches to legal knowledge graph construction and analysis of UK legislation, based on the Housing Act 2004. We encourage those interested in legal knowledge graphs to explore the original Graphie project available at: https://github.com/kclquantlaw/graphie.
All data is sourced from The National Archives legislation website. Crown © and database right material reused under the Open Government Licence v3.0. Material derived from the European Institutions © European Union, 1998-2019, reused under the terms of Commission Decision 2011/833/EU.