Repository for the SemUN project. It is composed of a docker-compose stack, with:
- An API (`un-semun-api`)
- A frontend (`un-semun-front`)
- An NLP pipeline (`un-ml-pipeline`)
- A Neo4j graph database (`neo4j` service in `docker-compose.yml`)
- Scripts to populate the database (`un-semun-misc`)
- A scraper for the United Nations Digital Library
- For more information about the project, please refer to the project proposal
- For more details about the final result, please refer to the paper
You also need to have Docker installed. I'm using OrbStack as a Docker desktop client for macOS, but a regular Docker installation works perfectly fine as well.
When Docker is set up, you just have to run:

```shell
# Start the containers
docker-compose up -d
```

Open the frontend at http://localhost:8080/ if using Docker Desktop, or at http://un-semun-frontend.un-semun.orb.local/ if using OrbStack.
To stop the stack, just run:

```shell
docker-compose down
```

You are all set! 🎉
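If you want to check programmatically that the stack came up, a minimal sketch (standard library only) is to poll the frontend URL; the Docker Desktop URL below is the one from above, and the function name is just illustrative:

```python
from urllib import request
from urllib.error import URLError

# Frontend URL when using Docker Desktop; swap in the OrbStack
# hostname (http://un-semun-frontend.un-semun.orb.local/) if needed.
FRONTEND_URL = "http://localhost:8080/"

def stack_is_up(url=FRONTEND_URL, timeout=5):
    """Return True if the frontend answers with a non-error HTTP response."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except URLError:
        # Connection refused or DNS failure: the stack is not (yet) reachable.
        return False
```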
To ingest documents, you can use the ML pipeline API. You can find more information about it in the `README.md` of the `un-ml-pipeline` folder.
You basically need to send a POST request to the `/run` endpoint at http://un-semun-api.un-semun.orb.local with a JSON body listing the records to ingest:
```json
[
  {"recordId": "<record_id_0>"},
  {"recordId": "<record_id_1>"},
  {"recordId": "<record_id_2>"},
  ...
]
```

You can also send a POST request to the `/run_search` endpoint, at the same URL, with a natural-language query to the UN Digital Library. The API will then scrape the results and ingest them into the database.
```json
{
  "q": "<query>"
}
```

You can also limit the number of results to scrape by adding a field `"n": <value>` to the payload.
For instance:
```json
{
  "q": "Women in peacekeeping",
  "n": 256
}
```