This project requires a Saxon EE license for XML processing. The license file (saxon-license.lic) is not included in this repository for security reasons.
For CI/CD environments, set the SAXON_LICENSE environment variable with the license content. For local development, place your saxon-license.lic file in the lib/ directory.
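If a CI step needs the license as a file rather than as a variable, the two options above can be bridged with a small helper. This is only a sketch: it assumes that SAXON_LICENSE holds the complete content of the license file and that the build looks for lib/saxon-license.lic. The function name `install_saxon_license` is hypothetical and not part of this repository.

```shell
# Sketch (assumptions: SAXON_LICENSE contains the full license text and the
# build reads <dir>/saxon-license.lic; the function name is hypothetical).
install_saxon_license() {
    target_dir="${1:-lib}"
    target="$target_dir/saxon-license.lic"

    # Local development: a manually placed license file is left untouched.
    if [ -f "$target" ]; then
        echo "using existing $target"
        return 0
    fi

    # CI/CD: fail clearly if neither the file nor the variable is available.
    if [ -z "${SAXON_LICENSE:-}" ]; then
        echo "error: $target not found and SAXON_LICENSE is not set" >&2
        return 1
    fi

    mkdir -p "$target_dir" || return 1
    # Write the file with owner-only permissions, since it is a secret.
    ( umask 077; printf '%s\n' "$SAXON_LICENSE" > "$target" ) || return 1
    echo "wrote $target from SAXON_LICENSE"
}
```

Calling `install_saxon_license lib` before the first make target would then cover both the local and the CI case.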
make -j $(nproc) test

make -j $(nproc) test index

INDEX=./target/dnf.index docker compose -p defako --profile=lite -f korap4dnb-compose.yml up -d
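Because the containers are started in detached mode, the server may need a moment before it answers requests. A minimal sketch of a readiness check, assuming curl is installed and that an HTTP success response on port 4001 means the service is usable; the function name `wait_for_url` and its default values are hypothetical:

```shell
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl reports success or the attempts are used up.
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: drop the body
        if curl -f -s -o /dev/null "$url"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "timed out waiting for $url" >&2
    return 1
}
```

For example, `wait_for_url http://localhost:4001/ && xdg-open "http://localhost:4001/?q=Test"` opens the browser only once the server responds.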
xdg-open http://localhost:4001/?q=Test

ssh -L 4001:localhost:4001 korap.dnb.de
xdg-open http://localhost:4001/?q=Test

docker compose -p defako down

The GROBID conversion below is actually the first step, but it usually does not need to be repeated, because the comparatively expensive TEI P5 files in the p5 folder are not deleted by make clean.
docker run --rm --init -v ./grobid.yaml:/opt/grobid/grobid-home/config/grobid.yaml --ulimit core=0 -e JAVA_OPTS=-Xmx400g -p 8070:8070 grobid/grobid:0.8.1

java -jar lib/org.grobid.client-0.5.4-SNAPSHOT.one-jar.jar -n 100 -in /mnt/data/Diss-Sample/PDF -out p5

Configure Apache2 to proxy requests to the local KorAP server:
ProxyPass /defako http://localhost:4001
ProxyPassReverse /defako http://localhost:4001

Kupietz, Marc/Leinen, Peter/Diewald, Nils (2024): Towards a Very Large German Academic Corpus: Step 1: Building and Making Available a Corpus of 10,000 Doctoral Dissertations. Talk given at the Workshop on Comparable and Interoperable Corpora of Academic Texts @CLARIN2024 on 2024-10-18, Barcelona. https://corpora.ids-mannheim.de/slides/2024-10-17-Towards-a-German-Academic-Corpus/#/.