Squirrel is a crawler for the linked web. It provides several tools to search and collect data from the heterogeneous content of the linked web.
Documentation, tutorials and more: https://dice-group.github.io/squirrel.github.io/
- Java 1.8
- Apache Maven 3.6.0
- Docker 19.03.12
or
- ORCA benchmark on the HOBBIT platform
Clone the repository into a directory of your choice:

    git clone https://github.com/dice-group/Squirrel

Enter the Squirrel directory and start the RabbitMQ and MongoDB containers:

    docker-compose up -d mongodb rabbit

Set up your seeds in the file seed/seeds.txt, then start the frontier and one worker instance:

    docker-compose up frontier worker1

https://www.bibsonomy.org/bibtex/29fe2ef0c2e1908276d424c1ca3e06cbf/dice-research
- Go to https://master.project-hobbit.eu/
- Register an account or log in to an existing one
- Go to "Benchmarks"
- Select "ORCA" in the Benchmark list
- Select the system and set all parameters to the values below (they can also be found by following the links in the paper):
| Parameter | Effectiveness | Efficiency |
|---|---|---|
| Average crawl delay | 0 | 0 |
| Average node degree | 20 | 20 |
| Average ratio of disallowed resources | 0 | 0 |
| Average resource degree | 9 | 9 |
| Disallowed resources | 0 | 0 |
| Dump file compression ratio | 0.3 | 0 |
| Node size definition | Static | Static |
| Number of nodes | 100 | 200 |
| RDF dataset size | 1000 | 1000 |
| Seed | 20200318 | 20200318 |
| Use N3 dumps | true | true |
| Use NT dumps | true | true |
| Use RDF/XML dumps | true | true |
| Use TTL dumps | true | true |
| Weight of CKAN node occurrence | 5 | 0 |
| Weight of dereferencing HTTP node occurrence | 21 | 100 |
| Weight of HTTP dump file node occurrence | 40 | 0 |
| Weight of RDFa node occurrence | 4 | 0 |
| Weight of SPARQL node occurrence | 30 | 0 |
- Use "Submit" to queue the experiment
- Watch the received link for the experiment results. You can use the "Experiments → Experiment Status" page to check whether the experiment is still running.
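
Because the two parameter sets are entered by hand, it can help to keep them as data and check them before submitting. The following is a minimal sketch, not part of Squirrel or HOBBIT: the dictionary layout and the helper names `weight_total` and `differing_parameters` are illustrative assumptions, and only the values are taken from the table above. In both listed configurations the node-type weights add up to 100; whether the platform requires this is an assumption, so treat the check as a guard against typing errors only.

```python
# Parameter values transcribed from the table above.
# The dictionary layout and helper names are illustrative, not a HOBBIT or Squirrel API.
EFFECTIVENESS = {
    "Average crawl delay": 0,
    "Average node degree": 20,
    "Average ratio of disallowed resources": 0,
    "Average resource degree": 9,
    "Disallowed resources": 0,
    "Dump file compression ratio": 0.3,
    "Node size definition": "Static",
    "Number of nodes": 100,
    "RDF dataset size": 1000,
    "Seed": 20200318,
    "Use N3 dumps": True,
    "Use NT dumps": True,
    "Use RDF/XML dumps": True,
    "Use TTL dumps": True,
    "Weight of CKAN node occurrence": 5,
    "Weight of dereferencing HTTP node occurrence": 21,
    "Weight of HTTP dump file node occurrence": 40,
    "Weight of RDFa node occurrence": 4,
    "Weight of SPARQL node occurrence": 30,
}

# The efficiency run differs from the effectiveness run in a few entries only.
EFFICIENCY = dict(EFFECTIVENESS)
EFFICIENCY.update({
    "Dump file compression ratio": 0,
    "Number of nodes": 200,
    "Weight of CKAN node occurrence": 0,
    "Weight of dereferencing HTTP node occurrence": 100,
    "Weight of HTTP dump file node occurrence": 0,
    "Weight of RDFa node occurrence": 0,
    "Weight of SPARQL node occurrence": 0,
})


def weight_total(config):
    """Sum all node-type weights ("Weight of ..." rows) of one configuration."""
    return sum(v for k, v in config.items() if k.startswith("Weight of "))


def differing_parameters(a, b):
    """Return {parameter: (value_in_a, value_in_b)} for entries that differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


if __name__ == "__main__":
    for name, config in (("Effectiveness", EFFECTIVENESS), ("Efficiency", EFFICIENCY)):
        print(f"{name}: node-type weights sum to {weight_total(config)}")
    for key, (left, right) in differing_parameters(EFFECTIVENESS, EFFICIENCY).items():
        print(f"{key}: {left} vs {right}")
```

Running the script prints the weight totals and the rows in which the two experiments differ, which makes a quick checklist when filling in the form.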
It is also possible to deploy your own HOBBIT platform. Refer to the HOBBIT platform manual: https://hobbit-project.github.io/. In this case you may need system adapters for ORCA as well: https://github.com/topics/orca-system-adapter.