New version: https://github.com/sightnet
This project will not be maintained.
- Multithreading
- Cache
- Robots.txt
- Proxy
- Queue (BFS)
- Detect Trackers
- HTTP -> HTTPS
- Encryption (RSA)
- API
- Proxy
- Nodes
- Rating
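The "Queue (BFS)" and "Http -> Https" items above can be illustrated with a minimal sketch. This is not the project's own crawler code: `bfs_crawl`, `upgrade_to_https`, and the `get_links` callback are hypothetical names introduced only for illustration, and the sketch assumes links are already extracted as absolute URLs.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def upgrade_to_https(url: str) -> str:
    """Rewrite an http:// URL to https://, leaving other schemes untouched."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def bfs_crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first and return them in visit order.

    get_links is a hypothetical callback: it takes a URL and returns
    the list of links found on that page.
    """
    queue = deque()
    seen = set()
    for url in seeds:
        url = upgrade_to_https(url)
        if url not in seen:
            seen.add(url)
            queue.append(url)

    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()  # FIFO order is what makes this breadth-first
        order.append(url)
        for link in get_links(url):
            link = upgrade_to_https(link)
            if link not in seen:  # never enqueue the same page twice
                seen.add(link)
                queue.append(link)
    return order
```

The `seen` set is filled at enqueue time rather than at visit time, so a page linked from several places enters the queue only once. The scheme rewrite is deliberately naive: it does not handle explicit ports such as `:80`.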
Rebuild every time you change the arguments.
By default the site runs on port 8080 AND through a Tor proxy (!!!). To change this, edit config.json and rebuild the website.
The database API key must be set in two places: in the config and on the command line when the database is started (--api-key).
sudo docker pull typesense/typesense:0.24.0.rcn6
mkdir /tmp/typesense-data
sudo docker run -p 8108:8108 -v /tmp/typesense-data:/data typesense/typesense:0.24.0.rcn6 --data-dir /data --api-key=xyz

sudo docker-compose build crawler --build-arg SITES="$(cat sites.txt)" --build-arg THREADS=1 --build-arg CONFIG="$(cat config.json)"
sudo docker-compose up crawler

sudo docker-compose build website --build-arg CONFIG="$(cat config.json)"
sudo docker-compose up website

cd scripts && sh install_deps.sh

cd scripts && sh build_all.sh

By default the site runs on port 8080 AND through a Tor proxy (!!!). To change this, edit config.json.
The database API key must be set in two places: in the config and on the command line when the database is started (--api-key).
mkdir /tmp/typesense-data &&
./typesense-server --data-dir=/tmp/typesense-data --api-key=xyz --enable-cors &&
sh scripts/init_db.sh

./crawler ../../sites.txt 5 ../../config.json
# [sites_path] [threads_count] [config path]

./website ../../config.json
# [config path]

¯\(ツ)/¯
- Docker
- Encryption (asymmetric)
- Multithreading crawler
- Robots Rules (from headers & html) & crawl-delay
- Responsive web design
- Own FTS (...)
- Images Crawler
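The robots rules and crawl-delay item in the list above can be sketched with Python's standard `urllib.robotparser`. This only covers robots.txt itself, not rules delivered through response headers or HTML meta tags, and it is not the project's implementation. The `examplebot` user agent, the sample rules, and the helper names `load_rules` and `fetch_policy` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, made up for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_policy(parser: RobotFileParser, user_agent: str, url: str):
    """Return (allowed, delay_seconds) for a single URL."""
    allowed = parser.can_fetch(user_agent, url)
    # crawl_delay() returns None when no Crawl-delay applies; treat that as 0.
    delay = parser.crawl_delay(user_agent) or 0
    return allowed, delay


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    for url in ("https://example.com/index.html", "https://example.com/private/x"):
        print(url, fetch_policy(rules, "examplebot", url))
```

A crawler worker would typically skip URLs where `allowed` is false and sleep for `delay` seconds between requests to the same host.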
./config.json
GNU Affero General Public License v3.0