
snuzi/web-crawler


Recursively extract all website inbound links

Setup

composer install

Run

require __DIR__ . '/vendor/autoload.php';

$baseurl = 'https://www.bbc.co.uk/food';

$linkStorage = new DBLinkStorage(__DIR__ . '/resources/database');
$extractor = new LinkExtractor($baseurl, $linkStorage);
$extractor->run();

Access extracted links

Check the rakibtg/SleekDB documentation for how to query the stored links.
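As an illustration (the store name and field name below are assumptions, not confirmed by this README), if DBLinkStorage keeps its documents in a SleekDB store named "links" under the database directory passed to its constructor, a query might look like:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use SleekDB\Store;

// Assumption: DBLinkStorage writes into a SleekDB store named "links"
// inside the database directory given to DBLinkStorage's constructor.
$store = new Store('links', __DIR__ . '/resources/database');

// Fetch every stored link document.
$allLinks = $store->findAll();

// Fetch only documents whose (assumed) "url" field contains "/food".
// See the SleekDB documentation for the full set of query operators.
$foodLinks = $store->findBy(['url', 'LIKE', '%/food%']);
```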

Save links to a different storage

This library ships with a built-in storage, DBLinkStorage, backed by the NoSQL database rakibtg/SleekDB, but you can save links elsewhere by implementing LinkStorageInterface.
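A minimal sketch of an alternative storage, e.g. an in-memory one for unit tests. The real LinkStorageInterface method names are defined by this library; the addLink()/getLinks() methods and the stand-in interface below are illustrative placeholders so the sketch is self-contained:

```php
<?php

// Illustrative stand-in only: the actual LinkStorageInterface is defined
// by this library; addLink() and getLinks() are assumed method names.
interface LinkStorageInterface
{
    public function addLink(string $url): void;
    public function getLinks(): array;
}

// A minimal in-memory storage, useful in tests where no database is wanted.
class InMemoryLinkStorage implements LinkStorageInterface
{
    /** @var array<string, true> keyed by URL to deduplicate */
    private array $links = [];

    public function addLink(string $url): void
    {
        $this->links[$url] = true;
    }

    public function getLinks(): array
    {
        return array_keys($this->links);
    }
}

$storage = new InMemoryLinkStorage();
$storage->addLink('https://www.bbc.co.uk/food');
$storage->addLink('https://www.bbc.co.uk/food'); // duplicate is ignored
echo count($storage->getLinks()); // prints 1
```

Keying the array by URL makes deduplication O(1) per insert, which matters when a crawler revisits the same link many times.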

Run tests

Install local server dependencies

This step needs to be done only once on your local machine:

cd tests/server/
npm install

Run local server

node tests/server/server.js

Run tests

vendor/bin/phpunit
