
mrjocantaro/boot-crawler


Concurrent Web Crawler (TypeScript)

A concurrent web crawler built with Node.js and TypeScript.
It crawls a website starting from a base URL, extracts structured page data, and exports the results to a CSV report.
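The crawl loop described above can be sketched roughly as follows: start from a base URL, stay on the same host, skip URLs already seen, and cap both concurrent requests and total pages. This is not the repository's code. `normalizeURL`, `crawlSite`, `CrawlOptions` and the injected `getLinks` callback are hypothetical names. The normalization rule (host plus path, trailing slash dropped, scheme, query string and fragment ignored) and the same-host restriction are also assumptions. Fetching and HTML parsing are delegated to `getLinks`, and request aborting is not modeled.

```typescript
// Sketch only: names and the normalization rule are assumptions,
// not taken from the boot-crawler repository.

// Map equivalent URLs to one key: host (lowercased by URL) + path without a
// trailing slash. Scheme, query string, and fragment are ignored.
export function normalizeURL(input: string): string {
  const u = new URL(input);
  let path = u.pathname;
  if (path.endsWith("/")) path = path.slice(0, -1);
  return `${u.host}${path}`;
}

export interface CrawlOptions {
  maxConcurrency: number; // at most this many getLinks calls in flight
  maxPages: number; // never schedule more than this many distinct pages
}

// Breadth-first crawl that stays on the base URL's host. `getLinks` is a
// hypothetical callback that loads a page and returns its raw href values.
export function crawlSite(
  baseURL: string,
  getLinks: (url: string) => Promise<string[]>,
  opts: CrawlOptions,
): Promise<Set<string>> {
  const baseHost = new URL(baseURL).host;
  const visited = new Set<string>(); // normalized keys of scheduled pages
  const queue: string[] = [baseURL];
  let inFlight = 0;

  return new Promise<Set<string>>((resolve) => {
    const pump = (): void => {
      // Start new requests while there is spare capacity, queued work,
      // and remaining page budget.
      while (inFlight < opts.maxConcurrency && queue.length > 0 && visited.size < opts.maxPages) {
        const url = queue.shift()!;
        if (new URL(url).host !== baseHost) continue; // off-site link
        const key = normalizeURL(url);
        if (visited.has(key)) continue; // already seen
        visited.add(key);
        inFlight++;
        getLinks(url)
          .then((links) => {
            for (const href of links) {
              try {
                queue.push(new URL(href, url).href); // resolve relative links
              } catch {
                // ignore hrefs that cannot be parsed as URLs
              }
            }
          })
          .catch(() => {
            // a failed page is skipped; the crawl continues
          })
          .then(() => {
            inFlight--;
            pump();
          });
      }
      // No request running and nothing more can start: the crawl is done.
      if (inFlight === 0) resolve(visited);
    };
    pump();
  });
}
```

Injecting `getLinks` keeps the scheduling logic testable without network access. A real implementation would fetch each page (for example with the global `fetch` available in Node.js 18) and extract the `href` values of its anchor tags.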

Features

  • Concurrent crawling with configurable limits
  • URL normalization to avoid duplicate crawling
  • Extracts:
    • Page URL
    • H1 title
    • First paragraph
    • Outgoing links
    • Image URLs
  • CSV report generation
  • Safe crawling with a maximum page limit and request aborting
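The extracted fields and the CSV report listed above could fit together roughly as below. The `ExtractedPageData` interface, the `pagesToCsv` function, the snake_case column names, and joining multi-valued fields with `;` are all assumptions for illustration, not the project's actual report format. The part worth getting right is escaping: a field containing a comma, double quote, or line break is wrapped in quotes, with embedded quotes doubled, as RFC 4180 describes.

```typescript
// Hypothetical record shape mirroring the fields listed under Features;
// the real project's type and column names may differ.
export interface ExtractedPageData {
  url: string;
  h1: string;
  firstParagraph: string;
  outgoingLinks: string[];
  imageUrls: string[];
}

// Quote a field per RFC 4180: wrap it in double quotes and double any
// embedded quotes whenever it contains a comma, quote, or line break.
function csvEscape(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize pages to CSV text. Multi-valued fields are joined with ";"
// (an assumption). Rows are separated by "\n"; RFC 4180 specifies CRLF,
// but most spreadsheet tools also accept LF.
export function pagesToCsv(pages: ExtractedPageData[]): string {
  const header = ["page_url", "h1", "first_paragraph", "outgoing_link_urls", "image_urls"];
  const rows = pages.map((p) =>
    [p.url, p.h1, p.firstParagraph, p.outgoingLinks.join(";"), p.imageUrls.join(";")]
      .map(csvEscape)
      .join(","),
  );
  return [header.join(","), ...rows].join("\n");
}
```

The resulting string could then be written to disk with Node's `fs.writeFileSync(path, csv)`.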

Requirements

  • Node.js 18+
  • npm

Installation

git clone https://github.com/mrjocantaro/boot-crawler.git
cd boot-crawler
npm install
