Concurrent Web Crawler (TypeScript)

A concurrent web crawler built with Node.js and TypeScript.
It crawls a website starting from a base URL, extracts structured page data, and exports the results to a CSV report.

Features

- Concurrent crawling with configurable limits
- URL normalization to avoid duplicate crawling
- Extracts:
  - Page URL
  - H1 title
  - First paragraph
  - Outgoing links
  - Image URLs
- CSV report generation
- Safe crawling with max page limits and request aborting
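
As an illustration of two of the features above, here is a minimal sketch of URL normalization and a concurrency cap. The names `normalizeURL` and `mapWithLimit` are hypothetical and are not taken from this repository's source; the actual implementation in src may differ. The sketch assumes that two URLs count as the same page when their host and path match, ignoring protocol, query string, fragment, and trailing slashes.

```typescript
// Hypothetical helper (not from the repository): reduce a URL to a
// canonical "host/path" key so the same page is not crawled twice.
function normalizeURL(input: string): string {
  const url = new URL(input); // throws on malformed input
  let path = url.pathname;
  // Drop trailing slashes so "/blog/" and "/blog" map to the same key.
  while (path.endsWith("/")) {
    path = path.slice(0, -1);
  }
  // URL already lowercases the hostname; the path stays case-sensitive.
  return `${url.hostname}${path}`;
}

// Hypothetical helper (not from the repository): run `worker` over `items`
// with at most `limit` calls in flight, preserving input order in the result.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner repeatedly claims the next index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

A crawler could keep a `Set<string>` of normalized keys and skip any link whose key is already present, while `mapWithLimit` bounds how many pages are fetched at once. If one worker rejects, `Promise.all` rejects too, so a real crawler would likely catch errors inside the worker.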

Requirements

Node.js 18+
npm

Installation

git clone https://github.com/YOUR-USERNAME/YOUR-REPO.git
cd YOUR-REPO
npm install
