twiny/wbot

WBot - a web crawler

A configurable, thread-safe web crawler that provides a minimal interface for crawling and downloading web pages.

Features:

  • Clean minimal API.
  • Configurable: MaxDepth, MaxBodySize, Rate Limit, Parallelism, User Agent & Proxy rotation.
  • Memory-efficient, thread-safe.
  • Provides built-in interfaces: Fetcher, Store, Queue and a Logger.

TODO

  • Add support for robots.txt.
  • Add test cases.
  • Implement Fetch using Chromedp.
  • Add more examples.
  • Add documentation.

Bugs

Bugs or suggestions? Please visit the issue tracker.
