A simple web crawler for Ruby.
- Runs single-threaded or multi-threaded.
- Pools HTTP connections.
- Restricts links by URL-based patterns.
- Respects robots.txt.
- Stores page contents via adapters.

Requirements:

- ruby 3.0+
- libicu
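URL-based restriction means a candidate link is matched against allow/deny patterns before it is enqueued. The idea can be sketched in plain Ruby as follows; this is a minimal illustration of the technique, not Kudzu's actual filter implementation or API:

```ruby
# Minimal sketch of URL-based link filtering (independent of Kudzu's API).
# A link is followed only if no deny pattern matches and, when allow
# patterns are given, at least one allow pattern matches.
class LinkFilter
  def initialize(allow: [], deny: [])
    @allow = allow
    @deny = deny
  end

  def allowed?(url)
    return false if @deny.any? { |pattern| pattern.match?(url) }
    @allow.empty? || @allow.any? { |pattern| pattern.match?(url) }
  end
end

filter = LinkFilter.new(allow: [%r{\Ahttps?://example\.com/}],
                        deny:  [/\.(jpg|png|gif)\z/])
filter.allowed?('http://example.com/index.html')  # => true
filter.allowed?('http://example.com/logo.png')    # => false
```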
Add this line to your application's Gemfile:

```ruby
gem 'kudzu'
```

Then run:

```shell
$ bundle install
```
Crawl HTML files in example.com:

```ruby
crawler = Kudzu::Crawler.new do
  user_agent 'YOUR_AWESOME_APP'
  add_filter do
    focus_host true
    allow_mime_type %w(text/html)
  end
end

crawler.run('http://example.com/') do
  on_success do |page, link|
    puts page.url
  end
end
```

This gem supports only in-memory crawling by default. Use one of the following adapters to save page contents persistently:
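An adapter's job is to receive crawled pages and store them somewhere. As a rough sketch of that shape, assuming an adapter is simply an object that persists a page body keyed by its URL (the method names `save`/`find` here are assumptions for illustration, not Kudzu's documented adapter API):

```ruby
# Hypothetical adapter sketch -- stores pages in a Hash. A persistent
# adapter would write to a database instead of an in-memory Hash.
class MemoryAdapter
  def initialize
    @pages = {}
  end

  # Persist a page's body under its URL.
  def save(url, body)
    @pages[url] = body
  end

  # Look up a previously stored page body, or nil if absent.
  def find(url)
    @pages[url]
  end
end

adapter = MemoryAdapter.new
adapter.save('http://example.com/', '<html>...</html>')
adapter.find('http://example.com/')  # => "<html>...</html>"
```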
Bug reports and pull requests are welcome on GitHub at https://github.com/kanety/kudzu. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
The gem is available as open source under the terms of the MIT License.