

slim-shah/Sitemap-Generator


Sitemap-Generator

Sitemap Generator is an extended version of the web crawler made by Bucky Roberts
(GitHub link: https://github.com/buckyroberts/Spider).
This bot extracts keywords from a website and stores them in a separate dictionary file, dict.txt, so the site can be indexed later. It also generates sitemap.html, which lists all collected links.
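The two outputs described above (keywords saved to dict.txt, links listed in sitemap.html) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names `extract_keywords`, `write_dict_file` and `build_sitemap_html` are hypothetical, the `url: keyword, keyword` layout of dict.txt is an assumption, and a tiny hard-coded stopword set stands in for the NLTK stopword list the project relies on.

```python
import re
from collections import Counter
from html import escape

# Hypothetical stand-in for NLTK's English stopword list (nltk.corpus.stopwords),
# kept tiny so this sketch runs without NLTK installed.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def extract_keywords(text, top_n=10):
    """Return the top_n most frequent non-stopword words longer than 2 letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _count in counts.most_common(top_n)]


def write_dict_file(keywords_by_url, path="dict.txt"):
    """Write one 'url: kw1, kw2' line per page (assumed format, not the project's)."""
    with open(path, "w", encoding="utf-8") as f:
        for url in sorted(keywords_by_url):
            f.write(f"{url}: {', '.join(keywords_by_url[url])}\n")


def build_sitemap_html(links):
    """Return a minimal HTML page that lists every collected link, sorted."""
    items = "\n".join(
        f'    <li><a href="{escape(url)}">{escape(url)}</a></li>' for url in sorted(links)
    )
    return f"<html>\n  <body>\n  <ul>\n{items}\n  </ul>\n  </body>\n</html>\n"
```

For example, `extract_keywords("The spider crawls the web and the spider indexes pages", top_n=1)` returns `["spider"]`, and the string from `build_sitemap_html(links)` can be saved as sitemap.html with an ordinary `open(..., "w")` call.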

Footnotes:

  • This is a multi-threaded program, so tweak it carefully!
  • SocialHunt.xml is not output of this project; it is a reference example of a sitemap, generated by the Google sitemap generator.
  • Currently this bot crawls only 30 links per website, but you can change this by editing line 108 of spider.py.
  • You may need to install the following libraries:
    • Beautiful Soup 4
    • urllib.request (part of the Python 3 standard library, so no separate install is needed)
    • nltk.corpus (from NLTK; used for keyword processing)
  • Enjoy coding!
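The footnotes above mention threading, a 30-link cap, and the libraries used for fetching and parsing pages. The sketch below shows one way those pieces can fit together, under stated assumptions: it uses `concurrent.futures.ThreadPoolExecutor` for concurrency and the standard-library `html.parser` in place of Beautiful Soup 4 so it runs without extra installs, and `MAX_LINKS`, `LinkParser`, `fetch_html` and `crawl` are hypothetical names, not what spider.py actually contains.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# Hypothetical constant mirroring the README's 30-link limit (spider.py line 108).
MAX_LINKS = 30


class LinkParser(HTMLParser):
    """Collect absolute, fragment-free href targets from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def fetch_html(url):
    """Download a page with urllib.request (needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def _safe_fetch(fetch, url):
    # Treat network errors (URLError is a subclass of OSError) as empty pages.
    try:
        return fetch(url)
    except OSError:
        return ""


def crawl(start_url, fetch=fetch_html, max_links=MAX_LINKS, workers=8):
    """Breadth-first, same-domain crawl that stops after max_links pages."""
    domain = urlparse(start_url).netloc
    crawled = []
    seen = {start_url}
    frontier = [start_url]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(crawled) < max_links:
            # Take at most as many URLs as the remaining budget allows.
            batch = frontier[: max_links - len(crawled)]
            frontier = frontier[len(batch):]
            # Pages are fetched concurrently by worker threads...
            pages = pool.map(lambda u: _safe_fetch(fetch, u), batch)
            # ...but parsed here on the main thread, so seen/frontier need no lock.
            for url, html in zip(batch, pages):
                crawled.append(url)
                parser = LinkParser(url)
                parser.feed(html)
                for link in sorted(parser.links):
                    if urlparse(link).netloc == domain and link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return crawled
```

Keeping all parsing on the main thread is a deliberate simplification: only the slow network calls run in parallel, so the shared bookkeeping stays race-free. Passing a fake `fetch` function (for example, a dictionary lookup of canned HTML) lets you try the crawl and the `max_links` cap offline.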

About

For a tutorial video, go to this website.
