Sitemap Generator is an extended version of the web crawler made by Bucky Roberts
(Github link: https://github.com/buckyroberts/Spider)
This bot extracts keywords from a website and stores them in a separate dictionary file, dict.txt, so they can be indexed later. In addition,
it generates sitemap.html, which lists all collected links.
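The keyword and sitemap steps could look roughly like the sketch below. It is not the project's code: the project uses Beautiful Soup and nltk.corpus, while this sketch sticks to the standard library (html.parser, collections.Counter) and uses a small hard-coded STOPWORDS set as a stand-in for NLTK's stopword list. The names extract_keywords, write_dict, write_sitemap and the "url: keyword, keyword" line format of dict.txt are hypothetical.

```python
import re
from collections import Counter
from html import escape
from html.parser import HTMLParser

# Hypothetical stand-in for nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for",
             "on", "it", "this", "that", "with"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_keywords(html, top_n=10):
    """Return the top_n most frequent non-stopword words in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def write_dict(keywords_by_url, path="dict.txt"):
    """Write one line per URL: the URL followed by its keywords."""
    with open(path, "w", encoding="utf-8") as f:
        for url, keywords in sorted(keywords_by_url.items()):
            f.write(url + ": " + ", ".join(keywords) + "\n")


def write_sitemap(urls, path="sitemap.html"):
    """Write a bare HTML page with one link per crawled URL, sorted."""
    items = "\n".join(
        f'    <li><a href="{escape(u)}">{escape(u)}</a></li>' for u in sorted(urls)
    )
    page = ("<!DOCTYPE html>\n<html>\n<body>\n  <ul>\n"
            f"{items}\n  </ul>\n</body>\n</html>\n")
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
```

Escaping each URL with html.escape keeps characters such as `&` in query strings from producing invalid HTML in the sitemap.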
- This is a multi-threaded program, so tweak it carefully!
- SocialHunt.xml is not an output of this project; it is an example of an ideal sitemap and was generated by the Google sitemap generator.
- Currently, the bot crawls only 30 links per website; you can change this limit on line 108 of spider.py.
- You may need to install the following libraries:
  - Beautiful Soup 4
  - urllib.request (part of the Python 3 standard library, so no separate install is needed)
  - nltk.corpus (from NLTK), used for keyword processing
- Enjoy coding!
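To illustrate why the multi-threading needs care and how a link limit like the default of 30 can be enforced, here is a minimal sketch of a threaded, same-domain crawler. It is not the code in spider.py: the names MAX_LINKS, NUM_THREADS, LinkFinder, fetch_url and crawl are hypothetical, and the real limit lives where the note above says. Worker threads share one queue.Queue, and a threading.Lock guards the shared "crawled" and "seen" sets so two threads cannot claim the same URL or overshoot the limit.

```python
import queue
import threading
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

MAX_LINKS = 30   # hypothetical name; mirrors the README's default of 30 links
NUM_THREADS = 8  # hypothetical default; more threads means more load on the site


class LinkFinder(HTMLParser):
    """Collects absolute, fragment-free URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    url, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.add(url)


def fetch_url(url):
    """Download a page with urllib.request and decode it as UTF-8."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_url, max_links=MAX_LINKS, num_threads=NUM_THREADS):
    """Crawl start_url's domain with worker threads; visit at most max_links pages."""
    domain = urlparse(start_url).netloc
    pending = queue.Queue()
    crawled = set()        # pages a worker has claimed (fetched or attempted)
    seen = {start_url}     # every URL ever queued, to avoid duplicates
    lock = threading.Lock()
    pending.put(start_url)

    def worker():
        while True:
            url = pending.get()
            try:
                with lock:
                    if len(crawled) >= max_links:
                        continue  # limit reached: drain the queue without fetching
                    crawled.add(url)
                try:
                    html = fetch(url)
                except Exception:
                    continue  # skip pages that fail to download
                finder = LinkFinder(url)
                finder.feed(html)
                with lock:
                    for link in finder.links:
                        if urlparse(link).netloc == domain and link not in seen:
                            seen.add(link)
                            pending.put(link)
            finally:
                pending.task_done()  # runs even after "continue"

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()
    pending.join()  # returns once every queued URL has been processed
    return crawled
```

Passing fetch as a parameter lets the crawler run against an in-memory dictionary of pages instead of the network, for example `crawl("https://example.com/", fetch=pages.__getitem__, max_links=2)`.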