bbscraper

A simple phpBB forum thread web scraper written in Python. Designed for command-line usage. Outputs data in CSV format to stdout.

This is an experiment-driven project. The code tends to follow PEP 8, but it is not fully idiomatic. The current implementation is ad hoc and targets one specific scenario; however, extending it to cover additional behavior and features should be straightforward.

The scraped data fields per thread post are (in order): Post ID, Post name, Date of the post, and Post body.
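The field order above can be illustrated with a minimal sketch that uses only the standard-library `csv` module. The column labels (`post_id`, `post_name`, `date`, `body`), the `write_posts` helper, and the sample rows are assumptions made for illustration; they are not taken from the bbscraper source.

```python
import csv
import io
import sys

# Column order follows the field list above; the labels themselves are hypothetical.
FIELDS = ["post_id", "post_name", "date", "body"]


def write_posts(posts, out=sys.stdout):
    """Write an iterable of post dicts to `out` as CSV, one row per post."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for post in posts:
        writer.writerow(post)


# Hypothetical sample data, not scraped from a real forum.
sample = [
    {"post_id": "101", "post_name": "alice", "date": "2015-01-02",
     "body": "Hello, world"},
    {"post_id": "102", "post_name": "bob", "date": "2015-01-03",
     "body": "Line one\nLine two"},
]

# Write into an in-memory buffer here; pass sys.stdout to mimic the CLI.
buffer = io.StringIO()
write_posts(sample, buffer)
csv_text = buffer.getvalue()
```

Post bodies that contain commas or line breaks are quoted by the `csv` writer, so a CSV-aware reader can recover each row intact.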

Uses urllib3 for HTTP networking and BeautifulSoup for HTML parsing.

This package is not available via pip. You must download or clone this repository in order to use it.

Requirements

Python 3+ (developed using [email protected])
pip (optional)

Installation

Clone this repository:

git clone https://github.com/h2non/bbscraper.git && cd bbscraper

Install dependencies via pip:

sudo pip install -r requirements.txt

Or alternatively using setup.py:

python setup.py install

Command-line interface

usage: __main__.py [-h] -u URL [-f FORMAT] [-l LIMIT]

Scrape all thread posts of a phpBB based forum

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     Full URL to forum thread
  -f FORMAT, --format FORMAT
                        Output format (default to CSV)

Report any issues to https://github.com/h2non/bbscraper/issues
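An interface like the one in the help text above could be declared with the standard-library `argparse` module. This is a sketch under assumptions, not the project's actual `__main__.py`: the `build_parser` name is hypothetical, and because the usage line shows `-l LIMIT` without describing it, the `--limit` long form, its integer type, and its help text are guesses.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="bbscraper",
        description="Scrape all thread posts of a phpBB based forum",
    )
    # -u is shown without brackets in the usage line, so it is treated as required.
    parser.add_argument("-u", "--url", required=True,
                        help="Full URL to forum thread")
    parser.add_argument("-f", "--format", default="csv",
                        help="Output format (default to CSV)")
    # Assumption: -l is listed in the usage line only; the long name, type,
    # and meaning (maximum number of posts) are guesses.
    parser.add_argument("-l", "--limit", type=int, default=None,
                        help="Maximum number of posts to scrape (assumed)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(
    ["-u", "http://example.com/viewtopic.php?t=1", "-l", "5"]
)
```

Passing an explicit list to `parse_args` keeps the example independent of the real command line; a real entry point would call `parse_args()` with no arguments.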

Scrape a forum thread and save the data to forum.csv (the URL is quoted so that the shell does not interpret the `?`):

python bbscraper -u "http://www.oldclassiccar.co.uk/forum/phpbb/phpBB2/viewtopic.php?t=12591" > forum.csv

Development

Run tests:

make test

License

MIT - Tomas Aparicio
