gscrape

gscrape is a utility for scraping data from HTML structures and saving it to an output file. gscrape runs concurrently with multiple workers; the number of workers running at the same time is set with a command-line flag. The data to scrape from the HTML is specified in the code of the org.go file. The OrgHtmlJson object implements the Scrape function of the Scraper interface and customizes both the HTML data processing and the output data format.
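The worker model described above can be sketched roughly as follows. This is an illustration only, not the code from pool.go or org.go: the Scrape signature, the titleScraper type and the RunPool function are all hypothetical, and the pages are plain strings so the example runs without network access.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Scraper is a hypothetical shape for the interface named in this README;
// the real signature in gscrape may differ.
type Scraper interface {
	Scrape(html string) (string, error)
}

// titleScraper is an illustrative implementation that extracts the <title> text.
type titleScraper struct{}

func (titleScraper) Scrape(html string) (string, error) {
	start := strings.Index(html, "<title>")
	if start < 0 {
		return "", fmt.Errorf("no <title> element")
	}
	start += len("<title>")
	end := strings.Index(html[start:], "</title>")
	if end < 0 {
		return "", fmt.Errorf("unterminated <title> element")
	}
	return strings.TrimSpace(html[start : start+end]), nil
}

// RunPool scrapes every page with at most `workers` goroutines and
// returns the results in input order.
func RunPool(pages []string, workers int, s Scraper) []string {
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker reads the channel
	}
	results := make([]string, len(pages))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out, err := s.Scrape(pages[i])
				if err != nil {
					out = "error: " + err.Error()
				}
				results[i] = out // each index is written by exactly one worker
			}
		}()
	}
	for i := range pages {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	pages := []string{"<html><title> A </title></html>", "<p>none</p>"}
	for _, r := range RunPool(pages, 3, titleScraper{}) {
		fmt.Println(r)
	}
}
```

Swapping in a different Scraper implementation changes what is extracted and how it is formatted without touching the pool, which is the role the README assigns to OrgHtmlJson.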

Usage:

gscrape <-h> <-t NNN> <-w NNN> <-o output_file> -i input_file <url>

Flags:

-h -help: Show help. (Default: false)
-t: Timeout in seconds when waiting for a response from a web site. (Default: 5)
-v -verbouse: Write the full log to StdOut. (Default: false; the log is written to the file gscrape.log in the local directory)
-w: Number of workers running at the same time. (Default: 5)
-o: File for the result output. If the flag is absent, the output goes to StdOut.
-i: Input file with the web sources (URLs) to scrape. If the flag is absent, the input is taken from the last argument.
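As a rough illustration of how the flags above map onto Go's standard flag package, here is a minimal sketch. It is not the code from main.go: the Config struct, its field names and the ParseArgs function are hypothetical, and only the flag names and defaults are taken from the list above.

```go
package main

import (
	"flag"
	"fmt"
)

// Config holds the options listed above. The struct and its field names
// are hypothetical and chosen for this example only.
type Config struct {
	Timeout int    // -t, seconds
	Workers int    // -w
	Verbose bool   // -v / -verbouse
	Output  string // -o, empty means StdOut
	Input   string // -i, input file
	URL     string // last argument, used when -i is absent
}

// ParseArgs parses the arguments that follow the program name.
func ParseArgs(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("gscrape", flag.ContinueOnError)
	fs.IntVar(&c.Timeout, "t", 5, "timeout in seconds when waiting for a response")
	fs.IntVar(&c.Workers, "w", 5, "number of workers running at the same time")
	fs.BoolVar(&c.Verbose, "v", false, "write the full log to StdOut")
	fs.BoolVar(&c.Verbose, "verbouse", false, "same as -v")
	fs.StringVar(&c.Output, "o", "", "file for the result output")
	fs.StringVar(&c.Input, "i", "", "input file with URLs")
	// -h and -help are not defined here: the flag package reports them
	// by returning flag.ErrHelp after printing the usage text.
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.Input == "" {
		if fs.NArg() == 0 {
			return c, fmt.Errorf("no input: pass -i input_file or a URL as the last argument")
		}
		c.URL = fs.Arg(fs.NArg() - 1)
	}
	return c, nil
}

func main() {
	cfg, err := ParseArgs([]string{"-w", "10", "-o", "out.json", "http://www.site.com/path"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Note that the flag package stops parsing at the first non-flag argument, so in this sketch the URL has to come after all flags, which matches the usage line.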

The list of URLs to process is defined in the input file or on the command line (one URL). The parameters in a URL can include masks of the following types:
[nnn:nnn] - a range of numbers that excludes the last number (Go slice style)
[word1;word2;word3] - an enumeration of strings

For example, the mask

html://www.site.com/path?chapter=[one;two]&page=[1:3]

will be transformed into the URLs:

html://www.site.com/path?chapter=one&page=1
html://www.site.com/path?chapter=one&page=2
html://www.site.com/path?chapter=two&page=1
html://www.site.com/path?chapter=two&page=2
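The expansion shown above can be sketched in Go as follows. This is a minimal illustration, not the implementation used by gscrape: the names maskRe, maskValues and ExpandMasks are hypothetical, and the sketch assumes that a mask containing ":" is always a numeric range and that any bracketed text in a URL is a mask.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// maskRe matches one bracketed mask such as "[1:3]" or "[one;two]".
var maskRe = regexp.MustCompile(`\[([^\[\]]*)\]`)

// maskValues returns the values that a single mask body stands for.
func maskValues(body string) ([]string, error) {
	if lo, hi, ok := strings.Cut(body, ":"); ok {
		from, err1 := strconv.Atoi(strings.TrimSpace(lo))
		to, err2 := strconv.Atoi(strings.TrimSpace(hi))
		if err1 != nil || err2 != nil {
			return nil, fmt.Errorf("bad range mask %q", body)
		}
		var vals []string
		for n := from; n < to; n++ { // the upper bound is excluded
			vals = append(vals, strconv.Itoa(n))
		}
		return vals, nil
	}
	return strings.Split(body, ";"), nil
}

// ExpandMasks expands every mask in pattern. The leftmost mask varies
// slowest, which reproduces the ordering of the example above.
func ExpandMasks(pattern string) ([]string, error) {
	loc := maskRe.FindStringSubmatchIndex(pattern)
	if loc == nil {
		return []string{pattern}, nil // no masks left
	}
	vals, err := maskValues(pattern[loc[2]:loc[3]])
	if err != nil {
		return nil, err
	}
	var out []string
	for _, v := range vals {
		// Substitute one value, then expand the remaining masks recursively.
		rest, err := ExpandMasks(pattern[:loc[0]] + v + pattern[loc[1]:])
		if err != nil {
			return nil, err
		}
		out = append(out, rest...)
	}
	return out, nil
}

func main() {
	urls, err := ExpandMasks("html://www.site.com/path?chapter=[one;two]&page=[1:3]")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

In this sketch an empty range such as [3:3] yields no URLs at all, mirroring an empty Go slice expression.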

A line in the input file can be commented out with "//".
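A minimal sketch of reading such an input file is shown below. It rests on assumptions that this README does not state: a comment is taken to be a line that starts with "//" (so the "//" inside a URL scheme is left alone), and blank lines are skipped. The ParseInput name is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ParseInput returns the URL lines found in text, dropping blank lines
// and lines that start with "//" (assumed comment syntax).
func ParseInput(text string) []string {
	var lines []string
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "//") {
			continue
		}
		lines = append(lines, line)
	}
	return lines
}

func main() {
	input := "// first site\nhttp://www.site.com/path?page=[1:3]\n\n//http://www.old.com/\nhttp://www.other.com/\n"
	for _, l := range ParseInput(input) {
		fmt.Println(l)
	}
}
```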
