012e/scrape-thpt

Scrape THPTQG score

Scrape scores from Vietnam's national high school exam (THPTQG) from any source of your choice.

Installation

As a library

Simply run:

go get github.com/012e/scrape-thpt

As a binary

go install github.com/012e/scrape-thpt@latest

Usage

As a binary

By default, the binary scrapes scores from Báo An Giang and saves them to the students table. Basic usage:

$ scrape-thpt -help
  -con int
        Number of concurrent connections.
        Tweaks this number to scrape faster. (default 3)
  -end int
        End index, default value is start index
  -start int
        Start index
  -try int
        Total tries until give up scraping a candidate number (default 3)
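
The help text above maps naturally onto Go's standard flag package. The following is a minimal sketch of how flags like these could be declared and how "default value is start index" might be resolved; it is not taken from the repository, and the options struct and parseOptions function are hypothetical names used only for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the four flags in the help text.
// The struct and its field names are hypothetical.
type options struct {
	Start, End, Con, Try int
}

// parseOptions parses args (without the program name) into options.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("scrape-thpt", flag.ContinueOnError)
	var o options
	fs.IntVar(&o.Start, "start", 0, "Start index")
	fs.IntVar(&o.End, "end", 0, "End index, default value is start index")
	fs.IntVar(&o.Con, "con", 3, "Number of concurrent connections")
	fs.IntVar(&o.Try, "try", 3, "Total tries until give up scraping a candidate number")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	// Assumption: when -end is omitted, only the start index is scraped.
	// Visit walks only the flags that were explicitly set on the command line.
	endSet := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "end" {
			endSet = true
		}
	})
	if !endSet {
		o.End = o.Start
	}
	if o.End < o.Start {
		return options{}, fmt.Errorf("end index %d is before start index %d", o.End, o.Start)
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("scraping %d..%d with %d connections, %d tries each\n", o.Start, o.End, o.Con, o.Try)
}
```

With the real binary, the corresponding invocation would combine the documented flags, for example scrape-thpt -start 51000001 -end 51000010 -con 5.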

As a library

ScrapeSource interface

All scraping is built around the ScrapeSource interface:

type ScrapeSource interface {
    // GetRequest returns a request to the intended scrape source
    GetRequest(sbd int) (*http.Request, error)

    // ParseResponse parses the response returned by the scrape source
    ParseResponse(resp *http.Response) (*models.Student, error)
}

For examples, check out scrapesources, which already implements scrape sources for baoangiang.com.vn, vietnamnet.vn, and angiang.edu.vn.
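
To make the two methods concrete, here is a self-contained sketch of a custom source. It does not import the library: Student and ScrapeSource are local stand-ins with the same method shapes, the Student fields are assumed, and jsonSource, fetch, and newFakeServer are hypothetical helpers. The "site" is a local httptest server that returns JSON, and the eight-digit zero-padded candidate number in the URL is also an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Student is a stand-in for models.Student; its fields are assumed.
type Student struct {
	ID   int     `json:"id"`
	Math float64 `json:"math"`
}

// ScrapeSource is a local copy of the interface shape, using the stand-in Student.
type ScrapeSource interface {
	GetRequest(sbd int) (*http.Request, error)
	ParseResponse(resp *http.Response) (*Student, error)
}

// jsonSource is a hypothetical source that serves scores as JSON.
type jsonSource struct {
	BaseURL string
}

// GetRequest builds the lookup request for one candidate number.
func (s jsonSource) GetRequest(sbd int) (*http.Request, error) {
	url := fmt.Sprintf("%s/score?sbd=%08d", s.BaseURL, sbd)
	return http.NewRequest(http.MethodGet, url, nil)
}

// ParseResponse decodes the JSON body into a Student.
func (s jsonSource) ParseResponse(resp *http.Response) (*Student, error) {
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var st Student
	if err := json.NewDecoder(resp.Body).Decode(&st); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return &st, nil
}

// fetch runs one request/parse round trip through any ScrapeSource.
// In this sketch the caller, not the source, closes the response body.
func fetch(src ScrapeSource, sbd int) (*Student, error) {
	req, err := src.GetRequest(sbd)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return src.ParseResponse(resp)
}

// newFakeServer returns a local server standing in for a real score site.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("sbd") != "51000001" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `{"id":51000001,"math":8.5}`)
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()
	st, err := fetch(jsonSource{BaseURL: srv.URL}, 51000001)
	fmt.Println(st, err)
}
```

Splitting request construction from response parsing keeps each source free of networking concerns, so whatever drives the sources can own concurrency and retries.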

Scraping

  1. To begin, create a new Scraper:
db := createGormDB()
scraper := scraper.NewScraper(scraper.Config{
    ConcurrentConnection: 3,               // three goroutines for scraping
    StartIndex:           51000001,        // scrape candidate numbers in this range
    EndIndex:             51000010,
    Retries:              3,               // retry 3 times before giving up
    DB:                   db,              // any gorm instance
    Source:               baoag.Scraper{}, // anything that implements the `ScrapeSource` interface
})

Currently, GORM is the only supported ORM.

  2. Start scraping and handle errors (a slice of structs, each containing an error and its candidate number):
scraper.Run()
errs := scraper.GetErrors()
for _, e := range errs {
	// Printf is required here: Println does not interpret format verbs
	fmt.Printf("failed %d: %v\n", e.ID, e.Err)
	// handle error
}
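
The configuration and error list above suggest a bounded worker pool with per-candidate retries. The sketch below shows one common way to structure that pattern in Go. It is not the library's implementation: ScrapeError, runPool, and newFlakyFetch are hypothetical names, and the fetch function is a fake that simulates failures instead of doing network I/O.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// ScrapeError pairs a candidate number with the last error seen for it.
// The type name is hypothetical; the fields mirror the ID and Err fields used above.
type ScrapeError struct {
	ID  int
	Err error
}

// runPool calls fetch for every id in [start, end] using `workers` goroutines,
// tries each id at most `tries` times, and returns the ids that never succeeded.
func runPool(start, end, workers, tries int, fetch func(id int) error) []ScrapeError {
	if workers < 1 {
		workers = 1
	}
	if tries < 1 {
		tries = 1
	}
	ids := make(chan int)
	var (
		mu     sync.Mutex
		failed []ScrapeError
		wg     sync.WaitGroup
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range ids {
				var err error
				for attempt := 0; attempt < tries; attempt++ {
					if err = fetch(id); err == nil {
						break
					}
				}
				if err != nil {
					mu.Lock()
					failed = append(failed, ScrapeError{ID: id, Err: err})
					mu.Unlock()
				}
			}
		}()
	}
	for id := start; id <= end; id++ {
		ids <- id
	}
	close(ids)
	wg.Wait()
	// Sort so the result does not depend on goroutine scheduling.
	sort.Slice(failed, func(i, j int) bool { return failed[i].ID < failed[j].ID })
	return failed
}

// newFlakyFetch returns a fake fetch: id 51000004 always fails, and every
// other id fails on its first call and succeeds afterwards.
func newFlakyFetch() func(id int) error {
	var mu sync.Mutex
	calls := map[int]int{}
	return func(id int) error {
		mu.Lock()
		calls[id]++
		n := calls[id]
		mu.Unlock()
		if id == 51000004 {
			return errors.New("not found")
		}
		if n == 1 {
			return errors.New("temporary failure")
		}
		return nil
	}
}

func main() {
	for _, e := range runPool(51000001, 51000005, 3, 3, newFlakyFetch()) {
		fmt.Printf("failed %d: %v\n", e.ID, e.Err)
	}
}
```

Collecting failures in a slice rather than aborting on the first error lets a long run finish, after which the failed candidate numbers can be retried or logged separately.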

License

MIT license.
