Added context.Context Support #255
See master branch commits for reason
See master branch for reason
Dropped support for Go 1.8
Added Go 1.11 support
Implementing my comment from PR gocolly#242
Took the best components from @vintik100's implementation and my own and merged them into a single implementation.
Used best practices for context.Context args and added documentation to new exported components
Updated the `antchfx/{htmlquery,xmlquery}` deps to take advantage of the new `FindEachWithBreak` API in both. This allows breaking out of the `Find` loop if the context is cancelled.
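To illustrate the last point, here is a minimal sketch of breaking out of an iteration once a context is cancelled. It does not use the real `htmlquery`/`xmlquery` packages: `findEachWithBreak`, `visitUntilCancelled`, and `countWithCancelAfter` are hypothetical stand-ins that only assume a callback returning `false` stops the loop.

```go
package main

import (
	"context"
	"fmt"
)

// findEachWithBreak is a hypothetical stand-in for a query helper that
// stops iterating as soon as the callback returns false.
func findEachWithBreak(items []string, cb func(i int, item string) bool) {
	for i, item := range items {
		if !cb(i, item) {
			return
		}
	}
}

// visitUntilCancelled calls visit for each item until ctx is cancelled
// and returns how many items were visited.
func visitUntilCancelled(ctx context.Context, items []string, visit func(string)) int {
	visited := 0
	findEachWithBreak(items, func(_ int, item string) bool {
		if ctx.Err() != nil {
			return false // context is done: break out of the loop
		}
		visit(item)
		visited++
		return true
	})
	return visited
}

// countWithCancelAfter is a demo helper: it cancels the context once n items
// have been visited and reports how many items were visited in total.
func countWithCancelAfter(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	seen := 0
	return visitUntilCancelled(ctx, []string{"a", "b", "c", "d"}, func(string) {
		seen++
		if seen == n {
			cancel()
		}
	})
}

func main() {
	fmt.Println("visited before cancellation:", countWithCancelAfter(2))
}
```

The check happens between items, so a callback that is already running is not interrupted; it only prevents further callbacks from starting.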
Here is a smaller example that changes nothing in colly:

```go
import (
	"context"
	"net/http"

	"github.com/gocolly/colly"
)

// contextTransport wraps an http.Transport and attaches a context.Context
// to every request so that requests can be cancelled.
type contextTransport struct {
	ctx   context.Context
	trans *http.Transport
}

func (t *contextTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	req = req.WithContext(t.ctx)
	return t.trans.RoundTrip(req)
}

// collectorWithContext takes ctx first, following the usual Go convention.
func collectorWithContext(ctx context.Context, c *colly.Collector) {
	// Stop requests in the `OnRequest` callback,
	// before they are sent to the HTTP client.
	c.OnRequest(func(req *colly.Request) {
		select {
		case <-ctx.Done():
			req.Abort()
		default:
		}
	})
	// Use a custom Transport to cancel requests already pending in the HTTP client,
	// which had no chance to be stopped in the OnRequest callback.
	trans := &contextTransport{
		ctx:   ctx,
		trans: &http.Transport{},
	}
	c.WithTransport(trans)
}
```

This will work when cancel the
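As a rough way to check the transport-wrapping idea without colly or a network, the sketch below swaps the real `http.Transport` for a fake base round tripper that simply honours the request context. The names `ctxTransport`, `roundTripFunc`, `fakeBase`, `roundTripErr`, and the `example.invalid` URL are assumptions made for this illustration only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// roundTripFunc adapts a plain function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(req *http.Request) (*http.Response, error) { return f(req) }

// ctxTransport attaches a fixed context to every request before delegating.
type ctxTransport struct {
	ctx  context.Context
	base http.RoundTripper
}

func (t *ctxTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return t.base.RoundTrip(req.WithContext(t.ctx))
}

// fakeBase stands in for a real transport: it fails if the request context
// is already done and otherwise returns an empty 200 response.
func fakeBase() http.RoundTripper {
	return roundTripFunc(func(req *http.Request) (*http.Response, error) {
		if err := req.Context().Err(); err != nil {
			return nil, err
		}
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
	})
}

// roundTripErr sends one request through the wrapper and returns any error.
func roundTripErr(ctx context.Context) error {
	req, err := http.NewRequest(http.MethodGet, "http://example.invalid/", nil)
	if err != nil {
		return err
	}
	resp, err := (&ctxTransport{ctx: ctx, base: fakeBase()}).RoundTrip(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// cancelledIsRejected reports whether a cancelled context stops the request.
func cancelledIsRejected() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return errors.Is(roundTripErr(ctx), context.Canceled)
}

// liveIsAccepted reports whether a request with a live context goes through.
func liveIsAccepted() bool {
	return roundTripErr(context.Background()) == nil
}

func main() {
	fmt.Println("live context accepted:", liveIsAccepted())
	fmt.Println("cancelled context rejected:", cancelledIsRejected())
}
```

One consequence of this design is that `req.WithContext` replaces whatever context the request already carried, so every request sent through the wrapper shares the same context.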
Hey! Any reason why this became stuck? Is it because this PR changes lots of function signatures to have context argument, which is a pretty radical change? If so, what about making it much less drastic by only adding [...]? And since the common way to use colly is to have

```go
func crawl(ctx context.Context, url string) {
	c := colly.NewCollector(colly.WithContext(ctx))
	c.OnResponse(func(res *colly.Response) {
		storeIntoSomeDatabase(ctx, res)
	})
	// skip
}
```
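For readers unfamiliar with the option style used in `colly.NewCollector(colly.WithContext(ctx))` above, a minimal sketch of how such an option could be wired follows. Everything here is hypothetical: `Collector`, `CollectorOption`, `WithContext`, and `Visit` are stripped-down stand-ins written for this illustration, not colly's actual types.

```go
package main

import (
	"context"
	"fmt"
)

// Collector is a hypothetical, stripped-down stand-in for a crawler type.
type Collector struct {
	ctx context.Context
}

// CollectorOption configures a Collector at construction time.
type CollectorOption func(*Collector)

// WithContext is a sketch of the proposed option: it stores ctx on the collector.
func WithContext(ctx context.Context) CollectorOption {
	return func(c *Collector) { c.ctx = ctx }
}

// NewCollector applies the options on top of a background-context default,
// so existing callers that pass no context keep working unchanged.
func NewCollector(opts ...CollectorOption) *Collector {
	c := &Collector{ctx: context.Background()}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// Visit refuses to start new work once the collector's context is done.
func (c *Collector) Visit(url string) error {
	if err := c.ctx.Err(); err != nil {
		return err
	}
	// A real collector would issue the request with c.ctx here.
	return nil
}

// visitBlockedAfterCancel reports whether Visit fails once ctx is cancelled.
func visitBlockedAfterCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	c := NewCollector(WithContext(ctx))
	cancel()
	return c.Visit("http://example.invalid/") == context.Canceled
}

// visitAllowedByDefault reports whether Visit works when no context is given.
func visitAllowedByDefault() bool {
	return NewCollector().Visit("http://example.invalid/") == nil
}

func main() {
	fmt.Println("default collector can visit:", visitAllowedByDefault())
	fmt.Println("cancelled collector is blocked:", visitBlockedAfterCancel())
}
```

The appeal of this shape is that no existing function signature changes: the context is supplied once at construction, and callbacks can still capture `ctx` from the enclosing function as in the `crawl` example.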
I like this approach a lot! @WGH- could you work on this?
I think it isn't a deal breaker. We can document this behavior, or perhaps emit a warning if the user sets a custom timeout and also uses a custom ctx. What do you think?
What I said earlier applies only to the hack with setting context inside
This extends PR #245 by adding context.Context support, as discussed in Issue #240. This contribution comes after discussion and example implementations from both myself and @vintik100.