An infinite scroll processing package using go-rod/rod.
A basic workflow for creating an infinite scroll processor with IS:
- (1) Create a package
  - (1.1) Write an `info` struct
    - REQUIRED: embed `is.InfoBase` for the `is.IInfo` interface
    - Add additional fields as needed
    - Override the `String()` member function if needed
  - (1.2) Write a `processor` struct
    - REQUIRED: embed `*is.Processor`
    - Add additional fields as needed
  - (1.3) Write a package/struct level `New` function
    - (1.3.1) REQUIRED: `*is.Property` as one of its arguments
    - (1.3.2) REQUIRED: use `is.New(property)` to initialize the embedded `*is.Processor`
    - (1.3.3) Override `is.Processor` field functions as needed
      - As a bare minimum, MUST override `V020_Elements` and `V030_ElementInfo`. Otherwise `is.Run` will do nothing.
- (2) Write `main`
  - (2.1) Prepare an `is.Property` object, populate fields as needed
    - REQUIRED: populate the `Page` field (a `*rod.Page`, representing a browser tab)
    - Set `UrlLoad` to `true` to load the page at `UrlStr`. (Default: `false`)
    - Set `UrlStr` to the target site address. Not required if `UrlLoad` is `false`
  - (2.2) Allocate the `processor`
  - (2.3) Initialize the `processor` struct with the `property`
  - (2.4) Call `Run`
  - (2.5) Output result
- x-feed in example - X.com feed processing using [IS], with comments referencing the workflow above.
- yt-toolbox - A more elaborate [IS] command line application.
The `info` struct and `IInfoList` provide a basic means to store and process information during `Run()`.
// (1.1) Write an `info` struct
type XFeedInfo struct {
	is.InfoBase // (1.1) REQUIRED: embed [is.InfoBase] to get [is.IInfo] interface
	// Added fields
	User string `json:"user,omitempty"`
	Text string `json:"text,omitempty"`
}
`is.InfoBase` implements the `is.IInfo` interface functions:
Function | Description | Override Required |
---|---|---|
`Matched() bool` | Getter, returns the value of `matched` | No |
`MatchedStr() string` | Getter, returns the value of `matchedStr` | No |
`SetMatched(matched bool)` | Setter, sets the value of `matched` | No |
`SetMatchedStr(matchedStr string)` | Setter, sets the value of `matchedStr` | No |
`String() string` | Info struct to string | As needed |
The `is.IInfo` interface allows the info struct to be passed between the processor's `V*` field functions in `Run()`.
Add fields to the struct to store information.
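The embedding pattern described above can be illustrated in isolation. In this minimal sketch, `iInfo` and `infoBase` are local stand-ins written for the example (assumptions, not the package's actual `is.IInfo` and `is.InfoBase` definitions). It shows how embedding promotes the getters and setters, and how `String()` can be overridden on the outer struct.

```go
package main

import "fmt"

// iInfo is a local stand-in for the interface described above (assumption).
type iInfo interface {
	Matched() bool
	MatchedStr() string
	SetMatched(matched bool)
	SetMatchedStr(matchedStr string)
	String() string
}

// infoBase is a local stand-in that implements iInfo with pointer receivers.
type infoBase struct {
	matched    bool
	matchedStr string
}

func (b *infoBase) Matched() bool                   { return b.matched }
func (b *infoBase) MatchedStr() string              { return b.matchedStr }
func (b *infoBase) SetMatched(matched bool)         { b.matched = matched }
func (b *infoBase) SetMatchedStr(matchedStr string) { b.matchedStr = matchedStr }
func (b *infoBase) String() string {
	return fmt.Sprintf("matched=%t matchedStr=%q", b.matched, b.matchedStr)
}

// feedInfo embeds infoBase, so *feedInfo satisfies iInfo through promoted methods.
type feedInfo struct {
	infoBase
	User string
	Text string
}

// String overrides the promoted infoBase.String and reuses it for the base part.
func (f *feedInfo) String() string {
	return fmt.Sprintf("%s: %s (%s)", f.User, f.Text, f.infoBase.String())
}

// describe accepts any iInfo, as a field function taking the interface would.
func describe(info iInfo) string { return info.String() }

func main() {
	info := &feedInfo{User: "alice", Text: "hello"}
	info.SetMatched(true)
	info.SetMatchedStr("hello")
	fmt.Println(describe(info)) // alice: hello (matched=true matchedStr="hello")
}
```

Only `String()` is redefined here; the four getters and setters come from the embedded struct unchanged, which matches the "Override Required" column above.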
// (1.2) Write a `processor` struct
type XFeedProcessor struct {
	*is.Processor // (1.2) REQUIRED: embed `*is.Processor`
}

// (1.3.3) Override `is.Processor` field functions as needed
func (x *XFeedProcessor) override() {
	// V020: collect the repeating elements (one <article> per post)
	x.V020_Elements = func(element *rod.Element) *rod.Elements {
		prefix := x.MyType + ".V020"
		ezlog.Trace().N(prefix).TxtStart().Out()
		var es rod.Elements
		tagName := "article"
		if element == nil {
			// No container given: search the whole page
			es = x.Page.MustElements(tagName)
		} else {
			es = element.MustElements(tagName)
		}
		ezlog.Trace().N(prefix).TxtEnd().Out()
		return &es
	}
	// V030: extract user name and text from a single element
	x.V030_ElementInfo = func(element *rod.Element, index int) is.IInfo {
		prefix := x.MyType + ".V030"
		ezlog.Trace().N(prefix).TxtStart().Out()
		ezlog.Trace().M(element.MustHTML()).Out()
		info := new(XFeedInfo)
		var (
			err error
			e   *rod.Element
			tag string
		)
		// Username
		tag = "[data-testid='User-Name']"
		e, err = element.Element(tag)
		if err == nil && e != nil {
			tag = "a"
			e, err = e.Element(tag)
			if err == nil && e != nil {
				info.User = e.MustText()
			}
		}
		// Tweet text
		tag = "[data-testid='tweetText']"
		e, err = element.Element(tag)
		if err == nil && e != nil {
			info.Text = e.MustText()
		}
		ezlog.Debug().N(prefix).Nn("info").M(info).Out()
		ezlog.Trace().N(prefix).TxtEnd().Out()
		return info
	}
}
`is.Processor` comes with 14 field functions:
Function | Description | Override Required |
---|---|---|
`LoadPage func()` | Load `UrlStr` | No |
`ScrollCalculation func(state *State) (scroll bool)` | Detect end of page | No |
`ScrollElement func(element *rod.Element)` | Use `rod.Element.MustScrollIntoView` for scrolling | No |
`V010_Container func() (container *rod.Element)` | Return a container element (default: `Property.Container`) | As needed |
`V020_Elements func(container *rod.Element) *rod.Elements` | Return the collection of repeating elements in the container from `V010_Container` (default: `nil`) | Yes |
`V030_ElementInfo func(element *rod.Element, index int) (info IInfo)` | Extract information from `element`, put it into an [IInfo] structure, and return it (default: `nil`) | Yes |
`V040_ElementMatch func(element *rod.Element, index int, info IInfo) (matched bool, matchedStr string)` | Determine whether the element is a match based on `info` (default: `true`, `""`) | As needed |
`V050_ElementProcessMatched func(element *rod.Element, index int, info IInfo)` | Do some processing (e.g. print, write to file, db, etc.) if the element is a match (default: do nothing) | As needed |
`V060_ElementProcessUnmatch func(element *rod.Element, index int, info IInfo)` | Do some processing if the element is not a match (default: do nothing) | As needed |
`V070_ElementProcess func(element *rod.Element, index int, info IInfo)` | Do some processing regardless of whether the element is a match (default: do nothing) | As needed |
`V080_ElementScrollable func(element *rod.Element, index int, info IInfo) bool` | Determine if the element is scrollable (default: `true`) | As needed (e.g. element removed from DOM) |
`V090_ElementLoopEnd func(element *rod.Element, index int, info IInfo)` | Do some processing if required (default: do nothing) | As needed |
`V100_ScrollLoopEnd func(state *State)` | Do some processing if required (default: do nothing) | As needed |
The following is pseudocode for `is.Processor.Run()`; the full code is in the package source.
Run() {
	state := new(State).New()
	LoadPage()
	Container = V010_Container()
	for {
		// -- SCROLL LOOP - START
		if ScrollLoopBreak(state) { break }
		ScrollElement(state.ElementLast)
		elements = V020_Elements(Container)
		for element(new ones after scroll) in elements {
			// -- ELEMENTS LOOP - START
			info := V030_ElementInfo(element, index)
			matched, matchedStr := V040_ElementMatch(element, index, info)
			if matched {
				V050_ElementProcessMatched(element, index, info)
			} else {
				V060_ElementProcessUnmatch(element, index, info)
			}
			V070_ElementProcess(element, index, info)
			if IInfoList != nil && info != nil { append(IInfoList, info) }
			if V080_ElementScrollable(element, index, info) { update state }
			V090_ElementLoopEnd(element, index, info)
			// -- ELEMENTS LOOP - END
		}
		ScrollCalculation(state)
		V100_ScrollLoopEnd(state)
		// -- SCROLL LOOP - END
	}
}
The `info` struct and `IInfoList` provide a basic means to store and process information during `Run()`. Field functions that produce or receive the info include `V030_ElementInfo`, `V040_ElementMatch`, `V050_ElementProcessMatched`, `V060_ElementProcessUnmatch`, and `V070_ElementProcess`.
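Once infos have been collected, they can be written out using the `json` tags on the info struct. A minimal sketch, assuming the collected infos are available as a plain slice: `feedInfo` mirrors only the added fields of the `XFeedInfo` example, and `toJSON` is a hypothetical helper, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// feedInfo mirrors the added fields of the XFeedInfo example above;
// the embedded base struct is left out to keep the sketch self-contained.
type feedInfo struct {
	User string `json:"user,omitempty"`
	Text string `json:"text,omitempty"`
}

// toJSON renders the collected infos as indented JSON.
// It is a hypothetical helper written for this sketch.
func toJSON(infos []*feedInfo) (string, error) {
	b, err := json.MarshalIndent(infos, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	infos := []*feedInfo{
		{User: "alice", Text: "hello"},
		{User: "bob"}, // Text is empty, so omitempty drops it from the output
	}
	out, err := toJSON(infos)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```

With `omitempty`, fields that could not be extracted from an element are simply absent from the output rather than written as empty strings.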
The MIT License (MIT)
Copyright © 2025 John, Sing Dao, Siu
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.