A vendor plugin for Saber that allows webmasters to scrape & archive content from the web & RSS feeds.
- Clone the Charlotte repository anywhere on your web server outside of Saber:
  `git clone https://github.com/Datasilk/Charlotte`
- Open the solution `Charlotte.sln` using Visual Studio 2019 or newer & build Charlotte
- Execute `bin\x64\Debug\Charlotte.exe -register` in PowerShell to register the Charlotte console application as a Windows Service, which will automatically start the WCF Hosted Service
If you are using the latest source code for Saber, do the following:
- Execute `git clone https://github.com/Datasilk/Saber-Collector Collector` within the folder `/App/Vendors/`
If you are using the latest release of Saber, do the following:
- Download the latest release of Saber.Vendors.Collector
- Extract all files & folders from either the `win-x64` or `linux-x64` zip folder to Saber's `/Vendors/` folder
To publish a new release of Collector:
- Run the command `./publish.bat`
- Publish `bin/Publish/Collector.7z` as the latest release
Configuration settings:

{
  "browser": {
    "endpoint": {
      "development": "http://localhost:7007/GetDOM",
      "staging": "http://localhost:7007/GetDOM",
      "production": "http://localhost:7007/GetDOM"
    }
  },
  "storage": {
    "development": "/Content/Collector/",
    "staging": "/Content/Collector/",
    "production": "/Content/Collector/"
  },
  "domains": {
    "downloads": {
      "minIntervals": 60
    }
  }
}

`browser.endpoint`: The URL for your instance of Charlotte's Web, a load balancer application that delegates requests to a cluster of Charlotte workers.
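As an illustration of how the per-environment endpoint values might be read, here is a minimal Python sketch. It is not Collector's own code, and the helper name `resolve_endpoint` is hypothetical. Only the key names and URLs come from the configuration above.

```python
import json

# Endpoint section copied from the sample configuration.
CONFIG_JSON = """
{
  "browser": {
    "endpoint": {
      "development": "http://localhost:7007/GetDOM",
      "staging": "http://localhost:7007/GetDOM",
      "production": "http://localhost:7007/GetDOM"
    }
  }
}
"""


def resolve_endpoint(config: dict, environment: str) -> str:
    """Return the browser endpoint for one environment (hypothetical helper)."""
    endpoints = config["browser"]["endpoint"]
    if environment not in endpoints:
        raise KeyError(f"no browser endpoint configured for {environment!r}")
    return endpoints[environment]


config = json.loads(CONFIG_JSON)
print(resolve_endpoint(config, "production"))
```

Keeping one entry per environment lets development, staging, and production each point at a different Charlotte's Web instance without changing anything else in the file.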
`storage`: The relative or absolute path to the folder where Collector stores downloaded content.
This path should typically be located on a network drive that instances of Collector running on multiple machines can access over the local network.
The path must end with a forward slash (`/`).
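The trailing-slash rule can be checked before a configuration is deployed. The Python sketch below is an illustration only. The helper names and the file name `article.html` are hypothetical, and the idea that file names are appended directly to the storage path is an assumption, not something the source confirms.

```python
def validate_storage_path(path: str) -> str:
    """Return the path unchanged if it ends with '/', otherwise raise ValueError."""
    if not path.endswith("/"):
        raise ValueError(f"storage path must end with '/': {path!r}")
    return path


def content_path(storage: str, filename: str) -> str:
    """Build a file path by plain concatenation (assumed behavior, for illustration)."""
    return validate_storage_path(storage) + filename


print(content_path("/Content/Collector/", "article.html"))
# prints /Content/Collector/article.html
```

Under that assumption, a path without the trailing slash would produce `/Content/Collectorarticle.html`, which is why the check rejects it.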
`domains.downloads.minIntervals`: This number keeps Collector from making too many requests to any given domain in a short period of time. The value is in seconds and sets the minimum time between requests made to a single domain. When finding the next item in the download queue, Collector skips any item whose domain was requested less than this many seconds ago.
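The queue behavior described above can be sketched roughly as follows. This is a simplified Python illustration, not Collector's implementation. `QueueItem`, `next_item`, and the `last_request` mapping are hypothetical names, and timestamps are plain seconds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QueueItem:
    url: str
    domain: str


def next_item(
    queue: List[QueueItem],
    last_request: Dict[str, float],
    now: float,
    min_interval: float = 60,
) -> Optional[QueueItem]:
    """Return the first queued item whose domain is outside the minimum interval."""
    for item in queue:
        last = last_request.get(item.domain)
        # A domain is eligible if it was never requested, or if at least
        # min_interval seconds have passed since its last request.
        if last is None or now - last >= min_interval:
            return item
    return None  # every queued domain was requested too recently


queue = [
    QueueItem("https://a.example/page", "a.example"),
    QueueItem("https://b.example/feed", "b.example"),
]
last_request = {"a.example": 100.0}

picked = next_item(queue, last_request, now=130.0)
print(picked.url)  # prints https://b.example/feed
```

In this run, `a.example` was requested 30 seconds earlier, so its item is skipped and the `b.example` item is chosen. Once 60 seconds have passed, the `a.example` item becomes eligible again.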