1 parent 3e313ff commit da492bb
Day66-75/Scrapy爬虫框架分布式实现.md
@@ -6,11 +6,25 @@
### Distributed Crawling with Scrapy
-
-### Bloom Filter
+1. Install Scrapy-Redis.
+2. Configure the Redis server.
+3. Modify the configuration file (the key settings are listed below; a consolidated sketch follows this list):
+ - SCHEDULER = 'scrapy_redis.scheduler.Scheduler'
+ - DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'
+ - REDIS_HOST = '1.2.3.4'
+ - REDIS_PORT = 6379
+ - REDIS_PASSWORD = '1qaz2wsx'
+ - SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.FifoQueue'
+ - SCHEDULER_PERSIST = True (persist the request queue and dupefilter so an interrupted crawl can resume)
+ - SCHEDULER_FLUSH_ON_START = True (flush the queue on every start so the crawl restarts from scratch)
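The items in step 3 map directly onto the project's `settings.py`. Below is a minimal consolidated sketch using the placeholder Redis host, port, and password from the list above. Note that `SCHEDULER_PERSIST` (resume an interrupted crawl) and `SCHEDULER_FLUSH_ON_START` (start over on every run) are normally used as alternatives rather than together, and that some scrapy-redis versions expect the password to be passed via `REDIS_PARAMS` or a `REDIS_URL` rather than `REDIS_PASSWORD`.

```python
# settings.py -- minimal Scrapy-Redis sketch with the placeholder values from the list above.

# Hand scheduling and request de-duplication over to Redis so that
# multiple spider processes on different machines share one queue.
SCHEDULER = 'scrapy_redis.scheduler.Scheduler'
DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'

# Shared Redis server (placeholder address and credentials).
REDIS_HOST = '1.2.3.4'
REDIS_PORT = 6379
REDIS_PASSWORD = '1qaz2wsx'
# Some scrapy-redis versions read the password from REDIS_PARAMS instead:
# REDIS_PARAMS = {'password': '1qaz2wsx'}

# FIFO queue gives breadth-first ordering across all workers.
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.FifoQueue'

# Keep the queue and fingerprint set after the crawl so it can resume later ...
SCHEDULER_PERSIST = True
# ... or wipe them on startup to re-crawl from scratch (normally not combined with persist).
SCHEDULER_FLUSH_ON_START = False
```

With these settings, the spider class typically derives from `scrapy_redis.spiders.RedisSpider` and pulls its start URLs from a Redis key, so additional machines can join the crawl simply by starting another process with the same settings.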
### Distributed Deployment with Scrapyd
+1. Install Scrapyd.
+2. Modify the configuration file:
+ - mkdir /etc/scrapyd
+ - vim /etc/scrapyd/scrapyd.conf
+3. Install Scrapyd-Client:
+ - Package the project into an Egg file.
+ - Deploy the packaged Egg file to Scrapyd through the addversion.json interface (see the sketch after this list).
+
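As an illustration of the last step, here is a sketch that uploads an already-built Egg to a Scrapyd instance through its addversion.json endpoint; the Scrapyd address, project name, version string, and egg filename are assumptions for the example. In practice, the `scrapyd-deploy` command that ships with Scrapyd-Client performs both the packaging and this upload in one step.

```python
import requests

SCRAPYD_URL = 'http://localhost:6800'  # assumed Scrapyd address; change to your server
PROJECT = 'myproject'                  # hypothetical project name
VERSION = '1.0'                        # any version label; Scrapyd keeps several per project

# Upload the Egg built by scrapyd-deploy (or `python setup.py bdist_egg`).
with open('myproject-1.0.egg', 'rb') as egg:
    response = requests.post(
        f'{SCRAPYD_URL}/addversion.json',
        data={'project': PROJECT, 'version': VERSION},
        files={'egg': egg},
    )

# A successful call returns something like {"status": "ok", "spiders": 3}.
print(response.json())
```

Once the version is uploaded, crawls can be started remotely through Scrapyd's schedule.json endpoint.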