摘要:
把起始URL放到redis中去 from scrapy_redis.spiders import RedisSpider # 继承RedisSpider class ChoutiSpider(RedisSpider): name = 'chouti' allowed_domains = ['chou 阅读全文
摘要:
去重的配置: DUPEFILTER_KEY = 'dupefilter:%(timestamp)s' DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter" 调度器配置: SCHEDULER = "scrapy_redis.schedul 阅读全文
摘要:
整个爬虫流程 1、scrapy crawl chouti --nolog 2、找到 SCHEDULER = "scrapy_redis.scheduler.Scheduler" 配置并实例化调试器对象 - 执行Scheduler.from_crawler - 执行Scheduler.from_set 阅读全文