Scrapy custom commands

1. About Scrapy's built-in commands

(scrapy_page) ➜  ~ scrapy -h
Scrapy 1.5.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
(scrapy_page) ➜  ~
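
The [ more ] line refers to commands that are only available from inside a project directory. Running the same scrapy -h inside a project additionally lists, among others:

    check         Check spider contracts
    crawl         Run a spider
    edit          Edit spider
    list          List available spiders
    parse         Parse URL (using its spider) and print the results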

2. A custom command for a single spider

  • 1. Create a file named start.py in the project root (the name is arbitrary):

    from scrapy.cmdline import execute
    
    if __name__ == "__main__":
        # Replace "blog" with the name of your spider
        execute("scrapy crawl blog --nolog".split())
        # Or, equivalently:
        # execute(["scrapy", "crawl", "blog", "--nolog"])
    
  • 2. Right-click the file in your IDE and run it (or run python start.py) to start the spider; there is no need to type the command line by hand anymore.

  • 3. An improved version of the script (a programmatic alternative follows after this list):

    import os
    import sys
    from scrapy.cmdline import execute
    
    if __name__ == "__main__":
        # Put the script's directory on sys.path so the project package
        # can be imported no matter where the script is launched from
        sys.path.append(os.path.dirname(os.path.abspath(__file__)))
        execute("scrapy crawl blog --nolog".split())
        # Or, equivalently:
        # execute(["scrapy", "crawl", "blog", "--nolog"])
    

3. A custom command that runs all spiders in the project

  • 1. Create a directory (any name will do) at the same level as spiders, e.g. commands, and give it an empty __init__.py so it can be imported as a package
  • 2. Inside it, create crawlall.py; this file name becomes the name of the custom command (a layout sketch follows the code):

    from scrapy.commands import ScrapyCommand
    
    class Command(ScrapyCommand):
        requires_project = True
    
        def syntax(self):
            return '[options]'
    
        def short_desc(self):
            return 'Runs all of the spiders'
    
        def run(self, args, opts):
            # Collect the names of every spider in the project
            # (spider_loader replaces the deprecated .spiders attribute)
            spider_list = self.crawler_process.spider_loader.list()
            for name in spider_list:
                # Schedule each spider on the shared crawler process
                self.crawler_process.crawl(name)
            # Start all scheduled crawls; blocks until they all finish
            self.crawler_process.start()
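
For orientation, the project tree then looks roughly like this, assuming the project is named csdn as in the settings example below (note the __init__.py files, which make commands an importable package):

    csdn/
    ├── scrapy.cfg
    └── csdn/
        ├── __init__.py
        ├── settings.py
        ├── commands/
        │   ├── __init__.py
        │   └── crawlall.py
        └── spiders/
            └── ...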
    
  • 3. Register the module in settings.py:

    # COMMANDS_MODULE = '<project name>.<directory name>'
    # Register the directory that holds the custom commands
    COMMANDS_MODULE = 'csdn.commands'
    
  • 4. Run scrapy -h from the project directory; crawlall now shows up among the available commands. A usage example follows.
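
You can then launch every spider in the project with one command (shown from a hypothetical csdn project directory):

    (scrapy_page) ➜  csdn scrapy crawlall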
