设置动态请求头

前面介绍过了,突破反爬虫策略中有一条设置请求头,虽然有个网站有很多请求头供我们使用,但是还是工作量太多了,刚好有别人写好的第三方包fake-useragent的地址

一、关于fake-useragent的使用

  • 1、安装依赖包

    pip3 install fake-useragent
    
  • 2、定义一个中间件来随机取一个请求头

    from fake_useragent import UserAgent
    
    class RandomUserAgentMiddleware(object):
        """
        使用fake-useragent定义一个随机切换头部的中间件
        """
        def __init__(self):
            super(RandomUserAgentMiddleware, self).__init__()
            self.ua = UserAgent()
    
        def process_request(self, request, spider):
            request.headers['User-Agent'] = self.ua.random
    
  • 3、测试

    class HttpbinSpider(scrapy.Spider):
        name = 'httpbin'
        allowed_domains = ['httpbin.org']
        start_urls = ['http://httpbin.org/user-agent']
    
        def parse(self, response):
            print('*' * 100)
            print(response.text)
            print('*' * 100)
            yield scrapy.Request(self.start_urls[0], callback=self.parse, dont_filter=True)
    

results matching ""

    No results matching ""