requests的高级使用
一、设置代理ip
1、直接在请求的时候加上
proxies
就可以,注意我们一般会写上http
和https
的,这样当遇到http
请求就会走http
字典对应的代理2、具体代码
import requests if __name__ == "__main__": # 定义一个请求头(模拟浏览器) headers = { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'} # 定义代理的字典(可以去http://www.xicidaili.com查找最新的代理) proxies = { 'http': 'http://115.223.223.29:9000', 'https': 'https://197.232.21.141:59075' } response = requests.get('http://httpbin.org/get', headers=headers, proxies=proxies) response.encoding = 'utf-8' print(response.text)
二、关于requests
库操作cookie
1、从网站上获取
cookie
import requests if __name__ == "__main__": # 定义一个请求头(模拟浏览器) headers = { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'} response = requests.get('https://www.baidu.com', headers=headers) # 迭代出全部的 for k, v in response.cookies.items(): print(k, '===', v) print(response.cookies) print(response.cookies.get_dict())
2、使用
session
会话存储当前网上的cookies
import requests if __name__ == "__main__": url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false' headers = { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36', 'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=' } data = { 'first': 'true', 'pn': '1', 'kd': 'python', } # 使用session包装下 session = requests.session() response = session.post(url=url, headers=headers, data=data) print(response.json())
三、不安全证书网站的请求(12306
是最典型的)
1、只需要在请求的时候加上
verify=False
就可以2、具体代码
import requests if __name__ == "__main__": # 定义一个请求头(模拟浏览器) headers = { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'} response = requests.get('http://www.12306.cn/mormhweb/', headers=headers, verify=False) response.encoding = 'utf-8' print(response.text)