Python: requests, a handy tool for web scraping


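requests is not part of the Python standard library. It is a third-party library and has to be installed (for example with pip install requests) before you can use it.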

Using the requests library

Enough talk. Let's go straight to the code and take a quick look at what it does:

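    import requests

    # A Session reuses connections and keeps cookies between requests
    session = requests.session()
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"
    }
    # The /get endpoint returns JSON, which response.json() needs
    url = "http://httpbin.org/get"
    response = session.get(url, headers=headers, timeout=None)
    print(response.text)                     # body decoded to str
    print(response.cookies)                  # cookies set by the server
    print(response.content)                  # raw body as bytes
    print(response.content.decode("utf-8"))  # bytes decoded by hand
    print(response.json())                   # body parsed as JSON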
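To see what a session would actually send, a request can be prepared without touching the network. The following is a minimal sketch, assuming the same User-Agent string and the httpbin.org/get endpoint; the page query parameter is made up purely for illustration.

```python
import requests

UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"

session = requests.Session()
# Headers stored on the session are merged into every request it prepares.
session.headers.update({"User-Agent": UA})

# Build the request without sending it; `page` is a hypothetical parameter.
request = requests.Request("GET", "http://httpbin.org/get", params={"page": 1})
prepared = session.prepare_request(request)

print(prepared.method)                 # HTTP method
print(prepared.url)                    # URL with the query string appended
print(prepared.headers["User-Agent"])  # header inherited from the session
# session.send(prepared) would perform the actual request.
```

Setting headers once on the session saves passing headers= to every call, which is handy when a crawler makes many requests to the same site.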

A basic POST request:

    import requests

    # data= sends the dict as a form-encoded body
    data = {
        "name": "zhaofan",
        "age": 23
    }
    response = requests.post("http://httpbin.org/post", data=data)
    print(response.text)
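Whether a dict goes out as a form or as JSON depends on the keyword used. As a rough sketch (nothing is sent here, the requests are only prepared), the difference between data= and json= looks like this; the variable names are illustrative.

```python
import json

import requests

payload = {"name": "zhaofan", "age": 23}
url = "http://httpbin.org/post"

# data= produces a form-encoded body
form = requests.Request("POST", url, data=payload).prepare()
# json= serializes the dict to JSON and sets a matching Content-Type
as_json = requests.Request("POST", url, json=payload).prepare()

print(form.headers["Content-Type"], form.body)
print(as_json.headers["Content-Type"], json.loads(as_json.body))
```

Many APIs expect JSON rather than form data, so if a POST is rejected it is worth checking which of the two the server wants.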

Requesting a site whose certificate is invalid:

    import requests
    import urllib3

    # verify=False skips certificate verification; silence the warning it triggers
    urllib3.disable_warnings()
    response = requests.get("https://www.12306.cn", verify=False)
    print(response.status_code)
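Calling urllib3.disable_warnings() with no arguments silences urllib3's warnings for the whole process. If you would rather hide only the certificate warning, and only around the one call, something along these lines may work. It is a sketch: get_unverified and its get parameter are hypothetical names, not part of requests.

```python
import warnings

import requests
from urllib3.exceptions import InsecureRequestWarning


def get_unverified(url, get=requests.get, **kwargs):
    """GET `url` without certificate verification, hiding only the related warning.

    `get` is a hypothetical hook for swapping in another callable (for example
    a session's get method); it defaults to requests.get.
    """
    with warnings.catch_warnings():
        # Ignore just InsecureRequestWarning, and only inside this block.
        warnings.simplefilter("ignore", InsecureRequestWarning)
        return get(url, verify=False, **kwargs)
```

get_unverified("https://www.12306.cn").status_code would then mirror the example above. Keep in mind that warnings.catch_warnings changes process-wide state and is not thread-safe, so this suits simple single-threaded scripts best.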

Proxy settings:

    import requests

    proxies = {
        "http": "http://127.0.0.1:9999",
        "https": "http://127.0.0.1:8888"
    }
    response = requests.get("https://www.baidu.com", proxies=proxies)
    print(response.text)

If the proxy requires a username and password, just change the dictionary to:

    proxies = {
        "http": "http://user:password@127.0.0.1:9999"
    }

If your proxy uses SOCKS, first run pip install "requests[socks]" and then use:

    proxies = {
        "http": "socks5://127.0.0.1:9999",
        "https": "socks5://127.0.0.1:8888"
    }
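Credentials that contain characters such as @ or : will break a hand-written proxy URL. A small helper can percent-encode them first. This is only a sketch: build_proxies is a made-up name, and it assumes the same proxy address is used for both http and https traffic.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None, scheme="http"):
    """Return a dict suitable for the proxies= argument of requests.get."""
    auth = ""
    if user is not None:
        # Percent-encode credentials so '@' and ':' cannot break the URL.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    # Assumption: one proxy handles both http and https traffic.
    return {"http": proxy_url, "https": proxy_url}


print(build_proxies("127.0.0.1", 9999))
print(build_proxies("127.0.0.1", 9999, "user", "p@ss:word"))
print(build_proxies("127.0.0.1", 9999, scheme="socks5"))
```

The result is passed on as usual, for example requests.get("https://www.baidu.com", proxies=build_proxies("127.0.0.1", 9999)).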

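Timeout settings

The timeout parameter sets how long, in seconds, a request waits before giving up.

With timeout=None there is no time limit and the request waits indefinitely.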
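Besides a single number, timeout also accepts a (connect, read) tuple, so connecting and waiting for data can have different limits. A minimal sketch follows; the function name, the default values and the get hook are all illustrative assumptions.

```python
import requests


def get_with_timeouts(url, connect=3.05, read=10.0, get=requests.get):
    """GET `url` with separate connect and read timeouts, both in seconds.

    The defaults and the `get` hook are illustrative assumptions.
    """
    # A (connect, read) tuple limits each phase independently;
    # a single number would apply the same limit to both.
    return get(url, timeout=(connect, read))
```

For a crawler it is usually safer to pass some timeout on every call, since without one a single unresponsive server can stall the whole run.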

Exception handling:

    import requests
    from requests.exceptions import ReadTimeout, ConnectionError, RequestException

    try:
        response = requests.get("http://httpbin.org/get", timeout=0.1)
        print(response.status_code)
    except ReadTimeout:
        print("timeout")
    except ConnectionError:
        print("connection Error")
    except RequestException:
        print("error")
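The order of the except clauses matters, because the requests exceptions form a hierarchy: ConnectTimeout is both a ConnectionError and a Timeout, so catching only ReadTimeout reports a connect timeout as a connection error. One possible variation catches the shared Timeout base class first. fetch_outcome, its labels and the get hook are hypothetical names used for illustration.

```python
import requests
from requests import exceptions as rex


def fetch_outcome(url, timeout=5, get=requests.get):
    """Return a short label describing how a GET request ended."""
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
        return "ok"
    except rex.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return "timeout"
    except rex.ConnectionError:
        return "connection error"
    except rex.RequestException:
        # Base class of the requests exceptions, so it goes last.
        return "error"
```

Importing the exceptions module under an alias also avoids shadowing Python's built-in ConnectionError.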


Copyright notice: published by Python教程 on 2022-11-01; 1430 characters in total.