Python Web Scraping (3): The retrying Module and Cookie-Related Requests (Request Headers and Cookies)


The retrying module

from retrying import retry

@retry(stop_max_attempt_number=3)  # try at most three times
def func1():
    print("this is func1")
    raise ValueError("this is test error")

import requests
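from retrying import retry

headers = {"User-Agent": "Mozilla/5.0"}

# Retry the decorated function up to three times. An error is raised only if
# all three attempts fail; if any attempt succeeds, no error is raised.
@retry(stop_max_attempt_number=3)
def _parse_url(url):
    """Helper dedicated to requesting a URL."""
    response = requests.get(url, headers=headers, timeout=5)
    return response.content.decode()

def parse_url(url):
    try:
        html_str = _parse_url(url)
    except Exception:
        html_str = None
    return html_str

if __name__ == '__main__':
    url = "http://www.baidu.com"
    html_str = parse_url(url)
    # parse_url returns None when every attempt fails, so check before slicing
    print(html_str[:100] if html_str is not None else "request failed")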
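
To make the retry semantics concrete without touching the network, here is a minimal sketch of what a decorator of this kind does. It is not the retrying library's implementation; `retry_sketch`, `flaky`, `broken`, and `calls` are hypothetical names used only for illustration.

```python
import functools


def retry_sketch(max_attempts=3):
    """Hypothetical stand-in for a retry decorator; not the retrying library's code."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)  # success: stop retrying
                except Exception:
                    if attempt == max_attempts:
                        raise  # every attempt failed: let the last error propagate
        return wrapper
    return decorator


calls = {"flaky": 0, "broken": 0}  # counts how often each function body runs


@retry_sketch(max_attempts=3)
def flaky():
    """Fails on the first two calls, succeeds on the third."""
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ValueError("temporary failure")
    return "ok"


@retry_sketch(max_attempts=3)
def broken():
    """Fails on every call."""
    calls["broken"] += 1
    raise ValueError("always fails")
```

Calling `flaky()` returns normally because the third attempt succeeds, while `broken()` raises `ValueError` only after its body has run three times.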

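Handling cookie-related requests
Renren (人人网) account: {"email": "mr_mao_hacker@163.com", "password": "alarmchime"}
Requesting a URL with the cookie attached directly
1. Put the cookie in headers
headers = {"User-Agent": "…", "Cookie": "cookie string"}

2. Pass a cookie dict to the cookies parameter
requests.get(url, cookies=cookie_dict)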
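
The second option needs a dict, while browsers usually show cookies as one `name=value; name2=value2` string. A minimal sketch of the conversion follows; `cookie_str_to_dict` is a hypothetical helper name, and the sketch assumes the string uses the usual `;` separator.

```python
def cookie_str_to_dict(cookie_str):
    """Split a 'name=value; name2=value2' cookie string into a dict."""
    cookie_dict = {}
    for pair in cookie_str.split(";"):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue  # skip empty or malformed fragments
        # Split on the first "=" only, because values may contain "=" themselves
        name, value = pair.split("=", 1)
        cookie_dict[name.strip()] = value.strip()
    return cookie_dict
```

The resulting dict can then be passed as `requests.get(url, cookies=cookie_dict)`.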

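Send a POST request first to obtain the cookie, then request the post-login page with that cookie
1. session = requests.session()  # a session offers the same request methods as requests
2. session.post(url, data=data, headers=headers)  # cookies the server sets are saved in the session
3. session.get(url)  # sends the cookies saved in the session, so the request succeeds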

import requests

# Instantiate a session
session = requests.session()
# Send the POST request through the session to obtain the cookies the server sets
post_url = "http://…"
# No Cookie header here: the session supplies the cookies it has saved
headers = {"User-Agent": "…"}
post_data = {"email": "…", "password": "…."}
session.post(post_url, headers=headers, data=post_data)
# Then use the same session to request the post-login page
url = "http://…."
response = session.get(url, headers=headers)
with open("renren3.html", "w", encoding="utf-8") as f:
    f.write(response.content.decode())
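
The same login-then-fetch flow can be tried without a real site. The sketch below swaps requests for the standard library (`urllib.request` with `http.cookiejar` playing the role of the session) and runs against a throwaway local server. Everything in it is an assumption made for illustration: the `DemoHandler` class, the `/login` and `/profile` paths, the `sessionid=abc123` cookie, and the placeholder credentials.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical site: any POST sets a cookie, any GET checks for it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the form body; credentials are not checked
        self.send_response(200)
        self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        logged_in = "sessionid=abc123" in self.headers.get("Cookie", "")
        body = b"profile page" if logged_in else b"please log in"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_profile_after_login():
    """Return the /profile body fetched with and without the login cookie."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    no_proxy = urllib.request.ProxyHandler({})  # talk to localhost directly
    try:
        jar = CookieJar()  # plays the role of the session's cookie store
        session_like = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar))
        form = urllib.parse.urlencode(
            {"email": "user@example.com", "password": "secret"}).encode()
        session_like.open(base + "/login", data=form, timeout=5).read()
        with_cookie = session_like.open(base + "/profile", timeout=5).read().decode()
        # A fresh opener has no cookie jar, so it is treated as logged out
        plain = urllib.request.build_opener(no_proxy)
        without_cookie = plain.open(base + "/profile", timeout=5).read().decode()
        return with_cookie, without_cookie
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_profile_after_login()` returns two page bodies: the first from the opener that kept the cookie set by the POST, the second from an opener that never logged in.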


Posted in: Python Web Scraping

2022-11-01
