How to Scrape Movie Box-Office Data with a Python Crawler


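A lot of new movies have come out recently, and it's hard to tell which ones are worth watching, so we can use a Python crawler to collect data and analyze it. The process is split into two parts, requirements analysis and code. Let's walk through how to scrape movie box-office data with a Python crawler.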

1. Simple requirements analysis

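The crawler needs valid cookies to fetch the chart pages. There are two ways to get them: download the CAPTCHA, fill it in, and obtain the cookies; or log in first and then grab the cookies. Both approaches worked.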
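Whichever route produces the cookies, each entry in the `cookie` list has to be one `Cookie` header string. As a hypothetical sketch: if the login is done with Selenium, `driver.get_cookies()` returns a list of dicts with `name` and `value` keys. The helpers below (names invented for this example) flatten that list into a header string, and parse a header back into a dict as a quick sanity check for cookies copied from the browser's developer tools.

```python
from http.cookies import SimpleCookie


def cookies_to_header(cookie_dicts):
    """Join browser cookies into one Cookie header value: 'k1=v1; k2=v2'.

    Assumes each item is a dict with "name" and "value" keys, which is the
    shape returned by Selenium's driver.get_cookies().
    """
    return "; ".join("{}={}".format(c["name"], c["value"]) for c in cookie_dicts)


def header_to_dict(header):
    """Parse a Cookie header string back into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


# Usage sketch (needs a running Selenium WebDriver; the login URL is not given
# in the article, so `login_url` is a hypothetical placeholder):
#   driver.get(login_url)   # log in, typing the CAPTCHA by hand
#   cookie.append(cookies_to_header(driver.get_cookies()))
```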

The one exception: when Selenium was used to log in and obtain the cookies, the scraped pages came out garbled.
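The article does not say what caused the garbled output. One common cause of garbled Chinese text is decoding the page bytes with the wrong charset; for example, `requests` falls back to ISO-8859-1 for `text/*` responses whose `Content-Type` header carries no charset. The stdlib-only helper below is a hypothetical sketch, not the author's fix: it prefers the charset declared in a `<meta>` tag, then tries UTF-8 and GB18030.

```python
import re

# Looks for <meta charset="..."> or <meta ... content="text/html; charset=...">
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(raw, fallbacks=("utf-8", "gb18030")):
    """Decode page bytes, preferring the charset declared in a <meta> tag."""
    candidates = []
    match = META_CHARSET.search(raw[:2048])  # the meta tag sits near the top
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or wrong guess: try the next one
    # Last resort: keep replacement characters instead of raising
    return raw.decode("utf-8", errors="replace")
```

In the crawler, `decode_html(resp.content)` could stand in for a hard-coded `.decode("utf-8")` when a site's encoding is uncertain.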

2. Code implementation

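import random
import re
import time
from concurrent.futures import ThreadPoolExecutor

import requests
from lxml import etree

user_agent = [
    # Add a dozen or so User-Agent strings yourself;
    # random.choice() raises IndexError if this list is empty
]
# Add your own cookies below; several are recommended
cookie = []

# Regexes for the movie box-office page, compiled once.
# Assumption: the figure appears as "（最新票房 …）" with full-width or ASCII parentheses
pattern_number = re.compile(r'[（(]最新票房\s*(.+?)[）)]')
pattern_name = re.compile(r'<h3>(.*)票房统计(.*)</h3>')


def geturl(page):
    """Fetch one page of the all-time chart and return the movie links on it."""
    headers = {
        'Cookie': random.choice(cookie),
        'User-Agent': random.choice(user_agent),
    }
    time.sleep(1)  # throttle requests to avoid being blocked
    resp = requests.get("http://58921.com/alltime?page={}".format(int(page)),
                        headers=headers, timeout=10)
    # Debug: dump the raw page (one file per page so threads don't overwrite each other)
    # with open("test_{}.html".format(page), 'wb') as f:
    #     f.write(resp.content)
    xpath_data = etree.HTML(resp.content)
    # "table//tr" rather than "table/tbody/tr": browsers insert <tbody> when
    # rendering, but the raw HTML that lxml parses may not contain it
    return xpath_data.xpath('//*[@id="content"]/div[3]/table//tr/td[3]/a/@href')


def get_number(url_half):
    """Fetch one movie's box-office page and return (number, name), or None."""
    headers = {
        'User-Agent': random.choice(user_agent)
    }
    # headers must be a keyword argument: the 2nd positional parameter of requests.get is params
    html = requests.get("http://58921.com" + url_half + "/boxoffice",
                        headers=headers, timeout=10).content.decode("utf-8")
    numbers = pattern_number.findall(html)
    names = pattern_name.findall(html)
    if not numbers or not names:
        return None  # page layout changed or the request was blocked
    number, name = numbers[0], names[0]
    print(number, name)
    return number, name


if __name__ == "__main__":
    list_urls = []
    with ThreadPoolExecutor(max_workers=2) as executor_first:
        # Pages 1-29; adjust the range to fetch more or fewer pages
        for urls in executor_first.map(geturl, range(1, 30)):
            list_urls.extend(urls)
    print(list_urls)
    print(len(list_urls))
    with ThreadPoolExecutor(max_workers=2) as executor_second:
        # Iterating map() re-raises worker exceptions instead of dropping them silently
        results = [r for r in executor_second.map(get_number, list_urls) if r]
    print(len(results))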
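The introduction mentions analyzing the data, but the scraped figures are plain strings. A hypothetical post-processing sketch, assuming the figures look like `12.34亿` or `5678万` (亿 = 10^8 yuan, 万 = 10^4 yuan): parse each figure into a number and write everything to a CSV file. The function names and the CSV layout are assumptions, not part of the original code.

```python
import csv
import re

# Multipliers for Chinese box-office units: 亿 = 10^8, 万 = 10^4
UNITS = {"亿": 10 ** 8, "万": 10 ** 4}
NUMBER = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(亿|万)?")


def parse_box_office(text):
    """Convert a figure such as '12.34亿' or '5678万' to yuan; None if no number."""
    match = NUMBER.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    unit = match.group(2)
    return value * UNITS[unit] if unit else value


def to_row(number, name):
    """Turn one (number, name) result from get_number into a CSV row."""
    # pattern_name has two groups, so name may be a tuple; join its parts
    title = "".join(name) if isinstance(name, tuple) else name
    return [title, number, parse_box_office(number)]


def save_csv(results, path="box_office.csv"):
    """Write results to CSV; utf-8-sig adds a BOM so Excel detects UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "box_office_raw", "box_office_yuan"])
        for number, name in results:
            writer.writerow(to_row(number, name))
```

Collect the `(number, name)` tuples that `get_number` returns (for example by iterating over `executor_second.map(...)` and skipping failed pages) and pass them to `save_csv`; the numeric `box_office_yuan` column can then be sorted or charted directly.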


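That's how to scrape movie box-office data with a Python crawler. If you're interested, try following the steps yourself. This article was collected from the web; if it duplicates your work, please contact the author to have it revised.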


Posted in: Python Crawlers

2021-05-10
