Python Crawler Case Study (7)

Connecting a PyCharm Scrapy spider to a SQL Server database
Continuing from the previous post, the site being crawled is still https://xxgk.eic.sh.cn/jsp/view/eiaReportList.jsp. First, write the scraping logic in the test.py spider script:

import scrapy
from Agnes_test1.items import AgnesTest1Item
import re

class KeywordSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://xxgk.eic.sh.cn/jsp/view/eiaReportList.jsp']

    def parse(self, response):
        # Each result is a row of the table that follows the #menu element
        result_list = response.xpath("//*[@id='menu']/following-sibling::table/tr")
        for result in result_list:
            item = AgnesTest1Item()
            title = result.xpath("./td[2]/text()").extract_first()
            department = result.xpath("./td[4]/text()").extract_first()
            address = result.xpath("./td[6]/text()").extract_first()
            type = result.xpath("./td[7]/text()").extract_first()
            startDate = result.xpath("./td[8]/text()").extract_first()
            endDate = result.xpath("./td[9]/text()").extract_first()
            if title:
                # Strip the non-breaking spaces (\xa0) from every field
                item['title'] = re.sub("\xa0", "", title)
                item['department'] = re.sub("\xa0", "", department)
                item['address'] = re.sub("\xa0", "", address)
                item['type'] = re.sub("\xa0", "", type)
                item['startDate'] = re.sub("\xa0", "", startDate)
                item['endDate'] = re.sub("\xa0", "", endDate)
                yield item
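
The spider imports AgnesTest1Item from Agnes_test1.items, but the post never shows items.py. A minimal sketch of what that file presumably contains, with one scrapy.Field per field populated above, would be:

# items.py -- hypothetical reconstruction; the original post does not show it
import scrapy

class AgnesTest1Item(scrapy.Item):
    title = scrapy.Field()
    department = scrapy.Field()
    address = scrapy.Field()
    type = scrapy.Field()
    startDate = scrapy.Field()
    endDate = scrapy.Field()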

Next comes pipelines.py, where the scraped data is written into the SQL Server database:

import pymssql

class AgnesTest1Pipeline:
    def open_spider(self, spider):
        # Write the scraped data straight into SQL Server
        self.conn = pymssql.connect(host='localhost', port='1434', user='sa',
                                    password='123', database='agnes', charset='utf8')
        self.cursor = self.conn.cursor()
        print("Database connection established")

    def process_item(self, item, spider):
        try:
            self.cursor.execute(
                "INSERT INTO dbo.project(title,department,address,type,startDate,endDate) "
                "VALUES (%s,%s,%s,%s,%s,%s)",
                (item['title'], item['department'], item['address'],
                 item['type'], item['startDate'], item['endDate']))
            self.conn.commit()
            print("Row inserted")
        except Exception as ex:
            print(ex)
        return item

    def close_spider(self, spider):
        self.conn.close()

Finally, register the pipeline in settings.py (the number is the pipeline's priority; lower values run first):

ITEM_PIPELINES = {
    'Agnes_test1.pipelines.AgnesTest1Pipeline': 100,
}

Don't rush to run it yet. First open SQL Server Management Studio and create a new database (I named mine agnes), then create a table in it named project and add the columns used in the INSERT statement above. (I picked the column types arbitrarily, so please don't nitpick.)
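
If you'd rather script this step than click through the SSMS table designer, a rough pymssql equivalent is sketched below. The NVARCHAR types and lengths are my own assumptions, since the original column types were picked arbitrarily:

import pymssql

# One-off setup: create the project table in the agnes database.
# Column types are assumptions -- adjust to taste.
conn = pymssql.connect(host='localhost', port='1434', user='sa',
                       password='123', database='agnes', charset='utf8')
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE dbo.project (
    title      NVARCHAR(200),
    department NVARCHAR(200),
    address    NVARCHAR(200),
    type       NVARCHAR(100),
    startDate  NVARCHAR(50),
    endDate    NVARCHAR(50)
)
""")
conn.commit()
conn.close()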
Finally, go back to PyCharm and run scrapy crawl test in the terminal. Then query the database to confirm the import succeeded.
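
If you want to run that check from Python instead of SSMS, a small script reusing the pipeline's connection settings does the same sanity check:

import pymssql

# Sanity check: count the imported rows and peek at a few of them.
conn = pymssql.connect(host='localhost', port='1434', user='sa',
                       password='123', database='agnes', charset='utf8')
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.project")
print("rows in dbo.project:", cursor.fetchone()[0])
cursor.execute("SELECT TOP 5 title, startDate, endDate FROM dbo.project")
for row in cursor.fetchall():
    print(row)
conn.close()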
