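In no time at all I used Python to download every novel on the entire site. Not bad, if I do say so myself!
Enough talk, let's get straight to it! A video is linked at the end of the post.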
What you need
Software
Modules used
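requests >>> pip install requests (sends the HTTP requests)
parsel >>> pip install parsel (parses the returned HTML)
(Enhancement) Add a search feature: search by novel title or author name
tqdm >>> pip install tqdm (shows a download progress bar)
pandas >>> pip install pandas (prints the search results in a tidier table)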
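Press Win + R, type cmd, then run the install command: pip install <module name>;
If the output turns red (an error), the likely cause is a network timeout; switch to a domestic mirror source;
Yellow text is only a warning and can be ignored.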
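After running the install commands, it can help to confirm that every module is actually importable before starting the script. The following is a minimal sketch using only the standard library; the helper name `find_missing` and the `REQUIRED_MODULES` list are illustrative assumptions, not part of the original script.

```python
import importlib.util

# Modules named in the list above; adjust if your script uses others.
REQUIRED_MODULES = ["requests", "parsel", "tqdm", "pandas"]


def find_missing(modules):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules, install with: pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```

If anything is reported as missing, rerun pip install for just those names.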
Download by entering the novel's name
To package the script as an exe program you need pyinstaller: in the Command Prompt window, run pip install pyinstaller
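The post does not show the packaging command itself, so here is one possible sketch. It builds the argument list for running PyInstaller through the current Python interpreter. The script name `novel_downloader.py` and the helper `build_pyinstaller_command` are hypothetical, and the choice of the `--onefile` option (a single bundled executable) is an assumption about what you want.

```python
import sys


def build_pyinstaller_command(script_path, one_file=True):
    """Build the argument list for running PyInstaller on `script_path`."""
    command = [sys.executable, "-m", "PyInstaller"]
    if one_file:
        command.append("--onefile")  # bundle everything into a single executable
    command.append(script_path)
    return command


if __name__ == "__main__":
    # "novel_downloader.py" is a hypothetical file name for the script in this post.
    cmd = build_pyinstaller_command("novel_downloader.py")
    print("Command to run:", " ".join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the build, provided pyinstaller is installed.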
The code
I'll put a video below; following along with it makes the explanation clearer.
# HTTP request module
import requests
# HTML parsing module
import parsel
# Regular expression module
import re
# pandas, used to print the search results as a table
import pandas as pd
# Progress bar module
from tqdm import tqdm
while True:
    key_word = input('Enter the name of the novel you want to download (enter 0 to quit): ')
    if key_word == '0':
        break
    search_url = f'https://www.***.com/search.php?q={key_word}'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'
    }
    response = requests.get(url=search_url, headers=headers)
    # print(response.text)
    selector_1 = parsel.Selector(response.text)
    divs = selector_1.css('.result-list div.result-item')
    # print(divs)
    if divs:
        lis = []
        for div in divs:
            novel_name = div.css('.result-game-item-title-link::attr(title)').get()  # novel title
            href = div.css('.result-game-item-title-link::attr(href)').get().split('/')[2]  # book ID
            author = div.css('.result-game-item-info p:nth-child(1) span:nth-child(2)::text').get()  # author
            # print(novel_name, href, author)
            dit = {
                'Title': novel_name,
                'Author': author,
                'Book ID': href,
            }
            lis.append(dit)
        print(f'Found {len(lis)} results, listed below')
        search_data = pd.DataFrame(lis)
        print(search_data)
        key_num = input('Enter the index of the novel you want to download: ')  # input() returns a string
        novel_id = lis[int(key_num)]['Book ID']
        url = f'https://www.***.com/book/{novel_id}/'
        # Reuse the headers defined above. They must be passed by keyword:
        # the second positional argument of requests.get is params, not headers.
        response = requests.get(url, headers=headers)
        # print(response.text)  # re.findall returns a list, e.g. ['天道修改器']
        novel_name = re.findall('<h1>(.*?)</h1>', response.text)[0]
        novel_info = re.findall('<dd><a href="(.*?)" >(.*?)</a></dd>', response.text)
        # print(novel_name)
        # print(novel_info)
        for novel_url, novel_title in tqdm(novel_info):
            # 'https://www.***e.com/book/60126/14362.html'
            novel_url = 'https://www.***.com' + novel_url
            # print(novel_url, novel_title)
            # 1. Send a request to the chapter URL we just built
            # url = 'https://www./book/60126/14362.html'
            response = requests.get(novel_url, headers=headers)  # <Response [200]> is a response object; 200 means success
            # 2. Get the data returned by the server
            # response.text is the response body as text (the page source)
            # print(response.text)
            # 3. Parse the data and extract what we want: the chapter title and the chapter text
            # xpath, css and re can all be used to extract the data
            selector = parsel.Selector(response.text)  # turn response.text into a Selector object
            # novel_title = selector.css('.bookname h1::text').get()  # get() returns the first match as a string
            # novel_title_1 = selector.xpath('//*[@class="bookname"]/h1/text()').get()  # same, using xpath
            novel_content_list = selector.css('#content::text').getall()  # getall() returns every match as a list
            # Join the list into one string, separated by newlines
            novel_content = '\n'.join(novel_content_list)
            # print(novel_title)
            # print(novel_title_1)
            # print(novel_content_list)
            # print(novel_content)
            # 4. Save the data
            # mode 'w' overwrites, 'a' appends to the end of the file, 'b' is binary mode
            # Resulting file layout:
            #   Chapter 1 xxx
            #   chapter text
            #   Chapter 2 xxx
            #   chapter text
            with open(novel_name + '.txt', mode='a', encoding='utf-8') as f:
                f.write(novel_title)
                f.write('\n')
                f.write(novel_content)
                f.write('\n')
            # print('Saving', novel_title)
    else:
        print('Please enter a valid novel title or author name / no data found for this book..')
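The script above relies on three small steps that are easy to try out offline: extracting chapter links with a regular expression, turning the typed index into a list position, and using the novel's title as a file name. The sketch below isolates them. The chapter pattern is the one from the script and assumes the site's markup; `pick_index` and `safe_filename` are hypothetical helpers added for illustration, and the sample HTML is made up.

```python
import re

# Same pattern as the script above; it assumes the site's chapter markup.
CHAPTER_PATTERN = re.compile(r'<dd><a href="(.*?)" >(.*?)</a></dd>')


def parse_chapters(html):
    """Return a list of (relative_url, chapter_title) tuples found in `html`."""
    return CHAPTER_PATTERN.findall(html)


def pick_index(raw, count):
    """Convert user input to a valid list index, or return None if it is invalid."""
    try:
        index = int(raw)
    except ValueError:
        return None
    return index if 0 <= index < count else None


def safe_filename(name):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()


if __name__ == "__main__":
    # Made-up sample page fragment in the shape the pattern expects.
    sample_html = (
        '<dd><a href="/book/60126/14362.html" >Chapter 1</a></dd>'
        '<dd><a href="/book/60126/14363.html" >Chapter 2</a></dd>'
    )
    print(parse_chapters(sample_html))
    print(pick_index("1", 2), pick_index("5", 2), pick_index("abc", 2))
    print(safe_filename('A/B: C?'))
```

In the script, checking the result of `pick_index` before indexing `lis` would avoid a crash on a mistyped number, and wrapping `novel_name` in `safe_filename` would avoid a failed `open` when a title contains characters such as `?` or `:`.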
Video walkthrough:
https://www.bilibili.com/video/BV1FT4y1X7B2/