
Python Programming: Building a Crawler to Scrape All the Girl Pictures from www.tmd86.com

 昵稱65365553 2019-07-17

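Readers who know a little programming will be aware that Python's mature networking libraries make it well suited to writing crawlers and AI programs.

Today I am sharing code that automatically crawls the girl pictures. It is under 100 lines, and it is really simple and quick.

Code begins: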

import os

import requests
from lxml import etree


def a():
    # Fetch the first list page to find out how many list pages exist.
    url = 'http://www./xinggan/'
    response = requests.get(url)
    html_ele = etree.HTML(response.text)
    # The pagination link at index 3 is assumed to hold the last page number.
    max_list = html_ele.xpath('//nav[@class="navigation pagination"]/div/a/text()')[3]
    # Visit every list page, from page 1 through the last page.
    for i in range(1, int(max_list) + 1):
        z_url = 'http://www./xinggan/list_{}.html/'.format(i)
        response = requests.get(z_url)
        html_ele = etree.HTML(response.text)
        li_ele_list = html_ele.xpath('//ul[@id="pins"]/li')
        # Each <li> is one album: read its link and title, then download it.
        for href_ele in li_ele_list:
            href_url = href_ele.xpath('./a/@href')[0]
            print(href_url)
            name = href_ele.xpath('./span/a/text()')[0]
            print(name)
            b(href_url, name)

def b(href_url, name):
    # Create a folder named after the album (directly under '/', as written).
    if not os.path.exists('/' + name):
        os.makedirs('/' + name)
    # Send the album URL as the Referer, together with a browser User-Agent.
    headers = {
        'Referer': str(href_url),
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
    }
    response = requests.get(href_url, headers=headers)
    html_ele = etree.HTML(response.text)
    # The second-to-last pagination link holds the album's last page number.
    xq_max_list = html_ele.xpath('//div[@class="pagenavi"]/a')[-2]
    max_list = xq_max_list.xpath('./span/text()')[0]
    # range() excludes its stop value, so add 1 to include the last page.
    for i in range(1, int(max_list) + 1):
        xq_url = str(href_url) + '/' + str(i)
        print(xq_url)
        response = requests.get(xq_url, headers=headers)
        html_ele = etree.HTML(response.text)
        # Each album page shows one image; take its source URL.
        src_page = html_ele.xpath('//div[@class="main-image"]/p/a/img/@src')
        src_page = src_page[0]
        print(src_page)
        # Use the last path segment of the image URL as the file name.
        tname = src_page.split('/')[-1]
        print(tname)
        response = requests.get(src_page, headers=headers)
        with open('/' + name + '/' + tname, 'wb') as f:
            f.write(response.content)

if __name__ == '__main__':
    a()


End of code. It runs efficiently, and it is so easy.
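
The crawler mixes URL building and file naming into its download loops, which makes the page-count arithmetic hard to check. As a rough sketch only, the same steps can be pulled into small pure functions that run without any network access. The function names, the `save_root` parameter, the file-name sanitizing step, and the `example.com` URLs are all assumptions made for illustration and are not part of the original code.

```python
import os
import re


def list_page_urls(base_url, last_page):
    """Return the URL of every list page, from page 1 through last_page."""
    # range() excludes its stop value, so the stop is last_page + 1.
    return ['{}list_{}.html'.format(base_url, i)
            for i in range(1, last_page + 1)]


def detail_page_urls(album_url, last_page):
    """Return the URL of every page inside one album, last page included."""
    base = album_url.rstrip('/')
    return ['{}/{}'.format(base, i) for i in range(1, last_page + 1)]


def image_path(save_root, album_name, src):
    """Build a local file path from an album title and an image URL."""
    # Hypothetical extra step: replace characters that are not allowed
    # in file names on common systems, since titles come from the page.
    safe_name = re.sub(r'[\\/:*?"<>|]', '_', album_name).strip()
    # As in the crawler, the file name is the last segment of the URL.
    filename = src.split('/')[-1]
    return os.path.join(save_root, safe_name, filename)
```

For example, `detail_page_urls('http://example.com/a/1234', 2)` yields the two URLs ending in `/1` and `/2`, and `image_path` keeps every album inside a chosen folder rather than the filesystem root.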
