

      Writing a Python crawler to download all the girl pictures from www.tmd86.com

       Posted by 昵稱65365553, 2019-07-17

      Readers who know a little programming will know that Python's mature networking libraries make it well suited to writing crawlers and to AI programming.

      Today I am sharing code that automatically crawls girl pictures. At under 100 lines, it is very simple and quick.

      Code begins:

      import requests

      from lxml import etree

      import os

      # Walk every listing page and pass each album link and title to b().
      def a():

          url = 'http://www./xinggan/'

          response = requests.get(url)

          # with open('.txt' , 'wb' ) as f :

          #     f.write(response.content)

          html_ele = etree.HTML(response.text)

          # li_ele_list = html_ele.xpath('//ul[@id="pins"]/li/a/@href')

          # print(li_ele_list)

          max_list = html_ele.xpath('//nav[@class="navigation pagination"]/div/a/text()')[3]

          # print(max_list)

          for i in range(1,int(max_list)+1):

              z_url = 'http://www./xinggan/list_{}.html/'.format(i)

              # print(z_url)

              response = requests.get(z_url)

              html_ele = etree.HTML(response.text)

              li_ele_list = html_ele.xpath('//ul[@id="pins"]/li')

              for href_ele in li_ele_list:

                  href_url = href_ele.xpath('./a/@href')[0]

                  print(href_url)

                  name = href_ele.xpath('./span/a/text()')[0]

                  print(name)

                  b(href_url, name)

              # break

      # Download every image of one album into a directory named after the album.
      def b(href_url, name):

          if not os.path.exists('/'+name):

              os.makedirs('/'+name)

          headers = {

          'Referer': str(href_url),

          'Upgrade-Insecure-Requests': '1',

          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',

          }

          # print(headers)

          response = requests.get(href_url,headers=headers)

          html_ele = etree.HTML(response.text)

          # print(html_ele)

          xq_max_list = html_ele.xpath('//div[@class="pagenavi"]/a')[-2]

          # print(xq_max_list)

          max_list = xq_max_list.xpath('./span/text()')[0]

          # print(max_list)

          # range() excludes its end value, so add 1 to include the last page.
          for i in range(1, int(max_list) + 1):

              xq_url = str(href_url)+'/'+str(i)

              print(xq_url)

              response = requests.get(xq_url,headers = headers)

              html_ele = etree.HTML(response.text)

              src_page = html_ele.xpath('//div[@class="main-image"]/p/a/img/@src')

              src_page = src_page[0]

              print(src_page)

              tname = src_page.split('/')[-1]

              print(tname)

              response = requests.get(src_page, headers=headers)

              with open( '/'+name+'/'+tname,'wb' ) as f:

                  f.write(response.content)

      if __name__ == '__main__':

          a()
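
      The script above uses the album title directly as a directory name and the last URL segment as the file name. Titles scraped from a page may contain characters that are not valid in paths, so the following is a minimal sketch of how those two steps could be made safer. `safe_dirname` and `image_filename` are hypothetical helpers introduced here for illustration; they are not part of the original script.

```python
import os
import re
from urllib.parse import urlsplit


def safe_dirname(name):
    """Replace path-unsafe characters and whitespace runs with '_' (hypothetical helper)."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', name).strip('_')
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or 'untitled'


def image_filename(src):
    """Return the last path segment of an image URL, ignoring any query string."""
    return os.path.basename(urlsplit(src).path)
```

      In b(), these could stand in for the raw `name` and for `src_page.split('/')[-1]`, assuming the album titles really do contain such characters.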


      End of code. It runs efficiently and is easy to write.
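
      The download step in the script writes whatever the server returns, including error pages. As a rough sketch of a more defensive version, the function below checks the status code and sets a timeout before saving. `save_image` is a hypothetical name, and the `get` parameter is an assumption made so the function can be tried without network access; in the script it would be `requests.get`.

```python
import os


def save_image(get, url, dest_dir, headers=None, timeout=10):
    """Fetch url with the supplied get callable and write the body into dest_dir.

    Returns the path written, or None when the server does not answer 200.
    """
    response = get(url, headers=headers, timeout=timeout)
    if response.status_code != 200:
        return None
    # Create the target directory only once there is something to save.
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, url.split('/')[-1])
    with open(path, 'wb') as f:
        f.write(response.content)
    return path
```

      A call such as `save_image(requests.get, src_page, name, headers=headers)` would replace the last three statements of the loop in b().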
