


      Pitfalls of installing the Python crawler framework Scrapy, and some thoughts beyond coding

       icecity1306 2016-01-14

      A friend recently asked me to scrape post information from a certain forum. I searched around for open-source crawlers, read a number of comparisons, and Scrapy looked like the best of them. But I had always worked in Java and PHP and knew no Python, so I spent a day skimming the basics of the language and got started. I did not expect that merely setting up a working environment would cost me a whole day. Below is a record of the pitfalls I hit installing and configuring Scrapy.

      Environment: a CentOS 6.0 virtual machine

      First, a Python runtime. I ran the python command and found one already installed — a (false) stroke of luck. A Google search of the install steps suggested pip install Scrapy; that failed. pip was missing, so I installed pip. pip install Scrapy again — now python-devel was missing. The whole morning went back and forth like that. I then downloaded the Scrapy source to install it, and it abruptly demanded Python 2.7. python --version showed 2.6, and a thousand grass-mud horses stampeded through my heart.

      The official documentation (http://doc./en/master/intro/install.html) confirmed it: Python 2.7 is required. Nothing for it but to upgrade Python.

      1. Upgrade Python

      • Download and build Python 2.7
      #wget https://www./ftp/python/2.7.10/Python-2.7.10.tgz
      #tar -zxvf Python-2.7.10.tgz
      #cd Python-2.7.10
      #./configure  
      #make all             
      #make install  
      #make clean  
      #make distclean
      • Check the Python version
      #python --version

      Still 2.6.

      • Repoint the python command at the new build
      #mv /usr/bin/python /usr/bin/python2.6.6_bak
      #ln -s /usr/local/bin/python2.7 /usr/bin/python
      • Check the version again
      # python --version
      Python 2.7.10
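As a cross-check, the interpreter can also verify itself from inside Python; a minimal sketch (syntax valid on both Python 2 and 3):

```python
import sys

# Scrapy at the time required Python 2.7; fail fast on anything older.
# sys.version_info compares like a tuple, so (2, 6) < (2, 7).
if sys.version_info < (2, 7):
    raise SystemExit("Python 2.7+ required, found %d.%d" % sys.version_info[:2])
print("interpreter OK: %d.%d" % sys.version_info[:2])
```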

      With Python upgraded, on with installing Scrapy. pip install scrapy — error again:

      Collecting Twisted>=10.0.0 (from scrapy)
        Could not find a version that satisfies the requirement Twisted>=10.0.0 (from scrapy) (from versions: )
      No matching distribution found for Twisted>=10.0.0 (from scrapy)

      Twisted was missing, so the next step was to install it.

      2. Install Twisted

      • Download Twisted (https://pypi./packages/source/T/Twisted/Twisted-15.2.1.tar.bz2#md5=4be066a899c714e18af1ecfcb01cfef7)
      • Unpack and install
      tar -jxvf Twisted-15.2.1.tar.bz2
      cd Twisted-15.2.1
      python setup.py install
      
      • Check that the install worked
      python
      Python 2.7.10 (default, Jun  5 2015, 17:56:24) 
      [GCC 4.4.4 20100726 (Red Hat 4.4.4-13)] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import twisted
      >>>
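Rather than probing imports one by one in the REPL, a short script can check several of Scrapy's import-time dependencies in one pass (the module list here is illustrative, not exhaustive):

```python
import importlib

# Packages Scrapy pulls in at import time; extend as needed.
deps = ["twisted", "lxml", "OpenSSL", "cryptography"]

missing = []
for name in deps:
    try:
        importlib.import_module(name)
    except ImportError:
        missing.append(name)

print("missing:", ", ".join(missing) if missing else "none")
```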

      That confirmed Twisted was installed. On with pip install scrapy — another error.

      3. Install libxslt, libxml2 and xslt-config

      Collecting libxlst
        Could not find a version that satisfies the requirement libxlst (from versions: )
      No matching distribution found for libxlst
      Collecting libxml2
        Could not find a version that satisfies the requirement libxml2 (from versions: )
      No matching distribution found for libxml2
      These are C libraries, not Python packages, so pip cannot install them; build them from source instead:

      wget http://xmlsoft.org/sources/libxslt-1.1.28.tar.gz
      tar -zxvf libxslt-1.1.28.tar.gz
      cd libxslt-1.1.28/
      ./configure
      make
      make install
      wget ftp://xmlsoft.org/libxml2/libxml2-git-snapshot.tar.gz
      tar -zxvf libxml2-git-snapshot.tar.gz
      cd libxml2-2.9.2/
      ./configure
      make
      make install

      With those built, pip install scrapy again — the lucky star had still not arrived:

      4. Install cryptography

      Failed building wheel for cryptography

      Download cryptography (https://pypi./packages/source/c/cryptography/cryptography-0.4.tar.gz)

      Install:

      cd cryptography-0.4
      python setup.py build
      python setup.py install

      The install failed with:

      No package 'libffi' found

      So download and build libffi:

      wget ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz
      tar -zxvf libffi-3.2.1.tar.gz
      cd libffi-3.2.1
      ./configure
      make
      make install

      But it still failed afterwards:

      Package libffi was not found in the pkg-config search path.
          Perhaps you should add the directory containing `libffi.pc'
          to the PKG_CONFIG_PATH environment variable
          No package 'libffi' found

      So set PKG_CONFIG_PATH:

      export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH

      Install Scrapy again:

      pip install scrapy

      Where had Lady Luck gone?

      ImportError: libffi.so.6: cannot open shared object file: No such file or directory

      So:

      whereis libffi
      libffi: /usr/local/lib/libffi.a /usr/local/lib/libffi.la /usr/local/lib/libffi.so

      The library itself was installed correctly. Some searching revealed that LD_LIBRARY_PATH was not set, so:

      export LD_LIBRARY_PATH=/usr/local/lib

      Then back to installing cryptography-0.4:

      python setup.py build
      python setup.py install

      This time it installed with no errors.
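Before rebuilding anything, ctypes gives a quick probe of whether the dynamic loader can actually find a library such as libffi (a sketch; find_library takes the library name without the lib prefix):

```python
import ctypes
import ctypes.util

# find_library searches roughly the same places as the runtime linker,
# so it approximates what importing cffi/cryptography will see.
name = ctypes.util.find_library("ffi")
if name:
    ctypes.CDLL(name)  # raises OSError if the .so cannot actually load
    print("libffi loadable as", name)
else:
    print("libffi not found on the loader path")
```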

      5. Install Scrapy, continued

      pip install scrapy

      Watching the output:

      Building wheels for collected packages: cryptography
        Running setup.py bdist_wheel for cryptography

      It paused here for a long while — was Lady Luck finally on her way? After a wait:

      Requirement already satisfied (use --upgrade to upgrade): zope.interface>=3.6.0 in /usr/local/lib/python2.7/site-packages/zope.interface-4.1.2-py2.7-linux-i686.egg (from Twisted>=10.0.0->scrapy)
      Collecting cryptography>=0.7 (from pyOpenSSL->scrapy)
        Using cached cryptography-0.9.tar.gz
      Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/local/lib/python2.7/site-packages (from zope.interface>=3.6.0->Twisted>=10.0.0->scrapy)
      Requirement already satisfied (use --upgrade to upgrade): idna in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
      Requirement already satisfied (use --upgrade to upgrade): pyasn1 in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
      Requirement already satisfied (use --upgrade to upgrade): enum34 in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
      Requirement already satisfied (use --upgrade to upgrade): ipaddress in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
      Requirement already satisfied (use --upgrade to upgrade): cffi>=0.8 in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
      Requirement already satisfied (use --upgrade to upgrade): ordereddict in /usr/local/lib/python2.7/site-packages (from enum34->cryptography>=0.7->pyOpenSSL->scrapy)
      Requirement already satisfied (use --upgrade to upgrade): pycparser in /usr/local/lib/python2.7/site-packages (from cffi>=0.8->cryptography>=0.7->pyOpenSSL->scrapy)
      Building wheels for collected packages: cryptography
        Running setup.py bdist_wheel for cryptography
        Stored in directory: /root/.cache/pip/wheels/d7/64/02/7258f08eae0b9c930c04209959c9a0794b9729c2b64258117e
      Successfully built cryptography
      Installing collected packages: cryptography
        Found existing installation: cryptography 0.4
          Uninstalling cryptography-0.4:
            Successfully uninstalled cryptography-0.4
      Successfully installed cryptography-0.9

      At that output, tears streamed down my face. Thanks CCAV, thanks MTV, and all the rest of the acceptance speech — it had finally installed successfully.

      6. Test Scrapy

      Create a test spider:

      cat > myspider.py <<EOF
      from scrapy import Spider, Item, Field
      class Post(Item):
        title = Field()
      class BlogSpider(Spider):
        name, start_urls = 'blogspider', ['http://www.cnblogs.com/rwxwsblog/']
        def parse(self, response):
          return [Post(title=e.extract()) for e in response.css("h2 a::text")]
      EOF
      

      Check that the script runs:

      scrapy runspider myspider.py
      2015-06-06 20:25:16 [scrapy] INFO: Scrapy 1.0.0rc2 started (bot: scrapybot)
      2015-06-06 20:25:16 [scrapy] INFO: Optional features available: ssl, http11
      2015-06-06 20:25:16 [scrapy] INFO: Overridden settings: {}
      2015-06-06 20:25:16 [py.warnings] WARNING: :0: UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'.  Please install it from <https://pypi./pypi/service_identity> and make sure all of its dependencies are satisfied.  Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification.  Many valid certificate/hostname mappings may be rejected.
      
      2015-06-06 20:25:16 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
      2015-06-06 20:25:16 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
      2015-06-06 20:25:16 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
      2015-06-06 20:25:16 [scrapy] INFO: Enabled item pipelines: 
      2015-06-06 20:25:16 [scrapy] INFO: Spider opened
      2015-06-06 20:25:16 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
      2015-06-06 20:25:16 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
      2015-06-06 20:25:17 [scrapy] DEBUG: Crawled (200) <GET http://www.cnblogs.com/rwxwsblog/> (referer: None)
      2015-06-06 20:25:17 [scrapy] INFO: Closing spider (finished)
      2015-06-06 20:25:17 [scrapy] INFO: Dumping Scrapy stats:
      {'downloader/request_bytes': 226,
       'downloader/request_count': 1,
       'downloader/request_method_count/GET': 1,
       'downloader/response_bytes': 5383,
       'downloader/response_count': 1,
       'downloader/response_status_count/200': 1,
       'finish_reason': 'finished',
       'finish_time': datetime.datetime(2015, 6, 6, 12, 25, 17, 310084),
       'log_count/DEBUG': 2,
       'log_count/INFO': 7,
       'log_count/WARNING': 1,
       'response_received_count': 1,
       'scheduler/dequeued': 1,
       'scheduler/dequeued/memory': 1,
       'scheduler/enqueued': 1,
       'scheduler/enqueued/memory': 1,
       'start_time': datetime.datetime(2015, 6, 6, 12, 25, 16, 863599)}
      2015-06-06 20:25:17 [scrapy] INFO: Spider closed (finished)

      It ran fine (quiet rejoicing, ^_^).
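What response.css("h2 a::text") does in the spider above can be sketched with the standard library alone; a toy extractor (Python 3 stdlib, purely for illustration — Scrapy uses real CSS selectors over lxml):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text of <a> tags that sit inside an <h2>."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.h2_depth = 0   # are we inside an <h2>?
        self.a_depth = 0    # ...and inside an <a> within it?
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.h2_depth += 1
        elif tag == "a" and self.h2_depth:
            self.a_depth += 1

    def handle_endtag(self, tag):
        if tag == "h2" and self.h2_depth:
            self.h2_depth -= 1
        elif tag == "a" and self.a_depth:
            self.a_depth -= 1

    def handle_data(self, data):
        if self.a_depth:
            self.titles.append(data.strip())

p = TitleExtractor()
p.feed('<h2><a href="/p/1">First post</a></h2><h2><a href="/p/2">Second</a></h2>')
print(p.titles)  # → ['First post', 'Second']
```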

      7. Create a Scrapy project of my own (in a new shell session)

      scrapy startproject tutorial

      which output:

      Traceback (most recent call last):
        File "/usr/local/bin/scrapy", line 9, in <module>
          load_entry_point('Scrapy==1.0.0rc2', 'console_scripts', 'scrapy')()
        File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 552, in load_entry_point
          return get_distribution(dist).load_entry_point(group, name)
        File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2672, in load_entry_point
          return ep.load()
        File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2345, in load
          return self.resolve()
        File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2351, in resolve
          module = __import__(self.module_name, fromlist=['__name__'], level=0)
        File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/__init__.py", line 48, in <module>
          from scrapy.spiders import Spider
        File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/spiders/__init__.py", line 10, in <module>
          from scrapy.http import Request
        File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/http/__init__.py", line 11, in <module>
          from scrapy.http.request.form import FormRequest
        File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/http/request/form.py", line 9, in <module>
          import lxml.html
        File "/usr/local/lib/python2.7/site-packages/lxml/html/__init__.py", line 42, in <module>
          from lxml import etree
      ImportError: /usr/lib/libxml2.so.2: version `LIBXML2_2.9.0' not found (required by /usr/local/lib/python2.7/site-packages/lxml/etree.so)

      Countless grass-mud horses stampeded again. Why was it broken now? Had it learned magic tricks? Looking calmly at the message — ImportError: /usr/lib/libxml2.so.2: version `LIBXML2_2.9.0' not found (required by /usr/local/lib/python2.7/site-packages/lxml/etree.so) — it felt oddly familiar. Wasn't this just like the earlier ImportError: libffi.so.6: cannot open shared object file: No such file or directory? The new session did not have the environment variable. So:

      8. Add the environment variable

      export LD_LIBRARY_PATH=/usr/local/lib

      Run again:

      scrapy startproject tutorial

      Output:

      [root@bogon scrapy]# scrapy startproject tutorial
      2015-06-06 20:35:43 [scrapy] INFO: Scrapy 1.0.0rc2 started (bot: scrapybot)
      2015-06-06 20:35:43 [scrapy] INFO: Optional features available: ssl, http11
      2015-06-06 20:35:43 [scrapy] INFO: Overridden settings: {}
      New Scrapy project 'tutorial' created in:
          /root/scrapy/tutorial
      
      You can start your first spider with:
          cd tutorial
          scrapy genspider example example.com

      Success at last. Evidently Scrapy needs LD_LIBRARY_PATH at runtime, so it is worth making the setting permanent:

      vi /etc/profile

      Add the line export LD_LIBRARY_PATH=/usr/local/lib (the earlier PKG_CONFIG_PATH setting is worth adding too: export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH)

      Save, then reload the profile and check for errors:

      source /etc/profile

      Open a new session and run:

      scrapy runspider myspider.py

      It ran normally, so the LD_LIBRARY_PATH setting is in effect. With that, Scrapy is properly installed.
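One line of Python also confirms, from inside the new session, that the variable actually reached the process:

```python
import os

# The dynamic loader reads LD_LIBRARY_PATH at process start when it
# resolves shared objects such as libxml2.so.2 and libffi.so.6.
print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "(not set)"))
```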

      Check the Scrapy version: scrapy version reports "Scrapy 1.0.0rc2".

      9. Thoughts beyond coding (thank you for reading this far — even I am a little dizzy)

        • Is there a better way to install all this? Is my approach flawed? If so, please tell me. (Many dependencies would install with neither pip nor easy_install; I suspect the package index configured in pip's config file was the problem.)
        • Always read the official documentation. Google and Baidu results tend to be fragmentary and incomplete; the docs save detours and unnecessary work.
        • When you hit a problem, think first about what the problem actually is before turning to Google or Baidu.
        • Write up solved problems as documentation — it helps you and everyone after you.

      10. References

      http:///

      http://doc./en/master/

      http://blog.csdn.net/slvher/article/details/42346887

      http://blog.csdn.net/niying/article/details/27103081

      http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html
