urllib: a module for working with URLs

Example: from urllib import request

request.urlopen() fetches a web page and returns a response object

Example: r = request.urlopen('http://www.baidu.com')

         r.read() returns the entire HTML of the page as bytes

         r.readline() returns a single line of the page's HTML

         r.readlines() returns the entire HTML as a list of lines
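
The three read methods can be compared side by side. The following is a minimal sketch, not a definitive recipe: the three-line page and the data: URL that carries it are assumptions, chosen only so the example runs without network access. A response from a real http:// URL is read the same way.

```python
from urllib import parse, request

# Hypothetical three-line page, packed into a data: URL so the sketch
# runs offline.
html = "<html>\n<body>hello</body>\n</html>\n"
url = "data:text/html," + parse.quote(html)

with request.urlopen(url) as r:
    everything = r.read()        # whole body as one bytes object

with request.urlopen(url) as r:
    first_line = r.readline()    # one line, trailing newline included
    remaining = r.readlines()    # list of the lines not yet consumed

print(everything.decode("utf-8"))
print(first_line)
print(remaining)
```

Note that all three methods return bytes, so call .decode() before treating the result as text, and that each call consumes the response: after read() there is nothing left for readline() to return.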

r.info() returns the HTTP response headers

Example: Bdpagetype: 1

         Bdqid: 0xdefca0ed00016e4e

         Cache-Control: private

         Content-Type: text/html

         Cxy_all: baidu+430f78810ad19b54eeba51879195387f

         Date: Thu, 22 Nov 2018 03:35:22 GMT

         Expires: Thu, 22 Nov 2018 03:34:29 GMT

         P3p: CP=" OTI DSP COR IVA OUR IND COM "

         Server: BWS/1.1

         Set-Cookie: BAIDUID=22A080159BC1AB4FE37099B28FE016B6:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com

         Set-Cookie: BIDUPSID=22A080159BC1AB4FE37099B28FE016B6; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com

         Set-Cookie: PSTM=1542857722; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com

         Set-Cookie: delPer=0; path=/; domain=.baidu.com

         Set-Cookie: BDSVRTM=0; path=/

         Set-Cookie: BD_HOME=0; path=/

         Set-Cookie: H_PS_PSSID=1467_21100_18559; path=/; domain=.baidu.com

         Vary: Accept-Encoding

         X-Ua-Compatible: IE=Edge,chrome=1

         Connection: close

         Transfer-Encoding: chunked

r.getcode() returns the HTTP status code of the response

Example: 200
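
To try info() and getcode() without depending on an external site, the sketch below serves one small page from a local test server. The handler class, the page body, and the use of an opener with an empty ProxyHandler (so a system proxy setting cannot intercept the loopback request) are all assumptions made for illustration; opener.open() returns the same kind of response object as request.urlopen().

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with one small HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Empty ProxyHandler: talk to the loopback address directly.
opener = request.build_opener(request.ProxyHandler({}))
with opener.open(url) as r:
    status = r.getcode()                    # same value as r.status
    headers = r.info()                      # same object as r.headers
    content_type = headers["Content-Type"]  # look up a single header
    header_names = headers.keys()

server.shutdown()
server.server_close()

print(status)
print(content_type)
print(header_names)
```

Individual headers are read by name from the object that info() returns, which is usually more convenient than printing the whole block.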

request.urlretrieve() fetches a page and writes it to a file

Example: r = request.urlretrieve('http://www.baidu.com',filename='./2.html')

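urlretrieve() can leave temporary files behind (it creates one when no filename is given); to remove them, call urlcleanup(). It is a function of the request module, not a method of the value returned by urlretrieve()

Example: request.urlcleanup()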
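
The behaviour of urlretrieve() with and without a filename, and the effect of urlcleanup(), can be sketched as follows. The data: URL and the temporary target directory are assumptions that stand in for a real page and a real output path, so that the example runs offline.

```python
import os
import tempfile
from urllib import request

url = "data:text/html,<p>hello</p>"  # offline stand-in for a real page

# With filename=..., the body is written to that path.
target = os.path.join(tempfile.mkdtemp(), "page.html")
saved_path, headers = request.urlretrieve(url, filename=target)

# Without filename, urlretrieve() creates a temporary file instead.
temp_path, _ = request.urlretrieve(url)
temp_existed = os.path.exists(temp_path)

# urlcleanup() removes the temporary files created by urlretrieve().
request.urlcleanup()

print(saved_path, os.path.exists(saved_path))
print(temp_existed, os.path.exists(temp_path))
```

urlretrieve() returns a (filename, headers) tuple, which is why the cleanup call has to go through the module. A file saved under an explicit filename is not touched by urlcleanup().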

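quote() percent-encodes a string so that it can be used safely in a URL (the function is documented in urllib.parse; request.quote refers to the same function)

Example: s = request.quote('http://www.baidu.com')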

         print(s)

         http%3A//www.baidu.com

unquote() decodes a percent-encoded string back to the original text
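
A short round trip makes the two functions concrete. This sketch imports them from urllib.parse, their documented home; the non-ASCII sample string is an assumption added for illustration.

```python
from urllib import parse

original = "http://www.baidu.com"

encoded = parse.quote(original)            # "/" is left alone by default
decoded = parse.unquote(encoded)
print(encoded)                             # http%3A//www.baidu.com
print(decoded)                             # http://www.baidu.com

# safe="" escapes "/" as well.
fully_encoded = parse.quote(original, safe="")
print(fully_encoded)                       # http%3A%2F%2Fwww.baidu.com

# Non-ASCII text is encoded as UTF-8 bytes, then percent-escaped.
keyword = parse.quote("中文")
print(keyword)                             # %E4%B8%AD%E6%96%87
print(parse.unquote(keyword))              # 中文
```

In practice quote() is applied to a single path segment or query value, not to a complete URL, because it also escapes the ":" after the scheme.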

build_opener().addheaders: an attribute holding a list of (name, value) tuples that sets the request headers sent by the opener

Example: from urllib import request as sa

         url = 'https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/84302896'

         headers = ('User-Agent','Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.6776.400 QQBrowser/10.3.2601.400')

         dds = sa.build_opener()

         dds.addheaders = [headers]

         d   = dds.open(url).read()

         with open('./1.html','wb') as f:

             f.write(d)
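
As an alternative to changing the opener, headers can also be attached to a single request through request.Request. The sketch below only builds and inspects the request, so it runs offline; the shortened User-Agent value and the fetch() helper are hypothetical names introduced for illustration.

```python
from urllib import request

url = "https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/84302896"
user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64)"  # shortened, illustrative

# Per-request headers: pass a dict to Request.
req = request.Request(url, headers={"User-Agent": user_agent})

# Request stores header names capitalized, so the key is "User-agent".
stored_agent = req.get_header("User-agent")
print(stored_agent)
print(req.full_url)


def fetch(prepared):
    """Hypothetical helper: send the prepared request, return the body."""
    with request.urlopen(prepared) as r:
        return r.read()
```

Calling fetch(req) performs the actual download with the custom header. Headers set on a Request apply to that request only, whereas addheaders applies to every request made through the opener.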
