
「Python」Urllib

toyiye 2024-06-21 12:40

Basic Usage

import urllib.request

url = "https://ssr1.scrape.center/"

# Simulate a browser sending a request to the server
response = urllib.request.urlopen(url)

# Get the page source from the response
# read() returns the raw body as bytes
# Decode the bytes into a string with decode('<encoding>')
content = response.read().decode('utf-8')

# Print the data
print(content)
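
As a side note, the object returned by urlopen is also a context manager; a minimal sketch using a with statement so the connection is closed automatically:

import urllib.request

url = "https://ssr1.scrape.center/"

# The with statement closes the underlying connection automatically,
# even if reading or decoding raises an exception
with urllib.request.urlopen(url) as response:
    content = response.read().decode('utf-8')

print(content)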

One Type and Three Methods

"""  
一个类型和三个方法  
  
Author:binxin  
Date:2023/11/19 18:41  
"""  
import urllib.request  
  
url = "https://ssr1.scrape.center/"  
  
# 模拟浏览器向服务器发送请求  
response = urllib.request.urlopen(url)  
  
# response类型为HTTPResponse  
print(type(response))  
  
# 一个字节一个字节读取  
# content=response.read()  
# print(content)  
  
# 返回5个字节  
# content = response.read(5)  
# print(content)  
  
# 读取一行  
# content = response.readline()  
# print(content)  
  
# 按行读取所有内容  
# content=response.readlines()  
# print(content)  
  
# 返回状态码  
# print(response.getcode())  
  
# 返回url地址  
# print(response.geturl())  
  
# 获取状态信息  
print(response.getheaders())
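
For reference, modern Python also exposes the same information as attributes on the response object; a small sketch of the attribute-style equivalents:

import urllib.request

response = urllib.request.urlopen("https://ssr1.scrape.center/")

# Attribute-style equivalents of getcode(), geturl() and getheaders()
print(response.status)                    # status code, e.g. 200
print(response.url)                       # final URL after any redirects
print(response.headers['Content-Type'])   # a single header field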

Download

"""
下载

Author:binxin
Date:2023/11/20 20:19
"""
import urllib.request

# 下载网页
# url_page = "https://ssr1.scrape.center/"

# url代表下载路径 filename文件名
# urllib.request.urlretrieve(url_page, 'scrape.html')

# 下载图片
# url_img = 'https://p0.meituan.net/movie/ce4da3e03e655b5b88ed31b5cd7896cf62472.jpg@464w_644h_1e_1c'

# urllib.request.urlretrieve(url_img,'scrape.png')

# 下载视频
url_video = 'https://vd2.bdstatic.com/mda-pkjg3j1629re4z2h/720p/h264/1700480141426778560/mda-pkjg3j1629re4z2h.mp4?v_from_s=hkapp-haokan-hbe&auth_key=1700494480-0-0-98fb46f2a2d69b62592d1344d6ee60b0&bcevod_channel=searchbox_feed&pd=1&cr=2&cd=0&pt=3&logid=2080435259&vid=15565599946852966896&klogid=2080435259&abtest='

urllib.request.urlretrieve(url_video,'bilibili.mp4')
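
urlretrieve is documented as a legacy interface that may be deprecated in the future, so a streaming download built from urlopen is worth knowing; a minimal sketch using the image URL from above:

import shutil
import urllib.request

url_img = 'https://p0.meituan.net/movie/ce4da3e03e655b5b88ed31b5cd7896cf62472.jpg@464w_644h_1e_1c'

# Stream the response body straight into a local file
with urllib.request.urlopen(url_img) as response, open('scrape.png', 'wb') as fp:
    shutil.copyfileobj(response, fp)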

Customizing the Request Object

  1. UA: short for User Agent. It is a special request-header string that lets the server identify the client's operating system and version, CPU type, browser and version, browser kernel, rendering engine, browser language, browser plugins, and so on.
"""  
请求对象的定制  
  
Author:binxin  
Date:2023/11/20 20:37  
"""  
import urllib.request  
  
url = 'https://www.baidu.com'  
  
headers = {  
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'  
}  
  
# 请求对象的定制  
request = urllib.request.Request(url=url, headers=headers)  
  
response = urllib.request.urlopen(request)  
  
content = response.read().decode('utf-8')  
  
print(content)
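
Headers can also be attached after the Request object is created; a small sketch using Request.add_header, which has the same effect as passing headers= to the constructor:

import urllib.request

request = urllib.request.Request('https://www.baidu.com')

# Same effect as the headers= argument above
request.add_header('User-Agent',
                   'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36')

response = urllib.request.urlopen(request)
print(response.status)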

Encoding and Decoding

  1. GET requests: urllib.parse.quote() — see the round-trip sketch of quote/urlencode after this list.

    """
    get请求的quote方法

    Author:binxin
    Date:2023/11/21 11:10
    """
    import urllib.request
    import urllib.parse

    # 获取 https://www.bing.com/search?q=周杰伦 网页源码
    url = 'https://www.bing.com/search?q='

    headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
    }

    # 将 周杰伦 变为unicode编码格式
    # 依赖于 urllib.prasename = urllib.parse.quote('周杰伦')

    url = url + name

    request = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(request)

    # 获取响应内容
    content = response.read().decode('utf-8')
    print(content)
  2. GET requests: urllib.parse.urlencode()

    """
    get请求urlencode方法

    Author:binxin
    Date:2023/11/21 11:26
    """
    import urllib.parse
    import urllib.request

    # 应用场景:多个参数
    # https://www.baidu.com/s?wd=周杰伦&sex=男

    # data = {
    # 'wd': '周杰伦',
    # 'sex': '男',
    # 'location': '中国台湾省'
    # }
    #
    # a = urllib.parse.urlencode(data)
    #
    # print(a)

    # 获取网页源码
    url = 'https://www.baidu.com/s?'

    data = {
    'wd': '周杰伦',
    'sex': '男',
    'location': '中国台湾省'
    }

    data = urllib.parse.urlencode(data)

    url = url + data

    headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
    }

    request = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    print(content)

  3. POST requests

    """
    post请求

    Author:binxin
    Date:2023/11/21 16:26
    """
    import urllib.request
    import urllib.parse

    url = 'https://fanyi.baidu.com/sug'

    headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
    }

    data = {
    'kw': 'spider'
    }

    # post请求的参数 必须要编码
    data = urllib.parse.urlencode(data).encode('utf-8')

    # post请求的参数需要放在请求对象定制的参数中
    request = urllib.request.Request(url=url, data=data, headers=headers)

    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')

    # 字符串 --> json
    import json

    obj = json.loads(content)
    print(obj)
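
To make the difference between quote and urlencode concrete, here is a short round-trip sketch; the encoded values in the comments are ordinary UTF-8 percent-encoding:

import urllib.parse

# quote() percent-encodes a single string
print(urllib.parse.quote('周杰伦'))
# %E5%91%A8%E6%9D%B0%E4%BC%A6

# urlencode() builds a whole key=value&... query string from a dict
query = urllib.parse.urlencode({'wd': '周杰伦', 'sex': '男'})
print(query)
# wd=%E5%91%A8%E6%9D%B0%E4%BC%A6&sex=%E7%94%B7

# Both directions are reversible
print(urllib.parse.unquote('%E5%91%A8%E6%9D%B0%E4%BC%A6'))  # 周杰伦
print(urllib.parse.parse_qs(query))  # {'wd': ['周杰伦'], 'sex': ['男']}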

"""
百度详细翻译

Author:binxin
Date:2023/11/21 16:40
"""
import urllib.request
import urllib.parse

url = 'https://fanyi.baidu.com/v2transapi?from=en&to=zh'

headers = {
    # 'Accept': '*/*',

    # 该行必须注释
    # 'Accept-Encoding': 'gzip, deflate, br',

    # 'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    # 'Acs-Token': '1700554896797_1700555095251_haVFbhHVYodef+ABK34At5fE26z1GIBgK/5G/9f1yFzBA9YZ1QBKmV3/5uvNRmnNbLoH37UeOdciaPkcXzg5nmb1SdilOIoDEuDPRQ2slMYnAf3HxoiAcNv87L9UmtSu32wpt1gf6xcHChw9O+cWpCvdMwz7i/VaxfVqHVw48APnJ3YDedj4kLU2Jb2zKS82NpTXClm9oWXH0KQKQCjFl/91IfozQPxc6FWDsbnlmYi53NwunPzGIbuKLED+FcoLVE5yHDnClCH4SpyoC1oy03PPeEPUPxjP4ZBnaIro9NVDwqkcbWs1zuaMjwtXtQjXfKBDQtFoH3+a1KQgm2Z1BskaL1D85Z8FmrQvl+p8fjm9FZc/TM49BFjN3vrgzy+orf1Bk5PtTZFpYqMQ/qyEkLFNF5NV/BTdOzXhYB5If++Zth5IdGEiewM8xmLbikmnHuQG/2MFze06MR2wTNLM1ddbsHGXduBL1SZT1BH3Nhs=',
    # 'Connection': 'keep-alive',
    # 'Content-Length': '153',
    # 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': 'REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; PSTM=1690027568; ZFY=hJHsAJcm:BzM1AXD9a0vLfFZGJrzgW2kpVMqv0v6Ps1o:C; H_WISE_SIDS=234020_216844_213353_214793_110085_244716_257731_257015_260234_253022_259300_261715_236312_256419_265302_265881_266361_265776_267288_267371_266846_267421_265615_267405_265986_256302_266188_267898_259033_266713_268406_268593_268030_268842_259643_269232_269388_268766_188333_269730_269832_269904_269803_269049_267066_256739_270460_270534_267528_270625_270664_270548_270922_270966_271039_268874_270793_271169_271175_271193_268728_269771_267782_268987_269034_271229_269621_267659_271319_265032_269892_266027_270482_269609_270102_271608_270876_270443_269785_270157_271671_271985_271813_271957_271954_271943_256151_269211_234295_234207_266324_271187_272225_270055_272279_263618_267596_272055_272366_272008_272337_267559_272460_271145_8000076_8000108_8000124_8000136_8000159_8000164_8000168_8000177_8000179_8000186_8000203; BAIDU_WISE_UID=wapp_1692517164729_638; __bid_n=18a11e266418064ed3a010; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1692497260,1692755924,1693024343,1693128438; H_WISE_SIDS_BFESS=234020_216844_213353_214793_110085_244716_257731_257015_260234_253022_259300_261715_236312_256419_265302_265881_266361_265776_267288_267371_266846_267421_265615_267405_265986_256302_266188_267898_259033_266713_268406_268593_268030_268842_259643_269232_269388_268766_188333_269730_269832_269904_269803_269049_267066_256739_270460_270534_267528_270625_270664_270548_270922_270966_271039_268874_270793_271169_271175_271193_268728_269771_267782_268987_269034_271229_269621_267659_271319_265032_269892_266027_270482_269609_270102_271608_270876_270443_269785_270157_271671_271985_271813_271957_271954_271943_256151_269211_234295_234207_266324_271187_272225_270055_272279_263618_267596_272055_272366_272008_272337_267559_272460_271145_8000076_8000108_8000124_8000136_8000159_8000164_8000168_8000177_8000179_8000186_8000203; APPGUIDE_10_6_5=1; APPGUIDE_10_6_6=1; BAIDUID_BFESS=EBC2B0F02DE54CA945DEC2A522C58DC0:FG=1; APPGUIDE_10_6_7=1; APPGUIDE_10_6_9=1; BDUSS=VBqWS16dlBxNHBaQjA4c04yM2Z3S1hJVEp2MGY4YWpCenZ0flZMVnZwSlZobjFsRVFBQUFBJCQAAAAAAQAAAAEAAAA1wVhn1r3Uwsz9xM~J-QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFX5VWVV-VVlU; BDUSS_BFESS=VBqWS16dlBxNHBaQjA4c04yM2Z3S1hJVEp2MGY4YWpCenZ0flZMVnZwSlZobjFsRVFBQUFBJCQAAAAAAQAAAAEAAAA1wVhn1r3Uwsz9xM~J-QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFX5VWVV-VVlU; BIDUPSID=EBC2B0F02DE54CA945DEC2A522C58DC0; H_PS_PSSID=39669_39663_39676_39678_39710_39713_39749_39674_39785_39703_39793_39682; BA_HECTOR=248l252k80ak00240l2080051ilmkii1r; RT="z=1&dm=baidu.com&si=82643ae1-aaab-49e1-b3e4-a3c61e4bb037&ss=lp6w2pjo&sl=1&tt=1s7&bcn=https%3A%2F%2Ffclog.baidu.com%2Flog%2Fweirwood%3Ftype%3Dperf&ld=2ui&ul=smo&hd=sn6"; ab_sr=1.0.1_YTNiYWUxNWM0YmU5MWRlMDdjNWY2MThlOTA0NDEzNmEwM2FhNTFkNzRiYzVkMjI4YTdjNjI5MTU5OWZlNzk4ZDU3NmViNmMzMjhlNTk2ZTI0ZDUzMTQzMTQzZTJiYWNiODBmOTVkYzVkOGQ1NWY1MGY2NDNlNTBmYzk4Njg1OWU5Y2IyZTA2OWRmYjQ4MjRhYWM2MWFiN2FkYTRhYjM5Y2NjMmE1NmYwMzFiMTgxNGQ1YjdjMGEwYzczZWU2NWMy',
    # 'Dnt': '1',
    # 'Host': 'fanyi.baidu.com',
    # 'Origin': 'https://fanyi.baidu.com',
    # 'Referer': 'https://fanyi.baidu.com/',
    # 'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Microsoft Edge";v="120"',
    # 'Sec-Ch-Ua-Mobile': '?0',
    # 'Sec-Ch-Ua-Platform': 'Windows',
    # 'Sec-Fetch-Dest': 'empty',
    # 'Sec-Fetch-Mode': 'cors',
    # 'Sec-Fetch-Site': 'same-origin',
    # 'Sec-Gpc': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0',
    # 'X-Requested-With': 'XMLHttpRequest'
}

data = {
    'from': 'en',
    'to': 'zh',
    'query': 'spider',
    'transtype': 'realtime',
    'simple_means_flag': '3',
    'sign': '63766.268839',
    'token': 'ae16933c30637316aa2381165ae3e29a',
    'domain': 'common',
    'ts': '1700555095216'
}

data = urllib.parse.urlencode(data).encode('utf-8')

request = urllib.request.Request(url=url, data=data, headers=headers)

response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')

import json

obj = json.loads(content)
print(obj)
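
If a request does send Accept-Encoding: gzip, the body has to be decompressed by hand, since urllib hands it back as-is; a minimal sketch, assuming the server actually answers with a gzip-encoded body:

import gzip
import urllib.request

request = urllib.request.Request(
    'https://ssr1.scrape.center/',
    headers={'Accept-Encoding': 'gzip'}
)

response = urllib.request.urlopen(request)
body = response.read()

# Only decompress when the server really used gzip
if response.headers.get('Content-Encoding') == 'gzip':
    body = gzip.decompress(body)

print(body.decode('utf-8'))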

AJAX GET Requests

"""
ajax get请求
Author:binxin
Date:2023/11/23 19:16
"""
import urllib.request

# 获取豆瓣电影第一页的数据,并且保存起来
url = "https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=0&limit=20"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
}

request = urllib.request.Request(url=url, headers=headers)

response = urllib.request.urlopen(request)

content = response.read().decode('utf-8')

# 数据下载到本地
# fp = open('douban.json', 'w', encoding='utf-8')
# fp.write(content)

with open('douban1.json', 'w', encoding='utf-8') as fp:
    fp.write(content)

"""
豆瓣电影前十页

Author:binxin
Date:2023/11/23 19:30
"""
import urllib.parse
import urllib.request


# 下载豆瓣电影前十页

def create_request(page):
    base_url = "https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&"

    data = {
        'start': (page - 1) * 20,
        'limit': 20
    }

    data = urllib.parse.urlencode(data)

    url = base_url + data

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
    }

    request = urllib.request.Request(url=url, headers=headers)

    return request


def get_content(request):
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    return content


def down_load(page, content):
    with open(f'douban{page}.json', 'w', encoding='utf-8') as fp:
        fp.write(content)


if __name__ == '__main__':
    start_page = int(input("起始页码:"))
    end_page = int(input("结束页码:"))
    for page in range(start_page, end_page + 1):
        #         每一页都有自己的请求
        request = create_request(page)
        content = get_content(request)
        down_load(page, content)

AJAX POST Requests

"""
ajax post

Author:binxin
Date:2023/11/23 20:15
"""

import urllib.request
import urllib.parse


def create_request(page):
    base_url = "https://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname"

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
    }

    data = {
        'cname': '北京',
        'pid': '',
        'pageIndex': page,
        'pageSize': '10',
    }

    data = urllib.parse.urlencode(data).encode('utf-8')

    request = urllib.request.Request(url=base_url, data=data, headers=headers)

    return request


def get_content(request):
    response = urllib.request.urlopen(request)

    content = response.read().decode('utf-8')

    return content


def down_load(page, content):
    with open(f'kfc{page}.json', 'w', encoding='utf-8') as fp:
        fp.write(content)


if __name__ == '__main__':
    start_page = int(input("起始页码:"))
    end_page = int(input("结束页码:"))
    for page in range(start_page, end_page + 1):
        request = create_request(page)
        content = get_content(request)

        down_load(page, content)

URLError / HTTPError

  1. HTTPError is a subclass of URLError.
  2. Both are imported from urllib.error: urllib.error.HTTPError and urllib.error.URLError.
  3. An HTTP error is the error message produced when the browser cannot get a normal response from the server; it tells the visitor what went wrong with the page.
  4. Requests sent through urllib may fail. To make your code more robust, catch the exceptions with try-except; there are two kinds, URLError and HTTPError, as the example below shows.
    """
    URLError/HTTPError
    Author:binxin
    Date:2023/11/23 20:40
    """
    import urllib.request
    import urllib.error

    # url = "https://blog.csdn.net/qq_43546721/article/details/1340030121"
    url = 'https://www.goudan111.com'
    headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
    }

    try:
    request = urllib.request.Request(url=url, headers=headers)

    response = urllib.request.urlopen(request)

    content = response.read().decode('utf-8')

    print(content)
    except urllib.error.HTTPError:
    print('系统正在升级...')
    except urllib.error.URLError:
    print('系统升级')
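
An HTTPError also carries the status details, so the handler can report more than a fixed message; a small sketch (HTTPError must be caught before URLError, since it is the subclass):

import urllib.error
import urllib.request

try:
    urllib.request.urlopen('https://httpbin.org/status/404')
except urllib.error.HTTPError as e:
    # .code is the HTTP status code, .reason the reason phrase
    print(e.code, e.reason)
except urllib.error.URLError as e:
    # No HTTP response at all (DNS failure, refused connection, ...)
    print(e.reason)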

Cookie Login

"""
cooking登录
在数据采集需要绕过登录

Author:binxin
Date:2023/11/23 20:51
"""
import urllib.request

url = "https://m.weibo.cn/profile/7844546355"

headers = {
    'Accept': 'application/json, text/plain, */*',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
    'Cookie': 'WEIBOCN_FROM=1110006030; SUB=_2A25IWztpDeRhGeFG71YU9CjPzjmIHXVrGTKhrDV6PUJbkdANLUHckW1NeW0UMmSkDnXteuMWZ6_P3Hrnm486Vsys; MLOGIN=1; _T_WM=60448710269; M_WEIBOCN_PARAMS=lfid%3D102803%26luicode%3D20000174%26uicode%3D20000174; XSRF-TOKEN=fd466d; mweibo_short_token=5c3125a1a8',
    'Mweibo-Pwa': '1',
    'Referer': 'https://m.weibo.cn/profile/7844546355',
    'Sec-Ch-Ua': '"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"',
    'Sec-Ch-Ua-Mobile': '?0',
    'Sec-Ch-Ua-Platform': 'Windows',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin'
}

request = urllib.request.Request(url=url, headers=headers)

response = urllib.request.urlopen(request)

content = response.read().decode('utf-8')

with open('weibo.html', 'w', encoding='utf-8') as fp:
    fp.write(content)
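
Instead of pasting a Cookie header copied from the browser, urllib can also manage cookies by itself through http.cookiejar; a minimal sketch in which all requests made through one opener share a cookie jar:

import http.cookiejar
import urllib.request

# Cookies set by responses are stored in the jar and sent back
# automatically on later requests through the same opener
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar)
)

response = opener.open('https://www.baidu.com')
for cookie in jar:
    print(cookie.name, cookie.value)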

Handlers

"""
Handler处理器

Author:binxin
Date:2023/11/24 14:16
"""
import urllib.request

url = "https://www.baidu.com"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
}

request = urllib.request.Request(url=url, headers=headers)

# 获取handler对象
handler = urllib.request.HTTPSHandler()

# 获取opener对象
opener = urllib.request.build_opener(handler)

# 调用open方法
response = opener.open(request)
content = response.read().decode('utf-8')
print(content)
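
If the same opener should back every request in the program, it can be installed globally, after which plain urlopen calls go through it; a short sketch:

import urllib.request

handler = urllib.request.HTTPSHandler()
opener = urllib.request.build_opener(handler)

# From here on, urllib.request.urlopen() uses this opener
urllib.request.install_opener(opener)

response = urllib.request.urlopen('https://www.baidu.com')
print(response.status)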

Proxy Servers

"""
代理服务器

Author:binxin
Date:2023/11/24 14:25
"""
import urllib.request
import urllib.parse

url = 'https://www.baidu.com/s?wd=ip'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
}

request = urllib.request.Request(url=url, headers=headers)

# response = urllib.request.urlopen(request)

# 可在快代理使用免费ip
proxies = {
    'http': '121.226.89.230:20516'
}

handler = urllib.request.ProxyHandler(proxies=proxies)

opener = urllib.request.build_opener(handler)

response = opener.open(request)

content = response.read().decode('utf-8')

with open('daili.html', 'w', encoding='utf-8') as fp:
    fp.write(content)

Proxy Pool

"""
代理池

Author:binxin
Date:2023/11/24 14:47
"""
import random
import urllib.request

proxies_pool = [
    {'http': '42.249.189.41:17666'},
    {'http': '27.154.221.103:19542'}
]

proxies = random.choice(proxies_pool)

url = 'http://www.baidu.com/s?wd=ip'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
}

request = urllib.request.Request(url=url, headers=headers)

handler = urllib.request.ProxyHandler(proxies=proxies)

opener = urllib.request.build_opener(handler)

response = opener.open(request)

content = response.read().decode('utf-8')

with open('daili.html', 'w', encoding='utf-8') as fp:
    fp.write(content)
