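Last year, using a crawler written by Carry, a member of our chat group, I put together a collection of Bing wallpapers covering 2009 to 2021.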
Python script for bulk-downloading Bing wallpapers, 2009-2021
Download script
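Today I came across a nice project, niumoo/bing-wallpaper: Bing daily ultra-HD wallpapers (4K).
The project archives Bing's daily wallpapers by month, one markdown file per month. Each entry has a date and a 4K download link, so I wrote a simple Python script to download the images.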
#!/usr/bin/env python3
# coding=utf-8
# main.py
__author__ = 'https://blog.mzh.ren/zh/'
__version__ = (0, 1, 0)

import os
import re
import sys

import requests


def get_all_markdown_files(path):
    """Get all the markdown files under path, at any depth"""
    markdown_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.md'):
                markdown_files.append(os.path.join(root, file))
    return markdown_files


def download_files_from_markdown(markdown_file):
    """Download all the files linked from the markdown file"""
    markdown_file_path = os.path.dirname(markdown_file)
    items = get_date_links_from_markdown(markdown_file)
    for item in items:
        name = get_filename_from_url(item)
        if name is None:
            # The link has no id parameter, so there is nothing to name the file after
            print('Skipping {}: no id parameter'.format(item[1]))
            continue
        # Save each image next to the markdown file that lists it
        download_file(item[1], os.path.join(markdown_file_path, name))


def get_date_links_from_markdown(markdown_file):
    """Get all the (date, link) pairs from the markdown file"""
    content = read_markdown_file(markdown_file)
    date_and_links = re.findall(r'(\d{4}-\d{2}-\d{2}) \[download 4k\]\((https[^\)]+)\)', content)
    return date_and_links


def read_markdown_file(markdown_file):
    """Read the markdown file and return the content"""
    # Read as UTF-8 explicitly so the platform default encoding does not matter
    with open(markdown_file, 'r', encoding='utf-8') as f:
        content = f.read()
    return content


def download_file(url, filename):
    """Download the file from the url and save it as filename"""
    if not check_file_exist(filename):
        print('Downloading {} to {}'.format(url, filename))
        r = requests.get(url, timeout=30)
        if not r.ok:
            # Do not save an error page as if it were an image
            print('Failed ({}): {}'.format(r.status_code, url))
            return
        with open(filename, 'wb') as f:
            f.write(r.content)


def get_filename_from_url(date_and_link):
    """Get the filename from the date_and_link, or None if the url has no id"""
    # ('2022-10-09', 'https://cn.bing.com/th?id=OHR.GlassOctopus_EN-US6394802515_UHD.jpg')
    image_id = get_url_pram_value(date_and_link[1], 'id')
    if image_id is None:
        return None
    return date_and_link[0] + '_' + image_id


def get_url_pram_value(url, param):
    """Get the value of the param from the url, or None if it is absent"""
    # Anchor on ? or & so that 'id' does not match the tail of a longer name
    regResult = re.search(r'[?&]{}=([^&]+)'.format(re.escape(param)), url)
    if regResult:
        return regResult.group(1)
    return None


def check_file_exist(filename):
    """Check if the file exists"""
    return os.path.exists(filename)


if __name__ == '__main__':
    # Optional first argument: the directory to scan (defaults to ./picture)
    if len(sys.argv) > 1:
        path = sys.argv[1]
    else:
        path = './picture'
    markdown_files = get_all_markdown_files(path)
    for markdown_file in markdown_files:
        download_files_from_markdown(markdown_file)
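To see what the script's date-and-link regular expression actually picks up, the parsing step can be tried on its own. This is only a sketch: the sample line is an assumption reconstructed from the pattern and from the example tuple in the script's comment, and the real markdown files may carry other text around each entry. The helper name build_filename is hypothetical.

```python
import re

# Same date/link pattern as in main.py: a date, the literal "[download 4k]",
# then an https link in parentheses.
PATTERN = re.compile(r'(\d{4}-\d{2}-\d{2}) \[download 4k\]\((https[^\)]+)\)')

# Assumed sample line, modelled on the tuple shown in the script's comment.
SAMPLE = ('2022-10-09 [download 4k]'
          '(https://cn.bing.com/th?id=OHR.GlassOctopus_EN-US6394802515_UHD.jpg)')


def build_filename(date, url):
    """Hypothetical helper: prefix the value of the id parameter with the date."""
    match = re.search(r'[?&]id=([^&]+)', url)
    return date + '_' + match.group(1) if match else None


pairs = PATTERN.findall(SAMPLE)
names = [build_filename(date, url) for date, url in pairs]
print(pairs)
print(names)
```

Each match is a (date, url) tuple, and the file is named after the date plus the value of the id parameter, so files sort chronologically within a month directory.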
Usage
Download the project to your machine, add main.py to the project root, and run python3 main.py.
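Download everything
With no argument the script scans ./picture by default; you can also pass the directory explicitly:
python3 main.py ./picture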
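Before starting a full download, it may help to check how many images each markdown file references. The following is a rough sketch rather than part of the original script: count_links is a hypothetical helper, and it assumes the same entry format that main.py matches.

```python
import os
import re

# Same date/link pattern that main.py uses.
PATTERN = re.compile(r'(\d{4}-\d{2}-\d{2}) \[download 4k\]\((https[^\)]+)\)')


def count_links(path):
    """Hypothetical helper: map each .md file under path to its number of 4K links."""
    counts = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.endswith('.md'):
                full = os.path.join(root, name)
                with open(full, 'r', encoding='utf-8') as f:
                    counts[full] = len(PATTERN.findall(f.read()))
    return counts


# Nothing is downloaded here; this only reports what a real run would fetch.
for md_file, n in sorted(count_links('./picture').items()):
    print('{}: {} links'.format(md_file, n))
```

If ./picture does not exist, os.walk simply yields nothing and the loop prints no lines.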
Download a single month
python3 main.py ./picture/2022-11
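The script takes one directory at a time, so fetching a whole year with it means pointing it at each of that year's month directories in turn. Here is a small sketch of that idea. It assumes the month directories are named YYYY-MM directly under ./picture, as the ./picture/2022-11 example suggests; month_dirs_for_year is a hypothetical helper, not part of main.py.

```python
import glob
import os


def month_dirs_for_year(base, year):
    """Hypothetical helper: list base/<year>-<month> directories in sorted order."""
    pattern = os.path.join(base, '{}-[0-9][0-9]'.format(year))
    return sorted(d for d in glob.glob(pattern) if os.path.isdir(d))


for month_dir in month_dirs_for_year('./picture', 2022):
    # Print the command for each month instead of running it here
    print('python3 main.py {}'.format(month_dir))
```

Printing the commands rather than running them keeps the sketch free of side effects; the output can be reviewed and then pasted into a shell.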
Download by year
Bing wallpapers, 2009
https://www.aliyundrive.com/s/RpmzqFhvhdE
Extraction code: q9h1
Bing wallpapers, 2010
https://www.aliyundrive.com/s/Bvwbyg3XMkJ
Extraction code: 1d8m
Bing wallpapers, 2011
https://www.aliyundrive.com/s/4p3CzeBXz2b
Extraction code: i7i7
Bing wallpapers, 2012
https://www.aliyundrive.com/s/ocrt2yJormD
Extraction code: u2p6
Bing wallpapers, 2013
https://www.aliyundrive.com/s/YQG5kcWhGDB
Extraction code: rj32
Bing wallpapers, 2014
https://www.aliyundrive.com/s/rCg5hPwc44v
Extraction code: 77ip
Bing wallpapers, 2015
https://www.aliyundrive.com/s/xsMJgNKvQ4N
Extraction code: 29gt
Bing wallpapers, 2016
https://www.aliyundrive.com/s/Gn6Brj254ni
Extraction code: rt28
Bing wallpapers, 2017
https://www.aliyundrive.com/s/LNToUWihUEf
Extraction code: r14q
Bing wallpapers, 2018
https://www.aliyundrive.com/s/Cr6TSVJasqU
Extraction code: ze57
Bing wallpapers, 2019
https://www.aliyundrive.com/s/89hBHCHx6LC
Extraction code: 18sm
Bing wallpapers, 2020
https://www.aliyundrive.com/s/VLU9nCtFg72
Extraction code: gr97
Bing wallpapers, 2021
https://www.aliyundrive.com/s/LkmBCU5BrGV
Extraction code: lb61
Bing wallpapers, 2022
https://www.aliyundrive.com/s/D4KrtYS9xro
Extraction code: 46te
Bundled download
If a link has expired, contact me on WeChat (vx): gameboy1000
Link: https://pan.baidu.com/s/1K5TRVwfD0NHtPpDCVKlWeQ?pwd=1234
Extraction code: 1234
Link: https://share.weiyun.com/zNuGYXOr Password: dhmxvv