urllib 및 python을 통해 사진 다운로드
그래서 웹 코믹스를 다운로드하여 데스크탑의 폴더에 넣는 Python 스크립트를 만들려고합니다. 나는 비슷한 것을하는 몇 가지 유사한 프로그램을 찾았지만 필요한 것은 아닙니다. 내가 가장 비슷한 것으로 찾은 것은 바로 여기 ( http://bytes.com/topic/python/answers/850927-problem-using-urllib-download-images )입니다. 이 코드를 사용해 보았습니다.
>>> import urllib
>>> image = urllib.URLopener()
>>> image.retrieve("http://www.gunnerkrigg.com//comics/00000001.jpg","00000001.jpg")
('00000001.jpg', <httplib.HTTPMessage instance at 0x1457a80>)
그런 다음 컴퓨터에서 "00000001.jpg"파일을 검색했지만 캐시 된 사진 만 발견했습니다. 파일을 컴퓨터에 저장했는지 확실하지 않습니다. 파일 다운로드 방법을 이해하면 나머지를 처리하는 방법을 알고 있다고 생각합니다. 본질적으로 for 루프를 사용하고 '00000000'. 'jpg'에서 문자열을 분할하고 '00000000'을 가장 큰 숫자까지 늘리면 어떻게 든 결정해야합니다. 이 작업을 수행하는 가장 좋은 방법이나 파일을 올바르게 다운로드하는 방법에 대한 권장 사항은 무엇입니까?
감사!
6/15/10 편집
완성 된 스크립트는 다음과 같습니다. 선택한 디렉토리에 파일을 저장합니다. 이상한 이유로 파일이 다운로드되지 않았고 방금 완료되었습니다. 그것을 청소하는 방법에 대한 제안은 대단히 감사하겠습니다. 현재 사이트에 많은 만화가 있는지 확인하는 방법을 연구 중이므로 특정 수의 예외가 발생한 후에 프로그램을 종료하지 않고 최신 만화를 얻을 수 있습니다.
import urllib
import os
comicCounter=len(os.listdir('/file'))+1 # reads the number of files in the folder to start downloading at the next comic
errorCount=0
def download_comic(url,comicName):
"""
download a comic in the form of
url = http://www.example.com
comicName = '00000000.jpg'
"""
image=urllib.URLopener()
image.retrieve(url,comicName) # download comicName at URL
while comicCounter <= 1000: # not the most elegant solution
os.chdir('/file') # set where files download to
try:
if comicCounter < 10: # needed to break into 10^n segments because comic names are a set of zeros followed by a number
comicNumber=str('0000000'+str(comicCounter)) # string containing the eight digit comic number
comicName=str(comicNumber+".jpg") # string containing the file name
url=str("http://www.gunnerkrigg.com//comics/"+comicName) # creates the URL for the comic
comicCounter+=1 # increments the comic counter to go to the next comic, must be before the download in case the download raises an exception
download_comic(url,comicName) # uses the function defined above to download the comic
print url
if 10 <= comicCounter < 100:
comicNumber=str('000000'+str(comicCounter))
comicName=str(comicNumber+".jpg")
url=str("http://www.gunnerkrigg.com//comics/"+comicName)
comicCounter+=1
download_comic(url,comicName)
print url
if 100 <= comicCounter < 1000:
comicNumber=str('00000'+str(comicCounter))
comicName=str(comicNumber+".jpg")
url=str("http://www.gunnerkrigg.com//comics/"+comicName)
comicCounter+=1
download_comic(url,comicName)
print url
else: # quit the program if any number outside this range shows up
quit
except IOError: # urllib raises an IOError for a 404 error, when the comic doesn't exist
errorCount+=1 # add one to the error count
if errorCount>3: # if more than three errors occur during downloading, quit the program
break
else:
print str("comic"+ ' ' + str(comicCounter) + ' ' + "does not exist") # otherwise say that the certain comic number doesn't exist
print "all comics are up to date" # prints if all comics are downloaded
urllib.urlretrieve 사용 :
import urllib
urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
import urllib
f = open('00000001.jpg','wb')
f.write(urllib.urlopen('http://www.gunnerkrigg.com//comics/00000001.jpg').read())
f.close()
요청 라이브러리를 사용하여 레코드 전용.
import requests
f = open('00000001.jpg','wb')
f.write(requests.get('http://www.gunnerkrigg.com//comics/00000001.jpg').content)
f.close()
requests.get () 오류를 확인해야하지만.
Python 3의 경우 다음을 가져와야합니다 import urllib.request
.
import urllib.request
urllib.request.urlretrieve(url, filename)
@DiGMi의 답변에 대한 Python 3 버전 :
from urllib import request
f = open('00000001.jpg', 'wb')
f.write(request.urlopen("http://www.gunnerkrigg.com/comics/00000001.jpg").read())
f.close()
이 답변 을 찾았 으며 더 신뢰할 수있는 방식으로 편집했습니다.
def download_photo(self, img_url, filename):
try:
image_on_web = urllib.urlopen(img_url)
if image_on_web.headers.maintype == 'image':
buf = image_on_web.read()
path = os.getcwd() + DOWNLOADED_IMAGE_PATH
file_path = "%s%s" % (path, filename)
downloaded_image = file(file_path, "wb")
downloaded_image.write(buf)
downloaded_image.close()
image_on_web.close()
else:
return False
except:
return False
return True
From this you never get any other resources or exceptions while downloading.
If you know that the files are located in the same directory dir
of the website site
and have the following format: filename_01.jpg, ..., filename_10.jpg then download all of them:
import requests
for x in range(1, 10):
str1 = 'filename_%2.2d.jpg' % (x)
str2 = 'http://site/dir/filename_%2.2d.jpg' % (x)
f = open(str1, 'wb')
f.write(requests.get(str2).content)
f.close()
It's easiest to just use .read()
to read the partial or entire response, then write it into a file you've opened in a known good location.
Maybe you need 'User-Agent':
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36')]
response = opener.open('http://google.com')
htmlData = response.read()
f = open('file.txt','w')
f.write(htmlData )
f.close()
Aside from suggesting you read the docs for retrieve()
carefully (http://docs.python.org/library/urllib.html#urllib.URLopener.retrieve), I would suggest actually calling read()
on the content of the response, and then saving it into a file of your choosing rather than leaving it in the temporary file that retrieve creates.
All the above codes, do not allow to preserve the original image name, which sometimes is required. This will help in saving the images to your local drive, preserving the original image name
IMAGE = URL.rsplit('/',1)[1]
urllib.urlretrieve(URL, IMAGE)
Try this for more details.
This worked for me using python 3.
It gets a list of URLs from the csv file and starts downloading them into a folder. In case the content or image does not exist it takes that exception and continues making its magic.
import urllib.request
import csv
import os
errorCount=0
file_list = "/Users/$USER/Desktop/YOUR-FILE-TO-DOWNLOAD-IMAGES/image_{0}.jpg"
# CSV file must separate by commas
# urls.csv is set to your current working directory make sure your cd into or add the corresponding path
with open ('urls.csv') as images:
images = csv.reader(images)
img_count = 1
print("Please Wait.. it will take some time")
for image in images:
try:
urllib.request.urlretrieve(image[0],
file_list.format(img_count))
img_count += 1
except IOError:
errorCount+=1
# Stop in case you reach 100 errors downloading images
if errorCount>100:
break
else:
print ("File does not exist")
print ("Done!")
A simpler solution may be(python 3):
import urllib.request
import os
os.chdir("D:\\comic") #your path
i=1;
s="00000000"
while i<1000:
try:
urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/"+ s[:8-len(str(i))]+ str(i)+".jpg",str(i)+".jpg")
except:
print("not possible" + str(i))
i+=1;
What about this:
import urllib, os
def from_url( url, filename = None ):
'''Store the url content to filename'''
if not filename:
filename = os.path.basename( os.path.realpath(url) )
req = urllib.request.Request( url )
try:
response = urllib.request.urlopen( req )
except urllib.error.URLError as e:
if hasattr( e, 'reason' ):
print( 'Fail in reaching the server -> ', e.reason )
return False
elif hasattr( e, 'code' ):
print( 'The server couldn\'t fulfill the request -> ', e.code )
return False
else:
with open( filename, 'wb' ) as fo:
fo.write( response.read() )
print( 'Url saved as %s' % filename )
return True
##
def main():
test_url = 'http://cdn.sstatic.net/stackoverflow/img/favicon.ico'
from_url( test_url )
if __name__ == '__main__':
main()
If you need proxy support you can do this:
if needProxy == False:
returnCode, urlReturnResponse = urllib.urlretrieve( myUrl, fullJpegPathAndName )
else:
proxy_support = urllib2.ProxyHandler({"https":myHttpProxyAddress})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
urlReader = urllib2.urlopen( myUrl ).read()
with open( fullJpegPathAndName, "w" ) as f:
f.write( urlReader )
Another way to do this is via the fastai library. This worked like a charm for me. I was facing a SSL: CERTIFICATE_VERIFY_FAILED Error
using urlretrieve
so I tried that.
url = 'https://www.linkdoesntexist.com/lennon.jpg'
fastai.core.download_url(url,'image1.jpg', show_progress=False)
참고URL : https://stackoverflow.com/questions/3042757/downloading-a-picture-via-urllib-and-python
'Programing' 카테고리의 다른 글
설치된 앵귤러 클리의 버전을 확인하고 있습니까? (0) | 2020.05.26 |
---|---|
PHPMailer 문자 인코딩 문제 (0) | 2020.05.26 |
현재 조각 개체를 가져옵니다 (0) | 2020.05.26 |
최대 또는 기본? (0) | 2020.05.26 |
Openssl은 내부 또는 외부 명령으로 인식되지 않습니다 (0) | 2020.05.26 |