## The Question:

354 people think this question is useful

I am creating a program that will download a .jar (Java) file from a web server by reading the URL specified in the .jad file of the same game/application. I’m using Python 3.2.1.

I’ve managed to extract the URL of the JAR file from the JAD file (every JAD file contains the URL to the JAR file), but as you may imagine, the extracted value is a string (type str).

Here’s the relevant function:

def downloadFile(URL=None):
    import httplib2
    h = httplib2.Http(".cache")
    resp, content = h.request(URL, "GET")
    return content



However, I always get an error saying that the type in the function above has to be bytes, and not string. I’ve tried using URL.encode('utf-8') and also bytes(URL, encoding='utf-8'), but I always get the same or a similar error.

So basically my question is: how do I download a file from a server when the URL is stored in a string?


693 people think this answer is useful

If you want to read the contents of a web page into a variable, just read the response of urllib.request.urlopen:

import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read()      # a bytes object
text = data.decode('utf-8') # a str; this step can't be used if data is binary
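
The decode step above hard-codes UTF-8. As a minimal sketch, assuming the server declares its charset in the Content-Type header, the declared charset can be used instead, with UTF-8 only as a fallback. The function name `fetch_text` and its `default_charset` parameter are illustrative, not part of urllib:

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Return the body at url decoded to str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        data = response.read()  # a bytes object
        # get_content_charset() returns the charset parameter of the
        # Content-Type header, or None when the header does not declare one.
        charset = response.headers.get_content_charset() or default_charset
    return data.decode(charset)
```

This still fails for binary data, and for pages whose declared charset is wrong.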



The easiest way to download and save a file is to use the urllib.request.urlretrieve function:

import urllib.request
...
# Download the file from url and save it locally under file_name:
urllib.request.urlretrieve(url, file_name)


import urllib.request
...
# Download the file from url, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the file_name variable:
file_name, headers = urllib.request.urlretrieve(url)



But keep in mind that urlretrieve is considered legacy and might become deprecated (not sure why, though).

So the most correct way to do this would be to use the urllib.request.urlopen function to return a file-like object that represents an HTTP response and copy it to a real file using shutil.copyfileobj.

import urllib.request
import shutil
...
# Download the file from url and save it locally under file_name:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
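
Applied to the function from the question, the same pattern might look like the sketch below. The name `download_file` and the `timeout` default are assumptions made for illustration; the URL is passed as a plain str, with no conversion to bytes:

```python
import shutil
import urllib.request


def download_file(url, file_name, timeout=30):
    """Stream the resource at url (a str) into file_name and return file_name."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(file_name, "wb") as out_file:
        # copyfileobj copies in blocks, so the whole file is never held in memory
        shutil.copyfileobj(response, out_file)
    return file_name
```

A call such as `download_file(jar_url, 'game.jar')` would then save the JAR, where `jar_url` is the string extracted from the JAD file.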



If this seems too complicated, you may want to go simpler and store the whole download in a bytes object and then write it to a file. But this works well only for small files.

import urllib.request
...
# Download the file from url and save it locally under file_name:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read() # a bytes object
    out_file.write(data)
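
For larger files, a middle ground is to read fixed-size chunks in an explicit loop, which keeps memory use bounded and makes it easy to count the bytes written. This is only a sketch; `download_in_chunks` and the 64 KiB default are illustrative choices:

```python
import urllib.request


def download_in_chunks(url, file_name, chunk_size=64 * 1024):
    """Save url to file_name chunk by chunk; return the number of bytes written."""
    total = 0
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as out_file:
        while True:
            chunk = response.read(chunk_size)  # at most chunk_size bytes
            if not chunk:  # an empty bytes object signals the end of the response
                break
            out_file.write(chunk)
            total += len(chunk)
    return total
```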



It is possible to extract .gz (and maybe other formats) compressed data on the fly, but such an operation probably requires the HTTP server to support random access to the file.

import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at url
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64) # a bytes object
        # Or do anything shown above using uncompressed instead of response.
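
As one instance of "anything shown above", the decompressed stream can be copied straight to disk. A minimal sketch, assuming the resource at the URL really is gzip data; the helper name `download_and_gunzip` is hypothetical:

```python
import gzip
import shutil
import urllib.request


def download_and_gunzip(url, file_name):
    """Decompress the .gz resource at url on the fly and save it as file_name."""
    with urllib.request.urlopen(url) as response:
        # GzipFile reads compressed bytes from response and yields plain bytes
        with gzip.GzipFile(fileobj=response, mode="rb") as uncompressed, \
                open(file_name, "wb") as out_file:
            shutil.copyfileobj(uncompressed, out_file)
    return file_name
```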



159 people think this answer is useful

I use the requests package whenever I need to make HTTP requests, because its API is very easy to get started with.

First, install requests:

$ pip install requests



Then the code:

from requests import get  # to make GET request

# open in binary mode
with open(file_name, "wb") as file:
    # get request
    response = get(url)
    # write to file
    file.write(response.content)
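
The snippet above loads the whole body into memory and writes it even when the server answers with an error page. A hedged sketch of a streaming variant follows; it relies on requests' `stream=True`, `raise_for_status()` and `iter_content()`, while the function names, the chunk size and the timeout are illustrative assumptions:

```python
def save_streamed_response(response, file_name, chunk_size=8192):
    """Write a streamed requests response to file_name; return bytes written."""
    response.raise_for_status()  # raises on 4xx/5xx instead of saving an error page
    written = 0
    with open(file_name, "wb") as out_file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += out_file.write(chunk)
    return written


def download(url, file_name, timeout=30):
    """Download url to file_name without holding the whole body in memory."""
    # Imported here so the helper above can be used without requests installed.
    import requests

    # stream=True defers reading the body until iter_content is consumed
    with requests.get(url, stream=True, timeout=timeout) as response:
        return save_streamed_response(response, file_name)
```

Splitting the file-writing step into its own function keeps it easy to exercise with any object that offers the same two methods.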



18 people think this answer is useful

I hope I understood the question right, which is: how to download a file from a server when the URL is stored in a string type?

import requests

url = 'https://www.python.org/static/img/python-logo.png'
fileName = r'D:\Python\dwnldPythonLogo.png'  # raw string, so the backslashes are kept literally
req = requests.get(url)
with open(fileName, 'wb') as file:
    # write the body in chunks of up to 100,000 bytes
    for chunk in req.iter_content(100000):
        file.write(chunk)



11 people think this answer is useful

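You can use the wget package for that (it shares its name with the popular wget shell download tool): https://pypi.python.org/pypi/wget. This is the simplest method, since you do not need to open the destination file yourself. Here is an example.

import wget

url = 'http://www.example.com/foo.zip'  # placeholder URL
filename = wget.download(url)  # returns the name of the saved file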



10 people think this answer is useful

Here we can use urllib’s legacy interface in Python 3. The documentation says:

The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.

Example (two lines of code):

import urllib.request

url = 'https://www.python.org/static/img/python-logo.png'
urllib.request.urlretrieve(url, "logo.png")
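
One thing urlretrieve offers out of the box is a progress callback through its `reporthook` parameter, which is called with the block number, the block size and the total size (-1 when unknown). A small sketch, with `retrieve_with_progress` as a made-up wrapper name:

```python
import urllib.request


def retrieve_with_progress(url, file_name):
    """Download url to file_name with urlretrieve, printing simple progress."""

    def report(block_num, block_size, total_size):
        downloaded = block_num * block_size
        if total_size > 0:
            percent = min(100, downloaded * 100 // total_size)
            print(f"\r{percent}%", end="")
        else:
            # total_size is not positive when the server sent no usable length
            print(f"\r{downloaded} bytes", end="")

    path, headers = urllib.request.urlretrieve(url, file_name, reporthook=report)
    print()  # finish the progress line
    return path
```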



2 people think this answer is useful

Yes, requests is definitely a great package for anything related to HTTP requests, but we need to be careful with the encoding of the incoming data as well. The example below shows the difference:


from requests import get

# case when the response is binary (e.g. an image)
url = 'some_image_url'

response = get(url)
with open('output', 'wb') as file:
    file.write(response.content)  # content is a bytes object

# case when the response is text
# requests may fall back to ISO-8859-1 when the server declares no charset,
# in which case we have to override the response encoding
url = 'some_page_url'

response = get(url)
# override the encoding with the educated guess that requests makes from the body
response.encoding = response.apparent_encoding

with open('output', 'w', encoding='utf-8') as file:
    file.write(response.text)  # text is a str decoded with response.encoding



0 people think this answer is useful

# Motivation

Sometimes we want to get a picture without saving it to a real file.

For example, suppose I use machine learning to train a model that can recognize numbers (such as a bar code) in an image.

When I crawl websites that contain such images, I can run the model on them directly,

and I don’t want to save those pictures to my disk drive.

# Points

import requests
from io import BytesIO
response = requests.get(url)
with BytesIO() as io_obj:  # BytesIO must be instantiated before use
    for chunk in response.iter_content(chunk_size=4096):
        io_obj.write(chunk)
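
The same idea also works with the standard library alone. The sketch below swaps requests for urllib.request so that it needs no third-party package, and adds a size cap; `fetch_to_memory` and `max_bytes` are hypothetical names:

```python
import urllib.request
from io import BytesIO


def fetch_to_memory(url, max_bytes=None, chunk_size=4096):
    """Return the body at url as a BytesIO positioned at the start."""
    buffer = BytesIO()
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            # refuse to buffer more than max_bytes in memory
            if max_bytes is not None and buffer.tell() + len(chunk) > max_bytes:
                raise ValueError(f"response is larger than {max_bytes} bytes")
            buffer.write(chunk)
    buffer.seek(0)  # rewind so callers can read from the beginning
    return buffer
```

The returned object can be handed to any library that accepts a binary file-like object.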



This is basically the same approach as in @Ranvijay Kumar’s answer.

# An Example

import requests
from typing import NewType, TypeVar
from io import StringIO, BytesIO
import matplotlib.pyplot as plt
import imageio

URL = NewType('URL', str)
T_IO = TypeVar('T_IO', StringIO, BytesIO)


# NOTE: the definition line and the request call are missing from the source;
# the name download_to_memory and its signature are reconstructed assumptions.
def download_to_memory(url: URL, headers=None, **option) -> T_IO:
    chunk_size = option.get('chunk_size', 4096)  # default 4KB
    max_size = 1024 ** 2 * option.get('max_size', -1)  # MB, default will ignore.
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        raise requests.ConnectionError(f'{response.status_code}')

    instance_io = StringIO if isinstance(next(response.iter_content(chunk_size=1)), str) else BytesIO
    io_obj = instance_io()
    cur_size = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        cur_size += chunk_size
        if 0 < max_size < cur_size:
            break
        io_obj.write(chunk)
    io_obj.seek(0)
    """ save it to real file.
    with open('temp.png', mode='wb') as out_f:
        out_f.write(io_obj.read())
    """
    return io_obj


def main():
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Host': 'statics.591.com.tw',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
    }
    # The image URL is missing from the source; replace this placeholder with a real one.
    image_url = URL('...')
    io_img = download_to_memory(
        image_url,
        headers,  # You may need this. Otherwise, some websites will send the 404 error to you.
    )
    with io_img:
        plt.rc('axes.spines', top=False, bottom=False, left=False, right=False)
        plt.rc(('xtick', 'ytick'), color=(1, 1, 1, 0))  # same of plt.axis('off')
        # Reconstructed assumption: decode the in-memory image and draw it before showing.
        plt.imshow(imageio.imread(io_img))
        plt.show()


if __name__ == '__main__':
    main()



0 people think this answer is useful

If you are using Linux, you can call the wget command-line tool from Python. Here is a sample code snippet:

import os
url = 'http://www.example.com/foo.zip'
os.system('wget %s'%url)
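
Because os.system passes the string through the shell, a URL containing characters such as `&` or `;` can be misinterpreted. A hedged alternative is to pass the arguments as a list to subprocess.run, which does not involve the shell. The helper names are illustrative, and the sketch assumes wget is installed and on the PATH:

```python
import subprocess


def build_wget_command(url, output_path=None):
    """Return the argument list for a wget call (no shell involved)."""
    command = ["wget"]
    if output_path is not None:
        command += ["-O", output_path]  # -O writes the download to this file
    command.append(url)
    return command


def download_with_wget(url, output_path=None):
    """Run wget; check=True raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_wget_command(url, output_path), check=True)
```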



-3 people think this answer is useful
from urllib import request

def get(url):
with request.urlopen(url) as r: