Having recently had to modify Selenium Wire to properly support proxy server authentication with Python’s http.client.HTTPConnection and http.client.HTTPSConnection, I thought I’d summarise how I did that here, in case it’s useful to anybody in future.
These examples assume you’re working with a proxy server that’s using Basic Authentication. This authentication method works by sending a base64 encoded username/password string as a request header in the format:
Proxy-Authorization: Basic <base64 encoding of username:password>
Creating the Proxy-Authorization header
The username and password should be joined with a colon and the whole string then base64 encoded. Python’s base64 module makes that fairly trivial:
import base64
username = 'myusername'
password = 'mypassword'
cred = '{}:{}'.format(username, password)
cred = base64.b64encode(cred.encode('utf-8')).decode('utf-8')
Note that the base64.b64encode() function takes a byte string, which means that the string containing the credentials must first be turned into bytes by calling .encode('utf-8'). Similarly, .decode('utf-8') is called on the base64 encoded result to turn it back into a string again.
The Proxy-Authorization header can then be created and held within a dictionary:
headers = {
    'Proxy-Authorization': 'Basic {}'.format(cred)
}
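The steps above can be rolled into one small function. This is only a sketch: the name proxy_auth_header is my own invention for illustration, not something provided by http.client or Selenium Wire.

```python
import base64


def proxy_auth_header(username, password):
    """Return a headers dict carrying Basic proxy credentials (hypothetical helper)."""
    # Join the credentials with a colon and turn the string into bytes.
    raw = '{}:{}'.format(username, password).encode('utf-8')
    # b64encode() returns bytes; base64 output is plain ASCII, so decode it back to str.
    token = base64.b64encode(raw).decode('ascii')
    return {'Proxy-Authorization': 'Basic {}'.format(token)}


headers = proxy_auth_header('myusername', 'mypassword')
print(headers)
```

Keeping this in one place means the same dictionary can be handed to either connection class later on.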
Connecting to the proxy server
Whether you’re connecting to a site over HTTP or HTTPS, you establish a connection to the proxy server in the same way – by passing the proxy’s hostname and port to the HTTPConnection and HTTPSConnection classes when you instantiate them. For example:
from http.client import HTTPConnection, HTTPSConnection
http = HTTPConnection('proxy1:8080')
https = HTTPSConnection('proxy1:8080')
Sending the credentials
This is where things start to differ slightly.
HTTPConnection
For HTTPConnection, you pass the Proxy-Authorization header each time you call the request() method. This method accepts headers via an optional headers argument where you can supply the headers dictionary:
http.request('GET', 'http://www.example.com/', headers=headers)
One thing to note here is that the URL that you pass as the second argument must be an absolute URL containing the hostname of the remote site. If you pass a relative URL then the proxy server won’t know where to send your request on to. This is true whether your proxy server is using authentication or not.
The Proxy-Authorization header is a hop-by-hop header, which means it is meant for the proxy itself and won’t get passed on to the remote site.
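Putting the plain HTTP case together might look something like the sketch below. The function name get_via_proxy, the 10 second timeout and the up-front URL check are my own assumptions for illustration; proxy1:8080 is the same placeholder proxy used above.

```python
import base64
from http.client import HTTPConnection
from urllib.parse import urlsplit


def get_via_proxy(proxy, url, username, password):
    """GET an http:// URL through an authenticating proxy (hypothetical helper)."""
    # The proxy needs an absolute URL to know where to forward the request.
    parts = urlsplit(url)
    if parts.scheme != 'http' or not parts.netloc:
        raise ValueError('expected an absolute http:// URL, got {!r}'.format(url))

    cred = '{}:{}'.format(username, password)
    cred = base64.b64encode(cred.encode('utf-8')).decode('ascii')
    headers = {'Proxy-Authorization': 'Basic {}'.format(cred)}

    # Connect to the proxy, not to the remote site.
    conn = HTTPConnection(proxy, timeout=10)
    try:
        # The credentials are sent with every request made through the proxy.
        conn.request('GET', url, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Calling get_via_proxy('proxy1:8080', 'http://www.example.com/', 'myusername', 'mypassword') would return the status code and body. A 407 status means the proxy did not accept the credentials.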
HTTPSConnection
HTTPSConnection works slightly differently. After connecting to the proxy server, you need to tell it to establish a tunnel to the remote site, via the set_tunnel() method.
Setting up a tunnel is necessary with HTTPS even if you’re not using proxy authentication. But when you are, this is the point at which the Proxy-Authorization header should be used.
set_tunnel() takes an optional headers argument, in a similar way to request(), where you can supply the headers dictionary:
https.set_tunnel('www.example.com', headers=headers)
Once the tunnel has been established, subsequent calls to https.request() can be made without sending the Proxy-Authorization header. In fact, it’s important that you don’t supply the header to the request() method, otherwise you may inadvertently expose the credentials to the remote site.
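As a rough sketch of the HTTPS flow, the two steps can be kept in separate functions so the credentials only ever appear in the tunnel setup. The names open_tunnel and get_via_tunnel and the 10 second timeout are my own assumptions for illustration.

```python
import base64
from http.client import HTTPSConnection


def open_tunnel(proxy, remote_host, username, password):
    """Return an HTTPSConnection set up to tunnel through the proxy (hypothetical helper)."""
    cred = '{}:{}'.format(username, password)
    cred = base64.b64encode(cred.encode('utf-8')).decode('ascii')
    conn = HTTPSConnection(proxy, timeout=10)
    # The credentials go to the proxy when the tunnel is set up, and nowhere else.
    conn.set_tunnel(remote_host, headers={'Proxy-Authorization': 'Basic {}'.format(cred)})
    return conn


def get_via_tunnel(conn, path='/'):
    """GET a path from the remote site through the tunnel (hypothetical helper)."""
    # No Proxy-Authorization header here: this request goes to the remote site.
    conn.request('GET', path)
    response = conn.getresponse()
    return response.status, response.read()
```

Nothing is sent over the network until the first request is made, so open_tunnel('proxy1:8080', 'www.example.com', 'myusername', 'mypassword') only prepares the connection. Because the request travels through the tunnel, a relative path such as '/' is enough here.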