

    urllib2.urlopen('https://2entwine.com/')

urllib2's urlopen function is just a thin wrapper around a globally instantiated opener. An opener is a manager class for a set of protocol handlers, and it's the opener's job to dispatch each request to the correct handler. Openers have a default set of handlers for all the supported protocols. For example, urllib2 contains an HTTPHandler, which handles HTTP requests on behalf of the opener. It's possible to provide a handler to be used in place of a default handler. To displace a default handler, the new handler has to subclass the default handler and then be passed to the opener when it's created.
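The displacement rule can be checked without touching the network. The following is a minimal sketch, not part of the original recipe: `TracingHTTPHandler` and `http_handlers` are hypothetical names made up for illustration, the `urllib.request` fallback is an assumption for readers on Python 3, and `opener.handlers` is an implementation detail of the opener rather than a documented interface.

```python
try:
    import urllib2 as request_lib          # Python 2, as used in this article
except ImportError:
    import urllib.request as request_lib   # assumed Python 3 equivalent


class TracingHTTPHandler(request_lib.HTTPHandler):
    """Hypothetical subclass; it changes nothing and exists only to be identified."""


def http_handlers(opener):
    """Return the class names of the handlers that serve http:// URLs."""
    return [h.__class__.__name__ for h in opener.handlers
            if isinstance(h, request_lib.HTTPHandler)]


default_opener = request_lib.build_opener()
custom_opener = request_lib.build_opener(TracingHTTPHandler())

print(http_handlers(default_opener))   # ['HTTPHandler']
print(http_handlers(custom_opener))    # ['TracingHTTPHandler']
```

Because the custom handler is an instance of a subclass of HTTPHandler, build_opener leaves the stock HTTPHandler out instead of installing both.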
    import httplib, urllib2

    class HTTP11(httplib.HTTP):
        # httplib.HTTP defaults to HTTP/1.0; override the version it announces
        _http_vsn = 11
        _http_vsn_str = 'HTTP/1.1'

    class HTTP11Handler(urllib2.HTTPHandler):
        def http_open(self, req):
            # same as the default handler, but with the HTTP/1.1 class
            return self.do_open(HTTP11, req)

    # the subclass displaces the default HTTPHandler
    opener = urllib2.build_opener(HTTP11Handler())
    urllib2.install_opener(opener)

Theoretically, this is all you need to get HTTP/1.1 working with urllib2. Unfortunately, it doesn't work, but more about that later.
    GET /atom.xml HTTP/1.1
    Host: 2entwine.com
    Accept-Encoding: identity
    User-agent: Python-urllib/2.1
    Host: 2entwine.com

The do_open for the HTTPHandler injects the Host header, but so does httplib for HTTP/1.1 connections, so the request goes out with two Host headers.
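The duplication can be reproduced offline by capturing what httplib would have written to the socket. This is only a sketch under stated assumptions: `RecordingConnection` and `host_header_count` are hypothetical helpers, not part of httplib or urllib2, and the `http.client` fallback is an assumption for readers on Python 3. It uses HTTPConnection directly, which already speaks HTTP/1.1.

```python
try:
    import httplib as http_lib           # Python 2, as used in this article
except ImportError:
    import http.client as http_lib       # assumed Python 3 equivalent


class RecordingConnection(http_lib.HTTPConnection):
    """Hypothetical helper: records outgoing bytes instead of opening a socket."""

    def __init__(self, host):
        http_lib.HTTPConnection.__init__(self, host)
        self.sent = b""

    def send(self, data):
        # endheaders() flushes the buffered request through send()
        self.sent += data


def host_header_count(explicit_host):
    """Build a GET request and count the Host headers it would carry."""
    conn = RecordingConnection("2entwine.com")
    conn.putrequest("GET", "/atom.xml")          # HTTP/1.1: Host is added here
    if explicit_host:
        conn.putheader("Host", "2entwine.com")   # what do_open adds on top
    conn.endheaders()
    lines = conn.sent.decode("ascii").split("\r\n")
    return sum(1 for line in lines if line.lower().startswith("host:"))


print(host_header_count(False))   # 1
print(host_header_count(True))    # 2
```

The second call mirrors what happens when the stock do_open runs on top of an HTTP/1.1 connection: putrequest has already written Host, and the explicit putheader writes it again.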
    import httplib, socket, urllib2

    class HTTP11(httplib.HTTP):
        _http_vsn = 11
        _http_vsn_str = 'HTTP/1.1'

    class HTTP11Handler(urllib2.HTTPHandler):
        def http_open(self, req):
            return self.do_open(HTTP11, req)

        def do_open(self, http_class, req):
            host = req.get_host()
            if not host:
                raise urllib2.URLError('no host given')

            h = http_class(host) # will parse host:port
            if req.has_data():
                data = req.get_data()
                h.putrequest('POST', req.get_selector())
                if 'Content-type' not in req.headers:
                    h.putheader('Content-type',
                                'application/x-www-form-urlencoded')
                if 'Content-length' not in req.headers:
                    h.putheader('Content-length', '%d' % len(data))
            else:
                h.putrequest('GET', req.get_selector())

            # ask the server to close the connection after this response
            h.putheader('Connection', 'close')

            ## scheme, sel = splittype(req.get_selector())
            ## sel_host, sel_path = splithost(sel)
            ## h.putheader('Host', sel_host or host)
            for name, value in self.parent.addheaders:
                name = name.capitalize()
                if name not in req.headers:
                    h.putheader(name, value)
            for k, v in req.headers.items():
                h.putheader(k, v)
            # httplib will attempt to connect() here. be prepared
            # to convert a socket error to a URLError.
            try:
                h.endheaders()
            except socket.error, err:
                raise urllib2.URLError(err)
            if req.has_data():
                h.send(data)

            code, msg, hdrs = h.getreply()
            fp = h.getfile()
            if code == 200:
                return urllib2.addinfourl(fp, hdrs, req.get_full_url())
            else:
                return self.parent.error('http', req, fp, code, msg, hdrs)

    opener = urllib2.build_opener(HTTP11Handler())
    urllib2.install_opener(opener)

The do_open method was borrowed from AbstractHTTPHandler, which HTTPHandler subclasses. do_open should probably be refactored, but that's for another discussion. h.putheader('Connection', 'close') has been added to make sure that the connection closes right after the server has handled the request, since HTTP/1.1 connections are persistent by default. The three commented-out lines are the ones that added the extra Host header.