Here is the program I run:
import urllib.request
import re
url = "http://www.google.com"
regex = r'<title>(.+?)</title>'
pattern = re.compile(regex)
with urllib.request.urlopen(url) as response:
    html = response.read()
title = re.findall(pattern, html)
print(title)
When I run it, I get this error:
Traceback (most recent call last):
  File "path\to\file\Crawler.py", line 11, in <module>
    title = re.findall(pattern, html)
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
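The same error can be reproduced without a network request. This is a minimal sketch that assumes a hard-coded bytes literal as a stand-in for what response.read() returns; it checks the types involved and catches the TypeError.

```python
import re

# Stand-in for response.read(), which returns bytes, not str.
html = b"<html><head><title>Google</title></head></html>"
pattern = re.compile(r"<title>(.+?)</title>")  # str pattern

print(type(html))             # <class 'bytes'>
print(type(pattern.pattern))  # <class 'str'>

try:
    titles = pattern.findall(html)
    error = None
except TypeError as exc:
    # The exact wording of the message differs between Python versions.
    error = exc
print(error)
```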
Does anyone have suggestions for fixing "TypeError: can't use a string pattern on a bytes-like object" in Python?
The cause: your regex is a string, but html is made up of bytes, because response.read() returns a bytes object rather than a str.
Solution:
You need to use .decode() to convert html (a bytes-like object) to a string.
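A minimal sketch of the decode fix, assuming the page is UTF-8 and using a bytes literal in place of the download. In the original program this corresponds to writing html = response.read().decode('utf-8').

```python
import re

pattern = re.compile(r"<title>(.+?)</title>")

# Stand-in for response.read(); the real call returns bytes like these.
raw = b"<html><head><title>Google</title></head></html>"

html = raw.decode("utf-8")  # bytes -> str; assumes the page is UTF-8
titles = pattern.findall(html)
print(titles)  # ['Google']
```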
Your regex is a string, but html is bytes. Python will not match a string pattern against bytes, because it does not know how those bytes were encoded, so it raises an error when you try to use a string regex on them.
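To illustrate why Python refuses to guess, this small sketch encodes the same text with two different codecs (chosen here only as examples) and shows that the bytes differ, and that decoding with the wrong codec silently produces different text.

```python
text = "café"

utf8 = text.encode("utf-8")      # é becomes two bytes
latin1 = text.encode("latin-1")  # é becomes one byte

print(utf8)            # b'caf\xc3\xa9'
print(latin1)          # b'caf\xe9'
print(utf8 == latin1)  # False

# Decoding UTF-8 bytes as Latin-1 does not fail; it just gives the wrong text.
mojibake = utf8.decode("latin-1")
print(mojibake)        # cafÃ©
```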
You can decode the bytes to a string. You can also use a bytes regex instead.
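A sketch of the bytes-regex alternative: a pattern written as a bytes literal (rb'...') matches bytes input directly, so no decoding is needed before the search. A bytes literal again stands in for the downloaded page.

```python
import re

# Bytes pattern: matches bytes input without decoding it first.
pattern = re.compile(rb"<title>(.+?)</title>")

raw = b"<html><head><title>Google</title></head></html>"
titles = pattern.findall(raw)
print(titles)  # [b'Google']

# The captured groups are bytes too; decode them if you need str.
title_text = [t.decode("utf-8") for t in titles]
print(title_text)  # ['Google']
```

Note that str and bytes cannot be mixed in either direction: a bytes pattern applied to a str raises a similar TypeError.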
In this particular context, you can also get the encoding from the response headers.
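One way to do this, sketched under some assumptions: the helper names decode_body, fetch_titles, and TITLE_RE are hypothetical, UTF-8 is assumed as the fallback when the Content-Type header carries no charset, and the demo uses a hand-built header instead of a live request. It relies on response.headers being an http.client.HTTPMessage, which subclasses email.message.Message and therefore provides get_content_charset().

```python
import re
import urllib.request
from email.message import Message

# Hypothetical title regex; IGNORECASE/DOTALL allow <TITLE> and multi-line titles.
TITLE_RE = re.compile(r"<title>(.+?)</title>", re.IGNORECASE | re.DOTALL)


def decode_body(body: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode body with the charset from Content-Type, else `default`."""
    # get_content_charset() returns the lowercased charset parameter,
    # or `default` when the header has no charset.
    charset = headers.get_content_charset(default)
    # errors="replace" keeps going if the declared charset does not fit the bytes.
    return body.decode(charset, errors="replace")


def fetch_titles(url: str) -> list:
    # Not called in the demo below, since it needs network access.
    with urllib.request.urlopen(url) as response:
        html = decode_body(response.read(), response.headers)
    return TITLE_RE.findall(html)


# Offline demo with a hand-built header instead of a live response.
demo_headers = Message()
demo_headers["Content-Type"] = "text/html; charset=ISO-8859-1"
demo_body = "<title>Café</title>".encode("latin-1")
demo_titles = TITLE_RE.findall(decode_body(demo_body, demo_headers))
print(demo_titles)  # ['Café']
```

With network access, fetch_titles("http://www.google.com") would apply the same steps to the real response.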
For more information, see the urlopen documentation.