I get the “cannot use a string pattern on a bytes-like object” error, as the title says. How can I fix it? Here is my code:
import urllib.request
import re
url = "http://www.google.com"
regex = r'<title>(,+?)</title>'
pattern = re.compile(regex)
with urllib.request.urlopen(url) as response:
    html = response.read()
title = re.findall(pattern, html)
print(title)
When I ran it, I got this error:
Traceback (most recent call last):
  File "path\to\file\Crawler.py", line 11, in <module>
    title = re.findall(pattern, html)
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
I appreciate any help from you.
The cause:
You get this error because you are matching a string pattern against a bytes object: response.read() returns bytes, not str. Python has not been told how those bytes are encoded, so it will not compare them with a string pattern and raises a TypeError when re.findall runs.
Solution:
To fix this error, convert html (a bytes object) into a string with .decode before passing it to re.findall. A bytes regex is also a reasonable fix: write the pattern as a bytes literal so that the pattern and the data are both bytes.
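Both fixes can be sketched as follows. This is a minimal illustration rather than the original answer's exact code: the bytes literal stands in for the result of response.read() so that the example runs without a network connection, "utf-8" is an assumed encoding, and the pattern uses .+? on the assumption that the ,+? in the question (which only matches commas) was a typo.

```python
import re

# Stand-in for response.read(); urlopen() responses return bytes like this.
html = b"<html><head><title>Google</title></head></html>"

# Assumption: ".+?" (any characters) is what the question intended.
str_pattern = re.compile(r"<title>(.+?)</title>")

# Option 1: decode the bytes to str, then match with the str pattern.
# "utf-8" is an assumed encoding for this sketch.
titles_from_str = str_pattern.findall(html.decode("utf-8"))

# Option 2: leave the data as bytes and use a bytes pattern (note the b prefix).
bytes_pattern = re.compile(rb"<title>(.+?)</title>")
titles_from_bytes = bytes_pattern.findall(html)

print(titles_from_str)    # ['Google']
print(titles_from_bytes)  # [b'Google']
```

Note that the bytes pattern returns bytes matches, so decoding is usually the more convenient choice when the result will be printed or compared with ordinary strings.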
For more background, see the article “Change bytes to a Python String”.
Your regex is a string, but html is bytes. Because Python does not know how those bytes are encoded, it raises an error when you try to apply a string regex to them. You can decode the bytes to a string, or you can use a bytes regex.
In this particular case, you can get the encoding from the response headers.
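One way to do that, as a sketch under a few assumptions: the helper names decode_body and fetch_titles are hypothetical, "utf-8" is an assumed fallback for responses that declare no charset, and the pattern again uses .+? in place of the question's ,+?. The header object is built by hand at the end so the example runs offline; with a real response you would pass response.headers.

```python
import re
import urllib.request
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes with the charset from Content-Type, else a default."""
    # get_content_charset() returns the declared charset in lowercase,
    # or the fallback when the header does not declare one.
    encoding = headers.get_content_charset(default)
    return raw.decode(encoding, errors="replace")


def fetch_titles(url):
    """Fetch a page and return the contents of its <title> tags."""
    with urllib.request.urlopen(url) as response:
        # response.headers is an email.message.Message subclass.
        html = decode_body(response.read(), response.headers)
    return re.findall(r"<title>(.+?)</title>", html)


# Offline demonstration with a hand-built header object and body.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
body = "<title>Caf\u00e9</title>".encode("iso-8859-1")
titles = re.findall(r"<title>(.+?)</title>", decode_body(body, headers))
print(titles)
```

Passing errors="replace" is a design choice for this sketch: a page whose bytes do not match its declared charset still yields a string to search instead of raising UnicodeDecodeError.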
For more information, see the urlopen documentation.