If you run into the “UnicodeDecodeError: ‘ascii’ codec can’t decode byte” error in Python, read the explanation below to get it resolved.
“UnicodeDecodeError: ‘ascii’ codec can’t decode byte” Error
Reproduce The Error
This error can happen when you read a text file or process a string that stores characters not available in the ASCII character encoding.
Suppose we have a text file message.txt with this content:
Bis zum nächsten Mal.
Here is how many programmers might use the open() function in Python to open it:
with open('message.txt', 'r', encoding='ascii') as f:
    lines = f.readlines()
print(lines)
It looks like standard text reading, but we get this error instead:
  File "/home/ittutoria/python.py", line 4, in <module>
    lines = f.readlines()
  File "/usr/lib/python3.10/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
A closely related error, UnicodeEncodeError, appears when you go the other way and try to encode the string with ASCII:
myStr = 'Bis zum nächsten Mal.'
bytesdata = myStr.encode(encoding='ascii')

Traceback (most recent call last):
  File "/home/ittutoria/python.py", line 8, in <module>
    bytesdata = myStr.encode(encoding='ascii',)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 9: ordinal not in range(128)
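Both failures can also be reproduced without a file. The snippet below is a minimal sketch, assuming the text is stored as UTF-8 (which the 0xc3 byte in the traceback suggests). The variable names are illustrative only.

```python
# The sample sentence, and the bytes a UTF-8 text file would hold for it.
text = 'Bis zum nächsten Mal.'
utf8_bytes = text.encode('utf-8')

# Decoding those bytes as ASCII fails at the first byte of 'ä' (0xc3).
try:
    utf8_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    decode_error = exc
    print(type(exc).__name__, hex(utf8_bytes[exc.start]), 'at position', exc.start)

# Encoding the string as ASCII fails at the character 'ä' itself.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    encode_error = exc
    # ascii() keeps the output printable even on an ASCII-only terminal.
    print(type(exc).__name__, ascii(text[exc.start]), 'at position', exc.start)
```

Both exceptions report position 9, the index of 'ä' in the sentence.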
Character Encodings In Python
As you may have noticed, both open() and encode() above are called with the encoding='ascii' argument. It specifies the character encoding to use for the text file or string, which is ASCII in this case.
However, this is the wrong choice here. ASCII (American Standard Code for Information Interchange) is an old encoding whose history can be traced back to the era of the telegraph. It shouldn't surprise you that it was designed around the English language.
The original ASCII standard uses only 7 bits per character, which gives 128 code points (0 to 127), and 33 of those are non-printing control characters. Many 8-bit extensions exist, but one extra bit only raises the limit to 256 characters. This severely limits what ASCII can represent, and it is why Python reports "ordinal not in range(128)" for anything beyond that range.
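To make the 7-bit limit concrete, here is a small sketch that checks whether a character's code point falls below 128. The helper is_ascii_char() is a hypothetical name introduced for this illustration, not a standard function.

```python
def is_ascii_char(ch: str) -> bool:
    """Return True if the single character fits in 7-bit ASCII (0..127)."""
    return ord(ch) < 128

# ascii() escapes non-ASCII characters so the output prints on any terminal.
for ch in ['B', ' ', '.', 'ä']:
    print(ascii(ch), ord(ch), is_ascii_char(ch))

# Find the first character of the sample sentence that ASCII cannot hold.
ascii_flags = [is_ascii_char(ch) for ch in 'Bis zum nächsten Mal.']
first_non_ascii = ascii_flags.index(False)
print(first_non_ascii)
```

The character 'ä' has code point 228, so it falls outside the range, and its index in the sentence matches the position reported in the error messages.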
Unicode, on the other hand, is designed with all the world's writing systems in mind. UTF-8 is one of its most popular encodings. It can encode more than one million code points, using one to four bytes for each.
While most modern software has switched to UTF-8 as the default character encoding, many legacy programs still use ASCII by default. ASCII doesn't support characters outside its 128-character set, so attempts to encode or decode them with ASCII in Python will raise errors.
Both open() and encode() support a wide range of encodings, including ASCII and UTF-8. You choose one with the encoding parameter.
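The variable width of UTF-8 is easy to observe by encoding a few characters and counting the resulting bytes. This is only an illustration; the sample characters are arbitrary choices, written as escapes where they go beyond the text used in this article.

```python
# 'A', 'ä', the euro sign (U+20AC) and an emoji (U+1F600).
samples = ['A', 'ä', '\u20ac', '\U0001F600']

# Number of bytes UTF-8 needs for each character.
utf8_lengths = {ch: len(ch.encode('utf-8')) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f'U+{ord(ch):04X} needs {n} byte(s) in UTF-8')
```

The four samples take 1, 2, 3 and 4 bytes respectively, while plain ASCII characters such as 'A' keep their single-byte form.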
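As a rough sketch of what the encoding parameter changes, the snippet below encodes one word with two codecs and then decodes the UTF-8 bytes with the wrong one. Latin-1 is used here purely as an example of another encoding; it does not appear in the original examples.

```python
word = 'nächsten'

# The same string becomes different bytes under different encodings.
encoded = {name: word.encode(encoding=name) for name in ['utf-8', 'latin-1']}
for name, data in encoded.items():
    print(name, data, len(data), 'bytes')

# Decoding with a mismatched codec does not always raise an error;
# it can silently produce garbled text instead.
mojibake = encoded['utf-8'].decode(encoding='latin-1')
print(ascii(mojibake))

# Decoding with the matching codec restores the original word.
restored = encoded['utf-8'].decode(encoding='utf-8')
print(restored == word)
```

The point of the sketch is that the encoding used for reading must match the one used for writing, whether or not Python raises an exception.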
How To Solve The Error
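You can change the encoding parameter of encode() and open() to 'utf-8' to resolve the issue. This works for message.txt because the file is saved as UTF-8: the byte 0xc3 in the error message is the first of the two bytes UTF-8 uses for 'ä'.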
with open('message.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()
print(lines)

myStr = 'Bis zum nächsten Mal.'
bytesdata = myStr.encode(encoding='utf-8')
print(bytesdata.decode())
['Bis zum nächsten Mal. \n']
Bis zum nächsten Mal.
Because open() and encode() now use UTF-8 to process the string, Python no longer raises the UnicodeDecodeError or UnicodeEncodeError exception.
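If you don't have message.txt at hand, the following sketch creates a temporary copy and compares both encodings end to end. The temporary directory and the variable names are assumptions made for this illustration.

```python
import os
import tempfile

content = 'Bis zum nächsten Mal.\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a stand-in for message.txt, saved as UTF-8.
    path = os.path.join(tmp_dir, 'message.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(content)

    # Reading it as ASCII reproduces the error.
    try:
        with open(path, 'r', encoding='ascii') as f:
            f.readlines()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Reading it as UTF-8 returns the original line.
    with open(path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

print('ASCII read failed:', ascii_failed)
print('UTF-8 read matches:', lines == [content])
```

Passing encoding='utf-8' explicitly on both the write and the read keeps the result independent of the platform's default encoding.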
The “UnicodeDecodeError: ‘ascii’ codec can’t decode byte” error may occur in Python when you decode text containing characters outside the ASCII range, such as letters from other languages. Encoding such text with ASCII raises the matching UnicodeEncodeError. Switching to the UTF-8 character encoding should get rid of both errors.