The Error
You're reading a text file in Python on Windows and suddenly hit this:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10859: character maps to <undefined>
The file opens fine in Notepad or VS Code, but Python refuses. The reason comes down to one thing: encoding defaults.
Root Cause
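Python's open() on Windows defaults to the locale's ANSI code page, which on US and Western European systems is cp1252 (Windows-1252). The 'charmap' in the error message is not a separate encoding. It is the name of the generic codec Python uses to implement cp1252 and other single-byte code pages.
cp1252 assigns characters to 251 of the 256 possible byte values. The bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined. If the file is really UTF-8 (or comes from a different code page) and contains one of those five bytes, Python raises UnicodeDecodeError. Byte 0x9d is a classic example: it is the last byte of the UTF-8 encoding of the right curly quote ” (E2 80 9D), but cp1252 has no character for it. Mismatched bytes that cp1252 does define decode without any error, just into the wrong characters (mojibake such as "Ã©" instead of "é").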
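To see the 0x9D failure without any file, you can decode the UTF-8 bytes of the right double quotation mark ” (U+201D) directly. This is a minimal sketch using only bytes literals; the variable names are illustrative.

```python
# UTF-8 bytes for the right double quotation mark U+201D
curly_quote = "\u201d".encode("utf-8")
print(curly_quote)  # b'\xe2\x80\x9d'

# Decoding with the codec that produced the bytes works
print(curly_quote.decode("utf-8"))

# Decoding as cp1252 fails on 0x9D, which cp1252 leaves undefined
decode_error = None
try:
    curly_quote.decode("cp1252")
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)  # 'charmap' codec can't decode byte 0x9d in position 2: ...

# Bytes that cp1252 does define decode silently into the wrong text
mojibake = "café".encode("utf-8").decode("cp1252")
print(mojibake)  # cafÃ©
```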
On macOS and Linux this rarely happens. Their default is already UTF-8. Windows is the outlier.
Fix 1: Specify UTF-8 Encoding (Fixes Most Cases)
Just add encoding='utf-8' to your open() call:
```python
# Before (fails on Windows when the file is UTF-8)
with open('data.txt') as f:
    content = f.read()

# After (reads UTF-8 files the same way on every OS)
with open('data.txt', encoding='utf-8') as f:
    content = f.read()
```
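If the file was saved as UTF-8, which most modern tools do by default, this is all you need: a one-line change. If the file starts with a UTF-8 byte-order mark, as some Windows tools write, use encoding='utf-8-sig' instead so the BOM is stripped rather than showing up as '\ufeff' at the start of content.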
Verify it worked
```python
with open('data.txt', encoding='utf-8') as f:
    content = f.read()
    print(f"Read {len(content)} characters successfully")
```
No exception. Fixed.
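If you want to reproduce both the error and the fix on any operating system, a small sketch like the following should do it. It writes a throwaway UTF-8 file with the tempfile module and passes encoding='cp1252' explicitly as a stand-in for the Windows default; the text written is made up for the demo.

```python
import os
import tempfile

# Write a UTF-8 file containing curly quotes (the closing one is E2 80 9D)
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("He said \u201chello\u201d\n")
    path = tmp.name

try:
    # Stand-in for the Windows default: request cp1252 explicitly
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError as exc:
        cp1252_failed = True
        print("cp1252:", exc)

    # Fix 1: name the encoding the file was written in
    with open(path, encoding="utf-8") as f:
        content = f.read()
    print(f"Read {len(content)} characters successfully")
finally:
    os.remove(path)
```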
Fix 2: When You Don't Know the Encoding
Getting files from a client or third-party system? The encoding could be anything: cp1252, latin-1, shift-jis, GBK. Use the chardet package to guess it from the raw bytes. Its answer is a statistical estimate, not a certainty.
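```shell
pip install chardet
```

```python
import chardet

# Read raw bytes first; binary mode does no decoding
with open('data.txt', 'rb') as f:
    raw = f.read()

result = chardet.detect(raw)
detected_encoding = result['encoding']
confidence = result['confidence']
print(f"Detected: {detected_encoding} (confidence: {confidence:.0%})")

# chardet returns None when it cannot guess; open() would then silently
# fall back to the locale default, which is exactly the bug being fixed
if detected_encoding is None:
    raise ValueError("chardet could not detect an encoding")

# Now read with the detected encoding
with open('data.txt', encoding=detected_encoding) as f:
    content = f.read()
```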
Confidence below 70%? The file might be corrupted, mixed-encoding, or from a very obscure codepage. Check with the sender before doing anything with the output.
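The 70% rule of thumb can be enforced in code rather than by eye. The helper below, decode_detected, is hypothetical (it is not part of chardet). It assumes a dict shaped like the return value of chardet.detect() and refuses to decode when the guess is missing or weak. The 0.7 threshold comes from the paragraph above, not from chardet.

```python
from typing import Any, Dict


def decode_detected(raw: bytes, result: Dict[str, Any], min_confidence: float = 0.7) -> str:
    """Decode raw with the encoding named in a chardet-style result dict.

    Raises ValueError instead of decoding when no encoding was detected or the
    confidence is below min_confidence, so a weak guess never passes silently.
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if encoding is None:
        raise ValueError("no encoding detected; check the file with the sender")
    if confidence < min_confidence:
        raise ValueError(
            f"only {confidence:.0%} confident it is {encoding}; check with the sender"
        )
    return raw.decode(encoding)


# Typical use with chardet (not run here):
#     with open('data.txt', 'rb') as f:
#         raw = f.read()
#     content = decode_detected(raw, chardet.detect(raw))
print(decode_detected("café".encode("utf-8"), {"encoding": "utf-8", "confidence": 0.99}))
```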
Fix 3: Skip or Replace Bad Bytes
Need to read the file fast and don't care about a few unreadable characters? The errors parameter handles this:
```python
# Replace undecodable bytes with the replacement character (U+FFFD)
with open('data.txt', encoding='utf-8', errors='replace') as f:
    content = f.read()

# Or drop them entirely
with open('data.txt', encoding='utf-8', errors='ignore') as f:
    content = f.read()
```
Fair warning: errors='ignore' silently drops data. If you're processing customer records or financial exports, you'll never know what went missing. Reserve this for cases where the unreadable characters genuinely don't matter.
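The difference between the handlers is easiest to see on a byte string. This sketch decodes the cp1252 byte for é (0xE9) as UTF-8 with each setting; the same errors= values work in open(). It also shows 'backslashreplace', a standard decoding handler (Python 3.5+) that keeps the offending byte visible as \xe9 instead of losing it, which may suit the customer-records case better than 'ignore'.

```python
# 0xE9 is 'é' in cp1252 and latin-1, but an incomplete sequence in UTF-8
raw = b"caf\xe9 au lait"

replaced = raw.decode("utf-8", errors="replace")          # 'caf\ufffd au lait'
ignored = raw.decode("utf-8", errors="ignore")            # 'caf au lait'
escaped = raw.decode("utf-8", errors="backslashreplace")  # 'caf\\xe9 au lait'

for label, text in (("replace", replaced), ("ignore", ignored), ("backslashreplace", escaped)):
    print(f"{label:>16}: {text!r}")

# With 'replace' you can at least count the damage
# (this also counts any U+FFFD that was already in the data)
lost = replaced.count("\ufffd")
print("Replacement characters:", lost)  # 1
```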
Fix 4: Force UTF-8 for the Whole Process
Running a script with file operations scattered throughout? Set PYTHONUTF8=1 once and every open() call that doesn't pass encoding= defaults to UTF-8:
Command Prompt (applies to the current window only):
```bat
set PYTHONUTF8=1
python your_script.py
```
PowerShell:
```powershell
$env:PYTHONUTF8 = "1"
python your_script.py
```
Or pass it inline with the -X flag:
```shell
python -X utf8 your_script.py
```
UTF-8 mode applies to the whole process: the default encoding for open(), sys.stdin, sys.stdout and sys.stderr, and the filesystem encoding. It is available in Python 3.7+.
To make it permanent, add PYTHONUTF8=1 to your Windows environment variables: System Properties → Advanced → Environment Variables → New (under "User variables").
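To confirm UTF-8 mode is actually on, check sys.flags.utf8_mode, which is 1 when it is active and 0 otherwise. The sketch below starts two child interpreters, one with -X utf8 and one with PYTHONUTF8=1 in its environment, and asks each for the flag. The CHECK snippet is just for illustration.

```python
import os
import subprocess
import sys

# 1 if UTF-8 mode is active in the current interpreter, 0 otherwise
print("UTF-8 mode here:", sys.flags.utf8_mode)

CHECK = "import sys; print(sys.flags.utf8_mode)"

# Child interpreter started with the -X utf8 flag
child = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", CHECK],
    capture_output=True, text=True, check=True,
)

# Child interpreter started with PYTHONUTF8=1 in its environment
child_env = subprocess.run(
    [sys.executable, "-c", CHECK],
    capture_output=True, text=True, check=True,
    env={**os.environ, "PYTHONUTF8": "1"},
)

print("-X utf8:", child.stdout.strip())
print("PYTHONUTF8=1:", child_env.stdout.strip())
```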
Fix 5: cp1252 or latin-1 for Legacy Files
Some legacy files (exports from old Access databases, ancient CRMs, government data portals) are genuinely cp1252 or latin-1. If chardet confirms this, read with the right codec:
```python
# Legacy Windows export saved as cp1252
with open('legacy_file.txt', encoding='cp1252') as f:
    content = f.read()

# latin-1 maps every byte 0x00-0xFF to a Unicode character,
# so it never raises UnicodeDecodeError
with open('mystery_file.txt', encoding='latin-1') as f:
    content = f.read()
```
Latin-1 accepts every possible byte value without error, which is what makes it useful as a fallback. The catch: if the file is actually UTF-8, every non-ASCII character comes out garbled (é becomes Ã©) and nothing warns you. Use it last, not first.
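One way to follow "use it last, not first" is a fallback chain that tries strict codecs before latin-1. decode_with_fallback is a hypothetical helper, and the default order (utf-8, then cp1252, then latin-1) is an assumption that fits Western Windows files; adjust it for your data. UTF-8 goes first because non-UTF-8 text rarely happens to be valid UTF-8, and latin-1 goes last because it never fails, which would make anything after it unreachable.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    raw: bytes, encodings: Sequence[str] = ("utf-8", "cp1252", "latin-1")
) -> Tuple[str, str]:
    """Return (text, encoding_used) for the first encoding that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # Not this codec; try the next, more permissive one
            continue
    raise ValueError(f"none of {list(encodings)} could decode the data")


# UTF-8 text is accepted by the first codec
print(decode_with_fallback("café".encode("utf-8")))  # ('café', 'utf-8')
# A cp1252 byte (0xE9 = é) is invalid UTF-8, so cp1252 handles it
print(decode_with_fallback(b"caf\xe9"))  # ('café', 'cp1252')
# 0x81 is undefined in cp1252; only latin-1 accepts it
print(decode_with_fallback(b"\x81"))  # ('\x81', 'latin-1')
```

To use it on a file, read in binary mode (with open(path, 'rb') as f: raw = f.read()) and log the returned encoding, so a latin-1 fallback never goes unnoticed.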
Checking Your System's Default Encoding
Not sure what Python is defaulting to on your machine? Run this:
```python
import sys
import locale

print(sys.getdefaultencoding())       # always 'utf-8' in Python 3; not what open() uses
print(sys.getfilesystemencoding())    # usually 'utf-8' (Windows too, since Python 3.6)
print(locale.getpreferredencoding())  # what open() uses by default; often 'cp1252' on Windows
```
If locale.getpreferredencoding() returns cp1252, that's the culprit. Every open() call without an explicit encoding= uses it silently. Python only warns about this if you opt in on 3.10+ with -X warn_default_encoding, which emits an EncodingWarning.
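You can also ask the file object itself: the text wrapper returned by open() reports the codec it chose in its encoding attribute. A minimal sketch, using an empty temporary file so it runs anywhere:

```python
import os
import tempfile

# An empty temporary file is enough; nothing is decoded
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    with open(path) as f:  # no encoding=: falls back to the locale default
        default_encoding = f.encoding
    with open(path, encoding="utf-8") as f:  # explicit encoding is reported as given
        explicit_encoding = f.encoding
finally:
    os.remove(path)

print("open() default:", default_encoding)  # for example 'cp1252' on many Windows setups
print("explicit:", explicit_encoding)
```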
Prevention
- Always write encoding='utf-8' in every open() call. Do it even on Linux and macOS, where UTF-8 is the default; your code will run on Windows someday.
- Save files as UTF-8 in your editor. VS Code defaults to UTF-8. Notepad in current Windows 10 and 11 builds also defaults to UTF-8, but older versions defaulted to ANSI (cp1252 on Western systems).
- If you publish a library, accept an encoding parameter in any function that reads files, and document its default.
- Pylint's W1514 (unspecified-encoding) check flags every text-mode open() without an explicit encoding. Wire it into your CI pipeline and you'll catch these before they reach production.
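For the library advice, a function signature might look like this sketch. read_lines is a hypothetical name; the point is that encoding defaults to 'utf-8' rather than the platform default, is documented, and can be overridden, with errors passed through to open() as well.

```python
import os
import tempfile
from typing import List


def read_lines(path: str, encoding: str = "utf-8", errors: str = "strict") -> List[str]:
    """Return the file's lines without line endings.

    encoding defaults to "utf-8" (not the platform default) so results are the
    same on Windows, macOS and Linux; pass encoding="cp1252" etc. for legacy files.
    errors is forwarded to open() unchanged ("strict" raises on bad bytes).
    """
    with open(path, encoding=encoding, errors=errors) as f:
        return f.read().splitlines()


# Demo: write a small cp1252 file, then read it two ways
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="cp1252") as f:
    f.write("café\nnaïve\n")
try:
    legacy = read_lines(path, encoding="cp1252")  # correct codec
    lossy = read_lines(path, errors="replace")    # UTF-8 default, bad bytes become U+FFFD
finally:
    os.remove(path)

print(legacy)  # ['café', 'naïve']
print(lossy)
```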

