TL;DR: The Quick Fix
If your Ansible playbook crashes because a remote command returns non-UTF-8 characters, the fastest solution is forcing the remote environment to use UTF-8. You can also convert the output using iconv before Ansible attempts to parse it.
- name: Run command with forced UTF-8 locale
  shell: "/usr/local/bin/legacy_report.sh"
  environment:
    LC_ALL: "en_US.UTF-8"
    LANG: "en_US.UTF-8"
If the source data is actually corrupted or encoded in a format like Latin-1 (ISO-8859-1), pipe it through iconv to sanitize the stream:
- name: Force conversion to UTF-8 and ignore errors
  shell: "cat legacy_inventory.txt | iconv -f ISO-8859-1 -t UTF-8//IGNORE"
  register: output
The 2 AM Production Headache
It’s 2 AM, and your deployment is running smoothly. Suddenly, a routine task—perhaps reading a legacy log or a Windows file path—triggers a 50-line Python traceback. Instead of a helpful error, you see this frustrating message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xA0 in position 142: invalid start byte
Ansible relies heavily on Python's decode() method. When it captures stdout or stderr from a remote node, it expects a clean UTF-8 stream. If the node returns a byte sequence that doesn't fit, such as a stray 0xA0 (a non-breaking space in Latin-1) from an old system, Python raises a UnicodeDecodeError. Left unhandled, that exception stops the playbook run immediately.
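The failure is easy to reproduce outside Ansible. The snippet below is a minimal sketch in plain Python, not Ansible's actual code path; the sample bytes and the helper name try_utf8 are invented for illustration.

```python
# Hypothetical sample: ASCII text with one Latin-1 non-breaking space (0xA0).
raw = b"price: 10\xa0EUR"


def try_utf8(data: bytes) -> str:
    """Return the decoded text, or a short description of the decode failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"failed at byte {exc.start}: {exc.reason}"


utf8_result = try_utf8(raw)

# Latin-1 maps every possible byte to a character, so this decode cannot fail.
latin1_result = raw.decode("latin-1")

print(utf8_result)
print(repr(latin1_result))
```

The same bytes decode without complaint once the right source encoding is named, which is why the fixes below focus on either changing what the remote side emits or transcoding it before capture.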
Why This Happens
Usually, the issue stems from one of three scenarios:
- Locale Mismatch: The remote shell uses a C or POSIX locale. When a command outputs an accent or a curly quote, it sends raw bytes that aren't UTF-8 compliant.
- Hidden Binary Data: You are running a command like grep or cat on a file that accidentally contains binary data, such as a compressed log or a database dump fragment.
- Windows Encoding Issues: WinRM often communicates using UTF-16 or specific Windows Code Pages (like CP1252) that conflict with the Ansible controller’s expectations.
How to Fix It
1. Standardize the Remote Locale
Most modern systems support UTF-8, but they don't always use it by default for non-interactive shells. You can explicitly inject the correct locale variables into your task. This makes the remote process emit output in the encoding Ansible expects. The locale you name must actually be installed on the remote host; locale -a lists the available ones.
- name: Execute script with UTF-8 environment
  command: /opt/app/check_status.sh
  environment:
    LANG: "en_US.UTF-8"
    LC_ALL: "en_US.UTF-8"
    LC_CTYPE: "en_US.UTF-8"
2. Sanitize Output with iconv
Environment variables won't help if you're reading a file already saved in an incompatible encoding. In this case, you must transcode the data on the fly. The //IGNORE suffix is vital here; it tells iconv to discard any bytes that cannot be converted rather than stopping with an error. When you know the source encoding, pass it with -f; without -f, iconv assumes the input uses the current locale's encoding.
- name: Read a legacy log safely
  shell: "cat /var/log/old_system.log | iconv -t UTF-8//IGNORE"
  register: sanitized_log
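To get a feel for what discarding unconvertible bytes means for the data, Python's decode error handlers offer a close analogy. This is a sketch of the idea only, not of how iconv works internally, and the sample log line is hypothetical.

```python
# Hypothetical log line: valid UTF-8 text with one stray Latin-1 byte (0xE9).
raw = "status: ".encode("utf-8") + b"caf\xe9 ok"

# errors="ignore" silently drops the invalid byte, similar in spirit to //IGNORE.
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" keeps a visible U+FFFD marker where the bad byte was.
replaced = raw.decode("utf-8", errors="replace")

print(ignored)
print(replaced)
```

Dropping bytes keeps the playbook running but loses information, so it suits logs and reports better than data you intend to parse or write back.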
3. Use Base64 for Raw Binary Data
Sometimes you actually need the raw data, such as an SSL certificate or a small firmware blob. Don't try to capture these as text. Instead, encode them into Base64 on the remote host. This transforms problematic bytes into a safe ASCII string that Ansible can easily transport.
- name: Capture binary data as Base64
  shell: "cat /path/to/binary_file | base64"
  register: binary_output

- name: Display the safe string
  debug:
    msg: "Encoded data: {{ binary_output.stdout }}"
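The round trip can be checked in a few lines of Python. This is an illustrative sketch using the standard base64 module; the payload bytes are made up and stand in for whatever the remote file contains.

```python
import base64

# Hypothetical binary payload containing bytes that are not valid UTF-8.
payload = bytes([0x00, 0xA0, 0xFF, 0x10, 0x80])

# Encoding yields only ASCII characters, so it is safe to carry as text.
encoded = base64.b64encode(payload).decode("ascii")

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == payload)
```

The cost is size: Base64 output is roughly a third larger than the input, which is why this approach fits certificates and small blobs rather than large files.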
To inspect the contents of a Base64 string during debugging, tools like the Base64 Encoder/Decoder are quite handy. They allow you to verify if the decoded output is actually what you expect or just garbled data before you update your logic.
Verifying the Solution
Don't wait for a full playbook run to test your fix. Use a targeted ad-hoc command against the problematic host to verify the encoding:
# Test if forcing the locale works
ansible webserver -m shell -a "export LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8; locale; date"
Next, check the specific command output type. If the command your_command | file - reports UTF-8 text (the exact wording varies between versions of file), you are good to go. If it still reports ISO-8859 text or data, the iconv approach is your best bet.
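If you have already saved the suspect output to a file on the controller, the same check can be done in Python. The helper below is a small sketch; the name is_valid_utf8 and the sample strings are assumptions for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical samples standing in for captured command output.
good = "naïve café".encode("utf-8")
bad = "naïve café".encode("latin-1")

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
```

To test real output, read the file in binary mode (open(path, "rb")) and pass the bytes to the helper, so that Python does not attempt a decode of its own first.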
Final Tips
- Check localectl status on your target nodes to see the system-wide defaults.
- When searching through logs, use grep --text. This stops grep from treating the file as binary and hiding the matching lines when it encounters a null byte or binary character.
- If you find strange URL-encoded characters in your logs, you can use a URL Encoder/Decoder to see if hidden non-UTF-8 characters are lurking in your parameters.
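The URL-encoding check from the last tip can also be scripted. This is a rough sketch built on the standard library's urllib.parse.unquote_to_bytes; the helper name decode_param and the sample values are hypothetical, and the Latin-1 fallback is a guess at the original encoding rather than a detection.

```python
from urllib.parse import unquote_to_bytes


def decode_param(value: str) -> str:
    """Percent-decode a value and report whether the bytes are valid UTF-8."""
    raw = unquote_to_bytes(value)
    try:
        return "utf-8: " + raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 accepts any byte, so this always succeeds, but it is only
        # an assumption about what the sender intended.
        return "not utf-8, as latin-1: " + raw.decode("latin-1")


utf8_param = decode_param("caf%C3%A9")   # percent-encoded UTF-8 bytes
legacy_param = decode_param("caf%E9")    # percent-encoded Latin-1 byte

print(utf8_param)
print(legacy_param)
```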

