I am trying to send specific hexadecimal values across a serial COM port using PuTTY. Specifically, I want to send the hex codes 9C, B6, FC, and 8B. I have looked up the Alt codes for these and they are 156, 182, 252, and 139 respectively.
However, whenever I input the Alt codes, a preceding hex value of C2 is sent before 9C, B6, and 8B, so the values that are sent are C2 9C, C2 B6, and C2 8B. The value for FC is changed to C3 BC.
Why are these values being placed before the hex value and why is FC being changed altogether? To me, it seems like there is a problem internally converting the Alt code to hex. Is there a way to directly input hex values without using Alt codes in PuTTY?
Answer
What you're seeing is just ordinary text character set conversion.
As far as PuTTY is concerned, you are typing (and reading) text, not raw binary data, so it has to convert the text to bytes in whichever character set is configured before sending it over the wire.
In other words, when you type Alt+1 8 2, PuTTY receives the corresponding character from the legacy "OEM" charset that the system is configured for. (Typing Alt+0 1 8 2 would choose from the legacy "ANSI" (Windows-125x) character set.) In this case, the character is ¶, a pilcrow.
Now PuTTY has to convert that character to bytes. Earlier PuTTY versions by default would choose the same legacy Windows-125x character set as the OS itself uses, e.g. Windows-1257, so the conversion used to be almost direct – input 1 8 2, receive byte 182 decimal (0xB6 hex).
However, as PuTTY usually connects to Linux or BSD servers, the huge majority of which have migrated to UTF-8 as the default, the latest PuTTY release started using UTF-8 by default as well. UTF-8 is an encoding of the Unicode mega-character-set, which has ¶ at position U+00B6, and it is mostly just coincidence that UTF-8 encodes that value as bytes C2 B6:
U+00B6 → 0000|0000 10|110110 → [110]00010 [10]110110 → C2 B6
U+00FC → 0000|0000 11|111100 → [110]00011 [10]111100 → C3 BC
U+20AC → 0010|0000 10|101100 → [1110]0010 [10]000010 [10]101100 → E2 82 AC
Wikipedia has it with colors
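The bit shuffling above can be reproduced in a few lines of Python. This is only an illustrative sketch: `utf8_encode` is a made-up helper name, it skips validation (surrogates and out-of-range values are not rejected), and it is checked against Python's built-in encoder rather than anything PuTTY-specific.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by packing the bits by hand.

    Hypothetical helper for illustration; no validation of the input.
    """
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for cp in (0x00B6, 0x00FC, 0x20AC):
    encoded = utf8_encode(cp)
    # Cross-check against the standard library's encoder.
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {encoded.hex(' ').upper()}")
```

The three code points are the same ones used in the breakdown, so the printed bytes can be compared with it directly.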
As a different example, the letter ė used to be byte EB in the Windows-1257 charset, but in Unicode it is U+0117, corresponding to bytes C4 97 in UTF-8. These sequences are of variable length, up to 4 bytes for larger positions.
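To see the difference between a single-byte charset and UTF-8 on the values from the question, here is a small Python sketch. It assumes each wanted byte is treated as the Unicode code point with the same number, and it uses Python's `latin-1` codec as a stand-in for a one-byte-per-character encoding; `hexbytes` is just a hypothetical formatting helper.

```python
def hexbytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex (illustrative helper)."""
    return " ".join(f"{b:02X}" for b in data)


# The byte values the asker wanted to send, read here as code points.
wanted = [0x9C, 0xB6, 0xFC, 0x8B]

for cp in wanted:
    ch = chr(cp)
    single = ch.encode("latin-1")  # one byte, same value as the code point
    utf8 = ch.encode("utf-8")      # two bytes for anything in U+0080..U+07FF
    print(f"U+{cp:04X}: single-byte -> {hexbytes(single)}, UTF-8 -> {hexbytes(utf8)}")

# UTF-8 sequence length grows with the code point.
for ch in ["A", "\u00b6", "\u0117", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {len(ch.encode('utf-8'))} byte(s) in UTF-8")
```

The UTF-8 column matches the C2 and C3 prefixes described in the question.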
If you absolutely must use PuTTY to send binary data, open the "Window → Translation" settings screen, and choose either CP437, ISO-8859-1, or Windows-1252 as the "Remote character set". (Save this in a separate session; do not save this as a global default because it will break regular SSH connections.)
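With a single-byte remote character set, every character maps to exactly one byte, so the remaining question is which character produces which byte. The sketch below works that out for the four bytes from the question. It assumes Python's `cp437`, `latin-1`, and `cp1252` codecs correspond to the PuTTY settings of the same names, which is not verified against PuTTY itself; `chars_for_bytes` is a hypothetical helper.

```python
def chars_for_bytes(target: bytes, codec: str) -> str:
    """Return the characters that encode to `target` under a single-byte codec."""
    text = target.decode(codec)
    # Sanity check: encoding the characters again must give back the same bytes.
    assert text.encode(codec) == target
    return text


# The bytes the asker wants on the wire.
target = bytes([0x9C, 0xB6, 0xFC, 0x8B])

for codec in ("cp437", "latin-1", "cp1252"):
    text = chars_for_bytes(target, codec)
    code_points = ", ".join(f"U+{ord(c):04X}" for c in text)
    print(f"{codec}: {code_points}")
```

Note that under ISO-8859-1 the bytes 9C and 8B correspond to C1 control characters (U+009C and U+008B), which may be awkward to type, while CP437 and Windows-1252 assign printable characters to them.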