One of our work steps involves saving an MS Excel worksheet as CSV and then using UltraEdit to convert the CSV to UTF-8 before importing it into a server system.
The problem is that, after the conversion to UTF-8, the file always contains 3 nonsense characters at the start:
ï»¿ENTITY_ID;FIELD2;FIELD3;FIELD4;(etc.)
value1;value2;value3;value4;(etc.)
Observations:
As you can see, there are 3 characters that are noise and cause the server to reject the CSV import because the first column is not named "ENTITY_ID". The characters are always the same.
These characters are not shown immediately after the conversion, but when we close the file and reopen it in UltraEdit, we do see them.
These characters are only visible in UltraEdit. Windows Notepad and Notepad++ do not show them.
Using Notepad++ to convert the CSV to UTF-8 produces exactly the same output: a file with the same 3 odd characters at the beginning. The only difference is that Notepad++ does not display these characters, even after closing and reopening the file.
Workaround:
We reopen the file in UltraEdit, delete the noise, and then the server accepts the CSV import.
This step needs to be eliminated by fixing the actual problem.
Question: How can we avoid these 3 characters?
Answer
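That's the byte order mark (BOM, U+FEFF), encoded as UTF-8: the three bytes EF BB BF. An editor that interprets those bytes as a single-byte encoding displays them as three odd characters, while an editor that recognizes the BOM hides it. Tell your editor not to write it at the beginning of the file (look for a "UTF-8 without BOM" option when saving), or use a real UTF-8 decoder in your server system that skips a leading BOM.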
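If neither the editor nor the server can be changed, the BOM can also be removed in a small scripted step before the import. The following is only a minimal sketch in Python, not something taken from the question: the function names `strip_utf8_bom` and `read_csv_text` and the sample data are made up for illustration. It also shows why an editor reading the file as Windows-1252 displays exactly three odd characters.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def read_csv_text(data: bytes) -> str:
    """Decode UTF-8 bytes; the 'utf-8-sig' codec skips a leading BOM if present."""
    return data.decode("utf-8-sig")

# Hypothetical sample: what the editor writes when it saves "UTF-8 with BOM".
sample = codecs.BOM_UTF8 + "ENTITY_ID;FIELD2\nvalue1;value2\n".encode("utf-8")

# The three BOM bytes, misread as Windows-1252, are the three "noise" characters.
print(codecs.BOM_UTF8.decode("cp1252"))   # ï»¿

# Either approach yields a header that starts with ENTITY_ID.
print(strip_utf8_bom(sample)[:9])         # b'ENTITY_ID'
print(read_csv_text(sample).split(";")[0])  # ENTITY_ID
```

Both helpers leave a file without a BOM unchanged, so the step is safe to run on every export regardless of which editor produced it.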