Skip to main content

unicode - Saving a file in a CSV type in Excel always removes the BOM

I've been trying to find a reasonable solution/explanation (unsuccessfully) to find out why Excel defaults to removing the BOM when saving a file to the CSV type.


Please forgive me if you find this a duplicate of this question. This handles reading CSV files with non-ASCII encoding, but it doesn't cover saving the file back out (which is where the biggest issue lies).


Here is my current situation (which I'm going to gather is common among localized software dealing with Unicode characters and a CSV format):




  • We export data to a CSV format using UTF-16LE, ensuring the BOM is set (0xFFFE). We validate after the file is generated with a Hex editor to ensure it was set correctly.




  • Open the file in Excel (for this example we're exporting Japanese characters) and witness that Excel handles loading the file with the correct encoding.




  • Attempts to save this file will prompt you with a warning message indicating that the file may contain features that may not be compatible with Unicode encoding, but asks if you'd like to save anyway.




  • If you select the Save As dialog, it will immediately ask you to save the file as "Unicode Text" rather than CSV. If you select the "CSV" extension and save the file it removes the BOM (obviously along with all the Japanese characters).




Why would this happen? Is there a solution to this problem, or is this a known 'bug'/limitation of Excel?


Additionally (as a side issue) it appears that Excel, when loading UTF-16LE encoded CSV files, only uses TAB delimiters. Again, is this another known 'bug'/limitation of Excel?

Comments

Popular Posts

keyboard - Is there any utility/method to change Windows key bindings to type rare chars to currently empty bindings?

I'm currently typing this post with my windows XP machine and (Spanish) keyboard, and I'd like to add some extra symbols to my text. I could open the "char map" windows utility, look for the desired symbols, and paste them. But I'd like something quickier. For example, when I'm using my OSX Mac at work, I can easily add a ©, ™, ® or similar symbols, just pressing some weird ALT-GR + G / H / J, key combinations. In my (Spanish) keyboard mapping, these combinations are empty, as they don't produce any char at all, which, on the other hand, is perfectly normal and desirable. So, I thought: Why couldn't I add some extra key mappings on top of my currently empty ALT-GR + G/J/H Keys in my Spanish keyboard, and thus, being able to quickly type these special symbols? So that's my question: Is there any utility/method to achieve that effect under windows? (My version is XP). I've even googled this for some time but no luck. I've been a long term Hot...

virtualization - How to select paravirtualization interface in VirtualBox?

Given a windows 8 host system (Intel Core i5) and a Linux Fedora host, I would like to determine the optimal setting for the paravirtual interface. Options are none Default Legacy minimal Hyper-V KVM This page suggest the selection is only based on the guest system: The biggest change in VirtualBox 5.0 is the introduction of paravirtualization support, bringing higher performance and time-keeping accuracy to supported guest operating systems (Hyper-V on Windows and KVM on Linux). Is that correct? Answer The VirtualBox Manual , in the section titled Paravirtualization providers explains very clearly when each should be used (emphasis added): Minimal: Announces the presence of a virtualized environment. Additionally, reports the TSC and APIC frequency to the guest operating system. This provider is mandatory for running any Mac OS X guests. KVM: Presents a Linux KVM hypervisor interface which is recognized by Linux kernels starting with version 2.6.25. VirtualBox's implementati...

How do I transmit a single hexadecimal value serial data in PuTTY using an Alt code?

I am trying to sent a specific hexadecimal value across a serial COM port using PuTTY. Specifically, I want to send the hex codes 9C, B6, FC, and 8B. I have looked up the Alt codes for these and they are 156, 182, 252, and 139 respectively. However, whenever I input the Alt codes, a preceding hex value of C2 is sent before 9C, B6, and 8B so the values that are sent are C2 9C, C2 B6, and C2 8B. The value for FC is changed to C3 FC. Why are these values being placed before the hex value and why is FC being changed altogether? To me, it seems like there is a problem internally converting the Alt code to hex. Is there a way to directly input hex values without using Alt codes in PuTTY? Answer What you're seeing is just ordinary text character set conversion. As far as PuTTY is concerned, you are typing (and reading) text , not raw binary data, therefore it has to convert the text to bytes in whatever configured character set before sending it over the wire. In other words, when y...

Desktop reboots itself on sleep or hibernate

I have been using an ASUS M2NPV-VM motherboard for main home desktop workstation, operating Windows Vista x64. This computer has right from day one not been able to enter hibernate or standby; after Windows performs its final actions and brings the machine down, it would automatically revive itself for a reboot. Updating to the second latest BIOS (1201)has not helped (the latest BIOS revision would induce video refresh problems rendering it unusable). I have been reading related discussions on incidents similar to mine to no avail of a true workable solution. They appear to be more speculative guesses rather than actual knowledge on the inner workings of motherboard hardware. Does anybody have any electronic engineering experience on PC energy-saving standards to provide a more informed opinion how to go about getting this to work? More stories: this motherboard could not even reboot properly the first thing i used it. It was due to refresh rate of the onboard GPU, which had no influe...