Skip to main content

xml - Advantages of DOCX Format Over DOC


Today I have accidentally found out that the .docx is the same .zip (or there is no big difference between them). When you change the .docx to the .zip and open with WinRAR you see a bunch of XML files in the folders. In that XML file it is stored the text, fonts, owner, last modified and so on. In a word all the information is being stored as an XML data.


But the same is not right for .doc extension files. It is impossible to open them as .zip op as .rar.


So question: What is the advantage of storing .docx’s data in XML that Microsoft has changed the way of storing data? Indeed I want to know not the advantage of XML format but why Microsoft is using multiple XML files to store the .docx data. It turnes that .docx is not new format in the root.



Answer



A .docx file can store embedded resources, like image files, not just XML files. Instead of encoding stuff in base64 or something and storing it within an XML file or inventing yet another binary serialization format, they decided to go with the standard ZIP format.


Beside that, XML is a very verbose file format containing lots of redundant patterns. You can get a high compression ratio for XML files.


By the way, I don't really get the "tricking us" part. Is it better to invent a new cryptic file format from scratch or use a standard, known format?


Comments

Popular Posts

How do I transmit a single hexadecimal value serial data in PuTTY using an Alt code?

I am trying to sent a specific hexadecimal value across a serial COM port using PuTTY. Specifically, I want to send the hex codes 9C, B6, FC, and 8B. I have looked up the Alt codes for these and they are 156, 182, 252, and 139 respectively. However, whenever I input the Alt codes, a preceding hex value of C2 is sent before 9C, B6, and 8B so the values that are sent are C2 9C, C2 B6, and C2 8B. The value for FC is changed to C3 FC. Why are these values being placed before the hex value and why is FC being changed altogether? To me, it seems like there is a problem internally converting the Alt code to hex. Is there a way to directly input hex values without using Alt codes in PuTTY? Answer What you're seeing is just ordinary text character set conversion. As far as PuTTY is concerned, you are typing (and reading) text , not raw binary data, therefore it has to convert the text to bytes in whatever configured character set before sending it over the wire. In other words, when y...

linux - Extract/save a mail attachment using bash

Using normal bash tools (ie, built-ins or commonly-available command-line tools), is it possible, and how to extract/save attachments on emails? For example, say I have a nightly report which arrives via email but is a zip archive of several log files. I want to save all those zips into a backup directory. How would I accomplish that? Answer If you're aiming for portability, beware that there are several different versions of mail(1) and mailx(1) . There's a POSIX mailx command, but with very few requirements. And none of the implementations I have seem to parse attachments anyway. You might have the mpack package . Its munpack command saves all parts of a MIME message into separate files, then all you have to do is save the interesting parts and clean up the rest. There's also metamail . An equivalent of munpack is metamail -wy .

ubuntu - Why does my USB hdd returns SG_IO: bad/missing sense data?

I am able to boot and run commands from external USB hdd; the message in question appears for about 45 seconds then booting continues. GRUB2 is installed on internal HDD. When choosing to boot directly to /dev/sdb the message doesn't appear, however boot time is about the same as booting to internal HDD. /dev/sdb: Timing cached reads: 1018 MB in 2.00 seconds = 508.97 MB/sec Timing buffered disk reads: 80 MB in 3.03 seconds = 26.37 MB/sec pfeiffep@de:~$ sudo hdparm -i /dev/sdb /dev/sdb: SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 10 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 HDIO_GET_IDENTITY failed: Invalid argument Gparted correctly identifies the drive as SAMSUNG MP0402H. Any ideas how to remedy the HDIO & SG_IO messages?

Desktop reboots itself on sleep or hibernate

I have been using an ASUS M2NPV-VM motherboard for main home desktop workstation, operating Windows Vista x64. This computer has right from day one not been able to enter hibernate or standby; after Windows performs its final actions and brings the machine down, it would automatically revive itself for a reboot. Updating to the second latest BIOS (1201)has not helped (the latest BIOS revision would induce video refresh problems rendering it unusable). I have been reading related discussions on incidents similar to mine to no avail of a true workable solution. They appear to be more speculative guesses rather than actual knowledge on the inner workings of motherboard hardware. Does anybody have any electronic engineering experience on PC energy-saving standards to provide a more informed opinion how to go about getting this to work? More stories: this motherboard could not even reboot properly the first thing i used it. It was due to refresh rate of the onboard GPU, which had no influe...