Today I accidentally found out that a .docx file is really a .zip archive (or at least there is no big difference between them). When you rename the .docx to .zip and open it with WinRAR, you see a bunch of XML files in folders. Those XML files store the text, fonts, owner, last-modified date and so on. In short, all the information is stored as XML data.
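You do not even need to rename the file to see this. Here is a minimal sketch using Python's standard zipfile module. To keep it self-contained it builds a tiny stand-in package in memory instead of reading a real document; the part names follow the usual .docx layout, but the XML bodies are placeholders and both helper names (build_sample_package, list_parts) are made up for this example.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a tiny in-memory ZIP that mimics the layout of a .docx.

    The XML bodies are placeholders, not valid WordprocessingML.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document>Hello</document>")
        zf.writestr("docProps/core.xml", "<coreProperties/>")
    return buf.getvalue()


def list_parts(data: bytes) -> list[str]:
    """Return the names of all entries in a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


parts = list_parts(build_sample_package())
for name in parts:
    print(name)
```

For a real file, passing the path of the .docx straight to zipfile.ZipFile should work the same way.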
But the same is not true of .doc files. It is impossible to open them as .zip or as .rar.
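One way to check which kind of container a file is, without trusting its extension, is to look at its first bytes. The sketch below assumes the commonly documented signatures: "PK\x03\x04" for a ZIP archive that starts with a file entry, and D0 CF 11 E0 A1 B1 1A E1 for the OLE2 compound binary format that legacy .doc files use. The function name container_kind is hypothetical.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")     # OLE2 compound file header


def container_kind(data: bytes) -> str:
    """Guess the container format from the leading bytes of a file."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


# Build a small ZIP in memory to stand in for a .docx.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<document/>")
zip_bytes = buf.getvalue()

# A fake header only; this is not a readable .doc file.
fake_doc_bytes = OLE2_MAGIC + b"\x00" * 504

print(container_kind(zip_bytes))
print(container_kind(fake_doc_bytes))
```

For a file on disk, reading the first 8 bytes in binary mode and passing them to the function is enough.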
So my question is: what advantage of storing .docx data in XML made Microsoft change the way it stores data? To be precise, I am not asking about the advantages of XML as a format, but why Microsoft uses multiple XML files to store the .docx data. It turns out that .docx is not a fundamentally new format.
Answer
A .docx file can store embedded resources, like image files, not just XML files. Instead of encoding stuff in base64 or something and storing it within an XML file or inventing yet another binary serialization format, they decided to go with the standard ZIP format.
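To illustrate the trade-off, here is a rough sketch comparing the two options for a binary resource: base64 text that could be embedded in XML, versus a raw entry in a ZIP archive. The payload is dummy bytes rather than a real image, and the entry name word/media/image1.png only follows the usual .docx convention for illustration.

```python
import base64
import io
import zipfile

# Dummy binary payload standing in for an image (3072 bytes).
raw = bytes(range(256)) * 12

# Option 1: base64 text inside XML. Every 3 bytes become 4 characters.
encoded = base64.b64encode(raw)

# Option 2: store the bytes unchanged as their own entry in the archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("word/document.xml", "<document/>")
    zf.writestr("word/media/image1.png", raw)

# Reading the entry back returns the original bytes, no decoding step needed.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    restored = zf.read("word/media/image1.png")

print(len(raw), len(encoded), restored == raw)
```

The base64 version is about a third larger before any compression, and a reader would have to parse the whole XML file just to reach the image.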
Besides that, XML is a very verbose file format containing lots of redundant patterns, so XML files usually compress very well.
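A quick way to see this is to compress some repetitive markup with zlib, which implements the same DEFLATE algorithm that ZIP archives typically use. The markup below is an artificial example (one made-up paragraph repeated 500 times), so the ratio it shows is far more flattering than what a real document would get.

```python
import zlib

# One made-up paragraph element, repeated to imitate verbose, repetitive markup.
paragraph = "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Sample text</w:t></w:r></w:p>"
xml = ("<w:body>" + paragraph * 500 + "</w:body>").encode("utf-8")

compressed = zlib.compress(xml, 9)
ratio = len(xml) / len(compressed)

print(f"original:   {len(xml)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.1f}x")
```

Real documents have more varied text, so expect a smaller but still significant saving.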
By the way, I don't really get the "tricking us" part. Is it better to invent a new cryptic file format from scratch or use a standard, known format?