After doing some programming to read Microsoft Word files, I thought I'd write about how the Word 2007 (Office Open XML) file format is put together. This isn't a complete reference, but it will get you started.
Cracking the door open
When investigating a mystery file, the first thing a Unix junkie does is run file on it. The file program is a nifty tool that tries to identify what sort of data it's looking at, without paying any attention to the file extension. Let's do that now:
$ ls
Lecture 1.docx
$ file Lecture\ 1.docx
Lecture 1.docx: Zip archive data, at least v2.0 to extract
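If you'd rather make the same check from code, here is a rough Python sketch of the idea. It is not how file itself works (file consults a large database of signatures); it only looks at the first four bytes, which is where a typical ZIP archive carries its signature. The name looks_like_zip is made up for illustration.

```python
import sys

# Signature that begins the first local file header in a ZIP archive.
# Assumption: the archive is non-empty and has nothing prepended to it;
# an empty archive starts with a different record.
ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def looks_like_zip(path):
    """Return True if the file's first four bytes match the ZIP signature.

    Like file, this ignores the file extension and looks only at content.
    """
    with open(path, "rb") as f:
        return f.read(4) == ZIP_LOCAL_HEADER_MAGIC


if __name__ == "__main__":
    # Check every path given on the command line.
    for path in sys.argv[1:]:
        verdict = "looks like a ZIP archive" if looks_like_zip(path) else "not a ZIP archive"
        print(f"{path}: {verdict}")
```

Saved under a name of your choosing and pointed at Lecture 1.docx, this should agree with what file reported above. For a more thorough check, Python's standard library also offers zipfile.is_zipfile.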