Working with the Microsoft Office Word Interop

The Big Project requires the ability to enter data into, and extract data from, the online system by offline means. While a web form is fine and dandy most of the time, we needed something in case the Internet was, you know, destroyed. So we chose Microsoft Word forms (via the Office .NET Interop) to handle the task (doesn't everyone have Word on their computer?). While it's ultimately proving to do the job quite nicely, it wasn't easy getting here.

First of all, as with all Microsoft products that weren't created by the .NET team (yes, this is props to ScottGu and co.), the documentation was thin. Don't get me wrong, every function was documented, but examples were sorely lacking, if they existed at all. IDE support within Visual Studio is also only partially complete: nothing that appears in the IntelliSense popup has a description. I would expect more.

After a few weeks of learning, experimenting, and tweaking, I think my team and I have picked up some useful tips:

  • Use drop-downs whenever possible. For free-form entry, text form fields work great for short (UNFORMATTED!) text; for anything longer where you want to preserve formatting (e.g. bullets, paragraphs), use a section instead. To create a section, just insert a normal section break. You'll notice that the status bar at the bottom numbers each section, and each one is addressable from code through the Document.Sections collection (which, like other Word collections, is 1-based).
  • When reading those sections through the Section.Range.Text property, empty sections do not come back as null; instead they usually contain carriage return characters and form or line feeds. Even sections that do hold text carry extraneous CR/FF/LF characters at the end. I have yet to understand this behavior, but I do have a workaround: run each string you extract through a filter like the one below. This C# code looks at the character code of the last character in the string and, as long as it is a carriage return (13), line feed (10), or form feed (12), removes it. The database can store these characters, as can anything that handles Unicode text. The reason we want to remove them (besides the unnecessary white space) is that inserting that data back into an empty Word form otherwise turns our pretty section breaks into page breaks. Don't ask me why; I only know how to avoid it. :)
        // Strips trailing form feeds (12), carriage returns (13) and line feeds (10)
        // that Word leaves at the end of Section.Range.Text.
        // Returns null if the input is empty or nothing remains after trimming.
        private string EncodeUTF8(string inTxt)
        {
            if (string.IsNullOrEmpty(inTxt))
                return null;

            // Remove every trailing FF/CR/LF in one pass (an empty result is safe).
            string trimmed = inTxt.TrimEnd('\f', '\r', '\n');

            return trimmed.Length == 0 ? null : trimmed;
        }
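
For reference, here is a minimal sketch of the read side, assuming you want to pull every section's text and clean it in one pass. SectionReader, ReadAll, and Clean are hypothetical names of mine, not part of the Office Interop. As far as I can tell, Word reports each paragraph mark as character 13 and both section breaks and manual page breaks as character 12, and a section's Range includes its own trailing break; that would likely explain both the stray characters and the page breaks on re-insert, but treat it as an educated guess. The Word call is passed in as a delegate so the cleanup logic can be exercised without Word installed.

```csharp
using System;
using System.Collections.Generic;

public static class SectionReader
{
    // Trailing characters to strip: CR (13), LF (10) and FF (12).
    private static readonly char[] TrailingJunk = { '\r', '\n', '\f' };

    // Trims trailing CR/LF/FF and returns null when nothing is left.
    public static string Clean(string text)
    {
        if (string.IsNullOrEmpty(text))
            return null;

        string trimmed = text.TrimEnd(TrailingJunk);
        return trimmed.Length == 0 ? null : trimmed;
    }

    // Reads sections 1..count (Word collections are 1-based) through a
    // caller-supplied accessor and cleans each one.
    // The accessor is an assumption of this sketch, e.g. i => doc.Sections[i].Range.Text.
    public static List<string> ReadAll(int count, Func<int, string> getSectionText)
    {
        var results = new List<string>(count);
        for (int i = 1; i <= count; i++)
            results.Add(Clean(getSectionText(i)));
        return results;
    }
}
```

With a reference to Microsoft.Office.Interop.Word and an open Document named doc, the call might look like SectionReader.ReadAll(doc.Sections.Count, i => doc.Sections[i].Range.Text); opening the document and releasing the COM objects are left out here.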
