Monday, July 27, 2009

Locking/unlocking Word doc files programmatically

My team is going through a planning milestone again, and this means reading, reviewing, and approving a lot of specs and design documents.

So this weekend I toyed with the idea of setting up a clone of Malevich (http://malevich.codeplex.com) for document reviews.

Malevich is, of course, the tool we (and now a whole bunch of other teams inside and outside Microsoft) use for code reviews. Its main goal is to make commenting easy: you click on a line of source code, an edit box opens, you type your comment for that line, and that's it. You can read more about Malevich's inspirations and aspirations here: http://1-800-magic.blogspot.com/2009/01/malevich-introduction.html.

Over the last 7 months Malevich has proven to be a big success. It streamlined the code review process in our development team, drew many people into code reviews who otherwise would not have participated, and did wonders for the quality of our code base.

All this made me start thinking about introducing a similar process for spec reviews. After all, a review is a review, right?

The biggest problem with spec reviews turns out to be the file format. Malevich operates on text files, so rendering a file on screen, showing the difference between two versions of it, and associating comments with individual lines are all very simple. Specs at Microsoft, however, are traditionally written as Microsoft Word documents.

Word has a very nice commenting mechanism of its own, but rendering Word documents on a web page is not nearly as straightforward, and diffing them... that's a whole other project!

While pondering this idea, I ran into this blog post by Eric White: http://blogs.msdn.com/ericwhite/archive/2009/07/05/comparing-two-open-xml-documents-using-the-zip-extension-method.aspx, which describes how to determine whether two Word documents are the same (modulo comments). The post was my first introduction to OpenXML, the format behind Word's .docx files. I also read that Eric was planning a blog post about merging comments from two documents, and this led me to the following design for the spec review site.

I am going to put together a system very similar to Malevich (let's call it Black Square for now), but instead of text files it will hold Word documents. To create a review request, the reviewee uploads a document to the server through a web site. On upload, the server locks the Word file so that it cannot be modified in any way except by adding comments, and then makes the document available for reviewers to download.

To perform a review, a reviewer downloads the document, comments on it using Word's built-in review features, and uploads it back to the server. The server then merges the comments into the master document, making everybody's comments available to all subsequent reviewers as well as to the reviewee.

I shot Eric an email. As it turned out, he had already largely finished his comment merger, and he gave me a preliminary copy to beta test (the final version is now here: http://blogs.msdn.com/ericwhite/archive/2009/07/28/merging-comments-from-multiple-open-xml-documents-into-a-single-document.aspx).

Then I spent part of the weekend coding. After a few hours I had a skeleton web site and needed to code the first meaningful action - locking a Word document so only comments could be added.

When I have to deal with a large new API set, I tend to program by Google: I search for a code snippet that best illustrates the use of the API. The Internet is a great resource for that (with one caveat: reading is fine, but copying code with unclear copyright into commercial products is not!), and the Windows source is even better (although I cannot use it for open source projects, for similar reasons).

Well, as it turned out, there is a dearth of samples when it comes to OpenXML programming. Unlike the documentation for most .NET APIs, the MSDN reference pages for OpenXML contain no usage examples. There are a few "How to" samples that solve end-to-end problems, but they focus primarily on processing text, not on the configuration options of a Word file. And the rest of the Internet is pretty much silent on the subject.

To make matters worse, the API is built on XML, with lots of types derived from generic base element classes, so IntelliSense often does not help.

After some struggle (and help from Eric) I was able to make sense of the programming model. Here's how it works.

A .docx document consists of a bunch of parts. You can look at them by changing the file's extension from .docx to .zip and opening it in your favorite archiver: the file is just a zip archive of XML files. To figure out which elements need to change to lock the file, I made a copy of the file and expanded it, then locked the file, expanded the result, and diffed the two.
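The same part-level diff can be scripted. Below is a minimal C# sketch that uses only System.IO.Compression from the .NET base class library (not the Open XML SDK); the class name DocxPartDiff and its methods are hypothetical names of my own. It reports which zip entries were added, removed, or changed between two copies of a document. Expect some noise: Word rewrites other parts on every save (for example the modification date in docProps/core.xml), so the interesting differences still have to be picked out by eye.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: compares two .docx packages part by part.
public static class DocxPartDiff
{
    // Reads every zip entry into a map of entry name -> uncompressed bytes.
    static Dictionary<string, byte[]> ReadParts(Stream package)
    {
        var parts = new Dictionary<string, byte[]>(StringComparer.Ordinal);
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                using (Stream s = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    s.CopyTo(buffer);
                    parts[entry.FullName] = buffer.ToArray();
                }
            }
        }
        return parts;
    }

    // Returns the sorted names of parts that exist in only one package
    // or whose uncompressed contents differ byte for byte.
    public static List<string> ChangedParts(Stream before, Stream after)
    {
        Dictionary<string, byte[]> a = ReadParts(before);
        Dictionary<string, byte[]> b = ReadParts(after);
        return a.Keys.Union(b.Keys)
            .Where(name => !a.TryGetValue(name, out byte[] x)
                        || !b.TryGetValue(name, out byte[] y)
                        || !x.SequenceEqual(y))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

To use it, open the two files with File.OpenRead and pass both streams. A byte-for-byte comparison of uncompressed entries is crude, but it is enough to narrow the search down to a part or two.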

This led me to two elements: DocSecurity in the extended file properties (docProps/app.xml, exposed as ExtendedFilePropertiesPart), and documentProtection in the document settings (word/settings.xml). The first one was easy: it has a direct counterpart in the object model, doc.ExtendedFilePropertiesPart.Properties.DocumentSecurity, so setting it was straightforward:

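```csharp
// Open the package for editing; disposing it closes the file and writes the changes.
using (WordprocessingDocument doc = WordprocessingDocument.Open(args[1], true))
{
    // "8" marks the document as locked for annotation; "0" clears the restriction.
    doc.ExtendedFilePropertiesPart.Properties.DocumentSecurity =
        new DocumentFormat.OpenXml.ExtendedProperties.DocumentSecurity(isLock ? "8" : "0");
    doc.ExtendedFilePropertiesPart.Properties.Save();
}
```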
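For reference, here is how that DocSecurity value decodes. As I read ECMA-376 Part 1, DocSecurity is a bit field: 1 = password protected, 2 = read-only recommended, 4 = read-only enforced, 8 = locked for annotation, so the "8" above means "locked for annotation". The DocSecurityFlags enum and the DocSecurityReader class are hypothetical helpers for this sketch; they read the raw text of docProps/app.xml with LINQ to XML rather than going through the SDK.

```csharp
using System;
using System.Xml.Linq;

// Bit values of the DocSecurity extended property, per my reading of ECMA-376 Part 1.
[Flags]
public enum DocSecurityFlags
{
    None = 0,
    PasswordProtected = 1,
    ReadOnlyRecommended = 2,
    ReadOnlyEnforced = 4,
    LockedForAnnotation = 8,
}

// Hypothetical helper: decodes DocSecurity from the XML text of docProps/app.xml.
public static class DocSecurityReader
{
    static readonly XNamespace Ep =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // A missing DocSecurity element is treated as "no restrictions".
    public static DocSecurityFlags Read(string appXml)
    {
        XElement element = XDocument.Parse(appXml).Root?.Element(Ep + "DocSecurity");
        return element == null
            ? DocSecurityFlags.None
            : (DocSecurityFlags)int.Parse(element.Value.Trim());
    }
}
```

After expanding a locked document, DocSecurityReader.Read(File.ReadAllText(path)) on its docProps/app.xml should report LockedForAnnotation.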


The second was a setting in MainDocumentPart (more precisely, in its DocumentSettingsPart). The hiccup for me (a very novice XML developer; remember, most of my life was spent deep in the guts of the OS, and I had not touched managed code and all its attendant goo until a few months ago!) was that the settings are a collection of OpenXML elements, and DocumentProtection, despite having its own type, is not addressable directly as a property of the settings. Instead, I had to search the child elements of the settings for it:

```csharp
Settings settings = doc.MainDocumentPart.DocumentSettingsPart.Settings;

// GetFirstChild returns null when the settings contain no protection element.
DocumentProtection dp = settings.GetFirstChild<DocumentProtection>();
if (dp != null)
{
    dp.Remove();
}

if (isLock)
{
    // Allow comments only, and have Word enforce the restriction.
    dp = new DocumentProtection();
    dp.Edit = DocumentProtectionValues.Comments;
    // BooleanValues is from the SDK 2.0 CTP; later releases use OnOffValue (dp.Enforcement = true).
    dp.Enforcement = DocumentFormat.OpenXml.Wordprocessing.BooleanValues.One;
    settings.AppendChild(dp);
}

settings.Save();
```
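To check the result without the SDK, the raw word/settings.xml can be inspected directly. After locking, it should contain an element along the lines of <w:documentProtection w:edit="comments" w:enforcement="1"/>. The sketch below (ProtectionInspector is a hypothetical name) looks for it with LINQ to XML and accepts any of the WordprocessingML on/off spellings ("1", "true", "on") for the enforcement attribute.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: inspects the XML text of word/settings.xml directly.
public static class ProtectionInspector
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Values of an on/off attribute that mean "on" in WordprocessingML.
    static readonly string[] OnValues = { "1", "true", "on" };

    // True if the settings restrict editing to comments and enforce it.
    public static bool IsLockedForComments(string settingsXml)
    {
        XElement dp = XDocument.Parse(settingsXml).Root?.Element(W + "documentProtection");
        if (dp == null)
        {
            return false;
        }
        // Attributes are namespace-qualified (w:edit, w:enforcement); missing ones read as null.
        string edit = (string)dp.Attribute(W + "edit");
        string enforcement = (string)dp.Attribute(W + "enforcement");
        return edit == "comments" && OnValues.Contains(enforcement);
    }
}
```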


So here's the full code. It gives you a command-line utility to lock and unlock Word files (unlocking a file will, I think, also remove password protection, although I have not tried this).

To build it you need the Open XML Format SDK 2.0, available here: http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en, plus references to DocumentFormat.OpenXml and WindowsBase (which provides System.IO.Packaging) in your project.


```csharp
//-----------------------------------------------------------------------
// <copyright>
// Copyright (C) Sergey Solyanik.
//
// This file is subject to the terms and conditions of the Microsoft Public License (MS-PL).
// See http://www.microsoft.com/opensource/licenses.mspx#Ms-PL for more details.
// </copyright>
//-----------------------------------------------------------------------
using System;

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace LockDoc
{
    /// <summary>
    /// Manipulates modification permissions of an OpenXML document.
    /// </summary>
    class Program
    {
        /// <summary>
        /// Locks/Unlocks an OpenXML document.
        /// </summary>
        /// <param name="args">The action (lock or unlock) followed by the path to a .docx file.</param>
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: lockdoc lock|unlock filename.docx");
                return;
            }

            bool isLock = false;
            if (args[0].Equals("lock", StringComparison.OrdinalIgnoreCase))
            {
                isLock = true;
            }
            else if (!args[0].Equals("unlock", StringComparison.OrdinalIgnoreCase))
            {
                Console.Error.WriteLine("Wrong action!");
                return;
            }

            // Disposing the package closes the file and writes the changes back to disk.
            using (WordprocessingDocument doc = WordprocessingDocument.Open(args[1], true))
            {
                if (doc.ExtendedFilePropertiesPart == null ||
                    doc.MainDocumentPart.DocumentSettingsPart == null)
                {
                    Console.Error.WriteLine("The document has no extended properties or settings part.");
                    return;
                }

                // docProps/app.xml: "8" = locked for annotation, "0" = no restriction.
                doc.ExtendedFilePropertiesPart.Properties.DocumentSecurity =
                    new DocumentFormat.OpenXml.ExtendedProperties.DocumentSecurity(
                        isLock ? "8" : "0");
                doc.ExtendedFilePropertiesPart.Properties.Save();

                // word/settings.xml: remove any existing protection element first.
                Settings settings = doc.MainDocumentPart.DocumentSettingsPart.Settings;
                DocumentProtection dp = settings.GetFirstChild<DocumentProtection>();
                if (dp != null)
                {
                    dp.Remove();
                }

                if (isLock)
                {
                    // Allow comments only, and have Word enforce the restriction.
                    dp = new DocumentProtection();
                    dp.Edit = DocumentProtectionValues.Comments;
                    // BooleanValues is from the SDK 2.0 CTP; later releases use
                    // OnOffValue here (dp.Enforcement = true).
                    dp.Enforcement = DocumentFormat.OpenXml.Wordprocessing.BooleanValues.One;
                    settings.AppendChild(dp);
                }

                settings.Save();
            }
        }
    }
}
```
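About the password: my understanding of ECMA-376 is that a protection password is not stored separately but as a hash in attributes of the documentProtection element itself (w:hash and w:salt in the first edition; newer Word versions write w:hashValue and w:saltValue). If that holds, removing the element, which is exactly what the unlock path does, also removes the password. The hypothetical ProtectionPassword helper below shows the check and the removal on raw settings XML with LINQ to XML; the attribute names it tests are an assumption on my part, not something I verified against Word.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: works on the XML text of word/settings.xml.
public static class ProtectionPassword
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumed names of the password-hash attributes (older and newer spellings).
    static readonly string[] HashAttributes = { "hash", "hashValue" };

    // True if a documentProtection element carries a non-empty password hash.
    public static bool HasPasswordHash(string settingsXml)
    {
        XElement dp = XDocument.Parse(settingsXml).Root?.Element(W + "documentProtection");
        return dp != null
            && HashAttributes.Any(a => !string.IsNullOrEmpty((string)dp.Attribute(W + a)));
    }

    // Mirrors the unlock path: drop every documentProtection element, hash included.
    public static string RemoveProtection(string settingsXml)
    {
        XDocument doc = XDocument.Parse(settingsXml);
        if (doc.Root != null)
        {
            doc.Root.Elements(W + "documentProtection").Remove();
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```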


BTW, for those who are not faint of heart, here's the documentation for the OpenXML format: http://www.ecma-international.org/publications/standards/Ecma-376.htm

And here are the Microsoft SDK docs: http://msdn.microsoft.com/en-us/library/bb448854(office.14).aspx

Comments:

143 said...

Hi, I want to unlock a Word file (a .dotx template) that was opened for editing by some process that never released its lock. I am sure that process will not need the file again. How can I do this?