C#: Create and Manipulate Word Documents Programmatically Using DocX
By John Atten, 14 Nov 2013   
In a recent post, I extolled the virtues of a wonderful OSS library I had found for working with Excel data programmatically, LinqToExcel. In that post, I also mentioned a fantastic library for working with Word docs, and promised to discuss it in my next post. This is that post.
The big pain point in working with MS Word documents programmatically is . . . the Office Interop. To get almost anything done with Word (including simply pulling the text out of the document), you pretty much need to use Interop, which also means you have to have Word installed on the machine running your application. Additionally, Microsoft does not recommend (or support) Word automation on the server side.
Interop is essentially a giant wrapper around the ugliness that is COM, and the abstraction layer is thin 
indeed. If you need to automate MS Office applications, Interop (or going all the way down to the COM 
level) is pretty much the only way to do it, and obviously, you need to have Office installed on the 
client machine for it to work at all.  
Oftentimes, though, we don't need to automate the Office application itself so much as get at the contents of Office files (such as Word or Excel files). Dealing with all that Interop nastiness makes this more painful than it needs to be.
Thankfully, the open source DocX by Cathal Coffey solves both problems nicely, and unlike Interop, 
presents an easy-to-use, highly discoverable API for performing myriad manipulations/extractions 
against the Word document format (the .docx format, introduced as of Word 2007). Best of all, DocX 
does not require that Word or any other Office dependencies be installed on the client machine! 
The full source is available from Coffey's Codeplex repo, or you can add DocX to your project using 
Nuget. 
10/2/2013 - NOTE: Several commenters on Reddit and elsewhere have pointed out that Microsoft's official OpenXml SDK serves the same purpose as DocX and is, after all, the "official" library. I disagree: the OpenXml library "does more," but it is inherently more complex to use. While it most certainly offers functionality not present in DocX, DocX provides a much simpler and more meaningful abstraction for what I find to be the most common use cases for working with Word documents programmatically. As always, the choice of library is a matter of preference, and to me, one of "right tool for the job."
1/23/2014 - NOTE: I mentioned the OSS project LinqToExcel in the opening paragraph, and it is a fantastic library. However, LinqToExcel takes a dependency on the Access Database Engine, which can create issues when (for example) deploying to a remote server or another environment where administrative privileges may be limited. I discovered another OSS library with no such dependencies. You can read about it in Use Cross-Platform/OSS ExcelDataReader to Read Excel Files with No Dependencies on Office or ACE.
In this post, we will look at a few of the basics of using this exceptionally useful library. Know that under the covers, and with a little thought, there is a lot more functionality here than we will cover in this article.
  Getting Started - Create a Word Document Using the DocX Library  
  Use DocX to Add Formatted Paragraphs  
  Find and Replace Text Using DocX - Merge Templating, Anyone?
  DocX Exposes Many of the Most Useful Parts of the Word Object Model 
  C#: Query Excel and .CSV Files Using LinqToExcel  
  Additional Resources/Items of Interest  
Add DocX to your project using Nuget 
NOTE: When adding Nuget packages to your project, consider keeping source control bloat down by 
using Nuget Package Restore so that packages are downloaded automatically on build rather than 
cluttering up your repo.   
As with LinqToExcel, you can add the DocX library to your Visual Studio solution from the Nuget Package Manager Console by running:
Install DocX using the Nuget Package Manager Console: 
PM> Install-Package DocX 
Alternatively, you can use the Solution Explorer. Right-click on the Solution, select "Manage Nuget Packages for Solution," and type "DocX" in the search box (make sure you have selected "Online" in the left-hand menu). When you have located the DocX package, click Install:
Install DocX using the Nuget Package Manager GUI in VS Solution Explorer:  
Getting Started - Create a Word Document Using the DocX Library
Want to quickly create a Word-compatible .docx document on the fly from your code? Do this (I am assuming you have Word installed on your local machine, but only because the example opens the finished file in Word; DocX itself does not need it):
(Note: change the file path to reflect your own machine)
A Really Simple Example: 
using Novacode; 
using System; 
using System.Diagnostics;   
namespace BlogSandbox 
{ 
    public class DocX_Examples 
    {       
        public void CreateSampleDocument() 
        { 
            // Modify to suit your machine: 
            string fileName = @"D:\Users\John\Documents\DocXExample.docx";    
            // Create a document in memory: 
            var doc = DocX.Create(fileName);    
            // Insert a paragraph:
            doc.InsertParagraph("This is my first paragraph");    
            // Save to the output directory: 
            doc.Save();    
            // Open in Word: 
            Process.Start("WINWORD.EXE", fileName); 
        } 
    } 
} 
Note that in the above we need to add using Novacode; to the using directives at the top of the file, since the DocX library lives in that namespace. If you run the code above, a Word document should open like this:
Output of Really Simple Example Code:  
What we did in the above example was: 
  Create an in-memory DocX object, passing the file name to the static DocX.Create() method.  
  Insert a Paragraph object containing some text.  
  Save the result to disc as a properly formatted .docx file.  
Until we execute the Save() method, we are working with the XML representation of our new 
document in memory. Once we save the file to disc, we find a standard Word-compatible file in our 
Documents/ directory. 
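Because the saved file is an ordinary .docx, we can also go the other way and read it back without Word. The sketch below assumes the DocXExample.docx file created above exists at the same (machine-specific) path; it uses DocX.Load and the document's Paragraphs collection to collect each paragraph's text. The class and method names (DocX_ReadExamples, ReadParagraphText, ReadSampleDocument) are hypothetical, chosen for illustration.

```csharp
using Novacode;
using System;
using System.Collections.Generic;

namespace BlogSandbox
{
    public class DocX_ReadExamples
    {
        // Hypothetical helper: returns the text of every paragraph in an existing .docx file.
        public List<string> ReadParagraphText(string fileName)
        {
            var lines = new List<string>();

            // Load the file into memory; Word does not need to be installed for this.
            using (DocX doc = DocX.Load(fileName))
            {
                foreach (Paragraph p in doc.Paragraphs)
                {
                    lines.Add(p.Text);
                }
            }
            return lines;
        }

        public void ReadSampleDocument()
        {
            // Same machine-specific path as the earlier example:
            string fileName = @"D:\Users\John\Documents\DocXExample.docx";

            foreach (string line in ReadParagraphText(fileName))
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

The using block matters here: DocX implements IDisposable, so wrapping the loaded document this way releases it as soon as we are done reading.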
Use DocX to Add Formatted Paragraphs - A More Useful Example
A slightly more useful example might be to create a document with some more complex formatted 
text: 
Create Multiple Paragraphs with Basic Formatting: 
public void CreateSampleDocument() 
{ 
    string fileName = @"D:\Users\John\Documents\DocXExample.docx"; 
    string headlineText = "Constitution of the United States"; 
    string paraOne = "" 
        + "We the People of the United States, in Order to form a more perfect Union, " 
        + "establish Justice, insure domestic Tranquility, provide for the common defence, " 
        + "promote the general Welfare, and secure the Blessings of Liberty to ourselves " 
        + "and our Posterity, do ordain and establish this Constitution for the United " 
        + "States of America.";    
    // A formatting object for our headline: 
    var headLineFormat = new Formatting(); 
    headLineFormat.FontFamily = new System.Drawing.FontFamily("Arial Black"); 
    headLineFormat.Size = 18D; 
    headLineFormat.Position = 12;    
    // A formatting object for our normal paragraph text: 
    var paraFormat = new Formatting(); 
    paraFormat.FontFamily = new System.Drawing.FontFamily("Calibri"); 
    paraFormat.Size = 10D;    
    // Create the document in memory: 
    var doc = DocX.Create(fileName);    
    // Insert the new text objects:
    doc.InsertParagraph(headlineText, false, headLineFormat); 
    doc.InsertParagraph(paraOne, false, paraFormat);    
    // Save to the output directory: 
    doc.Save();    
    // Open in Word: 
    Process.Start("WINWORD.EXE", fileName); 
} 
Here, we have created some Formatting objects in advance, and then passed them as parameters to 
the InsertParagraph method for each of the two paragraphs we create in our code. When the code 
executes, Word opens and we see this: 
Output from Creating Multiple Formatted Paragraphs  
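In the above, the FontFamily and Size properties of the Formatting object are self-evident. The Position property is less obvious: it sets the vertical position of the text relative to the baseline (Word's "raised" text setting), which here has the side effect of adding some vertical space around the headline.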
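Since a Formatting object is just a bag of settings, one instance can be reused for every paragraph of the same kind. A minimal sketch, assuming a DocX document created with DocX.Create as above; the FormattingHelpers, BodyFormat, and InsertBody names are hypothetical.

```csharp
using Novacode;
using System.Collections.Generic;
using System.Drawing;

namespace BlogSandbox
{
    public static class FormattingHelpers
    {
        // Builds the same body-text settings used in the example above.
        public static Formatting BodyFormat()
        {
            var format = new Formatting();
            format.FontFamily = new FontFamily("Calibri");
            format.Size = 10D;
            return format;
        }

        // Inserts each string as its own paragraph, all sharing one Formatting instance.
        public static List<Paragraph> InsertBody(DocX doc, IEnumerable<string> texts, Formatting format)
        {
            var inserted = new List<Paragraph>();
            foreach (string text in texts)
            {
                // false = do not record the insertion as a tracked change.
                inserted.Add(doc.InsertParagraph(text, false, format));
            }
            return inserted;
        }
    }
}
```

Building the Formatting once and passing it around keeps every body paragraph consistent, and changing the font later means touching a single method.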
We can also grab a reference to a paragraph object itself and adjust various properties. Instead of 
creating a Formatting object for our headline like we did in the previous example, we could grab a 
reference as the return from the InsertParagraph method and muck about: 
Applying Formatting to a Paragraph Using the Property Accessors: 
// Insert the Headline and do some formatting: 
Paragraph headline = doc.InsertParagraph(headlineText); 
headline.Color(System.Drawing.Color.Blue); 
headline.Font(new System.Drawing.FontFamily("Comic Sans MS")); 
headline.Bold(); 
headline.Position(12D); 
headline.FontSize(18D); 
This time, when the program executes, we see THIS:  
OH NO YOU DID NOT!  
Yes, yes I DID print that headline in Comic Sans. Just, you know, so you could see the difference in 
formatting.  
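In DocX these formatting methods return the Paragraph they were called on, so the headline snippet above can also be written as a single chained expression. This sketch assumes the same doc and headlineText values as that snippet; the HeadlineHelpers class and InsertHeadline method are hypothetical wrappers.

```csharp
using Novacode;
using System.Drawing;

namespace BlogSandbox
{
    public static class HeadlineHelpers
    {
        // Inserts a headline and applies the same formatting as the snippet above,
        // chaining each call on the Paragraph returned by the previous one.
        public static Paragraph InsertHeadline(DocX doc, string headlineText)
        {
            return doc.InsertParagraph(headlineText)
                .Color(Color.Blue)
                .Font(new FontFamily("Comic Sans MS"))
                .Bold()
                .Position(12D)
                .FontSize(18D);
        }
    }
}
```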
There is a lot that can be done with text formatting in a DocX document: you can format headers and footers, whole paragraphs, and even individual words and characters. Importantly, most of the things you might go looking for are easily discoverable; in other words, the author has done a great job building out his API.
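Formatting individual words works through Append: each call adds a new run of text to the paragraph, and a formatting method called right after it applies only to that run. A minimal sketch, assuming a document created as in the earlier examples; the RunFormattingHelpers class, InsertMixedFormatting method, and sample text are made up for illustration.

```csharp
using Novacode;
using System.Drawing;

namespace BlogSandbox
{
    public static class RunFormattingHelpers
    {
        // Builds one paragraph whose words carry different formatting.
        public static Paragraph InsertMixedFormatting(DocX doc)
        {
            return doc.InsertParagraph("DocX can format ")
                .Append("individual words").Bold()   // Bold applies only to this run
                .Append(" and ")
                .Append("colors").Color(Color.Red)   // Color applies only to this run
                .Append(" within a single paragraph.");
        }
    }
}
```

The paragraph's text reads as one sentence; only the runs "individual words" and "colors" pick up the extra formatting.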
Find and Replace Text Using DocX - Merge Templating, Anyone?  
Of course, one of the most common things we might want to do is scan a pre-existing document and replace certain text. Think templating here. For example, performing a standard Word Merge on your web server is not really feasible, but using DocX, we can accomplish the same thing. The following example is simple due to space constraints, but you can see the possibilities:
First, just for kicks, we will create an initial document programmatically in one method, then write 
another method to find and replace certain text in the document: 
Create a Sample Document: 
private DocX GetRejectionLetterTemplate() 
{ 
    // Adjust the path to suit your machine:
    string fileName = @"D:\Users\John\Documents\DocXExample.docx";    
    // Set up our paragraph contents: 
    string headerText = "Rejection Letter"; 
    string letterBodyText = DateTime.Now.ToShortDateString(); 
    string paraTwo = "" 
        + "Dear %APPLICANT%" + Environment.NewLine + Environment.NewLine 
        + "I am writing to thank you for your resume. Unfortunately, your skills and " 
        + "experience do not match our needs at the present time. We will keep your " 
        + "resume in our circular file for future reference. Don't call us, "  
        + "we'll call you. "    
        + Environment.NewLine + Environment.NewLine 
        + "Sincerely, " 
        + Environment.NewLine + Environment.NewLine 
        + "Jim Smith, Corporate Hiring Manager";    
    // Title Formatting: 
    var titleFormat = new Formatting(); 
    titleFormat.FontFamily = new System.Drawing.FontFamily("Arial Black"); 
    titleFormat.Size = 18D; 
    titleFormat.Position = 12;    
    // Body Formatting 
    var paraFormat = new Formatting(); 
    paraFormat.FontFamily = new System.Drawing.FontFamily("Calibri"); 
    paraFormat.Size = 10D; 
    // Create the document in memory: 
    var doc = DocX.Create(fileName);    
    // Insert each paragraph, with appropriate spacing and alignment:
    Paragraph title = doc.InsertParagraph(headerText, false, titleFormat); 
    title.Alignment = Alignment.center;    
    doc.InsertParagraph(Environment.NewLine); 
    Paragraph letterBody = doc.InsertParagraph(letterBodyText, false, paraFormat); 
    letterBody.Alignment = Alignment.both;    
    doc.InsertParagraph(Environment.NewLine); 
    doc.InsertParagraph(paraTwo, false, paraFormat);    
    return doc; 
} 
See the %APPLICANT% placeholder? That is my replacement target (a poor-man's merge field, if you 
will). Now that we have a private method to generate a document template of sorts, let's add a public 
method to perform a replacement action: 
Find and Replace Text in a Word Document Using DocX: 
public void CreateRejectionLetter(string applicantField, string applicantName) 
{ 
    // We will need a file name for our output file (change to suit your machine): 
    string fileNameTemplate = @"D:\Users\John\Documents\Rejection-Letter-{0}-{1}.docx";    
    // Let's save the file with a meaningful name, including the  
    // applicant name and the letter date: 
    string outputFileName =
        string.Format(fileNameTemplate, applicantName, DateTime.Now.ToString("MM-dd-yy"));
    // Grab a reference to our document template: 
    DocX letter = this.GetRejectionLetterTemplate();    
    // Perform the replace: 
    letter.ReplaceText(applicantField, applicantName);    
    // Save as New filename: 
    letter.SaveAs(outputFileName);    
    // Open in word: 
    Process.Start("WINWORD.EXE", "\"" + outputFileName + "\""); 
} 
Now, when we run the code above, our output is thus:  
Obviously, the preceding example was a little contrived and overly simple. But you can see the potential . . . If our letter contained additional "merge fields," we could just as easily pass in a Dictionary<string, string> in which each key-value pair holds a replacement target (the key) and its replacement value. Then we could iterate over the dictionary, using each key as the search string and replacing it with the corresponding value.
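Here is one way that dictionary-driven version might look. It is a sketch, assuming the template is a .docx that already exists on disk (loaded with DocX.Load rather than generated in code, as in the pre-existing-document scenario above) and that placeholders follow the same %NAME% convention as %APPLICANT%. The TemplateMerge class and MergeTemplate method are hypothetical.

```csharp
using Novacode;
using System.Collections.Generic;

namespace BlogSandbox
{
    public static class TemplateMerge
    {
        // Replaces every merge field (dictionary key) with its value,
        // then saves under a new name so the template file is left untouched.
        public static void MergeTemplate(string templatePath, string outputPath,
            IDictionary<string, string> fields)
        {
            using (DocX doc = DocX.Load(templatePath))
            {
                foreach (KeyValuePair<string, string> field in fields)
                {
                    // Key = replacement target, Value = replacement text.
                    doc.ReplaceText(field.Key, field.Value);
                }
                doc.SaveAs(outputPath);
            }
        }
    }
}
```

For example, a dictionary mapping "%APPLICANT%" to "Jane Doe" and a hypothetical "%POSITION%" field to "Junior Developer" would fill in both fields in a single pass.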
DocX Exposes Many of the Most Useful Parts of the Word Object Model  
In this quick article, we have only scratched the surface. DocX exposes most of the stuff we commonly 
wish we could get to within a Word document (Tables, Pictures, Headers, Footers, Shapes, etc.) without 
forcing us to navigate the crusty Interop model. This also saves us from some of the COM dereferencing issues which often arise when automating Word within an application. Ever had a bunch of
"orphaned" instances of Word (or Excel, etc.) running in the background, visible only in the Windows 
Task Manager? Yeah, I thought so . . . 
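As a small taste of those other parts of the object model, the sketch below adds a page header and a 2 x 2 table. It assumes DocX's AddHeaders method, the Headers.odd header, and InsertTable(rows, columns); verify these against the version you install from Nuget. The ObjectModelExamples class, CreateDocumentWithTable method, and the table contents are placeholders.

```csharp
using Novacode;

namespace BlogSandbox
{
    public static class ObjectModelExamples
    {
        // Creates a document with a page header and a small 2 x 2 table.
        public static void CreateDocumentWithTable(string fileName)
        {
            using (DocX doc = DocX.Create(fileName))
            {
                // Headers must be added before they can be written to;
                // Headers.odd holds the default page header.
                doc.AddHeaders();
                doc.Headers.odd.InsertParagraph("Sample Header");

                // Each new cell starts with one empty paragraph, so we append to it.
                Table table = doc.InsertTable(2, 2);
                table.Rows[0].Cells[0].Paragraphs[0].Append("Name");
                table.Rows[0].Cells[1].Paragraphs[0].Append("Role");
                table.Rows[1].Cells[0].Paragraphs[0].Append("Jim Smith");
                table.Rows[1].Cells[1].Paragraphs[0].Append("Hiring Manager");

                doc.Save();
            }
        }
    }
}
```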
If you need to generate or work with Word documents on a server, this is a great tool as well. No 
dependencies on MS Office, no need to have Word running. You can generate Word documents on the 
fly, and/or from templates, ready to be downloaded.
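For the "ready to be downloaded" case, it can be handy to build the document in memory and never touch the server's disk. A minimal sketch, assuming the DocX.Create(Stream) overload and that Save() writes the finished package back into that stream; the InMemoryExamples class and CreateDocumentBytes method are hypothetical. The returned bytes could then be sent to the browser with the .docx content type, application/vnd.openxmlformats-officedocument.wordprocessingml.document.

```csharp
using Novacode;
using System.IO;

namespace BlogSandbox
{
    public static class InMemoryExamples
    {
        // Builds a .docx entirely in memory and returns its bytes.
        public static byte[] CreateDocumentBytes(string bodyText)
        {
            using (var stream = new MemoryStream())
            {
                using (DocX doc = DocX.Create(stream))
                {
                    doc.InsertParagraph(bodyText);
                    doc.Save(); // writes the .docx package into 'stream'
                }

                // ToArray copies the buffer and works even if the stream was closed.
                return stream.ToArray();
            }
        }
    }
}
```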