Tuesday, December 13, 2016

Could've Used This Years Ago

I was dismayed by all the repeated text across the 18 user guides we kept in MS Word at Pearson. I knew there had to be gobs and gobs of it, but there was no obvious way to find it all automatically. Ultimately, I cobbled together a solution with macros.

Step 1 was to pre-process the 18 Word documents by adding a two-character unique identifier after every paragraph. Step 2 was to insert all 18 Word documents into a single document. Step 3 was to perform additional global pre-processing; specifically, I remember removing all TOC fields from the single document, which was easier than doing it in Step 1. Step 4 was to sort the headings and the text within each heading's section. Step 5 was to analyze each heading to determine whether it had duplicate text. Step 6 was to consolidate the differing versions of the text into a single way of writing it.

The abundance of customer-specific text was often overwhelming. So were the differences in document conventions that had crept in between the creation of the first document and the creation of the 18th. For example, the earlier documents had headings like "To Edit a Contact", whereas the "middle" documents had changed that heading to "Edit a Contact" and the "most recent" documents had changed it to the gerund - "Editing a Contact" - because no heading standard had been implemented in Document 1 and carried through to Document 18. That made analyzing the content more complex. Because the headings were sorted in alphabetical order, the headings that began with "To" were (obviously) nowhere near the "Edit" and "Editing" headings. That's just one small example of the challenges of that project.

In the end, it didn't matter, as we stopped using MS Word as our primary authoring tool and moved to Confluence for the rewrite of the system.
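
For anyone facing the same problem today, here is a minimal sketch of the six-step idea in Python. It is not the Word macros I used, and it rests on several assumptions: that the paragraphs have already been pulled out of Word into (heading, paragraph) pairs, that a document ID can stand in for the identifier from Step 1, and that a crude heading normalizer is good enough to put "To Edit a Contact", "Edit a Contact", and "Editing a Contact" in the same bucket. The names normalize_heading and find_repeated_text are made up for illustration.

```python
from collections import defaultdict


def normalize_heading(heading):
    """Crude heuristic (an assumption, not a real stemmer): drop a leading
    'To', then trim 'ing' or a trailing 'e' from the first word so that
    'To Edit a Contact', 'Edit a Contact', and 'Editing a Contact' match."""
    words = heading.lower().split()
    if words and words[0] == "to":
        words = words[1:]
    if words:
        first = words[0]
        if first.endswith("ing") and len(first) > 5:
            first = first[:-3]
        elif first.endswith("e"):
            first = first[:-1]
        words[0] = first
    return " ".join(words)


def find_repeated_text(docs):
    """docs maps a document ID to a list of (heading, paragraph) pairs.

    Returns {normalized heading: {normalized paragraph: sorted doc IDs}},
    keeping only paragraphs that appear in more than one document."""
    seen = defaultdict(lambda: defaultdict(set))
    for doc_id, sections in docs.items():
        for heading, paragraph in sections:
            key = normalize_heading(heading)
            # Lowercase and collapse whitespace so trivial differences
            # do not hide a duplicate.
            text = " ".join(paragraph.lower().split())
            seen[key][text].add(doc_id)

    repeated = {}
    for key, texts in seen.items():
        dupes = {t: sorted(ids) for t, ids in texts.items() if len(ids) > 1}
        if dupes:
            repeated[key] = dupes
    return repeated


# Hypothetical sample data standing in for three of the 18 guides.
docs = {
    "01": [("To Edit a Contact", "Click Edit, change the fields, and click Save.")],
    "09": [("Edit a Contact", "Click Edit, change the fields,  and click Save.")],
    "18": [("Editing a Contact", "Select Edit and update the contact.")],
}
result = find_repeated_text(docs)
```

Grouping by a normalized heading takes the place of the alphabetical sort in Step 4, so the "To" headings land next to their "Edit" and "Editing" cousins. It only catches exact matches after whitespace and case are ignored; the consolidation in Step 6 would still be a human job.
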
We did not convert content from MS Word when we moved to Confluence; we wrote from scratch. Initially, our direction was to not have numbered steps. Instead, we were going to instruct users to use the user guide at the same time as the system. There was no reason to repeat the names of the UI elements, such as check boxes or drop-downs, and our goal was a user guide that did not repeat them. If the user was looking at the screen, they knew whether "this" was a check box or a drop-down, and we were not going to insult the user by explaining how to use those elements. We would have an "Understand the System We were not going to duplicate were not going to have numbered steps but after the first customer for the new

Sigh.

Anyway, I mention all of this because I just came across this helpful tip - Find Duplicate Words, Phrase Or Paragraphs In MS Word 2010 - and briefly wondered whether I could have leveraged its simple 13 steps (though step 1 is a "duh" and step 13 is a "filler" step and not actually something the reader has to do) to make that whole project easier.
