
Single Sourcing Rant

This page began as a post and was then moved to this page. Scattered throughout the other 4000+ posts on this blog are mini-rants about single sourcing in documentation. The long-term goal is to review those mini-rants and either move that content to this page or, if the rant is integrated too deeply into the existing post, include a link to that post.

You realize what that sentence just said, right?

Let me restate it:

There is a new rule on this blog. The new rule is that this page will be the single source for rants on this subject, but if I want to, it's okay to keep the existing content in the other place and add a link to it. That's the basis of single sourcing content - you want to write once and reuse in many places. And yet, I just created a rule that strives for a single place to capture rants about single sourcing, and that rule allows an exception. Single sourcing is hard work.

Always.

But for it to work, there can be no copy and paste, no multiple versions of the same content.

Otherwise, it's not single sourcing.

Yes.

Every single damn time.

I am a technical writer.

I am not a word processor.

I'm not a programmer.

I know my approach to single sourcing the content with which I am involved can be unpopular. At one former employer, my approach was seen as cumbersome. I think it is because the idea of writing content once and reusing it each time it was necessary to have that content was not done from the very beginning. It led to analyzing a lot of content and trying to figure out what version of a given blurb of text captured all the nuances of the other blurbs so I could create a snippet.

What is a snippet? By definition, it's a small piece or brief extract, which is about as helpful as asking, "Why is Metallica awesome?" and replying, "Because they are a heavy band." There's no substance to that definition, which I found through a Google search.

This is not a new idea. In this article from 2002, Ann Rockley offered her thoughts: Fundamental Concepts of Content Reuse

Here's an illustration of how a snippet can be used within documentation. What if you had a semi with 18 wheels on it and you needed to change all 18 tires? Further, assume that to change one tire, you perform 18 steps (just go with it). You would perform 324 steps to change all 18 tires (18 steps for each tire x 18 tires). There is no such thing in reality, but imagine, if you will, that a "magic tire" exists. The way the "magic tire" works is that when you change one of the 18 tires, the other 17 are updated automatically. There's not really anything outside of technology that is similar. For this semi example, you would have to repeat the steps for each tire, but within technology, there is a way to update one thing and have that update ripple through to many things. I suppose it's similar to when you drop a stone in a pond: the stone creates ripples. For me, a blurb of text and a screenshot that are used in multiple places are stones.

Therefore, when I think about snippets and content reuse, I cannot be convinced they should be created after all the content is written. Snippets need to be created during the actual act of writing content. That means that as I'm writing, the second time I want to write text I know I already wrote elsewhere, I create a snippet.

I don't want to copy / paste content. Lisa P, from 36Software, says it best when she writes:

To put it in simple terms – Write once. Approve once. Use everywhere. This is the goal and this is what your company needs. Why? Because content costs money to create and maintain, it takes a lot of time to create and maintain, and the quality of the content (especially customer-facing content) has a direct impact on revenue and reputation. ... Content reuse is critical for saving money. (source: Selecting the Best Technology to Support your Content Reuse Strategy, Lisa Pietrangeli, Managing Partner, Executive Director of Operations and Business Development, 36Software)


Lisa's summary is why I embedded two videos from 36Software below. I've been a technical writer for a long time. I was hired at NDP on 2/10/1995. Since that day, I have learned many things about the way I want to work. I spent a dozen years doing a lot of copy+paste between documents and it always felt like there was a better way. Turns out there is. My perspective has really been shaped into thinking like a programmer, thanks in large part to understanding content reuse. Many years ago, I remember reading on the HATT email list that if you copy and paste, you are not single-sourcing content. In his article called Single Source All Your Content, published on April 13, 2016, Thomas Aldous defined single sourcing as follows:

source content should be authored and edited in one universal format that eventually gets published in its various output formats like HTML5, PDF, ePub, Mobile Applications, etc. You should never edit the published content directly. If a change is required to the published content, the source should be updated and republished.

I like to apply that concept to understanding how programmers might create modules of code, which are then used in multiple places.

Before I continue, bear in mind that I am writing that I want to work like a programmer without knowing what it feels like to work as a programmer.

Therefore, with that in mind, the way I envision this concept in my mind is that if there is a Print button on Screen A and the task is to add a Print button on Screen B, a good / decent / knowledgeable programmer would not copy and paste the code between the area of the code for Screen A into the area of code for Screen B to get that Print button added. Instead, what they would do is separate the part of the code that adds the Print button to a separate module. Then, they would change the area of code for Screen A to reference that module and, also, add a reference to that module to add the Print button to Screen B. I am talking about the part of the code that actually creates the button on the screen and upon being clicked, does the "print" functionality. The point is that you maintain that functionality in a single place. I've discussed this example with programmers and they have reinforced that I 'get' the concept.
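
The Print button idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function names (`add_print_button`, `build_screen_a`, `build_screen_b`) are made up, and a "screen" here is just a list of strings rather than anything from a real UI toolkit.

```python
def add_print_button(screen):
    """The one shared 'module': the only place the Print button is defined."""
    screen.append("[Print]")
    return screen


def build_screen_a():
    # Screen A references the shared function instead of owning its own copy.
    return add_print_button(["Screen A"])


def build_screen_b():
    # Screen B references the same function, so nothing is copied and pasted.
    return add_print_button(["Screen B"])


print(build_screen_a())
print(build_screen_b())
```

If the button ever needs to change, the change is made inside `add_print_button` and both screens pick it up.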

What gets me on my pedestal is when I see documentation that begs for content reuse. I installed Carbonite and selected the y: external hard drive (EHD) to be backed up. Then I selected a second EHD to be backed up and got an error message. I went to the Carbonite documentation to research whether I can specify more than one EHD. I found that no, I can only back up a single EHD.
I read this information in two different Support articles. When I compared and contrasted the content, it touched on a sensitive subject: two versions of the same content.
Notice how the first text says this:

If you try to add files from a second external hard drive to your backup, you will receive a message indicating that files or folders backed up from the previous external hard drive will be removed from your backup within 30 days.

but in the second text, it says this:

With your subscription, you are only able to back up a single external hard drive to be included in your backup. If you try to back up a second external hard drive, you will receive an error message. After selecting a second external hard drive for backup, you will receive a message indicating that all the information on the previous hard drive will be removed from your backup. To proceed, click Yes and that external hard drive will be selected for back up.

and shows a screenshot of the message.

There is a way to write the content that is applicable to both articles. I do this on a daily basis. I would write the content once, store it as a "snippet" and then add a reference to that snippet within the page.
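
A rough sketch of "write it once, reference it from each page" might look like the following. Everything in it is an assumption made for illustration: the `{{name}}` placeholder syntax is invented (it is not how RoboHelp or Flare store snippet references), and the snippet wording is sample text, not Carbonite's.

```python
import re

# Hypothetical snippet store: each blurb is written once and given a name.
SNIPPETS = {
    "one_ehd_limit": "You can back up a single external hard drive.",
}

# Hypothetical pages that reference the snippet with a {{name}} placeholder.
PAGES = {
    "article_1": "Backing up external drives. {{one_ehd_limit}}",
    "article_2": "Troubleshooting backup errors. {{one_ehd_limit}}",
}


def render(page_text, snippets):
    """Replace every {{name}} placeholder with the text of that snippet."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: snippets[m.group(1)], page_text)


for name, text in PAGES.items():
    print(name, "->", render(text, SNIPPETS))
```

Changing the one entry in `SNIPPETS` changes what both articles say the next time they are rendered, which is the whole point.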

Here's another take on how snippets are useful, from the article Case Study | Hewlett Packard Enterprise:

....The information engineering team also makes extensive use of snippets, variables, and conditional tags in MadCap Flare to maximize content reuse for its different outputs. ... "Flare variables and snippets are huge time-saving features for us. If we need to change product names for future versions or releases, we can automate those updates by just changing the variables," Fine explained. “With snippets, we remove the need to retype content, helping us to reduce user error and simplify our process of creating content.”

Does it matter that I use RoboHelp instead of Flare? No. This is about the 'concept' of content reuse, not the tool.

I really think it's awesome when technical writers confront the idea that content should be written once and used multiple places. As I outlined above in the Carbonite example, it's essential to be consistent. I remember talking to Lisa Pietrangeli at WritersUA in Memphis, TN, in 2012, about how the product she was demoing accomplished single sourcing with MS Word documents. In her article on LinkedIn, she points out exactly what I believe:

Before we continue, let’s all get on the same page about what content reuse means. To put it in simple terms – Write once. Approve once. Use everywhere. This is the goal and this is what your company needs. Why? Because content costs money to create and maintain, it takes a lot of time to create and maintain, and the quality of the content (especially customer-facing content) has a direct impact on revenue and reputation. Whether your company is creating technical documentation, sales proposals, SOWs, or quality standards, content is critical to making money. Content reuse is critical for saving money.

That is from https://www.linkedin.com/pulse/selecting-best-technology-support-your-content-reuse-part-lisa. There are three parts to her article and in the second part, I noticed this:

When choosing a content reuse solution, you'll need to nderstand your budget and desired implementation timelines.

https://www.linkedin.com/pulse/selecting-best-technology-support-your-content-reuse-part-lisa-1

Fear not, I emailed Lisa to fix the tpyo! <grin>

Then, in part 3, these are the capabilities that any tool being used for content reuse needs to have:

  1. Capability #1: Component Reuse
  2. Capability #2: Metadata and Taxonomy
  3. Capability #3: Where Used
  4. Capability #4: Document Assembly

From https://www.linkedin.com/pulse/selecting-best-technology-support-your-content-reuse-part-lisa-2 I think those capabilities are spot-on!

When I see things like this, I shudder. The portion of the screenshot below with the red box was pasted onto the base image to illustrate a point. On one screen in this application, the Delete button is blue while on another, the Delete button is red. Add in the fact that both screens have a Reset button with the same color and size. Why is this? Is it a judgment call? Is it because the software developer thought, "Oh, deleting the whatever on this screen is important and dangerous so I'll make it red?"
My point is that I cannot fathom any reason why what that system did would be correct in the realm of content reuse. It is the same trap I mentioned at the top of this page, where a standard has been declared but exceptions will be made, which means it's not a rule.

900 Screenshots

Another reason to pursue single sourcing and snippets and content reuse is so that I don't find myself in the same situation as John Dumbrille, who wrote, "Our software's user interface is being updated, all at once, later this year. Our documentation includes 900 screenshots. Advice?"

Of course I had advice and some of it was actually meaningful. I wrote the following:

I have gone through this before. I would treat the "revised" system as a "new" system as you are going to have to go through all the tasks in your user guide to catch other changes, such as navigation and enhanced functionality. As you do that work, capture your new screenshots.

One other point. Does this system really have 900 unique screens? If so, WOW! If the 900 screenshots is because you have the same screenshot within multiple tasks, one strategy I would use would be to create a snippet to store that screenshot and then include a reference to that snippet. Most HATs have the functionality (RoboHelp, Flare, Confluence are the 3 I'm familiar with personally). If the same screenshot is currently in 10 different tasks, you would only need to update the snippet.

Sounds like a great project!

It sincerely sounds like awesomeness! I had this type of work with the rewrites of a trouble reporting system. There was Trouble I, then Trouble II, then Trouble II with RELTEC, then Trouble III, then, shortly before I left the company, there was Trouble Management. I never had 900 screenshots in my documentation for that system or any other system, except maybe the service orders system I documented. If I had known then what I know now, the documentation for that service order system would never have looked the way it did. I was using RoboHelp, but didn't know to use snippets for screen captures - I had to work at Pearson to learn about snippets. Now, that said, for that specific service order system, there were not a lot of duplicate screenshots. Going by memory, I'd guess 95% of the screenshots in that documentation were unique screenshots and not used in more than one area.


Sometimes, it appears the technical writer does understand the concepts I mentioned above. When that happens, the result is really good documentation. For example, I came upon this page: https://smashdocs.zendesk.com/hc/en-us/sections/115000348812-Create-Import-Documents, which is about SMASHDOCs. There are three FAQs, each with their own answer, on three different pages. Pay attention to the wording of the three FAQ answers:




Know what's awesome? Consistency. Each of the questions is answered with the identical statement:

"No, this is not possible. Currently, you can only import Word documents in .docx format in SMASHDOCs."

Assuming the FAQs were written in the order above, the writer didn't take the text in the first question and rewrite the above sentence into a different sentence. They didn't write, for example:

"Unfortunately, not at this time as you can only import .docx format (Word documents) in SMASHDOCs."

And they didn't take that second sentence and write this in the third FAQ, for example:

"At this moment in time, in SMASHDOCs, currently only importing .docx (Word) documents is supported."

I don't see any of those rewrites in their FAQ which means they were consistent. Sure, they may have copied and pasted the sentence from the first to the second to the third, instead of using a snippet and referring to that snippet to achieve the consistency I desire. I'll likely never know...

I remember thinking about using SmartDocs back in 2012 for the user guides at a previous employer. I attended a demo of the product at the 2012 WritersUA conference in Memphis, TN, and remember thinking how awesome it would be to incorporate into the user guides I was working with on a daily basis. The rest of the story is that we shifted to Confluence, instead of a Word-based workflow. Yet, what I thought at the time is true now: snippets and reusable content are the way to go. These two short videos reminded me of that:





Meanwhile, At Work...

One of the tools I have available to me at work is called Naavia. The neat trick about Naavia is that I can make changes in the text and then spit out a document that has that text in both a flowchart as well as the explanatory text for each part of the flowchart. The advantage of Naavia is that instead of making the text changes both in the flowchart proper and the place where I store the text, I make the text change once, in the place where I store the text and that change is then reflected in the flowchart. If I didn't have Naavia, I'd have a Word document and a Visio flowchart and I'd be trying to keep the text updated. Naavia allows one update to affect two different outputs. Of course, since I am working on Knowledge Management and working within Naavia daily, I see the value. It is a lot like this example, which is from a thread in a LinkedIn Madcap Flare discussion group:


This is another example of where snippets and single sourcing documentation make perfect sense. The red boxes in the screen captures below are around unique text; the purple boxes in the screen captures below are around text that is duplicated in the two articles:



RoboHelp could support having a single topic with all of the content above and then tagging the text in the red boxes as being unique to the specific instance where it is relevant.

Knowledge Management

What has become very important to me in my daily work is the lack of discussion in Knowledge Management literature about snippets.

I'm trying to understand the chasm between technical writing & knowledge management:

In technical writing, I use snippets of text or graphics to create a reference to text or graphics that I need to use in multiple places.

In Knowledge Management, no one talks about snippets or reusable content or the actual meat & potatoes of writing and then maintaining Knowledge Articles (KAs).
There is no mention that I can find about why snippets can save you time with your KAs. It’s like you write a KA, you publish it, and it exists in a vacuum. I looked at 8 documents that I think will end up as KAs. I analyzed the contents and found the following:

There were 121 paragraphs of text and 62 graphics in those 8 documents.

58% of those paragraphs were used, verbatim, in more than one document
55% of those graphics were used, verbatim, in more than one document

From my perspective, we need a way to store duplicated content as snippets before we create hundreds of identical paragraphs and graphics that are spread throughout hundreds of Knowledge Articles in our Knowledge Repository. If there are reusable variables (e.g., snippets) to store duplicated content, the content is maintained efficiently. I would use reusable variables to centralize the definition of commonly used text, such as company names, product names, contact information, URLs, graphics and things of that nature. I think it would promote content reuse, make it easy to update placeholder text, and eliminate common Copy & Paste and Find & Replace errors. The problem is we do not have a way to create snippets in Cherwell.
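
A percentage like the 58% above could be computed along these lines. This is a sketch under stated assumptions, not a record of how that figure was produced: it counts distinct paragraphs (so the result may differ from a count over all 121 paragraphs), it treats paragraphs as identical only after collapsing whitespace, and the function name `shared_share` and the sample documents are made up.

```python
from collections import defaultdict


def shared_share(docs):
    """Return the fraction of distinct paragraphs found in more than one document.

    `docs` maps a document name to its list of paragraph strings.
    """
    seen_in = defaultdict(set)
    for doc_name, paragraphs in docs.items():
        for paragraph in paragraphs:
            text = " ".join(paragraph.split())  # ignore spacing differences
            if text:
                seen_in[text].add(doc_name)
    if not seen_in:
        return 0.0
    shared = sum(1 for names in seen_in.values() if len(names) > 1)
    return shared / len(seen_in)


sample = {
    "doc1": ["Open Secure Hub.", "Enter your PIN."],
    "doc2": ["Open Secure Hub.", "Call the Help Desk."],
}
print(f"{shared_share(sample):.0%}")
```

Every paragraph that lands in the "shared" bucket is a snippet candidate.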

I understand that part of the KCS Knowledge Management methodology is that there is a review process.

A brief side note: I found a video about Acorio's Knowledge Centered Support tool (which I am not using). I'm including it here because it talks about KCS in more depth: Acorio Knowledge Centered Support.
https://www.linkedin.com/pulse/kcs-93-adoption-team-knowledge-centered-support-paul-jay/



I think it’s implied that if you have outdated text or an outdated screenshot, it would be during that Review process when it would be fixed. To me, that is too reactive. I think if you know you’re going to have standard text, like Help Desk contact information, it makes more sense to write and maintain it once. If the Help Desk becomes the “Support Desk,” the task of updating the email address in hundreds of Knowledge Articles seems like a giant waste of time – especially when I use a tool (RoboHelp) for other projects that supports snippets! It felt like I was missing the obvious or that I was speaking Klingon when I talked about content reuse in relation to Cherwell's Knowledge Management functionality.
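
The Help Desk rename is the classic case for a variable. Here is a minimal sketch using Python's standard `string.Template`; the variable names, the article wording, and the example.com addresses are all placeholders invented for the illustration, and none of this reflects how Cherwell or RoboHelp actually store variables.

```python
from string import Template

# Defined once. The names and values are placeholders, not real contact details.
VARIABLES = {
    "desk_name": "Help Desk",
    "desk_email": "helpdesk@example.com",
}

ARTICLES = [
    Template("Still stuck? Contact the $desk_name at $desk_email."),
    Template("To reset your PIN, email the $desk_name ($desk_email)."),
]


def publish(articles, variables):
    """Fill in the variables for every article."""
    return [article.substitute(variables) for article in articles]


print(publish(ARTICLES, VARIABLES))

# The rename happens in one place and reaches every article.
VARIABLES.update(desk_name="Support Desk", desk_email="supportdesk@example.com")
print(publish(ARTICLES, VARIABLES))
```

Two articles or two hundred, the update is the same single edit.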


That's why, on Wednesday, March 28, 2018, which happened to be my 2 year anniversary at work (!!!), I talked about snippets in relation to the way I believed the Knowledge Articles in the Knowledge Repository should be created. I mentioned the statistics above - about how 58% of the content is duplicated and that 55% of the graphics are duplicated - and that Cherwell does not have snippet functionality. My manager was not at this meeting, but his boss was. He is well-acquainted with the work I have done with the Disaster Recovery documentation project and knows that the tool I use - Adobe RoboHelp - is powerful. Before I could say, "We should use Adobe RoboHelp for the Knowledge Articles," HE SAID, "Does the Adobe platform allow the creation of snippets?"

The smile on my face was a trillion miles wide. I said, "Yes."

Since the meeting, I have been working on a proof-of-concept to prove the power of snippets.

If only everyone understood the power of snippets. Some people do not.


I realized this when I was paying my Discover card bill. I found two issues with the website. First, there was a case where there likely was a global find and replace action performed and, because of that, there was a glitch (to me) in how the Discover site works.

Before I continue, though, I want to be crystal clear about what would be an easy conclusion.

I don't need, want, or plan to do any sort of "balance transfer" with any of my accounts - I was poking around the Discover.com website and wanted to see how the help text worked for the website.

Now then, this is what happened.

I went to the Frequently Asked Questions section and selected Balance Transfer. It took me to the list of links you see below. I clicked the first link and started reading What is a balance transfer?, expecting to read a definition. I immediately noticed that the second & third words were a hyperlink so I clicked it. I ended up on a different page, marked A. This told me that I could do a balance transfer and how doing so would ...get [me] a low promo rate to help pay them off, which is good information, but it doesn't tell me What is a balance transfer?, as the link had promised to tell me. Then I noticed the tab to the right of the Available Offers tab - called About Balance Transfers - which looked interesting so I clicked it. I ended up on a different page, marked B. This page defined the phrase balance transfer very nicely in two clear sections - What's a balance transfer and Why do one? - where I read the definition of a balance transfer.

Of course, that's when I realized that the text on the initial page did include a definition of balance transfer and when I looked closer at the B page, I noticed that the definitions were not the same. The same text was not used in both places!
Initial Page
Page A
Page B
So, which definition is right - the definition on the initial page or the definition on the About Balance Transfers page?

Here's what I would have done differently. First, I wouldn't have made the second and third words on page A be a hyperlink. I would have added a "To learn more, see [hyperlink]About Balance Transfers[/h]" sentence. Second, I would have reworked the text so that the definition of Balance Transfer was consistent between the two pages and, to do that, I would have a snippet with a single definition of the term. Now that I think that through, if the definition on page A and on page B were identical, I would not have a link from page A to page B as there would be no need to make the user click to go to page B to read the identical text. Instead, there would only be a link for Initiate a Balance Transfer on the initial page.

It would be helpful to the user to see the

SAME

Definition

for the phrase so that there is

NO

uncertainty in their mind.

On top of that, I don't plan to take the time to search their entire website and look for any other places that use the phrase balance transfer - there could be many other places where the phrase is used - there could be no other place where the phrase is used.

I don't know.

I'm not curious enough about their website to look any further into it than what I did.

Besides, I have my own pot boiling with ideas that I need to watch over!

Wouldn't it be neat to have a process that would create snippets of duplicated text automatically and to then replace the duplicated text, within the HTML file, with a reference to the snippet?!? I think so! I hope to nail down what that process involves for my Knowledge Articles. The process must be repeatable and have as little manual intervention as possible. It's a tall order, but I'm a tall guy so I think it will work itself out!!
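
One possible shape for that process, sketched in Python. The assumptions are significant: it only looks at plain `<p>...</p>` paragraphs found with a regular expression (fragile on real-world HTML), it treats text as duplicated only when it matches exactly, and the `{{snippet_N}}` reference is a made-up placeholder rather than RoboHelp's real snippet syntax.

```python
import re
from collections import Counter

P_TAG = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def extract_snippets(pages):
    """Turn every <p> paragraph used in more than one page into a snippet.

    `pages` maps a file name to its HTML text. Returns (snippets, new_pages),
    where duplicated paragraphs in new_pages are replaced by {{snippet_N}}.
    """
    # Count each paragraph once per page so repeats inside one page don't count.
    counts = Counter()
    for html in pages.values():
        counts.update(set(P_TAG.findall(html)))

    duplicated = sorted(text for text, n in counts.items() if n > 1)
    names = {text: f"snippet_{i}" for i, text in enumerate(duplicated, start=1)}

    def swap(match):
        text = match.group(1)
        if text in names:
            return "<p>{{" + names[text] + "}}</p>"
        return match.group(0)

    new_pages = {name: P_TAG.sub(swap, html) for name, html in pages.items()}
    snippets = {snippet: text for text, snippet in names.items()}
    return snippets, new_pages


pages = {
    "a.htm": "<p>Configure Secure Hub</p><p>Only in A.</p>",
    "b.htm": "<p>Configure Secure Hub</p><p>Only in B.</p>",
}
snippets, new_pages = extract_snippets(pages)
print(snippets)
print(new_pages["a.htm"])
```

The manual part that remains is reviewing the proposed snippets and giving them meaningful names.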

I decided to remove a link from the list below so that information about this topic is on this page and not elsewhere.

Go ahead and read the "Addressing similar issues in Solve Loop articles" post, but I also want to call attention to the following, at the bottom of the post:

Mary Paez You can create a Knowledge Collection article that lists various (related issues) and have links to the short articles depending on the path taken in the KC article.

Alexander Tsmokalyuk I can, but those are Evolve Loop articles while my goal is to make Solve loop articles as useful as possible. Solve loop articles describe one exact issue.

Paul Hanson Mary Paez, if you have that list of articles and you have it listed in multiple KAs and one (or more) of the articles changes its title, how do you keep that list of articles updated to reflect that title change? Also, how do you track which articles have that list of articles so that you can verify all of the articles with the list are updated?

This is the total essence of what I have been trying to solve with KM and Cherwell (and hopefully RoboHelp) so that things like a list of relevant articles could be set up as a snippet and added to the articles that are related to each other.

What I fully expect to be the answer is to manually maintain that list of articles and to manually track which articles have that list of articles. There is unlikely to be any sort of automated solution within Cherwell... which is why I want to use RoboHelp!
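
The "which articles have that list?" question is the "Where Used" capability, and it is straightforward to sketch once references are explicit. The sketch below assumes an invented `{{name}}` placeholder syntax and hypothetical article IDs; it is not a description of anything Cherwell or RoboHelp provides.

```python
import re
from collections import defaultdict

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def where_used(articles):
    """Map each snippet name to the sorted list of articles that reference it."""
    index = defaultdict(set)
    for article_name, text in articles.items():
        for snippet_name in PLACEHOLDER.findall(text):
            index[snippet_name].add(article_name)
    return {snippet: sorted(names) for snippet, names in index.items()}


articles = {
    "KA-001": "Fix issue one. {{related_articles}}",
    "KA-002": "Fix issue two. {{related_articles}} {{help_desk}}",
    "KA-003": "Fix issue three.",
}
print(where_used(articles))
```

With an index like this, a title change in the list of related articles is one edit to the snippet, and the index says exactly which Knowledge Articles to verify afterward.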

A Semi-Regular Status of the Knowledge Management Project at Work

I'm not going to use this page for weekly updates related to the Knowledge Management project.

I promise.

2018-06-06 Status Meeting

My co-worker and I have made progress with Knowledge Management. When we met with [the project's sponsor] today, we looked at how Knowledge Management will look to the customer, which is something that she had asked us to show her. Thus, we showed her two options:
  1. store knowledge within Cherwell - translation: a non-RoboHelp solution
  2. store knowledge outside Cherwell - translation: a RoboHelp solution
I am hopeful that the RoboHelp solution will be selected.

Reason for the Opinion

When I get all excited about writing once and reusing many times, it is because of things like this:

Why 681 styles - is each unique? Why 207 list templates - is each unique? Why 588 Inline Shapes - is each unique?

I can't believe the answers to those simple questions are all "yes, they are unique." So, that means duplication. That means extra and redundant work. That means taking more time to do tasks inefficiently. That means waste.

That means I'm going to argue against copying / pasting text, graphics - ANYTHING - within a single document. If it has to be copied / pasted, it should be referenced.
I've often considered what I work on, when writing documentation, to be a type of programming. Would anyone that knows anything about programming believe that a programmer that copies and pastes their code in multiple places is working efficiently?

I would answer no.

Tool to Analyze Text for Possible Snippets

I'm on a quest for automation. I want to be able to take Knowledge Articles that are submitted for publication in MS Word and to run a utility to automatically determine what text should be a snippet. My query through Google led me to https://www.online-utility.org/text/analyzer.jsp, but I'll talk about that later.

After I found that website, I posted this to the Techwr-L list:

Hi,

I am looking at 8 different Word documents. The end game for these documents is to import them into my HAT (RoboHelp 2015) and maintain them in HTML. No problem - I know how to do all that.

What I want to pick your brains about is how to determine the frequency of the duplicated text. I know there is duplicate text across the documents because I took the 8 Word documents, inserted each into a single Word document, stripped out the graphics, and sorted the paragraphs.

I ended up with 280 sentences.

Sure, I can visually scan the list and find a sentence like this - "Create and confirm a 4-digit Citrix PIN." - and see that it exists twice. I know I could paste the list of 280 sentences into Excel and remove the rows that are duplicated - that's NOT what I'm looking for.

Instead, I'm looking for something close to this site: https://www.online-utility.org/text/analyzer.jsp, BUT I want to know how many times a sentence exists. For example, I pasted in the 280 sentences and the site came back with this information:

Some top phrases containing 8 words (without punctuation marks) | Occurrences
configure secure hub configure secure hub configure secure | 4

However, that text is the following text:

Configure Secure Hub
Configure Secure Hub
Configure Secure Hub
Configure Secure Hub
Configure Secure Hub
Configure Secure Hub

So what I want to do is paste in the 280 sentences and get a report that "Configure Secure Hub" exists in the list of 280 "6" times.

Have you found an easy way to do this?

The next step, after I figure out how to get the list of duplicated text, is to generate .hts files (snippet files that RoboHelp recognizes) so that I can analyze the text outside of RoboHelp, create the .hts files, import the snippets into RoboHelp and then run find and replace actions to replace "Configure Secure Hub" with the reference to the snippet that will store the "Configure Secure Hub" text. I know how to create the snippet file, using a DOS command to "Copy [template.hts file] [name of snippet file]" but have yet to figure out how to get the actual text I want to store in the snippet INTO the snippet without manually pasting the text - Configure Secure Hub - into the snippet... but that's after I figure out how to analyze the text automatically to know that "Configure Secure Hub" is repeated 6 times in the 280 sentences.

Jack DeLand, a Madcap Flare user, responded first:

From: Jack DeLand [mailto:jackdeland@adamcharlesconsulting.com]
Sent: Thursday, April 12, 2018 5:50 PM
To: Me
Subject: Re: Tool to Analyze Text for Possible Snippets
Meh. Switch to Flare and Analyzer.

I have known Jack in a virtual sense for probably 15 years. I smiled when I saw his response.

But it didn't provide a solution.

Peter Neilson responded also:

From: techwr-l-bounces+twer_lists_all=hotmail.com@lists.techwr-l.com [mailto:techwr-l-bounces+twer_lists_all=hotmail.com@lists.techwr-l.com] On Behalf Of Peter Neilson
Sent: Thursday, April 12, 2018 4:54 PM
To: techwr-l@lists.techwr-l.com
Subject: Re: Tool to Analyze Text for Possible Snippets

Jobs like this are often easily handled by the software tools within Unix or Linux, or by clever use of emacs macros. For your purposes, though, the time involved in learning sed, grep, awk, and such tools, or (even worse) the time to become a good emacs hacker, would be a roadblock. I might suggest that you find a friendly local hacker (the original white-hat meaning) who knows how to use those tools.

Your hacker will probably say, "Export everything to .txt files and I'll work on them."

Slightly more helpful than Jack's response, but not a solution.

Finally, I get to the response from the https://www.online-utility.org/text/analyzer.jsp site, which was about as helpful as Jack's response:

From: Mladen Adamović mladen.adamovic@gmail.com
Sent: Friday, April 13, 2018 5:58 AM
To: Me
Re: online utility comment
I don't know - perhaps manually tweek the text in the notepad. All best,
Mladen
On Thu, Apr 12, 2018 at 9:21 PM, I wrote:
Your tool is SOOOO close to what I’m looking for…
I have 280 sentences and I want to know how many of the sentences are duplicated. I pasted the 280 sentences into the text box and, in the results, I see this:
Actually, though, that text is this text in the list of 280 sentences:
What I want to see is the analysis be split by the paragraph mark (I pasted the above text from a MS Word doc that I copied to Notepad before pasting in the website)
Phrase | Occurrences
Configure Secure Hub | 6
Is there something I can tweak in the settings to do what I want?

The answer was "No."

Thankfully, Paul Beverley on the Word-PC list wrote this macro for me, which does the trick…

Sub DuplicateSentenceCount()
' Version 13.04.18
' Counts frequency of any duplicate sentences

myTab = " . . . "
numSents = ActiveDocument.Sentences.Count
dupSents = ""
For i = 1 To numSents
  testSent = Trim(ActiveDocument.Sentences(i).Text)
  testSent = Replace(testSent, vbCr, "")
  If InStr(dupSents, testSent) = 0 And Len(testSent) > 10 Then
    myCount = 1
    For j = i + 1 To numSents
      compSent = Trim(ActiveDocument.Sentences(j).Text)
      compSent = Replace(compSent, vbCr, "")
      If compSent = testSent Then
        myCount = myCount + 1
      End If
    Next j
    If myCount > 1 Then
      StatusBar = testSent
      sentPlusCount = testSent & myTab & Trim(Str(myCount)) & vbCr
      dupSents = dupSents + sentPlusCount
    End If
  End If
Next i
Selection.EndKey Unit:=wdStory
Selection.TypeText Text:=vbCr & dupSents
End Sub
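
For anyone working from an exported .txt file rather than inside Word, the same idea can be sketched in Python. This is not Paul's macro translated line for line: it is a minimal illustration that splits on line breaks (the "paragraph mark" split I asked the online tool for) instead of using Word's sentence detection. The `duplicate_lines` name and the `min_length` parameter are my own assumptions; the default of 11 mirrors the macro's "longer than 10 characters" test.

```python
from collections import Counter


def duplicate_lines(text, min_length=11):
    """Return (line, count) pairs for lines that occur more than once."""
    # Strip surrounding whitespace so trailing spaces do not hide a match.
    lines = [line.strip() for line in text.splitlines()]
    # Ignore short lines, as the macro does with Len(testSent) > 10.
    counts = Counter(line for line in lines if len(line) >= min_length)
    return [(line, n) for line, n in counts.most_common() if n > 1]
```

With the 280 sentences saved one per line in a file (say, a hypothetical `sentences.txt`), `duplicate_lines(open("sentences.txt", encoding="utf-8").read())` would return a list of pairs such as `("Configure Secure Hub", 6)`, most frequent first. Matching is exact, so two lines that differ by a single character count as different lines.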

On this page - http://intentionaldesign.ca/2013/07/29/to-dita-or-not-to-dita-thats-a-good-question-part-2 - there is a cool explanation of "Re-use vs repurposing" in technical writing.

Tested a Theory

At work, there's a website that has ~3000 links to PDF files. The PDF files are generated from MS Word files. The team that manages those files told me that there is not a lot of duplication in the content, but I wondered about the graphics. The team's standard for screenshots is black numbers with red circles around them - a SnagIt default I've used in the past. I took a collection of 70 graphics and looked at each one. I then renamed the graphics in such a way that if Graphic B showed the same thing as Graphic A, then Graphic B would be renamed to Graphic A (2).png (for this specific test, I was using .png files). The screenshot below shows my results. Out of 70 PNG files, 28 had at least one duplicate. I made the professional judgment call to count graphics that differed only in resolution or in what was cropped as duplicates. It doesn't take a scientific calculator to divide 28 by 70 and get 40% as an answer.
The way I see it, it's not the best choice to accept what others tell you.
You should always validate information given to you.
If I had accepted what that team had said, I would be accepting an inaccurate analysis.
What makes me smirk is that the team responsible for the content doesn't know.
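
A first pass at this kind of check could be automated. The Python sketch below is an assumption-laden illustration, not what I actually did: it only groups files whose bytes are identical, so it would miss the screenshots I judged to be duplicates despite different resolution or cropping, and those would still need a human eye. The function names and the `*.png` default are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def exact_duplicate_groups(folder, pattern="*.png"):
    """Group file names whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob(pattern)):
        # Identical bytes give an identical SHA-256 digest.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)
    # Keep only digests shared by two or more files.
    return [names for names in groups.values() if len(names) > 1]


def duplicate_share(files_with_duplicates, total_files):
    """Percentage of files that have at least one duplicate."""
    return 100 * files_with_duplicates / total_files
```

Here `duplicate_share(28, 70)` gives the same 40% as the manual count above, and `exact_duplicate_groups` would give a lower bound on duplication before anyone opens a single graphic.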

Editor's Note: The following links need to be incorporated into this post:
  1. 900 Screenshots - http://prhmusic.blogspot.com/2017/02/900-screenshots.html
  2. Introducing Geek Dad - http://prhmusic.blogspot.com/2017/01/introducing-geek-dad.html
  3. Automatically Generated Garbage - http://prhmusic.blogspot.com/2015/09/automatically-generated-garbage.html
  4. Reason for the Opinion - http://prhmusic.blogspot.com/2015/03/reason-for-opinion.html
  5. There's Not a Lot to Say - http://prhmusic.blogspot.com/2013/01/theres-not-lot-to-say.html
  6. Lars - http://prhmusic.blogspot.com/2017/10/lars.html
  7. Posts labeled OOD (Object-Oriented Documentation) - https://prhmusic.blogspot.com/search/label/OOD%20%28Object-Oriented%20Documentation%29

1 comment:

Paul Beverley said...

I wrote a set of macros that I thought fiction editors might find useful. One of them is CatchPhrase, which searches your novel for over-used phrases and counts how many times each phrase occurs. Is that the sort of thing you want? If so, it’s one of the 600+ macros in my free book: http://www.archivepub.co.uk/TheBook

Make the Ammends

SEBASTIAN BACH Says He Was Urged By ROBERT TRUJILLO To Reconnect With His Former Bandmates In SKID ROW June 20, 2018