Wednesday, December 14, 2016

Using Node.js for Text Processing

Intro

As a tech writer who is responsible for writing and publishing documentation in various formats, I've found reason to combine my hobby of toying around with JavaScript with document publication. In particular, I'm tasked with pulling information from an Atlassian Confluence site down into a static set of HTML files. However, the method I use (Export EclipseHelp with a custom template) doesn't reliably generate clean or consistent HTML documents. While I originally wrote this script to update content exported from Confluence, it works on any HTML file.

I suspect there are better ways of doing what I'm about to demonstrate, but my needs are rather particular (this script has to function as one piece of a bigger puzzle I use for publication). If you have suggestions to improve it, I'd love to hear them.

This post doesn't cover how to export HTML from Confluence. Instead, it walks through a script I came up with that performs a find-and-replace operation on every HTML file in a particular directory.

Node.js requirements

This little script needs only three modules: fs (built into Node.js) to read and write files, shelljs to gather a file list, and cheerio for jQuery-like features:

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');

Documents, meet array

Using the shelljs module, I gather all the HTML files in a particular directory into an array:

var fileNames = shell.ls('documents/*.html');
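If you'd rather not depend on shelljs for this step, here's a minimal sketch that uses only Node's built-in fs and path modules. It's an alternative, not what the script uses: listHtmlFiles is a hypothetical helper name, and it assumes a flat directory (subfolders aren't searched).

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: return the paths of the .html files directly inside dir.
// Only the top level is read; subdirectories are not searched.
function listHtmlFiles(dir) {
  return fs.readdirSync(dir)
    .map(function (name) {
      return path.join(dir, name);
    })
    .filter(function (filePath) {
      // Keep regular files whose extension is exactly ".html"
      return path.extname(filePath) === '.html' && fs.statSync(filePath).isFile();
    });
}
```

Calling var fileNames = listHtmlFiles('documents'); returns an array holding only matching paths, so there's no trailing empty element to guard against.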

String it up

Read each document as a string if its filename contains .html:

for (var i in fileNames) {
  if (fileNames[i].indexOf(".html") > -1) {
    var $ = cheerio.load(fs.readFileSync(fileNames[i]).toString());
    // ... process the document here
  }
}


While it may seem redundant to check the file type again, the array returned in fileNames can end with an empty element, which would cause the script to throw an error at the end.

Here, cheerio loads each document so we can use jQuery-like features to select elements and modify them in a number of ways.
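A for...of loop visits only an array's elements, whereas for...in walks every enumerable property key, including any non-index properties attached to the array. Here's a hedged alternative to the loop above; readHtmlDocuments is a hypothetical helper that reads each file as UTF-8 text and returns plain strings, so you'd still pass each one to cheerio.load() yourself.

```javascript
var fs = require('fs');

// Hypothetical helper: read every .html path in fileNames as a UTF-8 string.
// for...of visits only array elements (never extra properties on the array),
// and the type/extension check skips empty or non-HTML entries.
function readHtmlDocuments(fileNames) {
  var docs = [];
  for (var name of fileNames) {
    if (typeof name === 'string' && name.endsWith('.html')) {
      docs.push({ name: name, html: fs.readFileSync(name, 'utf8') });
    }
  }
  return docs;
}
```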

Process the string

Check whether a div element with the class footer exists. If it does, remove it.

if ($('div.footer').length > 0) {
  console.log("Removing footer from ../" + fileNames[i]);
  $('div.footer').remove();
} else {
  console.log(fileNames[i] + " has no div.footer element.");
}


At this step in the script, we can process the currently loaded HTML document in a multitude of ways (e.g., updating elements in the header, injecting the Bootstrap grid system, swapping image locations, adding date stamps, and so on).
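One way to keep several of these steps manageable is to write each one as a small function that takes and returns the HTML string, then run them in order. This is a hypothetical design, not part of the script here: removeEmptyParagraphs, addDateStamp, and processDocument are illustrative names, and they use plain string methods to stay dependency-free. In practice you'd likely perform each step with cheerio's $ selectors instead.

```javascript
// Hypothetical step: drop literal empty paragraphs (<p></p>, optionally with whitespace).
function removeEmptyParagraphs(html) {
  return html.replace(/<p>\s*<\/p>/g, '');
}

// Hypothetical step factory: insert an HTML comment with the date (YYYY-MM-DD, UTC)
// just before </body>, or append it if the document has no </body>.
function addDateStamp(date) {
  var stamp = '<!-- Processed: ' + date.toISOString().slice(0, 10) + ' -->';
  return function (html) {
    var idx = html.lastIndexOf('</body>');
    return idx === -1 ? html + stamp : html.slice(0, idx) + stamp + html.slice(idx);
  };
}

// Run each step in order, feeding the output of one into the next.
function processDocument(html, steps) {
  return steps.reduce(function (current, step) {
    return step(current);
  }, html);
}
```

For example, processDocument(html, [removeEmptyParagraphs, addDateStamp(new Date())]) applies both steps; adding another step means adding one function to the array.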

Update and save

Convert the modified document back to a string with $.html() and save it over the original file.

var removed = $.html();
fs.writeFileSync(fileNames[i], removed);
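Writing back to fileNames[i] overwrites the original file. If you'd rather keep the exported originals untouched while testing, a hedged alternative is to write each result into a separate folder. saveToOutputDir and the output folder name are hypothetical, and fs.mkdirSync with { recursive: true } requires Node.js 10.12 or later.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: write html into outDir using the source file's base name,
// creating outDir first if needed. Returns the path that was written.
function saveToOutputDir(sourcePath, html, outDir) {
  fs.mkdirSync(outDir, { recursive: true }); // no error if outDir already exists
  var target = path.join(outDir, path.basename(sourcePath));
  fs.writeFileSync(target, html);
  return target;
}
```

Inside the loop, that would be saveToOutputDir(fileNames[i], $.html(), 'output');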

Full code:

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');
var fileNames = shell.ls('documents/*.html');

for (var i in fileNames) {
  if (fileNames[i].indexOf(".html") > -1) {
    var $ = cheerio.load(fs.readFileSync(fileNames[i]).toString());
    if ($('div.footer').length > 0) {
      console.log("Removing footer from ../" + fileNames[i]);
      $('div.footer').remove();
    } else {
      console.log(fileNames[i] + " has no div.footer element.");
    }

    var removed = $.html();
    fs.writeFileSync(fileNames[i], removed); // save out HTML file
  }
}


Monday, November 14, 2016

Long Ticket Lists and Excel

Intro

When updating release notes that include long lists of tickets (in my case, hundreds of bug fixes), keeping the list up to date is difficult because numerous factors influence each ticket's timeline and release date. One trick I figured out is to use Excel to show me what has changed.

How this works

  1. Copy and paste the list of tickets from your ongoing or pre-release document into Excel (column A, for example). It's best to remove any formatting from the list to ensure the content is consistent.
  2. Go to your source of JIRA tickets, then collect and copy the current list of tickets.
  3. Back in Excel, paste the new list into column B of the same worksheet. Again, make sure all formatting has been stripped from this list.
  4. Select both columns and choose Conditional Formatting > Highlight Cells Rules > Duplicate Values.
  5. Since the highlighted tickets are good to go, I like to format them with green fill and dark green text.
After you apply this rule, any ticket in column B that isn't highlighted should stand out pretty well. These unhighlighted tickets are the new tickets you need to add to your pre-release document. Any ticket that isn't highlighted in column A may have been removed.
Either way, review each unhighlighted ticket in JIRA to confirm whether it was actually added or removed.
Tip: You can quickly tell whether tickets have been added and/or removed by selecting all tickets in each column and comparing the cell counts. If the counts differ, tickets have been added and/or removed. This check isn't perfect, because one ticket can be removed and another added, keeping the count the same. The highlighting, however, still picks up these changes.
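Under the hood, the highlighting amounts to a set comparison: tickets only in column B are new, and tickets only in column A may have been removed. For anyone who prefers to script that check, here's a rough JavaScript sketch. compareTicketLists is a hypothetical name, and trimming whitespace and upper-casing each key is an assumption about what "stripping formatting" needs to cover for plain ticket keys such as ABC-123.

```javascript
// Hypothetical helper: compare an old ticket list (column A) with a new one (column B).
// Each key is trimmed and upper-cased so "abc-1 " and "ABC-1" count as the same ticket.
function compareTicketLists(oldList, newList) {
  var normalize = function (t) { return String(t).trim().toUpperCase(); };
  var oldSet = new Set(oldList.map(normalize).filter(Boolean));
  var newSet = new Set(newList.map(normalize).filter(Boolean));

  return {
    // In column B but not column A: tickets to add to the release note
    added: Array.from(newSet).filter(function (t) { return !oldSet.has(t); }),
    // In column A but not column B: tickets that may have been removed
    removed: Array.from(oldSet).filter(function (t) { return !newSet.has(t); })
  };
}

// Equal counts do not mean equal lists: here one ticket was swapped for another.
var result = compareTicketLists(['ABC-1', 'ABC-2', 'ABC-3'], ['ABC-1', 'abc-3 ', 'ABC-4']);
console.log(result.added);   // [ 'ABC-4' ]
console.log(result.removed); // [ 'ABC-2' ]
```

One difference from Excel's Duplicate Values rule, which also flags a ticket repeated within a single column: the Set-based version simply ignores repeats inside one list.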

Is this a perfect system? No. Does it do 80% of the work for me between ticket updates? Yes.
Can this be done using Google Sheets? Yes, but it isn't as easy as it is in Excel, nor is it as reliable.
I would appreciate any feedback and/or thoughts on how you manage ticket lists.