Showing posts with label javascript. Show all posts

Tuesday, May 14, 2019

Using Nightmare.js to Generate a Sitemap From Confluence

Introduction

In a recent project, I was tasked with generating a list of documents in a particular Confluence space. I chose to explore my options using Nightmare.js. This Node.js (version 10.11.0) module allowed me to programmatically enter my credentials into Confluence, navigate to a specific document, and gather a list of documents (thanks to the target document using the Children Display macro, which listed all the documents under the parent page of the target space). I also wanted this script to take arguments (flags) such as the username, password, spacekey, output file, and a delay value so that the process can be automated.

Required skills and npm packages

This tutorial requires a number of skills and/or npm modules to complete everything mentioned herein:
  • Confluence (5.x): be comfortable with creating pages that use the Children Display macro
  • Nightmare (3.0.1): some familiarity with the basics of this module
  • Commander (2.19.0): some familiarity with the basics of this module
  • Cheerio (1.0.0-rc.2): some familiarity with the basics of this module
  • CSS: basic knowledge of how to select elements
  • JavaScript: fair knowledge of how to use JavaScript

Setting up requirements

First, we start by requiring a number of modules:

const Nightmare = require("nightmare");
const cheerio = require('cheerio');
const program = require('commander');
const fs = require('fs');

....

Set up nightmare and flag options

The next two statements set up nightmare to display its progress as it goes through the steps we'll program it to navigate, and define a selector to find the content we're looking for in our target document. The confluenceSelector is the CSS selector that will be used to find the desired content in the main body of the Confluence document.

....
const nightmare = Nightmare({
    show: true
});
const confluenceSelector = '#main-content';

....

Note: if you don't want to see an Electron window pop up while nightmare does its work, set show to false.

Next, we set up the flags and their usage using commander's features:

...
program
  .version('0.0.1')
  .usage('-u <username> -p <password> -s <spacekey> -f <output.txt> -d <milliseconds>')
  .option('-u, --user', '*required* Username id')
  .option('-p, --password', '*required* User\'s password')
  .option('-s, --spacekey', '*required* Spacekey for the Confluence space')
  .option('-f, --file', 'Text file to be used for tracking Confluence document names. Can be set to either true (defaults to the spacekey naming scheme) or a file name.')
  .option('-d, --delay', 'Delay (in milliseconds) to wait for server response')
  .parse(process.argv);

...

With the flags set, we now need to parse them into an object that we'll use throughout the rest of the script. We loop through the program.rawArgs value provided by the commander module. In this loop, we look for specific flags and store the value that follows each one (the next item in the array) under a matching property.

...
var arguments = {}; // holds the parsed flag values used throughout the script

for (var i = 0; i < program.rawArgs.length; i++) {
  if (program.rawArgs[i] == '--user' || program.rawArgs[i] == '-u') {
    arguments.user = program.rawArgs[i + 1];
  }
  if (program.rawArgs[i] == '--password' || program.rawArgs[i] == '-p') {
    arguments.pass = program.rawArgs[i + 1];
  }
  if (program.rawArgs[i] == '--spacekey' || program.rawArgs[i] == '-s') {
    arguments.spacekey = program.rawArgs[i + 1];
  }
  if (program.rawArgs[i] == '--delay' || program.rawArgs[i] == '-d') {
    arguments.delay = parseInt(program.rawArgs[i + 1], 10); // always parse as base 10
  }
  if (program.rawArgs[i] == '--file' || program.rawArgs[i] == '-f') {
    arguments.file = program.rawArgs[i + 1];
  }
}

...
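
The same loop can also be written as a lookup table, which avoids repeating one if block per flag. This is only a sketch: FLAG_KEYS and parseFlags are hypothetical names that are not part of the original script, and the command line in the example is made up. Like the loop above, it simply takes whatever follows a flag as that flag's value.

```javascript
// Hypothetical lookup table (not in the original script): maps every
// flag alias to the property it should populate.
const FLAG_KEYS = new Map([
  ['-u', 'user'], ['--user', 'user'],
  ['-p', 'pass'], ['--password', 'pass'],
  ['-s', 'spacekey'], ['--spacekey', 'spacekey'],
  ['-d', 'delay'], ['--delay', 'delay'],
  ['-f', 'file'], ['--file', 'file']
]);

// Hypothetical helper: walks the raw argument list and collects flag values.
function parseFlags(rawArgs) {
  const parsed = {};
  for (let i = 0; i < rawArgs.length; i++) {
    const key = FLAG_KEYS.get(rawArgs[i]);
    if (key === undefined) continue; // not a flag we know about
    const value = rawArgs[i + 1];
    // The delay is the only numeric flag; everything else stays a string.
    parsed[key] = key === 'delay' ? parseInt(value, 10) : value;
  }
  return parsed;
}

// Example with a made-up command line.
const flags = parseFlags(['node', 'confluenceSitemap.js', '-u', 'jdoe', '-s', 'DOCS', '--delay', '5000']);
console.log(flags);
```

Adding a new flag then means adding entries to the table rather than another if block.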


Since the delay flag is optional, we should set up a fallback in case the user doesn't supply one. Here, we set the delay to 10 seconds, though you can adjust this value to however long you expect your Confluence server to take to respond to a login request.

...
if (!arguments.delay) {
  arguments.delay = 10000;
  console.log('Server response delay not set. Assuming ' + arguments.delay + ' millisecond delay.');
}

...
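
If you want the fallback to also cover non-numeric input such as -d soon, a small helper can make the rules explicit. This is a sketch under assumptions: resolveDelay and DEFAULT_DELAY_MS are hypothetical names, and unlike the check above it accepts a delay of 0 instead of treating it as unset.

```javascript
const DEFAULT_DELAY_MS = 10000; // the same 10-second fallback used above

// Hypothetical helper: returns a usable delay in milliseconds.
function resolveDelay(rawValue, fallback = DEFAULT_DELAY_MS) {
  const parsed = parseInt(rawValue, 10);
  // parseInt yields NaN for missing or non-numeric input, and a
  // negative wait time makes no sense, so both cases use the fallback.
  if (Number.isNaN(parsed) || parsed < 0) {
    return fallback;
  }
  return parsed;
}

console.log(resolveDelay('2500'));    // a valid value is kept
console.log(resolveDelay(undefined)); // flag not supplied
console.log(resolveDelay('soon'));    // not a number
```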

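Now we should set up the file path where we keep the site map information. If the user supplies a file name, the script writes to it. If the value is too short to be a file name (for example, true), the script falls back to a name based on the submitted spacekey, and if the flag is omitted entirely it uses a default file name.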

...
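var confluenceSiteMap;

if (arguments.file) {
  if (arguments.file.length > 5) {
    // Long enough to be a file name, so use it as given
    confluenceSiteMap = arguments.file;
  } else {
    // Too short to be a file name (for example, "true"): name the file after the spacekey
    confluenceSiteMap = arguments.spacekey + '-site_map.txt';
  }
} else {
  // No -f flag at all
  confluenceSiteMap = 'confluenceSiteMap.txt';
}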

...
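
The length check is a heuristic, so here is a sketch of a more explicit alternative that tests for the literal value true instead. It swaps in a different rule from the one the script uses. resolveOutputFile is a hypothetical helper, and the sketch assumes that omitting -f should produce confluenceSiteMap.txt.

```javascript
// Hypothetical helper: picks the output file name from the -f value.
function resolveOutputFile(file, spacekey) {
  if (file === undefined) {
    return 'confluenceSiteMap.txt'; // assumption: default when -f is omitted
  }
  if (file === 'true') {
    return spacekey + '-site_map.txt'; // spacekey naming scheme
  }
  return file; // anything else is treated as a file name
}

console.log(resolveOutputFile(undefined, 'DOCS'));
console.log(resolveOutputFile('true', 'DOCS'));
console.log(resolveOutputFile('links.txt', 'DOCS'));
```

With this rule, a short but valid name such as a.txt is kept rather than replaced by the spacekey-based name.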

The next thing our script will need is the Confluence URL to the site map document. Using the Children Display macro in your target Confluence space, we can gather all the document links in a single space by scraping this one document. Note: you should set up this Confluence document accordingly before executing this script and ensure it's named Site Map. Otherwise, you'll need to change the values in arguments.confluence.

...

if (arguments.spacekey) {
  arguments.confluence = <base Confluence URL> + '/display/' + arguments.spacekey + '/Site+Map';
}

...
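
To show how the pieces of that URL fit together, here is a minimal sketch that builds it from a base URL, a spacekey, and a page title, replacing spaces in the title with + as in Site+Map above. buildPageUrl is a hypothetical helper and https://confluence.example.com is a placeholder, not a real server. Titles containing other special characters may need different handling, which this sketch does not attempt.

```javascript
// Hypothetical helper: builds a /display/<spacekey>/<title> URL.
function buildPageUrl(baseUrl, spacekey, pageTitle) {
  const trimmedBase = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  // Encode each word on its own, then join the words with "+".
  const encodedTitle = pageTitle
    .split(' ')
    .map(word => encodeURIComponent(word))
    .join('+');
  return trimmedBase + '/display/' + encodeURIComponent(spacekey) + '/' + encodedTitle;
}

// Placeholder base URL, used for illustration only.
console.log(buildPageUrl('https://confluence.example.com/', 'DOCS', 'Site Map'));
```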

With the arguments parsed, we should check that the user supplied the required flags. If any of these flags weren't submitted, then the script should gracefully exit.

...
if (!arguments.user || !arguments.pass || !arguments.spacekey) {
  if (!arguments.user) { // user id is required
    console.log('Username is required.');
  }
  if (!arguments.pass) { // password is required
    console.log('Password is required.');
  }

  if (!arguments.spacekey) { // spacekey is required
    console.log('Spacekey is required.');
  }

  process.exit(1);

...

Pull content with nightmare

With the required flags set, we can now request a document from Confluence using your credentials. This chunk of code starts the nightmare.js process by navigating the Electron browser to the site map page in Confluence. The process below assumes that a login is required when the target page is loaded. It enters the user-supplied username and password in the appropriate fields (denoted by their element ids), clicks the login button (denoted by its element id), and waits for a period of time (hopefully long enough for the server to respond). It then grabs the content from the predetermined CSS selector via the evaluate method, returns the data for parsing later, and closes the Electron browser.

...
} else {
  console.log('Getting document link list from ' + arguments.confluence);
  nightmare
    .goto(arguments.confluence)
    .type('#os_username', arguments.user)
    .type('#os_password', arguments.pass)
    .click('#loginButton')
    .wait(arguments.delay)
    .evaluate(confluenceSelector => {
      return {
        html: document.querySelector(confluenceSelector).innerHTML
      }
    }, confluenceSelector)
    .end()

...

Parse content with Cheerio

Now that nightmare.js has retrieved the document in question, we use the then method to load the HTML content into cheerio.js to generate a list of links. Generally speaking, the links listed in a Confluence document follow the li span a selector pattern inside the body of the document. Here, we use the output variable to hold the list of links found in the retrieved data.

...
.then(obj => {
  const $ = cheerio.load(obj.html.toString()); // declared locally so $ doesn't become a global

  var output = '';

  $('li span a').each(function() {
    output += $(this).html() + '\n';
  });

...

Then, we write out the list of links we found in the Confluence document to our predetermined text file.

... 
  fs.writeFileSync(confluenceSiteMap, output, 'utf8');
})

...

Finally, we use the catch method to report back any errors.

...
  .catch(error => {
    console.error(error);
  });
}


Wrapping up

With the script complete, we should save it as something like confluenceSitemap.js. From there, we can execute this command to generate our text file of links: node confluenceSitemap.js -u <username> -p <password> -s <spacekey> -f <links.txt>

Thursday, December 14, 2017

Quick tip: Unwrap for Cheerio

Overview

I recently completed a request to parse, clean up, and remove select elements from hundreds of HTML documents. Not a major task considering I have a node.js script in my toolbox that can handle this. At least, that's what I thought I had. When it came to removing select elements while leaving their child elements and content intact, I found that there was a series of span elements wrapping around other elements. Apparently, the source documentation for these HTML files uses spans for all sorts of formatting features. Outside their original source, these span elements serve absolutely no purpose. So, I was tasked with removing them.

Like before, I figured I could use the cheerio module to use jQuery-like features (in particular unwrap to remove this unwanted element) and be done with it. Unfortunately, the cheerio 1.0.0 pre-release I was using at the time didn't have an unwrap method.

After some time researching this problem, I found the contents method in cheerio and figured I could use it to accomplish my task.

The code

Here is the entire code followed by the breakdown.

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');

shell.ls('documents/*.html').map(function(file) {
  console.log('Unwrapping elements from ' + file);
  // Declare $ locally so it doesn't leak into the global scope
  var $ = cheerio.load(fs.readFileSync(file).toString());
  $('span[class^="unnecessary"]').each(function(i, elem) {
    // Swap each matching span for its own child nodes
    var contents = $(this).contents();
    $(this).replaceWith(contents);
  });
  fs.writeFileSync(file, $.html());
});


Like most node.js scripts, we start off with requiring a few modules. Next we use shell's ls method to find all HTML documents in the documents directory and use the map method on this returned array to parse each found document.

Like before, we use cheerio to load each HTML document into the script with jQuery-like features.
Now, here is the tricky part: find the element you want to remove, grab the contents of said element, and replace the selected element with its contents. Essentially, this part of the script loads the child nodes of the selected element into a variable called contents and replaces the element with them.

From there, we write out the modified document and carry on with the next one in the returned array.

Tuesday, November 14, 2017

Build a user macro with parameters in Confluence

Overview

Continuing from last month's post entitled "Build an user macro in Confluence", this document will show you how to write a user macro that accepts input from the user. We will be creating a feature that allows for page redirects in Confluence. The macro will accept two inputs from the user: (1) the page title to redirect the current page to and (2) the delay before the current page is redirected (which defaults to 10 seconds). From there, the macro will display a basic redirect message, derived in part from the user's input, and load the target page after the delay has passed.

For this tutorial, you should have completed last month's tutorial and/or the Guide to User Macro Templates, and be comfortable with Velocity, HTML, and JavaScript.

Parameters

This user macro will take two user inputs: page title to redirect the current page to and a delay time. Let's take a look at these parameters and then break them down:

## @param URL:title=URL|type=string|required=true|desc=URL to redirect to.
## @param Time:title=Time|type=int|default=10|desc=Time to redirect (in seconds). Defaults to 10 seconds.


To declare a parameter, we use the ## @param <parameter name> definition. The parameter name can be anything as long as it contains no spaces and doesn't start with a number. It's a good habit to name parameters by their purpose.

Next, we have the title of the parameter. The title is displayed in the macro's properties when entering the value for the parameter. Here, you can use any combination of characters and spaces as you see fit. The title value should be human readable.

To help the Confluence user macro determine what kind of value it is using and to make the user macro more user friendly, we define the type of input we are providing to the user macro. For the URL parameter, we declare the type to be a basic string (which means the value can be anything). For the Time parameter, we set the type to int (an integer, so numbers only).

For the URL parameter, we set a property called required to true, which tells Confluence that this parameter is mandatory. Leaving this parameter unset results in an undefined value and thus disables the macro until it is defined.

For the Time parameter, we use the default property to provide the macro with a default value of 10 (seconds). This is a convenience for the user as they won't need to enter a value unless they want to use something other than the default value.

The last property in both of these parameters is the description (desc). This is the text that will be displayed in the user macro window when filling out the various parameters. You should strive to keep it short, simple, and direct to the point.
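
To make the layout of a declaration concrete (a name, a colon, then key=value pairs separated by |), here is a small JavaScript sketch that splits one of the lines above into its parts. parseParamLine is a hypothetical helper written for illustration only. It is not how Confluence itself reads the declaration, and it assumes that values contain no | characters.

```javascript
// Hypothetical parser, only to illustrate how a declaration is laid out.
function parseParamLine(line) {
  // "## @param", then the parameter name, then an optional ":" and the properties.
  const match = /^##\s*@param\s+([^:\s]+):?(.*)$/.exec(line.trim());
  if (!match) return null; // not a parameter declaration

  const param = { name: match[1] };
  match[2].split('|').forEach(pair => {
    const eq = pair.indexOf('=');
    if (eq === -1) return; // skip anything that is not key=value
    // Split on the first "=" only, so values may contain "=" themselves.
    param[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  });
  return param;
}

const timeParam = parseParamLine(
  '## @param Time:title=Time|type=int|default=10|desc=Time to redirect (in seconds). Defaults to 10 seconds.'
);
console.log(timeParam);
```

Every value comes back as a string here, including the default of 10.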

Macro Browser Information

As mentioned in the previous tutorial, you don't necessarily need to fill out every field in this section of the user macro, but you are required to at least fill out the Macro Name and Macro Title. The others are up to you to fill out as needed. For this user macro, I included the following values:
  • Macro Name: redirect
  • Visibility: Visible to all users in the Macro Browser
  • Macro Title: redirect
  • Description: Redirect current page to a new URL within a user specified time (seconds).
  • Categories: Navigation

Definition of User Macro

Since we won't be including any body information in this user macro, we'll set the Macro Body Processing to No macro body.

Template

For the template (code) of the user macro, we'll break it down into two sections: Setting Velocity variables and the rest of the code.

Setting Velocity variables

We set a Velocity variable like this:
#set($variable=value)

For our macro's delay, we need to use the following to set up the input variable as milliseconds:

#set($timeOut= $paramTime + "000")

Here, we are setting the variable $timeOut to the value of the Time parameter (which the user supplies in the macro or the macro provides via its default value) with "000" appended, which turns a value in seconds into milliseconds (10 becomes 10000).
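
For readers more at home in JavaScript than Velocity, the sketch below shows the same conversion two ways. It assumes the Velocity line behaves as plain text concatenation, as the "000" suggests. Both function names are hypothetical. The point is that appending zeros only works for whole numbers of seconds, which is what the int type on the Time parameter is there to ensure.

```javascript
// Mirrors the Velocity line: append "000" to the seconds value as text.
function secondsToMsByAppending(seconds) {
  return String(seconds) + '000';
}

// Numeric alternative: multiply instead of concatenating.
function secondsToMs(seconds) {
  return Number(seconds) * 1000;
}

console.log(secondsToMsByAppending(10));    // text "10000"
console.log(secondsToMs('10'));             // number 10000
console.log(secondsToMsByAppending('1.5')); // text "1.5000", which is not 1500
console.log(secondsToMs('1.5'));            // number 1500
```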

Basic redirect message

Next, we need to set up the basic message telling the reader that the current page will redirect to the target page.

<div id="redirectBox">This page will be redirected to <a href="$paramURL">$paramURL</a> in $paramTime seconds.</div>


Note the usage of the parameter variables and not the Velocity $timeOut variable so far. The code provides the wiki page with a few elements that display a basic redirect notice and give the user a link in case the redirect function doesn't work automatically or the user wants to forgo the delay and visit the target page sooner rather than later.

Redirect function

The last bit is the JavaScript function to redirect the page after the specified delay:

<script type="text/javascript">
  function Redirect() {
    window.location = "$paramURL";
  }

  // Pass the function itself rather than a string to be evaluated
  setTimeout(Redirect, $timeOut);
</script>


Ideally, this JavaScript function would be included in a library that your Confluence server makes available globally but since it's a small and, hopefully, rarely used macro, it wouldn't hurt the page load time too much if we just include this function on every instance of the macro.

Complete macro code

## @param URL:title=URL|type=string|required=true|desc=URL to redirect to.
## @param Time:title=Time|type=int|default=10|desc=Time to redirect (in seconds). Defaults to 10 seconds.
#set($timeOut= $paramTime + "000")
<div id="redirectBox">This page will be redirected to <a href="$paramURL">$paramURL</a> in $paramTime seconds.</div>

<script type="text/javascript">
  function Redirect() {
    window.location = "$paramURL";
  }

  // Pass the function itself rather than a string to be evaluated
  setTimeout(Redirect, $timeOut);
</script>




Monday, August 14, 2017

Update: Using Node.js for Text Processing

Last month, I gave a lightning talk on "Using Node.js for Text Processing" at the Monthly Front End PDX Meetup and I'd like to share my slides and updated code sample in this month's post.

For the most part, my presentation didn't change much. What did change are some of the methods I use, which made my code more efficient.

Requirements

The required modules section of the script stays the same:

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');
...


Looping through the documents

For a little more efficiency, I didn't declare a variable to hold the list of HTML documents. Instead, I chained the result of shell's ls method directly into a map call, which lets the script loop through each HTML file it finds in the documents directory:

...
shell.ls('documents/*.html').map(function(file) {

   ...
});

Convert documents to a string

Then, load the document to a string with jQuery-like features (thanks to the cheerio module):

...
var $ = cheerio.load(fs.readFileSync(file).toString());
...


Process content

Finally, we use an if statement to find what we are looking for, remove it, and save out the file:

...
if ($('div.footer').length > 0) {
  $('div.footer').remove();
  fs.writeFileSync(file,$.html());
}
...


The complete script

Here is the complete revised script:

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');

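shell.ls('documents/*.html').map(function(file) {
  // Load the document; $ is declared locally to avoid a global
  var $ = cheerio.load(fs.readFileSync(file).toString());
  if ($('div.footer').length > 0) {
    $('div.footer').remove();
    fs.writeFileSync(file, $.html()); // only rewrite files that changed
  }
});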


Wednesday, December 14, 2016

Using Node.js for Text Processing

Intro

As a tech writer who is responsible for writing and publishing documentation in various formats, I've found a need to combine my hobby of toying around with JavaScript and document publication. In particular, I'm tasked with pulling information from an Atlassian Confluence site down into a static HTML file set. However, the method I use (Export EclipseHelp with a custom template) doesn't reliably generate clean or consistent HTML documents. While the original intent of this tutorial was to update content extracted from Confluence, it can work on any HTML file.

I figure there are better ways of doing what I'm about to demonstrate, but my needs are rather particular (as in, this script needs to function as part of a bigger puzzle I employ for publication). If you have suggestions to improve it, I'd love to hear them.

This document doesn't cover how to export HTML from Confluence. What will be covered is a script I came up with that will complete a find and replace function on all HTML files in a particular directory.

Node.js requirements

This little script needs only three modules to read and write files, gather a file list, and use jQuery-like features:

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');

Documents, meet array

Using a shell module, I gather all the HTML files in a particular directory:

var fileNames = shell.ls('documents/*.html');

String it up

Read each document as a string if the document has the extension of .html:

for (var i in fileNames) {
  if (fileNames[i].indexOf(".html") > -1) {
    var $ = cheerio.load(fs.readFileSync(fileNames[i]).toString());
    ...
  }
}


While it may seem a bit redundant to loop through the array checking for the HTML file type, the array returned in fileNames can end with an empty element, which would cause our script to throw an error at the end.
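
As a point of comparison, the same "only .html files" filter can be sketched with Node's built-in fs and path modules instead of shelljs. This swaps in a different technique from the one used in this post. listHtmlFiles is a hypothetical helper, and the demo runs against a throwaway temporary directory so nothing real is touched.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: returns the .html files in a directory, sorted by name.
function listHtmlFiles(dir) {
  return fs.readdirSync(dir)
    .filter(name => path.extname(name).toLowerCase() === '.html')
    .sort()
    .map(name => path.join(dir, name));
}

// Demo against a temporary directory created just for this example.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'html-demo-'));
fs.writeFileSync(path.join(demoDir, 'a.html'), '<p>a</p>');
fs.writeFileSync(path.join(demoDir, 'notes.txt'), 'not html');
fs.writeFileSync(path.join(demoDir, 'b.html'), '<p>b</p>');

const htmlFiles = listHtmlFiles(demoDir);
console.log(htmlFiles.map(file => path.basename(file))); // a.html and b.html only
```

Because the filter runs before the loop, the loop body no longer needs its own extension check.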

Here we use the cheerio module to add jQuery-like features to our script so we can do things like select elements and modify them in a number of ways.

Process the string

Check whether a div element with the class footer exists. If it does, remove it.

if ($('div.footer').length > 0) {
  console.log("Removing footer from ../" + fileNames[i]);
  $('div.footer').remove();
} else {
  console.log(fileNames[i] + " has no div.footer element.");
}


At this step in the script, we can have the currently loaded HTML document processed in a multitude of ways (e.g., updating elements in the header, injecting the Bootstrap grid system, swapping image locations, adding date stamps, and so on).

Update and save

Update the string (document) and save it out.

var removed = $.html();
fs.writeFileSync(fileNames[i],removed);

Full code:

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');
var fileNames = shell.ls('documents/*.html');

for (var i in fileNames) {
  if (fileNames[i].indexOf(".html") > -1) {
    // Load the document; $ is declared with var to avoid an implicit global
    var $ = cheerio.load(fs.readFileSync(fileNames[i]).toString());
    if ($('div.footer').length > 0) {
      console.log("Removing footer from ../" + fileNames[i]);
      $('div.footer').remove();
    } else {
      console.log(fileNames[i] + " has no div.footer element.");
    }

    var removed = $.html();
    fs.writeFileSync(fileNames[i], removed); // save out HTML file
  }
}