CRUD the Docs: Update: Using Node.js for Text Processing

Monday, August 14, 2017

Last month, I gave a lightning talk, "Using Node.js for Text Processing," at the Monthly Front End PDX Meetup. In this month's post, I'd like to share my slides and an updated code sample.

For the most part, the presentation didn't change much. What did change are a few of the methods I recently started using, which make the code more concise.

Requirements

The required modules section of the script stays the same:

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');
...

Looping through the documents

To tighten things up a little, I no longer declare a variable to hold the list of HTML documents. Instead, I chain the result of the shelljs ls method directly into map, which lets the script loop over every item in the documents directory that matches the *.html pattern:

...
shell.ls('documents/*.html').map(function(file) {

...
});
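
If you would rather not pull in shelljs just for this step, roughly the same list can be built with Node's built-in fs and path modules. This is only a sketch of an alternative, not what the script in this post uses, and the listHtmlFiles helper name is made up for illustration:

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: return the paths of all entries directly
// inside dir whose names end in .html.
function listHtmlFiles(dir) {
  return fs.readdirSync(dir)
    .filter(function(name) {
      return path.extname(name) === '.html';
    })
    .map(function(name) {
      return path.join(dir, name);
    });
}
```

Calling listHtmlFiles('documents') would then hand back an ordinary array that can be looped over the same way as the shell.ls result.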

Convert documents to a string

Then, read each document into a string and load it into cheerio, which gives us a jQuery-like $ function for querying the markup:

...
var $ = cheerio.load(fs.readFileSync(file).toString());
...
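
A side note on the read step: fs.readFileSync returns a Buffer when no encoding is given, which is why the snippet calls toString() on the result. Passing 'utf8' as the second argument returns a string directly. Here is a small sketch of the two forms, using a throwaway file in the system temp directory (the file name is arbitrary):

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Write a throwaway file so the example has something to read.
var sample = path.join(os.tmpdir(), 'read-demo.html');
fs.writeFileSync(sample, '<p>hello</p>');

// No encoding: a Buffer comes back, so convert it (toString defaults to UTF-8).
var viaBuffer = fs.readFileSync(sample).toString();

// With an encoding: a string comes back directly.
var viaEncoding = fs.readFileSync(sample, 'utf8');

console.log(viaBuffer === viaEncoding); // true
```

Either form works with cheerio.load; the second just saves a method call.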

Process content

Finally, an if statement checks whether the element we are looking for exists. If it does, we remove it and write the file back to disk:

...
if ($('div.footer').length > 0) {
  $('div.footer').remove();
  fs.writeFileSync(file, $.html());
}
...

The complete script

Here is the complete revised script:

var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');

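// Loop over every HTML file in the documents directory.
shell.ls('documents/*.html').map(function(file) {
  // Read the file and load it into cheerio for jQuery-like queries.
  var $ = cheerio.load(fs.readFileSync(file).toString());
  // Rewrite the file only if it actually contains a footer.
  if ($('div.footer').length > 0) {
    $('div.footer').remove();
    fs.writeFileSync(file, $.html());
  }
});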

Slide deck

Here are the slides I presented: https://docs.google.com/presentation/d/1R0GALRoOzNgTz0gcHIzpf0YQVhM6pLhA22ybM1x6YiQ/edit?usp=sharing
