Last month, I gave a lightning talk on "Using Node.js for Text Processing" at the Monthly Front End PDX Meetup and I'd like to share my slides and updated code sample in this month's post.
The presentation itself is largely unchanged. What did change are a few of the methods I recently started using, which make my code more efficient.
Requirements
The required modules section of the script stays the same:
var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');
...
Looping through the documents
For a little more efficiency, I no longer declare a variable to hold the list of HTML documents. Instead, I chain the result of the shelljs ls method directly into a map function, which lets the script loop through each item it finds in the documents directory that matches the HTML file format:
...
shell.ls('documents/*.html').map(function(file) {
  ...
});
Convert documents to a string
Then, read each document into a string and load it into an object with jQuery-like features (thanks to the cheerio module):
...
var $ = cheerio.load(fs.readFileSync(file).toString());
...
Process content
Finally, an if statement checks for the element we are looking for, removes it, and saves the file:
...
if ($('div.footer').length > 0) {
  $('div.footer').remove();
  fs.writeFileSync(file, $.html());
}
...
The complete script
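Here is the complete revised script:
var fs = require('fs');
var cheerio = require('cheerio');
var shell = require('shelljs');
// Loop through every HTML file in the documents directory
shell.ls('documents/*.html').map(function(file) {
  // Load the file contents into a jQuery-like object
  var $ = cheerio.load(fs.readFileSync(file).toString());
  // Remove the footer, if present, and save the file in place
  if ($('div.footer').length > 0) {
    $('div.footer').remove();
    fs.writeFileSync(file, $.html());
  }
});
Slide deck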
Here are the slides I presented: https://docs.google.com/presentation/d/1R0GALRoOzNgTz0gcHIzpf0YQVhM6pLhA22ybM1x6YiQ/edit?usp=sharing
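One more variation on the script, for cases where shelljs is not installed: the same loop over the HTML files can be approximated with Node's built-in fs and path modules. This is only a minimal sketch, not the version from the talk. The names processHtmlFiles and transform are hypothetical, and it assumes the HTML files sit directly inside the directory (no subfolders).

```javascript
// Sketch only: processHtmlFiles and transform are hypothetical names,
// not part of the original script.
var fs = require('fs');
var path = require('path');

// Apply `transform` to every .html file directly inside `dir`.
// `transform` receives the file contents as a string and returns the new
// contents, or null to leave the file untouched.
// Returns the list of file paths that were rewritten.
function processHtmlFiles(dir, transform) {
  var changed = [];
  fs.readdirSync(dir)
    .filter(function(name) {
      // Keep only regular files whose extension is .html
      var file = path.join(dir, name);
      return path.extname(name) === '.html' && fs.statSync(file).isFile();
    })
    .forEach(function(name) {
      var file = path.join(dir, name);
      var original = fs.readFileSync(file, 'utf8');
      var updated = transform(original);
      // Only write when the transform actually changed something
      if (updated !== null && updated !== original) {
        fs.writeFileSync(file, updated);
        changed.push(file);
      }
    });
  return changed;
}
```

With cheerio, the transform passed in could load the string, remove div.footer, and return $.html(), or return null when no footer is found. Keeping the file loop separate from the HTML edit also makes the loop easy to try out on a scratch directory first.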
