Introduction
I recently had to upload over a thousand HTML documents to Confluence. I won't go into the details of the scripts I wrote using various Node.js modules, but I do want to share how I kept track of which documents had and had not been published to Confluence.
Requirements
You should be comfortable with a terminal, with managing documents in Confluence, and with the Confluence CLI plugin.
Using the Confluence CLI
confluence --action getPageList --id "<parent page id>" --descendents > uploaded_docs.txt
Note: the confluence command itself needs to be set up as an alias in your Bash profile. The setup instructions for the Confluence CLI plugin explain how to do some of this. My Bash alias looks something like this:
alias confluence="<path to confluence script>/confluence.sh --server <base Confluence URL> --user <user> --password <password>"
With that alias set up, and with a little forward thinking about structuring the space under a single document, I saved myself some time by placing all the documents under one top-level parent page. (I wrote a script that handles that task as well, which I'll share another time.) Because everything sat under a single parent, one run of the command reported every document I needed to work with. Otherwise, I would have had to identify each parent document, run the command against each one, and tally up all the uploaded documents myself.
From here, with both lists in hand (the HTML documents I meant to upload and the pages already in Confluence), it was a simple matter of finding the differences. There are several ways to do this, but in the end I just used Excel's conditional formatting feature to highlight the duplicates. The entries that weren't highlighted were the ones that still needed to be uploaded.
Maybe in the future I'll write a script that does this automatically from the two lists and share that process as well.
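As a rough sketch of what such a script might look like, the standard comm utility can compare two sorted lists and print the lines found only in the first. The function name find_unpublished and the file name local_docs.txt are hypothetical, and the sketch assumes both files hold one document title per line, with titles matching exactly between the two lists. If your getPageList output contains anything besides page titles, such as a summary line, remove those lines first.

```shell
# find_unpublished LOCAL_LIST UPLOADED_LIST
# Hypothetical helper: prints every title in LOCAL_LIST that does not
# appear in UPLOADED_LIST, i.e. the documents still waiting to be uploaded.
# Assumes one title per line and exact title matches between the files.
find_unpublished() (
  # Run in a subshell so the C locale applies to both sort and comm
  # without leaking into the caller's environment.
  export LC_ALL=C

  local_list="$1"     # documents that should exist in Confluence
  uploaded_list="$2"  # page titles saved from getPageList

  # Normalize a list: strip Windows line endings, drop blank lines,
  # then sort and de-duplicate (comm requires sorted input).
  normalize() { tr -d '\r' < "$1" | grep -v '^$' | sort -u; }

  # -2 hides lines found only in the uploaded list and -3 hides lines
  # found in both, leaving only titles unique to the local list.
  comm -23 <(normalize "$local_list") <(normalize "$uploaded_list")
)
```

Usage would look like find_unpublished local_docs.txt uploaded_docs.txt > to_upload.txt. If your HTML file names happen to match the page titles (again an assumption), the local list could be built with something like ls *.html | sed 's/\.html$//' > local_docs.txt.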