Tuesday, March 14, 2017

Export Content from Confluence - Part 3

Intro

Welcome to the final tutorial of this three-part series on how to extract HTML content from Confluence.

In the previous two tutorials, Export Content from Confluence - Part 1 and Export Content from Confluence - Part 2, we explored how to export content using Confluence's native solution. In this tutorial, we will run through the steps to export HTML content from a user-selected space in Confluence into a zip file using Scroll's HTML Exporter plugin. The biggest difference between this plugin and the EclipseHelp Exporter is that this plugin offers the option to generate a search index of your extracted content, and the wiki's space logo will be a part of the final presentation.

Setup


This document assumes that you have the Scroll HTML Exporter plugin installed and properly enabled, and that your user account has the proper permissions (at least view and export) for exporting and plugin usage. If not, contact your Confluence administrator and request that it be set up.

This process was tested on Confluence 5.6.4 and Scroll HTML Exporter version 3.5.0. Results may vary with other versions.

Exporting using HTML Exporter


  1. Navigate to the topmost page in the space you wish to export from Confluence.
  2. Go to Tools > Export to HTML.
  3. Click Customize Settings in the lower-left corner. For this tutorial, we are going to customize the content we wish to pull from Confluence.
  4. In the General step, you will need to make some choices depending on the needs of how and what you want to export from Confluence:
    1. In the Create drop down, you can choose between one HTML file for each Confluence page or a single large HTML file. For this tutorial, we'll be using the former option to export our content to the HTML format.
    2. Since this may be our first time using this plugin, the Template drop down will just list one option, Scroll WebHelp Template. This tutorial will not cover how to create HTML templates.
    3. With the Export option, we'll select This page and its children so we can collect every page starting with the current page and all of its children.
  5. In the Central Processing step, you will have several options: outputting a few macros with your content, exporting images with original resolution, converting labels to index terms, and merging single first headings and page titles. For this step, enable only Export images with original resolution. If your pages in Confluence use the macros listed above and you need them in your exported content, go ahead and enable those options as well. Otherwise, leave them disabled. I have found that exporting the content in as raw a form as possible works best when my documentation publication scripts do some post-processing.
  6. In the File Naming step, we can leave the settings at their default values. These settings work very well with the vast majority of export settings and the content coming from Confluence.
  7. In the Search Index step, depending on your export needs, you may want to leave this option disabled. For this tutorial, leave it disabled. When enabled, the plugin will generate a full-text search index and add it to the exported content. Note: if you decide to export your content into one large HTML file, this feature won't be supported.
  8. Click Start Export and wait a moment or two while Confluence churns on the request. An Export in Progress window will appear informing you of the content export status (pages processed versus total number of pages in the request).
  9. Once the request has finished, your browser will download a zip file named after the page you started the process from, followed by the version number of the page, a date stamp of when it was exported, and a time stamp (set by the server clock). For example, your zip file name may look like this: Home-v12-20170104_1435.zip. The Export Result window will show the size of the export, the number of pages in it, and how long it took. This window will also allow you to save the export scheme you just created, collect the REST URL, and manage export schemes. If you are going to use this export process repeatedly, you should follow these steps:
    1. Click the Save Export Scheme ... button and select as new from the drop down. 
    2. In the Save new Export Scheme window, decide if this scheme will be used only in the current space or in all spaces (globally). For this tutorial, select in this space.
    3. Provide a good descriptive name and description in their respective fields.
    4. Click the Save button.
  10. Optional: back in the Export Result window, you can collect the REST URL if you wish to automate the process in the future. Click the REST URL button and copy the URL listed in the middle of the REST URL window. Save this URL for your automation script.
  11. Optional: back in the Export Result window, you can manage the export schemes in either the current space or in all spaces (globally). A new browser window will open and present you with options to modify any export schemes you may already have, provided you have the proper user permissions to see or edit features in the Space Admin page of the current space.
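
If your publication scripts need to know which page version and export time a downloaded file corresponds to, the file name described in step 9 can be taken apart with a few lines of Python. This is only a sketch: the pattern is inferred from the single example name Home-v12-20170104_1435.zip, and the names ExportName and parse_export_name are my own, not part of the plugin.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ExportName(NamedTuple):
    """Pieces of an export zip file name (hypothetical helper type)."""
    page_title: str
    page_version: int
    exported_at: datetime


# Assumed layout, inferred from the example in step 9:
#   <page title>-v<page version>-<YYYYMMDD>_<HHMM>.zip
_EXPORT_NAME = re.compile(
    r"^(?P<title>.+)-v(?P<version>\d+)-(?P<stamp>\d{8}_\d{4})\.zip$"
)


def parse_export_name(filename: str) -> ExportName:
    """Split an export file name into title, page version, and timestamp."""
    match = _EXPORT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized export file name: {filename!r}")
    return ExportName(
        page_title=match.group("title"),
        page_version=int(match.group("version")),
        # The time stamp comes from the server clock, so no time zone is attached.
        exported_at=datetime.strptime(match.group("stamp"), "%Y%m%d_%H%M"),
    )
```

Calling parse_export_name("Home-v12-20170104_1435.zip") gives the title Home, version 12, and a datetime of January 4, 2017 at 14:35. Names that do not follow the assumed layout raise a ValueError, which is a reasonable place to stop a script rather than guess.
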

With this tutorial complete, you should have an HTML copy of your selected content and a reusable export scheme. The scheme lets you pull content from your desired space in Confluence into an HTML zip file without having to go through this entire process again.

For future exports, all you need to do now is go to Tools > Export to HTML and select the export scheme you created in this tutorial.
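
If you saved the REST URL in step 10, an automation script might look roughly like the sketch below. It rests on assumptions I have not verified against the plugin: that a plain GET on the saved URL returns the zip file directly, and that your Confluence instance accepts HTTP Basic authentication. The function names are my own, and the REST URL must be the one you copied from the REST URL window.

```python
import base64
import shutil
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def build_export_request(rest_url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare a GET request for the saved REST URL.

    Assumption: the server accepts HTTP Basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(rest_url, headers={"Authorization": f"Basic {token}"})


def download_export(request: urllib.request.Request, zip_path: Path) -> Path:
    """Stream the response body to zip_path.

    Assumption: the response body is the export zip itself.
    """
    with urllib.request.urlopen(request) as response, open(zip_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return zip_path


def unpack_export(zip_path: Path, dest_dir: Path) -> List[str]:
    """Extract the zip into dest_dir and return the sorted names of its HTML files."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".html")
        )
```

A script would call build_export_request with the copied URL and your credentials, pass the result to download_export, and then hand the downloaded file to unpack_export. Keeping the three steps separate makes it easy to swap out the authentication or download step if your server behaves differently from what is assumed here.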