Creating eBooks from AsciiDoc

Overview

This guide shows you how to convert AsciiDoc files into EPUB and Kindle’s .mobi format using open source software on Linux. It assumes you have some experience with Linux; a general understanding of HTML, CSS, and XML will be very helpful. I will walk you through all the steps required to create these files and explain how each tool works, so that you can troubleshoot and adapt the workflow to your needs. My goal is for you to understand how this process works rather than follow the steps by rote. I will include links to related resources as we go to help round out your understanding. At the end of this article, you will have a Makefile that lets you easily build and modify eBooks.

This guide is tested on Ubuntu 14 x64, although it should work on most Debian-based systems without much effort. It could easily be used on any Linux-based system with a minimum amount of modification.

Overview of the Workflow

AsciiDoc is a simple, easy-to-read document format for marking up text. It can be converted to DocBook, the specific type of XML that we will use to build EPUBs. From Wikipedia: “AsciiDoc is a human-readable document format, semantically equivalent to DocBook XML, but using plain-text mark-up conventions.” A good overview of AsciiDoc can be found on the Asciidoctor website.

Documents written in AsciiDoc will be converted to DocBook XML, a semantic markup language made for technical documentation. This DocBook XML is then converted to EPUB3 HTML files using the DocBook XSL stylesheets. XSL stylesheets describe how to convert XML to another format, in this case HTML files. (eBooks, including .mobi and EPUB, are merely archives of HTML files and cascading style sheets: essentially a web page wrapped into a single archive.) We will then turn those HTML files into an eBook using a few different tools.

Tools

All tools and files here are open source or free. We will use the following tools:

  1. asciidoctor: Converts AsciiDoc to DocBook XML.
  2. jing: Validates XML against a schema (here, the DocBook schema).
  3. xsltproc: Converts DocBook XML into the EPUB files and folders.
  4. epubcheck: Validates the EPUB files and folders and packs them into a single file.
  5. calibre: An eBook viewer.
  6. KindleGen: Amazon’s tool for converting EPUB to .mobi (KF8 and the older .mobi format).

We will also use the following files (download instructions are below):

  • The DocBook schema (in the RELAX NG schema language).
  • The DocBook XSL stylesheets from the DocBook Project, used to convert DocBook XML to HTML files.

Environment

This guide will have you set up a folder to hold your eBook and all required files. It assumes that your eBook in adoc format is named myBook.adoc. We will set up all the required files and provide sample content for each one, so you will be able to create a full eBook from the example and then, once everything is working, modify it to match your workflow.

First, let’s install the required software:

sudo apt-get install -y asciidoctor jing xsltproc epubcheck calibre
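Before moving on, it can save some head-scratching to confirm that each tool actually ended up on your PATH. The following is a minimal sketch, not part of the original workflow: check_tools is a name I made up for illustration, and it assumes each package installs a command with the same name as the package.

```shell
#!/bin/sh
# Report which of the given commands can be found on PATH.
# check_tools is a hypothetical helper name, not part of any package.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero return value when at least one tool was not found.
    return "$missing"
}

# The list mirrors the apt-get line above; kindlegen is downloaded separately later.
check_tools asciidoctor jing xsltproc epubcheck calibre \
    || echo "Install the missing tools before continuing."
```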

Next let’s create the folder that will hold all the files for this eBook. We’re using a folder called ebook on the Desktop. We’ll create a number of necessary folders here as well:

mkdir ~/Desktop/ebook/
mkdir ~/Desktop/ebook/build-resources/
mkdir ~/Desktop/ebook/ebook-resources/
mkdir ~/Desktop/ebook/ebook-resources/graphics/
mkdir ~/Desktop/ebook/output/ 
cd ~/Desktop/ebook/

The build-resources folder will hold the DocBook XSL stylesheets and the DocBook schema file that we will download. This folder holds files that can be used for any ebook. The ebook-resources folder will store files that are specific to this one eBook, including css files, graphics (like the cover of the ebook), and any other files you want to include in your ebook. The output folder is where our final products will be stored (the .mobi and EPUB files).
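If you expect to set up more than one book, the same layout can be created in one step. This is only a sketch: make_layout is a hypothetical function name, and the folder names are the ones used in this guide.

```shell
#!/bin/sh
# Create the folder layout used in this guide under the given root folder.
# make_layout is a hypothetical helper name.
make_layout() {
    root="$1"
    # -p creates parent folders as needed and does not complain
    # if a folder already exists, so this is safe to re-run.
    mkdir -p \
        "$root/build-resources" \
        "$root/ebook-resources/graphics" \
        "$root/output"
}
```

Running make_layout ~/Desktop/ebook should give the same folders as the mkdir commands above.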

Now let’s create a simple AsciiDoc file. This will be the source material for our ebook. The format is text-based and simple to read and write. Since it’s plain text, it can also be added to your favorite version control tool (Git, Subversion, or the like). Create the adoc file and open it in your default editor:

touch ~/Desktop/ebook/myBook.adoc
xdg-open ~/Desktop/ebook/myBook.adoc

with the following content:

= Witty Book Title
:doctype: book
:backend: docbook
:docinfo:
:!numbered:
:imagesdir: graphics

[dedication]
== My Dedications
This book is dedicated to.....

I'd also like to thank....

== This is the First Chapter

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas vehicula congue dolor, vel commodo magna viverra ac. Morbi ullamcorper, est eu egestas semper, velit elit bibendum orci, ut tristique tellus nulla sit amet ipsum. Sed fringilla, lacus sed viverra dictum, nulla augue placerat lectus, in efficitur magna risus non nibh. Ut laoreet, tortor at tempus mollis, magna risus ullamcorper dolor, quis rutrum ex augue a risus. Vivamus pellentesque accumsan est aliquam fringilla. Quisque eleifend ac eros in volutpat. Quisque eu euismod metus, at blandit diam. Phasellus in magna eget erat finibus lacinia quis at metus. Cras ut hendrerit sem.

Vivamus ligula est, volutpat nec convallis eget, efficitur at orci. Proin non aliquet nunc. Mauris dui odio, bibendum consectetur ligula at, faucibus dapibus est. Praesent porttitor, nisi sit amet accumsan euismod, sem felis semper turpis, ut fermentum leo orci sit amet tortor. Nulla eros leo, eleifend vitae ornare quis, mollis tristique eros. In quis accumsan arcu. Ut hendrerit vitae sem ut consectetur. Nunc enim massa, tempus id orci vitae, rhoncus laoreet nulla. Pellentesque elementum purus rutrum, condimentum elit vitae, sagittis magna. Maecenas ornare justo et arcu consequat, nec volutpat risus fermentum.

== This is the Second Chapter

Duis cursus ac augue id blandit. Nulla varius accumsan odio, sed vestibulum odio lobortis quis. Nunc vitae ipsum tortor. Ut ut eros dignissim est luctus finibus ac quis nulla. Etiam consequat, neque sit amet laoreet laoreet, magna odio ornare justo, et ornare sapien nunc quis dolor. Praesent felis metus, facilisis a quam id, euismod faucibus nibh. Mauris venenatis dui erat, vel auctor felis tempus eget. Pellentesque tellus metus, pretium aliquam tristique eget, bibendum ut sapien. Curabitur magna augue, feugiat id enim congue, ullamcorper iaculis arcu. Integer pulvinar elit nulla, at gravida velit sodales eget. In quis leo ac mauris mollis facilisis fermentum non ex. Ut a lorem lacinia, egestas sem eu, tincidunt risus. Proin non ornare lacus, vitae imperdiet erat.

== This is the Third Chapter

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas vehicula congue dolor, vel commodo magna viverra ac. Morbi ullamcorper, est eu egestas semper, velit elit bibendum orci, ut tristique tellus nulla sit amet ipsum. Sed fringilla, lacus sed viverra dictum, nulla augue placerat lectus, in efficitur magna risus non nibh. Ut laoreet, tortor at tempus mollis, magna risus ullamcorper dolor, quis rutrum ex augue a risus. Vivamus pellentesque accumsan est aliquam fringilla. Quisque eleifend ac eros in volutpat. Quisque eu euismod metus, at blandit diam. Phasellus in magna eget erat finibus lacinia quis at metus. Cras ut hendrerit sem.

Vivamus ut feugiat neque, sed varius tortor. Phasellus sit amet ante ut tortor pulvinar efficitur non non massa. Curabitur imperdiet justo nec urna cursus, sit amet dapibus lectus posuere. Aliquam hendrerit nisi eget nunc aliquet, gravida aliquam elit volutpat. Donec semper tincidunt neque in aliquet. Curabitur lobortis rutrum felis quis tempus. In bibendum neque vitae ipsum tempus aliquet. Maecenas euismod consequat pellentesque.

Maecenas in tincidunt nibh. Aliquam tempus libero non augue finibus fermentum. Sed massa leo, tempus sollicitudin consequat vel, ornare nec ligula. Donec et tellus bibendum, blandit nunc sed, porttitor magna. Integer eget pulvinar lorem. Sed in bibendum quam. Donec eu molestie ipsum, at maximus ipsum. Nunc vel mauris vulputate, faucibus dui quis, imperdiet enim. Phasellus sodales turpis quis velit egestas, in rhoncus diam pellentesque.

There are a number of interesting things happening in this file. The text we are using is Lorem ipsum, common filler text used by typesetters to let you see how text looks on the screen or in print without getting caught up in its content. The header of the adoc file begins with the title of the book on the first line, prefixed with a single equals sign. The second line says that when we convert this file, we want the final document type to be a book (more on different document types here). The backend setting actually gets overridden at the command line when we convert this file later, but is good to have. The docinfo attribute indicates that there is a docinfo.xml file (the one we will create below) that contains further header information. The :!numbered: line indicates that we don’t want section (chapter) numbers. The first section we have is the dedication. The two equals signs indicate a chapter heading.

Now create the css file. This css file tells our ebook (EPUB or .mobi) how to be displayed, including font size, color, and anything else that can be configured with css.

touch ~/Desktop/ebook/ebook-resources/master.css
xdg-open ~/Desktop/ebook/ebook-resources/master.css

and enter the following information:

html, body { height: 100%; margin: 0; padding:0; border-width: 0; }
 @page { margin: 5pt; }

/* indent paragraph */
 h2 + p {
 text-indent:0;
 }
 p {
 text-indent:1em;
 margin: 0;
 }
/* Set the minimum amount of lines to show up on a separate page. (There is not much support for this at the moment.)
https://github.com/reitermarkus/epub3-boilerplate/blob/master/Ebook/OPS/css/main.css*/
p,
blockquote {
  orphans: 2;
  widows: 2;
}
/* Page break for the dedication (the XSLT keeps it on the same page as the copyright) */

div.dedication {
	page-break-before:always;
}

/* Move the legal notice from the title page to its own page */
div.legalnotice{
	page-break-before:always;
}

/* Title page formatting */
div.book div.titlepage h1{
	font-family: Helvetica,Arial,sans-serif;
	text-align: center;
	color: blue;
}

The docinfo file is an xml file that holds information about the author and the copyright information. This file needs to have the same name as your AsciiDoc file with -docinfo.xml appended, and be in the same folder as your adoc file. In this example, our AsciiDoc book is named myBook.adoc, so the docinfo file is named myBook-docinfo.xml. Create this file:

touch ~/Desktop/ebook/myBook-docinfo.xml 

with the following content (note that there can’t be any blank space at the beginning of this file):

Important note: use a text editor that will automatically recognize that you’re working in an XML file, so that it will format this file correctly (replacing the spaces with tabs). It is important that this file be formatted correctly, or you’ll get errors.

<author>
	<personname>
		<honorific>Mr</honorific>
		<firstname>Noah</firstname>
		<surname>Dietrich</surname>
	</personname>
</author>
<copyright>
  <year>2017</year>
  <holder>SublimeRobots Intl.</holder>
</copyright>
<legalnotice>
  <para>
    Copyright &#169; 2017 by Noah Dietrich
  </para>
  <para>
    All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatsoever without the express written permission of the publisher except for the use of brief quotations in a book review.
  </para>
  <para>
    Printed in the United States of America
  </para>
  <para>
    First Printing, 2017
  </para>
  <para>
    ISBN 0-9000000-0-0
  </para>
  <para>
    Jim &amp; Joe Publishers, LLC
  </para>
  <para>
    www.SublimeRobots.com
  </para>
</legalnotice>
<cover>
	<mediaobject>
		<imageobject>
			<imagedata fileref="graphics/cover.jpg">
			</imagedata>
		</imageobject>
	</mediaobject>
 </cover>

This information will be added to your ebook when it is processed, but is not stored in the adoc file. A good example of a docinfo file can be found here.
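The naming rule described above (the AsciiDoc filename with -docinfo.xml in place of the .adoc extension) is easy to get wrong by a character. As a small illustration, here is how the expected name can be derived in the shell; docinfo_for is a hypothetical helper name.

```shell
#!/bin/sh
# Print the docinfo filename that belongs to a given AsciiDoc file.
# docinfo_for is a hypothetical helper name.
docinfo_for() {
    # ${1%.adoc} strips a trailing .adoc; any folder part is kept as-is.
    printf '%s-docinfo.xml\n' "${1%.adoc}"
}

docinfo_for myBook.adoc
# prints: myBook-docinfo.xml
```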

Next, we need to get the DocBook XML Schema file (docbookxi.rng) and the DocBook XSLT stylesheets. We will store them in our build-resources folder:

cd ~/Desktop/ebook/build-resources
wget http://docbook.org/xml/5.2b01/rng/docbookxi.rng

wget http://downloads.sourceforge.net/project/docbook/docbook-xsl-ns/1.79.1/docbook-xsl-ns-1.79.1.tar.bz2
tar -xvjf docbook-xsl-ns-1.79.1.tar.bz2

Finally, we need to download the KindleGen application from Amazon. Download the Linux version from the KindleGen homepage and extract the kindlegen binary to the build-resources folder:

cd ~/Desktop/ebook/build-resources
wget http://kindlegen.s3.amazonaws.com/kindlegen_linux_2.6_i386_v2_9.tar.gz
tar -xzvf kindlegen_linux_2.6_i386_v2_9.tar.gz kindlegen

The Book Cover: You’ll need to put a JPEG for the cover, named cover.jpg, into the ebook-resources/graphics folder. If you don’t do this, you’ll need to remove the cover section in your myBook-docinfo.xml file. You can get information on recommended JPEG sizing here.
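At this point all of the input files should be in place. Since a missing file tends to show up later as a confusing error, a quick check can help. The following is a sketch only: preflight is a hypothetical function name, and the paths are simply the ones used in this guide, so adjust the list if your layout differs.

```shell
#!/bin/sh
# Report any of the guide's expected input files that are missing.
# preflight is a hypothetical helper name; paths are relative to the ebook folder.
preflight() {
    root="$1"
    status=0
    for f in \
        myBook.adoc \
        myBook-docinfo.xml \
        ebook-resources/master.css \
        ebook-resources/graphics/cover.jpg \
        build-resources/docbookxi.rng \
        build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl \
        build-resources/kindlegen
    do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    # Non-zero return value when at least one file is missing.
    return "$status"
}
```

For example, preflight ~/Desktop/ebook && echo "all inputs present" prints the confirmation only when nothing is missing.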

Begin Processing

The first step is converting the adoc file into DocBook XML format. This part can be a little frustrating sometimes, as many small semantic issues can cause errors to show up at this stage. Some issues I have encountered (and quick fixes for them if I have an answer):

  • Double Spaces. This can be fixed with: sed -i.bak '/^$/d;G' myBook.adoc
  • Spaces at the beginning of lines. This can be fixed with: sed -i.bak 's/^[ \t]*//' myBook.adoc
  • Empty Chapters.
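To see what the two sed expressions above do before running them on your book, you can try them on a few lines of throwaway text. This is a small demonstration, not part of the workflow; the function names are made up, and the second one uses the POSIX class [[:blank:]] in place of [ \t], which I am assuming is an acceptable substitute since it matches the same space and tab characters.

```shell
#!/bin/sh
# Delete every empty line, then append exactly one blank line after each
# remaining line (this is what '/^$/d;G' does).
normalize_blank_lines() {
    sed '/^$/d;G'
}

# Strip leading spaces and tabs from every line.
strip_leading_blanks() {
    sed 's/^[[:blank:]]*//'
}

# Three blank lines between the paragraphs collapse to one.
printf 'first\n\n\n\nsecond\n' | normalize_blank_lines

# Leading spaces and a leading tab are removed.
printf '   indented\n\ttabbed\n' | strip_leading_blanks
```

Note that the first expression adds a blank line after every line, including the attribute lines in the header, so review the top of the file afterwards (the .bak copy made by -i.bak is your safety net).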

From your ebook directory, assuming you have all the above files set up correctly, run the following command:

cd ~/Desktop/ebook/
asciidoctor --backend docbook5 --doctype book --verbose --destination-dir ./output/ myBook.adoc

Here we are using the asciidoctor application to convert the adoc file into a DocBook XML file. This XML file will have all the same content as our original adoc file, only formatted (and marked up semantically) to meet the DocBook schema (docbookxi.rng). The schema describes the legal layout of all valid files. We are using these options:

  • --backend docbook5: Remember when I said above we’d override the backend setting in the header of the adoc file? This is the option that does that. It says that we want to use the docbook5 backend, which will generate DocBook 5.0 XML.
  • --doctype book: We also have this information in the header, but it’s better to be sure. The command line overrides the setting in the header if there is a difference.
  • --verbose: We want as much output as possible, which is good for troubleshooting.
  • --destination-dir ./output/: We want the XML file that is created to be put into the output folder.
  • myBook.adoc: This is the filename of the adoc file we want converted to DocBook XML.

You should see output similar to:

noah@thor:~/Desktop/ebook$ asciidoctor --backend docbook5 --doctype book --verbose --destination-dir ./output/ myBook.adoc
Input file: myBook.adoc
  Time to read and parse source: 0.00527
  Time to render document: 0.00944
  Total time to read, parse and render: 0.01476
noah@thor:~/Desktop/ebook$ 

If you have errors, you may see output similar to:

asciidoctor: WARNING: myBook.adoc: line 9: invalid style for paragraph: dedication

This usually means that there is either an error in your header, or your docinfo XML file has an error (spaces instead of tabs, extra spaces between elements, and similar issues). You must fix these issues before continuing.
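Warnings like the one above are easy to miss when they scroll past, and the conversion may still produce an XML file. One way to make them impossible to ignore is to wrap the command so that any warning counts as a failure. This is a sketch under my own assumptions: fail_on_warnings is a hypothetical helper, and it simply looks for the words WARNING or ERROR in whatever the command prints to stderr.

```shell
#!/bin/sh
# Run a command, pass its stderr through, and return non-zero if the command
# failed or if its stderr mentioned WARNING or ERROR.
# fail_on_warnings is a hypothetical helper name.
fail_on_warnings() {
    log=$(mktemp)
    "$@" 2>"$log"
    rc=$?
    # Show the captured messages so nothing is hidden from the user.
    cat "$log" >&2
    if grep -q -e 'WARNING' -e 'ERROR' "$log"; then
        rc=1
    fi
    rm -f "$log"
    return "$rc"
}
```

You would then prefix the conversion with it, for example: fail_on_warnings asciidoctor --backend docbook5 --doctype book --destination-dir ./output/ myBook.adoc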

Now that we have our ebook in the DocBook XML file format, we want to validate that it is semantically correct. We want to check to see that its format matches the schema defined in the docbookxi.rng file (the schema in Relax NG schema language). For this, we use a tool called jing. There is another tool called xmllint, that also does validation, but I encountered issues with it, and found jing to be much more reliable. An excellent resource for understanding the details can be found in the Processing DocBook5 section in the DocBook XSL: The Complete Guide (you’ll be referencing this online ebook a lot if you want to do any configuration of your ebook).

So the content of our adoc file and the docinfo file have been combined into a single xml file in the output directory (you can open it to see what it looks like), and we need to validate it to make sure it’s formatted correctly (sometimes asciidoctor makes mistakes). To do this, we run the following command from the same directory as before (not the output directory):

cd ~/Desktop/ebook/
jing -i ./build-resources/docbookxi.rng output/myBook.xml

This command is simple: the first argument is the docbookxi.rng schema file and the second is our book in XML format, and jing will tell us whether the book is valid (properly formatted) DocBook XML. The -i option tells jing to skip checking of ID/IDREF/IDREFS attributes. If you have issues, try to figure out which line of the XML file is causing the problem, and track it back to the original AsciiDoc or docinfo file. This can be a challenge; sometimes searching the internet for your error can help.

If you see no output, then there are no errors.
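Because silence means success here, it can be reassuring to print an explicit result. The sketch below assumes that jing exits with a non-zero status when validation fails (that is my understanding, but verify it with your version). validate_docbook and the JING variable are names I made up; JING just lets you swap in a different validator command.

```shell
#!/bin/sh
# Validate a DocBook file against a schema and print an explicit result.
# validate_docbook and JING are hypothetical names; JING defaults to jing.
validate_docbook() {
    schema="$1"
    xml="$2"
    # Same invocation as in this guide: jing -i <schema> <xml>
    if ${JING:-jing} -i "$schema" "$xml"; then
        echo "valid: $xml"
    else
        echo "invalid: $xml" >&2
        return 1
    fi
}
```

For this guide the call would be: validate_docbook ./build-resources/docbookxi.rng output/myBook.xml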

Next, we are going to use xsltproc to convert our DocBook XML file into a series of html files (HTML 5 files actually), copy in our css and images to create a folder that represents our entire ebook, including all required resources.

It helps here to understand how ebook file systems are laid out before they are zipped into the archive we consider an EPUB or mobi file. A basic EPUB has the following file and folder hierarchy stored in a zipped container:

mimetype
META-INF/
  container.xml
OEBPS/
  content.opf
  chapter1.xhtml
  chapter2.xhtml
  css/
    style.css
  toc.ncx
  graphics/
    cover.jpg

Good explanations of these files can be found on Wikipedia, as well as here and here.
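Once the EPUB folder has been generated in the next step, a few lines of shell can confirm that the essential pieces from the listing above are present. This is only a rough sanity check and not a replacement for epubcheck; check_epub_tree is a hypothetical name, and the OEBPS folder name follows the convention used in this guide.

```shell
#!/bin/sh
# Check an expanded EPUB folder for the pieces shown in the listing above:
# a mimetype file containing application/epub+zip, META-INF/container.xml,
# and at least one .opf package file under OEBPS/.
# check_epub_tree is a hypothetical helper name.
check_epub_tree() {
    dir="$1"
    bad=0
    if [ "$(cat "$dir/mimetype" 2>/dev/null)" != "application/epub+zip" ]; then
        echo "bad or missing: mimetype"
        bad=1
    fi
    if [ ! -f "$dir/META-INF/container.xml" ]; then
        echo "missing: META-INF/container.xml"
        bad=1
    fi
    if ! ls "$dir"/OEBPS/*.opf >/dev/null 2>&1; then
        echo "missing: OEBPS/*.opf"
        bad=1
    fi
    return "$bad"
}
```

With the paths used in this guide, the call would be check_epub_tree ./output/epub3-book.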

We need to convert our valid DocBook XML into the above folder structure. To do this, we use xsltproc, which applies XSLT stylesheets to XML documents. XSLT is a language for converting XML into other formats (in our case, HTML documents). The XSLT stylesheets are provided by the DocBook project. The following command converts our DocBook XML into an EPUB folder hierarchy:

cd ~/Desktop/ebook/
xsltproc --stringparam base.dir ./output/epub3-book/OEBPS/ --stringparam chapter.autolabel 0 --stringparam chunker.output.indent  yes ./build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl ./output/myBook.xml

Let’s break this down:

  1. xsltproc: The application that applies the XSLT stylesheets.
  2. --stringparam base.dir ./output/epub3-book/OEBPS/: Where the EPUB folders will be written. The epub3-book folder is the location of the ebook; this option requires the location of the OEBPS folder inside it.
  3. --stringparam chapter.autolabel 0: Do not number chapter headings (I feel it looks better without numbers on chapters).
  4. --stringparam chunker.output.indent yes: Make the HTML output pretty (helps for troubleshooting).
  5. ./build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl: These are the XSLT stylesheets.
  6. ./output/myBook.xml: The source DocBook XML file.

The stringparam options above are specific to the XSLT files that we are working with. To find other options that are available, read through Chapter 7. HTML output options of DocBook XSL: The Complete Guide.

You should see output similar to:

noah@thor:~/Desktop/ebook$ xsltproc --stringparam base.dir ./output/epub3-book/OEBPS/ --stringparam chapter.autolabel 0 --stringparam chunker.output.indent  yes ./build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl ./output/myBook.xml
Writing ./output/epub3-book/OEBPS/bk01-toc.xhtml for book
Writing ./output/epub3-book/OEBPS/ch01.xhtml for chapter(_this_is_the_first_chapter)
Writing ./output/epub3-book/OEBPS/ch02.xhtml for chapter(_this_is_the_second_chapter)
Writing ./output/epub3-book/OEBPS/ch03.xhtml for chapter(_this_is_the_third_chapter)
Writing ./output/epub3-book/OEBPS/index.xhtml for book
Writing ./output/epub3-book/OEBPS/docbook-epub.css for book
Generating EPUB package files.
Writing ./output/epub3-book/OEBPS/cover.xhtml for mediaobject
Generating image list ...
Writing ./output/epub3-book/OEBPS/package.opf for book
Writing ./output/epub3-book/OEBPS/../META-INF/container.xml for book
Writing ./output/epub3-book/OEBPS/../mimetype for book
Generating NCX file ...
Writing ./output/epub3-book/OEBPS/toc.ncx for book
noah@thor:~/Desktop/ebook$ 

We also need to manually move our css file and images into the EPUB folder hierarchy (add any additional graphics you need at this stage):

cd ~/Desktop/ebook/
cp ./ebook-resources/master.css ./output/epub3-book/OEBPS/docbook-epub.css
cp -r ./ebook-resources/graphics/ ./output/epub3-book/OEBPS/

The next step is to convert our EPUB folders into a single file (our actual EPUB). To do this we use epubcheck, then rename the file:

epubcheck  ./output/epub3-book/ -mode exp -v 3.0 -save
mv ./output/epub3-book.epub ./output/myBook.epub

Here -mode exp tells epubcheck to validate an expanded (unzipped) EPUB folder, -v 3.0 sets the EPUB version, and -save packs the folder into a single file, ./output/epub3-book.epub, which we then rename to ./output/myBook.epub.
This epub file is the first final product. You can view this epub on any epub compatible reader (including calibre, which we installed earlier).

The final step is to convert our epub into the mobi format, for use on Amazon Kindle devices. This is done with KindleGen. This tool is simple: it takes the name of the epub file to convert, and writes a .mobi file with the same base name into the same folder:

cd ~/Desktop/ebook/build-resources
./kindlegen ../output/myBook.epub

If you look in your ./output folder, you will now see your epub and .mobi files. If you have an Amazon device, you can email the .mobi file to yourself and have it automatically download to your device. All Kindle devices support this .mobi format. More information can be found here and here.

Automating the Build Process

You will quickly find that as you are modifying your files, it becomes a hassle to constantly run these commands. The solution to this is to use a Makefile. The make tool was originally designed to compile software, but it can easily be used to simplify your ebook workflow.
In your ebook folder, create a new file called Makefile:

cd ~/Desktop/ebook/
touch Makefile

Enter the following text (as with the docinfo file above, replace spaces at the beginning of lines with tabs if needed):

mobi : epub
	#ebook-convert ./output/myBook.epub ./output/myBook.mobi
	./build-resources/kindlegen ./output/myBook.epub

epub : ebook
	epubcheck  ./output/epub3-book/ -mode exp -v 3.0 -save
	mv ./output/epub3-book.epub ./output/myBook.epub

ebook : docbook
	xsltproc --stringparam base.dir ./output/epub3-book/OEBPS/ \
		--stringparam chapter.autolabel 0 \
		--stringparam chunker.output.indent  yes \
		./build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl ./output/myBook.xml
	cp ./ebook-resources/master.css ./output/epub3-book/OEBPS/docbook-epub.css
	cp -r ./ebook-resources/graphics/ ./output/epub3-book/OEBPS/

docbook : 
	asciidoctor --backend docbook5 --doctype book --verbose --destination-dir ./output/ myBook.adoc
	jing -i ./build-resources/docbookxi.rng output/myBook.xml

.PHONY: clean
clean : 
	-rm -rf ./output/*

Open a command prompt, navigate to the ebook folder, and you can now build your ebook by issuing the command make mobi (or just make, since mobi is the first target in the file). If you want to delete all old versions of the ebook, you can run make clean. If you get an error like Makefile:3: *** missing separator. Stop, then you need to replace the spaces at the beginning of lines with tabs (there are issues pasting tabs from a website into a document).
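If you do hit the missing separator error, the following sketch can help you find the offending lines. It simply lists every line that begins with a space; find_space_indented is a hypothetical helper name.

```shell
#!/bin/sh
# Print (with line numbers) every line of a file that starts with a space.
# In a Makefile, command lines under a target must start with a tab, so
# these lines are the likely cause of a "missing separator" error.
# find_space_indented is a hypothetical helper name.
find_space_indented() {
    grep -n '^ ' "$1"
}
```

Run find_space_indented Makefile from the ebook folder; no output means no line starts with a space.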

Some of the options you have here:

  • make docbook: Convert your adoc to DocBook XML and validate.
  • make ebook: This runs make docbook, and then creates the ebook folders.
  • make epub: This runs the above two commands, then converts the ebook folders into a single epub.
  • make mobi: This runs the above commands, and finally converts the epub into a mobi.
  • make clean: Delete all files in the output folder.

You can modify this makefile to match your workflow, such as adding options to xsltproc (new lines are broken up with a backslash to improve readability), or having more files added to your ebook directory.

Conclusion

This guide has given you a simple framework for creating an ebook workflow. There are a number of things that can be improved or modified in this process to suit your needs, but hopefully you have learned enough to make these modifications yourself. You’ll probably want to improve the css files for your ebook (there are a number of websites that discuss epub css options in more depth; some of them are linked below). You may want to look at embedding images into your book, using specific fonts, adding different parameters to the XSL transforms, and many other options.

Feedback is welcomed, especially if there are errors in this guide or recommendations you have from your own experience: please contact me here.

Helpful Links

DocInfo.xml example.
O’Reilly Publications docinfo.xml example for Erlang book.
Publishing with iBooks example docinfo.xml.
Another O’Reilly docbook.xml example.

Amazon Kindle Publishing Guidelines

CSS Boilerplate for eBooks.
Basic css styles for Kindle html.

The eBook Design and Development Guide on Amazon.

The two guides below use a2x from the asciidoc package, rather than asciidoctor, to generate the DocBook XML from AsciiDoc. I prefer asciidoctor, as I find that it works better for my workflow.
A good guide on converting docbook to epub and mobi.
Another good guide.
