This guide shows you how to convert AsciiDoc files into EPUB and Kindle’s .mobi format using open source software on Linux. It assumes you have some experience with Linux; a general understanding of HTML, CSS, and XML will be very helpful. I will walk you through every step required to create these files and explain how each tool works, so that you can troubleshoot the workflow and adapt it to your needs. My goal is for you to understand how the process works rather than follow rote steps. I will include links to related resources along the way to help round out your understanding. At the end of this article, you will have a Makefile that lets you easily build and modify eBooks.
This guide was tested on Ubuntu 14 x64, although it should work on most Debian-based systems without much effort, and on other Linux distributions with minor modifications.
AsciiDoc is a simple, easy-to-read plain-text markup format that can be converted to the specific type of XML (DocBook) that this workflow uses to build EPUBs. From Wikipedia: “AsciiDoc is a human-readable document format, semantically equivalent to DocBook XML, but using plain-text mark-up conventions.” A good overview of AsciiDoc can be found on the Asciidoctor website.
Documents written in AsciiDoc will be converted to DocBook XML, a semantic language made for technical documentation. This DocBook XML is converted to EPUB3 HTML files using the DocBook XSL stylesheets. XSL stylesheets describe how to convert XML to another format, in this case HTML files (eBooks, including .mobi and EPUB, are essentially a set of HTML files and cascading style sheets wrapped into a single archive, much like a web page). We will then turn those HTML files into an eBook using a few different tools.
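All tools and files here are open source or free. We will use the following tools:
asciidoctor, to convert the AsciiDoc source into DocBook XML.
jing, to validate the DocBook XML against the DocBook schema.
xsltproc, to apply the DocBook XSL stylesheets to the DocBook XML.
epubcheck, to validate the EPUB and package it into a single file.
KindleGen, to convert the EPUB into a .mobi file.
calibre, to view the finished eBook.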
We will also use the following files (download instructions are below):
The DocBook schema (in the RELAX NG schema language).
The DocBook XSL Stylesheets from the DocBook Project, used to convert DocBook XML to HTML files.
This guide will have you set up a folder to hold the files for your eBook and all required files. It assumes that the filename of your eBook in adoc format is myBook.adoc. We will set up all the required files and provide sample content for each one, so you will be able to create a full eBook from the example and then, once everything is working, modify it to match your workflow.
First, let’s install the required software:
sudo apt-get install -y asciidoctor jing xsltproc epubcheck calibre
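Before going further, it can help to confirm that each tool actually ended up on your PATH. The following is a minimal sketch for a POSIX shell; check_tools is a helper name I made up for this example, and the tool names in the usage line simply mirror the apt-get command above.

```shell
# Report which command-line tools are available on PATH.
# check_tools is a hypothetical helper name, not part of any package.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}

# Usage: check_tools asciidoctor jing xsltproc epubcheck calibre
```

The function exits with a non-zero status if anything is missing, so it can also be used as a guard at the top of a build script.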
Next let’s create the folder that will hold all the files for this eBook. We’re using a folder called ebook on the Desktop. We’ll create a number of necessary folders here as well:
mkdir ~/Desktop/ebook/
mkdir ~/Desktop/ebook/build-resources/
mkdir ~/Desktop/ebook/ebook-resources/
mkdir ~/Desktop/ebook/ebook-resources/graphics/
mkdir ~/Desktop/ebook/output/
cd ~/Desktop/ebook/
The build-resources folder will hold the DocBook XSL Stylesheets and the DocBook schema file that we will download. This folder holds files that can be used for any ebook. The ebook-resources folder will store files that are specific to this one eBook, including CSS files, graphics (like the cover of the ebook), and any other files you want to include in your ebook. The output folder is where our final products will be stored (the .mobi and EPUB files).
Now let’s create a simple AsciiDoc file. This will be the source material for our ebook. This format is text-based, and is simple to read and create. Since it’s text, it can also be added to your favorite version control tool (Git, Subversion, or the like). Here we will edit the adoc file using your favorite editor:
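If you prefer, the same layout can be created in one repeatable step. This is a small sketch, not a required part of the workflow; EBOOK_DIR is an illustrative variable name that defaults to the folder used in this guide.

```shell
# Create the whole project tree in one idempotent step.
# EBOOK_DIR is an illustrative variable; it defaults to the folder used in this guide.
EBOOK_DIR="${EBOOK_DIR:-$HOME/Desktop/ebook}"

# -p creates parent folders as needed and does not fail if a folder already exists.
mkdir -p \
    "$EBOOK_DIR/build-resources" \
    "$EBOOK_DIR/ebook-resources/graphics" \
    "$EBOOK_DIR/output"

# Show the top levels of the resulting folder tree.
find "$EBOOK_DIR" -maxdepth 2 -type d | sort
```

Because mkdir -p is safe to run repeatedly, this snippet can be rerun at any time without disturbing files you have already added.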
touch ~/Desktop/ebook/myBook.adoc
xdg-open ~/Desktop/ebook/myBook.adoc
with the following content:
= Witty Book Title
:doctype: book
:backend: docbook
:docinfo:
:!numbered:
:imagesdir: graphics

[dedication]
== My Dedications

This book is dedicated to.....

I'd also like to thank....

== This is the First Chapter

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas vehicula congue dolor, vel commodo magna viverra ac. Morbi ullamcorper, est eu egestas semper, velit elit bibendum orci, ut tristique tellus nulla sit amet ipsum. Sed fringilla, lacus sed viverra dictum, nulla augue placerat lectus, in efficitur magna risus non nibh. Ut laoreet, tortor at tempus mollis, magna risus ullamcorper dolor, quis rutrum ex augue a risus. Vivamus pellentesque accumsan est aliquam fringilla. Quisque eleifend ac eros in volutpat. Quisque eu euismod metus, at blandit diam. Phasellus in magna eget erat finibus lacinia quis at metus. Cras ut hendrerit sem. Vivamus ligula est, volutpat nec convallis eget, efficitur at orci. Proin non aliquet nunc. Mauris dui odio, bibendum consectetur ligula at, faucibus dapibus est. Praesent porttitor, nisi sit amet accumsan euismod, sem felis semper turpis, ut fermentum leo orci sit amet tortor. Nulla eros leo, eleifend vitae ornare quis, mollis tristique eros. In quis accumsan arcu. Ut hendrerit vitae sem ut consectetur. Nunc enim massa, tempus id orci vitae, rhoncus laoreet nulla. Pellentesque elementum purus rutrum, condimentum elit vitae, sagittis magna. Maecenas ornare justo et arcu consequat, nec volutpat risus fermentum.

== This is the Second Chapter

Duis cursus ac augue id blandit. Nulla varius accumsan odio, sed vestibulum odio lobortis quis. Nunc vitae ipsum tortor. Ut ut eros dignissim est luctus finibus ac quis nulla. Etiam consequat, neque sit amet laoreet laoreet, magna odio ornare justo, et ornare sapien nunc quis dolor. Praesent felis metus, facilisis a quam id, euismod faucibus nibh. Mauris venenatis dui erat, vel auctor felis tempus eget. Pellentesque tellus metus, pretium aliquam tristique eget, bibendum ut sapien.

Curabitur magna augue, feugiat id enim congue, ullamcorper iaculis arcu. Integer pulvinar elit nulla, at gravida velit sodales eget. In quis leo ac mauris mollis facilisis fermentum non ex. Ut a lorem lacinia, egestas sem eu, tincidunt risus. Proin non ornare lacus, vitae imperdiet erat.

== This is the Third Chapter

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas vehicula congue dolor, vel commodo magna viverra ac. Morbi ullamcorper, est eu egestas semper, velit elit bibendum orci, ut tristique tellus nulla sit amet ipsum. Sed fringilla, lacus sed viverra dictum, nulla augue placerat lectus, in efficitur magna risus non nibh. Ut laoreet, tortor at tempus mollis, magna risus ullamcorper dolor, quis rutrum ex augue a risus. Vivamus pellentesque accumsan est aliquam fringilla. Quisque eleifend ac eros in volutpat. Quisque eu euismod metus, at blandit diam. Phasellus in magna eget erat finibus lacinia quis at metus. Cras ut hendrerit sem. Vivamus ut feugiat neque, sed varius tortor. Phasellus sit amet ante ut tortor pulvinar efficitur non non massa. Curabitur imperdiet justo nec urna cursus, sit amet dapibus lectus posuere. Aliquam hendrerit nisi eget nunc aliquet, gravida aliquam elit volutpat. Donec semper tincidunt neque in aliquet. Curabitur lobortis rutrum felis quis tempus. In bibendum neque vitae ipsum tempus aliquet. Maecenas euismod consequat pellentesque. Maecenas in tincidunt nibh. Aliquam tempus libero non augue finibus fermentum. Sed massa leo, tempus sollicitudin consequat vel, ornare nec ligula. Donec et tellus bibendum, blandit nunc sed, porttitor magna. Integer eget pulvinar lorem. Sed in bibendum quam. Donec eu molestie ipsum, at maximus ipsum. Nunc vel mauris vulputate, faucibus dui quis, imperdiet enim. Phasellus sodales turpis quis velit egestas, in rhoncus diam pellentesque.
There are a number of interesting things happening in this file. The text we are using is Lorem ipsum, common filler text used by typesetters to let you see how text looks on the screen or in print without getting caught up in its content. The header of the adoc file begins with the title of the book on the first line, marked with a single leading equals sign. The doctype line says that when we convert this file, we want the final document type to be a book (more on different document types here). The backend actually gets overridden at the command line when we convert this file later, but is good to have. The docinfo attribute indicates that there is a docinfo XML file (the one we will create below) that contains further header information. The :!numbered: line indicates that we don’t want chapters and sections to be numbered. The imagesdir attribute says that images are found in the graphics folder. The first section we have is the dedication. The two equals signs indicate a chapter heading.
Now create the CSS file. This file tells our ebook (EPUB or .mobi) how to be displayed, including font size, color, and anything else that can be configured with CSS.
touch ~/Desktop/ebook/ebook-resources/master.css
xdg-open ~/Desktop/ebook/ebook-resources/master.css
and enter the following information:
html, body {
    height: 100%;
    margin: 0;
    padding: 0;
    border-width: 0;
}

@page {
    margin: 5pt;
}

/* indent paragraph */
h2 + p {
    text-indent: 0;
}

p {
    text-indent: 1em;
    margin: 0;
}

/* Set the minimum number of lines to show up on a separate page.
   (There is not much support for this at the moment.)
   https://github.com/reitermarkus/epub3-boilerplate/blob/master/Ebook/OPS/css/main.css */
p, blockquote {
    orphans: 2;
    widows: 2;
}

/* page break for dedication (XSLT keeps it on the same page as the copyright) */
div.dedication {
    page-break-before: always;
}

/* Move the legal notice from the title page to its own page */
div.legalnotice {
    page-break-before: always;
}

/* Title page formatting */
div.book div.titlepage h1 {
    font-family: Helvetica, Arial, sans-serif;
    text-align: center;
    color: blue;
}
The docinfo file is an xml file that holds information about the author and the copyright information. This file needs to have the same name as your AsciiDoc file with -docinfo.xml appended, and be in the same folder as your adoc file. In this example, our AsciiDoc book is named myBook.adoc, so the docinfo file is named myBook-docinfo.xml. Create this file:
touch ~/Desktop/ebook/myBook-docinfo.xml
with the following content (note that there can’t be any blank space at the beginning of this file):
Important note: use a text editor that will automatically recognize that you’re working in an XML file, so that it will format this file correctly (replacing the spaces with tabs). It is important that this file be formatted correctly, or you’ll get errors.
<author>
	<personname>
		<honorific>Mr</honorific>
		<firstname>Noah</firstname>
		<surname>Dietrich</surname>
	</personname>
</author>
<copyright>
	<year>2017</year>
	<holder>SublimeRobots Intl.</holder>
</copyright>
<legalnotice>
	<para>Copyright © 2017 by Noah Dietrich</para>
	<para>All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatsoever without the express written permission of the publisher except for the use of brief quotations in a book review.</para>
	<para>Printed in the United States of America</para>
	<para>First Printing, 2017</para>
	<para>ISBN 0-9000000-0-0</para>
	<para>Jim &amp; Joe Publishers, LLC</para>
	<para>www.SublimeRobots.com</para>
</legalnotice>
<cover>
	<mediaobject>
		<imageobject>
			<imagedata fileref="graphics/cover.jpg"/>
		</imageobject>
	</mediaobject>
</cover>
This information will be added to your ebook when it is processed, but is not stored in the adoc file. A good example of a docinfo file can be found here.
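Because the docinfo file is found purely by its name, a typo in the filename means the author and copyright details silently go missing. Here is a small sketch of the naming rule described above; docinfo_path and check_docinfo are helper names I invented for illustration.

```shell
# Derive the docinfo filename expected next to an AsciiDoc source file.
# docinfo_path and check_docinfo are hypothetical helper names for this sketch.
docinfo_path() {
    # Strip the .adoc extension, then append -docinfo.xml.
    printf '%s\n' "${1%.adoc}-docinfo.xml"
}

# Report whether the expected docinfo file exists.
check_docinfo() {
    expected=$(docinfo_path "$1")
    if [ -f "$expected" ]; then
        echo "ok: $expected"
    else
        echo "missing: $expected" >&2
        return 1
    fi
}

# Usage: check_docinfo ~/Desktop/ebook/myBook.adoc
```

For myBook.adoc, docinfo_path returns myBook-docinfo.xml in the same folder, which matches the file created above.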
Next, we need to get the DocBook XML Schema file (docbookxi.rng) and the DocBook XSLT stylesheets. We will store them in our build-resources folder:
cd ~/Desktop/ebook/build-resources
wget http://docbook.org/xml/5.2b01/rng/docbookxi.rng
wget http://downloads.sourceforge.net/project/docbook/docbook-xsl-ns/1.79.1/docbook-xsl-ns-1.79.1.tar.bz2
tar -xvjf docbook-xsl-ns-1.79.1.tar.bz2
Finally, we need to download the KindleGen application from Amazon. Navigate to the KindleGen homepage, download the Linux version, and extract the kindlegen binary to the build-resources folder.
cd ~/Desktop/ebook/build-resources
wget http://kindlegen.s3.amazonaws.com/kindlegen_linux_2.6_i386_v2_9.tar.gz
tar -xzvf kindlegen_linux_2.6_i386_v2_9.tar.gz kindlegen
The Book Cover: You’ll need to put a JPEG named cover.jpg for the cover into the ebook-resources/graphics folder. If you don’t do this, you’ll need to remove the cover section in your myBook-docinfo.xml file. You can get information on recommended JPEG sizing here.
The first step is converting the adoc file into DocBook XML format. This part can be a little frustrating sometimes, as many small semantic issues can cause errors to show up at this stage. Some issues I have encountered (and quick fixes for them if I have an answer):
From your ebook directory, assuming you have all the above files set up correctly, run the following command:
cd ~/Desktop/ebook/
asciidoctor --backend docbook5 --doctype book --verbose --destination-dir ./output/ myBook.adoc
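Here we are using the asciidoctor application to convert the adoc file into a DocBook XML file. This XML file will have all the same content as our original adoc file, only formatted (and marked up semantically) to meet the DocBook schema (docbookxi.rng). The schema describes the legal layout of all valid files. We are using these options:
--backend docbook5: produce DocBook 5 XML as the output format.
--doctype book: treat the document as a book.
--verbose: print additional processing information while converting.
--destination-dir ./output/: write the resulting XML file into the output folder.
You should see output similar to: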
noah@thor:~/Desktop/ebook$ asciidoctor --backend docbook5 --doctype book --verbose --destination-dir ./output/ myBook.adoc
Input file: myBook.adoc
Time to read and parse source: 0.00527
Time to render document: 0.00944
Total time to read, parse and render: 0.01476
noah@thor:~/Desktop/ebook$
If you have errors, you may see output similar to:
asciidoctor: WARNING: myBook.adoc: line 9: invalid style for paragraph: dedication
This usually means that there is either an error in your header, or your docinfo XML file has an error (spaces instead of tabs, extra spaces between elements, and similar issues). You must fix these issues before continuing.
Now that we have our ebook in the DocBook XML file format, we want to validate that it is semantically correct. We want to check that its format matches the schema defined in the docbookxi.rng file (the schema in the RELAX NG schema language). For this, we use a tool called jing. There is another tool called xmllint that also does validation, but I encountered issues with it, and found jing to be much more reliable. An excellent resource for understanding the details can be found in the Processing DocBook5 section in DocBook XSL: The Complete Guide (you’ll be referencing this online ebook a lot if you want to do any configuration of your ebook).
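One docinfo mistake that is easy to overlook is an ampersand that has not been written as &amp;amp;, for example in a publisher name. The sketch below is a rough, line-based heuristic rather than a real XML parser, and find_bare_ampersands is a name I made up for it. It strips anything that looks like a well-formed entity or character reference and then prints whatever lines still contain an ampersand.

```shell
# Print lines of an XML file that still contain a bare '&' once
# well-formed entity and character references have been stripped.
# find_bare_ampersands is a hypothetical helper; it is a rough
# line-based heuristic, not an XML parser (CDATA sections will
# produce false positives).
find_bare_ampersands() {
    sed -E 's/&([A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);//g' "$1" | grep -n '&'
}

# Usage: find_bare_ampersands ~/Desktop/ebook/myBook-docinfo.xml
```

The function prints the offending line numbers and exits with status 0 when it finds something, and exits with status 1 when the file looks clean, following the usual grep convention.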
So the content of our adoc file and the docinfo file have been combined into a single xml file in the output directory (you can open it to see what it looks like), and we need to validate it to make sure it’s formatted correctly (sometimes asciidoctor makes mistakes). To do this, we run the following command from the same directory as before (not the output directory):
cd ~/Desktop/ebook/
jing -i ./build-resources/docbookxi.rng output/myBook.xml
This command is simple: it takes the docbookxi.rng schema file as its first argument and our book in XML format as its second, and tells us whether the book is valid (properly formatted) DocBook XML. The -i option tells jing to skip its ID/IDREF checks. If you have issues, try to figure out which line of the XML file is causing the problem, and track it back to the original AsciiDoc or docinfo file. This can be a challenge; sometimes searching the internet for your error can help.
If you see no output, then there are no errors.
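Since silence means success, it is easy to be unsure whether the command ran at all. A small wrapper can make the result explicit. This is a sketch that assumes jing is on your PATH and that it exits with a non-zero status when validation fails; validate_docbook is a hypothetical function name.

```shell
# Validate a DocBook file and report the result explicitly.
# validate_docbook is a hypothetical wrapper; it assumes jing is on PATH
# and relies on jing exiting with a non-zero status when validation fails.
validate_docbook() {
    schema=$1
    xml=$2
    if jing -i "$schema" "$xml"; then
        echo "valid: $xml"
    else
        echo "invalid: $xml" >&2
        return 1
    fi
}

# Usage: validate_docbook ./build-resources/docbookxi.rng output/myBook.xml
```

Any messages from jing itself still appear as usual; the wrapper only adds a final line and passes a failure on to whatever script called it.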
Next, we are going to use xsltproc to convert our DocBook XML file into a series of HTML files (HTML5 files, actually), and then copy in our CSS and images to create a folder that represents our entire ebook, including all required resources.
It helps here to understand how ebook file systems are laid out before they are zipped into the archive we know as an EPUB or mobi file. A basic EPUB has the following file and folder hierarchy stored in a zipped container:
mimetype
META-INF/
    container.xml
OEBPS/
    content.opf
    chapter1.xhtml
    chapter2.xhtml
    css/
        style.css
    toc.ncx
    graphics/
        cover.jpg
Good explanations of these files can be found on Wikipedia, as well as here and here.
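To make the layout above concrete, here is a quick sanity check you could run against an expanded EPUB folder. It is only a sketch: check_epub_layout is a made-up helper name, it looks at just three essentials, and epubcheck remains the authoritative validator later in this guide.

```shell
# Check that an expanded EPUB folder has the pieces every EPUB needs.
# check_epub_layout is a hypothetical helper for illustration only;
# epubcheck performs the authoritative validation later in this guide.
check_epub_layout() {
    root=$1
    status=0
    # The mimetype file should hold the EPUB media type.
    if [ "$(cat "$root/mimetype" 2>/dev/null)" != "application/epub+zip" ]; then
        echo "bad or missing: $root/mimetype" >&2
        status=1
    fi
    # container.xml tells the reader where the package (.opf) file lives.
    if [ ! -f "$root/META-INF/container.xml" ]; then
        echo "missing: $root/META-INF/container.xml" >&2
        status=1
    fi
    # At least one .opf package file should exist somewhere below the root.
    if [ -z "$(find "$root" -name '*.opf' 2>/dev/null)" ]; then
        echo "missing: package (.opf) file" >&2
        status=1
    fi
    return "$status"
}

# Usage: check_epub_layout ./output/epub3-book
```

The check looks for any .opf file rather than a fixed name, because the package file is not always called content.opf.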
We need to convert our valid DocBook XML into the above folder structure. To do this, we use xsltproc, which applies XSLT stylesheets to XML documents. XSLT is a language for converting XML into other formats (in our case, HTML documents). The XSLT stylesheets are provided by the DocBook project. The following command converts our DocBook XML into an EPUB folder hierarchy:
cd ~/Desktop/ebook/
xsltproc --stringparam base.dir ./output/epub3-book/OEBPS/ \
    --stringparam chapter.autolabel 0 \
    --stringparam chunker.output.indent yes \
    ./build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl ./output/myBook.xml
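Let’s break this down:
--stringparam base.dir ./output/epub3-book/OEBPS/: the folder where the generated files are written.
--stringparam chapter.autolabel 0: do not add automatic chapter numbers to chapter titles.
--stringparam chunker.output.indent yes: indent the generated HTML so that it is easier to read.
./build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl: the EPUB3 stylesheet to apply.
./output/myBook.xml: the DocBook XML file to convert.
The stringparam options above are specific to the XSLT files that we are working with. To find other options that are available, read through Chapter 7. HTML output options of DocBook XSL: The Complete Guide.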
You should see output similar to:
noah@thor:~/Desktop/ebook$ xsltproc --stringparam base.dir ./output/epub3-book/OEBPS/ --stringparam chapter.autolabel 0 --stringparam chunker.output.indent yes ./build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl ./output/myBook.xml
Writing ./output/epub3-book/OEBPS/bk01-toc.xhtml for book
Writing ./output/epub3-book/OEBPS/ch01.xhtml for chapter(_this_is_the_first_chapter)
Writing ./output/epub3-book/OEBPS/ch02.xhtml for chapter(_this_is_the_second_chapter)
Writing ./output/epub3-book/OEBPS/ch03.xhtml for chapter(_this_is_the_third_chapter)
Writing ./output/epub3-book/OEBPS/index.xhtml for book
Writing ./output/epub3-book/OEBPS/docbook-epub.css for book
Generating EPUB package files.
Writing ./output/epub3-book/OEBPS/cover.xhtml for mediaobject
Generating image list ...
Writing ./output/epub3-book/OEBPS/package.opf for book
Writing ./output/epub3-book/OEBPS/../META-INF/container.xml for book
Writing ./output/epub3-book/OEBPS/../mimetype for book
Generating NCX file ...
Writing ./output/epub3-book/OEBPS/toc.ncx for book
noah@thor:~/Desktop/ebook$
We also need to manually move our css file and images into the EPUB folder hierarchy (add any additional graphics you need at this stage):
cd ~/Desktop/ebook/
cp ./ebook-resources/master.css ./output/epub3-book/OEBPS/docbook-epub.css
cp -r ./ebook-resources/graphics/ ./output/epub3-book/OEBPS/
The next step is to convert our EPUB folders into a single file (our actual EPUB). To do this we use epubcheck, then rename the file:
epubcheck ./output/epub3-book/ -mode exp -v 3.0 -save
mv ./output/epub3-book.epub ./output/myBook.epub
Here we are using -mode exp to have epubcheck validate the expanded (unzipped) EPUB folder, -v 3.0 to validate it as EPUB version 3.0, and -save to write the result to a single file, ./output/epub3-book.epub, which we then rename to ./output/myBook.epub.
This epub file is the first final product. You can view this epub on any epub compatible reader (including calibre, which we installed earlier).
The final step is to convert our epub into the mobi format, for use on Amazon Kindle devices. This is done with KindleGen. This tool is simple: it takes the name of the epub file to convert and writes a .mobi file with the same base name next to it:
cd ~/Desktop/ebook/build-resources
./kindlegen ../output/myBook.epub
If you look in your ./output folder, you will now see your epub and .mobi files. If you have an Amazon device, you can email the .mobi file to yourself and have it automatically download to your device. All Kindle devices support this .mobi format. More information can be found here and here.
You will quickly find that as you modify your files, it becomes a hassle to constantly rerun these commands. The solution is to use make with a Makefile. make was originally designed to compile software, but it can easily be adapted to simplify your ebook workflow.
In your ebook folder, create a new file called Makefile:
cd ~/Desktop/ebook/
touch Makefile
Enter the following text (as with the docinfo file above, replace spaces at the beginning of lines with tabs if needed):
mobi : epub
	#ebook-convert ./output/myBook.epub ./output/myBook.mobi
	./build-resources/kindlegen ./output/myBook.epub

epub : ebook
	epubcheck ./output/epub3-book/ -mode exp -v 3.0 -save
	mv ./output/epub3-book.epub ./output/myBook.epub

ebook : docbook
	xsltproc --stringparam base.dir ./output/epub3-book/OEBPS/ \
	--stringparam chapter.autolabel 0 \
	--stringparam chunker.output.indent yes \
	./build-resources/docbook-xsl-ns-1.79.1/epub3/chunk.xsl ./output/myBook.xml
	cp ./ebook-resources/master.css ./output/epub3-book/OEBPS/docbook-epub.css
	cp -r ./ebook-resources/graphics/ ./output/epub3-book/OEBPS/

docbook :
	asciidoctor --backend docbook5 --doctype book --verbose --destination-dir ./output/ myBook.adoc
	jing -i ./build-resources/docbookxi.rng output/myBook.xml

.PHONY: clean
clean :
	-rm -rf ./output/*
Open a command prompt, navigate to the ebook folder, and build your ebook by running make (with no arguments, make builds the first target in the Makefile, mobi, along with everything it depends on). If you want to delete all old versions of the ebook, run make clean. If you get the error Makefile:3: *** missing separator. Stop, then you need to replace the spaces at the beginning of the indented lines with tabs (tabs are often lost when pasting from a website into a document).
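Some of the options you have here:
make docbook: convert myBook.adoc to DocBook XML and validate it.
make ebook: run the step above, then generate the expanded EPUB folder.
make epub: run the steps above, then validate and package myBook.epub.
make mobi: run all of the steps above, then create the .mobi file.
make clean: delete everything in the output folder.
You can modify this Makefile to match your workflow, such as adding options to xsltproc (long commands are split across lines with a backslash to improve readability), or having more files added to your ebook directory.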
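If make stops with the missing separator error mentioned earlier, the lines at fault can be hard to spot by eye, since tabs and spaces look the same in most editors. The following sketch lists every line that begins with spaces; find_space_indents is a name I made up, and the check is a simple heuristic that knows nothing about Makefile syntax.

```shell
# List Makefile lines that start with spaces instead of a tab.
# find_space_indents is a hypothetical helper and a rough heuristic:
# any line that begins with a space followed by other text is reported.
find_space_indents() {
    grep -nE '^ +[^ ]' "$1"
}

# Usage: find_space_indents ~/Desktop/ebook/Makefile
```

Each reported line is prefixed with its line number, so you can jump straight to it and replace the leading spaces with a single tab.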
This guide has given you a simple framework for creating an ebook workflow. There are a number of things that can be improved or modified in this process to suit your needs, but hopefully you have learned enough to make these modifications yourself. You’ll probably want to improve the CSS files for your ebook (a number of websites discuss EPUB CSS options in more depth; some of them are linked below). You may want to look at embedding images into your book, using specific fonts, adding different parameters to the XSL transforms, and many other options.
Feedback is welcomed, especially if there are errors in this guide or recommendations you have from your own experience: please contact me here.
DocInfo.xml example.
O’Reilly Publications docinfo.xml example for Erlang book.
Publishing with iBooks example docinfo.xml.
Another O’Reilly docbook.xml example.
Amazon Kindle Publishing Guidelines
CSS Boilerplate for eBooks.
Basic css styles for Kindle html.
The eBook Design and Development Guide on Amazon.
The two guides below use a2x from the asciidoc package, rather than asciidoctor, to generate the DocBook XML from AsciiDoc. I prefer asciidoctor, as I find that it works better for my workflow.
A good guide on converting docbook to epub and mobi.
Another good guide.