Markup languages with Git, a document workflow

I’m writing this entry in a WYSIWYG editor. This is so common that people are conditioned to think that editing text and WYSIWYG go together. There are, however, a number of document markup languages that are not WYSIWYG: lightweight ones such as Markdown and reStructuredText are popular, along with more powerful languages such as LaTeX.

All these markup languages store your documents in plain text files. LaTeX, for example, stores them as “.tex” files, which then get compiled into whatever final output format you need (in my case this is usually PDF). I remember being put off at first by having to compile my documents with LaTeX; for someone conditioned to using packages like OpenOffice, this extra step felt alien and weird. So strong was the sense of strangeness that I abandoned LaTeX the first time I encountered it. The cost-benefit trade-off of learning it didn’t seem favourable to me, because I overestimated the learning curve and didn’t see how big the benefits were until later. For a variety of situations and document types, the benefits of using a markup language clearly outweigh the drawbacks. Hopefully after reading this some of those benefits will be more obvious.

It is important to choose the right tool for the job. For complex typesetting problems LaTeX is extremely powerful: if you need to typeset mathematical formulae or have fine-grained control over your formatting, it’s just about the only game in town. However, you don’t always need this power. If you are dealing with simple documents with basic formatting, you will probably do better with something like Markdown. This is especially the case if you are trying to convince team-mates to use a markup language for their documents too: the benefits have to be clear before people will switch from whatever they currently use to create their documents. For me the tipping point came when I started using version control software effectively; at that point the power of writing documents in a markup language became really clear. Because I was working on a mathematics paper, LaTeX ended up being the best choice for that project.

Collaboration

Have you ever had to collaborate on a sizeable document with a team? If so, how did you manage merging everyone’s individual changes into the final master document? I did this once for a technical report with OpenOffice Writer, and the process was particularly painful: combining all the changes was slow and error-prone. There was also a proliferation of similar files, as people made lots of temporary copies, and questions such as which files were the most recent and which changes still had to be migrated came up again and again. A couple of times we even lost changes because two people overwrote the same document at once. These problems are hard to solve in an ad hoc manner, but thankfully they are mostly solved problems. Because LaTeX documents are stored in plain text, you can put them under version control and use all the existing software that tracks and merges changes to text.

If you are not familiar with version control, you really should stop and take some time to find out more. One of the big wins version control offers is when you need to collaborate on a document. The LaTeX + Git combination really shines here, because sharing your work becomes as straightforward as making commits and merging them. (Any other plain-text markup language for documents gets the same benefit.) Because Git is decentralized, you don’t need to set up any servers; you don’t even need a network or internet access, since you can push changes to a USB key if need be. This makes backing up your files really easy and saves the massive hassle of dealing with a ton of similarly named backup directories or zip files.
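As a rough sketch of the USB key workflow, assuming the key is mounted at some absolute path, you can create a bare repository on it and register it as an ordinary remote. The helper names `setup_usb_remote` and `push_to_usb`, and the remote name `usb`, are illustrative choices of mine, not Git conventions; only the `git` commands themselves are standard.

```shell
#!/bin/sh
# Sketch: use a bare repository on a USB key as a Git remote, so changes can be
# carried between machines without any server or network.
# "setup_usb_remote", "push_to_usb" and the remote name "usb" are hypothetical names.

# Create a bare repository at $2 (an absolute path, e.g. on the mounted USB key)
# and register it as the remote "usb" of the working repository $1.
setup_usb_remote() {
    mkdir -p "$2" || return 1
    git init --quiet --bare "$2" || return 1
    git -C "$1" remote add usb "$2"
}

# Push all local branches of repository $1 to the copy on the USB key.
push_to_usb() {
    git -C "$1" push --quiet --all usb
}
```

On the other machine, `git clone` pointed at the bare repository on the key (for example `/media/usb/report.git`, an assumed mount point) gives a full copy with history, and later `git pull` and `git push` against it carry changes back and forth. A bare repository is used because it has no working tree, so pushing to it never clashes with files checked out on the key.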

Benefits of using Git

Once you have Git up and running, you can just edit your documents and commit the changes as you go along. Aside from easier collaboration, you get a lot of other major benefits essentially for free by working this way, including:

  • A reliable way to back up your files (via git push to another location).
  • A full history of the changes people have made.
  • The ability to check differences between revisions of a file.
  • A record of who wrote which lines, using tools such as “git blame”.
  • Statistics such as how many lines each person wrote in a file.
  • The ability to integrate scripting tools into your document workflow (for example, you could use a Git hook to build your document and place it in a common directory every time someone pushes).
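The hook idea from the last bullet could look roughly like the sketch below. It assumes a shared bare repository that everyone pushes to, on a machine with LaTeX installed; the helper name `publish_document`, the branch, the output directory, the file names and the build command are all hypothetical examples to adjust. `git archive` and the `post-receive` hook are standard Git features.

```shell
#!/bin/sh
# Sketch of a build-on-push helper for a shared bare repository.
# Hypothetical: the name "publish_document", and the branch, paths and build
# command used in the example at the bottom.

# Export the tip of branch $2 from repository $1 into a scratch directory,
# run build command $5 there, then copy the produced file $4 into directory $3.
publish_document() {
    git_dir=$1
    branch=$2
    out_dir=$3
    artifact=$4
    build_cmd=$5
    build_dir=$(mktemp -d) || return 1
    # git archive streams the branch's files as a tar archive, so the build
    # runs on a clean copy and never touches the repository itself.
    if git --git-dir="$git_dir" archive "$branch" | tar -xf - -C "$build_dir" &&
        (cd "$build_dir" && sh -c "$build_cmd") &&
        mkdir -p "$out_dir" &&
        cp "$build_dir/$artifact" "$out_dir/"
    then
        status=0
    else
        status=1
    fi
    rm -rf "$build_dir"
    return "$status"
}

# Example wiring (hypothetical paths): save this file as /srv/hooks/publish.sh and
# put the lines below in hooks/post-receive of the shared bare repository, made
# executable with chmod +x. Git runs post-receive from inside a bare repository,
# so "." refers to the repository itself.
#
#   #!/bin/sh
#   . /srv/hooks/publish.sh
#   publish_document . master /srv/docs report.pdf \
#       'pdflatex -interaction=nonstopmode report.tex'
```

Passing the build command as a string keeps the helper independent of the toolchain; for documents with cross-references you would likely want something that runs LaTeX as many times as needed rather than a single `pdflatex` pass.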

Note that some of those benefits have nothing to do with collaboration. Even when I’m working on a project by myself, I find Git worth using because it gives me backups and a convenient way to see what changes were made over time.

Here are examples of some of those things, taken from a README Markdown file in a project I worked on. First, the log of commits that changed the file:

git log README.md

This gives the following:

commit d6a32350c3261c00f761eb3e125d025186e04104
Author: Tim <tim@example.com>
Date:   Thu Oct 4 14:22:35 2012 +1000

    Made a simple addition to the readme about git usage

commit b38dcd7b7f2e03248d85d4fe9f6fbb51d44c816a
Author: Janis <janis@example.com>
Date:   Wed Oct 3 16:24:27 2012 -0400

    Created a TODO file to track feature requests and added some more information to the README

commit 1bff657c061d89545fbce9e2cacf4aa99f9f1f73
Author: Janis <janis@example.com>
Date:   Tue Oct 2 21:23:23 2012 -0400

    added a little text into the readme file

commit 40e58627a20f7cd14c06e3db5865e0d995fbedf4
Author: Bob <bob@example.com>
Date:   Tue Oct 2 18:02:07 2012 -0700

    Initial commit
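The log lists which commits touched the file but not what they changed. `git diff` compares any two revisions, and for prose its `--word-diff` option is often easier to read than the default line-based output, because a one-word edit in a long paragraph otherwise flags the whole line. The wrapper name `prose_diff` below is a hypothetical convenience, not a Git command.

```shell
#!/bin/sh
# Sketch: compare two revisions of a document, highlighting changed words
# rather than whole lines. "prose_diff" is a hypothetical helper name.
# Usage: prose_diff <old-revision> <new-revision> <file>
prose_diff() {
    # --word-diff=plain marks deletions as [-text-] and insertions as {+text+}.
    git diff --word-diff=plain "$1" "$2" -- "$3"
}
```

For example, `prose_diff 1bff657 d6a3235 README.md`, run inside that repository, would show what changed in the README between the commits from Janis and Tim in the log above (abbreviated commit hashes work wherever a revision is expected).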

You can also get a quick breakdown of how many lines each author wrote:

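# count the number of lines attributed to each author in file_name
# --line-porcelain prints full commit info, including an "author" line, for every line
git blame --line-porcelain file_name |
# keep only the author names
sed -n 's/^author //p' |
# tally lines per author, largest count first
sort | uniq -c | sort -rn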

For example, running that on README.md gives:

16 Janis
 3 Bob
 2 Tim

This just scratches the surface of the power that writing your documents in a markup language along with version control affords you.

Misc

Not every file generated in the process of making a document should be kept in source control. For LaTeX, I’d generally put the following patterns in the .gitignore file:

*.aux
*.backup
*.dvi
*.log
*.pdf
*.ps
*.tex~
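If you want to confirm that those patterns match what you expect, `git check-ignore` reports whether a path would be ignored and, with `-v`, which rule is responsible. The sketch below writes the patterns into a repository and checks a path; the helper names `add_latex_ignores` and `explain_ignore` are hypothetical.

```shell
#!/bin/sh
# Sketch: write the LaTeX ignore patterns above into a repository and check them.
# "add_latex_ignores" and "explain_ignore" are hypothetical helper names.

# Append the patterns to the .gitignore at the root of repository $1.
add_latex_ignores() {
    cat >> "$1/.gitignore" <<'EOF'
*.aux
*.backup
*.dvi
*.log
*.pdf
*.ps
*.tex~
EOF
}

# Print the matching .gitignore rule for path $2 (relative to repository $1),
# or say that the path is not ignored.
explain_ignore() {
    git -C "$1" check-ignore -v "$2" || echo "not ignored: $2"
}
```

Note that ignore rules only affect untracked files: if a generated file such as a PDF was committed before the pattern was added, Git keeps tracking it until you remove it from the index with `git rm --cached`.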
