Useful latex/svn tools (merge, clean, svn, diff)

This post is about some tools that I have developed (and one more that I have downloaded) which help me streamline my latex work cycle. I make the tools available, hoping that other people will find them useful. However, they are admittedly limited (more about this below) and, as usual for free stuff, they come with zero guarantee. Use them at your own risk.

The first little tool creates a cleaned-up file for submission to a publisher who asks for source files. I call it ltxclean.pl; it is written in Perl. It can be downloaded from here.
Its functionality is
(1) to remove latex comments
(2) to remove \todo{} commands
(3) to merge files included from a main file into the main file
(4) to merge the bbl file into the same main file

If you make the tool executable (chmod a+x ltxclean.pl), you can use it like this:

$ ltxclean.pl main.tex > cleaned.tex

How does this work?

The tool reads in the source tex file, processes it line by line and writes the result to the standard output stream, which you can redirect (as shown above) to a file.
Thus, whatever the tool does is limited to individual lines. This is a limitation, but it made it possible for me to write the tool in probably less time than I am now spending on writing about it.
There are other limitations, see below. Now, how do we know that this worked? My advice is to run latex+dvips on both the original and the cleaned file and then diff original.ps new.ps to see if there is any significant change. On the files I have tried, the only differences were the filename and the date.

Why this functionality and the gory details

As it happens, removing the comments before you submit a source file is crucial. Not long ago, I submitted a source file to a publisher without bothering to remove the comments. At the publisher, they loaded the file into a program which wrapped the long lines, including the ones with comments! This created a lot of garbage in the middle of the text. We were pressed for time, so I could not check the text in detail. The result: the text was printed with a lot of garbage! Too bad!! A painful experience for me. I will never again submit source files with the comments left in! Now, the above utility is meant to handle comments correctly. It takes care not to create empty lines (and thus paragraph breaks) inadvertently, not to remove end-of-line comments, etc.
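To make the comment handling concrete, here is a minimal Python sketch of line-by-line comment stripping. It is not the code of ltxclean.pl (which is in Perl); the function name strip_comment is made up for illustration. The sketch assumes that a line holding only a comment should vanish entirely (so that no empty line appears), and that a comment at the end of a text line should leave a bare % behind, since in latex that % also swallows the line break. It does not know about verbatim environments.

```python
import re

# A '%' starts a comment unless it is escaped as '\%'.
# An even number of backslashes before it ('\\%') is a line break
# followed by a real comment, so group 1 captures those pairs.
_COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%.*")


def strip_comment(line):
    """Return the line without its comment text, or None to drop the line."""
    body = line.rstrip("\n")
    match = _COMMENT.search(body)
    if match is None:
        return line  # no comment on this line: leave it untouched
    kept = body[:match.start()] + match.group(1)
    if kept.strip() == "":
        return None  # comment-only line: drop it rather than leave it blank
    return kept + "%\n"  # keep a bare '%' so the line break stays suppressed
```

A caller would loop over the input lines and print every result that is not None.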

The \todo{} commands belong to the same category: they are better removed before submitting the file. For my todos, I use the todonotes package, which puts todo notes in the margin (or within the text). This package supplies the \todo[XX]{ZZZ} command, where [XX] is optional. The above little script removes such todo commands, but only if they fit on a single line. For now, you would need to remove multi-line todos by hand.
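For illustration, here is a small Python sketch of single-line todo removal. Again, this is not how ltxclean.pl does it; the function names are hypothetical. The sketch counts braces, so a note that itself contains commands with arguments (say \textsf{...} or some math) is removed as a whole. A todo that does not close on the same line is left alone, in line with the limitation described above.

```python
def _match_brace(text, start):
    """Return the index of the '}' closing the '{' at text[start], or -1."""
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


def remove_todos(line):
    r"""Remove every \todo[...]{...} that opens and closes on this line."""
    out = []
    i = 0
    while i < len(line):
        # Match \todo itself, but not longer command names starting with it.
        if line.startswith("\\todo", i) and not line[i + 5:i + 6].isalpha():
            j = i + 5
            if line.startswith("[", j):  # optional [XX] argument
                close = line.find("]", j)
                j = close + 1 if close != -1 else -1
            if j != -1 and line.startswith("{", j):
                end = _match_brace(line, j)
                if end != -1:
                    i = end + 1  # skip the whole command
                    continue
        out.append(line[i])
        i += 1
    return "".join(out)
```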

Another service of this little tool is to merge multiple files into a single one. Oftentimes, we use the latex command \input to break a large source file into multiple files. However, publishers typically want just one file. So this tool reads in the main file and then, recursively, whenever it sees \input{FILE} in the source, it reads in the corresponding file and processes it before it continues with the current file (just like latex would).
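The recursion can be sketched in a few lines of Python. This is only an illustration under some assumptions, not the script's actual logic: the name flatten is made up, each \input{FILE} is assumed to sit on a line of its own, file names are resolved relative to the directory of the main file, and .tex is appended when no extension is given. Comments should be stripped first, otherwise a commented-out \input would be expanded too.

```python
import os
import re

# An \input{FILE} command alone on its line (assumption of this sketch).
_INPUT_LINE = re.compile(r"^\s*\\input\{([^}]+)\}\s*$")


def flatten(path, base_dir=None):
    r"""Return the text of `path` with each \input{FILE} line expanded."""
    if base_dir is None:
        # Nested includes are resolved against the main file's directory.
        base_dir = os.path.dirname(os.path.abspath(path))
    pieces = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            match = _INPUT_LINE.match(line)
            if match is None:
                pieces.append(line)
                continue
            name = match.group(1).strip()
            if not os.path.splitext(name)[1]:
                name += ".tex"
            # Process the included file completely before continuing here.
            included = flatten(os.path.join(base_dir, name), base_dir)
            if included and not included.endswith("\n"):
                included += "\n"
            pieces.append(included)
    return "".join(pieces)
```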

Finally, if the tool finds a \bibliography{...} command, it will take that out and open the .bbl file sharing the same base name as the input to the tool. Thus, if the tool was called on the file main.tex, when seeing a bibliography command, the tool will attempt to open main.bbl and include it in place of the \bibliography command. (If you use hyperref, turn off pagebackref, otherwise this functionality will not work.)
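In Python, the same idea might look as follows. This is a sketch with a made-up function name (inline_bbl), assuming the .bbl file has already been produced by a latex+bibtex run and sits next to the main file. A \bibliographystyle{...} line is left untouched.

```python
import os
import re

# Matches \bibliography{...} but not \bibliographystyle{...},
# because the brace must follow the command name directly.
_BIB = re.compile(r"\\bibliography\{[^}]*\}")


def inline_bbl(tex_source, main_path):
    r"""Replace \bibliography{...} in tex_source by the matching .bbl file."""
    bbl_path = os.path.splitext(main_path)[0] + ".bbl"  # main.tex -> main.bbl
    with open(bbl_path, encoding="utf-8") as handle:
        bbl = handle.read()
    # A function is used as the replacement so that backslashes in the
    # .bbl text are inserted literally instead of being read as escapes.
    return _BIB.sub(lambda _match: bbl, tex_source)
```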

Managing revisions with svn

Two other small utilities that I make available are svnprevdiff and svnreviewchanges.
The purpose of these scripts is to help one review changes to files which are under svn control.
There is a third script, diffmerge, called by the above two scripts. This script takes two file arguments and loads these into the program DiffMerge, which allows you to visually inspect the differences between the two files and make changes to the second one loaded. On a different platform/installation, or if you want to use a different tool for comparing/merging files, you will need to adapt this script.

The utility svnreviewchanges takes a file as an argument, compares it to its base version stored on your disk and opens up the two versions for comparison using diffmerge. The purpose is to allow one to quickly review how a file was changed before committing it to the svn server (so that you can write a meaningful commit message).

The utility svnprevdiff takes a filename as an argument, compares it to its previous version stored on the svn server and then opens up the two versions using diffmerge. The purpose of this is to check the changes implemented by your pals after an update. A future version will take an optional argument which when present will be interpreted as a revision number. Maybe.
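For readers who want to see what the two comparisons amount to, here is a rough Python sketch. It is not a port of the scripts: it prints a textual diff through Python's difflib instead of opening DiffMerge, and the function names are invented. It relies on svn's revision keywords BASE (the pristine copy of the file in your working copy) and PREV (the revision before the last one in which the file changed), fetched with svn cat -r.

```python
import difflib
import subprocess


def svn_cat_command(path, revision):
    """Build the svn command that prints `path` as of `revision`."""
    return ["svn", "cat", "-r", revision, path]


def fetch_revision(path, revision):
    """Return the text of `path` at `revision`, e.g. 'BASE' or 'PREV'."""
    result = subprocess.run(svn_cat_command(path, revision),
                            capture_output=True, text=True, check=True)
    return result.stdout


def diff_text(old, new, old_label, new_label):
    """Return a unified diff of two strings (empty if they are equal)."""
    lines = difflib.unified_diff(old.splitlines(keepends=True),
                                 new.splitlines(keepends=True),
                                 fromfile=old_label, tofile=new_label)
    return "".join(lines)


def review_changes(path):
    """Local, uncommitted edits: BASE versus the working file."""
    with open(path, encoding="utf-8") as handle:
        return diff_text(fetch_revision(path, "BASE"), handle.read(),
                         path + "@BASE", path)


def prev_diff(path):
    """What the last committed change did: PREV versus the working file."""
    with open(path, encoding="utf-8") as handle:
        return diff_text(fetch_revision(path, "PREV"), handle.read(),
                         path + "@PREV", path)
```

Calling print(review_changes("main.tex")) inside a working copy would show the pending changes to main.tex.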

Advice on using latex when working in a team: Break long lines

A small but useful thing is to put every sentence on its own line and generally avoid long lines (even when writing equations). The reason is that this makes the job of diff much easier. And believe me, diffing is something people will end up doing, for good or bad (mostly good), when they are on a team.
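As a toy illustration of the one-sentence-per-line convention, here is a Python sketch that reflows a paragraph. The function name is made up and the sentence detection is deliberately naive: a sentence is assumed to end at '.', '?' or '!' followed by whitespace and then an uppercase letter or a backslash. Abbreviations followed by a capital letter (such as "Fig. A") will be split incorrectly, so treat the output as a starting point only.

```python
import re

# Whitespace that follows sentence-ending punctuation and precedes
# an uppercase letter or the backslash of a latex command.
_SENTENCE_END = re.compile(r"(?<=[.?!])\s+(?=[A-Z\\])")


def one_sentence_per_line(paragraph):
    """Reflow a paragraph so that each sentence sits on its own line."""
    text = " ".join(paragraph.split())  # undo the existing line breaks
    return "\n".join(_SENTENCE_END.split(text)) + "\n"
```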

Some of my friends, like Antoska, would recommend breaking up the individual sentences into multiple lines. You can do this, but if you overdo it, you will find yourself fiddling way too much with what goes into which line.

Finally, a tool which does this, written by Andrew Stacey, is fmtlatex.pl.
It is also written in Perl, and its documentation is displayed if you run perldoc fmtlatex. I have yet to try it.

Comments

Armelius, 2 June 2011 at 19:27
Thanks for posting the ltxclean.pl script. I was looking for exactly something like it to clean up my latex file from comments before submitting it to publisher, and was glad to find your post.
Anonymous, 11 November 2011 at 01:38
Same here, great script!
Anonymous, 26 December 2011 at 16:02
Good work. That helps me a lot. Thank you!
Jörg, 22 February 2012 at 02:57
Thanks for your script. Removing comments and todos works perfectly, but the flatten function does not work with my documents as some tables get messed up. Anyway, removing comments and todos is really useful, and for flattening I use the really old Flatex script. I have integrated your script into a batch file of mine that integrates Subversion and latexdiff (http://www.jwe.cc/2012/02/workflow-with-subversion-and-latex/).
Csaba Szepesvári, 28 February 2012 at 21:07
Hi Jörg,
Can you send me the example that messed up the script? I will try to understand what went wrong.
Cheers,
Csaba
Anonymous, 1 July 2012 at 10:20
I have used ltxclean.pl several times. I think it is great and I thank you for making it available. I have two minor comments that you may wish to think about. First, the latex package comment is not taken account of. I do not know if you did it on purpose, but things between \begin{comment} and \end{comment} are left unchanged (which to me seems a bit strange). Second, merging the bbl file into the main file could be optional. The way things are it is not easy to compare main.pdf to cleaned.pdf (I use diffpdf for this) as some symbols change (for example "and" changes to "&").
All the best.
Catrin Campbell-Moore, 6 March 2015 at 04:07
Thank you very much for this.

I had to run it as:
perl ltxclean.pl main.tex > cleaned.tex
to get it to work.

Unfortunately I found that when I try to use it it doesn't remove the todos properly. The problem is if the todos contain closed brackets. For example

"\todo{This code contains \textsf{commands}... and this isn't cleaned}" cleans to "... and this isn't cleaned}".

This is an issue for me because I often have math-commands in my todos.
