Colored word-based diff

When collaborating on a version-controlled LaTeX document, I want to easily see which words (rather than lines) have been inserted and removed. Here are my notes on how to do it.

diff is the standard utility for displaying differences between text files. It is most useful for source code because it displays lines that differ between files. Line-based diffs are less useful for documents. Fortunately the GNU utility wdiff provides word-based diffs.
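As a rough illustration of what wdiff produces, here is a minimal sketch. The file names and the sentences are made up for the example, and it assumes GNU wdiff is installed.

```shell
# Two versions of one sentence, written to a scratch directory.
tmp=$(mktemp -d)
printf 'The quick brown fox jumps.\n' > "$tmp/old.txt"
printf 'The quick red fox leaps.\n' > "$tmp/new.txt"

# wdiff wraps deleted words in [-...-] and inserted words in {+...+}.
# It exits with status 1 when the files differ, so "|| true" keeps
# scripts that run under "set -e" from stopping here.
wdiff "$tmp/old.txt" "$tmp/new.txt" || true
```

The changed words come out marked as [-brown-] and {+red+}, while unchanged words are printed as they are.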

cwdiff is a wrapper to wdiff that can be used instead of diff to get colored, word-based diffs with context. Version control systems can easily be coerced into using this wrapper instead of diff. Examples: svnwdiff, bzrwdiff.
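The core idea of such a wrapper can be sketched in a few lines. This is not the actual cwdiff script: the function name cwdiff_sketch and the color choices are assumptions for illustration, and unlike cwdiff it prints the whole file rather than trimming the output to context around the changes.

```shell
#!/bin/sh
# cwdiff_sketch OLD NEW: hypothetical, simplified stand-in for cwdiff.
cwdiff_sketch() {
    # Ask the terminal database for escape codes. If tput fails
    # (for example when TERM is unset), fall back to empty strings.
    red=$(tput setaf 1 2>/dev/null || true)
    green=$(tput setaf 2 2>/dev/null || true)
    reset=$(tput sgr0 2>/dev/null || true)

    # -n keeps marked regions from extending across line breaks.
    # -w/-x set the strings around deleted words,
    # -y/-z set the strings around inserted words.
    wdiff -n -w "$red" -x "$reset" -y "$green" -z "$reset" "$1" "$2"
}
```

Called as "cwdiff_sketch old.tex new.tex", this should show deleted words in red and inserted words in green. Like wdiff itself, it returns status 1 when the files differ.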

Git users have an easier life: they can use git diff's built-in --color-words or --word-diff options. In fact, users of all version control systems can use git’s diff tool: gitcwdiff is a shell script wrapper that uses git instead of wdiff to do the work. This wrapper could replace cwdiff in the above wrappers for bzr and svn. It is useful when git is installed but wdiff isn’t, which is more and more likely these days.
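A minimal sketch of the git-based approach follows. It is not the gitcwdiff script itself: the function name gitcwdiff_sketch is hypothetical, and it relies only on git diff's --no-index option, which compares two paths on disk whether or not they are inside a repository.

```shell
#!/bin/sh
# gitcwdiff_sketch OLD NEW: hypothetical stand-in for gitcwdiff.
gitcwdiff_sketch() {
    # --no-index: compare two files directly, no repository needed.
    # --color-words: show changed words using colors only.
    # Any extra arguments are passed straight through to git diff.
    git diff --no-index --color-words "$@"
}
```

Inside a git repository no wrapper is needed. Something like "git diff --color-words HEAD~1 HEAD -- paper.tex" compares the last two commits (the revision names here are just an example), and --word-diff gives the same information with plain-text [-...-] and {+...+} markers instead of colors.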

Viewing output in a pager

If you want to page through colored text in a terminal using less, you’ll need to use the -R option or set LESS=-R in your shell configuration. As an aside, my zsh configuration file contains a global alias for conveniently paging through colorized output:

alias -g L='2>&1 |less -i -R'

To see the differences in a LaTeX file between revisions 1 and 2, I run one of:

bzrwdiff -r1..2 paper.tex L
svnwdiff -r1:2 paper.tex L

and get nicely paging, searchable, colored word-based diffs.
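For the LESS setting mentioned at the start of this section, a minimal sketch of the line to put in a shell configuration file such as ~/.zshrc or ~/.profile is shown below. Note that it replaces any options already stored in LESS; if you have some, add -R to the existing value instead.

```shell
# Make less pass ANSI color escape sequences through to the terminal
# instead of displaying them as visible control characters.
export LESS=-R
```

With this set, piping colored output into plain "less" behaves like "less -R".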

Old version

The cwdiff wrapper above colorizes output by passing wdiff escape codes generated by tput. I used to use colordiff, but its support for colorizing wdiff output is slightly buggy. Fragments like “{+He said that it [the widget] needed work+}” are incorrectly colored, and square brackets appear frequently in LaTeX. I submitted a patch that attempts to improve things, but two years later it hadn’t been incorporated. You can get the resulting fixed colordiff script that I used to use, and the cwdiff.old script that used it.