Subversion is a bit lacking on the merging and branching front in comparison with some of the newer distributed version control systems, but it does make working with large projects easy. The single biggest reason for this is sparse checkouts.

At work we have a source tree that contains artwork, documentation, the source for all the third-party libraries we use, and compiled versions of these for multiple architectures and platforms (32-bit, 64-bit, Windows, RHEL4, RHEL5, etc.). This makes the full checkout rather large, and there are many times when you only need a small fraction of all the files.

The problem with sparse checkouts is that setting one up by hand becomes very laborious for anything beyond a few directories. For more details on the basics see here. As with many tedious tasks, computers can help!
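To give a feel for the manual process, here is a rough Ruby sketch that only builds and prints the svn commands you would otherwise type by hand for a few nested directories. The helper name `manual_sparse_commands`, the working copy name and the directory names are made up for illustration; the `--depth` and `--set-depth` options are standard Subversion (1.5 and later).

```ruby
# Build (but do not run) the svn commands needed to set up a sparse
# working copy by hand. All names here are illustrative assumptions.
def manual_sparse_commands(url, wc, dirs)
  commands = ["svn checkout --depth empty #{url} #{wc}"]
  seen = []
  dirs.each do |dir|
    parts = dir.split("/")
    # Every parent directory has to exist in the working copy first,
    # so pull each one in empty before the directory we actually want.
    parts[0...-1].each_index do |i|
      parent = parts[0..i].join("/")
      next if seen.include?(parent)
      seen << parent
      commands << "svn update --set-depth empty #{wc}/#{parent}"
    end
    seen << dir
    commands << "svn update --set-depth infinity #{wc}/#{dir}"
  end
  commands
end

puts manual_sparse_commands("svn://server/trunk", "trunk",
                            ["libs/linux", "libs/windows", "code/buildtools"])
```

Even this small example produces six commands for three directories, and the count grows with every extra level of nesting.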

I put together a little script to help with this - get it here. I hope it's useful; keep reading for more details on how it works.

In Action

The final version of the script makes doing a sparse checkout as simple as doing a standard checkout:
./checkout.rb svn://server/trunk

To check out using a named subset of files:
./checkout.rb --map documentation svn://server/trunk

To check out using a locally defined subset of files (rather than a subset stored on the server):
./checkout.rb --map local.yaml svn://server/trunk

Map Files

Each map file is a simple YAML file that specifies which directories to check out and to what depth. A map file can include other map files, meaning you can build up a new file map from existing ones.

Checkout maps (code.yaml):

 description: Everything needed to compile and build
 base: build/
         - build/thirdparty/*
         - build/code/buildtools/*
         - build/buildtools/@
         - build/plugins/*
         - build/libs/linux/*
         - build/libs/windows/*

Storing Map Files

In order to provide a centralised, easy to find location for the map files the script looks for a folder called sparse at the top level of a particular branch (e.g. svn://server/trunk/sparse). This folder contains all the sparse checkout file maps e.g.:

 $ svn ls svn://server/trunk/sparse

Storing the map files in Subversion itself is one of the strengths of this system. It allows the file maps to be versioned along with the contents of the repository itself. If the maps were stored outside of Subversion it would be impossible to keep them in sync with changes to the repository layout, something which rapidly became apparent in the first version of the script!

Map File Syntax

:Named map file stored in the sparse folder or a local file
description: Textual description to list when doing --listmaps
base: Directories to strip from the paths when doing the checkout
:List of files to check out, must contain one of the all, linux or windows sections
all: List of folders to check out on all platforms
linux: List of folders to check out on Linux
windows: List of folders to check out on Windows

The @ and * specify what to checkout inside the named folder:

 @ - Checkout only the files in the folder and don't recurse any deeper (--depth=files)
 * - Checkout the folder and everything inside it (--depth=infinity)
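
As a rough illustration of the marker handling (not the code from checkout.rb itself), the following sketch splits a map entry into a path and the matching svn depth. The function name `parse_entry` and the choice to raise on an entry without a marker are assumptions made for this example.

```ruby
# Map the trailing marker of a map-file entry to an svn depth value.
DEPTH_FOR_MARKER = {
  "@" => "files",    # only the files directly inside the folder
  "*" => "infinity"  # the folder and everything below it
}.freeze

# Split an entry such as "build/thirdparty/*" into [path, depth].
# Raising on a missing marker is an assumption of this sketch.
def parse_entry(entry)
  path, _, marker = entry.rpartition("/")
  depth = DEPTH_FOR_MARKER[marker]
  if depth.nil? || path.empty?
    raise ArgumentError, "no @ or * marker in #{entry.inspect}"
  end
  [path, depth]
end

p parse_entry("build/buildtools/@")  # => ["build/buildtools", "files"]
p parse_entry("build/thirdparty/*")  # => ["build/thirdparty", "infinity"]
```

The resulting depth string is the kind of value that svn accepts for `--depth` and `--set-depth`.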

The Script

Finally, the script!

7 thoughts on “Subversion sparse checkout tool”

  1. Hi Mark,

    This checkout.rb script is great! Nicely done. I'm a software developer at a small company that uses SVN version control, and we have many instances in which we use sparse checkouts. This script makes sparse checkouts so much nicer!

    I would like to know if I can create a README file from your explanation of the script. I would like to store it with the script itself, including attribution and a link to your website. I will not redistribute it, but will only keep them together in-house.

    Thanks very much.

  2. I wrote a short comment that I have placed in some of our .yaml files, to explain the use of the "base:" tag. Please let me know what you think. Thanks.

    # The "base" is appended to the repository-name on the command-line
    # of the sparse_checkout.rb script, to form the URL for the
    # checkout. e.g. base "foo" and URL "svn://xyz" would form URL "svn://xyz/foo".
    # The last component of the "base" is also the directory-name
    # of the working copy. e.g. base "foo" would make working copy "foo".
    # e.g. base "foo/bar" would make working copy "bar".

  3. Yeah, that's a good summary of the base property. The main reason for putting it in was the situation of doing a checkout of a set of deeply nested folders.

    Just for reference you can also supply a final command line argument to give the name of the folder to place the working copy into e.g. ./checkout.rb svn://server/trunk mark would do a checkout of trunk into a working copy folder called "mark".

  4. Hi Mark,
    I made some edits to the script that I would like to offer back to you. As we were using the script, it seemed to create intermediate directories that were marked as "depth empty", even if they had subdirectories. My edits change this behavior so that all intermediate directories are marked as "depth immediates" or "depth infinity", as appropriate.

    I would like to offer these changes back to you. I think you have my email address from the private "mail" field of this reply. If you like, you could email me your address, and I will send you the changed script and a "diff" file.

  5. Urr, I forgot to clearly state the motivation for the change. I want to be able to run "svn status" from the top of the working copy, and see the status of all contents of the subdirectories. However, if an intermediate directory is marked as "depth empty", the status command will not traverse into it.
    This change allows "svn status" to traverse the whole tree. The only minor downside is that one extra level of empty leaf directories is created at the edge of the sparse directory hierarchy.

