Photo Management with git-annex and bash

If you are a regular reader, you may have noticed that I tend to feel uncomfortable using one-size-fits-all, everything-plus-coffee suites that take control of my data and try to be helpful. Just as my music player of choice doesn't keep my library, my image viewer does not double as a photo manager, nor can it be used for photo editing.

This article describes the set of tools and scripts I use to manage a collection of pictures in a way that satisfies my requirements.

[1] The use case for this is being able to upload and manage photos while I'm traveling, so that when I'm back home they are already in the collection.




Graph of the repositories.

Rectangles are machines and ovals are repositories: red for bare ones and blue for regular ones. Red lines are git traffic and blue lines are git-annex traffic; the r and rw labels refer to usage, not strictly to permissions.

The photos are stored in a set of git-annex repositories; a bare git repo is kept under gitolite to aid merging but does not store any picture. A copy of every pic is stored in one of two regular repositories on a NAS at home, one for each owner.

The repositories in regular use are those on the PC or the various portable devices (laptop, pandora); those are the ones where pictures are downloaded from the camera, added to the annex and then copied/pushed around. These repositories only carry the photos from the last year or so, or ones I plan to use/watch in the near future.

The repositories on the NAS could be used to import new images from the camera, and they can be shared via NFS or samba, but this is usually not needed, thanks to git-annex.
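As a rough sketch of what "thanks to git-annex" means in practice: older pictures can be fetched from a NAS repository on demand and dropped again when no longer needed. The function names below are made up for illustration, and nas_her is just the remote name that appears in the usage example later in this article.

```shell
# Fetch the content of some pictures from a remote that holds them.
# $1 is the remote name, the remaining arguments are files or directories.
fetch_pics() {
    local remote=$1; shift
    git annex get --from "$remote" "$@"
}

# Drop the local content again. git-annex refuses to do this unless it
# can verify that enough copies exist in other repositories.
release_pics() {
    git annex drop "$@"
}
```

For example, `fetch_pics nas_her 2011` from inside pictures/owner_1 would bring back a whole year, and `release_pics 2011` would free the space afterwards.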


On a repository there is a simple directory structure:

- bin
+ pictures
| + owner_1
  | - ...
  | - 2011
  | + 2012
    | - 20120101
    | - 20120110
    | - ...
| - owner_2
+ tags
| - tag_1
| - tag_2
| - ...

bin includes a few scripts described in the next section.

pictures is where the (git-annex managed links to) pictures are, divided by owner of the camera [2], year and date, with filenames based on the original ones plus the hour when the pic was taken (from EXIF data).

[2] Which is mostly, but not always, the one who took the picture.
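To make the naming scheme for pictures concrete, here is a minimal sketch that builds a destination path from an EXIF-style timestamp and the original file name. The exact format (an "HH_" prefix in front of the original name) and the function name pic_dest are assumptions for illustration; the article does not spell out the real format.

```shell
# Build "YYYYMMDD/HH_<original name>" from an EXIF-style timestamp
# ("YYYY:MM:DD HH:MM:SS") and the original file name.
# Assumption: the hour is used as a prefix; the real scheme may differ.
pic_dest() {
    local stamp=$1 orig=$2
    local day=${stamp%% *}       # date part, e.g. "2012:01:10"
    local time=${stamp#* }       # time part, e.g. "14:32:05"
    day=${day//:/}               # remove colons: "20120110"
    printf '%s/%s_%s\n' "$day" "${time%%:*}" "$(basename "$orig")"
}
```

The timestamp itself could come from a tool such as exiftool (for instance `exiftool -s3 -DateTimeOriginal file.jpg`), which is again an assumption about tooling rather than something stated here.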

tags has a list of subdirectories, one for each tag, with links to the files in pictures with the names edited to include the date (so that they are properly sorted). Right now there is no (simple) way to get a list of images with a combination of tags, but I may write a script for it in the future.
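A script for combining tags does not exist yet, but one possible approach is sketched here. It assumes that the same picture gets the same link name in every tag directory (which the date-prefixed names should guarantee) and that each tag is given only once. The function name is hypothetical.

```shell
# Print the link names that appear in every one of the given tags.
# Usage: pics_with_tags <tags dir> <tag> [<tag> ...]
pics_with_tags() {
    local tagdir=$1; shift
    local tag
    for tag in "$@"; do
        ls -1 "$tagdir/$tag"
    done |
    # Count how often each name shows up; keep those seen in all n tags.
    awk -v n="$#" '{ seen[$0]++ }
                   END { for (f in seen) if (seen[f] == n) print f }' |
    sort
}
```

Something like `pics_with_tags "$PICS_REPOSITORY/tags" tag_1 tag_2` would then list the pictures carrying both tags.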


Besides viewing the pictures with feh (which includes both a slideshow and a thumbnail preview mode) and interacting directly with git-annex to move files around, I have written a couple of scripts for common operations.

The first one simply takes a list of directories, looks for jpg images, copies them to a subdir of the current directory and adds them to the annex.

A typical usage is:

mount /mnt/sd_card
cd $PICS_REPOSITORY/pictures/owner_1/2012
$PICS_REPOSITORY/bin/ /mnt/sd_card/DCIM/*
git commit -m 'A nice photo taking trip'
git annex copy 2012MMDD/* --to nas_her
git push
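The import step could look roughly like the following. This is only a sketch under loud assumptions, not the actual script: the function name import_pics is invented, the file modification time stands in for the EXIF date, the original file names are kept unchanged, and `date -r FILE` relies on GNU coreutils.

```shell
# Copy every jpg found under the given directories into a per-day
# subdirectory of the current directory and add it to the annex.
import_pics() {
    local src file day
    for src in "$@"; do
        find "$src" -type f -iname '*.jpg' -print0
    done |
    while IFS= read -r -d '' file; do
        # Assumption: the modification time approximates the shooting date.
        day=$(date -r "$file" +%Y%m%d)
        mkdir -p "$day"
        cp -p -- "$file" "$day/"
        # Keep git-annex away from the file list arriving on stdin.
        git annex add "$day/$(basename "$file")" < /dev/null
    done
}
```

Called as `import_pics /mnt/sd_card/DCIM/*` from the year directory, it would leave the new files staged and ready for the commit.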

The other script is used to tag images, with the following syntax: "<tag 1> [<tag 2> [...]]" <file 1> [<file 2> [...]]

i.e. the first argument is a list of space-separated tags (with the spaces protected from the shell), followed by a list of files.

If the environment variable $TGALL_TAGDIR is set and points to a directory, it is used as the tags directory; otherwise the script assumes that you are in the same directory as the image and tries to use ../../../../tags.
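The tagging logic described above can be sketched as follows. The function name tag_pics and the "<date>_<name>" link format are assumptions, the date is taken from the name of the directory holding the file, and `realpath -s --relative-to` requires GNU coreutils.

```shell
# Usage: tag_pics "<tag 1> [<tag 2> ...]" <file 1> [<file 2> ...]
tag_pics() {
    local tags=$1; shift
    local tagdir=${TGALL_TAGDIR:-}
    # Fall back to the relative default when the variable is unset
    # or does not point to a directory.
    [ -d "$tagdir" ] || tagdir=../../../../tags
    local -a taglist
    read -r -a taglist <<< "$tags"   # split the first argument on spaces
    local tag file day
    for tag in "${taglist[@]}"; do
        mkdir -p "$tagdir/$tag"
        for file in "$@"; do
            # Assumption: the parent directory name is the date (YYYYMMDD).
            day=$(basename "$(dirname "$(realpath -s "$file")")")
            # -s keeps the git-annex symlink itself as the link target.
            ln -sf "$(realpath -s --relative-to="$tagdir/$tag" "$file")" \
                "$tagdir/$tag/${day}_$(basename "$file")"
        done
    done
}
```

Using relative link targets keeps the tags working when the repository is cloned to a different path on another machine.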

Please note that while I've been using these scripts for some months, and they derive from scripts I started to use years ago, they are still somewhat specific to my use case, and not really ready to be used as-is in a generic environment.

Backup strategy


Full graph of the repositories.

As in the graph above, plus purple lines mean git+git-annex traffic, and gray ovals are git-annex directory special remotes.

Of course, no photo management is complete without a proper [3] backup strategy.

[3] One nearby (for quick restores), one 30 km away, 30 meters above ground, one 30 km away, 30 meters below ground. :)

First of all, the annex repository is set up so that at least two copies of each file are always reachable, by adding * annex.numcopies=2 to .gitattributes. For remote storage, there is a small, low-power computer in the home of a trusted friend, with a repository that keeps a copy of every picture. In the future this may be moved to a friends tahoe-lafs network, using the special git-annex remote.
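The numcopies setting is a single line in .gitattributes at the top of the repository. A small sketch that adds it without duplicating the line on repeated runs (the function name is made up for illustration):

```shell
# Append "* annex.numcopies=N" to .gitattributes in the current
# directory, unless that exact line is already there. N defaults to 2.
ensure_numcopies() {
    local n=${1:-2} line
    line="* annex.numcopies=$n"
    grep -qxF -- "$line" .gitattributes 2>/dev/null ||
        printf '%s\n' "$line" >> .gitattributes
}
```

The file still has to be committed like any other tracked file for the setting to reach the other repositories.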

Additionally, I have created a few directory special git-annex remotes, which get burned on DVD and kept at my parents' home as a last resort.
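Creating such a remote might look like the sketch below. The remote name, the staging directory and the choice of encryption=none are all assumptions made for the example; the article does not say how the real remotes were configured.

```shell
# Create a directory special remote in a staging directory and copy
# the given pictures into it, ready to be burned to a disc.
# Usage: make_dvd_remote <remote name> <staging dir> <path> [<path> ...]
make_dvd_remote() {
    local name=$1 stage=$2; shift 2
    mkdir -p "$stage"
    git annex initremote "$name" type=directory directory="$stage" encryption=none
    git annex copy --to "$name" "$@"
}
```

After something like `make_dvd_remote dvd_2012 /tmp/dvd_2012 pictures/owner_1/2012`, the staging directory can be written to DVD with whatever burning tool is at hand.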


Years ago, when I bought my first digital camera, as soon as I realized that dropping files in random directories would not scale, I tried to use the first (graphical) photo manager I found. While it sort-of-worked, I wasn't fully satisfied with it, and after the 1000th papercut I decided to start organizing my pictures using directories (one per day) and symbolic links to build thematic albums, using a bash script to sort things in their places.

I called this thing tgall (for text gallery) and placed a GPL header on the script, with the idea that maybe one day it would grow into something generic and distributable.

This, stored on my computer with yearly backups on DVD, worked fine until the concept of "my computer" started to become less defined; at this point I moved everything to a NAS, sharing the pictures to other devices via NFS, but it wasn't a very satisfying solution.

Later I found out about git-annex, fell in love with the concept, rewrote the original script to support it and adopted it, with the results described in this article.

