Duplicate Image Algorithms

19 Mar 2006
Tags: VitaminSEE

GQView was my favorite image viewer on Linux, and a high-quality duplicate image finder is the only GQView feature I miss in VitaminSEE. I’ve already started thinking beyond 0.7.2 to the big feature of 0.8: duplicate image search, like in GQView. I wasn’t sure what algorithm GQView used, but I finally stumbled upon this page, which outlines a few image comparison implementations.

GQView’s looks like the simplest algorithm: it subdivides the image into a 32 x 32 grid and takes the average pixel color of each block. The difference between two images is simply the sum of the differences between corresponding blocks, normalized to a value between 0 and 1. The similar.c code in GQView is really that simple. I’m surprised that I got such good results back when I used it. Unless I find something better, I’m guessing that this is going to be the base algorithm, once I figure out how GQView scales it to deal with hundreds (thousands?) of files.

And after 0.7.2 is released, of course.
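
To make the grid comparison described above concrete, here is a rough Python sketch of the idea. It is not a port of similar.c: the function names (signature, difference), the nested-list image representation, and the choice to normalize by the largest possible per-channel difference are all my own assumptions for illustration.

```python
GRID = 32  # blocks per side, matching the 32 x 32 grid described in the post


def signature(pixels, grid=GRID):
    """Return the average (r, g, b) color of each block in a grid x grid layout.

    `pixels` is assumed to be a list of rows, each row a list of
    (r, g, b) tuples with channel values in 0..255.
    """
    height = len(pixels)
    width = len(pixels[0])
    sig = []
    for by in range(grid):
        y0 = by * height // grid
        # Guarantee at least one row per block, even for tiny images.
        y1 = max((by + 1) * height // grid, y0 + 1)
        for bx in range(grid):
            x0 = bx * width // grid
            x1 = max((bx + 1) * width // grid, x0 + 1)
            count = (y1 - y0) * (x1 - x0)
            totals = [0, 0, 0]
            for y in range(y0, y1):
                for x in range(x0, x1):
                    for c in range(3):
                        totals[c] += pixels[y][x][c]
            sig.append(tuple(t / count for t in totals))
    return sig


def difference(sig_a, sig_b):
    """Sum the per-block, per-channel differences, normalized to 0..1.

    0.0 means the two signatures are identical; 1.0 is the largest
    possible difference (e.g. all black versus all white).
    """
    total = 0.0
    for block_a, block_b in zip(sig_a, sig_b):
        total += sum(abs(ca - cb) for ca, cb in zip(block_a, block_b))
    return total / (len(sig_a) * 3 * 255.0)


if __name__ == "__main__":
    black = [[(0, 0, 0)] * 64 for _ in range(64)]
    white = [[(255, 255, 255)] * 64 for _ in range(64)]
    print(difference(signature(black), signature(black)))
    print(difference(signature(black), signature(white)))
```

Since a signature depends on only one image, it could be computed once per file and reused, leaving just the cheap block-by-block comparison to run for each pair. Whether that is how GQView handles large collections is something I still need to check.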