This document presents an unsupervised, analytical workflow for clustering a large collection of forensic images by applying classic clustering to deep feature representations of the images combined with domain-related data.
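The combination of deep image features with domain-related data can be illustrated with a short NumPy sketch. The source says only "classic clustering", so this sketch assumes k-means (plain Lloyd's iterations). It z-scores the image features and the domain features separately, concatenates them, and clusters the result. The functions `fuse_features` and `kmeans`, and the three hypothetical weather columns in the usage example, are illustrative assumptions and are not taken from the authors' code.

```python
import numpy as np


def fuse_features(image_feats: np.ndarray, domain_feats: np.ndarray) -> np.ndarray:
    """Hypothetical helper: z-score each column of both blocks, then concatenate.

    Standardizing keeps large-valued domain columns (e.g. temperature) from
    dominating the Euclidean distances used by k-means.
    """
    def zscore(a: np.ndarray) -> np.ndarray:
        std = a.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero on constant columns
        return (a - a.mean(axis=0)) / std

    return np.hstack([zscore(image_feats), zscore(domain_feats)])


def kmeans(x: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-means (an assumed stand-in for "classic clustering").

    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct randomly chosen rows.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids


# Usage with synthetic data: two groups of 50 "images", each with an
# 8-dim image embedding and 3 hypothetical weather readings.
rng = np.random.default_rng(42)
img = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(5, 1, (50, 8))])
weather = np.vstack([rng.normal(10, 2, (50, 3)), rng.normal(30, 2, (50, 3))])
labels, centroids = kmeans(fuse_features(img, weather), k=2)
```

Standardizing each block before concatenation is one simple way to balance image and weather columns. Weighting the blocks differently is another option, and the source does not specify which approach the authors used.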
The authors address the problem of efficiently curating large collections of forensic images, which could improve the quality of research in many domains. Their main sources of images were forensic anthropology centers and crime scenes; the dataset, collected at the University of Tennessee’s Anthropology Research Center, contains one million images gathered over eight years. Their goal was to develop an unsupervised workflow that groups these images by applying classic clustering to deep image features combined with domain-related data, and they present the workflow along with a discussion of its development and methodology.

Image features come from a model pre-trained on ImageNet, which produces a 2048-length feature vector for each image; principal component analysis (PCA) then reduces each vector to 256 dimensions. Alongside these image representations, the authors built vectors for weather, geographic, and other external data, which together allowed them to cluster a large temporal forensic dataset without supervision. Adding weather features raised clustering precision from 64% for the initial approach without them to 89%.
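The dimensionality-reduction step can be sketched in NumPy under the assumption that PCA is computed via a thin SVD of the mean-centered feature matrix. The source does not name the ImageNet model, so a random matrix stands in for its 2048-length outputs here. The source also does not define how clustering precision was measured. `cluster_purity` shows one common stand-in, the share of images whose ground-truth label matches their cluster's majority label, and it may differ from the authors' metric. Both function names are hypothetical.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int = 256):
    """Project rows of `features` onto their top principal components.

    Returns (reduced, explained_variance_ratio).
    """
    if n_components > min(features.shape):
        raise ValueError("n_components exceeds the number of available components")
    centered = features - features.mean(axis=0)
    # Thin SVD: rows of vt are principal axes, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    var = s ** 2  # proportional to the variance captured by each component
    ratio = var[:n_components].sum() / var.sum()
    return reduced, ratio


def cluster_purity(cluster_ids, true_labels) -> float:
    """Assumed precision proxy: fraction of items matching their cluster's majority label."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    hits = 0
    for c in np.unique(cluster_ids):
        _, counts = np.unique(true_labels[cluster_ids == c], return_counts=True)
        hits += counts.max()
    return hits / len(true_labels)


# Usage: 300 stand-in "image embeddings" of length 2048, reduced to 256.
rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 2048))
reduced, ratio = pca_reduce(feats, 256)
purity = cluster_purity([0, 0, 1, 1], ["a", "a", "a", "b"])  # 3 of 4 match
```

On real embeddings, `ratio` indicates how much of the original variance survives the 2048-to-256 reduction. That figure is worth checking before clustering.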