This report describes work to develop methods that would at least partially automate the curation of a database containing more than one million forensic photographs; the project produced deep learning models designed to group images accurately by body part, to segment body parts, and to achieve accurate results with few manually tagged images.
The authors of this report address the difficulty of using very large photograph collections for research or law enforcement, such as the collection hosted by the Anthropology Research Facility (ARF) in the Forensic Anthropology Center (FAC) at the University of Tennessee, Knoxville (UTK). That collection has grown to more than one million photographs occupying more than 4 TB of disk space. This report describes an effort to develop a big data approach for curating large databases that would allow researchers or law enforcement to find forensically relevant features within the image content. The authors report on the development of methods that would at least partially automate curation. They specifically set out to do the following: (1) train deep learning models on the set of existing tags and use those trained models to tag the remaining one million images; (2) implement new capabilities in the Image Cloud Platform for Use in Tagging and Research on Decomposition (ICPUTRD) that make it possible for experts in human decomposition to evaluate and improve the accuracy of the model-generated tags with minimal effort; and (3) organize the detailed multivariate temporal data on the incidence of features corresponding to the nomenclature terms and on related covariates, such as temperature and humidity exposure, for hundreds of donors, and provide relevant analysis tools and methods. The authors conclude that applying a big data approach to the more than one million images collected at the ARF is likely to dramatically increase the fidelity of the analysis and yield more accurate results. They suggest that the study's results can directly benefit the medicolegal community by providing opportunities to model the decomposition process, account more precisely for more sources of error, and produce known error rates, and they note that the legal value of evidence increases when error rates are quantifiable.