An official website of the United States government, Department of Justice.

NCJRS Virtual Library

The Virtual Library houses over 235,000 criminal justice resources, including all known OJP works.

The Role of Simulated Data in Making the Best Predictions (from the 87th Annual Meeting of the American Association of Physical Anthropologists - 2018)

NCJ Number
253352
Journal
American Journal of Physical Anthropology Volume: 165 Issue: 66 Dated: April 2018 Pages: 195-195
Author(s)
Stephen D. Ousley; George R. Milner; Jesper L. Boldsen; Richard L. Jantz
Date Published
April 2018
Length
1 page
Annotation
This article presents results from two forensic scenarios: predicting sex and ancestry using bone measurements, and predicting age using many osteological traits with a new method (TA3).
Abstract

Machine learning (ML) methods for regression and classification, along with the bootstrap, have revolutionized data analysis through resampling. The resulting simulated data sets are used to select the best-fitting models and to estimate prediction precision and accuracy. These two tasks are especially important in forensic analyses, which call for predictive rather than descriptive data analysis because their methods will be applied to new cases. Naturally, we want to use the methods that are expected to be the most accurate and precise for new cases; however, as the great Zen master Berra noted, "It's tough to make predictions, especially about the future." Predictive methods must therefore incorporate the "Known Unknowns" (Rumsfeld, 2002) and avoid overfitting by analyzing multiple independent training and test samples, each of which should ideally be large. Bootstrap and Monte Carlo methods mimic the sampling variability that would be present in future cases, and both are incorporated into numerous routines for estimating prediction accuracy. No routine is perfect, because of bias and variance issues and because of the nature of the data and the analytical method, and new routines are always being explored. This article reports on a project demonstrating that the consequences of supposed overfitting may be relatively small in classification, and that predicting age using TA3 is far more accurate than using previous methods, even allowing for the underestimated prediction error of those earlier methods. (publisher abstract modified)
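The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure and does not implement TA3. It assumes purely synthetic data (one hypothetical measurement per case, two groups labeled 0 and 1, with group 1 having the larger mean) and a deliberately simple midpoint-threshold classifier. All function names and numeric values are invented for illustration. The sketch compares accuracy measured on the training cases themselves (resubstitution, which tends to be optimistic) with accuracy measured on the cases left out of each bootstrap resample (out-of-bag cases, which stand in for new cases).

```python
import random
import statistics


def make_synthetic(n_per_class=100, seed=1):
    """Hypothetical one-feature data set: (measurement, group label) pairs."""
    rng = random.Random(seed)
    data = [(rng.gauss(40.0, 3.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(46.0, 3.0), 1) for _ in range(n_per_class)]
    return data


def fit_threshold(train):
    """Fit the classifier: the midpoint between the two group means."""
    mean0 = statistics.fmean(x for x, label in train if label == 0)
    mean1 = statistics.fmean(x for x, label in train if label == 1)
    return (mean0 + mean1) / 2.0


def accuracy(threshold, cases):
    """Fraction of cases classified correctly (group 1 assumed above threshold)."""
    hits = sum((x > threshold) == bool(label) for x, label in cases)
    return hits / len(cases)


def bootstrap_oob_accuracy(data, n_boot=200, seed=0):
    """Mean and spread of accuracy on out-of-bag cases across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(n_boot):
        # Draw n case indices with replacement to form the training resample.
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        train = [data[i] for i in idx]
        # Cases never drawn act as an independent test sample.
        test = [data[i] for i in range(n) if i not in chosen]
        # Skip degenerate resamples (no test cases or only one group drawn).
        if not test or len({label for _, label in train}) < 2:
            continue
        scores.append(accuracy(fit_threshold(train), test))
    return statistics.fmean(scores), statistics.stdev(scores)


data = make_synthetic()
resub = accuracy(fit_threshold(data), data)
oob_mean, oob_sd = bootstrap_oob_accuracy(data)
print(f"resubstitution accuracy: {resub:.3f}")
print(f"out-of-bag accuracy: {oob_mean:.3f} (SD {oob_sd:.3f})")
```

The spread of the out-of-bag scores gives a rough sense of how much accuracy might vary across future samples. With a model this simple, the gap between the two estimates is expected to be small; more flexible models would typically show a larger gap.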