An official website of the United States government, Department of Justice.

NCJRS Virtual Library

The Virtual Library houses over 235,000 criminal justice resources, including all known OJP works.

Basecalling Using Hidden Markov Models

NCJ Number
309117
Author(s)
Petros Boufounos; Sameh El-Difrawy; Dan Ehrlich
Date Published
March 2004
Length
14 pages
Annotation

The authors assess the suitability and potential benefits of hidden Markov models as a basecalling tool and present a very simple model that matches the overall performance of PHRED in a preliminary evaluation. They discuss in detail the motivation and theoretical background for their research, model selection for DNA sequencing, the generation of training data, model training, and model implementation and results.

Abstract

The authors propose hidden Markov models (HMMs) for modeling electropherograms from DNA sequencing equipment and performing basecalling. They model the state emission densities using artificial neural networks and modify the Baum–Welch re-estimation procedure to perform training. They also develop a method that exploits consensus sequences to label training data, thus minimizing the need for hand labeling, and propose the same method for locating an electropherogram in a longer DNA sequence. In addition, they perform a careful study of the basecalling errors and propose alternative HMM topologies that might further improve performance. Their results demonstrate the potential of these models, and the authors conclude by suggesting further research directions based on them. (Published Index Provided)