The authors discuss the suitability and potential benefits of using hidden Markov models as a basecalling tool, presenting a very simple model that matches the overall performance of PHRED in a preliminary evaluation. They also discuss in detail the motivation and theoretical background for their research, model selection for DNA sequencing, the generation of training data, model training, and model implementation and results.
The authors propose hidden Markov models (HMMs) to model electropherograms from DNA sequencing equipment and to perform basecalling. They model the state emission densities using artificial neural networks and modify the Baum–Welch re-estimation procedure to perform training. They also develop a method that exploits consensus sequences to label training data, thus minimizing the need for hand labeling, and propose the same method for locating an electropherogram in a longer DNA sequence. A careful study of the basecalling errors leads them to propose alternative HMM topologies that might further improve performance. Their results demonstrate the potential of these models, and the authors conclude by suggesting further research directions. (Published Index Provided)
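To illustrate the general idea of decoding bases with an HMM, the following is a minimal Python sketch of Viterbi decoding over a toy model. It is not the authors' model: the four states, the discrete "dominant channel" observations, and every probability below are assumptions chosen for illustration, whereas the paper uses neural-network emission densities and a modified Baum–Welch procedure for training. The function names `viterbi` and `call_bases` are hypothetical.

```python
import math
from itertools import groupby

STATES = ["A", "C", "G", "T"]
CHANNELS = ["a", "c", "g", "t"]  # assumed observation: strongest trace channel per sample

# Assumed toy parameters (not taken from the paper), stored as log-probabilities.
LOG_START = {s: math.log(0.25) for s in STATES}
LOG_TRANS = {p: {s: math.log(0.7 if p == s else 0.1) for s in STATES} for p in STATES}
LOG_EMIT = {s: {o: math.log(0.85 if o == s.lower() else 0.05) for o in CHANNELS}
            for s in STATES}


def viterbi(obs, states=STATES, log_start=LOG_START,
            log_trans=LOG_TRANS, log_emit=LOG_EMIT):
    """Return the most likely hidden state path for a non-empty observation list."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s]: best predecessor of state s at time t + 1
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def call_bases(path):
    """Collapse runs of identical states into single base calls."""
    return "".join(state for state, _ in groupby(path))


if __name__ == "__main__":
    clean = list("aaacccggt")
    noisy = list("aaacaaa")  # one spurious 'c' sample inside an 'a' peak
    print(call_bases(viterbi(clean)))
    print(call_bases(viterbi(noisy)))
```

With these assumed parameters, the self-transition probability makes the decoder ignore an isolated spurious sample rather than emit an extra base. Collapsing runs of identical states is a simplification that cannot represent repeated bases; a realistic topology would need distinct states per peak, which is the kind of design question the abstract's "alternative HMM topologies" refers to.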