This paper discusses language-based techniques to authenticate documents involved in legal proceedings.
Documents composed on a computer, printed over networks, faxed over telephone lines, or simply stored in electronic memory cannot be identified by traditional handwriting or typewriting analysis. In the case of networked printers--to which thousands of potential users have access--even ink, paper, and printer identification cannot narrow the range of suspects or produce a single identification. Reviewing cases dating back as far as 1901, the paper examines whether the language of a document--grammar, spelling, punctuation, style, etc.--can be used to link document and author. The paper includes a summary of court decisions regarding language-based evidence, the conceptions of language use underlying the techniques, and empirical testing of language-based identification techniques. Techniques relying on common misconceptions about language were unreliable. However, techniques relying on linguistic science appeared to accurately cluster and discriminate documents. Tables, figures, bibliography, appendix