This paper addresses some of the problems stemming from unreliable and inaccurate law enforcement data. Although the proposed ideas use “name data” to illustrate how to deal with “dirty data,” the approaches will be extended to other types of dirty data in law enforcement databases, such as addresses and the names, descriptions, and brand names of stolen items or articles.
In every database system that collects and stores data, input errors occur and leave data integrity less than perfect; this is commonly referred to as the “dirty data” problem. American investigators are unfamiliar with many foreign names, such as Zacarias Moussaoui. If the first or last name is misspelled in a query, the person's record could be missed. Chronic offenders and individuals attempting to evade detection use aliases: Moussaoui is also known as Shaqil and as Abu Khalid al Sahrawi. Unless smart analytical tools are available for effective name matching in the presence of poor data integrity, difficult name spellings, and deliberate obfuscation, the likelihood of missing a critical record is high. This paper addresses some of the problems stemming from unreliable and inaccurate law enforcement data. Although the proposed ideas use “name data” to illustrate how to deal with dirty data, the approaches will be extended to other types of dirty data in law enforcement databases, such as addresses and the names, descriptions, and brand names of stolen items or articles. (Published abstract provided)
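The abstract does not say which matching technique the paper proposes. Purely as an illustration of why exact-match queries miss misspelled names and aliases, the sketch below swaps in a common approach, Levenshtein edit distance, applied to each record's primary name and its known aliases. The record layout, the `find_matches` helper, and the distance threshold of 2 are hypothetical assumptions, not the author's method.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


# Hypothetical records: each person ID maps to a primary name plus known aliases.
RECORDS = {
    "P001": ["Zacarias Moussaoui", "Shaqil", "Abu Khalid al Sahrawi"],
    "P002": ["John Smith"],
}


def find_matches(query: str, records: dict, max_distance: int = 2) -> list:
    """Return (record_id, best_distance) for every record whose name or any
    alias is within max_distance edits of the query, closest first.
    max_distance=2 is an illustrative, hypothetical threshold."""
    q = normalize(query)
    hits = []
    for record_id, names in records.items():
        best = min(levenshtein(q, normalize(n)) for n in names)
        if best <= max_distance:
            hits.append((record_id, best))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    print(find_matches("Zacarias Moussaui", RECORDS))  # misspelled surname
    print(find_matches("Shakil", RECORDS))             # misspelled alias
```

A query for “Zacarias Moussaui” (one letter dropped) or “Shakil” still reaches record `P001`, where an exact-match lookup would return nothing. Because this sketch compares whole strings, it would not catch a swapped first/last name order; a fuller tool might also compare individual name tokens or use phonetic encodings.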