#11509. Dude, Where’s My Treatment Effect? Errors in Administrative Data Linking and the Destruction of Statistical Power in Randomized Experiments
Publication date: August 2026
Proposal available until: 14-05-2025
Total number of authors per manuscript: 4 (0 $)
The journal title is available only to authors who have already paid.
Journal’s subject area:
Law; Pathology and Forensic Medicine
Places in the authors’ list:
1 place - free (for sale)
2 place - free (for sale)
3 place - free (for sale)
4 place - free (for sale)
Abstract:
The increasing availability of large administrative datasets has led to an exciting innovation in criminal justice research: using administrative data to measure experimental outcomes in lieu of costly primary data collection. To minimize mistaken linkages, researchers often use stringent linking rules like “exact matching” to ensure that speculative matches do not lead to errors in an analytic dataset. We show that this seemingly conservative approach leads to underpowered experiments, leaves real treatment effects undetected, and can therefore have profound implications for entire experimental literatures. We derive an analytic result for the consequences of linking errors on statistical power and show how the problem varies across combinations of relevant inputs, including linking error rate, outcome density and sample size. In contrast to exact matching, machine learning-based probabilistic matching algorithms allow researchers to recover a considerable share of the statistical power lost under stringent data-linking rules. Failure to implement linking procedures designed to reduce linking errors can have dire consequences for subsequent analyses and, more broadly, for the viability of this type of experimental research.
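The relationship the abstract describes between linking error rate, outcome density, sample size and power can be illustrated with a rough sketch. This is not the authors' analytic result. It assumes a binary outcome, a two-sided two-proportion z-test under the normal approximation, and linking errors that are purely missed matches (false negatives) occurring at the same rate in both arms, with every unlinked record coded as "no event". The function names and parameters (`power_two_proportions`, `power_with_missed_links`, `miss_rate`) are hypothetical and chosen for illustration only.

```python
from statistics import NormalDist


def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with unpooled variance; this is an
    illustrative simplification, not the paper's derivation.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    variance = (p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm
    if variance == 0:
        return 0.0
    z_effect = abs(p_control - p_treat) / variance ** 0.5
    # Probability that the test statistic falls in either rejection region.
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)


def power_with_missed_links(p_control, p_treat, n_per_arm, miss_rate, alpha=0.05):
    """Power when a share `miss_rate` of true outcome events fails to link.

    Assumption: missed links are independent of treatment arm and are
    recorded as "no event", so each observed outcome rate is scaled by
    (1 - miss_rate) and the observed effect shrinks by the same factor.
    """
    keep = 1 - miss_rate
    return power_two_proportions(keep * p_control, keep * p_treat, n_per_arm, alpha)


if __name__ == "__main__":
    # Hypothetical inputs: 30% vs 25% outcome rates, 1,000 subjects per arm.
    for miss_rate in (0.0, 0.1, 0.2, 0.3, 0.4):
        power = power_with_missed_links(0.30, 0.25, 1000, miss_rate)
        print(f"miss rate {miss_rate:.0%}: approximate power {power:.3f}")
```

Under these assumptions, raising `miss_rate` attenuates the observed difference in rates faster than it reduces the standard error, so the approximate power falls as more true matches are missed. Varying `p_control`, `p_treat` and `n_per_arm` shows how the loss depends on outcome density and sample size.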
Keywords:
Administrative data; Machine learning; Randomized experiments; Record linking
Contacts: