#12548. A Quadrilogy for (Big) Data Reliabilities
Publication date: July 2026
Proposal available till: 29-05-2025
Total number of authors per manuscript: 4 (0 $)
The title of the journal is available only for the authors who have already paid for
Journal’s subject area: |
Communication; |
Places in the authors’ list:
1 place - free (for sale)
2 place - free (for sale)
3 place - free (for sale)
4 place - free (for sale)
Abstract:
This paper responds to the challenge of testing the reliabilities of very big data and proposes a quadrilogy of measures of data reliability. These measures grew out of the recognition that crowd-coded data contest the conviction of big data scientists that the social contexts and meanings of data become irrelevant in the face of their sheer volume. Bigness has also challenged the available inter-coder agreement coefficients and software, which either restrict the forms of data they accept or exceed computational limits when data become very large. In the course of tailoring Krippendorff’s alpha to very large data, the possibility emerged of dividing the concept of reliability into four separate kinds, each serving a different methodological aim in social research. They respectively assess the replicability of the process of generating data, the accuracy of generating data, the surrogacy of proposed theories, coders, formulas, or algorithms that are to substitute for human coders, and the decisiveness among several human judgements. Their mathematical relationships assure comparability. The paper develops this quadrilogy of agreement measures first for binary data, provides a link to software for computing it, and then extends it to nominal data, a first step towards further generalizations. It also proposes a computational path for estimating the confidence limits of each measure and the probability of accepting data as reliable when there is a chance that their reliability falls below a tolerable level.
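For orientation only, the following is a minimal Python sketch of the standard Krippendorff’s alpha for nominal data, computed from a coincidence matrix. It is not the paper’s quadrilogy, its large-data tailoring, or its software; the function name `krippendorff_alpha_nominal` and the input layout (one list of labels per coded unit, with missing values simply omitted) are assumptions made for this illustration.

```python
from collections import Counter


def krippendorff_alpha_nominal(units):
    """Standard Krippendorff's alpha for nominal data (illustrative sketch).

    `units` is an iterable of label lists, one list per coded unit;
    this input layout is an assumption of the sketch.
    """
    coincidences = Counter()  # (label_a, label_b) -> weighted pair count
    for values in units:
        counts = Counter(values)
        m = sum(counts.values())
        if m < 2:
            continue  # a unit coded only once offers no pairable values
        for a, count_a in counts.items():
            for b, count_b in counts.items():
                # Ordered pairs of distinct codings within this unit.
                pairs = count_a * (count_a - 1) if a == b else count_a * count_b
                if pairs:
                    coincidences[(a, b)] += pairs / (m - 1)

    # Marginal totals n_c and the grand total n of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable values")

    # Observed and expected disagreement, both scaled by n; the common
    # factor cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected
```

With two coders who agree on two of four binary units, `krippendorff_alpha_nominal([[0, 0], [1, 1], [0, 1], [1, 0]])` gives 0.125, and perfect agreement across varied labels gives 1.0. Aggregating label counts per unit, rather than enumerating every pair of codings, keeps the work per unit proportional to the square of the number of distinct labels, which matters when many coders judge each unit.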
Keywords:
Crowdsourcing; Data reliability; Data science; Krippendorff’s alpha; Replicability; Accuracy; Surrogacy; Decisiveness; Coincidences; Contingencies; Reliability benchmarks
Contacts: