The Sociology Place

Local Data and Upstream Reporting as Sources of Error in the Covid 19 Undercount

Estimates of the growth, stability, and decline in Covid 19 cases and deaths are generally based on administrative data that various people and organizations collect, harmonize, and aggregate.

Many data providers have made their data available in machine-readable form for public scrutiny, with the intention of guiding policy-makers and educating the public. These data are the empirical foundation for the decisions of individuals, institutions, and policy-makers.

The pandemic shows why it is important to reveal the errors in what has been, from the beginning, assuredly an undercount of cases and deaths.

This post is based on the article:

Dubrow, Joshua K. 2021. “Local Data and Upstream Reporting as Sources of Error in the Administrative Data Undercount of Covid 19.” International Journal of Social Research Methodology. DOI: 10.1080/13645579.2021.1909337

See also: COVID-19 Counts of Cases and Deaths Reveal Social Group Biases

Importance of Local Data

Data producers have long asked data users to consider where the data come from. A pernicious source of error lies at the most local level, where the data are collected and then reported upstream to research designers, aggregators, and disseminators.

As locally sourced data are reported upstream, errors may emerge at any point in the data value chain. The laws and norms governing data collection, organization, analysis, storage, and use vary across nations and across levels of government.

Perversely, a politicized bureaucracy and incentives to misreport become additional sources of systematic error in administrative data. Higher-level authorities of the data infrastructure may initiate or contribute to these errors; they ignore the local context at their peril.

Sources of Error

Collecting Covid 19 counts is difficult.

At root, aggregating organizations depend on information provided by a variety of unequally resourced local and national data collectors. These collectors, in turn, receive it from hospitals, labs, and other health organizations and medical authorities, which depend on the professionals within them to report Covid 19 cases and deaths.

The upstream reporting process varies by nation, and detailed descriptions of it in English are rare.

These difficulties are potential and interrelated sources of error.

Imagine the impact on standardization of local reporting agencies (the tens of thousands of hospitals operating amid unequal levels of economic development). Standardization requires that data from a variety of sources and at different levels of aggregation be similar enough for comparison across nations, within nations, and over time. Different reporting standards and methods beget judgment calls from both the collector and the aggregator.

Because the data are tied to local sources, the quality of data reporting will vary, and some discrepancies may stem from national government interference.

Government interference in data reporting can be stark or subtle. In such situations, it is not clear whether the data suffer from collection problems or whether they are presented in ways that arouse suspicion.

Much overlooked are the conditions of the work environment in which these data are produced, including the structural problems imposed on the people who labor to produce them. Worldwide Covid 19 data collection and reporting began amid a novel pandemic, and this massive process was not properly standardized within or between nations.

As a result, the people collecting the data, and the systems in which they work, have been unusually stressed.

Upstream reporting of Covid 19 cases and deaths depends on armies of white-collar workers whose job is to fill out the Covid 19 reports from hospitals, labs, and other health agencies. They, in turn, depend on lab workers who take the samples and deliver the results. These essential lab and data workers enjoy job security (for now) but are vulnerable to mental and physical anguish.

The pandemic demanded rapid standardization across every part of a reporting system whose infrastructure could not handle the load, and that demand for speed can undermine the production of timely and accurate data.

The sudden and voluminous data demands of Covid 19 shocked each nation’s multilevel data infrastructures.

As nations discovered that the health data produced by local sources are vital to national security, some pursued a top-down policy to standardize data reporting.

Local sources, severely stressed and unequally resourced, felt pressure to meet both the data needs at their own level and the national level's demands for speedy standardization.

Errors may ensue.

Joshua K. Dubrow holds a PhD from The Ohio State University and is a Professor of Sociology at the Polish Academy of Sciences.
