Authors

  1. Martin, Erika G. PhD, MPH
  2. Law, Jennie MPA
  3. Ran, Weijia MPhil
  4. Helbig, Natalie PhD, MPA
  5. Birkhead, Guthrie S. MD, MPH

Abstract

Context: Government datasets are newly available on open data platforms that are publicly accessible, offered in nonproprietary formats, free of charge, and released with unlimited use and distribution rights. They provide opportunities for health research, but their quality and usability are unknown.

Objective: To describe available open health data, identify whether data are presented in a way that is aligned with best practices and usable for researchers, and examine differences across platforms.

Design: Two reviewers systematically reviewed a random sample of data offerings on NYC OpenData (New York City, all offerings, n = 37), Health Data NY (New York State, 25% sample, n = 71), and http://HealthData.gov (US Department of Health and Human Services, 5% sample, n = 75), using a standard coding guide.

Setting: Three open health data platforms at the federal, New York State, and New York City levels.

Main Outcome Measures: Data characteristics from the coding guide were aggregated into summary indices for intrinsic data quality, contextual data quality, adherence to the Dublin Core metadata standards, and the 5-star open data deployment scheme.

Results: One quarter of the offerings were structured datasets; other presentation styles included charts (14.7%), documents describing data (12.0%), maps (10.9%), and query tools (7.7%). Health Data NY had higher intrinsic data quality (P < .001), contextual data quality (P < .001), and Dublin Core metadata standards adherence (P < .001) than the other platforms. All met basic "web availability" open data standards; fewer met higher standards of "hyperlinked to other data."

Conclusions: Although all platforms need improvement, they already provide readily available data for health research. Sustained effort on improving open data websites and metadata is necessary for ensuring researchers use these data, thereby increasing their research value.