An introduction to the BEEHIVE EHR Datasets

MIMIC-IV: http://mimic-iv.mit.edu/

Semi-private dataset from the Beth Israel Deaconess Medical Center in Boston, Massachusetts.

To get access, you need to fill out the PhysioNet credentialing form and finish the human subjects certification.

This is a relational database: everything is indexed by a subject ID, a hospital admission ID, or a stay ID.
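
As a rough illustration of what "indexed by ID" means in practice, here is a minimal pandas sketch that joins three toy tables on those IDs. The column names (subject_id, hadm_id, stay_id) follow the usual MIMIC-IV naming, but the table contents below are made up, and you should check the names against your own copy of the data.

```python
import pandas as pd

# Toy stand-ins for three MIMIC-IV tables. All values are invented.
patients = pd.DataFrame({"subject_id": [1, 2], "gender": ["F", "M"]})
admissions = pd.DataFrame({
    "subject_id": [1, 1, 2],
    "hadm_id": [100, 101, 200],
    "admission_type": ["URGENT", "ELECTIVE", "URGENT"],
})
icustays = pd.DataFrame({
    "subject_id": [1, 2],
    "hadm_id": [101, 200],
    "stay_id": [9001, 9002],
    "los": [2.5, 4.0],  # length of stay in days
})

# One row per ICU stay, with admission-level and patient-level fields attached.
stays = (
    icustays
    .merge(admissions, on=["subject_id", "hadm_id"], how="left")
    .merge(patients, on="subject_id", how="left")
)
print(stays)
```

A patient (subject ID) can have several hospital admissions, and an admission can have several stays, so joining from the most specific table outward keeps one row per stay.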

It’s difficult to apply ML methods directly to MIMIC-IV. COP-E-CAT addresses this: you can use it to preprocess and structure your version of the dataset into a tabular format.
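
To give a sense of what "tabular format" means here, the sketch below turns a long list of timestamped measurements into one row per stay and hour. This is not COP-E-CAT's API, just a generic pandas illustration of the idea. The event table, its column names, and the one-hour window are all assumptions made up for the example.

```python
import pandas as pd

# Toy long-format event table: one row per measurement. All values are invented.
events = pd.DataFrame({
    "stay_id": [9001, 9001, 9001, 9002],
    "charttime": pd.to_datetime([
        "2020-01-01 00:10", "2020-01-01 00:40",
        "2020-01-01 01:05", "2020-01-01 00:20",
    ]),
    "label": ["heart_rate", "heart_rate", "sbp", "heart_rate"],
    "value": [80.0, 90.0, 120.0, 70.0],
})

# Assign each measurement to the hour it falls in (assumed window size).
events["hour"] = events["charttime"].dt.floor("60min")

# One row per (stay, hour), one column per measurement type,
# averaging repeated measurements within the same window.
table = events.pivot_table(
    index=["stay_id", "hour"],
    columns="label",
    values="value",
    aggfunc="mean",
)
print(table)
```

Windows with no measurement of a given type come out as NaN, so some imputation step is usually needed before the table can go into an ML model.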

A copy of the dataset is hosted on della, but please get access yourself (i.e. pass the human subjects certification) before you use it.

Penn Covid-19

Private dataset from the University of Pennsylvania Hospital Medical Center. Contains about 8000 Covid-19 hospitalizations. 

The dataset is located at /tigress/BEE/penn-covidsub/10262021_cohort/ and is formatted just like a regular EHR dataset (i.e. a relational database with information on medications, labs, vitals, and metadata).
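
A quick way to see which tables are in the directory is to list its files. The sketch below is one way to do that. The helper name list_dataset_files is hypothetical, nothing is assumed about the file names or formats inside the directory, and the path is only readable from the cluster.

```python
from pathlib import Path


def list_dataset_files(root):
    """Return the relative paths of all files under root, sorted."""
    root = Path(root)
    return sorted(
        str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()
    )


# Path taken from the text above; skipped when run off the cluster.
cohort_dir = Path("/tigress/BEE/penn-covidsub/10262021_cohort/")
if cohort_dir.is_dir():
    for name in list_dataset_files(cohort_dir):
        print(name)
```

The same helper works for any of the other dataset directories by passing a different root.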

This dataset is anonymized. 

Penn Ventilation

Private dataset from the University of Pennsylvania Hospital Medical Center that contains patients who used a ventilator. 

The dataset is located at /tigress/BEE/penn-ventPts and is also formatted just like a regular EHR dataset. 

This dataset is anonymized. 

Upcoming datasets: UCSF clinical notes dataset, UCSF Covid-19 dataset, UK Biobank