Study Datasets

The datasets in a study repository come in three different types:

  • Demographic. Demographic datasets record permanent characteristics of the participants which are collected only once for a study. Characteristics like birth gender, birth date, and enrollment date will not change over time. (From a database point of view, demographic datasets have one primary key, the participantId. Demographic datasets contain up to one row of data for each participant.)
  • Clinical. Clinical datasets record participant characteristics that vary over time in the study, such as physical exam data and simple lab test data. Typical data includes weight, blood pressure, or lymphocyte counts. This data is collected at multiple times over the course of the study. (From a database point of view, clinical datasets have a two-part key: the participantId and a time point. Clinical datasets may contain up to one row of data per participant/time point pair.)
  • Assay/Specimen. These datasets record the assay and specimen data in the study. Not only is this data typically collected repeatedly over time, but more than one of each per time point is possible, if, for example, multiple vials of blood are tested. (From a database point of view, assay/specimen datasets have the same keys as clinical datasets, plus an optional third key. Multiple rows per participant/time point pair are allowed.)

In this step, we will import the Demographic and Clinical datasets into the study.
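
As a rough illustration of the key rules for the three dataset types listed above, the following Python sketch checks a list of rows for duplicate keys. It is not LabKey code: the function name find_key_violations, the "VialId" column standing in for the optional third key, and the sample rows are all hypothetical.

```python
from collections import Counter

# Key columns per dataset type (illustrative names, not LabKey's schema).
KEY_COLUMNS = {
    "demographic": ("ParticipantId",),
    "clinical": ("ParticipantId", "Date"),
    # Assay/specimen data adds an optional third key, here called "VialId".
    "assay": ("ParticipantId", "Date", "VialId"),
}


def find_key_violations(rows, dataset_type):
    """Return the key tuples that appear in more than one row."""
    columns = KEY_COLUMNS[dataset_type]
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample rows: participant P1 was measured on two dates.
rows = [
    {"ParticipantId": "P1", "Date": "2008-01-01", "VialId": "A", "Result": 70},
    {"ParticipantId": "P1", "Date": "2008-02-01", "VialId": "A", "Result": 71},
    {"ParticipantId": "P2", "Date": "2008-01-01", "VialId": "A", "Result": 80},
]

demographic_violations = find_key_violations(rows, "demographic")
clinical_violations = find_key_violations(rows, "clinical")
print(demographic_violations)  # P1 has two rows, so it breaks the demographic rule
print(clinical_violations)     # every participant/date pair is unique
```

The same rows are acceptable as a clinical dataset but not as a demographic one, because P1 appears on two different dates.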

One simple way to create a new dataset is by importing an Excel file containing the data. The column names and types will be inferred from the file and may be adjusted as needed.
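
To give a feel for what inferring column names and types from a file involves, here is a minimal Python sketch. It is an assumption-laden stand-in, not the server's actual inference logic: it reads comma-separated text rather than an Excel file (the standard library cannot read .xls), it only recognizes dates written as YYYY-MM-DD, and the names infer_type and infer_columns are hypothetical.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Guess the narrowest type that fits every non-empty value."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "text"
    candidates = (
        ("integer", int),
        ("number", float),
        ("date", lambda v: datetime.strptime(v, "%Y-%m-%d")),
    )
    for name, parse in candidates:
        try:
            for v in non_empty:
                parse(v)
            return name
        except ValueError:
            continue  # at least one value did not fit; try the next type
    return "text"


def infer_columns(text):
    """Map each column name in delimited text to an inferred type."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return {}
    return {
        name: infer_type([(row[name] or "") for row in rows])
        for name in rows[0]
    }


# Hypothetical sample data; the header row supplies the column names.
sample = (
    "ParticipantId,Date,Weight,Pulse\n"
    "P1,2008-01-01,70.5,64\n"
    "P2,2008-01-02,80,72\n"
)
inferred = infer_columns(sample)
print(inferred)
```

A guess like this can be wrong (for example, an ID column that happens to contain only digits would be typed as an integer), which is why the import screen lets you adjust the inferred types.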

Create one or more Demographic Datasets

Each study needs at least one demographic dataset identifying the participants in the study. Our example data files include two: Demographics and Consent.

  • Click the Manage tab.
  • Click Manage Datasets.
  • Click Create New Dataset.
  • On the Define Dataset screen:
    • Short Dataset Name: Enter "Demographics"
    • Leave the "Define Dataset Id Automatically" box checked.
    • Select the Import from File checkbox.
  • Click Next.
  • Click Choose File.
  • Browse to the sample directory you unzipped and select the file: [LabKeyDemoFiles]\Datasets\Demographics.xls.
  • You will see a preview of the imported dataset. Notice that the sample files we provide already have columns that are mapped to the required server columns "ParticipantId" and "Visit Date". When importing your own datasets, you may need to set these pulldowns explicitly, since they establish the dataset keys.
  • Review the field names and data types and click Import.
  • The imported dataset is then displayed in a grid.

You have created your first dataset, and can see the ParticipantID and Date columns that will be used to integrate other information about these participants. Next, explicitly mark this dataset as demographic, since there will only be one row for each participant in the study:

  • Click Manage in the link bar above the grid to manage this dataset.
  • Click Edit Definition.
  • Check the Demographic Data checkbox. (This indicates that the dataset is collected only once for this participant and applies for all time.)
  • Click Save.

Import Clinical Datasets

The other .xls files provided in the sample datasets folder contain clinical data. Each time a new test or exam is performed on the participant, a new row of data is generated for that date. There will be multiple rows per participant, but only one row per participant and date combination.

  • HIV Test Results.xls
  • Lab Results.xls
  • Physical Exam.xls

To import this data, repeat the following steps for each of the three XLS files.

  • Click the Manage tab.
  • Click Manage Datasets.
  • Click Create New Dataset.
  • On the Define Dataset screen:
    • Short Dataset Name: Enter the name of the XLS file being imported (without the file extension).
    • Select the checkbox Import from File.
  • Click Next.
  • Browse to the file, select it, and ensure that all fields are being imported properly.
  • Click Import.

Clinical datasets have two keys: a participantID and a date. When you imported the dataset, you could see which columns would be used as those keys. There is no need to make any explicit changes to the dataset definitions at this time.
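
Because every dataset carries the participant ID, demographic information can be lined up with clinical rows for the same participant. The Python sketch below shows the idea with plain dictionaries. It does not use any LabKey API, and the function name join_on_participant, the "Gender" and "Weight" columns, and all values are hypothetical.

```python
# Hypothetical demographic rows: one per participant.
demographics = [
    {"ParticipantId": "P1", "Gender": "F"},
    {"ParticipantId": "P2", "Gender": "M"},
]

# Hypothetical clinical rows: one per participant and date.
physical_exam = [
    {"ParticipantId": "P1", "Date": "2008-01-01", "Weight": 70},
    {"ParticipantId": "P1", "Date": "2008-02-01", "Weight": 71},
    {"ParticipantId": "P2", "Date": "2008-01-01", "Weight": 80},
]


def join_on_participant(demographic_rows, clinical_rows):
    """Attach each participant's demographic fields to their clinical rows."""
    by_participant = {row["ParticipantId"]: row for row in demographic_rows}
    joined = []
    for row in clinical_rows:
        # An unknown participant yields an empty dict, so the clinical
        # row is kept without demographic fields.
        merged = {**by_participant.get(row["ParticipantId"], {}), **row}
        joined.append(merged)
    return joined


joined = join_on_participant(demographics, physical_exam)
for row in joined:
    print(row)
```

The lookup by participant ID only works because the demographic dataset holds at most one row per participant; the clinical rows keep their own participant/date key after the join.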
