This topic contians various examples of special features of LabKey SQL: Also see:

"Datasets" Special Column

SQL queries based on study datasets can make use of the special column "Datasets", which gives access to all datasets in the current study.

For example, the following query on the PhysicalExam dataset pulls in data from the Demographics dataset. Note the use of the "Datasets" special column.

SELECT PhysicalExam.ParticipantId,
PhysicalExam.Weight_kg,
PhysicalExam.Datasets.Demographics.Height_cm,
PhysicalExam.Datasets.Demographics.Gender
FROM PhysicalExam

"Datasets" provides a shortcut for queries that would otherwise use a JOIN to pull in data from another dataset. The query above can be used instead of the JOIN style query below:

SELECT PhysicalExam.ParticipantId,
PhysicalExam.Weight_kg,
Demographics.Height_cm,
Demographics.Gender
FROM PhysicalExam
JOIN Demographics
ON PhysicalExam.ParticipantId = Demographics.ParticipantId

GROUP BY

The GROUP BY function is useful when you wish to perform a calculation on a table that contains many types of items, but keep the calculations separate for each type of item. You can use GROUP BY to perform an average such that only rows that are marked as the same type are grouped together for the average.

For example, what if you wish to determine an average for each participant in a large study dataset that spans many participants and many visits. Simply averaging a column of interest across the entire dataset would produce a mean for all participants at once, not each participant. Using GROUP BY allows you to determine a mean for each participant individually.

A Simple GROUP BY Example

The GROUP BY function can be used on the Physical Exam dataset to determine the average temperature for each participant across all of his/her visits.

To set up this query, follow the basic steps described in the Create a SQL Query example to create a new query based on the "Physical Exam" table in the study schema. Name this new query "AverageTempPerParticipant."

If you are working with the LabKey demo study, these queries may be predefined, so you can view and edit them in place, or create new queries with different names.

Within the SQL Source editor, delete the SQL created there by default for this query and paste in the following SQL:

SELECT "Physical Exam".ParticipantID, 
ROUND(AVG("Physical Exam".Temp_C), 1) AS AverageTemp,
FROM "Physical Exam"
GROUP BY "Physical Exam".ParticipantID

For each ParticipantID, this query finds all rows for that ParticipantID and calculates the average temperature for these rows, rounded up to the 10ths digit. In other words, we calculate the participant's average temperature across all visits and store that value in a new column called "AverageTemp."

See similar results in our interactive example.

JOIN Columns from Different Tables

Use JOIN to combine columns from different tables.

The following query combines columns from the Physical Exam and Demographics tables.

SELECT "Physical Exam".ParticipantId,
"Physical Exam".Weight_kg,
"Physical Exam".Temp_C,
"Physical Exam".Pulse,
"Physical Exam".Respirations,
"Physical Exam".Pregnancy,
Demographics.Gender,
Demographics.Height
FROM "Physical Exam" INNER JOIN Demographics ON "Physical Exam".ParticipantId = Demographics.ParticipantId

JOIN a Calculated Column to Another Query

The JOIN function can be used to combine data in multiple queries. In our example, we can use JOIN to append our newly-calculated, per-participant averages to the Physical Exam dataset and create a new, combined query.

First, create a new query based on the "Physical Exam" table in the study schema. Call this query "Physical Exam + AverageTemp" and choose to edit it in the SQL Source Editor. Now add edit the SQL such that it looks as follows.

SELECT "Physical Exam".ParticipantId,
"Physical Exam".SequenceNum,
"Physical Exam".Date,
"Physical Exam".Day,
"Physical Exam".Weight_kg,
"Physical Exam".Temp_C,
"Physical Exam".SystolicBloodPressure,
"Physical Exam".DiastolicBloodPressure,
"Physical Exam".Pulse,
"Physical Exam".Pregnancy,
AverageTempPerParticipant.AverageTemp
FROM "Physical Exam"
INNER JOIN AverageTempPerParticipant
ON "Physical Exam".ParticipantID=AverageTempPerParticipant.ParticipantID

You have added one line before the FROM clause to add the AverageTemp column from the AverageTempPerParticipant dataset. You have also added one additional line after the FROM clause to explain how data in the AverageTempPerParticipant are mapped to columns in the Physical Exam table. The ParticipantID column is used for mapping between the tables.

See similar results in the interactive example.

Calculate a Column Using Other Calculated Columns

We next use our calculated columns as the basis for creating yet another calculated column that provides greater insight into our dataset.

This column will be the difference between a participant's temperature at a particular visit and the average temperature for all of his/her visits. This "TempDelta" statistic will let us look at deviations from the mean and identify outlier visits for further investigation.

Steps:

  • Create a new query named "Physical Exam + TempDelta" and base it on the "Physical Exam + AverageTemp" query we just created above.
  • Add the following SQL expression in the Query Designer, following the last column selected. Don't forget to add a comma on that previous line:
ROUND(("Physical Exam + AverageTemp".Temp_C-
"Physical Exam + AverageTemp".AverageTemp), 1) AS TempDelta
  • Provide a longer display name for the new column by pasting this content on the XML Metadata tab.
<tables xmlns="http://labkey.org/data/xml">
<table tableName="Physical Exam + TempDelta" tableDbType="NOT_IN_DB">
<columns>
<column columnName="TempDelta">
<columnTitle>Temperature Difference From Average</columnTitle>
</column>
</columns>
</table>
</tables>

See similar results in the interactive example.

Filter Calculated Column to Make Outliers Stand Out

It can be handy to filter your results such that outlying values stand out. This is simple to do in a LabKey grid using the column header filtering options.

Using the query above ("Physical Exam + TempDelta"), we want to show the visits in which a participant's temperature was unusually high for them, possibly indicating a fever. We filter the calculated "Temperature Difference From Average" column for all values greater than 1.5. Just click on the column header, select Filter. Choose "Is Greater Than" and type "1.5" in the popup, then click OK.

This leaves us with a list of all visits where a participant's temperature was more than 1.5 degrees C above the participant's mean temperature at all his/her visits. Notice the total number of filtered records is displayed above the grid.

See similar results in the Demo Study.

JOIN Data from Different Assays

The following example shows how to join columns from two different assays (ViralLoad and ImmuneScore) so that the columns RNACopies and Measure1 can be compared or visualized.

SELECT 
CASE
WHEN ViralLoad.ParticipantId IS NULL THEN ImmuneScore.ParticipantId
WHEN ViralLoad.ParticipantId IS NOT NULL THEN ViralLoad.ParticipantId
END AS
ParticipantId,
CASE
WHEN ViralLoad.VisitID IS NULL THEN ImmuneScore.VisitID
WHEN ViralLoad.VisitID IS NOT NULL THEN ViralLoad.VisitID
END AS
VisitID,
ViralLoad.RNACopies,
ImmuneScore.Measure1,
FROM ViralLoad
FULL OUTER JOIN ImmuneScore ON
ViralLoad.ParticipantId = ImmuneScore.ParticipantId AND ViralLoad.VisitID = ImmuneScore.VisitID

Discussion

Was this content helpful?

Log in or register an account to provide feedback


previousnext
 
expand all collapse all