Overview

Users importing instrument-generated tabular datasets into LabKey Server can solve a wide range of challenges using a transform script within the LabKey assay framework. For example:

  • Instrument-generated files often contain header lines before the main data table, denoted by a leading #, !, or other symbol. These lines often contain useful metadata about the protocol, reagents, or samples tested which ideally should be incorporated into the data import. Or, at least, these header lines need to be skipped over to find the main data table.
  • The file format is optimized for display, not for efficient storage and retrieval. For example, columns that correspond to individual samples are difficult to work with in a database.
  • The data to be imported contains the display values from a lookup column, which need to be mapped to the foreign key values for storage.
  • You may need to fill in additional quality control values with imported assay data (see the example below for a demonstration of this).

R is a good choice of language for writing transform scripts because it has extensive built-in functionality for manipulating tabular data sets.
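
As an illustration of the first challenge above, the following is a minimal base R sketch that separates "#"-prefixed header lines from the main data table. The file contents, the "key: value" header layout, and the variable names are assumptions made up for this example, not a format that any particular instrument produces.

```r
# Hypothetical instrument export: metadata lines start with "#",
# followed by a tab-separated data table.
raw.lines <- c(
    "# Protocol: ELISA-7",
    "# Operator: jdoe",
    "SpecimenID\tScore",
    "S-1\t0.1",
    "S-2\t0.2"
)

# Separate the header (metadata) lines from the data table lines.
is.header <- grepl("^#", raw.lines)
header.lines <- sub("^#[[:space:]]*", "", raw.lines[is.header])

# Split each "key: value" header line into a named character vector.
header.values <- sub("^[^:]*:[[:space:]]*", "", header.lines)
names(header.values) <- sub(":.*$", "", header.lines)

# Parse the remaining lines as a tab-separated table.
run.data <- read.delim(text = raw.lines[!is.header], header = TRUE,
                       sep = "\t", stringsAsFactors = FALSE)
```

In a real transform script, raw.lines would more likely come from readLines() on the uploaded file, and the header values could then be copied into run properties.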

Identifying the Path to the Script File

Transform scripts are associated with an assay by placing an absolute path to the script file in the Transform Scripts field in the assay designer. For details see Transformation Scripts. It is convenient to upload the script file to the File Repository in the same folder as the assay design. The absolute path to the script file can then be determined by concatenating the file root for the folder (shown on the Files tab under Folder > Management) with the path to the script file in the File web part (for example, "scripts\LoadData.R"). In the file path, LabKey Server accepts either backslashes (the default Windows format) or forward slashes.

When working on your own developer workstation, you can put the script file wherever you like, but putting it within the File Repository will make it easier to deploy to a production server. It also makes iterative development against a remote server easier, since you can use a Web-DAV enabled file editor to directly edit the same script file that the server is calling.

If your transform script calls other script files to do its work, the normal way to pull in the source code is with the source statement. Note that inside an R string literal a Windows path must use either forward slashes or doubled backslashes, for example:

source("C:/lktrunk/build/deploy/files/MyAssayFolderName/@files/Utils.R")

However, hard-coded absolute paths make scripts difficult to move to other servers. It is better to keep the script files together in the same directory and use the built-in substitution token "${srcDirectory}", which the server automatically replaces with the directory where the called script file (the one identified in the Transform Scripts field) is located, for example:

source("${srcDirectory}/Utils.R");

Accessing and Using the Run Properties File

The primary mechanism for communication between the LabKey assay framework and the transform script is the run properties file. As with ${srcDirectory}, a substitution token, ${runInfo}, tells the script code where to find this file. The script file should contain a line like:

run.props = labkey.transform.readRunPropertiesFile("${runInfo}");

The run properties file contains three categories of properties:

1. Batch and run properties as defined by the user when creating an assay instance. These properties are of the format: <property name> <property value> <java data type>

for example,

gDarkStdDev 1.98223 java.lang.Double

When the transform script is called these properties will contain any values that the user has typed into the “Batch Properties” and “Run Properties” sections of the upload form. The transform script can assign or modify these properties based on calculations or by reading them from the raw data file from the instrument. The script must then write the modified properties file to the location specified by the transformedRunPropertiesFile property.

2. Context properties of the assay such as assayName, runComments, and containerPath. These are recorded in the same format as the user-defined batch and run properties, but they cannot be overwritten by the script.

3. Paths to input and output files. These are absolute paths that the script reads from or writes to. They are in a <property name> <property value> format without property types. The paths currently used are:

  • a. runDataUploadedFile: the raw data file that was selected by the user and uploaded to the server as part of an import process. This can be an Excel file, a tab-separated text file, or a comma-separated text file.
  • b. runDataFile: the imported data file after the assay framework has attempted to convert the file to .tsv format and match its columns to the assay data result set definition. The path will point to a subfolder below the script file directory, with a path value similar to the one shown below. The AssayId_22\42 part of the directory path serves to separate the temporary files from multiple executions by multiple scripts in the same folder.
C:\labkey\files\transforms\@files\scripts\TransformAndValidationFiles\AssayId_22\42\runDataFile.tsv
  • c. AssayRunTSVData: This file path is where the result of the transform script will be written. It will point to a unique file name in an “assaydata” directory that the framework creates at the root of the files tree. NOTE: this property is written on the same line as the runDataFile property.
  • d. errorsFile: This path is where a transform or validation script can write out error messages for use in troubleshooting. It is not normally needed by an R script, because the script usually writes errors to stdout, which the framework saves to a file named “<scriptname>.Rout”.
  • e. transformedRunPropertiesFile: This path is where the script writes out the updated values of batch- and run-level properties that are listed in the runProperties file.
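
To make the layout of the run properties file concrete, here is a minimal base R sketch that reads a stand-in file and looks up a few values. It assumes the file is tab-separated with up to four fields per line; the sample paths, the "placeholder" third field on the runDataFile line, and the get.prop helper are invented for illustration. In a real script, the Rlabkey functions labkey.transform.readRunPropertiesFile and labkey.transform.getRunPropertyValue would normally be used instead.

```r
# Write a small stand-in for the run properties file. All values are made up.
props.path <- tempfile(fileext = ".tsv")
writeLines(c(
    "assayName\tScoreAssay\tjava.lang.String",
    "gDarkStdDev\t1.98223\tjava.lang.Double",
    "runDataFile\t/tmp/in/runDataFile.tsv\tplaceholder\t/tmp/assaydata/out.tsv",
    "errorsFile\t/tmp/in/errors.tsv"
), props.path)

# Read the file into four columns; fill = TRUE pads the shorter lines.
run.props <- read.table(props.path, header = FALSE, sep = "\t", quote = "",
                        comment.char = "", fill = TRUE, na.strings = "",
                        col.names = c("name", "val1", "val2", "val3"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: return the value (second field) of a named property.
get.prop <- function(props, prop.name) {
    props$val1[props$name == prop.name]
}

input.file <- get.prop(run.props, "runDataFile")
# The output path shares the runDataFile line and is its fourth field.
output.file <- run.props$val3[run.props$name == "runDataFile"]
# Typed properties arrive as text and need explicit conversion.
dark.sd <- as.numeric(get.prop(run.props, "gDarkStdDev"))
```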

Choosing the Input File for Transform Script Processing

The transform script developer can choose to use either the runDataFile or the runDataUploadedFile as its input. The runDataFile would be the right choice for an Excel-format raw file and a script that fills in additional columns of the data set. By using the runDataFile, the assay framework does the Excel-to-TSV conversion and the script doesn’t need to know how to parse Excel files. The runDataUploadedFile would be the right choice for a raw file in TSV format that the script is going to reformat by turning columns into rows. In either case, the script writes its output to the AssayRunTSVData file.
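
The "turning columns into rows" case can be sketched in base R as follows. The Analyte, S1, and S2 column names and the values are hypothetical; a real script would read the wide table from the runDataUploadedFile path and write the long table to the AssayRunTSVData path.

```r
# Hypothetical wide-format raw data: one column per sample.
wide <- data.frame(
    Analyte = c("IL-2", "IL-6"),
    S1 = c(0.11, 0.42),
    S2 = c(0.15, 0.38),
    stringsAsFactors = FALSE
)

# Every column other than the identifier column holds one sample's values.
sample.cols <- setdiff(names(wide), "Analyte")

# Stack the sample columns so that each row is one (Analyte, SpecimenID) pair.
long <- data.frame(
    Analyte = rep(wide$Analyte, times = length(sample.cols)),
    SpecimenID = rep(sample.cols, each = nrow(wide)),
    Value = unlist(wide[sample.cols], use.names = FALSE),
    stringsAsFactors = FALSE
)
```

The resulting table has one row per measurement, which is the shape an assay result set with SpecimenID and Value fields would expect.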

Transform Script Options

There are two useful options presented as checkboxes in the Assay designer.

  • Save Script Data tells the framework to not delete the intermediate files such as the runProperties file after a successful run. This option is important during script development. It can be turned off to avoid cluttering the file space under the TransformAndValidationFiles directory that the framework automatically creates under the script file directory.
  • Upload In Background tells the framework to create a pipeline job as part of the import process, rather than tying up the browser session. It is useful for importing large data sets.

Connecting Back to the Server from a Transform Script

Sometimes a transform script needs to connect back to the server to do its job. One example is translating lookup display values into key values. The Rlabkey library available on CRAN has the functions needed to connect to, query, and insert or update data in the local LabKey Server where the script is running. To give the connection the right security context (that of the current user), the assay framework provides the substitution token ${rLabkeySessionId}. Including this token on a line by itself near the beginning of the transform script eliminates the need to use a config file to hold a username and password for this loopback connection. It will be replaced with two lines that look like:

labkey.sessionCookieName = "JSESSIONID"
labkey.sessionCookieContents = "TOMCAT_SESSION_ID"

where TOMCAT_SESSION_ID is the actual ID of the user's HTTP session.
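
Once a lookup table is available, translating display values into key values is a matter of matching. The sketch below uses a hard-coded stand-in table so that it runs without a server; in a real transform script the table might instead be fetched with an Rlabkey query such as labkey.selectRows, using a schema and query specific to your installation. The RowId, Name, SampleType, and SampleTypeId names are assumptions for this example.

```r
# Stand-in for a lookup table that would normally be queried from the server.
lookup <- data.frame(
    RowId = c(101L, 102L, 103L),
    Name = c("Plasma", "Serum", "Urine"),
    stringsAsFactors = FALSE
)

# Hypothetical run data containing display values rather than keys.
run.data <- data.frame(
    SpecimenID = c("S-1", "S-2", "S-3"),
    SampleType = c("Serum", "Plasma", "Saliva"),
    stringsAsFactors = FALSE
)

# Map each display value to its key; values with no match become NA.
run.data$SampleTypeId <- lookup$RowId[match(run.data$SampleType, lookup$Name)]

# Collect unmatched display values so the script can report them.
unmatched <- run.data$SampleType[is.na(run.data$SampleTypeId)]
```

Checking for unmatched values before writing the output gives the script a chance to report a clear error instead of importing rows with missing keys.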

Debugging an R Transform Script

You can load an R transform script into the R console/debugger and run the script with debug(<functionname>) commands active. Since the substitution tokens described above (${srcDirectory}, ${runInfo}, and ${rLabkeySessionId}) are necessary to the correct operation of the script, the framework conveniently writes out a version of the script with these substitutions made, into the same subdirectory where the runProperties.tsv file is found. Load this modified version of the script into the R console.

Example Script

Input Data TSV File

Suppose you have the following Assay data in a TSV format:

SpecimenID    Date          Score    Message
S-1           2018-11-02    0.1
S-2           2018-11-02    0.2
S-3           2018-11-02    0.3
S-4           2018-11-02    -1
S-5           2018-11-02    99

You want a transform script that flags values greater than 1 or less than 0 as "Out of Range", so that the data enters the database in the form:

SpecimenID    Date          Score    Message
S-1           2018-11-02    0.1
S-2           2018-11-02    0.2
S-3           2018-11-02    0.3
S-4           2018-11-02    -1       Out of Range
S-5           2018-11-02    99       Out of Range

The following R transform script will write to the Message column if it sees out of range values:

sampleTransform.R

library(Rlabkey)

################################################
# Read in the run properties and results data. #
################################################

run.props = labkey.transform.readRunPropertiesFile("${runInfo}");

# save the important run.props as separate variables
run.data.file = labkey.transform.getRunPropertyValue(run.props, "runDataFile");
# the output path (AssayRunTSVData) is the fourth field on the runDataFile line
run.output.file = run.props$val3[run.props$name == "runDataFile"];
error.file = labkey.transform.getRunPropertyValue(run.props, "errorsFile");

# read in the results data file content
run.data = read.delim(run.data.file, header=TRUE, sep="\t", stringsAsFactors = FALSE);

#######################
# Transform the data. #
#######################

# Your transformation code goes here.

# If any Score value is less than 0 or greater than 1,
# then place "Out of Range" in the Message vector.
# seq_len() avoids looping when there are no rows; missing scores are skipped.
for (i in seq_len(nrow(run.data)))
{
    score <- run.data$Score[i]
    if (!is.na(score) && (score < 0 || score > 1)) {
        run.data$Message[i] <- "Out of Range"
    }
}

###########################################################
# Write the transformed data to the output file location. #
###########################################################

# write the new set of run data out to an output file
write.table(run.data, file=run.output.file, sep="\t", na="", row.names=FALSE, quote=FALSE);

# print the ending time for the transform script
writeLines(paste("\nProcessing end time:",Sys.time(),sep=" "));
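
The row-by-row loop in the script above can also be written as a single vectorized assignment, which is the more common R idiom. The sketch below is an alternative rather than part of the sample script, and it builds its own small run.data so that it runs outside of the assay framework.

```r
# Stand-in for the results data that the script reads from runDataFile.
run.data <- data.frame(
    SpecimenID = paste0("S-", 1:5),
    Score = c(0.1, 0.2, 0.3, -1, 99),
    Message = NA_character_,
    stringsAsFactors = FALSE
)

# TRUE for rows whose Score is present and falls outside the range [0, 1].
out.of.range <- !is.na(run.data$Score) &
    (run.data$Score < 0 | run.data$Score > 1)

# Flag all out-of-range rows in one assignment.
run.data$Message[out.of.range] <- "Out of Range"
```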

Setup

Before installing this sample, you may need to update your R engine.

  • Create a new folder of type Assay.
  • Download this R script: sampleTransform.R
  • Upload the script to the Files Repository of your new folder. Note the full path to the @files directory when uploading it.
  • Create an Assay Design with the following fields. You can either enter them yourself or download and import this assay design: Score.xar
    • SpecimenId - type Text (String)
    • Date - type DateTime
    • Score - type Number (Double)
    • Message - type Text (String)
  • Determine the absolute path to the script in the files repository. You can see it after uploading or by concatenating the <folder-root> with "/@files/sampleTransform.R"
  • Import data to the Assay Design. Include values less than 0 or greater than 1 to trigger "Out of Range" values in the Message field. You can use this example data file: R Script Assay Data.tsv
  • View the transformed results imported to the database to confirm that the R script is working correctly.
