Examples 2 & 3: Describe Protocols

This topic explains how to describe experiment protocols in a xar.xml file.

Experiment Log format and Protocol Parameters

The ExperimentRun section of the xar.xml for Example 1 contains a complete description of every ProtocolApplication instance and its inputs and outputs. If the experiment run had been previously loaded into a LabKey Server repository or compatible database, this type of xar.xml would be an effective format for exporting the experiment run data to another system. This document will use the term "export format" to describe a xar.xml that provides complete details of every ProtocolApplication as in Example 1. When loading new experiment run results for the first time, export format is both overly verbose and requires the xar.xml author (human or software) to invent unique IDs for many objects.

To see how an initial load of experiment run data can be made simpler, consider how protocols relate to protocol applications. A protocol for an experiment run can be thought of as a multi-step recipe. Given one or more starting inputs, the results of applying each step are predictable. The sample preparation step always produces a prepared material for every starting material. The analyze step always produces a data output for every prepared material input. If the xar.xml author could describe this level of detail about the protocols used in a run, the loader would have almost enough information to generate the ProtocolApplication records automatically. The only other information the xar.xml would have to provide about the protocols is what names and IDs to assign to the generated records.

Example 1 included information in the ProtocolDefinitions section about the inputs and outputs of each step. Example 2 adds predefined ProtocolParameters to these protocols that tell the LabKey Server loader how to generate names and IDs for ProtocolApplications and their inputs and outputs. Example 2 then uses the ExperimentLog section to tell the Xar loader to generate ProtocolApplication records rather than explicitly including them in the xar.xml. The following table shows these differences.

Table 2: Example 2 differences from Example 1

The number and base types of inputs and outputs for a protocol are defined by four elements: MaxInputMaterialPerInstance, MaxInputDataPerInstance, OutputMaterialPerInstance, and OutputDataPerInstance.

The names and LSIDs of the ProtocolApplications and their outputs can be generated at load time. The XarTemplate parameters determine how these names and LSIDs are formed.

Note the new suffix on the LSIDs, discussed under Example 3.

<exp:Protocol rdf:about="urn:lsid:localhost:Protocol:SamplePrep.WithTemplates">

<exp:Name>Sample Prep Protocol</exp:Name>

<exp:ProtocolDescription>Describes sample handling and preparation steps</exp:ProtocolDescription>

<exp:ApplicationType>ProtocolApplication</exp:ApplicationType>

<exp:MaxInputMaterialPerInstance>1</exp:MaxInputMaterialPerInstance>

<exp:MaxInputDataPerInstance>0</exp:MaxInputDataPerInstance>

<exp:OutputMaterialPerInstance>1</exp:OutputMaterialPerInstance>

<exp:OutputDataPerInstance>0</exp:OutputDataPerInstance>

<exp:ParameterDeclarations>

<exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String">urn:lsid:localhost:ProtocolApplication:DoSamplePrep.WithTemplates</exp:SimpleVal>

<exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String">Prepare sample</exp:SimpleVal>

<exp:SimpleVal Name="OutputMaterialLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialLSID" ValueType="String">urn:lsid:localhost:Material:PreparedSample.WithTemplates</exp:SimpleVal>

<exp:SimpleVal Name="OutputMaterialNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialName" ValueType="String">Prepared sample</exp:SimpleVal>

</exp:ParameterDeclarations>

</exp:Protocol>

Example 2 uses the ExperimentLog section to instruct the loader to generate the ProtocolApplication records. The Xar loader uses the information in the ProtocolDefinitions and ProtocolActionDefinitions sections to generate these records.

Note that the ProtocolApplications section is empty.

<exp:ExperimentRuns>

<exp:ExperimentRun rdf:about="urn:lsid:localhost:ExperimentRun:MinimalExperimentRun.WithTemplates">

<exp:Name>Example 2 (using log format)</exp:Name>

<exp:ProtocolLSID>urn:lsid:localhost:Protocol:MinimalRunProtocol.WithTemplates</exp:ProtocolLSID>

<exp:ExperimentLog>

<exp:ExperimentLogEntry ActionSequenceRef="1"/>

<exp:ExperimentLogEntry ActionSequenceRef="10"/>

<exp:ExperimentLogEntry ActionSequenceRef="20"/>

<exp:ExperimentLogEntry ActionSequenceRef="30"/>

</exp:ExperimentLog>

<exp:ProtocolApplications/>

</exp:ExperimentRun>

</exp:ExperimentRuns>

ProtocolApplication Generation

When loading a xar.xml that uses the ExperimentLog section, the loader generates ProtocolApplication records and their inputs and outputs. For this generation process to work, the ExperimentLog section of the xar.xml must contain at least one ExperimentLogEntry, and the GenerateDataFromStepRecord attribute of the ExperimentRun must be either missing or explicitly set to false.

The xar loader uses the following process:

1. Read an ExperimentLogEntry record along with its sequence number. The presence of this record in the xar.xml indicates that the step has been completed. These ExperimentLogEntry records must be in ascending sequence order. The loader also reads any optional information about parameters applied or specific inputs (Example 2 contains none of this optional information).
2. Look up the protocol corresponding to the action sequence number, and also the protocol(s) that are predecessors to it. This information is contained in the ProtocolActionDefinitions.
3. Determine the set of all output Material objects and all output Data objects from the ProtocolApplication objects corresponding to the predecessor protocol(s). These become the set of inputs to the current action sequence. Because the ExperimentLogEntry records are in ascending sequence order, these predecessor outputs have already been generated. (For the first protocol in the action set, the set of inputs is given by the StartingInputs section.)
4. Get the MaxInputMaterialPerInstance and MaxInputDataPerInstance values for the current protocol step. These numbers determine how many ProtocolApplication objects ("instances") to generate for the current protocol step. In Example 2 there is only one starting Material, which is never divided or fractionated, so only one instance of each protocol step is required. (Example 3 will show multiple instances.) The loader iterates through the set of Material or Data inputs and creates a ProtocolApplication object for every n inputs, where n is the applicable Max…PerInstance value. The input objects are connected as InputRefs to the ProtocolApplications.
5. The name and LSID of each generated ProtocolApplication are determined by the ApplicationLSIDTemplate and ApplicationNameTemplate parameters. See below for details on these parameters.
6. For each generated ProtocolApplication, the loader then generates output Material or Data objects according to the Output…PerInstance values. The names and LSIDs of these generated objects are determined by the Output…NameTemplate and Output…LSIDTemplate parameters.
7. Repeat until the end of the ExperimentLog section.
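
The generation process above can be sketched in a few lines of Python. This is a simplified model for illustration, not the loader's actual implementation: it tracks only one kind of object (Material), and the function names, the layout of the `steps` tuples, and the generated output names are all hypothetical. `None` stands in for xsi:nil.

```python
from typing import Dict, List, Optional, Tuple

# (output name stem, max inputs per instance, outputs per instance).
# This tuple layout is hypothetical, chosen only for the sketch.
Step = Tuple[str, Optional[int], int]


def group_inputs(inputs: List[str], max_per_instance: Optional[int]) -> List[List[str]]:
    """Group inputs into one list per ProtocolApplication instance.

    None models xsi:nil: a single instance receives every input.
    Assumes max_per_instance is None or >= 1. A trailing partial group
    still becomes an instance here; that is an assumption of the sketch.
    """
    if max_per_instance is None:
        return [list(inputs)]
    return [inputs[i:i + max_per_instance]
            for i in range(0, len(inputs), max_per_instance)]


def generate(steps: Dict[int, Step],
             predecessors: Dict[int, List[int]],
             log: List[int],
             starting_inputs: List[str]) -> Dict[int, List[str]]:
    """Walk the log entries and return the generated outputs per sequence number."""
    if log != sorted(log):
        raise ValueError("log entries must be in ascending sequence order")
    outputs: Dict[int, List[str]] = {}
    for seq in log:
        preds = predecessors.get(seq, [])
        if preds:
            # Inputs are the outputs of all predecessor steps, already generated.
            inputs = [obj for p in preds for obj in outputs[p]]
        else:
            # First step: inputs come from the starting inputs.
            inputs = list(starting_inputs)
        stem, max_in, out_per = steps[seq]
        produced = []
        for instance, _group in enumerate(group_inputs(inputs, max_in)):
            for out in range(out_per):
                produced.append(f"{stem}.{instance}.{out}")
        outputs[seq] = produced
    return outputs


# A run shaped like Example 3: prepare, divide into 4, analyze each aliquot.
steps = {10: ("Prepared", 1, 1), 15: ("Aliquot", 1, 4), 20: ("Result", 1, 1)}
predecessors = {15: [10], 20: [15]}
outputs = generate(steps, predecessors, [10, 15, 20], ["Starting_Sample"])
print(outputs[20])
```

With these inputs the divide step yields four outputs from one instance, and the analyze step yields one output from each of four instances, which is the multiplication of instances that Example 3 demonstrates.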

Instancing properties of Protocol objects

As described above, four protocol properties govern how many ProtocolApplication objects are generated for an ExperimentLogEntry, and how many output objects are generated for each ProtocolApplication:

Property	Allowed values	Effect of property value
MaxInputMaterialPerInstance MaxInputDataPerInstance	0	The protocol does not accept [ Material | Data ] objects as inputs
	1	For every [ Material | Data ] object output by a predecessor step, create a new ProtocolApplication for this protocol
	n > 1	For every n [ Material | Data ] objects output by a predecessor step, create a new ProtocolApplication. If the number of [ Material | Data ] objects output by predecessors does not divide evenly by n, a warning is written to the log
	xsi:nil="true"	Equivalent to "unlimited". Create a single ProtocolApplication object and assign all [ Material | Data ] outputs of predecessors as inputs to this single instance
	Combined constraint	If neither MaxInputMaterialPerInstance nor MaxInputDataPerInstance is nil, then at least one of the two values must be 0 for the loader to automatically generate ProtocolApplication objects.
OutputMaterialPerInstance OutputDataPerInstance	0	An application of this Protocol does not create [ Material | Data ] outputs
	1	Each ProtocolApplication of this Protocol "creates" one [ Material | Data ] object
	n > 1	Each ProtocolApplication of this Protocol "creates" n [ Material | Data ] objects
	xsi:nil="true"	Equivalent to "unknown". Each ProtocolApplication of this Protocol may create 0, 1 or many [ Material | Data ] outputs, but none are generated automatically. Its effect is currently equivalent to a value of 0, but in a future version of the software a nil value might be the signal to ask a custom load handler how many outputs to generate.
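
The input rows of this table can be read as a small decision rule. The sketch below is illustrative only: the function names are hypothetical and `None` stands in for xsi:nil. Because the table does not say what happens to leftover inputs when the count does not divide evenly, the function simply reports them rather than guessing.

```python
from typing import Optional, Tuple


def plan_instances(max_per_instance: Optional[int], n_inputs: int) -> Tuple[int, int]:
    """Return (full_instances, leftover_inputs) for one kind of input."""
    if max_per_instance is None:
        # xsi:nil, "unlimited": one instance takes all inputs.
        return 1, 0
    if max_per_instance == 0:
        # This kind of input is not accepted, so it drives no instances.
        return 0, 0
    # One instance per n inputs; a non-zero remainder is the case where
    # the table says a warning is written to the log.
    return divmod(n_inputs, max_per_instance)


def combined_constraint_ok(max_material: Optional[int], max_data: Optional[int]) -> bool:
    """Check the combined constraint: if neither value is nil, one must be 0."""
    if max_material is None or max_data is None:
        return True
    return max_material == 0 or max_data == 0


print(plan_instances(1, 4))
print(plan_instances(3, 7))
print(combined_constraint_ok(1, 1))
```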

Protocol parameters for generating ProtocolApplication objects and their outputs

A ProtocolParameter has both a short name and a fully-qualified name (the "OntologyEntryURI" attribute). Currently both need to be specified for all parameters. These parameters are declared by including a SimpleVal element in the definition. If the SimpleVal element has non-empty content, the content is treated as the default value for the parameter. Non-default values can be specified in the ExperimentLogEntry node, but Example 2 does not do this.

Name	Fully-qualified name	Purpose
ApplicationLSIDTemplate	terms.fhcrc.org#XarTemplate.ApplicationLSID	LSID of a generated ProtocolApplication
ApplicationNameTemplate	terms.fhcrc.org#XarTemplate.ApplicationName	Name of a generated ProtocolApplication
OutputMaterialLSIDTemplate	terms.fhcrc.org#XarTemplate.OutputMaterialLSID	LSID of an output Material object
OutputMaterialNameTemplate	terms.fhcrc.org#XarTemplate.OutputMaterialName	Name of an output Material object
OutputDataLSIDTemplate	terms.fhcrc.org#XarTemplate.OutputDataLSID	LSID of an output Data object
OutputDataNameTemplate	terms.fhcrc.org#XarTemplate.OutputDataName	Name of an output Data object
OutputDataFileTemplate	terms.fhcrc.org#XarTemplate.OutputDataFile	Path name of an output Data object, used to set the DataFileUrl property. Relative to the OutputDataDir directory, if set; otherwise relative to the directory containing the xar.xml file
OutputDataDirTemplate	terms.fhcrc.org#XarTemplate.OutputDataDir	Directory for files associated with output Data objects, used to set the DataFileUrl property. Relative to the directory containing the xar.xml file
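
The way OutputDataFile and OutputDataDir combine can be illustrated with a short path-joining sketch. The helper name and the example paths are hypothetical, and the sketch only joins path segments as the table describes; it does not attempt to build the DataFileUrl value itself.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_output_path(xar_dir: str, output_file: str,
                        output_dir: Optional[str] = None) -> str:
    """Resolve an output file path relative to the xar.xml directory.

    If output_dir is set, it is taken relative to xar_dir and the file
    is taken relative to it; otherwise the file is relative to xar_dir.
    """
    base = PurePosixPath(xar_dir)
    if output_dir:
        base = base / output_dir
    return str(base / output_file)


# Hypothetical locations, for illustration only.
print(resolve_output_path("/data/run1", "results.tsv"))
print(resolve_output_path("/data/run1", "results.tsv", "analysis"))
```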

Substitution Templates and ProtocolApplication Instances

The LSIDs in Example 2 included an arbitrary ".WithTemplates" suffix, where the same LSIDs in Example 1 included ".FixedLSID" as a suffix. The only purpose of these LSID endings was to make the LSIDs unique between Examples 1 and 2. Otherwise, if a user tried to load Example 1 onto the same LabKey Server system as Example 2, the second load would fail with an "LSID already exists" error in the log. The behavior of the Xar loader when it encounters a duplicate LSID already in the database depends on the object it is attempting to load:

Experiment, ProtocolDefinitions, and ProtocolActionDefinitions will use existing saved objects in the database if a xar.xml being loaded uses an existing LSID. No attempt is made to compare the properties listed in the xar.xml with those properties in the database for objects with the same LSID.
An ExperimentRun will fail to load if its LSID already exists unless the CreateNewIfDuplicate attribute of the ExperimentRun is set to true. If this attribute is set to true, the loader will add a version number to the end of the existing ExperimentRun LSID in order to make it unique.
A ProtocolApplication will fail to load (and abort the entire xar.xml load) if its LSID already exists. (This is a good reason to use the ${RunLSIDBase} template described below for these objects.)
Data and Material objects that are starting inputs are treated like Experiment and Protocol objects: if their LSIDs already exist, the previously loaded definitions apply and the xar.xml load continues.
Data and Material objects that are generated by a ProtocolApplication are treated like ProtocolApplication objects: if a duplicate LSID is encountered, the xar.xml load fails with an error.
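
The rules in this list can be summarized as a lookup, sketched below. The enum, the function, and the string labels used for object kinds are hypothetical names invented for this illustration; only the outcomes are taken from the list above.

```python
from enum import Enum


class DuplicateAction(Enum):
    REUSE_EXISTING = "use the object already saved in the database"
    VERSION_LSID = "add a version number to make the LSID unique"
    FAIL = "load fails with an error"


def on_duplicate_lsid(kind: str, *, generated: bool = False,
                      create_new_if_duplicate: bool = False) -> DuplicateAction:
    """Return what happens when an object of `kind` has an LSID that already exists.

    `generated` applies to Material and Data only: True for outputs of a
    ProtocolApplication, False for starting inputs.
    """
    if kind in ("Experiment", "Protocol", "ProtocolAction"):
        return DuplicateAction.REUSE_EXISTING
    if kind == "ExperimentRun":
        if create_new_if_duplicate:
            return DuplicateAction.VERSION_LSID
        return DuplicateAction.FAIL
    if kind == "ProtocolApplication":
        return DuplicateAction.FAIL
    if kind in ("Material", "Data"):
        return DuplicateAction.FAIL if generated else DuplicateAction.REUSE_EXISTING
    raise ValueError(f"unknown object kind: {kind}")


print(on_duplicate_lsid("ExperimentRun", create_new_if_duplicate=True).name)
print(on_duplicate_lsid("Material", generated=True).name)
```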

Users will encounter problems and confusion when LSIDs overlap or conflict unexpectedly. If a protocol reuses an existing LSID unexpectedly, for example, the user will not see the effect of protocol properties set in the new xar.xml, but will instead see the previously loaded properties. If an experiment run uses the same LSID as a previously loaded run, the new run will fail to load and the user may be confused as to why.

Fortunately, the LabKey Server Xar loader has a feature called substitution templates that can alleviate the problems of creating unique LSIDs. If an LSID string in a xar.xml file contains one of these substitution templates, the loader will replace the template with a generated string at load time. A separate document called Life Sciences Identifiers (LSIDs) in LabKey Server details the structure of LSIDs and the substitution templates available. Example 3 uses these substitution templates in all of its LSIDs.

Example 3 also shows a fractionation protocol that generates multiple output materials for one input material. In order to generate unique LSIDs for all outputs, the OutputMaterialLSIDTemplate uses ${OutputInstance} to append a digit to the generated output object LSIDs. Since the subsequent protocol steps operate on only one input per instance, the LSIDs of all downstream objects from the fractionation step also need an instance number qualifier to maintain uniqueness. Object names also use instance numbers to remain distinct, though there is no uniqueness requirement for object Names.
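
As a rough illustration of how instance numbers keep generated LSIDs distinct, the sketch below expands templates with Python's string.Template, which happens to use the same ${Name} placeholder syntax. It is not the loader's implementation. The expanded value used for ${RunLSIDBase} is an assumed example, and numbering ${InputInstance} from 0 is also an assumption.

```python
from string import Template


def expand(template: str, **values: str) -> str:
    """Replace ${Name} placeholders; raises KeyError if a placeholder has no value."""
    return Template(template).substitute(values)


# Assumed example of what ${RunLSIDBase} might expand to for a Material object.
RUN_BASE = "urn:lsid:proteomics.fhcrc.org:Material.Run-73"

# One LSID per output of a step that produces 4 materials per instance.
aliquot_lsids = [
    expand("${RunLSIDBase}:Aliquot.${OutputInstance}",
           RunLSIDBase=RUN_BASE, OutputInstance=str(i))
    for i in range(4)
]

# Downstream steps run once per input, so their names use ${InputInstance}.
analysis_names = [
    expand("Analyze sample (${InputInstance})", InputInstance=str(i))
    for i in range(4)
]

print(aliquot_lsids[0])
print(analysis_names[3])
```

Without the instance placeholders, all four aliquots would expand to the same LSID and the load would hit the duplicate-LSID failure described earlier.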

Graph view of Example 3

Table 3: Example 3 differences from Example 2

The Protocol objects in Example 3 use the ${FolderLSIDBase} substitution template. The Xar loader will create an LSID that looks like urn:lsid:proteomics.fhcrc.org:Protocol.Folder-3017:Example3Protocol. The integer "3017" in this LSID is unique to the folder in which the xar.xml load is being run. This means that other xar.xml files that use the same protocol (i.e., the Protocol element has the same rdf:about value, including the template) and are loaded into the same folder will use the already-loaded protocol definition. If a xar.xml file with the same protocol is loaded into a different folder, a new Protocol record will be inserted into the database. The LSID of this record will be the same except for the number encoded in the "Folder-xxxx" portion of the namespace.

…
<exp:Experiment rdf:about="${FolderLSIDBase}:Tutorial">
<exp:Name>Tutorial Examples</exp:Name>
</exp:Experiment>
<exp:ProtocolDefinitions>
<exp:Protocol rdf:about="${FolderLSIDBase}:Example3Protocol">
<exp:Name>Example 3 Protocol</exp:Name>
<exp:ProtocolDescription>This protocol and its children use substitution strings to generate LSIDs on load.</exp:ProtocolDescription>
<exp:ApplicationType>ExperimentRun</exp:ApplicationType>
<exp:MaxInputMaterialPerInstance xsi:nil="true"/>
<exp:MaxInputDataPerInstance xsi:nil="true"/>
<exp:OutputMaterialPerInstance xsi:nil="true"/>
<exp:OutputDataPerInstance xsi:nil="true"/>
<exp:ParameterDeclarations>
<exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String">${RunLSIDBase}:DoMinimalRunProtocol</exp:SimpleVal>
<exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String">Application of MinimalRunProtocol</exp:SimpleVal>
</exp:ParameterDeclarations>
</exp:Protocol>
…

The records that make up the details of an experiment run (ProtocolApplication objects and their Data or Material outputs) are commonly loaded multiple times in one folder. This happens, for example, when a researcher applies the exact same protocol to different starting samples in different runs. To keep the LSIDs of the output objects of the runs unique, the ${RunLSIDBase} template is useful. It does the same thing as ${FolderLSIDBase} except that the namespace contains an integer unique to the run being loaded. These LSIDs look like urn:lsid:proteomics.fhcrc.org:ProtocolApplication.Run-73:DoSamplePrep

<exp:Protocol rdf:about="${FolderLSIDBase}:Divide_sample">
<exp:Name>Divide sample</exp:Name>
<exp:ProtocolDescription>Divide sample into 4 aliquots</exp:ProtocolDescription>
<exp:ApplicationType>ProtocolApplication</exp:ApplicationType>
<exp:MaxInputMaterialPerInstance>1</exp:MaxInputMaterialPerInstance>
<exp:MaxInputDataPerInstance>0</exp:MaxInputDataPerInstance>
<exp:OutputMaterialPerInstance>4</exp:OutputMaterialPerInstance>
<exp:OutputDataPerInstance>0</exp:OutputDataPerInstance>
<exp:OutputDataType>Data</exp:OutputDataType>
<exp:ParameterDeclarations>
<exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String">${RunLSIDBase}:DoDivide_sample</exp:SimpleVal>
<exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String">Divide sample into 4</exp:SimpleVal>

Example 3 also includes an aliquot step, taking an input prepared material and producing 4 output materials that are measured portions of the input. To model this additional step, the xar.xml needs to do the following:
Set OutputMaterialPerInstance to 4 in the Protocol of the new step.
Use ${OutputInstance} in the LSIDs and names of the generated output Material objects. This will range from 0 to 3 in this example.
Use ${InputInstance} in subsequent Protocol definitions and their outputs. This is necessary in the protocol applications downstream of the aliquot step because there will be one ProtocolApplication object for each output of the previous step.

<exp:SimpleVal Name="OutputMaterialLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialLSID" ValueType="String">${RunLSIDBase}:Aliquot.${OutputInstance}</exp:SimpleVal>
<exp:SimpleVal Name="OutputMaterialNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialName" ValueType="String">Aliquot (${OutputInstance})</exp:SimpleVal>
</exp:ParameterDeclarations>
</exp:Protocol>
<exp:Protocol rdf:about="${FolderLSIDBase}:Analyze">
<exp:Name>Example analysis protocol</exp:Name>
…
<exp:ParameterDeclarations>
<exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String">${RunLSIDBase}:DoAnalysis.${InputInstance}</exp:SimpleVal>
<exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String">Analyze sample (${InputInstance})</exp:SimpleVal>
<exp:SimpleVal Name="OutputDataLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputDataLSID" ValueType="String">${RunLSIDBase}:AnalysisResult.${InputInstance}</exp:SimpleVal>
<exp:SimpleVal Name="OutputDataNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputDataName" ValueType="String">Analysis results (${InputInstance})</exp:SimpleVal>
</exp:ParameterDeclarations>
</exp:Protocol>

When adding a new protocol step to a run, the xar.xml author must also add a ProtocolAction element that gives the step an ActionSequence number. This number must fall between the sequence numbers of its predecessor(s) and its successors. In this example, the Divide_sample step was inserted between the prepare and analyze steps and assigned a sequence number of 15. The succeeding step (Analyze) also needed an update of its PredecessorAction sequence ref, but none of the other action definition steps needed to be changed. (This is why it is useful to leave gaps in the sequence numbers when hand-editing xar.xml files.)

<exp:ProtocolActionDefinitions>
<exp:ProtocolActionSet ParentProtocolLSID="${FolderLSIDBase}:Example3Protocol">
..
<exp:ProtocolAction ChildProtocolLSID="${FolderLSIDBase}:Divide_sample" ActionSequence="15">
<exp:PredecessorAction ActionSequenceRef="10"/>
</exp:ProtocolAction>
<exp:ProtocolAction ChildProtocolLSID="${FolderLSIDBase}:Analyze" ActionSequence="20">
<exp:PredecessorAction ActionSequenceRef="15"/>
</exp:ProtocolAction>
…
</exp:ProtocolActionSet>
</exp:ProtocolActionDefinitions>

One other useful substitution template is ${XarFileId}. On load, this template becomes an integer unique to the xar.xml file. In Example 3, the Starting_Sample gets a new LSID for every new xar.xml it is loaded from.

<exp:StartingInputDefinitions>
<exp:Material rdf:about="${FolderLSIDBase}.${XarFileId}:Starting_Sample">
<exp:Name>Starting Sample</exp:Name>
</exp:Material>
</exp:StartingInputDefinitions>
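
The bookkeeping involved in inserting a step between two existing actions can be sketched as follows. This is a hypothetical helper for hand-editing, not part of LabKey Server: it models the action set as a plain dictionary from ActionSequence number to predecessor sequence numbers, picks a free number in the gap, and repoints the successor.

```python
from typing import Dict, List


def insert_between(actions: Dict[int, List[int]], pred: int, succ: int) -> int:
    """Insert a new action between `pred` and `succ` and return its sequence number.

    `actions` maps an ActionSequence number to its predecessor sequence
    numbers. Only the successor's predecessor reference is updated.
    """
    if succ - pred < 2:
        raise ValueError(f"no free sequence number between {pred} and {succ}")
    new_seq = (pred + succ) // 2
    actions[new_seq] = [pred]
    # Repoint the successor from the old predecessor to the new step.
    actions[succ] = [new_seq if p == pred else p for p in actions[succ]]
    return new_seq


# Only the Analyze step (20) and its predecessor (10) are modeled here.
actions = {20: [10]}
new_step = insert_between(actions, 10, 20)
print(new_step, actions)
```

If the original file had numbered its steps 1, 2, 3, the helper would raise an error and every later step would need renumbering, which is the practical argument for leaving gaps.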

Example 3 illustrates the difference between LogEntry format and export format more clearly. The file Example3.xar.xml uses the log entry format. It has 120 lines altogether, of which 15 are in the ExperimentRuns section. The file Example3_exportformat.xar.xml describes the exact same experiment but is 338 lines long. All of the additional lines are in the ExperimentRun section, describing the ProtocolApplications and their inputs and outputs explicitly.
