The ExperimentRun section of the xar.xml for Example 1 contains a complete description of every ProtocolApplication instance and its inputs and outputs. If the experiment run had been previously loaded into a LabKey Server repository or compatible database, this type of xar.xml would be an effective format for exporting the experiment run data to another system. This document uses the term "export format" to describe a xar.xml that provides complete details of every ProtocolApplication, as in Example 1. When loading new experiment run results for the first time, export format is overly verbose, and it requires the xar.xml author (human or software) to invent unique IDs for many objects.
To see how an initial load of experiment run data can be made simpler, consider how protocols relate to protocol applications. A protocol for an experiment run can be thought of as a multi-step recipe. Given one or more starting inputs, the results of applying each step are predictable. The sample preparation step always produces a prepared material for every starting material. The analyze step always produces a data output for every prepared material input. If the xar.xml author could describe this level of detail about the protocols used in a run, the loader would have almost enough information to generate the ProtocolApplication records automatically. The only other information the xar.xml would have to provide about the protocols is what names and IDs to assign to the generated records.
Example 1 included information in the ProtocolDefinitions section about the inputs and outputs of each step. Example 2 adds pre-defined ProtocolParameters to these protocols that tell the LabKey Server loader how to generate names and IDs for ProtocolApplications and their inputs and outputs. Example 2 then uses the ExperimentLog section to tell the Xar loader to generate ProtocolApplication records rather than explicitly including them in the xar.xml. The following table shows these differences.
The number and base types of inputs and outputs for a protocol are defined by four elements: MaxInputMaterialPerInstance, MaxInputDataPerInstance, OutputMaterialPerInstance, and OutputDataPerInstance.
The names and LSIDs of the ProtocolApplications and their outputs can be generated at load time. The XarTemplate parameters determine how these names and LSIDs are formed.
Note new suffix on the LSID, discussed under Example 3. |
<exp:Protocol rdf:about="urn:lsid:localhost:Protocol:SamplePrep.WithTemplates">
  <exp:Name>Sample Prep Protocol</exp:Name>
  <exp:ProtocolDescription>Describes sample handling and preparation steps</exp:ProtocolDescription>
  <exp:ApplicationType>ProtocolApplication</exp:ApplicationType>
  <exp:MaxInputMaterialPerInstance>1</exp:MaxInputMaterialPerInstance>
  <exp:MaxInputDataPerInstance>0</exp:MaxInputDataPerInstance>
  <exp:OutputMaterialPerInstance>1</exp:OutputMaterialPerInstance>
  <exp:OutputDataPerInstance>0</exp:OutputDataPerInstance>
  <exp:ParameterDeclarations>
    <exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String">urn:lsid:localhost:ProtocolApplication:DoSamplePrep.WithTemplates</exp:SimpleVal>
    <exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String">Prepare sample</exp:SimpleVal>
    <exp:SimpleVal Name="OutputMaterialLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialLSID" ValueType="String">urn:lsid:localhost:Material:PreparedSample.WithTemplates</exp:SimpleVal>
    <exp:SimpleVal Name="OutputMaterialNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialName" ValueType="String">Prepared sample</exp:SimpleVal>
  </exp:ParameterDeclarations>
</exp:Protocol>
|
Example 2 uses the ExperimentLog section to instruct the loader to generate the ProtocolApplication records. The Xar loader uses the information in the ProtocolDefinitions and ProtocolActionDefinitions sections to generate these records.
Note the ProtocolApplications section is empty. |
<exp:ExperimentRuns>
  <exp:ExperimentRun rdf:about="urn:lsid:localhost:ExperimentRun:MinimalExperimentRun.WithTemplates">
    <exp:Name>Example 2 (using log format)</exp:Name>
    <exp:ProtocolLSID>urn:lsid:localhost:Protocol:MinimalRunProtocol.WithTemplates</exp:ProtocolLSID>
    <exp:ExperimentLog>
      <exp:ExperimentLogEntry ActionSequenceRef="1"/>
      <exp:ExperimentLogEntry ActionSequenceRef="10"/>
      <exp:ExperimentLogEntry ActionSequenceRef="20"/>
      <exp:ExperimentLogEntry ActionSequenceRef="30"/>
    </exp:ExperimentLog>
    <exp:ProtocolApplications/>
  </exp:ExperimentRun>
</exp:ExperimentRuns> |
When loading a xar.xml using the ExperimentLog section, the loader generates ProtocolApplication records and their inputs/outputs. For this generation process to work, there must be at least one ExperimentLogEntry in the ExperimentLog section of the xar.xml, and the GenerateDataFromStepRecord attribute of the ExperimentRun must be either missing or have an explicit value of false.
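The two preconditions above can be checked before attempting a load. The following is a minimal sketch rather than the loader's own logic: it parses the ExperimentRuns fragment from Example 2 with Python's standard library and tests both conditions. The `exp` namespace URI and the helper name `can_generate_applications` are assumptions made for this illustration; use the namespace declared in your own xar.xml.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: the exp URI is an assumption for this sketch; copy the
# value declared at the top of your own xar.xml.
NS = {
    "exp": "http://cpas.fhcrc.org/exp/xml",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

RUN_XML = """
<exp:ExperimentRuns xmlns:exp="http://cpas.fhcrc.org/exp/xml"
                    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <exp:ExperimentRun rdf:about="urn:lsid:localhost:ExperimentRun:MinimalExperimentRun.WithTemplates">
    <exp:Name>Example 2 (using log format)</exp:Name>
    <exp:ProtocolLSID>urn:lsid:localhost:Protocol:MinimalRunProtocol.WithTemplates</exp:ProtocolLSID>
    <exp:ExperimentLog>
      <exp:ExperimentLogEntry ActionSequenceRef="1"/>
      <exp:ExperimentLogEntry ActionSequenceRef="10"/>
      <exp:ExperimentLogEntry ActionSequenceRef="20"/>
      <exp:ExperimentLogEntry ActionSequenceRef="30"/>
    </exp:ExperimentLog>
    <exp:ProtocolApplications/>
  </exp:ExperimentRun>
</exp:ExperimentRuns>
"""

LOG_ENTRY_PATH = "exp:ExperimentLog/exp:ExperimentLogEntry"


def can_generate_applications(run):
    """Hypothetical helper: True if the run meets both stated preconditions."""
    entries = run.findall(LOG_ENTRY_PATH, NS)
    flag = run.get("GenerateDataFromStepRecord")  # None when the attribute is missing
    flag_ok = flag is None or flag.strip().lower() == "false"
    return len(entries) >= 1 and flag_ok


root = ET.fromstring(RUN_XML)
run = root.find("exp:ExperimentRun", NS)
refs = [int(e.get("ActionSequenceRef")) for e in run.iterfind(LOG_ENTRY_PATH, NS)]

print(can_generate_applications(run))  # True
print(refs)                            # [1, 10, 20, 30]
```

The sketch only inspects the file; it says nothing about how the loader reports a run that fails either condition.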
The xar loader uses the following process:
As described above, four protocol properties govern how many ProtocolApplication objects are generated for an ExperimentLogEntry, and how many output objects are generated for each ProtocolApplication:
Property | Allowed values | Effect of property value |
MaxInputMaterialPerInstance, MaxInputDataPerInstance | 0 | The protocol does not accept [Material or Data] objects as inputs. |
MaxInputMaterialPerInstance, MaxInputDataPerInstance | 1 | For every [Material or Data] object output by a predecessor step, create a new ProtocolApplication for this protocol. |
MaxInputMaterialPerInstance, MaxInputDataPerInstance | n > 1 | For every n [Material or Data] objects output by a predecessor step, create a new ProtocolApplication. If the number of [Material or Data] objects output by predecessors does not divide evenly by n, a warning is written to the log. |
MaxInputMaterialPerInstance, MaxInputDataPerInstance | xsi:nil="true" | Equivalent to "unlimited". Create a single ProtocolApplication object and assign all [Material or Data] outputs of predecessors as inputs to this single instance. |
MaxInputMaterialPerInstance, MaxInputDataPerInstance | Combined constraint | If both MaxInputMaterialPerInstance and MaxInputDataPerInstance are not nil, then at least one of the two values must be 0 for the loader to automatically generate ProtocolApplication objects. |
OutputMaterialPerInstance, OutputDataPerInstance | 0 | An application of this Protocol does not create [Material or Data] outputs. |
OutputMaterialPerInstance, OutputDataPerInstance | 1 | Each ProtocolApplication of this Protocol "creates" one [Material or Data] object. |
OutputMaterialPerInstance, OutputDataPerInstance | n > 1 | Each ProtocolApplication of this Protocol "creates" n [Material or Data] objects. |
OutputMaterialPerInstance, OutputDataPerInstance | xsi:nil="true" | Equivalent to "unknown". Each ProtocolApplication of this Protocol may create 0, 1 or many [Material or Data] outputs, but none are generated automatically. Its effect is currently equivalent to a value of 0, but in a future version of the software a nil value might be the signal to ask a custom load handler how many outputs to generate. |
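The counting rules in this table can be sketched in a few lines of Python. This is an illustration of the rules as stated here, not the loader's implementation. The function names are hypothetical, `None` stands in for xsi:nil="true", and the handling of leftover inputs is an assumption: the text says a warning is logged when inputs do not divide evenly, but not what happens to the remainder.

```python
def applications_for(max_per_instance, available):
    """Count ProtocolApplications driven by one input kind (Material or Data).

    max_per_instance: int, or None to model xsi:nil="true".
    available: number of objects of that kind output by predecessor steps.
    Returns (applications, warn); warn is True when inputs do not divide evenly.
    """
    if max_per_instance is None:
        return 1, False            # "unlimited": one application takes everything
    if max_per_instance == 0:
        return 0, False            # this kind is not accepted as input
    full_groups, remainder = divmod(available, max_per_instance)
    # Assumption: only full groups are counted here; the fate of leftover
    # objects is not specified in the text.
    return full_groups, remainder != 0


def outputs_for(output_per_instance, applications):
    """Count outputs generated automatically for one output kind."""
    if output_per_instance is None:
        return 0                   # "unknown": currently behaves like 0
    return output_per_instance * applications


def auto_generation_allowed(max_material, max_data):
    """Combined constraint: if neither value is nil, one of them must be 0."""
    if max_material is None or max_data is None:
        return True
    return max_material == 0 or max_data == 0


# A step with 1 Material in and 4 Materials out, fed by one prepared sample.
apps, warn = applications_for(1, available=1)
aliquots = outputs_for(4, apps)
print(apps, warn, aliquots)            # 1 False 4

# A downstream step that takes 1 Material per instance (assumed values).
print(applications_for(1, aliquots))   # (4, False)
# Grouping 4 inputs by 3 leaves one over, so the warning flag is set.
print(applications_for(3, aliquots))   # (1, True)
print(auto_generation_allowed(1, 1))   # False
```

Modeling nil as `None` keeps the "unlimited" and "unknown" cases distinct from 0, which matters for inputs even though nil and 0 currently behave the same for outputs.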
A ProtocolParameter has both a short name and a fully-qualified name (the "OntologyEntryURI" attribute). Currently both need to be specified for all parameters. These parameters are declared by including a SimpleVal element in the definition. If the SimpleVal element has non-empty content, the content is treated as the default value for the parameter. Non-default values can be specified in the ExperimentLogEntry node, but Example 2 does not do this.
Name | Fully-qualified name | Purpose |
ApplicationLSIDTemplate | terms.fhcrc.org#XarTemplate.ApplicationLSID | LSID of a generated ProtocolApplication |
ApplicationNameTemplate | terms.fhcrc.org#XarTemplate.ApplicationName | Name of a generated ProtocolApplication |
OutputMaterialLSIDTemplate | terms.fhcrc.org#XarTemplate.OutputMaterialLSID | LSID of an output Material object |
OutputMaterialNameTemplate | terms.fhcrc.org#XarTemplate.OutputMaterialName | Name of an output Material object |
OutputDataLSIDTemplate | terms.fhcrc.org#XarTemplate.OutputDataLSID | LSID of an output Data object |
OutputDataNameTemplate | terms.fhcrc.org#XarTemplate.OutputDataName | Name of an output Data object |
OutputDataFileTemplate | terms.fhcrc.org#XarTemplate.OutputDataFile | Path name of an output Data object, used to set the DataFileUrl property. Relative to the OutputDataDir directory, if set; otherwise relative to the directory containing the xar.xml file |
OutputDataDirTemplate | terms.fhcrc.org#XarTemplate.OutputDataDir | Directory for files associated with output Data objects, used to set the DataFileUrl property. Relative to the directory containing the xar.xml file |
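To see how the declared parameters and their default values sit in the file, the following sketch reads the ParameterDeclarations of the Example 2 sample prep protocol and collects each SimpleVal with non-empty content. It is an illustration only: the `exp` namespace URI and the helper name `template_defaults` are assumptions, and the fragment is trimmed to the elements the sketch needs.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; copy the value declared in your own xar.xml.
NS = {"exp": "http://cpas.fhcrc.org/exp/xml"}

PROTOCOL_XML = """
<exp:Protocol xmlns:exp="http://cpas.fhcrc.org/exp/xml"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              rdf:about="urn:lsid:localhost:Protocol:SamplePrep.WithTemplates">
  <exp:Name>Sample Prep Protocol</exp:Name>
  <exp:ParameterDeclarations>
    <exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String">urn:lsid:localhost:ProtocolApplication:DoSamplePrep.WithTemplates</exp:SimpleVal>
    <exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String">Prepare sample</exp:SimpleVal>
    <exp:SimpleVal Name="OutputMaterialLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialLSID" ValueType="String">urn:lsid:localhost:Material:PreparedSample.WithTemplates</exp:SimpleVal>
    <exp:SimpleVal Name="OutputMaterialNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialName" ValueType="String">Prepared sample</exp:SimpleVal>
  </exp:ParameterDeclarations>
</exp:Protocol>
"""


def template_defaults(protocol):
    """Hypothetical helper: map short name -> (fully-qualified name, default)."""
    defaults = {}
    for val in protocol.iterfind("exp:ParameterDeclarations/exp:SimpleVal", NS):
        text = (val.text or "").strip()
        if text:  # non-empty content is treated as the default value
            defaults[val.get("Name")] = (val.get("OntologyEntryURI"), text)
    return defaults


protocol = ET.fromstring(PROTOCOL_XML)
defaults = template_defaults(protocol)
for name, (uri, value) in defaults.items():
    print(f"{name}: {value}  ({uri})")
```

Each entry carries both the short name and the OntologyEntryURI, mirroring the requirement that both be specified for every parameter.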
The LSIDs in Example 2 include an arbitrary ".WithTemplates" suffix, where the same LSIDs in Example 1 include ".FixedLSID" as a suffix. The only purpose of these LSID endings is to make the LSIDs unique between Examples 1 and 2. Otherwise, if a user tried to load Example 1 onto the same LabKey Server system as Example 2, the second load would fail with an "LSID already exists" error in the log. The behavior of the Xar loader when it encounters a duplicate LSID already in the database depends on the object it is attempting to load:
Users will encounter problems and confusion when LSIDs overlap or conflict unexpectedly. If a protocol reuses an existing LSID unexpectedly, for example, the user will not see the effect of protocol properties set in his or her xar.xml, but will see the previously loaded properties. If an experiment run uses the same LSID as a previously loaded run, the new run will fail to load and the user may be confused as to why.
Fortunately, the LabKey Server Xar loader has a feature called substitution templates that can alleviate the problems of creating unique LSIDs. If an LSID string in a xar.xml file contains one of these substitution templates, the loader will replace the template with a generated string at load time. A separate document called Life Sciences Identifiers (LSIDs) in LabKey Server details the structure of LSIDs and the substitution templates available. Example 3 uses these substitution templates in all of its LSIDs.
Example 3 also shows a fractionation protocol that generates multiple output materials for one input material. In order to generate unique LSIDs for all outputs, the OutputMaterialLSIDTemplate uses ${OutputInstance} to append a digit to the generated output object LSIDs. Since the subsequent protocol steps operate on only one input per instance, the LSIDs of all downstream objects from the fractionation step also need an instance number qualifier to maintain uniqueness. Object names also use instance numbers to remain distinct, though there is no uniqueness requirement for object Names.
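The effect of the instance-number templates can be sketched with Python's `string.Template`, which happens to share the `${name}` placeholder syntax. This is not how the Xar loader is implemented, and the template strings below are hypothetical stand-ins, not the values used in Example 3. The sketch assumes zero-based instance numbers and leaves `${RunLSIDBase}` unexpanded because its expansion happens on the server.

```python
from string import Template

# Hypothetical template values, chosen only for illustration.
OUTPUT_LSID_TEMPLATE = Template("${RunLSIDBase}:Aliquot.${OutputInstance}")
OUTPUT_NAME_TEMPLATE = Template("Aliquot ${OutputInstance}")
DOWNSTREAM_NAME_TEMPLATE = Template("Analysis of aliquot ${InputInstance}")

OUTPUTS_PER_INSTANCE = 4  # one input material fractionated into four outputs

# safe_substitute fills in OutputInstance and leaves ${RunLSIDBase} untouched.
lsids = [
    OUTPUT_LSID_TEMPLATE.safe_substitute(OutputInstance=i)
    for i in range(OUTPUTS_PER_INSTANCE)
]
names = [
    OUTPUT_NAME_TEMPLATE.substitute(OutputInstance=i)
    for i in range(OUTPUTS_PER_INSTANCE)
]
# Each downstream application consumes one output, so it is numbered by input.
downstream = [
    DOWNSTREAM_NAME_TEMPLATE.substitute(InputInstance=i)
    for i in range(len(names))
]

print(lsids)
print(names)
print(downstream)
print(len(set(lsids)) == len(lsids))  # True: every generated LSID is distinct
```

Without the instance number in the LSID template, all four outputs would expand to the same string, which is the collision the templates are meant to prevent.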
The Protocol objects in Example 3 use the ${FolderLSIDBase} substitution template. The Xar loader will create an LSID that looks like
urn:lsid:proteomics.fhcrc.org
The integer “3017” in this LSID is unique to the folder in which the xar.xml load is being run. This means that other xar.xml files that use the same protocol (i.e. the Protocol element has the same rdf:about value, including template) and are loaded into the same folder will use the already-loaded protocol definition.
If a xar.xml file with the same protocol is loaded into a different folder, a new Protocol record will be inserted into the database. The LSID of this record will be the same except for the number encoded in the “Folder-xxxx” portion of the namespace.
|
… <exp:Experiment rdf:about="${FolderLSIDBase}:Tutorial"> <exp:Name>Tutorial Examples</exp:Name> </exp:Experiment>
<exp:ProtocolDefinitions> <exp:Protocol rdf:about="${FolderLSIDBase}:Example3Protocol"> <exp:Name>Example 3 Protocol</exp:Name> <exp:ProtocolDescription>This protocol and its children use substitution strings to generate LSIDs on load.</exp:ProtocolDescription> <exp:ApplicationType>ExperimentRun</exp:ApplicationType> <exp:MaxInputMaterialPerInstance xsi:nil="true"/> <exp:MaxInputDataPerInstance xsi:nil="true"/> <exp:OutputMaterialPerInstance xsi:nil="true"/> <exp:OutputDataPerInstance xsi:nil="true"/> <exp:ParameterDeclarations> <exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String"> <exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String">Application of MinimalRunProtocol</exp:SimpleVal> </exp:ParameterDeclarations> </exp:Protocol> … |
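The folder-scoped reuse of Protocol definitions described above can be modeled with a small in-memory store. This is a sketch of the described behavior only: the store, the function name `load_protocol`, and the second folder number 3018 are hypothetical, and the expanded LSID is represented by the pair (folder number, rdf:about value) rather than by its real string form.

```python
# Hypothetical in-memory stand-in for the Protocol table.
protocols = {}  # (folder_id, rdf_about) -> protocol properties


def load_protocol(folder_id, rdf_about, properties):
    """Return (record, inserted). An existing record in the same folder wins."""
    key = (folder_id, rdf_about)
    if key in protocols:
        return protocols[key], False   # reuse the already-loaded definition
    protocols[key] = dict(properties)
    return protocols[key], True        # new folder or new LSID: insert


ABOUT = "${FolderLSIDBase}:Example3Protocol"

first, inserted_first = load_protocol(3017, ABOUT, {"Name": "Example 3 Protocol"})
# Second xar.xml in the same folder, same rdf:about, edited properties.
second, inserted_second = load_protocol(3017, ABOUT, {"Name": "Edited name"})
# Same file loaded into a different (hypothetical) folder.
other, inserted_other = load_protocol(3018, ABOUT, {"Name": "Example 3 Protocol"})

print(inserted_first, inserted_second, inserted_other)  # True False True
print(second["Name"])                                   # Example 3 Protocol
```

The second load shows why an unexpectedly reused protocol LSID is confusing: the edited properties are silently ignored in favor of the previously loaded ones.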
The records that make up the details of an experiment run (ProtocolApplication objects and their Data or Material outputs) are commonly loaded multiple times in one folder. This happens, for example, when a researcher applies the exact same protocol to different starting samples in different runs. To keep the LSIDs of the output objects of the runs unique, the ${RunLSIDBase} template is useful. It does the same thing as ${FolderLSIDBase} except that the namespace contains an integer unique to the run being loaded. These LSIDs look like
urn:lsid:proteomics.fhcrc.org
|
<exp:Protocol rdf:about="${FolderLSIDBase}:Divide_sample"> <exp:Name>Divide sample</exp:Name> <exp:ProtocolDescription>Divide sample into 4 aliquots</exp:ProtocolDescription> <exp:ApplicationType>ProtocolApplication</exp:ApplicationType> <exp:MaxInputMaterialPerInstance>1</exp:MaxInputMaterialPerInstance> <exp:MaxInputDataPerInstance>0</exp:MaxInputDataPerInstance> <exp:OutputMaterialPerInstance>4</exp:OutputMaterialPerInstance> <exp:OutputDataPerInstance>0</exp:OutputDataPerInstance> <exp:OutputDataType>Data</exp:OutputDataType> <exp:ParameterDeclarations> <exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String"> <exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String">Divide sample into 4</exp:SimpleVal>
|
Example 3 also includes an aliquot step, taking an input prepared material and producing 4 output materials that are measured portions of the input. In order to model this additional step, the xar.xml needs to include the following in the Protocol of the new step:
· Set OutputMaterialPerInstance to 4.
· Use ${OutputInstance} in the LSIDs and names of the generated output Material objects. In this example its value ranges from 0 to 3.
· Use ${InputInstance} in subsequent Protocol definitions and their outputs.
Using ${InputInstance} in the protocol applications that are downstream of the aliquot step is necessary because there will be one ProtocolApplication object for each output of the previous step.
|
<exp:SimpleVal Name="OutputMaterialLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialLSID" ValueType="String"> <exp:SimpleVal Name="OutputMaterialNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputMaterialName" ValueType="String"> </exp:ParameterDeclarations> </exp:Protocol>
<exp:Protocol rdf:about="${FolderLSIDBase}:Analyze"> <exp:Name>Example analysis protocol</exp:Name> … <exp:ParameterDeclarations> <exp:SimpleVal Name="ApplicationLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationLSID" ValueType="String"> <exp:SimpleVal Name="ApplicationNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.ApplicationName" ValueType="String"> <exp:SimpleVal Name="OutputDataLSIDTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputDataLSID" ValueType="String"> <exp:SimpleVal Name="OutputDataNameTemplate" OntologyEntryURI="terms.fhcrc.org#XarTemplate.OutputDataName" ValueType="String"> </exp:ParameterDeclarations> </exp:Protocol>
|
When adding a new protocol step to a run, the xar.xml author must also add a ProtocolAction element that gives the step an ActionSequence number. This number must fall between the sequence numbers of its predecessor(s) and its successors. In this example, the Divide_sample step was inserted between the prepare and analyze steps and assigned a sequence number of 15. The succeeding step (Analyze) also needed an update of its PredecessorAction sequence ref, but none of the other action definition steps needed to be changed. (This is why it is useful to leave gaps in the sequence numbers when hand-editing xar.xml files.)
|
<exp:ProtocolActionDefinitions>
  <exp:ProtocolActionSet ParentProtocolLSID="${FolderLSIDBase}:Example3Protocol">
    ..
    <exp:ProtocolAction ChildProtocolLSID="${FolderLSIDBase}:Divide_sample" ActionSequence="15">
      <exp:PredecessorAction ActionSequenceRef="10"/>
    </exp:ProtocolAction>
    <exp:ProtocolAction ChildProtocolLSID="${FolderLSIDBase}:Analyze" ActionSequence="20">
      <exp:PredecessorAction ActionSequenceRef="15"/>
    </exp:ProtocolAction>
    …
  </exp:ProtocolActionSet>
</exp:ProtocolActionDefinitions> |
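The bookkeeping for inserting a step between two existing actions can be sketched as follows. The function name `insert_step` and the dictionary layout are hypothetical, and only the actions shown in the fragment above are modeled; the predecessors of step 10 are elided in the example and are left empty here.

```python
def insert_step(actions, new_seq, pred_seq, succ_seq):
    """Insert an action between pred_seq and succ_seq.

    actions maps ActionSequence -> list of PredecessorAction refs.
    Returns a new mapping; the input is not modified.
    """
    if not pred_seq < new_seq < succ_seq:
        raise ValueError("new sequence number must fall between its neighbors")
    if new_seq in actions:
        raise ValueError(f"sequence number {new_seq} is already in use")
    updated = {seq: list(preds) for seq, preds in actions.items()}
    updated[new_seq] = [pred_seq]
    # Only the successor is rewired; every other action keeps its refs.
    updated[succ_seq] = [new_seq if p == pred_seq else p for p in updated[succ_seq]]
    return dict(sorted(updated.items()))


# Before: the prepare step (10) feeds Analyze (20) directly.
before = {10: [], 20: [10]}
after = insert_step(before, new_seq=15, pred_seq=10, succ_seq=20)
print(after)  # {10: [], 15: [10], 20: [15]}
```

Had the original steps been numbered 1 and 2, there would be no free number between them, and every later action and reference would need renumbering.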
One other useful substitution template is ${XarFileId}. On load, this template becomes an integer unique to the xar.xml file. In Example 3, the Starting_Sample gets a new LSID for every new xar.xml it is loaded from. |
<exp:StartingInputDefinitions>
  <exp:Material rdf:about="${FolderLSIDBase}.${XarFileId}:Starting_Sample">
    <exp:Name>Starting Sample</exp:Name>
  </exp:Material>
</exp:StartingInputDefinitions> |
Example 3 illustrates the difference between log entry format and export format more clearly. The file Example3.xar.xml uses the log entry format. It has 120 lines altogether, of which 15 are in the ExperimentRuns section. The file Example3_exportformat.xar.xml describes the exact same experiment but is 338 lines long. All of the additional lines are in the ExperimentRuns section, describing the ProtocolApplications and their inputs and outputs explicitly.