Analysis Archive Format

2024-03-28

Premium Feature — Available with all Premium Editions of LabKey Server. Learn more or contact LabKey.

The LabKey flow module supports importing and exporting analyses as a set of .tsv files and supporting files in a zip archive. The format is intended to be simple, so that tools can easily reformat the results of an external analysis engine for import into LabKey. Notably, the analysis definition is not included in the archive; it may be defined elsewhere, such as in a FlowJo workspace gating hierarchy, an R flowCore script, or some other software package.

Export an Analysis Archive

From the flow Runs or FCSAnalysis grid, you can export the analysis results including the original FCS files, keywords, compensation matrices, and statistics.

  • Open the analysis and select the runs to export.
  • Select (Export).
  • Click the Analysis tab.
  • Make the selections you need and click Export.

Import an Analysis Archive

To import a flow analysis archive, perhaps after making changes outside the server to add different statistics, graphs, or other information, follow these steps:

  • In the flow folder, Flow Summary web part, click Upload and Import.
  • Drag and drop the analysis archive into the upload panel.
  • Select the archive and click Import Data.
  • In the popup, confirm that Import External Analysis is selected.
  • Click Import.

Analysis Archive Format

In brief, the archive format contains the following files:

<root directory>
├─ keywords.tsv
├─ statistics.tsv
├─ compensation.tsv
├─ <comp-matrix01>
├─ <comp-matrix02>.xml
├─ graphs.tsv
├─ <Sample Name 01>/
│  ├─ <graph01>.png
│  └─ <graph02>.svg
└─ <Sample Name 02>/
   ├─ <graph01>.png
   └─ <graph02>.pdf

All analysis tsv files are optional. The keywords.tsv file lists the keywords for each sample. The statistics.tsv file contains summary statistic values for each sample in the analysis, grouped by population. The graphs.tsv file contains a catalog of graph images for each sample, where the image may be in any image format (pdf, png, svg, etc.). The compensation.tsv file contains a catalog of compensation matrices. To keep the directory listing clean, the graphs or compensation matrices may be grouped into sub-directories. For example, the graph images for each sample could be placed into a directory with the same name as the sample.
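As a rough illustration of the layout described above, the following Python sketch writes a keywords.tsv and a statistics.tsv into an in-memory zip archive. The helper names `tsv_bytes` and `build_archive` are hypothetical, and placing the files at the top level of the zip (rather than under a named root directory) is an assumption, not something this page confirms.

```python
import csv
import io
import zipfile


def tsv_bytes(header, rows):
    """Serialize a header and rows as tab-separated UTF-8 bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def build_archive(keywords, statistics):
    """Return the bytes of a zip holding keywords.tsv and statistics.tsv.

    Assumption: the tsv files sit at the top level of the zip.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(
            "keywords.tsv",
            tsv_bytes(["Sample", "Keyword", "Value"], keywords),
        )
        zf.writestr(
            "statistics.tsv",
            tsv_bytes(["Sample", "Population", "Statistic", "Value"], statistics),
        )
    return out.getvalue()


# Illustrative values only.
archive = build_archive(
    keywords=[("Sample1.fcs", "$DATATYPE", "F")],
    statistics=[("Sample1.fcs", "S/L/Lv", "Count", 12001)],
)
```

Graph images and compensation matrices could be added with further `writestr` calls, using the relative paths listed in graphs.tsv and compensation.tsv.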

ACS Container Format

The ACS container format is not sufficient for direct import to LabKey. The ACS table of contents only includes relationships between files and doesn’t include, for example, the population name and channel/parameter used to calculate a statistic or render a graph. If the ACS ToC could include those missing metadata, the graphs.tsv would be made redundant. The statistics.tsv would still be needed, however.

If you have analyzed results tsv files bundled inside an ACS container, you may be able to extract portions of the files for reformatting into the LabKey flow analysis archive zip format, but you would need to generate the graphs.tsv file manually.

Statistics File

The statistics.tsv file is a tab-separated list of values containing stat names and values. The statistic values may be grouped in a few different ways: (a) no grouping (one statistic value per line), (b) grouped by sample (each column is a new statistic), (c) grouped by sample and population (the current default encoding), or (d) grouped by sample, population, and channel.

Sample Name

Samples are identified by the value in the Sample column, so each sample name must be unique within the analysis. Usually the sample name is just the FCS file name including the ‘.fcs’ extension (e.g., “12345.fcs”).

Population Name

The population column is a unique name within the analysis that identifies the set of events that the statistics were calculated from. A common way to name the population is to use the gating path, with gate names separated by a forward slash. If a gate name starts with “(” or contains one of “/”, “{”, or “}”, it must be escaped. To escape a gate name, wrap the entire gate name in curly brackets { }. For example, the population “A/{B/C}” is the sub-population “B/C” of population “A”.
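The escaping rule can be sketched in Python as follows. The function names `escape_gate` and `population_path` are hypothetical. How a parser should treat a gate name that itself contains curly brackets is not spelled out here, so this sketch simply wraps such names like any other.

```python
def escape_gate(name):
    """Wrap a gate name in curly brackets when the rule above requires it."""
    if name.startswith("(") or any(ch in name for ch in "/{}"):
        return "{" + name + "}"
    return name


def population_path(gates):
    """Join gate names into a population name, escaping each as needed."""
    return "/".join(escape_gate(gate) for gate in gates)


print(population_path(["A", "B/C"]))  # prints A/{B/C}
```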

Statistic Name

The statistic is encoded in the column header as statistic(parameter:percentile), where the parameter and percentile portions are required or omitted depending upon the statistic type. The statistic part of the column header may be either the short name (“%P”) or the long name (“Frequency_Of_Parent”). The parameter part is required for the frequency of ancestor statistic and for other channel-based statistics. The frequency of ancestor statistic uses the name of an ancestor population as the parameter value, while the other statistics use a channel name as the parameter value. To represent compensated parameters, the channel name is wrapped in angle brackets, e.g., “<FITC-A>”. The percentile part is required only by the “Percentile” statistic and is an integer in the range of 1-99.

The statistic value is either an integer or a double. Count stats are integer values >= 0. Percentage stats are doubles in the range 0-100. Other stats are doubles. If the statistic is not present for the given sample and population, it is left blank.

Allowed Statistics

| Short Name | Long Name | Parameter | Type |
| --- | --- | --- | --- |
| Count | Count | n/a | Integer |
| % | Frequency | n/a | Double (0-100) |
| %P | Frequency_Of_Parent | n/a | Double (0-100) |
| %G | Frequency_Of_Grandparent | n/a | Double (0-100) |
| %of | Frequency_Of_Ancestor | ancestor population name | Double (0-100) |
| Min | Min | channel name | Double |
| Max | Max | channel name | Double |
| Median | Median | channel name | Double |
| Mean | Mean | channel name | Double |
| GeomMean | Geometric_Mean | channel name | Double |
| StdDev | Std_Dev | channel name | Double |
| rStdDev | Robust_Std_Dev | channel name | Double |
| MAD | Median_Abs_Dev | channel name | Double |
| MAD% | Median_Abs_Dev_Percent | channel name | Double (0-100) |
| CV | CV | channel name | Double |
| rCV | Robust_CV | channel name | Double |
| %ile | Percentile | channel name and percentile 1-99 | Double (0-100) |

For example, the following are valid statistic names:

  • Count
  • Robust_CV(<FITC>)
  • %ile(<Pacific-Blue>:30)
  • %of(Lymphocytes)
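To make the statistic(parameter:percentile) encoding concrete, here is a minimal Python sketch that splits a column header into its parts. The function `parse_statistic` and the `LONG_NAMES` mapping are hypothetical names; the mapping is copied from the Allowed Statistics table. The sketch assumes that the parameter itself contains no closing parenthesis and that only the Percentile statistic carries a ":percentile" suffix.

```python
import re

# Short name -> long name, copied from the Allowed Statistics table.
LONG_NAMES = {
    "Count": "Count", "%": "Frequency", "%P": "Frequency_Of_Parent",
    "%G": "Frequency_Of_Grandparent", "%of": "Frequency_Of_Ancestor",
    "Min": "Min", "Max": "Max", "Median": "Median", "Mean": "Mean",
    "GeomMean": "Geometric_Mean", "StdDev": "Std_Dev",
    "rStdDev": "Robust_Std_Dev", "MAD": "Median_Abs_Dev",
    "MAD%": "Median_Abs_Dev_Percent", "CV": "CV", "rCV": "Robust_CV",
    "%ile": "Percentile",
}

# A statistic name, optionally followed by "(argument)".
STAT_RE = re.compile(r"^(?P<stat>[^()]+?)(?:\((?P<arg>.*)\))?$")


def parse_statistic(header):
    """Return (long_name, parameter, percentile) for a statistic header."""
    match = STAT_RE.match(header)
    if match is None:
        raise ValueError(f"not a statistic name: {header!r}")
    stat = match.group("stat")
    long_name = LONG_NAMES.get(stat, stat)
    if long_name not in LONG_NAMES.values():
        raise ValueError(f"unknown statistic: {stat!r}")
    arg = match.group("arg")
    percentile = None
    if long_name == "Percentile":
        if arg is None:
            raise ValueError("Percentile requires a percentile part")
        # The percentile follows the last colon, e.g. "<Pacific-Blue>:30".
        parameter, _, pct = arg.rpartition(":")
        percentile = int(pct)
        if not 1 <= percentile <= 99:
            raise ValueError(f"percentile out of range: {percentile}")
        parameter = parameter or None
    else:
        parameter = arg
    return long_name, parameter, percentile
```

For instance, `parse_statistic("%ile(<Pacific-Blue>:30)")` yields the long name, the compensated channel, and the integer percentile as a tuple.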

Examples

NOTE: The following examples are for illustration purposes only.


No Grouping: One Row Per Sample and Statistic

The required columns are Sample, Population, Statistic, and Value. No extra columns are present. Each statistic is on a new line.

| Sample | Population | Statistic | Value |
| --- | --- | --- | --- |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | %P | 0.85 |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2- | Count | 12001 |
| Sample2.fcs | S/L/Lv/3+/{escaped/slash} | Median(FITC-A) | 23,000 |
| Sample2.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | %ile(<Pacific-Blue>:30) | 0.93 |


Grouped By Sample

The only required column is Sample. The remaining columns are statistic columns, where each column name contains the population name and statistic name separated by a colon.

| Sample | S/L/Lv/3+/4+/IFNg+IL2+:Count | S/L/Lv/3+/4+/IFNg+IL2+:%P | S/L/Lv/3+/4+/IFNg+IL2-:%ile(<Pacific-Blue>:30) | S/L/Lv/3+/4+/IFNg+IL2-:%P |
| --- | --- | --- | --- | --- |
| Sample1.fcs | 12001 | 0.93 | 12314 | 0.24 |
| Sample2.fcs | 13056 | 0.85 | 13023 | 0.56 |


Grouped By Sample and Population

The required columns are Sample and Population. The remaining columns are statistic names including any required parameter part and percentile part.

| Sample | Population | Count | %P | Median(FITC-A) | %ile(<Pacific-Blue>:30) |
| --- | --- | --- | --- | --- | --- |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | 12001 | 0.93 | 45223 | 12314 |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2- | 12312 | 0.94 | | 12345 |
| Sample2.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | 13056 | 0.85 | | 13023 |
| Sample2.fcs | S/L/Lv/{slash/escaped} | 3042 | 0.35 | 13023 | |
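The grouped encodings carry the same information as the one-row-per-statistic encoding. As a sketch of that relationship, the Python function below (the name `ungroup` is hypothetical) reads a statistics.tsv grouped by sample and population, then returns one (Sample, Population, Statistic, Value) tuple per non-blank cell. The input text is made-up sample data in the spirit of the examples on this page.

```python
import csv
import io


def ungroup(tsv_text):
    """Flatten a sample-and-population grouped table into long-form rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for record in reader:
        sample = record.pop("Sample")
        population = record.pop("Population")
        for statistic, value in record.items():
            # Blank cells mean the statistic is absent for this row.
            if isinstance(value, str) and value.strip():
                rows.append((sample, population, statistic, value.strip()))
    return rows


grouped = (
    "Sample\tPopulation\tCount\t%P\tMedian(FITC-A)\n"
    "Sample1.fcs\tS/L/Lv\t12001\t0.93\t45223\n"
    "Sample2.fcs\tS/L/Lv\t13056\t0.85\t\n"
)
long_rows = ungroup(grouped)
```

Values are kept as strings here; converting them to integers or doubles would depend on the statistic type.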


Grouped By Sample, Population, and Parameter

The required columns are Sample, Population, and Parameter. The remaining columns are statistic names with any required percentile part.

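| Sample | Population | Parameter | Count | %P | Median | %ile(30) |
| --- | --- | --- | --- | --- | --- | --- |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | | 12001 | 0.93 | | |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | FITC-A | | | 45223 | |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | <Pacific-Blue> | | | | 12314 |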
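Comparing this table with the previous one suggests that the Parameter column supplies the parameter part of each statistic name. The Python sketch below recombines the two under that reading. The function `full_statistic_name` is a hypothetical helper, and the combination rule is inferred from the examples rather than stated by the format.

```python
def full_statistic_name(header, parameter):
    """Combine a column header with the row's Parameter value.

    Assumption: "Median" + "FITC-A" means "Median(FITC-A)", and
    "%ile(30)" + "<Pacific-Blue>" means "%ile(<Pacific-Blue>:30)".
    """
    if not parameter:
        # Statistics such as Count or %P take no channel parameter.
        return header
    if header.endswith(")") and "(" in header:
        # The header already carries a percentile part, e.g. "%ile(30)".
        stat, _, percentile = header[:-1].partition("(")
        return f"{stat}({parameter}:{percentile})"
    return f"{header}({parameter})"
```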


Graphs File

The graphs.tsv file is a catalog of plot images generated by the analysis. It is similar to the statistics file and lists the sample name, plot file name, and plot parameters. Currently, the only plot parameters included in the graphs.tsv file are the population and the x and y axes. The graphs.tsv file contains one graph image per row. The population column is encoded in the same manner as in the statistics.tsv file. The graph column is the colon-concatenated x and y axes used to render the plot. Compensated parameters are surrounded with <> angle brackets. (Future formats may split x and y axes into separate columns to ease parsing.) The path is a relative file path to the image (no “.” or “..” is allowed in the path) and the image name is usually just an MD5-sum of the graph bytes.

Multi-sample or multi-plot images are not yet supported.

| Sample | Population | Graph | Path |
| --- | --- | --- | --- |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | <APC-A> | sample01/graph01.png |
| Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2- | SSC-A:<APC-A> | sample01/graph02.png |
| Sample2.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | FSC-H:FSC-A | sample02/graph01.svg |
| ... | | | |
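The following Python sketch shows one way a tool might build a graphs.tsv row: it names the image after the MD5 sum of its bytes, places it in a directory named after the sample, and checks the path rule. The helpers `is_valid_relative_path` and `graph_row` are hypothetical. The sketch reads "no . or .. is allowed" as applying to whole path components, since file names such as graph01.png necessarily contain a dot.

```python
import hashlib


def is_valid_relative_path(path):
    """Accept relative paths with no empty, '.' or '..' components."""
    if not path or path.startswith("/"):
        return False
    return all(part not in ("", ".", "..") for part in path.split("/"))


def graph_row(sample, population, x_axis, y_axis, image_bytes, extension):
    """Build a (Sample, Population, Graph, Path) row for graphs.tsv."""
    digest = hashlib.md5(image_bytes).hexdigest()
    # Assumption: images are grouped in a directory named after the sample.
    path = f"{sample}/{digest}.{extension}"
    if not is_valid_relative_path(path):
        raise ValueError(f"invalid image path: {path!r}")
    # A single-axis plot such as a histogram has only an x axis.
    graph = x_axis if y_axis is None else f"{x_axis}:{y_axis}"
    return (sample, population, graph, path)


# Placeholder bytes stand in for a real image.
row = graph_row("Sample1.fcs", "S/L/Lv", "SSC-A", "<APC-A>", b"fake image bytes", "png")
```

The same path check would apply to the Path column of compensation.tsv, which has the same restriction.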


Compensation File

The compensation.tsv file maps sample names to compensation matrix file paths. The required columns are Sample and Path. The path is a relative file path to the matrix (no “.” or “..” is allowed in the path). The compensation matrix file is either in the FlowJo compensation matrix file format or a GatingML transforms:spilloverMatrix XML document.

| Sample | Path |
| --- | --- |
| Sample1.fcs | compensation/matrix1 |
| Sample2.fcs | compensation/matrix2.xml |


Keywords File

The keywords.tsv lists the keyword names and values for each sample. This file has the required columns Sample, Keyword, and Value.

| Sample | Keyword | Value |
| --- | --- | --- |
| Sample1.fcs | $MODEL | |
| Sample1.fcs | $DATATYPE | F |
| ... | | |