Premium Feature — This feature is available in the Professional, Professional Plus, and Enterprise Editions. Learn more or contact LabKey

File Watchers let site administrators set up monitoring of directories on the file system. Multiple file watchers can monitor a single directory for file changes, or each file watcher can watch a different location. When new or updated files appear in a monitored directory, a specified pipeline task (or set of tasks) is triggered. Pipeline tasks can include:

    • Reload a folder archive: Reload an unzipped folder archive.
    • Reload lists from data files: Import data to an existing List from source files.
    • Import/reload study datasets from data files: Create or reload study datasets from source files. This task can create dataset definitions if they do not already exist in the study.
    • Reload an entire study: Reload or populate a study from a study archive.
    • Import a directory of FCS files: Import flow files to the flow module.
Each File Watcher can be configured to trigger only when specific file name patterns are detected, such as watching for '.xlsx' files. Use caution when defining multiple file watchers to monitor the same location: if their file name patterns are not sufficiently distinct, you may encounter conflicts among file watchers acting on the same files.

When files are detected, by default they are moved (not copied) to the LabKey folder's pipeline root where they are picked up for processing. (You can change this default behavior and specify that the files be moved to a different location.)

Create a File Watcher

  • Navigate to the folder where you want the files to be imported, i.e. the destination in LabKey.
  • Open the File Watcher management UI:
    • If it is a study folder, click the Manage tab, then click Manage File Watchers.
    • In other folders, select (Admin) > Folder > Management. Click the Import tab and scroll down.
  • Depending on your project's enabled module set, options for trigger creation may vary. Click the desired link below Create a trigger to...:
    • Reload folder archive: This option reloads a previously created folder archive. The reloader expects an unzipped folder archive with a folder.xml in the base directory. To create an unzipped folder archive, export the folder to your browser as a .zip file, and then unzip it. (Available in any folder type.)
    • Reload lists using data file: This option imports data to existing Lists from source files. Note that this task cannot create a new List definition: the List definition must already exist on the server. The list module must also be enabled in the folder. By default, this task replaces List data with the contents of the source files. You may merge data instead by including the custom parameter: "mergeData": "true". (Available in any folder type.)
    • Reload study: This option reloads or populates a study from an unzipped study archive. This task is triggered by a .txt file (i.e. studyload.txt) and reloads a study (both its datasets and study configurations such as cohort assignments). To create a study archive, see Export a Study. (Available in a study folder.)
    • Import/reload study datasets using data file: This option creates and/or loads data into study datasets from source files. This task can create dataset definitions if they do not already exist in the study. Note that merging dataset data is not supported, only truncate and replace all: upon reload, the entire dataset design and data are replaced, and dataset columns are added or removed depending on the columns found in the Excel/TSV source file. (Available in a study folder.)
    • Import a directory of FCS files: Import flow files to the flow module. (Available in a flow folder.)
  • Manage File Watcher Triggers: Click to see the table of all currently configured file watchers.

Configure the Trigger

Details

  • Name - A unique name for the trigger.
  • Description - A description for the trigger.
  • Type - Currently supports one value 'pipeline-filewatcher'.
  • Pipeline Task - Tasks that can be run without user intervention, making them eligible for use during File Watcher imports. See above for detailed descriptions of these options. Options vary based on the modules present in the project, but may include:
    • Import/reload study datasets using data file - (either TSV or Excel)
    • Reload lists using a data file - (either TSV or Excel)
    • Reload study - import datasets, lists, and study properties
    • Import a directory of FCS files - This option is enabled inside a Flow folder.
  • Run as username - The file watcher will run as this user in the pipeline. It is strongly recommended that this user have elevated permissions to perform updates, deletes, etc.
  • Assay Provider - Use this provider for running assay import runs.
  • Enabled - Turns on detection and triggering.
Click Next to move to the next panel.

Configuration

  • Location - File drop location. This can be an absolute path on the server’s file system or a relative path under the container’s pipeline root.
  • Include child folders - A boolean indicating whether to seek uploadable files in subdirectories (currently to a max depth of 3).
  • File Pattern - A Java regular expression that captures file names of interest and can extract and use information from the file name to set other properties. We recommend testing the behavior of your file pattern with a regex interpreter, such as https://regex101.com/. The available file pattern options are described below.
  • Quiet period - Number of seconds to wait after file activity before executing a job (minimum is 1). If you encounter conflicts, particularly when running multiple file watchers monitoring the same location, try increasing the quiet period.
  • Move - Where the file should be moved. This must be a relative or absolute container path, and values captured from the file name can be substituted into the move path (for example, ${study}). If no move location is specified, the file’s location must be under a pipeline root.
  • Copy file to - Where the file should be copied to before analysis. This can be absolute or relative to the current project/folder.
  • Parameter Function - Include a JavaScript function to be executed during the move. (See details below.)
  • Add custom parameter - These parameters will be passed to the chosen pipeline task for consumption in addition to the standard configuration.
Click Save when finished.

File Pattern Options

No file pattern / Default file pattern

If no file pattern is supplied, the default pattern is used:

(^\D*)\.(?:tsv|txt|xls|xlsx)

This pattern matches only file names that contain no digits (for example: Dataset_A.tsv). File names that include digits (for example: Dataset_1.tsv) are not matched, and their data will not be loaded.

If you want to target datasets that have digits in their names, use a "name capture group" as the file pattern. See below for details.

Under the default file pattern, the following reloading behavior will occur:

File Name            File Watcher Behavior
DemographicsA.tsv    File matched, data loaded into dataset DemographicsA.
DemographicsB.tsv    File matched, data loaded into dataset DemographicsB.
Demographics1.tsv    No file match, data will not be loaded.
Demographics2.tsv    No file match, data will not be loaded.
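
To sanity check this behavior locally, you can run the same pattern through java.util.regex, the engine that evaluates a Java regular expression. A minimal sketch, assuming the watcher matches the pattern against the full file name (the class name is illustrative):

import java.util.List;
import java.util.regex.Pattern;

public class DefaultPatternCheck {
    public static void main(String[] args) {
        // The default File Watcher pattern: a run of non-digits, then a supported extension.
        Pattern p = Pattern.compile("(^\\D*)\\.(?:tsv|txt|xls|xlsx)");
        for (String name : List.of("DemographicsA.tsv", "DemographicsB.tsv",
                                   "Demographics1.tsv", "Demographics2.tsv")) {
            // matches() tests the entire file name against the pattern.
            System.out.println(name + " -> "
                    + (p.matcher(name).matches() ? "matched" : "no match"));
        }
    }
}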

User defined pattern

You can use any regex pattern to select source files for reloading. For example, suppose you have the following three source files:

FooStudy_Demographics.tsv
FooStudy_LabResults.tsv
BarStudy_Demographics.tsv

The regex file pattern...

FooStudy_(.+)\.(tsv)

will result in the following behavior...

File Name                    File Watcher Behavior
FooStudy_Demographics.tsv    File matched, data loaded into dataset FooStudy_Demographics.
FooStudy_LabResults.tsv      File matched, data loaded into dataset FooStudy_LabResults.
BarStudy_Demographics.tsv    No file match, data will not be loaded.
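
The same kind of local check works for a user-defined pattern; printing the capture group also shows which portion of the file name the pattern extracts. Again a minimal, illustrative sketch:

import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserPatternCheck {
    public static void main(String[] args) {
        // Only files from FooStudy should trigger the watcher.
        Pattern p = Pattern.compile("FooStudy_(.+)\\.(tsv)");
        for (String name : List.of("FooStudy_Demographics.tsv",
                                   "FooStudy_LabResults.tsv",
                                   "BarStudy_Demographics.tsv")) {
            Matcher m = p.matcher(name);
            System.out.println(name + " -> "
                    + (m.matches() ? "matched (group 1 = " + m.group(1) + ")" : "no match"));
        }
    }
}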

"Name Capture Group" pattern

This type of file pattern extracts names or IDs from the source file name and targets an existing dataset of the same name or id. For example, suppose you have a source file with the following name:

dataset_Demographics_.xls

The following file pattern extracts the value <name> from the file name, in this case the string "Demographics" that occurs between the underscore characters, and loads data into an existing dataset with the same name "Demographics".

dataset_(?<name>.+)_\.(xlsx|tsv|xls)

Note that you can use the technique above to target datasets that include numbers in their names. For example, using the pattern above, the following behavior will result.

File Name                    File Watcher Behavior
dataset_Demographics_.tsv    File matched, data loaded into dataset Demographics.
datasetDemographics.tsv      No file match, data will not be loaded.
dataset_LabResults1_.tsv     File matched, data loaded into dataset LabResults1.
dataset_LabResults2_.tsv     File matched, data loaded into dataset LabResults2.

To target a dataset by its dataset id rather than its name, use the following regex, where <id> refers to the dataset id. You can determine a dataset's id by navigating to your study's Manage tab and clicking Manage Datasets; the table of existing datasets shows the id for each dataset in the first column.

dataset_(?<id>.+)_\.(xlsx|tsv|xls)
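
To see what a name capture group extracts before wiring it into a trigger, the same local approach applies; named groups are read with Matcher.group("name"). Swap (?<name>...) for (?<id>...) to experiment with id-based targeting (class and file names are illustrative):

import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameCaptureCheck {
    public static void main(String[] args) {
        // The named group "name" holds the target dataset name.
        Pattern p = Pattern.compile("dataset_(?<name>.+)_\\.(xlsx|tsv|xls)");
        for (String file : List.of("dataset_Demographics_.tsv",
                                   "dataset_LabResults1_.tsv",
                                   "datasetDemographics.tsv")) {
            Matcher m = p.matcher(file);
            System.out.println(file + " -> "
                    + (m.matches() ? "targets dataset \"" + m.group("name") + "\"" : "no match"));
        }
    }
}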

Examples

Example: Dataset Creation and Import

Suppose you want to create a set of datasets based on Excel and TSV files, and load data into those datasets. To set this up, do the following:

  • Prepare your Excel/TSV files to match the expectations of your study, in particular the timepoint style (date-based or visit-based), the ParticipantId column name, and the time column name.
  • Copy the Excel/TSV files to a location available to the File Watcher. You can do this by either (1) copying the file to the server's machine or (2) uploading the file into the study's File Repository.
  • Create a trigger to Import/reload study datasets using data file.
  • Location: point the trigger at your directory of files.
  • When the trigger is enabled, datasets will be created and loaded in your study.

Example: FCS Files

Consider a process where FCS flow data is deposited in a common location by a number of users, with each data export placed into a subdirectory of the watched folder, perhaps in a separate subdirectory per user.

When the File Watcher finds these files, they are placed into a new location under the folder pipeline root based on the current user and date. Example: @pipeline/${username}/${date('YYYY-MM')}

LabKey then imports the FCS data to that container. All FCS files within a single directory are imported as a single experiment run in the flow module.

Example: File Name Pattern Matching

Consider a set of data with original filenames matching a format like this: "sample_<timestamp>_<study_id>.xml", for example:

sample_2017-09-06_study20.xml

An example file pattern regular expression that would capture such file names is:

sample_(.+)_(?<study>.+)\.xml

If the specified pattern matches a file placed in the watched location, then the specified move and/or execute steps will be performed on that file. Nothing will happen to files in the watched location which do not match the pattern.

If the regular expression contains named capturing groups, such as the "(?<study>.+)" portion in the example above, then the corresponding value (in this example, "study20") can be substituted into other property expressions. For instance, a move setting of:

/studies/${study}/@pipeline/import/${now:date}
would resolve into:
/studies/study20/@pipeline/import/2017-11-07 (or similar)
This substitution allows the administrator to configure the file watcher to automatically determine the destination folder based on the name, ensuring that the data is uploaded to the correct location.
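
LabKey performs this substitution internally; the sketch below only illustrates the idea, assuming a simple ${token} syntax with hard-coded illustrative values (it is not LabKey's actual resolver, and the now:date value is mocked rather than computed):

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MovePathSketch {
    // Replace each ${token} in the template with its value; unknown tokens are left as-is.
    static String resolve(String template, Map<String, String> values) {
        Matcher m = Pattern.compile("\\$\\{([^}]+)}").matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(
                    values.getOrDefault(m.group(1), m.group(0))));
        }
        return m.appendTail(out).toString();
    }

    public static void main(String[] args) {
        // "study" would come from the named capture group; "now:date" from the current date.
        Map<String, String> values = Map.of("study", "study20", "now:date", "2017-11-07");
        System.out.println(resolve("/studies/${study}/@pipeline/import/${now:date}", values));
        // Prints: /studies/study20/@pipeline/import/2017-11-07
    }
}

Path templates like @pipeline/${username}/${date('YYYY-MM')} in the FCS example above resolve in the same spirit.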

Example: Using the Parameter Function

The Parameter Function is a JavaScript function that is executed during the move. In the example below, the username is selected programmatically from the first element of the source file's path (sourcePath behaves like a java.nio.file.Path), and the returned value can then be referenced elsewhere, as in the ${username} path template above:

// Use the first path element (for example, a per-user subdirectory) as the username.
var userName = sourcePath.getNameCount() > 0 ? sourcePath.getName(0) : null;
// Return a map of parameter names to values; the final expression is the return value.
var ret = {'pipeline, username': userName};
ret;
