ETL: Examples: /home/

ETL: Examples

This page contains sample ETL configuration files you can use as templates for development.

Interval - 1 Hour
Interval - 5 Minutes
Cron - 1 Hour
Merge
Merge By Run ID
Merge with Alternate Key
Append with Two Targets
Truncate
Passing Parameters to a SQL Query
Truncate the Target Query
Java Transforms

Instructions for putting together working ETL samples (and a test environment) are available in this topic: Tutorial: Extract-Transform-Load (ETL).

Interval - 1 Hour

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <name>Patient - Merge</name>
    <description>Merges patient data to the target query.</description>
    <transforms>
        <transform id="1hour">
            <source schemaName="external" queryName="etl_source" />
            <destination schemaName="patient" queryName="etl_target" targetOption="merge"/>
        </transform>
    </transforms>
    <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="modified" />
    <schedule><poll interval="1h"></poll></schedule>
</etl>

Interval - 5 Minutes

<schedule><poll interval="5m" /></schedule>

Cron - 1 Hour

Check at midnight every day.

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <name>Cron on the hour</name>
    <transforms>
        <transform id="eachHour">
            <description>valid</description>
            <source schemaName="external" queryName="etl_source" />
            <destination schemaName="patient" queryName="etl_target" targetOption="merge"/>
        </transform>
    </transforms>
    <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="modified" />
    <schedule><cron expression="0 0 * * * ?" /></schedule>
</etl>

Merge

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <name>Merge</name>
    <description>Merge rows from etl_source to etl_target.</description>
    <transforms>
        <transform id="merge">
            <description>Merge to target.</description>
            <source schemaName="external" queryName="etl_source" />
            <destination schemaName="patient" queryName="etl_target" targetOption="merge"/>
        </transform>
    </transforms>
    <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="modified" />
</etl>

Merge by Run ID

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <name>MergeByRunId</name>
    <description>Merge by run id.</description>
    <transforms>
        <transform id="step1">
            <description>Copy to target</description>
            <source schemaName="patient" queryName="etlsource" />
            <destination schemaName="target" queryName="etltarget" />
        </transform>
    </transforms>
    <incrementalFilter className="RunFilterStrategy" runTableSchema="patient" 
      runTable="Transfer" pkColumnName="Rowid" fkColumnName="TransformRun" />
    <schedule>
        <poll interval="15s" />
    </schedule>
</etl>

Merge with Alternate Key

Specify an alternate key to use for merging when the primary key is not suitable, i.e. would cause duplicates or orphaned data.

<destination schemaName="vehicle" queryName="etl_target2" targetOption="merge"> 
      <alternateKeys> 
          <!-- The pk of the target table is the "rowId" column. Use "id" as an alternate match key --> 
          <column name="id"/> 
      </alternateKeys> 
</destination>

Append with Two Targets

For example, you might want to ensure that a given stored procedure is executed (step1) before loading the data into the destination (step2).

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <name>Append</name>
    <description>append rows from etl_source to etl_target and etl_target2</description>
    <transforms>
        <transform id="step1">
            <description>Copy to target</description>
            <source schemaName="external" queryName="etl_source" timestampcolumnname="modfiied" />
            <destination schemaName="patient" queryName="etl_target" />
        </transform>
        <transform id="step2">
            <description>Copy to target two</description>
            <source schemaName="external" queryName="etl_source" />
            <destination schemaName="patient" queryName="etl_target2" />
        </transform>
    </transforms>
    <incrementalFilter className="SelectAllFilterStrategy"/>
</etl>

Truncate

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <name>Truncate</name>
    <description>Clear target and append rows from etl_source.</description>
    <transforms>
        <transform id="step1">
            <description>Copy to target</description>
            <source schemaName="patient" queryName="etl_source" />
            <destination schemaName="patient" queryName="etl_target" targetOption="truncate"/>
        </transform>
    </transforms>
    <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="modified" />
    <schedule>
        <poll interval="15s" />
    </schedule>
</etl>

Passing Parameters to a SQL Query

The following ETL process passes parameters (MinTemp=99 and MinWeight=150) into its source query (a parameterized query).

<?xml version="1.0" encoding="UTF-8" ?>  
<etl xmlns="http://labkey.org/etl/xml">  
    <name>PatientsToTreated</name>  
    <description>Transfers from the Patients table to the Treated table.</description>  
    <transforms>  
        <transform id="step1">  
            <description>Patients to Treated Table</description>  
            <source queryName="Patients" schemaName="study"/>  
            <destination schemaName="study" queryName="Treated"/>  
        </transform>  
    </transforms> 
    <parameters> 
        <parameter name="MinTemp" value="99" type="DECIMAL" /> 
        <parameter name="MinWeight" value="150" type="DECIMAL" /> 
    </parameters> 
    <schedule>  
        <poll interval="1h"/>  
    </schedule>  
</etl>

Truncate the Target Query

The following truncates the target table, without copying any data from a source query. Note the lack of a <source> element.

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
    <name>Truncate Patients Table</name>
    <description>Update data for study on male patients.</description>
    <transforms>
        <transform id="trunc">
            <destination schemaName="study" queryName="Patients" targetOption="truncate"/>
        </transform>
    </transforms>
    <schedule>
        <poll interval="1h"/>
    </schedule>
</etl>

Java Transforms

Java developers can add a Java class to handle the transformation step of an ETL process. The column to be transformed points to the Java class as follows:

<columnTransforms>
    <column source="columnToTransform" transformClass="org.labkey.di.columnTransforms.MyJavaClass"/>
</columnTransforms>

The Java class receives the values of the column one row at a time. The Java class can validate, transform or perform some other action on these values. What is returned from the doTransform method of this class is what gets inserted into the target table. See below for an example implementation. Also see the ColumnTransform interface for available setters, getters, and methods.

The ETL source below uses the Java class org.labkey.di.columnTransforms.TestColumnTransform to apply changes to data in the "name" column.

ETL.xml

<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
<name>Append Transformed Column</name>
<description>Append rows from etl_source to etl_target, applying column transformation using a Java class.</description>
<transforms>
    <transform id="step1" type="org.labkey.di.pipeline.TransformTask">
        <description>Copy to target</description>
        <source schemaName="vehicle" queryName="etl_source" />
        <destination schemaName="vehicle" queryName="etl_target">
            <columnTransforms>
                <column source="name" transformClass="org.labkey.di.columnTransforms.TestColumnTransform"/>
            </columnTransforms>
            <constants>
                <column name="myConstant" type="varchar" value="aConstantValue"/>
            </constants>
        </destination>
    </transform>
</transforms>
<incrementalFilter className="SelectAllFilterStrategy" />
</etl>

The Java class below is used by the ETL process to apply transformations to the supplied column, in this case the "name" column.

TestColumnTransform.java

package org.labkey.di.columnTransforms;

import org.labkey.api.di.columnTransform.AbstractColumnTransform;

/**
 * An example of Java implementing a transform step.
 * Prepends the value of the "id" column of the source query
 * to the value of the source column specified in the ETL configuration xml,
 * then appends the value of the "myConstant" constant set in the xml.
 */
public class TestColumnTransform extends AbstractColumnTransform
{
    @Override
    protected Object doTransform(Object inputValue)
    {
        Object prefix = getInputValue("id");
        String prefixStr = null == prefix ? "" : prefix.toString();
        return prefixStr + "_" + inputValue + "_" + getConstant("myConstant");
    }
}

LabKey Support

LabKey Support

ETL: Examples

ETL: Examples

Interval - 1 Hour

Interval - 5 Minutes

Cron - 1 Hour

Merge

Merge by Run ID

Merge with Alternate Key

Append with Two Targets

Truncate

Passing Parameters to a SQL Query

Truncate the Target Query

Java Transforms

ETL.xml

TestColumnTransform.java

Search

Pages

LabKey Support

LabKey Support

ETL: Examples

ETL: Examples

Interval - 1 Hour

Interval - 5 Minutes

Cron - 1 Hour

Merge

Merge by Run ID

Merge with Alternate Key

Append with Two Targets

Truncate

Passing Parameters to a SQL Query

Truncate the Target Query

Java Transforms

ETL.xml

TestColumnTransform.java

Search

Scope ?

Categories ?

Sort ?

Pages