Premium Feature — Available in the Professional, Professional Plus, and Enterprise Editions. Learn more or contact LabKey.

The filter strategy is how the ETL process identifies new rows in the source database. Filter strategies use a designated field on the destination table to be compared to the source and only pulls over new rows based on that field. There are three options:

  • SelectAllFilterStrategy: Apply no further filter to the source; simply transform/transfer all rows returned by the query.
  • ModifiedSinceFilterStrategy: Use a specified date/timestamp column (timeStampColumnName) to identify the new records. Rows changed since the last run will be transformed. This is the most commonly used filter strategy.
  • RunFilterStrategy: Check a specified column, typically an increasing integer column (e.g. Run ID), against a given or stored value. For instance, any rows with a higher value than when the ETL process was last run are transformed. Often used for relational data.
For example, the strategy below says to check for updated data by consulting the "Date" field.

<incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="Date" />

Filter Strategies and Merge Options

 SelectAllRunModifiedSince
merge   
append  Add new rows to your target table and avoid conflicts/duplicate rows.
truncate   

Incremental Deletion of Target Rows

When incrementally deleting rows based on a selective filter strategy, use the element deletedRowsSource to correctly track the filtered values for deletion independently of the main query. Even if there are no new rows in the source query, any new records in the "deleteRowsSource" will still be found and deleted from the target. Using this method, the non-deleted rows will keep their row ids, maintaining any links to other objects in the target table.

Discussion

Was this content helpful?

Log in or register an account to provide feedback


previousnext
 
expand all collapse all