XML Schema "etl.xsd"
Target Namespace:
http://labkey.org/etl/xml
Defined Components:
1 global element, 30 local elements, 26 complexTypes, 5 simpleTypes
Default Namespace-Qualified Form:
Local Elements: qualified; Local Attributes: unqualified
Schema Location:
C:\dev\labkey\labkeyHome\server\modules\premiumModules\dataintegration\schemas\etl.xsd
Imports Schema:
queryCustomView.xsd
XML Source
<?xml version="1.0" encoding="UTF-8"?>
<schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://labkey.org/etl/xml" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:cv="http://labkey.org/data/xml/queryCustomView" xmlns:etl="http://labkey.org/etl/xml">
<element name="etl" type="etl:EtlType"/>
<complexType name="EtlType">
<all>
<element maxOccurs="1" name="name" type="string"/>
<element maxOccurs="1" minOccurs="0" name="description" type="string"/>
<element maxOccurs="1" name="transforms" type="etl:TransformsType"/>
<element maxOccurs="1" minOccurs="0" name="schedule" type="etl:ScheduleType"/>
<element maxOccurs="1" minOccurs="0" name="incrementalFilter" type="etl:FilterType"/>
<element maxOccurs="1" minOccurs="0" name="parameters" type="etl:ParametersType"/>
<element maxOccurs="1" minOccurs="0" name="pipelineParameters" type="etl:PipelineParametersType"/>
<element maxOccurs="1" minOccurs="0" name="constants" type="etl:ConstantsType"/>
</all>
<attribute default="false" name="loadReferencedFiles" type="boolean"/>
<attribute default="true" name="standalone" type="boolean">
<annotation>
<documentation>
The ETL can be run on its own, rather than as a subcomponent that can only be queued from another ETL.
</documentation>
</annotation>
</attribute>
<attribute default="false" name="siteScope" type="boolean">
<annotation>
<documentation>
The ETL can run at site-level scope, not just in a container.
</documentation>
</annotation>
</attribute>
<attribute default="false" name="allowMultipleQueuing" type="boolean">
<annotation>
<documentation>
By default, if a job for the ETL is already pending (waiting), we block adding another instance of the ETL to the job queue. Set this flag to override that behavior and allow multiple instances in the queue.
</documentation>
</annotation>
</attribute>
<attribute name="transactSourceSchema" type="string">
<annotation>
<documentation>
Optional. Experimental, Postgres datasources only. Causes a multi-step ETL to wrap an entire run of the ETL in a single transaction
on the source datasource containing the schema specified. This transaction will be isolation level REPEATABLE READ, guaranteeing consistent
state of source data throughout the transaction. This prevents phantom reads of source data which has been inserted/updated since
the ETL started. Only meaningful if the source schemas for every step in the ETL are all from the same datasource. Individual steps
will not have their own transaction boundaries.
The intended use case is for a source datasource external to LabKey Server.
This setting is independent of the optional transactDestinationSchema setting. If both are specified, and the two schemas are from
the same datasource, the ETL will be wrapped in a single REPEATABLE READ transaction. This scenario is only supported if
the single datasource is Postgres, and is not recommended.
</documentation>
</annotation>
</attribute>
<attribute name="transactDestinationSchema" type="string">
<annotation>
<documentation>
Optional. Very experimental, use with caution. Causes a multi-step ETL to wrap an entire run of the ETL in a single transaction
on the destination datasource containing the schema specified. This transaction will only be committed upon successful completion
of the ETL run; any error will cause a full rollback of every step up to that point. Only meaningful if the destination schemas for every
step in the ETL are all from the same datasource. Individual steps will not have their own transaction boundaries, nor will setting
a batchSize have any effect. Using this setting will very likely cause lock contention problems on the destination queries,
especially on SQL Server.
This setting is independent of the optional transactSourceSchema setting. If both are specified, and the two schemas are from
the same datasource, the ETL run will be wrapped in a single REPEATABLE READ transaction. This scenario is only supported if
the single datasource is Postgres, and is not recommended.
</documentation>
</annotation>
</attribute>
</complexType>
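<!-- Illustrative sketch of a minimal ETL definition conforming to EtlType. The schema, query, and step names below are hypothetical; only the element and attribute names come from this schema:
     <etl xmlns="http://labkey.org/etl/xml" standalone="true" siteScope="false">
       <name>DemographicsCopy</name>
       <description>Copies rows from a source query into a target table.</description>
       <transforms>
         <transform id="step1">
           <source schemaName="study" queryName="DemographicsSource"/>
           <destination schemaName="study" queryName="DemographicsTarget" targetOption="merge"/>
         </transform>
       </transforms>
       <schedule>
         <poll interval="1h"/>
       </schedule>
     </etl>
-->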
<complexType name="SourceColumnsType">
<sequence maxOccurs="unbounded" minOccurs="1">
<element name="column" type="string"/>
</sequence>
</complexType>
<complexType name="SourceFiltersType">
<sequence maxOccurs="unbounded" minOccurs="1">
<element name="sourceFilter" type="cv:filterType"/>
</sequence>
</complexType>
<complexType name="SourceObjectType">
<all>
<element maxOccurs="1" minOccurs="0" name="sourceColumns" type="etl:SourceColumnsType"/>
<element maxOccurs="1" minOccurs="0" name="sourceFilters" type="etl:SourceFiltersType"/>
</all>
<attribute name="sourceContainerPath" type="string" use="optional">
<annotation>
<documentation>
Container path for doing cross-container ETLs. If not entered, defaults to the container where the ETL is running.
</documentation>
</annotation>
</attribute>
<attribute name="schemaName" type="string" use="required"/>
<attribute name="queryName" type="string" use="required"/>
<attribute name="timestampColumnName" type="string" use="optional">
<annotation>
<documentation>
Override the column name set on a ModifiedSinceFilter incremental filter.
</documentation>
</annotation>
</attribute>
<attribute name="runColumnName" type="string" use="optional">
<annotation>
<documentation>
Override the column name set on a RunFilter incremental filter.
</documentation>
</annotation>
</attribute>
<attribute name="remoteSource" type="string" use="optional"/>
<attribute name="sourceTimeout" type="integer" use="optional"/>
<attribute default="true" name="useTransaction" type="boolean" use="optional">
<annotation>
<documentation>
Wrap selection from the source query in a transaction. Only used for simple query transform types.
</documentation>
</annotation>
</attribute>
<attribute name="containerFilter" type="cv:containerFilterType" use="optional">
<annotation>
<documentation>
Specify the type of container filter to use on the source query.
</documentation>
</annotation>
</attribute>
</complexType>
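<!-- Illustrative source element (hypothetical schema, query, and column names), showing optional column selection, source filters, and a container filter.
     The sourceFilter attributes and the containerFilter value are assumed to follow the imported queryCustomView.xsd types:
     <source schemaName="study" queryName="Participants" timestampColumnName="Modified"
             containerFilter="CurrentAndSubfolders" sourceTimeout="300">
       <sourceColumns>
         <column>ParticipantId</column>
         <column>Cohort</column>
       </sourceColumns>
       <sourceFilters>
         <sourceFilter column="Status" operator="eq" value="Active"/>
       </sourceFilters>
     </source>
-->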
<complexType name="TargetObjectType">
<annotation>
<documentation>
Targets are of type 'query' (the default) or 'file'. For query, the schemaName and queryName attributes are required. For file, the fileBaseName and fileExtension attributes are required.
</documentation>
</annotation>
<all>
<element maxOccurs="1" minOccurs="0" name="alternateKeys" type="etl:AlternateKeysType"/>
<element maxOccurs="1" minOccurs="0" name="columnTransforms" type="etl:ColumnTransformsType"/>
<element maxOccurs="1" minOccurs="0" name="constants" type="etl:ConstantsType"/>
</all>
<attribute name="schemaName" type="string" use="optional"/>
<attribute name="queryName" type="string" use="optional"/>
<attribute name="bulkLoad" type="boolean" use="optional">
<annotation>
<documentation>
Bulk loads minimize audit logging and other overhead.
</documentation>
</annotation>
</attribute>
<attribute name="dir" type="string" use="optional"/>
<attribute name="fileBaseName" type="string" use="optional">
<annotation>
<documentation>
The base name to use for an output target file. Special substitutions:
'${TransformRunId}' will be substituted with the transformRunId.
'${Timestamp}' will be substituted with the timestamp at file creation.
</documentation>
</annotation>
</attribute>
<attribute name="fileExtension" type="string" use="optional">
<annotation>
<documentation>
Optional (required for pipeline usage): the extension for the output target file. A leading dot is optional; there will always be a dot separator between the file base name and the extension.
</documentation>
</annotation>
</attribute>
<attribute name="columnDelimiter" type="string" use="optional"/>
<attribute name="quote" type="string" use="optional">
<annotation>
<documentation>
Character used to qualify text when a field contains the column or row delimiter.
</documentation>
</annotation>
</attribute>
<attribute name="rowDelimiter" type="string" use="optional"/>
<attribute default="append" name="targetOption" type="etl:TargetOptionType" use="optional"/>
<attribute default="query" name="type" type="etl:TargetTypeType" use="optional"/>
<attribute default="true" name="useTransaction" type="boolean" use="optional">
<annotation>
<documentation>
Wrap writing to target query in a transaction. Not used when target is a file.
</documentation>
</annotation>
</attribute>
<attribute default="0" name="batchSize" type="nonNegativeInteger" use="optional">
<annotation>
<documentation>
Formerly called "transactionSize". For query targets, incrementally commit target transaction every n rows instead of in a single wrapping transaction.
For file targets, instead of writing a single file, create a new file every n rows. Each will be the fileBaseName + "-1", "-2", etc, + fileExtension
</documentation>
</annotation>
</attribute>
<attribute name="batchColumn" type="string" use="optional">
<annotation>
<documentation>
If specified, rather than counting individual rows toward the batchSize, use this column as a sentinel field and only increment the count for the current batch when the value in this field changes.
If the name of the relevant column is being mapped via columnTransforms, use the target name, not the source name.
</documentation>
</annotation>
</attribute>
</complexType>
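<!-- Illustrative query-target destination (hypothetical schema, query, and column names), merging on an alternate key and committing in batches:
     <destination schemaName="study" queryName="DemographicsTarget" targetOption="merge"
                  bulkLoad="true" batchSize="5000">
       <alternateKeys>
         <column name="ParticipantId"/>
       </alternateKeys>
     </destination>
-->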
<simpleType name="TargetOptionType">
<restriction base="string">
<enumeration value="merge"/>
<enumeration value="append"/>
<enumeration value="truncate"/>
</restriction>
</simpleType>
<simpleType name="TargetTypeType">
<restriction base="string">
<enumeration value="query"/>
<enumeration value="file">
<annotation>
<documentation>
The target will be a file with the base name and extension specified by fileBaseName and fileExtension.
By default the field and row separators will be those of a TSV file, but this can be overridden with the optional columnDelimiter and rowDelimiter attributes.
For ETLs that use the target file as input to a pipeline command task, it is possible to skip running the command task for a 0-row file. Use the "etlOutputHadRows"
parameter in the pipeline configuration. See nlpContext.xml for an example of this.
</documentation>
</annotation>
</enumeration>
</restriction>
</simpleType>
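<!-- Illustrative file-target destination (the directory and base name are hypothetical); with the default delimiters this produces a TSV file, and ${TransformRunId} is substituted per the fileBaseName documentation above:
     <destination type="file" dir="etlOut" fileBaseName="participants_${TransformRunId}" fileExtension="tsv"/>
-->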
<complexType name="SchemaProcedureType">
<sequence maxOccurs="unbounded" minOccurs="0">
<element name="parameter" type="etl:ProcedureParameterType"/>
</sequence>
<attribute name="schemaName" type="string" use="required"/>
<attribute name="procedureName" type="string" use="required"/>
<attribute default="true" name="useTransaction" type="boolean" use="optional">
<annotation>
<documentation>Wrap the call of the procedure in a transaction.</documentation>
</annotation>
</attribute>
</complexType>
<complexType name="ProcedureParameterType">
<attribute name="name" type="string" use="required"/>
<attribute name="value" type="string" use="optional"/>
<attribute default="false" name="override" type="boolean" use="optional"/>
<attribute default="local" name="scope" type="etl:ProcedureParameterScopeType" use="optional"/>
<attribute name="noWorkValue" type="string" use="optional">
<annotation>
<documentation>
When present, this procedure will be used to check whether there is work for the job to do. If the output value
of this parameter is equal to the noWorkValue, there is no work for the job to do.
This can either be a literal value ("4"), or a substitution syntax indicating the comparison should be made against the
input value of a given parameter. E.g., a parameter with name="batchId" and noWorkValue="${batchId}" indicates there
is no work for the job if the output batchId is the same as the one saved from the previous run.
</documentation>
</annotation>
</attribute>
</complexType>
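<!-- Illustrative stored procedure step (hypothetical schema, procedure, and parameter names); the noWorkValue substitution compares the output batchId against the value saved from the previous run:
     <procedure schemaName="target" procedureName="PopulateProcessed" useTransaction="true">
       <parameter name="batchSize" value="1000"/>
       <parameter name="batchId" noWorkValue="${batchId}" scope="global"/>
     </procedure>
-->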
<simpleType name="ProcedureParameterScopeType">
<annotation>
<documentation>
These match two of the scopes defined in the Variable Map persisted in the TransformConfiguration.TransformState field.
(We don't support the notion of a 'parent' scope here.) Globally scoped parameters allow sharing/passing of context across multiple stored procedure steps.
</documentation>
</annotation>
<restriction base="string">
<enumeration value="local"/>
<enumeration value="global"/>
</restriction>
</simpleType>
<complexType name="TransformsType">
<sequence maxOccurs="unbounded" minOccurs="1">
<element name="transform" type="etl:TransformType"/>
</sequence>
</complexType>
<complexType name="TransformType">
<all>
<!-- <element name="properties" type="etl:PropertiesType" minOccurs="0" maxOccurs="1"/> -->
<element minOccurs="0" name="description" type="string"/>
<element minOccurs="0" name="source" type="etl:SourceObjectType"/>
<element minOccurs="0" name="destination" type="etl:TargetObjectType"/>
<element minOccurs="0" name="procedure" type="etl:SchemaProcedureType"/>
<element maxOccurs="1" minOccurs="0" name="taskref" type="etl:TaskRefType"/>
</all>
<attribute name="id" type="string" use="required"/>
<attribute default="SimpleQueryTransformStep" name="type" type="string">
<annotation>
<documentation>
Use 'StoredProcedure' when handling the transform task with a stored procedure.
Use 'ExternalPipelineTask' (in conjunction with externalTaskId) when handling the transform task with an XML-defined pipeline task.
Use 'TaskrefTransformStep' (in conjunction with a taskref element) to queue another ETL to run.
</documentation>
</annotation>
</attribute>
<attribute name="externalTaskId" type="string">
<annotation>
<documentation>
The command task (wrapped as a pipeline task) that handles the transform.
E.g., 'org.labkey.api.pipeline.cmd.CommandTask:myEngineCommand'
</documentation>
</annotation>
</attribute>
<attribute name="parentPipelineTaskId" type="string">
<annotation>
<documentation>
If an externalTaskId command task is defined, settings from this pipeline
(currently just the workflow process key) will be applied to the pipeline wrapping the external task.
E.g., 'org.labkey.api.pipeline.file.FileAnalysisTaskPipeline:myPipelineName'
</documentation>
</annotation>
</attribute>
<attribute default="false" name="saveState" type="boolean">
<annotation>
<documentation>
Persist the job state to the database after this step (in addition to at the end of a successfully completed job).
Use with extreme caution; if a later step causes an error in the job, this will still be the assumed state for the next run.
</documentation>
</annotation>
</attribute>
</complexType>
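<!-- Illustrative transform variants. The step ids, procedure names, taskref ref value, and setting values are hypothetical; the externalTaskId value is the example given in the documentation above:
     A stored procedure step:
     <transform id="callProc" type="StoredProcedure">
       <procedure schemaName="target" procedureName="PopulateProcessed"/>
     </transform>
     An external pipeline command task:
     <transform id="runCommand" type="ExternalPipelineTask"
                externalTaskId="org.labkey.api.pipeline.cmd.CommandTask:myEngineCommand"/>
     Queueing another ETL via a taskref (the ref and setting shown are assumptions, not defined by this schema):
     <transform id="queueOther" type="TaskrefTransformStep">
       <taskref ref="org.example.etl.QueueAnotherEtlTask">
         <settings>
           <setting name="transformId" value="otherEtl"/>
         </settings>
       </taskref>
     </transform>
-->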
<!-- <complexType name="PropertiesType">
<all>
<element name="property" type="string" minOccurs="0"/>
</all>
</complexType>

<complexType name="PropertyType">
<attribute name="name" type="string" use="required"/>
<attribute name="value" type="string" use="required"/>
</complexType>
-->
<complexType name="ScheduleType">
<all>
<element minOccurs="0" name="poll" type="etl:PollType"/>
<element minOccurs="0" name="cron" type="etl:CronType"/>
</all>
</complexType>
<complexType name="PollType">
<attribute name="interval" type="string" use="required">
<annotation>
<documentation>
Run the ETL at the specified interval, for example, 2h to run every two hours.
Valid intervals are:
y: Years
m: Months, if day or hour is specified
d: Days
h: Hours
m: Minutes, if no day or hour is specified
s: Seconds
</documentation>
</annotation>
</attribute>
</complexType>
<complexType name="CronType">
<attribute name="expression" type="string" use="required"/>
</complexType>
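<!-- Illustrative schedule elements: either a polling interval (here, every fifteen minutes) or a cron expression. The cron value shown assumes a Quartz-style expression, which this schema does not itself specify:
     <schedule>
       <poll interval="15m"/>
     </schedule>
     <schedule>
       <cron expression="0 0 2 * * ?"/>
     </schedule>
-->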
<complexType name="DeletedRowsSourceObjectType">
<complexContent>
<extension base="etl:SourceObjectType">
<attribute name="deletedSourceKeyColumnName" type="string" use="optional">
<annotation>
<documentation>
The column in the deletedRowsSource which holds the key values to delete in the target. If not specified, the PK values of the delete query are used.
</documentation>
</annotation>
</attribute>
<attribute name="targetKeyColumnName" type="string" use="optional">
<annotation>
<documentation>
The column in the target query corresponding to the deleted key values. If not specified, we assume the PK of the target query.&lt;br/&gt;
If a non-PK column is specified, the delete values will be used as lookups to map back to the corresponding PK values in the target.
</documentation>
</annotation>
</attribute>
</extension>
</complexContent>
</complexType>
<complexType name="FilterType">
<all>
<element minOccurs="0" name="deletedRowsSource" type="etl:DeletedRowsSourceObjectType">
<annotation>
<documentation>
Defines a query with a list of rows to delete from the target in this incremental run.&lt;br/&gt;
Timestamp or run filter values against the delete rows source are tracked independently from the values in the&lt;br/&gt;
original source query. I.e., there could be no new records in the source query, but new records in the delete rows source&lt;br/&gt;
will still be found and deleted from the target.
</documentation>
</annotation>
</element>
</all>
<!-- <all>
<element name="properties" type="etl:PropertiesType" minOccurs="0"/>
</all>
-->
<attribute name="className" type="etl:FilterClassType" use="optional"/>
<!-- TODO: temp removing this attribute until it is implemented. Intended to be an in-app configurable override to the filter strategy
<attribute name="selectAll" type="boolean" use="optional"/>
-->
<!-- org.labkey.di.filters.ModifiedSinceFilterStrategy -->
<attribute name="timestampColumnName" type="string" use="optional"/>
<!-- org.labkey.di.filters.RunFilterStrategy -->
<attribute name="runTableContainerPath" type="string" use="optional">
<annotation>
<documentation>
Container path for a cross-container run filter strategy. If not specified, defaults to the container where the ETL is running.
</documentation>
</annotation>
</attribute>
<attribute name="runTableSchema" type="string" use="optional"/>
<attribute name="runTable" type="string" use="optional"/>
<attribute name="pkColumnName" type="string" use="optional"/>
<attribute name="fkColumnName" type="string" use="optional"/>
</complexType>
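<!-- Illustrative incrementalFilter usage (column, schema, and query names are hypothetical).
     A ModifiedSince strategy that also deletes rows listed in a separate deleted rows source:
     <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="Modified">
       <deletedRowsSource schemaName="audit" queryName="DeletedParticipants"
                          deletedSourceKeyColumnName="ParticipantId" targetKeyColumnName="ParticipantId"/>
     </incrementalFilter>
     A run-based strategy keyed off a run table:
     <incrementalFilter className="RunFilterStrategy" runTableSchema="etl" runTable="Runs"
                        pkColumnName="RowId" fkColumnName="RunId"/>
-->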
<simpleType name="FilterClassType">
<restriction base="string">
<enumeration value="RunFilterStrategy"/>
<enumeration value="ModifiedSinceFilterStrategy"/>
<enumeration value="SelectAllFilterStrategy"/>
</restriction>
</simpleType>
<complexType name="ParametersType">
<annotation>
<documentation>
Intended for use when the ETL's source query is a parameterized SQL query.
Parameter values placed here are passed into the SQL query.
For details see
https://www.labkey.org/Documentation/wiki-page.view?name=paramsql
and
https://www.labkey.org/Documentation/wiki-page.view?name=etlSamples#param
</documentation>
</annotation>
<sequence maxOccurs="unbounded" minOccurs="0">
<element name="parameter" type="etl:ParameterType"/>
</sequence>
</complexType>
<complexType name="ParameterType">
<attribute name="name" type="string" use="required"/>
<attribute name="value" type="string" use="optional"/>
<attribute name="type" type="etl:SqlType" use="required"/>
</complexType>
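<!-- Illustrative parameters block for a parameterized source SQL query (the parameter names and values are hypothetical); each value declared here is passed into the SQL query:
     <parameters>
       <parameter name="MinTemp" value="36.5" type="DECIMAL"/>
       <parameter name="Cohort" value="A" type="VARCHAR"/>
     </parameters>
-->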
<simpleType name="SqlType">
<annotation>
<documentation>
Enumerated list of java.sql.Types values. Mirrors the enum defined in the Java JdbcType class.
</documentation>
</annotation>
<restriction base="string">
<enumeration value="BIGINT"/>
<enumeration value="bigint"/>
<enumeration value="BINARY"/>
<enumeration value="binary"/>
<enumeration value="BOOLEAN"/>
<enumeration value="boolean"/>
<enumeration value="CHAR"/>
<enumeration value="char"/>
<enumeration value="DECIMAL"/>
<enumeration value="decimal"/>
<enumeration value="DOUBLE"/>
<enumeration value="double"/>
<enumeration value="INTEGER"/>
<enumeration value="integer"/>
<enumeration value="LONGVARBINARY"/>
<enumeration value="longvarbinary"/>
<enumeration value="LONGVARCHAR"/>
<enumeration value="longvarchar"/>
<enumeration value="REAL"/>
<enumeration value="real"/>
<enumeration value="SMALLINT"/>
<enumeration value="smallint"/>
<enumeration value="DATE"/>
<enumeration value="date"/>
<enumeration value="TIME"/>
<enumeration value="time"/>
<enumeration value="TIMESTAMP"/>
<enumeration value="timestamp"/>
<enumeration value="TINYINT"/>
<enumeration value="tinyint"/>
<enumeration value="VARBINARY"/>
<enumeration value="varbinary"/>
<enumeration value="VARCHAR"/>
<enumeration value="varchar"/>
<enumeration value="GUID"/>
<enumeration value="guid"/>
<enumeration value="NULL"/>
<enumeration value="null"/>
<enumeration value="OTHER"/>
<enumeration value="other"/>
</restriction>
</simpleType>
<complexType name="PipelineParametersType">
<sequence maxOccurs="unbounded" minOccurs="1">
<element name="parameter" type="etl:PipelineParameterType"/>
</sequence>
</complexType>
<complexType name="PipelineParameterType">
<annotation>
<documentation>
Intended for use when an ETL invokes file analysis pipeline command tasks.
Will be passed as parameter overrides into the protocol.
</documentation>
</annotation>
<attribute name="name" type="string" use="required"/>
<attribute name="value" type="string" use="required"/>
</complexType>
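<!-- Illustrative pipelineParameters block (the parameter name and value are hypothetical); each entry is passed as a protocol parameter override to the invoked file analysis pipeline:
     <pipelineParameters>
       <parameter name="docker.imageTag" value="1.2.3"/>
     </pipelineParameters>
-->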
<complexType name="TaskRefType">
<sequence>
<element maxOccurs="1" minOccurs="0" name="settings" type="etl:SettingsType"/>
</sequence>
<attribute name="ref" type="string" use="required"/>
</complexType>
<complexType name="SettingsType">
<sequence>
<element maxOccurs="unbounded" minOccurs="0" name="setting" type="etl:SettingType"/>
</sequence>
</complexType>
<complexType name="SettingType">
<attribute name="name" type="string" use="required"/>
<attribute name="value" type="string" use="required"/>
</complexType>
<complexType name="AlternateKeysType">
<annotation>
<documentation>
Defines a set of alternate match keys to use for merge ETLs instead of the primary key of the target.
</documentation>
</annotation>
<sequence>
<element maxOccurs="unbounded" minOccurs="0" name="column" type="etl:AlternateKeyType"/>
</sequence>
</complexType>
<complexType name="AlternateKeyType">
<attribute name="name" type="string" use="required"/>
</complexType>
<complexType name="ColumnTransformsType">
<sequence maxOccurs="unbounded" minOccurs="0">
<element name="column" type="etl:ColumnTransformType"/>
</sequence>
</complexType>
<complexType name="ColumnTransformType">
<annotation>
<documentation>
Defines a Java class used to transform a column value.
This works for both query and file targets.
The implementation of the transform class declares whether "source" and/or "target"
column names are required. The default implementation requires only source, and the
same name is used for the target column.
If no transformClass is specified, a simple in-flight column name mapping is
performed between the source and target. In this case both source and target are required.
</documentation>
</annotation>
<attribute name="source" type="string"/>
<attribute name="target" type="string"/>
<attribute name="transformClass" type="string"/>
</complexType>
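<!-- Illustrative columnTransforms block (column and class names are hypothetical): the first entry is a plain source to target rename; the second applies a Java transform class to the source value:
     <columnTransforms>
       <column source="ptid" target="ParticipantId"/>
       <column source="visitDate" transformClass="org.example.etl.DateNormalizer"/>
     </columnTransforms>
-->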
<complexType name="ConstantsType">
<annotation>
<documentation>
A list of fields with constant values that will be added to the source fields.
If there is a column of the same name in the source query, the value provided here
overrides the source value.
These can be defined at the global etl level, applying to all steps, or at the individual step
(destination) level. If the same column is specified both globally and in an individual step,
that step receives the value defined at the step level.
The form of these is the same as for source query parameters. See
https://www.labkey.org/Documentation/wiki-page.view?name=etlSamples#param
</documentation>
</annotation>
<sequence maxOccurs="unbounded" minOccurs="1">
<element name="column" type="etl:ParameterType"/>
</sequence>
</complexType>
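<!-- Illustrative constants block (the column name and value are hypothetical); placed directly under the etl element it applies to all steps, or inside a destination it applies to that step only:
     <constants>
       <column name="SourceSystem" value="LIMS" type="VARCHAR"/>
     </constants>
-->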
</schema>

This XML schema documentation has been generated with DocFlex/XML RE 1.7.2 using DocFlex/XML XSDDoc 2.1.0 template set.