Premium Feature — Available in the Professional, Professional Plus, and Enterprise Editions. Learn more or contact LabKey.

In this step you will download and install a basic workspace for working with ETL processes. In our example, the data warehouse is represented by another table on the same server, rather than requiring configuration of multiple machines.

Set Up ETL Workspace

In this step you will import a pre-configured workspace in which to develop ETL processes. (Note that there is nothing mandatory about the way this workspace has been put together -- your own ETL workspace may be different, depending on the needs of your project. This particular workspace has been configured especially for this tutorial as a shortcut to avoid many set up steps, steps such as connecting to source datasets, adding an empty dataset to use as the target of ETL scripts, and adding ETL-related web parts.)

  • Download the folder archive:

  • Log in to your server and navigate to a location where you can create a new folder.
  • Create a new folder named "ETL Workspace". Choose the folder type "Study" and accept other folder defaults.
  • Import ETLWorkspace.folder.zip into the folder:
    • In the Study Overview panel, click Import Study.
    • On the Folder Management page, confirm Local zip archive is selected and click Choose File.
    • Select the folder archive that you downloaded: ETLWorkspace.folder.zip.
    • Click Import Study.
    • When the import is complete, click ETL Workspace to see the workspace.

You now have a workspace where you can develop ETL scripts. It includes:

  • A LabKey Study with various datasets to use as data sources
  • An empty dataset named Patients to use as a target destination
  • The ETL Workspace tab provides an area to manage and run your ETL processes. Notice that this tab contains three web parts:
    • Data Transforms shows the available ETL processes. Currently it is empty because there are none defined.
    • The Patients dataset (the target dataset for the process) is displayed, also empty because no ETL process has been run yet. When you run an ETL process in the next step the the empty Patients dataset will begin to fill with data.
    • The Demographics dataset (the source dataset for this tutorial) is displayed with more than 200 records.

Create an ETL

  • Click the ETL Workspace tab to ensure you are on the main folder page.
  • Select (Admin) > Folder > Management.
  • Click the ETLs tab. If you don't see this tab, you may not have the dataintegration module enabled on your server. Check on the Folder Type tab, under Modules.
  • Click (Insert new row) under Custom ETL Definitions.
  • Replace the default XML in the edit panel with the following code:
    <?xml version="1.0" encoding="UTF-8"?>
    <etl xmlns="http://labkey.org/etl/xml">
    <name>Demographics >>> Patients (Females)</name>
    <description>Update data for study on female patients.</description>
    <transforms>
    <transform id="femalearv">
    <source schemaName="study" queryName="FemaleARV"/>
    <destination schemaName="study" queryName="Patients" targetOption="merge"/>
    </transform>
    </transforms>
    <schedule>
    <poll interval="1h"/>
    </schedule>
    </etl>
  • Click Save.
  • Click the ETL Workspace tab to return to the main dashboard.
  • The new ETL named "Demographics >>> Patients (Females)" is now ready to run. Notice it has been added to the list under Data Transforms.

Start Over | Next Step

Discussion

Was this content helpful?

Log in or register an account to provide feedback


previousnext
 
expand all collapse all