External Schemas and Data Sources

Documentation
An external schema can provide access to tables that are managed on any PostgreSQL database, or with Premium Editions of LabKey Server, other database types including Snowflake, Microsoft SQL Server, Oracle, MySQL, Amazon Redshift, or SAS/SHARE.

Site Administrators can make externally-defined schemas accessible within LabKey, with options to limit access and/or only a subset of tables within each schema. Once a schema is accessible, externally-defined tables become usable within LabKey as if they were defined in the primary database.

This topic covers how to configure external data sources and load schemas from those data sources. It has been revised for versions 24.3 and higher. If you are configuring an external datasource and/or schema on an earlier version, see the documentation archives for the previous configuration method.

Overview

External schemas (tables in external data sources) give administrators access to edit external tables from within the LabKey interface, if the schema has been marked editable and the table has a primary key. XML metadata can also be added to specify formatting or lookups. Folder-level security is enforced for the display and editing of data contained in external schemas.

You can also pull data from an existing LabKey schema in a different folder by creating a "linked schema". You can choose to expose some or all of the tables from the original schema. The linked tables and queries may be filtered such that only a subset of the rows are shown. For details see Linked Schemas and Tables.

Note: you cannot create joins across data sources, including joins between external and internal schema on LabKey Server. As a work around, use an ETL to copy the data from the external data source(s) into the main internal data source. Once all of the data is in the main data source, you can create joins on the data.

Usage Scenarios

  • Display, analyze, and report on any data stored on any database server within your institution.
  • Build LabKey applications using external data without relocating the data.
  • Create custom queries that join data from standard LabKey tables with user-defined tables in the same database.
  • Publish SAS data sets to LabKey Server, allowing secure, dynamic access to data sets residing in a SAS repository.
Changes to the data are reflected automatically in both directions. Data rows that are added, deleted, or updated from either the LabKey Server interface or through external routes (for example, external tools, scripts, or processes) are automatically reflected in both places. Changes to the table schema are not immediately reflected, see below.)

Avoid: LabKey strongly recommends that you avoid defining the core LabKey Server schemas as external schemas. There should be no reason to use a LabKey schema as an external schema and doing so invites problems during upgrades and can be a source of security issues.

External Data Source Configuration

Before you define an external schema in LabKey server, you must first configure a new data source in LabKey Server. To do so, give the data source a unique JNDI name that you will use when configuring external schemas.

  • This name, included in parameter names for the section, should end with a "DataSource" suffix that will not be shown later when you select it (i.e. "externalPgDataSource" will appear as "externalPg").
  • The name give must be unique.
  • Substitute the name you assign in the definition of the necessary properties in the application.properties file.
For each external data source, you will create (and uncomment) a new set of these lines, substituting the JNDI name for the data source where you see "@@extraJdbcDataSource@@" in the property name, and providing the necessary values for each:
context.resources.jdbc.@@extraJdbcDataSource@@.driverClassName=@@extraJdbcDriverClassName@@
context.resources.jdbc.@@extraJdbcDataSource@@.url=@@extraJdbcUrl@@
context.resources.jdbc.@@extraJdbcDataSource@@.username=@@extraJdbcUsername@@
context.resources.jdbc.@@extraJdbcDataSource@@.password=@@extraJdbcPassword@@

For example, if want to add a new external PostgreSQL data source named "externalPgDataSource", the application.properties section for it might look like this, noting the customization of the property name as well as the value, and providing the database name and login credentials to use:

context.resources.jdbc.externalPgDataSource.driverClassName=org.postgresql.Driver
context.resources.jdbc.externalPgDataSource.url=jdbc:postgresql://localhost:5432/<DB_NAME>
context.resources.jdbc.externalPgDataSource.username=<DB_USERNAME>
context.resources.jdbc.externalPgDataSource.password=<DB_PASSWORD>

Add an additional set of these lines for each external data source required, always providing a new unique name.

The additional properties available for the primary data source can also be set for any external data source following the same pattern of customizing the property name with the data source name.

External Database Details

See the following topics for more about the type of external data source you are using. For each, there are specifics about the driverClassName, url, and validationQuery for the database type, as well as other considerations and troubleshooting guidelines:

Load an External Schema from an External Data Source

Once you've defined the external data source, you can proceed to add an externally defined schema (i.e., a set of data tables) to LabKey Server. This schema and its tables will be surfaced in the Query Schema Browser.

To load an externally-defined schema into LabKey Server:

  • Click on the folder/project where you would like to place the schema.
  • Select (Admin) > Developer Links > Schema Browser.
  • Click Schema Administration.
  • Click New External Schema.

Fill out the following fields:

  • Schema Name – Required.
    • Name of the schema as you want to see it within LabKey Server.
    • When you select the Database Schema Name two items below this, this field will default to match that name, but you can specify an alternate name here if needed.
  • Data Source - JNDI name of the data source where you will find this external schema.
    • This is the "root" of the element you included in configuring this data source in the application.properties file, i.e. "someExternalDataSource" will appear as "someExternal").
    • Data source names must be unique.
    • All external data sources configured correctly are listed as options in this drop-down.
  • Database Schema Name: Required. Name of the physical schema within the external database.
    • This dropdown will offer all the schemas accessible to the user account under which the external schema was defined. If you don't see a schema you expect here, check the external database directly using the same credentials as the external data source connection uses.
    • Show System Schemas: Check the box to show system schemas which are filtered out of this dropdown.
  • Editable: Check to allow insert/update/delete operations on the external schema. This option currently only works on MSSQL and Postgres databases, and only for tables with a single primary key.
  • Index Schema Meta Data: Determines whether the schema should be indexed for full-text search.
  • Fast Cache Refresh: Whether or not this external schema is set to refresh its cache often. Intended for use during development.
  • Tables: Allows you to expose or hide selected tables within the schema.
    • Click the to expand the table list.
    • Checked tables are shown in the Query Schema Browser; unchecked tables are hidden.
    • If you don't see a table you expect here, check the external database directly using the same credentials as the external data source connection uses.
  • Meta Data: You can use a specialized XML format to specify how columns are displayed in LabKey.
    • For example you can specify data formats, column titles, and URL links. This field accepts instance documents of the TableInfo XML Schema XML schema.
When you are finished, click the Create button at the bottom of the form.

Manage External Schemas

  • Select (Admin) > Developer Links > Schema Browser.
  • Click Schema Administration.
The Schema Administration page displays the external and linked schemas that have been defined in the current folder. Click links for each to:
  • View Schema
  • Edit
  • Reload
  • Delete

Reload an External Schema

External schema metadata is not automatically reloaded. It is cached within LabKey Server for an hour, meaning changes, such as to the number of tables or columns, are not immediately reflected. If you make changes to external schema metadata, you may explicitly reload your external schema immediately using the reload link on the Schema Administration page.

Locate External Schemas Site-Wide

Find a comprehensive list of all external datasources and external schemas on the admin console:

  • Select (Admin) > Site > Admin Console.
  • Under Diagnostics, click Data Sources.
You'll see links to all containers (beginning with /) followed by all the external schemas defined therein. Click to directly access each external schema definition.

Test External Data Sources

On the Data Source Administration page, for each data source, you'll see a Test button. Click to open the test preparation page.

This test will enumerate all schemas and tables within the data source, attempting to retrieve the row count and 100 rows of data from each table, skipping the first 10 rows. This operation could take several minutes, depending on the number of schemas and tables. The server log will show progress as each table is tested.

This is not an exhaustive test but it does exercise paging, sorting, data type conversion, etc. on many different tables. Since some schemas and tables can be problematic (e.g., performance issues or unsupported data types), you can exclude testing of certain schemas and tables via name or prefix before executing the test by adding names to the corresponding text boxes on the test preparation page. Appending * to the end of a name acts as a prefix filter. These exclusions will be saved associated with the data source name.

Configure for Connection Validation

If there is a network failure or if a database server is restarted, the connection to the data source is broken and must be reestablished. Each data source definition can include a simple "validation query" to test each connection and attempt reconnection. If a broken connection is found, LabKey will attempt to create a new one. The default validation query will work for a PostgreSQL or MS SQL external data source:

context.resources.jdbc.@@extraJdbcDataSource@@.validationQuery=SELECT 1

For other databases, see the database specific documentation for the validation query to use.

Retry Failed Data Source Connections

Connections to external Data Sources are attempted during server startup. If an error occurs, the Data Source will not be available and will not appear on the dropdown list. When you view external schemas, you'll see the reason the connection isn't available, such as "no data source configuration exists" or "connection to [someExternalDataSource] failed".

To retry failed connections without restarting the server:

  • Select (Admin) > Developer Links > Schema Browser.
  • Click Schema Administration.
  • Click Retry Failed Data Source Connections.

Troubleshoot External Schemas

If you encounter problems defining and using external schema, start by checking the following:

  • Does the application.properties section for the external schema conform to the database-specific templates we provide? Over time both expected parameters and URL formats will change.
  • Is the name of the data source unique, and does it exactly match what you are entering when you define the external schema?
  • Is your external database of a supported version?
  • If you don't see the expected tables when defining a new external schema:
    • Try directly accessing the external database with the credentials provided in application.properties to confirm access to the "Database Schema Name" you specified to see if you see the expected tables.
    • Check whether a schema of that name already exists on your server. Check the box to show hidden schemas to confirm.
    • If you can directly access the LabKey database, check whether there is already an entry in the query.externalschema table. This table stores these definitions but is not surfaced within the LabKey schema browser.
    • If there is an unexpected duplicate external schema, you will not be able to add a new one. One possible cause could be a "broken view" on that schema. Learn more in this topic.
  • Check the raw schema metadata for the external schema by going to the container where it is defined and accessing the URL of this pattern (substitute your SCHEMA_NAME):
    /query-rawSchemaMetaData.view?schemaName=SCHEMA_NAME
  • Try enabling additional logging. Follow the instructions in this topic to temporarily change the logging level from "INFO" to "DEBUG", refresh the cache, and then retry the definition of the external schema to obtain debugging information.
    • Loggers
    • Loggers of interest:
      • org.labkey.api.data.SchemaTableInfoCache - for loading of tables from an external schema
      • org.labkey.api.data.ConnectionWrapper - for JDBC metadata, including attempts to query external databases for tables
      • org.labkey.audit.event.ContainerAuditEvent - at the INFO level, will report creation of external schema

Related Topics

Was this content helpful?

Log in or register an account to provide feedback


previousnext
 
expand allcollapse all