Database Development Guide: /Documentation

Database Development Guide

This topic covers low-level database access from Java code running in LabKey Server. "Low-level" specifically means access to JDBC functionality to communicate directly with the database and the various helpers we use internally that wrap JDBC.

This topic does not cover the the "user schema" layer that presents an abstraction of the underlying database to the user and which supports our APIs and LabKey SQL engine.

Security
DbScope
DbSchema
SqlFragment
SqlExecutor
SqlSelector
TableInfo
TableSelector
Table
QueryService.getSelectSQL()
High Level API
QueryUpdateService

As a guideline, the "low-level" objects start with DbScope and DbSchema and spread out from there. The "user-level" objects start with DefaultSchema which hands out UserSchema objects. The TableInfo class is a very important class and is shared between both DbSchema and UserSchema. When using a TableInfo it's important to keep in mind which world you are dealing with (DbSchema or UserSchema).

Security

It is very important to remember that when you are directly manipulating the database, you are responsible for enforcing permissions. The rules that any particular API or code path may need to enforce can be completely custom, so there is no blanket rule about what you must do. However, be mindful of common patterns involving the "container" column. Many tables in LabKey are partitioned into containers (aka folders in the UI) by a column named "container" that joins to core.containers. All requests (API or UI) to the LabKey server are done in the context of a container, and that container has a SecurityPolicy that is used as the default for evaluating permissions. For instance, the security annotations on an action are evaluated against the SecurityPolicy associated with the "container" of the current request.

@RequiresPermission(InsertPermission.class)
public class InsertAction extends AbstractIssueAction
{
  // Because of the @RequiresPermission annotation, 
  // We know the current user has insert permission in the current container
  assert getContainer().hasPermission(getUser(),InsertPermission.class);
}

If you are relying on these default security checks, it is very important to make sure that your code only operates on rows associated with the current container. For instance, if the url provides the value of a primary key of a row to update or delete, your code must validate that the container of that row matches the container of the current request.

Reference: https://www.owasp.org/index.php/Top_10_2013-A4-Insecure_Direct_Object_References

DbScope

DbScope directly corresponds to a javax.sql.Datasource configured in your webapp, and is therefore a source of JDBC connections that you can use directly in your code. We also think of a DbScope as a collection of schemas, the caching layer for meta data associated with schemas and tables, and our transaction manager.

You usually do not need to directly handle JDBC Connections, as most of our helpers take a DbScope/DbSchema/TableInfo directly rather than a Connection. However, DbScope.getConnection() is available if you need it.

DbSchema schema = DbSchema.get("core", DbSchemaType.Module);
DbScope scope = schema.getScope();
Connection conn= scope.getConnection();

DbScope also provides a very helpful abstraction over JDBC's built-in transaction api. Writing correct transaction code directly using JDBC can be tricky. This is especially true when you may be writing a helper function that does not know whether a higher level routine has already started a transaction or not. This is very easy to handle using DbScope.ensureTransaction(). Here is the recommended pattern to follow:

DbSchema schema = DbSchema.get("core", DbSchemaType.Module);
DbScope scope = schema.getScope();
try (DbScope.Transaction tx = scope.ensureTransaction())
{
   site = Table.insert(user, AccountsSchema.getTableInfoSites(), site);
   tx.commit();
}

When there is no transaction already pending on this scope object, a Connection object is created that is associated with the current thread. Subsequent calls to getConnection() on this thread return the same connection. This ensures that everyone in this scope/thread participates in the same transaction. If the code executes successfully, the call to tx.commit() then completes the transaction. When the try-with-resources block completes it closes the transaction. If the transaction has been committed then the code continues on. If the transaction has not been committed, the code assumes an error has occurred and aborts the transaction.

If there is already a transaction pending, the following happens. We recognize that a database transaction is pending, so do not start a new one. However, the Transaction keeps track of the "depth" of the nesting of ensureTransaction() scopes. The call to commit() again does nothing, but pops one level of nesting. The final outer commit() in the calling code commits the transaction.

In the case of any sort of error involving the database, it is almost always best to throw. With Postgres almost any error returned by JDBC causes the connection to be marked as unusable, so there is probably nothing the calling code can do to recover from the DB error, and keep going. Only the code with the outermost transaction may be able to "recover" by deciding to try the whole transaction over again (or report an error).

See class DbScope.TransactionImpl for more details.

DbSchema

DbSchema corresponds to a database schema, e.g. a collection of related tables. This class is usually the starting point for any code that wants to talk to the database. You can request a DbSchema via the static method DbSchema.get(schemaName, DbSchemaType). E.g. to get the core schema which contains the user, security, and container related tables you would use DbSchema.get("core", DbSchemaType.Module).

getScope() returns the scope that contains this schema, used for transaction management among other things.
getSqlDialect() is a helper class with methods to aid in writing cross-platform compatible sql.
getTableNames() returns a list of tables contained in this schema.
getTable(tablename) returns a TableInfo for the requested table. See below.

SqlFragment

SQL injection is a common type of vulnerability in web applications, allowing attackers to execute arbitrary SQL of their choice. Thus, it is best practice to use parameter markers in your SQL statements rather than trying to directly concatenate constants. SQLFragment simplifies this by carrying around the SQL text and the parameter values in one object. We use this class in many of our helpers. Here is an example:

SQLFragment select = new SQLFragment("SELECT *, ? as name FROM table", name);
SQLFragment where = new SQLFragment("WHERE rowid = ? and container = ?", 
    rowid, getContainer());
select.append("n");
select.append(where);

Reference: https://www.owasp.org/index.php/Top_10_2013-A1-Injection

SqlExecutor

We're ready to actually do database stuff. Let's update a row. Say we have a name and a rowid and we want to update a row in the database. We know we're going to start with our DbScope and using a SqlFragment is always a good idea. We also have a helper called SqlExecutor, so you don't have to deal with Connection and PreparedStatement. It also translates SqlException into a runtime exceptions

DbScope scope = DbScope.get("myschema");
SQLFragment update = new SQLFragment("UPDATE mytable SET name=? WHERE rowid=?", 
    name, rowid);
long count = new SqlExecutor(scope).execute(update);

Don't forget your container filter! And you might want to do more than one update in a single transaction… So...

DbScope scope = DbScope.get("myschema");
try (DbScope.Transaction tx = scope.ensureTransaction())
{
  SQLFragment update = new SQLFragment(
    "UPDATE mytable SET name=? WHERE rowid=? And container=?", 
    name, rowid, getContainer());
  long count = new SqlExecutor(scope).execute(update);
  // You can even write JDBC code here if you like
  Connection conn = scope.getConnection();
  // update some stuff
   ...
  tx.commit();
}

Reference: http://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/jdbc/support/SQLErrorCodeSQLExceptionTranslator.html

SqlSelector

SqlSelector is the data reading version of SqlExecutor. However, it's a bit more complicated because it does so many useful things. The basic pattern is

SQLFragment select = new SQLFragment("SELECT * FROM table WHERE container = ?",
   getContainer());
RESULT r = new SqlSelector(scope, select).getRESULT(..)

Where getRESULT is some method that formats the result in some useful way. Most of the interesting methods are specified by the Selector interface (implemented by both SqlSelector and TableSelector, explained below). Here are a few useful ones.

.exists() -- Does this query return 1 or more rows? Logically equivalent to .getRowCount() > 1 but .exists() is usually more concise and efficient.
.getRowCount() -- How many rows does this query return?
.getResultSet() -- return JDBC resultset (loaded into memory and cached by default). getResultSet(false), returns a "raw" uncached JDBC resultset.
.getObject(class) -- return a single value from a one-row result. Class can be an intrinsic like String.class for a one column result. It can be Map.class to return a map (name -> value) representing the row, or a java bean. E.g. MyClass.class. In this case, it is populated using reflection to bind column names to field names. You can also customize the bean construction code (see ObjectFactory, BeanObjectFactory, BuilderObjectFactory)

Reference: http://commons.apache.org/proper/commons-beanutils/javadocs/v1.9.3/apidocs/org/apache/commons/beanutils/package-summary.html#package.description

.getArrayList(class), .getCollection(class) -- like getObject(), but for queries where you expect more than one row.
.forEach(lambda) .forEachMap(lambda) -- pass in a function to be called to process each row of the result

getResultSet(false) and the forEach() variants stream results without pulling all the rows into memory, making them useful for processing very large results. getResultSet() brings the entire result set into memory and returns a result set that is disconnected from the underlying JDBC connection. This simplifies error handling, but is best for modest sized results. Other result-returning methods, such as getArrayList() and getCollection(), must load all the data to populate the data structures they return (although they are populated by a streaming result set, so only one copy exists at a time). Use the forEach() methods to process results, if at all possible; they are both efficient and easy to use (e.g., iterating and closing are handled automatically and no need to deal with checked SQLExceptions that are throws by every ResultSet method).

TableInfo

TableInfo captures metadata about a database table. The metadata includes information about the database storage of the table and columns (e.g. types and constraints) as well as a lot of information about how the table should be rendered by the LabKey UI.

Note that TableInfo objects that are returned by a UserSchema are virtual tables and may have arbitrarily complex mapping between the virtual or "user" view of the schema and the underlying physical database. Some may not have any representation in the database at all (see EnumTableInfo).

If you are writing SQL against known tables in a known schema, you may never need to touch a TableInfo object. However, if you want to query tables created by other modules, say the issues table or a sample type, you probably need to start with a TableInfo and use a helper to generate your select SQL. As for update, you probably should not be updating other module's tables directly! Look for a service interface or use QueryUpdateService (see below).

DbSchema schema = DbSchema.get("issues", DbSchemaType.Module);
SchemaTableInfo ti = schema.getTable("issues");

TableSelector

TableSelector is a sql generator QueryService.getSelectSQL() wrapped around SqlSelector. Given a TableInfo, you can execute queries by specifying column lists, filter, and sort in code.

DbSchema schema = DbSchema.get("issues", DbSchemaType.Module);
SchemaTableInfo ti = schema.getTable("issues");
new TableSelector(ti,
       PageFlowUtil.set("IssueId","Title"),
       new SimpleFilter(new FieldKey(null,"container"),getContainer()),
       new Sort("IssueId")
);

Table

The poorly named Table class has some very helpful utilities for inserting, updating and deleting rows of data while maintaining basic LabKey semantics. It automatically handles the columns created, createdby, modified, and modifiedby. For simple UI-like interactions with the database (e.g. updates in response to a user action) we strongly recommend using these methods. Either that or go through the front door via QueryUpdateService which is how our query apis are implemented.

.insert() insert a single row with data in a Map or a java bean.
.update() update a single row with data in a Map or a java bean
.delete() delete a single row
.batchExecute() this helper can be used to execute the same statement repeatedly over a collection of data. Note, that we have a lot of support for fast data import, so don't reach for this method first. However, it can be useful.

QueryService.getSelectSQL()

QueryService is the primary interface to LabKey SQL functionality. However, QueryService.getSelectSQL() was migrated from the Table class in the hope of one day integrating that bit of SQL generation with the LabKey SQL compiler (the dream lives). getSelectSql() is still a stand-alone helper for generating select statements for any TableInfo. When using this method, you are still responsible for providing all security checks and adding the correct container filter.

High Level API

If you are writing custom APIs, it is just as likely that you are trying to modify/combine/wrap existing functionality on existing schemas and tables. If you are not updating tables that you own, e.g. created by SQL scripts in your own module, you should probably avoid trying to update those tables directly. Doing so may circumvent any special behavior that the creator of those tables relies on, it may leave caches in an inconsistent state, and may not correctly enforce security. In cases like this, you may want to act very much as the "user", by accessing high-level APIs with the same behavior as if the user executed a javascript API.

In addition, to QueryUpdateService as documented below, also look for public services exposed by other modules. E.g.

ExperimentService
StudyService
SearchService
QueryService
ContainerManager (not strictly speaking a service)

QueryUpdateService

Also not really a service, this isn't a global singleton. For TableInfo objects returned by user schemas, QueryUpdateService is the half of the interface that deals with updating data. It can be useful to just think of the dozen or so methods on this interface as belonging to TableInfo. Note, not all tables visible in the schema browser support insert/update/delete via QueryUpdateService. If not, there is probably an internal API that can be used instead. For instance you can't create a folder by calling insert on the core.containers table. You would use ContainerManager.createContainer().

Here is an example of inserting one row QueryUpdateService()

UserSchema lists = DefaultSchema.get(user, c).getSchema("lists")
TableInfo mylist = lists.getTable("mylist");
Map<String,Object> row = new HashMap<>();
row.put("title","string");
row.put("value",5);
BatchValidationException errors = new BatchValidationException();
mylist.getUpdateService().insertRows(user, c, Arrays.asList(row), errors, null, null);
if (errors.hasErrors())
   throw errors;

.insertRows() -- insert rows, and returns rows with generated keys (e.g. rowids)
.importRows() -- like insert, but may be faster. Does not reselect rowids.
.mergeRows() -- only supported by a few table types
.updateRows() -- updates specified rows
.deleteRows() -- deletes specified rows
.truncateRows() -- delete all rows

LabKey Support

LabKey Support