Register Protein Sequences

2024-04-18

Premium Feature — Available with LabKey Biologics LIMS. Learn more or contact LabKey.

This topic covers how to register a new protein sequence using the graphical user interface. To register entities in bulk via file import, see Create Registry Sources. To register entities using the API, or to bulk import sequences from an Excel spreadsheet, see Use the Registry API.

You can enter the Protein Sequence wizard in a number of ways:
  • Via the nucleotide sequence wizard. When registering a nucleotide sequence, you have the option of continuing on to register the corresponding protein sequence.
  • Via the header bar. Select Registry > Protein Sequences.
    • Select Add > Add manually.

Protein Sequence Wizard

The wizard for registering a new protein sequence proceeds through five tabs:

Details

  • Name: Provide a name, or one will be generated for you. Hover to see the naming pattern.
  • Description: (Optional) A text description of the sequence
  • Alias: (Optional) List one or more aliases. Type a name and press Enter when complete. Continue to add more as needed.
  • Organisms: (Optional) Start typing the organism name to narrow the pulldown menu of options. Multiple values are accepted.
  • Protein Sequence Parents: (Optional) List parent component(s) for this sequence. Start typing to narrow the pulldown menu of options.
  • Seq Part: (Optional) Indicates this sequence can be used as part of a larger sequence. Accepted values are 'Leader', 'Linker', and 'Tag'. When set, chain format must be set to 'SeqPart'.
Click Next to continue.
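
The Seq Part rule above (one of three accepted values, and a chain format of 'SeqPart' whenever it is set) can be expressed as a small check. This is only an illustrative sketch: the function name `check_seq_part` and its arguments are hypothetical and are not part of the LabKey API or the wizard itself.

```python
from typing import List, Optional

# Accepted Seq Part values, as listed in the Details tab description.
SEQ_PART_VALUES = {"Leader", "Linker", "Tag"}


def check_seq_part(seq_part: Optional[str], chain_format: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the combination is valid.

    Hypothetical helper for illustration only.
    """
    problems = []
    if seq_part is None:
        # Seq Part is optional, so there is nothing to check.
        return problems
    if seq_part not in SEQ_PART_VALUES:
        problems.append(f"Seq Part must be one of {sorted(SEQ_PART_VALUES)}")
    if chain_format != "SeqPart":
        problems.append("Chain format must be 'SeqPart' when Seq Part is set")
    return problems
```

An empty result means the values are consistent with the rule; otherwise each string describes one violation.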

Sequence

On the Sequence tab, you can translate a protein sequence from a nucleotide sequence as outlined below. If you prefer to enter a protein sequence manually, click Manually add a sequence at the bottom.

  • Nucleotide Sequence: (Optional) The selection made here will populate the left-hand text box with the nucleotide sequence.
  • Translation Frame: (Required) The nucleotide sequence is translated into the protein sequence (shown in the right-hand text box) by parsing it into groups of three. The translation frame determines whether the first, second, or third nucleotide in the series 'heads' the first group of three. Options: 1, 2, 3.
  • Sequence Length: This value is based on the selected nucleotide sequence.
  • Nucleotide Start: This value is based on the nucleotide sequence and the translation frame.
  • Nucleotide End: This value is based on the nucleotide sequence and the translation frame.
  • Translated Sequence Length: This value is based on the nucleotide sequence and the translation frame.
  • Protein Start: Specify the start location of the protein to be added to the registry.
  • Protein End: Specify the end location of the protein to be added to the registry.
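
The frame-based translation described above can be sketched in Python. This is a minimal illustration, not LabKey's implementation: the function name `translate`, the use of the standard genetic code, rendering stop codons as `*`, and dropping a trailing incomplete codon are all assumptions made for the example.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str, frame: int = 1) -> str:
    """Translate a nucleotide sequence in translation frame 1, 2, or 3.

    The frame selects which nucleotide heads the first group of three.
    Unknown codons become 'X'; a trailing partial codon is ignored.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    seq = nucleotides.upper().replace("U", "T")
    start = frame - 1  # frame 1 starts at the first nucleotide
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

For the input `ATGGCCTGA`, frame 1 reads the codons ATG, GCC, TGA, while frame 2 starts at the second nucleotide and reads TGG, CCT, leaving two nucleotides unused.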

Click Next to continue.

Annotations

The annotations tab displays any matching annotations found in the annotation library. You can also add annotations manually at this point in the registration wizard.

  • Name: a freeform name
  • Type: for example, Leader, Variable, Tag, etc. Start typing to narrow the menu options.
  • Category: 'Feature' or 'Region'
  • Description: (Optional)
  • Start and End Positions: 1-based offsets within the sequence
Editing is not allowed at this point, but you can edit annotations after the registration wizard is complete.
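
Because annotation positions are 1-based, they need to be shifted when used with 0-based string indexing. The sketch below shows one way to extract the residues an annotation covers. It assumes the end position is inclusive, which this topic does not state, and the function name `annotated_segment` is hypothetical.

```python
def annotated_segment(sequence: str, start: int, end: int) -> str:
    """Return the residues covered by an annotation.

    Assumes 1-based positions with an inclusive end (an assumption for
    this example, not confirmed behavior).
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions must satisfy 1 <= start <= end <= length")
    # Shift the 1-based start to a 0-based index; the inclusive end maps
    # directly onto Python's exclusive slice bound.
    return sequence[start - 1:end]
```

Under these assumptions, an annotation with start 1 and end 3 on the sequence `MKTAYIAKQR` covers `MKT`.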

Suggested annotations can be “removed” by clicking the red icons in the grid panel. If you change your mind, click the green icon to add an annotation back.

For complete details on using the annotation panel see Protein Sequence Annotations.

Click Next to continue the wizard.

Properties

  • Chain Format: select a chain format from the dropdown (start typing to filter the list of options). An administrator defines the set of options on the ChainFormats list. LabKey Biologics will attempt to classify the protein's chain format if possible.
  • ε: The extinction coefficient
  • Avg. Mass: The average mass
  • Num. S-S: The number of disulfide bonds
  • pI: The isoelectric point
  • Num. Cys: The number of cysteine residues

Default or best guess values may prepopulate the wizard, but can be edited as needed.
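
To give a sense of how a few of these values relate to the sequence itself, the sketch below counts cysteines and estimates the molar extinction coefficient at 280 nm from the commonly used residue contributions (tryptophan 5500, tyrosine 1490, and 125 per disulfide bond, in M⁻¹ cm⁻¹). This is an assumption-laden illustration: this topic does not describe how LabKey Biologics derives its prepopulated values, and the function name `estimate_properties` and its return keys are hypothetical.

```python
def estimate_properties(sequence: str, num_disulfides: int = 0) -> dict:
    """Estimate a few sequence-derived properties.

    Illustrative only; not necessarily the method used by the wizard.
    """
    seq = sequence.upper()
    num_cys = seq.count("C")
    # Each disulfide bond uses two cysteines.
    if not 0 <= num_disulfides <= num_cys // 2:
        raise ValueError("num_disulfides is not possible for this sequence")
    # Extinction coefficient at 280 nm, in M^-1 cm^-1.
    epsilon = 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * num_disulfides
    return {
        "num_cys": num_cys,
        "num_ss": num_disulfides,
        "extinction_coefficient": epsilon,
    }
```

Treat any such estimate as a starting point to compare against the values shown in the wizard, which remain editable.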

Click Next to continue.

Confirm

The Confirm panel provides a summary of the protein about to be added to the registry.

Click Finish to add the protein to the registry.

Editing Protein Sequence Fields

Once you have defined a protein sequence, you can locate it in a grid and click its name to reopen it and see the details. Some fields are eligible for editing. Those that are "in use" by the system or other entities cannot be changed. All edits are logged.
