Hi all,
I'm trying to load the protein annotation file uniprot_sprot.xml (over 2 GB) using the "Load New Annot File" button in LabKey 2.2. After about 5 minutes the following error appears in the Diagnostics log:
ERROR MLProteinHandler 2007-09-26 15:58:22,942 AnnotLoader8 : Final table insert failed in uniprot's endElement: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
Any suggestions? Thank you,
Tomas
BTW: I had no problem loading test annotation "yeast_a.xml" that is a part of CPASdemo.zip package. |
|
jeckels responded: |
2007-09-28 15:17 |
Tomas,
Are there any other error messages in the log? The one that you included above is a secondary failure, so hopefully there's some other output that will indicate what the first failure was.
Thanks,
Josh |
|
trejtar responded: |
2007-09-29 06:26 |
Hi Josh,
I couldn't find any other error messages, so I repeated the annotation load, this time watching "running threads" under Diagnostics.
After the load starts, a new thread appears:
AnnotLoader5 (RUNNABLE)
at org.apache.xerces.util.SymbolTable.addSymbol(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.labkey.ms2.protein.XMLProteinHandler.parse(XMLProteinHandler.java:314)
at org.labkey.ms2.protein.XMLProteinLoader.parseFile(XMLProteinLoader.java:114)
at org.labkey.ms2.protein.AnnotationUploadManager$AnnotationLoadJob.run(AnnotationUploadManager.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
and it seems that the file is being parsed properly. After 5 minutes this thread changes to:
AnnotLoader5 (WAITING)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
at java.util.concurrent.DelayQueue.take(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
At the same time, the error I originally posted appears in the site error log:
ERROR MLProteinHandler 2007-09-29 8:18:22,942 AnnotLoader5 : Final table insert failed in uniprot's endElement: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
I'm guessing that at some point the annotation file cannot be parsed. Is it damaged, or in an incompatible format? I downloaded the file again from the ExPASy server, but the MD5 is the same, so the file should be OK.
I appreciate your help.
Tomas |
|
jeckels responded: |
2007-10-01 16:31 |
Tomas,
I looked into the relevant code and its error handling is in need of some significant improvements, which is why you're not getting a more useful message in the log. I'll try to address this more fully in 2.3, but I'm attaching a special MS2 module for 2.2 that will hopefully provide more detailed information.
Please back up your existing ms2.module file from your existing <LABKEY_INSTALLATION_ROOT>/modules directory and replace it with this one. Then try rerunning the upload and check the log file for messages. Note that this won't actually fix the problem, but it might provide enough information to diagnose what the problem is.
Thanks,
Josh |
|
|
trejtar responded: |
2007-10-02 15:03 |
Assigned To: jeckels |
Hi Josh,
Thanks for your help. Attached is the error log generated with the new ms2.module.
Hope it helps.
Tomas |
|
|
jeckels responded: |
2007-10-03 13:20 |
Are there any other error messages earlier in the log file? I'm guessing that there may be a different message that started the whole slew of errors. If it's not in the labkey.log file, it might be in labkey.log.1 (since we rotate one log file's worth of output after it fills up).
Thanks,
Josh |
|
trejtar responded: |
2007-10-04 06:12 |
Josh,
I cleared the log files and imported the SwissProt annotation again. The resulting log files are attached.
Thank you,
Tomas |
|
|
jeckels responded: |
2007-10-04 11:49 |
The problem is that there is an entry in your XML file with a common name, "Marinobacter hydrocarbonoclasticus (strain DSM 11845)", that is too long for our current database column. It's 53 characters long and we can only handle 50 characters. Inserting it into the database fails, which then puts everything into a bad state.
In 2.3 we can change the database schema to allow for longer names. Until then, you should be able to load the file (or at least get further) if you edit the file to shorten the common name for that entry. Is that a reasonable workaround for you until 2.3 is released in December?
Thanks,
Josh |
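To see which entries would hit the 50-character limit before attempting a load, a scan along these lines might help. It is only a sketch: it assumes each organism name sits on one line as <name ...>...</name> inside an <organism> element, and the function name find_long_names is made up for illustration.

```python
import re
from xml.sax.saxutils import unescape

# Assumed layout: <organism> ... <name type="...">text</name> ... </organism>,
# with each <name> element on a single line.
ORGANISM_OPEN = re.compile(r"<organism\b")
NAME_RE = re.compile(r"<name\b[^>]*>(.*?)</name>")
MAX_LEN = 50  # column width reported in this thread


def find_long_names(lines, limit=MAX_LEN):
    """Yield (line_number, name) for organism names longer than `limit`."""
    in_organism = False
    for number, line in enumerate(lines, start=1):
        if ORGANISM_OPEN.search(line):
            in_organism = True
        if in_organism:
            for match in NAME_RE.finditer(line):
                name = unescape(match.group(1))  # decode &amp;, &lt;, &gt;
                if len(name) > limit:
                    yield number, name
        if "</organism>" in line:
            in_organism = False
```

Because it reads line by line, it can be fed an open file object for a multi-gigabyte file without loading the whole file into memory.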
|
trejtar responded: |
2007-10-04 15:00 |
Josh,
Sounds good. I'll write a Perl script to trim organism names to 50 characters.
Alternatively, I could load an older SwissProt annotation file, which worked fine in our older CPAS install.
And if all else fails, December is not that far away...
Thanks for your help.
Cheers,
Tomas |
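For anyone wanting to try the trimming workaround, here is a rough sketch in Python. It is not any of the scripts mentioned in this thread. It assumes the same one-line <name ...>...</name> layout inside <organism> elements, and the names trim_organism_names and _trim are hypothetical.

```python
import re
from xml.sax.saxutils import escape, unescape

ORGANISM_OPEN = re.compile(r"<organism\b")
NAME_RE = re.compile(r"(<name\b[^>]*>)(.*?)(</name>)")
MAX_LEN = 50  # column width reported in this thread


def _trim(match, limit):
    """Shorten the text of one <name> element to at most `limit` characters."""
    text = unescape(match.group(2))      # count real characters, not entities
    trimmed = text[:limit].rstrip()
    return match.group(1) + escape(trimmed) + match.group(3)


def trim_organism_names(lines, limit=MAX_LEN):
    """Yield each input line, trimming <name> text only inside <organism>."""
    in_organism = False
    for line in lines:
        if ORGANISM_OPEN.search(line):
            in_organism = True
        if in_organism:
            line = NAME_RE.sub(lambda m: _trim(m, limit), line)
        if "</organism>" in line:
            in_organism = False
        yield line
```

Names outside <organism> (gene names, for example) pass through untouched. To use it as a filter, the generator can be connected to files with something like sys.stdout.writelines(trim_organism_names(open(path, encoding="utf-8"))).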
|
slottad responded: |
2007-10-05 22:36 |
Being the impatient sort, I created a Python script to trim the organism names.
It seems to be working for me, 9 batches so far (obviously I am too impatient to wait for completion).
Here it is for the world to enjoy.
Sample usage: trim_names.py uniprot_sprot.xml > uniprot_sprot_trim.xml
This should work for the TREMBL file as well, but I have not tested it yet (re: impatient, above).
Also, when you fix this in 2.3, it might be best, in addition to expanding the field size, to check the length of the input data, so that the load does not fail when some other field exceeds its capacity in the future (as I am sure it will).
Douglas |
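The length check suggested here could look roughly like the following. This is a sketch in Python for illustration only, not LabKey code. The column name common_name and the function fit_to_columns are hypothetical; only the 50-character limit comes from this thread.

```python
# Hypothetical column widths; only the 50-character limit is from the thread.
COLUMN_LIMITS = {"common_name": 50}


def fit_to_columns(row, limits=COLUMN_LIMITS):
    """Return (clean_row, warnings), truncating values that exceed a limit.

    Over-long strings are shortened and reported instead of being passed to
    the database, where a failed insert would abort the whole transaction.
    """
    clean = {}
    warnings = []
    for column, value in row.items():
        limit = limits.get(column)
        if limit is not None and isinstance(value, str) and len(value) > limit:
            warnings.append(f"{column}: {len(value)} chars truncated to {limit}")
            value = value[:limit]
        clean[column] = value
    return clean, warnings
```

Whether to truncate, skip the entry, or stop with a clear message is a design choice. The point is that the decision is made before the insert, so one bad value cannot leave the transaction in an aborted state.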
|