NCBI database - annotations problem?

CPAS Forum (Inactive)
weiclav  2010-12-20 01:05
Status: Closed
 
Hi everyone,

we successfully use LabKey Server for searching MS/MS spectra against UniProt (SwissProt) and also against our local database using an in-house Mascot server. However, there is a problem when I try to search against the NCBInr database (both when searching against the whole database and in the case of a taxonomy-restricted search; please see the attached log file).

The search itself proceeds without problems on the Mascot side (I can go through the results of the search manually), and the database is correctly downloaded by LabKey Server when necessary. The problem occurs when the server goes to replace protein IDs by parsing through the database. The reported error is "Fail to get line from sequence database".

The database is about 6GB and is listed only in the "Fasta File" table on the "Protein Database" page inside the admin console. It is not listed in the "Protein Annotations Loaded" table.

I tried to manually load annotations for the database, but without any luck. After submitting the database for annotation loading, the server starts to read it (I hope so; the HDD LED is blinking :) ), and after about 10-20 minutes the HDD stops and only the CPU seems to be processing the database (the tomcat5.exe process is using one CPU core). But even after the weekend the database did not show up in the "Protein Annotations Loaded" table (the tomcat5.exe process does not show any activity, but it has about 1024MB allocated). When I first observed the problem I increased the Max Heap Size to 1024MB, but it had no effect on the annotation load or on the MS/MS search problem.

Could the problem with MS/MS searches against the NCBI database be connected with the missing annotations?

Thanks in advance for any help.

Best regards,
David

PC specs (Mascot server is running on the same PC):
WinXP-Prof, 64-bit
2 CPUs
4 cores/CPU
16GB RAM
LabKey Server version 10.2
 
 
jeckels responded:  2010-12-20 11:02
Hi David,

It looks like the job failed due to a problem running Mascot2XML, which converts Mascot's native output format to pepXML. I took a quick look at the code, but it wasn't clear where the problem is. It's possible that it doesn't handle FASTA files larger than 4GB. Someone on the TPP mailing list can probably provide better assistance:

http://groups.google.com/group/spctools-discuss
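
For what it's worth, one plausible failure mode (just a guess on my part, not confirmed from the Mascot2XML source) is a file offset stored in a 32-bit integer, which wraps once a FASTA file passes the 4GB mark. A minimal Java sketch of the arithmetic:

    // Hypothetical illustration: a byte offset into a ~6GB FASTA file no
    // longer fits in a 32-bit int; narrowing it wraps to a negative value,
    // and any subsequent seek or read at that "offset" fails.
    public class OffsetWrap {
        public static void main(String[] args) {
            long realOffset = 6L * 1024 * 1024 * 1024; // ~6GB, like NCBInr
            int narrowed = (int) realOffset;           // a 32-bit tool's view
            System.out.println("64-bit offset: " + realOffset); // 6442450944
            System.out.println("32-bit offset: " + narrowed);   // -2147483648
        }
    }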

As far as the FASTA import into LabKey Server goes, is the database still using significant CPU? Can you go to Admin->Admin Console->Running Threads and attach the resulting page to a message? That will show whether the web server is still waiting for the database to finish the load.
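
If that page itself won't load, a thread dump captured another way is just as useful: jstack <pid> against the Tomcat process, or, purely as an illustration of what such a dump contains, the standard java.lang.management API (a standalone sketch that reports its own JVM's threads, not LabKey code):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;

    // Dumps the stack of every live thread in the current JVM --
    // roughly the same information the Running Threads page shows.
    public class ThreadDump {
        public static void main(String[] args) {
            ThreadInfo[] threads =
                ManagementFactory.getThreadMXBean().dumpAllThreads(true, true);
            for (ThreadInfo info : threads) {
                System.out.print(info);
            }
        }
    }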

Thanks,
Josh
 
weiclav responded:  2010-12-21 02:10
Hi Josh,

thanks for your quick response! I am trying to load annotations for the UniProt database in XML format now, so I am not able to test the NCBI database again. But I think I am in a similar situation with the UniProt database as in the case of the NCBI database.

Yesterday I started to load annotations for the UniProt database in XML format (4.6GB). Up to now I have worked only with the UniProt database in FASTA format, which is significantly smaller (about 230MB). The database shows up in the "Protein Annotations Loaded" table (this is different from loading annotations for the NCBI database, which did not even show up there...), and the number of processed records was rising. Now the number of processed records is not changing (still at 175000). The tomcat5.exe process is at its maximum memory usage (1,153,296K; Max Heap Size is set to 1024MB). It seems to me that this particular problem is connected with the Max Heap Size. Does the Max Heap Size have to be larger than the database I am going to load? Enclosed please find the list of running processes.

Regarding the Mascot search problem, I am going to try searching the same data with X!Tandem against the same NCBI database and see whether there is the same problem. I hope to post the results today.

Thanks again for your time!
David

EDIT: "Recover" button helped and now uniprot database processing is running again. I will try NCBI database again once the processing of uniprot database is finished. Unfortunately this can not be applied to NCBI database because I can not see it in the table... Thanks for your time!
 
jeckels responded:  2010-12-21 13:39
Hi David,

Looking at the information that you posted, the server wasn't loading any FASTA or UniProt XML files at the time.

There may be errors in the log file about what caused the import to fail. Can you post the contents of <TOMCAT_HOME>/logs/labkey-errors.log? It's also available through the web site under Admin->Admin Console->View All Site Errors.

The Java virtual machine rarely returns memory to the operating system, even when it's not actively using it. If the memory usage temporarily spiked to the point where it was using the full 1 GB heap, it would most likely continue to show that level of usage in Task Manager.

You can gain more insight into the Java heap through Admin->Admin Console->Memory Usage. The top graph will show actual heap usage relative to the total and maximum size.
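
The graphs are based on the same numbers the JVM itself reports, so you can also read them directly if that's easier; a minimal sketch (illustrative, not LabKey's actual code):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    // Reads the same heap figures the Memory Usage page graphs:
    // current usage, committed size, and the -Xmx ceiling.
    public class HeapCheck {
        public static void main(String[] args) {
            MemoryUsage heap =
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20,
                heap.getCommitted() >> 20,
                heap.getMax() >> 20);
        }
    }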

Thanks,
Josh
 
weiclav responded:  2010-12-22 03:50
Hi Josh,

enclosed please find the error log entries that should be related to the problem of loading annotations for the NCBI database. It looks like a Max Heap Size of 1024MB is not enough. Is that right? My guess is that the first batch of NCBI database records is too large to be processed with a heap size of 1024MB.

Thanks again!
David

Btw: the server is still annotating the UniProt database, so I will try to increase the max heap size and load the NCBI database again, most probably tomorrow.
 
jeckels responded:  2010-12-22 06:58
Hi David,

Yes, your server ran out of heap space. It's not clear if it's due solely to the annotation import, or if there was something running in the background that was also consuming space, like the full-text search indexer. Increasing the heap size may help eliminate this in the future. Please note that if you are running a 32-bit Java VM, you can't have a heap larger than about 1.4GB.

If you also add the -XX:+HeapDumpOnOutOfMemoryError startup argument to your Java VM when you increase the heap size, it will dump the contents of the heap if it runs out in the future. We can then analyze it to determine the cause and hopefully fix the problem.
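
For example, if Tomcat is started from a batch script, the additional options could look like this (the heap size and dump path are just placeholders; if Tomcat runs as a Windows service, the same flags go into the service's Java options instead):

    rem Placeholder values -- adjust the heap size and dump path for your setup.
    set CATALINA_OPTS=-Xmx1400m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\labkey\heapdumps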

Thanks,
Josh
 
weiclav responded:  2011-03-01 23:31
Hi Josh,

sorry for such a late response...

I tried increasing the max heap size up to 1300MB, and the server still reports the same error during a Mascot (or X!Tandem) search against the NCBI database. I am also not able to load annotations for the database using the "load annotation" page.

Recently I updated to LabKey Server 10.3, and the behaviour is the same.

I already got a memory dump as you suggested, but with the 10.2 version. I will try to generate a new one with the 10.3 version of LabKey Server.

Thanks a lot!
David
 
jeckels responded:  2011-03-14 17:46
Title: NCBI database - annotations problem?
Hi David,

A memory dump from 10.2 should be similar enough to 10.3 to be informative. Can you zip up the file and let me know how big it is, so I can create a reasonable drop point for it?

Thanks,
Josh