[maker-devel] long running maker question

Chris Wilks cwilks at stanford.edu
Wed Nov 4 19:45:06 MST 2009


Hi,

We're currently running Maker to align cross-species EST and cDNA sequences to the Arabidopsis thaliana genome.
We recently ran it successfully on protein sequences (both from the same species and from plant species in general).

However, with the cross-species nucleotide sequences we saw a very long running time for the work on chromosome 1 (> 96 hours at 100% CPU on 8 concurrent threads), much of which appeared to be spent in tblastx.
When the job finally finished, it reported that all chromosomes had been started and had died (and had been retried and died again).

From what I've seen, the jobs probably died because they ran out of /tmp space, so I changed that parameter to use a directory with much more free space instead of the system /tmp. However, I'm still concerned that the long tblastx running time will recur.
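Before restarting, it can help to confirm that the replacement temporary directory really has room. The following is a minimal sketch that is independent of Maker itself; the 10 GiB threshold is a hypothetical figure chosen only for illustration, not a documented Maker requirement.

```python
import shutil
import tempfile


def has_enough_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return usage.free >= required_bytes


def report(path: str, required_bytes: int) -> str:
    """Build a one-line human-readable summary of free space versus the requirement."""
    free_gib = shutil.disk_usage(path).free / 1024**3
    ok = has_enough_space(path, required_bytes)
    return f"{path}: {free_gib:.1f} GiB free, enough for {required_bytes / 1024**3:.1f} GiB: {ok}"


if __name__ == "__main__":
    # HYPOTHETICAL threshold: 10 GiB, an arbitrary safety margin for a ~1.8 GB input.
    print(report(tempfile.gettempdir(), 10 * 1024**3))
```

Point `report` at the directory configured for temporary files rather than the system default to check the location the run will actually use.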

I should also note that we've already generated the repeats, so this job is not regenerating them (we're feeding them in from a Maker-generated GFF file via the rm_gff setting in the maker_opts.ctl file).

I picked up the latest version (as of October 6th), and I have the quick-and-dirty installation of BioPerl 1.6.
Our cross-species FASTA file is ~1.8 GB, and we're running it against a ~125 Mbp genome.
There are 2,684,575 sequences in the cross-species FASTA file.
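Figures like the sequence count above can be reproduced with a streaming pass over the FASTA file. This is a minimal sketch assuming a standard FASTA layout (each record begins with a `>` header line); it is not part of Maker or BioPerl.

```python
from typing import Iterable, Tuple


def fasta_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Count FASTA records and total residues from an iterable of lines.

    Works on an open file handle line by line, so a multi-gigabyte
    file never has to be loaded into memory at once.
    """
    n_seqs = 0
    n_residues = 0
    for line in lines:
        if line.startswith(">"):
            n_seqs += 1  # each header line starts a new record
        else:
            n_residues += len(line.strip())  # skip newlines; blank lines add 0
    return n_seqs, n_residues
```

Usage: `with open("cross_species.fasta") as fh: print(fasta_stats(fh))`, where the filename is hypothetical.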

So I'm wondering 1) whether this is just a normal running time for files this large and we have to live with it, or 2) whether we can shorten it and/or something anomalous is going on.

Thanks,
Chris Wilks
TAIR
