[maker-devel] Status check?

Xavier Watkins xavier at flymine.org
Thu Oct 22 05:38:27 MDT 2009


Hi,
Thanks everyone for your help. I'm already running mpi_maker, planning  
on using more CPUs next time...

I'm using the following:

RepeatMasker with cross_match (running cross_match seems to be the bit  
that's taking ages)

snap version 2006-07-28
GeneMarkS
NCBI blastall 2.2.20 for blastx

Not really sure what happens in RepeatMasker but from what I  
understand it tries to blast the genome against itself to build a  
library of possible repeats? Is there a way of building this library  
of repeats in a more efficient way?

Many thanks,
Xavier


On 21 Oct 2009, at 16:07, Mark Yandell wrote:

>
> Hi Xavier,
>
> I agree: this seems way too long. I can basically reproduce FlyBase's  
> annotations and BLAST data in about 3 days on my laptop, so 2 weeks  
> on 5 processors seems far too long.
>
> Is there some special, really huge dataset you are running? Are you  
> doing TBLASTX to align hits from a large database of sequences?
>
> --mark
>
> Mark Yandell
> Associate Professor of Human Genetics
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:801-587-7707
> ________________________________________
> From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] On Behalf Of Carson Holt
> Sent: Wednesday, October 21, 2009 8:57 AM
> To: Xavier Watkins; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Status check?
>
> The time spent depends primarily on the size of the protein, EST,  
> and repeat protein databases provided.  BLAST actually makes up  
> about 90% of the run time for MAKER.  If you're using 5 processors, I  
> suggest using mpi_maker instead of regular maker.  It gets better  
> performance on multiprocessor systems.  The number of slices is  
> dependent on what you set max_dna_len to be in the maker_opt.ctl  
> file: just divide the contig length by that number.  Increasing  
> max_dna_len increases memory usage.  Doing a test run on the entire  
> Drosophila genome could take a while, especially if you used large  
> protein and EST databases for the analysis.  It is 120 megabases in  
> size, and with the default max_dna_len of 100,000, it would be  
> divided into 1,200 chunks.  It could take anywhere from 4 days to 3  
> weeks depending on the BLAST databases used.
>
> I guess Barry already answered the question on how to check on run  
> status.  Individual contigs also create a file called run.log.   
> These will be under theVoid directory for each individual contig in  
> the MAKER datastore directory.  These files also contain entries  
> with labels like STARTED and FINISHED for each individual analysis.   
> The master_datastore_index.log file has status tags for entire  
> contigs as opposed to individual analyses.
>
> I hope that helps.  Let us know how it goes.
>
> Thanks,
> Carson
>
>
> On 10/21/09 3:21 AM, "Xavier Watkins" <xavier at flymine.org> wrote:
>
> Hi,
> I'm currently doing a test run of Maker on the D. mel genome and I  
> would like to estimate the time it takes to run on our system (it  
> has now been running for 2 weeks on 5 processors).
> Is there a way to know how many processes are left to run when  
> running MAKER, or to know which contigs (chromosomes in my case)  
> have finished running? From what I see, it chops up the contigs into  
> slices when running RepeatMasker (currently on .151). Is there a way  
> to know the total number of slices?
>
> Apologies if I've missed this info in the documentation, I couldn't  
> find it.
>
> All the best,
> Xavier
>
>
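Carson's slice arithmetic in the quoted reply (contig length divided by max_dna_len) can be sketched in a few lines of Python. This is only an illustration: the function name count_chunks is made up here, and rounding a partial final slice up to a whole chunk is an assumption, not something stated in the thread.

```python
def count_chunks(contig_lengths, max_dna_len=100_000):
    """Estimate the number of slices per contig.

    contig_lengths maps a contig name to its length in bases.
    Assumption: a trailing partial slice counts as one more chunk,
    so the division is rounded up.
    """
    if max_dna_len <= 0:
        raise ValueError("max_dna_len must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return {name: -(-length // max_dna_len)
            for name, length in contig_lengths.items()}


# The 120-megabase figure and the 100,000 default come from the thread.
print(count_chunks({"whole_genome": 120_000_000}))
```

With the numbers from the thread this gives the 1,200 chunks Carson mentions; passing one entry per chromosome would give a per-contig estimate instead.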
>
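For the status check Carson describes, a rough way to summarise progress is to walk the datastore directory and count the STARTED and FINISHED labels in every run.log. The sketch below assumes only what the thread says: that the files are named run.log and contain those labels somewhere on a line. The exact line format is not given here, so the code does a plain substring match, and the function name tally_run_logs is hypothetical.

```python
import os
from collections import Counter


def tally_run_logs(datastore_dir, labels=("STARTED", "FINISHED")):
    """Count lines mentioning each label across all run.log files.

    Assumption: labels appear as plain text within a line; no other
    knowledge of the run.log layout is used.
    """
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(datastore_dir):
        if "run.log" not in filenames:
            continue
        path = os.path.join(dirpath, "run.log")
        with open(path, errors="replace") as handle:
            for line in handle:
                for label in labels:
                    if label in line:
                        totals[label] += 1
    return totals
```

Calling tally_run_logs on the datastore path and comparing the two counts gives a crude idea of how many analyses are still in flight; the master_datastore_index.log remains the place to look for whole-contig status.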



