[maker-devel] Status check?

Mark Yandell myandell at genetics.utah.edu
Wed Oct 21 09:07:36 MDT 2009


Hi Xavier,

I agree: this seems way too long. I can basically reproduce FlyBase's annotations and BLAST data in about 3 days on my laptop, so 2 weeks on 5 processors seems way too long.

Is there some special, really huge dataset you are running? Are you doing TBLASTX to align hits from a large database of sequences?

--mark

Mark Yandell
Associate Professor of Human Genetics
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707
________________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] On Behalf Of Carson Holt
Sent: Wednesday, October 21, 2009 8:57 AM
To: Xavier Watkins; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Status check?

The time spent depends primarily on the size of the protein, EST, and repeat protein databases provided.  BLAST actually makes up about 90% of the run time for MAKER.  If you're using 5 processors, I suggest using mpi_maker instead of regular maker.  It gets better performance on multiprocessor systems.  The number of slices is dependent on what you set max_dna_len to be in the maker_opt.ctl file.  Just divide the contig length by that number.  Increasing max_dna_len increases memory usage.  Doing a test run on the entire Drosophila genome could take a while, especially if you used large protein and EST databases for the analysis.  It is 120 megabases in size, and with the default max_dna_len of 100,000, it would be divided into 1,200 chunks.  It could take anywhere from 4 days to 3 weeks depending on the BLAST databases used.
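
As a rough illustration of the slice arithmetic above, here is a minimal Python sketch.  The function name estimate_chunks is hypothetical (it is not part of MAKER), and rounding up for a partial final slice is an assumption; only the 120 megabase genome size and the 100,000 default for max_dna_len come from the numbers quoted above.

```python
import math


def estimate_chunks(contig_length, max_dna_len=100_000):
    """Estimate how many slices a contig is divided into.

    Hypothetical helper, not a MAKER function. Assumes a partial
    final slice still counts as one slice (hence the ceiling).
    """
    if contig_length < 0 or max_dna_len <= 0:
        raise ValueError("contig_length must be >= 0 and max_dna_len > 0")
    return math.ceil(contig_length / max_dna_len)


# Whole Drosophila genome, ~120 megabases, default max_dna_len.
total_chunks = estimate_chunks(120_000_000)
```

For a per-chromosome estimate, call the function once per contig length and sum the results.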

I guess Barry already answered the question on how to check on run status.  Individual contigs also create a file called run.log.  These will be under theVoid directory for each individual contig in the MAKER datastore directory.  These files also contain entries with labels like STARTED and FINISHED for each individual analysis.  The master_datastore_index.log file has status tags for entire contigs as opposed to individual analyses.
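
To tally those contig-level status tags, something like the following Python sketch might work.  The exact layout of master_datastore_index.log is an assumption here: the sketch supposes one tab-separated record per line, with the contig name in the first field and a tag such as STARTED or FINISHED in the last field.  Check a few lines of your own file before relying on it.  The function names are hypothetical.

```python
from collections import Counter


def latest_status(lines):
    """Map each contig name to the last status tag seen for it.

    Assumed layout (not confirmed): tab-separated fields, contig
    name first, status tag last. Later lines override earlier ones.
    """
    status = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status[fields[0]] = fields[-1]
    return status


def summarize(lines):
    """Count how many contigs are currently in each status."""
    return Counter(latest_status(lines).values())
```

Usage would be along the lines of opening the index file and passing the file object to summarize(), then comparing the FINISHED count with the number of contigs in the input FASTA.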

I hope that helps.  Let us know how it goes.

Thanks,
Carson


On 10/21/09 3:21 AM, "Xavier Watkins" <xavier at flymine.org> wrote:

Hi,
I'm currently doing a test run of Maker on the D. mel genome and I would like to estimate the time it takes to run on our system (it has now been running for 2 weeks on 5 processors).
Is there a way to know how many processes are left to run when running MAKER, or to know which contigs (chromosomes in my case) have finished running? From what I see, it chops up the contigs into slices when running RepeatMasker (currently on .151). Is there a way to know the total number of slices?

Apologies if I've missed this info in the documentation, I couldn't find it.

All the best,
Xavier




