[maker-devel] Status check?

Carson Holt carson.holt at genetics.utah.edu
Thu Oct 22 10:45:10 MDT 2009


I'm also surprised that RepeatMasker is taking so long.  I wonder whether this is related to cross_match, as wublast always seems to scream through this step.  The suggestion to switch from 'all' to 'Drosophila' seems like a good start, but I'd like to check some other things as well.  Is there any way you could provide me with your datasets and MAKER control files?  You could try an online storage site like http://www.getdropbox.com/, which gives you 2 GB of free storage online.  They also let you share folders, so you could then send me a link to the shared folder.

Thanks,
Carson


On 10/22/09 5:38 AM, "Xavier Watkins" <xavier at flymine.org> wrote:

Hi,
Thanks everyone for your help. I'm already running mpi_maker, planning
on using more CPUs next time...

I'm using the following:

RepeatMasker with cross_match (running cross match seems to be the bit
that's taking ages)

snap version 2006-07-28
GeneMarkS
NCBI blastall 2.2.20 for blastx

Not really sure what happens in RepeatMasker but from what I
understand it tries to blast the genome against itself to build a
library of possible repeats? Is there a way of building this library
of repeats in a more efficient way?

Many thanks,
Xavier


On 21 Oct 2009, at 16:07, Mark Yandell wrote:

>
> Hi Xavier,
>
> I agree: This seems way too long. I can basically reproduce FlyBase's
> annotations and blast data in about 3 days on my laptop, so 2 weeks
> on 5 processors seems way too long.
>
> Is there some special, really huge dataset you are running? Are you
> doing TBLASTX to align hits from a large database of sequences?
>
> --mark
>
> Mark Yandell
> Associate Professor of Human Genetics
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:801-587-7707
> ________________________________________
> From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] On Behalf Of Carson Holt
> Sent: Wednesday, October 21, 2009 8:57 AM
> To: Xavier Watkins; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Status check?
>
> The time spent depends primarily on the size of the protein, EST,
> and repeat protein databases provided.  BLAST actually makes up
> about 90% of the run time for MAKER.  If you're using 5 processors, I
> suggest using mpi_maker instead of regular maker.  It gets better
> performance on multiprocessor systems.  The number of slices
> depends on what you set max_dna_len to in the maker_opt.ctl file:
> just divide the contig length by that number.  Increasing
> max_dna_len increases memory usage.  Doing a test run on the entire
> Drosophila genome could take a while, especially if you used large
> protein and EST databases for the analysis.  It is 120 megabases in
> size, and with the default max_dna_len of 100,000 it would be
> divided into 1,200 chunks.  It could take anywhere from 4 days to 3
> weeks depending on the BLAST databases used.
>
> I guess Barry already answered the question on how to check on run
> status.  Individual contigs also create a file called run.log.
> These will be under theVoid directory for each individual contig in
> the MAKER datastore directory.  These files also contain entries
> with labels like STARTED and FINISHED for each individual analysis.
> The master_datastore_index.log file has status tags for entire
> contigs as opposed to individual analyses.
>
> I hope that helps.  Let us know how it goes.
>
> Thanks,
> Carson
>
>
> On 10/21/09 3:21 AM, "Xavier Watkins" <xavier at flymine.org> wrote:
>
> Hi,
> I'm currently doing a test run of Maker on the D. mel genome and I
> would like to estimate the time it takes to run on our system (it
> has now been running for 2 weeks on 5 processors).
> Is there a way to know how many processes are left to run when
> running MAKER, or to know which contigs (chromosomes in my case)
> have finished running? From what I see it chops up the contigs into
> slices when running RepeatMasker (currently on .151).  Is there a
> way to know the total number of slices?
>
> Apologies if I've missed this info in the documentation, I couldn't
> find it.
>
> All the best,
> Xavier
>
>
>
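
The chunk estimate in Carson's quoted reply (120 megabases at the default max_dna_len of 100,000 giving 1,200 chunks) can be sketched as a short calculation. This is only an illustration: the function names count_chunks and total_chunks are made up here, and it assumes plain ceiling division with no overlap between slices, which may not match how MAKER actually splits contigs.

```python
def count_chunks(contig_length, max_dna_len=100_000):
    """Return how many slices of at most max_dna_len bases cover a contig.

    Assumption: plain ceiling division with no overlap between slices.
    """
    if contig_length < 0 or max_dna_len <= 0:
        raise ValueError("contig_length must be >= 0 and max_dna_len > 0")
    return -(-contig_length // max_dna_len)  # ceiling division on integers


def total_chunks(contig_lengths, max_dna_len=100_000):
    """Sum the per-contig slice counts for a whole assembly."""
    return sum(count_chunks(n, max_dna_len) for n in contig_lengths)


if __name__ == "__main__":
    # Whole genome treated as one 120 Mb sequence, as in the quoted estimate.
    print(count_chunks(120_000_000))
```

Counting per chromosome with total_chunks can give a slightly higher total than the whole-genome figure, because each contig's last slice is usually shorter than max_dna_len.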

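For the status question, a rough way to see which contigs have finished is to tally the status tags in master_datastore_index.log. The sketch below assumes each line is tab-separated as contig name, datastore path, status tag, and that the last entry for a contig reflects its current state; neither detail is spelled out in this thread, so check a few lines of your own file first. The function names and sample lines are hypothetical.

```python
from collections import Counter


def latest_status(log_lines):
    """Map each contig name to the last status tag seen for it.

    Assumed line layout: name <TAB> path <TAB> status.
    """
    status = {}
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, tag = fields[0], fields[2]
        status[name] = tag  # later entries overwrite earlier ones
    return status


def summarize(log_lines):
    """Count how many contigs are currently in each status."""
    return Counter(latest_status(log_lines).values())


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real run.
    sample = [
        "chr2L\tdatastore/chr2L\tSTARTED\n",
        "chr2R\tdatastore/chr2R\tSTARTED\n",
        "chr2L\tdatastore/chr2L\tFINISHED\n",
    ]
    print(summarize(sample))
```

With a real run, the same functions accept an open file handle, for example summarize(open(path_to_index_log)), where path_to_index_log points at the index file in your MAKER output directory.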
