[maker-devel] Status check?
Daniel Standage
byuhobbes at gmail.com
Thu Oct 22 10:48:07 MDT 2009
My test runs with Maker also suggest that RepeatMasker is taking up the
majority of time. We are using cross_match since we don't have access to
wublast.
Daniel Standage
Plant Genetics Lab
Brigham Young University
On Thu, Oct 22, 2009 at 10:45 AM, Carson Holt <carson.holt at genetics.utah.edu> wrote:
> I’m also surprised you said that RepeatMasker is taking so long. I’m
> wondering if this is related to cross_match, as wublast always seems to
> scream through this step. The suggestion to switch from ‘all’ to
> ‘Drosophila’ seems like a good start, but I’d like to check some other
> things as well. Is there any way you could provide me with your datasets and
> maker control files? You could try an online storage site like
> http://www.getdropbox.com/ which gives you 2 GB of free storage online.
> They also let you share folders. You could then send me a link to the
> shared folder.
>
> Thanks,
> Carson
>
>
>
> On 10/22/09 5:38 AM, "Xavier Watkins" <xavier at flymine.org> wrote:
>
> Hi,
> Thanks everyone for your help. I'm already running mpi_maker, planning
> on using more CPUs next time...
>
> I'm using the following:
>
> RepeatMasker with cross_match (running cross match seems to be the bit
> that's taking ages)
>
> snap version 2006-07-28
> GeneMarkS
> NCBI blastall 2.2.20 for blastx
>
> Not really sure what happens in RepeatMasker but from what I
> understand it tries to blast the genome against itself to build a
> library of possible repeats? Is there a way of building this library
> of repeats in a more efficient way?
>
> Many thanks,
> Xavier
>
>
> On 21 Oct 2009, at 16:07, Mark Yandell wrote:
>
> >
> > Hi Xavier,
> >
> > I agree: This seems way too long. I can basically reproduce flybase's
> > annotations and blast data in about 3 days on my laptop, so 2 weeks
> > on 5 processors seems way too long.
> >
> > Is there some special, really huge dataset you are running? Are you
> > doing TBLASTX to align hits from a large database of sequences?
> >
> > --mark
> >
> > Mark Yandell
> > Associate Professor of Human Genetics
> > Eccles Institute of Human Genetics
> > University of Utah
> > 15 North 2030 East, Room 2100
> > Salt Lake City, UT 84112-5330
> > ph:801-587-7707
> > ________________________________________
> > From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] On Behalf Of Carson Holt
> > Sent: Wednesday, October 21, 2009 8:57 AM
> > To: Xavier Watkins; maker-devel at yandell-lab.org
> > Subject: Re: [maker-devel] Status check?
> >
> > The time spent depends primarily on the size of the protein, EST,
> > and repeat protein databases provided. BLAST actually makes up
> > about 90% of the run time for MAKER. If you're using 5 processors, I
> > suggest using mpi_maker instead of regular maker. It gets better
> > performance on multiprocessor systems. The number of slices is
> > dependent on what you set max_dna_len to be in the maker_opt.ctl
> > file. Increasing max_dna_len increases memory usage. To get the
> > number of slices, just divide the contig length by max_dna_len.
> > Doing a test run on the entire Drosophila genome could take a while,
> > especially if you used large protein and EST databases for the
> > analysis. It is 120 Megabases in size, and with the default
> > max_dna_len of 100,000, it would be divided into 1,200 chunks. It
> > could take anywhere from 4 days to 3 weeks depending on the BLAST
> > databases used.
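
The slice arithmetic described above (contig length divided by max_dna_len) can be sketched in a few lines of Python. This is only an illustration, not MAKER's own code: the function names are hypothetical, and it assumes simple non-overlapping chunks with a partial final chunk counted as one, which may not match exactly how MAKER slices contigs.

```python
def count_chunks(contig_length, max_dna_len=100_000):
    """Estimate how many slices one contig is divided into.

    Assumption: plain ceiling division, i.e. non-overlapping chunks of
    max_dna_len bases with a shorter final chunk counted as one slice.
    """
    if contig_length <= 0 or max_dna_len <= 0:
        raise ValueError("contig_length and max_dna_len must be positive")
    # Integer ceiling division without floating point.
    return -(-contig_length // max_dna_len)


def total_chunks(contig_lengths, max_dna_len=100_000):
    """Sum the per-contig estimates for a multi-contig assembly."""
    return sum(count_chunks(length, max_dna_len) for length in contig_lengths)


if __name__ == "__main__":
    # The figures quoted in the message: 120 Mb at max_dna_len = 100,000.
    print(count_chunks(120_000_000))  # 1200
```

For a genome split across several chromosomes, passing the individual chromosome lengths to total_chunks gives a slightly higher figure than dividing the total genome size, because each chromosome ends with its own partial chunk.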
> >
> > I guess Barry already answered the question on how to check on run
> > status. Individual contigs also create a file called run.log.
> > These will be under theVoid directory for each individual contig in
> > the MAKER datastore directory. These files also contain entries
> > with labels like STARTED and FINISHED for each individual analysis.
> > The master_datastore_index.log file has status tags for entire
> > contigs as opposed to individual analyses.
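
One rough way to use the run.log files described above is to walk the datastore directory and count the STARTED and FINISHED labels in each log. The sketch below is only an illustration: the message does not give the exact line format of run.log, so the code assumes nothing beyond the label text appearing somewhere on a line, and the function names and the datastore path in the usage note are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Labels named in the message; any other status labels are ignored here.
LABELS = ("STARTED", "FINISHED")


def count_labels(lines):
    """Count how many lines contain each status label.

    Assumption: a label counts if it appears anywhere on the line,
    since the real run.log line format is not specified in the thread.
    """
    counts = Counter()
    for line in lines:
        for label in LABELS:
            if label in line:
                counts[label] += 1
    return counts


def summarize_run_logs(datastore_dir):
    """Map every run.log found under datastore_dir to its label counts."""
    summary = {}
    for log_path in sorted(Path(datastore_dir).rglob("run.log")):
        with open(log_path, errors="replace") as handle:
            summary[str(log_path)] = count_labels(handle)
    return summary
```

Calling summarize_run_logs("path/to/datastore") (placeholder path) and comparing the STARTED and FINISHED counts per contig gives a crude picture of which analyses are still in progress.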
> >
> > I hope that helps. Let us know how it goes.
> >
> > Thanks,
> > Carson
> >
> >
> > On 10/21/09 3:21 AM, "Xavier Watkins" <xavier at flymine.org> wrote:
> >
> > Hi,
> > I'm currently doing a test run of Maker on the D. mel genome and I
> > would like to estimate the time it takes to run on our system (it
> > has now been running for 2 weeks on 5 processors).
> > Is there a way to know how many processes are left to run when
> > running MAKER, or to know which contigs (chromosomes in my case)
> > have finished running? From what I see it chops up the contigs into
> > slices when running RepeatMasker (currently on .151). Is there a way
> > to know the total number of slices?
> >
> > Apologies if I've missed this info in the documentation, I couldn't
> > find it.
> >
> > All the best,
> > Xavier
> >
> >
> >
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
>