[maker-devel] Status check?

Xavier Watkins xavier at flymine.org
Thu Oct 22 11:05:37 MDT 2009


You should have received an email with the Dropbox URL.
Unfortunately, I was not able to get wublast before the change to AB-Blast.

Thanks,
Xavier

On 22 Oct 2009, at 17:53, Carson Holt wrote:

> I think it would still be useful.  Since that one change seemed to  
> make all the difference, I’ll run some tests comparing cross_match  
> and wublast under different configurations on our system.
>
> Thanks,
> Carson
>
>
>
> On 10/22/09 10:50 AM, "Xavier Watkins" <xavier at flymine.org> wrote:
>
> Switching to "Drosophila" made all the difference, it's now whizzing  
> through the analysis! Did you still want me to upload the files to  
> my dropbox?
>
> Many thanks,
> Xavier
>
> On 22 Oct 2009, at 17:45, Carson Holt wrote:
>
> I’m also surprised you said that RepeatMasker is taking so long.   
> I’m wondering if this is related to cross_match as wublast always  
> seems to scream through this step.  The suggestion to switch from  
> ‘all’ to ‘Drosophila’ seems like a good start, but I’d like to check  
> some other things as well.  Is there any way you could provide me
> with your datasets and maker control files?  You could try an online
> storage site like http://www.getdropbox.com/ which gives you 2 GB of
> free storage online.  They also let you share folders.  You could  
> then send me a link to the shared folder.
>
>  Thanks,
>  Carson
>
>
>  On 10/22/09 5:38 AM, "Xavier Watkins" <xavier at flymine.org> wrote:
>
>
> Hi,
>  Thanks everyone for your help. I'm already running mpi_maker, planning
>  on using more CPUs next time...
>
>  I'm using the following:
>
>  RepeatMasker with cross_match (running cross_match seems to be the bit
>  that's taking ages)
>
>  snap version 2006-07-28
>  GeneMarkS
>  NCBI blastall 2.2.20 for blastx
>
>  Not really sure what happens in RepeatMasker but from what I
>  understand it tries to blast the genome against itself to build a
>  library of possible repeats? Is there a way of building this library
>  of repeats in a more efficient way?
>
>  Many thanks,
>  Xavier
>
>
>  On 21 Oct 2009, at 16:07, Mark Yandell wrote:
>
>  >
>  > Hi Xavier,
>  >
>  > I agree: This seems way too long. I can basically reproduce FlyBase's
>  > annotations and blast data in about 3 days on my laptop, so 2 weeks
>  > on 5 processors seems way too long.
>  >
>  > Is there some special, really huge dataset you are running? Are you
>  > doing TBLASTX to align hits from a large database of sequences?
>  >
>  > --mark
>  >
>  > Mark Yandell
>  > Associate Professor of Human Genetics
>  > Eccles Institute of Human Genetics
>  > University of Utah
>  > 15 North 2030 East, Room 2100
>  > Salt Lake City, UT 84112-5330
>  > ph:801-587-7707
>  > ________________________________________
>  > From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] On Behalf Of Carson Holt
>  > Sent: Wednesday, October 21, 2009 8:57 AM
>  > To: Xavier Watkins; maker-devel at yandell-lab.org
>  > Subject: Re: [maker-devel] Status check?
>  >
>  > The time spent depends primarily on the size of the protein, EST,
>  > and repeat protein databases provided.  BLAST actually makes up
>  > about 90% of the run time for MAKER.  If you're using 5 processors, I
>  > suggest using mpi_maker instead of regular maker.  It gets better
>  > performance on multiprocessor systems.  The number of slices
>  > depends on what you set max_dna_len to in the maker_opt.ctl file:
>  > just divide the contig length by that number.  Increasing
>  > max_dna_len increases memory usage.  Doing a test run on the entire
>  > Drosophila genome could take a while especially if you used large
>  > protein and EST databases for the analysis.  It is 120 Megabases in
>  > size, and with the default max_dna_len of 100,000,  it would be
>  > divided into 1,200 chunks.  It could take anywhere from 4 days to 3
>  > weeks depending on the BLAST databases used.
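
The slice arithmetic described above can be sketched in a few lines of
Python. This is only a rough sketch and not part of MAKER: the function
names are hypothetical, and rounding a final partial slice up to a whole
slice is an assumption (the message only says to divide the contig length
by max_dna_len).

```python
def count_chunks(contig_length: int, max_dna_len: int = 100_000) -> int:
    """Estimate how many slices one contig is split into.

    Hypothetical helper, not a MAKER function. Assumes a final
    partial slice still counts as one slice (ceiling division).
    """
    if contig_length <= 0 or max_dna_len <= 0:
        raise ValueError("lengths must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (contig_length + max_dna_len - 1) // max_dna_len


def total_chunks(contig_lengths, max_dna_len: int = 100_000) -> int:
    """Sum the per-contig estimates, since each contig is sliced separately."""
    return sum(count_chunks(n, max_dna_len) for n in contig_lengths)


# The 120 Mb genome from the thread, treated as one sequence,
# with the default max_dna_len of 100,000.
print(count_chunks(120_000_000))
```

For a genome split across several chromosomes, passing the individual
chromosome lengths to total_chunks gives a slightly higher estimate than
dividing the total length, because each chromosome can end in a partial
slice.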
>  >
>  > I guess Barry already answered the question on how to check on run
>  > status.  Individual contigs also create a file called run.log.
>  > These will be under theVoid directory for each individual contig in
>  > the MAKER datastore directory.  These files also contain entries
>  > with labels like STARTED and FINISHED for each individual analysis.
>  > The master_datastore_index.log file has status tags for entire
>  > contigs as opposed to individual analyses.
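
A quick way to summarise per-contig status from the index log might look
like the sketch below. The file layout is an assumption, not something
stated in this thread: the code assumes one tab-separated line per event,
with the contig name in the first column and a status tag such as STARTED
or FINISHED in the last. The function name and the sample rows are
hypothetical; check a real master_datastore_index.log before relying on it.

```python
from collections import Counter


def tally_contig_status(lines):
    """Return (latest status per contig, count of contigs per status).

    Assumed layout (not confirmed by the thread): tab-separated
    fields, contig name first, status tag last. Later lines for the
    same contig override earlier ones.
    """
    latest = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, status = fields[0], fields[-1]
        latest[contig] = status
    return latest, Counter(latest.values())


# Made-up rows, for illustration only.
sample = [
    "chr2L\tdatastore/chr2L\tSTARTED\n",
    "chr2L\tdatastore/chr2L\tFINISHED\n",
    "chr2R\tdatastore/chr2R\tSTARTED\n",
]
latest, counts = tally_contig_status(sample)
print(latest)
print(counts)
```

To run it against a real log, open the file and pass the file object as
lines, for example tally_contig_status(open(path)).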
>  >
>  > I hope that helps.  Let us know how it goes.
>  >
>  > Thanks,
>  > Carson
>  >
>  >
>  > On 10/21/09 3:21 AM, "Xavier Watkins" <xavier at flymine.org> wrote:
>  >
>  > Hi,
>  > I'm currently doing a test run of Maker on the D. mel genome and I
>  > would like to estimate the time it takes to run on our system (it
>  > has now been running for 2 weeks on 5 processors).
>  > Is there a way to know how many processes are left to run when
>  > running MAKER, or to know which contigs (chromosomes in my case)
>  > have finished running? From what I see it chops up the contigs into
>  > slices when running RepeatMasker (currently on .151). Is there a way
>  > to know the total number of slices?
>  >
>  > Apologies if I've missed this info in the documentation, I couldn't
>  > find it.
>  >
>  > All the best,
>  > Xavier
