[maker-devel] Status check?
Xavier Watkins
xavier at flymine.org
Thu Oct 22 10:31:53 MDT 2009
Thanks Mike, I'll give it a go, that looks like what I'm looking for!
On 22 Oct 2009, at 17:28, Reith, Michael wrote:
> Hi Xavier,
>
> Just a quick comment from a novice user - in maker_opts.ctl, repeat
> masking is set by default to "all" (first line under Repeat Masking -
> model_org:all). It took me a while to figure out that if you have even
> just 20 or 30 Mb of sequence, cross_match takes forever to run it
> through all of Repbase. Try changing the model_org to "Drosophila" or
> "diptera". That should help get through the repeat masking more
> quickly.
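
A minimal Python sketch of the edit Mike describes, for anyone who wants to script it. It assumes the control file holds a line of the form `model_org:all` (as quoted above), optionally followed by a `#` comment; the function name `set_model_org` is hypothetical and not part of MAKER.

```python
import re


def set_model_org(ctl_text: str, organism: str) -> str:
    """Return ctl_text with the value on the model_org line replaced.

    Assumes a 'model_org:value' (or 'model_org=value') line; anything
    after the value, such as a trailing '#' comment, is left untouched.
    """
    pattern = re.compile(r"^(model_org[ \t]*[:=][ \t]*)[^\s#]*", re.MULTILINE)
    new_text, n_changed = pattern.subn(lambda m: m.group(1) + organism, ctl_text)
    if n_changed == 0:
        raise ValueError("no model_org line found in control file text")
    return new_text


# Hypothetical control-file fragment, for illustration only.
example = "model_org:all #repeat library to use\nother_option:1\n"
print(set_model_org(example, "drosophila"))
```

To apply it to a real file, read maker_opts.ctl into a string, pass it through the function, and write the result back (after keeping a copy of the original).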
>
> Mike
>
> -----Original Message-----
> From: maker-devel-bounces at yandell-lab.org
> [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of Xavier
> Watkins
> Sent: October 22, 2009 8:38 AM
> To: Mark Yandell
> Cc: maker-devel at yandell-lab.org; Carson Holt
> Subject: Re: [maker-devel] Status check?
>
> Hi,
> Thanks everyone for your help. I'm already running mpi_maker, planning
> on using more CPUs next time...
>
> I'm using the following:
>
> RepeatMasker with cross_match (running cross match seems to be the bit
> that's taking ages)
>
> snap version 2006-07-28
> GeneMarkS
> NCBI blastall 2.2.20 for blastx
>
> Not really sure what happens in RepeatMasker but from what I
> understand it tries to blast the genome against itself to build a
> library of possible repeats? Is there a way of building this library
> of repeats in a more efficient way?
>
> Many thanks,
> Xavier
>
>
> On 21 Oct 2009, at 16:07, Mark Yandell wrote:
>
>>
>> Hi Xavier,
>>
>> I agree: This seems way too long. I can basically reproduce FlyBase's
>> annotations and blast data in about 3 days on my laptop, so 2 weeks
>> on 5 processors seems way too long.
>>
>> Is there some special, really huge dataset you are running? Are you
>> doing TBLASTX to align hits from a large database of sequences?
>>
>> --mark
>>
>> Mark Yandell
>> Associate Professor of Human Genetics
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ph:801-587-7707
>> ________________________________________
>> From: maker-devel-bounces at yandell-lab.org
>> [maker-devel-bounces at yandell-lab.org] On Behalf Of Carson Holt
>> Sent: Wednesday, October 21, 2009 8:57 AM
>> To: Xavier Watkins; maker-devel at yandell-lab.org
>> Subject: Re: [maker-devel] Status check?
>>
>> The time spent depends primarily on the size of the protein, EST,
>> and repeat protein databases provided. BLAST actually makes up
>> about 90% of the run time for MAKER. If you're using 5 processors, I
>> suggest using mpi_maker instead of regular maker. It gets better
>> performance on multiprocessor systems. The number of slices is
>> dependent on what you set max_dna_len to be in the maker_opts.ctl
>> file. Increasing the number increases memory usage. To get the
>> number of slices, just divide the contig length by that number.
>> Doing a test run on the entire Drosophila genome could take a while,
>> especially if you used large protein and EST databases for the
>> analysis. It is 120 Megabases in
>> size, and with the default max_dna_len of 100,000, it would be
>> divided into 1,200 chunks. It could take anywhere from 4 days to 3
>> weeks depending on the BLAST databases used.
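
The slice arithmetic above can be sketched in a few lines of Python. This assumes plain ceiling division of contig length by max_dna_len, with no overlap between chunks; MAKER's actual splitting may differ in detail, and the function name `count_chunks` is hypothetical.

```python
def count_chunks(contig_length: int, max_dna_len: int = 100_000) -> int:
    """Estimate how many slices a contig is split into.

    Uses integer ceiling division, so a contig that is not an exact
    multiple of max_dna_len still gets one final, shorter slice.
    """
    if contig_length <= 0 or max_dna_len <= 0:
        raise ValueError("lengths must be positive")
    return -(-contig_length // max_dna_len)


# The 120 Mb genome from the message, treated as a single sequence.
genome_chunks = count_chunks(120_000_000)
print(genome_chunks)
```

For a multi-chromosome genome, summing `count_chunks` over each chromosome length gives a slightly higher total than dividing the whole genome at once, since each chromosome can end in a partial slice.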
>>
>> I guess Barry already answered the question on how to check on run
>> status. Individual contigs also create a file called run.log.
>> These will be under theVoid directory for each individual contig in
>> the MAKER datastore directory. These files also contain entries
>> with labels like STARTED and FINISHED for each individual analysis.
>> The master_datastore_index.log file has status tags for entire
>> contigs as opposed to individual analyses.
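
A rough Python sketch of how the status tags mentioned above could be tallied. The file layout is an assumption, not something confirmed in this thread: each line is taken to be tab-separated, with the contig name first and a status tag such as STARTED or FINISHED last. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def latest_status(lines: Iterable[str]) -> Dict[str, str]:
    """Map each contig name to the last status tag logged for it.

    Assumes tab-separated lines: first field is the contig name, last
    field is the status tag. Later lines override earlier ones.
    """
    latest: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        latest[fields[0]] = fields[-1].strip()
    return latest


def count_by_status(lines: Iterable[str]) -> Counter:
    """Count contigs by their most recent status tag."""
    return Counter(latest_status(lines).values())


# Hypothetical sample lines; real log contents may look different.
sample = [
    "chr2L\tsome/path/chr2L\tSTARTED\n",
    "chr2L\tsome/path/chr2L\tFINISHED\n",
    "chr3R\tsome/path/chr3R\tSTARTED\n",
]
print(count_by_status(sample))
```

Since an open file is an iterable of lines, the same call works on a file handle for master_datastore_index.log, provided the layout matches the assumption above.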
>>
>> I hope that helps. Let us know how it goes.
>>
>> Thanks,
>> Carson
>>
>>
>> On 10/21/09 3:21 AM, "Xavier Watkins" <xavier at flymine.org> wrote:
>>
>> Hi,
>> I'm currently doing a test run of Maker on the D. mel genome and I
>> would like to estimate the time it takes to run on our system (it
>> has now been running for 2 weeks on 5 processors).
>> Is there a way to know how many processes are left to run when
>> running MAKER, or to know which contigs (chromosomes in my case)
>> have finished running? From what I see, it chops up the contigs into
>> slices when running RepeatMasker (currently on .151). Is there a way
>> to know the total number of slices?
>>
>> Apologies if I've missed this info in the documentation, I couldn't
>> find it.
>>
>> All the best,
>> Xavier
>>
>>
>>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>