[maker-devel] Maker with Augustus

Carson Holt carson.holt at genetics.utah.edu
Wed Oct 21 18:00:11 MDT 2009


Feeding GFF3 is fine.  In fact, MAKER will run faster if it doesn't have to spend time building the alignments itself.  If you provide everything via GFF3, you decide what evidence to include.  BLASTX alignments are usually rougher than the protein2genome alignments, because the alignments produced by exonerate align correctly around splice sites and keep the exons in order.  You can decide whether to supply both or not.  I find I get better sensitivity by using both.  You can also use other programs like BLAT to align proteins (if you prefer it to BLAST) and provide its output as GFF3.

You can try playing with the parameters in the maker_bopts.ctl file to see if lower thresholds work better for you and allow more sequences to align.  You may see specificity go down and sensitivity go up for gene predictions, or you might not see much change at all.  It really can be a balancing act.  The defaults seem to work well on most sequences, but it's worth altering them if you think you are not getting as many alignments as you should.

Also note that some alignments might be filtered out not because they are poor, but because they are redundant.  MAKER will keep only the 10 best alignments overlapping a given region if they are heavily redundant (i.e. the same exon-intron structure again and again).  So if you have a highly conserved gene and you're aligning copies from 20 organisms, they will all align the same way, and there is no need to keep them to such a high depth because they all provide identical information.  I doubt this is what is happening in your case, though.  I suggest you just try lowering the alignment thresholds in the maker_bopts.ctl file.
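
To make the redundancy filtering described above concrete, here is a minimal Python sketch. It is not MAKER's actual code: it assumes that "heavily redundant" means an identical list of exon coordinates, and the dictionary keys (name, score, exons) and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def exon_structure(exons):
    """Return a hashable key describing an alignment's exon coordinates."""
    return tuple(sorted(exons))


def filter_redundant(alignments, max_depth=10):
    """Keep at most max_depth top-scoring alignments per exon structure.

    alignments: list of dicts with the hypothetical keys
    'name', 'score' and 'exons' (a list of (start, end) tuples).
    """
    groups = defaultdict(list)
    for aln in alignments:
        groups[exon_structure(aln["exons"])].append(aln)

    kept = []
    for group in groups.values():
        # Best-scoring alignments first, then truncate to the depth limit.
        group.sort(key=lambda a: a["score"], reverse=True)
        kept.extend(group[:max_depth])
    return kept


# Illustrative data: 20 proteins aligning with the same structure,
# plus one alignment with a different structure.
conserved = [
    {"name": f"protein_{i}", "score": i, "exons": [(100, 200), (300, 400)]}
    for i in range(1, 21)
]
other = [{"name": "protein_x", "score": 5, "exons": [(100, 250)]}]

kept = filter_redundant(conserved + other)
print(len(kept), "alignments kept out of", len(conserved) + len(other))
```

In this simplified picture, proteins dropped by the depth limit are not poor hits; they simply add no new information about the gene structure.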

MAKER does use the Target tag in the GFF3 file, and having that information lets MAKER provide downstream compatibility with programs like Apollo.  Off the top of my head, I can't remember whether MAKER dies if it is missing.  I think MAKER just assumes 100% alignment if the Target tag is missing and adds it to the final output file to maintain GFF3 compatibility with Apollo.  I may be wrong, though.
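
If you want to check externally generated alignments before passing them in, a small Python sketch like the following lists the features that lack a Target attribute. It relies only on the general GFF3 convention (nine tab-separated columns, with Target written as "target_id start end [strand]" in column 9); the function names and the example lines are hypothetical, and it says nothing about how MAKER itself handles a missing tag.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a key -> value dictionary."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def parse_target(column9):
    """Return (target_id, start, end, strand) or None if Target is absent."""
    target = parse_attributes(column9).get("Target")
    if target is None:
        return None
    parts = target.split()
    if len(parts) < 3:
        return None  # malformed Target value
    strand = parts[3] if len(parts) > 3 else None
    return parts[0], int(parts[1]), int(parts[2]), strand


def features_missing_target(gff_lines):
    """Return the IDs of feature lines that have no usable Target attribute."""
    missing = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        if parse_target(cols[8]) is None:
            missing.append(parse_attributes(cols[8]).get("ID", "<no ID>"))
    return missing


# Illustrative feature lines (made-up coordinates and identifiers).
example = [
    "##gff-version 3",
    "chr1\tblastx\tprotein_match\t100\t400\t50\t+\t.\tID=hit1;Target=P12345 1 100 +",
    "chr1\tblat\tprotein_match\t900\t1200\t42\t-\t.\tID=hit2;Name=Q99999",
]
print(features_missing_target(example))
```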

I hope that answers your questions.  Let me know how everything works for you.

Thanks,
Carson


On 10/21/09 11:20 AM, "david.swarbreck" <dswarbreck at stanford.edu> wrote:

Thanks Carson, and congratulations on the new arrival. I plan to generate
the evidence alignments (using Maker and other tools) prior to running
Maker for the final gene build. Is there any disadvantage at all in
feeding in GFF rather than FASTA sequences? Is it necessary to provide
the BLASTX hits as well as the polished protein2genome alignments?

Also, does Maker utilize the target tag information, i.e. the transcript
relative positions? I notice the Maker protein/nucleotide GFFs include
this, but I'm wondering if I need to add this info for alignments
generated outside of Maker. One final question regarding the
stringencies for the BLAST runs: how dangerous is it to alter the default
parameters, i.e. is loosening these highly likely to be detrimental to
the gene build? From 146,022 protein sequences in GenBank (same phylum
as the genomic sequence) only 17,386 were aligned under the default
parameters in Maker.

Regards
David

Carson Holt wrote:
> Sorry for the slower than average response. My wife went into labor
> over the weekend.
>
> For the initial training of augustus, you can either select an
> existing organism or do the training yourself. Once augustus is
> trained, MAKER will produce both the ab initio gene predictions
> (without hints) and hint based predictions for you. You don't have to
> worry about building the hints file. MAKER does that for you based on
> alignments produced from the EST and protein evidence you provided in
> the maker_opts.ctl file. Just make sure you set the
> AUGUSTUS_CONFIG_PATH environment variable in your profile per the
> Augustus install instructions.
>
> I hope that answers your question. If not let me know.
>
> Thanks,
> Carson
>
>
> On 10/15/09 12:29 PM, "david.swarbreck" <dswarbreck at stanford.edu> wrote:
>
>     Hi,
>
>     I have a question relating to using Augustus as the predictor with
>     Maker. From looking at the documentation for Augustus, the hints files
>     detailing aligned features use a non-standard GFF format. These
>     include the use of non-SO-compliant features such as exonpart (treated
>     differently to exon by Augustus) and the inclusion of a source
>     identifier in col9, which has to be specified in the Augustus config.
>     In addition, in the standalone Augustus setting a bonus and penalty are
>     required for obeying a hint feature.
>
>     So my question is: when using Augustus as the predictor in Maker, is
>     Maker taking care of this for me, or do I still need to define
>     exonparts as distinct from exon and provide bonus and penalty
>     information? Is there a particular GFF3 format which Maker requires to
>     define, for example, alignments, e.g. can these use the match type
>     (e.g. EST_match), can this use the CIGAR format, or should these
>     features be described as exons?
>
>     Any help appreciated
>     david

--
************************************
David Swarbreck
The Arabidopsis Information Resource
Carnegie Institution of Washington
Department of Plant Biology
260 Panama St.
Stanford, CA 94305
U.S.A

E-mail dswarbreck at stanford.edu
www.arabidopsis.org/about/staff.jsp
************************************

