[maker-devel] question about gff "source" column
Joseph Fass
joseph.fass at gmail.com
Wed Jun 3 12:21:21 MDT 2009
I've run maker on my sequence (a set of BACs) with both te_proteins and a
nucleotide repeat library listed and repeat masking turned on, and with '-p
snap,augustus,est2genome' ... and a set of unigenes from the species listed
in maker_opts...
Augustus didn't produce anything (but that's not my question), snap did, and
the EST mapping seemed to work. My problem is that the gff output file
includes "match" and "match_part" features with the following different
categories of source listed:
blastn
blastx:repeatmask
est2genome
maker
repeatmasker
snap
snap_masked
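
A listing like the one above can be reproduced by tallying column 2 (source)
against column 3 (type) of the gff file. What follows is only a minimal
sketch, assuming a standard tab-separated, nine-column GFF3 file; the function
name tally_sources and the command-line usage are illustrative and not part
of maker itself.

```python
from collections import Counter
import sys


def tally_sources(gff_lines):
    """Count (source, type) pairs in an iterable of GFF3 lines.

    Assumes tab-separated GFF3: column 2 is the source, column 3 the type.
    """
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        # An embedded FASTA section, if present, ends the annotation lines.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore anything that is not a full nine-column feature line.
        if len(fields) < 9:
            continue
        counts[(fields[1], fields[2])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tally_sources.py my_bacs.gff
    with open(sys.argv[1]) as handle:
        for (source, ftype), n in sorted(tally_sources(handle).items()):
            print(f"{source}\t{ftype}\t{n}")
```

Run against the gff output, this prints one line per source/type combination,
which makes it easy to see which sources emit "match" and "match_part"
features and which emit other feature types.
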
I'm guessing that "blastn" indicates repeats detected from the nt repeat
file, "blastx:repeatmask" indicates repeats detected from the te_proteins
file, "est2genome" indicates hit locations from the unigenes, "repeatmasker"
indicates ... not sure, "snap" indicates snap predicted genes, and
"snap_masked" indicates predicted genes that overlap repeat regions?
But I'd like confirmation of these guesses. And, more importantly, is the
"maker" category the summarized, best predictions from overlapping snap
predictions and unigene hits, that don't overlap any transposable elements
or repeats?? Is that the category of features that maker produces by
combining prediction and blast hits?
Any comment would be appreciated ...
Thanks,
~Joe
--
Joseph Fass
Bioinformatics Programmer
UC Davis Bioinformatics Core
joseph.fass -at- gmail.com (professional)
970.227.5928 (c) || 530.752.2698 (w)