[maker-devel] Maker: some suggestions and comments
Ferenc Kiss
fkiss at hygiene.uni-wuerzburg.de
Sat Oct 10 11:33:13 MDT 2009
Hello MAKER people,
I'm not sure whether this is the right address. First of all, I want to pay
you a compliment for your project. I like it very much and it offers me great
possibilities!
About me: My name is Ferenc Kiss and I'm a PhD student at the University
of Wuerzburg in Germany. My topics are the genetic manipulation of the fox
tapeworm and its genome annotation.
I want to perform a BLASTX run with the contig sequences against the nr
database.
I have some suggestions and comments:
To compare the time consumption of NCBI BLAST and WU-BLAST, I wrote a Perl
script that performed several BLAST runs, measured the time, and reported
the number of sequences found. I used the hard-coded options MAKER uses and
varied them to get an idea of what they are doing:
command
    n seqs    time

formatdb -i $db -p T
    -         07:19

blastall -p blastx -i $file -d $db -a 4 -e 0.0000001
    85        13:10

blastall -p blastx -i $file -d $db -a 4 -b 10000 -v 10000 -e 0.0000001 -z 300 -Y 500000000 -K 100 -U -F T -I
    1255      13:09

blastall -p blastx -i $file -d $db -a 4 -b 10000 -v 10000 -e 0.0000001 -K 100 -U -F T -I
    85        12:54

blastall -p blastx -i $file -d $db -a 4 -b 10000 -v 10000 -e 0.0000001 -U -F T -I
    85        12:54

blastall -p blastx -i $file -d $db -a 4 -b 10000 -v 10000 -e 0.0000001 -z 300 -K 100 -U -F T -I
    2576      13:26

blastall -p blastx -i $file -d $db -a 4 -b 10000 -v 10000 -e 0.0000001 -Y 500000000 -K 100 -U -F T -I
    1235      12:58

xdformat -k -p $db
    -         02:32

blastx $db $file cpus=4 E=0.0000001
    73        01:13:57

blastx $db $file B=10000 V=10000 E=0.0000001 wordmask=seg Z=300 Y=500000000 hspmax=100 gspmax=100 lcmask kap gi cpus=4
    40        01:08:44

blastx $db $file B=10000 V=10000 E=0.0000001 wordmask=seg hspmax=100 gspmax=100 lcmask kap gi cpus=4
    2         01:08:33

blastx $db $file B=10000 V=10000 E=0.0000001 wordmask=seg lcmask kap gi cpus=4
    2         01:08:43

I was using a query sequence of approx. 24 kb and nr as database.
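
For anyone who wants to repeat such a comparison, here is a minimal sketch of how a timing script of this kind could look. It is not the script I actually used. The names time_command and format_time are made up for this example, and it assumes that the number of found sequences can be counted as the lines starting with ">" in the default pairwise report, which may not hold for other output formats.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run one shell command and read its standard output through a pipe.
# Returns (hits, elapsed seconds, exit status).
# Assumption: each reported database sequence starts a line with ">".
sub time_command {
    my ($command) = @_;
    my $start = time();
    open(my $out, '-|', $command) or die "Cannot run '$command': $!\n";
    my $hits = 0;
    while (my $line = <$out>) {
        $hits++ if $line =~ /^>/;
    }
    close($out);                    # sets $? to the wait status of the command
    my $status  = $? >> 8;
    my $elapsed = time() - $start;
    return ($hits, $elapsed, $status);
}

# Format a number of seconds as hh:mm:ss.
sub format_time {
    my ($seconds) = @_;
    my $s = int($seconds);
    return sprintf("%02d:%02d:%02d", int($s / 3600), int(($s % 3600) / 60), $s % 60);
}
```

Each command line from the table can then be passed as a string to time_command, and the returned hit count and format_time($elapsed) printed side by side.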
It could be that the WU-BLAST suite I have is not optimal for my system,
but it's the only one I have. Judging from the results, WU-BLAST is much
slower than NCBI BLAST, which is why I'm using NCBI's BLAST suite. I don't
understand why MAKER uses the -z, -Y and -K flags rather than the defaults.
When running the masking steps, I found some errors concerning the E-value
given in the control files. I changed the code in the GI.pm subroutine
blastx_as_chunks to
my $bit_blastx = ($rflag) ? $CTL_OPT->{bit_rm_blastx} : $CTL_OPT->{bit_blastx};
my $eval_blastx = ($rflag) ? $CTL_OPT->{eval_rm_blastx} : $CTL_OPT->{eval_blastx};
my $pcov_blastx = ($rflag) ? $CTL_OPT->{pcov_rm_blastx} : $CTL_OPT->{pcov_blastx};
my $pid_blastx = ($rflag) ? $CTL_OPT->{pid_rm_blastx} : $CTL_OPT->{pid_blastx};
and it worked as expected.
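
The same selection can also be written as a loop over the four option names. This is only a sketch: blastx_thresholds is a made-up helper name and not part of MAKER, and it assumes that the control options are stored in a plain hash reference with the key names shown above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MAKER: pick the repeat-masking
# thresholds (*_rm_blastx) when $rflag is true, otherwise the
# ordinary BLASTX thresholds (*_blastx).
sub blastx_thresholds {
    my ($ctl_opt, $rflag) = @_;
    my $suffix = $rflag ? 'rm_blastx' : 'blastx';
    my %threshold;
    for my $name (qw(bit eval pcov pid)) {
        $threshold{$name} = $ctl_opt->{"${name}_$suffix"};
    }
    return \%threshold;
}
```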
When I masked the sequences, I used my own nucleotide repeat library, whose
repeats sometimes cover whole contigs. After runlevel 1, many sequences were
masked almost completely by RepeatMasker. Runlevel 2 failed in these cases
because blastall couldn't calculate the parameters. I circumvented this
problem by replacing the line $command .= " -o $outfile"; in the subroutine
runBLASTX in GI.pm with $command .= " 1> $out_file 2>'xy'";. This only
worked after I deleted the error-fetching lines in blastx.pm. There may be
a more elegant way to handle this.
After specifying my tmp_dir in the control files, it didn't work the way I
wanted. I therefore altered GI.pm once again:
$TMP = tempdir("maker_XXXXXX", CLEANUP => 1, TMPDIR => 1, DIR => 'path');
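
Instead of a hard-coded path, the directory could be taken from a variable and the system default used as a fallback. This is a sketch using tempdir from File::Temp. The name make_tmp_dir is made up, and how the tmp_dir value reaches this point inside MAKER is an assumption I have not checked.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a temporary directory named maker_XXXXXX below $base_dir.
# If $base_dir is undefined or is not an existing directory, fall back
# to the system temporary directory (TMPDIR => 1).
sub make_tmp_dir {
    my ($base_dir) = @_;
    my %options = (CLEANUP => 1);          # directory is removed at program exit
    if (defined $base_dir && -d $base_dir) {
        $options{DIR} = $base_dir;
    }
    else {
        $options{TMPDIR} = 1;
    }
    return tempdir("maker_XXXXXX", %options);
}
```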
To get the STDOUT/STDERR output of MAKER into a file, I added the lines
open(STDERR,'>Maker.txt') or die "Maker.txt cannot be created: $!\n";
open(STDOUT,'>Maker.txt') or die "Maker.txt cannot be created: $!\n";
directly to /bin/maker. This helps me with debugging.
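
One caveat: opening the same file twice with '>' gives two independent file handles, so the output of one stream can overwrite the output of the other. A variant that avoids this is to open the file once for STDOUT and then duplicate that handle for STDERR. A minimal sketch, where redirect_output_to is a made-up name:

```perl
use strict;
use warnings;
use IO::Handle;

# Send both STDOUT and STDERR of the running program to one file.
# STDERR becomes a duplicate of STDOUT, so both share one file position.
sub redirect_output_to {
    my ($path) = @_;
    open(STDOUT, '>', $path)     or die "$path cannot be created: $!\n";
    open(STDERR, '>&', \*STDOUT) or die "STDERR cannot be redirected: $!\n";
    STDOUT->autoflush(1);        # keep messages in the order they were printed
    STDERR->autoflush(1);
    return;
}
```

Calling redirect_output_to('Maker.txt') near the top of the script would replace the two open lines.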
I also tried to alter the code to perform BLAST queries using the NCBI BLAST
server. This would be a great alternative to standalone BLAST, but for lack
of time I haven't pursued this any further. I also had to cheat a little bit
to get the nr database to work.
Since I'm self-taught and have no contact with anyone involved in
bioinformatics, it would be great to stay in contact with you. My points are
not meant to annoy you. This is my first time participating in an open
project.
I wish you a nice weekend,
Ferenc Kiss