Why do other chicken EST sets appear to have more known genes in them?

It has been brought to our attention that our collection of chicken ESTs has a much lower fraction of hits against the known databases than other collections. This certainly seems to be the case. For comparison, we have analysed a few sets of chicken ESTs in the public domain alongside our own.

EST or contig set                     Number of    BLAST hits vs Swiss-Trembl
                                      sequences    1E-03    1E-06  70 bits
Riken set 1                               7,410      57%      54%      49%
EMBL EST set 2                           23,026      74%      71%      67%
Assembled set 2                          10,068      67%      63%      60%
BBSRC ESTs                              330,388      50%      48%      45%
BBSRC assembly                           85,486      39%      37%      35%
BBSRC+Genbank contigs (All)              97,221      42%      39%      35%
BBSRC+Genbank contigs (BBSRC only)       73,023      35%      32%      28%
BBSRC+Genbank contigs (Genbank only)      8,637      43%      39%      34%
BBSRC+Genbank mixed contigs              15,561      75%      72%      70%

Notes: Riken set 1 corresponds to the ESTs published by the Buerstedde group, whilst EMBL EST set 2 includes the Riken set plus further ESTs deposited by the Buerstedde group that were not originally listed in the paper. This set was assembled using PHRAP to produce 10,068 contigs (Assembled set 2). The next two rows give the BBSRC project ESTs and the contigs assembled from them. Finally, the last four rows refer to contigs produced by assembling the BBSRC ESTs together with 60,000 chicken ESTs in Genbank, giving 97,221 contigs. Only 15,561 of these (the mixed contigs) contain ESTs from both the BBSRC and Genbank sets; the rest are built from BBSRC ESTs only or Genbank ESTs only.
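As a quick consistency check on the last four rows, the three contig classes (BBSRC only, Genbank only, mixed) should add up to the 97,221 contigs in the "All" row, and the "All" percentages should be roughly the size-weighted average of the three classes. The Python sketch below is only an illustration using the figures from the table; the names subsets, reported_all and combine are ours, and because the published percentages are rounded to whole numbers the recomputed values are approximate.

```python
# Contig counts and hit percentages copied from the table above.
# Each tuple: (number of contigs, % hits at 1E-03, % at 1E-06, % at 70 bits)
subsets = {
    "BBSRC only": (73023, 35, 32, 28),
    "Genbank only": (8637, 43, 39, 34),
    "mixed": (15561, 75, 72, 70),
}
reported_all = (97221, 42, 39, 35)  # the "(All)" row


def combine(subsets):
    """Return (total contigs, weighted % per cutoff) for the given classes."""
    total = sum(n for n, *_ in subsets.values())
    percentages = []
    for i in range(3):  # one pass per cutoff column
        # Estimated number of contigs with a hit in each class, summed.
        hits = sum(n * pcts[i] / 100 for n, *pcts in subsets.values())
        percentages.append(round(100 * hits / total))
    return (total, *percentages)


if __name__ == "__main__":
    print("recomputed:", combine(subsets))
    print("reported:  ", reported_all)
```

The recomputed row agrees with the reported one, which also makes clear that the mixed contigs, although only about 16% of the total, account for a disproportionate share of the database hits.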

As can be seen, the BBSRC set has a relatively low percentage of "known" genes in it compared to other sets. We believe this is for a number of reasons: