Sp Other Assemblies

Table of Contents

  1. Evidential gene (evigene)
  2. Common set of Trinity Transcripts

Evidential gene (evigene)

Summary

The method described on the Lv Other Assemblies page was applied to these datasets:

SRR531952|embryo 0h
SRR531949|embryo 10h
SRR531860|embryo 18h
SRR531853|embryo 24h
SRR531948|embryo 24h
SRR532074|embryo 30h
SRR531956|embryo 40h
SRR531964|embryo 48h
SRR531954|embryo 56h
SRR531996|embryo 64h
SRR531950|embryo 72h
SRR532151|larva four-arm stage
SRR532055|larva vestibular invagination stage
SRR532143|larva pentagonal disc stage
SRR533746|larva tube-foot protrusion stage
SRR531957|post-metamorphosis
SRR531953|adult tissue coelomocyte
SRR531955|adult tissue gut
SRR532046|adult tissue radial nerve
SRR532121|adult tissue testes
SRR531958|adult tissue ovary

The resulting Sp_evigene_mRNA and Sp_evigene_pep databases contain 80545 entries. The BUSCO
scores for the peptides (C: complete, S: single-copy, D: duplicated, F: fragmented, M: missing) are:

C:     S:     D:    F:    M:
97.3%  92.3%  5.0%  0.3%  2.4%
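
As a quick sanity check on these figures, the BUSCO categories are related: complete is the sum of single-copy and duplicated, and complete, fragmented and missing together account for all BUSCOs. The following is a minimal sketch of that check in Python. The function name and the rounding tolerance are illustrative assumptions and are not part of BUSCO or of the pipeline used here.

```python
# BUSCO percentages copied from the peptide scores above.
scores = {"C": 97.3, "S": 92.3, "D": 5.0, "F": 0.3, "M": 2.4}

def busco_consistent(s, tol=0.11):
    """Check C = S + D and C + F + M = 100, within a rounding tolerance.

    The tolerance (an assumption) allows for each value having been
    rounded to one decimal place.
    """
    complete_ok = abs(s["C"] - (s["S"] + s["D"])) <= tol
    total_ok = abs(s["C"] + s["F"] + s["M"] - 100.0) <= tol
    return complete_ok and total_ok

print(busco_consistent(scores))  # True
```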

Common set of Trinity Transcripts

Summary

The 'best-transcript-set'-from-many-libraries method was used, with Trinity 2.8.3. The resulting Sp_common_mRNA and Sp_common_pep databases contain the best common representation of each gene, as determined from Trinity 2.8.3 runs over many RNA-seq libraries. There are 29742 entries in each file, or very roughly one representative per gene. In brief, the method attempts to find the most commonly used transcript ends and to group together splicing variants of the same gene, so that a single representative can be selected from each group. This produces about 2.7X fewer sequences than Evigene (29742 versus 80545), mostly by removing near duplicates.
The transcriptome has NCBI accession number GHFM01000000. The cutoff parameter of 60% was chosen based on this table of BUSCO (peptide) scores:

Cutoff  C:     S:     D:    F:    M:
100     98.5%  92.3%  6.2%  0.2%  1.3%
90      98.6%  93.7%  4.9%  0.2%  1.2%
80      98.6%  93.8%  4.8%  0.2%  1.2%
60      98.6%  93.8%  4.8%  0.2%  1.2%
50      98.3%  93.5%  4.8%  0.2%  1.5%
40      97.6%  93.1%  4.5%  0.2%  2.2%