Developmental Transcriptomes of S. purpuratus

What's New

  • 8/2020: The RNA-seq data web application has been moved to http://legacy.echinobase.org/shiny/quantdev/. The data are also available on the new Echinobase platform.
  • 2/2014: Important notice: Some institutional networks block traffic on port 2000. To ensure accessibility, the RNA-seq data web application has been moved to port 3838: http://www.echinobase.org:3838/quantdev/ Please update your bookmarks. Sorry for the inconvenience.
  • 1/2014: The paper describing the quantitative developmental transcriptomes has been published in Developmental Biology.
  • 11/2013: A completely new query tool, together with a quantitative embryonic dataset, has been released.
  • 10/2012: The paper describing the gene structure has been published in Genome Research.
  • 6/2012: The quantitative expression data (FPKM) is now calculated from the CDS instead of the full-length sequence.
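
For readers unfamiliar with the unit, the sketch below shows the standard FPKM definition (fragments per kilobase of sequence per million mapped fragments) with the CDS length as the denominator. The function and parameter names are hypothetical, and the pipeline actually used to produce the published values may differ in detail.

```python
def fpkm(fragments, cds_length_bp, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase per million mapped fragments.

    fragments              -- fragments assigned to this gene model
    cds_length_bp          -- CDS length in base pairs (assumed normalizer)
    total_mapped_fragments -- all mapped fragments in the sample
    """
    if cds_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (cds_length_bp * total_mapped_fragments)
```

Because the CDS is usually shorter than the full-length transcript, dividing by the CDS length gives a different value than dividing by the transcript length for the same fragment count.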

Introduction

These RNA-seq data were generated from a comprehensive transcriptome survey of 22 samples, including 10 embryonic stages, 6 feeding larval and metamorphosed juvenile stages, and 6 adult tissues.

In total, 784 million 76 bp paired-end reads were obtained. The reads were mapped onto the S. purpuratus genome v3.0, and new gene models were constructed. These RNA-seq models are identified by the WHL22 prefix. Further details of the analysis are provided on this page.

The analysis was based on S. purpuratus genome v3.0, which differs from the current v3.1 in only a few places because contaminating microbial sequences were removed. The assembled transcript sequences remain the same, although the exon coordinates may differ.

For more detailed information, please see the Publications section.

Data

1. Sequence and Expression, Using the Query Tool

The Query Tool provides an interface to search the data by

  • Echinobase official gene name (e.g. Tgif),
  • Echinobase official gene ID (e.g. SPU_018126),
  • RNA-seq model ID (e.g. WHL22.614286),
  • Function class (2 levels),
  • Expression profile cluster ID (e.g. 020)

It returns a table of gene models found, with

  • A brief info table of names, IDs, function classes, expression clusters, links to Echinobase and IGV genome browser;
  • A table of quantitative data;
  • Expression dynamics in embryonic stages by line plots or heat maps;
  • mRNA sequences, predicted CDS and protein sequences;
  • Downloadable data table.

Note:

  • The Search box accepts multiple IDs or names. WHL and SPU IDs can be embedded in other text; the program recognizes them by their pattern. Gene names must be entered one per line.
  • The expression cluster IDs can be found in this plot. They are [three-digit numbers] in the plot titles. Use the complete IDs like '020', not just '20'.
  • To use the IGV link, make sure you have downloaded the IGV program and it is running on your computer. The link will locate the corresponding gene model in the genome context.
  • mRNA sequences are assembled directly from the RNA-seq data. CDS and protein sequences are predicted.
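
The notes on ID recognition and cluster IDs can be illustrated with a short sketch. The regular expressions below are assumptions inferred from the example IDs on this page (WHL22.614286 and SPU_018126); the patterns the Query Tool actually uses are not documented here, and the function names are hypothetical.

```python
import re

# Assumed ID shapes, inferred from the examples WHL22.614286 and SPU_018126.
ID_PATTERN = re.compile(r"WHL22\.\d+|SPU_\d{6}")


def extract_ids(text):
    """Return WHL and SPU IDs found anywhere in free text, in order."""
    return ID_PATTERN.findall(text)


def format_cluster_id(number):
    """Zero-pad a cluster number to the three-digit form, e.g. 20 -> '020'."""
    return f"{int(number):03d}"
```

For example, `extract_ids("Tgif is SPU_018126, model WHL22.614286")` picks out the two IDs and ignores the surrounding words, which is why IDs may be pasted with other text while gene names need one line each.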

2. Gene Structure and Raw Reads, Using IGV Genome Browser

IGV (Integrative Genomics Viewer), developed at the Broad Institute of MIT and Harvard, is a high-performance desktop genome browser for interactive exploration of large, integrated datasets. The RNA-seq models assembled in this study, the reads, and related tracks can be viewed in their genomic context through IGV, using the data server described below.

Quick Start:

  • Download IGV:
    • Go to the IGV home page, download the program with the option for the maximum memory compatible with your computer, and launch IGV. In practice, about 1 GB of memory is needed to load gene models, and about 1.5-2 GB or more to load reads.
  • Load the S. purpuratus genome:
    • In IGV, go to the genomes drop-down menu, select "S.purpuratus(3.0)", to load the scaffolds and GLEAN3 models.
  • Load datasets:
    • First, change the data server setting: select View -> Preferences -> Advanced, select "Edit server properties", and change the Data Registry URL to this (rnaseq/igv/igv-reg.txt). Don't change the Genome Server URL.
    • Then, select menu: File -> Load from server, select the datasets.
  • Navigation:
    • The locus of interest can be reached by selecting the drop-down menu of scaffolds. You can also type the coordinate, exact official gene name, or RNA-seq model ID in the search box.
    • The view can be panned and zoomed by the mouse or keyboard shortcuts.
    • The gene name is the "official" name used in Echinobase. Make sure the exact same text is typed in the search box.
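
Navigation can also be scripted. A running IGV instance accepts simple HTTP commands on a local control port (60151 by default, if the port is enabled in IGV's Advanced preferences). The sketch below only builds a "goto" URL for a locus; the helper name is hypothetical, and whether the Query Tool's IGV link uses this same mechanism is an assumption.

```python
from urllib.parse import urlencode

IGV_PORT = 60151  # IGV's default control port; assumed to be enabled


def igv_goto_url(locus, host="localhost", port=IGV_PORT):
    """Build a URL that asks a running IGV to jump to a locus.

    locus may be a coordinate range, an exact gene name, or an
    RNA-seq model ID, as accepted by the IGV search box.
    """
    query = urlencode({"locus": locus})
    return f"http://{host}:{port}/goto?{query}"
```

Opening the returned URL in a browser on the same computer (for instance `igv_goto_url("WHL22.614286")`) should have the same effect as typing the ID into the IGV search box, provided IGV is running with the genome and datasets loaded.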

Tips:

  • The feature panel (GLEAN gene model) can be merged with the data panel: View -> Preferences -> General, select "Combine Data and Feature Panels".
  • The order of tracks can be set by drag-and-drop.
  • The track colors and other options can be accessed by a right-click.
  • Expand a track to see overlapping features.
  • All these settings can be saved as a session. Here is a session file to start with.

3. Alternative Data Accesses

The assembled transcriptome sequences have been submitted to the NCBI Transcriptome Shotgun Assembly Sequence Database under accession numbers JT094275 - JT123346. The complete set can also be retrieved through the NCBI BioProject Database under accession number PRJNA81157.
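
When scripting a retrieval, it can help to expand the accession range into individual accession numbers. The sketch below does only that; it assumes every number between the two endpoints is assigned to this submission, which is not confirmed here, and the function name is hypothetical.

```python
def expand_accession_range(first, last):
    """List every accession from first to last, inclusive.

    Assumes both accessions share the same alphabetic prefix and a
    fixed-width numeric part (e.g. JT094275 .. JT123346).
    """
    prefix = first.rstrip("0123456789")
    if not last.startswith(prefix) or len(first) != len(last):
        raise ValueError("accessions must share a prefix and length")
    width = len(first) - len(prefix)
    start, end = int(first[len(prefix):]), int(last[len(prefix):])
    if start > end:
        raise ValueError("first accession must not exceed last")
    # Re-apply zero padding so that 94275 becomes 'JT094275'.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]
```

Under the contiguity assumption, `expand_accession_range("JT094275", "JT123346")` yields 29,072 accessions. For the complete set, the BioProject accession is the simpler route.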

The sequences and other data are being integrated into Echinobase on individual gene pages. A BLAST service to search the assembled mRNA sequences is also provided.

Various files:

Publications

  • Tu Q, Cameron RA, Worley KC, Gibbs RA, Davidson EH. 2012. Gene structure in the sea urchin Strongylocentrotus purpuratus based on transcriptome analysis. Genome Res 22: 2079-2087. doi:10.1101/gr.139170.112
  • Tu Q, Cameron RA, Davidson EH. 2014. Quantitative developmental transcriptomes of the sea urchin Strongylocentrotus purpuratus. Dev Biol 385: 160-167. doi:10.1016/j.ydbio.2013.11.019

Contact

Qiang Tu (qtu at caltech.edu)