XENLA GeneModel2012
Data summary
Disclaimer
Taira201203_XENLA_tissue
Total RNA was collected from 14 tissues of Xenopus laevis J strain.
- Brain, eye, heart, intestine, kidney, liver, lung, muscle, ovary, pancreas, skin, spleen, stomach, testis
- Sons & daughters of a single pair of frogs (their mother was used for the 1st BAC-end sequencing)
- Standard Illumina sample prep. (poly-A capture)
- Illumina HiSeq 2000, 2x100 bp
- 108.5 billion nucleotide calls in total.
- 55M ~ 130M reads/tissue (27M ~ 65M pairs)
- Brief report for data processing
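
As a rough consistency check on the figures above, the total nucleotide count can be converted into an average read depth per tissue. This is only a sketch: it assumes every read contributes the full 100 nt (no trimming or filtering), which the summary does not state.

```python
# Back-of-the-envelope check of the tissue sequencing depth.
TOTAL_NUCLEOTIDES = 108.5e9  # nucleotide calls in total (from the summary)
READ_LENGTH = 100            # 2x100 bp; ASSUMPTION: no reads were trimmed
N_TISSUES = 14

total_reads = TOTAL_NUCLEOTIDES / READ_LENGTH
mean_reads_per_tissue = total_reads / N_TISSUES
mean_pairs_per_tissue = mean_reads_per_tissue / 2

print(f"total reads:            {total_reads / 1e6:.0f}M")
print(f"mean reads per tissue:  {mean_reads_per_tissue / 1e6:.1f}M")
print(f"mean pairs per tissue:  {mean_pairs_per_tissue / 1e6:.2f}M")
```

Under that assumption the average is about 77.5M reads per tissue, which falls inside the reported 55M ~ 130M range.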
Taira201203_XENLA_stage
Total RNA was collected from 11 different developmental stages of Xenopus laevis J strain embryos.
- Stage 01, 08, 09, 10.5, 12, 15, 20, 25, 30, 35, 40
- Sons & daughters of a single pair of frogs (their mother was used for the 1st BAC-end sequencing)
- Standard Illumina sample prep. (poly-A capture)
- Illumina HiSeq 2000, 2x100 bp
- 163.8 billion nucleotide calls in total.
- 40M ~ 110M reads/stage (20M ~ 55M pairs)
- Brief report for data processing
Assembled transcripts
Raw sequences
- From tissue samples: Protein FASTA & cDNA FASTA (112,045 in total)
- From stage samples: Protein FASTA & cDNA FASTA (78,546 in total)
Orthologous sequences
- Take all orthologous candidate genes from the BLASTP results (up to the top 3 hits; see [#] for details).
- Going through the species in the order 'XENLA'->'HUMAN'->'XENTR'->'MOUSE'->'DANRE'->'CHICK'->'CAEEL'->'DROME', report the assembled transcript ID when the following conditions are met:
  - The assembled transcript has orthologous candidates in the given species, both as target (database in the BLAST search) and as query.
  - There is at least one overlap between the query list and the target list. In other words, the same gene in the other organism should be identified as one of the top 3 hits in the bi-directional BLAST search.
  - If more than one gene overlaps, report all of them.
- If an assembled transcript has a candidate orthologous gene in one species, stop searching for orthologs and move on to the next assembled transcript. For example, if a transcript has an orthologous gene satisfying these criteria in HUMAN, species later in the order (MOUSE, DANRE, CHICK, etc.) are not searched. The main reason is to remove the redundancy of genes that are highly conserved across all species.
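
The selection rules above can be sketched in a few lines of Python. This is an illustration, not the pipeline that produced the files: the function name `pick_orthologs`, the dictionary layout of the BLASTP results, and the example IDs (`tx1`, `geneA`, ...) are all hypothetical.

```python
# Species are visited in the priority order given in the text.
SPECIES_ORDER = ["XENLA", "HUMAN", "XENTR", "MOUSE",
                 "DANRE", "CHICK", "CAEEL", "DROME"]


def pick_orthologs(transcript_id, forward_hits, reverse_hits):
    """Return (species, genes) for the first species with a reciprocal hit.

    forward_hits: {species: {transcript_id: [gene, ...]}}
        top hits (at most 3) with the transcript as query.
    reverse_hits: {species: {gene: [transcript_id, ...]}}
        top hits (at most 3) with the transcript set as target.
    Both layouts are ASSUMED for this sketch.
    """
    for species in SPECIES_ORDER:
        as_query = set(forward_hits.get(species, {}).get(transcript_id, []))
        as_target = {
            gene
            for gene, hits in reverse_hits.get(species, {}).items()
            if transcript_id in hits
        }
        shared = as_query & as_target
        if shared:
            # Report every overlapping gene, then stop at this species.
            return species, sorted(shared)
    return None


# Hypothetical example: no reciprocal hit in XENLA, one in HUMAN.
forward = {
    "XENLA": {"tx1": ["geneX"]},
    "HUMAN": {"tx1": ["geneA", "geneB", "geneC"]},
    "MOUSE": {"tx1": ["geneM"]},
}
reverse = {
    "XENLA": {"geneX": ["tx9"]},
    "HUMAN": {"geneA": ["tx1", "tx2"], "geneB": ["tx3"]},
    "MOUSE": {"geneM": ["tx1"]},
}
result = pick_orthologs("tx1", forward, reverse)
print(result)
```

In this example the MOUSE hit is also reciprocal, but it is never examined because the search stops at HUMAN.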
Candidate orthologs for each assembled transcript:
- From tissue samples: xdata:/J/Taira201203_XENLA_tissue_pep_final.nr_gene_list (42,890 transcripts in total)
- From stage samples: xdata:/J/Taira201203_XENLA_stage_pep_final.nr_gene_list (31,833 transcripts in total)
Based on this table, we selected transcripts/peptides as non-redundant sequence sets. The 'orthoGeneAll' set contains all sequences reported in the 'nr_gene_list' table, and the 'orthoGeneOne' set contains the longest sequence per orthologous gene group. For example, in the tissue sample set, the following three transcripts are reported as the known X. laevis rfx2 gene.
Taira201203_XENLA_tissue_00066978 XENLA rfx2|XB-GENE-991777,rfx6|XB-GENE-6488525
Taira201203_XENLA_tissue_00144530 XENLA rfx2|XB-GENE-991777
Taira201203_XENLA_tissue_00191686 XENLA rfx2|XB-GENE-991777
In 'orthoGeneAll', all three sequences are reported. In 'orthoGeneOne', Taira201203_XENLA_tissue_00144530 is not reported because it is shorter than Taira201203_XENLA_tissue_00191686. Note that in this example we did not reduce the three to a single sequence, because Taira201203_XENLA_tissue_00066978 has another candidate gene, rfx6, that is not present for the other two transcripts.
- OrthoGeneAll from tissue samples: Protein FASTA & cDNA FASTA (42,890 sequences)
- OrthoGeneAll from stage samples: Protein FASTA & cDNA FASTA (31,833 sequences)
- OrthoGeneOne from tissue samples: Protein FASTA & cDNA FASTA (24,762 sequences)
- OrthoGeneOne from stage samples: Protein FASTA & cDNA FASTA (18,848 sequences)
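
The 'orthoGeneOne' reduction described above can be sketched as follows. Several things here are assumptions made for illustration: that each 'nr_gene_list' record has three whitespace-separated fields (as in the rfx2 example), that an "orthologous gene group" means transcripts sharing exactly the same set of candidate genes, and the sequence lengths, which are invented and respect only the one ordering the text states (00144530 is shorter than 00191686).

```python
from collections import defaultdict


def parse_gene_list_line(line):
    """Split one record into (transcript_id, species, frozenset of genes).

    ASSUMED layout: transcript ID, species code, comma-separated genes.
    """
    transcript_id, species, genes = line.split()
    return transcript_id, species, frozenset(genes.split(","))


def ortho_gene_one(records, lengths):
    """Keep the longest transcript per (species, candidate gene set)."""
    groups = defaultdict(list)
    for transcript_id, species, genes in records:
        groups[(species, genes)].append(transcript_id)
    return sorted(max(ids, key=lambda t: lengths[t]) for ids in groups.values())


lines = [
    "Taira201203_XENLA_tissue_00066978 XENLA rfx2|XB-GENE-991777,rfx6|XB-GENE-6488525",
    "Taira201203_XENLA_tissue_00144530 XENLA rfx2|XB-GENE-991777",
    "Taira201203_XENLA_tissue_00191686 XENLA rfx2|XB-GENE-991777",
]
records = [parse_gene_list_line(line) for line in lines]

# HYPOTHETICAL lengths; only "00144530 < 00191686" comes from the text.
lengths = {
    "Taira201203_XENLA_tissue_00066978": 500,
    "Taira201203_XENLA_tissue_00144530": 300,
    "Taira201203_XENLA_tissue_00191686": 700,
}

ortho_gene_all = sorted(r[0] for r in records)
selected = ortho_gene_one(records, lengths)
print(selected)
```

With this grouping, 00066978 survives because its candidate set (rfx2 and rfx6) differs from that of the other two transcripts, and 00144530 is dropped in favor of the longer 00191686, matching the example in the text.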
Annotation
Orthologous genes
We used EnsEMBL-66 as the main source of protein sequences. For X. laevis, we used protein sequences from XenBase (downloaded in Dec-2011). The files below list the top-3 genes (ranked by E-value) with > 40% aligned length (relative to the query sequence). Note that this is based on a simple BLASTP search. We are currently working on a more accurate orthology analysis using a phylogenetic tree-based method.
BLASTP search | XENLA (X. laevis) | HUMAN | XENTR (X. tropicalis) | MOUSE | DANRE (zebrafish) | CHICK (chicken) | CAEEL (worm) | DROME (fly) |
---|---|---|---|---|---|---|---|---|
Stage pep as query | Stage pep --> XENLA | Stage pep --> HUMAN | Stage pep --> XENTR | Stage pep --> MOUSE | Stage pep --> DANRE | Stage pep --> CHICK | Stage pep --> CAEEL | Stage pep --> DROME |
Stage pep as target | Stage pep --> XENLA | HUMAN --> Stage pep | XENTR --> Stage pep | MOUSE --> Stage pep | DANRE --> Stage pep | CHICK --> Stage pep | CAEEL --> Stage pep | DROME --> Stage pep |
Tissue pep as query | Tissue pep --> XENLA | Tissue pep --> HUMAN | Tissue pep --> XENTR | Tissue pep --> MOUSE | Tissue pep --> DANRE | Tissue pep --> CHICK | Tissue pep --> CAEEL | Tissue pep --> DROME |
Tissue pep as target | Tissue pep --> XENLA | HUMAN --> Tissue pep | XENTR --> Tissue pep | MOUSE --> Tissue pep | DANRE --> Tissue pep | CHICK --> Tissue pep | CAEEL --> Tissue pep | DROME --> Tissue pep |
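
The hit filter described before the table (top-3 by E-value, more than 40% of the query aligned) can be sketched as below. The function name `top_hits` and the tuple layout are hypothetical, and the order of operations (coverage filter first, then ranking) is an assumption, since the text does not say which is applied first.

```python
def top_hits(hits, query_length, max_hits=3, min_coverage=0.40):
    """Select the best hits for one query sequence.

    hits: list of (subject_id, evalue, aligned_length) tuples (ASSUMED layout).
    Coverage is aligned_length / query_length, i.e. relative to the query.
    """
    covered = [h for h in hits if h[2] / query_length > min_coverage]
    covered.sort(key=lambda h: h[1])  # smaller E-value ranks first
    return covered[:max_hits]


# Hypothetical hits for a 100-residue query.
example_hits = [
    ("geneA", 1e-50, 90),
    ("geneB", 1e-80, 30),  # best E-value, but only 30% of the query aligned
    ("geneC", 1e-20, 60),
    ("geneD", 1e-10, 45),
    ("geneE", 1e-5, 80),
]
kept = top_hits(example_hits, query_length=100)
print([h[0] for h in kept])
```

Here geneB is discarded despite its E-value because it covers too little of the query, and geneE falls outside the top three.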
Microarray
- Affymetrix microarray v.1 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1318)
  - Mapped to tissue samples: xdata:/J/affy_XENLA_v1.Taira201203_XENLA_tissue_cdna_final.probeset
  - Mapped to stage samples: xdata:/J/affy_XENLA_v1.Taira201203_XENLA_stage_cdna_final.probeset
- Affymetrix microarray v.2 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL10756)
  - Mapped to tissue samples: xdata:/J/affy_XENLA_v2.Taira201203_XENLA_tissue_cdna_final.probeset
  - Mapped to stage samples: xdata:/J/affy_XENLA_v2.Taira201203_XENLA_stage_cdna_final.probeset
Contributors
- Masanori Taira (Graduate School of Science, University of Tokyo)
- Shuji Takahashi (Komaba Organization for Educational Excellence, College of Arts and Sciences, University of Tokyo)
- Toshiaki Tanaka (Tokyo Institute of Technology)
- Atsushi Toyoda and Asao Fujiyama (National Institute of Genetics)
- Yutaka Suzuki (Graduate School of Frontier Sciences, University of Tokyo)
- Edward M. Marcotte (University of Texas at Austin)
- John B. Wallingford (University of Texas at Austin)
- Taejoon Kwon (University of Texas at Austin)
- Texas Advanced Computing Center (TACC)