XENLA GeneModel2012
Check Xenopus_Genome_Project#Assembled_transcripts for the latest (refined) gene model. TaejoonKwon (talk) 11:19, 31 January 2014 (CST)
Disclaimer
WT data
Taira201203 data
- Contributed by Masanori Taira (Graduate School of Science, University of Tokyo), Shuji Takahashi (Komaba Organization for Educational Excellence, College of Arts and Sciences, University of Tokyo), Toshiaki Tanaka (Tokyo Institute of Technology), Atsushi Toyoda and Asao Fujiyama (National Institute of Genetics), Yutaka Suzuki (Graduate School of Frontier Sciences, University of Tokyo)
- If you have further questions about these data, please contact Dr. Masanori Taira, Dr. Edward Marcotte, or Dr. Taejoon Kwon.
Taira201203_XENLA_tissue
Total RNA was collected from 14 tissues of the Xenopus laevis J strain.
- Brain, eye, heart, intestine, kidney, liver, lung, muscle, ovary, pancreas, skin, spleen, stomach, testis
- Sons and daughters of a single pair of frogs (their mother was used for the first BAC-end sequencing)
- Standard Illumina sample prep. (poly-A capture)
- Illumina HiSeq 2000, 2x100 bp
- 55M ~ 130M reads/tissue (27M ~ 65M pairs)
- Raw sequences (112,045 in total)
- nr_gene_list (42,890 transcripts)
- OrthoGeneAll (42,890 sequences)
- OrthoGeneOne (24,762 sequences)
Taira201203_XENLA_stage
Total RNA was collected from embryos of the Xenopus laevis J strain at 11 developmental stages.
- Stage 01, 08, 09, 10.5, 12, 15, 20, 25, 30, 35, 40
- Sons and daughters of a single pair of frogs (their mother was used for the first BAC-end sequencing)
- Standard Illumina sample prep. (poly-A capture)
- Illumina HiSeq 2000, 2x100 bp
- 40M ~ 110M reads/stage (20M ~ 55M pairs)
- Raw sequences (78,546 in total)
- nr_gene_list (31,833 transcripts in total)
- OrthoGeneAll (31,833 sequences)
- OrthoGeneOne (18,848 sequences)
Making nr_gene_list (orthologs to other species) & orthoGene
This process is no longer used in the annotation pipeline. TaejoonKwon (talk)
- Take all candidate orthologous genes from the BLASTP results (at most the top 3 hits; see [#] for details).
- Going through the species in the order 'XENLA'->'HUMAN'->'XENTR'->'MOUSE'->'DANRE'->'CHICK'->'CAEEL'->'DROME', report the assembled transcript ID when the following conditions are met.
- The assembled transcript has orthologous candidates in the given species both as target (i.e. as the database in the BLAST search) and as query.
- The query list and the target list overlap in at least one gene. That is, the same gene in the other organism must appear among the top 3 hits in both directions of the bi-directional BLAST search.
- If more than one gene overlaps, report all of them.
- Once an assembled transcript has a candidate orthologous gene in one species, stop searching and move on to the next assembled transcript. For example, if a transcript has an orthologous gene satisfying these criteria in HUMAN, the species later in the order (e.g. XENTR, MOUSE, DANRE, CHICK) are not searched. The main reason is to remove the redundancy caused by genes that are highly conserved across all species.
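The selection rule above can be sketched in a few lines of Python. This is only an illustration of the described logic, not the script that was actually used: the function name `assign_orthologs` and the two input dictionaries (`hits_as_query`, `hits_as_target`) are hypothetical, and it assumes the BLASTP results have already been parsed into per-species hit lists.

```python
# Species are checked in this fixed order; the first species with a
# reciprocal hit wins and the remaining species are skipped.
SPECIES_ORDER = ["XENLA", "HUMAN", "XENTR", "MOUSE",
                 "DANRE", "CHICK", "CAEEL", "DROME"]
TOP_N = 3  # at most the top 3 BLASTP hits are considered


def assign_orthologs(hits_as_query, hits_as_target, species_order=SPECIES_ORDER):
    """Return (species, genes) for one assembled transcript, or None.

    hits_as_query:  dict species -> ranked list of genes hit when the
                    transcript is the BLASTP query (hypothetical input).
    hits_as_target: dict species -> set of genes that list this transcript
                    among their own top hits (transcript as database entry).
    """
    for species in species_order:
        query_list = hits_as_query.get(species, [])[:TOP_N]
        target_set = set(hits_as_target.get(species, ()))
        # Keep every gene found in both directions, in query rank order.
        overlap = [gene for gene in query_list if gene in target_set]
        if overlap:
            return species, overlap  # stop at the first species with a match
    return None


# Made-up hit lists for a single transcript, for illustration only.
example_query = {"HUMAN": ["RFX2", "RFX6", "RFX4"], "MOUSE": ["Rfx2"]}
example_target = {"HUMAN": {"RFX2", "RFX6"}, "MOUSE": {"Rfx2"}}
result = assign_orthologs(example_query, example_target)
```

In this made-up example the transcript has no XENLA hits, so HUMAN is the first species with an overlap; both overlapping genes are reported and MOUSE is never examined.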
Based on the 'nr_gene_list' table, we selected transcripts/peptides as a non-redundant sequence set. The 'orthoGeneAll' set contains all sequences reported in the 'nr_gene_list' table, and the 'orthoGeneOne' set contains the longest sequence per orthologous gene group. For example, in the tissue sample set, the following three transcripts are reported as the known X. laevis rfx2 gene.
Taira201203_XENLA_tissue_00066978 XENLA rfx2|XB-GENE-991777,rfx6|XB-GENE-6488525
Taira201203_XENLA_tissue_00144530 XENLA rfx2|XB-GENE-991777
Taira201203_XENLA_tissue_00191686 XENLA rfx2|XB-GENE-991777
In 'orthoGeneAll', all three sequences are reported, whereas in 'orthoGeneOne', Taira201203_XENLA_tissue_00144530 is not reported (it is shorter than Taira201203_XENLA_tissue_00191686). Note that in this example we did not reduce the three transcripts to a single one, because Taira201203_XENLA_tissue_00066978 has another candidate gene, rfx6, that is not present in the other two transcripts.
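The rfx2 example suggests that transcripts are grouped by their complete list of candidate genes and that the longest member of each group is kept. A minimal Python sketch of that reading follows. The function name `pick_longest_per_group` is hypothetical, and the sequence lengths are invented for illustration because the page does not give them; only their relative order (00191686 longer than 00144530) comes from the text.

```python
def pick_longest_per_group(records):
    """Keep the longest transcript for each (species, candidate gene set).

    records: iterable of (transcript_id, species, gene_list, length), where
             gene_list is the comma-separated field from 'nr_gene_list'.
    Returns the kept transcript IDs in sorted order.
    """
    best = {}  # group key -> (transcript_id, length)
    for transcript_id, species, gene_list, length in records:
        # Transcripts fall into the same group only if their full
        # candidate gene lists are identical.
        key = (species, frozenset(gene_list.split(",")))
        if key not in best or length > best[key][1]:
            best[key] = (transcript_id, length)
    return sorted(transcript_id for transcript_id, _ in best.values())


# Lengths below are made up; IDs and gene lists are from the example above.
rfx2_records = [
    ("Taira201203_XENLA_tissue_00066978", "XENLA",
     "rfx2|XB-GENE-991777,rfx6|XB-GENE-6488525", 2100),
    ("Taira201203_XENLA_tissue_00144530", "XENLA",
     "rfx2|XB-GENE-991777", 1500),
    ("Taira201203_XENLA_tissue_00191686", "XENLA",
     "rfx2|XB-GENE-991777", 2400),
]
ortho_gene_one = pick_longest_per_group(rfx2_records)
```

With these inputs, 00066978 forms its own group because of the extra rfx6 candidate, so it is kept alongside 00191686, while the shorter 00144530 is dropped.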