XENLA GeneModel2012

From Marcotte Lab

Check Xenopus_Genome_Project#Assembled_transcripts for the latest (refined) gene model. TaejoonKwon (talk) 11:19, 31 January 2014 (CST)


  • Data users may freely download and analyze sequences posted here.
  • Data users may use these sequences to analyze their own data, e.g. as a reference database for MS/MS proteomics and/or RNA-seq data.
  • Publication or presentation of global analyses of these sequences is not allowed until the 'data owner' has published their own paper. As soon as that paper is accepted, we will post the information on this website. If it is not clear whom you should contact, please contact Dr. Taejoon Kwon.


WT data

Taira201203 data

  • Contributed by Masanori Taira (Graduate School of Science, University of Tokyo), Shuji Takahashi (Komaba Organization for Educational Excellence, College of Arts and Sciences, University of Tokyo), Toshiaki Tanaka (Tokyo Institute of Technology), Atsushi Toyoda and Asao Fujiyama (National Institute of Genetics), Yutaka Suzuki (Graduate School of Frontier Sciences, University of Tokyo)
  • If you have further questions about these data, please contact Dr. Masanori Taira, Dr. Edward Marcotte, or Dr. Taejoon Kwon.


Collect total RNA from 14 tissues of Xenopus laevis J strain.

  • Brain, eye, heart, intestine, kidney, liver, lung, muscle, ovary, pancreas, skin, spleen, stomach, testis
  • Sons & daughters of a single pair of frogs (their mother was used for the first BAC-end sequencing)
  • Standard Illumina sample prep. (poly-A capture)
  • Illumina HiSeq 2000, 2x100 bp
  • 55M ~ 130M reads/tissue (27M ~ 65M pairs)


Collect total RNA from embryos of Xenopus laevis J strain at 11 different developmental stages.

  • Stage 01, 08, 09, 10.5, 12, 15, 20, 25, 30, 35, 40
  • Sons & daughters of a single pair of frogs (their mother was used for the first BAC-end sequencing)
  • Standard Illumina sample prep. (poly-A capture)
  • Illumina HiSeq 2000, 2x100 bp
  • 40M ~ 110M reads/stage (20M ~ 55M pairs)

Making nr_gene_list (orthologs to other species) & orthoGene

This process is no longer used in the annotation pipeline. TaejoonKwon (talk)

  1. Take all candidate orthologous genes from the BLASTP results (at most the top 3 hits; see [#] for details).
  2. Going through the species in the order 'XENLA'->'HUMAN'->'XENTR'->'MOUSE'->'DANRE'->'CHICK'->'CAEEL'->'DROME', report the assembled transcript ID when the following conditions are met.
    • The assembled transcript has orthologous candidates in the given species both when it is the target (the database in the BLAST search) and when it is the query.
    • There is at least one overlap between the query list and the target list. In other words, the same gene in the other organism must be identified among the top 3 hits in both directions of the BLAST search.
    • If more than one gene overlaps, report all of them.
    • If an assembled transcript has a candidate orthologous gene in one species, stop searching for orthologs and move on to the next assembled transcript. For example, if a transcript has an orthologous gene satisfying these criteria in HUMAN, the species later in the order, e.g. MOUSE, DANRE, CHICK, etc., are not searched. The main reason is to remove the redundancy caused by genes that are highly conserved across all species.
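
The steps above can be sketched in Python as follows. This is only an illustration of the reciprocal top-3 rule and the species priority order, not the script actually used for the annotation. The function name assign_orthologs, the two input dictionaries, and the transcript/gene IDs in the example are all hypothetical; the BLASTP results are assumed to have been parsed already and truncated to at most three hits per query.

```python
# Species priority order, as given in step 2.
SPECIES_ORDER = ["XENLA", "HUMAN", "XENTR", "MOUSE",
                 "DANRE", "CHICK", "CAEEL", "DROME"]


def assign_orthologs(as_query, as_target, species_order=SPECIES_ORDER):
    """Report candidate orthologs for each assembled transcript.

    as_query[transcript][species]  -> genes hit when the transcript is the
                                      BLASTP query (top 3 at most).
    as_target[transcript][species] -> genes that have the transcript among
                                      their own top 3 hits (transcript as target).
    Both structures are assumed inputs, not a format defined by this page.
    """
    assignments = {}
    for transcript, hits_by_species in as_query.items():
        for species in species_order:
            query_list = hits_by_species.get(species, [])
            target_set = set(as_target.get(transcript, {}).get(species, []))
            # Keep every gene found in both directions (report all overlaps).
            overlap = [gene for gene in query_list if gene in target_set]
            if overlap:
                assignments[transcript] = (species, overlap)
                break  # stop at the first species with a reciprocal hit
    return assignments


# Hypothetical example data.
example_query = {
    "tx_A": {"XENLA": ["rfx2"], "HUMAN": ["RFX2"]},
    "tx_B": {"XENLA": ["abc1"], "HUMAN": ["FOO", "BAR"], "MOUSE": ["Foo"]},
    "tx_C": {"HUMAN": ["BAZ"]},
}
example_target = {
    "tx_A": {"XENLA": ["rfx2", "rfx6"], "HUMAN": ["RFX2"]},
    "tx_B": {"HUMAN": ["BAR", "FOO"], "MOUSE": ["Foo"]},
}
assignments = assign_orthologs(example_query, example_target)
```

In this example, tx_A is assigned in XENLA and its HUMAN hit is never examined, tx_B falls through to HUMAN with both overlapping genes reported, and tx_C is not reported because it has no reciprocal hit.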

Based on the 'nr_gene_list' table, we selected transcripts/peptides as a non-redundant sequence set. The 'orthoGeneAll' set contains all sequences reported in the 'nr_gene_list' table, and the 'orthoGeneOne' set contains the longest sequence per orthologous gene group. For example, in the tissue sample set, the following three transcripts are reported as the known X. laevis rfx2 gene.

Taira201203_XENLA_tissue_00066978	XENLA	rfx2|XB-GENE-991777,rfx6|XB-GENE-6488525
Taira201203_XENLA_tissue_00144530	XENLA	rfx2|XB-GENE-991777
Taira201203_XENLA_tissue_00191686	XENLA	rfx2|XB-GENE-991777

In 'orthoGeneAll', all three sequences are reported. In 'orthoGeneOne', Taira201203_XENLA_tissue_00144530 is not reported because it is shorter than Taira201203_XENLA_tissue_00191686. Note that in this example we did not reduce the three transcripts to a single one, because Taira201203_XENLA_tissue_00066978 has another candidate gene, rfx6, that is not present in the other two transcripts.
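
The selection in this example can be reproduced with a short Python sketch. It assumes that an "orthologous gene group" is defined by the species plus the exact set of candidate genes, which is one reading consistent with the rfx2 example but is not confirmed here. The function names and the sequence lengths are hypothetical; the only length fact taken from the text is that 00144530 is shorter than 00191686.

```python
from collections import defaultdict

# The three nr_gene_list rows from the example (tab-separated).
NR_GENE_LIST = (
    "Taira201203_XENLA_tissue_00066978\tXENLA\t"
    "rfx2|XB-GENE-991777,rfx6|XB-GENE-6488525\n"
    "Taira201203_XENLA_tissue_00144530\tXENLA\trfx2|XB-GENE-991777\n"
    "Taira201203_XENLA_tissue_00191686\tXENLA\trfx2|XB-GENE-991777\n"
)

# Hypothetical sequence lengths, for illustration only.
SEQ_LEN = {
    "Taira201203_XENLA_tissue_00066978": 2100,
    "Taira201203_XENLA_tissue_00144530": 900,
    "Taira201203_XENLA_tissue_00191686": 2400,
}


def parse_nr_gene_list(text):
    """Parse rows into (transcript, species, frozenset of candidate genes)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        transcript, species, genes = line.split("\t")
        rows.append((transcript, species, frozenset(genes.split(","))))
    return rows


def select_ortho_gene_sets(rows, seq_len):
    """Return (orthoGeneAll, orthoGeneOne) as lists of transcript IDs."""
    ortho_all = [transcript for transcript, _, _ in rows]
    # Assumption: transcripts belong to the same group only if they share
    # the species and the identical set of candidate genes.
    groups = defaultdict(list)
    for transcript, species, genes in rows:
        groups[(species, genes)].append(transcript)
    # Keep the longest transcript in each group.
    ortho_one = sorted(
        max(members, key=lambda t: seq_len[t]) for members in groups.values()
    )
    return ortho_all, ortho_one


ortho_all, ortho_one = select_ortho_gene_sets(
    parse_nr_gene_list(NR_GENE_LIST), SEQ_LEN
)
```

Under these assumptions, ortho_all keeps all three transcripts, while ortho_one keeps 00066978 (the only member of the rfx2+rfx6 group) and 00191686 (the longer of the two rfx2-only transcripts).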