Annotations for use with MISO and sashimi_plot
This page contains links to GFF annotations for use with MISO and sashimi_plot. The GFF annotation format, and how MISO uses it, are described in detail in the MISO manual.
Exon-centric annotations
These annotations include GFF files (.gff3 extension) that can be used with MISO.
Exon-centric annotations for human and mouse genomes
Version 1 of the human/mouse annotations (compiled 2008):
- Mouse genome (mm9) alternative events v1.0
- Mouse genome (mm10) alternative events v1.0
- Human genome (hg18) alternative events v1.0
- Human genome (hg19) alternative events v1.0
These contain annotations of:
- Skipped exons (SE)
- Alternative 3’/5’ splice sites (A3SS, A5SS)
- Mutually exclusive exons (MXE)
- Tandem 3’ UTRs (TandemUTR)
- Retained introns (RI)
- Alternative first exons (AFE)
- Alternative last exons (ALE)
Version 1 of the annotations for human and mouse genomes was derived by Wang et al. (2008) using ESTs and various annotation databases (such as Ensembl, UCSC and AceView) to define alternative splicing events. Briefly, a splicing event was considered alternative if it was supported by several ESTs, and alternative tandem 3’ UTRs (TandemUTR events) were derived from PolyA DB.
Note that Version 1 of the annotations was originally made for mm9 and hg18; the mm10 and hg19 annotations were made by mapping coordinates from mm9 to mm10 and from hg18 to hg19 with UCSC’s liftOver utility.
Warning
The lifted-over Version 1 annotations for mm10/hg19 contain the ID entries from the mm9/hg18 GFF files; however, the actual genomic coordinates, which are the only part read by MISO, have been lifted over to the more recent genomes. The ID value used in the GFF is arbitrary and is ignored by MISO; it is only used to encode the hierarchy of gene models (genes, mRNAs and exons). Also note that lifting over is an imperfect process: not all events can be fully lifted over.
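To illustrate the point in the Warning above, here is a minimal Python sketch that rebuilds the gene → mRNA → exon hierarchy of a GFF3 file from its ID and Parent attributes. The event shown is a hypothetical skipped-exon event with made-up coordinates and IDs, and the function names are illustrative, not part of MISO. Renaming every ID consistently changes only the dictionary keys, never the exon coordinates.

```python
from collections import defaultdict

# Hypothetical skipped-exon (SE) event with made-up coordinates and IDs:
# one gene, an inclusion isoform (A, three exons) and an exclusion
# isoform (B, two exons).
TOY_GFF = """\
chr1\tSE\tgene\t100\t600\t.\t+\t.\tID=toy_event
chr1\tSE\tmRNA\t100\t600\t.\t+\t.\tID=toy_event.A;Parent=toy_event
chr1\tSE\tmRNA\t100\t600\t.\t+\t.\tID=toy_event.B;Parent=toy_event
chr1\tSE\texon\t100\t200\t.\t+\t.\tID=toy_event.A.up;Parent=toy_event.A
chr1\tSE\texon\t300\t400\t.\t+\t.\tID=toy_event.A.se;Parent=toy_event.A
chr1\tSE\texon\t500\t600\t.\t+\t.\tID=toy_event.A.dn;Parent=toy_event.A
chr1\tSE\texon\t100\t200\t.\t+\t.\tID=toy_event.B.up;Parent=toy_event.B
chr1\tSE\texon\t500\t600\t.\t+\t.\tID=toy_event.B.dn;Parent=toy_event.B
"""


def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = value
    return attrs


def build_hierarchy(gff_text):
    """Return {gene_id: {mrna_id: [(chrom, start, end, strand), ...]}}.

    IDs serve only as keys that link records through Parent; the exon
    coordinates (columns 1, 4, 5 and 7) are the actual content.
    """
    genes = {}
    mrna_to_gene = {}
    exons = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/pragmas
        cols = line.split("\t")
        ftype, attrs = cols[2], parse_attributes(cols[8])
        if ftype == "gene":
            genes.setdefault(attrs["ID"], {})
        elif ftype == "mRNA":
            mrna_to_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "exon":
            exons[attrs["Parent"]].append(
                (cols[0], int(cols[3]), int(cols[4]), cols[6]))
    for mrna_id, gene_id in mrna_to_gene.items():
        # Order each isoform's exons by start coordinate.
        genes.setdefault(gene_id, {})[mrna_id] = sorted(
            exons[mrna_id], key=lambda exon: exon[1])
    return genes
```

For example, `build_hierarchy(TOY_GFF.replace("toy_event", "ev1"))` yields the same exon coordinates under the keys `ev1`, `ev1.A` and `ev1.B`, which is the sense in which the ID values are arbitrary.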
Mapping from alternative events to genes for Version 1 annotations
Version 1 annotations from the links above contain a mapping from alternative events to genes, based on the Ensembl annotation. These are tab-delimited files in which the first column (event_id) is the ID of the event from its GFF file and the second column (gene_id) is a comma-separated list of Ensembl identifiers for the gene(s) the event overlaps. If the event overlaps multiple genes (which can happen because the same gene is sometimes given multiple Ensembl identifiers, or because genes overlap or are nested within each other in the annotation), then multiple Ensembl identifiers are listed. A mapping file is given for each event type (e.g. skipped exons, tandem 3’ UTRs, etc.). Events that cannot be mapped to genes are recorded as NA.
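A minimal Python sketch for loading one of these mapping files, assuming the two-column layout described above and allowing for an optional header line that names the columns. The function names and the sample rows (event IDs `ev1`…, gene IDs `ensA`, `ensB`) are hypothetical placeholders; NA is turned into an empty list, and a reverse index gives all events that overlap a given gene.

```python
import csv
import io
from collections import defaultdict


def load_event_to_genes(handle):
    """Map each event ID to a list of Ensembl gene IDs ([] for NA)."""
    mapping = {}
    # QUOTE_NONE: treat every character literally, since IDs are unquoted.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or row[0] == "event_id":
            continue  # skip blank/short lines and a header, if present
        event_id, gene_field = row[0], row[1]
        if gene_field == "NA":
            mapping[event_id] = []  # event could not be mapped to a gene
        else:
            mapping[event_id] = [g for g in gene_field.split(",") if g]
    return mapping


def genes_to_events(mapping):
    """Invert the mapping: gene ID -> list of overlapping event IDs."""
    inverted = defaultdict(list)
    for event_id, gene_ids in mapping.items():
        for gene_id in gene_ids:
            inverted[gene_id].append(event_id)
    return dict(inverted)


# Hypothetical sample rows, not taken from the real files.
SAMPLE = "event_id\tgene_id\nev1\tensA\nev2\tensA,ensB\nev3\tNA\n"
events = load_event_to_genes(io.StringIO(SAMPLE))
```

With the sample above, `events["ev2"]` is `["ensA", "ensB"]` and `events["ev3"]` is an empty list. A real file would be opened with `open(path, newline="")` and passed in the same way.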
Version 2 (alpha release) of the human/mouse annotations (compiled June 2013):
- Mouse genome (mm9) alternative events v2.0
- Mouse genome (mm10) alternative events v2.0
- Human genome (hg18) alternative events v2.0
- Human genome (hg19) alternative events v2.0
These contain annotations of:
- Skipped exons (SE)
- Alternative 3’/5’ splice sites (A3SS, A5SS)
- Mutually exclusive exons (MXE)
- Retained introns (RI)
Version 2 of the annotations was derived by considering all transcripts annotated in Ensembl genes, UCSC knownGenes and RefSeq genes. The exons flanking each alternative exon were chosen using the “common shortest” rule, i.e. taking the shortest stretches of flanking exon sequence that are most common among the annotated transcripts for the gene. The code used to generate these annotations is available as part of rnaseqlib.
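The “common shortest” rule can be read in more than one way. The Python sketch below encodes one possible reading, which is an assumption and not rnaseqlib’s actual code: among the candidate flanking exons (as `(start, end)` intervals collected from all transcripts), keep those that occur most often, then pick the shortest. The function name is hypothetical.

```python
from collections import Counter


def common_shortest(flank_exons):
    """Pick one flanking exon from candidate (start, end) intervals.

    Assumed reading of the rule: keep the intervals seen most often
    across transcripts, then take the shortest of those.
    Raises ValueError if no candidates are given.
    """
    counts = Counter(flank_exons)
    top = max(counts.values())
    most_common = [iv for iv, n in counts.items() if n == top]
    # Shortest by length; ties broken by start coordinate for determinism.
    return min(most_common, key=lambda iv: (iv[1] - iv[0], iv[0]))
```

For candidates `[(100, 200), (100, 200), (150, 200), (150, 200), (50, 200)]`, two intervals tie as most common and the shorter one, `(150, 200)`, is returned.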
The annotations contain the following additional GFF attributes for each event’s gene entry:
- ensg_id: Ensembl ID for the gene the event falls within
- refseq_id: RefSeq ID for the gene the event falls within
- gsymbol: Gene symbol for the gene the event falls within
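A minimal Python sketch that pulls these attributes out of the gene entries of a Version 2 GFF file. It assumes the attributes appear as `key=value` pairs in column 9 and returns their values as raw strings, without splitting lists or decoding GFF3 percent-escapes; the function name and the example line (with placeholder values `ensA`, `nmA`, `SYMA`) are hypothetical.

```python
def gene_annotations(gff_lines):
    """Yield (ID, ensg_id, refseq_id, gsymbol) for each gene entry.

    Attributes missing from an entry are returned as None.
    """
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/pragmas
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # only gene entries carry these attributes
        attrs = dict(item.split("=", 1)
                     for item in cols[8].split(";") if "=" in item)
        yield (attrs.get("ID"), attrs.get("ensg_id"),
               attrs.get("refseq_id"), attrs.get("gsymbol"))


# Hypothetical input lines; a real file object can be passed instead.
EXAMPLE = [
    "##gff-version 3\n",
    "chr1\tSE\tgene\t100\t600\t.\t+\t.\tID=ev1;ensg_id=ensA;refseq_id=nmA;gsymbol=SYMA\n",
    "chr1\tSE\tmRNA\t100\t600\t.\t+\t.\tID=ev1.A;Parent=ev1\n",
]
rows = list(gene_annotations(EXAMPLE))
```

Here `rows` holds a single tuple for the gene entry; the mRNA line is skipped.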
These annotations are still being tested. Comments on the annotations are welcome.
Exon-centric annotations for fly genome
These fly genome annotations were derived by the Graveley lab.
Isoform-centric annotations and reference gene models
We provide GFF3 annotations based on UCSC Table Browser’s version of Ensembl genes for the following genomes:
- Mouse Ensembl genes from UCSC
  - Annotation for mm9: mm9 ensGene GFF annotation
  - Annotation for mm10: mm10 ensGene GFF annotation
- Human Ensembl genes from UCSC
  - Annotation for hg18: hg18 ensGene GFF annotation
  - Annotation for hg19: hg19 ensGene GFF annotation
These can be used with MISO for isoform-centric quantitation, or with sashimi_plot to make plots of RNA-Seq data across gene models.
For convenience, we also provide GFF3 annotations of gene models from Ensembl (release 65), which were simply converted from Ensembl’s GTF to GFF3 format and are otherwise identical to the Ensembl annotation.
- Mouse Ensembl genes (16 M, .zip): Mus_musculus.NCBIM37.65.gff
- Human Ensembl genes (26 M, .zip): Homo_sapiens.GRCh37.65.gff
Note that these annotations follow Ensembl-style chromosome names (e.g. 1, X, MT), whereas the UCSC-derived Ensembl annotations follow UCSC-style chromosome names (e.g. chr1, chrX, chrM).
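When reads are aligned against one naming convention and the annotation uses the other, the chromosome names must be made consistent. A minimal Python sketch of renaming the first (seqid) column of GFF lines between the two styles is shown below; the function names are hypothetical, and renaming alone does not reconcile unplaced or alternative contigs, whose names follow different schemes in the two sources.

```python
def ensembl_to_ucsc(chrom):
    """'1' -> 'chr1', 'X' -> 'chrX', 'MT' -> 'chrM'."""
    if chrom == "MT":
        return "chrM"
    if chrom.startswith("chr"):
        return chrom  # already UCSC-style
    return "chr" + chrom


def ucsc_to_ensembl(chrom):
    """'chr1' -> '1', 'chrX' -> 'X', 'chrM' -> 'MT'."""
    if chrom == "chrM":
        return "MT"
    if chrom.startswith("chr"):
        return chrom[3:]
    return chrom  # already Ensembl-style


def convert_gff_line(line, convert):
    """Rename column 1 of a GFF feature line; pass comments through."""
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.split("\t")
    cols[0] = convert(cols[0])
    return "\t".join(cols)
```

For example, `convert_gff_line("1\tensembl\texon\t10\t20\t.\t+\t.\tID=e1\n", ensembl_to_ucsc)` returns the same line with `chr1` in the first column; applying it to every line of a file converts the whole annotation.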
Alternative 3’ UTR annotations (hybrid)
In addition to exon-centric tandem 3’ UTR annotations, alternative 3’ UTR annotations for mouse (mm9) were made available by Wencheng Li and Bin Tian (these were derived by the 3’ READS method: Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing). These contain two or more 3’ UTR annotations per gene: