Glossary of terms related to MISO
Terms used in the MISO manual.
Annotations and GFF files
- Exon-centric: An annotation of alternative events in the genome that is based on inclusion/exclusion of a particular exon in a transcript. For example, an exon-centric annotation of an alternatively skipped exon would contain two isoforms: one containing the skipped exon and its two flanking exons, and another containing only the two flanking exons. This “exon-centric” annotation does not incorporate other exons in the gene, so Ψ values obtained from it correspond only to the inclusion of the alternative exon relative to its two annotated flanking exons, without considering any other parts of the gene’s isoforms.
- Isoform-centric: Unlike exon-centric annotations, in isoform-centric annotations each whole isoform of a gene is annotated and used as input to MISO. Ψ values obtained this way are vectors, each entry corresponding to the percent inclusion of a whole isoform in the annotated gene.
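The difference in the shape of Ψ between the two annotation styles can be illustrated with a small Python sketch. This is not how MISO estimates Ψ (MISO performs Bayesian inference from reads); the function `psi_vector` and the abundance numbers are hypothetical and only show that an exon-centric event yields a single inclusion value, while an isoform-centric gene yields one entry per isoform, assuming the entries are fractions that sum to 1.

```python
def psi_vector(abundances):
    """Normalize hypothetical isoform abundances into fractions that sum to 1.

    Illustration only: MISO infers Psi from read alignments, not by
    normalizing known abundances.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return [a / total for a in abundances]


# Exon-centric: two isoforms (inclusion, exclusion). Psi is reported as the
# single value for the inclusion isoform.
inclusion_psi = psi_vector([30.0, 10.0])[0]

# Isoform-centric: one entry per annotated whole isoform of the gene.
gene_psi = psi_vector([2.0, 1.0, 1.0])
```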
Read alignments and BAM files
- Paired-end versus single-end: In paired-end sequencing, both ends of a cluster on a flow cell are sequenced, and both mates of a pair are guaranteed to have originated from the same molecule. In single-end sequencing, only one end of a molecule is sequenced. MISO supports both paired-end and single-end data. Any paired-end data can be run in MISO as single-end by simply omitting the --paired-end parameter. In that case, MISO will treat each mate of a pair independently.
- Properly paired read pair: This term applies only to paired-end data, and refers to read pairs where both mates are mapped in a way that makes sense given the strandedness of the RNA-Seq protocol and the alignments of the individual mates. When MISO maps read pairs to event annotations in paired-end mode, it only considers properly paired reads. If the mates map to distinct chromosomes, the read pair will not be considered properly paired. Similarly, if one mate maps in the opposite orientation to what is expected given the strandedness setting, the pair will not be considered properly paired. Finally, if one mate maps within the boundaries of an annotated event but the other does not, the read pair will not be considered (though if such cases are common, one can use MISO in single-end mode). MISO will generally look for the BAM flag that encodes whether a read pair is properly paired or not. Otherwise, it pairs mates together from a BAM file using their read IDs.
- Overhang: Applies to splice junction reads, and refers to the minimum number of bases covered by the read on any of the exons involved in the junction. For example, if a junction read of length 30 is aligned to the border of two exons with 10 bases covered on one exon and 20 bases covered on the other, the overhang is 10 (the smaller of the two numbers). For single-end reads, requiring an overhang of 4 or more helps filter out alignments that appear to be junction reads but are simply artifacts of sequencing errors and/or alignment errors. Overhang is not defined for paired-end reads.
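The overhang definition and the properly-paired flag check can be sketched in a few lines of Python. This is a minimal illustration, not MISO's own code: the function names are hypothetical, and the default threshold of 4 is taken from the example in the text. The flag bits used (0x1 for "read is paired", 0x2 for "mapped in a proper pair") are the ones defined by the SAM/BAM format.

```python
# SAM/BAM flag bits as defined by the SAM specification.
FLAG_PAIRED = 0x1        # read is part of a pair
FLAG_PROPER_PAIR = 0x2   # aligner marked the pair as properly paired


def is_properly_paired(flag):
    """Return True if a SAM/BAM flag marks the read as part of a proper pair."""
    return bool(flag & FLAG_PAIRED) and bool(flag & FLAG_PROPER_PAIR)


def junction_overhang(bases_per_exon):
    """Overhang of a junction read: the fewest bases it covers on any exon.

    `bases_per_exon` lists how many read bases fall on each exon the read
    touches, e.g. [10, 20] for a 30-base read spanning two exons.
    """
    if len(bases_per_exon) < 2:
        raise ValueError("a junction read must cover at least two exons")
    return min(bases_per_exon)


def passes_overhang_filter(bases_per_exon, min_overhang=4):
    """Keep a junction read only if its overhang meets the threshold."""
    return junction_overhang(bases_per_exon) >= min_overhang
```

With the example from the text, `junction_overhang([10, 20])` gives 10, so the read passes a threshold of 4, whereas a read covering only 3 bases on one exon would be filtered out.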
Inference terms
There are a number of technical parameter settings related to Markov chain Monte Carlo (MCMC) inference, on which the MISO engine is based. In virtually all cases, users do not need to change these settings, but they are explained here for completeness. They are configurable from the MISO settings file.
- Number of (MCMC) chains: The number of independent MCMC chains used by MISO when performing inference. The default is 6, which is considered a conservative setting for the problem. Running several chains, as with the default of 6, helps prevent MISO from getting stuck in suboptimal Ψ values.
- Lag: The number of MCMC samples to skip over when computing the posterior distribution over Ψ. The default is 10. Higher settings of this parameter can, in some cases, reduce autocorrelation between the retained MCMC samples.
- Burn-in: The number of initial MCMC samples to exclude when computing the posterior distribution over Ψ. Larger burn-in settings help keep the posterior distribution over Ψ from being closely correlated with the random initial value of Ψ used by the sampler.
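How burn-in, lag, and multiple chains interact can be shown with a generic Python sketch. It is not taken from the MISO source: the function names are hypothetical, and it assumes that a lag of k means keeping every k-th sample after burn-in and that the retained samples from all chains are pooled.

```python
def thin_chain(samples, burn_in, lag):
    """Drop the first `burn_in` samples, then keep every `lag`-th sample."""
    if burn_in < 0 or lag < 1:
        raise ValueError("burn_in must be >= 0 and lag must be >= 1")
    return samples[burn_in::lag]


def pooled_posterior_mean(chains, burn_in, lag):
    """Pool the retained samples from independent chains and average them."""
    kept = [s for chain in chains for s in thin_chain(chain, burn_in, lag)]
    if not kept:
        raise ValueError("no samples left after burn-in and thinning")
    return sum(kept) / len(kept)


# Two toy chains of Psi samples that start from different initial values
# (0.0 and 1.0) and then settle at 0.5. Discarding the first 5 samples
# removes the dependence on the starting point.
chains = [
    [0.0] * 5 + [0.5] * 10,
    [1.0] * 5 + [0.5] * 10,
]
mean_psi = pooled_posterior_mean(chains, burn_in=5, lag=5)
```

Starting the toy chains at opposite extremes mirrors the reason for running several chains: if all of them agree after burn-in, the result is less likely to reflect one chain's starting point.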