Your submission: mouse_sim_long_short on ONT+Illumina data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. SIRV Lexogen Set 4
  4. Simulated data

LRGASP uses SQANTI categories to define the features and metrics evaluated in Challenge 1.

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that your criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, to be released after completion of the challenge.

Evaluation of detected transcripts for Challenge 1

Global overview

Metric                                              Value
Number of genes detected                            14361
Number of known genes detected                      14270
Number of transcripts detected                      25679
Number of transcripts associated to a known gene    25365
Number of unique SJ detected                        143856

Metric              Absolute value    Relative value (fraction of unique SJ)
Novel SJ            3851              0.03
Non-canonical SJ    3119              0.02
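
The relative values for novel and non-canonical SJ are consistent with each count divided by the number of unique SJ detected, expressed as a fraction (0.03 corresponds to roughly 2.7%). The sketch below checks this. The denominator and the helper name `fraction` are assumptions inferred from the figures, not definitions given in the report.

```python
# Counts copied from the global overview table.
unique_sj = 143856
novel_sj = 3851
non_canonical_sj = 3119


def fraction(count, total):
    """Share of `total`, rounded to two decimals as in the report."""
    return round(count / total, 2)


# Assumption: the relative value is count / unique SJ (a fraction, not a percentage).
novel_fraction = fraction(novel_sj, unique_sj)
non_canonical_fraction = fraction(non_canonical_sj, unique_sj)
print(novel_fraction, non_canonical_fraction)
```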

Evaluation of FSM

Metric                                        Absolute value    Relative value (%)
Number of isoforms                            19882             -
Reference Match                               16965             85.33
5’ reference supported (transcript)           18251             91.8
3’ reference supported (transcript)           18061             90.84
5’ reference supported (gene)                 18851             94.81
3’ reference supported (gene)                 18830             94.71
Supported Reference Transcript Model (SRTM)   18052             90.8
Reference redundancy Level                    1                 -

Evaluation of ISM

Metric                                        Absolute value    Relative value (%)
Number of isoforms                            721               -
5’ reference supported (transcript)           163               22.61
3’ reference supported (transcript)           200               27.74
5’ and 3’ reference supported (gene)          127               17.61
5’ reference supported (gene)                 298               41.33
3’ reference supported (gene)                 332               46.05
Supported Reference Transcript Model (SRTM)   127               17.61
Reference redundancy Level                    1.03              -
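
The relative values in the FSM, ISM, NIC and NNC tables match the absolute count divided by the number of isoforms in that category. The sketch below recomputes the ISM column under that assumption, which is inferred from the figures and not stated in the report. On that basis the 5’ reference supported (gene) row (298 of 721) works out to 41.33%.

```python
# Absolute values copied from the ISM table.
ism_isoforms = 721
ism_counts = {
    "5' reference supported (transcript)": 163,
    "3' reference supported (transcript)": 200,
    "5' and 3' reference supported (gene)": 127,
    "5' reference supported (gene)": 298,
    "3' reference supported (gene)": 332,
    "Supported Reference Transcript Model (SRTM)": 127,
}


def percent_of(count, total):
    """Percentage of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


# Assumption: every relative value uses the ISM isoform count as denominator.
ism_percent = {name: percent_of(n, ism_isoforms) for name, n in ism_counts.items()}
for name, pct in ism_percent.items():
    print(f"{name}: {pct}")
```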

Evaluation of NIC

Metric                                        Absolute value    Relative value (%)
Number of isoforms                            2096              -
5’ and 3’ reference supported (gene)          1366              65.17
5’ reference supported (gene)                 1789              85.35
3’ reference supported (gene)                 1563              74.57
Intron retention incidence                    271               12.93

Evaluation of NNC

Metric                                        Absolute value    Relative value (%)
Number of isoforms                            2666              -
5’ and 3’ reference supported (gene)          297               11.14
5’ reference supported (gene)                 2450              91.9
3’ reference supported (gene)                 341               12.79
Non-canonical SJ incidence                    2136              80.12
Full Illumina SJ support                      2666              100
RT-switching incidence                        407               15.27

Evaluation of Simulation

Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated with respect to each of these ground truth settings. These sets of ground truth transcripts are:

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Metric                                                      Value
Number of isoforms simulated                                27152
True Positive detections (TP)                               15378
Number of transcripts associated to TP (Reference Match)    15385
Partial True Positive detections (PTP)                      2961
Number of transcripts associated to PTP                     3008
False Negative (FN)                                         8937
False Positive (FP)                                         7286
Sensitivity                                                 0.57
Precision                                                   0.6
Non Redundant Precision                                     0.6
Positive Detection Rate                                     0.67
False Discovery Rate                                        0.4
False Detection Rate                                        0.28
Redundancy                                                  1.01
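
The rates in this table can be reproduced from the counts. The formulas in the sketch below were inferred by matching the reported values and are assumptions, not the official LRGASP definitions. Note that TP + PTP + FN (27276) exceeds the number of simulated isoforms (27152), which suggests some reference transcripts are counted under both TP and PTP. The sketch therefore takes the number of matched references as simulated minus FN.

```python
# Values copied from the "all simulated transcripts" table and the global overview.
simulated = 27152        # isoforms simulated (ground truth)
detected = 25679         # transcripts detected (global overview)
tp = 15378               # True Positive detections
tp_transcripts = 15385   # transcripts associated to TP
ptp_transcripts = 3008   # transcripts associated to PTP
fn = 8937                # False Negative
fp = 7286                # False Positive

# Assumption: references hit by at least one TP or PTP transcript.
matched_refs = simulated - fn

# Assumed formulas, chosen because they reproduce the reported values.
metrics = {
    "sensitivity": tp / simulated,
    "precision": tp_transcripts / detected,
    "non_redundant_precision": tp / detected,
    "positive_detection_rate": matched_refs / simulated,
    "false_discovery_rate": (detected - tp_transcripts) / detected,
    "false_detection_rate": fp / detected,
    "redundancy": (tp_transcripts + ptp_transcripts) / matched_refs,
}
rounded = {name: round(value, 2) for name, value in metrics.items()}
for name, value in rounded.items():
    print(f"{name}: {value}")
```

The same assumed formulas for sensitivity, positive detection rate and redundancy also reproduce the values reported for the other ground truth sets.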

Evaluation of all GENCODE simulation

Metric                                                      Value
Number of isoforms simulated                                20152
True Positive detections (TP)                               14414
Number of transcripts associated to TP (Reference Match)    14420
Partial True Positive detections (PTP)                      1719
Number of transcripts associated to PTP                     1742
False Negative (FN)                                         4127
False Positive (FP)                                         9517
Sensitivity                                                 0.72
Positive Detection Rate                                     0.8
Redundancy                                                  1.01

Evaluation of only GENCODE transcripts simulated

Metric                                                      Value
Number of isoforms simulated                                15608
True Positive detections (TP)                               11859
Number of transcripts associated to TP (Reference Match)    11865
Partial True Positive detections (PTP)                      1080
Number of transcripts associated to PTP                     1099
False Negative (FN)                                         2766
False Positive (FP)                                         12715
Sensitivity                                                 0.76
Positive Detection Rate                                     0.82
Redundancy                                                  1.01

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Metric                                                      Value
Number of isoforms simulated                                9428
True Positive detections (TP)                               7755
Number of transcripts associated to TP (Reference Match)    7761
Partial True Positive detections (PTP)                      406
Number of transcripts associated to PTP                     414
False Negative (FN)                                         1341
False Positive (FP)                                         17504
Sensitivity                                                 0.82
Positive Detection Rate                                     0.86
Redundancy                                                  1.01

Evaluation of novelty

Metric                                                      Value
Number of isoforms simulated                                7000
True Positive detections (TP)                               964
Number of transcripts associated to TP (Reference Match)    965
Partial True Positive detections (PTP)                      1242
Number of transcripts associated to PTP                     1266
False Negative (FN)                                         4810
False Positive (FP)                                         23448
Sensitivity                                                 0.14
Positive Detection Rate                                     0.31
Redundancy                                                  1.02
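
The FP count grows as the ground truth set shrinks. In every table it equals the total number of detected transcripts minus the transcripts associated to TP and PTP for that set, so any submitted transcript matching a simulated isoform outside the set is counted as FP. The sketch below checks this relation. The relation itself and the helper name `false_positives` are assumptions inferred from the reported numbers, not definitions given in the report.

```python
detected = 25679  # transcripts detected (global overview)

# Per ground truth set: (transcripts associated to TP,
#                        transcripts associated to PTP,
#                        reported FP)
subsets = {
    "all simulated": (15385, 3008, 7286),
    "all GENCODE simulation": (14420, 1742, 9517),
    "only GENCODE": (11865, 1099, 12715),
    "only GENCODE, TPM >= 5": (7761, 414, 17504),
    "novelty": (965, 1266, 23448),
}


def false_positives(tp_tx, ptp_tx, total_detected):
    """Assumed relation: detected transcripts matching nothing in the set."""
    return total_detected - tp_tx - ptp_tx


recomputed = {
    name: false_positives(tp_tx, ptp_tx, detected)
    for name, (tp_tx, ptp_tx, _reported) in subsets.items()
}
for name, (_tp, _ptp, reported) in subsets.items():
    print(f"{name}: recomputed={recomputed[name]} reported={reported}")
```

Under this reading, FP values from the restricted sets are not comparable to the FP of the full simulation as a measure of spurious calls.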