Your submission: human_simulation_cDNA_ONT on ONT data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. SIRV Lexogen Set 4
  4. Simulated data

LRGASP uses SQANTI structural categories to define the evaluation features and metrics for Challenge 1.

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that your criterion 1 metrics reported here were calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, to be released after the completion of the challenge.

Evaluation of detected transcripts for Challenge 1

Global overview

Metric                                            Value
Number of genes detected                          17105
Number of known genes detected                    16366
Number of transcripts detected                    84043
Number of transcripts associated to a known gene  79854
Number of unique SJ detected                      125912
Metric              Absolute value    Relative value (%)
Novel SJ            17766             14.11
Non-canonical SJ    0                 0.00

Evaluation of FSM (Full Splice Match)

Metric                                        Absolute value    Relative value (%)
Number of isoforms                            10864             -
Reference Match                               5624              51.77
5’ reference supported (transcript)           7929              72.98
3’ reference supported (transcript)           7219              66.45
5’ reference supported (gene)                 8316              76.55
3’ reference supported (gene)                 7680              70.69
Supported Reference Transcript Model (SRTM)   6125              56.38
Reference redundancy level                    1                 -

Evaluation of ISM (Incomplete Splice Match)

Metric                                        Absolute value    Relative value (%)
Number of isoforms                            47748             -
5’ reference supported (transcript)           4341              9.09
3’ reference supported (transcript)           9033              18.92
5’ and 3’ reference supported (gene)          254               0.53
5’ reference supported (gene)                 5354              11.21
3’ reference supported (gene)                 10807             22.63
Supported Reference Transcript Model (SRTM)   254               0.53
Reference redundancy level                    2.3               -
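As a worked example, each "Relative value (%)" in the structural-category tables appears to be the absolute count divided by that category's "Number of isoforms", multiplied by 100. The minimal Python sketch below, assuming that definition, recomputes the ISM column from the counts reported above; the helper `relative_pct` and the dictionary keys are illustrative names, not part of any LRGASP tool.

```python
# Recompute the "Relative value (%)" column of the ISM table from its absolute counts.
# Assumption: relative value = 100 * count / number of ISM isoforms, rounded to 2 decimals.


def relative_pct(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


ISM_TOTAL = 47748  # "Number of isoforms" in the ISM table

# Absolute counts copied from the ISM table (illustrative keys).
ism_counts = {
    "5' reference supported (transcript)": 4341,
    "3' reference supported (transcript)": 9033,
    "5' and 3' reference supported (gene)": 254,
    "5' reference supported (gene)": 5354,
    "3' reference supported (gene)": 10807,
    "Supported Reference Transcript Model (SRTM)": 254,
}

ism_relative = {name: relative_pct(n, ISM_TOTAL) for name, n in ism_counts.items()}

for name, pct in ism_relative.items():
    print(f"{name:45s} {pct:6.2f}")
```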

Evaluation of NIC (Novel In Catalog)

Metric                                        Absolute value    Relative value (%)
Number of isoforms                            2198              -
5’ and 3’ reference supported (gene)          255               11.60
5’ reference supported (gene)                 337               15.33
3’ reference supported (gene)                 1836              83.53
Intron retention incidence                    50                2.27

Evaluation of NNC (Novel Not in Catalog)

Metric                                        Absolute value    Relative value (%)
Number of isoforms                            19044             -
5’ and 3’ reference supported (gene)          5933              31.15
5’ reference supported (gene)                 7607              39.94
3’ reference supported (gene)                 10032             52.68
Non-canonical SJ incidence                    0                 0.00
Full Illumina SJ support                      19044             100.00
RT-switching incidence                        449               2.36
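The four structural categories evaluated above can be cross-checked against the global overview: in this report, the FSM, ISM, NIC and NNC counts add up exactly to "Number of transcripts associated to a known gene", and the remaining detected transcripts are the ones not associated to a known gene. The short Python check below uses only counts from this report (variable names are illustrative) and also prints each category's share of all detected transcripts.

```python
# Check that the four structural categories evaluated above add up to the
# "Number of transcripts associated to a known gene" in the global overview.
# All counts are copied from this report.

category_counts = {"FSM": 10864, "ISM": 47748, "NIC": 2198, "NNC": 19044}

TRANSCRIPTS_DETECTED = 84043     # "Number of transcripts detected"
TRANSCRIPTS_KNOWN_GENE = 79854   # "Number of transcripts associated to a known gene"

known_gene_total = sum(category_counts.values())
not_known_gene = TRANSCRIPTS_DETECTED - known_gene_total

print(f"FSM + ISM + NIC + NNC = {known_gene_total}")  # prints 79854
print(f"Detected transcripts outside these categories = {not_known_gene}")  # prints 4189

# Share of each category among all detected transcripts.
for name, n in category_counts.items():
    share = 100 * n / TRANSCRIPTS_DETECTED
    print(f"{name}: {share:.2f}% of all detected transcripts")
```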

Evaluation of Simulation

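Simulated transcripts were grouped into ground-truth sets according to different thresholds and attributes, and metrics were calculated separately for each set. These ground-truth sets are:

  1. all simulated transcripts
  2. all GENCODE simulation
  3. only GENCODE transcripts simulated
  4. only GENCODE transcripts simulated with TPM >= 5
  5. novelty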

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Metric                                                    Value
Number of isoforms simulated                              48192
True Positive detections (TP)                             5593
Number of transcripts associated to TP (Reference Match)  5594
Partial True Positive detections (PTP)                    22486
Number of transcripts associated to PTP                   48436
False Negative (FN)                                       20638
False Positive (FP)                                       30013
Sensitivity                                               0.12
Precision                                                 0.07
Non-Redundant Precision                                   0.07
Positive Detection Rate                                   0.57
False Discovery Rate                                      0.62
False Detection Rate                                      0.36
Redundancy                                                1.96
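The report lists these metrics without their formulas. The relationships in the Python sketch below are reconstructed from the reported counts, not taken from the LRGASP evaluation code, so treat them as an assumption: with them, FP, Sensitivity, False Detection Rate, Positive Detection Rate and Redundancy match the values in the table above. Precision, Non-Redundant Precision and False Discovery Rate are left out because their definitions cannot be pinned down from these numbers alone. The `SimCounts` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SimCounts:
    """Raw counts from one simulation table (hypothetical container)."""
    simulated: int  # Number of isoforms simulated
    tp: int         # True Positive detections (TP)
    tp_assoc: int   # Number of transcripts associated to TP (Reference Match)
    ptp: int        # Partial True Positive detections (PTP); kept for reference, unused below
    ptp_assoc: int  # Number of transcripts associated to PTP
    fn: int         # False Negative (FN)
    detected: int   # Number of transcripts detected (global overview)

    # All formulas below are reconstructed from this report's numbers (assumption).
    def fp(self) -> int:
        # Detected transcripts associated to neither a TP nor a PTP reference.
        return self.detected - self.tp_assoc - self.ptp_assoc

    def sensitivity(self) -> float:
        return self.tp / self.simulated

    def false_detection_rate(self) -> float:
        return self.fp() / self.detected

    def positive_detection_rate(self) -> float:
        # Fraction of simulated references that are not false negatives.
        return (self.simulated - self.fn) / self.simulated

    def redundancy(self) -> float:
        # Matched detected transcripts per recovered (non-FN) reference.
        return (self.tp_assoc + self.ptp_assoc) / (self.simulated - self.fn)


all_sim = SimCounts(simulated=48192, tp=5593, tp_assoc=5594, ptp=22486,
                    ptp_assoc=48436, fn=20638, detected=84043)

print("FP", all_sim.fp())
print("Sensitivity", round(all_sim.sensitivity(), 2))
print("False Detection Rate", round(all_sim.false_detection_rate(), 2))
print("Positive Detection Rate", round(all_sim.positive_detection_rate(), 2))
print("Redundancy", round(all_sim.redundancy(), 2))
```

Plugging in the counts of the other simulation tables in this section, with the same 84043 detected transcripts, these formulas also reproduce their reported FP, Sensitivity, Positive Detection Rate and Redundancy.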

Evaluation of all GENCODE simulation

Metric                                                    Value
Number of isoforms simulated                              41964
True Positive detections (TP)                             4435
Number of transcripts associated to TP (Reference Match)  4436
Partial True Positive detections (PTP)                    18592
Number of transcripts associated to PTP                   37525
False Negative (FN)                                       19315
False Positive (FP)                                       42082
Sensitivity                                               0.11
Positive Detection Rate                                   0.54
Redundancy                                                1.85

Evaluation of only GENCODE transcripts simulated

Metric                                                    Value
Number of isoforms simulated                              28414
True Positive detections (TP)                             3156
Number of transcripts associated to TP (Reference Match)  3157
Partial True Positive detections (PTP)                    15657
Number of transcripts associated to PTP                   32810
False Negative (FN)                                       9956
False Positive (FP)                                       48076
Sensitivity                                               0.11
Positive Detection Rate                                   0.65
Redundancy                                                1.95

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Metric                                                    Value
Number of isoforms simulated                              16196
True Positive detections (TP)                             1991
Number of transcripts associated to TP (Reference Match)  1992
Partial True Positive detections (PTP)                    10816
Number of transcripts associated to PTP                   24584
False Negative (FN)                                       3688
False Positive (FP)                                       57467
Sensitivity                                               0.12
Positive Detection Rate                                   0.77
Redundancy                                                2.12

Evaluation of novelty

Metric                                                    Value
Number of isoforms simulated                              6228
True Positive detections (TP)                             1158
Number of transcripts associated to TP (Reference Match)  1158
Partial True Positive detections (PTP)                    3894
Number of transcripts associated to PTP                   10911
False Negative (FN)                                       1323
False Positive (FP)                                       71974
Sensitivity                                               0.19
Positive Detection Rate                                   0.79
Redundancy                                                2.46
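The novelty set is exactly the difference between all simulated transcripts and the GENCODE simulation (48192 − 41964 = 6228), which suggests, as an assumption, that it holds the simulated non-GENCODE transcripts. Consistent with that, the Python check below, using only counts from this report (variable names are illustrative), shows that every count except FP in the "all simulated transcripts" table is the sum of the corresponding GENCODE-simulation and novelty counts. FP is not additive because each subset's FP also includes detected transcripts that match the other subset's references.

```python
# Assumption: "all simulated" = "all GENCODE simulation" + "novelty", as disjoint subsets.
FIELDS = ("simulated", "tp", "tp_assoc", "ptp", "ptp_assoc", "fn", "fp")

all_sim = dict(zip(FIELDS, (48192, 5593, 5594, 22486, 48436, 20638, 30013)))
gencode = dict(zip(FIELDS, (41964, 4435, 4436, 18592, 37525, 19315, 42082)))
novel = dict(zip(FIELDS, (6228, 1158, 1158, 3894, 10911, 1323, 71974)))

# Every count except FP is additive across the two subsets.
for field in FIELDS[:-1]:
    assert gencode[field] + novel[field] == all_sim[field], field

# FP is not additive: each subset's FP also counts detected transcripts
# that are associated to the *other* subset's references.
novel_matches = novel["tp_assoc"] + novel["ptp_assoc"]
gencode_matches = gencode["tp_assoc"] + gencode["ptp_assoc"]
assert gencode["fp"] == all_sim["fp"] + novel_matches
assert novel["fp"] == all_sim["fp"] + gencode_matches

print("all checks passed")  # prints: all checks passed
```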