Your submission: human_simulation_cDNA_ONT on ONT data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. Lexogen SIRV-Set 4
  4. Simulated data

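LRGASP uses SQANTI structural categories (FSM, full splice match; ISM, incomplete splice match; NIC, novel in catalog; NNC, novel not in catalog) to define the features and metrics used to evaluate Challenge 1.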
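The four categories can be illustrated with a deliberately simplified classifier. This is a sketch, not SQANTI's implementation. It assumes each transcript is reduced to its intron chain (donor, acceptor) on a single strand. It ignores mono-exonic transcripts, gene assignment, strand and all subcategories. The function names are hypothetical.

```python
from typing import List, Tuple

# An intron is a (donor, acceptor) coordinate pair on a single strand.
Intron = Tuple[int, int]


def is_consecutive_run(query: List[Intron], ref: List[Intron]) -> bool:
    """True if the query intron chain occurs as a consecutive run within ref."""
    n = len(query)
    return any(ref[i:i + n] == query for i in range(len(ref) - n + 1))


def structural_category(query: List[Intron], references: List[List[Intron]]) -> str:
    """Simplified SQANTI-style category for a multi-exon query transcript."""
    if not query:
        raise ValueError("mono-exonic transcripts are not handled in this sketch")
    # FSM: the intron chain is identical to a reference transcript.
    if any(query == ref for ref in references):
        return "FSM"
    # ISM: fewer junctions that match a consecutive run of a reference chain.
    if any(len(query) < len(ref) and is_consecutive_run(query, ref)
           for ref in references):
        return "ISM"
    known_donors = {donor for ref in references for donor, _ in ref}
    known_acceptors = {acceptor for ref in references for _, acceptor in ref}
    # NIC: every donor and acceptor is annotated; only the combination is new.
    if all(d in known_donors and a in known_acceptors for d, a in query):
        return "NIC"
    # NNC: at least one donor or acceptor site is absent from the reference.
    return "NNC"


reference = [[(100, 200), (300, 400), (500, 600)]]
print(structural_category([(100, 200), (300, 400), (500, 600)], reference))  # FSM
print(structural_category([(300, 400), (500, 600)], reference))              # ISM
print(structural_category([(100, 400), (500, 600)], reference))              # NIC (exon skipping)
print(structural_category([(100, 250), (300, 400)], reference))              # NNC (novel acceptor)
```

The real classification also accounts for intron retention, which appears as a NIC subtype in the NIC table, as well as transcript ends and gene overlap.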

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criteria 1 and 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that your criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, which will be released after the challenge is completed.

Evaluation of detected transcripts for Challenge 1

Global overview

Metric                                              Value
Number of genes detected                            13942
Number of known genes detected                      13842
Number of transcripts detected                      22202
Number of transcripts associated to a known gene    22045
Number of unique SJ detected                        51773

Splice junctions    Absolute value    Relative value (fraction of unique SJ)
Novel SJ            405               0.01
Non-canonical SJ    147               0.00
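The relative values in the splice-junction rows are consistent with fractions of the unique SJ count rather than percentages. For example, 405 / 51773 is about 0.0078, which rounds to the reported 0.01. A minimal check follows. It assumes "Number of unique SJ detected" is the denominator, and the constant names are illustrative.

```python
def relative_value(count: int, total: int) -> float:
    """Fraction of `total` represented by `count`, rounded to two decimals."""
    return round(count / total, 2)


UNIQUE_SJ = 51773  # "Number of unique SJ detected"
NOVEL_SJ = 405
NON_CANONICAL_SJ = 147

print(relative_value(NOVEL_SJ, UNIQUE_SJ))          # 0.01
print(relative_value(NON_CANONICAL_SJ, UNIQUE_SJ))  # 0.0
# Expressed as percentages, the same ratios would read 0.78 and 0.28.
print(round(100 * NOVEL_SJ / UNIQUE_SJ, 2), round(100 * NON_CANONICAL_SJ / UNIQUE_SJ, 2))
```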

Evaluation of FSM

Metric                                          Absolute value    Relative value (%)
Number of isoforms                              16945             -
Reference Match                                 16824             99.29
5’ reference supported (transcript)             16867             99.54
3’ reference supported (transcript)             16857             99.48
5’ reference supported (gene)                   16893             99.69
3’ reference supported (gene)                   16878             99.60
Supported Reference Transcript Model (SRTM)     16856             99.47
Reference redundancy level                      1                 -
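The gene-level support counts are higher than the transcript-level ones (for example, 16893 versus 16867 for 5’ support). This follows from what each end is compared against. The sketch below assumes an end counts as supported when it falls within a fixed window of a reference end. The 50 nt window, the coordinates and the function name are assumptions for illustration, not values from this report.

```python
from typing import Iterable

# Hypothetical tolerance in nucleotides; the report does not state the value used.
WINDOW = 50


def end_supported(query_end: int, reference_ends: Iterable[int], window: int = WINDOW) -> bool:
    """True if query_end lies within `window` nt of any reference end."""
    return any(abs(query_end - ref_end) <= window for ref_end in reference_ends)


# 5' ends (TSS) on the transcript's own strand; coordinates are illustrative.
matched_transcript_tss = 10_000        # the reference transcript the FSM matches
gene_tss = [10_000, 10_400, 12_000]    # TSSs of all transcripts of the same gene
query_tss = 10_380

print(end_supported(query_tss, [matched_transcript_tss]))  # False: 380 nt away
print(end_supported(query_tss, gene_tss))                  # True: 20 nt from 10_400
```

At the transcript level, only the matched reference model counts. At the gene level, any transcript of the gene can supply the supporting end, so the gene-level count can only be equal or higher.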

Evaluation of ISM

Metric                                          Absolute value    Relative value (%)
Number of isoforms                              4158              -
5’ reference supported (transcript)             685               16.47
3’ reference supported (transcript)             1152              27.71
5’ and 3’ reference supported (gene)            107               2.57
5’ reference supported (gene)                   871               20.95
3’ reference supported (gene)                   1409              33.89
Supported Reference Transcript Model (SRTM)     107               2.57
Reference redundancy level                      1.19              -
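The relative values in the FSM, ISM, NIC and NNC tables appear to be each count divided by the category's number of isoforms, times 100. Under that assumed definition, the calculation reproduces every reported ISM percentage and gives 20.95 for the 5’ gene-level row. A minimal sketch follows. The function name is illustrative.

```python
def percent(count: int, total: int) -> float:
    """Count as a percentage of the category total, rounded to two decimals."""
    return round(100 * count / total, 2)


ISM_ISOFORMS = 4158
ism_counts = {
    "5' reference supported (transcript)": 685,
    "3' reference supported (transcript)": 1152,
    "5' and 3' reference supported (gene)": 107,
    "5' reference supported (gene)": 871,
    "3' reference supported (gene)": 1409,
    "Supported Reference Transcript Model (SRTM)": 107,
}

for label, count in ism_counts.items():
    print(f"{label}: {percent(count, ISM_ISOFORMS)}")
```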

Evaluation of NIC

Metric                                          Absolute value    Relative value (%)
Number of isoforms                              706               -
5’ and 3’ reference supported (gene)            86                12.18
5’ reference supported (gene)                   188               26.63
3’ reference supported (gene)                   367               51.98
Intron retention incidence                      415               58.78
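Intron retention is reported for 415 of the 706 NIC isoforms. The sketch below shows one simplified way to flag it. It assumes a retained intron means a query exon that fully covers an annotated intron of the same gene, with inclusive coordinates on the same chromosome and strand. The function and variable names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, same chromosome and strand


def retains_intron(query_exons: List[Interval], ref_introns: List[Interval]) -> bool:
    """True if any query exon spans an entire reference intron."""
    return any(
        ex_start <= in_start and in_end <= ex_end
        for ex_start, ex_end in query_exons
        for in_start, in_end in ref_introns
    )


ref_introns = [(201, 299), (401, 499)]
print(retains_intron([(100, 400), (500, 600)], ref_introns))              # True: 100-400 spans 201-299
print(retains_intron([(100, 200), (300, 400), (500, 600)], ref_introns))  # False
```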

Evaluation of NNC

Metric                                          Absolute value    Relative value (%)
Number of isoforms                              236               -
5’ and 3’ reference supported (gene)            57                24.15
5’ reference supported (gene)                   126               53.39
3’ reference supported (gene)                   103               43.64
Non-canonical SJ incidence                      0                 0
Full Illumina SJ support                        236               100
RT-switching incidence                          15                6.36
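The NNC table reports no non-canonical junctions. The following is a sketch of the dinucleotide test. It assumes GT-AG, GC-AG and AT-AC count as canonical (the set commonly used by SQANTI). It also assumes 0-based, half-open intron coordinates on the forward-strand genome sequence, with reverse complementation for minus-strand introns. The function names are illustrative.

```python
# Donor/acceptor dinucleotide pairs treated as canonical in this sketch.
CANONICAL = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def is_canonical(genome: str, start: int, end: int, strand: str = "+") -> bool:
    """Check the first and last two bases of the intron genome[start:end]."""
    intron = genome[start:end].upper()
    if strand == "-":
        # Read the intron 5'->3' on the minus strand.
        intron = reverse_complement(intron)
    return (intron[:2], intron[-2:]) in CANONICAL


plus_genome = "AAAA" + "GTAAAAAG" + "CCCC"
minus_genome = "AAAA" + "CTTTTTAC" + "CCCC"  # reverse complement of GTAAAAAG
print(is_canonical(plus_genome, 4, 12))          # True (GT-AG)
print(is_canonical(minus_genome, 4, 12, "-"))    # True (GT-AG on the minus strand)
print(is_canonical("AAAAGAAAAAAGCCCC", 4, 12))   # False (GA-AG)
```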

Evaluation of Simulation

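Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated with respect to each of these ground-truth sets. The sets, each evaluated in its own subsection, are: all simulated transcripts, all GENCODE simulation, only GENCODE transcripts simulated, only GENCODE transcripts simulated with TPM >= 5, and novel transcripts.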
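The counts in the simulation tables are consistent with nested sets. For example, the "all GENCODE simulation" set (41964) equals all simulated transcripts (48192) minus the novelty set (6228). A minimal sketch of deriving such sets follows. It assumes each simulated transcript carries a hypothetical novelty flag and a simulated TPM. The report does not state what separates "only GENCODE transcripts simulated" from "all GENCODE simulation", so that split is not modeled. The TPM filter is applied to the simplified GENCODE set purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimulatedTranscript:
    transcript_id: str
    novel: bool  # hypothetical flag: True if the model is absent from GENCODE
    tpm: float   # hypothetical field: simulated expression in TPM


def gencode_set(truth: List[SimulatedTranscript]) -> List[SimulatedTranscript]:
    return [t for t in truth if not t.novel]


def novel_set(truth: List[SimulatedTranscript]) -> List[SimulatedTranscript]:
    return [t for t in truth if t.novel]


def min_tpm_set(truth: List[SimulatedTranscript], min_tpm: float = 5.0) -> List[SimulatedTranscript]:
    return [t for t in truth if t.tpm >= min_tpm]


truth = [
    SimulatedTranscript("tx1", novel=False, tpm=12.0),
    SimulatedTranscript("tx2", novel=False, tpm=1.5),
    SimulatedTranscript("tx3", novel=True, tpm=8.0),
]
print([t.transcript_id for t in gencode_set(truth)])               # ['tx1', 'tx2']
print([t.transcript_id for t in min_tpm_set(gencode_set(truth))])  # ['tx1']
print([t.transcript_id for t in novel_set(truth)])                 # ['tx3']
```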

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Metric                                                      Value
Number of isoforms simulated                                48192
True Positive detections (TP)                               15133
Number of transcripts associated to TP (Reference Match)    15175
Partial True Positive detections (PTP)                      3129
Number of transcripts associated to PTP                     3714
False Negative (FN)                                         30693
False Positive (FP)                                         3313
Sensitivity                                                 0.31
Precision                                                   0.68
Non Redundant Precision                                     0.68
Positive Detection Rate                                     0.36
False Discovery Rate                                        0.29
False Detection Rate                                        0.15
Redundancy                                                  1.08
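Several of the ratios above can be reproduced from the counts if the number of transcripts detected (22202, from the global overview) is used as the denominator for the precision-type metrics. The definitions below are inferred from that agreement, not taken from LRGASP documentation. Positive Detection Rate, False Discovery Rate and Redundancy are left out because their definitions are not given here.

```python
def detection_metrics(tp: int, tp_transcripts: int, fp: int, detected: int) -> dict:
    """Precision-type ratios over all detected transcripts (inferred definitions)."""
    return {
        "precision": tp_transcripts / detected,    # detected transcripts matching a simulated model
        "non_redundant_precision": tp / detected,  # counts each matched simulated model once
        "false_detection_rate": fp / detected,
    }


metrics = detection_metrics(tp=15133, tp_transcripts=15175, fp=3313, detected=22202)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# precision: 0.68
# non_redundant_precision: 0.68
# false_detection_rate: 0.15
```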

Evaluation of all GENCODE simulation

Metric                                                      Value
Number of isoforms simulated                                41964
True Positive detections (TP)                               15016
Number of transcripts associated to TP (Reference Match)    15058
Partial True Positive detections (PTP)                      2575
Number of transcripts associated to PTP                     3066
False Negative (FN)                                         25127
False Positive (FP)                                         4078
Sensitivity                                                 0.36
Positive Detection Rate                                     0.4
Redundancy                                                  1.08

Evaluation of only GENCODE transcripts simulated

Metric                                                      Value
Number of isoforms simulated                                28414
True Positive detections (TP)                               10122
Number of transcripts associated to TP (Reference Match)    10159
Partial True Positive detections (PTP)                      2020
Number of transcripts associated to PTP                     2431
False Negative (FN)                                         16850
False Positive (FP)                                         9612
Sensitivity                                                 0.36
Positive Detection Rate                                     0.41
Redundancy                                                  1.09

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Metric                                                      Value
Number of isoforms simulated                                16196
True Positive detections (TP)                               5898
Number of transcripts associated to TP (Reference Match)    5925
Partial True Positive detections (PTP)                      1340
Number of transcripts associated to PTP                     1630
False Negative (FN)                                         9346
False Positive (FP)                                         14647
Sensitivity                                                 0.36
Positive Detection Rate                                     0.42
Redundancy                                                  1.1

Evaluation of novelty

Metric                                                      Value
Number of isoforms simulated                                6228
True Positive detections (TP)                               117
Number of transcripts associated to TP (Reference Match)    117
Partial True Positive detections (PTP)                      554
Number of transcripts associated to PTP                     648
False Negative (FN)                                         5566
False Positive (FP)                                         21437
Sensitivity                                                 0.02
Positive Detection Rate                                     0.11
Redundancy                                                  1.16
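Across the ground-truth sets, the reported sensitivity is consistent with TP divided by the number of isoforms simulated in the set. This minimal sketch assumes that definition. It reproduces the reported values from the tables and makes the gap for novel transcripts explicit. The dictionary and function names are illustrative.

```python
# (isoforms simulated, true positive detections) per ground-truth set, from the tables.
SETS = {
    "all simulated": (48192, 15133),
    "all GENCODE simulation": (41964, 15016),
    "only GENCODE transcripts": (28414, 10122),
    "only GENCODE, TPM >= 5": (16196, 5898),
    "novelty": (6228, 117),
}


def sensitivity(simulated: int, tp: int) -> float:
    """Fraction of simulated isoforms recovered as TP, rounded to two decimals."""
    return round(tp / simulated, 2)


for name, (simulated, tp) in SETS.items():
    print(f"{name:<28}{sensitivity(simulated, tp)}")
```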