Your submission: human_sim_cdna_ont on ONT data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. SIRV Lexogen Set 4
  4. Simulated data

LRGASP uses SQANTI categories to define the features and metrics evaluated in Challenge 1.

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that your criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, to be released after completion of the challenge.

Evaluation of detected transcripts for Challenge 1

Global overview

Value
Number of genes detected 16590
Number of known genes detected 15442
Number of transcripts detected 35733
Number of transcripts associated to a known gene 34055
Number of unique SJ detected 77645
Absolute value Relative value (%)
Novel SJ 1355 1.75
Non-canonical SJ 336 0.43

Evaluation of FSM

Absolute value Relative value (%)
Number of isoforms 11869 -
Reference Match 7634 64.32
5’ reference supported (transcript) 10409 87.7
3’ reference supported (transcript) 7822 65.9
5’ reference supported (gene) 10927 92.06
3’ reference supported (gene) 8127 68.47
Supported Reference Transcript Model (SRTM) 7947 66.96
Reference redundancy Level 1.05 -
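
The relative values in the FSM table appear to be each count expressed as a percentage of the number of FSM isoforms. The following is a minimal Python sketch under that assumption; the name `relative_value` is hypothetical and does not come from any LRGASP or SQANTI tool. With the counts copied from the table, it reproduces the reported percentages.

```python
# Assumption: relative value (%) = count / number of FSM isoforms * 100.
FSM_ISOFORMS = 11869

# Absolute values copied from the FSM table.
fsm_counts = {
    "Reference Match": 7634,
    "5' reference supported (transcript)": 10409,
    "3' reference supported (transcript)": 7822,
    "5' reference supported (gene)": 10927,
    "3' reference supported (gene)": 8127,
    "Supported Reference Transcript Model (SRTM)": 7947,
}


def relative_value(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


fsm_relative = {
    name: relative_value(count, FSM_ISOFORMS)
    for name, count in fsm_counts.items()
}

for name, pct in fsm_relative.items():
    print(f"{name}: {pct}")
```

The same calculation can be applied to the ISM, NIC and NNC tables by substituting the corresponding number of isoforms as the denominator.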

Evaluation of ISM

Absolute value Relative value (%)
Number of isoforms 19321 -
5’ reference supported (transcript) 2435 12.6
3’ reference supported (transcript) 1931 9.99
5’ and 3’ reference supported (gene) 125 0.65
5’ reference supported (gene) 3313 17.15
3’ reference supported (gene) 2658 13.76
Supported Reference Transcript Model (SRTM) 125 0.65
Reference redundancy Level 1.44 -

Evaluation of NIC

Absolute value Relative value (%)
Number of isoforms 1771 -
5’ and 3’ reference supported (gene) 130 7.34
5’ reference supported (gene) 457 25.8
3’ reference supported (gene) 396 22.36
Intron retention incidence 57 3.22

Evaluation of NNC

Absolute value Relative value (%)
Number of isoforms 1094 -
5’ and 3’ reference supported (gene) 181 16.54
5’ reference supported (gene) 446 40.77
3’ reference supported (gene) 381 34.83
Non-canonical SJ incidence 8 0.73
Full Illumina SJ support 1094 100
RT-switching incidence 56 5.12

Evaluation of Simulation

Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated with respect to each of these ground truth settings. These sets of ground truth transcripts are:

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Value
Number of isoforms simulated 48192
True Positive detections (TP) 6727
Number of transcripts associated to TP (Reference Match) 6752
Partial True Positive detections (PTP) 15215
Number of transcripts associated to PTP 21503
False Negative (FN) 27025
False Positive (FP) 7478
Sensitivity 0.14
Precision 0.19
Non Redundant Precision 0.19
Positive Detection Rate 0.44
False Discovery Rate 0.64
False Detection Rate 0.21
Redundancy 1.33
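
The metric definitions are not reproduced in this report, so the formulas below are assumptions, chosen because they are consistent with the reported values. This minimal Python sketch treats every detected transcript as belonging to exactly one of three groups (associated to a TP, associated to a PTP, or FP) and derives Precision and False Detection Rate from the counts in the table above. All variable names are illustrative.

```python
# Counts copied from the "all simulated transcripts" table.
tp_transcripts = 6752    # transcripts associated to TP (Reference Match)
ptp_transcripts = 21503  # transcripts associated to PTP
false_positives = 7478   # FP

# Assumption: each detected transcript falls into exactly one of the groups.
detected = tp_transcripts + ptp_transcripts + false_positives

# Assumed definitions, not confirmed by the report itself:
#   Precision            = transcripts associated to TP / detected transcripts
#   False Detection Rate = FP / detected transcripts
precision = tp_transcripts / detected
false_detection_rate = false_positives / detected

print(f"Detected transcripts: {detected}")
print(f"Precision: {precision:.2f}")
print(f"False Detection Rate: {false_detection_rate:.2f}")
```

Under these assumptions the three groups sum to 35733, which equals the number of transcripts detected in the global overview, and the two ratios round to the reported 0.19 and 0.21.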

Evaluation of all GENCODE simulation

Value
Number of isoforms simulated 41964
True Positive detections (TP) 6606
Number of transcripts associated to TP (Reference Match) 6631
Partial True Positive detections (PTP) 13498
Number of transcripts associated to PTP 19039
False Negative (FN) 22626
False Positive (FP) 10063
Sensitivity 0.16
Positive Detection Rate 0.46
Redundancy 1.33

Evaluation of only GENCODE transcripts simulated

Value
Number of isoforms simulated 28414
True Positive detections (TP) 4327
Number of transcripts associated to TP (Reference Match) 4345
Partial True Positive detections (PTP) 11863
Number of transcripts associated to PTP 17132
False Negative (FN) 12859
False Positive (FP) 14256
Sensitivity 0.15
Positive Detection Rate 0.55
Redundancy 1.38

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Value
Number of isoforms simulated 16196
True Positive detections (TP) 2413
Number of transcripts associated to TP (Reference Match) 2424
Partial True Positive detections (PTP) 8487
Number of transcripts associated to PTP 12983
False Negative (FN) 5717
False Positive (FP) 20326
Sensitivity 0.15
Positive Detection Rate 0.65
Redundancy 1.47

Evaluation of novelty

Value
Number of isoforms simulated 6228
True Positive detections (TP) 121
Number of transcripts associated to TP (Reference Match) 121
Partial True Positive detections (PTP) 1717
Number of transcripts associated to PTP 2464
False Negative (FN) 4399
False Positive (FP) 33148
Sensitivity 0.02
Positive Detection Rate 0.29
Redundancy 1.41
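
The Sensitivity values reported for the five ground truth sets are consistent with dividing the True Positive detections by the number of isoforms simulated. The following minimal Python sketch assumes that definition, which is not stated in this report. The dictionary keys are shorthand labels for the tables above, not official set names.

```python
# (TP detections, isoforms simulated) copied from each simulation table.
ground_truth_sets = {
    "all simulated transcripts": (6727, 48192),
    "all GENCODE simulation": (6606, 41964),
    "only GENCODE transcripts": (4327, 28414),
    "only GENCODE transcripts, TPM >= 5": (2413, 16196),
    "novelty": (121, 6228),
}


def sensitivity(true_positives: int, simulated: int) -> float:
    """Assumed definition: fraction of simulated isoforms detected as TP."""
    return true_positives / simulated


sensitivities = {
    name: round(sensitivity(tp, simulated), 2)
    for name, (tp, simulated) in ground_truth_sets.items()
}

for name, value in sensitivities.items():
    print(f"{name}: {value:.2f}")
```

Viewed side by side, sensitivity stays close to 0.15 for the GENCODE-based sets and drops to 0.02 for the novel simulated isoforms.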