Your submission: human_sim_cdna_pacbio_ls on PacBio+Illumina data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. SIRV Lexogen Set 4
  4. Simulated data

LRGASP uses SQANTI categories to define the features and metrics evaluated in Challenge 1.

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that the criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, to be released after completion of the challenge.

Evaluation of detected transcripts for Challenge 1

Global overview

Metric                                            Value
Number of genes detected                          20213
Number of known genes detected                    20194
Number of transcripts detected                    46134
Number of transcripts associated to a known gene  46112
Number of unique SJ detected                      187191

Metric                                            Absolute value  Relative value (%)
Novel SJ                                          439             0.00
Non-canonical SJ                                  1083            0.01

Evaluation of FSM

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           45318           -
Reference Match                              45119           99.56
5’ reference supported (transcript)          45174           99.68
3’ reference supported (transcript)          45225           99.79
5’ reference supported (gene)                45233           99.81
3’ reference supported (gene)                45273           99.9
Supported Reference Transcript Model (SRTM)  45197           99.73
Reference redundancy Level                   1               -

Evaluation of ISM

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           301             -
5’ reference supported (transcript)          37              12.29
3’ reference supported (transcript)          207             68.77
5’ and 3’ reference supported (gene)         51              16.94
5’ reference supported (gene)                90              29.9
3’ reference supported (gene)                240             79.73
Supported Reference Transcript Model (SRTM)  51              16.94
Reference redundancy Level                   1               -
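
The relative values in the per-category tables appear to be the absolute count divided by the number of isoforms in that category, expressed as a percentage. The short Python sketch below checks that reading against four ISM rows. The function name `relative_value` is hypothetical and the denominator is an assumption; neither comes from the LRGASP evaluation code.

```python
def relative_value(count: int, n_isoforms: int) -> float:
    """Percentage of isoforms in a category that show a feature.

    Assumed definition: 100 * count / number of isoforms in the category,
    rounded to two decimals as in the report tables.
    """
    if n_isoforms <= 0:
        raise ValueError("n_isoforms must be positive")
    return round(100.0 * count / n_isoforms, 2)


# Absolute values copied from the ISM table (301 ISM isoforms in total).
ism_total = 301
ism_counts = {
    "5' reference supported (transcript)": 37,
    "3' reference supported (transcript)": 207,
    "3' reference supported (gene)": 240,
    "Supported Reference Transcript Model (SRTM)": 51,
}

ism_relative = {name: relative_value(c, ism_total) for name, c in ism_counts.items()}
for name, pct in ism_relative.items():
    print(f"{name}: {pct}")
```

Under this assumption the four rows come out as 12.29, 68.77, 79.73 and 16.94, matching the table.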

Evaluation of NIC

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           180             -
5’ and 3’ reference supported (gene)         130             72.22
5’ reference supported (gene)                147             81.67
3’ reference supported (gene)                159             88.33
Intron retention incidence                   26              14.44

Evaluation of NNC

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           313             -
5’ and 3’ reference supported (gene)         188             60.06
5’ reference supported (gene)                224             71.57
3’ reference supported (gene)                255             81.47
Non-canonical SJ incidence                   5               1.6
Full Illumina SJ support                     313             100
RT-switching incidence                       9               2.88

Evaluation of Simulation

Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated against each of these ground-truth settings. Each set of ground-truth transcripts is evaluated in its own table below.

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Metric                                                    Value
Number of isoforms simulated                              48192
True Positive detections (TP)                             44402
Number of transcripts associated to TP (Reference Match)  44474
Partial True Positive detections (PTP)                    462
Number of transcripts associated to PTP                   467
False Negative (FN)                                       3600
False Positive (FP)                                       1193
Sensitivity                                               0.92
Precision                                                 0.96
Non Redundant Precision                                   0.96
Positive Detection Rate                                   0.93
False Discovery Rate                                      0.04
False Detection Rate                                      0.03
Redundancy                                                1.01
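
The rate metrics above can be recomputed from the counts in this report. The Python sketch below is illustrative only: the formulas are assumptions (this section does not give the definitions) and the names `SimCounts` and `simulation_metrics` are hypothetical. It takes the number of simulated isoforms as the denominator for Sensitivity and Positive Detection Rate, and the number of transcripts detected (46134, from the Global overview) as the denominator for the precision and false-rate metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimCounts:
    """Counts copied from the tables in this report."""
    simulated: int       # Number of isoforms simulated
    tp: int              # True Positive detections (TP)
    tp_transcripts: int  # Number of transcripts associated to TP (Reference Match)
    ptp: int             # Partial True Positive detections (PTP)
    fp: int              # False Positive (FP)
    detected: int        # Number of transcripts detected (Global overview)


def simulation_metrics(c: SimCounts) -> dict[str, float]:
    """Assumed metric definitions; not taken from the LRGASP evaluation code."""
    return {
        "Sensitivity": c.tp / c.simulated,
        "Precision": c.tp_transcripts / c.detected,
        "Non Redundant Precision": c.tp / c.detected,
        "Positive Detection Rate": (c.tp + c.ptp) / c.simulated,
        "False Discovery Rate": (c.detected - c.tp_transcripts) / c.detected,
        "False Detection Rate": c.fp / c.detected,
    }


all_simulated = SimCounts(
    simulated=48192, tp=44402, tp_transcripts=44474, ptp=462, fp=1193, detected=46134
)
rounded = {name: round(value, 2) for name, value in simulation_metrics(all_simulated).items()}
for name, value in rounded.items():
    print(f"{name}: {value}")
```

Under these assumptions the rounded values agree with the table for the full simulated set. Redundancy is left out because its definition is not given in this section.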

Evaluation of all GENCODE simulation

Metric                                                    Value
Number of isoforms simulated                              41964
True Positive detections (TP)                             39821
Number of transcripts associated to TP (Reference Match)  39893
Partial True Positive detections (PTP)                    292
Number of transcripts associated to PTP                   292
False Negative (FN)                                       2103
False Positive (FP)                                       5949
Sensitivity                                               0.95
Positive Detection Rate                                   0.95
Redundancy                                                1.01

Evaluation of only GENCODE transcripts simulated

Metric                                                    Value
Number of isoforms simulated                              28414
True Positive detections (TP)                             28301
Number of transcripts associated to TP (Reference Match)  28372
Partial True Positive detections (PTP)                    244
Number of transcripts associated to PTP                   244
False Negative (FN)                                       109
False Positive (FP)                                       17518
Sensitivity                                               1
Positive Detection Rate                                   1
Redundancy                                                1.01

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Metric                                                    Value
Number of isoforms simulated                              16196
True Positive detections (TP)                             16162
Number of transcripts associated to TP (Reference Match)  16209
Partial True Positive detections (PTP)                    110
Number of transcripts associated to PTP                   110
False Negative (FN)                                       34
False Positive (FP)                                       29815
Sensitivity                                               1
Positive Detection Rate                                   1
Redundancy                                                1.01

Evaluation of novelty

Metric                                                    Value
Number of isoforms simulated                              6228
True Positive detections (TP)                             4581
Number of transcripts associated to TP (Reference Match)  4581
Partial True Positive detections (PTP)                    170
Number of transcripts associated to PTP                   175
False Negative (FN)                                       1497
False Positive (FP)                                       41378
Sensitivity                                               0.74
Positive Detection Rate                                   0.76
Redundancy                                                1.01