Your submission: mouse_sim_cdna_pacbio on PacBio data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. SIRV Lexogen Set 4
  4. Simulated data.

LRGASP uses SQANTI categories to define the features and metrics evaluated in Challenge 1.

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that your criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, to be released after completion of the challenge.

Evaluation of detected transcripts for Challenge 1

Global overview

Metric                                            Value
Number of genes detected                          13874
Number of known genes detected                    13843
Number of transcripts detected                    24579
Number of transcripts associated to a known gene  24543
Number of unique SJ detected                      140393

Metric                                            Absolute value  Relative value (%)
Novel SJ                                          367             0
Non-canonical SJ                                  418             0

Evaluation of FSM

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           23465           -
Reference Match                              23232           99.01
5’ reference supported (transcript)          23306           99.32
3’ reference supported (transcript)          23278           99.2
5’ reference supported (gene)                23408           99.76
3’ reference supported (gene)                23420           99.81
Supported Reference Transcript Model (SRTM)  23372           99.6
Reference redundancy Level                   1.01            -
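
The relative values in the FSM table are consistent with each absolute count divided by the number of FSM isoforms. The following is a minimal Python sketch of that arithmetic, assuming this is how the percentages were derived; the function name relative_value is illustrative and is not part of the LRGASP tooling.

```python
def relative_value(absolute: int, total: int) -> float:
    """Return absolute/total as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * absolute / total, 2)


# Counts copied from the FSM table in this report.
FSM_TOTAL = 23465
fsm_counts = {
    "Reference Match": 23232,
    "5' reference supported (transcript)": 23306,
    "3' reference supported (transcript)": 23278,
    "5' reference supported (gene)": 23408,
    "3' reference supported (gene)": 23420,
    "Supported Reference Transcript Model (SRTM)": 23372,
}

# Assumption: every percentage uses the FSM isoform count as denominator.
fsm_relative = {
    name: relative_value(count, FSM_TOTAL) for name, count in fsm_counts.items()
}

for name, pct in fsm_relative.items():
    print(f"{name}: {pct}")
```

The same ratio can be applied to the ISM, NIC and NNC tables by swapping in the corresponding number of isoforms as the denominator.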

Evaluation of ISM

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           489             -
5’ reference supported (transcript)          67              13.7
3’ reference supported (transcript)          193             39.47
5’ and 3’ reference supported (gene)         218             44.58
5’ reference supported (gene)                276             56.44
3’ reference supported (gene)                405             82.82
Supported Reference Transcript Model (SRTM)  218             44.58
Reference redundancy Level                   1.01            -

Evaluation of NIC

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           351             -
5’ and 3’ reference supported (gene)         306             87.18
5’ reference supported (gene)                317             90.31
3’ reference supported (gene)                335             95.44
Intron retention incidence                   13              3.7

Evaluation of NNC

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           238             -
5’ and 3’ reference supported (gene)         138             57.98
5’ reference supported (gene)                170             71.43
3’ reference supported (gene)                186             78.15
Non-canonical SJ incidence                   16              6.72
Full Illumina SJ support                     238             100
RT-switching incidence                       13              5.46

Evaluation of Simulation

Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated with respect to each of these ground-truth settings. These sets of ground-truth transcripts are:

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Metric                                                    Value
Number of isoforms simulated                              27152
True Positive detections (TP)                             22292
Number of transcripts associated to TP (Reference Match)  22341
Partial True Positive detections (PTP)                    468
Number of transcripts associated to PTP                   481
False Negative (FN)                                       4660
False Positive (FP)                                       1757
Sensitivity                                               0.82
Precision                                                 0.91
Non Redundant Precision                                   0.91
Positive Detection Rate                                   0.83
False Discovery Rate                                      0.09
False Detection Rate                                      0.07
Redundancy                                                1.01
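
The metric definitions are not reproduced in this report, so the following Python sketch uses assumed formulas that happen to reproduce the rounded values for all simulated transcripts. It takes the number of transcripts detected (24579) from the global overview as the denominator for the precision-type metrics. The variable names are illustrative, and metrics whose definition could not be matched to the reported values (Positive Detection Rate, Redundancy) are left out.

```python
# Counts copied from this report.
SIMULATED = 27152        # Number of isoforms simulated
DETECTED = 24579         # Number of transcripts detected (global overview)
TP = 22292               # True Positive detections
REFERENCE_MATCH = 22341  # Transcripts associated to TP (Reference Match)
FP = 1757                # False Positive

# Assumed definitions; they are not confirmed by the report itself.
metrics = {
    "Sensitivity": TP / SIMULATED,
    "Precision": REFERENCE_MATCH / DETECTED,
    "Non Redundant Precision": TP / DETECTED,
    "False Discovery Rate": (DETECTED - REFERENCE_MATCH) / DETECTED,
    "False Detection Rate": FP / DETECTED,
}

# Round to two decimals, matching the precision used in the report.
rounded = {name: round(value, 2) for name, value in metrics.items()}

for name, value in rounded.items():
    print(f"{name}: {value}")
```

Under these assumptions, Precision counts every detected transcript that matches a simulated isoform, while Non Redundant Precision counts each simulated isoform only once, which is why the two values can differ when several detected transcripts map to the same isoform.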

Evaluation of all GENCODE simulation

Metric                                                    Value
Number of isoforms simulated                              20152
True Positive detections (TP)                             20057
Number of transcripts associated to TP (Reference Match)  20106
Partial True Positive detections (PTP)                    245
Number of transcripts associated to PTP                   248
False Negative (FN)                                       90
False Positive (FP)                                       4225
Sensitivity                                               1
Positive Detection Rate                                   1
Redundancy                                                1.01

Evaluation of only GENCODE transcripts simulated

Metric                                                    Value
Number of isoforms simulated                              15608
True Positive detections (TP)                             15572
Number of transcripts associated to TP (Reference Match)  15611
Partial True Positive detections (PTP)                    161
Number of transcripts associated to PTP                   164
False Negative (FN)                                       36
False Positive (FP)                                       8804
Sensitivity                                               1
Positive Detection Rate                                   1
Redundancy                                                1.01

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Metric                                                    Value
Number of isoforms simulated                              9428
True Positive detections (TP)                             9416
Number of transcripts associated to TP (Reference Match)  9430
Partial True Positive detections (PTP)                    73
Number of transcripts associated to PTP                   76
False Negative (FN)                                       12
False Positive (FP)                                       15073
Sensitivity                                               1
Positive Detection Rate                                   1
Redundancy                                                1.01

Evaluation of novelty

Metric                                                    Value
Number of isoforms simulated                              7000
True Positive detections (TP)                             2235
Number of transcripts associated to TP (Reference Match)  2235
Partial True Positive detections (PTP)                    223
Number of transcripts associated to PTP                   233
False Negative (FN)                                       4570
False Positive (FP)                                       22111
Sensitivity                                               0.32
Positive Detection Rate                                   0.35
Redundancy                                                1.02
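
A sensitivity of 1 in the GENCODE subsets is a rounded value: each of those subsets still lists false negatives. The sketch below recomputes sensitivity per ground-truth set, assuming it is defined as TP divided by the number of isoforms simulated (an assumption, since the definitions are not included in this report). The dictionary keys are shortened labels for the five evaluation sections.

```python
# (isoforms simulated, true positives) per ground-truth set, from this report.
ground_truth_sets = {
    "all simulated": (27152, 22292),
    "all GENCODE": (20152, 20057),
    "only GENCODE": (15608, 15572),
    "only GENCODE, TPM >= 5": (9428, 9416),
    "novelty": (7000, 2235),
}


def sensitivity(simulated: int, true_positives: int) -> float:
    """Assumed definition: fraction of simulated isoforms recovered as TP."""
    return true_positives / simulated


# Keep both the unrounded value and the two-decimal value shown in the report.
sensitivities = {
    name: sensitivity(simulated, tp)
    for name, (simulated, tp) in ground_truth_sets.items()
}
reported = {name: round(value, 2) for name, value in sensitivities.items()}

for name in ground_truth_sets:
    print(f"{name}: {sensitivities[name]:.4f} (reported as {reported[name]})")
```

With this definition the three GENCODE subsets come out slightly below 1 before rounding, while the novel simulated isoforms are recovered at a much lower rate.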