Your submission: mouse_simulation_cDNA_PacBio on PacBio data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. SIRV Lexogen Set 4
  4. Simulated data.

The LRGASP uses SQANTI categories to define evaluating features and metrics for Challenge 1.

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that your criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, to be released after completion of the challenge.

Evaluation of detected transcripts for Challenge 1

Global overview

Value
Number of genes detected 13618
Number of known genes detected 13468
Number of transcripts detected 23306
Number of transcripts associated to a known gene 23129
Number of unique SJ detected 137208

Absolute value Relative value (%)
Novel SJ 526 0
Non-canonical SJ 233 0

Evaluation of FSM

Absolute value Relative value (%)
Number of isoforms 21699 -
Reference Match 21580 99.45
5’ reference supported (transcript) 21643 99.74
3’ reference supported (transcript) 21601 99.55
5’ reference supported (gene) 21662 99.83
3’ reference supported (gene) 21624 99.65
Supported Reference Transcript Model (SRTM) 21607 99.58
Reference redundancy Level 1 -
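
The relative values in the FSM table are consistent with dividing each absolute count by the number of FSM isoforms and expressing the result as a percentage rounded to two decimals. The report does not state this formula, so the short sketch below is only an assumed reading that happens to reproduce the table; the function and variable names are illustrative.

```python
# Counts copied from the FSM table above.
fsm_isoforms = 21699
fsm_counts = {
    "Reference Match": 21580,
    "5' reference supported (transcript)": 21643,
    "3' reference supported (transcript)": 21601,
    "5' reference supported (gene)": 21662,
    "3' reference supported (gene)": 21624,
    "Supported Reference Transcript Model (SRTM)": 21607,
}


def relative_value(count: int, total: int) -> float:
    """Assumed definition: percentage of isoforms in the category, 2 decimals."""
    return round(100.0 * count / total, 2)


fsm_relative = {
    label: relative_value(count, fsm_isoforms)
    for label, count in fsm_counts.items()
}

for label, value in fsm_relative.items():
    print(f"{label}: {value}")
```

The same calculation can be applied to the ISM, NIC and NNC tables by swapping in the corresponding number of isoforms as the denominator.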

Evaluation of ISM

Absolute value Relative value (%)
Number of isoforms 945 -
5’ reference supported (transcript) 114 12.06
3’ reference supported (transcript) 748 79.15
5’ and 3’ reference supported (gene) 138 14.6
5’ reference supported (gene) 242 25.61
3’ reference supported (gene) 822 86.98
Supported Reference Transcript Model (SRTM) 138 14.6
Reference redundancy Level 1.02 -

Evaluation of NIC

Absolute value Relative value (%)
Number of isoforms 256 -
5’ and 3’ reference supported (gene) 239 93.36
5’ reference supported (gene) 250 97.66
3’ reference supported (gene) 244 95.31
Intron retention incidence 30 11.72

Evaluation of NNC

Absolute value Relative value (%)
Number of isoforms 229 -
5’ and 3’ reference supported (gene) 192 83.84
5’ reference supported (gene) 204 89.08
3’ reference supported (gene) 199 86.9
Non-canonical SJ incidence 90 39.3
Full Illumina SJ support 229 100
RT-switching incidence 22 9.61

Evaluation of Simulation

Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated against each of these ground-truth settings. These sets of ground-truth transcripts are:

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Value
Number of isoforms simulated 27152
True Positive detections (TP) 21352
Number of transcripts associated to TP (Reference Match) 21367
Partial True Positive detections (PTP) 976
Number of transcripts associated to PTP 993
False Negative (FN) 5496
False Positive (FP) 946
Sensitivity 0.79
Precision 0.92
Non Redundant Precision 0.92
Positive Detection Rate 0.8
False Discovery Rate 0.08
False Detection Rate 0.04
Redundancy 1.03
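
The metric definitions are not reproduced in this report, so the sketch below shows one set of formulas that is consistent with the rounded values in the table above. These formulas are assumptions inferred from the reported counts, not definitions taken from the report. The number of detected transcripts (23306) comes from the Global overview, and "detected references" is assumed to mean simulated isoforms that are not false negatives. All names in the code are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SimCounts:
    simulated: int        # number of isoforms simulated (P)
    tp: int               # True Positive detections
    tp_transcripts: int   # transcripts associated to TP (Reference Match)
    ptp: int              # Partial True Positive detections
    ptp_transcripts: int  # transcripts associated to PTP
    fn: int               # False Negative
    fp: int               # False Positive
    detected: int         # transcripts detected (N), from Global overview


def metrics(c: SimCounts) -> dict:
    """Assumed formulas; they reproduce the table only up to rounding."""
    detected_refs = c.simulated - c.fn  # references hit by a TP or a PTP
    values = {
        "Sensitivity": c.tp / c.simulated,
        "Precision": c.tp_transcripts / c.detected,
        "Non Redundant Precision": c.tp / c.detected,
        "Positive Detection Rate": detected_refs / c.simulated,
        "False Discovery Rate": (c.detected - c.tp_transcripts) / c.detected,
        "False Detection Rate": c.fp / c.detected,
        "Redundancy": (c.tp_transcripts + c.ptp_transcripts) / detected_refs,
    }
    return {name: round(value, 2) for name, value in values.items()}


# Counts copied from the table of all simulated transcripts.
all_simulated = SimCounts(
    simulated=27152, tp=21352, tp_transcripts=21367,
    ptp=976, ptp_transcripts=993, fn=5496, fp=946, detected=23306,
)
all_metrics = metrics(all_simulated)

for name, value in all_metrics.items():
    print(f"{name}: {value}")
```

Under these assumptions, Sensitivity counts only exact matches, whereas the Positive Detection Rate also credits references recovered partially, which is why the two can differ (0.79 against 0.8 here).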

Evaluation of all GENCODE simulation

Value
Number of isoforms simulated 20152
True Positive detections (TP) 18956
Number of transcripts associated to TP (Reference Match) 18971
Partial True Positive detections (PTP) 677
Number of transcripts associated to PTP 690
False Negative (FN) 1123
False Positive (FP) 3645
Sensitivity 0.94
Positive Detection Rate 0.94
Redundancy 1.03

Evaluation of only GENCODE transcripts simulated

Value
Number of isoforms simulated 15608
True Positive detections (TP) 14772
Number of transcripts associated to TP (Reference Match) 14783
Partial True Positive detections (PTP) 598
Number of transcripts associated to PTP 607
False Negative (FN) 790
False Positive (FP) 7916
Sensitivity 0.95
Positive Detection Rate 0.95
Redundancy 1.04

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Value
Number of isoforms simulated 9428
True Positive detections (TP) 9044
Number of transcripts associated to TP (Reference Match) 9052
Partial True Positive detections (PTP) 307
Number of transcripts associated to PTP 311
False Negative (FN) 366
False Positive (FP) 13943
Sensitivity 0.96
Positive Detection Rate 0.96
Redundancy 1.03

Evaluation of novelty

Value
Number of isoforms simulated 7000
True Positive detections (TP) 2396
Number of transcripts associated to TP (Reference Match) 2396
Partial True Positive detections (PTP) 299
Number of transcripts associated to PTP 303
False Negative (FN) 4373
False Positive (FP) 20607
Sensitivity 0.34
Positive Detection Rate 0.38
Redundancy 1.03
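
The False Positive counts grow as the ground-truth set shrinks (946 for all simulated transcripts, 20607 for the novel set). The numbers in the five tables are consistent with FP being every detected transcript that is not associated to a TP or PTP of that particular set. This relation is inferred from the reported counts and is not stated in the report; the sketch below simply checks it, using the 23306 detected transcripts from the Global overview. Names in the code are illustrative.

```python
# Total transcripts detected, from the Global overview.
DETECTED = 23306

# (transcripts associated to TP, transcripts associated to PTP, reported FP)
ground_truth_sets = {
    "all simulated transcripts": (21367, 993, 946),
    "all GENCODE simulation": (18971, 690, 3645),
    "only GENCODE transcripts": (14783, 607, 7916),
    "only GENCODE transcripts, TPM >= 5": (9052, 311, 13943),
    "novelty": (2396, 303, 20607),
}


def expected_fp(detected: int, tp_transcripts: int, ptp_transcripts: int) -> int:
    """Assumed relation: detected transcripts not matching this set."""
    return detected - tp_transcripts - ptp_transcripts


fp_check = {
    name: expected_fp(DETECTED, tp_tx, ptp_tx) == reported_fp
    for name, (tp_tx, ptp_tx, reported_fp) in ground_truth_sets.items()
}

for name, ok in fp_check.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

If this reading is right, a large FP value in a subset table mostly reflects correct detections that belong to a different ground-truth set, which may be why precision-type metrics are reported only for the full set of simulated transcripts.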