Your submission: human_simulation_pb on PacBio data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. SIRV Lexogen Set 4
  4. Simulated data

LRGASP uses SQANTI categories to define the features and metrics evaluated in Challenge 1.

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that the criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, to be released after completion of the challenge.

Evaluation of detected transcripts for Challenge 1

Global overview

Value
Number of genes detected 49960
Number of known genes detected 18760
Number of transcripts detected 169226
Number of transcripts associated to a known gene 61450
Number of unique SJ detected 558653
Absolute value Relative value (fraction of unique SJ)
Novel SJ 346813 0.62
Non-canonical SJ 368433 0.66
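
The relative values in the two splice-junction rows appear to be fractions of the unique SJ count rather than percentages. A minimal Python sketch of that check follows; the function name `sj_fraction` is hypothetical, and the choice of denominator is an assumption inferred from the reported numbers, not a stated LRGASP definition.

```python
def sj_fraction(count: int, unique_sj: int) -> float:
    """Fraction of unique splice junctions that fall in a given class."""
    return round(count / unique_sj, 2)


# Counts copied from the global overview table.
UNIQUE_SJ = 558653
sj_counts = {"Novel SJ": 346813, "Non-canonical SJ": 368433}

for label, count in sj_counts.items():
    print(f"{label}: {sj_fraction(count, UNIQUE_SJ)}")
```

Under this assumption both rows reproduce the reported values (0.62 and 0.66).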

Evaluation of FSM

Absolute value Relative value (%)
Number of isoforms 31759 -
Reference Match 30286 95.36
5’ reference supported (transcript) 30594 96.33
3’ reference supported (transcript) 31285 98.51
5’ reference supported (gene) 30934 97.4
3’ reference supported (gene) 31459 99.06
Supported Reference Transcript Model (SRTM) 30706 96.68
Reference redundancy Level 1 -

Evaluation of ISM

Absolute value Relative value (%)
Number of isoforms 1728 -
5’ reference supported (transcript) 144 8.33
3’ reference supported (transcript) 1482 85.76
5’ and 3’ reference supported (gene) 241 13.95
5’ reference supported (gene) 372 21.53
3’ reference supported (gene) 1564 90.51
Supported Reference Transcript Model (SRTM) 241 13.95
Reference redundancy Level 1.03 -
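
The relative values in the per-category tables are consistent with each absolute count divided by the number of isoforms in that category. A short sketch using the ISM counts is given here; `relative_value` is a hypothetical helper name, and the formula is inferred from the table rather than taken from an official definition.

```python
def relative_value(absolute: int, n_isoforms: int) -> float:
    """Percentage of the category's isoforms that satisfy a criterion."""
    return round(100 * absolute / n_isoforms, 2)


# Counts copied from the ISM table.
ISM_ISOFORMS = 1728
ism_counts = {
    "5' reference supported (transcript)": 144,
    "3' reference supported (transcript)": 1482,
    "5' and 3' reference supported (gene)": 241,
    "5' reference supported (gene)": 372,
    "3' reference supported (gene)": 1564,
}

for label, count in ism_counts.items():
    print(f"{label}: {relative_value(count, ISM_ISOFORMS)}")
```

With these inputs, 372 of 1728 ISM isoforms corresponds to 21.53 %.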

Evaluation of NIC

Absolute value Relative value (%)
Number of isoforms 457 -
5’ and 3’ reference supported (gene) 181 39.61
5’ reference supported (gene) 194 42.45
3’ reference supported (gene) 234 51.2
Intron retention incidence 121 26.48

Evaluation of NNC

Absolute value Relative value (%)
Number of isoforms 27506 -
5’ and 3’ reference supported (gene) 2273 8.26
5’ reference supported (gene) 3765 13.69
3’ reference supported (gene) 2606 9.47
Non-canonical SJ incidence 26634 96.83
Full Illumina SJ support 27506 100
RT-switching incidence 1944 7.07

Evaluation of Simulation

Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated against each of the resulting ground-truth settings. These sets of ground-truth transcripts are:

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Value
Number of isoforms simulated 48192
True Positive detections (TP) 30284
Number of transcripts associated to TP (Reference Match) 30287
Partial True Positive detections (PTP) 3012
Number of transcripts associated to PTP 3096
False Negative (FN) 16221
False Positive (FP) 135843
Sensitivity 0.63
Precision 0.18
Non Redundant Precision 0.18
Positive Detection Rate 0.66
False Discovery Rate 0.82
False Detection Rate 0.8
Redundancy 1.04
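
The metric definitions are not included in this report, so the sketch below uses formulas inferred from the reported numbers; they reproduce every value in the table above but should be read as assumptions, not as the official LRGASP definitions. The class and function names are hypothetical, and `detected` is the "Number of transcripts detected" from the global overview.

```python
from dataclasses import dataclass


@dataclass
class SimulationCounts:
    simulated: int        # isoforms in the ground-truth set
    tp: int               # simulated isoforms detected as reference match
    tp_transcripts: int   # submitted transcripts associated to TP
    ptp_transcripts: int  # submitted transcripts associated to PTP
    fn: int               # simulated isoforms not detected
    fp: int               # submitted transcripts matching no simulated isoform
    detected: int         # all submitted transcripts


def simulation_metrics(c: SimulationCounts) -> dict[str, float]:
    """Inferred formulas; each ratio is rounded to two decimals."""
    found = c.simulated - c.fn  # simulated isoforms with a TP or PTP hit
    ratios = {
        "sensitivity": c.tp / c.simulated,
        "precision": c.tp_transcripts / c.detected,
        "non_redundant_precision": c.tp / c.detected,
        "positive_detection_rate": found / c.simulated,
        "false_discovery_rate": (c.detected - c.tp_transcripts) / c.detected,
        "false_detection_rate": c.fp / c.detected,
        "redundancy": (c.tp_transcripts + c.ptp_transcripts) / found,
    }
    return {name: round(value, 2) for name, value in ratios.items()}


all_simulated = SimulationCounts(
    simulated=48192, tp=30284, tp_transcripts=30287, ptp_transcripts=3096,
    fn=16221, fp=135843, detected=169226,
)
metrics = simulation_metrics(all_simulated)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Note that under these formulas the Positive Detection Rate is derived from the FN count rather than from TP + PTP, which is the only reading that matches the reported 0.66.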

Evaluation of all GENCODE simulation

Value
Number of isoforms simulated 41964
True Positive detections (TP) 25160
Number of transcripts associated to TP (Reference Match) 25160
Partial True Positive detections (PTP) 2552
Number of transcripts associated to PTP 2626
False Negative (FN) 15266
False Positive (FP) 141440
Sensitivity 0.6
Positive Detection Rate 0.64
Redundancy 1.04

Evaluation of only GENCODE transcripts simulated

Value
Number of isoforms simulated 28414
True Positive detections (TP) 21624
Number of transcripts associated to TP (Reference Match) 21624
Partial True Positive detections (PTP) 1876
Number of transcripts associated to PTP 1942
False Negative (FN) 5900
False Positive (FP) 145660
Sensitivity 0.76
Positive Detection Rate 0.79
Redundancy 1.05

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Value
Number of isoforms simulated 16196
True Positive detections (TP) 14209
Number of transcripts associated to TP (Reference Match) 14209
Partial True Positive detections (PTP) 1080
Number of transcripts associated to PTP 1135
False Negative (FN) 1718
False Positive (FP) 153882
Sensitivity 0.88
Positive Detection Rate 0.89
Redundancy 1.06

Evaluation of novelty

Value
Number of isoforms simulated 6228
True Positive detections (TP) 5124
Number of transcripts associated to TP (Reference Match) 5127
Partial True Positive detections (PTP) 460
Number of transcripts associated to PTP 470
False Negative (FN) 955
False Positive (FP) 163629
Sensitivity 0.82
Positive Detection Rate 0.85
Redundancy 1.06
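
The False Positive count grows as the ground-truth set shrinks. The reported values are consistent with FP being every detected transcript that is associated to neither a TP nor a PTP of the set under evaluation. The sketch below checks that relation for all five sets; the relation itself is an assumption inferred from the numbers in this report, and the variable names are hypothetical.

```python
# Number of transcripts detected, from the global overview.
DETECTED = 169226

# (transcripts associated to TP, transcripts associated to PTP, reported FP)
ground_truth_sets = {
    "all simulated": (30287, 3096, 135843),
    "all GENCODE simulation": (25160, 2626, 141440),
    "only GENCODE": (21624, 1942, 145660),
    "only GENCODE, TPM >= 5": (14209, 1135, 153882),
    "novelty": (5127, 470, 163629),
}


def expected_fp(detected: int, tp_transcripts: int, ptp_transcripts: int) -> int:
    """Detected transcripts left over once TP and PTP matches are removed."""
    return detected - tp_transcripts - ptp_transcripts


for name, (tp_tx, ptp_tx, reported_fp) in ground_truth_sets.items():
    computed = expected_fp(DETECTED, tp_tx, ptp_tx)
    status = "matches" if computed == reported_fp else "differs"
    print(f"{name}: computed FP = {computed}, reported FP = {reported_fp} ({status})")
```

Read this way, a high FP count in a restricted set such as novelty mostly reflects transcripts that belong to other ground-truth sets, so FP values are not directly comparable across sets.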