Your submission: Simulated_human on PacBio data

Background

Challenge 1 is evaluated according to four criteria:

  1. Broad GENCODE Annotation
  2. Subset of manually curated loci selected by GENCODE
  3. SIRV Lexogen Set 4
  4. Simulated data.

LRGASP uses SQANTI categories to define the features and metrics evaluated in Challenge 1.

LRGASP Challenge 1 Definitions:

This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that your criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, to be released after completion of the challenge.

Evaluation of detected transcripts for Challenge 1

Global overview

Metric                                            Value
Number of genes detected                          21130
Number of known genes detected                    19961
Number of transcripts detected                    42609
Number of transcripts associated to a known gene  41295
Number of unique SJ detected                      182390

Metric            Absolute value  Relative value (%)
Novel SJ          626             0.00
Non-canonical SJ  960             0.01
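
The relative values for splice junctions (SJ) are easier to read with the arithmetic spelled out. The sketch below assumes, since the report does not say so, that both counts are taken relative to the number of unique SJ detected (182390). Under that assumption the reported 0.00 and 0.01 match the fractions rounded to two decimals, not the percentages, which would be about 0.34 and 0.53. The function name `sj_share` is illustrative only.

```python
# Counts copied from the Global overview table.
unique_sj = 182390  # assumed denominator: number of unique SJ detected
counts = {"Novel SJ": 626, "Non-canonical SJ": 960}


def sj_share(count, total):
    """Return (fraction, percentage) of `count` relative to `total`."""
    fraction = count / total
    return fraction, 100 * fraction


shares = {name: sj_share(n, unique_sj) for name, n in counts.items()}

for name, (fraction, percent) in shares.items():
    # Print both forms so they can be compared with the reported column.
    print(f"{name}: fraction={fraction:.2f}, percent={percent:.2f}")
```

If the assumption holds, the "Relative value (%)" column for these two rows should be read as a fraction of all unique SJ.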

Evaluation of FSM

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           40100           -
Reference Match                              39770           99.18
5’ reference supported (transcript)          39826           99.32
3’ reference supported (transcript)          39998           99.75
5’ reference supported (gene)                39921           99.55
3’ reference supported (gene)                40060           99.90
Supported Reference Transcript Model (SRTM)  39894           99.49
Reference redundancy Level                   1               -

Evaluation of ISM

Metric                                       Absolute value  Relative value (%)
Number of isoforms                           507             -
5’ reference supported (transcript)          53              10.45
3’ reference supported (transcript)          442             87.18
5’ and 3’ reference supported (gene)         111             21.89
5’ reference supported (gene)                149             29.39
3’ reference supported (gene)                466             91.91
Supported Reference Transcript Model (SRTM)  111             21.89
Reference redundancy Level                   1               -
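
The relative values in the ISM table are consistent with dividing each absolute count by the number of ISM isoforms (507) and multiplying by 100. The sketch below recomputes them under that assumption; the names `ism_counts` and `relative_value` are illustrative and not part of the evaluation code. For the 5’ reference supported (gene) row, the same rule gives 29.39.

```python
# Absolute counts copied from the ISM table.
ism_isoforms = 507
ism_counts = {
    "5' reference supported (transcript)": 53,
    "3' reference supported (transcript)": 442,
    "5' and 3' reference supported (gene)": 111,
    "5' reference supported (gene)": 149,
    "3' reference supported (gene)": 466,
    "Supported Reference Transcript Model (SRTM)": 111,
}


def relative_value(count, total):
    """Percentage of `count` over `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


ism_relative = {
    name: relative_value(count, ism_isoforms)
    for name, count in ism_counts.items()
}

for name, percent in ism_relative.items():
    print(f"{name}: {percent:.2f}")
```

The same rule applied to the FSM, NIC, and NNC tables, each with its own number of isoforms as denominator, agrees with the percentages reported there.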

Evaluation of NIC

Metric                                Absolute value  Relative value (%)
Number of isoforms                    334             -
5’ and 3’ reference supported (gene)  251             75.15
5’ reference supported (gene)         269             80.54
3’ reference supported (gene)         291             87.13
Intron retention incidence            68              20.36

Evaluation of NNC

Metric                                Absolute value  Relative value (%)
Number of isoforms                    354             -
5’ and 3’ reference supported (gene)  306             86.44
5’ reference supported (gene)         315             88.98
3’ reference supported (gene)         329             92.94
Non-canonical SJ incidence            288             81.36
Full Illumina SJ support              354             100
RT-switching incidence                71              20.06

Evaluation of Simulation

Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated against each of these ground truth sets. These sets of ground truth transcripts are:

The following metrics and definitions apply to simulated transcripts.

Evaluation of all simulated transcripts

Metric                                                    Value
Number of isoforms simulated                              48192
True Positive detections (TP)                             39738
Number of transcripts associated to TP (Reference Match)  39750
Partial True Positive detections (PTP)                    765
Number of transcripts associated to PTP                   770
False Negative (FN)                                       8017
False Positive (FP)                                       2089
Sensitivity                                               0.82
Precision                                                 0.93
Non Redundant Precision                                   0.93
Positive Detection Rate                                   0.83
False Discovery Rate                                      0.07
False Detection Rate                                      0.05
Redundancy                                                1.01
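
The metric definitions are not reproduced in this copy of the report. As a rough cross-check, the sketch below applies a set of assumed formulas to the counts for all simulated transcripts. These formulas reproduce five of the reported values to two decimals, but that agreement does not confirm they are the ones used by the evaluation. The class `SimulationCounts` and the function `assumed_metrics` are hypothetical names. Positive Detection Rate and Redundancy are left out because no definition is given here.

```python
from dataclasses import dataclass


@dataclass
class SimulationCounts:
    simulated: int        # Number of isoforms simulated
    tp: int               # True Positive detections (TP)
    tp_transcripts: int   # Transcripts associated to TP (Reference Match)
    ptp: int              # Partial True Positive detections (PTP)
    ptp_transcripts: int  # Transcripts associated to PTP
    fn: int               # False Negative (FN); unused by the formulas below
    fp: int               # False Positive (FP)


def assumed_metrics(c):
    """Compute metrics under ASSUMED definitions (not taken from the report)."""
    # Assumption: every detected transcript is a TP, a PTP, or an FP.
    detected = c.tp_transcripts + c.ptp_transcripts + c.fp
    return {
        "detected": detected,
        "sensitivity": c.tp / c.simulated,
        "precision": c.tp_transcripts / detected,
        "non_redundant_precision": c.tp / detected,
        "false_discovery_rate": (c.fp + c.ptp_transcripts) / detected,
        "false_detection_rate": c.fp / detected,
    }


# Counts copied from the table for all simulated transcripts.
all_simulated = SimulationCounts(
    simulated=48192, tp=39738, tp_transcripts=39750,
    ptp=765, ptp_transcripts=770, fn=8017, fp=2089,
)
metrics = assumed_metrics(all_simulated)

for name, value in metrics.items():
    print(name, value if name == "detected" else round(value, 2))
```

Under these assumptions the number of detected transcripts comes to 42609, the same figure as "Number of transcripts detected" in the Global overview.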

Evaluation of all GENCODE simulation

Metric                                                    Value
Number of isoforms simulated                              41964
True Positive detections (TP)                             34992
Number of transcripts associated to TP (Reference Match)  35003
Partial True Positive detections (PTP)                    698
Number of transcripts associated to PTP                   703
False Negative (FN)                                       6583
False Positive (FP)                                       6903
Sensitivity                                               0.83
Positive Detection Rate                                   0.84
Redundancy                                                1.01

Evaluation of only GENCODE transcripts simulated

Metric                                                    Value
Number of isoforms simulated                              28414
True Positive detections (TP)                             26065
Number of transcripts associated to TP (Reference Match)  26073
Partial True Positive detections (PTP)                    363
Number of transcripts associated to PTP                   367
False Negative (FN)                                       2234
False Positive (FP)                                       16169
Sensitivity                                               0.92
Positive Detection Rate                                   0.92
Redundancy                                                1.01

Evaluation of only GENCODE transcripts simulated with TPM >= 5

Metric                                                    Value
Number of isoforms simulated                              16196
True Positive detections (TP)                             15371
Number of transcripts associated to TP (Reference Match)  15374
Partial True Positive detections (PTP)                    154
Number of transcripts associated to PTP                   157
False Negative (FN)                                       770
False Positive (FP)                                       27078
Sensitivity                                               0.95
Positive Detection Rate                                   0.95
Redundancy                                                1.01

Evaluation of novelty

Metric                                                    Value
Number of isoforms simulated                              6228
True Positive detections (TP)                             4746
Number of transcripts associated to TP (Reference Match)  4747
Partial True Positive detections (PTP)                    67
Number of transcripts associated to PTP                   67
False Negative (FN)                                       1434
False Positive (FP)                                       37795
Sensitivity                                               0.76
Positive Detection Rate                                   0.77
Redundancy                                                1