Your submission: human_sim_cdna_pacbio on PacBio data
Background
Challenge 1 is evaluated according to four criteria:
1. Broad GENCODE annotation
2. Subset of manually curated loci selected by GENCODE
3. SIRV Lexogen Set 4
4. Simulated data
The LRGASP uses SQANTI categories to define evaluating features and metrics for Challenge 1.
LRGASP Challenge 1 Definitions:
- Full Splice Match (FSM): Transcripts matching a reference transcript at all splice junctions.
- Incomplete Splice Match (ISM): Transcripts matching consecutive, but not all, splice junctions of the reference transcripts.
- Novel in Catalog (NIC): Transcripts containing new combinations of already annotated splice junctions or novel splice junctions formed from already annotated donors and acceptors.
- Novel Not in Catalog (NNC): Transcripts using novel donors and/or acceptors.
- Reference Match (RM): FSM transcript with 5’ and 3’ ends within 50 nts of the TSS/TTS annotation.
- 3’ polyA supported: Transcript with polyA motif support at the 3’ end.
- 5’ CAGE supported: Transcript with CAGE support at the 5’ end.
- 3’ reference supported: Transcript with 3’ end within 50 nts of the reference transcript or gene TTS.
- 5’ reference supported: Transcript with 5’ end within 50 nts of the reference transcript or gene TSS.
- Supported Reference Transcript Model (SRTM): FSM/ISM transcript whose 5’ end is within 50 nts of the TSS or has CAGE support, AND whose 3’ end is within 50 nts of the TTS or has polyA motif support.
- Supported Novel Transcript Model (SNTM): NIC/NNC transcript whose 5’ end is within 50 nts of the TSS or has CAGE support, AND whose 3’ end is within 50 nts of the TTS or has polyA motif support, AND whose novel junctions have Illumina read support.
- Redundancy: # long-read (LR) transcript models per reference model
- Intron retention (IR) level: Number of IR events within the NIC category
- Illumina Splice Junction (SJ) support: % of SJ in a transcript model with Illumina support
- Full Illumina Splice Junction support: % of transcripts in a category with all SJ supported by Illumina reads
- % Novel junctions: # of new junctions / total # of junctions
- % Non-canonical junctions: # of non-canonical junctions / total # of junctions
- % Non-canonical transcripts: % of transcripts with at least one non-canonical junction
- Intra-priming: Evidence of intra-priming (see SQANTI ref)
- RT-switching: Evidence of RT-switching (see SQANTI ref)
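The Reference Match, SRTM and SNTM definitions combine a 50-nt distance window with fallback evidence (CAGE at the 5’ end, a polyA motif at the 3’ end). A minimal Python sketch of those rules follows. It is not SQANTI or LRGASP code: the `TranscriptModel` fields are hypothetical, a single distance per end stands in for the separate transcript-level and gene-level checks, and "within 50 nts" is assumed to be an inclusive bound on the absolute distance.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_NT = 50  # threshold from the definitions; inclusive bound is an assumption


@dataclass
class TranscriptModel:
    # Hypothetical fields for illustration only (not SQANTI column names).
    category: str                    # "FSM", "ISM", "NIC" or "NNC"
    dist_5p_to_tss: Optional[int]    # nt from 5' end to reference TSS (None if no reference)
    dist_3p_to_tts: Optional[int]    # nt from 3' end to reference TTS (None if no reference)
    cage_support: bool = False       # CAGE peak supports the 5' end
    polya_motif: bool = False        # polyA motif supports the 3' end
    novel_sj_illumina: bool = False  # all novel junctions have Illumina read support


def within_window(dist: Optional[int], window: int = WINDOW_NT) -> bool:
    return dist is not None and abs(dist) <= window


def five_prime_supported(t: TranscriptModel) -> bool:
    # Close to the annotated TSS, or CAGE evidence as a fallback.
    return within_window(t.dist_5p_to_tss) or t.cage_support


def three_prime_supported(t: TranscriptModel) -> bool:
    # Close to the annotated TTS, or a polyA motif as a fallback.
    return within_window(t.dist_3p_to_tts) or t.polya_motif


def is_reference_match(t: TranscriptModel) -> bool:
    # RM: FSM with both ends close to the annotation (no CAGE/polyA fallback).
    return (t.category == "FSM"
            and within_window(t.dist_5p_to_tss)
            and within_window(t.dist_3p_to_tts))


def is_srtm(t: TranscriptModel) -> bool:
    return (t.category in ("FSM", "ISM")
            and five_prime_supported(t) and three_prime_supported(t))


def is_sntm(t: TranscriptModel) -> bool:
    return (t.category in ("NIC", "NNC")
            and five_prime_supported(t) and three_prime_supported(t)
            and t.novel_sj_illumina)


# An ISM whose 5' end is far from the TSS but has CAGE support.
ism = TranscriptModel("ISM", dist_5p_to_tss=300, dist_3p_to_tts=12, cage_support=True)
print(is_srtm(ism), is_reference_match(ism))  # True False
```

The fallback structure explains why SRTM can be higher than the transcript-level reference-supported rates: a model can qualify through CAGE or polyA evidence even when an end lies outside the 50-nt window.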
This document shows the performance of your pipeline for criterion 4. Critical data for evaluation according to criteria 2 and 4 will be made available after the closure of the challenge, and therefore pre-evaluation reports cannot be provided. Note that the criterion 1 metrics reported here have been calculated using the GENCODE human v38 and mouse M27 releases, while the final evaluation will use human v39 and mouse M28, which will be released after completion of the challenge.
Evaluation of detected transcripts for Challenge 1
Global overview
Number of genes detected | 20837
Number of known genes detected | 20837
Number of transcripts detected | 44161
Number of transcripts associated to a known gene | 44150
Number of unique SJ detected | 188304

Splice junctions | Count | Fraction of unique SJ
Novel SJ | 261 | 0.00
Non-canonical SJ | 1047 | 0.01
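The last column of the splice-junction rows (0.00 and 0.01) appears to be each count divided by the number of unique SJ detected (261/188304 ≈ 0.0014 and 1047/188304 ≈ 0.0056). The sketch below computes both fractions from a set of unique junctions. The data layout is hypothetical, motifs are assumed to be given in transcript orientation, and the canonical set (GT-AG, GC-AG, AT-AC) is assumed to follow the usual SQANTI convention.

```python
from typing import Dict, Set, Tuple

# Donor+acceptor dinucleotides treated as canonical (assumed SQANTI convention).
CANONICAL_MOTIFS = {"GTAG", "GCAG", "ATAC"}

# Hypothetical key layout: (chrom, strand, donor position, acceptor position).
JunctionKey = Tuple[str, str, int, int]


def junction_fractions(junctions: Dict[JunctionKey, str],
                       annotated: Set[JunctionKey]) -> Tuple[float, float]:
    """Return (fraction novel, fraction non-canonical) over unique junctions.

    `junctions` maps each unique detected junction to its motif, e.g. "GTAG";
    `annotated` holds the reference junctions.
    """
    total = len(junctions)
    if total == 0:
        return 0.0, 0.0
    novel = sum(1 for key in junctions if key not in annotated)
    non_canonical = sum(1 for motif in junctions.values()
                        if motif.upper() not in CANONICAL_MOTIFS)
    return novel / total, non_canonical / total


sj = {("chr1", "+", 100, 200): "GTAG", ("chr1", "+", 300, 400): "GTAT"}
print(junction_fractions(sj, {("chr1", "+", 100, 200)}))  # (0.5, 0.5)
```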
Evaluation of FSM
Metric | Count | % of FSM isoforms
Number of isoforms | 43581 | -
Reference Match | 43120 | 98.94
5’ reference supported (transcript) | 43340 | 99.45
3’ reference supported (transcript) | 43188 | 99.10
5’ reference supported (gene) | 43473 | 99.75
3’ reference supported (gene) | 43276 | 99.30
Supported Reference Transcript Model (SRTM) | 43238 | 99.21
Reference redundancy level | 1 | -
Evaluation of ISM
Metric | Count | % of ISM isoforms
Number of isoforms | 97 | -
5’ reference supported (transcript) | 37 | 38.14
3’ reference supported (transcript) | 24 | 24.74
5’ and 3’ reference supported (gene) | 59 | 60.82
5’ reference supported (gene) | 77 | 79.38
3’ reference supported (gene) | 75 | 77.32
Supported Reference Transcript Model (SRTM) | 59 | 60.82
Reference redundancy level | 1 | -

Evaluation of NIC
Metric | Count | % of NIC isoforms
Number of isoforms | 302 | -
5’ and 3’ reference supported (gene) | 270 | 89.40
5’ reference supported (gene) | 290 | 96.03
3’ reference supported (gene) | 276 | 91.39
Intron retention incidence | 35 | 11.59
Evaluation of NNC
Metric | Count | % of NNC isoforms
Number of isoforms | 170 | -
5’ and 3’ reference supported (gene) | 101 | 59.41
5’ reference supported (gene) | 146 | 85.88
3’ reference supported (gene) | 114 | 67.06
Non-canonical SJ incidence | 60 | 35.29
Full Illumina SJ support | 170 | 100
RT-switching incidence | 42 | 24.71
Evaluation of Simulation
Simulated transcripts were grouped according to different thresholds and attributes, and metrics were calculated with respect to each of these ground-truth sets:
- All simulated transcripts (using GENCODE + novelty as reference).
- Only GENCODE transcripts simulated.
- Only GENCODE transcripts simulated with at least 1 TPM expression.
- Only GENCODE transcripts simulated with at least 5 TPM expression.
- Only those transcripts that were added from a different organism (novelty).
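A minimal sketch of how the five ground-truth sets relate, assuming a hypothetical per-transcript record with a simulated TPM value and a GENCODE/novelty flag. By construction, "all" is the union of the GENCODE and novelty sets, which matches the counts reported below (41964 + 6228 = 48192), and each TPM-filtered set is a subset of the full GENCODE set. The transcript IDs in the example are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class SimulatedTranscript:
    # Hypothetical record; the simulation manifest format is not described here.
    transcript_id: str
    tpm: float        # simulated expression level
    is_gencode: bool  # False for "novelty" transcripts added from another organism


def ground_truth_sets(sim: List[SimulatedTranscript]) -> Dict[str, Set[str]]:
    """Split simulated transcripts into the five ground-truth sets."""
    gencode = [t for t in sim if t.is_gencode]
    return {
        "all": {t.transcript_id for t in sim},
        "gencode": {t.transcript_id for t in gencode},
        "gencode_tpm_ge_1": {t.transcript_id for t in gencode if t.tpm >= 1},
        "gencode_tpm_ge_5": {t.transcript_id for t in gencode if t.tpm >= 5},
        "novelty": {t.transcript_id for t in sim if not t.is_gencode},
    }


sim = [
    SimulatedTranscript("ENST_A", 0.4, True),
    SimulatedTranscript("ENST_B", 2.0, True),
    SimulatedTranscript("ENST_C", 12.0, True),
    SimulatedTranscript("NOVEL_1", 3.0, False),
]
sets = ground_truth_sets(sim)
print({name: len(ids) for name, ids in sets.items()})
# {'all': 4, 'gencode': 3, 'gencode_tpm_ge_1': 2, 'gencode_tpm_ge_5': 1, 'novelty': 1}
```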
The following metrics and definitions apply to simulated transcripts.
- Number of isoforms simulated: Total number of simulated isoforms taken into account as ground truth.
- True Positive detections (TP): Isoforms identified as Reference Match of a simulated transcript.
- Partial True Positive detections (PTP): Isoforms identified as ISM, or as FSM without Reference Match (FSM_non_RM), of a simulated transcript.
- False Negative (FN): Simulated transcripts that were expected to be detected but were not.
- False Positive (FP): Transcripts that were detected but were not simulated.
- Sensitivity: TP / Number of isoforms simulated
- Precision: RM / Detected isoforms, where RM is the number of transcripts associated to TP (Reference Match)
- Non-redundant Precision: TP / Detected isoforms
- Positive Detection Rate: unique(TP+PTP)/Number of isoforms simulated
- False Discovery Rate: (FP + PTP)/Detected isoforms
- False Detection Rate: (FP)/Detected isoforms
- Redundancy: (Detected isoforms associated to simulated transcripts)/(Simulated transcripts detected)
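These definitions translate into arithmetic on the table rows. The sketch below applies them to the counts reported for all simulated transcripts; the field and function names are hypothetical. Two caveats. First, the positive detection rate here is a plain sum (TP + PTP), which gives 0.90 for all simulated transcripts where the report shows 0.89, so the report's `unique()` step (or its rounding) evidently differs. Second, `implied_fp` is an observation rather than a documented formula: detected isoforms minus those associated to TP and to PTP reproduces every FP count in the tables below.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SimCounts:
    simulated: int  # Number of isoforms simulated
    detected: int   # all transcript models in the submission
    tp: int         # True Positive detections (TP)
    tp_assoc: int   # Number of transcripts associated to TP (Reference Match)
    ptp: int        # Partial True Positive detections (PTP)
    ptp_assoc: int  # Number of transcripts associated to PTP
    fp: int         # False Positive (FP)


def simulation_metrics(c: SimCounts) -> Dict[str, float]:
    return {
        "sensitivity": c.tp / c.simulated,
        "precision": c.tp_assoc / c.detected,
        "non_redundant_precision": c.tp / c.detected,
        # Plain sum; assumes TP and PTP sets are disjoint (report uses unique()).
        "positive_detection_rate": (c.tp + c.ptp) / c.simulated,
        "false_discovery_rate": (c.fp + c.ptp) / c.detected,
        "false_detection_rate": c.fp / c.detected,
        "redundancy": (c.tp_assoc + c.ptp_assoc) / (c.tp + c.ptp),
    }


def implied_fp(detected: int, tp_assoc: int, ptp_assoc: int) -> int:
    # Observation, not a documented formula: detected models not associated
    # with any simulated transcript of the ground-truth set.
    return detected - tp_assoc - ptp_assoc


all_sim = SimCounts(simulated=48192, detected=44161, tp=42625, tp_assoc=42645,
                    ptp=517, ptp_assoc=518, fp=998)
for name, value in simulation_metrics(all_sim).items():
    print(f"{name}: {value:.2f}")
```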
Evaluation of all simulated transcripts
Number of isoforms simulated | 48192
True Positive detections (TP) | 42625
Number of transcripts associated to TP (Reference Match) | 42645
Partial True Positive detections (PTP) | 517
Number of transcripts associated to PTP | 518
False Negative (FN) | 5091
False Positive (FP) | 998
Sensitivity | 0.88
Precision | 0.97
Non-redundant Precision | 0.97
Positive Detection Rate | 0.89
False Discovery Rate | 0.03
False Detection Rate | 0.02
Redundancy | 1
Evaluation of all simulated GENCODE transcripts
Number of isoforms simulated | 41964
True Positive detections (TP) | 39590
Number of transcripts associated to TP (Reference Match) | 39595
Partial True Positive detections (PTP) | 380
Number of transcripts associated to PTP | 380
False Negative (FN) | 2021
False Positive (FP) | 4186
Sensitivity | 0.94
Positive Detection Rate | 0.95
Redundancy | 1
Evaluation of only GENCODE transcripts simulated with TPM >= 1
Number of isoforms simulated | 28414
True Positive detections (TP) | 26710
Number of transcripts associated to TP (Reference Match) | 26713
Partial True Positive detections (PTP) | 258
Number of transcripts associated to PTP | 258
False Negative (FN) | 1464
False Positive (FP) | 17190
Sensitivity | 0.94
Positive Detection Rate | 0.95
Redundancy | 1
Evaluation of only GENCODE transcripts simulated with TPM >= 5
Number of isoforms simulated | 16196
True Positive detections (TP) | 15131
Number of transcripts associated to TP (Reference Match) | 15131
Partial True Positive detections (PTP) | 137
Number of transcripts associated to PTP | 137
False Negative (FN) | 943
False Positive (FP) | 28893
Sensitivity | 0.93
Positive Detection Rate | 0.94
Redundancy | 1
Evaluation of novelty
Number of isoforms simulated | 6228
True Positive detections (TP) | 3035
Number of transcripts associated to TP (Reference Match) | 3050
Partial True Positive detections (PTP) | 137
Number of transcripts associated to PTP | 138
False Negative (FN) | 3070
False Positive (FP) | 40973
Sensitivity | 0.49
Positive Detection Rate | 0.51
Redundancy | 1.01