Reporting & Visualization
OneRoof generates comprehensive reports from pipeline runs, including structured JSON data, MultiQC integration, and interactive visualizations. This document describes the reporting outputs and configuration options.
1 Report Outputs
Each pipeline run writes the following to the report directory (09_report/), plus the unified MultiQC report to 06_QC/:
09_report/
├── oneroof_report.json  # Full structured report
├── oneroof_report_summary.json  # Summary without per-sample details
├── multiqc/
│   ├── multiqc_config.yaml  # MultiQC configuration
│   ├── oneroof_general_stats_mqc.tsv
│   ├── oneroof_coverage_table_mqc.tsv
│   ├── oneroof_variants_mqc.tsv  # If variants present
│   ├── oneroof_amplicon_efficiency_mqc.tsv  # If primers provided
│   └── oneroof_amplicon_heatmap_mqc.tsv  # If primers provided
└── visualizations/
    ├── coverage_heatmap.html
    ├── coverage_bar.html
    ├── qc_status_summary.html
    ├── coverage_distribution.html
    ├── completeness_distribution.html
    ├── qc_scatter.html
    ├── variant_type_bar.html  # If variants present
    ├── variant_effect_bar.html  # If variants present
    ├── amplicon_ranking.html  # If primers provided
    ├── amplicon_dropout.html  # If primers provided
    └── amplicon_heatmap.html  # If primers provided
06_QC/
├── oneroof_multiqc_report.html  # Unified MultiQC report
└── oneroof_multiqc_report_data/  # MultiQC data files
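As a quick check after a run, the layout above can be verified with a short script. This is a minimal sketch and not part of OneRoof: the helper name `missing_outputs`, the choice of which files count as unconditional, and the assumption that 09_report/ and 06_QC/ share one parent results directory are all illustrative.

```python
from pathlib import Path

# Files the tree above lists without an "If ..." condition, relative to the
# results directory. Treating these as always expected is an assumption.
EXPECTED = [
    "09_report/oneroof_report.json",
    "09_report/oneroof_report_summary.json",
    "09_report/multiqc/multiqc_config.yaml",
    "09_report/visualizations/coverage_heatmap.html",
    "06_QC/oneroof_multiqc_report.html",
]


def missing_outputs(results_dir):
    """Return the expected report files that are absent under results_dir."""
    root = Path(results_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

Calling `missing_outputs("results")` (with a hypothetical `results` directory) returns an empty list when every unconditional file is in place.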
2 JSON Report Schema
The JSON report follows a versioned schema (currently 0.1.0-alpha). The full schema is available at schemas/current/oneroof_report.schema.json.
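Because the schema is versioned, a downstream consumer may want to check the version before reading anything else. The sketch below assumes the version string has the form MAJOR.MINOR.PATCH with an optional label after a hyphen, and that matching major and minor numbers is a reasonable compatibility rule. Both the rule and the function names are hypothetical, not OneRoof's own policy.

```python
SUPPORTED_SCHEMA = "0.1.0-alpha"  # version quoted in this document


def parse_schema_version(version: str) -> tuple[int, int, int, str]:
    """Split 'MAJOR.MINOR.PATCH[-label]' into numeric parts and a label."""
    core, _, label = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch, label


def is_supported(report: dict, supported: str = SUPPORTED_SCHEMA) -> bool:
    """Accept reports whose major and minor version match the supported one."""
    found = parse_schema_version(report["schema_version"])
    wanted = parse_schema_version(supported)
    return found[:2] == wanted[:2]
```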
2.1 Top-Level Structure
{
  "schema_version": "0.1.0-alpha",
  "generated_at": "2024-01-15T10:30:00Z",
  "run_metadata": { ... },
  "summary": { ... },
  "samples": { ... }
}
2.2 Run Metadata
Contains pipeline configuration:
| Field | Description |
|---|---|
| platform | "ont" or "illumina" |
| reference.name | Reference sequence name |
| reference.length | Reference length in bases |
| primers.provided | Whether primer scheme was used |
| parameters.min_depth_coverage | Minimum depth for consensus |
| parameters.min_consensus_freq | Consensus variant frequency threshold |
| parameters.min_variant_frequency | Subclonal variant frequency threshold |
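The dotted field names above suggest nested objects inside run_metadata. Assuming that reading is right, a small helper can resolve them. This is an illustrative sketch: `get_path` is a hypothetical name, and the sample values are invented rather than taken from a real run.

```python
from typing import Any


def get_path(data: dict, dotted: str, default: Any = None) -> Any:
    """Follow a dotted key path such as 'reference.name' through nested dicts."""
    current: Any = data
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Invented values for illustration only.
run_metadata = {
    "platform": "ont",
    "reference": {"name": "example_ref", "length": 10000},
    "primers": {"provided": True},
    "parameters": {"min_depth_coverage": 20},
}

print(get_path(run_metadata, "reference.length"))
print(get_path(run_metadata, "parameters.min_variant_frequency", default="n/a"))
```

Returning a default instead of raising keeps the helper usable on reports where a field is absent.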
2.3 Summary Statistics
Aggregate metrics across all samples:
| Field | Description |
|---|---|
| sample_count | Total samples processed |
| samples_pass | Samples passing QC |
| samples_warn | Samples with QC warnings |
| samples_fail | Samples failing QC |
| mean_coverage_depth | Average coverage across samples |
| mean_genome_coverage | Average genome fraction at ≥10x |
| total_variants_called | Sum of variants across samples |
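These counts are enough to derive run-level rates. The sketch below computes the fraction of samples in each QC category and checks that the three counts add up to sample_count. That the categories partition the samples is an assumption, and the function names and values are illustrative.

```python
STATUSES = ("pass", "warn", "fail")


def qc_fractions(summary: dict) -> dict[str, float]:
    """Return the fraction of samples in each QC category."""
    total = summary["sample_count"]
    if total == 0:
        return {status: 0.0 for status in STATUSES}
    return {status: summary[f"samples_{status}"] / total for status in STATUSES}


def counts_are_consistent(summary: dict) -> bool:
    """Check that pass + warn + fail equals the total (assumed invariant)."""
    return sum(summary[f"samples_{s}"] for s in STATUSES) == summary["sample_count"]


# Invented values for illustration only.
summary = {"sample_count": 8, "samples_pass": 6, "samples_warn": 1, "samples_fail": 1}
print(qc_fractions(summary))
```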
2.4 Per-Sample Metrics
Each sample includes:
- qc_status: One of "pass", "warn", or "fail"
- qc_notes: Human-readable explanations for non-passing status
- alignment: Read mapping statistics
- variants: Variant calling results (if available)
- consensus: Consensus sequence quality metrics (if available)
- metagenomics: Sylph profiling results (if enabled)
- haplotyping: Devider haplotype phasing (Nanopore only, if enabled)
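Since several of these sections are optional, consumer code should not assume they exist. The following sketch groups samples by status and lists which optional sections each one carries. It assumes an unavailable section is either a missing key or null, which the schema may or may not guarantee. The helper names and sample records are hypothetical.

```python
from collections import defaultdict

OPTIONAL_SECTIONS = ("variants", "consensus", "metagenomics", "haplotyping")


def present_sections(sample: dict) -> list[str]:
    """List the optional sections that are present and non-null."""
    return [name for name in OPTIONAL_SECTIONS if sample.get(name) is not None]


def group_by_status(samples: dict) -> dict[str, list[str]]:
    """Map each qc_status value to the sample IDs that have it."""
    groups = defaultdict(list)
    for sample_id, sample in samples.items():
        groups[sample["qc_status"]].append(sample_id)
    return dict(groups)


# Invented records; empty dicts stand in for the real metric objects.
samples = {
    "sample_a": {"qc_status": "pass", "alignment": {}, "variants": {}, "consensus": {}},
    "sample_b": {"qc_status": "fail", "alignment": {}},
}
print(group_by_status(samples))
print(present_sections(samples["sample_a"]))
```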
3 QC Thresholds
Samples are classified as pass/warn/fail based on configurable thresholds:
| Metric | Pass | Warn | Fail |
|---|---|---|---|
| Genome coverage (≥10x) | ≥95% | ≥80% | <80% |
| Completeness (non-N) | ≥98% | ≥90% | <90% |
| N percentage | <1% | <5% | ≥5% |
| Mapped reads | ≥1000 | ≥100 | <100 |
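To make the table concrete, here is a minimal sketch of one way such a classification could work. It is not OneRoof's implementation: the metric key names are hypothetical, the rule that the worst individual metric decides the sample status is an assumption, and the N-percentage row is left out so that every metric in the sketch is higher-is-better.

```python
# (pass_min, warn_min) cut-offs copied from the table above.
THRESHOLDS = {
    "genome_coverage": (0.95, 0.80),  # fraction of genome at >=10x
    "completeness": (0.98, 0.90),     # fraction of non-N bases
    "mapped_reads": (1000, 100),      # read count
}
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def classify_metric(value: float, pass_min: float, warn_min: float) -> str:
    """Classify one higher-is-better metric against its two cut-offs."""
    if value >= pass_min:
        return "pass"
    if value >= warn_min:
        return "warn"
    return "fail"


def classify_sample(metrics: dict) -> str:
    """Return the most severe status across all metrics (assumed rule)."""
    statuses = [
        classify_metric(metrics[name], pass_min, warn_min)
        for name, (pass_min, warn_min) in THRESHOLDS.items()
    ]
    return max(statuses, key=SEVERITY.__getitem__)
```

For example, a sample with 97% genome coverage, 92% completeness, and 5000 mapped reads would be classified as "warn" under this rule, because completeness falls between its two cut-offs.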
3.1 Customizing Thresholds
Thresholds can be adjusted in conf/reporting.config:
params {
    qc_coverage_pass = 0.95
    qc_coverage_warn = 0.80
    qc_completeness_pass = 0.98
    qc_completeness_warn = 0.90
    qc_n_pct_warn = 5.0
    qc_n_pct_fail = 10.0
    qc_min_reads_warn = 1000
    qc_min_reads_fail = 100
}
4 Visualizations
All visualizations are generated as self-contained HTML files with interactive features (hover tooltips, zoom, pan).
4.1 Coverage Visualizations
Coverage Heatmap: Multi-sample view showing coverage depth across samples. Useful for identifying systematic coverage patterns.
Coverage Bar Chart: Per-sample mean coverage comparison.
Coverage Distribution: Histogram of coverage values across samples.
4.2 QC Dashboard
QC Status Summary: Donut chart showing pass/warn/fail distribution.
QC Scatter Plot: Coverage vs. completeness, colored by QC status. Helps identify samples that need attention.
Completeness Distribution: Histogram of consensus completeness values.
4.3 Variant Visualizations
Generated only when variants are called:
Variant Type Bar: Stacked bar chart showing SNPs, insertions, deletions, and MNPs per sample.
Variant Effect Bar: Stacked bar chart showing variant effects (missense, synonymous, etc.) per sample.
4.4 Amplicon Efficiency
Generated only when primers are provided:
Amplicon Ranking: Bar chart of amplicons sorted by median read count, colored by performance tier (good/moderate/poor).
Amplicon Dropout Scatter: Scatter plot of median reads vs. dropout rate. Amplicons in the upper-left (low reads, high dropout) may need primer redesign.
Amplicon Heatmap: Sample × amplicon matrix showing read counts. Rows are samples (alphabetical), columns are amplicons (by genomic position). Color intensity represents read count on a log scale.
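The quantities behind these plots can be sketched from a sample × amplicon read-count matrix. The example below assumes a nested mapping of sample to amplicon to reads (a hypothetical input shape), treats a missing entry as zero reads, and uses log10(count + 1) for the log-scale intensity. The +1 offset is an assumption to keep zero counts defined. Amplicons are sorted by name here, not by genomic position.

```python
import math
from statistics import median


def amplicon_stats(read_counts: dict) -> dict:
    """Compute median reads and dropout rate per amplicon across samples."""
    amplicons = sorted({a for counts in read_counts.values() for a in counts})
    stats = {}
    for amplicon in amplicons:
        values = [counts.get(amplicon, 0) for counts in read_counts.values()]
        stats[amplicon] = {
            "median_reads": median(values),
            # Dropout: share of samples with zero reads for this amplicon.
            "dropout_rate": sum(v == 0 for v in values) / len(values),
        }
    return stats


def log_intensity(count: int) -> float:
    """Log-scale value for heatmap coloring; +1 keeps zero counts finite."""
    return math.log10(count + 1)


# Invented counts for illustration only.
read_counts = {
    "sample_1": {"amp_01": 100, "amp_02": 0},
    "sample_2": {"amp_01": 50, "amp_02": 0},
    "sample_3": {"amp_01": 0, "amp_02": 10},
}
print(amplicon_stats(read_counts))
```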
5 MultiQC Integration
OneRoof adds custom content to MultiQC reports. The unified report is output to 06_QC/oneroof_multiqc_report.html.
5.1 General Statistics Table
New columns added to the MultiQC General Statistics table:
- Mean Coverage
- Genome % (at ≥10x)
- Completeness %
- N %
These columns include conditional formatting (green/yellow/red) based on QC thresholds.
5.2 Coverage Summary Section
A dedicated table section with detailed coverage statistics:
- Mean and median coverage
- Genome fraction at 1x, 10x, 100x
- Min/max coverage values
5.3 Variant Summary Section
A stacked bar chart showing variant types per sample (if variants are called):
- SNPs, insertions, deletions, and MNPs
- Only samples with non-zero variants are shown
5.4 Amplicon Efficiency Section
When primers are provided, two additional sections appear:
Amplicon Efficiency Table: Sortable table with per-amplicon metrics:
- Median reads across samples
- Dropout percentage (samples with zero reads)
- Sample count
- Performance tier (good/moderate/poor with color coding)
Amplicon Coverage Heatmap: Sample × amplicon matrix showing read counts:
- Rows are samples (sorted alphabetically)
- Columns are amplicons (sorted by genomic position)
- Color intensity represents read count
5.5 Section Ordering
Sections appear in this order (configurable in conf/multiqc_config.yaml):
- OneRoof General Stats (top)
- Amplicon Coverage Heatmap
- Amplicon Efficiency Table
- Coverage Summary Table
- Variant Summary
- FastQC
- Software Versions (bottom)
6 Configuration Reference
The full reporting configuration lives in conf/reporting.config:
params {
    // Report generation flags
    generate_json_report = true
    generate_multiqc = true
    generate_static_plots = true
    // Report detail level: "summary", "standard", or "full"
    report_level = "standard"
    // QC thresholds (see above)
    // Custom MultiQC config template (optional)
    multiqc_config_template = null
}
6.1 MultiQC Configuration
The MultiQC configuration template lives at conf/multiqc_config.yaml. It controls:
- Report title and subtitle
- Section ordering
- Search patterns for custom content files
- Table column visibility
- Conditional formatting colors
To customize, either edit conf/multiqc_config.yaml directly or provide your own template via --multiqc_config_template.
7 Programmatic Access
The JSON report can be consumed by downstream tools:
import json
from pathlib import Path

report = json.loads(Path("oneroof_report.json").read_text())

# Access summary
print(f"Samples passing QC: {report['summary']['samples_pass']}")

# Iterate samples and print the notes for failing ones
for sample_id, metrics in report["samples"].items():
    if metrics["qc_status"] == "fail":
        print(f"{sample_id}: {metrics['qc_notes']}")
7.1 Schema Validation
Reports can be validated against the JSON schema:
import json
from pathlib import Path

import jsonschema

report = json.loads(Path("oneroof_report.json").read_text())
schema = json.loads(Path("schemas/current/oneroof_report.schema.json").read_text())
jsonschema.validate(report, schema)  # Raises jsonschema.ValidationError if the report is invalid