modalysis
modalysis is a pipeline-oriented toolkit for methylation and DMR analysis. It exposes a CLI, a FastAPI server, and reusable Python modules.
All user-facing operations run through this stack:
CLI parser -> CLI handler -> HTTP client -> FastAPI server -> core function
Prerequisites
Python 3.13+
Install dependencies:
uv sync
Required Input Types
The pipeline expects these input categories:
GFF annotation file (for gene coordinates and descriptions)
Pileup .bed files (per sample/modification)
DMR .bed files
Expression TSV files (GENE_ID<TAB>STATUS, such as UP, DOWN, NDE)
Allowed chromosomes file (one chromosome name per line)
Output Types
Tabular command outputs: .modalysis (TSV)
Plot command outputs: .png
Optional, with dmr gene-counts --output-excel: .xlsx
Recommended Pipeline Order
Run commands in this order so each downstream stage has required inputs:
Start server: modalysis server
Format GFF: modalysis gff format
Annotate GFF with expression labels: modalysis gff annotate
Format each pileup file: modalysis pileup format
Merge pileups per manifestation/modification: modalysis pileup merge
Format each DMR file: modalysis dmr format
Annotate DMRs with gene regions: modalysis dmr annotate
Aggregate DMR results: modalysis dmr gene-counts, modalysis dmr common-genes
Generate plots as needed: modalysis plot mean-methylation, modalysis plot gene-heatmap, modalysis plot dmr-dotplot, modalysis plot common-genes-venn
Command Reference
Default server port: 8000.
modalysis server
Purpose: Start the FastAPI server used by all analysis commands.
Algorithm:
Launches fastapi run (or fastapi dev with --dev) against src/modalysis/server/main.py.
Usage:
uv run modalysis server [--port 8000] [--dev]
Parameters:
| Flag | Required | Default | Description |
|---|---|---|---|
| --port | No | 8000 | Server port. |
| --dev | No | | Enables autoreload development mode. |
Output:
Running HTTP server (no .modalysis file).
modalysis gff format
Purpose:
Normalize a raw GFF into the pipeline’s compact .modalysis gene table.
Algorithm:
Reads TSV rows from the source GFF.
Keeps only rows with exactly 9 columns.
Keeps only protein_coding_gene features.
Filters to chromosomes present in --allowed-chromosomes.
Converts start coordinate to zero-based (start - 1).
Extracts ID and description from attributes.
Writes columns: CHROMOSOME, START, END, GENE_ID, DESCRIPTION.
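The filtering and conversion steps above can be sketched in a few lines of Python. This is an illustration only, not the toolkit's implementation: the helper name format_gff_lines, the semicolon-separated key=value attribute layout, and the in-memory output are all assumptions.

```python
def format_gff_lines(lines, allowed_chromosomes):
    """Illustrative GFF -> gene-table conversion (hypothetical helper)."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:  # drops comment lines and malformed rows
            continue
        chrom, _source, feature, start, end, *_rest, attributes = fields
        if feature != "protein_coding_gene" or chrom not in allowed_chromosomes:
            continue
        # Assumed attribute layout: "ID=...;description=..."
        attrs = dict(
            part.split("=", 1) for part in attributes.split(";") if "=" in part
        )
        rows.append(
            (chrom, int(start) - 1, int(end),
             attrs.get("ID", ""), attrs.get("description", ""))
        )
    return rows


example = [
    "##gff-version 3",
    "chr1\tsrc\tprotein_coding_gene\t101\t500\t.\t+\t.\tID=G1;description=kinase",
    "chr1\tsrc\tncRNA_gene\t600\t700\t.\t+\t.\tID=G2;description=rna",
    "chrUn\tsrc\tprotein_coding_gene\t5\t50\t.\t-\t.\tID=G3;description=other",
]
print(format_gff_lines(example, {"chr1"}))
# prints [('chr1', 100, 500, 'G1', 'kinase')]
```

Only the first data row survives: the second has the wrong feature type and the third is on a chromosome outside the allowed set.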
Usage:
uv run modalysis gff format \
--input-path /path/to/input.gff \
--output-path /path/to/output_dir \
--output-name formatted_gff \
--allowed-chromosomes /path/to/allowed_chromosomes.txt \
[--port 8000]
Parameters:
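| Flag | Required | Default | Description |
|---|---|---|---|
| --input-path | Yes | - | Input GFF path. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename (the .modalysis extension is appended). |
| --allowed-chromosomes | Yes | - | File with one valid chromosome per line. |
| --port | No | 8000 | Server port. |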
Output:
/path/to/output_dir/formatted_gff.modalysis
modalysis gff annotate
Purpose: Attach expression status labels to each GFF gene row.
Algorithm:
Loads each expression TSV as {GENE_ID -> STATUS}.
For every gene in formatted GFF, looks up each expression source.
Writes joined annotations like LABEL: VALUE; LABEL2: VALUE2 into EXPRESSION.
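A minimal sketch of that lookup-and-join step follows. The function names and the "NA" placeholder for genes absent from an expression file are assumptions; the source does not say how missing genes are reported.

```python
def load_expression(lines):
    """Parse GENE_ID<TAB>STATUS lines into a {GENE_ID: STATUS} dict."""
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def expression_field(gene_id, labels, mappings, missing="NA"):
    """Join one 'LABEL: VALUE' entry per expression source, in label order.

    The `missing` placeholder is hypothetical, not documented behavior.
    """
    return "; ".join(
        f"{label}: {mapping.get(gene_id, missing)}"
        for label, mapping in zip(labels, mappings)
    )


tissue_a = load_expression(["G1\tUP", "G2\tDOWN"])
tissue_b = load_expression(["G1\tNDE"])
labels = ["tissue_a", "tissue_b"]
print(expression_field("G1", labels, [tissue_a, tissue_b]))
# prints tissue_a: UP; tissue_b: NDE
```

Because labels and files are paired by position, the order of --expression-labels has to match the order of --expression-paths.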
Usage:
uv run modalysis gff annotate \
--gff-path /path/to/formatted_gff.modalysis \
--expression-paths /path/to/expr_a.tsv /path/to/expr_b.tsv \
--expression-labels tissue_a tissue_b \
--output-path /path/to/output_dir \
--output-name annotated_gff \
[--port 8000]
Parameters:
| Flag | Required | Default | Description |
|---|---|---|---|
| --gff-path | Yes | - | Formatted GFF .modalysis file. |
| --expression-paths | Yes | - | One or more expression TSV files. |
| --expression-labels | Yes | - | Label per expression file (same order). |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename. |
| --port | No | 8000 | Server port. |
Output:
/path/to/output_dir/annotated_gff.modalysis with added EXPRESSION column.
modalysis pileup format
Purpose:
Normalize raw pileup records into a minimal .modalysis representation.
Algorithm:
Reads raw pileup rows.
Keeps only rows with exactly 18 columns.
Filters by allowed chromosomes.
Extracts columns for genomic key and counts.
Writes columns: CHROMOSOME, START, END, MODIFICATION, N_VALID_COV, N_MOD.
Usage:
uv run modalysis pileup format \
--input-path /path/to/raw_pileup.bed \
--output-path /path/to/output_dir \
--output-name sample_mod \
--allowed-chromosomes /path/to/allowed_chromosomes.txt \
[--port 8000]
Parameters:
| Flag | Required | Default | Description |
|---|---|---|---|
| --input-path | Yes | - | Raw pileup file path. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename. |
| --allowed-chromosomes | Yes | - | File with one valid chromosome per line. |
| --port | No | 8000 | Server port. |
Output:
/path/to/output_dir/sample_mod.modalysis
modalysis pileup merge
Purpose: Aggregate multiple formatted pileup files by genomic key.
Algorithm:
Uses key (CHROMOSOME, START, END, MODIFICATION).
Sums N_VALID_COV and N_MOD across files.
Tracks in how many files each key appears.
Filters keys using:
minimum file count (--min-files)
minimum file coverage percentage (--min-file-coverage)
minimum total reads (--min-reads)
Writes merged rows with TOTAL_FILES and N_FILES.
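The merge can be pictured as a dictionary keyed by genomic position. The sketch below is not the toolkit's code and makes several assumptions: rows are already parsed into tuples, file coverage is computed as 100 * N_FILES / TOTAL_FILES, the read threshold applies to the summed N_VALID_COV, and the function name and default thresholds are hypothetical.

```python
from collections import defaultdict


def merge_pileups(files, min_files=1, min_file_coverage=0.0, min_reads=0):
    """files: list of row lists; each row is
    (chromosome, start, end, modification, n_valid_cov, n_mod)."""
    totals = defaultdict(lambda: [0, 0, 0])  # n_valid_cov, n_mod, n_files
    for rows in files:
        seen = set()
        for chrom, start, end, mod, n_valid, n_mod in rows:
            key = (chrom, start, end, mod)
            totals[key][0] += n_valid
            totals[key][1] += n_mod
            if key not in seen:  # count each file at most once per key
                seen.add(key)
                totals[key][2] += 1
    total_files = len(files)
    merged = []
    for key, (n_valid, n_mod, n_files) in sorted(totals.items()):
        coverage_pct = 100.0 * n_files / total_files  # assumed definition
        if (n_files >= min_files
                and coverage_pct >= min_file_coverage
                and n_valid >= min_reads):
            merged.append((*key, n_valid, n_mod, total_files, n_files))
    return merged


file_a = [("chr1", 10, 11, "m", 4, 2), ("chr1", 20, 21, "m", 3, 1)]
file_b = [("chr1", 10, 11, "m", 6, 3)]
print(merge_pileups([file_a, file_b], min_files=2, min_file_coverage=50.0, min_reads=5))
# prints [('chr1', 10, 11, 'm', 10, 5, 2, 2)]
```

The position at 20-21 appears in only one of the two files, so it fails the minimum file count and is dropped.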
Usage:
uv run modalysis pileup merge \
--pileup-paths /path/to/a.modalysis /path/to/b.modalysis \
--output-path /path/to/output_dir \
--output-name merged_mod \
[--min-files 2] \
[--min-file-coverage 50.0] \
[--min-reads 5] \
[--port 8000]
Parameters:
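| Flag | Required | Default | Description |
|---|---|---|---|
| --pileup-paths | Yes | - | Formatted pileup inputs to merge. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename. |
| --min-files | No | | Minimum files containing key. |
| --min-file-coverage | No | | Minimum file coverage percentage. |
| --min-reads | No | | Minimum summed read count. |
| --port | No | 8000 | Server port. |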
Output:
/path/to/output_dir/merged_mod.modalysis
modalysis dmr format
Purpose:
Filter and normalize raw DMR rows into a consistent .modalysis table.
Algorithm:
Reads raw DMR rows.
Keeps only rows with exactly 23 columns.
Filters by allowed chromosomes.
Applies thresholds on score, p-value, sample percentages, and read counts.
Writes retained rows with columns: CHROMOSOME, START, END, SCORE, MAP_BASED_P_VALUE, EFFECT_SIZE, PCT_A_SAMPLES, PCT_B_SAMPLES.
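The threshold step amounts to a conjunction of comparisons. The sketch below works on rows that are already parsed into dictionaries, because the raw 23-column layout is not described here; the dictionary keys and the function name are hypothetical.

```python
def keep_dmr(row, min_score, max_p_value, min_pct_a, min_pct_b, min_reads):
    """Return True when a parsed DMR row passes every threshold.

    The keys used here are illustrative names, not the toolkit's schema.
    """
    return (
        row["score"] >= min_score
        and row["p_value"] <= max_p_value
        and row["pct_a_samples"] >= min_pct_a
        and row["pct_b_samples"] >= min_pct_b
        and row["reads_a"] >= min_reads  # read threshold applies to both groups
        and row["reads_b"] >= min_reads
    )


rows = [
    {"score": 8.0, "p_value": 0.01, "pct_a_samples": 100.0,
     "pct_b_samples": 75.0, "reads_a": 12, "reads_b": 9},
    {"score": 8.0, "p_value": 0.20, "pct_a_samples": 100.0,
     "pct_b_samples": 75.0, "reads_a": 12, "reads_b": 9},
    {"score": 8.0, "p_value": 0.01, "pct_a_samples": 100.0,
     "pct_b_samples": 75.0, "reads_a": 12, "reads_b": 3},
]
kept = [row for row in rows if keep_dmr(row, 5, 0.05, 50.0, 50.0, 5)]
print(len(kept))  # prints 1
```

A row is dropped as soon as any single criterion fails, which is why relaxing one threshold at a time is a useful way to diagnose an empty output.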
Usage:
uv run modalysis dmr format \
--input-path /path/to/raw_dmr.bed \
--output-path /path/to/output_dir \
--output-name dmr_formatted \
--allowed-chromosomes /path/to/allowed_chromosomes.txt \
[--min-score 5] \
[--max-p-value 0.05] \
[--min-pct-a-samples 50.0] \
[--min-pct-b-samples 50.0] \
[--min-reads 5] \
[--port 8000]
Parameters:
| Flag | Required | Default | Description |
|---|---|---|---|
| --input-path | Yes | - | Raw DMR file path. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename. |
| --allowed-chromosomes | Yes | - | File with one valid chromosome per line. |
| --min-score | No | | Keep rows with score >= this value. |
| --max-p-value | No | | Keep rows with p-value <= this value. |
| --min-pct-a-samples | No | | Minimum PCT_A_SAMPLES. |
| --min-pct-b-samples | No | | Minimum PCT_B_SAMPLES. |
| --min-reads | No | | Minimum read count in both groups. |
| --port | No | 8000 | Server port. |
Output:
/path/to/output_dir/dmr_formatted.modalysis
modalysis dmr annotate
Purpose: Annotate each formatted DMR interval with overlapping gene regions.
Algorithm:
Parses formatted GFF genes by chromosome.
Builds promoter/body/enhancer regions per gene.
For each DMR interval, finds overlapping genes in each region.
Appends columns PROMOTER, BODY, ENHANCER (comma-separated gene IDs).
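The region-overlap idea can be sketched as follows. The window sizes are placeholders, since the actual promoter and enhancer extents are not stated here. The sketch also assumes the promoter lies upstream of START and the enhancer downstream of END, with strand ignored because the formatted gene table carries no strand column. All names are hypothetical.

```python
PROMOTER_BP = 2000  # hypothetical window size, not the toolkit's value
ENHANCER_BP = 2000  # hypothetical window size, not the toolkit's value


def gene_regions(start, end):
    """Half-open promoter/body/enhancer intervals for one gene."""
    return {
        "PROMOTER": (max(0, start - PROMOTER_BP), start),
        "BODY": (start, end),
        "ENHANCER": (end, end + ENHANCER_BP),
    }


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate_dmr(dmr_start, dmr_end, genes):
    """genes: iterable of (start, end, gene_id) on the DMR's chromosome."""
    hits = {"PROMOTER": [], "BODY": [], "ENHANCER": []}
    for start, end, gene_id in genes:
        for region, (r_start, r_end) in gene_regions(start, end).items():
            if overlaps(dmr_start, dmr_end, r_start, r_end):
                hits[region].append(gene_id)
    return {region: ",".join(ids) for region, ids in hits.items()}


genes = [(5000, 8000, "G1"), (12000, 15000, "G2")]
print(annotate_dmr(4500, 5200, genes))
# prints {'PROMOTER': 'G1', 'BODY': 'G1', 'ENHANCER': ''}
```

A DMR that straddles a gene start lands in both the promoter and body columns for that gene, so one interval can contribute to several regions.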
Usage:
uv run modalysis dmr annotate \
--dmr-path /path/to/dmr_formatted.modalysis \
--gff-path /path/to/formatted_gff.modalysis \
--output-path /path/to/output_dir \
--output-name dmr_annotated \
[--port 8000]
Parameters:
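| Flag | Required | Default | Description |
|---|---|---|---|
| --dmr-path | Yes | - | Formatted DMR .modalysis file. |
| --gff-path | Yes | - | Formatted GFF .modalysis file. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename. |
| --port | No | 8000 | Server port. |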
Output:
/path/to/output_dir/dmr_annotated.modalysis
modalysis dmr gene-counts
Purpose: Count unique genes by manifestation, expression profile, effect sign, modification, and region.
Algorithm:
Validates all list arguments have compatible lengths.
Loads expression mapping from annotated GFF EXPRESSION field.
Reads annotated DMR files and groups genes by: (manifestation, expression_profile, effect_sign, modification, region).
Uses unique gene sets to avoid duplicate counts.
Writes TSV summary rows.
Optional: writes grouped-header Excel workbook (--output-excel).
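The grouping step is essentially a dictionary of sets. The sketch below is a simplification under stated assumptions: records are pre-parsed tuples, a single expression lookup stands in for the per-manifestation mapping, genes without an expression entry get a hypothetical "NA" profile, and the NEGATIVE / NON_NEGATIVE split is assumed to be on the sign of the effect size.

```python
from collections import defaultdict


def count_genes(records, expression_by_gene):
    """records: (manifestation, modification, effect_size, region, gene_id)."""
    groups = defaultdict(set)
    for manifestation, modification, effect_size, region, gene_id in records:
        sign = "NEGATIVE" if effect_size < 0 else "NON_NEGATIVE"
        profile = expression_by_gene.get(gene_id, "NA")  # placeholder for unknown genes
        groups[(manifestation, profile, sign, modification, region)].add(gene_id)
    # Sets collapse repeated DMR hits on the same gene into one count.
    return {key: len(gene_ids) for key, gene_ids in groups.items()}


records = [
    ("M1", "5MC", -0.3, "PROMOTER", "G1"),
    ("M1", "5MC", -0.1, "PROMOTER", "G1"),  # second DMR on the same gene
    ("M1", "5MC", -0.2, "PROMOTER", "G2"),
    ("M1", "5MC", 0.4, "BODY", "G1"),
]
counts = count_genes(records, {"G1": "UP", "G2": "UP"})
print(counts[("M1", "UP", "NEGATIVE", "5MC", "PROMOTER")])  # prints 2
```

G1 is hit by two promoter DMRs but is counted once, which is the point of using sets rather than row counts.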
Usage:
uv run modalysis dmr gene-counts \
--annotated-dmr-paths /path/to/a.modalysis /path/to/b.modalysis \
--manifestations M1 M1 \
--modifications 5MC 5MC_5HMC \
--manifestation-labels M1 \
--expression-labels tissue_1 \
--annotated-gff-path /path/to/gff_annotated.modalysis \
--output-path /path/to/output_dir \
--output-name gene_counts \
[--output-excel] \
[--port 8000]
Parameters:
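| Flag | Required | Default | Description |
|---|---|---|---|
| --annotated-dmr-paths | Yes | - | Annotated DMR inputs. |
| --manifestations | Yes | - | Manifestation label per DMR input. |
| --modifications | Yes | - | Modification label per DMR input. |
| --manifestation-labels | Yes | - | Canonical manifestation labels used for expression matching. |
| --expression-labels | Yes | - | Expression labels mapped to manifestations by order. |
| --annotated-gff-path | Yes | - | Annotated GFF with EXPRESSION column. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename. |
| --output-excel | No | | Also write an .xlsx workbook. |
| --port | No | 8000 | Server port. |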
Output:
/path/to/output_dir/gene_counts.modalysis
Optional: /path/to/output_dir/gene_counts.xlsx
modalysis dmr common-genes
Purpose: Find genes shared between two modifications for each manifestation and region.
Algorithm:
Validates list lengths and that modification A/B differ.
Loads gene expression status from annotated GFF.
From annotated DMRs, collects genes from negative effect-size rows only.
For each manifestation and region, computes set intersection between modification A and B.
Writes summary rows and per-gene rows including expression status.
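The core of this command is a set intersection over negative effect-size rows. A minimal sketch, assuming pre-parsed tuples and a hypothetical function name (the error message mirrors the one listed under Troubleshooting):

```python
from collections import defaultdict


def common_genes(records, modification_a, modification_b):
    """records: (manifestation, modification, effect_size, region, gene_id)."""
    if modification_a == modification_b:
        raise ValueError("Modification A and B must be different")
    gene_sets = defaultdict(set)
    for manifestation, modification, effect_size, region, gene_id in records:
        if effect_size < 0:  # negative effect-size rows only
            gene_sets[(manifestation, region, modification)].add(gene_id)
    pairs = {(m, r) for (m, r, _modification) in gene_sets}
    return {
        (m, r): gene_sets.get((m, r, modification_a), set())
        & gene_sets.get((m, r, modification_b), set())
        for m, r in sorted(pairs)
    }


records = [
    ("M1", "5MC", -0.3, "PROMOTER", "G1"),
    ("M1", "5MC", -0.2, "PROMOTER", "G2"),
    ("M1", "5MC_5HMC", -0.4, "PROMOTER", "G1"),
    ("M1", "5MC_5HMC", 0.5, "PROMOTER", "G2"),  # ignored: non-negative effect
]
print(common_genes(records, "5MC", "5MC_5HMC"))
# prints {('M1', 'PROMOTER'): {'G1'}}
```

G2 is excluded because its 5MC_5HMC row has a non-negative effect size, so it never enters the second set.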
Usage:
uv run modalysis dmr common-genes \
--annotated-dmr-paths /path/to/a.modalysis /path/to/b.modalysis \
--manifestations M1 M1 \
--modifications 5MC 5MC_5HMC \
--manifestation-labels M1 \
--expression-labels tissue_1 \
--modification-a 5MC \
--modification-b 5MC_5HMC \
--annotated-gff-path /path/to/gff_annotated.modalysis \
--output-path /path/to/output_dir \
--output-name common_genes \
[--port 8000]
Parameters:
| Flag | Required | Default | Description |
|---|---|---|---|
| --annotated-dmr-paths | Yes | - | Annotated DMR inputs. |
| --manifestations | Yes | - | Manifestation label per DMR input. |
| --modifications | Yes | - | Modification label per DMR input. |
| --manifestation-labels | Yes | - | Canonical manifestation labels used for expression matching. |
| --expression-labels | Yes | - | Expression labels mapped to manifestations by order. |
| --modification-a | Yes | - | First modification for intersection. |
| --modification-b | Yes | - | Second modification for intersection. |
| --annotated-gff-path | Yes | - | Annotated GFF with expression data. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename. |
| --port | No | 8000 | Server port. |
Output:
/path/to/output_dir/common_genes.modalysis
modalysis plot mean-methylation
Purpose: Plot mean methylation by chromosome, grouped by region (promoter/body/enhancer).
Algorithm:
Builds gene regions from formatted GFF.
For each merged pileup file, accumulates N_MOD / N_VALID_COV by chromosome and region.
Draws line plots across region-partitioned x-axis.
Supports optional chromosome ordering and custom title.
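The accumulation step can be sketched as below. It is an assumption that the mean is the ratio of summed N_MOD to summed N_VALID_COV; a mean of per-site ratios is equally plausible from the description. Sites are assumed to be already assigned to a region, and the function name is hypothetical.

```python
from collections import defaultdict


def mean_methylation(sites):
    """sites: (chromosome, region, n_valid_cov, n_mod) per covered position."""
    sums = defaultdict(lambda: [0, 0])  # n_mod, n_valid_cov
    for chrom, region, n_valid, n_mod in sites:
        sums[(chrom, region)][0] += n_mod
        sums[(chrom, region)][1] += n_valid
    # Ratio of sums (assumed); groups with no coverage are skipped.
    return {
        key: n_mod / n_valid
        for key, (n_mod, n_valid) in sums.items()
        if n_valid > 0
    }


sites = [
    ("chr1", "PROMOTER", 10, 1),
    ("chr1", "PROMOTER", 30, 3),
    ("chr1", "BODY", 20, 5),
]
means = mean_methylation(sites)
print(means[("chr1", "BODY")])  # prints 0.25
```

With a ratio of sums, high-coverage positions weigh more than low-coverage ones, which is worth keeping in mind when choosing --y-min and --y-max.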
Usage:
uv run modalysis plot mean-methylation \
--gff-path /path/to/formatted_gff.modalysis \
--merged-pileup-paths /path/to/m1.modalysis /path/to/m2.modalysis \
--labels 5MC 5MC_5HMC \
--output-path /path/to/output_dir \
--output-name mean_methylation \
[--y-min 0.0] \
[--y-max 0.1] \
[--chromosome-order-path /path/to/order.txt] \
[--plot-title "Custom Title"] \
[--port 8000]
Parameters:
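| Flag | Required | Default | Description |
|---|---|---|---|
| --gff-path | Yes | - | Formatted GFF .modalysis file. |
| --merged-pileup-paths | Yes | - | One or more merged pileup .modalysis files. |
| --labels | Yes | - | Display label per merged pileup path. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename (the .png extension is appended). |
| --y-min | No | | Y-axis lower bound. |
| --y-max | No | | Y-axis upper bound. |
| --chromosome-order-path | No | | Optional chromosome ordering file. |
| --plot-title | No | | Optional plot title override. |
| --port | No | 8000 | Server port. |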
Output:
/path/to/output_dir/mean_methylation.png
modalysis plot gene-heatmap
Purpose: Generate gene-level heatmaps for manifestation/expression/effect-sign/modification combinations.
Algorithm:
Builds manifestation->expression label mapping.
Loads per-gene expression from annotated GFF.
Collects genes per combination from annotated DMRs.
Accumulates per-gene methylation means from merged pileups.
Renders one heatmap per non-empty combination with shared color scale.
Usage:
uv run modalysis plot gene-heatmap \
--annotated-dmr-paths /path/to/dmr1.modalysis /path/to/dmr2.modalysis \
--manifestations M1 M1 \
--modifications 5MC 5MC_5HMC \
--manifestation-labels M1 \
--expression-labels tissue_1 \
--annotated-gff-path /path/to/gff_annotated.modalysis \
--gff-path /path/to/formatted_gff.modalysis \
--merged-pileup-paths /path/to/p1.modalysis /path/to/p2.modalysis \
--pileup-manifestations M1 M1 \
--pileup-modifications 5MC 5MC_5HMC \
--output-path /path/to/output_dir \
--output-name heatmap \
[--show-gene-labels] \
[--effect-signs NEGATIVE NON_NEGATIVE] \
[--port 8000]
Parameters:
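| Flag | Required | Default | Description |
|---|---|---|---|
| --annotated-dmr-paths | Yes | - | Annotated DMR inputs. |
| --manifestations | Yes | - | Manifestation label per DMR input. |
| --modifications | Yes | - | Modification label per DMR input. |
| --manifestation-labels | Yes | - | Canonical manifestation labels. |
| --expression-labels | Yes | - | Expression labels aligned to manifestation labels. |
| --annotated-gff-path | Yes | - | Annotated GFF with EXPRESSION column. |
| --gff-path | Yes | - | Formatted GFF for gene coordinates. |
| --merged-pileup-paths | Yes | - | Merged pileup inputs. |
| --pileup-manifestations | Yes | - | Manifestation label per merged pileup path. |
| --pileup-modifications | Yes | - | Modification label per merged pileup path. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output prefix (files are named heatmap_<...>.png for the prefix heatmap). |
| --show-gene-labels | No | | Show gene IDs on y-axis. |
| --effect-signs | No | both | Restrict to NEGATIVE and/or NON_NEGATIVE. |
| --port | No | 8000 | Server port. |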
Output:
Multiple PNG files like /path/to/output_dir/heatmap_<...>.png
modalysis plot dmr-dotplot
Purpose: Plot DMR positions within promoter/body/enhancer panels for each gene.
Algorithm:
Loads expression states and gene coordinates.
Converts each DMR to region-relative position:
promoter: distance to gene start
body: percent through gene body
enhancer: distance from gene end
Groups positions by manifestation/expression/effect-sign/modification/gene.
Renders one 3-panel dotplot per non-empty combination.
Draws consensus windows containing many distinct genes.
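The three region-relative coordinates can be written out explicitly. The sketch assumes the DMR is represented by its midpoint and that promoter and enhancer distances are reported as positive values. Neither detail is stated above, and the function name is hypothetical.

```python
def relative_position(region, dmr_start, dmr_end, gene_start, gene_end):
    """Map a DMR midpoint (assumed anchor) to the coordinate used by its panel."""
    midpoint = (dmr_start + dmr_end) / 2
    if region == "PROMOTER":
        return gene_start - midpoint  # distance to gene start
    if region == "BODY":
        # percent through the gene body; assumes gene_end > gene_start
        return 100.0 * (midpoint - gene_start) / (gene_end - gene_start)
    if region == "ENHANCER":
        return midpoint - gene_end  # distance from gene end
    raise ValueError(f"unknown region: {region}")


gene_start, gene_end = 5000, 8000
print(relative_position("PROMOTER", 4400, 4600, gene_start, gene_end))  # prints 500.0
print(relative_position("BODY", 5700, 5800, gene_start, gene_end))      # prints 25.0
print(relative_position("ENHANCER", 8900, 9100, gene_start, gene_end))  # prints 1000.0
```

Expressing body positions as a percentage lets genes of very different lengths share one x-axis, while promoter and enhancer panels stay in base pairs.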
Usage:
uv run modalysis plot dmr-dotplot \
--annotated-dmr-paths /path/to/dmr1.modalysis /path/to/dmr2.modalysis \
--manifestations M1 M1 \
--modifications 5MC 5MC_5HMC \
--manifestation-labels M1 \
--expression-labels tissue_1 \
--annotated-gff-path /path/to/gff_annotated.modalysis \
--gff-path /path/to/formatted_gff.modalysis \
--output-path /path/to/output_dir \
--output-name dotplot \
[--show-gene-labels] \
[--effect-signs NEGATIVE NON_NEGATIVE] \
[--port 8000]
Parameters:
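| Flag | Required | Default | Description |
|---|---|---|---|
| --annotated-dmr-paths | Yes | - | Annotated DMR inputs. |
| --manifestations | Yes | - | Manifestation label per DMR input. |
| --modifications | Yes | - | Modification label per DMR input. |
| --manifestation-labels | Yes | - | Canonical manifestation labels. |
| --expression-labels | Yes | - | Expression labels aligned to manifestation labels. |
| --annotated-gff-path | Yes | - | Annotated GFF with EXPRESSION column. |
| --gff-path | Yes | - | Formatted GFF for coordinates. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output prefix (files are named dotplot_<...>.png for the prefix dotplot). |
| --show-gene-labels | No | | Show gene IDs. |
| --effect-signs | No | both | Restrict to NEGATIVE and/or NON_NEGATIVE. |
| --port | No | 8000 | Server port. |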
Output:
Multiple PNG files like /path/to/output_dir/dotplot_<...>.png
modalysis plot common-genes-venn
Purpose: Plot Venn diagrams of common negative-DMR genes for two modifications.
Algorithm:
From annotated DMR inputs, keeps only rows with negative effect size.
Collects gene sets by (manifestation, modification, region).
For each manifestation and each region, draws set overlap panel for modification A vs B.
Usage:
uv run modalysis plot common-genes-venn \
--annotated-dmr-paths /path/to/dmr1.modalysis /path/to/dmr2.modalysis \
--manifestations M1 M1 \
--modifications 5MC 5MC_5HMC \
--modification-a 5MC \
--modification-b 5MC_5HMC \
--output-path /path/to/output_dir \
--output-name common_venn \
[--port 8000]
Parameters:
| Flag | Required | Default | Description |
|---|---|---|---|
| --annotated-dmr-paths | Yes | - | Annotated DMR inputs. |
| --manifestations | Yes | - | Manifestation label per DMR input. |
| --modifications | Yes | - | Modification label per DMR input. |
| --modification-a | Yes | - | First modification to compare. |
| --modification-b | Yes | - | Second modification to compare. |
| --output-path | Yes | - | Output directory. |
| --output-name | Yes | - | Output basename (the .png extension is appended). |
| --port | No | 8000 | Server port. |
Output:
/path/to/output_dir/common_venn.png
Troubleshooting
ConnectionError / request failures:
Ensure uv run modalysis server is running on the same port passed to the command's --port.
Validation errors about list lengths:
In DMR/plot aggregation commands, ensure paired list arguments have matching lengths and consistent ordering.
Empty/near-empty outputs:
Relax thresholds such as --min-score, --max-p-value, --min-file-coverage, --min-reads.
Verify chromosome naming in input files matches your allowed chromosome list.
ValueError: Modification A and B must be different:
Use distinct values for --modification-a and --modification-b.
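For the list-length errors mentioned above, a small pre-flight check in a wrapper script can catch mismatches before a command is sent to the server. This helper is hypothetical and is not part of modalysis; it only illustrates the kind of check the server-side validation is described as doing.

```python
def check_paired_lengths(**named_lists):
    """Raise ValueError unless all keyword lists have the same length."""
    lengths = {name: len(values) for name, values in named_lists.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Mismatched list lengths: {lengths}")
    return lengths


# Passes: one manifestation and one modification label per DMR path.
check_paired_lengths(
    annotated_dmr_paths=["/path/to/a.modalysis", "/path/to/b.modalysis"],
    manifestations=["M1", "M1"],
    modifications=["5MC", "5MC_5HMC"],
)
```

Matching lengths are necessary but not sufficient: the i-th manifestation and modification must also describe the i-th path.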
Testing
Run the full suite:
uv run pytest -q
Run with coverage:
uv run pytest --cov=modalysis --cov-report=term-missing
Run focused suites:
uv run pytest tests/core -q
uv run pytest tests/server -q
uv run pytest tests/client -q
uv run pytest tests/cli -q
uv run pytest tests/e2e -q
Build Docs
Build the Sphinx site:
uv run sphinx-build -b html docs docs/_build/html
Open docs/_build/html/index.html in a browser.
Deploy the built site with Wrangler (Cloudflare Pages):
pnpm wrangler pages deploy docs/_build/html --project-name modalysis