Core Modules
Chromosome normalization helpers.
- modalysis.core.chromosomes.normalize_allowed_chromosomes(allowed_chromosomes: list[str]) -> set[str]
Normalize chromosome names to uppercase for case-insensitive filtering.
- Parameters:
allowed_chromosomes (list[str])
- Return type:
set[str]
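Below is a minimal sketch of the documented behaviour. It assumes normalization is nothing more than uppercasing, with no whitespace trimming and no "chr" prefix handling. The function name is illustrative and is not the library's code.

```python
def normalize_chromosome_names(allowed_chromosomes: list[str]) -> set[str]:
    # Uppercase every name so "chr1", "Chr1" and "CHR1" collapse to one entry.
    # Assumption: no other normalization (trimming, prefix handling) is applied.
    return {name.upper() for name in allowed_chromosomes}


allowed = normalize_chromosome_names(["chr1", "Chr1", "chrX"])
# A row is kept if its chromosome, uppercased the same way, is in the set.
print("chrx".upper() in allowed)
```

Returning a set gives average O(1) membership checks when filtering many rows.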
Expression field parsing utilities.
- modalysis.core.expression.parse_expression_field(expression_field: str) -> dict[str, str]
Parse expression text of the form LABEL: VALUE; … into an uppercase mapping.
- Parameters:
expression_field (str)
- Return type:
dict[str, str]
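The following sketch shows one way to parse the LABEL: VALUE; … format. It relies on two assumptions the reference does not state: both labels and values are uppercased, and entries without a colon are skipped. The function name is hypothetical.

```python
def parse_expression_text(expression_field: str) -> dict[str, str]:
    mapping: dict[str, str] = {}
    for entry in expression_field.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty fields and a trailing ";"
        # partition splits at the first ":" only, so values may contain ":".
        label, separator, value = entry.partition(":")
        if not separator:
            continue  # assumption: malformed entries without ":" are ignored
        mapping[label.strip().upper()] = value.strip().upper()
    return mapping


print(parse_expression_text("Liver: up; Brain: Down;"))
```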
Gene-region parsing and interval lookup helpers.
- class modalysis.core.gene_regions.ChromosomeRegions
Bases: TypedDict
- promoter: list[tuple[int, int, str]]
- body: list[tuple[int, int, str]]
- enhancer: list[tuple[int, int, str]]
- promoter_starts: list[int]
- body_starts: list[int]
- enhancer_starts: list[int]
- modalysis.core.gene_regions.parse_gff(gff_path: str) -> dict[str, list[tuple[int, int, str]]]
Parse a formatted GFF .modalysis file.
Returns a dict: chromosome -> sorted list of (start, end, gene_id).
- Parameters:
gff_path (str)
- Return type:
dict[str, list[tuple[int, int, str]]]
- modalysis.core.gene_regions.build_gene_regions(genes_by_chromosome: dict[str, list[tuple[int, int, str]]], promoter_upstream: int = 1000, enhancer_downstream: int = 1000) -> dict[str, ChromosomeRegions]
Build promoter/body/enhancer region boundaries for annotation lookup.
- Parameters:
genes_by_chromosome (dict[str, list[tuple[int, int, str]]])
promoter_upstream (int)
enhancer_downstream (int)
- Return type:
dict[str, ChromosomeRegions]
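The ChromosomeRegions fields suggest parallel lists. Each region list holds (start, end, gene_id) tuples sorted by start, and the matching *_starts list holds only the start coordinates so they can be passed to bisect.

The sketch below builds that structure under several assumptions the reference does not confirm:

- strand is ignored;
- the promoter spans promoter_upstream bases before the gene start, clamped at 0;
- the enhancer spans enhancer_downstream bases after the gene end.

build_regions is a hypothetical name.

```python
from typing import TypedDict

Interval = tuple[int, int, str]  # (start, end, gene_id)


class ChromosomeRegions(TypedDict):
    promoter: list[Interval]
    body: list[Interval]
    enhancer: list[Interval]
    promoter_starts: list[int]
    body_starts: list[int]
    enhancer_starts: list[int]


def build_regions(
    genes_by_chromosome: dict[str, list[Interval]],
    promoter_upstream: int = 1000,
    enhancer_downstream: int = 1000,
) -> dict[str, ChromosomeRegions]:
    regions: dict[str, ChromosomeRegions] = {}
    for chromosome, genes in genes_by_chromosome.items():
        # Assumption: strand is ignored; the promoter precedes the gene start
        # (clamped at 0) and the enhancer follows the gene end.
        promoter = sorted(
            (max(0, start - promoter_upstream), start, gene) for start, _, gene in genes
        )
        body = sorted(genes)
        enhancer = sorted((end, end + enhancer_downstream, gene) for _, end, gene in genes)
        regions[chromosome] = ChromosomeRegions(
            promoter=promoter,
            body=body,
            enhancer=enhancer,
            # Parallel start lists let callers bisect on plain integers.
            promoter_starts=[region[0] for region in promoter],
            body_starts=[region[0] for region in body],
            enhancer_starts=[region[0] for region in enhancer],
        )
    return regions
```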
- modalysis.core.gene_regions.find_genes_at_position(position: int, region_list: list[tuple[int, int, str]], starts_list: list[int]) -> list[str]
Find all gene IDs whose region contains the given position.
- Parameters:
position (int)
region_list (list[tuple[int, int, str]])
starts_list (list[int])
- Return type:
list[str]
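Because region_list is sorted by start and starts_list mirrors those starts, bisect can discard every region that begins after the query position.

Here is a minimal sketch. It assumes regions are half-open (start <= position < end), which is the same rule stated for _find_overlapping_regions below. The function name is illustrative.

```python
from bisect import bisect_right


def genes_at_position(
    position: int,
    region_list: list[tuple[int, int, str]],
    starts_list: list[int],
) -> list[str]:
    # Regions at index >= cut start after `position`, so they cannot contain it.
    cut = bisect_right(starts_list, position)
    # Earlier regions may already have ended; check each half-open span.
    return [gene_id for start, end, gene_id in region_list[:cut] if start <= position < end]


regions = [(0, 100, "A"), (50, 150, "B"), (200, 300, "C")]
print(genes_at_position(60, regions, [0, 50, 200]))
```

Bisect only bounds the candidates from the right. Regions have different lengths, so every earlier region still has to be checked, and the worst case stays linear in the number of regions on the chromosome.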
- modalysis.core.gene_regions.find_genes_overlapping_interval(interval_start: int, interval_end: int, region_list: list[tuple[int, int, str]], starts_list: list[int]) -> list[str]
Find all gene IDs whose region overlaps the given half-open interval [start, end).
- Parameters:
interval_start (int)
interval_end (int)
region_list (list[tuple[int, int, str]])
starts_list (list[int])
- Return type:
list[str]
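Two half-open intervals [a, b) and [c, d) overlap exactly when c < b and a < d. The sketch below uses bisect to enforce the first condition and a filter to enforce the second. It assumes regions are also half-open, and the function name is illustrative.

```python
from bisect import bisect_left


def genes_overlapping(
    interval_start: int,
    interval_end: int,
    region_list: list[tuple[int, int, str]],
    starts_list: list[int],
) -> list[str]:
    # An empty interval [x, x) overlaps nothing.
    if interval_end <= interval_start:
        return []
    # Keep only regions whose start is < interval_end (condition c < b).
    cut = bisect_left(starts_list, interval_end)
    # Remaining condition a < d: the region must end after interval_start.
    return [gene_id for _, end, gene_id in region_list[:cut] if end > interval_start]


regions = [(0, 100, "A"), (50, 150, "B"), (200, 300, "C")]
print(genes_overlapping(100, 200, regions, [0, 50, 200]))
```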
Core GFF formatting and expression annotation routines.
- modalysis.core.gff.format(input_path: str, output_path: str, output_name: str, allowed_chromosomes: list[str]) -> None
Format raw GFF rows into gene-level .modalysis output.
- Parameters:
input_path (str)
output_path (str)
output_name (str)
allowed_chromosomes (list[str])
- Return type:
None
- modalysis.core.gff.annotate(gff_path: str, expression_paths: list[str], expression_labels: list[str], output_path: str, output_name: str) -> None
Annotate formatted GFF genes with one or more expression sources.
- Parameters:
gff_path (str)
expression_paths (list[str])
expression_labels (list[str])
output_path (str)
output_name (str)
- Return type:
None
Core pileup formatting and merge routines.
- modalysis.core.pileup.format(input_path: str, output_path: str, output_name: str, allowed_chromosomes: list[str]) -> None
Format a raw pileup file into canonical .modalysis columns.
- Parameters:
input_path (str)
output_path (str)
output_name (str)
allowed_chromosomes (list[str])
- Return type:
None
- modalysis.core.pileup.merge(pileup_paths: list[str], output_path: str, output_name: str, min_files: int = 2, min_file_coverage: float = 50.0, min_reads: int = 5) -> None
Merge formatted pileup files by genomic key and apply coverage/read filters.
- Parameters:
pileup_paths (list[str])
output_path (str)
output_name (str)
min_files (int)
min_file_coverage (float)
min_reads (int)
- Return type:
None
Core DMR formatting, annotation, and aggregation routines.
- modalysis.core.dmr._to_excel_column_name(column_index: int) -> str
Convert a 1-based integer column index to Excel column-letter notation.
- Parameters:
column_index (int)
- Return type:
str
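Excel column letters form a bijective base-26 numbering with no zero digit (1 -> A, 26 -> Z, 27 -> AA). The usual technique is to subtract 1 before each divmod. The sketch below shows that standard conversion; it is not a copy of the module's private helper.

```python
def to_excel_column_name(column_index: int) -> str:
    if column_index < 1:
        raise ValueError("column_index must be >= 1")
    letters: list[str] = []
    while column_index > 0:
        # Shift to 0-based before dividing, because there is no "zero" letter.
        column_index, remainder = divmod(column_index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    # Digits were produced least-significant first.
    return "".join(reversed(letters))


print([to_excel_column_name(i) for i in (1, 26, 27, 52, 703)])
```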
- modalysis.core.dmr._excel_inline_string_cell(row: int, column: int, value: str) -> str
Build XML for an inline string cell in an XLSX worksheet.
- Parameters:
row (int)
column (int)
value (str)
- Return type:
str
- modalysis.core.dmr._excel_number_cell(row: int, column: int, value: int) -> str
Build XML for a numeric cell in an XLSX worksheet.
- Parameters:
row (int)
column (int)
value (int)
- Return type:
str
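SpreadsheetML is the XML inside an .xlsx worksheet. In it, an inline string cell is written as <c r="A1" t="inlineStr"><is><t>text</t></is></c>, and a numeric cell as <c r="B1"><v>42</v></c>.

The sketch below implements builders for both cell types. It assumes 1-based row and column indices and escapes XML special characters in string values. The function names are illustrative.

```python
from xml.sax.saxutils import escape


def column_letters(column_index: int) -> str:
    # 1-based bijective base-26 conversion (1 -> A, 27 -> AA).
    letters: list[str] = []
    while column_index > 0:
        column_index, remainder = divmod(column_index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def inline_string_cell(row: int, column: int, value: str) -> str:
    reference = f"{column_letters(column)}{row}"
    # escape() converts &, < and > so arbitrary text stays well-formed XML.
    return f'<c r="{reference}" t="inlineStr"><is><t>{escape(value)}</t></is></c>'


def number_cell(row: int, column: int, value: int) -> str:
    reference = f"{column_letters(column)}{row}"
    # No t attribute: the cell type defaults to number.
    return f'<c r="{reference}"><v>{value}</v></c>'


print(inline_string_cell(1, 1, "A&B"))
print(number_cell(2, 28, 42))
```

Inline strings avoid writing a separate sharedStrings part, which keeps a small generated workbook compact.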
- modalysis.core.dmr._write_gene_counts_excel(output_path_with_name: Path, manifestation_order: list[str], modification_order: list[str], count_lookup: dict[tuple[str, str, str, str, str], int]) -> None
Write a compact XLSX workbook for aggregated DMR gene counts.
- Parameters:
output_path_with_name (Path)
manifestation_order (list[str])
modification_order (list[str])
count_lookup (dict[tuple[str, str, str, str, str], int])
- Return type:
None
- modalysis.core.dmr.format(input_path: str, output_path: str, output_name: str, allowed_chromosomes: list[str], min_score: float = 5, max_p_value: float = 0.05, min_pct_a_samples: float = 50.0, min_pct_b_samples: float = 50.0, min_reads: int = 5) -> None
Filter and normalize raw DMR rows into the .modalysis schema.
- Parameters:
input_path (str)
output_path (str)
output_name (str)
allowed_chromosomes (list[str])
min_score (float)
max_p_value (float)
min_pct_a_samples (float)
min_pct_b_samples (float)
min_reads (int)
- Return type:
None
- modalysis.core.dmr.annotate(dmr_path: str, gff_path: str, output_path: str, output_name: str) -> None
Annotate DMR intervals with overlapping promoter/body/enhancer gene IDs.
- Parameters:
dmr_path (str)
gff_path (str)
output_path (str)
output_name (str)
- Return type:
None
- modalysis.core.dmr.gene_counts(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_labels: list[str], expression_labels: list[str], annotated_gff_path: str, output_path: str, output_name: str, output_excel: bool = False) -> None
Count unique genes by manifestation/expression/effect/modification/region.
- Parameters:
annotated_dmr_paths (list[str])
manifestations (list[str])
modifications (list[str])
manifestation_labels (list[str])
expression_labels (list[str])
annotated_gff_path (str)
output_path (str)
output_name (str)
output_excel (bool)
- Return type:
None
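Counting unique genes per key amounts to collecting a set of gene IDs for each (manifestation, expression, effect, modification, region) tuple and taking its size. That yields the same dict[tuple[str, str, str, str, str], int] shape as the count_lookup parameter of _write_gene_counts_excel.

This is a sketch over already-parsed rows. The row layout, the label values, and the function name are hypothetical.

```python
from collections import defaultdict

# (manifestation, expression, effect_sign, modification, region)
CountKey = tuple[str, str, str, str, str]


def count_unique_genes(rows: list[tuple[str, str, str, str, str, str]]) -> dict[CountKey, int]:
    # Each row: (manifestation, expression, effect_sign, modification, region, gene_id).
    genes_by_key: defaultdict[CountKey, set[str]] = defaultdict(set)
    for manifestation, expression, effect_sign, modification, region, gene_id in rows:
        key = (manifestation, expression, effect_sign, modification, region)
        # A set ignores a gene hit by several DMRs under the same key.
        genes_by_key[key].add(gene_id)
    return {key: len(genes) for key, genes in genes_by_key.items()}


rows = [
    ("M1", "UP", "NEG", "5MC", "PROMOTER", "G1"),
    ("M1", "UP", "NEG", "5MC", "PROMOTER", "G1"),
    ("M1", "UP", "NEG", "5MC", "PROMOTER", "G2"),
    ("M1", "UP", "POS", "5MC", "BODY", "G1"),
]
print(count_unique_genes(rows))
```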
- modalysis.core.dmr.common_genes(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_labels: list[str], expression_labels: list[str], modification_a: str, modification_b: str, annotated_gff_path: str, output_path: str, output_name: str) -> None
Find common negative-effect genes across two modifications by region.
- Parameters:
annotated_dmr_paths (list[str])
manifestations (list[str])
modifications (list[str])
manifestation_labels (list[str])
expression_labels (list[str])
modification_a (str)
modification_b (str)
annotated_gff_path (str)
output_path (str)
output_name (str)
- Return type:
None
Formatting helpers for plot labels.
- modalysis.core.plots.label_format.format_modification_label(modification: str) -> str
Convert normalized modification labels into human-readable labels.
- Parameters:
modification (str)
- Return type:
str
Mean methylation line-plot generation across regions and chromosomes.
- modalysis.core.plots.mean_methylation._find_overlapping_regions(position: int, region_list: list[tuple[int, int, str]], starts_list: list[int]) -> bool
Check whether a position overlaps any region, using binary search.
Returns True if the position falls within at least one region. A position overlaps a region if region_start <= position < region_end.
- Parameters:
position (int)
region_list (list[tuple[int, int, str]])
starts_list (list[int])
- Return type:
bool
- modalysis.core.plots.mean_methylation._accumulate_pileup(merged_pileup_path: str, regions: dict[str, ChromosomeRegions]) -> dict[tuple[str, str], list[int]]
Read a merged pileup file and accumulate n_valid_cov and n_mod per (chromosome, region).
- Returns:
(chromosome, region_name) -> [sum_n_valid_cov, sum_n_mod]
- Return type:
dict
- Parameters:
merged_pileup_path (str)
regions (dict[str, ChromosomeRegions])
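The returned [sum_n_valid_cov, sum_n_mod] pairs are enough to compute a coverage-weighted mean methylation fraction per (chromosome, region) as sum_n_mod / sum_n_valid_cov.

This sketch rests on several assumptions:

- pileup rows have already been assigned to regions;
- the plot uses this ratio;
- the record layout and function names shown here are hypothetical.

```python
from collections import defaultdict


def accumulate_region_counts(
    records: list[tuple[str, str, int, int]],
) -> dict[tuple[str, str], list[int]]:
    # records: (chromosome, region_name, n_valid_cov, n_mod), one per pileup row
    # already assigned to a region (hypothetical layout).
    totals: defaultdict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for chromosome, region_name, n_valid_cov, n_mod in records:
        entry = totals[(chromosome, region_name)]
        entry[0] += n_valid_cov
        entry[1] += n_mod
    return dict(totals)


def mean_methylation(totals: dict[tuple[str, str], list[int]]) -> dict[tuple[str, str], float]:
    # Coverage-weighted mean: total modified calls over total valid coverage.
    # Keys with zero coverage are dropped to avoid dividing by zero.
    return {
        key: n_mod / n_valid_cov
        for key, (n_valid_cov, n_mod) in totals.items()
        if n_valid_cov > 0
    }


records = [("CHR1", "PROMOTER", 10, 1), ("CHR1", "PROMOTER", 30, 3), ("CHR1", "BODY", 0, 0)]
print(mean_methylation(accumulate_region_counts(records)))
```

Summing counts before dividing weights each site by its coverage, so low-coverage sites do not dominate the mean the way an average of per-site fractions could.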
- modalysis.core.plots.mean_methylation.plot_mean_methylation(gff_path: str, merged_pileup_paths: list[str], labels: list[str], output_path: str, output_name: str, y_min: float = 0.0, y_max: float = 0.1, chromosome_order: list[str] | None = None, plot_title: str | None = None) -> None
Generate region-grouped chromosome methylation line plots.
- Parameters:
gff_path (str)
merged_pileup_paths (list[str])
labels (list[str])
output_path (str)
output_name (str)
y_min (float)
y_max (float)
chromosome_order (list[str] | None)
plot_title (str | None)
- Return type:
None
Gene-level methylation heatmap generation.
- modalysis.core.plots.gene_heatmap._collect_genes_by_combination(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_to_expression_label: dict[str, str], gene_to_expression: dict[str, dict[str, str]]) -> dict[tuple[str, str, str, str, str], set[str]]
Read annotated DMR files and collect the set of genes for each (manifestation, expression_profile, effect_sign, modification, region) combination.
- Returns:
key -> set of gene_ids
- Return type:
dict
- Parameters:
annotated_dmr_paths (list[str])
manifestations (list[str])
modifications (list[str])
manifestation_to_expression_label (dict[str, str])
gene_to_expression (dict[str, dict[str, str]])
- modalysis.core.plots.gene_heatmap._accumulate_pileup_per_gene(merged_pileup_path: str, regions: dict[str, ChromosomeRegions]) -> dict[tuple[str, str], list[int]]
Read a merged pileup file and accumulate n_valid_cov and n_mod per (gene_id, region_name).
- Returns:
(gene_id, region_name) -> [sum_n_valid_cov, sum_n_mod]
- Return type:
dict
- Parameters:
merged_pileup_path (str)
regions (dict[str, ChromosomeRegions])
- modalysis.core.plots.gene_heatmap.plot_gene_heatmap(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_labels: list[str], expression_labels: list[str], annotated_gff_path: str, gff_path: str, merged_pileup_paths: list[str], pileup_manifestations: list[str], pileup_modifications: list[str], output_path: str, output_name: str, show_gene_labels: bool = False, effect_signs: list[str] | None = None) -> None
Render per-combination heatmaps using DMR-selected genes and pileup means.
- Parameters:
annotated_dmr_paths (list[str])
manifestations (list[str])
modifications (list[str])
manifestation_labels (list[str])
expression_labels (list[str])
annotated_gff_path (str)
gff_path (str)
merged_pileup_paths (list[str])
pileup_manifestations (list[str])
pileup_modifications (list[str])
output_path (str)
output_name (str)
show_gene_labels (bool)
effect_signs (list[str] | None)
- Return type:
None
DMR position dotplot generation within promoter/body/enhancer regions.
- modalysis.core.plots.dmr_dotplot._build_gene_coordinate_lookup(gff_path: str) -> dict[str, tuple[str, int, int]]
Build a lookup from gene_id -> (chromosome, start, end) using the formatted GFF file.
- Returns:
gene_id (uppercase) -> (chromosome, start, end)
- Return type:
dict
- Parameters:
gff_path (str)
- modalysis.core.plots.dmr_dotplot._collect_dmr_positions(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_to_expression_label: dict[str, str], gene_to_expression: dict[str, dict[str, str]], gene_coords: dict[str, tuple[str, int, int]]) -> dict[tuple[str, str, str, str, str, str], list[float]]
Read annotated DMR files and collect the position of each DMR within its gene region.
- Returns:
(manifestation, expression_profile, effect_sign, modification, gene_id, region) -> list of float positions, where:
- PROMOTER: distance from gene start (-1000 = far upstream, 0 = gene start)
- BODY: percentage (0-100)
- ENHANCER: distance from gene end (0-1000)
- Return type:
dict
- Parameters:
annotated_dmr_paths (list[str])
manifestations (list[str])
modifications (list[str])
manifestation_to_expression_label (dict[str, str])
gene_to_expression (dict[str, dict[str, str]])
gene_coords (dict[str, tuple[str, int, int]])
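The three position scales listed above can be computed from a single DMR coordinate and the gene's start and end.

This sketch assumes:

- the caller passes one representative DMR coordinate, for example its midpoint;
- strand is ignored, so upstream always means lower coordinates.

position_in_region is a hypothetical name.

```python
def position_in_region(region: str, dmr_position: float, gene_start: int, gene_end: int) -> float:
    # Assumption: strand is ignored, so "upstream" means lower coordinates.
    if region == "PROMOTER":
        # Negative offset from the gene start, e.g. -1000 .. 0 for a 1000 bp promoter.
        return dmr_position - gene_start
    if region == "BODY":
        if gene_end <= gene_start:
            raise ValueError("gene_end must be greater than gene_start")
        # Percentage of the way from gene start (0) to gene end (100).
        return 100.0 * (dmr_position - gene_start) / (gene_end - gene_start)
    if region == "ENHANCER":
        # Positive offset past the gene end, e.g. 0 .. 1000.
        return dmr_position - gene_end
    raise ValueError(f"unknown region: {region}")


dmr_start, dmr_end = 6400, 6600
print(position_in_region("BODY", (dmr_start + dmr_end) / 2, 5000, 8000))
```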
- modalysis.core.plots.dmr_dotplot._find_consensus_window(region_points: list[tuple[float, str]], window_size: float, min_genes: int) -> tuple[float, float] | None
Find a window containing points from at least min_genes distinct genes.
- Parameters:
region_points (list[tuple[float, str]])
window_size (float)
min_genes (int)
- Return type:
tuple[float, float] | None
- modalysis.core.plots.dmr_dotplot._render_dotplot(gene_positions: dict[str, dict[str, list[float]]], title: str, output_file_path: Path, show_gene_labels: bool = False) -> bool
Render a single dotplot PNG.
- Parameters:
gene_positions (dict[str, dict[str, list[float]]]) – dict of gene_id -> {region -> [positions]} where region is PROMOTER, BODY, or ENHANCER
title (str) – plot title string
output_file_path (Path) – Path object for output file
show_gene_labels (bool)
- Return type:
bool
- modalysis.core.plots.dmr_dotplot.plot_dmr_dotplot(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_labels: list[str], expression_labels: list[str], annotated_gff_path: str, gff_path: str, output_path: str, output_name: str, show_gene_labels: bool = False, effect_signs: list[str] | None = None) -> None
Render DMR position dotplots for each manifestation/expression/modification slice.
- Parameters:
annotated_dmr_paths (list[str])
manifestations (list[str])
modifications (list[str])
manifestation_labels (list[str])
expression_labels (list[str])
annotated_gff_path (str)
gff_path (str)
output_path (str)
output_name (str)
show_gene_labels (bool)
effect_signs (list[str] | None)
- Return type:
None
Venn plotting for overlapping negative DMR genes across modifications.
- modalysis.core.plots.common_genes_venn._collect_negative_gene_sets(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str]) -> tuple[dict[tuple[str, str, str], set[str]], list[str]]
Collect per-region gene sets from negative-effect DMR rows only.
- Parameters:
annotated_dmr_paths (list[str])
manifestations (list[str])
modifications (list[str])
- Return type:
tuple[dict[tuple[str, str, str], set[str]], list[str]]
- modalysis.core.plots.common_genes_venn._draw_venn_panel(ax: Axes, set_a: set[str], set_b: set[str], label_a: str, label_b: str, title: str) -> None
Draw one two-set Venn-like panel with counts and labels.
- Parameters:
ax (Axes)
set_a (set[str])
set_b (set[str])
label_a (str)
label_b (str)
title (str)
- Return type:
None
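A two-set Venn panel needs three counts: genes only in set A, genes in both, and genes only in set B. These come from set difference and intersection.

The sketch below first groups negative-effect rows into sets keyed by (manifestation, modification, region). It then computes those counts. The row layout, the key order, the "effect < 0" test for a negative effect, and the function names are all assumptions.

```python
from collections import defaultdict

GeneSetKey = tuple[str, str, str]  # (manifestation, modification, region), assumed order


def negative_gene_sets(
    rows: list[tuple[str, str, str, float, str]],
) -> dict[GeneSetKey, set[str]]:
    # Each row: (manifestation, modification, region, effect, gene_id).
    sets: defaultdict[GeneSetKey, set[str]] = defaultdict(set)
    for manifestation, modification, region, effect, gene_id in rows:
        if effect < 0:  # assumption: "negative effect" means a value below zero
            sets[(manifestation, modification, region)].add(gene_id)
    return dict(sets)


def venn_counts(set_a: set[str], set_b: set[str]) -> dict[str, int]:
    return {
        "only_a": len(set_a - set_b),
        "both": len(set_a & set_b),
        "only_b": len(set_b - set_a),
    }


rows = [
    ("M1", "5MC", "PROMOTER", -0.3, "G1"),
    ("M1", "5MC", "PROMOTER", -0.1, "G2"),
    ("M1", "6MA", "PROMOTER", -0.2, "G2"),
    ("M1", "6MA", "PROMOTER", 0.4, "G3"),
    ("M1", "6MA", "PROMOTER", -0.5, "G4"),
]
sets = negative_gene_sets(rows)
print(venn_counts(sets[("M1", "5MC", "PROMOTER")], sets[("M1", "6MA", "PROMOTER")]))
```

The intersection set_a & set_b is also the list of shared genes that a common-genes report would contain for that manifestation and region.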
- modalysis.core.plots.common_genes_venn.plot_common_genes_venn(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], modification_a: str, modification_b: str, output_path: str, output_name: str) -> None
Render regional Venn panels comparing two modifications per manifestation.
- Parameters:
annotated_dmr_paths (list[str])
manifestations (list[str])
modifications (list[str])
modification_a (str)
modification_b (str)
output_path (str)
output_name (str)
- Return type:
None