ppanggolin.figures package

Submodules

ppanggolin.figures.draw_spot module

ppanggolin.figures.draw_spot.add_gene_labels(fig, source_data: ColumnDataSource) → (Column, LabelSet)

Parameters:

fig –
source_data –

Returns:

ppanggolin.figures.draw_spot.add_gene_tools(recs: GlyphRenderer, source_data: ColumnDataSource) → Column

Define tools to change the outline and fill colors of genes

Parameters:

recs –
source_data –

Returns:

ppanggolin.figures.draw_spot.add_genome_tools(fig, gene_recs: GlyphRenderer, genome_recs: GlyphRenderer, gene_source: ColumnDataSource, genome_source: ColumnDataSource, nb: int, gene_labels: LabelSet)

Parameters:

fig –
gene_recs –
genome_recs –
gene_source –
genome_source –
nb –
gene_labels –

Returns:

ppanggolin.figures.draw_spot.check_predicted_spots(pangenome)

Check the pangenome status and .h5 files for predicted spots; raise an error if they were not predicted.

ppanggolin.figures.draw_spot.draw_curr_spot(gene_lists: list, ordered_counts: list, fam_to_mod: dict, fam_col: dict, output: Path)

Parameters:

gene_lists –
ordered_counts –
fam_to_mod –
fam_col – Dictionary mapping each family to its color
output –

Returns:

ppanggolin.figures.draw_spot.draw_selected_spots(selected_spots: List[Spot] | Set[Spot], pangenome: Pangenome, output: Path, overlapping_match: int, exact_match: int, set_size: int, disable_bar: bool = False)

Draw only the selected spots with the given parameters.

Parameters:

selected_spots – Spots selected by the user
pangenome – Pangenome containing the spots
output – Path to output directory
overlapping_match – Allowed number of missing persistent genes when comparing flanking genes
exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs
set_size – Number of single copy markers to use as flanking genes for RGP
disable_bar – If True, disable the progress bar

ppanggolin.figures.draw_spot.draw_spots(pangenome: Pangenome, output: Path, spot_list: str, disable_bar: bool = False)

Main function to draw spots.

Parameters:

pangenome – Pangenome with predicted spots
output – Path to output directory
spot_list – List of spots to draw
disable_bar – If True, disable the progress bar

ppanggolin.figures.draw_spot.is_gene_list_ordered(genes: List[Feature])

Check if a list of genes is ordered.
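The source does not state which criterion is_gene_list_ordered applies. As a minimal sketch only, assuming "ordered" means that start coordinates never decrease along the list, such a check could look like the following. The `Gene` class and `is_sorted_by_start` are hypothetical stand-ins, not part of PPanGGOLiN.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    """Hypothetical stand-in for a Feature; only the start coordinate is modeled."""
    name: str
    start: int


def is_sorted_by_start(genes: List[Gene]) -> bool:
    """Return True if start coordinates never decrease along the list (assumed criterion)."""
    return all(a.start <= b.start for a, b in zip(genes, genes[1:]))


ordered = [Gene("g1", 10), Gene("g2", 250), Gene("g3", 900)]
shuffled = [Gene("g2", 250), Gene("g1", 10), Gene("g3", 900)]
print(is_sorted_by_start(ordered), is_sorted_by_start(shuffled))
```

An empty or single-element list counts as ordered under this definition, since there is no adjacent pair to violate it.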

ppanggolin.figures.draw_spot.line_order_gene_lists(gene_lists: list, overlapping_match: int, exact_match: int, set_size: int)

Line ordering of all RGPs

Parameters:

gene_lists – List of genes in RGPs
overlapping_match – Allowed number of missing persistent genes when comparing flanking genes
exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs
set_size – Number of single copy markers to use as flanking genes for RGP

ppanggolin.figures.draw_spot.make_colors_for_iterable(it: set) → dict

Randomly picks a color for all elements of a given iterable

Parameters:

it – Iterable of families or modules

Returns:

Dictionary mapping each element to a randomly chosen color
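A minimal sketch of the idea behind make_colors_for_iterable, assuming colors are expressed as hex strings (the source does not specify the color format). The function name `random_colors` and its `seed` argument are illustrative additions, not part of the module.

```python
import random
from typing import Dict, Hashable, Iterable, Optional


def random_colors(items: Iterable[Hashable], seed: Optional[int] = None) -> Dict[Hashable, str]:
    """Map each element to a random color written as '#rrggbb'."""
    rng = random.Random(seed)  # a local generator keeps the global random state untouched
    return {item: "#{:06x}".format(rng.randrange(0x1000000)) for item in items}


colors = random_colors({"famA", "famB", "famC"}, seed=42)
for family, color in sorted(colors.items()):
    print(family, color)
```

Random picks can produce near-identical colors for different elements; a fixed palette would avoid that at the cost of limiting the number of distinct elements.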

ppanggolin.figures.draw_spot.mk_genomes(gene_lists: list, ordered_counts: list) → (ColumnDataSource, list)

Parameters:

gene_lists –
ordered_counts –

Returns:

ppanggolin.figures.draw_spot.mk_source_data(genelists: list, fam_col: dict, fam_to_mod: dict) → (ColumnDataSource, list)

Parameters:

genelists –
fam_col – Dictionary mapping each family to its color
fam_to_mod – Dictionary mapping each family to its module

Returns:

ppanggolin.figures.draw_spot.order_gene_lists(gene_lists: list, overlapping_match: int, exact_match: int, set_size: int)

Order all RGPs the same way, and order them by similarity in gene content.

Parameters:

gene_lists – List of genes in RGPs
overlapping_match – Allowed number of missing persistent genes when comparing flanking genes
exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs
set_size – Number of single copy markers to use as flanking genes for RGP

Returns:

List of ordered genes

ppanggolin.figures.draw_spot.row_order_gene_lists(gene_lists: list) → list

Row ordering of all RGPs

Parameters:

gene_lists –

Returns:

An ordered list of genes

ppanggolin.figures.draw_spot.subgraph(spot: Spot, outname: Path, with_border: bool = True, set_size: int = 3, multigenics: set | None = None, fam_to_mod: dict | None = None)

Write a pangenome subgraph of the gene families of a spot in GEXF format

Parameters:

spot –
outname –
with_border –
set_size –
multigenics –
fam_to_mod –

ppanggolin.figures.drawing module

ppanggolin.figures.drawing.check_spot_args(args: Namespace)

Check whether the draw_spots and spots arguments are valid.

Parameters:

args (argparse.Namespace) – The parsed command line arguments.

Raises:

argparse.ArgumentError – If args.spots is specified but args.draw_spots is False.
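The validation rule described for check_spot_args can be sketched as below. This is an illustration rather than the module's code: it assumes that "specified" means args.spots is not None, and `check_spot_args_sketch` is a hypothetical name.

```python
import argparse


def check_spot_args_sketch(args: argparse.Namespace) -> None:
    """Reject a spot selection when spot drawing is turned off."""
    # Assumption: an unspecified 'spots' argument is represented by None.
    if args.spots is not None and not args.draw_spots:
        raise argparse.ArgumentError(
            None, "'spots' was specified but 'draw_spots' is False"
        )


# Valid combination: passes silently.
check_spot_args_sketch(argparse.Namespace(draw_spots=True, spots="all"))

# Invalid combination: the error is raised and reported.
try:
    check_spot_args_sketch(argparse.Namespace(draw_spots=False, spots="spot_1"))
except argparse.ArgumentError as err:
    print(f"rejected: {err}")
```

Passing None as the first argument of argparse.ArgumentError yields an error that is not tied to a specific argparse action, so the message is printed without an argument prefix.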

ppanggolin.figures.drawing.launch(args: Namespace)

Command launcher

Parameters:

args – All arguments provided by the user

ppanggolin.figures.drawing.parser_draw(parser: ArgumentParser)

Parser for the arguments specific to the draw command

Parameters:

parser – Parser for the draw command arguments

ppanggolin.figures.drawing.subparser(sub_parser: _SubParsersAction) → ArgumentParser

Subparser to launch PPanGGOLiN from the command line

Parameters:

sub_parser – Subparser for the draw command

Returns:

Parser with the arguments of the draw command

ppanggolin.figures.tile_plot module

ppanggolin.figures.tile_plot.build_presence_absence_matrix(families: set, org_index: dict) → csc_matrix

Build the presence-absence matrix for gene families.

This matrix indicates the presence (1) or absence (0) of each gene family across different organisms.

Parameters:

families – A set of gene families to be included in the matrix.
org_index – A dictionary mapping each organism to its respective index in the matrix.

Returns:

A sparse matrix (Compressed Sparse Column format) representing the presence-absence of gene families.
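To illustrate what a presence-absence matrix holds, here is a small dense sketch using plain lists instead of a SciPy csc_matrix. It rests on several assumptions that the source does not confirm: rows are organisms, columns are families sorted by name, and families are given as a hypothetical dictionary from family name to the set of organisms carrying it. The values of `org_index` are assumed to be 0..n-1.

```python
from typing import Dict, List, Set


def presence_absence(
    family_to_orgs: Dict[str, Set[str]], org_index: Dict[str, int]
) -> List[List[int]]:
    """Dense 0/1 matrix: one row per organism, one column per family (sorted by name)."""
    fam_names = sorted(family_to_orgs)
    matrix = [[0] * len(fam_names) for _ in org_index]
    for col, fam in enumerate(fam_names):
        for org in family_to_orgs[fam]:
            matrix[org_index[org]][col] = 1  # the family is present in this organism
    return matrix


family_to_orgs = {
    "famA": {"org1", "org2"},
    "famB": {"org2"},
    "famC": {"org1", "org3"},
}
org_index = {"org1": 0, "org2": 1, "org3": 2}

for row in presence_absence(family_to_orgs, org_index):
    print(row)
```

A sparse format pays off on real pangenomes because most cloud families occur in very few organisms, so the matrix is mostly zeros.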

ppanggolin.figures.tile_plot.create_partition_shapes(separators: List[Tuple[str, float]], xval_max: float, heatmap_row: int, partition_to_color: Dict[str, str]) → List[dict]

Create the shapes for plot separators to visually distinguish partitions in the plot.

Parameters:

separators – A list of tuples containing partition names and their corresponding separator positions.
xval_max – The maximum x-value for the plot.
heatmap_row – The row number of the heatmap.
partition_to_color – A dictionary mapping partition names to their corresponding colors.

Returns:

A list of shape dictionaries for Plotly to use in the plot.
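Plotly accepts layout shapes as plain dictionaries, so the kind of output described for create_partition_shapes can be sketched without importing Plotly. The sketch assumes each separator is drawn as a horizontal line spanning the x-axis from 0 to `xval_max`; the orientation, the line width, and the omission of `heatmap_row` (which would select the subplot axes) are simplifications, and `separator_shapes` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def separator_shapes(
    separators: List[Tuple[str, float]],
    xval_max: float,
    partition_to_color: Dict[str, str],
) -> List[dict]:
    """Build one Plotly-style line shape per partition separator."""
    shapes = []
    for name, position in separators:
        shapes.append(
            {
                "type": "line",
                "x0": 0,
                "x1": xval_max,
                "y0": position,  # same y at both ends gives a horizontal line
                "y1": position,
                "line": {"color": partition_to_color[name], "width": 1},
            }
        )
    return shapes


# Illustrative colors, not the ones used by PPanGGOLiN.
shapes = separator_shapes(
    [("persistent", 1.5), ("shell", 2.5)],
    xval_max=4,
    partition_to_color={"persistent": "orange", "shell": "green"},
)
print(shapes[0])
```

A list like this can be passed to a Plotly figure through `fig.update_layout(shapes=shapes)`.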

ppanggolin.figures.tile_plot.create_tile_plot(binary_data: List[List[float]], text_data: List[List[str]], fam_order: List[str], partition_separator: List[tuple], order_organisms: List[Organism], dendrogram_fig: Figure, draw_dendrogram: bool) → Figure

Create the heatmap tile plot using Plotly.

Parameters:

binary_data – The binary presence-absence matrix data.
text_data – Hover text data for each cell in the heatmap.
fam_order – List of gene family names in the desired order.
partition_separator – List of tuples containing partition names and their separator positions.
order_organisms – List of organisms in the desired order.
dendrogram_fig – Plotly figure object for the dendrogram.
draw_dendrogram – Flag indicating whether to draw the dendrogram.

Returns:

A Plotly Figure object representing the tile plot.

ppanggolin.figures.tile_plot.draw_tile_plot(pangenome: Pangenome, output: Path, nocloud: bool = False, draw_dendrogram: bool = False, add_metadata: bool = False, metadata_sources: Set[str] | None = None, disable_bar: bool = False)

Draw a tile plot from a partitioned pangenome.

Parameters:

pangenome – Partitioned pangenome.
output – Path to the output directory where the tile plot will be saved.
nocloud – If True, exclude the cloud partition from the plot.
draw_dendrogram – If True, include a dendrogram in the tile plot.
disable_bar – If True, disable the progress bar during processing.

ppanggolin.figures.tile_plot.generate_dendrogram(mat_p_a: csc_matrix, org_index: dict) → Tuple[List, Figure]

Generate the order of organisms based on a dendrogram.

Parameters:

mat_p_a – Sparse matrix representing the presence-absence of gene families.
org_index – Dictionary mapping organism names to their respective indices in the matrix.

Returns:

A tuple containing the ordered list of organisms and the dendrogram figure.

ppanggolin.figures.tile_plot.get_heatmap_hover_text(ordered_families: List, order_organisms: List) → List[List[str]]

Generate hover text for the heatmap cells.

Parameters:

ordered_families – The list of ordered gene families.
order_organisms – The list of ordered organisms.

Returns:

A 2D list of strings representing hover text for each heatmap cell.

ppanggolin.figures.tile_plot.metadata_stringify(gene) → str

Convert gene metadata to a formatted string.

Parameters:

gene – The gene object with potential metadata.

Returns:

A formatted string containing gene metadata information.

ppanggolin.figures.tile_plot.order_nodes(partitions_dict: dict, shell_subs: set) → Tuple[List, List[Tuple[str, float]]]

Order gene families based on their partitions.

Parameters:

partitions_dict – A dictionary where keys are partition names and values are lists of gene families in each partition.
shell_subs – A set of shell subpartition names.

Returns:

A tuple containing the ordered list of gene families and a list of partition separators.
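A rough sketch of partition-based ordering, under assumptions the source does not confirm: partitions are concatenated in the order persistent, shell, cloud; families are sorted by name within a partition; and each separator sits half a cell past the last family of its partition. Shell subpartitions (`shell_subs`) are ignored here, and `order_by_partition` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def order_by_partition(
    partitions: Dict[str, List[str]],
    partition_order: Sequence[str] = ("persistent", "shell", "cloud"),
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Concatenate families partition by partition and record where each block ends."""
    ordered: List[str] = []
    separators: List[Tuple[str, float]] = []
    for name in partition_order:
        members = sorted(partitions.get(name, []))
        if not members:
            continue  # empty partitions get no separator
        ordered.extend(members)
        separators.append((name, len(ordered) - 0.5))
    return ordered, separators


ordered, separators = order_by_partition(
    {"persistent": ["famB", "famA"], "shell": ["famC"], "cloud": []}
)
print(ordered)
print(separators)
```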

ppanggolin.figures.tile_plot.prepare_data_structures(pangenome: Pangenome, nocloud: bool) → Tuple[set, dict]

Prepare data structures required for generating the tile plot.

Parameters:

pangenome – Partitioned pangenome containing gene families and organism data.
nocloud – If True, exclude gene families belonging to the cloud partition.

Returns:

A tuple containing a set of gene families to be plotted and a dictionary mapping organisms to their indices.

ppanggolin.figures.tile_plot.process_tile_data(families: set, order_organisms: List) → Tuple[List[List[float]], List[List[str]], List[str], List[Tuple[str, float]]]

Process data for each tile in the plot.

Parameters:

families – A set of gene families to be processed.
order_organisms – The ordered list of organisms for the tile plot.

Returns:

A tuple containing binary data, text data, family order, and separators for the plot.

ppanggolin.figures.ucurve module

ppanggolin.figures.ucurve.build_ucurve_plot(number_of_organisms: int, count: Dict[int, Dict[str, int]], max_bar: int, is_partitioned: bool, has_undefined: bool, chao: float, soft_core: float, number_of_gene_families: int) → Figure

Build a U-curve bar plot of gene family frequencies across genomes.

This function visualizes the distribution of gene families (e.g., persistent, shell, cloud) across genomes in the pangenome using a stacked bar chart. It supports both partitioned and non-partitioned pangenomes, and highlights the soft core threshold as a vertical dashed line.

Parameters:

number_of_organisms – Number of organisms in the pangenome.
count – Nested dictionary mapping genome counts to gene family counts by partition.
max_bar – Maximum y-axis value to set the plot height.
is_partitioned – Flag indicating if the pangenome is partitioned into persistent, shell, cloud.
has_undefined – Flag indicating presence of undefined gene families.
chao – Chao1 richness estimator value, shown in the plot title.
soft_core – Threshold fraction defining the soft core genome, used to draw a vertical line.
number_of_gene_families – Total number of gene families in the pangenome.

Returns:

A Plotly Figure object containing the U-curve bar plot.

ppanggolin.figures.ucurve.compute_family_counts(pangenome: Pangenome) → Tuple[Dict[int, Dict[str, int]], int, bool, bool, float | str]

Compute gene family distribution across genomes, and estimate the total family richness using Chao1.

This function computes:

The number of gene families found in exactly n genomes (for each n)
Their counts by partition (e.g. persistent, shell, cloud, etc.)
Whether the pangenome is partitioned
Whether any gene family has an undefined partition
A Chao1 estimate of the total number of gene families (to account for unobserved families)

Parameters:

pangenome – A Pangenome object with annotated and partitioned gene families

Returns:

A tuple containing:

family_count_by_org_and_part: dict[int][str] → count of gene families by number of organisms and partition
max_family_count: the highest number of gene families for any given organism count (for scaling plots)
is_partitioned: whether at least one gene family has a defined partition
has_undefined: whether any gene family has the undefined (“U”) partition
chao: Chao1 estimate for total gene family richness (float or “NA” if not computable)
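The counting and the richness estimate described for compute_family_counts can be sketched as follows. The sketch uses the classic Chao1 form, S_obs + F1² / (2 · F2), where F1 and F2 are the numbers of families found in exactly one and exactly two genomes; the source does not spell out which variant the module uses, so treat the formula as an assumption. Families are modeled as hypothetical (partition, number of genomes) pairs, and both function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union


def count_families(families: List[Tuple[str, int]]) -> Dict[int, Dict[str, int]]:
    """Count families by number of genomes containing them, then by partition."""
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for partition, n_genomes in families:
        counts[n_genomes][partition] += 1
    return counts


def chao1(counts: Dict[int, Dict[str, int]], n_families: int) -> Union[float, str]:
    """Classic Chao1 estimate, or "NA" when no family occurs in exactly two genomes."""
    f1 = sum(counts.get(1, {}).values())  # families seen in exactly one genome
    f2 = sum(counts.get(2, {}).values())  # families seen in exactly two genomes
    if f2 == 0:
        return "NA"
    return n_families + f1 * f1 / (2 * f2)


families = [
    ("persistent", 3), ("persistent", 3),
    ("shell", 2), ("shell", 2),
    ("cloud", 1), ("cloud", 1),
]
counts = count_families(families)
max_family_count = max(sum(by_part.values()) for by_part in counts.values())
print(dict(counts[3]), max_family_count, chao1(counts, len(families)))
```

With two singleton and two doubleton families among six observed, the estimate adds 2² / (2 · 2) = 1 unobserved family, giving 7.0.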

ppanggolin.figures.ucurve.draw_ucurve(pangenome: Pangenome, output: Path, soft_core: float = 0.95, disable_bar: bool = False)

Draws the U-shaped curve of gene family frequency distribution.

Parameters:

pangenome – Partitioned pangenome
output – Path to output directory
soft_core – Soft core threshold to use
disable_bar – If True, disable the progress bar
