ppanggolin.figures package

Submodules

ppanggolin.figures.draw_spot module

ppanggolin.figures.draw_spot.add_gene_labels(fig, source_data: ~bokeh.models.sources.ColumnDataSource) -> (<class 'bokeh.models.layouts.Column'>, <class 'bokeh.models.annotations.labels.LabelSet'>)
Parameters:
  • fig

  • source_data

Returns:

ppanggolin.figures.draw_spot.add_gene_tools(recs: GlyphRenderer, source_data: ColumnDataSource) Column

Define tools to change the outline and fill colors of genes

Parameters:
  • recs

  • source_data

Returns:

ppanggolin.figures.draw_spot.add_genome_tools(fig, gene_recs: GlyphRenderer, genome_recs: GlyphRenderer, gene_source: ColumnDataSource, genome_source: ColumnDataSource, nb: int, gene_labels: LabelSet)
Parameters:
  • fig

  • gene_recs

  • genome_recs

  • gene_source

  • genome_source

  • nb

  • gene_labels

Returns:

ppanggolin.figures.draw_spot.check_predicted_spots(pangenome)

Checks the pangenome status and .h5 files for predicted spots; raises an error if spots were not predicted.

ppanggolin.figures.draw_spot.draw_curr_spot(gene_lists: list, ordered_counts: list, fam_to_mod: dict, fam_col: dict, output: Path)
Parameters:
  • gene_lists

  • ordered_counts

  • fam_to_mod

  • fam_col – Dictionary mapping each family to its color

  • output

Returns:

ppanggolin.figures.draw_spot.draw_selected_spots(selected_spots: List[Spot] | Set[Spot], pangenome: Pangenome, output: Path, overlapping_match: int, exact_match: int, set_size: int, disable_bar: bool = False)

Draw only the selected spots, using the given parameters

Parameters:
  • selected_spots – Spots selected by the user

  • pangenome – Pangenome containing the spots

  • output – Path to output directory

  • overlapping_match – Allowed number of missing persistent genes when comparing flanking genes

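  • exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs

  • set_size – Number of single copy markers to use as flanking genes for RGP

  • disable_bar – If True, disable the progress bar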

ppanggolin.figures.draw_spot.draw_spots(pangenome: Pangenome, output: Path, spot_list: str, disable_bar: bool = False)

Main function to draw spots

Parameters:
  • pangenome – Pangenome with predicted spots

  • output – Path to output directory

  • spot_list – Spots to draw

  • disable_bar – If True, disable the progress bar

ppanggolin.figures.draw_spot.is_gene_list_ordered(genes: List[Feature])

Check if a list of genes is ordered.

ppanggolin.figures.draw_spot.line_order_gene_lists(gene_lists: list, overlapping_match: int, exact_match: int, set_size: int)

Line ordering of all RGPs

Parameters:
  • gene_lists – List of genes in RGPs

  • overlapping_match – Allowed number of missing persistent genes when comparing flanking genes

  • exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs

  • set_size – Number of single copy markers to use as flanking genes for RGP

ppanggolin.figures.draw_spot.make_colors_for_iterable(it: set) dict

Randomly picks a color for all elements of a given iterable

Parameters:

it – Iterable of families or modules

Returns:

Dictionary mapping each element to a randomly chosen color
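The behaviour described above can be illustrated with a minimal, dependency-free sketch. The name `random_colors` and the `seed` argument are hypothetical and not part of PPanGGOLiN; the color format used by the library itself is not stated here, so hexadecimal `#RRGGBB` strings are an assumption.

```python
import random


def random_colors(items, seed=None):
    """Map each element of `items` to a random '#RRGGBB' color string.

    Hypothetical stand-in for illustration; not the library function.
    """
    rng = random.Random(seed)  # a local generator keeps global random state untouched
    return {item: "#{:06X}".format(rng.randrange(0x1000000)) for item in items}


# Example with placeholder family names
colors = random_colors({"fam_A", "fam_B", "fam_C"}, seed=0)
```

Because colors are drawn independently, two elements can receive similar or even identical colors, which may matter when many families are drawn together.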

ppanggolin.figures.draw_spot.mk_genomes(gene_lists: list, ordered_counts: list) -> (<class 'bokeh.models.sources.ColumnDataSource'>, <class 'list'>)
Parameters:
  • gene_lists

  • ordered_counts

Returns:

ppanggolin.figures.draw_spot.mk_source_data(genelists: list, fam_col: dict, fam_to_mod: dict) -> (<class 'bokeh.models.sources.ColumnDataSource'>, <class 'list'>)
Parameters:
  • genelists

  • fam_col – Dictionary mapping each family to its color

  • fam_to_mod – Dictionary mapping each family to its module

Returns:

ppanggolin.figures.draw_spot.order_gene_lists(gene_lists: list, overlapping_match: int, exact_match: int, set_size: int)

Order all RGPs the same way, and order them by similarity in gene content.

Parameters:
  • gene_lists – List of genes in RGPs

  • overlapping_match – Allowed number of missing persistent genes when comparing flanking genes

  • exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs

  • set_size – Number of single copy markers to use as flanking genes for RGP

Returns:

List of ordered genes

ppanggolin.figures.draw_spot.row_order_gene_lists(gene_lists: list) list

Row ordering of all RGPs

Parameters:

gene_lists

Returns:

An ordered list of genes

ppanggolin.figures.draw_spot.subgraph(spot: Spot, outname: Path, with_border: bool = True, set_size: int = 3, multigenics: set | None = None, fam_to_mod: dict | None = None)

Write a pangenome subgraph of the gene families of a spot in GEXF format

Parameters:
  • spot

  • outname

  • with_border

  • set_size

  • multigenics

  • fam_to_mod

ppanggolin.figures.drawing module

ppanggolin.figures.drawing.check_spot_args(args: Namespace)

Check whether the draw_spots and spots arguments are valid.

Parameters:

args (argparse.Namespace) – The parsed command line arguments.

Raises:

argparse.ArgumentError – If args.spots is specified but args.draw_spots is False.
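The validation rule above can be sketched with the standard library alone. The function name `check_spot_args_sketch` is hypothetical, and the sketch assumes that `spots` is `None` (or empty) when the user made no selection; the actual default in PPanGGOLiN may differ.

```python
import argparse


def check_spot_args_sketch(args: argparse.Namespace) -> None:
    """Reject a spot selection when spot drawing is not enabled."""
    # Assumption: `spots` is falsy when the user gave no selection.
    if args.spots and not args.draw_spots:
        raise argparse.ArgumentError(
            None, "'spots' was given without enabling 'draw_spots'"
        )


# A consistent combination passes silently.
check_spot_args_sketch(argparse.Namespace(draw_spots=True, spots="all"))

# An inconsistent combination raises argparse.ArgumentError.
try:
    check_spot_args_sketch(argparse.Namespace(draw_spots=False, spots="spot_1"))
    raised = False
except argparse.ArgumentError:
    raised = True
```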

ppanggolin.figures.drawing.launch(args: Namespace)

Command launcher

Parameters:

args – All arguments provided by the user

ppanggolin.figures.drawing.parser_draw(parser: ArgumentParser)

Parser for the specific arguments of the draw command

Parameters:

parser – Parser for the draw command arguments

ppanggolin.figures.drawing.subparser(sub_parser: _SubParsersAction) ArgumentParser

Subparser to launch PPanGGOLiN from the command line

Parameters:

sub_parser – Subparser for the draw command

Returns:

Parser arguments for the draw command

ppanggolin.figures.tile_plot module

ppanggolin.figures.tile_plot.build_presence_absence_matrix(families: set, org_index: dict) csc_matrix

Build the presence-absence matrix for gene families.

This matrix indicates the presence (1) or absence (0) of each gene family across different organisms.

Parameters:
  • families – A set of gene families to be included in the matrix.

  • org_index – A dictionary mapping each organism to its respective index in the matrix.

Returns:

A sparse matrix (Compressed Sparse Column format) representing the presence-absence of gene families.
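To make the presence (1) / absence (0) encoding concrete, here is a small sketch that uses plain nested lists instead of a SciPy `csc_matrix` so that it runs without dependencies. The helper name `presence_absence_rows`, the `family_to_orgs` mapping, and the families-as-rows orientation are all assumptions made for illustration; the real function works on gene family objects and returns a sparse matrix.

```python
def presence_absence_rows(family_to_orgs, org_index):
    """Return (family_order, rows) with one 0/1 row per family.

    Hypothetical helper: `family_to_orgs` maps a family name to the set of
    organisms that carry it; `org_index` maps an organism to its column.
    """
    family_order = sorted(family_to_orgs)
    rows = []
    for fam in family_order:
        row = [0] * len(org_index)
        for org in family_to_orgs[fam]:
            row[org_index[org]] = 1  # mark presence in that organism's column
        rows.append(row)
    return family_order, rows


org_index = {"genome_A": 0, "genome_B": 1, "genome_C": 2}
family_to_orgs = {
    "fam_1": {"genome_A", "genome_B", "genome_C"},
    "fam_2": {"genome_B"},
}
family_order, rows = presence_absence_rows(family_to_orgs, org_index)
```

A sparse format pays off because most families outside the persistent partition are absent from most genomes, so the matrix is mostly zeros.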

ppanggolin.figures.tile_plot.create_partition_shapes(separators: List[Tuple[str, float]], xval_max: float, heatmap_row: int, partition_to_color: Dict[str, str]) List[dict]

Create the shapes for plot separators to visually distinguish partitions in the plot.

Parameters:
  • separators – A list of tuples containing partition names and their corresponding separator positions.

  • xval_max – The maximum x-value for the plot.

  • heatmap_row – The row number of the heatmap.

  • partition_to_color – A dictionary mapping partition names to their corresponding colors.

Returns:

A list of shape dictionaries for Plotly to use in the plot.
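As a rough illustration of what such shape dictionaries can look like, the sketch below builds one horizontal line per separator using Plotly's documented shape keys (`type`, `xref`, `yref`, `x0`, `x1`, `y0`, `y1`, `line`). The function name `partition_shapes_sketch`, the horizontal orientation, the line width, and the way `heatmap_row` is turned into axis references are assumptions; the shapes actually produced by PPanGGOLiN may differ.

```python
def partition_shapes_sketch(separators, xval_max, heatmap_row, partition_to_color):
    """Build one horizontal Plotly line shape per partition separator."""
    # Assumption: axes are named "x"/"y" for row 1 and "x2"/"y2", ... afterwards.
    suffix = "" if heatmap_row == 1 else str(heatmap_row)
    shapes = []
    for partition, position in separators:
        shapes.append(
            {
                "type": "line",
                "xref": f"x{suffix}",
                "yref": f"y{suffix}",
                "x0": 0,
                "x1": xval_max,
                "y0": position,  # same y at both ends gives a horizontal line
                "y1": position,
                "line": {"color": partition_to_color[partition], "width": 2},
            }
        )
    return shapes


shapes = partition_shapes_sketch(
    separators=[("persistent", 2.0), ("shell", 5.0)],
    xval_max=10.0,
    heatmap_row=2,
    partition_to_color={"persistent": "#F7A507", "shell": "#00D860"},
)
```

A list like this can be attached to a Plotly figure with `fig.update_layout(shapes=shapes)`. The colors in the example are placeholders.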

ppanggolin.figures.tile_plot.create_tile_plot(binary_data: List[List[float]], text_data: List[List[str]], fam_order: List[str], partition_separator: List[tuple], order_organisms: List[Organism], dendrogram_fig: Figure, draw_dendrogram: bool) Figure

Create the heatmap tile plot using Plotly.

Parameters:
  • binary_data – The binary presence-absence matrix data.

  • text_data – Hover text data for each cell in the heatmap.

  • fam_order – List of gene family names in the desired order.

  • partition_separator – List of tuples containing partition names and their separator positions.

  • order_organisms – List of organisms in the desired order.

  • dendrogram_fig – Plotly figure object for the dendrogram.

  • draw_dendrogram – Flag indicating whether to draw the dendrogram.

Returns:

A Plotly Figure object representing the tile plot.

ppanggolin.figures.tile_plot.draw_tile_plot(pangenome: Pangenome, output: Path, nocloud: bool = False, draw_dendrogram: bool = False, add_metadata: bool = False, metadata_sources: Set[str] | None = None, disable_bar: bool = False)

Draw a tile plot from a partitioned pangenome.

Parameters:
  • pangenome – Partitioned pangenome.

  • output – Path to the output directory where the tile plot will be saved.

  • nocloud – If True, exclude the cloud partition from the plot.

  • draw_dendrogram – If True, include a dendrogram in the tile plot.

  • disable_bar – If True, disable the progress bar during processing.

ppanggolin.figures.tile_plot.generate_dendrogram(mat_p_a: csc_matrix, org_index: dict) Tuple[List, Figure]

Generate the order of organisms based on a dendrogram.

Parameters:
  • mat_p_a – Sparse matrix representing the presence-absence of gene families.

  • org_index – Dictionary mapping organism names to their respective indices in the matrix.

Returns:

A tuple containing the ordered list of organisms and the dendrogram figure.

ppanggolin.figures.tile_plot.get_heatmap_hover_text(ordered_families: List, order_organisms: List) List[List[str]]

Generate hover text for the heatmap cells.

Parameters:
  • ordered_families – The list of ordered gene families.

  • order_organisms – The list of ordered organisms.

Returns:

A 2D list of strings representing hover text for each heatmap cell.

ppanggolin.figures.tile_plot.metadata_stringify(gene) str

Convert gene metadata to a formatted string.

Parameters:

gene – The gene object with potential metadata.

Returns:

A formatted string containing gene metadata information.

ppanggolin.figures.tile_plot.order_nodes(partitions_dict: dict, shell_subs: set) Tuple[List, List[Tuple[str, float]]]

Order gene families based on their partitions.

Parameters:
  • partitions_dict – A dictionary where keys are partition names and values are lists of gene families in each partition.

  • shell_subs – A set of shell subpartition names.

Returns:

A tuple containing the ordered list of gene families and a list of partition separators.
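One way to picture the two return values is the sketch below, which concatenates families partition by partition and records the running total after each partition as its separator position. The name `order_nodes_sketch`, the persistent, shell, cloud display order, the partition labels, and the choice of separator positions are assumptions made for illustration and are not taken from the library.

```python
def order_nodes_sketch(partitions_dict, shell_subs):
    """Return (ordered_families, separators) from a partition -> families map."""
    # Assumed display order: persistent, shell subpartitions, shell, then cloud.
    partition_order = ["persistent", *sorted(shell_subs), "shell", "cloud"]
    ordered, separators = [], []
    for name in partition_order:
        families = partitions_dict.get(name, [])
        if not families:
            continue  # skip partitions that are absent or empty
        ordered.extend(families)
        separators.append((name, float(len(ordered))))
    return ordered, separators


ordered, separators = order_nodes_sketch(
    {"persistent": ["f1", "f2"], "S1": ["f3"], "S2": ["f4", "f5"], "cloud": ["f6"]},
    shell_subs={"S1", "S2"},
)
```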

ppanggolin.figures.tile_plot.prepare_data_structures(pangenome: Pangenome, nocloud: bool) Tuple[set, dict]

Prepare data structures required for generating the tile plot.

Parameters:
  • pangenome – Partitioned pangenome containing gene families and organism data.

  • nocloud – If True, exclude gene families belonging to the cloud partition.

Returns:

A tuple containing a set of gene families to be plotted and a dictionary mapping organisms to their indices.

ppanggolin.figures.tile_plot.process_tile_data(families: set, order_organisms: List) Tuple[List[List[float]], List[List[str]], List[str], List[Tuple[str, float]]]

Process data for each tile in the plot.

Parameters:
  • families – A set of gene families to be processed.

  • order_organisms – The ordered list of organisms for the tile plot.

Returns:

A tuple containing binary data, text data, family order, and separators for the plot.

ppanggolin.figures.ucurve module

ppanggolin.figures.ucurve.build_ucurve_plot(number_of_organisms: int, count: Dict[int, Dict[str, int]], max_bar: int, is_partitioned: bool, has_undefined: bool, chao: float, soft_core: float, number_of_gene_families: int) Figure

Build a U-curve bar plot of gene family frequencies across genomes.

This function visualizes the distribution of gene families (e.g., persistent, shell, cloud) across genomes in the pangenome using a stacked bar chart. It supports both partitioned and non-partitioned pangenomes, and highlights the soft core threshold as a vertical dashed line.

Parameters:
  • number_of_organisms – Number of organisms in the pangenome.

  • count – Nested dictionary mapping genome counts to gene family counts by partition.

  • max_bar – Maximum y-axis value to set the plot height.

  • is_partitioned – Flag indicating if the pangenome is partitioned into persistent, shell, cloud.

  • has_undefined – Flag indicating presence of undefined gene families.

  • chao – Chao1 richness estimator value, shown in the plot title.

  • soft_core – Threshold fraction defining the soft core genome, used to draw a vertical line.

  • number_of_gene_families – Total number of gene families in the pangenome.

Returns:

A Plotly Figure object containing the U-curve bar plot.

ppanggolin.figures.ucurve.compute_family_counts(pangenome: Pangenome) Tuple[Dict[int, Dict[str, int]], int, bool, bool, float | str]

Compute gene family distribution across genomes, and estimate the total family richness using Chao1.

This function computes:
  • The number of gene families found in exactly n genomes (for each n)

  • Their counts by partition (e.g. persistent, shell, cloud, etc.)

  • Whether the pangenome is partitioned

  • Whether any gene family has an undefined partition

  • A Chao1 estimate of the total number of gene families (to account for unobserved families)

Parameters:

pangenome – A Pangenome object with annotated and partitioned gene families

Returns:

A tuple containing:
  • family_count_by_org_and_part: dict[int][str] → count of gene families by number of organisms and partition

  • max_family_count: the highest number of gene families for any given organism count (for scaling plots)

  • is_partitioned: whether at least one gene family has a defined partition

  • has_undefined: whether any gene family has the undefined (“U”) partition

  • chao: Chao1 estimate for total gene family richness (float or “NA” if not computable)
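The quantities listed above can be sketched on toy data as follows. The function name `family_counts_sketch` and the input format (a list of `(partition, number_of_genomes)` pairs) are hypothetical. The sketch uses the classic Chao1 formula, S_obs + F1² / (2 · F2), where F1 and F2 are the numbers of families seen in exactly one and exactly two genomes; which Chao1 variant the library applies is not stated here, so treat the formula as an assumption.

```python
from collections import defaultdict


def family_counts_sketch(families):
    """Summarise (partition, number_of_genomes) pairs; illustrative only."""
    counts = defaultdict(lambda: defaultdict(int))
    for partition, nb_genomes in families:
        counts[nb_genomes][partition] += 1
    counts = {n: dict(by_part) for n, by_part in counts.items()}

    # Tallest bar: largest number of families sharing one genome count.
    max_bar = max((sum(by_part.values()) for by_part in counts.values()), default=0)
    is_partitioned = any(part != "U" for part, _ in families)
    has_undefined = any(part == "U" for part, _ in families)

    # Classic Chao1 (assumed variant): not computable without doubletons.
    s_obs = len(families)
    f1 = sum(counts.get(1, {}).values())
    f2 = sum(counts.get(2, {}).values())
    chao = s_obs + f1 ** 2 / (2 * f2) if f2 > 0 else "NA"
    return counts, max_bar, is_partitioned, has_undefined, chao


toy_families = (
    [("persistent", 4)] * 2 + [("shell", 2)] * 2 + [("cloud", 1)] * 4
)
counts, max_bar, is_partitioned, has_undefined, chao = family_counts_sketch(toy_families)
```

With 8 observed families, 4 singletons, and 2 doubletons, the estimate is 8 + 16 / 4 = 12.0, suggesting that roughly 4 families remain unobserved in this toy data set.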

ppanggolin.figures.ucurve.draw_ucurve(pangenome: Pangenome, output: Path, soft_core: float = 0.95, disable_bar: bool = False)

Draws the U-shaped curve of gene family frequency distribution.

Parameters:
  • pangenome – Partitioned pangenome

  • output – Path to output directory

  • soft_core – Soft core threshold to use

  • disable_bar – If True, disable the progress bar

Module contents