ppanggolin.figures package
Submodules
ppanggolin.figures.draw_spot module
- ppanggolin.figures.draw_spot.add_gene_labels(fig, source_data: ColumnDataSource) -> (Column, LabelSet)
- Parameters:
fig – Bokeh figure
source_data – ColumnDataSource holding the gene data
- Returns:
A Column layout and a LabelSet
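- ppanggolin.figures.draw_spot.add_gene_tools(recs: GlyphRenderer, source_data: ColumnDataSource) -> Column
Define tools to change the outline and fill colors of genes.
- Parameters:
recs – Glyph renderer of the gene rectangles
source_data – ColumnDataSource holding the gene data
- Returns:
A Column layout containing the tools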
- ppanggolin.figures.draw_spot.add_genome_tools(fig, gene_recs: GlyphRenderer, genome_recs: GlyphRenderer, gene_source: ColumnDataSource, genome_source: ColumnDataSource, nb: int, gene_labels: LabelSet)
- Parameters:
fig –
gene_recs –
genome_recs –
gene_source –
genome_source –
nb –
gene_labels –
- Returns:
- ppanggolin.figures.draw_spot.check_predicted_spots(pangenome)
Check the pangenome status and the .h5 file for predicted spots, and raise an error if spots were not predicted.
- ppanggolin.figures.draw_spot.draw_curr_spot(gene_lists: list, ordered_counts: list, fam_to_mod: dict, fam_col: dict, output: Path)
- Parameters:
gene_lists –
ordered_counts –
fam_to_mod – Dictionary mapping gene families to modules
fam_col – Dictionary mapping each gene family to its color
output – Path of the output file
- Returns:
- ppanggolin.figures.draw_spot.draw_selected_spots(selected_spots: List[Spot] | Set[Spot], pangenome: Pangenome, output: Path, overlapping_match: int, exact_match: int, set_size: int, disable_bar: bool = False)
Draw only the spots selected by the user.
- Parameters:
selected_spots – List or set of spots selected by the user
pangenome – Pangenome containing the spots
output – Path to output directory
overlapping_match – Allowed number of missing persistent genes when comparing flanking genes
exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs
set_size – Number of single copy markers to use as flanking genes for RGP
disable_bar – If True, disable the progress bar
- ppanggolin.figures.draw_spot.draw_spots(pangenome: Pangenome, output: Path, spot_list: str, disable_bar: bool = False)
Main function to draw spots.
- Parameters:
pangenome – Pangenome with predicted spots
output – Path to output directory
spot_list – Spots to draw, given as a string
disable_bar – If True, disable the progress bar
- ppanggolin.figures.draw_spot.is_gene_list_ordered(genes: List[Feature])
Check if a list of genes is ordered.
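A minimal sketch of such a check, assuming that "ordered" means the gene positions along the contig only increase or only decrease (either strand direction). `SimpleGene` and its `position` attribute are hypothetical stand-ins for ppanggolin's `Feature`; this is not the library's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleGene:
    """Hypothetical stand-in for a gene: only its position on the contig is used."""
    name: str
    position: int


def is_gene_list_ordered(genes: List[SimpleGene]) -> bool:
    """Return True if gene positions only increase or only decrease along the list."""
    positions = [gene.position for gene in genes]
    steps = list(zip(positions, positions[1:]))
    increasing = all(left <= right for left, right in steps)
    decreasing = all(left >= right for left, right in steps)
    return increasing or decreasing


genes = [SimpleGene("g1", 3), SimpleGene("g2", 4), SimpleGene("g3", 5)]
print(is_gene_list_ordered(genes))                           # True
print(is_gene_list_ordered(genes[::-1]))                     # True
print(is_gene_list_ordered([genes[1], genes[0], genes[2]]))  # False
```

Accepting both directions lets an RGP read on the reverse strand count as ordered.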
- ppanggolin.figures.draw_spot.line_order_gene_lists(gene_lists: list, overlapping_match: int, exact_match: int, set_size: int)
Line ordering of all RGPs.
- Parameters:
gene_lists – List of genes in RGPs
overlapping_match – Allowed number of missing persistent genes when comparing flanking genes
exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs
set_size – Number of single copy markers to use as flanking genes for RGP
- ppanggolin.figures.draw_spot.make_colors_for_iterable(it: set) -> dict
Randomly pick a color for each element of a given iterable.
- Parameters:
it – Iterable of families or modules
- Returns:
Dictionary mapping each element to a randomly chosen color
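A minimal sketch of random color assignment, assuming colors are `#rrggbb` hex strings (a format Bokeh accepts). The optional `seed` argument is a hypothetical addition, not part of the documented signature, that makes the palette reproducible.

```python
import random
from typing import Dict, Hashable, Iterable, Optional


def make_colors_for_iterable(it: Iterable[Hashable], seed: Optional[int] = None) -> Dict[Hashable, str]:
    """Map each distinct element to a random '#rrggbb' color string."""
    rng = random.Random(seed)  # hypothetical seed parameter, for reproducibility
    colors: Dict[Hashable, str] = {}
    for element in it:
        if element not in colors:
            red, green, blue = (rng.randrange(256) for _ in range(3))
            colors[element] = f"#{red:02x}{green:02x}{blue:02x}"
    return colors


palette = make_colors_for_iterable(["fam_A", "fam_B", "module_1"], seed=42)
print(palette)
```

With a `set` of strings as input, iteration order (and therefore which element receives which color) can change between interpreter runs because of hash randomization, even with a fixed seed.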
- ppanggolin.figures.draw_spot.mk_genomes(gene_lists: list, ordered_counts: list) -> (ColumnDataSource, list)
- Parameters:
gene_lists –
ordered_counts –
- Returns:
- ppanggolin.figures.draw_spot.mk_source_data(genelists: list, fam_col: dict, fam_to_mod: dict) -> (ColumnDataSource, list)
- Parameters:
genelists –
fam_col – Dictionary mapping each gene family to its color
fam_to_mod – Dictionary mapping gene families to modules
- Returns:
- ppanggolin.figures.draw_spot.order_gene_lists(gene_lists: list, overlapping_match: int, exact_match: int, set_size: int)
Order all RGPs the same way, and order them by similarity in gene content.
- Parameters:
gene_lists – List of genes in RGPs
overlapping_match – Allowed number of missing persistent genes when comparing flanking genes
exact_match – Number of perfectly matching flanking single copy markers required to associate RGPs
set_size – Number of single copy markers to use as flanking genes for RGP
- Returns:
List of ordered genes
- ppanggolin.figures.draw_spot.row_order_gene_lists(gene_lists: list) -> list
Row ordering of all RGPs.
- Parameters:
gene_lists – List of genes in RGPs
- Returns:
An ordered list of genes
- ppanggolin.figures.draw_spot.subgraph(spot: Spot, outname: Path, with_border: bool = True, set_size: int = 3, multigenics: set | None = None, fam_to_mod: dict | None = None)
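Write a pangenome subgraph of the gene families of a spot in GEXF format.
- Parameters:
spot – Spot whose gene families make up the subgraph
outname – Path of the output GEXF file
with_border –
set_size – Number of single copy markers to use as flanking genes for RGP
multigenics – Set of multigenic gene families
fam_to_mod – Dictionary mapping gene families to modules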
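A minimal sketch of GEXF output with the standard library, assuming each RGP of the spot is given as a list of family names and that an edge links two families whenever their genes are adjacent. `build_family_gexf` is a hypothetical helper; the real `subgraph` also handles borders, multigenic families and modules, which are left out here.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def build_family_gexf(rgp_families: List[List[str]]) -> ET.Element:
    """Build a GEXF tree: families are nodes, adjacent families in an RGP share an edge."""
    nodes: List[str] = []
    seen: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for families in rgp_families:
        for family in families:
            if family not in seen:
                seen.add(family)
                nodes.append(family)  # keep first-seen order for stable output
        for left, right in zip(families, families[1:]):
            if left != right:
                edges.add((min(left, right), max(left, right)))  # undirected edge
    root = ET.Element("gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"defaultedgetype": "undirected"})
    nodes_el = ET.SubElement(graph, "nodes")
    for family in nodes:
        ET.SubElement(nodes_el, "node", {"id": family, "label": family})
    edges_el = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(sorted(edges)):
        ET.SubElement(edges_el, "edge", {"id": str(index), "source": source, "target": target})
    return root


root = build_family_gexf([["famA", "famB", "famC"], ["famA", "famC"]])
print(ET.tostring(root, encoding="unicode"))
```

`ET.ElementTree(root).write("spot_subgraph.gexf", encoding="utf-8", xml_declaration=True)` would write the file; a graph library such as networkx (`networkx.write_gexf`) can produce the same format.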
ppanggolin.figures.drawing module
- ppanggolin.figures.drawing.check_spot_args(args: Namespace)
Check whether the draw_spots and spots arguments are valid.
- Parameters:
args (argparse.Namespace) – The parsed command line arguments.
- Raises:
argparse.ArgumentError – If args.spots is specified but args.draw_spots is False.
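A minimal sketch of this validation, assuming `--spots` defaults to `"all"` so that any other value counts as "specified". The default value and the error message are assumptions, not taken from ppanggolin.

```python
import argparse

DEFAULT_SPOTS = "all"  # assumed default value of --spots


def check_spot_args(args: argparse.Namespace) -> None:
    """Reject --spots when --draw_spots was not requested."""
    if not args.draw_spots and args.spots != DEFAULT_SPOTS:
        raise argparse.ArgumentError(
            argument=None,
            message="The --spots argument can only be used together with --draw_spots.",
        )


check_spot_args(argparse.Namespace(draw_spots=True, spots="1,2"))   # valid
check_spot_args(argparse.Namespace(draw_spots=False, spots="all"))  # valid: default value
try:
    check_spot_args(argparse.Namespace(draw_spots=False, spots="1,2"))
except argparse.ArgumentError as error:
    print(error)
```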
- ppanggolin.figures.drawing.launch(args: Namespace)
Command launcher
- Parameters:
args – All arguments provided by the user
- ppanggolin.figures.drawing.parser_draw(parser: ArgumentParser)
Parser for the arguments specific to the draw command.
- Parameters:
parser – Parser for the draw arguments
- ppanggolin.figures.drawing.subparser(sub_parser: _SubParsersAction) -> ArgumentParser
Subparser to launch PPanGGOLiN from the command line.
- Parameters:
sub_parser – Subparser for the draw command
- Returns:
Parser arguments for the draw command
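A minimal sketch of the `subparser`/`parser_draw` pattern with the standard `argparse` module. Only `--draw_spots` and `--spots` are defined, because `check_spot_args` refers to `args.draw_spots` and `args.spots`; the real draw parser defines more options, and the help strings, description and `"all"` default here are assumptions.

```python
import argparse


def parser_draw(parser: argparse.ArgumentParser) -> None:
    """Add draw-specific options (only the two used by check_spot_args)."""
    optional = parser.add_argument_group(title="Optional arguments")
    optional.add_argument("--draw_spots", action="store_true",
                          help="Draw the spots (assumed help text)")
    optional.add_argument("--spots", default="all",
                          help="Spots to draw (assumed default: 'all')")


def subparser(sub_parser: "argparse._SubParsersAction") -> argparse.ArgumentParser:
    """Register the 'draw' subcommand and return its parser."""
    parser = sub_parser.add_parser("draw", description="Draw figures from a pangenome")
    parser_draw(parser)
    return parser


main_parser = argparse.ArgumentParser(prog="ppanggolin")
subcommands = main_parser.add_subparsers(dest="subcommand")
subparser(subcommands)
args = main_parser.parse_args(["draw", "--draw_spots", "--spots", "1,2"])
print(args.subcommand, args.draw_spots, args.spots)  # draw True 1,2
```

Splitting registration (`subparser`) from option definitions (`parser_draw`) lets the same options be attached to other parsers without re-registering the subcommand.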
ppanggolin.figures.tile_plot module
- ppanggolin.figures.tile_plot.build_presence_absence_matrix(families: set, org_index: dict) -> csc_matrix
Build the presence-absence matrix for gene families.
This matrix indicates the presence (1) or absence (0) of each gene family across different organisms.
- Parameters:
families – A set of gene families to be included in the matrix.
org_index – A dictionary mapping each organism to its respective index in the matrix.
- Returns:
A sparse matrix (Compressed Sparse Column format) representing the presence-absence of gene families.
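A minimal, dependency-free sketch of the matrix construction, assuming one row per family and one column per organism, with each family given as a set of organism names (a hypothetical representation). It builds the coordinate triplets a sparse constructor consumes; `to_dense` exists only to inspect the result.

```python
from typing import Dict, List, Set, Tuple


def presence_absence_triplets(
    families: Dict[str, Set[str]], org_index: Dict[str, int]
) -> Tuple[List[int], List[int], List[int], Tuple[int, int]]:
    """Return (data, rows, cols, shape) with one row per family and one column per organism."""
    data: List[int] = []
    rows: List[int] = []
    cols: List[int] = []
    for row, family in enumerate(sorted(families)):
        for organism in families[family]:
            data.append(1)  # presence
            rows.append(row)
            cols.append(org_index[organism])
    return data, rows, cols, (len(families), len(org_index))


def to_dense(data, rows, cols, shape) -> List[List[int]]:
    """Expand the triplets into a dense list-of-lists matrix, for inspection only."""
    matrix = [[0] * shape[1] for _ in range(shape[0])]
    for value, row, col in zip(data, rows, cols):
        matrix[row][col] = value
    return matrix


families = {"famA": {"org1", "org2"}, "famB": {"org2"}}
org_index = {"org1": 0, "org2": 1}
print(to_dense(*presence_absence_triplets(families, org_index)))  # [[1, 1], [0, 1]]
```

With SciPy installed, `scipy.sparse.csc_matrix((data, (rows, cols)), shape=shape)` builds the compressed sparse column matrix from the same triplets.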
- ppanggolin.figures.tile_plot.create_partition_shapes(separators: List[Tuple[str, float]], xval_max: float, heatmap_row: int, partition_to_color: Dict[str, str]) -> List[dict]
Create the shapes for plot separators to visually distinguish partitions in the plot.
- Parameters:
separators – A list of tuples containing partition names and their corresponding separator positions.
xval_max – The maximum x-value for the plot.
heatmap_row – The row number of the heatmap.
partition_to_color – A dictionary mapping partition names to their corresponding colors.
- Returns:
A list of shape dictionaries for Plotly to use in the plot.
- ppanggolin.figures.tile_plot.create_tile_plot(binary_data: List[List[float]], text_data: List[List[str]], fam_order: List[str], partition_separator: List[tuple], order_organisms: List[Organism], dendrogram_fig: Figure, draw_dendrogram: bool) -> Figure
Create the heatmap tile plot using Plotly.
- Parameters:
binary_data – The binary presence-absence matrix data.
text_data – Hover text data for each cell in the heatmap.
fam_order – List of gene family names in the desired order.
partition_separator – List of tuples containing partition names and their separator positions.
order_organisms – List of organisms in the desired order.
dendrogram_fig – Plotly figure object for the dendrogram.
draw_dendrogram – Flag indicating whether to draw the dendrogram.
- Returns:
A Plotly Figure object representing the tile plot.
- ppanggolin.figures.tile_plot.draw_tile_plot(pangenome: Pangenome, output: Path, nocloud: bool = False, draw_dendrogram: bool = False, add_metadata: bool = False, metadata_sources: Set[str] | None = None, disable_bar: bool = False)
Draw a tile plot from a partitioned pangenome.
- Parameters:
pangenome – Partitioned pangenome.
output – Path to the output directory where the tile plot will be saved.
nocloud – If True, exclude the cloud partition from the plot.
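draw_dendrogram – If True, include a dendrogram in the tile plot.
add_metadata – If True, add metadata to the plot.
metadata_sources – Set of metadata sources to use.
disable_bar – If True, disable the progress bar during processing.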
- ppanggolin.figures.tile_plot.generate_dendrogram(mat_p_a: csc_matrix, org_index: dict) -> Tuple[List, Figure]
Generate the order of organisms based on a dendrogram.
- Parameters:
mat_p_a – Sparse matrix representing the presence-absence of gene families.
org_index – Dictionary mapping organism names to their respective indices in the matrix.
- Returns:
A tuple containing the ordered list of organisms and the dendrogram figure.
- ppanggolin.figures.tile_plot.get_heatmap_hover_text(ordered_families: List, order_organisms: List) -> List[List[str]]
Generate hover text for the heatmap cells.
- Parameters:
ordered_families – The list of ordered gene families.
order_organisms – The list of ordered organisms.
- Returns:
A 2D list of strings representing hover text for each heatmap cell.
- ppanggolin.figures.tile_plot.metadata_stringify(gene) -> str
Convert gene metadata to a formatted string.
- Parameters:
gene – The gene object with potential metadata.
- Returns:
A formatted string containing gene metadata information.
- ppanggolin.figures.tile_plot.order_nodes(partitions_dict: dict, shell_subs: set) -> Tuple[List, List[Tuple[str, float]]]
Order gene families based on their partitions.
- Parameters:
partitions_dict – A dictionary where keys are partition names and values are lists of gene families in each partition.
shell_subs – A set of shell subpartition names.
- Returns:
A tuple containing the ordered list of gene families and a list of partition separators.
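A minimal sketch of partition-based ordering. It assumes partition keys named `"persistent"`, the shell subpartitions in `shell_subs`, and `"cloud"`, in that order; alphabetical order inside each partition; and separators placed half a cell after each partition's last family (heatmap cells centred on integer x positions). All three are assumptions, not the documented behavior.

```python
from typing import Dict, List, Set, Tuple


def order_nodes(
    partitions_dict: Dict[str, List[str]], shell_subs: Set[str]
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Concatenate families partition by partition and record where each partition ends."""
    partition_order = ["persistent", *sorted(shell_subs), "cloud"]  # assumed order
    ordered: List[str] = []
    separators: List[Tuple[str, float]] = []
    for partition in partition_order:
        families = sorted(partitions_dict.get(partition, []))  # assumed: alphabetical within a partition
        if not families:
            continue
        ordered.extend(families)
        # Cells are centred on 0, 1, 2, ..., so the boundary sits half a cell after the last one.
        separators.append((partition, len(ordered) - 0.5))
    return ordered, separators


partitions = {"persistent": ["f2", "f1"], "shell": ["f3"], "cloud": ["f4", "f5"]}
print(order_nodes(partitions, {"shell"}))
# (['f1', 'f2', 'f3', 'f4', 'f5'], [('persistent', 1.5), ('shell', 2.5), ('cloud', 4.5)])
```

Empty partitions are skipped so that no zero-width separator is produced.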
- ppanggolin.figures.tile_plot.prepare_data_structures(pangenome: Pangenome, nocloud: bool) -> Tuple[set, dict]
Prepare data structures required for generating the tile plot.
- Parameters:
pangenome – Partitioned pangenome containing gene families and organism data.
nocloud – If True, exclude gene families belonging to the cloud partition.
- Returns:
A tuple containing a set of gene families to be plotted and a dictionary mapping organisms to their indices.
- ppanggolin.figures.tile_plot.process_tile_data(families: set, order_organisms: List) -> Tuple[List[List[float]], List[List[str]], List[str], List[Tuple[str, float]]]
Process data for each tile in the plot.
- Parameters:
families – A set of gene families to be processed.
order_organisms – The ordered list of organisms for the tile plot.
- Returns:
A tuple containing binary data, text data, family order, and separators for the plot.
ppanggolin.figures.ucurve module
- ppanggolin.figures.ucurve.build_ucurve_plot(number_of_organisms: int, count: Dict[int, Dict[str, int]], max_bar: int, is_partitioned: bool, has_undefined: bool, chao: float, soft_core: float, number_of_gene_families: int) -> Figure
Build a U-curve bar plot of gene family frequencies across genomes.
This function visualizes the distribution of gene families (e.g., persistent, shell, cloud) across genomes in the pangenome using a stacked bar chart. It supports both partitioned and non-partitioned pangenomes, and highlights the soft core threshold as a vertical dashed line.
- Parameters:
number_of_organisms – Number of organisms in the pangenome.
count – Nested dictionary mapping genome counts to gene family counts by partition.
max_bar – Maximum y-axis value to set the plot height.
is_partitioned – Flag indicating if the pangenome is partitioned into persistent, shell, and cloud partitions.
has_undefined – Flag indicating presence of undefined gene families.
chao – Chao1 richness estimator value, shown in the plot title.
soft_core – Threshold fraction defining the soft core genome, used to draw a vertical line.
number_of_gene_families – Total number of gene families in the pangenome.
- Returns:
A Plotly Figure object containing the U-curve bar plot.
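A minimal sketch of the data preparation behind such a stacked bar chart, assuming `count` is keyed by the number of genomes (1 to N) and then by partition name, and that the soft core line sits at `soft_core * number_of_organisms` on the x-axis. `ucurve_series` is a hypothetical helper, not part of ppanggolin.

```python
from typing import Dict, List, Tuple


def ucurve_series(
    number_of_organisms: int,
    count: Dict[int, Dict[str, int]],
    partitions: List[str],
    soft_core: float,
) -> Tuple[List[int], Dict[str, List[int]], float]:
    """Turn the nested count dict into one y-series per partition for a stacked bar chart."""
    x = list(range(1, number_of_organisms + 1))  # a family is present in 1..N genomes
    series = {
        partition: [count.get(n, {}).get(partition, 0) for n in x]  # 0 when no family at n
        for partition in partitions
    }
    soft_core_x = soft_core * number_of_organisms  # x position of the dashed soft core line
    return x, series, soft_core_x


count = {1: {"cloud": 5}, 2: {"shell": 1}, 4: {"persistent": 3}}
x, series, line_x = ucurve_series(4, count, ["persistent", "shell", "cloud"], 0.95)
print(series["cloud"], series["persistent"], line_x)
```

In Plotly, each series could become a `plotly.graph_objects.Bar` trace in a figure laid out with `barmode="stack"`, and `Figure.add_vline(x=line_x, line_dash="dash")` could draw the threshold; whether `build_ucurve_plot` does exactly this is not stated here.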
- ppanggolin.figures.ucurve.compute_family_counts(pangenome: Pangenome) -> Tuple[Dict[int, Dict[str, int]], int, bool, bool, float | str]
Compute gene family distribution across genomes, and estimate the total family richness using Chao1.
- This function computes:
The number of gene families found in exactly n genomes (for each n)
Their counts by partition (e.g. persistent, shell, cloud)
Whether the pangenome is partitioned
Whether any gene family has an undefined partition
A Chao1 estimate of the total number of gene families (to account for unobserved families)
- Parameters:
pangenome – A Pangenome object with annotated and partitioned gene families
- Returns:
A tuple containing:
family_count_by_org_and_part – dict[int][str] → count of gene families by number of organisms and partition
max_family_count – the highest number of gene families for any given organism count (for scaling plots)
is_partitioned – whether at least one gene family has a defined partition
has_undefined – whether any gene family has the undefined (“U”) partition
chao – Chao1 estimate for total gene family richness (float or “NA” if not computable)
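The Chao1 estimator adds an estimate of unobserved families to the observed count: S_chao1 = S_obs + f1² / (2·f2), where f1 and f2 are the numbers of families found in exactly one and exactly two genomes. A minimal sketch of the five returned values, assuming each family is given as a hypothetical `(partition letter, set of genomes)` pair and that `"NA"` is returned when f2 is zero; the real function's exact conditions for `"NA"` are not documented here.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple, Union


def family_counts_and_chao1(
    families: Dict[str, Tuple[str, Set[str]]],
) -> Tuple[Dict[int, Dict[str, int]], int, bool, bool, Union[float, str]]:
    """families maps a family name to (partition letter, set of genomes containing it)."""
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for partition, genomes in families.values():
        n_genomes = len(genomes)
        counts[n_genomes][partition] += 1
        counts[n_genomes]["pangenome"] += 1  # all partitions combined

    max_family_count = max((by_part["pangenome"] for by_part in counts.values()), default=0)
    partitions = {partition for partition, _ in families.values()}
    is_partitioned = any(partition != "U" for partition in partitions)
    has_undefined = "U" in partitions

    # Chao1 = S_obs + f1^2 / (2 * f2), with f1 (f2) = families seen in exactly one (two) genomes.
    f1 = counts.get(1, {}).get("pangenome", 0)
    f2 = counts.get(2, {}).get("pangenome", 0)
    chao: Union[float, str] = len(families) + f1 ** 2 / (2 * f2) if f2 > 0 else "NA"

    plain_counts = {n: dict(by_part) for n, by_part in counts.items()}
    return plain_counts, max_family_count, is_partitioned, has_undefined, chao


families = {
    "f1": ("P", {"g1", "g2", "g3"}),
    "f2": ("C", {"g1"}),
    "f3": ("C", {"g2"}),
    "f4": ("S", {"g1", "g2"}),
}
counts, max_count, is_partitioned, has_undefined, chao = family_counts_and_chao1(families)
print(chao)  # 6.0: 4 observed + 2**2 / (2 * 1)
```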
- ppanggolin.figures.ucurve.draw_ucurve(pangenome: Pangenome, output: Path, soft_core: float = 0.95, disable_bar: bool = False)
Draws the U-shaped curve of gene family frequency distribution.
- Parameters:
pangenome – Partitioned pangenome
output – Path to output directory
soft_core – Soft core threshold to use
disable_bar – If True, disable the progress bar