ppanggolin.context package

Submodules

ppanggolin.context.searchGeneContext module

ppanggolin.context.searchGeneContext.add_edges_to_context_graph(context_graph: Graph, contig: Contig, contig_windows: List[Tuple[int, int]], transitivity: int) → Graph

Add edges to the context graph based on contig genes and windows.

Parameters:

context_graph – The context graph to which edges will be added.
contig – The contig whose genes are used to add the edges.
contig_windows – A list of tuples representing the start and end positions of contig windows.
transitivity – The number of next genes to consider when adding edges.

Returns:

A context graph specific to the contig of interest with edges added

ppanggolin.context.searchGeneContext.add_val_to_dict_attribute(attr_dict: dict, attribute_key, attribute_value)

Add an attribute value to an edge or node dictionary set.

Parameters:

attr_dict – The dictionary containing the edge/node attributes.
attribute_key – The key of the attribute.
attribute_value – The value of the attribute to be added.
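
The set-valued attribute pattern described above can be sketched as follows. This is only an illustration, not PPanGGOLiN's implementation: the helper name `add_value` and the sample keys and values are hypothetical.

```python
def add_value(attr_dict: dict, key, value) -> None:
    """Add `value` to the set stored under `key`, creating the set if needed."""
    # Hypothetical stand-in for the documented helper, for illustration only.
    attr_dict.setdefault(key, set()).add(value)


edge_attributes = {}
add_value(edge_attributes, "genomes", "genome_A")
add_value(edge_attributes, "genomes", "genome_B")
add_value(edge_attributes, "genomes", "genome_A")  # already present, so the set is unchanged
```

Storing a set rather than a list means that repeated values are recorded only once.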

ppanggolin.context.searchGeneContext.align_sequences_to_families(pangenome: Pangenome, output: Path, sequence_file: Path | None = None, identity: float = 0.5, coverage: float = 0.8, use_representatives: bool = False, no_defrag: bool = False, cpu: int = 1, translation_table: int = 11, tmpdir: Path | None = None, keep_tmp: bool = False, disable_bar=True) → Tuple[Set[GeneFamily], Dict[GeneFamily, Set[str]]]

Align sequences to pangenome gene families to get families of interest

Parameters:

pangenome – Pangenome containing GeneFamilies to align with sequence set
sequence_file – Path to file containing the sequences
output – Path to output directory
tmpdir – Path to temporary directory
identity – Minimum identity threshold between sequences and gene families for the alignment.
coverage – Minimum coverage threshold between sequences and gene families for the alignment.
use_representatives – Use representative sequences of families rather than all sequences to align input genes.
no_defrag – If True, do not use the defragmentation workflow.
cpu – Number of CPU cores to use.
disable_bar – If True, disable the progress bar.
translation_table – The translation table to use when the input sequences are nucleotide sequences.
keep_tmp – If True, keep temporary files.

Returns:

The set of gene families of interest and a dictionary linking each gene family to its sequence IDs.

ppanggolin.context.searchGeneContext.check_pangenome_for_context_search(pangenome: Pangenome, sequences: bool = False)

Check that the pangenome has the status and information required to search for contexts.

Parameters:

pangenome – The pangenome object
sequences – True if contexts are searched using sequences.

ppanggolin.context.searchGeneContext.compute_edge_metrics(context_graph: Graph, gene_proportion_cutoff: float) → None

Compute various metrics on the edges of the context graph.

Parameters:

context_graph – The context graph.
gene_proportion_cutoff – The minimum proportion of shared genes between two features for their edge to be considered significant.

ppanggolin.context.searchGeneContext.compute_gene_context_graph(families: Iterable[GeneFamily], transitive: int = 4, window_size: int = 0, disable_bar: bool = False) → Tuple[Graph, Dict[FrozenSet[GeneFamily], Set[Organism]]]

Construct the graph of gene contexts between families of the pangenome.

Parameters:

families – An iterable of gene families.
transitive – Size of the transitive closure used to build the graph.
window_size – Size of the window for extracting gene contexts (default: 0).
disable_bar – Flag to disable the progress bar (default: False).

Returns:

The constructed gene context graph, and a dictionary mapping each combination of gene families that forms a context in at least one genome to the genomes in which it occurs.

ppanggolin.context.searchGeneContext.export_context_to_dataframe(gene_contexts: set, fam2seq: Dict[GeneFamily, Set[str]], families_of_interest: Set[GeneFamily], output: Path)

Export the results to a DataFrame.

Parameters:

gene_contexts – connected components found in the pangenome
fam2seq – Dictionary with gene families as keys and set of sequence ids as values
families_of_interest – families of interest that are at the origin of the context.
output – output path

ppanggolin.context.searchGeneContext.fam_to_seq(seq_to_pan: dict) → dict

Create a dictionary with gene families as keys and lists of sequence IDs as values.

Parameters:

seq_to_pan – Dictionary storing the sequence IDs as keys and the gene families to which they are assigned as values.

Returns:

The reversed dictionary, with gene families as keys.
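
A minimal sketch of this kind of dictionary reversal is shown below. The function name `reverse_mapping` is hypothetical, and plain strings stand in for gene families, so this is not the library's code.

```python
from typing import Dict, List


def reverse_mapping(seq_to_family: Dict[str, str]) -> Dict[str, List[str]]:
    """Turn a sequence -> family mapping into a family -> sequences mapping."""
    family_to_seqs: Dict[str, List[str]] = {}
    for seq_id, family in seq_to_family.items():
        # Start a new list the first time a family is seen, then append to it.
        family_to_seqs.setdefault(family, []).append(seq_id)
    return family_to_seqs


# Strings are used here as stand-ins for gene family objects (assumption).
seq_to_family = {"seq1": "famA", "seq2": "famA", "seq3": "famB"}
family_to_seqs = reverse_mapping(seq_to_family)
# family_to_seqs == {"famA": ["seq1", "seq2"], "famB": ["seq3"]}
```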

ppanggolin.context.searchGeneContext.get_contig_to_genes(gene_families: Iterable[GeneFamily]) → Dict[Contig, Set[Gene]]

Group genes from specified gene families by contig.

Parameters:

gene_families – An iterable of gene family objects.

Returns:

A dictionary mapping contigs to sets of genes.
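
The grouping described above can be sketched with toy stand-ins. `ToyGene` and `group_genes_by_contig` are hypothetical names, and each family is modelled as a plain list of genes, which may differ from how PPanGGOLiN objects expose their genes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class ToyGene(NamedTuple):
    """Hypothetical stand-in for a gene that knows which contig it sits on."""
    gene_id: str
    contig: str


def group_genes_by_contig(families: Iterable[List[ToyGene]]) -> Dict[str, Set[ToyGene]]:
    """Collect the genes of all given families into one set per contig."""
    contig_to_genes: Dict[str, Set[ToyGene]] = defaultdict(set)
    for family in families:
        for gene in family:
            contig_to_genes[gene.contig].add(gene)
    return dict(contig_to_genes)


families = [
    [ToyGene("g1", "contig_1"), ToyGene("g2", "contig_2")],  # first family
    [ToyGene("g3", "contig_1")],                             # second family
]
contig_to_genes = group_genes_by_contig(families)
```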

ppanggolin.context.searchGeneContext.get_gene_contexts(context_graph: Graph, families_of_interest: Set[GeneFamily]) → Set[GeneContext]

Extract gene contexts from a context graph based on the provided set of gene families of interest.

Gene contexts are extracted from a context graph by identifying connected components. The function filters the connected components based on the following criteria:

- Remove singleton families (components with only one gene family).
- Remove components that do not contain any gene families of interest.

For each remaining connected component, a GeneContext object is created.

Parameters:

context_graph – The context graph from which to extract gene contexts.
families_of_interest – Set of gene families of interest.

Returns:

Set of GeneContext objects representing the extracted gene contexts.
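
The two filtering criteria can be illustrated with a small, dependency-free sketch. It assumes the graph is given as a plain adjacency dictionary with strings as family names; the function names are hypothetical and no `GeneContext` objects are built.

```python
from collections import deque
from typing import Dict, List, Set


def connected_components(adjacency: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return the connected components of an undirected graph (breadth-first search)."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components


def filter_contexts(adjacency: Dict[str, Set[str]], families_of_interest: Set[str]) -> List[Set[str]]:
    """Keep components with more than one family and at least one family of interest."""
    return [
        component
        for component in connected_components(adjacency)
        if len(component) > 1 and component & families_of_interest
    ]


adjacency = {
    "famA": {"famB"}, "famB": {"famA", "famC"}, "famC": {"famB"},  # kept
    "famD": {"famE"}, "famE": {"famD"},  # dropped: no family of interest
    "famF": set(),                       # dropped: singleton
}
contexts = filter_contexts(adjacency, {"famA", "famF"})
# contexts == [{"famA", "famB", "famC"}]
```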

ppanggolin.context.searchGeneContext.get_n_next_genes_index(current_index: int, next_genes_count: int, contig_size: int, is_circular: bool = False) → Iterator[int]

Generate the indices of the next genes based on the current index and contig properties.

Parameters:

current_index – The index of the current gene.
next_genes_count – The number of next genes to consider.
contig_size – The total number of genes in the contig.
is_circular – Flag indicating whether the contig is circular (default: False).

Returns:

An iterator yielding the indices of the next genes.

Raises:

IndexError – If the current index is out of range for the given contig size.
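
One way to generate such indices is sketched below. It is an assumption-laden illustration rather than the library's code: the name `next_gene_indices` is hypothetical, negative indices are rejected as well, and on a circular contig the walk wraps around without returning to the current gene.

```python
from itertools import chain, islice
from typing import Iterator


def next_gene_indices(current_index: int, next_genes_count: int,
                      contig_size: int, is_circular: bool = False) -> Iterator[int]:
    """Yield up to `next_genes_count` indices following `current_index`."""
    # Because this is a generator, the check runs when iteration starts.
    if not 0 <= current_index < contig_size:
        raise IndexError(
            f"Index {current_index} is out of range for a contig of {contig_size} genes."
        )
    following = range(current_index + 1, contig_size)
    if is_circular:
        # Wrap around to the start of the contig, stopping before the current gene.
        following = chain(following, range(0, current_index))
    yield from islice(following, next_genes_count)


linear = list(next_gene_indices(8, 3, 10))                      # [9]
circular = list(next_gene_indices(8, 3, 10, is_circular=True))  # [9, 0, 1]
```

On a linear contig the iterator simply stops at the last gene, so fewer indices than requested may be returned.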

ppanggolin.context.searchGeneContext.increment_attribute_counter(edge_dict: dict, key: Hashable)

Increment the counter for an edge/node attribute in the edge/node dictionary.

Parameters:

edge_dict – The dictionary containing the attributes.
key – The key of the attribute.
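
A counter of this kind might look like the following sketch. The name `increment_counter` and the sample keys are hypothetical, and starting a missing key at zero is an assumption.

```python
from typing import Hashable


def increment_counter(attr_dict: dict, key: Hashable) -> None:
    """Add one to the count stored under `key`, treating a missing key as zero."""
    attr_dict[key] = attr_dict.get(key, 0) + 1


edge_attributes = {}
for genome in ["genome_A", "genome_B", "genome_A"]:
    increment_counter(edge_attributes, genome)
# edge_attributes == {"genome_A": 2, "genome_B": 1}
```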

ppanggolin.context.searchGeneContext.launch(args: Namespace)

Command launcher

Parameters:

args – All arguments provided by the user.

ppanggolin.context.searchGeneContext.make_graph_writable(context_graph)

The original context graph contains ppanggolin objects as nodes and lists and dictionaries in edge attributes. Since these objects cannot be written to the output graph, this function creates a new graph that contains only writable objects.

Parameters:

context_graph – The gene context graph to convert.

ppanggolin.context.searchGeneContext.parser_context(parser: ArgumentParser)

Parser for the arguments specific to the context command.

Parameters:

parser – Parser for the context command arguments.

ppanggolin.context.searchGeneContext.search_gene_context_in_pangenome(pangenome: Pangenome, output: Path, sequence_file: Path | None = None, families: Path | None = None, transitive: int = 4, jaccard_threshold: float = 0.85, window_size: int = 1, graph_format: str = 'graphml', disable_bar=True, **kwargs)

Main function to search common gene contexts between sequence set and pangenome families

Parameters:

pangenome – Pangenome containing GeneFamilies to align with sequence set
sequence_file – Path to file containing the sequences
families – Path to a file containing family names
output – Path to output directory
transitive – number of genes to check on both sides of a family aligned with an input sequence
jaccard_threshold – Jaccard index threshold to filter edges in graph
window_size – Number of genes to consider in the gene context.
graph_format – Write format of the context graph. Can be graphml or gexf
disable_bar – If True, disable the progress bar.
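
For readers unfamiliar with the Jaccard index used by jaccard_threshold, here is a short sketch of the standard definition (intersection size divided by union size). Which sets PPanGGOLiN compares for each edge is not stated in this reference, so the two sample sets are invented, and returning 0.0 for two empty sets is a choice made for this example.

```python
def jaccard_index(first: set, second: set) -> float:
    """Return |first & second| / |first | second|, or 0.0 if both sets are empty."""
    union = first | second
    if not union:
        return 0.0
    return len(first & second) / len(union)


# Invented sample data: identifiers associated with the two ends of an edge.
members_a = {"g1", "g2", "g3", "g4"}
members_b = {"g2", "g3", "g4", "g5"}

score = jaccard_index(members_a, members_b)  # 3 shared / 5 in total = 0.6
keep_edge = score >= 0.85                    # False with the default threshold
```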

ppanggolin.context.searchGeneContext.subparser(sub_parser: _SubParsersAction) → ArgumentParser

Subparser to launch PPanGGOLiN from the command line.

Parameters:

sub_parser – Sub-parser for the context command.

Returns:

Parser arguments for the context command.

ppanggolin.context.searchGeneContext.write_graph(graph: Graph, output_dir: Path, graph_format: str)

Write a graph to a file in GraphML and/or GEXF format.

Parameters:

graph – Graph to write
output_dir – The output directory where the graph file will be written.
graph_format – Formats of the output graph. Can be graphml or gexf
