ppanggolin.nem package

Submodules

ppanggolin.nem.partition module

ppanggolin.nem.partition.check_pangenome_former_partition(pangenome: Pangenome, force: bool = False)

Checks the pangenome status and the .h5 file for former partitions, then deletes them if allowed or raises an error otherwise.

Parameters:

pangenome – Pangenome object
force – Force writing to the pangenome file

ppanggolin.nem.partition.evaluate_nb_partitions(organisms: set, output: Path | None = None, sm_degree: int = 10, free_dispersion: bool = False, chunk_size: int = 500, krange: list | None = None, icl_margin: float = 0.05, draw_icl: bool = False, cpu: int = 1, seed: int = 42, tmpdir: Path | None = None, disable_bar: bool = False) → int

Evaluate the optimal number of partitions for the pangenome

Parameters:

organisms – Set of organisms from pangenome
tmpdir – temporary directory path
output – output directory path to draw ICL
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – Use if the dispersion around the centroid vector of each partition must be free.
chunk_size – Size of the chunks when performing partitioning using chunks of organisms.
krange – Range of K values to test when detecting K automatically.
icl_margin – Margin used to select the lowest K when maximizing the ICL
draw_icl – Draw the ICL curve for all the tested K values.
cpu – Number of available cores
seed – seed used to generate random numbers
disable_bar – Disable progress bar

Returns:

Ideal number of partitions computed
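One way to read icl_margin is as a tolerance around the best ICL: instead of taking the K with the highest ICL, take the smallest K whose ICL is nearly as good. The sketch below illustrates that reading only. The helper name select_lowest_k, the mean_icl mapping, and the tolerance formula (a fraction of the observed ICL spread) are assumptions made for illustration, not a confirmed description of what evaluate_nb_partitions does internally.

```python
from typing import Dict


def select_lowest_k(mean_icl: Dict[int, float], icl_margin: float = 0.05) -> int:
    """Return the smallest K whose ICL lies within a margin of the best ICL.

    Hypothetical helper: the tolerance is assumed to be icl_margin times the
    spread between the highest and lowest ICL among the tested K values.
    """
    if not mean_icl:
        raise ValueError("mean_icl must contain at least one tested K")
    best_k = max(mean_icl, key=mean_icl.get)
    best_icl = mean_icl[best_k]
    tolerance = (best_icl - min(mean_icl.values())) * icl_margin
    # Keep every K at or below the best one whose ICL is nearly as good.
    candidates = [k for k, icl in mean_icl.items()
                  if k <= best_k and icl >= best_icl - tolerance]
    return min(candidates)


# Made-up ICL values for K = 2..5, for illustration only.
example_icl = {2: -1000.0, 3: -900.0, 4: -895.0, 5: -890.0}
chosen_k = select_lowest_k(example_icl, icl_margin=0.05)
```

With a margin of 0 the helper falls back to the K with the highest ICL; larger margins favour smaller K values.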

ppanggolin.nem.partition.launch(args: Namespace)

Command launcher

Parameters: args – All arguments provided by the user

ppanggolin.nem.partition.nem_samples(pack: tuple) → Tuple[dict, None, None] | Tuple[int, float, float] | Tuple[dict, dict, float]

Run partitioning

Parameters: pack – {index: int, tmpdir: str, beta: float, sm_degree: int, free_dispersion: bool, kval: int, seed: int, init: str, keep_tmp_files: bool}

Returns:

ppanggolin.nem.partition.nem_single(args: List[Tuple[Path, int, float, bool, int, int, str, bool, int, bool]]) → Tuple[dict, None, None] | Tuple[int, float, float] | Tuple[dict, dict, float]

Run partitioning in multiprocessing to evaluate the number of partitions

Parameters: args – {nem_dir_path: str, nb_org: int, beta: float, free_dispersion: bool, kval: int, seed: int, init: str, keep_files: bool, itermax: int, just_log_likelihood: bool}
Returns: Result of run partitioning
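Functions such as nem_single and nem_samples take a single packed argument because map-style APIs pass exactly one item per task. The sketch below shows that packed-argument wrapper pattern with stand-in functions. The names run_step and run_step_packed are hypothetical, and whether nem_single is implemented exactly this way is an assumption.

```python
from typing import Tuple


def run_step(label: str, kval: int, beta: float = 2.5) -> Tuple[str, int, float]:
    """Stand-in for a worker function that takes several arguments."""
    return label, kval, beta * kval


def run_step_packed(args: Tuple[str, int, float]) -> Tuple[str, int, float]:
    """Unpack one tuple so the worker can be used with map-style APIs."""
    return run_step(*args)


# One tuple per task; the field order must match run_step's signature.
tasks = [("sample_0", 3, 2.5), ("sample_1", 4, 2.5)]
results = list(map(run_step_packed, tasks))
```

The same wrapper could be handed to multiprocessing.Pool.imap_unordered, which also calls the function with one item from the iterable at a time.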

ppanggolin.nem.partition.parser_partition(parser: ArgumentParser)

Parser for the arguments specific to the partition command

Parameters: parser – Parser for the partition command arguments

ppanggolin.nem.partition.partition(pangenome: Pangenome, output: Path | None = None, beta: float = 2.5, sm_degree: int = 10, free_dispersion: bool = False, chunk_size: int = 500, kval: int = -1, krange: list | None = None, icl_margin: float = 0.05, draw_icl: bool = False, cpu: int = 1, seed: int = 42, tmpdir: Path | None = None, keep_tmp_files: bool = False, force: bool = False, disable_bar: bool = False)

Partition the pangenome

Parameters:

pangenome – Pangenome containing the GeneFamilies to partition
tmpdir – temporary directory path
output – output directory path to draw ICL
beta – Strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing.
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – Use if the dispersion around the centroid vector of each partition must be free.
chunk_size – Size of the chunks when performing partitioning using chunks of organisms.
kval – Number of partitions to use. Must be at least 2. If under 2, it will be detected automatically.
krange – Range of K values to test when detecting K automatically.
icl_margin – Margin used to select the lowest K when maximizing the ICL
draw_icl – Draw the ICL curve for all the tested K values.
cpu – Number of available cores
seed – seed used to generate random numbers
keep_tmp_files – True if you want to keep the temporary NEM files
force – Force writing to the pangenome file
disable_bar – Disable progress bar

ppanggolin.nem.partition.partition_nem(index: int, kval: int, beta: float = 2.5, sm_degree: int = 10, free_dispersion: bool = False, seed: int = 42, init: str = 'param_file', tmpdir: Path | None = None, keep_tmp_files: bool = False) → Tuple[dict, None, None] | Tuple[int, float, float] | Tuple[dict, dict, float]

Parameters:

index – Index of the sample group
tmpdir – temporary directory path
kval – Number of partitions to use
beta – Strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing.
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – Use if the dispersion around the centroid vector of each partition must be free.
seed – seed used to generate random numbers
init – Initialize the NEM parameters from the pangenome parameters or randomly
keep_tmp_files – True if you want to keep the temporary NEM files

Returns:

ppanggolin.nem.partition.run_partitioning(nem_dir_path: Path, nb_org: int, beta: float = 2.5, free_dispersion: bool = False, kval: int = 3, seed: int = 42, init: str = 'param_file', keep_files: bool = False, itermax: int = 100, just_log_likelihood: bool = False) → Tuple[dict, None, None] | Tuple[int, float, float] | Tuple[dict, dict, float]

Main function to run the partitioning

Parameters:

nem_dir_path – Path to directory with nem files
nb_org – Number of organisms
beta – Strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing.
free_dispersion – Use if the dispersion around the centroid vector of each partition must be free.
kval – Number of partitions to use. Must be at least 2. If under 2, it will be detected automatically.
seed – seed used to generate random numbers
init – Initialize the NEM parameters from the pangenome parameters or randomly
keep_files – True if you want to keep the NEM files
itermax – Maximum number of iterations to compute the partitioning
just_log_likelihood – Return only the NEM parameter results

Returns:

NEM parameters and, unless just_log_likelihood is set, the families associated with each partition

ppanggolin.nem.partition.subparser(sub_parser: _SubParsersAction) → ArgumentParser

Subparser to launch PPanGGOLiN from the command line

Parameters: sub_parser – Subparser for the partition command
Returns: Parser arguments for the partition command

ppanggolin.nem.partition.write_nem_input_files(tmpdir: Path, organisms: set, sm_degree: int = 10) → Tuple[float, int]

Create and format input files for partitioning with NEM

Parameters:

tmpdir – temporary directory path
organisms – Set of organisms from the pangenome
sm_degree – Maximum degree of the nodes to be included in the smoothing process.

Returns:

Total edge weight used to weight beta, and the number of families
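The sm_degree parameter limits which nodes take part in smoothing, and the function reports a total edge weight. The sketch below shows one plausible way these two ideas fit together on a toy graph. The Graph alias, the smoothing_edges helper, and the rule that both endpoints must respect sm_degree are assumptions for illustration, not the actual logic or file format of write_nem_input_files.

```python
from typing import Dict, List, Tuple

# Undirected weighted graph stored as {node: {neighbour: weight}}.
Graph = Dict[str, Dict[str, float]]


def smoothing_edges(graph: Graph, sm_degree: int = 10) -> Tuple[List[Tuple[str, str, float]], float]:
    """Collect edges usable for smoothing and their total weight.

    Hypothetical rule: an edge is kept only if both endpoints have at most
    sm_degree neighbours.
    """
    kept = []
    total_weight = 0.0
    for node, neighbours in graph.items():
        if len(neighbours) > sm_degree:
            continue  # high-degree node: excluded from smoothing
        for other, weight in neighbours.items():
            # node < other visits each undirected edge only once
            if node < other and len(graph[other]) <= sm_degree:
                kept.append((node, other, weight))
                total_weight += weight
    return kept, total_weight


# Toy graph: "hub" has three neighbours, every other node has fewer.
toy = {
    "hub": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"hub": 1.0, "b": 0.5},
    "b": {"hub": 1.0, "a": 0.5},
    "c": {"hub": 1.0},
}
edges, weight = smoothing_edges(toy, sm_degree=2)
```

Lowering sm_degree drops the edges around highly connected nodes, which in turn lowers the total weight returned.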

ppanggolin.nem.rarefaction module

ppanggolin.nem.rarefaction.draw_curve(output: Path, data: list, max_sampling: int = 10)

Draw the rarefaction curve and associated data

Parameters:

output – output directory path to draw the rarefaction curve and associated data
max_sampling – Maximum number of organisms in a sample
data –

ppanggolin.nem.rarefaction.launch(args: Namespace)

Command launcher

Parameters: args – All arguments provided by the user

ppanggolin.nem.rarefaction.launch_raref_nem(args: Tuple[int, Path, float, int, bool, int, int, list, int]) → Tuple[Tuple[Dict[str, int], int]]

Launch raref_nem in multiprocessing

Parameters: args – {index: int, tmpdir: str, beta: float, sm_degree: int, free_dispersion: bool, chunk_size: int, kval: int, krange: list, seed: int}
Returns: Count of each partition and parameters for the given sample index

ppanggolin.nem.rarefaction.make_rarefaction_curve(pangenome: Pangenome, output: Path, tmpdir: Path | None = None, beta: float = 2.5, depth: int = 30, min_sampling: int = 1, max_sampling: int = 100, sm_degree: int = 10, free_dispersion: bool = False, chunk_size: int = 500, kval: int = -1, krange: list | None = None, cpu: int = 1, seed: int = 42, kestimate: bool = False, soft_core: float = 0.95, disable_bar: bool = False)

Main function to make the rarefaction curve

Parameters:

pangenome – Pangenome containing the GeneFamilies
output – output directory path to draw the rarefaction curve and associated data
tmpdir – temporary directory path
beta – Strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing.
depth – Number of samplings at each sampling point
min_sampling – Minimum number of organisms in a sample
max_sampling – Maximum number of organisms in a sample
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – Use if the dispersion around the centroid vector of each partition must be free.
chunk_size – Size of the chunks when performing partitioning using chunks of organisms.
kval – Number of partitions to use. Must be at least 2. If under 2, it will be detected automatically.
krange – Range of K values to test when detecting K automatically.
cpu – Number of available cores
seed – seed used to generate random numbers
kestimate – recompute the number of partitions for each sample between the values provided by krange
soft_core – Soft core threshold
disable_bar – Disable progress bar
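The depth, min_sampling, max_sampling and seed parameters together describe a sampling plan: several random organism subsets for each sample size. The sketch below is a minimal illustration of such a plan. The draw_samples helper, the inclusive size range, and the cap at the number of available organisms are assumptions, not a description of how make_rarefaction_curve builds its samples.

```python
import random
from typing import List, Set


def draw_samples(organisms: List[str], depth: int = 30, min_sampling: int = 1,
                 max_sampling: int = 100, seed: int = 42) -> List[Set[str]]:
    """Draw `depth` random organism subsets for every sample size.

    Hypothetical helper: sizes run from min_sampling to max_sampling
    (inclusive), capped at the number of organisms available.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    largest = min(max_sampling, len(organisms))
    samples = []
    for size in range(min_sampling, largest + 1):
        for _ in range(depth):
            samples.append(set(rng.sample(organisms, size)))
    return samples


# Five placeholder genome names, for illustration only.
genomes = [f"genome_{i}" for i in range(5)]
samples = draw_samples(genomes, depth=3, min_sampling=2, max_sampling=10, seed=42)
```

Each subset would then be partitioned independently, so the total work grows with depth times the number of sample sizes.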

ppanggolin.nem.rarefaction.parser_rarefaction(parser: ArgumentParser)

Parser for the arguments specific to the rarefaction command

Parameters: parser – Parser for the rarefaction command arguments

ppanggolin.nem.rarefaction.raref_nem(index: int, tmpdir: Path, beta: float = 2.5, sm_degree: int = 10, free_dispersion: bool = False, chunk_size: int = 500, kval: int = -1, krange: list | None = None, seed: int = 42) → Tuple[Dict[str, int], int]

Parameters:

index – Index of the sample group organisms
tmpdir – temporary directory path
beta – Strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing.
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – Use if the dispersion around the centroid vector of each partition must be free.
chunk_size – Size of the chunks when performing partitioning using chunks of organisms.
kval – Number of partitions to use
krange – Range of K values to test when detecting K automatically.
seed – seed used to generate random numbers

Returns:

Count of each partition and parameters for the given sample index

ppanggolin.nem.rarefaction.subparser(sub_parser: _SubParsersAction) → ArgumentParser

Subparser to launch PPanGGOLiN from the command line

Parameters: sub_parser – Subparser for the rarefaction command
Returns: Parser arguments for the rarefaction command
