ppanggolin.nem package
Submodules
ppanggolin.nem.partition module
- ppanggolin.nem.partition.check_pangenome_former_partition(pangenome: Pangenome, force: bool = False)
Checks the pangenome status and the .h5 file for former partitions; deletes them if allowed, otherwise raises an error
- Parameters:
pangenome – Pangenome object
force – Allow forcing a write to the pangenome file
- ppanggolin.nem.partition.evaluate_nb_partitions(organisms: set, output: Path | None = None, sm_degree: int = 10, free_dispersion: bool = False, chunk_size: int = 500, krange: list | None = None, icl_margin: float = 0.05, draw_icl: bool = False, cpu: int = 1, seed: int = 42, tmpdir: Path | None = None, disable_bar: bool = False) int
Evaluate the optimal number of partitions for the pangenome
- Parameters:
organisms – Set of organisms from pangenome
tmpdir – temporary directory path
output – output directory path in which to draw the ICL curve
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – use if the dispersion around the centroid vector of each partition must be free.
chunk_size – Size of the chunks when performing partitioning using chunks of organisms.
krange – Range of K values to test when detecting K automatically.
icl_margin – margin used to select the lowest K when maximizing ICL
draw_icl – draw the ICL curve for all the tested K values.
cpu – Number of available cores
seed – seed used to generate random numbers
disable_bar – Disable progress bar
- Returns:
Ideal number of partitions computed
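The icl_margin description suggests that the selected K is the lowest one whose ICL is close enough to the best ICL. The sketch below is one plausible reading of that rule, not the library's implementation: the helper name `pick_k`, the example ICL values, and the choice to express the margin as a fraction of the spread between the best and worst ICL are all assumptions.

```python
def pick_k(icl_by_k: dict, icl_margin: float = 0.05) -> int:
    """Return the lowest K whose ICL lies within a margin of the best ICL.

    Hypothetical helper: the exact rule used by evaluate_nb_partitions
    is not stated in this reference.
    """
    best_k = max(icl_by_k, key=icl_by_k.get)
    best_icl = icl_by_k[best_k]
    # Assumption: the margin is a fraction of the spread between best and worst ICL.
    tolerance = (best_icl - min(icl_by_k.values())) * icl_margin
    candidates = [
        k for k, icl in icl_by_k.items()
        if k <= best_k and icl >= best_icl - tolerance
    ]
    return min(candidates)


# Made-up ICL values for K = 2..5, for illustration only.
icls = {2: -1500.0, 3: -1010.0, 4: -1000.0, 5: -1200.0}
chosen = pick_k(icls, icl_margin=0.05)
```

With a margin of 0 the rule reduces to picking the K with the highest ICL; a larger margin favours smaller K values.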
- ppanggolin.nem.partition.launch(args: Namespace)
Command launcher
- Parameters:
args – All arguments provided by the user
- ppanggolin.nem.partition.nem_samples(pack: tuple) Tuple[dict, None, None] | Tuple[int, float, float] | Tuple[dict, dict, float]
Run partitioning
- Parameters:
pack – {index: int, tmpdir: str, beta: float, sm_degree: int, free_dispersion: bool, kval: int, seed: int, init: str, keep_tmp_files: bool}
- Returns:
- ppanggolin.nem.partition.nem_single(args: List[Tuple[Path, int, float, bool, int, int, str, bool, int, bool]]) Tuple[dict, None, None] | Tuple[int, float, float] | Tuple[dict, dict, float]
Allows partitioning to be run with multiprocessing to evaluate the number of partitions
- Parameters:
args – {nem_dir_path: str, nb_org: int, beta: float, free_dispersion: bool, kval: int, seed: int, init: str, keep_files: bool, itermax: int, just_log_likelihood: bool}
- Returns:
Result of the partitioning run
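nem_single and nem_samples each take a single packed argument, which is the usual shape for a worker passed to a pool's map function. The sketch below illustrates that pattern only: `fake_run_partitioning` is a hypothetical stand-in with a shortened argument list, and a thread pool is used here in place of a process pool to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fake_run_partitioning(nem_dir_path: Path, nb_org: int, beta: float, kval: int) -> tuple:
    """Hypothetical stand-in for the real partitioning call; returns dummy values."""
    return (kval, beta * nb_org, str(nem_dir_path))


def single(args: tuple) -> tuple:
    """Worker that receives one packed tuple and unpacks it into keyword-free arguments."""
    return fake_run_partitioning(*args)


# One packed tuple per K value to test (values are illustrative).
jobs = [(Path("nem_tmp"), 10, 2.5, k) for k in range(3, 6)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # map() passes exactly one item to the worker, hence the packed tuples.
    results = list(pool.map(single, jobs))
```

Packing the arguments keeps the worker compatible with map-style pool APIs, which hand each worker call one item from the iterable.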
- ppanggolin.nem.partition.parser_partition(parser: ArgumentParser)
Parser for the specific arguments of the partition command
- Parameters:
parser – parser for the partition arguments
- ppanggolin.nem.partition.partition(pangenome: Pangenome, output: Path | None = None, beta: float = 2.5, sm_degree: int = 10, free_dispersion: bool = False, chunk_size: int = 500, kval: int = -1, krange: list | None = None, icl_margin: float = 0.05, draw_icl: bool = False, cpu: int = 1, seed: int = 42, tmpdir: Path | None = None, keep_tmp_files: bool = False, force: bool = False, disable_bar: bool = False)
Partitioning the pangenome
- Parameters:
pangenome – Pangenome containing the GeneFamilies to partition
tmpdir – temporary directory path
output – output directory path in which to draw the ICL curve
beta – strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – use if the dispersion around the centroid vector of each partition must be free.
chunk_size – Size of the chunks when performing partitioning using chunks of organisms.
kval – Number of partitions to use. Must be at least 2. If under 2, it will be detected automatically.
krange – Range of K values to test when detecting K automatically.
icl_margin – margin used to select the lowest K when maximizing ICL
draw_icl – draw the ICL curve for all the tested K values.
cpu – Number of available cores
seed – seed used to generate random numbers
keep_tmp_files – True if you want to keep the temporary NEM files
force – Allow forcing a write to the pangenome file
disable_bar – Disable progress bar
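The kval rule above (use the given value when it is at least 2, otherwise detect K automatically) can be sketched as a small helper. This is an illustration of the documented rule only: `resolve_kval` and the `estimate` callback are hypothetical names, with `estimate` standing in for whatever automatic detection is performed over krange.

```python
from typing import Callable


def resolve_kval(kval: int, estimate: Callable[[], int]) -> int:
    """Return kval when it is a usable partition count, else an estimated K.

    Hypothetical helper; `estimate` stands in for automatic K detection.
    """
    if kval < 2:
        # Under 2 (including the default of -1): fall back to detection.
        return estimate()
    return kval


# The default kval of -1 triggers detection; an explicit value is kept as is.
auto_k = resolve_kval(-1, lambda: 3)
fixed_k = resolve_kval(4, lambda: 3)
```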
- ppanggolin.nem.partition.partition_nem(index: int, kval: int, beta: float = 2.5, sm_degree: int = 10, free_dispersion: bool = False, seed: int = 42, init: str = 'param_file', tmpdir: Path | None = None, keep_tmp_files: bool = False) Tuple[dict, None, None] | Tuple[int, float, float] | Tuple[dict, dict, float]
- Parameters:
index – Index of the sample group
tmpdir – temporary directory path
kval – Number of partitions to use
beta – strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – use if the dispersion around the centroid vector of each partition must be free.
seed – seed used to generate random numbers
init – Initialize the NEM parameters from the pangenome parameters or randomly
keep_tmp_files – True if you want to keep the temporary NEM files
- Returns:
- ppanggolin.nem.partition.run_partitioning(nem_dir_path: Path, nb_org: int, beta: float = 2.5, free_dispersion: bool = False, kval: int = 3, seed: int = 42, init: str = 'param_file', keep_files: bool = False, itermax: int = 100, just_log_likelihood: bool = False) Tuple[dict, None, None] | Tuple[int, float, float] | Tuple[dict, dict, float]
Main function to run the partitioning
- Parameters:
nem_dir_path – Path to directory with nem files
nb_org – Number of organisms
beta – strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing
free_dispersion – use if the dispersion around the centroid vector of each partition must be free.
kval – Number of partitions to use. Must be at least 2. If under 2, it will be detected automatically.
seed – seed used to generate random numbers
init – Initialize the NEM parameters from the pangenome parameters or randomly
keep_files – True if you want to keep the NEM files
itermax – Maximum number of iterations used to compute the partitioning
just_log_likelihood – Return only the NEM parameter results
- Returns:
NEM parameters and, unless just_log_likelihood is set, the families associated with each partition
- ppanggolin.nem.partition.subparser(sub_parser: _SubParsersAction) ArgumentParser
Subparser to launch PPanGGOLiN from the command line
- Parameters:
sub_parser – sub_parser for the partition command
- Returns:
Parser arguments for the partition command
- ppanggolin.nem.partition.write_nem_input_files(tmpdir: Path, organisms: set, sm_degree: int = 10) Tuple[float, int]
Create and format input files for partitioning with NEM
- Parameters:
tmpdir – temporary directory path
organisms – Set of organisms from the pangenome
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
- Returns:
Total edge weight used to weight beta, and the number of families
ppanggolin.nem.rarefaction module
- ppanggolin.nem.rarefaction.draw_curve(output: Path, data: list, max_sampling: int = 10)
Draw the rarefaction curve and associated data
- Parameters:
output – output directory path to draw the rarefaction curve and associated data
max_sampling – Maximum number of organisms in a sample
data –
- ppanggolin.nem.rarefaction.launch(args: Namespace)
Command launcher
- Parameters:
args – All arguments provided by the user
- ppanggolin.nem.rarefaction.launch_raref_nem(args: Tuple[int, Path, float, int, bool, int, int, list, int]) Tuple[Tuple[Dict[str, int], int]]
Launch raref_nem with multiprocessing
- Parameters:
args – {index: int, tmpdir: str, beta: float, sm_degree: int, free_dispersion: bool, chunk_size: int, kval: int, krange: list, seed: int}
- Returns:
Count of each partition and parameters for the given sample index
- ppanggolin.nem.rarefaction.make_rarefaction_curve(pangenome: Pangenome, output: Path, tmpdir: Path | None = None, beta: float = 2.5, depth: int = 30, min_sampling: int = 1, max_sampling: int = 100, sm_degree: int = 10, free_dispersion: bool = False, chunk_size: int = 500, kval: int = -1, krange: list | None = None, cpu: int = 1, seed: int = 42, kestimate: bool = False, soft_core: float = 0.95, disable_bar: bool = False)
Main function to make the rarefaction curve
- Parameters:
pangenome – Pangenome containing the GeneFamilies to partition
output – output directory path to draw the rarefaction curve and associated data
tmpdir – temporary directory path
beta – strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing
depth – Number of samplings at each sampling point
min_sampling – Minimum number of organisms in a sample
max_sampling – Maximum number of organisms in a sample
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – use if the dispersion around the centroid vector of each partition must be free.
chunk_size – Size of the chunks when performing partitioning using chunks of organisms.
kval – Number of partitions to use. Must be at least 2. If under 2, it will be detected automatically.
krange – Range of K values to test when detecting K automatically.
cpu – Number of available cores
seed – seed used to generate random numbers
kestimate – recompute the number of partitions for each sample between the values provided by krange
soft_core – Soft core threshold
disable_bar – Disable progress bar
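The depth, min_sampling, max_sampling and seed parameters together describe a sampling plan: several random draws of organisms at each sample size. The sketch below shows one way such a plan could be built; it is an assumption about the general approach, not the library's code. The helper `draw_samples`, the inclusive size range, and the capping of max_sampling at the number of organisms are all choices made for this illustration.

```python
import random


def draw_samples(organisms, depth=30, min_sampling=1, max_sampling=100, seed=42):
    """Draw `depth` random organism samples at each size from min to max sampling.

    Hypothetical helper illustrating the parameters; sizes are inclusive here
    and capped at the number of organisms (both are assumptions).
    """
    rng = random.Random(seed)
    pool = sorted(organisms)  # sort so that a set input gives reproducible draws
    upper = min(max_sampling, len(pool))
    samples = []
    for size in range(min_sampling, upper + 1):
        for _ in range(depth):
            samples.append(set(rng.sample(pool, size)))
    return samples


# Four made-up organism names, 3 draws per sample size, sizes 2 to 4.
samples = draw_samples({"orgA", "orgB", "orgC", "orgD"},
                       depth=3, min_sampling=2, max_sampling=10)
```

Each sample would then be partitioned independently, which is why a worker such as raref_nem takes the index of a sample group.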
- ppanggolin.nem.rarefaction.parser_rarefaction(parser: ArgumentParser)
Parser for the specific arguments of the rarefaction command
- Parameters:
parser – parser for the rarefaction arguments
- ppanggolin.nem.rarefaction.raref_nem(index: int, tmpdir: Path, beta: float = 2.5, sm_degree: int = 10, free_dispersion: bool = False, chunk_size: int = 500, kval: int = -1, krange: list | None = None, seed: int = 42) Tuple[Dict[str, int], int]
- Parameters:
index – Index of the sample group organisms
tmpdir – temporary directory path
beta – strength of the smoothing using the graph topology during partitioning. 0 deactivates spatial smoothing
sm_degree – Maximum degree of the nodes to be included in the smoothing process.
free_dispersion – use if the dispersion around the centroid vector of each partition must be free.
chunk_size – Size of the chunks when performing partitioning using chunks of organisms.
kval – Number of partitions to use
krange – Range of K values to test when detecting K automatically.
seed – seed used to generate random numbers
- Returns:
Count of each partition and parameters for the given sample index
- ppanggolin.nem.rarefaction.subparser(sub_parser: _SubParsersAction) ArgumentParser
Subparser to launch PPanGGOLiN from the command line
- Parameters:
sub_parser – sub_parser for the rarefaction command
- Returns:
Parser arguments for the rarefaction command