Write pangenome sequences
The fasta command can be used to write sequences of the pangenome or specific parts of the pangenome in FASTA format.
Most options require a partition.
Available partitions are:
all, for the entire pangenome
Persistent, for persistent families
Shell, for shell genes or families
Cloud, for cloud genes or families
rgp, for genes or families found in RGPs
core, for core genes or families
softcore, for softcore genes or families
When using the softcore filter, the --soft_core option can be used to modify the threshold used to determine what is part of the softcore. It is set to 0.95 by default.
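As a rough illustration of what the threshold means, the sketch below assumes that it is the minimum fraction of genomes in which a family must be present; that reading is an assumption, as are the pangenome size of 200 genomes and the relaxed value of 0.90. The ppanggolin command is only printed, not executed.

```shell
# Assumption: the softcore threshold is the minimum fraction of genomes
# that must contain a family. All numbers here are illustrative only.
n_genomes=200        # hypothetical pangenome size
threshold_pct=95     # the default --soft_core 0.95, written as a percentage

# Ceiling of n_genomes * threshold_pct / 100, using integer arithmetic.
min_genomes=$(( (n_genomes * threshold_pct + 99) / 100 ))
echo "softcore families must occur in at least $min_genomes of $n_genomes genomes"

# Command that would write the softcore genes with a relaxed 0.90 threshold
# (printed here rather than executed).
echo "ppanggolin fasta -p pangenome.h5 --output MY_GENES --genes softcore --soft_core 0.90"
```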
Genes
Nucleotide sequences
With the --genes partition option, PPanGGOLiN writes the nucleotide CDS sequences for the given partition.
For example, to write all the genes of the pangenome:
ppanggolin fasta -p pangenome.h5 --output MY_GENES --genes all
Or to write only the persistent genes:
ppanggolin fasta -p pangenome.h5 --output MY_GENES --genes persistent
Protein sequences
With the --proteins partition option, PPanGGOLiN writes the protein sequences translated from the CDS for the given partition.
For example, to write the protein sequences of all the genes of the pangenome:
ppanggolin fasta -p pangenome.h5 --output MY_GENES --proteins all
Or to write only the cloud genes:
ppanggolin fasta -p pangenome.h5 --output MY_GENES --proteins cloud
To translate the gene sequences, PPanGGOLiN uses the MMseqs2 translatenucs command.
Because translation runs through MMseqs2, you can use multiple threads with --cpu.
You can also specify the translation table to use with --translate_table.
The temporary directory used to store the MMseqs2 database and other files can be set with --tmpdir. Temporary files are deleted at the end of the execution; to keep them, use the --keep_tmp option.
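The translation options described above can be combined in a single call. The sketch below only assembles and prints such a command; the thread count, the translation table and the temporary directory path are illustrative values, not recommendations.

```shell
# Illustrative values; adjust them to your machine and organisms.
threads=4              # passed to --cpu
table=11               # passed to --translate_table (11 is the NCBI bacterial/archaeal code)
tmp=./ppanggolin_tmp   # passed to --tmpdir (hypothetical path)

cmd="ppanggolin fasta -p pangenome.h5 --output MY_GENES --proteins all --cpu $threads --translate_table $table --tmpdir $tmp --keep_tmp"

# Inspect the command first; run it with: eval "$cmd"
echo "$cmd"
```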
Gene families
Protein sequences
With the --prot_families partition option PPanGGOLiN will write the protein sequences of the representative gene for each family for the given partition.
It can be used as such for all families:
ppanggolin fasta -p pangenome.h5 --output MY_PROT --prot_families all
Or for all the shell families for example:
ppanggolin fasta -p pangenome.h5 --output MY_PROT --prot_families shell
Nucleotide sequences
With the --gene_families partition option PPanGGOLiN will write the nucleotide sequences of the representative gene for each family for the given partition.
It can be used as such for all families:
ppanggolin fasta -p pangenome.h5 --output MY_GENES_FAMILIES --gene_families all
Or for the core families for example:
ppanggolin fasta -p pangenome.h5 --output MY_GENES_FAMILIES --gene_families core
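To export several partitions in one go, the same call can be wrapped in a loop. This is a minimal sketch: the PROT_<partition> output directory names are hypothetical, chosen only to keep the results of each partition apart, and the commands are printed rather than executed.

```shell
# Build one command per partition, each with its own output directory.
cmds=""
for part in persistent shell cloud; do
  cmds="${cmds}ppanggolin fasta -p pangenome.h5 --output PROT_${part} --prot_families ${part}
"
done

# Print the commands for review; pipe the output to sh to execute them.
printf '%s' "$cmds"
```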
Modules
All of the preceding options also accept a module as the partition.
So you can write the protein sequences for the families in module_X as such:
ppanggolin fasta -p pangenome.h5 --output MY_REGIONS --prot_families module_X
Or the nucleotide sequences of all genes in module_X:
ppanggolin fasta -p pangenome.h5 --output MY_REGIONS --genes module_X
Regions
The --regions option can be used to write the nucleotide sequences of the detected RGPs. It requires the FASTA sequences used to compute the pangenome, as originally provided when you computed your pangenome.
This option accepts only two filters:
all, for all regions
complete, for only the ‘complete’ regions which are not on a contig border
It can be used as such:
ppanggolin fasta -p pangenome.h5 --output MY_REGIONS --regions all --fasta genomes.fasta.list
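Since this option depends on input files besides the pangenome itself, it can help to check that they are present before launching the command. The sketch below is one possible pre-flight check; count_missing is a hypothetical helper, not part of PPanGGOLiN, and the example uses the complete filter.

```shell
# Hypothetical helper: print how many of the given files do not exist.
count_missing() {
  missing=0
  for f in "$@"; do
    [ -f "$f" ] || missing=$((missing + 1))
  done
  echo "$missing"
}

# Only launch ppanggolin when both inputs are present.
if [ "$(count_missing pangenome.h5 genomes.fasta.list)" -eq 0 ]; then
  ppanggolin fasta -p pangenome.h5 --output MY_REGIONS --regions complete --fasta genomes.fasta.list
else
  echo "pangenome.h5 or genomes.fasta.list is missing; nothing was run" >&2
fi
```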