Reference¶
proksee_batch¶
Proksee Batch.
proksee_batch.__main__¶
Main module for the Proksee Batch tool.
This file defines the proksee-batch command-line interface and implements the high-level logic of the tool.
- proksee_batch.__main__.generate_js_data(output_dir, genome_info, run_date, input_dir)¶
Generates a JavaScript file with genome information wrapped in a variable assignment.
- Parameters:
output_dir (str)
genome_info (List[Dict[str, Any]])
run_date (str)
input_dir (str)
- Return type:
None
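As a rough illustration of what "wrapped in a variable assignment" can look like, the sketch below serializes a list of genome records to JSON and writes it as a single JavaScript assignment. The function name `write_js_data`, the variable name `genomeData`, the output filename, and the record keys are assumptions made for this example, not names confirmed by proksee-batch.

```python
import json
import os
from typing import Any, Dict, List


def write_js_data(
    output_dir: str,
    genome_info: List[Dict[str, Any]],
    var_name: str = "genomeData",  # hypothetical variable name
) -> str:
    """Write genome_info as `const <var_name> = <json>;` and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    js_path = os.path.join(output_dir, "genome_data.js")  # hypothetical filename
    payload = json.dumps(genome_info, indent=2)
    with open(js_path, "w", encoding="utf-8") as handle:
        handle.write(f"const {var_name} = {payload};\n")
    return js_path
```

Wrapping the JSON in an assignment lets a static HTML page load the data with a plain `<script>` tag, without a web server or a fetch call.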
- proksee_batch.__main__.get_description_from_metadata(metadata_path)¶
Extracts the accession and description from a metadata file.
- Parameters:
metadata_path (str) – Path to the metadata file.
- Returns:
The description extracted from the metadata file.
- Return type:
str
- proksee_batch.__main__.handle_error_exit(error_message, exit_code=1)¶
Handles errors by printing a message to sys.stderr and exiting the program.
- Parameters:
error_message (str) – The error message to be printed.
exit_code (int) – The exit code to be used for sys.exit. Defaults to 1.
- Return type:
None
- Exits:
SystemExit: Exits the program with the provided exit code.
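The behaviour described above amounts to a print to standard error followed by `sys.exit`. The following is a minimal sketch written from that description; it is not copied from the proksee-batch source.

```python
import sys


def handle_error_exit(error_message: str, exit_code: int = 1) -> None:
    """Print the message to stderr, then exit with the given code."""
    print(error_message, file=sys.stderr)
    sys.exit(exit_code)  # raises SystemExit(exit_code)
```

Because `sys.exit` raises `SystemExit`, calling code (for example a unit test) can catch the exception and inspect its `code` attribute instead of letting the interpreter terminate.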
proksee_batch.download_example_input¶
Downloads GenBank files from NCBI FTP site.
- proksee_batch.download_example_input.download_example_genbank_files(genome_dir)¶
Downloads GenBank files from NCBI FTP site.
- Parameters:
genome_dir (str) – The path to the directory where the GenBank files should be saved.
- Return type:
None
- proksee_batch.download_example_input.download_example_input(output_dir)¶
Downloads example GenBank, BLAST, and BED files for several example bacterial genomes.
- Parameters:
output_dir (str)
- Return type:
None
- proksee_batch.download_example_input.download_file(url, local_filename)¶
Downloads a file from a given URL and saves it to the local file system.
- Parameters:
url (str) – The URL of the file to be downloaded.
local_filename (str) – The local path, including filename, where the file should be saved.
- Return type:
None
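A function with this signature can be sketched with the standard library alone. The version below streams the response to disk with `urllib.request` and `shutil.copyfileobj`; the real implementation may well use a different HTTP client, so treat this as an illustration of the contract only.

```python
import shutil
import urllib.request


def download_file(url: str, local_filename: str) -> None:
    """Stream the resource at `url` into `local_filename`."""
    with urllib.request.urlopen(url) as response, open(local_filename, "wb") as out:
        # Copy in chunks so large genome files are not held in memory.
        shutil.copyfileobj(response, out)
```

Since `urlopen` also accepts `file://` URLs, the sketch can be exercised offline by pointing it at a local file.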
- proksee_batch.download_example_input.download_genbank_file(output_dir, url)¶
Downloads a GenBank file from the NCBI FTP site.
- Parameters:
output_dir (str) – The path to the directory where the GenBank file should be saved.
url (str) – The URL of the file to be downloaded.
- Return type:
None
proksee_batch.generate_report_html¶
Code for generating an HTML report file with a table containing links to Proksee projects and images for each sample. A single genome viewer is positioned to the right of the table.
The output directory will be structured as in the following example:
- output_directory/
  - cgview-js_code/
    …
  - html_report_code/
    style.css
    table-functions.js
    viewer-functions.js
    utilities.js
  - data/
    genome_name_1.js
    genome_name_2.js
    …
  report.html
- proksee_batch.generate_report_html.generate_report_html(output_dir, genome_info)¶
Generates an HTML report file with a table containing links to Proksee projects and images for each sample. A single genome viewer is positioned to the right of the table.
- Parameters:
output_dir (str)
genome_info (Dict[str, Any])
- Return type:
None
proksee_batch.get_stats_from_seq_file¶
- proksee_batch.get_stats_from_seq_file.get_stats_from_seq_file(seq_file, format)¶
Get basic stats from a GenBank or FASTA file.
- Parameters:
seq_file (str)
format (str)
- Return type:
Tuple[str, str, int, int, float]
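The reference does not say which statistics make up the returned tuple. As an illustration of the kind of "basic stats" that can be read from a sequence file, the sketch below computes the contig count, total length, and GC fraction from FASTA text. The function name `fasta_stats` and the choice of statistics are assumptions, and the return shape deliberately differs from the documented five-element tuple.

```python
from typing import Iterable, Tuple


def fasta_stats(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (contig count, total length, GC fraction) for FASTA text lines."""
    contigs = 0
    total = 0
    gc = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            contigs += 1  # each header line starts a new contig
            continue
        seq = line.upper()
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_fraction = gc / total if total else 0.0
    return contigs, total, gc_fraction
```

For example, `fasta_stats([">a", "ACGT", ">b", "GGCC"])` returns `(2, 8, 0.75)`.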
proksee_batch.merge_cgview_json_with_template¶
- proksee_batch.merge_cgview_json_with_template.merge_cgview_json_with_template(basic_json_file, template_file, output_file)¶
Merge a basic cgview map in JSON format with a Proksee configuration file in JSON format.
- Parameters:
basic_json_file (str)
template_file (str)
output_file (str)
- Return type:
None
proksee_batch.parse_additional_features¶
- class proksee_batch.parse_additional_features.BedFeatureDict¶
- class proksee_batch.parse_additional_features.BedMetaDict¶
- class proksee_batch.parse_additional_features.BlastFeatureDict¶
- class proksee_batch.parse_additional_features.BlastMetaDict¶
- class proksee_batch.parse_additional_features.FeatureDecorationDict¶
- class proksee_batch.parse_additional_features.GffFeatureDict¶
- class proksee_batch.parse_additional_features.GffMetaDict¶
- class proksee_batch.parse_additional_features.TrackDict¶
- class proksee_batch.parse_additional_features.VcfFeatureDict¶
- class proksee_batch.parse_additional_features.VcfMetaDict¶
- proksee_batch.parse_additional_features.add_bed_features_and_tracks(bed_files, json_file, output_file)¶
Parses BED files, adds the parsed BED features and tracks to the cgview map JSON data structure, and writes the cgview map JSON data structure to a new file.
- Parameters:
bed_files (List[str]) – A list of paths to BED files.
json_file (str) – The path to a cgview map JSON file.
output_file (str) – The path to the output file.
- Return type:
None
- proksee_batch.parse_additional_features.add_blast_features_and_tracks(blast_files, json_file, output_file)¶
Parses BLAST result files, adds the parsed BLAST features and tracks to the cgview map JSON data structure, and writes the cgview map JSON data structure to a new file.
- Parameters:
blast_files (List[str]) – A list of paths to BLAST result files.
json_file (str) – The path to a cgview map JSON file.
output_file (str) – The path to the output file.
- Return type:
None
- proksee_batch.parse_additional_features.add_gff_features_and_tracks(gff_files, json_file, output_file)¶
Parses GFF files, adds the parsed GFF features and tracks to the cgview map JSON data structure, and writes the cgview map JSON data structure to a new file.
- Parameters:
gff_files (List[str]) – A list of paths to GFF files.
json_file (str) – The path to a cgview map JSON file.
output_file (str) – The path to the output file.
- Return type:
None
- proksee_batch.parse_additional_features.add_vcf_features_and_tracks(vcf_files, json_file, output_file)¶
Parses VCF files, adds the parsed VCF features and tracks to the cgview map JSON data structure, and writes the cgview map JSON data structure to a new file.
- Parameters:
vcf_files (List[str]) – A list of paths to VCF files.
json_file (str) – The path to a cgview map JSON file.
output_file (str) – The path to the output file.
- Return type:
None
- proksee_batch.parse_additional_features.get_feature_locations_and_scores_from_bed_features(bed_features)¶
Gets feature locations and scores from BED features.
- Parameters:
bed_features (List[BedFeatureDict]) – A list of parsed BED features.
- Returns:
A list of tuples containing feature locations and scores.
- Return type:
List[Tuple[int, int, float]]
- proksee_batch.parse_additional_features.get_feature_locations_and_scores_from_blast_features(blast_features)¶
Gets feature locations and scores from BLAST features.
- Parameters:
blast_features (List[BlastFeatureDict]) – A list of parsed BLAST features.
- Returns:
A list of tuples containing feature locations and scores.
- Return type:
List[Tuple[int, int, float]]
- proksee_batch.parse_additional_features.get_feature_locations_and_scores_from_gff_features(gff_features)¶
Gets feature locations and scores from GFF features.
- Parameters:
gff_features (List[GffFeatureDict]) – A list of parsed GFF features.
- Returns:
A list of tuples containing feature locations and scores.
- Return type:
List[Tuple[int, int, float]]
- proksee_batch.parse_additional_features.get_feature_locations_and_scores_from_vcf_features(vcf_features)¶
Gets feature locations and scores from VCF features.
- Parameters:
vcf_features (List[VcfFeatureDict]) – A list of parsed VCF features.
- Returns:
A list of tuples containing feature locations and scores.
- Return type:
List[Tuple[int, int, float]]
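The four `get_feature_locations_and_scores_from_*` functions share one return shape, a list of `(int, int, float)` tuples. A minimal sketch of that extraction is shown below. The keys `start`, `stop`, and `score` are assumptions about the feature dictionaries, whose fields are not documented here, and the helper name is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def locations_and_scores(features: List[Dict[str, Any]]) -> List[Tuple[int, int, float]]:
    """Reduce feature dicts to (start, stop, score) tuples."""
    return [
        # A missing score is treated as 0.0 in this sketch.
        (int(feature["start"]), int(feature["stop"]), float(feature.get("score", 0.0)))
        for feature in features
    ]
```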
- proksee_batch.parse_additional_features.parse_bed_files(bed_files)¶
Parses BED files.
- Parameters:
bed_files (List[str]) – A list of paths to BED files.
- Returns:
A tuple containing a list of parsed BED features and a list of parsed BED tracks.
- Return type:
Tuple[List[BedFeatureDict], List[TrackDict]]
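To make the BED parsing step concrete, here is a small sketch that turns BED records into feature dictionaries. BED start coordinates are 0-based and end coordinates are exclusive, so the sketch converts them to 1-based inclusive positions. The dictionary keys and the helper name `parse_bed_lines` are assumptions; the actual fields of `BedFeatureDict` are not listed in this reference.

```python
from typing import Any, Dict, List


def parse_bed_lines(lines: List[str], source: str) -> List[Dict[str, Any]]:
    """Parse BED records into feature dicts with 1-based inclusive coordinates."""
    features = []
    for line in lines:
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        score = float(fields[4]) if len(fields) > 4 and fields[4] != "." else 0.0
        strand = -1 if len(fields) > 5 and fields[5] == "-" else 1
        features.append(
            {
                "contig": chrom,
                "start": start + 1,  # BED starts are 0-based
                "stop": end,  # BED ends are exclusive, so the value is unchanged
                "name": name,
                "score": score,
                "strand": strand,
                "source": source,  # hypothetical: records which file the feature came from
            }
        )
    return features
```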
- proksee_batch.parse_additional_features.parse_blast_files(blast_files)¶
Parses BLAST result files.
- Parameters:
blast_files (List[str]) – A list of paths to BLAST result files.
- Returns:
A tuple containing a list of parsed BLAST features and a list of parsed BLAST tracks.
- Return type:
Tuple[List[BlastFeatureDict], List[TrackDict]]
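The reference does not state which BLAST output format is expected. Assuming the standard 12-column tabular output (`-outfmt 6`), a parser can be sketched as follows. The helper name `parse_blast_tabular` is hypothetical, and the sketch does not decide whether the genome is the query or the subject.

```python
from typing import Any, Dict, List

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def parse_blast_tabular(lines: List[str]) -> List[Dict[str, Any]]:
    """Parse tab-separated BLAST hits into dicts keyed by column name."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit: Dict[str, Any] = dict(zip(BLAST_COLUMNS, line.rstrip("\n").split("\t")))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits
```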
- proksee_batch.parse_additional_features.parse_gff_files(gff_files)¶
Parses GFF files.
- Parameters:
gff_files (List[str]) – A list of paths to GFF files.
- Returns:
A tuple containing a list of parsed GFF features and a list of parsed GFF tracks.
- Return type:
Tuple[List[GffFeatureDict], List[TrackDict]]
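For GFF input, each feature line has nine tab-separated columns with 1-based inclusive coordinates. The sketch below reads those columns into a dictionary. The key names and the helper `parse_gff_lines` are assumptions rather than the fields of `GffFeatureDict`.

```python
from typing import Any, Dict, List


def parse_gff_lines(lines: List[str]) -> List[Dict[str, Any]]:
    """Parse GFF3 feature lines into dicts; stops at an embedded ##FASTA section."""
    features = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence data, not features
        if not line.strip() or line.startswith("#"):
            continue
        seqid, source, ftype, start, end, score, strand, _phase, attributes = (
            line.rstrip("\n").split("\t")
        )
        # Column 9 holds semicolon-separated key=value pairs.
        attrs = dict(item.split("=", 1) for item in attributes.split(";") if "=" in item)
        features.append(
            {
                "contig": seqid,
                "source": source,
                "type": ftype,
                "start": int(start),
                "stop": int(end),
                "score": float(score) if score != "." else 0.0,
                "strand": -1 if strand == "-" else 1,
                "name": attrs.get("Name", attrs.get("ID", ".")),
            }
        )
    return features
```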
- proksee_batch.parse_additional_features.parse_vcf_files(vcf_files)¶
Parses VCF files.
- Parameters:
vcf_files (List[str]) – A list of paths to VCF files.
- Returns:
A tuple containing a list of parsed VCF features and a list of parsed VCF tracks.
- Return type:
Tuple[List[VcfFeatureDict], List[TrackDict]]
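A VCF record places a variant at a 1-based `POS` on `CHROM`, and the `REF` allele determines how many reference bases the record spans. The sketch below turns the fixed VCF columns into a feature dictionary on that basis. The key names and the helper `parse_vcf_lines` are assumptions, not the fields of `VcfFeatureDict`.

```python
from typing import Any, Dict, List


def parse_vcf_lines(lines: List[str]) -> List[Dict[str, Any]]:
    """Parse VCF data lines into feature dicts spanning the REF allele."""
    variants = []
    for line in lines:
        # Skip blank lines, ## meta lines, and the #CHROM header line.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, variant_id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        start = int(pos)
        variants.append(
            {
                "contig": chrom,
                "start": start,
                "stop": start + len(ref) - 1,  # last reference base covered by REF
                "name": variant_id,
                "ref": ref,
                "alt": alt.split(","),  # ALT may list several alleles
                "score": float(qual) if qual != "." else 0.0,
            }
        )
    return variants
```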
proksee_batch.seq_file_to_cgview_json¶
- proksee_batch.seq_file_to_cgview_json.fasta_to_cgview_json(genome_name, fasta_file, json_file)¶
Convert a FASTA file to a CGView JSON file. The JSON file will be in the same format as generated by the genbank_to_cgview_json function. There will be no features in the JSON file, only sequences/contigs.
- Parameters:
genome_name (str)
fasta_file (str)
json_file (str)
- Return type:
None
- proksee_batch.seq_file_to_cgview_json.genbank_to_cgview_json(genome_name, genbank_file, json_file)¶
Convert a GenBank file to a CGView JSON file.
- Parameters:
genome_name (str)
genbank_file (str)
json_file (str)
- Return type:
None
- proksee_batch.seq_file_to_cgview_json.remove_problematic_characters_from_contig_name(contig_name)¶
Remove problematic characters from a contig name. This is necessary because some downstream software may not be able to handle contig names with certain characters.
- Parameters:
contig_name (str)
- Return type:
str
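The reference does not list which characters count as problematic. One common approach, shown here purely as an assumption, is to keep letters, digits, underscores, periods, and hyphens and replace everything else with an underscore. The helper name `sanitize_contig_name` is hypothetical.

```python
import re


def sanitize_contig_name(contig_name: str) -> str:
    """Replace any character outside [A-Za-z0-9_.-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_.\-]", "_", contig_name)
```

With this rule, `sanitize_contig_name("contig 1|NZ_ABC.1")` yields `"contig_1_NZ_ABC.1"`.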
- proksee_batch.seq_file_to_cgview_json.seq_to_json_contig(seq_id, seq)¶
Convert a sequence to a dictionary with the sequence ID, sequence length, and sequence. The dictionary will be in the format expected for a sequence in a CGView JSON file.
- Parameters:
seq_id (str)
seq (str)
- Return type:
Dict[str, Any]
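The contig dictionary described above, and the way a features-free map might be assembled from FASTA records, can be sketched as follows. The key names (`name`, `length`, `seq`) and the outer `cgview` / `sequence` / `contigs` layout follow the CGView JSON format as understood here; treat them, and both helper names, as assumptions rather than the exact output of proksee-batch.

```python
from typing import Any, Dict, List, Tuple


def seq_to_contig(seq_id: str, seq: str) -> Dict[str, Any]:
    """Build one contig entry holding the ID, length, and sequence."""
    return {"name": seq_id, "length": len(seq), "seq": seq}


def build_sequence_only_map(genome_name: str, records: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Assemble a map with contigs only and an empty feature list."""
    return {
        "cgview": {
            "name": genome_name,
            "sequence": {"contigs": [seq_to_contig(seq_id, seq) for seq_id, seq in records]},
            "features": [],  # a FASTA file carries no annotations
        }
    }
```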
proksee_batch.validate_input_data¶
The input directory must be structured as in the following example:
- input_directory/
  - genome_name_1/
    - genbank/
      genome1.gbk
    - fasta/
      genome1.fna
    - blast/
      abc.txt
      def.tsv
    - bed/
      ghi.bed
      jkl.bed
    - json/
      template1.json
    - vcf/
      mno.vcf
      pqr.vcf
    - gff/
      stu.gff
      vwx.gff3
  - genome_name_2/
    - genbank/
      genome2.gbff
    - fasta/
      genome2.fa
    - blast/
      yza.txt
      bcd.tsv
    - bed/
      efg.bed
      hij.bed
    - json/
      template2.json
    - vcf/
      klm.vcf
      nop.vcf
    - gff/
      qrs.gff
      tuv.gff3
  …
The genbank directory must contain a single GenBank file with the extension .gbk, .gbff, or .gb. This is the genome that will be visualized. If the genbank directory is not present, then proksee-batch will use a file from the fasta directory instead (otherwise the fasta directory is ignored). The blast, bed, vcf, and gff directories are optional. They contain files with additional genomic features. The json directory is also optional. It contains a custom Proksee project JSON file that will be used as a template for the visualization.
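The GenBank-first, FASTA-fallback rule described above can be sketched as a small helper. The function name `choose_genome_file`, the use of `ValueError` instead of exiting the program, and the choice of the first FASTA file in sorted order are assumptions made for this example.

```python
import os
from typing import Tuple

GENBANK_EXTENSIONS = (".gbk", ".gbff", ".gb")


def choose_genome_file(genome_dir: str) -> Tuple[str, str]:
    """Return (path, format) for the genome file to visualize in one genome directory."""
    genbank_dir = os.path.join(genome_dir, "genbank")
    if os.path.isdir(genbank_dir):
        candidates = sorted(
            name for name in os.listdir(genbank_dir) if name.lower().endswith(GENBANK_EXTENSIONS)
        )
        if len(candidates) != 1:
            raise ValueError(f"expected exactly one GenBank file in {genbank_dir}")
        # The fasta directory is ignored whenever a genbank directory is present.
        return os.path.join(genbank_dir, candidates[0]), "genbank"

    fasta_dir = os.path.join(genome_dir, "fasta")
    if os.path.isdir(fasta_dir):
        candidates = sorted(os.listdir(fasta_dir))
        if candidates:
            return os.path.join(fasta_dir, candidates[0]), "fasta"
    raise ValueError(f"no genbank or fasta input found in {genome_dir}")
```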
- proksee_batch.validate_input_data.check_vcf_ref_vs_alt_genotypes(vcf_file_path, genome_file_path, genome_file_type)¶
Checks if the genotypes in the genome in the GenBank file match the REF genotypes in the VCF file.
- Parameters:
vcf_file_path (str) – The path to the VCF file.
genome_file_path (str) – The path to the genome file (the GenBank file in the description above).
genome_file_type (str)
- Returns:
True if the genotypes in the genome in the GenBank file match the REF genotypes in the VCF file, False otherwise.
- Return type:
bool
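The comparison this check performs can be illustrated without any file parsing: for each VCF record, take the genome bases starting at `POS` (1-based) over the length of `REF` and compare them with `REF`. In the sketch below the genome is passed as a dictionary of contig ID to sequence, which is an assumption made to keep the example self-contained; the helper name is hypothetical.

```python
from typing import Dict, Iterable


def ref_alleles_match(vcf_lines: Iterable[str], contigs: Dict[str, str]) -> bool:
    """Return True if every REF allele equals the genome sequence at its position."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _variant_id, ref = line.rstrip("\n").split("\t")[:4]
        seq = contigs.get(chrom)
        offset = int(pos) - 1  # VCF positions are 1-based
        if seq is None or seq[offset:offset + len(ref)].upper() != ref.upper():
            return False
    return True
```

A mismatch usually means the VCF file was called against a different reference (or a different version of it) than the genome being visualized.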
- proksee_batch.validate_input_data.check_vcf_seq_ids(vcf_file_path, seq_file_path, seq_file_format)¶
Checks if all the sequence IDs in the first column of the VCF file are contigs in a GenBank or FASTA file.
- Parameters:
vcf_file_path (str) – The path to the VCF file.
seq_file_path (str) – The path to the GenBank or FASTA file.
seq_file_format (str) – The format of the GenBank or FASTA file. Valid values are “genbank” and “fasta”.
- Returns:
True if all sequence IDs in the VCF file are contigs in the sequence file, False otherwise.
- Return type:
bool
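The sequence ID check reduces to a subset test between the first VCF column and the contig IDs of the genome. A minimal sketch, with the contig IDs supplied directly instead of being read from a GenBank or FASTA file (an assumption for brevity, as is the helper name):

```python
from typing import Iterable


def vcf_seq_ids_known(vcf_lines: Iterable[str], contig_ids: Iterable[str]) -> bool:
    """Return True if every CHROM value in the VCF lines is a known contig ID."""
    vcf_ids = {
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }
    return vcf_ids <= set(contig_ids)
```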
- proksee_batch.validate_input_data.get_data_files(input_subdir, data_type)¶
Returns the paths to the data files of the specified type in the provided subdirectory.
- Parameters:
input_subdir (str) – The path to the subdirectory containing the data files.
data_type (str) – The type of the data files to be returned. Valid values are “genbank”, “blast”, “bed”, “json”, “vcf”, and “gff”.
- Returns:
The paths to the data files.
- Return type:
list
- proksee_batch.validate_input_data.handle_error_exit(error_message, exit_code=1)¶
Handles errors by printing a message to sys.stderr and exiting the program.
- Parameters:
error_message (str) – The error message to be printed.
exit_code (int) – The exit code to be used for sys.exit. Defaults to 1.
- Return type:
None
- Exits:
SystemExit: Exits the program with the provided exit code.
- proksee_batch.validate_input_data.validate_input_directory_contents(input)¶
Validates if the provided input directory contains the required subdirectories and files.
- Parameters:
input (str) – The path to the input directory to be checked.
- Raises:
SystemExit – If the input directory does not contain the required subdirectories.
- Return type:
None