Template#

Template for module#

All input formats available for the module

class proteobench.modules.template.parse_settings.ParseSettings(input_format: str)[source]#

Bases: object

Structure that contains all the parameters used to parse the given database search output.

class proteobench.modules.template.parse.ParseInputs[source]#

Bases: ParseInputsInterface

convert_to_standard_format(parse_settings: ParseSettings) → DataFrame[source]#: Convert a search engine output into a generic format supported by the module.

class proteobench.modules.template.datapoint.Datapoint(id: str | None = None, is_temporary: bool = True, search_engine: str | None = None, software_version: int = 0, fdr_psm: int = 0, fdr_peptide: int = 0, fdr_protein: int = 0, MBR: bool = False, precursor_tol: int = 0, precursor_tol_unit: str = 'Da', fragment_tol: int = 0, fragment_tol_unit: str = 'Da', enzyme_name: str | None = None, missed_cleavages: int = 0, min_pep_length: int = 0, max_pep_length: int = 0)[source]#

Bases: object

Data used to store the experimental metadata and data analysis settings.

Example for attributes:: id: A unique identifier for the datapoint. is_temporary: A boolean flag indicating whether the datapoint is temporary or not. search_engine: The name of the search engine used for the experiment. software_version: The version number of the software used for the experiment. fdr_psm: The false discovery rate at the peptide-spectrum match level. fdr_peptide: The false discovery rate at the peptide level. fdr_protein: The false discovery rate at the protein level. MBR: A boolean flag indicating whether match-between-runs was enabled or not. precursor_tol: The precursor mass tolerance in units specified by precursor_tol_unit. precursor_tol_unit: The unit of the precursor mass tolerance. Either “Da” or “ppm”. fragment_tol: The fragment mass tolerance in units specified by fragment_tol_unit. fragment_tol_unit: The unit of the fragment mass tolerance. Either “Da” or “ppm”. enzyme_name: The name of the enzyme used for digestion. missed_cleavages: The number of allowed missed cleavages during digestion. min_pep_length: The minimum peptide length for identification. max_pep_length: The maximum peptide length for identification. weighted_sum: The weighted sum score used for protein inference. nr_prec: The number of precursors used for protein inference.

MBR: bool = False#

calculate_benchmarking_metric_1(intermediate_data)[source]#

Calculates the first benchmarking metric based on the intermediate data.

Parameters:: intermediate_data (dict) – A dictionary containing the intermediate data.
Returns:: The value of the first benchmarking metric.
Return type:: metric_1 (float)

calculate_benchmarking_metric_2(intermediate_data)[source]#

Calculates the second benchmarking metric based on the intermediate data.

Parameters:: intermediate_data (dict) – A dictionary containing the intermediate data.
Returns:: The value of the second benchmarking metric.
Return type:: metric_2 (float)

dump_json_object(file_name)[source]#

Dumps the datapoint as a JSON object to a file.

Parameters:: file_name (str) – The name of the file to write to.

Writes a JSON representation of the datapoint to a file with the given name. Appends the JSON object to the end of the file if it already exists.

enzyme_name: str = None#

fdr_peptide: int = 0#

fdr_protein: int = 0#

fdr_psm: int = 0#

fragment_tol: int = 0#

fragment_tol_unit: str = 'Da'#

generate_id()[source]#

Generates a unique id for the datapoint based on the search engine and software version.

Sets the id attribute to a string composed of the search engine name, software version number, and current timestamp separated by underscores. Prints the id to stdout.

id: str = None#

is_temporary: bool = True#

max_pep_length: int = 0#

min_pep_length: int = 0#

missed_cleavages: int = 0#

precursor_tol: int = 0#

precursor_tol_unit: str = 'Da'#

search_engine: str = None#

software_version: int = 0#