Template

Template for module

All input formats available for the module

class proteobench.modules.template.parse_settings.ParseSettings(input_format: str)

Bases: object

Structure that contains all the parameters used to parse the given database search output.

class proteobench.modules.template.parse.ParseInputs

Bases: object

convert_to_standard_format(parse_settings: ParseSettings) → DataFrame

Convert a search engine output into a generic format supported by the module.

class proteobench.modules.template.module.Module

Bases: object

Description of the Module.

add_current_data_point(current_datapoint: Series, all_datapoints: DataFrame | None = None) → DataFrame

Add the current data point to all data points, loading the previous data points from file if all_datapoints is empty.

Parameters:
  • all_datapoints – The data points from previous runs.

  • current_datapoint – The current data point to be added.

Returns:

The data points with the current data point added.

Return type:

DataFrame
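The append-or-load behaviour described above can be sketched without pandas. This is a minimal illustration under stated assumptions, not ProteoBench's implementation: a list of dicts stands in for the DataFrame, and the function name `add_data_point`, the `store_path` argument, and the JSON file layout are all hypothetical.

```python
import json
import os
import tempfile


def add_data_point(current, all_points=None, store_path=None):
    """Return the previous points plus `current` (hypothetical stand-in)."""
    if all_points is None:
        # No previous points supplied: load them from file if one exists.
        if store_path is not None and os.path.exists(store_path):
            with open(store_path, encoding="utf-8") as handle:
                all_points = json.load(handle)
        else:
            all_points = []
    # Build a new list so the caller's collection is not mutated.
    return [*all_points, current]


# Usage: write two earlier points to a temporary file, then add a third.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "points.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([{"id": "run_1"}, {"id": "run_2"}], handle)
    combined = add_data_point({"id": "run_3"}, store_path=path)

print([point["id"] for point in combined])
```

Returning a new collection rather than modifying the input mirrors the documented contract, in which the method returns the data points with the current one added.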

benchmarking(input_file: str, input_format: str, user_input: dict, all_datapoints)

Main workflow of the module. Used to benchmark workflow results.

Parameters:
  • input_file – Path to the workflow output file.

  • input_format – Format of the workflow output file.

  • user_input – User provided parameters for plotting.

  • all_datapoints – DataFrame containing all datapoints from the proteobench repo.

  • default_cutoff_min_prec – Minimum number of runs an ion has to be identified in (documented here, but absent from the signature above).

Returns:

Tuple containing the intermediate data structure, and all datapoints.

Return type:

tuple[DataFrame, DataFrame]
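As a rough illustration of how the methods documented in this class might fit together, the sketch below chains stand-in versions of them. The call order, the stub bodies, the class name `SketchModule`, and the use of plain lists and dicts instead of pandas objects are all assumptions made for illustration; the arguments follow the parameter lists in this reference rather than the rendered signatures.

```python
class SketchModule:
    """Hypothetical stand-in; the documented Module works on pandas objects."""

    def load_input_file(self, input_file, input_format):
        # Stub: pretend the file held two quantified precursors.
        return [
            {"precursor": "PEPTIDEK/2", "intensity": 10.0},
            {"precursor": "SAMPLER/3", "intensity": 30.0},
        ]

    def generate_intermediate(self, standard_format, parse_settings):
        # Stub: the "intermediate" is just the mean intensity here.
        values = [row["intensity"] for row in standard_format]
        return {"mean_intensity": sum(values) / len(values)}

    def generate_datapoint(self, intermediate, input_format, user_input):
        # Stub: merge user-supplied metadata with the computed values.
        return {**user_input, **intermediate, "input_format": input_format}

    def add_current_data_point(self, current_datapoint, all_datapoints=None):
        return [*(all_datapoints or []), current_datapoint]

    def benchmarking(self, input_file, input_format, user_input, all_datapoints):
        # Assumed order: load, derive intermediate, build datapoint, append.
        rows = self.load_input_file(input_file, input_format)
        intermediate = self.generate_intermediate(rows, parse_settings=None)
        datapoint = self.generate_datapoint(intermediate, input_format, user_input)
        all_datapoints = self.add_current_data_point(datapoint, all_datapoints)
        return intermediate, all_datapoints


intermediate, points = SketchModule().benchmarking(
    "results.tsv", "example_format", {"search_engine": "example"}, None
)
print(intermediate)
print(len(points))
```

The return value matches the documented shape: a tuple of the intermediate structure and the updated collection of data points.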

generate_datapoint(input_format: str, user_input: dict) → Datapoint

Method used to compute benchmarks for the provided intermediate structure.

Parameters:
  • intermediate – The intermediate data structure (documented here, but absent from the signature above).

  • input_format – The format of the input file.

  • user_input – The user input settings.

Returns:

The computed benchmark values.

Return type:

Datapoint

generate_intermediate(parse_settings: ParseSettings) → DataFrame

Calculate intermediate values from the uploaded file.

Parameters:
  • standard_format – The uploaded file in a standard format (documented here, but absent from the signature above).

  • parse_settings – The settings used to parse the uploaded file.

Returns:

The intermediate values calculated from the uploaded file.

Return type:

DataFrame

is_implemented() → bool

Returns whether the module is fully implemented.

load_input_file(input_format: str) → DataFrame

Load a DataFrame from an input file, depending on its format.

Parameters:
  • input_csv – The path to the input file (documented here, but absent from the signature above).

  • input_format – The format of the input file.

Returns:

The dataframe loaded from the input file.

Return type:

DataFrame
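Format-dependent loading of this kind is often implemented as a lookup from the format name to reader options. The sketch below shows that pattern with the standard-library `csv` module instead of pandas; the format names `example_tsv` and `example_csv`, the `DELIMITERS` table, and the helper `load_rows` are hypothetical and are not taken from ProteoBench.

```python
import csv
import io

# Hypothetical format names mapped to the delimiter each one uses.
DELIMITERS = {"example_tsv": "\t", "example_csv": ","}


def load_rows(handle, input_format):
    """Parse an open text file into a list of row dicts."""
    try:
        delimiter = DELIMITERS[input_format]
    except KeyError:
        # Fail loudly on formats the table does not know about.
        raise ValueError(f"Unsupported input format: {input_format!r}") from None
    return list(csv.DictReader(handle, delimiter=delimiter))


rows = load_rows(io.StringIO("Sequence\tIntensity\nPEPTIDEK\t10.5\n"), "example_tsv")
print(rows)
```

Note that `csv.DictReader` yields every value as a string, so numeric columns would still need converting before any calculation.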

class proteobench.modules.template.datapoint.Datapoint(id: str | None = None, is_temporary: bool = True, search_engine: str | None = None, software_version: int = 0, fdr_psm: int = 0, fdr_peptide: int = 0, fdr_protein: int = 0, MBR: bool = False, precursor_tol: int = 0, precursor_tol_unit: str = 'Da', fragment_tol: int = 0, fragment_tol_unit: str = 'Da', enzyme_name: str | None = None, missed_cleavages: int = 0, min_pep_length: int = 0, max_pep_length: int = 0)

Bases: object

Data used to store the experimental metadata and data analysis settings.

Example for attributes:

  • id – A unique identifier for the datapoint.
  • is_temporary – A boolean flag indicating whether the datapoint is temporary or not.
  • search_engine – The name of the search engine used for the experiment.
  • software_version – The version number of the software used for the experiment.
  • fdr_psm – The false discovery rate at the peptide-spectrum match level.
  • fdr_peptide – The false discovery rate at the peptide level.
  • fdr_protein – The false discovery rate at the protein level.
  • MBR – A boolean flag indicating whether match-between-runs was enabled or not.
  • precursor_tol – The precursor mass tolerance in units specified by precursor_tol_unit.
  • precursor_tol_unit – The unit of the precursor mass tolerance. Either “Da” or “ppm”.
  • fragment_tol – The fragment mass tolerance in units specified by fragment_tol_unit.
  • fragment_tol_unit – The unit of the fragment mass tolerance. Either “Da” or “ppm”.
  • enzyme_name – The name of the enzyme used for digestion.
  • missed_cleavages – The number of allowed missed cleavages during digestion.
  • min_pep_length – The minimum peptide length for identification.
  • max_pep_length – The maximum peptide length for identification.
  • weighted_sum – The weighted sum score used for protein inference.
  • nr_prec – The number of precursors used for protein inference.
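The constructor signature and the class-level defaults listed in this section are consistent with a Python dataclass, although this reference does not say so explicitly. Assuming that, the reduced sketch below shows how such a record could be declared and filled in; `DatapointSketch` is a hypothetical name and carries only a subset of the documented fields.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DatapointSketch:
    """Reduced, hypothetical copy of Datapoint with a few of its fields."""

    id: Optional[str] = None
    is_temporary: bool = True
    search_engine: Optional[str] = None
    software_version: int = 0
    MBR: bool = False
    precursor_tol: int = 0
    precursor_tol_unit: str = "Da"


# Only the settings that differ from the defaults need to be passed.
point = DatapointSketch(
    search_engine="ExampleEngine", precursor_tol=20, precursor_tol_unit="ppm"
)
print(asdict(point)["precursor_tol_unit"])
print(point.is_temporary)
```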

MBR: bool = False

calculate_benchmarking_metric_1(intermediate_data: dict) → float

Calculates the first benchmarking metric based on the intermediate data.

Parameters:

intermediate_data – A dictionary containing the intermediate data.

Returns:

The value of the first benchmarking metric.

Return type:

float

calculate_benchmarking_metric_2(intermediate_data: dict) → float

Calculates the second benchmarking metric based on the intermediate data.

Parameters:

intermediate_data – A dictionary containing the intermediate data.

Returns:

The value of the second benchmarking metric.

Return type:

float

dump_json_object(file_name: str) → None

Dumps the datapoint as a JSON object to a file.

Writes a JSON representation of the datapoint to a file with the given name. Appends the JSON object to the end of the file if it already exists.

Parameters:

file_name – The name of the file to write to.
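The append behaviour described above can be sketched with the standard library. This is an illustration under assumptions, not the method's actual body: the function takes a plain dict instead of a datapoint, and separating records with a newline is a choice made here so that the file can be read back line by line.

```python
import json
import os
import tempfile


def dump_json_object(record, file_name):
    """Append `record` to `file_name` as one JSON document per line."""
    # Mode "a" creates the file if it is missing and appends otherwise.
    with open(file_name, "a", encoding="utf-8") as handle:
        json.dump(record, handle)
        handle.write("\n")  # newline separator is an assumption


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "datapoints.json")
    dump_json_object({"id": "a"}, path)
    dump_json_object({"id": "b"}, path)
    with open(path, encoding="utf-8") as handle:
        loaded = [json.loads(line) for line in handle]

print(loaded)
```

A file built by appending objects is not a single valid JSON document, so it has to be read one record at a time rather than with a single `json.load` call.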

enzyme_name: str = None
fdr_peptide: int = 0
fdr_protein: int = 0
fdr_psm: int = 0
fragment_tol: int = 0
fragment_tol_unit: str = 'Da'

generate_id()

Generates a unique id for the datapoint based on the search engine and software version.

Sets the id attribute to a string composed of the search engine name, software version number, and current timestamp separated by underscores. Prints the id to stdout.
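The id composition described above can be sketched as follows. The helper name `make_id`, the injectable `now` argument, and the timestamp layout are assumptions for illustration; this reference only states that the three parts are joined with underscores.

```python
from datetime import datetime


def make_id(search_engine, software_version, now=None):
    """Join engine name, version and a timestamp with underscores."""
    now = now or datetime.now()
    # The timestamp layout below is an assumption for illustration.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return "_".join([search_engine, str(software_version), stamp])


example_id = make_id("ExampleEngine", 2, now=datetime(2024, 1, 31, 12, 0, 5))
print(example_id)  # ExampleEngine_2_20240131_120005
```

Passing the timestamp in as an argument keeps the helper deterministic in tests, whereas the documented method reads the current time itself.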

id: str = None
is_temporary: bool = True
max_pep_length: int = 0
min_pep_length: int = 0
missed_cleavages: int = 0
precursor_tol: int = 0
precursor_tol_unit: str = 'Da'
search_engine: str = None
software_version: int = 0

proteobench.modules.template.plot.plot_bench1(result_df)

Plot results with Plotly Express.

proteobench.modules.template.plot.plot_bench2(result_df)

Plot results with Plotly Express.