Template
Template for module
All input formats available for the module
- class proteobench.modules.template.parse_settings.ParseSettings(input_format: str)
Bases: object
Structure that contains all the parameters used to parse the given database search output.
- class proteobench.modules.template.parse.ParseInputs
Bases: object
- convert_to_standard_format(parse_settings: ParseSettings) -> DataFrame
Convert a search engine output into a generic format supported by the module.
- class proteobench.modules.template.module.Module
Bases: object
Description of the Module.
- add_current_data_point(current_datapoint: Series, all_datapoints: DataFrame | None = None) -> DataFrame
Add the current data point to all data points, loading the existing data points from file if none are provided.
- Parameters:
all_datapoints – The data points from previous runs.
current_datapoint – The current data point to be added.
- Returns:
The data points with the current data point added.
- Return type:
DataFrame
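The append-or-load behaviour described above can be sketched as follows. This is a dependency-free illustration only, not the module's implementation: plain dicts and lists stand in for the pandas Series and DataFrame, and the `store` parameter and its JSON file layout are hypothetical. It also assumes that "empty" means `all_datapoints` is None.

```python
import json
from pathlib import Path
from typing import Optional


def add_current_data_point(
    current_datapoint: dict,
    all_datapoints: Optional[list] = None,
    store: Path = Path("datapoints.json"),  # hypothetical storage location
) -> list:
    """Return the previous data points with the current one appended."""
    if all_datapoints is None:
        # Nothing was passed in: fall back to the stored data points,
        # or start from an empty collection if the file does not exist.
        if store.exists():
            all_datapoints = json.loads(store.read_text(encoding="utf-8"))
        else:
            all_datapoints = []
    # Build a new list so the caller's collection is left unchanged.
    return all_datapoints + [current_datapoint]


previous = [{"id": "run-1"}]
updated = add_current_data_point({"id": "run-2"}, previous)
```

Returning a new collection rather than mutating the argument mirrors the documented signature, which returns the combined data points.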
- benchmarking(input_file: str, input_format: str, user_input: dict, all_datapoints)
Main workflow of the module. Used to benchmark workflow results.
- Parameters:
input_file – Path to the workflow output file.
input_format – Format of the workflow output file.
user_input – User provided parameters for plotting.
all_datapoints – DataFrame containing all datapoints from the proteobench repo.
default_cutoff_min_prec – Minimum number of runs an ion has to be identified in.
- Returns:
Tuple containing the intermediate data structure, and all datapoints.
- Return type:
tuple[DataFrame, DataFrame]
- generate_datapoint(input_format: str, user_input: dict) -> Datapoint
Method used to compute benchmarks for the provided intermediate structure.
- Parameters:
intermediate – The intermediate data structure.
input_format – The format of the input file.
user_input – The user input settings.
- Returns:
The computed benchmark values.
- Return type:
df
- generate_intermediate(parse_settings: ParseSettings) -> DataFrame
Calculate intermediate values from the uploaded file.
- Parameters:
standard_format – The uploaded file in a standard format.
parse_settings – The settings used to parse the uploaded file.
- Returns:
The intermediate values calculated from the uploaded file.
- Return type:
DataFrame
- class proteobench.modules.template.datapoint.Datapoint(id: str | None = None, is_temporary: bool = True, search_engine: str | None = None, software_version: int = 0, fdr_psm: int = 0, fdr_peptide: int = 0, fdr_protein: int = 0, MBR: bool = False, precursor_tol: int = 0, precursor_tol_unit: str = 'Da', fragment_tol: int = 0, fragment_tol_unit: str = 'Da', enzyme_name: str | None = None, missed_cleavages: int = 0, min_pep_length: int = 0, max_pep_length: int = 0)
Bases: object
Data used to store the experimental metadata and data analysis settings.
- Example for attributes:
id: A unique identifier for the datapoint.
is_temporary: A boolean flag indicating whether the datapoint is temporary or not.
search_engine: The name of the search engine used for the experiment.
software_version: The version number of the software used for the experiment.
fdr_psm: The false discovery rate at the peptide-spectrum match level.
fdr_peptide: The false discovery rate at the peptide level.
fdr_protein: The false discovery rate at the protein level.
MBR: A boolean flag indicating whether match-between-runs was enabled or not.
precursor_tol: The precursor mass tolerance in units specified by precursor_tol_unit.
precursor_tol_unit: The unit of the precursor mass tolerance. Either “Da” or “ppm”.
fragment_tol: The fragment mass tolerance in units specified by fragment_tol_unit.
fragment_tol_unit: The unit of the fragment mass tolerance. Either “Da” or “ppm”.
enzyme_name: The name of the enzyme used for digestion.
missed_cleavages: The number of allowed missed cleavages during digestion.
min_pep_length: The minimum peptide length for identification.
max_pep_length: The maximum peptide length for identification.
weighted_sum: The weighted sum score used for protein inference.
nr_prec: The number of precursors used for protein inference.
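As a rough illustration of how these settings fit together, the sketch below mirrors the documented constructor signature with a standalone dataclass. `DatapointSketch` is a hypothetical stand-in rather than the real `Datapoint` class, and the example values (identifier, engine name, tolerances) are invented for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DatapointSketch:
    """Stand-in that copies the documented defaults; not the real class."""

    id: Optional[str] = None
    is_temporary: bool = True
    search_engine: Optional[str] = None
    software_version: int = 0
    fdr_psm: int = 0
    fdr_peptide: int = 0
    fdr_protein: int = 0
    MBR: bool = False
    precursor_tol: int = 0
    precursor_tol_unit: str = "Da"
    fragment_tol: int = 0
    fragment_tol_unit: str = "Da"
    enzyme_name: Optional[str] = None
    missed_cleavages: int = 0
    min_pep_length: int = 0
    max_pep_length: int = 0


# Only the settings that differ from the defaults need to be passed.
point = DatapointSketch(
    id="example-run-001",           # hypothetical identifier
    search_engine="ExampleEngine",  # hypothetical engine name
    precursor_tol=10,
    precursor_tol_unit="ppm",
    enzyme_name="Trypsin",
    missed_cleavages=2,
)

# A plain dict view of the metadata, convenient for serialisation.
settings = asdict(point)
```

Fields left out of the call keep their documented defaults, so `fragment_tol_unit` stays at 'Da' and `is_temporary` stays True here.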
- calculate_benchmarking_metric_1(intermediate_data: dict) -> float
Calculates the first benchmarking metric based on the intermediate data.
- Parameters:
intermediate_data – A dictionary containing the intermediate data.
- Returns:
The value of the first benchmarking metric.
- Return type:
float
- calculate_benchmarking_metric_2(intermediate_data: dict) -> float
Calculates the second benchmarking metric based on the intermediate data.
- Parameters:
intermediate_data – A dictionary containing the intermediate data.
- Returns:
The value of the second benchmarking metric.
- Return type:
float
- dump_json_object(file_name: str) -> None
Dumps the datapoint as a JSON object to a file.
Writes a JSON representation of the datapoint to a file with the given name. Appends the JSON object to the end of the file if it already exists.
- Parameters:
file_name – The name of the file to write to.
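The append behaviour described for dump_json_object might look like the minimal sketch below. It assumes one JSON object per line, which the source does not specify. `DatapointSketch` and `read_json_lines` are hypothetical helpers, and only three of the documented fields are reproduced to keep the example short.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DatapointSketch:
    """Reduced stand-in for the documented Datapoint class."""

    id: Optional[str] = None
    is_temporary: bool = True
    search_engine: Optional[str] = None

    def dump_json_object(self, file_name: str) -> None:
        # Mode "a" creates the file if it is missing and appends otherwise.
        with open(file_name, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(asdict(self)) + "\n")


def read_json_lines(file_name: str) -> list:
    """Read back every non-blank line as one JSON object."""
    with open(file_name, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "datapoints.json")
    DatapointSketch(id="run-1").dump_json_object(path)
    DatapointSketch(id="run-2", is_temporary=False).dump_json_object(path)
    records = read_json_lines(path)
```

Writing one object per line keeps each append independent of the file's existing contents. The real method may use a different layout.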