proteobench.datapoint.quant_datapoint module
This module provides functionality for handling and processing quantitative datapoints in the ProteoBench framework.
- class proteobench.datapoint.quant_datapoint.QuantDatapoint(id: str = None, software_name: str = None, software_version: int = 0, search_engine: str = None, search_engine_version: int = 0, ident_fdr_psm: int = 0, ident_fdr_peptide: int = 0, ident_fdr_protein: int = 0, enable_match_between_runs: bool = False, precursor_mass_tolerance: str = None, fragment_mass_tolerance: str = None, enzyme: str = None, allowed_miscleavages: int = 0, min_peptide_length: int = 0, max_peptide_length: int = 0, is_temporary: bool = True, intermediate_hash: str = '', results: dict = None, median_abs_epsilon: float = 0, mean_abs_epsilon: float = 0, nr_prec: int = 0, comments: str = '', proteobench_version: str = '')
Bases: object
A data structure used to store the results of a benchmark run.
- static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict, default_cutoff_min_prec: int = 3) → Series
Generate a Datapoint object containing metadata and results from the benchmark run.
- Parameters:
intermediate (pd.DataFrame) – The intermediate DataFrame containing benchmark results.
input_format (str) – The format of the input data (e.g., file format).
user_input (dict) – User-defined input values for the benchmark.
default_cutoff_min_prec (int, optional) – The default minimum precursor cutoff value. Defaults to 3.
- Returns:
A Pandas Series containing the Datapoint’s attributes as key-value pairs.
- Return type:
pd.Series
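To illustrate the "attributes as key-value pairs" shape described above, the following is a minimal sketch built on a small stand-in dataclass rather than the real QuantDatapoint. The names MiniDatapoint and to_key_value_pairs are hypothetical; only the field names are borrowed from the constructor signature. The documented method returns a pd.Series, whereas this sketch uses a plain dict to stay dependency-free.

```python
from dataclasses import asdict, dataclass


@dataclass
class MiniDatapoint:
    """Hypothetical stand-in carrying a few of the QuantDatapoint fields."""

    id: str = None
    software_name: str = None
    enable_match_between_runs: bool = False
    is_temporary: bool = True
    results: dict = None
    nr_prec: int = 0


def to_key_value_pairs(datapoint: MiniDatapoint) -> dict:
    """Flatten the datapoint's attributes into a dict (hypothetical helper)."""
    return asdict(datapoint)


point = MiniDatapoint(
    software_name="ExampleTool",
    results={3: {"median_abs_epsilon": 0.2}},
    nr_prec=1200,
)
pairs = to_key_value_pairs(point)
```

Passing such a dict to pd.Series would give a Series indexed by attribute name, which matches the documented return type; whether the library builds it this way is not stated in this page.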
- generate_id() → None
Generate a unique ID for the benchmark run by combining the software name and a timestamp.
This ID is used to uniquely identify each run of the benchmark.
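The idea of combining a software name with a timestamp can be sketched as follows. The function name make_run_id, the underscore separator, and the timestamp format are all assumptions made for illustration; this page does not specify the format that generate_id actually produces.

```python
from datetime import datetime


def make_run_id(software_name: str, now: datetime = None) -> str:
    """Combine a software name and a timestamp into one identifier.

    Hypothetical helper: the separator and timestamp layout are assumptions.
    """
    # Fall back to the current time when no timestamp is supplied.
    now = now or datetime.now()
    return f"{software_name}_{now.strftime('%Y%m%d_%H%M%S')}"


# A fixed timestamp keeps the example reproducible.
run_id = make_run_id("ExampleTool", datetime(2024, 1, 15, 9, 30, 0))
```

With second-level resolution, two runs of the same software started within the same second would receive the same ID, so a real implementation may need a finer timestamp or an extra disambiguator.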
- get_metrics(min_nr_observed: int = 1) → dict[int, dict[str, float]]
Compute various statistical metrics for the benchmark. This variant of get_metrics_old is optimized to make fewer passes over the data.
- static get_metrics_old(df: DataFrame, min_nr_observed: int = 1) → Dict[int, Dict[str, float]]
Compute various statistical metrics from the provided DataFrame for the benchmark.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the benchmark results.
min_nr_observed (int, optional) – The minimum number of observed values for a valid computation. Defaults to 1.
- Returns:
A dictionary containing computed metrics such as ‘median_abs_epsilon’, ‘variance_epsilon’, etc.
- Return type:
Dict[int, Dict[str, float]]
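The nested return shape, an outer dictionary keyed by an integer and an inner dictionary of named metrics, can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the function sketch_metrics, the (nr_observed, epsilon) input pairs, the choice of population variance, and the interpretation of the integer key as a minimum-observation cutoff are all hypothetical. Only the metric names come from this page.

```python
from statistics import mean, median, pvariance
from typing import Dict, List, Tuple


def sketch_metrics(
    rows: List[Tuple[int, float]], cutoffs: List[int]
) -> Dict[int, Dict[str, float]]:
    """Summarise epsilon values for each minimum-observation cutoff.

    rows holds hypothetical (nr_observed, epsilon) pairs, one per precursor.
    """
    metrics: Dict[int, Dict[str, float]] = {}
    for cutoff in cutoffs:
        # Keep only precursors observed at least `cutoff` times.
        eps = [e for n, e in rows if n >= cutoff]
        if not eps:
            continue  # nothing to summarise at this cutoff
        abs_eps = [abs(e) for e in eps]
        metrics[cutoff] = {
            "median_abs_epsilon": median(abs_eps),
            "mean_abs_epsilon": mean(abs_eps),
            # Assumption: population variance of the signed epsilon values.
            "variance_epsilon": pvariance(eps),
            "nr_prec": len(eps),
        }
    return metrics


rows = [(1, 0.5), (3, -0.25), (4, 0.25), (6, -1.0)]
result = sketch_metrics(rows, cutoffs=[1, 3, 6])
```

Filtering once per cutoff, as done here, makes one pass over the data for every cutoff; a version that makes fewer passes would need to compute the summaries cumulatively.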
- proteobench.datapoint.quant_datapoint.filter_df_numquant_epsilon(row: Dict[str, Any], min_quant: int = 3, metric: str = 'median') → float | None
Extract the ‘median_abs_epsilon’ value from a row (assumed to be a dictionary).
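- Parameters:
row (Dict[str, Any]) – The row to read the value from.
min_quant (int, optional) – Defaults to 3.
metric (str, optional) – Defaults to ‘median’.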
- Returns:
The ‘median_abs_epsilon’ value if found, otherwise None.
- Return type:
float or None
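A lookup of this kind might be sketched as follows. The layout of the row is an assumption made for illustration: the sketch supposes that row["results"] maps a cutoff to a dictionary of metrics and that the key is formed as "<metric>_abs_epsilon". Neither detail is confirmed by this page, and pick_epsilon is a hypothetical name.

```python
from typing import Any, Dict, Optional


def pick_epsilon(
    row: Dict[str, Any], min_quant: int = 3, metric: str = "median"
) -> Optional[float]:
    """Return '<metric>_abs_epsilon' stored under the min_quant cutoff, or None."""
    # Assumption: row["results"] maps cutoff -> dict of metric values.
    results = row.get("results")
    if not isinstance(results, dict):
        return None
    # Assumption: cutoffs may be int keys or str keys (e.g. after a JSON round trip).
    per_cutoff = results.get(min_quant, results.get(str(min_quant)))
    if not isinstance(per_cutoff, dict):
        return None
    return per_cutoff.get(f"{metric}_abs_epsilon")


row = {"results": {3: {"median_abs_epsilon": 0.2, "mean_abs_epsilon": 0.3}}}
value = pick_epsilon(row)
```

Returning None instead of raising on a missing key lets a caller apply the function across many rows and drop the empty results afterwards.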