proteobench.datapoint package
Datapoint module for ProteoBench benchmarking.
- class proteobench.datapoint.DatapointBase
Bases: ABC
Abstract base class for benchmark datapoints.
This class defines the interface that all datapoint types must implement, allowing for modular and extensible datapoint handling for different benchmarking modules.
Subclasses should define their own attributes specific to their benchmarking module.
- abstractmethod static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict, **kwargs) -> Series
Generate a datapoint object containing metadata and results from the benchmark run.
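- Parameters:
intermediate (pd.DataFrame) – The intermediate DataFrame containing benchmark results.
input_format (str) – The format of the input data (e.g., file format).
user_input (dict) – User-defined input values for the benchmark.
**kwargs – Additional keyword arguments specific to the implementing subclass.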
- Returns:
A Pandas Series containing the datapoint’s attributes as key-value pairs.
- Return type:
pd.Series
- abstractmethod generate_id() -> None
Generate a unique ID for the benchmark run.
This ID should uniquely identify each run of the benchmark.
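The interface described above can be sketched as follows. This is a minimal, dependency-free illustration and not the ProteoBench implementation: `DatapointSketch` and `ToyDatapoint` are hypothetical names, a plain list and dict stand in for the pandas DataFrame and Series, and the UUID-based ID is an assumption chosen for brevity.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Optional


class DatapointSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    @staticmethod
    @abstractmethod
    def generate_datapoint(intermediate, input_format: str, user_input: dict, **kwargs):
        """Return the datapoint's attributes as key-value pairs."""

    @abstractmethod
    def generate_id(self) -> None:
        """Assign an ID that uniquely identifies this run."""


@dataclass
class ToyDatapoint(DatapointSketch):
    # Module-specific attributes, chosen purely for illustration.
    id: Optional[str] = None
    input_format: Optional[str] = None
    nr_rows: int = 0

    def generate_id(self) -> None:
        # A random UUID keeps the sketch simple; a real subclass picks its own scheme.
        self.id = uuid.uuid4().hex

    @staticmethod
    def generate_datapoint(intermediate, input_format: str, user_input: dict, **kwargs):
        point = ToyDatapoint(input_format=input_format, nr_rows=len(intermediate))
        point.generate_id()
        # Merge user-supplied metadata with the computed fields.
        return {**user_input, **asdict(point)}


record = ToyDatapoint.generate_datapoint(
    [{"x": 1}, {"x": 2}], "ToolX", {"enzyme": "Trypsin"}
)
```

Because both abstract methods are overridden, `ToyDatapoint` can be instantiated, whereas instantiating `DatapointSketch` directly raises `TypeError`.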
- class proteobench.datapoint.QuantDatapointHYE(id: str = None, software_name: str = None, software_version: int = 0, search_engine: str = None, search_engine_version: int = 0, ident_fdr_psm: int = 0, ident_fdr_peptide: int = 0, ident_fdr_protein: int = 0, enable_match_between_runs: bool = False, precursor_mass_tolerance: str = None, fragment_mass_tolerance: str = None, enzyme: str = None, allowed_miscleavages: int = 0, min_peptide_length: int = 0, max_peptide_length: int = 0, is_temporary: bool = True, intermediate_hash: str = '', results: dict = None, median_abs_epsilon_global: float = 0, mean_abs_epsilon_global: float = 0, median_abs_epsilon_eq_species: float = 0, mean_abs_epsilon_eq_species: float = 0, median_abs_epsilon_precision_global: float = 0, mean_abs_epsilon_precision_global: float = 0, median_abs_epsilon_precision_eq_species: float = 0, mean_abs_epsilon_precision_eq_species: float = 0, nr_prec: int = 0, comments: str = '', proteobench_version: str = '')
Bases: DatapointBase
A data structure used to store the results of a quantification benchmark run.
This class extends DatapointBase to implement quantification-specific metrics and metadata storage for LFQ benchmarking runs.
- median_abs_epsilon_eq_species
Median absolute epsilon value for equivalently weighted species.
- Type:
float
- mean_abs_epsilon_eq_species
Mean absolute epsilon value for equivalently weighted species.
- Type:
float
- median_abs_epsilon_precision_global
Median absolute precision epsilon (deviation from empirical center).
- Type:
float
- mean_abs_epsilon_precision_global
Mean absolute precision epsilon (deviation from empirical center).
- Type:
float
- median_abs_epsilon_precision_eq_species
Median absolute precision epsilon for equivalently weighted species.
- Type:
float
- mean_abs_epsilon_precision_eq_species
Mean absolute precision epsilon for equivalently weighted species.
- Type:
float
- static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict, default_cutoff_min_prec: int = 3) -> Series
Generate a Datapoint object containing metadata and results from the benchmark run.
- Parameters:
intermediate (pd.DataFrame) – The intermediate DataFrame containing benchmark results.
input_format (str) – The format of the input data (e.g., file format).
user_input (dict) – User-defined input values for the benchmark.
default_cutoff_min_prec (int, optional) – The default minimum precursor cutoff value. Defaults to 3.
- Returns:
A Pandas Series containing the Datapoint’s attributes as key-value pairs.
- Return type:
pd.Series
- generate_id() -> None
Generate a unique ID for the benchmark run by combining the software name and a timestamp.
This ID is used to uniquely identify each run of the benchmark.
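One way to combine a software name with a timestamp is sketched below. The helper name `make_run_id`, the underscore separator, and the timestamp format are all assumptions for illustration; the format ProteoBench actually uses is not stated here.

```python
from datetime import datetime, timezone
from typing import Optional


def make_run_id(software_name: str, now: Optional[datetime] = None) -> str:
    """Join a software name and a timestamp into a single identifier."""
    # Accepting `now` as an argument makes the function easy to test.
    if now is None:
        now = datetime.now(timezone.utc)
    # Microsecond resolution lowers the chance of two runs sharing an ID.
    return f"{software_name}_{now.strftime('%Y%m%d_%H%M%S_%f')}"


run_id = make_run_id("ToolX")
```

Note that a timestamp alone does not guarantee uniqueness when two runs start within the same microsecond; this sketch accepts that limitation.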
- get_count_metrics(min_nr_observed: int) -> dict[str, int]
Compute precursor counts (total and per-species).
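The idea of counting precursors in total and per species, after filtering on a minimum number of observations, can be illustrated with plain Python. The row fields `species` and `nr_observed` and the per-species key pattern `nr_prec_{species}` are hypothetical; only the `nr_prec` name appears in the class attributes.

```python
from collections import Counter


def count_precursors(rows, min_nr_observed):
    """Count precursors overall and per species after filtering."""
    # Keep only precursors observed at least `min_nr_observed` times.
    kept = [r for r in rows if r["nr_observed"] >= min_nr_observed]
    counts = {"nr_prec": len(kept)}
    # Tally the surviving precursors by species.
    for species, n in Counter(r["species"] for r in kept).items():
        counts[f"nr_prec_{species}"] = n
    return counts


rows = [
    {"species": "HUMAN", "nr_observed": 6},
    {"species": "HUMAN", "nr_observed": 2},
    {"species": "YEAST", "nr_observed": 4},
]
counts = count_precursors(rows, min_nr_observed=3)
```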
- get_epsilon_metrics(min_nr_observed: int, agg: str = 'median') -> dict[str, float]
Compute epsilon-based accuracy metrics using specified aggregation.
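- Parameters:
min_nr_observed (int) – The minimum number of observed values for a valid computation.
agg (str, optional) – The aggregation to apply. Defaults to 'median'.
- Returns:
Accuracy metrics: global, equal-species average, and per-species values.
- Return type:
dict[str, float]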
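The difference between the global and equal-species values can be seen in a small sketch. It assumes that epsilon is the observed log2 fold change minus the expected one, and that the equal-species value is the unweighted mean of the per-species aggregates; the field names `log2_fc` and `log2_expected`, the per-species key pattern, and the function name are hypothetical.

```python
from statistics import mean, median

AGGREGATORS = {"median": median, "mean": mean}


def epsilon_metrics(rows, agg="median"):
    """Aggregate absolute epsilon globally, per species, and species-averaged."""
    aggregate = AGGREGATORS[agg]
    by_species = {}
    for r in rows:
        # Assumed definition: epsilon = observed log2FC - expected log2FC.
        eps = abs(r["log2_fc"] - r["log2_expected"])
        by_species.setdefault(r["species"], []).append(eps)

    pooled = [e for values in by_species.values() for e in values]
    per_species = {s: aggregate(v) for s, v in by_species.items()}

    out = {
        # Global: every precursor counts once, so large species dominate.
        f"{agg}_abs_epsilon_global": aggregate(pooled),
        # Equal-species: each species counts once, regardless of size.
        f"{agg}_abs_epsilon_eq_species": mean(list(per_species.values())),
    }
    for species, value in per_species.items():
        out[f"{agg}_abs_epsilon_{species}"] = value
    return out


rows = [
    {"species": "HUMAN", "log2_fc": 0.1, "log2_expected": 0.0},
    {"species": "HUMAN", "log2_fc": -0.1, "log2_expected": 0.0},
    {"species": "HUMAN", "log2_fc": 0.3, "log2_expected": 0.0},
    {"species": "YEAST", "log2_fc": 1.5, "log2_expected": 1.0},
]
metrics = epsilon_metrics(rows)
```

With three HUMAN precursors and one YEAST precursor, the global value is pulled toward HUMAN, while the equal-species value weights both species the same.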
- get_metrics(min_nr_observed: int = 1) -> dict[int, dict[str, float]]
Compute all benchmark metrics (backward compatible wrapper).
Merges: epsilon (accuracy) + precision + cv + roc + counts
- static get_metrics_old(df: DataFrame, min_nr_observed: int = 1) -> Dict[int, Dict[str, float]]
Compute various statistical metrics from the provided DataFrame for the benchmark.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the benchmark results.
min_nr_observed (int, optional) – The minimum number of observed values for a valid computation. Defaults to 1.
- Returns:
A dictionary containing computed metrics such as ‘median_abs_epsilon’, ‘variance_epsilon’, etc.
- Return type:
Dict[int, Dict[str, float]]
- get_precision_metrics(min_nr_observed: int, agg: str = 'median') -> dict[str, float]
Compute precision metrics directly from log2FC (log2_A_vs_B) column.
Precision measures deviation from the empirical center (reproducibility), computed independently from expected ratios.
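- Parameters:
min_nr_observed (int) – The minimum number of observed values for a valid computation.
agg (str, optional) – The aggregation to apply. Defaults to 'median'.
- Returns:
Precision metrics including:
  - {agg}_log2_empirical_{species}: Center of the log2FC distribution per species
  - {agg}_abs_epsilon_precision_global: Global aggregated precision
  - {agg}_abs_epsilon_precision_eq_species: Equal-weighted species average
  - {agg}_abs_epsilon_precision_{species}: Per-species precision values
- Return type:
dict[str, float]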
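Precision, as described for this method, measures spread around each species' own empirical center rather than around an expected ratio. The sketch below uses the key patterns listed in the return value and the `log2_A_vs_B` column name; pooling all deviations for the global value, averaging per-species values for the equal-species value, and the function name itself are assumptions.

```python
from statistics import mean, median

AGGREGATORS = {"median": median, "mean": mean}


def precision_metrics(rows, agg="median"):
    """Measure deviation of log2FC values from each species' empirical center."""
    aggregate = AGGREGATORS[agg]
    by_species = {}
    for r in rows:
        by_species.setdefault(r["species"], []).append(r["log2_A_vs_B"])

    out, pooled, per_species = {}, [], []
    for species, values in by_species.items():
        # The center comes from the data itself; no expected ratio is used.
        center = aggregate(values)
        deviations = [abs(v - center) for v in values]
        out[f"{agg}_log2_empirical_{species}"] = center
        out[f"{agg}_abs_epsilon_precision_{species}"] = aggregate(deviations)
        pooled.extend(deviations)
        per_species.append(aggregate(deviations))

    out[f"{agg}_abs_epsilon_precision_global"] = aggregate(pooled)
    out[f"{agg}_abs_epsilon_precision_eq_species"] = mean(per_species)
    return out


rows = [
    {"species": "HUMAN", "log2_A_vs_B": 0.0},
    {"species": "HUMAN", "log2_A_vs_B": 0.2},
    {"species": "HUMAN", "log2_A_vs_B": 0.4},
    {"species": "YEAST", "log2_A_vs_B": 1.0},
    {"species": "YEAST", "log2_A_vs_B": 1.5},
    {"species": "YEAST", "log2_A_vs_B": 2.0},
]
precision = precision_metrics(rows)
```

A tool whose ratios are tightly clustered but shifted away from the expected value would score well here and poorly on the accuracy metrics, which is why the two are reported separately.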
Submodules
- proteobench.datapoint.datapoint_base module
- proteobench.datapoint.quant_datapoint module
  - QuantDatapointHYE
    - Attributes: id, software_name, software_version, search_engine, search_engine_version, ident_fdr_psm, ident_fdr_peptide, ident_fdr_protein, enable_match_between_runs, precursor_mass_tolerance, fragment_mass_tolerance, enzyme, allowed_miscleavages, min_peptide_length, max_peptide_length, is_temporary, intermediate_hash, results, median_abs_epsilon_global, mean_abs_epsilon_global, median_abs_epsilon_eq_species, mean_abs_epsilon_eq_species, median_abs_epsilon_precision_global, mean_abs_epsilon_precision_global, median_abs_epsilon_precision_eq_species, mean_abs_epsilon_precision_eq_species, nr_prec, comments, proteobench_version
    - Methods: generate_datapoint(), generate_id(), get_count_metrics(), get_cv_metrics(), get_epsilon_metrics(), get_metrics(), get_metrics_old(), get_precision_metrics(), get_roc_metrics()
  - compute_roc_auc()
  - compute_roc_auc_directional()
  - filter_df_numquant_epsilon()
  - filter_df_numquant_nr_prec()