proteobench.datapoint package
Datapoint module for ProteoBench benchmarking.
- class proteobench.datapoint.DatapointBase
Bases: ABC
Abstract base class for benchmark datapoints.
This class defines the interface that all datapoint types must implement, allowing for modular and extensible datapoint handling for different benchmarking modules.
Subclasses should define their own attributes specific to their benchmarking module.
- abstractmethod static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict, **kwargs) -> Series
Generate a datapoint object containing metadata and results from the benchmark run.
- Parameters:
- Returns:
A Pandas Series containing the datapoint’s attributes as key-value pairs.
- Return type:
pd.Series
- abstractmethod generate_id() -> None
Generate a unique ID for the benchmark run.
This ID should uniquely identify each run of the benchmark.
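As a minimal sketch of how a subclass might satisfy this interface, the block below re-declares a stand-in `DatapointBase` so that it runs without ProteoBench installed. The `ToyDatapoint` class, its fields, the UUID-based ID, and the plain dict returned in place of a `pd.Series` are all assumptions made for illustration; none of them come from the library.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass


class DatapointBase(ABC):
    """Stand-in mirroring the documented interface (not the real class)."""

    @staticmethod
    @abstractmethod
    def generate_datapoint(intermediate, input_format, user_input, **kwargs):
        """Build a datapoint from benchmark results and user metadata."""

    @abstractmethod
    def generate_id(self) -> None:
        """Assign a unique ID to this run."""


@dataclass
class ToyDatapoint(DatapointBase):
    """Hypothetical datapoint type with a few illustrative fields."""

    id: str = None
    software_name: str = None
    nr_rows: int = 0

    def generate_id(self) -> None:
        # Assumption: any scheme that yields a unique string is acceptable.
        self.id = uuid.uuid4().hex

    @staticmethod
    def generate_datapoint(intermediate, input_format, user_input, **kwargs):
        point = ToyDatapoint(
            software_name=user_input.get("software_name", input_format),
            nr_rows=len(intermediate),
        )
        point.generate_id()
        # A dict stands in for pd.Series to keep the sketch dependency-free.
        return asdict(point)


result = ToyDatapoint.generate_datapoint(
    intermediate=[{"log2_A_vs_B": 1.0}, {"log2_A_vs_B": 0.9}],
    input_format="ToyFormat",
    user_input={"software_name": "ToySoftware"},
)
```

Because both methods are declared abstract, instantiating the stand-in base class directly raises `TypeError`, which is what forces each benchmarking module to supply its own implementation.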
- class proteobench.datapoint.QuantDatapointHYE(id: str = None, software_name: str = None, software_version: int = 0, search_engine: str = None, search_engine_version: int = 0, ident_fdr_psm: int = 0, ident_fdr_peptide: int = 0, ident_fdr_protein: int = 0, enable_match_between_runs: bool = False, precursor_mass_tolerance: str = None, fragment_mass_tolerance: str = None, enzyme: str = None, allowed_miscleavages: int = 0, min_peptide_length: int = 0, max_peptide_length: int = 0, is_temporary: bool = True, intermediate_hash: str = '', results: dict = None, median_abs_epsilon_global: float = 0, mean_abs_epsilon_global: float = 0, median_abs_epsilon_eq_species: float = 0, mean_abs_epsilon_eq_species: float = 0, median_abs_epsilon_precision_global: float = 0, mean_abs_epsilon_precision_global: float = 0, median_abs_epsilon_precision_eq_species: float = 0, mean_abs_epsilon_precision_eq_species: float = 0, nr_prec: int = 0, comments: str = '', proteobench_version: str = '')
Bases: DatapointBase
A data structure used to store the results of a quantification benchmark run.
This class extends DatapointBase to implement quantification-specific metrics and metadata storage for LFQ benchmarking runs.
- median_abs_epsilon_eq_species
Median absolute epsilon value for equally weighted species.
- Type:
float
- mean_abs_epsilon_eq_species
Mean absolute epsilon value for equally weighted species.
- Type:
float
- median_abs_epsilon_precision_global
Median absolute precision epsilon (deviation from empirical center).
- Type:
float
- mean_abs_epsilon_precision_global
Mean absolute precision epsilon (deviation from empirical center).
- Type:
float
- median_abs_epsilon_precision_eq_species
Median absolute precision epsilon for equally weighted species.
- Type:
float
- mean_abs_epsilon_precision_eq_species
Mean absolute precision epsilon for equally weighted species.
- Type:
float
- static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict, default_cutoff_min_prec: int = 3, max_nr_observed: int = None) -> Series
Generate a Datapoint object containing metadata and results from the benchmark run.
- Parameters:
intermediate (pd.DataFrame) – The intermediate DataFrame containing benchmark results.
input_format (str) – The format of the input data (e.g., file format).
user_input (dict) – User-defined input values for the benchmark.
default_cutoff_min_prec (int, optional) – The default minimum precursor cutoff value. Defaults to 3.
max_nr_observed (int, optional) – Maximum nr_observed value to calculate metrics for. If None, defaults to 6.
- Returns:
A Pandas Series containing the Datapoint’s attributes as key-value pairs.
- Return type:
pd.Series
- generate_id() -> None
Generate a unique ID for the benchmark run by combining the software name and a timestamp.
This ID is used to uniquely identify each run of the benchmark.
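The idea of combining a software name with a timestamp can be sketched as follows. The helper name `make_run_id`, the underscore separator, and the timestamp format are assumptions for illustration; the exact format used by `generate_id()` is not stated here.

```python
from datetime import datetime
from typing import Optional


def make_run_id(software_name: str, now: Optional[datetime] = None) -> str:
    """Combine the software name and a timestamp into one ID string."""
    now = now or datetime.now()
    # Microsecond resolution makes collisions between runs unlikely.
    return f"{software_name}_{now.strftime('%Y%m%d_%H%M%S_%f')}"


run_id = make_run_id("ToySoftware", datetime(2024, 1, 2, 3, 4, 5, 6))
print(run_id)  # ToySoftware_20240102_030405_000006
```

Passing the timestamp in as an optional argument keeps the helper deterministic in tests while still defaulting to the current time.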
- static get_cv_metrics(df: DataFrame, min_nr_observed: int) -> dict[str, float]
Compute CV quantiles.
- static get_epsilon_metrics(df: DataFrame, min_nr_observed: int, agg: str = 'median') -> dict[str, float]
Compute epsilon-based accuracy metrics using specified aggregation.
- Parameters:
- Returns:
Accuracy metrics: global, equal-species average, and per-species values.
- Return type:
dict[str, float]
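The three kinds of accuracy metric named above can be illustrated with a dependency-free sketch. It assumes that epsilon is the observed log2 fold change minus the expected log2 fold change for that species, and that per-species keys follow the pattern `{agg}_abs_epsilon_{species}`; both are assumptions, and `toy_epsilon_metrics` is a hypothetical helper, not the library function.

```python
from statistics import mean, median

AGGREGATORS = {"median": median, "mean": mean}


def toy_epsilon_metrics(rows, agg="median"):
    """Aggregate absolute epsilon globally, per species, and equal-weighted.

    rows: iterable of (species, observed_log2fc, expected_log2fc) tuples.
    """
    aggregate = AGGREGATORS[agg]
    abs_eps_by_species = {}
    for species, observed, expected in rows:
        # Assumed definition: epsilon = observed - expected log2 fold change.
        abs_eps_by_species.setdefault(species, []).append(abs(observed - expected))

    all_abs_eps = [e for values in abs_eps_by_species.values() for e in values]
    per_species = {s: aggregate(v) for s, v in abs_eps_by_species.items()}

    metrics = {f"{agg}_abs_epsilon_global": aggregate(all_abs_eps)}
    # Equal weighting: average the per-species values so that a species
    # with many precursors does not dominate the summary.
    metrics[f"{agg}_abs_epsilon_eq_species"] = mean(list(per_species.values()))
    for species, value in per_species.items():
        metrics[f"{agg}_abs_epsilon_{species}"] = value
    return metrics


rows = [
    ("HUMAN", 0.1, 0.0), ("HUMAN", -0.2, 0.0), ("HUMAN", 0.1, 0.0), ("HUMAN", 0.0, 0.0),
    ("YEAST", 1.5, 1.0), ("YEAST", 0.5, 1.0),
]
epsilon_metrics = toy_epsilon_metrics(rows, agg="median")
```

In this toy input the global value is pulled toward HUMAN, which contributes four of the six rows, whereas the equal-species value weights HUMAN and YEAST the same.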
- static get_metrics(df: DataFrame, min_nr_observed: int = 3) -> Dict[int, Dict[str, float]]
Compute statistical metrics from the provided DataFrame.
- static get_precision_metrics(df: DataFrame, min_nr_observed: int, agg: str = 'median') -> dict[str, float]
Compute precision metrics directly from log2FC (log2_A_vs_B) column.
Precision measures deviation from the empirical center (reproducibility), computed independently from expected ratios.
- Parameters:
- Returns:
Precision metrics, including:
  - {agg}_log2_empirical_{species}: Center of the log2FC distribution per species
  - {agg}_abs_epsilon_precision_global: Global aggregated precision
  - {agg}_abs_epsilon_precision_eq_species: Equal-weighted species average
  - {agg}_abs_epsilon_precision_{species}: Per-species precision values
- Return type:
dict[str, float]
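To make "deviation from the empirical center" concrete, here is a dependency-free sketch that produces the four key patterns listed above. It assumes the center of each species is computed with the same aggregation as `agg`, and that the global value pools deviations measured against each species' own center; both are assumptions, and `toy_precision_metrics` is a hypothetical helper rather than the library function.

```python
from statistics import mean, median

AGGREGATORS = {"median": median, "mean": mean}


def toy_precision_metrics(rows, agg="median"):
    """rows: iterable of (species, log2fc) pairs; no expected ratio is needed."""
    aggregate = AGGREGATORS[agg]
    log2fc_by_species = {}
    for species, log2fc in rows:
        log2fc_by_species.setdefault(species, []).append(log2fc)

    metrics, all_deviations, per_species = {}, [], {}
    for species, values in log2fc_by_species.items():
        center = aggregate(values)  # empirical center of this species
        deviations = [abs(v - center) for v in values]
        metrics[f"{agg}_log2_empirical_{species}"] = center
        per_species[species] = aggregate(deviations)
        all_deviations.extend(deviations)

    metrics[f"{agg}_abs_epsilon_precision_global"] = aggregate(all_deviations)
    metrics[f"{agg}_abs_epsilon_precision_eq_species"] = mean(list(per_species.values()))
    for species, value in per_species.items():
        metrics[f"{agg}_abs_epsilon_precision_{species}"] = value
    return metrics


rows = [
    ("HUMAN", 0.0), ("HUMAN", 0.2), ("HUMAN", 0.4),
    ("YEAST", 1.0), ("YEAST", 1.5), ("YEAST", 2.0),
]
precision_metrics = toy_precision_metrics(rows, agg="median")
```

Unlike the accuracy metrics, nothing here refers to an expected ratio: a tool whose fold changes are tightly clustered around the wrong value would still score well on precision.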
Submodules
- proteobench.datapoint.datapoint_base module
- proteobench.datapoint.denovo_datapoint module
  - DenovoDatapoint
    - Attributes: id, software_name, software_version, search_engine, search_engine_version, ident_fdr_psm, ident_fdr_peptide, ident_fdr_protein, enable_match_between_runs, precursor_mass_tolerance, fragment_mass_tolerance, enzyme, allowed_miscleavages, min_peptide_length, max_peptide_length, is_temporary, intermediate_hash, results, median_abs_epsilon, mean_abs_epsilon, nr_prec, comments, proteobench_version, checkpoint, decoding_strategy, isotope_error_range, max_intensity, max_mz, max_precursor_charge, min_intensity, min_mz, min_precursor_charge, n_beams, n_peaks, precision_aa, precision_peptide, recall_aa, recall_peptide, remove_precursor_tol, tokens
    - Methods: evaluate_ptm(), generate_datapoint(), generate_id(), get_indepth_metrics(), get_metrics(), get_ptm_metrics(), get_species_metrics(), get_spectrum_metrics(), record_proportions_to_results_feature()
  - Functions: calculate_prc(), collapse_aa_scores(), get_prc_curve()
- proteobench.datapoint.quant_datapoint module
  - QuantDatapointHYE
    - Attributes: id, software_name, software_version, search_engine, search_engine_version, ident_fdr_psm, ident_fdr_peptide, ident_fdr_protein, enable_match_between_runs, precursor_mass_tolerance, fragment_mass_tolerance, enzyme, allowed_miscleavages, min_peptide_length, max_peptide_length, is_temporary, intermediate_hash, results, median_abs_epsilon_global, mean_abs_epsilon_global, median_abs_epsilon_eq_species, mean_abs_epsilon_eq_species, median_abs_epsilon_precision_global, mean_abs_epsilon_precision_global, median_abs_epsilon_precision_eq_species, mean_abs_epsilon_precision_eq_species, nr_prec, comments, proteobench_version
    - Methods: generate_datapoint(), generate_id(), get_cv_metrics(), get_epsilon_metrics(), get_metrics(), get_precision_metrics()
  - QuantDatapointPYE
    - Attributes: median_abs_log2_fc_error_spike_ins, nr_quantified_spike_ins, dynamic_range_human_plasma, median_abs_epsilon_human_plasma
    - Methods: generate_datapoint()
  - Functions: compute_roc_auc(), compute_roc_auc_directional(), filter_df_numquant_epsilon(), filter_df_numquant_nr_prec()