proteobench.score package

Score module for ProteoBench benchmarking.

class proteobench.score.QuantScoresHYE(precursor_column_name: str, species_expected_ratio, species_dict: Dict[str, str])

Bases: ScoreBase

Class for computing quantification scores for LFQ benchmarking.

This class implements the ScoreBase interface to compute quantification-specific metrics including condition statistics, fold changes, and epsilon (difference) values.

Parameters:
  • precursor_column_name (str) – Name of the precursor column.

  • species_expected_ratio (dict) – Dictionary containing the expected ratios for each species.

  • species_dict (dict) – Dictionary containing the species names and their column mappings.

static compute_condition_stats(relevant_columns_df: DataFrame, min_intensity=0, precursor='precursor ion') → DataFrame

Compute precursor statistics, such as the number of observations, the CV, and the mean per condition.

Parameters:
  • relevant_columns_df (pd.DataFrame) – DataFrame containing the relevant columns for the statistics.

  • min_intensity (int, optional) – Minimum intensity value to filter for. Defaults to 0.

  • precursor (str, optional) – Name of the precursor column. Defaults to “precursor ion”.

Returns:

DataFrame containing the precursor statistics.

Return type:

pd.DataFrame
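As a rough illustration of the kind of per-condition statistics described above, the following sketch groups a long-format table by precursor and condition and derives the number of observations, mean, standard deviation, and CV. It does not call ProteoBench. The column names "Group" and "Intensity", the strict `>` threshold, and the helper name `condition_stats` are assumptions made for this example; the actual method may expect a different layout.

```python
import pandas as pd

# Hypothetical long-format input: one row per precursor, condition and replicate.
df = pd.DataFrame(
    {
        "precursor ion": ["PEPA|Z=2"] * 4 + ["PEPB|Z=3"] * 4,
        "Group": ["A", "A", "B", "B"] * 2,
        "Intensity": [100.0, 300.0, 50.0, 50.0, 0.0, 80.0, 40.0, 40.0],
    }
)


def condition_stats(df, min_intensity=0, precursor="precursor ion"):
    """Illustrative stand-in, not the ProteoBench implementation."""
    # Assumption: keep only observations strictly above the threshold.
    kept = df[df["Intensity"] > min_intensity]
    grouped = kept.groupby([precursor, "Group"])["Intensity"]
    # Named aggregation: count, mean and sample standard deviation per group.
    stats = grouped.agg(nr_observed="count", mean="mean", std="std").reset_index()
    # Coefficient of variation; NaN when only one observation remains.
    stats["CV"] = stats["std"] / stats["mean"]
    return stats


stats = condition_stats(df)
print(stats)
```

With the default threshold of 0, the zero-intensity row for the second precursor is dropped, so that precursor has a single observation in condition A and an undefined CV there.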

static compute_epsilon(withspecies, species_expected_ratio) → DataFrame

Compute epsilon for each species in species_expected_ratio.

Parameters:
  • withspecies (pd.DataFrame) – DataFrame containing the species columns and the log2_A_vs_B column.

  • species_expected_ratio (dict) – Dictionary containing the expected ratios for each species.

Returns:

DataFrame containing the epsilon values.

Return type:

pd.DataFrame
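The sketch below shows one plausible reading of epsilon: the observed log2 fold change minus the log2 of the expected ratio for the row's species. It does not call ProteoBench. The boolean species columns, the "A_vs_B" key inside species_expected_ratio, the "log2_expected" column, and the helper name `epsilon_sketch` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Assumed shape: species name -> {"A_vs_B": expected ratio between conditions}.
species_expected_ratio = {
    "HUMAN": {"A_vs_B": 1.0},
    "YEAST": {"A_vs_B": 2.0},
    "ECOLI": {"A_vs_B": 0.25},
}

# Assumed shape: one boolean flag column per species plus the observed log2 ratio.
withspecies = pd.DataFrame(
    {
        "HUMAN": [True, False, False],
        "YEAST": [False, True, False],
        "ECOLI": [False, False, True],
        "log2_A_vs_B": [0.1, 0.9, -2.3],
    }
)


def epsilon_sketch(withspecies, species_expected_ratio):
    """Illustrative stand-in, not the ProteoBench implementation."""
    out = withspecies.copy()
    out["log2_expected"] = np.nan
    for species, info in species_expected_ratio.items():
        # Rows flagged for this species receive its expected log2 ratio.
        out.loc[out[species], "log2_expected"] = np.log2(info["A_vs_B"])
    # Epsilon: deviation of the observed from the expected log2 fold change.
    out["epsilon"] = out["log2_A_vs_B"] - out["log2_expected"]
    return out


result = epsilon_sketch(withspecies, species_expected_ratio)
print(result[["log2_A_vs_B", "log2_expected", "epsilon"]])
```

Under this reading, an epsilon of 0 means the measured fold change matches the expected ratio exactly, and the sign shows whether the ratio was over- or underestimated.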

static convert_replicate_to_raw(replicate_to_raw: dict) → DataFrame

Convert replicate_to_raw dictionary into a dataframe.

Parameters:

replicate_to_raw (dict) – Dictionary containing the replicate to raw mapping.

Returns:

DataFrame containing the replicate to raw mapping.

Return type:

pd.DataFrame
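To make the conversion concrete, here is a minimal sketch that turns a condition-to-raw-files dictionary into a long DataFrame with one row per raw file. It uses plain pandas and does not call ProteoBench. The dictionary layout and the column names "Group" and "Raw file" are assumptions for this example.

```python
import pandas as pd

# Assumed layout: condition label -> list of raw file names.
replicate_to_raw = {
    "A": ["run_A1", "run_A2", "run_A3"],
    "B": ["run_B1", "run_B2", "run_B3"],
}

mapping_df = (
    # One row per condition, with the list of raw files in a single cell.
    pd.DataFrame(list(replicate_to_raw.items()), columns=["Group", "Raw file"])
    # Expand each list so that every raw file gets its own row.
    .explode("Raw file")
    .reset_index(drop=True)
)
print(mapping_df)
```

A long table of this form can be merged onto per-run measurements to attach a condition label to each raw file.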

generate_intermediate(filtered_df: DataFrame, replicate_to_raw: dict) → DataFrame

Generate intermediate data structure for quantification scores.

Parameters:
  • filtered_df (pd.DataFrame) – DataFrame containing the filtered data.

  • replicate_to_raw (dict) – Dictionary containing the replicate to raw mapping.

Returns:

DataFrame containing the intermediate data structure.

Return type:

pd.DataFrame

class proteobench.score.ScoreBase

Bases: ABC

Abstract base class for computing benchmark scores.

This class defines the interface that all score calculators must implement, allowing for modular and extensible score computation for different benchmarking modules.

abstractmethod generate_intermediate(filtered_df: DataFrame, replicate_to_raw: dict) → DataFrame

Generate intermediate data structure for scores.

Parameters:
  • filtered_df (pd.DataFrame) – DataFrame containing the filtered data.

  • replicate_to_raw (dict) – Dictionary containing the replicate to raw mapping.

Returns:

DataFrame containing the intermediate data structure with computed scores.

Return type:

pd.DataFrame
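The abstract-interface pattern described above can be sketched as follows. To keep the example self-contained, it defines a local stand-in, `ScoreBaseSketch`, instead of importing the real ScoreBase. The subclass `CountScores`, the "Raw file" and "Group" columns, and the "n_observations" output are hypothetical and exist only to show how a new score calculator might plug into the interface.

```python
from abc import ABC, abstractmethod

import pandas as pd


class ScoreBaseSketch(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    @abstractmethod
    def generate_intermediate(
        self, filtered_df: pd.DataFrame, replicate_to_raw: dict
    ) -> pd.DataFrame:
        ...


class CountScores(ScoreBaseSketch):
    """Hypothetical calculator that counts observations per condition."""

    def generate_intermediate(self, filtered_df, replicate_to_raw):
        # Invert condition -> raw files into raw file -> condition.
        raw_to_condition = {
            raw: cond for cond, raws in replicate_to_raw.items() for raw in raws
        }
        df = filtered_df.assign(Group=filtered_df["Raw file"].map(raw_to_condition))
        return df.groupby("Group").size().rename("n_observations").reset_index()


filtered_df = pd.DataFrame(
    {"Raw file": ["run_A1", "run_A2", "run_B1"], "Intensity": [10.0, 12.0, 5.0]}
)
scores = CountScores().generate_intermediate(
    filtered_df, {"A": ["run_A1", "run_A2"], "B": ["run_B1"]}
)
print(scores)

# The abstract base cannot be instantiated directly.
try:
    ScoreBaseSketch()
    instantiated = True
except TypeError:
    instantiated = False
```

Because the base class only fixes the method signature, a benchmarking module can swap in a different calculator without changing the code that calls generate_intermediate.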

Submodules