proteobench.score.entrapmentscores module

Module containing entrapment score calculators for entrapment benchmarking.

class proteobench.score.entrapmentscores.EntrapmentScores(mapping_file: str)

Bases: ScoreBase

Class for computing entrapment scores for entrapment benchmarking.

This class inherits from ScoreBase and extends it with entrapment metrics and calculations.

Parameters: mapping_file (str) – The entrapment mapping file that pairs peptides with their entrapment counterparts.

calculate_fdp_at_fdr_thresholds(df: DataFrame, n_intervals: int = 10) → Dict[float, Dict[str, float]]

Compute lower-bound, combined, and paired FDP at evenly-spaced Q-value thresholds.

Thresholds are spaced from max_q / n_intervals to max_q in n_intervals equal steps, where max_q is the maximum Q-value in df (i.e. the reported FDR). The mapping file is loaded once and the pair-index merge is performed once; only the Q-value filter varies per step.

Parameters:

df (pd.DataFrame) – Intermediate DataFrame produced by generate_intermediate and filtered by validate_entrapment_coverage.
n_intervals (int) – Number of evenly-spaced thresholds. Defaults to 10.

Returns:

Mapping of {threshold: {lower_bound_FDP, combined_FDP, paired_FDP, nr_id_features}}. Thresholds where no targets are identified are omitted.

Return type:

Dict[float, Dict[str, float]]
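The threshold spacing described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the helper name evenly_spaced_thresholds is hypothetical, and only the spacing rule (from max_q / n_intervals to max_q in n_intervals equal steps) is taken from the description.

```python
def evenly_spaced_thresholds(max_q: float, n_intervals: int = 10) -> list:
    """Return n_intervals thresholds running from max_q / n_intervals up to max_q.

    Hypothetical helper: it only mirrors the spacing rule in the description.
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    # Step i (1-based) sits at i / n_intervals of the way to max_q,
    # so the final threshold coincides with max_q itself.
    return [max_q * i / n_intervals for i in range(1, n_intervals + 1)]


# With a reported FDR of 1% and the default 10 intervals,
# the thresholds run from 0.1% to 1% in steps of 0.1%.
thresholds = evenly_spaced_thresholds(max_q=0.01, n_intervals=10)
```

Because the largest threshold equals max_q, the final step evaluates the full, unfiltered DataFrame.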

static calculate_lower_bound_fdp(df: DataFrame) → Dict[int, float]

Compute the lower bound false discovery proportion (FDP) for the given DataFrame.

Parameters: df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the lower bound FDP.
Returns: The computed lower bound FDP value.
Return type: float

calculate_metrics(df: DataFrame) → Dict[str, float]

Calculate all entrapment metrics for the given DataFrame. The main plot metrics are computed after filtering at 1% FDR, and the result is categorised as valid, invalid, or inconclusive based on the bound values.

Parameters: df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the metrics.
Returns: A dictionary containing all computed metric values.
Return type: Dict[str, float]

calculate_paired_fdp(df: DataFrame) → Dict[int, float]

Compute the paired false discovery proportion (FDP) for the given DataFrame.

Parameters: df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the paired FDP.
Returns: The computed paired FDP value.
Return type: float

static calculate_reported_fdr(df: DataFrame, score_col: str = 'Q-Value') → float

Estimate the FDR threshold applied by the search engine from the output data.

The reported FDR is inferred as the maximum score value in the DataFrame for the given score column. For Q-value-based outputs this equals the least significant accepted Q-value, which corresponds to the FDR cutoff the search engine applied. The score_col parameter makes the method applicable to different entrapment levels: use "Q-Value" for PSM/precursor level, or the appropriate column name for peptide- or protein-level outputs.

Parameters:

df (pd.DataFrame) – DataFrame containing the intermediate data for one entrapment run. Must contain the column specified by score_col.
score_col (str, optional) – Name of the Q-value (or equivalent score) column. Defaults to "Q-Value".

Returns:

The maximum value found in score_col, interpreted as the applied FDR threshold.

Return type:

float
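As an illustration of the rule above (reported FDR equals the maximum value in the score column), here is a minimal sketch on a toy table. The function name reported_fdr_sketch and the data are made up for this example; it is not the library's implementation.

```python
import pandas as pd

# Toy intermediate table; peptides and Q-values are invented for illustration.
toy = pd.DataFrame(
    {
        "Peptide": ["AAK", "CCR", "DDK"],
        "Q-Value": [0.001, 0.0099, 0.004],
    }
)


def reported_fdr_sketch(df: pd.DataFrame, score_col: str = "Q-Value") -> float:
    """Infer the applied FDR cutoff as the largest accepted score (hypothetical helper)."""
    if score_col not in df.columns:
        raise KeyError(f"Column {score_col!r} not found in DataFrame")
    # The least significant accepted Q-value marks the cutoff the engine applied.
    return float(df[score_col].max())


reported = reported_fdr_sketch(toy)
```

For peptide- or protein-level outputs, the same call would take the relevant column name through score_col.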

static calculate_upper_bound_combined_fdp(df: DataFrame) → Dict[int, float]

Compute the upper-bound (combined) false discovery proportion (FDP) for the given DataFrame.

Parameters: df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the FDP.
Returns: The computed FDP value.
Return type: float
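The formulas behind the lower-bound and combined FDP are not spelled out on this page. The sketch below uses estimators commonly applied in entrapment analyses, and it is an assumption that this module follows them: lower bound = N_E / (N_T + N_E) and combined = N_E (1 + 1/r) / (N_T + N_E), where N_T and N_E are the numbers of accepted target and entrapment identifications and r is the ratio of entrapment to target database size. All function and parameter names are hypothetical.

```python
def lower_bound_fdp_sketch(n_targets: int, n_entrapments: int) -> float:
    """Lower-bound FDP estimate: entrapment hits over all accepted hits.

    Assumed estimator, not confirmed against the library source.
    """
    total = n_targets + n_entrapments
    if total == 0:
        raise ValueError("No identifications to evaluate")
    return n_entrapments / total


def combined_fdp_sketch(n_targets: int, n_entrapments: int, r: float = 1.0) -> float:
    """Combined (upper-bound) FDP estimate.

    Scales entrapment hits by (1 + 1/r) to account for false target hits,
    where r is the entrapment-to-target database size ratio (assumed estimator).
    """
    total = n_targets + n_entrapments
    if total == 0:
        raise ValueError("No identifications to evaluate")
    if r <= 0:
        raise ValueError("r must be positive")
    return n_entrapments * (1 + 1 / r) / total


# Made-up counts: 980 accepted targets, 20 accepted entrapments, equal-size databases.
lower = lower_bound_fdp_sketch(980, 20)
upper = combined_fdp_sketch(980, 20, r=1.0)
```

Under these assumptions the combined estimate is never smaller than the lower bound, which is what lets the two values bracket the true FDP.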

static categorise_metric(lower_bound: float, upper_bound: float, fdr: float) → str

Categorise the reported FDR as one of:

valid: The upper bound is lower than the reported FDR.
invalid: The lower bound is higher than the reported FDR.
inconclusive: The lower bound is lower than the reported FDR but the upper bound is higher than the reported FDR.

Parameters:

lower_bound (float) – The lower bound for categorisation.
upper_bound (float) – The upper bound for categorisation.
fdr (float) – The false discovery rate for categorisation.

Returns:

The category of the metric value (“valid”, “invalid”, or “inconclusive”).

Return type:

str
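The three categories can be expressed as a short decision function. This is a sketch with a hypothetical name, categorise_sketch. How the library treats exact ties is not documented here, so sending a bound that equals the FDR to "inconclusive" is an assumption.

```python
def categorise_sketch(lower_bound: float, upper_bound: float, fdr: float) -> str:
    """Classify a reported FDR against its entrapment-derived bounds (hypothetical helper)."""
    if upper_bound < fdr:
        # Even the pessimistic estimate stays below the reported FDR.
        return "valid"
    if lower_bound > fdr:
        # Even the optimistic estimate exceeds the reported FDR.
        return "invalid"
    # The reported FDR lies between the bounds (ties land here by assumption).
    return "inconclusive"


# Made-up bounds evaluated against a reported FDR of 1%.
examples = {
    "valid": categorise_sketch(0.002, 0.008, 0.01),
    "invalid": categorise_sketch(0.015, 0.030, 0.01),
    "inconclusive": categorise_sketch(0.006, 0.012, 0.01),
}
```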

generate_intermediate(filtered_df: DataFrame) → DataFrame

Generate intermediate data structure for entrapment scores.

Parameters: filtered_df (pd.DataFrame) – DataFrame containing the filtered data.
Returns: DataFrame containing the intermediate data structure.
Return type: pd.DataFrame

validate_entrapment_coverage(df: DataFrame, max_missing_fraction: float = 0.03) → DataFrame

Check that identified peptides are covered by the entrapment mapping file and return a filtered DataFrame containing only peptides that have a pair.

Raises EntrapmentError if the fraction of peptides absent from the mapping file exceeds max_missing_fraction. This indicates a FASTA mismatch — most commonly caused by enabling in-silico digestion in the search engine when the entrapment FASTA is already pre-digested.

Parameters:

df (pd.DataFrame) – Intermediate DataFrame produced by generate_intermediate. Must contain a "Peptide" column.
max_missing_fraction (float) – Maximum tolerated fraction of unmatched peptides. Defaults to 0.03.

Returns:

Copy of df with rows whose peptide has no paired entrapment removed.

Return type:

pd.DataFrame

Raises:

EntrapmentError – If the fraction of unmatched peptides exceeds max_missing_fraction.
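The coverage check can be sketched as follows. Everything here is illustrative: coverage_filter_sketch, EntrapmentErrorSketch, and the mapped_peptides set stand in for the real method, exception, and mapping file. The missing fraction is computed over unique peptides, which is an assumption about how the library counts.

```python
import pandas as pd


class EntrapmentErrorSketch(Exception):
    """Stand-in for the library's EntrapmentError (hypothetical)."""


def coverage_filter_sketch(
    df: pd.DataFrame, mapped_peptides: set, max_missing_fraction: float = 0.03
) -> pd.DataFrame:
    """Keep rows whose peptide is in the mapping; fail if too many are missing."""
    unique_peptides = df["Peptide"].drop_duplicates()
    if unique_peptides.empty:
        return df.copy()
    # Fraction of distinct peptides with no entry in the mapping (assumed definition).
    missing_fraction = 1.0 - unique_peptides.isin(mapped_peptides).mean()
    if missing_fraction > max_missing_fraction:
        raise EntrapmentErrorSketch(
            f"{missing_fraction:.1%} of peptides are absent from the mapping file"
        )
    return df[df["Peptide"].isin(mapped_peptides)].copy()


# Toy data: one of four peptides has no pair in the mapping.
toy = pd.DataFrame({"Peptide": ["AAK", "CCR", "DDK", "EEK"]})
mapped = {"AAK", "CCR", "DDK"}
kept = coverage_filter_sketch(toy, mapped, max_missing_fraction=0.5)
```

With the default tolerance of 0.03 the same toy input would raise, since 25% of its peptides are unmatched.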