proteobench.score.entrapmentscores module
Module containing plasma quantification score calculators (PYE - Plasma Year Edition).
- class proteobench.score.entrapmentscores.EntrapmentScores(mapping_file: str)
Bases: ScoreBase
Class for computing entrapment scores for entrapment benchmarking.
This class inherits from ScoreBase and extends it with entrapment metrics and calculations.
- Parameters:
precursor_column_name (str) – Name of the precursor column.
- calculate_fdp_at_fdr_thresholds(df: DataFrame, n_intervals: int = 10) → Dict[float, Dict[str, float]]
Compute lower-bound, combined, and paired FDP at evenly-spaced Q-value thresholds.
Thresholds are spaced from max_q / n_intervals to max_q in n_intervals equal steps, where max_q is the maximum Q-value in df (i.e. the reported FDR). The mapping file is loaded once and the pair-index merge is performed once; only the Q-value filter varies per step.
- Parameters:
df (pd.DataFrame) – Intermediate DataFrame produced by generate_intermediate and filtered by validate_entrapment_coverage.
n_intervals (int) – Number of evenly-spaced thresholds. Defaults to 10.
- Returns:
Mapping of {threshold: {lower_bound_FDP, combined_FDP, paired_FDP, nr_id_features}}. Thresholds where no targets are identified are omitted.
- Return type:
Dict[float, Dict[str, float]]
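The threshold spacing described for calculate_fdp_at_fdr_thresholds can be sketched in a few lines. This is an illustrative sketch only, not the library's implementation: the helper name q_value_thresholds is hypothetical, and it takes a plain sequence of Q-values instead of a DataFrame so that it has no dependencies.

```python
from typing import List, Sequence


def q_value_thresholds(q_values: Sequence[float], n_intervals: int = 10) -> List[float]:
    """Return n_intervals thresholds from max_q / n_intervals up to max_q."""
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    # The largest Q-value is treated as the reported FDR, as described above.
    max_q = max(q_values)
    return [max_q * (i + 1) / n_intervals for i in range(n_intervals)]


# Hypothetical Q-values from a run filtered at 1% FDR.
thresholds = q_value_thresholds([0.001, 0.004, 0.01], n_intervals=5)
```

Each threshold would then be used as a Q-value cutoff before the three FDP estimates are computed for the surviving rows.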
- static calculate_lower_bound_fdp(df: DataFrame) → Dict[int, float]
Compute the lower bound false discovery proportion (FDP) for the given DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the lower bound FDP.
- Returns:
The computed lower bound FDP value.
- Return type:
float
- calculate_metrics(df: DataFrame) → Dict[str, float]
Handle the calculation of all entrapment metrics for the given DataFrame. Ensures 1% FDR filtering for the main plot metrics. Handles categorisation into valid, invalid, and inconclusive based on bound values.
- calculate_paired_fdp(df: DataFrame) → Dict[int, float]
Compute the paired false discovery proportion (FDP) for the given DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the paired FDP.
- Returns:
The computed paired FDP value.
- Return type:
float
- static calculate_reported_fdr(df: DataFrame, score_col: str = 'Q-Value') → float
Estimate the FDR threshold applied by the search engine from the output data.
The reported FDR is inferred as the maximum score value in the DataFrame for the given score column. For Q-value-based outputs this equals the least significant accepted Q-value, which corresponds to the FDR cutoff the search engine applied. The score_col parameter makes the method applicable to different entrapment levels: use "Q-Value" for PSM/precursor level, or the appropriate column name for peptide- or protein-level outputs.
- Parameters:
df (pd.DataFrame) – DataFrame containing the intermediate data for one entrapment run. Must contain the column specified by score_col.
score_col (str, optional) – Name of the Q-value (or equivalent score) column. Defaults to "Q-Value".
- Returns:
The maximum value found in score_col, interpreted as the applied FDR threshold.
- Return type:
float
- static calculate_upper_bound_combined_fdp(df: DataFrame) → Dict[int, float]
Compute the upper-bound (combined) false discovery proportion (FDP) for the given DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the FDP.
- Returns:
The computed FDP value.
- Return type:
float
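The lower-bound and combined metrics can be illustrated with the estimators commonly used in the entrapment literature. This is a minimal sketch under that assumption: the documentation above does not state the exact formulas ProteoBench applies, and the function names, the count-based inputs, and the ratio r (entrapment to target database size) are all hypothetical.

```python
def lower_bound_fdp(n_targets: int, n_entrapments: int) -> float:
    """Entrapment hits divided by all accepted hits (assumed estimator)."""
    total = n_targets + n_entrapments
    if total == 0:
        raise ValueError("no accepted identifications")
    return n_entrapments / total


def combined_fdp(n_targets: int, n_entrapments: int, r: float = 1.0) -> float:
    """Upper-bound estimate: entrapment hits scaled by (1 + 1/r) (assumed estimator)."""
    total = n_targets + n_entrapments
    if total == 0:
        raise ValueError("no accepted identifications")
    return n_entrapments * (1 + 1 / r) / total


# Hypothetical counts: 990 target and 10 entrapment identifications.
lb = lower_bound_fdp(990, 10)
ub = combined_fdp(990, 10, r=1.0)
```

With these estimators the combined value is never smaller than the lower bound, which is what makes the pair usable as a bracket around the true FDP.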
- static categorise_metric(lower_bound: float, upper_bound: float, fdr: float) → str
Categorise the FDR into:
- valid: Upper bound lower than reported FDR
- invalid: Lower bound higher than reported FDR
- inconclusive: Lower bound lower than reported FDR but upper bound higher than reported FDR
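The three categories map onto a short chain of comparisons. The sketch below is an illustration rather than the library's code: the function name categorise is hypothetical, and the treatment of ties (a bound exactly equal to the reported FDR falls into "inconclusive") is an assumption, since the description does not cover equality.

```python
def categorise(lower_bound: float, upper_bound: float, fdr: float) -> str:
    """Classify FDR control from FDP bounds and the reported FDR."""
    if upper_bound < fdr:
        # Even the pessimistic estimate is below the reported FDR.
        return "valid"
    if lower_bound > fdr:
        # Even the optimistic estimate exceeds the reported FDR.
        return "invalid"
    # The reported FDR lies between the bounds (ties land here by assumption).
    return "inconclusive"


examples = [
    categorise(0.002, 0.008, 0.01),
    categorise(0.020, 0.040, 0.01),
    categorise(0.005, 0.020, 0.01),
]
```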
- generate_intermediate(filtered_df: DataFrame) → DataFrame
Generate intermediate data structure for entrapment scores.
- Parameters:
filtered_df (pd.DataFrame) – DataFrame containing the filtered data.
- Returns:
DataFrame containing the intermediate data structure.
- Return type:
pd.DataFrame
- validate_entrapment_coverage(df: DataFrame, max_missing_fraction: float = 0.03) → DataFrame
Check that identified peptides are covered by the entrapment mapping file and return a filtered DataFrame containing only peptides that have a pair.
Raises EntrapmentError if the fraction of peptides absent from the mapping file exceeds max_missing_fraction. This indicates a FASTA mismatch, most commonly caused by enabling in-silico digestion in the search engine when the entrapment FASTA is already pre-digested.
- Parameters:
df (pd.DataFrame) – Intermediate DataFrame produced by generate_intermediate. Must contain a "Peptide" column.
max_missing_fraction (float) – Maximum tolerated fraction of unmatched peptides. Defaults to 0.03.
- Returns:
Copy of df with rows whose peptide has no paired entrapment removed.
- Return type:
pd.DataFrame
- Raises:
EntrapmentError – If the fraction of unmatched peptides exceeds max_missing_fraction.
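The coverage check amounts to measuring the unmatched fraction, failing if it is too large, and otherwise dropping the unmatched entries. The sketch below shows that logic with plain Python collections instead of a DataFrame. It is not the library's implementation: filter_covered is a hypothetical name, EntrapmentError is redefined locally as a stand-in, and computing the fraction over all rows (rather than over unique peptides) is an assumption.

```python
from typing import Iterable, List, Set


class EntrapmentError(Exception):
    """Local stand-in for the library's exception so the sketch runs on its own."""


def filter_covered(
    peptides: Iterable[str],
    mapped: Set[str],
    max_missing_fraction: float = 0.03,
) -> List[str]:
    """Keep peptides present in the mapping; fail if too many are absent."""
    peptides = list(peptides)
    if not peptides:
        return []
    n_missing = sum(1 for p in peptides if p not in mapped)
    fraction = n_missing / len(peptides)
    if fraction > max_missing_fraction:
        raise EntrapmentError(
            f"{fraction:.1%} of peptides are absent from the mapping file"
        )
    return [p for p in peptides if p in mapped]


# Hypothetical input: 1 of 100 peptides is unmatched, below the 3% default.
identified = ["PEPTIDEK"] * 99 + ["UNKNOWNR"]
kept = filter_covered(identified, mapped={"PEPTIDEK"})
```

A call in which half of the peptides are unmatched, such as filter_covered(["AAAK", "BBBK"], {"AAAK"}), would raise EntrapmentError instead of returning a filtered list.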