proteobench.score.entrapmentscores module
Module containing plasma quantification score calculators (PYE - Plasma Year Edition).
- class proteobench.score.entrapmentscores.EntrapmentScores
Bases: ScoreBase
Class for computing entrapment scores for entrapment benchmarking.
This class inherits from ScoreBase and extends it with entrapment metrics and calculations.
- Parameters:
precursor_column_name (str) – Name of the precursor column.
- static calculate_fdp_at_fdr_thresholds(df: DataFrame, n_intervals: int = 10, filepath_paired: str = '../proteobench/score/static_files/ProteoBenchFASTA_Entrapment_Human_with_contaminants_entrapment_pep.txt') -> Dict[float, Dict[str, float]]
Compute lower-bound, combined, and paired FDP at evenly-spaced Q-value thresholds.
Thresholds are spaced from max_q / n_intervals to max_q in n_intervals equal steps, where max_q is the maximum Q-value in df (i.e. the reported FDR). The mapping file is loaded once and the pair-index merge is performed once; only the Q-value filter varies per step.
- Parameters:
- Returns:
Mapping of {threshold: {lower_bound_FDP, combined_FDP, paired_FDP, nr_id_features}}. Thresholds where no targets are identified are omitted.
- Return type:
Dict[float, Dict[str, float]]
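The threshold spacing described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper name q_value_thresholds is hypothetical, and the use of numpy.linspace is an assumption about how the evenly spaced steps might be produced.

```python
import numpy as np
import pandas as pd


def q_value_thresholds(df: pd.DataFrame, n_intervals: int = 10,
                       score_col: str = "Q-Value") -> np.ndarray:
    """Hypothetical helper: n_intervals evenly spaced thresholds
    from max_q / n_intervals up to max_q (inclusive)."""
    # max_q is the largest Q-value in the data, i.e. the reported FDR.
    max_q = float(df[score_col].max())
    return np.linspace(max_q / n_intervals, max_q, n_intervals)


# Toy data: the largest Q-value is 0.01, so with 5 intervals the
# thresholds are approximately 0.002, 0.004, 0.006, 0.008 and 0.01.
demo = pd.DataFrame({"Q-Value": [0.001, 0.004, 0.01]})
thresholds = q_value_thresholds(demo, n_intervals=5)
```

Each threshold would then be used to filter the rows before the FDP values are recomputed. Whether that filter is inclusive or strict is not stated above.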
- static calculate_lower_bound_fdp(df: DataFrame) -> Dict[int, float]
Compute the lower bound false discovery proportion (FDP) for the given DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the lower bound FDP.
- Returns:
The computed lower bound FDP value.
- Return type:
Float
- static calculate_metrics(df: DataFrame) -> Dict[str, float]
Handle the calculation of all entrapment metrics for the given DataFrame. Ensures 1% FDR filtering for the main plot metrics. Handles categorisation into valid, invalid, and inconclusive based on bound values.
- static calculate_paired_fdp(df: DataFrame, filepath_paired: str = '../proteobench/score/static_files/ProteoBenchFASTA_Entrapment_Human_with_contaminants_entrapment_pep.txt') -> Dict[int, float]
Compute the paired false discovery proportion (FDP) for the given DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the paired FDP.
filepath_paired (str) – File path to the paired mapping file for computing the paired FDP.
- Returns:
The computed paired FDP value.
- Return type:
Float
- static calculate_reported_fdr(df: DataFrame, score_col: str = 'Q-Value') -> float
Estimate the FDR threshold applied by the search engine from the output data.
The reported FDR is inferred as the maximum score value in the DataFrame for the given score column. For Q-value-based outputs this equals the least significant accepted Q-value, which corresponds to the FDR cutoff the search engine applied. The score_col parameter makes the method applicable to different entrapment levels: use "Q-Value" for PSM/precursor level, or the appropriate column name for peptide- or protein-level outputs.
- Parameters:
df (pd.DataFrame) – DataFrame containing the intermediate data for one entrapment run. Must contain the column specified by score_col.
score_col (str, optional) – Name of the Q-value (or equivalent score) column. Defaults to "Q-Value".
- Returns:
The maximum value found in score_col, interpreted as the applied FDR threshold.
- Return type:
float
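The inference described above amounts to taking a column maximum. A minimal sketch, assuming a pandas DataFrame with a "Q-Value" column (the function name reported_fdr is illustrative and not part of the module):

```python
import pandas as pd


def reported_fdr(df: pd.DataFrame, score_col: str = "Q-Value") -> float:
    """Illustrative stand-in: the largest accepted score is taken
    as the FDR cutoff the search engine applied."""
    return float(df[score_col].max())


# Toy output filtered by a search engine at roughly 1% FDR.
demo = pd.DataFrame({"Q-Value": [0.0002, 0.0031, 0.0099]})
fdr = reported_fdr(demo)
```

For peptide- or protein-level outputs, the same call would take a different score_col, provided that column exists in the DataFrame.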
- static calculate_upper_bound_combined_fdp(df: DataFrame) -> Dict[int, float]
Compute the combined (upper-bound) false discovery proportion (FDP) for the given DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing the intermediate file for which to compute the FDP.
- Returns:
The computed FDP value.
- Return type:
Float
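For orientation, the sketch below shows one common formulation of the lower-bound and combined FDP estimates from the entrapment literature, computed from plain counts. The text above does not confirm that the module uses exactly these expressions, so treat it as an assumption. The function names, the count arguments, and the parameter r (ratio of entrapment to target database size, assumed to be 1 here) are all hypothetical.

```python
def lower_bound_fdp(n_target: int, n_entrapment: int) -> float:
    """Assumed formulation: N_E / (N_T + N_E)."""
    total = n_target + n_entrapment
    return n_entrapment / total if total else 0.0


def combined_fdp(n_target: int, n_entrapment: int, r: float = 1.0) -> float:
    """Assumed formulation: N_E * (1 + 1/r) / (N_T + N_E).

    r is the entrapment-to-target database size ratio (hypothetical
    parameter; 1.0 corresponds to equally sized databases)."""
    total = n_target + n_entrapment
    return n_entrapment * (1.0 + 1.0 / r) / total if total else 0.0


# Toy counts: 990 target and 10 entrapment identifications.
lower = lower_bound_fdp(990, 10)
upper = combined_fdp(990, 10, r=1.0)
```

Under these assumptions the combined estimate is never smaller than the lower bound, which is what lets the two values bracket the reported FDR.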
- static categorise_metric(lower_bound: float, upper_bound: float, fdr: float) -> str
Categorise the FDR into one of three classes:
- valid: upper bound lower than the reported FDR
- invalid: lower bound higher than the reported FDR
- inconclusive: lower bound lower than the reported FDR but upper bound higher than the reported FDR
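The three categories can be sketched as a small decision function. This is an illustration of the rules as stated, not the module's code. The name categorise is hypothetical, and how ties (a bound exactly equal to the reported FDR) are handled is an assumption: here they fall through to "inconclusive".

```python
def categorise(lower_bound: float, upper_bound: float, fdr: float) -> str:
    """Illustrative version of the valid / invalid / inconclusive rules."""
    if upper_bound < fdr:
        # Even the pessimistic estimate is below the reported FDR.
        return "valid"
    if lower_bound > fdr:
        # Even the optimistic estimate exceeds the reported FDR.
        return "invalid"
    # The reported FDR lies between the two bounds (ties included,
    # by assumption).
    return "inconclusive"


print(categorise(0.002, 0.008, 0.01))
print(categorise(0.020, 0.040, 0.01))
print(categorise(0.005, 0.020, 0.01))
```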
- generate_intermediate(filtered_df: DataFrame) -> DataFrame
Generate intermediate data structure for entrapment scores.
- Parameters:
filtered_df (pd.DataFrame) – DataFrame containing the filtered data.
- Returns:
DataFrame containing the intermediate data structure.
- Return type:
pd.DataFrame
- static validate_entrapment_coverage(df: DataFrame, filepath_paired: str = '../proteobench/score/static_files/ProteoBenchFASTA_Entrapment_Human_with_contaminants_entrapment_pep.txt', max_missing_fraction: float = 0.03) -> DataFrame
Check that identified peptides are covered by the entrapment mapping file and return a filtered DataFrame containing only peptides that have a pair.
Raises EntrapmentError if the fraction of peptides absent from the mapping file exceeds max_missing_fraction. This indicates a FASTA mismatch, most commonly caused by enabling in-silico digestion in the search engine when the entrapment FASTA is already pre-digested.
- Parameters:
- Returns:
Copy of df with rows whose peptide has no paired entrapment removed.
- Return type:
pd.DataFrame
- Raises:
EntrapmentError – If the fraction of unmatched peptides exceeds max_missing_fraction.
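The coverage check can be sketched roughly as below. Everything here is an assumption made for illustration: the column name "peptide", the in-memory set standing in for the mapping file, the choice to compute the missing fraction over unique peptides rather than rows, and the EntrapmentCoverageError class, which is a hypothetical stand-in for the module's EntrapmentError.

```python
import pandas as pd


class EntrapmentCoverageError(ValueError):
    """Hypothetical stand-in for the module's EntrapmentError."""


def check_coverage(df: pd.DataFrame, mapped_peptides: set,
                   peptide_col: str = "peptide",
                   max_missing_fraction: float = 0.03) -> pd.DataFrame:
    """Illustrative check: fail if too many peptides lack a mapping entry,
    otherwise return only the rows whose peptide is mapped."""
    unique_peptides = pd.Series(df[peptide_col].unique())
    # Fraction of distinct identified peptides absent from the mapping.
    missing_fraction = float((~unique_peptides.isin(mapped_peptides)).mean())
    if missing_fraction > max_missing_fraction:
        raise EntrapmentCoverageError(
            f"{missing_fraction:.1%} of peptides are missing from the mapping"
        )
    return df[df[peptide_col].isin(mapped_peptides)].copy()


demo = pd.DataFrame({"peptide": ["PEPA", "PEPB", "PEPC", "PEPD"]})
mapping = {"PEPA", "PEPB", "PEPC"}

# One of four peptides is unmapped (25%), so a lenient limit passes
# and the unmapped row is dropped.
filtered = check_coverage(demo, mapping, max_missing_fraction=0.5)
```

With the default limit of 0.03 the same toy input would raise, which mirrors the FASTA-mismatch case described above.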