proteobench.datapoint.entrapment_datapoint module
This module provides functionality for handling and processing entrapment datapoints in the ProteoBench framework.
-
class proteobench.datapoint.entrapment_datapoint.EntrapmentDatapoint(id: str = None, software_name: str = None, software_version: int = 0, search_engine: str = None, search_engine_version: int = 0, ident_fdr_psm: int = 0, ident_fdr_peptide: int = 0, ident_fdr_protein: int = 0, enable_match_between_runs: bool = False, precursor_mass_tolerance: str = None, fragment_mass_tolerance: str = None, enzyme: str = None, allowed_miscleavages: int = 0, min_peptide_length: int = 0, max_peptide_length: int = 0, is_temporary: bool = True, intermediate_hash: str = '', results: dict = None, nr_id_features: int = 0, lower_bound_FDP: float = nan, combined_FDP: float = nan, category_combined: str = '', category_paired: str = '', paired_FDP: float = nan, reported_fdr_parsed_from_input: float = nan, fdp_curve: dict = None, comments: str = '', proteobench_version: str = '')[source]
Bases: DatapointBase
A data structure used to store the results of an entrapment benchmark run.
This class extends DatapointBase to implement entrapment-specific metrics and metadata
storage for LFQ benchmarking runs.
-
id
Unique identifier for the benchmark run.
- Type:
str
-
software_name
Name of the software used in the benchmark.
- Type:
str
-
software_version
Version of the software.
- Type:
str
-
search_engine
Name of the search engine used.
- Type:
str
-
search_engine_version
Version of the search engine.
- Type:
str
-
ident_fdr_psm
False discovery rate for PSMs.
- Type:
float
-
ident_fdr_peptide
False discovery rate for peptides.
- Type:
float
-
ident_fdr_protein
False discovery rate for proteins.
- Type:
float
-
enable_match_between_runs
Whether matching between runs is enabled.
- Type:
bool
-
precursor_mass_tolerance
Mass tolerance for precursor ions.
- Type:
str
-
fragment_mass_tolerance
Mass tolerance for fragment ions.
- Type:
str
-
enzyme
Enzyme used for digestion.
- Type:
str
-
allowed_miscleavages
Number of allowed miscleavages.
- Type:
int
-
min_peptide_length
Minimum peptide length.
- Type:
int
-
max_peptide_length
Maximum peptide length.
- Type:
int
-
is_temporary
Whether the data is temporary.
- Type:
bool
-
intermediate_hash
Hash of the intermediate result.
- Type:
str
-
results
A dictionary of metrics for the benchmark run.
- Type:
dict
-
nr_id_features
Number of identified features.
- Type:
int
-
lower_bound_FDP
Lower-bound estimate of the false discovery proportion (FDP), based on entrapment IDs.
- Type:
float
-
combined_FDP
Combined estimate of the false discovery proportion (FDP), based on entrapment IDs.
- Type:
float
-
paired_FDP
Paired estimate of the false discovery proportion (FDP), based on entrapment IDs.
- Type:
float
-
reported_fdr_parsed_from_input
FDR threshold inferred from the input data (max Q-value).
- Type:
float
-
comments
Any additional comments.
- Type:
str
-
proteobench_version
Version of the ProteoBench tool used.
- Type:
str
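The FDP attributes above are described only briefly, so a small sketch may help show what such estimates can look like. It assumes the commonly used entrapment estimators (lower bound: entrapment IDs divided by all IDs; combined: the same count scaled by `1 + 1/r`, where `r` is the entrapment-to-target database size ratio). The function name `estimate_fdp` and its parameters are hypothetical and are not part of ProteoBench; whether this class computes its values exactly this way is not confirmed by this page. The paired estimate is left out because it needs per-pair score comparisons.

```python
import math


def estimate_fdp(n_target: int, n_entrapment: int, r: float = 1.0) -> dict:
    """Hypothetical helper: FDP estimates from counts of accepted IDs.

    n_target:     accepted IDs that map to the original target database
    n_entrapment: accepted IDs that map to the entrapment database
    r:            assumed entrapment-to-target database size ratio
    """
    total = n_target + n_entrapment
    if total == 0 or r <= 0:
        # Nothing to estimate: mirror the class defaults, which are nan.
        return {"lower_bound_FDP": math.nan, "combined_FDP": math.nan}
    return {
        # Lower bound: counts only the false IDs that hit entrapment entries.
        "lower_bound_FDP": n_entrapment / total,
        # Combined: also accounts for false IDs expected among target entries.
        "combined_FDP": n_entrapment * (1 + 1 / r) / total,
    }


estimates = estimate_fdp(n_target=900, n_entrapment=100, r=1.0)
```

With 900 target and 100 entrapment identifications and `r = 1`, the lower bound is 0.1 and the combined estimate is 0.2, so the combined value is never smaller than the lower bound.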
-
allowed_miscleavages: int = 0
-
category_combined: str = ''
-
category_paired: str = ''
-
combined_FDP: float = nan
-
comments: str = ''
-
enable_match_between_runs: bool = False
-
enzyme: str = None
-
fdp_curve: dict = None
-
fragment_mass_tolerance: str = None
-
static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict) → Series[source]
Generate a Datapoint object containing metadata and results from the benchmark run.
- Parameters:
intermediate (pd.DataFrame) – The intermediate DataFrame containing benchmark results.
input_format (str) – The format of the input data (e.g., file format).
user_input (dict) – User-defined input values for the benchmark.
default_cutoff_min_prec (int, optional) – The default minimum precursor cutoff value. Defaults to 3.
max_nr_observed (int, optional) – Maximum nr_observed value to calculate metrics for. If None, defaults to 6.
- Returns:
A Pandas Series containing the Datapoint’s attributes as key-value pairs.
- Return type:
pd.Series
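To illustrate the "attributes as key-value pairs" return value, here is a minimal sketch that uses only the standard library. `SketchDatapoint` and `to_key_value` are hypothetical stand-ins with a handful of the fields listed above; they are not the ProteoBench implementation, and the real method returns a `pd.Series` rather than a plain dict.

```python
import math
from dataclasses import asdict, dataclass


@dataclass
class SketchDatapoint:
    """Hypothetical stand-in with a few of the documented fields."""

    id: str = None
    software_name: str = None
    is_temporary: bool = True
    nr_id_features: int = 0
    combined_FDP: float = math.nan


def to_key_value(user_input: dict, nr_id_features: int) -> dict:
    """Build a datapoint from user input and flatten it to key-value pairs."""
    point = SketchDatapoint(
        software_name=user_input.get("software_name"),
        nr_id_features=nr_id_features,
    )
    # asdict() turns every dataclass field into a dict entry.
    return asdict(point)


record = to_key_value({"software_name": "ExampleTool"}, nr_id_features=1234)
```

If pandas is available, passing such a dict to `pd.Series(record)` gives a Series indexed by attribute name, which is the shape the return value above describes.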
-
generate_id() → None[source]
Generate a unique ID for the benchmark run by combining the software name and a timestamp.
This ID is used to uniquely identify each run of the benchmark.
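As a rough sketch of combining a software name with a timestamp, the helper below joins the two with an underscore. The function name `make_run_id`, the separator, and the timestamp format are all assumptions made for illustration; the format that `generate_id` actually produces is not stated on this page.

```python
from datetime import datetime
from typing import Optional


def make_run_id(software_name: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: '<software_name>_<YYYYMMDD_HHMMSS>'."""
    # Accepting `now` as an argument keeps the function deterministic in tests.
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{software_name}_{timestamp}"


run_id = make_run_id("ExampleTool", datetime(2024, 1, 2, 3, 4, 5))
# run_id == "ExampleTool_20240102_030405"
```

Two runs of the same software started within the same second would collide under this scheme, so a finer-grained timestamp or a random suffix may be worth considering if that matters.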
-
static get_metrics(intermediate: DataFrame) → Dict[str, Any][source]
Compute statistical metrics from the provided DataFrame.
- Parameters:
intermediate (pd.DataFrame) – The intermediate DataFrame containing benchmark results.
- Returns:
Dictionary mapping quantification cutoffs to their computed metrics.
- Return type:
Dict[int, Dict[str, float]]
-
id: str = None
-
ident_fdr_peptide: int = 0
-
ident_fdr_protein: int = 0
-
ident_fdr_psm: int = 0
-
intermediate_hash: str = ''
-
is_temporary: bool = True
-
lower_bound_FDP: float = nan
-
max_peptide_length: int = 0
-
min_peptide_length: int = 0
-
nr_id_features: int = 0
-
paired_FDP: float = nan
-
precursor_mass_tolerance: str = None
-
proteobench_version: str = ''
-
reported_fdr_parsed_from_input: float = nan
-
results: dict = None
-
search_engine: str = None
-
search_engine_version: int = 0
-
software_name: str = None
-
software_version: int = 0