proteobench.datapoint.entrapment_datapoint module#

This module provides functionality for handling and processing entrapment datapoints in the ProteoBench framework.

class proteobench.datapoint.entrapment_datapoint.EntrapmentDatapoint(id: str = None, software_name: str = None, software_version: int = 0, search_engine: str = None, search_engine_version: int = 0, ident_fdr_psm: int = 0, ident_fdr_peptide: int = 0, ident_fdr_protein: int = 0, enable_match_between_runs: bool = False, precursor_mass_tolerance: str = None, fragment_mass_tolerance: str = None, enzyme: str = None, allowed_miscleavages: int = 0, min_peptide_length: int = 0, max_peptide_length: int = 0, is_temporary: bool = True, intermediate_hash: str = '', results: dict = None, nr_id_features: int = 0, lower_bound_FDP: float = nan, combined_FDP: float = nan, category_combined: str = '', category_paired: str = '', paired_FDP: float = nan, reported_fdr_parsed_from_input: float = nan, fdp_curve: dict = None, comments: str = '', proteobench_version: str = '')[source]#

Bases: DatapointBase

A data structure used to store the results of an entrapment benchmark run.

This class extends DatapointBase to implement entrapment-specific metrics and metadata storage for LFQ benchmarking runs.

id#

Unique identifier for the benchmark run.

Type:

str

software_name#

Name of the software used in the benchmark.

Type:

str

software_version#

Version of the software.

Type:

str (the class signature annotates this field as int, default 0)

search_engine#

Name of the search engine used.

Type:

str

search_engine_version#

Version of the search engine.

Type:

str (the class signature annotates this field as int, default 0)

ident_fdr_psm#

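Identification false discovery rate (FDR) threshold applied at the PSM level.

Type:

float (the class signature annotates this field as int, default 0)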

ident_fdr_peptide#

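Identification false discovery rate (FDR) threshold applied at the peptide level.

Type:

float (the class signature annotates this field as int, default 0)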

ident_fdr_protein#

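Identification false discovery rate (FDR) threshold applied at the protein level.

Type:

float (the class signature annotates this field as int, default 0)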

enable_match_between_runs#

Whether matching between runs is enabled.

Type:

bool

precursor_mass_tolerance#

Mass tolerance for precursor ions.

Type:

str

fragment_mass_tolerance#

Mass tolerance for fragment ions.

Type:

str

enzyme#

Enzyme used for digestion.

Type:

str

allowed_miscleavages#

Number of allowed miscleavages.

Type:

int

min_peptide_length#

Minimum peptide length.

Type:

int

max_peptide_length#

Maximum peptide length.

Type:

int

is_temporary#

Whether the data is temporary.

Type:

bool

intermediate_hash#

Hash of the intermediate result.

Type:

str

results#

A dictionary of metrics for the benchmark run.

Type:

dict

nr_id_features#

Number of identified features.

Type:

int

lower_bound_FDP#

Lower-bound estimate of the false discovery proportion (FDP), based on entrapment IDs.

Type:

float

combined_FDP#

Combined-method estimate of the false discovery proportion (FDP), based on entrapment IDs.

Type:

float

paired_FDP#

Paired-method estimate of the false discovery proportion (FDP), based on entrapment IDs.

Type:

float
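The lower-bound and combined estimates can be illustrated with a minimal sketch. It assumes the estimator definitions commonly used in entrapment analyses (entrapment hits divided by all hits, and the same quantity scaled by 1 + 1/r, where r is the entrapment-to-target database size ratio). The function names, argument names, and the choice to return nan for empty input are hypothetical, and this is not necessarily how ProteoBench computes these fields. The paired estimate additionally needs target/entrapment pairing information and is not sketched here.

```python
import math


def lower_bound_fdp(n_target: int, n_entrapment: int) -> float:
    """Lower-bound FDP: entrapment hits as a fraction of all hits (assumed definition)."""
    total = n_target + n_entrapment
    if total == 0:
        return math.nan  # nothing identified, so no estimate is possible
    return n_entrapment / total


def combined_fdp(n_target: int, n_entrapment: int, r: float = 1.0) -> float:
    """Combined FDP: lower bound scaled by (1 + 1/r) (assumed definition).

    r is the entrapment-to-target database size ratio (hypothetical parameter name).
    """
    total = n_target + n_entrapment
    if total == 0 or r <= 0:
        return math.nan
    return n_entrapment * (1 + 1 / r) / total


# Example: 990 target hits, 10 entrapment hits, equally sized databases (r = 1).
lb = lower_bound_fdp(990, 10)
comb = combined_fdp(990, 10, r=1.0)
```

With r = 1 the combined estimate is twice the lower bound, reflecting the assumption that false hits land in the target and entrapment portions equally often.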

reported_fdr_parsed_from_input#

FDR threshold inferred from the input data (max Q-value).

Type:

float
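As a rough illustration of inferring the threshold from the maximum Q-value, the sketch below takes the largest non-missing Q-value in a list. The helper name `infer_reported_fdr` and the plain-list input are assumptions made for illustration; the library presumably works on a DataFrame column.

```python
import math
from typing import Iterable


def infer_reported_fdr(q_values: Iterable[float]) -> float:
    """Return the largest non-NaN Q-value, or nan when none is available."""
    finite = [q for q in q_values if not math.isnan(q)]
    return max(finite) if finite else math.nan


# Results filtered at 1% FDR would typically have a maximum Q-value just below 0.01.
threshold = infer_reported_fdr([0.001, 0.0099, math.nan, 0.005])
```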

comments#

Any additional comments.

Type:

str

proteobench_version#

Version of the ProteoBench tool used.

Type:

str

allowed_miscleavages: int = 0#
category_combined: str = ''#
category_paired: str = ''#
combined_FDP: float = nan#
comments: str = ''#
enable_match_between_runs: bool = False#
enzyme: str = None#
fdp_curve: dict = None#
fragment_mass_tolerance: str = None#
static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict) Series[source]#

Generate a Datapoint object containing metadata and results from the benchmark run.

Parameters:
  • intermediate (pd.DataFrame) – The intermediate DataFrame containing benchmark results.

  • input_format (str) – The format of the input data (e.g., file format).

  • user_input (dict) – User-defined input values for the benchmark.

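  • default_cutoff_min_prec (int, optional) – The default minimum precursor cutoff value. Defaults to 3. Documented here but absent from the signature above.

  • max_nr_observed (int, optional) – Maximum nr_observed value to calculate metrics for. Defaults to 6 when None. Documented here but absent from the signature above.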

Returns:

A pandas Series containing the datapoint’s attributes as key-value pairs.

Return type:

pd.Series
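The idea of returning a datapoint's attributes as key-value pairs can be sketched with a small stand-in dataclass. `DatapointSketch` and `to_key_value_pairs` are hypothetical names that mirror only a handful of the fields listed above. The sketch assumes the real class behaves like a dataclass and uses only the standard library, so it produces a plain dict where the real method returns a pd.Series.

```python
import math
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class DatapointSketch:
    """Hypothetical stand-in with a few EntrapmentDatapoint-like fields."""

    id: Optional[str] = None
    software_name: Optional[str] = None
    nr_id_features: int = 0
    lower_bound_FDP: float = math.nan
    is_temporary: bool = True


def to_key_value_pairs(datapoint: DatapointSketch) -> Dict[str, Any]:
    """Flatten the datapoint into a field-name -> value mapping."""
    return asdict(datapoint)


pairs = to_key_value_pairs(DatapointSketch(software_name="ExampleTool", nr_id_features=5))
```

Fields that were never set keep their defaults, so unset FDP values remain nan in the flattened output rather than being dropped.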

generate_id() None[source]#

Generate a unique ID for the benchmark run by combining the software name and a timestamp.

This ID is used to uniquely identify each run of the benchmark.
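A minimal sketch of combining a software name with a timestamp is shown below. The helper name `make_run_id`, the underscore separator, and the timestamp format are all assumptions; the exact format used by generate_id is not stated here.

```python
from datetime import datetime
from typing import Optional


def make_run_id(software_name: str, now: Optional[datetime] = None) -> str:
    """Build an ID of the form '<software_name>_<timestamp>' (assumed format)."""
    if now is None:
        now = datetime.now()
    # Microsecond resolution makes collisions between consecutive runs unlikely.
    return f"{software_name}_{now.strftime('%Y%m%d_%H%M%S_%f')}"


run_id = make_run_id("ExampleTool", datetime(2024, 1, 2, 3, 4, 5, 6))
```

Passing the timestamp explicitly, as in the last line, keeps the helper deterministic for testing; omitting it uses the current time.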

static get_metrics(intermediate: DataFrame) Dict[str, Any][source]#

Compute statistical metrics from the provided DataFrame.

Parameters:
  • intermediate (pd.DataFrame) – DataFrame containing the intermediate results.

  • **kwargs (dict) – Additional module-specific parameters.

Returns:

Dictionary mapping quantification cutoffs to their computed metrics.

Return type:

Dict[str, Any]

id: str = None#
ident_fdr_peptide: int = 0#
ident_fdr_protein: int = 0#
ident_fdr_psm: int = 0#
intermediate_hash: str = ''#
is_temporary: bool = True#
lower_bound_FDP: float = nan#
max_peptide_length: int = 0#
min_peptide_length: int = 0#
nr_id_features: int = 0#
paired_FDP: float = nan#
precursor_mass_tolerance: str = None#
proteobench_version: str = ''#
reported_fdr_parsed_from_input: float = nan#
results: dict = None#
search_engine: str = None#
search_engine_version: int = 0#
software_name: str = None#
software_version: int = 0#