proteobench.datapoint.denovo_datapoint module

This module provides functionality for storing de novo peptide sequencing benchmark metrics.

class proteobench.datapoint.denovo_datapoint.DenovoDatapoint(id: str = None, software_name: str = None, software_version: int = 0, checkpoint: str = None, n_beams: int = None, n_peaks: int = None, precursor_mass_tolerance: str = None, min_peptide_length: int = 0, max_peptide_length: int = 0, min_mz: int = 0, max_mz: int = 50000, min_intensity: int = 0, max_intensity: int = 1, tokens: str = None, min_precursor_charge: int = 1, max_precursor_charge: int = None, remove_precursor_tol: str = None, isotope_error_range: str = None, decoding_strategy: str = None, is_temporary: bool = True, intermediate_hash: str = '', results: dict = None, precision_peptide: float = 0, precision_aa: float = 0, recall_aa: float = 0, recall_peptide: float = 0, comments: str = '', proteobench_version: str = '')

Bases: object

A data structure used to store the results of a benchmark run.

id

Unique identifier for the benchmark run.

Type:

str

software_name

Name of the software used in the benchmark.

Type:

str

software_version

Version of the software.

Type:

str

checkpoint

Model checkpoint used to generate the predictions.

Type:

str

n_beams

Number of beams used in beam-search decoding.

Type:

int

n_peaks

Maximum number of peaks retained per spectrum.

Type:

int

precursor_mass_tolerance

Mass tolerance for precursor ions.

Type:

str

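min_peptide_length

Minimum peptide length.

Type:

int

max_peptide_length

Maximum peptide length.

Type:

int

min_mz

Minimum m/z of the peaks considered.

Type:

int

max_mz

Maximum m/z of the peaks considered.

Type:

int

min_intensity

Minimum peak intensity considered.

Type:

int

max_intensity

Maximum peak intensity considered.

Type:

int

tokens

Tokens (amino acid and modification vocabulary) used by the model.

Type:

str

min_precursor_charge

Minimum precursor charge considered.

Type:

int

max_precursor_charge

Maximum precursor charge considered.

Type:

int

remove_precursor_tol

Tolerance used to remove peaks around the precursor m/z.

Type:

str

isotope_error_range

Range of isotope errors allowed when matching the precursor mass.

Type:

str

decoding_strategy

Decoding strategy used to generate the peptide predictions.

Type:

str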

is_temporary

Whether the data is temporary.

Type:

bool

intermediate_hash

Hash of the intermediate result.

Type:

str

results

A dictionary of metrics for the benchmark run.

Type:

dict

precision_peptide

Peptide-level precision.

Type:

float

precision_aa

Amino-acid-level precision.

Type:

float

recall_aa

Amino-acid-level recall.

Type:

float

recall_peptide

Peptide-level recall.

Type:

float

comments

Any additional comments.

Type:

str

proteobench_version

Version of the ProteoBench tool used.

Type:

str

checkpoint: str = None
comments: str = ''
decoding_strategy: str = None
static evaluate_ptm(mod_label, mod_tag, peptidoform, match_array)
static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict, subset_columns_hash: List[str] = ['spectrum_id', 'peptide_str', 'score'], evaluation_type: str = 'mass') → Series

Generate a datapoint (returned as a Series) containing the metadata and results of the benchmark run.
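The subset_columns_hash default suggests that the value stored in intermediate_hash is computed over only a few columns of the intermediate table, so that unrelated columns do not change it. The following is a minimal sketch of that idea, not the module's implementation: the helper name hash_subset, the use of plain dicts as rows, and SHA-1 over a tab-separated serialization are all assumptions.

```python
import hashlib
from typing import Iterable, Mapping, Sequence


def hash_subset(
    rows: Iterable[Mapping[str, object]],
    columns: Sequence[str] = ("spectrum_id", "peptide_str", "score"),
) -> str:
    """Hypothetical helper: SHA-1 over the selected columns of each row."""
    digest = hashlib.sha1()
    for row in rows:
        # Serialize only the chosen columns, in a fixed order, so the hash is reproducible.
        line = "\t".join(str(row[col]) for col in columns)
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


rows = [{"spectrum_id": "s1", "peptide_str": "PEPTIDE", "score": 0.93, "extra": "ignored"}]
print(hash_subset(rows))  # 40-character hex digest; the "extra" column does not affect it
```

Restricting the hash to identifying columns means that adding or reordering other columns in the intermediate table would leave the hash unchanged.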

generate_id() → None

Generate a unique ID for the benchmark run by combining the software name and a timestamp.

This ID is used to uniquely identify each run of the benchmark.
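A minimal sketch of such an ID, assuming a "<software_name>_<timestamp>" layout. The helper make_run_id and the exact timestamp format are hypothetical and not taken from the source.

```python
from datetime import datetime
from typing import Optional


def make_run_id(software_name: str, when: Optional[datetime] = None) -> str:
    """Hypothetical format: '<software_name>_<YYYYMMDD>_<HHMMSS>_<microseconds>'."""
    if when is None:
        when = datetime.now()
    # Microsecond resolution keeps IDs distinct for runs submitted in quick succession.
    return f"{software_name}_{when.strftime('%Y%m%d_%H%M%S_%f')}"


print(make_run_id("Casanovo", datetime(2024, 1, 2, 3, 4, 5, 6)))
# Casanovo_20240102_030405_000006
```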

get_indepth_metrics(df: DataFrame)
get_metrics(df: DataFrame, level: str, evaluation: str)

Compute benchmark metrics, such as precision and recall, from the provided DataFrame at the given level using the given evaluation mode.
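In de novo sequencing benchmarks, peptide-level and amino-acid-level precision and recall (compare the precision_peptide, recall_peptide, precision_aa and recall_aa fields) are commonly defined as below. This is an illustrative sketch, not get_metrics itself: the helper names are hypothetical, and residues are matched by identity at the same position, which is a simplification. The evaluation_type='mass' default of generate_datapoint suggests the module matches residues by mass instead.

```python
from typing import List, Optional, Tuple


def count_aa_matches(pred: str, truth: str) -> int:
    # Simplification: a residue matches only if it is identical at the same position.
    return sum(p == t for p, t in zip(pred, truth))


def denovo_metrics(pairs: List[Tuple[Optional[str], str]]) -> dict:
    """pairs: one (predicted peptide or None, ground-truth peptide) tuple per spectrum."""
    n_spectra = len(pairs)
    predicted = [(p, t) for p, t in pairs if p]  # spectra that received a prediction
    n_correct_pep = sum(p == t for p, t in predicted)
    n_pred_aa = sum(len(p) for p, _ in predicted)
    n_true_aa = sum(len(t) for _, t in pairs)
    n_match_aa = sum(count_aa_matches(p, t) for p, t in predicted)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        # Peptide level: exact sequence matches over predictions / over all spectra.
        "precision_peptide": ratio(n_correct_pep, len(predicted)),
        "recall_peptide": ratio(n_correct_pep, n_spectra),
        # Amino-acid level: matched residues over predicted / over ground-truth residues.
        "precision_aa": ratio(n_match_aa, n_pred_aa),
        "recall_aa": ratio(n_match_aa, n_true_aa),
    }


print(denovo_metrics([("PEPTIDE", "PEPTIDE"), ("PEPTLDE", "PEPTIDE"), (None, "ACDK")]))
```

Recall is divided by all spectra (or all ground-truth residues), so spectra without a prediction lower recall but not precision.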

get_ptm_metrics(df: DataFrame)
get_species_metrics(df: DataFrame)
get_spectrum_metrics(df: DataFrame)
id: str = None
intermediate_hash: str = ''
is_temporary: bool = True
isotope_error_range: str = None
max_intensity: int = 1
max_mz: int = 50000
max_peptide_length: int = 0
max_precursor_charge: int = None
min_intensity: int = 0
min_mz: int = 0
min_peptide_length: int = 0
min_precursor_charge: int = 1
n_beams: int = None
n_peaks: int = None
precision_aa: float = 0
precision_peptide: float = 0
precursor_mass_tolerance: str = None
proteobench_version: str = ''
recall_aa: float = 0
recall_peptide: float = 0
static record_proportions_to_results_feature(series: Series, counts: dict, min_el: int = 1, max_el: int = 30, all_elements=None) → dict
remove_precursor_tol: str = None
results: dict = None
software_name: str = None
software_version: int = 0
tokens: str = None
proteobench.datapoint.denovo_datapoint.calculate_prc(scores_correct, scores_all, n_spectra, threshold=None)
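Judging only from its parameters, calculate_prc appears to compute precision and recall for the predictions that pass a score threshold. A sketch under that assumption: the name precision_recall_at, the "score >= threshold" rule, and treating scores_correct as the scores of the correct subset of scores_all are all assumptions, not documented behavior.

```python
from typing import Optional, Sequence, Tuple


def precision_recall_at(
    scores_correct: Sequence[float],
    scores_all: Sequence[float],
    n_spectra: int,
    threshold: Optional[float] = None,
) -> Tuple[float, float]:
    """Hypothetical helper: precision and recall over predictions with score >= threshold."""
    if threshold is None:
        # No threshold given: keep every prediction.
        threshold = float("-inf")
    n_kept = sum(s >= threshold for s in scores_all)
    n_correct = sum(s >= threshold for s in scores_correct)
    # Convention: precision is 1.0 when no prediction passes the threshold.
    precision = n_correct / n_kept if n_kept else 1.0
    # Recall is relative to all spectra, including those without a prediction.
    recall = n_correct / n_spectra if n_spectra else 0.0
    return precision, recall


print(precision_recall_at([0.9, 0.8], [0.9, 0.8, 0.7, 0.4], n_spectra=5, threshold=0.75))
# (1.0, 0.4)
```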
proteobench.datapoint.denovo_datapoint.collapse_aa_scores(df: DataFrame, evaluation_type: str)
proteobench.datapoint.denovo_datapoint.get_prc_curve(t, n_spectra)
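A full precision-recall curve, as get_prc_curve presumably returns, can be traced by ranking predictions by score and accumulating counts, one point per prediction. This sketch assumes the input is a sequence of (score, is_correct) pairs; the actual type of t is not documented here, and the helper prc_curve is hypothetical.

```python
from typing import List, Sequence, Tuple


def prc_curve(
    predictions: Sequence[Tuple[float, bool]], n_spectra: int
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: (precisions, recalls), one point per prediction by descending score."""
    if n_spectra <= 0:
        raise ValueError("n_spectra must be positive")
    precisions: List[float] = []
    recalls: List[float] = []
    n_correct = 0
    # Stable sort: predictions with tied scores keep their input order.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        n_correct += int(is_correct)
        precisions.append(n_correct / rank)       # correct among the top `rank` predictions
        recalls.append(n_correct / n_spectra)     # correct among all spectra
    return precisions, recalls


p, r = prc_curve([(0.9, True), (0.4, False), (0.8, True), (0.7, False)], n_spectra=5)
print(p)
print(r)
```

Sweeping the threshold this way gives the same points as repeatedly applying a per-threshold precision/recall calculation at each observed score, but in a single pass after sorting.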