proteobench.datapoint.denovo_datapoint module
This module provides functionality for storing the metrics of de novo peptide sequencing benchmark runs.
-
class proteobench.datapoint.denovo_datapoint.DenovoDatapoint(id: str = None, software_name: str = None, software_version: int = 0, checkpoint: str = None, n_beams: int = None, n_peaks: int = None, precursor_mass_tolerance: str = None, min_peptide_length: int = 0, max_peptide_length: int = 0, min_mz: int = 0, max_mz: int = 50000, min_intensity: int = 0, max_intensity: int = 1, tokens: str = None, min_precursor_charge: int = 1, max_precursor_charge: int = None, remove_precursor_tol: str = None, isotope_error_range: str = None, decoding_strategy: str = None, is_temporary: bool = True, intermediate_hash: str = '', results: dict = None, precision_peptide: float = 0, precision_aa: float = 0, recall_aa: float = 0, recall_peptide: float = 0, comments: str = '', proteobench_version: str = '')
Bases: object
A data structure used to store the results of a benchmark run.
-
id
Unique identifier for the benchmark run.
- Type:
str
-
software_name
Name of the software used in the benchmark.
- Type:
str
-
software_version
Version of the software.
- Type:
str
-
search_engine
Name of the search engine used.
- Type:
str
-
search_engine_version
Version of the search engine.
- Type:
str
-
ident_fdr_psm
False discovery rate for PSMs.
- Type:
float
-
ident_fdr_peptide
False discovery rate for peptides.
- Type:
float
-
ident_fdr_protein
False discovery rate for proteins.
- Type:
float
-
enable_match_between_runs
Whether matching between runs is enabled.
- Type:
bool
-
precursor_mass_tolerance
Mass tolerance for precursor ions.
- Type:
str
-
fragment_mass_tolerance
Mass tolerance for fragment ions.
- Type:
str
-
enzyme
Enzyme used for digestion.
- Type:
str
-
allowed_miscleavages
Number of allowed miscleavages.
- Type:
int
-
min_peptide_length
Minimum peptide length.
- Type:
int
-
max_peptide_length
Maximum peptide length.
- Type:
int
-
is_temporary
Whether the data is temporary.
- Type:
bool
-
intermediate_hash
Hash of the intermediate result.
- Type:
str
-
results
A dictionary of metrics for the benchmark run.
- Type:
dict
-
median_abs_epsilon
Median absolute epsilon value for the benchmark.
- Type:
float
-
mean_abs_epsilon
Mean absolute epsilon value for the benchmark.
- Type:
float
-
nr_prec
Number of precursors identified.
- Type:
int
-
comments
Any additional comments.
- Type:
str
-
proteobench_version
Version of the ProteoBench tool used.
- Type:
str
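The constructor signature above suggests the class behaves like a standard Python dataclass: every field has a default, so a datapoint can be created empty and filled in later. The following is a minimal sketch under that assumption. `DenovoDatapointSketch` is a hypothetical stand-in that mirrors only a few of the documented fields; it is not the ProteoBench class itself, and `"ExampleTool"` is a made-up software name.

```python
from dataclasses import dataclass, asdict


@dataclass
class DenovoDatapointSketch:
    """Hypothetical stand-in mirroring a subset of the documented fields."""

    id: str = None
    software_name: str = None
    is_temporary: bool = True
    intermediate_hash: str = ""
    results: dict = None
    precision_peptide: float = 0
    recall_peptide: float = 0
    comments: str = ""


# Fields that are not passed keep their documented defaults.
point = DenovoDatapointSketch(
    software_name="ExampleTool",  # made-up name, for illustration only
    precision_peptide=0.8,
    recall_peptide=0.6,
)

# A dataclass instance converts to a plain dict, which is convenient
# for building a tabular record from it.
record = asdict(point)
```

Converting to a dict first is one plausible route to the Series that `generate_datapoint` is documented to return, although the actual implementation may differ.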
-
checkpoint: str = None
-
comments: str = ''
-
decoding_strategy: str = None
-
static evaluate_ptm(mod_label, mod_tag, peptidoform, match_array)
-
static generate_datapoint(intermediate: DataFrame, input_format: str, user_input: dict, subset_columns_hash: List[str] = ['spectrum_id', 'peptide_str', 'score'], evaluation_type: str = 'mass') → Series
Generate a Datapoint object containing metadata and results from the benchmark run.
-
generate_id() → None
Generate a unique ID for the benchmark run by combining the software name and a timestamp.
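Combining a software name with a timestamp can be sketched as follows. The helper name `make_run_id`, the underscore separator, and the timestamp format are all assumptions made for illustration; the format actually produced by `generate_id` is not documented here.

```python
from datetime import datetime


def make_run_id(software_name: str, now: datetime = None) -> str:
    """Hypothetical helper: join a software name and a timestamp.

    The separator and the timestamp layout are illustrative assumptions.
    """
    # Allow the caller to pass a fixed time so the result is reproducible.
    now = now or datetime.now()
    return f"{software_name}_{now.strftime('%Y%m%d_%H%M%S')}"


run_id = make_run_id("ExampleTool", datetime(2024, 1, 2, 3, 4, 5))
```

Note that an ID built this way is unique only as long as the same software name is not submitted twice within the timestamp resolution.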
-
get_indepth_metrics(df: DataFrame)
-
get_metrics(df: DataFrame, level: str, evaluation: str)
Compute various statistical metrics from the provided DataFrame for the benchmark.
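The class stores `precision_peptide` and `recall_peptide`, so peptide-level precision and recall are presumably among the metrics computed. The sketch below uses definitions that are common in de novo sequencing benchmarks: precision as correct predictions over all predictions, and recall as correct predictions over all spectra. Whether `get_metrics` uses exactly these definitions is an assumption, and `peptide_precision_recall` is a hypothetical helper that works on a plain list rather than a DataFrame.

```python
def peptide_precision_recall(matches, n_spectra):
    """Hypothetical helper, not part of ProteoBench.

    matches:   one boolean per predicted peptide, True if it is correct.
    n_spectra: total number of spectra, including those without a prediction.
    """
    n_predicted = len(matches)
    n_correct = sum(1 for is_correct in matches if is_correct)
    # Guard against division by zero when nothing was predicted or evaluated.
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_spectra if n_spectra else 0.0
    return precision, recall


# Four predictions, three of them correct, out of six spectra in total.
precision, recall = peptide_precision_recall([True, True, False, True], 6)
```

With these inputs, precision is 3/4 and recall is 3/6.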
-
get_ptm_metrics(df: DataFrame)
-
get_species_metrics(df: DataFrame)
-
get_spectrum_metrics(df: DataFrame)
-
id: str = None
-
intermediate_hash: str = ''
-
is_temporary: bool = True
-
isotope_error_range: str = None
-
max_intensity: int = 1
-
max_mz: int = 50000
-
max_peptide_length: int = 0
-
max_precursor_charge: int = None
-
min_intensity: int = 0
-
min_mz: int = 0
-
min_peptide_length: int = 0
-
min_precursor_charge: int = 1
-
n_beams: int = None
-
n_peaks: int = None
-
precision_aa: float = 0
-
precision_peptide: float = 0
-
precursor_mass_tolerance: str = None
-
proteobench_version: str = ''
-
recall_aa: float = 0
-
recall_peptide: float = 0
-
static record_proportions_to_results_feature(series: Series, counts: dict, min_el: int = 1, max_el: int = 30, all_elements=None) → dict
-
remove_precursor_tol: str = None
-
results: dict = None
-
software_name: str = None
-
software_version: int = 0
-
tokens: str = None
-
proteobench.datapoint.denovo_datapoint.calculate_prc(scores_correct, scores_all, n_spectra, threshold=None)
-
proteobench.datapoint.denovo_datapoint.collapse_aa_scores(df: DataFrame, evaluation_type: str)
-
proteobench.datapoint.denovo_datapoint.get_prc_curve(t, n_spectra)