proteobench.score.denovoscores module

Module containing the DenovoScores class.

class proteobench.score.denovoscores.DenovoScores

Class for computing de novo scores.

aa_match(peptide1: List[str], peptide2: List[str], cum_mass_threshold: float = 50, ind_mass_threshold: float = 20) → Tuple[ndarray, bool, Tuple[ndarray], Tuple[ndarray]]

Find the matching prefix and suffix amino acids between two peptide sequences.

Parameters:

peptide1 (List[str]) – The first tokenized peptide sequence to be compared.
peptide2 (List[str]) – The second tokenized peptide sequence to be compared.
cum_mass_threshold (float) – Mass threshold in Dalton to accept cumulative mass-matching amino acid sequences.
ind_mass_threshold (float) – Mass threshold in Dalton to accept individual mass-matching amino acids.

Returns:

aa_matches (np.ndarray of length max(len(peptide1), len(peptide2))) – Boolean flags indicating whether each paired-up amino acid matches across both peptide sequences.
pep_match (bool) – Boolean flag to indicate whether the two peptide sequences fully match.
per_seq_aa_matches (Tuple[np.ndarray]) – TODO.
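One way to picture the combined prefix and suffix search is the minimal sketch below. It is not the ProteoBench implementation: the residue mass table `AA_MASS`, the helpers `_scan` and `match_sketch`, and the use of absolute mass differences in daltons are illustrative assumptions, and only the first two documented return values (`aa_matches`, `pep_match`) are reproduced.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np

# Hypothetical monoisotopic residue masses (Da), for illustration only.
AA_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "T": 101.04768,
    "I": 113.08406, "L": 113.08406, "D": 115.02694, "K": 128.09496,
    "E": 129.04259,
}


def _scan(p1: List[str], p2: List[str], order1: Sequence[int], order2: Sequence[int],
          aa_matches: np.ndarray, cum_thr: float, ind_thr: float) -> None:
    """Walk two index orders in lockstep, flagging mass-matching residues in place."""
    j1 = j2 = 0
    cum1 = cum2 = 0.0
    while j1 < len(order1) and j2 < len(order2):
        i1, i2 = order1[j1], order2[j2]
        m1, m2 = AA_MASS.get(p1[i1], 0.0), AA_MASS.get(p2[i2], 0.0)
        if abs((cum1 + m1) - (cum2 + m2)) < cum_thr:
            # Cumulative masses agree; the pair matches only if individual masses agree too.
            aa_matches[max(i1, i2)] = abs(m1 - m2) < ind_thr
            j1, j2, cum1, cum2 = j1 + 1, j2 + 1, cum1 + m1, cum2 + m2
        elif cum2 + m2 > cum1 + m1:
            j1, cum1 = j1 + 1, cum1 + m1  # sequence 1 lags in mass: advance it
        else:
            j2, cum2 = j2 + 1, cum2 + m2  # sequence 2 lags in mass: advance it


def match_sketch(peptide1: List[str], peptide2: List[str],
                 cum_thr: float, ind_thr: float) -> Tuple[np.ndarray, bool]:
    """Hypothetical prefix-then-suffix matcher (thresholds in Da, assumed)."""
    aa_matches = np.zeros(max(len(peptide1), len(peptide2)), dtype=bool)
    # Prefix pass, starting at the N-terminus.
    _scan(peptide1, peptide2, range(len(peptide1)), range(len(peptide2)),
          aa_matches, cum_thr, ind_thr)
    if aa_matches.all():
        return aa_matches, True
    # Suffix pass, from the C-terminus back to the first prefix mismatch.
    stop = int(np.argmax(~aa_matches))
    _scan(peptide1, peptide2,
          range(len(peptide1) - 1, stop - 1, -1),
          range(len(peptide2) - 1, stop - 1, -1),
          aa_matches, cum_thr, ind_thr)
    return aa_matches, bool(aa_matches.all())


# Illustrative thresholds (0.5 Da cumulative, 0.1 Da individual), not the module defaults.
flags, full = match_sketch(list("AEPTIDE"), list("GEPTIDE"), 0.5, 0.1)
print(flags.tolist(), full)  # [False, True, True, True, True, True, True] False
```

Here the prefix pass fails at the first residue (`A` versus `G`), but the suffix pass, walking back from the C-terminus, still credits the six shared residues.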

aa_match_prefix(peptide1: List[str], peptide2: List[str], cum_mass_threshold: float = 50, ind_mass_threshold: float = 20) → Tuple[ndarray, bool, Tuple[ndarray], Tuple[ndarray]]

Find the matching prefix amino acids between two peptide sequences.

Parameters:

peptide1 (List[str]) – The first tokenized peptide sequence to be compared.
peptide2 (List[str]) – The second tokenized peptide sequence to be compared.
cum_mass_threshold (float) – Mass threshold in Dalton to accept cumulative mass-matching amino acid sequences.
ind_mass_threshold (float) – Mass threshold in Dalton to accept individual mass-matching amino acids.

Returns:

aa_matches (np.ndarray of length max(len(peptide1), len(peptide2))) – Boolean flags indicating whether each paired-up amino acid matches across both peptide sequences.
pep_match (bool) – Boolean flag to indicate whether the two peptide sequences fully match.
per_seq_aa_matches (Tuple[np.ndarray]) – TODO.
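A prefix walk of this kind can be sketched as follows, assuming plain residue strings as tokens, a hypothetical `AA_MASS` table, and thresholds in daltons; `prefix_match_sketch` is an illustrative name, not part of the module. Both sequences advance together while their cumulative masses agree; otherwise the lighter one advances alone.

```python
from typing import Dict, List, Tuple

import numpy as np

# Hypothetical monoisotopic residue masses (Da), for illustration only.
AA_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "T": 101.04768,
    "I": 113.08406, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "K": 128.09496, "E": 129.04259,
}


def prefix_match_sketch(peptide1: List[str], peptide2: List[str],
                        cum_mass_threshold: float,
                        ind_mass_threshold: float) -> Tuple[np.ndarray, bool]:
    """Walk both sequences from the N-terminus, comparing cumulative masses (Da)."""
    aa_matches = np.zeros(max(len(peptide1), len(peptide2)), dtype=bool)
    i1 = i2 = 0
    cum1 = cum2 = 0.0
    while i1 < len(peptide1) and i2 < len(peptide2):
        m1 = AA_MASS.get(peptide1[i1], 0.0)
        m2 = AA_MASS.get(peptide2[i2], 0.0)
        if abs((cum1 + m1) - (cum2 + m2)) < cum_mass_threshold:
            # Cumulative masses agree: flag the pair if the residues also match individually.
            aa_matches[max(i1, i2)] = abs(m1 - m2) < ind_mass_threshold
            i1, i2 = i1 + 1, i2 + 1
            cum1, cum2 = cum1 + m1, cum2 + m2
        elif cum2 + m2 > cum1 + m1:
            i1, cum1 = i1 + 1, cum1 + m1  # sequence 1 lags in mass: advance it
        else:
            i2, cum2 = i2 + 1, cum2 + m2  # sequence 2 lags in mass: advance it
    return aa_matches, bool(aa_matches.all())


flags, full = prefix_match_sketch(list("PEPTIDE"), list("PEPTIDK"), 0.5, 0.1)
print(flags.tolist(), full)  # [True, True, True, True, True, True, False] False
flags, full = prefix_match_sketch(list("GG"), list("N"), 0.5, 0.1)
print(flags.tolist(), full)  # [False, False] False
```

The second call shows that agreeing cumulative masses (`GG` and `N` are nearly isobaric) are not enough on their own: each paired residue must also match individually.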

evaluate_match(ground_truth: Peptidoform, de_novo: Peptidoform)

Return the match type between the ground-truth and the de novo peptide sequences.

generate_intermediate(filtered_df: DataFrame, replicate_to_raw=None) → DataFrame

Generate intermediate data structure for scores.

Parameters:

filtered_df (pd.DataFrame) – DataFrame containing the filtered data.
replicate_to_raw (dict) – Optional dictionary holding the replicate-to-raw mapping.

Returns:

DataFrame containing the intermediate data structure with computed scores.

Return type:

pd.DataFrame

get_token_mass(token: tuple) → float

Convert an amino acid token to its mass, including any modification.
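A minimal sketch of token-to-mass conversion, assuming a hypothetical token layout of `(residue, modification mass shift in Da or None)` and a small hard-coded table of monoisotopic residue masses; the module's real token structure and mass source may differ.

```python
from typing import Optional, Tuple

# Hypothetical token layout: (residue, modification mass shift in Da, or None).
Token = Tuple[str, Optional[float]]

# Monoisotopic residue masses in Da (subset, for illustration only).
RESIDUE_MASS = {"C": 103.00919, "G": 57.02146, "M": 131.04049, "S": 87.03203}


def token_mass_sketch(token: Token) -> float:
    """Return the residue mass plus the modification's mass shift, if any."""
    residue, mod_shift = token
    mass = RESIDUE_MASS[residue]
    if mod_shift is not None:
        mass += mod_shift
    return mass


# Oxidized methionine: residue mass plus a +15.99491 Da shift.
print(round(token_mass_sketch(("M", 15.99491)), 5))
```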

get_token_str(token: tuple) → str

Convert an amino acid token to a string, including its modification if present.
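The string conversion can be sketched with ProForma-style bracket notation (for example `M[Oxidation]`). The token layout `(residue, modification label or None)` and the helper names below are hypothetical; the module may render modifications differently.

```python
from typing import List, Optional, Tuple

# Hypothetical token layout: (residue, modification label, or None).
Token = Tuple[str, Optional[str]]


def token_str_sketch(token: Token) -> str:
    """Render the residue, followed by its modification in square brackets."""
    residue, mod = token
    return residue if mod is None else f"{residue}[{mod}]"


def tokens_to_str(tokens: List[Token]) -> str:
    """Join tokens into a ProForma-like modified sequence string."""
    return "".join(token_str_sketch(t) for t in tokens)


print(tokens_to_str([("P", None), ("E", None), ("M", "Oxidation"), ("K", None)]))
```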

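mass_diff(mz1, mz2, mode_is_da)

Calculate the mass difference(s).

Parameters:

mz1 – First m/z value(s).
mz2 – Second m/z value(s).
mode_is_da (bool) – If True, return the difference in daltons.

Returns:

The mass difference(s) between the given m/z values.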
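A minimal sketch of such a helper, assuming that the non-dalton mode returns a relative difference in parts per million (ppm) with respect to `mz2`; that alternative unit is an assumption, and `mass_diff_sketch` is a hypothetical name. Converting inputs with NumPy lets the same code handle scalars and arrays.

```python
import numpy as np


def mass_diff_sketch(mz1, mz2, mode_is_da: bool):
    """Signed difference mz1 - mz2, in Da, or (assumed) in ppm relative to mz2."""
    mz1 = np.asarray(mz1, dtype=float)
    mz2 = np.asarray(mz2, dtype=float)
    diff = mz1 - mz2
    # Dalton mode: absolute difference; otherwise scale to parts per million.
    return diff if mode_is_da else diff / mz2 * 1e6


print(mass_diff_sketch(500.01, 500.0, True))     # about 0.01 Da
print(mass_diff_sketch(1000.01, 1000.0, False))  # about 10 ppm
print(mass_diff_sketch([100.0, 200.0], [100.0, 199.0], True))
```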