proteobench.score.denovoscores module

Module containing the DenovoScores class.

class proteobench.score.denovoscores.DenovoScores

Bases: ScoreBase

Class for computing de novo scores.

aa_match(peptide1: List[str], peptide2: List[str], cum_mass_threshold: float = 50, ind_mass_threshold: float = 20) → Tuple[ndarray, bool, Tuple[ndarray], Tuple[ndarray]]

Find the matching prefix and suffix amino acids between two peptide sequences.

Parameters:
  • peptide1 (List[str]) – The first tokenized peptide sequence to be compared.

  • peptide2 (List[str]) – The second tokenized peptide sequence to be compared.

  • cum_mass_threshold (float) – Mass threshold, in Dalton, below which the cumulative masses of two amino acid sequences are accepted as matching.

  • ind_mass_threshold (float) – Mass threshold, in Dalton, below which two individual amino acids are accepted as matching.

Returns:

  • aa_matches (np.ndarray of length max(len(peptide1), len(peptide2))) – Boolean array indicating, for each paired-up amino acid, whether it matches across both peptide sequences.

  • pep_match (bool) – Boolean flag to indicate whether the two peptide sequences fully match.

  • per_seq_aa_matches (Tuple[np.ndarray]) – TODO.

aa_match_prefix(peptide1: List[str], peptide2: List[str], cum_mass_threshold: float = 50, ind_mass_threshold: float = 20) → Tuple[ndarray, bool, Tuple[ndarray], Tuple[ndarray]]

Find the matching prefix amino acids between two peptide sequences.

Parameters:
  • peptide1 (List[str]) – The first tokenized peptide sequence to be compared.

  • peptide2 (List[str]) – The second tokenized peptide sequence to be compared.

  • cum_mass_threshold (float) – Mass threshold, in Dalton, below which the cumulative masses of two amino acid sequences are accepted as matching.

  • ind_mass_threshold (float) – Mass threshold, in Dalton, below which two individual amino acids are accepted as matching.

Returns:

  • aa_matches (np.ndarray of length max(len(peptide1), len(peptide2))) – Boolean array indicating, for each paired-up amino acid, whether it matches across both peptide sequences.

  • pep_match (bool) – Boolean flag to indicate whether the two peptide sequences fully match.

  • per_seq_aa_matches (Tuple[np.ndarray]) – TODO.
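The prefix search can be pictured as a greedy walk over both sequences from the N-terminus. The sketch below is a standalone illustration, not the DenovoScores implementation: the residue-mass table, the name aa_match_prefix_sketch and the thresholds of 0.5 Da (cumulative) and 0.1 Da (individual) are assumptions chosen for the example, and only the first two documented return values are produced. Whenever the running masses agree within cum_mass_threshold, the current residues are paired and compared individually; otherwise the sequence with the smaller running mass advances by one residue.

```python
from typing import Dict, List, Tuple

import numpy as np

# Illustrative monoisotopic residue masses in Da (unmodified residues only).
AA_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def aa_match_prefix_sketch(
    peptide1: List[str],
    peptide2: List[str],
    cum_mass_threshold: float = 0.5,  # hypothetical value, in Da
    ind_mass_threshold: float = 0.1,  # hypothetical value, in Da
) -> Tuple[np.ndarray, bool]:
    """Greedily pair residues from the N-terminus by cumulative mass."""
    aa_matches = np.zeros(max(len(peptide1), len(peptide2)), dtype=bool)
    i1 = i2 = 0
    cum1 = cum2 = 0.0
    while i1 < len(peptide1) and i2 < len(peptide2):
        m1, m2 = AA_MASS[peptide1[i1]], AA_MASS[peptide2[i2]]
        if abs((cum1 + m1) - (cum2 + m2)) < cum_mass_threshold:
            # Running masses agree: pair the residues and compare them individually.
            aa_matches[max(i1, i2)] = abs(m1 - m2) < ind_mass_threshold
            i1, i2 = i1 + 1, i2 + 1
            cum1, cum2 = cum1 + m1, cum2 + m2
        elif cum2 + m2 > cum1 + m1:
            # Peptide 1 lags behind in mass: consume one more of its residues.
            i1, cum1 = i1 + 1, cum1 + m1
        else:
            # Peptide 2 lags behind in mass: consume one more of its residues.
            i2, cum2 = i2 + 1, cum2 + m2
    return aa_matches, bool(aa_matches.all())
```

With GGK versus NK, two glycines have almost exactly the mass of one asparagine, so the G/N pair is flagged as a mismatch but the walk stays in register and still credits the C-terminal lysine. Isobaric residues such as I and L are counted as matches because only masses are compared.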

convert_peptidoform(peptidoform: Peptidoform)

evaluate_match(ground_truth: Peptidoform, de_novo: Peptidoform)

Return the type of match between the ground-truth and de novo peptide sequences.

generate_intermediate(filtered_df: DataFrame, replicate_to_raw=None) → DataFrame

Generate the intermediate data structure containing the computed scores.

Parameters:
  • filtered_df (pd.DataFrame) – DataFrame containing the filtered data.

  • replicate_to_raw (dict) – Dictionary mapping replicates to raw files.

Returns:

DataFrame containing the intermediate data structure with computed scores.

Return type:

pd.DataFrame

get_token_mass(token: tuple) → float

Convert an amino acid token to its mass, taking any modification into account.
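A minimal sketch of the token-to-mass conversion, under the hypothetical assumption that a token is a (residue, modifications) tuple whose second element is either None or a list of modification mass shifts in Da. The actual token layout used by DenovoScores (for example, modification objects rather than plain floats) is not documented here and may differ; the names RESIDUE_MASS, Token and token_mass are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative monoisotopic residue masses in Da (a small subset).
RESIDUE_MASS: Dict[str, float] = {
    "C": 103.00919, "K": 128.09496, "M": 131.04049,
    "S": 87.03203, "T": 101.04768, "Y": 163.06333,
}

# Hypothetical token layout: (residue, list of modification mass shifts in Da, or None).
Token = Tuple[str, Optional[List[float]]]


def token_mass(token: Token) -> float:
    """Residue mass plus the summed mass shifts of any modifications."""
    residue, mods = token
    return RESIDUE_MASS[residue] + sum(mods or [])
```

For example, token_mass(("M", [15.994915])) gives the mass of an oxidised methionine residue, about 147.0354 Da, while an unmodified token such as ("C", None) simply returns the residue mass.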

get_token_str(token: tuple) → str

Convert an amino acid token to a string, including its modification if present.
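The string conversion can be sketched in the same way. This illustration assumes a hypothetical (residue, list of modification names or None) token and renders modifications in ProForma-style square brackets after the residue; the exact output format produced by DenovoScores is not stated in this reference and may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical token layout: (residue, list of modification names, or None).
Token = Tuple[str, Optional[List[str]]]


def token_str(token: Token) -> str:
    """Render a token in ProForma-style bracket notation, e.g. M[Oxidation]."""
    residue, mods = token
    if not mods:
        return residue  # unmodified residue: just the one-letter code
    return residue + "".join(f"[{mod}]" for mod in mods)
```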

mass_diff(mz1, mz2, mode_is_da)

Calculate the mass difference(s).

Parameters:
  • mz1 – First m/z value(s).

  • mz2 – Second m/z value(s).

  • mode_is_da (bool) – Whether to compute the mass difference in Dalton (True) or in ppm (False).

Returns:

The mass difference(s) between the given m/z values.
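The docstring does not state the sign convention or which value serves as the reference for ppm. The standalone sketch below assumes the common convention mz1 - mz2, divided by mz2 and scaled by 10^6 in ppm mode; it accepts scalars or array-likes via NumPy broadcasting. It mirrors the documented parameters but is an illustration, not the library's code.

```python
import numpy as np


def mass_diff(mz1, mz2, mode_is_da: bool):
    """Assumed convention: mz1 - mz2 in Da, or relative to mz2 in ppm."""
    mz1 = np.asarray(mz1, dtype=float)
    mz2 = np.asarray(mz2, dtype=float)
    if mode_is_da:
        return mz1 - mz2
    # Relative difference in parts per million, using mz2 as the reference mass.
    return (mz1 - mz2) / mz2 * 1e6
```

With this convention, a 0.001 Da offset at m/z 500 corresponds to 2 ppm, which is why ppm tolerances scale with mass while Dalton tolerances do not.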