proteobench.score.denovo.denovoscores module
Module containing the DenovoScores class.
- class proteobench.score.denovo.denovoscores.DenovoScores
Bases: ScoreBase
Class for computing de novo scores.
- aa_match(peptide1: List[str], peptide2: List[str], cum_mass_threshold: float = 50, ind_mass_threshold: float = 20) → Tuple[ndarray, bool, Tuple[ndarray], Tuple[ndarray]]
Find the matching prefix and suffix amino acids between two peptide sequences.
- Parameters:
peptide1 (List[str]) – The first tokenized peptide sequence to be compared.
peptide2 (List[str]) – The second tokenized peptide sequence to be compared.
cum_mass_threshold (float) – Mass threshold in Dalton to accept cumulative mass-matching amino acid sequences.
ind_mass_threshold (float) – Mass threshold in Dalton to accept individual mass-matching amino acids.
- Returns:
aa_matches (np.ndarray of length max(len(peptide1), len(peptide2))) – Boolean array indicating, position by position, whether the paired-up amino acids match across both peptide sequences.
pep_match (bool) – Boolean flag to indicate whether the two peptide sequences fully match.
per_seq_aa_matches (Tuple[np.ndarray]) – TODO.
- aa_match_prefix(peptide1: List[str], peptide2: List[str], cum_mass_threshold: float = 50, ind_mass_threshold: float = 20) → Tuple[ndarray, bool, Tuple[ndarray], Tuple[ndarray]]
Find the matching prefix amino acids between two peptide sequences.
- Parameters:
peptide1 (List[str]) – The first tokenized peptide sequence to be compared.
peptide2 (List[str]) – The second tokenized peptide sequence to be compared.
cum_mass_threshold (float) – Mass threshold in Dalton to accept cumulative mass-matching amino acid sequences.
ind_mass_threshold (float) – Mass threshold in Dalton to accept individual mass-matching amino acids.
- Returns:
aa_matches (np.ndarray of length max(len(peptide1), len(peptide2))) – Boolean array indicating, position by position, whether the paired-up amino acids match across both peptide sequences.
pep_match (bool) – Boolean flag to indicate whether the two peptide sequences fully match.
per_seq_aa_matches (Tuple[np.ndarray]) – TODO.
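The prefix matching described for aa_match_prefix can be illustrated with a minimal sketch. It is not the ProteoBench implementation: the function name prefix_match_sketch, the small RESIDUE_MASS table, and the threshold values (0.5 Da and 0.1 Da) are assumptions chosen for illustration. The sketch only returns the first two documented values (aa_matches and pep_match). It walks both tokenized peptides from the N-terminus, pairs residues whose cumulative masses agree within cum_mass_threshold, and marks a pair as matching only if the individual residue masses also agree within ind_mass_threshold.

```python
from typing import Dict, List, Tuple

import numpy as np

# Hypothetical lookup of monoisotopic residue masses in Dalton (illustration only).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "V": 99.06841, "L": 113.08406, "N": 114.04293,
}


def prefix_match_sketch(
    peptide1: List[str],
    peptide2: List[str],
    cum_mass_threshold: float = 0.5,  # assumed value, in Dalton
    ind_mass_threshold: float = 0.1,  # assumed value, in Dalton
) -> Tuple[np.ndarray, bool]:
    """Pair up residues from the N-terminus using cumulative and individual masses."""
    aa_matches = np.zeros(max(len(peptide1), len(peptide2)), dtype=bool)
    i1, i2 = 0, 0
    cum1, cum2 = 0.0, 0.0
    while i1 < len(peptide1) and i2 < len(peptide2):
        m1 = RESIDUE_MASS[peptide1[i1]]
        m2 = RESIDUE_MASS[peptide2[i2]]
        if abs((cum1 + m1) - (cum2 + m2)) < cum_mass_threshold:
            # Prefix masses line up: the pair matches if the residues do too.
            aa_matches[max(i1, i2)] = abs(m1 - m2) < ind_mass_threshold
            i1, i2 = i1 + 1, i2 + 1
            cum1, cum2 = cum1 + m1, cum2 + m2
        elif cum2 + m2 > cum1 + m1:
            # Peptide 1 is behind in mass: advance it alone.
            i1, cum1 = i1 + 1, cum1 + m1
        else:
            # Peptide 2 is behind in mass: advance it alone.
            i2, cum2 = i2 + 1, cum2 + m2
    return aa_matches, bool(aa_matches.all())


# One substitution (A -> V) breaks the prefix after the first residue.
matches, full_match = prefix_match_sketch(["G", "A", "S"], ["G", "V", "S"])
print(matches.tolist(), full_match)
```

Because the comparison is mass-based, a pair such as G, G versus N (nearly identical total mass) lets the walk resynchronize on the residues that follow, even though the residues inside the isobaric stretch are reported as mismatches.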
- evaluate_match(ground_truth: Peptidoform, de_novo: Peptidoform)
Return the match type between two peptide sequences.
- generate_intermediate(filtered_df: DataFrame, replicate_to_raw=None) → DataFrame
Generate intermediate data structure for scores.
- Parameters:
filtered_df (pd.DataFrame) – DataFrame containing the filtered data.
replicate_to_raw (dict, optional) – Dictionary containing the replicate-to-raw mapping. Defaults to None.
- Returns:
DataFrame containing the intermediate data structure with computed scores.
- Return type:
pd.DataFrame
- get_token_mass(token: tuple) → float
Convert an amino acid token to its mass, including the mass of any modifications it carries.
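As a rough illustration of converting a token to a mass, the sketch below assumes a token is a tuple of a residue letter and an optional modification mass shift in Dalton. That token layout, the function name token_mass_sketch, and the RESIDUE_MASS table are all hypothetical; the tuple structure that get_token_mass actually expects is not documented here and may differ.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup of monoisotopic residue masses in Dalton (illustration only).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "C": 103.00919, "M": 131.04049,
}


def token_mass_sketch(token: Tuple[str, Optional[float]]) -> float:
    """Return the residue mass plus the modification mass shift, if any."""
    residue, mod_shift = token  # assumed layout: (residue letter, mass shift or None)
    mass = RESIDUE_MASS[residue]
    if mod_shift is not None:
        mass += mod_shift
    return mass


# Unmodified alanine, and cysteine with a +57.02146 Da shift (carbamidomethylation).
print(token_mass_sketch(("A", None)))
print(round(token_mass_sketch(("C", 57.02146)), 5))
```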