proteobench.io.parsing.parse_peptidoform module
Module for parsing peptidoform strings and extracting modifications.
- proteobench.io.parsing.parse_peptidoform.aggregate_modification_column(input_string_seq: str, input_string_modifications: str, special_locations: Dict[str, int] = {'Any C-term': -1, 'Any N-term': 0, 'C-Term': -1, 'N-Term': 0, 'Protein C-term': -1, 'Protein N-term': 0}) → str
Aggregate modifications into a string representing the modified sequence.
This version handles both:
  - Original format (e.g. "Methylation (C11)" or "Carbamidomethyl (Any N-term)")
  - New format (e.g. "1xCarbamidomethyl [C11]", "1xOxidation [M4]", "1xAcetyl [N-Term]")
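- Parameters:
input_string_seq (str) – The peptide sequence.
input_string_modifications (str) – The modifications to apply, in either of the formats above.
special_locations (dict, optional) – Maps terminal location labels to sequence positions (0 for N-terminal, -1 for C-terminal).
- Returns:
The modified sequence string with aggregated modifications.
- Return type:
str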
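The two accepted formats can be illustrated with a minimal sketch. It is not the library's implementation: it assumes entries are separated by ";", that each entry names a single site, and that the output places each modification name in square brackets after its residue (ProForma-like), with terminal modifications attached by "-". The names `parse_modifications` and `aggregate_sketch` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Terminal labels mapped to positions, copied from the documented default.
SPECIAL_LOCATIONS: Dict[str, int] = {
    "Any C-term": -1, "Any N-term": 0, "C-Term": -1,
    "N-Term": 0, "Protein C-term": -1, "Protein N-term": 0,
}

# New format, e.g. "1xOxidation [M4]"; the optional "1x" count prefix is dropped.
NEW_FORMAT = re.compile(r"^(?:\d+x)?(?P<name>\S+) \[(?P<loc>[^\]]+)\]$")
# Original format, e.g. "Methylation (C11)" or "Carbamidomethyl (Any N-term)".
ORIGINAL_FORMAT = re.compile(r"^(?P<name>.+?) \((?P<loc>[^)]+)\)$")


def parse_modifications(mods: str, sep: str = ";") -> List[Tuple[str, int]]:
    """Return (name, position) pairs: 0 = N-term, -1 = C-term, else a 1-based residue index.

    The ";" separator is an assumption of this sketch.
    """
    parsed = []
    for token in (t.strip() for t in mods.split(sep)):
        if not token:
            continue  # tolerate empty input and trailing separators
        match = NEW_FORMAT.match(token) or ORIGINAL_FORMAT.match(token)
        if match is None:
            raise ValueError(f"Unrecognised modification: {token!r}")
        loc = match.group("loc")
        pos = SPECIAL_LOCATIONS.get(loc)
        if pos is None:
            pos = int(loc[1:])  # "C11" -> 11: drop the residue letter
        parsed.append((match.group("name"), pos))
    return parsed


def aggregate_sketch(seq: str, mods: str) -> str:
    """Insert each modification in square brackets after its residue (ProForma-like)."""
    residues = list(seq)
    n_term, c_term = "", ""
    for name, pos in parse_modifications(mods):
        if pos == 0:
            n_term += f"[{name}]"
        elif pos == -1:
            c_term += f"[{name}]"
        else:
            residues[pos - 1] += f"[{name}]"
    prefix = f"{n_term}-" if n_term else ""
    suffix = f"-{c_term}" if c_term else ""
    return prefix + "".join(residues) + suffix


print(aggregate_sketch("PEPTMIDECK", "1xAcetyl [N-Term]; 1xOxidation [M5]; Carbamidomethyl (C9)"))
# [Acetyl]-PEPTM[Oxidation]IDEC[Carbamidomethyl]K
```

Both formats reduce to the same (name, position) pairs, so the insertion step does not care which format the input used. Entries that list several sites inside one bracket are not handled by this sketch.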
- proteobench.io.parsing.parse_peptidoform.count_chars(input_string: str, isalpha: bool = True, isupper: bool = True) → int
Count the number of characters in the string that match the given criteria.
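A minimal sketch of the counting, assuming each flag set to True adds a filter (`str.isalpha`, `str.isupper`) and a flag set to False disables that filter. `count_chars_sketch` is a hypothetical name, not the library function.

```python
def count_chars_sketch(input_string: str, isalpha: bool = True, isupper: bool = True) -> int:
    """Count characters that pass every enabled test (assumed semantics)."""
    return sum(
        1
        for ch in input_string
        if (not isalpha or ch.isalpha()) and (not isupper or ch.isupper())
    )


print(count_chars_sketch("PEPTM[+15.9949]IDE"))   # 8: mass-shift digits are ignored
print(count_chars_sketch("PEPTM[Oxidation]IDE"))  # 9: the "O" of "Oxidation" is counted
```

The second call shows why the filter alone works best with numeric mass-shift brackets: an uppercase letter inside a named modification passes both tests.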
- proteobench.io.parsing.parse_peptidoform.get_proforma_bracketed(input_string: str, before_aa: bool = True, isalpha: bool = True, isupper: bool = True, pattern: str = '\\[([^]]+)\\]', modification_dict: Dict[str, str] = {'+15.9949': 'Oxidation', '+42': 'Acetyl', '+57.0215': 'Carbamidomethyl', '-17.026548': 'Gln->pyro-Glu', '-18.010565': 'Glu->pyro-Glu'}) → str
Get the ProForma sequence with bracketed modifications.
- Parameters:
input_string (str) – The input sequence string.
before_aa (bool, optional) – Whether to add the modification before the amino acid. Defaults to True.
isalpha (bool, optional) – Whether to include alphabetic characters. Defaults to True.
isupper (bool, optional) – Whether to include uppercase characters. Defaults to True.
pattern (str, optional) – The regular expression pattern for matching modifications. Defaults to r"\[([^]]+)\]".
modification_dict (dict, optional) – A dictionary mapping mass-shift strings (e.g. "+15.9949") to modification names (e.g. "Oxidation").
- Returns:
The ProForma sequence with bracketed modifications.
- Return type:
str
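The mass-shift lookup can be sketched with `re.sub` using the documented default pattern and dictionary. This is an illustration only: `name_bracketed_mods` is a hypothetical name, the `before_aa`, `isalpha` and `isupper` options are not modeled, and keeping unknown shifts unchanged is an assumption.

```python
import re
from typing import Dict

# Documented default mapping of mass shifts to modification names.
MODIFICATION_DICT: Dict[str, str] = {
    "+15.9949": "Oxidation",
    "+42": "Acetyl",
    "+57.0215": "Carbamidomethyl",
    "-17.026548": "Gln->pyro-Glu",
    "-18.010565": "Glu->pyro-Glu",
}
PATTERN = r"\[([^]]+)\]"  # documented default: capture the text inside [...]


def name_bracketed_mods(seq: str, modification_dict: Dict[str, str] = MODIFICATION_DICT) -> str:
    """Replace each bracketed mass shift with its name; unknown shifts are kept as-is."""
    return re.sub(
        PATTERN,
        lambda m: "[" + modification_dict.get(m.group(1), m.group(1)) + "]",
        seq,
    )


print(name_bracketed_mods("PEPTM[+15.9949]IDEC[+57.0215]K"))
# PEPTM[Oxidation]IDEC[Carbamidomethyl]K
print(name_bracketed_mods("PEPS[+79.9663]IDE"))  # PEPS[+79.9663]IDE (not in the dictionary)
```

Because the dictionary keys are matched as exact strings, "+15.995" would not be recognised as oxidation; the mass shifts must be written exactly as the keys.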
- proteobench.io.parsing.parse_peptidoform.get_stripped_seq(input_string: str, isalpha: bool = True, isupper: bool = True) → str
Get a stripped version of the sequence containing only characters that match the given criteria.
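Under the same assumed flag semantics as the counting sketch for `count_chars`, stripping keeps only the characters that pass the enabled tests. `get_stripped_seq_sketch` is a hypothetical name, not the library function.

```python
def get_stripped_seq_sketch(input_string: str, isalpha: bool = True, isupper: bool = True) -> str:
    """Keep only characters that pass every enabled test (assumed semantics)."""
    return "".join(
        ch
        for ch in input_string
        if (not isalpha or ch.isalpha()) and (not isupper or ch.isupper())
    )


stripped = get_stripped_seq_sketch("[+42]-PEPTM[+15.9949]IDE")
print(stripped)  # PEPTMIDE
```

With these semantics the length of the stripped sequence equals the character count for the same input and flags.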
- proteobench.io.parsing.parse_peptidoform.load_input_file(input_csv: str, input_format: str) → DataFrame
Load a CSV file into a DataFrame, parsing it according to input_format.
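Format-dependent loading is commonly written as a dispatch table from format name to reader. The sketch below assumes that design; the format names "example_tsv" and "example_csv", their separators and `load_input_file_sketch` are hypothetical and do not reflect the formats the real module supports.

```python
import io
from typing import Callable, Dict

import pandas as pd

# Hypothetical format names and read options; the real module's supported
# formats and their parsing settings are not documented here.
READERS: Dict[str, Callable[..., pd.DataFrame]] = {
    "example_tsv": lambda src: pd.read_csv(src, sep="\t"),
    "example_csv": lambda src: pd.read_csv(src, sep=","),
}


def load_input_file_sketch(input_csv, input_format: str) -> pd.DataFrame:
    """Dispatch to a format-specific reader; reject unknown formats early."""
    try:
        reader = READERS[input_format]
    except KeyError:
        raise ValueError(f"Unsupported input format: {input_format!r}") from None
    return reader(input_csv)


df = load_input_file_sketch(io.StringIO("Sequence\tCharge\nPEPTIDE\t2\n"), "example_tsv")
print(df)
```

`pd.read_csv` accepts either a path or a file-like object, so the same sketch works with a file name or, as here, an in-memory buffer.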
- proteobench.io.parsing.parse_peptidoform.match_brackets(input_string: str, pattern: str = '\\[([^]]+)\\]', isalpha: bool = True, isupper: bool = True) → tuple
Match and extract bracketed modifications from the string.
- Parameters:
input_string (str) – The input string.
pattern (str, optional) – The regular expression pattern for matching modifications. Defaults to r"\[([^]]+)\]".
isalpha (bool, optional) – Whether to match alphabetic characters. Defaults to True.
isupper (bool, optional) – Whether to match uppercase characters. Defaults to True.
- Returns:
A tuple containing the matched modifications and their positions.
- Return type:
tuple
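A minimal sketch of bracket matching with `re.finditer`, assuming a modification's position is the number of uppercase residues before its bracket. Under that assumption a modification after the fifth residue gets position 5 and an N-terminal one gets 0, matching the 0 = N-term convention of `special_locations`. `match_brackets_sketch` is a hypothetical name, and the `isalpha`/`isupper` flags are fixed to True here.

```python
import re
from typing import List, Tuple

PATTERN = r"\[([^]]+)\]"  # documented default pattern


def match_brackets_sketch(input_string: str, pattern: str = PATTERN) -> Tuple[List[str], List[int]]:
    """Return the bracket contents and, for each, the number of residues preceding it."""
    mods: List[str] = []
    positions: List[int] = []
    for m in re.finditer(pattern, input_string):
        prefix = input_string[: m.start()]
        # Count uppercase letters before the bracket; this assumes earlier
        # brackets hold mass shifts, not uppercase modification names.
        positions.append(sum(1 for ch in prefix if ch.isalpha() and ch.isupper()))
        mods.append(m.group(1))
    return mods, positions


print(match_brackets_sketch("[+42]-PEPTM[+15.9949]IDEC[+57.0215]K"))
# (['+42', '+15.9949', '+57.0215'], [0, 5, 9])
```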