proteobench.io.parsing.parse_ion module
Module for parsing precursor ion data from various formats.
- proteobench.io.parsing.parse_ion.aggregate_modification_column(input_string_seq: str, input_string_modifications: str, special_locations: dict = {'Any C-term': -1, 'Any N-term': 0, 'Protein C-term': -1, 'Protein N-term': 0}) → str
Aggregate modifications into a string representing the modified sequence.
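- Parameters:
input_string_seq (str) – The input sequence string.
input_string_modifications (str) – The modifications to apply to the sequence.
special_locations (dict, optional) – A dictionary mapping terminal location names to positions, where 0 denotes the start of the sequence and -1 the end. Defaults to the four N-term and C-term entries shown in the signature.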
- Returns:
The modified sequence string with aggregated modifications.
- Return type:
str
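The role of special_locations can be illustrated with a minimal, self-contained sketch. It is not the ProteoBench implementation: the function name insert_modifications_sketch is hypothetical, and the format of the modification string ("Name (M4); Name (Any N-term)", with a residue letter followed by a 1-based position) is an assumption, since this page does not document it. Only the special_locations mapping is taken from the signature above.

```python
# Terminal location names and their positions, copied from the documented default.
SPECIAL_LOCATIONS = {
    "Any C-term": -1,
    "Any N-term": 0,
    "Protein C-term": -1,
    "Protein N-term": 0,
}


def insert_modifications_sketch(seq, modifications, special_locations=SPECIAL_LOCATIONS):
    """Insert bracketed modification names into a peptide sequence.

    ASSUMPTION: `modifications` looks like "Oxidation (M4); Acetyl (Any N-term)".
    This format is hypothetical and chosen only for illustration.
    """
    placed = []
    for entry in modifications.split("; "):
        if not entry:
            continue  # an empty modification string yields no entries
        name, _, location = entry.partition(" (")
        location = location.rstrip(")")
        if location in special_locations:
            offset = special_locations[location]
            # -1 is read as "after the last residue", 0 as "before the first".
            position = len(seq) if offset == -1 else offset
        else:
            position = int(location[1:])  # "M4" -> 4
        placed.append((position, name))

    # Insert from right to left so earlier insertions do not shift later offsets.
    for position, name in sorted(placed, reverse=True):
        seq = seq[:position] + f"[{name}]" + seq[position:]
    return seq


print(insert_modifications_sketch("PEPMTIDE", "Oxidation (M4); Acetyl (Any N-term)"))
# [Acetyl]PEPM[Oxidation]TIDE
```

Processing the positions in descending order is the design choice worth noting: each insertion lengthens the string, so working from the C-terminal side keeps the remaining offsets valid.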
- proteobench.io.parsing.parse_ion.aggregate_modification_sites_column(input_string_seq: str, input_string_modifications: str, input_string_sites: str) → str
Aggregate modification sites into a string representing the modified sequence with sites.
- proteobench.io.parsing.parse_ion.count_chars(input_string: str, isalpha: bool = True, isupper: bool = True) → int
Count the number of characters in the string that match the given criteria.
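As a rough illustration of what "match the given criteria" can mean, the sketch below counts characters that pass every enabled check. The name count_chars_sketch is hypothetical, and the way the two flags combine (a character must satisfy each flag that is set to True) is an assumption rather than documented behaviour.

```python
def count_chars_sketch(input_string: str, isalpha: bool = True, isupper: bool = True) -> int:
    """Count characters that satisfy every enabled criterion (assumed semantics)."""

    def keep(ch: str) -> bool:
        if isalpha and not ch.isalpha():
            return False
        if isupper and not ch.isupper():
            return False
        return True

    return sum(1 for ch in input_string if keep(ch))


# Residues are uppercase letters; the bracketed mass shift contributes nothing.
print(count_chars_sketch("PEPT[+15.9949]IDE"))  # 7
```

With the defaults, this amounts to counting amino-acid letters while ignoring brackets, digits, and signs.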
- proteobench.io.parsing.parse_ion.get_proforma_bracketed(input_string: str, before_aa: bool = True, isalpha: bool = True, isupper: bool = True, pattern: str = '\\[([^]]+)\\]', modification_dict: dict = {'+15.9949': 'Oxidation', '+42': 'Acetyl', '+57.0215': 'Carbamidomethyl', '-17.026548': 'Gln->pyro-Glu', '-18.010565': 'Glu->pyro-Glu'}) → str
Generate a ProForma string with bracketed modifications.
- Parameters:
input_string (str) – The input sequence string.
before_aa (bool, optional) – Whether to add the modification before the amino acid. Defaults to True.
isalpha (bool, optional) – Whether to include alphabetic characters. Defaults to True.
isupper (bool, optional) – Whether to include uppercase characters. Defaults to True.
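pattern (str, optional) – The regular expression pattern for matching modifications. Defaults to r"\[([^]]+)\]", which matches any bracketed group.
modification_dict (dict, optional) – A dictionary mapping mass-shift strings (for example '+15.9949') to modification names (for example 'Oxidation'). Defaults to the five entries shown in the signature.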
- Returns:
The ProForma sequence with bracketed modifications.
- Return type:
str
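The interplay of pattern and modification_dict can be sketched as follows. This covers only the renaming step (bracketed mass shift to modification name). It does not reproduce the before_aa handling or any terminal-modification logic, and it is not the library's code. The name rename_bracketed_mods is hypothetical, and leaving unknown keys unchanged is an assumption.

```python
import re

# Default mapping copied from the documented signature.
MODIFICATION_NAMES = {
    "+15.9949": "Oxidation",
    "+42": "Acetyl",
    "+57.0215": "Carbamidomethyl",
    "-17.026548": "Gln->pyro-Glu",
    "-18.010565": "Glu->pyro-Glu",
}


def rename_bracketed_mods(seq, pattern=r"\[([^]]+)\]", modification_dict=MODIFICATION_NAMES):
    """Replace each bracketed mass shift with its name from modification_dict."""

    def replace(match):
        key = match.group(1)  # the text between the brackets
        # ASSUMPTION: keys missing from the dictionary are left as they are.
        return f"[{modification_dict.get(key, key)}]"

    return re.sub(pattern, replace, seq)


print(rename_bracketed_mods("PEPM[+15.9949]TIDEC[+57.0215]K"))
# PEPM[Oxidation]TIDEC[Carbamidomethyl]K
```

Because the lookup is an exact string match, a mass shift written with different precision (for example '+15.995') would not be renamed in this sketch.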
- proteobench.io.parsing.parse_ion.get_stripped_seq(input_string: str, isalpha: bool = True, isupper: bool = True) → str
Get a stripped version of the sequence containing only characters that match the given criteria.
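A character filter of this kind can be sketched in a few lines. The name stripped_seq_sketch is hypothetical, and the assumption that a character must pass every enabled check is not confirmed by this page.

```python
def stripped_seq_sketch(input_string: str, isalpha: bool = True, isupper: bool = True) -> str:
    """Keep only the characters that satisfy every enabled criterion (assumed semantics)."""
    return "".join(
        ch
        for ch in input_string
        if (not isalpha or ch.isalpha()) and (not isupper or ch.isupper())
    )


print(stripped_seq_sketch("PEPT[+15.9949]IDE"))      # PEPTIDE
print(stripped_seq_sketch("AC[Carbamidomethyl]DK"))  # ACCDK
```

The second call shows a limitation of a purely character-based filter: it does not understand brackets, so the uppercase "C" of a named modification survives. In this sketch, numeric mass shifts or lowercase modification labels strip cleanly, while capitalised names do not.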
- proteobench.io.parsing.parse_ion.load_input_file(input_csv: str, input_format: str, input_csv_secondary: str = None) → DataFrame
Load a dataframe from a CSV file depending on its format.
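- Parameters:
input_csv (str) – The CSV file to load.
input_format (str) – The format of the input file.
input_csv_secondary (str, optional) – A secondary CSV file, for formats that require one. Defaults to None.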
- Returns:
The loaded dataframe.
- Return type:
pd.DataFrame
- proteobench.io.parsing.parse_ion.match_brackets(input_string: str, pattern: str = '\\[([^]]+)\\]', isalpha: bool = True, isupper: bool = True) → tuple
Match and extract bracketed modifications from the string.
- Parameters:
input_string (str) – The input string.
pattern (str, optional) – The regular expression pattern for matching modifications. Defaults to r"\[([^]]+)\]".
isalpha (bool, optional) – Whether to match alphabetic characters. Defaults to True.
isupper (bool, optional) – Whether to match uppercase characters. Defaults to True.
- Returns:
A tuple containing the matched modifications and their positions.
- Return type:
tuple
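One way to extract bracketed modifications together with their positions is shown below. This is a sketch under stated assumptions, not the library's implementation: the name match_brackets_sketch is hypothetical, the position is taken to be the number of uppercase residue letters preceding the bracket, and the result is returned as two lists, whereas the documented function returns a tuple whose exact element types this page does not specify.

```python
import re


def match_brackets_sketch(input_string, pattern=r"\[([^]]+)\]"):
    """Return (modifications, positions) for every bracketed group.

    ASSUMPTION: a position is the count of uppercase letters that precede
    the bracket, with earlier bracketed groups removed before counting.
    """
    mods, positions = [], []
    for match in re.finditer(pattern, input_string):
        mods.append(match.group(1))
        # Drop earlier bracketed groups so their letters are not counted as residues.
        preceding = re.sub(pattern, "", input_string[: match.start()])
        positions.append(sum(1 for ch in preceding if ch.isalpha() and ch.isupper()))
    return mods, positions


mods, positions = match_brackets_sketch("PEPM[+15.9949]TIDEC[+57.0215]K")
print(mods)       # ['+15.9949', '+57.0215']
print(positions)  # [4, 9]
```

Under this convention a position of 4 means the modification follows the fourth residue, so a bracket at the very start of the string would get position 0.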