proteobench.io.parsing.parse_ion module

Module for parsing precursor ion data from various formats.

proteobench.io.parsing.parse_ion.aggregate_modification_column(input_string_seq: str, input_string_modifications: str, special_locations: dict = {'Any C-term': -1, 'Any N-term': 0, 'Protein C-term': -1, 'Protein N-term': 0}) → str

Aggregate modifications into a string representing the modified sequence.

Parameters:
  • input_string_seq (str) – The input sequence string.

  • input_string_modifications (str) – The modifications applied to the sequence.

  • special_locations (dict, optional) – A dictionary specifying special locations for modifications.

Returns:

The modified sequence string with aggregated modifications.

Return type:

str
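
As a rough illustration of the idea, the sketch below inserts bracketed modification names into a peptide sequence. It is not the library's implementation: the helper name insert_modifications_sketch is hypothetical, and the format of the modifications string (entries such as "Oxidation (M5)" separated by "; ") is an assumption, since it is not specified here. Only the special_locations mapping is taken from the signature above.

```python
# Default mapping copied from the documented signature.
SPECIAL_LOCATIONS = {"Any C-term": -1, "Any N-term": 0, "Protein C-term": -1, "Protein N-term": 0}


def insert_modifications_sketch(seq: str, modifications: str, special_locations: dict = SPECIAL_LOCATIONS) -> str:
    """Hypothetical helper: place "[name]" after the residue each modification refers to."""
    placed = []
    for entry in modifications.split("; "):
        if not entry:
            continue
        # Assumed entry format: "<name> (<residue><1-based position>)"
        # or "<name> (<special location>)".
        name, _, location = entry.rpartition(" (")
        location = location.rstrip(")")
        if location in special_locations:
            index = special_locations[location]
            # -1 marks the C-terminus, i.e. the end of the sequence.
            index = len(seq) if index == -1 else index
        else:
            index = int(location[1:])
        placed.append((name, index))
    # Insert from right to left so earlier insertions do not shift later indices.
    for name, index in sorted(placed, key=lambda item: item[1], reverse=True):
        seq = seq[:index] + f"[{name}]" + seq[index:]
    return seq


print(insert_modifications_sketch("PEPTMDEK", "Oxidation (M5); Acetyl (Any N-term)"))
# prints: [Acetyl]PEPTM[Oxidation]DEK
```

Inserting from the highest index downwards keeps the remaining indices valid, because each insertion only lengthens the part of the string to its right.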

proteobench.io.parsing.parse_ion.aggregate_modification_sites_column(input_string_seq: str, input_string_modifications: str, input_string_sites: str) → str

Aggregate modification sites into a string representing the modified sequence with sites.

Parameters:
  • input_string_seq (str) – The input sequence string.

  • input_string_modifications (str) – The modifications applied to the sequence.

  • input_string_sites (str) – The positions of the modifications.

Returns:

The modified sequence string with modification sites.

Return type:

str

proteobench.io.parsing.parse_ion.count_chars(input_string: str, isalpha: bool = True, isupper: bool = True) → int

Count the number of characters in the string that match the given criteria.

Parameters:
  • input_string (str) – The input string.

  • isalpha (bool, optional) – If True, only alphabetic characters are counted. Defaults to True.

  • isupper (bool, optional) – If True, only uppercase characters are counted. Defaults to True.

Returns:

The count of characters that match the criteria.

Return type:

int
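
The counting rule can be sketched as follows. The function name count_matching_chars is a hypothetical stand-in, and the sketch assumes that a flag set to False simply skips the corresponding check; the library's behaviour in that case may differ.

```python
def count_matching_chars(text: str, isalpha: bool = True, isupper: bool = True) -> int:
    """Hypothetical stand-in: count characters that pass every enabled check."""
    return sum(
        1
        for char in text
        # A disabled flag skips its check (an assumption of this sketch).
        if (not isalpha or char.isalpha()) and (not isupper or char.isupper())
    )


print(count_matching_chars("PEPT[+15.9949]IDE"))            # prints: 7
print(count_matching_chars("ac-PEPTIDE", isupper=False))    # prints: 9
```

With both flags left at their defaults, the count corresponds to the number of uppercase residue letters, so a bracketed mass shift does not contribute to it.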

proteobench.io.parsing.parse_ion.get_proforma_bracketed(input_string: str, before_aa: bool = True, isalpha: bool = True, isupper: bool = True, pattern: str = '\\[([^]]+)\\]', modification_dict: dict = {'+15.9949': 'Oxidation', '+42': 'Acetyl', '+57.0215': 'Carbamidomethyl', '-17.026548': 'Gln->pyro-Glu', '-18.010565': 'Glu->pyro-Glu'}) → str

Generate a proforma string with bracketed modifications.

Parameters:
  • input_string (str) – The input sequence string.

  • before_aa (bool, optional) – Whether to add the modification before the amino acid. Defaults to True.

  • isalpha (bool, optional) – Whether to include alphabetic characters. Defaults to True.

  • isupper (bool, optional) – Whether to include uppercase characters. Defaults to True.

  • pattern (str, optional) – The regular expression pattern for matching modifications. Defaults to “\[([^]]+)\]”.

  • modification_dict (dict, optional) – A dictionary of modifications and their names.

Returns:

The proforma sequence with bracketed modifications.

Return type:

str
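
The sketch below illustrates only the name-mapping step: bracketed mass shifts are replaced by the names from the default modification_dict shown in the signature. The helper rename_bracketed_sketch is hypothetical. It ignores before_aa, isalpha, and isupper, and it leaves unknown shifts unchanged, which is an assumption rather than documented behaviour.

```python
import re

# Default mapping copied from the documented signature.
MODIFICATION_NAMES = {
    "+15.9949": "Oxidation",
    "+42": "Acetyl",
    "+57.0215": "Carbamidomethyl",
    "-17.026548": "Gln->pyro-Glu",
    "-18.010565": "Glu->pyro-Glu",
}


def rename_bracketed_sketch(text: str, pattern: str = r"\[([^]]+)\]", names: dict = MODIFICATION_NAMES) -> str:
    """Hypothetical helper: swap bracketed mass shifts for modification names."""

    def replace(match: re.Match) -> str:
        shift = match.group(1)  # text between the brackets
        # Unknown shifts are kept as-is (an assumption of this sketch).
        return f"[{names.get(shift, shift)}]"

    return re.sub(pattern, replace, text)


print(rename_bracketed_sketch("PEPTM[+15.9949]IDEC[+57.0215]K"))
# prints: PEPTM[Oxidation]IDEC[Carbamidomethyl]K
```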

proteobench.io.parsing.parse_ion.get_stripped_seq(input_string: str, isalpha: bool = True, isupper: bool = True) → str

Get a stripped version of the sequence containing only characters that match the given criteria.

Parameters:
  • input_string (str) – The input string.

  • isalpha (bool, optional) – If True, only alphabetic characters are kept. Defaults to True.

  • isupper (bool, optional) – If True, only uppercase characters are kept. Defaults to True.

Returns:

The stripped sequence.

Return type:

str
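
A minimal sketch of this kind of filtering is shown below. The name strip_sequence_sketch is hypothetical, and treating a False flag as "skip this check" is an assumption.

```python
def strip_sequence_sketch(text: str, isalpha: bool = True, isupper: bool = True) -> str:
    """Hypothetical stand-in: keep only characters that pass every enabled check."""
    return "".join(
        char
        for char in text
        # A disabled flag skips its check (an assumption of this sketch).
        if (not isalpha or char.isalpha()) and (not isupper or char.isupper())
    )


print(strip_sequence_sketch("PEPT[+15.9949]IDE"))  # prints: PEPTIDE
print(strip_sequence_sketch("AM[Oxidation]K"))     # prints: AMOK
```

The second call shows a caveat of character-level filtering: a named modification that contains an uppercase letter leaks that letter into the result, so in this sketch the filter is only reliable for numeric or lowercase annotations.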

proteobench.io.parsing.parse_ion.load_input_file(input_csv: str, input_format: str) → DataFrame

Load a dataframe from a CSV file depending on its format.

Parameters:
  • input_csv (str) – The path to the CSV file.

  • input_format (str) – The format of the input file (e.g., “MaxQuant” or “AlphaPept”).

Returns:

The loaded dataframe.

Return type:

pd.DataFrame

proteobench.io.parsing.parse_ion.match_brackets(input_string: str, pattern: str = '\\[([^]]+)\\]', isalpha: bool = True, isupper: bool = True) → tuple

Match and extract bracketed modifications from the string.

Parameters:
  • input_string (str) – The input string.

  • pattern (str, optional) – The regular expression pattern for matching modifications. Defaults to “\[([^]]+)\]”.

  • isalpha (bool, optional) – Whether to match alphabetic characters. Defaults to True.

  • isupper (bool, optional) – Whether to match uppercase characters. Defaults to True.

Returns:

A tuple containing the matched modifications and their positions.

Return type:

tuple
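
One way to extract bracketed modifications together with their positions is sketched below. The helper match_brackets_sketch is hypothetical. It assumes that each match is reported with its brackets and that a position is the number of uppercase letters preceding the bracket; neither detail is confirmed by the description above.

```python
import re


def match_brackets_sketch(text: str, pattern: str = r"\[([^]]+)\]") -> tuple:
    """Hypothetical helper: return bracketed groups and the residue count before each."""
    mods, positions = [], []
    for match in re.finditer(pattern, text):
        mods.append(match.group())  # full match, brackets included
        # Position = number of uppercase letters (residues) before the bracket.
        prefix = text[: match.start()]
        positions.append(sum(1 for char in prefix if char.isalpha() and char.isupper()))
    return mods, positions


mods, positions = match_brackets_sketch("PEPT[+15.9949]IDEC[+57.0215]K")
print(mods)       # prints: ['[+15.9949]', '[+57.0215]']
print(positions)  # prints: [4, 8]
```

Counting residues rather than using raw string offsets keeps the positions independent of the length of earlier bracketed groups.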

proteobench.io.parsing.parse_ion.to_lowercase(match) → str

Convert a match to lowercase.

Parameters:

match (re.Match) – The match object from a regular expression.

Returns:

The lowercase version of the matched string.

Return type:

str
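
A function of this shape is typically passed to re.sub as the replacement callback. The sketch below shows that usage with a hypothetical to_lowercase_sketch; lowering the whole match (group 0) and the pattern used here are assumptions made for illustration.

```python
import re


def to_lowercase_sketch(match: re.Match) -> str:
    """Hypothetical stand-in: return the full matched text in lowercase."""
    return match.group(0).lower()


# re.sub calls the function once per match and substitutes its return value.
print(re.sub(r"\[[^]]+\]", to_lowercase_sketch, "PEPTM[Oxidation]IDE"))
# prints: PEPTM[oxidation]IDE
```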