proteobench.io.parsing.parse_ion module

Module for parsing precursor ion data from various formats.

proteobench.io.parsing.parse_ion.aggregate_modification_column(input_string_seq: str, input_string_modifications: str, special_locations: dict = {'Any C-term': -1, 'Any N-term': 0, 'Protein C-term': -1, 'Protein N-term': 0}) → str

Aggregate modifications into a string representing the modified sequence.

Parameters:

input_string_seq (str) – The input sequence string.
input_string_modifications (str) – The modifications applied to the sequence.
special_locations (dict, optional) – A dictionary specifying special locations for modifications.

Returns:

The modified sequence string with aggregated modifications.

Return type:

str
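The sketch below illustrates one way such an aggregation could work. It is not ProteoBench's implementation: the helper name `insert_modifications` is hypothetical, and the format of the modification string (entries such as `Oxidation (M4)` separated by `; `, with the site given as a residue letter followed by a 1-based position) is an assumption that the documentation above does not confirm. Only the `special_locations` mapping is taken from the documented default, with -1 read as "end of sequence" and 0 as "start of sequence".

```python
# Documented default for special_locations; -1 is interpreted here as the
# end of the sequence and 0 as the start (an assumption about its meaning).
SPECIAL_LOCATIONS = {
    "Any C-term": -1,
    "Any N-term": 0,
    "Protein C-term": -1,
    "Protein N-term": 0,
}


def insert_modifications(seq: str, mods: str, special: dict = SPECIAL_LOCATIONS) -> str:
    """Hypothetical helper: insert "[name]" tags into seq.

    Assumes mods looks like "Oxidation (M4); Acetyl (Any N-term)".
    """
    placed = []
    for entry in filter(None, mods.split("; ")):
        name, _, site = entry.partition(" (")
        site = site.rstrip(")")
        if site in special:
            loc = len(seq) if special[site] == -1 else special[site]
        else:
            loc = int(site[1:])  # "M4" -> 4: tag goes after the 4th residue
        placed.append((loc, name))
    # Insert from right to left so earlier offsets remain valid.
    for loc, name in sorted(placed, reverse=True):
        seq = seq[:loc] + f"[{name}]" + seq[loc:]
    return seq


print(insert_modifications("PEPMTIDE", "Oxidation (M4); Acetyl (Any N-term)"))
# [Acetyl]PEPM[Oxidation]TIDE
```

Inserting from the rightmost position first avoids recomputing offsets after each insertion.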

proteobench.io.parsing.parse_ion.aggregate_modification_sites_column(input_string_seq: str, input_string_modifications: str, input_string_sites: str) → str

Aggregate modification sites into a string representing the modified sequence with sites.

Parameters:

input_string_seq (str) – The input sequence string.
input_string_modifications (str) – The modifications applied to the sequence.
input_string_sites (str) – The positions of the modifications.

Returns:

The modified sequence string with modification sites.

Return type:

str

proteobench.io.parsing.parse_ion.count_chars(input_string: str, isalpha: bool = True, isupper: bool = True) → int

Count the number of characters in the string that match the given criteria.

Parameters:

input_string (str) – The input string.
isalpha (bool, optional) – Whether to count alphabetic characters. Defaults to True.
isupper (bool, optional) – Whether to count uppercase characters. Defaults to True.

Returns:

The count of characters that match the criteria.

Return type:

int
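As a rough illustration of the counting rule described above, the sketch below counts the characters that pass every enabled check. It is a standalone stand-in rather than the library's code: the name `count_matching_chars` is hypothetical, and counting every character when both flags are False is an assumption.

```python
def count_matching_chars(text: str, isalpha: bool = True, isupper: bool = True) -> int:
    """Hypothetical helper: count characters that pass every enabled check."""

    def keep(ch: str) -> bool:
        if isalpha and not ch.isalpha():
            return False
        if isupper and not ch.isupper():
            return False
        return True

    return sum(1 for ch in text if keep(ch))


# Uppercase letters only: P, E, P, T, I, D, E
print(count_matching_chars("PEPm[+15.9949]TIDE"))  # 7
# Any letter, regardless of case: the lowercase "m" is now included
print(count_matching_chars("PEPm[+15.9949]TIDE", isupper=False))  # 8
```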

proteobench.io.parsing.parse_ion.get_proforma_bracketed(input_string: str, before_aa: bool = True, isalpha: bool = True, isupper: bool = True, pattern: str = '\\[([^]]+)\\]', modification_dict: dict = {'+15.9949': 'Oxidation', '+42': 'Acetyl', '+57.0215': 'Carbamidomethyl', '-17.026548': 'Gln->pyro-Glu', '-18.010565': 'Glu->pyro-Glu'}) → str

Generate a ProForma string with bracketed modifications.

Parameters:

input_string (str) – The input sequence string.
before_aa (bool, optional) – Whether to add the modification before the amino acid. Defaults to True.
isalpha (bool, optional) – Whether to include alphabetic characters. Defaults to True.
isupper (bool, optional) – Whether to include uppercase characters. Defaults to True.
pattern (str, optional) – The regular expression pattern for matching modifications. Defaults to “\[([^]]+)\]”.
modification_dict (dict, optional) – A dictionary of modifications and their names.

Returns:

The ProForma sequence with bracketed modifications.

Return type:

str
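The sketch below illustrates only the name-mapping step suggested by `modification_dict`: bracketed mass shifts are replaced by modification names, and unknown entries are left unchanged. It is a simplified stand-in, not the library's implementation. The helper name `rename_bracketed` is hypothetical, and the sketch does not reproduce `before_aa` or any handling of terminal modifications.

```python
import re

# Subset of the documented default modification_dict.
MOD_NAMES = {
    "+15.9949": "Oxidation",
    "+42": "Acetyl",
    "+57.0215": "Carbamidomethyl",
}


def rename_bracketed(text: str, names: dict = MOD_NAMES, pattern: str = r"\[([^]]+)\]") -> str:
    """Hypothetical helper: swap bracketed mass shifts for modification names."""

    def swap(match: re.Match) -> str:
        inner = match.group(1)  # text between the brackets
        return f"[{names.get(inner, inner)}]"

    return re.sub(pattern, swap, text)


print(rename_bracketed("AC[+57.0215]DEM[+15.9949]K"))
# AC[Carbamidomethyl]DEM[Oxidation]K
```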

proteobench.io.parsing.parse_ion.get_stripped_seq(input_string: str, isalpha: bool = True, isupper: bool = True) → str

Get a stripped version of the sequence containing only characters that match the given criteria.

Parameters:

input_string (str) – The input string.
isalpha (bool, optional) – Whether to include alphabetic characters. Defaults to True.
isupper (bool, optional) – Whether to include uppercase characters. Defaults to True.

Returns:

The stripped sequence.

Return type:

str
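A minimal sketch of this kind of filtering follows, assuming that a character is kept only if it passes every enabled check. The helper name `strip_sequence` is hypothetical, and this is not the library's code.

```python
def strip_sequence(text: str, isalpha: bool = True, isupper: bool = True) -> str:
    """Hypothetical helper: keep only characters that pass every enabled check."""
    kept = []
    for ch in text:
        if isalpha and not ch.isalpha():
            continue
        if isupper and not ch.isupper():
            continue
        kept.append(ch)
    return "".join(kept)


# Brackets, digits, signs and dots are dropped; residues remain.
print(strip_sequence("AC[+57.0215]DEM[+15.9949]K"))  # ACDEMK
```

Note that with these defaults an uppercase letter inside a bracket (for example the "O" in `[Oxidation]`) would also be kept, so named modifications need to be handled before stripping.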

proteobench.io.parsing.parse_ion.load_input_file(input_csv: str, input_format: str, input_csv_secondary: str = None) → DataFrame

Load a dataframe from a CSV file depending on its format.

Parameters:

input_csv (str) – The path to the CSV file.
input_format (str) – The format of the input file (e.g., “MaxQuant” or “AlphaPept”).
input_csv_secondary (str, optional) – The path to a secondary CSV file, used for some formats such as AlphaDIA.

Returns:

The loaded dataframe.

Return type:

pd.DataFrame

proteobench.io.parsing.parse_ion.match_brackets(input_string: str, pattern: str = '\\[([^]]+)\\]', isalpha: bool = True, isupper: bool = True) → tuple

Match and extract bracketed modifications from the string.

Parameters:

input_string (str) – The input string.
pattern (str, optional) – The regular expression pattern for matching modifications. Defaults to “\[([^]]+)\]”.
isalpha (bool, optional) – Whether to match alphabetic characters. Defaults to True.
isupper (bool, optional) – Whether to match uppercase characters. Defaults to True.

Returns:

A tuple containing the matched modifications and their positions.

Return type:

tuple
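The following sketch shows one plausible reading of "modifications and their positions". It assumes that a modification is the text inside the brackets and that its position is the number of uppercase residue letters preceding the bracket. Neither assumption is confirmed by the documentation above, and the helper name `find_bracketed` is hypothetical.

```python
import re


def find_bracketed(text: str, pattern: str = r"\[([^]]+)\]") -> tuple:
    """Hypothetical helper: return (modifications, positions) for bracketed tags."""
    mods, positions = [], []
    for match in re.finditer(pattern, text):
        mods.append(match.group(1))  # text between the brackets
        prefix = text[: match.start()]
        # Position = number of uppercase letters before the bracket (assumed).
        positions.append(sum(1 for ch in prefix if ch.isalpha() and ch.isupper()))
    return mods, positions


mods, positions = find_bracketed("AC[+57.0215]DEM[+15.9949]K")
print(mods)       # ['+57.0215', '+15.9949']
print(positions)  # [2, 5]
```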

proteobench.io.parsing.parse_ion.to_lowercase(match) → str

Convert a match to lowercase.

Parameters:

match (re.Match) – The match object from a regular expression.

Returns:

The lowercase version of the matched string.

Return type:

str
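A function that takes a match object and returns a string fits the callback form of `re.sub`. The sketch below shows that pattern with a hypothetical stand-in, `lowercase_match`; lowercasing the whole match (`group(0)`) rather than a capture group is an assumption.

```python
import re


def lowercase_match(match: re.Match) -> str:
    """Hypothetical stand-in: return the full matched text in lowercase."""
    return match.group(0).lower()


BRACKET_PATTERN = r"\[([^]]+)\]"

# Only the bracketed part is lowercased; the residues are untouched.
print(re.sub(BRACKET_PATTERN, lowercase_match, "M[Oxidation]PEPTIDE"))
# M[oxidation]PEPTIDE
```

Lowercasing bracket contents in this way would keep modification names from being mistaken for residues by a later uppercase-only filter.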