proteobench.io.parsing.parse_settings module

Parse settings for all input formats available to a module.

class proteobench.io.parsing.parse_settings.ParseModificationSettings(parse_settings: Dict[str, Any])

Bases: object

Class to handle modifications-specific parsing settings.

Parameters:
  • parser (ParseSettings) – The base parse settings object.

  • parse_settings (Dict[str, Any]) – The modifications-specific parse settings.

convert_to_proforma(df: DataFrame, analysis_level: str) → tuple[DataFrame, Dict[int, List[str]]]

Convert the DataFrame to a standard format, adding modifications to the ‘proforma’ column.

Parameters:
  • df (pd.DataFrame) – The input DataFrame to convert.

  • analysis_level (str) – The analysis level to format the data for.

Returns:

The converted DataFrame and a dictionary mapping replicates to raw data.

Return type:

tuple[pd.DataFrame, Dict[int, List[str]]]
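
To illustrate what adding modifications to a ‘proforma’ column involves, the sketch below builds a ProForma-style string from a bare sequence and a position-to-modification mapping. The helper name `to_proforma` and the input layout (position 0 for the N-terminus, 1-based residue positions) are assumptions made for this example; it is not ProteoBench's implementation.

```python
from typing import Dict


def to_proforma(sequence: str, modifications: Dict[int, str]) -> str:
    """Return a ProForma-style string for `sequence`.

    `modifications` maps a position to a modification name. Position 0
    means the N-terminus; positions 1..len(sequence) are 1-based residues.
    This input layout is an assumption made for the example.
    """
    for position in modifications:
        if not 0 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the sequence")

    parts = []
    if 0 in modifications:
        # ProForma writes an N-terminal modification before the sequence,
        # separated from it by a hyphen.
        parts.append(f"[{modifications[0]}]-")
    for index, residue in enumerate(sequence, start=1):
        parts.append(residue)
        if index in modifications:
            # A residue modification follows the residue in square brackets.
            parts.append(f"[{modifications[index]}]")
    return "".join(parts)


print(to_proforma("PEPTMIDE", {5: "Oxidation"}))              # PEPTM[Oxidation]IDE
print(to_proforma("PEPTIDE", {0: "Acetyl", 4: "Phospho"}))    # [Acetyl]-PEPT[Phospho]IDE
```

Applied row by row, a helper of this kind would fill the ‘proforma’ column from whatever sequence and modification columns a given tool reports.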

class proteobench.io.parsing.parse_settings.ParseSettingsBuilder(parse_settings_dir: str, module_id: str)

Bases: object

Class to build the parser settings for a given input format.

Parameters:
  • parse_settings_dir (str) – The directory containing the parse settings files.

  • module_id (str) – The ID of the module used to fetch the specific parse settings.

build_parser(input_format: str) → object

Build the parser for a given input format using the corresponding TOML files.

Parameters:

input_format (str) – The input format to build the parser for (e.g., “MaxQuant”, “Sage”).

Returns:

The parser for the specified input format.

Return type:

ParseSettings
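
The lookup from an input format name to its TOML file can be sketched as follows. The registry `FORMAT_TO_FILE`, the file names in it, and the helper `resolve_settings_file` are all hypothetical; this reference does not say how ProteoBench stores or resolves the mapping.

```python
from pathlib import Path

# Hypothetical registry: format names mapped to settings file names.
# The real file names and storage of this mapping are not given here.
FORMAT_TO_FILE = {
    "MaxQuant": "parse_settings_maxquant.toml",
    "Sage": "parse_settings_sage.toml",
}


def resolve_settings_file(parse_settings_dir: str, input_format: str) -> Path:
    """Return the path of the TOML file registered for `input_format`."""
    try:
        filename = FORMAT_TO_FILE[input_format]
    except KeyError:
        supported = ", ".join(sorted(FORMAT_TO_FILE))
        raise ValueError(
            f"Unsupported input format {input_format!r}; expected one of: {supported}"
        ) from None
    return Path(parse_settings_dir) / filename


print(resolve_settings_file("settings", "Sage"))
```

Failing early with the list of supported formats makes a misspelled format name easier to diagnose than a missing-file error raised later.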

class proteobench.io.parsing.parse_settings.ParseSettingsQuant(parse_settings: Dict[str, Any], parse_settings_module: Dict[str, Any])

Bases: object

Structure that contains all the parameters used to parse the given benchmark run output depending on the software tool used.

Parameters:
  • parse_settings (Dict[str, Any]) – The settings for parsing, typically loaded from a TOML file.

  • parse_settings_module (Dict[str, Any]) – Module-specific settings, typically loaded from a TOML file.

add_modification_parser(parser: ParseModificationSettings)

convert_to_standard_format(df: DataFrame) → tuple[DataFrame, Dict[int, List[str]]]

Convert a software tool output into a generic format supported by the module.

Steps:
  1. Validate and rename columns
  2. Create replicate mapping
  3. Filter decoys and contaminants
  4. Process species information
  5. Handle data format (long vs short)
  6. Process modifications if needed
  7. Filter zero intensities
  8. Format based on analysis level

Parameters:

df (pd.DataFrame) – The input DataFrame to convert.

Returns:

The converted DataFrame and a dictionary mapping replicates to raw data.

Return type:

tuple[pd.DataFrame, Dict[int, List[str]]]
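
A few of the steps listed for this method (column validation and renaming, replicate mapping, decoy and contaminant filtering, zero-intensity filtering) can be sketched as below. Plain lists of dictionaries stand in for the DataFrame so the example has no dependencies. The function name, the column names, and the `run_to_condition` input are assumptions made for illustration and do not reflect ProteoBench's actual settings layout.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def to_standard_rows(
    rows: List[Row],
    column_map: Dict[str, str],
    run_to_condition: Dict[str, int],
) -> Tuple[List[Row], Dict[int, List[str]]]:
    # Validate and rename columns: every mapped tool column must be present.
    renamed = []
    for row in rows:
        missing = sorted(set(column_map) - set(row))
        if missing:
            raise KeyError(f"missing columns: {missing}")
        renamed.append({new: row[old] for old, new in column_map.items()})

    # Create the replicate mapping: condition -> raw files, in input order.
    replicate_to_raw: Dict[int, List[str]] = defaultdict(list)
    for run, condition in run_to_condition.items():
        replicate_to_raw[condition].append(run)

    # Filter decoys and contaminants, then rows with zero intensity.
    kept = [
        row
        for row in renamed
        if not row["is_decoy"] and not row["is_contaminant"] and row["intensity"] > 0
    ]
    return kept, dict(replicate_to_raw)


# Hypothetical tool output and settings.
rows = [
    {"Raw file": "run1.raw", "Sequence": "PEPTIDE", "Reverse": False,
     "Contaminant": False, "Intensity": 1200.0},
    {"Raw file": "run2.raw", "Sequence": "EDITPEP", "Reverse": True,
     "Contaminant": False, "Intensity": 900.0},
    {"Raw file": "run3.raw", "Sequence": "PEPTIDER", "Reverse": False,
     "Contaminant": False, "Intensity": 0.0},
]
column_map = {
    "Raw file": "raw_file",
    "Sequence": "sequence",
    "Reverse": "is_decoy",
    "Contaminant": "is_contaminant",
    "Intensity": "intensity",
}
run_to_condition = {"run1.raw": 1, "run2.raw": 1, "run3.raw": 2}

standard, replicate_to_raw = to_standard_rows(rows, column_map, run_to_condition)
print(len(standard))        # 1
print(replicate_to_raw)     # {1: ['run1.raw', 'run2.raw'], 2: ['run3.raw']}
```

Only the first row survives: the second is a decoy and the third has zero intensity. The replicate mapping is built from the settings rather than from the filtered rows, so it still lists every run.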

species_dict() → Dict[str, str]

Get the species dictionary.

Returns:

A dictionary of species mappings.

Return type:

Dict[str, str]
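
One way such a mapping could be used during the species-processing step is sketched below. It assumes the dictionary maps a marker found in the protein identifier (for example a suffix such as `_YEAST`) to a species name. That layout and the helper `assign_species` are assumptions for illustration only, since this reference does not document the dictionary's keys and values.

```python
from typing import Dict, Optional


def assign_species(protein_id: str, species_mapping: Dict[str, str]) -> Optional[str]:
    """Return the single species whose marker occurs in `protein_id`.

    Returns None when no marker matches, or when markers of more than one
    species match (for example, a protein group shared between species).
    """
    matches = {
        species
        for marker, species in species_mapping.items()
        if marker in protein_id
    }
    return matches.pop() if len(matches) == 1 else None


# Hypothetical mapping: identifier marker -> species name.
mapping = {"_YEAST": "YEAST", "_HUMAN": "HUMAN"}

print(assign_species("sp|P00330|ADH1_YEAST", mapping))                          # YEAST
print(assign_species("sp|P00330|ADH1_YEAST;sp|P68871|HBB_HUMAN", mapping))      # None
```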

species_expected_ratio() → float

Get the expected species ratio.

Returns:

The expected ratio of species.

Return type:

float