proteobench.io.parsing.parse_settings module

All formats available for the module.

class proteobench.io.parsing.parse_settings.ParseModificationSettings(parse_settings: Dict[str, Any])

Bases: object

Settings for parsing modifications in protein data.

Parameters:

parse_settings (Dict[str, Any]) – Dictionary containing modification-specific parsing settings.

convert_to_proforma(df: DataFrame, analysis_level: str) → tuple[DataFrame, Dict[int, List[str]]]

Convert modifications to ProForma format.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing modification data.

  • analysis_level (str) – The level of analysis to perform.

Returns:

The converted DataFrame and a dictionary mapping replicates to raw data.

Return type:

tuple[pd.DataFrame, Dict[int, List[str]]]
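As an illustration of what a ProForma conversion involves, the sketch below rewrites a MaxQuant-style modified sequence such as _AM(Oxidation (M))K_ into ProForma notation (AM[Oxidation]K). It is not ProteoBench's implementation: the to_proforma function and the MOD_MAP label mapping are hypothetical, only residue and N-terminal modifications are handled, and the replicate mapping that convert_to_proforma also returns is not built here.

```python
import re

# Hypothetical mapping from tool-specific modification labels to ProForma names.
MOD_MAP: dict[str, str] = {
    "Oxidation (M)": "Oxidation",
    "Acetyl (Protein N-term)": "Acetyl",
    "Carbamidomethyl (C)": "Carbamidomethyl",
}

# Matches "(label)" where the label may contain one nested "(...)" pair,
# e.g. "(Oxidation (M))".
_MOD_RE = re.compile(r"\(((?:[^()]|\([^()]*\))+)\)")


def to_proforma(modified_sequence: str, mod_map: dict[str, str] = MOD_MAP) -> str:
    """Convert a MaxQuant-style modified sequence to ProForma (illustrative sketch)."""
    seq = modified_sequence.strip("_")
    nterm = ""
    # A modification before the first residue is treated as N-terminal.
    match = _MOD_RE.match(seq)
    if match:
        label = match.group(1)
        nterm = f"[{mod_map.get(label, label)}]-"
        seq = seq[match.end():]
    # Every remaining "(label)" becomes "[name]" directly after its residue;
    # unknown labels are kept verbatim.
    body = _MOD_RE.sub(lambda m: f"[{mod_map.get(m.group(1), m.group(1))}]", seq)
    return nterm + body


if __name__ == "__main__":
    print(to_proforma("_(Acetyl (Protein N-term))AM(Oxidation (M))K_"))
```

Run as a script, this prints [Acetyl]-AM[Oxidation]K. Passing unknown labels through unchanged keeps the sketch lossless, at the cost of producing non-standard ProForma names for anything missing from the mapping.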

class proteobench.io.parsing.parse_settings.ParseSettingsBuilder(parse_settings_dir: str, module_id: str)

Bases: object

Class to build the parser settings for a given input format.

Parameters:
  • parse_settings_dir (str) – The directory containing the parse settings files.

  • module_id (str) – The ID of the module used to fetch the specific parse settings.

build_parser(input_format: str) → object

Build the parser for a given input format using the corresponding TOML files.

Parameters:

input_format (str) – The input format to build the parser for (e.g., “MaxQuant”, “Sage”).

Returns:

The parser for the specified input format.

Return type:

ParseSettings

class proteobench.io.parsing.parse_settings.ParseSettingsQuant(parse_settings: Dict[str, Any], parse_settings_module: Dict[str, Any])

Bases: object

Structure that holds all the parameters used to parse the output of a benchmark run, which depend on the software tool that produced it.

Parameters:
  • parse_settings (Dict[str, Any]) – The settings for parsing, typically loaded from a TOML file.

  • parse_settings_module (Dict[str, Any]) – Module-specific settings, typically loaded from a TOML file.

add_modification_parser(parser: ParseModificationSettings)

Add a modification parser to the settings.

Parameters:

parser (ParseModificationSettings) – The modification parser to add.
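One plausible reading of add_modification_parser, together with the conversion step "Process modifications if needed", is an optional component that is applied only once it has been attached. The sketch below shows that pattern under this assumption; QuantSettingsSketch is a hypothetical stand-in, and a plain callable takes the place of ParseModificationSettings.

```python
from typing import Callable, Optional


class QuantSettingsSketch:
    """Hypothetical stand-in for a settings object with an optional modification step."""

    def __init__(self) -> None:
        # No modification parser until one is added explicitly.
        self.modification_parser: Optional[Callable[[str], str]] = None

    def add_modification_parser(self, parser: Callable[[str], str]) -> None:
        self.modification_parser = parser

    def process_sequence(self, sequence: str) -> str:
        # Sequences pass through unchanged when no parser was attached.
        if self.modification_parser is None:
            return sequence
        return self.modification_parser(sequence)


if __name__ == "__main__":
    settings = QuantSettingsSketch()
    print(settings.process_sequence("_AM(Oxidation (M))K_"))
    settings.add_modification_parser(lambda seq: seq.strip("_"))
    print(settings.process_sequence("_AM(Oxidation (M))K_"))
```

Keeping the modification parser optional lets formats without modification handling share the same settings class.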

convert_to_standard_format(df: DataFrame) → tuple[DataFrame, Dict[int, List[str]]]

Convert the output of a software tool into the generic format supported by the module.

Steps:
  1. Validate and rename columns.
  2. Create the replicate mapping.
  3. Filter decoys.
  4. Fix column names.
  5. Mark contaminants.
  6. Process species information.
  7. Handle the data format (long vs. short).
  8. Process modifications if needed.
  9. Filter zero intensities.
  10. Format based on the analysis level.

Parameters:

df (pd.DataFrame) – The input DataFrame to convert.

Returns:

The converted DataFrame and a dictionary mapping replicates to raw data.

Return type:

tuple[pd.DataFrame, Dict[int, List[str]]]
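The sketch below approximates several of the steps listed above (renaming, replicate mapping, decoy filtering, contaminant marking, species flags and zero-intensity filtering) using only the standard library, on a list of row dictionaries instead of a DataFrame. The column names, the REV__ and CON__ prefixes, and the species flags are assumptions chosen for illustration; the long/short format handling, modification processing and analysis-level formatting steps are left out.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def convert_rows(
    rows: list[Row],
    column_map: dict[str, str],
    raw_file_to_replicate: dict[str, int],
    species_flags: dict[str, str],
    decoy_flag: str = "REV__",
    contaminant_flag: str = "CON__",
) -> tuple[list[Row], dict[int, list[str]]]:
    """Simplified, hypothetical version of a few standardisation steps."""
    # Rename tool-specific columns to generic names (new dicts; input is untouched).
    renamed = [{column_map.get(k, k): v for k, v in row.items()} for row in rows]

    # Map each replicate number to the raw files that belong to it.
    replicate_to_raw: dict[int, list[str]] = defaultdict(list)
    for raw_file, replicate in raw_file_to_replicate.items():
        replicate_to_raw[replicate].append(raw_file)

    # Drop decoy hits.
    kept = [r for r in renamed if decoy_flag not in r["Proteins"]]

    for r in kept:
        r["replicate"] = raw_file_to_replicate[r["Raw file"]]
        # Mark contaminants instead of removing them.
        r["contaminant"] = contaminant_flag in r["Proteins"]
        # One boolean column per species, True when its flag occurs in the protein IDs.
        for flag, species in species_flags.items():
            r[species] = flag in r["Proteins"]

    # Drop rows whose intensity is not positive (e.g. zero).
    kept = [r for r in kept if r["Intensity"] > 0]
    return kept, dict(replicate_to_raw)
```

Marking contaminants with a flag rather than dropping them leaves the decision to later, analysis-specific code.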

species_dict() → Dict[str, str]

Get the species dictionary.

Returns:

A dictionary of species mappings.

Return type:

Dict[str, str]
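To illustrate how a Dict[str, str] species mapping can be applied, the sketch below assumes that keys are flags found in protein identifiers (such as _HUMAN) and values are species names. Both that interpretation and the species_in_group helper are hypothetical. The helper returns every species whose flag occurs in a semicolon-separated protein group, so a group that matches more than one species can be recognised as ambiguous.

```python
# Hypothetical species dictionary: protein-ID flag -> species name.
SPECIES_DICT: dict[str, str] = {"_HUMAN": "HUMAN", "_YEAST": "YEAST", "_ECOLI": "ECOLI"}


def species_in_group(
    protein_group: str,
    species_dict: dict[str, str] = SPECIES_DICT,
    sep: str = ";",
) -> set[str]:
    """Return every species whose flag occurs in any protein of the group."""
    found: set[str] = set()
    for protein in protein_group.split(sep):
        for flag, species in species_dict.items():
            if flag in protein:
                found.add(species)
    return found


if __name__ == "__main__":
    print(species_in_group("P1_HUMAN;P2_YEAST"))
```

A group that returns more than one species cannot be attributed to a single species.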

species_expected_ratio() → float

Get the expected species ratio.

Returns:

The expected ratio of species.

Return type:

float
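An expected ratio becomes useful once it is compared with an observed one. The sketch below shows one common way to do this, on a log2 scale, where 0 means the observed ratio matches the expectation exactly. The log2_deviation helper and the example values are illustrative assumptions, not part of the documented API.

```python
import math


def log2_deviation(intensity_a: float, intensity_b: float, expected_ratio: float) -> float:
    """Return log2(A / B) minus log2(expected_ratio) (illustrative sketch)."""
    if intensity_a <= 0 or intensity_b <= 0 or expected_ratio <= 0:
        raise ValueError("intensities and expected ratio must be positive")
    return math.log2(intensity_a / intensity_b) - math.log2(expected_ratio)


if __name__ == "__main__":
    # Observed 4:1 against an assumed expected 2:1 ratio: one log2 unit too high.
    print(log2_deviation(400.0, 100.0, 2.0))
```

Working in log2 space makes over- and under-estimation by the same factor symmetric around 0.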