proteobench.modules.quant.quant_base_module module

Quant Base Module.

class proteobench.modules.quant.quant_base_module.QuantModule(token: str | None, proteobench_repo_name: str, proteobot_repo_name: str, parse_settings_dir: str, module_id: str)

Bases: object

Base Module for Quantification.

Parameters:
  • token (Optional[str]) – The GitHub token.

  • proteobench_repo_name (str) – The name of the ProteoBench repository.

  • proteobot_repo_name (str) – The name of the ProteoBot repository.

  • parse_settings_dir (str) – The directory containing parse settings.

  • module_id (str) – The module identifier for configuration.

EXTRACT_PARAMS_DICT: Dict[str, Any] = {'AlphaDIA': <function extract_params>, 'AlphaPept': <function extract_params>, 'DIA-NN': <function extract_params>, 'FragPipe': <function extract_params>, 'FragPipe (DIA-NN quant)': <function extract_params>, 'MSAID': <function extract_params>, 'MSAngel': <function extract_params>, 'MaxQuant': <function extract_params>, 'PEAKS': <function extract_params>, 'ProlineStudio': <function extract_params>, 'Proteome Discoverer': <function read_spectronaut_settings>, 'Sage': <function extract_params>, 'Spectronaut': <function read_spectronaut_settings>, 'WOMBAT': <function extract_params>, 'i2MassChroQ': <function extract_params>, 'quantms': <function extract_params>}

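The mapping above acts as a dispatch table: a tool name selects the function that parses that tool's parameter file. The sketch below illustrates the pattern only. The parsers `parse_key_value` and `parse_tab_separated`, the `PARSERS` table, the tool names, and the `extract` helper are all hypothetical and are not part of ProteoBench.

```python
from typing import Callable, Dict


def parse_key_value(text: str) -> Dict[str, str]:
    """Hypothetical parser for 'key = value' lines."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def parse_tab_separated(text: str) -> Dict[str, str]:
    """Hypothetical parser for 'key<TAB>value' lines."""
    pairs = (line.split("\t", 1) for line in text.splitlines() if "\t" in line)
    return {key.strip(): value.strip() for key, value in pairs}


# Illustrative dispatch table; the tool names are placeholders.
PARSERS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "ToolA": parse_key_value,
    "ToolB": parse_tab_separated,
}


def extract(input_format: str, text: str) -> Dict[str, str]:
    """Look up the parser registered for input_format and apply it to text."""
    try:
        parser = PARSERS[input_format]
    except KeyError:
        raise ValueError(f"Unsupported input format: {input_format!r}") from None
    return parser(text)
```

Supporting another tool in this pattern means adding one entry to the table rather than extending a chain of conditionals.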
add_current_data_point(current_datapoint: Series, all_datapoints: DataFrame | None = None) → DataFrame

Add the current data point to the previous data points. If no previous data points are given, they are loaded from the GitHub repo.

Parameters:
  • current_datapoint (pd.Series) – The current data point to add.

  • all_datapoints (Optional[pd.DataFrame]) – Data points from previous runs. Loaded from GitHub repo if None.

Returns:

A DataFrame with the current data point added.

Return type:

pd.DataFrame
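The "append, loading earlier results when none are passed" behaviour described above can be sketched with plain Python records instead of pandas objects. This is a simplified illustration, not the module's implementation: `add_current_record`, the `Record` alias, and the `load_previous` callback (a stand-in for fetching data from the GitHub repo) are assumptions.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def add_current_record(
    current: Record,
    previous: Optional[List[Record]] = None,
    load_previous: Callable[[], List[Record]] = list,
) -> List[Record]:
    """Return the previous records plus the current one, loading them if absent."""
    if previous is None:
        # Stand-in for fetching earlier results from the results repository.
        previous = load_previous()
    # Build a new list so the caller's data is not modified.
    return [*previous, current]
```

Returning a new list keeps the caller's existing records unchanged.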

benchmarking(input_file: str, input_format: str, user_input: dict, all_datapoints: DataFrame | None, default_cutoff_min_prec: int = 3) → tuple[DataFrame, DataFrame, DataFrame]

Main workflow of the module. Used to benchmark workflow results.

Parameters:
  • input_file (str) – Path to the workflow output file.

  • input_format (str) – Format of the workflow output file.

  • user_input (dict) – User-provided parameters for plotting.

  • all_datapoints (Optional[pd.DataFrame]) – DataFrame containing all datapoints from the ProteoBench repo.

  • default_cutoff_min_prec (int, optional) – Minimum number of runs a precursor ion has to be identified in. Defaults to 3.

Returns:

A tuple containing the intermediate data structure, all data points, and the input DataFrame.

Return type:

tuple[DataFrame, DataFrame, DataFrame]
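The `default_cutoff_min_prec` parameter is described as the minimum number of runs in which a precursor ion has to be identified. One way to picture such a cutoff is sketched below with plain Python data rather than the module's DataFrames. The function name and the `(precursor, run)` input layout are assumptions made for illustration; how the module applies the cutoff internally is not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def precursors_passing_cutoff(
    identifications: Iterable[Tuple[str, str]], min_runs: int = 3
) -> Set[str]:
    """Keep precursors identified in at least min_runs distinct runs."""
    runs_per_precursor: Dict[str, Set[str]] = defaultdict(set)
    for precursor, run in identifications:
        # A set ignores repeated identifications within the same run.
        runs_per_precursor[precursor].add(run)
    return {p for p, runs in runs_per_precursor.items() if len(runs) >= min_runs}
```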

check_new_unique_hash(datapoints: DataFrame) → bool

Check if the new data point has a unique hash.

Parameters:

datapoints (pd.DataFrame) – Data points.

Returns:

Whether the new data point has a unique hash.

Return type:

bool
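A uniqueness check of this kind can be sketched as follows. How ProteoBench derives a data point's hash is not stated here, so hashing a key-sorted JSON dump of the parameters with SHA-1 is an assumption, as are the helper names `params_hash` and `is_new_hash`.

```python
import hashlib
import json
from typing import Dict, Iterable


def params_hash(params: Dict[str, object]) -> str:
    """Hash a parameter dict; sorting keys makes the result order-independent."""
    canonical = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def is_new_hash(new_hash: str, existing_hashes: Iterable[str]) -> bool:
    """Return True if new_hash does not occur among the existing hashes."""
    return new_hash not in set(existing_hashes)
```

With such a check, a submission that repeats an existing combination of parameters can be detected before it is added.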

clone_pr(temporary_datapoints: DataFrame, datapoint_params: Any, remote_git: str, submission_comments: str = 'no comments') → str

Clone the repo and open a pull request with the new data points.

Parameters:
  • temporary_datapoints (pd.DataFrame) – Temporary data points.

  • datapoint_params (Any) – Data point parameters.

  • remote_git (str) – Remote Git repository URL.

  • submission_comments (str, optional) – Comments to be included in the pull request. Defaults to 'no comments'.

Returns:

The URL of the created pull request.

Return type:

str

static filter_data_point(all_datapoints: DataFrame, default_val_slider: int = 3) → DataFrame

Filter the data points based on predefined criteria.

Parameters:
  • all_datapoints (pd.DataFrame) – All data points.

  • default_val_slider (int, optional) – The minimum number of observations for filtering. Defaults to 3.

Returns:

A DataFrame containing the filtered data points.

Return type:

pd.DataFrame

is_implemented() → bool

Return whether the module is fully implemented.

Returns:

Always returns True in this implementation.

Return type:

bool

load_params_file(input_file: List[str], input_format: str, json: str) → ProteoBenchParameters

Load parameters from a metadata file depending on its format.

Parameters:
  • input_file (List[str]) – Paths to the metadata file(s).

  • input_format (str) – Format of the metadata file.

Returns:

The parameters for the module.

Return type:

ProteoBenchParameters

obtain_all_data_points(all_datapoints: DataFrame | None = None) → DataFrame

Load all data points. If none are given, they are loaded from the GitHub repo.

Parameters:

all_datapoints (Optional[pd.DataFrame]) – All data points. Loaded from the GitHub repo if None.

Returns:

A DataFrame containing all data points.

Return type:

pd.DataFrame

write_intermediate_raw(directory: str, ident: str, input_file_obj: Any, result_performance: DataFrame, param_loc: List[Any], comment: str, extension_input_file: str = '.txt', extension_input_parameter_file: str = '.txt') → None

Write intermediate and raw data to a directory in zipped form.

Parameters:
  • directory (str) – Directory to write to.

  • ident (str) – Identifier to create a subdirectory for this submission.

  • input_file_obj (Any) – File-like object representing the raw input file.

  • result_performance (pd.DataFrame) – The result performance DataFrame.

  • param_loc (List[Any]) – List of paths to parameter files that need to be copied.

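  • comment (str) – User comment for the submission.

  • extension_input_file (str, optional) – File extension of the input file. Defaults to '.txt'.

  • extension_input_parameter_file (str, optional) – File extension of the input parameter file. Defaults to '.txt'.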
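Writing a submission's files in zipped form into a subdirectory named after `ident` can be sketched with the standard library. This is an illustration under stated assumptions, not the module's implementation: the helper `write_submission_zip`, the archive name `submission.zip`, and passing file contents as bytes are hypothetical choices.

```python
import zipfile
from pathlib import Path
from typing import Dict


def write_submission_zip(directory: str, ident: str, files: Dict[str, bytes]) -> Path:
    """Write files into <directory>/<ident>/submission.zip and return its path."""
    target_dir = Path(directory) / ident
    # Create the per-submission subdirectory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    zip_path = target_dir / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return zip_path
```

A call such as `write_submission_zip("results", "abc123", {"comment.txt": b"first run"})` would create `results/abc123/submission.zip` under these assumptions.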

write_json_local_development(temporary_datapoints: DataFrame, datapoint_params: dict) → str

Write the datapoints to a JSON file for local development.

Parameters:
  • temporary_datapoints (pd.DataFrame) – Temporary data points.

  • datapoint_params (dict) – Data point parameters.

Returns:

The path to the written JSON file.

Return type:

str
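Writing data points and their parameters to a JSON file and returning the file's path can be sketched as below. The function name, the output file name `datapoints_local.json`, the `out_dir` argument, and the payload layout are assumptions for illustration; the file layout the module actually produces is not described here.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_json_for_local_dev(
    records: List[Dict[str, object]], params: Dict[str, object], out_dir: str
) -> str:
    """Dump records and parameters to one JSON file and return its path."""
    out_path = Path(out_dir) / "datapoints_local.json"
    payload = {"datapoints": records, "parameters": params}
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return str(out_path)
```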