proteobench.modules.quant.quant_base_module module
Quant Base Module.
- class proteobench.modules.quant.quant_base_module.QuantModule(token: str | None, proteobench_repo_name: str, proteobot_repo_name: str, parse_settings_dir: str, module_id: str, branch: str | None = None)
Bases: object
Base Module for Quantification.
- Parameters:
token (Optional[str]) – The GitHub token.
proteobench_repo_name (str) – The name of the ProteoBench repository.
proteobot_repo_name (str) – The name of the ProteoBot repository.
parse_settings_dir (str) – The directory containing parse settings.
module_id (str) – The module identifier for configuration.
- EXTRACT_PARAMS_DICT: Dict[str, Any] = {'AlphaDIA': <function extract_params>, 'AlphaPept': <function extract_params>, 'DIA-NN': <function extract_params>, 'FragPipe': <function extract_params>, 'FragPipe (DIA-NN quant)': <function extract_params>, 'MSAID': <function extract_params>, 'MSAngel': <function extract_params>, 'MaxQuant': <function extract_params>, 'MetaMorpheus': <function extract_params>, 'PEAKS': <function extract_params>, 'ProlineStudio': <function extract_params>, 'Proteome Discoverer': <function read_spectronaut_settings>, 'Sage': <function extract_params>, 'Spectronaut': <function read_spectronaut_settings>, 'WOMBAT': <function extract_params>, 'i2MassChroQ': <function extract_params>, 'quantms': <function extract_params>}
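The attribute above has the shape of a dispatch table: a tool name maps to the function that parses that tool's parameter file. The following is a minimal sketch of that pattern, not ProteoBench's code. The names `parse_tool_a`, `parse_tool_b`, `PARSERS`, and `load_params` are hypothetical, and the parsers return placeholder dictionaries instead of reading real files.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-in parsers; the real extract_params functions live in ProteoBench.
def parse_tool_a(path: str) -> Dict[str, Any]:
    return {"software_name": "ToolA", "source": path}


def parse_tool_b(path: str) -> Dict[str, Any]:
    return {"software_name": "ToolB", "source": path}


# Same shape as EXTRACT_PARAMS_DICT: tool name -> parser callable.
PARSERS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "ToolA": parse_tool_a,
    "ToolB": parse_tool_b,
}


def load_params(path: str, input_format: str) -> Dict[str, Any]:
    """Look up the parser registered for input_format and apply it to path."""
    try:
        parser = PARSERS[input_format]
    except KeyError:
        # Fail with a clear message when the format has no registered parser.
        raise ValueError(f"Unsupported input format: {input_format!r}") from None
    return parser(path)
```

With a table like this, supporting a new tool means adding one entry rather than another branch in the loading code.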
- add_current_data_point(current_datapoint: Series, all_datapoints: DataFrame | None = None) → DataFrame
Add the current data point to the previous data points. The previous data points are loaded from file if none are given.
- Parameters:
current_datapoint (pd.Series) – The current data point to add.
all_datapoints (Optional[pd.DataFrame]) – Data points from previous runs. Loaded from GitHub repo if None.
- Returns:
A DataFrame with the current data point added.
- Return type:
pd.DataFrame
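The fallback described above (use the supplied data points, otherwise load them) can be sketched with plain lists of dictionaries standing in for the pandas objects. This is only an illustration under that assumption: `load_previous_points` and `add_current_point` are hypothetical names, and the loader returns a hard-coded row rather than reading from a repository.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def load_previous_points() -> List[Row]:
    # Hypothetical loader: stand-in for fetching earlier data points from a repository.
    return [{"id": "run-1", "score": 0.42}]


def add_current_point(
    current: Row,
    all_points: Optional[List[Row]] = None,
    loader: Callable[[], List[Row]] = load_previous_points,
) -> List[Row]:
    """Return a new list containing the previous points followed by current."""
    if all_points is None:
        # Nothing was passed in, so fall back to the loader.
        all_points = loader()
    # Build a new list so the caller's list is not mutated.
    return [*all_points, current]
```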
- benchmarking(input_file: str, input_format: str, user_input: dict, all_datapoints: DataFrame | None, default_cutoff_min_prec: int = 3, input_file_secondary: str = None) → tuple[DataFrame, DataFrame, DataFrame]
Main workflow of the module. Used to benchmark workflow results.
- Parameters:
input_file (str) – Path to the workflow output file.
input_format (str) – Format of the workflow output file.
user_input (dict) – User-provided parameters for plotting.
all_datapoints (Optional[pd.DataFrame]) – DataFrame containing all datapoints from the ProteoBench repo.
default_cutoff_min_prec (int, optional) – Minimum number of runs a precursor ion has to be identified in. Defaults to 3.
input_file_secondary (str, optional) – Path to a secondary input file (used for some formats like AlphaDIA).
- Returns:
A tuple containing the intermediate data structure, all data points, and the input DataFrame.
- Return type:
tuple[DataFrame, DataFrame, DataFrame]
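The `default_cutoff_min_prec` parameter is described as the minimum number of runs in which a precursor ion must be identified. A minimal sketch of such a cutoff follows, assuming a hypothetical layout in which each precursor maps to one intensity per run and `None` marks a run without an identification. The function name `filter_by_min_runs` is illustrative and is not part of ProteoBench.

```python
from typing import Dict, List, Optional


def filter_by_min_runs(
    intensities: Dict[str, List[Optional[float]]],
    min_runs: int = 3,
) -> Dict[str, List[Optional[float]]]:
    """Keep only precursors identified (non-None) in at least min_runs runs."""
    kept: Dict[str, List[Optional[float]]] = {}
    for precursor, values in intensities.items():
        # Count the runs in which this precursor has a value.
        n_identified = sum(value is not None for value in values)
        if n_identified >= min_runs:
            kept[precursor] = values
    return kept
```

A precursor seen in three of four runs passes the default cutoff of 3, while one seen in a single run is dropped.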
- check_new_unique_hash(datapoints: DataFrame) → bool
Check if the new data point has a unique hash.
- Parameters:
datapoints (pd.DataFrame) – Data points.
- Returns:
Whether the new data point has a unique hash.
- Return type:
bool
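The source does not say how the hash is computed or stored. As one possible illustration of a uniqueness check, the sketch below hashes a result dictionary with SHA-1 over its sorted JSON form and compares the newest entry against the earlier ones. The `result_hash` key and both function names are assumptions made for this example.

```python
import hashlib
import json
from typing import Any, Dict, List


def hash_results(results: Dict[str, Any]) -> str:
    """Hash a results dictionary deterministically via its sorted JSON form."""
    payload = json.dumps(results, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def has_unique_hash(datapoints: List[Dict[str, Any]], key: str = "result_hash") -> bool:
    """Return True if the last entry's hash does not occur among earlier entries."""
    if not datapoints:
        return True
    # Treat the last element as the new data point (an assumption of this sketch).
    *previous, newest = datapoints
    return all(row[key] != newest[key] for row in previous)
```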
- clone_pr(temporary_datapoints: DataFrame, datapoint_params: Any, remote_git: str, submission_comments: str = 'no comments') → str
Clone the repo and open a pull request with the new data points.
- Parameters:
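temporary_datapoints (pd.DataFrame)
datapoint_params (Any)
remote_git (str)
submission_comments (str, optional) – Defaults to 'no comments'.
- Returns: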
The URL of the created pull request.
- Return type:
str
- static filter_data_point(all_datapoints: DataFrame, default_val_slider: int = 3) → DataFrame
Filter the data points based on predefined criteria.
- Parameters:
all_datapoints (pd.DataFrame) – All data points.
default_val_slider (int, optional) – The minimum number of observations for filtering. Defaults to 3.
- Returns:
A DataFrame containing the filtered data points.
- Return type:
pd.DataFrame
- get_plot_generator() → PlotGeneratorBase
Get the plot generator for LFQ Ion plots.
- Returns:
The plot generator instance.
- Return type:
PlotGeneratorBase
- is_implemented() → bool
Return whether the module is fully implemented.
- Returns:
Always returns True in this implementation.
- Return type:
bool
- load_params_file(input_file: List[str], input_format: str, json_file: str) → ProteoBenchParameters
Load parameters from a metadata file depending on its format.
- Parameters:
input_file (List[str])
input_format (str)
json_file (str)
- Returns:
The parameters for the module.
- Return type:
ProteoBenchParameters
- obtain_all_data_points(all_datapoints: DataFrame | None = None) → DataFrame
Load all data points. They are loaded from file if none are given.
- Parameters:
all_datapoints (Optional[pd.DataFrame]) – All data points. Loaded from the GitHub repo if None.
- Returns:
A DataFrame containing all data points.
- Return type:
pd.DataFrame
- write_intermediate_raw(directory: str, ident: str, input_file_obj: Any, result_performance: DataFrame, param_loc: List[Any], comment: str, extension_input_file: str = '.txt', extension_input_parameter_file: str = '.txt', input_file_secondary_obj: Any = None) → None
Write intermediate and raw data to a directory in zipped form.
- Parameters:
directory (str) – Directory to write to.
ident (str) – Identifier to create a subdirectory for this submission.
input_file_obj (Any) – File-like object representing the raw input file.
result_performance (pd.DataFrame) – The result performance DataFrame (intermediate data).
param_loc (List[Any]) – List of paths to parameter files that need to be copied.
comment (str) – User comment for the submission.
input_file_secondary_obj (Any, optional) – File-like object representing a secondary input file (e.g., for AlphaDIA).
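Writing a submission into a per-identifier subdirectory as a zip archive can be sketched with the standard library alone. This is a simplified illustration, not the actual implementation. The function name `write_submission_zip`, the archive name `submission.zip`, and the member file names are all assumptions, and the result table is passed as CSV text instead of a DataFrame.

```python
import os
import zipfile
from typing import BinaryIO


def write_submission_zip(
    directory: str,
    ident: str,
    raw_input: BinaryIO,
    result_csv: str,
    comment: str,
) -> str:
    """Write the raw input, result table, and comment into <directory>/<ident>/submission.zip."""
    target_dir = os.path.join(directory, ident)
    # Create the per-submission subdirectory if it does not exist yet.
    os.makedirs(target_dir, exist_ok=True)
    zip_path = os.path.join(target_dir, "submission.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Member names below are hypothetical placeholders.
        archive.writestr("input_file.txt", raw_input.read())
        archive.writestr("result_performance.csv", result_csv)
        archive.writestr("comment.txt", comment)
    return zip_path
```

Calling it with an `io.BytesIO` object as `raw_input` and a temporary directory as `directory` yields a single archive holding the three members.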