proteobench.modules.denovo.denovo_base module
- class proteobench.modules.denovo.denovo_base.DeNovoModule(token: str | None, proteobench_repo_name: str, proteobot_repo_name: str, parse_settings_dir: str, module_id: str)
Bases: object
Base Module for De Novo.
- Parameters:
token (Optional[str]) – The GitHub token.
proteobench_repo_name (str) – The name of the ProteoBench repository.
proteobot_repo_name (str) – The name of the ProteoBot repository.
parse_settings_dir (str) – The directory containing parse settings.
module_id (str) – The module identifier for configuration.
- EXTRACT_PARAMS_DICT: Dict[str, Any] = {'AdaNovo': <function extract_params>, 'Casanovo': <function extract_params>, 'DeepNovo': <function extract_params>, 'InstaNovo': <function extract_params>, 'Pi-HelixNovo': <function extract_params>, 'Pi-PrimeNovo': <function extract_params>, 'PointNovo': <function extract_params>}
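The mapping above pairs each supported tool name with a parameter-parsing function. A minimal sketch of that dispatch pattern follows. It is not ProteoBench's implementation: the two parser functions, the `EXTRACTORS` dictionary, `load_params`, and the file name are hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-ins for the per-tool extract_params functions.
def extract_params_casanovo(path: str) -> Dict[str, Any]:
    return {"software_name": "Casanovo", "source": path}


def extract_params_instanovo(path: str) -> Dict[str, Any]:
    return {"software_name": "InstaNovo", "source": path}


# Same shape as EXTRACT_PARAMS_DICT: tool name -> parser function.
EXTRACTORS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "Casanovo": extract_params_casanovo,
    "InstaNovo": extract_params_instanovo,
}


def load_params(path: str, input_format: str) -> Dict[str, Any]:
    """Look up the parser registered for input_format and apply it to path."""
    try:
        extractor = EXTRACTORS[input_format]
    except KeyError:
        raise ValueError(f"Unsupported input format: {input_format!r}") from None
    return extractor(path)


# "config.yaml" is a placeholder file name; the stand-in parsers never open it.
params = load_params("config.yaml", "Casanovo")
```

Keeping the lookup in one dictionary means that supporting another tool only requires registering one more entry.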
- add_current_data_point(current_datapoint: Series, all_datapoints: DataFrame | None = None) -> DataFrame
Add the current data point to the previous data points. If no previous data points are provided, load them from file.
- Parameters:
current_datapoint (pd.Series) – The current data point to add.
all_datapoints (Optional[pd.DataFrame]) – Data points from previous runs. Loaded from GitHub repo if None.
- Returns:
A DataFrame with the current data point added.
- Return type:
pd.DataFrame
- benchmarking(input_file: str, input_format: str, user_input: dict, all_datapoints: DataFrame | None, default_cutoff_min_prec: int = 3) -> tuple[DataFrame, DataFrame, DataFrame]
Main workflow of the module. Used to benchmark workflow results.
- Parameters:
input_file (str) – Path to the workflow output file.
input_format (str) – Format of the workflow output file.
user_input (dict) – User-provided parameters for plotting.
all_datapoints (Optional[pd.DataFrame]) – DataFrame containing all datapoints from the ProteoBench repo.
default_cutoff_min_prec (int, optional) – Minimum number of runs an ion has to be identified in. Defaults to 3.
- Returns:
A tuple containing the intermediate data structure, all data points, and the input DataFrame.
- Return type:
tuple[DataFrame, DataFrame, DataFrame]
- check_new_unique_hash(datapoints: DataFrame) -> bool
Check if the new data point has a unique hash.
- Parameters:
datapoints (pd.DataFrame) – Data points.
- Returns:
Whether the new data point has a unique hash.
- Return type:
bool
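A uniqueness check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method's actual logic: the function name `has_unique_hash`, the `intermediate_hash` key, and the convention that the newest data point is the last entry are all hypothetical, and plain dictionaries stand in for DataFrame rows.

```python
from typing import Dict, List


def has_unique_hash(
    datapoints: List[Dict[str, str]],
    hash_key: str = "intermediate_hash",  # assumed column name
) -> bool:
    """Return True if the newest (last) entry's hash differs from all earlier ones."""
    if not datapoints:
        return True
    # Assumption: the new data point is the last entry in the list.
    *previous, newest = datapoints
    return all(row[hash_key] != newest[hash_key] for row in previous)


existing = [{"intermediate_hash": "a1"}, {"intermediate_hash": "b2"}]
is_new = has_unique_hash(existing + [{"intermediate_hash": "c3"}])
is_duplicate = not has_unique_hash(existing + [{"intermediate_hash": "a1"}])
```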
- clone_pr(temporary_datapoints: DataFrame, datapoint_params: Any, remote_git: str, submission_comments: str = 'no comments') -> str
Clone the repo and open a pull request with the new data points.
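- Parameters:
temporary_datapoints (pd.DataFrame)
datapoint_params (Any)
remote_git (str)
submission_comments (str, optional) – Defaults to 'no comments'.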
- Returns:
The URL of the created pull request.
- Return type:
str
- static filter_data_point(all_datapoints: DataFrame, default_val_slider: int = 3) -> DataFrame
Filter the data points based on predefined criteria.
- Parameters:
all_datapoints (pd.DataFrame) – All data points.
default_val_slider (int, optional) – The minimum number of observations for filtering. Defaults to 3.
- Returns:
A DataFrame containing the filtered data points.
- Return type:
pd.DataFrame
- get_plot_generator() -> PlotGeneratorBase
Get the plot generator for this module.
- Returns:
The plot generator instance for creating module-specific plots.
- Return type:
PlotGeneratorBase
- is_implemented() -> bool
Return whether the module is fully implemented.
- Returns:
Always returns True in this implementation.
- Return type:
bool
- load_params_file(input_file: List[str], input_format: str, **kwargs) -> ProteoBenchParameters
Load parameters from a metadata file depending on its format.
- Parameters:
input_file (List[str])
input_format (str)
**kwargs
- Returns:
The parameters for the module.
- Return type:
ProteoBenchParameters
- obtain_all_data_points(all_datapoints: DataFrame | None = None) -> DataFrame
Load all data points. If none are provided, load them from file.
- Parameters:
all_datapoints (Optional[pd.DataFrame]) – All data points. Loaded from the GitHub repo if None.
- Returns:
A DataFrame containing all data points.
- Return type:
pd.DataFrame
- write_intermediate_raw(dir: str, ident: str, input_file_obj: Any, result_performance: DataFrame, param_loc: List[Any], comment: str, extension_input_file: str = '.txt', extension_input_parameter_file: str = '.txt') -> None
Write intermediate and raw data to a directory in zipped form.
- Parameters:
dir (str) – Directory to write to.
ident (str) – Identifier to create a subdirectory for this submission.
input_file_obj (Any) – File-like object representing the raw input file.
result_performance (pd.DataFrame) – The result performance DataFrame.
param_loc (List[Any]) – List of paths to parameter files that need to be copied.
comment (str) – User comment for the submission.
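Writing a submission to a per-identifier subdirectory in zipped form could look roughly like the sketch below. It uses only the standard library and is not the method's actual implementation: the function name `write_zipped_submission`, the archive name, and the member file names are assumptions made for illustration, and the performance DataFrame and parameter files are left out for brevity.

```python
import io
import os
import tempfile
import zipfile


def write_zipped_submission(base_dir, ident, input_file_obj, comment,
                            extension_input_file=".txt"):
    """Create base_dir/ident and write the raw input and comment into one zip archive."""
    submission_dir = os.path.join(base_dir, ident)
    os.makedirs(submission_dir, exist_ok=True)
    # Hypothetical archive name; the real naming scheme is not documented here.
    zip_path = os.path.join(submission_dir, f"{ident}_data.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Member names are illustrative only.
        archive.writestr(f"input_file{extension_input_file}", input_file_obj.read())
        archive.writestr("comment.txt", comment)
    return zip_path


# Usage: write into a temporary directory and list the archive members.
with tempfile.TemporaryDirectory() as tmp:
    path = write_zipped_submission(
        tmp, "run_001", io.StringIO("peptide\tscore\n"), "test run"
    )
    with zipfile.ZipFile(path) as archive:
        names = sorted(archive.namelist())
```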