proteobench.modules.denovo.denovo_base module

class proteobench.modules.denovo.denovo_base.DeNovoModule(token: str | None, proteobench_repo_name: str, proteobot_repo_name: str, parse_settings_dir: str, module_id: str)

Bases: object

Base Module for De Novo.

Parameters:

token (Optional[str]) – The GitHub token.
proteobench_repo_name (str) – The name of the ProteoBench repository.
proteobot_repo_name (str) – The name of the ProteoBot repository.
parse_settings_dir (str) – The directory containing parse settings.
module_id (str) – The module identifier for configuration.

EXTRACT_PARAMS_DICT: Dict[str, Any] = {'AdaNovo': <function extract_params>, 'Casanovo': <function extract_params>, 'DeepNovo': <function extract_params>, 'InstaNovo': <function extract_params>, 'Pi-HelixNovo': <function extract_params>, 'Pi-PrimeNovo': <function extract_params>, 'PointNovo': <function extract_params>}
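
The mapping above pairs each supported tool name with its own extract_params function. A minimal sketch of that lookup pattern is shown below. The two parser functions, the `software_name` key, and `load_params` are hypothetical stand-ins for illustration, not ProteoBench's actual parsers.

```python
from typing import Any, Callable, Dict


def extract_params_casanovo(path: str) -> Dict[str, Any]:
    # Hypothetical stand-in; a real parser would read the tool's settings file.
    return {"software_name": "Casanovo", "source": path}


def extract_params_instanovo(path: str) -> Dict[str, Any]:
    # Hypothetical stand-in for a second tool.
    return {"software_name": "InstaNovo", "source": path}


# Tool name -> parser function, mirroring the shape of EXTRACT_PARAMS_DICT.
EXTRACT_PARAMS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "Casanovo": extract_params_casanovo,
    "InstaNovo": extract_params_instanovo,
}


def load_params(path: str, input_format: str) -> Dict[str, Any]:
    """Dispatch to the parser registered for ``input_format``."""
    try:
        parser = EXTRACT_PARAMS[input_format]
    except KeyError:
        raise ValueError(f"Unsupported format: {input_format!r}") from None
    return parser(path)
```

Keying the dictionary on the tool name means supporting a new tool only requires registering one more entry.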

add_current_data_point(current_datapoint: Series, all_datapoints: DataFrame | None = None) → DataFrame

Add the current data point to the previous data points, loading the previous ones first if none are provided.

Parameters:

current_datapoint (pd.Series) – The current data point to add.
all_datapoints (Optional[pd.DataFrame]) – Data points from previous runs. Loaded from GitHub repo if None.

Returns:

A DataFrame with the current data point added.

Return type:

pd.DataFrame

benchmarking(input_file: str, input_format: str, user_input: dict, all_datapoints: DataFrame | None, default_cutoff_min_prec: int = 3) → tuple[DataFrame, DataFrame, DataFrame]

Main workflow of the module. Used to benchmark workflow results.

Parameters:

input_file (str) – Path to the workflow output file.
input_format (str) – Format of the workflow output file.
user_input (dict) – User-provided parameters for plotting.
all_datapoints (Optional[pd.DataFrame]) – DataFrame containing all datapoints from the ProteoBench repo.
default_cutoff_min_prec (int, optional) – Minimum number of runs an ion has to be identified in. Defaults to 3.

Returns:

A tuple containing the intermediate data structure, all data points, and the input DataFrame.

Return type:

tuple[DataFrame, DataFrame, DataFrame]
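
Because benchmarking returns a positional three-tuple, it can help to name the elements at the call site. The sketch below assumes only the signature documented above. `BenchmarkResult` and `run_benchmark` are hypothetical helpers, and `module` is any object exposing a compatible `benchmarking` method.

```python
from typing import Any, NamedTuple, Optional


class BenchmarkResult(NamedTuple):
    """Named view of the tuple, in the documented return order."""
    intermediate: Any      # intermediate data structure
    all_datapoints: Any    # all data points, including the new one
    input_df: Any          # the parsed input DataFrame


def run_benchmark(
    module: Any,
    input_file: str,
    input_format: str,
    user_input: dict,
    all_datapoints: Optional[Any] = None,
) -> BenchmarkResult:
    """Call ``module.benchmarking`` and wrap its three return values."""
    return BenchmarkResult(
        *module.benchmarking(input_file, input_format, user_input, all_datapoints)
    )
```

Passing `all_datapoints=None` leaves it to the module to obtain the existing data points itself.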

check_new_unique_hash(datapoints: DataFrame) → bool

Check if the new data point has a unique hash.

Parameters:

datapoints (pd.DataFrame) – Data points.

Returns:

Whether the new data point has a unique hash.

Return type:

bool
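
The idea behind a uniqueness check of this kind can be sketched with the standard library alone. The hash function (SHA-1 here) and the helper names `content_hash` and `is_new_hash` are assumptions for illustration; the source does not state how the module computes or stores its hashes.

```python
import hashlib
from typing import Iterable


def content_hash(payload: bytes) -> str:
    """Return a hex digest identifying a submission's content."""
    return hashlib.sha1(payload).hexdigest()


def is_new_hash(new_hash: str, existing_hashes: Iterable[str]) -> bool:
    """True if ``new_hash`` does not already occur among the existing hashes."""
    return new_hash not in set(existing_hashes)
```

A check like this guards against submitting the same result file twice.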

clone_pr(temporary_datapoints: DataFrame, datapoint_params: Any, remote_git: str, submission_comments: str = 'no comments') → str

Clone the repo and open a pull request with the new data points.

Parameters:

temporary_datapoints (pd.DataFrame) – Temporary data points.
datapoint_params (Any) – Data point parameters.
remote_git (str) – Remote Git repository URL.
submission_comments (str, optional) – Comments to be included in the pull request. Defaults to “no comments”.

Returns:

The URL of the created pull request.

Return type:

str

static filter_data_point(all_datapoints: DataFrame, default_val_slider: int = 3) → DataFrame

Filter the data points based on predefined criteria.

Parameters:

all_datapoints (pd.DataFrame) – All data points.
default_val_slider (int, optional) – The minimum number of observations for filtering. Defaults to 3.

Returns:

A DataFrame containing the filtered data points.

Return type:

pd.DataFrame

get_plot_generator() → PlotGeneratorBase

Get the plot generator for this module.

Returns:

The plot generator instance for creating module-specific plots.

Return type:

PlotGeneratorBase

is_implemented() → bool

Return whether the module is fully implemented.

Returns:

Always returns True in this implementation.

Return type:

bool

load_params_file(input_file: List[str], input_format: str, **kwargs) → ProteoBenchParameters

Load parameters from a metadata file depending on its format.

Parameters:

input_file (List[str]) – Paths to the metadata file(s).
input_format (str) – Format of the metadata file.

Returns:

The parameters for the module.

Return type:

ProteoBenchParameters

obtain_all_data_points(all_datapoints: DataFrame | None = None) → DataFrame

Return all data points, loading them if none are provided.

Parameters:

all_datapoints (Optional[pd.DataFrame]) – All data points. Loaded from the GitHub repo if None.

Returns:

A DataFrame containing all data points.

Return type:

pd.DataFrame

write_intermediate_raw(dir: str, ident: str, input_file_obj: Any, result_performance: DataFrame, param_loc: List[Any], comment: str, extension_input_file: str = '.txt', extension_input_parameter_file: str = '.txt') → None

Write intermediate and raw data to a directory in zipped form.

Parameters:

dir (str) – Directory to write to.
ident (str) – Identifier to create a subdirectory for this submission.
input_file_obj (Any) – File-like object representing the raw input file.
result_performance (pd.DataFrame) – The result performance DataFrame.
param_loc (List[Any]) – List of paths to parameter files that need to be copied.
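comment (str) – User comment for the submission.
extension_input_file (str, optional) – File extension for the input file. Defaults to '.txt'.
extension_input_parameter_file (str, optional) – File extension for the input parameter file. Defaults to '.txt'.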
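
As a rough illustration of writing a submission to a per-identifier subdirectory in zipped form, the sketch below uses only the standard library. The function name `write_submission_zip`, the archive name, and the member file names are all assumptions; the source does not specify the actual layout.

```python
import os
import zipfile


def write_submission_zip(
    base_dir: str,
    ident: str,
    raw_bytes: bytes,
    result_csv: str,
    comment: str,
    extension_input_file: str = ".txt",
) -> str:
    """Zip raw input, results, and comment under <base_dir>/<ident>/."""
    out_dir = os.path.join(base_dir, ident)
    os.makedirs(out_dir, exist_ok=True)  # one subdirectory per submission

    # Hypothetical archive and member names, chosen for this sketch only.
    zip_path = os.path.join(out_dir, f"{ident}_data.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"input_file{extension_input_file}", raw_bytes)
        zf.writestr("result_performance.csv", result_csv)
        zf.writestr("comment.txt", comment)
    return zip_path
```

Unlike write_intermediate_raw, which returns None, this sketch returns the archive path so the result is easy to inspect.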

write_json_local_development(temporary_datapoints: DataFrame, datapoint_params: dict) → str

Write the datapoints to a JSON file for local development.

Parameters:

temporary_datapoints (pd.DataFrame) – Temporary data points.
datapoint_params (dict) – Data point parameters.

Returns:

The path to the written JSON file.

Return type:

str
