Adding a new module#

Please use the template for adding a new benchmarking module.

Here we provide a comprehensive overview of how to set up a new module in ProteoBench

Naming convention#

We suggest to understand the following terms as they are crucial components:

Module: All code and definitions for creating and comparing benchmarks of a new data type

Intermediate data structure (DataFrame): Data structure needed for the calculation of the datapoint. It contains the transformed and annotated data of an uploaded data file.

Datapoint: Metadata and benchmarking metrics for a given data set. A datapoint is the data needed for the benchmarking and should also be represented by a json object.

Programmatic structure#

The modules are located in the proteobench/modules <https://github.com/Proteobench/ProteoBench/tree/main/proteobench/modules>_ directory. We separated the benchmarking modules into a different steps that allow for a more modular and portable implementation.

Backend

Each module implementation should contain the following classes:

  1. Module() contains the main functions reading the and processing the uploaded data set, generating the _intermediate_ structure and creating the _datapoint_, as well as adding it to our collection of _datapoints_.

  2. ParseInputs() interfaces with the Streamlit interface providing formatting parameter definitions to create a standardized format from the uploaded files with respect to the given file format (e.g. MaxQuant output file)

  3. Datapoint() is the data structure of _Datapoint_. It contains data set properties from the acquisition and processing (e.g. used peptide fdr) and functions to calculate the benchmarking metrics.

  4. PlotDataPoint() are the functions to visualize the benchmarking metrics from the data points.

  5. ParseInputs() contains the functions to parse the uploaded data files into the _intermediate_ structure. The input file parameters should be defined in the toml file like proteobench/modules/template/io_parse_settings/parse_settings_format1.toml <https://github.com/Proteobench/ProteoBench/blob/main/proteobench/modules/template/io_parse_settings/parse_settings_format1.toml>_.

Web interface

The web interface is written in Streamlit. Each module gets assigned a specific “page”. There are only few changes necessary as the main calculations are done in

StreamlitUI() contains the functions to create the web interface for the module.

_Relevant functions:_

generate_input_field() creates the input fields for the metadate and the input file format and type. They are given by in the proteobench/modules/template/io_parse_settings <https://github.com/Proteobench/ProteoBench/tree/main/proteobench/modules/template/io_parse_settings>_ folder, same as for the backend of the module.

webinterface.pages.TEMPLATE.StreamlitUI.generate_results() gathers the data from the backend and displays them in several figures. Here you will need to edit and adapt the code to show the respective figures with the right metadata.

webinterface.pages.TEMPLATE.WebpageTexts() contains the text for the different parts of the web interface.

Change the text and the field names accordingly

Documentation

We strongly recommend to keep documenting your code. The documentation is written in Sphinx and can be found in the docs <https://github.com/Proteobench/ProteoBench/tree/main/docs>_ folder.

  1. docs/proteobench/modules.rst <https://github.com/Proteobench/ProteoBench/tree/main/docs/proteobench/modules.rst>_ Here you can add a link to your new module

  2. docs/proteobench/template.rst <https://github.com/Proteobench/ProteoBench/tree/main/docs/proteobench/template.rst>_ This template can be used to creat your own documentation file in reStructuredText (rst) format.

  3. docs/webinterface/webinterface.rst <https://github.com/Proteobench/ProteoBench/tree/main/docs/webinterface/webinterface.rst>_ Here you should add a link to the new page in the web interface.

To work on the documentation and get a live preview, install the requirements and run sphinx-autobuild:

pip install .[docs]
sphinx-autobuild  --watch ./ms2rescore ./docs/source/ ./docs/_build/html/

Then browse to http://localhost:8000 to watch the live preview.

Note

Ensure to have changed all occurrences of template to the name of your new module.

Checklist#

This checklist is meant to help you add a new module to ProteoBench. It is not meant to be exhaustive, but it should cover the most important steps.

  1. Copy the template folder in the proteobench/modules directory to a new folder in the same directory. The name of the new directory should be the name of the module.

  2. Define the input formats in the toml files of the proteobench/modules/my_module/io_parse_settings directory and proteobench.modules.my_module.parse_settings.py.

  3. Modify the upload prodecures in the proteobench/modules/my_module/parse.py. This will ensure a standardized data structure for the benchmarking independently from the input file format.

  4. Modify proteobench/modules/my_module/datapoint.py to define the requested metadata about the data acquisition and the benchmarking metrics, all to be stored in a datapoint. You might need to add some function(s) for further processing the standardized data structure.

  5. Modify proteobench/modules/my_module/plot.py to create the figures for the web interface.

  6. Modify proteobench/modules/my_module/module.py to harmonize all procedures called in the benchmarking function.

  7. Copy webinterface.pages.TEMPLATE <https://github.com/Proteobench/ProteoBench/tree/main/webinterface/pages/TEMPLATE>_ to webinterface.pages.my_module and modify the functions to display the figures. Adapt the code according to ensure loading the right figures and data points.

  8. Copy Template to developer-guide/api/proteobench/my_module and modify the documentation accordingly. Add entries to Modules and Web interface