Adding a new module
This page gives an overview of how to set up a new module in ProteoBench. It currently focuses on quantification modules, for which you usually only need to check, and perhaps slightly modify, existing components. For an entirely new module type, wherever a step says ‘check, modify or add’, you will need to create a new version of that component.
Terms
The following terms capture the crucial components:
Module: All code and definitions for creating and comparing benchmarks of a new data type.
Intermediate data structure (DataFrame): Data structure needed for the calculation of the datapoint, e.g. QuantDatapoint. It contains the transformed and annotated data of an uploaded data file.
Datapoint: Metadata and benchmarking metrics for a given data set. A datapoint, e.g. QuantDatapoint, is the data needed for the benchmarking and should also be represented as a JSON object.
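As a rough illustration of what representing a datapoint as a JSON object means, the sketch below defines a hypothetical, heavily reduced dataclass and round-trips it through JSON with the standard library. The class name ToyDatapoint and all of its fields are assumptions for illustration; they are not the fields of the real QuantDatapoint.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToyDatapoint:
    """Hypothetical, reduced datapoint; NOT the real QuantDatapoint."""

    id: str  # hypothetical identifier of one submission
    software_name: str  # metadata: analysis tool that produced the input file
    software_version: str  # metadata: tool version
    peptide_fdr: float  # metadata from processing, e.g. the FDR threshold used
    results: dict = field(default_factory=dict)  # benchmarking metrics (names are made up)

    def to_json(self) -> str:
        # asdict() converts the dataclass, including the nested dict, into plain Python objects
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ToyDatapoint":
        # Rebuild the object from the JSON produced by to_json()
        return cls(**json.loads(text))


dp = ToyDatapoint(
    id="example_submission_1",
    software_name="ExampleTool",
    software_version="1.0",
    peptide_fdr=0.01,
    results={"example_metric": 0.21},
)
print(dp.to_json())
```

Using a dataclass keeps the datapoint fields explicit, while asdict() and json.dumps() give the JSON representation without custom serialization code.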
Naming convention
New modules and classes should be given descriptive names that fit into the existing naming schemes.
Go from general to specific properties, and make clear what distinguishes the module
from existing ones, e.g. DIAQuantPeptidoformModule.
The modules are stored in the Python package proteobench in the
modules subpackage: proteobench.modules.
Programmatic structure
The modules are located in the proteobench/modules directory. Each benchmarking module is split into separate steps, which makes the implementation more modular and portable.
Backend
Below we use a quant module as an example. To implement one, use or extend
the following classes and functions:
QuantModule contains the main functions for reading and processing the uploaded data set, generating the intermediate metric structure, creating the datapoint and adding it to our collection of datapoints. You can subclass it, implement the benchmarking method and the is_implemented method, and initialize it with custom parameters in the __init__ method.
proteobench/io/parsing/parse_ion.py provides the functions used to parse precursor data.
proteobench/io/parsing/io_parse_settings/parse_settings_files.toml links, per module, the settings used to parse the uploaded data files into the intermediate metric structure used by proteobench/io/parsing/parse_ion.py. The settings parameters should be defined in toml files in a folder for the module, such as proteobench/io/parsing/io_parse_settings/Quant/lfq/DDA/ion/ (for example parse_settings_alphadia). New data analysis software has to be added to load_input_file(), and the settings are parsed by ParseSettingsQuant, whose most important method is convert_to_standard_format().
QuantDatapoint is the data structure (a dataclass) of the DataPoint for quant modules. It contains data set properties from the acquisition and processing (e.g. the peptide FDR used).
PlotDataPoint is the class with methods to visualize the benchmarking metrics from the DataPoints.
QuantScores contains the functionality for calculating scores and also generates the intermediate output.
proteobench/io/params provides the functions used to parse parameter setting files of data analysis tools. The options for adapting the parsed results before submission are configured in a module-specific JSON file in proteobench/io/params/json/Quant.
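A minimal sketch of the subclassing pattern, under stated assumptions: the real QuantModule takes more parameters and does more work than shown here, so the base class below is a hypothetical stand-in that only mimics the split into __init__, is_implemented and benchmarking, letting the example run without ProteoBench installed. The method parameters, the in-memory input rows and the toy metric are illustrative; follow the signatures of an existing module in proteobench/modules when implementing a real one.

```python
from abc import ABC, abstractmethod
from typing import Optional


class QuantModuleStandIn(ABC):
    """Hypothetical stand-in for QuantModule so this sketch runs without ProteoBench."""

    def __init__(self, token: Optional[str], module_id: str):
        self.token = token  # assumption: e.g. credentials used when storing results
        self.module_id = module_id

    def is_implemented(self) -> bool:
        # The base class is not usable on its own
        return False

    @abstractmethod
    def benchmarking(self, input_rows, input_format: str, user_input: dict):
        """Turn an uploaded result into an intermediate structure and a datapoint."""


class ExampleQuantModule(QuantModuleStandIn):
    """Hypothetical new quant module; in ProteoBench you would subclass QuantModule."""

    def __init__(self, token: Optional[str] = None, module_id: str = "example_quant_module"):
        # Custom parameters are set in __init__
        super().__init__(token=token, module_id=module_id)
        self.supported_formats = {"ExampleTool"}  # hypothetical input format

    def is_implemented(self) -> bool:
        return True

    def benchmarking(self, input_rows, input_format: str, user_input: dict):
        if input_format not in self.supported_formats:
            raise ValueError(f"Unsupported input format: {input_format}")
        # Step 1, parsing: keep only rows that carry a quantitative value
        intermediate = [row for row in input_rows if row.get("intensity") is not None]
        # Step 2, scoring: a toy metric standing in for the real scores
        results = {"n_quantified": len(intermediate)}
        # Step 3, datapoint: user metadata plus the metrics
        datapoint = {"software_name": input_format, **user_input, "results": results}
        return intermediate, datapoint


module = ExampleQuantModule()
rows = [
    {"precursor": "PEPTIDEK|2", "intensity": 1.0e6},
    {"precursor": "SAMPLER|2", "intensity": None},
]
intermediate, datapoint = module.benchmarking(rows, "ExampleTool", {"software_version": "1.0"})
```

Keeping each step separate inside benchmarking() mirrors the modular split described above: parsing, scoring and datapoint creation can each be swapped out without touching the others.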
Web interface
The web interface is written in Streamlit. Each module is assigned its own
page. Only a few changes are necessary for a new module,
as the main calculations are done in
QuantUIObjects, which contains most of the
functionality for creating the web interface of each quantification module.
Warning
QuantUIObjects should be simplified.
webinterface.pages.pages_variables contains files with dataclasses that hold
the interface text for the different modules.
Relevant functions in QuantUIObjects
Tab 1: display_all_data_results_main() shows the description of the module. The description is defined in webinterface/pages/pages_variables, where we also define custom text and unique component names for each module (e.g. for the main plot), so that the same plot is not displayed on several pages of the Streamlit web interface.
Tab 2: display_submission_form() displays the submission form based on the module toml configurations in proteobench/io/parsing/io_parse_settings.
Tab 2.5: generate_current_data_plots() displays the metric plot if new results were added to the module.
Tab 3: display_all_data_results_submitted()
Tab 4: display_public_submission_ui() creates the input fields for the metadata and the input file format and type. These are defined in the proteobench/io/parsing/io_parse_settings folder, the same as for the backend of the module.
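To make the link between the toml configurations, the submission form and the backend more concrete, the sketch below parses one hypothetical settings document per input format, derives the list of formats a form could offer, and uses a column mapping to convert tool output into a standard format. The [general] and [mapper] tables, the column names and the helper functions are assumptions for illustration; ParseSettingsQuant and convert_to_standard_format() do more than this, so check the existing files in proteobench/io/parsing/io_parse_settings for the real layout.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older Pythons: the API-compatible third-party "tomli"
    import tomli as tomllib

# Hypothetical settings, one TOML document per input format. The table and column
# names are illustrative only, not copied from ProteoBench's settings files.
EXAMPLE_SETTINGS = {
    "ExampleTool": """
[general]
software_name = "ExampleTool"

[mapper]
"precursor_id" = "Precursor ion"
"run_name" = "Raw file"
"quantity" = "Intensity"
""",
}


def load_settings(texts):
    """Parse the TOML settings of each input format into nested dicts."""
    return {fmt: tomllib.loads(text) for fmt, text in texts.items()}


def available_input_formats(settings):
    """Sorted input-format names, e.g. the options of a selectbox in the submission form."""
    return sorted(settings)


def to_standard_format(rows, format_settings):
    """Rename tool-specific columns via the [mapper] table and drop unmapped columns."""
    mapper = format_settings["mapper"]
    return [{mapper[col]: value for col, value in row.items() if col in mapper} for row in rows]


settings = load_settings(EXAMPLE_SETTINGS)
formats = available_input_formats(settings)
```

Driving both the form options and the column renaming from the same files is what lets a new tool be supported largely by adding a settings file rather than new UI code.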
generate_results() gathers the data from the backend
and displays it in several figures. Here you will need to adapt the code
so that the figures for your module are shown with the right metadata.
Change the text and the field names accordingly in the dataclass
in webinterface.pages.pages_variables.
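For orientation, here is a hypothetical sketch of such a dataclass: module-specific text plus component keys prefixed with a module identifier, together with a small check that no key is reused across modules (Streamlit identifies widgets by their key, so duplicated keys cause clashes). The class names, field names and helper functions are invented for illustration and do not mirror the actual files in webinterface.pages.pages_variables.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExampleModuleTexts:
    """Hypothetical page variables for one module (text plus unique component keys)."""

    title: str = "Example quant module"
    description: str = "Short module description shown in the first tab."
    # Keys for Streamlit components, prefixed with a module id to keep them unique
    key_main_plot: str = "example_quant_module_main_plot"
    key_submission_form: str = "example_quant_module_submission_form"


@dataclass(frozen=True)
class OtherModuleTexts:
    """A second hypothetical module, used to show the uniqueness check."""

    title: str = "Other quant module"
    description: str = "Another module."
    key_main_plot: str = "other_quant_module_main_plot"
    key_submission_form: str = "other_quant_module_submission_form"


def component_keys(page_variables) -> list:
    """Collect the values of all fields whose name starts with 'key_'."""
    return [
        getattr(page_variables, f.name)
        for f in fields(page_variables)
        if f.name.startswith("key_")
    ]


def check_unique_keys(*all_page_variables) -> None:
    """Raise if two modules (or one module twice) use the same component key."""
    seen = set()
    for page_variables in all_page_variables:
        for key in component_keys(page_variables):
            if key in seen:
                raise ValueError(f"Duplicate component key: {key}")
            seen.add(key)


check_unique_keys(ExampleModuleTexts(), OtherModuleTexts())
```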
Storing results
Results are stored in separate GitHub repositories: the web interface first adds datapoints to a fork of the module-specific results repository. The core functionality is in proteobench.github.gh.
Make a new repository in the Proteobench organisation and give it a sensible name, e.g. Proteobench/Results_quant_ion_DDA.
Log in to the Proteobot organisation (ask the relevant people for the login details).
Fork the new repository from Proteobench to Proteobot.
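ProteoBench's own implementation lives in proteobench.github.gh and is not reproduced here. As a hedged illustration of adding a datapoint to a local clone of a results repository, the sketch below writes a datapoint dict as a JSON file named after a hash of its canonical JSON content, so identical datapoints end up in the same file. The helper name write_datapoint and the file-naming scheme are assumptions; committing and pushing to the Proteobot fork would then be done with git.

```python
import hashlib
import json
from pathlib import Path


def write_datapoint(datapoint: dict, results_dir: Path) -> Path:
    """Hypothetical helper: store one datapoint as <hash>.json in a local results clone."""
    # Canonical JSON (sorted keys, compact separators) so equal content gives an equal hash
    payload = json.dumps(datapoint, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    results_dir.mkdir(parents=True, exist_ok=True)
    path = results_dir / f"{digest}.json"
    path.write_text(payload, encoding="utf-8")
    return path
```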
Documentation
We strongly recommend documenting your code. The documentation is written in Markdown or reStructuredText and can be found in the docs folder. We use Sphinx and myst-parser to build the website.
docs/available-modules: here you can add a file for your new module, using any of the existing module descriptions as a template.
API documentation for your module will be added automatically. You can see it on the readthedocs page built specifically for your pull request.
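If you need to adjust the Sphinx configuration, enabling myst-parser takes one entry in the extensions list of conf.py. The sketch below is a minimal, hypothetical configuration and not a copy of ProteoBench's actual conf.py; the autodoc extension is included only as an example of how API documentation can be pulled from docstrings.

```python
# Minimal Sphinx conf.py sketch (hypothetical; compare with the repository's own conf.py)
project = "ProteoBench"

extensions = [
    "myst_parser",  # lets Sphinx read Markdown (.md) files using MyST syntax
    "sphinx.ext.autodoc",  # generates API documentation from docstrings
]

# Map file extensions to parsers so both reStructuredText and Markdown sources are built
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```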
To work locally on the documentation and get a live preview, install the requirements and run sphinx-autobuild:
pip install '.[docs]'
# selecting the docs folder to watch for changes
sphinx-autobuild --watch ./docs ./docs/source/ ./docs/_build/html/
Then browse to http://localhost:8000 to watch the live preview.
Checklist
This checklist is meant to help you add a new module to ProteoBench. It is not exhaustive, but it should cover the most important steps. To see which files need to change when adding a module, have a look at a recent example: PR 703 for adding a quant module based on other quant modules, or PR 727 for adding a new type of module.
Subclass QuantModule and replace the benchmarking() method with your own implementation. You can copy from other modules in the folder proteobench/modules.
Define the input formats using toml files in a new subfolder of proteobench/io/parsing/io_parse_settings.
Check, modify or add parsing procedures in proteobench/io/parsing, e.g. parse_ion.py or parse_peptidoform.py.
Check, modify or add datapoint classes in proteobench/datapoint for storing the intermediate data structure.
Check, modify or add the score classes to compute the scoring metrics in proteobench/score
Check, modify or add plotting classes to proteobench/plotting to create the figures for the web interface.
Check, modify or add parameter parsing for new tools in proteobench/io/params
Add a new page defining the module web interface to webinterface/pages, using the base functionality and adding pages_variables dataclasses.
Create a new results repository for the module in Proteobench and a fork in Proteobot.
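As an optional aid, the hypothetical helper below checks that a local checkout contains the locations named in the checklist and lists the toml input-format definitions in a module's settings subfolder. The function names are assumptions, and the helper only checks that files exist, not that they are correct.

```python
from pathlib import Path

# Locations named in the checklist, relative to the repository root
CHECKLIST_DIRS = [
    "proteobench/modules",
    "proteobench/io/parsing",
    "proteobench/io/parsing/io_parse_settings",
    "proteobench/datapoint",
    "proteobench/score",
    "proteobench/plotting",
    "proteobench/io/params",
    "webinterface/pages",
]


def missing_checklist_dirs(repo_root: Path) -> list:
    """Return the checklist locations that are not directories under repo_root."""
    return [rel for rel in CHECKLIST_DIRS if not (repo_root / rel).is_dir()]


def module_settings_files(repo_root: Path, module_subfolder: str) -> list:
    """Sorted TOML input-format definitions in the module's settings subfolder."""
    folder = repo_root / "proteobench" / "io" / "parsing" / "io_parse_settings" / module_subfolder
    # glob() on a folder that does not exist simply yields nothing
    return sorted(folder.glob("*.toml"))
```

For example, calling missing_checklist_dirs(Path(".")) from the repository root returns an empty list when all locations exist.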