E. coli)#

This module compares the sensitivity and quantification accuracy for data-independent acquisition (DIA) data on plasma samples spiked with yeast and E. coli (PYE dataset). Users can load their data and inspect the results privately. They can also make their outputs public by providing the associated parameter file and submitting the benchmark run to ProteoBench. By doing so, their workflow output will be stored alongside all other benchmark runs in ProteoBench and will be accessible to the entire community.

This module is not designed to compare later-stage post-processing of quantitative data, such as missing value replacement. We advise users to publicly upload data without replacement of missing values and without manual filtering.

We think that this module is better suited to evaluating the impact of the following (non-exhaustive list):

search engine identification
peak picking
low-level ion signal normalisation
performance on complex biological matrices (plasma background)

Other modules will be more suited to explore further post-processing steps.

Release stage: ALPHA.

Data set#

The PYE (Plasma/Yeast/E. coli) dataset is based on a comprehensive benchmark study (Distler et al., Nat. Comms.) evaluating DIA quantification strategies. The dataset uses a three-component mixture to simulate complex biological quantification scenarios:

HUMAN (plasma background) - represents the complex endogenous proteome
YEAST (spike-in) - log2FC ≈ -1.585 (1:3 ratio A/B)
ECOLI (spike-in) - log2FC = 1 (2:1 ratio A/B)

Expected condition ratios (A/B):#

HUMAN: 1.0 (log2FC = 0)
YEAST: 1/3 (log2FC ≈ -1.585)
ECOLI: 2.0 (log2FC = 1)
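
The log2 fold changes listed above follow directly from the A/B ratios. As a minimal sketch of that arithmetic (the dictionary and function names are illustrative and not part of ProteoBench):

```python
import math

# Expected A/B ratios per species, taken from the list above.
EXPECTED_RATIO_A_OVER_B = {"HUMAN": 1.0, "YEAST": 1 / 3, "ECOLI": 2.0}


def expected_log2fc(species: str) -> float:
    """Return log2(A/B) for one species flag."""
    return math.log2(EXPECTED_RATIO_A_OVER_B[species])


for species in EXPECTED_RATIO_A_OVER_B:
    print(f"{species}: {expected_log2fc(species):+.3f}")
```

A negative value means the species is less abundant in condition A than in condition B.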

Sample composition and preparation#

The samples consist of commercial peptide digest standards from three species:

Escherichia coli (Waters Corporation)
Saccharomyces cerevisiae (Promega)
Human plasma (as the complex biological background)

The samples are mixed to achieve the specified fold-change ratios, with condition A containing the base ratio and condition B containing the modified ratios. Each condition is measured in six technical replicates, resulting in a total of 12 raw files (6 replicates × 2 conditions).

Data acquisition#

The dataset was acquired on a timsTOF instrument using data-independent acquisition (DIA) with nano-LC separation. The raw files follow the naming convention:

Condition A: A9_G_DIA_nLC_tTOF_R1 … A9_G_DIA_nLC_tTOF_R6
Condition B: B9_G_DIA_nLC_tTOF_R1 … B9_G_DIA_nLC_tTOF_R6

Full acquisition details and analytical procedures are available in the original publication.

Downloading the data#

The files can be downloaded from the ProteomeXchange repository under the identifier JPST003358.

Alternatively, you can download them from the ProteoBench server here: proteobench.cubimed.rub.de/raws/DIA-plasma/

It is imperative not to rename the files once downloaded!
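
Because the file names must stay unchanged, it can help to check a download folder against the naming convention before processing. The sketch below assumes that replicates R2 to R5 follow the same pattern as the R1 and R6 names shown above. The helper names, and the `.d` extension used in the usage note, are assumptions made for illustration.

```python
from pathlib import Path

CONDITIONS = ("A", "B")
N_REPLICATES = 6


def expected_run_names() -> list[str]:
    """Build the 12 expected run names (assumes R1..R6 share one pattern)."""
    return [
        f"{condition}9_G_DIA_nLC_tTOF_R{replicate}"
        for condition in CONDITIONS
        for replicate in range(1, N_REPLICATES + 1)
    ]


def missing_runs(file_names: list[str]) -> list[str]:
    """Return expected run names that have no matching file or folder name."""
    # Path(...).stem drops a single trailing extension, if there is one.
    found = {Path(name).stem for name in file_names}
    return [name for name in expected_run_names() if name not in found]
```

For example, `missing_runs([p.name for p in Path("raws").iterdir()])` would list any run that is absent or renamed in a hypothetical `raws` folder, whether the entries are named `A9_G_DIA_nLC_tTOF_R1` or `A9_G_DIA_nLC_tTOF_R1.d`.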

FASTA database#

Download the zipped FASTA file here: ProteoBenchFASTA_MixedSpecies_HYE.zip.

The FASTA file provided for this module contains the three species present in the samples and contaminant proteins (Frankenfield et al., JPR).

Metric calculation#

Contaminant sequences flagged with the prefix “Cont_” in the FASTA file are removed, as well as precursor ions that match proteins from multiple species and ions that are not quantified in any raw file. Missing values are handled appropriately per tool specification.
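
The three filters described above can be sketched as follows. This is not the ProteoBench implementation. It assumes that protein identifiers are separated by ";", that species are recognised by the "_HUMAN", "_YEAST" and "_ECOLI" flags, and that a missing value is represented as `None`, NaN or a non-positive number.

```python
SPECIES_FLAGS = ("_HUMAN", "_YEAST", "_ECOLI")
CONTAMINANT_PREFIX = "Cont_"


def species_of(proteins: str) -> set[str]:
    """Return the species found in a ';'-separated protein identifier string."""
    found = set()
    for protein in proteins.split(";"):
        for flag in SPECIES_FLAGS:
            if flag in protein:
                found.add(flag.lstrip("_"))
    return found


def keep_precursor(proteins: str, intensities: list) -> bool:
    """Apply the contaminant, multi-species and not-quantified filters."""
    identifiers = [p.strip() for p in proteins.split(";")]
    # Filter 1: any identifier flagged as a contaminant.
    if any(p.startswith(CONTAMINANT_PREFIX) for p in identifiers):
        return False
    # Filter 2: precursors that map to more than one species (or to none).
    if len(species_of(proteins)) != 1:
        return False
    # Filter 3: precursors not quantified in any raw file.
    # NaN fails the "> 0" comparison, so it counts as missing here.
    return any(v is not None and v > 0 for v in intensities)
```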

Quantification values are log2-transformed, and the mean signal per condition is calculated together with its standard deviation and coefficient of variation (CV). For each precursor ion, the observed log2 fold change (mean(log2) in condition A minus mean(log2) in condition B) is compared to its expected value; the deviation between the two is the error, epsilon.
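
As a worked illustration of the per-precursor error, the sketch below computes the mean log2 intensity per condition and the deviation from the expected log2 fold change. The sign convention (observed minus expected) and the treatment of missing values as `None`, NaN or non-positive numbers are assumptions. The standard deviation and CV are left out for brevity.

```python
import math
from statistics import mean


def mean_log2(intensities: list) -> float:
    """Mean of log2 intensities over the runs where the precursor is quantified."""
    logs = [math.log2(v) for v in intensities if v is not None and v > 0]
    if not logs:
        raise ValueError("precursor is not quantified in this condition")
    return mean(logs)


def epsilon(cond_a: list, cond_b: list, expected_ratio: float) -> float:
    """Observed log2(A/B) minus expected log2(A/B) for one precursor."""
    observed = mean_log2(cond_a) - mean_log2(cond_b)
    return observed - math.log2(expected_ratio)


# Hypothetical ECOLI precursor measured at a 4:1 ratio instead of 2:1.
print(epsilon([400.0] * 6, [100.0] * 6, expected_ratio=2.0))
```

An epsilon of zero means the measured ratio matches the expected one. The absolute value is what the main plot summarises.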

Main plot dimensions:#

X-axis: absolute log2 fold-change error for spike-ins (YEAST + ECOLI), displayed as Median or Mean
Y-axis: number of quantified spike-in precursor ions
Dot size: dynamic range of HUMAN plasma precursors (mean of condition-wise log10 90th-10th percentile spread)
Dot opacity: HUMAN plasma quantification accuracy (absolute epsilon; darker coloring = better accuracy)
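
One way to read the dot-size definition is sketched below: for each condition, take the log10 of the HUMAN precursor intensities, compute the spread between the 90th and 10th percentiles, then average the two spreads. The percentile interpolation method and the use of one intensity value per precursor and condition are assumptions, so the exact numbers may differ from those shown in ProteoBench.

```python
import math
from statistics import mean, quantiles


def log10_spread(intensities: list) -> float:
    """90th minus 10th percentile of log10 intensities for one condition."""
    logs = [math.log10(v) for v in intensities if v is not None and v > 0]
    # With n=10, quantiles() returns the nine deciles: index 0 is P10, index 8 is P90.
    deciles = quantiles(logs, n=10, method="inclusive")
    return deciles[8] - deciles[0]


def dynamic_range(per_condition: dict) -> float:
    """Average the per-condition spreads (in orders of magnitude)."""
    return mean([log10_spread(values) for values in per_condition.values()])
```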

Calculation modes:#

Two error calculation modes are available:

Global: globally calculated error across all spike-in precursors
Species-weighted: per-species error (YEAST, ECOLI) averaged equally

Both modes are available in Mean and Median variants.
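
The difference between the two modes can be illustrated with a small sketch. The function names are hypothetical; the input is assumed to be a mapping from species to the epsilon values of that species' precursors.

```python
from statistics import mean, median


def global_error(eps_by_species: dict, agg=median) -> float:
    """Pool all spike-in precursors, then aggregate the absolute errors."""
    pooled = [abs(e) for eps in eps_by_species.values() for e in eps]
    return agg(pooled)


def species_weighted_error(eps_by_species: dict, agg=median) -> float:
    """Aggregate per species first, then give each species equal weight."""
    per_species = [
        agg([abs(e) for e in eps]) for eps in eps_by_species.values() if eps
    ]
    return mean(per_species)


# Hypothetical epsilon values: many accurate YEAST precursors, one poor ECOLI one.
example = {"YEAST": [0.1, -0.1, 0.1, 0.1], "ECOLI": [0.5]}
print(global_error(example, agg=mean), species_weighted_error(example, agg=mean))
```

In the global mode, a species with many precursors dominates the result. The species-weighted mode prevents that by averaging the per-species summaries.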

A cutoff slider allows filtering of precursors by the minimum number of runs in which the precursor is observed.
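
The effect of the cutoff can be sketched as a simple filter on the number of runs with a quantified value. The function names and the missing-value convention (`None`, NaN or non-positive numbers) are assumptions for illustration.

```python
def n_observed(intensities: list) -> int:
    """Count the runs in which the precursor has a quantified value."""
    return sum(1 for v in intensities if v is not None and v > 0)


def apply_cutoff(precursors: dict, min_runs: int) -> dict:
    """Keep precursors observed in at least `min_runs` of the 12 runs."""
    return {
        key: values
        for key, values in precursors.items()
        if n_observed(values) >= min_runs
    }
```

Raising the cutoff typically lowers the number of precursors on the Y-axis while keeping those that are quantified more consistently.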

How to use#

Click here to submit your results or to explore the plasma quantification module.

Input data for private visualization#

The module supports multiple data formats to maximize flexibility. Users can process the data with their preferred DIA analysis tool, as long as one of the supported formats is generated.

Currently supported input formats in this module:

DIA-NN
AlphaDIA
Spectronaut
PEAKS
FragPipe (DIA-NN Quant)
Custom (tab-delimited format)

Suggested parameters#

To ensure fair comparison between different processing workflows, we suggest using the parameters listed below:

Parameter	Value
Maximum number of missed cleavages	1
PSM/Precursor FDR	0.01
Spectral Library	Predicted spectral library from FASTA
Precursor charge state	1-5
Fixed modifications	Carbamidomethylation (C)
Variable modifications	Oxidation (M), Acetyl (Protein N-term)
Minimum peptide length	6-7 residues

These parameters represent a standardized configuration to evaluate the intrinsic performance of different analysis tools without the confounding effects of non-standard parameter choices.

Important tool-specific settings#

Detailed instructions and optimal settings for each supported tool are provided below.

DIA-NN #

Use the provided FASTA file to generate a spectral library using DIA-NN’s library generation mode
Process the raw files using the standard DIA-NN workflow with the recommended DIA settings
Export the results as either *_report.tsv or *_report.parquet format
The parameter log file *_report.log.txt should be collected for public submissions

AlphaDIA #

Process your DIA raw files with AlphaDIA following the standard workflow
AlphaDIA generates two important output files that must both be uploaded:
- precursors.tsv - contains precursor-level quantification data in long format
- precursor.matrix.tsv - contains the quantification matrix
Both files are required for proper parsing
If your AlphaDIA version outputs a different format, you may need to preprocess the files using the ProteoBench_input_conversion.ipynb Jupyter Notebook
Upload the log.txt file for public submissions

Spectronaut #

Create a spectral library from the provided FASTA using Spectronaut’s library generation tools
Process your DIA raw files using DirectDIA or the standard Spectronaut DIA analysis workflow
Export results in the BGS Factory Report format: *_Report.tsv
Use the Spectronaut setup file *_Report.setup.txt for public submission, which contains all analysis parameters
Ensure that your export includes precursor-level quantification data with columns for: modified peptide sequence, charge state, protein IDs, and intensity values

FragPipe (DIA-NN Quant)#

Load the DIA_SpecLib_Quant workflow
Import your DIA raw files into FragPipe
Assign experimental group information to raw files
Generate or use a spectral library from the provided FASTA
Important: Make sure contaminants are not added when you add decoys to the database
Run the analysis and export the DIA-NN output file *_report.tsv, which contains the precursor-level quantification
For public submissions, provide the fragpipe.workflow parameter file that corresponds to your search

Note: FragPipe output files concatenate protein identifiers from “Proteins” and “Mapped Proteins” columns to create protein groups.

PEAKS #

Create a new PEAKS project and import your DIA raw files
Configure sample grouping to match your experimental design (Condition A vs. Condition B)
Set up the DIA quantification method with appropriate parameters
Use label-free quantification (LFQ) with Identification Directed Quantification (IDQ) mode
Export results as a text report (.txt format) containing precursor-level quantification data
For public submission, upload the settings text file (.txt) containing all analysis parameters

Custom format#

If your tool is not directly supported by ProteoBench, you can upload a tab-delimited table containing the following columns:

Column	Description
Sequence	Peptide sequence without modifications
Modified sequence	Sequence with localised modifications in ProForma standard format
Proteins	Protein identifiers separated by “;”; must contain species flags (e.g., “_HUMAN”, “_YEAST”, “_ECOLI”)
Charge	Charge state of the precursor ion
A9_G_DIA_nLC_tTOF_R1 … R6	Quantitative intensity values for condition A replicates
B9_G_DIA_nLC_tTOF_R1 … R6	Quantitative intensity values for condition B replicates

The table should contain only validated ions and must not include contaminant sequences or non-specific peptides.
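
To make the expected layout concrete, the sketch below writes a one-row table in the custom format and checks a header for missing columns. The example peptide, the helper names, and the assumption that replicates R2 to R5 follow the same column naming as R1 and R6 are illustrative only.

```python
import csv
import io

RUNS = [f"{c}9_G_DIA_nLC_tTOF_R{r}" for c in "AB" for r in range(1, 7)]
REQUIRED_COLUMNS = ["Sequence", "Modified sequence", "Proteins", "Charge", *RUNS]


def missing_columns(tsv_text: str) -> list[str]:
    """Return the required columns that are absent from the header line."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    return [column for column in REQUIRED_COLUMNS if column not in header]


# Build a one-row example table (hypothetical peptide and intensities).
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(REQUIRED_COLUMNS)
writer.writerow(
    ["PEPTMIDEK", "PEPTM[Oxidation]IDEK", "sp|P02768|ALBU_HUMAN", 2]
    + [1000.0] * 6   # condition A replicates
    + [1000.0] * 6   # condition B replicates
)
example_table = buffer.getvalue()
print(missing_columns(example_table))
```

An empty list means all required columns are present. The check does not validate the ProForma notation or the species flags.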

Submit your run for public usage#

When you have successfully uploaded and visualized a benchmark run, we strongly encourage you to add the result to the online repository. This way, your run will be available to the entire community and can be compared to all other uploaded benchmark runs.

To submit your benchmark run publicly:

Upload your quantification output file (in one of the supported formats)
Provide the matching parameter/log file(s) from your analysis tool
Fill in optional comments describing your workflow, any filtering steps, or notable observations
Confirm the metadata information (software name, version, parameters)
Submit your benchmark run

Once submitted, a GitHub pull request will be automatically generated for tracking and community review. Your workflow output, parameters, and calculated metrics will be stored and made publicly available.

Tool-specific input files#

Tool	Quantification input	Metadata / parameter file
DIA-NN	`*_report.tsv` or `*_report.parquet`	`*_report.log.txt`
AlphaDIA	`precursors.tsv` + `precursor.matrix.tsv` (both required; see Jupyter Notebook for preprocessing)	`log.txt`
Spectronaut	`*_Report.tsv` (BGS factory report format)	`*_Report.setup.txt`
FragPipe (DIA-NN Quant)	`*_report.tsv`	`fragpipe.workflow`
PEAKS	PEAKS DIA output file (`.txt` format - export as text report)	Settings text file (`.txt`)
Custom	Tab-delimited table (`.tsv` or `.csv`) following the custom format described above	Not required

Notes#

Contaminants are expected to be flagged with Cont_.
Species annotation in protein identifiers must support _HUMAN, _YEAST, and _ECOLI mapping.
This page documents only the currently implemented formats and behavior.