DIA quantification - precursor ion level (PYE: Plasma/Yeast/E. coli)#

This module compares the sensitivity and quantification accuracy for data-independent acquisition (DIA) data on plasma samples spiked with yeast and E. coli (PYE dataset). Users can load their data and inspect the results privately. They can also make their outputs public by providing the associated parameter file and submitting the benchmark run to ProteoBench. By doing so, their workflow output will be stored alongside all other benchmark runs in ProteoBench and will be accessible to the entire community.

This module is not designed to compare later-stages post-processing of quantitative data such as missing value replacement, and we advise users to publicly upload data without replacement of missing values and without manual filtering.

We think that this module is more suited to evaluate the impact of (non exhaustive list):

  • search engine identification

  • peak picking

  • low-level ion signal normalisation

  • performance on complex biological matrices (plasma background)

Other modules will be more suited to explore further post-processing steps.

Release stage: ALPHA.

Data set#

The PYE (Plasma/Yeast/E. coli) dataset is based on a comprehensive benchmark study (Distler et al., Nat. Comms.) evaluating DIA quantification strategies. The dataset uses a three-component mixture to simulate complex biological quantification scenarios:

  • HUMAN (plasma background) - represents the complex endogenous proteome

  • YEAST (spike-in) - log2FC ≈ -1.585 (1:3 ratio A/B)

  • ECOLI (spike-in) - log2FC = 1 (2:1 ratio A/B)

Expected condition ratios (A/B):#

  • HUMAN: 1.0 (log2FC = 0)

  • YEAST: 1/3 (log2FC ≈ -1.585)

  • ECOLI: 2.0 (log2FC = 1)

Sample composition and preparation#

The samples consist of commercial peptide digest standards from three species:

  • Escherichia coli (Waters Corporation)

  • Saccharomyces cerevisiae (Promega)

  • Human plasma (as the complex biological background)

The samples are mixed to achieve the specified fold-change ratios, with condition A containing the base ratio and condition B containing the modified ratios. Each condition is measured in six technical replicates, resulting in a total of 12 raw files (6 replicates × 2 conditions).

Data acquisition#

The dataset was acquired on a timsTOF instrument using data-independent acquisition (DIA) with nano-LC separation. The raw files follow the naming convention:

  • Condition A: A9_G_DIA_nLC_tTOF_R1A9_G_DIA_nLC_tTOF_R6

  • Condition B: B9_G_DIA_nLC_tTOF_R1B9_G_DIA_nLC_tTOF_R6

Full acquisition details and analytical procedures are available in the original publication.

Downloading the data#

The files can be downloaded from the proteomeXchange repository JPST003358:

Alternatively, you can download them from the ProteoBench server here: proteobench.cubimed.rub.de/raws/DIA-plasma/

It is imperative not to rename the files once downloaded!

FASTA database#

Download the zipped FASTA file here: ProteoBenchFASTA_MixedSpecies_HYE.zip.

The fasta file provided for this module contains the three species present in the samples and contaminant proteins (Frankenfield et al., JPR)

Metric calculation#

Contaminant sequences flagged with the prefix “Cont_” in the FASTA file are removed, as well as precursor ions that match proteins from multiple species and ions that are not quantified in any raw file. Missing values are handled appropriately per tool specification.

Quantification values are log2-transformed, and the mean signal per condition is calculated with standard deviation and coefficient of variation (CV). For each precursor ion, the difference between the mean(log2) in condition A and condition B is compared to its expected value (epsilon).

Main plot dimensions:#

  • X-axis: absolute log2 fold-change error for spike-ins (YEAST + ECOLI), displayed as Median or Mean

  • Y-axis: number of quantified spike-in precursor ions

  • Dot size: dynamic range of HUMAN plasma precursors (mean of condition-wise log10 90th-10th percentile spread)

  • Dot opacity: HUMAN plasma quantification accuracy (absolute epsilon; darker coloring = better accuracy)

Calculation modes:#

Two error calculation modes are available:

  • Global: globally calculated error across all spike-in precursors

  • Species-weighted: per-species error (YEAST, ECOLI) averaged equally

Both modes are available in Mean and Median variants.

A cutoff slider allows filtering of precursors by the minimum number of runs in which the precursor is observed.

How to use#

Click here if you want to submit your results or when you want to explore the plasma quantification module.

Input data for private visualization#

The module supports multiple data formats to maximize flexibility. Users can process the data with their preferred DIA analysis tool, as long as one of the supported formats is generated.

Currently supported input formats in this module:

  • DIA-NN

  • AlphaDIA

  • Spectronaut

  • PEAKS

  • FragPipe (DIA-NN Quant)

  • Custom (tab-delimited format)

Suggested parameters#

To ensure fair comparison between different processing workflows, we suggest using the parameters listed below:

Parameter

Value

Maximum number of missed cleavages

1

PSM/Precursor FDR

0.01

Spectral Library

Predicted spectral library from FASTA

Precursor charge state

1-5

Fixed modifications

Carbamidomethylation (C)

Variable modifications

Oxidation (M), Acetyl (Protein N-term)

Minimum peptide length

6-7 residues

These parameters represent a standardized configuration to evaluate the intrinsic performance of different analysis tools without the confounding effects of non-standard parameter choices.

Important Tool-specific settings#

Detailed instructions and optimal settings for each supported tool are provided below.

DIA-NN#

  1. Use the provided FASTA file to generate a spectral library using DIA-NN’s library generation mode

  2. Process the raw files using the standard DIA-NN workflow with the recommended DIA settings

  3. Export the results as either *_report.tsv or *_report.parquet format

  4. The parameter log file *_report.log.txt should be collected for public submissions

AlphaDIA#

  1. Process your DIA raw files with AlphaDIA following the standard workflow

  2. AlphaDIA generates two important output files that must both be uploaded:

    • precursors.tsv - contains precursor-level quantification data in long format

    • precursor.matrix.tsv - contains the quantification matrix

  3. Both files are required for proper parsing

  4. If your AlphaDIA version outputs a different format, you may need to preprocess the files using the ProteoBench_input_conversion.ipynb Jupyter Notebook

  5. Upload the log.txt file for public submissions

Spectronaut#

  1. Create a spectral library from the provided FASTA using Spectronaut’s library generation tools

  2. Process your DIA raw files using DirectDIA or standard Spectronaut DIA analysis workflow

  3. Export results in the BGS Factory Report format: *_Report.tsv

  4. Use the Spectronaut setup file *_Report.setup.txt for public submission, which contains all analysis parameters

  5. Ensure that your export includes precursor-level quantification data with columns for: modified peptide sequence, charge state, protein IDs, and intensity values

FragPipe (DIA-NN Quant)#

  1. Load the DIA_SpecLib_Quant workflow

  2. Import your DIA raw files into FragPipe

  3. Assign experimental group information to raw files

  4. Generate or use a spectral library from the provided FASTA

  5. Important: Make sure contaminants are not added when you add decoys to the database

  6. Run the analysis and export DIA-NN output *_report.tsv file containing precursor-level quantification

  7. For public submissions, provide the fragpipe.workflow parameter file that corresponds to your search

Note: FragPipe output files concatenate protein identifiers from “Proteins” and “Mapped Proteins” columns to create protein groups.

PEAKS#

  1. Create a new PEAKS project and import your DIA raw files

  2. Configure sample grouping to match your experimental design (Condition A vs. Condition B)

  3. Set up the DIA quantification method with appropriate parameters

  4. Use label-free quantification (LFQ) with Identification Directed Quantification (IDQ) mode

  5. Export results as a text report (.txt format) containing precursor-level quantification data

  6. For public submission, upload the settings text file (.txt) containing all analysis parameters

Custom format#

If you do not use a tool that is compatible with ProteoBench, you can upload a tab-delimited table format containing the following columns:

Column

Description

Sequence

Peptide sequence without modifications

Modified sequence

Sequence with localised modifications in ProForma standard format

Proteins

Protein identifiers separated by “;”; must contain species flags (e.g., “_HUMAN”, “_YEAST”, “_ECOLI”)

Charge

Charge state of the precursor ion

A9_G_DIA_nLC_tTOF_R1 … R6

Quantitative intensity values for condition A replicates

B9_G_DIA_nLC_tTOF_R1 … R6

Quantitative intensity values for condition B replicates

The table should contain only validated ions and must not include contaminant sequences or non-specific peptides.

Submit your run for public usage#

When you have successfully uploaded and visualized a benchmark run, we strongly encourage you to add the result to the online repository. This way, your run will be available to the entire community and can be compared to all other uploaded benchmark runs.

To submit your benchmark run publicly:

  1. Upload your quantification output file (in one of the supported formats)

  2. Provide the matching parameter/log file(s) from your analysis tool

  3. Fill in optional comments describing your workflow, any filtering steps, or notable observations

  4. Confirm the metadata information (software name, version, parameters)

  5. Submit your benchmark run

Once submitted, a GitHub pull request will be automatically generated for tracking and community review. Your workflow output, parameters, and calculated metrics will be stored and made publicly available.

Tool-specific input files#

Tool

Quantification input

Metadata / parameter file

DIA-NN

*_report.tsv or *_report.parquet

*_report.log.txt

AlphaDIA

precursors.tsv + precursor.matrix.tsv (both required; see Jupyter Notebook for preprocessing)

log.txt

Spectronaut

*_Report.tsv (BGS factory report format)

*_Report.setup.txt

FragPipe (DIA-NN Quant)

*_report.tsv

fragpipe.workflow

PEAKS

PEAKS DIA output file (.txt format - export as text report)

Settings text file (.txt)

Custom

Tab-separated values (.tsv or .csv) following standard format

Not required

Notes#

  • Contaminants are expected to be flagged with Cont_.

  • Species annotation in protein identifiers must support _HUMAN, _YEAST, and _ECOLI mapping.

  • For this module page, only currently implemented formats and behavior are documented.