proteobench.io.params package

Parameter handling for ProteoBench.

ProteoBenchParameters is initialized from a JSON field-definition file (default: json/Quant/quant_lfq_DDA_ion.json) and populated by per-software extract_params functions in the sibling modules. After population, every parser calls fill_none(), which coerces values to canonical types via normalize().

normalize_dataframe_columns applies the same coercion rules to a full DataFrame of historical datapoints loaded from the results repository.

Normalization rules (applied by normalize() / normalize_dataframe_columns):

Missing sentinel strings ("None", "N/A", "", "unknown", etc.) → np.nan
ident_fdr_psm, ident_fdr_peptide, ident_fdr_protein → float in [0, 1]; values ≥ 1 are treated as percentages and divided by 100
allowed_miscleavages, min/max_peptide_length, min/max_precursor_charge, max_mods, min/max_precursor_mz, min/max_fragment_mz, n_beams, n_peaks, min_mz, max_mz → int
enable_match_between_runs → bool
enzyme → canonical name via _ENZYME_MAP (e.g. "trypsin" → "Trypsin", "kr|p,true" → "Trypsin")
precursor_mass_tolerance, fragment_mass_tolerance → mapped to "Automatic calibration" when a known auto-calibration sentinel is detected (e.g. "dynamic", "0 ppm")
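The typed coercions above can be illustrated with a minimal, self-contained sketch. It is not the ProteoBench implementation: the helper names, the sentinel set, the case-insensitive matching and the accepted boolean spellings are all assumptions made for illustration. The two rule lists on this page word the FDR boundary differently (≥ 1 here, > 1 under normalize()); the sketch follows the wording of this list.

```python
import math

# Assumed sentinel set (compared case-insensitively); the real list may differ.
MISSING_SENTINELS = {"none", "n/a", "", "unknown", "not specified"}


def is_missing(value):
    """Return True for None, NaN, or a missing-sentinel string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_SENTINELS


def normalize_fdr(value):
    """Coerce an FDR value to float in [0, 1]; values >= 1 are read as percentages."""
    if is_missing(value):
        return math.nan
    fdr = float(value)
    return fdr / 100.0 if fdr >= 1 else fdr


def normalize_int(value):
    """Coerce values such as "2", "2.0" or 2.0 to int; missing values become NaN."""
    if is_missing(value):
        return math.nan
    return int(float(value))


def normalize_bool(value):
    """Coerce common truthy spellings to bool (the accepted spellings are assumed)."""
    if is_missing(value):
        return math.nan
    if isinstance(value, str):
        return value.strip().lower() in {"true", "yes", "1"}
    return bool(value)
```

With these helpers, an FDR reported as "1" (one percent) and one reported as 0.01 both end up as 0.01, which is the point of the percentage rule: historical datapoints from different tools become comparable.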

NOT normalized (kept exactly as returned by the parsers; each parser is responsible for homogenizing these fields itself):

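precursor_mass_tolerance, fragment_mass_tolerance, remove_precursor_tol — string, format varies by tool (for the two mass-tolerance fields, only the auto-calibration sentinels listed above are mapped; all other values are kept as-is)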
fixed_mods, variable_mods — string, tool-specific format
quantification_method, protein_inference, abundance_normalization_ions — string
software_name, software_version, search_engine, search_engine_version — string
min_intensity, max_intensity — float/int, kept as-is
tokens — string, semicolon-separated amino acids/modifications
isotope_error_range — string (e.g. "[0, 2]")
decoding_strategy, checkpoint — string, tool-specific
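The mass-tolerance fields appear in both lists: known auto-calibration sentinels are mapped, and every other string passes through untouched. A minimal sketch of that split follows. The function name and the sentinel set are hypothetical (the source names only "dynamic" and "0 ppm" as examples), and the case-insensitive comparison is an assumption.

```python
# Hypothetical sentinel set; only "dynamic" and "0 ppm" are named in the source.
AUTO_CALIBRATION_SENTINELS = {"dynamic", "0 ppm"}


def normalize_tolerance(value):
    """Map known auto-calibration sentinels; return everything else unchanged."""
    if isinstance(value, str) and value.strip().lower() in AUTO_CALIBRATION_SENTINELS:
        return "Automatic calibration"
    # Tool-specific formats are deliberately left alone.
    return value
```

Under these assumptions, normalize_tolerance("dynamic") yields "Automatic calibration", while a tool-specific string such as "[-20 ppm, 20 ppm]" is returned as given.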

Classes

ProteoBenchParameters: Parameter container initialized from a JSON field-definition file.

Functions

normalize_dataframe_columns: Apply the same normalization rules to a historical-results DataFrame.

class proteobench.io.params.ProteoBenchParameters(filename='/home/docs/checkouts/readthedocs.org/user_builds/proteobench/envs/v0.16.3/lib/python3.11/site-packages/proteobench/io/params/json/Quant/quant_lfq_DDA_ion.json', **kwargs)

Bases: object

Parameter container for a single ProteoBench submission.

Attributes are determined at runtime by the JSON field-definition file; only fields present in that file are set as instance attributes.

Parameters:

filename (str or os.PathLike) – Path to a JSON field-definition file. Defaults to json/Quant/quant_lfq_DDA_ion.json (relative to this package).
**kwargs – Optional attribute overrides applied after JSON initialization. A string value of "None" is coerced to np.nan.

fill_none()

Convert string "None" sentinels to np.nan and call normalize().

Every extract_params function should call this at the end of parameter extraction so that normalization is applied uniformly.

normalize()

Coerce parsed parameter values to their canonical types.

This method is called automatically at the end of fill_none() so that every parser benefits without per-parser changes.

Normalization rules

Any attribute whose value is a missing sentinel string (e.g. "not specified", "N/A", "None", "") is set to np.nan.
FDR fields are coerced to float in the range [0, 1]. Values > 1 are assumed to be percentages and divided by 100.
Integer fields (miscleavages, peptide length, charge, max_mods) are coerced to int.
enable_match_between_runs is coerced to bool.
enzyme is mapped to a canonical name via _ENZYME_MAP.
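The calling convention described above (populate, then fill_none(), which always ends in normalize()) can be sketched with a toy stand-in. ParamsSketch, ENZYME_MAP_SKETCH and extract_params_sketch are hypothetical names, not part of ProteoBench. The enzyme lookup contains only the two examples given on this page, and passing unknown enzyme names through unchanged is an assumption.

```python
import math

# Hypothetical subset of an enzyme lookup; the real _ENZYME_MAP is not shown here.
ENZYME_MAP_SKETCH = {"trypsin": "Trypsin", "kr|p,true": "Trypsin"}


class ParamsSketch:
    """Toy stand-in for a parameter container; not the ProteoBench class."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def fill_none(self):
        # Step 1: turn the literal string "None" into NaN.
        for name, value in list(vars(self).items()):
            if isinstance(value, str) and value == "None":
                setattr(self, name, math.nan)
        # Step 2: always finish with normalize(), so no parser has to remember it.
        self.normalize()

    def normalize(self):
        # Only the enzyme rule is sketched; unknown names pass through (assumption).
        enzyme = getattr(self, "enzyme", None)
        if isinstance(enzyme, str):
            self.enzyme = ENZYME_MAP_SKETCH.get(enzyme.strip().lower(), enzyme)


def extract_params_sketch(raw):
    """Hypothetical parser: populate the container, then call fill_none() last."""
    params = ParamsSketch(**raw)
    params.fill_none()
    return params
```

Chaining normalize() from fill_none() keeps the coercion logic in one place: a new per-software parser only has to follow the existing convention of calling fill_none() at the end.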

proteobench.io.params.normalize_dataframe_columns(df: DataFrame) → DataFrame

Apply the same coercion rules as ProteoBenchParameters.normalize() to an entire DataFrame of historical results.

Operates in place on df (the input DataFrame is modified) and also returns it for convenience.
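A minimal sketch of the in-place-and-return pattern, restricted to the three FDR columns, is given below. It is an illustration built on pandas, not the library's code: normalize_columns_sketch is a hypothetical name, and relying on pd.to_numeric(errors="coerce") to turn sentinel strings into NaN is a shortcut chosen for the sketch.

```python
import pandas as pd

FDR_COLUMNS = ["ident_fdr_psm", "ident_fdr_peptide", "ident_fdr_protein"]


def normalize_columns_sketch(df):
    """Normalize FDR columns of df in place and return the same object."""
    for col in FDR_COLUMNS:
        if col not in df.columns:
            continue
        # Non-numeric entries such as "N/A" or "" become NaN.
        values = pd.to_numeric(df[col], errors="coerce")
        # Keep values below 1; treat everything else as a percentage.
        df[col] = values.where(values < 1, values / 100.0)
    return df
```

Because the function returns the DataFrame it mutated, it can be used either as a statement or inside a chained expression; callers that need the original values should pass df.copy().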
