proteobench.io.params package#
Parameter handling for ProteoBench.
ProteoBenchParameters is initialized from a JSON field-definition file
(default: json/Quant/quant_lfq_DDA_ion.json) and populated by
per-software extract_params functions in the sibling modules.
After population, every parser calls fill_none(), which coerces values
to canonical types via normalize().
normalize_dataframe_columns applies the same coercion rules to a full
DataFrame of historical datapoints loaded from the results repository.
Normalization rules (applied by normalize() / normalize_dataframe_columns):
- Missing sentinel strings ("None", "N/A", "", "unknown", etc.) → np.nan
- ident_fdr_psm, ident_fdr_peptide, ident_fdr_protein → float in [0, 1]; values ≥ 1 are treated as percentages and divided by 100
- allowed_miscleavages, min/max_peptide_length, min/max_precursor_charge, max_mods, min/max_precursor_mz, min/max_fragment_mz, n_beams, n_peaks, min_mz, max_mz → int
- enable_match_between_runs → bool
- enzyme → canonical name via _ENZYME_MAP (e.g. "trypsin" → "Trypsin", "kr|p,true" → "Trypsin")
- precursor_mass_tolerance, fragment_mass_tolerance → mapped to "Automatic calibration" when a known auto-calibration sentinel is detected (e.g. "dynamic", "0 ppm")
NOT normalized (kept as returned by the parsers; each parser is expected to homogenize these values itself):
- precursor_mass_tolerance, fragment_mass_tolerance, remove_precursor_tol: string, format varies by tool (the two mass tolerances are only rewritten when an auto-calibration sentinel is detected, as listed above)
- fixed_mods, variable_mods: string, tool-specific format
- quantification_method, protein_inference, abundance_normalization_ions: string
- software_name, software_version, search_engine, search_engine_version: string
- min_intensity, max_intensity: float/int, kept as-is
- tokens: string, semicolon-separated amino acids/modifications
- isotope_error_range: string (e.g. "[0, 2]")
- decoding_strategy, checkpoint: string, tool-specific
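The scalar rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not the package's implementation: the helper names (`to_missing`, `normalize_fdr`, `normalize_int`) and the sentinel set are hypothetical, case-insensitive sentinel matching is an assumption, and `math.nan` stands in for `np.nan` (both are the same float NaN) to keep the example dependency-free.

```python
import math

# Illustrative subset only; the package's own sentinel list is longer ("etc." above).
MISSING_SENTINELS = {"none", "n/a", "", "unknown", "not specified"}


def to_missing(value):
    """Return NaN for a missing sentinel, otherwise the value unchanged."""
    if value is None:
        return math.nan
    # Assumption: sentinels are matched case-insensitively after stripping.
    if isinstance(value, str) and value.strip().lower() in MISSING_SENTINELS:
        return math.nan
    return value


def is_missing(value):
    """True only for a float NaN."""
    return isinstance(value, float) and math.isnan(value)


def normalize_fdr(value):
    """Coerce an FDR value to a float, reading large values as percentages."""
    value = to_missing(value)
    if is_missing(value):
        return value
    fdr = float(value)
    # Assumption: values above 1 are percentages (the boundary value 1 is
    # left untouched in this sketch).
    return fdr / 100 if fdr > 1 else fdr


def normalize_int(value):
    """Coerce an integer-like field; accepts 2, 2.0, "2" and "2.0"."""
    value = to_missing(value)
    if is_missing(value):
        return value
    return int(float(value))
```

Routing every helper through `to_missing` first means a sentinel such as "unknown" never reaches `float()` or `int()`, so missing values stay NaN instead of raising.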
Classes#
- ProteoBenchParameters
Parameter container initialized from a JSON field-definition file.
Functions#
- normalize_dataframe_columns
Apply the same normalization rules to a historical-results DataFrame.
- class proteobench.io.params.ProteoBenchParameters(filename='/home/docs/checkouts/readthedocs.org/user_builds/proteobench/envs/v0.16.3/lib/python3.11/site-packages/proteobench/io/params/json/Quant/quant_lfq_DDA_ion.json', **kwargs)
Bases: object
Parameter container for a single ProteoBench submission.
Attributes are determined at runtime by the JSON field-definition file; only fields present in that file are set as instance attributes.
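To make the runtime-attribute behaviour concrete, here is a minimal sketch of a JSON-driven container. `FieldDefinedParams` is a hypothetical stand-in, not the ProteoBench class. The layout of the JSON file (a mapping from field name to a metadata object), the initial value of `None`, and the choice to ignore keyword arguments that are not defined in the file are all assumptions made for illustration.

```python
import json
import math
import os
import tempfile


class FieldDefinedParams:
    """Hypothetical stand-in for a JSON-driven parameter container."""

    def __init__(self, filename, **kwargs):
        with open(filename, encoding="utf-8") as handle:
            field_definitions = json.load(handle)
        # Only fields named in the JSON file become instance attributes.
        for field_name in field_definitions:
            setattr(self, field_name, None)
        # Overrides are applied after JSON initialization.
        for key, value in kwargs.items():
            if key not in field_definitions:
                continue  # assumption: unknown keys are ignored
            # The string "None" is coerced to NaN (math.nan stands in for np.nan).
            setattr(self, key, math.nan if value == "None" else value)


# Demo with a throwaway field-definition file.
definitions = {"software_name": {}, "enzyme": {}, "ident_fdr_psm": {}}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fields.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(definitions, handle)
    params = FieldDefinedParams(
        path, software_name="MaxQuant", enzyme="None", not_a_field=1
    )
```

After this runs, `params` has exactly the three attributes named in the file; `not_a_field` is never set.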
- Parameters:
filename (str or os.PathLike) – Path to a JSON field-definition file. Defaults to json/Quant/quant_lfq_DDA_ion.json (relative to this package).
**kwargs – Optional attribute overrides applied after JSON initialization. A string value of "None" is coerced to np.nan.
- fill_none()
Convert string "None" sentinels to np.nan and call normalize().
Every extract_params function should call this at the end of parameter extraction so that normalization is applied uniformly.
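The call order described here (populate, then `fill_none()`, which in turn calls `normalize()`) can be sketched with a toy container and parser. Everything below is hypothetical: `ToyParams`, `toy_extract_params`, the `key = value` input format, and the set of truthy strings are assumptions chosen for the example, and `math.nan` stands in for `np.nan`.

```python
import math


class ToyParams:
    """Hypothetical container; not the ProteoBench class."""

    def __init__(self):
        self.software_name = None
        self.enable_match_between_runs = None
        self.enzyme = None

    def fill_none(self):
        # Step 1: turn "None" strings into NaN.
        for name, value in list(vars(self).items()):
            if value == "None":
                setattr(self, name, math.nan)
        # Step 2: always finish with type coercion.
        self.normalize()

    def normalize(self):
        value = self.enable_match_between_runs
        if isinstance(value, str):
            # Assumption: these strings count as true.
            truthy = {"true", "1", "yes"}
            self.enable_match_between_runs = value.strip().lower() in truthy


def toy_extract_params(lines):
    """Parse 'key = value' lines, then normalize via fill_none()."""
    params = ToyParams()
    for line in lines:
        key, _, value = line.partition("=")
        key = key.strip()
        if hasattr(params, key):
            setattr(params, key, value.strip())
    params.fill_none()  # last step of every parser
    return params


params = toy_extract_params(
    ["software_name = ToyTool", "enable_match_between_runs = True", "enzyme = None"]
)
```

Because the parser only stores raw strings and leaves all coercion to `fill_none()`, a new rule added to `normalize()` reaches every parser without touching the parsers themselves.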
- normalize()
Coerce parsed parameter values to their canonical types.
This method is called automatically at the end of fill_none() so that every parser benefits without per-parser changes.
Normalization rules#
- Any attribute whose value is a missing sentinel string (e.g. "not specified", "N/A", "None", "") is set to np.nan.
- FDR fields are coerced to float in the range [0, 1]. Values > 1 are assumed to be percentages and divided by 100.
- Integer fields (miscleavages, peptide length, charge, max_mods) are coerced to int.
- enable_match_between_runs is coerced to bool.
- enzyme is mapped to a canonical name via _ENZYME_MAP.
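The lookup-style rules (enzyme names here, and the auto-calibration sentinels mentioned in the package summary) amount to dictionary or set membership tests. The sketch below is illustrative only: `ENZYME_MAP` contains just the two entries quoted in this documentation, not the contents of the real `_ENZYME_MAP`, and the lower-casing of keys and the pass-through of unrecognized values are assumptions.

```python
# Only the two entries quoted in the documentation; the real map likely has more.
ENZYME_MAP = {"trypsin": "Trypsin", "kr|p,true": "Trypsin"}

# The two example sentinels quoted in the documentation.
AUTO_CALIBRATION_SENTINELS = {"dynamic", "0 ppm"}


def canonical_enzyme(value):
    """Map a tool-specific enzyme string to a canonical name."""
    if not isinstance(value, str):
        return value  # e.g. NaN for a missing value
    # Assumption: keys are compared in lower case; unknown names pass through.
    return ENZYME_MAP.get(value.strip().lower(), value)


def canonical_tolerance(value):
    """Replace known auto-calibration sentinels, leave other values as-is."""
    if isinstance(value, str) and value.strip().lower() in AUTO_CALIBRATION_SENTINELS:
        return "Automatic calibration"
    return value
```

For example, `canonical_enzyme("KR|P,true")` yields `"Trypsin"`, while `canonical_tolerance("20 ppm")` is returned unchanged.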
- proteobench.io.params.normalize_dataframe_columns(df: DataFrame) → DataFrame
Apply the same coercion rules as ProteoBenchParameters.normalize() to an entire DataFrame of historical results.
Operates in-place on df and also returns it for convenience.
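A column-wise version of one rule shows the "in-place and returned" contract. This sketch assumes pandas and NumPy are installed and handles only the `ident_fdr_psm` column; `normalize_columns_sketch` is a hypothetical name, and using `pd.to_numeric(..., errors="coerce")` to turn sentinel strings into NaN is a shortcut chosen for the example rather than the package's method.

```python
import numpy as np
import pandas as pd


def normalize_columns_sketch(df):
    """Hypothetical column-wise FDR coercion; mutates df and returns it."""
    if "ident_fdr_psm" in df.columns:
        # Non-numeric entries such as "N/A" become NaN.
        fdr = pd.to_numeric(df["ident_fdr_psm"], errors="coerce")
        # Assumption: values above 1 are percentages.
        df["ident_fdr_psm"] = fdr.where(fdr <= 1, fdr / 100)
    return df


history = pd.DataFrame({"ident_fdr_psm": ["0.01", "N/A", 5]})
result = normalize_columns_sketch(history)
```

Assigning back to `df["ident_fdr_psm"]` changes the caller's DataFrame, so `result` and `history` are the same object; returning it simply allows the call to be chained.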
Submodules#
- proteobench.io.params.adanovo module
- proteobench.io.params.alphadia module
  - clean_line()
  - clean_up_parameters()
  - detect_newer_version()
  - extract_params()
  - extract_values_from_nested_lines()
  - homogenize_modification_string()
  - initialize_default_parameters()
  - map_keys_to_desired_format()
  - parse_key_value()
  - process_fragment_mz()
  - process_key_value_line()
  - process_precursor_charge()
  - process_precursor_len()
  - process_precursor_mz()
  - read_file_lines()
- proteobench.io.params.alphapept module
- proteobench.io.params.casanovo module
- proteobench.io.params.contranovo module
- proteobench.io.params.deepnovo module
- proteobench.io.params.diann module
- proteobench.io.params.fragger module
- proteobench.io.params.i2masschroq module
- proteobench.io.params.instanovo module
- proteobench.io.params.maxdia module
- proteobench.io.params.maxquant module
- proteobench.io.params.metamorpheus module
- proteobench.io.params.msaid module
- proteobench.io.params.msangel module
- proteobench.io.params.peaks module
- proteobench.io.params.pihelixnovo module
- proteobench.io.params.piprimenovo module
- proteobench.io.params.pointnovo module
- proteobench.io.params.proline module
- proteobench.io.params.quantms module
- proteobench.io.params.sage module
- proteobench.io.params.spectronaut module
- proteobench.io.params.wombat module