proteobench.validation.fasta module
FASTA / reference-database parsing for submission validation.
FastaReference builds the set of accepted protein identifiers from a
FASTA file. It parses common UniProt-style headers (sp|P49327|FAS_HUMAN,
tr|...|...) as well as bare accession-like headers, indexing both the
accession and the entry name so that result protein identifiers can be matched
regardless of which form a tool reports.
The class can be built from raw text, a local path (plain, .gz, or .zip),
in-memory bytes, or an explicit iterable of identifiers. Downloading from a URL
is supported via FastaReference.from_url(); the actual network call is
performed lazily so that importing this module never requires network access.
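The header handling described above can be illustrated with a small standalone sketch. It does not reproduce the module's actual parser: the regular expression and the helper names `header_identifiers` and `identifiers_from_text` are hypothetical, and the sketch assumes that a bare header contributes its first whitespace-delimited token.

```python
import re

# Hypothetical pattern for UniProt-style headers such as "sp|P49327|FAS_HUMAN".
_UNIPROT_HEADER = re.compile(r"^(?:sp|tr)\|([^|\s]+)\|(\S+)")


def header_identifiers(header: str) -> set[str]:
    """Return the identifiers one FASTA header line could be matched by."""
    text = header.lstrip(">").strip()
    if not text:
        return set()
    match = _UNIPROT_HEADER.match(text)
    if match:
        accession, entry_name = match.groups()
        # Index both forms so either one can be looked up later.
        return {accession, entry_name}
    # Assumption: a bare header is identified by its first token.
    return {text.split()[0]}


def identifiers_from_text(fasta_text: str) -> set[str]:
    """Collect identifiers from every header line (lines starting with '>')."""
    found: set[str] = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            found |= header_identifiers(line)
    return found
```

Indexing both the accession and the entry name costs one extra set entry per record, and it lets a result file that reports `FAS_HUMAN` match the same record as one that reports `P49327`.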
- class proteobench.validation.fasta.FastaReference(identifiers: Iterable[str] | None = None)
Bases: object
Set of protein identifiers derived from a FASTA / reference database.
- Parameters:
identifiers (iterable of str, optional) – Pre-computed identifiers to seed the reference with.
- contains_any(identifiers: Iterable[str]) → bool
Test whether any of several identifiers is present.
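The semantics of this check can be sketched on a plain set. This is an illustration only, not the method's actual implementation; the free function below and its `accepted` argument are hypothetical stand-ins for the reference's internal index.

```python
from typing import Iterable


def contains_any(accepted: set[str], identifiers: Iterable[str]) -> bool:
    """Return True if at least one candidate identifier is in the accepted set."""
    return any(identifier in accepted for identifier in identifiers)


# Hypothetical accepted set holding one record under both of its identifiers.
accepted = {"P49327", "FAS_HUMAN"}
print(contains_any(accepted, ["Q00000", "FAS_HUMAN"]))
print(contains_any(accepted, ["Q00000"]))
```

Because `any` stops at the first hit, the check does not need to scan the remaining candidates once one identifier is found.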
- classmethod from_bytes(data: bytes, source_name: str | None = None, member_filename: str | None = None, encoding: str = 'utf-8') → FastaReference
Build a reference from in-memory bytes (plain, gzip, or zip).
- Parameters:
data (bytes) – Raw file content.
source_name (str, optional) – Original file name or URL, used to detect the compression type.
member_filename (str, optional) – Preferred FASTA member name when data is a ZIP archive.
encoding (str, optional) – Text encoding used to decode the FASTA content. Default "utf-8".
- Returns:
Reference indexing every header’s identifiers.
- Return type:
FastaReference
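One way to turn plain, gzip, or zip bytes into FASTA text, using the parameters documented for from_bytes, might look like the sketch below. It is a guess at the general approach rather than the library's code: the helper name `decode_fasta_bytes` is hypothetical, compression is inferred only from the suffix of `source_name`, and falling back to the first archive member is an assumption.

```python
import gzip
import io
import zipfile
from typing import Optional


def decode_fasta_bytes(
    data: bytes,
    source_name: Optional[str] = None,
    member_filename: Optional[str] = None,
    encoding: str = "utf-8",
) -> str:
    """Return FASTA text from plain, gzip, or zip bytes (illustrative only)."""
    name = (source_name or "").lower()
    if name.endswith(".gz"):
        return gzip.decompress(data).decode(encoding)
    if name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            members = archive.namelist()
            if not members:
                raise ValueError("ZIP archive contains no members")
            # Assumption: use the preferred member if present, else the first one.
            chosen = member_filename if member_filename in members else members[0]
            return archive.read(chosen).decode(encoding)
    # No recognised compression suffix: treat the bytes as plain text.
    return data.decode(encoding)
```

Detecting the format from the name keeps the sketch short; a more defensive variant could also inspect the leading magic bytes when no name is available.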
- classmethod from_identifiers(identifiers: Iterable[str]) → FastaReference
Build a reference directly from an iterable of identifiers.
- Parameters:
identifiers (iterable of str) – Identifiers to index (e.g. accessions extracted elsewhere).
- Returns:
Reference indexing the supplied identifiers.
- Return type:
FastaReference
- classmethod from_path(path: str, member_filename: str | None = None) → FastaReference
Build a reference from a local file path (plain, .gz, or .zip).
- Parameters:
- Returns:
Reference indexing every header’s identifiers.
- Return type:
FastaReference
- classmethod from_text(text: str) → FastaReference
Build a reference from raw FASTA text.
- Parameters:
text (str) – FASTA content (one or more records).
- Returns:
Reference indexing every header’s identifiers.
- Return type:
FastaReference
- classmethod from_url(url: str, member_filename: str | None = None, timeout: int = 60) → FastaReference
Build a reference by downloading a FASTA / zip / gzip from a URL.
requests is imported lazily so that importing this module does not require network access.
- Parameters:
- Returns:
Reference indexing every header’s identifiers.
- Return type:
FastaReference
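The lazy-import pattern mentioned for from_url can be sketched as follows. The helper name `download_bytes` is hypothetical and the body is an assumption about the general shape, not the library's code; only the default timeout of 60 is taken from the signature above.

```python
def download_bytes(url: str, timeout: int = 60) -> bytes:
    """Fetch a URL and return the raw response body."""
    # Imported inside the function rather than at module top level, so
    # `requests` is only loaded when a download is actually requested.
    import requests

    response = requests.get(url, timeout=timeout)
    # Raise an exception on HTTP error statuses instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

Bytes obtained this way could then be passed to FastaReference.from_bytes() with the URL as source_name, which the documentation describes as the value used to detect the compression type.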