Known limitations
This document spells out what the Secure SDLC Evidence Collector does not do, where heuristics can produce false positives or false negatives, and which scenarios require human interpretation on top of the bundle. Publishing these limits up front is part of the project's public contract: the tool aims at traceable honesty, not marketing.
1 · Not a scanner
- The collector does not scan source code, containers, or infrastructure. It ingests the output of third-party tools.
- Rule selection, severity thresholds, and exploitability labels are decided by the upstream scanner. We never override them.
- If a scanner misses a vulnerability, the collector cannot synthesize it. Reference evidence is the signal, not the ground truth.
2 · SARIF classification heuristic
SARIF is a format, not a taxonomy. A single SARIF file may contain multiple runs from different tools with overlapping intent (Trivy exports vuln + secret + misconfig in one file; Snyk exports SAST + SCA separately; Sonar exports a mix). The collector classifies each run into exactly one `evidence_type` based on the driver name.
- False-positive risk: a Trivy SARIF with a vuln run and a secret run in the same file is classified as `sca_scan`; the secret run is counted under SCA rather than under `secrets_scan`. Mitigation: export Trivy's secret scan into a separate file (`trivy fs --scanners secret --format sarif --output trivy-secrets.sarif`) or provide the output of a dedicated secrets tool (Gitleaks/TruffleHog).
- False-negative risk: an unknown driver defaults to `sast_scan`. If the driver was actually a DAST tool, the `sast_scan` control is credited while `dast_scan` remains missing.
- Mitigation: the classification heuristic and the fallback label are both documented in the bundle's `evidence[*].rationale` field, so reviewers can see why the collector chose a label.
- Since `v1.1.1` the bundle also surfaces a first-class `evidence[*].classification` field with `confidence` (`high`/`medium`/`low`), `reason` (`driver_match`/`manual_override`/`fallback_sast`) and the original `driver_name`. Downstream consumers can filter or weight evidence by classification confidence without parsing the `rationale` prose, and a `fallback_sast` reason is a reliable signal that the underlying tool was unknown to the heuristic.
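As an illustration of how a downstream consumer might use the `classification` field, here is a minimal sketch. It assumes the bundle is a JSON object with a top-level `evidence` list; only the field names `classification`, `confidence`, `reason`, and `driver_name` come from the text above, while the helper names and the sample data (including the driver `AcmeDast`) are hypothetical.

```python
# Assumed layout: bundle["evidence"] is a list of items, each optionally
# carrying the "classification" object described above.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def filter_by_confidence(bundle: dict, minimum: str = "medium") -> list[dict]:
    """Return evidence items whose classification confidence is at least `minimum`."""
    floor = CONFIDENCE_RANK[minimum]
    kept = []
    for item in bundle.get("evidence", []):
        classification = item.get("classification") or {}
        # Assumption: items without a classification are treated as low confidence.
        rank = CONFIDENCE_RANK.get(classification.get("confidence", "low"), 0)
        if rank >= floor:
            kept.append(item)
    return kept


def unknown_drivers(bundle: dict) -> list[str]:
    """List driver names that fell through to the SAST fallback."""
    names = set()
    for item in bundle.get("evidence", []):
        classification = item.get("classification") or {}
        if classification.get("reason") == "fallback_sast":
            names.add(classification.get("driver_name", "<unnamed>"))
    return sorted(names)


# Hypothetical sample data for illustration only.
sample_bundle = {
    "evidence": [
        {
            "evidence_type": "sca_scan",
            "classification": {
                "confidence": "high",
                "reason": "driver_match",
                "driver_name": "Trivy",
            },
        },
        {
            "evidence_type": "sast_scan",
            "classification": {
                "confidence": "low",
                "reason": "fallback_sast",
                "driver_name": "AcmeDast",
            },
        },
    ]
}

trusted = filter_by_confidence(sample_bundle, minimum="medium")
needs_review = unknown_drivers(sample_bundle)
```

A reviewer could feed `needs_review` into a manual triage step to decide whether the fallback label is acceptable for each unknown driver.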
3 · Evidence quality is shallow
The collector checks presence, not correctness.
- A SAST SARIF with zero findings still satisfies the `sast_scan` control: the control asks "was a scan run?", not "was the scan effective?".
- Attestations are validated against a Pydantic schema (shape, required fields, `extra='forbid'`) but the content of those fields is not audited. A `release_approval.yaml` naming `Approver: joe@example.com` is accepted at face value.
- Suppressions (e.g. `.semgrepignore`, CodeQL `# lgtm[...]`) are visible only if the underlying scanner reports them in its SARIF. The collector does not flag "scan ran but everything was suppressed".
- SBOM evidence carries a `metadata.cisa_2025_minimum_elements` presence map and a `metadata.cisa_2025_conformant` flag, checked against CISA's 2025 Minimum Elements for an SBOM (author, timestamp, supplier, component name, version, unique identifier, dependency relationships, hash, license, tool name, generation context). A component-level element is reported present only when every component carries it. This checks the SBOM's shape, not the correctness of the values: a component with a bogus license string still counts the `license` element as present.
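The "present only when every component carries it" rule can be sketched as follows. This is not the collector's code: the element keys in `COMPONENT_ELEMENTS`, the component dictionaries, and the choice to report an empty component list as absent are all assumptions made for illustration.

```python
# Hypothetical component-level element keys; the real keys inside
# metadata.cisa_2025_minimum_elements may differ.
COMPONENT_ELEMENTS = ("supplier", "name", "version", "unique_identifier", "hash", "license")


def component_presence(components: list[dict]) -> dict[str, bool]:
    """Mark an element present only if every component has a non-empty value for it."""
    return {
        # Assumption: an SBOM with no components reports every element as absent.
        element: bool(components) and all(bool(c.get(element)) for c in components)
        for element in COMPONENT_ELEMENTS
    }


complete = {
    "supplier": "Example Corp",
    "name": "libfoo",
    "version": "1.2.3",
    "unique_identifier": "pkg:pypi/libfoo@1.2.3",
    "hash": "sha256:abc123",
    "license": "MIT",
}
# A bogus license string is still a non-empty value, so it counts as present.
bogus_license = dict(complete, name="libbar", license="???")
# A component with no license at all makes the element absent for the whole SBOM.
no_license = {k: v for k, v in complete.items() if k != "license"}

shape_only = component_presence([complete, bogus_license])
one_gap = component_presence([complete, no_license])
```

The `bogus_license` case shows why this is a shape check: the value is never compared against a list of known license identifiers.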
4 · Control catalog is small on purpose
- The default catalog has 13 controls focused on widely supported SSDF + OWASP SAMM practices plus four org-internal controls.
- It is not a full SSDF implementation (PS.1, PS.3-partial, PW.2, PW.5, PW.6, RV.* are out of scope). Custom catalogs via `--catalog` are the supported extension point.
- Criticality values are advisory. The release-status rule only cares about `critical` and `high`; `medium` and `low` controls drop the verdict to `conditional` at worst.
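One way to read the release-status rule is the sketch below. It assumes that a missing `critical` or `high` control yields `not_ready` and that any other gap yields `conditional`; the document does not spell out the exact rule, so treat the function, the `ControlResult` type, and the control identifiers as hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlResult:
    control_id: str      # hypothetical identifiers, for illustration only
    criticality: str     # "critical" | "high" | "medium" | "low"
    has_evidence: bool


def release_status(results: list[ControlResult]) -> str:
    """Derive a verdict from which criticality levels still lack evidence."""
    missing = {r.criticality for r in results if not r.has_evidence}
    # Assumption: a gap at critical or high blocks the release outright.
    if missing & {"critical", "high"}:
        return "not_ready"
    # Medium and low gaps drop the verdict to conditional at worst.
    if missing:
        return "conditional"
    return "ready"


all_covered = [
    ControlResult("sast_scan", "critical", True),
    ControlResult("dast_scan", "medium", True),
]
medium_gap = [
    ControlResult("sast_scan", "critical", True),
    ControlResult("dast_scan", "medium", False),
]
critical_gap = [
    ControlResult("sast_scan", "critical", False),
    ControlResult("dast_scan", "medium", True),
]
```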
5 · SCM coverage
- Only GitHub and GitLab are first-class. Azure DevOps and Bitbucket are on the roadmap but not implemented.
- "Last approval after last commit" for GitHub looks at REST `/pulls/:n/reviews`. Dismissed reviews and draft reviews are ignored. Required-reviewer policies on protected branches are not checked.
- Code-review evidence from providers without reviewer metadata (e.g. signed-off-by on plain git) is not parsed; supply a `code_review.yaml` attestation instead.
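The "last approval after last commit" check can be approximated as below. The sketch works on review objects shaped like GitHub's pull request review payload (`state`, `submitted_at`); the function name, the strict "later than" comparison, and the sample timestamps are assumptions rather than the collector's actual logic.

```python
from datetime import datetime


def _parse(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def approved_after_last_commit(reviews: list[dict], last_commit_at: str) -> bool:
    """True if at least one APPROVED review was submitted after the last commit."""
    cutoff = _parse(last_commit_at)
    for review in reviews:
        # Dismissed reviews have state DISMISSED and draft reviews have state
        # PENDING, so filtering on APPROVED ignores both.
        if review.get("state") != "APPROVED":
            continue
        submitted = review.get("submitted_at")
        if submitted and _parse(submitted) > cutoff:
            return True
    return False


# Hypothetical sample data.
reviews = [
    {"state": "APPROVED", "submitted_at": "2024-05-01T10:00:00Z"},
    {"state": "DISMISSED", "submitted_at": "2024-05-03T10:00:00Z"},
    {"state": "PENDING"},
]
stale = approved_after_last_commit(reviews, "2024-05-02T09:00:00Z")
fresh = approved_after_last_commit(
    reviews + [{"state": "APPROVED", "submitted_at": "2024-05-04T08:30:00Z"}],
    "2024-05-02T09:00:00Z",
)
```

Note that nothing here consults branch protection settings, which matches the limitation stated above.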
6 · DAST coverage
- The ZAP parser reads ZAP's JSON baseline/full-scan reports. Nuclei, Burp XML, Wapiti, and ZAP's automation framework output are not parsed.
- If a release's DAST evidence is a screenshot or a PDF, it must be attested via `dast_scan` in `generic_attestation.yaml` with a link.
7 · Signing verification
- The collector records `artifact_signature` presence (a file exists and declares the signed digest). It does not verify the signature against a key or an identity.
- Verification of cosign keyless signatures is expected upstream (for example in the `publish-pypi.yml` workflow or on the consumer side). The project's own release workflow signs its artifacts; downstream consumers still need to run `cosign verify-blob` themselves.
8 · Determinism boundaries
- Bundle JSON is deterministic for a fixed set of inputs after stripping four per-run fields: the timestamps `generated_at`, `evaluated_at`, and `collected_at`, plus the release-unique `bundle_id`. Those four fields intentionally change per run; everything else (including `evidence_id`, which is derived from a SHA-256 of the input artifact and context, not from a random UUID) is stable.
- File iteration order depends on the filesystem. The collector sorts paths before ingestion, so Linux and Windows should produce the same bundle; the `test_sample_release_bundle_is_stable_across_artifact_reorder` regression test catches drift if this ever changes.
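To compare two bundles built from the same inputs, a consumer could strip the four per-run fields and hash what remains. The sketch below assumes those fields may appear at any nesting level of the bundle JSON and that a canonical `json.dumps` form is an acceptable basis for comparison; the helper names are hypothetical and this is not the project's own structural-hash implementation.

```python
import hashlib
import json

# Field names taken from the list above.
VOLATILE_FIELDS = {"generated_at", "evaluated_at", "collected_at", "bundle_id"}


def strip_volatile(node):
    """Recursively drop the per-run fields from dicts, at any depth."""
    if isinstance(node, dict):
        return {k: strip_volatile(v) for k, v in node.items() if k not in VOLATILE_FIELDS}
    if isinstance(node, list):
        return [strip_volatile(v) for v in node]
    return node


def stable_digest(bundle: dict) -> str:
    """SHA-256 over a canonical JSON rendering of the stripped bundle."""
    canonical = json.dumps(strip_volatile(bundle), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical bundles from two runs over the same inputs.
run_a = {
    "bundle_id": "run-a",
    "generated_at": "2025-01-01T00:00:00Z",
    "evidence": [{"evidence_id": "e1", "collected_at": "2025-01-01T00:00:01Z"}],
}
run_b = {
    "bundle_id": "run-b",
    "generated_at": "2025-01-02T00:00:00Z",
    "evidence": [{"evidence_id": "e1", "collected_at": "2025-01-02T00:00:01Z"}],
}
drifted = {
    "bundle_id": "run-c",
    "generated_at": "2025-01-03T00:00:00Z",
    "evidence": [{"evidence_id": "e2", "collected_at": "2025-01-03T00:00:01Z"}],
}
```

Here `stable_digest(run_a)` and `stable_digest(run_b)` match, while `drifted` differs because its `evidence_id` changed.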
9 · Performance envelope
- Individual artifact hard cap: 25 MB.
- No concurrency inside a single `run`; a 100-artifact bundle is processed serially. Benchmarks on the sample and lab fixtures finish under 2 s; releases with hundreds of SARIF files will take longer.
- No streaming mode: every artifact is parsed into memory and held until export. Very large bundles may need a catalog split.
10 · Out of scope for bug bounty
These are explicitly out of scope for security reports (see `SECURITY.md`):
- issues in third-party SARIF / SBOM parsers we call,
- DoS that requires privileged filesystem access,
- missing security headers in the `summary.html` artifact when served outside the tool's default invocation.
11 · Things the verdict does not imply
- `ready` does not mean "legally compliant with SSDF / PCI-DSS / SOC 2 / ISO 27001". It means "every required critical and high control in the active catalog has evidence attached".
- `conditional` does not mean "ship at your own risk". It means "medium-criticality gaps or only recommended-evidence gaps remain; review them before promotion".
- `not_ready` does not automatically mean "the release is broken". A critical control without evidence may also reflect an immature evidence pipeline rather than an unsafe release.
12 · When results need human interpretation
- SARIF from a scanner run with `--severity=low`: the collector records it, but the release manager has to decide whether a low-severity-only run counts as a meaningful SAST check.
- Waivers (`exceptions/*.yaml`) that are expired vs. valid: the collector flags expiry but never decides whether the waiver should have existed.
- DAST coverage on an API release where only UI was scanned: the collector sees a `dast_scan`, not its coverage.
13 · Enriched bundles are not byte-stable across EPSS / KEV feed refreshes
- `sdlc-evidence enrich` and `sdlc-evidence run --enrich` write EPSS / KEV signal into evidence. The structural-hash gate deliberately captures `epss_feed_date`, `epss_model_version`, and `kev_feed_date` so a feed bump (or an EPSS model-version change) surfaces as drift.
- This means the same source artifacts produce different bundle hashes on different days when enrichment is on. That is the intent: drift in the EPSS feed is meaningful, not noise.
- If a downstream signer needs a single durable hash, sign the base bundle (built without `--enrich`) and ship the enriched view as a separate, time-bound delta. The base bundle is byte-stable across runs on identical inputs.
14 · Risk-weighted verdict only downgrades (T6.6)
- `sdlc-evidence run --risk-mode epss-weighted` re-derives `release_status` using EPSS + KEV signal already in the bundle.
- The mode can only make the verdict worse (`ready` → `conditional` → `not_ready`). It will never promote a `not_ready` base verdict to `ready` just because the present CVEs happen not to be exploitable. Missing required evidence is its own gap and must be addressed at the evidence layer, not waved away.
- A CVE on evidence marked `reachability.status == "not_reachable"` is removed from the exploitable count. The collector trusts the upstream reachability tool; it does not second-guess the verdict.
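The downgrade-only property amounts to taking the worse of two verdicts. A minimal sketch, assuming the risk mode produces its own candidate status that is then combined with the base verdict (the function name and the two-input shape are hypothetical):

```python
# Ordering from best to worst, as given in the list above.
SEVERITY = {"ready": 0, "conditional": 1, "not_ready": 2}


def apply_risk_mode(base_status: str, risk_status: str) -> str:
    """Return the worse of the base verdict and the risk-derived verdict."""
    # max() with a severity key can keep or worsen the base verdict, never improve it.
    return max(base_status, risk_status, key=SEVERITY.__getitem__)


downgraded = apply_risk_mode("ready", "conditional")
not_promoted = apply_risk_mode("not_ready", "ready")
```

Because the combination is a maximum over an ordered scale, no risk-derived input can lift a `not_ready` base verdict.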
15 · Reachability is not re-derived by the collector (§3.2)
- The `Reachability` field is populated by external tools (CodeQL reachability, Endor Labs, Semgrep Pro, manual review).
- The collector records `status` (`reachable`/`not_reachable`/`unknown`), `source`, and `method`. It does NOT compute reachability and will NOT contradict an upstream verdict, even when the data looks wrong.
- `unknown` is treated as "could be exploitable" in risk weighting so a missing reachability tool cannot silently suppress a real signal.
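The treatment of the three reachability states in risk weighting might look like the sketch below. The finding layout (a `cve` key and an optional `reachability` object) and the function name are assumptions; only the `status` values and the rule that `unknown` stays in the exploitable set come from the text.

```python
def exploitable_cves(findings: list[dict]) -> list[str]:
    """Keep every CVE except those an upstream tool marked not_reachable."""
    exploitable = []
    for finding in findings:
        reachability = finding.get("reachability") or {}
        # A missing reachability record is treated the same as "unknown".
        status = reachability.get("status", "unknown")
        if status != "not_reachable":
            exploitable.append(finding["cve"])
    return exploitable


# Hypothetical findings; the CVE identifiers are placeholders.
findings = [
    {"cve": "CVE-0000-0001", "reachability": {"status": "reachable", "source": "tool-a"}},
    {"cve": "CVE-0000-0002", "reachability": {"status": "not_reachable", "source": "tool-a"}},
    {"cve": "CVE-0000-0003", "reachability": {"status": "unknown"}},
    {"cve": "CVE-0000-0004"},
]
counted = exploitable_cves(findings)
```

Only the explicit `not_reachable` verdict removes a CVE; absence of data never does.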
17 · Watch daemon postponed to v2.1 (T6.9)
- The roadmap pairs the GUAC adapter with a `sdlc-evidence watch` daemon (webhook receiver, durable cursor, delta bundles for continuous ATO). That subcommand is deferred from v2.0 to v2.1 because it requires an optional `[watch]` extra (FastAPI + uvicorn + watchdog) and durable cursor persistence the file-first collector deliberately avoids.
- The GUAC adapter and the CRA / FedRAMP profiles cover the immediate regulator-driven use cases for September 2026. The watch daemon is a continuous-ATO accelerator, not a v2.0 blocker.
16 · Multi-VEX consumer trusts the upstream verdict (T6.7)
- `sdlc-evidence vex --consume <file>` merges external VEX (OpenVEX, CycloneDX VEX, CSAF) into the bundle-derived statements.
- The merger does not validate that the consumed VEX file was signed or that the source is authentic. Pipelines that need that guarantee must verify signatures before piping the file into `--consume`.
- Conflict policy (`first-wins`, `last-wins`, `fail`) decides what happens when sources disagree. `fail` is the right choice when an unexpected disagreement is itself a finding.
- SPDX VEX is deferred (low industry adoption). Tracked in the v2.1 backlog.
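The three conflict policies can be illustrated with a small merge routine. This is a sketch under assumptions: each source is reduced to a mapping from a hypothetical `(vulnerability, product)` key to a status string, and the names `merge_vex` and `VexConflict` are invented for the example rather than taken from the tool.

```python
class VexConflict(Exception):
    """Raised when sources disagree and the policy is 'fail'."""


POLICIES = ("first-wins", "last-wins", "fail")


def merge_vex(sources: list[dict], policy: str = "fail") -> dict:
    """Merge per-source statement maps in order, resolving disagreements by policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown conflict policy: {policy!r}")
    merged = {}
    for statements in sources:
        for key, status in statements.items():
            if key not in merged or merged[key] == status:
                merged[key] = status
            elif policy == "last-wins":
                merged[key] = status
            elif policy == "fail":
                raise VexConflict(f"{key}: {merged[key]!r} vs {status!r}")
            # first-wins: keep the status that is already recorded.
    return merged


# Hypothetical statements from two sources that disagree on one key.
key = ("CVE-0000-0001", "pkg:pypi/libfoo@1.2.3")
bundle_derived = {key: "affected"}
external = {key: "not_affected"}

first = merge_vex([bundle_derived, external], policy="first-wins")
last = merge_vex([bundle_derived, external], policy="last-wins")
```

With `fail`, the same two inputs raise `VexConflict`, which surfaces the disagreement instead of silently picking a side.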