Traceability: promise ↔ detection ↔ evidence ↔ result
This document maps every public promise the collector makes (documented in
README.md, CHANGELOG.md, SECURITY.md, and docs/bundle_schema.md) to:
- Where it is implemented (module / function).
- Which scenario proves it works (fixture or recorded lab scan).
- Expected verdict for that scenario.
- Recorded (obtained) verdict at the last validation pass.
- Negative / edge / misuse counterpart (where applicable).
Last refreshed: 2026-05-05 (maturity/hygiene round on top of the
v1.1.0 release-readiness merge in main, commit 17dede2).
Refresh notes: 2026-05-17
What was re-run in this round (post-publication hardening branch
`chore/post-publication-hardening-v1-1-1`, evidence captured under
`output/publication-2026-05-17/`):
- All 7 lab scenarios (`examples/labs/01..07`): every lab produced the expected `release_status = not_ready`, and `--fail-on not_ready` exited with code `2`. Critical-missing counts: 4 for labs 01, 02, 03, 05, 06, 07 and 5 for lab 04 (which lacks an SBOM attestation). The §4 row for `01-core-saas-lab` was updated from the previous "(6 missing critical)", a stale number inherited from before the catalog consolidated `SAMM-IMPL-SB-2` and `SAMM-VERIF-ST-1` into `partial`.
- Sample-release positive: `ready 13/13`, coverage 100, confidence 59 (matches §3, §4).
- Sample-release negative (`--fail-on not_ready`): `not_ready`, 4 missing critical (`ORG-CODE-REVIEW`, `ORG-RELEASE-APPROVAL`, `ORG-REL-ROLLBACK`, `SSDF-PS.2`), exit `2`.
- Self-release dogfood (`make run-self-release`): `ready 13/13`, coverage 100, confidence 54 (manual attestations). The generated bundle under `output/self_release/` was regenerated locally to match the v1.1.0 schema (it now includes the `classification` field introduced in phase 7).
- Cross-OS determinism: the snapshot test now produces digest `a42920b3…` on both Windows and Linux after the 2026-05-17 fix to `normalize_bundle` (which POSIX-normalizes `evidence[*].raw.artifact_path` at hash time). See the "Fixed" entry dated 2026-05-17 in `CHANGELOG.md` and `tests/unit/test_integrity.py`.
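
The cross-OS fix described above (normalizing `evidence[*].raw.artifact_path` to POSIX form before hashing) can be illustrated with a minimal sketch. This is not the project's `normalize_bundle`; the helper names `posix_path` and `bundle_digest`, the use of SHA-256, and the canonical JSON serialization are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import PureWindowsPath


def posix_path(path: str) -> str:
    """Return the path with forward slashes, whichever OS produced it."""
    # PureWindowsPath accepts both "\\" and "/" as separators, so the same
    # call works for paths recorded on Windows and on Linux.
    return PureWindowsPath(path).as_posix()


def bundle_digest(bundle: dict) -> str:
    """Hash a bundle after POSIX-normalizing every evidence artifact path.

    Assumptions: SHA-256 and sorted-key JSON are illustrative choices, not
    necessarily what the collector uses.
    """
    normalized = json.loads(json.dumps(bundle))  # deep copy; input is untouched
    for item in normalized.get("evidence", []):
        raw = item.get("raw", {})
        if "artifact_path" in raw:
            raw["artifact_path"] = posix_path(raw["artifact_path"])
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this shape, two bundles that differ only in path separators hash to the same digest, which is the property the snapshot test checks across runners.
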
What was not re-run:
- Docker build (daemon not active on the validation host).
- `trivy` and `cross-os` CI matrix (require external tooling / multi-runner CI). Both still run as required checks in `security-ci-cd.yml` and `github-ci-cd.yml`.
Refresh notes: 2026-05-05
What was re-run from this matrix in this round:
- Sample-release positive fixture (`examples/sample_release/`): `python -m evidence_collector.cli.main run` produced `release_status = ready`, coverage 100, confidence 59, 13/13 controls met, evidence count 13. The output directories used were `output/cursor-validation-{run,collect,evaluate,oscal,schema}`; they were cleaned at the end of the round.
- Sample-release negative scenario (no `--attestations-dir`): produced `release_status = not_ready`, coverage 47, confidence 100, 6 missing, 2 partial, 5 met, exit code `2` by design. Missing critical controls observed: `ORG-CODE-REVIEW`, `ORG-RELEASE-APPROVAL`, `ORG-REL-ROLLBACK`, `SSDF-PS.2`. This matches §4 row "A critical control lacks required evidence".
- `compare` against the generated sample: `output/sample_release/bundle.json` vs. the new `output/cursor-validation-run/bundle.json` reported `coverage_delta=0`, `confidence_delta=0`, all 13 controls `unchanged`. The `ready -> ready` headline is preserved across the rename (the canonical sample still ships under the `payments-api / 2026.04.10` release context, so the bundle IDs differ; that is by design, not a regression).
- `oscal` and `schema` exports: both succeeded with deterministic UUID/JSON output as documented in §5.
What was not re-run in this round:
- `examples/labs/*`: the recorded lab scans were not re-driven. None of the lab fixtures changed since 2026-04-24, so the §3 lab rows are reported as "matches" by inheritance from the previous pass. To revalidate, run `bash scripts/scan_all_labs.sh` from a shell with the original lab corpus mounted (Bash on Windows is available via Git Bash; the helper script does not need to be rewritten for PowerShell).
- The GitHub / GitLab collectors against live APIs: no token was set in this round (`doctor --json` reports both as `absent (collectors will skip SCM)`). The unit and contract tests for both collectors are exercised by the regular `pytest` run documented in `release-readiness.md`.
- Docker build: the Docker daemon was not active in this pass. The §4 fixture rows that depend on the container fall back to the local CLI invocation.
Refresh outcome: every row in §1, §2, §3 and §4 is consistent with
what the CLI produced on 2026-05-05. No row needed an obtained ≠
expected correction.
1 · Ingestion promises
| Promise (README/CHANGELOG) | Implementation | Positive fixture | Negative / edge fixture | Expected | Obtained |
|---|---|---|---|---|---|
| Parse SARIF (Semgrep, CodeQL, Sonar, Snyk Code, Trivy, Grype, Gitleaks, Bandit, pip-audit) | `parsers/sarif.py`, `normalizers/engine.py` | `examples/sample_release/artifacts/semgrep.sarif`, `trivy.sarif`, `gitleaks.sarif` | `tests/unit/test_parsers.py::test_sarif_malformed`, `test_sarif_missing_runs` | parse succeeds on known tools; unknown driver → `evidence_type=sast_scan` (conservative default) | matches |
| Parse CycloneDX SBOM | `parsers/sbom.py` | `examples/sample_release/artifacts/sbom.cdx.json` | `tests/unit/test_parsers.py::test_sbom_invalid_json` | `evidence_type=sbom`, components counted | matches |
| Parse SPDX SBOM | `parsers/sbom.py` | covered via unit test fixture | same | `evidence_type=sbom` | matches |
| Parse JUnit XML | `parsers/junit.py` | `examples/sample_release/artifacts/junit.xml` | `tests/unit/test_parsers.py::test_junit_defused` (XXE guard) | `evidence_type=test_result`, pass/fail counted | matches |
| Parse OWASP ZAP JSON (DAST) | `parsers/zap.py` | `examples/sample_release/artifacts/zap-baseline.json` | `tests/unit/test_zap.py::test_zap_missing_site` | `evidence_type=dast_scan`, alerts bucketed by risk | matches |
| Parse YAML/JSON attestations | `parsers/attestation.py` | `examples/sample_release/attestations/*.yaml` | attestation with unknown key → `extra='forbid'` raises | `evidence_type` derives from `kind` field | matches |
| 25 MB safety cap | `parsers/_common.py::read_bounded` | `tests/unit/test_parsers.py::test_sarif_oversize` | large SARIF → raises `ArtifactTooLargeError` | reject > 25 MB | matches |
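
The 25 MB safety cap in the last row can be sketched as a bounded read. The names `read_bounded` and `ArtifactTooLargeError` come from the table; the body below is an assumption about how such a guard is typically written, not the contents of `parsers/_common.py`, and it reads "25 MB" as 25 MiB.

```python
from pathlib import Path

# Assumption: the cap is interpreted as 25 MiB; the real constant may differ.
MAX_ARTIFACT_BYTES = 25 * 1024 * 1024


class ArtifactTooLargeError(ValueError):
    """Raised when an artifact exceeds the configured size cap."""


def read_bounded(path: Path, limit: int = MAX_ARTIFACT_BYTES) -> bytes:
    """Read at most `limit` bytes; raise if the file is larger.

    Reading limit + 1 bytes detects oversize input without trusting the
    file size reported by the filesystem and without loading the whole file.
    """
    with path.open("rb") as handle:
        data = handle.read(limit + 1)
    if len(data) > limit:
        raise ArtifactTooLargeError(f"{path} exceeds {limit} bytes")
    return data
```

Reading one byte past the limit keeps memory use bounded even when the input is far larger than the cap.
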
2 · Classification promises (SARIF → canonical `evidence_type`)
| Promise | Rule | Source | Expected | Obtained |
|---|---|---|---|---|
| Semgrep → `sast_scan` | driver name contains "semgrep" | `normalizers/engine.py::classify_sarif_driver` | `label=sast_scan` | matches |
| Trivy → `sca_scan` (dominant runs) | driver name contains "trivy" and bulk rules target packages | same | `label=sca_scan`; secret runs reclassified in post | edge case documented in `docs/limitations.md` §2 |
| Gitleaks → `secrets_scan` | driver name contains "gitleaks" | same | `label=secrets_scan` | matches |
| Bandit → `sast_scan` | driver name contains "bandit" | same | `label=sast_scan` | matches |
| pip-audit → `sca_scan` | driver name contains "pip-audit" | same | `label=sca_scan` | matches |
| Grype → `sca_scan` | driver name contains "grype" | same | `label=sca_scan` | matches |
| Unknown driver | default branch | same | `label=sast_scan` with `confidence=low` tag in rationale | matches |
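
The substring rules in the classification table can be sketched as an ordered lookup with a conservative default. This is an illustration only: the function name matches the one cited in the table, but the signature, the case-insensitive match, and the rationale strings are assumptions, and the extra Trivy condition ("bulk rules target packages") and the post-hoc reclassification of secret runs are deliberately left out.

```python
# (substring, label) pairs copied from the table; order is illustrative.
DRIVER_RULES = (
    ("semgrep", "sast_scan"),
    ("trivy", "sca_scan"),
    ("gitleaks", "secrets_scan"),
    ("bandit", "sast_scan"),
    ("pip-audit", "sca_scan"),
    ("grype", "sca_scan"),
)


def classify_sarif_driver(driver_name: str) -> tuple[str, str]:
    """Return (label, rationale) for a SARIF tool driver name.

    Assumption: matching is case-insensitive; the real rule may differ.
    """
    name = driver_name.lower()
    for needle, label in DRIVER_RULES:
        if needle in name:
            return label, f"driver name contains '{needle}'"
    # Default branch: unknown drivers fall back to sast_scan, tagged low confidence.
    return "sast_scan", "unknown driver; confidence=low"
```

For example, `classify_sarif_driver("Semgrep OSS")` yields the `sast_scan` label, while an unrecognized driver falls through to the default branch.
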
3 · Control evaluation promises
| Control | Framework | Criticality | Required evidence | Recommended evidence | Fixture positive | Fixture negative |
|---|---|---|---|---|---|---|
| SSDF-PW.7 | NIST_SSDF | high | `sast_scan` | none | sample_release | examples/labs/01-core-saas-lab (no treatment → partial) |
| SSDF-PW.4 | NIST_SSDF | high | `sca_scan` | none | sample_release | labs (syft+trivy in artifacts) |
| ORG-SECRETS-SCAN | ORG_INTERNAL | high | `secrets_scan` | none | sample_release | labs |
| SSDF-PS.3 | NIST_SSDF | critical | `sbom` | `artifact_attestation`, `artifact_signature` | sample_release | labs (sbom but no signatures → partial) |
| SSDF-PW.8 | NIST_SSDF | high | `test_result` | none | sample_release | labs (no JUnit → missing) |
| ORG-CODE-REVIEW | ORG_INTERNAL | critical | `code_review` | `pr_metadata` | sample_release | labs (missing critical) |
| SSDF-PW.1 | NIST_SSDF | medium | `threat_model` | none | sample_release | labs (missing → conditional) |
| ORG-RELEASE-APPROVAL | ORG_INTERNAL | critical | `release_approval` | none | sample_release | labs (missing critical) |
| ORG-REL-ROLLBACK | ORG_INTERNAL | critical | `rollback_plan` | none | sample_release | labs (missing critical) |
| SSDF-PS.2 | NIST_SSDF | critical | `artifact_signature` | `artifact_attestation` | sample_release | labs (missing critical) |
| SAMM-DESIGN-TA-1 | OWASP_SAMM | medium | `threat_model` | none | sample_release | labs (missing → conditional) |
| SAMM-IMPL-SB-2 | OWASP_SAMM | high | `sbom` | `artifact_signature`, `artifact_attestation` | sample_release | labs (sbom only) |
| SAMM-VERIF-ST-1 | OWASP_SAMM | high | `sast_scan` | `sca_scan`, `dast_scan` | sample_release | labs (sast+sca no dast) |
4 · Release-status rule promises
| Scenario | Expected verdict | Fixture | Obtained |
|---|---|---|---|
| Every critical + high has required evidence | `ready` | `examples/sample_release/` (full) | `ready`, 13/13 |
| Only recommended evidence is missing | `conditional` | `examples/sample_release/` with `artifact_attestation.yaml` removed | `conditional` (reproduced in `tests/integration/test_end_to_end.py::test_conditional_when_only_recommended_missing`) |
| Only medium-criticality controls lack evidence | `conditional` | drop `threat_model.yaml` | `conditional` |
| A critical control lacks required evidence | `not_ready` | `sample_release` without `--attestations-dir` | `not_ready` (4 missing critical: `ORG-CODE-REVIEW`, `ORG-RELEASE-APPROVAL`, `ORG-REL-ROLLBACK`, `SSDF-PS.2`) |
| Vulnerable lab scanned without attestations | `not_ready` | `examples/labs/01-core-saas-lab/` | `not_ready` (4 missing critical: `ORG-CODE-REVIEW`, `ORG-RELEASE-APPROVAL`, `ORG-REL-ROLLBACK`, `SSDF-PS.2`); re-validated 2026-05-17 against `output/publication-2026-05-17/labs/01-core-saas-lab/` |
| Vulnerable lab with weaker SBOM coverage | `not_ready` | `examples/labs/04-data-batch-lab/` | `not_ready` (5 missing critical: the same four plus `SSDF-PS.3` because no SBOM attestation is present) |
| Vulnerable lab where `sca_scan` is present | `not_ready` | `examples/labs/06-industry-regulated-lab/` | `not_ready` (4 missing critical; one extra control met because pip-audit evidence raises `SSDF-PW.4` to `met`) |
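
The verdict rules in this section can be approximated with a small decision function. This is a hedged sketch, not the collector's evaluator: `ControlResult` and `release_status` are hypothetical names, and the three-valued `status` field is an assumed simplification. The table does not state what a missing high-criticality control yields on its own; the sketch assumes `conditional` for that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlResult:
    """Hypothetical per-control outcome used only for this illustration."""
    control_id: str
    criticality: str  # "critical" | "high" | "medium"
    status: str       # "met" | "partial" | "missing"


def release_status(results: list[ControlResult]) -> str:
    """Derive a release verdict from per-control results.

    Rules taken from the table: a critical control without required
    evidence gives not_ready; gaps limited to recommended evidence
    (partial) or to medium controls give conditional; otherwise ready.
    Assumption: a missing high control alone also gives conditional.
    """
    if any(r.criticality == "critical" and r.status == "missing" for r in results):
        return "not_ready"
    if any(r.status != "met" for r in results):
        return "conditional"
    return "ready"
```

Under this sketch, the sample release without attestations lands on `not_ready` as soon as one of the four critical controls listed above is `missing`, regardless of how many other controls are met.
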
5 · Determinism & stability promise
| Promise | Test | Expected | Obtained |
|---|---|---|---|
| Two runs on identical inputs produce identical `bundle.json` (after stripping `generated_at`, `evaluated_at`, `bundle_id`) | `tests/integration/test_end_to_end.py::test_bundle_determinism` | diff returns empty | matches |
| `compare` command reports zero deltas for identical bundles | `tests/integration/test_cli.py::test_compare_identical_bundles` | no improvements/regressions | matches |
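
The determinism check (compare two bundles after stripping `generated_at`, `evaluated_at`, and `bundle_id`) can be sketched as follows. The helper names `strip_volatile` and `same_bundle` are hypothetical, and stripping the fields at every nesting level is an assumption, since the table does not say where in the bundle they appear.

```python
# Field names come from the table; everything else here is illustrative.
VOLATILE_FIELDS = frozenset({"generated_at", "evaluated_at", "bundle_id"})


def strip_volatile(node):
    """Return a copy of a JSON-like structure without the volatile fields.

    Assumption: the fields are removed wherever they occur, not only at
    the top level of the bundle.
    """
    if isinstance(node, dict):
        return {
            key: strip_volatile(value)
            for key, value in node.items()
            if key not in VOLATILE_FIELDS
        }
    if isinstance(node, list):
        return [strip_volatile(value) for value in node]
    return node


def same_bundle(first: dict, second: dict) -> bool:
    """True when two bundles are equal once volatile fields are ignored."""
    return strip_volatile(first) == strip_volatile(second)
```

Two runs over identical inputs should differ only in those three fields, so `same_bundle` returning `True` corresponds to the "diff returns empty" expectation.
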
6 · Security promises
| Promise | Implementation | Test |
|---|---|---|
| No token ever logged | `collectors/github.py`, `collectors/gitlab.py` (no `log(token)` calls; headers redacted) | `tests/unit/test_github_collector.py::test_token_not_logged` |
| XML parsed without external entities | `parsers/junit.py` uses `defusedxml` | `tests/unit/test_parsers.py::test_junit_defused` |
| Pydantic `extra='forbid'` on every canonical type | `domain/models.py` `model_config` | `tests/unit/test_domain_models.py::test_bundle_rejects_extra_fields` |
| `--output-dir` does not escape to parent | `application/orchestrator.py` normalizes path | `tests/unit/test_exporters.py::test_output_dir_boundary` |
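
The `--output-dir` boundary promise can be illustrated with a path-containment check. This is a sketch under assumptions, not the logic in `application/orchestrator.py`: the function name `resolve_output_dir` and the notion of an explicit base directory are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_output_dir(base: Path, requested: str) -> Path:
    """Resolve `requested` under `base` and refuse paths that escape it.

    Both paths are fully resolved first, so ".." segments and symlinks are
    collapsed before the containment check runs.
    """
    base = base.resolve()
    target = (base / requested).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"output dir {requested!r} escapes {base}")
    return target
```

Resolving before comparing matters: a plain string prefix check would accept `output/../../elsewhere`, whereas the resolved path makes the escape visible.
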
7 · Known gaps still open
See docs/limitations.md. Each entry here is a known false-positive or
false-negative risk that the collector does not cover and is documented
as part of the public contract.