ADR-0007: Proposal How to Mark Findings With Hashes to Find Duplicates
Status: | PROPOSED |
Date: | 2020-11-25 |
Author(s): | Sven Strittmatter Sven.Strittmatter@iteratec.com |
NOTE: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Context​
We need the possibility to find duplicate findings. One use case is that we want to accept a finding and want to ignore the same finding in the future.
Assumptions​
- The execution order of hooks is unspecified.
- The information if a finding’s hash is a duplicate MUST NOT be stored or maintained in the SCB S3 storage.
- The SCB MUST NOT remove findings: read-write-hooks may alter them, but never delete or filter them out.
- Maybe a read-hooks MAY decide to not store a finding into an external system.
Decision​
- We generate a hash for each finding so we can compare findings by the hash and identify duplicates.
- This hash MUST be mutable and MAY be altered by read-write-hooks because we don’t want to introduce an exceptions to what a read-write-hooks can alter.
- The parser MUST generate the initial hash of a finding from some of it’s attributes (e.g. name, lication, category …).
- Each scanner MUST define a default set of attributes used for the hashing.
- This set of hashed attributes MAY be overwritten.
- Each read-write-hooks MUST update the hash as last step because the hook MAY changed a hashed attribute.
We implement the hashing step in the parser first with feature flag to evaluate this proposal.
Consequences​
- We don’t need to introduce an ordering for the read-write-hooks.
- The duplicate detection/handling MUST be done in another service with its own data storage. This is because we have no stable hash until the read-hooks will be executed and these MUST NOT alter the data in SCB itself. But the read-hooks MAY decide to not store data into an external system.