Include unlicensed files in scanner results #9435

Open
kikofernandez opened this issue Nov 15, 2024 · 3 comments
Labels
configuration (About configuration topics), enhancement (Issues that are considered to be enhancements), scanner (About the scanner tool)

Comments


kikofernandez commented Nov 15, 2024

What is the existing functionality and how should it be enhanced?

Currently, the scanner does not include files without licenses.

Problems

  • Developers may not notice those files, unless they know that this is the default behaviour.
  • Developers cannot apply curation rules to those files, because curation rules are only applied to detected license findings.

Improvement
To make sure that all files are considered in the scanning phase, and that those can potentially be curated, I propose:

  • Add an option (include-unlicensed: boolean) to the scan phase that includes in the scan_results.summary.licenses object all files that were found NOT to have a license. Since this happens during the scan phase, these results should be recorded in the scan-result.json file.
    • The benefit is that ORT would then allow curating files with license NONE (or whatever the default for an unknown license is).

What is the use-case for your enhancement?

Source SBOMs may need to include all files in a repo. At the moment, SBOM generation also includes the files without a license, but there is no way to curate those that should have a specific license. With the flag include-unlicensed: true, the scanner would include unlicensed files in the ORT scan result and give developers the possibility to curate those files, if needed.

As an example, a project with a single license at the top level could enable this to include all files with the NONE license, and then apply a curation to all files that should have the MIT license.

curations:
    license_findings:
      - path: "**/*.exs"
        reason: "INCORRECT"
        comment: "Apply license to all unknown files"
        detected_license: "NONE"
        concluded_license: "MIT"

      - path: "**/*.ex"
        reason: "INCORRECT"
        comment: "Apply license to all unknown files"
        detected_license: "NONE"
        concluded_license: "MIT"

I believe this is quite a common case. Examples include the Elixir programming language (https://github.com/elixir-lang/elixir), Gleam (https://github.com/gleam-lang/gleam), the Django web framework (which has files without a license, so no license is applied AFAIK), and the Rails web framework, where each folder contains the license that applies.

Alternatives you have considered

I have a script that parses the ORT scanner result for the files with licenses and for all files with SHA1. It takes the set difference and adds the missing files to the corresponding scanner field with license NONE. This works, but I am not sure how maintainable it will be in the future. It means I need to run the ORT analysis and scanner, then run a custom script, then run the evaluator to get results and apply curations.
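
The set-difference workaround described above could look roughly like the following. This is a minimal sketch in Python, not the actual script: the input shapes (a list of all scanned file paths, and license findings as dicts with `path` and `license` keys) and the function name `add_unlicensed_findings` are assumptions made for illustration and do not mirror ORT's real scan result schema.

```python
def add_unlicensed_findings(all_files, license_findings, placeholder="NONE"):
    """Return the findings plus one placeholder finding per unlicensed file.

    Hypothetical shapes: `all_files` is an iterable of paths and
    `license_findings` is a list of {"path": ..., "license": ...} dicts.
    """
    licensed = {finding["path"] for finding in license_findings}
    # Set difference: files that were scanned but have no license finding.
    unlicensed = sorted(set(all_files) - licensed)
    extra = [{"path": path, "license": placeholder} for path in unlicensed]
    return list(license_findings) + extra


all_files = ["lib/a.ex", "lib/b.ex", "LICENSE"]
findings = [{"path": "LICENSE", "license": "MIT"}]
result = add_unlicensed_findings(all_files, findings)
```

Here `result` keeps the original MIT finding and gains one NONE entry each for `lib/a.ex` and `lib/b.ex`, which is the data a NONE curation would then need to match.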

Additional context

--

kikofernandez added the "enhancement" and "to triage" labels Nov 15, 2024
sschuberth added the "reporter" label and removed the "to triage" label Nov 15, 2024
@fviernau
Member

The list of all files is already included, but not under scan_results.summary.licenses, because that would cause redundancy when multiple scanners are run. The OrtResult data model contains FileLists under ScannerRun.files. These, however, are currently not connected to the license finding curation mechanism.

@sschuberth
Member

> These, however, are currently not connected to the license finding curation mechanism.

Couldn't the fix be as simple as allowing the detected_license: "NONE" matcher of license finding curations to also match files that are absent from the license findings (as these can be regarded as having an implicit license of NONE) but present in the file lists?
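
To make the suggested semantics concrete, here is a minimal sketch in Python. It is not ORT code: the function name `apply_curations`, the dict-based data shapes, and the use of `fnmatchcase` as a stand-in for ORT's own path matching are all assumptions made for illustration. In particular, `fnmatchcase` lets `*` match across `/`, which may differ from how ORT evaluates `**`.

```python
from fnmatch import fnmatchcase


def apply_curations(file_list, findings, curations):
    """Apply license finding curations, treating missing findings as NONE.

    Hypothetical shapes: `file_list` is a list of all paths, `findings` maps
    path -> detected license, and each curation is a dict with the keys
    "path", "detected_license" and "concluded_license".
    """
    result = {}
    for path in file_list:
        # A file absent from the findings has an implicit license of NONE.
        detected = findings.get(path, "NONE")
        concluded = detected
        for curation in curations:
            if (fnmatchcase(path, curation["path"])
                    and curation["detected_license"] == detected):
                concluded = curation["concluded_license"]
                break
        # Files that stay NONE are left out, as they are today.
        if concluded != "NONE":
            result[path] = concluded
    return result


files = ["lib/a.ex", "lib/b.exs", "LICENSE"]
findings = {"LICENSE": "Apache-2.0"}
curations = [
    {"path": "**/*.ex", "detected_license": "NONE", "concluded_license": "MIT"},
]
curated = apply_curations(files, findings, curations)
unchanged = apply_curations(files, findings, [])
```

With the NONE curation, `lib/a.ex` is concluded as MIT while `lib/b.exs` stays absent because no curation matches it. Without any NONE curation, `unchanged` equals the original findings, so the behaviour would only change for users who add such a curation.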

Author

kikofernandez commented Nov 15, 2024

Yes, this is a simpler solution and does the job. I proposed adding a configuration option only to avoid breaking the existing semantics.
With no curation matching NONE, the existing behaviour is preserved. Adding a curation that matches NONE makes it opt-in :)

Essentially, @sschuberth yes, that is the same semantics I proposed but cleaner! :)

sschuberth added the "scanner" and "configuration" labels and removed the "reporter" label Nov 15, 2024