Curation guideline
This document describes how data are curated for the OSSelot project and how contributing works. The curator should be familiar with their preferred scanning tool (ours is Fossology), have a general understanding of copyright law, and in particular know FOSS licensing.
Note: Information that is specific to Fossology is prefixed with the keyword fossy.
Preparation
- Obtain the component in source code form.
- Note the download URL.
- Naming convention:
- Try to follow the project’s naming and version convention, e.g. as given by the release’s git tag.
- If this is not consistent, use only lowercase letters.
- [package name]-[version number], e.g. angular-15.1.0.
- Analyze the component with a license scan tool (e.g. Fossology, Scancode).
- fossy: Fossology default settings for analysis:
- 7. Select optional analysis:
- Upload from file
- Copyright/Email/URL/Author Analysis
- Monk License Analysis, scanning for licenses performing a text comparison
- Nomos License Analysis, scanning for licenses using regular expressions
- Ojo License Analysis, scanning for licenses using SPDX-License-Identifier
- 10. ScanCode Toolkit, scan for
- License
- Copyright
- Scancode default options for analysis: -c: copyrights; -l: licenses; -i: file information; --license-text: include full license text
scancode -cli --license-text --json [package name]-[version number].json [package]
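The naming convention and the Scancode call described above can be sketched in Python as follows. This is a minimal illustration and not part of the OSSelot tooling: the helper names `component_name` and `scancode_command` and the `upstream_is_consistent` flag are hypothetical. Only the options `-cli`, `--license-text`, and `--json` are taken from the command shown above.

```python
from typing import List


def component_name(package: str, version: str, upstream_is_consistent: bool = True) -> str:
    """Build '[package name]-[version number]' (hypothetical helper)."""
    name = f"{package}-{version}"
    # Fall back to lowercase letters only when the upstream naming is inconsistent.
    return name if upstream_is_consistent else name.lower()


def scancode_command(package: str, version: str, source_dir: str) -> List[str]:
    """Return the argument list for the Scancode call described above."""
    report = component_name(package, version) + ".json"
    return ["scancode", "-cli", "--license-text", "--json", report, source_dir]
```

The returned list can be passed to `subprocess.run(..., check=True)` to start the scan, assuming `scancode` is installed and on the `PATH`.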
Data curation
- A licensing expert reviews and analyzes the scanning results.
- Fossology can directly be used to review the results. The Scancode results must be reviewed with an external tool, e.g. Opossum.
- Review is done on file level, i.e. every file in the source code tree for which at least one scanner found a result is analyzed.
- fossy: In Fossology, you can browse through the relevant files by selecting "Go through all files with licenses and no clearing result".
- That means:
- scanner findings are confirmed, or
- scanner findings are corrected.
- If there are no findings for a file, the conclusion is NOASSERTION (for SPDX tag LicenseConcluded).
- fossy: In Fossology, this is given by the clearing decision types "No license known" or "Irrelevant" or "Non-functional".
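The file-level rule above can be sketched as follows. The function name and the input shape (a mapping from file path to the list of scanner findings) are assumptions made for illustration. They do not mirror any Fossology or Scancode data format.

```python
from typing import Dict, List, Tuple

# SPDX value for LicenseConcluded when no license is asserted.
NOASSERTION = "NOASSERTION"


def split_for_review(findings: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, str]]:
    """Separate files that need expert review from files without findings."""
    # Files with at least one scanner finding are reviewed by the licensing expert.
    to_review = sorted(path for path, found in findings.items() if found)
    # Files without any finding get the NOASSERTION conclusion.
    no_findings = {path: NOASSERTION for path, found in findings.items() if not found}
    return to_review, no_findings
```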
LicenseComments
In case a license conclusion is not obvious, the decision is explained.
- This is done with the following heuristic:
The information in the file is: "[Quote licensing information in the source code file]" [Give reason for conclusion] Therefore, [license] is concluded.
- Example 1: No version
The information in the file is: "This file is GPL'd." As no version of the GPL is given, GPL-1.0-or-later is concluded.
- Example 2: URL for license text
The information in the file is: "This file is licensed under License A. You can find the license text at https://www.LicenseTextOfLicenseA.com." The URL contains the license text of License A, therefore License A is concluded. The information was retrieved on [date].
- fossy: In Fossology, the explanations are given in the "Comment" section which maps to the SPDX tag LicenseComments.
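To keep explanations uniform, the template above could be filled by a small helper. The following is a sketch only. The function `license_comment` is hypothetical, and the wording of the reason remains the curator's responsibility.

```python
def license_comment(quote: str, reason: str, license_id: str) -> str:
    """Fill the LicenseComments template (hypothetical helper)."""
    return (
        f'The information in the file is: "{quote}" '
        f"{reason} Therefore, {license_id} is concluded."
    )
```

For the first example above, `license_comment("This file is GPL'd.", "No version of the GPL is given.", "GPL-1.0-or-later")` produces a comment in the same pattern.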
Correcting scanner findings
The following list includes typical cases where scanner findings have to be corrected and how to do so.
Not a license
The scanner concludes a license from an expression in a file that is not actually a license expression at all. In this case, the incorrect license finding is removed.
- fossy: In Fossology, the source of the scanner finding is highlighted when clicking on the number (#1) behind the scanner.
Not the file's license
The scanner concludes a license from a license expression that is part of the file’s content but not the license of the file itself. In this case, the incorrect license finding is removed.
License text
Files that contain only a license text (e.g. COPYING) are concluded by the scanners to be licensed under the respective license. This is usually not correct. Most license texts are not explicitly licensed, so the finding is removed. The GNU licenses contain a license statement for the license text itself which is concluded for these cases (License-of-GNU-licenses).
Imprecise finding
The scanner finding might be imprecise, e.g. with respect to the license version when no version number is given. In this case, the imprecise finding is removed and the specified license and version are concluded. If no version is given, the lowest existing version with the -or-later extension is concluded.
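The rule for missing versions can be expressed as a lookup. The sketch below is illustrative: the table `LOWEST_VERSION` and the function `conclude_unversioned` are hypothetical names. The GPL entry follows from the GPL-1.0-or-later example in this guideline, while the LGPL entry is an assumption based on the SPDX license list and should be verified before use.

```python
# Lowest existing version per license family.
# GPL is confirmed by this guideline; LGPL is an assumption taken from the SPDX list.
LOWEST_VERSION = {"GPL": "1.0", "LGPL": "2.0"}


def conclude_unversioned(family: str) -> str:
    """Return the conclusion for a license statement that names no version."""
    # Raises KeyError for families that are not in the table, so that
    # unknown cases are handled manually instead of being guessed.
    return f"{family}-{LOWEST_VERSION[family]}-or-later"
```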
Dual licensing
A file might offer a choice of two or more licenses under which it can be used. If the context requires choosing one specific license, this choice must be noted. However, all applicable licenses must be concluded. Dual-license cases also require additional post-processing, see section "Post-processing" below.
- fossy: In Fossology, add the following text to the "Acknowledgement" section of the "Dual-license" finding to note the license choice, if applicable:
To the extent files may be licensed under License A or License B, in this context License B has been chosen. This shall not restrict the freedom of other users to choose either License A or License B. For convenience, all license texts are provided.
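The note on the license choice can be generated from the offered licenses and the chosen one. This is a minimal sketch. The function `dual_license_note` is hypothetical and simply fills the text given above.

```python
from typing import Sequence


def dual_license_note(offered: Sequence[str], chosen: str) -> str:
    """Build the license-choice note for a dual-licensed file."""
    if chosen not in offered:
        raise ValueError(f"{chosen!r} is not one of the offered licenses")
    options = " or ".join(offered)
    return (
        f"To the extent files may be licensed under {options}, "
        f"in this context {chosen} has been chosen. "
        f"This shall not restrict the freedom of other users to choose either {options}. "
        "For convenience, all license texts are provided."
    )
```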
License exceptions
In particular for the GNU licenses, there are a number of license exceptions.
- fossy: Fossology notes the license and the exception as separate findings. This is corrected to one finding using the SPDX license expression [License] WITH [exception], e.g. GPL-2.0-or-later WITH GCC-exception-2.0.
- fossy: If the Fossology license database does not yet contain these licenses, they have to be added.
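Combining a license and an exception into one finding can be sketched as below. The function `merge_exceptions` and the `known_exceptions` parameter are hypothetical. Which identifiers count as exceptions has to come from the curator or from the SPDX exceptions list.

```python
from typing import Iterable, List


def merge_exceptions(findings: Iterable[str], known_exceptions: Iterable[str]) -> List[str]:
    """Merge one license and one exception into '[License] WITH [exception]'."""
    known = set(known_exceptions)
    found = list(findings)
    licenses = [f for f in found if f not in known]
    exceptions = [f for f in found if f in known]
    if len(licenses) == 1 and len(exceptions) == 1:
        return [f"{licenses[0]} WITH {exceptions[0]}"]
    # Anything else is ambiguous and is left unchanged for manual review.
    return found
```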
Generic license texts
For some licenses, especially the BSD-type licenses, many variants of the license texts exist. The scanners often provide only the generic license texts. If an individual text differs from the generic text, the individual license text is provided.
- fossy: In Fossology, click the match percentage to see the differences.
- fossy: The individual text is copied from the file into the "License" section of Fossology.
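Outside of Fossology, the comparison between a generic and an individual license text can be approximated with Python's standard `difflib` module. This is a sketch under the assumption that both texts are available as strings. The function name `text_differences` and the whitespace normalization are illustrative choices, not part of the guideline.

```python
import difflib
from typing import List


def text_differences(generic: str, individual: str) -> List[str]:
    """Return unified-diff lines between a generic and an individual license text."""
    # Ignore blank lines and leading/trailing whitespace, which rarely matter legally.
    a = [line.strip() for line in generic.splitlines() if line.strip()]
    b = [line.strip() for line in individual.splitlines() if line.strip()]
    return list(difflib.unified_diff(a, b, fromfile="generic", tofile="individual", lineterm=""))
```

An empty result means the texts match after normalization. Any other result suggests that the individual text should be provided instead of the generic one.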
External references
Sometimes the file does not contain the name or text of a license but references an external resource such as a COPYRIGHT file in the root directory or a URL. In these cases, the external reference is checked and the detected license is concluded. The process is documented as a LicenseComment. In case of a URL, the date of access is noted.
(Partially) global license assignment
Sometimes there is a Readme file or similar that contains a statement assigning a license to several files within the source tree (e.g. all files in a specific directory). As such information is often outdated or does not account for individual licensing of files, it is not used to assign a license to a file here.
Acknowledgment
If a license has an acknowledgment requirement, the respective acknowledgment text is given. In particular for CC-BY licenses, the acknowledgment must contain the following information (if available): name of the creator, copyright notice, license notice, disclaimer, link to the material.
- fossy: In Fossology, the acknowledgment text is given in the "Acknowledgement" section.
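The acknowledgment for CC-BY licenses can be assembled from whichever of the listed items are available. The sketch below is illustrative: the function `cc_by_acknowledgment`, its parameter names, and the one-item-per-line layout are assumptions, as the guideline does not prescribe a format.

```python
from typing import Optional


def cc_by_acknowledgment(
    creator: Optional[str] = None,
    copyright_notice: Optional[str] = None,
    license_notice: Optional[str] = None,
    disclaimer: Optional[str] = None,
    link: Optional[str] = None,
) -> str:
    """Join the available acknowledgment items, one per line, in the listed order."""
    parts = [creator, copyright_notice, license_notice, disclaimer, link]
    # Items that are not available (None or empty) are skipped.
    return "\n".join(part for part in parts if part)
```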