ClearlyDefined at SAP: enhancing Open Source license compliance through Open Source data

By Brian Duran, Qing Tomlinson and contributors (SAP)

Like other organizations invested in Open Source software, SAP takes great care in understanding the composition and potential risks of Open Source projects that it might contribute to and/or use.

SAP began engaging with ClearlyDefined in 2018, during the project’s early stages, and launched an internal pilot program to evaluate its capabilities soon thereafter. ClearlyDefined is the OSI Open Source project that centralizes and curates license data for Open Source software.

Figure 1 – Timeline of SAP’s involvement with ClearlyDefined

ClearlyDefined has since become an integral part of SAP’s internal Free and Open Source Software (FOSS) compliance process, aiding in the acquisition of important package metadata required for risk evaluation.

Below are a few key benefits we’ve seen since incorporating ClearlyDefined into our FOSS compliance process:

  • Less time scanning, more time focusing on compliance: The data is readily available through open APIs, which makes it straightforward to integrate into internal risk modeling systems.
  • Better data quality: A package may already have been reviewed and curated by another member of the community. Multiple pairs of eyes are better than one!
  • The community: There are inherent benefits to working with other organizations and individuals toward a shared goal of improving Open Source compliance, and this is even easier when there are common platforms and systems established to facilitate this sort of crowdsourcing effort.
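
To make the first point concrete, here is a minimal sketch of consuming a definition through the open API. The base URL, the `/definitions/...` path layout and the field names (`licensed.declared`, `described.releaseDate`, `described.sourceLocation.url`) are assumptions based on the public ClearlyDefined API and should be checked against its current documentation. The helper names and the sample response are hypothetical and are not taken from SAP's tooling.

```python
from urllib.parse import quote

# Assumed public endpoint; verify against the current ClearlyDefined API docs.
API_BASE = "https://api.clearlydefined.io"


def definition_url(coordinate: str) -> str:
    """Build the lookup URL for a coordinate such as 'npm/npmjs/-/name/1.0.0'."""
    path = "/".join(quote(part, safe="") for part in coordinate.split("/"))
    return f"{API_BASE}/definitions/{path}"


def summarize(definition: dict) -> dict:
    """Extract fields a risk model might need; missing fields become None."""
    licensed = definition.get("licensed", {})
    described = definition.get("described", {})
    return {
        "declared_license": licensed.get("declared"),
        "release_date": described.get("releaseDate"),
        "source_location": described.get("sourceLocation", {}).get("url"),
    }


# Hand-written sample standing in for a real API response.
sample = {
    "licensed": {"declared": "MIT"},
    "described": {"releaseDate": "2020-01-01"},
}
print(definition_url("npm/npmjs/-/example-package/1.0.0"))
print(summarize(sample))
```

In a real integration the dictionary passed to `summarize` would come from an HTTP GET on the URL built by `definition_url`. The request is left out here so the sketch runs without network access.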

From pilot to production: harvesting at scale

ClearlyDefined supports an impressive and growing number of harvest sources.

One initial challenge we encountered was how exactly to send the large volumes of FOSS requested by our developers to ClearlyDefined and get the results back. We knew that everything done downstream with the resulting data would depend heavily on our ability to do this task reliably and at scale. 

We began with a small pilot group within our Open Source compliance organization that would submit FOSS to ClearlyDefined via the harvest API and record some basic information about each submission (e.g., coordinate and status). The task was entirely manual at first, but we eventually developed automated tooling for it.
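
A submit-and-record loop of this kind could be sketched roughly as follows. The request body shape (a JSON list of `tool` and `coordinates` entries) is an assumption about the harvest API and should be verified against its documentation. `HarvestRecord`, `build_harvest_payload` and `submit` are hypothetical names for illustration only, not SAP's actual tooling.

```python
import json
from dataclasses import dataclass


@dataclass
class HarvestRecord:
    """Basic information kept per submission (field names are hypothetical)."""
    coordinate: str
    status: str = "submitted"


def build_harvest_payload(coordinates: list[str], tool: str = "package") -> str:
    """Serialize a batch of coordinates into the assumed harvest request body."""
    return json.dumps([{"tool": tool, "coordinates": c} for c in coordinates])


def submit(coordinates: list[str], send) -> list[HarvestRecord]:
    """Pass one batch to `send` (any callable that POSTs the body) and record it."""
    send(build_harvest_payload(coordinates))
    return [HarvestRecord(c) for c in coordinates]


# Dry run: capture the body in a list instead of POSTing it.
sent = []
records = submit(
    [
        "npm/npmjs/-/example-package/1.0.0",
        "maven/mavencentral/org.example/demo/2.3.1",
    ],
    sent.append,
)
print(sent[0])
print(records)
```

Passing the transport in as `send` keeps the batching and bookkeeping testable without touching the shared service. A production version would swap in a function that performs the actual HTTP POST.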

This automated tooling was simple at first, consisting of a few locally run scripts. But by 2020, it had evolved into a fully integrated service working alongside our other internal compliance systems. 

Figure 2 – High-level architecture of SAP’s internal solution for automatic harvesting

Some of the earliest repositories to be supported by ClearlyDefined as a harvesting source include GitHub, NPM and Maven. FOSS in lesser-known or privately maintained locations may not be harvestable through ClearlyDefined. For such cases, we devised a separate process outside of ClearlyDefined.

Review: a consumption use-case

ClearlyDefined uses a shared harvest queue – meaning your request goes in with everyone else’s. 

While there may be feasible ways to internalize such workloads (e.g., by running your own harvesting crawler instance), our initial business requirements did not necessitate this level of complexity.

Knowing when a package moves through the queue (e.g., goes from partially to fully harvested) is crucial so that you know when it’s appropriate to consume the data. Mechanisms exist for this purpose, including a basic change notification service based on Azure Blob Storage.
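
As a simpler stand-in for the change notification service, the sketch below polls a definition and classifies it as pending, partially harvested or fully harvested. It is an illustration under stated assumptions: that a definition lists the tools that have processed it under `described.tools` in a `name/version` form, and that a fixed set of expected tools is a reasonable proxy for "fully harvested". Neither the tool set nor the function names come from ClearlyDefined or SAP.

```python
import time

# Tools expected before treating a harvest as complete.
# This set is an assumption for illustration, not a ClearlyDefined rule.
EXPECTED_TOOLS = frozenset({"clearlydefined", "licensee", "scancode"})


def harvest_state(definition: dict, expected=EXPECTED_TOOLS) -> str:
    """Classify a definition as 'pending', 'partial' or 'full'."""
    tools = definition.get("described", {}).get("tools", [])
    # Tool entries are assumed to look like 'scancode/1.0.0'.
    seen = {tool.split("/")[0] for tool in tools}
    if not seen:
        return "pending"
    return "full" if expected <= seen else "partial"


def wait_until_full(fetch, coordinate, attempts=5, delay_seconds=60, sleep=time.sleep):
    """Poll `fetch(coordinate)` until the harvest looks complete or attempts run out."""
    for attempt in range(attempts):
        if harvest_state(fetch(coordinate)) == "full":
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Simulated responses: nothing yet, then one tool, then all expected tools.
responses = iter([
    {},
    {"described": {"tools": ["clearlydefined/1.0.0"]}},
    {"described": {"tools": ["clearlydefined/1.0.0", "licensee/1.0.0", "scancode/1.0.0"]}},
])
ready = wait_until_full(
    lambda coordinate: next(responses),
    "npm/npmjs/-/example-package/1.0.0",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(ready)
```

Polling adds load to a shared service, so a notification-driven approach is generally preferable at scale. The classification function applies either way.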

Once a package has been harvested, its location in ClearlyDefined (called a “coordinate”) is recorded into our internal FOSS database for future reference and put into a pipeline for our compliance specialists to review.  
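
A coordinate can be treated as a small structured value. The sketch below assumes the five-part `type/provider/namespace/name/revision` form, with `-` standing in for an absent namespace. The `Coordinate` class and the dictionary used as an internal database are hypothetical illustrations, not SAP's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    type: str       # e.g. 'npm', 'maven', 'git'
    provider: str   # e.g. 'npmjs', 'mavencentral', 'github'
    namespace: str  # '-' when the package has no namespace
    name: str
    revision: str

    @classmethod
    def parse(cls, text: str) -> "Coordinate":
        """Split 'type/provider/namespace/name/revision' into its five parts."""
        parts = text.strip("/").split("/")
        if len(parts) != 5 or not all(parts):
            raise ValueError(
                f"expected type/provider/namespace/name/revision, got {text!r}"
            )
        return cls(*parts)

    def __str__(self) -> str:
        return "/".join(
            (self.type, self.provider, self.namespace, self.name, self.revision)
        )


# Record a harvested package for later review, keyed by its coordinate string.
foss_db = {}
coord = Coordinate.parse("maven/mavencentral/org.example/demo/2.3.1")
foss_db[str(coord)] = {"review_status": "queued"}
print(coord.namespace, coord.name, coord.revision)
```

Validating the shape at parse time catches malformed references before they reach the review pipeline.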

The compliance specialist will inspect the package in ClearlyDefined, leveraging available data points (e.g., declared and discovered licenses, in conjunction with others) to perform a risk assessment.
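
A first-pass triage over declared and discovered licenses might look like the sketch below. The policy is entirely hypothetical and much simpler than a specialist's assessment. The field paths (`licensed.declared`, `licensed.facets.core.discovered.expressions`) and the `NOASSERTION` and `OTHER` placeholder values are assumptions about the definition format. License expressions are compared as plain strings rather than parsed as SPDX expressions.

```python
def triage(definition: dict, approved: set[str]) -> str:
    """Return 'ok' or a short reason why a human should look at the package."""
    licensed = definition.get("licensed", {})
    declared = licensed.get("declared")
    discovered = (
        licensed.get("facets", {})
        .get("core", {})
        .get("discovered", {})
        .get("expressions", [])
    )
    if not declared or declared in {"NOASSERTION", "OTHER"}:
        return "review: no usable declared license"
    if declared not in approved:
        return "review: declared license not pre-approved"
    # Anything found in the files that is not on the approved list is flagged.
    unexpected = sorted(set(discovered) - approved)
    if unexpected:
        return "review: discovered " + ", ".join(unexpected)
    return "ok"


approved = {"MIT", "Apache-2.0"}
clean = {
    "licensed": {
        "declared": "MIT",
        "facets": {"core": {"discovered": {"expressions": ["MIT"]}}},
    }
}
mixed = {
    "licensed": {
        "declared": "MIT",
        "facets": {"core": {"discovered": {"expressions": ["MIT", "GPL-3.0-only"]}}},
    }
}
print(triage(clean, approved))
print(triage(mixed, approved))
print(triage({}, approved))
```

A filter like this only sorts packages into "probably routine" and "needs a closer look". The risk assessment itself remains with the reviewer.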

If there’s an appropriate opportunity to rectify missing or incorrect information, our compliance specialists will attempt to use ClearlyDefined’s unique curation mechanism to submit a pull request to update the data.

SAP also contributes heavily to the development and technology side of the project. In many cases, what we learn through our own efforts internally can also partly drive our public contributions to the project, and we regularly collaborate with others in the community to provide development and technical support.

For example, a UI bug or data quality issue reported internally by our compliance specialists may in some cases necessitate changes or improvements to ClearlyDefined’s code base (e.g., bug fixes and optimizations).

We’re always looking for new and innovative ways to use the data ClearlyDefined provides, including to solve emerging business challenges, and for meaningful ways to contribute back to the project. The ideal cases will generally be the ones where synergies exist that can lead to innovation on both sides.

SAP’s Open Source compliance operations have benefited greatly from ClearlyDefined, realizing an estimated* 30-50% reduction in review turnaround time when compared to the former process. We intend to continue contributing as part of the project’s community and help improve ClearlyDefined and the quality of its data.

*Based on the average processing time per FOSS component before adopting ClearlyDefined as an analysis tool and after, including time and effort saved through automation and standardization.

Authors

  • Brian Duran

    Brian leads the implementation strategy for adoption of ClearlyDefined within SAP’s open source compliance teams. He has a combined 12 years of experience in open-source software compliance and data quality management.

  • Qing Tomlinson

    Qing is a software developer from SAP. She has been contributing to the ClearlyDefined Project since 2021.

Disclaimer: All published articles represent the views of the authors; they don’t represent the official positions of the Open Source Initiative, even if the authors are OSI staff members or board directors.

OpenSource.net is supported by the Open Source Initiative, the non-profit organization that defines Open Source.
