Migrate foreign-resources files to CDX SBOM format
Open, Needs Triage, Public

Description

Migrating would make it easier for scanners to detect out-of-date and vulnerable dependencies. It also avoids inadvertently reinventing an SBOM format, as seen here.

CDX (CycloneDX) was chosen as the format to use in T361943: Decide on a Software Bill of Materials (SBOM) format for MediaWiki
Overview of the spec: https://cyclonedx.org/specification/overview/
Examples of cdx files: https://cyclonedx.org/use-cases/

According to the spec the name of the file should be either bom.json or *.cdx.json. I prefer foreign-resources.cdx.json to make it clear it's CDX.

List of WMF-deployed repos with foreign resources files:

  • core
  • mediawiki/extensions/3D
  • mediawiki/extensions/Citoid
  • mediawiki/extensions/CodeEditor
  • mediawiki/extensions/CodeMirror
  • mediawiki/extensions/DiscussionTools
  • mediawiki/extensions/EventLogging
  • mediawiki/extensions/Graph
  • mediawiki/extensions/GrowthExperiments
  • mediawiki/extensions/ProofreadPage
  • mediawiki/extensions/TimedMediaHandler
  • mediawiki/extensions/VisualEditor
  • mediawiki/extensions/WikiLambda

Non-WMF repos:

  • samwilson/diagrams-extension

Event Timeline

The entry for Vue 3.3.9 could be represented as follows:

# resources/lib/foreign-resources.yaml
vue:
  license: MIT
  homepage: https://vuejs.org/
  authors: Yuxi (Evan) You
  version: 3.3.9
  type: tar
  src: https://registry.npmjs.org/vue/-/vue-3.3.9.tgz
  integrity: sha512-sy5sLCTR8m6tvUk1/ijri3Yqzgpdsmxgj6n6yl7GXXCXqVbmW2RCXe9atE4cEI6Iv7L89v5f35fZRRr5dChP9w==
  dest:
    package/README.md:
    package/LICENSE:
    package/dist/vue.global.js:
    package/dist/vue.global.prod.js:

Equivalent CycloneDX representation:

{
	"$schema": "http://cyclonedx.org/schema/bom-1.5.schema.json",
	"bomFormat": "CycloneDX",
	"specVersion": "1.5",
	"serialNumber": "urn:uuid:773b643c-d560-4109-8ccf-66a97fccd8bd",
	"version": 1,
	"metadata": {
		"timestamp": "2024-05-03T22:26:33+03:00"
	},
	"components": [
		{
			"name": "vue",
			"version": "3.3.9",
			"scope": "required",
			"hashes": [
				{
					"alg": "SHA-512",
					"content": "b32e6c2c24d1f26eadbd4935fe28eb8b762ace0a5db26c608fa9faca5ec65d7097a956e65b64425def5ab44e1c108e88bfb2fcf6fe5fdf97d9451af974284ff7"
				}
			],
			"licenses": [
				{
					"expression": "MIT"
				}
			],
			"purl": "pkg:npm/vue@3.3.9",
			"type": "framework",
			"bom-ref": "pkg:npm/vue@3.3.9",
			"properties": [
				{
					"name": "mw-dest",
					"value": "package/README.md package/LICENSE package/dist/vue.global.js package/dist/vue.global.prod.js"
				}
			],
			"externalReferences": [
				{
					"type": "distribution",
					"url": "https://registry.npmjs.org/vue/-/vue-3.3.9.tgz"
				}
			]
		}
	]
}
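
The hash in the JSON above is the same digest as the YAML integrity value, re-encoded from base64 to hex. A minimal sketch of that conversion in Python follows; the function name sri_to_cdx_hash and the algorithm table are illustrative assumptions, not code from MediaWiki.

```python
import base64

# Map SRI algorithm prefixes to CycloneDX "alg" names.
SRI_TO_CDX_ALG = {"sha256": "SHA-256", "sha384": "SHA-384", "sha512": "SHA-512"}


def sri_to_cdx_hash(integrity: str) -> dict:
    """Convert an SRI string such as 'sha512-<base64>' to a CycloneDX hash object.

    Hypothetical helper for illustration only.
    """
    alg, _, b64_digest = integrity.partition("-")
    if alg not in SRI_TO_CDX_ALG or not b64_digest:
        raise ValueError(f"unsupported integrity value: {integrity!r}")
    digest = base64.b64decode(b64_digest, validate=True)
    # CycloneDX stores the digest as lowercase hex rather than base64.
    return {"alg": SRI_TO_CDX_ALG[alg], "content": digest.hex()}
```

Applied to the Vue integrity value above, this yields the SHA-512 hex string shown in the hashes entry.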

Multi-file resources such as qunit need a representation of the individual files, maybe using the (verbose?) nested components (https://cyclonedx.org/docs/1.6/json/#components_items_components) or a custom entry in properties:

qunitjs:
  license: MIT
  homepage: https://qunitjs.com
  authors: OpenJS Foundation and other contributors
  version: 2.20.0
  type: multi-file
  # Integrity from link modals at https://code.jquery.com/qunit/
  files:
    qunit.js:
      src: https://code.jquery.com/qunit/qunit-2.20.0.js
      integrity: sha256-qDPbT8viaD6gCaTrUd7FOE5lU5S9H5U5bbuQMADnrRY=
    qunit.css:
      src: https://code.jquery.com/qunit/qunit-2.20.0.css
      integrity: sha256-NN2oro7r8aYcRYA4fnBbijdLRS7Rc4Pz9ikcmlwWvl4=
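
For illustration, the nested-components option could look roughly like the sketch below, which turns a multi-file entry into one parent component with a "file" sub-component per file. The function name multi_file_component and the choice of the "distribution" reference type are assumptions; this is not the format the task settled on.

```python
import base64


def multi_file_component(name: str, version: str, files: dict) -> dict:
    """Build a CycloneDX component with one nested 'file' component per file.

    Hypothetical helper: `files` mirrors the `files:` mapping of a
    multi-file foreign-resources.yaml entry.
    """
    sub_components = []
    for file_name, info in files.items():
        alg, _, b64_digest = info["integrity"].partition("-")
        sub_components.append({
            "type": "file",
            "name": file_name,
            "hashes": [{
                # "sha256" -> "SHA-256", the spelling CycloneDX expects.
                "alg": alg.upper().replace("SHA", "SHA-"),
                "content": base64.b64decode(b64_digest).hex(),
            }],
            "externalReferences": [{"type": "distribution", "url": info["src"]}],
        })
    return {
        "type": "library",
        "name": name,
        "version": version,
        "components": sub_components,
    }


qunit = multi_file_component("qunitjs", "2.20.0", {
    "qunit.js": {
        "src": "https://code.jquery.com/qunit/qunit-2.20.0.js",
        "integrity": "sha256-qDPbT8viaD6gCaTrUd7FOE5lU5S9H5U5bbuQMADnrRY=",
    },
    "qunit.css": {
        "src": "https://code.jquery.com/qunit/qunit-2.20.0.css",
        "integrity": "sha256-NN2oro7r8aYcRYA4fnBbijdLRS7Rc4Pz9ikcmlwWvl4=",
    },
})
```

This keeps per-file hashes and source URLs, but it has no standard place for the destination path, which would still need a property.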

foreign-resources.yaml contains the following metadata for a resource:

  • name
  • license
  • URL (e.g. website)
  • version
  • credit

CycloneDX contains the following metadata for a resource (among many others):

  • name
  • type (probably always library for the use cases relevant to us)
  • authors/supplier/manufacturer/publisher: various ways to provide structured credits
  • version
  • license
  • URL(s)

These seem mostly mappable in both directions. Author and URL might not be trivial to automate because the CDX version is more structured, but manually it should still be easy.
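
A rough sketch of the YAML-to-CDX direction of that mapping, in Python. The function name resource_to_component is hypothetical, the credit is copied verbatim into the free-text author field (a CycloneDX 1.5 field), and the homepage becomes a "website" external reference; a real converter would probably want something more structured.

```python
def resource_to_component(name: str, resource: dict) -> dict:
    """Map the per-resource metadata of foreign-resources.yaml to a CDX component.

    Hypothetical helper for illustration; only handles the fields listed above.
    """
    component = {
        "type": "library",
        "name": name,
        "version": str(resource["version"]),
        "licenses": [{"expression": resource["license"]}],
    }
    if "authors" in resource:
        # Loose translation: the credit line is kept as one unstructured string.
        component["author"] = resource["authors"]
    if "homepage" in resource:
        component["externalReferences"] = [
            {"type": "website", "url": resource["homepage"]}
        ]
    return component


vue = resource_to_component("vue", {
    "license": "MIT",
    "homepage": "https://vuejs.org/",
    "authors": "Yuxi (Evan) You",
    "version": "3.3.9",
})
```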

foreign-resources.yaml contains the following information for each file:

  • the URL to download from (or the URL of a tarball and a path within, possibly as a path pattern with a * for variable parts like version numbers)
  • the destination within the MediaWiki repository
  • a hash

CycloneDX contains the following information:

  • a single top-level hash for the entire library

CycloneDX has an extension mechanism, but it's very limited: it only allows adding string key-value pairs to a component. So I don't think there's a way to capture the foreign-resources.yaml data in a CDX file. (SPDX actually seems a little closer; it has per-file hashes at least.)

So I think we need to either keep both foreign-resources.yaml and a CDX file, or keep the foreign-resources.yaml file, be a little loose about the author and URL translation, and generate the CDX data on the fly.

One option to consider: we could store only a hash of the library as a whole, not of each file. That shouldn't be too hard.

Change #1027190 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] foreign-resources: Add CycloneDX export support

https://gerrit.wikimedia.org/r/1027190

Tested on core: P61866
It passes JSON Schema validation at least.

I need to ask a stupid question: aren't we making code more vulnerable to possible attacks by listing all dependencies we have, including with versions, as a web resource?

Most of this information is already possible to extract by checking our git repo, but allowing any bot to retrieve current information by fetching a URL resource seems a bit scary.

> Tested on core: P61866
> It passes JSON Schema validation at least.

I generally like this, just it would be great to have PURL too. If possible.

> I need to ask a stupid question: aren't we making code more vulnerable to possible attacks by listing all dependencies we have, including with versions, as a web resource?
>
> Most of this information is already possible to extract by checking our git repo, but allowing any bot to retrieve current information by fetching a URL resource seems a bit scary.

Security through obscurity (STO) is highly discouraged in cybersecurity: https://en.wikipedia.org/wiki/Kerckhoffs's_principle

> I generally like this, just it would be great to have PURL too. If possible.

We'd either have to add a new field to foreign-resources.yaml then, or try to parse file URLs and figure out what kind of package manager they correlate with (probably not worth it).
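
To give a sense of what the URL-parsing option would involve, here is a minimal sketch that handles only npm registry tarball URLs and gives up on everything else. The function name purl_from_src and the regular expression are assumptions for illustration; every other package source would need its own rule, which is part of why this is probably not worth it.

```python
import re
from typing import Optional
from urllib.parse import quote, urlparse

# Matches /<name>/-/<name>-<version>.tgz, with an optional @scope/ prefix.
_NPM_TARBALL = re.compile(
    r"^/(?:(?P<scope>@[^/]+)/)?(?P<name>[^/]+)/-/(?P=name)-(?P<version>.+)\.tgz$"
)


def purl_from_src(src: str) -> Optional[str]:
    """Guess a package URL (purl) from a download URL, or return None.

    Hypothetical helper: only recognises registry.npmjs.org tarballs.
    """
    parsed = urlparse(src)
    if parsed.hostname != "registry.npmjs.org":
        return None
    match = _NPM_TARBALL.match(parsed.path)
    if match is None:
        return None
    # The purl spec requires the "@" of an npm scope to be percent-encoded.
    namespace = quote(match["scope"], safe="") + "/" if match["scope"] else ""
    return f"pkg:npm/{namespace}{match['name']}@{match['version']}"
```

For the Vue tarball URL used earlier in this task, this returns pkg:npm/vue@3.3.9; for the code.jquery.com URLs of qunit it returns None.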

> Most of this information is already possible to extract by checking our git repo, but allowing any bot to retrieve current information by fetching a URL resource seems a bit scary.

Eh, I think it's pretty trivial for most would-be attackers to pull any of the lockfiles from public wikimedia repos, and that's all they would ever really need.

> STO is highly discouraged in cybersecurity: https://en.wikipedia.org/wiki/Kerckhoffs's_principle

STO can be a marginally helpful tool in a limited number of scenarios. But no, it should never be the predominant or singular mitigation.

Change #1027588 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] foreign-resources: Add purl field

https://gerrit.wikimedia.org/r/1027588

Change #1027190 merged by jenkins-bot:

[mediawiki/core@master] foreign-resources: Add CycloneDX export support

https://gerrit.wikimedia.org/r/1027190

Change #1027588 merged by jenkins-bot:

[mediawiki/core@master] foreign-resources: Add purl field

https://gerrit.wikimedia.org/r/1027588

Change #1034991 had a related patch set uploaded (by Sportzpikachu; author: Sportzpikachu):

[mediawiki/core@master] foreign-resources: Add purl field to vue-demi

https://gerrit.wikimedia.org/r/1034991

Change #1034991 merged by jenkins-bot:

[mediawiki/core@master] foreign-resources: Add purl field to vue-demi

https://gerrit.wikimedia.org/r/1034991