Channel: Joab Jackson, Author at The New Stack

Python Exposes Phantom Dependencies With SBOM Screening

In June, Python programmer Seth Michael Larson was hired by the Python Software Foundation to act as security developer-in-residence. But he was already working on one of Python’s thorniest security problems: hidden software dependencies, or phantom dependencies.

The term “phantom dependencies” was coined by Endor Labs in 2023 to describe code embedded in an application that was not declared in any sort of manifest file, thus making it invisible to vulnerability scanners.

Every open source software package should have a manifest of some sort, listing all the libraries and third-party packages used in that application. It could be a simple text file with the name “requirements.txt” or “setup.py” or some such.

Then, a good scanner can compile a list of vulnerabilities in every software package, matching the list of dependencies against a list of known vulnerabilities.
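
The matching step can be sketched roughly as follows, assuming a requirements.txt with pinned versions and a hand-written advisory table. The ADVISORIES data, the package names and both function names are hypothetical and exist only for illustration; real scanners query maintained vulnerability databases.

```python
# Sketch: match pinned requirements against a table of known-vulnerable versions.
# ADVISORIES is made-up illustration data, not a real vulnerability feed.

ADVISORIES = {
    ("examplelib", "1.0.0"): ["EXAMPLE-2025-0001"],
}


def parse_requirements(text):
    """Return (name, version) pairs for lines of the form 'name==version'."""
    pins = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned requirements
        name, version = line.split("==", 1)
        pins.append((name.strip().lower(), version.strip()))
    return pins


def find_vulnerable(pins, advisories=ADVISORIES):
    """Return {(name, version): [advisory ids]} for pins with known advisories."""
    return {pin: advisories[pin] for pin in pins if pin in advisories}


requirements = "examplelib==1.0.0\notherlib==2.3.1\n"
print(find_vulnerable(parse_requirements(requirements)))
```

The lookup can only flag what the manifest lists. A library that never appears in the requirements text never reaches the advisory table.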

But these files often list only some of the dependencies, especially in Python.

In a talk at PyCon 2025 earlier this year, Larson looked at a sample Python imaging library called Pillow. Running the pip freeze command lists only the Pillow package itself. (Larson gave a similar talk at The Linux Foundation’s Open Source Conference in June.)

Yet, Pillow is packed with dependencies. A quick look through Python’s site-packages folder for an installed copy of Pillow reveals a long list of libraries (libfreetype, libtiff, libjpeg…)
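
That quick look can be approximated in code. The sketch below uses the standard library's importlib.metadata to list the files recorded for an installed distribution and keeps those whose names look like shared libraries. The suffix heuristic and the function names are assumptions made for illustration. The result also includes the package's own compiled extension modules, not only bundled third-party libraries.

```python
# Sketch: list files in an installed distribution that look like shared libraries.
from importlib import metadata
from pathlib import PurePosixPath

SHARED_LIB_SUFFIXES = (".so", ".dylib", ".dll")


def looks_like_shared_library(path):
    """True for names such as 'libjpeg.so', 'libtiff.so.6' or 'zlib.dll'."""
    name = PurePosixPath(str(path)).name.lower()
    return name.endswith(SHARED_LIB_SUFFIXES) or ".so." in name


def bundled_libraries(distribution_name):
    """Return shared-library-like files recorded for an installed distribution."""
    try:
        files = metadata.files(distribution_name) or []
    except metadata.PackageNotFoundError:
        return []  # the distribution is not installed in this environment
    return sorted(str(f) for f in files if looks_like_shared_library(f))


for path in bundled_libraries("pillow"):
    print(path)
```

If Pillow is not installed, the loop prints nothing. None of these file names say which upstream version of a library was bundled, which is the gap the article describes.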

The trouble is that manifests for Python programs today cover only the Python libraries in the program’s package. A requirements.txt file lists only Python packages, not those written in other languages.

Scanners look for metadata. They don’t scan all the files in a directory — that would be too much work. But they need a complete list.

What Are Phantom Dependencies?

Almost all phantom dependencies exist for perfectly logical reasons.

One cause is a common practice called bundling, or vendoring. Bundling means the software has been shipped with lots of dependencies, but with little or no documentation about their existence. Many Python software packages have not one but two dependency trees, one for Python packages and one for libraries written in other languages.

Another practice that leads to stealth dependencies is static linking, where all the libraries are bundled into one binary.

“All of that code is being put into one application, and sometimes it’s really hard to like tease out what the exact versions of what’s actually getting used,” Larson said.

Bundled libraries are necessary for a number of reasons. One is bootstrapping. A program like pip, Python’s package installer, can’t assume much in the way of supporting software on the target machine.

“You don’t have to assume as much about the environment if you just take all your dependencies and put it into your project,” he said.

A Python virtual environment can hold only one version of a given package. So bundling also becomes necessary when you want to make sure a specific version of a package is available.

Python’s wheel packaging format is another big source of phantom dependencies.

A “wheel” in this context is a ZIP-format archive providing everything needed to install a Python application across many different types of Linux distributions, without compiling the code itself.

It assumes the user only has a minimum number of libraries already installed, but by doing so creates a lot of duplicate, and undocumented, code on the target machine.

Finally, Python itself is a culprit in the spawning of hidden dependencies. It is, after all, a glue language, one that borrows a lot of libraries from other languages such as JavaScript, Rust and C/C++.

“There is so much non-Python code inside of Python packages,” Larson said.

But until recently, Python packaging software had no way to represent non-Python resources. What the community really needed was a language-agnostic manifest that would record all the metadata for scanners, not just the Python code.

Enter the SBOM.

SBOMs

SBOM stands for Software Bill of Materials. It is an inventory of all the components, dependencies, and metadata of a software application. It came about as a way to prevent software supply chain attacks, though SBOMs will also be quite useful in meeting the European Union’s forthcoming Cyber Resilience Act (CRA). The Linux Foundation’s SPDX (Software Package Data Exchange) has become a de facto standard in this space.

The best thing about SBOMs is that they are technology-agnostic. An SBOM scanner such as Syft gathers information about every package, not just Python packages — as long as it is documented in a machine-readable file somewhere.

“We can use software bill-of-materials standards to record the metadata we need about any software dependency, not just Python dependencies,” he said.

Larson had previously done some work on an SBOM for CPython, the default Python implementation, which has a huge number of dependencies.

This led him to create PEP 770, an SBOM directory for Python packages, which was accepted in April by the Python Steering Council as a core Python component.

He also worked on adding PEP 770 support to the wheel packaging tool auditwheel, which can start using this directory immediately (.dist-info/sboms). Scanners can then use the info in this directory to build their own manifests and reports.
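
As a rough sketch of how a tool might find those files, the snippet below opens a wheel (a ZIP archive) with the standard library's zipfile module and lists the members stored under a .dist-info/sboms/ directory. The function names, the example package and the example.cdx.json file name are hypothetical. The demo wheel is built in memory so the snippet runs without any real package.

```python
# Sketch: find SBOM documents stored under *.dist-info/sboms/ in a wheel archive.
import io
import zipfile


def is_sbom_member(name):
    """True if an archive member sits inside '<something>.dist-info/sboms/'."""
    parts = name.split("/")
    return (
        len(parts) >= 3
        and parts[0].endswith(".dist-info")
        and parts[1] == "sboms"
        and parts[-1] != ""  # ignore bare directory entries
    )


def sbom_members(wheel):
    """Return the SBOM member names in a wheel given as a path or file object."""
    with zipfile.ZipFile(wheel) as archive:
        return sorted(n for n in archive.namelist() if is_sbom_member(n))


# Build a tiny stand-in wheel in memory (hypothetical contents).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("example/__init__.py", "")
    archive.writestr("example-1.0.dist-info/METADATA", "Name: example\n")
    archive.writestr("example-1.0.dist-info/sboms/example.cdx.json", "{}")

print(sbom_members(demo))
```

The same function accepts a path to a wheel file on disk in place of the in-memory archive.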

This pip package of Pillow was created by auditwheel with SBOM support.

The actual “bill of materials” file includes info such as the name of the app, the version, what type of app it is (a library, etc), all packaged in the JSON format.
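
A small sketch shows how such a file could be read. The article does not name the SBOM format, so the field names below (components, name, version, type) are an assumption modelled on CycloneDX-style JSON, and the sample values are made up.

```python
# Sketch: list the components recorded in a JSON SBOM document.
# The layout is assumed (CycloneDX-style); the sample data is invented.
import json

SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "name": "examplelib", "version": "1.0.0"},
    {"type": "library", "name": "libexample", "version": "2.3.1"}
  ]
}
"""


def list_components(sbom_text):
    """Return (name, version, type) tuples for each component in the SBOM."""
    document = json.loads(sbom_text)
    return [
        (c.get("name"), c.get("version"), c.get("type"))
        for c in document.get("components", [])
    ]


for name, version, kind in list_components(SAMPLE_SBOM):
    print(f"{name} {version} ({kind})")
```

Because the entries carry names and versions regardless of implementation language, a scanner can treat a bundled C library the same way it treats a Python package.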

The next step is to get the vulnerability scanners to recognize the manifest and compare the results to their own list of vulnerabilities.

For package maintainers, Larson suggests starting with assembling an SBOM itself, in order to get a feel for what types of dependencies are involved.

To root out vulnerabilities, use a scanner that covers all software, not just Python. Larson recommends Grype. If you know the environment is all Python, you can use pip-audit.

View the entire PyCon talk here:

The post Python Exposes Phantom Dependencies With SBOM Screening appeared first on The New Stack.

PEP 770 provides a directory to document all of a Python package's dependencies, not just those written in Python.
