Python tarfile Path Traversal Bypass (CVE-2026-7774): data_filter Can Be Circumvented with Crafted Symlinks

A path traversal bypass in Python’s tarfile module, tracked as CVE-2026-7774, allows a malicious tar archive to circumvent the data_filter, the security mechanism designed to prevent path traversal during archive extraction. By using crafted link entries, including symlinks with empty or directory-like names, an attacker can redirect later archive members outside the intended extraction directory. The vulnerability affects Python’s standard library and any application that calls tarfile.extractall() with the data filter on untrusted archives.

What Is the Vulnerability?

CVE-2026-7774 is a bypass of Python’s tarfile.data_filter, the extraction filter that PEP 706 added in Python 3.12 (and backported to security releases of older branches, starting with 3.8.17, 3.9.17, 3.10.12 and 3.11.4) as the recommended safeguard against path traversal during tar archive extraction. Before the data_filter existed, tarfile had been the subject of CVE-2007-4559, a classic path traversal that allowed archive members to write files outside the destination directory using absolute paths or ../ sequences.

The data_filter was designed to block these attacks by intercepting and sanitising each archive member’s path before extraction. However, the bypass exploits a subtle interaction: crafted link entries — including symlinks with empty or directory-like names — can be used to redirect later archive members to paths outside the extraction directory, even though each individual member passes the data filter’s path checks. An attacker can construct a tar archive where an early entry creates a symlink that points outside the destination, and a subsequent entry follows that symlink to write a file to an arbitrary location.
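
The two-step pattern described above can be looked for before anything is extracted. The following is a minimal sketch, not the CPython fix: a hypothetical helper, members_routed_through_links, walks the member list in archive order and flags any member whose path passes through the name of an earlier symlink or hard-link entry, plus any link entry whose own name normalises to the archive root (for example an empty name). It is an illustrative heuristic and is not claimed to cover every variant of the bypass.

```python
import posixpath
import tarfile


def members_routed_through_links(tar: tarfile.TarFile) -> list[str]:
    """Flag members whose path runs through an earlier link entry.

    Hypothetical helper for illustration; a heuristic, not a complete defence.
    """
    link_names: set[str] = set()
    flagged: list[str] = []
    for member in tar.getmembers():
        # Tar member names use "/" separators; normpath("") yields ".".
        name = posixpath.normpath(member.name)
        is_link = member.issym() or member.islnk()
        # Proper parents of the member path: "a/b/c" -> {"a", "a/b"}.
        parts = name.split("/")
        parents = {"/".join(parts[:i]) for i in range(1, len(parts))}
        if parents & link_names or (is_link and name == "."):
            flagged.append(member.name)
        if is_link:
            link_names.add(name)
    return flagged
```

A caller could open the archive with tarfile.open(), run this check, and refuse to extract when the returned list is non-empty. Legitimate archives occasionally place files beneath a symlinked directory, so a rejection policy like this trades some compatibility for safety.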

The vulnerability is classified under CWE-22 (Improper Limitation of a Pathname to a Restricted Directory). Its severity profile is as follows:

  • CVSS v3.1 Score: 7.5 (High), estimated from the path traversal impact
  • Attack Vector: Local (AV:L), since the victim must extract a malicious archive
  • Privileges Required: None (PR:N)
  • User Interaction: Required (UI:R), since the user must extract the archive
  • Impact: Arbitrary file write subject to the permissions of the extracting process

Which Versions Are Affected?

The vulnerability affects Python’s standard library tarfile module:

  • Python 3.12 and later, along with any older release that ships the backported filter: in short, every version where tarfile.data_filter is available

The data_filter first shipped in Python 3.12 and was backported in security updates to the older branches (3.8.17, 3.9.17, 3.10.12 and 3.11.4). Releases older than those do not have the data_filter and rely on other extraction modes, which were already known to be vulnerable to path traversal (CVE-2007-4559). This CVE specifically affects users who adopted the data_filter and believed it protected them.
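
A quick way to tell whether a given interpreter exposes the filter at all is to feature-detect it rather than compare version numbers, which is the check the tarfile documentation itself suggests. The helper name data_filter_available below is a hypothetical one chosen for this sketch.

```python
import sys
import tarfile


def data_filter_available() -> bool:
    """Return True if this interpreter's tarfile module ships data_filter."""
    return hasattr(tarfile, "data_filter")


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {version}: data_filter available = {data_filter_available()}")
```

An interpreter that reports True is in scope for this advisory whenever it extracts untrusted archives with the data filter.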

The vulnerability is tracked in the CPython issue tracker at: https://github.com/python/cpython/issues/149486

Is It Being Exploited in the Wild?

No active exploitation has been publicly reported at the time of writing. However, path traversal in archive extraction is a well-known attack vector, and the Python tarfile module is used by thousands of applications — web frameworks, data processing pipelines, package managers, and DevOps tools — that extract user-supplied tar archives. The specific bypass technique is now publicly documented, and any application that uses tarfile.extractall() with the data_filter on untrusted input should assume the filter alone is insufficient.

What Is the Fix?

A fix has been developed in the CPython project. The patch addresses the symlink redirection bypass in the data filter. Python users should:

  • Monitor the CPython issue tracker for the official patch release and apply the Python update when available
  • In the interim, do not rely solely on tarfile.data_filter for security on untrusted archives
  • Consider additional mitigations: extract archives in a temporary directory, validate extracted paths against an allowlist of expected paths, or use operating-system-level sandboxing (containers, chroot, or seccomp) when processing untrusted archives

Recommendations

Do not trust tarfile.data_filter alone for untrusted archives. Until the CPython fix is released and deployed, treat the data_filter as a defence-in-depth measure, not a security boundary. Any application that extracts user-supplied tar archives should implement additional path validation.

Apply the defence-in-depth measures now:

  • Extract archives to a dedicated temporary directory with restricted permissions
  • After extraction, validate that every extracted file’s real path (resolving all symlinks) falls within the intended destination directory
  • If possible, disable link extraction entirely when processing untrusted archives by passing a custom filter, via the filter parameter of extractall(), that rejects symlink and hard-link members
  • Run archive extraction in a minimal-privilege context — a dedicated service account or container with no write access outside the intended extraction directory

Audit your Python applications for tarfile usage. Identify all code paths where tarfile.extractall() or tarfile.extract() is called on user-supplied or externally-sourced tar archives. Prioritise applications that extract archives automatically — CI/CD pipelines, package upload handlers, backup restoration tools — over interactive use cases where the user is expected to trust the archive source.
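
A first pass at such an audit can be automated. The sketch below uses the standard ast module to list every call to a method named extract or extractall in a piece of source code, along with whether a filter keyword is passed. The function name find_tar_extract_calls is hypothetical, and the match is purely syntactic: it will also report zipfile calls and cannot tell whether the archive is untrusted, so each hit still needs manual review.

```python
import ast


def find_tar_extract_calls(
    source: str, filename: str = "<string>"
) -> list[tuple[int, str, bool]]:
    """Return (line, method, has_filter_kwarg) for each extract/extractall call."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"extract", "extractall"}
        ):
            has_filter = any(kw.arg == "filter" for kw in node.keywords)
            hits.append((node.lineno, node.func.attr, has_filter))
    return sorted(hits)
```

Running it over each .py file in a repository and sorting the hits by how the archive reaches the call gives a starting list for the prioritisation described above.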

Apply the Python update when released. Monitor the CPython issue tracker and Python release announcements. Deploy the patched Python version to all environments where untrusted tar archives are processed.

This advisory is covered in the broader Vulnerability Intelligence Report — June 4, 2026. For a comprehensive view of all active threats and newly disclosed vulnerabilities, refer to the full report.
