Tarfile: Exploiting the World With a 15-Year-Old Vulnerability
By Trellix · September 21, 2022
This story was also written by Kasimir Schulz
While investigating an unrelated vulnerability, Trellix Advanced Research Center stumbled across a vulnerability in Python’s tarfile module. Initially we thought we had found a new zero-day vulnerability. As we dug into the issue, we realized this was in fact CVE-2007-4559. The vulnerability is a path traversal attack in the extract and extractall functions of the tarfile module that allows an attacker to overwrite arbitrary files by adding the “..” sequence to filenames in a TAR archive.
Over the course of our research into the impact of this vulnerability, we discovered that hundreds of thousands of repositories were affected. While the vulnerability was originally rated only 6.8, we were able to confirm that in most cases an attacker can gain code execution from the file write. In the video below, we show how we were able to get code execution by exploiting Universal Radio Hacker:
The purpose of this blog is to dive into the technical details of the vulnerability and to show how easy it is for an attacker to write an exploit for the vulnerability. Over the course of the blog we will also explore the process of writing a tool to automatically detect the tarfile vulnerability in source code by leveraging the power of the AST intermediate representation. Finally, the post will walk you through how we exploited a popular open-source repository, using the path traversal attack to perform code execution.
The tarfile vulnerability
A tarfile is a collection of files together with metadata that is later used to unpack the archive. The metadata in a tar archive includes, but is not limited to, the file name, the size and checksum of the file, and information about the file’s owner at the time of archiving. In the Python tarfile module this information is represented by the TarInfo class, an instance of which is generated for every “member” in a tar archive. These members can represent many different types of filesystem structures, such as directories, symbolic links, and files.
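As a small illustration of the metadata described above, the sketch below builds a one-member archive in memory and reads the TarInfo fields back. The member name and payload are made up for the example.

```python
import io
import tarfile

# Build a tiny archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    payload = b"hello"
    info = tarfile.TarInfo(name="docs/readme.txt")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read it back: every member is described by a TarInfo object.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    members = tar.getmembers()
    for member in members:
        print(member.name, member.size, member.isfile(), member.isdir())
```

Note that the name is just a string stored in the archive header; nothing in the format forces it to stay inside any particular directory.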
In the image above we can see a snippet of code from the extract function in the tarfile module. This snippet shows how the filename is constructed before being passed to the function that extracts and writes the file to the filesystem. The code explicitly trusts the information in the TarInfo object: it joins the path that is passed to the extract function with the name in the TarInfo object, which allows an attacker to perform a directory traversal attack.
Since the extractall function relies on the extract function, as seen above, the extractall function is also vulnerable to the directory traversal attack.
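To make the effect of that join concrete, here is a minimal sketch. It is not the tarfile source itself. The helper names naive_target and escapes are hypothetical, and posixpath is used so the result is the same on every platform.

```python
import posixpath

def naive_target(extract_dir, member_name):
    # Trust the member name: join it onto the extraction directory,
    # then collapse any ".." components the way the filesystem would.
    return posixpath.normpath(posixpath.join(extract_dir, member_name))

def escapes(extract_dir, member_name):
    # True when the resolved target is no longer inside extract_dir.
    base = posixpath.normpath(extract_dir)
    target = naive_target(base, member_name)
    return posixpath.commonpath([base, target]) != base

print(naive_target("/tmp/out", "notes/a.txt"))
print(naive_target("/tmp/out", "../hacked.txt"))
print(escapes("/tmp/out", "../hacked.txt"))
```

A member named "../hacked.txt" resolves to a path outside /tmp/out, which is exactly the behavior the traversal attack relies on.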
The tarfile exploit
For an attacker to take advantage of this vulnerability they need to add “..” with the separator for the operating system (“/” or “\”) into the file name to escape the directory the file is supposed to be extracted to. Python’s tarfile module lets us do exactly this:
The tarfile module lets users add a filter that can be used to parse and modify a file’s metadata before it is added to the tar archive. This enables attackers to create their exploits with as little as the 6 lines of code above.
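The snippet referred to above is not reproduced in this text. A minimal sketch of the same idea follows, assuming a placeholder payload file and archive name. The filter function change_name is an illustrative name. Only the function definition and the tar.add call matter; the rest sets up a temporary working directory so the example is self-contained.

```python
import os
import tarfile
import tempfile

def change_name(tarinfo):
    # The filter receives each member's TarInfo before it is written,
    # so it can rewrite the stored name to include a "../" hop.
    tarinfo.name = "../" + tarinfo.name
    return tarinfo

# Placeholder payload and archive, created in a scratch directory.
workdir = tempfile.mkdtemp()
payload_path = os.path.join(workdir, "malicious_file")
with open(payload_path, "w") as handle:
    handle.write("demo payload\n")

archive_path = os.path.join(workdir, "exploit.tar")
with tarfile.open(archive_path, "w") as tar:
    tar.add(payload_path, arcname="malicious_file", filter=change_name)

# Inspect (do not extract) the archive to see the stored name.
with tarfile.open(archive_path, "r") as tar:
    names = tar.getnames()
print(names)
```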
After discovering that the tarfile module was still vulnerable, we wanted a way to automatically check repositories for the vulnerability so that we could assess the extent of the vulnerability. To do this we built Creosote, a Python script that recursively looks through directories scanning for .py files and then analyzing them once they have been found. After analyzing files, Creosote will print out any files that may contain vulnerabilities, sorting them into 3 categories based on confidence level (Vulnerable, Probably Vulnerable, Potentially Vulnerable).
To analyze the Python code, Creosote leverages the Python ast library, which allows the script to traverse constructs in the source code rather than crudely parsing text and working out the spacing in each script to find the vulnerabilities. Using the ast library and the NodeVisitor structure, Creosote can quickly filter out many extract and extractall functions that have nothing to do with the vulnerability by only analyzing those that belong to an Attribute node. We can make this distinction since both extract and extractall are instance methods and will always appear in the code base along with the archive object (e.g., tar.extractall()).
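The Attribute-node filter can be sketched with the standard ast module as follows. This is not Creosote's actual code. The class name ExtractCallFinder and the sample source are assumptions made for illustration.

```python
import ast

class ExtractCallFinder(ast.NodeVisitor):
    """Collects (line, name) for attribute accesses named extract/extractall."""

    def __init__(self):
        self.hits = []

    def visit_Attribute(self, node):
        # Only attribute accesses such as tar.extractall reach this method;
        # a plain function named extractall is a Name node and is skipped.
        if node.attr in ("extract", "extractall"):
            self.hits.append((node.lineno, node.attr))
        self.generic_visit(node)

source = '''
import tarfile
def extractall(x): pass
extractall("not a tar call")
tar = tarfile.open("a.tar")
tar.extractall()
'''

finder = ExtractCallFinder()
finder.visit(ast.parse(source))
print(finder.hits)
```

The bare extractall(...) call on line 4 of the sample is ignored, while tar.extractall() on line 6 is reported.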
After finding an Attribute node, Creosote looks for the two most common code patterns that the team found while analyzing this vulnerability:
If Creosote finds an Attribute node with extractall it will backtrack and try to check if open was also called and check for whether the second argument or opening mode was set to “r”. Depending on how many criteria get hit the script marks the line as vulnerable, probably vulnerable, or potentially vulnerable.
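A rough sketch of this backtracking is shown below. It is an approximation under stated assumptions, not Creosote's implementation: the function classify_extractall is hypothetical, and the mapping of criteria to the three labels (open found and a read mode given, open found only, neither) is one guess at how such scoring could work.

```python
import ast

def is_open_call(node):
    # Matches calls of the form something.open(...), e.g. tarfile.open(...).
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "open")

def has_read_mode(call):
    # The mode is the second positional argument or the mode= keyword.
    mode = call.args[1] if len(call.args) > 1 else None
    for keyword in call.keywords:
        if keyword.arg == "mode":
            mode = keyword.value
    return (isinstance(mode, ast.Constant)
            and isinstance(mode.value, str)
            and mode.value.startswith("r"))

def classify_extractall(source):
    tree = ast.parse(source)
    opened = {}  # variable name -> the open(...) call bound to it
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and is_open_call(node.value):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    opened[target.id] = node.value
        elif isinstance(node, ast.With):
            for item in node.items:
                if (is_open_call(item.context_expr)
                        and isinstance(item.optional_vars, ast.Name)):
                    opened[item.optional_vars.id] = item.context_expr

    findings = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "extractall"):
            receiver = node.func.value
            call = opened.get(receiver.id) if isinstance(receiver, ast.Name) else None
            if call is None:
                label = "Potentially Vulnerable"   # no open() traced
            elif has_read_mode(call):
                label = "Vulnerable"               # open() traced, read mode set
            else:
                label = "Probably Vulnerable"      # open() traced, mode unknown
            findings.append((node.lineno, label))
    return sorted(findings)

sample = '''
import tarfile
with tarfile.open(path, "r:gz") as tar:
    tar.extractall()
archive = tarfile.open(path)
archive.extractall("out")
unknown.extractall()
'''
results = classify_extractall(sample)
print(results)
```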
Another common occurrence was for an extract call to happen within a for loop that iterates through all the members in the file. While the previous case could have been handled simply by looking at the lines rather than the ast representation, the loop case is more difficult since the members can be looped through in many ways. In the snippet below, taken from Universal Radio Hacker, the for loop iterates over members that are enumerated using the enumerate function. By leveraging the intermediate representation, Creosote can detect loops through getmembers in any form they may appear.
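The loop pattern can be detected along these lines. This is a sketch only: the function name loops_extracting_members and the sample source are hypothetical, and the sample is modeled on the description above rather than copied from Universal Radio Hacker.

```python
import ast

def loops_extracting_members(source):
    """Return line numbers of .extract() calls inside loops over getmembers()."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.For):
            continue
        # Search the whole iterable expression, so getmembers() is found
        # even when wrapped in enumerate(), sorted(), and so on.
        iterates_members = any(
            isinstance(sub, ast.Attribute) and sub.attr == "getmembers"
            for sub in ast.walk(node.iter))
        if not iterates_members:
            continue
        for statement in node.body:
            for sub in ast.walk(statement):
                if (isinstance(sub, ast.Call)
                        and isinstance(sub.func, ast.Attribute)
                        and sub.func.attr == "extract"):
                    hits.add(sub.lineno)
    return sorted(hits)

sample = '''
for index, member in enumerate(archive.getmembers()):
    archive.extract(member, target_dir)
for member in archive.getmembers():
    print(member.name)
'''
flagged = loops_extracting_members(sample)
print(flagged)
```

Only the first loop is flagged: the second iterates over the members but never calls extract.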
Exploiting the tarfile vulnerability in the real world
Spyder IDE is a free and open-source scientific environment written for Python that can be run on Windows and macOS.
While running Creosote, we discovered that there was a vulnerability in the Spyder repository:
After looking through the codebase, we discovered that the load_dictionary function gets called whenever a user imports a .spydata file inside of the variable explorer. Spydata files are used to save and transfer variables between different projects and scripts and can be shared between multiple people.
Now that we knew we could use the import data button to upload a malicious file, we needed to test it. The first step in this process was to find out where the extraction directory was located. Prior to calling extractall, the program calls chdir on a temporary folder it creates; chdir changes the current working directory to the folder that is passed to it. This indicates that Spyder is trying to extract to a temporary folder. After looking into what mkdtemp does, we discovered that on Windows the function creates a temporary directory inside of C:\Users\
Now with the understanding of where the uploaded file lands, it’s time to see if we can use the directory traversal attack to get outside of the directory. To do this we uploaded a .spydata file that should write to C:\Users\
After importing the .spydata file, we can see that we now have a hacked.txt file inside of the AppData directory, which means our attack worked. However, an error also appears in Spyder:
We now have two goals: to exploit the IDE without an error appearing, and to turn our file write into code execution. Luckily, the error message we received had some extra details that helped us solve both problems. The details told us the exception as well as which line of code caused it, which happened to be the line right after the extractall call. After a quick look we can see that the program expects one .pickle file to have been extracted from the .spydata file. The details also help us solve the issue of how to get code execution by showing us that the Python files being run by the program reside under AppData\Local and can therefore be modified or overwritten without needing administrator privileges.
After looking through the codebase, we decided that the best place to add code would be mainwindow.py, since its code always runs when the application is opened, before the main window is created but after the application has checked whether it can run. We then copied the file and added some code to pop up a message box. Next, we created a new .spydata file, this time with a valid .pickle file for the variable importer:
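The original snippet is not reproduced in this text. As a hedged sketch of the general shape of such an archive, the example below builds a tar in memory containing one valid pickle member alongside one member whose name carries a traversal sequence. All member names and contents are placeholders, and the exact layout Spyder expects inside a .spydata file is an assumption not confirmed by the text.

```python
import io
import pickle
import tarfile

def add_bytes(tar, name, data):
    # Add an in-memory payload to the archive under the given name.
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    # One well-formed pickle so an importer expecting it finds valid data
    # (hypothetical file name and contents).
    add_bytes(tar, "variables.pickle", pickle.dumps({"x": 1}))
    # One member whose stored name climbs out of the extraction directory
    # (placeholder target, not a real application path).
    add_bytes(tar, "../placeholder.txt", b"demo\n")

# Inspect (do not extract) the result.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()
pickle_members = [name for name in names if name.endswith(".pickle")]
print(names)
```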
After importing the new .spydata file the IDE loads up valid variables and the program continues uninterrupted rather than throwing an error message. Once the program is reopened, we now get the popup that we had programmed into the start of the program:
While code execution by itself can be devastating, we did not want to stop there. Watch the demo video below to see how we added code to try and social engineer the attacked user to give the attacker code execution with administrator privileges:
The tarfile exploit lets an attacker escalate the file write to code execution on more than just Windows. Watch the video below to see how we were able to exploit Polemarch, an IT infrastructure management service running on Linux and Docker:
As we have demonstrated above, this vulnerability is incredibly easy to exploit, requiring little to no knowledge about complicated security topics. Due to this fact and the prevalence of the vulnerability in the wild, Python’s tarfile module has become a massive supply chain issue threatening infrastructure around the world. Our team at Trellix is leading the charge by patching as many open-source repositories as possible as well as providing a way to scan closed source repositories. We hope that you will join us in our attempt to strengthen the security of code bases around the world.