The Anatomy of HTML Attachment Phishing: One Code, Many Variants
By Mathanraj Thangaraju, Niranjan Hegde, and Sijo Jacob · June 14, 2023
Introduction
Phishing is the malicious practice of posing as a trustworthy entity in electronic communication to steal sensitive data, such as login credentials or credit card numbers. Email is a popular platform for phishing attacks because it is easy for bad actors to run an email phishing campaign. HTML (Hypertext Markup Language) files are among the most common attachments used in such attacks, as HTML attachments can bypass some email security filters and are often seen as less suspicious than other types of file attachments, such as executable files.
HTML attachments may contain links that redirect users to phishing pages, download malware, or steal login credentials through phishing forms. To avoid detection by security products, attackers use techniques such as redirecting users through multiple malicious websites, obfuscating the code, and encoding sensitive information so that it is only decoded at run time with functions such as "unescape()". We see this reliance on HTML files for phishing attacks continuing to surge in 2023.
Trellix Advanced Research Center has been actively monitoring phishing campaigns employing HTML attachments with a Microsoft theme thanks to telemetry available in Trellix Email Security. Starting in the middle of 2022, we observed a surge in this campaign using HTML attachments to target and steal login information from numerous users worldwide. On comparing the telemetry available for Q4-2022 and Q1-2023, we see a rapid increase of over 1030% across multiple industries, with high-tech, manufacturing, and healthcare sectors being the main targets. Notably, the United States, South Korea, and Germany have been identified as the primary countries being targeted by such campaigns.
This blog takes a closer look at the inner workings of these attacks and at how the attackers regularly update the HTML file with different obfuscation techniques to bypass security products.
Phishing samples from the wild
As noted, the Trellix Advanced Research Center has tracked various HTML attachment campaigns since last year. The following are just a handful of the samples our team found in the wild:
Sample 1
The email is a fake DocuSign request asking the victim to eSign the attached HTML file, which leads to a phishing page when opened.
Sample 2
The email contains a nested email attachment, which in turn has the malicious HTML file attached.
Sample 3
The email pretends to be from the Human Resources department and contains an HTML file disguised as an updated Employee Benefits Policy.
Sample 4
The email has an HTML attachment posing as a meeting review document.
Sample 5
The email is a fake conference call update with an HTML attachment disguised as a voicemail.
Sample 6
The email includes a malicious HTML attachment disguised as a legitimate eFax message.
Inner workings of HTML attachments
In this campaign, the HTML attachment uses various obfuscation techniques and shows an intermediate page before loading the final phishing page. This is the key characteristic of the campaign. The section below illustrates how HTML attachments with no obfuscation work and the next section explains different obfuscation techniques used in this campaign.
When opened, the HTML file creates a web page with two hidden input elements and a script element. One of the input elements holds the Base64-encoded value of the targeted user's email address. The script element dynamically creates another script element and appends it to the head of the document. The src attribute of the dynamically created script element is set to a URL that is stored in Base64-encoded form and decoded at run time using the atob() function. The decoded URL is used to load additional JavaScript code.
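For analysts triaging such an attachment, the Base64 step described above can be reversed statically. The following is a minimal Python sketch, not the tooling used in this research: the function name, the regular expression, and the sample markup (including the placeholder host example.test) are assumptions made for illustration, and it only handles atob() calls whose argument is a plain string literal.

```python
import base64
import re
from typing import List

# Matches atob("...") or atob('...') calls with a literal Base64 argument.
ATOB_CALL = re.compile(r"""atob\(\s*(['"])([A-Za-z0-9+/=]+)\1\s*\)""")


def decode_atob_literals(html: str) -> List[str]:
    """Return the decoded value of every literal atob() argument in the markup."""
    decoded = []
    for _quote, payload in ATOB_CALL.findall(html):
        try:
            raw = base64.b64decode(payload, validate=True)
        except ValueError:
            # Not valid Base64 (for example, wrong padding); skip it.
            continue
        decoded.append(raw.decode("utf-8", "replace"))
    return decoded


# Hypothetical attachment fragment; example.test is a placeholder host.
encoded_url = base64.b64encode(b"https://example.test/mj.js").decode("ascii")
sample = (
    '<script>var s=document.createElement("script");'
    's.src=atob("' + encoded_url + '");'
    "document.head.appendChild(s);</script>"
)

print(decode_atob_literals(sample))
```

Samples that build the atob() argument from variables or concatenation would not match this pattern and need dynamic analysis instead.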
In Figure 2, the sample on the right is the basic version of the phishing page, where we see that it makes a request to a URL ending with mj.js. It also contains the div elements with ids b64e and b64u, which contain the victim's email address and the URL of the C2 server, respectively.
The initial GET request is made to the mj.php file with "ar" as the GET parameter, carrying a Base64-encoded value containing the text "word". Other Base64-encoded strings we observed include "office", "invoice", "pdf", "aging" and "default".
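To read the lure keyword out of a captured request, the "ar" parameter can be Base64-decoded. This is a small Python sketch under stated assumptions: the function name is hypothetical, example.test is a placeholder host, and the padding repair is a convenience for URLs where the trailing "=" characters were stripped.

```python
import base64
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def decode_ar_parameter(url: str) -> Optional[str]:
    """Return the decoded "ar" query parameter, or None if absent or invalid."""
    values = parse_qs(urlsplit(url).query).get("ar")
    if not values:
        return None
    raw = values[0].replace(" ", "+")  # parse_qs turns "+" into a space
    raw += "=" * (-len(raw) % 4)       # restore padding if it was stripped
    try:
        return base64.b64decode(raw).decode("utf-8", "replace")
    except ValueError:
        return None


# Placeholder host; the path and parameter follow the pattern described above.
print(decode_ar_parameter("https://example.test/admin/js/mj.php?ar=d29yZA=="))
```

Grouping captured requests by the decoded keyword ("word", "office", "invoice" and so on) is one way to see which lure themes are active.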
The response received is shown in two parts. Figure 4 shows the first part of the response, which decodes the Base64 payload containing the intermediate loading page. The intermediate page is shown for a few seconds before the final phishing page is loaded.
Code block 1 is a function that returns an array containing a Base64-encoded date divided into three parts. Code block 2 declares various variables. The "prer" and "pre2" variables contain the Base64-encoded parts of the HTML that create the head and body tags of the intermediate HTML page, respectively. It also assigns keywords such as document, atob and eval to other variables. Code block 3 declares the function that decodes the values and writes them into the body and head tags of the HTML page. Code block 4 executes the given data using eval. Code block 5 calls the function declared in code block 3. Once the script is executed, we see a loading page as shown in Figure 5.
The display of an intermediate loading page is one of the key characteristics of this attack. By adding a delay in this way, the attackers are trying to evade automated detection.
Second part:
The variables pr1 and pr2 contain the Base64-encoded code that executes a POST request to get the final phishing page. The code is first Base64-decoded and then executed via eval.
The base64 decoded code is shown below:
Code block 1 loads the jQuery library, which is needed to execute the rest of the code. Code block 2 executes the function get_jwt shown in Figure 4 and extracts the value from the div with id b64u, which is present in the original HTML attachment. It contains the URL of the C2 server. Code block 3 extracts the value from the div with id b64e, which is also present in the original HTML attachment. It contains the victim's email address.
Code block 4 creates a POST request with four parameters. The "Scte" parameter contains the email address. The data11, data22 and data33 parameters, when combined, contain the Base64-encoded value of the time at which the phishing page is executed. The conf parameter contains the value from the div with id conf, which is present in the original HTML attachment. Its Base64-decoded value is: {"back":"default","title":"default","caption":"default"}
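The parameters described above can be decoded from a captured request with a few lines of Python. This is a sketch under assumptions: it supposes the three date parts are simple consecutive slices of one Base64 string, and the timestamp value and its format are invented for the demonstration. Only the conf JSON is taken from the analysis above.

```python
import base64
import json
from typing import Sequence


def join_and_decode(parts: Sequence[str]) -> str:
    """Concatenate Base64 fragments (data11, data22, data33) and decode them."""
    return base64.b64decode("".join(parts)).decode("utf-8", "replace")


def decode_conf(conf_b64: str) -> dict:
    """Decode the Base64 conf value into a dictionary."""
    return json.loads(base64.b64decode(conf_b64))


# Hypothetical timestamp; the real format used by the kit is not documented here.
encoded = base64.b64encode(b"2023-01-15 10:30:00").decode("ascii")
third = len(encoded) // 3
parts = [encoded[:third], encoded[third:2 * third], encoded[2 * third:]]

# conf value as decoded in the analysis above.
conf_b64 = base64.b64encode(
    b'{"back":"default","title":"default","caption":"default"}'
).decode("ascii")

print(join_and_decode(parts))
print(decode_conf(conf_b64))
```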
When executed, the POST request looks as follows:
Just before the final phishing page is loaded with the victim's company logo and background, it makes the following POST request with the email address as a parameter:
The response to the request shown in Figure 9 is JSON data containing URLs for the background image and the logo of the victim's company, which are used on the final phishing page.
Figure 10 shows the code responsible for making the request shown in Figure 9. Code block 1 extracts values such as the URL of the C2 server and the victim's email address from the final phishing page.
Code block 2 is unused code that is never executed. Code block 3 makes the POST request. Depending on the data received in the JSON object, it changes the page dynamically to load the logo and background image of the victim's company.
Figure 11 shows the final phishing page as seen by the end user.
Once the user enters the password, the page posts the data to a server controlled by the threat actor, as seen in Figure 12.
HTML attachments: Evolving to evade
We observed that the threat actors update the HTML file code regularly to evade detection. By changing the code regularly and varying the obfuscation techniques described below, threat actors make it more difficult for security products to detect and block their attacks. We see different variants of the HTML code that, on execution, perform the same activity as the initial base variant shown in Figure 2.
One code, many variants
We have observed the HTML file undergoing various changes to evade detection. For most of the variants, the size of the HTML attachment ranges from 3 KB to 5 KB.
We covered the base variant in the section "Inner workings of HTML attachments". Refer to Figure 2 for a base variant sample.
Variant 1: Accessing DOM elements
The samples in this variant access DOM elements to build the final phishing script.
In Figure 13, Figure 14 and Figure 15, the sample uses an obfuscated script to execute the initial payload, which loads the intermediate loading page. Like the base sample shown in Figure 2, this sample contains the email address in the div element with id b64e. It also contains the URL for the second-stage payload in the div element with id b64u.
Figure 13 shows the obfuscated code creating a script that is appended to the document. The script is executed as soon as it is loaded into the browser. Figure 14 is a slight variation of Figure 13 in which the script element is appended to the document using "window.constructor". Similarly, Figure 15 shows a sample using "Array.constructor.constructor" to append the script to the document.
Variant 2: Use of onload trigger and eval execution
The samples in this variant use the onload trigger to execute the phishing payload using the eval and atob functions.
Figure 16 contains the script as a Base64-encoded payload. The encoded script is the same as the one shown in Figure 13. The payload is first Base64-decoded and then executed via the eval function. The execution is triggered by the onload attribute of the body tag.
Figure 17 shows a variation of Figure 16. In this sample, several Base64-decoded fragments, some of them themselves executed via eval, are concatenated and finally executed via the eval function.
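Layered payloads of this kind can often be peeled statically, one eval(atob(...)) layer at a time. The following Python sketch illustrates the idea under assumptions: the helper names and the two-layer sample are invented for the demonstration, only literal atob() arguments are handled, and joining all decoded literals of a layer in source order is a simplification of whatever concatenation a real sample performs.

```python
import base64
import re
from typing import List

# Matches atob("...") or atob('...') calls with a literal Base64 argument.
ATOB_LITERAL = re.compile(r"""atob\(\s*(['"])([A-Za-z0-9+/=]+)\1\s*\)""")


def peel_layers(script: str, max_depth: int = 10) -> List[str]:
    """Repeatedly decode literal atob() arguments until none remain."""
    layers = []
    current = script
    for _ in range(max_depth):
        payloads = [m.group(2) for m in ATOB_LITERAL.finditer(current)]
        if not payloads:
            break
        try:
            # Join the decoded fragments of this layer in source order.
            current = "".join(
                base64.b64decode(p, validate=True).decode("utf-8", "replace")
                for p in payloads
            )
        except ValueError:
            break
        layers.append(current)
    return layers


def wrap(js: str) -> str:
    """Build one hypothetical eval(atob(...)) layer around a script."""
    return 'eval(atob("%s"))' % base64.b64encode(js.encode("utf-8")).decode("ascii")


inner = 'console.log("final stage")'
sample = wrap(wrap(inner))

for depth, layer in enumerate(peel_layers(sample), start=1):
    print(depth, layer)
```

The max_depth limit keeps the loop bounded if a sample decodes to something that again contains atob() literals indefinitely.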
Variant 3: Use of onload trigger and accessing DOM elements
The samples belonging to this variant use the onload trigger to execute code that accesses DOM elements without using a script tag.
In Figure 18, the sample contains the script tag in the body. It is triggered via the onload attribute. The script uses window functionality to access various parts of the document. It accesses the div elements with ids temp1 and temp2. The div elements contain class names that are Base64-encoded strings. The script combines both Base64-encoded strings to generate the final script. The generated script is like the one shown in Figure 13.
The sample in Figure 19 executes in an equivalent manner, but the script it uses to access the names in the classList is different. It also reads the Base64-encoded strings and combines them to create a script like the one in Figure 13.
In Figure 20, the sample has multiple layers of Unicode encoding. Once decoded, the code is the same as that shown in Figure 18.
In Figure 21, the sample uses packing. First the script reverses the Base64-encoded string and then decodes it. It then uses decodeURIComponent to decode any percent-encoded special characters, and the result is written to the HTML file using document.write.
The decoded Base64 string is like the code shown in Figure 18.
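The class-name trick can be reproduced offline by collecting the class attributes of the two div elements and decoding their concatenation. This is an illustrative Python sketch: the class and function names are hypothetical, the sample markup is invented, and it assumes the two class values are simply joined in the order temp1 then temp2.

```python
import base64
from html.parser import HTMLParser
from typing import Dict, Sequence


class ClassNameCollector(HTMLParser):
    """Collect the class attribute of div elements with the given ids."""

    def __init__(self, wanted_ids: Sequence[str]) -> None:
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.class_by_id: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "div" and attr.get("id") in self.wanted_ids:
            self.class_by_id[attr["id"]] = attr.get("class") or ""


def rebuild_script(html: str, ids: Sequence[str] = ("temp1", "temp2")) -> str:
    """Join the class values in id order and Base64-decode the result."""
    collector = ClassNameCollector(ids)
    collector.feed(html)
    joined = "".join(collector.class_by_id.get(i, "") for i in ids)
    return base64.b64decode(joined).decode("utf-8", "replace")


# Hypothetical sample: a harmless script split across two class attributes.
script = 'console.log("stage two")'
encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
half = len(encoded) // 2
sample = (
    '<body><div id="temp1" class="%s"></div>'
    '<div id="temp2" class="%s"></div></body>' % (encoded[:half], encoded[half:])
)

print(rebuild_script(sample))
```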
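The packing steps described for Figure 21 (reverse, Base64-decode, then URI-decode) can be undone in a few lines. This Python sketch uses hypothetical function names and an invented sample, and uses urllib.parse.unquote as a stand-in for JavaScript's decodeURIComponent.

```python
import base64
from urllib.parse import quote, unquote


def unpack(packed: str) -> str:
    """Reverse the string, Base64-decode it, then percent-decode the result."""
    uri_encoded = base64.b64decode(packed[::-1]).decode("ascii")
    return unquote(uri_encoded)


def pack(markup: str) -> str:
    """Hypothetical inverse, used here only to build a test sample."""
    uri_encoded = quote(markup)
    return base64.b64encode(uri_encoded.encode("ascii")).decode("ascii")[::-1]


original = '<div id="temp1" class="placeholder"></div>'
packed = pack(original)

print(unpack(packed))
```

Reversing the string first means the packed value is usually not valid Base64 as stored, which may be why this step is used against scanners that decode Base64 blobs directly.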
Variant 4: Using onerror trigger and eval execution
The samples in this variant use the onerror trigger to execute the phishing payload using the eval and atob functions.
In Figure 22, the sample contains a Base64-encoded string that is decoded and then executed via the eval function. It is triggered using the onerror attribute. The onerror attribute of the img tag is executed when there is an error loading the image. In this case, the src is set to the character 'x' instead of a valid image path or URL. Thus, the code in the onerror attribute is executed.
The code in the decoded Base64 string is like the one shown in Figure 13.
In Figure 23, the sample uses packing to hide its original code. The unpacked version is shown in Figure 24 and is like the one shown in Figure 22.
Variant 5: Use of URI encoding
The samples belonging to this variant use URI encoding and HTML tags such as svg, video and h5.
Unlike the previous samples, which contain a Base64-encoded string, the samples in Figure 25, Figure 26, Figure 27 and Figure 28 contain a URI-encoded string. After decoding the URI-encoded string, we see another obfuscated script. This script exhibits the same behaviour as the other samples.
In Figure 25, the execution is triggered via the onbegin attribute of the animate tag. In Figure 26, the execution is triggered via the onanimationstart attribute of the h5 tag. In Figure 27, the sample uses the onload attribute of the style tag to trigger the execution. In Figure 28, it uses the onloadstart attribute of the video tag to trigger the execution.
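Because the trigger moves between attributes such as onload, onerror, onbegin and onloadstart from variant to variant, one defensive heuristic is to flag any event-handler attribute whose value calls a decoding or evaluation function. The Python sketch below is an assumption-laden illustration, not Trellix's detection logic: the class name, the list of suspicious functions and the sample markup are all invented.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Functions whose use inside an inline event handler is treated as suspicious.
SUSPICIOUS = re.compile(r"\b(eval|atob|unescape|decodeURIComponent)\s*\(")


class HandlerScanner(HTMLParser):
    """Record (tag, attribute) pairs for suspicious on* event handlers."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("on") and value and SUSPICIOUS.search(value):
                self.findings.append((tag, name))


def scan(html: str) -> List[Tuple[str, str]]:
    scanner = HandlerScanner()
    scanner.feed(html)
    return scanner.findings


# Hypothetical markup with harmless payloads, plus one benign handler.
sample = (
    "<img src=\"x\" onerror=\"eval(atob('YQ=='))\">"
    "<svg><animate onbegin=\"eval(decodeURIComponent('%61'))\"></animate></svg>"
    "<p onclick=\"track()\">hello</p>"
)

print(scan(sample))
```

A heuristic like this can produce false positives on legitimate pages, so it is better suited to scoring small email attachments than to blocking general web content.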
Trellix email security detection telemetry
While tracking these campaigns, we found that the most heavily targeted countries are the United States, South Korea, and Germany, as seen in figure 19. We analysed the telemetry to understand how detections are distributed across industries and found that the high-tech, manufacturing and healthcare sectors have the highest number of detections (Figure 20).
These sectors may be more vulnerable to such attacks because they frequently handle sensitive information, including financial data, personal information, and intellectual property. In addition, these sectors frequently have complex IT architectures with numerous points of entry, which can make it easier for attackers to exploit weaknesses and gain unauthorised access to systems and data. They may also have staff members who are less technically adept or less aware of cybersecurity threats, which makes them more susceptible to phishing attempts. It is therefore imperative that they proactively put strong measures in place to protect their systems and networks from such phishing campaigns.
From Q4-2022 to Q1-2023, Trellix observed a surge in these campaigns, with a major uptick towards the end of December targeting online shoppers, retailers and financial institutions, who are more vulnerable due to holiday-related distractions or increased online activity.
Conclusion
Phishing attacks using HTML attachments have been growing steadily in recent years, but the surge in campaigns last year shows that attackers are becoming more sophisticated in their techniques and are updating the malicious code regularly to evade detection. Threat actors are constantly evolving their tactics and techniques to improve the success rate of their phishing campaigns. In today's dynamic threat landscape, educating users and employees about the risks of opening untrusted files can help prevent them from falling victim to this type of attack.
Trellix product coverage
Trellix Email Security offers a multi-layered detection strategy for this campaign that includes checks at the URL, email, network, and attachment levels to ensure that any potential threat is discovered and stopped before it can harm our customers. To stay ahead of new and evolving threats, the product continuously monitors and updates its threat intelligence database. It includes the Trellix Multi-Vector Virtual Execution Engine, a new anti-malware core engine, machine-learning behaviour classification and AI correlation engines, real-time threat intelligence from the Trellix Dynamic Threat Intelligence (DTI) Cloud, and defences across the entire attack lifecycle to keep your organisation safer and more resilient.
HTML/Phishing.px
HTML/Phishing.rm
HTML/Phishing.rn
HTML/Phishing.ro
HTML/Phishing.rp
JS/Downloader.gh
JS/Downloader.gi
Trojan.GenericKD.66153232
Trojan.GenericKD.65926933
Trojan.Script.EBA
Trojan.GenericKD.66208721
Generic.HTML.Phishing.Q.4017B596
Trojan.GenericKD.66104272
Trojan.GenericKD.66164630
GT:JS.Clsfk.1.0D2C49A6
Trojan.GenericKD.65934454
Trojan.GenericKD.65926690
Trojan.GenericKD.65927415
Trojan.GenericKD.65923956
Detection as a Service
Email Security
Malware Analysis
File Protect
FEC_Phish_HTML_Generic_352
FEC_Phish_HTML_Generic_358
FEC_Phish_HTML_Generic_355
FEC_Phish_HTML_Generic_315
FEC_Phish_HTML_Generic_286
FE_Trojan_HTML_Phish_372
FE_Trojan_HTM_Phish_189
FE_Trojan_HTM_Phish_198
FE_Trojan_HTML_Phish_372
FE_Trojan_HTML_Phish_402
FE_Trojan_HTML_Phish_373
FE_Trojan_HTM_Phish_155
FE_Trojan_HTML_Phish_337
FE_Trojan_HTML_Phish_429
FE_Trojan_HTML_Phish_399
FE_Trojan_HTML_Phish_438
FE_Trojan_HTML_Phish_457
Phishing.HTML.PhishingMS
Phish.URL
Indicators of compromise (IoCs):
Hashes
URLs. (Pattern: admin/js/mj.php?ar=[base64])
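The URL pattern above can be turned into a simple log or proxy search. This Python sketch is one possible reading of the pattern, with assumptions stated: the regular expression, the function name and the example.test host are illustrative, and the optional %3D alternative covers Base64 padding that has been percent-encoded.

```python
import re

# admin/js/mj.php?ar=[base64], allowing raw or percent-encoded "=" padding.
IOC_URL = re.compile(r"admin/js/mj\.php\?ar=[A-Za-z0-9+/]+(?:=|%3D){0,2}", re.IGNORECASE)


def matches_ioc(url: str) -> bool:
    """Return True if the URL contains the mj.php?ar=[base64] pattern."""
    return IOC_URL.search(url) is not None


# Placeholder URLs for demonstration only.
for candidate in (
    "https://example.test/admin/js/mj.php?ar=d29yZA==",
    "https://example.test/admin/js/app.js",
):
    print(candidate, matches_ioc(candidate))
```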