Using HTML5 to prevent detection of drive-by-download web malware

Introduction

The web is experiencing explosive growth. New technologies are introduced at a very fast pace with the aim of narrowing the gap between web-based applications and traditional desktop applications. However, these advancements come at a price: the same technologies can also be used to implement web malware that evades common detection techniques. In our article we present some obfuscation techniques based on HTML5 that can be used to deceive popular malware detection systems. The proposed techniques were tested on a reference set of obfuscated malware. Our results show that malware rewritten using our obfuscation techniques goes undetected when analyzed by a large number of detection systems, whereas the same systems correctly identify the same malware in its original, unobfuscated form. We also provide some hints about how existing malware detection systems can be enhanced to cope with these new threats.

Drive-by Download Malware

One of the most common attacks spread through the web is the drive-by download, which has evolved into exploit kits. In these attacks, an unaware user visits a web page containing malicious code, typically written in JavaScript. The malicious code gathers information about the context in which it is executed in order to determine which exploits can be used to compromise the system. If a vulnerable component is found, the corresponding exploit is launched. If the exploit succeeds, another piece of malware, such as a trojan, is downloaded and executed on the compromised machine.

Several techniques have been proposed so far for detecting web malware. The classical approach is signature-based detection, typically used by antivirus software and intrusion detection systems. It is not very effective against web malware, which can leverage the highly dynamic nature of JavaScript for obfuscation. A much more effective approach is based on honeyclients, such as Wepawet, which execute the suspicious code in an emulated environment in order to inspect instructions and data at runtime.

HTML5 Obfuscation

Our obfuscation techniques leverage some HTML5 APIs in order to deliver and reassemble malicious code in a web page. As a preliminary phase, the malicious code is split into a series of chunks. The chunks can be arbitrarily small and individually undetectable. When the victim visits the infected web page, an arbitrarily complex procedure based on HTML5 is executed in order to reassemble and execute the original malware. This approach avoids the typical (de)obfuscation patterns detected by static and dynamic analysis. In particular, we present three techniques.
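Before turning to the individual techniques, the claim that chunks are "individually undetectable" can be illustrated from the detector's side. The following is a minimal sketch, not the setup used in our experiments: the marker string `KNOWN-BAD-MARKER`, the helper names, and the chunk size are all hypothetical, and the "signature" is a harmless placeholder for whatever pattern a real scanner would look for. It shows that a naive substring signature matches the whole sample but none of its chunks once the chunk size is smaller than the signature.

```python
from typing import List

# Hypothetical, harmless placeholder for a pattern known to a scanner.
SIGNATURE = "KNOWN-BAD-MARKER"


def split_into_chunks(text: str, size: int) -> List[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def matches_signature(text: str) -> bool:
    """Naive signature check: plain substring search."""
    return SIGNATURE in text


# Benign sample text that happens to contain the placeholder marker.
sample = "var x = 1; /* KNOWN-BAD-MARKER */ var y = 2;"

# Chunk size (4) is smaller than the signature length (16), so no single
# chunk can contain the full marker.
chunks = split_into_chunks(sample, 4)

whole_hit = matches_signature(sample)                  # scan of the original
chunk_hits = [matches_signature(c) for c in chunks]    # scan of each chunk
reassembled_hit = matches_signature("".join(chunks))   # scan after reassembly
```

Under these assumptions, only the scans of the original and of the reassembled content find the marker. This suggests that a detection system needs visibility into the content as it exists after reassembly, rather than into the fragments as they are delivered.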

Delegated Preparation.

This technique avoids, fully or partially, the activities related to the deobfuscation phase by delegating them to the web browser internals through the WebSQL API or the IndexedDB API. The idea is to split the malicious code into a series of chunks and to recompose it at runtime, as typically occurs in simple (de)obfuscation routines. The difference here is that each chunk is stored in a table entry in the local browser database. Then, when the attack has to take place, the retrieval and preparation of the malicious code are delegated to the database engine through a properly crafted selection query. The same result can also be achieved by means of the FileReader and Blob APIs.

Distributed Preparation.

Typically, the operations driving the deobfuscation and execution of a piece of malware look harmless individually but harmful when considered as a whole. The distributed preparation technique aims to deceive detection systems by breaking up the execution of the malicious code into several simpler pieces that are executed separately in different contexts. Each piece executes its part of the attack and then makes its result available to the next piece. This can be achieved by executing independent malware activities in different threads through web workers. Moreover, to further confuse detection systems, the communication pattern followed during the attack is not established statically but decided at runtime: at the end of each step, a function is evaluated to decide which web worker is the target of the next communication.

User-driven Preparation.

The user-driven technique is a variant of the distributed preparation technique. Here, the activities related to the preparation and execution of the malware are spread across the time that a victim spends visiting a single page or a collection of pages, rather than being concentrated in a few milliseconds. Moreover, to make the sequence less predictable, the individual activities are not executed automatically but are triggered by the (unaware) user. The content of the page can be organized in such a way that the victim has to perform an exact sequence of steps in order to enjoy the content of the page (e.g., playing a game). By following this sequence, the victim unintentionally drives the execution of the malware.

All the samples published in this repository were analyzed with Wepawet and VirusTotal and showed a very low detection ratio.

Useful links

Using HTML5 to prevent detection of drive-by-download web malware (full paper, paywalled article)
Using HTML5 to prevent detection of drive-by-download web malware (preliminary version, free)
HTML5 Obfuscation Examples (a copy of the source code used in our experiments)

Authors

Giancarlo De Maio
Lastline, Inc.
https://www.linkedin.com/in/gdemaio
gdemaio@lastline.com

Alfredo De Santis
University of Salerno, Italy
http://www.di.unisa.it/professori/ads/ads/Home.html
ads@dia.unisa.it

Umberto Ferraro Petrillo
University of Rome - La Sapienza, Italy
http://umbfer.googlepages.com
umberto.ferraro@uniroma1.it