
Validating static call graph-based malware signatures using community detection methods

About

Abstract

" Due to the increasing number of new malware appearing daily, it is impossible to manually inspect each sample. By applying data mining techniques to analyze the program code, we can help manual processing. In this paper we propose a method to extract signatures from the executable binary of a malware, in order to query the local neighborhood in real time. The method is validated by applying community detection algorithms on the common fingerprints-based malware graph to identify families, and assessing these with evaluation metrics used in the field (e.g. modularity, family majority, etc.). The signatures are obtained via static code analysis, using function call n-grams and applying locality-sensitive hashing techniques to enable the match between functions with highly similar instruction lists. "

Keywords

static call graph, n-gram features, locality-sensitive hashing, malware communities, family clustering
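
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The MinHash-style labeling, the 2-instruction shingles, the n-gram length of 3, and every function and sample name below are assumptions chosen for illustration. Each function gets a locality-sensitive label from its instruction list, call n-grams over those labels become signatures, and two samples are linked by the number of signatures they share. A community detection algorithm such as Louvain would then run on the resulting weighted graph.

```python
import hashlib
from itertools import combinations


def _h(text, seed=0):
    """Stable 64-bit hash (the built-in hash() is salted per process)."""
    data = f"{seed}:{text}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def function_label(instructions, num_hashes=4, shingle=2):
    """MinHash-style label (assumed scheme): identical instruction lists
    always share a label, highly similar ones are likely to."""
    count = max(1, len(instructions) - shingle + 1)
    shingles = {" ".join(instructions[i:i + shingle]) for i in range(count)}
    return tuple(min(_h(s, seed) for s in shingles) for seed in range(num_hashes))


def call_ngrams(call_graph, n=3):
    """Yield every chain of n functions that follows caller -> callee edges."""
    def extend(path):
        if len(path) == n:
            yield tuple(path)
            return
        for callee in call_graph.get(path[-1], ()):
            yield from extend(path + [callee])

    for func in call_graph:
        yield from extend([func])


def signatures(call_graph, instructions, n=3):
    """Hash each call n-gram of function labels into one signature."""
    labels = {f: function_label(ins) for f, ins in instructions.items()}
    # Functions without a body (e.g. imports) fall back to their name.
    return {
        _h(repr(tuple(labels.get(f, f) for f in gram)))
        for gram in call_ngrams(call_graph, n)
    }


def shared_signature_graph(sample_signatures):
    """Map each sample pair to its number of common signatures (if any)."""
    edges = {}
    for a, b in combinations(sorted(sample_signatures), 2):
        common = len(sample_signatures[a] & sample_signatures[b])
        if common:
            edges[(a, b)] = common
    return edges


# Hypothetical toy samples: "a" and "b" differ only in function names.
body = [["push", "mov", "call", "ret"],
        ["xor", "inc", "cmp", "jne", "ret"],
        ["push", "push", "call", "add", "ret"]]
samples_raw = {
    "a": ({"main": ["decrypt"], "decrypt": ["send"], "send": []},
          dict(zip(["main", "decrypt", "send"], body))),
    "b": ({"start": ["unpack"], "unpack": ["exfil"], "exfil": []},
          dict(zip(["start", "unpack", "exfil"], body))),
    "c": ({"main": ["init"], "init": ["run"], "run": []},
          {"main": ["nop", "nop", "jmp"], "init": ["lea", "sub"], "run": ["hlt"]}),
}
sigs = {name: signatures(g, ins) for name, (g, ins) in samples_raw.items()}
edges = shared_signature_graph(sigs)
print(edges)
```

Because the signatures depend on function labels rather than function names, the renamed samples "a" and "b" end up connected, while the unrelated sample "c" stays isolated.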

Fig. 1. Prevalent methods and features used for malware analysis (based on [1])
Fig. 2. Call graph-based malware analysis (based on [1])

Figures

Fig. 3. Louvain communities of the common fingerprints-based malware graph
Fig. 4. Call graphs of two samples within the same community, having the same family, and ~4800 common signatures

References