About
Authors
Abstract
"
This paper presents a static analysis method for classifying malware
families: executables are disassembled with Radare2, and the static call
graph is traversed to produce instruction-based RGB images on which CNNs
are trained. Instruction-based family detection has the potential to
model common behavioral patterns, and thus to build a profile for
various families and actors. The experiments are carried out on the
BODMAS, MalImg, and IBD (internal Bitdefender dataset) datasets. Our
method’s performance is compared to another static feature selection
method, the EMBER features. Furthermore, we present evidence of a
correlation between packers and malware families in all three datasets.
We conclude that the proposed model’s accuracy does not reach that of
the EMBER features because of the high number of packed files in these
datasets. However, its stability still motivates its use, since
instruction-based information cannot be altered as easily as
header-based features. Our observations suggest that, when classifying
malware families, ML methods that skip unpacking the samples may overfit
the data, learning packer traits instead of actual family behaviour and
offering no explainability for the decision.
"
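The pipeline outlined in the abstract (walk the static call graph, then turn the visited instructions into an RGB image) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the traversal order or the pixel encoding, so the depth-first walk, the "first three bytes of each instruction become one pixel" mapping, and all names (`traverse_call_graph`, `instruction_to_pixel`, `build_image`, the toy graph) are hypothetical. In practice the call graph and instruction bytes would come from a disassembler such as Radare2 rather than from hand-written dictionaries.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]


def traverse_call_graph(graph: Dict[str, List[str]], entry: str) -> List[str]:
    """Return functions in depth-first pre-order, visiting each one once."""
    order: List[str] = []
    seen = set()
    stack = [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        order.append(fn)
        # Push callees in reverse so the first callee is visited first.
        stack.extend(reversed(graph.get(fn, [])))
    return order


def instruction_to_pixel(raw: bytes) -> Pixel:
    """Assumed encoding: first three instruction bytes, zero-padded, as (R, G, B)."""
    padded = raw[:3].ljust(3, b"\x00")
    return (padded[0], padded[1], padded[2])


def build_image(
    graph: Dict[str, List[str]],
    instructions: Dict[str, List[bytes]],
    entry: str,
    width: int,
) -> List[List[Pixel]]:
    """Lay out one pixel per instruction, row by row, in traversal order."""
    pixels = [
        instruction_to_pixel(ins)
        for fn in traverse_call_graph(graph, entry)
        for ins in instructions.get(fn, [])
    ]
    # Pad the last row with black pixels so every row has `width` entries.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([(0, 0, 0)] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Toy input (hypothetical): three functions with a few x86-64 instructions each.
EXAMPLE_GRAPH = {"main": ["init", "run"], "init": [], "run": ["init"]}
EXAMPLE_INSTRUCTIONS = {
    "main": [b"\x55", b"\x48\x89\xe5"],         # push rbp; mov rbp, rsp
    "init": [b"\xc3"],                          # ret
    "run": [b"\xe8\x00\x00\x00\x00", b"\xc3"],  # call rel32; ret
}

if __name__ == "__main__":
    for row in build_image(EXAMPLE_GRAPH, EXAMPLE_INSTRUCTIONS, "main", width=2):
        print(row)
```

The resulting rows could be converted to an image array and fed to a CNN. A fixed image size, which a CNN typically requires, would need an extra cropping or resizing step that is not shown here.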
Keywords
static malware analysis, call graph, control flow graph, Radare2, family classification, packer, BODMAS, EMBER, MalImg
Dataset
- published on Kaggle: https://www.kaggle.com/datasets/amester/malflow