Hagit Grushka-Cohen

Simulating user activity to assess the effect of sampling on anomaly detection in DB activity monitoring

Ben-Gurion University

Bio

Hagit Grushka-Cohen is a PhD student in the Department of Software and Information Systems at Ben-Gurion University, under the supervision of Prof. Lior Rokach and Prof. Bracha Shapira. Her PhD research applies machine learning to cyber security.

Hagit twice won the prestigious IBM Fellowship award for her work on risk assessment and on automatic policy calibration. During her PhD, Hagit collaborated with IBM Guardium and the IBM Cyber Center of Excellence, a collaboration that led to several ML papers (including one at CIKM). Before starting her PhD, Hagit was a project manager at Adama and a BI project leader at Stanley Works.

Abstract

Anomaly detection systems that monitor high-velocity streams, such as Database Activity Monitoring (DAM) or network packet traffic, are resource-constrained. The sheer number of transactions in a database restricts DAM systems to examining only a sample of the activity. These systems rely on manually crafted expert policies to decide which transactions to monitor and log. This skews the collected data: specific subsets, such as high-risk users, are over-represented, while the rest of the population is under-represented and may never be sampled.

Previous work focused on sampling methods that optimize detection. To evaluate how sampling affects anomaly detection, and how effective different sampling methods are, we created a simulator of user database activity. We reframe the problem as a special case of the Multi-Armed Bandit (MAB) problem and propose two novel algorithms: the C-epsilon Greedy Strategy, based on random exploration, and Expert-Init-Gibbs, which uses expert input for initialization to avoid the cold-start problem.

Experiments on the simulated data show that anomaly detection using the baseline rule-based sampling method fails to detect malicious events effectively. The C-epsilon Greedy Strategy, in contrast, performs well on this task, alerting on malicious events while maximizing knowledge of user profiles. I will share a GitHub repository with all the code and can include a deep dive into its implementation in my talk.
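To make the bandit framing concrete, here is a minimal Python sketch under stated assumptions: each monitored user is an arm, the DAM system can log only `budget` users per round, and the reward is 1 when a sampled transaction is malicious. It contrasts a hypothetical fixed expert rule with a generic epsilon-greedy bandit. It is not the C-epsilon Greedy Strategy or Expert-Init-Gibbs from the talk, and every function name, parameter, risk score, and attack probability below is made up for illustration.

```python
import random
from collections import defaultdict


def rule_based_policy(users, risk, budget):
    """Hypothetical expert rule: always log the `budget` highest-risk users."""
    return sorted(users, key=lambda u: risk[u], reverse=True)[:budget]


def epsilon_greedy_policy(users, estimates, budget, epsilon, rng):
    """Generic epsilon-greedy choice of `budget` distinct users (arms) per round.

    Each slot is filled with a random remaining user with probability `epsilon`
    (exploration), otherwise with the remaining user whose estimated reward is
    highest (exploitation; ties go to the earliest user in `users`).
    """
    chosen = []
    remaining = list(users)
    for _ in range(budget):
        if rng.random() < epsilon:
            pick = rng.choice(remaining)
        else:
            pick = max(remaining, key=lambda u: estimates[u])
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def simulate(policy, rounds=1000, n_users=20, budget=3, epsilon=0.2, seed=0):
    """Toy simulator: returns (distinct users ever sampled, malicious events caught)."""
    rng = random.Random(seed)
    users = list(range(n_users))
    # Hypothetical expert risk scores: user 0 is rated riskiest, the last user safest.
    risk = {u: n_users - u for u in users}
    # Hypothetical hidden behaviour: all users are benign except the one the
    # expert rates safest, who is the actual attacker.
    p_malicious = {u: 0.0 for u in users}
    p_malicious[n_users - 1] = 0.5
    counts = defaultdict(int)       # times each user was sampled
    estimates = defaultdict(float)  # running mean reward per user
    caught = 0
    for _ in range(rounds):
        if policy == "rule":
            chosen = rule_based_policy(users, risk, budget)
        else:
            chosen = epsilon_greedy_policy(users, estimates, budget, epsilon, rng)
        for u in chosen:
            # Reward 1 if the sampled transaction is malicious (i.e. would alert).
            reward = 1.0 if rng.random() < p_malicious[u] else 0.0
            caught += int(reward)
            counts[u] += 1
            # Incremental update of the user's mean reward estimate.
            estimates[u] += (reward - estimates[u]) / counts[u]
    return len(counts), caught


for name in ("rule", "epsilon-greedy"):
    covered, caught = simulate(name)
    print(f"{name}: users ever sampled = {covered}, malicious events caught = {caught}")
```

Because the fixed rule only ever logs the three users it rates riskiest, the attacker it rates safest is never sampled and nothing is caught. Random exploration gives every user a chance of being sampled, and once the attacker yields a reward, the greedy step keeps logging them.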

Planned Agenda
