Faculty of Computer Science
University of New Brunswick
550 Windsor St, Head Hall E128
Fredericton, NB, E3B 5A3
UNB ISCX Intrusion Detection Evaluation DataSet
In network intrusion detection systems (IDSs), anomaly-based approaches in particular are hard to evaluate, compare, and deploy accurately because adequate datasets are scarce. Many such datasets are internal and cannot be shared due to privacy concerns. Others are heavily anonymized and do not reflect current trends, or they lack certain statistical characteristics. These deficiencies are the main reason a perfect dataset does not yet exist, so researchers must resort to whatever datasets they can obtain, which are often suboptimal. As network behaviors and patterns change and intrusions evolve, it has become necessary to move away from static, one-time datasets toward dynamically generated datasets that not only reflect the traffic compositions and intrusions of their time but are also modifiable, extensible, and reproducible.
At ISCX, a systematic approach to generating the required datasets is introduced to address this need. The underlying notion is based on profiles, which contain detailed descriptions of intrusions and abstract distribution models for applications, protocols, or lower-level network entities. Real traces are analyzed to create profiles for agents that generate real traffic for HTTP, SMTP, SSH, IMAP, POP3, and FTP. To this end, a set of guidelines is established that outlines what makes a dataset valid; these guidelines form the basis for generating profiles and are vital to the dataset's effectiveness in terms of realism, evaluation capabilities, total capture, completeness, and malicious activity. The profiles are then used in an experiment to generate the desired dataset in a testbed environment. Various multi-stage attack scenarios were subsequently carried out to supply the anomalous portion of the dataset. By sharing the generated datasets and profiles, we intend to help researchers acquire datasets of this kind for testing, evaluation, and comparison.
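The profile idea can be illustrated with a minimal sketch in which an agent turns an abstract timing model into a concrete schedule of sessions. Everything here is an assumption for illustration: the `TrafficProfile` class, its fields, and the choice of exponentially distributed gaps between sessions (a Poisson arrival process) are hypothetical stand-ins, not the distribution models fitted from the Center's traces. A seeded random generator is used so that the same profile always yields the same schedule, which is one simple way to obtain the reproducibility described above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficProfile:
    """Hypothetical profile: one protocol plus an abstract timing model."""
    protocol: str               # e.g. "HTTP", "SMTP", "SSH"
    mean_interarrival_s: float  # assumed mean gap between sessions, in seconds


def generate_sessions(profile, duration_s, seed=0):
    """Return increasing session start times (seconds) for one agent.

    Gaps are drawn from an exponential distribution (a Poisson arrival
    process). This is an illustrative assumption, not the distribution
    used to build the actual dataset.
    """
    rng = random.Random(seed)  # fixed seed makes the schedule reproducible
    t = 0.0
    starts = []
    while True:
        # expovariate takes a rate, so pass 1 / mean
        t += rng.expovariate(1.0 / profile.mean_interarrival_s)
        if t >= duration_s:
            break
        starts.append(t)
    return starts


# Two hypothetical agents sharing a 10-minute window, merged into one timeline.
profiles = [TrafficProfile("HTTP", 5.0), TrafficProfile("SMTP", 60.0)]
schedule = sorted(
    (start, p.protocol)
    for i, p in enumerate(profiles)
    for start in generate_sessions(p, duration_s=600, seed=i)
)
```

Each agent would then open real sessions of its protocol at the scheduled times; that step depends on the testbed and is omitted here.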
To simulate user behavior, the behaviors of our Center's users were abstracted into profiles, and agents were programmed to execute them, effectively mimicking user activity. Attack scenarios were then designed and executed to express real-world cases of malicious behavior. These attacks were launched in real time from physical devices with human assistance, which avoids the unintended artifacts that arise when attack traffic is merged with background traffic after capture. Because the attacks are executed in a controlled way alongside the background traffic, the resulting network traces can be labeled. This is expected to simplify the evaluation of intrusion detection systems and to provide more realistic and comprehensive benchmarks.
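Labeling follows from running the attacks in a controlled way: if the time window and participating hosts of each attack are recorded, every captured flow can be checked against them. The sketch below assumes hypothetical `Flow` and `AttackWindow` records and a deliberately simple rule (time overlap plus a shared host); the dataset's actual labeling procedure may differ. The addresses come from the 192.0.2.0/24 documentation range, not from the testbed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; times in seconds since capture start."""
    src: str
    dst: str
    start: float
    end: float


@dataclass(frozen=True)
class AttackWindow:
    """Hypothetical record of one executed attack."""
    name: str
    hosts: frozenset  # addresses known to take part in the attack
    start: float
    end: float


def label_flow(flow, attacks):
    """Return the name of the first matching attack, or "Normal"."""
    for attack in attacks:
        overlaps = flow.start <= attack.end and flow.end >= attack.start
        involved = flow.src in attack.hosts or flow.dst in attack.hosts
        if overlaps and involved:
            return attack.name
    return "Normal"


attacks = [AttackWindow("BruteForceSSH", frozenset({"192.0.2.66"}), 1000.0, 2000.0)]
flows = [
    Flow("192.0.2.66", "192.0.2.10", 1500.0, 1502.0),  # attacker host, inside window
    Flow("192.0.2.20", "192.0.2.10", 1500.0, 1501.0),  # benign host, same window
    Flow("192.0.2.66", "192.0.2.10", 3000.0, 3001.0),  # attacker host, outside window
]
labels = [label_flow(f, attacks) for f in flows]
```

Here `labels` is `["BruteForceSSH", "Normal", "Normal"]`: only the flow that both involves the attacking host and falls inside the attack window is marked as malicious.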
The full research paper describing the dataset and its underlying principles is:
Ali Shiravi, Hadi Shiravi, Mahbod Tavallaee, Ali A. Ghorbani, Toward developing a systematic approach to generate benchmark datasets for intrusion detection, Computers & Security, Volume 31, Issue 3, May 2012, Pages 357-374, ISSN 0167-4048, 10.1016/j.cose.2011.12.012.
| Day | Date (DD/MM/YYYY) | Description | Size (GB) |
|---|---|---|---|
| Friday | 11/6/2010 | Normal Activity. No malicious activity | 16.1 |
| Saturday | 12/6/2010 | Normal Activity. No malicious activity | 4.22 |
| Sunday | 13/6/2010 | Infiltrating the network from inside + Normal Activity | 3.95 |
| Monday | 14/6/2010 | HTTP Denial of Service + Normal Activity | 6.85 |
| Tuesday | 15/6/2010 | Distributed Denial of Service using an IRC Botnet | 23.4 |
| Wednesday | 16/6/2010 | Normal Activity. No malicious activity | 17.6 |
| Thursday | 17/6/2010 | Brute Force SSH + Normal Activity | 12.3 |
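For planning storage or selecting subsets, the daily schedule can be kept as plain data. This sketch transcribes the rows of the table above and computes the combined size of all days and the days that contain attack traffic. The `CAPTURES` list and the `is_attack_day` helper are illustrative names, and the rule "a day contains attacks unless it says no malicious activity" is an assumption based only on the descriptions listed.

```python
# Rows transcribed from the daily capture table: (day, date, description, size).
CAPTURES = [
    ("Friday", "11/6/2010", "Normal Activity. No malicious activity", 16.1),
    ("Saturday", "12/6/2010", "Normal Activity. No malicious activity", 4.22),
    ("Sunday", "13/6/2010", "Infiltrating the network from inside + Normal Activity", 3.95),
    ("Monday", "14/6/2010", "HTTP Denial of Service + Normal Activity", 6.85),
    ("Tuesday", "15/6/2010", "Distributed Denial of Service using an IRC Botnet", 23.4),
    ("Wednesday", "16/6/2010", "Normal Activity. No malicious activity", 17.6),
    ("Thursday", "17/6/2010", "Brute Force SSH + Normal Activity", 12.3),
]


def is_attack_day(description):
    """Assumed rule: a day has attack traffic unless marked as having none."""
    return "No malicious activity" not in description


# Round to two decimals to hide floating-point noise in the sum.
total_size = round(sum(size for *_, size in CAPTURES), 2)
attack_days = [day for day, _, desc, _ in CAPTURES if is_attack_day(desc)]
```

With the listed figures, `total_size` is 84.42 and `attack_days` is `["Sunday", "Monday", "Tuesday", "Thursday"]`.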