G-Hypeddings
1. Overview
G-hypeddings is a Python library for hyperbolic graph embeddings, used primarily for anomaly detection in cybersecurity. It includes six models, each available in several configurations, all of which operate in hyperbolic space. The library is built on top of PyTorch.
1.1. Models
The models fall into three categories based on their overall architecture: shallow models (Poincaré), convolution-based models (HGNN, HGCN & H2H-GCN), and autoencoder-based models (HGCAE & PVAE).
Name | Year | Encoder | Decoder | Manifold | Ref |
---|---|---|---|---|---|
Poincaré | 2017 | / | MLP | Poincaré Ball | [1] |
HGNN | 2019 | HGCN | MLP | Poincaré Ball, Lorentz | [2] |
HGCN | 2019 | HGCN | MLP | Lorentz | [3] |
P-VAE | 2019 | GCN | MLP | Poincaré Ball | [4] |
H2H-GCN | 2021 | HGCN | MLP | Lorentz | [5] |
HGCAE | 2021 | HGCN | HGCN | Poincaré Ball | [6] |
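The table names two models of hyperbolic space. As a minimal NumPy sketch (standard textbook formulas with curvature fixed at -1, not code taken from the library), the geodesic distance in each model and the isometry that maps one onto the other can be written as follows. The function names are illustrative.

```python
import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq_dist = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq_dist / max(denom, eps))

def lorentz_inner(x, y):
    """Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid {x : <x, x>_L = -1, x0 > 0}."""
    # Clamp at 1 so rounding error cannot push the argument outside arccosh's domain.
    return np.arccosh(max(-lorentz_inner(x, y), 1.0))

def poincare_to_lorentz(p):
    """Map a Poincaré-ball point onto the hyperboloid (the two models are isometric)."""
    sq = np.sum(p ** 2)
    return np.concatenate(([1.0 + sq], 2.0 * p)) / (1.0 - sq)

p, q = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
d_ball = poincare_distance(p, q)
d_hyperboloid = lorentz_distance(poincare_to_lorentz(p), poincare_to_lorentz(q))
```

Both distances agree because the map is an isometry. Distances grow quickly as points approach the boundary of the ball, which is why hyperbolic space can embed tree-like graphs with low distortion; practical implementations also keep norms away from 1 for numerical stability.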
The library also provides binary classifiers, clustering algorithms and unsupervised anomaly detection algorithms to use with the autoencoder-based models (HGCAE & PVAE). All of them are scikit-learn models whose hyperparameters are tuned with grid search.
Name | Type |
---|---|
Support Vector Machine (SVM) | Binary Classifier |
Multilayer Perceptron (MLP) | Binary Classifier |
Decision Tree | Binary Classifier |
Random Forest | Binary Classifier |
AdaBoost | Binary Classifier |
K-Nearest Neighbors (KNN) | Binary Classifier |
Naive Bayes | Binary Classifier |
Agglomerative Hierarchical Clustering (AHC) | Clustering Algorithm |
DBSCAN | Clustering Algorithm |
Fuzzy C-Means | Clustering Algorithm |
Gaussian Mixture | Clustering Algorithm |
k-Means | Clustering Algorithm |
Mean Shift | Clustering Algorithm |
Isolation Forest | Anomaly Detection Algorithm |
One-class SVM | Anomaly Detection Algorithm |
Local Outlier Factor | Anomaly Detection Algorithm |
DBSCAN | Anomaly Detection Algorithm |
k-Means | Anomaly Detection Algorithm |
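To illustrate the downstream step, here is a sketch that tunes one of the listed classifiers with scikit-learn's `GridSearchCV` and runs one of the listed anomaly detectors. The random "embeddings" and the small parameter grid are hypothetical placeholders; the grids the library actually searches are not listed here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for learned embeddings: two separated Gaussian blobs in 20 dimensions.
rng = np.random.default_rng(0)
embeddings = np.vstack([rng.normal(-1.0, 0.3, (100, 20)), rng.normal(1.0, 0.3, (100, 20))])
labels = np.array([0] * 100 + [1] * 100)

# Supervised path: exhaustive search over an illustrative grid, scored by F1 with 3-fold CV.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=3)
search.fit(embeddings, labels)
print(search.best_params_, round(search.best_score_, 3))

# Unsupervised path: IsolationForest.predict returns -1 for outliers and 1 for inliers.
detector = IsolationForest(random_state=0).fit(embeddings)
flags = detector.predict(embeddings)
```

Grid search trains one model per parameter combination and fold, so its cost grows multiplicatively with the grid size; keeping grids small matters when repeating the search for every embedding model.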
1.2. Datasets
The following intrusion detection datasets were used to test and evaluate the models. The code includes every pre-processing step needed to convert these datasets from tabular form into graphs. Due to usage restrictions, the library ships only one graph per dataset, with 5,000 nodes, already pre-processed and normalized.
Name | Ref |
---|---|
CIC-DDoS2019 | [7] |
AWID3 | [8] |
CIC-Darknet2020 | [9] |
NF-UNSW-NB15-V2 | [10] |
NF-BoT-IoT-V2 | [11] |
NF-ToN-IoT-V2 | [12] |
NF-CSE-CIC-IDS2018-V2 | [13] |
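The exact tabular-to-graph conversion is not described in this section. One common construction for network-flow data, shown here only as a hypothetical sketch, treats each flow (row) as a node, links flows that share an IP address, and min-max normalizes the feature columns. `flows_to_graph` and its arguments are illustrative names, not part of the library.

```python
import numpy as np

def flows_to_graph(src_ips, dst_ips, features, labels):
    """Hypothetical conversion: one node per flow, an edge when two flows share an IP."""
    n = len(src_ips)
    adj = np.zeros((n, n), dtype=np.float32)
    # Group flow indices by every IP address they touch.
    by_ip = {}
    for i, (src, dst) in enumerate(zip(src_ips, dst_ips)):
        for ip in {src, dst}:
            by_ip.setdefault(ip, []).append(i)
    # Connect all flows in the same group (undirected), then drop self-loops.
    for idx in by_ip.values():
        adj[np.ix_(idx, idx)] = 1.0
    np.fill_diagonal(adj, 0.0)
    # Min-max normalize each feature column to [0, 1]; constant columns become 0.
    x = np.asarray(features, dtype=np.float32)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return adj, (x - lo) / span, np.asarray(labels)

# Three toy flows: the first two share destination 10.0.0.9, the third is isolated.
adj, x, y = flows_to_graph(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    ["10.0.0.9", "10.0.0.9", "10.0.0.4"],
    [[100, 2.0], [300, 4.0], [200, 2.0]],
    [0, 1, 0],
)
```

A dense matrix is fine at the 5,000-node scale used here; much larger graphs would call for a sparse format.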
2. Installation
```bash
git clone https://gitlab.liris.cnrs.fr/gladis/ghypeddings.git
# Rename the folder so it can be imported as `Ghypeddings`
mv ghypeddings Ghypeddings
```
3. Usage
Training and evaluating a model with the library takes only a few lines of code.
3.1. Models
```python
from Ghypeddings import PVAE, Darknet

# adj: adjacency matrix, features: node feature matrix, labels: node labels
# Load a pre-built graph (see 3.2. Datasets)
adj, features, labels = Darknet().load_samples()

model = PVAE(adj=adj,
             features=features,
             labels=labels,
             dim=20,
             hidden_dim=features.shape[1],
             test_prop=.2,
             val_prop=.1,
             epochs=50,
             classifier='random forest')

# Train the model; returns the training scores and the training time
loss, accuracy, f1, recall, precision, roc_auc, training_time = model.fit()

# Prediction scores
loss, accuracy, f1, recall, precision, roc_auc = model.predict()
```
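`fit()` and `predict()` return standard binary-classification scores. The sketch below shows how such scores are conventionally defined; it is not the library's internal code, `binary_scores` is a hypothetical helper, and it assumes label 1 marks the anomalous (attack) class.

```python
import numpy as np

def binary_scores(y_true, y_pred, y_score):
    """Accuracy, F1, recall, precision and ROC AUC for binary labels (1 = anomaly)."""
    y_true, y_pred, y_score = map(np.asarray, (y_true, y_pred, y_score))
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2).
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    roc_auc = np.mean((diff > 0) + 0.5 * (diff == 0))
    return accuracy, f1, recall, precision, roc_auc

scores = binary_scores([0, 0, 1, 1], [0, 1, 1, 1], [0.1, 0.6, 0.7, 0.9])
```

Unlike the other four scores, ROC AUC uses the continuous scores rather than the hard predictions, so it measures ranking quality independently of any decision threshold.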
3.2. Datasets
```python
from Ghypeddings import Darknet

# Build a graph of 5,000 nodes from the Darknet dataset;
# build() returns the graph directly, so no separate loading step is needed
adj, features, labels = Darknet().build(n_nodes=5000)

# Alternatively, load graphs that were built and saved previously:
# this saves time and makes results easier to compare across models
adj, features, labels = Darknet().load_samples()
```
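The idea behind `load_samples()`, reusing a graph built and saved earlier so every model is evaluated on the same sample, can be sketched with NumPy's `.npz` archives. `save_graph` and `load_graph` are hypothetical helpers, and the file format the library actually uses is not documented here.

```python
import os
import tempfile

import numpy as np

def save_graph(path, adj, features, labels):
    """Persist one graph (adjacency, features, labels) as a compressed .npz archive."""
    np.savez_compressed(path, adj=adj, features=features, labels=labels)

def load_graph(path):
    """Reload a graph saved by save_graph, returning (adj, features, labels)."""
    with np.load(path) as data:
        return data["adj"], data["features"], data["labels"]

# Build a tiny random undirected graph once, save it, then reload the identical sample.
rng = np.random.default_rng(0)
adj = (rng.random((5, 5)) > 0.5).astype(np.float32)
adj = np.maximum(adj, adj.T)
features = rng.random((5, 3)).astype(np.float32)
labels = rng.integers(0, 2, size=5)

path = os.path.join(tempfile.mkdtemp(), "darknet_sample.npz")
save_graph(path, adj, features, labels)
adj2, features2, labels2 = load_graph(path)
```

Reloading a fixed sample removes sampling noise from model comparisons, since every model sees exactly the same nodes, edges and labels.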
4. Citation
Mohamed Yacine Touahria Miliani, Souhail Abdelmouaiz Sadat, Mohammed Haddad, Hamida Seba, and Karima Amrouche. 2024. Comparing Hyperbolic Graph Embedding models on Anomaly Detection for Cybersecurity. In Proceedings of the 19th International Conference on Availability, Reliability and Security (ARES '24). Association for Computing Machinery, New York, NY, USA, Article 118, 1–11. https://doi.org/10.1145/3664476.3670445
5. References
[1]: Nickel, Maximillian, and Douwe Kiela. "Poincaré embeddings for learning hierarchical representations." Advances in neural information processing systems 30 (2017).
[2]: Liu, Qi, Maximilian Nickel, and Douwe Kiela. "Hyperbolic graph neural networks." Advances in neural information processing systems 32 (2019).
[3]: Chami, Ines, et al. "Hyperbolic graph convolutional neural networks." Advances in neural information processing systems 32 (2019).
[4]: Mathieu, Emile, et al. "Continuous hierarchical representations with poincaré variational auto-encoders." Advances in neural information processing systems 32 (2019).
[5]: Dai, Jindou, et al. "A hyperbolic-to-hyperbolic graph convolutional network." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
[6]: Park, Jiwoong, et al. "Unsupervised hyperbolic representation learning via message passing auto-encoders." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[7]: CIC-DDoS2019
[8]: AWID3
[9]: CIC-Darknet2020
[10]: NF-UNSW-NB15-V2
[11]: NF-BoT-IoT-V2
[12]: NF-ToN-IoT-V2
[13]: NF-CSE-CIC-IDS2018-V2