Skip to content

Database Schema: Cyber Intelligence

This document provides a detailed overview of the MySQL 8.0 relational database used in the Cyber Sentinel project. The database, named cyber_intelligence, manages network observables, threat intelligence reports, and AI-generated verdicts.

Schema version 3.0

Breaking changes versus v2.0:

  • Threat scale reduced from 1–10 to 1–5 with an explicit is_malicious_flag column (replacing the implicit score > 5 rule).
  • action_recommended column on dic_threat_levels makes the response policy first-class data instead of business logic in n8n.
  • Composite primary keys on dns_queries, network_events, and threat_indicators — required by MySQL partitioning rules.
  • Foreign keys removed between partitioned tables — partitioned tables cannot be FK targets nor sources. Integrity is enforced at the application layer in the n8n workflow.
  • Monthly partitioning + 6-month retention added — see the dedicated partitioning section.

Related pages: Architecture · Deployment · Database Init (04.3) · n8n Workflow · Components · Vault & Secrets (06)


1. Entity Relationship Diagram

The diagram below reflects the current relationships. Note that links between partitioned tables (dns_queries, network_events, threat_indicators) and their dependants are logical only — they are enforced at the application layer in n8n, not by MySQL FKs.

erDiagram
    dic_indicator_types ||--o{ threat_indicators : "defines type (logical)"
    dic_threat_levels   ||--o{ ai_analysis_results : "categorizes (FK)"
    ai_analysis_results ||--o{ threat_indicators : "provides verdict (logical)"
    dns_queries         ||--o{ threat_indicators : "triggers analysis (logical)"
    threat_indicators   ||--o{ threat_indicator_details : "has metadata (logical)"
    dic_source_providers||--o{ threat_indicator_details : "is source of (FK)"
    network_events      ||--o| dns_queries : "logs access (logical)"
    network_events      ||--o| threat_indicators : "links to verdict (logical)"
Relationship type Enforcement
(FK) Real MySQL foreign key constraint
(logical) Application-layer integrity (validated in n8n workflow) — not enforceable as FK because at least one of the tables is partitioned

2. Core Tables

The tables below form the backbone of the data model. Three of them (dns_queries, network_events, threat_indicators) are partitioned by date — see section 7 for the full rationale and DDL.

2.1 threat_indicators

The primary registry for all analyzed observables. Bridges a captured DNS query with its AI verdict and tracks how many times the same (dns_query, analysis_result) pair has been re-scanned.

Column Type Constraint Description
id INT AUTO, part of composite PK Internal autoincrement
dns_query_id INT NOT NULL, INDEXED Logical link to dns_queries.id (no FK — partitioned)
type_id INT NOT NULL, INDEXED Logical link to dic_indicator_types.id
analysis_result_id INT NOT NULL, INDEXED Logical link to ai_analysis_results.id
last_scan DATETIME NOT NULL DEFAULT NOW, part of composite PK Most recent enrichment time — also the partitioning key
scan_count INT DEFAULT 1 Number of times this (dns_query, verdict) pair was re-scanned
created_at TIMESTAMP DEFAULT NOW Row creation time

Composite primary key: (id, last_scan) Unique key: uk_dns_analysis_scan (dns_query_id, analysis_result_id, last_scan) — the inclusion of last_scan allows the same observable to be scanned multiple times across days.

Why no FK?

Partitioned tables in MySQL cannot be referenced by foreign keys. The relationships to dns_queries, dic_indicator_types, and ai_analysis_results are validated by the n8n workflow before insert.


2.2 ai_analysis_results

Stores the cognitive synthesis generated by the AI agent (Google Gemini). This table is not partitioned — it is small and serves as a foreign key target for dic_threat_levels.

Column Type Constraint Description
id INT PK, AUTO Verdict identifier
threat_score INT FKdic_threat_levels.score, NOT NULL Risk score on the 1–5 scale
threat_label VARCHAR(50) Short classification label
verdict_summary_en TEXT Technical classification (EN)
analysis_pl TEXT Human-readable explanation (PL)
confidence_score DECIMAL(5,2) NULLABLE Model confidence (0.00–100.00)
analyzed_at TIMESTAMP DEFAULT NOW When the verdict was produced

The FK on threat_score uses ON DELETE RESTRICT ON UPDATE CASCADE so that the threat scale cannot be silently broken.


2.3 threat_indicator_details

Maps relational records to raw JSON stored in MongoDB, one row per (indicator, provider) pair.

Column Type Constraint Description
id INT PK, AUTO Identifier
indicator_id INT NOT NULL, INDEXED Logical link to threat_indicators.id (no FK — partitioned)
source_id INT FKdic_source_providers.id Which CTI provider produced this row
mongo_ref_id VARCHAR(50) INDEXED ObjectID of the raw response in MongoDB
raw_response_hash VARCHAR(64) SHA-256 of the raw payload (de-dup)
fetched_at TIMESTAMP DEFAULT NOW Fetch time

Unique key: uk_indicator_source (indicator_id, source_id) — one row per (indicator, provider).


2.4 dns_queries

Stores raw DNS requests captured by the monitoring layer (Pi-holedns_log_processor.py). Partitioned monthly by timestamp.

Column Type Constraint Description
id INT AUTO, part of composite PK Internal identifier
timestamp DATETIME NOT NULL, INDEXED, part of composite PK Time of the DNS request — partitioning key
query_type VARCHAR(10) NOT NULL DNS query type (A, AAAA, …)
record_type VARCHAR(10) Resolved record type
domain VARCHAR(255) NOT NULL, INDEXED Requested FQDN
source_ip VARCHAR(45) NOT NULL, INDEXED Client IP
response_ip VARCHAR(45) INDEXED Resolved IP
created_at TIMESTAMP DEFAULT NOW Row creation time

Composite primary key: (id, timestamp) Composite index: idx_domain_timestamp (domain, timestamp) for time-bounded domain lookups.


2.5 network_events

Universal table for network events from IDS/IPS (Suricata), Scapy, and packet sniffers. Partitioned monthly by timestamp.

Column Type Constraint Description
id INT AUTO, part of composite PK Identifier
timestamp DATETIME NOT NULL DEFAULT NOW, part of composite PK Event time — partitioning key
dns_query_id INT NULL, INDEXED Logical link to dns_queries.id
threat_indicator_id INT NULL, INDEXED Logical link to threat_indicators.id
source_ip VARCHAR(45) NOT NULL, INDEXED Client IP
dest_ip VARCHAR(45) NOT NULL, INDEXED Destination IP
dest_port INT Destination port
protocol VARCHAR(10) TCP / UDP / ICMP
application_protocol VARCHAR(20) HTTP / TLS / DNS / FTP
request_url TEXT Full URL or URI path
user_agent TEXT Client User-Agent
http_method VARCHAR(10) GET / POST / PUT / …
event_type VARCHAR(50) INDEXED alert, flow, http_request, scapy_intercept
severity INT INDEXED Normalized severity (1–5)
signature_name VARCHAR(255) Suricata rule name
action_taken VARCHAR(20) allowed, blocked, logged
created_at TIMESTAMP DEFAULT NOW Row creation time

Composite primary key: (id, timestamp)

Logical FKs to partitioned parents

Both dns_query_id and threat_indicator_id reference partitioned tables, so they are not declared as FKs. The n8n workflow is responsible for maintaining referential integrity.


3. Dictionary Tables

Controlled vocabularies used across the system. All dictionary tables are pre-populated by db_deployment.sql using INSERT IGNORE.

3.1 dic_indicator_types

Type of observable being analyzed.

id name
1 FQDN
2 IP
3 HASH

3.2 dic_source_providers

CTI (Cyber Threat Intelligence) providers used for enrichment. Includes is_active to support runtime toggling.

id name is_active Provider docs
1 VirusTotal TRUE API v3
2 Abuse_ThreatFox TRUE ThreatFox
3 Abuse_URLhaus TRUE URLHaus
4 urlscan.io TRUE urlscan.io API

API tokens for these providers are stored in HashiCorp Vault under cyber-sentinel/api-keys/.


3.3 dic_threat_levels

Defines the 1–5 scoring policy introduced in schema v3.0. Each row carries is_malicious_flag (replacing the implicit > 5 threshold from v2.0) and an explicit action_recommended so that response logic lives in the database, not in n8n code.

Score Description is_malicious_flag action_recommended
1 Clean / Trusted infrastructure ✅ FALSE Allow
2 Low Risk / Monitor ✅ FALSE Monitor
3 Suspicious — manual review needed ✅ FALSE Review
4 Malicious — confirmed threat ⚠️ TRUE Block
5 Critical — active threat, immediate alert ⚠️ TRUE Block + Alert

Why an explicit flag?

The previous design relied on a hard-coded threshold (threat_score > 5) duplicated across n8n nodes and SQL views. With is_malicious_flag the malicious / non-malicious decision is data, not code — adjusting the policy means a single UPDATE on dic_threat_levels, no n8n workflow redeploy. The same flag drives every Grafana view in section 5.

The AI agent reads this table dynamically at every invocation through v_threat_scale_for_agent (see section 5.2).


4. Analytical Views

SQL views simplify orchestration (n8n) and visualization (Grafana). All views are prefixed with v_.

4.1 v_pending_analysis

Queue of new DNS observables that have not yet been enriched by CTI sources.

  • Source tables: dns_queries, threat_indicators
  • Filter: records with no matching threat_indicators entry and a valid IPv4 response_ip (validated by regex)
  • Consumed by: n8n threat enrichment workflow — main scheduling input

4.2 v_latest_threat_reports

Returns the most recent scan result per DNS query, joining verdict data from ai_analysis_results. Uses a self-join on MAX(last_scan) to pick the latest scan.

  • Source tables: threat_indicators, ai_analysis_results
  • Key fields: threat_score, threat_label, verdict_summary_en, analysis_pl, confidence_score, scan_count

5. Grafana Views

Grafana panels query these views via the MySQL data source. All Grafana views now use is_malicious_flag instead of the legacy threat_score > 5 rule.

5.1 v_grafana_malicious_stats

Global statistics on total scans vs. malicious detections.

  • Based on: v_latest_threat_reports joined with dic_threat_levels
  • Key fields: total_scans, total_malicious_scans, malicious_percentage, last_scan, first_scan

Time-series aggregation of daily scan volume and threat activity.

  • Based on: v_latest_threat_reports
  • Key fields: scan_date, time_sec (UNIX), total_scans_per_day, total_positives_per_day, total_clean_per_day, max_threat_score_that_day, avg_threat_score

5.3 v_grafana_dns_hourly_traffic

Aggregates DNS query volume per hour for traffic intensity charts.

  • Source table: dns_queries
  • Key fields: hour_group, time_sec (UNIX), total_queries, unique_domains, unique_sources, latest_query

5.4 v_grafana_threat_explorer

Full forensic view combining network_events, dns_queries, AI verdicts, and CTI provider names. Filtered to malicious events only via tl.is_malicious_flag = TRUE.

  • Source tables: network_events, dns_queries, threat_indicators, ai_analysis_results, dic_threat_levels, threat_indicator_details, dic_source_providers
  • Key fields: timestamp, fqdn, source_ip, dest_ip, request_url, event_type, providers, threat_label, action_recommended, threat_score, action_taken

5.5 v_grafana_threat_alerts

New in v3.0. Alert-ready view of all malicious indicators, joining the verdict, the original DNS query, and the threat-level dictionary. Designed for Grafana alerting / n8n notification flows.

  • Source tables: threat_indicators, dns_queries, ai_analysis_results, dic_threat_levels
  • Filter: tl.is_malicious_flag = TRUE
  • Key fields: indicator_id, detection_time, fqdn, record_type, source_ip, observable_ip, threat_score, threat_label, is_malicious_flag, action_recommended, verdict_en, analysis_pl, confidence_score, scan_count, last_scan

5.6 v_threat_scale_for_agent

New in v3.0. Helper view exposing the threat scale to the AI agent at every invocation, so prompt context always reflects the current policy in dic_threat_levels without redeploying n8n.

  • Source table: dic_threat_levels
  • Key fields: score, description, is_malicious_flag, action_recommended, formatted_line (pre-formatted string for prompt injection)
-- Excerpt: pre-formatted line for direct prompt embedding
CONCAT(score, ' | ', description, ' (Action: ', action_recommended, ')') AS formatted_line

6. User Management

The deployment script provisions a non-privileged application user with the minimum grants required to operate the schema. Credentials come from Ansible Vault:

CREATE USER IF NOT EXISTS '{{ mysql_user }}'@'%' IDENTIFIED BY '{{ vault_mysql_password }}';
GRANT SELECT, INSERT, UPDATE, DELETE ON cyber_intelligence.* TO '{{ mysql_user }}'@'%';
GRANT CREATE VIEW ON cyber_intelligence.* TO '{{ mysql_user }}'@'%';
FLUSH PRIVILEGES;
Variable Source Vault path
mysql_user group_vars — (used for templating only)
vault_mysql_password Vault cyber-sentinel/credentials/mysql/app_manager
vault_mysql_root_password Vault cyber-sentinel/credentials/mysql/root (used by Ansible playbook 04.3 to run the script as root)

The application user gets CREATE VIEW so that operators can iterate on Grafana views without escalating to root. No DROP, ALTER, or GRANT is granted to it.


7. Partitioning & Retention

Three high-volume tables — dns_queries, network_events, and threat_indicators — use MySQL RANGE partitioning on a DATETIME column, with monthly partitions and a 6-month retention policy enforced by stored procedures and the Event Scheduler.

The DDL and procedures live in db_partitioning_retention.sql and are applied by Ansible playbook 04.x — DB partitioning.

7.1 Partition layout

Table Partition column Initial partitions
dns_queries timestamp p202604, p202605, p202606, p202607, p_future
network_events timestamp same
threat_indicators last_scan same

Each pYYYYMM partition holds rows where the partition column is < first day of YYYY-MM+1. The catch-all p_future partition is reorganised every month to add a new explicit partition.

ALTER TABLE dns_queries
PARTITION BY RANGE (TO_DAYS(timestamp)) (
    PARTITION p202604 VALUES LESS THAN (TO_DAYS('2026-05-01')),
    PARTITION p202605 VALUES LESS THAN (TO_DAYS('2026-06-01')),
    PARTITION p202606 VALUES LESS THAN (TO_DAYS('2026-07-01')),
    PARTITION p202607 VALUES LESS THAN (TO_DAYS('2026-08-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

Composite PK is mandatory

Every unique key on a partitioned table must include the partitioning column (MySQL ER 1503). That's why all three partitioned tables have composite primary keys(id, timestamp) for dns_queries and network_events, (id, last_scan) for threat_indicators.

7.2 Stored procedures

Procedure Purpose Key behaviour
sp_drop_old_partitions() Drops every partition older than 6 months Iterates INFORMATION_SCHEMA.PARTITIONS, uses a cursor, logs every action / failure to partition_maintenance_log, never aborts on error
sp_add_future_partitions() Adds the next 3 monthly partitions if missing Loops months 1..3 ahead, uses REORGANIZE PARTITION p_future INTO (…, p_future MAXVALUE), skips months that already exist

v3.0 fix vs v1

The v1 implementation guessed a single partition name and only handled one partition per run. The v3 version iterates INFORMATION_SCHEMA.PARTITIONS so a long-paused or backfilled deployment cleans up correctly in a single invocation. Failed DROP / REORGANIZE operations are logged but do not abort the cursor.

7.3 Maintenance log

CREATE TABLE partition_maintenance_log (
    id              INT AUTO_INCREMENT PRIMARY KEY,
    table_name      VARCHAR(100) NOT NULL,
    partition_name  VARCHAR(50)  NOT NULL,
    action          VARCHAR(20)  NOT NULL,   -- ADD / DROP / ADD_FAILED / DROP_FAILED
    error_message   TEXT,
    executed_at     TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Every ADD and DROP performed by the procedures appends a row here. The *_FAILED actions capture SQLEXCEPTION events without rolling back the surrounding cursor. Use this table as the operational source of truth.

7.4 Scheduled events

The MySQL Event Scheduler is enabled with SET GLOBAL event_scheduler = ON and runs two monthly jobs:

Event Schedule Action
evt_drop_old_partitions 1st of each month, 02:00 CALL sp_drop_old_partitions()
evt_add_future_partitions 1st of each month, 03:00 CALL sp_add_future_partitions()

The 1-hour gap between DROP and ADD is intentional — it prevents the two heavy DDL operations from contending on the same partitioned tables. Verify with:

SELECT @@global.event_scheduler;     -- should return ON
SHOW EVENTS FROM cyber_intelligence;

7.5 Monitoring view

v_partition_info exposes per-partition row counts and on-disk sizes (in MB) for all three partitioned tables, so you don't need direct access to INFORMATION_SCHEMA:

Column Description
TABLE_NAME Partitioned table
PARTITION_NAME pYYYYMM or p_future
PARTITION_METHOD Always RANGE here
PARTITION_EXPRESSION TO_DAYS(timestamp) / TO_DAYS(last_scan)
PARTITION_DESCRIPTION Upper bound (TO_DAYS(...) or MAXVALUE)
TABLE_ROWS Approximate row count
data_mb, index_mb, total_mb Storage footprint
CREATE_TIME, UPDATE_TIME Lifecycle timestamps
-- Quick health check
SELECT * FROM v_partition_info;
SELECT * FROM partition_maintenance_log ORDER BY executed_at DESC LIMIT 20;

Manual maintenance

Both procedures can be triggered manually for testing: CALL sp_add_future_partitions(); / CALL sp_drop_old_partitions();. Failures show up in partition_maintenance_log with action = 'ADD_FAILED' or 'DROP_FAILED'.