Database Schema: Cyber Intelligence
This document provides a detailed overview of the MySQL 8.0 relational database used in the Cyber Sentinel project. The database, named cyber_intelligence, manages network observables, threat intelligence reports, and AI-generated verdicts.
Schema version 3.0
Breaking changes versus v2.0:
- Threat scale reduced from 1–10 to 1–5 with an explicit
is_malicious_flagcolumn (replacing the implicitscore > 5rule). action_recommendedcolumn ondic_threat_levelsmakes the response policy first-class data instead of business logic in n8n.- Composite primary keys on
dns_queries,network_events, andthreat_indicators— required by MySQL partitioning rules. - Foreign keys removed between partitioned tables — partitioned tables cannot be FK targets nor sources. Integrity is enforced at the application layer in the n8n workflow.
- Monthly partitioning + 6-month retention added — see the dedicated partitioning section.
Related pages: Architecture · Deployment · Database Init (04.3) · n8n Workflow · Components · Vault & Secrets (06)
1. Entity Relationship Diagram
The diagram below reflects the current relationships. Note that links between partitioned tables (dns_queries, network_events, threat_indicators) and their dependants are logical only — they are enforced at the application layer in n8n, not by MySQL FKs.
erDiagram
dic_indicator_types ||--o{ threat_indicators : "defines type (logical)"
dic_threat_levels ||--o{ ai_analysis_results : "categorizes (FK)"
ai_analysis_results ||--o{ threat_indicators : "provides verdict (logical)"
dns_queries ||--o{ threat_indicators : "triggers analysis (logical)"
threat_indicators ||--o{ threat_indicator_details : "has metadata (logical)"
dic_source_providers||--o{ threat_indicator_details : "is source of (FK)"
network_events ||--o| dns_queries : "logs access (logical)"
network_events ||--o| threat_indicators : "links to verdict (logical)" | Relationship type | Enforcement |
|---|---|
(FK) | Real MySQL foreign key constraint |
(logical) | Application-layer integrity (validated in n8n workflow) — not enforceable as FK because at least one of the tables is partitioned |
2. Core Tables
The tables below form the backbone of the data model. Three of them (dns_queries, network_events, threat_indicators) are partitioned by date — see section 7 for the full rationale and DDL.
2.1 threat_indicators
The primary registry for all analyzed observables. Bridges a captured DNS query with its AI verdict and tracks how many times the same (dns_query, analysis_result) pair has been re-scanned.
| Column | Type | Constraint | Description |
|---|---|---|---|
id | INT | AUTO, part of composite PK | Internal autoincrement |
dns_query_id | INT | NOT NULL, INDEXED | Logical link to dns_queries.id (no FK — partitioned) |
type_id | INT | NOT NULL, INDEXED | Logical link to dic_indicator_types.id |
analysis_result_id | INT | NOT NULL, INDEXED | Logical link to ai_analysis_results.id |
last_scan | DATETIME | NOT NULL DEFAULT NOW, part of composite PK | Most recent enrichment time — also the partitioning key |
scan_count | INT | DEFAULT 1 | Number of times this (dns_query, verdict) pair was re-scanned |
created_at | TIMESTAMP | DEFAULT NOW | Row creation time |
Composite primary key: (id, last_scan) Unique key: uk_dns_analysis_scan (dns_query_id, analysis_result_id, last_scan) — the inclusion of last_scan allows the same observable to be scanned multiple times across days.
Why no FK?
Partitioned tables in MySQL cannot be referenced by foreign keys. The relationships to dns_queries, dic_indicator_types, and ai_analysis_results are validated by the n8n workflow before insert.
2.2 ai_analysis_results
Stores the cognitive synthesis generated by the AI agent (Google Gemini). This table is not partitioned — it is small and serves as a foreign key target for dic_threat_levels.
| Column | Type | Constraint | Description |
|---|---|---|---|
id | INT | PK, AUTO | Verdict identifier |
threat_score | INT | FK → dic_threat_levels.score, NOT NULL | Risk score on the 1–5 scale |
threat_label | VARCHAR(50) | — | Short classification label |
verdict_summary_en | TEXT | — | Technical classification (EN) |
analysis_pl | TEXT | — | Human-readable explanation (PL) |
confidence_score | DECIMAL(5,2) | NULLABLE | Model confidence (0.00–100.00) |
analyzed_at | TIMESTAMP | DEFAULT NOW | When the verdict was produced |
The FK on threat_score uses ON DELETE RESTRICT ON UPDATE CASCADE so that the threat scale cannot be silently broken.
2.3 threat_indicator_details
Maps relational records to raw JSON stored in MongoDB, one row per (indicator, provider) pair.
| Column | Type | Constraint | Description |
|---|---|---|---|
id | INT | PK, AUTO | Identifier |
indicator_id | INT | NOT NULL, INDEXED | Logical link to threat_indicators.id (no FK — partitioned) |
source_id | INT | FK → dic_source_providers.id | Which CTI provider produced this row |
mongo_ref_id | VARCHAR(50) | INDEXED | ObjectID of the raw response in MongoDB |
raw_response_hash | VARCHAR(64) | — | SHA-256 of the raw payload (de-dup) |
fetched_at | TIMESTAMP | DEFAULT NOW | Fetch time |
Unique key: uk_indicator_source (indicator_id, source_id) — one row per (indicator, provider).
2.4 dns_queries
Stores raw DNS requests captured by the monitoring layer (Pi-hole → dns_log_processor.py). Partitioned monthly by timestamp.
| Column | Type | Constraint | Description |
|---|---|---|---|
id | INT | AUTO, part of composite PK | Internal identifier |
timestamp | DATETIME | NOT NULL, INDEXED, part of composite PK | Time of the DNS request — partitioning key |
query_type | VARCHAR(10) | NOT NULL | DNS query type (A, AAAA, …) |
record_type | VARCHAR(10) | — | Resolved record type |
domain | VARCHAR(255) | NOT NULL, INDEXED | Requested FQDN |
source_ip | VARCHAR(45) | NOT NULL, INDEXED | Client IP |
response_ip | VARCHAR(45) | INDEXED | Resolved IP |
created_at | TIMESTAMP | DEFAULT NOW | Row creation time |
Composite primary key: (id, timestamp) Composite index: idx_domain_timestamp (domain, timestamp) for time-bounded domain lookups.
2.5 network_events
Universal table for network events from IDS/IPS (Suricata), Scapy, and packet sniffers. Partitioned monthly by timestamp.
| Column | Type | Constraint | Description |
|---|---|---|---|
id | INT | AUTO, part of composite PK | Identifier |
timestamp | DATETIME | NOT NULL DEFAULT NOW, part of composite PK | Event time — partitioning key |
dns_query_id | INT | NULL, INDEXED | Logical link to dns_queries.id |
threat_indicator_id | INT | NULL, INDEXED | Logical link to threat_indicators.id |
source_ip | VARCHAR(45) | NOT NULL, INDEXED | Client IP |
dest_ip | VARCHAR(45) | NOT NULL, INDEXED | Destination IP |
dest_port | INT | — | Destination port |
protocol | VARCHAR(10) | — | TCP / UDP / ICMP |
application_protocol | VARCHAR(20) | — | HTTP / TLS / DNS / FTP |
request_url | TEXT | — | Full URL or URI path |
user_agent | TEXT | — | Client User-Agent |
http_method | VARCHAR(10) | — | GET / POST / PUT / … |
event_type | VARCHAR(50) | INDEXED | alert, flow, http_request, scapy_intercept |
severity | INT | INDEXED | Normalized severity (1–5) |
signature_name | VARCHAR(255) | — | Suricata rule name |
action_taken | VARCHAR(20) | — | allowed, blocked, logged |
created_at | TIMESTAMP | DEFAULT NOW | Row creation time |
Composite primary key: (id, timestamp)
Logical FKs to partitioned parents
Both dns_query_id and threat_indicator_id reference partitioned tables, so they are not declared as FKs. The n8n workflow is responsible for maintaining referential integrity.
3. Dictionary Tables
Controlled vocabularies used across the system. All dictionary tables are pre-populated by db_deployment.sql using INSERT IGNORE.
3.1 dic_indicator_types
Type of observable being analyzed.
id | name |
|---|---|
| 1 | FQDN |
| 2 | IP |
| 3 | HASH |
3.2 dic_source_providers
CTI (Cyber Threat Intelligence) providers used for enrichment. Includes is_active to support runtime toggling.
id | name | is_active | Provider docs |
|---|---|---|---|
| 1 | VirusTotal | TRUE | API v3 |
| 2 | Abuse_ThreatFox | TRUE | ThreatFox |
| 3 | Abuse_URLhaus | TRUE | URLHaus |
| 4 | urlscan.io | TRUE | urlscan.io API |
API tokens for these providers are stored in HashiCorp Vault under cyber-sentinel/api-keys/.
3.3 dic_threat_levels
Defines the 1–5 scoring policy introduced in schema v3.0. Each row carries is_malicious_flag (replacing the implicit > 5 threshold from v2.0) and an explicit action_recommended so that response logic lives in the database, not in n8n code.
| Score | Description | is_malicious_flag | action_recommended |
|---|---|---|---|
| 1 | Clean / Trusted infrastructure | ✅ FALSE | Allow |
| 2 | Low Risk / Monitor | ✅ FALSE | Monitor |
| 3 | Suspicious — manual review needed | ✅ FALSE | Review |
| 4 | Malicious — confirmed threat | ⚠️ TRUE | Block |
| 5 | Critical — active threat, immediate alert | ⚠️ TRUE | Block + Alert |
Why an explicit flag?
The previous design relied on a hard-coded threshold (threat_score > 5) duplicated across n8n nodes and SQL views. With is_malicious_flag the malicious / non-malicious decision is data, not code — adjusting the policy means a single UPDATE on dic_threat_levels, no n8n workflow redeploy. The same flag drives every Grafana view in section 5.
The AI agent reads this table dynamically at every invocation through v_threat_scale_for_agent (see section 5.2).
4. Analytical Views
SQL views simplify orchestration (n8n) and visualization (Grafana). All views are prefixed with v_.
4.1 v_pending_analysis
Queue of new DNS observables that have not yet been enriched by CTI sources.
- Source tables:
dns_queries,threat_indicators - Filter: records with no matching
threat_indicatorsentry and a valid IPv4response_ip(validated by regex) - Consumed by: n8n threat enrichment workflow — main scheduling input
4.2 v_latest_threat_reports
Returns the most recent scan result per DNS query, joining verdict data from ai_analysis_results. Uses a self-join on MAX(last_scan) to pick the latest scan.
- Source tables:
threat_indicators,ai_analysis_results - Key fields:
threat_score,threat_label,verdict_summary_en,analysis_pl,confidence_score,scan_count
5. Grafana Views
Grafana panels query these views via the MySQL data source. All Grafana views now use is_malicious_flag instead of the legacy threat_score > 5 rule.
5.1 v_grafana_malicious_stats
Global statistics on total scans vs. malicious detections.
- Based on:
v_latest_threat_reportsjoined withdic_threat_levels - Key fields:
total_scans,total_malicious_scans,malicious_percentage,last_scan,first_scan
5.2 v_grafana_daily_trends
Time-series aggregation of daily scan volume and threat activity.
- Based on:
v_latest_threat_reports - Key fields:
scan_date,time_sec(UNIX),total_scans_per_day,total_positives_per_day,total_clean_per_day,max_threat_score_that_day,avg_threat_score
5.3 v_grafana_dns_hourly_traffic
Aggregates DNS query volume per hour for traffic intensity charts.
- Source table:
dns_queries - Key fields:
hour_group,time_sec(UNIX),total_queries,unique_domains,unique_sources,latest_query
5.4 v_grafana_threat_explorer
Full forensic view combining network_events, dns_queries, AI verdicts, and CTI provider names. Filtered to malicious events only via tl.is_malicious_flag = TRUE.
- Source tables:
network_events,dns_queries,threat_indicators,ai_analysis_results,dic_threat_levels,threat_indicator_details,dic_source_providers - Key fields:
timestamp,fqdn,source_ip,dest_ip,request_url,event_type,providers,threat_label,action_recommended,threat_score,action_taken
5.5 v_grafana_threat_alerts
New in v3.0. Alert-ready view of all malicious indicators, joining the verdict, the original DNS query, and the threat-level dictionary. Designed for Grafana alerting / n8n notification flows.
- Source tables:
threat_indicators,dns_queries,ai_analysis_results,dic_threat_levels - Filter:
tl.is_malicious_flag = TRUE - Key fields:
indicator_id,detection_time,fqdn,record_type,source_ip,observable_ip,threat_score,threat_label,is_malicious_flag,action_recommended,verdict_en,analysis_pl,confidence_score,scan_count,last_scan
5.6 v_threat_scale_for_agent
New in v3.0. Helper view exposing the threat scale to the AI agent at every invocation, so prompt context always reflects the current policy in dic_threat_levels without redeploying n8n.
- Source table:
dic_threat_levels - Key fields:
score,description,is_malicious_flag,action_recommended,formatted_line(pre-formatted string for prompt injection)
-- Excerpt: pre-formatted line for direct prompt embedding
CONCAT(score, ' | ', description, ' (Action: ', action_recommended, ')') AS formatted_line
6. User Management
The deployment script provisions a non-privileged application user with the minimum grants required to operate the schema. Credentials come from Ansible Vault:
CREATE USER IF NOT EXISTS '{{ mysql_user }}'@'%' IDENTIFIED BY '{{ vault_mysql_password }}';
GRANT SELECT, INSERT, UPDATE, DELETE ON cyber_intelligence.* TO '{{ mysql_user }}'@'%';
GRANT CREATE VIEW ON cyber_intelligence.* TO '{{ mysql_user }}'@'%';
FLUSH PRIVILEGES;
| Variable | Source | Vault path |
|---|---|---|
mysql_user | group_vars | — (used for templating only) |
vault_mysql_password | Vault | cyber-sentinel/credentials/mysql/app_manager |
vault_mysql_root_password | Vault | cyber-sentinel/credentials/mysql/root (used by Ansible playbook 04.3 to run the script as root) |
The application user gets CREATE VIEW so that operators can iterate on Grafana views without escalating to root. No DROP, ALTER, or GRANT is granted to it.
7. Partitioning & Retention
Three high-volume tables — dns_queries, network_events, and threat_indicators — use MySQL RANGE partitioning on a DATETIME column, with monthly partitions and a 6-month retention policy enforced by stored procedures and the Event Scheduler.
The DDL and procedures live in db_partitioning_retention.sql and are applied by Ansible playbook 04.x — DB partitioning.
7.1 Partition layout
| Table | Partition column | Initial partitions |
|---|---|---|
dns_queries | timestamp | p202604, p202605, p202606, p202607, p_future |
network_events | timestamp | same |
threat_indicators | last_scan | same |
Each pYYYYMM partition holds rows where the partition column is < first day of YYYY-MM+1. The catch-all p_future partition is reorganised every month to add a new explicit partition.
ALTER TABLE dns_queries
PARTITION BY RANGE (TO_DAYS(timestamp)) (
PARTITION p202604 VALUES LESS THAN (TO_DAYS('2026-05-01')),
PARTITION p202605 VALUES LESS THAN (TO_DAYS('2026-06-01')),
PARTITION p202606 VALUES LESS THAN (TO_DAYS('2026-07-01')),
PARTITION p202607 VALUES LESS THAN (TO_DAYS('2026-08-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Composite PK is mandatory
Every unique key on a partitioned table must include the partitioning column (MySQL ER 1503). That's why all three partitioned tables have composite primary keys — (id, timestamp) for dns_queries and network_events, (id, last_scan) for threat_indicators.
7.2 Stored procedures
| Procedure | Purpose | Key behaviour |
|---|---|---|
sp_drop_old_partitions() | Drops every partition older than 6 months | Iterates INFORMATION_SCHEMA.PARTITIONS, uses a cursor, logs every action / failure to partition_maintenance_log, never aborts on error |
sp_add_future_partitions() | Adds the next 3 monthly partitions if missing | Loops months 1..3 ahead, uses REORGANIZE PARTITION p_future INTO (…, p_future MAXVALUE), skips months that already exist |
v3.0 fix vs v1
The v1 implementation guessed a single partition name and only handled one partition per run. The v3 version iterates INFORMATION_SCHEMA.PARTITIONS so a long-paused or backfilled deployment cleans up correctly in a single invocation. Failed DROP / REORGANIZE operations are logged but do not abort the cursor.
7.3 Maintenance log
CREATE TABLE partition_maintenance_log (
id INT AUTO_INCREMENT PRIMARY KEY,
table_name VARCHAR(100) NOT NULL,
partition_name VARCHAR(50) NOT NULL,
action VARCHAR(20) NOT NULL, -- ADD / DROP / ADD_FAILED / DROP_FAILED
error_message TEXT,
executed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Every ADD and DROP performed by the procedures appends a row here. The *_FAILED actions capture SQLEXCEPTION events without rolling back the surrounding cursor. Use this table as the operational source of truth.
7.4 Scheduled events
The MySQL Event Scheduler is enabled with SET GLOBAL event_scheduler = ON and runs two monthly jobs:
| Event | Schedule | Action |
|---|---|---|
evt_drop_old_partitions | 1st of each month, 02:00 | CALL sp_drop_old_partitions() |
evt_add_future_partitions | 1st of each month, 03:00 | CALL sp_add_future_partitions() |
The 1-hour gap between DROP and ADD is intentional — it prevents the two heavy DDL operations from contending on the same partitioned tables. Verify with:
7.5 Monitoring view
v_partition_info exposes per-partition row counts and on-disk sizes (in MB) for all three partitioned tables, so you don't need direct access to INFORMATION_SCHEMA:
| Column | Description |
|---|---|
TABLE_NAME | Partitioned table |
PARTITION_NAME | pYYYYMM or p_future |
PARTITION_METHOD | Always RANGE here |
PARTITION_EXPRESSION | TO_DAYS(timestamp) / TO_DAYS(last_scan) |
PARTITION_DESCRIPTION | Upper bound (TO_DAYS(...) or MAXVALUE) |
TABLE_ROWS | Approximate row count |
data_mb, index_mb, total_mb | Storage footprint |
CREATE_TIME, UPDATE_TIME | Lifecycle timestamps |
-- Quick health check
SELECT * FROM v_partition_info;
SELECT * FROM partition_maintenance_log ORDER BY executed_at DESC LIMIT 20;
Manual maintenance
Both procedures can be triggered manually for testing: CALL sp_add_future_partitions(); / CALL sp_drop_old_partitions();. Failures show up in partition_maintenance_log with action = 'ADD_FAILED' or 'DROP_FAILED'.