Automated Domain & IP Reputation Guard (n8n Workflow)

Officially verified and published in the n8n Workflow Library as "Score DNS Threats with VirusTotal, Abuse.ch, HashiCorp Vault, and Gemini".

Overview

This workflow is a core component of the Cyber Sentinel platform. It automates threat intelligence enrichment, analysis, and alerting for IP addresses and domain names observed in DNS traffic.

The pipeline integrates:

Multi-source CTI (VirusTotal, ThreatFox, URLHaus)
Secure secret management (HashiCorp Vault)
AI-based decision engine (Google Gemini) with a dynamic 1–5 threat scale loaded from MySQL
Dual persistence (MongoDB raw, MySQL relational)
Severity-graded alert email (info / review / alert)

Two-version setup

The production workflow described on this page is v1 — three CTI sources running in parallel before AI analysis. v2 is in active development (preview at the end of this page) and reshapes the data flow so URLHaus only runs after VirusTotal or ThreatFox has flagged an indicator. v2 is the canonical detection-first design referenced in the v1.0.2-rc1 release notes.

1. Workflow Trigger & Initialization

Components

Schedule Trigger
HashiCorp Vault (MySQL, MongoDB credentials)

Description

The workflow executes every 3 minutes, ensuring continuous monitoring of infrastructure.

schedule-trigger.json

{
  "rule": {
    "interval": [
      {
        "field": "minutes",
        "minutesInterval": 3
      }
    ]
  }
}

2. Secure Secrets Management

Components

HashiCorp Vault Nodes:
MySQL credentials
MongoDB credentials
API tokens (VirusTotal, Abuse.ch, Gemini)
Email credentials

Description

All sensitive data is dynamically retrieved from Vault, implementing a Zero Trust Secrets Model. There are no API keys, passwords, or certificates anywhere in the workflow JSON or in n8n credentials — every secret is read from Vault at execution time.

vault-secret.json

[
  { "secretPath": "cyber-sentinel/api-keys/virustotal" },
  { "secretPath": "cyber-sentinel/api-keys/gemini/home-network-guardian" },
  { "secretPath": "cyber-sentinel/credentials/mysql/app_manager" },
  { "secretPath": "cyber-sentinel/api-keys/abuse/api-key" },
  { "secretPath": "cyber-sentinel/credentials/gmail" }
]
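
For readers who want to see what a read at one of these paths looks like outside n8n, here is a minimal sketch against Vault's KV version 2 HTTP API. The mount name (`secret`), the base address, and the token are assumptions for illustration; only the secret paths come from this page, and the workflow itself uses the n8n Vault nodes rather than this code.

```javascript
// Build the HTTP request for a KV v2 read. The mount name and base address
// are assumptions; a KV v2 read goes to /v1/<mount>/data/<path>.
function buildVaultRead(baseUrl, mount, secretPath, token) {
  return {
    method: 'GET',
    url: `${baseUrl.replace(/\/+$/, '')}/v1/${mount}/data/${secretPath}`,
    headers: { 'X-Vault-Token': token },
  };
}

// KV v2 wraps the stored key/value pairs in data.data.
function extractSecret(responseBody) {
  const payload = responseBody && responseBody.data && responseBody.data.data;
  if (!payload) throw new Error('Vault response has no data.data payload');
  return payload;
}

// Hypothetical address and token, shown only to illustrate the URL shape.
const vaultRequest = buildVaultRead(
  'https://vault.example.internal:8200/',
  'secret',
  'cyber-sentinel/api-keys/virustotal',
  'example-token'
);
console.log(vaultRequest.url);
```

Keeping request building and response parsing separate makes both halves easy to check without a running Vault server.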

3. Data Source (MySQL)

Component

Select rows from a table

Description

The workflow retrieves observables (IP/FQDN) from v_pending_analysis — a queue of DNS observables that have not yet been enriched. The view is documented on the Database Schema page.

view.sql

SELECT
    dns_query_id,
    source_ip,
    fqdn,
    observable_ip,
    first_seen
FROM cyber_intelligence.v_pending_analysis
LIMIT 1;

Because of LIMIT 1, each run takes a single observable off this queue.
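
As a rough capacity check, assuming one observable per run and a run every 3 minutes, the queue drains at a fixed daily rate. The helper below is illustrative only and is not part of the workflow.

```javascript
// Illustrative helper: how many observables the queue can drain per day
// for a given schedule interval (minutes) and per-run batch size.
function observablesPerDay(intervalMinutes, batchSize = 1) {
  const runsPerDay = Math.floor((24 * 60) / intervalMinutes);
  return runsPerDay * batchSize;
}

console.log(observablesPerDay(3)); // 480
```

If DNS traffic regularly produces more new observables than this, the backlog in the view will grow until the interval or the batch size is changed.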

4. Threat Intelligence Enrichment

Components

HTTP Request nodes, one per provider (VirusTotal, ThreatFox, URLHaus)

Description

Each observable is enriched using multiple CTI providers to ensure high-confidence detection. In v1 all three providers are queried in parallel — see section 15 for the v2 detection-first reshape.

4.1 VirusTotal Scan

virustotal-request.js

const observable_ip = $('Select rows from a table').first().json.observable_ip;
const url = `https://www.virustotal.com/api/v3/ip_addresses/${observable_ip}`;

4.2 ThreatFox Request

threatfox-request.json

{
  "method": "POST",
  "url": "https://threatfox-api.abuse.ch/api/v1/",
  "authentication": "genericCredentialType",
  "genericAuthType": "httpHeaderAuth",
  "sendBody": true,
  "bodyParameters": {
    "parameters": [
      { "name": "query",        "value": "search_ioc" },
      { "name": "search_term",  "value": "={{ $('Select rows from a table').item.json.fqdn }}" },
      { "name": "exact_match",  "value": "true" }
    ]
  },
  "options": {}
}

4.3 URLHaus Request

urlhaus-request.json

{
  "method": "POST",
  "url": "https://urlhaus-api.abuse.ch/v1/host/",
  "authentication": "genericCredentialType",
  "genericAuthType": "httpHeaderAuth",
  "sendBody": true,
  "contentType": "form-urlencoded",
  "bodyParameters": {
    "parameters": [
      { "name": "host", "value": "={{ $('Select rows from a table').item.json.fqdn }}" }
    ]
  },
  "options": {}
}
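
The three lookups can also be written out as plain request descriptors, which may help when testing the calls outside n8n. This is a sketch, not the workflow's node configuration: the `x-apikey` and `Auth-Key` header names reflect my understanding of the VirusTotal and Abuse.ch APIs (the nodes above only show generic header auth), and the key values are placeholders.

```javascript
// Build request descriptors for the three CTI lookups of one observable.
// keys.virustotal and keys.abuse are placeholders for values read from Vault.
function buildCtiRequests({ observable_ip, fqdn }, keys) {
  return {
    virustotal: {
      method: 'GET',
      url: `https://www.virustotal.com/api/v3/ip_addresses/${observable_ip}`,
      headers: { 'x-apikey': keys.virustotal },
    },
    threatfox: {
      method: 'POST',
      url: 'https://threatfox-api.abuse.ch/api/v1/',
      headers: { 'Auth-Key': keys.abuse, 'Content-Type': 'application/json' },
      // Same three parameters as the ThreatFox node above
      body: JSON.stringify({ query: 'search_ioc', search_term: fqdn, exact_match: 'true' }),
    },
    urlhaus: {
      method: 'POST',
      url: 'https://urlhaus-api.abuse.ch/v1/host/',
      headers: { 'Auth-Key': keys.abuse, 'Content-Type': 'application/x-www-form-urlencoded' },
      // URLHaus expects a form-encoded body
      body: new URLSearchParams({ host: fqdn }).toString(),
    },
  };
}

const ctiRequests = buildCtiRequests(
  { observable_ip: '203.0.113.7', fqdn: 'example.com' },
  { virustotal: 'VT_KEY_PLACEHOLDER', abuse: 'ABUSE_KEY_PLACEHOLDER' }
);
console.log(ctiRequests.urlhaus.body); // host=example.com
```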

5. Conditional Flow Control

Components

IF node — VirusTotal (malicious > 0 OR suspicious > 0)
IF node — ThreatFox (query_status = ok)
IF node — URLHaus (query_status = ok)

Description

Each CTI provider uses a dedicated IF node with a different condition, because each API returns data in a different format.

VirusTotal evaluates raw last_analysis_stats fields directly:

if-virustotal.json

{
  "combinator": "or",
  "conditions": [
    { "leftValue": "={{ $json.data.attributes.last_analysis_stats.malicious }}", "operator": "gt", "rightValue": 0 },
    { "leftValue": "={{ $json.data.attributes.last_analysis_stats.suspicious }}", "operator": "gt", "rightValue": 0 }
  ]
}

ThreatFox & URLHaus evaluate the query_status field returned by the Abuse.ch API:

if-abusech.json

{
  "conditions": [
    { "leftValue": "={{ $json.query_status }}", "operator": "equals", "rightValue": "ok" }
  ]
}
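
The two IF conditions can be mirrored as plain predicates, which makes it easy to test them against saved API responses. This is a sketch; treating missing fields as "not flagged" is an assumption about how the IF nodes behave on incomplete payloads.

```javascript
// VirusTotal: flagged when any engine reports malicious or suspicious.
function isVirusTotalFlagged(response) {
  const stats = response?.data?.attributes?.last_analysis_stats ?? {};
  return (stats.malicious ?? 0) > 0 || (stats.suspicious ?? 0) > 0;
}

// ThreatFox / URLHaus: a hit is signalled by query_status "ok".
function isAbuseChHit(response) {
  return response?.query_status === 'ok';
}

console.log(isVirusTotalFlagged({
  data: { attributes: { last_analysis_stats: { malicious: 0, suspicious: 2 } } },
})); // true
console.log(isAbuseChHit({ query_status: 'no_result' })); // false
```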

If the condition is false, the workflow branches to the "Insert a clear scan" MySQL node for that provider, which registers a baseline score of 1 (Clean). This prevents redundant API calls on already-clean observables and maintains full relational integrity.

6. Raw Data Storage (MongoDB)

Components

MongoDB nodes (Insert doc VirusTotal, Insert doc ThreatFox, Insert doc URLHaus)
Transformation Code nodes (Edit Json for Mongo - *)
MySQL nodes (Insert a clear scan - VirusTotal, Insert a clear scan - ThreatFox, Insert a clear scan - URLHaus)

Description

The storage layer has two parallel paths depending on the IF node result.

Path A — Data found: Raw API response is transformed and archived in MongoDB, then the enriched result proceeds to the normalization layer.

Path B — No data (clean result): A baseline scan record with threat_score = 1 is written directly to MySQL. This short-circuits the full AI pipeline and prevents redundant API calls for already-clean observables.

All MongoDB documents are inserted into:

collection: threat_data_raw

Used for:

forensic analysis
audit trail
reprocessing

6.1 Example Transformation

mongo-transform.js

// Prepare structured document for MongoDB
return {
    resource: $('Select rows from a table').first().json.observable_ip,
    type: 'IP',
    source_provider: 'VirusTotal',
    scan_date: new Date().toISOString(),
    raw_data: $('VirusTotal IP Scan').item.json
};

6.2 Clean Scan — MySQL Fallback

When a provider returns no data, a safe baseline is written immediately:

insert-clean-scan.sql

-- Register a safe baseline verdict (score = 1) for referential integrity
INSERT INTO cyber_intelligence.ai_analysis_results (
    threat_score, verdict_summary_en, analysis_pl
) VALUES (1, null, null);

SET @last_clean_result_id = LAST_INSERT_ID();

-- Log the scan event in threat_indicators
INSERT INTO cyber_intelligence.threat_indicators (
    dns_query_id, type_id, analysis_result_id, last_scan
) VALUES (
    {{ $('Select rows from a table').item.json.dns_query_id }},
    (SELECT id FROM cyber_intelligence.dic_indicator_types WHERE name = 'IP'),
    @last_clean_result_id,
    NOW()
);

7. Data Normalization Layer

Components

Code nodes (per provider): Data reduction and aggregation - VirusTotal / ThreatFox / Urlhaus
Merge node
Aggregation node (Code for Merge)
MySQL node Select threat scale for AI agent (loads the dynamic 1–5 scale from v_threat_scale_for_agent)

Description

Each provider response is reduced into a compact, structured object before being passed to the AI. Raw API payloads are large — normalization extracts only the fields relevant for threat scoring. In addition to the three CTI objects, the workflow also pulls the current threat scale from dic_threat_levels so the AI prompt always reflects the live policy without redeploying the workflow.

VirusTotal extracts:

Field	Description
`vt_report`	Human-readable summary string
`vt_stats`	Raw counts: `malicious`, `suspicious`, `undetected`
`vt_owner`	AS owner name (e.g. `Google LLC`)
`vt_is_big_player`	`true` if owner matches known trusted providers
`vt_malicious_count`	Number of malicious engine detections
`vt_scan_date`	Freshness of the data
`no_data`	`true` if VirusTotal has no record for this resource

ThreatFox extracts:

Field	Description
`threatfox_report`	Identified malware families and threat types
`threatfox_active`	`true` if an actively confirmed threat exists
`threatfox_max_confidence`	Reliability score of the report
`no_data`	`true` if ThreatFox has no record

URLHaus extracts:

Field	Description
`urlhaus_report`	Details on active malware distribution
`is_active_threat`	`true` if payload URL is currently online
`urlhaus_reference`	Direct evidence link
`no_data`	`true` if URLHaus has no record
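
To make the field list concrete, here is a minimal sketch of what the VirusTotal reduction could look like. It is not the workflow's actual Code node: the `BIG_PLAYERS` list and the wording of `vt_report` are assumptions, and nesting the result under a `virustotal` key is inferred from the section names used in the AI prompt. The input fields (`last_analysis_stats`, `as_owner`, `last_analysis_date`) are attributes of a VirusTotal v3 IP address object.

```javascript
// Hypothetical list of trusted owners; the real list is not shown on this page.
const BIG_PLAYERS = ['google', 'amazon', 'cloudflare', 'microsoft', 'github'];

function reduceVirusTotal(response) {
  const attrs = response?.data?.attributes;
  if (!attrs) return { virustotal: { no_data: true } };

  const stats = attrs.last_analysis_stats ?? {};
  const owner = attrs.as_owner ?? 'unknown';
  const malicious = stats.malicious ?? 0;
  const suspicious = stats.suspicious ?? 0;

  return {
    virustotal: {
      no_data: false,
      vt_report: `${malicious} malicious / ${suspicious} suspicious detections, owner: ${owner}`,
      vt_stats: { malicious, suspicious, undetected: stats.undetected ?? 0 },
      vt_owner: owner,
      vt_is_big_player: BIG_PLAYERS.some((name) => owner.toLowerCase().includes(name)),
      vt_malicious_count: malicious,
      // last_analysis_date is a Unix timestamp in seconds
      vt_scan_date: attrs.last_analysis_date
        ? new Date(attrs.last_analysis_date * 1000).toISOString()
        : null,
    },
  };
}

const reduced = reduceVirusTotal({
  data: {
    attributes: {
      as_owner: 'Google LLC',
      last_analysis_stats: { malicious: 0, suspicious: 1, undetected: 60 },
      last_analysis_date: 1700000000,
    },
  },
});
console.log(reduced.virustotal.vt_is_big_player); // true
```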

7.1 Aggregation Logic

All normalized objects (three CTI + the threat scale) are merged into a single context object for the AI prompt:

merge.js

// Merge all CTI sources and the threat scale into one object
let combinedData = {};

for (const item of $input.all()) {
    Object.assign(combinedData, item.json);
}

return combinedData;
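
One detail worth illustrating: Object.assign performs a shallow merge, so any top-level key shared by two inputs is overwritten by the later one. Every provider object exposes a `no_data` flag, which is presumably why each provider's fields live under their own key (`virustotal`, `threatfox`, `urlhaus`). The sketch below shows the difference on plain objects; the sample values are made up.

```javascript
// Same merge as above, written as a pure function over plain objects.
function mergeContext(items) {
  const combined = {};
  for (const item of items) Object.assign(combined, item);
  return combined;
}

// Flat objects: the shared no_data key collides and the last one wins.
const flat = mergeContext([{ no_data: true }, { no_data: false }]);
console.log(flat); // { no_data: false }

// Namespaced objects: each provider keeps its own flag.
const namespaced = mergeContext([
  { virustotal: { no_data: true } },
  { threatfox: { no_data: false, threatfox_active: true } },
  { threat_scale: [{ score: 1 }, { score: 5 }] },
]);
console.log(Object.keys(namespaced)); // [ 'virustotal', 'threatfox', 'threat_scale' ]
```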

8. AI Threat Analysis

Components

AI Agent
Google Gemini Model

Description

The AI layer acts as a Senior Cyber Threat Intelligence Analyst.

Responsibilities:

Correlate multiple CTI sources
Apply the dynamic 1–5 scoring model loaded from dic_threat_levels at runtime
Detect malware patterns
Reduce false positives (e.g. big cloud providers)
Generate bilingual output (EN + PL) plus a scoring_rationale audit field

Threat scale is data, not code

The five-point scale is no longer hardcoded into the prompt. It is fetched from v_threat_scale_for_agent (see Database Schema) and injected into the prompt at every run. Adjusting the policy means a single UPDATE on dic_threat_levels — no workflow redeploy.
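
A minimal sketch of how rows from the view could be turned into the policy table that appears in the prompt. The column names `score`, `description`, and `action` are assumptions about the view; only `is_malicious_flag` is named on this page.

```javascript
// Render threat-scale rows as the pipe-separated table used in the prompt.
// Rows are sorted by score so the table is stable regardless of query order.
function renderThreatScale(rows) {
  const header = 'SCORE | DESCRIPTION | IS_MALICIOUS | ACTION';
  const lines = [...rows]
    .sort((a, b) => a.score - b.score)
    .map((r) => `${r.score} | ${r.description} | ${r.is_malicious_flag ? 'TRUE' : 'FALSE'} | ${r.action}`);
  return [header, ...lines].join('\n');
}

// Hypothetical rows, in the shape a MySQL node might return them.
console.log(renderThreatScale([
  { score: 4, description: 'Malicious', is_malicious_flag: 1, action: 'Block' },
  { score: 1, description: 'Clean', is_malicious_flag: 0, action: 'Allow' },
]));
```

Because the table is rebuilt on every run, an UPDATE on the underlying table changes the next prompt without touching the workflow.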

Prompt Structure

ai-prompt.txt

ROLE:
You are a Senior Cyber Threat Intelligence Analyst in the Cyber Sentinel system.
Your task is to evaluate an artifact based on aggregated data from VirusTotal,
ThreatFox, and URLHaus, and return a structured analysis.

CONTEXT & SCORING POLICY:
The threat scale is provided dynamically via the `threat_scale` field in the
input object (loaded from dic_threat_levels). Use exactly the scores listed
there. The current policy is:

SCORE | DESCRIPTION                                | IS_MALICIOUS | ACTION
1     | Clean / Trusted infrastructure             | FALSE        | Allow
2     | Low Risk / Monitor                         | FALSE        | Monitor
3     | Suspicious — manual review needed          | FALSE        | Review
4     | Malicious — confirmed threat               | TRUE         | Block
5     | Critical — active threat, immediate alert  | TRUE         | Block + Alert

DATA SOURCE PRE-ANALYSIS:
Before finalizing the score, analyze every variable returned by the source nodes.
You will receive multiple threat intelligence reports in a single data object.
Synthesize ALL provided reports into ONE single evaluation.

Data : {{ JSON.stringify($('Code for Merge').item.json) }}

  1. VirusTotal Section ("virustotal"):
     - If "no_data" is true, skip this source.
     - Use "vt_report" for the textual summary.
     - Analyze "vt_stats" and "vt_malicious_count" for raw detection levels.
     - Check "vt_owner" and "vt_is_big_player" to identify trusted infrastructure.
     - Reference "vt_scan_date" for data freshness.

  2. ThreatFox Section ("threatfox"):
     - If "no_data" is true, skip this source.
     - Use "threatfox_report" for malware families and threat types.
     - "threatfox_active" (boolean): true = actively confirmed threat.
     - "threatfox_max_confidence": reliability of the report.

  3. URLHaus Section ("urlhaus") — SUPPORTING source only:
     - If "no_data" is true, skip this source.
     - URLHaus may add at most +1 to the score, and ONLY if VirusTotal or
       ThreatFox has already flagged the indicator. URLHaus is NEVER the
       sole driver of a malicious verdict.
     - Use "urlhaus_report" and "urlhaus_reference" for evidence links.

DATA AVAILABILITY RULES:
1. Each source contains a "no_data" flag.
2. If "no_data" is true, treat as clean for that specific source.
3. If all sources report "no_data": true → score 1 (known big player) or 2
   (entirely unknown but harmless on its face).

ANALYSIS RULES:
1. Cross-Check: If ThreatFox confirms an active threat
   (threatfox_active: true), threat_score MUST be >= 4 (Block) regardless
   of VirusTotal detections.
2. Big Player Guard: If vt_is_big_player is true, cap the score at 2 unless
   ThreatFox confirms a specific malware family. This eliminates false
   positives on AWS, Cloudflare, Google, Microsoft, GitHub, etc.
3. Flagging: is_malicious is driven by dic_threat_levels.is_malicious_flag —
   currently scores 4–5 are malicious, 1–3 are not.
4. Attribution: Identify the infrastructure owner (vt_owner) and specify
   malware families (e.g., ValleyRAT) if present in threatfox/urlhaus.

STRICT JSON FORMATTING RULE:
1. All string values must be single-line — never use physical line breaks.
2. SQL ESCAPING: replace every single quote (') with two single quotes ('').
3. Avoid backslashes (\) or backticks (`).

REQUIRED OUTPUT (STRICT JSON ONLY):
{
  "threat_score": [number 1-5],
  "is_malicious": [boolean],
  "threat_label": "[Clean / Monitor / Review / Block / Block + Alert]",
  "verdict_en": "[Technical summary, max 200 chars]",
  "analysis_pl": "[Komentarz w jezyku Polskim: 1. Wlasciciel IP. 2. Typ zagrozenia. 3. Rekomendacja]",
  "active_providers": ["virustotal", "threatfox", "urlhaus"],
  "scoring_rationale": "[Short reason for the score — audit input for the future self-healing meta-agent]"
}

9. AI Output Parsing

Component

Code node

Description

Cleans LLM output and converts it into valid JSON.

Example

parse-ai.js

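// Retrieve the raw output from the AI node; fall back to an empty string so a
// missing field is reported as a parse error instead of throwing before the try block
let rawText = $json.output ?? "";

// Strip Markdown code fences (with or without a "json" tag) and collapse newlines
let cleanJson = String(rawText).replace(/```(?:json)?/gi, "").trim();
cleanJson = cleanJson.replace(/[\r\n]+/g, " ");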

try {
    // Parse the cleaned JSON string into a structured object
    const data = JSON.parse(cleanJson);
    const providerDetails = [];

    // Helper function to retrieve Mongo IDs from previous transformation nodes
    const getMongoId = (nodeName, providerKey) => {
        try {
            const nodeData = $(nodeName).all();
            for (const entry of nodeData) {
                if (entry.json && entry.json[providerKey] && entry.json[providerKey].mongo_id) {
                    return entry.json[providerKey].mongo_id;
                }
            }
        } catch (e) {
            return null;
        }
        return null;
    };

    // Map active providers to their respective MongoDB IDs for the final report
    if (data.active_providers.includes("virustotal")) {
        const vt_id = getMongoId('Data reduction and aggregation - VirusTotal', 'virustotal');
        if (vt_id) providerDetails.push({ name: "VirusTotal", mongo_id: vt_id });
    }

    if (data.active_providers.includes("threatfox")) {
        const tf_id = getMongoId('Data reduction and aggregation - ThreatFox', 'threatfox');
        if (tf_id) providerDetails.push({ name: "Abuse_ThreatFox", mongo_id: tf_id });
    }

    if (data.active_providers.includes("urlhaus")) {
        const uh_id = getMongoId('Data reduction and aggregation - Urlhaus', 'urlhaus');
        if (uh_id) providerDetails.push({ name: "Abuse_URLhaus", mongo_id: uh_id });
    }

    return {
        score: data.threat_score,
        is_malicious: data.is_malicious,
        verdict_en: data.verdict_en,
        analysis_pl: data.analysis_pl,
        scoring_rationale: data.scoring_rationale,
        provider_details: providerDetails
    };
} catch (error) {
    return {
        error: "Parsing failed",
        details: error.message,
        raw_received: rawText
    };
}
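
Parsing only guarantees that the output is valid JSON, not that it respects the scoring rules. A small consistency check along the following lines could run right after parsing. It is a sketch, not part of the published workflow: the function name is hypothetical, and the "scores 4-5 are malicious" threshold is hard-coded here to mirror the current policy, whereas the workflow reads it from dic_threat_levels.

```javascript
// Returns a list of human-readable violations; an empty list means the
// verdict is consistent with the rules stated in the prompt.
function checkVerdict(verdict, context = {}) {
  const problems = [];
  const score = verdict.threat_score;

  if (!Number.isInteger(score) || score < 1 || score > 5) {
    problems.push('threat_score must be an integer from 1 to 5');
  }
  if (typeof verdict.is_malicious !== 'boolean') {
    problems.push('is_malicious must be a boolean');
  } else if (verdict.is_malicious !== (score >= 4)) {
    problems.push('is_malicious must be true exactly for scores 4-5');
  }
  // Cross-check rule: an active ThreatFox hit forces a score of at least 4.
  if (context.threatfox?.threatfox_active === true && score < 4) {
    problems.push('ThreatFox reports an active threat, so threat_score must be >= 4');
  }
  if (!Array.isArray(verdict.active_providers)) {
    problems.push('active_providers must be an array');
  }
  return problems;
}

console.log(checkVerdict(
  { threat_score: 2, is_malicious: false, active_providers: ['threatfox'] },
  { threatfox: { threatfox_active: true } }
).length); // 1
```

A non-empty result could be routed to manual review rather than written to MySQL as a final verdict.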

10. Relational Database Storage (MySQL)

Components

Insert AI verdict
Insert threat indicators
Insert provider details

Description

Final structured intelligence is stored in relational tables:

ai_analysis_results
threat_indicators
threat_indicator_details

Every AI decision is linked back to the original DNS query and raw provider data via MongoDB Object IDs, creating a complete audit trail.

Example

insert-verdict.sql

-- STEP 1: Insert the unique AI verdict into the results table
INSERT INTO cyber_intelligence.ai_analysis_results (
    threat_score,
    verdict_summary_en,
    analysis_pl
) VALUES (
    {{ $json.score }},
    '{{ $json.verdict_en }}',
    '{{ $json.analysis_pl }}'
);

-- Store the generated verdict ID for the next steps
SET @last_result_id = LAST_INSERT_ID();

-- STEP 2: Insert the event record into the threat_indicators table
INSERT INTO cyber_intelligence.threat_indicators (
    dns_query_id,
    type_id,
    analysis_result_id,
    last_scan
) VALUES (
    {{ $('Select rows from a table').item.json.dns_query_id }},
    (SELECT id FROM cyber_intelligence.dic_indicator_types WHERE name = 'IP'),
    @last_result_id,
    NOW()
);

-- Store the event ID to link with specific scanner details
SET @last_event_id = LAST_INSERT_ID();

-- STEP 3: Dynamically insert scanner details (VirusTotal, ThreatFox, etc.)
{{ $json.provider_details.map(p => `
INSERT INTO cyber_intelligence.threat_indicator_details (indicator_id, source_id, mongo_ref_id)
SELECT @last_event_id, id, '${p.mongo_id}'
FROM cyber_intelligence.dic_source_providers WHERE name = '${p.name}';
`).join('\n') }}

-- Return the event ID for subsequent steps (e.g., email notification)
SELECT @last_event_id AS indicator_id;
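
Because STEP 3 interpolates values directly into SQL text, it can be worth validating them first. The sketch below builds the same INSERT statements but drops any entry whose provider name is not one of the three names used in parse-ai.js or whose `mongo_id` is not a 24-character hexadecimal string (the usual text form of a MongoDB ObjectId). The function and constant names are hypothetical.

```javascript
// Provider names as they appear in parse-ai.js.
const KNOWN_PROVIDERS = new Set(['VirusTotal', 'Abuse_ThreatFox', 'Abuse_URLhaus']);
// A MongoDB ObjectId is rendered as 24 hexadecimal characters.
const OBJECT_ID = /^[0-9a-f]{24}$/i;

function buildDetailInserts(providerDetails) {
  return providerDetails
    .filter((p) => KNOWN_PROVIDERS.has(p.name) && OBJECT_ID.test(String(p.mongo_id)))
    .map((p) =>
      'INSERT INTO cyber_intelligence.threat_indicator_details (indicator_id, source_id, mongo_ref_id)\n' +
      `SELECT @last_event_id, id, '${p.mongo_id}'\n` +
      `FROM cyber_intelligence.dic_source_providers WHERE name = '${p.name}';`)
    .join('\n');
}

console.log(buildDetailInserts([
  { name: 'VirusTotal', mongo_id: '65f1c0ffee0123456789abcd' },
  { name: 'VirusTotal', mongo_id: "x'; DROP TABLE t; --" }, // rejected
]));
```

This only covers the identifiers in STEP 3; the free-text verdict fields in STEP 1 still depend on the quote-doubling rule in the prompt.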

11. Alerting & Notification

Components

Code node — pick severity styling based on score
Filter node — is_malicious === true OR score === 3
Email node

Description

Alerts are now severity-graded: the email is rendered in green, amber, or red depending on the score. This replaces the old behaviour where every notification was a red 🚨 ALARM, including for clean traffic.

Score	Accent	Header	Badge
1–2	🟢 Green (`#4caf50`)	`✅ INFO: Cyber Sentinel`	`Clean / Monitor`
3	🟡 Amber (`#ff9800`)	`⚠️ REVIEW: Cyber Sentinel`	`Suspicious`
4–5	🔴 Red (`#f44336`)	`🚨 ALERT: Cyber Sentinel`	`Malicious / Critical`

The accent colour is applied consistently across the top border, header background, score number, analysis side bar, and action button — no more red exclamation marks for clean traffic.

Severity Styling

A small Code node before the Email Send node picks the accent palette and headline text from the score:

pick-severity.js

const score = $json.score;
let accent, headerText, badge;

if (score <= 2) {
    accent = '#4caf50';                       // green
    headerText = '✅ INFO: Cyber Sentinel';
    badge = 'Clean / Monitor';
} else if (score === 3) {
    accent = '#ff9800';                       // amber
    headerText = '⚠️ REVIEW: Cyber Sentinel';
    badge = 'Suspicious';
} else {
    accent = '#f44336';                       // red
    headerText = '🚨 ALERT: Cyber Sentinel';
    badge = 'Malicious / Critical';
}

return { ...$json, accent, headerText, badge };

Send Condition

Send the email only when the result is malicious (scores 4–5) or warrants manual review (score 3). Scores 1–2 are logged but do not page the operator, so under this filter the green styling is prepared but never emailed:

filter.js

return $json.is_malicious === true || $json.score === 3;

Email Template

The HTML template uses the accent, headerText, and badge fields prepared above. The score is shown out of 5 (not 10), and a severity label is rendered beneath it for instant context.

alert.html

<!DOCTYPE html>
<html>
<head>
    <style>
        body  { font-family: 'Segoe UI', Tahoma, sans-serif; background: #121212; color: #e0e0e0; margin: 0; padding: 20px; }
        .card { max-width: 600px; margin: auto; background: #1e1e1e; border: 1px solid #333; border-radius: 8px; overflow: hidden;
                border-top: 4px solid {{ $json.accent }}; }
        .header     { background: {{ $json.accent }}22; padding: 15px; text-align: center; }
        .content    { padding: 25px; }
        .score-box  { background: #2d2d2d; border-radius: 5px; padding: 10px; text-align: center; margin-bottom: 20px; border: 1px solid #444; }
        .ip-addr    { font-family: 'Courier New', monospace; color: #64ffda; font-size: 18px; font-weight: bold; }
        .analysis   { background: #252525; padding: 15px; border-radius: 4px; border-left: 4px solid {{ $json.accent }}; font-size: 14px; line-height: 1.6; margin-top: 15px; }
        .badge      { display: inline-block; padding: 3px 10px; border-radius: 3px; font-size: 12px; font-weight: bold; background: {{ $json.accent }}; color: #000; }
        .provider-tag { display: inline-block; background: #37474f; color: #cfd8dc; padding: 2px 8px; border-radius: 3px; font-size: 11px; margin-right: 5px; border: 1px solid #546e7a; }
        .btn        { display: inline-block; padding: 10px 20px; background: {{ $json.accent }}; color: #000; text-decoration: none; border-radius: 4px; font-weight: bold; margin-top: 20px; font-size: 14px; }
        .footer     { text-align: center; font-size: 11px; color: #777; padding: 15px; }
    </style>
</head>
<body>
<div class="card">
    <div class="header">
        <h2 style="margin:0; color: {{ $json.accent }};">{{ $json.headerText }}</h2>
    </div>
    <div class="content">
        <div class="score-box">
            <div style="font-size: 12px; text-transform: uppercase; color: #888; letter-spacing: 1px;">Threat Severity Score</div>
            <div style="font-size: 36px; font-weight: bold; color: {{ $json.accent }};">{{ $json.score }}/5</div>
            <div style="margin-top: 4px;"><span class="badge">{{ $json.badge }}</span></div>
        </div>

        <p><strong>Obiekt (IP):</strong> <span class="ip-addr">{{ $('Select rows from a table').item.json.observable_ip }}</span></p>
        <p><strong>Powiązana domena:</strong> <span style="color: #90caf9;">{{ $('Select rows from a table').item.json.fqdn }}</span></p>

        <div style="margin-top: 10px;">
            <strong style="font-size: 13px; color: #888;">Aktywne źródła danych:</strong><br>
            {{ $json.provider_details.map(p => `<span class="provider-tag">${p.name}</span>`).join('') }}
        </div>

        <div class="analysis">
            <strong style="color: {{ $json.accent }};">Analiza (PL):</strong><br>
            <div style="margin-top: 5px;">{{ $json.analysis_pl }}</div>
        </div>

        <p style="color: #bbb; font-style: italic; font-size: 12px; margin-top: 15px; border-top: 1px solid #333; padding-top: 10px;">
            <strong>Technical Verdict:</strong> {{ $json.verdict_en }}<br>
            <strong>Scoring rationale:</strong> {{ $json.scoring_rationale }}
        </p>

        <div style="text-align: center;">
            <a href="https://rdap.arin.net/registry/ip/{{ $('Select rows from a table').item.json.observable_ip }}" class="btn">Sprawdź detale IP (RDAP)</a>
        </div>
    </div>
    <div class="footer">
        System: <strong>Cyber Sentinel v1.0.2-rc1</strong><br>
        Data: {{ new Date().toLocaleString('pl-PL') }}
    </div>
</div>
</body>
</html>

12. Architecture Patterns

Dual Storage Strategy

MongoDB → raw intelligence (forensics)
MySQL → structured data (operations)

Zero Trust Secrets

All secrets from Vault
No hardcoded credentials

Multi-Source Correlation

VirusTotal + ThreatFox + URLHaus

AI Decision Engine

Central intelligence layer
Context-aware scoring
Dynamic threat scale loaded at runtime — no prompt redeploy on policy change

13. End-to-End Flow

Schedule trigger starts workflow
Fetch observable from MySQL
Enrich using CTI providers (VirusTotal, ThreatFox, URLHaus)
Store raw responses in MongoDB
Normalize and merge data, plus load the dynamic threat scale
Perform AI analysis (1–5 score, bilingual verdict, scoring rationale)
Store results in MySQL
Send severity-graded alert email if is_malicious = true (scores 4–5) or score = 3 (review)

14. Practical Use Cases

SOC automation pipelines
Home network threat monitoring
Threat intelligence enrichment (SIEM/SOAR)
Malware infrastructure detection
Automated incident triage

15. v2 Workflow Preview

A second version of the workflow is in active development. The goal is to formalise URLHaus as a supporting source rather than a primary driver — at the data-flow level, not just in the prompt rules.

What changes

In v1, all three CTI providers run in parallel and feed the AI together. In v2, the flow is split into two stages:

Primary stage — VirusTotal and ThreatFox run in parallel. Their normalized outputs go through a Compute primary verdict node and an IF primary flagged gate.
URLHaus stage (conditional) — URLHaus is queried only when the primary stage has already flagged the indicator. If primary is clean, URLHaus is skipped entirely (URLhaus skip stub); if URLHaus itself returns no data for a flagged indicator, a URLhaus no-result stub placeholder is injected so downstream merges stay consistent.
Final merge — primary verdict + URLHaus result (real, stub, or skipped) are merged for AI analysis.

Conceptual diagram

flowchart LR
    VT[VirusTotal scan] --> MP[Merge primary]
    TF[ThreatFox scan] --> MP
    MP --> CP[Compute primary verdict]
    CP --> IF{IF primary flagged?}
    IF -- No --> SK[URLhaus skip stub]
    IF -- Yes --> UH[URLHaus query]
    UH --> IFU{IF URLhaus query ok?}
    IFU -- Yes --> UR[URLhaus reduction]
    IFU -- No --> NR[URLhaus no-result stub]
    SK --> FM[Final merge]
    UR --> FM
    NR --> FM
    FM --> FA[Final aggregation]
    FA --> AI[AI Agent]
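
The gating in the diagram can be sketched in a few lines. Since the v2 workflow JSON is not published yet, everything here is an assumption: the rule inside "Compute primary verdict", the stub shapes, and the `consulted` flag are illustrative guesses, and `queryUrlhaus` stands in for the URLHaus HTTP call.

```javascript
// Primary stage: flagged when VirusTotal or ThreatFox reports a signal.
function computePrimaryFlagged(ctx) {
  const vt = ctx.virustotal ?? { no_data: true };
  const tf = ctx.threatfox ?? { no_data: true };
  const vtHit = !vt.no_data && (vt.vt_malicious_count ?? 0) > 0;
  const tfHit = !tf.no_data && tf.threatfox_active === true;
  return vtHit || tfHit;
}

// URLHaus stage: query only when primary is flagged, otherwise inject a stub
// so the final merge always sees a "urlhaus" key.
function resolveUrlhaus(ctx, queryUrlhaus) {
  if (!computePrimaryFlagged(ctx)) {
    return { urlhaus: { no_data: true, consulted: false } }; // skip stub
  }
  const response = queryUrlhaus();
  if (response?.query_status !== 'ok') {
    return { urlhaus: { no_data: true, consulted: true } }; // no-result stub
  }
  return { urlhaus: { no_data: false, consulted: true, raw: response } };
}

const cleanCtx = { virustotal: { no_data: false, vt_malicious_count: 0 }, threatfox: { no_data: true } };
console.log(resolveUrlhaus(cleanCtx, () => ({ query_status: 'ok' })).urlhaus.consulted); // false
```

A flag like `consulted` is one way the rationale could tell "URLHaus confirmed" apart from "URLHaus not consulted".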

Why

Fewer false positives. URLHaus catalogues many legitimate platforms (GitHub, Pastebin, Bitbucket) as historical hosts of malicious payloads. Letting it drive a verdict on its own pushed clean infrastructure to high scores. As a supporting source URLHaus can only confirm a primary signal, never create one.
Lower API pressure. Indicators that are clean according to VirusTotal and ThreatFox no longer cost a URLHaus call.
Cleaner audit trail. The scoring_rationale field can now distinguish "URLHaus confirmed" from "URLHaus not consulted" explicitly.

v2 is not yet shipped with the v1.0.2-rc1 release — the workflow JSON will follow once it has been validated in staging.