Evil-DB Database Update

2026-04-05 — Major Performance Overhaul, Webhook Fix, Feed Discovery Expansion

Database Performance (Critical)

The SQLite database (2.1GB, 5.6M threats) was running extremely slowly because of several compounding issues. All were fixed without any schema changes.

SQLite Engine Tuning (lib/db.ts)
- Switched from DELETE journal mode to WAL mode — concurrent reads no longer block on writes
- Increased cache from 2MB to 64MB — hot data stays in memory
- Enabled 256MB memory-mapped I/O — reduces disk reads on the 2.1GB file
- Set synchronous = NORMAL (safe with WAL, faster than FULL)
- Set temp_store = MEMORY for temp tables
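
The settings above correspond to standard SQLite PRAGMA statements. The following is a rough sketch only, not the actual contents of lib/db.ts: the helper names buildPragmas and applyPragmas and the exec callback are hypothetical, and only the PRAGMA names and values are taken from the list above.

```typescript
// Hypothetical helpers; the real lib/db.ts may apply these settings differently.
type ExecFn = (sql: string) => void;

const BYTES_PER_MIB = 1024 * 1024;

function buildPragmas(cacheMiB: number = 64, mmapMiB: number = 256): string[] {
  return [
    "PRAGMA journal_mode = WAL;",
    // A negative cache_size is interpreted by SQLite as a size in KiB.
    `PRAGMA cache_size = -${cacheMiB * 1024};`,
    // mmap_size is expressed in bytes.
    `PRAGMA mmap_size = ${mmapMiB * BYTES_PER_MIB};`,
    "PRAGMA synchronous = NORMAL;",
    "PRAGMA temp_store = MEMORY;",
  ];
}

// Runs each statement through whatever executor the database client exposes.
function applyPragmas(exec: ExecFn): void {
  for (const statement of buildPragmas()) {
    exec(statement);
  }
}
```

Note that journal_mode = WAL is stored in the database file and persists, while cache_size, mmap_size, synchronous, and temp_store apply per connection and need to be reissued whenever a new connection is opened.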

Query Optimizations
- /api/v1/stats: Replaced 10 separate LIKE scans (about 56M rows scanned per refresh) with a single UNION ALL query, and increased the cache TTL from 5min to 15min. This endpoint is the Docker healthcheck, polled every 30 seconds, so every cache miss hit the DB with the full set of scans.
- /api/v1/analytics (categories): Replaced a findMany() call that loaded all 5.6M rows into Node.js memory with a raw SQL count query. Added 10-minute result caching for all analytics endpoints.
- /api/v1/analytics (geographic): Added LIMIT 200 to a previously unbounded JOIN across IpEnrichment (1.5M rows) and ThreatEntry (5.6M rows).
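
The result caching mentioned above can be illustrated with a small in-memory TTL cache. This is a minimal sketch under assumptions: the class name TtlCache, its methods, and the injectable clock are hypothetical and not taken from the codebase; only the 15-minute and 10-minute TTLs come from the notes above.

```typescript
// Hypothetical in-memory TTL cache; not the project's actual caching layer.
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh; otherwise recomputes and stores it.
  getOrCompute(key: string, compute: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && this.now() < hit.expiresAt) {
      return hit.value;
    }
    const value = compute();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// TTLs taken from the changelog entries above.
const STATS_TTL_MS = 15 * 60 * 1000;
const ANALYTICS_TTL_MS = 10 * 60 * 1000;
```

With a 15-minute TTL and a healthcheck every 30 seconds, roughly one request in thirty would reach the database; the rest are served from memory.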

Webhooks (Fixed)

Webhooks were effectively non-functional: the delivery system was fully built, but events were triggered from only one place in the codebase.

Webhook Triggers Wired Up
- Feed ingestion cron — fires threat_added / threat_updated after batch ingestion
- AbuseIPDB auto-add — fires threat_added when a lookup auto-creates a threat
- False positive review — fires threat_updated when admin confirms a false positive (confidence changes)
- Scanner auto-detection — fires threat_added when a scanner IP is auto-added to the DB
- Report review — now fires report_status for rejections and duplicates (previously only fired threat_verified)

Retry Logic Fixed (lib/webhooks.ts)
- Failed deliveries with permanent HTTP errors (401, 403, 404, etc.) now correctly marked as failed instead of pending — stops infinite retry loops
- Added isPermanentFailure() helper — 4xx errors (except 408/429) are treated as permanent
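
The classification rule described above can be sketched as follows. The signature of isPermanentFailure() in lib/webhooks.ts is not shown in these notes, so the one below is an assumption, as are the nextDeliveryState helper and the "delivered" state name ("failed" and "pending" come from the notes).

```typescript
// Assumed signature; the real helper in lib/webhooks.ts may differ.
function isPermanentFailure(status: number): boolean {
  // 408 (Request Timeout) and 429 (Too Many Requests) are worth retrying.
  if (status === 408 || status === 429) {
    return false;
  }
  // Every other 4xx response is treated as permanent.
  return status >= 400 && status < 500;
}

// Hypothetical state names; "delivered" is an assumption for illustration.
type DeliveryState = "delivered" | "pending" | "failed";

function nextDeliveryState(status: number): DeliveryState {
  if (status >= 200 && status < 300) {
    return "delivered";
  }
  // Permanent errors stop retrying; anything else stays queued for retry.
  return isPermanentFailure(status) ? "failed" : "pending";
}
```

This sketch leaves out any maximum attempt count, which a real retry queue would likely also enforce.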

Webhook Retry Cron
- New API route: /api/cron/webhook-retry
- Added to vercel.json crons (runs every minute)
- Processes pending retries with exponential backoff
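
One way a per-minute cron could decide which pending deliveries are due is sketched below. The base delay, the cap, and the function names are all assumptions chosen for illustration; the notes above state only that exponential backoff is used.

```typescript
// Hypothetical backoff parameters; the actual values are not given in these notes.
const BASE_DELAY_MS = 60 * 1000; // one cron tick
const MAX_DELAY_MS = 60 * 60 * 1000; // cap the wait at one hour

// Delay before the next try, doubling with each failed attempt (attempt starts at 1).
function retryDelayMs(attempt: number): number {
  const exponent = Math.max(0, attempt - 1);
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, exponent));
}

// A delivery is due once its backoff window has elapsed since the last attempt.
function isRetryDue(lastAttemptAt: number, attempt: number, now: number): boolean {
  return now >= lastAttemptAt + retryDelayMs(attempt);
}
```

Because the cron fires every minute, each run only needs to pick up the deliveries for which isRetryDue returns true and leave the rest for a later tick.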

Auto-Ban Persistence (Fixed)

Scanner auto-bans were stored only in memory — lost on every restart/deploy.

- New internal endpoint /api/internal/persist-ban writes bans to the existing BannedIp database table
- Middleware syncs its in-memory ban cache from DB every 60 seconds
- Bans now survive container restarts and work across instances
- Internal routes (/api/internal/*) skipped by middleware to prevent recursion
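
The sync behaviour described above might look roughly like the following. This is a sketch under assumptions: the BanCache class, its method names, and shouldSkipMiddleware are hypothetical, and the database read itself is left to the caller.

```typescript
const BAN_SYNC_INTERVAL_MS = 60 * 1000;

// Hypothetical in-memory ban cache that is periodically refreshed from the BannedIp table.
class BanCache {
  private ips = new Set<string>();
  private lastSyncAt = -Infinity;

  // True when at least 60 seconds have passed since the last sync (or none has happened yet).
  needsSync(now: number): boolean {
    return now - this.lastSyncAt >= BAN_SYNC_INTERVAL_MS;
  }

  // Replaces the cache with the IPs the caller just read from the database.
  replaceFromDb(bannedIps: string[], now: number): void {
    this.ips = new Set(bannedIps);
    this.lastSyncAt = now;
  }

  isBanned(ip: string): boolean {
    return this.ips.has(ip);
  }
}

// Internal routes bypass ban handling so that persisting a ban cannot re-enter the middleware.
function shouldSkipMiddleware(pathname: string): boolean {
  return pathname.startsWith("/api/internal/");
}
```

Since every instance refreshes from the same table, a ban recorded by one instance becomes visible to the others within one sync interval.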

Feed Discovery Expansion

Discovery sources expanded from 3 to 7, with much broader coverage.

New Discovery Sources
- 22 GitHub repos crawled (up from 2) — awesome-lists, IOC repos, APT report repos, blocklist collections
- GitHub Topics API — searches 9 topic tags (threat-intelligence-feeds, blocklist, ioc-feeds, etc.), crawls top repos, checks common feed file paths
- MISP default feeds — parses MISP's official feed registry for freetext/CSV feeds
- abuse.ch index — 16 feeds across URLhaus, ThreatFox, SSL Blacklist, Malware Bazaar, Feodo Tracker

New Hardcoded Feeds (~30 added)
- C2 tracker feeds: Cobalt Strike, Metasploit, Sliver, Havoc, BruteRatel, PoshC2
- C2IntelFeeds (IPs and domains, 30-day windows)
- Additional IPSum levels (2, 4, 6, 7)
- DataPlane.org feeds (SSH, DNS, VNC)
- Additional phishing/scam domain lists
- More malware hash feeds (MD5 + SHA256 from Malware Bazaar and ThreatFox)
- Aggressive SSL blacklist

URL Extraction Improvements
- Stopped skipping .json files — many modern feeds are JSON format
- Added patterns for .csv, .netset, .ipset, .zone, .rules, /export/, /ioc/, /indicators/
- Reduced false-positive skips, so fewer valid feed URLs are filtered out
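
The extension and path patterns listed above could be checked along these lines. This is an illustrative sketch, not the project's extraction code: the function name looksLikeFeedUrl and the matching strategy are assumptions, and only the patterns themselves come from the list.

```typescript
// Patterns taken from the list above; the matching logic is a hypothetical sketch.
const FEED_EXTENSIONS = [".json", ".csv", ".netset", ".ipset", ".zone", ".rules"];
const FEED_PATH_SEGMENTS = ["/export/", "/ioc/", "/indicators/"];

function looksLikeFeedUrl(rawUrl: string): boolean {
  // Ignore any query string or fragment, and compare case-insensitively.
  const cut = rawUrl.search(/[?#]/);
  const path = (cut === -1 ? rawUrl : rawUrl.slice(0, cut)).toLowerCase();

  if (FEED_EXTENSIONS.some((ext) => path.endsWith(ext))) {
    return true;
  }
  return FEED_PATH_SEGMENTS.some((segment) => path.includes(segment));
}
```

A match here only marks a URL as a candidate; whether it is a usable feed still depends on fetching and parsing it.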

Automated Feed Discovery Agent

Scheduled a Claude Code remote agent ("EvilDB Feed Hunter") that runs daily at midnight UTC and:
1. Searches the web and GitHub for new threat intel feeds
2. Validates discovered URLs by fetching and parsing them
3. Appends verified feeds to data/discovered-feeds.json
4. Commits results back to the repo
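
The append step could be made idempotent with a merge like the one below, so repeated daily runs do not duplicate entries. The record shape of data/discovered-feeds.json is not documented in these notes; the DiscoveredFeed interface and the appendVerifiedFeeds function are hypothetical and exist only for illustration.

```typescript
// Hypothetical record shape; the real file format may differ.
interface DiscoveredFeed {
  url: string;
  discoveredAt: string; // ISO 8601 date string
}

// Returns a new list with unseen URLs appended, leaving existing entries untouched.
function appendVerifiedFeeds(
  existing: DiscoveredFeed[],
  verifiedUrls: string[],
  discoveredAt: string,
): DiscoveredFeed[] {
  const known = new Set(existing.map((feed) => feed.url));
  const merged = existing.slice();
  for (const url of verifiedUrls) {
    if (known.has(url)) {
      continue; // already recorded on a previous run, or repeated in this batch
    }
    known.add(url);
    merged.push({ url, discoveredAt });
  }
  return merged;
}
```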