Security standard operating procedures (SOPs)

Validator “Konomic” | Version 1.1 | Last Updated: June 16, 2025

1. Purpose and Scope

This document defines how security, operations, and monitoring are organized for the Konomic validator running on the Solana Mainnet-Beta network.

The policy applies to:

  • The primary validator node (vote account) and backup nodes
  • Ancillary services (RPC-relay, monitoring, off-chain storage)
  • Employees and automated processes with access to keys and configuration

2. Key Management

Access to the HSM is restricted to two senior operators. No keys are stored on internet-connected instances; authorization is performed via offline-signed transactions.

  • Vote & Identity — Storage: FIPS 140-3 HSM in an offline segment; Backup: air-gapped laptop (LUKS, TPM); Rotation: scheduled every 18 months, or immediately upon suspicion of compromise.
  • Authorized Voter — Storage: FIPS 140-3 HSM in an offline segment; Backup: air-gapped laptop (LUKS, TPM); Rotation: with each protocol upgrade.
  • Authorized Withdrawer — Storage: FIPS 140-3 HSM in an offline segment; Backup: encrypted microSD card stored in a safe, plus an encrypted image in Cloud Secure Store; Rotation: scheduled every 18 months, or immediately upon suspicion of compromise.
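The 18-month scheduled rotation cadence above can be tracked with a small helper. A minimal sketch; the interval is taken from the table (approximated as 548 days), while the function names are illustrative, not part of any existing tooling:

```python
from datetime import date, timedelta

# Scheduled rotation interval from the key-management table above:
# every 18 months, approximated here as 548 days.
ROTATION_INTERVAL = timedelta(days=548)

def next_rotation(last_rotated: date) -> date:
    """Return the next scheduled rotation date for a key."""
    return last_rotated + ROTATION_INTERVAL

def rotation_due(last_rotated: date, today: date) -> bool:
    """True if the scheduled rotation deadline has passed.

    Emergency rotation (suspected compromise) is handled out of band
    and is not modeled here.
    """
    return today >= next_rotation(last_rotated)
```

A key last rotated on 2023-01-01 would be flagged as overdue well before this document's last-updated date, while one rotated in 2025 would not.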

3. Infrastructure and Network

  • Data Center: Tier III facility in Frankfurt; backup: Iron Mountain AMS-1.
  • Segmentation: Validator hosted on a dedicated cluster of 3 servers in a 46U rack; public RPCs are isolated in a separate subnet with no outbound rules to the validator.
  • Perimeter Protection: Multi-layer stateful firewall, GeoIP filtering, and rate-limiting.
  • DDoS Mitigation: Automatic switching of external Anycast address upon detection of anomalous traffic.
  • Hardware: Each server runs a 64-vCPU AMD EPYC 9555, 1024 GB DDR5-6000 RAM, and 8× 1.92 TB enterprise NVMe SSDs in RAID-1/ZFS mirrors (16 drives total); all drives use hardware encryption.
  • Redundancy: 2N redundancy ensures resilience to simultaneous failures and scheduled maintenance.

4. Software Lifecycle

  • Testing: Each client release is tested on a private DevNet cluster for a minimum of 6 hours.
  • Update: Minor versions (N.N.x) within 24 hours of the Foundation release; critical CVEs within 4 hours.
  • Rollback: Hot slot snapshots are maintained; downtime ≤ 5 minutes.
  • Modifications: No kernel patches or non-standard builds are applied.
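The patch-window SLAs above (24 hours for minor releases, 4 hours for critical CVEs) can be expressed as a deadline helper. A sketch under the stated SLAs; the severity labels and function name are illustrative:

```python
from datetime import datetime, timedelta

# Patch windows from the software-lifecycle table above.
SLA = {
    "minor": timedelta(hours=24),        # version N.N.x after Foundation release
    "critical_cve": timedelta(hours=4),  # critical CVE fix
}

def patch_deadline(release_time: datetime, severity: str) -> datetime:
    """Deadline by which the validator must be running the new release."""
    return release_time + SLA[severity]
```

For a critical CVE published at 12:00, the node must be patched by 16:00 the same day.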

5. Monitoring and Alerting

  • Metrics: skip-rate, vote-credit, ping-latency, CPU/IO-wait, ledger size.
  • System: Prometheus → Alertmanager → Telegram/SMS; backup: email gateway.
  • Thresholds:
    • skip-rate > 5% for 5 minutes: critical alert;
    • 32 consecutive slot misses: emergency node restart;
    • CPU > 85% or IO-wait > 30%: preemptive failover.
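The threshold rules above map directly to actions; the decision logic can be sketched as follows. This is an illustration of the policy only, not the actual Alertmanager configuration, and the metric field and action names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    skip_rate: float              # fraction, e.g. 0.06 == 6 %
    skip_rate_minutes: int        # minutes the skip-rate has been elevated
    consecutive_slot_misses: int
    cpu_util: float               # fraction of CPU in use
    io_wait: float                # fraction of time in IO-wait

def evaluate(m: Metrics) -> list[str]:
    """Apply the Section 5 thresholds and return the triggered actions."""
    actions = []
    if m.skip_rate > 0.05 and m.skip_rate_minutes >= 5:
        actions.append("critical_alert")
    if m.consecutive_slot_misses >= 32:
        actions.append("emergency_restart")
    if m.cpu_util > 0.85 or m.io_wait > 0.30:
        actions.append("preemptive_failover")
    return actions
```

For example, a 6% skip-rate sustained for 6 minutes triggers only the critical alert, while 32 missed slots combined with 90% CPU triggers both a restart and a failover.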

6. Incident Response

  1. Detection (T₀): Automatic alert.
  2. Classification (T₀ + 5 min): Operator assigns priority (P1/P2).
  3. Containment: Activation of backup validator; primary node switched to read-only.
  4. Resolution: Patch/restart/configuration change.
  5. Recovery: Slot synchronization, ledger consistency validation.
  6. Post-Mortem (≤ 48 hours): Root cause analysis, publication of internal report.

RTO ≤ 30 minutes, RPO ≈ 0 slots.
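Compliance with the RTO target can be verified after each incident from the detection and recovery timestamps recorded in steps 1 and 5. A minimal sketch; the function name is illustrative, and RPO (≈ 0 slots) is left out since it is measured in slots rather than time:

```python
from datetime import datetime, timedelta

# Recovery time objective from Section 6.
RTO = timedelta(minutes=30)

def rto_met(detected_at: datetime, recovered_at: datetime) -> bool:
    """True if the incident was resolved within the 30-minute RTO."""
    return recovered_at - detected_at <= RTO
```

An incident detected at 12:00 and recovered at 12:25 meets the target; recovery at 12:45 does not and must be flagged in the post-mortem.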

7. Compliance and Audit

  • Internal Audit: Quarterly, based on CIS Benchmark + NIST 800-53 checklist (42 selected items).
  • Penetration Testing: Annually by an independent group; report available to delegators upon signing an NDA.
  • Certifications: The data center is certified to ISO 27001/9001/14001/50001, PCI DSS, and SOC 2/3, as confirmed by the colocation contract.