Security standard operating procedures (SOPs)
Validator “Konomic” | Version 1.1 | Last Updated: June 16, 2025
1. Purpose and Scope
This document defines the security, operations, and monitoring procedures for the Konomic validator running on the Solana Mainnet-Beta network.
The policy applies to:
- The primary validator node (vote account) and backup nodes
- Ancillary services (RPC-relay, monitoring, off-chain storage)
- Employees and automated processes with access to keys and configuration
2. Key Management
Access to the HSM is restricted to two senior operators. No keys are stored on internet-connected instances; signing authority is exercised via offline-signed transactions.
| Key Type | Storage Method | Backup | Rotation |
|---|---|---|---|
| Vote & Identity | FIPS 140-3 HSM in an offline segment | Air-gapped laptop (LUKS, TPM) | Scheduled: every 18 months; emergency: on suspected compromise |
| Authorized Voter | FIPS 140-3 HSM in an offline segment | Air-gapped laptop (LUKS, TPM) | With each protocol upgrade |
| Authorized Withdrawer | FIPS 140-3 HSM in an offline segment | Encrypted microSD card in a safe, plus an encrypted image in Cloud Secure Store | Scheduled: every 18 months; emergency: on suspected compromise |
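The scheduled 18-month rotation cadence can be tracked with a small helper. This is an illustrative sketch only; `next_rotation` is a hypothetical name, not part of our tooling:

```python
from datetime import date

ROTATION_INTERVAL_MONTHS = 18  # scheduled cadence from the table above

def next_rotation(last_rotated: date, months: int = ROTATION_INTERVAL_MONTHS) -> date:
    """Return the next scheduled rotation date, `months` after the last one."""
    total = last_rotated.month - 1 + months
    year = last_rotated.year + total // 12
    month = total % 12 + 1
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days_in_month = [31, 29 if leap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    # Clamp the day for short months (e.g. Jan 31 + 1 month -> Feb 28/29).
    return date(year, month, min(last_rotated.day, days_in_month[month - 1]))

print(next_rotation(date(2025, 6, 16)))  # -> 2026-12-16
```

Emergency rotations on suspected compromise are triggered manually and bypass this schedule.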
3. Infrastructure and Network
- Data Center: Tier III facility in Frankfurt; backup: Iron Mountain AMS-1.
- Segmentation: The validator runs on a dedicated cluster of 3 servers in a 46U rack; public RPC nodes are isolated in a separate subnet with no firewall rules permitting traffic to the validator.
- Perimeter Protection: Multi-layer stateful firewall, GeoIP filtering, and rate-limiting.
- DDoS Mitigation: Automatic switching of external Anycast address upon detection of anomalous traffic.
- Hardware: Each server runs a 64-vCPU AMD EPYC 9555, 1,024 GB of DDR5-6000 RAM, and 8× 1.92 TB enterprise NVMe SSDs in RAID-1 ZFS mirrors (16 drives total); all drives use hardware encryption.
- Redundancy: 2N redundancy keeps the validator available through component failures and scheduled maintenance.
4. Software Lifecycle
| Stage | Process |
|---|---|
| Testing | Each client release is tested on a private DevNet cluster for a minimum of 6 hours. |
| Update | Minor releases (N.N.x): within 24 hours of the Foundation release. Critical CVEs: within 4 hours. |
| Rollback | Hot slot snapshot supported; downtime ≤ 5 minutes. |
| Modifications | No kernel patches or non-standard builds are applied. |
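The update SLAs in the table map naturally onto a deadline check. The category names below are our own illustrative labels, not release-channel identifiers:

```python
from datetime import datetime, timedelta

# SLA windows from the lifecycle table above
PATCH_SLA = {
    "minor_release": timedelta(hours=24),  # N.N.x releases
    "critical_cve": timedelta(hours=4),    # critical security fixes
}

def patch_deadline(released_at: datetime, kind: str) -> datetime:
    """Latest acceptable deployment time for a given release category."""
    return released_at + PATCH_SLA[kind]

def sla_met(released_at: datetime, deployed_at: datetime, kind: str) -> bool:
    """True if the update was deployed within its SLA window."""
    return deployed_at <= patch_deadline(released_at, kind)
```

A check like this can run as part of the post-deployment report to flag SLA breaches.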
5. Monitoring and Alerting
- Metrics: skip-rate, vote-credit, ping-latency, CPU/IO-wait, ledger size.
- System: Prometheus → Alertmanager → Telegram/SMS, with an email gateway as backup.
- Thresholds:
  - skip-rate > 5% for 5 minutes: critical alert;
  - 32 consecutive slot misses: emergency node restart;
  - CPU > 85% or IO-wait > 30%: preemptive failover.
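The thresholds above can be expressed as a pure evaluation function. This is a sketch only; the metric names and the one-sample-per-minute assumption are ours, not the actual Prometheus rule definitions:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """One monitoring sample (assumed: one per minute)."""
    skip_rate: float         # fraction of leader slots skipped, 0.0-1.0
    consecutive_misses: int  # consecutive slot misses at sample time
    cpu: float               # CPU utilisation, 0.0-1.0
    io_wait: float           # IO-wait fraction, 0.0-1.0

def evaluate(window: list[Sample]) -> list[str]:
    """Map a 5-sample (5-minute) window onto the actions listed above."""
    if not window:
        return []
    actions = []
    # skip-rate > 5% sustained across the whole window -> critical alert
    if all(s.skip_rate > 0.05 for s in window):
        actions.append("critical-alert")
    latest = window[-1]
    if latest.consecutive_misses >= 32:
        actions.append("emergency-restart")
    if latest.cpu > 0.85 or latest.io_wait > 0.30:
        actions.append("preemptive-failover")
    return actions
```

Keeping the decision logic pure (no side effects) lets the same rules be replayed against historical metrics during post-mortems.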
6. Incident Response
- Detection (T₀): Automatic alert.
- Classification (T₀ + 5 min): Operator assigns priority (P1/P2).
- Containment: Activation of backup validator; primary node switched to read-only.
- Resolution: Patch/restart/configuration change.
- Recovery: Slot synchronization, ledger consistency validation.
- Post-Mortem (≤ 48 hours): Root cause analysis, publication of internal report.
Targets: RTO ≤ 30 minutes; RPO ≈ 0 slots.
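The response targets above translate into a simple post-incident check. Helper names are hypothetical; the timestamps would come from the incident log:

```python
from datetime import datetime, timedelta

RTO = timedelta(minutes=30)                   # recovery time objective
CLASSIFICATION_WINDOW = timedelta(minutes=5)  # operator must classify by T0 + 5 min

def rto_met(detected_at: datetime, recovered_at: datetime) -> bool:
    """True if full recovery completed within the RTO after detection (T0)."""
    return recovered_at - detected_at <= RTO

def classified_on_time(detected_at: datetime, classified_at: datetime) -> bool:
    """True if the operator assigned a P1/P2 priority within the 5-minute window."""
    return classified_at - detected_at <= CLASSIFICATION_WINDOW
```

Running these checks automatically during the ≤ 48-hour post-mortem keeps the published timeline honest.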
7. Compliance and Audit
- Internal Audit: Quarterly, based on CIS Benchmark + NIST 800-53 checklist (42 selected items).
- Penetration Testing: Annually by an independent group; report available to delegators upon signing an NDA.
- Certifications: Data center certified to ISO 27001/9001/14001/50001, PCI-DSS, SOC 2/3, as confirmed by contract.