Service Level Agreements & Objectives

AegisGate Security Platform publishes transparent service level commitments for every tier. These aren't aspirational: each one can be checked against the /health, /metrics, and /api/v1/sla endpoints.

SLA by Tier

| Commitment | Community | Developer | Professional | Enterprise |
| --- | --- | --- | --- | --- |
| Uptime Target | 99.0% | 99.9% | 99.95% | 99.99% |
| Support Response | Best-effort (community) | 24 hours | 4 hours | 1 hour |
| Support Channel | GitHub Issues | Email + Priority Issues | Dedicated channel | 24/7 + Slack |
| Incident Response | No SLA | 8h (P1), 24h (P2) | 2h (P1), 8h (P2) | 30min (P1), 2h (P2) |
| Data Retention | 7 days | 90 days | 365 days | Custom |
| Allowed Downtime/Year | ~3.7 days | ~8.7 hours | ~4.4 hours | ~52 minutes |
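
The downtime row follows directly from the uptime targets. As a rough sketch, assuming a 365-day year (the function name and the tier dictionary below are illustrative, not part of AegisGate), the arithmetic looks like this:

```python
from datetime import timedelta


def allowed_downtime(uptime_percent: float,
                     period: timedelta = timedelta(days=365)) -> timedelta:
    """Return the downtime budget implied by an uptime target over a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The budget is simply the fraction of the period that may be unavailable.
    return period * ((100.0 - uptime_percent) / 100.0)


# Uptime targets copied from the tier table above.
TIER_TARGETS = {
    "Community": 99.0,
    "Developer": 99.9,
    "Professional": 99.95,
    "Enterprise": 99.99,
}

if __name__ == "__main__":
    for tier, target in TIER_TARGETS.items():
        hours = allowed_downtime(target).total_seconds() / 3600
        print(f"{tier}: {hours:.2f} hours/year")
```

Before rounding, the budgets work out to 3.65 days, 8.76 hours, 4.38 hours, and about 52.6 minutes. Passing a shorter `period` (for example `timedelta(days=30)`) gives the budget for a monthly window instead.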

Service Level Objectives

We measure these SLOs continuously via Prometheus metrics exported at /metrics:

| SLO | Target | Window | Metric |
| --- | --- | --- | --- |
| API Availability | ≥99.5% | 30d | successful_requests / total_requests |
| Request Latency (P99) | ≤100ms | 30d | histogram_quantile(0.99, request_duration_seconds) |
| MCP Session Availability | ≥99.9% | 30d | active_mcp_sessions / attempted_sessions |
| A2A Auth Success Rate | ≥99.9% | 30d | a2a_auth_success / a2a_auth_total |
| Guardrail Enforcement Rate | 100% | 30d | guardrail_enforced / guardrail_evaluated |
| Audit Log Completeness | 100% | 30d | audit_entries / total_requests |

Why 100% for Guardrails and Audit?

AegisGate is a security product. If a guardrail fails to enforce or an audit log entry goes missing, that’s a security incident — not a performance issue. These SLOs are set to 100% because anything less means a threat could pass through undetected.

The fail-closed architecture ensures that any guardrail error, timeout, or internal failure results in a deny rather than an allow. A denied request still counts as an enforced decision, which is how this design supports the 100% enforcement target.
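
To make the fail-closed idea concrete, here is a minimal sketch of the pattern in Python. It is not AegisGate's implementation: `enforce_fail_closed`, the `check` callback, and the default timeout are all hypothetical names and values chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

ALLOW = "allow"
DENY = "deny"


def enforce_fail_closed(check: Callable[[str], bool], payload: str,
                        timeout_s: float = 0.5) -> str:
    """Return ALLOW only when the check explicitly passes in time; else DENY."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        passed = pool.submit(check, payload).result(timeout=timeout_s)
        # Anything other than an explicit True is treated as a failure.
        return ALLOW if passed is True else DENY
    except Exception:
        # Covers exceptions raised by the check and the timeout from result().
        return DENY
    finally:
        # Do not block the caller on a check that is still running.
        pool.shutdown(wait=False)


def slow_check(payload: str) -> bool:
    """Hypothetical guardrail that takes too long to answer."""
    time.sleep(0.3)
    return True
```

The key design choice is that the allow path is the only one that requires positive evidence. Every other outcome, including ones the author did not anticipate, falls through to a deny.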

Health Check

The /health endpoint verifies all critical dependencies:

{
  "status": "healthy",
  "tier": "community",
  "version": "2.0.1",
  "checks": {
    "proxy": { "enabled": true, "healthy": true },
    "persistence": { "enabled": true, "started": true, "healthy": true },
    "license": { "valid": true, "tier": "community", "healthy": true },
    "certificates": { "valid": true, "healthy": true }
  }
}

If any dependency is unhealthy, the health check returns "status": "unhealthy" with HTTP 503.
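
A monitoring script can interpret this response without knowing the individual checks in advance. The sketch below assumes only the response shape shown above; the helper names `unhealthy_checks` and `is_healthy` are illustrative, and fetching the response over HTTP is left to the caller.

```python
import json


def unhealthy_checks(health: dict) -> list[str]:
    """Names of entries under "checks" whose "healthy" flag is not true."""
    checks = health.get("checks", {})
    return sorted(
        name for name, result in checks.items()
        if not (isinstance(result, dict) and result.get("healthy") is True)
    )


def is_healthy(status_code: int, body: str) -> bool:
    """Treat the service as healthy only on HTTP 200 with a clean payload."""
    if status_code != 200:
        return False  # the endpoint answers 503 when a dependency is unhealthy
    try:
        health = json.loads(body)
    except json.JSONDecodeError:
        return False  # an unparseable body is treated as unhealthy
    if not isinstance(health, dict):
        return False
    return health.get("status") == "healthy" and not unhealthy_checks(health)
```

Calling `unhealthy_checks` on a 503 body is a quick way to see which dependency caused the failure.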

SLA API

The /api/v1/sla endpoint returns the SLA and SLOs for the current tier:

curl -s http://localhost:8443/api/v1/sla | jq
{
  "tier": "community",
  "uptime_target": 99,
  "support_response": "Best-effort (community forum)",
  "support_channel": "GitHub Issues",
  "incident_response": "No SLA — best effort",
  "data_retention_days": 7,
  "slos": [
    {
      "name": "api_availability",
      "target": 99.5,
      "window": "30d",
      "metric": "successful_requests / total_requests",
      "description": "API request success rate"
    }
  ]
}
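
The `slos` array can be compared against your own measurements. The following is a sketch under two assumptions: the response has the shape shown above, and every target is a "higher is better" percentage (a latency SLO would need the opposite comparison). The `Slo` class and both function names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    name: str
    target: float  # percentage, e.g. 99.5
    window: str    # e.g. "30d"


def parse_slos(sla_json: str) -> list[Slo]:
    """Extract the "slos" array from an /api/v1/sla response body."""
    doc = json.loads(sla_json)
    return [
        Slo(s["name"], float(s["target"]), s["window"])
        for s in doc.get("slos", [])
    ]


def violated(slos: list[Slo], measured: dict[str, float]) -> list[str]:
    """Names of SLOs whose measured percentage is missing or below target."""
    return [
        s.name for s in slos
        if measured.get(s.name, float("-inf")) < s.target
    ]


# Trimmed copy of the sample response above.
SAMPLE = """
{"tier": "community", "uptime_target": 99, "slos": [
  {"name": "api_availability", "target": 99.5, "window": "30d",
   "metric": "successful_requests / total_requests",
   "description": "API request success rate"}
]}
"""
```

A missing measurement is counted as a violation on purpose, so a broken metrics pipeline cannot silently pass the check.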

Incident Response Priority Levels

| Priority | Definition | Examples |
| --- | --- | --- |
| P1 (Critical) | Security product is down or actively compromised | Proxy down, guardrails bypassed, data breach |
| P2 (High) | Major feature degraded but functional | MCP sessions failing, A2A auth intermittent |
| P3 (Medium) | Minor feature impacted | Dashboard slow, metrics delay |
| P4 (Low) | Cosmetic or informational | UI glitch, documentation error |

Measuring Compliance

AegisGate uses Prometheus metrics to track all SLOs. You can monitor compliance with standard PromQL queries:

# API availability over 30 days (the regex "5.." excludes every 5xx status code)
sum(rate(aegisgate_requests_total{code!~"5.."}[30d]))
/
sum(rate(aegisgate_requests_total[30d]))

# P99 latency
histogram_quantile(0.99, sum(rate(aegisgate_request_duration_seconds_bucket[30d])) by (le))

# Guardrail enforcement rate
sum(rate(aegisgate_guardrails_enforced_total[30d]))
/
sum(rate(aegisgate_guardrails_evaluated_total[30d]))
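
Once you have the raw counts behind these ratios, it is often more useful to express compliance as remaining error budget than as a bare percentage. The sketch below is illustrative only: the function names are hypothetical, and treating a window with zero traffic as 100% available is an assumption you may want to change.

```python
def availability_percent(successful: int, total: int) -> float:
    """Success ratio as a percentage; an empty window counts as 100%."""
    if successful < 0 or total < 0 or successful > total:
        raise ValueError("require 0 <= successful <= total")
    if total == 0:
        return 100.0  # assumption: no traffic means no failures
    return 100.0 * successful / total


def error_budget_remaining(successful: int, total: int,
                           target_percent: float) -> float:
    """Fraction of the error budget left; negative once the SLO is breached."""
    if successful < 0 or total < 0 or successful > total:
        raise ValueError("require 0 <= successful <= total")
    allowed_failures = total * (100.0 - target_percent) / 100.0
    actual_failures = total - successful
    if allowed_failures == 0:
        # A 100% target has no budget: any failure is an immediate breach.
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures
```

For example, 997,000 successful requests out of 1,000,000 against the 99.5% availability target uses 3,000 of the 5,000 allowed failures, leaving 40% of the budget. For the 100% guardrail and audit targets, a single miss exhausts the budget, which matches how those SLOs are described above.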