Kubernetes Conformance Audit

The Kubernetes conformance audit system ensures that new API endpoints reaching General Availability (GA) are properly covered by conformance tests, preventing technical debt accumulation and maintaining API testing integrity across the project.

Overview

The conformance audit process automatically tracks API endpoint usage through audit logs and compares it against conformance test coverage, flagging gaps that need attention from contributors and maintainers.

Key Components

Swagger/OpenAPI Specification: The authoritative definition of Kubernetes APIs in swagger.json
Audit Log Analysis: Scripts that parse Kubernetes audit logs to identify API endpoint usage
Endpoint Tracking Files: YAML files that categorize endpoints as pending, ineligible, or conformance-ready
CI Jobs: Automated jobs that run the analysis and report on compliance

Conformance Audit Workflow

The conformance audit system operates through a continuous integration pipeline that validates API endpoint coverage:

┌─────────────────┐    ┌────────────────┐    ┌──────────────────┐
│   swagger.json  │────│ Periodic CI    │────│ Baseline Audit   │
│   Changes       │    │ Job Runs       │    │ Data Generated   │
└─────────────────┘    └────────────────┘    └──────────────────┘
                                                       │
┌─────────────────┐    ┌────────────────┐    ┌──────────────────┘
│ Pull Request    │────│ Presubmit CI   │────│ 
│ Submitted       │    │ Job Triggers   │    │
└─────────────────┘    └────────────────┘    │
                                             ▼
┌─────────────────┐    ┌────────────────┐    ┌──────────────────┐
│ Endpoint Files  │◄───│ Analysis &     │◄───│ Comparison       │
│ Updated         │    │ Validation     │    │ Against Baseline │
└─────────────────┘    └────────────────┘    └──────────────────┘

Detailed Process Flow

API Development: Developers add new endpoints to the Kubernetes API
Swagger Generation: The OpenAPI specification (swagger.json) is updated automatically
PR Submission: Changes are submitted as pull requests
Trigger Analysis: Modifications to swagger.json trigger the presubmit audit job
Audit Collection: Conformance tests run and generate comprehensive audit logs
Endpoint Mapping: Scripts map audit entries to swagger operation IDs
Baseline Comparison: PR results are compared against the latest periodic job baseline
Gap Identification: New stable endpoints without conformance coverage are flagged
Enforcement: Contributors must categorize endpoints appropriately
Validation: Updated tracking files are validated for completeness

CI Jobs

Periodic Job: `ci-kubernetes-audit-kind-conformance`

Purpose: Establishes the baseline for API endpoint coverage by running conformance tests and generating audit logs.

When it runs: Scheduled periodically to maintain current baseline data

What it does:

Creates a KIND (Kubernetes in Docker) cluster
Runs all 425+ conformance tests
Generates audit logs of API endpoint usage during test execution
Parses logs to create endpoint usage reports

Artifacts generated:

audit*.log: Raw audit log files (~90-100MB) - Complete Kubernetes API audit trail
audit-endpoints.txt: Human-readable endpoint usage summary (~27KB) - Sorted by usage frequency
audit-operations.json: JSON mapping of operations to audit entries (~2MB) - Sample data for each operation
policy.yaml: Audit policy configuration - Defines which events to capture

Example audit-endpoints.txt output:

Total unique endpoints: 714
Total calls: 87,241

readCoreV1NamespacedPodStatus: 15234 calls
listCoreV1NamespacedEvent: 8921 calls
createCoreV1NamespacedEvent: 7345 calls
patchCoreV1NamespacedPod: 5678 calls
...
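The summary format above is simple enough to read back into a dictionary. The following is a minimal sketch, assuming each per-operation line has the shape `operationId: N calls` as in the example; the function name `parse_endpoint_summary` is hypothetical and not part of the audit scripts.

```python
import re

# Matches lines such as "readCoreV1NamespacedPodStatus: 15234 calls".
# Header lines like "Total calls: 87,241" do not match and are skipped.
LINE_RE = re.compile(r"^(?P<op>\w+):\s+(?P<count>[\d,]+)\s+calls$")


def parse_endpoint_summary(text):
    """Return {operation_id: call_count} for every per-operation line."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("op")] = int(match.group("count").replace(",", ""))
    return counts


SAMPLE_SUMMARY = """Total unique endpoints: 714
Total calls: 87,241

readCoreV1NamespacedPodStatus: 15234 calls
listCoreV1NamespacedEvent: 8921 calls
"""

summary_counts = parse_endpoint_summary(SAMPLE_SUMMARY)
```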

Example audit-operations.json structure:

{
  "readCoreV1NamespacedPodStatus": {
    "count": 15234,
    "sample": {
      "verb": "get",
      "uri": "/api/v1/namespaces/default/pods/test-pod/status",
      "user": "system:serviceaccount:default:test-runner"
    }
  }
}
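A file with this structure can be inspected with the standard library alone. The sketch below assumes the `count` and `sample.verb` keys shown in the example; `top_operations` is a hypothetical helper, not a function from the audit scripts.

```python
import json


def top_operations(operations_json, limit=3):
    """Rank operations by call count and return (id, count, verb) tuples."""
    data = json.loads(operations_json)
    ranked = sorted(data.items(), key=lambda item: item[1]["count"], reverse=True)
    return [(op, info["count"], info["sample"]["verb"]) for op, info in ranked[:limit]]


SAMPLE_OPERATIONS = json.dumps({
    "listCoreV1NamespacedEvent": {
        "count": 8921,
        "sample": {"verb": "list", "uri": "/api/v1/namespaces/default/events"},
    },
    "readCoreV1NamespacedPodStatus": {
        "count": 15234,
        "sample": {"verb": "get", "uri": "/api/v1/namespaces/default/pods/test-pod/status"},
    },
})

ranked_operations = top_operations(SAMPLE_OPERATIONS)
```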

Monitoring: TestGrid Dashboard

Presubmit Job: `pull-kubernetes-audit-kind-conformance`

Purpose: Analyzes changes in pull requests to identify new API endpoints that need conformance test coverage.

When it runs: Triggered automatically when swagger.json is modified in a pull request

What it does:

Runs the same conformance tests as the periodic job
Compares audit results against the latest periodic job baseline
Identifies newly added, removed, or stable-but-untested endpoints
Validates endpoint categorization in tracking files

Additional analysis:

Highlights API operation differences between PR and baseline
Checks for proper categorization of endpoints
Reports on conformance test gaps

Monitoring: TestGrid Dashboard

Endpoint Classification System

Conformance-Eligible Endpoints

Endpoints that should eventually be covered by conformance tests but are not yet tested.

File: pending_eligible_endpoints.yaml

Contains: API endpoints awaiting conformance test development, typically for:

Recently added GA features
Resource management operations (pod resizing, device classes)
Core API operations not yet covered

Example entries:

# Pod resizing functionality
- readCoreV1NamespacedPodStatus
- patchCoreV1NamespacedPod

# Dynamic Resource Allocation
- createResourceV1NamespacedResourceClaim
- listResourceV1NamespacedResourceClaim
- readResourceV1NamespacedResourceClaim

Entry format: Simple list of operation IDs from swagger.json that correspond to stable API endpoints that should have conformance test coverage but don't yet.

Ineligible Endpoints

Endpoints explicitly excluded from conformance testing for valid technical or policy reasons.

File: ineligible_endpoints.yaml

Categories:

Deprecated endpoints: Soon-to-be-removed functionality
Optional features: Components not required in all Kubernetes distributions (NetworkPolicy, HPA)
Debug features: Development and troubleshooting tools (port forwarding, pod attach)
Administrative endpoints: Operations that distributions may restrict for security
Unstable features: APIs that lack stable implementations across providers

Each entry includes the endpoint name, exclusion reason, and link to relevant issue discussion.

Example entries:

- endpoint: connectCoreV1DeleteNodeProxy
  reason: "Unable to be tested, and likely soon deprecated"
  issue: "https://github.com/kubernetes/kubernetes/issues/12345"

- endpoint: createAuthorizationV1NamespacedLocalSubjectAccessReview
  reason: "Optional feature - not all distros implement RBAC authorization"
  issue: "https://github.com/kubernetes/kubernetes/issues/67890"

- endpoint: connectCoreV1GetNamespacedPodPortforward
  reason: "Explicitly designed to be a debug feature"
  issue: "https://github.com/kubernetes/kubernetes/issues/24680"

Entry format: Structured YAML with endpoint name, justification, and reference to community discussion.
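The entry format lends itself to a mechanical completeness check. The following is a rough sketch, assuming the YAML file has already been loaded into a list of dictionaries by a YAML library of your choice; `validate_ineligible_entries` is a hypothetical name and the real presubmit validation may apply different rules.

```python
REQUIRED_FIELDS = ("endpoint", "reason", "issue")


def validate_ineligible_entries(entries):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        # Every field must be present and a non-blank string.
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing or empty '{field}'")
        # The same endpoint should not be listed twice.
        endpoint = entry.get("endpoint")
        if endpoint in seen:
            problems.append(f"entry {index}: duplicate endpoint '{endpoint}'")
        elif endpoint:
            seen.add(endpoint)
    return problems


GOOD_ENTRY = {
    "endpoint": "connectCoreV1GetNamespacedPodPortforward",
    "reason": "Explicitly designed to be a debug feature",
    "issue": "https://github.com/kubernetes/kubernetes/issues/24680",
}
BAD_ENTRY = {"endpoint": "someEndpoint", "reason": "doesn't work", "issue": ""}
```

Calling `validate_ineligible_entries([GOOD_ENTRY, BAD_ENTRY])` reports only the empty `issue` field of the second entry.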

Audit Analysis Scripts

`audit_log_parser.py`

Purpose: Parses Kubernetes audit logs and maps entries to official API specifications.

Key functions:

Downloads current swagger.json specification
Processes audit log JSON entries
Maps log entries to OpenAPI operation IDs
Generates usage statistics and operation samples

Outputs:

Console report with endpoint counts and matches
audit-endpoints.txt: Sorted list of endpoints with usage counts
audit-operations.json: Sample audit entries for each operation

Command line usage:

# Basic usage with single audit log
python3 audit_log_parser.py --audit-logs audit.log

# Process multiple audit log files
python3 audit_log_parser.py --audit-logs audit-*.log

# Custom swagger URL and sorting
python3 audit_log_parser.py --audit-logs audit.log \
  --swagger-url https://custom.k8s.io/swagger.json \
  --sort count

# Output to a specific directory
python3 audit_log_parser.py --audit-logs audit.log \
  --output results/

Common errors and solutions:

URLError: [SSL: CERTIFICATE_VERIFY_FAILED]: TLS certificate verification failed while downloading the swagger specification
- Solution: Check proxy settings and confirm that the system CA certificates are installed and up to date
json.JSONDecodeError: Expecting value: Malformed JSON in audit log files
- Solution: Verify audit log format and check for truncated files
FileNotFoundError: [Errno 2] No such file or directory: Missing or unreadable audit log files
- Solution: Verify file paths and permissions
KeyError: 'operationId': Unexpected log entry formats
- Solution: Ensure audit logs are from conformance test runs with proper policy
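Truncated or malformed lines are a common cause of the JSON errors listed above. One way to make a parser tolerant of them is sketched below, assuming the audit log contains one JSON event per line; `read_audit_events` is a hypothetical helper and does not describe how audit_log_parser.py itself handles bad input.

```python
import json


def read_audit_events(lines):
    """Parse JSON-lines audit events, collecting bad line numbers instead of failing."""
    events, skipped = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no event
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            skipped.append(number)  # e.g. a line cut off mid-write
            continue
        if isinstance(event, dict):
            events.append(event)
        else:
            skipped.append(number)  # valid JSON, but not an event object
    return events, skipped


SAMPLE_LINES = [
    '{"verb": "get", "requestURI": "/api/v1/namespaces/default/pods"}',
    '{"verb": "li',
]
parsed_events, skipped_lines = read_audit_events(SAMPLE_LINES)
```

Reporting the skipped line numbers makes it easier to tell a single truncated tail line from a log that is corrupt throughout.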

`kubernetes_api_analysis.py`

Purpose: Compares audit results between pull requests and CI baseline to identify API changes.

Key functions:

Downloads latest CI audit data from Google Cloud Storage
Compares local PR audit results against CI baseline
Identifies added, removed, and stable-but-unused operations
Provides detailed change analysis

Analysis types:

New operations: API endpoints introduced in the PR
Removed operations: Previously audited endpoints no longer called
Stable unused: GA endpoints not exercised by conformance tests
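The three analysis types reduce to set differences. The sketch below illustrates the idea, assuming the operation IDs have already been extracted from the PR run, the CI baseline, and the list of stable operations in swagger.json; `compare_operations` is a hypothetical name and the actual script may compute these differently.

```python
def compare_operations(pr_ops, baseline_ops, stable_ops):
    """Classify operation IDs into new, removed, and stable-but-unused."""
    pr_ops, baseline_ops, stable_ops = set(pr_ops), set(baseline_ops), set(stable_ops)
    return {
        # Exercised in the PR run but absent from the baseline.
        "new": sorted(pr_ops - baseline_ops),
        # Present in the baseline but no longer exercised in the PR run.
        "removed": sorted(baseline_ops - pr_ops),
        # Stable operations that the PR run never exercised.
        "stable_unused": sorted(stable_ops - pr_ops),
    }


comparison = compare_operations(
    pr_ops=["listCoreV1NamespacedEvent", "readCoreV1NamespacedPodStatus"],
    baseline_ops=["listCoreV1NamespacedEvent", "createCoreV1NamespacedEvent"],
    stable_ops=["listCoreV1NamespacedEvent", "patchCoreV1NamespacedPod"],
)
```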

Command line usage:

# Basic comparison against latest CI baseline
python3 kubernetes_api_analysis.py

# Use specific baseline file (skip auto-download)
python3 kubernetes_api_analysis.py --baseline-file ci-audit-endpoints.txt

# Custom output directory  
python3 kubernetes_api_analysis.py --output-dir analysis-results/

# Detailed verbose output
python3 kubernetes_api_analysis.py --verbose

Sample output:

Downloading latest CI audit data...
Found CI run: 1961316246318223360 (2025-08-29)

Comparing operations:
- New operations in PR: 9
  * createResourceV1alpha3NamespacedResourceClaim
  * readResourceV1alpha3NamespacedResourceClaim
  * listResourceV1alpha3ResourceSlice
  
- Removed operations: 0
- Stable operations not exercised: 127

Common errors and solutions:

Command 'gsutil' not found: Missing gsutil CLI tool
- Solution: Install Google Cloud SDK or use --baseline-file option
AccessDenied: 403 Insufficient Permission: Permissions issues accessing GCS buckets
- Solution: Run gcloud auth login or use service account authentication
ConnectionError: HTTPSConnectionPool: Network connectivity problems
- Solution: Check firewall/proxy settings for GCS access
swagger.json parsing failed: Swagger specification parsing failures
- Solution: Verify swagger.json format and ensure it's accessible

Step-by-Step Resolution Guides

Marking Endpoints as Ineligible

Problem: Your PR adds endpoints that shouldn't be part of conformance testing.

Step-by-step resolution:

Document the reasoning:

# Add to ineligible_endpoints.yaml
- endpoint: connectCoreV1GetNamespacedPodExec
  reason: "Debug feature requiring special cluster permissions"
  issue: "https://github.com/kubernetes/kubernetes/issues/12345"

Create or reference issue:

Title: Conformance exemption: Pod exec endpoint

Body: The connectCoreV1GetNamespacedPodExec endpoint should be 
exempt from conformance testing because:

1. It's primarily a debug/development feature
2. Requires special RBAC permissions that conformance tests avoid
3. Implementation varies significantly across distributions

Follow established patterns:
- Review existing ineligible entries for similar reasoning
- Ensure your exemption follows community precedent
- Get SIG approval for controversial exemptions

Enforcement Process

For Pull Request Authors

When the presubmit job identifies issues, contributors must follow the appropriate scenario guide above. The system enforces:

Stable endpoints not in pending_eligible_endpoints.yaml: Must be added to pending list or have conformance tests
Endpoints in pending_eligible_endpoints.yaml that are now tested: Must be removed from pending list
New ineligible endpoints: Must be documented in ineligible list with justification

Error Messages and Resolution

"Stable endpoint X not found in pending_eligible_endpoints.yaml"

Cause: New GA endpoint lacks conformance test coverage
Resolution: Add endpoint to pending list or create conformance test

"Endpoint X found in pending_eligible_endpoints.yaml but is being tested"

Cause: Endpoint now has conformance test coverage
Resolution: Remove from pending list as no longer needed
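The two error messages follow directly from comparing the tracking files with test coverage. A minimal sketch of that logic is shown here, assuming four collections of operation IDs are already available; `check_tracking` is a hypothetical function, and the real presubmit may apply further rules.

```python
def check_tracking(stable_ops, tested_ops, pending, ineligible):
    """Return the enforcement errors for the given operation ID collections."""
    stable_ops, tested_ops = set(stable_ops), set(tested_ops)
    pending, ineligible = set(pending), set(ineligible)
    errors = []
    # Stable endpoints must be tested, pending, or explicitly ineligible.
    for op in sorted(stable_ops - tested_ops - pending - ineligible):
        errors.append(f"Stable endpoint {op} not found in pending_eligible_endpoints.yaml")
    # Pending endpoints that gained coverage must leave the pending list.
    for op in sorted(pending & tested_ops):
        errors.append(
            f"Endpoint {op} found in pending_eligible_endpoints.yaml but is being tested"
        )
    return errors


tracking_errors = check_tracking(
    stable_ops=["readCoreV1NamespacedPod", "patchCoreV1NamespacedPod"],
    tested_ops=["readCoreV1NamespacedPod"],
    pending=["readCoreV1NamespacedPod"],
    ineligible=[],
)
```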

Technical Implementation Details

Audit Log Collection

The conformance audit system relies on comprehensive Kubernetes audit logging during test execution:

Audit Policy Configuration:

# policy.yaml - Captures requests in the listed namespaces and API groups
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Request
  namespaces: ["default", "kube-system"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
  resources:
  - group: ""
    resources: ["*"]
  - group: "apps"
    resources: ["*"]

KIND Cluster Configuration:

# kind-config.yaml - Enables audit logging
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        audit-log-path: /var/log/audit.log
        audit-policy-file: /etc/kubernetes/audit-policy.yaml
        audit-log-maxage: "30"
        audit-log-maxbackup: "3"
        audit-log-maxsize: "100"
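      extraVolumes:
      - name: audit-policy
        hostPath: /etc/kubernetes/audit-policy.yaml
        mountPath: /etc/kubernetes/audit-policy.yaml
        readOnly: true
        pathType: File
  # Copy the policy from the host into the node container so the hostPath
  # volume above can find it; assumes policy.yaml sits next to this config
  extraMounts:
  - hostPath: ./policy.yaml
    containerPath: /etc/kubernetes/audit-policy.yaml
    readOnly: true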

Script Architecture

audit_log_parser.py workflow:

Download swagger.json: Fetches latest OpenAPI specification
Parse operation mappings: Builds lookup table of operationId → HTTP method/path
Process audit logs: Parses each JSON log entry
Map to operations: Matches audit entries to swagger operations using URI patterns
Generate statistics: Counts usage and creates sample data
Output results: Writes human-readable and machine-readable reports
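The mapping step in this workflow can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes raw audit events carry the `verb` and `requestURI` fields of the Kubernetes audit Event type, treats every `{param}` in a swagger path as a single path segment (so proxy paths with embedded slashes are not handled), and uses the hypothetical names `build_matchers` and `operation_for`.

```python
import re
from urllib.parse import urlsplit

# Audit verbs and the HTTP method under which swagger files each operation.
VERB_TO_METHOD = {
    "get": "get", "list": "get", "watch": "get",
    "create": "post", "update": "put", "patch": "patch",
    "delete": "delete", "deletecollection": "delete",
}
HTTP_METHODS = set(VERB_TO_METHOD.values())


def build_matchers(swagger):
    """Turn swagger paths into (method, compiled regex, operationId) tuples."""
    matchers = []
    for template, item in swagger.get("paths", {}).items():
        # "/pods/{name}/status" becomes the regex "/pods/[^/]+/status".
        literals = re.split(r"\{[^}]+\}", template)
        regex = re.compile("[^/]+".join(re.escape(part) for part in literals))
        for method, operation in item.items():
            # Path items also hold non-method keys such as "parameters".
            if method in HTTP_METHODS and "operationId" in operation:
                matchers.append((method, regex, operation["operationId"]))
    return matchers


def operation_for(event, matchers):
    """Return the operationId for an audit event, or None if nothing matches."""
    method = VERB_TO_METHOD.get(event.get("verb", ""))
    path = urlsplit(event.get("requestURI", "")).path  # drop the query string
    for candidate_method, regex, operation_id in matchers:
        if candidate_method == method and regex.fullmatch(path):
            return operation_id
    return None


SWAGGER = {"paths": {
    "/api/v1/namespaces/{namespace}/pods": {
        "get": {"operationId": "listCoreV1NamespacedPod"},
        "parameters": [],
    },
    "/api/v1/namespaces/{namespace}/pods/{name}/status": {
        "get": {"operationId": "readCoreV1NamespacedPodStatus"},
    },
}}
MATCHERS = build_matchers(SWAGGER)
```

Matching on HTTP method plus path template is enough to tell `list` from `get`, since the collection path and the single-object path differ.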

kubernetes_api_analysis.py workflow:

Discover baseline: Searches GCS for latest CI audit data
Download artifacts: Retrieves baseline audit-endpoints.txt
Parse local data: Processes current PR's audit results
Compare datasets: Identifies additions, removals, and stable unused endpoints
Generate report: Outputs detailed comparison analysis

Data Flow Architecture

┌─────────────────┐
│ Kubernetes API  │
│ Server (KIND)   │
└─────────────────┘
         │ HTTP requests during tests
         ▼
┌─────────────────┐
│ Audit Log       │
│ (JSON format)   │
└─────────────────┘
         │ Raw audit events
         ▼
┌─────────────────┐    ┌─────────────────┐
│ audit_log_      │◄───│ swagger.json    │
│ parser.py       │    │ (OpenAPI spec)  │
└─────────────────┘    └─────────────────┘
         │ Parsed operation counts
         ▼
┌─────────────────┐    ┌─────────────────┐
│ audit-endpoints │    │ audit-operations│
│ .txt (summary)  │    │ .json (samples) │
└─────────────────┘    └─────────────────┘
         │ Current PR data
         ▼
┌─────────────────┐    ┌─────────────────┐
│ kubernetes_api_ │◄───│ CI Baseline     │
│ analysis.py     │    │ (from GCS)      │
└─────────────────┘    └─────────────────┘
         │ Comparison analysis
         ▼
┌─────────────────┐
│ Change Report   │
│ (console output)│
└─────────────────┘

CI Integration Points

Presubmit trigger conditions:

Changes to api/openapi-spec/swagger.json
Modifications in conformance test directories
Updates to endpoint classification files

Job execution environment:

Container: gcr.io/k8s-staging-test-infra/krte
Resources: 2 CPU cores, 9GB RAM
Timeout: 150 minutes
Storage: 100GB for KIND images and audit logs

Artifact collection:

# CI job uploads artifacts to GCS
gsutil -m cp -r artifacts/audit/* \
  gs://kubernetes-ci-logs/pr-logs/pull/${PULL_NUMBER}/pull-kubernetes-audit-kind-conformance/${BUILD_ID}/artifacts/audit/

Troubleshooting

CI Job Failures

Build failures: Check build logs for:

KIND cluster creation issues
Test execution failures
Script execution errors
Artifact generation problems

Analysis failures: Common issues:

Swagger specification download problems
Audit log parsing errors
GCS access permission issues
File format inconsistencies

Script Execution Issues

Local development:

# Run audit log parser
python3 audit_log_parser.py --audit-logs audit.log

# Compare against CI baseline  
python3 kubernetes_api_analysis.py

Requirements:

Python 3.x
Network access for downloading specifications
gsutil installed for GCS operations
Valid audit log files in JSON format

Common Resolution Steps

Check network connectivity for specification downloads
Verify file permissions for audit logs and output files
Validate JSON format of audit log entries
Ensure gsutil authentication for GCS access
Review script error output for specific failure details

Best Practices

For API Developers

Update tracking files promptly:

# Good workflow - immediate tracking
git add api/openapi-spec/swagger.json
git add test/conformance/testdata/pending_eligible_endpoints.yaml
git commit -m "Add new stable endpoint to pending conformance list

The createAppsV1NamespacedDeployment endpoint is now stable but
lacks conformance test coverage. Tracked in issue #12345."

Provide clear justifications:

# Good documentation in ineligible_endpoints.yaml
- endpoint: connectCoreV1GetNamespacedPodPortforward
  reason: "Debug feature: requires special permissions, varies by implementation"
  issue: "https://github.com/kubernetes/kubernetes/issues/24680"
  
# Poor documentation
- endpoint: someEndpoint
  reason: "doesn't work"  # Too vague
  issue: ""               # Missing reference

Test locally before submitting:

# Complete local validation workflow
# 1. Run conformance tests with audit logging
kind create cluster --config kind-audit-config.yaml
kind get kubeconfig > kind.kubeconfig
export KUBECONFIG="$PWD/kind.kubeconfig"
go test ./test/e2e/... --ginkgo.focus='\[Conformance\]'

# 2. Parse audit logs
python3 audit_log_parser.py --audit-logs /tmp/audit.log

# 3. Compare against baseline
python3 kubernetes_api_analysis.py --baseline-file previous-endpoints.txt

# 4. Verify endpoint categorization
diff -u previous-pending.yaml test/conformance/testdata/pending_eligible_endpoints.yaml

Frequently Asked Questions

General Questions

Q: Why is conformance audit enforcement necessary?
A: Without systematic enforcement, stable APIs can reach GA status without adequate test coverage, creating technical debt and potential compatibility issues across Kubernetes distributions. The audit system ensures every stable endpoint is either tested or explicitly documented as exempt.

Q: How often do the CI jobs run?
A: The periodic job runs automatically on a schedule to maintain baseline data. The presubmit job triggers only when swagger.json or related conformance files are modified in pull requests.

Q: Can I disable the audit check for my PR?
A: No, the conformance audit is mandatory for API changes. However, you can satisfy the requirements by appropriately categorizing endpoints in the tracking files rather than writing tests immediately.

Technical Questions

Q: Why doesn't the audit pick up my API endpoint usage?
A: Common causes:

Endpoint not called during conformance test execution
Audit logging policy doesn't capture your API group
URI pattern matching failed (check for typos in swagger.json)
Test runs in non-audited namespace

Q: How do I test the audit scripts locally?
A: Complete local workflow:

# 1. Set up KIND cluster with audit logging
kind create cluster --config kind-audit-config.yaml

# 2. Run conformance tests
go test ./test/e2e/... --ginkgo.focus='\[Conformance\]' --provider=skeleton

# 3. Extract audit logs from KIND container
docker cp kind-control-plane:/var/log/audit.log ./audit.log

# 4. Run parser
python3 audit_log_parser.py --audit-logs audit.log

# 5. Compare against CI baseline
python3 kubernetes_api_analysis.py

Process Questions

Q: How long should endpoints stay in pending_eligible_endpoints.yaml?
A: No strict time limit, but best practice is to resolve within 1-2 release cycles. Long-pending endpoints should be evaluated for potential ineligibility or prioritized for test development.

Q: Who decides if an endpoint should be ineligible?
A: The relevant SIG makes the initial determination, but controversial decisions should involve SIG Architecture. All decisions must be documented with clear reasoning.

Q: Can alpha/beta endpoints be in the audit system?
A: The audit system focuses on stable (GA) endpoints, but alpha/beta endpoints may appear in audit logs if used by conformance tests. They should not be added to tracking files until reaching stability.

Q: What happens if I ignore the audit failures?
A: Pull requests with audit failures should not be merged. The presubmit is a required check that must pass before code integration.

Resources

Essential Documentation

Conformance Test Requirements - Complete testing guidelines
OpenAPI Specification - Source of truth for API definitions
Conformance Test Suite - Current test implementations

CI and Monitoring

CI Job Configuration - Job definitions and triggers
TestGrid Dashboard - Real-time job status and history
Audit Scripts - Analysis tool source code
Script Usage Guide - Practical examples

Data Files

Current Conformance Tests - Active test inventory
Pending Endpoints - Awaiting test coverage
Ineligible Endpoints - Explicitly excluded from testing

Community

SIG Architecture: Primary owner of conformance audit system
CNCF Conformance WG: Policy and certification oversight
Kubernetes Slack: #sig-architecture and #conformance channels for questions

The conformance audit system represents a crucial safeguard in Kubernetes development, ensuring that the project's commitment to API stability and comprehensive testing remains intact as the platform evolves. By automatically tracking API coverage and enforcing systematic categorization of endpoints, this system helps maintain the high quality and compatibility standards that make Kubernetes a reliable platform for diverse deployment environments.
