@dims
Last active August 29, 2025 12:16
Extract Endpoint information from audit logs

Kubernetes API Audit Log Analysis

Tools for analyzing Kubernetes API coverage from E2E test audit logs.

Copy the two Python scripts (audit_log_parser.py and kubernetes_api_analysis.py) from https://github.com/kubernetes/test-infra/tree/master/experiment/audit to your local directory.

The example below uses kubernetes/kubernetes#133132, a PR that adds a new DRA test to conformance.

Quick Start

  1. Download audit logs from GCS:

    gsutil -m cp -R gs://kubernetes-ci-logs/pr-logs/pull/133132/pull-kubernetes-audit-kind-conformance/1960293546019786752/artifacts/audit/ .
  2. Parse audit logs:

    python3 audit_log_parser.py --audit-logs audit/audit*.log --output audit/audit-endpoints.txt --audit-operations-json audit/audit-operations.json
    Loading ineligible endpoints from: https://raw.githubusercontent.com/kubernetes/kubernetes/master/test/conformance/testdata/ineligible_endpoints.yaml
    Loaded 99 ineligible endpoints
    Loading Swagger specification from: https://raw.githubusercontent.com/kubernetes/kubernetes/refs/heads/master/api/openapi-spec/swagger.json
    Using cached Swagger specification
    Extracting resource types from Swagger specification...
    Extracted 68 resource types from Swagger spec
    Building path to operation mapping...
    Loaded 1050 API operations from Swagger spec
    Found 203 deprecated operations
    Parsing 2 audit log file(s):
    [1/2] audit/audit-2025-08-26T12-04-42.943.log
    [2/2] audit/audit.log
    
    Processing file 1/2: audit/audit-2025-08-26T12-04-42.943.log
    Processed 10000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 20000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 30000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 40000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 50000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 60000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 70000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 80000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 90000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Processed 100000 lines from audit/audit-2025-08-26T12-04-42.943.log...
    Completed audit/audit-2025-08-26T12-04-42.943.log: 106851 entries processed
    Processing file 2/2: audit/audit.log
    Processed 10000 lines from audit/audit.log...
    Processed 20000 lines from audit/audit.log...
    Processed 30000 lines from audit/audit.log...
    Processed 40000 lines from audit/audit.log...
    Processed 50000 lines from audit/audit.log...
    Processed 60000 lines from audit/audit.log...
    Processed 70000 lines from audit/audit.log...
    Processed 80000 lines from audit/audit.log...
    Completed audit/audit.log: 89464 entries processed
    
    Parsing complete:
    Total log entries: 196315
    Swagger-based matches: 87105
    Fallback matches: 731
    Unique endpoints found: 715
    Total API calls: 87836
    Skipped entries: 108479
    
    Results written to: audit/audit-endpoints.txt
    Generated audit/audit-operations.json with 602 operations and 1739 sample audit entries   
    
  3. Compare against CI baseline:

    python3 kubernetes_api_analysis.py --pull-audit-endpoints audit/audit-endpoints.txt
    Kubernetes API Operations Analysis
    ==================================
    
    Step 1: Extracting operationIds from swagger.json...
    Swagger URL: https://raw.githubusercontent.com/kubernetes/kubernetes/refs/heads/master/api/openapi-spec/swagger.json
    Output file: swagger_operations.txt
    Downloading swagger specification...
    Extracted 1062 operationIds to swagger_operations.txt
    
    No CI audit endpoints file specified, auto-discovering latest from GCS...
    Searching for latest CI audit run...
    Enumerating directories in gs://kubernetes-ci-logs/logs/ci-kubernetes-audit-kind-conformance...
    Found directory with finished.json: gs://kubernetes-ci-logs/logs/ci-kubernetes-audit-kind-conformance/1960318661684105216/
    Found audit file at: gs://kubernetes-ci-logs/logs/ci-kubernetes-audit-kind-conformance/1960318661684105216/artifacts/audit/audit-endpoints.txt
    Downloaded to: ci-audit-kind-conformance-audit-endpoints.txt
    
    Step 2: Comparing audit endpoint files...
    CI File: ci-audit-kind-conformance-audit-endpoints.txt
    Pull File: audit/audit-endpoints.txt
    
    Extracting operations from audit files (filtering by swagger operations)...
    SUMMARY
    =======
    Total Operations in Swagger:  1062
    Operations in CI:             508
    Operations in Pull:           517
    Operations Added:             9
    Operations Removed:           0
    Net Change:                   +9
    
    OPERATIONS ADDED IN PULL (NOT IN CI)
    ====================================
    Count: 9
    
      1. createResourceV1DeviceClass
      2. createResourceV1NamespacedResourceClaim
      3. createResourceV1NamespacedResourceClaimTemplate
      4. createResourceV1ResourceSlice
      5. deleteResourceV1DeviceClass
      6. deleteResourceV1NamespacedResourceClaim
      7. readResourceV1NamespacedResourceClaim
      8. replaceResourceV1NamespacedResourceClaim
      9. replaceResourceV1NamespacedResourceClaimStatus
    
    OPERATIONS REMOVED FROM PULL (IN CI BUT NOT PULL)
    =================================================
    Count: 0
    
    No operations removed.
    
    STABLE ENDPOINTS NOT FOUND IN PULL AUDIT LOG
    ============================================
    Count: 29
    
    These are stable, non-deprecated API endpoints defined in the Swagger spec
    but not exercised in the pull request audit log:
    
      1. connectCoreV1PostNamespacedPodExec
      2. connectCoreV1PostNamespacedPodPortforward
      3. createStorageV1VolumeAttributesClass
      4. deleteResourceV1CollectionDeviceClass
      5. deleteResourceV1NamespacedResourceClaimTemplate
      6. deleteResourceV1ResourceSlice
      7. deleteStorageV1CollectionVolumeAttributesClass
      8. deleteStorageV1VolumeAttributesClass
      9. getResourceV1APIResources
     10. listResourceV1ResourceClaimTemplateForAllNamespaces
     11. listStorageV1VolumeAttributesClass
     12. patchCoreV1NamespacedPodResize
     13. patchResourceV1DeviceClass
     14. patchResourceV1NamespacedResourceClaim
     15. patchResourceV1NamespacedResourceClaimStatus
     16. patchResourceV1NamespacedResourceClaimTemplate
     17. patchResourceV1ResourceSlice
     18. patchStorageV1VolumeAttributesClass
     19. readCoreV1NamespacedPodResize
     20. readResourceV1DeviceClass
     21. readResourceV1NamespacedResourceClaimStatus
     22. readResourceV1NamespacedResourceClaimTemplate
     23. readResourceV1ResourceSlice
     24. readStorageV1VolumeAttributesClass
     25. replaceCoreV1NamespacedPodResize
     26. replaceResourceV1DeviceClass
     27. replaceResourceV1NamespacedResourceClaimTemplate
     28. replaceResourceV1ResourceSlice
     29. replaceStorageV1VolumeAttributesClass
    
    Analysis complete!
    Generated files:
    - swagger_operations.txt (swagger operations list)
    

Outputs

audit_log_parser.py

  • audit/audit-endpoints.txt: Human-readable report (715 unique endpoints and 87,836 API calls in the run above)
  • audit/audit-operations.json: JSON with up to 5 audit samples per operation (602 operations and 1,739 samples in the run above)

kubernetes_api_analysis.py

  • Console output: Comparison showing added/removed operations vs CI baseline
  • swagger_operations.txt: Complete list of 1062 Swagger operations

How It Works

Each operation flows through this pipeline:

swagger.json → audit_log_parser.py → audit-endpoints.txt → kubernetes_api_analysis.py
     ↓              ↓                    ↓                    ↓
   POST /apis/   requestURI match    Line 98: | 1        "OPERATIONS ADDED"
   resource...   → operationId       (1 API call)        (new in this PR)

Example: createResourceV1NamespacedResourceClaimTemplate

  • Swagger: POST /apis/resource.k8s.io/v1/namespaces/{namespace}/resourceclaimtemplates
  • Audit Log: requestURI: "/apis/resource.k8s.io/v1/namespaces/dra-9508/resourceclaimtemplates"
  • Parser: Maps URI → operation ID via pattern matching
  • Output: Shows as "ADDED" (found in PR, missing from CI baseline)
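
The URI-to-operation step can be illustrated with a minimal sketch. This is not the code in audit_log_parser.py; the names `SWAGGER_PATHS`, `VERB_TO_METHOD`, `template_to_regex`, and `match_operation` are hypothetical, and the one-entry table stands in for the full Swagger spec. It assumes each `{param}` in a Swagger path template matches exactly one URI segment.

```python
import re

# Hypothetical one-entry stand-in for the "paths" section of swagger.json.
SWAGGER_PATHS = {
    "/apis/resource.k8s.io/v1/namespaces/{namespace}/resourceclaimtemplates": {
        "post": "createResourceV1NamespacedResourceClaimTemplate",
    },
}

# Simplified mapping from audit-log verbs to HTTP methods (an assumption of this sketch).
VERB_TO_METHOD = {
    "get": "get", "list": "get", "watch": "get",
    "create": "post", "update": "put", "patch": "patch",
    "delete": "delete", "deletecollection": "delete",
}


def template_to_regex(template):
    """Compile '/x/{namespace}/y' into a regex where each {param} matches one segment."""
    literal_parts = re.split(r"\{[^}]+\}", template)
    pattern = "[^/]+".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + pattern + "$")


def match_operation(verb, request_uri):
    """Return the operationId for an audit (verb, requestURI) pair, or None."""
    path = request_uri.split("?", 1)[0]  # ignore any query string
    method = VERB_TO_METHOD.get(verb)
    for template, methods in SWAGGER_PATHS.items():
        if method in methods and template_to_regex(template).match(path):
            return methods[method]
    return None


uri = "/apis/resource.k8s.io/v1/namespaces/dra-9508/resourceclaimtemplates"
print(match_operation("create", uri))
```

Anchoring the regex at both ends keeps a collection path from matching a longer per-object path that happens to share the same prefix.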

Kubernetes Conformance Audit

The Kubernetes conformance audit system checks that API endpoints reaching General Availability (GA) are either covered by conformance tests or explicitly tracked as pending or ineligible, so that untested stable endpoints do not accumulate unnoticed.

Overview

The conformance audit process automatically tracks API endpoint usage through audit logs and compares it against conformance test coverage, flagging gaps that need attention from contributors and maintainers.

Key Components

  1. Swagger/OpenAPI Specification: The authoritative definition of Kubernetes APIs in swagger.json
  2. Audit Log Analysis: Scripts that parse Kubernetes audit logs to identify API endpoint usage
  3. Endpoint Tracking Files: YAML files that categorize endpoints as pending, ineligible, or conformance-ready
  4. CI Jobs: Automated jobs that run the analysis and report on compliance

Conformance Audit Workflow

The conformance audit system operates through a continuous integration pipeline that validates API endpoint coverage:

┌─────────────────┐    ┌────────────────┐    ┌──────────────────┐
│   swagger.json  │────│ Periodic CI    │────│ Baseline Audit   │
│   Changes       │    │ Job Runs       │    │ Data Generated   │
└─────────────────┘    └────────────────┘    └──────────────────┘
                                                       │
┌─────────────────┐    ┌────────────────┐    ┌──────────────────┘
│ Pull Request    │────│ Presubmit CI   │────│ 
│ Submitted       │    │ Job Triggers   │    │
└─────────────────┘    └────────────────┘    │
                                             ▼
┌─────────────────┐    ┌────────────────┐    ┌──────────────────┐
│ Endpoint Files  │◄───│ Analysis &     │◄───│ Comparison       │
│ Updated         │    │ Validation     │    │ Against Baseline │
└─────────────────┘    └────────────────┘    └──────────────────┘

Detailed Process Flow

  1. API Development: Developers add new endpoints to the Kubernetes API
  2. Swagger Generation: The OpenAPI specification (swagger.json) is updated automatically
  3. PR Submission: Changes are submitted as pull requests
  4. Trigger Analysis: Modifications to swagger.json trigger the presubmit audit job
  5. Audit Collection: Conformance tests run and generate comprehensive audit logs
  6. Endpoint Mapping: Scripts map audit entries to swagger operation IDs
  7. Baseline Comparison: PR results are compared against the latest periodic job baseline
  8. Gap Identification: New stable endpoints without conformance coverage are flagged
  9. Enforcement: Contributors must categorize endpoints appropriately
  10. Validation: Updated tracking files are validated for completeness

CI Jobs

Periodic Job: ci-kubernetes-audit-kind-conformance

Purpose: Establishes the baseline for API endpoint coverage by running conformance tests and generating audit logs.

When it runs: Scheduled periodically to maintain current baseline data

What it does:

  • Creates a KIND (Kubernetes in Docker) cluster
  • Runs all 425+ conformance tests
  • Generates audit logs of API endpoint usage during test execution
  • Parses logs to create endpoint usage reports

Artifacts generated:

  • audit*.log: Raw audit log files (~90-100MB) - Complete Kubernetes API audit trail
  • audit-endpoints.txt: Human-readable endpoint usage summary (~27KB) - Sorted by usage frequency
  • audit-operations.json: JSON mapping of operations to audit entries (~2MB) - Sample data for each operation
  • policy.yaml: Audit policy configuration - Defines which events to capture

Example audit-endpoints.txt output:

Total unique endpoints: 714
Total calls: 87,241

readCoreV1NamespacedPodStatus: 15234 calls
listCoreV1NamespacedEvent: 8921 calls
createCoreV1NamespacedEvent: 7345 calls
patchCoreV1NamespacedPod: 5678 calls
...

Example audit-operations.json structure:

{
  "readCoreV1NamespacedPodStatus": {
    "count": 15234,
    "sample": {
      "verb": "get",
      "uri": "/api/v1/namespaces/default/pods/test-pod/status",
      "user": "system:serviceaccount:default:test-runner"
    }
  }
}
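
As a rough illustration of how such a summary could be assembled, the sketch below counts events per operation and keeps a capped number of samples. The function name `summarize`, the `samples` key, and the input shape (pairs of operation ID and audit event) are assumptions made for this example, not the parser's actual schema; the cap of 5 follows the "up to 5 audit samples per operation" note under Outputs.

```python
import json
from collections import defaultdict

MAX_SAMPLES = 5  # cap taken from the Outputs section; the key names below are hypothetical


def summarize(matched_events):
    """Aggregate (operation_id, audit_event) pairs into counts plus a few samples."""
    summary = defaultdict(lambda: {"count": 0, "samples": []})
    for operation_id, event in matched_events:
        entry = summary[operation_id]
        entry["count"] += 1
        if len(entry["samples"]) < MAX_SAMPLES:
            entry["samples"].append({
                "verb": event.get("verb"),
                "uri": event.get("requestURI"),
                "user": event.get("user", {}).get("username"),
            })
    return dict(summary)


example_event = {
    "verb": "get",
    "requestURI": "/api/v1/namespaces/default/pods/test-pod/status",
    "user": {"username": "system:serviceaccount:default:test-runner"},
}
report = summarize([("readCoreV1NamespacedPodStatus", example_event)] * 7)
print(json.dumps(report, indent=2))
```

Capping the samples keeps the JSON file small even when a single operation is called thousands of times.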

Monitoring: TestGrid Dashboard

Presubmit Job: pull-kubernetes-audit-kind-conformance

Purpose: Analyzes changes in pull requests to identify new API endpoints that need conformance test coverage.

When it runs: Triggered automatically when swagger.json is modified in a pull request

What it does:

  • Runs the same conformance tests as the periodic job
  • Compares audit results against the latest periodic job baseline
  • Identifies newly added, removed, or stable-but-untested endpoints
  • Validates endpoint categorization in tracking files

Additional analysis:

  • Highlights API operation differences between PR and baseline
  • Checks for proper categorization of endpoints
  • Reports on conformance test gaps

Monitoring: TestGrid Dashboard

Endpoint Classification System

Conformance-Eligible Endpoints

Endpoints that should eventually be covered by conformance tests but are not yet tested.

File: pending_eligible_endpoints.yaml

Contains: API endpoints awaiting conformance test development, typically for:

  • Recently added GA features
  • Resource management operations (pod resizing, device classes)
  • Core API operations not yet covered

Example entries:

# Pod resizing functionality
- readCoreV1NamespacedPodResize
- patchCoreV1NamespacedPodResize

# Dynamic Resource Allocation
- readResourceV1DeviceClass
- patchResourceV1NamespacedResourceClaim
- readResourceV1ResourceSlice

Entry format: Simple list of operation IDs from swagger.json that correspond to stable API endpoints that should have conformance test coverage but don't yet.

Ineligible Endpoints

Endpoints explicitly excluded from conformance testing for valid technical or policy reasons.

File: ineligible_endpoints.yaml

Categories:

  • Deprecated endpoints: Soon-to-be-removed functionality
  • Optional features: Components not required in all Kubernetes distributions (NetworkPolicy, HPA)
  • Debug features: Development and troubleshooting tools (port forwarding, pod attach)
  • Administrative endpoints: Operations that distributions may restrict for security
  • Unstable features: APIs that lack stable implementations across providers

Each entry includes the endpoint name, exclusion reason, and link to relevant issue discussion.

Example entries:

- endpoint: connectCoreV1DeleteNodeProxy
  reason: "Unable to be tested, and likely soon deprecated"
  issue: "https://github.com/kubernetes/kubernetes/issues/12345"

- endpoint: createAuthorizationV1NamespacedLocalSubjectAccessReview
  reason: "Optional feature - not all distros implement RBAC authorization"
  issue: "https://github.com/kubernetes/kubernetes/issues/67890"

- endpoint: connectCoreV1GetNamespacedPodPortforward
  reason: "Explicitly designed to be a debug feature"
  issue: "https://github.com/kubernetes/kubernetes/issues/24680"

Entry format: Structured YAML with endpoint name, justification, and reference to community discussion.
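
A small completeness check for entries in this shape might look like the sketch below. It assumes the three field names used in the examples above (`endpoint`, `reason`, `issue`) and works on already-parsed entries (plain dictionaries), so it needs no YAML library; `validate_entries` is a hypothetical helper, not part of the audit scripts.

```python
REQUIRED_FIELDS = ("endpoint", "reason", "issue")  # field names as used in the examples above


def validate_entries(entries):
    """Return a list of human-readable problems; an empty list means every entry is complete."""
    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        for field in REQUIRED_FIELDS:
            # Treat missing, None, and whitespace-only values the same way.
            if not str(entry.get(field) or "").strip():
                problems.append(f"entry {index}: missing '{field}'")
        name = entry.get("endpoint")
        if name and name in seen:
            problems.append(f"entry {index}: duplicate endpoint '{name}'")
        seen.add(name)
    return problems


entries = [
    {"endpoint": "connectCoreV1GetNamespacedPodPortforward",
     "reason": "Explicitly designed to be a debug feature",
     "issue": "https://github.com/kubernetes/kubernetes/issues/24680"},
    {"endpoint": "someEndpoint", "reason": "doesn't work", "issue": ""},
]
for problem in validate_entries(entries):
    print(problem)
```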

Audit Analysis Scripts

audit_log_parser.py

Purpose: Parses Kubernetes audit logs and maps entries to official API specifications.

Key functions:

  • Downloads current swagger.json specification
  • Processes audit log JSON entries
  • Maps log entries to OpenAPI operation IDs
  • Generates usage statistics and operation samples

Outputs:

  • Console report with endpoint counts and matches
  • audit-endpoints.txt: Sorted list of endpoints with usage counts
  • audit-operations.json: Sample audit entries for each operation

Command line usage:

# Basic usage with single audit log
python3 audit_log_parser.py --audit-logs audit.log

# Process multiple audit log files
python3 audit_log_parser.py --audit-logs audit*.log

# Custom swagger URL and sorting
python3 audit_log_parser.py --audit-logs audit.log \
  --swagger-url https://custom.k8s.io/swagger.json \
  --sort count

# Output to specific file
python3 audit_log_parser.py --audit-logs audit.log \
  --output results/audit-endpoints.txt

Common errors and solutions:

  • URLError: [SSL: CERTIFICATE_VERIFY_FAILED]: Network issues downloading swagger specification
    • Solution: Check network connectivity and proxy settings
  • json.JSONDecodeError: Expecting value: Malformed JSON in audit log files
    • Solution: Verify audit log format and check for truncated files
  • FileNotFoundError: [Errno 2] No such file: Missing or unreadable audit log files
    • Solution: Verify file paths and permissions
  • KeyError: 'operationId': Unexpected log entry formats
    • Solution: Ensure audit logs are from conformance test runs with proper policy
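
When checking a log for truncated or malformed lines before running the parser, a tolerant reader along the following lines can help. It is a sketch under the assumption that the log contains one JSON audit event per line and that usable events carry a `requestURI` field; `load_events` is a hypothetical name.

```python
import json


def load_events(lines):
    """Parse JSON-lines audit data, skipping lines that are malformed or not audit events.

    Returns (events, skipped) where skipped counts non-blank lines that were rejected.
    """
    events = []
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are ignored, not counted as errors
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # e.g. a line truncated when the log was rotated or copied
            continue
        if isinstance(event, dict) and "requestURI" in event:
            events.append(event)
        else:
            skipped += 1
    return events, skipped


sample = [
    '{"requestURI": "/api/v1/pods", "verb": "list"}',
    '{"requestURI": "/api/v1/po',  # truncated line
]
events, skipped = load_events(sample)
print(len(events), "usable events,", skipped, "skipped")
```

To run it on a real file, pass the open file object, for example `load_events(open("audit/audit.log"))`.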

kubernetes_api_analysis.py

Purpose: Compares audit results between pull requests and CI baseline to identify API changes.

Key functions:

  • Downloads latest CI audit data from Google Cloud Storage
  • Compares local PR audit results against CI baseline
  • Identifies added, removed, and stable-but-unused operations
  • Provides detailed change analysis

Analysis types:

  • New operations: API endpoints introduced in the PR
  • Removed operations: Previously audited endpoints no longer called
  • Stable unused: GA endpoints not exercised by conformance tests
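
These three categories reduce to set operations over operation IDs. The sketch below is illustrative only: `compare_operations` and its parameters are hypothetical, and the `excluded` argument stands in for whatever filtering (deprecated, non-stable, ineligible) the real script applies before reporting unused endpoints.

```python
def compare_operations(swagger_ops, ci_ops, pull_ops, excluded=frozenset()):
    """Compare PR operations against the CI baseline, restricted to known Swagger operations."""
    swagger = set(swagger_ops)
    ci = set(ci_ops) & swagger      # ignore anything not defined in the spec
    pull = set(pull_ops) & swagger
    return {
        "added": sorted(pull - ci),                        # new in the PR
        "removed": sorted(ci - pull),                      # in the baseline, missing from the PR
        "unused": sorted(swagger - pull - set(excluded)),  # defined but never exercised
    }


result = compare_operations(
    swagger_ops={"readCoreV1NamespacedPod", "createResourceV1DeviceClass",
                 "readResourceV1DeviceClass"},
    ci_ops={"readCoreV1NamespacedPod"},
    pull_ops={"readCoreV1NamespacedPod", "createResourceV1DeviceClass"},
)
for category, operations in result.items():
    print(category, operations)
```

In the Quick Start run, the same arithmetic gives 517 operations in the pull run minus 508 in CI, a net change of +9 with nothing removed.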

Command line usage:

# Basic comparison against latest CI baseline
python3 kubernetes_api_analysis.py

# Use specific baseline file (skip auto-download)
python3 kubernetes_api_analysis.py --baseline-file ci-audit-endpoints.txt

# Custom output directory  
python3 kubernetes_api_analysis.py --output-dir analysis-results/

# Detailed verbose output
python3 kubernetes_api_analysis.py --verbose

Sample output:

Downloading latest CI audit data...
Found CI run: 1961316246318223360 (2025-08-29)

Comparing operations:
- New operations in PR: 9
  * createResourceV1alpha3NamespacedResourceClaim
  * readResourceV1alpha3NamespacedResourceClaim
  * listResourceV1alpha3ResourceSlice
  
- Removed operations: 0
- Stable operations not exercised: 127

Common errors and solutions:

  • Command 'gsutil' not found: Missing gsutil CLI tool
    • Solution: Install Google Cloud SDK or use --baseline-file option
  • AccessDenied: 403 Insufficient Permission: Permissions issues accessing GCS buckets
    • Solution: Run gcloud auth login or use service account authentication
  • ConnectionError: HTTPSConnectionPool: Network connectivity problems
    • Solution: Check firewall/proxy settings for GCS access
  • swagger.json parsing failed: Swagger specification parsing failures
    • Solution: Verify swagger.json format and ensure it's accessible

Step-by-Step Resolution Guides

Marking Endpoints as Ineligible

Problem: Your PR adds endpoints that shouldn't be part of conformance testing.

Step-by-step resolution:

  1. Document the reasoning:

    # Add to ineligible_endpoints.yaml
    - endpoint: connectCoreV1GetNamespacedPodExec
      reason: "Debug feature requiring special cluster permissions"
      issue: "https://github.com/kubernetes/kubernetes/issues/12345"
  2. Create or reference issue:

    Title: Conformance exemption: Pod exec endpoint
    
    Body: The connectCoreV1GetNamespacedPodExec endpoint should be 
    exempt from conformance testing because:
    
    1. It's primarily a debug/development feature
    2. Requires special RBAC permissions that conformance tests avoid
    3. Implementation varies significantly across distributions
  3. Follow established patterns:

    • Review existing ineligible entries for similar reasoning
    • Ensure your exemption follows community precedent
    • Get SIG approval for controversial exemptions

Enforcement Process

For Pull Request Authors

When the presubmit job identifies issues, contributors must follow the appropriate scenario guide above. The system enforces:

  1. Stable endpoints not in pending_eligible_endpoints.yaml: Must be added to pending list or have conformance tests
  2. Endpoints in pending_eligible_endpoints.yaml that are now tested: Must be removed from pending list
  3. New ineligible endpoints: Must be documented in ineligible list with justification

Error Messages and Resolution

"Stable endpoint X not found in pending_eligible_endpoints.yaml"

  • Cause: New GA endpoint lacks conformance test coverage
  • Resolution: Add endpoint to pending list or create conformance test

"Endpoint X found in pending_eligible_endpoints.yaml but is being tested"

  • Cause: Endpoint now has conformance test coverage
  • Resolution: Remove from pending list as no longer needed
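
The two checks can be expressed as set differences and intersections. The sketch below is an assumption about the logic rather than the presubmit's actual implementation; `check_tracking` and its argument names are hypothetical, and only the message wording is taken from the errors listed above.

```python
def check_tracking(stable_ops, tested_ops, pending, ineligible):
    """Return the error messages a tracking-file check could produce for the given sets."""
    errors = []
    # Stable endpoints must be tested, pending, or explicitly ineligible.
    for op in sorted(stable_ops - tested_ops - pending - ineligible):
        errors.append(f"Stable endpoint {op} not found in pending_eligible_endpoints.yaml")
    # Endpoints that are now tested should no longer be listed as pending.
    for op in sorted(pending & tested_ops):
        errors.append(f"Endpoint {op} found in pending_eligible_endpoints.yaml but is being tested")
    return errors


errors = check_tracking(
    stable_ops={"readResourceV1DeviceClass", "createResourceV1DeviceClass"},
    tested_ops={"createResourceV1DeviceClass"},
    pending={"createResourceV1DeviceClass"},
    ineligible=set(),
)
for message in errors:
    print(message)
```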

Technical Implementation Details

Audit Log Collection

The conformance audit system relies on comprehensive Kubernetes audit logging during test execution:

Audit Policy Configuration:

# policy.yaml - Example: logs Request-level events for core and apps resources in the default and kube-system namespaces
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Request
  namespaces: ["default", "kube-system"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
  resources:
  - group: ""
    resources: ["*"]
  - group: "apps"
    resources: ["*"]

KIND Cluster Configuration:

# kind-config.yaml - Enables audit logging
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        audit-log-path: /var/log/audit.log
        audit-policy-file: /etc/kubernetes/audit-policy.yaml
        audit-log-maxage: "30"
        audit-log-maxbackup: "3"
        audit-log-maxsize: "100"
      extraVolumes:
      - name: audit-policy
        hostPath: /etc/kubernetes/audit-policy.yaml
        mountPath: /etc/kubernetes/audit-policy.yaml
        readOnly: true
        pathType: File

Script Architecture

audit_log_parser.py workflow:

  1. Download swagger.json: Fetches latest OpenAPI specification
  2. Parse operation mappings: Builds lookup table of operationId → HTTP method/path
  3. Process audit logs: Parses each JSON log entry
  4. Map to operations: Matches audit entries to swagger operations using URI patterns
  5. Generate statistics: Counts usage and creates sample data
  6. Output results: Writes human-readable and machine-readable reports
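
Step 2 of this workflow, building the operation lookup table, follows directly from the layout of an OpenAPI v2 document, where `paths` maps each path template to its HTTP methods and each method carries an `operationId`. The sketch below shows one way to do it; `build_operation_table` is a hypothetical name and the inline spec is a two-operation stand-in for swagger.json.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def build_operation_table(spec):
    """Map operationId -> (HTTP method, path template) from an OpenAPI v2 document."""
    table = {}
    for path, path_item in spec.get("paths", {}).items():
        for key, operation in path_item.items():
            # Path items also hold non-method keys such as "parameters"; skip those.
            if key in HTTP_METHODS and "operationId" in operation:
                table[operation["operationId"]] = (key.upper(), path)
    return table


spec = {
    "paths": {
        "/api/v1/namespaces/{namespace}/pods": {
            "parameters": [],
            "get": {"operationId": "listCoreV1NamespacedPod"},
            "post": {"operationId": "createCoreV1NamespacedPod"},
        }
    }
}
for operation_id, (method, path) in sorted(build_operation_table(spec).items()):
    print(operation_id, method, path)
```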

kubernetes_api_analysis.py workflow:

  1. Discover baseline: Searches GCS for latest CI audit data
  2. Download artifacts: Retrieves baseline audit-endpoints.txt
  3. Parse local data: Processes current PR's audit results
  4. Compare datasets: Identifies additions, removals, and stable unused endpoints
  5. Generate report: Outputs detailed comparison analysis

Data Flow Architecture

┌─────────────────┐
│ Kubernetes API  │
│ Server (KIND)   │
└─────────────────┘
         │ HTTP requests during tests
         ▼
┌─────────────────┐
│ Audit Log       │
│ (JSON format)   │
└─────────────────┘
         │ Raw audit events
         ▼
┌─────────────────┐    ┌─────────────────┐
│ audit_log_      │◄───│ swagger.json    │
│ parser.py       │    │ (OpenAPI spec)  │
└─────────────────┘    └─────────────────┘
         │ Parsed operation counts
         ▼
┌─────────────────┐    ┌─────────────────┐
│ audit-endpoints │    │ audit-operations│
│ .txt (summary)  │    │ .json (samples) │
└─────────────────┘    └─────────────────┘
         │ Current PR data
         ▼
┌─────────────────┐    ┌─────────────────┐
│ kubernetes_api_ │◄───│ CI Baseline     │
│ analysis.py     │    │ (from GCS)      │
└─────────────────┘    └─────────────────┘
         │ Comparison analysis
         ▼
┌─────────────────┐
│ Change Report   │
│ (console output)│
└─────────────────┘

CI Integration Points

Presubmit trigger conditions:

  • Changes to api/openapi-spec/swagger.json
  • Modifications in conformance test directories
  • Updates to endpoint classification files

Job execution environment:

  • Container: gcr.io/k8s-staging-test-infra/krte
  • Resources: 2 CPU cores, 9GB RAM
  • Timeout: 150 minutes
  • Storage: 100GB for KIND images and audit logs

Artifact collection:

# CI job uploads artifacts to GCS
gsutil -m cp -r artifacts/audit/* \
  gs://kubernetes-ci-logs/pr-logs/pull/${PULL_NUMBER}/pull-kubernetes-audit-kind-conformance/${BUILD_ID}/artifacts/audit/

Troubleshooting

CI Job Failures

Build failures: Check build logs for:

  • KIND cluster creation issues
  • Test execution failures
  • Script execution errors
  • Artifact generation problems

Analysis failures: Common issues:

  • Swagger specification download problems
  • Audit log parsing errors
  • GCS access permission issues
  • File format inconsistencies

Script Execution Issues

Local development:

# Run audit log parser
python3 audit_log_parser.py --audit-logs audit.log

# Compare against CI baseline  
python3 kubernetes_api_analysis.py

Requirements:

  • Python 3.x
  • Network access for downloading specifications
  • gsutil installed for GCS operations
  • Valid audit log files in JSON format

Common Resolution Steps

  1. Check network connectivity for specification downloads
  2. Verify file permissions for audit logs and output files
  3. Validate JSON format of audit log entries
  4. Ensure gsutil authentication for GCS access
  5. Review script error output for specific failure details

Best Practices

For API Developers

Update tracking files promptly:

# Good workflow - immediate tracking
git add api/openapi-spec/swagger.json
git add test/conformance/testdata/pending_eligible_endpoints.yaml
git commit -m "Add new stable endpoint to pending conformance list

The createAppsV1NamespacedDeployment endpoint is now stable but
lacks conformance test coverage. Tracked in issue #12345."

Provide clear justifications:

# Good documentation in ineligible_endpoints.yaml
- endpoint: connectCoreV1GetNamespacedPodPortforward
  reason: "Debug feature: requires special permissions, varies by implementation"
  issue: "https://github.com/kubernetes/kubernetes/issues/24680"
  
# Poor documentation
- endpoint: someEndpoint
  reason: "doesn't work"  # Too vague
  issue: ""               # Missing reference

Test locally before submitting:

# Complete local validation workflow
# 1. Run conformance tests with audit logging
kind create cluster --config kind-audit-config.yaml
kind export kubeconfig  # writes the cluster credentials to $KUBECONFIG or ~/.kube/config
go test ./test/e2e/... --ginkgo.focus="Conformance"

# 2. Parse audit logs
python3 audit_log_parser.py --audit-logs /tmp/audit.log

# 3. Compare against baseline
python3 kubernetes_api_analysis.py --baseline-file previous-endpoints.txt

# 4. Verify endpoint categorization
diff -u previous-pending.yaml test/conformance/testdata/pending_eligible_endpoints.yaml

Frequently Asked Questions

General Questions

Q: Why is conformance audit enforcement necessary?
A: Without systematic enforcement, stable APIs can reach GA status without adequate test coverage, creating technical debt and potential compatibility issues across Kubernetes distributions. The audit system ensures every stable endpoint is either tested or explicitly documented as exempt.

Q: How often do the CI jobs run?
A: The periodic job runs automatically on a schedule to maintain baseline data. The presubmit job triggers only when swagger.json or related conformance files are modified in pull requests.

Q: Can I disable the audit check for my PR?
A: No, the conformance audit is mandatory for API changes. However, you can satisfy the requirements by appropriately categorizing endpoints in the tracking files rather than writing tests immediately.

Technical Questions

Q: Why doesn't the audit pick up my API endpoint usage?
A: Common causes:

  • Endpoint not called during conformance test execution
  • Audit logging policy doesn't capture your API group
  • URI pattern matching failed (check for typos in swagger.json)
  • Test runs in non-audited namespace
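
To tell these causes apart, it can help to check whether the raw audit log contains any event for the URI in question before suspecting the parser. The following sketch assumes one JSON audit event per line with the standard `requestURI`, `verb`, and `objectRef.namespace` fields; `count_matching` is a hypothetical helper.

```python
import json
from collections import Counter


def count_matching(lines, uri_fragment):
    """Count audit events whose requestURI contains uri_fragment, keyed by (verb, namespace)."""
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(event, dict):
            continue
        if uri_fragment in event.get("requestURI", ""):
            namespace = (event.get("objectRef") or {}).get("namespace")
            counts[(event.get("verb"), namespace)] += 1
    return counts


sample = [
    '{"verb": "create", "requestURI": "/apis/resource.k8s.io/v1/namespaces/dra-9508/resourceclaimtemplates", "objectRef": {"namespace": "dra-9508"}}',
    '{"verb": "list", "requestURI": "/api/v1/pods"}',
]
for key, count in count_matching(sample, "resourceclaimtemplates").items():
    print(key, count)
```

An empty result points at the test or the audit policy (the request never reached the log); a non-empty result points at the URI-to-operation matching.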

Q: How do I test the audit scripts locally?
A: Complete local workflow:

# 1. Set up KIND cluster with audit logging
kind create cluster --config kind-audit-config.yaml

# 2. Run conformance tests
go test ./test/e2e/... --ginkgo.focus="Conformance" --provider=skeleton

# 3. Extract audit logs from KIND container
docker cp kind-control-plane:/var/log/audit.log ./audit.log

# 4. Run parser
python3 audit_log_parser.py --audit-logs audit.log

# 5. Compare against CI baseline
python3 kubernetes_api_analysis.py

Process Questions

Q: How long should endpoints stay in pending_eligible_endpoints.yaml?
A: No strict time limit, but best practice is to resolve within 1-2 release cycles. Long-pending endpoints should be evaluated for potential ineligibility or prioritized for test development.

Q: Who decides if an endpoint should be ineligible?
A: The relevant SIG makes the initial determination, but controversial decisions should involve SIG Architecture. All decisions must be documented with clear reasoning.

Q: Can alpha/beta endpoints be in the audit system?
A: The audit system focuses on stable (GA) endpoints, but alpha/beta endpoints may appear in audit logs if used by conformance tests. They should not be added to tracking files until reaching stability.

Q: What happens if I ignore the audit failures?
A: Pull requests with audit failures should not be merged. The presubmit is a required check that must pass before code integration.

Resources

Community

  • SIG Architecture: Primary owner of conformance audit system
  • CNCF Conformance WG: Policy and certification oversight
  • Kubernetes Slack: #sig-architecture and #conformance channels for questions

The conformance audit system represents a crucial safeguard in Kubernetes development, ensuring that the project's commitment to API stability and comprehensive testing remains intact as the platform evolves. By automatically tracking API coverage and enforcing systematic categorization of endpoints, this system helps maintain the high quality and compatibility standards that make Kubernetes a reliable platform for diverse deployment environments.
