# Kubernetes Conformance Audit

The Kubernetes conformance audit system ensures that every API endpoint reaching General Availability (GA) is either covered by conformance tests or explicitly tracked as pending or ineligible. This keeps untested stable APIs from accumulating as technical debt.

## Overview

The conformance audit process uses audit logs to record which API endpoints the conformance tests exercise, compares that set against the stable endpoints in the OpenAPI specification, and flags gaps that need attention from contributors and maintainers.

### Key Components

1. **Swagger/OpenAPI Specification**: The authoritative definition of Kubernetes APIs in [`swagger.json`](https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/swagger.json)
2. **Audit Log Analysis**: Scripts that parse Kubernetes audit logs to identify API endpoint usage
3. **Endpoint Tracking Files**: YAML files that categorize untested stable endpoints as pending (eligible but not yet covered) or ineligible
4. **CI Jobs**: Automated jobs that run the analysis and report on compliance

## Conformance Audit Workflow

The conformance audit system operates through a continuous integration pipeline that validates API endpoint coverage:

```
┌─────────────────┐    ┌────────────────┐    ┌──────────────────┐
│   swagger.json  │────│ Periodic CI    │────│ Baseline Audit   │
│   Changes       │    │ Job Runs       │    │ Data Generated   │
└─────────────────┘    └────────────────┘    └──────────────────┘
                                                       │
┌─────────────────┐    ┌────────────────┐              │
│ Pull Request    │────│ Presubmit CI   │──────────────┤
│ Submitted       │    │ Job Triggers   │              │
└─────────────────┘    └────────────────┘              │
                                                       ▼
┌─────────────────┐    ┌────────────────┐    ┌──────────────────┐
│ Endpoint Files  │◄───│ Analysis &     │◄───│ Comparison       │
│ Updated         │    │ Validation     │    │ Against Baseline │
└─────────────────┘    └────────────────┘    └──────────────────┘
```

### Detailed Process Flow

1. **API Development**: Developers add new endpoints to the Kubernetes API
2. **Swagger Generation**: The OpenAPI specification (`swagger.json`) is updated automatically
3. **PR Submission**: Changes are submitted as pull requests
4. **Trigger Analysis**: Modifications to `swagger.json` trigger the presubmit audit job
5. **Audit Collection**: Conformance tests run and generate comprehensive audit logs
6. **Endpoint Mapping**: Scripts map audit entries to swagger operation IDs
7. **Baseline Comparison**: PR results are compared against the latest periodic job baseline
8. **Gap Identification**: New stable endpoints without conformance coverage are flagged
9. **Enforcement**: Contributors must categorize endpoints appropriately
10. **Validation**: Updated tracking files are validated for completeness

## CI Jobs

### Periodic Job: `ci-kubernetes-audit-kind-conformance`

**Purpose**: Establishes the baseline for API endpoint coverage by running conformance tests and generating audit logs.

**When it runs**: Scheduled periodically to maintain current baseline data

**What it does**:
- Creates a KIND (Kubernetes in Docker) cluster
- Runs all 425+ conformance tests
- Generates audit logs of API endpoint usage during test execution
- Parses logs to create endpoint usage reports

**Artifacts generated**:
- `audit*.log`: Raw audit log files (~90-100MB) - Complete Kubernetes API audit trail
- `audit-endpoints.txt`: Human-readable endpoint usage summary (~27KB) - Sorted by usage frequency
- `audit-operations.json`: JSON mapping of operations to audit entries (~2MB) - Sample data for each operation
- `policy.yaml`: Audit policy configuration - Defines which events to capture

**Example audit-endpoints.txt output**:
```
Total unique endpoints: 714
Total calls: 87,241

readCoreV1NamespacedPodStatus: 15234 calls
listCoreV1NamespacedEvent: 8921 calls
createCoreV1NamespacedEvent: 7345 calls
patchCoreV1NamespacedPod: 5678 calls
...
```
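The summary format above is simple enough to read back with a short script. The following is a minimal sketch, assuming every data line has the form `<operationId>: <N> calls` as in the example; `parse_endpoint_summary` is a hypothetical helper and not part of the audit scripts.

```python
import re

# Matches lines such as "readCoreV1NamespacedPodStatus: 15234 calls".
# The line format is assumed from the example output above.
LINE_RE = re.compile(r"^(?P<op>[A-Za-z0-9]+): (?P<count>[\d,]+) calls$")


def parse_endpoint_summary(text):
    """Return {operation_id: call_count} for every matching line."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # header lines, blank lines, and "..." are skipped
            counts[match["op"]] = int(match["count"].replace(",", ""))
    return counts


sample = """Total unique endpoints: 714
Total calls: 87,241

readCoreV1NamespacedPodStatus: 15234 calls
listCoreV1NamespacedEvent: 8921 calls
"""
print(parse_endpoint_summary(sample))
```

The set of keys in the returned dictionary is what a later comparison step would treat as "operations exercised by the tests".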

**Example audit-operations.json structure**:
```json
{
  "readCoreV1NamespacedPodStatus": {
    "count": 15234,
    "sample": {
      "verb": "get",
      "uri": "/api/v1/namespaces/default/pods/test-pod/status",
      "user": "system:serviceaccount:default:test-runner"
    }
  }
}
```
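As an illustration of how this structure might be consumed, the sketch below ranks operations by call count and shows the sample request for each. It assumes only the `count` and `sample` fields shown above; the inline data and the `top_operations` helper are hypothetical.

```python
import json

# Inline stand-in for audit-operations.json, following the structure above.
raw = """
{
  "readCoreV1NamespacedPodStatus": {
    "count": 15234,
    "sample": {"verb": "get", "uri": "/api/v1/namespaces/default/pods/test-pod/status"}
  },
  "listCoreV1NamespacedEvent": {
    "count": 8921,
    "sample": {"verb": "list", "uri": "/api/v1/namespaces/default/events"}
  }
}
"""


def top_operations(operations, limit=10):
    """Return (operation_id, count, verb, uri) tuples, busiest first."""
    ranked = sorted(operations.items(), key=lambda item: item[1]["count"], reverse=True)
    return [
        (op, data["count"], data["sample"]["verb"], data["sample"]["uri"])
        for op, data in ranked[:limit]
    ]


for row in top_operations(json.loads(raw)):
    print(*row)
```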

**Monitoring**: [TestGrid Dashboard](https://testgrid.k8s.io/sig-arch-conformance#kind-conformance-audit)

### Presubmit Job: `pull-kubernetes-audit-kind-conformance`

**Purpose**: Analyzes changes in pull requests to identify new API endpoints that need conformance test coverage.

**When it runs**: Triggered automatically when a pull request modifies `swagger.json`, the conformance test directories, or the endpoint classification files

**What it does**:
- Runs the same conformance tests as the periodic job
- Compares audit results against the latest periodic job baseline
- Identifies newly added, removed, or stable-but-untested endpoints
- Validates endpoint categorization in tracking files

**Additional analysis**:
- Highlights API operation differences between PR and baseline
- Checks for proper categorization of endpoints
- Reports on conformance test gaps

**Monitoring**: [TestGrid Dashboard](https://testgrid.k8s.io/sig-arch-conformance#presubmit-kind-conformance-audit)

## Endpoint Classification System

### Conformance-Eligible Endpoints

Endpoints that should eventually be covered by conformance tests but are not yet tested.

**File**: [`pending_eligible_endpoints.yaml`](https://github.com/kubernetes/kubernetes/blob/master/test/conformance/testdata/pending_eligible_endpoints.yaml)

**Contains**: API endpoints awaiting conformance test development, typically for:
- Recently added GA features
- Resource management operations (pod resizing, device classes)
- Core API operations not yet covered

**Example entries**:
```yaml
# Pod resizing functionality
- readCoreV1NamespacedPodResize
- patchCoreV1NamespacedPodResize

# Dynamic Resource Allocation
- createResourceV1NamespacedResourceClaim
- listResourceV1NamespacedResourceClaim
- readResourceV1NamespacedResourceClaim
```

**Entry format**: A plain list of `swagger.json` operation IDs, one per stable endpoint that should have conformance test coverage but does not yet.

### Ineligible Endpoints  

Endpoints explicitly excluded from conformance testing for valid technical or policy reasons.

**File**: [`ineligible_endpoints.yaml`](https://github.com/kubernetes/kubernetes/blob/master/test/conformance/testdata/ineligible_endpoints.yaml)

**Categories**:
- **Deprecated endpoints**: Soon-to-be-removed functionality
- **Optional features**: Components not required in all Kubernetes distributions (NetworkPolicy, HPA)
- **Debug features**: Development and troubleshooting tools (port forwarding, pod attach)
- **Administrative endpoints**: Operations that distributions may restrict for security
- **Unstable features**: APIs that lack stable implementations across providers

Each entry includes the endpoint name, exclusion reason, and link to relevant issue discussion.

**Example entries**:
```yaml
- endpoint: connectCoreV1DeleteNodeProxy
  reason: "Unable to be tested, and likely soon deprecated"
  issue: "https://github.com/kubernetes/kubernetes/issues/12345"

- endpoint: createAuthorizationV1NamespacedLocalSubjectAccessReview
  reason: "Optional feature - not all distros implement RBAC authorization"
  issue: "https://github.com/kubernetes/kubernetes/issues/67890"

- endpoint: connectCoreV1GetNamespacedPodPortforward
  reason: "Explicitly designed to be a debug feature"
  issue: "https://github.com/kubernetes/kubernetes/issues/24680"
```

**Entry format**: Each entry is a YAML mapping with three keys: `endpoint` (the operation ID), `reason` (the justification), and `issue` (a link to the community discussion).
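A completeness check for these entries could look like the sketch below. It assumes the file has already been loaded into a list of dictionaries (for example with a YAML library) and that the keys are the `endpoint`, `reason`, and `issue` fields shown in the examples; `validate_ineligible_entries` is a hypothetical helper, not the validation the CI job actually runs.

```python
REQUIRED_KEYS = ("endpoint", "reason", "issue")


def validate_ineligible_entries(entries):
    """Return a list of human-readable problems; an empty list means all entries pass."""
    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing or empty '{key}'")
        endpoint = entry.get("endpoint")
        if endpoint in seen:
            problems.append(f"entry {index}: duplicate endpoint '{endpoint}'")
        elif endpoint:
            seen.add(endpoint)
    return problems


# Entries as they would look after loading the YAML file into Python objects.
entries = [
    {"endpoint": "connectCoreV1GetNamespacedPodPortforward",
     "reason": "Explicitly designed to be a debug feature",
     "issue": "https://github.com/kubernetes/kubernetes/issues/24680"},
    {"endpoint": "someEndpoint", "reason": "doesn't work", "issue": ""},
]
print(validate_ineligible_entries(entries))
```

A structural check like this cannot judge whether a reason is convincing; that remains a review decision.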

## Audit Analysis Scripts

### `audit_log_parser.py`

**Purpose**: Parses Kubernetes audit logs and maps entries to official API specifications.

**Key functions**:
- Downloads current `swagger.json` specification
- Processes audit log JSON entries
- Maps log entries to OpenAPI operation IDs
- Generates usage statistics and operation samples

**Outputs**:
- Console report with endpoint counts and matches
- `audit-endpoints.txt`: Sorted list of endpoints with usage counts
- `audit-operations.json`: Sample audit entries for each operation

**Command line usage**:
```bash
# Basic usage with single audit log
python3 audit_log_parser.py --audit-logs audit.log

# Process multiple audit log files
python3 audit_log_parser.py --audit-logs audit-*.log

# Custom swagger URL and sorting
python3 audit_log_parser.py --audit-logs audit.log \
  --swagger-url https://custom.k8s.io/swagger.json \
  --sort count

# Write output to a specific directory
python3 audit_log_parser.py --audit-logs audit.log \
  --output results/
```

**Common errors and solutions**:
- **`URLError: [SSL: CERTIFICATE_VERIFY_FAILED]`**: TLS certificate verification failed while downloading the swagger specification
  - *Solution*: Check proxy settings and confirm the system CA certificates are installed and current
- **`json.JSONDecodeError: Expecting value`**: Malformed JSON in audit log files
  - *Solution*: Verify audit log format and check for truncated files
- **`FileNotFoundError: [Errno 2] No such file`**: Missing or unreadable audit log files
  - *Solution*: Verify file paths and permissions
- **`KeyError: 'operationId'`**: Unexpected log entry formats
  - *Solution*: Ensure audit logs are from conformance test runs with proper policy
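To show how malformed or truncated lines can be tolerated rather than aborting the run, here is a minimal sketch. It assumes the audit log holds one JSON event per line, which is how the API server's log backend writes events; `load_audit_events` is a hypothetical helper and may differ from what `audit_log_parser.py` does internally.

```python
import json
from collections import Counter


def load_audit_events(lines):
    """Parse JSON-lines audit data; return (events, line numbers that failed to parse)."""
    events, bad_lines = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            bad_lines.append(number)  # for example, a truncated final line
    return events, bad_lines


log_lines = [
    '{"verb": "get", "requestURI": "/api/v1/namespaces/default/pods/p"}',
    '{"verb": "list", "requestURI": "/api/v1/namespaces/def',  # truncated
]
events, bad_lines = load_audit_events(log_lines)
print(Counter(event["verb"] for event in events), bad_lines)
```

Reporting the failing line numbers makes it easier to tell a single truncated tail from a file that is in the wrong format altogether.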

### `kubernetes_api_analysis.py`

**Purpose**: Compares audit results between pull requests and CI baseline to identify API changes.

**Key functions**:
- Downloads latest CI audit data from Google Cloud Storage
- Compares local PR audit results against CI baseline
- Identifies added, removed, and stable-but-unused operations
- Provides detailed change analysis

**Analysis types**:
- **New operations**: API endpoints introduced in the PR
- **Removed operations**: Previously audited endpoints no longer called
- **Stable unused**: GA endpoints not exercised by conformance tests
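The three analysis types reduce to set operations. The sketch below is one possible reading, assuming "new" means operations seen in the PR run but not in the baseline, and that an operation ID without an alpha or beta version marker is stable; both `is_stable` and `compare_operations` are hypothetical names.

```python
import re

# Operation IDs embed the API version, e.g. "ResourceV1alpha3" or "CoreV1".
# Treating IDs without an alpha/beta marker as stable is a simplifying assumption.
PRERELEASE_RE = re.compile(r"V\d+(alpha|beta)\d+")


def is_stable(operation_id):
    return PRERELEASE_RE.search(operation_id) is None


def compare_operations(spec_ops, baseline_ops, pr_ops):
    """Classify operations along the lines of the three analysis types."""
    return {
        "new": sorted(pr_ops - baseline_ops),
        "removed": sorted(baseline_ops - pr_ops),
        "stable_unused": sorted(op for op in spec_ops - pr_ops if is_stable(op)),
    }


spec_ops = {
    "readCoreV1NamespacedPod",
    "listCoreV1Node",
    "patchCoreV1NamespacedPodResize",
    "createResourceV1alpha3NamespacedResourceClaim",
}
baseline_ops = {"readCoreV1NamespacedPod", "listCoreV1Node"}
pr_ops = {"readCoreV1NamespacedPod", "createResourceV1alpha3NamespacedResourceClaim"}

print(compare_operations(spec_ops, baseline_ops, pr_ops))
```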

**Command line usage**:
```bash
# Basic comparison against latest CI baseline
python3 kubernetes_api_analysis.py

# Use specific baseline file (skip auto-download)
python3 kubernetes_api_analysis.py --baseline-file ci-audit-endpoints.txt

# Custom output directory  
python3 kubernetes_api_analysis.py --output-dir analysis-results/

# Detailed verbose output
python3 kubernetes_api_analysis.py --verbose
```

**Sample output**:
```
Downloading latest CI audit data...
Found CI run: 1961316246318223360 (2025-08-29)

Comparing operations:
- New operations in PR: 9
  * createResourceV1alpha3NamespacedResourceClaim
  * readResourceV1alpha3NamespacedResourceClaim
  * listResourceV1alpha3ResourceSlice
  
- Removed operations: 0
- Stable operations not exercised: 127
```

**Common errors and solutions**:
- **`Command 'gsutil' not found`**: Missing `gsutil` CLI tool
  - *Solution*: Install Google Cloud SDK or use `--baseline-file` option
- **`AccessDenied: 403 Insufficient Permission`**: Permissions issues accessing GCS buckets  
  - *Solution*: Run `gcloud auth login` or use service account authentication
- **`ConnectionError: HTTPSConnectionPool`**: Network connectivity problems
  - *Solution*: Check firewall/proxy settings for GCS access
- **`swagger.json parsing failed`**: Swagger specification parsing failures
  - *Solution*: Verify swagger.json format and ensure it's accessible
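The first error above can be caught before any download is attempted. This is a minimal sketch of such a pre-flight check, assuming the script falls back to a local file when `--baseline-file` is given; `choose_baseline_source` is a hypothetical function and the `which` parameter exists only to make it easy to test.

```python
import shutil


def choose_baseline_source(baseline_file=None, which=shutil.which):
    """Decide where baseline data comes from before any download is attempted."""
    if baseline_file:
        return ("file", baseline_file)
    if which("gsutil"):
        return ("gcs", "gsutil")
    raise RuntimeError(
        "gsutil not found: install the Google Cloud SDK or pass --baseline-file"
    )


print(choose_baseline_source("ci-audit-endpoints.txt"))
```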

## Step-by-Step Resolution Guides

### Marking Endpoints as Ineligible

**Problem**: Your PR adds endpoints that shouldn't be part of conformance testing.

**Step-by-step resolution**:

1. **Document the reasoning**:
   ```yaml
   # Add to ineligible_endpoints.yaml
   - endpoint: connectCoreV1GetNamespacedPodExec
     reason: "Debug feature requiring special cluster permissions"
     issue: "https://github.com/kubernetes/kubernetes/issues/12345"
   ```

2. **Create or reference issue**:
   ```markdown
   Title: Conformance exemption: Pod exec endpoint
   
   Body: The connectCoreV1GetNamespacedPodExec endpoint should be 
   exempt from conformance testing because:
   
   1. It's primarily a debug/development feature
   2. Requires special RBAC permissions that conformance tests avoid
   3. Implementation varies significantly across distributions
   ```

3. **Follow established patterns**:
   - Review existing ineligible entries for similar reasoning
   - Ensure your exemption follows community precedent
   - Get SIG approval for controversial exemptions

## Enforcement Process

### For Pull Request Authors

When the presubmit job identifies issues, contributors must resolve each one before merging, using the resolution guide above where it applies. The system enforces:

1. **Stable endpoints not in pending_eligible_endpoints.yaml**: Must be added to pending list or have conformance tests
2. **Endpoints in pending_eligible_endpoints.yaml that are now tested**: Must be removed from pending list  
3. **New ineligible endpoints**: Must be documented in ineligible list with justification

### Error Messages and Resolution

**"Stable endpoint X not found in pending_eligible_endpoints.yaml"**
- **Cause**: New GA endpoint lacks conformance test coverage
- **Resolution**: Add endpoint to pending list or create conformance test

**"Endpoint X found in pending_eligible_endpoints.yaml but is being tested"**
- **Cause**: Endpoint now has conformance test coverage
- **Resolution**: Remove from pending list as no longer needed
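The two checks behind these messages can be expressed as set arithmetic. The sketch below is an illustration only: it assumes the four inputs are already available as sets of operation IDs, and `check_tracking_files` is a hypothetical name rather than the presubmit's actual code.

```python
def check_tracking_files(stable_ops, tested_ops, pending, ineligible):
    """Return error messages modelled on the two messages described above."""
    errors = []
    # Stable endpoints must be tested, pending, or ineligible.
    for op in sorted(stable_ops - tested_ops - pending - ineligible):
        errors.append(f"Stable endpoint {op} not found in pending_eligible_endpoints.yaml")
    # Pending endpoints that are now tested must leave the pending list.
    for op in sorted(pending & tested_ops):
        errors.append(
            f"Endpoint {op} found in pending_eligible_endpoints.yaml but is being tested"
        )
    return errors


stable_ops = {
    "readCoreV1NamespacedPod",
    "listCoreV1Node",
    "patchCoreV1NamespacedPodResize",
    "connectCoreV1GetNamespacedPodExec",
}
tested_ops = {"readCoreV1NamespacedPod", "listCoreV1Node"}
pending = {"listCoreV1Node"}
ineligible = {"connectCoreV1GetNamespacedPodExec"}

for message in check_tracking_files(stable_ops, tested_ops, pending, ineligible):
    print(message)
```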

## Technical Implementation Details

### Audit Log Collection

The conformance audit system relies on comprehensive Kubernetes audit logging during test execution:

**Audit Policy Configuration**:
```yaml
# policy.yaml - Records request-level events for the listed namespaces, verbs, and API groups
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Request
  namespaces: ["default", "kube-system"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
  resources:
  - group: ""
    resources: ["*"]
  - group: "apps"
    resources: ["*"]
```
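To reason about which requests a policy like the one above records, it can help to model rule matching directly. The sketch below is a simplified model, assuming first-match evaluation and that an omitted or empty field matches anything; it ignores users, non-resource URLs, subresource wildcards, and stages, and all function names are hypothetical.

```python
# The rule from policy.yaml above, expressed as plain Python data.
RULES = [
    {
        "level": "Request",
        "namespaces": ["default", "kube-system"],
        "verbs": ["get", "list", "create", "update", "patch", "delete"],
        "resources": [
            {"group": "", "resources": ["*"]},
            {"group": "apps", "resources": ["*"]},
        ],
    }
]


def resource_matches(spec, group, resource):
    names = spec.get("resources") or ["*"]  # an empty list covers every resource
    return spec.get("group", "") == group and ("*" in names or resource in names)


def rule_matches(rule, verb, namespace, group, resource):
    """Simplified matching: a field that is omitted or empty matches anything."""
    if rule.get("verbs") and verb not in rule["verbs"]:
        return False
    if rule.get("namespaces") and namespace not in rule["namespaces"]:
        return False
    specs = rule.get("resources")
    return not specs or any(resource_matches(s, group, resource) for s in specs)


def audit_level(rules, verb, namespace, group, resource):
    """Return the level of the first matching rule, or "None" when nothing matches."""
    for rule in rules:
        if rule_matches(rule, verb, namespace, group, resource):
            return rule["level"]
    return "None"


print(audit_level(RULES, "get", "default", "", "pods"))       # Request
print(audit_level(RULES, "get", "e2e-test-ns", "", "pods"))   # None
print(audit_level(RULES, "get", "default", "batch", "jobs"))  # None
```

Under this model, a test that runs in its own namespace or touches another API group would leave no audit entries, which is worth checking when an expected endpoint is missing from the results.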

**KIND Cluster Configuration**:
```yaml
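# kind-audit-config.yaml - Enables audit logging
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  # Copy the policy from the host into the node container
  # (assumes policy.yaml is alongside this file in the working directory)
  extraMounts:
  - hostPath: ./policy.yaml
    containerPath: /etc/kubernetes/audit-policy.yaml
    readOnly: true
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        audit-log-path: /var/log/audit.log
        audit-policy-file: /etc/kubernetes/audit-policy.yaml
        audit-log-maxage: "30"
        audit-log-maxbackup: "3"
        audit-log-maxsize: "100"
      extraVolumes:
      - name: audit-policy
        hostPath: /etc/kubernetes/audit-policy.yaml
        mountPath: /etc/kubernetes/audit-policy.yaml
        readOnly: true
        pathType: File
      # Share the node's /var/log so the audit log is readable from the node
      - name: audit-logs
        hostPath: /var/log
        mountPath: /var/log
        readOnly: false
        pathType: DirectoryOrCreate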
```

### Script Architecture

**audit_log_parser.py workflow**:
1. **Download swagger.json**: Fetches latest OpenAPI specification
2. **Parse operation mappings**: Builds lookup table of operationId → HTTP method/path
3. **Process audit logs**: Parses each JSON log entry
4. **Map to operations**: Matches audit entries to swagger operations using URI patterns
5. **Generate statistics**: Counts usage and creates sample data
6. **Output results**: Writes human-readable and machine-readable reports
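Steps 2 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the `SPEC` excerpt is a hypothetical two-path fragment in the shape of `swagger.json`, the `x-kubernetes-action` extension is used to tell `list` from `get`, and the `VERB_TO_ACTION` table mapping audit verbs to spec actions is assumed.

```python
import re

# Hypothetical two-path excerpt in the shape of swagger.json.
SPEC = {
    "paths": {
        "/api/v1/namespaces/{namespace}/pods": {
            "get": {"operationId": "listCoreV1NamespacedPod", "x-kubernetes-action": "list"},
            "post": {"operationId": "createCoreV1NamespacedPod", "x-kubernetes-action": "post"},
        },
        "/api/v1/namespaces/{namespace}/pods/{name}": {
            "get": {"operationId": "readCoreV1NamespacedPod", "x-kubernetes-action": "get"},
            "put": {"operationId": "replaceCoreV1NamespacedPod", "x-kubernetes-action": "put"},
        },
    }
}

# Audit verbs differ from the spec's action names for writes (assumed mapping).
VERB_TO_ACTION = {"create": "post", "update": "put"}


def build_lookup(spec):
    """Turn each templated path into a regex, paired with its action and operationId."""
    lookup = []
    for path, methods in spec["paths"].items():
        parts = ["[^/]+" if seg.startswith("{") else re.escape(seg) for seg in path.split("/")]
        pattern = re.compile("^" + "/".join(parts) + "$")
        for method, op in methods.items():
            # Path items can also carry non-operation keys such as "parameters".
            if isinstance(op, dict) and "operationId" in op:
                lookup.append((op.get("x-kubernetes-action", method), pattern, op["operationId"]))
    return lookup


def operation_for(event, lookup):
    """Return the operationId for an audit event, or None if nothing matches."""
    uri = event["requestURI"].split("?", 1)[0]  # drop the query string
    action = VERB_TO_ACTION.get(event["verb"], event["verb"])
    for candidate, pattern, operation_id in lookup:
        if candidate == action and pattern.match(uri):
            return operation_id
    return None


lookup = build_lookup(SPEC)
event = {"verb": "list", "requestURI": "/api/v1/namespaces/default/pods?limit=500"}
print(operation_for(event, lookup))
```

A linear scan over compiled patterns is easy to follow but slow for large logs; a real implementation would likely index by verb or path prefix first.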

**kubernetes_api_analysis.py workflow**:
1. **Discover baseline**: Searches GCS for latest CI audit data
2. **Download artifacts**: Retrieves baseline audit-endpoints.txt
3. **Parse local data**: Processes current PR's audit results
4. **Compare datasets**: Identifies additions, removals, and stable unused endpoints
5. **Generate report**: Outputs detailed comparison analysis
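The baseline discovery step amounts to picking the most recent build directory. The sketch below assumes build IDs are numeric and increase over time, and that the listing has already been fetched (for example from `gsutil ls`); the bucket name and `latest_build` helper are hypothetical.

```python
def latest_build(listing):
    """Pick the entry with the highest numeric build ID from a directory listing."""
    builds = []
    for entry in listing:
        build_id = entry.rstrip("/").rsplit("/", 1)[-1]
        if build_id.isdigit():  # skips files such as latest-build.txt
            builds.append((int(build_id), entry))
    if not builds:
        raise ValueError("no numeric build directories found")
    return max(builds)[1]


# Hypothetical listing; the bucket layout here is illustrative only.
listing = [
    "gs://example-bucket/logs/ci-kubernetes-audit-kind-conformance/1961316246318223360/",
    "gs://example-bucket/logs/ci-kubernetes-audit-kind-conformance/1960953857142722560/",
    "gs://example-bucket/logs/ci-kubernetes-audit-kind-conformance/latest-build.txt",
]
print(latest_build(listing))
```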

### Data Flow Architecture

```
┌─────────────────┐
│ Kubernetes API  │
│ Server (KIND)   │
└─────────────────┘
         │ HTTP requests during tests
         ▼
┌─────────────────┐
│ Audit Log       │
│ (JSON format)   │
└─────────────────┘
         │ Raw audit events
         ▼
┌─────────────────┐    ┌─────────────────┐
│ audit_log_      │◄───│ swagger.json    │
│ parser.py       │    │ (OpenAPI spec)  │
└─────────────────┘    └─────────────────┘
         │ Parsed operation counts
         ▼
┌─────────────────┐    ┌─────────────────┐
│ audit-endpoints │    │ audit-operations│
│ .txt (summary)  │    │ .json (samples) │
└─────────────────┘    └─────────────────┘
         │ Current PR data
         ▼
┌─────────────────┐    ┌─────────────────┐
│ kubernetes_api_ │◄───│ CI Baseline     │
│ analysis.py     │    │ (from GCS)      │
└─────────────────┘    └─────────────────┘
         │ Comparison analysis
         ▼
┌─────────────────┐
│ Change Report   │
│ (console output)│
└─────────────────┘
```

### CI Integration Points

**Presubmit trigger conditions**:
- Changes to `api/openapi-spec/swagger.json`
- Modifications in conformance test directories
- Updates to endpoint classification files

**Job execution environment**:
- Container: `gcr.io/k8s-staging-test-infra/krte`
- Resources: 2 CPU cores, 9GB RAM
- Timeout: 150 minutes
- Storage: 100GB for KIND images and audit logs

**Artifact collection**:
```bash
# CI job uploads artifacts to GCS
gsutil -m cp -r artifacts/audit/* \
  gs://kubernetes-ci-logs/pr-logs/pull/${PULL_NUMBER}/pull-kubernetes-audit-kind-conformance/${BUILD_ID}/artifacts/audit/
```

## Troubleshooting

### CI Job Failures

**Build failures**: Check build logs for:
- KIND cluster creation issues
- Test execution failures
- Script execution errors
- Artifact generation problems

**Analysis failures**: Common issues:
- Swagger specification download problems
- Audit log parsing errors
- GCS access permission issues
- File format inconsistencies

### Script Execution Issues

**Local development**:
```bash
# Run audit log parser
python3 audit_log_parser.py --audit-logs audit.log

# Compare against CI baseline  
python3 kubernetes_api_analysis.py
```

**Requirements**:
- Python 3.x
- Network access for downloading specifications
- `gsutil` installed for GCS operations
- Valid audit log files in JSON format

### Common Resolution Steps

1. **Check network connectivity** for specification downloads
2. **Verify file permissions** for audit logs and output files
3. **Validate JSON format** of audit log entries
4. **Ensure gsutil authentication** for GCS access
5. **Review script error output** for specific failure details

## Best Practices

### For API Developers

**Update tracking files promptly**:
```bash
# Good workflow - immediate tracking
git add api/openapi-spec/swagger.json
git add test/conformance/testdata/pending_eligible_endpoints.yaml
git commit -m "Add new stable endpoint to pending conformance list

The createAppsV1NamespacedDeployment endpoint is now stable but
lacks conformance test coverage. Tracked in issue #12345."
```

**Provide clear justifications**:
```yaml
# Good documentation in ineligible_endpoints.yaml
- endpoint: connectCoreV1GetNamespacedPodPortforward
  reason: "Debug feature: requires special permissions, varies by implementation"
  issue: "https://github.com/kubernetes/kubernetes/issues/24680"
  
# Poor documentation
- endpoint: someEndpoint
  reason: "doesn't work"  # Too vague
  issue: ""               # Missing reference
```

**Test locally before submitting**:
```bash
# Complete local validation workflow
# 1. Run conformance tests with audit logging
kind create cluster --config kind-audit-config.yaml
kind export kubeconfig
go test ./test/e2e/... --ginkgo.focus="Conformance"

# 2. Extract and parse audit logs
docker cp kind-control-plane:/var/log/audit.log ./audit.log
python3 audit_log_parser.py --audit-logs ./audit.log

# 3. Compare against baseline
python3 kubernetes_api_analysis.py --baseline-file previous-endpoints.txt

# 4. Verify endpoint categorization
diff -u previous-pending.yaml test/conformance/testdata/pending_eligible_endpoints.yaml
```

## Frequently Asked Questions

### General Questions

**Q: Why is conformance audit enforcement necessary?**
A: Without systematic enforcement, stable APIs can reach GA status without adequate test coverage, creating technical debt and potential compatibility issues across Kubernetes distributions. The audit system ensures every stable endpoint is either tested or explicitly documented as exempt.

**Q: How often do the CI jobs run?**
A: The periodic job runs automatically on a schedule to maintain baseline data. The presubmit job triggers only when `swagger.json` or related conformance files are modified in pull requests.

**Q: Can I disable the audit check for my PR?**
A: No, the conformance audit is mandatory for API changes. However, you can satisfy the requirements by appropriately categorizing endpoints in the tracking files rather than writing tests immediately.

### Technical Questions

**Q: Why doesn't the audit pick up my API endpoint usage?**
A: Common causes:
- Endpoint not called during conformance test execution
- Audit logging policy doesn't capture your API group
- URI pattern matching failed (check for typos in swagger.json)
- Test runs in non-audited namespace

**Q: How do I test the audit scripts locally?**
A: Complete local workflow:
```bash
# 1. Set up KIND cluster with audit logging
kind create cluster --config kind-audit-config.yaml

# 2. Run conformance tests
go test ./test/e2e/... --ginkgo.focus="Conformance" --provider=skeleton

# 3. Extract audit logs from KIND container
docker cp kind-control-plane:/var/log/audit.log ./audit.log

# 4. Run parser
python3 audit_log_parser.py --audit-logs audit.log

# 5. Compare against CI baseline
python3 kubernetes_api_analysis.py
```

### Process Questions

**Q: How long should endpoints stay in pending_eligible_endpoints.yaml?**
A: No strict time limit, but best practice is to resolve within 1-2 release cycles. Long-pending endpoints should be evaluated for potential ineligibility or prioritized for test development.

**Q: Who decides if an endpoint should be ineligible?**
A: The relevant SIG makes the initial determination, but controversial decisions should involve SIG Architecture. All decisions must be documented with clear reasoning.

**Q: Can alpha/beta endpoints be in the audit system?**
A: The audit system focuses on stable (GA) endpoints, but alpha/beta endpoints may appear in audit logs if used by conformance tests. They should not be added to tracking files until reaching stability.

**Q: What happens if I ignore the audit failures?**
A: Pull requests with audit failures should not be merged. The presubmit is a required check that must pass before code integration.

## Resources

### Essential Documentation
- [Conformance Test Requirements](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) - Complete testing guidelines
- [OpenAPI Specification](https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/swagger.json) - Source of truth for API definitions
- [Conformance Test Suite](https://github.com/kubernetes/kubernetes/tree/master/test/conformance) - Current test implementations

### CI and Monitoring  
- [CI Job Configuration](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/sig-arch/conformance-audit.yaml) - Job definitions and triggers
- [TestGrid Dashboard](https://testgrid.k8s.io/sig-arch-conformance) - Real-time job status and history
- [Audit Scripts](https://github.com/kubernetes/test-infra/tree/master/experiment/audit) - Analysis tool source code
- [Script Usage Guide](https://gist.github.com/dims/1fc254d0e4d043ef18bbc84596785de2/raw/73ae40e831367a757e8e2b47ff10836286db7d03/README.md) - Practical examples

### Data Files
- [Current Conformance Tests](https://github.com/kubernetes/kubernetes/blob/master/test/conformance/testdata/conformance.yaml) - Active test inventory  
- [Pending Endpoints](https://github.com/kubernetes/kubernetes/blob/master/test/conformance/testdata/pending_eligible_endpoints.yaml) - Awaiting test coverage
- [Ineligible Endpoints](https://github.com/kubernetes/kubernetes/blob/master/test/conformance/testdata/ineligible_endpoints.yaml) - Explicitly excluded from testing

### Community
- **SIG Architecture**: Primary owner of conformance audit system
- **CNCF Conformance WG**: Policy and certification oversight  
- **Kubernetes Slack**: `#sig-architecture` and `#conformance` channels for questions

---

The conformance audit system safeguards the Kubernetes commitment to API stability as the platform evolves. By automatically tracking API coverage and requiring every stable endpoint to be tested or explicitly categorized, it helps maintain the compatibility that makes Kubernetes reliable across diverse deployment environments.