Last active
October 6, 2025 14:25
-
-
Save nerdalert/77236e76dba6ee4b9a4998de8c487238 to your computer and use it in GitHub Desktop.
Revisions
-
nerdalert revised this gist
Oct 6, 2025 . 1 changed file with 44 additions and 422 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,417 +1,44 @@ # MaaS Introspection Architecture ## AuthPolicy Summary Two AuthPolicies manage authentication: - **maas-control-plane** (`deploy/manifests/control-plane-auth-policy.yaml`) - JWT auth for admin/management - **data-plane-auth-gateway** (`deploy/manifests/data-plane-introspect.yaml`) - API key auth for model inference ## Data Plane Call Workflow (Model Endpoint + API Key) **Request**: `POST /v1/chat/completions` with `Authorization: APIKEY $API_KEY` 1. Client sends request to `inference-gateway` with `Authorization: APIKEY Zyt5JAfbzyLm_Uwa9OzPLhXObK9mZwCqYXeyHNDQUnw` 2. Gateway triggers `data-plane-auth-gateway` AuthPolicy 3. AuthPolicy calls `POST /introspect` with `token=Zyt5JAfbzyLm_Uwa9OzPLhXObK9mZwCqYXeyHNDQUnw` 4. Introspect extracts key prefix (`Zyt5JAfb`) from first 8 characters 5. Database lookup: `SELECT * FROM api_keys WHERE key_prefix = 'Zyt5JAfb'` 6. Verify full key against stored `key_hash` + `salt` 7. Query team details: `SELECT * FROM teams WHERE id = <team_uuid>` 8. Query user model access: `GetUserModelsAllowed(user_id, team_id)` 9. Return OAuth2 response: `{active: true, user_id: "uuid", team_id: "team-orange", groups: "team-orange"}` 10. AuthPolicy injects `auth.identity.*` context into request 11. Rate limiter checks quota using `auth.identity.userid` + team 12. If within limits: forward to model service, else return 429 ## Control Plane Call Workflow (Admin/Management Endpoints) **Request**: `GET /teams` with `Authorization: Bearer <JWT>` 1. Client sends request to `maas-api-control-plane-route` HTTPRoute 2. HTTPRoute triggers `maas-control-plane` AuthPolicy 3. AuthPolicy validates JWT signature against Keycloak issuer 4. Extract user claims (roles, groups, email) from JWT 5. Check authorization rules: - Admin endpoints (`/admin/*`): require `maas-admin` role - User endpoints (`/teams`, `/keys`, etc.): require `maas-user` role OR group membership 6. Inject headers: `X-MaaS-User-ID`, `X-MaaS-User-Email`, `X-MaaS-User-Roles` 7. Forward request to MaaS API with user context 8. API returns team/user data based on authorized scope --- ## Architecture Overview @@ -423,10 +50,10 @@ flowchart LR Gateway --> ControlPlane[Control Plane<br/>• JWT Auth<br/>• Admin APIs] Gateway --> DataPlane[Data Plane<br/>• APIKEY Auth<br/>• Model APIs] ControlPlane --> KeyManager[MaaS API<br/>• Introspection<br/>• OAuth2 Bridge] DataPlane --> KeyManager KeyManager --> Database[PostgreSQL<br/>• Users UUID<br/>• Teams<br/>• API Keys] ``` ## Core Components @@ -456,7 +83,7 @@ sequenceDiagram participant KC as Keycloak participant GW as Gateway participant AU as Authorino participant KM as MaaS API Admin->>KC: POST /token (credentials) KC->>Admin: JWT access token @@ -477,7 +104,7 @@ sequenceDiagram participant GW as Gateway participant AU as Authorino participant LIM as Limitador participant KM as MaaS API participant MS as Model Service participant DB as PostgreSQL @@ -508,7 +135,7 @@ sequenceDiagram ```mermaid sequenceDiagram participant AU as Authorino participant KM as MaaS API participant DB as PostgreSQL AU->>KM: POST /introspect @@ -547,7 +174,7 @@ api_keys: ```mermaid sequenceDiagram participant API as API Key participant KM as MaaS API participant DB as PostgreSQL participant CEL as CEL Context participant LIM as Limitador @@ -620,12 +247,6 @@ sequenceDiagram CEL->>TOKEN: auth.identity.groups (Team memberships) ``` ## Security Model ### Authentication Boundaries @@ -642,7 +263,7 @@ sequenceDiagram ### Network Security - **Introspection Endpoint**: Internal cluster access only - **Bypass Gateway**: Authorino → MaaS-API via service mesh - **No External Auth**: `/introspect` not exposed to internet - **Service-to-Service**: mTLS encryption @@ -654,7 +275,7 @@ sequenceDiagram sequenceDiagram participant Admin as Admin participant CP as Control Plane participant KM as MaaS API participant DB as PostgreSQL participant KQ as Kuadrant participant LIM as Limitador @@ -677,7 +298,7 @@ sequenceDiagram participant GW as Gateway participant AUTH as AuthPolicy participant AU as Authorino participant KM as MaaS API participant TOKEN as TokenRateLimitPolicy participant LIM as Limitador participant MS as Model Service @@ -699,5 +320,6 @@ sequenceDiagram end ``` --- This architecture provides secure API key management with database-backed user identity and OAuth2 introspection for integration with Kuadrant policy frameworks. -
nerdalert revised this gist
Oct 1, 2025 . 1 changed file with 11 additions and 11 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -7,7 +7,7 @@ - Database-Backed Identity Resolution: API keys are hashed and stored in PostgreSQL, linking to user UUIDs/ID/Email and team memberships for privacy-preserving rate limiting - Oaths Introspection Bridge: maas-api provides /introspect endpoint that transforms API keys into structured identity context (PostgreSQL UUID/IDs + team groups) for Kuadrant policy ## Quickstart This section provides a quick way to get a JWT and interact with the control and data planes. @@ -373,7 +373,7 @@ Use the provided test script to verify the complete workflow: ```bash # Set environment variables export USER_JWT="your-jwt-token" export CONTROL_BASE="http://maas-api.db.apps.maas2.octo-emerging.redhataicoe.com" export DATA_BASE="http://simulator.db.apps.maas2.octo-emerging.redhataicoe.com" # Run complete workflow test (if available) @@ -423,10 +423,10 @@ flowchart LR Gateway --> ControlPlane[Control Plane<br/>• JWT Auth<br/>• Admin APIs] Gateway --> DataPlane[Data Plane<br/>• APIKEY Auth<br/>• Model APIs] ControlPlane --> maas-api[maas-api<br/>• Introspection<br/>• OAuth2 Bridge] DataPlane --> maas-api maas-api --> Database[PostgreSQL<br/>• Users UUID<br/>• Teams<br/>• API Keys] ``` ## Core Components @@ -456,7 +456,7 @@ sequenceDiagram participant KC as Keycloak participant GW as Gateway participant AU as Authorino participant KM as maas-api Admin->>KC: POST /token (credentials) KC->>Admin: JWT access token @@ -477,7 +477,7 @@ sequenceDiagram participant GW as Gateway participant AU as Authorino participant LIM as Limitador participant KM as maas-api participant MS as Model Service participant DB as PostgreSQL @@ -508,7 +508,7 @@ sequenceDiagram ```mermaid sequenceDiagram participant AU as Authorino participant KM as maas-api participant DB as PostgreSQL AU->>KM: POST /introspect @@ -547,7 +547,7 @@ api_keys: ```mermaid sequenceDiagram participant API as API Key participant KM as maas-api participant DB as PostgreSQL participant CEL as CEL Context participant LIM as Limitador @@ -654,7 +654,7 @@ sequenceDiagram sequenceDiagram participant Admin as Admin participant CP as Control Plane participant KM as maas-api participant DB as PostgreSQL participant KQ as Kuadrant participant LIM as Limitador @@ -677,7 +677,7 @@ sequenceDiagram participant GW as Gateway participant AUTH as AuthPolicy participant AU as Authorino participant KM as maas-api participant TOKEN as TokenRateLimitPolicy participant LIM as Limitador participant MS as Model Service -
nerdalert revised this gist
Oct 1, 2025 . 1 changed file with 8 additions and 2 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,7 +1,13 @@ # MaaS OAuth2 Introspection and Group/User Management ## Overview - Dual Authentication Architecture: Control plane uses JWT/OIDC for admin operations, data plane uses API key introspection for model inference with OAuth2 compliance. In the demo it uses a GItHub OAuth IDP backing via Keycloak. - Database-Backed Identity Resolution: API keys are hashed and stored in PostgreSQL, linking to user UUIDs/ID/Email and team memberships for privacy-preserving rate limiting - Oaths Introspection Bridge: maas-api provides /introspect endpoint that transforms API keys into structured identity context (PostgreSQL UUID/IDs + team groups) for Kuadrant policy ## Quickstart This section provides a quick way to get a JWT and interact with the control and data planes. -
nerdalert revised this gist
Oct 1, 2025 . 1 changed file with 5 additions and 6 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -175,7 +175,7 @@ for i in {1..10}; do printf "Request #%-2s -> " "$i"; \ **What happens automatically:** - The key is created and associated with the specified team - The key inherits the team's embedded rate limits (100 requests/minute) - During introspection, the key will return `group: "team-blue"` - This group matches the TokenRateLimitPolicy predicate for 100 requests/minute limit ## Managing Teams and Keys @@ -305,7 +305,7 @@ Expected response should show: ```json { "active": true, "group": "team-blue", "team_id": "team-uuid", "user_id": "user-uuid", "plan": "team-blue" @@ -386,15 +386,15 @@ Database Storage (teams table) TokenRateLimitPolicy (Kuadrant) ↓ [key creation inherits team rate limits] API Key Generation ↓ [introspection returns team-name] Rate Limiting Enforcement ``` **Key Implementation Points:** - Teams have embedded rate limits (rate_limit, rate_window, rate_limit_spec) - PolicyManager automatically syncs team rate limits to TokenRateLimitPolicy - API keys inherit team rate limits via introspection group formatting - TokenRateLimitPolicy predicates match "{team-name}" groups - No separate policy entities - everything is team-centric - Rate limit updates to teams automatically propagate to all team keys @@ -693,6 +693,5 @@ sequenceDiagram end ``` This architecture provides secure API key management with database-backed user identity and OAuth2 introspection for integration with existing Kuadrant policy frameworks. -
nerdalert created this gist
Oct 1, 2025 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,698 @@ # Team-Based Rate Limiting Architecture This document outlines the simplified architecture for managing teams with embedded rate limits and user API keys in the MaaS platform. ## Quickstart This section provides a quick way to get a JWT and interact with the control and data planes. ```bash export CONTROL_BASE="http://key-manager.db.apps.maas2.octo-emerging.redhataicoe.com" export DATA_BASE="http://deepseek-r1.apps.maas2.octo-emerging.redhataicoe.com" export DATA_BASE="http://simulator.db.apps.maas2.octo-emerging.redhataicoe.com" # 1) Device flow to get a JWT DEV=$(curl -k -s -X POST -H "Content-Type: application/x-www-form-urlencoded" \ -d "client_id=maas-client" -d "client_secret=maas-client-secret" \ https://keycloak.apps.maas2.octo-emerging.redhataicoe.com/realms/maas/protocol/openid-connect/auth/device) echo "$DEV" | jq -r .verification_uri_complete DEVICE_CODE=$(echo "$DEV" | jq -r .device_code) # Wait for user to authenticate in the browser before running the next command echo "Press enter after you have authenticated in the browser" read USER_JWT=$(curl -k -s -X POST -H "Content-Type: application/x-www-form-urlencoded" \ -d "grant_type=urn:ietf:params:oauth:grant-type:device_code" \ -d "device_code=$DEVICE_CODE" -d "client_id=maas-client" -d "client_secret=maas-client-secret" \ https://keycloak.apps.maas2.octo-emerging.redhataicoe.com/realms/maas/protocol/openid-connect/token | jq -r .access_token) # 2) Bootstrap account (creates user if needed) curl -sS -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/profile" | jq . # 3) Create an API key (no UUIDs needed) KEY_JSON=$(curl -sS -X POST "$CONTROL_BASE/users/me/keys" \ -H "Authorization: Bearer $USER_JWT" -H "Content-Type: application/json" \ -d '{"alias":"demo-key"}') API_KEY=$(echo "$KEY_JSON" | jq -r .api_key) echo "API Key: $API_KEY" # 4) List your keys curl -sS -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/users/me/keys" | jq . # 5) Call data plane with your key curl -sS "$DATA_BASE/v1/chat/completions" \ -H "Authorization: APIKEY $API_KEY" \ -H "Content-Type: application/json" \ -d '{"model":"simulator-model","messages":[{"role":"user","content":"hello"}],"max_tokens":32}' | jq . ``` ## Workflow: Creating Teams with Embedded Rate Limits This workflow details the simplified process: creating a team with embedded rate limits, automatically syncing to Kuadrant, and generating API keys that inherit the rate limits. ### 1. Create a Team with Rate Limits (with Automatic Kuadrant Sync) Create a team with embedded rate limits. The rate limits will be automatically validated and synced to the Kuadrant TokenRateLimitPolicy resource. Send a `POST` request to the `/teams` endpoint: ```bash # A user JWT is required for this operation (ensure USER_JWT is set from the Quickstart) TEAM_ID=$(curl -sS -X POST "$CONTROL_BASE/teams" \ -H "Authorization: Bearer $USER_JWT" -H "Content-Type: application/json" \ -d '{ "ext_id": "team-blue", "name": "Blue Team", "description": "The blue team with 100 requests per minute", "rate_limit": 90, "rate_window": "1m" }' | jq -r .id) echo "Team ID: $TEAM_ID" # A user JWT is required for this operation (ensure USER_JWT is set from the Quickstart) TEAM_ID=$(curl -sS -X POST "$CONTROL_BASE/teams" \ -H "Authorization: Bearer $USER_JWT" -H "Content-Type: application/json" \ -d '{ "ext_id": "team-orange", "name": "Orange Team", "description": "The orange team with 10000 requests per minute", "rate_limit": 10000, "rate_window": "5m" }' | jq -r .id) echo "Team ID: $TEAM_ID" ``` **What happens automatically:** - Rate limit spec is validated (limit must be positive integer, window must match format like "1m", "1h", "24h") - Team is stored in the database with embedded rate limits - Rate limits are automatically synced to the Kuadrant TokenRateLimitPolicy with a new limit block: ```yaml limits: team-blue: rates: - limit: 100 window: 1m when: - predicate: auth.identity.groups.split(",").exists(g, g == "team-blue") counters: - expression: auth.identity.userid ``` **Verify the sync:** ```bash kubectl get tokenratelimitpolicy -n maas-db -o yaml | grep -A 10 "team-blue" ``` ### 2. Create API Key with Team Rate Limit Inheritance Create an API key that automatically inherits the team's rate limiting configuration. The key will be associated with the team and its embedded rate limits. #### a. User Registration (Automatic) A user is automatically created in the database the first time they access the `/profile` endpoint with a valid JWT. ```bash # Bootstrap the user account (creates user record if it doesn't exist) curl -sS -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/profile" | jq . ``` #### b. Create API Key for Team Create an API key that will inherit the team's rate limits. You can specify the team explicitly or let it use the user's default team. ```bash # Create API key with explicit team assignment API_KEY_JSON=$(curl -sS -X POST "$CONTROL_BASE/users/me/keys" \ -H "Authorization: Bearer $USER_JWT" -H "Content-Type: application/json" \ -d '{ "alias": "blue-team-key7", "team_id": "'"$TEAM_ID"'" }') API_KEY=$(echo "$API_KEY_JSON" | jq -r .api_key) echo "API Key: $API_KEY" export API_KEY=$API_KEY # Create API key with explicit team assignment API_KEY_JSON=$(curl -sS -X POST "$CONTROL_BASE/users/me/keys" \ -H "Authorization: Bearer $USER_JWT" -H "Content-Type: application/json" \ -d '{ "alias": "orange-team-key1", "team_id": "'"$TEAM_ID"'" }') API_KEY=$(echo "$API_KEY_JSON" | jq -r .api_key) echo "API Key: $API_KEY" export API_KEY=$API_KEY ``` kubectl logs -n kuadrant-system deployment/limitador-limitador | grep tokenlimit.team_blue Test Curl: ```shell export API_KEY=dVSnNTYEnvoJ9j1M2ZOJDQGwP5aN8FvJQEkNrE8j8g8 curl -sS "$DATA_BASE/v1/chat/completions" \ -H "Authorization: APIKEY $API_KEY" \ -H "Content-Type: application/json" \ -d '{"model":"simulator-model","messages":[{"role":"user","content":"hello"}],"max_tokens":32}' | jq for i in {1..10}; do printf "Request #%-2s -> " "$i"; \ curl -s -o /dev/null -w "%{http_code}\n" \ -H "Authorization: APIKEY $API_KEY" \ -H 'Content-Type: application/json' \ -d '{"model":"deepseek-r1","messages":[{"role":"user","content":"Test"}],"max_tokens":10}' \ $DATA_BASE/v1/chat/completions; done ``` **What happens automatically:** - The key is created and associated with the specified team - The key inherits the team's embedded rate limits (100 requests/minute) - During introspection, the key will return `group: "plan:team-blue"` - This group matches the TokenRateLimitPolicy predicate for 100 requests/minute limit ## Managing Teams and Keys ### Listing Resources #### List Teams To see all teams, send a `GET` request to the `/teams` endpoint. ```bash curl -sS -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/teams" | jq . ``` #### List Team Members (Users) To list the users (members) of a specific team, send a `GET` request to the `/teams/{team_id}/members` endpoint. ```bash TEAM_ID="the-id-of-the-team" curl -sS -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/teams/$TEAM_ID/members" | jq . ``` #### List Team Keys To list the API keys associated with a specific team, send a `GET` request to the `/teams/{team_id}/keys` endpoint. ```bash TEAM_ID="the-id-of-the-team" curl -sS -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/teams/$TEAM_ID/keys" | jq . ``` #### List User Keys To list your own API keys, send a `GET` request to the `/users/me/keys` endpoint. ```bash curl -sS -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/users/me/keys" | jq . ``` ### Deleting Resources #### Delete a Team (with Cascading Deletes) To delete a team, send a `DELETE` request to the `/teams/{team_id}` endpoint. This performs cascading deletes of all dependent resources. ```bash TEAM_ID="team-blue" RESULT=$(curl -sS -X DELETE -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/teams/$TEAM_ID") echo "$RESULT" | jq . ``` **What happens automatically:** - **Database Cascades**: All dependent records are automatically deleted via foreign key constraints: - `team_memberships` (ON DELETE CASCADE) - `model_grants` (ON DELETE CASCADE) - `api_keys` (ON DELETE CASCADE) - `usage_metrics` (ON DELETE SET NULL) - **Kuadrant Cleanup**: The team's rate limits are removed from TokenRateLimitPolicy and AuthPolicy - **Transactional**: All operations occur within a database transaction for consistency - **Key Count**: Returns the number of API keys that were cascaded **Response format:** ```json { "message": "Team deleted successfully", "team_id": "uuid", "ext_id": "team_external_id", "name": "Team Name", "cascaded_key_count": 3, "deleted_by": "user_id" } ``` #### Delete an API Key To delete an API key, send a `DELETE` request to the `/keys/{key_prefix}` endpoint using the first 8 characters of the key. ```bash KEY_PREFIX="the-first-8-chars-of-your-key" RESULT=$(curl -sS -X DELETE -H "Authorization: Bearer $USER_JWT" "$CONTROL_BASE/keys/$KEY_PREFIX") echo "$RESULT" | jq . ``` ### Updating Resources #### Update a Team's Rate Limits You can update a team's name, description, and rate limits by sending a `PATCH` request to the `/teams/{team_id}` endpoint. ```bash TEAM_ID="the-id-of-the-team-to-update" curl -sS -X PATCH "$CONTROL_BASE/teams/$TEAM_ID" \ -H "Authorization: Bearer $USER_JWT" -H "Content-Type: application/json" \ -d '{ "name": "Updated Blue Team", "description": "Updated description", "rate_limit": 800, "rate_window": "1m" }' ``` **What happens automatically:** - Team rate limits are updated in the database - Changes are automatically synced to the Kuadrant TokenRateLimitPolicy - All existing API keys for this team immediately inherit the new rate limits ## Complete Workflow Verification ### Test the End-to-End Rate Limiting Flow Verify that team rate limits correctly flow from creation to rate limiting enforcement: #### 1. Test Rate Limit Inheritance via Introspection ```bash # Test API key introspection to verify rate limit inheritance curl -sS -X POST "$CONTROL_BASE/introspect" \ -H "Content-Type: application/x-www-form-urlencoded" \ -d "token=$API_KEY" | jq . ``` Expected response should show: ```json { "active": true, "group": "plan:team-blue", "team_id": "team-uuid", "user_id": "user-uuid", "plan": "team-blue" } ``` #### 2. Test Data Plane Rate Limiting ```bash # Test data plane access with the API key curl -sS "$DATA_BASE/v1/chat/completions" \ -H "Authorization: APIKEY $API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "simulator-model", "messages": [{"role": "user", "content": "test"}], "max_tokens": 10 }' | jq . ``` #### 3. Verify TokenRateLimitPolicy Configuration ```bash # Check that your team exists in the TokenRateLimitPolicy kubectl get tokenratelimitpolicy -n maas-db -o yaml # Look for your team name in the limits section: # limits: # team-blue: # rates: # - limit: 100 # window: 1m # when: # - predicate: auth.identity.groups.split(",").exists(g, g == "team-blue") ``` #### 4. Test Rate Limiting Enforcement To test rate limiting is working, make rapid requests (exceeding 100 in 1 minute): ```bash # Run rapid requests to test rate limiting for i in {1..105}; do curl -sS "$DATA_BASE/v1/chat/completions" \ -H "Authorization: APIKEY $API_KEY" \ -H "Content-Type: application/json" \ -d '{"model": "simulator-model", "messages": [{"role": "user", "content": "test"}], "max_tokens": 1}' & done wait ``` You should see rate limiting kick in after 100 requests. ### Automated Testing Use the provided test script to verify the complete workflow: ```bash # Set environment variables export USER_JWT="your-jwt-token" export CONTROL_BASE="http://key-manager.db.apps.maas2.octo-emerging.redhataicoe.com" export DATA_BASE="http://simulator.db.apps.maas2.octo-emerging.redhataicoe.com" # Run complete workflow test (if available) ./test-complete-workflow.sh ``` ## Architecture Summary The simplified team-based rate limiting workflow implements the following data flow: ``` Team Creation with Rate Limits (API) ↓ [validates rate_limit and rate_window] Database Storage (teams table) ↓ [auto-sync via PolicyManager] TokenRateLimitPolicy (Kuadrant) ↓ [key creation inherits team rate limits] API Key Generation ↓ [introspection returns plan:team-name] Rate Limiting Enforcement ``` **Key Implementation Points:** - Teams have embedded rate limits (rate_limit, rate_window, rate_limit_spec) - PolicyManager automatically syncs team rate limits to TokenRateLimitPolicy - API keys inherit team rate limits via introspection group formatting - TokenRateLimitPolicy predicates match "plan:{team-name}" groups - No separate policy entities - everything is team-centric - Rate limit updates to teams automatically propagate to all team keys **Benefits of the New Architecture:** - **Simplified**: No separate policy management - rate limits are embedded in teams - **Consistent**: Team names directly match rate limiting identifiers (no name mismatches) - **Automatic**: Rate limit changes to teams immediately affect all team keys - **Intuitive**: Team-centric model is easier to understand and manage # MaaS Introspection Architecture ## Architecture Overview ```mermaid flowchart LR Client[External Clients] -->|HTTPS| Router[OpenShift Router] Router --> Gateway[Istio Gateway<br/>inference-gateway] Gateway --> ControlPlane[Control Plane<br/>• JWT Auth<br/>• Admin APIs] Gateway --> DataPlane[Data Plane<br/>• APIKEY Auth<br/>• Model APIs] ControlPlane --> KeyManager[Key Manager<br/>• Introspection<br/>• OAuth2 Bridge] DataPlane --> KeyManager KeyManager --> Database[PostgreSQL<br/>• Users UUID<br/>• Teams<br/>• API Keys] ``` ## Core Components ### Control Plane - **Authentication**: JWT Bearer tokens from Keycloak OIDC - **Endpoints**: `/teams`, `/users`, `/policies`, `/keys` - **Purpose**: Administrative operations, user/team management ### Data Plane - **Authentication**: API key tokens via OAuth2 introspection - **Endpoints**: `/v1/chat/completions`, `/v1/models` - **Purpose**: Model inference with rate limiting ### Introspection Service - **Endpoint**: `/introspect` (internal cluster access) - **Protocol**: OAuth2 introspection standard - **Function**: API key → user identity transformation ## Authentication Flows ### Control Plane JWT Flow ```mermaid sequenceDiagram participant Admin as Admin/CLI participant KC as Keycloak participant GW as Gateway participant AU as Authorino participant KM as Key Manager Admin->>KC: POST /token (credentials) KC->>Admin: JWT access token Admin->>GW: GET /teams (Bearer JWT) GW->>AU: Authorize request AU->>KC: Validate JWT signature KC->>AU: Valid + roles [maas-admin] AU->>GW: Allow + inject X-MaaS-User-* headers GW->>KM: Forward with user context KM->>Admin: 200 + team data ``` ### Data Plane APIKEY + Rate Limiting Flow ```mermaid sequenceDiagram participant Client as Client participant GW as Gateway participant AU as Authorino participant LIM as Limitador participant KM as Key Manager participant MS as Model Service participant DB as PostgreSQL Client->>GW: POST /v1/chat/completions (APIKEY xxx) GW->>AU: Check APIKEY authorization AU->>KM: POST /introspect (token=xxx) KM->>DB: SELECT user,team,policy FROM api_keys DB->>KM: UUID: c0976dac-..., team: orange KM->>AU: {"user_id": "UUID", "groups": "team-orange"} AU->>GW: Allow + inject auth.identity.* Note over GW,LIM: Token Rate Limiting GW->>LIM: RateLimitRequest(team_orange, UUID, tokens: 100) LIM->>LIM: Check bucket: team-orange-UUID LIM->>GW: RateLimitResponse(allowed: true/false) alt Rate limit exceeded GW->>Client: 429 Too Many Requests else Within limits GW->>MS: Forward to model MS->>GW: Model response GW->>Client: 200 + JSON response end ``` ### Introspection Detail Flow ```mermaid sequenceDiagram participant AU as Authorino participant KM as Key Manager participant DB as PostgreSQL AU->>KM: POST /introspect Note over AU,KM: Content-Type: application/x-www-form-urlencoded<br/>token=maas-2024-abc123def456 KM->>KM: SHA256(token) → hash KM->>DB: Query with JOIN Note over KM,DB: SELECT u.id, u.email, t.name, p.limits<br/>FROM api_keys ak<br/>JOIN users u ON ak.user_id = u.id<br/>JOIN teams t ON ak.team_id = t.id<br/>WHERE ak.key_hash = hash DB->>KM: User record Note over DB,KM: id: c0976dac-e1d5-4fe6-b7bc-c8fe20bcfc3a<br/>email: [email protected]<br/>team: team-orange KM->>AU: OAuth2 introspection response Note over KM,AU: {<br/>"active": true,<br/>"user_id": "c0976dac-e1d5-4fe6-b7bc-c8fe20bcfc3a",<br/>"groups": "team-orange,premium",<br/>"team_id": "team-orange"<br/>} ``` ## User Identity Architecture ### Database Schema ```sql users: ├── id: uuid (primary key) → c0976dac-e1d5-4fe6-b7bc-c8fe20bcfc3a ├── email: citext → brent.salisbury@gmail.com ├── keycloak_user_id: text → 6ae65b39-6b35-49ee-be74-d2f3b2f2a08b └── display_name: text api_keys: ├── key_hash: text (SHA256) ├── user_id: uuid → foreign key to users.id ├── team_id: uuid └── active: boolean ``` ### Identity Resolution Chain ```mermaid sequenceDiagram participant API as API Key participant KM as Key Manager participant DB as PostgreSQL participant CEL as CEL Context participant LIM as Limitador Note over API: maas-2024-abc123def456 API->>KM: SHA256 Hash KM->>DB: Query api_keys → users → teams DB->>KM: JOIN Result Note over DB,KM: UUID: c0976dac-...<br/>Email: [email protected]<br/>Team: team-orange KM->>CEL: OAuth2 Response Note over CEL: auth.identity.userid: UUID<br/>auth.identity.groups: team-orange<br/>auth.identity.team_id: team-orange CEL->>LIM: Rate Limiting Key Note over LIM: team-orange-c0976dac-... ``` ## Rate Limiting Integration ### CEL Expression Policy ```yaml TokenRateLimitPolicy: limits: team-orange: rates: - limit: 100000 window: "1h" when: - predicate: auth.identity.groups.split(",").exists(g, g == "team-orange") counters: - expression: auth.identity.userid # PostgreSQL UUID ``` ### Limitador Request Structure ```rust RateLimitRequest { domain: "maas-db/vllm-simulator-db", descriptors: [ RateLimitDescriptor { entries: [ Entry { key: "tokenlimit.team_orange__cd755ac6", value: "1" }, Entry { key: "auth.identity.userid", value: "c0976dac-e1d5-4fe6-b7bc-c8fe20bcfc3a" } ], limit: Some(TokenBucket { max: 100000, window: 3600s }) } ], hits_addend: 100 // From request.max_tokens } ``` ## Policy Architecture ### Kuadrant CEL Binding Scope ```mermaid sequenceDiagram participant GW as Gateway<br/>inference-gateway participant AUTH as AuthPolicy participant TOKEN as TokenRateLimitPolicy participant CEL as CEL Context Note over GW: Gateway-Level Policies (Shared Context) GW->>AUTH: Target reference GW->>TOKEN: Target reference Note over CEL: CEL Context Available CEL->>AUTH: auth.identity.userid (PostgreSQL UUID) CEL->>AUTH: auth.identity.groups (Team memberships) CEL->>AUTH: auth.identity.team_id (Primary team) CEL->>TOKEN: request.max_tokens (Token cost) CEL->>TOKEN: auth.identity.userid (PostgreSQL UUID) CEL->>TOKEN: auth.identity.groups (Team memberships) ``` ### Critical Design Rule **Policies must target the same resource level to share CEL binding context** - ✅ Both target `Gateway` → Shared context - ❌ AuthPolicy targets `HTTPRoute`, TokenRateLimitPolicy targets `Gateway` → Separate contexts ## Security Model ### Authentication Boundaries - **External Access**: TLS termination at OpenShift Router - **Control Plane**: JWT validation against Keycloak - **Data Plane**: API key validation via introspection - **Internal Services**: mTLS service mesh ### Privacy Design - **Rate Limiting Logs**: PostgreSQL UUIDs (not email addresses) - **API Key Storage**: SHA256 hashes in database - **Audit Trail**: UUID-based correlation - **User Resolution**: Database lookup required for human-readable info ### Network Security - **Introspection Endpoint**: Internal cluster access only - **Bypass Gateway**: Authorino → Key-Manager via service mesh - **No External Auth**: `/introspect` not exposed to internet - **Service-to-Service**: mTLS encryption ## Data Flow Summary ### User Onboarding ```mermaid sequenceDiagram participant Admin as Admin participant CP as Control Plane participant KM as Key Manager participant DB as PostgreSQL participant KQ as Kuadrant participant LIM as Limitador participant User as User Admin->>CP: Create team (JWT auth) CP->>KM: Team creation request KM->>DB: Store team + policy DB->>KM: Confirmation KM->>KQ: Sync rate limits KQ->>LIM: Configure quotas KM->>User: API keys linked to team quotas ``` ### Request Processing ```mermaid sequenceDiagram participant Client as Client participant GW as Gateway participant AUTH as AuthPolicy participant AU as Authorino participant KM as Key Manager participant TOKEN as TokenRateLimitPolicy participant LIM as Limitador participant MS as Model Service Client->>GW: Model request + API key GW->>AUTH: Route to validation AUTH->>AU: AuthPolicy triggers AU->>KM: Introspection call KM->>AU: PostgreSQL UUID + team context AU->>TOKEN: Pass auth context TOKEN->>LIM: UUID-based rate limiting LIM->>TOKEN: Quota enforcement result alt Within limits TOKEN->>MS: Forward request MS->>Client: Model response else Quota exceeded TOKEN->>Client: 429 Rate limit end ``` --- This architecture provides secure API key management with database-backed user identity and OAuth2 introspection for seamless integration with existing Kuadrant policy frameworks.