Amazon offers two main managed AI services in AWS: SageMaker, for building and training custom models, and Bedrock, for easy access to foundation models through a serverless API. This doc focuses only on AWS Bedrock, and in particular on its Prompt Routing feature. Other related service features will be covered in separate docs.
Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature of the AWS Bedrock service and is still in its early stages, having become generally available in April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve, and this note describes them only as of the time of writing (June 2025). The Prompt Routing feature works with a built-in set of foundation models from Amazon, Anthropic and Meta. The basic capability is that the service automatically routes a user-provided LLM prompt to one of multiple LLM models served in a serverless manner, with the aim of improving inference performance, improving response accuracy, or lowering inference costs.
Currently, prompt routers can only be used with models from the same provider family, and only with certain specific text models from the Amazon Nova, Anthropic Claude or Meta Llama families, as listed in the table below. A router can also only route between two served model instances. These restrictions are likely due to this being a new service, and they also suggest that the internal routing logic works well only when the router has been designed and tuned in combination with those specific backend models. Note that the parent Bedrock service itself supports a much larger set of models and model families; the Intelligent Prompt Routing feature, however, is limited to two model instances from the same family, and only for these three model families.
Three pre-configured default routers are currently provided, one for each of the three supported model families. Alternatively, you can configure a custom router, but it will still only work with one of these three model families. When using the default routers, the following combinations are supported.
| Router / model family | Available models | Fallback model |
|---|---|---|
| Amazon | Nova Lite, Nova Pro | Nova Pro |
| Anthropic | Claude 3 Haiku, Claude 3.5 Sonnet | Claude 3.5 Sonnet |
| Meta | Llama 3.1 8B, Llama 3.1 70B | Llama 3.1 70B |
Here the "Fallback model" is the reference model used internally to determine whether a response from a candidate simpler backend model is close enough to the presumably better response from the more powerful model that has been designated to be the "fallback model". There is a metric called "Response quality difference" (fixed to 0% in case of default routers and variable in case of configurable routers) that is used to determine the acceptability of a response from a candidate model. So, an operator can choose to set this parameter to say 10% if they are willing to accept upto 10% loss in accuracy by prioritizing routing to the smaller or less powerful of the two backend models.
How does this work internally?
Although Amazon does not disclose the internal logic/algorithm for LLM routing, it appears to be a form of dynamic LLM prompt routing that uses a classifier LLM, possibly in combination with semantic routing. Semantic routing leverages pre-trained vector embeddings of sample prompts stored in a separate vector database, and uses similarity searches to pick the model that best supports the category of prompts the incoming prompt is closest to. Classifier-LLM-based routing essentially trains a separate model to choose between the candidate backends, by comparing their responses on a training set against the responses given by the fallback model. By using one or both of these techniques, Amazon's service can potentially achieve a better balance between LLM routing accuracy and cost. We speculate that it is because of this combination of techniques that each router is tied to its specific, limited set of backend models (since, for example, the classifier LLM is trained for those specific backends). More such details and evaluation results are to be analysed in follow-up testing.
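To make the semantic routing idea concrete, here is an illustrative Python sketch (not Amazon's actual algorithm): sample prompts labelled with the backend that handled them well are embedded, and an incoming prompt is routed to the backend of its nearest reference prompt by cosine similarity. The toy embed() function is a stand-in for a real embedding model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model (e.g. Amazon Titan
    Text Embeddings); only here to keep the sketch self-contained."""
    vec = np.zeros(256)
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Sample prompts pre-labelled with the backend that handled them well,
# standing in for the vector database of reference embeddings.
REFERENCE = [
    ("what is the capital of france", "llama-3-1-8b"),
    ("summarize this paragraph in one line", "llama-3-1-8b"),
    ("prove whether this integer is prime and show your work", "llama-3-1-70b"),
    ("debug this multi-step distributed systems failure", "llama-3-1-70b"),
]
REF_VECS = [(embed(p), model) for p, model in REFERENCE]

def route(prompt: str) -> str:
    """Pick the backend whose nearest reference prompt (by cosine
    similarity) is most similar to the incoming prompt."""
    q = embed(prompt)
    sims = [(float(q @ v), model) for v, model in REF_VECS]
    return max(sims)[1]

print(route("is 2357 a prime number"))
```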
Some simple initial tests/ experiments
These models and routers can easily be tested in the AWS Bedrock playground. Below are some sample outputs when using the default routers in the playground.
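Routers can also be invoked programmatically: a router's ARN is accepted wherever a model ID is, for example in the Converse API. A minimal boto3 sketch follows, using a placeholder router ARN (copy the real one from the Bedrock console) and the trace field name as documented for Converse at the time of writing.

```python
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder: copy the default Meta router's ARN from the Bedrock
# console (the exact format may vary by region/account).
ROUTER_ARN = "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/meta.llama:1"

resp = runtime.converse(
    modelId=ROUTER_ARN,  # a router ARN is accepted in place of a model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Is 2357 a prime number ? Answer only via yes or no"}],
    }],
)

print(resp["output"]["message"]["content"][0]["text"])
# The response trace reports which backend model actually answered
# (field names per the AWS docs at the time of writing).
print(resp.get("trace", {}).get("promptRouter", {}).get("invokedModelId"))
```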
Sample use of chat with the built-in default Meta prompt router
User: Is 2357 a prime number ? Answer only via yes or no
Model:
Yes
User: List the 4 most common causes of a kubernetes pod being stuck in the crashloopbackoff. Only answer in 1 line per cause
Model:
Here are the 4 most common causes of a Kubernetes pod being stuck in CrashLoopBackOff:
1. Insufficient resources (e.g., CPU, memory) allocated to the pod.
2. Incorrect or missing environment variables or configuration settings.
3. Container image issues, such as a corrupted or invalid image.
4. Application code errors or exceptions that cause the container to crash repeatedly.
No setup was needed for the above chat. Some nice metrics were available for each chat prompt and response, including which model provided the response and whether tokens were read from or written to cache. Based on these metrics, the first answer came from a Llama 3.1 8B Instruct model (response latency about 400 ms) and the second from a Llama 3.1 70B Instruct model (response latency about 1200 ms). When the same question was asked repeatedly, the answer to the first question flipped between Yes and No. The correct answer is Yes (2357 is in fact a prime number). This suggests that either the temperature was set too high internally in the default configuration, or that this question really needed a better model (probably a more reasoning-oriented one). The router should ideally have tried the more complex model for this question, so there seems to be room for improvement there too. Llama models are apparently not the best at self-assessment and reflection as of now, so we will retry these sorts of chats on other models. In fact, when the same question was asked with a prompt to explain the work, the router again picked the same (small, sub-optimal and clearly not reasoning-oriented) model, which resulted in an incorrect explanation as shown below.
User: Is 2357 a prime number ? Explain your reasoning
Model: 2357 is not a prime number. It can be divided by 7 and 337, so it is a composite number.
Trying similar questions with the default Amazon Nova prompt router: the short version of the math question was again incorrectly routed and answered; however, when prompted for an explanation, the prompt was routed to a more capable model and the correct answer and explanation were provided, as shown below. This suggests better semantic routing, and potentially better self-assessment, with the Nova router and Nova models.
User: Is 2357 a prime number ? Answer only via yes or no
Model: No.
(the above query was routed to Nova Lite and the answer was incorrect)
User: is 2357 a prime number ? explain your work
Model: To determine if 2357 is a prime number, we need to check if it has any divisors other than 1 and itself.
1. **Check for divisibility by small prime numbers:**
- **2:** 2357 is odd, so it's not divisible by 2.
- **3:** Sum of digits of 2357 is 2 + 3 + 5 + 7 = 17. Since 17 is not divisible by 3, 2357 is not divisible by 3.
- **5:** 2357 does not end in 0 or 5, so it's not divisible by 5.
- **7:** Perform the division: 2357 ÷ 7 ≈ 336.714, which is not an integer.
- **11:** Alternating sum of digits is 2 - 3 + 5 - 7 = -3. Since -3 is not divisible by 11, 2357 is not divisible by 11.
- **13:** Perform the division: 2357 ÷ 13 ≈ 181.308, which is not an integer.
- **17:** Perform the division: 2357 ÷ 17 ≈ 138.647, which is not an integer.
- **19:** Perform the division: 2357 ÷ 19 ≈ 124.053, which is not an integer.
- **23:** Perform the division: 2357 ÷ 23 ≈ 102.478, which is not an integer.
- **29:** Perform the division: 2357 ÷ 29 ≈ 81.276, which is not an integer.
- **31:** Perform the division: 2357 ÷ 31 ≈ 76.032, which is not an integer.
Since 2357 is not divisible by any prime number up to its square root (approximately 48.5), it is a prime number.
(the second answer above was from the Nova Pro model and the answer as well as reasoning were correct)
As with the Meta/Llama router, the metrics with the Nova router did not show any cache tokens being read or written, even when the exact same prompts were provided repeatedly. That could simply be a bug or a missing capability in this relatively new and immature feature.
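The same cache counters can be checked programmatically. Continuing the earlier Converse sketch, the usage block of the response carries the token counts; the cache field names below are per the Converse TokenUsage documentation at the time of writing and may evolve.

```python
# Continues the earlier Converse sketch: inspect the usage block to see
# whether any prompt-cache tokens were read or written. In our tests
# both cache counters stayed at 0 across repeated identical prompts.
usage = resp["usage"]
print("in/out tokens:", usage["inputTokens"], usage["outputTokens"])
print("cache read/write:",
      usage.get("cacheReadInputTokens", 0),
      usage.get("cacheWriteInputTokens", 0))
```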
With the default Anthropic router, the combination of the routing function and the quality of the models seemed to provide better results. For the math question, the routing function sent it to the bigger model right from the start (instead of to the simpler model, as with the Meta and Nova routers), and this model reasoned and answered correctly, albeit at the cost of the highest response latency of the three default routers. Of course, one can't generalize about quality based on just a few prompts and responses; we will run proper eval benchmarks on these later.
User: Is 2357 a prime number ? Answer only via yes or no
Model: Yes
User: Is 2357 a prime number ? Explain your reasoning
Model: 2357 is a prime number. Here's the explanation:
First, let's recall the definition of a prime number: A prime number is a natural number greater than 1 that is only divisible by 1 and itself.
To determine if 2357 is prime, we need to check if it has any divisors other than 1 and itself.
We only need to check up to the square root of 2357, which is approximately 48.55. This is because if a number has a divisor larger than its square root, it must also have a corresponding divisor smaller than its square root.
We can check for divisibility by all prime numbers up to 48: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47
After checking, we find that 2357 is not divisible by any of these numbers:
2357 ÷ 2 = 1178 remainder 1
2357 ÷ 3 = 785 remainder 2
2357 ÷ 5 = 471 remainder 2
2357 ÷ 7 = 336 remainder 5 ...and so on for the rest of the primes up to 47.
Since 2357 is not divisible by any prime number up to its square root, it is indeed a prime number.
Therefore, 2357 is a prime number.
On the other question, interestingly, the Anthropic router routed the prompt to the smaller model, unlike the Meta and Amazon Nova routers.
User: List the 4 most common causes of a kubernetes pod being stuck in the crashloopbackoff. Only answer in 1 line per cause
Model:
1. Incorrect container command or arguments.
2. Missing or incorrect environment variables.
3. Resource constraints (CPU, memory) exceeded.
4. Dependency issues (e.g., unable to connect to a database).
All the above examples used the "Chat" mode of the AWS playground. There is also a "Text" mode (i.e., a "Single prompt" mode). Initial tests with the single prompt mode did not change the key observations noted above (for instance, the cache tokens still showed 0), but this should be tested further, since there could be routing logic efficiency implications for single prompt vs. multi-turn chat due to differences in LLM context lengths. Additional tests on such aspects are planned.
There are a few additional knobs and handy features as well. For instance, the Bedrock playground allows a side-by-side compare mode for any two individual models and/or routers, so that the exact same prompts can be submitted to both and the results compared for accuracy, performance and eventually cost. In our preliminary side-by-side testing we noted that the Anthropic router was better able to route complex math queries to the more powerful Sonnet backend, whereas the Amazon and Meta routers tended to route prompts that appeared simple and short, yet required complex math reasoning, to the smaller of the backend models. It may also be that the Anthropic models have been trained explicitly on such sample math questions and so exhibit better results.
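The same side-by-side comparison can also be scripted outside the playground. Below is a rough sketch that sends the same prompt to two router ARNs (placeholders again, to be copied from the console) and compares latency and the invoked backend model.

```python
import time
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder router ARNs: copy the real ones from the Bedrock console.
ROUTERS = {
    "anthropic": "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/anthropic.claude:1",
    "nova": "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/amazon.nova:1",
}

PROMPT = "Is 2357 a prime number ? Explain your reasoning"

for name, arn in ROUTERS.items():
    start = time.perf_counter()
    resp = runtime.converse(
        modelId=arn,
        messages=[{"role": "user", "content": [{"text": PROMPT}]}],
    )
    latency_ms = (time.perf_counter() - start) * 1000
    # Trace field name per the Converse docs at the time of writing.
    invoked = resp.get("trace", {}).get("promptRouter", {}).get("invokedModelId")
    print(f"{name}: {latency_ms:.0f} ms, routed to {invoked}")
```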
This was a very brief and preliminary evaluation of Amazon's new Prompt Routing feature. It looks like a good start, and especially when combined with Amazon Bedrock's other inference-related and prompt management features it can make for a good solution story. One would expect Amazon to add further configuration and monitoring features, as well as support for additional LLM models, in the near future. We expect to do further testing and evaluation of this and related features.