@srampal
Last active July 2, 2025 14:57
Revisions

  1. srampal revised this gist Jul 2, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions gistfile1.md
    @@ -18,7 +18,7 @@ So, an operator can choose to set this parameter to say 10% if they are willing

    How does this work internally ?

    Although Amazon does not disclose the internal logic/ algorithm for LLM routing, it appears that it is a form of dynamic LLM prompt routing that uses a classifier LLM to do the routing possibly in combination with semantic routing. Semantic routing leveragesasic initial pre-stored vector embeddings of sample prompts in a separate vector database and uses appropriate similarity searches to pick the model that best supports the category of responses that this prompt is closest to. Classifier LLM based routing essentially trains a separate model to classify between the candidate backends by comparing their responses on a training set from the responses given by the fallback model. By using one of both of these two techqniues, Amazon's service can potentially achieve better balance between LLM routing accuracy and cost. We speculate that it is because of this combination of techniques that the routers are associated with the specific and the limited set of backend models supported (since for example this classifier LLM is trained for those specific backends). More such details and evaluation results to be analysed in followup testing.
    Although Amazon does not disclose the internal logic/ algorithm for LLM routing, it appears to be a form of dynamic LLM prompt routing that uses a classifier LLM to do the routing, possibly in combination with semantic routing. Semantic routing leverages pre-trained vector embeddings of sample prompts in a separate vector database and uses similarity searches to pick the model that best supports the category of responses that the prompt is closest to. Classifier-LLM based routing essentially trains a separate model to classify between the candidate backends by comparing their responses on a training set against the responses given by the fallback model. By using one or both of these two techniques, Amazon's service can potentially achieve a better balance between LLM routing accuracy and cost. We speculate that it is because of this combination of techniques that the routers are tied to the specific, limited set of backend models supported (since, for example, the classifier LLM is trained for those specific backends). More details and evaluation results are to be analysed in followup testing.
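
    To make the semantic-routing idea concrete, below is a minimal, purely illustrative sketch (not Amazon's implementation, which is undisclosed): incoming prompts are compared against a handful of pre-stored example prompts, each tagged with the backend that handled it well, and the prompt is routed to the backend of the most similar example. The toy embed() function, the example prompts and the model labels are all our own placeholders; a real system would use a trained embedding model and a vector database.

    ```python
    import math
    from collections import Counter

    def embed(text: str) -> Counter:
        # Toy stand-in for a real embedding model: lowercase bag-of-words counts.
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Pre-stored example prompts, each tagged with the backend that served it
    # well (hypothetical labels, for illustration only).
    EXAMPLES = [
        ("summarize this paragraph in one line", "llama-3.1-8b"),
        ("list common kubernetes pod failure causes", "llama-3.1-8b"),
        ("prove whether 2357 is a prime number step by step", "llama-3.1-70b"),
        ("derive the closed form of this recurrence relation", "llama-3.1-70b"),
    ]

    def route(prompt: str) -> str:
        """Route to the backend of the most similar stored example prompt."""
        q = embed(prompt)
        _, backend = max(EXAMPLES, key=lambda ex: cosine(q, embed(ex[0])))
        return backend

    print(route("Is 2357 a prime number ? Explain your reasoning"))  # -> llama-3.1-70b
    ```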

    Some simple initial tests/ experiments

    @@ -112,7 +112,7 @@ On the other question, interestingly the Anthropic router routed the prompt to t

    All the above examples were with the "Chat" mode of the AWS playground. There is also a "Text" mode (i.e. "Single prompt" mode). Initial tests with the single prompt mode did not change the key observations noted above (for instance the Cache tokens were still showing 0) but this should be tested further since there could be routing-logic efficiency implications for Single prompt vs multi-turn chat due to the impact on LLM context lengths. Additional tests on such aspects are to be done in future.

    There are a few additional knobs as well and handy features as well. For instance the Nedwork playground allows a side by side compare mode of any two individual models and/ or routers so that the exact same prompts can be submitted to both and the results compared for accuracy, performance and eventually cost. In our preliminary side by side testing we noted that the Anthropic router was better able to classify complex math queries to the more powerful Sonnet backend whereas the Amazon and Meta models tended to classify prompts that appeared simple and short yet required complex math reasoning to the smaller of the backend models. It may also be that the Anthropic models have been designed explicitly for such sample math questions and exhibit better results.
    There are a few additional knobs and handy features as well. For instance the Bedrock playground allows a side-by-side compare mode of any two individual models and/ or routers so that the exact same prompts can be submitted to both and the results compared for accuracy, performance and eventually cost. In our preliminary side-by-side testing we noted that the Anthropic router was better able to route complex math queries to the more powerful Sonnet backend, whereas the Amazon and Meta routers tended to route prompts that appeared simple and short, yet required complex math reasoning, to the smaller of the backend models. It may also be that the Anthropic models have been designed explicitly for such sample math questions and exhibit better results.

    This was a very brief and preliminary evaluation of Amazon's new Prompt Routing feature. It looks like a good start and, especially when combined with Amazon Bedrock's other inference-related and prompt management features, can make for a good solution story. One would expect additional configuration and monitoring features as well as support for additional LLM models to be added by Amazon in the near future. We expect to do further testing and evaluation on this and related features.

  2. srampal revised this gist Jul 1, 2025. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion gistfile1.md
    @@ -14,6 +14,7 @@ Currently prompt routers can only be used with models from the same provider fam
    | Meta | Llama 3.1 8B, Llama 3.1 70B | Llama 3.1 70B |

    Here the "Fallback model" is the reference model used internally to determine whether a response from a candidate simpler backend model is close enough to the presumably better response from the more powerful model that has been designated to be the "fallback model". There is a metric called "Response quality difference" (fixed to 0% in case of default routers and variable in case of configurable routers) that is used to determine the acceptability of a response from a candidate model.
    So, an operator can choose to set this parameter to, say, 10% if they are willing to accept up to a 10% loss in accuracy by prioritizing routing to the smaller or less powerful of the two backend models.
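
    A toy illustration of how such a tolerance could gate the routing decision is below. The quality scores are assumed to come from whatever internal response-quality predictor the router uses; Amazon has not documented the actual metric or its scale, so the numbers and function here are purely hypothetical.

    ```python
    def prefer_smaller_model(predicted_quality_small: float,
                             predicted_quality_fallback: float,
                             response_quality_difference: float = 0.10) -> bool:
        """Accept the smaller model's (predicted) response if its quality is
        within the configured tolerance of the fallback model's quality."""
        if predicted_quality_fallback <= 0:
            return True  # nothing to compare against; take the cheaper path
        drop = (predicted_quality_fallback - predicted_quality_small) / predicted_quality_fallback
        return drop <= response_quality_difference

    # With a 10% tolerance, a predicted quality of 0.93 vs 1.00 stays on the
    # smaller model, while 0.85 vs 1.00 falls back to the more powerful model.
    print(prefer_smaller_model(0.93, 1.00))  # True
    print(prefer_smaller_model(0.85, 1.00))  # False
    ```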

    How does this work internally ?

    @@ -113,5 +114,5 @@ All the above examples were with the "Chat" mode of the AWS playground. There is

    There are a few additional knobs and handy features as well. For instance the Bedrock playground allows a side-by-side compare mode of any two individual models and/ or routers so that the exact same prompts can be submitted to both and the results compared for accuracy, performance and eventually cost. In our preliminary side-by-side testing we noted that the Anthropic router was better able to route complex math queries to the more powerful Sonnet backend, whereas the Amazon and Meta routers tended to route prompts that appeared simple and short, yet required complex math reasoning, to the smaller of the backend models. It may also be that the Anthropic models have been designed explicitly for such sample math questions and exhibit better results.

    This was a very brief and preliminary evaluation of Amazon's Prompt Routing feature. It looks like a good start and specially when combined with Amazon Bedrock's other inference related and prompt management features can make for a good solution story. We expect to do further testing and evaluation on this and related features.
    This was a very brief and preliminary evaluation of Amazon's new Prompt Routing feature. It looks like a good start and, especially when combined with Amazon Bedrock's other inference-related and prompt management features, can make for a good solution story. One would expect additional configuration and monitoring features as well as support for additional LLM models to be added by Amazon in the near future. We expect to do further testing and evaluation on this and related features.

  3. srampal revised this gist Jul 1, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions gistfile1.md
    @@ -13,11 +13,11 @@ Currently prompt routers can only be used with models from the same provider fam
    | Anthropic | Claude 3 Haiku, Claude 3.5 Sonnet | Claude 3.5 Sonnet |
    | Meta | Llama 3.1 8B, Llama 3.1 70B | Llama 3.1 70B |

    Here the "Fallback model" is the reference model used internally to determine whether a response from a candidate backend model is acceptable. There is a metric called "Response quality difference" (fixed to 0% in case of default routers and variable in case of configurable routers) that is used to determine the acceptability of a response from a candidate model.
    Here the "Fallback model" is the reference model used internally to determine whether a response from a candidate simpler backend model is close enough to the presumably better response from the more powerful model that has been designated to be the "fallback model". There is a metric called "Response quality difference" (fixed to 0% in case of default routers and variable in case of configurable routers) that is used to determine the acceptability of a response from a candidate model.

    How does this work internally ?

    Although Amazon does not disclose the internal logic/ algorithm for LLM routing, it appears that it is a form of dynamic LLM prompt routing that combines Semantic routing with a classifier LLM. Semantic routing leveragesasic initial pre-stored vector embeddings of sample prompts in a separate vector database and uses appropriate similarity searches to pick the model that best supports the category of responses that this prompt is closest to. Classifier LLM based routing essentially trains a separate model to classify between the candidate backends. By combining these two techqniues, Amazon's service can potentially achieve better balance between LLM routing accuracy and cost. We speculate that it is because of this combination of techniques that the routers are closely tied to the limited set of backend models supported (since for example this classifier LLM is trained for those specific backends). More such details and evaluation results to be analysed in followup testing.
    Although Amazon does not disclose the internal logic/ algorithm for LLM routing, it appears that it is a form of dynamic LLM prompt routing that uses a classifier LLM to do the routing possibly in combination with semantic routing. Semantic routing leverages basic initial pre-stored vector embeddings of sample prompts in a separate vector database and uses appropriate similarity searches to pick the model that best supports the category of responses that this prompt is closest to. Classifier LLM based routing essentially trains a separate model to classify between the candidate backends by comparing their responses on a training set from the responses given by the fallback model. By using one or both of these two techniques, Amazon's service can potentially achieve better balance between LLM routing accuracy and cost. We speculate that it is because of this combination of techniques that the routers are associated with the specific and the limited set of backend models supported (since for example this classifier LLM is trained for those specific backends). More such details and evaluation results to be analysed in followup testing.

    Some simple initial tests/ experiments

  4. srampal revised this gist Jul 1, 2025. 1 changed file with 12 additions and 1 deletion.
    13 changes: 12 additions & 1 deletion gistfile1.md
    @@ -3,7 +3,8 @@ Amazon offers two main managed AI services in AWS: SageMaker for building and tr
    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in its early stages, having become generally available in April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve, and this note only describes the capabilities at the time of writing (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta. The basic capability is that the service automatically routes a user-provided LLM prompt to one of multiple LLM models being served in a serverless manner, with the aim of improving inference performance or response accuracy, or lowering inference costs.

    Currently prompt routers can only be used with models from the same provider family (and these can be only certain specific text models from the Amazon Nova model family, Anthropic Claude family or Meta Llama family as listed in the Table below). Also the router can only route between 2 instances of served models. These restrictions are likely due to this being a new service and also suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained. Note that the parent service itself (i.e. Bedrock) supports a much larger set of models and model families. But the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model familiy and only these 3 model families are supported. Also, this feature is only optimized for English language models. All such constraints hint at how this works internally. More on this later.
    Currently prompt routers can only be used with models from the same provider family (and these can be only certain specific text models from the Amazon Nova model family, Anthropic Claude family or Meta Llama family as listed in the Table below). Also the router can only route between 2 instances of served models. These restrictions are likely due to this being a new service and also suggest that the internal routing logic likely works well only when the router has been designed and tuned in combination with those specific backend models. Note that the parent service itself (i.e. Bedrock) supports a much larger set of models and model families. However the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model family, and only these 3 model families are supported.

    3 pre-configured default routers are currently provided, one for each of the 3 supported model families. Alternatively you can custom-configure a router but it would still only work with one of these 3 model families. When using the default routers, the following combinations are supported.

    | Router/ model Family | Available models | Fallback model |
    @@ -14,6 +15,12 @@ Currently prompt routers can only be used with models from the same provider fam

    Here the "Fallback model" is the reference model used internally to determine whether a response from a candidate backend model is acceptable. There is a metric called "Response quality difference" (fixed to 0% in case of default routers and variable in case of configurable routers) that is used to determine the acceptability of a response from a candidate model.

    How does this work internally ?

    Although Amazon does not disclose the internal logic/ algorithm for LLM routing, it appears that it is a form of dynamic LLM prompt routing that combines Semantic routing with a classifier LLM. Semantic routing leverages basic initial pre-stored vector embeddings of sample prompts in a separate vector database and uses appropriate similarity searches to pick the model that best supports the category of responses that this prompt is closest to. Classifier LLM based routing essentially trains a separate model to classify between the candidate backends. By combining these two techniques, Amazon's service can potentially achieve better balance between LLM routing accuracy and cost. We speculate that it is because of this combination of techniques that the routers are closely tied to the limited set of backend models supported (since for example this classifier LLM is trained for those specific backends). More such details and evaluation results to be analysed in followup testing.
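
    As an illustrative-only sketch of the classifier idea (again, not Amazon's method): a small classifier can be trained to predict, from the prompt alone, whether the cheaper backend's answer is likely to be acceptable, with labels that would in practice come from comparing the two backends' responses on a training set. The tiny dataset, TF-IDF features and scikit-learn pipeline below are stand-ins chosen for brevity.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical training prompts; label 1 means the smaller backend's answer
    # matched the fallback model's answer well enough, 0 means it did not.
    train_prompts = [
        "say hello in french",
        "list three primary colors",
        "prove that 2357 is prime and show every step",
        "explain the proof of fermat's little theorem",
    ]
    small_model_sufficed = [1, 1, 0, 0]

    # Train a tiny prompt classifier (a real router would presumably use a far
    # larger training set and a learned/LLM-based classifier).
    router = make_pipeline(TfidfVectorizer(), LogisticRegression())
    router.fit(train_prompts, small_model_sufficed)

    prompt = "is 2357 a prime number? explain your reasoning"
    target = "Llama 3.1 8B" if router.predict([prompt])[0] == 1 else "Llama 3.1 70B"
    print(target)  # likely routes this math prompt to the larger model
    ```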

    Some simple initial tests/ experiments

    These models and routers can easily be tested in the AWS Bedrock playground. Below are some sample outputs when using the default routers in the playground.

    Sample use of chat with the built-in default Meta prompt router
    @@ -104,3 +111,7 @@ On the other question, interestingly the Anthropic router routed the prompt to t

    All the above examples were with the "Chat" mode of the AWS playground. There is also a "Text" mode (i.e. "Single prompt" mode). Initial tests with the single prompt mode did not change the key observations noted above (for instance the Cache tokens were still showing 0) but this should be tested further since there could be routing-logic efficiency implications for Single prompt vs multi-turn chat due to the impact on LLM context lengths. Additional tests on such aspects are to be done in future.
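
    Beyond the playground, a prompt router can presumably also be exercised programmatically by passing its ARN in place of a model ID to the Bedrock Runtime Converse API, which would make scripted single-prompt tests easy. A minimal boto3 sketch is below; the router ARN and region are placeholders that would need to be looked up in your own account, and the routing trace is only printed if present.

    ```python
    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Placeholder ARN for a default prompt router (illustrative only; look up
    # the real ARN in the Bedrock console or via the prompt-router listing APIs).
    ROUTER_ARN = "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/meta.llama:1"

    response = client.converse(
        modelId=ROUTER_ARN,
        messages=[{
            "role": "user",
            "content": [{"text": "Is 2357 a prime number ? Answer only via yes or no"}],
        }],
    )

    # The routed model's reply.
    print(response["output"]["message"]["content"][0]["text"])

    # If the service returns routing metadata (e.g. which backend actually
    # served the request), it should appear in the response trace.
    print(response.get("trace", {}))
    ```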

    There are a few additional knobs and handy features as well. For instance the Bedrock playground allows a side-by-side compare mode of any two individual models and/ or routers so that the exact same prompts can be submitted to both and the results compared for accuracy, performance and eventually cost. In our preliminary side-by-side testing we noted that the Anthropic router was better able to route complex math queries to the more powerful Sonnet backend, whereas the Amazon and Meta routers tended to route prompts that appeared simple and short, yet required complex math reasoning, to the smaller of the backend models. It may also be that the Anthropic models have been designed explicitly for such sample math questions and exhibit better results.

    This was a very brief and preliminary evaluation of Amazon's Prompt Routing feature. It looks like a good start and specially when combined with Amazon Bedrock's other inference related and prompt management features can make for a good solution story. We expect to do further testing and evaluation on this and related features.

  5. srampal revised this gist Jun 30, 2025. 1 changed file with 22 additions and 6 deletions.
    28 changes: 22 additions & 6 deletions gistfile1.md
    @@ -3,8 +3,18 @@ Amazon offers two main managed AI services in AWS: SageMaker for building and tr
    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in its early stages, having become generally available in April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve, and this note only describes the capabilities at the time of writing (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta. The basic capability is that the service automatically routes a user-provided LLM prompt to one of multiple LLM models being served in a serverless manner, with the aim of improving inference performance or response accuracy, or lowering inference costs.

    Currently prompt routers can only be used with models from the same provider family (and these can be only the Amazon Nova model family, Anthropic Claude family or Meta Llama family). Also the router can only route between 2 instances of served models. These restrictions suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained. Note that the parent service itself (i.e. Bedrock) supports a much larger set of models and model families. But the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model familiy and only these 3 model families are supported. Also, this feature is only optimized for English language models. All such constraints hint at how this works internally. More on this later.
    3 pre-configured default routers are currently provided one for each of the 3 supported model families. Alternately you can custom config a router but it would still only work with one of these 3 model families.
    Currently prompt routers can only be used with models from the same provider family (and these can be only certain specific text models from the Amazon Nova model family, Anthropic Claude family or Meta Llama family as listed in the Table below). Also the router can only route between 2 instances of served models. These restrictions are likely due to this being a new service and also suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained. Note that the parent service itself (i.e. Bedrock) supports a much larger set of models and model families. But the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model family and only these 3 model families are supported. Also, this feature is only optimized for English language models. All such constraints hint at how this works internally. More on this later.
    3 pre-configured default routers are currently provided, one for each of the 3 supported model families. Alternatively you can custom-configure a router but it would still only work with one of these 3 model families. When using the default routers, the following combinations are supported.

    | Router/ model Family | Available models | Fallback model |
    |---------|-------|----------|
    | Amazon | Nova Lite, Nova Pro | Nova Pro |
    | Anthropic | Claude 3 Haiku, Claude 3.5 Sonnet | Claude 3.5 Sonnet |
    | Meta | Llama 3.1 8B, Llama 3.1 70B | Llama 3.1 70B |

    Here the "Fallback model" is the reference model used internally to determine whether a response from a candidate backend model is acceptable. There is a metric called "Response quality difference" (fixed to 0% in case of default routers and variable in case of configurable routers) that is used to determine the acceptability of a response from a candidate model.

    These models and routers can easily be tested in the AWS Bedrock playground. Below are some sample outputs when using the default routers in the playground.

    Sample use of chat with the built-in default Meta prompt router

    @@ -21,7 +31,7 @@ Sample use of chat with the built-in default Meta prompt router
    4. Application code errors or exceptions that cause the container to crash repeatedly.

    No setup was needed for the above chat. There were some nice metrics available for each chat prompt and response including data on which model provided the response and whether tokens were read from or written to cache. Based on these metrics, the first answer came from a Llama 3.1 8B Instruct model (response latency was about 400 ms) and the second from a Llama 3.1 70B Instruct model (response latency was about 1200 ms).
    When the same question was asked repeatedly, the answers to the first question flipped between Yes and No. The correct answer is Yes (2357 is in fact a prima number). This suggests that either the Temperature was set too high internally in the default configuration or that this question really needed a better model (probably a more reasoning oriented model). The router should've tried a more complex model for this question imo so there seems room for improvement there too. Llama models are apparently not the best at self-assesment and reflection as of now so will retry these sorts of chats on other models. In fact when the same question was asked with a prompt to explain the work, the router again picked the same (small, sub-optimal and clearly not reasoning) model and that resulted in an incorrect explanation as shown below.
    When the same question was asked repeatedly, the answers to the first question flipped between Yes and No. The correct answer is Yes (2357 is in fact a prime number). This suggests that either the Temperature was set too high internally in the default configuration or that this question really needed a better model (probably a more reasoning oriented model). The router should ideally have tried the more complex model for this question imo, so there seems to be room for improvement there too. Llama models are apparently not the best at self-assessment and reflection as of now, so we will retry these sorts of chats on other models. In fact when the same question was asked with a prompt to explain the work, the router again picked the same (small, sub-optimal and clearly not reasoning) model and that resulted in an incorrect explanation as shown below.

    User: Is 2357 a prime number ? Explain your reasoning
    Model: 2357 is not a prime number. It can be divided by 7 and 337, so it is a composite number.
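
    As a quick sanity check (independent of any model), standard trial division up to the square root confirms the model's claim above is wrong: 7 × 337 = 2359, not 2357, and 2357 has no divisors other than 1 and itself.

    ```python
    def is_prime(n: int) -> bool:
        """Trial division up to sqrt(n)."""
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    print(is_prime(2357))  # True
    print(7 * 337)         # 2359, so the model's claimed factorization is wrong
    ```
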
    @@ -52,12 +62,15 @@ Trying similar questions with the default Amazon Nova prompt router. The short v

    (the second answer above was from the Nova Pro model and the answer as well as reasoning were correct)

    Even with the Nova router however the metrics did not show any Cache tokens being read or written even when the exact same pronmpts were repeatedly provided. However that could be simply a bug or missing capability in this relatively new and immature feature.
    As with the Meta/ Llama router, the Nova router's metrics did not show any Cache tokens being read or written even when the exact same prompts were repeatedly provided. That could simply be a bug or missing capability in this relatively new and immature feature.

    With the default Anthropic router, the combination of the routing function and the quality of the models seemed to provide better results. For the math question, the routing function sent it to the bigger model right from the start (instead of sending to the simpler model as in the case of the Meta and Nova routers) and this model did reason and answer correctly even at the cost fo the highest latency of response compared to the other two default routers. Of course cant generalize about quality based on just a few prompts and responses. Will run proper eval benchmarks on these later.
    With the default Anthropic router, the combination of the routing function and the quality of the models seemed to provide better results. For the math question, the routing function sent it to the bigger model right from the start (instead of sending it to the simpler model as in the case of the Meta and Nova routers) and this model did reason and answer correctly, even at the cost of the highest response latency compared to the other two default routers. Of course we can't generalize about quality based on just a few prompts and responses. We will run proper eval benchmarks on these later.

    Is 2357 a prime number ? Answer only via yes or no
    Yes, 2357 is a prime number. Here's the explanation:
    Yes

    Is 2357 a prime number ? Explain your reasoning
    2357 is a prime number. Here's the explanation:

    First, let's recall the definition of a prime number: A prime number is a natural number greater than 1 that is only
    divisible by 1 and itself.
    @@ -79,6 +92,7 @@ With the default Anthropic router, the combination of the routing function and t

    Therefore, 2357 is a prime number.

    For
    On the other question, interestingly the Anthropic router routed the prompt to the smaller model unlike in the case of the Meta and Amazon Nova routers.

    List the 4 most common causes of a kubernetes pod being stuck in the crashloopbackoff. Only answer in 1 line per cause
    @@ -88,3 +102,5 @@ On the other question, interestingly the Anthropic router routed the prompt to t
    3. Resource constraints (CPU, memory) exceeded.
    4. Dependency issues (e.g., unable to connect to a database).

    All the above examples were with the "Chat" mode of the AWS playground. There is also a "Text" mode (i.e. "Single prompt" mode). Initial tests with the single prompt mode did not change the key observations noted above (for instance the Cache tokens were still showing 0) but this should be tested further since there could be routing-logic efficiency implications for Single prompt vs multi-turn chat due to the impact on LLM context lengths. Additional tests on such aspects are to be done in future.

  6. srampal revised this gist Jun 30, 2025. No changes.
  7. srampal revised this gist Jun 30, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions gistfile1.md
    @@ -1,7 +1,7 @@
    Amazon offers two main managed AI services in AWS: SageMaker for building and training custom models, and Bedrock for easy access to foundation models via a API and a serverless model. This doc is only focussed on AWS BedRock and in particular on the Prompt Routing feature. We will cover other features or aspects in other docs.
    Amazon offers two main managed AI services in AWS: SageMaker for building and training custom models, and Bedrock for easy access to foundation models via an API and a serverless model. This doc is only focussed on AWS Bedrock and in particular on the Prompt Routing feature. We will cover other related service features in other docs.

    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in early stages .. becoming generally available since April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve and this note only describes the capabilities at the time of writing this note (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta.
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta. The basic capability here is that the service will automatically route a user provided LLM prompt to one of multiple LLM models being served in a serverless manner with an aim of either improving inference performance or accuracy of response or lowering inference costs.

    Currently prompt routers can only be used with models from the same provider family (and these can be only the Amazon Nova model family, Anthropic Claude family or Meta Llama family). Also the router can only route between 2 instances of served models. These restrictions suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained. Note that the parent service itself (i.e. Bedrock) supports a much larger set of models and model families. But the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model familiy and only these 3 model families are supported. Also, this feature is only optimized for English language models. All such constraints hint at how this works internally. More on this later.
    3 pre-configured default routers are currently provided one for each of the 3 supported model families. Alternately you can custom config a router but it would still only work with one of these 3 model families.
  8. srampal revised this gist Jun 30, 2025. 1 changed file with 1 addition and 2 deletions.
    3 changes: 1 addition & 2 deletions gistfile1.md
    @@ -1,5 +1,4 @@
    Amazon offers two main managed AI services in AWS: SageMaker for building and training custom models, and Bedrock for easy access to foundation models via API.
    This doc is only focussed on AWS BedRock and in particular on the Prompt Routing feature. We will cover other features or aspects in other docs.
    Amazon offers two main managed AI services in AWS: SageMaker for building and training custom models, and Bedrock for easy access to foundation models via a API and a serverless model. This doc is only focussed on AWS BedRock and in particular on the Prompt Routing feature. We will cover other features or aspects in other docs.

    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in early stages .. becoming generally available since April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve and this note only describes the capabilities at the time of writing this note (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta.
  9. srampal revised this gist Jun 30, 2025. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion gistfile1.md
    @@ -3,10 +3,12 @@ This doc is only focussed on AWS BedRock and in particular on the Prompt Routing

    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in early stages .. becoming generally available since April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve and this note only describes the capabilities at the time of writing this note (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta.
    Currently prompt routers can only be used with models from the same provider family (and these can be only the Amazon Nova model family, Anthropic Claude family or Meta Llama family). Also the router can only route between 2 instances of served models. These restrictions suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained. Note that the parent service which is Bedrock supports a much larger set of models and model families. But the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model familiy and only these 3 model families are supported. Also, this feature is only optimized for English language models. All such constraints hint at how this works internally. More on this later.

    Currently prompt routers can only be used with models from the same provider family (and these can be only the Amazon Nova model family, Anthropic Claude family or Meta Llama family). Also the router can only route between 2 instances of served models. These restrictions suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained. Note that the parent service itself (i.e. Bedrock) supports a much larger set of models and model families. But the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model familiy and only these 3 model families are supported. Also, this feature is only optimized for English language models. All such constraints hint at how this works internally. More on this later.
    3 pre-configured default routers are currently provided one for each of the 3 supported model families. Alternately you can custom config a router but it would still only work with one of these 3 model families.

    Sample use of chat with the built-in default Meta prompt router

    User: Is 2357 a prime number ? Answer only via yes or no
    Model:
    Yes
  10. srampal revised this gist Jun 30, 2025. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions gistfile1.md
    @@ -1,9 +1,11 @@
    Amazon offers two main managed AI services in AWS: SageMaker for building and training custom models, and Bedrock for easy access to foundation models via API.
    This doc is only focussed on AWS BedRock and in particular on the Prompt Routing feature. We will cover other features or aspects in other docs.

    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in early stages .. becoming generally available since April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve and this note only describes the capabilities at the time of writing this note (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta.
    Currently prompt routers can only be used with models from the same provider family (and these can be only the Amazon Nova model family, Anthropic Claude family or Meta Llama family). Also the router can only route between 2 instances of served models. These restrictions suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained. Note that the parent service which is Bedrock supports a much larger set of models and model families. But the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model familiy and only these 3 model families are supported. Also, this feature is only optimized for English language models. All such constraints hint at how this works internally. More on this later.
    3 pre-configured default routers are currently provided one for each of the 3 supported model families. Alternately you can custom config a router but it would still only work with one of these 3 model families.

    Sample use of chat with the built-in default Meta prompt router
    User: Is 2357 a prime number ? Answer only via yes or no
    Model:
  11. srampal renamed this gist Jun 30, 2025. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  12. srampal revised this gist Jun 30, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions gistfile1.txt
    @@ -2,8 +2,8 @@ Amazon offers two main managed AI services in AWS: SageMaker for building and tr
    This doc is only focussed on AWS BedRock and in particular on the Prompt Routing feature. We will cover other features or aspects in other docs.
    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in early stages .. becoming generally available since April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve and this note only describes the capabilities at the time of writing this note (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta.
    Currently prompt routers can only be used with models from the same provider family (and these can be only the Amazon Nova model family, Anthropic Claude family or Meta Llama family). Also the router can only route between 2 instances of served models. These restrictions suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained
    3 pre-configured default routers are currently provided one for each of the 3 supported model families. Alternately you can custom config a router.
    Currently prompt routers can only be used with models from the same provider family (and these can be only the Amazon Nova model family, Anthropic Claude family or Meta Llama family). Also the router can only route between 2 instances of served models. These restrictions suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained. Note that the parent service which is Bedrock supports a much larger set of models and model families. But the Intelligent Prompt Routing feature is only supported for up to 2 model instances where both instances are from the same model familiy and only these 3 model families are supported. Also, this feature is only optimized for English language models. All such constraints hint at how this works internally. More on this later.
    3 pre-configured default routers are currently provided one for each of the 3 supported model families. Alternately you can custom config a router but it would still only work with one of these 3 model families.
    Sample use of chat with the built-in default Meta prompt router
    User: Is 2357 a prime number ? Answer only via yes or no
    Model:
  13. srampal revised this gist Jun 28, 2025. 1 changed file with 67 additions and 9 deletions.
    76 changes: 67 additions & 9 deletions gistfile1.txt
    @@ -1,12 +1,9 @@
    AWS offers two main managed AI services: SageMaker for building and training custom models, and Bedrock for easy access to foundation models via API.
    Bedrock is the easy-to-use, serverless, pay-as-you-go service, typically used for Generative AI inferencing only.
    Sagemaker is for the advanced user that wants to also do model training and needs more control in setting up their services or applications.
    This doc is only focussed on AWS BedRock and in particular on the Prompt Routing feature. Will cover other aspects in other docs.
    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in early stages. These features are likely to evolve and this note only describes the capabilities at the time of writing this note (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic, Meta etc (there are about 20+ model providers)
    or with models purchased from the AWS Models marketplace (which has 100s of options).
    Currently prompt routers can only be used with models from the same provider family (e.g. all Llama models or all Anthropic/ Claude models etc). #doublecheckthis
    3 pre-configured default routers are currently provided (for Amazon's own Nova family of models, Anthropic models and Meta models). Alternately you can custom config a router.
    Amazon offers two main managed AI services in AWS: SageMaker for building and training custom models, and Bedrock for easy access to foundation models via API.
    This doc is only focussed on AWS BedRock and in particular on the Prompt Routing feature. We will cover other features or aspects in other docs.
    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in early stages .. becoming generally available since April 2025. The capabilities of both Bedrock and its Prompt Routing feature are likely to evolve and this note only describes the capabilities at the time of writing this note (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic and Meta.
    Currently prompt routers can only be used with models from the same provider family (and these can be only the Amazon Nova model family, Anthropic Claude family or Meta Llama family). Also the router can only route between 2 instances of served models. These restrictions suggest that the internal response evaluation and fallback logic likely works well only when the backends have been similarly trained
    3 pre-configured default routers are currently provided one for each of the 3 supported model families. Alternately you can custom config a router.
    Sample use of chat with the built-in default Meta prompt router
    User: Is 2357 a prime number ? Answer only via yes or no
    Model:
    @@ -26,4 +23,65 @@ When the same question was asked repeatedly, the answers to the first question f
    User: Is 2357 a prime number ? Explain your reasoning
    Model: 2357 is not a prime number. It can be divided by 7 and 337, so it is a composite number.

    Trying similar questions with the default Amazon Nova prompt router. The short version of the math question was again incorrectly routed and answered; however, when prompted for an explanation, the prompt was routed to a more comprehensive model and the correct answer and explanation were provided as shown below. This suggests better semantic routing and potentially better self-assessment with the Nova router and Nova models.

    User: Is 2357 a prime number ? Answer only via yes or no
    Model: No.
    (the above query was routed to Nova lite and the answer was incorrect)

    User: is 2357 a prime number ? explain your work
    Model: To determine if 2357 is a prime number, we need to check if it has any divisors other than 1 and itself.

    1. **Check for divisibility by small prime numbers:**
    - **2:** 2357 is odd, so it's not divisible by 2.
    - **3:** Sum of digits of 2357 is 2 + 3 + 5 + 7 = 17. Since 17 is not divisible by 3, 2357 is not divisible by 3.
    - **5:** 2357 does not end in 0 or 5, so it's not divisible by 5.
    - **7:** Perform the division: 2357 ÷ 7 ≈ 336.714, which is not an integer.
    - **11:** Alternating sum of digits is 2 - 3 + 5 - 7 = -3. Since -3 is not divisible by 11, 2357 is not divisible by 11.
    - **13:** Perform the division: 2357 ÷ 13 ≈ 181.308, which is not an integer.
    - **17:** Perform the division: 2357 ÷ 17 ≈ 138.647, which is not an integer.
    - **19:** Perform the division: 2357 ÷ 19 ≈ 124.053, which is not an integer.
    - **23:** Perform the division: 2357 ÷ 23 ≈ 102.478, which is not an integer.
    - **29:** Perform the division: 2357 ÷ 29 ≈ 81.276, which is not an integer.
    - **31:** Perform the division: 2357 ÷ 31 ≈ 76.032, which is not an integer.

    Since 2357 is not divisible by any prime number up to its square root (approximately 48.5), it is a prime number.

    (the second answer above was from the Nova Pro model and the answer as well as reasoning were correct)

    Even with the Nova router however the metrics did not show any Cache tokens being read or written even when the exact same pronmpts were repeatedly provided. However that could be simply a bug or missing capability in this relatively new and immature feature.

    With the default Anthropic router, the combination of the routing function and the quality of the models seemed to provide better results. For the math question, the routing function sent it to the bigger model right from the start (instead of sending to the simpler model as in the case of the Meta and Nova routers) and this model did reason and answer correctly even at the cost fo the highest latency of response compared to the other two default routers. Of course cant generalize about quality based on just a few prompts and responses. Will run proper eval benchmarks on these later.

    Is 2357 a prime number ? Answer only via yes or no
    Yes, 2357 is a prime number. Here's the explanation:

    First, let's recall the definition of a prime number: A prime number is a natural number greater than 1 that is only
    divisible by 1 and itself.

    To determine if 2357 is prime, we need to check if it has any divisors other than 1 and itself.

    We only need to check up to the square root of 2357, which is approximately 48.55. This is because if a number has a
    divisor larger than its square root, it must also have a corresponding divisor smaller than its square root.

    We can check for divisibility by all prime numbers up to 48: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47

    After checking, we find that 2357 is not divisible by any of these numbers:

    2357 ÷ 2 = 1178 remainder 1
    2357 ÷ 3 = 785 remainder 2
    2357 ÷ 5 = 471 remainder 2
    2357 ÷ 7 = 336 remainder 5 ...and so on for the rest of the primes up to 47.
    Since 2357 is not divisible by any prime number up to its square root, it is indeed a prime number.

    Therefore, 2357 is a prime number.

    On the other question, interestingly the Anthropic router routed the prompt to the smaller model unlike in the case of the Meta and Amazon Nova routers.

    List the 4 most common causes of a kubernetes pod being stuck in the crashloopbackoff. Only answer in 1 line per cause

    1. Incorrect container command or arguments.
    2. Missing or incorrect environment variables.
    3. Resource constraints (CPU, memory) exceeded.
    4. Dependency issues (e.g., unable to connect to a database).

  14. srampal revised this gist Jun 27, 2025. 1 changed file with 28 additions and 6 deletions.
    34 changes: 28 additions & 6 deletions gistfile1.txt
    @@ -1,7 +1,29 @@
    AWS has two core categories of managed AI services: AWS Sagemaker and AWS Bedrock.
    Bedrock is the easy-to-use, serverless, pay-as-you-go service.
    Sagemaker is for the advanced user that wants more control in setting up their services or applications.
    Intelligent Prompt Routing is a new feature added to the AWS Bedrock service and is still in early stages.
    You can use a built-in set of foundation models from AWS, Anthropic, Meta etc (there are about 20+ model providers)
    or you can purchase models from the AWS Models marketplace (which has 100s of options)
    AWS offers two main managed AI services: SageMaker for building and training custom models, and Bedrock for easy access to foundation models via API.
    Bedrock is the easy-to-use, serverless, pay-as-you-go service, typically used for Generative AI inferencing only.
    Sagemaker is for the advanced user that wants to also do model training and needs more control in setting up their services or applications.
    This doc is only focussed on AWS BedRock and in particular on the Prompt Routing feature. Will cover other aspects in other docs.
    Prompt Routing (sometimes marketed as Intelligent Prompt Routing) is a new feature added to the AWS Bedrock service and is still in early stages. These features are likely to evolve and this note only describes the capabilities at the time of writing this note (June 2025).
    You can use the Prompt Routing feature with a built-in set of foundation models from AWS, Anthropic, Meta etc (there are about 20+ model providers)
    or with models purchased from the AWS Models marketplace (which has 100s of options).
    Currently prompt routers can only be used with models from the same provider family (e.g. all Llama models or all Anthropic/ Claude models etc). #doublecheckthis
    3 pre-configured default routers are currently provided (for Amazon's own Nova family of models, Anthropic models and Meta models). Alternately you can custom config a router.
    Sample use of chat with the built-in default Meta prompt router
    User: Is 2357 a prime number ? Answer only via yes or no
    Model:
    Yes
    User: List the 4 most common causes of a kubernetes pod being stuck in the crashloopbackoff. Only answer in 1 line per cause
    Model:
    Here are the 4 most common causes of a Kubernetes pod being stuck in CrashLoopBackOff:

    1. Insufficient resources (e.g., CPU, memory) allocated to the pod.
    2. Incorrect or missing environment variables or configuration settings.
    3. Container image issues, such as a corrupted or invalid image.
    4. Application code errors or exceptions that cause the container to crash repeatedly.

    No setup was needed for the above chat. There were some nice metrics available for each chat prompt and response including data on which model provided the response and whether tokens were read from or written to cache. Based on these metrics, the first answer came from a LLama 3.1 8B Instruct model (response latency was about 400 ms) and the second from a Llama 3.1 70B Instruct model (response latency was about 1200 ms).
    When the same question was asked repeatedly, the answers to the first question flipped between Yes and No. The correct answer is Yes (2357 is in fact a prima number). This suggests that either the Temperature was set too high internally in the default configuration or that this question really needed a better model (probably a more reasoning oriented model). The router should've tried a more complex model for this question imo so there seems room for improvement there too. Llama models are apparently not the best at self-assesment and reflection as of now so will retry these sorts of chats on other models. In fact when the same question was asked with a prompt to explain the work, the router again picked the same (small, sub-optimal and clearly not reasoning) model and that resulted in an incorrect explanation as shown below.

    User: Is 2357 a prime number ? Explain your reasoning
    Model: 2357 is not a prime number. It can be divided by 7 and 337, so it is a composite number.


  15. srampal created this gist Jun 27, 2025.
    7 changes: 7 additions & 0 deletions gistfile1.txt
    @@ -0,0 +1,7 @@
    AWS has two core categories of managed AI services: AWS Sagemaker and AWS Bedrock.
    Bedrock is the easy-to-use, serverless, pay-as-you-go service.
    Sagemaker is for the advanced user that wants more control in setting up their services or applications.
    Intelligent Prompt Routing is a new feature added to the AWS Bedrock service and is still in early stages.
    You can use a built-in set of foundation models from AWS, Anthropic, Meta etc (there are about 20+ model providers)
    or you can purchase models from the AWS Models marketplace (which has 100s of options)