A is correct: Model router is a deployable AI chat model that analyzes each prompt in real time, weighing its complexity, reasoning requirements, and task type. It routes simpler requests to smaller, cheaper models and complex tasks to larger or reasoning models, reducing cost while maintaining quality within a single deployment. Reference: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-router
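The routing idea behind option A can be illustrated with a toy Python sketch. Model router's actual selection logic is a trained model whose internals are not exposed. The tier names, the keyword list, and the 50-word threshold below are hypothetical stand-ins, chosen only to show the shape of complexity-based routing: send each prompt to the cheapest tier that seems sufficient.

```python
# Hypothetical tier names; the real model router selects among its own underlying models.
SMALL = "small-model"
LARGE = "large-model"
REASONING = "reasoning-model"

# Hypothetical signals standing in for the router's trained classifier.
REASONING_HINTS = ("prove", "derive", "step by step")
LONG_PROMPT_WORDS = 50  # hypothetical threshold for treating a prompt as complex


def route(prompt: str) -> str:
    """Toy complexity-based routing: pick the cheapest tier that seems sufficient."""
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return REASONING  # prompt asks for multi-step reasoning
    if len(text.split()) > LONG_PROMPT_WORDS:
        return LARGE  # long or detailed prompt
    return SMALL  # short, simple request


if __name__ == "__main__":
    for prompt in ("What is the capital of France?", "Prove that sqrt(2) is irrational."):
        print(f"{prompt!r} -> {route(prompt)}")
```

Here the first prompt goes to `small-model` and the second to `reasoning-model`. With model router, the application does not maintain any of this logic; it sends every request to the single model router deployment.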
B is incorrect: Although technically feasible, building a custom classification pipeline adds development complexity and maintenance overhead, and the extra classification step introduces additional latency and cost on every request. Model router provides this intelligent routing as a managed, single-deployment solution. Reference: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/model-router
C is incorrect: Round-robin load balancing distributes requests in rotation without analyzing prompt complexity, so simple requests are sent to expensive models as often as to cheaper ones. This approach does not optimize cost based on task requirements, unlike model router, which matches each prompt to a suitable model. Reference: https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities
D is incorrect: Using only the largest model means paying large-model per-token prices even for simple tasks that smaller models could handle equally well. Prompt caching reduces cost only for requests that share repeated prompt prefixes; it does not address the underlying inefficiency of applying more model capability than simple requests need. Reference: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-router
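The cost argument behind options C and D can be made concrete with a small Python simulation. Everything in it is hypothetical: the per-request costs are arbitrary units rather than Azure prices, the workload is an assumed mix in which every fifth request is complex, and the `matched` strategy is an idealized complexity-aware router that classifies perfectly. Prompt caching is not modeled.

```python
# Hypothetical per-request costs in arbitrary units; not real Azure prices.
COST = {"small": 1, "large": 10}


def workload(n: int = 1000) -> list[str]:
    """Hypothetical mix: every fifth request is complex, the rest are simple."""
    return ["complex" if i % 5 == 4 else "simple" for i in range(n)]


def largest_only(i: int, kind: str) -> str:
    # Option D: every request goes to the largest model.
    return "large"


def round_robin(i: int, kind: str) -> str:
    # Option C: alternate between models regardless of what the prompt needs.
    return ("small", "large")[i % 2]


def matched(i: int, kind: str) -> str:
    # Idealized complexity-aware routing (assumes perfect classification).
    return "large" if kind == "complex" else "small"


def evaluate(strategy, requests: list[str]) -> tuple[int, int]:
    """Return (total cost, number of complex requests sent to the small model)."""
    cost = 0
    misses = 0
    for i, kind in enumerate(requests):
        model = strategy(i, kind)
        cost += COST[model]
        if kind == "complex" and model == "small":
            misses += 1
    return cost, misses


if __name__ == "__main__":
    requests = workload()
    for strategy in (largest_only, round_robin, matched):
        print(strategy.__name__, evaluate(strategy, requests))
```

Under these assumptions, `largest_only` costs 10000 units, `round_robin` costs 5500 units while sending 100 complex requests to the small model, and `matched` costs 2800 units with no mismatches. A real router will not classify perfectly, but the comparison shows why matching prompts to models, rather than rotating or always using the largest model, is what drives the savings.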