Microsoft has updated its Azure AI Foundry portal and the Azure OpenAI Service APIs and SDKs to support Direct Preference Optimization (DPO) for GPT-4.1 and GPT-4.1-mini. DPO is a fine-tuning technique that adjusts model weights based on human preferences, using pairs of preferred and non-preferred responses.
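To illustrate, here is a minimal sketch of what a single DPO preference pair might look like as a line of JSONL training data. The prompt, the two responses, and the field names are illustrative assumptions and should be checked against the current Azure OpenAI fine-tuning documentation before use.

```python
import json

# One DPO training example: a prompt plus a preferred and a non-preferred
# completion. Field names follow the preference fine-tuning JSONL layout;
# verify them against the current Azure OpenAI docs before uploading.
example = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize our return policy in one sentence."}
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "You can return any item within 30 days for a full refund."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Returns are sometimes possible, it depends."}
    ],
}

# A training file is simply one JSON object like this per line.
with open("dpo_training_data.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```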
One of the main benefits of DPO over Reinforcement Learning from Human Feedback (RLHF) is that it’s computationally lighter and faster while remaining just as effective for model alignment. Organizations can use this method to train models to match their specific brand voice, safety requirements, or conversational styles.
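As a rough sketch of how that might look in practice, the snippet below uploads a preference file and submits a DPO fine-tuning job using the openai Python SDK; the endpoint, API version, base model name, and beta value are placeholders, not values from the announcement.

```python
from openai import AzureOpenAI

# Endpoint, key, and API version are placeholders for your own Azure OpenAI
# resource; use a version that supports DPO fine-tuning.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR_API_KEY",
    api_version="2025-02-01-preview",
)

# Upload the preference-pair file created earlier.
training_file = client.files.create(
    file=open("dpo_training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Submit a fine-tuning job that uses the DPO method; `beta` trades off
# fitting the preferences against staying close to the base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4.1",  # placeholder base model name; check your region's catalog
    method={
        "type": "dpo",
        "dpo": {"hyperparameters": {"beta": 0.1}},
    },
)

print(job.id, job.status)
```

Once the job completes, the resulting fine-tuned model is deployed like any other Azure OpenAI deployment before it can be called.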
In addition to DPO support, Microsoft has expanded Azure AI’s Global Training to 12 new regions, including East US, West Europe, UK South, and Switzerland North. Despite the expansion, the capability remains in public preview.
Microsoft said users should watch for new features coming soon, including pause/resume functionality and continuous fine-tuning, and that it will also bring GPT-4.1-nano to these new regions.
The expansion of Global Training is important for data sovereignty, an increasingly pressing issue as the European Union pushes for Europeans’ data to be handled within Europe to ensure better privacy.
Finally, Microsoft has released the new Responses API, which supports fine-tuned models and makes it easier for developers to use them inside other applications. Microsoft said this API is ideal for agentic workflows as “it supports stateful, multi-turn conversations and allows seamless tool calling, automatically stitching everything together in the background.”
The Responses API also keeps track of conversations so the model remembers context across turns, exposes how models reason through their answers, lets users check progress while a response is being generated, supports background processing, and works with tools such as web search and file lookup.
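As a rough illustration of that workflow, the sketch below runs a stateful, two-turn exchange through the Responses API with the openai Python SDK; the endpoint, API version, and the fine-tuned deployment name gpt-4.1-dpo-support are hypothetical placeholders.

```python
from openai import AzureOpenAI

# Endpoint, key, API version, and deployment name are placeholders;
# substitute the values from your own Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR_API_KEY",
    api_version="2025-03-01-preview",
)

# First turn: call a (hypothetical) fine-tuned deployment.
first = client.responses.create(
    model="gpt-4.1-dpo-support",
    input="Draft a friendly reply to a customer asking about shipping times.",
)
print(first.output_text)

# Second turn: pass the previous response's ID so the service carries the
# conversation state forward instead of the client resending the history.
follow_up = client.responses.create(
    model="gpt-4.1-dpo-support",
    previous_response_id=first.id,
    input="Now shorten that reply to two sentences.",
)
print(follow_up.output_text)
```

Passing previous_response_id is what lets the service stitch the turns together on its side, so the application does not have to resend the full message history with every request.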