Microsoft’s Azure hosted OpenAI language models

Jason Li
Sr. Software Development Engineer
Skilled Angular and .NET developer, team leader for a healthcare insurance company.
February 08, 2023

With technologies like Azure's Cognitive Services offering API-based access to pre-trained models, modern machine learning and AI research have quickly moved from the lab to our IDEs. There are numerous approaches to deploying AI services.

Microsoft and OpenAI

The GPT (generative pre-trained transformer) approach was invented by the OpenAI research team, which released the first paper on the subject in 2018. The model has gone through several iterations, starting with the unsupervised GPT-2, which learns from untagged data to mimic human writing. GPT-2 was trained on 40GB of publicly available internet content and needed extensive training to produce a model with 1.5 billion parameters. GPT-3, a significantly bigger model with 175 billion parameters, came next. GPT-3, which is exclusively licensed to Microsoft, is the foundation for products like the image-generating DALL-E and the code-focused Codex used by GitHub Copilot.

Microsoft has built its own Nvidia-based supercomputer servers for Azure, and its cloud instances are listed among the TOP500 supercomputers. Azure's AI servers use Nvidia Ampere A100 Tensor Core GPUs connected by a fast InfiniBand network.

To develop successful machine learning and deep learning models, you need a lot of data, a way to clean the data and do feature engineering on it, and a way to train models on your data quickly. Next, you need a method for deploying your models, tracking any drift over time, and retraining them as necessary.

If you have invested in compute resources and accelerators like GPUs, you can do all of that on-site. However, even if your resources are sufficient, you may find that they sit idle much of the time. Running the entire pipeline in the cloud, requesting large amounts of compute and accelerators when you need them and then releasing them, can sometimes be more cost-effective.

The major cloud providers, as well as a few smaller clouds, have worked hard to develop their machine-learning platforms so that they can support the entire machine-learning lifecycle, from project planning to model maintenance.

Azure's OpenAI integration

The generative AI tools developed by OpenAI were built and trained on Azure servers. A long-term agreement between OpenAI and Microsoft has now made OpenAI's tools available through Azure, with APIs tailored for Azure and billing integrated with Azure.

This does not mean that anyone can create an application that uses GPT-3. Microsoft continues to restrict access to ensure that projects adhere to its ethical AI usage principles and are narrowly focused on particular use cases. Access to Azure OpenAI is also limited to direct Microsoft customers.

These policies are likely to remain strict, and some industries, like healthcare, will probably need enhanced security to comply with legal obligations. Microsoft's own experiments with AI language models have taught it a lesson it does not want to repeat.

Learning about Azure OpenAI Studio

Once your account has been approved to use Azure OpenAI, you can begin writing code that uses its API endpoints. First create an Azure OpenAI resource, for example in the Azure portal, then give the resource a name and choose the pricing tier. There is currently only one pricing option, but this may change as Microsoft introduces new service tiers.

After setting up a resource, you can use Azure OpenAI Studio to deploy a model. The majority of your work with OpenAI will be done here. You can currently pick among models in the GPT-3 family, including the code-based Codex. Other models use embeddings, which are rich semantic representations optimized for search.

Within each family, model names indicate both price and capability. In the GPT-3 family, Ada is the least expensive and least capable option, while Davinci is the most advanced. Each model is a superset of the one before, so as tasks become more complex you can simply choose a different model. It's interesting to note that Microsoft advises beginning with the most capable model when developing an OpenAI-powered application, because doing so lets you tune the trade-off between the underlying model's performance and its price before going to production.

Customizing a model while working

Although GPT-3's text completion features have gained widespread popularity, your application needs to be considerably more tailored to your particular use case. You don't want GPT-3 powering a support service that frequently offers irrelevant guidance. Instead, you create a custom model using training examples that pair inputs with the intended outputs, which Azure OpenAI refers to as "completions." Microsoft stresses the importance of having a sizable quantity of training data and advises using several hundred examples. To simplify handling your training data, you can combine all of your prompts and completions into a single JSON file.
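As a rough illustration of what such a file might look like, the sketch below writes prompt and completion pairs in JSON Lines form (one JSON object per line). This assumes the prompt/completion layout used by OpenAI's fine-tuning tooling for GPT-3 models. The example data, the " ->" separator, and the to_jsonl helper are hypothetical.

```python
import json

# Hypothetical help-desk examples; a real training set needs several hundred.
examples = [
    {"prompt": "My laptop will not power on ->", "completion": " hardware\n"},
    {"prompt": "I forgot my portal password ->", "completion": " account\n"},
]


def to_jsonl(records):
    """Serialize prompt/completion pairs as one JSON object per line."""
    lines = []
    for record in records:
        # Reject incomplete pairs early so the file stays consistent.
        if not record.get("prompt") or not record.get("completion"):
            raise ValueError(f"incomplete training example: {record!r}")
        lines.append(json.dumps({
            "prompt": record["prompt"],
            "completion": record["completion"],
        }))
    return "\n".join(lines) + "\n"


training_file = to_jsonl(examples)
print(training_file)
```

Keeping the serialization in one small function makes it easy to validate every example before uploading the file.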

Once you have a customized model in place, you can use Azure OpenAI Studio to test how GPT-3 will behave in your scenario. A basic playground lets you explore how the model reacts to particular prompts, using a simple console app where you enter a prompt and get back an OpenAI completion. Microsoft describes creating an effective prompt as "show, don't tell," meaning prompts should be as explicit as possible to get the best results. The playground also helps you train your model: if you're developing a classifier, you can offer a list of text and expected outputs before giving inputs and a trigger to get a response.
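The classifier pattern described here (labeled examples first, then the new input and a trigger) can be sketched as a small prompt builder. The "Text:" and "Category:" labels, the sample tickets, and the build_classifier_prompt helper are illustrative assumptions, not a format required by Azure OpenAI.

```python
def build_classifier_prompt(examples, new_text):
    """Show labeled examples first, then the new input and a trigger."""
    parts = []
    for text, label in examples:
        parts.append(f"Text: {text}\nCategory: {label}")
    # The trailing "Category:" is the trigger that asks the model to answer.
    parts.append(f"Text: {new_text}\nCategory:")
    return "\n\n".join(parts)


# Hypothetical help-desk tickets with their expected categories.
examples = [
    ("The VPN drops every ten minutes", "network"),
    ("Please reset my password", "account"),
]

prompt = build_classifier_prompt(examples, "My monitor stays black")
print(prompt)
```

The resulting string can be pasted into the playground or sent as the prompt of a completion request.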

A useful feature of the playground is the ability to set an intent and expected behavior up front. If you're using OpenAI to power a help desk triage tool, for example, you can establish the expectation that the output will be courteous and calm, so that it won't mimic an angry user. The same tools work with the Codex model, letting you see how it behaves as a general assistant or as a tool for code completion.

Writing code to use Azure OpenAI

When you're ready to begin writing code, you can call your deployment's REST endpoints directly or use the OpenAI Python library. The latter is probably the quickest way to get working code. You will need the endpoint URL, an authentication key, and the deployment name. Once you have these, set the appropriate environment variables for your code. Avoid hard-coding keys in production; manage them with a service like Azure Key Vault instead.
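A minimal sketch of reading those three values from the environment is shown below. The variable names (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY, AZURE_OPENAI_DEPLOYMENT) and the load_settings helper are hypothetical; use whatever names your own deployment process defines.

```python
import os

# Hypothetical variable names; substitute the ones your environment uses.
REQUIRED = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)


def load_settings(env=os.environ):
    """Collect connection settings, failing fast if any are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {
        # Drop any trailing slash so URLs can be joined predictably.
        "endpoint": env["AZURE_OPENAI_ENDPOINT"].rstrip("/"),
        "api_key": env["AZURE_OPENAI_KEY"],
        "deployment": env["AZURE_OPENAI_DEPLOYMENT"],
    }
```

Failing at startup when a value is missing tends to be easier to diagnose than an authentication error on the first request.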

Calling an endpoint is simple enough. To obtain a response, you call the openai.Completion.create method and set the maximum number of tokens, which caps the length of the reply (the prompt and reply together must fit within the model's context length). The response object returned by the API contains the text produced by your model, which the rest of your code can use after extracting and formatting it. The basic calls are straightforward, and your code can shape the answer using additional parameters. These control how the model samples its results and how creative it is. You can use these parameters to help keep responses precise and accurate.
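The call described above might look roughly like the following sketch. It assumes the pre-1.0 release of the openai Python package (the one that exposes openai.Completion.create) and an API version of 2022-12-01; both should be checked against your own environment. The settings dictionary, the complete function, and the extract_text helper are hypothetical names.

```python
def extract_text(response):
    """Pull the generated text out of a completion response."""
    return response["choices"][0]["text"].strip()


def complete(prompt, settings, max_tokens=100, temperature=0.2):
    """Send a prompt to an Azure OpenAI deployment and return the reply."""
    import openai  # assumes the pre-1.0 openai package is installed

    # Point the library at Azure rather than at OpenAI's own service.
    openai.api_type = "azure"
    openai.api_base = settings["endpoint"]
    openai.api_version = "2022-12-01"  # assumed; check your resource
    openai.api_key = settings["api_key"]

    response = openai.Completion.create(
        engine=settings["deployment"],  # the Azure deployment name
        prompt=prompt,
        max_tokens=max_tokens,          # upper bound on the reply length
        temperature=temperature,        # lower values give less random output
    )
    return extract_text(response)
```

A call such as complete("Summarize this ticket: ...", settings) would then return the model's reply as a plain string. A low temperature is used here on the assumption that a support tool wants predictable answers.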

If you are working in another language, use its REST and JSON parsing facilities. To create API calls and work with the returned data, you can refer to the API reference in the Azure OpenAI documentation or use Azure's GitHub-hosted Swagger specifications. This approach works well with IDEs like Visual Studio.
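For reference, the same REST-and-JSON pattern is sketched below in Python using only the standard library, so it can be translated to other languages. The URL layout and the api-key header follow the Azure OpenAI completions endpoint as commonly documented; the API version, the helper names, and the example values are assumptions.

```python
import json
import urllib.request

API_VERSION = "2022-12-01"  # assumed; use the version your resource supports


def build_completion_request(endpoint, deployment, api_key, prompt,
                             max_tokens=100):
    """Build (but do not send) a POST request for a completion."""
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
        f"/completions?api-version={API_VERSION}"
    )
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def parse_completion(raw_bytes):
    """Extract the generated text from the JSON response body."""
    return json.loads(raw_bytes)["choices"][0]["text"]


# Sending the request needs a real resource and key:
# request = build_completion_request(endpoint, deployment, key, "Hello")
# with urllib.request.urlopen(request) as resp:
#     print(parse_completion(resp.read()))
```

Separating request construction from sending keeps the network call out of unit tests.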

Pricing for Azure OpenAI

Token-based pricing is a crucial component of OpenAI models. These are not the familiar authentication tokens; Azure OpenAI tokens are chunks of strings produced by an internal statistical model. To help you understand how your queries are charged, OpenAI offers a tool on its website that shows how strings are tokenized. A token is roughly four characters of text, though it may be shorter or longer, so you can expect 75 words to need around 100 tokens.
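Those two rules of thumb (about four characters per token, and about 100 tokens per 75 words) can be turned into a quick estimator. This is only an approximation for budgeting; the real count depends on the tokenizer, and both function names are hypothetical.

```python
def estimate_tokens_from_words(word_count):
    """Apply the 100-tokens-per-75-words rule, rounding up."""
    return (word_count * 100 + 74) // 75


def estimate_tokens_from_text(text):
    """Apply the four-characters-per-token rule, rounding up."""
    return (len(text) + 3) // 4


sample = "Please summarize the attached support ticket in two sentences."
print(estimate_tokens_from_words(len(sample.split())))
print(estimate_tokens_from_text(sample))
```

The two estimates will usually differ a little, which is a reminder that neither replaces running the text through the actual tokenizer.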

The price of tokens increases with model complexity. Ada, the entry-level model, costs approximately $0.0004 per 1,000 tokens, while Davinci, the top model, costs $0.02. Applying your own tuning incurs a storage cost, and using embeddings can result in higher charges because of the increased compute demands. Model fine-tuning comes with additional expenses starting at $20 per compute hour. The Azure website lists sample charges, but actual costs may differ based on your organization's Microsoft account status.
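Using the per-1,000-token prices quoted above, a back-of-the-envelope cost estimate might look like this. The prices are the ones given in this article and may not match your account; the estimate_cost helper is hypothetical and ignores storage, embedding, and fine-tuning charges.

```python
# Prices per 1,000 tokens in US dollars, as quoted in this article.
PRICE_PER_1K_TOKENS = {"ada": 0.0004, "davinci": 0.02}


def estimate_cost(tokens, model):
    """Estimate the charge for a number of tokens on a given model."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS[model]


for model in PRICE_PER_1K_TOKENS:
    cost = estimate_cost(1_000_000, model)
    print(f"{model}: ${cost:.2f} per million tokens")
```

At these rates a million tokens comes to about $0.40 on Ada and about $20 on Davinci, which shows why it can pay to step down to a smaller model once you know it handles your task.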

The simplicity of Azure OpenAI is arguably its most surprising aspect. Because you are using prebuilt models (with the option of some fine-tuning), you only need to provide some basic training, understand how prompts produce output, and connect the tools to your code to generate text or code as and when it is required.


The models provided by Azure OpenAI adapt quickly to specific tasks such as content creation, summarization, data analysis, and natural language to code translation. The service offers several access methods, including REST APIs, the Python SDK, and a web-based interface in Azure OpenAI Studio. Microsoft develops the APIs together with OpenAI, which helps ensure compatibility and a smooth transition from one service to the other. Customers who use Azure OpenAI benefit from Microsoft Azure's security capabilities while running the same models as OpenAI. Azure OpenAI also provides private networking, regional availability, and responsible AI content filtering.