Azure AI Fundamentals

Machine learning - This is often the foundation for an AI system, and is the way we "teach" a computer model to make predictions and draw conclusions from data.
Computer vision - Capabilities within AI to interpret the world visually through
cameras, video, and images.
Natural language processing - Capabilities within AI for a computer to interpret
written or spoken language, and respond in kind.
Document intelligence - Capabilities within AI that deal with managing, processing,
and using high volumes of data found in forms and documents.
Knowledge mining - Capabilities within AI to extract information from large volumes
of often unstructured data to create a searchable knowledge store.
Generative AI - Capabilities within AI that create original content in a variety of
formats including natural language, image, code, and more.

So how do machines learn?

The answer is, from data. In today's world, we create huge volumes of data as we go
about our everyday lives. From the text messages, emails, and social media posts we
send to the photographs and videos we take on our phones, we generate massive
amounts of information. More data still is created by millions of sensors in our
homes, cars, cities, public transport infrastructure, and factories.

Data scientists can use all of that data to train machine learning models that can
make predictions and inferences based on the relationships they find in the data.

Machine learning models try to capture the relationship between data. For example, suppose an environmental conservation organization wants volunteers to identify and catalog different species of wildflower using a phone app. Machine learning can enable this scenario: a model trained on labeled photographs of known wildflower species can be used by the app to identify the species in new photos.

NLP enables you to create software that can:

Analyze and interpret text in documents, email messages, and other sources.
Interpret spoken language, and synthesize speech responses.
Automatically translate spoken or written phrases between languages.
Interpret commands and determine appropriate actions.

You can use Microsoft's Azure AI Language to build natural language processing
solutions. Some features of Azure AI Language include understanding and analyzing
text, training conversational language models that can understand spoken or text-
based commands, and building intelligent applications.

Microsoft's Azure AI Speech is another service that can be used to build natural
language processing solutions. Azure AI Speech features include speech recognition
and synthesis, real-time translations, conversation transcriptions, and more.

You can explore Azure AI Language features in Azure Language Studio and Azure AI Speech features in Azure Speech Studio. The service features are available for use and testing in the studios, and programmatically through SDKs and REST APIs.

Azure AI Search can utilize the built-in AI capabilities of Azure AI services such
as image processing, document intelligence, and natural language processing to
extract data. The product's AI capabilities make it possible to index previously unsearchable documents and to extract and surface insights from large amounts of data quickly.

You can use Microsoft's Azure AI Document Intelligence to build solutions that
manage and accelerate data collection from scanned documents. Features of Azure AI
Document Intelligence help automate document processing in applications and
workflows, enhance data-driven strategies, and enrich document search capabilities.
You can use prebuilt models to add intelligent document processing for invoices,
receipts, health insurance cards, tax forms, and more. You can also use Azure AI Document Intelligence to create custom models with your own labeled datasets. The service features are available for use and testing in the Document Intelligence Studio, and programmatically through SDKs and REST APIs.

Azure OpenAI Service is Microsoft's cloud solution for deploying, customizing, and
hosting generative AI models. It brings together the best of OpenAI's cutting edge
models and APIs with the security and scalability of the Azure cloud platform.

Azure OpenAI Service supports many generative model choices that can serve
different needs. You can use Azure AI Studio to create generative AI solutions,
such as custom copilot chat-based assistants that use Azure OpenAI Service models.

AI software development is guided by a set of six principles:


Fairness
Reliability and safety
Privacy and security
Inclusiveness
Transparency (Interpretability or Intelligibility)
Accountability

Machine learning is in many ways the intersection of two disciplines - data science
and software engineering. The goal of machine learning is to use data to create a
predictive model that can be incorporated into a software application or service.
To achieve this goal requires collaboration between data scientists who explore and
prepare the data before using it to train a machine learning model, and software
developers who integrate the models into applications where they're used to predict
new data values (a process known as inferencing).

Fundamentally, a machine learning model is a software application that encapsulates a function to calculate an output value based on one or more input values. The process of defining that function is known as training. After the function has been defined, you can use it to predict new values in a process called inferencing.
The training data consists of past observations. In most cases, the observations
include the observed attributes or features of the thing being observed, and the
known value of the thing you want to train a model to predict (known as the label).
An algorithm is applied to the data to try to determine a relationship between the
features and the label, and generalize that relationship as a calculation that can
be performed on x to calculate y. The specific algorithm used depends on the kind
of predictive problem you're trying to solve (more about this later), but the basic
principle is to try to fit a function to the data, in which the values of the
features can be used to calculate the label.
Machine learning types
Supervised machine learning (regression and classification) - classification is further divided into binary classification and multiclass classification.
Unsupervised machine learning (clustering).

Supervised machine learning is a general term for machine learning algorithms in which the training data includes both feature values and known label values.
Supervised machine learning is used to train models by determining a relationship
between the features and labels in past observations, so that unknown labels can be
predicted for features in future cases.
Regression is a form of supervised machine learning in which the label predicted by
the model is a numeric value. For example:

The number of ice creams sold on a given day, based on the temperature, rainfall,
and windspeed.
The selling price of a property based on its size in square feet, the number of
bedrooms it contains, and socio-economic metrics for its location.
The fuel efficiency (in miles-per-gallon) of a car based on its engine size,
weight, width, height, and length.
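
As an illustration, here is a minimal regression sketch using scikit-learn; the ice cream sales figures and weather values are made up for illustration only:

# Minimal regression sketch with scikit-learn; the temperatures and
# sales counts below are made-up illustrative values, not real data.
from sklearn.linear_model import LinearRegression

# Features: [temperature, rainfall, windspeed] for past days
X = [[22, 0.0, 5], [28, 0.0, 3], [18, 2.5, 8], [31, 0.0, 2], [15, 4.0, 10]]
# Label: number of ice creams sold on each of those days
y = [120, 180, 60, 210, 30]

model = LinearRegression().fit(X, y)

# Inferencing: predict sales for a new day (25 degrees, no rain, light wind)
print(model.predict([[25, 0.0, 4]]))
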
Classification
Classification is a form of supervised machine learning in which the label
represents a categorization, or class. There are two common classification
scenarios.

In machine learning, logistic regression is used for classification, not regression. The important point is the logistic nature of the function it produces, which describes an S-shaped curve between a lower and upper value (0.0 and 1.0 when used for binary classification).
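
A small sketch of the logistic (sigmoid) function shows how it squashes any input into the range 0.0 to 1.0, which can then be thresholded to produce a binary class prediction:

import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real value into the range (0, 1)
    return 1 / (1 + np.exp(-z))

for z in [-6, -2, 0, 2, 6]:
    p = sigmoid(z)
    # Threshold at 0.5 to turn the probability into a binary class label
    print(f"z={z:>3}  p={p:.3f}  class={int(p >= 0.5)}")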

The arrangement of the confusion matrix is such that correct (true) predictions are
shown in a diagonal line from top-left to bottom-right. Often, color-intensity is
used to indicate the number of predictions in each cell, so a quick glance at a
model that predicts well should reveal a deeply shaded diagonal trend.
The simplest metric you can calculate from the confusion matrix is accuracy - the
proportion of predictions that the model got right. Accuracy is calculated as:
(TN+TP) ÷ (TN+FN+FP+TP)

Recall is a metric that measures the proportion of positive cases that the model
identified correctly. In other words, compared to the number of patients who have
diabetes, how many did the model predict to have diabetes?
The formula for recall is:
TP ÷ (TP+FN)

Precision is a similar metric to recall, but measures the proportion of predicted positive cases where the true label is actually positive. In other words, what proportion of the patients predicted by the model to have diabetes actually have diabetes?
The formula for precision is:
TP ÷ (TP+FP)

F1-score is an overall metric that combines recall and precision. The formula for
F1-score is:
(2 x Precision x Recall) ÷ (Precision + Recall)
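
You can check these formulas against scikit-learn's built-in metric functions; the labels and predictions below are made up for illustration:

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             recall_score, precision_score, f1_score)

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # known labels (made-up)
y_pred = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]   # model predictions (made-up)

# ravel() unpacks the binary confusion matrix into its four cells
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)

print("Accuracy :", (tn + tp) / (tn + fn + fp + tp), accuracy_score(y_true, y_pred))
print("Recall   :", tp / (tp + fn), rec)
print("Precision:", tp / (tp + fp), prec)
print("F1-score :", (2 * prec * rec) / (prec + rec), f1_score(y_true, y_pred))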

Another name for recall is the true positive rate (TPR), and there's an equivalent
metric called the false positive rate (FPR) that is calculated as FP÷(FP+TN). We
already know that the TPR for our model when using a threshold of 0.5 is 0.75, and
we can use the formula for FPR to calculate a value of 0÷2 = 0.

Of course, if we were to change the threshold above which the model predicts true
(1), it would affect the number of positive and negative predictions; and therefore
change the TPR and FPR metrics. These metrics are often used to evaluate a model by
plotting a receiver operating characteristic (ROC) curve that compares the TPR and
FPR for every possible threshold value between 0.0 and 1.0:

The ROC curve for a perfect model would go straight up the TPR axis on the left and
then across the FPR axis at the top. Since the plot area for the curve measures
1x1, the area under this perfect curve would be 1.0 (meaning that the model is
correct 100% of the time). In contrast, a diagonal line from the bottom-left to the
top-right represents the results that would be achieved by randomly guessing a
binary label; producing an area under the curve of 0.5. In other words, given two
possible class labels, you could reasonably expect to guess correctly 50% of the
time.
In the case of our diabetes model, plotting this curve gives an area under the
curve (AUC) metric of 0.875. Since the AUC is higher than 0.5, we can conclude
the model performs better at predicting whether or not a patient has diabetes than
randomly guessing.
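
A sketch of computing the TPR, FPR, and AUC from predicted probabilities with scikit-learn (the scores below are made up):

from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 1, 0, 1, 1]                     # known labels (made-up)
y_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.65]   # predicted probability of class 1

# TPR and FPR at every candidate threshold between 0.0 and 1.0
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("AUC:", roc_auc_score(y_true, y_scores))  # 0.5 = random guessing, 1.0 = perfect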

One-vs-Rest algorithms train a binary classification function for each class, each
calculating the probability that the observation is an example of the target class.
Each function calculates the probability of the observation being a specific class
compared to any other class.

Multinomial algorithms
An alternative approach is to use a multinomial algorithm, which creates a single function that returns a multi-valued output. The output is a vector (an array of values) that contains the probability distribution for all possible classes - with a probability score for each class which, when totaled, adds up to 1.0:
f(x) = [P(y=0|x), P(y=1|x), P(y=2|x)]
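
A sketch of how such a probability vector can be produced with a softmax function (the raw per-class scores are made up):

import numpy as np

def softmax(scores):
    # Convert raw per-class scores into probabilities that sum to 1.0
    exp = np.exp(scores - np.max(scores))   # subtract max for numerical stability
    return exp / exp.sum()

raw_scores = np.array([2.0, 0.5, -1.0])     # one score per class (made-up)
probs = softmax(raw_scores)
print(probs, probs.sum())                   # f(x) = [P(y=0|x), P(y=1|x), P(y=2|x)], total 1.0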

There are multiple metrics that you can use to evaluate cluster separation,
including:
Average distance to cluster center: How close, on average, each point in the
cluster is to the centroid of the cluster.
Average distance to other center: How close, on average, each point in the cluster
is to the centroid of all other clusters.
Maximum distance to cluster center: The furthest distance between a point in the
cluster and its centroid.
Silhouette: A value between -1 and 1 that summarizes the ratio of distance between
points in the same cluster and points in different clusters (The closer to 1, the
better the cluster separation).
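
A clustering sketch with scikit-learn showing k-means assignments and the silhouette metric (the two-feature observations are made up):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Unlabeled observations with two features each (made-up values)
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
     [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", model.labels_)
print("Silhouette (closer to 1 is better):", silhouette_score(X, model.labels_))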

Deep learning is an advanced form of machine learning that tries to emulate the way
the human brain learns. The key to deep learning is the creation of an artificial
neural network that simulates electrochemical activity in biological neurons by
using mathematical functions, as shown here.
Biological neural network: Neurons fire in response to electrochemical stimuli. When fired, the signal is passed to connected neurons.
Artificial neural network: Each neuron is a function that operates on an input value (x) and a weight (w). The function is wrapped in an activation function that determines whether to pass the output on.
Artificial neural networks are made up of multiple layers of neurons - essentially
defining a deeply nested function. This architecture is the reason the technique is
referred to as deep learning and the models produced by it are often referred to as
deep neural networks (DNNs). You can use deep neural networks for many kinds of
machine learning problem, including regression and classification, as well as more
specialized models for natural language processing and computer vision.
Just like other machine learning techniques discussed in this module, deep learning
involves fitting training data to a function that can predict a label (y) based on
the value of one or more features (x). The function (f(x)) is the outer layer of a
nested function in which each layer of the neural network encapsulates functions
that operate on x and the weight (w) values associated with them. The algorithm
used to train the model involves iteratively feeding the feature values (x) in the
training data forward through the layers to calculate output values for ŷ,
validating the model to evaluate how far off the calculated ŷ values are from the
known y values (which quantifies the level of error, or loss, in the model), and
then modifying the weights (w) to reduce the loss. The trained model includes the
final weight values that result in the most accurate predictions.
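
As a toy sketch of the "deeply nested function" idea, here is a two-layer forward pass in NumPy with random weights; a real training loop would iteratively adjust the weights to reduce the loss:

import numpy as np

def relu(z):
    # Activation function: decides how much of each output to pass on
    return np.maximum(0, z)

x = np.array([0.5, -1.2, 3.0])        # feature values (made-up)
W1 = np.random.rand(4, 3)             # weights for a hidden layer of 4 neurons
W2 = np.random.rand(2, 4)             # weights for an output layer of 2 neurons

# f(x) is a deeply nested function: the output layer operates on the hidden layer's output
y_hat = W2 @ relu(W1 @ x)
print(y_hat)                          # calculated ŷ values; training compares these to known y
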
In Azure Machine Learning studio, you can (among other things):

Import and explore data.
Create and use compute resources.
Run code in notebooks.
Use visual tools to create jobs and pipelines.
Use automated machine learning to train models.
View details of trained models, including evaluation metrics, responsible AI
information, and training parameters.
Deploy trained models for on-request and batch inferencing.
Import and manage models from a comprehensive model catalog.

Azure AI services are easy-to-use AI capabilities made available as resources on
the Azure platform. Azure AI service capabilities include Language, Speech, Vision,
Decision, Search, and Azure OpenAI.

In this module we’ve used several different terms relating to AI services. Here's a
recap:

API – application programming interfaces (APIs) enable software components to
communicate, so one side can be updated without stopping the other from working.
Artificial Intelligence (AI) – computer programs that respond in ways that are
normally associated with human reasoning, learning, and thought.
Azure AI services – a portfolio of AI services that can be incorporated into
applications quickly and easily without specialist knowledge. Azure AI services is
also the name for the multi-service resource created in the Azure portal that
provides access to several different Azure AI services with a single key and
endpoint.
Endpoint – the location of a resource, such as an Azure AI service
Key – a private string that is used to authenticate a request.
Machine learning – the ability for computer programs to learn from large amounts of
data, in a process known as "training".
Multi-service resource – the AI service resource created in the Azure portal that
provides access to a bundle of AI services.
Single-service resource – a resource created in the Azure portal that provides
access to a single Azure AI service, such as Speech, Vision, Language, etc. Each
Azure AI service has a unique key and endpoint.
RESTful API – a scalable web application programming interface used to access Azure
AI services.
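
As a sketch of how an endpoint and key fit together, here is a hypothetical REST call using the Python requests library. The URL path, API version, and request body are illustrative assumptions; check the documentation of the specific Azure AI service for the actual contract:

import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"  # resource endpoint (placeholder)
key = "<your-key>"                                                 # private key used to authenticate

# Hypothetical path and body for illustration only; consult the service
# documentation for the real request format and API version.
response = requests.post(
    f"{endpoint}/language/:analyze-text?api-version=2023-04-01",
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"kind": "LanguageDetection",
          "analysisInput": {"documents": [{"id": "1", "text": "Hello world"}]}},
)
print(response.status_code, response.json())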

Transformers and multi-modal models


Convolutional neural networks (CNNs) have been at the core of computer vision solutions for many years. While
they're commonly used to solve image classification problems as described
previously, they're also the basis for more complex computer vision models. For
example, object detection models combine CNN feature extraction layers with the
identification of regions of interest in images to locate multiple classes of
object in the same image.

Transformers
Most advances in computer vision over the decades have been driven by improvements
in CNN-based models. However, in another AI discipline - natural language
processing (NLP) - another type of neural network architecture, called a transformer,
has enabled the development of sophisticated models for language. Transformers work
by processing huge volumes of data, and encoding language tokens (representing
individual words or phrases) as vector-based embeddings (arrays of numeric values).
You can think of an embedding as representing a set of dimensions that each
represent some semantic attribute of the token. The embeddings are created such
that tokens that are commonly used in the same context are closer together
dimensionally than unrelated words.
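
A sketch of the embedding idea using made-up three-dimensional vectors (real transformer embeddings have hundreds or thousands of dimensions):

import numpy as np

# Made-up embeddings: tokens used in similar contexts get similar vectors
embeddings = {
    "dog":    np.array([10.3, 2.0, -1.2]),
    "puppy":  np.array([9.8, 2.1, -0.9]),
    "banana": np.array([-4.0, 7.5, 3.3]),
}

def cosine_similarity(a, b):
    # Closer to 1.0 means the tokens point in a similar "semantic direction"
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))   # high
print(cosine_similarity(embeddings["dog"], embeddings["banana"]))  # low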

Multi-modal models
The success of transformers as a way to build language models has led AI
researchers to consider whether the same approach would be effective for image
data. The result is the development of multi-modal models, in which the model is
trained using a large volume of captioned images, with no fixed labels. An image
encoder extracts features from images based on pixel values and combines them with
text embeddings created by a language encoder. The overall model encapsulates
relationships between natural language token embeddings and image features, as
shown here:

The Microsoft Florence model is just such a model. Trained with huge volumes of
captioned images from the Internet, it includes both a language encoder and an
image encoder. Florence is an example of a foundation model - in other words, a
pre-trained general model on which you can build multiple adaptive models for
specialist tasks. For example, you can use Florence as a foundation model for
adaptive models that perform:

Image classification: Identifying to which category an image belongs.
Object detection: Locating individual objects within an image.
Captioning: Generating appropriate descriptions of images.
Tagging: Compiling a list of relevant text tags for an image.

Microsoft Azure provides multiple Azure AI services that you can use to detect and
analyze faces, including:

Azure AI Vision, which offers face detection and some basic face analysis, such as
returning the bounding box coordinates around an image.
Azure AI Video Indexer, which you can use to detect and identify faces in a video.
Azure AI Face, which offers pre-built algorithms that can detect, recognize, and
analyze faces.
Of these, Face offers the widest range of facial analysis capabilities.

Face service
The Azure Face service can return the rectangle coordinates for any human faces
that are found in an image, as well as a series of attributes related to those faces, such as:

Accessories: indicates whether the given face has accessories. This attribute
returns possible accessories including headwear, glasses, and mask, with confidence
score between zero and one for each accessory.
Blur: how blurred the face is, which can be an indication of how likely the face is
to be the main focus of the image.
Exposure: such as whether the image is underexposed or overexposed. This applies
to the face in the image and not the overall image exposure.
Glasses: whether or not the person is wearing glasses.
Head pose: the face's orientation in a 3D space.
Mask: indicates whether the face is wearing a mask.
Noise: refers to visual noise in the image. If you have taken a photo with a high
ISO setting for darker settings, you would notice this noise in the image. The
image looks grainy or full of tiny dots that make the image less clear.
Occlusion: determines if there might be objects blocking the face in the image.

Machine learning for text classification


Another useful text analysis technique is to use a classification algorithm, such
as logistic regression, to train a machine learning model that classifies text
based on a known set of categorizations. A common application of this technique is
to train a model that classifies text as positive or negative in order to perform
sentiment analysis or opinion mining.

For example, consider the following restaurant reviews, which are already labeled
as 0 (negative) or 1 (positive):
The food and service were both great: 1
A really terrible experience: 0
Mmm! tasty food and a fun vibe: 1
Slow service and substandard food: 0
With enough labeled reviews, you can train a classification model using the
tokenized text as features and the sentiment (0 or 1) as a label. The model will
encapsulate a relationship between tokens and sentiment - for example, reviews with
tokens for words like "great", "tasty", or "fun" are more likely to return a
sentiment of 1 (positive), while reviews with words like "terrible", "slow", and
"substandard" are more likely to return 0 (negative).

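A sketch of this approach with scikit-learn, using the four example reviews above (far too little data for a useful model, but enough to show the idea):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["The food and service were both great",
           "A really terrible experience",
           "Mmm! tasty food and a fun vibe",
           "Slow service and substandard food"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

# Tokenize the text into a bag-of-words feature matrix
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

model = LogisticRegression().fit(X, labels)

# Predict sentiment for a new, unlabeled review
new_review = vectorizer.transform(["Great food and fun atmosphere"])
print(model.predict(new_review))
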
Azure AI Language is a part of the Azure AI services offerings that can perform
advanced natural language processing over unstructured text. Azure AI Language's
text analysis features include:

Named entity recognition identifies people, places, events, and more. This feature
can also be customized to extract custom categories.
Entity linking identifies known entities together with a link to Wikipedia.
Personal identifying information (PII) detection identifies personally sensitive
information, including personal health information (PHI).
Language detection identifies the language of the text and returns a language code
such as "en" for English.
Sentiment analysis and opinion mining identifies whether text is positive or
negative.
Summarization summarizes text by identifying the most important information.
Key phrase extraction lists the main concepts from unstructured text.
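
A minimal sketch using the azure-ai-textanalytics Python package, assuming you have created a Language resource; the endpoint and key values are placeholders:

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Placeholder endpoint and key for an Azure AI Language resource
client = TextAnalyticsClient(endpoint="https://<your-resource>.cognitiveservices.azure.com",
                             credential=AzureKeyCredential("<your-key>"))

documents = ["The food and service were both great"]

print(client.detect_language(documents)[0].primary_language.iso6391_name)  # e.g. "en"
print(client.analyze_sentiment(documents)[0].sentiment)                    # e.g. "positive"
print(client.extract_key_phrases(documents)[0].key_phrases)                # main concepts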

To work with conversational language understanding, you need to take into account
three core concepts: utterances, entities, and intents.

Utterances
An utterance is an example of something a user might say, and which your
application must interpret. For example, when using a home automation system, a
user might use the following utterances:

"Switch the fan on."

"Turn on the light."

Entities
An entity is an item to which an utterance refers. For example, fan and light in
the following utterances:

"Switch the fan on."

"Turn on the light."

You can think of the fan and light entities as being specific instances of a
general device entity.

Intents
An intent represents the purpose, or goal, expressed in a user's utterance. For
example, for both of the previously considered utterances, the intent is to turn a
device on; so in your conversational language understanding application, you might
define a TurnOn intent that is related to these utterances.

A conversational language understanding application defines a model consisting of
intents and entities. Utterances are used to train the model to identify the most
likely intent and the entities to which it should be applied based on a given
input. The home assistant application we've been considering might include multiple
intents, like the following examples:

Azure AI Language's conversational language understanding feature enables you to
author a language model and use it for predictions. Authoring a model involves
defining entities, intents, and utterances. Generating predictions involves
publishing a model so that client applications can take user input and return
responses.

Azure resources for conversational language understanding


To use conversational language capabilities in Azure, you need a resource in your
Azure subscription. You can use the following types of resource:

Azure AI Language: A resource that enables you to build apps with industry-leading
natural language understanding capabilities without machine learning expertise. You
can use a language resource for authoring and prediction.
Azure AI services: A general resource that includes conversational language
understanding along with many other Azure AI services. You can only use this type
of resource for prediction.
The separation of resources is useful when you want to track utilization of Azure AI Language separately from that of other Azure AI services used by client applications.

Authoring
After you've created an authoring resource, you can use it to train a
conversational language understanding model. To train a model, start by defining
the entities and intents that your application will predict as well as utterances
for each intent that can be used to train the predictive model.

Conversational language understanding provides a comprehensive collection of prebuilt domains that include pre-defined intents and entities for common scenarios, which you can use as a starting point for your model. You can also create your own entities and intents.

When you create entities and intents, you can do so in any order. You can create an
intent, and select words in the sample utterances you define for it to create
entities for them; or you can create the entities ahead of time and then map them
to words in utterances as you're creating the intents.

You can write code to define the elements of your model, but in most cases it's
easiest to author your model using the Language studio - a web-based interface for
creating and managing Conversational Language Understanding applications.

Training the model


After you have defined the intents and entities in your model, and included a suitable set of sample utterances, the next step is to train the model. Training is the process of using your sample utterances to teach your model to match natural language expressions that a user might say to probable intents and entities.

After training the model, you can test it by submitting text and reviewing the
predicted intents. Training and testing is an iterative process. After you train
your model, you test it with sample utterances to see if the intents and entities
are recognized correctly. If they're not, make updates, retrain, and test again.

Predicting
When you are satisfied with the results from the training and testing, you can
publish your Conversational Language Understanding application to a prediction
resource for consumption.
Client applications can use the model by connecting to the endpoint for the prediction resource, specifying the appropriate authentication key, and submitting user input to get predicted intents and entities. The predictions are returned to the client application, which can then take appropriate action based on the predicted intent.
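
A rough sketch of such a client using the azure-ai-language-conversations Python package. The endpoint, key, project name, and deployment name are assumptions, and the exact request shape should be verified against the service documentation:

from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

# Placeholder endpoint/key plus an assumed project and deployment name
client = ConversationAnalysisClient("https://<your-resource>.cognitiveservices.azure.com",
                                    AzureKeyCredential("<your-key>"))

result = client.analyze_conversation(task={
    "kind": "Conversation",
    "analysisInput": {"conversationItem": {"id": "1", "participantId": "1",
                                           "text": "Turn on the light"}},
    "parameters": {"projectName": "HomeAutomation", "deploymentName": "production"},
})

prediction = result["result"]["prediction"]
print(prediction["topIntent"])   # e.g. "TurnOn"
print(prediction["entities"])    # e.g. the "light" device entity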

Speech synthesis is concerned with vocalizing data, usually by converting text to
speech. A speech synthesis solution typically requires the following information:

The text to be spoken
The voice to be used to vocalize the speech
To synthesize speech, the system typically tokenizes the text to break it down into
individual words, and assigns phonetic sounds to each word. It then breaks the
phonetic transcription into prosodic units (such as phrases, clauses, or sentences)
to create phonemes that will be converted to audio format. These phonemes are then
synthesized as audio and can be assigned a particular voice, speaking rate, pitch,
and volume.

You can use the output of speech synthesis for many purposes, including:

Generating spoken responses to user input
Creating voice menus for telephone systems
Reading email or text messages aloud in hands-free scenarios
Broadcasting announcements in public locations, such as railway stations or
airports

The Speech to text API


You can use the Azure AI Speech to text API to perform real-time or batch transcription
of audio into a text format. The audio source for transcription can be a real-time
audio stream from a microphone or an audio file.

The model that is used by the Speech to text API is based on the Universal
Language Model that was trained by Microsoft. The data for the model is Microsoft-
owned and deployed to Microsoft Azure. The model is optimized for two scenarios,
conversational and dictation. You can also create and train your own custom models
including acoustics, language, and pronunciation if the pre-built models from
Microsoft do not provide what you need.

Real-time transcription
Real-time speech to text allows you to transcribe text in audio streams. You can
use real-time transcription for presentations, demos, or any other scenario where a
person is speaking.

In order for real-time transcription to work, your application will need to be
listening for incoming audio from a microphone, or other audio input source such as
an audio file. Your application code streams the audio to the service, which
returns the transcribed text.
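
A sketch of real-time transcription with the Azure Speech SDK for Python (the azure-cognitiveservices-speech package); the key and region are placeholders:

import azure.cognitiveservices.speech as speechsdk

# Placeholder key and region for an Azure AI Speech resource
speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")

# Listen once on the default microphone and return the transcribed text
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()
print(result.text)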

Batch transcription
Not all speech to text scenarios are real time. You might have audio recordings
stored on a file share, a remote server, or even on Azure storage. You can point to
audio files with a shared access signature (SAS) URI and asynchronously receive
transcription results.

Batch transcription should be run in an asynchronous manner because the batch jobs
are scheduled on a best-effort basis. Normally a job will start executing within
minutes of the request but there is no estimate for when a job changes into the
running state.
The text to speech API
The text to speech API enables you to convert text input to audible speech, which
can either be played directly through a computer speaker or written to an audio
file.

Speech synthesis voices


When you use the text to speech API, you can specify the voice to be used to
vocalize the text. This capability offers you the flexibility to personalize your
speech synthesis solution and give it a specific character.

The service includes multiple pre-defined voices with support for multiple
languages and regional pronunciation, including neural voices that leverage neural
networks to overcome common limitations in speech synthesis with regard to
intonation, resulting in a more natural sounding voice. You can also develop custom
voices and use them with the text to speech API.
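
A sketch of speech synthesis with the same Speech SDK, selecting one of the pre-defined neural voices; the voice name shown is an assumption, so check the list of available voices:

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
# Choose a pre-defined neural voice (assumed name; many languages and voices are available)
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

# Synthesize the text and play it through the default speaker
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Your flight to New York departs from gate 12.").get()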

Document intelligence describes AI capabilities that support processing text and
making sense of information in text. As an extension of optical character
recognition (OCR), document intelligence takes the next step a person might after
reading a form or document. It automates the process of extracting, understanding,
and saving the data in text.

Consider an organization that needs to process large numbers of receipts for
expenses claims, project costs, and other accounting purposes. Suppose someone
needs to manually enter the information into a database. The manual process is
relatively slow and potentially error-prone.

Using document intelligence, the company can take a scanned image of a receipt,
digitize the text with OCR, and pair the field items with their field names in a
database. Document intelligence can identify specific data such as the merchant's
name, merchant's address, total value, and tax value.

Azure AI Document Intelligence supports features that can analyze documents and
forms with prebuilt and custom models. In this module, you explore how Azure AI
services provide access to document intelligence capabilities.

Azure AI Document Intelligence consists of features grouped by model type:

Prebuilt models - pretrained models that have been built to process common document
types such as invoices, business cards, ID documents, and more. These models are
designed to recognize and extract specific fields that are important for each
document type.
Custom models - can be trained to identify specific fields that are not included in
the existing pretrained models.
Document analysis - general document analysis that returns structured data
representations, including regions of interest and their inter-relationships.
Prebuilt models
The prebuilt models apply advanced machine learning to accurately identify and
extract text, key-value pairs, tables, and structures from forms and documents.
These capabilities include extracting:

customer and vendor details from invoices
sales and transaction details from receipts
identification and verification details from identity documents
health insurance details
business contact details
agreement and party details from contracts
taxable compensation, mortgage interest, student loan details and more
For example, consider the prebuilt receipt model. It processes receipts by:

Matching field names to values
Identifying tables of data
Identifying specific fields, such as dates, telephone numbers, addresses, totals,
and others
The receipt model has been trained to recognize data on several different receipt
types, such as thermal receipts (printed on heat-sensitive paper), hotel receipts,
gas receipts, credit card receipts, and parking receipts.

Fields recognized include:

Name, address, and telephone number of the merchant
Date and time of the purchase
Name, quantity, and price of each item purchased
Total, subtotals, and tax values
Each field and data pair has a confidence level, indicating the likely level of
accuracy. This could be used to automatically identify when a person needs to
verify a receipt.
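
A sketch using the azure-ai-formrecognizer Python package, which provides access to Azure AI Document Intelligence; the endpoint, key, and file name are placeholders, and which fields are returned depends on the receipt:

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient("https://<your-resource>.cognitiveservices.azure.com",
                                AzureKeyCredential("<your-key>"))

# Analyze a scanned receipt with the prebuilt receipt model
with open("receipt.jpg", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-receipt", document=f)
receipt = poller.result().documents[0]

merchant = receipt.fields.get("MerchantName")
total = receipt.fields.get("Total")
# Each field carries a confidence score that can trigger manual review
print(merchant.value, merchant.confidence)
print(total.value, total.confidence)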

The model has been trained to recognize several different languages, depending on
the receipt type. For best results when using the prebuilt receipt model, images
should be:

JPEG, PNG, BMP, PDF, or TIFF format
File size less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier
Between 50 x 50 pixels and 10000 x 10000 pixels
For PDF documents, no larger than 17 inches x 17 inches
One receipt per document
You can get started with training models in the Document Intelligence Studio, a
user interface for testing document analysis, prebuilt models, and creating custom
models.

Azure AI Document Intelligence resource


To use Azure AI Document Intelligence, create either a Document Intelligence or
Azure AI services resource in your Azure subscription. If you have not used
Document Intelligence before, select the free tier when you create the resource.
There are some restrictions with the free tier, for example only the first two
pages are processed for PDF or TIFF documents.

After the resource has been created, you can create client applications that use
its key and endpoint to submit forms for analysis, or use the resource in Document
Intelligence Studio.

Azure AI Search features


Azure AI Search exists to complement existing technologies and provides a
programmable search engine built on Apache Lucene, an open-source software library.
It's a highly available platform offering a 99.9% uptime SLA available for cloud
and on-premises assets.

Azure AI Search comes with the following features:

Data from any source: accepts data from any source provided in JSON format, with
auto crawling support for selected data sources in Azure.
Full text search and analysis: offers full text search capabilities supporting both
simple query and full Lucene query syntax.
AI powered search: has Azure AI capabilities built in for image and text analysis
from raw content.
Multi-lingual: offers linguistic analysis for 56 languages to intelligently handle
phonetic matching or language-specific linguistics. Natural language processors
available in Azure AI Search are also used by Bing and Office.
Geo-enabled: supports geo-search filtering based on proximity to a physical
location.
Configurable user experience: has several features to improve the user experience
including autocomplete, autosuggest, pagination, and hit highlighting.
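
A sketch of querying an existing index with the azure-search-documents Python package; the service URL, index name, and key are placeholders:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(endpoint="https://<your-search-service>.search.windows.net",
                      index_name="<your-index>",
                      credential=AzureKeyCredential("<your-query-key>"))

# Full text search over the index; each result is a dictionary of indexed fields
results = client.search(search_text="wildflower species")
for doc in results:
    print(doc)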
