In this blog post I’m going to show one of the advantages of linking Azure Data Lake Analytics with Machine Learning.
We’ll upload a series of images to the Data Lake, then run a U-SQL script that detects the objects in each image and writes the corresponding tags to a text file.
First of all you need an instance of Data Lake Store and one of Data Lake Analytics. Once these are up and running, you need to enable Python/R/Cognitive in your Data Lake Analytics instance (here is a blog to help you out on this).
Next, we need to put some images in our Data Lake Store. Following Azure Data Lake best practices, I put the images in my laboratory subfolder.
Once our images are in place we need to create a script: in your Data Lake Analytics instance, click on New Job.
This will open a new blade with an empty script. Let’s give our new job a name: “ImageTagging”.
In order to use image tagging we need to reference the relevant assemblies:
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageTagging;
Next we need to extract information (location, file name, etc.) about the image file(s) we want to analyse. In this case we’ll process all .jpg images in the specified folder.
@images =
    EXTRACT FileName string,
            ImgData byte[]
    FROM @"/Laboratory/Desks/CSbrescia/ImageTagging/{FileName:*}.jpg"
    USING new Cognition.Vision.ImageExtractor();
The following step is where the magic happens: the script analyses all the images extracted from the folder above, detects the objects present in each image and creates tags for them. Here is the structure of the resulting rowset:
- Image name
- Number of tagged objects detected
- A string with all the tags
@TaggedObjects =
    PROCESS @images
    PRODUCE FileName,
            NumObjects int,
            Tags string
    READONLY FileName
    USING new Cognition.Vision.ImageTagger();
Now we can write the rowset with all the tags to an output file:
OUTPUT @TaggedObjects
TO "/Laboratory/Desks/CSbrescia/ImageTagging/ImageTags.tsv"
USING Outputters.Tsv();
Here are the images I used in this example
And here is the list of objects detected
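If you download ImageTags.tsv and want to work with the tags locally, a small Python helper can turn the file into a dictionary. This is only a sketch under a few assumptions: the file has no header row, the columns are FileName, NumObjects and Tags in that order, and the tags inside the Tags column are separated by a semicolon. The function name read_image_tags and the tag_separator parameter are my own, so adjust the separator to whatever your output actually contains.

```python
import csv
from typing import Dict, Iterable, List


def read_image_tags(lines: Iterable[str], tag_separator: str = ";") -> Dict[str, List[str]]:
    """Map each image file name to its list of tags.

    Expects tab-separated rows of (FileName, NumObjects, Tags) without a header.
    The tag separator is an assumption; check it against your own output file.
    """
    tags_by_file: Dict[str, List[str]] = {}
    # csv.reader also strips double quotes around string values if present.
    reader = csv.reader(lines, delimiter="\t", quotechar='"')
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed rows
        file_name, raw_tags = row[0], row[2]
        tags_by_file[file_name] = [
            tag.strip() for tag in raw_tags.split(tag_separator) if tag.strip()
        ]
    return tags_by_file
```

To use it, open the downloaded file with open("ImageTags.tsv", newline="", encoding="utf-8") and pass the file handle to read_image_tags.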
In conclusion, we have created a pretty handy tool for automatic image tagging using Data Lake, with very little knowledge required of the underlying processes.
Note that there seems to be an image size limit: I had to resize all images to about 500 KB.
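Before uploading, it can help to check which images are likely to be too large. The sketch below lists the .jpg files in a local folder that exceed a size threshold. The 500 KB default is just the value that worked for me, not a documented limit, and the names find_oversized_images and SIZE_LIMIT_BYTES are made up for this example.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: roughly the size that worked in this example, not an official limit.
SIZE_LIMIT_BYTES = 500 * 1024


def find_oversized_images(folder: str, limit_bytes: int = SIZE_LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for every .jpg in folder above limit_bytes."""
    oversized = []
    for path in sorted(Path(folder).glob("*.jpg")):
        size = path.stat().st_size
        if size > limit_bytes:
            oversized.append((path.name, size))
    return oversized
```

Calling find_oversized_images on your local image folder returns the files that still need resizing before they go into the Data Lake Store.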