In this blog post I’m going to show one of the advantages of linking Azure Data Lake Analytics with machine learning.
We’ll upload a series of images to the Data Lake, then run a U-SQL script that detects the objects in each image and writes the corresponding tags to a text file.
First of all you need an instance of Data Lake Store and one of Data Lake Analytics. Once these are up and running, you need to enable Python/R/Cognitive in your Data Lake Analytics instance (here is a blog to help you out on this).
Next, we need to put the images in our Data Lake Store. Following Azure Data Lake best practices, I put them in my laboratory subfolder.
Once our images are in place we need to create a script. In your Data Lake Analytics instance, click on New Job.
This will open a new blade with an empty script. Let’s give our new job a name: “ImageTagging”.
To use image tagging we need to reference the relevant assemblies:
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageTagging;
Next we need to extract information (location, file name etc.) about the image file(s) we want to analyse. In this case we’ll process all the .jpg images in the specified folder.
@images =
    EXTRACT FileName string,
            ImgData byte[]
    FROM @"/Laboratory/Desks/CSbrescia/ImageTagging/{FileName:*}.jpg"
    USING new Cognition.Vision.ImageExtractor();
The following step is where the magic happens: the script analyses all the images in the folder indicated above, detects the objects present in each image and creates tags. Here is the structure of the resulting “variable” (a rowset):
- Image name
- Number of tagged objects detected
- A string with all the tags
@TaggedObjects =
    PROCESS @images
    PRODUCE FileName,
            NumObjects int,
            Tags string
    READONLY FileName
    USING new Cognition.Vision.ImageTagger();
Now we can write our variable, with all the tags, to an output file:
OUTPUT @TaggedObjects
TO @"/Laboratory/Desks/CSbrescia/ImageTagging/ImageTags.tsv"
USING Outputters.Tsv();
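Once the job has finished and you have downloaded ImageTags.tsv, you will probably want to do something with the tags. Here is a minimal Python sketch for reading the file into a dictionary. It assumes the three columns appear in the order produced above (FileName, NumObjects, Tags), that there is no header row, and that the tags in the Tags string are separated by semicolons. That last point is an assumption on my part, so check your own output and change tag_separator if needed. The function name parse_image_tags and the sample rows are made up for illustration.

```python
import csv
import io


def parse_image_tags(tsv_file, tag_separator=";"):
    """Map each image name to its object count and list of tags.

    Assumes three tab-separated columns per row (FileName, NumObjects, Tags),
    no header row, and tags joined by tag_separator (an assumption: check
    your own output file).
    """
    tags_by_image = {}
    reader = csv.reader(tsv_file, delimiter="\t", quotechar='"')
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        file_name, num_objects, tags = row
        tag_list = [t.strip() for t in tags.split(tag_separator) if t.strip()]
        tags_by_image[file_name] = {
            "num_objects": int(num_objects),
            "tags": tag_list,
        }
    return tags_by_image


# Made-up sample rows, only to show the expected shape of the input.
sample = io.StringIO('"desk"\t2\t"table;laptop"\n"street"\t1\t"car"\n')
result = parse_image_tags(sample)
print(result["desk"]["tags"])  # prints ['table', 'laptop']
```

For a real file, open it with open("ImageTags.tsv", newline="", encoding="utf-8") and pass the file object to parse_image_tags.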
Here are the images I used in this example
And here is the list of objects detected
In conclusion, we have created a pretty handy tool for automatic image tagging using Data Lake, with very little knowledge required of the background processes involved.
Note that there seems to be an image size limit: I had to resize all images to about 500 KB.
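To avoid finding out about the size limit only after a job fails, you can check your local images before uploading them. The sketch below lists the .jpg files in a folder that are larger than a threshold. The 500,000-byte default simply mirrors the size that worked for me and is not a documented limit. The function name oversized_images and the folder name "images" are placeholders.

```python
from pathlib import Path


def oversized_images(folder, limit_bytes=500_000, pattern="*.jpg"):
    """Return (file name, size in bytes) for images larger than limit_bytes."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # nothing to check if the folder does not exist
    too_big = []
    for path in sorted(folder.glob(pattern)):
        size = path.stat().st_size
        if size > limit_bytes:
            too_big.append((path.name, size))
    return too_big


if __name__ == "__main__":
    # "images" is a placeholder for your local folder of pictures.
    for name, size in oversized_images("images"):
        print(f"{name}: {size / 1000:.0f} KB, consider resizing before upload")
```

Anything the script reports can then be resized with the image editor of your choice before it goes into the Data Lake Store.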