Home
>
Data Science
>
Update A Tutorial Introduction to Google Vertex AI AutoML: Data Preparation

March 22, 2022 by Phu Nguyen

Update A Tutorial Introduction to Google Vertex AI AutoML: Data Preparation

A Tutorial Introduction to Google Vertex AI AutoML: Data Preparation is an article under the topic Data Science Many of you are most interested in today !! Today, let’s InApps.net learn A Tutorial Introduction to Google Vertex AI AutoML: Data Preparation in today’s post !

Google’s Vertex AI is a unified machine learning and deep learning platform that supports AutoML models and custom models. In this tutorial, we will train an image classification model to detect face masks with Vertex AI AutoML. For an introduction to Vertex AI, read this article I published last week at InApps.

To complete this tutorial, you need an active Google Cloud subscription and Google Cloud SDK installed on your workstation.

There are three steps involved in training this model: dataset creation, training, and inference.

Dataset creation involves uploading the images and labeling them. Since we are using AutoML, training needs minimal intervention. We don’t need to write code or perform steps like hyperparameter tuning. When the training is done, we can download the model for deployment in edge devices or host it for performing inference.

In the first part of this tutorial, we will focus on creating the dataset. For this tutorial, we will use the raw dataset of faces with mask and without mask created by Prajna Bhandary.

She used image augmentation techniques to generate 600+ images for each class.

While this is not the most comprehensive dataset, it makes a good choice for AutoML which can train models with a lesser number of images.

We will upload these images to Google Cloud Storage bucket with two folders — mask and no-mask. A CSV file with the path of each image and the label will be uploaded to the same bucket which becomes the input for Vertex AI.

Let’s create the Google Cloud Storage bucket.

<br /><br /><br />
BUCKET=j-mask-nomask<br /><br /><br />
REGION=EUROPE-WEST4

1 2	BUCKET=j–mask–nomask REGION=EUROPE–WEST4

Feel free to change the values to reflect your bucket name and the region. At the time of launch, Vertex AI AutoML is available only in US-CENTRAL1 (Iowa) and EUROPE-WEST4 (Netherlands) regions.

<br /><br /><br />
gsutil mb -l $REGION -c STANDARD gs://$BUCKET

1	gsutil mb –l $REGION –c STANDARD gs://$BUCKET

We will now start uploading the images to the above bucket.

Clone the GitHub repository on your local machine.

<br /><br /><br />
git clone https://github.com/prajnasb/observations.git

1	git clone https://github.com/prajnasb/observations.git

Navigate to the data directory and run the following commands:

<br /><br /><br />
gsutil cp -r with_mask gs://$BUCKET

1	gsutil cp –r with_mask gs://$BUCKET

<br /><br /><br />
gsutil cp -r without_mask gs://$BUCKET

1	gsutil cp –r without_mask gs://$BUCKET

To upload images simultaneously from both the directories, run the commands in two different terminal windows.

Check the Google Cloud Console and browse the folders.

Once the images are uploaded, we need to generate a CSV file with the path and label of each image.

We will run a simple BASH script for this task.

<br /><br /><br />
for filename in with_mask/*.jpg; do<br /><br /><br />
    [ -e “$filename” ] || continue<br /><br /><br />
    echo “gs://$BUCKET/$filename,mask” >> mask-ds.csv<br /><br /><br />
done

for filename in with_mask/*.jpg; do

[ –e “$filename” ] || continue

echo “gs://$BUCKET/$filename,mask” >> mask–ds.csv

done

This populates the file, mask-ds.csv with entries that looks like this:

<br /><br /><br />
gs://j-mask-nomask/with_mask/0-with-mask.jpg,mask<br /><br /><br />
gs://j-mask-nomask/with_mask/1-with-mask.jpg,mask<br /><br /><br />
gs://j-mask-nomask/with_mask/10-with-mask.jpg,mask<br /><br /><br />
gs://j-mask-nomask/with_mask/100-with-mask.jpg,mask

gs://j-mask-nomask/with_mask/0-with-mask.jpg,mask

gs://j-mask-nomask/with_mask/1-with-mask.jpg,mask

gs://j-mask-nomask/with_mask/10-with-mask.jpg,mask

gs://j-mask-nomask/with_mask/100-with-mask.jpg,mask

Let’s repeat this for the second folder to generate the path and label for no-mask.

<br /><br /><br />
for filename in without_mask/*.jpg; do<br /><br /><br />
    [ -e “$filename” ] || continue<br /><br /><br />
    echo “gs://$BUCKET/$filename,no-mask” >> mask-ds.csv<br /><br /><br />
done

for filename in without_mask/*.jpg; do

[ –e “$filename” ] || continue

echo “gs://$BUCKET/$filename,no-mask” >> mask–ds.csv

done

This will append lines to the CSV file with the path of images with no mask.

<br /><br /><br />
gs://j-mask-nomask/without_mask/0.jpg,no-mask<br /><br /><br />
gs://j-mask-nomask/without_mask/1.jpg,no-mask<br /><br /><br />
gs://j-mask-nomask/without_mask/10.jpg,no-mask<br /><br /><br />
gs://j-mask-nomask/without_mask/100.jpg,no-mask<br /><br /><br />
gs://j-mask-nomask/without_mask/101.jpg,no-mask

gs://j-mask-nomask/without_mask/0.jpg,no-mask

gs://j-mask-nomask/without_mask/1.jpg,no-mask

gs://j-mask-nomask/without_mask/10.jpg,no-mask

gs://j-mask-nomask/without_mask/100.jpg,no-mask

gs://j-mask-nomask/without_mask/101.jpg,no-mask

Finally, we need to upload the CSV file to the bucket.

<br /><br /><br />
gsutil cp mask-ds.csv gs://$BUCKET

1	gsutil cp mask–ds.csv gs://$BUCKET

The CSV file becomes the critical input to Vertex AI AutoML to create the final dataset.

Running the command, gsutil ls gs://$BUCKET confirms that the CSV file is successfully uploaded to Google Cloud Storage bucket.

With the data uploaded to cloud storage, let’s turn that into a Vertex AI dataset.

Access the Vertex AI Dashboard in the Google Cloud Console and enable the API. Choose the region and click on create dataset:

Give the dataset a name, choose image classification with a single label, and click on create:

In the next section, choose select import files from Cloud Storage:

Browse the Cloud Storage bucket and select the CSV file uploaded earlier, and click on continue:

The import process takes a few minutes. When it completes, you are taken to the next page that shows all of the images identified from the dataset, both labeled and unlabeled images:

You may see some warnings and errors during the import process due to duplicate images found by Vertex AI. They can be safely ignored.

We are now ready to kick off the training. Stay tuned for the next part of the tutorial for a walkthrough of the training and inference process.

Source: InApps.net

Rate this post

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise in software development to emerging tech trends like AI and IoT—my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.