One of my 2018 resolutions was to become a Google Cloud Certified Professional Cloud Architect. I managed to meet that goal only at the tail end of the year. During the last week of December, when most of my customers went on vacation, I was heads-down preparing for the exam.
I am quite familiar with the vendor certification processes and formats. Having cleared multiple certifications from Microsoft and Amazon, I was curious to see how Google structured the exam. The training and certification team at Google did a great job with the exam pattern, technology domains, and real-world enterprise scenarios. I thoroughly enjoyed the testing experience.
If you aspire to become a Google Cloud certified architect, here are some tips for you. They are based on my experience preparing for the exam and my observations from the actual test.
1. Understand the Concepts of the Hybrid Cloud
There is a lot of emphasis on connecting on-premises infrastructure to the Google Cloud Platform. You need to thoroughly understand the choices and tradeoffs of extending an enterprise data center to GCP.
Google, like its competitors, has multiple channels to connect on-prem resources to the cloud. Each channel has unique attributes that address specific enterprise scenarios. You need to know the pros and cons of using one service over another while implementing a hybrid strategy. Focus on the hybrid networking services offered by Google.
Cloud VPN securely connects on-prem resources to a GCP VPC over the public internet. It is the cheapest option available to customers to open a secure tunnel between their data center and the cloud.
Cloud Interconnect offers a dedicated, thick 10 Gbps pipe directly to a location where Google Cloud has a point of presence. This delivers unmatched connectivity but is expensive.
Direct Peering is a cheaper alternative to Cloud Interconnect that delivers better performance than a VPN. While it doesn’t have an SLA, Direct Peering lets customers connect directly to Google and cuts egress fees significantly.
2. Know How to Move Data to Google Cloud
Moving data to the cloud is an important step in the migration. Google offers multiple services to migrate data to GCP. You should be able to choose the right service given the business scenario.
Become familiar with the gsutil command line tool to perform basic operations on Google Cloud Storage. In many scenarios, this tool comes in handy to move a large number of files from local storage to the cloud. Understand how to parallelize uploads, configure security, and automate data movement with gsutil. It only makes sense to use this CLI when the data amounts to a few GBs. Consider other options when you have to upload terabytes or petabytes of data.
Cloud Storage Transfer Service is meant to migrate data from an online source such as Amazon S3, Azure Storage, or even an HTTP endpoint. Since Google doesn’t charge for ingress, this becomes an ideal choice to migrate large amounts of data from other cloud platforms or storage services.
Transfer Appliance is the cheapest and fastest option when you need to securely move terabytes or petabytes of data to GCP. Both Google and the customer team participate in the migration process.
If the customer needs to move large datasets directly to BigQuery, consider BigQuery Data Transfer Service, which automates data movement from SaaS applications to Google BigQuery on a scheduled, managed basis.
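To see why gsutil over a network link stops being practical at terabyte and petabyte scale, it helps to do the arithmetic. The sketch below is only a back-of-the-envelope estimate: the link speeds and data sizes are illustrative assumptions, not figures from the exam or from Google, and the function name transfer_days is hypothetical.

```python
# Rough transfer-time estimate: bytes * 8 / (bits per second), converted to days.
# All sizes and link speeds below are illustrative assumptions.

GB = 10**9
TB = 10**12
PB = 10**15
MBPS = 10**6   # megabits per second
GBPS = 10**9   # gigabits per second


def transfer_days(data_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move data_bytes over a link at the given utilization."""
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    seconds = (data_bytes * 8) / (link_bits_per_second * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    scenarios = [
        ("10 GB over 100 Mbps", 10 * GB, 100 * MBPS),
        ("10 TB over 100 Mbps", 10 * TB, 100 * MBPS),
        ("1 PB over 1 Gbps", 1 * PB, 1 * GBPS),
    ]
    for label, size, link in scenarios:
        print(f"{label}: {transfer_days(size, link):.2f} days")
```

Under these assumptions, a few gigabytes move in minutes, ten terabytes take more than a week on a 100 Mbps line, and a petabyte takes roughly three months even on a fully used 1 Gbps link, which is the kind of scenario where an offline option such as Transfer Appliance becomes attractive.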
3. Learn Google Cloud IAM Inside Out
Google Cloud Identity and Access Management (IAM) is a service to implement granular or fine-grained security policies. It’s a comprehensive framework to secure any Google Cloud resource.
Learn the key differences between user accounts and service accounts. If you are familiar with AWS IAM, service accounts are a lot like IAM roles for EC2 where instances assume the context of a role. In GCP, service accounts can be used by any application that needs fine-grained access to a cloud resource. You need a service account even to connect Compute Engine VMs to Cloud SQL instances.
Understand how permissions propagate within the IAM hierarchy. The permissions defined at the parent level are always inherited by the child resources.
Explore the use cases of using Google Groups vs individual user accounts when defining a policy.
Google Cloud Storage supports both IAM and ACL policies. IAM is preferred when you want to protect buckets while ACLs are great for securing individual objects stored in buckets. It’s important to understand the effective policies when both are active.
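The inheritance rule mentioned above can be pictured as a walk up the resource hierarchy that collects every role granted along the way. The following is a minimal sketch of that idea, not the IAM API: the Node class and effective_roles function are hypothetical names, members are matched literally (group membership is not expanded), and only allow-style bindings are modeled.

```python
from typing import Dict, Optional, Set


class Node:
    """A resource in the hierarchy (organization, folder, project, ...)."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.bindings: Dict[str, Set[str]] = {}  # role -> members

    def grant(self, role: str, member: str) -> None:
        self.bindings.setdefault(role, set()).add(member)


def effective_roles(node: Node, member: str) -> Set[str]:
    """Union of the roles granted to member on node and on all its ancestors."""
    roles: Set[str] = set()
    current: Optional[Node] = node
    while current is not None:
        for role, members in current.bindings.items():
            if member in members:
                roles.add(role)
        current = current.parent
    return roles


if __name__ == "__main__":
    # Hypothetical hierarchy and principals, for illustration only.
    org = Node("example-org")
    folder = Node("engineering", parent=org)
    project = Node("web-prod", parent=folder)

    org.grant("roles/viewer", "group:devs@example.com")
    project.grant("roles/storage.objectAdmin", "group:devs@example.com")

    print(sorted(effective_roles(project, "group:devs@example.com")))
    print(sorted(effective_roles(org, "group:devs@example.com")))
```

The point to take away is that a grant made high in the hierarchy cannot be narrowed by a child's allow policy: the child only ever adds to what it inherits.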
4. Choose the Right Storage and Database Offerings
GCP has unique object storage tiers that can deliver more value to customers at a cheaper price than the competition. The same is true of GCP's database and big data offerings.
You should know when to use regional, multi-regional, nearline, and coldline storage tiers when uploading and storing data in object storage. When you don’t need replication across regions, a regional storage bucket is the right choice. Nearline makes sense when data is accessed no more than about once a month. Coldline is ideal when the data is accessed no more than about once a year. Make sure you learn the concepts of object versioning and object lifecycle management, which help in automating the archival and deletion process.
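Lifecycle management is expressed as a set of rules, each pairing an action with a condition. As a rough illustration, the sketch below builds a configuration in the JSON shape that Cloud Storage lifecycle rules use (a top-level "rule" list with "action" and "condition" entries). The age thresholds and the helper name lifecycle_config are assumptions chosen for the example, not recommendations.

```python
import json
from typing import Any, Dict


def lifecycle_config(nearline_after_days: int, coldline_after_days: int,
                     delete_after_days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration that tiers objects down, then deletes them."""
    if not 0 < nearline_after_days < coldline_after_days < delete_after_days:
        raise ValueError("thresholds must be positive and strictly increasing")
    return {
        "rule": [
            {
                # Move objects to Nearline once they reach the first age threshold.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                # Move objects to Coldline once they reach the second threshold.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": coldline_after_days},
            },
            {
                # Delete objects once they reach the final threshold.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


if __name__ == "__main__":
    # Illustrative thresholds only.
    print(json.dumps(lifecycle_config(30, 90, 365), indent=2))
```

The printed JSON could be saved to a file and applied to a bucket with gsutil's lifecycle set command; check the current documentation for the exact rule fields before relying on it.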
Architects will have to choose among a variety of databases based on the use cases. Become familiar with the core concepts of Datastore, Cloud SQL, Cloud Spanner, Cloud Bigtable, and BigQuery. From an examination point of view, you can safely ignore Firebase and Firestore.
Datastore is great for web and mobile backends that need to store schema-less documents. But if the application needs extremely low-latency reads and writes at scale and compatibility with HBase, go for Bigtable. Cloud SQL offers compatibility with existing MySQL and PostgreSQL databases. Cloud Spanner is an expensive service that’s used only when you need a global database with transactional support. BigQuery is meant for storing and retrieving large datasets with support for ANSI SQL. It’s not a replacement for a NoSQL or RDBMS database server.
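One way to rehearse these distinctions is to write them down as a decision function. The sketch below is just my reading of the paragraph above turned into code: the Workload fields and the order of the checks are assumptions made for illustration, and real designs weigh cost, scale, and consistency in far more detail.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical requirement flags pulled from a case study."""
    analytics_over_large_datasets: bool = False
    global_transactions: bool = False
    mysql_or_postgres_compatible: bool = False
    hbase_or_very_low_latency: bool = False


def suggest_database(w: Workload) -> str:
    """Map requirement flags to a GCP database, following the rules of thumb above."""
    if w.analytics_over_large_datasets:
        return "BigQuery"
    if w.global_transactions:
        return "Cloud Spanner"
    if w.mysql_or_postgres_compatible:
        return "Cloud SQL"
    if w.hbase_or_very_low_latency:
        return "Cloud Bigtable"
    # Default: schema-less documents for web and mobile backends.
    return "Datastore"


if __name__ == "__main__":
    print(suggest_database(Workload(hbase_or_very_low_latency=True)))
    print(suggest_database(Workload(mysql_or_postgres_compatible=True)))
    print(suggest_database(Workload()))
```

Walking each case study's requirements through a checklist like this is a quick way to test whether you can justify one service over another.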
Cloud Dataproc can be a replacement for Apache Hadoop and Spark jobs running in an on-prem environment. Cloud Dataflow is used when you need to build data pipelines for streaming as well as batch processing scenarios. Cloud Pub/Sub is meant for ingesting high volumes of data which can be connected to a Dataflow pipeline to ultimately store the final output in BigQuery for analytics.
5. Get a Grip on the Enterprise Case Studies
Google did a great job with scenario-based questions aligned with enterprise case studies. These case studies are publicly available, which means that you can access them even before you register for the exam. Pay attention to detail when you read the case studies. The choice of words used in the requirements conveys a lot of intricate details about the design and architecture of the solution.
- The Dress4Win case study is a typical example of an enterprise considering cloud for development and test. The key takeaway is the replication of the existing environment in the cloud with minimal changes.
- TerramEarth is a classic connected vehicle / IoT use case with a lot of scope for designing data processing pipelines. Subtle hints and wordings used in the case study point to significant architectural choices and design decisions that influence choosing the right data platform services in GCP.
- Mountkirk Games is an example of a classic mobile gaming backend running in the cloud. It has tremendous scope for implementing a scalable backend combined with an analytics engine.
As a part of the preparation, print these case studies and highlight the keywords used in the technical and business requirements sections. Mapping them to the right services and tools of GCP will save you precious time during the exam.
Make sure you know the tradeoffs of the following choices:
- Google Compute Engine vs. Google App Engine
- Standard VMs vs. Preemptible VMs
- Local SSD vs. SSD Persistent Disks
- Global vs. Regional vs. Zonal resources
- Google App Engine vs. Google Kubernetes Engine
- HTTP Load Balancer vs. Network Load Balancer
- Primitive IAM Roles vs. predefined IAM Roles
- App Engine Standard vs. App Engine Flex
- Cloud Storage vs. BigQuery for Stackdriver Sink
Finally, don’t forget to take the practice exam. Good luck with the preparation!
Feature image via Pixabay.