Preparing for the Google Cloud Professional Data Engineer certification
What is the Google Cloud Data Engineer certification?
According to Google, a Google Cloud Certified – Professional Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems.
The Data Engineer also analyzes data to gain insight into business outcomes, builds statistical models to support decision-making, and creates machine learning models to automate and simplify key business processes.
A Google Cloud Certified – Professional Data Engineer has demonstrated in Google’s assessment their ability to:
- Build and maintain data structures and databases
- Design data processing systems
- Analyze data and enable machine learning
- Model business processes for analysis and optimization
- Design for reliability
- Visualize data and advocate policy
- Design for security and compliance
You can find the full exam guide here.
For me, this certification includes everything you need to know to work on data projects with (and even without) Google Cloud Platform. You will need to prove your knowledge and expertise on the following topics:
- Google data storage technologies (Cloud SQL, Cloud Spanner, BigQuery, Datastore, Bigtable, Cloud Storage)
- Migrating on-premises data pipelines and applications to Google Cloud
- Building data pipelines for batch and streaming data (Python, Java, Go, Apache Beam & Dataflow, Dataprep)
- Advanced SQL skills (BigQuery, Hive)
- Working with data on distributed systems (Hadoop, Compute Engine, Dataproc)
- Scaling data flows and leveraging real-time data streams (Dataflow, Pub/Sub)
- Securing your data projects (Managing access rights responsibly with IAM and VPC networks)
- Monitoring resource consumption and your project’s state using Stackdriver
- Machine learning (Theoretical background, Cloud ML, TensorFlow, Datalab – Jupyter notebooks)
- Working with unstructured data
As you can see, the range of topics covered is very broad, and the certification does go into technical detail. Make sure you are familiar with the basic coding needed for each GCP product.
Road plan to glory
Do you want to get yourself a cool badge like this one and prove to the world your knowledge of modern data analysis systems? Let me walk you through the resources I found useful while I was preparing for the exam in November 2018.
The type of training you need also depends on your background. Below are a few suggestions from a guy with a technical background, who worked for a little more than 10 years on data collection and data analysis projects, but with limited experience on Google Cloud (I’m talking about me by the way 😊).
There is a lot of high-quality study material online that you can use to familiarize yourself with the necessary topics. These are the ones I used, listed in the order I would suggest using them.
- Coursera Specialization – Data Engineering on Google Cloud Platform Specialization
This course is offered by Google’s cloud training team, which is responsible for developing, delivering and evaluating training that enables Google’s enterprise customers and partners to use its products and solutions effectively. You will need at least 3 weeks to go through all 5 courses of this specialization (the suggested completion time is 5 weeks, but if you are really devoted you can do it in 3). It covers most of the theoretical topics you need and also gives hands-on practice with access to Qwiklabs. Qwiklabs gives you temporary credentials to Google Cloud Platform, so you can complete the tasks on the real thing. Normally, you have to purchase Qwiklabs credits for every lab.
I was a bit disappointed by the quizzes at the end of each course, as they covered only the very basics.
- Udemy – GCP: Complete Google Data Engineer and Cloud Architect Guide
This is a much shorter course, but I think it includes a few points that are missing from Coursera’s specialization. Each course contains a lot of valuable information, but it’s much more concentrated, which might make it difficult to remember in one go. You should be able to complete this in a couple of weeks. This course also includes hands-on practice with quests from the Codelabs website, but you’ll need to create your own Google Cloud account (you’ll also get free credit worth $300, which is enough to cover the cost of the exercises). If you have already completed the Qwiklabs quests, I don’t think it’s worth spending too much time on the lab exercises. The quizzes at the end of each session are a bit better than the ones you will find on Coursera, but still don’t expect anything close to the final certification exam. Each section of the class also tells you which topics to focus on for each exam, as this course covers both the Data Engineer and the Cloud Architect exams (Spoiler: Most of the courses are required for the Data Engineer exam).
- LinuxAcademy – Google Cloud Certified Professional Data Engineer
This course skips a lot of the basics, so don’t start with it if you have no experience working with GCP or haven’t completed any of the other courses. It’s a pretty short course; you should be able to complete it within a week. You will receive a lot of concentrated information, so it’s not very easy to follow if you are not already familiar with everything it teaches. I found this course useful when I was trying to recap the most important points a few days before the exam. It also offers a few hands-on exercises, but I didn’t have enough time to try them out. I found the quizzes at the end of each section really useful. They are the most advanced quizzes I could find, and you can take them multiple times and still get different questions every time. They helped me identify a few points that were still not clear to me a few days before the exam. The final exam may also include questions on the 2 sample case studies (MJTelco and Flowlogistic) you will find on Google’s website, or on other similar case studies. This is the only course which covers the 2 sample case studies and helps you understand how these 2 fictional companies can migrate their infrastructure to Google Cloud.
- Professional data engineer sample case studies
During the exam for the Data Engineer Certification, some of the questions may refer you to a case study that describes a fictitious business and solution concept. These case studies are intended to provide additional context to help you choose your answers. It’s very useful to be familiar with the 2 sample case studies. Use the LinuxAcademy course to familiarize yourself. Also go through this video, which is a discussion deconstructing the Flowlogistic case study.
This exam covers a lot of different topics. It’s important to spend some time to “digest” everything you’ve learned. Work on mapping your thoughts, understanding the differences between each one of the tools, the use cases where each tool can come in handy and understanding how Google suggests using each tool. This is a very important step that will help you “own” everything you learned from the previous steps. I think that this is what helped me pass the exam on my first try.
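One way to do this kind of mapping is to keep a small lookup table of the tools and quiz yourself on it. The sketch below is only a study aid: it covers just the storage products listed earlier, the one-line summaries are deliberate simplifications, and the names `STORAGE_CHEAT_SHEET` and `products_by_model` are hypothetical, not part of any Google library.

```python
# Study aid only: simplified summaries of the storage products named in
# this post. The dictionary and function names are made up for illustration.
STORAGE_CHEAT_SHEET = {
    "Cloud SQL": {
        "model": "relational",
        "typical_use": "transactional apps on a managed MySQL/PostgreSQL instance",
    },
    "Cloud Spanner": {
        "model": "relational",
        "typical_use": "horizontally scalable, strongly consistent transactions",
    },
    "BigQuery": {
        "model": "analytical warehouse",
        "typical_use": "SQL analytics over very large datasets",
    },
    "Datastore": {
        "model": "document",
        "typical_use": "NoSQL storage for application entities",
    },
    "Bigtable": {
        "model": "wide-column",
        "typical_use": "high-throughput, low-latency reads and writes (time series, IoT)",
    },
    "Cloud Storage": {
        "model": "object",
        "typical_use": "files and unstructured data",
    },
}


def products_by_model(model):
    """Return the product names with the given data model, sorted by name."""
    return sorted(
        name for name, info in STORAGE_CHEAT_SHEET.items() if info["model"] == model
    )


if __name__ == "__main__":
    # Print the whole sheet as a quick revision list.
    for name, info in STORAGE_CHEAT_SHEET.items():
        print(f"{name}: {info['model']} - {info['typical_use']}")
```

You can extend the same table with the processing tools (Dataflow, Dataproc, Pub/Sub) as you work through the courses; writing the summaries yourself is the useful part.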
- Practice exam
Google offers a sample exam that you can take when you feel you are ready. The Data Engineer practice exam will familiarize you with the types of questions you may encounter on the certification exam and help you determine whether you are ready or need more preparation and/or experience. Successful completion of the practice exam does not guarantee you will pass the certification exam, as the actual exam is longer and covers a wider range of topics. Since this practice exam probably consists of questions which no longer appear in the certification exam, you might find questions referring to deprecated features.
- Book a time and date!
When you reach a reasonable level of confidence (it’s difficult to feel 100% ready), book a time and date to take the exam. You’ll be able to locate test centers near you and book an exam using this link.
Things to keep in mind
- Stay away from old training material
Google Cloud Platform is constantly evolving: new features are added and existing products are upgraded. If you find any training material or books which are more than 10-12 months old, there’s a good chance they will not be very accurate. When I started preparing for the exam in October 2018, I only used training material which was issued after April 2018. I was looking for training material which was issued after the last Google I/O event, where a lot of the new features are announced.
- Familiarize yourself with the GCP environment
It’s important to feel familiar with everything you see on GCP. Spend some time working with each of the options, poke around, try to complete small and simple quests. It will help you reach a good level of confidence and keep in mind small details which make a big difference during the exam.
- Spend some time going through Google’s documentation
This was not part of my training plan, but I do feel it’s important to go through the documentation available for each GCP product. There’s a lot of useful knowledge there that you will find helpful.
- Review case studies and solutions available by Google
There’s no better way to understand everything you see during the courses than seeing how it works in actual use cases. You can find a lot of use cases on Google’s website and on the solutions gallery website.
In addition to everything I mentioned above, I found the following links really helpful when I was working on my study plan:
- GCP Playlist for big data (YouTube)
- Spotify’s road to Google Cloud (Blog)
- Overview of Hadoop’s ecosystem
- Experiences from other candidates:
I hope this will help you advance your GCP skills and save you some time as you familiarize yourself with Google Cloud Platform!