Professional Machine Learning Engineer-Part1
Question #1 Topic 1
You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests. You want to store
the results for analytics and visualization. How should you configure the pipeline?
Correct Answer: C
Reference:
https://cloud.google.com/solutions/building-anomaly-detection-dataflow-bigqueryml-dlp
Selected Answer: A
Verified Answer
upvoted 1 times
Selected Answer: A
Selected Answer: A
Went with A
upvoted 1 times
Selected Answer: A
Dataflow is required
upvoted 1 times
Definitely A.
upvoted 1 times
Question #2 Topic 1
Your organization wants to make its internal shuttle service route more efficient. The shuttles currently stop at all pick-up points across the city
every 30 minutes between 7 am and 10 am. The development team has already built an application on Google Kubernetes Engine that requires
users to confirm their presence and shuttle station one day in advance. What approach should you take?
A. 1. Build a tree-based regression model that predicts how many passengers will be picked up at each shuttle station. 2. Dispatch an
appropriately sized shuttle and provide the map with the required stops based on the prediction.
B. 1. Build a tree-based classification model that predicts whether the shuttle should pick up passengers at each shuttle station. 2. Dispatch
an available shuttle and provide the map with the required stops based on the prediction.
C. 1. Define the optimal route as the shortest route that passes by all shuttle stations with confirmed attendance at the given time under
capacity constraints. 2. Dispatch an appropriately sized shuttle and indicate the required stops on the map.
D. 1. Build a reinforcement learning model with tree-based classification models that predict the presence of passengers at shuttle stops as
agents and a reward function around a distance-based metric. 2. Dispatch an appropriately sized shuttle and provide the map with the
Correct Answer: A
Selected Answer: C
C - Since we have the attendance list in advance, tree-based classification, regression, and reinforcement learning all sound useless in this case.
upvoted 2 times
Selected Answer: C
You do not need to predict how many people will be at each station, as the requirement mentions they have to register a day in advance.
upvoted 1 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
No need to predict the presences since they are already confirmed, best thing we can do is optimize the route
upvoted 3 times
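Option C reframes the task as route optimization over confirmed attendance rather than prediction. As a toy sketch (the stop names and coordinates are hypothetical, and a real system would add capacity constraints and a proper vehicle-routing solver), a greedy nearest-neighbor heuristic conveys the idea:

```python
import math

# Order the confirmed stops into a short route with a greedy
# nearest-neighbor heuristic (illustrative only).
stops = {"depot": (0, 0), "A": (2, 3), "B": (5, 1), "C": (1, 6)}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

route, current = ["depot"], "depot"
remaining = set(stops) - {"depot"}
while remaining:
    nearest = min(remaining, key=lambda s: dist(stops[current], stops[s]))
    route.append(nearest)
    remaining.discard(nearest)
    current = nearest

print(route)  # e.g. ['depot', 'A', 'C', 'B']
```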
Selected Answer: C
Answer is C
upvoted 1 times
Selected Answer: C
correct answer is C
upvoted 1 times
Selected Answer: C
Question #3 Topic 1
You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that
less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of
them converge. How should you resolve the class imbalance problem?
B. Use a convolutional neural network with max pooling and softmax activation.
C. Downsample the data with upweighting to create a sample with 10% positive examples.
D. Remove negative examples until the numbers of positive and negative examples are equal.
Correct Answer: B
Reference:
https://towardsdatascience.com/convolution-neural-networks-a-beginners-guide-implementing-a-mnist-hand-written-digit-8aa60330d022
ANS: C
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting
Downsampling (in this context) means training on a disproportionately low subset of the majority class examples.
upvoted 29 times
How many trainable weights does your model have? (The arithmetic below is correct.)
A. 501*256+257*128+2 = 161154
B. 500*256+256*128+128*2 = 161024
C. 501*256+257*128+128*2 = 161408
D. 500*256*0.25 + 256*128*0.25 + 128*2 = 40448
upvoted 8 times
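A quick sanity check of the per-option arithmetic, assuming the garbled factor in option D was a 0.25 dropout rate (note that dropout does not actually reduce the number of trainable weights):

```python
# Evaluate each option's expression; the 0.25 in D is an assumption
# reconstructed from the garbled source.
a = 501 * 256 + 257 * 128 + 2                       # 161154
b = 500 * 256 + 256 * 128 + 128 * 2                 # 161024
c = 501 * 256 + 257 * 128 + 128 * 2                 # 161408
d = 500 * 256 * 0.25 + 256 * 128 * 0.25 + 128 * 2   # 40448.0
print(a, b, c, d)
```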
upvoted 3 times
Selected Answer: C
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting
upvoted 1 times
Selected Answer: C
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
upvoted 1 times
Selected Answer: C
C. Downsample the data with upweighting to create a sample with 10% positive examples.
Dealing with class imbalance can be challenging for machine learning models. One common approach is to downsample the majority class, training on a disproportionately small subset of the negative examples so that positives make up about 10% of the sample. This is combined with upweighting, where the downsampled negative examples are given a higher weight in the loss function to compensate for their reduced frequency in the sample. This helps the model converge while keeping the loss calibrated to the original distribution, improving its ability to classify failure incidents.
upvoted 2 times
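A minimal pandas sketch of downsampling with upweighting, assuming a binary `failure` label with under 1% positives (the DataFrame and column names are hypothetical): subsample the majority class until positives reach about 10%, then upweight the kept negatives by the downsampling factor so the loss still reflects the original distribution.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor-reading dataset with <1% positive (failure) examples.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "reading": rng.normal(size=100_000),
    "failure": rng.random(100_000) < 0.005,
})

pos = df[df["failure"]]
neg = df[~df["failure"]]

# Keep 9 negatives per positive so positives make up ~10% of the sample.
neg_down = neg.sample(n=len(pos) * 9, random_state=42)
sample = pd.concat([pos, neg_down])

# Upweight the downsampled negatives by the downsampling factor; pass this
# column as `sample_weight` during training.
downsampling_factor = len(neg) / len(neg_down)
sample["weight"] = np.where(sample["failure"], 1.0, downsampling_factor)
```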
Selected Answer: C
C, because regardless of the model you use, you should always try to transform or adapt your dataset so that it is more balanced
upvoted 1 times
Selected Answer: C
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
upvoted 1 times
Selected Answer: D
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
upvoted 1 times
Selected Answer: C
Question #4 Topic 1
You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but
your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax.
You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and
processing requirements?
A. Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.
B. Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries
D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and
Correct Answer: B
ANS: A
https://cloud.google.com/data-fusion#section-1
- Data Fusion is a serverless approach that leverages the scalability and reliability of Google services like Dataproc, which means Data Fusion offers the best of data integration capabilities with a lower total cost of ownership.
- BigQuery is serverless and supports SQL.
- Dataproc is not serverless, you have to manage clusters.
- Cloud SQL is not serverless, you have to manage instances.
upvoted 11 times
Selected Answer: D
https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 9/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics
Selected Answer: D
I'll go with D.
upvoted 1 times
Went with D
upvoted 3 times
Selected Answer: B
It should be A.
upvoted 1 times
I was thinking B, but now I'm kind of confused that nobody voted for it
upvoted 3 times
Selected Answer: D
D is correct
upvoted 2 times
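For reference, a hedged sketch of the option-D flow with the google-cloud-bigquery Python client (the bucket, project, dataset, table, and column names are all hypothetical): load the raw files from Cloud Storage via BigQuery Load, then run the PySpark-style transformation as serverless BigQuery SQL.

```python
from google.cloud import bigquery

client = bigquery.Client()

# BigQuery Load: ingest raw CSVs from Cloud Storage into a staging table.
load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/*.csv",
    "my_project.my_dataset.raw_events",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
        skip_leading_rows=1,
    ),
)
load_job.result()  # wait for the load to complete

# Re-express the PySpark transformation as BigQuery SQL writing a new table.
client.query("""
    CREATE OR REPLACE TABLE my_project.my_dataset.features AS
    SELECT user_id,
           COUNT(*) AS event_count,
           AVG(amount) AS avg_amount
    FROM my_project.my_dataset.raw_events
    GROUP BY user_id
""").result()
```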
Question #5 Topic 1
You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to
administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras,
PyTorch, Theano, Scikit-learn, and custom libraries. What should you do?
A. Use the AI Platform custom containers feature to receive training jobs using any framework.
B. Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TF Job.
C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.
D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.
Correct Answer: D
the answer is A
upvoted 23 times
A, because AI Platform supports all the frameworks mentioned, and Kubeflow is not a managed service in GCP. https://cloud.google.com/ai-platform/training/docs/getting-started-pytorch
upvoted 9 times
Selected Answer: A
Chose A
upvoted 1 times
Selected Answer: A
Selected Answer: A
Went with A
upvoted 2 times
Selected Answer: A
It's A
upvoted 2 times
Here the question is about workload management, not about supported frameworks; Slurm is a managed solution for workloads
upvoted 1 times
Selected Answer: A
Now it's Vertex AI (instead of AI Platform), but it's the best solution, no need to do anything more complicated
upvoted 3 times
Selected Answer: A
A is correct
upvoted 2 times
the answer is A
upvoted 2 times
Question #6 Topic 1
You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to
classify whether an image contains your company's product. Expecting the release of new products in the near future, you configured a retraining
functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform's continuous evaluation service to
ensure that the models have high accuracy on your test dataset. What should you do?
A. Keep the original test dataset unchanged even if newer products are incorporated into retraining.
B. Extend your test dataset with images of the newer products when they are introduced to retraining.
C. Replace your test dataset with images of the newer products when they are introduced to retraining.
D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.
Correct Answer: C
A: Doesn't make sense; if you don't include the new products, the test set becomes useless.
C: The conventional products are also still necessary as test data.
D: I don't understand the need to wait until the threshold is exceeded.
upvoted 30 times
answer is B
upvoted 11 times
Selected Answer: B
My initial confusion with option B arose from the phrase "with images of the newer products when they are introduced to retraining." Initially, I
mistakenly interpreted it as recommending the use of the same images in both training and testing, which is incorrect. However, upon further
reflection, I realized that using the same product does not necessarily mean using identical images. Therefore, I now believe that option B is the
most suitable choice.
upvoted 1 times
A and C make no sense - you don't want to lose any of the performance on existing products.
D - Why would you wait for your performance to drop in the first place? That's a reactive rather than proactive approach.
The answer is B
upvoted 1 times
B for sure
upvoted 1 times
Selected Answer: B
A. Keep the original test dataset unchanged even if newer products are incorporated into retraining: this would not test on new products.
B. Extend your test dataset with images of the newer products when they are introduced to retraining (most voted): tests both old and new products. Great.
C. Replace your test dataset with images of the newer products when they are introduced to retraining: the old products still need to be tested; recognition of old products might change when new products are added in training. Not a good option.
D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold: why wait? No need.
upvoted 1 times
Went with B
upvoted 2 times
Selected Answer: B
You can't just replace the old product data with only the new products, unless you no longer sell the old products.
upvoted 2 times
Selected Answer: B
You need to correctly classify newer products, so you need the new training data ==> A is wrong;
You need to keep doing a good job on the older dataset; you can't just ignore it ==> C is wrong;
You know when you are introducing new products, so there is no need to wait for a drop in performance ==> D is wrong;
B is correct
upvoted 2 times
Selected Answer: D
The answer is between B and D, but the phrase "You also want to use AI Platform's continuous evaluation service" in the question makes me biased towards D; also, retraining is done when model performance is below a threshold, not whenever new data is introduced.
upvoted 4 times
Question #7 Topic 1
You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the
classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model
building, training, and hyperparameter tuning and serving. What should you do?
A. Configure AutoML Tables to perform the classification task.
B. Run a BigQuery ML task to perform logistic regression for the classification.
C. Use AI Platform Notebooks to run the classification model with pandas library.
D. Use AI Platform to run the classification model job configured for hyperparameter tuning.
Correct Answer: B
BigQuery ML supports supervised learning with the logistic regression model type.
Reference:
https://cloud.google.com/bigquery-ml/docs/logistic-regression-prediction
The question says 'over several structured datasets', meaning large/multiple datasets, and 'several times', meaning frequent use of the data. Though BigQuery ML is not an absolutely 'no code' solution, all it needs is a very simple SQL query to train an ML model, so 'B' could be the correct answer here; but the question also asks for hyperparameter tuning, which is not available in BigQuery ML, so the correct answer is 'A'.
upvoted 1 times
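For context on the "very simple SQL query" point: training a logistic regression in BigQuery ML is a single statement. A hedged sketch via the Python client (the dataset, table, and label-column names are hypothetical):

```python
from google.cloud import bigquery

client = bigquery.Client()

# One SQL statement trains a BigQuery ML logistic regression model.
client.query("""
    CREATE OR REPLACE MODEL my_dataset.classifier
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['label']) AS
    SELECT * FROM my_dataset.training_data
""").result()
```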
Selected Answer: A
A - AutoML is no code
upvoted 1 times
Selected Answer: A
requirement : No code
A. Configure AutoML Tables to perform the classification task. : No code
B. Run a BigQuery ML task to perform logistic regression for the classification. : coding LR model
C. Use AI Platform Notebooks to run the classification model with pandas library. : Notebooks include codes
D. Use AI Platform to run the classification model job configured for hyperparameter tuning: a job still requires writing the code it executes
upvoted 1 times
Selected Answer: A
Went with A
upvoted 1 times
Selected Answer: A
"without writing code" only A option complies with this statment , all other options requires writing code
upvoted 1 times
A is correct
upvoted 2 times
Dump the data into a table and do the work with clicks of a button; no coding needed.
upvoted 1 times
https://cloud.google.com/automl-tables/docs#docs
upvoted 2 times
Question #8 Topic 1
You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions
are served directly to users in an app in real time. Because different seasons and population increases impact the data relevance, you will retrain
the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the
predictive model?
A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.
B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.
C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler.
D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.
Correct Answer: A
Answer: A
A. Kubeflow Pipelines can form an end-to-end architecture (https://www.kubeflow.org/docs/components/pipelines/overview/pipelines-overview/)
and deploy models.
B. BigQuery ML can't offer an end-to-end architecture because it must use another tool, like AI Platform, for serving models at the end of the
process (https://cloud.google.com/bigquery-ml/docs/export-model-tutorial#online_deployment_and_serving).
C. Cloud Scheduler can trigger the first step in a pipeline, but then some orchestrator is needed to continue the remaining steps. Besides, having
Cloud Scheduler alone can't ensure failure handling during pipeline execution.
D. A Dataflow job can't deploy models, it must use AI Platform at the end instead.
upvoted 36 times
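To make the Kubeflow option concrete, here is a minimal sketch with the KFP v1 SDK (the container images, commands, and output file are hypothetical placeholders): a two-step pipeline that trains and then deploys, which a scheduler can run monthly.

```python
import kfp
from kfp import dsl

@dsl.pipeline(name="monthly-retrain",
              description="Train and deploy the delay-prediction model")
def monthly_retrain():
    train = dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/trainer:latest",
        command=["python", "train.py"],
        file_outputs={"model_path": "/tmp/model_path.txt"},
    )
    deploy = dsl.ContainerOp(
        name="deploy",
        image="gcr.io/my-project/deployer:latest",
        command=["python", "deploy.py"],
        arguments=["--model-path", train.outputs["model_path"]],
    )
    deploy.after(train)  # deploy only runs after training succeeds

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(monthly_retrain, "monthly_retrain.yaml")
```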
The answer is D; I found a similar explanation in this course. Open for discussion. I found B could also work, but the question asked for end-to-end, thus I chose D instead of B: https://www.coursera.org/lecture/ml-pipelines-google-cloud/what-is-cloud-composer-CuXTQ
upvoted 11 times
Selected Answer: A
Chose A
upvoted 1 times
Selected Answer: A
Req: retrain the model every month + Google-recommended best practices + end-to-end architecture.
A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model: supports all of the above.
B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery: why BigQuery ML when Vertex AI/Kubeflow can handle end to end? BigQuery ML plus a trigger only initiates the code run.
C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler: not recommended by Google for end-to-end ML.
D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model: not recommended by Google for end-to-end ML. What if the model fails? Who monitors the metrics?
upvoted 1 times
Selected Answer: A
Went with A
upvoted 1 times
Selected Answer: A
A: In this case, it would be a good fit, as you need to retrain your model every month, which can be automated with Kubeflow Pipelines. This makes it easier to manage the entire process, from training to deploying, in a streamlined and scalable manner.
upvoted 1 times
Selected Answer: A
A is correct
All the options get you to the required result, but only A follows the Google-recommended best practices
upvoted 1 times
Selected Answer: A
Community vote
upvoted 2 times
Selected Answer: A
A for me too. KF provides all the end2end tools to perform what is asked
upvoted 2 times
Kubeflow can handle all of those things, including deploying to a model endpoint for real-time serving.
upvoted 2 times
To automate this model-building process, you will orchestrate the pipeline using Kubeflow Pipelines, ‘a platform for building and deploying
portable, scalable machine learning (ML) workflows based on Docker containers.’
upvoted 6 times
https://cloud.google.com/architecture/ml-on-gcp-best-practices?hl=en#machine-learning-workflow-orchestration
upvoted 3 times
Question #9 Topic 1
You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on
the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize
computation costs and manual intervention while having version control for your code. What should you do?
A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.
B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code.
C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository.
D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.
Correct Answer: B
https://cloud.google.com/ai-platform/training/docs/training-jobs
ANS:C
At the heart of this architecture is Cloud Build, which executes builds on Google Cloud infrastructure. Cloud Build can import source from Cloud Source Repositories, GitHub, or Bitbucket,
and then execute a build to your specifications, and produce artifacts such as Docker containers or Python tar files.
upvoted 24 times
Should be C
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#cicd_architecture
upvoted 10 times
I mean, C is indeed the most logical, but I do not see anything relevant to the cost concern. Does anyone have an explanation?
upvoted 1 times
Selected Answer: C
Req: frequently rerun training + minimize computation costs + zero manual intervention + version control for your code.
A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job: no version control.
B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code: needs manual intervention to submit the code via the gcloud CLI.
C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository: yes; connects to GitHub-like version control, automated (zero manual intervention), can initiate upon code changes, and is reasonable on cost (not sure compared to the other options).
D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor: a sensor is too much, and it meets none of the requirements.
upvoted 1 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
Selected Answer: C
Community vote
upvoted 3 times
Question #10 Topic 1
Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data engineering team
already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses, 1,000 images with passports, and 1,000
images with credit cards. You now have to train a model with the following label map: ['drivers_license', 'passport', 'credit_card']. Which loss function should you use?
A. Categorical hinge
B. Binary cross-entropy
C. Categorical cross-entropy
D. Sparse categorical cross-entropy
Correct Answer: D
Use sparse_categorical_crossentropy. Examples for the above 3-class classification problem: [1], [2], [3]
Reference:
https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other
Answer is C
upvoted 19 times
answer is D
https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
upvoted 10 times
Selected Answer: C
I'd go with C. Categorical cross entropy is used when classes are mutually exclusive. If the number of classes was very high, then we could use
sparse categorical cross entropy.
upvoted 1 times
Selected Answer: D
Use sparse categorical crossentropy when your classes are mutually exclusive (e.g. when each sample belongs exactly to one class) and categorical
crossentropy when one sample can have multiple classes or labels are soft probabilities (like [0.5, 0.3, 0.2]).
upvoted 1 times
Selected Answer: C
C.
D is for integer labels instead of one-hot encoded vectors; in our question, the 'drivers_license', 'passport', 'credit_card' labels are one-hot encoded.
upvoted 1 times
Selected Answer: C
It depends on how the labels are encoded. If one-hot, use CCE. If it's a single integer representing the class, use SCCE (source: same as in the official (wrong) answer).
From the question it's not clear how the labels are encoded. But for just 3 classes there is no doubt it's better to go with one-hot encoding.
Memory restrictions or a huge number of classes might point to SCCE
upvoted 1 times
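A small runnable check of the distinction the thread is drawing (the logits are made up): both losses compute the same cross-entropy; the only difference is whether the labels arrive as integers (sparse) or as one-hot vectors.

```python
import numpy as np
import tensorflow as tf

# drivers_license=0, passport=1, credit_card=2
labels_int = np.array([0, 1, 2])
labels_onehot = tf.keras.utils.to_categorical(labels_int, num_classes=3)
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3],
                   [0.1, 0.4, 2.2]])

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Identical values, different label encodings.
print(float(scce(labels_int, logits)))
print(float(cce(labels_onehot, logits)))
```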
Selected Answer: D
You now HAVE TO train a model with the following label map: ['drivers_license', 'passport', 'credit_card'].
upvoted 2 times
Selected Answer: C
If you are wondering between C & D - think about what "sparse" means
It is used when dealing with hundreds of categories
upvoted 1 times
Selected Answer: D
Sparse categorical cross-entropy is used for multi-class classification problems where the labels are represented in a sparse matrix format. This is
not the case in this problem.
upvoted 2 times
Only 3 categories, with values being either T or F. They don't really need to be integer encoded, which is what differentiates sparse cross-entropy from categorical.
upvoted 1 times
Selected Answer: D
https://fmorenovr.medium.com/sparse-categorical-cross-entropy-vs-categorical-cross-entropy-ea01d0392d28
upvoted 1 times
sparse_categorical_crossentropy (scce) produces a category index of the most likely matching category.
upvoted 1 times
You are dealing with a multi-class classification problem where each image can belong to one of three classes: "driver's license," "passport," or
"credit card." Categorical cross-entropy is the appropriate loss function for multi-class classification tasks. It measures the dissimilarity between the
predicted class probabilities and the true class labels. It's designed to penalize larger errors in predicted probabilities and help the model converge
towards more accurate predictions.
upvoted 1 times
Selected Answer: C
it's C
upvoted 1 times
Categorical cross-entropy, as the model is trained with [1,0,0]/[0,1,0]/[0,0,1]-style labels, as given in the question
upvoted 1 times
Question #11 Topic 1
You are designing an ML recommendation model for shoppers on your company's ecommerce website. You will use Recommendations AI to build,
test, and deploy your system. How should you develop recommendations that increase revenue while following best practices?
A. Use the "Other Products You May Like" recommendation type to increase the click-through rate.
B. Use the "Frequently Bought Together" recommendation type to increase the shopping cart size for each order.
C. Import your user events and then your product catalog to make sure you have the highest quality event stream.
D. Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model.
Correct Answer: B
'Frequently bought together' recommendations aim to up-sell and cross-sell customers by providing product recommendations.
Reference:
https://rejoiner.com/resources/amazon-recommendations-secret-selling-online/
Answer should be B
https://cloud.google.com/recommendations-ai/docs/placements#rps
upvoted 19 times
This recommendation is useful when the user has indicated an intent to purchase a particular product (or list of products) already, and you are
looking to recommend complements (as opposed to substitutes). This recommendation is commonly displayed on the "add to cart" page, or on
the "shopping cart" or "registry" pages (for shopping cart expansion).
upvoted 7 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: C
https://cloud.google.com/recommendations-ai/docs/overview
upvoted 1 times
Selected Answer: B
Community vote
upvoted 2 times
I think the correct answer is B, because the "default optimization objective" for FBT is "revenue per order", whereas the "default optimization
objective" for OYML is "click-through rate".
https://cloud.google.com/retail/recommendations-ai/docs/placements#fbt
upvoted 4 times
Question #12 Topic 1
You are designing an architecture with a serverless ML system to enrich customer support tickets with informative metadata before they are
routed to a support agent. You need a set of models to predict ticket priority, predict ticket resolution time, and perform sentiment analysis to help
agents make strategic decisions when they process support requests. Tickets are not expected to have any domain-specific terms or jargon. Which endpoints should the Enrichment Cloud Functions call?
A. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Vision
B. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Natural Language
C. 1 = AI Platform, 2 = AI Platform, 3 = Cloud Natural Language API
D. 1 = Cloud Natural Language API, 2 = AI Platform, 3 = Cloud Vision API
Correct Answer: B
ANS: C
https://cloud.google.com/architecture/architecture-of-a-serverless-ml-model#architecture
The architecture has the following flow:
- A user writes a ticket to Firebase, which triggers a Cloud Function.
- The Cloud Function calls 3 different endpoints to enrich the ticket:
  - An AI Platform endpoint, where the function can predict the priority.
  - An AI Platform endpoint, where the function can predict the resolution time.
  - The Natural Language API to do sentiment analysis and word salience.
- For each reply, the Cloud Function updates the Firebase real-time database.
- The Cloud Function then creates a ticket in the helpdesk platform using the RESTful API.
upvoted 25 times
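As a sketch of the third endpoint in that flow, the pretrained Cloud Natural Language API scores sentiment with no custom training (the wrapper function is illustrative; the client calls are from the google-cloud-language library):

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

def ticket_sentiment(text: str) -> float:
    """Return a sentiment score in [-1, 1] for a support ticket."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.analyze_sentiment(request={"document": document})
    return response.document_sentiment.score
```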
The answer should be C. The tickets do not include specific terms, which means the model doesn't need to be custom built; thus, we can use the Cloud NLP API instead of AutoML NLP.
upvoted 16 times
Selected Answer: C
Req: serverless ML system + models to predict ticket priority, predict ticket resolution time, and perform sentiment analysis.
The proposed architecture has the following flow:
A. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Vision: no image data as input here, only text (NLP).
B. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Natural Language: only sentiment is needed for the 3rd endpoint, and no custom model is needed (https://cloud.google.com/natural-language/automl/docs/beginners-guide), so AutoML is not required.
C. 1 = AI Platform, 2 = AI Platform, 3 = Cloud Natural Language API: 1 for classification (priority: high/medium/low), 2 for ticket-time regression, 3 for sentiment analysis, for which the Cloud Natural Language API is enough.
D. 1 = Cloud Natural Language API, 2 = AI Platform, 3 = Cloud Vision API: no image data.
upvoted 2 times
Went with C
upvoted 1 times
ANS: C
Selected Answer: B
ANS: B, as you need to train custom regression models (AutoML); the NLP API is not going to be able to rank your priority and evaluate the time.
upvoted 1 times
Selected Answer: C
AI Platform (now Vertex AI) for both predictions, and the Natural Language API for sentiment analysis, since there are no specific terms (so no need to custom build something with AutoML). So C.
upvoted 2 times
Selected Answer: C
- By option elimination, A and D must be dropped; we have no vision tasks in this system.
- The answer is between B and C; the question stated "no specific domain terms or jargon", so the Natural Language API is preferred over AutoML, since there are no custom entities or custom training. So I vote for C.
upvoted 2 times
Selected Answer: C
Community vote
upvoted 4 times
Priority prediction is categorical. Resolution time is linear regression. Sentiment is an NLP problem.
upvoted 2 times
Question #13 Topic 1
You have trained a deep neural network model on Google Cloud. The model has low loss on the training data, but is performing worse on the
validation data. You want the model to be resilient to overfitting. Which strategy should you use when retraining the model?
A. Apply a dropout parameter of 0.2, and decrease the learning rate by a factor of 10.
B. Apply a L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10.
C. Run a hyperparameter tuning job on AI Platform to optimize for the L2 regularization and dropout parameters.
D. Run a hyperparameter tuning job on AI Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2.
Correct Answer: D
Should be C
https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
upvoted 24 times
Selected Answer: C
Voted C
upvoted 1 times
Selected Answer: C
A. Apply a dropout parameter of 0.2, and decrease the learning rate by a factor of 10: might or might not work, but it may not find the optimal parameter set, since it uses arbitrary fixed values.
B. Apply a L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10: might or might not work, but it may not find the optimal parameter set, since it uses arbitrary fixed values.
C. Run a hyperparameter tuning job on AI Platform to optimize for the L2 regularization and dropout parameters: L2 and dropout are regularization methods that would work. Let AI Platform find the optimal values for how strongly these parameters should regularize. Yes, this would work.
D. Run a hyperparameter tuning job on AI Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2: AI Platform would do, but adding neurons would make the network more complex, so we can eliminate this option.
upvoted 2 times
It should be C, as regularization (L1/L2), early stopping, and dropout are some of the ways deep learning handles overfitting. The other options have specific values which may or may not solve overfitting, depending on the specific use case.
upvoted 1 times
Selected Answer: C
Went with C
upvoted 2 times
Selected Answer: C
ANS: C
upvoted 2 times
Selected Answer: C
We don't know the optimum values for the parameters, so we need to run a hyperparameter tuning job; L2 regularization and dropout parameters
are great ways to avoid overfitting.
So C is the answer
upvoted 1 times
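To make option C concrete, a minimal local sketch with Keras Tuner (an assumption for illustration; on Google Cloud the equivalent search would run as an AI Platform / Vertex AI hyperparameter tuning job): the L2 strength and dropout rate are searched rather than hard-coded.

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # Tune the regularization knobs instead of guessing values like 0.4 or 0.2.
    l2 = hp.Float("l2", min_value=1e-5, max_value=1e-1, sampling="log")
    dropout = hp.Float("dropout", min_value=0.1, max_value=0.5)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            128, activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(l2)),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=20)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))  # hypothetical data
```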
Selected Answer: C
- By option elimination, C and D are better than A and B (more automated, scalable).
- Between C and D, C is better, as D's "increase the number of neurons by a factor of 2" will make matters worse and increase overfitting.
upvoted 1 times
Selected Answer: C
C for sure
upvoted 2 times
Selected Answer: C
Best practice is to let an AI Platform tool run the tuning to optimize hyperparameters. Why should I trust the values in answers A or B? Plus, L2 regularization and dropout are the way to go here.
upvoted 2 times
Community vote
upvoted 2 times
Selected Answer: C
Question #14 Topic 1
You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production model is required to keep up with market changes. Since being deployed to production, the model hasn't changed; however, the accuracy of the model has steadily declined.
What issue is most likely causing the steady decline in model accuracy?
A. Poor data quality
B. Lack of model retraining
C. Too few layers in the model for capturing information
D. Incorrect data split ratio during model training, evaluation, validation, and test
Correct Answer: D
The biggest issue with this website is that all the `correct answers` are wrong.
upvoted 13 times
Selected Answer: B
The market can be dynamic: sales trends, customer preferences, and even competitor strategies might evolve over time, but our model hasn't changed since deployment, so it can adapt to these changes only through retraining.
Degradation over time: without retraining to adapt to these changes, the model's predictions become less accurate as the real world diverges from the data it was trained on.
upvoted 1 times
Selected Answer: B
B because the environment is changing and the model only captures past performance
upvoted 1 times
Selected Answer: B
A. Poor data quality: model performance depends on the trained model only; a quality issue should be taken care of by the pipeline and would not affect the model enough to cause a performance slowdown over time.
B. Lack of model retraining: very obvious.
C. Too few layers in the model for capturing information: if so, the model would not have been deployed in the first place, due to low performance on unseen data.
D. Incorrect data split ratio during model training, evaluation, validation, and test: this was relevant only at training time, when the model was first deployed; we are way past that. Not the reason.
upvoted 2 times
Selected Answer: B
Went with B
upvoted 1 times
B is correct. The model needs to keep up with the market changes, implying that the underlying data distribution would be changing as well. Hence, retrain the model.
upvoted 1 times
Selected Answer: B
The questions says the model is required to keep up with market changes, hence retraining needed.
upvoted 1 times
Selected Answer: B
ANS: B
upvoted 1 times
Data distribution changes over time, and so should the model, so B is the correct answer
upvoted 1 times
B for sure
upvoted 3 times
Selected Answer: B
Community vote
upvoted 2 times
Selected Answer: B
B. Retraining is needed as the market is changing; that's how the model keeps its predictions accurate and up to date.
upvoted 2 times
Question #15 Topic 1
You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at a low latency. You
discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?
D. Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.
Correct Answer: B
Reference:
https://www.tensorflow.org/api_docs/python/tf/data/Dataset
Should be D
upvoted 19 times
My option is D.
Quote from the Google page: to construct a Dataset from data in memory, use tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices(). When input data is stored in a file (not in memory), in the recommended TFRecord format, you can use tf.data.TFRecordDataset().
Selected Answer: D
Why does this website show the wrong option as the answer? This is my observation across so many questions.
upvoted 2 times
D is correct
upvoted 1 times
D because:
tf.data.Dataset is for data in memory.
tf.data.TFRecordDataset is for data in non-memory storage.
upvoted 1 times
Went with D
upvoted 1 times
Selected Answer: D
Converting your data into TFRecord has many advantages, such as: More efficient storage: the TFRecord data can take up less space than the
original data; it can also be partitioned into multiple files. Fast I/O: the TFRecord format can be read with parallel I/O operations, which is useful for
TPUs or multiple hosts
upvoted 1 times
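A hedged sketch of option D with the tf.data API (the bucket path, feature names, and JPEG encoding are assumptions): serialize each image into a tf.train.Example, write the examples to TFRecord files in Cloud Storage, then stream them back for training.

```python
import tensorflow as tf

# Writing: one tf.train.Example per image.
def serialize_example(image_bytes: bytes, label: int) -> bytes:
    features = {
        "image": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=features))
    return example.SerializeToString()

# Reading: stream the TFRecords from Cloud Storage without loading
# everything into memory.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(record):
    example = tf.io.parse_single_example(record, feature_spec)
    image = tf.io.decode_jpeg(example["image"])  # assumes JPEG bytes
    return image, example["label"]

dataset = (
    tf.data.TFRecordDataset(tf.io.gfile.glob("gs://my-bucket/images-*.tfrecord"))
    .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```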
Selected Answer: D
my option is D
upvoted 1 times
ans: D
upvoted 1 times
For data in memory use tf.data.Dataset; for data in non-memory storage use tf.data.TFRecordDataset.
Since the data doesn't fit in memory, go with option D.
upvoted 1 times
Selected Answer: D
- By option elimination, A is the first option to be dropped; prefetch will use additional memory overhead to buffer images.
- The answer is among B, C, and D, but D is the best answer, as we save the huge image dataset on GCS and then load batches of data for training.
- B and C are not good, as they do not provide a solution for images that do not fit in memory.
upvoted 2 times
Question #16 Topic 1
You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model.
Your model's features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory
data on a daily basis. Which algorithms should you use to build the model?
A. Classification
B. Reinforcement Learning
C. Recurrent Neural Networks (RNN)
Correct Answer: B
Reference:
https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html
As Y2Data pointed out, your reasoning for choosing B does not make much sense.
Furthermore, Reinforcement Learning for this question does not make much sense to me. Reinforcement Learning is basically agent - task
problems. You give the agent a task i.e. get out of a maze and then through trial and error and many many iterations the agent learns the correct
way to perform the task. It is called Reinforcement because you ... well ... reinforce the agent, you reward the agent for correct choices and penalize
for incorrect choices. In RL you don't use much (or any) previous data, because the data is generated with each iteration, I think.
upvoted 7 times
go for C
https://www.akkio.com/post/deep-learning-vs-reinforcement-learning-key-differences-and-use-cases#:~:text=Reinforcement%20learning%20is%20particularly%20well,of%20reinforcement%20learning%20in%20action.
upvoted 1 times
Selected Answer: C
I'm not sure that 'daily basis' means it is a time series. It could mean updating the model daily.
But I'll follow collective intelligence.
upvoted 2 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: B
Reinforcement Learning(RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error
using feedback from its own actions and experiences.
upvoted 1 times
Selected Answer: C
ans: C
upvoted 1 times
Selected Answer: C
"algorithm to learn from new inventory data on a daily basis" = time series model , best option to deal with time series is forsure RNN , vote for C
upvoted 1 times
It's C.
upvoted 3 times
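For the RNN reading of option C, a minimal sketch (the 30-day window and 4-feature input are hypothetical): an LSTM maps a window of recent demand plus contextual features to next-day demand, and the model can be refit daily as new inventory data arrives.

```python
import tensorflow as tf

# Input: 30 timesteps x 4 features (e.g. demand, seasonality, region/location
# encodings); output: next-day demand. Shapes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 4)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(windows, next_day_demand) would run on each day's refreshed data.
```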
"You want the algorithm to learn from new inventory data on a daily basis". The implication is a feedback with reward or punishment, which can
optimise the mode. But, all other options can only practice prediction against new data rather than learning knowledge from new data
automatically.
upvoted 4 times
I'd use C (RNN) if we were predicting only based on historical demand (a time series). However, as we are also taking region, location, and seasonal popularity into consideration, it is not a pure time-series problem anymore.
upvoted 1 times
- source: https://builtin.com/data-science/recurrent-neural-networks-and-lstm
And *Reinforcement Learning* doesn't mean that the model will learn from new data (better explained by george_ognyanov).
upvoted 5 times
Question #17 Topic 1
You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (PII) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the PII is not accessible by unauthorized individuals?
A. Stream all files to Google Cloud, and then write the data to BigQuery. Periodically conduct a bulk scan of the table using the DLP API.
B. Stream all files to Google Cloud, and write batches of the data to BigQuery. While the data is being written to BigQuery, conduct a bulk scan of the data using the DLP API.
C. Create two buckets of data: Sensitive and Non-sensitive. Write all data to the Non-sensitive bucket. Periodically conduct a bulk scan of that
bucket using the DLP API, and move the sensitive data to the Sensitive bucket.
D. Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk
scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-Sensitive bucket.
Correct Answer: A
Should be D
https://cloud.google.com/architecture/automating-classification-of-data-uploaded-to-cloud-storage#building_the_quarantine_and_classification_pipeline
upvoted 25 times
It's easier than B, which would work for a real-time scenario - but would require loads more custom work to implement (things like batching,
segmentation, triggering).
A and C are 'reactive' / periodic, and so not appropriate for the given scenario.
upvoted 1 times
Selected Answer: D
Option B does not provide a clear separation between sensitive and non-sensitive data before it is written to BigQuery, which means that PII might
be exposed during the process.
But D offers a better level of security by writing all the data to a Quarantine bucket first. This way, the DLP API can scan and categorize the data into Sensitive or Non-sensitive buckets before it is further processed or stored. This ensures that PII is not accessible by unauthorized individuals, as the sensitive data is identified and separated from the non-sensitive data before any further actions are taken.
upvoted 1 times
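A minimal sketch of the DLP scan that gates the move out of the Quarantine bucket, using the google-cloud-dlp client (the project ID, chosen info types, and helper function are illustrative):

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()

def contains_pii(project_id: str, text: str) -> bool:
    """Inspect text and report whether any PII findings come back."""
    response = client.inspect_content(
        request={
            "parent": f"projects/{project_id}",
            "inspect_config": {
                "info_types": [{"name": "EMAIL_ADDRESS"},
                               {"name": "PHONE_NUMBER"}],
                "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
            },
            "item": {"value": text},
        }
    )
    return len(response.result.findings) > 0

# Files that return True move to the Sensitive bucket; the rest move to
# the Non-sensitive bucket.
```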
Selected Answer: D
A real-time prediction engine that streams files to Google Cloud, where PII must not be accessible by unauthorized individuals:
D
upvoted 1 times
Selected Answer: D
Went with D
upvoted 2 times
Selected Answer: B
B is real time
upvoted 1 times
Selected Answer: D
It's D
upvoted 1 times
A, C, and D do not apply to a real-time case; all three say that the scan is applied periodically.
Then it's B
upvoted 2 times
Selected Answer: B
D is the right answer: you can temporarily store the sensitive data in a Quarantine bucket with restricted access, then move the data to the respective buckets once the PII has been protected.
upvoted 1 times
Selected Answer: D
Selected Answer: B
The reason being "real-time" DLP scanning. Option A would scan all the data again and again. For the others, buckets etc. are overkill and an offline process.
upvoted 2 times
Selected Answer: A
Although both A and D are correct when scanning the data using DLP, here the question involves streaming data, and the best option in this particular case would be A.
Check this use case, Using Cloud DLP with BigQuery:
https://cloud.google.com/dlp/docs/dlp-bigquery
Also, the other use case, involving DLP with a quarantine bucket, is based on uploading files, not streaming:
https://cloud.google.com/architecture/automating-classification-of-data-uploaded-to-cloud-storage#building_the_quarantine_and_classification_pipeline
upvoted 1 times
Question #18 Topic 1
You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted marketing strategy. You
need to make predictions about user lifetime value (LTV) over the next 20 days so that marketing can be adjusted accordingly. The customer
dataset is in BigQuery, and you are preparing the tabular data for training with AutoML Tables. This data has a time signal that is spread across multiple columns. What should you do?
A. Manually combine all columns that contain a time signal into an array. AIlow AutoML to interpret this array appropriately. Choose an
automatic data split across the training, validation, and testing sets.
B. Submit the data for training without performing any manual transformations. AIlow AutoML to handle the appropriate transformations.
Choose an automatic data split across the training, validation, and testing sets.
C. Submit the data for training without performing any manual transformations, and indicate an appropriate column as the Time column. AIlow
AutoML to split your data based on the time signal provided, and reserve the more recent data for the validation and testing sets.
D. Submit the data for training without performing any manual transformations. Use the columns that have a time signal to manually split your
data. Ensure that the data in your validation set is from 30 days after the data in your training set, and that the data in your testing set is from 30 days after the data in your validation set.
Correct Answer: D
Should be D. As the time signal is spread across multiple columns, a manual split is required.
upvoted 21 times
https://cloud.google.com/automl-tables/docs/data-best-practices#time
upvoted 9 times
C
You use the Time column to tell AutoML Tables that time matters for your data; it is not randomly distributed over time. When you specify the Time
column, AutoML Tables uses the earliest 80% of the rows for training, the next 10% of rows for validation, and the latest 10% of rows for testing.
AutoML Tables treats each row as an independent and identically distributed training example; setting the Time column does not change this. The
Time column is used only to split the data set.
You must include a value for the Time column for every row in your dataset. Make sure that the Time column has enough distinct values, so that
the evaluation and test sets are non-empty. Usually, having at least 20 distinct values should be sufficient.
https://cloud.google.com/automl-tables/docs/prepare#time
upvoted 14 times
During schema review, you select this column as the Time column. (In the API, you use the timeColumnSpecId field.) This selection takes effect
only if you have not specified the data split column.
If you have a time-related column that you do not want to use to split your data, set the data type for that column to Timestamp but do not set
it as the Time column.
upvoted 2 times
Selected Answer: D
thinking that "spread across multiple columns" seems like "columns with redundant information," and considering how AutoML can deal with
correlated columns, I think option C is the best choice, with no need for a manual split.
However, "time information is not contained in a single column" is the same thing as "time signal that is spread across multiple columns." I agree
that D could be the best option.
Then, I tend to think that D is the best choice because the text could be more clearly expressed in redundant options.
upvoted 2 times
Selected Answer: C
Selected Answer: D
"data has a time signal that is spread across multiple columns" - I interpret as having > 1 timeseries column.
AutoML knows how to deal with a single column but not multiple
hence answer is D
upvoted 1 times
Selected Answer: C
Since AutoML is good enough to perform the splits, C appears to be the right answer. Moreover, time information across multiple columns, which requires a manual split as per option D, is different from the question's scenario, where the time signal spread across multiple columns can be hours, months, days, etc. If we can define the right time-signal column in AutoML, that's enough to split the data, picking the most recent data as test data and the earliest data as training data.
upvoted 1 times
Selected Answer: D
A Wrong. Even if the columns are combined into a 1D array (column), the time signal should still be indicated to AutoML. An automatic split cannot work,
since we need more than 20 days of history.
B Wrong. Without indicating the time signal to AutoML, data would leak (time leakage) into the training/validation/test sets.
C Wrong, but might be possible if the time signal were not spread across multiple columns.
D True, because a time signal spread across multiple columns requires manually splitting the data. Since we want to predict LTV over the next 20
days, it is necessary to have at least 20 days of history between the splits (30 seems okay, leaving a 10-day margin). Validating and testing on the last 2
months seems reasonable for marketing purposes (usually seasonal).
upvoted 2 times
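To make option D's manual split concrete, here is a minimal sketch (not from the exam) of a time-based split in pandas. The file name and the year/month/day columns are hypothetical stand-ins for a time signal spread across multiple columns, and the 30-day boundaries follow the wording of option D.

import pandas as pd

# Hypothetical export of the training table; year/month/day stand in for the
# time signal that is spread across multiple columns.
df = pd.read_csv("training_data.csv")
df["event_date"] = pd.to_datetime(df[["year", "month", "day"]])
df = df.sort_values("event_date")

cutoff = df["event_date"].max()
train = df[df["event_date"] <= cutoff - pd.Timedelta(days=60)]
valid_mask = ((df["event_date"] > cutoff - pd.Timedelta(days=60))
              & (df["event_date"] <= cutoff - pd.Timedelta(days=30)))
validation = df[valid_mask]
test = df[df["event_date"] > cutoff - pd.Timedelta(days=30)]  # most recent data is held out for testing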
Selected Answer: D
Selected Answer: C
As far as I understand, AutoML Tables can handle a time-signal column fully automatically. Thus, I went with C.
upvoted 1 times
Selected Answer: D
Went with D
upvoted 1 times
This approach ensures that AutoML can handle the time-based nature of the data properly. By providing the Time column, AutoML can
automatically split the data in a way that respects the time-based structure, using more recent data for validation and testing. This approach is
especially important for time-series data, as it helps prevent leakage of future information into the training set, ensuring a more accurate and
reliable model.
upvoted 1 times
https://cloud.google.com/automl-tables/docs/data-best-practices#time
- If the time information is not contained in a single column, you can use a manual data split to use the most recent data as the test data, and the
earliest data as the training data.
upvoted 1 times
Selected Answer: D
I go with D: https://cloud.google.com/automl-tables/docs/data-best-practices#time
Read the last paragraph of the topic carefully: if the time information is not contained in a single column, you can use a manual data split to
use the most recent data as the test data, and the earliest data as the training data.
upvoted 1 times
Selected Answer: D
D
https://cloud.google.com/automl-tables/docs/data-best-practices
upvoted 2 times
Selected Answer: D
Automatic splitting is wrong for time series; you need to split the data into older and newer portions, so A and B are wrong.
Since the time info is spread across multiple columns, we can't use the single Time column option provided by C, so we need to go with D.
upvoted 1 times
You have written unit tests for a Kubeflow Pipeline that require custom libraries. You want to automate the execution of unit tests with each new
push to your development branch in Cloud Source Repositories. What should you do?
A. Write a script that sequentially performs the push to your development branch and executes the unit tests on Cloud Run.
B. Using Cloud Build, set an automated trigger to execute the unit tests when changes are pushed to your development branch.
C. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Configure a Pub/Sub trigger for
D. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Execute the unit tests using a
Cloud Function that is triggered when messages are sent to the Pub/Sub topic.
Correct Answer: B
B. GCP recommends using Cloud Build when building Kubeflow Pipelines. It's possible to run unit tests in Cloud Build, and the others seem
overly complex/unnecessary.
upvoted 16 times
Selected Answer: B
Selected Answer: B
Cloud Build is the best choice but the other answers are feasible.
upvoted 1 times
Went with B
upvoted 1 times
Selected Answer: B
ans: B
upvoted 1 times
Selected Answer: B
B it is.
upvoted 2 times
You are training an LSTM-based model on AI Platform to summarize text using the following job submission script: gcloud ai-platform jobs submit
training $JOB_NAME \
--package-path $TRAINER_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--job-dir $JOB_DIR \
--region $REGION \
--scale-tier basic \
-- \
--epochs 20 \
--batch_size=32 \
--learning_rate=0.001 \
You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?
Correct Answer: C
B. Changing the scale tier does not impact model accuracy; it only speeds up training. Epochs, batch size, and learning rate are all hyperparameters
that might impact model accuracy.
upvoted 28 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
B is correct, cost is not a concern as it is not mentioned in the question, the scale tier can be upgraded to significantly minimize the training time.
C is incorrect, wouldn’t affect training time, but would affect model performance.
D is incorrect, the model might converge faster with higher learning rate, but this would affect the training routine and might cause exploding
gradients.
upvoted 1 times
Selected Answer: B
It's B!
upvoted 1 times
Selected Answer: B
A, C, D are all about hyperparameters that might impact model accuracy, while B is just about computing speed; so upgrading the scale tier will
make the model faster with no chance of reducing accuracy.
upvoted 2 times
Selected Answer: B
- Using option elimination: all options except B can harm accuracy.
upvoted 3 times
B for sure.
upvoted 2 times
As others have said, B is the only way to improve the training time of the model.
upvoted 3 times
You have deployed multiple versions of an image classification model on AI Platform. You want to monitor the performance of the model versions
over time. How should you perform this comparison?
B. Compare the loss performance for each model on the validation data.
C. Compare the receiver operating characteristic (ROC) curve for each model using the What-If Tool.
D. Compare the mean average precision across the models using the Continuous Evaluation feature.
Correct Answer: B
Answer is D
upvoted 13 times
D is correct. Choosing the feature/capability GCP provides is always a good bet. :)
upvoted 6 times
Selected Answer: D
Selected Answer: D
Selected Answer: D
I chose D myself, but after reading this post, https://www.v7labs.com/blog/mean-average-precision, I was not sure about D.
It says mAP is commonly used for object detection or instance segmentation tasks.
Validation dataset in the GCP context: data not used for training and not previously seen.
upvoted 1 times
Selected Answer: D
D. Compare the mean average precision across the models using the Continuous Evaluation feature
https://cloud.google.com/vertex-ai/docs/evaluation/introduction
Vertex AI provides model evaluation metrics, such as precision and recall, to help you determine the performance of your models...
Vertex AI supports evaluation of the following model types:
AuPRC: The area under the precision-recall (PR) curve, also referred to as average precision. This value ranges from zero to one, where a higher
value indicates a higher-quality model.
upvoted 1 times
Selected Answer: D
Went with D
upvoted 1 times
To monitor the performance of the model versions over time, you should compare the loss performance for each model on the validation data.
Therefore, option B is the correct answer.
upvoted 1 times
The best option to monitor the performance of multiple versions of an image classification model on AI Platform over time is to compare the loss
performance for each model on the validation data.
Option B is the best approach because comparing the loss performance of each model on the validation data is a common method to monitor
machine learning model performance over time. The validation data is a subset of the data that is not used for model training, but is used to
evaluate its performance during training and to compare different versions of the model. By comparing the loss performance of each model on the
same validation data, you can determine which version of the model has better performance.
upvoted 4 times
Selected Answer: D
If you have multiple model versions in a single model and have created an evaluation job for each one, you can view a chart comparing the mean
average precision of the model versions over time
upvoted 1 times
I think choosing the loss to compare model performance is better than looking at metrics. For example, you can build an image classification model
that has good precision metrics because the classes are unbalanced, but the loss could be terrible because of the kind of loss chosen to penalize
classes.
So, losses are better than metrics for evaluating models, and the answer is A or B.
I thought A could be the answer because I see validation as part of the training process. So, if we want to test the model performance
over time, we have to use new data, which I suppose to be the held-out data.
upvoted 3 times
ans: D
upvoted 1 times
Since you want to monitor the performance of the model versions *over time*, use the Continuous Evaluation feature, so D
upvoted 1 times
Selected Answer: D
You trained a text classification model. You have the following SignatureDefs:
You started a TensorFlow-serving component server and tried to send an HTTP request to get a prediction using: headers = {"content-type": "application/json"}
B. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c', 'd', 'e', 'f']]})
C. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c'], ['d', 'e', 'f']]})
D. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b'], ['c', 'd'], ['e', 'f']]})
Correct Answer: C
Most likely D. A negative number in the shape enables auto-expansion (https://stackoverflow.com/questions/37956197/what-is-the-negative-index-in-shape-arrays-used-for-tensorflow).
In the shape (-1, 2), the first number (-1) gives the number of 1-dimensional arrays within the tensor (and it can auto-expand), while the
second number (2) fixes the number of elements in each inner array at 2. Hence D.
upvoted 19 times
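For reference, a minimal sketch of sending option D's payload to a TensorFlow Serving REST endpoint; the host, port, and model name are placeholders, not from the question.

import json
import requests

headers = {"content-type": "application/json"}
data = json.dumps({
    "signature_name": "serving_default",
    # three rows of 2 elements each, matching the (-1, 2) input shape
    "instances": [["a", "b"], ["c", "d"], ["e", "f"]],
})
# hypothetical local server and model name
response = requests.post("http://localhost:8501/v1/models/text_model:predict",
                         data=data, headers=headers)
print(response.json())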
Selected Answer: D
Went with D
upvoted 2 times
Selected Answer: D
ans: D
upvoted 1 times
Selected Answer: D
Having "shape=[-1,2]", the input can have as many rows as we want, but each row needs to be of 2 elements. The only option satisfying this
requirement is D.
upvoted 1 times
Selected Answer: D
Will vote for D, as the data shape in instances matches the shape in the SignatureDef.
upvoted 1 times
Your organization's call center has asked you to develop a model that analyzes customer sentiments in each call. The call center receives over one
million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region in which the call originated, and no Personally
Identifiable Information (PII) can be stored or analyzed. The data science team has a third-party tool for visualization and access which requires a
SQL ANSI-2011 compliant interface. You need to select components for data processing and for analytics. How should the data pipeline be
designed?
A. 1= Dataflow, 2= BigQuery
B. 1 = Pub/Sub, 2= Datastore
Correct Answer: B
Should be A
upvoted 7 times
Selected Answer: A
Selected Answer: A
Went with A
upvoted 2 times
Selected Answer: A
correct answer is A
upvoted 1 times
We need Dataflow to process the data from Cloud Storage, and the data is unstructured; if we want to perform analysis on unstructured data with a SQL
interface, BigQuery is the only option.
Selected Answer: A
You need to do analytics, so the answer needs to contain BigQuery, and only option A does.
Moreover, BigQuery is fine with SQL, and Dataflow is the right tool for the processing pipeline.
upvoted 1 times
Selected Answer: A
Selected Answer: A
You are an ML engineer at a global shoe store. You manage the ML models for the company's website. You are asked to build a model that will
recommend new products to the user based on their purchase behavior and similarity with other users. What should you do?
Correct Answer: C
Reference:
https://cloud.google.com/solutions/recommendations-using-machine-learning-on-compute-engine
C. Collaborative filtering is about user similarity and product recommendations. Other models won't work
upvoted 19 times
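As a toy illustration (not from the exam) of the idea behind collaborative filtering, the sketch below recommends the item purchased by the most similar user, using cosine similarity over a synthetic user-item purchase matrix.

import numpy as np

# rows = users, cols = products; 1 = purchased (synthetic data)
purchases = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

def recommend(user, k=1):
    # cosine similarity between this user and every other user
    norms = np.linalg.norm(purchases, axis=1) * np.linalg.norm(purchases[user])
    sims = purchases @ purchases[user] / np.maximum(norms, 1e-9)
    sims[user] = -1                                      # exclude the user themselves
    neighbor = np.argmax(sims)                           # most similar user
    candidates = purchases[neighbor] - purchases[user]   # items they bought, we didn't
    return np.argsort(candidates)[::-1][:k]

print(recommend(0))  # user 0 gets product 2, purchased by the similar user 1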
Selected Answer: C
ChatGPT:
Collaborative filtering models are specifically designed for recommendation systems. They work by analyzing the interactions and behaviors of
users and items, then making predictions about what users will like based on similarities with other users. In this case, since you're looking at
purchase behavior and user similarities, a collaborative filtering approach is well-suited to identify and recommend products that users with similar
behaviors have liked or purchased.
Classification models (Option A) and regression models (Option D) are generally used for different types of predictive modeling tasks, not
specifically for recommendations. A knowledge-based filtering model (Option B), while useful in recommendation systems, relies more on explicit
knowledge about users and items, rather than on user interaction patterns and similarities, which seems to be the focus in this scenario.
upvoted 1 times
Went with C
upvoted 2 times
Selected Answer: C
ans: C
upvoted 1 times
Selected Answer: C
C
https://cloud.google.com/blog/topics/developers-practitioners/looking-build-recommendation-system-google-cloud-leverage-following-guidelines-identify-right-solution-you-part-i
upvoted 1 times
Selected Answer: C
Selected Answer: C
Selected Answer: C
https://developers.google.com/machine-learning/recommendation/collaborative/basics
upvoted 1 times
Definitely C
upvoted 2 times
Community vote
upvoted 2 times
You work for a social media company. You need to detect whether posted images contain cars. Each training example is a member of exactly one
class. You have trained an object detection neural network and deployed the model version to AI Platform Prediction for evaluation. Before
deployment, you created an evaluation job and attached it to the AI Platform Prediction model version. You notice that the precision is lower than
your business requirements allow. How should you adjust the model's final layer softmax threshold to increase precision?
Correct Answer: D
Answer: B ("improving precision typically reduces recall and vice versa", https://developers.google.com/machine-learning/crash-
course/classification/precision-and-recall)
upvoted 26 times
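A small synthetic illustration (not from the exam) of that trade-off: raising the decision threshold increases precision while lowering recall.

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.95, 0.4, 0.85, 0.8, 0.6, 0.3, 0.55, 0.5])

def precision_recall(threshold):
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(0.5))   # lower threshold: precision ~0.67, recall 1.0
print(precision_recall(0.75))  # higher threshold: precision 1.0, recall 0.75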
Selected Answer: B
To increase precision, you have to decrease recall: raise the threshold so that false positives decrease, at the cost of more false negatives.
upvoted 2 times
Went with B
upvoted 3 times
Option B is the best approach because raising the threshold (decreasing recall) will increase the precision by reducing the number of false positives.
upvoted 1 times
Selected Answer: B
upvoted 1 times
Precision and recall are negatively correlated: when one goes up, the other goes down, and vice versa. To increase precision we need to decrease
recall, therefore answer B.
(To be more complete, answer C and D are wrong because they both would increase recall, according to the recall formula)
upvoted 2 times
Selected Answer: C
Selected Answer: B
Precision and recall are inversely related, so to increase precision, reduce recall.
upvoted 1 times
Selected Answer: B
It's B.
C and D would basically ruin your model.
upvoted 1 times
Selected Answer: B
definitely B
upvoted 1 times
Choices A and B are not really right, as precision and recall are after-effects, not something you will control ahead.
upvoted 1 times
Selected Answer: B
You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data
quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary
solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some
members on your team prefer a codeless interface for building Extract, Transform, Load (ETL) process. Which service should you use?
A. Dataflow
B. Dataprep
C. Apache Flink
Correct Answer: D
D. correct.
Reference: https://cloud.google.com/data-fusion
upvoted 14 times
Selected Answer: D
D is correct
upvoted 1 times
Selected Answer: D
I think D is correct.
upvoted 1 times
Selected Answer: D
Went with D
upvoted 1 times
Selected Answer: B
Answer is B
upvoted 1 times
Cloud Data Fusion is a fully managed, cloud-native data integration service provided by Google Cloud Platform. It is designed to simplify the
process of building and managing ETL pipelines across a variety of data sources and targets.
upvoted 2 times
Selected Answer: D
Selected Answer: D
Selected Answer: D
D is correct as it is codeless
upvoted 1 times
https://cloud.google.com/data-fusion/docs/concepts/overview#using_the_code-free_web_ui
upvoted 1 times
Visual point-and-click interface enabling code-free deployment of ETL/ELT data pipelines, and operation of high volumes of data pipelines periodically.
Source: https://cloud.google.com/data-fusion#all-features
upvoted 4 times
You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects
insurance applications from potential customers. What factors should you consider before building the model?
Correct Answer: A
I think the answer should be B. Reviewing the OECD document on the impact of AI on insurance, the document mentions explainability and traceability.
However, open for discussion. https://www.oecd.org/finance/Impact-Big-Data-AI-in-the-Insurance-Sector.pdf
upvoted 32 times
Selected Answer: B
went with B
upvoted 2 times
Selected Answer: B
Traceability: This involves maintaining records of the data, decisions, and processes used in the model. This is crucial in regulated industries for
audit purposes and to ensure compliance with regulatory standards. It helps in understanding how the model was developed and how it makes
decisions.
Reproducibility: Ensuring that the results of the model can be reproduced using the same data and methods is vital for validating the model's
reliability and for future development or debugging.
Explainability: Given the significant impact of the model’s decisions on individuals' lives, it's crucial that the model's decisions can be explained in
understandable terms. This is not just a best practice in AI ethics; in many jurisdictions, it's a legal requirement under regulations that mandate
transparency in automated decision-making.
upvoted 2 times
B. Traceability, reproducibility, and explainability are the most important factors to consider before building an insurance approval model.
Traceability ensures that the data used in the model is reliable and can be traced back to its source.
Reproducibility ensures that the model can be replicated and tested to ensure its accuracy and fairness.
Explainability ensures that the model's decisions can be explained to customers and regulators in a transparent manner. These factors are crucial
for building a trustworthy and compliant model for an insurance company.
Redaction is also important for protecting sensitive customer information, but it is not as critical as the other factors listed. Federated learning and
differential privacy are techniques used to protect data privacy, but they are not necessarily required for building an insurance approval model.
upvoted 4 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
When developing an insurance approval model, it's crucial to consider several factors to ensure that the model is fair, accurate, and compliant with
regulations. The factors to consider include:
Traceability: It's important to be able to trace the data used to build the model and the decisions made by the model. This is important for
transparency and accountability.
Reproducibility: The model should be built in a way that allows for its reproducibility. This means that other researchers should be able to
reproduce the same results using the same data and methods.
Explainability: The model should be able to provide clear and understandable explanations for its decisions. This is important for building trust with
customers and ensuring compliance with regulations.
Other factors that may also be important to consider, depending on the specific context of the insurance company and its customers, include data
privacy and security, fairness, and bias mitigation.
upvoted 4 times
Selected Answer: D
ans: B
upvoted 1 times
Selected Answer: B
Selected Answer: D
Selected Answer: B
I don't understand why you are thinking of the privacy issue; it is not mentioned here, nor relevant, IMO. Moreover, traceability is key. For me, B.
upvoted 3 times
You are training a ResNet model on AI Platform using TPUs to visually categorize types of defects in automobile engines. You capture the training
profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to reduce the bottleneck and speed up your model training process.
Which modifications should you make to the tf.data dataset? (Choose two.)
Correct Answer: AE
A. Use the interleave option for reading data. - Yes, that helps to parallelize data reading.
B. Reduce the value of the repeat parameter. - No, this only controls how many times the dataset is repeated.
C. Increase the buffer size for the shuffle option. - No, this only affects how data is shuffled, not the input bottleneck.
D. Set the prefetch option equal to the training batch size. - Yes, this will pre-load the data.
E. Decrease the batch size argument in your transformation. - No, it could even be slower due to more I/Os.
https://www.tensorflow.org/guide/data_performance
upvoted 24 times
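A hedged sketch of what options A and D look like in tf.data; the GCS path is a placeholder. Option D literally sets the prefetch buffer to the batch size; tf.data.AUTOTUNE is shown here as the commonly recommended alternative.

import tensorflow as tf

BATCH_SIZE = 64
files = tf.data.Dataset.list_files("gs://my-bucket/train-*.tfrecord")  # placeholder path

dataset = files.interleave(            # option A: read several files in parallel
    tf.data.TFRecordDataset,
    cycle_length=4,
    num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(BATCH_SIZE)
dataset = dataset.prefetch(tf.data.AUTOTUNE)  # option D: overlap the input pipeline with training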
Selected Answer: AD
Selected Answer: AD
Selected Answer: AD
yes AD
upvoted 1 times
Selected Answer: AD
Yes AD
upvoted 1 times
Selected Answer: AD
By the way, this is handy to understand the significance of shuffle buffer_size: https://stackoverflow.com/a/48096625/1933315
upvoted 2 times
Selected Answer: DE
I'm not 100% sure on A, personally I don't think processing many input files concurrently would help in this case because the reading operation is
precisely the problem. However, I'm no expert in this topic so I might be wrong.
upvoted 2 times
You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same
preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you
use?
A. Validate the accuracy of the model that you trained on preprocessed data. Create a new model that uses the raw data and is available in
real time. Deploy the new model onto AI Platform for online prediction.
B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI
Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.
C. Stream incoming prediction request data into Cloud Spanner. Create a view to abstract your preprocessing logic. Query the view every
second for new records. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub
queue.
D. Send incoming prediction requests to a Pub/Sub topic. Set up a Cloud Function that is triggered when messages are published to the
Pub/Sub topic. Implement your preprocessing logic in the Cloud Function. Submit a prediction request to AI Platform using the transformed
data. Write the predictions to an outbound Pub/Sub queue.
Correct Answer: D
Reference:
https://cloud.google.com/pubsub/docs/publisher
Supporting B: https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1#where_to_do_preprocessing
upvoted 28 times
I think it should be B.
upvoted 13 times
Selected Answer: B
Went with B; using Dataflow for large-scale data transformation is the best option.
upvoted 2 times
I went with B.
A is completely wrong. C: first, Cloud Spanner is not designed for high throughput, and it is not for preprocessing. D: a Cloud Function could not get
enough resources to do the computationally expensive transformation.
upvoted 2 times
Because the concern here is high throughput and not specifically latency, it is better to go with option B.
upvoted 1 times
B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI
Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue
https://dataintegration.info/building-streaming-data-pipelines-on-google-cloud
upvoted 1 times
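A hedged sketch of option B using the Apache Beam Python SDK; topic names and the preprocessing function are placeholders. For brevity, the prediction call is represented by publishing the transformed records to a downstream topic.

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def preprocess(record):
    # stand-in for the computationally expensive preprocessing
    record["features"] = [f * 2 for f in record["features"]]
    return record

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "ReadRequests" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/requests")
     | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
     | "Preprocess" >> beam.Map(preprocess)
     | "Serialize" >> beam.Map(lambda rec: json.dumps(rec).encode("utf-8"))
     | "HandOff" >> beam.io.WriteToPubSub(topic="projects/my-project/topics/preprocessed"))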
Went with B
upvoted 1 times
Selected Answer: D
I think it's D as B is not a good choice because it requires you to run a Dataflow job for each prediction request. This is inefficient and can lead to
latency issues.
upvoted 2 times
Selected Answer: B
It's B
upvoted 1 times
Selected Answer: B
yes ans B
upvoted 1 times
Selected Answer: B
B
Pubsub + DataFlow + Vertex AI (AI Platform)
upvoted 1 times
Should be B. Dataflow is the best option for preprocessing both training and testing data.
upvoted 1 times
Selected Answer: B
Answer should be B
upvoted 1 times
Selected Answer: B
- Using option elimination: A is totally wrong, and D is also not valid, as Cloud Functions is not suitable for heavy data workflows.
- Of the remaining options, I will vote for B, as Dataflow is the best solution when dealing with heavy data workflows.
upvoted 2 times
Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a
change in the distribution of the input data. How should you address the input differences in production?
B. Perform feature selection on the model, and retrain the model with fewer features.
C. Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service.
D. Perform feature selection on the model, and retrain the model on a monthly basis with fewer features.
Correct Answer: C
Maybe it should be C
upvoted 2 times
A
Data drift doesn't necessarily require feature re-selection or regularization (e.g. L2).
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#challenges
upvoted 5 times
Selected Answer: A
When the distribution of input data changes, the model may not perform as well as it did during training. It is important to monitor the
performance of the model in production and identify any changes in the distribution of input data. By creating alerts to monitor for skew, you can
detect when the input data distribution has changed and take action to retrain the model using more recent data that reflects the new distribution.
This will help ensure that the model continues to perform well in production.
upvoted 1 times
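A framework-agnostic sketch (not GCP-specific) of the kind of skew check such an alert could run: compare a training feature's distribution against recent serving data with a two-sample Kolmogorov-Smirnov test. All data here is synthetic.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
serving_feature = rng.normal(loc=0.5, scale=1.0, size=5000)  # shifted distribution

statistic, p_value = stats.ks_2samp(train_feature, serving_feature)
if p_value < 0.01:
    print(f"Skew detected (KS={statistic:.3f}); alert and retrain on recent data")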
Selected Answer: A
Went with A
upvoted 2 times
Selected Answer: A
A is correct
upvoted 1 times
Selected Answer: A
Creating alerts to monitor for skew in the input data can help to detect when the distribution of the data has changed and the model's
performance is affected. Once a skew is detected, retraining the model with the new data can improve its performance.
upvoted 1 times
Selected Answer: A
Skew & drift monitoring: Production data tends to constantly change in different dimensions (i.e. time and system wise). And this causes the
performance of the model to drop.
https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring
upvoted 1 times
A
You don't need to do feature selection again
upvoted 2 times
Selected Answer: A
A
as celia explained
upvoted 1 times
Reference: https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
upvoted 4 times
You need to train a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine
on Compute Engine. You use the following parameters:
✑ Optimizer: SGD
✑ Image shape = 224×224
✑ Batch size = 64
✑ Epochs = 10
✑ Verbose = 2
During training you encounter the following error: ResourceExhaustedError: Out Of Memory (OOM) when allocating tensor. What should you do?
Correct Answer: B
Reference:
https://github.com/tensorflow/tensorflow/issues/136
B. I think you want to reduce the batch size. Learning rate and optimizer shouldn't really impact memory utilisation. Decreasing the image size (A) would
work, but might be costly in terms of final performance.
upvoted 22 times
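A minimal runnable sketch of option B with a stand-in ResNet and random data; the only point is that the batch_size argument drops from 64 to 32, roughly halving per-step activation memory.

import numpy as np
import tensorflow as tf

# Tiny synthetic stand-in for the real dataset
x_train = np.random.rand(128, 224, 224, 3).astype("float32")
y_train = np.random.randint(0, 4, size=(128,))

model = tf.keras.applications.ResNet50(weights=None, input_shape=(224, 224, 3), classes=4)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, batch_size=32, epochs=1, verbose=2)  # batch_size reduced from 64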
Selected Answer: B
No doubt, went with B.
upvoted 1 times
Selected Answer: B
Went with B
upvoted 2 times
B is correct
upvoted 1 times
By reducing the batch size, the amount of memory required for each iteration of the training process is reduced
upvoted 1 times
Selected Answer: B
Selected Answer: B
To fix a memory overflow you need to reduce the batch size; reducing the input resolution is also valid,
but reducing the image size can harm model performance, so the answer is B.
upvoted 3 times
Reducing the batch size or reducing the image size can both reduce memory usage, but the former seems much easier.
upvoted 2 times
Letter D can be used, as we reduce the image size, but this will directly impact the model's performance. Another point is that when doing this, if
you are using a model via Keras's `Functional API`, you need to change the definition of the input and also apply pre-processing on the image to
reduce its size. In other words: much more work than letter B.
upvoted 3 times
A works too, but depending on what you need you will lose performance (just like maartenalexander said), so I think it is not recommended.
upvoted 3 times
You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are
experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods
(GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?
D. Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline
minimum CPU platform for serving nodes.
Correct Answer: D
D is correct, since this question focuses on serving performance without changing the underlying infrastructure. The pods are already throttled, so
increasing the pressure on them won't help, and both A and C essentially do that. B is a bit mysterious, but we definitely know that D would work.
upvoted 25 times
Selected Answer: D
increasing the max_batch_size TensorFlow Serving parameter, is not the best choice because increasing the batch size may not necessarily improve
latency. In fact, it may even lead to higher latency for individual requests, as they will have to wait for the batch to be filled before processing. This
may be useful when optimizing for throughput, but not for serving latency, which is the primary goal in this scenario.
upvoted 1 times
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#batch-scheduling-parameters-and-tuning
A may help to some extent, but it primarily affects how many requests are processed in a single batch. It might not directly address latency issues.
D is a valid approach for optimizing TensorFlow Serving for CPU-specific optimizations, but it's a more involved process and might not be the
quickest way to address latency issues.
upvoted 4 times
Selected Answer: A
I think A is correct, as D implies changes to the infrastructure (question says you must not do that).
upvoted 1 times
max_batch_size parameter controls the maximum number of requests that can be batched together by TensorFlow Serving. Increasing this
parameter can help reduce the number of round trips between the client and server, which can improve serving latency. However, increasing the
batch size too much can lead to higher memory usage and longer processing times for each batch.
upvoted 1 times
Selected Answer: D
Definitely D.
to improve the serving latency of an ML model on AI Platform, you can recompile TensorFlow Serving using the source to support CPU-specific
optimizations and instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes, this way GKE will schedule the pods
on nodes with at least that CPU platform.
upvoted 1 times
Went with D
upvoted 1 times
https://github.com/tensorflow/serving/tree/master/tensorflow_serving/batching
upvoted 2 times
Selected Answer: D
ans: D
upvoted 1 times
A would further increase latency. It may only help to improve the throughput if the memory and computation power of the GKE pods are not
saturated.
upvoted 1 times
Selected Answer: D
In other words, trade a bit of latency to batch up requests (that latency being of the order of very few milliseconds or less, based on
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#batch-scheduling-parameters-and-tuning) in order
to gain sub-linear scalability when under heavy load. This assumes that latency is a consequence of being processing bound - which is implied but
not explicitly stated.
D would likely also work, but is more involved. By manually recompiling TF Serving, we are starting to move away from the goodness of a fully
managed solution...
upvoted 2 times
You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During
preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week.
You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?
B. Translate the normalization algorithm into SQL for use with BigQuery.
D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.
Correct Answer: B
B, I think. BigQuery definitely minimizes computational time for normalization, and I think it would also minimize manual intervention. For data
normalization in Dataflow you'd have to pass in the values of the mean and standard deviation as a side input. That seems like more work than a simple SQL
query.
upvoted 20 times
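A hedged sketch of option B: computing the z-score entirely inside BigQuery with analytic functions, driven from the Python client. The project, dataset, table, and column names are placeholders.

from google.cloud import bigquery

client = bigquery.Client()
query = """
CREATE OR REPLACE TABLE `my_project.demand.training_data_normalized` AS
SELECT
  *,
  (sales - AVG(sales) OVER ()) / STDDEV(sales) OVER () AS sales_z
FROM `my_project.demand.training_data`
"""
client.query(query).result()  # the normalization runs inside BigQuery; no data is moved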
I think that D would be wrong because we would add one more service to the pipeline for a simple transformation (subtract the mean and
divide by the standard deviation).
upvoted 3 times
Selected Answer: B
Selected Answer: C
Every week, when new data is loaded, the mean and standard deviation are calculated for it and passed as parameters to calculate the z-score at serving time.
https://towardsdatascience.com/how-to-normalize-features-in-tensorflow-5b7b0e3a4177
upvoted 1 times
To make the process more efficient by minimizing computation time and manual intervention, you should still opt for option B: Translate the
normalization algorithm into SQL for use with BigQuery. This way, you can perform the normalization directly in BigQuery, which will save time
and resources compared to using an external tool.
upvoted 1 times
A, D usually need additional configuration, which could cost much more time.
upvoted 1 times
Went with B
upvoted 2 times
Selected Answer: B
Best way is B
upvoted 2 times
Selected Answer: D
Option D is the best solution because Apache Spark provides a distributed computing platform that can handle large-scale data processing with
ease. By using the Dataproc connector for BigQuery, Spark can read data directly from BigQuery and perform the normalization process in a
distributed manner. This can significantly reduce computation time and manual intervention. Option A is not a good solution because Kubernetes
is a container orchestration platform that does not directly provide data normalization capabilities. Option B is not a good solution because Z-score
normalization is a data transformation technique that cannot be easily translated into SQL. Option C is not a good solution because the
normalizer_fn argument in TensorFlow's Feature Column API is only applicable for feature normalization during model training, not for data
preprocessing.
upvoted 2 times
B is the most efficient, as you will not load --> process --> save; you will only write some SQL in BigQuery, and voila :D
upvoted 4 times
Selected Answer: B
I agree with B.
upvoted 2 times
You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to
explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics on the same dashboard. What should you do?
Correct Answer: C
D - https://www.kubeflow.org/docs/about/use-cases/
upvoted 11 times
Should be D
upvoted 6 times
I would vote for D, but if C had said "different job names" instead, would that have been a better option?
upvoted 1 times
D should be correct
upvoted 2 times
Selected Answer: D
The best approach is to create an experiment in Kubeflow Pipelines to organize multiple runs.
Option A is incorrect because AutoML Tables is a managed machine learning service that automates the process of building machine learning
models from tabular data. It does not provide the flexibility to customize the model architecture or explore multiple model architectures.
Option B is incorrect because Cloud Composer is a managed workflow orchestration service that can be used to automate machine learning
workflows. However, it does not provide the same level of flexibility or scalability as Kubeflow Pipelines.
Option C is incorrect because running multiple training jobs on AI Platform with similar job names will not allow you to easily organize and
compare the results.
upvoted 5 times
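A hedged sketch of option D with the KFP SDK: group one run per architecture under a single experiment so the metrics can be compared side by side. The endpoint, pipeline file, and architecture names are placeholders.

import kfp

client = kfp.Client(host="https://<your-kfp-endpoint>")  # placeholder endpoint
client.create_experiment(name="purchase-model-architectures")
for arch in ["wide", "deep", "wide_and_deep"]:            # hypothetical variants
    client.create_run_from_pipeline_package(
        pipeline_file="train_pipeline.yaml",              # precompiled pipeline
        arguments={"architecture": arch},
        experiment_name="purchase-model-architectures",
        run_name=f"train-{arch}")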
Selected Answer: D
Went with D
upvoted 1 times
Selected Answer: D
With Kubeflow Pipelines, you can create experiments that help you keep track of multiple training runs with different model architectures and
hyperparameters.
upvoted 1 times
Selected Answer: C
https://cloud.google.com/vertex-ai/docs/experiments/user-journey/uj-compare-models
upvoted 2 times
https://www.kubeflow.org/docs/components/pipelines/concepts/experiment/
https://www.kubeflow.org/docs/components/pipelines/concepts/run/
upvoted 1 times
Selected Answer: D
D - we need to use the experiments feature to compare models; having different job names is not going to help track experiments.
upvoted 3 times
"Similar job names" is a bit of a confusion creator, as we cannot use the same job names for sure. D sounds better, though in Vertex AI this applies during
the experiment phase only.
upvoted 1 times
You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan
to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible. What should you
do?
A. Use the BigQuery console to execute your query, and then save the query results into a new BigQuery table.
B. Write a Python script that uses the BigQuery API to execute queries against BigQuery. Execute this script as the first step in your Kubeflow
pipeline.
C. Use the Kubeflow Pipelines domain-specific language to create a custom component that uses the Python BigQuery client library to execute
queries.
D. Locate the Kubeflow Pipelines repository on GitHub. Find the BigQuery Query Component, copy that component's URL, and use it to load the
component into your pipeline. Use the component to execute queries against BigQuery.
Correct Answer: A
D. Kubeflow pipelines have different types of components, ranging from low- to high-level. They have a ComponentStore that allows you to access
prebuilt functionality from GitHub.
upvoted 20 times
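A hedged sketch of option D: loading the prebuilt BigQuery component from the Kubeflow Pipelines GitHub repository with the KFP SDK. The component URL and its input names vary by KFP release, so treat them as illustrative.

import kfp
from kfp import components

bigquery_query_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
    "components/gcp/bigquery/query/component.yaml")  # path depends on your KFP version

@kfp.dsl.pipeline(name="bq-first-step")
def pipeline(project_id: str):
    bigquery_query_op(
        query="SELECT * FROM `my_dataset.my_table`",      # placeholder query
        project_id=project_id,
        output_gcs_path="gs://my-bucket/bq_results.csv")  # placeholder output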
Selected Answer: D
Not sure what the reason is behind putting A, as it is manual, and manual steps cannot be part of automation. I would say the answer is D, as it just
requires loading the component from GitHub. Using Python to import a BigQuery component may sound good too, but the question asks for the easiest option.
It depends on how the word "easy" is taken by individuals, but it's definitely not A.
upvoted 6 times
Selected Answer: B
I'm going "against the flow" and choosing B. It just sounds like a much easier option than D.
upvoted 1 times
Selected Answer: B
Very confused as to why D is the correct answer. To me it seems a) much simpler to just write a couple of lines of Python
(https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python) and b) the documentation for the BigQuery reusable
component (https://v0-5.kubeflow.org/docs/pipelines/reusable-components/) states that the data is written to Google Cloud Storage, which means
we have to write the fetching logic in the next pipeline step, going against the "as simple as possible" requirement. Would be interested to hear
why I am wrong.
upvoted 2 times
With Option B, you can write a Python script that uses the BigQuery API to execute queries against BigQuery and fetch the data directly into
your pipeline. This way, you can process the data as needed and pass it to the next step in the pipeline without the need to fetch it from Google
Cloud Storage.
While using the reusable BigQuery Query Component (Option D) provides a pre-built solution, it does require additional steps to fetch the data
from Google Cloud Storage for the next step in the pipeline, which might not be the simplest approach.
upvoted 1 times
Selected Answer: D
Went with D
upvoted 1 times
Selected Answer: D
https://linuxtut.com/en/f4771efee37658c083cc/
upvoted 1 times
Answer is D.
upvoted 2 times
You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets.
Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to
production, the model's accuracy dropped to 66%. How can you make your production model more accurate?
A. Normalize the data for the training, and test datasets as two separate steps.
B. Split the training and test data based on time rather than a random split to avoid leakage.
C. Add more data to your test set to ensure that you have a fair distribution and sample for testing.
D. Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and
test sets.
Correct Answer: D
B. If you do time-series prediction, you can't borrow information from the future. If you do, you are artificially inflating your
accuracy.
upvoted 33 times
Selected Answer: B
Definitely B.
upvoted 1 times
They did not explicitly say forecasting, but splitting by time is the number one rule you learn.
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: D
D is correct. cross-validate
upvoted 2 times
Selected Answer: B
Train accuracy 97%, production accuracy 66% ---> time-series data ---> random split ---> causes leakage; the answer is B.
upvoted 2 times
B should be the answer. D is incorrect, as normalizing before splitting is going to leak data.
https://community.rapidminer.com/discussion/32592/normalising-data-before-data-split-or-after
upvoted 2 times
Selected Answer: B
If you do a random split on a time series, you risk that the training data will contain information about the target (the definition of leakage), while similar data
won't be available when the model is used for prediction. Leakage causes the model to look accurate until you start making actual predictions with
it.
upvoted 3 times
You are developing models to classify customer support emails. You created models with TensorFlow Estimators using small datasets on your on-
premises system, but you now need to train the models using large datasets to ensure high performance. You will port your models to Google
Cloud and want to minimize code refactoring and infrastructure overhead for easier migration from on-prem to cloud. What should you do?
Correct Answer: C
A. AI platform provides lower infrastructure overhead and allows you to not have to refactor your code too much (no containerization and such,
like in KubeFlow).
upvoted 27 times
Selected Answer: A
I chose A. Even though D is a working option, it requires us to create a GKE cluster, which requires more work.
upvoted 2 times
Selected Answer: A
A. Use AI Platform for distributed training: managed, low-infrastructure migration - yes, though it may need some code refactoring.
B. Create a cluster on Dataproc for training: only a cluster? What about training?
C. Create a Managed Instance Group with autoscaling: same question.
D. Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster: only training?
upvoted 2 times
Selected Answer: A
Went with A
upvoted 1 times
Option A is the best choice as AI Platform provides a distributed training framework, enabling you to train large-scale models faster and with less
effort
upvoted 1 times
Selected Answer: A
Using option elimination, the answer is between A and D; I will vote for A, as it is easier.
upvoted 1 times
Selected Answer: A
The answer is A. AI Platform also contains Kubeflow Pipelines; you don't need to set up infrastructure to use it. For D, you need to set up a
Google Kubernetes Engine cluster. The question asks us to minimize infrastructure overhead.
upvoted 2 times
Selected Answer: D
D - Kubeflow Pipelines with Vertex AI gives you the ability to reuse existing code using a TF container in a pipeline, and it helps automate the process;
there is a Qwiklab walking through this.
A - incorrect; the question asks to reuse existing code with minimal changes, and distributed deployment does not address that.
upvoted 1 times
A) Portability.
B) Composability.
C) Flexibility in mind.
Selected Answer: A
TensorFlow Estimators support distributed training, and that is a key feature of AI Platform (later Vertex AI).
upvoted 3 times
However, I wanted to share my logic for why it's not B as well. Dataproc is managed Hadoop and as such needs a processing engine for ML tasks,
most likely Spark and SparkML. Now, Spark code is quite different from pure Python, and SparkML is even more different from TF code. I imagine
there might be a way to convert TF code to run on SparkML, but this seems like a lot of work. And besides, the question specifically wants us to
minimize refactoring, so there you have it: we can eliminate option B 100%.
upvoted 4 times
You have trained a text classification model in TensorFlow using AI Platform. You want to use the trained model for batch predictions on text data
stored in BigQuery while minimizing computational overhead. What should you do?
C. Use Dataflow with the SavedModel to read the data from BigQuery.
D. Submit a batch prediction job on AI Platform that points to the model location in Cloud Storage.
Correct Answer: A
I think it's A
https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#importing_models
upvoted 11 times
Selected Answer: A
Would go with D
upvoted 1 times
Selected Answer: A
Model : AI Platform.
pred batch data : BigQuery
constraint : computational overhead
Same platform as data == less computation required to load and pass it to model
upvoted 2 times
Selected Answer: A
Selected Answer: A
Cost: Submitting a batch prediction job on AI Platform is a paid service. The cost will depend on the size of the model and the amount of data that
you are predicting.
Complexity: Submitting a batch prediction job on AI Platform requires you to write some code. This can be a challenge if you are not familiar with
AI Platform.
Performance: Submitting a batch prediction job on AI Platform may not be as efficient as using BigQuery ML. This is because AI Platform needs to
load the model into memory before it can run the predictions.
Overall, option D is a viable option, but it may not be the best option for all situations.
upvoted 2 times
Selected Answer: D
Went with D
upvoted 1 times
why not C?
upvoted 1 times
what about C?
upvoted 1 times
D is more straightforward
upvoted 1 times
Now, is it A/D?
upvoted 1 times
You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have
created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to
automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?
A. Configure your pipeline with Dataflow, which saves the files in Cloud Storage. After the file is saved, start the training job on a GKE cluster.
B. Use App Engine to create a lightweight python client that continuously polls Cloud Storage for new files. As soon as a file arrives, initiate the
training job.
C. Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-triggered
Cloud Function to start the training job on a GKE cluster.
D. Use Cloud Scheduler to schedule jobs at a regular interval. For the first step of the job, check the timestamp of objects in your Cloud
Storage bucket. If there are no new files since the last run, abort the job.
Correct Answer: C
C
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#triggering-and-scheduling-kubeflow-pipelines
upvoted 15 times
C
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#triggering-and-scheduling-kubeflow-pipelines
upvoted 7 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
The scenario involves automatically running a Kubeflow Pipelines training job on GKE as soon as new data becomes available. To achieve this, we
can use Cloud Storage to store the cleaned dataset, and then configure a Cloud Storage trigger that sends a message to a Pub/Sub topic whenever
a new file is added to the storage bucket. We can then create a Pub/Sub-triggered Cloud Function that starts the training job on a GKE cluster.
upvoted 1 times
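A minimal sketch of that event-driven flow, assuming a Python Cloud Function and the Kubeflow Pipelines SDK; the KFP endpoint, the compiled pipeline file, and the input_file parameter are all placeholders:

import base64
import json
import kfp

KFP_HOST = 'https://<kfp-endpoint>'  # placeholder: KFP endpoint on the GKE cluster

def trigger_training(event, context):
    # The Cloud Storage notification arrives as a base64-encoded JSON payload.
    payload = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    new_file = payload.get('name', '')  # object name of the newly arrived file
    client = kfp.Client(host=KFP_HOST)
    client.create_run_from_pipeline_package(
        'training_pipeline.yaml',             # compiled Kubeflow pipeline spec
        arguments={'input_file': new_file},   # hypothetical pipeline parameter
        run_name='retrain-on-new-data',
    )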
Selected Answer: A
The question says: As part of your CI/CD workflow, you want to automatically run a Kubeflow..
Selected Answer: C
C
Pubsub is the keyword
upvoted 2 times
An event-driven architecture is better than a polling-based architecture, so I will vote for C.
upvoted 1 times
You have a functioning end-to-end ML pipeline that involves tuning the hyperparameters of your ML model using AI Platform, and then using the
best-tuned parameters for training. Hypertuning is taking longer than expected and is delaying the downstream processes. You want to speed up
the tuning job without significantly compromising its effectiveness. Which actions should you take? (Choose two.)
Correct Answer: BD
Reference:
https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview
I think it should be C and E. I can't find any reference suggesting that B can reduce tuning time.
upvoted 18 times
Selected Answer: CE
see pawan94
upvoted 2 times
So in my opinion C and E (after carefully reading the updated docs), and please don't believe everything ChatGPT says. I encountered so many
questions where the LLMs give completely wrong answers.
upvoted 3 times
Selected Answer: CD
I chose C and D
upvoted 2 times
Selected Answer: CD
Early Stopping: Enabling early stopping allows the tuning process to terminate a trial if it becomes clear that it's not producing promising results.
This prevents wasting time on unpromising trials and can significantly speed up the hyperparameter tuning process. It helps to focus resources on
more promising parameter combinations.
D. Change the search algorithm from Bayesian search to random search.
Random Search Algorithm: Random search, as opposed to Bayesian optimization, doesn't attempt to build a model of the objective function. While
Bayesian search can be more efficient in finding the optimal parameters, random search is often faster per iteration. Random search can be
particularly effective when the hyperparameter space is large, as it doesn't require as much computational power to select the next set of
parameters to evaluate.
upvoted 2 times
Selected Answer: CE
C&E
This video explains very well the max trials and parallel trials
https://youtu.be/8hZ_cBwNOss
This link explains early stopping
See https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning#early-stopping
upvoted 3 times
Selected Answer: CE
A increases time; for B, the bottleneck of a hyperparameter tuning job is normally not the model size; D does reduce time, but it might significantly hurt effectiveness.
upvoted 1 times
Selected Answer: AC
Running parallel trials has the benefit of reducing the time the training job takes (real time—the total processing time required is not typically
changed). However, running in parallel can reduce the effectiveness of the tuning job overall. That is because hyperparameter tuning uses the
results of previous trials to inform the values to assign to the hyperparameters of subsequent trials. When running in parallel, some trials start
without having the benefit of the results of any trials still running.
You can specify that AI Platform Training must automatically stop a trial that has become clearly unpromising. This saves you the cost of continuing
a trial that is unlikely to be useful.
To permit stopping a trial early, set the enableTrialEarlyStopping value in the HyperparameterSpec to TRUE.
upvoted 1 times
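A minimal sketch of the relevant HyperparameterSpec fields for options C and E, assuming the AI Platform Training jobs API; all values are illustrative:

hyperparameters = {
    'goal': 'MAXIMIZE',
    'hyperparameterMetricTag': 'accuracy',
    'maxTrials': 20,                    # E: lower this cap in subsequent phases
    'maxParallelTrials': 5,
    'enableTrialEarlyStopping': True,   # C: stop clearly unpromising trials early
    'params': [{
        'parameterName': 'learning_rate',
        'type': 'DOUBLE',
        'minValue': 0.0001,
        'maxValue': 0.1,
        'scaleType': 'UNIT_LOG_SCALE',
    }],
}

This dictionary would sit under 'trainingInput' in the training job request body.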
Selected Answer: CE
Selected Answer: AD
To speed up the tuning job without significantly compromising its effectiveness, you can take the following actions:
A. Decrease the number of parallel trials: By reducing the number of parallel trials, you can limit the amount of computational resources being used
at a given time, which may help speed up the tuning job. However, reducing the number of parallel trials too much could limit the exploration of
the parameter space and result in suboptimal results.
D. Change the search algorithm from Bayesian search to random search: Bayesian optimization is a computationally intensive method that requires
more time and resources than random search. By switching to a simpler method like random search, you may be able to speed up the tuning job
without compromising its effectiveness. However, random search may not be as efficient in finding the best hyperparameters as Bayesian
optimization.
upvoted 1 times
Selected Answer: AD
The two actions that can speed up hyperparameter tuning without compromising effectiveness are decreasing the number of parallel trials and
changing the search algorithm from Bayesian search to random search.
upvoted 2 times
B. Decrease the range of floating-point values: Reducing the range of the hyperparameters will decrease the search space and the time it takes to
find the optimal hyperparameters. However, if the range is too narrow, it may not be possible to find the best hyperparameters.
C. Set the early stopping parameter to TRUE: Setting the early stopping parameter to true will stop the trial when the performance has stopped
improving. This will help to reduce the number of trials needed and thus speed up the hypertuning job without compromising its effectiveness.
D. Changing the search algorithm from Bayesian search to random search could also be a valid action to speed up the hypertuning job. Random
search can explore the hyperparameter space more efficiently and with less computation cost compared to Bayesian search, especially when the
search space is large and complex. However, it may not be as effective as Bayesian search in finding the best hyperparameters in some cases.
upvoted 1 times
Answer C,E
=========
Explanation:
A. Decrease the number of parallel trials: doing this will of course make hypertuning take more time; we need to increase parallel trials, not
decrease them.
B. Decrease the range of floating-point values: theoretically this should speed up the computation, but it is not the most correct answer.
C. Set the early stopping parameter to TRUE: this is a very good option.
D. Change the search algorithm from Bayesian search to random search: changing the search algorithm will not have a great impact.
E. Decrease the maximum number of trials during subsequent training phases: a very good option.
upvoted 2 times
Selected Answer: CE
CE for me.
upvoted 1 times
Selected Answer: CE
Your team is building an application for a global bank that will be used by millions of customers. You built a forecasting model that predicts
customers' account balances 3 days in the future. Your team will use the results in a new feature that will notify users when their account balance
is likely to drop below $25. How should you serve your predictions?
A. 1. Create a Pub/Sub topic for each user. 2. Deploy a Cloud Function that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
B. 1. Create a Pub/Sub topic for each user. 2. Deploy an application on the App Engine standard environment that sends a notification when
your model predicts that a user's account balance will drop below the $25 threshold.
C. 1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a
notification when the average of all account balance predictions drops below the $25 threshold.
D. 1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a
notification when your model predicts that a user's account balance will drop below the $25 threshold.
Correct Answer: A
Should be D !
creating a Pub/Sub topic for each user is overkill
upvoted 18 times
Selected Answer: D
D is correct. Firebase is designed for exactly this sort of scenario. Also, it would not be possible to create millions of pubsub topics due to GCP
quotas
https://cloud.google.com/pubsub/quotas#quotas
https://firebase.google.com/docs/cloud-messaging
upvoted 6 times
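For illustration, a sketch of the per-user notification with the Firebase Admin SDK; the registration token lookup and the threshold check are placeholders around whatever serving layer produces the forecast:

import firebase_admin
from firebase_admin import messaging

firebase_admin.initialize_app()  # uses application default credentials

def notify_if_low(user_token: str, predicted_balance: float) -> None:
    # user_token is the user's FCM registration token (assumed stored per user).
    if predicted_balance < 25.0:
        message = messaging.Message(
            notification=messaging.Notification(
                title='Low balance warning',
                body='Your account balance may drop below $25 within 3 days.',
            ),
            token=user_token,
        )
        messaging.send(message)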
Selected Answer: D
Selected Answer: A
Simple answer: use the tools most mentioned during training, i.e., Cloud Functions.
upvoted 1 times
Selected Answer: D
Went with D
upvoted 1 times
Selected Answer: D
"Create a Pub/Sub topic for each user" this is crazy , we can not imagine a system with millions of pub/sub topics , so A,B wrong
C also wrong
upvoted 3 times
Selected Answer: D
Selected Answer: D
You work for an advertising company and want to understand the effectiveness of your company's latest advertising campaign. You have
streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook. What should you do?
A. Use AI Platform Notebooks' BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.
B. Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance.
C. Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance. Use pandas.read_csv to ingest the file as a pandas dataframe.
D. From a bash cell in your AI Platform notebook, use the bq extract command to export the table as a CSV file to Cloud Storage, and then use
gsutil cp to copy the data into the notebook. Use pandas.read_csv to ingest the file as a pandas dataframe.
Correct Answer: C
Reference:
https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
Selected Answer: A
Selected Answer: A
Went with A
upvoted 2 times
Selected Answer: A
In one cell, the BigQuery cell magic runs the query and stores the result in a pandas dataframe named df:
%%bigquery df
SELECT name, SUM(number) AS count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 3
Then, in a following cell:
print(df.head())
upvoted 4 times
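For reference, a minimal equivalent without the cell magic, sketched with the google-cloud-bigquery client against the same public table:

from google.cloud import bigquery

client = bigquery.Client()
df = client.query("""
    SELECT name, SUM(number) AS count
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    GROUP BY name
    ORDER BY count DESC
    LIMIT 3
""").to_dataframe()
print(df.head())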
A
https://cloud.google.com/bigquery/docs/visualize-jupyter
upvoted 2 times
Selected Answer: A
https://googleapis.dev/python/bigquery/latest/magics.html#ipython-magics-for-bigquery
upvoted 2 times
Selected Answer: A
This is the simplest and most straightforward way to read BigQuery data into a pandas dataframe.
upvoted 3 times
Selected Answer: C
Both A and C are technically correct. C has more manual steps and A has fewer. The question does not ask which requires the least effort, so C is the
clear answer.
upvoted 1 times
Selected Answer: C
C is the correct answer due to the size of the data. It wouldn't be possible to download it all into an in-memory dataframe.
upvoted 1 times
https://googleapis.dev/python/bigquery/latest/magics.html
upvoted 2 times
You are an ML engineer at a global car manufacture. You need to build an ML model to predict car sales in different cities around the world. Which
features or feature crosses should you use to train city-specific relationships between car type and number of sales?
A. Three individual features: binned latitude, binned longitude, and one-hot encoded car type.
B. One feature obtained as an element-wise product between latitude, longitude, and car type.
C. One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type.
D. Two feature crosses as an element-wise product: the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type.
Correct Answer: C
C
https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
upvoted 22 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
upvoted 4 times
https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
Answer C: It needs a feature cross to obtain one feature.
upvoted 3 times
In that order of ideas, crossing binned latitude with binned longitude enables the model to learn city-specific effects of car type.
I will go for C.
https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
upvoted 12 times
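To make option C concrete, a minimal sketch with the estimator-era tf.feature_column API; the bucket boundaries, vocabulary, and hash size are illustrative assumptions:

import tensorflow as tf

latitude = tf.feature_column.numeric_column('latitude')
longitude = tf.feature_column.numeric_column('longitude')

binned_lat = tf.feature_column.bucketized_column(
    latitude, boundaries=list(range(-90, 91, 10)))
binned_lon = tf.feature_column.bucketized_column(
    longitude, boundaries=list(range(-180, 181, 10)))
car_type = tf.feature_column.categorical_column_with_vocabulary_list(
    'car_type', ['sedan', 'suv', 'truck'])

# A single cross of all three features lets the model learn the
# city-specific relationship between car type and sales.
city_x_car = tf.feature_column.crossed_column(
    [binned_lat, binned_lon, car_type], hash_bucket_size=10000)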
You work for a large technology company that wants to modernize their contact center. You have been asked to develop a solution to classify
incoming calls by product so that requests can be more quickly routed to the correct support team. You have already transcribed the calls using
the Speech-to-Text API. You want to minimize data preprocessing and development time. How should you build the model?
B. Use AutoML Natural Language to train a custom classification model.
C. Use the Cloud Natural Language API to extract custom entities for classification.
D. Build a custom model to identify the product keywords from the transcribed calls, and then run the keywords through a classification
algorithm.
Correct Answer: A
Should be B
-> minimize data preprocessing and development time
upvoted 21 times
Key Differences:
Approach: Option B (AutoML Natural Language) involves using an AutoML service to train a custom NLP model, while Option C (Cloud Natural
Language API) relies on a pre-built NLP API.
Control and Customization: Option B gives you more control and customization over the training process, as you train a model specific to your
needs. Option C offers less control but is quicker to set up since it uses a pre-built API.
Complexity: Option B might require more technical expertise to set up and configure the AutoML model, while Option C is more straightforward
and user-friendly.
In summary, both options allow you to extract custom entities for classification, but Option B (AutoML) involves more manual involvement in
training a custom model, while Option C (Cloud Natural Language API) provides a simpler, pre-built solution
upvoted 1 times
Selected Answer: B
Went with B
upvoted 2 times
why not C?
upvoted 1 times
Selected Answer: B
AutoML is appropriate to classify incoming calls by product (Custom) to be routed to the correct support team.
The Cloud Natural Language API is for the general case (not a particular business).
upvoted 1 times
Selected Answer: B
"minimize data preprocessing and development time" answer will be limited to B,C
will choose C as Natural Language API does not handle custom operation
upvoted 2 times
You are training a TensorFlow model on a structured dataset with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?
A. Load the data into BigQuery, and read the data from BigQuery.
B. Load the data into Cloud Bigtable, and read the data from Bigtable.
C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.
D. Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS).
Correct Answer: B
Reference:
https://cloud.google.com/dataflow/docs/guides/templates/provided-batch
C - not enough info in the question, but C is the "most correct" one
upvoted 24 times
Selected Answer: C
C https://datascience.stackexchange.com/questions/16318/what-is-the-benefit-of-splitting-tfrecord-file-into-shards
upvoted 2 times
C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.
TFRecords is a TensorFlow-specific binary format that is optimized for performance. Converting the CSV files into TFRecords will improve the
input/output execution performance. Sharding the TFRecords will allow the data to be read in parallel, which will further improve performance.
Loading the data into BigQuery or Cloud Bigtable will add an additional layer of abstraction, which can slow down performance.
Storing the TFRecords in HDFS is not likely to improve performance, as HDFS is not optimized for TensorFlow.
upvoted 1 times
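A rough sketch of the conversion, not a tuned pipeline: each CSV row becomes a tf.train.Example and rows are spread round-robin over N shards; the bucket paths, shard count, and all-bytes feature encoding are assumptions:

import csv
import tensorflow as tf

NUM_SHARDS = 100
writers = [
    tf.io.TFRecordWriter(
        f'gs://my-bucket/data/train-{i:05d}-of-{NUM_SHARDS:05d}.tfrecord')
    for i in range(NUM_SHARDS)
]

with tf.io.gfile.GFile('gs://my-bucket/data/train.csv') as f:
    for i, row in enumerate(csv.DictReader(f)):
        example = tf.train.Example(features=tf.train.Features(feature={
            key: tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[value.encode()]))
            for key, value in row.items()
        }))
        # Round-robin assignment spreads records evenly across shards.
        writers[i % NUM_SHARDS].write(example.SerializeToString())

for w in writers:
    w.close()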
A. Load the data into BigQuery, and read the data from BigQuery.
https://cloud.google.com/blog/products/ai-machine-learning/tensorflow-enterprise-makes-accessing-data-on-google-cloud-faster-and-easier
Precisely, the link provided in other comments shows that the best result with TFRecords is 18,752 records per second, while the same report shows
that BigQuery reaches more than 40,000 records per second.
upvoted 2 times
On the other hand, converting the CSV files into shards of TFRecords and storing them in Cloud Storage (Option C) will provide better
performance because TFRecords is a format designed specifically for TensorFlow. It allows for efficient storage and retrieval of data, making it a
more suitable choice for improving the input/output execution performance. Additionally, Cloud Storage provides high throughput and low-
latency data access, which is beneficial for training large-scale TensorFlow models.
upvoted 2 times
Went with C
upvoted 2 times
Cloud Bigtable is typically used to process unstructured data, such as time-series data, logs, or other types of data that do not conform to a fixed
schema. However, Cloud Bigtable can also be used to store structured data if necessary, such as in the case of a key-value store or a database that
does not require complex relational queries.
upvoted 1 times
Option C, converting the CSV files into shards of TFRecords and storing the data in Cloud Storage, is the most appropriate solution for improving
input/output execution performance in this scenario
upvoted 1 times
Selected Answer: A
https://cloud.google.com/architecture/ml-on-gcp-best-practices#store-tabular-data-in-bigquery
BigQuery for structured data, cloud storage for unstructed data
upvoted 3 times
Selected Answer: D
"100 billion records stored in several CSV files" that means we deal with distributed big data problem , so HDFS is very suitable , Will choose D
upvoted 1 times
Selected Answer: C
Google best practices: Use Cloud Storage buckets and directories to group the shards of data (either sharded TFRecord files if using Tensorflow, or
Avro if using any other framework). Aim for files of at least 100Mb, and 100 - 10000 shards.
upvoted 2 times
As the lead ML Engineer for your company, you are responsible for building ML models to digitize scanned customer forms. You have developed a
TensorFlow model that converts the scanned images into text and stores them in Cloud Storage. You need to use your ML model on the
aggregated data collected at the end of each day with minimal manual intervention. What should you do?
A. Use the batch prediction functionality of AI Platform.
C. Use Cloud Functions for prediction each time a new data point is ingested.
D. Deploy the model on AI Platform and create a version of it for online inference.
Correct Answer: D
Selected Answer: A
Selected Answer: A
Selected Answer: A
Went with A
upvoted 1 times
Because aggregated data can be sent at the end of the day for batch prediction, and AI Platform is managed, this satisfies the minimal-intervention
requirement.
Not B, as it violates the minimal-intervention requirement.
Not C and D, as real-time or online inference is not needed since data is aggregated at the end of the day.
upvoted 3 times
Selected Answer: A
You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention.
upvoted 1 times
Selected Answer: A
"You need to use your ML model on the aggregated data" that means we need the batch prediction feature in AI platform
upvoted 1 times
Selected Answer: A
A
https://cloud.google.com/ai-platform/prediction/docs/batch-predict
upvoted 3 times
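A minimal sketch of option A, assuming the AI Platform jobs API via the discovery client; the model name, paths, and region are placeholders:

from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
job_body = {
    'jobId': 'daily_forms_batch_predict',
    'predictionInput': {
        'modelName': 'projects/my-project/models/form_ocr',  # uses the default version
        'dataFormat': 'JSON',
        'inputPaths': ['gs://my-bucket/aggregated/2024-04-29/*'],
        'outputPath': 'gs://my-bucket/predictions/2024-04-29/',
        'region': 'us-central1',
    },
}
ml.projects().jobs().create(parent='projects/my-project', body=job_body).execute()

A Cloud Scheduler entry could submit this job at the end of each day, keeping manual intervention minimal.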
You recently joined an enterprise-scale company that has thousands of datasets. You know that there are accurate descriptions for each table in
BigQuery, and you are searching for the proper BigQuery table to use for a model you are building on AI Platform. How should you find the data that you need?
A. Use Data Catalog to search the BigQuery datasets by using keywords in the table description.
B. Tag each of your model and version resources on AI Platform with the name of the BigQuery table that was used for training.
C. Maintain a lookup table in BigQuery that maps the table descriptions to the table ID. Query the lookup table to find the correct table ID for
D. Execute a query in BigQuery to retrieve all the existing table names in your project using the INFORMATION_SCHEMA metadata tables that
are native to BigQuery. Use the result to find the table that you need.
Correct Answer: B
Should be A
https://cloud.google.com/data-catalog/docs/concepts/overview
upvoted 18 times
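For illustration, a hedged sketch of option A with the Data Catalog Python client; the project ID and the search keyword are placeholders:

from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()
scope = datacatalog_v1.SearchCatalogRequest.Scope(
    include_project_ids=['my-project'])

# "type=table" restricts results to tables; "description:" matches keywords
# found in the table descriptions.
results = client.search_catalog(
    request={'scope': scope, 'query': 'type=table description:churn'})
for result in results:
    print(result.relative_resource_name)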
Selected Answer: A
Who is providing these answers?? It's clearly A. Most of the answers here are incorrect.
upvoted 7 times
Selected Answer: A
A without hesitation.
upvoted 1 times
Selected Answer: A
Selected Answer: A
A should be correct
upvoted 1 times
Selected Answer: A
Went with A
upvoted 2 times
You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC
ROC) value of
99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter
tuning. What should your next step be to identify and fix the problem?
A. Address the model overfitting by using a less complex algorithm trained with k-fold cross-validation.
B. Address data leakage by applying nested cross-validation during model training.
C. Address data leakage by removing features highly correlated with the target value.
D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Correct Answer: B
Selected Answer: C
Let me explain it my own way: sometimes a feature in the training data is unintentionally calculated from the target value, which results in a high
correlation between the two.
For instance, you predict a stock price using moving average, MACD, and RSI, despite the fact that those three features are calculated from the price (the target).
upvoted 8 times
B: correct.
I considered C, but why should we remove a feature of a highly predictive nature? For me, this does not explain the problem of overfitting; a highly
predictive feature is also useful for good performance evaluated on the test set.
--> I decide for B!
upvoted 2 times
Selected Answer: B
Selected Answer: A
Option A: This option is a reasonable choice. Switching to a less complex algorithm can help reduce overfitting, and using k-fold cross-validation
can provide a better estimate of how well the model will generalize to unseen data. It's essential to ensure that the high performance isn't solely
due to overfitting.
upvoted 1 times
Option C: Removing features highly correlated with the target value can be a valid step in feature selection or preprocessing. However, it
doesn't directly address the overfitting issue or explain why the model is performing exceptionally well on the training data. It's a separate step
from mitigating overfitting.
Option D: This option is incorrect. Tuning hyperparameters should aim to improve model performance on the validation set, not reduce it.
Selected Answer: B
Selected Answer: B
Option C is a good step to avoid overfitting, but it's not necessarily the best approach to address data leakage.
Data leakage occurs when information from the validation or test data leaks into the training data, leading to overly optimistic performance
metrics. In time-series data, it's important to avoid using future information to predict past events.
Removing features highly correlated with the target value may help to reduce overfitting, but it does not necessarily address data leakage.
Therefore, applying nested cross-validation during model training is a better approach to address data leakage in this scenario.
upvoted 2 times
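To make the time-aware evaluation point concrete, a minimal sketch assuming scikit-learn (toy data): unlike random k-fold splits, each training fold strictly precedes its test fold, so future information cannot leak backwards.

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)        # toy features, already time-ordered
y = np.random.randint(0, 2, size=20)    # toy binary labels

for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    # Every training index precedes every test index in each fold.
    print('last train index:', train_idx.max(), '| first test index:', test_idx.min())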
Selected Answer: C
https://towardsdatascience.com/avoiding-data-leakage-in-timeseries-101-25ea13fcb15f
Directly says: "Dive straight into the MVP, cross-validate later!"
MVP stands for Minimum Viable Product
upvoted 1 times
Agree with Paul_Dirac. Also, it is recommended to use nested cross-validation to avoid data leakage in time series data.
upvoted 1 times
Selected Answer: C
There can be a feature causing data leakage which might have been overlooked. In addition, when cross-validation is done randomly, the leakage
can be even bigger.
upvoted 1 times
Went with B
upvoted 1 times
B
I agree with Paul_Dirac
upvoted 2 times
Selected Answer: B
"You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning" so we have a base /default
hyperparameters estimator so overfitting is quite not possible , so it is a leakage problem , by inspection C is wrong , so it will be B
upvoted 1 times
Quite tricky but through elimination, correct answer is B. Model overfitting doesn't apply here as we can't tell if a model is overfitting by just
looking at training data results.
upvoted 3 times
You work for an online travel agency that also sells advertising placements on its website to other companies. You have been asked to predict the
most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms@p99,
the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to
Implement the simplest solution. How should you configure the prediction pipeline?
A. Embed the client on the website, and then deploy the model on AI Platform Prediction.
B. Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the
user's navigation context, and then deploy the model on AI Platform Prediction.
D. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the
user's navigation context, and then deploy the model on Google Kubernetes Engine.
Correct Answer: B
ANS: C
GAE + IAP
https://medium.com/google-cloud/secure-cloud-run-cloud-functions-and-app-engine-with-api-key-73c57bededd1
Selected Answer: B
Was torn between B and C, but decided for B, because the question states how we should configure the PREDICTION pipeline!
Since the exploratory analysis already identified navigation context as good predictor, the focus should be on the prediction model itself.
upvoted 2 times
Selected Answer: C
Selected Answer: B
Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
upvoted 1 times
Selected Answer: B
B is also a possible solution, but it does not include a database for storing and retrieving the user's navigation context. This means that every time a
user visits a page, the gateway would need to query the website to retrieve the navigation context, which could be slow and inefficient. By using
Cloud Bigtable to store the navigation context, the gateway can quickly retrieve the context from the database and pass it to the model for
prediction. This makes the overall prediction pipeline more efficient and scalable. Therefore, C is a better option compared to B.
upvoted 3 times
Selected Answer: B
Selected Answer: C
C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the
user's navigation context, and then deploy the model on AI Platform Prediction
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#choosing_a_nosql_database
Typical use cases for Bigtable are:
* Ad prediction that leverages dynamically aggregated values over all ad requests and historical data.
upvoted 1 times
Selected Answer: C
Bigtable is a massively scalable NoSQL database service engineered for high throughput and for low-latency workloads. It can handle petabytes of
data, with millions of reads and writes per second at a latency that's on the order of milliseconds.
Fraud detection that leverages dynamically aggregated values. Applications in Fintech and Adtech are usually subject to heavy reads and writes.
Ad prediction that leverages dynamically aggregated values over all ad requests and historical data.
Booking recommendation based on the overall customer base's recent bookings.
upvoted 1 times
Went with C
upvoted 1 times
Selected Answer: B
Selected Answer: B
B.
simplest solution
upvoted 1 times
B. Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
For the simplest solution, you can embed the client on the website to collect user data and send it to a gateway deployed on App Engine. App
Engine provides a scalable and cost-effective solution to handle web requests. Then, you can deploy your model on AI Platform Prediction, which
can handle the required latency (300ms@p99) and provides a managed solution for serving machine learning models.
Option A might not provide the necessary security by directly accessing AI Platform Prediction from the client side. Options C and D introduce
additional complexity by adding a database layer (Cloud Bigtable and Memorystore, respectively) that is not necessary for the simplest solution, as
you can use the navigation context directly from the client.
upvoted 1 times
Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-
premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-
to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include
any manual device placement and has not been wrapped in Estimator model-level abstraction. Which environment should you train your model on?
A. A VM on Compute Engine and 1 TPU with all dependencies installed manually.
B. A VM on Compute Engine and 8 GPUs with all dependencies installed manually.
C. A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.
D. A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.
Correct Answer: A
ANS: C
https://cloud.google.com/deep-learning-vm/docs/cli#creating_an_instance_with_one_or_more_gpus
https://cloud.google.com/deep-learning-vm/docs/introduction#pre-installed_packages
upvoted 14 times
Selected Answer: C
Selected Answer: D
keyword: Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction.
upvoted 1 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
Critical sentence: Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction.
It's C.
upvoted 1 times
Answer C
========
Explanation
"speed up model training" will make us biased towards GPU,TPU options
by options eliminations we may need to stay away of any manual installations , so using preconfigered deep learning will speed up time to market
upvoted 1 times
Selected Answer: A
The question asks to speed up time to market, which happens if the model trains fast, so a TPU VM can be a solution:
https://cloud.google.com/blog/products/compute/introducing-cloud-tpu-vms (option A). If the question asked for the most managed way, then the answer
would be a Deep Learning VM with everything pre-installed: C.
upvoted 1 times
Selected Answer: C
C is correct.
upvoted 2 times
You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models,
and versions in a clean and scalable way. Which strategy should you choose?
A. Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.
B. Separate each data scientist's work into a different project to ensure that the jobs, models, and versions created by each data scientist are
C. Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.
D. Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In
BigQuery, create a SQL view that maps users to the resources they are using
Correct Answer: A
I think it should be C,
as IAM roles are given to the entire AI Notebooks resource, not to a specific instance.
upvoted 13 times
https://cloud.google.com/ai-platform/prediction/docs/resource-labels#overview_of_labels
You can add labels to your AI Platform Prediction jobs, models, and model versions, then use those labels to organize resources into categories
when viewing or monitoring the resources.
For example, you can label jobs by team (such as engineering or research) and development phase (prod or test), then filter the jobs based on the
team and phase.
Labels are also available on operations, but these labels are derived from the resource to which the operation applies. You cannot add or update
labels on an operation.
A label is a key-value pair, where both the key and the value are custom strings that you supply.
upvoted 9 times
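For illustration, a sketch of option C with the discovery-based AI Platform client; the label keys and values are examples, and the list filter syntax is an assumption:

from googleapiclient import discovery

ml = discovery.build('ml', 'v1')

# Create a model carrying descriptive labels.
ml.projects().models().create(
    parent='projects/my-project',
    body={
        'name': 'churn_model',
        'labels': {'team': 'research', 'phase': 'test', 'owner': 'alice'},
    },
).execute()

# Later, filter resources by label when listing (filter syntax assumed).
response = ml.projects().models().list(
    parent='projects/my-project', filter='labels.team=research').execute()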
C
Although there are some questions where setting up a logging sink to BQ is the answer.
upvoted 1 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
Restricting access is not scalable and creates silos; it is better to document shareable resources through tagging, hence C.
upvoted 1 times
C
Resource tagging/labeling is the best way to manage ML resources for medium/big data science teams.
upvoted 1 times
Selected Answer: C
https://cloud.google.com/ai-platform/prediction/docs/resource-labels#overview_of_labels
(A) applies only to notebooks, which is not enough.
upvoted 4 times
You are training a deep learning model for semantic image segmentation with reduced training time. While using a Deep Learning VM Image, you
receive the following error: The resource 'projects/deeplearning-platform/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80' was not found.
A. Ensure that you have GPU quota in the selected zone.
B. Ensure that the required GPU is available in the selected zone.
C. Ensure that you have preemptible GPU quota in the selected region.
D. Ensure that the selected GPU has enough GPU memory for the workload.
Correct Answer: A
ANS: B
https://cloud.google.com/deep-learning-vm/docs/troubleshooting#resource_not_found
https://cloud.google.com/compute/docs/gpus/gpu-regions-zones
Problem: You are trying to create an instance with one or more GPUs in a region where GPUs are not available (for example, an instance with a K80
GPU in europe-west4-c).
Solution: To determine which region has the required GPU, see GPUs on Compute Engine.
upvoted 21 times
Selected Answer: B
Selected Answer: B
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
B obviously
upvoted 2 times
Selected Answer: B
Selected Answer: A
The question is asking what you should do, not why the error occurred.
The answer should be A: if you get that exception, make sure to check your quota limit before running the job.
upvoted 1 times
https://cloud.google.com/deep-learning-vm/docs/troubleshooting#resource_not_found
upvoted 2 times
Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written. You have a large
You followed the standard 80%-10%-10% data distribution across the training, testing, and evaluation subsets. How should you distribute the
training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?
A. Distribute texts randomly across the train-test-eval subsets: Train set: [TextA1, TextB2, ...] Test set: [TextA2, TextC1, TextD2, ...] Eval set:
B. Distribute authors randomly across the train-test-eval subsets: (*) Train set: [TextA1, TextA2, TextD1, TextD2, ...] Test set: [TextB1, TextB2,
C. Distribute sentences randomly across the train-test-eval subsets: Train set: [SentenceA11, SentenceA21, SentenceB11, SentenceB21,
SentenceC11, SentenceD21 ...] Test set: [SentenceA12, SentenceA22, SentenceB12, SentenceC22, SentenceC12, SentenceD22 ...] Eval set:
D. Distribute paragraphs of texts (i.e., chunks of consecutive sentences) across the train-test-eval subsets: Train set: [SentenceA11,
SentenceA12, SentenceD11, SentenceD12 ...] Test set: [SentenceA13, SentenceB13, SentenceB21, SentenceD23, SentenceC12, SentenceD13
...] Eval set: [SentenceA11, SentenceA22, SentenceB13, SentenceD22, SentenceC23, SentenceD11 ...]
Correct Answer: C
I think since we are predicting political leaning of authors, perhaps distributing authors make more sense? (B)
upvoted 18 times
The model needs to really focus on author-by-author articles rather than derive a single political affiliation from a bunch of mixed articles from
different authors.
https://developers.google.com/machine-learning/crash-course/18th-century-literature
upvoted 12 times
Selected Answer: B
This is the best approach as it ensures that the data is distributed in a way that is representative of the overall population. By randomly distributing
authors across the subsets, we ensure that each subset has a similar distribution of political affiliations. This helps to minimize bias and increases
the likelihood that our model will generalize well to new data.
Distributing texts randomly or by sentences or paragraphs may result in subsets that are biased towards a particular political affiliation. This could
lead to overfitting and poor generalization performance. Therefore, it is important to distribute the data in a way that maintains the overall
distribution of political affiliations across the subsets.
upvoted 3 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
https://cloud.google.com/automl-tables/docs/prepare#split
https://developers.google.com/machine-learning/crash-course/18th-century-literature
upvoted 1 times
Ans B
The model is to predict which political party the author belongs to, not which political party the text belongs to... You do not have the information
of the political party of each text, you are assuming that the texts are associated with the political party of the author.
upvoted 1 times
Selected Answer: B
https://developers.google.com/machine-learning/crash-course/18th-century-literature
Split by authors, otherwise there will be data leakage - the model will get the ability to learn author specific use of language
upvoted 6 times
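A minimal sketch of option B with scikit-learn's GroupShuffleSplit, using toy data: all texts by one author land on the same side of the split, which prevents the model from exploiting author-specific style (the leakage argument above).

from sklearn.model_selection import GroupShuffleSplit

texts = ['TextA1', 'TextA2', 'TextB1', 'TextB2', 'TextC1', 'TextC2']
labels = [0, 0, 1, 1, 0, 0]
authors = ['A', 'A', 'B', 'B', 'C', 'C']   # the grouping key

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, holdout_idx = next(splitter.split(texts, labels, groups=authors))
# A second GroupShuffleSplit over holdout_idx would carve out test vs. eval,
# preserving the 80-10-10 proportion at the author level.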
Based on this video I concluded that the answer is A. What answer B is saying is that you will have Author B's texts in the training set, Author A's
texts in the testing set and Author C's texts in the validation set. According to the video B is incorrect.
We want to have texts from author A in the training, testing, and validation sets. So A is correct. I think most people are choosing B because of
the word "author", but let's be careful.
upvoted 2 times
For example, suppose you are training a model with purchase data from a number of stores. You know, however, that the model will be used
primarily to make predictions for stores that are not in the training data. To ensure that the model can generalize to unseen stores, you should
segregate your data sets by stores. In other words, your test set should include only stores different from the evaluation set, and the evaluation set
should include only stores different from the training set.
https://cloud.google.com/automl-tables/docs/prepare#ml-use
upvoted 4 times
Your team has been tasked with creating an ML solution in Google Cloud to classify support requests for one of your platforms. You analyzed the
requirements and decided to use TensorFlow to build the classifier so that you have full control of the model's code, serving, and deployment. You
will use Kubeflow pipelines for the ML platform. To save time, you want to build on existing resources and use managed services instead of
building a completely new model. How should you build the classifier?
C. Use an established text classification model on AI Platform to perform transfer learning.
D. Use an established text classification model on AI Platform as-is to classify support requests.
Correct Answer: D
ANS: D
https://cloud.google.com/ai-platform/training/docs/algorithms
- to use TensorFlow
- to build on existing resources
- to use managed services
upvoted 11 times
ANS C:
upvoted 2 times
Selected Answer: C
Went with C
upvoted 3 times
Usage of TensorFlow: you can build a simple model by using a sentence embedding and a single-layer classifier.
upvoted 1 times
Selected Answer: D
Selected Answer: C
- "You analyzed the requirements and decided to use TensorFlow" this will make choices to reduce to C and D
- " so that you have full control of the model's code " will make us choose C
upvoted 2 times
Answer is C.
upvoted 1 times
Selected Answer: C
"full control of the model's code, serving, and deployment": Not A nor B.
and "you want to build on existing resources and use managed services": Not D (that's "as-is") You want transfer learning.
upvoted 3 times
You recently joined a machine learning team that will soon release a new project. As a lead on the project, you are asked to determine the
production readiness of the ML components. The team has already tested features and data, model development, and infrastructure. Which additional readiness check should you recommend to the team?
A. Ensure that training is reproducible.
B. Ensure that all hyperparameters are tuned.
C. Ensure that model performance is monitored.
Correct Answer: A
I think it should be C
upvoted 20 times
Selected Answer: C
Monitoring is crucial. So - C
upvoted 2 times
Went with C
upvoted 1 times
I'll go with C.
Monitoring model performance is an important aspect of production readiness. It allows the team to detect and respond to changes in
performance that may affect the quality of the model. The other options are also important, but they are more focused on the development phase
of the project rather than the production phase.
upvoted 1 times
It would be ridiculous to launch a model into production without any plan for monitoring. Would you launch the model into production for a while and
make a plan for model performance monitoring later? That is too reckless.
I think the team already ensured that all hyperparameters were tuned when they tested features; it is more important that they ensure that model
performance is monitored than that training is reproducible, for best practices.
https://cloud.google.com/architecture/ml-on-gcp-best-practices
upvoted 1 times
Reproducible training is more likely to belong in the deployment step, in that the question says "The team has already tested features and data,
model development"; but the question focuses on production readiness:
https://developers.google.com/machine-learning/testing-debugging/pipeline/production
The Monitoring section is part of the link above.
upvoted 1 times
C, for me.
upvoted 1 times
Selected Answer: C
It's mentioned that the team has already tested features and data, implying that data generation is reproducible. If you have to test features, data
has to be reproducible to compare model outputs (https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/randomization).
Hence C makes more sense.
upvoted 2 times
Selected Answer: C
https://cloud.google.com/ai-platform/docs/ml-solutions-overview
upvoted 1 times
Selected Answer: C
With the specific focus on "production readiness" as stated, I'd pick C above the others.
upvoted 2 times
Selected Answer: A
"production readiness" means that we are still in dev-test phase , and "performance
monitoring" happens in production , and what if monitoring is applied but the model re-train is difficult , so "A" is the best answer
upvoted 1 times
You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables.
You need to prioritize detection of fraudulent transactions while minimizing false positives. Which optimization objective should you use when
training the model?
A. An optimization objective that minimizes log loss
B. An optimization objective that maximizes the precision at a recall value of 0.50
C. An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value
D. An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value
Correct Answer: C
https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc
upvoted 20 times
D - https://en.wikipedia.org/wiki/Receiver_operating_characteristic
upvoted 8 times
In contrast, the AUC PR curve focuses on the trade-off between precision (Y-axis) and recall (X-axis), making it more suitable for imbalanced
datasets and applications with different costs for false positives and false negatives, like credit card fraud detection.
upvoted 2 times
In the case of credit card fraud detection, the class distribution is typically imbalanced (fewer fraudulent transactions compared to non-
fraudulent ones), and the cost of false positives (incorrectly identifying a transaction as fraudulent) and false negatives (failing to detect a
fraudulent transaction) are not the same.
By maximizing the AUC PR (area under the precision-recall curve), the model focuses on the trade-off between precision (proportion of true
positives among predicted positives) and recall (proportion of true positives among actual positives), which is more relevant in imbalanced
datasets and for applications where the costs of false positives and false negatives are not equal. This makes option C a better choice for credit
card fraud detection.
upvoted 2 times
Selected Answer: C
In fraud detection, it's crucial to minimize false positives (transactions flagged as fraudulent but are actually legitimate) while still detecting as many
fraudulent transactions as possible. AUC PR is a suitable optimization objective for this scenario because it provides a balanced trade-off between
precision and recall, which are both important metrics in fraud detection. A high AUC PR value indicates that the model has high precision and
recall, which means it can detect a large number of fraudulent transactions while minimizing false positives.
Log loss (A) and AUC ROC (D) are also commonly used optimization objectives in machine learning, but they may not be as effective in this
particular scenario. Precision at a Recall value of 0.50 (B) is a specific metric and not an optimization objective.
upvoted 4 times
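A rough sketch of setting this objective, assuming the v1beta1 AutoML Tables Python client; the project, dataset, and training budget are placeholders:

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project='my-project', region='us-central1')

# 'MAXIMIZE_AU_PRC' optimizes the area under the precision-recall curve,
# which suits the rare-positive fraud class.
client.create_model(
    'fraud_detector',
    dataset_display_name='fraud_transactions',
    train_budget_milli_node_hours=1000,
    optimization_objective='MAXIMIZE_AU_PRC',
)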
Selected Answer: C
Went with C
upvoted 1 times
Hi everyone,
I discovered some clues that this question likely refers to the last section of
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
This is what it tries to tell us especially with the last sentence
Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives,
it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize
minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization.
Additionally, the link below tells me which of the choices is the answer to this question:
https://cloud.google.com/automl-tables/docs/train#opt-obj
upvoted 1 times
Selected Answer: D
What is different however is that ROC AUC looks at a true positive rate TPR and false positive rate FPR while PR AUC looks at positive predictive
value PPV and true positive rate TPR.
https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc
upvoted 1 times
Selected Answer: C
AUC PR: Optimize results for predictions for the less common class.
It is straightforward to answer; you just have to catch the key word (almost balanced vs. imbalanced).
https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/
ROC curves should be used when there are roughly equal numbers of observations for each class.
Precision-Recall curves should be used when there is a moderate to large class imbalance.
upvoted 3 times
Selected Answer: C
Selected Answer: C
ans: C
Paul_Dirac and giaZ are correct.
upvoted 1 times
C
https://towardsdatascience.com/on-roc-and-precision-recall-curves-c23e9b63820c
upvoted 2 times
Selected Answer: D
D
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
C optimizes precision only
upvoted 1 times
Answer is c.
upvoted 1 times
AUC PR: Optimize results for predictions for the less common class.
upvoted 3 times
Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which
newly uploaded videos will be the most popular so that those videos can be prioritized on your company's website. Which result should you use to determine whether your model is successful?
A. The model predicts videos as popular if the user who uploads them has over 10,000 likes.
B. The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.
C. The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.
D. The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.
Correct Answer: C
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
Must be C
upvoted 1 times
Selected Answer: C
Watch time, among all the options, is the best KPI to rely on.
upvoted 2 times
D is wrong.
Pearson's Correlation Coefficient is a linear correlation coefficient that returns a value of between -1 and +1.
A -1 means there is a strong negative correlation
+1 means that there is a strong positive correlation
0 means that there is no correlation
upvoted 3 times
You are working on a Neural Network-based project. The dataset provided to you has columns with different ranges. While preparing the data for
model training, you discover that gradient optimization is having difficulty moving weights to a good solution. What should you do?
A. Use feature construction to combine the strongest features.
B. Use the representation transformation (normalization) technique.
C. Improve the data cleaning step by removing features with missing values.
D. Change the partitioning step to reduce the dimension of the test set and have a larger training set.
Correct Answer: C
Vote for B. We could impute instead of removing the column to avoid loss of information.
upvoted 25 times
I also think it is B:
"The presence of feature value X in the formula will affect the step size of the gradient descent. The difference in ranges of features will cause
different step sizes for each feature. To ensure that the gradient descent moves smoothly towards the minima and that the steps for gradient
descent are updated at the same rate for all the features, we scale the data before feeding it to the model."
upvoted 8 times
Selected Answer: B
B - The key phrase is "different ranges", therefore we need to normalize the values.
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
Normalization
upvoted 1 times
Selected Answer: B
Selected Answer: C
B
"Normalization" is the keyword
upvoted 1 times
Selected Answer: B
normalization
https://developers.google.com/machine-learning/data-prep/transform/transform-numeric
upvoted 4 times
Normalization changes the values of a dataset's numeric fields to a common scale, without distorting differences in the ranges of values.
Normalization is required only when features have different ranges.
upvoted 4 times
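A minimal sketch of that normalization step, with illustrative data (not from the question): two features with wildly different ranges are rescaled so gradient descent takes comparable steps in each dimension.

```python
import numpy as np

# Two features with very different ranges.
X = np.array([[0.001, 5000.0],
              [0.002, 9000.0],
              [0.003, 1000.0]])

# Z-score normalization: zero mean and unit variance per column, so the
# gradient updates for both weights are on a comparable scale.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm)
```

The same transformation is available as a Keras preprocessing layer (tf.keras.layers.Normalization) if the pipeline is built in TensorFlow.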
Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the
accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?
A. Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.
B. Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
C. Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the
Monitoring API.
D. Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.
Correct Answer: B
ANS: A
https://codelabs.developers.google.com/codelabs/cloud-kubeflow-pipelines-gis
Kubeflow Pipelines (KFP) helps solve these issues by providing a way to deploy robust, repeatable machine learning pipelines along with
monitoring, auditing, version tracking, and reproducibility. Cloud AI Pipelines makes it easy to set up a KFP installation.
upvoted 12 times
Selected Answer: A
The old answer is A. The new answer (not in the options) would be Vertex AI Experiments, which comes with a monitoring API built in.
https://cloud.google.com/blog/topics/developers-practitioners/track-compare-manage-experiments-vertex-ai-experiments
upvoted 9 times
Selected Answer: C
Selected Answer: A
I agree with tavva_prudhvi that cloud monitoring is not the best option to do machine learning tracking, Metadata is a better option for that
purpose
upvoted 1 times
Option C suggests using AI Platform Training to execute the experiments and write the accuracy metrics to Cloud Monitoring. While Cloud
Monitoring can be used to monitor and collect metrics from various services in Google Cloud, it is not specifically designed for machine learning
experiments tracking.
Using Cloud Monitoring for tracking machine learning experiments may not provide the same level of functionality and flexibility as Kubeflow
Pipelines or AI Platform Training. Additionally, querying the results from Cloud Monitoring may not be as straightforward as using the APIs
provided by Kubeflow Pipelines or AI Platform Training.
Therefore, while Cloud Monitoring can be used as a general-purpose monitoring solution, it may not be the best option for tracking and reporting
machine learning experiments.
upvoted 2 times
Went with A
upvoted 2 times
Selected Answer: B
It is B
upvoted 1 times
Selected Answer: C
Selected Answer: C
I like C
https://cloud.google.com/monitoring/mql
upvoted 1 times
Kubeflow Pipelines already has an experiment tracking API, so A is correct. B is also valid, but the question states "minimizing manual effort".
upvoted 2 times
Selected Answer: A
I think A is correct, unless we are using BigQuery ML to create our models, in which case we can select C.
upvoted 2 times
> "Kubeflow Pipelines supports the export of scalar metrics. You can write a list of metrics to a local file to describe the performance of the model.
The pipeline agent uploads the local file as your run-time metrics. You can view the uploaded metrics as a visualization in the Runs page for a
particular experiment in the Kubeflow Pipelines UI."
https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/
upvoted 4 times
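A minimal sketch of the legacy KFP scalar-metrics pattern the quote describes, assuming the KFP v1 convention of writing a metrics JSON file that the pipeline agent uploads; the metric name and value are placeholders.

```python
import json

def export_metrics(accuracy: float, path: str = "/mlpipeline-metrics.json"):
    # The pipeline agent picks this file up and surfaces the values in the
    # Runs page and through the Kubeflow Pipelines API.
    metrics = {
        "metrics": [
            {
                "name": "accuracy-score",   # displayed metric name
                "numberValue": accuracy,    # scalar value to track over runs
                "format": "PERCENTAGE",
            }
        ]
    }
    with open(path, "w") as f:
        json.dump(metrics, f)

export_metrics(0.954, path="mlpipeline-metrics.json")  # local demo path
```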
You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are
identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?
C. Oversample the fraudulent transactions in your training set.
Correct Answer: C
Reference:
https://towardsdatascience.com/how-to-build-a-machine-learning-model-to-identify-credit-card-fraud-in-5-steps-a-hands-on-modeling-5140b3bd19f1
C - https://swarit.medium.com/detecting-fraudulent-consumer-transactions-through-machine-learning-25b1f2cabbb4
upvoted 13 times
Selected Answer: C
C is the answer
upvoted 5 times
Selected Answer: C
Oversampling increases the number of fraudulent transactions in the training data, enabling the model to learn how to predict them.
upvoted 1 times
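A minimal oversampling sketch with scikit-learn's resample; the toy DataFrame and column names are illustrative only.

```python
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({
    "amount":   [10, 15, 12, 9000, 14, 11, 8500, 13],
    "is_fraud": [0, 0, 0, 1, 0, 0, 1, 0],
})

majority = df[df.is_fraud == 0]
minority = df[df.is_fraud == 1]

# Sample the fraudulent rows with replacement until the classes balance.
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)
print(balanced.is_fraud.value_counts())
```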
C - Even though most similar questions propose downsampling the majority (non-fraudulent) class and adding weights to it.
upvoted 1 times
Went with C
upvoted 2 times
Selected Answer: C
ans: C
C
https://medium.com/analytics-vidhya/credit-card-fraud-detection-how-to-handle-imbalanced-dataset-1f18b6f881
upvoted 1 times
Selected Answer: C
You are using transfer learning to train an image classifier based on a pre-trained EfficientNet model. Your training dataset has 20,000 images.
You plan to retrain the model once per day. You need to minimize the cost of infrastructure. What platform components and configuration
environment should you use for this task?
A. A Deep Learning VM with 4 V100 GPUs and local storage
B. A Deep Learning VM with 4 V100 GPUs and Cloud Storage
C. A Google Kubernetes Engine cluster with a V100 GPU Node Pool and an NFS Server
D. An AI Platform Training job using a custom scale tier with 4 V100 GPUs and Cloud Storage
Correct Answer: C
Selected Answer: D
ans: D
A, C => local storage, NFS... discarded. Google encourages you to use Cloud Storage.
B => could do the job, but here I would focus on the "daily training" thing, because Vertex AI Training jobs are better for this. Also, I think
Google usually encourages using Vertex AI over VMs.
upvoted 9 times
Selected Answer: D
Selected Answer: A
Selected Answer: A
I think it is A. Refer to Q20 of the GCP Sample Questions - they say managed services (such as Kubeflow Pipelines / Vertex AI) are not the options
for 'minimizing costs'. In this case, you should configure your own infrastructure to train the model leaving A,B. Undecided between A,B because A
would minimize costs, but also result in inefficient I/O operations during training.
upvoted 2 times
Selected Answer: D
The pre-trained EfficientNet model can be easily loaded from Cloud Storage, which eliminates the need for local storage or an NFS server. Using AI
Platform Training allows for the automatic scaling of resources based on the size of the dataset, which can save costs compared to using a fixed-
size VM or node pool. Additionally, the ability to use custom scale tiers allows for fine-tuning of resource allocation to match the specific needs of
the training job.
upvoted 2 times
Selected Answer: D
Went with D
upvoted 1 times
For this scenario, a Deep Learning VM with 4 V100 GPUs and Cloud Storage is likely the most cost-effective solution while still providing sufficient
computing resources for the model training. Using Cloud Storage can allow the model to be trained and the data to be stored in a scalable and
cost-effective way.
Option A, using a Deep Learning VM with local storage, may not provide enough storage capacity to store the training data and model
checkpoints. Option C, using a Kubernetes Engine cluster, can be overkill for the size of the job and adds additional complexity. Option D, using an
AI Platform Training job, is a good option as it is designed for running machine learning jobs at scale, but may be more expensive than a Deep
Learning VM with Cloud Storage.
upvoted 2 times
Selected Answer: D
Selected Answer: D
it seems D
upvoted 3 times
Selected Answer: D
I think it's D
upvoted 2 times
It's D
upvoted 2 times
While conducting an exploratory analysis of a dataset, you discover that categorical feature A has substantial predictive power, but it is sometimes missing. What should you do?
A. Drop feature A if more than 15% of values are missing. Otherwise, use feature A as-is.
B. Compute the mode of feature A and then use it to replace the missing values in feature A.
C. Replace the missing values with the values of the feature with the highest Pearson correlation with feature A.
D. Add an additional class to categorical feature A for missing values. Create a new binary feature that indicates whether feature A is missing.
Correct Answer: A
Selected Answer: D
ans: D
A => no, you don't want to drop a feature with high predictive power.
B => I think this could confuse the model... a better solution could be to fill missing values using an algorithm like Expectation Maximization, but
using the mode is a bad idea in this case, because if you have a significant number of missing values (for example >10%) this would modify
the "predictive power". You don't want to lose the predictive power of a feature, just guide the model to learn when to use that feature and when to
ignore it.
C => this doesn't make any sense to me; not sure why I would do that.
D => I think this could be a really good approach, and I'm pretty sure it would work well with a lot of models. The model would learn that when
"is_available_feat_A" == True it should use feature A, but whenever it is missing it would try to use other features.
upvoted 13 times
Selected Answer: B
Google encourages filling missing values, and using the mode is one of the examples given. D only tells the model the obvious: the data is missing!
upvoted 1 times
Selected Answer: D
Selected Answer: D
highly predictive
upvoted 1 times
Selected Answer: B
By imputing the missing values with the mode (the most frequent value), you retain the original feature's predictive power while handling the
missing values
upvoted 1 times
Both B and D are possible, but the correct answer is D because of the feature high predictive power.
upvoted 2 times
Went with D
upvoted 1 times
When a categorical feature has substantial predictive power, it is important not to discard it. Instead, missing values can be handled by
adding an additional class for missing values and creating a new binary feature that indicates whether feature A is missing or not. This approach
ensures that the predictive power of feature A is retained while accounting for missing values. Computing the mode of feature A and replacing
missing values may distort the distribution of the feature and create bias in the analysis. Similarly, replacing missing values with values from
another feature may introduce noise and lead to incorrect results.
upvoted 2 times
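A minimal pandas sketch of option D, with an illustrative column name: the missing values become their own class, and a binary indicator records where they were.

```python
import pandas as pd

df = pd.DataFrame({"feature_a": ["red", None, "blue", "red", None]})

# New binary feature: was feature A missing for this row?
df["feature_a_missing"] = df["feature_a"].isna().astype(int)
# Additional class so the model can treat "missing" as its own category.
df["feature_a"] = df["feature_a"].fillna("MISSING")

print(df)
```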
If our objective was to produce a complete dataset then we might use some average value to fill in the gaps (option B) but in this case we want to
predict an outcome, so inventing our own data is not going to help in my view.
Option D is the most sensible approach to let the model choose the best features.
upvoted 1 times
Selected Answer: B
B
"For categorical variables, we can usually replace missing values with mean, median, or most frequent values"
Dr. Logan Song - Journey to Become a Google Cloud Machine Learning Engineer - Page 48
upvoted 4 times
I agree with B
upvoted 3 times
You work for a large retailer and have been asked to segment your customers by their purchasing habits. The purchase history of all customers
has been uploaded to BigQuery. You suspect that there may be several distinct customer segments, however you are unsure of how many, and you
don’t yet understand the commonalities in their behavior. You want to find the most efficient solution. What should you do?
A. Create a k-means clustering model using BigQuery ML. Allow BigQuery to automatically optimize the number of clusters.
B. Create a new dataset in Dataprep that references your BigQuery table. Use Dataprep to identify similarities within each column.
C. Use the Data Labeling Service to label each customer record in BigQuery. Train a model on your labeled data using AutoML Tables. Review
the evaluation metrics to understand whether there is an underlying pattern in the data.
D. Get a list of the customer segments from your company’s Marketing team. Use the Data Labeling Service to label each customer record in
BigQuery according to the list. Analyze the distribution of labels in your dataset using Data Studio.
Correct Answer: B
Selected Answer: A
Selected Answer: A
Went with A
upvoted 3 times
Selected Answer: A
when to use k-means: Your data may contain natural groupings or clusters of data. You may want to identify these groupings descriptively in order
to make data-driven decisions. For example, a retailer may want to identify natural groupings of customers who have similar purchasing habits or
locations. This process is known as customer segmentation.
https://cloud.google.com/bigquery/docs/kmeans-tutorial
upvoted 2 times
Selected Answer: A
A
https://cloud.google.com/bigquery-ml/docs/kmeans-tutorial
https://towardsdatascience.com/how-to-use-k-means-clustering-in-bigquery-ml-to-understand-and-describe-your-data-better-c972c6f5733b
upvoted 3 times
Selected Answer: A
A => BigQuery ML is compatible with k-means clustering, it's easy and efficient to create, and it would automatically detect the number of clusters.
Also from the BigQuery ML docs: "K-means clustering for data segmentation; for example, identifying customer segments."
(Source: https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in)
upvoted 4 times
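A minimal sketch of option A through the BigQuery Python client; the dataset, table, and column names are hypothetical, and omitting NUM_CLUSTERS leaves the choice of cluster count to BigQuery ML.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a k-means model directly over the purchase history table.
client.query("""
CREATE OR REPLACE MODEL `my_dataset.customer_segments`
OPTIONS(model_type = 'kmeans') AS
SELECT total_spend, order_count, days_since_last_purchase
FROM `my_dataset.purchase_history`
""").result()  # waits for training to finish

# Assign each customer to a segment.
rows = client.query("""
SELECT centroid_id, customer_id
FROM ML.PREDICT(MODEL `my_dataset.customer_segments`,
                TABLE `my_dataset.purchase_history`)
""").result()
```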
We can use k-means directly in BigQuery, so I think it's "the most efficient way".
Labeling is not a good option since we don't really know what makes a customer similar to another, and why use Dataprep if we can use BigQuery directly?
upvoted 3 times
You recently designed and built a custom neural network that uses critical dependencies specific to your organization’s framework. You need to
train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI
Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the
scheduler, workers, and servers distribution structure. What should you do?
C. Build your custom containers to run distributed training jobs on AI Platform Training.
D. Reconfigure your code to a ML framework with dependencies that are supported by AI Platform Training.
Correct Answer: D
Selected Answer: C
Answer C. By running your machine learning (ML) training job in a custom container, you can use ML frameworks, non-ML dependencies, libraries,
and binaries that are not otherwise supported on Vertex AI.
The model and data are too large to fit in memory on a single machine, hence distributed training jobs.
https://cloud.google.com/vertex-ai/docs/training/containers-overview
upvoted 5 times
Selected Answer: C
This allows using external dependencies, and distributed training will solve the memory issues.
upvoted 1 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
ans: C
I think it's C
upvoted 1 times
While monitoring your model training’s GPU utilization, you discover that you have a native synchronous implementation. The training data is split
into multiple files. You want to reduce the execution time of your input pipeline. What should you do?
D. Use the interleave option for reading data.
Correct Answer: A
Selected Answer: D
It's D
https://www.tensorflow.org/guide/data_performance
upvoted 6 times
Selected Answer: D
"training data split into multiple files", "reduce the execution time of your input pipeline" -> Parallel interleave
upvoted 1 times
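A minimal tf.data sketch of parallel interleave, assuming a hypothetical set of sharded TFRecord files: several files are read concurrently instead of one after another.

```python
import tensorflow as tf

files = tf.data.Dataset.list_files("gs://my-bucket/train-*.tfrecord")

dataset = (
    files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=4,                       # number of files read at once
        num_parallel_calls=tf.data.AUTOTUNE,  # parallel instead of synchronous reads
    )
    .prefetch(tf.data.AUTOTUNE)               # overlap input with training
)
```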
Selected Answer: D
Went with D
upvoted 1 times
Selected Answer: D
I think it's D
upvoted 2 times
Your data science team is training a PyTorch model for image classification based on a pre-trained ResNet model. You need to perform
hyperparameter tuning to optimize for several parameters. What should you do?
A. Convert the model to a Keras model, and run a Keras Tuner job.
B. Run a hyperparameter tuning job on AI Platform using custom containers.
C. Create a Kubeflow Pipelines instance, and run a hyperparameter tuning job on Katib.
D. Convert the model to a TensorFlow model, and run a hyperparameter tuning job on AI Platform.
Correct Answer: C
Selected Answer: B
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
C:
Don't waste your time converting to another framework; you can absolutely use it in a custom container.
https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-train-and-tune-pytorch-models-vertex-ai
upvoted 3 times
Selected Answer: B
ans: B
You have a large corpus of written support cases that can be classified into 3 separate categories: Technical Support, Billing Support, or Other
Issues. You need to quickly build, test, and deploy a service that will automatically classify future written requests into one of the categories. How should you configure the pipeline?
A. Use the Cloud Natural Language API to obtain metadata to classify the incoming cases.
B. Use AutoML Natural Language to build and test a classifier. Deploy the model as a REST API.
C. Use BigQuery ML to build and test a logistic regression model to classify incoming requests. Use BigQuery ML to perform inference.
D. Create a TensorFlow model using Google’s BERT pre-trained model. Build and test a classifier, and deploy the model using Vertex AI.
Correct Answer: B
Selected Answer: B
ans: B
B => AutoML is easier and faster and "you need to quickly build, test, and deploy". Also the REST API part fits our use case.
upvoted 7 times
Went with B
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
I think it's B, but I don't understand why it doesn't suggest deploying the model on Vertex AI instead of as a REST API.
upvoted 1 times
Selected Answer: B
Selected Answer: B
B
wish0035 explained
upvoted 3 times
Selected Answer: B
You need to quickly build and train a model to predict the sentiment of customer reviews with custom categories without writing code. You do not
have enough data to train a model from scratch. The resulting model should have high predictive performance. Which service should you use?
A. AutoML Natural Language
B. Cloud Natural Language API
Correct Answer: A
Selected Answer: B
AutoML does not have transfer learning capabilities as of now. Given that there is
not enough data to train from scratch, B is the only option that makes sense.
upvoted 1 times
This is a suitable job for AutoML; it uses transfer learning when there is little data for training.
upvoted 1 times
Selected Answer: A
AutoML Natural Language is designed to work well even with relatively small datasets. It uses transfer learning and other techniques to train
models effectively on limited data, which is crucial since there isn't enough data to train a model from scratch.
upvoted 3 times
Custom models and custom categories call for AutoML Natural Language; it would still work with less data.
upvoted 1 times
Went with A
upvoted 1 times
Selected Answer: A
If you do not have enough data to train a model from scratch, then it may be more appropriate to use a pre-trained model or a pre-made Jupyter
Notebook.
Option B, the Cloud Natural Language API, could still be a viable option if you have access to labeled data for sentiment analysis. The API provides
pre-trained models for sentiment analysis that you can use to classify text. However, if you have custom categories or labels, then you would need
to train a custom model, which may not be feasible with limited data.
upvoted 4 times
Selected Answer: A
https://www.toptal.com/machine-learning/google-nlp-tutorial
In this case need custom categories without writing code
upvoted 2 times
Quickly ==> A or B. Custom categories + "you do not have enough data to train a model" (which doesn't mean no data at all; there will probably
be a few samples, say 10) - see https://cloud.google.com/natural-language/automl/docs/beginners-guide#include-enough-labeled-examples-in-each-category
==> A
upvoted 2 times
Selected Answer: B
Quickly ==> A or B. Custom categories + "you do not have enough data to train a model" (which doesn't mean no data at all; there will probably
be a few samples, say 10)
==> B
upvoted 1 times
Selected Answer: A
A
wish0035 explained
upvoted 1 times
ans: A
A => AutoML can train with very little data ("The bare minimum required by AutoML Natural Language for training is 10 text examples per
category/label"), as seifou says it will probably use transfer learning behind the scenes.
upvoted 4 times
You need to build an ML model for a social media application to predict whether a user’s submitted profile photo meets the requirements. The
application will inform the user if the picture meets the requirements. How should you build a model to ensure that the application does not
falsely accept a non-compliant picture?
A. Use AutoML to optimize the model's recall in order to minimize false negatives.
B. Use AutoML to optimize the model's F1 score in order to balance the accuracy of false positives and false negatives.
C. Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that meet the profile photo requirements.
D. Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that do not meet the profile photo requirements.
Correct Answer: C
Selected Answer: A
The answer is A. The negative event is usually labeled as positive (e.g., fraud detection, customer default prediction, and here non-compliant
picture identification). The question explicitly says, "ensure that the application does not falsely accept a non-compliant picture." So we should
avoid falsely labeling a non-compliant image as compliant (negative).
It is never mentioned in the question that false positives are also a concern. So, recall is better than F1-score for this problem.
upvoted 11 times
Selected Answer: B
Gonna go with B on this one. Tricky question, but since reducing false positives is the goal here, only B fits that requirement.
upvoted 1 times
I went with A.
upvoted 2 times
B.
A non-compliant picture is the positive, not the negative. What the question is asking is to decrease the number of false positives ("falsely
labeled as non-compliant"), which is achieved through optimizing for precision, not recall. Since C and D sound a bit overkill, I would go for the
one that prioritizes false positives, which is B.
upvoted 1 times
Minimizing false positives is the goal here, which calls for precision. Since precision is not among the options, the next best is the F1 score,
which is the harmonic mean of precision and recall. Although it won't fully address false positives, it at least won't skew towards recall, which
would mean more false positives and deviate from the goal. Hence B.
upvoted 1 times
We should optimize for precision to minimize false positives, so optimizing for recall should be incorrect. F1 Score will balance both precision and
recall. Both B and C might not necessarily meet the goal
upvoted 1 times
Selected Answer: A
I vote B
upvoted 1 times
Selected Answer: A
Selected Answer: B
B should be correct. It covers not only the recall but also the precision
upvoted 1 times
Selected Answer: A
Selected Answer: B
In this scenario, it is important to balance the accuracy of false positives (where a non-compliant picture is accepted) and false negatives (where a
compliant picture is rejected). By optimizing the F1 score, the model will find the best balance between precision and recall, which will help reduce
both false positives and false negatives. This will ensure that the application doesn't falsely accept a non-compliant picture.
upvoted 2 times
B. Use AutoML to optimize the model's F1 score in order to balance the accuracy of false positives and false negatives. This ensures that the
model is not overly biased towards accepting or rejecting pictures and provides a balanced approach to handling both types of errors. However,
if the priority is strongly weighted towards not accepting non-compliant pictures, then D could be the better approach, as it would likely
improve the model's ability to correctly identify non-compliant pictures.
upvoted 1 times
B. Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives.
Can't be A. The sentence "ensure that the application does not falsely accept a non-compliant picture" states that we don't want false positives (so
I'm less concerned about false negatives). A false positive would be one classified as compliant when it is not compliant. I interpret the positive
class as the one that "meets the requirements", as stated as well.
upvoted 1 times
You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level
TensorFlow APIs on AI Platform with GPUs. Your team usually takes a few weeks or months to iterate on a new version of a model. You were
recently asked to review your team’s spending. How should you reduce your Google Cloud compute costs without impacting the model’s
performance?
A. Use AI Platform to run distributed training jobs with checkpoints.
B. Use AI Platform to run distributed training jobs without checkpoints.
C. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.
D. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.
Correct Answer: D
Selected Answer: C
https://cloud.google.com/blog/products/ai-machine-learning/reduce-the-costs-of-ml-workflows-with-preemptible-vms-and-gpus?hl=en
upvoted 9 times
Selected Answer: C
Preemptible VMs are cheaper, and checkpoints let training resume after a VM is preempted.
upvoted 2 times
Selected Answer: C
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview
AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI
Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result. You can then use this information to
verify that the model is behaving as expected, recognize bias in your models, and get ideas for ways to improve your model and your training data.
upvoted 1 times
Preemptible VMs are valid for only 24 hours, and training takes weeks or months to complete, as mentioned in the question; that makes A the answer.
upvoted 2 times
Selected Answer: A
It's A.
upvoted 2 times
Selected Answer: C
It's seem C
- https://www.kubeflow.org/docs/distributions/gke/pipelines/preemptible/
- https://cloud.google.com/optimization/docs/guide/checkpointing
upvoted 4 times
Selected Answer: C
C - Reduce cost with preemptible instances and add checkpoints to snapshot intermediate results.
upvoted 3 times
Selected Answer: A
You need to train a regression model based on a dataset containing 50,000 records that is stored in BigQuery. The data includes a total of 20
categorical and numerical features with a target variable that can include negative values. You need to minimize effort and training time while
maximizing model performance. What approach should you take to train this regression model?
A. Create a custom TensorFlow DNN model.
B. Use BQML XGBoost regression to train the model.
C. Use AutoML Tables to train the model without early stopping.
D. Use AutoML Tables to train the model with RMSLE as the optimization objective.
Correct Answer: A
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
Ans B.
C --> No early stopping means longer training time.
D --> The RMSLE metric needs non-negative Y values.
upvoted 3 times
Selected Answer: B
B and C are the most likely because of the regression approach, but RMSLE does not allow negative labels for training; see
https://cloud.google.com/automl-tables/docs/evaluate#evaluation_metrics_for_regression_models
RMSLE: The root-mean-squared logarithmic error metric is similar to RMSE, except that it uses the natural logarithm of the predicted and actual
values plus 1. RMSLE penalizes under-prediction more heavily than over-prediction. It can also be a good metric when you don't want to penalize
differences for large prediction values more heavily than for small prediction values. This metric ranges from zero to infinity; a lower value indicates
a higher quality model.
The RMSLE evaluation metric is returned only if all label and predicted values are non-negative.
upvoted 1 times
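A minimal sketch of RMSLE showing why non-negative labels are required: log1p of a value below -1 is undefined, so a negative target produces NaN. The arrays are illustrative.

```python
import numpy as np

def rmsle(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Root-mean-squared logarithmic error: RMSE computed on log(1 + y).
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

print(rmsle(np.array([3.0, 5.0]), np.array([2.5, 6.0])))   # well defined
print(rmsle(np.array([-2.0, 5.0]), np.array([2.5, 6.0])))  # nan: log1p(-2) is undefined
```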
Selected Answer: D
BQML XGBoost ==> you need SQL knowledge to write the statement, and B doesn't mention how to get maximum performance. Meanwhile, with
AutoML you just click and select to get it done, and D specifies an optimization objective for maximizing model performance. You can
literally minimize effort.
upvoted 1 times
Using AutoML Tables to train the model can be a convenient and efficient way to minimize effort and training time while still maximizing model
performance. In this case, using RMSLE as the optimization objective can be a good choice because it is a good fit for regression models with
negative values in the target variable.
upvoted 2 times
B is correct
upvoted 3 times
B is correct
upvoted 1 times
You are building a linear model with over 100 input features, all with values between –1 and 1. You suspect that many features are non-
informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which
technique should you use?
A. Use principal component analysis (PCA) to eliminate the least informative features.
B. Use L1 regularization to reduce the coefficients of uninformative features to 0.
C. After building your model, use Shapley values to determine which features are the most informative.
D. Use an iterative dropout technique to identify which features do not degrade the model when removed.
Correct Answer: B
Selected Answer: B
Selected Answer: B
Went with B
upvoted 1 times
Went with B
upvoted 1 times
Selected Answer: B
L1 regularization penalises weights in proportion to the sum of the absolute value of the weights. L1 regularization helps drive the weights of
irrelevant or barely relevant features to exactly 0. A feature with a weight of 0 is effectively removed from the model.
https://developers.google.com/machine-learning/glossary#L1_regularization
upvoted 1 times
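A minimal Keras sketch of that idea, assuming illustrative sizes: an L1 penalty on a linear layer drives the weights of non-informative inputs to exactly 0 while leaving informative ones in their original form.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),  # 100 features, each in [-1, 1]
    tf.keras.layers.Dense(
        1,
        kernel_regularizer=tf.keras.regularizers.l1(0.01),  # L1 penalty
    ),
])
model.compile(optimizer="adam", loss="mse")

# After training, inputs whose weights were driven to ~0 are effectively
# removed from the model and can be dropped from the feature set:
# dense_weights = model.layers[-1].get_weights()[0]
```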
Selected Answer: B
It's the best way, because you remove non-relevant (in this case, non-informative) features.
upvoted 1 times
Selected Answer: C
Answer C:
In the official sample questions, there's a similar question; the explanation is that L1 is for reducing overfitting while explainability (Shapley) is for
feature selection, hence C.
https://docs.google.com/forms/d/e/1FAIpQLSeYmkCANE81qSBqLW0g2X7RoskBX9yGYQu-m1TtsjMvHabGqg/viewform
upvoted 3 times
Selected Answer: A
The features must be removed from the model. They are not removed when doing L1 regularization. PCA is used prior to training.
upvoted 2 times
Selected Answer: B
Agree with B
upvoted 2 times
You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior
is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data,
but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform
this validation?
A. Use the TFX ModelValidator tools to specify performance metrics for production readiness.
B. Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production.
C. Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data.
D. Use the entire dataset and treat the area under the receiver operating characteristics curve (AUC ROC) as the main metric.
Correct Answer: B
https://www.tensorflow.org/tfx/guide/evaluator
upvoted 12 times
Selected Answer: C
Selected Answer: A
Evaluator TFX lets you evaluate the performance on different subsets of data https://www.tensorflow.org/tfx/guide/evaluator
upvoted 2 times
The Evaluator TFX pipeline component performs deep analysis on the training results for your models, to help you understand how your model
performs on subsets of your data.
upvoted 2 times
Selected Answer: A
I prefer A to C because 1 week of data may be insufficient to generalize the model and could lead to overfitting on the validation subset.
upvoted 3 times
option C provides a streamlined and reliable approach that focuses on evaluating the model's performance on the most relevant and recent data,
which is essential for predicting out-of-stock events in a dynamic retail setting.
upvoted 1 times
Selected Answer: A
Either A or C, but C uses only the last week, which is not the same as specific subsets of data.
upvoted 1 times
Selected Answer: C
Option C, because it allows you to track your model's performance on the most *recent* data, which is the most relevant data for predicting
stockout risk. Given that the preferences are dynamic, the most important thing is that the model WORKS correctly with the newest data
upvoted 1 times
Selected Answer: A
The answer is A. Performance on specific subsets of data before pushing to production == TFX ModelValidator with custom performance metrics
for production readiness.
C is wrong because performance in the last relevant week of data != performance on specific subsets of data.
upvoted 1 times
I will go for A. I don't think the aim of the question is to test whether candidates know that a component is deprecated. Note that
ModelValidator has been fused with Evaluator, so we can imagine the question would have been updated in recent exams. Evaluator enables
testing on specific subsets with the metrics we want, then indicates to the Pusher component to push the new model to production if the "model is
good enough". This would make the pipeline quite streamlined (https://www.tensorflow.org/tfx/guide/evaluator).
Went with C
upvoted 1 times
A is deprecated... so C.
upvoted 1 times
Went with A
upvoted 2 times
Selected Answer: A
TFX ModelValidator
upvoted 1 times
You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an "Out of Memory" error. What should you do?
B. Send the request again with a smaller batch of instances.
Correct Answer: C
Selected Answer: B
B is the answer
429 - Out of Memory
https://cloud.google.com/ai-platform/training/docs/troubleshooting
upvoted 20 times
Selected Answer: B
By reducing the batch size of instances sent for prediction, you decrease the memory footprint of each request, potentially alleviating the out-of-
memory issue. However, be mindful that excessively reducing the batch size might impact the efficiency of your prediction process.
upvoted 1 times
Went with B
upvoted 1 times
If you are getting an "Out of Memory" error during an online prediction request, it suggests that the amount of data you are sending in each
request is too large and is exceeding the available memory. To resolve this issue, you can try sending the request again with a smaller batch of
instances. This reduces the amount of data being sent in each request and helps avoid the out-of-memory error. If the problem persists, you can
also try increasing the machine type or the number of instances to provide more resources for the prediction service.
upvoted 2 times
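A minimal sketch of that batching strategy using the Vertex AI SDK; the endpoint resource name, instances, and batch size are hypothetical.

```python
from google.cloud import aiplatform

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

def predict_in_batches(instances, batch_size=32):
    predictions = []
    for i in range(0, len(instances), batch_size):
        chunk = instances[i:i + batch_size]  # smaller payload per request
        predictions.extend(endpoint.predict(instances=chunk).predictions)
    return predictions
```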
Selected Answer: C
This question is about prediction not training - and specifically it's about _online_ prediction (aka realtime serving).
https://cloud.google.com/ai-platform/training/docs/troubleshooting
upvoted 2 times
Selected Answer: B
Selected Answer: B
https://cloud.google.com/ai-platform/training/docs/troubleshooting#http_status_codes
upvoted 1 times
You work at a subscription-based company. You have trained an ensemble of trees and neural networks to predict customer churn, which is the
likelihood that customers will not renew their yearly subscription. The average prediction is a 15% churn rate, but for a particular customer the
model predicts that they are 70% likely to churn. The customer has a product usage history of 30%, is located in New York City, and became a
customer in 1997. You need to explain the difference between the actual prediction, a 70% churn rate, and the average prediction. You want to use
Vertex Explainable AI. What should you do?
B. Use Sampled Shapley explanations.
D. Measure the effect of each feature as the weight of the feature multiplied by the feature value.
Correct Answer: A
Selected Answer: B
Sampled Shapley explanations offer a more sophisticated and model-agnostic method for understanding feature importance and contributions to
predictions.
upvoted 2 times
Went with B
upvoted 2 times
Selected Answer: B
Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling
approximation of exact Shapley values.
Sampled Shapley's recommended model type: non-differentiable models, such as ensembles of trees and neural networks.
https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview
upvoted 2 times
Selected Answer: B
Sampled Shapley works well for these models, which are meta-ensembles of trees and neural networks.
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#sampled-shapley
upvoted 2 times
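A minimal, model-agnostic sketch of the sampled Shapley idea using the open-source shap library (Vertex AI's feature is the managed equivalent); the model, background data, and customer row are all illustrative.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the churn ensemble: churn is likelier at low usage.
X = np.random.rand(200, 3)                    # usage, region_code, tenure
y = (X[:, 0] < 0.3).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)

# SamplingExplainer approximates Shapley values by sampling feature
# permutations, which suits non-differentiable models such as tree ensembles.
explainer = shap.SamplingExplainer(lambda d: model.predict_proba(d)[:, 1], X)
customer = X[:1]
print(explainer.shap_values(customer))  # per-feature contribution vs. the average
```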
Selected Answer: B
It should be B.
upvoted 1 times
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#sampled-shapley
upvoted 2 times
B
- https://christophm.github.io/interpretable-ml-book/shapley.html
- https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 2 times
Agree with B: individual instance prediction + ensemble of trees and neural networks (the recommended model types for Sampled Shapley are
"non-differentiable models, such as ensembles of trees and neural networks"). Check out the link below:
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 2 times
You are working on a classification problem with time series data. After conducting just a few experiments using random cross-validation, you
achieved an Area Under the Receiver Operating Characteristic Curve (AUC ROC) value of 99% on the training data. You haven’t explored using any
sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
A. Address the model overfitting by using a less complex algorithm and use k-fold cross-validation.
B. Address data leakage by applying nested cross-validation during model training.
C. Address data leakage by removing features highly correlated with the target value.
D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Correct Answer: D
Selected Answer: B
random cross-validation
time series data
-> B
upvoted 1 times
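A minimal scikit-learn sketch of the leakage issue: TimeSeriesSplit keeps every validation fold strictly after its training fold, unlike shuffled random cross-validation. The data is illustrative.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # samples ordered by time

for train_idx, val_idx in TimeSeriesSplit(n_splits=3).split(X):
    # Every validation index comes after every training index, so the
    # model never trains on the future of the data it is evaluated on.
    print("train:", train_idx, "validate:", val_idx)
```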
Selected Answer: D
Options B and C (Address data leakage by applying nested cross-validation during model training; Address data leakage by removing features
highly correlated with the target value) are less relevant in this scenario because the primary concern appears to be overfitting rather than data
leakage. Data leakage typically involves inadvertent inclusion of information from the test set in the training process, which may lead to overly
optimistic performance metrics. However, there is no indication that data leakage is the cause of the high AUC ROC value in this case.
upvoted 1 times
Options A and B also address overfitting, but they involve different strategies. Option A suggests using a less complex algorithm and k-fold cross-
validation. While this can be effective, it might be premature to change the algorithm without first exploring hyperparameter tuning. Option B
suggests addressing data leakage, which is a different issue and may not be the primary cause of overfitting in this scenario.
upvoted 3 times
Went with B
upvoted 2 times
Selected Answer: B
It's B.
upvoted 1 times
You need to execute a batch prediction on 100 million records in a BigQuery table with a custom TensorFlow DNN regressor model, and then store
the predicted results in a BigQuery table. You want to minimize the effort required to build this inference pipeline. What should you do?
A. Import the TensorFlow model with BigQuery ML, and run the ml.predict function.
B. Use the TensorFlow BigQuery reader to load the data, and use the BigQuery API to write the results to BigQuery.
C. Create a Dataflow pipeline to convert the data in BigQuery to TFRecords. Run a batch inference on Vertex AI Prediction, and write the
results to BigQuery.
D. Load the TensorFlow SavedModel in a Dataflow pipeline. Use the BigQuery I/O connector with a custom function to perform the inference within the pipeline, and write the results to BigQuery.
Correct Answer: A
Selected Answer: A
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: A
Went with A
upvoted 2 times
https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
upvoted 2 times
For this, see:
https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-inference-overview
Predict the label: either a numerical value for regression tasks or a categorical value for classification tasks, with a DNN regressor.
upvoted 2 times
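A minimal sketch of option A via the BigQuery Python client: import the TensorFlow SavedModel into BigQuery ML, then run ML.PREDICT over the table and write the results back to BigQuery. All names and paths are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Import the SavedModel from Cloud Storage as a BigQuery ML model.
client.query("""
CREATE OR REPLACE MODEL `my_dataset.dnn_regressor`
OPTIONS(model_type = 'TENSORFLOW',
        model_path = 'gs://my-bucket/saved_model/*')
""").result()

# Batch-predict over the 100 million rows and store the results.
client.query("""
CREATE OR REPLACE TABLE `my_dataset.predictions` AS
SELECT *
FROM ML.PREDICT(MODEL `my_dataset.dnn_regressor`,
                TABLE `my_dataset.input_records`)
""").result()
```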
You are creating a deep neural network classification model using a dataset with categorical input values. Certain columns have a cardinality
greater than 10,000 unique values. How should you encode these categorical values as input into the model?
B. Convert the categorical string data to one-hot hash buckets.
Correct Answer: C
Selected Answer: A
went with A
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
https://cloud.google.com/ai-platform/training/docs/algorithms/wide-and-deep
If the column is categorical with high cardinality, then the column is treated with hashing, where the number of hash buckets equals to the square
root of the number of unique values in the column.
upvoted 1 times
Selected Answer: B
B.
The other options solve nothing.
upvoted 1 times
Selected Answer: B
https://towardsdatascience.com/getting-deeper-into-categorical-encodings-for-machine-learning-2312acd347c8
When you have millions of unique values, try hash encoding.
upvoted 1 times
B, unconditionally.
https://cloud.google.com/ai-platform/training/docs/algorithms/xgboost#analysis
If the column is categorical with high cardinality, then the column is treated with hashing, where the number of hash buckets equals to the square
root of the number of unique values in the column.
A categorical column is considered to have high cardinality if the number of unique values is greater than the square root of the number of rows in
the dataset.
upvoted 2 times
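A minimal Keras sketch of that guidance, with illustrative sizes and values: a high-cardinality string column is hashed into roughly sqrt(unique values) buckets and then one-hot encoded.

```python
import tensorflow as tf

num_unique = 10_000
num_bins = int(num_unique ** 0.5)  # ~100 hash buckets per the quoted rule

hashing = tf.keras.layers.Hashing(num_bins=num_bins)
one_hot = tf.keras.layers.CategoryEncoding(num_tokens=num_bins,
                                           output_mode="one_hot")

ids = tf.constant(["product_48213", "product_00007"])  # hypothetical values
print(one_hot(hashing(ids)))  # each string becomes a one-hot hash bucket
```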
Selected Answer: C
Selected Answer: B
I think B is correct
Ref.:"
- https://cloud.google.com/ai-platform/training/docs/algorithms/xgboost
- https://stackoverflow.com/questions/26473233/in-preprocessing-data-with-high-cardinality-do-you-hash-first-or-one-hot-encode
upvoted 4 times
Selected Answer: B
Answer is B. When the cardinality of the categorical column is very large, the best choice is binary encoding; however, it's not among the options,
hence the one-hot hash option.
upvoted 1 times
Selected Answer: B
Ans : B
upvoted 1 times
Selected Answer: B
B is correct
upvoted 1 times
Answer A, since with 10,000 unique values one-hot shouldn't be a good solution.
https://machinelearningmastery.com/how-to-prepare-categorical-data-for-deep-learning-in-python/
upvoted 3 times
You need to train a natural language model to perform text classification on product descriptions that contain millions of examples and 100,000
unique words. You want to preprocess the words individually so that they can be fed into a recurrent neural network. What should you do?
A. Create a hot-encoding of words, and feed the encodings into your model.
B. Identify word embeddings from a pre-trained model, and use the embeddings in your model.
C. Sort the words by frequency of occurrence, and use the frequencies as the encodings in your model.
D. Assign a numerical value to each word from 1 to 100,000 and feed the values as inputs in your model.
Correct Answer: B
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
B
https://developers.google.com/machine-learning/guides/text-classification/step-3
https://developers.google.com/machine-learning/guides/text-classification/step-4
upvoted 2 times
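A minimal Keras sketch of option B, where the random matrix stands in for real pre-trained vectors (e.g. GloVe or word2vec): the embedding layer is initialized from those vectors and frozen, then fed into the recurrent network.

```python
import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 100_000, 300
pretrained = np.random.rand(vocab_size, embed_dim).astype("float32")  # placeholder

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=tf.keras.initializers.Constant(pretrained),
        trainable=False,  # keep the pre-trained vectors frozen
    ),
    tf.keras.layers.LSTM(64),                        # the RNN from the question
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```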
Selected Answer: B
Answer is B
upvoted 1 times
You work for an online travel agency that also sells advertising placements on its website to other companies. You have been asked to predict the
most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms@p99,
the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to
Implement the simplest solution. How should you configure the prediction pipeline?
A. Embed the client on the website, and then deploy the model on AI Platform Prediction.
B. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Firestore for writing and for reading the user’s navigation context, and then deploy the model on AI Platform Prediction.
C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the
user’s navigation context, and then deploy the model on AI Platform Prediction.
D. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the
user’s navigation context, and then deploy the model on Google Kubernetes Engine.
Correct Answer: C
Selected Answer: C
Selected Answer: B
Cloud Bigtable is a more complex database to set up and manage than Firestore.
Cloud Bigtable is not as secure as Firestore.
Cloud Bigtable is not as well-integrated with other Google Cloud services as Firestore.
Therefore, B is the simpler solution that meets all of the requirements.
upvoted 5 times
Selected Answer: B
see e707
upvoted 1 times
Selected Answer: C
as Hiromi said
upvoted 1 times
Selected Answer: B
Selected Answer: B
Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
upvoted 1 times
Selected Answer: B
I would go with Firestore, as the throughput and latency requirements given in the question are achievable with Firestore, and Bigtable may be overkill. Had the scenario involved very large volumes of data, Bigtable would have taken precedence.
upvoted 1 times
Selected Answer: B
Selected Answer: B
the primary requirement mentioned in the original question is to implement the simplest solution. Firestore is a fully managed, serverless NoSQL
database that can also handle thousands of web banners and dynamically changing user browsing history. It is designed for real-time data
synchronization and can quickly update the most relevant web banner as the user browses different pages of the website.
While Cloud Bigtable offers high performance and scalability, it is more complex to manage and is better suited for large-scale, high-throughput
workloads. Firestore, on the other hand, is easier to implement and maintain, making it a more suitable choice for the simplest solution in this
scenario.
upvoted 2 times
If you need:
- Submillisecond retrieval latency on a limited amount of quickly changing data, retrieved by a few thousand clients, use Memorystore.
- Millisecond retrieval latency on slowly changing data where storage scales automatically, use Datastore.
- Millisecond retrieval latency on dynamically changing data, using a store that can scale linearly with heavy reads and writes, use Bigtable.
Source: https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#choosing_a_nosql_database
C is better than B because 1) the inventory is thousands of web banners and 2) we expect the user to compare many travel destinations, dates,
hotels, and tariffs during their search process. It means the user's browsing history is dynamically changing, and we need to identify "the most
relevant web banner that a user should see next" => we will be dynamically changing the ad as the user browses different pages of the website.
upvoted 2 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: B
B for me
upvoted 1 times
Selected Answer: B
B, for me.
upvoted 2 times
For the given scenario, the latency requirements are 300ms@p99, which Firestore can handle effectively for thousands of web banners and
dynamically changing user browsing history. Firestore is designed for real-time data synchronization and can quickly update the most relevant
web banner as the user browses different pages on the website.
While Cloud Bigtable can offer improved latency, it comes with added complexity in terms of management and configuration. If the primary
goal is to implement the simplest solution while meeting the latency requirements, Firestore remains a more suitable choice for this use case.
upvoted 1 times
I need a DB to store the banners, so not A. We're talking about thousands of banners, so not C. Memorystore means Redis and similar solutions, so not D. The answer is B, for me.
upvoted 1 times
Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports
autoscaling and monitoring for online prediction requests. Which platform components should you choose for this system?
D. Cloud Composer, Vertex AI Training with custom containers, and App Engine
Correct Answer: A
Selected Answer: B
Cloud Composer may be a good consideration if you are working toward the Google Data Engineer certification, and App Engine is more relevant to the DevOps certification. For the Google ML certification, unless the question states a specific requirement, we should emphasize the use of Vertex AI as much as possible.
upvoted 5 times
Selected Answer: D
A custom container is a Docker image that you create to run your training application. By running your machine learning (ML) training job in a custom container, you can use ML frameworks, non-ML dependencies, libraries, and binaries that are not otherwise supported on Vertex AI. So we need a Vertex AI custom container for the Docker requirement; options A and B are therefore out.
App Engine lets developers focus on what they do best: writing code. Based on Compute Engine, the App Engine flexible environment automatically scales your app up and down while also balancing the load.
Customizable infrastructure: App Engine flexible environment instances are Compute Engine virtual machines, which means you can take advantage of custom libraries, use SSH for debugging, and deploy your own Docker containers.
upvoted 1 times
Went with B
upvoted 2 times
Vote for B
upvoted 1 times
Selected Answer: B
Vote for B
upvoted 3 times
Serving with Vertex AI Prediction is right, but the monitoring asked for in the question is not the one in answer B (which is tied to the model). The correct answer is C.
upvoted 1 times
Selected Answer: B
You are profiling the performance of your TensorFlow model training time and notice a performance issue caused by inefficiencies in the input
data pipeline for a single 5 terabyte CSV file dataset on Cloud Storage. You need to optimize the input pipeline performance. Which action should you try first?
C. Split into multiple CSV files and use a parallel interleave transformation.
Correct Answer: D
Selected Answer: C
Converting a large 5 terabyte CSV file to a TFRecord can be a time-consuming process, and you would still be dealing with a single large file.
upvoted 1 times
Selected Answer: C
While preprocessing the input CSV file into a TFRecord file (Option A) can improve the performance of your input pipeline, it is not the first action
to try in this situation. Converting a large 5 terabyte CSV file to a TFRecord can be a time-consuming process, and you would still be dealing with a
single large file.
upvoted 1 times
Selected Answer: C
I think C, based on the wording "Which action should you try first?", meaning it should be less impactful to continue using CSV.
upvoted 1 times
Selected Answer: C
https://www.tensorflow.org/guide/data_performance#best_practice_summary
upvoted 2 times
Went with C
upvoted 1 times
Option A, preprocess the input CSV file into a TFRecord file, is not as good because it requires additional processing time. Hence, I think C is the
best choice.
upvoted 1 times
Selected Answer: A
I think it could be A.
https://cloud.google.com/architecture/best-practices-for-ml-performance-cost#preprocess_the_data_once_and_save_it_as_a_tfrecord_file
upvoted 1 times
Clearly both A and C work here, but I can't find any documentation which suggests C is any better than A.
upvoted 1 times
Selected Answer: A
Option B (randomly selecting a 10 gigabyte subset of the data) could lead to a loss of useful data and may not be representative of the entire
dataset. Option C (splitting into multiple CSV files and using a parallel interleave transformation) may also improve the performance, but may be
more complex to implement and maintain, and may not be as efficient as converting to TFRecord. Option D (setting the reshuffle_each_iteration
parameter to true in the tf.data.Dataset.shuffle method) is not directly related to the input data format and may not provide as significant a
performance improvement as converting to TFRecord.
upvoted 3 times
Selected Answer: C
C
Keywords -> You need to optimize the input pipeline performance
https://www.tensorflow.org/guide/data_performance
upvoted 2 times
It seems C, to me.
upvoted 1 times
By splitting the file, we can use a parallel interleave to load the datasets in parallel.
https://www.tensorflow.org/guide/data_performance
upvoted 2 times
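A minimal sketch of option C, assuming TensorFlow 2.x; the Cloud Storage path and CSV schema are hypothetical:

import tensorflow as tf

def parse_csv_line(line):
    # Hypothetical schema: four float features and an integer label.
    fields = tf.io.decode_csv(line, record_defaults=[0.0, 0.0, 0.0, 0.0, 0])
    return tf.stack(fields[:-1]), fields[-1]

files = tf.data.Dataset.list_files("gs://my-bucket/shards/*.csv")  # the split shards
dataset = (
    files.interleave(
        lambda path: tf.data.TextLineDataset(path).skip(1),  # skip each shard's header
        cycle_length=16,                                     # shards read concurrently
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    .map(parse_csv_line, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)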
You need to design an architecture that serves asynchronous predictions to determine whether a particular mission-critical machine part will fail.
Your system collects data from multiple sensors from the machine. You want to build a model that will predict a failure in the next N minutes,
given the average of each sensor’s data from the past 12 hours. How should you design the architecture?
A. 1. HTTP requests are sent by the sensors to your ML model, which is deployed as a microservice and exposes a REST API for prediction
2. Your application queries a Vertex AI endpoint where you deployed your model.
3. Responses are received by the caller application as soon as the model produces the prediction.
B. 1. Events are sent by the sensors to Pub/Sub, consumed in real time, and processed by a Dataflow stream processing pipeline.
2. The pipeline invokes the model for prediction and sends the predictions to another Pub/Sub topic.
3. Pub/Sub messages containing predictions are then consumed by a downstream system for monitoring.
C. 1. Export the data to Cloud Storage using Dataflow.
2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data.
3. Export the batch prediction job outputs from Cloud Storage and import them into Cloud SQL.
D. 1. Export the data to Cloud Storage using the BigQuery command-line tool
2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data.
3. Export the batch prediction job outputs from Cloud Storage and import them into BigQuery.
Correct Answer: C
Selected Answer: C
The simplest solution that can support an eventual batch prediction (triggered by Pub/Sub) and even semi-real-time prediction.
upvoted 1 times
Selected Answer: B
Needs to be real time, not batch. The data needs to be processed as a stream since multiple sensors are used. pawan94 is right.
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction
upvoted 1 times
Selected Answer: D
D.
I think we have to query data from the past 12 hours for the prediction, and that's the reason for exporting the data to Cloud Storage.
Also, the predictions don't have to be real time.
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
B.
Online prediction, and need decoupling with Pub/Sub to make it asynchronous. Option A is synchronous.
upvoted 2 times
In this use case, the requirement is to predict whether a machine part will fail in the next N minutes, given the average of each sensor's data from
the past 12 hours. Therefore, real-time processing and prediction are necessary. Batch prediction jobs are not designed for real-time processing,
and there may be a delay in receiving the predictions.
Option B, on the other hand, is designed for real-time processing and prediction. The Pub/Sub and Dataflow components allow for real-time
processing of incoming sensor data, and the trained ML model can be invoked for prediction in real-time. This makes it ideal for mission-critical
applications where timely predictions are essential.
upvoted 2 times
This architecture is highly scalable, resilient, and efficient, as it can handle large volumes of data and perform real-time processing and prediction. It also separates concerns by using one pipeline for data processing and another for prediction, making the system easier to maintain and modify.
upvoted 1 times
After that, you can take a closer look at Figure 3 and read what it tries to describe.
C and D are the offline solutions; they just use different tools.
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 2 times
B
"Predictive maintenance: asynchronously predicting whether a particular machine part will fail in the next N minutes, given the averages of the
sensor's data in the past 30 minutes."
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 3 times
Selected Answer: B
Answer is B.
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#handling_dynamic_real-time_features
upvoted 1 times
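A minimal Apache Beam sketch of option B; the topic names are hypothetical, and the model call is stubbed out (a real pipeline would compute the 12-hour averages and invoke the trained model):

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class Predict(beam.DoFn):
    def process(self, element):
        record = json.loads(element.decode("utf-8"))
        record["failure_predicted"] = False  # stub for the real model invocation
        yield json.dumps(record).encode("utf-8")

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/sensor-events")
        | "Score" >> beam.ParDo(Predict())
        | "Write" >> beam.io.WriteToPubSub(topic="projects/my-project/topics/predictions")
    )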
Selected Answer: C
C, for me.
upvoted 1 times
ref : https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 1 times
Answer B
I thought a lot, since we don't need a real-time response in this scenario, but the other options have these problems:
A - HTTP requests for sensor data are not a good idea.
C - What's the point of using Cloud SQL to store the results?
D - No BigQuery is mentioned, so why use the bq SDK to move data?
upvoted 2 times
Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to
build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading. Which approach should you use?
A. Create a collaborative filtering system that recommends articles to a user based on the user’s past behavior.
B. Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.
C. Build a logistic regression model for each user that predicts whether an article should be recommended to a user.
D. Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles.
Correct Answer: A
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings
Answer B
upvoted 3 times
Selected Answer: B
"Currently reading" is the keyword here. You're going to need B for that; A won't work, since it would be based on, e.g., the entire reading history and not the article currently being read.
upvoted 2 times
Option C, building a logistic regression model for each user, may not be scalable because it requires building a separate model for each user, which can become difficult to manage as the number of users increases.
Option D, manually labeling articles and training an SVM classifier, may not be as effective as the word2vec approach because it relies on manual
labeling, which can be time-consuming and may not capture the full semantic meaning of the articles. Additionally, SVMs may not be as effective
as neural network-based approaches like word2vec for capturing complex relationships between words and articles.
upvoted 2 times
word2vec can easily find similar articles, whereas a collaborative filter is not a sure fit here.
upvoted 1 times
Selected Answer: B
B
https://towardsdatascience.com/recommending-news-articles-based-on-already-read-articles-627695221fe8
upvoted 3 times
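A minimal sketch of the vector-similarity idea in option B; the word vectors here are random stand-ins for a real pre-trained word2vec model:

import numpy as np

rng = np.random.default_rng(0)
word_vectors = {w: rng.standard_normal(300) for w in ["budget", "airline", "hotel", "review"]}

def article_vector(text):
    # Average the vectors of the words we have embeddings for.
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

current = article_vector("budget airline review")
candidate = article_vector("hotel review")
print(cosine_similarity(current, candidate))  # rank candidate articles by this score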
Answer B
upvoted 1 times
Selected Answer: B
Collaborative filtering looks at other users; knowledge-based filtering looks at me. Answer B is the most knowledge-based among these.
upvoted 2 times
Selected Answer: A
Selected Answer: B
Answer B
upvoted 2 times
You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each
day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model
to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a
human. Which metric(s) should you use to monitor the model’s performance?
A. Number of messages flagged by the model per minute.
B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans.
C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review
D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
Correct Answer: B
Selected Answer: D
D
- https://cloud.google.com/natural-language/automl/docs/beginners-guide
- https://cloud.google.com/vertex-ai/docs/text-data/classification/evaluate-model
upvoted 10 times
Selected Answer: C
A. Number of messages flagged by the model per minute => NO, no measure of model performance
B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans => DON'T THINK SO, because we would need the total number of (flagged?) messages.
C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review. => I think YES,
because as I understand it that would be based on a sample of ALL messages not just the ones that have been flagged.
D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute => I think NO,
because the sample includes only flagged messages, meaning positives, so you cannot really measure recall.
upvoted 7 times
Selected Answer: D
Precision and recall are critical metrics for evaluating the performance of classification models, especially in contexts where both the accuracy of
positive predictions (precision) and the ability to identify all positive instances (recall) are important. In this case:
Precision (the proportion of messages flagged by the model as inappropriate that were actually inappropriate) helps ensure that the model
minimizes the burden on human moderators by not flagging too many false positives, which could overwhelm them.
Recall (the proportion of actual inappropriate messages that were correctly flagged by the model) ensures that the model is effective at catching as many inappropriate messages as possible, reducing the risk of harmful content being missed.
upvoted 2 times
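For what precision and recall mean operationally here, a toy computation with scikit-learn (the labels are made up; note that a sample drawn only from flagged messages contains no model negatives, which is the crux of the C-vs-D debate):

from sklearn.metrics import precision_score, recall_score

human_labels = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = confirmed inappropriate by a moderator
model_flags  = [1, 1, 1, 0, 0, 1, 0, 1]  # 1 = flagged by the model

print("precision:", precision_score(human_labels, model_flags))
print("recall:   ", recall_score(human_labels, model_flags))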
I go with C
upvoted 1 times
Selected Answer: D
C does not make sense to me, since it is a very small random sample. It also only covers messages that have been sent to humans for review, meaning there is bias in that result set.
upvoted 1 times
In favor of D
upvoted 1 times
Given the context of content moderation, a balanced approach is often preferred. Therefore, option C, precision and recall estimates based on a
random sample of raw messages, is a good choice. It provides a holistic view of the model's performance, taking into account both false positives
(precision) and false negatives (recall), and it reflects how well the model is handling the entire dataset.
upvoted 1 times
Both C & D work well in this case, but the specificity is higher in option D and hence will go with D
upvoted 1 times
Google Cloud used to have a service called "continuous evaluation", where human labelers classify data to establish a ground truth. Thinking along
those lines, the answer is C as it's the logical equivalent of that service.
https://cloud.google.com/ai-platform/prediction/docs/continuous-evaluation
upvoted 1 times
Selected Answer: D
D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
You will need precision and recall to identify false positives and false negatives. A very small random sample doesn't help, especially because you will probably have skewed data. So D.
upvoted 1 times
Selected Answer: D
Went with D
upvoted 1 times
Selected Answer: D
You are a lead ML engineer at a retail company. You want to track and manage ML metadata in a centralized way so that your team can have
reproducible experiments by generating artifacts. Which management solution should you recommend to your team?
Correct Answer: C
Selected Answer: D
D
- https://cloud.google.com/vertex-ai/docs/ml-metadata/tracking
upvoted 5 times
Selected Answer: D
Selected Answer: D
upvoted 1 times
Went with D
upvoted 1 times
Selected Answer: D
totally D
upvoted 2 times
Selected Answer: D
Selected Answer: D
https://codelabs.developers.google.com/vertex-mlmd-pipelines?hl=id&authuser=6#0
upvoted 3 times
You have been given a dataset with sales predictions based on your company’s marketing activities. The data is structured and stored in BigQuery,
and has been carefully managed by a team of data analysts. You need to prepare a report providing insights into the predictive capabilities of the
data. You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks.
You only have a few hours to gather the results of your experiments. Which Google Cloud tools should you use to complete this task in the most efficient way?
A. Use BigQuery ML to run several regression models, and analyze their performance.
B. Read the data from BigQuery using Dataproc, and run several models using SparkML.
C. Use Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics.
D. Train a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms.
Correct Answer: A
Selected Answer: A
You only have a few hours. The dataset is in BQ. The dataset is carefully managed. BQML it is.
upvoted 1 times
Selected Answer: C
Selected Answer: A
All deep neural networks are multilayered neural networks, but not all multilayered neural networks are necessarily deep. The term "deep" is used
to emphasize the depth of the network in the context of having many hidden layers, which has been shown to be effective for learning hierarchical
representations of complex patterns in data.
Vertex AI Workbench provides user-managed notebooks that allow you to run Python code using libraries like scikit-learn, TensorFlow, and more.
You can easily connect to your BigQuery dataset from within the notebook, extract the data, and perform data preprocessing.
You can then experiment with different ML algorithms available in scikit-learn and track performance metrics.
It provides flexibility, control, and the ability to run various models quickly.
upvoted 3 times
Selected Answer: C
I think multilayered neural networks need to be trained externally from BQ ML as stated here:
https://cloud.google.com/bigquery/docs/bqml-introduction
upvoted 1 times
Selected Answer: A
According to the question, you don't have enough time. B, C, D need much more time to set up the service, or write the code. Also the data is
already in BigQuery. BQML should be the fastest way. Besides, BQML supports xgboost, NN models as well.
upvoted 2 times
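A minimal sketch of the BigQuery ML approach driven from Python; the dataset and table names are hypothetical, and model_type can be swapped to 'dnn_regressor' for a neural network:

from google.cloud import bigquery

client = bigquery.Client()

client.query("""
    CREATE OR REPLACE MODEL `my_dataset.sales_lr`
    OPTIONS (model_type = 'linear_reg', input_label_cols = ['sales']) AS
    SELECT * FROM `my_dataset.marketing_activity`
""").result()

# Evaluate without moving any data out of BigQuery.
for row in client.query("SELECT * FROM ML.EVALUATE(MODEL `my_dataset.sales_lr`)"):
    print(dict(row))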
Selected Answer: C
The question says that "You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks." BQ ML doesn't allow this; it provides only simple regression/classification models. It is not about training these "sophisticated models" but only running them, so you can easily do it within a few hours with notebooks.
upvoted 2 times
Went with A
upvoted 2 times
Selected Answer: C
BigQuery ML allows you to quickly create and evaluate ML models directly within BigQuery, without the need to move the data or set up a
separate environment. This makes it faster and more convenient for running several regression models and analyzing their performance within
the given time frame.
upvoted 1 times
Selected Answer: A
B, C, and D require coding. You only have a few hours; A is the fastest.
upvoted 2 times
Selected Answer: A
I vote for A
upvoted 3 times
Selected Answer: A
It's A.
upvoted 2 times
Selected Answer: A
I will go with A, since it's the fastest way to do it. Custom training in Vertex AI takes time, and so does writing scikit-learn models in notebooks.
upvoted 2 times
You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make
loan payments on time. The output is used to approve or reject loan requests. One customer’s loan request has been rejected by your model, and
the bank’s risks department is asking you to provide the reasons that contributed to the model’s decision. What should you do?
A. Use local feature importance from the predictions.
B. Use the correlation with target values in the data summary page.
C. Use the feature importance percentages in the model evaluation page.
D. Vary features independently to identify the threshold per feature that changes the classification.
Correct Answer: C
Selected Answer: A
To access local feature importance in AutoML Tables, you can use the "Explain" feature, which shows the contribution of each feature to the
prediction for a specific example. This will help you identify the most important features that contributed to the loan request being rejected.
Option B, using the correlation with target values in the data summary page, may not provide the most accurate explanation as it looks at the
overall correlation between the features and target variable, rather than the contribution of each feature to a specific prediction.
Option C, using the feature importance percentages in the model evaluation page, may not provide a sufficient explanation for the specific
prediction, as it shows the importance of each feature across all predictions, rather than for a specific prediction.
Option D, varying features independently to identify the threshold per feature that changes the classification, is not recommended, as it can be time-consuming and does not provide a clear explanation for why the loan request was rejected.
upvoted 9 times
Selected Answer: A
Went with A
upvoted 1 times
Selected Answer: A
Local, not global since they asked about one specific prediction.
Check out that section on this blog: https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data/
Cool stuff!
upvoted 4 times
Selected Answer: A
Selected Answer: C
AutoML Tables tells you how much each feature impacts this model. It is shown in the Feature importance graph. The values are provided as a
percentage for each feature: the higher the percentage, the more strongly that feature impacted model training. C.
upvoted 1 times
Selected Answer: A
A
https://cloud.google.com/automl-tables/docs/explain#local
upvoted 2 times
Agree with A.
"Local feature importance gives you visibility into how the individual features in a specific prediction request affected the resulting prediction.
Each local feature importance value shows only how much the feature affected the prediction for that row. To understand the overall behavior of
the model, use model feature importance."
https://cloud.google.com/automl-tables/docs/explain#local
upvoted 4 times
Selected Answer: C
"Feature importance: AutoML Tables tells you how much each feature impacts this model. It is shown in the Feature importance graph. The values
are provided as a percentage for each feature: the higher the percentage, the more strongly that feature impacted model training." The correct
answer is C.
upvoted 1 times
You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year.
Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to AI Platform. You need to determine
which customer attribute has the most predictive power for each prediction served by the model. What should you do?
A. Use AI Platform notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong
signal.
B. Stream prediction results to BigQuery. Use BigQuery’s CORR(X1, X2) function to calculate the Pearson correlation coefficient between each
C. Use the AI Explanations feature on AI Platform. Submit each prediction request with the ‘explain’ keyword to retrieve feature attributions using the sampled Shapley method.
D. Use the What-If tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature
importance in order of those that caused the most significant performance drop when removed from the model.
Correct Answer: D
Selected Answer: C
Selected Answer: C
Went with C
upvoted 2 times
Selected Answer: C
https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview
AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI
Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result.
upvoted 1 times
Selected Answer: C
Selected Answer: D
I vote for D
- https://www.tensorflow.org/tensorboard/what_if_tool
- https://pair-code.github.io/what-if-tool/
- https://medium.com/red-buffer/tensorflows-what-if-tool-c52914ea215c
C is wrong because AI Explanations doesn't work for TensorFlow models (https://cloud.google.com/vertex-ai/docs/explainable-ai/overview)
upvoted 1 times
Selected Answer: C
AI Explanations provides feature attributions using the sampled Shapley method, which can help you understand how much each feature
contributes to a model's prediction.
upvoted 3 times
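A sketch of requesting per-prediction attributions with the Vertex AI SDK (the successor to AI Platform's explain requests); the project, endpoint ID, and instance fields are hypothetical, and the model must have been deployed with an explanation spec:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint("1234567890")  # hypothetical endpoint ID

response = endpoint.explain(instances=[{"tenure_months": 14, "num_issues": 2}])
for explanation in response.explanations:
    for attribution in explanation.attributions:
        print(attribution.feature_attributions)  # per-feature contribution to this prediction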
Selected Answer: C
AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI
Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result." It's C!
upvoted 2 times
Agree with C
upvoted 2 times
You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s
logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed. Which metrics would give you the most confidence in
your model?
A. F-score where recall is weighed more than precision
B. RMSE
C. F1 score
D. F-score where precision is weighed more than recall
Correct Answer: A
Selected Answer: A
In this scenario, the dataset is highly imbalanced, where most of the examples do not have the company's logo. Therefore, accuracy could be
misleading as the model can have high accuracy by simply predicting that all images do not have the logo. F1 score is a good metric to consider in
such cases, as it takes both precision and recall into account. However, since the dataset is highly skewed, we should weigh recall more than
precision to ensure that the model is correctly identifying the images that do have the logo. Therefore, F-score where recall is weighed more than
precision is the best metric to evaluate the performance of the model in this scenario. Option B (RMSE) is not applicable to this classification
problem, and option D (F-score where precision is weighed more than recall) is not suitable for highly skewed datasets.
upvoted 8 times
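The recall-weighted F-score from option A is just the F-beta score with beta > 1; a toy comparison with scikit-learn (the labels are made up):

from sklearn.metrics import fbeta_score

y_true = [0, 0, 0, 1, 1, 0, 1, 0]  # 1 = image contains the logo (minority class)
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# beta > 1 weighs recall more than precision (option A); beta = 1 is plain F1 (option C).
print("F2:", fbeta_score(y_true, y_pred, beta=2))
print("F1:", fbeta_score(y_true, y_pred, beta=1))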
Selected Answer: C
I'd go with C. We don't know which option (less FP or less FN) is most important for business with the provided information, so we should seek a
balance.
upvoted 2 times
I think it's D.
upvoted 1 times
Selected Answer: D
I think it could be D, but the question does not provide enough information to be sure.
My feeling: if only 4% of images have the logo, we are looking just for those, right? So the 'quality' of the true positives, that is, precision, could be more interesting, because we want a model we can rely on: when this model predicts an image has the logo, we can be more certain about it.
If we optimize recall instead, a model with 99% recall has a better chance of catching the logo, but we won't have quality: such a model could suggest a lot of images without the logo, which is barely better than using no ML at all.
upvoted 2 times
Selected Answer: C
both option A (F-score with higher weight on recall) and option C (F1 score) could be suitable depending on the specific priorities and
requirements of your classification problem. If missing a company's logo is considered more problematic than having false alarms, then option A
might be preferred. The F1 score (option C) is a balanced measure that considers both precision and recall, which is generally a good choice in
imbalanced datasets.
Ultimately, the choice between option A and option C depends on the specific goals and constraints of your application.
upvoted 1 times
The question does not express a clear preference for recall or precision, hence going with C.
upvoted 2 times
Yeah, I know - everyone is voting A... To be honest, I still don't understand why you are more afraid of these few FNs than FPs. In my opinion they are exactly the same evil. All the documentation says that F1 is great on skewed data. You should use a weighted F-score only when you know which is worse for you, FNs or FPs. In this case we have no hints about it, so I would stay with the ordinary F1.
upvoted 4 times
Selected Answer: A
Went with A
upvoted 2 times
Selected Answer: A
I think it is A. The positive class is the minority, so it's more important to correctly detect the logo in all images that contain it (recall) than to be precise about the images classified as containing it (precision).
upvoted 3 times
I think it is D, because you are trying to detect TPs, so recall is more important than precision.
upvoted 3 times
Selected Answer: A
Answer A is my choice.
upvoted 1 times
Selected Answer: A
Selected Answer: D
D (not sure)
upvoted 2 times
You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability
for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product
sales volumes, expenses, and profits for all regions. What should you use as the input and output for your model?
A. Use latitude, longitude, and product type as features. Use profit as model output.
B. Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.
C. Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use profit as model output.
D. Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model
outputs.
Correct Answer: C
Selected Answer: C
C (not sure)
- https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
- https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
upvoted 6 times
Selected Answer: D
Selected Answer: C
Went with C
upvoted 1 times
Option C is the best option because it takes into account both the product type and location, which can affect profitability. Binning the feature
cross of latitude and longitude can help capture the nonlinear relationship between location and profitability, and using profit as the model output
is appropriate because it's the target variable we want to predict.
upvoted 3 times
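A minimal sketch of the binned feature cross from option C, assuming a recent TensorFlow (2.9+); the bin boundaries and bucket count are illustrative assumptions:

import tensorflow as tf

lat_bins = tf.keras.layers.Discretization(bin_boundaries=list(range(-90, 91, 10)))
lon_bins = tf.keras.layers.Discretization(bin_boundaries=list(range(-180, 181, 10)))
cross = tf.keras.layers.HashedCrossing(num_bins=1000)

lat = tf.constant([[40.7], [51.5]])
lon = tf.constant([[-74.0], [-0.1]])
region_id = cross((lat_bins(lat), lon_bins(lon)))  # one categorical "region" feature
print(region_id)  # feed alongside product type, with profit as the label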
Selected Answer: C
Must be C. Always feature-cross latitude and longitude on geographical problems. Also, D cannot be right, as we do not have revenue in the dataset.
upvoted 2 times
In this case, there is no need to reduce the number of unique values in the latitude and longitude variables, and binning would remove information from those features, hence A.
Easy C.
upvoted 2 times
You work as an ML engineer at a social media company, and you are developing a visual filter for users’ profile photos. This requires you to train
an ML model to detect bounding boxes around human faces. You want to use this filter in your company’s iOS-based mobile phone application.
You want to minimize code development and want the model to be optimized for inference on mobile phones. What should you do?
A. Train a model using AutoML Vision and use the “export for Core ML” option.
B. Train a model using AutoML Vision and use the “export for Coral” option.
C. Train a model using AutoML Vision and use the “export for TensorFlow.js” option.
D. Train a custom TensorFlow model and convert it to TensorFlow Lite (TFLite).
Correct Answer: A
Selected Answer: A
https://cloud.google.com/vision/automl/docs/export-edge
Core ML -> iOS and macOS
Coral -> Edge TPU-based device
TensorFlow.js -> web
upvoted 12 times
Trained AutoML Edge image classification models can be exported in the following formats:
TF Lite - to run your model on edge or mobile devices.
Edge TPU TF Lite - to run your model on Edge TPU devices.
Container - to run on a Docker container.
Core ML - to run your model on iOS and macOS devices.
Tensorflow.js - to run your model in the browser and in Node.js.
upvoted 2 times
Selected Answer: A
Went with A
upvoted 1 times
Selected Answer: B
AutoML Vision is a service provided by Google Cloud that enables developers to train and deploy machine learning models for image recognition
tasks, such as detecting bounding boxes around human faces. The “export for Coral” option generates a TFLite model that is optimized for running
on Coral, a hardware platform specifically designed for edge computing, including mobile devices. The TFLite model is also compatible with iOS-
based mobile phone applications, making it easy to integrate into the company's app.
upvoted 1 times
Selected Answer: B
Option A, using AutoML Vision and exporting for Core ML, is also a viable option. Core ML is Apple's machine learning framework that is optimized
for iOS-based devices. However, using this option would require more development effort to integrate the Core ML model into the app.
Option C, using AutoML Vision and exporting for TensorFlow.js, is not the best option for this scenario since it is optimized for running in web browsers, not on mobile devices.
Option D, training a custom TensorFlow model and converting it to TFLite, would require significant development effort and time compared to
using AutoML Vision. AutoML Vision provides a simple and efficient way to train and deploy machine learning models without requiring expertise
in machine learning.
upvoted 1 times
https://www.tensorflow.org/lite
https://medium.com/the-ai-team/step-into-on-device-inference-with-tensorflow-lite-a47242ba9130
upvoted 1 times
Selected Answer: A
A
"You want to minimize code development" -> AutoML
- https://cloud.google.com/vision/automl/docs/tflite-coreml-ios-tutorial
- https://cloud.google.com/vertex-ai/docs/training-overview#image
upvoted 2 times
Selected Answer: D
TensorFlow Lite is a lightweight version of TensorFlow that is optimized for mobile and embedded devices, making it an ideal choice for use in an
iOS-based mobile phone application.
upvoted 2 times
Selected Answer: D
You have been asked to build a model using a dataset that is stored in a medium-sized (~10 GB) BigQuery table. You need to quickly determine
whether this data is suitable for model development. You want to create a one-time report that includes both informative visualizations of data
distributions and more sophisticated statistical analyses to share with other ML engineers on your team. You require maximum flexibility to create your report. What should you do?
A. Use Vertex AI Workbench user-managed notebooks to generate the report.
B. Use Google Data Studio to create the report.
C. Use the output from TensorFlow Data Validation on Dataflow to generate the report.
D. Use Dataprep to create the report.
Correct Answer: C
Selected Answer: A
More flexibility.
upvoted 2 times
Selected Answer: A
More flexibility.
upvoted 1 times
Selected Answer: A
Max flexibility
upvoted 1 times
Looker Studio is good too, but it does not give the same depth of statistical analysis as matplotlib, seaborn, etc. in a notebook. So a JupyterLab notebook, a.k.a. Vertex AI Workbench, for me.
upvoted 1 times
Selected Answer: A
A, as it is a one-off report requiring maximum flexibility. You don't need a dashboard unless it will be reused.
upvoted 1 times
Selected Answer: A
The answer is A.
B is wrong because you need more sophisticated statistical analyses and maximum flexibility to create your report.
upvoted 1 times
Selected Answer: A
TensorFlow Data Validation(TFDV) can compute descriptive statistics that provide a quick overview of the data in terms of the features that are
present and the shapes of their value distributions. Tools such as Facets Overview can provide a succinct visualization of these statistics for easy
browsing.
upvoted 2 times
Selected Answer: B
I think it has to be B. One of the keys is that it says "quickly," and BQ makes it very easy to export the query into Looker Studio. The other one is that there's maximum flexibility within the needs of this case (informative visualizations + statistical analysis), as we can develop and write custom formulas.
A feels like overkill: using a Deep Learning VM image only to describe data and perform some analysis.
C also feels like overkill, starting to develop a neural net for that.
D: although you may use Dataprep for this, it is less suited than A.
upvoted 2 times
Selected Answer: A
I recommend option A because Vertex AI Workbench user-managed notebooks provide more flexibility and customization for analyzing and visualizing the data in the BigQuery table. Using Python libraries (pandas, matplotlib, seaborn, etc.), you can create visualizations of the data distributions and perform more complex statistical analyses.
upvoted 1 times
Selected Answer: A
I think it's A. A one-time report containing real statistical measurements of the dataset to tell whether the data is suitable for model development, with other ML engineers as the target audience.
Getting a whole report of exactly this with TFDV/Facets is like two lines of code: https://www.tensorflow.org/tfx/data_validation/get_started
A similar Data Studio report would take a lot of time and work, and there would be no benefit from reusability since the task was a one-time job.
upvoted 2 times
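Roughly those two lines, run inside a Workbench notebook; the table name is hypothetical, and for a ~10 GB table you may want to pull a sampled subset instead of the full result:

from google.cloud import bigquery
import tensorflow_data_validation as tfdv

df = bigquery.Client().query("SELECT * FROM `my_dataset.sales_predictions`").to_dataframe()

stats = tfdv.generate_statistics_from_dataframe(df)
tfdv.visualize_statistics(stats)  # interactive Facets Overview of the distributions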
Selected Answer: A
By using Vertex AI Workbench user-managed notebooks, you can create a one-time report that includes both informative visualizations and
sophisticated statistical analyses. The notebooks provide maximum flexibility for data analysis, as they allow you to use a wide range of libraries
and tools to create visualizations, perform statistical tests, and share your findings with your team. You can easily connect to the BigQuery table
from the notebook and perform the necessary data exploration and analysis.
upvoted 1 times
You work on an operations team at an international company that manages a large fleet of on-premises servers located in few data centers
around the world. Your team collects monitoring data from the servers, including CPU/memory consumption. When an incident occurs on a server,
your team is responsible for fixing it. Incident data has not been properly labeled yet. Your management team wants you to build a predictive
maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. What should you
do first?
A. Train a time-series model to predict the machines’ performance values. Configure an alert if a machine’s actual performance values significantly differ from the predicted values.
B. Implement a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict anomalies based on this labeled dataset.
C. Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Test this heuristic in a production
environment.
D. Hire a team of qualified analysts to review and label the machines’ historical performance data. Train a model based on this manually
labeled dataset.
Correct Answer: D
Selected Answer: C
I would go for C, it is important to have a clear understanding of what constitutes a potential failure and how to detect it. A heuristic based on z-
scores, for example, can be used to flag instances where the performance values of a machine significantly differ from its historical baseline.
upvoted 8 times
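A minimal sketch of the z-score heuristic; the monitoring values and the 2.0 threshold are illustrative assumptions to tune against real incident reports:

import pandas as pd

df = pd.DataFrame({"cpu_util": [0.31, 0.35, 0.33, 0.97, 0.30, 0.34]})

z = (df["cpu_util"] - df["cpu_util"].mean()) / df["cpu_util"].std()
df["incident"] = (z.abs() > 2.0).astype(int)  # flag points far from the mean
print(df)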
Selected Answer: B
Option B involves labeling historical data using heuristics, which can be a practical and quick way to get started.
upvoted 1 times
Vote for C
Reference: Rule #1: Don’t be afraid to launch a product without machine learning.
https://developers.google.com/machine-learning/guides/rules-of-ml#before_machine_learning
upvoted 1 times
Went with C
upvoted 2 times
Selected Answer: C
This is the best option for this scenario because it's quick and inexpensive, and it can provide a baseline for labeling the historical performance
data. Once we have labeled data, we can train a predictive maintenance model to detect potential failures and alert the service desk team.
upvoted 1 times
Selected Answer: C
https://www.geeksforgeeks.org/z-score-for-outlier-detection-python/
upvoted 1 times
Selected Answer: B
I vote for B
- https://developers.google.com/machine-learning/guides/rules-of-ml
upvoted 3 times
Selected Answer: A
You are developing an ML model that uses sliced frames from video feed and creates bounding boxes around specific objects. You want to
automate the following steps in your training pipeline: ingestion and preprocessing of data in Cloud Storage, followed by training and
hyperparameter tuning of the object model using Vertex AI jobs, and finally deploying the model to an endpoint. You want to orchestrate the entire
pipeline with minimal cluster management. What approach should you use?
Correct Answer: A
From:
https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk
"1. If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, we recommend that you build your pipeline
using TFX.
To learn more about building a TFX pipeline, follow the TFX getting started tutorials.
To learn more about using Vertex AI Pipelines to run a TFX pipeline, follow the TFX on Google Cloud tutorials.
2. For other use cases, we recommend that you build your pipeline using the Kubeflow Pipelines SDK. By building a pipeline with the Kubeflow
Pipelines SDK, you can implement your workflow by building custom components or reusing prebuilt components, such as the Google Cloud
Pipeline Components. Google Cloud Pipeline Components make it easier to use Vertex AI services like AutoML in your pipeline."
So I guess since it is image processing, it should be Kubeflow - answer C (TFX is for structured or text data).
upvoted 11 times
Selected Answer: C
If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, you should use TFX. For other use cases, Kubeflow. Link: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline
upvoted 1 times
Overall, using Vertex AI Pipelines with TensorFlow Extended (TFX) SDK provides a comprehensive and managed solution for handling video feed
data in an ML pipeline, while minimizing the need for manual infrastructure management and maximizing scalability and efficiency.
upvoted 2 times
Minimal management.
upvoted 1 times
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
Vertex AI Pipelines with Kubeflow Pipelines SDK provides a high-level interface for building end-to-end machine learning pipelines. This approach
allows for easy integration with Google Cloud services, including Cloud Storage for data ingestion and preprocessing, Vertex AI for training and
hyperparameter tuning, and deployment to an endpoint. The Kubeflow Pipelines SDK also allows for easy orchestration of the entire pipeline,
minimizing cluster management.
upvoted 1 times
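A minimal Kubeflow Pipelines v2 sketch of that idea; the component body is a stub, and the later training, tuning, and deployment steps would typically come from the prebuilt google-cloud-pipeline-components:

from kfp import compiler, dsl

@dsl.component
def preprocess(raw_dir: str) -> str:
    # Stub: slice frames from Cloud Storage and write processed examples.
    return raw_dir + "/processed"

@dsl.pipeline(name="object-detection-training")
def pipeline(raw_dir: str):
    preprocess(raw_dir=raw_dir)
    # Training, hyperparameter tuning, and endpoint deployment would follow here.

compiler.Compiler().compile(pipeline, "pipeline.json")
# The compiled spec then runs serverlessly via aiplatform.PipelineJob(...).run().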
Answer is C. If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, you should use TFX. For other use cases, Kubeflow. Link: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline
upvoted 1 times
Selected Answer: C
Answer C...
https://cloud.google.com/architecture/ml-on-gcp-best-practices#use-vertex-pipelines
upvoted 1 times
Selected Answer: B
" You want to orchestrate the entire pipeline with minimal cluster management"
because of that it cant be answer c
i vote for b, becausse there is no cluster management with vertex ai
upvoted 2 times
Selected Answer: C
C
"If you are using other frameworks, we recommend using Kubeflow Pipeline, which is very flexible and allows you to use simple code to construct
pipelines. Kubeflow Pipeline also provides Google Cloud pipeline components such as Vertex AI AutoML."
(Journey to Become a Google Cloud Machine Learning Engineer: Build the mind and hand of a Google Certified ML professional)
upvoted 3 times
Selected Answer: C
vote C
upvoted 1 times
Selected Answer: C
I will go with C, because Kubeflow is recommended for general-purpose orchestration, while TFX is for large-scale TensorFlow workloads.
upvoted 1 times
You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size.
You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32-cores, 128 GB of RAM, and 1 NVIDIA
P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?
A. Increase the instance memory to 512 GB and increase the batch size.
B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
C. Enable early stopping in your Vertex AI Training job.
D. Use the tf.distribute.Strategy API and run a distributed training job.
Correct Answer: C
Selected Answer: C
I would say C.
The question asks about time, so the "early stopping" option looks fine because it will not hurt the existing accuracy (it may even improve it).
Reading the TF docs, tf.distribute.Strategy is used when you want to split training between GPUs, but the question says that we have a single GPU.
Open to discuss. :)
upvoted 6 times
Selected Answer: B
I would say B:
A. Increasing memory doesn't necessarily speed up the process; it's not a batch-size problem.
B. It looks like an image -> TensorFlow situation, so transforming images into tensors means a TPU works better and probably faster.
C. It's not an overfitting problem.
D. Same here; it's not a memory or input-size problem.
upvoted 1 times
Selected Answer: D
In my eyes the only solution is distributed training. 3 000 000 x 2GB = 6 Petabytes worth of data. No single device will get you there.
upvoted 2 times
Selected Answer: B
Selected Answer: B
Given the context and the need for a framework-agnostic approach, you might consider a combination of options A and D. Increasing instance
memory and batch size can still be beneficial, and if you're using a deep learning framework that supports distributed training (like TensorFlow or
PyTorch), implementing distributed training (Option D) can further accelerate the process.
upvoted 1 times
Selected Answer: B
I would go with B as v3-32 TPU offers much more computational power than a single P100 GPU, and this upgrade should provide a substantial
decrease in training time.
Also, tf.distribute.Strategy is good for performing distributed training on multiple GPUs or TPUs, but the current setup has just one GPU, which makes it
the second-best option, provided the architecture uses multiple GPUs.
An increase in memory may allow a larger batch size but won't address the fundamental problem, which is an over-utilized GPU.
Early stopping is good for avoiding overfitting once the model already performs at its best. It's good for reducing overall training time but won't
improve the training speed.
upvoted 4 times
Selected Answer: B
Given the options and the goal of decreasing training time, options B (using TPUs) and D (distributed training) are the most effective ways to
achieve this goal
Early stopping is a technique that can help save training time by monitoring a validation metric and stopping the training process when the metric
stops improving. While it can help in terms of stopping unnecessary training runs, it may not provide as substantial a speedup as other options.
upvoted 2 times
Selected Answer: C
A. Increase the instance memory to 512 GB and increase the batch size.
> this will not necessarily decrease training time
B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job. Most Voted
> TPU can sacrifice performance
C. Enable early stopping in your Vertex AI Training job.
> YES, this decreases training time without sacrificing performance, if set properly
D. Use the tf.distribute.Strategy API and run a distributed training job.
> No idea .... But I believe the type of machine and architecture cannot be changed as per the wording of the question.
upvoted 1 times
Therefore, replacing the NVIDIA P100 GPU with a v3-32 TPU in the Vertex AI Training job would be the most effective way to decrease training time
while maintaining or even improving model performance
upvoted 2 times
I don't understand why so many people are voting for D (tf.distribute.Strategy API). If we look at our training infrastructure, we can see the
bottleneck is obviously the GPU, which has 12GB or 16GB memory depending on the model
(https://www.leadtek.com/eng/products/ai_hpc(37)/tesla_p100(761)/detail). This means we can afford to have a batch size of only 6-8 images (2GB
each) even if we assume the GPU is utilized 100%. And remember the training size is 3M, which means each epoch will have 375-500K steps in the
best case.
With 32-cores and 128GB memory, we are able to afford higher batch sizes (e.g., 32), so moving to TPU will accelerate the training.
A is wrong because we can't afford a larger batch size with the current GPU. D is wrong because you don't have multiple GPUs and your current
GPU is saturated. C is a viable option, but it seems less optimal than B.
upvoted 4 times
Selected Answer: D
Selected Answer: C
went with C
upvoted 1 times
Selected Answer: A
went with A
upvoted 1 times
Went with B
upvoted 1 times
You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power
consumption in the company’s manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of
records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your
model to scale smoothly and require minimal development work. What should you do?
B. Develop a custom TensorFlow regression model, and optimize it using Vertex AI Training.
C. Develop a custom scikit-learn regression model, and optimize it using Vertex AI Training.
Correct Answer: A
Selected Answer: D
The key is to understand the amount of data that needs to be used for training - the sensor collects tens of millions of records every day and the
model needs to use all the data up to the current date.
There is a limitation for AutoML is 100M rows -> https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/prepare-data
upvoted 12 times
Selected Answer: D
Either A or D. Since it is not stated where the sensor data is stored, go for A.
upvoted 2 times
Selected Answer: A
I would go with A because it states that it requires minimal development work. Not sure though, correct me if I'm wrong.
upvoted 3 times
Went with D
upvoted 1 times
Old question; the quotas were removed when they moved AutoML into Vertex AI.
https://cloud.google.com/vertex-ai/docs/quotas#model_quotas#tabular
upvoted 3 times
BigQuery is an unnecessary distraction IMO (e.g. why would we assume BigQuery and not BigTable!)
upvoted 1 times
Selected Answer: D
Answer D
https://cloud.google.com/blog/products/data-analytics/automl-tables-now-generally-available-bigquery-ml
This legacy version of AutoML Tables is deprecated and will no longer be available on Google Cloud after January 23, 2024. All the functionality of
legacy AutoML Tables and new features are available on the Vertex AI platform. See Migrate to Vertex AI to learn how to migrate your resources.
upvoted 2 times
You require minimal development work and the question doesn't mention if your data is stored in BQ
upvoted 1 times
Selected Answer: D
Selected Answer: A
Selected Answer: A
Vote for D
A doesn't work because AutoML has limits on training data
- https://www.examtopics.com/exams/google/professional-machine-learning-engineer/view/10/
upvoted 3 times
Selected Answer: D
You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI
Training, and you want to improve the model’s training time. What should you try out first?
B. Train your model in a distributed mode using multiple Compute Engine VMs.
C. Train your model with DLVM images on Vertex AI, and ensure that your code utilizes NumPy and SciPy internal methods whenever possible.
Correct Answer: C
Selected Answer: D
Options B and C may also be relevant in certain scenarios, but they are generally more involved and might require additional considerations.
Option B can be effective for large-scale training tasks, but it might add complexity and overhead. Option C could be helpful, but the impact on
training time might not be as immediate and substantial as using GPUs.
upvoted 2 times
Selected Answer: D
D: Training your model with GPUs can provide a substantial speedup, especially for deep learning models or models that require a lot of
computation. This option is likely to have a significant impact on training time.
NOT C: While optimizing code can help improve training time to some extent, it may not provide as significant a speedup as the other options.
However, it's still a good practice to optimize your code.
upvoted 1 times
I don't think scikit-learn supports GPUs or distributed training, so based on "What should you try out first?" I think C: Train your model with DLVM
images on Vertex AI, and ensure that your code utilizes NumPy and SciPy internal methods whenever possible.
upvoted 3 times
Selected Answer: C
Also, most of scikit-learn assumes data is in NumPy arrays or SciPy sparse matrices of a single numeric dtype.
I choose C as the correct answer.
upvoted 3 times
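To make the "NumPy and SciPy internal methods" point concrete, here is a small self-contained comparison; the array size is arbitrary and exact timings vary by machine, but the vectorized form is typically orders of magnitude faster.

import numpy as np

x = np.random.rand(1_000_000)

# Slow: an element-wise Python loop.
total = 0.0
for value in x:
    total += value * value

# Fast: the same computation via NumPy's C-level internals, which is also
# the data representation scikit-learn expects.
total_vectorized = float(np.dot(x, x))

assert np.isclose(total, total_vectorized)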
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: C
Answer C
upvoted 1 times
https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support
upvoted 1 times
Selected Answer: C
C is correct absolutely
https://console.cloud.google.com/marketplace/details/click-to-deploy-images/deeplearning?_ga=2.139171125.787784554.1674450530-1146240914.1659613735&project=quantum-hash-240404
upvoted 1 times
Selected Answer: C
No D
https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers?hl=ko#scikit-learn
upvoted 1 times
Selected Answer: D
D (not sure)
- https://cloud.google.com/vertex-ai/docs/training/code-requirements#gpus
upvoted 1 times
Training a machine learning model on a GPU can significantly improve the training time compared to training on a CPU.
upvoted 1 times
You are an ML engineer at a travel company. You have been researching customers’ travel behavior for many years, and you have deployed models
that predict customers’ vacation patterns. You have observed that customers’ vacation destinations vary based on seasonality and holidays;
however, these seasonal variations are similar across years. You want to quickly and easily store and compare the model versions and
A. Store the performance statistics in Cloud SQL. Query that database to compare the performance statistics across the model versions.
B. Create versions of your models for each season per year in Vertex AI. Compare the performance statistics across the models in the
C. Store the performance statistics of each pipeline run in Kubeflow under an experiment for each season per year. Compare the results
D. Store the performance statistics of each version of your models using seasons and years as events in Vertex ML Metadata. Compare the
Correct Answer: B
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/model-registry/versioning
Model versioning lets you create multiple versions of the same model. With model versioning, you can organize your models in a way that helps
navigate and understand which changes had what effect on the models. With Vertex AI Model Registry you can view your models and all of their
versions in a single view. You can drill down into specific model versions and see exactly how they performed.
upvoted 1 times
Vertex AI provides a managed environment for machine learning, and creating model versions for each season per year is a structured way to
organize and compare models. You can use the Evaluate tab to compare performance metrics easily. This approach is well-suited for the task.
upvoted 2 times
Vertex ML Metadata is designed for tracking metadata and lineage in machine learning pipelines. While it can store model version information
and performance statistics, it might not provide as straightforward a way to compare models across years and seasons as Vertex AI's model
versioning and evaluation tools.
upvoted 1 times
Selected Answer: D
I am absolutely not an expert on this topic, but I would say the correct answer is D.
It does not sound right to systematically create versions of a model based on seasonality if the model has not changed. "Events" in metadata
sound right.
upvoted 1 times
Selected Answer: D
D. Store the performance statistics of each version of your models using seasons and years as events in Vertex ML Metadata. Compare the results
across the slices.
https://cloud.google.com/vertex-ai/docs/ml-metadata/analyzing#filtering
Which versions of a trained model achieved a certain quality threshold?
upvoted 1 times
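As a rough sketch of what option D looks like with the Vertex AI SDK's experiment tracking (which is backed by Vertex ML Metadata): the project, experiment, run names, and metric values below are all hypothetical.

from google.cloud import aiplatform

# Hypothetical project, experiment, and metric values; each (season, year)
# becomes a run whose statistics land in Vertex ML Metadata.
aiplatform.init(project="my-project", location="us-central1",
                experiment="vacation-model")

for season, year, rmse in [("summer", 2022, 0.12), ("summer", 2023, 0.11)]:
    aiplatform.start_run(run=f"{season}-{year}")
    aiplatform.log_params({"season": season, "year": year})
    aiplatform.log_metrics({"rmse": rmse})
    aiplatform.end_run()

# Pull all runs into one DataFrame to compare statistics across seasons/years.
print(aiplatform.get_experiment_df())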
Went with D
upvoted 1 times
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/model-registry/versioning
Model versioning lets you create multiple versions of the same model. With model versioning, you can organize your models in a way that helps
navigate and understand which changes had what effect on the models. With Vertex AI Model Registry you can view your models and all of their
versions in a single view. You can drill down into specific model versions and see exactly how they performed.
upvoted 1 times
You can compare evaluation results across different models, model versions, and evaluation jobs --> https://cloud.google.com/vertex-ai/docs/evaluation/using-model-evaluation
Answer D
upvoted 1 times
Selected Answer: D
D
- https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction
upvoted 2 times
Vote D. It is easy to compare via Vertex ML Metadata UI the performance statistics across the different slices and see how the model performance
varies over time.
upvoted 2 times
Selected Answer: D
i think it is D
upvoted 1 times
You are an ML engineer at a manufacturing company. You need to build a model that identifies defects in products based on images of the
product taken at the end of the assembly line. You want your model to preprocess the images with lower computation to quickly extract features
of defects in products. Which approach should you use to build the model?
A. Reinforcement learning
B. Recommender system
Correct Answer: D
Selected Answer: D
Selected Answer: D
D for sure
upvoted 1 times
Selected Answer: D
Went with D
upvoted 1 times
Answer D
upvoted 1 times
Selected Answer: D
CNN scenario
upvoted 1 times
Selected Answer: D
best way
upvoted 1 times
Selected Answer: D
D
CNN is good for images processing
- https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
upvoted 1 times
Selected Answer: D
Obviously D.
upvoted 2 times
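For context, a minimal Keras sketch of the kind of convolutional model option D implies; the input size and layer widths are arbitrary illustrations, not a tuned architecture.

import tensorflow as tf

# Layer sizes and input shape are placeholders, not a tuned design.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),
    # Convolution + pooling extract local defect features cheaply by reusing
    # the same small filters across the whole image.
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # defect / no defect
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])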
You are developing an ML model intended to classify whether X-ray images indicate bone fracture risk. You have trained a ResNet architecture on
Vertex AI using a TPU as an accelerator, however you are unsatisfied with the training time and memory usage. You want to quickly iterate your
training code but make minimal changes to the code. You also want to minimize impact on the model’s accuracy. What should you do?
Correct Answer: A
I think it should be D
https://cloud.google.com/tpu/docs/bfloat16
upvoted 5 times
Selected Answer: D
Configuring bfloat16 instead of float32 (D): this offers a good balance between speed, memory usage, and minimal code changes. bfloat16 uses 16
bits per float value compared to 32 bits for float32. This can significantly reduce memory usage while maintaining similar accuracy in many machine
learning models, especially for image recognition tasks. It's a quick change with minimal impact on the code and potentially large gains in training speed.
upvoted 1 times
"the Google hardware team chose bfloat16 for Cloud TPUs to improve hardware efficiency while maintaining the ability to train deep learning
models accurately, all with minimal switching costs from float32"
upvoted 1 times
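In Keras on a TPU, the bfloat16 switch is typically a single line before model construction; a minimal sketch using TensorFlow's mixed-precision API (the ResNet50 configuration is illustrative):

import tensorflow as tf

# Compute in bfloat16 while keeping variables in float32: less memory and
# faster steps on TPUs, usually with minimal accuracy impact.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

# Build the model after setting the policy so its layers pick it up.
model = tf.keras.applications.ResNet50(weights=None, classes=2)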
Selected Answer: B
while reducing the global batch size (Option B) and configuring your model to use bfloat16 (Option D) are both valid options, reducing the global
batch size is typically a safer and more straightforward choice to quickly iterate and make minimal changes to your code while still achieving
reasonable model performance.
upvoted 1 times
While bfloat16 offers lower precision compared to float32, it maintains a similar dynamic range. This means that the reduction in numerical
precision is unlikely to have a substantial impact on the accuracy of your model, especially in the context of image classification tasks like
bone fracture risk assessment in X-rays.
While reducing the batch size can decrease memory usage, it can also affect the model's convergence and accuracy. Additionally, TPUs are
highly efficient with large batch sizes, so reducing the batch size might not fully leverage the TPU's capabilities.
upvoted 1 times
upvoted 1 times
Selected Answer: D
Went with D
upvoted 1 times
Selected Answer: D
https://cloud.google.com/tpu/docs/bfloat16
upvoted 1 times
Selected Answer: D
Answer D
upvoted 2 times
Selected Answer: D
I go with D, primarily because the rest don't make any sense at all.
upvoted 2 times
Selected Answer: D
It should be D.
upvoted 1 times
D
Agree with mymy9418
upvoted 2 times
Selected Answer: D
Agree with D
upvoted 1 times
Selected Answer: B
It should be B.
upvoted 1 times
You have successfully deployed to production a large and complex TensorFlow model trained on tabular data. You want to predict the lifetime
value (LTV) field for each subscription stored in the BigQuery table named subscription. subscriptionPurchase in the project named my-
fortune500-company-project.
You have organized all your training code, from preprocessing data from the BigQuery table up to deploying the validated model to the Vertex AI
endpoint, into a TensorFlow Extended (TFX) pipeline. You want to prevent prediction drift, i.e., a situation when a feature data distribution in
B. Add a model monitoring job where 10% of incoming predictions are sampled 24 hours.
C. Add a model monitoring job where 90% of incoming predictions are sampled 24 hours.
D. Add a model monitoring job where 10% of incoming predictions are sampled every hour.
Correct Answer: C
Selected Answer: B
You need to monitor first and foremost to see if there is drift; if there is, then a measure can be devised. Training every day is overkill.
upvoted 1 times
Selected Answer: A
Continuous Retraining: continuously retraining the model allows it to adapt to changes in the data distribution, helping to mitigate prediction drift.
Daily retraining provides a good balance between staying up-to-date and avoiding excessive retraining.
Options B, C, and D involve model monitoring but do not address the issue of keeping the model updated with the changing data distribution.
Monitoring alone can help you detect drift, but it does not actively prevent it. Retraining the model is necessary to address drift effectively.
upvoted 3 times
I think you're slightly missing the point; the answer should be B. Let me explain why.
The whole point of this question is to come up with a PREVENTATIVE way of handling prediction drift, so you need to find a way to DETECT the
drift before it occurs. This is exactly what solution B does, and it is done in a way that is not too frequent (i.e., D) and not too resource-intensive
with a large sample (i.e., C); remember, if sampling is done well you don't need 90% of the data to detect drift.
Solution A suggests retraining every day, which is a CRAZY proposal. Why would you retrain every day when you don't even know whether your data is
drifting? Huge waste of resources and time.
upvoted 1 times
Went with B
upvoted 1 times
Selected Answer: B
Continuous retraining (option A) is not necessarily the best solution for preventing prediction drift, as it can be time-consuming and expensive.
Instead, monitoring the performance of the model in production is a better approach. Option B is a good choice because it samples a small
percentage of incoming predictions and checks for any significant changes in the feature data distribution over a 24-hour period. This allows you
to detect any drift and take appropriate action to address it before it affects the model's performance. Options C and D are less effective because
they either sample too many or too few predictions and/or at too frequent intervals.
upvoted 4 times
I am just not sure why sampling too few (10%) is important. Is this a costly service?
upvoted 1 times
In many cases, a 10% sample of the data can provide statistically significant insights into the model's performance and the presence of drift.
It's a balancing act between getting enough data to make informed decisions and not overburdening the system.
In some datasets, especially large ones, a lot of the data might be redundant or not particularly informative. Sampling a smaller fraction can
help filter out noise and focus on the most relevant information.
upvoted 1 times
Answer B
upvoted 1 times
Selected Answer: B
B. I got it from the Machine Learning in the Enterprise course on the Google partner Skills Boost; watch the video "Model management using Vertex AI" carefully.
I infer that it is the default setting for the typical case.
upvoted 3 times
Using 10% of hourly requests would yield a better distribution and a faster feedback loop
upvoted 1 times
Selected Answer: B
Selected Answer: B
B (not sure)
- https://cloud.google.com/vertex-ai/docs/model-monitoring/overview
- https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring#drift-detection
upvoted 2 times
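For concreteness, a hedged sketch of option B using the Vertex AI Python SDK's model-monitoring classes. The endpoint resource name, feature name, threshold, and email are hypothetical, and the exact parameters should be checked against the SDK version you use.

from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

# Hypothetical endpoint, feature, threshold, and email; sketch of option B:
# sample 10% of incoming predictions and check for drift every 24 hours.
job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="ltv-drift-monitor",
    endpoint="projects/my-project/locations/us-central1/endpoints/123",
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.1),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=24),  # hours
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    objective_configs=model_monitoring.ObjectiveConfig(
        drift_detection_config=model_monitoring.DriftDetectionConfig(
            drift_thresholds={"feature_a": 0.05}  # hypothetical feature/threshold
        )
    ),
)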
You recently developed a deep learning model using Keras, and now you are experimenting with different training strategies. First, you trained the
model using a single GPU, but the training process was too slow. Next, you distributed the training across 4 GPUs using
tf.distribute.MirroredStrategy (with no other changes), but you did not observe a decrease in training time. What should you do?
Correct Answer: C
Selected Answer: D
Ans D: Check this link https://www.tensorflow.org/guide/gpu_performance_analysis for details on how to optimize performance on a multi-GPU single host
upvoted 10 times
Selected Answer: D
when using tf.distribute.MirroredStrategy, TensorFlow automatically takes care of distributing the dataset across the available devices (GPUs in this
case).
To make sure that the data is efficiently distributed across the GPUs, you should increase the global batch size. This ensures that each GPU receives
a larger batch of data to process, effectively utilizing the additional computational power. The global batch size is the sum of the batch sizes for all
devices. For example, if you had a batch size of 64 for a single GPU, you would set the global batch size to 256 (64 * 4) when using 4 GPUs.
upvoted 1 times
When you distribute the training across multiple GPUs using tf.distribute.MirroredStrategy, the training time may not decrease if the dataset
loading and preprocessing become a bottleneck. In this case, option A, distributing the dataset with
tf.distribute.Strategy.experimental_distribute_dataset, can help improve the performance.
upvoted 1 times
When going from training with a single GPU to multiple GPUs on the same host, ideally you should experience the performance scaling with only
the additional overhead of gradient communication and increased host thread utilization. Because of this overhead, you will not have an exact 2x
speedup if you move from 1 to 2 GPUs.
Try to maximize the batch size, which will lead to higher device utilization and amortize the costs of communication across multiple GPUs. Using
the memory profiler helps get a sense of how close your program is to peak memory utilization. Note that while a higher batch size can affect
convergence, this is usually outweighed by the performance benefits.
upvoted 2 times
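A minimal sketch of the adjustment described above; the per-replica batch size of 64 and the one-layer model are illustrative assumptions.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Scale the global batch size with the replica count so each GPU still
# receives a full per-replica batch (64 here is illustrative).
per_replica_batch_size = 64
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

with strategy.scope():
    # Variables created inside the scope are mirrored across all GPUs.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

# model.fit(dataset.batch(global_batch_size), epochs=10)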
Selected Answer: D
Went with D
upvoted 1 times
If distributing the training across multiple GPUs did not result in a decrease in training time, the issue may be related to the batch size being too
small. When using multiple GPUs, each GPU gets a smaller portion of the batch size, which can lead to slower training times due to increased
communication overhead. Therefore, increasing the batch size can help utilize the GPUs more efficiently and speed up training.
upvoted 2 times
Selected Answer: D
Answer D
upvoted 1 times
Selected Answer: D
To speed up the training of the deep learning model, increasing the batch size. When using multiple GPUs with tf.distribute.MirroredStrategy,
increasing the batch size can help to better utilize the additional GPUs and potentially reduce the training time. This is because larger batch sizes
allow each GPU to process more data in parallel, which can help to improve the efficiency of the training process.
upvoted 1 times
Selected Answer: C
TPUs are Google's specialized ASICs designed to dramatically accelerate machine learning workloads. Hence it should be C.
upvoted 1 times
Selected Answer: D
I think it's D
upvoted 1 times
I think it's A
upvoted 4 times
B (not sure)
- https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch
-https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_custom_training_loops
upvoted 1 times
Selected Answer: A
I think it's A,
https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy#in_short
upvoted 3 times
You work for a gaming company that has millions of customers around the world. All games offer a chat feature that allows players to
communicate with each other in real time. Messages can be typed in more than 20 languages and are translated in real time using the Cloud
Translation API. You have been asked to build an ML system to moderate the chat in real time while assuring that the performance is uniform
across the various languages and without changing the serving infrastructure.
You trained your first model using an in-house word2vec model for embedding the chat messages translated by the Cloud Translation API.
However, the model has significant differences in performance across the different languages. How should you improve it?
A. Add a regularization term such as the Min-Diff algorithm to the loss function.
D. Remove moderation for languages for which the false positive rate is too high.
Correct Answer: D
Selected Answer: B
Answer B
Since the performance of the model varies significantly across different languages, it suggests that the translation process might have introduced
some noise in the chat messages, making it difficult for the model to generalize across languages. One way to address this issue is to train a
classifier using the chat messages in their original language.
upvoted 7 times
Selected Answer: A
uniform performance
upvoted 1 times
Min-Diff may reduce model unfairness, but here the concern is about improving performance. Training models on the original languages, avoiding the
Cloud Translation API, should be more suitable.
upvoted 1 times
Selected Answer: A
A is correct since it encourages the model to have similar performance across languages.
B would entail training 20 word2vec embeddings + maintaining 20 models at the same time. On top of that, there would be no guarantee that
those models will have comparable performance across languages. This is certainly not something you would do after training your first model.
upvoted 2 times
Selected Answer: A
A is correct, the key part of the question is „[…] assuring the performance is uniform […]“ which is baked into the Min-Diff regularisation:
https://ai.googleblog.com/2020/11/mitigating-unfair-bias-in-ml-models.html
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
Since the current model has significant differences in performance across the different languages, it is likely that the translations produced by the
Cloud Translation API are not of uniform quality across all languages. Therefore, it would be best to train a classifier using the chat messages in
their original language instead of relying on translations.
This approach has several advantages. First, the model can directly learn the nuances of each language, leading to better performance across all
languages. Second, it eliminates the need for translation, reducing the possibility of errors and improving the overall speed of the system. Finally, it
is a relatively simple approach that can be implemented without changing the serving infrastructure.
upvoted 3 times
should be A
https://ai.googleblog.com/2020/11/mitigating-unfair-bias-in-ml-models.html
upvoted 2 times
You work for a gaming company that develops massively multiplayer online (MMO) games. You built a TensorFlow model that predicts whether
players will make in-app purchases of more than $10 in the next two weeks. The model’s predictions will be used to adapt each user’s game
experience. User data is stored in BigQuery. How should you serve your model while optimizing cost, user experience, and ease of management?
A. Import the model into BigQuery ML. Make predictions using batch reading data from BigQuery, and push the data to Cloud SQL
B. Deploy the model to Vertex AI Prediction. Make predictions using batch reading data from Cloud Bigtable, and push the data to Cloud SQL.
C. Embed the model in the mobile application. Make predictions after every in-app purchase event is published in Pub/Sub, and push the data
to Cloud SQL.
D. Embed the model in the streaming Dataflow pipeline. Make predictions after every in-app purchase event is published in Pub/Sub, and push
Correct Answer: A
Selected Answer: A
Selected Answer: A
Embedding the model in a streaming Dataflow pipeline allows low latency predictions on real-time events published to Pub/Sub. This provides a
responsive user experience.
Dataflow provides a managed service to scale predictions and integrate with Pub/Sub, without having to manage servers.
Streaming predictions only when events occur optimizes cost compared to bulk or client-side prediction.
Pushing results to Cloud SQL provides a managed database for persistence.
In contrast, options A and B use inefficient batch predictions. Option C increases mobile app size and cost.
upvoted 1 times
Selected Answer: D
D could be correct
upvoted 1 times
Selected Answer: D
Selected Answer: D
For "used to adapt each user's game experience" points out to non-batch, hence excludes A & B, and embedding the model in the mobile app
would not necessarily "optimize cost". Plus, the classical streaming solution builds on Dataflow along with Pub/Sub and BigQuery, embedding ML
in Dataflow is low-code https://cloud.google.com/blog/products/data-analytics/latest-dataflow-innovations-for-real-time-streaming-and-aiml and
apparently a modified version of the question points to the same direction https://mikaelahonen.com/en/data/gcp-mle-exam-questions/
upvoted 3 times
There's no need to make a prediction after every in-app purchase event. Am I wrong?
upvoted 3 times
Yeah, it's A
upvoted 2 times
Answer C
upvoted 2 times
You are building a linear regression model on BigQuery ML to predict a customer’s likelihood of purchasing your company’s products. Your model
uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want
to prepare your data using the least amount of coding while maintaining the predictable variables. What should you do?
A. Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file, and upload it as part of your model to
BigQuery ML.
B. Create a new view with BigQuery that does not include a column with city information
C. Use Cloud Data Fusion to assign each city to a region labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.
D. Use Dataprep to transform the state column using a one-hot encoding method, and make each city a column with binary values.
Correct Answer: B
Selected Answer: D
A. Using TensorFlow: this is overkill for this scenario. The one-hot encoding can be handled without code in Dataprep and used natively by BigQuery ML.
B. Excluding City Information: This removes a potentially important predictive variable, reducing model accuracy.
C. Assigning Region Labels: This approach loses granularity and might not capture the specific variations between cities.
upvoted 2 times
Selected Answer: D
Selected Answer: D
Went with D
upvoted 1 times
Selected Answer: D
Answer D
upvoted 1 times
Selected Answer: C
For a fuller answer: D transforms the "state" column, not the city column;
C at least works with the city column
upvoted 1 times
Selected Answer: D
https://docs.trifacta.com/display/SS/Prepare+Data+for+Machine+Processing
upvoted 2 times
Selected Answer: D
This will allow you to maintain the city name variable as a predictor while ensuring that the data is in a format that can be used to train a linear
regression model on BigQuery ML.
upvoted 1 times
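For readers who want to see what the transformation in option D produces, here is a pandas equivalent (column names and values are illustrative; Dataprep achieves the same result without code):

import pandas as pd

# Illustrative data: one-hot encoding expands the single "city" column into
# one binary indicator column per distinct city.
df = pd.DataFrame({"city": ["Paris", "Tokyo", "Paris"], "purchased": [1, 0, 1]})
encoded = pd.get_dummies(df, columns=["city"])
print(encoded)  # columns: purchased, city_Paris, city_Tokyo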
Answer D
upvoted 1 times
Selected Answer: D
Selected Answer: C
I vote for C
upvoted 2 times
You are an ML engineer at a bank that has a mobile application. Management has asked you to build an ML-based biometric authentication for the
app that verifies a customer’s identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be
downloaded and stored into the bank databases. Which learning strategy should you recommend to train and deploy this ML mode?
B. Federated learning
D. Differential privacy
Correct Answer: C
Selected Answer: B
B
With federated learning, all the data is collected, and the model is trained with algorithms across multiple decentralized edge devices such as cell
phones or websites, without exchanging them.
(Journey to Become a Google Cloud Machine Learning Engineer: Build the mind and hand of a Google Certified ML professional)
upvoted 8 times
Selected Answer: B
Federated learning allows training the model on the users' devices themselves. The model updates its parameters based on local training data on the
device without the raw fingerprint information ever needing to leave the device. This ensures the highest level of privacy for sensitive biometric data.
upvoted 1 times
Selected Answer: B
Went with B
upvoted 1 times
I think the giveaway is in the question "Which learning strategy.."... Federated Learning seems to be the only one !
upvoted 3 times
B. Federated learning would be the best learning strategy to train and deploy the ML model for biometric authentication in this scenario. Federated
learning allows for training an ML model on distributed data without transferring the raw data to a centralized location.
upvoted 1 times
Selected Answer: A
Ans is A for me
upvoted 1 times
Selected Answer: A
It seems A to me.
upvoted 1 times
Selected Answer: B
Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device.
https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
upvoted 1 times
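To make the strategy concrete, here is a framework-free sketch of the federated-averaging idea. Everything here (update rule, data, shapes) is a toy illustration, not a production biometric system; the key property is that raw data stays on each device and only weights are aggregated.

import numpy as np

def local_training_step(weights: np.ndarray, local_data: np.ndarray) -> np.ndarray:
    # Stand-in for on-device training (e.g., one epoch of SGD); the raw
    # data is only ever touched inside this function, on the device.
    gradient = local_data.mean(axis=0) - weights  # toy update rule
    return weights + 0.1 * gradient

def federated_round(global_weights, per_device_data):
    # Each device computes an update on its own private data...
    local_weights = [local_training_step(global_weights, d) for d in per_device_data]
    # ...and the server only ever sees the averaged weights.
    return np.mean(local_weights, axis=0)

rng = np.random.default_rng(0)
devices = [rng.normal(size=(100, 4)) for _ in range(10)]  # private per-device data
weights = np.zeros(4)
for _ in range(5):
    weights = federated_round(weights, devices)
print(weights)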
You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-managed notebooks. You use BigQuery to split your
data into training and validation sets using the following queries:
After training the model, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8, but after deploying the
model to production, you notice that your model performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?
C. The tables that you created to hold your training and validation records share some records, and you may not be using all the data in your
initial table.
D. The RAND() function generated a number that is less than 0.2 in both instances, so every record in the validation table will also be in the
training table.
Correct Answer: A
Selected Answer: C
- Excluding D, as RAND() samples 80% for ".training" & 20% for ".validation": https://stackoverflow.com/questions/42115968/how-does-rand-works-in-bigquery;
- It could be that those 2 samplings share some records, since both are pseudo-randomly sampled over the same ".mytable", and therefore might not be using
all of its data, thus C seems valid;
- Excluding B as there is no indication otherwise of insufficient amount of training data, after training AUC ROC was 0.8, that we know;
- There could be a training-serving skew occurring in Prod, but “most likely occurring” is C as a result of the selective information presented:
https://developers.google.com/machine-learning/guides/rules-of-ml#training-serving_skew
upvoted 3 times
Answer C
upvoted 2 times
Selected Answer: C
Answer C
upvoted 1 times
Selected Answer: C
Since we are calling RAND() twice, it might be that data that was in the training set ends up in the validation set too. If we had called it just once, I would say D.
upvoted 2 times
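A sketch of a split that avoids the two-RAND() overlap entirely: a deterministic hash over a key column assigns every row to exactly one set. Dataset, table, and column names are hypothetical; FARM_FINGERPRINT is a standard BigQuery function.

from google.cloud import bigquery

# Hypothetical dataset/table/column names. FARM_FINGERPRINT is deterministic,
# so every record lands in exactly one split; two independent RAND() filters
# cannot guarantee that.
client = bigquery.Client()
client.query("""
    CREATE OR REPLACE TABLE mydataset.training AS
    SELECT * FROM mydataset.mytable
    WHERE MOD(ABS(FARM_FINGERPRINT(CAST(id AS STRING))), 10) < 8
""").result()
client.query("""
    CREATE OR REPLACE TABLE mydataset.validation AS
    SELECT * FROM mydataset.mytable
    WHERE MOD(ABS(FARM_FINGERPRINT(CAST(id AS STRING))), 10) >= 8
""").result()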
Selected Answer: D
If there were one RAND() in front of those two queries it would be true. There are two separate RAND() and "every record in the validation table
will also be in the training table" is not true.
upvoted 2 times
Selected Answer: C
C (not sure)
upvoted 4 times
Selected Answer: C
During batch training of a neural network, you notice that there is an oscillation in the loss. How should you adjust your model to ensure that it
converges?
Correct Answer: C
Selected Answer: B
B
larger learning rates can reduce training time but may lead to model oscillation and may miss the optimal model parameter values.
upvoted 7 times
Selected Answer: B
A. Decrease Batch Size: while a smaller batch size can sometimes help with convergence, it can also lead to slower training. It might not necessarily
address the issue of oscillation.
C. Increase Learning Rate: a higher learning rate can cause the loss to jump around more erratically, potentially worsening the oscillation problem.
D. Increase Batch Size: a larger batch size can lead to smoother updates but might also make the model less sensitive to local gradients and hinder
convergence, especially with an already oscillating loss.
upvoted 1 times
Selected Answer: C
I don't understand
upvoted 1 times
Went with B
upvoted 1 times
Answer B
upvoted 1 times
Selected Answer: B
Having a large learning rate results in instability or oscillations. Thus, the first solution is to tune the learning rate by gradually decreasing it.
https://towardsdatascience.com/8-common-pitfalls-in-neural-network-training-workarounds-for-them-7d3de51763ad
upvoted 1 times
Selected Answer: B
https://ai.stackexchange.com/questions/14079/what-could-an-oscillating-training-loss-curve-represent#:~:text=Try%20lowering%20the%20learning%20rate,step%20and%20overshoot%20it%20again.
upvoted 2 times
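The fix itself is a one-line change; a minimal Keras sketch, with illustrative values:

import tensorflow as tf

# An oscillating loss usually means the update steps overshoot the minimum;
# lowering the learning rate (values illustrative) damps the oscillation.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # e.g. down from 1e-3
# model.compile(optimizer=optimizer, loss="mse")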
You work for a toy manufacturer that has been experiencing a large increase in demand. You need to build an ML model to reduce the amount of
time spent by quality control inspectors checking for product defects. Faster defect detection is a priority. The factory does not have reliable Wi-
Fi. Your company wants to implement the new ML model as soon as possible. Which model should you use?
Correct Answer: D
Since faster defect detection is a priority, the AutoML Vision Edge mobile-low-latency-1 model should be the choice. This model is designed to run
efficiently on mobile devices and prioritize low latency, which means that it can provide fast defect detection without requiring a connection to the
cloud.
https://cloud.google.com/vision/automl/docs/train-edge
upvoted 8 times
Selected Answer: B
B
"reduce the amount of time spent by quality control inspectors checking for product defects."-> low latency
upvoted 6 times
Selected Answer: B
The AutoML Vision Edge mobile-low-latency-1 model prioritizes speed over accuracy, making it ideal for real-time defect detection on the factory
floor without a stable internet connection. This allows for faster inspections and quicker identification of faulty products.
upvoted 2 times
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
Answer B
upvoted 1 times
Selected Answer: B
It's B.
upvoted 1 times
Selected Answer: B
vote B
upvoted 4 times
You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the
classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model
building, training, and hyperparameter tuning and serving. What should you do?
Correct Answer: A
Selected Answer: B
B (similar to question 7)
upvoted 7 times
Selected Answer: B
Vertex AutoML is a Google Cloud Platform service designed for building machine learning models without writing code. It automates various stages of the machine learning pipeline, including those you mentioned.
Selected Answer: B
Went with B
upvoted 1 times
Selected Answer: B
Answer B
upvoted 1 times
You are an ML engineer in the contact center of a large enterprise. You need to build a sentiment analysis tool that predicts customer sentiment
from recorded phone conversations. You need to identify the best approach to building a model while ensuring that the gender, age, and cultural
differences of the customers who called the contact center do not impact any stage of the model development pipeline and results. What should
you do?
A. Convert the speech to text and extract sentiments based on the sentences.
B. Convert the speech to text and build a model based on the words.
D. Convert the speech to text and extract sentiment using syntactical analysis.
Correct Answer: D
Selected Answer: A
Selected Answer: A
A. Convert speech to text and extract sentiments based on sentences: this method focuses on the content of the conversation, minimizing the
influence of factors like voice tone (which can be culturally or gender-specific). Sentiment analysis techniques can analyze the meaning and context
of sentences to identify positive, negative, or neutral sentiment.
upvoted 2 times
C. Extract sentiment directly from voice recordings: This approach can be biased as voice characteristics like pitch or pace can vary based on
gender, age, and cultural background.
D. Convert speech to text and extract sentiment using syntactical analysis: While syntax can provide some clues, it's not the strongest indicator
of sentiment. Additionally, cultural differences in sentence structure could impact accuracy.
upvoted 1 times
"May’s sentence embedding adaptation of WEAT, known as the Sentence Embedding Association Test (SEAT), shows less clear racial and gender
bias in language models and embeddings than the corresponding word embedding formulation"
From: https://medium.com/institute-for-applied-computational-science/bias-in-nlp-embeddings-b1dabb8bbe20
upvoted 2 times
This approach involves converting the speech to text, which allows you to analyze the content of the conversations without directly dealing with
the speakers' gender, age, or cultural differences. By building a model based on the words, you can focus on the language used in the
conversations to predict sentiment, making the model more inclusive and less sensitive to demographic factors.
Option A could be influenced by the syntactical nuances and structures used in different cultures, and option C might be impacted by the
variations in voice tones across genders and ages. Option B, on the other hand, relies on the text content, which provides a more neutral and
content-focused basis for sentiment analysis.
upvoted 2 times
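For reference, a hedged sketch of the option A flow using the Google Cloud client libraries. The audio URI and language code are hypothetical, and real recordings would need the appropriate RecognitionConfig settings.

from google.cloud import language_v1, speech

# Hypothetical audio URI; transcribe first, then score sentiment on text
# only, so voice characteristics (pitch, accent) never reach the model.
speech_client = speech.SpeechClient()
response = speech_client.recognize(
    config=speech.RecognitionConfig(language_code="en-US"),
    audio=speech.RecognitionAudio(uri="gs://my-bucket/call.wav"),
)
transcript = " ".join(r.alternatives[0].transcript for r in response.results)

language_client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content=transcript, type_=language_v1.Document.Type.PLAIN_TEXT
)
sentiment = language_client.analyze_sentiment(
    request={"document": document}
).document_sentiment
print(sentiment.score, sentiment.magnitude)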
Selected Answer: B
B: People of different cultures will often use different sentence structures, so words would be safer than sentences
upvoted 1 times
Selected Answer: A
B, building a model based on words, may also be effective but could potentially be influenced by factors such as accents, dialects, or language
variations that may differ between speakers. C, extracting sentiment directly from voice recordings, may be less accurate due to the subjective nature
of interpreting emotions from audio alone. D, using syntactical analysis, may be useful in certain contexts but may not capture the full range of
sentiment expressed in a conversation. Therefore, A provides the most comprehensive and unbiased approach to sentiment analysis in this
scenario.
upvoted 1 times
Selected Answer: A
Answer A
upvoted 1 times
By working directly with the audio data, you can account for important aspects like tone, pitch, and rhythm of speech, which might provide
valuable information regarding sentiment.
upvoted 1 times
Selected Answer: C
There is the possibility of a more sophisticated architecture for an audio processing pipeline, and the "not impact any stage of the model
development pipeline and results" somewhat calls for a more holistic answer: https://cloud.google.com/architecture/categorizing-audio-files-using-ml#converting_speech_to_text.
Plus, it adds "voice emotion information, related to an audio recording, indicating that a vocal utterance of a
speaker is spoken with negative or positive emotion": https://patents.google.com/patent/US20140220526A1/en.
upvoted 2 times
Selected Answer: B
Can anyone explain how to choose between words and sentences? I feel like the model could pick up bias from both
upvoted 1 times
I agree with qaz09. To avoid the influence of demographic variables, the model should be built on the words.
upvoted 2 times
Answer A
upvoted 1 times
For "ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model
development pipeline and results" I think the model should be built on the words rather than sentences
upvoted 3 times
Selected Answer: A
A
Convert the speech to text and extract sentiments based on the sentences.
upvoted 1 times
Selected Answer: D
vote D
upvoted 1 times
You need to analyze user activity data from your company’s mobile applications. Your team will use BigQuery for data analysis, transformation,
and experimentation with ML algorithms. You need to ensure real-time ingestion of the user activity data into BigQuery. What should you do?
B. Run an Apache Spark streaming job on Dataproc to ingest the data into BigQuery.
D. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery,
Correct Answer: A
Selected Answer: A
Selected Answer: D
Werner123, I agree
upvoted 1 times
Selected Answer: D
User data would most likely include PII, for that case it is still recommended to use Dataflow since you need to remove/anonymise sensitive data.
upvoted 2 times
Selected Answer: A
Agree with TNT87. From the same link: “For Pub/Sub messages where advanced preload transformations or data processing before landing data in
BigQuery (such as masking PII) is necessary, we still recommend going through Dataflow.” It’s “analyze user activity data”, not merely streaming IoT
into BigQuery so that concerns like privacy are per se n/a. One can deal with PII after landing in BigQuery as well, but apparently that’s not what
they recommend.
upvoted 3 times
Selected Answer: D
Selected Answer: D
D. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery.
This solution involves using Google Cloud Pub/Sub as the messaging service to receive the data from the mobile application, and then using
Google Cloud Dataflow to transform and load the data into BigQuery in real time. Pub/Sub is a scalable and reliable messaging service that can
handle high-volume real-time data streaming, while Dataflow provides a unified programming model to develop and run data processing
pipelines. This solution is suitable for handling large volumes of user activity data from mobile applications and ingesting it into BigQuery in real-
time for analysis and ML experimentation.
upvoted 2 times
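A minimal Apache Beam sketch of option D; the topic, table, and schema are hypothetical placeholders.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical topic, table, and schema: a streaming pipeline that reads
# user-activity events from Pub/Sub and writes rows to BigQuery.
options = PipelineOptions(streaming=True)  # plus --runner=DataflowRunner etc.

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/activity")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:analytics.user_activity",
            schema="user_id:STRING,event:STRING,ts:TIMESTAMP",
        )
    )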
Selected Answer: A
A
agree with pshemol
upvoted 3 times
Dataflow is needed
upvoted 2 times
You work for a gaming company that manages a popular online multiplayer game where teams with 6 players play against each other in 5-minute
battles. There are many new players every day. You need to build a model that automatically assigns available players to teams in real time. User
research indicates that the game is more enjoyable when battles have players with similar skill levels. Which business metrics should you track to
B. Precision and recall of assigning players to teams based on their predicted versus actual ability
C. User engagement as measured by the number of battles played daily per user
D. Rate of return as measured by additional revenue generated minus the cost of developing a new model
Correct Answer: C
Selected Answer: C
The more enjoyable the game, the better; and "business metrics" points me to user engagement as the best metric
upvoted 8 times
Selected Answer: C
focusing on user engagement through the number of battles played daily provides a clearer indication of whether the model successfully creates
balanced and enjoyable matches, which is the core objective. If players find battles more engaging due to fairer competition, they're more likely to
keep playing. This can then translate to long-term benefits like increased retention and potential monetization opportunities.
upvoted 1 times
Selected Answer: C
Looking for "business metrics to track," I think C could be the most important metric. Although, option B is also a good choice.
upvoted 2 times
Selected Answer: C
The focus is to obtain a model that assigns players to teams with similar levels of skill (i.e., average team 1 skill == average team 2 skill).
A: A fast queue assignment may not focus on pairing players with the same skill levels. A random assignment would work.
B: This would be an option but is more difficult to measure than C; we don't know if we have a measure of skill level. Also, for new players this
metric would not be available at the beginning. I think "There are many new players every day." is a key point for discarding answer B.
C: Players playing more games daily means players enjoy the game more, and the other way round should also apply. Easy to measure, also for
new players.
I would go with C.
upvoted 2 times
Selected Answer: C
Selected Answer: C
Went with C
upvoted 1 times
Selected Answer: B
This is B, as it directly relates to our model's ability to predict player ability. There are many factors beyond our model which will impact user
engagement (e.g. whether the game is actually enjoyable) so it's not a good measurement of the model performance
upvoted 3 times
Answer C
upvoted 1 times
Selected Answer: C
The question is asking about "available players". Therefore, the business metric is the user engagement.
upvoted 4 times
Selected Answer: C
Asks for >business metric<, and problem states "user research indicates that the game is more enjoyable when battles have players with similar
skill levels.", which means more battles per user if your model is performing well.
upvoted 1 times
It's C. The question specifically asks for a business metric. Precision and recall are not business metrics, but user engagement is
upvoted 4 times
The question says teams are created based on "ability". From this, we can conclude that the system measures each player's skill, so nothing is better than comparing the predicted ability with the actual ability to understand the performance of the model.
upvoted 3 times
Selected Answer: B
These two metrics are the most relevant for measuring the performance of the model in assigning players to teams based on skill level. The
average wait time can indicate whether the model is making efficient and quick team assignments, while precision and recall can measure the
accuracy of the model's predictions. It's important to balance precision and recall since assigning players to a team with a large difference in skill
level could have a negative impact on the players' gaming experience.
C and D are also important metrics to track, but they may not be as directly tied to the performance of the team assignment model. User
engagement can indicate the success of the overall gaming experience, but it can be influenced by other factors beyond team assignments. The
rate of return is also an important metric, but it may not be a direct measure of the success of the team assignment model.
upvoted 4 times
Selected Answer: B
To measure the performance of a model that assigns available players to teams in real time, the business metrics that should be tracked should
reflect the ability of the model to effectively balance the skill levels of players in battles. Therefore, the best answer is B, precision and recall of
assigning players to teams based on their predicted versus actual ability.
upvoted 3 times
You are building an ML model to predict trends in the stock market based on a wide range of factors. While exploring the data, you notice that
some features have a large range. You want to ensure that the features with the largest magnitude don’t overfit the model. What should you do?
A. Apply a logarithmic transformation to each feature.
B. Apply a principal component analysis (PCA) to minimize the effect of any particular feature.
C. Use a binning strategy to replace the magnitude of each feature with the appropriate bin number.
D. Normalize the data by scaling it to have values between 0 and 1.
Correct Answer: D
Selected Answer: D
D. Normalize the data by scaling it to have values between 0 and 1 (min-max scaling): this technique ensures all features contribute proportionally to the model's learning process. It prevents features with a larger magnitude from dominating the model and reduces the risk of overfitting.
upvoted 2 times
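A minimal scikit-learn sketch of the min-max scaling described above; the feature values are invented for illustration.

```python
# Min-max scaling: rescales each feature to [0, 1] so large-magnitude
# features cannot dominate training. Values are invented.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1_000_000.0, 0.5],
              [2_500_000.0, 0.1],
              [4_000_000.0, 0.9]])

X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled)  # each column now spans exactly [0, 1]
```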
B. Apply a principal component analysis (PCA) to minimize the effect of any particular feature: PCA is a dimensionality reduction technique that
can be useful, but its primary function is to reduce the number of features, not specifically address differences in feature scales.
C. Use a binning strategy to replace the magnitude of each feature with the appropriate bin number: Binning can introduce information loss and might not capture the nuances within each bin, potentially affecting the model's accuracy.
upvoted 1 times
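For contrast, a short sketch of what PCA (option B) actually does: it reduces the number of features rather than rescaling them, which is why the comment above rules it out. The random data is purely illustrative.

```python
# PCA reduces dimensionality; it does not equalise feature scales.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).random((100, 5))  # 100 samples, 5 features
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)  # (100, 2): fewer features, not rescaled ones
```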
Selected Answer: D
Not A because a logarithmic transformation may be appropriate for data with a skewed distribution, but it doesn't necessarily address the issue of
features having different scales.
upvoted 4 times
Features with a larger magnitude might still dominate after a log transformation if the range of values is significantly different from other features.
Scaling is better, will go with Option D
upvoted 1 times
Selected Answer: D
The correct answer is D. Min-max scaling will render all variables comparable by bringing them to a common ground.
Selected Answer: D
Selected Answer: A
From my point of view, a log transformation is more tolerant of outliers, so I went with A.
upvoted 1 times
Selected Answer: A
See https://developers.google.com/machine-learning/data-prep/transform/normalization
upvoted 2 times
Selected Answer: A
A is the better option because a log transform is used when we want a heavily skewed feature to be transformed into something as close to a normal distribution as possible. Normalizing data with a min-max scaler does not work well with many outliers and is prone to unexpected behaviour if values in the test set fall outside the training range. Log scaling is a less common alternative to plain scaling.
upvoted 1 times
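For comparison, a minimal NumPy sketch of the log scaling that option A proposes; np.log1p compresses a heavily right-skewed feature and handles zeros, though not negative values. The values are invented.

```python
# Log scaling compresses a heavily right-skewed feature.
import numpy as np

skewed = np.array([1.0, 10.0, 100.0, 10_000.0])
log_scaled = np.log1p(skewed)  # approx. [0.69, 2.40, 4.62, 9.21]
print(log_scaled)
```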
Selected Answer: D
The challenge is the "scale" (significant variations in magnitude and spread): https://stats.stackexchange.com/questions/462380/does-data-normalization-reduce-over-fitting-when-training-a-model. Normalization is apparently widely used for stock data anyway: https://itadviser.dev/stock-market-data-normalization-for-time-series/.
upvoted 1 times
Selected Answer: D
The question doesn't talk about the skewness within each feature; it talks about normalizing the effect of features with a large range. Scaling each feature to the (0, 1) range solves that problem.
upvoted 1 times
Selected Answer: C
I think C could be a better choice: by bucketizing the data we can fix the distribution problem bin by bin.
For option A, a log transformation may not be effective if the data range spans both negative and positive values.
For option D, normalization does not resolve the skew problem; data normalization assumes the data follows something like a normal distribution.
https://medium.com/analytics-vidhya/data-transformation-for-numeric-features-fb16757382c0
upvoted 3 times
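A small sketch of the binning idea (option C) using scikit-learn's KBinsDiscretizer; the five-bin quantile setup and the values are assumptions for illustration.

```python
# Binning replaces raw magnitudes with bin indices.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[1.0], [10.0], [100.0], [1_000.0], [10_000.0]])
binner = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
print(binner.fit_transform(X).ravel())  # bin numbers instead of magnitudes
```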
Standardization and normalization are common techniques to preprocess the data to be more suitable for machine learning models. Normalization
scales the data to be within a specific range (commonly between 0 and 1 or -1 and 1), which can help prevent features with large magnitudes from
dominating the model. This approach is especially useful when using models that are sensitive to the magnitude of features, such as distance-
based models or neural networks.
upvoted 1 times
https://developers.google.com/machine-learning/data-prep/transform/normalization#log-scaling
upvoted 1 times
You work for a biotech startup that is experimenting with deep learning ML models based on properties of biological organisms. Your team
frequently works on early-stage experiments with new architectures of ML models, and writes custom TensorFlow ops in C++. You train your
models on large datasets and large batch sizes. Your typical batch size has 1024 examples, and each example is about 1 MB in size. The average
size of a network with all weights and embeddings is 20 GB. What hardware should you choose for your models?
A. A cluster with 2 n1-highcpu-64 machines, each with 8 NVIDIA Tesla V100 GPUs (128 GB GPU memory in total), and an n1-highcpu-64
B. A cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU memory in total), 96 vCPUs, and 1.4 TB
RAM
Correct Answer: B
Selected Answer: D
D: use CPUs for models that contain many custom TensorFlow operations written in C++
https://cloud.google.com/tpu/docs/intro-to-tpu#cpus
upvoted 5 times
Selected Answer: D
B looks like unleashing a rocket launcher to swat a fly ("early-stage experiments"). D is enough (c++).
upvoted 2 times
“writes custom TensorFlow ops in C++” -> use CPUs when “Models that contain many custom TensorFlow operations written in C++”:
https://cloud.google.com/tpu/docs/intro-to-tpu#when_to_use_tpus
upvoted 2 times
Selected Answer: B
The best hardware for your models would be a cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU
memory in total), 96 vCPUs, and 1.4 TB RAM.
High GPU memory: Each A100 GPU has 40 GB of memory, which is more than enough to store the weights and embeddings of your models.
Large batch sizes: With 16 GPUs per machine, you can train your models with large batch sizes, which will improve training speed.
Fast CPUs: The 96 vCPUs on each machine will provide the processing power you need to run your custom TensorFlow ops in C++.
Adequate RAM: The 1.4 TB of RAM on each machine will ensure that your models have enough memory to train and run.
The other options are not as suitable for your needs. Option A has less GPU memory, which will slow down training. Option C has a TPU, which is a good option for some deep learning tasks, but it is not as well suited to custom C++ ops as a GPU cluster. Option D has more vCPUs and RAM, but it does not have enough GPU memory to train your models.
Therefore, the best hardware for your models is a cluster with 2 a2-megagpu-16g machines.
upvoted 3 times
To determine the appropriate hardware for training the models, we need to calculate the required memory and processing power based on the size of the model and the size of the input data.
Given that the batch size is 1024 and each example is 1 MB, the total size of each batch is 1024 * 1 MB = 1024 MB = 1 GB. Therefore, we need to
load 1 GB of data into memory for each batch.
The total size of the network is 20 GB, which fits in the memory of a single A100 (40 GB) but not a single V100 (16 GB in option A's configuration).
upvoted 3 times
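The arithmetic in the comment above as a tiny sanity check, using only the numbers given in the question:

```python
# Back-of-the-envelope memory estimate from the question's numbers.
batch_size = 1024   # examples per batch
example_mb = 1      # MB per example
model_gb = 20       # weights + embeddings

batch_gb = batch_size * example_mb / 1024
print(f"data per batch: {batch_gb:.1f} GB, model: {model_gb} GB")
# -> 1.0 GB of data per batch; the 20 GB model is the binding constraint
```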
Selected Answer: D
It's D
upvoted 1 times
D
CPUs are recommended for TensorFlow ops written in C++
- https://cloud.google.com/tpu/docs/tensorflow-ops (Cloud TPU only supports Python)
upvoted 2 times
You are an ML engineer at an ecommerce company and have been tasked with building a model that predicts how much inventory the logistics
team should order each month. Which approach should you take?
A. Use a clustering algorithm to group popular items together. Give the list to the logistics team so they can increase inventory of the popular
items.
B. Use a regression model to predict how much additional inventory should be purchased each month. Give the results to the logistics team at
the beginning of the month so they can increase inventory by the amount predicted by the model.
C. Use a time series forecasting model to predict each item's monthly sales. Give the results to the logistics team so they can base inventory on the amount predicted by the model.
D. Use a classification model to classify inventory levels as UNDER_STOCKED, OVER_STOCKED, and CORRECTLY_STOCKED. Give the report to the logistics team.
Correct Answer: B
Selected Answer: C
This type of model is well-suited to predicting inventory levels because it can take into account trends and patterns in the data over time, such as
seasonal fluctuations in demand or changes in customer behavior.
upvoted 8 times
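As a sketch of this approach, one forecasting model per item using statsmodels ARIMA; the item IDs, sales figures, and the (1, 1, 1) order are hypothetical.

```python
# One forecasting model per item; all numbers are hypothetical.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

monthly_sales = {  # item -> last 12 months of unit sales
    "item_a": [120, 130, 125, 140, 150, 145, 160, 170, 165, 180, 190, 185],
    "item_b": [40, 42, 41, 45, 44, 46, 48, 50, 49, 52, 51, 53],
}

forecasts = {}
for item, sales in monthly_sales.items():
    fitted = ARIMA(pd.Series(sales), order=(1, 1, 1)).fit()
    forecasts[item] = float(fitted.forecast(steps=1).iloc[0])  # next month

print(forecasts)  # suggested order quantities for the logistics team
```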
Selected Answer: C
https://cloud.google.com/learn/what-is-time-series
"For example, a large retail store may have millions of items to forecast so that inventory is available when demand is high, and not overstocked
when demand is low."
upvoted 1 times
Answer C
upvoted 1 times
Selected Answer: C
C (by experience)
Use a time series forecasting model to predict each item's monthly sales. Give the results to the logistics team so they can base inventory on the
amount predicted by the model.
upvoted 2 times
You are building a TensorFlow model for a financial institution that predicts the impact of consumer spending on inflation globally. Due to the size
and nature of the data, your model is long-running across all types of hardware, and you have built frequent checkpointing into the training
process. Your organization has asked you to minimize cost. What hardware should you choose?
A. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with 4 NVIDIA P100 GPUs
B. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with an NVIDIA P100 GPU
C. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a non-preemptible v3-8 TPU
D. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a preemptible v3-8 TPU
Correct Answer: B
Selected Answer: D
D
you have built frequent checkpointing into the training process / minimize cost -> preemptible
upvoted 7 times
Selected Answer: D
Preemptible v3-8 TPUs are the most cost-effective option for training large TensorFlow models. They are up to 80% cheaper than non-preemptible
v3-8 TPUs, and they are only preempted if Google Cloud needs the resources for other workloads.
In this case, the model is long-running and checkpointing is used. This means that the training process can be interrupted and resumed without losing progress. Therefore, preemptible TPUs are a safe choice: even if the TPU is preempted, training resumes from the last checkpoint.
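The checkpointing argument can be made concrete with Keras's BackupAndRestore callback, which lets a run resume after a preemption; the bucket path and toy model below are hypothetical.

```python
# Resumable training on preemptible hardware via BackupAndRestore.
import numpy as np
import tensorflow as tf

# Toy data standing in for the real training set.
x_train = np.random.rand(256, 10).astype("float32")
y_train = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Saves training state each epoch; if the preemptible TPU/VM is reclaimed,
# rerunning this script resumes from the last completed epoch.
backup = tf.keras.callbacks.BackupAndRestore(backup_dir="gs://my-bucket/backup")
model.fit(x_train, y_train, epochs=100, callbacks=[backup])
```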
Selected Answer: D
Answer D
upvoted 1 times