Professional Machine Learning Engineer-Part1 | PDF | Machine Learning | Apache Spark

Professional Machine Learning Engineer-Part1

The document contains a series of questions and answers related to the Professional Machine Learning Engineer Exam, focusing on various machine learning scenarios and solutions. Each question presents a specific problem, options for answers, and community feedback on the correctness of those answers. The document serves as a resource for exam preparation, featuring expert-verified content and user discussions.

Uploaded by

Aline Conti
Copyright
© All Rights Reserved

29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

- Expert Verified, Online, Free.


https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 1/505

Topic 1 - Single Topic

Question #1 Topic 1

You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests. You want to store

the results for analytics and visualization. How should you configure the pipeline?

A. 1 = Dataflow, 2 = AI Platform, 3 = BigQuery

B. 1 = DataProc, 2 = AutoML, 3 = Cloud Bigtable

C. 1 = BigQuery, 2 = AutoML, 3 = Cloud Functions

D. 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage

Correct Answer: C

Reference:

https://cloud.google.com/solutions/building-anomaly-detection-dataflow-bigqueryml-dlp

Community vote distribution


A (100%)

  esuaaaa Highly Voted  2 years, 11 months ago

Definitely A. Dataflow is a must.


upvoted 19 times
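As a sanity check on why A fits, here is a toy, pure-Python sketch of the pipeline shape: Pub/Sub messages in, a Dataflow-style windowed transform, a stand-in for AI Platform scoring, and rows destined for a BigQuery results table. The message format, window size, and z-score "model" are all illustrative assumptions, not real GCP client code:

```python
# Illustrative simulation of pipeline A: Pub/Sub -> Dataflow -> AI Platform -> BigQuery.
from statistics import mean, stdev

def score_anomaly(window):
    """Stand-in for an AI Platform model: z-score of the latest reading."""
    mu, sigma = mean(window[:-1]), stdev(window[:-1])
    return abs(window[-1] - mu) / sigma if sigma else 0.0

def run_pipeline(sensor_messages, threshold=3.0, window_size=5):
    """Stand-in for the Dataflow job: window the stream, score each
    window, and emit rows destined for a BigQuery results table."""
    rows = []
    for i in range(window_size, len(sensor_messages) + 1):
        window = sensor_messages[i - window_size:i]
        z = score_anomaly(window)
        rows.append({"reading": window[-1],
                     "score": round(z, 2),
                     "is_anomaly": z > threshold})
    return rows

rows = run_pipeline([10, 11, 10, 11, 10, 11, 50])
print(any(r["is_anomaly"] for r in rows))  # True: the 50 reading stands out
```

A real implementation would use the Apache Beam SDK on Dataflow with a Pub/Sub source and a BigQuery sink; the point here is only the division of labor between the three stages.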

  inder0007 Highly Voted  2 years, 11 months ago

Even if I follow the link, it should be Dataflow, AI Platform, and BigQuery.

The real answer should be A.
upvoted 9 times

  RPS007 Most Recent  4 days, 8 hours ago

Selected Answer: A

Verified Answer
upvoted 1 times

  Shreeti_Saha 3 weeks, 3 days ago


Option A
upvoted 1 times

  fragkris 5 months ago

Selected Answer: A

A - Dataflow is the only correct option for this case.


upvoted 1 times

  RangasamyArran 5 months, 4 weeks ago


AutoML is useful for labeled data, so either A or D. Dataflow is a must for the pipeline, so A is correct.
upvoted 1 times

  LMDY 7 months ago


Selected Answer: A

A. Definitely it's the correct answer


upvoted 1 times

  deepakno 7 months, 4 weeks ago


This similar anomaly-detection use case also points to A:
https://cloud.google.com/blog/products/data-analytics/anomaly-detection-using-streaming-analytics-and-ai
upvoted 1 times

  Element26 8 months, 1 week ago


The answers discussed here were considered wrong when I took the mock test of GCP PMLE. Can anyone advise whether to go with the ones given by ExamTopics?
upvoted 1 times

  sachinbhavar 8 months, 3 weeks ago


The correct answer is D. 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage.
upvoted 1 times



  Remi2021 9 months, 2 weeks ago


Right answer is A. Dataflow is a must.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 1 times

  Puneet2022 1 year ago

Selected Answer: A

Dataflow is required
upvoted 1 times

  Puneet2022 1 year ago


A, data flow is required
upvoted 1 times

  encasfer 1 year, 1 month ago


Selected Answer: A

Definitely A.
upvoted 1 times

  f828ba8 1 year, 2 months ago


Selected Answer: A

Definitely A.
upvoted 1 times


Question #2 Topic 1

Your organization wants to make its internal shuttle service route more efficient. The shuttles currently stop at all pick-up points across the city

every 30 minutes between 7 am and 10 am. The development team has already built an application on Google Kubernetes Engine that requires

users to confirm their presence and shuttle station one day in advance. What approach should you take?

A. 1. Build a tree-based regression model that predicts how many passengers will be picked up at each shuttle station. 2. Dispatch an

appropriately sized shuttle and provide the map with the required stops based on the prediction.

B. 1. Build a tree-based classification model that predicts whether the shuttle should pick up passengers at each shuttle station. 2. Dispatch

an available shuttle and provide the map with the required stops based on the prediction.

C. 1. Define the optimal route as the shortest route that passes by all shuttle stations with confirmed attendance at the given time under

capacity constraints. 2. Dispatch an appropriately sized shuttle and indicate the required stops on the map.

D. 1. Build a reinforcement learning model with tree-based classification models that predict the presence of passengers at shuttle stops as

agents and a reward function around a distance-based metric. 2. Dispatch an appropriately sized shuttle and provide the map with the

required stops based on the simulated outcome.

Correct Answer: A

Community vote distribution


C (93%) 7%

  nissili Highly Voted  2 years, 10 months ago

C: for all confirmed.


upvoted 23 times

  sensev 2 years, 9 months ago


I agree with this, because it mentioned that they now "require users to confirm their presence". I think this is an example of when a classical routing algorithm is a better fit compared to an ML approach.
upvoted 11 times
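Since attendance is confirmed a day in advance, option C reduces to classical route planning rather than prediction. A minimal sketch under assumed coordinates, fleet sizes, and a greedy nearest-neighbour heuristic (a real dispatcher would use a proper vehicle-routing solver):

```python
# Option C as plain route planning: confirmed stops in, ordered route and shuttle size out.
from math import dist

def plan_route(depot, confirmed_stops):
    """Greedy nearest-neighbour tour over stops with confirmed riders."""
    remaining = dict(confirmed_stops)  # stop name -> (x, y)
    route, here = [], depot
    while remaining:
        nxt = min(remaining, key=lambda s: dist(here, remaining[s]))
        route.append(nxt)
        here = remaining.pop(nxt)
    return route

def pick_shuttle(confirmed_riders, fleet=(8, 16, 32)):
    """Smallest shuttle in the fleet that fits all confirmed riders."""
    return min(size for size in fleet if size >= confirmed_riders)

stops = {"A": (1, 0), "B": (5, 0), "C": (2, 1)}
print(plan_route((0, 0), stops))  # ['A', 'C', 'B']
print(pick_shuttle(11))           # 16
```

No model training is involved: the inputs (who is riding, and where) are exact, which is exactly why the ML options A, B, and D add complexity without benefit.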

  fragkris Most Recent  5 months ago

Selected Answer: C

C - Since we have the attendance list in advance, tree-based classification, regression, and reinforcement learning are all unnecessary in this case.
upvoted 2 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: C

you do not need to predict how many people will be at each station as the requirement mentions they have to register a day in advance
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  n_shanthi 1 year ago


I think it should be C. I can easily eliminate D; this is not a case for reinforcement learning. Moreover, it seems like a route optimization problem rather than finding the best-sized shuttle as in A, or deciding whether the shuttle should stop at a point as in B.
upvoted 1 times

  asava 1 year, 1 month ago


Selected Answer: C

This is a route optimization problem


upvoted 1 times

  will7722 1 year, 1 month ago


why is C the wrong answer? this is a machine learning test, C has nothing to do with prediction
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: C

No need to predict the presences since they are already confirmed, best thing we can do is optimize the route
upvoted 3 times

  abhi0706 1 year, 6 months ago


C. route more efficient is an optimization model


upvoted 1 times

  GCP72 1 year, 8 months ago


Selected Answer: C

C looks correct to me


upvoted 1 times

  Dr_Ethan 1 year, 8 months ago


Confirmed C
upvoted 1 times

  enghabeth 1 year, 9 months ago


Selected Answer: C

C. route more efficient is an optimization model


upvoted 2 times

  David_ml 1 year, 11 months ago

Selected Answer: C

Answer is C
upvoted 1 times

  David_ml 2 years ago

Selected Answer: C

correct answer is C
upvoted 1 times

  baimus 2 years, 1 month ago


Answer is C. This is a case where machine learning would be terrible, as it would not be 100% accurate and some passengers would not get picked up. A simple algorithm works better here, and the question confirms passengers will indicate in advance when they will be at the stop, so no ML is required.
upvoted 1 times

  giaZ 2 years, 2 months ago


C. Why would you want to predict anything here? The info on how many passengers will be and at which stations are already given by passengers
themselves.
upvoted 1 times

  sid515 2 years, 3 months ago

Selected Answer: C

It needs to be C. No use of ML here


upvoted 1 times


Question #3 Topic 1

You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that

less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of

them converge. How should you resolve the class imbalance problem?

A. Use the class distribution to generate 10% positive examples.

B. Use a convolutional neural network with max pooling and softmax activation.

C. Downsample the data with upweighting to create a sample with 10% positive examples.

D. Remove negative examples until the numbers of positive and negative examples are equal.

Correct Answer: B

Reference:

https://towardsdatascience.com/convolution-neural-networks-a-beginners-guide-implementing-a-mnist-hand-written-digit-8aa60330d022

Community vote distribution


C (92%) 8%

  celia20200410 Highly Voted  2 years, 9 months ago

ANS: C
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting

- less than 1% of the readings are positive


- none of them converge.

Downsampling (in this context) means training on a disproportionately low subset of the majority class examples.
upvoted 29 times

  mousseUwU 2 years, 6 months ago


Agree, C is correct
upvoted 1 times
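The downsample-with-upweight recipe from the guide above can be sketched in a few lines of Python. The ~1% positive rate, the downsampling factor of 11, and the weighting scheme are illustrative assumptions:

```python
# Option C: downsample the majority (negative) class, upweight the kept negatives.
import random

def downsample_with_upweight(examples, factor):
    """Keep every positive, keep ~1/factor of the negatives, and give
    each kept negative a weight of `factor` so the expected loss
    contribution of the negative class is unchanged."""
    out = []
    for x, label in examples:
        if label == 1:
            out.append((x, label, 1.0))            # keep all failure examples
        elif random.random() < 1.0 / factor:
            out.append((x, label, float(factor)))  # kept negative, upweighted
    return out

random.seed(0)
data = [(i, 1) for i in range(10)] + [(i, 0) for i in range(990)]  # ~1% positive
sample = downsample_with_upweight(data, factor=11)
positives = sum(1 for _, y, _ in sample if y == 1)
print(positives, len(sample))  # ~10 positives out of ~100 kept examples
```

The weight column would be passed to the training loss (e.g. `sample_weight` in most frameworks) so the rebalanced sample does not bias the model's calibration.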

  MisterHairy Highly Voted  2 years, 4 months ago


=New Question3=
You are going to train a DNN regression model with Keras APIs using this code:

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(
    256,
    use_bias=True,
    activation='relu',
    kernel_initializer=None,
    kernel_regularizer=None,
    input_shape=(500,)))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(
    128, use_bias=True,
    activation='relu',
    kernel_initializer='uniform',
    kernel_regularizer='l2'))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(
    2, use_bias=False,
    activation='softmax'))
model.compile(loss='mse')

How many trainable weights does your model have? (The arithmetic below is correct.)

A. 501*256+257*128+2 = 161154
B. 500*256+256*128+128*2 = 161024
C. 501*256+257*128+128*2 = 161408
D. 500*256*0.25+256*128*0.25+128*2 = 40448
upvoted 8 times

  tooooony55 2 years, 3 months ago


B: Dense layers have 100% trainable weights; the dropout rate of 0.25 randomly drops 25% of activations for regularization's sake, but you still train 100% of the weights.
upvoted 1 times

  AlexZot 2 years, 3 months ago


Correct answer is C. Do not forget the bias term, which is also a trainable parameter.

upvoted 3 times

  sakura65 2 years ago


Why 128 for the last layer is correct and not 129 X 2?
upvoted 1 times

  suresh_vn 1 year, 8 months ago


because of use_bias = False
upvoted 1 times

  suresh_vn 1 year, 8 months ago


C is correct. 2nd Layer with use_bias = True
upvoted 3 times

  NickHapton 2 years, 4 months ago


Why do you post new questions in every existing question rather than post them as a new question?
upvoted 4 times

  MisterHairy 2 years, 4 months ago


Only moderator can post new questions. Thus, I am left with this format. I have emailed the additional questions to the moderator, but
he/she has not added them to the site. These questions were received off of other practice tests, but answers were not provided.
upvoted 4 times

  MisterHairy 2 years, 4 months ago


Answer?
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


D , is the only option that takes care of the dropout factor
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


My bad, this was tricky: "The Dropout layer randomly disables neurons during training. They are still present in your model and therefore aren't discounted from the number of parameters in your model summary." So D is wrong; C and A both take care of the bias, but C is correct.
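For reference, the trainable-weight count for New Question3 can be checked with simple arithmetic: a Dense layer has (inputs + 1) * units parameters with a bias and inputs * units without one, and Dropout adds no weights. A quick Python check, with layer sizes taken from the code above:

```python
# Worked parameter count for the Keras model in New Question3 (answer C).
def dense_params(inputs, units, use_bias=True):
    """Weights in a Dense layer: one per input-unit pair, plus one bias per unit."""
    return (inputs + (1 if use_bias else 0)) * units

layer1 = dense_params(500, 256, use_bias=True)   # 501 * 256 = 128256
layer2 = dense_params(256, 128, use_bias=True)   # 257 * 128 = 32896
layer3 = dense_params(128, 2, use_bias=False)    # 128 * 2   = 256
print(layer1 + layer2 + layer3)  # 161408 -> option C
```

Dropout layers randomly zero activations during training but own no parameters, which is why options D-style discounts are wrong.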

  fragkris Most Recent  5 months ago

Selected Answer: C

C - Downsample the majority and add weights to it.


upvoted 2 times

  tatpicc 5 months, 2 weeks ago


Max Pooling is a pooling operation that calculates the maximum value for patches of a feature map, and uses it to create a downsampled (pooled)
feature map. It is usually used after a convolutional layer.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  Puneet2022 1 year ago

Selected Answer: C

https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: C

https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
upvoted 1 times

  Fatiy 1 year, 2 months ago

Selected Answer: C

C. Downsample the data with upweighting to create a sample with 10% positive examples.

Dealing with class imbalance can be challenging for machine learning models. A common approach is to downsample the majority (negative) class so the sample better represents both classes, and to upweight the kept negatives in the loss function so that their overall contribution is preserved. This helps the model focus on the rare positive examples and improves its ability to classify failure incidents.
upvoted 2 times


  SharathSH 1 year, 4 months ago


The answer would obviously be C.
As the dataset is imbalanced, and you need to resolve this issue to obtain the desired result, the best approach is to downsample the data (with upweighting).
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: C

Best practice for imbalanced dataset is to downsample with upweight


https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: C

Correct answer is "C"


upvoted 1 times

  enghabeth 1 year, 9 months ago

Selected Answer: C

C. because regardless of the model you use, you should always try to transform or adapt your dataset so that it is more balanced
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: C

https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: D

https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


sorry , mean C
upvoted 1 times

  wences 2 years, 3 months ago


agree on C
upvoted 1 times

  irumata 2 years, 3 months ago

Selected Answer: C

we need to balance and easiest way to downsample


upvoted 1 times


Question #4 Topic 1

You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but

your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax.

You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and

processing requirements?

A. Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.

B. Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.

C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries

from BigQuery for machine learning.

D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and

then write the transformations to a new table.

Correct Answer: B

Community vote distribution


D (71%) B (21%) 8%

  nunzio144 Highly Voted  2 years, 9 months ago

It should be D .... Data Fusion is not SQL syntax ....


upvoted 19 times

  q4exam 2 years, 7 months ago


Agree, BQ is the only serverless that support SQL
upvoted 4 times

  A4M 2 years, 3 months ago


Needs to be D as the most suitable answer given the req's in question Datafusion is more of a no code Data transformation tool
upvoted 1 times
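To illustrate option D, here is roughly how a PySpark aggregation maps onto a BigQuery SQL transformation written to a new table. The dataset, table, and column names (`analytics.raw_events`, `user_id`, `amount`) are made up for illustration; in practice you would load the Cloud Storage files with BigQuery Load first and run the query with `bq query` or a client library:

```python
# Sketch of option D: a PySpark transform rewritten as serverless BigQuery SQL.
# PySpark version (what the slow 12-hour pipeline did):
#   df.groupBy("user_id").agg(F.sum("amount").alias("total_amount"))

BQ_TRANSFORM = """
CREATE OR REPLACE TABLE analytics.user_totals AS
SELECT user_id, SUM(amount) AS total_amount
FROM analytics.raw_events
GROUP BY user_id
"""

print(BQ_TRANSFORM.strip().splitlines()[0])  # the transform writes to a new table
```

Because BigQuery is serverless, there is no cluster to size or manage, which is the speed-of-development requirement that rules out Dataproc (B).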

  Celia20210714 Highly Voted  2 years, 9 months ago

ANS: A
https://cloud.google.com/data-fusion#section-1
- Data Fusion is a serverless approach leveraging the scalability and reliability of Google services like Dataproc, meaning Data Fusion offers the best of data integration capabilities with a lower total cost of ownership.
- BigQuery is serverless and supports SQL.
- Dataproc is not serverless, you have to manage clusters.
- Cloud SQL is not serverless, you have to manage instances.
upvoted 11 times

  q4exam 2 years, 7 months ago


Data Fusion is not serverless; it creates Dataproc clusters to execute the job. I think the answer is C.
upvoted 1 times

  mousseUwU 2 years, 6 months ago


Data Fusion is serverless: https://cloud.google.com/data-fusion#all-features
upvoted 3 times

  tavva_prudhvi 1 year, 1 month ago


I think you're only viewing the sentence "A serverless approach leveraging the scalability and reliability of Google services like Dataproc
means Data Fusion offers the best of data integration capabilities with a lower total cost of ownership", The sentence implies that Data
Fusion leverages a serverless approach, but it does not explicitly state that Data Fusion itself is serverless. It states that Data Fusion offers
the best of data integration capabilities by using a serverless approach that leverages the scalability and reliability of Google services like
Dataproc. So, while Data Fusion may not be fully serverless, it is designed to take advantage of serverless capabilities through its
integration with Google services.
upvoted 2 times

  mousseUwU 2 years, 6 months ago


Agree, A is correct
upvoted 2 times

  fragkris Most Recent  5 months ago

Selected Answer: D

D - BigQuery is the only serverless and SQL-syntax option.


upvoted 1 times


  Sum_Sum 5 months, 2 weeks ago

Selected Answer: D

D - as BQ is serverless and supports SQL


none of the other options match both criteria
upvoted 1 times

  12112 9 months, 3 weeks ago


Selected Answer: D

I'll go with D.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: D

Went with D
upvoted 3 times

  asava 1 year, 1 month ago

Selected Answer: B

BQ is the serverless solution


upvoted 2 times

  mellowed 1 year, 3 months ago


Correct option is D
upvoted 1 times

  ssaporylo 1 year, 3 months ago


Vote D
upvoted 1 times

  ares81 1 year, 3 months ago


Selected Answer: A

It should be A.
upvoted 1 times

  EFIGO 1 year, 5 months ago


Selected Answer: D

Data Fusion is not in SQL syntax, so no A;


Dataproc is not serverless, so no B;
Passing through Cloud SQL is useless, just go with BigQuery, so no C;
D is correct
upvoted 3 times

  abhi0706 1 year, 6 months ago


C and D could both be implemented and would work, but D is faster to implement.
upvoted 1 times

  GCP72 1 year, 8 months ago


Selected Answer: D

Correct answer is "D"


upvoted 1 times

  alejo_1053 1 year, 9 months ago


Selected Answer: B

I was thinking B, but now I'm kind of confused that nobody voted it
upvoted 3 times

  Mohamed_Mossad 1 year, 10 months ago


Selected Answer: D

"write sql syntax" will drop A , as datafusion is drag drop tool


"serveless' will drop B as dataproc is not serverless
answer will be in C,D booth can be implemented as will work
but D is faster for implementation
upvoted 2 times

  David_ml 2 years ago

Selected Answer: D

D is correct
upvoted 2 times

  morgan62 2 years ago


I think the answer is B...


But feeling very frustrated after seeing no one choosing B as the answer.
upvoted 2 times

  David_ml 1 year, 11 months ago


For these questions, you have to choose the answer that requires the least effort. Yes, B is a viable option since you can set up Dataproc to be serverless. However, D is the right answer since it requires the least effort and time.
upvoted 2 times


Question #5 Topic 1

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to

administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras,

PyTorch, theano, Scikit-learn, and custom libraries. What should you do?

A. Use the AI Platform custom containers feature to receive training jobs using any framework.

B. Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TF Job.

C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.

D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

Correct Answer: D

Community vote distribution


A (93%) 7%

  gcp2021go Highly Voted  2 years, 10 months ago

the answer is A
upvoted 23 times

  guruguru Highly Voted  2 years, 9 months ago

A, because AI Platform supports all the frameworks mentioned, and Kubeflow is not a managed service in GCP. https://cloud.google.com/ai-platform/training/docs/getting-started-pytorch
upvoted 9 times

  fragkris Most Recent  5 months ago

Selected Answer: A

Chose A
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: A

A is the only Google managed service solution


B,C - are not managed
D- is a 3rd party
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 2 times

  Antmal 1 year, 3 months ago


The answer must be D, as nowhere in the question has GCP been mentioned. https://aadityachapagain.com/2020/09/distributed-training-with-slurm-on-gcp/
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


D is incorrect; this is even further from a managed-service-based solution.
upvoted 2 times

  Antmal 1 year, 3 months ago


The answer is D, as nowhere in the question has GCP been mentioned. https://aadityachapagain.com/2020/09/distributed-training-with-slurm-on-gcp/
upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: A

It's A
upvoted 2 times

  Moulichintakunta 1 year, 4 months ago


Selected Answer: D

Here the question is about workload management, not about supported frameworks; Slurm is a managed solution for workloads.
upvoted 1 times


  EFIGO 1 year, 5 months ago

Selected Answer: A

Now it's Vertex AI (instead of AI Platform), but it's the best solution, no need to do anything more complicated
upvoted 3 times

  abhi0706 1 year, 6 months ago


A - Vertex AI now
upvoted 3 times

  GCP72 1 year, 8 months ago

Selected Answer: A

Correct answer is "A"


upvoted 1 times

  caohieu04 2 years, 2 months ago


Selected Answer: A

A is correct
upvoted 2 times

  vinit1101 2 years, 3 months ago


Selected Answer: A

the answer is A
upvoted 2 times

  NamitSehgal 2 years, 3 months ago


A. AI Platform should be correct.
You can build pipelines in Airflow, Kubeflow, or Dataflow, but they need to be managed via AI Platform or Vertex AI.
upvoted 2 times

  MisterHairy 2 years, 4 months ago


=New Question5=
You are building a linear model with over 100 input features, all with values between -1 and 1. You suspect that many features are non-informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which technique should you use?

A. Use Principal Component Analysis to eliminate the least informative features.


B. Use L1 regularization to reduce the coefficients of uninformative features to 0.
C. After building your model, use Shapley values to determine which features are the most informative.
D. Use an iterative dropout technique to identify which features do not degrade the model when removed
upvoted 2 times

  wviv 2 years, 3 months ago


B - L1 regularization is useful for selective reduction to zero
upvoted 7 times

  wences 2 years, 2 months ago


Ans is C
upvoted 1 times

  MisterHairy 2 years, 4 months ago


Answer?
upvoted 1 times

  [Removed] 2 years, 4 months ago


B
"So we can use L1 regularization to encourage many of the uninformative coefficients in our model to be exactly 0,"
https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
upvoted 7 times

  lordcenzin 2 years, 2 months ago


totally agree with you
upvoted 1 times

  giaZ 2 years, 1 month ago


The best answer is B for how the question is posed. Sampled Shapley is a method for evaluating feature attributions, but it's recommended especially for non-differentiable models:
https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview#sampled-shapley
Plus, you'd need to build the model first, calculate those attributions, remove the features with low attributions, and re-train the model with fewer features.
upvoted 2 times
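A small sketch of why B works for New Question5: the proximal update for an L1 penalty is the soft-threshold operator, which sets small coefficients to exactly 0 while leaving large (informative) ones only slightly shrunk. The coefficient values and the penalty strength below are illustrative:

```python
# L1 regularization's soft-threshold operator: small weights snap to exactly 0.
def soft_threshold(w, lam):
    """Proximal operator of lam * |w| applied to a single weight."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

coeffs = {"informative": 0.80, "weak": 0.03, "noise": -0.04}
sparse = {k: soft_threshold(w, lam=0.1) for k, w in coeffs.items()}
print(sparse)  # uninformative features become exactly 0.0
```

L2 regularization, by contrast, only shrinks weights toward 0 without ever reaching it, which is why L1 (not L2, PCA, or post-hoc Shapley analysis) directly performs the requested feature removal while keeping the surviving features in their original form.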

  ashii007 2 years, 4 months ago


Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It provides no native support for machine learning frameworks.
Why is D still marked as the correct answer, when it is clearly wrong and the other comments are in agreement?


upvoted 1 times


Question #6 Topic 1

You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to

classify whether an image contains your company's product. Expecting the release of new products in the near future, you configured a retraining

functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform's continuous evaluation service to

ensure that the models have high accuracy on your test dataset. What should you do?

A. Keep the original test dataset unchanged even if newer products are incorporated into retraining.

B. Extend your test dataset with images of the newer products when they are introduced to retraining.

C. Replace your test dataset with images of the newer products when they are introduced to retraining.

D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.

Correct Answer: C

Community vote distribution


B (81%) D (19%)

  esuaaaa Highly Voted  2 years, 10 months ago

I think B is the right answer.

A: Doesn't make sense. If you don't use the new product, it becomes useless.
C: Conventional products are also necessary as data.
D: I don't understand the need to wait until the threshold is exceeded.
upvoted 30 times

  mousseUwU 2 years, 6 months ago


Agree with you, B is correct
upvoted 1 times

  q4exam 2 years, 7 months ago


Agree, B as it extends to new products.
upvoted 1 times

  VincenzoP84 11 months, 3 weeks ago


D could make sense, considering the question mentions the intention to use AI Platform's continuous evaluation service.
upvoted 2 times

  maukaba 5 months, 2 weeks ago


it's D for two reasons:
- explicitly required in the question to leverage Continuous evaluation service
- the threshold check lets you decide when to retrain, avoiding a retrain for every single new piece of data that arrives.
upvoted 1 times
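The practical difference between "extend" (option B) and "replace" (option C) can be sketched in a few lines of Python (the image file names below are invented for illustration):

```python
# Hypothetical test-set contents; strings stand in for image files.
original_test = ["old_prod_1.jpg", "old_prod_2.jpg"]
new_product_images = ["new_prod_1.jpg"]

# Option B: extend -- old products stay covered by evaluation.
extended_test = original_test + new_product_images

# Option C: replace -- coverage of the old products is lost.
replaced_test = list(new_product_images)

print(len(extended_test))                    # 3
print("old_prod_1.jpg" in replaced_test)     # False
```

Extending keeps the evaluation representative of everything the model still has to classify, which is why B is the popular choice here.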

  gcp2021go Highly Voted  2 years, 10 months ago

answer is B
upvoted 11 times

  guilhermebutzke Most Recent  3 months, 1 week ago

Selected Answer: B

My initial confusion with option B arose from the phrase "with images of the newer products when they are introduced to retraining." Initially, I
mistakenly interpreted it as recommending the use of the same images in both training and testing, which is incorrect. However, upon further
reflection, I realized that using the same product does not necessarily mean using identical images. Therefore, I now believe that option B is the
most suitable choice.
upvoted 1 times

  bugger123 4 months, 3 weeks ago


Selected Answer: B

A and C make no sense - you don't want to lose any of the performance on existing products.
D - Why would you wait for your performance to drop in the first place? That's a reactive rather than proactive approach.
The answer is B
upvoted 1 times

  fragkris 5 months ago


Selected Answer: B

B for sure
upvoted 1 times


  Sum_Sum 5 months, 2 weeks ago


B is the only thing we do in practice
upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: B

A. Keep the original test dataset unchanged even if newer products are incorporated into retraining. : This would not test on new products.
B. Extend your test dataset with images of the newer products when they are introduced to retraining. Most Voted : old+new products testing.
Great
C. Replace your test dataset with images of the newer products when they are introduced to retraining. : No need for old products to be tested? Old-product recognition might change when new products are added in training. Not a good option.
D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.: why wait? no
need
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 2 times

  will7722 1 year, 1 month ago

Selected Answer: B

you can't just replace the old product data with just new product, until you don't sell old product anymore
upvoted 2 times

  SharathSH 1 year, 4 months ago


Ans: B
A would not use the newer data hence not a ideal option
C Replacing will not be a good option as it will replace older data with newer data which in turn hampers accuracy
D waiting for threshold is not a better option
upvoted 1 times

  koakande 1 year, 5 months ago


B is the most plausible answer. The key principle is that test set should represent ground truth distribution to infer credible model evaluation. So
once new products become available, test set should be updated to reflect the new product distribution
upvoted 2 times

  EFIGO 1 year, 5 months ago

Selected Answer: B

You need to correctly classify newer products, so you need the new training data ==> A is wrong;
You need to keep doing a good job on older dataset, you can't just ignore it ==> C is wrong;
You know when you are introducing new products, there is no need to wait for a drop in performance ==> D is wrong;
B is correct
upvoted 2 times

  abhi0706 1 year, 6 months ago


it should be B as it's inclusive
upvoted 1 times

  abhi0706 1 year, 6 months ago


B as it extends to new products
upvoted 2 times

  GCP72 1 year, 8 months ago


Selected Answer: B

Correct answer is "B"


upvoted 2 times

  sachinxshrivastav 1 year, 8 months ago


Selected Answer: B

B is the right one


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: D

answer is between B and D, but the phrase in the question "You also want to use AI Platform's continuous evaluation service" makes me biased towards D; also, retraining is done when model performance is below a threshold, not whenever new data is introduced
upvoted 4 times


Question #7 Topic 1

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the

classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model

building, training, and hyperparameter tuning and serving. What should you do?

A. Configure AutoML Tables to perform the classification task.

B. Run a BigQuery ML task to perform logistic regression for the classification.

C. Use AI Platform Notebooks to run the classification model with pandas library.

D. Use AI Platform to run the classification model job configured for hyperparameter tuning.

Correct Answer: B

BigQuery ML supports supervised learning with the logistic regression model type.

Reference:

https://cloud.google.com/bigquery-ml/docs/logistic-regression-prediction

Community vote distribution


A (100%)

  guruguru Highly Voted  2 years, 9 months ago

A. Because BigQuery ML requires you to write code.


upvoted 24 times
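For context on why BigQuery ML is not "no code": even a simple classifier there is trained with a SQL statement like the one below. The dataset, table, and label names are made up for illustration, and the client call is only sketched in a comment:

```python
# The SQL that BigQuery ML training requires -- i.e. "code".
# Dataset, table, and column names here are illustrative, not from the question.
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.my_classifier`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['label']) AS
SELECT * FROM `my_dataset.training_data`
"""

# With the google-cloud-bigquery client installed and credentials configured,
# this would be submitted roughly as:
#   from google.cloud import bigquery
#   bigquery.Client().query(create_model_sql).result()
```

AutoML Tables, by contrast, covers EDA, feature selection, training, tuning, and serving from the UI, which matches the "without writing code" requirement.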

  Azhar10 Most Recent  4 weeks ago

The question says 'over several structured datasets', meaning multiple datasets, and 'several times', meaning frequent use of the data. Though BigQuery ML is not an absolutely 'no code' solution, all it needs is a very simple SQL query to train an ML model, so 'B' could be the correct answer here; but the question also asks for hyperparameter tuning, which is not available in BigQuery ML, so the correct answer is 'A'
upvoted 1 times

  fragkris 5 months ago

Selected Answer: A

A - AutoML is no code
upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: A

requirement : No code
A. Configure AutoML Tables to perform the classification task. : No code
B. Run a BigQuery ML task to perform logistic regression for the classification. : coding LR model
C. Use AI Platform Notebooks to run the classification model with pandas library. : Notebooks include codes
D. Use AI Platform to run the classification model job configured for hyperparameter tuning.: job needs to be written what to execute
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 1 times

  Moulichintakunta 1 year, 4 months ago

Selected Answer: A

Because BigQueryML doesn't have lots of steps that mentioned in question


upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: A

"without writing code" ==> AutoML


A is correct
upvoted 1 times

  abhi0706 1 year, 6 months ago


Correct answer is "A"
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: A


Correct answer is "A"


upvoted 1 times

  sachinxshrivastav 1 year, 8 months ago


Selected Answer: A

Because BigQuery ML requires writing code, A is the correct one


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


Selected Answer: A

"without writing code": only option A complies with this statement; all the other options require writing code
upvoted 1 times

  caohieu04 2 years, 2 months ago


Selected Answer: A

A is correct
upvoted 2 times

  NamitSehgal 2 years, 3 months ago


A is correct
https://cloud.google.com/automl-tables/docs/beginners-guide
upvoted 3 times

  MisterHairy 2 years, 4 months ago


=New Question7=
You recently designed and built a custom neural network that uses critical dependencies specific to your organization's framework. You need to
train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the scheduler, workers, and servers distribution structure. What should you do?

A. Build your custom container to run jobs on AI Platform Training


B. Use a built-in model available on AI Platform Training
C. Build your custom containers to run distributed training jobs on AI Platform Training
D. Reconfigure your code to an ML framework with dependencies that are supported by AI Platform Training
upvoted 3 times

  coderpk 2 years, 3 months ago


C custom container and distributed system
upvoted 4 times

  A4M 2 years, 3 months ago


Answer - C
It's between A & C
C - Because the question states the data is too large to fit in memory, hence distributed training is relevant
upvoted 3 times

  morgan62 2 years ago


C is the answer without doubt.
A: Distributed? Nope
B: Built-in? Nope
D: Reconfig? Nope
upvoted 3 times

  MisterHairy 2 years, 4 months ago


Answer?
upvoted 1 times

  ashii007 2 years, 4 months ago


You have to export the BQ-trained ML model to set it up for inference. Inference is not natively offered in BQ.
You can perform EDA in AutoML Tables.
upvoted 1 times

  alphard 2 years, 4 months ago


A is right.

Dump the data to a table and do the work with clicks of a button; no coding needed.
upvoted 1 times

  mousseUwU 2 years, 6 months ago


A -> Automatically build and deploy state-of-the-art machine learning models on structured data

https://cloud.google.com/automl-tables/docs#docs
upvoted 2 times


Question #8 Topic 1

You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions

are served directly to users in an app in real time. Because different seasons and population increases impact the data relevance, you will retrain

the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the

predictive model?

A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.

B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.

C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler.

D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.

Correct Answer: A

Community vote distribution


A (100%)

  Paul_Dirac Highly Voted  2 years, 10 months ago

Answer: A
A. Kubeflow Pipelines can form an end-to-end architecture (https://www.kubeflow.org/docs/components/pipelines/overview/pipelines-overview/)
and deploy models.
B. BigQuery ML can't offer an end-to-end architecture because it must use another tool, like AI Platform, for serving models at the end of the
process (https://cloud.google.com/bigquery-ml/docs/export-model-tutorial#online_deployment_and_serving).
C. Cloud Scheduler can trigger the first step in a pipeline, but then some orchestrator is needed to continue the remaining steps. Besides, having
Cloud Scheduler alone can't ensure failure handling during pipeline execution.
D. A Dataflow job can't deploy models, it must use AI Platform at the end instead.
upvoted 36 times

  q4exam 2 years, 7 months ago


Dataflow can deploy a model... this is how you do inference on a stream
upvoted 1 times

  mousseUwU 2 years, 6 months ago


Please send a source link?
upvoted 1 times

  lordcenzin 2 years, 2 months ago


Yes, you can, but it is not supposed to do that. DF is for data processing and transformation; you would lose all the shenanigans Kubeflow provides natively.
Among the two answers, I think A is the most correct
upvoted 1 times

  mousseUwU 2 years, 6 months ago


I guess it's A
upvoted 3 times
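Paul_Dirac's breakdown above, that an orchestrator must chain training, evaluation, and deployment in order, can be sketched in plain Python. The step bodies below are placeholders, not real Kubeflow or AI Platform calls; in Kubeflow Pipelines each step would be a containerized component and the whole DAG would be scheduled monthly:

```python
# Placeholder pipeline steps; names and the metric value are invented.
def train(dataset):
    return {"model": "delay_estimator", "trained_on": dataset}

def evaluate(model):
    # Pretend evaluation metric; a real step computes it on held-out data.
    return 0.92

def deploy(model):
    return f"deployed:{model['model']}"

def pipeline(dataset, quality_gate=0.9):
    model = train(dataset)
    score = evaluate(model)
    # Only deploy when the (placeholder) metric clears the gate.
    return deploy(model) if score >= quality_gate else "rollback"

print(pipeline("delays_2024_04"))  # deployed:delay_estimator
```

The value of an orchestrator like Kubeflow Pipelines is that this ordering, plus retries and failure handling, is managed for you rather than hand-rolled.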

  gcp2021go Highly Voted  2 years, 10 months ago

The answer is D. Found a similar explanation in this course; open for discussion. I found B could also work, but the question asked for end-to-end, thus I chose D instead of B: https://www.coursera.org/lecture/ml-pipelines-google-cloud/what-is-cloud-composer-CuXTQ
upvoted 11 times

  tavva_prudhvi 1 year, 1 month ago


D is incorrect. Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow. It is a recommended way by Google
to schedule continuous training jobs. But it isn’t used to run the training jobs. AI Platform is used for training and deployment.
upvoted 1 times

  fragkris Most Recent  5 months ago

Selected Answer: A

Chose A
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: A

D - Dataflow job can't deploy models


B,C are not - are not complete solutions
leaving A to be the correct one
upvoted 1 times


  suranga4 7 months, 1 week ago


Answer is A
upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: A

Req: retrain the model every month+ Google-recommended best practice+ end-to-end architecture
A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model. : Supports all above
B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery : Why BigQuery ML when Vertex AI/Kubeflow can handle end to end? BigQuery ML + trigger only initiates the code run.
C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler. : Not
recommended by google for end to end ML
D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model. : Not
recommended by google for end to end ML. what if model fails? matrix monitor?
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago

Selected Answer: A

A : Yet the newer is Vertext-AI Pipeline built on Kubeflow


upvoted 1 times

  Fatiy 1 year, 2 months ago


Selected Answer: A

A : In this case, it would be a good fit as you need to retrain your model every month, which can be automated with Kubeflow Pipelines. This makes it easier to manage the entire process, from training to deploying, in a streamlined and scalable manner.
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: A

A is correct
All the options get you to the required result, but only A follows the Google-recommended best practices
upvoted 1 times

  abhi0706 1 year, 6 months ago


Answer is A: Kubeflow Pipelines can form an end-to-end architecture
upvoted 1 times

  GCP72 1 year, 8 months ago


Selected Answer: A

Correct answer is "A"


upvoted 1 times

  caohieu04 2 years, 2 months ago

Selected Answer: A

Community vote
upvoted 2 times

  lordcenzin 2 years, 2 months ago

Selected Answer: A

A for me too. KF provides all the end2end tools to perform what is asked
upvoted 2 times

  gcper 2 years, 7 months ago


A

Kubeflow can handle all of those things, including deploying to a model endpoint for real-time serving.
upvoted 2 times

  celia20200410 2 years, 9 months ago


ANS: A
https://medium.com/google-cloud/how-to-build-an-end-to-end-propensity-to-purchase-solution-using-bigquery-ml-and-kubeflow-pipelines-
cd4161f734d9#75c7

To automate this model-building process, you will orchestrate the pipeline using Kubeflow Pipelines, ‘a platform for building and deploying
portable, scalable machine learning (ML) workflows based on Docker containers.’
upvoted 6 times

  q4exam 2 years, 7 months ago


I think both A and D are correct because it is just different fashion of doing ML ...

upvoted 1 times

  ms_lemon 2 years, 6 months ago


But D doesn't follow Google best practices
upvoted 2 times

  george_ognyanov 2 years, 6 months ago


Answer seems to be A really. Here is a link from Google-recommended best practices. They are talking about Vertex AI Pipelines, which
are essentially Kubeflow.

https://cloud.google.com/architecture/ml-on-gcp-best-practices?hl=en#machine-learning-workflow-orchestration
upvoted 3 times


Question #9 Topic 1

You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on

the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize

computation costs and manual intervention while having version control for your code. What should you do?

A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.

B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code.

C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository.

D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.

Correct Answer: B

https://cloud.google.com/ai-platform/training/docs/training-jobs

Community vote distribution


C (100%)

  celia20200410 Highly Voted  2 years, 9 months ago

ANS:C

CI/CD for Kubeflow pipelines.

At the heart of this architecture is Cloud Build, which executes builds on Google Cloud infrastructure. Cloud Build can import source from Cloud Source Repositories, GitHub, or Bitbucket, and then execute a build to your specifications, and produce artifacts such as Docker containers or Python tar files.
upvoted 24 times

  q4exam 2 years, 7 months ago


I think B might make sense if they have compute concerns; there might be many version changes, but not all of them should trigger compute
upvoted 3 times
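The Cloud Build approach celia20200410 quotes can be made concrete: a trigger on the repository runs a build that rebuilds the trainer image and submits the training job. Below, the cloudbuild.yaml is sketched as a Python dict for illustration; the image name, job name, region, and exact gcloud flags are assumptions, not taken from the question:

```python
# Sketch of a cloudbuild.yaml for retrain-on-push, expressed as a dict.
# Image/job names and flags are illustrative assumptions.
cloudbuild_config = {
    "steps": [
        {   # 1. Rebuild the training container from the pushed code.
            "name": "gcr.io/cloud-builders/docker",
            "args": ["build", "-t", "gcr.io/my-project/ct-trainer", "."],
        },
        {   # 2. Push it to the container registry.
            "name": "gcr.io/cloud-builders/docker",
            "args": ["push", "gcr.io/my-project/ct-trainer"],
        },
        {   # 3. Submit the retraining job to AI Platform.
            "name": "gcr.io/cloud-builders/gcloud",
            "args": [
                "ai-platform", "jobs", "submit", "training",
                "ct_segmentation_retrain",
                "--region", "us-central1",
                "--master-image-uri", "gcr.io/my-project/ct-trainer",
            ],
        },
    ]
}
```

A Cloud Source Repositories trigger pointing at a file like this gives version control plus automatic retraining, with no manual step, which is why C fits the question.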

  chohan Highly Voted  2 years, 10 months ago

Should be C
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#cicd_architecture
upvoted 10 times

  HaiMinhNguyen Most Recent  3 months, 2 weeks ago

I mean, C is indeed the most logical, but I do not see anything relevant to the cost concern. Does anyone have an explanation?
upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: C

Req :frequently rerun training + minimise computation costs + 0 manual intervention + version control for your code
A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job. : No version control
B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code. : Needs manual intervention for the gcloud CLI job submission
C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository. : Yes, connects to GitHub-like version control, automated = no manual intervention, can initiate on code changes + cost (not sure compared to other options)
D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor. : A sensor?? Too much; also none of the requirements are met.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  fredcaram 1 year, 1 month ago

Selected Answer: C

C follows a best practice, B is a manual step


upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: C

C is the correct answer, it's the Google recommended approach;


Checking for changes in code without using Cloud Source Repository is a bad choice, so no A and B;
Cloud Composer is an overkill, so no D.
upvoted 1 times


  abhi0706 1 year, 6 months ago


Answer is C
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: C

Correct answer is "C"


upvoted 2 times

  Mohamed_Mossad 1 year, 10 months ago


C is the best answer because "having version control for your code"
upvoted 1 times

  caohieu04 2 years, 2 months ago


Selected Answer: C

Community vote
upvoted 3 times

  NamitSehgal 2 years, 3 months ago


C cloudbuild
upvoted 1 times

  ashii007 2 years, 4 months ago


B is definitely wrong because it will require manual intervention. The question specifically states the objective of minimal manual intervention. C is the way to go.
upvoted 1 times

  alphard 2 years, 4 months ago


My answer is C.

CI/CD/CT is executed in Cloud Build.


upvoted 1 times

  mousseUwU 2 years, 6 months ago


C is correct
upvoted 1 times

  gcper 2 years, 7 months ago


C

Cloud Build + Source Repository triggers for CI/CD


upvoted 2 times


Question #10 Topic 1

Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data engineering team

already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses, 1,000 images with passports, and 1,000

images with credit cards. You now have to train a model with the following label map: ['drivers_license', 'passport', 'credit_card']. Which loss

function should you use?

A. Categorical hinge

B. Binary cross-entropy

C. Categorical cross-entropy

D. Sparse categorical cross-entropy

Correct Answer: D

Use sparse_categorical_crossentropy. Examples for the above 3-class classification problem: [1], [2], [3]

Reference:

https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other

Community vote distribution


C (52%) D (48%)

  ransev Highly Voted  2 years, 10 months ago

Answer is C
upvoted 19 times

  gcp2021go 2 years, 10 months ago


Use sparse categorical crossentropy when your classes are mutually exclusive (e.g. when each sample belongs exactly to one class) and
categorical crossentropy when one sample can have multiple classes or labels are soft probabilities (like [0.5, 0.3, 0.2]).
upvoted 8 times

  GogoG 2 years, 6 months ago


Definitely C - the target variable label formulated in the question requires a categorical cross entropy loss function i.e. 3 columns
'drivers_license' , 'passport', 'credit_card' that can take values 1, 0. Meanwhile sparse categorical cross entropy would require the labels to be
integer encoded in a single vector, for example, 'drivers_license' = 1, 'passport' = 2, 'credit_card' = 3.
upvoted 7 times

  Jarek7 9 months, 3 weeks ago


Actually it is exactly the opposite. Your label map has 3 options which are mutually exclusive. A document cannot be both - a driver
license and a passport. There is a SPARSE vector as output - only one of the categorical outputs is valid for a one example.
upvoted 1 times

  Jarek7 9 months, 3 weeks ago


No, I'm sorry, I wrote it before checking - you were right. We use sparse categorical cross-entropy when we have just an index (an integer) as a label. The only difference is that it decodes the integer into a one-hot representation that suits our DNN output.
upvoted 1 times
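The thread above can be checked numerically: with mutually exclusive labels, the two losses compute the same quantity and differ only in how the label is encoded. A minimal pure-Python sketch (no TensorFlow; the softmax output is invented):

```python
import math

def categorical_crossentropy(one_hot, probs):
    # CCE: the label arrives one-hot encoded, e.g. [1, 0, 0]
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label_index, probs):
    # SCCE: the label arrives as a single integer index, e.g. 0
    return -math.log(probs[label_index])

# Invented softmax output for one image over
# ['drivers_license', 'passport', 'credit_card']
probs = [0.7, 0.2, 0.1]

cce = categorical_crossentropy([1, 0, 0], probs)    # one-hot 'drivers_license'
scce = sparse_categorical_crossentropy(0, probs)    # integer label 0
print(abs(cce - scce) < 1e-12)  # True: both compute -log(0.7)
```

So the choice between C and D hinges purely on the label encoding, which is why the thread argues both ways from the same math.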

  gcp2021go Highly Voted  2 years, 10 months ago

answer is D
https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
upvoted 10 times

  ori5225 2 years, 8 months ago


Use sparse categorical crossentropy when your classes are mutually exclusive (e.g. when each sample belongs exactly to one class) and
categorical crossentropy when one sample can have multiple classes or labels are soft probabilities (like [0.5, 0.3, 0.2]).
upvoted 3 times

  giaZ 2 years, 1 month ago


Literally from the link you posted:
"A possible cause of frustration when using cross-entropy with classification problems with a large number of labels is the one hot encoding
process. [...] This can mean that the target element of each training example may require a one hot encoded vector with tens or hundreds of
thousands of zero values, requiring significant memory. Sparse cross-entropy addresses this by performing the same cross-entropy calculation
of error, without requiring that the target variable be one hot encoded prior to training".
Here we have 3 categories...No problem doing one-hot encoding. Answer: C
upvoted 2 times

  gscharly Most Recent  1 week, 1 day ago

Selected Answer: C


I'd go with C. Categorical cross entropy is used when classes are mutually exclusive. If the number of classes was very high, then we could use
sparse categorical cross entropy.
upvoted 1 times

  pinimichele01 2 weeks, 3 days ago

Selected Answer: D

Use sparse categorical crossentropy when your classes are mutually exclusive (e.g. when each sample belongs exactly to one class) and categorical
crossentropy when one sample can have multiple classes or labels are soft probabilities (like [0.5, 0.3, 0.2]).
upvoted 1 times

  pinimichele01 1 week, 1 day ago


A. Categorical hinge : Mainly for SVM soft margins
B. Binary cross-entropy : for 2 class only
C. Categorical cross-entropy: Multi class but not necessarily Mutually exclusive
D. Sparse categorical cross-entropy : Multi class + Mutually exclusive only , saves memory too
upvoted 1 times

  pinimichele01 1 week, 1 day ago


https://www.tensorflow.org/api_docs/python/tf/keras/losses/categorical_crossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/metrics/sparse_categorical_crossentropy
upvoted 1 times

  Yan_X 3 weeks, 4 days ago

Selected Answer: C

C
D is for integer values instead of one-hot encoded vectors; in our question, the labels 'drivers_license', 'passport', 'credit_card' are one-hot.
upvoted 1 times

  Paulus89 2 months ago

Selected Answer: C

It depends on how the labels are encoded. If one-hot, use CCE. If it's a single integer representing the class, use SCCE (source: same as in the official (wrong) answer).
From the question it's not clear how the labels are encoded. But for just 3 classes there is no doubt it's better to go with one-hot encoding.
Memory restrictions or a huge number of classes might point to SCCE
upvoted 1 times

  Zwi3b3l 3 months ago

Selected Answer: D

You now HAVE TO train a model with the following label map: ['drivers_license', 'passport', 'credit_card'].
upvoted 2 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: C

If you are wondering between C & D - think about what "sparse" means
It is used when dealing with hundreds of categories
upvoted 1 times

  Sahana_98 6 months ago

Selected Answer: D

mutually exclusive classes


upvoted 1 times

  syedsajjad 6 months, 3 weeks ago


In this case, we have a multi-class classification problem with three classes: driver's license, passport, and credit card. Therefore, we should use the
categorical cross-entropy loss function to train our model.

Sparse categorical cross-entropy is used for multi-class classification problems where the labels are represented in a sparse matrix format. This is
not the case in this problem.
upvoted 2 times

  lalala_meow 7 months, 1 week ago


Selected Answer: C

Only 3 categories of values, each being either T or F. They don't really need to be integer encoded, which is what distinguishes sparse cross-entropy from categorical.
upvoted 1 times

  Dan137 8 months ago

Selected Answer: D

https://fmorenovr.medium.com/sparse-categorical-cross-entropy-vs-categorical-cross-entropy-ea01d0392d28
upvoted 1 times

  Dan137 8 months ago


categorical_crossentropy (cce) produces a one-hot array containing the probable match for each category,

sparse_categorical_crossentropy (scce) produces a category index of the most likely matching category.


upvoted 1 times

  Venish 8 months, 3 weeks ago


The correct answer is: C. Categorical cross-entropy.

you are dealing with a multi-class classification problem where each image can belong to one of three classes: "driver's license," "passport," or
"credit card." Categorical cross-entropy is the appropriate loss function for multi-class classification tasks. It measures the dissimilarity between the
predicted class probabilities and the true class labels. It's designed to penalize larger errors in predicted probabilities and help the model converge
towards more accurate predictions.
upvoted 1 times

  harithacML 9 months, 3 weeks ago


Req : Multi class + mutually exclusive labels
A. Categorical hinge : Mainly for SVM soft margins
B. Binary cross-entropy : for 2 class only
C. Categorical cross-entropy: Multi class but not necessarily Mutually exclusive
D. Sparse categorical cross-entropy : Multi class + Mutually exclusive only , saves memory too
upvoted 1 times

  momosoundz 10 months, 1 week ago

Selected Answer: C

it's C
upvoted 1 times

  NickHapton 10 months, 2 weeks ago


Answer is C.
Sparse means your labels are mutually exclusive, but in this case, an image can contain both a driver's license and a credit card, etc.
upvoted 1 times

  ashu381 10 months, 3 weeks ago


Selected Answer: C

Categorical cross entropy as model is trained with [1,0,0]/[0,1,0]/[0,0,1] kind of labels as given in the question
upvoted 1 times


Question #11 Topic 1

You are designing an ML recommendation model for shoppers on your company's ecommerce website. You will use Recommendations AI to build,

test, and deploy your system. How should you develop recommendations that increase revenue while following best practices?

A. Use the "Other Products You May Like" recommendation type to increase the click-through rate.

B. Use the "Frequently Bought Together" recommendation type to increase the shopping cart size for each order.

C. Import your user events and then your product catalog to make sure you have the highest quality event stream.

D. Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model.

Correct Answer: B

"Frequently bought together" recommendations aim to up-sell and cross-sell customers by providing product recommendations.

Reference:

https://rejoiner.com/resources/amazon-recommendations-secret-selling-online/

Community vote distribution


B (86%) 14%

  chohan Highly Voted  2 years, 10 months ago

Answer should be B
https://cloud.google.com/recommendations-ai/docs/placements#rps
upvoted 19 times

  Celia20210714 Highly Voted  2 years, 9 months ago


ANS:B
https://cloud.google.com/recommendations-ai/docs/placements#fbt
Frequently bought together (shopping cart expansion)
The "Frequently bought together" recommendation predicts items frequently bought together for a specific product within the same shopping
session. If a list of products is being viewed, then it predicts items frequently bought with that product list.

This recommendation is useful when the user has indicated an intent to purchase a particular product (or list of products) already, and you are
looking to recommend complements (as opposed to substitutes). This recommendation is commonly displayed on the "add to cart" page, or on
the "shopping cart" or "registry" pages (for shopping cart expansion).
upvoted 7 times
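As a rough illustration of what "Frequently bought together" optimizes for — Recommendations AI does this at scale with its own models, so this toy co-occurrence count is only an analogy and the function name is made up:

```python
from collections import Counter

def frequently_bought_together(orders, item, top_n=3):
    # Count how often other items appear in the same order as `item`;
    # the highest-count items are the cross-sell candidates.
    co_counts = Counter()
    for order in orders:
        if item in order:
            co_counts.update(i for i in order if i != item)
    return co_counts.most_common(top_n)

orders = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "jam"},
]
suggestions = frequently_bought_together(orders, "milk")
# suggestions -> [("bread", 2), ("eggs", 1)]
```

Surfacing these items on the "add to cart" page is what grows the cart size — the revenue mechanism behind answer B.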

  harithacML Most Recent  9 months, 3 weeks ago

Selected Answer: B

Req: ML Recommendations + increase revenue + best practices


A. Use the "Other Products You May Like" recommendation type to increase the click-through rate. : "You may like"? No
B. Use the "Frequently Bought Together" recommendation type to increase the shopping cart size for each order. : Viable with companies' purchase information. Also this is the basic recommendation to get started with: cross-sell and upsell
C. Import your user events and then your product catalog to make sure you have the highest quality event stream. : Ensuring quality? This makes
sure the data quality. Not bringing more sales much
D. Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model. :
dummy values to replace for now? No value added to sales.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  Yajnas_arpohc 1 year, 1 month ago

Selected Answer: C

https://cloud.google.com/recommendations-ai/docs/overview
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: B

B directly impact the revenue


upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: B

Correct answer is "B"


upvoted 1 times


  caohieu04 2 years, 2 months ago

Selected Answer: B

Community vote
upvoted 2 times

  NamitSehgal 2 years, 3 months ago


Event Data is important along with product data but I am not sure if there is a catch here, what goes first
https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/blob/master/retail/recommendation-
system/bqml/bqml_retail_recommendation_system.ipynb
upvoted 1 times

  ramen_lover 2 years, 5 months ago


I don't know the correct answer, but it seems C and D are not correct:
- "Do not record user events for product items that have not been imported yet."; i.e., import your product catalog first and then your user events.
- "Make sure that all required catalog information is included and correct. Do not use dummy or placeholder values."
https://cloud.google.com/retail/recommendations-ai/docs/upload-catalog#catalog_import_best_practices

I think the correct answer is B, because the "default optimization objective" for FBT is "revenue per order", whereas the "default optimization
objective" for OYML is "click-through rate".
https://cloud.google.com/retail/recommendations-ai/docs/placements#fbt
upvoted 4 times

  mousseUwU 2 years, 6 months ago


Sense is B
upvoted 1 times

  gcp2021go 2 years, 10 months ago


the correct answer should be C
there is a diagram on the webpage, discuss how it works https://cloud.google.com/recommendations
upvoted 5 times

  sensev 2 years, 9 months ago


I think B is the correct answer instead of C, since B directly contributes to increasing revenue.
upvoted 2 times


Question #12 Topic 1

You are designing an architecture with a serverless ML system to enrich customer support tickets with informative metadata before they are

routed to a support agent. You need a set of models to predict ticket priority, predict ticket resolution time, and perform sentiment analysis to help

agents make strategic decisions when they process support requests. Tickets are not expected to have any domain-specific terms or jargon.

The proposed architecture has the following flow:

Which endpoints should the Enrichment Cloud Functions call?

A. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Vision

B. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Natural Language

C. 1 = AI Platform, 2 = AI Platform, 3 = Cloud Natural Language API

D. 1 = Cloud Natural Language API, 2 = AI Platform, 3 = Cloud Vision API

Correct Answer: B

Community vote distribution


C (94%) 6%

  Celia20210714 Highly Voted  2 years, 9 months ago

ANS: C

https://cloud.google.com/architecture/architecture-of-a-serverless-ml-model#architecture
The architecture has the following flow:
A user writes a ticket to Firebase, which triggers a Cloud Function.
-The Cloud Function calls 3 different endpoints to enrich the ticket:
-An AI Platform endpoint, where the function can predict the priority.
-An AI Platform endpoint, where the function can predict the resolution time.
-The Natural Language API to do sentiment analysis and word salience.
-For each reply, the Cloud Function updates the Firebase real-time database.
-The Cloud Function then creates a ticket into the helpdesk platform using the RESTful API.
upvoted 25 times
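For intuition on step 3: the Cloud Natural Language API's sentiment analysis returns a score in [-1.0, 1.0]. A toy lexicon-based stand-in (not the real API — the actual client lives in `google.cloud.language`, and this word list is invented) might look like:

```python
# Invented mini-lexicons for illustration only.
POSITIVE = {"great", "fast", "helpful", "resolved"}
NEGATIVE = {"broken", "slow", "angry", "refund"}

def sentiment_score(text):
    # Toy stand-in for analyze_sentiment: returns a score in [-1, 1]
    # based on how many positive vs negative lexicon words appear.
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

Because the tickets contain no domain-specific jargon, the pretrained API is enough — no AutoML custom model needed, which is the core of answer C.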

  gcp2021go Highly Voted  2 years, 10 months ago

The answer should be C. The tickets do not include domain-specific terms, which means the model doesn't need to be custom built; thus, we can use the Cloud NLP API instead of AutoML NLP.
upvoted 16 times

  Sum_Sum Most Recent  5 months, 2 weeks ago

Selected Answer: C

C - as Natural Language API has sentiment analysis


and using the API over a custom model is always preferred
upvoted 1 times

  harithacML 9 months, 3 weeks ago


Selected Answer: C

Req : serverless ML system + models to (predict ticket priority -predict ticket resolution time- perform sentiment analysis )
The proposed architecture has the following flow:

A. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Vision. : No image data as input here. Only text (NLP)
B. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Natural Language : Only sentiment for 3rd endpoint. No custom model needed :
https://cloud.google.com/natural-language/automl/docs/beginners-guide . So autoML not required
C. 1 = AI Platform, 2 = AI Platform, 3 = Cloud Natural Language API : 1 = classification (priority: high/medium/low), 2 = regression (resolution time), 3 = sentiment analysis, for which the Cloud Natural Language API is enough
D. 1 = Cloud Natural Language API, 2 = AI Platform, 3 = Cloud Vision API : No image data
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 1 times

  wish0035 1 year, 4 months ago


Selected Answer: C

ANS: C

This is the exact solution by Google: https://web.archive.org/web/20210618072649/https://cloud.google.com/architecture/architecture-of-a-


serverless-ml-model#architecture
upvoted 2 times

  jespinosal 1 year, 4 months ago

Selected Answer: B

ANS: B As you need to train custom regression models (Auto ML), as NLP API is not going to be able to rank your Priority and eval the Time.
upvoted 1 times

  jespinosal 1 year, 4 months ago


ANS: C, as the NLP API is not able to perform custom regression models (predict time) and priority. You need AutoML to train your own.
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: C

AI Platform (now Vertex AI) for both the predictions and Natural Language API for sentiment analysis since there are no specific terms (so no need
to custom build something with an AutoML), so C
upvoted 2 times

  GCP72 1 year, 8 months ago

Selected Answer: C

Correct answer is "C"


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: C

- by options eliminations A,D must be dropped we have no vision tasks in this system
- answer between B,C , question stated "no specific domain or jargon" so natural laguage api is prefered over automl since there no custom
entinites or custom training , so I vote for C
upvoted 2 times

  caohieu04 2 years, 2 months ago

Selected Answer: C

Community vote
upvoted 4 times

  alphard 2 years, 4 months ago


Mine is C.

Priority prediction is categorical. Resolution time is linear regression. Sentiment is a NLP problem.
upvoted 2 times

  chohan 2 years, 10 months ago


Should be B, don't forget the domain-specific terms and jargon
https://medium.com/google-cloud/analyzing-sentiment-of-text-with-domain-specific-vocabulary-and-topics-726b8f287aef
upvoted 1 times

  gcp2021go 2 years, 10 months ago


the question said "Tickets are not expected to have any domain-specific terms or jargon."
upvoted 7 times

  inder0007 2 years, 10 months ago


not sure if I agree with b, I think D is a better choice


upvoted 1 times

  Hiba01 2 years, 10 months ago


predict ticket priority (AI plateform : classification), predict ticket resolution time (AI plateform : regression), and perform sentiment analysis (
Cloud NLP API )
upvoted 2 times

  sensev 2 years, 9 months ago


D is wrong since Cloud Vision API is not needed.
upvoted 2 times


Question #13 Topic 1

You have trained a deep neural network model on Google Cloud. The model has low loss on the training data, but is performing worse on the

validation data. You want the model to be resilient to overfitting. Which strategy should you use when retraining the model?

A. Apply a dropout parameter of 0.2, and decrease the learning rate by a factor of 10.

B. Apply a L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10.

C. Run a hyperparameter tuning job on AI Platform to optimize for the L2 regularization and dropout parameters.

D. Run a hyperparameter tuning job on AI Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2.

Correct Answer: D

Community vote distribution


C (100%)

  chohan Highly Voted  2 years, 10 months ago

Should be C
https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
upvoted 24 times

  inder0007 Highly Voted  2 years, 10 months ago


increasing the size of the network will make the overfitting situation worse
upvoted 6 times

  fragkris Most Recent  5 months ago

Selected Answer: C

Voted C
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: C

A and B use very specific numbers, which doesn't guarantee success


C is best
D - increases the size - which is not helping with overfitting
upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: C

Req: make model resilient

A. Apply a dropout parameter of 0.2, and decrease the learning rate by a factor of 10. : Might or might not work, but may not find the optimal parameter set since it uses arbitrary fixed values
B. Apply a L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10. : Might or might not work, but may not find the optimal parameter set since it uses arbitrary fixed values
C. Run a hyperparameter tuning job on AI Platform to optimize for the L2 regularization and dropout parameters. : L2 and dropout are regularisation methods which would work. Let AI Platform find to what extent these parameters should regularise. Yes, this would work.
D. Run a hyperparameter tuning job on AI Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2 : AI Platform would do, but adding neurons would make the network more complex, so we can eliminate this option.
upvoted 2 times
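To make the two regularizers concrete, here is a minimal NumPy sketch of the knobs the tuning job in answer C would search over — the dropout `rate` and the L2 coefficient `lam` are exactly the hyperparameters being tuned (the specific values below are illustrative only, which is the whole point of not hard-coding them as A and B do):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.2, training=True):
    # Inverted dropout: randomly zero a fraction `rate` of units during
    # training and rescale survivors by 1/(1-rate) so the expected
    # activation is unchanged at inference time.
    if not training:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

def l2_penalty(weights, lam=0.4):
    # Added to the loss; shrinks weights toward zero to limit overfitting.
    return lam * sum(np.sum(w ** 2) for w in weights)

acts = np.ones(10_000)
dropped = dropout(acts)   # ~20% zeros, mean still ~1.0
```

A tuning job simply evaluates validation loss across candidate (`rate`, `lam`) pairs instead of trusting one guessed value.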

  ashu381 11 months, 2 weeks ago


Selected Answer: C

It should be C as regularization (L1/L2), early stopping and drop out are some of the ways in deep learning to handle overfitting. Other options
have specific values which may or may not solve overfitting as it depends on specific use case.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 2 times

  wish0035 1 year, 4 months ago

Selected Answer: C

ANS: C

A and B are random values, why they choose that values?


D could increase even more overfitting since you're using a more complex model.


upvoted 2 times

  EFIGO 1 year, 5 months ago

Selected Answer: C

We don't know the optimum values for the parameters, so we need to run a hyperparameter tuning job; L2 regularization and dropout parameters
are great ways to avoid overfitting.
So C is the answer
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: C

Correct answer is "C"


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: C

- by option elimination, C and D are better than A and B (more automated, scalable)
- between C,D C is better as in D "and increase the number of neurons by a factor of 2" will make matters worse and increase overfitting
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


also, in A and D, the learning rate has no direct relation to overfitting
upvoted 1 times

  morgan62 2 years ago

Selected Answer: C

C for sure
upvoted 2 times

  giaZ 2 years, 1 month ago

Selected Answer: C

Best practice is to let an AI Platform tool run the tuning to optimize hyperparameters. Why should I trust the values in answers A or B? Plus, L2 regularization and dropout are the way to go here.
upvoted 2 times

  caohieu04 2 years, 2 months ago


Selected Answer: C

Community vote
upvoted 2 times

  wences 2 years, 3 months ago


Selected Answer: C

it is the logical ans


upvoted 3 times

  stefant 2 years, 3 months ago

Selected Answer: C

regularization and dropout


upvoted 3 times

  NamitSehgal 2 years, 3 months ago


Increasing Neurons or layers / network will increase overfitting, it is good for under fitting. C should be fine.
upvoted 2 times


Question #14 Topic 1

You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production

model is required to keep up with market changes. Since being deployed to production, the model hasn't changed; however the accuracy of the

model has steadily deteriorated.

What issue is most likely causing the steady decline in model accuracy?

A. Poor data quality

B. Lack of model retraining

C. Too few layers in the model for capturing information

D. Incorrect data split ratio during model training, evaluation, validation, and test

Correct Answer: D

Community vote distribution


B (100%)

  esuaaaa Highly Voted  2 years, 10 months ago

B. Retraining is needed as the market is changing.


upvoted 28 times

  sensev 2 years, 9 months ago


I also think it is B - who is giving the "correct" answers to the questions? I feel like 4 out of 5 of them are incorrect.
upvoted 11 times

  NickHapton Highly Voted  2 years, 4 months ago

the biggest issue of this website is `all correct answers` are wrong
upvoted 13 times

  Azhar10 Most Recent  4 weeks ago

Selected Answer: B

The market can be dynamic: sales trends, customer preferences, and even competitor strategies might evolve over time, but our model hasn't changed since deployment, so it can only adapt to these changes through retraining.
Degradation Over Time: Without retraining to adapt to these changes, the model's predictions become less accurate as the real world diverges
from the data it was trained on.
upvoted 1 times

  97a158e 3 months, 1 week ago


Given the constant changes in the market data, the model in production should be retrained regularly for better results. Option B is the right choice.
upvoted 1 times

  fragkris 5 months ago

Selected Answer: B

Keeping the model up to date is crucial. So - B.


upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: B

B because the environment is changing and the model only captures past performance
upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: B

Situation : model trained long before.


Q : why accuracy of the model has steadily deteriorated.

A. Poor data quality : Model performance depends on the trained model only. Quality issues should be handled by the pipeline and would not cause a steady performance decline over time
B. Lack of model retraining : Very obvious
C. Too few layers in the model for capturing information : If so, the model would not have been deployed in the first place due to low performance on unseen data
D. Incorrect data split ratio during model training, evaluation, validation, and test : This is relevant only at training time, before the model was first deployed. We are way past that, so it is not the reason.
upvoted 2 times
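The "steady deterioration" described here is classic data drift. A naive monitoring check that would flag when retraining is due might look like this — the threshold, window, and function name are arbitrary, for illustration only:

```python
import numpy as np

def needs_retraining(recent_errors, baseline_error, tolerance=0.05):
    # Flag the model for retraining once its rolling error drifts more than
    # `tolerance` above the error measured at deployment time.
    return float(np.mean(recent_errors)) > baseline_error + tolerance

drifted = needs_retraining([0.30, 0.32, 0.31], baseline_error=0.20)  # True
stable = needs_retraining([0.21, 0.22, 0.20], baseline_error=0.20)   # False
```

In practice this check would feed a scheduled or continuous retraining pipeline rather than a manual decision.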

  M25 11 months, 3 weeks ago

Selected Answer: B


Went with B
upvoted 1 times

  niketd 1 year, 1 month ago


Selected Answer: B

B is correct. Model needs to keep up with the market changes, implying that the underlying data distribution would be changing as well. Hence
retrain the model.
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago

Selected Answer: B

The question says the model is required to keep up with market changes, hence retraining is needed.
upvoted 1 times

  Ade_jr 1 year, 4 months ago


B is the correct answer
upvoted 1 times

  wish0035 1 year, 4 months ago

Selected Answer: B

ANS: B
upvoted 1 times

  EFIGO 1 year, 5 months ago


Selected Answer: B

Data distribution changes over time and so should do the model, so B is the correct answer
upvoted 1 times

  GCP72 1 year, 8 months ago


Selected Answer: B

Correct answer is "B"


upvoted 1 times

  morgan62 2 years ago


Selected Answer: B

B for sure
upvoted 3 times

  caohieu04 2 years, 2 months ago

Selected Answer: B

Community vote
upvoted 2 times

  ESP_SAP 2 years, 3 months ago

Selected Answer: B

B. Retraining is needed as the market is changing. its how the Model keep updated and predictions accuracy.
upvoted 2 times


Question #15 Topic 1

You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at a low latency. You

discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?

A. Create a tf.data.Dataset.prefetch transformation.

B. Convert the images to tf.Tensor objects, and then run Dataset.from_tensor_slices().

C. Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().

D. Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.

Correct Answer: B

Reference:

https://www.tensorflow.org/api_docs/python/tf/data/Dataset

Community vote distribution


D (100%)

  chohan Highly Voted  2 years, 10 months ago

Should be D
upvoted 19 times

  alphard Highly Voted  2 years, 4 months ago

My option is D.

Quote from the Google docs: to construct a Dataset from data in memory, use tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices(). When input data is stored in a file (not in memory), in the recommended TFRecord format, you can use tf.data.TFRecordDataset().

tf.data.Dataset is for data in memory.


tf.data.TFRecordDataset is for data in non-memory storage.
upvoted 13 times
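The point of option D is that records are streamed from disk rather than materialized in memory. A plain-Python analogy of what `tf.data.TFRecordDataset(...).batch(n)` does — not the real TFRecord format, just newline-delimited fake records to show the lazy, batched read:

```python
import itertools
import tempfile

def batched_records(paths, batch_size=2):
    # Records are read lazily from files on disk, so the full dataset is
    # never held in memory at once (what tf.data does for TFRecords).
    def records():
        for path in paths:
            with open(path, "rb") as f:
                for line in f:
                    yield line.rstrip(b"\n")
    it = records()
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch

# Demo: write five fake serialized records to disk, then stream them in batches.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".records") as f:
    f.write(b"\n".join(b"record-%d" % i for i in range(5)) + b"\n")
    path = f.name

batches = list(batched_records([path], batch_size=2))
```

With real TFRecords on Cloud Storage, tf.data adds parallel reads and prefetching on top of this same streaming idea.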

  pinimichele01 Most Recent  2 weeks, 3 days ago

Selected Answer: D

tf.data.Dataset is for data in memory.


tf.data.TFRecordDataset is for data in non-memory storage.
upvoted 1 times

  samratashok 1 month, 3 weeks ago

Selected Answer: D

Why does this website show the wrong option as the answer? This is my observation across so many questions.
upvoted 2 times

  fragkris 5 months ago


Selected Answer: D

D is correct
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: D

D because:
tf.data.Dataset is for data in memory.
tf.data.TFRecordDataset is for data in non-memory storage.
upvoted 1 times

  boobyg1 6 months ago


Selected Answer: D

all "correct" answers are wrong


upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: D

Went with D
upvoted 1 times

  India_willsmith 1 year ago


For all questions the given answers and voted answers are different. Which one should be considered for exam?


upvoted 2 times

  Alfredo_OSS 1 year ago


You should consider the voted ones.
upvoted 2 times

  enghabeth 1 year, 2 months ago

Selected Answer: D

Converting your data into TFRecord has many advantages, such as: More efficient storage: the TFRecord data can take up less space than the
original data; it can also be partitioned into multiple files. Fast I/O: the TFRecord format can be read with parallel I/O operations, which is useful for
TPUs or multiple hosts
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: D

my option is D
upvoted 1 times

  Omi_04040 1 year, 4 months ago


Ans: D
upvoted 1 times

  wish0035 1 year, 4 months ago


Selected Answer: D

ans: D
upvoted 1 times

  EFIGO 1 year, 5 months ago


Selected Answer: D

For data in memory use tf.data.Dataset, for data in non-memory storage use tf.data.TFRecordDataset.
Since data don't fit in memory, go with option D.
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: D

Correct answer is "D"


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: D

- by option elimination, A is the first to be dropped; prefetch adds memory overhead to buffer images
- the answer is among B, C, D, but D is the best, as we save the huge image dataset on GCS and then load batches of data for training
- B and C are not good as they do not address the data not fitting in memory
upvoted 2 times

  rsamant 2 years ago


Converting to TFRecords and writing to a storage bucket is low latency? Shouldn't it be A, as tf.data.Dataset also reads the data in batches into memory?
upvoted 1 times


Question #16 Topic 1

You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model.

Your model's features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory

data on a daily basis. Which algorithms should you use to build the model?

A. Classification

B. Reinforcement Learning

C. Recurrent Neural Networks (RNN)

D. Convolutional Neural Networks (CNN)

Correct Answer: B

Reference:

https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html

Community vote distribution


C (92%) 8%

  esuaaaa Highly Voted  2 years, 10 months ago

The answer is C. Use RNN because it is a time series analysis.


upvoted 27 times
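To see why an RNN fits sequential demand data, here is a bare-bones recurrence in NumPy (weights are random here; a real model would learn them): the hidden state `h` is updated once per day, so information from earlier days can influence today's prediction.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One recurrence: today's hidden state mixes today's features (region,
    # demand, seasonality) with the state carried over from previous days.
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

rng = np.random.default_rng(42)
n_features, n_hidden = 4, 8
W_x = rng.normal(size=(n_features, n_hidden)) * 0.1
W_h = rng.normal(size=(n_hidden, n_hidden)) * 0.1
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)
for day in range(30):                 # one month of daily inventory data
    x_t = rng.normal(size=n_features)
    h = rnn_step(x_t, h, W_x, W_h, b)
```

Classification, CNNs, and reinforcement learning don't carry this day-to-day state, which is why the thread settles on RNNs for the time-series framing.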

  george_ognyanov Highly Voted  2 years, 6 months ago

As Y2Data pointed out, your reasoning for choosing B does not make much sense.

Furthermore, Reinforcement Learning for this question does not make much sense to me. Reinforcement Learning is basically an agent-task problem. You give the agent a task, i.e. get out of a maze, and then through trial and error and many, many iterations the agent learns the correct way to perform the task. It is called Reinforcement because you ... well ... reinforce the agent: you reward the agent for correct choices and penalize it for incorrect choices. In RL you don't use much (or any) previous data, because the data is generated with each iteration, I think.
upvoted 7 times

  vale_76_na_xxx Most Recent  4 months, 1 week ago

go for C
https://www.akkio.com/post/deep-learning-vs-reinforcement-learning-key-differences-and-use-
cases#:~:text=Reinforcement%20learning%20is%20particularly%20well,of%20reinforcement%20learning%20in%20action.
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: C

The question asks for "prediction model"


classification and RL do not fit the bill
CNNs are used for vision
so the only answer left is C
upvoted 1 times

  12112 9 months, 3 weeks ago


Selected Answer: C

I'm not sure that daily basis means it is time series. It could mean updating the model daily.
But I'll follow collective intelligence.
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: B

Reinforcement Learning(RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error
using feedback from its own actions and experiences.
upvoted 1 times

  wish0035 1 year, 4 months ago

Selected Answer: C

ans: C
upvoted 1 times


  EFIGO 1 year, 5 months ago

Selected Answer: C

RNN are a fit tool to work with time-series as this one, so C


upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: C

Correct answer is "C"


upvoted 2 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: C

"algorithm to learn from new inventory data on a daily basis" = time series model; the best option to deal with time series is for sure RNN, vote for C
upvoted 1 times

  morgan62 2 years ago


Selected Answer: C

It's C.
upvoted 3 times

  A4M 2 years, 2 months ago


C - for time series
upvoted 2 times

  alphard 2 years, 4 months ago


My option is B.

"You want the algorithm to learn from new inventory data on a daily basis". The implication is feedback with reward or punishment, which can optimise the model. But all the other options can only perform prediction against new data rather than learning from new data automatically.
upvoted 4 times

  majejim435 2 years, 6 months ago


I think it's D (CNN).

I'd use C (RNN) in case we are predicting only based on historical demand (time series). However, as we are also taking region, location, and seasonal popularity into consideration, it is not a pure time-series problem anymore.
upvoted 1 times

  mousseUwU 2 years, 6 months ago


*RNN* is the "Preferred algorithm for sequential data like time series, speech, text, financial data, audio, video, weather and much more" since "It
learns over time what information is important and what is not" because they "can remember important things about the input they received,
which allows them to be very precise in predicting what’s coming next".

- source: https://builtin.com/data-science/recurrent-neural-networks-and-lstm

And *Reinforcement Learning* doesn't mean that the model will learn from new data (better explained by george_ognyanov).
upvoted 5 times

  votvalenok 2 years, 9 months ago


B. "you want the algorithm to LEARN FROM NEW inventory data on a daily basis"
upvoted 2 times

  Y2Data 2 years, 7 months ago


how is "learn from new data" constituting option B?
upvoted 4 times


Question #17 Topic 1

You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (PII) to Google Cloud. You

want to use the

Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the PII is not accessible by unauthorized individuals?

A. Stream all files to Google Cloud, and then write the data to BigQuery. Periodically conduct a bulk scan of the table using the DLP API.

B. Stream all files to Google Cloud, and write batches of the data to BigQuery. While the data is being written to BigQuery, conduct a bulk scan

of the data using the DLP API.

C. Create two buckets of data: Sensitive and Non-sensitive. Write all data to the Non-sensitive bucket. Periodically conduct a bulk scan of that

bucket using the DLP API, and move the sensitive data to the Sensitive bucket.

D. Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk

scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-Sensitive bucket.

Correct Answer: A

Community vote distribution


D (70%) B (26%) 4%

  chohan Highly Voted  2 years, 10 months ago

Should be D
https://cloud.google.com/architecture/automating-classification-of-data-uploaded-to-cloud-storage#building_the_quarantine_and_classification_pipeline
upvoted 25 times
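The three-bucket pattern from that architecture guide boils down to a simple routing decision after the DLP scan. A hypothetical sketch — the real pipeline uses Cloud Functions and the DLP API, and `route` here is a made-up name:

```python
def route(object_name, dlp_findings):
    # Every object lands in the Quarantine bucket first; after the DLP
    # scan, it moves to "sensitive" or "non-sensitive" depending on
    # whether any findings (e.g. EMAIL_ADDRESS) were returned.
    bucket = "sensitive" if dlp_findings else "non-sensitive"
    return (object_name, bucket)

moves = [
    route("ticket-001.csv", dlp_findings=["EMAIL_ADDRESS"]),
    route("ticket-002.csv", dlp_findings=[]),
]
```

Because nothing leaves quarantine until it has been classified, unauthorized users never see unscanned PII — the property options A–C fail to guarantee.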

  Swagluke 2 years, 8 months ago


All PII should be Sensitive data, that's why I think the answer is A.
upvoted 1 times

  u_phoria 1 year, 9 months ago


Option D, as documented in that link (a fully automated process, using Cloud Functions - rather than a "periodic" scan as worded in the
question), would be my choice.

It's easier than B, which would work for a real-time scenario - but would require loads more custom work to implement (things like batching,
segmentation, triggering).

A and C are 'reactive' / periodic, and so not appropriate for the given scenario.
upvoted 1 times

  maartenalexander Highly Voted  2 years, 10 months ago


D; others pose risks
upvoted 5 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: D

D - Quarantine bucket is the google reccomended approach


upvoted 2 times

  tavva_prudhvi 5 months, 4 weeks ago

Selected Answer: D

Option B does not provide a clear separation between sensitive and non-sensitive data before it is written to BigQuery, which means that PII might
be exposed during the process.

But, in D offers a better level of security by writing all the data to a Quarantine bucket first. This way, the DLP API can scan and categorize the data
into Sensitive or Non-sensitive buckets before it is further processed or stored. This ensures that PII is not accessible by unauthorized individuals, a
the sensitive data is identified and separated from the non-sensitive data before any further actions are taken.
upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: D

The scenario is a real-time prediction engine that streams files to Google Cloud, where PII must not be accessible by unauthorized individuals.
D
upvoted 1 times

  Liting 9 months, 3 weeks ago


Selected Answer: D

D should be the correct answer


https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 40/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 2 times

  lucaluca1982 1 year ago

Selected Answer: B

B is real time
upvoted 1 times

  dfdrin 1 year ago

Selected Answer: D

It's D
upvoted 1 times

  enghabeth 1 year, 2 months ago


Selected Answer: B

A, C, and D do not apply to a real-time case; all three say that the scan is applied periodically.
So it's B.
upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


Never mentioned periodically in the question, if I'm not wrong?
upvoted 1 times

  guilhermebutzke 1 year, 3 months ago

Selected Answer: B

I think that is correct because of the "real-time" application.


upvoted 1 times

  EFIGO 1 year, 5 months ago


Selected Answer: D

D is the right answer: you can temporarily store the sensitive data in a Quarantine bucket with restricted access, then move the data to the
respective buckets once the PII has been protected.
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: D

Correct answer is "D"


upvoted 1 times

  dasouna 1 year, 11 months ago


Answer is D : Question says that there MAY be sensitive data, so not all data is sensitive. This is why we need 3 buckets : Quarantine as a landing
bucket, sensitive for sensitive data after DLP scan, non-sensitive for non-sensitive after DLP scan.
https://cloud.google.com/architecture/automating-classification-of-data-uploaded-to-cloud-storage
upvoted 1 times

  atuls287 2 years, 1 month ago

Selected Answer: B

Reason being "real-time" DLP scanning. Option A would scan all the data again and again. For the others, buckets etc. are overkill and an offline process.
upvoted 2 times

  lukacs16 2 years, 1 month ago


But what about the real-time element, how would that work with the quarantine?
upvoted 1 times

  pml2021 2 years, 1 month ago

Selected Answer: A

Although both A and D can work when scanning the data using DLP, the question here is about streaming data, and the best option in this
particular case would be A.
Check this use case Using Cloud DLP with BigQuery
https://cloud.google.com/dlp/docs/dlp-bigquery
Also the other use case involving the DLP using Quarantine bucket is by uploading the files and not streaming.
https://cloud.google.com/architecture/automating-classification-of-data-uploaded-to-cloud-
storage#building_the_quarantine_and_classification_pipeline
upvoted 1 times

  giaZ 2 years, 1 month ago


Why not B then? Ans A says "periodically". Shouldn't you scan as the data comes in, for real-time?
upvoted 1 times


Question #18 Topic 1

You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted marketing strategy. You

need to make predictions about user lifetime value (LTV) over the next 20 days so that marketing can be adjusted accordingly. The customer

dataset is in BigQuery, and you are preparing the tabular data for training with AutoML Tables. This data has a time signal that is spread across

multiple columns. How should you ensure that

AutoML fits the best model to your data?

A. Manually combine all columns that contain a time signal into an array. AIlow AutoML to interpret this array appropriately. Choose an

automatic data split across the training, validation, and testing sets.

B. Submit the data for training without performing any manual transformations. AIlow AutoML to handle the appropriate transformations.

Choose an automatic data split across the training, validation, and testing sets.

C. Submit the data for training without performing any manual transformations, and indicate an appropriate column as the Time column. AIlow

AutoML to split your data based on the time signal provided, and reserve the more recent data for the validation and testing sets.

D. Submit the data for training without performing any manual transformations. Use the columns that have a time signal to manually split your

data. Ensure that the data in your validation set is from 30 days after the data in your training set and that the data in your testing sets from 30

days after your validation set.

Correct Answer: D

Community vote distribution


D (68%) C (32%)

  kkd14 Highly Voted  2 years, 9 months ago

Should be D. As the time signal is spread across multiple columns, a manual split is required.
upvoted 21 times

  sensev 2 years, 9 months ago


Also think it is D, since it mentioned that the time signal is spread across multiple columns.
upvoted 4 times

  GogoG 2 years, 6 months ago


Correct answer is C - AutoML handles training, validation, test splits automatically for you when you specify a Time column. There is no
requirement to do this manually.
upvoted 4 times

  george_ognyanov 2 years, 6 months ago


Correct answer is D. It clearly says the time signal data is spread across different columns. If it weren't then C would be correct and your
point would be valid. However, in this case the answer is D 100%.

https://cloud.google.com/automl-tables/docs/data-best-practices#time
upvoted 9 times

  irumata 2 years, 3 months ago


this comment is only about time information in different columns, not about time itself. C is correct as for me
upvoted 1 times

  irumata 2 years, 3 months ago


but if time signal means time mark not the business signal the D is the correct - very controversial
upvoted 1 times

  Werner123 2 months ago


I think the answer is C. In this case I am interpreting "time signal" as the features that hold predictive power as a function of time. There is no
indication of how much data is available, so using the 30-days-after mark is not wise: you would only have 30 days' worth of data for the
validation set. If you have a few years' worth of data, this seems like an unnecessarily small validation set.
upvoted 4 times

  DucLee3110 Highly Voted  2 years, 10 months ago

C
You use the Time column to tell AutoML Tables that time matters for your data; it is not randomly distributed over time. When you specify the Time
column, AutoML Tables use the earliest 80% of the rows for training, the next 10% of rows for validation, and the latest 10% of rows for testing.
AutoML Tables treats each row as an independent and identically distributed training example; setting the Time column does not change this. The
Time column is used only to split the data set.
You must include a value for the Time column for every row in your dataset. Make sure that the Time column has enough distinct values, so that
the evaluation and test sets are non-empty. Usually, having at least 20 distinct values should be sufficient.
https://cloud.google.com/automl-tables/docs/prepare#time


upvoted 14 times

  salsabilsf 2 years, 9 months ago


From the link you provided, I think it's A :

The Time column must have a data type of Timestamp.

During schema review, you select this column as the Time column. (In the API, you use the timeColumnSpecId field.) This selection takes effect
only if you have not specified the data split column.

If you have a time-related column that you do not want to use to split your data, set the data type for that column to Timestamp but do not set
it as the Time column.
upvoted 2 times

  guilhermebutzke Most Recent  3 months, 1 week ago

Selected Answer: D

At first, "spread across multiple columns" sounded like "columns with redundant information" to me, and considering how AutoML can deal with
correlated columns, option C seemed the best choice, with no need for a manual split.

However, "time information is not contained in a single column" is the same thing as "time signal that is spread across multiple columns," so I agree
that D could be the best option.

In the end, I tend to think that D is the best choice, even though the question could have expressed this more clearly.
upvoted 2 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: C

Either C or D, but leaning towards C, as I don't get the 30 days in D.


upvoted 2 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: D

"data has a time signal that is spread across multiple columns" - I interpret as having > 1 timeseries column.
AutoML knows how to deal with a single column but not multiple
hence answer is D
upvoted 1 times

  Krish6488 5 months, 3 weeks ago

Selected Answer: C

Since AutoML is good enough to perform the splits, C appears to be the right answer. Moreover, time information across multiple columns that
requires a manual split (as per option D) is different from the question's scenario, where the time signal spread across multiple columns can be
hours, months, days, etc. If we can tell AutoML the right time-signal column, that is enough to split the data, picking the most recent data as test
data and the earliest data as training data.
upvoted 1 times

  atlas_lyon 8 months, 1 week ago

Selected Answer: D

A Wrong. Even if the columns are combined into a 1-D array (column), the time signal would still need to be indicated to AutoML anyway; an automatic
split cannot work since we need more than 20 days of history.
B Wrong. Without indicating the time signal to AutoML, data would leak between the training/validation/test sets (time leakage).
C Wrong, but might have been possible if the time signal weren't spread across multiple columns.
D True, because a time signal spread across multiple columns requires splitting the data manually. Since we want to predict LTV over the next 20
days, it is necessary to have at least 20 days of history between the splits (30 seems okay: 10 days beyond the prediction window). Validating and testing on the last 2
months seems reasonable for marketing purposes (usually seasonal).
upvoted 2 times
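Option D's manual chronological split can be sketched with plain dates. This is a minimal illustration under the assumption of daily records; all names and the synthetic data are made up.

```python
from datetime import date, timedelta

def chronological_split(records, train_end, gap_days=30):
    """Split (day, value) records chronologically: the validation set
    starts `gap_days` after the last training day, and the test set
    starts `gap_days` after the start of validation (option D's rule)."""
    val_start = train_end + timedelta(days=gap_days)
    test_start = val_start + timedelta(days=gap_days)
    train = [v for d, v in records if d <= train_end]
    val = [v for d, v in records if val_start <= d < test_start]
    test = [v for d, v in records if d >= test_start]
    return train, val, test

# 120 consecutive days of synthetic records, valued by their index.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(120)]
records = list(zip(days, range(120)))
train, val, test = chronological_split(records, train_end=date(2024, 1, 30))
```

With the split above, training covers January, validation starts 30 days after the end of training, and testing starts another 30 days later, so no future information leaks backwards.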

  12112 9 months, 3 weeks ago


Why 30 days between the data sets, even though we only need to predict 20 days ahead?
upvoted 1 times

  Liting 9 months, 3 weeks ago

Selected Answer: D

Agree with kkd14. D should be the correct answer.


upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: C

As far as I understand, AutoML Tables can handle a time-signal column fully automatically. Thus, I went with C.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times


  hghdh5454 1 year, 1 month ago


C. Submit the data for training without performing any manual transformations, and indicate an appropriate column as the Time column. Allow
AutoML to split your data based on the time signal provided, and reserve the more recent data for the validation and testing sets.

This approach ensures that AutoML can handle the time-based nature of the data properly. By providing the Time column, AutoML can
automatically split the data in a way that respects the time-based structure, using more recent data for validation and testing. This approach is
especially important for time-series data, as it helps prevent leakage of future information into the training set, ensuring a more accurate and
reliable model.
upvoted 1 times

  enghabeth 1 year, 2 months ago


Selected Answer: D

https://cloud.google.com/automl-tables/docs/data-best-practices#time

- If the time information is not contained in a single column, you can use a manual data split to use the most recent data as the test data, and the
earliest data as the training data.
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: D

I go with D: https://cloud.google.com/automl-tables/docs/data-best-practices#time
Read it carefully at the last paragraph of the topic: If the time information is not contained in a single column, you can use a manual data split to
use the most recent data as the test data, and the earliest data as the training data.
upvoted 1 times

  Omi_04040 1 year, 4 months ago


Answer: D
https://cloud.google.com/automl-tables/docs/data-best-practices#time
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: D

D
https://cloud.google.com/automl-tables/docs/data-best-practices
upvoted 2 times

  EFIGO 1 year, 5 months ago

Selected Answer: D

Automatic splitting is wrong for time-series, you need to split the data in older-newer, so A and B are wrong.
Since the time info is split in more columns, we can't use the option provided by C for the timestamps, but we need to go with D.
upvoted 1 times


Question #19 Topic 1

You have written unit tests for a Kubeflow Pipeline that require custom libraries. You want to automate the execution of unit tests with each new

push to your development branch in Cloud Source Repositories. What should you do?

A. Write a script that sequentially performs the push to your development branch and executes the unit tests on Cloud Run.

B. Using Cloud Build, set an automated trigger to execute the unit tests when changes are pushed to your development branch.

C. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Configure a Pub/Sub trigger for

Cloud Run, and execute the unit tests on Cloud Run.

D. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Execute the unit tests using a

Cloud Function that is triggered when messages are sent to the Pub/Sub topic.

Correct Answer: B

Community vote distribution


B (100%)

  maartenalexander Highly Voted  2 years, 10 months ago

B. GCP recommends using Cloud Build when building Kubeflow Pipelines. It's possible to run unit tests in Cloud Build, and the others seem
overly complex/unnecessary.
upvoted 16 times
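For reference, a minimal Cloud Build config that runs unit tests on each push could look like the following; the file name `cloudbuild.yaml`, the image tags, and the `tests/` path are assumptions, not part of the question.

```yaml
# Assumed file: cloudbuild.yaml at the repository root.
steps:
  # Install the custom libraries the unit tests depend on.
  - name: 'python:3.10'
    entrypoint: 'pip'
    args: ['install', '-r', 'requirements.txt', '--user']
  # Run the unit tests.
  - name: 'python:3.10'
    entrypoint: 'python'
    args: ['-m', 'pytest', 'tests/']
```

A Cloud Build trigger filtered to the development branch in Cloud Source Repositories then runs these steps on every push, which is exactly what option B describes.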

  mousseUwU Highly Voted  2 years, 6 months ago


B makes sense because of this: https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-
build#cicd_architecture
upvoted 7 times

  mousseUwU 2 years, 6 months ago


The image explains a lot
upvoted 2 times

  Sum_Sum Most Recent  5 months, 2 weeks ago

Selected Answer: B

B is the only sensible answer, as it's a feature of Cloud Build.

Everything else is the delusions of a madman.
upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: B

A, C, and D need additional manual tasks.


B is correct.
upvoted 1 times

  Scipione_ 11 months, 2 weeks ago


Selected Answer: B

Cloud Build is the best choice but the other answers are feasible.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  enghabeth 1 year, 2 months ago


Selected Answer: B

Because it is the most automatic of the options


upvoted 1 times

  wish0035 1 year, 4 months ago

Selected Answer: B

ans: B
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: B


B is the Google-recommended best practice.


upvoted 1 times

  GCP72 1 year, 8 months ago


Correct answer is "B"
upvoted 1 times

  morgan62 2 years ago


Selected Answer: B

B it is.
upvoted 2 times

  Danny2021 2 years, 7 months ago


Easy one, B, Cloud Build is the tool for CI/CD.
upvoted 5 times


Question #20 Topic 1

You are training an LSTM-based model on AI Platform to summarize text using the following job submission script:

gcloud ai-platform jobs submit training $JOB_NAME \

--package-path $TRAINER_PACKAGE_PATH \

--module-name $MAIN_TRAINER_MODULE \

--job-dir $JOB_DIR \

--region $REGION \

--scale-tier basic \

-- \

--epochs 20 \

--batch_size=32 \

--learning_rate=0.001 \

You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?

A. Modify the 'epochs' parameter.

B. Modify the 'scale-tier' parameter.

C. Modify the 'batch size' parameter.

D. Modify the 'learning rate' parameter.

Correct Answer: C

Community vote distribution


B (100%)

  maartenalexander Highly Voted  2 years, 10 months ago

B. Changing the scale tier does not impact performance; it only speeds up training time. Epochs, batch size, and learning rate are all hyperparameters
that might impact model accuracy.
upvoted 28 times
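Concretely, option B changes only the `--scale-tier` flag in the submission script, leaving the hyperparameters untouched. For example (the `basic-gpu` tier is just one illustrative choice among the documented tiers):

```shell
gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path $TRAINER_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --job-dir $JOB_DIR \
  --region $REGION \
  --scale-tier basic-gpu \
  -- \
  --epochs 20 \
  --batch_size=32 \
  --learning_rate=0.001
```

Since epochs, batch size, and learning rate are unchanged, the training result should be the same; only the wall-clock time drops.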

  SamuelTsch Most Recent  9 months, 3 weeks ago

Selected Answer: B

A, C, D could impact the accuracy. But B not.


upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: B

A is incorrect: fewer training iterations would affect model performance.

B is correct: cost is not a concern as it is not mentioned in the question, and the scale tier can be upgraded to significantly minimize the training time.

C is incorrect: it wouldn't affect training time, but would affect model performance.

D is incorrect: the model might converge faster with a higher learning rate, but this would affect the training routine and might cause exploding
gradients.
upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: B

It's B!
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: B

A, C, D are all about hyperparameters that might impact model accuracy, while B is just about computing speed; so upgrading the scale tier will
make the model faster with no chance of reducing accuracy.
upvoted 2 times

  GCP72 1 year, 8 months ago


Selected Answer: B

Correct answer is "B"


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


Selected Answer: B

- using options elimination all options except B can harm the accuracy
upvoted 3 times

  morgan62 2 years ago


Selected Answer: B

B for sure.
upvoted 2 times

  igor_nov1 2 years, 2 months ago


Selected Answer: B

Might be helpful: https://cloud.google.com/ai-platform/training/docs/machine-types#scale_tiers


Google may optimize the configuration of the scale tiers for different jobs over time, based on customer feedback and the availability of cloud
resources. Each scale tier is defined in terms of its suitability for certain types of jobs. Generally, the more advanced the tier, the more machines are
allocated to the cluster, and the more powerful the specifications of each virtual machine. As you increase the complexity of the scale tier, the
hourly cost of training jobs, measured in training units, also increases. See the pricing page to calculate the cost of your job.
upvoted 1 times

  ashii007 2 years, 4 months ago


A, C, and D all point to hyperparameter tuning, which is not the objective in the question.

As others have said, B is the only way to improve the training time of the model.
upvoted 3 times

  santy79 2 years, 5 months ago


Selected Answer: B

ExamTopics, can we attach relevant docs on why the given answer is C?


upvoted 1 times

  mousseUwU 2 years, 6 months ago


B is correct; the scale tier defines which machines/GPUs will be used: https://cloud.google.com/ai-platform/training/docs/using-gpus
upvoted 3 times

  Y2Data 2 years, 7 months ago


Should be B.
The question didn't say anything about cost, so while B would increase cost for more compute, it would save real-world time.
upvoted 3 times

  Danny2021 2 years, 7 months ago


Go with B, all the other options could affect the accuracy.
upvoted 3 times


Question #21 Topic 1

You have deployed multiple versions of an image classification model on AI Platform. You want to monitor the performance of the model versions

over time. How should you perform this comparison?

A. Compare the loss performance for each model on a held-out dataset.

B. Compare the loss performance for each model on the validation data.

C. Compare the receiver operating characteristic (ROC) curve for each model using the What-If Tool.

D. Compare the mean average precision across the models using the Continuous Evaluation feature.

Correct Answer: B

Community vote distribution


D (64%) B (23%) 14%

  chohan Highly Voted  2 years, 10 months ago

Answer is D
upvoted 13 times

  Danny2021 Highly Voted  2 years, 7 months ago

D is correct. Choosing the feature/capability GCP provides is always a good bet. :)
upvoted 6 times
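As a refresher on the metric Continuous Evaluation reports, average precision for one ranked list of results can be computed as follows; this is a minimal sketch, and mAP is then simply the mean of these values over classes or queries.

```python
def average_precision(ranked_relevance):
    """Average precision over a ranked list of 0/1 relevance labels:
    the mean of precision@k taken at each position k where a relevant
    item appears (0.0 if nothing relevant is retrieved)."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# A perfect ranking scores 1.0; worse rankings score lower.
print(average_precision([1, 1, 0, 0]))  # 1.0
print(average_precision([0, 1, 0, 1]))  # (1/2 + 2/4) / 2 = 0.5
```

Comparing this single number across model versions over time is what makes option D a monitoring approach rather than a one-off evaluation.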

  Sum_Sum Most Recent  5 months, 2 weeks ago

Selected Answer: D

D - because you are using a Google-provided feature.

Remember, in this exam it's important to always choose the Google services over anything else.
upvoted 3 times

  claude2046 6 months, 3 weeks ago


mAP is for object detection, so the answer should be B
upvoted 1 times

  Liting 9 months, 3 weeks ago

Selected Answer: D

Went with D, using continuous evaluation feature seems correct to me.


upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: D

I chose D myself, but after reading the post here https://www.v7labs.com/blog/mean-average-precision, I was not sure about D.
It says mAP is commonly used for object detection or instance segmentation tasks.
Validation dataset in the GCP context: data that was not trained on and not yet seen.
upvoted 1 times

  Voyager2 11 months ago

Selected Answer: D

D. Compare the mean average precision across the models using the Continuous Evaluation feature
https://cloud.google.com/vertex-ai/docs/evaluation/introduction
Vertex AI provides model evaluation metrics, such as precision and recall, to help you determine the performance of your models...
Vertex AI supports evaluation of the following model types:
AuPRC: The area under the precision-recall (PR) curve, also referred to as average precision. This value ranges from zero to one, where a higher
value indicates a higher-quality model.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  lucaluca1982 1 year ago


Selected Answer: B

I go for B. Option D is good when we are already in production


upvoted 1 times

  prakashkumar1234 1 year, 1 month ago


To monitor the performance of the model versions over time, you should compare the loss performance for each model on the validation data.
Therefore, option B is the correct answer.
upvoted 1 times

  Jarek7 11 months, 3 weeks ago


Please, how? B is not monitoring; it is validation. The definition of monitoring states:
"observe and check the progress or quality of (something) over a period of time"
So it is a continuous process. Options A, B, and C are each a one-time check, not monitoring.
upvoted 3 times

  Fatiy 1 year, 2 months ago


Selected Answer: B

The best option to monitor the performance of multiple versions of an image classification model on AI Platform over time is to compare the loss
performance for each model on the validation data.

Option B is the best approach because comparing the loss performance of each model on the validation data is a common method to monitor
machine learning model performance over time. The validation data is a subset of the data that is not used for model training, but is used to
evaluate its performance during training and to compare different versions of the model. By comparing the loss performance of each model on the
same validation data, you can determine which version of the model has better performance.
upvoted 4 times

  enghabeth 1 year, 2 months ago

Selected Answer: D

If you have multiple model versions in a single model and have created an evaluation job for each one, you can view a chart comparing the mean
average precision of the model versions over time
upvoted 1 times

  guilhermebutzke 1 year, 2 months ago


Guys, I'm not sure about answer D, and maybe you could help me with my arguments.

I think choosing the loss to compare model performance is better than looking at metrics. For example, we can build an image classification model
that has good precision metrics because the classes are unbalanced, while the loss could be terrible because of the kind of loss chosen and how it
penalizes classes.

So, losses are better than metrics for evaluating models, and the answer is A or B.

I thought A could be the answer because I see validation as part of the training process. So, if we want to test the model performance
over time, we have to use new data, which I suppose is the held-out data.
upvoted 3 times

  wish0035 1 year, 4 months ago


Selected Answer: D

ans: D
upvoted 1 times

  EFIGO 1 year, 5 months ago


Selected Answer: D

Since you want to monitor the performance of the model versions *over time*, use the Continuous Evaluation feature, so D
upvoted 1 times

  vakati 1 year, 5 months ago


Answer : D
https://cloud.google.com/ai-platform/prediction/docs/continuous-evaluation#how_it_works
upvoted 3 times

  GCP72 1 year, 8 months ago

Selected Answer: D

Correct answer is "D"


upvoted 1 times


Question #22 Topic 1

You trained a text classification model. You have the following SignatureDefs:

You started a TensorFlow Serving model server and tried to send an HTTP request to get a prediction using:

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/text_model:predict', data=data, headers=headers)

What is the correct way to write the predict request?

A. data = json.dumps({"signature_name": "seving_default", "instances": [['ab', 'bc', 'cd']]})

B. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c', 'd', 'e', 'f']]})

C. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c'], ['d', 'e', 'f']]})

D. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b'], ['c', 'd'], ['e', 'f']]})

Correct Answer: C

Community vote distribution


D (100%)

  [Removed] Highly Voted  2 years, 11 months ago

Options:

A. data = json.dumps({“signature_name”: “seving_default”, “instances” [[‘ab’, ‘bc’, ‘cd’]]})


B. data = json.dumps({“signature_name”: “serving_default”, “instances” [[‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’]]})
C. data = json.dumps({“signature_name”: “serving_default”, “instances” [[‘a’, ‘b’, ‘c’], [‘d’, ‘e’, ‘f’]]})
D. data = json.dumps({“signature_name”: “serving_default”, “instances” [[‘a’, ‘b’], [‘c’, ‘d’], [‘e’, ‘f’]]})
upvoted 27 times

  maartenalexander Highly Voted  2 years, 10 months ago

Most likely D. A negative number in the shape enables auto-expansion (https://stackoverflow.com/questions/37956197/what-is-the-negative-index-in-shape-arrays-used-for-tensorflow).

The first number, -1, in the shape (-1, 2) specifies the number of 1-dimensional arrays within the tensor (and it can auto-expand), while the
second number (2) fixes the number of elements in each inner array at 2. Hence D.
upvoted 19 times
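The shape constraint behind this can be sketched as a small validator; purely illustrative, and not TensorFlow Serving's actual validation code.

```python
import json

def conforms(instances, shape=(-1, 2)):
    """True if every instance row matches the trailing dimension of
    `shape`; -1 in the leading dimension means 'any number of rows'."""
    return all(len(row) == shape[1] for row in instances)

payload = {"signature_name": "serving_default",
           "instances": [["a", "b"], ["c", "d"], ["e", "f"]]}  # option D
print(conforms(payload["instances"]))                # True
print(conforms([["a", "b", "c"], ["d", "e", "f"]]))  # False (option C)

data = json.dumps(payload)  # body for requests.post(..., data=data)
```

Only option D's instances are rows of length 2, which is what a signature shaped (-1, 2) accepts.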

  M25 Most Recent  11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 2 times

  wish0035 1 year, 4 months ago

Selected Answer: D

ans: D
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: D

Having "shape=[-1,2]", the input can have as many rows as we want, but each row needs to be of 2 elements. The only option satisfying this
requirement is D.
upvoted 1 times

  GCP72 1 year, 8 months ago


Selected Answer: D

Correct answer is "D"


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


Selected Answer: D

I'll vote for D, as the data shape in the instances matches the shape in the SignatureDef.
upvoted 1 times

  pml2021 2 years, 1 month ago


Selected Answer: D

The shape is (-1, 2), indicating any number of rows but only 2 columns.


upvoted 2 times

  mousseUwU 2 years, 6 months ago


D is correct, since shape (-1, 2) means 2 columns for each row.
upvoted 3 times

  mousseUwU 2 years, 6 months ago


Link to explanation: https://stackoverflow.com/questions/37956197/what-is-the-negative-index-in-shape-arrays-used-for-tensorflow
upvoted 1 times

  Danny2021 2 years, 7 months ago


D: (-1, 2) represents a vector with any number of rows but only 2 columns.
upvoted 5 times

  inder0007 2 years, 10 months ago


Correct answer is D, the shapes otherwise don't matter
upvoted 4 times


Question #23 Topic 1

Your organization's call center has asked you to develop a model that analyzes customer sentiments in each call. The call center receives over one

million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region in which the call originated, and no Personally

Identifiable Information (PII) can be stored or analyzed. The data science team has a third-party tool for visualization and access which requires a

SQL ANSI-2011 compliant interface. You need to select components for data processing and for analytics. How should the data pipeline be

designed?

A. 1= Dataflow, 2= BigQuery

B. 1 = Pub/Sub, 2= Datastore

C. 1 = Dataflow, 2 = Cloud SQL

D. 1 = Cloud Function, 2= Cloud SQL

Correct Answer: B

Community vote distribution


A (100%)

  inder0007 Highly Voted  2 years, 10 months ago

The correct answer is A


upvoted 18 times

  GogoG 2 years, 7 months ago


Evidence here https://github.com/GoogleCloudPlatform/dataflow-contact-center-speech-analysis
upvoted 7 times
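Since the third-party tool needs a SQL ANSI-2011 compliant interface, BigQuery (the analytics component in option A) can serve it directly. A hypothetical query over the processed results, where the project, dataset, table, and column names are all assumptions:

```sql
-- Hypothetical: daily average sentiment per call-origin region.
SELECT
  region,
  DATE(call_timestamp) AS call_date,
  AVG(sentiment_score) AS avg_sentiment,
  COUNT(*) AS num_calls
FROM `my_project.call_center.call_sentiments`
GROUP BY region, call_date
ORDER BY call_date DESC;
```

Dataflow would populate such a table from the Cloud Storage recordings (redacting PII along the way), and the visualization tool would query it over BigQuery's SQL interface.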

  salsabilsf Highly Voted  2 years, 10 months ago

Should be A
upvoted 7 times

  Sum_Sum Most Recent  5 months, 2 weeks ago

Selected Answer: A

A - because it has BigQuery.


Almost never would you see an answer that prefers CloudSQL over BQ
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 2 times

  MithunDesai 1 year, 4 months ago

Selected Answer: A

correct answer is A
upvoted 1 times

  Moulichintakunta 1 year, 4 months ago


Selected Answer: A

we need a dataflow to process data from cloud storage and data is unstructured and if we want to perform analysis on unstructured with SQL
interface BIgQuery is the only option
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: A


You need to do analytics, so the answer needs to contain BigQuery and only option A does.
Moreover, BigQuery is fine with SQL and Dataflow is the right tool for the processing pipline.
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: A

Correct answer is "A"


upvoted 1 times

  SUNWS7 1 year, 11 months ago


D - to call API you need Cloud Functions. Dataflow would be for ETL
upvoted 2 times

  SUNWS7 1 year, 11 months ago


Sorry incorrect - Dataflow can call external API so stand corrected . Answer : A
upvoted 2 times

  SUNWS7 2 years, 1 month ago

Selected Answer: A

Dataflow & BigQuery


upvoted 2 times

  skipper_com 2 years, 5 months ago


A, https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build Fig.6
upvoted 1 times

  mousseUwU 2 years, 6 months ago


A is correct
Dataflow - Unified stream and batch data processing that's serverless, fast, and cost-effective
BigQuery - Good for analytics and dashboards
upvoted 3 times

  pddddd 2 years, 7 months ago


BQ is SQL ANSI-2011 compliant
upvoted 1 times

  Danny2021 2 years, 7 months ago


A or C. Not sure how many third-party tools support BigQuery. If not, then the answer is C.
upvoted 2 times

  David_ml 1 year, 11 months ago


wrong. cloud sql is not for analytics.
upvoted 1 times

  Jijiji 2 years, 8 months ago


it's def A
upvoted 3 times


Question #24 Topic 1

You are an ML engineer at a global shoe store. You manage the ML models for the company's website. You are asked to build a model that will recommend new products to the user based on their purchase behavior and similarity with other users. What should you do?

A. Build a classification model

B. Build a knowledge-based filtering model

C. Build a collaborative-based filtering model

D. Build a regression model using the features as predictors

Correct Answer: C

Reference:

https://cloud.google.com/solutions/recommendations-using-machine-learning-on-compute-engine

Community vote distribution


C (100%)

  maartenalexander Highly Voted  2 years, 10 months ago

C. Collaborative filtering is about user similarity and product recommendations. Other models won't work
upvoted 19 times
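To make the distinction concrete, here is a toy user-based collaborative filter: it scores products the target user has not bought by the purchase vectors of similar users. This is an illustrative sketch only — the users, products, and purchase data are all made up, and production recommenders (as in the linked Google solution) use far richer techniques such as matrix factorization:

```python
def cosine(u, v):
    # Cosine similarity between two purchase vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

def recommend(target, other_users):
    # Score each product the target has not bought by summing the
    # similarity of every user who did buy it; return the best index.
    scores = {}
    for other in other_users:
        sim = cosine(target, other)
        for idx, bought in enumerate(other):
            if bought and not target[idx]:
                scores[idx] = scores.get(idx, 0.0) + sim
    return max(scores, key=scores.get) if scores else None

# Rows are users, columns are products (1 = purchased); hypothetical data.
alice = [1, 1, 0, 0]
others = [[1, 1, 1, 0],   # similar to alice, also bought product 2
          [0, 0, 0, 1]]   # dissimilar to alice, bought product 3
print(recommend(alice, others))  # 2
```

The similar user's purchases dominate the score, which is exactly the "similarity with other users" signal the question describes — hence collaborative filtering.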

  Sum_Sum Most Recent  5 months, 2 weeks ago

Selected Answer: C

ChatGPT:
Collaborative filtering models are specifically designed for recommendation systems. They work by analyzing the interactions and behaviors of
users and items, then making predictions about what users will like based on similarities with other users. In this case, since you're looking at
purchase behavior and user similarities, a collaborative filtering approach is well-suited to identify and recommend products that users with similar
behaviors have liked or purchased.

Classification models (Option A) and regression models (Option D) are generally used for different types of predictive modeling tasks, not
specifically for recommendations. A knowledge-based filtering model (Option B), while useful in recommendation systems, relies more on explicit
knowledge about users and items, rather than on user interaction patterns and similarities, which seems to be the focus in this scenario.
upvoted 1 times

  10SR 8 months, 1 week ago


C. Collaborative filtering is apt amongst the answers
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 2 times

  wish0035 1 year, 4 months ago

Selected Answer: C

ans: C
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: C

C
https://cloud.google.com/blog/topics/developers-practitioners/looking-build-recommendation-system-google-cloud-leverage-following-guidelines-identify-right-solution-you-part-i
upvoted 1 times

  EFIGO 1 year, 5 months ago

Selected Answer: C

This is a textbook application of collaborative filtering, C is the correct answer


upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: C

Correct answer is "C"


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


Selected Answer: C

https://developers.google.com/machine-learning/recommendation/collaborative/basics
upvoted 1 times

  giaZ 2 years, 1 month ago


Selected Answer: C

Definitely C
upvoted 2 times

  caohieu04 2 years, 2 months ago


Selected Answer: C

Community vote
upvoted 2 times

  xiaoF 2 years, 3 months ago


should be C
upvoted 2 times

  mousseUwU 2 years, 6 months ago


C - https://cloud.google.com/architecture/recommendations-using-machine-learning-on-compute-engine#filtering_the_data
upvoted 4 times


Question #25 Topic 1

You work for a social media company. You need to detect whether posted images contain cars. Each training example is a member of exactly one class. You have trained an object detection neural network and deployed the model version to AI Platform Prediction for evaluation. Before deployment, you created an evaluation job and attached it to the AI Platform Prediction model version. You notice that the precision is lower than your business requirements allow. How should you adjust the model's final layer softmax threshold to increase precision?

A. Increase the recall.

B. Decrease the recall.

C. Increase the number of false positives.

D. Decrease the number of false negatives.

Correct Answer: D

Community vote distribution


B (93%) 7%

  Paul_Dirac Highly Voted  2 years, 10 months ago

Decreasing FN increases recall (D). So D and A are the same.


Increasing FP decreases precision (C).

Answer: B ("improving precision typically reduces recall and vice versa", https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall)
upvoted 26 times

  Swagluke 2 years, 8 months ago


I do believe B is the right answer.
But D and A aren't exactly the same.
A. Increase recall can be either
1. keeping TP + FN the same but increase TP and decrease FN. Which isn't sure how that's gonna affect Precision since both TP and TP+FP
increase.
2. keeping TP the same but increase (TP + FN), which is increasing FN (Same as D), not sure how that will affect Precision as well.
upvoted 3 times

  Danny2021 Highly Voted  2 years, 7 months ago

Precision = TruePositives / (TruePositives + FalsePositives)


Recall = TruePositives / (TruePositives + FalseNegatives)
A. Increase recall -> will decrease precision
B. Decrease recall -> will increase precision
C. Increase the false positives -> will decrease precision
D. Decrease the false negatives -> will increase recall, reduce precision
The correct answer is B.
upvoted 19 times
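The formulas above can be checked numerically: raising the decision threshold trades recall for precision. A small sketch with made-up softmax scores and labels:

```python
def precision_recall(scores, labels, threshold):
    # Classify as positive when score >= threshold, then apply
    # precision = TP / (TP + FP) and recall = TP / (TP + FN).
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.2, 0.4, 0.6, 0.8, 0.9]   # hypothetical softmax outputs
labels = [0,   1,   0,   1,   1]
print(precision_recall(scores, labels, 0.3))  # (0.75, 1.0)
print(precision_recall(scores, labels, 0.7))  # precision rises to 1.0, recall drops to 2/3
```

Raising the threshold from 0.3 to 0.7 drops the false positive at 0.6 (precision up) while losing the true positive at 0.4 (recall down) — i.e., to increase precision, decrease recall.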

  SamuelTsch Most Recent  9 months, 3 weeks ago

Selected Answer: B

To increase precision, you have to decrese recall, increse true positives, increse false negatives and decrease false positives
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 3 times

  Fatiy 1 year, 2 months ago


Selected Answer: B

Option B is the best approach because decreasing the threshold will increase the precision by reducing the number of false positives.
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: B

A , C , D they are the same. So I go with B , it is threshold adjustment from 0.5 +-


upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


We want to increase precision, which is the same as decreasing recall; the two oppose each other.
https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall


upvoted 1 times

  wish0035 1 year, 4 months ago


ans: B.
A: would decrease precision even more.
C: will decrease precision
D: will increase recall (precision would be the same)
upvoted 1 times

  EFIGO 1 year, 5 months ago


Selected Answer: B

Precision and recall are negatively correlated: when one goes up the other goes down, and vice versa; to increase precision we need to decrease recall, therefore answer B.
(To be more complete, answer C and D are wrong because they both would increase recall, according to the recall formula)
upvoted 2 times

  GCP72 1 year, 8 months ago

Selected Answer: C

Correct answer is "C"


upvoted 1 times

  GCP72 1 year, 8 months ago


sorry correct ans is " B"
upvoted 1 times

  originalliang 1 year, 8 months ago


Answer is D
If the dataset does not change, TP + FN is constant.
FN goes down then TP goes up.
Hence Precision = TP / (TP + FP) goes up.
upvoted 2 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: B

precision and recall are inversely related, so to increase precision, reduce recall
upvoted 1 times

  morgan62 2 years ago

Selected Answer: B

It's B.
C,D is basically ruining your model.
upvoted 1 times

  sonxxx 2 years, 1 month ago


Answer: D
Precision answers the question: how many retrieved items are relevant? Given the relation between false negatives and true positives, optimal precision needs a high number of true positives. If your model's precision is lower than your business requirement, it is because the model has a high number of false negatives. Check it in: https://en.wikipedia.org/wiki/Precision_and_recall
upvoted 2 times

  xiaoF 2 years, 3 months ago

Selected Answer: B

definitely B
upvoted 1 times

  Sangy22 2 years, 3 months ago


I think this should be C. The reason is, for one to increase precision, the classification threshold for whether the car is there or not should be kept low. That way, even when the model is not very confident (say only 60% confident), it will say yes, the car is there. What this does is increase the times the model says a car is present, driving up precision (when it says a car is there, a car is really there). The consequence of this is, false positives will increase too, reducing recall.
So C is my choice.

Choices A and B are not really right, as precision and recall are after-effects, not something you will control ahead.
upvoted 1 times

  Bemnet 2 years, 4 months ago


Answer is B . 100% sure . The only way to affect precision and recall is by adjusting threshold. FN and FP go in opposite direction so C & D are the
same. A increasing recall decreases precision .
upvoted 3 times

  santy79 2 years, 5 months ago

Selected Answer: B

examtopics, It would be good if justification is attached with correct answer


Decrease recall -> will increase precision
upvoted 2 times


Question #26 Topic 1

You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members on your team prefer a codeless interface for building Extract, Transform, Load (ETL) processes. Which service should you use?

A. Dataflow

B. Dataprep

C. Apache Flink

D. Cloud Data Fusion

Correct Answer: D

Community vote distribution


D (93%) 7%

  [Removed] Highly Voted  2 years, 11 months ago

D. correct.
Reference: https://cloud.google.com/data-fusion
upvoted 14 times

  pinimichele01 Most Recent  2 weeks, 1 day ago

Selected Answer: D

codeless interface -> D


upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: D

D is correct
upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: D

I think D is correct.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  FDS1993 1 year, 1 month ago

Selected Answer: B

Answer is B
upvoted 1 times

  Fatiy 1 year, 2 months ago


Selected Answer: D

Cloud Data Fusion is a fully managed, cloud-native data integration service provided by Google Cloud Platform. It is designed to simplify the
process of building and managing ETL pipelines across a variety of data sources and targets.
upvoted 2 times

  EFIGO 1 year, 5 months ago

Selected Answer: D

"codeless interface" ==> Data Fusion


upvoted 3 times

  GCP72 1 year, 8 months ago

Selected Answer: D

Correct answer is "D"


upvoted 1 times

  capt2101akash 1 year, 9 months ago

Selected Answer: D


D is correct as it is codeless
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


Selected Answer: D

https://cloud.google.com/data-fusion/docs/concepts/overview#using_the_code-free_web_ui
upvoted 1 times

  morgan62 2 years ago


Selected Answer: D

D without any doubt


upvoted 2 times

  xiaoF 2 years, 3 months ago


D.
Data Fusion is designed more for data ingestion from one source to another, with few transformations. Dataprep is designed more for data preparation (as its name suggests): data cleaning, new column creation, splitting columns. Dataprep also provides insights into the data to help you with your recipes.
upvoted 3 times

  majejim435 2 years, 6 months ago


D. Dataprep would also work but Data Fusion is better suited.
(See https://stackoverflow.com/questions/58175386/can-google-data-fusion-make-the-same-data-cleaning-than-dataprep)
upvoted 2 times

  mousseUwU 2 years, 6 months ago


D is correct

Visual point-and-click interface enabling code-free deployment of ETL/ELT data pipelines and Operate high-volumes of data pipelines periodically

source: https://cloud.google.com/data-fusion#all-features
upvoted 4 times

  raintree 2 years, 7 months ago


B. Dataprep makes use of Apache beam, which can process streaming and batch, and thus prevent training-serving skew.
upvoted 2 times


Question #27 Topic 1

You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?

A. Redaction, reproducibility, and explainability

B. Traceability, reproducibility, and explainability

C. Federated learning, reproducibility, and explainability

D. Differential privacy, federated learning, and explainability

Correct Answer: A

Community vote distribution


B (70%) D (30%)

  gcp2021go Highly Voted  2 years, 9 months ago

I think the answer should be B. Reviewing the OECD document on the impact of AI on insurance, the document mentions explainability and traceability.
However, open for discussion. https://www.oecd.org/finance/Impact-Big-Data-AI-in-the-Insurance-Sector.pdf
upvoted 32 times

  salsabilsf Highly Voted  2 years, 10 months ago


Should be B
upvoted 13 times

  DucLee3110 2 years, 10 months ago


I think it should be A, as it is regulated, so need to have PII
upvoted 2 times

  gscharly Most Recent  1 week, 3 days ago

Selected Answer: B

went with B
upvoted 2 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: B

B. Traceability, reproducibility, and explainability.

Traceability: This involves maintaining records of the data, decisions, and processes used in the model. This is crucial in regulated industries for
audit purposes and to ensure compliance with regulatory standards. It helps in understanding how the model was developed and how it makes
decisions.

Reproducibility: Ensuring that the results of the model can be reproduced using the same data and methods is vital for validating the model's
reliability and for future development or debugging.

Explainability: Given the significant impact of the model’s decisions on individuals' lives, it's crucial that the model's decisions can be explained in
understandable terms. This is not just a best practice in AI ethics; in many jurisdictions, it's a legal requirement under regulations that mandate
transparency in automated decision-making.
upvoted 2 times

  tavva_prudhvi 10 months ago


Selected Answer: B

B. Traceability, reproducibility, and explainability are the most important factors to consider before building an insurance approval model.
Traceability ensures that the data used in the model is reliable and can be traced back to its source.
Reproducibility ensures that the model can be replicated and tested to ensure its accuracy and fairness.
Explainability ensures that the model's decisions can be explained to customers and regulators in a transparent manner. These factors are crucial
for building a trustworthy and compliant model for an insurance company.
Redaction is also important for protecting sensitive customer information, but it is not as critical as the other factors listed. Federated learning and
differential privacy are techniques used to protect data privacy, but they are not necessarily required for building an insurance approval model.
upvoted 4 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  shankalman717 1 year, 2 months ago

Selected Answer: B


B. Traceability, reproducibility, and explainability

When developing an insurance approval model, it's crucial to consider several factors to ensure that the model is fair, accurate, and compliant with
regulations. The factors to consider include:

Traceability: It's important to be able to trace the data used to build the model and the decisions made by the model. This is important for
transparency and accountability.

Reproducibility: The model should be built in a way that allows for its reproducibility. This means that other researchers should be able to
reproduce the same results using the same data and methods.

Explainability: The model should be able to provide clear and understandable explanations for its decisions. This is important for building trust with
customers and ensuring compliance with regulations.

Other factors that may also be important to consider, depending on the specific context of the insurance company and its customers, include data
privacy and security, fairness, and bias mitigation.
upvoted 4 times


  ares81 1 year, 3 months ago

Selected Answer: D

Checking Google documents, it seems D.


upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


Please mention the links
upvoted 1 times

  wish0035 1 year, 4 months ago


Selected Answer: B

ans: B
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: B

Correct answer is "B"


upvoted 1 times

  capt2101akash 1 year, 9 months ago

Selected Answer: D

Should be D, as all of these techniques address problems related to insurance.


upvoted 3 times

  suresh_vn 1 year, 9 months ago


B should be True
upvoted 1 times

  rgrand8 1 year, 12 months ago

Selected Answer: B

Traceability is a key factor due to GDPR laws


upvoted 3 times

  baimus 2 years, 1 month ago


Just to add weight to the correct side: this is B.
upvoted 2 times

  lordcenzin 2 years, 2 months ago


Selected Answer: B


I don't understand why you are thinking of the privacy issue. Here it is not mentioned nor relevant, imo. Moreover, traceability is key. For me, B.
upvoted 3 times

  sid515 2 years, 3 months ago


It should be D. Differential privacy - so that PII data is masked, Federated learning - Federated Learning enables mobile phones to collaboratively
learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store
the data in the cloud
upvoted 2 times


Question #28 Topic 1

You are training a Resnet model on AI Platform using TPUs to visually categorize types of defects in automobile engines. You capture the training profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to reduce the bottleneck and speed up your model training process. Which modifications should you make to the tf.data dataset? (Choose two.)

A. Use the interleave option for reading data.

B. Reduce the value of the repeat parameter.

C. Increase the buffer size for the shuffle option.

D. Set the prefetch option equal to the training batch size.

E. Decrease the batch size argument in your transformation.

Correct Answer: AE

Community vote distribution


AD (90%) 10%

  ralf_cc Highly Voted  2 years, 9 months ago

AD - please weigh in guys


upvoted 37 times

  danielp14021990 Highly Voted  2 years, 5 months ago

A. Use the interleave option for reading data. - Yes, that helps to parallelize data reading.
B. Reduce the value of the repeat parameter. - No, this is only to repeat rows of the dataset.
C. Increase the buffer size for the shuffle option. - No, a larger shuffle buffer only improves randomization, not input throughput.
D. Set the prefetch option equal to the training batch size. - Yes, this will pre-load the data.
E. Decrease the batch size argument in your transformation. - No, could be even slower due to more I/Os.

https://www.tensorflow.org/guide/data_performance
upvoted 24 times
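As a rough illustration of option A, here is a toy, single-threaded model of what interleave does to read order. The real `tf.data.Dataset.interleave` additionally overlaps the reads in parallel, which is what relieves an input-bound pipeline; the parameter names mirror tf.data, but the function itself is a deliberate simplification:

```python
import itertools

def interleave(sources, cycle_length=2, block_length=1):
    # Round-robin over cycle_length sources, taking block_length
    # elements from each in turn, like tf.data.Dataset.interleave.
    iters = [iter(s) for s in sources[:cycle_length]]
    out = []
    while iters:
        for it in list(iters):
            block = list(itertools.islice(it, block_length))
            if not block:
                iters.remove(it)
            out.extend(block)
    return out

# Two "files" whose records end up interleaved rather than read serially.
print(interleave([["a1", "a2"], ["b1", "b2", "b3"]]))
# ['a1', 'b1', 'a2', 'b2', 'b3']
```

With parallel reads (e.g. `num_parallel_calls=tf.data.AUTOTUNE` in real tf.data), slow files no longer serialize the whole input stage; `prefetch` (option D) then overlaps that input stage with the accelerator step.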

  harithacML Most Recent  9 months, 3 weeks ago

Selected Answer: AD

A and D : https://www.tensorflow.org/guide/data_performance , interleave and prefetch


upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: AD

Went with A & D


upvoted 2 times

  MithunDesai 1 year, 4 months ago

Selected Answer: AD

yes AD
upvoted 1 times

  OJ42 1 year, 8 months ago

Selected Answer: AD

Yes AD
upvoted 1 times

  GCP72 1 year, 8 months ago

Selected Answer: AD

YES.....AD - agree with danielp1


upvoted 1 times

  u_phoria 1 year, 9 months ago


Selected Answer: AD

AD - agree with danielp1

By the way, this is handy to understand the significance of shuffle buffer_size: https://stackoverflow.com/a/48096625/1933315
upvoted 2 times

  onku 1 year, 9 months ago


Selected Answer: DE

I think D & E are correct.


upvoted 1 times

  Xrobat 1 year, 10 months ago


AD should be the right answer.
upvoted 3 times

  eddy1234567890 1 year, 10 months ago


Answers?
upvoted 1 times

  93alejandrosanchez 2 years, 6 months ago


For me it should be D and E as well. Prefetching will help reading data while training is performed, which helps with the bottleneck, D is for sure
right. I think decreasing batch size would help too, because less records will be read in each training step (reading a lot of records would lead to
the bottleneck described, as reading data is costly).

I'm not 100% sure on A, personally I don't think processing many input files concurrently would help in this case because the reading operation is
precisely the problem. However, I'm no expert in this topic so I might be wrong.
upvoted 2 times

  klemiec 2 years, 2 months ago


D is not the correct answer. Instead of decreasing the batch size, increasing it may help. (https://cloud.google.com/tpu/docs/performance-guide - "TPU model performance" section)
upvoted 1 times

  gcp2021go 2 years, 9 months ago


I think it should be DE. I found this article https://towardsdatascience.com/overcoming-data-preprocessing-bottlenecks-with-tensorflow-data-service-nvidia-dali-and-other-d6321917f851
upvoted 3 times


Question #29 Topic 1

You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you use?

A. Validate the accuracy of the model that you trained on preprocessed data. Create a new model that uses the raw data and is available in real time. Deploy the new model onto AI Platform for online prediction.

B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.

C. Stream incoming prediction request data into Cloud Spanner. Create a view to abstract your preprocessing logic. Query the view every second for new records. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.

D. Send incoming prediction requests to a Pub/Sub topic. Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic. Implement your preprocessing logic in the Cloud Function. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.

Correct Answer: D

Reference:

https://cloud.google.com/pubsub/docs/publisher

Community vote distribution


B (76%) D (24%)

  SparkExpedition Highly Voted  2 years, 9 months ago

Supporting B: https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1#where_to_do_preprocessing
upvoted 28 times
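Whichever transport wins the vote (a Dataflow job in option B, a Cloud Function in option D), the point the linked article stresses is that the serving path must reuse exactly the training-time transformation, so the two paths cannot drift apart. A minimal sketch of that idea — the statistics and values below are made up for illustration:

```python
# Statistics computed once during training (illustrative values).
TRAIN_MEAN, TRAIN_STD = 10.0, 2.0

def preprocess(raw_value):
    # The single source of truth for the transformation: called from the
    # training pipeline and again from the serving path (e.g. inside the
    # Dataflow job of option B) before the AI Platform prediction request.
    return (raw_value - TRAIN_MEAN) / TRAIN_STD

training_features = [preprocess(v) for v in [8.0, 12.0]]
serving_feature = preprocess(14.0)
print(training_features, serving_feature)  # [-1.0, 1.0] 2.0
```

Dataflow is attractive here precisely because the same Beam transform can run in batch for training data and in streaming for prediction requests, keeping this function identical in both places.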

  inder0007 Highly Voted  2 years, 10 months ago

I think it should be B
upvoted 13 times

  q4exam 2 years, 7 months ago


I also agree with B, this is how I would advise clients to do it as well
upvoted 3 times

  Liting Most Recent  9 months, 3 weeks ago

Selected Answer: B

Went with B; using Dataflow for large-scale data transformation is the best option
upvoted 2 times

  SamuelTsch 9 months, 3 weeks ago


Selected Answer: B

I went with B.
A is completely wrong. C: first, Cloud Spanner is not designed for high throughput, and it is not for preprocessing either. D: a Cloud Function could not get enough resources to do the computationally expensive transformation.
upvoted 2 times

  ashu381 10 months, 3 weeks ago


Selected Answer: B

Because the concern here is high throughput and not specifically latency, it's better to go with option B
upvoted 1 times

  Voyager2 11 months ago


Selected Answer: D

B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI
Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue
https://dataintegration.info/building-streaming-data-pipelines-on-google-cloud
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times


  e707 1 year ago

Selected Answer: D

I think it's D as B is not a good choice because it requires you to run a Dataflow job for each prediction request. This is inefficient and can lead to
latency issues.
upvoted 2 times

  lucaluca1982 1 year ago


Yes, I agree, Dataflow can introduce latency
upvoted 1 times

  lucaluca1982 1 year ago


Selected Answer: D

I go for D. Option B uses Dataflow, which is more suitable for batch


upvoted 1 times

  SergioRubiano 1 year, 1 month ago

Selected Answer: B

It's B
upvoted 1 times

  MithunDesai 1 year, 4 months ago

Selected Answer: B

yes ans B
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: B

B
Pubsub + DataFlow + Vertex AI (AI Platform)
upvoted 1 times

  suresh_vn 1 year, 8 months ago


Selected Answer: B

Should be B. Dataflow is the best option for preprocessing both training and testing data
upvoted 1 times

  sachinxshrivastav 1 year, 8 months ago

Selected Answer: B

Answer should be B
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: B

- Using option elimination: A is totally wrong, and D is also not valid, as Cloud Functions is not suitable for heavy data workflows.
- Between B and D, I will vote for B, as Dataflow is the best solution for heavy data workflows.
upvoted 2 times

  gcp2021go 2 years, 6 months ago


Why not D?
upvoted 1 times

  kaike_reis 2 years, 5 months ago


Because, most of the time where you need to execute a full transformation pipeline and you have a comparison between dataflow and cloud
function it's recommended to go with dataflow. It's a solution more prepared to solve those cases.
upvoted 2 times

  fdmenendez 2 years, 3 months ago


I understand that, but regarding "for high-throughput online prediction": is Dataflow more suitable for online?
upvoted 1 times

  Grkrish2002 2 years, 4 months ago


"Computationally expensive" is the keyword. Cloud Functions will not be suitable for this kind of preprocessing workload
upvoted 5 times


Question #30 Topic 1

Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a

change in the distribution of the input data. How should you address the input differences in production?

A. Create alerts to monitor for skew, and retrain the model.

B. Perform feature selection on the model, and retrain the model with fewer features.

C. Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service.

D. Perform feature selection on the model, and retrain the model on a monthly basis with fewer features.

Correct Answer: C

Community vote distribution


A (100%)

  celia20200410 Highly Voted  2 years, 9 months ago

Data values skews: These skews are significant changes in the


statistical properties of data, which means that data patterns are
changing, and you need to trigger a retraining of the model to capture
these changes.
https://developers.google.com/machine-learning/guides/rules-of-ml/#rule_37_measure_trainingserving_skew
upvoted 31 times
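The skew alert described in option A can be approximated by comparing summary statistics of the training data against incoming serving data. Below is a minimal Python sketch of the idea; the threshold, feature values, and function name are illustrative assumptions, not part of any Google monitoring API:

```python
def detect_skew(train_values, serving_values, z_threshold=3.0):
    """Flag a feature as skewed when the serving mean drifts more than
    z_threshold training standard deviations away from the training mean."""
    n = len(train_values)
    train_mean = sum(train_values) / n
    train_std = (sum((x - train_mean) ** 2 for x in train_values) / n) ** 0.5
    serving_mean = sum(serving_values) / len(serving_values)
    drift = abs(serving_mean - train_mean) / train_std
    return drift > z_threshold  # True -> raise an alert and trigger retraining

train = [10.0, 11.0, 9.0, 10.5, 9.5]   # distribution seen at training time
stable = [10.2, 9.8, 10.1]             # serving data, same distribution
shifted = [25.0, 26.0, 24.5]           # serving data after the drift

print(detect_skew(train, stable))   # False: no alert
print(detect_skew(train, shifted))  # True: distribution changed, retrain
```

Vertex AI Model Monitoring automates this kind of check; the sketch only makes the underlying comparison concrete.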

  mousseUwU 2 years, 6 months ago


I agree, A is correct
upvoted 2 times

  oliveolil 2 years, 5 months ago


Rule #37:
The difference between the performance on the holdout data and the "next-day" data. Again, this will always exist. You should tune your
regularization to maximize the next-day performance. However, large drops in performance between holdout and next-day data may indicate
that some features are time-sensitive and possibly degrading model performance.

Maybe it should be C
upvoted 2 times

  Paul_Dirac Highly Voted  2 years, 10 months ago

A
Data drift doesn't necessarily require feature reselection (e.g. by L2 regularization).
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#challenges
upvoted 5 times

  tavva_prudhvi Most Recent  10 months ago

Selected Answer: A

When the distribution of input data changes, the model may not perform as well as it did during training. It is important to monitor the
performance of the model in production and identify any changes in the distribution of input data. By creating alerts to monitor for skew, you can
detect when the input data distribution has changed and take action to retrain the model using more recent data that reflects the new distribution.
This will help ensure that the model continues to perform well in production.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 2 times

  SergioRubiano 1 year, 1 month ago

Selected Answer: A

A is correct
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


Its A, as the model itself is performing well, neither overfitting nor performing poorly suddenly, it's a gradual change so regularization on the
original model would not help. C is incorrect.
upvoted 1 times

  Fatiy 1 year, 2 months ago


Selected Answer: A

Creating alerts to monitor for skew in the input data can help to detect when the distribution of the data has changed and the model's
performance is affected. Once a skew is detected, retraining the model with the new data can improve its performance.
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: A

Skew & drift monitoring: Production data tends to constantly change in different dimensions (i.e. time and system wise). And this causes the
performance of the model to drop.
https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: A

A
You don't need to do feature selection again
upvoted 2 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: A

A very obvious , no need for explanation


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


Selected Answer: A

abviously A no tricks here , no too much thinking


upvoted 1 times

  ggorzki 2 years, 3 months ago


Selected Answer: A

A
as celia explained
upvoted 1 times

  kaike_reis 2 years, 5 months ago


Colleagues that said (C) keep attention for the question: They said the model was good, so for skewness is only necessary the (A) solution.
upvoted 1 times

  Danny2021 2 years, 7 months ago


A. It is well documented in Google model monitoring docs.
upvoted 2 times

  gcp2021go 2 years, 9 months ago


Should be C, as L2 regularization prevents overfitting and can potentially maintain model performance if the data distribution is a little skewed.
upvoted 2 times

  inder0007 2 years, 10 months ago


A model learns the distribution of the data; if it has done its job well, any change in the distribution will lead to underperformance, not by virtue of
poor model performance but by the very definition.
upvoted 2 times

  [Removed] 2 years, 11 months ago


C. "A problem is said to be ill-posed if small changes in the given information cause large changes in the solution. This instability with respect to
the data makes solutions unreliable because small measurement errors or uncertainties in parameters may be greatly magnified and lead to wildly
different responses. […] The idea behind regularization is to use supplementary information to restate an ill-posed problem in a stable form."

Reference: https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
upvoted 4 times

  sonxxx 2 years, 1 month ago


Right. It is a general problem in the model, so we need to find a general solution for the model. Answer A increases instability and the model's cost.
upvoted 1 times

  Y2Data 2 years, 7 months ago


The model itself is fine, neither overfitting nor performing poorly suddenly, it's a gradual change so regularization on the original model would
not help. C is incorrect.
upvoted 4 times


Question #31 Topic 1

You need to train a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine

on Compute

Engine. You use the following parameters:

✑ Optimizer: SGD
✑ Image shape = 224×224
✑ Batch size = 64
✑ Epochs = 10
✑ Verbose = 2
During training you encounter the following error: ResourceExhaustedError: Out Of Memory (OOM) when allocating tensor. What should you do?

A. Change the optimizer.

B. Reduce the batch size.

C. Change the learning rate.

D. Reduce the image shape.

Correct Answer: B

Reference:

https://github.com/tensorflow/tensorflow/issues/136

Community vote distribution


B (90%) 10%
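Back-of-the-envelope arithmetic shows why answer B works: input-batch memory scales linearly with batch size. A rough Python sketch of that arithmetic (it counts only one float32 input batch and ignores weights, activations, and gradients, which usually dominate GPU memory):

```python
def batch_input_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Memory for one float32 input batch: batch * H * W * C * 4 bytes."""
    return batch_size * height * width * channels * bytes_per_value

full = batch_input_bytes(64, 224, 224)   # parameters from the question
half = batch_input_bytes(32, 224, 224)   # answer B: halve the batch size

print(full / 2**20)   # ~36.75 MiB for one batch of 64 images
print(half / full)    # 0.5 -> memory scales linearly with batch size
```

Halving the batch size halves the input-tensor footprint (and the activation memory, which scales the same way) without touching image resolution or the optimizer.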

  maartenalexander Highly Voted  2 years, 10 months ago

B. I think you want to reduce batch size. Learning rate and optimizer shouldn't really impact memory utilisation. Decreasing image size (D) would
work, but might be costly in terms of final performance
upvoted 22 times

  guruguru Highly Voted  2 years, 9 months ago


B. https://stackoverflow.com/questions/59394947/how-to-fix-resourceexhaustederror-oom-when-allocating-
tensor/59395251#:~:text=OOM%20stands%20for%20%22out%20of,in%20your%20Dense%20%2C%20Conv2D%20layers
upvoted 9 times

  SamuelTsch Most Recent  9 months, 3 weeks ago

Selected Answer: B

no doubt went to B
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 2 times

  SergioRubiano 1 year, 1 month ago


Selected Answer: B

B is correct
upvoted 1 times

  Fatiy 1 year, 2 months ago


Selected Answer: B

By reducing the batch size, the amount of memory required for each iteration of the training process is reduced
upvoted 1 times

  Fatiy 1 year, 2 months ago


Selected Answer: A

Creating alerts to monitor for skew in the input data can help to detect when the distribution of the data has changed and the model's
performance is affected. Once a skew is detected, retraining the model with the new data can improve its performance.
upvoted 1 times

  Fatiy 1 year, 2 months ago


Sorry it's not the response for this question. it's the response for the previous question.
upvoted 1 times


  John_Pongthorn 1 year, 2 months ago

Selected Answer: B

Reduce the image shape != Reduce the image Size.


upvoted 1 times

  seifou 1 year, 5 months ago


The answer is B
Since you are using an SGD, you can use a batch size of 1
ref: https://stackoverflow.com/questions/63139072/batch-size-for-stochastic-gradient-descent-is-length-of-training-data-and-not-1
upvoted 2 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: B

To fix the memory overflow you need to reduce the batch size; reducing the input resolution is also valid,
but reducing image size can harm model performance, so the answer is B
upvoted 3 times

  alphard 2 years, 4 months ago


B is my option. But D doesn't seem wrong.

Reducing the batch size or reducing the image size can both reduce memory usage. But the former seems much easier.
upvoted 2 times

  kaike_reis 2 years, 5 months ago


B is correct.

Letter D could be used, as we reduce the image size, but this will directly impact the model's performance. Another point is that when doing this, if
you are using a model via Keras's `Functional API`, you need to change the definition of the input and also apply pre-processing on the image to
reduce its size. In other words: much more work than letter B.
upvoted 3 times

  mousseUwU 2 years, 6 months ago


B is correct, it uses less memory.

D works too but, depending on what you need, you will lose performance (just like maartenalexander said), so I think it is not recommended.
upvoted 3 times

  george_ognyanov 2 years, 6 months ago


Initially, I thought D, decreasing image size, would be the correct one, but now that I am reviewing the test I think maartenalexander is correct in
saying reduced image size might decrease final performance, so I'd go with B eventually.
upvoted 2 times


Question #32 Topic 1

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are

experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods

running on Google Kubernetes Engine

(GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

A. Significantly increase the max_batch_size TensorFlow Serving parameter.

B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.

C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter.

D. Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline

minimum CPU platform for serving nodes.

Correct Answer: D

Community vote distribution


D (43%) A (39%) C (17%)

  Y2Data Highly Voted  2 years, 7 months ago

D is correct, since this question focuses on serving performance. The servers are already throttling, so increasing the pressure on them won't help,
and both A and C essentially do that. B is a bit mysterious, but we definitely know that D would work.
upvoted 25 times

  mousseUwU 2 years, 6 months ago


I think it's D too
upvoted 2 times

  pinimichele01 Most Recent  2 weeks, 1 day ago

Selected Answer: D

increasing the max_batch_size TensorFlow Serving parameter, is not the best choice because increasing the batch size may not necessarily improve
latency. In fact, it may even lead to higher latency for individual requests, as they will have to wait for the batch to be filled before processing. This
may be useful when optimizing for throughput, but not for serving latency, which is the primary goal in this scenario.
upvoted 1 times

  pico 5 months, 2 weeks ago


Selected Answer: C

https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#batch-scheduling-parameters-and-tuning

A may help to some extent, but it primarily affects how many requests are processed in a single batch. It might not directly address latency issues.

D is a valid approach for optimizing TensorFlow Serving for CPU-specific optimizations, but it's a more involved process and might not be the
quickest way to address latency issues.
upvoted 4 times
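The latency/throughput tradeoff pico mentions can be illustrated with a toy queueing model. Everything here is an illustrative assumption (arrival rate, fixed batch processing time, half-batch average wait), not TensorFlow Serving internals:

```python
def per_request_latency_ms(batch_size, arrival_rate_per_ms, process_ms_per_batch):
    """Average latency = waiting for the batch to fill + batch processing time.
    A request waits on average (batch_size - 1) / 2 arrivals before dispatch."""
    fill_wait = (batch_size - 1) / (2 * arrival_rate_per_ms)
    return fill_wait + process_ms_per_batch

# "A few thousand QPS" ~ 3 requests per millisecond (assumption).
for b in (8, 64, 512):
    print(b, per_request_latency_ms(b, arrival_rate_per_ms=3, process_ms_per_batch=5))
```

The fill-time term grows with batch size, which is why cranking up max_batch_size (option A) helps throughput but can hurt per-request latency; a CPU-optimized build (option D) attacks the processing term instead.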

  ichbinnoah 5 months, 2 weeks ago

Selected Answer: A

I think A is correct, as D implies changes to the infrastructure (question says you must not do that).
upvoted 1 times

  edoo 1 month, 3 weeks ago


This is purely a software optimization and on how GKE handles requests. GKE should be able to choose different CPU types for nodes within the
same cluster, which doesn't represent a change in architecture.
upvoted 1 times

  tavva_prudhvi 8 months, 3 weeks ago


Selected Answer: D

increasing the max_batch_size TensorFlow Serving parameter, is not the best choice because increasing the batch size may not necessarily improve
latency. In fact, it may even lead to higher latency for individual requests, as they will have to wait for the batch to be filled before processing. This
may be useful when optimizing for throughput, but not for serving latency, which is the primary goal in this scenario.
upvoted 1 times

  harithacML 9 months, 3 weeks ago


Selected Answer: D

max_batch_size parameter controls the maximum number of requests that can be batched together by TensorFlow Serving. Increasing this
parameter can help reduce the number of round trips between the client and server, which can improve serving latency. However, increasing the
batch size too much can lead to higher memory usage and longer processing times for each batch.


upvoted 1 times

  Liting 9 months, 3 weeks ago

Selected Answer: D

Definitely D.
To improve the serving latency of an ML model on AI Platform, you can recompile TensorFlow Serving from source to support CPU-specific
optimizations and instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes; this way GKE will schedule the pods
on nodes with at least that CPU platform.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: D

Went with D
upvoted 1 times

  SergioRubiano 1 year, 1 month ago


Selected Answer: A

A is correct. max_batch_size TensorFlow Serving parameter


upvoted 2 times

  Yajnas_arpohc 1 year, 1 month ago


Selected Answer: A

CPU-only: One Approach


If your system is CPU-only (no GPU), then consider starting with the following values: num_batch_threads equal to the number of CPU cores;
max_batch_size to a really high value; batch_timeout_micros to 0. Then experiment with batch_timeout_micros values in the 1-10 millisecond (1000
10000 microsecond) range, while keeping in mind that 0 may be the optimal value.

https://github.com/tensorflow/serving/tree/master/tensorflow_serving/batching
upvoted 2 times

  frangm23 1 year ago


In that very link, what it says is that max_batch_size is the parameter that governs the latency/throughput tradeoff, and as I understand it, the
higher the batch size, the higher the throughput, but that doesn't assure that latency will be lower.
I would go with D
upvoted 3 times

  Omi_04040 1 year, 4 months ago


Answer: D
https://www.youtube.com/watch?v=fnZTVQ1SnDg
upvoted 1 times

  wish0035 1 year, 4 months ago

Selected Answer: D

ans: D
upvoted 1 times

  sachinxshrivastav 1 year, 8 months ago


Selected Answer: D

D is the right one


upvoted 1 times

  sachinxshrivastav 1 year, 8 months ago


Selected Answer: D

D is the correct one


upvoted 1 times

  felden 1 year, 9 months ago


Selected Answer: D

A would further increase latency. It may only help to improve the throughput if the memory and computation power of the GKE pods are not
saturated.
upvoted 1 times

  dunhill 1 year, 10 months ago

Selected Answer: D

aggree with Y2Data's viewpoint


upvoted 1 times

  u_phoria 1 year, 10 months ago


Selected Answer: A

Tricky one... I'd go for A, based on https://www.tensorflow.org/tfx/serving/performance#batch_size:


"Configuring the latter kind of batching allows you to hit TensorFlow Serving at extremely high QPS, while allowing it to sub-linearly scale the
compute resources needed to keep up."

In other words, trade a bit of latency to batch up requests (that latency being of the order of very few milliseconds or less, based on
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#batch-scheduling-parameters-and-tuning) in order
to gain sub-linear scalability when under heavy load. This assumes that latency is a consequence of being processing bound - which is implied but
not explicitly stated.

D would likely also work, but is more involved. By manually recompiling TF Serving, we are starting to move away from the goodness of a fully
managed solution...
upvoted 2 times

  u_phoria 1 year, 9 months ago


Coming back to this 3 weeks on - it's actually D.
A helps more with throughput. On the other hand, TF Serving is by default built with modest assumptions around CPU support for dense
computing features. And "building TensorFlow Serving from source is relatively easy using Docker" (per same link in original post -
https://www.tensorflow.org/tfx/serving/performance).
upvoted 4 times


Question #33 Topic 1

You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During

preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week.

You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?

A. Normalize the data using Google Kubernetes Engine.

B. Translate the normalization algorithm into SQL for use with BigQuery.

C. Use the normalizer_fn argument in TensorFlow's Feature Column API.

D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.

Correct Answer: B

Community vote distribution


B (82%) Other
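For context, Z-score normalization is the transformation that option B asks you to express in SQL. A minimal Python version of the math (in BigQuery standard SQL the equivalent is roughly `(x - AVG(x) OVER ()) / STDDEV_POP(x) OVER ()`):

```python
def z_score_normalize(values):
    """Z-score: subtract the mean, divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return [(x - mean) / std for x in values]

normalized = z_score_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(normalized)  # the transformed values have mean 0 and std 1
```

Because the transform is just two aggregate statistics plus arithmetic, it translates directly into a single SQL statement that BigQuery runs in place, with no extra service in the loop.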

  maartenalexander Highly Voted  2 years, 10 months ago

B, I think. BigQuery definitely minimizes computational time for normalization. I think it would also minimize manual intervention. For data
normalization in Dataflow you'd have to pass in the values of the mean and standard deviation as a side input. That seems like more work than a simple SQL
query
upvoted 20 times

  93alejandrosanchez 2 years, 6 months ago


I agree that B would definitely get the job done. But wouldn't D work as well and keep all the data pre-processing in Dataflow?
upvoted 2 times

  kaike_reis 2 years, 5 months ago


Dataflow uses Beam, unlike Dataproc, which uses Spark.

I think D would be wrong because we would add one more service to the pipeline for a simple transformation (subtract the mean and
divide by the std).
upvoted 3 times

  Sum_Sum Most Recent  5 months, 2 weeks ago

Selected Answer: B

Z-scores are very easy to compute in BQ; no need for more complex solutions


upvoted 1 times

  elenamatay 7 months, 2 weeks ago


B. All that maartenalexander said, + BigQuery already has a function for that: https://cloud.google.com/bigquery/docs/reference/standard-
sql/bigqueryml-syntax-standard-scaler , we could even schedule the query for calculating this automatically :)
upvoted 2 times

  aaggii 9 months, 2 weeks ago

Selected Answer: C

Every week, when new data is loaded, the mean and standard deviation are calculated for it and passed as parameters to calculate the z-score at serving time.
https://towardsdatascience.com/how-to-normalize-features-in-tensorflow-5b7b0e3a4177
upvoted 1 times

  tavva_prudhvi 9 months, 2 weeks ago


However, in the given scenario, you are using Dataflow for preprocessing and BigQuery for storing data.

To make the process more efficient by minimizing computation time and manual intervention, you should still opt for option B: Translate the
normalization algorithm into SQL for use with BigQuery. This way, you can perform the normalization directly in BigQuery, which will save time
and resources compared to using an external tool.
upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago


Selected Answer: B

A, D usually need additional configuration, which could cost much more time.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 2 times


  SergioRubiano 1 year, 1 month ago

Selected Answer: B

Best way is B
upvoted 2 times

  Fatiy 1 year, 2 months ago

Selected Answer: D

Option D is the best solution because Apache Spark provides a distributed computing platform that can handle large-scale data processing with
ease. By using the Dataproc connector for BigQuery, Spark can read data directly from BigQuery and perform the normalization process in a
distributed manner. This can significantly reduce computation time and manual intervention. Option A is not a good solution because Kubernetes
is a container orchestration platform that does not directly provide data normalization capabilities. Option B is not a good solution because Z-score
normalization is a data transformation technique that cannot be easily translated into SQL. Option C is not a good solution because the
normalizer_fn argument in TensorFlow's Feature Column API is only applicable for feature normalization during model training, not for data
preprocessing.
upvoted 2 times

  ares81 1 year, 3 months ago


Selected Answer: B

Best way to proceed is B.


upvoted 2 times

  Fatiy 1 year, 2 months ago


SQL is not as flexible as other programming languages like Python, which can limit the ability to customize the normalization process or
incorporate new features in the future.
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


Selected Answer: B

B is the most efficient, as you will not load --> process --> save; you will only write some SQL in BigQuery and voila :D
upvoted 4 times

  baimus 2 years, 1 month ago


It's B, bigquery can do this internally, no need for dataflow
upvoted 2 times

  Fatiy 1 year, 2 months ago


SQL is not as flexible as other programming languages like Python, which can limit the ability to customize the normalization process or
incorporate new features in the future.
upvoted 1 times

  xiaoF 2 years, 2 months ago

Selected Answer: B

I agree with B.
upvoted 2 times

  alashin 2 years, 9 months ago


B. I agree with B as well.
upvoted 3 times


Question #34 Topic 1

You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to

explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same

dashboard. What should you do?

A. Create multiple models using AutoML Tables.

B. Automate multiple training runs using Cloud Composer.

C. Run multiple training jobs on AI Platform with similar job names.

D. Create an experiment in Kubeflow Pipelines to organize multiple runs.

Correct Answer: C

Community vote distribution


D (83%) C (17%)
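Whichever tool you choose, an "experiment" is just a grouping of runs whose metrics can be compared side by side, which is what Kubeflow Pipelines experiments (option D) provide. A toy Python sketch of that bookkeeping, with made-up run names, parameters, and metrics:

```python
experiment = []  # one experiment = an ordered collection of runs

def log_run(name, params, metrics):
    """Record one training run so it can be compared against the others."""
    experiment.append({"name": name, "params": params, "metrics": metrics})

# Three runs with different model architectures (illustrative values).
log_run("dnn-2-layers", {"layers": 2}, {"auc": 0.81})
log_run("dnn-4-layers", {"layers": 4}, {"auc": 0.86})
log_run("dnn-4-layers-dropout", {"layers": 4, "dropout": 0.3}, {"auc": 0.88})

# Pick the run with the best evaluation metric, as a dashboard would.
best = max(experiment, key=lambda run: run["metrics"]["auc"])
print(best["name"])  # dnn-4-layers-dropout
```

Kubeflow Pipelines (and Vertex AI Experiments) do this tracking for you and render the comparison in a dashboard; the sketch only shows the shape of the data being tracked.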

  ralf_cc Highly Voted  2 years, 9 months ago

D - https://www.kubeflow.org/docs/about/use-cases/
upvoted 11 times

  salsabilsf Highly Voted  2 years, 10 months ago

Should be D
upvoted 6 times

  tikka0804 Most Recent  5 months, 1 week ago

I would vote for D but if C had said instead "different job names" .. would that have been a better option?
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: D

D - everything else is just nonsense


upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago


Selected Answer: D

D should be correct
upvoted 2 times

  Liting 9 months, 3 weeks ago

Selected Answer: D

C has similar job name, which make it wrong


So correct answer should be D
upvoted 1 times

  tavva_prudhvi 10 months ago


Selected Answer: D

The best approach is to create an experiment in Kubeflow Pipelines to organize multiple runs.

Option A is incorrect because AutoML Tables is a managed machine learning service that automates the process of building machine learning
models from tabular data. It does not provide the flexibility to customize the model architecture or explore multiple model architectures.

Option B is incorrect because Cloud Composer is a managed workflow orchestration service that can be used to automate machine learning
workflows. However, it does not provide the same level of flexibility or scalability as Kubeflow Pipelines.

Option C is incorrect because running multiple training jobs on AI Platform with similar job names will not allow you to easily organize and
compare the results.
upvoted 5 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  Fatiy 1 year, 2 months ago

Selected Answer: D

With Kubeflow Pipelines, you can create experiments that help you keep track of multiple training runs with different model architectures and
hyperparameters.

upvoted 1 times

  mymy9418 1 year, 4 months ago

Selected Answer: C

https://cloud.google.com/vertex-ai/docs/experiments/user-journey/uj-compare-models
upvoted 2 times

  suresh_vn 1 year, 8 months ago


D
Option C does not work, since CAIP has been updated to Vertex AI
upvoted 1 times

  Mohamed_Mossad 1 year, 9 months ago


Selected Answer: D

https://www.kubeflow.org/docs/components/pipelines/concepts/experiment/
https://www.kubeflow.org/docs/components/pipelines/concepts/run/
upvoted 1 times

  mmona19 2 years ago

Selected Answer: D

D - we need to use the experiments feature to compare models; having different job names is not going to help track experiments.
upvoted 3 times

  sid515 2 years, 3 months ago


C for me. It only talks about experimentation; that's where AI Platform fits better.
upvoted 2 times

  NamitSehgal 2 years, 3 months ago


Selected Answer: C

"Similar job names" is a bit of a confusion creator, as we cannot use the same job names for sure. D sounds better, but only in Vertex AI during the
experiment phase.
upvoted 1 times

  kfrd 2 years, 6 months ago


C anyone? D seems to me like overkill.
upvoted 4 times

  kaike_reis 2 years, 5 months ago


(C) presents the most specific solution for what the question asks for: experimenting with models with their due comparisons. All of this is
possible with the AI Platform. Furthermore, the question only speaks of experimentation. Kubeflow would be more powerful if there were a
necessity for an end-to-end pipeline.
upvoted 3 times

  Danny2021 2 years, 7 months ago


D. The new Vertex AI now supports experimentation with hyperparameter tuning.
upvoted 4 times

  tavva_prudhvi 9 months, 2 weeks ago


How can we track the progress of each run and compare the results in the vertex AI dashboard?
upvoted 1 times


Question #35 Topic 1

You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan

to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible. What should you

do?

A. Use the BigQuery console to execute your query, and then save the query results into a new BigQuery table.

B. Write a Python script that uses the BigQuery API to execute queries against BigQuery. Execute this script as the first step in your Kubeflow

pipeline.

C. Use the Kubeflow Pipelines domain-specific language to create a custom component that uses the Python BigQuery client library to execute

queries.

D. Locate the Kubeflow Pipelines repository on GitHub. Find the BigQuery Query Component, copy that component's URL, and use it to load the

component into your pipeline. Use the component to execute queries against BigQuery.

Correct Answer: A

Community vote distribution


D (77%) B (23%)

  maartenalexander Highly Voted  2 years, 10 months ago

D. Kubeflow pipelines have different types of components, ranging from low- to high-level. They have a ComponentStore that allows you to access
prebuilt functionality from GitHub.
upvoted 20 times

  gcp2021go 2 years, 9 months ago


agree, links: https://github.com/kubeflow/pipelines/blob/master/components/gcp/bigquery/query/sample.ipynb; https://v0-
5.kubeflow.org/docs/pipelines/reusable-components/
upvoted 6 times

  NamitSehgal Highly Voted  2 years, 3 months ago

Selected Answer: D

Not sure what the reason is behind putting A, as it is manual, and manual steps cannot be part of automation. I would say the answer is D, as it just
requires cloning the component from GitHub. Using a Python script to import a BigQuery component may sound good too, but the ask was what is easiest.
It depends on how the word "easy" is taken by individuals, but it is definitely not A.
upvoted 6 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: B

I'm going "against the flow" and choosing B. It just sounds like a much easier option than D.
upvoted 1 times

  friedi 10 months, 2 weeks ago

Selected Answer: B

Very confused as to why D is the correct answer. To me it seems a) much simpler to just write a couple of lines of python
(https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python) and b) the documentation for the BigQuery reusable
component (https://v0-5.kubeflow.org/docs/pipelines/reusable-components/) states that the data is written to Google Cloud Storage, which means
we have to write the fetching logic in the next pipeline step, going against the "as simple as possible" requirement. Would be interested to hear
why I am wrong.
upvoted 2 times

  friedi 10 months, 1 week ago


Actually, the problem statement even says that the query result has to be used as input to the next step, meaning with answer D) we would
have to download the results before passing them to the next step. Additionally, we would have to handle potentially existing files in Google
Cloud Storage if the pipeline is either executed multiple times or even in parallel. (I will die on this hill 😆 ).
upvoted 2 times

  tavva_prudhvi 5 months, 3 weeks ago


Yup, you raised valid points. Depending on your specific requirements and familiarity with Python, writing a custom script using the BigQuery
API (Option B) can be a simpler and more flexible approach.

With Option B, you can write a Python script that uses the BigQuery API to execute queries against BigQuery and fetch the data directly into
your pipeline. This way, you can process the data as needed and pass it to the next step in the pipeline without the need to fetch it from Google
Cloud Storage.

While using the reusable BigQuery Query Component (Option D) provides a pre-built solution, it does require additional steps to fetch the data
from Google Cloud Storage for the next step in the pipeline, which might not be the simplest approach.
upvoted 1 times
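The contract friedi describes can be sketched in a few lines of plain Python. Everything here is a stand-in (the function names, the bucket, and the `output_gcs_path` key are all invented for illustration); the point is only that the reusable component hands a Cloud Storage path downstream, not the query rows themselves:

```python
# Stand-in for the reusable BigQuery component: it stages results in Cloud
# Storage and emits only the path (all names here are hypothetical).
def bigquery_query_step(query):
    return {"output_gcs_path": "gs://staging-bucket/query-results.csv"}

# The next pipeline step receives a pointer, so it still has to fetch the data.
def downstream_step(inputs):
    path = inputs["output_gcs_path"]
    assert path.startswith("gs://")  # only a reference arrives, not the rows
    return f"fetch and parse {path}"

out = downstream_step(bigquery_query_step("SELECT 1"))
assert out == "fetch and parse gs://staging-bucket/query-results.csv"
```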

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 81/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  Mohamed_Mossad 1 year, 9 months ago

Selected Answer: D

https://linuxtut.com/en/f4771efee37658c083cc/
upvoted 1 times

  Mohamed_Mossad 1 year, 9 months ago


The answer is between C and D, but the above link has an article which uses a ready .yml file for the BigQuery component from the official kubeflow/pipelines repo.
upvoted 1 times

  David_ml 1 year, 11 months ago


Selected Answer: D

Answer is D.
upvoted 2 times

  donchoripan 2 years, 1 month ago


A. It says the easiest way possible, so it sounds like just running the query in the console should be enough. It doesn't say that the data will need to
be uploaded again anytime soon, so we can assume that it's just a one-time query to be run.
upvoted 1 times

  David_ml 1 year, 11 months ago


A is wrong. The answer is D. It's a pipeline, which means you will run it multiple times. Do you always want to run the query manually each time
you run your pipeline?
upvoted 3 times

  xiaoF 2 years, 2 months ago


D is good.
upvoted 2 times

  aepos 2 years, 5 months ago


The result of D is just the path to the Cloud Storage location where the result is stored, not the data itself. So the input to the next step is this path, and
you still have to load the data? So I would guess B. Can anyone explain if I am wrong?
upvoted 2 times

  kaike_reis 2 years, 5 months ago


D. The easiest way possible in the developer's world: copy code from Stack Overflow or GitHub hahaha. Jokes apart, I think D is correct. (A) is
manual, so you have to do it every time. (B) could be, but it is not the easiest one because you need to write a script for this. (C) uses Kubeflow's internal
solution, but you need to do extra work to create a custom component. (D) is the (C) solution, but easier, using a component created previously to do the
job.
upvoted 2 times

  celia20200410 2 years, 9 months ago


ans: c
https://medium.com/google-cloud/using-bigquery-and-bigquery-ml-from-kubeflow-pipelines-991a2fa4bea8
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#kubeflow-piplines-components
In Kubeflow Pipelines, a containerized task can invoke other services such as BigQuery jobs, AI Platform (distributed) training jobs, and Dataflow jobs.
upvoted 1 times

  raviperi 2 years, 7 months ago


why create a custom component when a big query's reusable component is already present. Answer is D.
upvoted 6 times

  chohan 2 years, 10 months ago


Should be B
upvoted 4 times


Question #36 Topic 1

You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets.

Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to

production, the model's accuracy dropped to 66%. How can you make your production model more accurate?

A. Normalize the data for the training, and test datasets as two separate steps.

B. Split the training and test data based on time rather than a random split to avoid leakage.

C. Add more data to your test set to ensure that you have a fair distribution and sample for testing.

D. Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and

test sets.

Correct Answer: D

Community vote distribution

B (87%) 13%

  maartenalexander Highly Voted  2 years, 10 months ago

B. If you do time series prediction, you can't borrow information from the future to predict the future. If you do, you are artificially increasing your
accuracy.
upvoted 33 times
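The time-based split that the commenters favour in option B can be sketched in plain Python (the 80/20 fraction and the tuple layout are illustrative assumptions, not part of the question):

```python
# Option B sketched: order records chronologically and cut once, so the
# training set never contains observations that postdate the test set.
def time_based_split(records, train_frac=0.8):
    """records: iterable of (timestamp, temperature) pairs."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

hourly = [(h, 20.0 + h % 5) for h in range(100)]  # toy hourly uploads
train, test = time_based_split(hourly)
assert len(train) == 80 and len(test) == 20
assert max(t for t, _ in train) < min(t for t, _ in test)  # no future leakage
```

A random split, by contrast, would scatter future hours into the training set, which is exactly the leakage the thread describes.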

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: B

Definitely B
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: B

they did not explicitly say forecasting, but splitting by time is the number one rule you learn
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  SergioRubiano 1 year, 1 month ago

Selected Answer: D

D is correct. cross-validate
upvoted 2 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: B

train accuracy 97% , production accuracy 66% ---> time series data ---> random split ---> cause leakage , answer is B
upvoted 2 times

  David_ml 1 year, 11 months ago


Selected Answer: B

You don't split data randomly for time series prediction.


upvoted 3 times

  mmona19 2 years ago


Selected Answer: B

B should be the answer. D is incorrect, as normalizing before splitting is going to cause data leakage.
https://community.rapidminer.com/discussion/32592/normalising-data-before-data-split-or-after
upvoted 2 times

  giaZ 2 years, 1 month ago

Selected Answer: B

If you do a random split on a time series, you risk that the training data will contain information about the target (the definition of leakage), but similar data
won't be available when the model is used for prediction. Leakage causes the model to look accurate until you start making actual predictions with
it.
upvoted 3 times


  xiaoF 2 years, 2 months ago


agree B as well
upvoted 2 times

  JobQ 2 years, 4 months ago


I think it is B
upvoted 2 times

  Danny2021 2 years, 7 months ago


B. D doesn't improve anything at all. Split and Transform is no different than Transform and Split if the transform logic is the same.
upvoted 3 times

  Jijiji 2 years, 8 months ago


seems like D
upvoted 1 times


Question #37 Topic 1

You are developing models to classify customer support emails. You created models with TensorFlow Estimators using small datasets on your on-

premises system, but you now need to train the models using large datasets to ensure high performance. You will port your models to Google

Cloud and want to minimize code refactoring and infrastructure overhead for easier migration from on-prem to cloud. What should you do?

A. Use AI Platform for distributed training.

B. Create a cluster on Dataproc for training.

C. Create a Managed Instance Group with autoscaling.

D. Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster.

Correct Answer: C

Community vote distribution

A (93%) 7%

  maartenalexander Highly Voted  2 years, 10 months ago

A. AI platform provides lower infrastructure overhead and allows you to not have to refactor your code too much (no containerization and such,
like in KubeFlow).
upvoted 27 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: A

I chose A. Even though D is a working option, it requires us to create a GKE cluster, which requires more work.
upvoted 2 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: A

A - because it has native support for TF


upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: A

A. Use AI Platform for distributed training. : Managed , low infra change migration: yes , although need code refactoring to bigquery sql
B. Create a cluster on Dataproc for training. : only cluster ? what about training?
C. Create a Managed Instance Group with autoscaling. : Same Q?
D. Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster : only training?
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 1 times

  Fatiy 1 year, 2 months ago


Selected Answer: A

Option A is the best choice as AI Platform provides a distributed training framework, enabling you to train large-scale models faster and with less
effort
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: A

Using option elimination, the answer is between A and D; I will vote for A as it is easier.
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago


- using options eliminations answer between A,D will vote for A as it is easier
upvoted 1 times

  David_ml 1 year, 11 months ago

Selected Answer: A

The answer is A. AI platform also contains kubeflow pipelines. you don't need to set up infrastructure to use it. For D you need to set up a
kubernetes cluster engine. The question asks us to minimize infrastructure overheard.
upvoted 2 times

  mmona19 2 years ago


Selected Answer: D

D - Kubeflow Pipelines with Vertex AI provides the ability to reuse existing code using a TF container in a pipeline. It helps automate the process.
There is a Qwiklab walking through this.
A - incorrect; the question is asking to reuse existing code with minimum changes. Distributed deployment does not address that.
upvoted 1 times

  David_ml 1 year, 11 months ago


The answer is A. AI platform also contains kubeflow pipelines. you don't need to set up infrastructure to use it. For D you need to set up a
kubernetes cluster engine. The question asks us to minimize infrastructure overheard.
upvoted 2 times

  A4M 2 years, 3 months ago


A - better to go with managed service and distributed
upvoted 2 times

  DHEEPAK 2 years, 3 months ago


I am 100% sure that the answer is D.
Kubeflow pipelines were designed keeping:

A) Portability.
B) Composability.
C) Flexibility in mind.

This is the pain point that the kubeflow pipelines address


upvoted 1 times

  David_ml 1 year, 11 months ago


The answer is A. AI platform also contains kubeflow pipelines. you don't need to set up infrastructure to use it. For D you need to set up a
kubernetes cluster engine. The question asks us to minimize infrastructure overheard.
upvoted 2 times

  NamitSehgal 2 years, 3 months ago

Selected Answer: A

TensorFlow Estimators support distributed training, and that is a key feature of AI Platform (later Vertex AI).
upvoted 3 times

  JobQ 2 years, 4 months ago


I think it is A
upvoted 1 times

  q4exam 2 years, 7 months ago


I think the answer is either A or B, but I personally think it is likely B, because Dataproc is a common toolbox on GCP used for ML, while AI Platform
might require refactoring. However, I don't really know whether it is A or B.
upvoted 3 times

  george_ognyanov 2 years, 6 months ago


Another vote for answer A. AI Platform distributed training here.

However, I wanted to share my logic for why it's not B as well. Dataproc is a managed Hadoop service and as such needs a processing engine for ML tasks,
most likely Spark and SparkML. Now, Spark code is quite different from pure Python, and SparkML is even more different from TF code. I imagine
there might be a way to convert TF code to run on SparkML, but this seems like a lot of work. Besides, the question specifically wants us to
minimize refactoring, so there you have it: we can eliminate option B 100%.
upvoted 4 times

  Danny2021 2 years, 7 months ago


A. D involves more infra overhead.
upvoted 4 times

  salsabilsf 2 years, 10 months ago


Should be D
upvoted 1 times

  GCP_Guru 2 years, 4 months ago


why?????
upvoted 1 times


Question #38 Topic 1

You have trained a text classification model in TensorFlow using AI Platform. You want to use the trained model for batch predictions on text data stored in BigQuery while minimizing computational overhead. What should you do?

A. Export the model to BigQuery ML.

B. Deploy and version the model on AI Platform.

C. Use Dataflow with the SavedModel to read the data from BigQuery.

D. Submit a batch prediction job on AI Platform that points to the model location in Cloud Storage.

Correct Answer: A

Community vote distribution

A (58%) D (33%) 8%

  maartenalexander Highly Voted  2 years, 10 months ago

A. You would want to minimize computational overhead–BigQuery minimizes such overhead


upvoted 19 times

  q4exam 2 years, 7 months ago


BQML doesn't support NLP models
upvoted 3 times

  ms_lemon 2 years, 6 months ago


you can import a TF model in BQ ML
upvoted 9 times

  gcp2021go 2 years, 6 months ago


agree. https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
upvoted 5 times

  harithacML 9 months, 3 weeks ago


No need. This is a text classification problem; you need to convert words to numbers and use a classifier.
upvoted 3 times

  chohan Highly Voted  2 years, 10 months ago

I think it's A
https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#importing_models
upvoted 11 times
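For reference, option A boils down to two BigQuery ML statements, shown here as Python string constants. The dataset, model, bucket, and column names are placeholders (a real imported model would need the column aliases to match the SavedModel's input names):

```python
# Hedged sketch of option A: import the SavedModel into BigQuery ML, then run
# batch prediction where the data already lives. All names are placeholders.
import_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.imported_tf_model`
OPTIONS (MODEL_TYPE='TENSORFLOW',
         MODEL_PATH='gs://my-bucket/saved_model/*')
"""

predict_sql = """
SELECT *
FROM ML.PREDICT(MODEL `my_dataset.imported_tf_model`,
                (SELECT text AS input FROM `my_dataset.support_emails`))
"""
```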

  Aastha_Vashist Most Recent  1 month, 1 week ago

Selected Answer: A

Bquery to minimize computational overhead


upvoted 2 times

  MrTracer 4 months ago


Selected Answer: D

Would go with D
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: A

A - you can import TF models to BQ


upvoted 2 times

  harithacML 9 months, 3 weeks ago

Selected Answer: A

Model : AI Platform.
pred batch data : BigQuery
constraint : computational overhead

Same platform as data == less computation required to load and pass it to model
upvoted 2 times

  Liting 9 months, 3 weeks ago


Selected Answer: A

minimize computational overhead–>BigQuery


upvoted 2 times

  Voyager2 10 months, 3 weeks ago


Not sure whether having the saved model in Cloud Storage means that you don't use compute in Vertex. I think the compute-free option
is BigQuery.
upvoted 1 times

  Voyager2 10 months, 3 weeks ago


Not sure
Text Classification Using BigQuery ML and ML.NGRAMS
https://medium.com/@jeffrey.james/text-classification-using-bigquery-ml-and-ml-ngrams-6e365f0b5505
upvoted 1 times

  rexduo 11 months, 2 weeks ago

Selected Answer: A

I think D has extra compute overhead from extracting the data out of BQ.


upvoted 2 times

  Darshan12 11 months, 2 weeks ago


There are some drawbacks to option D.

Cost: Submitting a batch prediction job on AI Platform is a paid service. The cost will depend on the size of the model and the amount of data that
you are predicting.
Complexity: Submitting a batch prediction job on AI Platform requires you to write some code. This can be a challenge if you are not familiar with
AI Platform.
Performance: Submitting a batch prediction job on AI Platform may not be as efficient as using BigQuery ML. This is because AI Platform needs to
load the model into memory before it can run the predictions.
Overall, option D is a viable option, but it may not be the best option for all situations.
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  lucaluca1982 1 year ago


Selected Answer: C

why not C?
upvoted 1 times

  lucaluca1982 1 year ago


Selected Answer: C

what about C?
upvoted 1 times

  tavva_prudhvi 10 months ago


This is an option that can be used to minimize computational overhead, but it is more complex to set up and requires you to have Dataflow
installed.
upvoted 2 times

  king31 5 months ago


Although it's more complex, the question doesn't imply any restrictions on complexity, only on computational overhead.
upvoted 1 times

  lucaluca1982 1 year ago


Selected Answer: D

D is more straightforward
upvoted 1 times

  studybrew 1 year ago


is it A or D?
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


In this document(https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions), it mentions "For
classification or regression models, you can provide input data in one of two formats: BigQuery tables, CSV objects in Cloud Storage.

Now, is it A/D?
upvoted 1 times


Question #39 Topic 1

You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have

created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to

automatically run a Kubeflow

Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?

A. Configure your pipeline with Dataflow, which saves the files in Cloud Storage. After the file is saved, start the training job on a GKE cluster.

B. Use App Engine to create a lightweight python client that continuously polls Cloud Storage for new files. As soon as a file arrives, initiate

the training job.

C. Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-

triggered Cloud Function to start the training job on a GKE cluster.

D. Use Cloud Scheduler to schedule jobs at a regular interval. For the first step of the job, check the timestamp of objects in your Cloud

Storage bucket. If there are no new files since the last run, abort the job.

Correct Answer: C

Community vote distribution

C (88%) 13%

  Paul_Dirac Highly Voted  2 years, 10 months ago

C
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#triggering-and-scheduling-kubeflow
pipelines
upvoted 15 times

  Paul_Dirac Highly Voted  2 years, 10 months ago

C
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#triggering-and-scheduling-kubeflow
pipelines
upvoted 7 times

  ori5225 2 years, 8 months ago


On a schedule, using Cloud Scheduler.
Responding to an event, using Pub/Sub and Cloud Functions. For example, the event can be the availability of new data files in a Cloud Storage
bucket.
upvoted 1 times

  tavva_prudhvi 10 months ago


Option D requires the job to be scheduled at regular intervals, even if there are no new files. This can waste resources and lead to
unnecessary delays in the training process.
upvoted 1 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: C

C - This is the google reccomended method.


upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: C

C- because you don't want to re-engineer the pipeline


upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 1 times

  Fatiy 1 year, 2 months ago

Selected Answer: C

The scenario involves automatically running a Kubeflow Pipelines training job on GKE as soon as new data becomes available. To achieve this, we
can use Cloud Storage to store the cleaned dataset, and then configure a Cloud Storage trigger that sends a message to a Pub/Sub topic wheneve
a new file is added to the storage bucket. We can then create a Pub/Sub-triggered Cloud Function that starts the training job on a GKE cluster.
upvoted 1 times
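A rough sketch of the option C handler. A Cloud Storage notification delivered through Pub/Sub arrives base64-encoded; `start_training_job` below is a hypothetical placeholder for whatever client call launches the Kubeflow Pipelines run on GKE:

```python
import base64
import json

# Pub/Sub-triggered Cloud Function (option C): decode the storage notification
# and kick off the training job with the new file's GCS URI.
def handle_new_file(event, context=None):
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    gcs_uri = f"gs://{payload['bucket']}/{payload['name']}"
    # start_training_job(input_path=gcs_uri)  # hypothetical launch call
    return gcs_uri

# Simulated message, shaped like a Cloud Storage -> Pub/Sub notification.
sample_event = {"data": base64.b64encode(
    json.dumps({"bucket": "clean-data", "name": "2024/04/batch.csv"}).encode()
)}
assert handle_new_file(sample_event) == "gs://clean-data/2024/04/batch.csv"
```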

  behzadsw 1 year, 3 months ago


Selected Answer: A

The question says: As part of your CI/CD workflow, you want to automatically run a Kubeflow..

C is also an option but it seems more cumbersome.


One thing that could be against A is that the data engineering team is a separate team, so they might not have access to your CI/CD if any changes from their
side are needed.
upvoted 1 times

  tavva_prudhvi 10 months ago


Option A requires the data engineering team to modify the pipeline, which can be time-consuming and error-prone.
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: C

C
Pubsub is the keyword
upvoted 2 times

  Mohamed_Mossad 1 year, 9 months ago


Selected Answer: C

Event-driven architecture is better than polling-based architecture, so I will vote for C.
upvoted 1 times


Question #40 Topic 1

You have a functioning end-to-end ML pipeline that involves tuning the hyperparameters of your ML model using AI Platform, and then using the

best-tuned parameters for training. Hypertuning is taking longer than expected and is delaying the downstream processes. You want to speed up

the tuning job without significantly compromising its effectiveness. Which actions should you take? (Choose two.)

A. Decrease the number of parallel trials.

B. Decrease the range of floating-point values.

C. Set the early stopping parameter to TRUE.

D. Change the search algorithm from Bayesian search to random search.

E. Decrease the maximum number of trials during subsequent training phases.

Correct Answer: BD

Reference:

https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview

Community vote distribution

CE (52%) CD (22%) 13% 9%

  gcp2021go Highly Voted  2 years, 9 months ago

I think it should be CE. I can't find any reference saying that B can reduce tuning time.
upvoted 18 times

  Paul_Dirac Highly Voted  2 years, 10 months ago

Answer: B & C (Ref: https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning)


(A) Decreasing the number of parallel trials will increase tuning time.
(D) Bayesian search works better and faster than random search, since it is selective in the points it evaluates and uses knowledge of previously evaluated
points.
(E) maxTrials should be larger than 10*the number of hyperparameters used. And spanning the whole minimum space (10*num_hyperparams)
already takes some time. So, lowering maxTrials has little effect on reducing tuning time.
upvoted 16 times

  dxxdd7 2 years, 7 months ago


In your link, when they mentioned maxTrials, they said that "In most cases there is a point of diminishing returns after which additional trials
have little or no effect on the accuracy"
They also say that it can affect time and cost
I think i'd rather go with CE
upvoted 10 times

  pinimichele01 Most Recent  2 weeks, 1 day ago

Selected Answer: CE

see pawan94
upvoted 2 times

  pawan94 3 months, 3 weeks ago


C and E, if you reference the latest docs of hptune job on vertex ai :
1. A not possible (refer: https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-
tuning#:~:text=the%20benefit%20of%20reducing%20the%20time%20the) , if you reduce the number of parallel trials then the speed of overall
completion gets negatively affected.
2. The question is about how to speed up the process, not about changing the model params. Changing the optimization algorithm would lead to
unexpected results.

So in my opinion C and E (after carefully reading the updated docs), and please don't believe everything ChatGPT says. I have encountered so many
questions where the LLMs give completely wrong answers.
upvoted 3 times

  fragkris 4 months, 3 weeks ago

Selected Answer: CD

I chose C and D
upvoted 2 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: CD

Chat GPT says:


. Set the early stopping parameter to TRUE.

Early Stopping: Enabling early stopping allows the tuning process to terminate a trial if it becomes clear that it's not producing promising results.

This prevents wasting time on unpromising trials and can significantly speed up the hyperparameter tuning process. It helps to focus resources on
more promising parameter combinations.
D. Change the search algorithm from Bayesian search to random search.

Random Search Algorithm: Random search, as opposed to Bayesian optimization, doesn't attempt to build a model of the objective function. While
Bayesian search can be more efficient in finding the optimal parameters, random search is often faster per iteration. Random search can be
particularly effective when the hyperparameter space is large, as it doesn't require as much computational power to select the next set of
parameters to evaluate.
upvoted 2 times

  Voyager2 10 months, 3 weeks ago

Selected Answer: CE

C&E
This video explains very well the max trials and parallel trials
https://youtu.be/8hZ_cBwNOss
This link explains early stopping
See https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning#early-stopping
upvoted 3 times
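Why option C saves time can be shown with a toy trial loop: once the metric stops improving for a few evaluations, the trial is abandoned instead of running to completion. The patience value and the metric stream below are invented purely for illustration:

```python
# Early stopping sketch (option C): abandon a trial once its metric has not
# improved for `patience` consecutive evaluations.
def run_trial(metric_stream, patience=3):
    best, since_best, steps = float("-inf"), 0, 0
    for m in metric_stream:
        steps += 1
        if m > best:
            best, since_best = m, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # unpromising trial stopped early
    return best, steps

best, steps = run_trial([0.60, 0.62, 0.61, 0.61, 0.61, 0.90])
assert (best, steps) == (0.62, 5)  # stopped before evaluating the 6th value
```

The trade-off the thread debates is visible here too: the stopped trial never sees the late jump to 0.90, which is why early stopping saves time at some small risk to effectiveness.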

  rexduo 11 months, 2 weeks ago

Selected Answer: CE

A increases time; for B, the bottleneck of an HP tuning job is normally not model size; D does reduce time, but might significantly hurt effectiveness.
upvoted 1 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: AC

Running parallel trials has the benefit of reducing the time the training job takes (real time—the total processing time required is not typically
changed). However, running in parallel can reduce the effectiveness of the tuning job overall. That is because hyperparameter tuning uses the
results of previous trials to inform the values to assign to the hyperparameters of subsequent trials. When running in parallel, some trials start
without having the benefit of the results of any trials still running.
You can specify that AI Platform Training must automatically stop a trial that has become clearly unpromising. This saves you the cost of continuing
a trial that is unlikely to be useful.

To permit stopping a trial early, set the enableTrialEarlyStopping value in the HyperparameterSpec to TRUE.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: CE

Went with C & E


upvoted 1 times

  kucuk_kagan 1 year, 1 month ago

Selected Answer: AD

To speed up the tuning job without significantly compromising its effectiveness, you can take the following actions:

A. Decrease the number of parallel trials: By reducing the number of parallel trials, you can limit the amount of computational resources being used
at a given time, which may help speed up the tuning job. However, reducing the number of parallel trials too much could limit the exploration of
the parameter space and result in suboptimal results.

D. Change the search algorithm from Bayesian search to random search: Bayesian optimization is a computationally intensive method that requires
more time and resources than random search. By switching to a simpler method like random search, you may be able to speed up the tuning job
without compromising its effectiveness. However, random search may not be as efficient in finding the best hyperparameters as Bayesian
optimization.
upvoted 1 times

  Yajnas_arpohc 1 year, 1 month ago


Selected Answer: DE

Early stopping is for training, not hyperparameter tuning


upvoted 1 times

  Fatiy 1 year, 2 months ago

Selected Answer: AD

The two actions that can speed up hyperparameter tuning without compromising effectiveness are decreasing the number of parallel trials and
changing the search algorithm from Bayesian search to random search.
upvoted 2 times

  shankalman717 1 year, 2 months ago


Selected Answer: CD

B. Decrease the range of floating-point values: Reducing the range of the hyperparameters will decrease the search space and the time it takes to
find the optimal hyperparameters. However, if the range is too narrow, it may not be possible to find the best hyperparameters.

C. Set the early stopping parameter to TRUE: Setting the early stopping parameter to true will stop the trial when the performance has stopped
improving. This will help to reduce the number of trials needed and thus speed up the hypertuning job without compromising its effectiveness.
D. Changing the search algorithm from Bayesian search to random search could also be a valid action to speed up the hypertuning job. Random
search can explore the hyperparameter space more efficiently and with less computation cost compared to Bayesian search, especially when the
search space is large and complex. However, it may not be as effective as Bayesian search in finding the best hyperparameters in some cases.


upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


D might not be the correct option: random search might be faster, but there is a chance of decreased accuracy, and that violates the question, which
says not to compromise effectiveness!
upvoted 1 times

  Mohamed_Mossad 1 year, 9 months ago


Selected Answer: CE

Answer C,E
=========
Explanation :
A. Decrease the number of parallel trials: doing this will of course make hypertuning take more time; we need to increase parallel trials, not
decrease them.
B. Decrease the range of floating-point values: theoretically this should speed up the computation, but this is not the most correct answer.
C. Set the early stopping parameter to TRUE: this is a very good option.
D. Change the search algorithm from Bayesian search to random search: also, changing the search algorithm will not have a great impact.
E. Decrease the maximum number of trials during subsequent training phases: a very good option.
upvoted 2 times

  David_ml 1 year, 11 months ago

Selected Answer: CE

CE for me.
upvoted 1 times

  morgan62 2 years ago

Selected Answer: CE

I vote for C,E.

A: If you decrease # of parallel trials, training takes more time.


B: Even if you decrease the range, training takes the same time because the number of trials stays the same.
D: That would make results worse.
upvoted 1 times


Question #41 Topic 1

Your team is building an application for a global bank that will be used by millions of customers. You built a forecasting model that predicts

customers' account balances 3 days in the future. Your team will use the results in a new feature that will notify users when their account balance

is likely to drop below $25. How should you serve your predictions?

A. 1. Create a Pub/Sub topic for each user. 2. Deploy a Cloud Function that sends a notification when your model predicts that a user's

account balance will drop below the $25 threshold.

B. 1. Create a Pub/Sub topic for each user. 2. Deploy an application on the App Engine standard environment that sends a notification when

your model predicts that a user's account balance will drop below the $25 threshold.

C. 1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a

notification when the average of all account balance predictions drops below the $25 threshold.

D. 1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a

notification when your model predicts that a user's account balance will drop below the $25 threshold.

Correct Answer: A

Community vote distribution


D (94%) 6%

  salsabilsf Highly Voted  2 years, 10 months ago

Should be D !
creating a Pub/Sub topic for each user is overkill
upvoted 18 times

  Y2Data 2 years, 7 months ago


Yes, creating a topic is overkill, but a NOTIFICATION SYSTEM is not? It's totally normal?
Seriously, step two involves "REGISTER EACH USER ...."; how is this better than creating a topic????

Should be A, and it's so obvious!


upvoted 3 times

  q4exam 2 years, 7 months ago


I think A is the straightforward answer, but in real life customers also consider cost, so practically App Engine will be picked in this case
because of the large user base.
upvoted 2 times

  SlipperySlope Highly Voted  2 years, 2 months ago

Selected Answer: D

D is correct. Firebase is designed for exactly this sort of scenario. Also, it would not be possible to create millions of pubsub topics due to GCP
quotas
https://cloud.google.com/pubsub/quotas#quotas
https://firebase.google.com/docs/cloud-messaging
upvoted 6 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: D

D is correct. Firebase is used for applications.


upvoted 1 times

  harithacML 9 months, 3 weeks ago

Selected Answer: A

Simple answer: use the tools most mentioned during training, i.e., Cloud Functions.
upvoted 1 times

  Kowalski 8 months ago


Pub/Sub has a limit of 10,000 topics only and can't be increased https://cloud.google.com/pubsub/quotas#resource_limits.
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  SergioRubiano 1 year, 1 month ago


Selected Answer: D

"Create a Pub/Sub topic for each user" xD


upvoted 1 times

  Mohamed_Mossad 1 year, 9 months ago


Selected Answer: D

"Create a Pub/Sub topic for each user" this is crazy , we can not imagine a system with millions of pub/sub topics , so A,B wrong
C also wrong
upvoted 3 times

  mmona19 2 years ago

Selected Answer: D

D- is more automated compared to A. A is overkill


upvoted 1 times

  Vidyasagar 2 years, 3 months ago

Selected Answer: D

I think, D is the best answer


upvoted 3 times

  fdmenendez 2 years, 3 months ago


Project limit is 10,000 topics, you could have multiple projects but that does not scale well. so D.
https://cloud.google.com/pubsub/quotas#resource_limits
upvoted 4 times

  NamitSehgal 2 years, 3 months ago


D looks more relevant
Notification messages: Simply display a message content, which is handled by the FCM SDK. Data Messages: Display a message with some set
interactions
upvoted 3 times

  Danny2021 2 years, 5 months ago


A doesn't work. There is a quota limit on the number of pub/sub topics you can create, also one Cloud function cannot subscribe to millions of
topics. A doesn't scale at all.
upvoted 3 times

  Danny2021 2 years, 5 months ago


Answer is D. FCM is designed for this type of notification sent to mobile and desktop apps.
upvoted 4 times
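Whichever delivery mechanism wins (FCM per answer D), the serving step reduces to comparing each user's predicted balance against the $25 threshold and notifying the matches. A minimal sketch — the `users_to_notify` helper and the sample prediction values are hypothetical; in production the returned IDs would be handed to Firebase Cloud Messaging:

```python
THRESHOLD = 25.0

def users_to_notify(predicted_balances, threshold=THRESHOLD):
    """Return user IDs whose predicted 3-day balance falls below the threshold."""
    return [user_id for user_id, balance in predicted_balances.items()
            if balance < threshold]

# Hypothetical model output: user ID -> predicted balance 3 days out.
predictions = {"user_1": 120.50, "user_2": 18.75, "user_3": 24.99}
print(users_to_notify(predictions))  # -> ['user_2', 'user_3']
```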


Question #42 Topic 1

You work for an advertising company and want to understand the effectiveness of your company's latest advertising campaign. You have

streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas

dataframe in an AI Platform notebook.

What should you do?

A. Use AI Platform Notebooks' BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.

B. Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance.

C. Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance. Use pandas.read_csv to ingest

the file as a pandas dataframe.

D. From a bash cell in your AI Platform notebook, use the bq extract command to export the table as a CSV file to Cloud Storage, and then use

gsutil cp to copy the data into the notebook. Use pandas.read_csv to ingest the file as a pandas dataframe.

Correct Answer: C

Reference:

https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas

Community vote distribution


A (90%) 10%

  zosoabi Highly Voted  2 years, 10 months ago

A: no "CSV" found in provided link https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas


upvoted 26 times

  Sum_Sum Most Recent  5 months, 2 weeks ago

Selected Answer: A

A is the google recommended answer. And what you should use


C is what the intern does ...
upvoted 3 times

  sharth 3 months, 3 weeks ago


Dude, I laughed so hard
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 2 times

  SergioRubiano 1 year, 1 month ago

Selected Answer: A

A, Using the command %%bigquery df


upvoted 1 times

  Dunnoth 1 year, 2 months ago


Why not D? Using BQ notebook magic would be fine for one-time use, but usually a DS would reload the data multiple times, and every time you
need to stream 500 MB of data to the notebook instance from BQ. Isn't it cheaper to store the data as a CSV in a bucket?
upvoted 2 times

  John_Pongthorn 1 year, 3 months ago


Selected Answer: A

%%bigquery df
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 3

print(df.head())
upvoted 4 times

  hiromi 1 year, 4 months ago


Selected Answer: A


A
https://cloud.google.com/bigquery/docs/visualize-jupyter
upvoted 2 times

  Sachin2360 1 year, 10 months ago


Answer : A . Refer to this link for details: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
First 2 points talks about querying the data.
Download query results to a pandas DataFrame by using the BigQuery Storage API from the IPython magics for BigQuery in a Jupyter notebook.
Download query results to a pandas DataFrame by using the BigQuery client library for Python.
Download BigQuery table data to a pandas DataFrame by using the BigQuery client library for Python.
Download BigQuery table data to a pandas DataFrame by using the BigQuery Storage API client library for Python.
upvoted 2 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: A

https://googleapis.dev/python/bigquery/latest/magics.html#ipython-magics-for-bigquery
upvoted 2 times

  NickNtaken 2 years ago

Selected Answer: A

this is the simplest and most straightforward way read BQ data into Pandas dataframe.
upvoted 3 times

  mmona19 2 years ago

Selected Answer: C

Both A and C are technically correct. C has more manual steps and A has fewer. The question does not ask which requires the least effort, so C is the clear
answer.
upvoted 1 times

  wish0035 1 year, 4 months ago


"A and C are valid, but C is more difficult than A. they don't ask to be easier so I will go with the more difficult". WHAAAT?
Google best practices are always: easier > harder. Even they encourage you to skip ML if you don't need ML.
upvoted 2 times

  SlipperySlope 2 years, 2 months ago

Selected Answer: C

C is the correct answer due to the size of the data. It wouldn't be possible to download it all into an in memory data frame.
upvoted 1 times

  u_phoria 1 year, 10 months ago


500mb of data into a pandas dataframe generally isn't a problem, far from it.
upvoted 2 times

  ggorzki 2 years, 3 months ago


Selected Answer: A

IPython magics for BigQuery


https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
upvoted 1 times

  NamitSehgal 2 years, 3 months ago


I agree with A
upvoted 1 times

  Y2Data 2 years, 7 months ago


Just load it

https://googleapis.dev/python/bigquery/latest/magics.html
upvoted 2 times


Question #43 Topic 1

You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world. Which

features or feature crosses should you use to train city-specific relationships between car type and number of sales?

A. Three individual features: binned latitude, binned longitude, and one-hot encoded car type.

B. One feature obtained as an element-wise product between latitude, longitude, and car type.

C. One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type.

D. Two feature crosses as an element-wise product: the first between binned latitude and one-hot encoded car type, and the second between

binned longitude and one-hot encoded car type.

Correct Answer: C

Community vote distribution


C (100%)

  Paul_Dirac Highly Voted  2 years, 9 months ago

C
https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
upvoted 22 times
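Answer C can be sketched numerically: bin latitude and longitude, one-hot encode the car type, and take the element-wise product across the three one-hot vectors, which yields a single sparse crossed feature with exactly one active entry per example. The bin edges and car types below are made up for illustration:

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def bucketize(value, edges):
    """Return the index of the bin that value falls into."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def crossed_feature(lat, lon, car_type, lat_edges, lon_edges, car_types):
    """Cross binned latitude x binned longitude x one-hot car type (answer C).
    The element-wise product of the three one-hot vectors gives one sparse
    crossed feature encoding a (city cell, car type) combination."""
    lat_vec = one_hot(bucketize(lat, lat_edges), len(lat_edges) + 1)
    lon_vec = one_hot(bucketize(lon, lon_edges), len(lon_edges) + 1)
    car_vec = one_hot(car_types.index(car_type), len(car_types))
    return [a * b * c for a in lat_vec for b in lon_vec for c in car_vec]

lat_edges = [-30.0, 0.0, 30.0]      # hypothetical bin boundaries
lon_edges = [-90.0, 0.0, 90.0]
car_types = ["sedan", "suv", "truck"]
feature = crossed_feature(48.85, 2.35, "suv", lat_edges, lon_edges, car_types)
print(len(feature), sum(feature))  # -> 48 1.0
```

Because each active entry corresponds to one (latitude bin, longitude bin, car type) combination, a linear model can learn a separate weight for each city-and-car-type pairing — exactly the city-specific relationship the question asks for, which the separate crosses in option D cannot capture.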

  ebinv2 Highly Voted  2 years, 9 months ago


C should be the answer
upvoted 8 times

  Sum_Sum Most Recent  5 months, 2 weeks ago


C - everything else is madness
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: C

https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
upvoted 4 times

  A4M 2 years, 3 months ago


C - Answer
when doing feature cross the features need to be binned
upvoted 3 times

  MK_Ahsan 2 years, 3 months ago


Selected Answer: C

https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
Answer C: It needs a feature cross to obtain one feature.
upvoted 3 times

  NamitSehgal 2 years, 3 months ago


I got with C
upvoted 3 times

  ramen_lover 2 years, 5 months ago


"element-wise product" sounds like we are not using a feature cross but artificially creating a new column whose values is the "element-wise
product" of other column values...; i.e., (1, 2, 3) => 1 * 2 * 3 = 6.
I am not a native English speaker; thus, I might misunderstand the sentence.
upvoted 1 times

  ralf_cc 2 years, 9 months ago


D - https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
upvoted 4 times

  jk73 2 years, 7 months ago


Cannot be D. Although binning is a good idea because it enables the model to learn nonlinear relationships within a single feature, separating
latitude and longitude into different feature crosses is not a good one; this separation will prevent the model from learning city-specific sales. A


city is the conjunction of latitude and longitude.

In that order of Ideas Crossing binned latitude with binned longitude enables the model to learn city-specific effects of car type.

I will go for C,
https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
upvoted 12 times

  george_ognyanov 2 years, 6 months ago


Damn that was a good explanation. Thank you for writing it out.
upvoted 2 times


Question #44 Topic 1

You work for a large technology company that wants to modernize their contact center. You have been asked to develop a solution to classify

incoming calls by product so that requests can be more quickly routed to the correct support team. You have already transcribed the calls using

the Speech-to-Text API. You want to minimize data preprocessing and development time. How should you build the model?

A. Use the AI Platform Training built-in algorithms to create a custom model.

B. Use AutoML Natural Language to extract custom entities for classification.

C. Use the Cloud Natural Language API to extract custom entities for classification.

D. Build a custom model to identify the product keywords from the transcribed calls, and then run the keywords through a classification

algorithm.

Correct Answer: A

Community vote distribution


B (82%) C (18%)

  chohan Highly Voted  2 years, 10 months ago

Should be B
-> minimize data preprocessing and development time
upvoted 21 times

  sensev 2 years, 9 months ago


Agree it's B. A and D are incorrect since they require more development time. C is also incorrect since the products are company-specific and might
not be well recognized by the Cloud Natural Language API.
upvoted 5 times

  neohanju 2 years, 8 months ago


I thought the answer was B too. However, after carefully reading the question and answers again, B produces entities for classification only, not a
classification result.
So, A and D are the only candidates, and A is better.
upvoted 2 times

  21c17b3 Most Recent  2 months, 1 week ago

I'm voting C here!


upvoted 2 times

  ralf_cc 2 months, 2 weeks ago


AutoML only has classification and regression
upvoted 1 times

  pico 7 months, 3 weeks ago


Selected Answer: C

Key Differences:

Approach: Option B (AutoML Natural Language) involves using an AutoML service to train a custom NLP model, while Option C (Cloud Natural
Language API) relies on a pre-built NLP API.

Control and Customization: Option B gives you more control and customization over the training process, as you train a model specific to your
needs. Option C offers less control but is quicker to set up since it uses a pre-built API.

Complexity: Option B might require more technical expertise to set up and configure the AutoML model, while Option C is more straightforward
and user-friendly.

In summary, both options allow you to extract custom entities for classification, but Option B (AutoML) involves more manual involvement in
training a custom model, while Option C (Cloud Natural Language API) provides a simpler, pre-built solution
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 2 times

  lucaluca1982 1 year ago


Selected Answer: C

why not C?
upvoted 1 times


  julliet 11 months, 1 week ago


you have to classify company products, which are custom classes
upvoted 1 times

  pico 7 months, 3 weeks ago


you can still use Option C (Cloud Natural Language API) even when the solution needs to classify incoming calls by company-specific
products rather than general products. The Cloud Natural Language API can be customized to handle company-specific entities and
classifications effectively.
upvoted 2 times

  John_Pongthorn 1 year, 2 months ago

Selected Answer: B

AutoML is appropriate to classify incoming calls by product (Custom) to be routed to the correct support team.

Cloud Natural Language API is for general case (not particular business)
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: B

"minimize data preprocessing and development time" answer will be limited to B,C
will choose C as Natural Language API does not handle custom operation
upvoted 2 times

  mmona19 2 years ago


B- automl custom classification and entity is going to help with minimum effort.
upvoted 4 times

  baimus 2 years, 1 month ago


I'm leaning towards C over B here. The question is underlining that minimal development time is required, and C is even less than B. If the
information is really domain-specific, then you'd need B, but it's not clear what products the company sells, so we don't have enough info to say it's
too domain-specific for C.
upvoted 4 times

  giaZ 2 years, 1 month ago


If anything, C is wrong because it tells you something that is not true: extracting custom entities with the Natural Language API is not possible. That is
something you can do only with AutoML. Look at this comparison table: https://cloud.google.com/natural-language#section-6
That's how they subtly point you at answer B.
upvoted 8 times

  ggorzki 2 years, 3 months ago


Selected Answer: B

AutoML Natural Language - custom entities, with least development time


upvoted 4 times

  NamitSehgal 2 years, 3 months ago


Should be B
Basic classification, entity extraction, and sentiment analysis are available through the Cloud Natural Language API. AutoML Natural Language
enables you to define custom classification categories, entities, and sentiment scores that are relevant to your application.
upvoted 2 times

  David_ml 1 year, 11 months ago


no. if you need custom entities you don't use APIs
upvoted 1 times


Question #45 Topic 1

You are training a TensorFlow model on a structured dataset with 100 billion records stored in several CSV files. You need to improve the

input/output execution performance. What should you do?

A. Load the data into BigQuery, and read the data from BigQuery.

B. Load the data into Cloud Bigtable, and read the data from Bigtable.

C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.

D. Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS).

Correct Answer: B

Reference:

https://cloud.google.com/dataflow/docs/guides/templates/provided-batch

Community vote distribution


C (65%) A (29%) 6%

  ralf_cc Highly Voted  2 years, 9 months ago

C - not enough info in the question, but C is the "most correct" one
upvoted 24 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: C

C is the Google-recommended approach.


upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


C is the correct one as BQ will not help you with performance
upvoted 1 times

  peetTech 7 months ago


Selected Answer: C

C https://datascience.stackexchange.com/questions/16318/what-is-the-benefit-of-splitting-tfrecord-file-into-
shards#:~:text=Splitting%20TFRecord%20files%20into%20shards,them%20through%20a%20training%20process.
upvoted 2 times

  peetTech 7 months ago


C https://datascience.stackexchange.com/questions/16318/what-is-the-benefit-of-splitting-tfrecord-file-into-
shards#:~:text=Splitting%20TFRecord%20files%20into%20shards,them%20through%20a%20training%20process.
upvoted 1 times

  ftl 7 months, 2 weeks ago


bard: The correct answer is:

C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.
TFRecords is a TensorFlow-specific binary format that is optimized for performance. Converting the CSV files into TFRecords will improve the
input/output execution performance. Sharding the TFRecords will allow the data to be read in parallel, which will further improve performance.

The other options are not as likely to improve performance.

Loading the data into BigQuery or Cloud Bigtable will add an additional layer of abstraction, which can slow down performance.
Storing the TFRecords in HDFS is not likely to improve performance, as HDFS is not optimized for TensorFlow.
upvoted 1 times

  tavva_prudhvi 8 months, 3 weeks ago


Using BigQuery or Bigtable may not be the most efficient option for input/output operations with TensorFlow. Storing the data in HDFS may be an
option, but Cloud Storage is generally a more scalable and cost-effective solution.
upvoted 1 times

  PST21 10 months, 3 weeks ago


While Bigtable can offer high-performance I/O capabilities, it is important to note that it is primarily designed for structured data storage and real-
time access patterns. In this scenario, the focus is on optimizing input/output execution performance, and using TFRecords in Cloud Storage aligns
well with that goal.
upvoted 1 times

  Voyager2 10 months, 4 weeks ago


Selected Answer: A


A. Load the data into BigQuery, and read the data from BigQuery.
https://cloud.google.com/blog/products/ai-machine-learning/tensorflow-enterprise-makes-accessing-data-on-google-cloud-faster-and-easier
Precisely in the link provided in other comments, it shows that the best result with TFRecords is 18,752 records per second. The same report
shows that BigQuery delivers more than 40,000 records per second.
upvoted 2 times

  tavva_prudhvi 9 months, 1 week ago


BigQuery is designed for running large-scale analytical queries, not for serving input pipelines for machine learning models like TensorFlow.
BigQuery's strength is in its ability to handle complex queries over vast amounts of data, but it may not provide the optimal performance for th
specific task of feeding data into a TensorFlow model.

On the other hand, converting the CSV files into shards of TFRecords and storing them in Cloud Storage (Option C) will provide better
performance because TFRecords is a format designed specifically for TensorFlow. It allows for efficient storage and retrieval of data, making it a
more suitable choice for improving the input/output execution performance. Additionally, Cloud Storage provides high throughput and low-
latency data access, which is beneficial for training large-scale TensorFlow models.
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 2 times

  shankalman717 1 year, 2 months ago


Selected Answer: C

Cloud Bigtable is typically used to process unstructured data, such as time-series data, logs, or other types of data that do not conform to a fixed
schema. However, Cloud Bigtable can also be used to store structured data if necessary, such as in the case of a key-value store or a database that
does not require complex relational queries.
upvoted 1 times

  shankalman717 1 year, 2 months ago


Selected Answer: C

Option C, converting the CSV files into shards of TFRecords and storing the data in Cloud Storage, is the most appropriate solution for improving
input/output execution performance in this scenario
upvoted 1 times

  behzadsw 1 year, 3 months ago

Selected Answer: A

https://cloud.google.com/architecture/ml-on-gcp-best-practices#store-tabular-data-in-bigquery
BigQuery for structured data, cloud storage for unstructed data
upvoted 3 times

  ShePiDai 11 months, 2 weeks ago


agree. BigQuery and Cloud Storage have effectively identical storage performance, where BigQuery is optimised for structured dataset and GCS
for unstructured.
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: D

"100 billion records stored in several CSV files" that means we deal with distributed big data problem , so HDFS is very suitable , Will choose D
upvoted 1 times

  hoai_nam_1512 1 year, 8 months ago


HDFS will require more resources.
100 billion records are processed fine with Cloud Storage objects.
upvoted 1 times

  David_ml 1 year, 11 months ago


Answer is C. TFRecords in cloud storage for big data is the recommended practice by Google for training TF models.
upvoted 4 times

  giaZ 2 years, 1 month ago

Selected Answer: C

Google best practices: Use Cloud Storage buckets and directories to group the shards of data (either sharded TFRecord files if using Tensorflow, or
Avro if using any other framework). Aim for files of at least 100Mb, and 100 - 10000 shards.
upvoted 2 times

  baimus 2 years, 1 month ago


It's C, although I do note that: "A very common case of this practise is to store TF Records in a Hadoop File System or on bucket-based public
cloud solutions like Google Cloud Storage.", but they haven't specified that a Hadoop cluster is available, and performance from Cloud Storage will be
better.
upvoted 2 times


Question #46 Topic 1

As the lead ML Engineer for your company, you are responsible for building ML models to digitize scanned customer forms. You have developed a

TensorFlow model that converts the scanned images into text and stores them in Cloud Storage. You need to use your ML model on the

aggregated data collected at the end of each day with minimal manual intervention. What should you do?

A. Use the batch prediction functionality of AI Platform.

B. Create a serving pipeline in Compute Engine for prediction.

C. Use Cloud Functions for prediction each time a new data point is ingested.

D. Deploy the model on AI Platform and create a version of it for online inference.

Correct Answer: D

Community vote distribution


A (100%)

  Paul_Dirac Highly Voted  2 years, 10 months ago

Use the model at the end of the day => Not D, C.


Minimize manual intervention => not B
Ans: A
upvoted 27 times
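The daily flow behind answer A reduces to collecting the day's aggregated inputs and scoring them all in one pass. A minimal sketch — the `ocr_model` function, the `run_daily_batch` helper, and the Cloud Storage paths are placeholders; the real job would be submitted to AI Platform batch prediction rather than run in-process:

```python
def ocr_model(image_ref):
    # Placeholder for the TensorFlow scanned-image-to-text model.
    return f"text-for-{image_ref}"

def run_daily_batch(image_refs, model):
    """Score one day's aggregated inputs in a single batch pass."""
    return {ref: model(ref) for ref in image_refs}

# Hypothetical day's worth of scanned forms collected in Cloud Storage.
todays_scans = [
    "gs://forms/2024-04-29/form-001.png",
    "gs://forms/2024-04-29/form-002.png",
]
predictions = run_daily_batch(todays_scans, ocr_model)
print(len(predictions))  # -> 2
```

Scheduling this once per day (e.g. via a cron-triggered batch prediction job) is what keeps manual intervention minimal, in contrast to deploying and maintaining an always-on online endpoint.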

  Arthurious Most Recent  1 month, 1 week ago

Selected Answer: A

A is the most efficient


upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: A

A is the only way


upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 1 times

  ares81 1 year, 3 months ago


Selected Answer: A

There is only A, for me.


upvoted 1 times

  koakande 1 year, 4 months ago


Selected Answer: A

Because the aggregated data can be sent at the end of the day for batch prediction, and AI Platform is managed, it satisfies the minimal-intervention
requirement.
Not B, as it violates the minimal-intervention requirement.
Not C and D, as real-time or online inference is not needed since data is aggregated at the end of the day.
upvoted 3 times

  hiromi 1 year, 4 months ago

Selected Answer: A

You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention.
upvoted 1 times

  seifou 1 year, 5 months ago


A.
https://datatonic.com/insights/vertex-ai-improving-debugging-batch-
prediction/#:~:text=Vertex%20AI%20Batch%20Prediction%20provides,to%20GCS%20or%20BigQuery%2C%20respectively.
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: A

"You need to use your ML model on the aggregated data" that means we need the batch prediction feature in AI platform
upvoted 1 times


  ggorzki 2 years, 3 months ago

Selected Answer: A

A
https://cloud.google.com/ai-platform/prediction/docs/batch-predict
upvoted 3 times

  george_ognyanov 2 years, 6 months ago


Another vote for A. Technically, through the right lens D could be correct as well, but what tipped me towards A was batch vs online predictions
and the need for less manual work.
upvoted 3 times

  Y2Data 2 years, 7 months ago


https://cloud.google.com/ai-platform/prediction/docs/batch-predict
upvoted 2 times


Question #47 Topic 1

You recently joined an enterprise-scale company that has thousands of datasets. You know that there are accurate descriptions for each table in

BigQuery, and you are searching for the proper BigQuery table to use for a model you are building on AI Platform. How should you find the data

that you need?

A. Use Data Catalog to search the BigQuery datasets by using keywords in the table description.

B. Tag each of your model and version resources on AI Platform with the name of the BigQuery table that was used for training.

C. Maintain a lookup table in BigQuery that maps the table descriptions to the table ID. Query the lookup table to find the correct table ID for

the data that you need.

D. Execute a query in BigQuery to retrieve all the existing table names in your project using the INFORMATION_SCHEMA metadata tables that

are native to BigQuery. Use the result to find the table that you need.

Correct Answer: B

Community vote distribution


A (100%)

  chohan Highly Voted  2 years, 10 months ago

Should be A
https://cloud.google.com/data-catalog/docs/concepts/overview
upvoted 18 times

  mmona19 Highly Voted  2 years ago

Selected Answer: A

Who is providing these answers?? It's clearly A; most of the answers here are incorrect.
upvoted 7 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: A

A without hesitation.
upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: A

A is the only way


upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: A

A should be correct
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 2 times

  TheGrew 2 years, 2 months ago


Selected Answer: A

Another vote for A by me.


upvoted 1 times

  NamitSehgal 2 years, 3 months ago


Selected Answer: A

A should be the way to go for large datasets


--This is also good, but it is the legacy way of checking:-
INFORMATION_SCHEMA contains these views for table metadata: TABLES and TABLE_OPTIONS for metadata about tables. COLUMNS and
COLUMN_FIELD_PATHS for metadata about columns and fields. PARTITIONS for metadata about table partitions (Preview).
upvoted 3 times

  JobQ 2 years, 4 months ago


I vote A
upvoted 1 times


  george_ognyanov 2 years, 6 months ago


Another vote for answer A from me.
upvoted 1 times


Question #48 Topic 1

You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC

ROC) value of

99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter

tuning. What should your next step be to identify and fix the problem?

A. Address the model overfitting by using a less complex algorithm.

B. Address data leakage by applying nested cross-validation during model training.

C. Address data leakage by removing features highly correlated with the target value.

D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.

Correct Answer: B

Community vote distribution


B (56%) C (40%) 4%
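For reference on the metric in question: AUC-ROC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, so a value near 99% on training data alone is a red flag rather than an achievement. A minimal pure-Python sketch of the rank formulation, on toy scores (not exam data):

```python
def auc_roc(scores, labels):
    # Rank-based AUC (Mann-Whitney U): the probability that a random
    # positive outranks a random negative. Assumes no tied scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = {idx: r + 1 for r, idx in enumerate(order)}
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rank_sum = sum(ranks[i] for i in pos)
    return (rank_sum - len(pos) * (len(pos) + 1) / 2) / (len(pos) * len(neg))

# A leaked feature makes scores separate the classes almost perfectly:
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 0.95]
labels = [0, 0, 0, 1, 1, 1]
print(auc_roc(scores, labels))  # 1.0 — perfect separation
```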

  Paul_Dirac Highly Voted  2 years, 10 months ago

Ans: B (Ref: https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9)


(C) High correlation doesn't mean leakage. The question may suggest target leakage and the defining point of this leakage is the availability of
data after the target is available.(https://www.kaggle.com/dansbecker/data-leakage)
upvoted 27 times

  Jarek7 9 months, 3 weeks ago


This ref doesn't explain WHY we should use NCV in this case - it just explains HOW to use NCV when dealing with time series.
Cross-validation, including nested cross-validation, is a powerful tool for model evaluation and hyperparameter tuning, but it does NOT
DIRECTLY ADDRESS data leakage. Data leakage refers to a situation where information from the test dataset leaks into the training dataset,
causing the model to have an unrealistically high performance. Nested cross-validation can indeed help provide a more accurate estimation of
the model's performance on unseen data, but IT DOESN'T SOLVE the underlying issue of data leakage if it's already present.
upvoted 3 times
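The time-ordered splitting that several commenters mention can be sketched as an expanding-window (walk-forward) scheme. This is only the outer loop; real nested cross-validation would additionally tune hyperparameters inside each training window. Pure Python, illustrative:

```python
def walk_forward_splits(n_samples, n_folds):
    """Expanding-window splits: each fold trains on the past and
    validates on the block that immediately follows it, so no future
    information leaks into training (unlike a random shuffle)."""
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_idx = list(range(0, k * fold_size))
        valid_idx = list(range(k * fold_size, (k + 1) * fold_size))
        yield train_idx, valid_idx

for train, valid in walk_forward_splits(12, 3):
    print(len(train), valid)
# 3 [3, 4, 5]
# 6 [6, 7, 8]
# 9 [9, 10, 11]
```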

  John_Pongthorn Highly Voted  1 year, 1 month ago

Selected Answer: C

C: this is correct choice 1000000000%


This is data leakage issue on training data
https://cloud.google.com/automl-tables/docs/train#analyze
The question is from this content.
If a column's Correlation with Target value is high, make sure that is expected, and not an indication of target leakage.

Let 's explain on my owner way, sometime the feature used on training data use value to calculate something from target value unintentionally, it
result in high correlation with each other.
for instance , you predict stock price by using moving average, MACD , RSI despite the fact that 3 features have been calculated from price (target)
upvoted 8 times
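The leakage check described above (flagging features suspiciously correlated with the target) can be sketched as follows. The feature names and values are invented for illustration, and 0.95 is an arbitrary threshold, not an official one:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "leaky_moving_avg": [1.1, 1.9, 3.0, 4.1, 4.9],  # derived from the target
    "independent":      [0.3, 0.1, 0.4, 0.1, 0.5],
}
for name, values in features.items():
    r = pearson(values, target)
    flag = "check for leakage!" if abs(r) > 0.95 else "ok"
    print(f"{name}: r={r:.2f} ({flag})")
```

As the thread notes, high correlation alone does not prove leakage; it only tells you which features to investigate.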

  black_scissors 11 months ago


I agree. Besides, when a CV is done randomly (not split by the time point) it can make things worse.
upvoted 2 times

  AnnaR Most Recent  3 days, 6 hours ago

B: correct.
considering c, but why should we remove a feature of highly predictive nature?? for me, this does not explain the problem of overfitting... a highly
predictive feature is also useful for good performance evaluated on the test set.
--> Decide for B!
upvoted 2 times

  gscharly 1 week, 1 day ago


Selected Answer: B

agree with Paul_Dirac


upvoted 1 times

  b1a8fae 4 months ago


Selected Answer: B

I initially went with B- however after reading this: https://machinelearningmastery.com/nested-cross-validation-for-machine-learning-with-python/


I think C is right. Quoted from the link: "Nested cross-validation is an approach to model hyperparameter optimization and model selection that
attempts to overcome the problem of overfitting the training dataset.". Overfitting is exactly our problem here. Correlated features in the dataset
may be a sign of data leakage, but they are not necessarily.
upvoted 1 times


  Sum_Sum 5 months, 2 weeks ago

Selected Answer: B

I think its B. GPT4 makes a good argument about C:


While this is a valid approach to handling data leakage, it might not be sufficient if the leakage is due to reasons other than high correlation, such
as temporal leakage in time-series data.
upvoted 1 times

  pico 7 months, 2 weeks ago

Selected Answer: A

Option A: This option is a reasonable choice. Switching to a less complex algorithm can help reduce overfitting, and using k-fold cross-validation
can provide a better estimate of how well the model will generalize to unseen data. It's essential to ensure that the high performance isn't solely
due to overfitting.
upvoted 1 times

  pico 7 months, 2 weeks ago


Option B: Nested cross-validation is primarily used to estimate model performance accurately and select the best model hyperparameters.
While it's a good practice, it doesn't directly address the overfitting issue. It helps prevent over-optimistic model performance estimates but
doesn't necessarily fix the overfitting problem.

Option C: Removing features highly correlated with the target value can be a valid step in feature selection or preprocessing. However, it
doesn't directly address the overfitting issue or explain why the model is performing exceptionally well on the training data. It's a separate step
from mitigating overfitting.

Option D: This option is incorrect. Tuning hyperparameters should aim to improve model performance on the validation set, not reduce it.

In summary, the most appropriate next step is Option A:


upvoted 1 times

  atlas_lyon 8 months, 1 week ago

Selected Answer: B

B: If splits are done chronologically (as is always advised), nested CV should work.


C: High correlation with the target means we have to check whether this is strong explanatory power or data leakage. Dropping the features won't help us distinguish between those cases, but may help reveal the independent contribution of the remaining features.
upvoted 1 times

  tavva_prudhvi 8 months, 3 weeks ago

Selected Answer: B

Option C is a good step to avoid overfitting, but it's not necessarily the best approach to address data leakage.

Data leakage occurs when information from the validation or test data leaks into the training data, leading to overly optimistic performance
metrics. In time-series data, it's important to avoid using future information to predict past events.

Removing features highly correlated with the target value may help to reduce overfitting, but it does not necessarily address data leakage.

Therefore, applying nested cross-validation during model training is a better approach to address data leakage in this scenario.
upvoted 2 times

  Jarek7 9 months, 3 weeks ago

Selected Answer: C

https://towardsdatascience.com/avoiding-data-leakage-in-timeseries-101-25ea13fcb15f
Directly says: "Dive straight into the MVP, cross-validate later!"
MVP stands for Minimum Viable Product
upvoted 1 times

  Liting 9 months, 3 weeks ago


Selected Answer: B

Agree with Paul_Dirac. Also it is recommended to use nested-cross-validation to avoid data leakage in time series data.
upvoted 1 times

  black_scissors 11 months ago

Selected Answer: C

There can be a feature causing data leakage which might have been overlooked. In addition, when cross-validation is done randomly, the leakage
can be even bigger.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: B

B
I agree with Paul_Dirac


upvoted 2 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: B

"You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning," so we have an estimator with base/default hyperparameters and overfitting is quite unlikely; it is a leakage problem. By inspection C is wrong, so it will be B.
upvoted 1 times

  David_ml 1 year, 11 months ago


Selected Answer: B

Quite tricky but through elimination, correct answer is B. Model overfitting doesn't apply here as we can't tell if a model is overfitting by just
looking at training data results.
upvoted 3 times

  Celia20210714 2 years, 9 months ago


ANS: D
(AUC ROC) value of 99% for training data after just a few experiments
>> overfitting
upvoted 2 times

  sensev 2 years, 9 months ago


D is incorrect since they mentioned that the initial model without sophisticated algorithm (e.g. model architecture) and without parameter
tuning already achieves 99% accuracy. This suggests data leakage, and especially so since they mentioned it is time-series data, which suggest
incorrect data split for train and evaluation.
upvoted 5 times

  Paul_Dirac 2 years, 9 months ago


No. We won't be able to know whether the model is overfitting just by looking at the training set alone.
upvoted 3 times


Question #49 Topic 1

You work for an online travel agency that also sells advertising placements on its website to other companies. You have been asked to predict the

most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms@p99,

the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to

implement the simplest solution. How should you configure the prediction pipeline?

A. Embed the client on the website, and then deploy the model on AI Platform Prediction.

B. Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.

C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the

user's navigation context, and then deploy the model on AI Platform Prediction.

D. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the

user's navigation context, and then deploy the model on Google Kubernetes Engine.

Correct Answer: B

Community vote distribution


B (50%) C (50%)
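A note on the "300ms@p99" requirement in the question: p99 latency is the value at or below which 99% of requests complete. A minimal nearest-rank sketch with made-up latencies:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct%
    of the samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# Hypothetical serving latencies in ms. "300ms@p99" means 99% of
# requests must complete within 300 ms.
latencies = [120] * 90 + [250] * 9 + [400]  # 100 samples, one slow outlier
print(percentile(latencies, 99))  # 250 — within the 300 ms target
```

This is why a tail-latency target constrains every hop in the pipeline (client, gateway, database lookup, model), not just the average case.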

  Paul_Dirac Highly Voted  2 years, 9 months ago

Security => not A.


B: doesn't handle processing with banner inventory.
D: deployment on GKE is less simple than on AI Platform. Besides, MemoryStore is in-memory while banners are stored persistently.
Ans: C
upvoted 12 times

  pinimichele01 4 days, 7 hours ago


B: doesn't handle processing with banner inventory ---> not true...
upvoted 1 times

  Celia20210714 Highly Voted  2 years, 9 months ago

ANS: C
GAE + IAP
https://medium.com/google-cloud/secure-cloud-run-cloud-functions-and-app-engine-with-api-key-73c57bededd1

Bigtable at low latency


https://cloud.google.com/bigtable#section-2
upvoted 6 times

  AnnaR Most Recent  3 days, 6 hours ago

Selected Answer: B

Was torn between B and C, but decided for B, because the question states how we should configure the PREDICTION pipeline!
Since the exploratory analysis already identified navigation context as good predictor, the focus should be on the prediction model itself.
upvoted 2 times

  gscharly 1 week, 1 day ago

Selected Answer: C

agree with Paul_Dirac


upvoted 1 times

  rightcd 1 month, 2 weeks ago


look at Q80
upvoted 2 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: B

I was torn between B and C.


But I really don't see the need for a DB
upvoted 3 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: B

Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
upvoted 1 times

  harithacML 9 months, 2 weeks ago


Selected Answer: B

secuirity (gateway) + Simplest(ai, not DB)


upvoted 1 times

  Liting 9 months, 3 weeks ago


Selected Answer: C

Bigtable is recommended for storage in the case scenario.


upvoted 1 times

  tavva_prudhvi 10 months ago


Selected Answer: C

B is also a possible solution, but it does not include a database for storing and retrieving the user's navigation context. This means that every time a
user visits a page, the gateway would need to query the website to retrieve the navigation context, which could be slow and inefficient. By using
Cloud Bigtable to store the navigation context, the gateway can quickly retrieve the context from the database and pass it to the model for
prediction. This makes the overall prediction pipeline more efficient and scalable. Therefore, C is a better option compared to B.
upvoted 3 times

  friedi 10 months, 1 week ago

Selected Answer: B

B is correct, C introduces computational overhead, unnecessarily increasing serving latency.


upvoted 1 times

  Voyager2 10 months, 4 weeks ago

Selected Answer: C

C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the
user's navigation context, and then deploy the model on AI Platform Prediction
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#choosing_a_nosql_database
Typical use cases for Bigtable are:
* Ad prediction that leverages dynamically aggregated values over all ad requests and historical data.
upvoted 1 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: C

Bigtable is a massively scalable NoSQL database service engineered for high throughput and for low-latency workloads. It can handle petabytes of
data, with millions of reads and writes per second at a latency that's on the order of milliseconds.

Typical use cases for Bigtable are:

Fraud detection that leverages dynamically aggregated values. Applications in Fintech and Adtech are usually subject to heavy reads and writes.
Ad prediction that leverages dynamically aggregated values over all ad requests and historical data.
Booking recommendation based on the overall customer base's recent bookings.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 1 times

  fredcaram 1 year ago

Selected Answer: B

The volume is too low for a Bigtable scenario


upvoted 1 times

  SergioRubiano 1 year, 1 month ago

Selected Answer: B

B.
simplest solution
upvoted 1 times

  alejandroverger 1 year, 1 month ago


Selected Answer: B

B. Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.

For the simplest solution, you can embed the client on the website to collect user data and send it to a gateway deployed on App Engine. App
Engine provides a scalable and cost-effective solution to handle web requests. Then, you can deploy your model on AI Platform Prediction, which
can handle the required latency (300ms@p99) and provides a managed solution for serving machine learning models.

Option A might not provide the necessary security by directly accessing AI Platform Prediction from the client side. Options C and D introduce
additional complexity by adding a database layer (Cloud Bigtable and Memorystore, respectively) that is not necessary for the simplest solution, as
you can use the navigation context directly from the client.
upvoted 1 times

  kucuk_kagan 1 year ago


gpt answer


upvoted 2 times


Question #50 Topic 1

Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-

premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-

to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include

any manual device placement and has not been wrapped in Estimator model-level abstraction. Which environment should you train your model on?

A. A VM on Compute Engine and 1 TPU with all dependencies installed manually.

B. A VM on Compute Engine and 8 GPUs with all dependencies installed manually.

C. A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.

D. A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.

Correct Answer: A

Community vote distribution


C (67%) D (20%) 13%

  celia20200410 Highly Voted  2 years, 9 months ago

ANS: C

to support CNN, you should use GPU.


for preliminary experiment, pre-installed pkgs/libs are good choice.

https://cloud.google.com/deep-learning-vm/docs/cli#creating_an_instance_with_one_or_more_gpus
https://cloud.google.com/deep-learning-vm/docs/introduction#pre-installed_packages
upvoted 14 times

  Paul_Dirac Highly Voted  2 years, 9 months ago


Code without manual device placement => default to CPU if TPU is present or to the lowest order GPU if multiple GPUs are present. => Not A, B.
D: already using CPU and needing GPU for CNN.
Ans: C
upvoted 12 times

  gscharly Most Recent  1 week, 2 days ago

Selected Answer: C

Agree with celia20200410 - C


upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago


Selected Answer: C

Agree with celia20200410 - C


upvoted 2 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: D

keyword: Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction.
upvoted 1 times

  Liting 9 months, 3 weeks ago

Selected Answer: C

Should use the deep learning VM with GPU.


TPU should be selected only if necessary, coz it incurs high cost. GPU in this case is enough.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 1 times

  Melampos 1 year ago


Selected Answer: A

thinking in fastest way


upvoted 1 times

  SergioRubiano 1 year, 1 month ago

Selected Answer: C


You should use GPU.


upvoted 1 times

  BenMS 1 year, 2 months ago


Selected Answer: D

Critical sentence: Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction.

So the only answer we have is D.


upvoted 2 times

  shankalman717 1 year, 2 months ago


Critical sentence: Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction.

So the only answer we have is D.


upvoted 3 times

  tavva_prudhvi 10 months ago


Option D provides a more powerful CPU but does not include a GPU, which may not be optimal for deep learning training.
upvoted 2 times

  ares81 1 year, 3 months ago


Selected Answer: C

It's C.
upvoted 1 times

  suresh_vn 1 year, 8 months ago


"has not been wrapped in Estimator model-level abstraction"
How you can use GPU?
D in my opinion, E-family using for high CPU tasks
upvoted 3 times

  Mohamed_Mossad 1 year, 9 months ago


Selected Answer: C

Answer C
========
Explanation
"speed up model training" will make us biased towards GPU,TPU options
By eliminating options, we may need to stay away from any manual installations, so using a preconfigured Deep Learning VM will speed up time to market.
upvoted 1 times

  mmona19 2 years ago

Selected Answer: A

The question is asking about speeding up time to market, which can happen if the model trains fast, so a TPU VM can be a solution:
https://cloud.google.com/blog/products/compute/introducing-cloud-tpu-vms (option A). If the question asked for the most managed way, then the answer would be a deep learning container with everything installed (C).
upvoted 1 times

  tavva_prudhvi 10 months ago


Option A with 1 TPU and option B with 8 GPUs might provide even faster training, but since the code does not include manual device
placement, it may not utilize all the available resources effectively.
upvoted 2 times

  maukaba 7 months, 1 week ago


Instead If you have a single GPU, TensorFlow will use this accelerator to speed up model training with no extra work on your part:
https://codelabs.developers.google.com/vertex-p2p-distributed#2
Normally you don't use just one TPU and for both GPUs and TPUs it is necessary to define a distributed training strategy:
https://www.tensorflow.org/guide/distributed_training
upvoted 1 times

  NamitSehgal 2 years, 3 months ago


C is correct
upvoted 1 times

  GCP_Guru 2 years, 4 months ago

Selected Answer: C

C is correct.
upvoted 2 times


Question #51 Topic 1

You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models,

and versions in a clean and scalable way. Which strategy should you choose?

A. Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.

B. Separate each data scientist's work into a different project to ensure that the jobs, models, and versions created by each data scientist are

accessible only to that user.

C. Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by

label when viewing or monitoring the resources.

D. Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In

BigQuery, create a SQL view that maps users to the resources they are using

Correct Answer: A

Community vote distribution


C (100%)

  chohan Highly Voted  2 years, 10 months ago

I think should be C,
As IAM roles are given to the entire AI Notebook resource, not to a specific instance.
upvoted 13 times

  celia20200410 Highly Voted  2 years, 9 months ago


ans: c

https://cloud.google.com/ai-platform/prediction/docs/resource-labels#overview_of_labels
You can add labels to your AI Platform Prediction jobs, models, and model versions, then use those labels to organize resources into categories
when viewing or monitoring the resources.

For example, you can label jobs by team (such as engineering or research) and development phase (prod or test), then filter the jobs based on the
team and phase.

Labels are also available on operations, but these labels are derived from the resource to which the operation applies. You cannot add or update
labels on an operation.

A label is a key-value pair, where both the key and the value are custom strings that you supply.
upvoted 9 times
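A toy illustration of the label mechanism quoted above — filtering resources by key-value labels. The resource names and label values here are invented, and real AI Platform filtering happens server-side via the API or console, not with local Python like this:

```python
# Each resource carries string key-value labels, mirroring how AI
# Platform jobs/models/versions can be labeled (e.g. by team and phase).
resources = [
    {"name": "job-ctr-model-1", "labels": {"team": "research", "phase": "test"}},
    {"name": "job-ranker-7",    "labels": {"team": "engineering", "phase": "prod"}},
    {"name": "model-ranker-v2", "labels": {"team": "engineering", "phase": "prod"}},
]

def filter_by_labels(items, **wanted):
    """Return names of resources whose labels match every requested pair."""
    return [r["name"] for r in items
            if all(r["labels"].get(k) == v for k, v in wanted.items())]

print(filter_by_labels(resources, team="engineering", phase="prod"))
# ['job-ranker-7', 'model-ranker-v2']
```

This is why C scales to 50+ data scientists: one shared project, with categories expressed as labels rather than as per-user projects or IAM walls.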

  vivid_cucumber 2 years, 5 months ago


I read through this page: https://cloud.google.com/ai-platform/prediction/docs/sharing-models. This one sounds more like A. Is isn't that
correct? I am not quite sure.
upvoted 1 times

  vivid_cucumber 2 years, 5 months ago


or maybe A is not correct because "sharing models using IAM" only applies to "manage access to resource" but this question is more like
asking to "organize jobs, models, and versions". not sure if my understanding is right or not.
upvoted 1 times

  Sum_Sum Most Recent  5 months, 2 weeks ago

C
Although there are some questions where setting up a logging sink to BQ is the answer.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  BenMS 1 year, 2 months ago

Selected Answer: C

Restricting access is not scalable and creates silos - better to document sharable resources through tagging, hence C.
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: C


C
Resource tagging/labeling is the best way to manage ML resources for medium/big data science teams.
upvoted 1 times

  ggorzki 2 years, 3 months ago

Selected Answer: C

https://cloud.google.com/ai-platform/prediction/docs/resource-labels#overview_of_labels
(A) applies only to notebooks wich is not enough
upvoted 4 times


Question #52 Topic 1

You are training a deep learning model for semantic image segmentation with reduced training time. While using a Deep Learning VM Image, you

receive the following error: The resource 'projects/deeplearning-platform/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80' was not found.

What should you do?

A. Ensure that you have GPU quota in the selected region.

B. Ensure that the required GPU is available in the selected region.

C. Ensure that you have preemptible GPU quota in the selected region.

D. Ensure that the selected GPU has enough GPU memory for the workload.

Correct Answer: A

Community vote distribution


B (93%) 7%

  celia20200410 Highly Voted  2 years, 9 months ago

ANS: B
https://cloud.google.com/deep-learning-vm/docs/troubleshooting#resource_not_found

https://cloud.google.com/compute/docs/gpus/gpu-regions-zones

Resource not found


Symptom: - The resource 'projects/deeplearning-platform/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80' was not found

Problem: You are trying to create an instance with one or more GPUs in a region where GPUs are not available (for example, an instance with a K80
GPU in europe-west4-c).

Solution: To determine which region has the required GPU, see GPUs on Compute Engine.
upvoted 21 times

  stomcarlo Highly Voted  2 years, 10 months ago

it is B, the error message relates to Quota is different:


https://cloud.google.com/deep-learning-vm/docs/troubleshooting#resource_not_found
upvoted 8 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: B

Not all resources can be found in any region. Therefore - B


upvoted 1 times

  abhay669 5 months ago

Selected Answer: B

It is clearly mentioned here: https://cloud.google.com/deep-learning-vm/docs/troubleshooting


upvoted 1 times

  Sum_Sum 5 months, 2 weeks ago

Selected Answer: B

B - because it's "cant be found"


upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  BenMS 1 year, 2 months ago


Selected Answer: B

The error says the resource was not found - hence B.


If quota was the problem (A) then you'd see a different error message.
upvoted 2 times

  hiromi 1 year, 4 months ago

Selected Answer: B

B obviously
upvoted 2 times


  _luigi_ 2 years ago

Selected Answer: B

The resource is not found because it doesn't exist in the region.


upvoted 3 times

  mmona19 2 years ago

Selected Answer: A

The question is asking what you should do, not why the error occurred.
The answer should be A: if you get that exception, make sure to check your quota limit for the instance before running the job.
upvoted 1 times

  ggorzki 2 years, 3 months ago


Selected Answer: B

https://cloud.google.com/deep-learning-vm/docs/troubleshooting#resource_not_found
upvoted 2 times


Question #53 Topic 1

Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written. You have a large

training dataset that is structured like this:

You followed the standard 80%-10%-10% data distribution across the training, testing, and evaluation subsets. How should you distribute the

training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?

A. Distribute texts randomly across the train-test-eval subsets: Train set: [TextA1, TextB2, ...] Test set: [TextA2, TextC1, TextD2, ...] Eval set:

[TextB1, TextC2, TextD1, ...]

B. Distribute authors randomly across the train-test-eval subsets: (*) Train set: [TextA1, TextA2, TextD1, TextD2, ...] Test set: [TextB1, TextB2,

...] Eval set: [TexC1,TextC2 ...]

C. Distribute sentences randomly across the train-test-eval subsets: Train set: [SentenceA11, SentenceA21, SentenceB11, SentenceB21,

SentenceC11, SentenceD21 ...] Test set: [SentenceA12, SentenceA22, SentenceB12, SentenceC22, SentenceC12, SentenceD22 ...] Eval set:

[SentenceA13, SentenceA23, SentenceB13, SentenceC23, SentenceC13, SentenceD31 ...]

D. Distribute paragraphs of texts (i.e., chunks of consecutive sentences) across the train-test-eval subsets: Train set: [SentenceA11,

SentenceA12, SentenceD11, SentenceD12 ...] Test set: [SentenceA13, SentenceB13, SentenceB21, SentenceD23, SentenceC12, SentenceD13

...] Eval set: [SentenceA11, SentenceA22, SentenceB13, SentenceD22, SentenceC23, SentenceD11 ...]

Correct Answer: C

Community vote distribution


B (92%) 8%

  rc380 Highly Voted  2 years, 8 months ago

I think since we are predicting the political leaning of authors, perhaps distributing authors makes more sense? (B)
upvoted 18 times

  sensev 2 years, 8 months ago


Agree it should be B. Since every author has his/her distinct style, splitting different texts from the same author across different sets could result in data label leakage.
upvoted 7 times

  dxxdd7 2 years, 7 months ago


I don't agree as we want to know the political affiliation from a text and not based on an author. I think A is better
upvoted 1 times

  jk73 2 years, 7 months ago


It is the political affiliation from a text, but to whom does that text belong?
The statement clearly says ... Predict political affiliation of authors based on articles they have written. Hence the political affiliation is for
each author according to the text he wrote.
upvoted 2 times

  jk73 2 years, 7 months ago


Exactly! I also consider it B.
Check this out:
If we randomly put texts, paragraphs, or sentences into the training, validation, and test sets, the model will be able to learn specific qualities of each author's use of language beyond just his own articles, so it will mix up different opinions.
If instead we divide things up at the author level, so that a given author appears only in the training data, only in the test data, or only in the validation data, the model will find it more difficult to get a high accuracy on validation and test (which is correct and makes more sense), because it will need to generalize author by author rather than infer a single political affiliation from a bunch of mixed articles from different authors.

https://developers.google.com/machine-learning/crash-course/18th-century-literature
upvoted 12 times
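The author-level split described above can be sketched in plain Python; the function name `split_by_author` and the 80/10/10 fractions are illustrative, not from any library:

```python
import random

def split_by_author(texts_by_author, train_frac=0.8, eval_frac=0.1, seed=42):
    """Partition AUTHORS (not individual texts) into train/eval/test,
    so no author's writing style leaks across splits."""
    authors = sorted(texts_by_author)
    random.Random(seed).shuffle(authors)
    n_train = int(len(authors) * train_frac)
    n_eval = int(len(authors) * eval_frac)
    groups = {
        "train": authors[:n_train],
        "eval": authors[n_train:n_train + n_eval],
        "test": authors[n_train + n_eval:],
    }
    # Flatten back into (text, author) examples for each split.
    return {
        split: [(text, a) for a in names for text in texts_by_author[a]]
        for split, names in groups.items()
    }
```

Every one of an author's texts lands in exactly one split, which is the partitioning the crash-course video linked above recommends for this task.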

  inder0007 Highly Voted  2 years, 9 months ago


Should be A, we are trying to get a label on the entire text so only A makes sense
upvoted 8 times

  GogoG 2 years, 6 months ago


Correct answer is B - https://developers.google.com/machine-learning/crash-course/18th-century-literature
upvoted 5 times

  Dunnoth 1 year, 2 months ago


This is a known study. If you use A, the moment a new author appears in the test set, the accuracy is way lower than your metrics might
suggest. To have realistic evaluation results it should be B. Also note that the label is for the "author", not a text.
upvoted 1 times

  tavva_prudhvi Most Recent  10 months ago

Selected Answer: B

This is the best approach as it ensures that the data is distributed in a way that is representative of the overall population. By randomly distributing
authors across the subsets, we ensure that each subset has a similar distribution of political affiliations. This helps to minimize bias and increases
the likelihood that our model will generalize well to new data.

Distributing texts randomly or by sentences or paragraphs may result in subsets that are biased towards a particular political affiliation. This could
lead to overfitting and poor generalization performance. Therefore, it is important to distribute the data in a way that maintains the overall
distribution of political affiliations across the subsets.
upvoted 3 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago

Selected Answer: B

https://cloud.google.com/automl-tables/docs/prepare#split
https://developers.google.com/machine-learning/crash-course/18th-century-literature
upvoted 1 times

  enghabeth 1 year, 2 months ago


Selected Answer: B

Ans B
The model is to predict which political party the author belongs to, not which political party the text belongs to... You do not have the information
of the political party of each text, you are assuming that the texts are associated with the political party of the author.
upvoted 1 times

  bL357A 1 year, 7 months ago


Selected Answer: A

label is party, feature is text


upvoted 1 times

  suresh_vn 1 year, 8 months ago


IMO, B is correct
A, C, D have label leakage
upvoted 1 times

  ggorzki 2 years, 3 months ago

Selected Answer: B

https://developers.google.com/machine-learning/crash-course/18th-century-literature
Split by authors, otherwise there will be data leakage - the model will get the ability to learn author-specific use of language
upvoted 6 times

  NamitSehgal 2 years, 3 months ago


B I agree
upvoted 1 times

  JobQ 2 years, 4 months ago


I already saw the video in: https://developers.google.com/machine-learning/crash-course/18th-century-literature

Based on this video I concluded that the answer is A. What answer B is saying is that you will have Author B's texts in the training set, Author A's
texts in the testing set and Author C's texts in the validation set. According to the video B is incorrect.

We want to have texts from author A in the training, testing and validation set. So A is correct. I think most people are choosing B because the
word "author" but let's be careful.
upvoted 2 times

  giaZ 2 years, 1 month ago


I thought the same initially, but no. We'd want texts from author A in the training, testing and validation sets if the task was to predict the author
from a text (meaning, if the label was the author..right? You train the model to learn the style of text and connect it to an author. You'd need
new texts from the same author in the test and validation sets, to see if the model is able to recognize him/her). HERE, the task is to predict
political affiliation from a text of an author. The author is given. In the test and validation sets you need new authors, to see whether the model is
able to guess their political affiliation. So you would do 80 authors (and corresponding texts) for training, 10 different authors for validation, and
10 different ones for test.
upvoted 5 times

  pddddd 2 years, 7 months ago


Partition by author - there is an actual example in Coursera 'Production ML systems' course
upvoted 1 times

  Macgogo 2 years, 7 months ago


I think it is B.
--
Your test data includes data from populations that will not be represented in production.

For example, suppose you are training a model with purchase data from a number of stores. You know, however, that the model will be used
primarily to make predictions for stores that are not in the training data. To ensure that the model can generalize to unseen stores, you should
segregate your data sets by stores. In other words, your test set should include only stores different from the evaluation set, and the evaluation set
should include only stores different from the training set.
https://cloud.google.com/automl-tables/docs/prepare#ml-use
upvoted 4 times

  Danny2021 2 years, 7 months ago


Should be D. Please see the dataset provided, it is based on the text / paragraphs.
upvoted 1 times

  george_ognyanov 2 years, 6 months ago


Have a look at the link the others have already provided twice. Splitting sentence by sentence is literally mentioned in said video as a bad
example and something we should not do in this case.
upvoted 1 times

Question #54 Topic 1

Your team has been tasked with creating an ML solution in Google Cloud to classify support requests for one of your platforms. You analyzed the requirements and decided to use TensorFlow to build the classifier so that you have full control of the model's code, serving, and deployment. You will use Kubeflow pipelines for the ML platform. To save time, you want to build on existing resources and use managed services instead of building a completely new model. How should you build the classifier?

A. Use the Natural Language API to classify support requests.

B. Use AutoML Natural Language to build the support requests classifier.

C. Use an established text classification model on AI Platform to perform transfer learning.

D. Use an established text classification model on AI Platform as-is to classify support requests.

Correct Answer: D

Community vote distribution: C (91%), other (9%)

  arbik Highly Voted  2 years, 9 months ago

ANS: C as you want to have full control of the model code.


upvoted 27 times

  Celia20210714 Highly Voted  2 years, 9 months ago

ANS: D

https://cloud.google.com/ai-platform/training/docs/algorithms
- to use TensorFlow
- to build on existing resources
- to use managed services
upvoted 11 times

  george_ognyanov 2 years, 6 months ago


While D is very close for me, I think there are 2 giveaways here:
"To save time, you want to build on existing resources" - transfer learning
"instead of building a completely new model" - answer D leaves the model as is

ANS: C
upvoted 2 times

  ms_lemon 2 years, 6 months ago


the model cannot work as-is as the classes to predict will likely not be the same; we need to use transfer learning to retrain the last layer and
adapt it to the classes we need, hence C
upvoted 6 times

  M25 Most Recent  11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 3 times

  Dunnoth 1 year, 2 months ago


Selected Answer: C

With TensorFlow, you can build a simple model by using a sentence embedding and a single-layer classifier.
upvoted 1 times
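The "sentence embedding + single-layer classifier" idea above can be illustrated with a toy stand-in: a frozen bag-of-words "encoder" (a real solution would instead reuse a pretrained TensorFlow text model as the frozen base) plus a trainable logistic-regression head. All function names and the tiny dataset are hypothetical:

```python
import math

def build_vocab(examples):
    """Collect a token index from the training texts."""
    vocab = {}
    for text, _ in examples:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def embed(text, vocab):
    """Stand-in for a frozen pretrained encoder: a bag-of-words vector."""
    v = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    return v

def train_head(examples, vocab, lr=0.5, epochs=100):
    """Train only the logistic-regression 'head' on frozen embeddings --
    the transfer-learning pattern: reuse the base, retrain the last layer."""
    w, b = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = embed(text, vocab)
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - label                      # d(log loss)/dz
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def classify(text, vocab, w, b):
    x = embed(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Only the head's weights are updated; the "encoder" stays fixed, which is why transfer learning saves time compared with training a completely new model.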

  enghabeth 1 year, 2 months ago

Selected Answer: D

you don't need transfer learning in this case


upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: C

- "You analyzed the requirements and decided to use TensorFlow" narrows the choices down to C and D
- "so that you have full control of the model's code" makes us choose C
upvoted 2 times

  David_ml 1 year, 11 months ago


Selected Answer: C

Answer is C.
upvoted 1 times

  MasterMath 2 years ago


In my opinion it is B.
A is not correct as it only uses an API call, so we wouldn't be building the system on existing resources.
For C & D, I don't see an established text classification model in AI Platform (Vertex AI) that can be used.
B is the right answer: you have the labeled data, so you drop the custom TF code and build a classifier with AutoML Natural
Language.
upvoted 1 times

  David_ml 1 year, 11 months ago


B is wrong. The question says "you have full control of the model's code". You don't have full control of AutoML code. The right answer is C.
upvoted 1 times

  giaZ 2 years, 1 month ago

Selected Answer: C

"full control of the model's code, serving, and deployment": Not A nor B.
and "you want to build on existing resources and use managed services": Not D (that's "as-is") You want transfer learning.
upvoted 3 times

  NamitSehgal 2 years, 3 months ago


C is correct
upvoted 1 times

  george_ognyanov 2 years, 6 months ago


ANS: C according to me as well. As arbik said, full control, custom model are give aways.
upvoted 1 times

Question #55 Topic 1

You recently joined a machine learning team that will soon release a new project. As a lead on the project, you are asked to determine the production readiness of the ML components. The team has already tested features and data, model development, and infrastructure. Which additional readiness check should you recommend to the team?

A. Ensure that training is reproducible.

B. Ensure that all hyperparameters are tuned.

C. Ensure that model performance is monitored.

D. Ensure that feature expectations are captured in the schema.

Correct Answer: A

Community vote distribution: C (94%), other (6%)

  inder0007 Highly Voted  2 years, 9 months ago

I think it should be C
upvoted 20 times

  omar_bh 2 years, 9 months ago


performance monitoring is a continuous effort that happens all the time, but reproducibility makes more sense as an addition to model QA
upvoted 3 times

  sensev 2 years, 9 months ago


The question was not about model QA but production readiness, thus I think the answer is C because monitoring model performance in
production is important. As for A, I would argue it could fall under "model development", since reproducible training is already
important during model development.
upvoted 3 times

  vivid_cucumber 2 years, 5 months ago


To my understanding, I think A might be correct since model performance monitoring happens "in production", but the question says
the project "will soon release", which means right now is before launching, so to me testing reproducibility makes more sense. (I
was confused about A and C for a long time)
reference:
- Testing reproducibility: https://developers.google.com/machine-learning/testing-debugging/pipeline/deploying
- Testing in Production: https://developers.google.com/machine-learning/testing-debugging/pipeline/production
upvoted 6 times

  simoncerda 2 years, 4 months ago


I also think is C:
reference :
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
upvoted 1 times

  ralf_cc Highly Voted  2 years, 9 months ago

A - important one before moving to the production


upvoted 9 times

  salsabilsf 2 years, 9 months ago


Testing for Deploying Machine Learning Models:
- Test Model Updates with Reproducible Training
https://developers.google.com/machine-learning/testing-debugging/pipeline/deploying
upvoted 5 times

  fragkris Most Recent  4 months, 3 weeks ago

Selected Answer: C

Monitoring is crucial. So - C
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 1 times

  e707 1 year ago


Selected Answer: C

I'll go with C.
Monitoring model performance is an important aspect of production readiness. It allows the team to detect and respond to changes in
performance that may affect the quality of the model. The other options are also important, but they are more focused on the development phase
of the project rather than the production phase.
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago


Selected Answer: C

Hey all,

A + B + D = "The team has already tested features and data, model development, and infrastructure." We are about to go live with production, so monitoring readiness is the last thing left to account for.

It would be very reckless to launch a model to production without any plan for monitoring: would you run the model in production for a while and only make a plan for performance monitoring later?

Please read these carefully:
https://developers.google.com/machine-learning/testing-debugging/pipeline/production
https://developers.google.com/machine-learning/testing-debugging/pipeline/overview#what-is-an-ml-pipeline

Most people prefer A (https://developers.google.com/machine-learning/testing-debugging/pipeline/deploying), but I think that page is all about model development prior to deploying.
upvoted 4 times

  enghabeth 1 year, 2 months ago


Selected Answer: C

I think the team already ensured that all hyperparameters were tuned when they tested features... I think it's more important that they ensure that
model performance is monitored than that training is reproducible, per best practices.
https://cloud.google.com/architecture/ml-on-gcp-best-practices
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


Selected Answer: C

Reproducible training is more likely part of the deployment step, which the question already covers with "The team has already tested features and data,
model development"; the question focuses on production readiness.
https://developers.google.com/machine-learning/testing-debugging/pipeline/production
Monitor section is part of this above link
upvoted 1 times

  ares81 1 year, 3 months ago


Selected Answer: C

C, for me.
upvoted 1 times

  vakati 1 year, 5 months ago

Selected Answer: C

It's mentioned that the team has already tested features and data, implying that data generation is reproducible. If you have to test features, data
has to be reproducible to compare model outputs
(https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/randomization). Hence C makes more sense.
upvoted 2 times

  bL357A 1 year, 7 months ago

Selected Answer: C

https://cloud.google.com/ai-platform/docs/ml-solutions-overview
upvoted 1 times

  u_phoria 1 year, 10 months ago

Selected Answer: C

With the specific focus on "production readiness" as stated, I'd pick C above the others.
upvoted 2 times

  KD1988 1 year, 10 months ago


I think it's C.
A is related to infrastructure, B is related to model development, and D is related to data and features. It's clearly mentioned that the team has already
tested model development, data and features, and infrastructure.
upvoted 1 times

  Mohamed_Mossad 1 year, 10 months ago

Selected Answer: A

"Production readiness" means that we are still in the dev-test phase, and "performance monitoring" happens in production. And what if monitoring is applied but model re-training is difficult? So "A" is the best answer.
upvoted 1 times

  abc0000 2 years, 2 months ago


A makes more sense than C.
upvoted 2 times

Question #56 Topic 1

You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize detection of fraudulent transactions while minimizing false positives. Which optimization objective should you use when training the model?

A. An optimization objective that minimizes Log loss

B. An optimization objective that maximizes the Precision at a Recall value of 0.50

C. An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value

D. An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value

Correct Answer: C

Community vote distribution: C (88%), other (13%)

  Paul_Dirac Highly Voted  2 years, 9 months ago

This is a case of imbalanced data.


Ans: C
https://stats.stackexchange.com/questions/262616/roc-vs-precision-recall-curves-on-imbalanced-dataset

https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc
upvoted 20 times

  GogoG 2 years, 6 months ago


C is wrong - correct answer is D. ROC basically compares true positives against false negatives, exactly what we are trying to optimise for.
upvoted 2 times

  ralf_cc Highly Voted  2 years, 9 months ago

D - https://en.wikipedia.org/wiki/Receiver_operating_characteristic
upvoted 8 times

  omar_bh 2 years, 9 months ago


True. The true positive rate is on the Y axis. The bigger the area under the curve, the higher the TP ratio.
upvoted 2 times

  tavva_prudhvi 9 months, 1 week ago


A larger area under the ROC curve does indicate better model performance in terms of correctly identifying true positives. However, it does
not take into account the imbalance in the class distribution or the costs associated with false positives and false negatives.

In contrast, the AUC PR curve focuses on the trade-off between precision (Y-axis) and recall (X-axis), making it more suitable for imbalanced
datasets and applications with different costs for false positives and false negatives, like credit card fraud detection.
upvoted 2 times

  tavva_prudhvi 9 months, 1 week ago


AUC ROC is more suitable when the class distribution is balanced and false positives and false negatives have similar costs.

In the case of credit card fraud detection, the class distribution is typically imbalanced (fewer fraudulent transactions compared to non-
fraudulent ones), and the cost of false positives (incorrectly identifying a transaction as fraudulent) and false negatives (failing to detect a
fraudulent transaction) are not the same.

By maximizing the AUC PR (area under the precision-recall curve), the model focuses on the trade-off between precision (proportion of true
positives among predicted positives) and recall (proportion of true positives among actual positives), which is more relevant in imbalanced
datasets and for applications where the costs of false positives and false negatives are not equal. This makes option C a better choice for credit
card fraud detection.
upvoted 2 times

  tavva_prudhvi Most Recent  10 months ago

Selected Answer: C

In fraud detection, it's crucial to minimize false positives (transactions flagged as fraudulent but actually legitimate) while still detecting as many
fraudulent transactions as possible. AUC PR is a suitable optimization objective for this scenario because it provides a balanced trade-off between
precision and recall, which are both important metrics in fraud detection. A high AUC PR value indicates that the model has high precision and
recall, which means it can detect a large number of fraudulent transactions while minimizing false positives.

Log loss (A) and AUC ROC (D) are also commonly used optimization objectives in machine learning, but they may not be as effective in this
particular scenario. Precision at a Recall value of 0.50 (B) is a specific metric and not an optimization objective.
upvoted 4 times
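A quick numeric sketch of this point, with made-up counts: on an imbalanced fraud dataset the false positive rate (the ROC x-axis) can look excellent while precision exposes a real problem:

```python
def rates(tp, fp, fn, tn):
    """Confusion-matrix rates behind the two curves discussed above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # a.k.a. true positive rate, the ROC y-axis
    fpr = fp / (fp + tn)      # the ROC x-axis
    return precision, recall, fpr

# 100 fraudulent vs 99,900 legitimate transactions; the model catches
# 90 frauds but also flags 900 legitimate transactions.
precision, recall, fpr = rates(tp=90, fp=900, fn=10, tn=99_000)
```

Here the FPR is under 1% (great-looking on a ROC curve), yet precision is only about 9%: ten false alarms for every real fraud, exactly the kind of problem the PR view surfaces.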

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago


Selected Answer: C

Hi Everyone
I discovered some clues that this question likely refers to the last section of https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
This is what it tries to tell us especially with the last sentence
Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives,
it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize
minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization.

Additionally, it tells me which of the following choices is the answer to this question as below.
https://cloud.google.com/automl-tables/docs/train#opt-obj.
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: D

What is different however is that ROC AUC looks at a true positive rate TPR and false positive rate FPR while PR AUC looks at positive predictive
value PPV and true positive rate TPR.

Detect Fraudulent transactions = Max TP


Minimizing false positives -> min FP

https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-
auc#:~:text=ROC%20AUC%20vs%20PR%20AUC&text=What%20is%20different%20however%20is,and%20true%20positive%20rate%20TPR
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: C

Detection of fraudulent transactions is typically an imbalanced-data problem.


https://cloud.google.com/automl-tables/docs/train#opt-obj
AUC ROC : Distinguish between classes. Default value for binary classification.

AUC PR: Optimize results for predictions for the less common class.
It is straightforward to answer; you just have to catch the keyword (roughly balanced vs. imbalanced).
https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/

When to Use ROC vs. Precision-Recall Curves?


Generally, the use of ROC curves and precision-recall curves are as follows:

ROC curves should be used when there are roughly equal numbers of observations for each class.
Precision-Recall curves should be used when there is a moderate to large class imbalance.
upvoted 3 times

  ares81 1 year, 3 months ago

Selected Answer: C

Fraud Detection --> Imbalanced Dataset ---> AUC PR --> C, for me


upvoted 1 times

  wish0035 1 year, 4 months ago

Selected Answer: C

ans: C
Paul_Dirac and giaZ are correct.
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: C

C
https://towardsdatascience.com/on-roc-and-precision-recall-curves-c23e9b63820c
upvoted 2 times

  itallix 1 year, 7 months ago


"You need to prioritize detection of fraudulent transactions while minimizing false positives."
Seems that answer B fits this well. If we want to focus exactly on minimizing false positives we can do that by maximising Precision at a specific
Recall value. C is about balance between these two, and D doesn't care about false positive/negatives.
upvoted 2 times
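The "Precision at a Recall value" objective mentioned above can be sketched as a threshold sweep over ranked scores (the function name and data are illustrative; it assumes at least one positive label):

```python
def precision_at_recall(scores, labels, target_recall=0.5):
    """Best precision among operating points whose recall meets the target,
    obtained by sweeping the decision threshold down the ranked scores."""
    ranked = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    best = 0.0
    tp = fp = 0
    for _, label in ranked:
        tp += label
        fp += 1 - label
        if tp / total_pos >= target_recall:
            best = max(best, tp / (tp + fp))
    return best
```

Each prefix of the ranked list is one operating point on the precision-recall curve; the objective keeps only those meeting the recall floor and maximizes precision among them.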

  suresh_vn 1 year, 8 months ago

Selected Answer: D

D
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
C optimizes precision only

upvoted 1 times

  suresh_vn 1 year, 8 months ago


Sorry, C is my final decision
https://cloud.google.com/automl-tables/docs/train#opt-obj
upvoted 1 times

  rtnk22 1 year, 8 months ago


Selected Answer: C

Answer is c.
upvoted 1 times

  giaZ 2 years, 1 month ago


https://icaiit.org/proceedings/6th_ICAIIT/1_3Fayzrakhmanov.pdf
Fraudulent transaction detection is an imbalanced classification problem (most transactions are not fraudulent), so you want to
maximize both precision and recall, i.e. the area under the PR curve. As a matter of fact, the question asks you to focus on detecting fraudulent
transactions (maximize the true positive rate, a.k.a. recall) while minimizing false positives (a.k.a. maximizing precision). Another way to see it: for
imbalanced problems like this one you'll get a lot of true negatives even from a bad model (it's easy to guess a transaction as "non-fraudulent"
because most of them are!), and with high TN the ROC curve rises quickly, which would be misleading. So you want to avoid true
negatives in your evaluation, which is precisely what the PR curve allows you to do.
upvoted 6 times

  ramen_lover 2 years, 4 months ago


The following is the official document for the list of optimization objectives for AutoML Tables
"About model optimization objectives"
https://cloud.google.com/automl-tables/docs/train#opt-obj

AUC PR: Optimize results for predictions for the less common class.
upvoted 3 times

  attaraya 2 years, 5 months ago


I also vote for Ans: C since this is outlier detection, which is imbalanced. So the best metric to evaluate the model is AUC-PR.
upvoted 2 times

Question #57 Topic 1

Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company's website. Which result should you use to determine whether the model is successful?

A. The model predicts videos as popular if the user who uploads them has over 10,000 likes.

B. The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.

C. The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.

D. The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.

Correct Answer: C

Community vote distribution: C (100%)

  Paul_Dirac Highly Voted  2 years, 10 months ago

Ans: C (See https://developers.google.com/machine-learning/problem-framing/framing#quantify-it; though it's just an example.)


(A) The absolute number of likes shouldn't be used because no information about subscribers or visits to the website is provided. The number may
vary.
(B) Clickbait videos are a subset of uploaded videos. Using them is an improper criterion.
(D) The coefficient should reach 1. (Ref:https://arxiv.org/pdf/1510.06223.pdf)
upvoted 16 times

  sensev 2 years, 9 months ago


Thanks for the detailed answer and reference!
upvoted 5 times

  M25 Most Recent  11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  wish0035 1 year, 4 months ago


ans: C
In this type of question, I think a good idea is to copy already existing solutions. For this case, YouTube cares a lot about watch time. In a
previous question, Amazon implemented "Usually buy together" for maximizing profit.
upvoted 3 times

  hiromi 1 year, 4 months ago

Selected Answer: C

Must be C
upvoted 1 times

  Mohamed_Mossad 1 year, 9 months ago

Selected Answer: C

Watch time, among all the options, is the most reliable KPI.
upvoted 2 times

  baimus 2 years, 1 month ago


I think this is B. The question specifies "popular" and also that "newly uploaded" videos need prioritising. C is therefore wrong because you don't
have that metric until 30 days has passed from upload time. "Click through rate" is one measure of popularity, so it fits, and is instant.
upvoted 1 times

  NamitSehgal 2 years, 3 months ago


C looks correct.
upvoted 1 times

  celia20200410 2 years, 9 months ago


ANS: C

D is wrong.
Pearson's correlation coefficient is a linear correlation coefficient that returns a value between -1 and +1.
A -1 means there is a strong negative correlation
+1 means that there is a strong positive correlation
0 means that there is no correlation
upvoted 3 times
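The coefficient described above can be computed in a few lines of plain Python (`pearson` is just an illustrative name):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: +1 strong positive, -1 strong negative,
    0 no linear correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

So a coefficient of 0 between views after 7 days and after 30 days (option D) would mean early views say nothing about later popularity, the opposite of what a successful predictor needs.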

Question #58 Topic 1

You are working on a Neural Network-based project. The dataset provided to you has columns with different ranges. While preparing the data for model training, you discover that gradient optimization is having difficulty moving weights to a good solution. What should you do?

A. Use feature construction to combine the strongest features.

B. Use the representation transformation (normalization) technique.

C. Improve the data cleaning step by removing features with missing values.

D. Change the partitioning step to reduce the dimension of the test set and have a larger training set.

Correct Answer: C

Community vote distribution: B (92%), other (8%)

  kurasaki Highly Voted  2 years, 9 months ago

Vote for B. We could impute instead of removing the column, to avoid loss of information.
upvoted 25 times

  pddddd Highly Voted  2 years, 7 months ago

I also think it is B:
"The presence of feature value X in the formula will affect the step size of the gradient descent. The difference in ranges of features will cause
different step sizes for each feature. To ensure that the gradient descent moves smoothly towards the minima and that the steps for gradient
descent are updated at the same rate for all the features, we scale the data before feeding it to the model."
upvoted 8 times
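The scaling described in the quote above can be sketched as a z-score transform (plain Python; the column values are made up):

```python
def standardize(column):
    """Z-score normalization: rescale a column to mean 0 and stddev 1 so
    gradient descent takes comparably sized steps for every feature."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [(x - mean) / std for x in column]

# Columns with very different ranges, as in the question.
income = [30_000, 60_000, 90_000]   # spread on the order of 10^4
age = [25, 40, 55]                  # spread on the order of 10^1
income_n, age_n = standardize(income), standardize(age)
```

After the transform both columns live on the same scale, so no single feature dominates the gradient step size.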

  MultiCloudIronMan Most Recent  4 weeks ago

Selected Answer: B

Because the ranges need to be normalized


upvoted 1 times

  fragkris 4 months, 3 weeks ago


Selected Answer: B

B - The key phrase is "different ranges", therefore we need to normalize the values.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  SergioRubiano 12 months ago

Selected Answer: B

Normalization
upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: B

Normalization is the word.


upvoted 2 times

  ares81 1 year, 3 months ago

Selected Answer: C

Normalization is the word.


upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: B

B
"Normalization" is the keyword
upvoted 1 times

  ggorzki 2 years, 3 months ago

Selected Answer: B

normalization
https://developers.google.com/machine-learning/data-prep/transform/transform-numeric
upvoted 4 times

  MK_Ahsan 2 years, 3 months ago


B. The problem does not mention anything about missing values. We need to normalize the features with different ranges.
upvoted 4 times

  NamitSehgal 2 years, 3 months ago


Looking at explanation I would choose C as well
upvoted 1 times

  kaike_reis 2 years, 5 months ago


(B)
- NN models need features with close ranges
- SGD converges well using features in [0, 1] scale
- The question specifically mentions "different ranges"
Documentation - https://developers.google.com/machine-learning/data-prep/transform/transform-numeric
upvoted 3 times

  Y2Data 2 years, 7 months ago


When gradient descent fails, it's usually due to the lack of a powerful feature. Using normalization would make it worse.
Instead, using either A or C would increase the strength of certain features.
But C should come first, since A is only feasible after at least one meaningful training run.
So C.
upvoted 2 times

  ralf_cc 2 years, 9 months ago


B - remove the outliers?
upvoted 3 times

  omar_bh 2 years, 9 months ago


Normalization is more complicated than that.

Normalization changes the values of dataset's numeric fields to be in a common scale, without impacting differences in the ranges of values.
Normalization is required only when features have different ranges.
upvoted 4 times


Question #59 Topic 1

Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the

accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their

experiments while minimizing manual effort?

A. Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.

B. Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.

C. Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the

Monitoring API.

D. Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the

Google Sheets API.

Correct Answer: B

Community vote distribution


A (69%) C (27%) 4%

  Celia20210714 Highly Voted  2 years, 9 months ago

ANS: A
https://codelabs.developers.google.com/codelabs/cloud-kubeflow-pipelines-gis
Kubeflow Pipelines (KFP) helps solve these issues by providing a way to deploy robust, repeatable machine learning pipelines along with
monitoring, auditing, version tracking, and reproducibility. Cloud AI Pipelines makes it easy to set up a KFP installation.
upvoted 12 times

  Dunnoth Highly Voted  1 year, 2 months ago

Selected Answer: A

Old answer is A. The new answer (not available here) would be Vertex AI Experiments, which comes with a monitoring API built in.
https://cloud.google.com/blog/topics/developers-practitioners/track-compare-manage-experiments-vertex-ai-experiments
upvoted 9 times

  Mickey321 Most Recent  5 months, 2 weeks ago

Selected Answer: C

either A or C but going with C due to minimal effort


upvoted 3 times

  Liting 9 months, 3 weeks ago

Selected Answer: A

I agree with tavva_prudhvi that cloud monitoring is not the best option to do machine learning tracking, Metadata is a better option for that
purpose
upvoted 1 times

  tavva_prudhvi 10 months ago


Selected Answer: A

Option C suggests using AI Platform Training to execute the experiments and write the accuracy metrics to Cloud Monitoring. While Cloud
Monitoring can be used to monitor and collect metrics from various services in Google Cloud, it is not specifically designed for machine learning
experiments tracking.

Using Cloud Monitoring for tracking machine learning experiments may not provide the same level of functionality and flexibility as Kubeflow
Pipelines or AI Platform Training. Additionally, querying the results from Cloud Monitoring may not be as straightforward as using the APIs
provided by Kubeflow Pipelines or AI Platform Training.

Therefore, while Cloud Monitoring can be used as a general-purpose monitoring solution, it may not be the best option for tracking and reporting
machine learning experiments.
upvoted 2 times

  PST21 10 months, 2 weeks ago


Cloud monitoring may not be the most suitable option for tracking and reporting experiments, only because of this option C is out & I stick to A
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: A

Went with A
upvoted 2 times

  lucaluca1982 1 year ago


Selected Answer: B

It is B
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago


This is the question, Try out and choose what is the closet to this lab.Last updated Jan 21, 2023
https://codelabs.developers.google.com/vertex_experiments_pipelines_intro#0
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago


As The lab walk me through how to create pipe line to experiment , it use Kubeflow and apply experiment SDK
upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: C

Vertex AI Experiments + Cloud Monitoring for the metrics. It's C!


upvoted 3 times

  mymy9418 1 year, 4 months ago

Selected Answer: C

I like C
https://cloud.google.com/monitoring/mql
upvoted 1 times

  Pancy 1 year, 4 months ago


C: Google has already provided inhouse monitoring mechanism so no need to query or use any other tool.
https://cloud.google.com/bigquery/docs/monitoring
upvoted 1 times

  Mohamed_Mossad 1 year, 11 months ago


https://www.kubeflow.org/docs/components/pipelines/introduction/#what-is-kubeflow-pipelines
upvoted 1 times

  Mohamed_Mossad 1 year, 11 months ago


Selected Answer: A

Kubeflow Pipelines already has an experiment-tracking API, so A is correct. B is also valid, but the question states "minimizing manual effort"
upvoted 2 times

  Mohamed_Mossad 1 year, 11 months ago


https://www.kubeflow.org/docs/components/pipelines/introduction/#what-is-kubeflow-pipelines
upvoted 1 times

  fdmenendez 2 years, 3 months ago


For me A is wrong because it is not "rapidly", I like B more than C
upvoted 1 times

  NamitSehgal 2 years, 3 months ago

Selected Answer: A

I think A is correct unless we are using Bigquery ML to create our models, we can select C
upvoted 2 times

  ramen_lover 2 years, 5 months ago


Answer A.

> "Kubeflow Pipelines supports the export of scalar metrics. You can write a list of metrics to a local file to describe the performance of the model.
The pipeline agent uploads the local file as your run-time metrics. You can view the uploaded metrics as a visualization in the Runs page for a
particular experiment in the Kubeflow Pipelines UI."
https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/
upvoted 4 times
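For reference, a KFP v1-style component exports scalar metrics by writing a small JSON file that the pipeline then surfaces through its UI and API; a minimal sketch (in a real component the path would typically be /mlpipeline-metrics.json, and the metric name here is illustrative):

```python
import json
import tempfile

def write_kfp_metrics(accuracy, path):
    """Write a KFP v1-style scalar-metrics file the pipeline can pick up."""
    metrics = {
        "metrics": [
            {"name": "accuracy-score", "numberValue": accuracy, "format": "PERCENTAGE"}
        ]
    }
    with open(path, "w") as f:
        json.dump(metrics, f)
    return metrics

# A temp path stands in for /mlpipeline-metrics.json here.
path = tempfile.mktemp(suffix=".json")
print(write_kfp_metrics(0.92, path))
```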


Question #60 Topic 1

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are

identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

A. Write your data in TFRecords.

B. Z-normalize all the numeric features.

C. Oversample the fraudulent transaction 10 times.

D. Use one-hot encoding on all categorical features.

Correct Answer: C

Reference:

https://towardsdatascience.com/how-to-build-a-machine-learning-model-to-identify-credit-card-fraud-in-5-stepsa-hands-on-modeling-

5140b3bd19f1

Community vote distribution


C (100%)

  ralf_cc Highly Voted  2 years, 9 months ago

C - https://swarit.medium.com/detecting-fraudulent-consumer-transactions-through-machine-learning-25b1f2cabbb4
upvoted 13 times
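As a concrete sketch of option C, oversampling just repeats the minority-class rows before shuffling; plain Python, with a toy 1%-fraud dataset mirroring the question:

```python
import random

def oversample_minority(rows, label_key, minority_label, factor):
    """Duplicate minority-class rows `factor` times in total, then reshuffle."""
    minority = [r for r in rows if r[label_key] == minority_label]
    balanced = rows + minority * (factor - 1)  # originals already count once
    random.shuffle(balanced)
    return balanced

# Toy dataset: 99 legitimate transactions, 1 fraudulent (1% fraud rate).
data = [{"amount": 10, "fraud": 0}] * 99 + [{"amount": 500, "fraud": 1}]
resampled = oversample_minority(data, "fraud", 1, factor=10)
print(sum(r["fraud"] for r in resampled))  # 10 fraudulent rows out of 109
```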

  NamitSehgal Highly Voted  2 years, 3 months ago

Selected Answer: C

C is the answer
upvoted 5 times

  MultiCloudIronMan Most Recent  4 weeks ago

Selected Answer: C

Oversampling increases the number of fraudulent transactions in the training data so the model can learn to predict them
upvoted 1 times

  fragkris 4 months, 3 weeks ago


Selected Answer: C

C - Even though most similar questions propose to downsample the majority (not fraudulent) and add weights to it.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 2 times

  wish0035 1 year, 4 months ago

Selected Answer: C

ans: C

A, B, D => wouldnt help with imbalance


upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: C

C
https://medium.com/analytics-vidhya/credit-card-fraud-detection-how-to-handle-imbalanced-dataset-1f18b6f881
upvoted 1 times

  Mohamed_Mossad 1 year, 9 months ago

Selected Answer: C

the best option is C


upvoted 1 times


Question #61 Topic 1

You are using transfer learning to train an image classifier based on a pre-trained EfficientNet model. Your training dataset has 20,000 images.

You plan to retrain the model once per day. You need to minimize the cost of infrastructure. What platform components and configuration

environment should you use?

A. A Deep Learning VM with 4 V100 GPUs and local storage.

B. A Deep Learning VM with 4 V100 GPUs and Cloud Storage.

C. A Google Kubernetes Engine cluster with a V100 GPU Node Pool and an NFS Server

D. An AI Platform Training job using a custom scale tier with 4 V100 GPUs and Cloud Storage

Correct Answer: C

Community vote distribution


D (73%) B (15%) 12%

  wish0035 Highly Voted  1 year, 4 months ago

Selected Answer: D

ans: D

A, C => local storage, NFS... discarded. Google encourages you to use Cloud Storage.
B => could do the job, but here I would focus on the "daily training" thing, because Vertex AI Training jobs are better for this. Also I think that
Google usually encourages to use Vertex AI over VMs.
upvoted 9 times

  abhay669 Most Recent  5 months ago

Selected Answer: D

I'll go with D. How is C correct?


upvoted 1 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: A

D, as we need to minimize cost


upvoted 1 times

  Mdso 9 months ago

Selected Answer: A

I think it is A. Refer to Q20 of the GCP Sample Questions - they say managed services (such as Kubeflow Pipelines / Vertex AI) are not the options
for 'minimizing costs'. In this case, you should configure your own infrastructure to train the model leaving A,B. Undecided between A,B because A
would minimize costs, but also result in inefficient I/O operations during training.
upvoted 2 times

  tavva_prudhvi 10 months ago

Selected Answer: D

The pre-trained EfficientNet model can be easily loaded from Cloud Storage, which eliminates the need for local storage or an NFS server. Using AI
Platform Training allows for the automatic scaling of resources based on the size of the dataset, which can save costs compared to using a fixed-
size VM or node pool. Additionally, the ability to use custom scale tiers allows for fine-tuning of resource allocation to match the specific needs of
the training job.
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  shankalman717 1 year, 2 months ago


Selected Answer: B

B. A Deep Learning VM with 4 V100 GPUs and Cloud Storage.

For this scenario, a Deep Learning VM with 4 V100 GPUs and Cloud Storage is likely the most cost-effective solution while still providing sufficient
computing resources for the model training. Using Cloud Storage can allow the model to be trained and the data to be stored in a scalable and
cost-effective way.

Option A, using a Deep Learning VM with local storage, may not provide enough storage capacity to store the training data and model
checkpoints. Option C, using a Kubernetes Engine cluster, can be overkill for the size of the job and adds additional complexity. Option D, using an
AI Platform Training job, is a good option as it is designed for running machine learning jobs at scale, but may be more expensive than a Deep
Learning VM with Cloud Storage.
https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 138/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

upvoted 2 times

  enghabeth 1 year, 2 months ago

Selected Answer: D

because it's cheaper


upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: D

it seems D
upvoted 3 times

  OzoneReloaded 1 year, 4 months ago

Selected Answer: D

I think it's D
upvoted 2 times

  JeanEl 1 year, 4 months ago


Selected Answer: B

It's D
upvoted 2 times

  ares81 1 year, 4 months ago


It seems D to me.
upvoted 4 times


Question #62 Topic 1

While conducting an exploratory analysis of a dataset, you discover that categorical feature A has substantial predictive power, but it is

sometimes missing. What should you do?

A. Drop feature A if more than 15% of values are missing. Otherwise, use feature A as-is.

B. Compute the mode of feature A and then use it to replace the missing values in feature A.

C. Replace the missing values with the values of the feature with the highest Pearson correlation with feature A.

D. Add an additional class to categorical feature A for missing values. Create a new binary feature that indicates whether feature A is missing.

Correct Answer: A

Community vote distribution


D (68%) B (32%)

  wish0035 Highly Voted  1 year, 4 months ago

Selected Answer: D

ans: D

A => no, you don't want to drop a feature with high prediction power.
B => i think this could confuse the model... a better solution could be to fill missing values using an algorithm like Expectation Maximization, but
using the mode i think is a bad idea in this case, because if you have a significant number of missing values (for example >10%) this would modify
the "predictive power". you don't want to lose predictive power of a feature, just guide the model to learn when to use that feature and when to
ignore it.
C => this doesn't make any sense for me. not sure what i would do that.
D => i think this could be a really good approach, and i'm pretty sure it would work pretty well a lot of models. the model would learn that when
"is_available_feat_A" == True, then it would use the feature A, but whenever it is missing then it would try to use other features.
upvoted 13 times
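Option D can be sketched in a few lines of plain Python (the column name and the "MISSING" sentinel are illustrative):

```python
def add_missing_indicator(rows, feature):
    """Replace None with an explicit 'MISSING' class and add a binary flag."""
    out = []
    for row in rows:
        new = dict(row)
        new[f"{feature}_is_missing"] = 1 if row[feature] is None else 0
        if row[feature] is None:
            new[feature] = "MISSING"
        out.append(new)
    return out

rows = [{"A": "red"}, {"A": None}]
print(add_missing_indicator(rows, "A"))
# [{'A': 'red', 'A_is_missing': 0}, {'A': 'MISSING', 'A_is_missing': 1}]
```

This keeps feature A's predictive power while letting the model learn when the value is absent.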

  frangm23 1 year ago


I guess I would go with D, but what confuses me is that option D doesn't say the NaN values are replaced (only that a new column is added), which could lead to problems like exploding gradients.
Plus, Google encourages replacing missing values. https://developers.google.com/machine-learning/testing-debugging/common/data-errors
Any thoughts on this?
upvoted 2 times

  MultiCloudIronMan Most Recent  4 weeks ago

Selected Answer: B

Google encourages filling missing values, and using the mode is one of the examples given. D only tells the obvious: data is missing!
upvoted 1 times

  fragkris 4 months, 3 weeks ago

Selected Answer: D

B and D are correct, but I decided to go with D.


upvoted 1 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: D

highly predictive
upvoted 1 times

  ichbinnoah 5 months, 2 weeks ago


Selected Answer: B

Definitely not D, it does not even solve the problem of NA values.


upvoted 1 times

  andresvelasco 7 months, 1 week ago


Options B or D
But isn't there an inconsistency in option D? If you replace missing values with a new category ("missing"), why would you have to create an extra feature?
upvoted 1 times

  Liting 9 months, 3 weeks ago


Selected Answer: D

Agree with wish0035, answer should be D


upvoted 1 times


  PST21 10 months, 1 week ago


By creating a new class for the missing values, you explicitly capture the absence of data, which can provide valuable information for predictive
modeling. Additionally, creating a binary feature allows the model to distinguish between cases where feature A is present and cases where it is
missing, which can be useful for identifying potential patterns or relationships in the data.
upvoted 2 times

  amtg 10 months, 3 weeks ago

Selected Answer: B

By imputing the missing values with the mode (the most frequent value), you retain the original feature's predictive power while handling the
missing values
upvoted 1 times

  Scipione_ 11 months, 1 week ago


Selected Answer: D

Both B and D are possible, but the correct answer is D because of the feature high predictive power.
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: D

Went with D
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


I think, its D.
Option B of imputing the missing values of feature A with the mode of feature A could be a reasonable approach if the mode provides a good
representation of the distribution of feature A. However, this method may lead to biased results if the mode is not representative of the missing
values. This could be the case if the missing values have a different distribution than the observed values.

Similarly, When a categorical feature has substantial predictive power, it is important not to discard it. Instead, missing values can be handled by
adding an additional class for missing values and creating a new binary feature that indicates whether feature A is missing or not. This approach
ensures that the predictive power of feature A is retained while accounting for missing values. Computing the mode of feature A and replacing
missing values may distort the distribution of the feature and create bias in the analysis. Similarly, replacing missing values with values from
another feature may introduce noise and lead to incorrect results.
upvoted 2 times

  BenMS 1 year, 2 months ago


Selected Answer: D

If our objective was to produce a complete dataset then we might use some average value to fill in the gaps (option B) but in this case we want to
predict an outcome, so inventing our own data is not going to help in my view.

Option D is the most sensible approach to let the model choose the best features.
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: B

B
"For categorical variables, we can usually replace missing values with mean, median, or most frequent values"
Dr. Logan Song - Journey to Become a Google Cloud Machine Learning Engineer - Page 48
upvoted 4 times

  tavva_prudhvi 5 months, 3 weeks ago


While this approach may seem reasonable, it can introduce bias in the dataset by over-representing the mode, especially if the missing values
are not missing at random.
upvoted 1 times

  Pancy 1 year, 4 months ago


B. Because the important feature is already known. By using the mode, the contribution of other features will not be missed.
upvoted 2 times

  ares81 1 year, 4 months ago


Mode is the way to go for categorical features. B, for me.
upvoted 3 times

  LearnSodas 1 year, 4 months ago


Selected Answer: B

I agree with B
upvoted 3 times


Question #63 Topic 1

You work for a large retailer and have been asked to segment your customers by their purchasing habits. The purchase history of all customers

has been uploaded to BigQuery. You suspect that there may be several distinct customer segments, however you are unsure of how many, and you

don’t yet understand the commonalities in their behavior. You want to find the most efficient solution. What should you do?

A. Create a k-means clustering model using BigQuery ML. Allow BigQuery to automatically optimize the number of clusters.

B. Create a new dataset in Dataprep that references your BigQuery table. Use Dataprep to identify similarities within each column.

C. Use the Data Labeling Service to label each customer record in BigQuery. Train a model on your labeled data using AutoML Tables. Review

the evaluation metrics to understand whether there is an underlying pattern in the data.

D. Get a list of the customer segments from your company’s Marketing team. Use the Data Labeling Service to label each customer record in

BigQuery according to the list. Analyze the distribution of labels in your dataset using Data Studio.

Correct Answer: B

Community vote distribution


A (95%) 5%

  MultiCloudIronMan 4 weeks ago

Selected Answer: A

K-means algorithm is used for grouping/clustering data in unsupervised learning experiments.


upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 3 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: A

when to use k-means: Your data may contain natural groupings or clusters of data. You may want to identify these groupings descriptively in order
to make data-driven decisions. For example, a retailer may want to identify natural groupings of customers who have similar purchasing habits or
locations. This process is known as customer segmentation.
https://cloud.google.com/bigquery/docs/kmeans-tutorial
upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


A
This is the most efficient solution for segmenting customers based on their purchasing habits, as it utilizes BigQuery's built-in machine learning
capabilities to identify distinct clusters of customers based on their purchasing behavior. By allowing BigQuery to automatically optimize the
number of clusters, you can ensure that the model identifies the most appropriate number of segments based on the data, without having to
manually select the number of clusters.
upvoted 2 times

  ares81 1 year, 3 months ago


Selected Answer: A

I correct myself. It's A:


According to the documentation, if you omit the num_clusters option, BigQuery ML will choose a reasonable default based on the total number of
rows in the training data.
upvoted 2 times

  hiromi 1 year, 4 months ago

Selected Answer: A

A
https://cloud.google.com/bigquery-ml/docs/kmeans-tutorial
https://towardsdatascience.com/how-to-use-k-means-clustering-in-bigquery-ml-to-understand-and-describe-your-data-better-c972c6f5733b
upvoted 3 times

  wish0035 1 year, 4 months ago

Selected Answer: A

ans: A, pretty sure.

C, D => discarded, very time consuming.


B => yes, you can identify similarities within each column, but when i read "you don’t yet understand the commonalities in their behavior" i
understand that this job would be difficult, because there could be many columns to analyze, and i don't think that this would be efficient.


A => BigQuery ML is compatible with kmeans clustering, it's easy and efficient to create, and i would automatically detect the number of clusters.

Also from the BigQuery ML docs: "K-means clustering for data segmentation; for example, identifying customer segments."
(Source: https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in)
upvoted 4 times
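To make the clustering idea concrete, here is a tiny 1-D k-means sketch in plain Python; in practice BigQuery ML does the equivalent in SQL via CREATE MODEL ... OPTIONS(model_type='kmeans'), choosing a reasonable number of clusters automatically when num_clusters is omitted. The spend values below are invented:

```python
def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means: returns the final centroids, sorted."""
    # Deterministic init: spread initial centroids across the sorted values.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Monthly spend of customers: two obvious behavioral segments.
spend = [10, 12, 11, 9, 200, 210, 190, 205]
print(kmeans_1d(spend, k=2))  # [10.5, 201.25]
```

The two centroids summarize the two purchasing-habit segments without any labels, which is why k-means fits this question.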

  LearnSodas 1 year, 4 months ago


Selected Answer: A

K-means is a good unsupervised learning algorithm to segment a population based on similarity

We can usa K-means directly in BQ, so I think it's "the most efficient way"

Labeling is not a good option since we don't really know what make a customer similar to another, and why dataprep if we can use directly BQ?
upvoted 3 times

  ares81 1 year, 4 months ago


It seems B, to me.
upvoted 1 times

  neochaotic 1 year, 4 months ago


Selected Answer: B

Its B! Dataprep provides Data profiling functionalities


upvoted 1 times

  japoji 1 year, 4 months ago


The question is about commonalities of clients by characteristics, not about characteristics by client. I mean, with B you are looking for segments of the characteristics which define a client. But you need segments of clients defined by characteristics.
upvoted 1 times

  Vedjha 1 year, 4 months ago


Will go for 'A' as it is easy to build model in BQML where data is already present and optimization would be auto in case of K-mean algo
upvoted 4 times


Question #64 Topic 1

You recently designed and built a custom neural network that uses critical dependencies specific to your organization’s framework. You need to

train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI

Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the

scheduler, workers, and servers distribution structure. What should you do?

A. Use a built-in model available on AI Platform Training.

B. Build your custom container to run jobs on AI Platform Training.

C. Build your custom containers to run distributed training jobs on AI Platform Training.

D. Reconfigure your code to a ML framework with dependencies that are supported by AI Platform Training.

Correct Answer: D

Community vote distribution


C (100%)

  mil_spyro Highly Voted  1 year, 4 months ago

Selected Answer: C

Answer C. By running your machine learning (ML) training job in a custom container, you can use ML frameworks, non-ML dependencies, libraries,
and binaries that are not otherwise supported on Vertex AI.
Model and your data are too large to fit in memory on a single machine hence distributed training jobs.
https://cloud.google.com/vertex-ai/docs/training/containers-overview
upvoted 5 times

  MultiCloudIronMan Most Recent  4 weeks ago

Selected Answer: C

This allows using external dependencies, and distributed training will solve the memory issues
upvoted 1 times

  Werner123 2 months ago


Selected Answer: C

Critical dependencies that are not supported -> Custom container


Too large to fit in memory on a single machine -> Distributed
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  wish0035 1 year, 4 months ago

Selected Answer: C

ans: C

A, D => too much work.


B => discarded because "model and your data are too large to fit in memory on a single machine"
upvoted 1 times
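For context, a custom training container is just a Docker image with the organization's framework plus an entrypoint; a hedged sketch, where the base image, file names, and module name are all invented (on AI Platform Training, each replica of a distributed job learns its role from the CLUSTER_SPEC environment variable):

```dockerfile
# Hypothetical custom training image for AI Platform Training.
FROM python:3.10-slim

# Install the (invented) in-house framework and its dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Training code; each replica resolves its role (scheduler, worker,
# or server) at runtime from the CLUSTER_SPEC environment variable.
COPY trainer/ /trainer/
ENTRYPOINT ["python", "-m", "trainer.task"]
```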

  ares81 1 year, 4 months ago


C, for me!
upvoted 1 times

  JeanEl 1 year, 4 months ago


Selected Answer: C

I think it's C
upvoted 1 times

  Vedjha 1 year, 4 months ago


Will go for 'C'- Custom containers can address the env limitation and distributed processing will handle the data volume
upvoted 1 times


Question #65 Topic 1

While monitoring your model training’s GPU utilization, you discover that you have a naive synchronous implementation. The training data is split

into multiple files. You want to reduce the execution time of your input pipeline. What should you do?

A. Increase the CPU load

B. Add caching to the pipeline

C. Increase the network bandwidth

D. Add parallel interleave to the pipeline

Correct Answer: A

Community vote distribution


D (100%)

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: D

It's D
https://www.tensorflow.org/guide/data_performance
upvoted 6 times
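The effect can be simulated outside TensorFlow with a thread pool; in tf.data the real fix is dataset.interleave(..., num_parallel_calls=tf.data.AUTOTUNE). A stdlib-only sketch (shard names and record contents are invented):

```python
from concurrent.futures import ThreadPoolExecutor

def read_shard(name):
    """Stand-in for reading one training file (I/O-bound in reality)."""
    return [f"{name}:record{i}" for i in range(3)]

shards = ["part-0", "part-1", "part-2", "part-3"]

# Sequential read: the naive synchronous input pipeline.
sequential = [rec for s in shards for rec in read_shard(s)]

# Parallel interleave: shards are read concurrently, records merged in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = [rec for recs in pool.map(read_shard, shards) for rec in recs]

print(len(parallel))  # 12 -- same records, but the reads overlap in time
```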

  MultiCloudIronMan Most Recent  4 weeks ago

Selected Answer: D

Multiple files reduce execution time through parallelism


upvoted 1 times

  Werner123 2 months ago


Selected Answer: D

"training data split into multiple files", "reduce the execution time of your input pipeline" -> Parallel interleave
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  OzoneReloaded 1 year, 4 months ago

Selected Answer: D

I think it's D
upvoted 2 times

  Vedjha 1 year, 4 months ago


D for me
upvoted 1 times


Question #66 Topic 1

Your data science team is training a PyTorch model for image classification based on a pre-trained ResNet model. You need to perform

hyperparameter tuning to optimize for several parameters. What should you do?

A. Convert the model to a Keras model, and run a Keras Tuner job.

B. Run a hyperparameter tuning job on AI Platform using custom containers.

C. Create a Kubeflow Pipelines instance, and run a hyperparameter tuning job on Katib.

D. Convert the model to a TensorFlow model, and run a hyperparameter tuning job on AI Platform.

Correct Answer: C

Community vote distribution


B (100%)

  OzoneReloaded Highly Voted  1 year, 4 months ago

Selected Answer: B

B because Vertex AI supports custom models hyperparameter tuning


upvoted 7 times
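For context on how option B works in practice: each tuning trial launches the container with that trial's hyperparameter values as command-line flags, so the PyTorch entrypoint parses them (flag names here are illustrative; reporting the trial metric back would typically use the cloudml-hypertune library):

```python
import argparse

def parse_trial_args(argv):
    """Parse the hyperparameters the tuning service injects for one trial."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--momentum", type=float, default=0.9)
    return parser.parse_args(argv)

# The tuning service would launch the container with, e.g.:
args = parse_trial_args(["--learning_rate", "0.001", "--momentum", "0.85"])
print(args.learning_rate, args.momentum)  # 0.001 0.85
```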

  M25 Most Recent  11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago

Selected Answer: B

This is a question sourced from google blog


pre-trained BERT model
https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-train-and-tune-pytorch-models-vertex-ai
https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


Selected Answer: B

C:
Don't waste your time converting to another framework; you can absolutely use it in a custom container.
https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-train-and-tune-pytorch-models-vertex-ai
upvoted 3 times

  John_Pongthorn 1 year, 3 months ago


I insist on B. At present, it seems we could use a prebuilt container instead of a custom container, but that is none of the 4 choices, so B is the most
likely way out of this question.
upvoted 2 times

  wish0035 1 year, 4 months ago

Selected Answer: B

ans: B

A, D => too much work.


C => not sure why you would complicate so much when Vertex AI has this feature in custom containers.
upvoted 3 times

  Vedjha 1 year, 4 months ago


C seems to correct- https://www.kubeflow.org/docs/components/katib/overview/
upvoted 1 times

  LearnSodas 1 year, 4 months ago


Why use a thrid-party tool when Vertex AI already let you tuning hyperparameters in custom containers? I think it's B
upvoted 4 times


Question #67 Topic 1

You have a large corpus of written support cases that can be classified into 3 separate categories: Technical Support, Billing Support, or Other

Issues. You need to quickly build, test, and deploy a service that will automatically classify future written requests into one of the categories. How

should you configure the pipeline?

A. Use the Cloud Natural Language API to obtain metadata to classify the incoming cases.

B. Use AutoML Natural Language to build and test a classifier. Deploy the model as a REST API.

C. Use BigQuery ML to build and test a logistic regression model to classify incoming requests. Use BigQuery ML to perform inference.

D. Create a TensorFlow model using Google’s BERT pre-trained model. Build and test a classifier, and deploy the model using Vertex AI.

Correct Answer: B

Community vote distribution


B (100%)

  wish0035 Highly Voted  1 year, 4 months ago

Selected Answer: B

ans: B

A => no, you need customization.


C, D => more work and complexity

B => AutoML is easier and faster and "you need to quickly build, test, and deploy". Also the REST API part fits our use case.
upvoted 7 times

  gscharly Most Recent  2 weeks, 2 days ago

Went with B
upvoted 1 times

  MultiCloudIronMan 4 weeks ago

Selected Answer: B

AutoML is faster and offers the requisite REST API


upvoted 1 times

  Werner123 2 months ago


Selected Answer: B

"quickly build, test and deploy" + custom categories -> AutoML


upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  frangm23 1 year ago

Selected Answer: B

I think it's B, but I don't understand why it doesn't suggest deploying the model on Vertex AI instead of as a REST API.
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: B

Answer B because it's faster


upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: B

B
wish0035 explained
upvoted 3 times

  ares81 1 year, 4 months ago


Quickly: AutoML: B.
upvoted 1 times

  OzoneReloaded 1 year, 4 months ago


Selected Answer: B

I think it's B because of the deployment


upvoted 1 times

  Vedjha 1 year, 4 months ago


B will give quick results on classification
upvoted 1 times


Question #68 Topic 1

You need to quickly build and train a model to predict the sentiment of customer reviews with custom categories without writing code. You do not

have enough data to train a model from scratch. The resulting model should have high predictive performance. Which service should you use?

A. AutoML Natural Language

B. Cloud Natural Language API

C. AI Hub pre-made Jupyter Notebooks

D. AI Platform Training built-in algorithms

Correct Answer: A

Community vote distribution


A (68%) B (32%)

  b2aaace 1 week, 2 days ago

Selected Answer: B

AutoML does not have transfer learning capabilities as of now. Given that there is
not enough data to train from scratch, B is the only option that makes sense.
upvoted 1 times

  pinimichele01 1 week ago


https://cloud.google.com/vertex-ai/docs/text-data/sentiment-analysis/prepare-data
upvoted 1 times

  MultiCloudIronMan 4 weeks ago


Selected Answer: A

This is a suitable job for AutoML; it uses transfer learning when there is little data for training.
upvoted 1 times

  LFavero 2 months ago

Selected Answer: A

AutoML Natural Language is designed to work well even with relatively small datasets. It uses transfer learning and other techniques to train
models effectively on limited data, which is crucial since there isn't enough data to train a model from scratch.
upvoted 3 times

  Krish6488 5 months, 2 weeks ago


Selected Answer: A

Custom models and custom categories and hence AutoML natural language, It would still work with less data
upvoted 1 times

  Sahana_98 6 months ago


Selected Answer: B

NOT ENOUGH DATA TO TRAIN THE MODEL FROM SCRATCH


upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: A

Went with A
upvoted 1 times

  dfdrin 1 year, 1 month ago

Selected Answer: A

It's A. "Custom categories" means B can't be correct


upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


Its A, Check this document, https://cloud.google.com/natural-language/automl/docs/beginners-guide
The Natural Language API discovers syntax, entities, and sentiment in text, and classifies text into a predefined set of categories.
upvoted 3 times

  shankalman717 1 year, 2 months ago


Selected Answer: B

If you do not have enough data to train a model from scratch, then it may be more appropriate to use a pre-trained model or a pre-made Jupyter
Notebook.


Option B, the Cloud Natural Language API, could still be a viable option if you have access to labeled data for sentiment analysis. The API provides
pre-trained models for sentiment analysis that you can use to classify text. However, if you have custom categories or labels, then you would need
to train a custom model, which may not be feasible with limited data.
upvoted 4 times

  enghabeth 1 year, 2 months ago

Selected Answer: A

https://www.toptal.com/machine-learning/google-nlp-tutorial#:~:text=Google%20Natural%20Language%20API%20vs.&text=Google%20AutoML%20Natural%20Language%20is,t%20require%20machine%20learning%20knowledge.
In this case we need custom categories without writing code
upvoted 2 times

  John_Pongthorn 1 year, 3 months ago


Selected Answer: A

Quickly ==> A and B; and custom categories + you do not have enough data to train a model (it doesn't mean no data to train; there will probably be a few samples, say 10), as in this link: https://cloud.google.com/natural-language/automl/docs/beginners-guide#include-enough-labeled-examples-in-each-category
==> A
upvoted 2 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: B

Quickly ==> A and B; and custom categories + you do not have enough data to train a model (it doesn't mean no data to train; there will probably be a few samples, say 10)
==> B
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


Sorry, I go with A A A A A A
upvoted 3 times

  John_Pongthorn 1 year, 3 months ago


https://cloud.google.com/natural-language/automl/docs/beginners-guide#include-enough-labeled-examples-in-each-category
upvoted 2 times

  Omi_04040 1 year, 4 months ago


B is the correct answer
AutoML needs data for training, and it's clearly mentioned we don't have any data.
upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


they said, "do not have enough data"!!!!
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: A

A
wish0035 explained
upvoted 1 times

  Pancy 1 year, 4 months ago


B is the correct answer. The API connects with the prebuilt Google NLP model for prediction
upvoted 1 times

  wish0035 1 year, 4 months ago


Selected Answer: A

ans: A

B => "custom categories" so discarded.


C, D => discarded because "without writing code" and "do not have enough data".

A => AutoML can train with very little data ("The bare minimum required by AutoML Natural Language for training is 10 text examples per
category/label"), as seifou says it will probably use transfer learning behind the scenes.
upvoted 4 times

  seifou 1 year, 4 months ago


A is correct. AutoML Natural Language can be trained on little data: https://cloud.google.com/natural-language/automl/docs/beginners-guide#include-enough-labeled-examples-in-each-category. I think it uses transfer learning, but I couldn't find any official document about it
upvoted 1 times


Question #69 Topic 1

You need to build an ML model for a social media application to predict whether a user’s submitted profile photo meets the requirements. The

application will inform the user if the picture meets the requirements. How should you build a model to ensure that the application does not

falsely accept a non-compliant picture?

A. Use AutoML to optimize the model’s recall in order to minimize false negatives.

B. Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives.

C. Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that meet

the profile photo requirements.

D. Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that do not

meet the profile photo requirements.

Correct Answer: C

Community vote distribution


A (44%) B (32%) D (25%)

  LearnSodas Highly Voted  1 year, 4 months ago

I think it's B, since we want to reduce false positives


upvoted 14 times

  jamesking1103 1 year, 3 months ago


B
yes, A is incorrect as minimize false negatives does not help
upvoted 3 times

  NickHapton 10 months, 1 week ago


a non-compliant profile image = positive
false negatives = didn't alert the non-compliant profile image
so the objective is to minimize false negatives
upvoted 6 times
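
To make the precision/recall framing discussed above concrete, here is a minimal sketch in pure Python. The confusion-matrix counts are made up for illustration; the key assumption (taken from the discussion) is that "non-compliant photo" is the positive class, so a falsely accepted non-compliant picture counts as a false negative:

```python
# Positive class = "non-compliant photo". A non-compliant picture that the
# app falsely accepts is a false negative: predicted compliant (negative)
# when the truth was non-compliant (positive).

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of flagged photos that were truly non-compliant
    recall = tp / (tp + fn)      # share of non-compliant photos that were caught
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 80 non-compliant photos caught, 20 compliant photos
# wrongly flagged, 10 non-compliant photos wrongly accepted.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Under this labeling, optimizing recall (option A) directly pushes down fn, which is exactly the "falsely accepted non-compliant picture" case.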

  [Removed] Highly Voted  9 months, 1 week ago

Selected Answer: A

The answer is A. The negative event is usually labeled as positive (e.g., fraud detection, customer default prediction, and here non-compliant
picture identification). The question explicitly says, "ensure that the application does not falsely accept a non-compliant picture." So we should
avoid falsely labeling a non-compliant image as compliant (negative).

It is never mentioned in the question that false positives are also a concern. So, recall is better than F1-score for this problem.
upvoted 11 times

  Delphin_8150 Most Recent  1 month ago

Selected Answer: B

Gonna go with B on this one, tricky question but since reducing false positives is the goal here only B fits that requirement
upvoted 1 times

  pinimichele01 2 days, 12 hours ago


a non-compliant profile image = positive
false negatives = didn't alert the non-compliant profile image
so the objective is to minimize false negatives
upvoted 1 times

  Carlose2108 1 month, 3 weeks ago


Selected Answer: A

I went with A.
upvoted 2 times

  b1a8fae 3 months, 1 week ago


Selected Answer: B

B.
A non-compliant picture is the positive and not the negative. What the question is asking is to decrease the number of false positives ("falsely
labeled as non compliant"), which is achieved through optimizing for precision and not recall. Since C and D sound a bit overkill, I would go for the
one that prioritizes false positives which is B.
upvoted 1 times

  Mickey321 5 months, 2 weeks ago


I think it's B since we need to optimize for precision


upvoted 1 times

  Mickey321 5 months, 2 weeks ago


Selected Answer: D

Minimize false positives, hence precision. D is the closest.


upvoted 1 times

  Krish6488 5 months, 2 weeks ago


Selected Answer: B

Optimising for false positives is the goal here, which should have been precision. Since precision is not available in the options, the next best is the F1 score,
which is the harmonic mean of precision and recall. Although it won't fully address false positives, it at least won't skew towards recall, which means more
false positives and deviates from the goal. Hence B
upvoted 1 times

  MCorsetti 6 months, 1 week ago


Selected Answer: B

We should optimize for precision to minimize false positives, so optimizing for recall should be incorrect. F1 Score will balance both precision and
recall. Both B and C might not necessarily meet the goal
upvoted 1 times

  aberthe 6 months, 3 weeks ago

Selected Answer: A

I vote B
upvoted 1 times

  libo1985 7 months ago


A. Let me explain why. You may have 3 times more examples of images; however, the total number of images can still be small, which leads to poor
model performance, so C and D are not definite answers. The target is detecting an abnormal photo, so falsely accepting a non-compliant
picture is a false negative. So A.
upvoted 2 times


  lalala_meow 7 months, 1 week ago

Selected Answer: A

I was thinking B but after reading the comments I think it should be A.


I was thinking a non-compliant profile image = negative but actually it should be the positive case we do want to flag out. So minimising false
negative fits the requirement "ensure that the application does not falsely accept a non-compliant picture."
upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: B

B should be correct. It covers not only the recall but also the precision
upvoted 1 times

  Liting 9 months, 3 weeks ago

Selected Answer: A

Optimizing recall can help lower the number of false negatives


upvoted 1 times

  tavva_prudhvi 10 months ago

Selected Answer: B

In this scenario, it is important to balance the accuracy of false positives (where a non-compliant picture is accepted) and false negatives (where a
compliant picture is rejected). By optimizing the F1 score, the model will find the best balance between precision and recall, which will help reduce
both false positives and false negatives. This will ensure that the application doesn't falsely accept a non-compliant picture.
upvoted 2 times

  tavva_prudhvi 5 months, 3 weeks ago


Given the requirement not to falsely accept a non-compliant picture, the best option would likely be:

B. Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives. This ensures that the
model is not overly biased towards accepting or rejecting pictures and provides a balanced approach to handling both types of errors. However,
if the priority is strongly weighted towards not accepting non-compliant pictures, then D could be the better approach, as it would likely
improve the model's ability to correctly identify non-compliant pictures.
upvoted 1 times

  Voyager2 10 months, 4 weeks ago


Selected Answer: B


B. Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives.
Can't be A. The sentence "ensure that the application does not falsely accept a non-compliant picture" states that we don't want false positives (so
I'm less concerned about false negatives). A false positive will be one classified as compliant when it is not compliant. I interpret that the positive
class will be the one that "meets the requirements", as stated as well.
upvoted 1 times


Question #70 Topic 1

You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level

TensorFlow APIs on AI Platform with GPUs. Your team usually takes a few weeks or months to iterate on a new version of a model. You were

recently asked to review your team’s spending. How should you reduce your Google Cloud compute costs without impacting the model’s

performance?

A. Use AI Platform to run distributed training jobs with checkpoints.

B. Use AI Platform to run distributed training jobs without checkpoints.

C. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.

D. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.

Correct Answer: D

Community vote distribution


C (78%) A (22%)

  seifou Highly Voted  1 year, 4 months ago

Selected Answer: C

https://cloud.google.com/blog/products/ai-machine-learning/reduce-the-costs-of-ml-workflows-with-preemptible-vms-and-gpus?hl=en
upvoted 9 times
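
The preemptible-VM-plus-checkpoint pattern from the linked post boils down to: snapshot training state periodically, and on restart resume from the last snapshot instead of epoch 0. Here is a toy sketch in pure Python — the "training step" is a stand-in multiplication and the checkpoint file name is made up, but the save/resume logic is the part that matters:

```python
import json
import os
import tempfile

def train(total_epochs: int, ckpt_path: str) -> dict:
    """Toy training loop that resumes from the last checkpoint if present."""
    state = {"epoch": 0, "loss": 100.0}
    if os.path.exists(ckpt_path):        # resume after a preemption
        with open(ckpt_path) as f:
            state = json.load(f)
    while state["epoch"] < total_epochs:
        state["epoch"] += 1
        state["loss"] *= 0.9             # stand-in for a real training step
        with open(ckpt_path, "w") as f:  # snapshot progress every epoch
            json.dump(state, f)
    return state

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
partial = train(3, ckpt)    # pretend the VM was preempted after 3 epochs
final = train(10, ckpt)     # the restarted job resumes at epoch 3, not 0
print(partial["epoch"], final["epoch"])   # 3 10
```

Without the checkpoint (option D), every preemption would throw away all progress, which is why C pairs cheap preemptible VMs with checkpoints.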

  MultiCloudIronMan Most Recent  4 weeks ago

Selected Answer: C

Pre-emptive VMs are cheaper and checkpoints will enable termination if the result is acceptable
upvoted 2 times

  libo1985 7 months ago


I guess distributed training is not cheap. So C.
upvoted 1 times

  joaquinmenendez 7 months, 1 week ago


C is the best approach because it allows you to reduce your compute costs without impacting the model's performance. Preemptible VMs are
much cheaper than standard VMs, but they can be terminated at any time. By using checkpoints, you can ensure that your training job can be
resumed if a preemptible VM is terminated.
Also, even if training takes days, the checkpoints will prevent lossing the progress if preemtible VM are down.
upvoted 2 times

  Liting 9 months, 3 weeks ago

Selected Answer: C

To optimize cost, we should use Kubeflow


upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: C

https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview
AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI
Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result. You can then use this information to
verify that the model is behaving as expected, recognize bias in your models, and get ideas for ways to improve your model and your training data
upvoted 1 times

  _learner_ 12 months ago


Selected Answer: A

Preemptible VMs are valid for only 24 hrs, and the question mentions that training takes months to complete; that makes A the answer.
upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


Additionally, AI Platform's autoscaling feature can automatically adjust the number of resources used based on the workload, further optimizing
costs.
upvoted 1 times


  tavva_prudhvi 1 year, 1 month ago


I think it’s A.
By using distributed training jobs with checkpoints, you can train your models on multiple GPUs simultaneously, which reduces the training
time. Checkpoints allow you to save the progress of your training jobs regularly, so if the training job gets interrupted or fails, you can restart it
from the last checkpoint instead of starting from scratch. This saves time and resources, which reduces costs. Additionally, AI Platform's
autoscaling feature can automatically adjust the number of resources used based on the workload, further optimizing costs.
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


Is C out of date? AI Platform is now Vertex AI, so this is a simple scenario of choosing the infrastructure that would accommodate this case.
upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: A

It's A.
upvoted 2 times

  hiromi 1 year, 4 months ago

Selected Answer: C

It seems C
- https://www.kubeflow.org/docs/distributions/gke/pipelines/preemptible/
- https://cloud.google.com/optimization/docs/guide/checkpointing
upvoted 4 times

  ares81 1 year, 4 months ago


"A Preemptible VM (PVM) is a Google Compute Engine (GCE) virtual machine (VM) instance that can be purchased for a steep discount as long as
the customer accepts that the instance will terminate after 24 hours."
This excludes C and D. Checkpoints are needed for long processing, so A.
upvoted 3 times

  neochaotic 1 year, 4 months ago

Selected Answer: C

C - Reduce cost with preemptible instances and add checkpoints to snapshot intermediate results
upvoted 3 times

  LearnSodas 1 year, 4 months ago

Selected Answer: A

Saving checkpoints avoids re-running from scratch


upvoted 2 times

  YangG 1 year, 4 months ago


I think it should be A
https://cloud.google.com/ai-platform/training/docs/overview
upvoted 1 times


Question #71 Topic 1

You need to train a regression model based on a dataset containing 50,000 records that is stored in BigQuery. The data includes a total of 20

categorical and numerical features with a target variable that can include negative values. You need to minimize effort and training time while

maximizing model performance. What approach should you take to train this regression model?

A. Create a custom TensorFlow DNN model

B. Use BQML XGBoost regression to train the model.

C. Use AutoML Tables to train the model without early stopping.

D. Use AutoML Tables to train the model with RMSLE as the optimization objective.

Correct Answer: A

Community vote distribution


B (92%) 8%

  gscharly 2 weeks, 1 day ago

Selected Answer: B

Went with B
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  abneural 1 year, 2 months ago

Selected Answer: B

Ans B.
C --> No early stopping means longer training time
D --> The RMSLE metric needs non-negative Y values
upvoted 3 times
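
For reference, the BQML approach in option B is just two SQL statements, shown below held in Python strings. The dataset, table, and column names are hypothetical; the `model_type` and `input_label_cols` options follow BigQuery ML's `CREATE MODEL` syntax for boosted trees (XGBoost-based):

```python
# Hypothetical dataset/table/column names; only the BQML keywords are real.
training_query = """
CREATE OR REPLACE MODEL `my_dataset.regression_model`
OPTIONS (
  model_type = 'BOOSTED_TREE_REGRESSOR',
  input_label_cols = ['label']
) AS
SELECT * FROM `my_dataset.training_data`
"""

# Batch inference also stays inside BigQuery -- no data movement needed.
prediction_query = """
SELECT * FROM ML.PREDICT(
  MODEL `my_dataset.regression_model`,
  TABLE `my_dataset.new_data`)
"""
```

Training and serving both happen where the 50,000 records already live, which is what makes B the minimal-effort option.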

  John_Pongthorn 1 year, 3 months ago

Selected Answer: B

B and C are the most likely because of the regression approach, but RMSLE does not allow you to train with negative labels; see
https://cloud.google.com/automl-tables/docs/evaluate#evaluation_metrics_for_regression_models

RMSLE: The root-mean-squared logarithmic error metric is similar to RMSE, except that it uses the natural logarithm of the predicted and actual
values plus 1. RMSLE penalizes under-prediction more heavily than over-prediction. It can also be a good metric when you don't want to penalize
differences for large prediction values more heavily than for small prediction values. This metric ranges from zero to infinity; a lower value indicates
a higher quality model.
The RMSLE evaluation metric is returned only if all label and predicted values are non-negative.
upvoted 1 times
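
The non-negativity constraint quoted above follows directly from RMSLE's definition, which takes log(1 + value): anything at or below -1 is outside the log's domain. A small pure-Python sketch with made-up numbers:

```python
import math

def rmsle(actual, predicted):
    """Root mean squared logarithmic error; defined only for values > -1."""
    sq = [(math.log1p(p) - math.log1p(a)) ** 2
          for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))

print(rmsle([10.0, 100.0], [12.0, 90.0]))   # fine on non-negative data

try:
    rmsle([-5.0, 10.0], [4.0, 11.0])        # negative label breaks log1p
except ValueError as err:
    print("RMSLE undefined for negative targets:", err)
```

Since the question says the target can be negative, option D's RMSLE objective is ruled out, which is the argument for B.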

  John_Pongthorn 1 year, 3 months ago

Selected Answer: D

BQML XGBoost ==> you have to have SQL knowledge to write the statement, and B didn't mention how to get max performance. Meanwhile with
AutoML you just click and select, click and select, click and select to get it done, and D refers to the measurement for maximizing model
performance. You can literally minimize effort.
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


To john pongthorn , You are wrong 55555
it must be B genuinely
upvoted 2 times

  zeic 1 year, 3 months ago


I recommend option D, Use AutoML Tables to train the model with RMSLE as the optimization objective.

Using AutoML Tables to train the model can be a convenient and efficient way to minimize effort and training time while still maximizing model
performance. In this case, using RMSLE as the optimization objective can be a good choice because it is a good fit for regression models with
negative values in the target variable.
upvoted 2 times

  MithunDesai 1 year, 4 months ago


Selected Answer: B


B is correct
upvoted 3 times

  hiromi 1 year, 4 months ago


Selected Answer: B

It seems B to me


upvoted 1 times

  seifou 1 year, 4 months ago


Selected Answer: B

B is correct
upvoted 1 times

  ares81 1 year, 4 months ago


It's B.
upvoted 1 times

  YangG 1 year, 4 months ago


B. BigQuery is a keyword for me
upvoted 2 times


Question #72 Topic 1

You are building a linear model with over 100 input features, all with values between –1 and 1. You suspect that many features are non-

informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which

technique should you use?

A. Use principal component analysis (PCA) to eliminate the least informative features.

B. Use L1 regularization to reduce the coefficients of uninformative features to 0.

C. After building your model, use Shapley values to determine which features are the most informative.

D. Use an iterative dropout technique to identify which features do not degrade the model when removed.

Correct Answer: B

Community vote distribution


B (71%) C (18%) 12%

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: B

L1 regularization is good for feature selection


https://www.quora.com/How-does-the-L1-regularization-method-help-in-feature-selection
https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
upvoted 6 times
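
The mechanism behind L1's feature selection is worth seeing directly: coordinate-descent Lasso solvers apply a soft-thresholding step that sets any coefficient smaller than the penalty exactly to 0, while large (informative) coefficients survive. A minimal sketch with made-up coefficient values:

```python
def soft_threshold(w: float, lam: float) -> float:
    """Proximal step for the L1 penalty: shrinks w toward 0, and any
    coefficient with |w| <= lam becomes exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

# Made-up coefficients: informative features have large weights,
# uninformative ones hover near zero.
weights = [0.91, -0.03, 0.005, -0.76, 0.02, 0.40]
sparse = [soft_threshold(w, lam=0.05) for w in weights]
print(sparse)   # uninformative coefficients become exactly 0.0
```

This is why L1 (unlike L2, which only shrinks) effectively removes the non-informative features while leaving the surviving ones in their original form.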

  ailiba 1 year, 2 months ago


but this is not a sparse input vector, just a high dimensional vector where many features are not relevant.
upvoted 1 times

  ares81 Highly Voted  1 year, 4 months ago

A. PCA reconfigures the features, so no.


C. After building your model, so no.
D. Dropout should be in the model and it doesn't tell us which features are informative or not. Big No!
For me, it's B.
upvoted 5 times

  Liting Most Recent  9 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  Antmal 1 year, 1 month ago

Selected Answer: B

L1 regularization penalises weights in proportion to the sum of the absolute value of the weights. L1 regularization helps drive the weights of
irrelevant or barely relevant features to exactly 0. A feature with a weight of 0 is effectively removed from the model.
https://developers.google.com/machine-learning/glossary#L1_regularization
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


Its B. See my explanations under the comments why its not C.
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: B

It's the best way, because you reduce non-relevant features, in this case the non-informative ones
upvoted 1 times

  mlgh 1 year, 3 months ago

Selected Answer: C

Answer C:
In the official sample questions there's a similar question; the explanation is that L1 is for reducing overfitting while explainability (Shapley) is for
feature selection, hence C.
https://docs.google.com/forms/d/e/1FAIpQLSeYmkCANE81qSBqLW0g2X7RoskBX9yGYQu-m1TtsjMvHabGqg/viewform


upvoted 3 times

  tavva_prudhvi 1 year, 1 month ago


It's wrong. Using Shapley values to determine feature importance can be a useful technique, but it requires building a complete model and can
be computationally expensive, especially with over 100 input features. Additionally, it may not be practical to use this method for every model
iteration or update. On the other hand, L1 regularization can be used during the model building process to effectively reduce the impact of
non-informative features by shrinking their coefficients to 0, making it a more efficient and effective approach.
upvoted 1 times

  mlgh 1 year, 3 months ago


It cannot be A either, because PCA modifies the features, and it says you should keep them in their original form.
And D cannot be, because dropout is for generalizing and avoiding overfitting, and it's done on the NN model, not on the data.
upvoted 1 times

  behzadsw 1 year, 3 months ago

Selected Answer: A

The features must be removed from the model. They are not removed when doing L1 regularization. PCA is used prior to training.
upvoted 2 times

  jamesking1103 1 year, 3 months ago


Should be A,
as it keeps the informative ones in their original form
upvoted 3 times

  libo1985 7 months ago


How PCA can keep the original form?
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


That is a good point. PCA is a technique used to reduce the dimensionality of the dataset by transforming the original features into a new set o
uncorrelated features. This can help to eliminate the least informative features and reduce the computational burden of building a model with
many input features. However, it is important to note that PCA does not necessarily remove the original features from the model, but rather
transforms them into a new set of features. On the other hand, L1 regularization can effectively remove the impact of non-informative features
by setting their coefficients to 0 during the model building process. Therefore, both techniques can be useful for addressing the issue of non-
informative features in a linear model, depending on the specific needs of the problem.
upvoted 1 times

  JeanEl 1 year, 4 months ago

Selected Answer: B

Agree with B
upvoted 2 times


Question #73 Topic 1

You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior

is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data,

but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform

this validation?

A. Use the TFX ModelValidator tools to specify performance metrics for production readiness.

B. Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production.

C. Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data.

D. Use the entire dataset and treat the area under the receiver operating characteristics curve (AUC ROC) as the main metric.

Correct Answer: B

Community vote distribution


A (51%) C (45%) 4%

  John_Pongthorn Highly Voted  1 year, 3 months ago

https://www.tensorflow.org/tfx/guide/evaluator
upvoted 12 times

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: C

It seems C to me


B is wrong cuz "Many machine learning techniques don’t work well here due to the sequential nature and temporal correlation of time series. For
example, k-fold cross validation can cause data leakage; models need to be retrained to generate new forecasts"
- https://cloud.google.com/learn/what-is-time-series
upvoted 9 times
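
The leakage point above can be shown with a toy time-ordered dataset: a random k-fold shuffle would mix future days into training, while the option-C split keeps every training day strictly before every validation day. A sketch with made-up daily records (dates and the `sold_out` flag are invented for illustration):

```python
from datetime import date, timedelta

# Made-up daily inventory records, one per day, already time-ordered.
start = date(2024, 3, 1)
records = [{"day": start + timedelta(days=i), "sold_out": i % 7 == 0}
           for i in range(28)]

# Option C: hold out the most recent week and train only on the past.
cutoff = records[-1]["day"] - timedelta(days=6)
train_set = [r for r in records if r["day"] < cutoff]
valid_set = [r for r in records if r["day"] >= cutoff]

# Every training day strictly precedes every validation day: no leakage.
# A random k-fold shuffle would instead put future days into training.
assert max(r["day"] for r in train_set) < min(r["day"] for r in valid_set)
print(len(train_set), len(valid_set))   # 21 7
```

This is the simplest time-aware validation; it also matches the "performing accurately on current data" wording of option C.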

  gscharly Most Recent  1 week, 2 days ago

Selected Answer: A

The TFX Evaluator lets you evaluate performance on different subsets of data: https://www.tensorflow.org/tfx/guide/evaluator
upvoted 2 times

  pinimichele01 2 weeks, 1 day ago


Selected Answer: A

The Evaluator TFX pipeline component performs deep analysis on the training results for your models, to help you understand how your model
performs on subsets of your data.
upvoted 2 times

  edoo 1 month, 3 weeks ago

Selected Answer: A

I prefer A to C because 1 week of data may be insufficient to generalize the model and could lead to overfitting on the validation subset.
upvoted 3 times

  pmle_nintendo 2 months ago


Selected Answer: C

option C provides a streamlined and reliable approach that focuses on evaluating the model's performance on the most relevant and recent data,
which is essential for predicting out-of-stock events in a dynamic retail setting.
upvoted 1 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: A

Either A or C but C is only last week which is not specific data sets
upvoted 1 times

  AdiML 7 months, 1 week ago


Answer should be C, we are dealing with dynamic data and the "last" data is more relevant to have an idea about the future performance
upvoted 1 times

  joaquinmenendez 7 months, 1 week ago

Selected Answer: C

Option C, because it allows you to track your model's performance on the most *recent* data, which is the most relevant data for predicting
stockout risk. Given that the preferences are dynamic, the most important thing is that the model WORKS correctly with the newest data
upvoted 1 times


  [Removed] 9 months, 1 week ago

Selected Answer: A

The answer is A. Performance on specific subsets of data before pushing to production == TFX ModelValidator with custom performance metrics
for production readiness.

C is wrong because performance in the last relevant week of data != performance on specific subsets of data.
upvoted 1 times

  tavva_prudhvi 9 months ago


The ModelValidator TFX Pipeline Component (Deprecated)
upvoted 2 times

  atlas_lyon 9 months, 2 weeks ago


Selected Answer: A

I will go for A. I don't think the aim of the question is to test if the candidates know whether or not a component is deprecated. Note that
ModelValidator has been fused with Evaluator. So we can imagine, the question would have been updated in recent exams. Evaluator enables
testing on specific subsets with the metrics we want, then indicates to Pusher component to push the new model to production if "model is good
enough". This would make the pipeline quite streamlined (https://www.tensorflow.org/tfx/guide/evaluator)

B: wrong: using historical data, one should watch data leakage


C: wrong: We want to track performance on specific subsets of data (not necessarily the last week) maybe to do some targeting/segmentation ?
who knows.
D: wrong because we want to track performance on specific subsets of data not the entire dataset
upvoted 1 times

  tavva_prudhvi 9 months, 1 week ago


Bro, thats not TFXModelValidator its Evaluator, are both the same?
upvoted 1 times

  MultipleWorkerMirroredStrategy 6 months, 1 week ago


TFXModelValidator is deprecated, but its behaviour can be replicated using the Evaluator object - which is the point he tried to make. See
the docs here: https://www.tensorflow.org/tfx/guide/modelval
upvoted 1 times

  Liting 9 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 1 times

  Voyager2 10 months, 3 weeks ago


Selected Answer: C

I think that it should be C for the following key point


", but track your performance on specific subsets of data before pushing to production"
So the ask is which subset of data you should use.
upvoted 1 times

  julliet 11 months ago


Could someone explain why A is better option than C? C is correct one in terms of evaluation overall, no doubt. But do we choose TFX because it
understands we are dealing with time series? Or is it the "specific subset" in the Q that makes us thinking we have already chosen the data of last
period and just need to push it into the TFX?
upvoted 1 times

  aw_49 11 months, 2 weeks ago


Selected Answer: C

A is deprecated.. so C
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: A

Went with A
upvoted 2 times

  SergioRubiano 12 months ago

Selected Answer: A

TFX ModelValidator
upvoted 1 times


Question #74 Topic 1

You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an “Out of Memory” error. What

should you do?

A. Use batch prediction mode instead of online mode.

B. Send the request again with a smaller batch of instances.

C. Use base64 to encode your data before using it for prediction.

D. Apply for a quota increase for the number of prediction requests.

Correct Answer: C

Community vote distribution


B (96%) 4%
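As a hedged sketch of what answer B means in practice, the client-side fix for a 429 "Out of Memory" is to split the payload into smaller requests. The `predict` function below is a hypothetical stand-in for the real endpoint call (e.g. `endpoint.predict(instances=batch)` on Vertex AI); the instance data and batch size are illustrative.

```python
# Minimal sketch: splitting a large online-prediction payload into smaller
# batches so each request stays within the serving node's memory limits.
# `predict` below is a hypothetical stand-in for the real endpoint call.

def chunked(instances, batch_size):
    """Yield successive slices of at most batch_size instances."""
    for i in range(0, len(instances), batch_size):
        yield instances[i:i + batch_size]

def predict(batch):
    # Placeholder for e.g. endpoint.predict(instances=batch) on Vertex AI.
    return [0.0 for _ in batch]

instances = [{"feature": i} for i in range(10)]
predictions = []
for batch in chunked(instances, batch_size=4):
    predictions.extend(predict(batch))

print(len(predictions))  # 10
```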

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: B

B is the answer
429 - Out of Memory
https://cloud.google.com/ai-platform/training/docs/troubleshooting
upvoted 20 times

  tavva_prudhvi 1 year, 1 month ago


Upvote this comment, its the right answer!
upvoted 3 times

  pmle_nintendo Most Recent  2 months ago

Selected Answer: B

By reducing the batch size of instances sent for prediction, you decrease the memory footprint of each request, potentially alleviating the out-of-
memory issue. However, be mindful that excessively reducing the batch size might impact the efficiency of your prediction process.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


B. Send the request again with a smaller batch of instances.

If you are getting an "Out of Memory" error during an online prediction request, it suggests that the amount of data you are sending in each
request is too large and is exceeding the available memory. To resolve this issue, you can try sending the request again with a smaller batch of
instances. This reduces the amount of data being sent in each request and helps avoid the out-of-memory error. If the problem persists, you can
also try increasing the machine type or the number of instances to provide more resources for the prediction service.
upvoted 2 times

  BenMS 1 year, 2 months ago

Selected Answer: C

This question is about prediction not training - and specifically it's about _online_ prediction (aka realtime serving).

All the answers are about batch workloads apart from C.


upvoted 1 times

  BenMS 1 year, 2 months ago


Okay, option D is also about online serving, but the error message indicates a problem for individual predictions, which will not be fixed by
increasing the number of predictions per second.
upvoted 1 times

  Antmal 1 year, 1 month ago


@BenMS this feels like a trick question... it makes one zone in on the word "batch". https://cloud.google.com/ai-platform/training/docs/troubleshooting states that when an error occurs with an online prediction request, you usually get an HTTP status code back from the service. These are some commonly encountered codes and their meaning in the context of online prediction:

429 - Out of Memory


The processing node ran out of memory while running your model. There is no way to increase the memory allocated to prediction nodes at
this time. You can try these things to get your model to run:

Reduce your model size by:


1. Using less precise variables.


2. Quantizing your continuous data.
3. Reducing the size of other input features (using smaller vocab sizes, for example).
4. Send the request again with a smaller batch of instances.
upvoted 1 times

  koakande 1 year, 4 months ago


Selected Answer: B

https://cloud.google.com/ai-platform/training/docs/troubleshooting
upvoted 2 times

  ares81 1 year, 4 months ago


The correct answer is B.
upvoted 1 times

  LearnSodas 1 year, 4 months ago

Selected Answer: B

answer B as reported here: https://cloud.google.com/ai-platform/training/docs/troubleshooting


upvoted 1 times

  Sivaram06 1 year, 4 months ago

Selected Answer: B

https://cloud.google.com/ai-platform/training/docs/troubleshooting#http_status_codes
upvoted 1 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 163/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #75 Topic 1

You work at a subscription-based company. You have trained an ensemble of trees and neural networks to predict customer churn, which is the

likelihood that customers will not renew their yearly subscription. The average prediction is a 15% churn rate, but for a particular customer the

model predicts that they are 70% likely to churn. The customer has a product usage history of 30%, is located in New York City, and became a

customer in 1997. You need to explain the difference between the actual prediction, a 70% churn rate, and the average prediction. You want to use

Vertex Explainable AI. What should you do?

A. Train local surrogate models to explain individual predictions.

B. Configure sampled Shapley explanations on Vertex Explainable AI.

C. Configure integrated gradients explanations on Vertex Explainable AI.

D. Measure the effect of each feature as the weight of the feature multiplied by the feature value.

Correct Answer: A

Community vote distribution


B (89%) 11%
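To make option B concrete, here is a hedged, pure-Python sketch of the sampled Shapley idea behind Vertex Explainable AI: average each feature's marginal contribution to the prediction over random orderings of the features. The toy churn model, the feature names (`usage`, `tenure`, `region`), and the baseline ("average customer") values are all illustrative assumptions, not the actual Vertex AI implementation.

```python
import random

# Minimal sketch of the sampled Shapley idea: average each feature's
# marginal contribution over random feature orderings. The model,
# baseline, and instance below are illustrative only.

def model(x):
    # Toy churn model: weighted sum of features.
    return 0.5 * x["usage"] + 0.3 * x["tenure"] + 0.2 * x["region"]

def sampled_shapley(model, instance, baseline, n_samples=200, seed=0):
    rng = random.Random(seed)
    features = list(instance)
    attrib = {f: 0.0 for f in features}
    for _ in range(n_samples):
        order = features[:]
        rng.shuffle(order)
        current = dict(baseline)
        prev = model(current)
        for f in order:
            current[f] = instance[f]   # flip feature f to the instance value
            new = model(current)
            attrib[f] += new - prev    # marginal contribution of f
            prev = new
    return {f: v / n_samples for f, v in attrib.items()}

instance = {"usage": 0.3, "tenure": 1.0, "region": 1.0}
baseline = {"usage": 0.6, "tenure": 0.5, "region": 0.5}  # "average" customer
attr = sampled_shapley(model, instance, baseline)
# Attributions sum to prediction(instance) - prediction(baseline),
# i.e. they explain the gap between the 70% prediction and the 15% average.
print(round(sum(attr.values()), 6))  # 0.1
```

This is exactly the property the question asks about: the attributions decompose the difference between the individual prediction and the baseline (average) prediction.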

  pmle_nintendo 2 months ago

Selected Answer: B

Sampled Shapley explanations offer a more sophisticated and model-agnostic method for understanding feature importance and contributions to
predictions.
upvoted 2 times

  adavid213 5 months, 3 weeks ago


Selected Answer: B

I agree, it seems like B


upvoted 1 times

  NickHapton 10 months, 1 week ago


B
refer:
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#compare-methods
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 2 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: B

Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling
approximation of exact Shapley values.
Sampled Shapley recommended model type: non-differentiable models, such as ensembles of trees and neural networks.
https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview
upvoted 2 times

  enghabeth 1 year, 2 months ago

Selected Answer: B

Sampled Shapley works well for these models, which are meta-ensembles of trees and neural networks.
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#sampled-shapley
upvoted 2 times

  John_Pongthorn 1 year, 3 months ago


Selected Answer: B

B is optimal for tabular data Tree or DNN

C integrated gradients explanations on Vertex Explainable AI.


It is used for image.
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#compare-methods
upvoted 3 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 164/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

  ares81 1 year, 3 months ago

Selected Answer: B

It should be B.
upvoted 1 times

  emma_aic 1 year, 4 months ago

Selected Answer: B

https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#sampled-shapley
upvoted 2 times

  egdiaa 1 year, 4 months ago


B - For sure as per GCP Docs here: https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: B

B
- https://christophm.github.io/interpretable-ml-book/shapley.html
- https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 2 times

  JeanEl 1 year, 4 months ago


Selected Answer: B

Agree with B: individual instance prediction + ensemble of trees and neural networks (recommended model types for Sampled Shapley: "Non-differentiable models, such as ensembles of trees and neural networks"). Check out the link below:
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 2 times

  YangG 1 year, 4 months ago


Selected Answer: C

it is about a individual instance prediction. I think use integrated gradient method


upvoted 2 times

  ares81 1 year, 4 months ago


It seems D.
upvoted 1 times


Question #76 Topic 1

You are working on a classification problem with time series data. After conducting just a few experiments using random cross-validation, you

achieved an Area Under the Receiver Operating Characteristic Curve (AUC ROC) value of 99% on the training data. You haven’t explored using any

sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?

A. Address the model overfitting by using a less complex algorithm and use k-fold cross-validation.

B. Address data leakage by applying nested cross-validation during model training.

C. Address data leakage by removing features highly correlated with the target value.

D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.

Correct Answer: D

Community vote distribution


B (73%) D (27%)
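As a hedged illustration of why random cross-validation leaks future information on time series, and what the time-ordered alternative looks like, here is a minimal forward-chaining split (the building block of nested cross-validation for time series): each training fold only ever contains data that precedes its validation fold. The sample count and fold sizes are illustrative.

```python
# Minimal sketch of time-ordered validation splits (the idea behind
# nested / forward-chaining cross-validation for time series): training
# folds only contain data that precedes the validation fold, so no
# future information leaks into training. Fold sizes are illustrative.

def forward_chaining_splits(n_samples, n_folds):
    fold = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_idx = list(range(0, k * fold))              # past only
        valid_idx = list(range(k * fold, (k + 1) * fold))  # next window
        yield train_idx, valid_idx

for train, valid in forward_chaining_splits(n_samples=12, n_folds=3):
    print(len(train), len(valid), max(train) < min(valid))
```

scikit-learn's `TimeSeriesSplit` implements the same pattern; for full nested cross-validation, hyperparameter tuning runs inside each outer training window.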

  pinimichele01 1 week, 1 day ago

Selected Answer: B

random cross-validation
time series data

-> B
upvoted 1 times

  gscharly 2 weeks, 1 day ago


Selected Answer: B

B with nested cross validation.


upvoted 1 times

  pinimichele01 1 week, 2 days ago


can you explain me why?
upvoted 1 times

  Werner123 2 months ago


Selected Answer: B

"99% on training data" -> Data leakage


"random cross-validation" -> Not suitable for time series, use "nested cross-validation"
upvoted 1 times

  pmle_nintendo 2 months ago

Selected Answer: D

Options B and C (Address data leakage by applying nested cross-validation during model training; Address data leakage by removing features
highly correlated with the target value) are less relevant in this scenario because the primary concern appears to be overfitting rather than data
leakage. Data leakage typically involves inadvertent inclusion of information from the test set in the training process, which may lead to overly
optimistic performance metrics. However, there is no indication that data leakage is the cause of the high AUC ROC value in this case.
upvoted 1 times

  pico 5 months, 2 weeks ago


Selected Answer: D

Options A and B also address overfitting, but they involve different strategies. Option A suggests using a less complex algorithm and k-fold cross-
validation. While this can be effective, it might be premature to change the algorithm without first exploring hyperparameter tuning. Option B
suggests addressing data leakage, which is a different issue and may not be the primary cause of overfitting in this scenario.
upvoted 3 times

  humancomputation 7 months ago


Selected Answer: B

B with nested cross validation.


upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 2 times

  BenMS 1 year, 2 months ago

Selected Answer: B


Nested cross-validation to reduce data leakage - same as a previous question.


upvoted 1 times

  Alexarr6 1 year, 2 months ago


Selected Answer: B

It`s B
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: B

B (same question 48)


- https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
upvoted 3 times

  ares81 1 year, 4 months ago


To claim overfitting, I would need results on test data, so it's data leakage. Common sense excludes C, so it's B.
upvoted 1 times


Question #77 Topic 1

You need to execute a batch prediction on 100 million records in a BigQuery table with a custom TensorFlow DNN regressor model, and then store

the predicted results in a BigQuery table. You want to minimize the effort required to build this inference pipeline. What should you do?

A. Import the TensorFlow model with BigQuery ML, and run the ml.predict function.

B. Use the TensorFlow BigQuery reader to load the data, and use the BigQuery API to write the results to BigQuery.

C. Create a Dataflow pipeline to convert the data in BigQuery to TFRecords. Run a batch inference on Vertex AI Prediction, and write the

results to BigQuery.

D. Load the TensorFlow SavedModel in a Dataflow pipeline. Use the BigQuery I/O connector with a custom function to perform the inference

within the pipeline, and write the results to BigQuery.

Correct Answer: A

Community vote distribution


A (94%) 6%
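To make answer A concrete, a hedged sketch of the BigQuery ML route: import the TensorFlow SavedModel into BigQuery, then run batch inference with `ML.PREDICT` and materialize the results as a table. All project, dataset, table, and Cloud Storage names below are hypothetical placeholders.

```python
# Minimal sketch of the BigQuery ML route for answer A: import the
# SavedModel, then run ML.PREDICT and write results to a BigQuery table.
# All project, dataset, table, and GCS paths below are hypothetical.

import_sql = """
CREATE OR REPLACE MODEL `my_project.my_dataset.dnn_regressor`
OPTIONS (model_type='TENSORFLOW',
         model_path='gs://my-bucket/saved_model/*')
"""

predict_sql = """
CREATE OR REPLACE TABLE `my_project.my_dataset.predictions` AS
SELECT *
FROM ML.PREDICT(MODEL `my_project.my_dataset.dnn_regressor`,
                (SELECT * FROM `my_project.my_dataset.records`))
"""

# With the google-cloud-bigquery client these would be executed as, e.g.:
#   client.query(import_sql).result()
#   client.query(predict_sql).result()
print("ML.PREDICT" in predict_sql)  # True
```

The 100 million rows never leave BigQuery, which is why this is the lowest-effort inference pipeline among the options.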

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: A

A should work with less effort


- https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#api
- https://towardsdatascience.com/how-to-do-batch-predictions-of-tensorflow-models-directly-in-bigquery-ffa843ebdba6
upvoted 8 times

  etienne0 Most Recent  1 month, 4 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  pawan94 3 months, 3 weeks ago


Simplest doesn't mean it is the most efficient/optimal. If I follow the best practices offered by Google for a serving/inference pipeline, I would go with Vertex AI Prediction. Read more for details: https://cloud.google.com/architecture/ml-on-gcp-best-practices#machine-learning-development
upvoted 2 times

  etienne0 1 month, 4 weeks ago


Agreed, i'll also go with C.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 2 times

  JamesDoe 1 year, 1 month ago


Selected Answer: A

https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
upvoted 2 times

  enghabeth 1 year, 2 months ago


Selected Answer: A

for this:
https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-inference-overview
Predict the label, either a numerical value for regression tasks or a categorical value for classification tasks, for a DNN regression model
upvoted 2 times

  ares81 1 year, 4 months ago


ml.predict: https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#api --> A
upvoted 1 times

  LearnSodas 1 year, 4 months ago


Selected Answer: A

Answer A as the simplest


upvoted 1 times


Question #78 Topic 1

You are creating a deep neural network classification model using a dataset with categorical input values. Certain columns have a cardinality

greater than 10,000 unique values. How should you encode these categorical values as input into the model?

A. Convert each categorical value into an integer value.

B. Convert the categorical string data to one-hot hash buckets.

C. Map the categorical variables into a vector of boolean values.

D. Convert each categorical value into a run-length encoded string.

Correct Answer: C

Community vote distribution


B (68%) A (21%) 11%
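As a hedged sketch of what answer B's one-hot hash buckets look like, here is the hashing trick in pure Python: map each string value to one of a fixed number of buckets with a stable hash, instead of one-hot encoding 10,000+ columns. The SKU strings and the bucket count are illustrative; one common heuristic (used by e.g. AI Platform's built-in algorithms) sets the bucket count near the square root of the number of unique values.

```python
import hashlib

# Minimal sketch of the hashing trick for high-cardinality categoricals:
# map each string value to one of a fixed number of buckets with a
# stable hash, rather than one-hot encoding 10,000+ columns.
# The values and bucket count below are illustrative.

def hash_bucket(value, num_buckets):
    # md5 gives a deterministic hash across processes (unlike hash()).
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

buckets = [hash_bucket(v, num_buckets=100)
           for v in ("SKU-000123", "SKU-984731")]
print(buckets)
```

The bucket index can then be one-hot encoded or fed to an embedding layer; collisions are the accepted trade-off for a fixed, small input dimension.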

  etienne0 1 month, 4 weeks ago

Selected Answer: A

went with A
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: B

https://cloud.google.com/ai-platform/training/docs/algorithms/wide-and-deep
If the column is categorical with high cardinality, then the column is treated with hashing, where the number of hash buckets equals to the square
root of the number of unique values in the column.
upvoted 1 times

  JamesDoe 1 year, 1 month ago

Selected Answer: B

B.
The other options solves nada.
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: B

https://towardsdatascience.com/getting-deeper-into-categorical-encodings-for-machine-learning-2312acd347c8
When you have millions of unique values, try hash encoding
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


Selected Answer: B

B unconditoinally
https://cloud.google.com/ai-platform/training/docs/algorithms/xgboost#analysis

If the column is categorical with high cardinality, then the column is treated with hashing, where the number of hash buckets equals to the square
root of the number of unique values in the column.
A categorical column is considered to have high cardinality if the number of unique values is greater than the square root of the number of rows in
the dataset.
upvoted 2 times

  MithunDesai 1 year, 4 months ago

Selected Answer: C

I think C as it has 10000 categorical values


upvoted 2 times

  hiromi 1 year, 4 months ago

Selected Answer: B

I think B is correct
Ref.:"
- https://cloud.google.com/ai-platform/training/docs/algorithms/xgboost
- https://stackoverflow.com/questions/26473233/in-preprocessing-data-with-high-cardinality-do-you-hash-first-or-one-hot-encode


upvoted 4 times

  hiromi 1 year, 4 months ago


- https://cloud.google.com/ai-platform/training/docs/algorithms/xgboost#analysis
upvoted 1 times

  mil_spyro 1 year, 4 months ago

Selected Answer: B

Answer is B. When the cardinality of the categorical column is very large, the best choice is binary encoding; however, it is not offered here, hence the one-hot hash option.
upvoted 1 times

  mil_spyro 1 year, 4 months ago


https://www.analyticsvidhya.com/blog/2020/08/types-of-categorical-data-encoding/
upvoted 1 times

  JeanEl 1 year, 4 months ago

Selected Answer: B

Ans : B
upvoted 1 times

  seifou 1 year, 4 months ago

Selected Answer: B

B is correct
upvoted 1 times

  ares81 1 year, 4 months ago


It should be B
upvoted 1 times

  LearnSodas 1 year, 4 months ago


Selected Answer: A

Answer A since with 10.000 unique values one-hot shouldn't be a good solution
https://machinelearningmastery.com/how-to-prepare-categorical-data-for-deep-learning-in-python/
upvoted 3 times

  etienne0 1 month, 4 weeks ago


I agree with A
upvoted 1 times


Question #79 Topic 1

You need to train a natural language model to perform text classification on product descriptions that contain millions of examples and 100,000

unique words. You want to preprocess the words individually so that they can be fed into a recurrent neural network. What should you do?

A. Create a hot-encoding of words, and feed the encodings into your model.

B. Identify word embeddings from a pre-trained model, and use the embeddings in your model.

C. Sort the words by frequency of occurrence, and use the frequencies as the encodings in your model.

D. Assign a numerical value to each word from 1 to 100,000 and feed the values as inputs in your model.

Correct Answer: B

Community vote distribution


B (86%) 14%
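To illustrate answer B's pre-trained-embedding idea, here is a hedged, minimal lookup step: each token is mapped to a fixed pre-trained vector, with a shared fallback for out-of-vocabulary words, producing the sequence of vectors an RNN would consume. The 4-dimensional vectors and the vocabulary are illustrative; real pre-trained embeddings (GloVe, word2vec, etc.) are typically 100-300 dimensions.

```python
# Minimal sketch of using pre-trained word embeddings as RNN input: look
# up each token in a fixed embedding table, falling back to a shared
# out-of-vocabulary vector. Vectors and vocabulary below are illustrative.

pretrained = {
    "running": [0.1, 0.2, 0.3, 0.4],
    "shoe":    [0.5, 0.1, 0.0, 0.2],
}
oov_vector = [0.0, 0.0, 0.0, 0.0]

def embed(tokens):
    """Map a token sequence to a sequence of embedding vectors."""
    return [pretrained.get(t, oov_vector) for t in tokens]

sequence = embed("lightweight running shoe".split())
print(len(sequence), len(sequence[0]))  # 3 4
```

In a framework like Keras, the same table would typically initialize an `Embedding` layer's weights, optionally frozen during training.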

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: B

B
https://developers.google.com/machine-learning/guides/text-classification/step-3
https://developers.google.com/machine-learning/guides/text-classification/step-4

i
upvoted 2 times

  ares81 1 year, 3 months ago

Selected Answer: B

Answer is B
upvoted 1 times

  egdiaa 1 year, 4 months ago


Answer is B: According to Google Docs here: - https://developers.google.com/machine-learning/guides/text-classification/ it is a Word Embedding
case
upvoted 4 times

  hiromi 1 year, 4 months ago


Selected Answer: B

B (I'm not sure)


- https://developers.google.com/machine-learning/guides/text-classification/step-3#label_vectorization
- https://developers.google.com/machine-learning/guides/text-classification/step-4
- https://towardsai.net/p/deep-learning/text-classification-with-rnn
- https://towardsdatascience.com/pre-trained-word-embedding-for-text-classification-end2end-approach-5fbf5cd8aead
upvoted 2 times

  hiromi 1 year, 4 months ago


- https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space
upvoted 1 times

  LearnSodas 1 year, 4 months ago


Selected Answer: C

Bag of words is a good way to represent text and feed it to a DNN


https://machinelearningmastery.com/gentle-introduction-bag-words-model/
upvoted 1 times


Question #80 Topic 1

You work for an online travel agency that also sells advertising placements on its website to other companies. You have been asked to predict the

most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms@p99,

the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to

implement the simplest solution. How should you configure the prediction pipeline?

A. Embed the client on the website, and then deploy the model on AI Platform Prediction.

B. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Firestore for writing and for reading the user’s

navigation context, and then deploy the model on AI Platform Prediction.

C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the

user’s navigation context, and then deploy the model on AI Platform Prediction.

D. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the

user’s navigation context, and then deploy the model on Google Kubernetes Engine.

Correct Answer: C

Community vote distribution


B (58%) C (42%)

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: C

C (same question 49)


keywords
the inventory is thousands of web banners -> Bigtable
You want to Implement the simplest solution -> AI Platform Prediction
upvoted 7 times

  tavva_prudhvi 9 months ago


Yes, but in that question Option B doesnt have a database.Firestore can handle thousands of web banners, right?
upvoted 1 times

  e707 Highly Voted  1 year ago

Selected Answer: B

Here are some of the reasons why C is not as simple as B:

Cloud Bigtable is a more complex database to set up and manage than Firestore.
Cloud Bigtable is not as secure as Firestore.
Cloud Bigtable is not as well-integrated with other Google Cloud services as Firestore.
Therefore, B is the simpler solution that meets all of the requirements.
upvoted 5 times

  pinimichele01 Most Recent  2 weeks, 6 days ago

Selected Answer: B

see e707
upvoted 1 times

  ludovikush 3 weeks, 3 days ago

Selected Answer: C

as Hiromi said
upvoted 1 times

  ludovikush 2 months ago

Selected Answer: B

I would opt for B as we have a requirement on retrieval latency


upvoted 1 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: B

Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
upvoted 1 times

  Krish6488 5 months, 2 weeks ago

Selected Answer: B


I would go with Firestore as throughput or latency requirement provided in the question are possible with Firestore and bigTable may be an
overkill. Had the scenario involved super large volumes of data, CBT would have taken precedence
upvoted 1 times

  andresvelasco 7 months, 3 weeks ago

Selected Answer: B

I think B, based on "the simplest solution" consideration.


upvoted 1 times

  tavva_prudhvi 9 months ago

Selected Answer: B

the primary requirement mentioned in the original question is to implement the simplest solution. Firestore is a fully managed, serverless NoSQL
database that can also handle thousands of web banners and dynamically changing user browsing history. It is designed for real-time data
synchronization and can quickly update the most relevant web banner as the user browses different pages of the website.

While Cloud Bigtable offers high performance and scalability, it is more complex to manage and is better suited for large-scale, high-throughput
workloads. Firestore, on the other hand, is easier to implement and maintain, making it a more suitable choice for the simplest solution in this
scenario.
upvoted 2 times

  [Removed] 9 months, 1 week ago


Selected Answer: C

The answer is C for the following reason:

If you need:
- Submillisecond retrieval latency on a limited amount of quickly changing data, retrieved by a few thousand clients, use Memorystore.
- Millisecond retrieval latency on slowly changing data where storage scales automatically, use Datastore.
- Millisecond retrieval latency on dynamically changing data, using a store that can scale linearly with heavy reads and writes, use Bigtable.
Source: https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#choosing_a_nosql_database

C is better than B because 1) the inventory is thousands of web banners and 2) we expect the user to compare many travel destinations, dates,
hotels, and tariffs during their search process. It means the user's browsing history is dynamically changing, and we need to identify "the most
relevant web banner that a user should see next" => we will be dynamically changing the ad as the user browses different pages of the website.
upvoted 2 times

  andresvelasco 7 months, 3 weeks ago


BTW, the storage solution does not mention web banners, just browsing history.
but what about the "simplest solution" consideration? that wold point into the Datastore direction.
It is true however that the guide you mention recommends firestore for " slowly changing data ", which I wonder why? I expect Firectore to be
able to perfectly handle many updates per second, few updates per user per second.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  lucaluca1982 1 year ago

Selected Answer: B

B for me
upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: B

B, for me.
upvoted 2 times

  kn29 1 year, 4 months ago


I think C because of latency requirements.
Cloud Bigtable is built for low-latency, high-throughput access (https://cloud.google.com/bigtable)
upvoted 3 times

  tavva_prudhvi 9 months ago


It's correct that Cloud Bigtable can provide better latency compared to Firestore, especially when dealing with very large datasets and
high-throughput workloads. However, it's important to consider the trade-offs and the specific use case.

For the given scenario, the latency requirements are 300ms@p99, which Firestore can handle effectively for thousands of web banners and
dynamically changing user browsing history. Firestore is designed for real-time data synchronization and can quickly update the most relevant
web banner as the user browses different pages on the website.

While Cloud Bigtable can offer improved latency, it comes with added complexity in terms of management and configuration. If the primary
goal is to implement the simplest solution while meeting the latency requirements, Firestore remains a more suitable choice for this use case.
upvoted 1 times

  ares81 1 year, 4 months ago

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 173/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

I need a DB to store the banners, so no A. We're talking about thousands of banners, so no C. Memorystore means Redis or similar solutions, so no D.
The answer is B, for me.
upvoted 1 times


Question #81 Topic 1

Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports

autoscaling and monitoring for online prediction requests. Which platform components should you choose for this system?

A. Vertex AI Pipelines and App Engine

B. Vertex AI Pipelines, Vertex AI Prediction, and Vertex AI Model Monitoring

C. Cloud Composer, BigQuery ML, and Vertex AI Prediction

D. Cloud Composer, Vertex AI Training with custom containers, and App Engine

Correct Answer: A

Community vote distribution


B (78%) D (17%) 6%

  John_Pongthorn Highly Voted  1 year, 3 months ago

Selected Answer: B

Cloud Composer may be a good consideration if you are working toward the Google Data Engineer cert;
App Engine is relevant to the DevOps cert.

Please note:
since we are preparing for the Google ML cert, if there is no specific requirement stated in the
question,
we should emphasize the use of Vertex AI as much as possible.
upvoted 5 times

  rosenr0 Most Recent  11 months, 1 week ago


B. Vertex AI also supports Docker containers:
https://cloud.google.com/vertex-ai/docs/training/containers-overview
upvoted 2 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: D

A custom container is a Docker image that you create to run your training application. By running your machine learning (ML) training job in a
custom container, you can use ML frameworks, non-ML dependencies, libraries, and binaries that are not otherwise supported on Vertex AI. So we
need Vertex AI Training with custom containers for the Docker requirement; thus options A and B are out.
App Engine allows developers to focus on what they do best: writing code. Based on Compute Engine, the App Engine flexible environment
automatically scales your app up and down while also balancing the load.
Customizable infrastructure - App Engine flexible environment instances are Compute Engine virtual machines, which means that you can take
advantage of custom libraries, use SSH for debugging, and deploy your own Docker containers.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 2 times

  e707 1 year ago


Selected Answer: D

I think it's D. B does not support Docker containers, does it?


upvoted 1 times

  e707 11 months, 3 weeks ago


I can't change the voting but It's B.
upvoted 2 times

  Sas02 1 year ago


Shouldn't it be A?
https://cloud.google.com/appengine/docs/standard/scheduling-jobs-with-cron-yaml
upvoted 1 times

  behzadsw 1 year, 3 months ago


Selected Answer: B

Vote for B
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: B

Vote for B
upvoted 3 times

  mil_spyro 1 year, 4 months ago


Selected Answer: D

D is the only option that provides scheduled model retraining


upvoted 1 times

  ares81 1 year, 4 months ago


Selected Answer: C

Serving would be Vertex AI Prediction, but the monitoring in the question is not the one in answer B (that one is tied to the model). The correct answer
is C.
upvoted 1 times

  ares81 1 year, 3 months ago


I changed my mind. It's D.
upvoted 1 times

  LearnSodas 1 year, 4 months ago

Selected Answer: B

Everything is possible on Vertex AI


upvoted 3 times

  mil_spyro 1 year, 4 months ago


Scheduling is not possible without the Cloud Scheduler
https://cloud.google.com/vertex-ai/docs/pipelines/schedule-cloud-scheduler
upvoted 1 times

  hiromi 1 year, 4 months ago


I think Vertex AI Pipeline includes schedule/trigger runs, so my vote is B
upvoted 2 times


Question #82 Topic 1

You are profiling the performance of your TensorFlow model training time and notice a performance issue caused by inefficiencies in the input

data pipeline for a single 5 terabyte CSV file dataset on Cloud Storage. You need to optimize the input pipeline performance. Which action should

you try first to increase the efficiency of your pipeline?

A. Preprocess the input CSV file into a TFRecord file.

B. Randomly select a 10 gigabyte subset of the data to train your model.

C. Split into multiple CSV files and use a parallel interleave transformation.

D. Set the reshuffle_each_iteration parameter to true in the tf.data.Dataset.shuffle method.

Correct Answer: D

Community vote distribution


C (72%) A (28%)

  pinimichele01 1 week, 2 days ago

Selected Answer: C

Converting a large 5 terabyte CSV file to a TFRecord can be a time-consuming process, and you would still be dealing with a single large file.
upvoted 1 times

  tavva_prudhvi 5 months, 3 weeks ago

Selected Answer: C

While preprocessing the input CSV file into a TFRecord file (Option A) can improve the performance of your input pipeline, it is not the first action
to try in this situation. Converting a large 5 terabyte CSV file to a TFRecord can be a time-consuming process, and you would still be dealing with a
single large file.
upvoted 1 times

  andresvelasco 7 months, 3 weeks ago

Selected Answer: C

I think C, based on the consideration "Which action should you try first", meaning the first action should be the less impactful one that keeps using CSV.
upvoted 1 times

  TNT87 11 months ago

Selected Answer: C

https://www.tensorflow.org/guide/data_performance#best_practice_summary
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 1 times

  e707 1 year ago


Selected Answer: C

Option A, preprocess the input CSV file into a TFRecord file, is not as good because it requires additional processing time. Hence, I think C is the
best choice.
upvoted 1 times

  frangm23 1 year ago

Selected Answer: A

I think it could be A.
https://cloud.google.com/architecture/best-practices-for-ml-performance-cost#preprocess_the_data_once_and_save_it_as_a_tfrecord_file
upvoted 1 times

  [Removed] 1 year ago


Selected Answer: A

Clearly both A and C work here, but I can't find any documentation which suggests C is any better than A.
upvoted 1 times

  Yajnas_arpohc 1 year, 1 month ago


"Which action should you try first" seems to be key -- C seems more intuitive as first step!
A is valid as well (interleave works w TFRecords) & definitely more efficient IMO, but maybe 2nd step!
upvoted 1 times


  shankalman717 1 year, 2 months ago

Selected Answer: A

Option B (randomly selecting a 10 gigabyte subset of the data) could lead to a loss of useful data and may not be representative of the entire
dataset. Option C (splitting into multiple CSV files and using a parallel interleave transformation) may also improve the performance, but may be
more complex to implement and maintain, and may not be as efficient as converting to TFRecord. Option D (setting the reshuffle_each_iteration
parameter to true in the tf.data.Dataset.shuffle method) is not directly related to the input data format and may not provide as significant a
performance improvement as converting to TFRecord.
upvoted 3 times

  tavva_prudhvi 1 year, 1 month ago


Please read this page: https://www.tensorflow.org/tutorials/load_data/csv. It's simple to implement in the same input pipeline, and we cannot
judge the answer by implementation difficulties!
upvoted 1 times

  SMASL 1 year, 2 months ago


Could anyone kindly explain why C is preferred over A? My initial guess was A, but everyone here seems to unanimously prefer C. Is it
because it is not about optimizing I/O performance, but rather the input _pipeline_, which is about processing arrived data within that TF input
pipeline (non-I/O)? I just try to understand here. Thanks for reply in advance!
upvoted 4 times

  tavva_prudhvi 1 year, 1 month ago


Option C, splitting into multiple CSV files and using a parallel interleave transformation, could improve the pipeline efficiency by allowing
multiple workers to read the data in parallel.
upvoted 1 times

  [Removed] 1 year ago


yes but how is it more efficient than converting to a TFRecord file?
upvoted 1 times

  tavva_prudhvi 9 months, 1 week ago


A TFRecord file is a binary file format that is used to store TensorFlow data. It is more efficient than a CSV file because it can be read more
quickly and it takes up less space. However, it is still a large file, and it would take a long time to read it into memory. Splitting the file into
multiple smaller files would reduce the amount of time it takes to read the files into memory, and it would also make it easier to
parallelize the reading process.
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: C

Splitting the data is the best way, in my opinion.


upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: C

C
Keywords -> You need to optimize the input pipeline performance
https://www.tensorflow.org/guide/data_performance
upvoted 2 times

  hiromi 1 year, 4 months ago


- https://www.tensorflow.org/tutorials/load_data/csv
upvoted 1 times

  ares81 1 year, 4 months ago


Selected Answer: C

It seems C, to me.
upvoted 1 times

  LearnSodas 1 year, 4 months ago


Selected Answer: C

By splitting the file, we can use a parallel interleave to load the datasets in parallel.
https://www.tensorflow.org/guide/data_performance
upvoted 2 times
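The effect of option C can be illustrated without TensorFlow: one big CSV forces a single sequential reader, while several shards can be read by concurrent workers and their records combined, which is the idea behind `tf.data`'s parallel interleave. In this sketch, `concurrent.futures` stands in for the interleave transformation, and the shard names are made up for the example:

```python
# Simulate "split into multiple CSV files + parallel interleave":
# write a few CSV shards, then read them concurrently with a thread pool.
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_shard(directory: str, index: int, rows: int) -> str:
    """Write one CSV shard with `rows` records and return its path."""
    path = os.path.join(directory, f"shard-{index}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for r in range(rows):
            writer.writerow([index, r])
    return path

def read_shard(path: str) -> list:
    """Read one shard fully; each call runs in its own worker thread."""
    with open(path, newline="") as f:
        return list(csv.reader(f))

with tempfile.TemporaryDirectory() as tmp:
    shards = [write_shard(tmp, i, rows=100) for i in range(4)]
    # Each shard is read by its own worker, mimicking parallel interleave.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(read_shard, shards))

total_rows = sum(len(rows) for rows in results)
print(total_rows)  # 400
```

In real TensorFlow code the same shape is expressed with `tf.data.Dataset.interleave(..., num_parallel_calls=tf.data.AUTOTUNE)` over the list of shard files, per the tf.data performance guide linked above.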


Question #83 Topic 1

You need to design an architecture that serves asynchronous predictions to determine whether a particular mission-critical machine part will fail.

Your system collects data from multiple sensors from the machine. You want to build a model that will predict a failure in the next N minutes,

given the average of each sensor’s data from the past 12 hours. How should you design the architecture?

A. 1. HTTP requests are sent by the sensors to your ML model, which is deployed as a microservice and exposes a REST API for prediction

2. Your application queries a Vertex AI endpoint where you deployed your model.

3. Responses are received by the caller application as soon as the model produces the prediction.

B. 1. Events are sent by the sensors to Pub/Sub, consumed in real time, and processed by a Dataflow stream processing pipeline.

2. The pipeline invokes the model for prediction and sends the predictions to another Pub/Sub topic.

3. Pub/Sub messages containing predictions are then consumed by a downstream system for monitoring.

C. 1. Export your data to Cloud Storage using Dataflow.

2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data.

3. Export the batch prediction job outputs from Cloud Storage and import them into Cloud SQL.

D. 1. Export the data to Cloud Storage using the BigQuery command-line tool

2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data.

3. Export the batch prediction job outputs from Cloud Storage and import them into BigQuery.

Correct Answer: C

Community vote distribution


B (76%) C (18%) 6%

  andreabrunelli 5 days, 8 hours ago

Selected Answer: C

The simplest solution that can support an eventual batch prediction (triggered by Pub/Sub) and even semi-real-time prediction.
upvoted 1 times

  Werner123 2 months ago

Selected Answer: B

Needs to be real time not batch. The data needs to be processed as a stream since multiple sensors are used. pawan94 is right.
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction
upvoted 1 times

  pawan94 3 months, 3 weeks ago


Here is the answer provided by Google itself. I don't understand why people would use batch prediction when it's sensor data and
online prediction can be asynchronous as well.
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-
learning#offline_batch_prediction:~:text=Predictive%20maintenance%3A%20asynchronously%20predicting%20whether%20a%20particular%20ma
hine%20part%20will%20fail%20in%20the%20next%20N%20minutes%2C%20given%20the%20averages%20of%20the%20sensor%27s%20data%20
n%20the%20past%2030%20minutes.
upvoted 2 times

  vale_76_na_xxx 4 months, 1 week ago


It refers to asynchronous prediction; I'd go with C.
upvoted 1 times

  rosenr0 11 months, 1 week ago

Selected Answer: D

D.
I think we have to query data from the past 12 hours for the prediction, and that's the reason for exporting the data to Cloud Storage.
Also, the predictions don't have to be real time.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  JamesDoe 1 year, 1 month ago

Selected Answer: B

B.
Online prediction, and need decoupling with Pub/Sub to make it asynchronous. Option A is synchronous.


upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


Option C may not be the best choice for this use case because it involves using a batch prediction job in Vertex AI to perform scoring on
preprocessed data. Batch prediction jobs are more suitable for scenarios where data is processed in batches, and results can be generated over a
longer period, such as daily or weekly.

In this use case, the requirement is to predict whether a machine part will fail in the next N minutes, given the average of each sensor's data from
the past 12 hours. Therefore, real-time processing and prediction are necessary. Batch prediction jobs are not designed for real-time processing,
and there may be a delay in receiving the predictions.

Option B, on the other hand, is designed for real-time processing and prediction. The Pub/Sub and Dataflow components allow for real-time
processing of incoming sensor data, and the trained ML model can be invoked for prediction in real-time. This makes it ideal for mission-critical
applications where timely predictions are essential.
upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


Its B, This architecture leverages the strengths of Pub/Sub, Dataflow, and Vertex AI. The system collects data from multiple sensors, which sends
events to Pub/Sub. Pub/Sub can handle the high volume of incoming data and can buffer messages to prevent data loss. A Dataflow stream
processing pipeline can consume the events in real-time and perform feature engineering and data preprocessing before invoking the trained ML
model for prediction. The predictions are then sent to another Pub/Sub topic, where they can be consumed by a downstream system for
monitoring.

This architecture is highly scalable, resilient, and efficient, as it can handle large volumes of data and perform real-time processing and prediction.
also separates concerns by using a separate pipeline for data processing and another for prediction, making it easier to maintain and modify the
system.
upvoted 1 times
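The feature-engineering step that the Dataflow stage in option B would perform — keeping each sensor's readings for the last 12 hours and emitting the running average — can be sketched in plain Python. This is only an in-process stand-in for a streaming pipeline; the sensor ID, timestamps, and readings below are invented for the illustration:

```python
# Minimal sketch of the windowing step before model invocation:
# per-sensor rolling average over a fixed time window.
from collections import defaultdict, deque

WINDOW_SECONDS = 12 * 60 * 60  # 12 hours

class RollingAverage:
    def __init__(self, window_seconds: int = WINDOW_SECONDS):
        self.window = window_seconds
        self.readings = defaultdict(deque)  # sensor_id -> deque of (ts, value)

    def add(self, sensor_id: str, ts: float, value: float) -> float:
        """Ingest one event and return the current windowed average."""
        q = self.readings[sensor_id]
        q.append((ts, value))
        # Drop readings older than the window, relative to the newest event.
        while q and q[0][0] < ts - self.window:
            q.popleft()
        return sum(v for _, v in q) / len(q)

agg = RollingAverage()
agg.add("vibration", ts=0, value=1.0)
avg = agg.add("vibration", ts=3600, value=3.0)            # both in window -> 2.0
late = agg.add("vibration", ts=WINDOW_SECONDS + 3600, value=5.0)  # first reading expires
print(avg, late)  # 2.0 4.0
```

In the actual architecture, Dataflow's windowing primitives would do this over the Pub/Sub stream, and the resulting averages would be sent to the deployed model for prediction.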

  enghabeth 1 year, 2 months ago


Selected Answer: B

If you have sensors in your architecture, you need Pub/Sub.


upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


Selected Answer: B

B is most likely. If you search for "asynchronous" on this page, you will find that


the question wants to focus on online prediction with asynchronous mode.
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction
The question matches what is explained in that section, quoted below:
Predictive maintenance: asynchronously predicting whether a particular machine part will fail in the next N minutes, given the averages of the
sensor's data in the past 30 minutes.

After that, you can take a closer look at figure 3 and read what it tries to describe.

C and D are the offline solution, just using different tools.
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 2 times

  John_Pongthorn 1 year, 3 months ago


Asynchronous prediction = Batch prediction
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


"Asynchronous prediction = batch prediction" is incorrect; I was careless in reading this article. Admin can delete my comment
above. I was mistaken.
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: B

B
"Predictive maintenance: asynchronously predicting whether a particular machine part will fail in the next N minutes, given the averages of the
sensor's data in the past 30 minutes."
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 3 times

  hiromi 1 year, 4 months ago


- https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction
upvoted 1 times

  mil_spyro 1 year, 4 months ago

Selected Answer: B

Answer is B.
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#handling_dynamic_real-time_features
upvoted 1 times

  ares81 1 year, 4 months ago


Selected Answer: C

C, for me.
upvoted 1 times

  seifou 1 year, 4 months ago


Selected Answer: C

ref : https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 1 times

  LearnSodas 1 year, 4 months ago


Selected Answer: B

Answer B
I thought about it a lot, since we don't need a real-time response in this scenario, but the other options have these problems:
A - HTTP requests for sensor data are not a good idea
C - What's the point of using Cloud SQL to store the results?
D - No BigQuery is mentioned, so why use the bq SDK to move data?
upvoted 2 times


Question #84 Topic 1

Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to

build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading. Which approach

should you use?

A. Create a collaborative filtering system that recommends articles to a user based on the user’s past behavior.

B. Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.

C. Build a logistic regression model for each user that predicts whether an article should be recommended to a user.

D. Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional

articles into their respective categories.

Correct Answer: A

Community vote distribution


B (89%) 11%

  gscharly 2 weeks, 1 day ago

Selected Answer: B

Went with B
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  TNT87 1 year ago

Selected Answer: B

https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings
Answer B
upvoted 3 times

  JamesDoe 1 year, 1 month ago

Selected Answer: B

Currently reading is the keyword here. Going to need B for that, A won't work since it would be based on e.g. all reading history and not the article
currently being read.
upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


Option A, creating a collaborative filtering system, may not be ideal for this use case because it relies on user behavior data, which may not be
available or sufficient for new users or for users who have not interacted with the system much.

Option C, building a logistic regression model for each user, may not be scalable because it requires building a separate model for each user, which
can become difficult to manage as the number of users increases.

Option D, manually labeling articles and training an SVM classifier, may not be as effective as the word2vec approach because it relies on manual
labeling, which can be time-consuming and may not capture the full semantic meaning of the articles. Additionally, SVMs may not be as effective
as neural network-based approaches like word2vec for capturing complex relationships between words and articles.
upvoted 2 times
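The retrieval step behind option B is just nearest-neighbor search over article embeddings. A minimal sketch, where tiny hand-made vectors stand in for real word2vec output and the article IDs are invented for the example:

```python
# Option B in miniature: embed articles as vectors, then return the most
# similar article by cosine similarity. Vectors here are toy placeholders
# for real word2vec embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

article_vectors = {
    "markets-rally": [0.9, 0.1, 0.0],
    "fed-rate-hike": [0.8, 0.2, 0.1],
    "local-football": [0.0, 0.1, 0.95],
}

def most_similar(query_id: str) -> str:
    """Return the article whose vector is closest to the query article's."""
    query = article_vectors[query_id]
    candidates = (a for a in article_vectors if a != query_id)
    return max(candidates, key=lambda a: cosine(article_vectors[a], query))

print(most_similar("markets-rally"))  # fed-rate-hike
```

At production scale this brute-force scan would be replaced by an approximate nearest-neighbor index (e.g. Vertex AI Vector Search), but the scoring idea is the same.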

  JJJJim 1 year, 4 months ago


Selected Answer: B

word2vec can easily find similar articles, but collaborative filtering doesn't fit this requirement as well.
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: B

B
https://towardsdatascience.com/recommending-news-articles-based-on-already-read-articles-627695221fe8
upvoted 3 times

  mil_spyro 1 year, 4 months ago


Selected Answer: B

Answer B
upvoted 1 times


  ares81 1 year, 4 months ago

Selected Answer: B

Collaborative filtering looks at the other users; knowledge-based looks at me. Answer B is the most knowledge-based among these.
upvoted 2 times

  YangG 1 year, 4 months ago

Selected Answer: A

"similar to they are currently reading". it should be a collaborative filtering problem


upvoted 2 times

  taxberg 1 year, 2 months ago


No, Collaborative filtering recommends articles other people read that are not necessarily similar to what the person is reading. These people
are chosen based on being similar to the person in question, not the article.
upvoted 3 times

  LearnSodas 1 year, 4 months ago

Selected Answer: B

Answer B
upvoted 2 times


Question #85 Topic 1

You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each

day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model

to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a

human. Which metric(s) should you use to monitor the model’s performance?

A. Number of messages flagged by the model per minute

B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans.

C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review

D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute

Correct Answer: B

Community vote distribution


D (51%) C (24%) B (24%)

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: D

D
- https://cloud.google.com/natural-language/automl/docs/beginners-guide
- https://cloud.google.com/vertex-ai/docs/text-data/classification/evaluate-model
upvoted 10 times

  andresvelasco Highly Voted  7 months, 3 weeks ago

Selected Answer: C

A. Number of messages flagged by the model per minute => NO, no measure of model performance
B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans.=> DONT THINK SO, because we need the
total number of messages (flagged?)
C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review. => I think YES,
because as I understand it that would be based on a sample of ALL messages not just the ones that have been flagged.
D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute => I think NO,
because the sample includes only flagged messages, meaning positives, so you cannot really measure recall.
upvoted 7 times

  tavva_prudhvi 5 months, 3 weeks ago


The main issue with option C is that it uses a random sample of only 0.1% of raw messages. This random sample might not contain enough
examples of inappropriate content to accurately assess the model's performance. Since the majority of messages on the platform are likely
appropriate, the random sample may not capture enough inappropriate content for a robust evaluation.
upvoted 2 times

  ludovikush Most Recent  1 month ago

Selected Answer: D

Precision and recall are critical metrics for evaluating the performance of classification models, especially in contexts where both the accuracy of
positive predictions (precision) and the ability to identify all positive instances (recall) are important. In this case:
Precision (the proportion of messages flagged by the model as inappropriate that were actually inappropriate) helps ensure that the model
minimizes the burden on human moderators by not flagging too many false positives, which could overwhelm them.
Recall (the proportion of actual inappropriate messages that were correctly flagged by the model) ensures that the model is effective at catching as
many inappropriate messages as possible, reducing the risk of harmful content being missed.
upvoted 2 times

  etienne0 1 month, 2 weeks ago


Selected Answer: C

I go with C
upvoted 1 times

  pmle_nintendo 2 months ago


Selected Answer: D

Let's consider the hypothetical scenario below:

Total number of comments per minute: 10,000


Comments actually inappropriate: 500
If we use a random sample of only 0.1% of raw messages (10 comments) for evaluation, there's a high chance that this small sample may not
include any or only a few inappropriate comments. As a result, the precision and recall estimates based on this sample may be skewed, leading to
unreliable assessments of the model's performance. Thus, C is ruled out.
upvoted 1 times


  Werner123 2 months ago

Selected Answer: D

C does not make sense to me since it is a very small random sample. It is also only messages that have been sent to humans for review meaning
that there is bias in that result set.
upvoted 1 times

  b1a8fae 3 months, 4 weeks ago


With D, only caring about observations flagged by the model means we don't control for false negatives (approved messages that are actually inappropriate). B
seems like a better option to me: the wording confuses me a bit, but I understand it as the true and false positives (human-flagged comments and
their modelled labels).
upvoted 1 times

  Mickey321 5 months, 2 weeks ago


Selected Answer: D

In favor of D
upvoted 1 times

  pico 5 months, 2 weeks ago


Selected Answer: C

Given the context of content moderation, a balanced approach is often preferred. Therefore, option C, precision and recall estimates based on a
random sample of raw messages, is a good choice. It provides a holistic view of the model's performance, taking into account both false positives
(precision) and false negatives (recall), and it reflects how well the model is handling the entire dataset.
upvoted 1 times

  Krish6488 5 months, 2 weeks ago


Selected Answer: D

A --> Conveys the model's activity level but not its accuracy


B --> Accuracy to some extent, but won't give the full picture as it does not account for false negatives
C --> Using a random sample of the raw messages allows you to estimate precision and recall for the overall activity, not just the flagged content.
D --> Specifically measures on the subset of data that it flagged

Both C & D work well in this case, but the specificity is higher in option D, and hence I will go with D.
upvoted 1 times
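The monitoring math discussed in these comments is straightforward to compute once human labels arrive for a sample of comments. A minimal sketch, with fabricated flags and labels purely for the example:

```python
# Option D's monitoring step in miniature: compare model flags against human
# labels on a sampled set of comments and compute precision and recall.
def precision_recall(model_flags, human_labels):
    """Return (precision, recall) from parallel 0/1 flag and label lists."""
    tp = sum(1 for m, h in zip(model_flags, human_labels) if m and h)
    fp = sum(1 for m, h in zip(model_flags, human_labels) if m and not h)
    fn = sum(1 for m, h in zip(model_flags, human_labels) if not m and h)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 1 = inappropriate. The model flagged 4 comments; humans agree with 3 of
# those flags and also caught 1 comment the model missed.
model_flags  = [1, 1, 1, 1, 0, 0]
human_labels = [1, 1, 1, 0, 1, 0]

p, r = precision_recall(model_flags, human_labels)
print(p, r)  # 0.75 0.75
```

Note the caveat raised above: if the evaluated sample contains only model-flagged messages, the false-negative count (and hence recall) cannot be estimated reliably, which is the core of the C-vs-D debate.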

  MultipleWorkerMirroredStrategy 6 months ago


Selected Answer: C

Google Cloud used to have a service called "continuous evaluation", where human labelers classify data to establish a ground truth. Thinking along
those lines, the answer is C as it's the logical equivalent of that service.

https://cloud.google.com/ai-platform/prediction/docs/continuous-evaluation
upvoted 1 times

  PST21 10 months ago


The question is about measuring model performance, so it has to be precision & recall; hence D.
upvoted 2 times

  Voyager2 10 months, 3 weeks ago

Selected Answer: D

D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
You will need precision and recall to identify false positives and false negatives. A very small random sample doesn't help, especially because
you will probably have skewed data. So D.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  lucaluca1982 1 year ago

Selected Answer: D

we need to monitor the model, so D


upvoted 1 times

  Sas02 1 year ago


Why not C, as we need to look at both precision & recall, while B and D miss that and capture only true/false positives? It would be very helpful if
someone could explain.
upvoted 2 times

  Vikraju 1 year ago


D
I go for option D because B is just another way of looking at only precision and completely ignoring recall
upvoted 1 times


Question #86 Topic 1

You are a lead ML engineer at a retail company. You want to track and manage ML metadata in a centralized way so that your team can have

reproducible experiments by generating artifacts. Which management solution should you recommend to your team?

A. Store your tf.logging data in BigQuery.

B. Manage all relational entities in the Hive Metastore.

C. Store all ML metadata in Google Cloud’s operations suite.

D. Manage your ML workflows with Vertex ML Metadata.

Correct Answer: C

Community vote distribution


D (100%)

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: D

D
- https://cloud.google.com/vertex-ai/docs/ml-metadata/tracking
upvoted 5 times

  SubbuJV Most Recent  2 months, 2 weeks ago

Selected Answer: D

Selected Answer: D
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: D

Went with D
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: D

totally D
upvoted 2 times

  ares81 1 year, 4 months ago

Selected Answer: D

This should be an easy D.


upvoted 2 times

  LearnSodas 1 year, 4 months ago

Selected Answer: D

https://codelabs.developers.google.com/vertex-mlmd-pipelines?hl=id&authuser=6#0
upvoted 3 times


Question #87 Topic 1

You have been given a dataset with sales predictions based on your company’s marketing activities. The data is structured and stored in BigQuery,

and has been carefully managed by a team of data analysts. You need to prepare a report providing insights into the predictive capabilities of the

data. You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks.

You only have a few hours to gather the results of your experiments. Which Google Cloud tools should you use to complete this task in the most

efficient and self-serviced way?

A. Use BigQuery ML to run several regression models, and analyze their performance.

B. Read the data from BigQuery using Dataproc, and run several models using SparkML.

C. Use Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics.

D. Train a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms.

Correct Answer: A

Community vote distribution


A (67%) C (33%)

  Werner123 2 months ago

Selected Answer: A

You only have a few hours. The dataset is in BQ. The dataset is carefully managed. BQML it is.
upvoted 1 times

  ludovikush 2 months ago

Selected Answer: C

I agree with pico answer


upvoted 1 times

  iieva 3 months, 1 week ago

Selected Answer: A

All deep neural networks are multilayered neural networks, but not all multilayered neural networks are necessarily deep. The term "deep" is used
to emphasize the depth of the network in the context of having many hidden layers, which has been shown to be effective for learning hierarchical
representations of complex patterns in data.

Since BQ allows creation of DNNs (https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-dnn-models), it
should be A.
upvoted 2 times

  pico 7 months, 2 weeks ago


Selected Answer: C

Vertex AI Workbench provides user-managed notebooks that allow you to run Python code using libraries like scikit-learn, TensorFlow, and more.
You can easily connect to your BigQuery dataset from within the notebook, extract the data, and perform data preprocessing.
You can then experiment with different ML algorithms available in scikit-learn and track performance metrics.
It provides flexibility, control, and the ability to run various models quickly.
upvoted 3 times

  pico 7 months, 2 weeks ago


Not A.
BigQuery ML is convenient for quick model training and predictions within BigQuery itself, but it has limitations in terms of the variety of ML
algorithms and customization options it offers.
It may not be the best choice for running more sophisticated ML models or extensive experiments.

and it only said regression models


upvoted 2 times

  MTTTT 8 months, 3 weeks ago

Selected Answer: C

I think multilayered neural networks need to be trained externally from BQ ML as stated here:
https://cloud.google.com/bigquery/docs/bqml-introduction
upvoted 1 times

  MTTTT 8 months, 3 weeks ago


nvm you can import DNN in BQ
upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago


Selected Answer: A

According to the question, you don't have enough time. B, C, D need much more time to set up the service, or write the code. Also the data is
already in BigQuery. BQML should be the fastest way. Besides, BQML supports xgboost, NN models as well.
upvoted 2 times

  Jarek7 10 months, 1 week ago

Selected Answer: C

The question says that "You were asked to run several ML models with different levels of sophistication, including simple models and multilayered
neural networks". BQ ML doesn't allow this. BQ ML provides only simple regression/categorization models. It is not about training these
"sophisticated models" but only running them, so you can easily do it within a few hours with notebooks.
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: A

Went with A
upvoted 2 times

  lucaluca1982 1 year ago

Selected Answer: C

C allows to execute more complex tests


upvoted 1 times

  tavva_prudhvi 9 months, 1 week ago


However, given the limited time constraint of a few hours and the fact that the data is already stored in BigQuery, option A is more efficient.

BigQuery ML allows you to quickly create and evaluate ML models directly within BigQuery, without the need to move the data or set up a
separate environment. This makes it faster and more convenient for running several regression models and analyzing their performance within
the given time frame.
upvoted 1 times

  FherRO 1 year, 2 months ago

Selected Answer: A

B, C, and D require coding. You only have a few hours; A is the fastest.
upvoted 2 times

  hiromi 1 year, 4 months ago

Selected Answer: A

I vote for A
upvoted 3 times

  ares81 1 year, 4 months ago

Selected Answer: A

It's A.
upvoted 2 times

  LearnSodas 1 year, 4 months ago

Selected Answer: A

I will go with A, since it's the fastest way to do it. Custom training in Vertex AI requires time, and so does writing scikit-learn models in notebooks.
upvoted 2 times


Question #88 Topic 1

You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make

loan payments on time. The output is used to approve or reject loan requests. One customer’s loan request has been rejected by your model, and

the bank’s risks department is asking you to provide the reasons that contributed to the model’s decision. What should you do?

A. Use local feature importance from the predictions.

B. Use the correlation with target values in the data summary page.

C. Use the feature importance percentages in the model evaluation page.

D. Vary features independently to identify the threshold per feature that changes the classification.

Correct Answer: C

Community vote distribution


A (92%) 8%

  shankalman717 Highly Voted  1 year, 2 months ago

Selected Answer: A

To access local feature importance in AutoML Tables, you can use the "Explain" feature, which shows the contribution of each feature to the
prediction for a specific example. This will help you identify the most important features that contributed to the loan request being rejected.

Option B, using the correlation with target values in the data summary page, may not provide the most accurate explanation as it looks at the
overall correlation between the features and target variable, rather than the contribution of each feature to a specific prediction.

Option C, using the feature importance percentages in the model evaluation page, may not provide a sufficient explanation for the specific
prediction, as it shows the importance of each feature across all predictions, rather than for a specific prediction.

Option D, varying features independently to identify the threshold per feature that changes the classification, is not recommended as it can be
time-consuming and does not provide a clear explanation for why the loan request was rejected
upvoted 9 times

  M25 Most Recent  11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 1 times

  JamesDoe 1 year, 1 month ago

Selected Answer: A

Local, not global since they asked about one specific prediction.
Check out that section on this blog: https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data/
Cool stuff!
upvoted 4 times

  tavva_prudhvi 1 year, 1 month ago


Local feature importance can provide insight into the specific features that contributed to the model's decision for a particular instance. This
information can be used to explain the model's decision to the bank's risks department and potentially identify any issues or biases in the model.
Option B is not applicable as the loan request has already been rejected by the model, so there are no target values to correlate with. Option C may
provide some insights, but local feature importance will provide more specific information for this particular instance. Option D involves changing
the features, which may not be feasible or ethical in this case.
upvoted 2 times

  Yajnas_arpohc 1 year, 1 month ago


C seems more apt & exhaustive to explain for bank's purpose; it uses various Feature Attribution methods.
A explains how much each feature added to or subtracted from the result as compared with the baseline prediction score; indicative, but less
optimal for the purpose at hand
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: A

I think it's easier to explain with feature importance


upvoted 2 times

  ares81 1 year, 3 months ago

Selected Answer: C

AutoML Tables tells you how much each feature impacts this model. It is shown in the Feature importance graph. The values are provided as a
percentage for each feature: the higher the percentage, the more strongly that feature impacted model training. C.
upvoted 1 times


  hiromi 1 year, 4 months ago

Selected Answer: A

A
https://cloud.google.com/automl-tables/docs/explain#local
upvoted 2 times

  mil_spyro 1 year, 4 months ago


Selected Answer: A

Agree with A.
"Local feature importance gives you visibility into how the individual features in a specific prediction request affected the resulting prediction.
Each local feature importance value shows only how much the feature affected the prediction for that row. To understand the overall behavior of
the model, use model feature importance."
https://cloud.google.com/automl-tables/docs/explain#local
upvoted 4 times

  ares81 1 year, 4 months ago

Selected Answer: C

"Feature importance: AutoML Tables tells you how much each feature impacts this model. It is shown in the Feature importance graph. The values
are provided as a percentage for each feature: the higher the percentage, the more strongly that feature impacted model training." The correct
answer is C.
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


Can you tell the feature importance for a specific prediction?
upvoted 2 times

  YangG 1 year, 4 months ago


Selected Answer: A

Should be A. It is specific to this example, so use local feature importance.


upvoted 2 times

  ares81 1 year, 4 months ago


It seems C, to me.
upvoted 1 times


Question #89 Topic 1

You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year.

Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to AI Platform. You need to determine

which customer attribute has the most predictive power for each prediction served by the model. What should you do?

A. Use AI Platform notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong

signal.

B. Stream prediction results to BigQuery. Use BigQuery’s CORR(X1, X2) function to calculate the Pearson correlation coefficient between each

feature and the target variable.

C. Use the AI Explanations feature on AI Platform. Submit each prediction request with the ‘explain’ keyword to retrieve feature attributions

using the sampled Shapley method.

D. Use the What-If tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature

importance in order of those that caused the most significant performance drop when removed from the model.

Correct Answer: D

Community vote distribution


C (93%) 7%

  SubbuJV 2 months, 2 weeks ago

Selected Answer: C

Vertex AI Explanations; went with C


upvoted 3 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 2 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: C

https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview
AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI
Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result.
upvoted 1 times

  Yajnas_arpohc 1 year, 1 month ago


Key words in the question: "for each prediction served" - that makes it C.
D is more of a broader analysis activity.
upvoted 2 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: C

You have to use a flagship native service as much as possible.


upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: D

I vote for D
- https://www.tensorflow.org/tensorboard/what_if_tool
- https://pair-code.github.io/what-if-tool/
- https://medium.com/red-buffer/tensorflows-what-if-tool-c52914ea215c
C is wrong because AI Explanations doesn't work for TensorFlow models (https://cloud.google.com/vertex-ai/docs/explainable-ai/overview)
upvoted 1 times

  mil_spyro 1 year, 4 months ago


This is from the doc you provided:
"Feature attribution is supported for all types of models (both AutoML and custom-trained), frameworks (TensorFlow, scikit, XGBoost), and
modalities (images, text, tabular, video)."
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#supported_model_types_2
upvoted 2 times

  hiromi 1 year, 4 months ago


Sorry, I mean Shapley method doesn't support TensorFlow Models
See https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#compare-methods

upvoted 1 times

  hiromi 1 year, 4 months ago


Sorry, I think C is the answer. Thanks.
upvoted 2 times

  hiromi 1 year, 4 months ago


Sorry, I think C is the answer
upvoted 2 times

  mil_spyro 1 year, 4 months ago

Selected Answer: C

AI Explanations provides feature attributions using the sampled Shapley method, which can help you understand how much each feature
contributes to a model's prediction.
upvoted 3 times
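For intuition, the sampled Shapley method behind AI Explanations averages each feature's marginal contribution over random feature orderings. A toy sketch of the idea (not the actual AI Explanations implementation; the model and inputs here are made up):

```python
import random

def sampled_shapley(f, x, baseline, n_samples=200, seed=0):
    """Approximate Shapley attributions for f(x) relative to f(baseline)."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = rng.sample(range(n), n)      # random feature ordering
        z = list(baseline)
        prev = f(z)
        for i in order:
            z[i] = x[i]                      # flip feature i to its actual value
            cur = f(z)
            phi[i] += cur - prev             # marginal contribution of feature i
            prev = cur
    return [v / n_samples for v in phi]

# For a linear model the attributions are exact: w_i * (x_i - baseline_i).
model = lambda z: 2 * z[0] + 3 * z[1] + 1
attributions = sampled_shapley(model, x=[1.0, 1.0], baseline=[0.0, 0.0])
```

By construction the attributions sum to f(x) - f(baseline), which is the "how much each feature contributed to the prediction" property the comments above describe.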

  ares81 1 year, 4 months ago

Selected Answer: C

AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI
Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result." It's C!
upvoted 2 times

  JeanEl 1 year, 4 months ago


Selected Answer: C

Agree with C
upvoted 2 times


Question #90 Topic 1

You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s

logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed. Which metrics would give you the most confidence in

your model?

A. F-score where recall is weighed more than precision

B. RMSE

C. F1 score

D. F-score where precision is weighed more than recall

Correct Answer: A

Community vote distribution


A (61%) C (25%) 14%

  tavva_prudhvi Highly Voted  1 year, 1 month ago

Selected Answer: A

In this scenario, the dataset is highly imbalanced, where most of the examples do not have the company's logo. Therefore, accuracy could be
misleading as the model can have high accuracy by simply predicting that all images do not have the logo. F1 score is a good metric to consider in
such cases, as it takes both precision and recall into account. However, since the dataset is highly skewed, we should weigh recall more than
precision to ensure that the model is correctly identifying the images that do have the logo. Therefore, F-score where recall is weighed more than
precision is the best metric to evaluate the performance of the model in this scenario. Option B (RMSE) is not applicable to this classification
problem, and option D (F-score where precision is weighed more than recall) is not suitable for highly skewed datasets.
upvoted 8 times
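The "F-score where recall is weighed more than precision" in option A is the F-beta score with beta > 1 (F2 is a common choice); beta < 1 weighs precision more, and beta = 1 gives F1. A quick illustration with hypothetical precision/recall values:

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 favors recall, beta < 1 favors precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical model on the skewed logo dataset: precise, but misses many logos.
p, r = 0.9, 0.3
f1 = f_beta(p, r)            # balanced score
f2 = f_beta(p, r, beta=2.0)  # recall-weighted: lower, penalizes missed logos
```

With these numbers F1 is 0.45 while F2 drops to about 0.35, showing how the recall-weighted score punishes a model that misses the rare positive class.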

  gscharly Most Recent  1 week, 2 days ago

Selected Answer: C

I'd go with C. We don't know which option (less FP or less FN) is most important for business with the provided information, so we should seek a
balance.
upvoted 2 times

  etienne0 1 month, 4 weeks ago


Selected Answer: D

I think it's D.
upvoted 1 times

  guilhermebutzke 3 months, 1 week ago

Selected Answer: D

I think it could be D, but the question does not provide enough information for this.

I have this feeling: if 4% have the logo, we are looking just for these ones, right? So the "quality of TPs", that is, the precision, could be more
interesting because we want a model that we can rely on. So, when this model predicts an image has the logo, we'll be more certain about it.

If we use recall, for example, a model with 99% recall has a better chance of catching the logo, but we won't have quality in this. This model could
suggest a lot of images without the logo. It is better to use any other ML model than this...
upvoted 2 times

  pico 5 months, 2 weeks ago

Selected Answer: C

both option A (F-score with higher weight on recall) and option C (F1 score) could be suitable depending on the specific priorities and
requirements of your classification problem. If missing a company's logo is considered more problematic than having false alarms, then option A
might be preferred. The F1 score (option C) is a balanced measure that considers both precision and recall, which is generally a good choice in
imbalanced datasets.

Ultimately, the choice between option A and option C depends on the specific goals and constraints of your application.
upvoted 1 times

  Mickey321 5 months, 2 weeks ago


Selected Answer: C

The question does not have a clear preference for recall or precision, hence going with C.
upvoted 2 times

  Jarek7 10 months ago


Selected Answer: C

Yeah, I know - everyone is voting A... To be honest I still don't understand why you are more afraid of these few FNs than FPs. In my opinion they
are exactly the same evil. Every documentation says that F1 is great on skewed data. You should use weighted F1 when you know what is worse for you,
FNs or FPs. In this case we have no hints on it, so I would stay with ordinary F1.
upvoted 4 times

  Voyager2 10 months, 3 weeks ago


Selected Answer: A

A. F-score where recall is weighed more than precision


Even a model that always says the image doesn't have the logo will have good precision, because that is the most common case. What we need is to improve recall.
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 2 times

  guilhermebutzke 1 year, 2 months ago

Selected Answer: A

I think it is A. The positive class is the minority. So, it's more important to correctly detect logos in all images that have the logo (recall) than to
correctly detect logos only in images classified as having logos (precision).
upvoted 3 times

  enghabeth 1 year, 2 months ago


Selected Answer: A

I think it is D because you try to detect TPs; then recall is more important than precision.
upvoted 3 times

  ares81 1 year, 3 months ago

Selected Answer: A

Answer A is my choice.
upvoted 1 times

  Abhijat 1 year, 4 months ago


A is correct
upvoted 1 times

  Dataspire 1 year, 4 months ago

Selected Answer: A

Fewer logo images; recall should be weighted more.


upvoted 3 times

  kn29 1 year, 4 months ago


I think A.
If D were the answer, the threshold would be set higher to increase PRECISION, but the low percentage of positives (4%) would allow RECALL to be
extremely low. If the percentage of positives is low, greater weight should be given to RECALL.
https://medium.com/@douglaspsteen/beyond-the-f-1-score-a-look-at-the-f-beta-score-3743ac2ef6e3
upvoted 4 times

  egdiaa 1 year, 4 months ago


Answer C: F1-Score is the best for imbalanced Data like this case: https://stephenallwright.com/imbalanced-data-metric/
upvoted 4 times

  hiromi 1 year, 4 months ago

Selected Answer: D

D (not sure)
upvoted 2 times


Question #91 Topic 1

You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability

for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product

sales volumes, expenses, and profits for all regions. What should you use as the input and output for your model?

A. Use latitude, longitude, and product type as features. Use profit as model output.

B. Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.

C. Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use profit as model output.

D. Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model

outputs.

Correct Answer: C

Community vote distribution


C (80%) 10% 10%

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: C

C (not sure)
- https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
- https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
upvoted 6 times
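To make option C concrete: bucketize latitude and longitude, then combine the bucket indices into a single categorical feature. A minimal sketch of the idea (the bucket boundaries and coordinates below are invented):

```python
import bisect

def bucketize(value, boundaries):
    """Return the index of the bucket that value falls into."""
    return bisect.bisect_right(boundaries, value)

def lat_lon_cross(lat, lon, lat_bounds, lon_bounds):
    """Cross the binned coordinates into one sparse categorical feature."""
    return f"lat{bucketize(lat, lat_bounds)}_lon{bucketize(lon, lon_bounds)}"

lat_bounds = [-30.0, 0.0, 30.0]   # 4 latitude buckets
lon_bounds = [-90.0, 0.0, 90.0]   # 4 longitude buckets
feature = lat_lon_cross(40.7, -74.0, lat_bounds, lon_bounds)  # "lat3_lon1"
```

Each crossed value identifies one grid cell, so the model can learn a profit signal per region instead of treating latitude and longitude as independent continuous inputs.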

  sonicclasps Most Recent  3 months ago

Selected Answer: D

The question asks to predict profitability, not profit.


Profitability is calculated from revenue and expenses.
The correct answer is D.
upvoted 2 times

  andresvelasco 7 months, 1 week ago


Most people have chosen C but:
Does it make sense to do binning after the feature cross? Isn't it the other way around?
upvoted 2 times

  maukaba 6 months, 1 week ago


I agree it is the other way around. See example:
https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
One feature cross: [binned latitude X binned longitude X binned roomsPerPerson]
upvoted 1 times

  maukaba 6 months, 1 week ago


In the following examples it is said that it is not possible to cross lat & lon without bucketizing them first, since continuous values must be
converted into discrete ones before crossing:
https://www.kaggle.com/code/vikramtiwari/feature-crosses-tensorflow-mlcc
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


Selected Answer: C

Option C is the best option because it takes into account both the product type and location, which can affect profitability. Binning the feature
cross of latitude and longitude can help capture the nonlinear relationship between location and profitability, and using profit as the model output
is appropriate because it's the target variable we want to predict.
upvoted 3 times

  abneural 1 year, 2 months ago


Selected Answer: C

Agreeing with hiromi, taxberg


Feature cross and bucket lat and lon on geographical problems
upvoted 1 times

  enghabeth 1 year, 2 months ago


Selected Answer: C

your output is profit


upvoted 1 times

  taxberg 1 year, 2 months ago


Selected Answer: C

Must be C. Always feature cross lat and lon on geographical problems. Also, D can not be right as we do not have revenue in the dataset.
upvoted 2 times

  mil_spyro 1 year, 4 months ago


Selected Answer: A

In this case, there is no need to reduce the number of unique values in the latitude and longitude variables, and binning would reduce information
from those features; hence A.
upvoted 2 times

  hiromi 1 year, 4 months ago


Why no need to reduce?
upvoted 1 times

  mil_spyro 1 year, 4 months ago


binning and crossing*
upvoted 1 times

  ares81 1 year, 4 months ago


Selected Answer: C

Easy C.
upvoted 2 times


Question #92 Topic 1

You work as an ML engineer at a social media company, and you are developing a visual filter for users’ profile photos. This requires you to train

an ML model to detect bounding boxes around human faces. You want to use this filter in your company’s iOS-based mobile phone application.

You want to minimize code development and want the model to be optimized for inference on mobile phones. What should you do?

A. Train a model using AutoML Vision and use the “export for Core ML” option.

B. Train a model using AutoML Vision and use the “export for Coral” option.

C. Train a model using AutoML Vision and use the “export for TensorFlow.js” option.

D. Train a custom TensorFlow model and convert it to TensorFlow Lite (TFLite).

Correct Answer: A

Community vote distribution


A (73%) D (18%) 9%

  pshemol Highly Voted  1 year, 4 months ago

Selected Answer: A

https://cloud.google.com/vision/automl/docs/export-edge
Core ML -> iOS and macOS
Coral -> Edge TPU-based device
TensorFlow.js -> web
upvoted 12 times

  maukaba 6 months, 1 week ago


Updated Vertex AI link: https://cloud.google.com/vertex-ai/docs/export/export-edge-model

Trained AutoML Edge image classification models can be exported in the following formats:
TF Lite - to run your model on edge or mobile devices.
Edge TPU TF Lite - to run your model on Edge TPU devices.
Container - to run on a Docker container.
Core ML - to run your model on iOS and macOS devices.
Tensorflow.js - to run your model in the browser and in Node.js.
upvoted 2 times

  M25 Most Recent  11 months, 3 weeks ago

Selected Answer: A

Went with A
upvoted 1 times

  TNT87 1 year ago


https://developer.apple.com/documentation/coreml
Answer A
upvoted 1 times

  TNT87 1 year ago


https://cloud.google.com/vertex-ai/docs/export/export-edge-model#export
upvoted 1 times

  shankalman717 1 year, 2 months ago

Selected Answer: B

AutoML Vision is a service provided by Google Cloud that enables developers to train and deploy machine learning models for image recognition
tasks, such as detecting bounding boxes around human faces. The “export for Coral” option generates a TFLite model that is optimized for running
on Coral, a hardware platform specifically designed for edge computing, including mobile devices. The TFLite model is also compatible with iOS-
based mobile phone applications, making it easy to integrate into the company's app.
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


While Coral can be used to optimize machine learning models for inference on edge devices, it's not the best option for an iOS-based mobile
phone application.
upvoted 1 times

  shankalman717 1 year, 2 months ago

Selected Answer: B

Option A, using AutoML Vision and exporting for Core ML, is also a viable option. Core ML is Apple's machine learning framework that is optimized
for iOS-based devices. However, using this option would require more development effort to integrate the Core ML model into the app.

Option C, using AutoML Vision and exporting for TensorFlow.js, is not the best option for this scenario since it is optimized for running on web
browsers, not mobile devices.

Option D, training a custom TensorFlow model and converting it to TFLite, would require significant development effort and time compared to
using AutoML Vision. AutoML Vision provides a simple and efficient way to train and deploy machine learning models without requiring expertise
in machine learning.
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


Excellent reasoning for C,D but Core ML is Apple's machine learning framework that is optimized for iOS-based devices, and exporting the
model to Core ML format can help minimize inference time on mobile devices.
upvoted 1 times

  enghabeth 1 year, 2 months ago


Selected Answer: D

https://www.tensorflow.org/lite
https://medium.com/the-ai-team/step-into-on-device-inference-with-tensorflow-lite-a47242ba9130
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


It's wrong. While TFLite is a mobile-optimized version of TensorFlow, it requires more code development than using AutoML Vision and
exporting for Core ML. Therefore, it's not the best option for minimizing code development time.
upvoted 1 times

  ares81 1 year, 3 months ago


Selected Answer: A

I correct myself: it's A!


upvoted 1 times

  egdiaa 1 year, 4 months ago


A indeed as described here: https://cloud.google.com/vision/automl/docs/export-edge
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: A

A
"You want to minimize code development" -> AutoML
- https://cloud.google.com/vision/automl/docs/tflite-coreml-ios-tutorial
- https://cloud.google.com/vertex-ai/docs/training-overview#image
upvoted 2 times

  mil_spyro 1 year, 4 months ago

Selected Answer: D

TensorFlow Lite is a lightweight version of TensorFlow that is optimized for mobile and embedded devices, making it an ideal choice for use in an
iOS-based mobile phone application.
upvoted 2 times

  ares81 1 year, 4 months ago

Selected Answer: D

I find no answer is 100% right, but D seems closer to the truth.


upvoted 1 times

  ares81 1 year, 3 months ago


It's A.
upvoted 1 times


Question #93 Topic 1

You have been asked to build a model using a dataset that is stored in a medium-sized (~10 GB) BigQuery table. You need to quickly determine

whether this data is suitable for model development. You want to create a one-time report that includes both informative visualizations of data

distributions and more sophisticated statistical analyses to share with other ML engineers on your team. You require maximum flexibility to create

your report. What should you do?

A. Use Vertex AI Workbench user-managed notebooks to generate the report.

B. Use the Google Data Studio to create the report.

C. Use the output from TensorFlow Data Validation on Dataflow to generate the report.

D. Use Dataprep to create the report.

Correct Answer: C

Community vote distribution


A (53%) B (31%) Other

  gscharly 2 weeks, 1 day ago

Selected Answer: A

More flexibility
upvoted 2 times

  SubbuJV 2 months, 2 weeks ago

Selected Answer: A

More flexibility
upvoted 1 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: A

Max flexibility
upvoted 1 times

  Krish6488 5 months, 2 weeks ago


Selected Answer: A

Looker Studio is good too, but it does not give the same depth of statistical analysis as using matplotlib, seaborn, etc. in a notebook. So a JupyterLab notebook, a.k.a. Vertex AI Workbench, for me.
upvoted 1 times

  MCorsetti 6 months, 1 week ago

Selected Answer: A

A, as it is a one-off report with maximum flexibility. No need for a dashboard unless it's being reused.
upvoted 1 times

  lalala_meow 7 months, 1 week ago

Selected Answer: A

A for more sophisticated statistical analyses and maximum flexibility


upvoted 1 times

  andresvelasco 7 months, 3 weeks ago


Selected Answer: A

A (AI workbench): "sophisticated"


upvoted 1 times

  [Removed] 9 months, 1 week ago


Selected Answer: A

The answer is A.

B is wrong because you need more sophisticated statistical analyses and maximum flexibility to create your report.
upvoted 1 times

  NickHapton 9 months, 3 weeks ago


1. one-time
2. flexibility
go for A
upvoted 1 times


  SamuelTsch 9 months, 3 weeks ago

Selected Answer: A

went with A, because of max. flexibility


upvoted 1 times

  PST21 10 months ago


Correct answer: A. While Google Data Studio (option B) is a powerful data visualization and reporting tool, it might not provide the same level of flexibility and sophistication for statistical analyses as a notebook environment.
upvoted 2 times

  CloudKida 11 months, 3 weeks ago


Selected Answer: C

TensorFlow Data Validation(TFDV) can compute descriptive statistics that provide a quick overview of the data in terms of the features that are
present and the shapes of their value distributions. Tools such as Facets Overview can provide a succinct visualization of these statistics for easy
browsing.
upvoted 2 times

  lucaluca1982 1 year ago


Selected Answer: A

A. Flexibility is the key.


upvoted 1 times

  frangm23 1 year ago

Selected Answer: B

I think it has to be B. One of the keys is that it says "quickly", and BQ makes it very easy to export the query into Looker Studio. The other is that there's maximum flexibility within the needs for this case (informative visualizations + statistical analysis), as we can develop and write custom formulas.
A feels like overkill: using a Deep Learning VM image only to describe data and perform some analysis.
C also feels like overkill: starting to develop a neural net for that.
D: although you may use Dataprep for this, it is less suited than A.
upvoted 2 times

  kucuk_kagan 1 year, 1 month ago

Selected Answer: A

I recommend option A because Vertex AI Workbench user-managed notebooks provide more flexibility and customization for analyzing and visualizing the data in the BigQuery table. Using Python libraries (pandas, matplotlib, seaborn, etc.), you can create visualizations of the data distributions and perform more sophisticated statistical analyses.
upvoted 1 times

  JamesDoe 1 year, 1 month ago

Selected Answer: A

I think it's A. A one-time report containing statistical measurements of real datasets to tell whether the data is suitable for model development. The target audience is also other ML engineers.
Getting a whole report of exactly this with TFDV/Facets is like two lines of code: https://www.tensorflow.org/tfx/data_validation/get_started

A similar data studio report for this would take lots of time and work, and there would be no benefit from reuseability since task was a one-time
job.
upvoted 2 times

  JamesDoe 1 year, 1 month ago


Depending on your definition of "You require maximum flexibility to create your report.", it could very well be B too.
upvoted 1 times

  hghdh5454 1 year, 1 month ago

Selected Answer: A

A. Use Vertex AI Workbench user-managed notebooks to generate the report.

By using Vertex AI Workbench user-managed notebooks, you can create a one-time report that includes both informative visualizations and
sophisticated statistical analyses. The notebooks provide maximum flexibility for data analysis, as they allow you to use a wide range of libraries
and tools to create visualizations, perform statistical tests, and share your findings with your team. You can easily connect to the BigQuery table
from the notebook and perform the necessary data exploration and analysis.
upvoted 1 times
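
The notebook-based EDA that option A enables amounts to a few lines of pandas once the BigQuery result is loaded into a DataFrame. A minimal sketch — the DataFrame below is made-up stand-in data with invented column names; in a real Workbench notebook it would come from a BigQuery query:

```python
import pandas as pd

# Stand-in for the result of a BigQuery query loaded into the notebook;
# column names and values are invented for illustration.
df = pd.DataFrame({
    "feature_a": [1.0, 2.5, 3.1, 2.2, 100.0, 2.8],
    "feature_b": [10, 12, 11, 13, 12, 11],
})

# Informative view of the data distributions: count, mean, std, quartiles.
summary = df.describe()

# A slightly more sophisticated statistic: per-column sample skewness.
skewness = df.skew()

# Flag values more than 2 sample standard deviations from the column mean.
outlier_mask = (df - df.mean()).abs() > 2 * df.std()
```

From here, matplotlib or seaborn plots of the same DataFrame round out the one-time report, which is the flexibility argument made in the comments above.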


Question #94 Topic 1

You work on an operations team at an international company that manages a large fleet of on-premises servers located in few data centers

around the world. Your team collects monitoring data from the servers, including CPU/memory consumption. When an incident occurs on a server,

your team is responsible for fixing it. Incident data has not been properly labeled yet. Your management team wants you to build a predictive

maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. What should you

do first?

A. Train a time-series model to predict the machines’ performance values. Configure an alert if a machine’s actual performance values

significantly differ from the predicted performance values.

B. Implement a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict

anomalies based on this labeled dataset.

C. Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Test this heuristic in a production

environment.

D. Hire a team of qualified analysts to review and label the machines’ historical performance data. Train a model based on this manually

labeled dataset.

Correct Answer: D

Community vote distribution


C (70%) B (20%) 10%

  mil_spyro Highly Voted  1 year, 4 months ago

Selected Answer: C

I would go for C; it is important to have a clear understanding of what constitutes a potential failure and how to detect it. A heuristic based on z-scores, for example, can be used to flag instances where the performance values of a machine significantly differ from its historical baseline.
upvoted 8 times
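
The z-score heuristic being discussed can be sketched in plain Python; the threshold and the sample readings below are illustrative, not from the question:

```python
from statistics import mean, stdev

def zscore_labels(values, threshold=3.0):
    """Label each reading: 1 if its z-score exceeds the threshold (anomaly), else 0."""
    mu, sigma = mean(values), stdev(values)
    return [1 if abs((v - mu) / sigma) > threshold else 0 for v in values]

# Hypothetical CPU-utilization readings from one machine; the spike stands out.
cpu_readings = [52.1, 49.8, 50.5, 51.2, 97.3, 50.0, 48.9, 50.7]
labels = zscore_labels(cpu_readings, threshold=2.0)  # -> [0, 0, 0, 0, 1, 0, 0, 0]
```

Labels produced this way give the team a baseline dataset to evaluate (option C) before investing in a trained anomaly model.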

  pico Most Recent  7 months, 2 weeks ago

Selected Answer: B

Not C: since when do you test something directly in production?

Option B involves labeling historical data using heuristics, which can be a practical and quick way to get started.
upvoted 1 times

  razmik 10 months, 1 week ago


Selected Answer: C

Vote for C
Reference: Rule #1: Don’t be afraid to launch a product without machine learning.
https://developers.google.com/machine-learning/guides/rules-of-ml#before_machine_learning
upvoted 1 times

  julliet 10 months, 2 weeks ago


Selected Answer: C

A simple solution goes first; a more sophisticated one comes after.


upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: C

Went with C
upvoted 2 times

  TNT87 1 year ago


Answer C
Same as Question number 139
upvoted 2 times

  studybrew 1 year, 1 month ago


What’s the difference between B and C?
upvoted 3 times

  julliet 11 months, 1 week ago


In B you are labeling with heuristics and still developing a model.
In C you follow the ML rules: adopt a simple solution first, and later decide if, how, and where you need a more sophisticated model.


upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago

Selected Answer: C

This is the best option for this scenario because it's quick and inexpensive, and it can provide a baseline for labeling the historical performance
data. Once we have labeled data, we can train a predictive maintenance model to detect potential failures and alert the service desk team.
upvoted 1 times

  osaka_monkey 1 year, 1 month ago


why not D ?
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


While this approach may result in accurate labeling of the historical performance data, it can be time-consuming and expensive.
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: C

https://www.geeksforgeeks.org/z-score-for-outlier-detection-python/
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: B

I vote for B
- https://developers.google.com/machine-learning/guides/rules-of-ml
upvoted 3 times

  hiromi 1 year, 4 months ago


Sorry, I think C is the answer
upvoted 4 times

  jamesking1103 1 year, 3 months ago


C.
We need to detect potential failures.
upvoted 1 times

  guilhermebutzke 1 year, 2 months ago


Why not B? The team wants to create a model to predict failures. So the z-score is used to label the failure scenarios, which can then be used to build a prediction model.
upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


While this approach may work in some cases, it's not guaranteed to work well in this scenario because we don't know the nature of
the anomalies that we want to detect. Therefore, it may be difficult to come up with a heuristic that can accurately label the historical
performance data.
upvoted 2 times

  evanfebrianto 11 months, 2 weeks ago


But testing the heuristic in a production environment without training a model could be risky and lead to false alarms or misses.
upvoted 1 times

  ares81 1 year, 4 months ago

Selected Answer: A

This is really tricky, but it could be A.


upvoted 2 times

  ares81 1 year, 3 months ago


Thinking about it, it should be C.
upvoted 1 times


Question #95 Topic 1

You are developing an ML model that uses sliced frames from video feed and creates bounding boxes around specific objects. You want to

automate the following steps in your training pipeline: ingestion and preprocessing of data in Cloud Storage, followed by training and

hyperparameter tuning of the object model using Vertex AI jobs, and finally deploying the model to an endpoint. You want to orchestrate the entire

pipeline with minimal cluster management. What approach should you use?

A. Use Kubeflow Pipelines on Google Kubernetes Engine.

B. Use Vertex AI Pipelines with TensorFlow Extended (TFX) SDK.

C. Use Vertex AI Pipelines with Kubeflow Pipelines SDK.

D. Use Cloud Composer for the orchestration.

Correct Answer: A

Community vote distribution


C (64%) B (36%)

  qaz09 Highly Voted  1 year, 2 months ago

From:
https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk
"1. If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, we recommend that you build your pipeline
using TFX.
To learn more about building a TFX pipeline, follow the TFX getting started tutorials.
To learn more about using Vertex AI Pipelines to run a TFX pipeline, follow the TFX on Google Cloud tutorials.
2. For other use cases, we recommend that you build your pipeline using the Kubeflow Pipelines SDK. By building a pipeline with the Kubeflow
Pipelines SDK, you can implement your workflow by building custom components or reusing prebuilt components, such as the Google Cloud
Pipeline Components. Google Cloud Pipeline Components make it easier to use Vertex AI services like AutoML in your pipeline."

So I guess since it is image processing, it should be Kubeflow - answer C (TFX is for structured or text data).
upvoted 11 times

  pinimichele01 Most Recent  1 week, 2 days ago

Selected Answer: C

If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, should use TFX. For other use cases, Kubeflow.
Link: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline
upvoted 1 times

  Ulule 1 month, 3 weeks ago


Selected Answer: B

Overall, using Vertex AI Pipelines with TensorFlow Extended (TFX) SDK provides a comprehensive and managed solution for handling video feed
data in an ML pipeline, while minimizing the need for manual infrastructure management and maximizing scalability and efficiency.
upvoted 2 times

  vale_76_na_xxx 4 months, 3 weeks ago


I vote for B. The question states that minimal cluster management is required, and I found this in the Google study guide: "Vertex AI Pipelines automatically provisions the underlying infrastructure and manages it for you."
upvoted 1 times

  Mickey321 5 months, 2 weeks ago


Selected Answer: B

minimal managment
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago

Selected Answer: C

Vertex AI Pipelines with Kubeflow Pipelines SDK provides a high-level interface for building end-to-end machine learning pipelines. This approach
allows for easy integration with Google Cloud services, including Cloud Storage for data ingestion and preprocessing, Vertex AI for training and
hyperparameter tuning, and deployment to an endpoint. The Kubeflow Pipelines SDK also allows for easy orchestration of the entire pipeline,
minimizing cluster management.
upvoted 1 times

  neochaotic 1 year, 1 month ago


Answer is C. If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, should use TFX. For other use cases,
Kubeflow. Link: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline
upvoted 1 times

  TNT87 1 year, 1 month ago

Selected Answer: C

Answer C...
https://cloud.google.com/architecture/ml-on-gcp-best-practices#use-vertex-pipelines
upvoted 1 times

  TNT87 1 year ago


https://cloud.google.com/architecture/ml-on-gcp-best-practices#use-kubeflow-pipelines-sdk-for-flexible-pipeline-construction
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago


Google want you to use core native service Pipeline, Don't overthink but , need to think it over.
The anwser is in
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build
https://cloud.google.com/vertex-ai/docs/pipelines
upvoted 2 times

  zeic 1 year, 3 months ago

Selected Answer: B

" You want to orchestrate the entire pipeline with minimal cluster management"
Because of that, it can't be answer C.
I vote for B, because there is no cluster management with Vertex AI.
upvoted 2 times

  TNT87 11 months, 3 weeks ago


nope, not correct
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: C

C
"If you are using other frameworks, we recommend using Kubeflow Pipeline, which is very flexible and allows you to use simple code to construct
pipelines. Kubeflow Pipeline also provides Google Cloud pipeline components such as Vertex AI AutoML."
(Journey to Become a Google Cloud Machine Learning Engineer: Build the mind and hand of a Google Certified ML professional)
upvoted 3 times

  mil_spyro 1 year, 4 months ago

Selected Answer: C

vote C
upvoted 1 times

  mil_spyro 1 year, 4 months ago


I vote C.
https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline
upvoted 1 times

  YangG 1 year, 4 months ago

Selected Answer: C

I will go with C, because for generic orchestration purposes Kubeflow is recommended, while TFX should go with large-scale tasks.
upvoted 1 times


Question #96 Topic 1

You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size.

You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32-cores, 128 GB of RAM, and 1 NVIDIA

P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance.

What should you do?

A. Increase the instance memory to 512 GB and increase the batch size.

B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.

C. Enable early stopping in your Vertex AI Training job.

D. Use the tf.distribute.Strategy API and run a distributed training job.

Correct Answer: C

Community vote distribution


B (43%) D (33%) C (22%)

  smarques Highly Voted  1 year, 3 months ago

Selected Answer: C

I would say C.

The question asks about time, so the option "early stopping" looks fine because it will no impact the existent accuracy (it will maybe improve it).

The tf.distribute.Strategy reading the TF docs says that it's used when you want to split training between GPUs, but the question says that we have
a single GPU.

Open to discuss. :)
upvoted 6 times

  djo06 9 months, 3 weeks ago


tf.distribute.OneDeviceStrategy uses parallel training on one GPU
upvoted 2 times

  andreabrunelli Most Recent  5 days, 3 hours ago

Selected Answer: B

I would say B:

A. Increasing memory doesn't necessarily speed up the process; it's not a batch-size problem
B. It seems an image -> TensorFlow situation, so transforming images into tensors means a TPU works better and maybe faster
C. It's not an overfitting problem
D. Same here; it's not a memory or input-size problem
upvoted 1 times

  pinimichele01 1 week, 1 day ago


https://www.tensorflow.org/guide/distributed_training#onedevicestrategy
upvoted 1 times

  pinimichele01 1 week, 1 day ago


https://www.tensorflow.org/guide/distributed_training#onedevicestrategy
-> D
upvoted 1 times

  Werner123 2 months ago

Selected Answer: D

In my eyes the only solution is distributed training. 3 000 000 x 2GB = 6 Petabytes worth of data. No single device will get you there.
upvoted 2 times

  ludovikush 2 months ago

Selected Answer: B

Agree with JamesDoes


upvoted 1 times

  Mickey321 5 months, 2 weeks ago

Selected Answer: B

B as it have only one GPU hence in D distributed not efficient


upvoted 3 times


  pico 5 months, 2 weeks ago


If the question didn't specify the framework used, and you want to choose an option that is more framework-agnostic, it's important to consider the available options.

Given the context and the need for a framework-agnostic approach, you might consider a combination of options A and D. Increasing instance
memory and batch size can still be beneficial, and if you're using a deep learning framework that supports distributed training (like TensorFlow or
PyTorch), implementing distributed training (Option D) can further accelerate the process.
upvoted 1 times

  Krish6488 5 months, 2 weeks ago

Selected Answer: B

I would go with B, as a v3-32 TPU offers much more computational power than a single P100 GPU, and this upgrade should provide a substantial decrease in training time.

Also, tf.distribute.Strategy is good for performing distributed training on multiple GPUs or TPUs, but the current setup has just one GPU, which makes it the second-best option, applicable only if the architecture uses multiple GPUs.

An increase in memory may allow a larger batch size but won't address the fundamental problem, which is the over-utilized GPU.

Early stopping is good for avoiding overfitting once a model already performs at its best. It's good for reducing overall training time, but it won't improve the training speed.
upvoted 4 times

  pico 7 months, 2 weeks ago

Selected Answer: B

Given the options and the goal of decreasing training time, options B (using TPUs) and D (distributed training) are the most effective ways to
achieve this goal

C. Enable early stopping in your Vertex AI Training job:

Early stopping is a technique that can help save training time by monitoring a validation metric and stopping the training process when the metric
stops improving. While it can help in terms of stopping unnecessary training runs, it may not provide as substantial a speedup as other options.
upvoted 2 times

  tavva_prudhvi 5 months, 3 weeks ago


TPUs (Tensor Processing Units) are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. They are often faster than GPUs for specific types of computations. However, not all models or training pipelines will benefit from TPUs, and they might require code modification to fully utilize the TPU capabilities.
upvoted 1 times
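
Framework questions aside, the early-stopping behavior debated in option C reduces to a patience counter over validation metrics. A minimal, framework-agnostic sketch (the loss values are invented):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training stops: the first epoch where the
    validation loss has not improved for `patience` consecutive epochs, or the
    last epoch if that never happens."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch
    return len(val_losses) - 1

# Validation loss stops improving after epoch 3, so training halts at epoch 6.
losses = [0.90, 0.72, 0.61, 0.60, 0.63, 0.62, 0.64, 0.61]
stop = early_stop_epoch(losses, patience=3)  # -> 6
```

This saves wasted epochs, but, as the comments point out, it does not make each remaining training step any faster.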

  andresvelasco 7 months, 3 weeks ago

Selected Answer: C

A. Increase the instance memory to 512 GB and increase the batch size.
> this will not necessarily decrease training time
B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job. Most Voted
> TPU can sacrifice performance
C. Enable early stopping in your Vertex AI Training job.
> YES, this decreases training time without sacrificing performance, if set properly
D. Use the tf.distribute.Strategy API and run a distributed training job.
> No idea .... But I believe the type of machine and architecture cannot be changed as per the wording of the question.
upvoted 1 times

  tavva_prudhvi 5 months, 3 weeks ago


Early stopping is a method that allows you to stop training once the model performance stops improving on a validation dataset. While it can
prevent overfitting and save time by stopping unnecessary training epochs, it does not inherently speed up the training process.
upvoted 1 times

  PST21 9 months ago


Option D, using the tf.distribute.Strategy API for distributed training, can be beneficial for improving training efficiency, but it would require
additional resources and complexity to set up compared to simply using a TPU.

Therefore, replacing the NVIDIA P100 GPU with a v3-32 TPU in the Vertex AI Training job would be the most effective way to decrease training time
while maintaining or even improving model performance
upvoted 2 times

  [Removed] 9 months, 1 week ago


Selected Answer: B

I don't understand why so many people are voting for D (tf.distribute.Strategy API). If we look at our training infrastructure, we can see the
bottleneck is obviously the GPU, which has 12GB or 16GB memory depending on the model
(https://www.leadtek.com/eng/products/ai_hpc(37)/tesla_p100(761)/detail). This means we can afford to have a batch size of only 6-8 images (2GB
each) even if we assume the GPU is utilized 100%. And remember the training size is 3M, which means each epoch will have 375-500K steps in the
best case.

With 32-cores and 128GB memory, we are able to afford higher batch sizes (e.g., 32), so moving to TPU will accelerate the training.

A is wrong because we can't afford a larger batch size with the current GPU. D is wrong because you don't have multiple GPUs and your current
GPU is saturated. C is a viable option, but it seems less optimal than B.


upvoted 4 times

  [Removed] 9 months, 1 week ago


I should note that the batch size should be lower than even 6-8 images because the model weights will also take the GPU memory.
upvoted 1 times

  djo06 9 months, 3 weeks ago

Selected Answer: D

tf.distribute.OneDeviceStrategy uses parallel training on one GPU


upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: C

went with C
upvoted 1 times

  Voyager2 10 months, 4 weeks ago


Selected Answer: D

D. Use the tf.distribute.Strategy API and run a distributed training job


Option B replaces the GPU with a TPU, which is not the best option for image processing. Early stopping will affect model performance.
upvoted 1 times

  julliet 10 months, 2 weeks ago


To run a distributed job you need to have more than 1 GPU. We have exactly one here.
upvoted 1 times

  julliet 11 months, 1 week ago

Selected Answer: A

went with A
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times


Question #97 Topic 1

You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power

consumption in the company’s manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of

records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your

model to scale smoothly and require minimal development work. What should you do?

A. Train a regression model using AutoML Tables.

B. Develop a custom TensorFlow regression model, and optimize it using Vertex AI Training.

C. Develop a custom scikit-learn regression model, and optimize it using Vertex AI Training.

D. Develop a regression model using BigQuery ML.

Correct Answer: A

Community vote distribution


D (68%) A (32%)

  niketd Highly Voted  1 year ago

Selected Answer: D

The key is to understand the amount of data that needs to be used for training - the sensor collects tens of millions of records every day and the
model needs to use all the data up to the current date.
There is a limitation for AutoML of 100M rows -> https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/prepare-data
upvoted 12 times
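
For context on option D, a BigQuery ML training run is a single SQL statement that can be re-executed on a daily schedule. The sketch below only assembles the statement as a string; the dataset, table, and column names are hypothetical, and in practice it would be submitted through the BigQuery client or a scheduled query:

```python
def build_bqml_training_sql(dataset="plant_data", table="sensor_readings"):
    """Assemble a BigQuery ML CREATE MODEL statement for a linear regression."""
    return f"""
CREATE OR REPLACE MODEL `{dataset}.power_consumption_model`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['power_kwh']) AS
SELECT temperature, vibration, rpm, power_kwh
FROM `{dataset}.{table}`
WHERE reading_date <= CURRENT_DATE()
"""

sql = build_bqml_training_sql()
```

Because the query simply re-reads the table, each scheduled run trains on all data collected up to the current date, with no cluster or pipeline code to maintain.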

  pinimichele01 Most Recent  1 week ago

Selected Answer: D

There is a limitation for AutoML of 100M rows -> https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/prepare-data


upvoted 1 times

  vale_76_na_xxx 4 months, 2 weeks ago


I go for A
upvoted 2 times

  pinimichele01 1 week ago


There is a limitation for AutoML of 100M rows -> https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/prepare-data
upvoted 1 times

  Mickey321 5 months, 2 weeks ago


Selected Answer: A

Either A or D. Since it's not stated where the sensor data is stored, go for A.
upvoted 2 times

  PST21 10 months ago


Ans D. BigQuery ML allows you to schedule daily training runs by incorporating the latest data collected up to the current date. By specifying the
appropriate SQL query, you can include all the relevant data in the training process, ensuring that your model is updated regularly.
upvoted 1 times

  maukaba 5 months, 3 weeks ago


It says "use all the data collected up to the current date", not just a selection of "relevant" (?!) data.
upvoted 1 times

  ggwp1999 11 months, 3 weeks ago

Selected Answer: A

I would go with A because it states that it requires minimal development work. Not sure tho, correct me if I’m wrong
upvoted 3 times

  M25 11 months, 3 weeks ago


Selected Answer: D

Went with D
upvoted 1 times

  JamesDoe 1 year, 1 month ago


Selected Answer: A

Old question, the quotas were removed when they moved AutoML into VertexAI.
https://cloud.google.com/vertex-ai/docs/quotas#model_quotas#tabular


upvoted 3 times

  Yajnas_arpohc 1 year, 1 month ago


Would go w A given the specifics mentioned in question.

BigQuery is an unnecessary distraction IMO (e.g. why would we assume BigQuery and not BigTable!)
upvoted 1 times

  TNT87 1 year, 1 month ago

Selected Answer: D

Answer D
https://cloud.google.com/blog/products/data-analytics/automl-tables-now-generally-available-bigquery-ml

This legacy version of AutoML Tables is deprecated and will no longer be available on Google Cloud after January 23, 2024. All the functionality of
legacy AutoML Tables and new features are available on the Vertex AI platform. See Migrate to Vertex AI to learn how to migrate your resources.
upvoted 2 times

  FherRO 1 year, 2 months ago


Selected Answer: A

You require minimal development work and the question doesn't mention if your data is stored in BQ
upvoted 1 times

  Ade_jr 1 year, 3 months ago

Selected Answer: D

Answer is D, AutoML has 200M rows as limits


upvoted 3 times

  ares81 1 year, 3 months ago

Selected Answer: A

A and D seem both good, but A works better, for me.


upvoted 1 times

  mymy9418 1 year, 4 months ago

Selected Answer: A

But BQML also has limits on training data


https://cloud.google.com/bigquery-ml/quotas
upvoted 2 times

  hiromi 1 year, 4 months ago


Selected Answer: D

Vote for D
A doesn't work because AutoML has limits on training data
- https://www.examtopics.com/exams/google/professional-machine-learning-engineer/view/10/
upvoted 3 times

  behzadsw 1 year, 3 months ago


Wrong. The limit is 200 M records. We have 10M records. see:
https://cloud.google.com/automl-tables/docs/quotas
upvoted 1 times

  adarifian 1 year, 1 month ago


it's more than 10M. the training needs to use all the data collected up to the current date
upvoted 2 times

  mil_spyro 1 year, 4 months ago

Selected Answer: D

BigQuery ML can scale smoothly and requires minimal development work.


Model can be build using SQL queries rather than writing custom code.
upvoted 3 times


Question #98 Topic 1

You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI

Training, and you want to improve the model’s training time. What should you try out first?

A. Migrate your model to TensorFlow, and train it using Vertex AI Training.

B. Train your model in a distributed mode using multiple Compute Engine VMs.

C. Train your model with DLVM images on Vertex AI, and ensure that your code utilizes NumPy and SciPy internal methods whenever possible.

D. Train your model using Vertex AI Training with GPUs.

Correct Answer: C

Community vote distribution


C (77%) D (23%)

  pico 5 months, 2 weeks ago

Selected Answer: D

Options B and C may also be relevant in certain scenarios, but they are generally more involved and might require additional considerations.
Option B can be effective for large-scale training tasks, but it might add complexity and overhead. Option C could be helpful, but the impact on
training time might not be as immediate and substantial as using GPUs.
upvoted 2 times

  pico 7 months, 2 weeks ago

Selected Answer: D

D: Training your model with GPUs can provide a substantial speedup, especially for deep learning models or models that require a lot of
computation. This option is likely to have a significant impact on training time.

NOT C: While optimizing code can help improve training time to some extent, it may not provide as significant a speedup as the other options.
However, it's still a good practice to optimize your code.
upvoted 1 times

  andresvelasco 7 months, 3 weeks ago


Selected Answer: C

I dont think scikit-learn would support GPU or distribution, so based on "What should you try out first?" I think > C. Train your model with DLVM
images on Vertex AI, and ensure that your code utilizes NumPy and SciPy internal methods whenever possible.
upvoted 3 times

  blobfishtu 9 months, 3 weeks ago


why not B? Vertex AI provides the ability to distribute training tasks across multiple Compute Engine VMs, which can parallelize the workload and
significantly reduce the training time for large datasets and complex models.
upvoted 3 times

  PST21 10 months ago


Option D is not the optimal choice for a scikit-learn model since scikit-learn does not have native GPU support. Option C, training with DLVM
images on Vertex AI and optimizing code with NumPy and SciPy, would be more appropriate in your scenario.
upvoted 1 times

  PST21 10 months ago


Ans - D. quickest improvement in training time with minimal modifications to your existing scikit-learn model, trying out Option D and training
your model using Vertex AI Training with GPUs is the recommended first step.
upvoted 1 times

  Scipione_ 11 months, 2 weeks ago

Selected Answer: C

A) Migrate your model to TensorFlow, and train it using Vertex AI Training.


Not the first thing to do.
B) Train your model in a distributed mode using multiple Compute Engine VMs.
Could be not easy and fast.
D)Train your model using Vertex AI Training with GPUs
sklearn does not support GPUs

Also, most of scikit-learn assumes data is in NumPy arrays or SciPy sparse matrices of a single numeric dtype.
I choose C as the correct answer.
upvoted 3 times
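A quick way to see the point several commenters make about option C: scikit-learn assumes data in NumPy arrays, and NumPy's internal C-level methods avoid per-element Python interpreter overhead. A minimal sketch (the array contents are made up for illustration):

```python
import numpy as np

# Hypothetical preprocessing step: sum of squares over a large feature column.
x = np.arange(1_000_000, dtype=np.float64)

def python_sum_of_squares(arr):
    """Pure-Python loop: pays interpreter overhead on every element."""
    total = 0.0
    for v in arr:
        total += v * v
    return total

def numpy_sum_of_squares(arr):
    """Single vectorized call into NumPy's C internals."""
    return float(np.dot(arr, arr))

# Both compute the same value; the vectorized version is typically
# orders of magnitude faster on arrays this size.
```

Timing both (e.g. with `timeit`) typically shows a gap of two orders of magnitude or more, which is the kind of cheap win option C is after before reaching for distributed training or new hardware.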

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 210/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

upvoted 1 times

  TNT87 1 year ago

Selected Answer: C

Answer C
upvoted 1 times

  guilhermebutzke 1 year, 2 months ago


How about using sklearn's multi-core? Considering multiple jobs, could we choose item B?
https://machinelearningmastery.com/multi-core-machine-learning-in-python/
upvoted 1 times

  enghabeth 1 year, 2 months ago


Selected Answer: C

https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support
upvoted 1 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: C

C is correct absolutely
https://console.cloud.google.com/marketplace/details/click-to-deploy-images/deeplearning?_ga=2.139171125.787784554.1674450530-
1146240914.1659613735&project=quantum-hash-240404
upvoted 1 times

  behzadsw 1 year, 3 months ago

Selected Answer: C

Scikit learn does not support GPU:s


https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support
upvoted 4 times

  emma_aic 1 year, 4 months ago


C

No D
https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers?hl=ko#scikit-learn
upvoted 1 times

  mymy9418 1 year, 4 months ago


Selected Answer: C

GPU is not useful for sciki-learn model


https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support
but DLVM did mention it is support scikit-learn framework
https://cloud.google.com/deep-learning-vm
upvoted 3 times

  hiromi 1 year, 4 months ago

Selected Answer: D

D (not sure)
- https://cloud.google.com/vertex-ai/docs/training/code-requirements#gpus
upvoted 1 times

  hiromi 1 year, 4 months ago


Changind my vote to C
upvoted 2 times

  mil_spyro 1 year, 4 months ago


Selected Answer: D

Training a machine learning model on a GPU can significantly improve the training time compared to training on a CPU.
upvoted 1 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 211/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #99 Topic 1

You are an ML engineer at a travel company. You have been researching customers’ travel behavior for many years, and you have deployed models

that predict customers’ vacation patterns. You have observed that customers’ vacation destinations vary based on seasonality and holidays;

however, these seasonal variations are similar across years. You want to quickly and easily store and compare the model versions and

performance statistics across years. What should you do?

A. Store the performance statistics in Cloud SQL. Query that database to compare the performance statistics across the model versions.

B. Create versions of your models for each season per year in Vertex AI. Compare the performance statistics across the models in the

Evaluate tab of the Vertex AI UI.

C. Store the performance statistics of each pipeline run in Kubeflow under an experiment for each season per year. Compare the results

across the experiments in the Kubeflow UI.

D. Store the performance statistics of each version of your models using seasons and years as events in Vertex ML Metadata. Compare the

results across the slices.

Correct Answer: B

Community vote distribution


D (56%) B (44%)

  pinimichele01 3 days, 5 hours ago

Selected Answer: B

https://cloud.google.com/vertex-ai/docs/model-registry/versioning
Model versioning lets you create multiple versions of the same model. With model versioning, you can organize your models in a way that helps
navigate and understand which changes had what effect on the models. With Vertex AI Model Registry you can view your models and all of their
versions in a single view. You can drill down into specific model versions and see exactly how they performed.
upvoted 1 times

  gscharly 1 week, 1 day ago


Selected Answer: B

agree with pico


upvoted 1 times

  Mickey321 5 months, 2 weeks ago


Selected Answer: B

either B or D so leaning towards B


upvoted 1 times

  pico 7 months, 2 weeks ago


Selected Answer: B

Vertex AI provides a managed environment for machine learning, and creating model versions for each season per year is a structured way to
organize and compare models. You can use the Evaluate tab to compare performance metrics easily. This approach is well-suited for the task.
upvoted 2 times

  pico 7 months, 2 weeks ago


not D:

Vertex ML Metadata is designed for tracking metadata and lineage in machine learning pipelines. While it can store model version information
and performance statistics, it might not provide as straightforward a way to compare models across years and seasons as Vertex AI's model
versioning and evaluation tools.
upvoted 1 times

  andresvelasco 7 months, 3 weeks ago

Selected Answer: D

I absolutely do not master this topic, but I would say the correct answer is D.
It does not sound right to systematically create versions of a model based on seasonality if the model has not changed. "Events" in metadata
sound right.
upvoted 1 times

  PST21 10 months ago


Ans D- With Vertex ML Metadata, you can store the performance statistics of each version of your models as events. You can associate these event
with specific seasons and years, making it easy to organize and retrieve the data based on the relevant time periods. By storing performance
statistics as events, you can capture the necessary information for comparing model versions across years.
upvoted 1 times

  Voyager2 10 months, 4 weeks ago


Selected Answer: D

D. Store the performance statistics of each version of your models using seasons and years as events in Vertex ML Metadata. Compare the results
across the slices.
https://cloud.google.com/vertex-ai/docs/ml-metadata/analyzing#filtering
Which versions of a trained model achieved a certain quality threshold?
upvoted 1 times

  pico 7 months, 2 weeks ago


https://cloud.google.com/vertex-ai/docs/evaluation/using-model-evaluation#console
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: D

Went with D
upvoted 1 times

  iskorini 11 months ago


why choose D instead of B?
upvoted 1 times

  CloudKida 11 months, 3 weeks ago

Selected Answer: B

https://cloud.google.com/vertex-ai/docs/model-registry/versioning
Model versioning lets you create multiple versions of the same model. With model versioning, you can organize your models in a way that helps
navigate and understand which changes had what effect on the models. With Vertex AI Model Registry you can view your models and all of their
versions in a single view. You can drill down into specific model versions and see exactly how they performed.
upvoted 1 times

  Yajnas_arpohc 1 year, 1 month ago


Selected Answer: B

You can compare evaluation results across different models, model versions, and evaluation jobs --> https://cloud.google.com/vertex-
ai/docs/evaluation/using-model-evaluation

Metadata mgmt has a very different purpose


upvoted 1 times

  TNT87 1 year, 1 month ago


Selected Answer: D

Answer D
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: D

D
- https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction
upvoted 2 times

  mil_spyro 1 year, 4 months ago


Selected Answer: D

Vote D. It is easy to compare via Vertex ML Metadata UI the performance statistics across the different slices and see how the model performance
varies over time.
upvoted 2 times

  mymy9418 1 year, 4 months ago

Selected Answer: D

i think it is D
upvoted 1 times


Question #100 Topic 1

You are an ML engineer at a manufacturing company. You need to build a model that identifies defects in products based on images of the

product taken at the end of the assembly line. You want your model to preprocess the images with lower computation to quickly extract features

of defects in products. Which approach should you use to build the model?

A. Reinforcement learning

B. Recommender system

C. Recurrent Neural Networks (RNN)

D. Convolutional Neural Networks (CNN)

Correct Answer: D

Community vote distribution


D (100%)

  MultiCloudIronMan 3 weeks, 6 days ago

Selected Answer: D

CNN is commonly used for image classifications


upvoted 1 times

  Scipione_ 11 months, 2 weeks ago

Selected Answer: D

D for sure
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  TNT87 1 year, 1 month ago


Selected Answer: D

Answer D
upvoted 1 times

  FherRO 1 year, 2 months ago


Selected Answer: D

CNNs are commonly used for image classification and recognition tasks.


upvoted 1 times

  FherRO 1 year, 2 months ago

Selected Answer: D

CNN scenario
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: D

best way
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: D

D
CNN is good for images processing
- https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
upvoted 1 times

  ares81 1 year, 4 months ago

Selected Answer: D

Obviously D.
upvoted 2 times


Question #101 Topic 1

You are developing an ML model intended to classify whether X-ray images indicate bone fracture risk. You have trained a ResNet architecture on

Vertex AI using a TPU as an accelerator, however you are unsatisfied with the training time and memory usage. You want to quickly iterate your

training code but make minimal changes to the code. You also want to minimize impact on the model’s accuracy. What should you do?

A. Reduce the number of layers in the model architecture.

B. Reduce the global batch size from 1024 to 256.

C. Reduce the dimensions of the images used in the model.

D. Configure your model to use bfloat16 instead of float32.

Correct Answer: A

Community vote distribution


D (87%) 13%

  mymy9418 Highly Voted  1 year, 4 months ago

i think should be D
https://cloud.google.com/tpu/docs/bfloat16
upvoted 5 times

  fitri001 Most Recent  3 days, 16 hours ago

Selected Answer: D

Configuring bfloat16 instead of float32 (D): This offers a good balance between speed, memory usage, and minimal code changes. Bfloat16 uses 16
bits per float value compared to 32 bits for float32. This can significantly reduce memory usage while maintaining similar accuracy in many
machine learning models, especially for image recognition tasks. It's a quick change with minimal impact on the code and potentially large gains
in training speed.
upvoted 1 times
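For intuition on why D barely hurts accuracy: bfloat16 keeps float32's 8 exponent bits (so the dynamic range is the same) and simply drops 16 mantissa bits. The stdlib-only sketch below simulates that truncation; it is an illustration of the number format, not how TensorFlow's TPU kernels actually convert tensors:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Simulate float32 -> bfloat16: keep the top 16 bits (sign, 8 exponent
    bits, 7 mantissa bits), rounding to nearest even, then widen back."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF  # round
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

# Same dynamic range as float32: huge magnitudes survive
# (max relative rounding error is about 2**-8, i.e. ~0.4%)...
big = to_bfloat16(3e38)      # still finite, close to 3e38
# ...but only ~2-3 decimal digits of precision remain:
fine = to_bfloat16(1.001)    # rounds back to 1.0
```

float16, by contrast, has only 5 exponent bits, so 3e38 would overflow to infinity; that preserved range is why bfloat16 is the low-risk switch on TPUs.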

  pinimichele01 2 weeks ago


Selected Answer: D

"the Google hardware team chose bfloat16 for Cloud TPUs to improve hardware efficiency while maintaining the ability to train deep learning
models accurately, all with minimal switching costs from float32"
upvoted 1 times

  pico 7 months, 2 weeks ago

Selected Answer: B

while reducing the global batch size (Option B) and configuring your model to use bfloat16 (Option D) are both valid options, reducing the global
batch size is typically a safer and more straightforward choice to quickly iterate and make minimal changes to your code while still achieving
reasonable model performance.
upvoted 1 times

  pico 7 months, 2 weeks ago


Why not D:
Numerical Precision: bfloat16 has a lower numerical precision compared to float32
Compatibility: Not all machine learning frameworks and libraries support bfloat16 natively.
Hyperparameter Tuning: When switching to bfloat16, you may need to adjust hyperparameters, such as learning rates and gradient clipping
thresholds, to accommodate the lower numerical precision
Model Architecture: Some model architectures and layers may be more sensitive to reduced precision than others.
upvoted 1 times

  tavva_prudhvi 5 months, 3 weeks ago


TPUs are optimized for operations with bfloat16 data types. By switching from float32 to bfloat16, you can benefit from the TPU's hardware
acceleration capabilities, leading to faster computation and reduced memory usage without significant changes to your code.

While bfloat16 offers a lower precision compared to float32, it maintains a similar dynamic range. This means that the reduction in numerical
precision is unlikely to have a substantial impact on the accuracy of your model, especially in the context of image classification tasks like
bone fracture risk assessment in X-rays.

While reducing the batch size can decrease memory usage, it can also affect the model's convergence and accuracy. Additionally, TPUs are
highly efficient with large batch sizes, so reducing the batch size might not fully leverage the TPU's capabilities.
upvoted 1 times

  Voyager2 10 months, 4 weeks ago


Selected Answer: D

I think it should be D since they are using a TPU.https://cloud.google.com/tpu/docs/bfloat16


upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago

Selected Answer: D

https://cloud.google.com/tpu/docs/bfloat16
upvoted 1 times

  TNT87 1 year, 1 month ago

Selected Answer: D

Answer D
upvoted 2 times

  ailiba 1 year, 2 months ago


"the Google hardware team chose bfloat16 for Cloud TPUs to improve hardware efficiency while maintaining the ability to train deep learning
models accurately, all with minimal switching costs from float32" so since its already trained on TPU, D maybe has no effect?
upvoted 2 times

  John_Pongthorn 1 year, 3 months ago

Selected Answer: D

I go with D exactly, primarily. the rest don't make any sense at all
upvoted 2 times

  ares81 1 year, 3 months ago

Selected Answer: D

It should be D.
upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: D

D
Agree with mymy9418
upvoted 2 times

  mil_spyro 1 year, 4 months ago

Selected Answer: D

Agree with D
upvoted 1 times

  ares81 1 year, 4 months ago

Selected Answer: B

It should be B.
upvoted 1 times


Question #102 Topic 1

You have successfully deployed to production a large and complex TensorFlow model trained on tabular data. You want to predict the lifetime

value (LTV) field for each subscription stored in the BigQuery table named subscription. subscriptionPurchase in the project named my-

fortune500-company-project.

You have organized all your training code, from preprocessing data from the BigQuery table up to deploying the validated model to the Vertex AI

endpoint, into a TensorFlow Extended (TFX) pipeline. You want to prevent prediction drift, i.e., a situation when a feature data distribution in

production changes significantly over time. What should you do?

A. Implement continuous retraining of the model daily using Vertex AI Pipelines.

B. Add a model monitoring job where 10% of incoming predictions are sampled every 24 hours.

C. Add a model monitoring job where 90% of incoming predictions are sampled every 24 hours.

D. Add a model monitoring job where 10% of incoming predictions are sampled every hour.

Correct Answer: C

Community vote distribution


B (79%) A (16%) 5%

  MultiCloudIronMan 3 weeks, 6 days ago

Selected Answer: B

You need to monitor it first to see if there is drift; if there is, a remediation can be devised. Retraining every day is overkill.
upvoted 1 times

  pico 7 months, 1 week ago

Selected Answer: A

Continuous Retraining: Continuously retraining the model allows it to adapt to changes in the data distribution, helping to mitigate prediction drift
Daily retraining provides a good balance between staying up-to-date and avoiding excessive retraining.
Options B, C, and D involve model monitoring but do not address the issue of keeping the model updated with the changing data distribution.
Monitoring alone can help you detect drift, but it does not actively prevent it. Retraining the model is necessary to address drift effectively.
upvoted 3 times

  maukaba 6 months, 1 week ago


Option A can prevent prediction drift. All the other options can only detect it.
Therefore the correct answer is A, unless it is possible to monitor drift and then remediate without retraining.
upvoted 1 times

  Nish1729 3 months, 3 weeks ago


Follow me on X (twitter): @nbcodes for more useful tips.

I think you're slightly missing the point, the answer should be B, let me explain why..

The whole point of this question is to come up with a PREVENTATIVE way of handling prediction drift so you need to find a way to DETECT the
drift before it occurs, this is exactly what solution B does and ensures it's done in a way that is not too frequent i.e D and not too resource
intensive with the large sample i.e C remember if sampling is done well you don't need 90% of the data to detect drift.

Solution A suggests retraining every day which is a CRAZY proposal, why would you retrain every day even if you don't know if your data is
drifting?? Huge waste of resources and time.
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago

Selected Answer: B

Continuous retraining (option A) is not necessarily the best solution for preventing prediction drift, as it can be time-consuming and expensive.
Instead, monitoring the performance of the model in production is a better approach. Option B is a good choice because it samples a small
percentage of incoming predictions and checks for any significant changes in the feature data distribution over a 24-hour period. This allows you
to detect any drift and take appropriate action to address it before it affects the model's performance. Options C and D are less effective because
they either sample too many or too few predictions and/or at too frequent intervals.
upvoted 4 times

  andresvelasco 7 months, 3 weeks ago


I am just not sure why sampling too few (10%) is important. Is this a costly service?
upvoted 1 times

  tavva_prudhvi 5 months, 3 weeks ago


Model monitoring, especially at a large scale, can consume significant computational resources. Sampling a smaller percentage of
predictions (like 10%) helps manage these resource demands and associated costs. The more predictions you sample, the more storage,
computation, and network resources you'll need to analyze the data, potentially increasing the cost.

In many cases, a 10% sample of the data can provide statistically significant insights into the model's performance and the presence of drift.
It's a balancing act between getting enough data to make informed decisions and not overburdening the system.

In some datasets, especially large ones, a lot of the data might be redundant or not particularly informative. Sampling a smaller fraction can
help filter out noise and focus on the most relevant information.
upvoted 1 times

  pico 5 months, 2 weeks ago


Neither B,C or D have a step to prevent the prediction drift.

The question says: "you want to prevent prediction drift"


upvoted 1 times

  TNT87 1 year, 1 month ago


Selected Answer: B

Answer B
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago

Selected Answer: B

B. I got it from the Machine Learning in the Enterprise course on Google Partner Skills Boost;
watch the video "Model management using Vertex AI" carefully.
I infer that it is the default setting in the typical case.
upvoted 3 times

  behzadsw 1 year, 3 months ago


Selected Answer: D

Using 10% of hourly requests would yield a better distribution and a faster feedback loop
upvoted 1 times

  hargur 1 year, 4 months ago


I think it is B; 10% is a reasonable sample, but 90% is not.
upvoted 2 times

  mymy9418 1 year, 4 months ago

Selected Answer: B

I guess 10% of 24 hours should be good enough?


upvoted 3 times

  hiromi 1 year, 4 months ago

Selected Answer: B

B (not sure)
- https://cloud.google.com/vertex-ai/docs/model-monitoring/overview
- https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring#drift-detection
upvoted 2 times
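Whatever sampling rate is picked in B/C/D, the monitoring job's core computation is a distance between the training-time and serving-time feature distributions, alerting when it crosses a threshold. A toy stdlib sketch of that idea — the statistic, threshold, and data below are invented for illustration and are not Vertex AI Model Monitoring's actual implementation (which computes per-feature L-infinity or Jensen-Shannon distances):

```python
from collections import Counter

def category_proportions(values):
    """Empirical distribution of a categorical feature."""
    counts = Counter(values)
    total = len(values)
    return {k: v / total for k, v in counts.items()}

def linf_distance(train, serve):
    """L-infinity distance between two categorical distributions."""
    keys = set(train) | set(serve)
    return max(abs(train.get(k, 0.0) - serve.get(k, 0.0)) for k in keys)

DRIFT_THRESHOLD = 0.3  # hypothetical alert threshold

# Made-up subscription-type feature: training data vs. recent serving data.
train_dist = category_proportions(["monthly"] * 70 + ["annual"] * 30)
serve_dist = category_proportions(["monthly"] * 30 + ["annual"] * 70)

drifted = linf_distance(train_dist, serve_dist) > DRIFT_THRESHOLD
```

In the managed service this corresponds to configuring a sampling rate and monitoring interval on the deployed endpoint rather than writing the statistic yourself.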


Question #103 Topic 1

You recently developed a deep learning model using Keras, and now you are experimenting with different training strategies. First, you trained the

model using a single GPU, but the training process was too slow. Next, you distributed the training across 4 GPUs using

tf.distribute.MirroredStrategy (with no other changes), but you did not observe a decrease in training time. What should you do?

A. Distribute the dataset with tf.distribute.Strategy.experimental_distribute_dataset

B. Create a custom training loop.

C. Use a TPU with tf.distribute.TPUStrategy.

D. Increase the batch size.

Correct Answer: C

Community vote distribution


D (66%) A (28%) 3%

  egdiaa Highly Voted  1 year, 4 months ago

Selected Answer: D

Ans D: Check this link https://www.tensorflow.org/guide/gpu_performance_analysis for details on how to Optimize the performance on the multi-
GPU single host
upvoted 10 times
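egdiaa's link is the crux: under MirroredStrategy the global batch is split evenly across replicas, so keeping the single-GPU batch size while adding three more GPUs just shrinks each GPU's per-step work and lets communication overhead dominate. A sketch of the arithmetic behind option D (the linear learning-rate scaling shown is a common companion heuristic, not something the question requires):

```python
def per_replica_batch(global_batch: int, num_replicas: int) -> int:
    """MirroredStrategy splits the global batch evenly across replicas."""
    assert global_batch % num_replicas == 0
    return global_batch // num_replicas

def scaled_config(base_batch: int, base_lr: float, num_replicas: int):
    """Keep per-GPU work constant by growing the global batch linearly;
    linear LR scaling is a common (assumed) companion heuristic."""
    return {"global_batch": base_batch * num_replicas,
            "learning_rate": base_lr * num_replicas}

# Baseline: one GPU processing a batch of 64 per step.
single_gpu = {"global_batch": 64, "learning_rate": 1e-3}

# With 4 GPUs, raise the global batch so each GPU still sees 64 per step.
four_gpu = scaled_config(64, 1e-3, num_replicas=4)
```

With the global batch left at 64, `per_replica_batch(64, 4)` would be only 16 examples per GPU per step, under-utilizing each device while still paying the gradient all-reduce cost every step.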

  pinimichele01 Most Recent  2 weeks ago

Selected Answer: D

when using tf.distribute.MirroredStrategy, TensorFlow automatically takes care of distributing the dataset across the available devices (GPUs in this
case).

To make sure that the data is efficiently distributed across the GPUs, you should increase the global batch size. This ensures that each GPU receives
a larger batch of data to process, effectively utilizing the additional computational power. The global batch size is the sum of the batch sizes for all
devices. For example, if you had a batch size of 64 for a single GPU, you would set the global batch size to 256 (64 * 4) when using 4 GPUs.
upvoted 1 times

  pico 5 months, 2 weeks ago


Selected Answer: A

When you distribute the training across multiple GPUs using tf.distribute.MirroredStrategy, the training time may not decrease if the dataset
loading and preprocessing become a bottleneck. In this case, option A, distributing the dataset with
tf.distribute.Strategy.experimental_distribute_dataset, can help improve the performance.
upvoted 1 times

  pico 5 months, 2 weeks ago


option D can be a reasonable step to try, but it's important to carefully monitor the training process, consider memory constraints, and assess
the impact on model performance. It might be a good idea to try both option A (distributing the dataset) and option D (increasing the batch
size) to see if there is any improvement in training time.
upvoted 1 times

  PST21 9 months ago


A. Distribute the dataset with tf.distribute.Strategy.experimental_distribute_dataset
When you distribute the training across multiple GPUs using tf.distribute.MirroredStrategy, you need to make sure that the data is also distributed
across the GPUs to fully utilize the computational power. By default, the tf.distribute.MirroredStrategy replicates the model and uses synchronous
training, but it does not automatically distribute the dataset across the GPUs.
upvoted 1 times

  tavva_prudhvi 5 months, 3 weeks ago


You are right, However, when using tf.distribute.MirroredStrategy, TensorFlow automatically takes care of distributing the dataset across the
available devices (GPUs in this case).

To make sure that the data is efficiently distributed across the GPUs, you should increase the global batch size. This ensures that each GPU
receives a larger batch of data to process, effectively utilizing the additional computational power. The global batch size is the sum of the batch
sizes for all devices. For example, if you had a batch size of 64 for a single GPU, you would set the global batch size to 256 (64 * 4) when using 4
GPUs.
upvoted 1 times

  CloudKida 11 months, 3 weeks ago


Selected Answer: D

When going from training with a single GPU to multiple GPUs on the same host, ideally you should experience the performance scaling with only
the additional overhead of gradient communication and increased host thread utilization. Because of this overhead, you will not have an exact 2x
speedup if you move from 1 to 2 GPUs.


Try to maximize the batch size, which will lead to higher device utilization and amortize the costs of communication across multiple GPUs. Using
the memory profiler helps get a sense of how close your program is to peak memory utilization. Note that while a higher batch size can affect
convergence, this is usually outweighed by the performance benefits.
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


Selected Answer: D

If distributing the training across multiple GPUs did not result in a decrease in training time, the issue may be related to the batch size being too
small. When using multiple GPUs, each GPU gets a smaller portion of the batch size, which can lead to slower training times due to increased
communication overhead. Therefore, increasing the batch size can help utilize the GPUs more efficiently and speed up training.
upvoted 2 times

  TNT87 1 year, 1 month ago

Selected Answer: D

Answer D
upvoted 1 times

  John_Pongthorn 1 year, 2 months ago


D: it is best. https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit
Each epoch will then train faster as you add more GPUs. Typically, you would want to increase your batch size as you add more accelerators.
C is ruled out because we are already on GPUs, not TPUs.
A and B: per https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_custom_training_loops, a custom loop requires calling a few more methods. From the guide: "Start by creating a tf.data.Dataset normally. Use tf.distribute.Strategy.experimental_distribute_dataset to convert a tf.data.Dataset to something that produces 'per-replica' values."
https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy
upvoted 4 times

  zeic 1 year, 3 months ago

Selected Answer: D

To speed up the training of the deep learning model, increasing the batch size. When using multiple GPUs with tf.distribute.MirroredStrategy,
increasing the batch size can help to better utilize the additional GPUs and potentially reduce the training time. This is because larger batch sizes
allow each GPU to process more data in parallel, which can help to improve the efficiency of the training process.
upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: C

TPUs are Google's specialized ASICs designed to dramatically accelerate machine learning workloads. Hence it should be C.
upvoted 1 times

  Nayak8 1 year, 4 months ago

Selected Answer: D

I think it's D
upvoted 1 times

  MithunDesai 1 year, 4 months ago


Selected Answer: A

I think its A
upvoted 4 times

  hiromi 1 year, 4 months ago


Selected Answer: B

B (not sure)
- https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch
-https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_custom_training_loops
upvoted 1 times

  hiromi 1 year, 4 months ago


Sorry, ans D (per egdiaa's link)
upvoted 1 times

  hiromi 1 year, 4 months ago


It should be A
upvoted 1 times

  mil_spyro 1 year, 4 months ago

Selected Answer: A


I think it's A,

https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy#in_short
upvoted 3 times


Question #104 Topic 1

You work for a gaming company that has millions of customers around the world. All games offer a chat feature that allows players to

communicate with each other in real time. Messages can be typed in more than 20 languages and are translated in real time using the Cloud

Translation API. You have been asked to build an ML system to moderate the chat in real time while assuring that the performance is uniform

across the various languages and without changing the serving infrastructure.

You trained your first model using an in-house word2vec model for embedding the chat messages translated by the Cloud Translation API.

However, the model has significant differences in performance across the different languages. How should you improve it?

A. Add a regularization term such as the Min-Diff algorithm to the loss function.

B. Train a classifier using the chat messages in their original language.

C. Replace the in-house word2vec with GPT-3 or T5.

D. Remove moderation for languages for which the false positive rate is too high.

Correct Answer: D

Community vote distribution


B (60%) A (30%) 10%

  TNT87 Highly Voted  1 year, 1 month ago

Selected Answer: B

Answer B
Since the performance of the model varies significantly across different languages, it suggests that the translation process might have introduced
some noise in the chat messages, making it difficult for the model to generalize across languages. One way to address this issue is to train a
classifier using the chat messages in their original language.
upvoted 7 times

  Zwi3b3l Most Recent  3 months ago

Selected Answer: A

uniform performance
upvoted 1 times

  pinimichele01 2 weeks ago


Adding a regularization term to the loss function can help prevent overfitting of the model, but it may not necessarily address the language-
specific differences in performance. The Min-Diff algorithm is a type of regularization technique that aims to minimize the difference between
the model predictions and the ground truth while ensuring that the model remains simple. While this can improve the generalization
performance of the model, it may not be sufficient to address the language-specific differences in performance. Therefore, training a classifier
using the chat messages in their original language can be a better solution to improve the performance of the moderation system across
different languages.
upvoted 1 times

  ciro_li 9 months, 1 week ago


Selected Answer: B

Min-diff may reduce model unfairness, but here the concern is about improving performance. Training models avoiding Cloud Natural API should
be more suitable.
upvoted 1 times

  tavva_prudhvi 9 months ago


Adding a regularization term to the loss function can help prevent overfitting of the model, but it may not necessarily address the language-
specific differences in performance. The Min-Diff algorithm is a type of regularization technique that aims to minimize the difference between
the model predictions and the ground truth while ensuring that the model remains simple. While this can improve the generalization
performance of the model, it may not be sufficient to address the language-specific differences in performance. Therefore, training a classifier
using the chat messages in their original language can be a better solution to improve the performance of the moderation system across
different languages.
upvoted 1 times

  [Removed] 9 months, 1 week ago

Selected Answer: A

A is correct since it encourages the model to have similar performance across languages.

B would entail training 20 word2vec embeddings + maintaining 20 models at the same time. On top of that, there would be no guarantee that
those models will have comparable performance across languages. This is certainly not something you would do after training your first model.
upvoted 2 times

  friedi 10 months, 1 week ago

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 222/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Selected Answer: A

A is correct, the key part of the question is „[…] assuring the performance is uniform […]“ which is baked into the Min-Diff regularisation:
https://ai.googleblog.com/2020/11/mitigating-unfair-bias-in-ml-models.html
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago

Selected Answer: B

Since the current model has significant differences in performance across the different languages, it is likely that the translations produced by the
Cloud Translation API are not of uniform quality across all languages. Therefore, it would be best to train a classifier using the chat messages in
their original language instead of relying on translations.

This approach has several advantages. First, the model can directly learn the nuances of each language, leading to better performance across all
languages. Second, it eliminates the need for translation, reducing the possibility of errors and improving the overall speed of the system. Finally, it
is a relatively simple approach that can be implemented without changing the serving infrastructure.
upvoted 3 times

  hakook 1 year, 1 month ago


Selected Answer: A

should be A
https://ai.googleblog.com/2020/11/mitigating-unfair-bias-in-ml-models.html
upvoted 2 times

  Ml06 1 year, 1 month ago


B i think is the correct answer
C is an overkill , you have just developed your first model you don’t jump into solution like C , in addition the problem is that there is a significant
difference between language note the model is enormously underperforming .
Finally you are serving millions of users , running chat GPT or T5 for a task like chat moderation (and in real time) is extremely wasteful .
upvoted 3 times

  John_Pongthorn 1 year, 2 months ago


Given that GPT-3 is rival of google , C is not possible certainly .
upvoted 2 times

  John_Pongthorn 1 year, 2 months ago


we are taking into account 20 muti classification, it is relevant about FP or FN.
upvoted 1 times

  egdiaa 1 year, 4 months ago


Selected Answer: C

GPT-3 is best for generating human-like Text


upvoted 2 times

  lightnessofbein 1 year, 2 months ago


Does "moderate" means we need to generate text?
upvoted 2 times

  kunal_18 1 year, 4 months ago


Ans : C
https://towardsdatascience.com/poor-mans-gpt-3-few-shot-text-generation-with-t5-transformer-51f1b01f843e
upvoted 1 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 223/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #105 Topic 1

You work for a gaming company that develops massively multiplayer online (MMO) games. You built a TensorFlow model that predicts whether

players will make in-app purchases of more than $10 in the next two weeks. The model’s predictions will be used to adapt each user’s game

experience. User data is stored in BigQuery. How should you serve your model while optimizing cost, user experience, and ease of management?

A. Import the model into BigQuery ML. Make predictions using batch reading data from BigQuery, and push the data to Cloud SQL

B. Deploy the model to Vertex AI Prediction. Make predictions using batch reading data from Cloud Bigtable, and push the data to Cloud SQL.

C. Embed the model in the mobile application. Make predictions after every in-app purchase event is published in Pub/Sub, and push the data

to Cloud SQL.

D. Embed the model in the streaming Dataflow pipeline. Make predictions after every in-app purchase event is published in Pub/Sub, and push

the data to Cloud SQL.

Correct Answer: A

Community vote distribution


A (64%) D (27%) 9%

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: A

it seens A (not sure)


- https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-tensorflow
upvoted 11 times

  pinimichele01 Most Recent  2 weeks ago

Selected Answer: A

Make predictions after every in-app purchase it it not necessary -> A


upvoted 1 times

  Mickey321 5 months, 2 weeks ago


Selected Answer: D

Embedding the model in a streaming Dataflow pipeline allows low latency predictions on real-time events published to Pub/Sub. This provides a
responsive user experience.
Dataflow provides a managed service to scale predictions and integrate with Pub/Sub, without having to manage servers.
Streaming predictions only when events occur optimizes cost compared to bulk or client-side prediction.
Pushing results to Cloud SQL provides a managed database for persistence.
In contrast, options A and B use inefficient batch predictions. Option C increases mobile app size and cost.
upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: D

D could be correct
upvoted 1 times

  Nxtgen 10 months ago

Selected Answer: D

These were my reasonings to choose D as best option:


B -> Vertex AI would not minimize cost
C -> Would not optimize user experience (this may lead to slow running of the game (lag)?)
A- > Would not optimize ease of management / automatization
D -> Best choice?
upvoted 1 times

  tavva_prudhvi 5 months, 3 weeks ago


Why do you want to make a prediction after every app purchase bro?
upvoted 3 times

  M25 11 months, 3 weeks ago

Selected Answer: D

For "used to adapt each user's game experience" points out to non-batch, hence excludes A & B, and embedding the model in the mobile app
would not necessarily "optimize cost". Plus, the classical streaming solution builds on Dataflow along with Pub/Sub and BigQuery, embedding ML
in Dataflow is low-code https://cloud.google.com/blog/products/data-analytics/latest-dataflow-innovations-for-real-time-streaming-and-aiml and
apparently a modified version of the question points to the same direction https://mikaelahonen.com/en/data/gcp-mle-exam-questions/
upvoted 3 times

  ciro_li 9 months, 1 week ago

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 224/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

there's no need to make a prediction after every in-app purchase event. Am i wrong?
upvoted 3 times

  TNT87 1 year ago


Selected Answer: A

Yeah its A
upvoted 2 times

  TNT87 1 year, 1 month ago


Selected Answer: C

Answer C
upvoted 2 times

  tavva_prudhvi 1 year, 1 month ago


Option C, embedding the model in the mobile application, can increase the size of the application and may not be suitable for real-time
prediction.
upvoted 2 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 225/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #106 Topic 1

You are building a linear regression model on BigQuery ML to predict a customer’s likelihood of purchasing your company’s products. Your model

uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want

to prepare your data using the least amount of coding while maintaining the predictable variables. What should you do?

A. Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file, and upload it as part of your model to

BigQuery ML.

B. Create a new view with BigQuery that does not include a column with city information

C. Use Cloud Data Fusion to assign each city to a region labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.

D. Use Dataprep to transform the state column using a one-hot encoding method, and make each city a column with binary values.

Correct Answer: B

Community vote distribution


D (82%) C (18%)

  fitri001 4 days, 15 hours ago

Selected Answer: D

A. Using TensorFlow: This is an overkill for this scenario. BigQuery ML can handle one-hot encoding natively within Dataprep.
B. Excluding City Information: This removes a potentially important predictive variable, reducing model accuracy.
C. Assigning Region Labels: This approach loses granularity and might not capture the specific variations between cities.
upvoted 2 times

  andresvelasco 7 months, 3 weeks ago

Selected Answer: D

D by elimination but ...


Does not bigquery automatically do one-hot encoding of categorical features for you?
Also the wording of the question does not seem right: a linear regression model to predict the likelihodd that the customer ... isnt that a
classification model?
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

Went with D
upvoted 1 times

  Yajnas_arpohc 1 year, 1 month ago


Is it correct to say that A is technically a better way to do things if the ask wast for separate columns?
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


"least amount of coding"
upvoted 4 times

  guilhermebutzke 1 year, 1 month ago

Selected Answer: D

One-hot is a good way to use categorical variables in regressions problems


https://academic.oup.com/rheumatology/article/54/7/1141/1849688
https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-auto-preprocessing
upvoted 3 times

  TNT87 1 year, 1 month ago


Selected Answer: D

Answer D
upvoted 1 times

  abneural 1 year, 2 months ago

Selected Answer: C

for a fuller answer, D--> transforms “state” column not city column
C--> at least works with city column
upvoted 1 times

  tavva_prudhvi 1 year, 1 month ago


Read smarques comment
upvoted 1 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 226/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

  John_Pongthorn 1 year, 2 months ago

Selected Answer: D

https://docs.trifacta.com/display/SS/Prepare+Data+for+Machine+Processing
upvoted 2 times

  smarques 1 year, 3 months ago

Selected Answer: D

This will allow you to maintain the city name variable as a predictor while ensuring that the data is in a format that can be used to train a linear
regression model on BigQuery ML.
upvoted 1 times

  Abhijat 1 year, 4 months ago


Selected Answer: D

Answer D
upvoted 1 times

  Abhijat 1 year, 4 months ago


Answer is D
upvoted 1 times

  mymy9418 1 year, 4 months ago

Selected Answer: D

one-hot encoding makes sense to me


upvoted 2 times

  hiromi 1 year, 4 months ago

Selected Answer: C

I vote for C
upvoted 2 times

  hiromi 1 year, 4 months ago


Changing my vote to D
upvoted 3 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 227/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #107 Topic 1

You are an ML engineer at a bank that has a mobile application. Management has asked you to build an ML-based biometric authentication for the

app that verifies a customer’s identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be

downloaded and stored into the bank databases. Which learning strategy should you recommend to train and deploy this ML mode?

A. Data Loss Prevention API

B. Federated learning

C. MD5 to encrypt data

D. Differential privacy

Correct Answer: C

Community vote distribution


B (88%) 12%

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: B

B
With federated learning, all the data is collected, and the model is trained with algorithms across multiple decentralized edge devices such as cell
phones or websites, without exchanging them.
(Journey to Become a Google Cloud Machine Learning Engineer: Build the mind and hand of a Google Certified ML professional)
upvoted 8 times

  fitri001 Most Recent  4 days, 15 hours ago

Selected Answer: B

Federated learning allows training the model on the user's devices themselves.

pen_spark
expand_more The model updates its parameters based on local training data on the device without ever needing the raw fingerprint information to
leave the device. This ensures the highest level of privacy for sensitive biometric data.
upvoted 1 times

  fitri001 4 days, 15 hours ago


Data Loss Prevention API (DLAPI): This focuses on protecting data at rest and in transit, not relevant to training a model without storing data.
MD5 Encryption: This is a one-way hashing function, not suitable for encryption and decryption needed for training.expand_more
Differential privacy: While it adds noise to protect privacy, it's not ideal for training image recognition models like fingerprint identification.
upvoted 1 times

  Voyager2 10 months, 4 weeks ago


B. Federated learning.
"information and cannot be downloaded and stored into the bank databases" That excludes DLP. ederated Learning enables mobile phones to
collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from
the need to store the data in the cloud.
upvoted 2 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  Yajnas_arpohc 1 year, 1 month ago


Selected Answer: B

I think the giveaway is in the question "Which learning strategy.."... Federated Learning seems to be the only one !
upvoted 3 times

  TNT87 1 year, 1 month ago


Selected Answer: B

B. Federated learning would be the best learning strategy to train and deploy the ML model for biometric authentication in this scenario. Federated
learning allows for training an ML model on distributed data without transferring the raw data to a centralized location.
upvoted 1 times

  zzzzzooooo 1 year, 2 months ago

Selected Answer: A

Ans is A for me
upvoted 1 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 228/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

  ares81 1 year, 3 months ago

Selected Answer: A

It seems A, to me.
upvoted 1 times

  mil_spyro 1 year, 4 months ago

Selected Answer: B

Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device.
https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
upvoted 1 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 229/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #108 Topic 1

You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-managed notebooks. You use BigQuery to split your

data into training and validation sets using the following queries:

CREATE OR REPLACE TABLE ‘myproject.mydataset.training‘ AS

(SELECT * FROM ‘myproject.mydataset.mytable‘ WHERE RAND() <= 0.8);

CREATE OR REPLACE TABLE ‘myproject.mydataset.validation‘ AS

(SELECT * FROM ‘myproject.mydataset.mytable‘ WHERE RAND() <= 0.2);

After training the model, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8, but after deploying the

model to production, you notice that your model performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?

A. There is training-serving skew in your production environment.

B. There is not a sufficient amount of training data.

C. The tables that you created to hold your training and validation records share some records, and you may not be using all the data in your

initial table.

D. The RAND() function generated a number that is less than 0.2 in both instances, so every record in the validation table will also be in the

training table.

Correct Answer: A

Community vote distribution


C (94%) 6%

  M25 11 months, 3 weeks ago

Selected Answer: C

- Excluding D as RAND() samples 80% for “.training” & 20% for “.validaton”: https://stackoverflow.com/questions/42115968/how-does-rand-works
in-bigquery;
- Could be that those 2 samplings share some records since pseudo-randomly sampled over the same “.mytable”, & therefore might not be using
all of its data, thus C seems valid;
- Excluding B as there is no indication otherwise of insufficient amount of training data, after training AUC ROC was 0.8, that we know;
- There could be a training-serving skew occurring in Prod, but “most likely occurring” is C as a result of the selective information presented:
https://developers.google.com/machine-learning/guides/rules-of-ml#training-serving_skew
upvoted 3 times

  formazioneQI 1 year ago


Selected Answer: C

Answer C
upvoted 2 times

  Yajnas_arpohc 1 year, 1 month ago


Selected Answer: C

C seems closest here


upvoted 1 times

  TNT87 1 year, 1 month ago

Selected Answer: C

Answer C
upvoted 1 times

  ailiba 1 year, 2 months ago

Selected Answer: C

since we are calling rand twice it might be that data that was in training set ends up in testing set too. If we had called it just once I would say D.
upvoted 2 times

  Ahmades 1 year, 4 months ago

Selected Answer: D

Hesitated between C and D, but D looks more precise


upvoted 1 times

  pshemol 1 year, 3 months ago

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 230/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

If there were one RAND() in front of those two queries it would be true. There are two separate RAND() and "every record in the validation table
will also be in the training table" is not true.
upvoted 2 times

  hiromi 1 year, 4 months ago

Selected Answer: C

C (not sure)
upvoted 4 times

  mymy9418 1 year, 4 months ago

Selected Answer: C

the rand is generated twice


upvoted 2 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 231/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #109 Topic 1

During batch training of a neural network, you notice that there is an oscillation in the loss. How should you adjust your model to ensure that it

converges?

A. Decrease the size of the training batch.

B. Decrease the learning rate hyperparameter.

C. Increase the learning rate hyperparameter.

D. Increase the size of the training batch.

Correct Answer: C

Community vote distribution


B (93%) 7%

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: B

B
larger learning rates can reduce training time but may lead to model oscillation and may miss the optimal model parameter values.
upvoted 7 times

  fitri001 Most Recent  4 days, 14 hours ago

Selected Answer: B

A. Decrease Batch Size: While a smaller batch size can sometimes help with convergence, it can also lead to slower training. It might not necessarily
address the issue of oscillation.
C. Increase Learning Rate: A higher learning rate can cause the loss to jump around more erratically, potentially worsening the oscillation problem.
D. Increase Batch Size: A larger batch size can lead to smoother updates but might also make the model less sensitive to local gradients and hinde
convergence, especially with an already oscillating loss.
upvoted 1 times

  Akel123 1 week, 2 days ago

Selected Answer: C

I don't understand
upvoted 1 times

  M25 11 months, 3 weeks ago


Selected Answer: B

Went with B
upvoted 1 times

  TNT87 1 year, 1 month ago


Selected Answer: B

Answer B
upvoted 1 times

  enghabeth 1 year, 2 months ago

Selected Answer: B

having a large learning rate results in Instability or Oscillations. Thus, the first solution is to tune the learning rate by gradually decreasing it.
https://towardsdatascience.com/8-common-pitfalls-in-neural-network-training-workarounds-for-them-7d3de51763ad
upvoted 1 times

  mymy9418 1 year, 4 months ago

Selected Answer: B

https://ai.stackexchange.com/questions/14079/what-could-an-oscillating-training-loss-curve-
represent#:~:text=Try%20lowering%20the%20learning%20rate,step%20and%20overshoot%20it%20again.
upvoted 2 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 232/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #110 Topic 1

You work for a toy manufacturer that has been experiencing a large increase in demand. You need to build an ML model to reduce the amount of

time spent by quality control inspectors checking for product defects. Faster defect detection is a priority. The factory does not have reliable Wi-

Fi. Your company wants to implement the new ML model as soon as possible. Which model should you use?

A. AutoML Vision Edge mobile-high-accuracy-1 model

B. AutoML Vision Edge mobile-low-latency-1 model

C. AutoML Vision model

D. AutoML Vision Edge mobile-versatile-1 model

Correct Answer: D

Community vote distribution


B (100%)

  mil_spyro Highly Voted  1 year, 4 months ago

Hence faster defect detection is a priority, AutoML Vision Edge mobile-low-latency-1 model should be the choice. This model is designed to run
efficiently on mobile devices and prioritize low latency, which means that it can provide fast defect detection without requiring a connection to the
cloud.
https://cloud.google.com/vision/automl/docs/train-edge
upvoted 8 times

  maukaba 6 months, 1 week ago


https://cloud.google.com/vertex-ai/docs/training/automl-edge-api
upvoted 1 times

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: B

B
"reduce the amount of time spent by quality control inspectors checking for product defects."-> low latency
upvoted 6 times

  fitri001 Most Recent  4 days, 14 hours ago

Selected Answer: B

The AutoML Vision Edge mobile-low-latency-1 model prioritizes speed over accuracy, making it ideal for real-time defect detection on the factory
floor without a stable internet connection. This allows for faster inspections and quicker identification of faulty products.
upvoted 2 times

  fitri001 4 days, 14 hours ago


Faster Defect Detection: This is the main priority, and the low-latency model is specifically designed for speed.
Edge Device Compatibility: The model should run on a device without relying on Wi-Fi. AutoML Vision Edge models are optimized for edge
deployments.
upvoted 1 times

  fitri001 4 days, 14 hours ago


A. AutoML Vision mobile-high-accuracy-1 model: While high accuracy is desirable, faster defect detection is the top priority in this case. This
model might be slower due to its focus on accuracy.
C. AutoML Vision model: This model is likely designed for cloud deployment and might not be suitable for running on an edge device
without reliable Wi-Fi.
D. AutoML Vision Edge mobile-versatile-1 model: This model prioritizes a balance between accuracy and latency. While faster than the high-
accuracy model, it might be slower than the low-latency model for this specific use case.
upvoted 1 times

  MultiCloudIronMan 3 weeks, 6 days ago


Selected Answer: B

Edge device with low latency


upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  TNT87 1 year, 1 month ago

Selected Answer: B

Answer B

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 233/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: B

It's B.
upvoted 1 times

  mil_spyro 1 year, 4 months ago

Selected Answer: B

vote B
upvoted 4 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 234/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #111 Topic 1

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the

classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model

building, training, and hyperparameter tuning and serving. What should you do?

A. Train a TensorFlow model on Vertex AI.

B. Train a classification Vertex AutoML model.

C. Run a logistic regression job on BigQuery ML.

D. Use scikit-learn in Notebooks with pandas library.

Correct Answer: A

Community vote distribution


B (100%)

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: B

B (similar to question 7)
upvoted 7 times

  fitri001 Most Recent  3 days, 21 hours ago

Selected Answer: B

Vertex AutoML is a Google Cloud Platform service designed for building machine learning models without writing code.expand_more It automates
various stages of the machine learning pipeline, including those you mentioned:

Exploratory data analysis


Feature selection
Model building (supports various classification algorithms)
Training
Hyperparameter tuning
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: B

Went with B
upvoted 1 times

  TNT87 1 year, 1 month ago

Selected Answer: B

Answer B
upvoted 1 times

  ares81 1 year, 3 months ago


Selected Answer: B

A and D need coding. C is regression, not classification. Hence B.


upvoted 1 times

  ares81 1 year, 3 months ago


My mistake, it's logistic regression, meaning classification. But it still requires some coding. So still B.
upvoted 3 times

  mymy9418 1 year, 4 months ago


Selected Answer: B

BQML will need coding


only AutoML in Vertex AI is codeless from end to end
upvoted 3 times

https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 235/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

Question #112 Topic 1

You are an ML engineer in the contact center of a large enterprise. You need to build a sentiment analysis tool that predicts customer sentiment

from recorded phone conversations. You need to identify the best approach to building a model while ensuring that the gender, age, and cultural

differences of the customers who called the contact center do not impact any stage of the model development pipeline and results. What should

you do?

A. Convert the speech to text and extract sentiments based on the sentences.

B. Convert the speech to text and build a model based on the words.

C. Extract sentiment directly from the voice recordings.

D. Convert the speech to text and extract sentiment using syntactical analysis.

Correct Answer: D

Community vote distribution


A (54%) B (32%) 11%

  mymy9418 Highly Voted  1 year, 4 months ago

Selected Answer: A

Syntactic Analysis is not for sentiment analysis


upvoted 8 times

  fitri001 Most Recent  3 days, 20 hours ago

Selected Answer: A

A. Convert speech to text and extract sentiments based on sentences: This method focuses on the content of the conversation, minimizing the
influence of factors like voice tone (which can be culturally or gender-specific). Sentiment analysis techniques can analyze the meaning and contex
of sentences to identify positive, negative, or neutral sentiment.
upvoted 2 times

  fitri001 3 days, 20 hours ago


B. Convert speech to text and build a model based on the words: While words are important, relying solely on them can miss the context and
lead to bias. For example, "great" might be positive in most cases, but in some cultures, it might be used sarcastically.

C. Extract sentiment directly from voice recordings: This approach can be biased as voice characteristics like pitch or pace can vary based on
gender, age, and cultural background.

D. Convert speech to text and extract sentiment using syntactical analysis: While syntax can provide some clues, it's not the strongest indicator
of sentiment. Additionally, cultural differences in sentence structure could impact accuracy.
upvoted 1 times

  RioGrande 5 months ago


The correct answer should be A. Word embeddings have static embeddings for the same words, while contextual embeddings vary depending on
the context.

"May’s sentence embedding adaptation of WEAT, known as the Sentence Embedding Association Test (SEAT), shows less clear racial and gender
bias in language models and embeddings than the corresponding word embedding formulation"

From: https://medium.com/institute-for-applied-computational-science/bias-in-nlp-embeddings-b1dabb8bbe20
upvoted 2 times

  pico 5 months, 2 weeks ago


Selected Answer: B

This approach involves converting the speech to text, which allows you to analyze the content of the conversations without directly dealing with
the speakers' gender, age, or cultural differences. By building a model based on the words, you can focus on the language used in the
conversations to predict sentiment, making the model more inclusive and less sensitive to demographic factors.

Option A could be influenced by the syntactical nuances and structures used in different cultures, and option C might be impacted by the
variations in voice tones across genders and ages. Option B, on the other hand, relies on the text content, which provides a more neutral and
content-focused basis for sentiment analysis.
upvoted 2 times

  MCorsetti 6 months, 1 week ago

Selected Answer: B

B: People of different cultures will often use different sentence structures, so words would be safer than sentences
upvoted 1 times

  tavva_prudhvi 5 months, 3 weeks ago


Yeah, but they (words) may miss the context of the sentiment, leading to inaccuracies!
https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 236/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

upvoted 1 times

  tavva_prudhvi 9 months ago

Selected Answer: A

building a model based on words, may also be effective but could potentially be influenced by factors such as accents, dialects, or language
variations that may differ between speakers.extracting sentiment directly from voice recordings, may be less accurate due to the subjective nature
of interpreting emotions from audio alone.using syntactical analysis, may be useful in certain contexts but may not capture the full range of
sentiment expressed in a conversation. Therefore, A provides the most comprehensive and unbiased approach to sentiment analysis in this
scenario.
upvoted 1 times

  pico 5 months, 2 weeks ago


Option A could be influenced by the syntactical nuances and structures used in different cultures
upvoted 1 times

  tavva_prudhvi 5 months, 1 week ago


See, both have their own advantages & disadvantages, but we should choose the option which is more relevant
upvoted 1 times

  ciro_li 9 months, 1 week ago

Selected Answer: A

Answer A
upvoted 1 times

  ciro_li 9 months, 1 week ago


Answer B*
upvoted 1 times

  erenklclar 9 months, 2 weeks ago


Selected Answer: C

By working directly with the audio data, you can account for important aspects like tone, pitch, and rhythm of speech, which might provide
valuable information regarding sentiment.
upvoted 1 times

  [Removed] 9 months, 1 week ago


But the audio will be affected by gender, age, and cultural differences of the customers. When you convert the recording to text, this problem is
less pronounced. So the answer cannot be C
upvoted 1 times

  NickHapton 9 months, 3 weeks ago


vote for A
between words and sentences:
Age and gender considerations: Sentences provide a broader view of sentiment that can help mitigate age and gender biases. Analyzing at the
sentence level allows you to observe sentiment patterns across various demographic groups, which can help identify any biases that may arise. By
considering the overall sentiment expressed in sentences, you can minimize the impact of individual words that might carry specific biases.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: C

There is the possibility for a more sophisticated architecture for an audio processing pipeline, and the “not impact any stage of the model
development pipeline and results” somewhat calls for a more holistic answer: https://cloud.google.com/architecture/categorizing-audio-files-using-ml#converting_speech_to_text. Plus, it adds “voice emotion information, related to an audio recording, indicating that a vocal utterance of a
speaker is spoken with negative or positive emotion”: https://patents.google.com/patent/US20140220526A1/en.
upvoted 2 times

  M25 11 months, 3 weeks ago


The emphasis here is on #ResponsibleAI https://cloud.google.com/natural-language/automl/docs/beginners-guide
upvoted 1 times

  M25 11 months, 3 weeks ago


A reason why one could exclude “Convert the speech to text” altogether [Options A, B & D] could be, for instance, because “speech
transcription may have higher error rates for African Americans than White Americans [3]”: https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html.
upvoted 1 times

  M25 11 months, 3 weeks ago


“Cloud NL API can perform syntactic analysis directly on a file located in Cloud Storage.” “Syntactic Analysis [Option D] breaks up the given
text into a series of sentences [Option A] and tokens (generally, words [Option B]) and provides linguistic information about those tokens”:
https://cloud.google.com/natural-language/docs/analyzing-syntax.
It “can be used to identify the parts of speech, determine the structure of a sentence, and determine the meaning of words in context”:
https://ts2.space/en/a-comprehensive-guide-to-google-cloud-natural-language-apis-syntax-analysis/.
upvoted 1 times

  [Removed] 1 year ago

Selected Answer: B


Can anyone explain how to choose between words and sentences? I feel like the model could pick up bias from both
upvoted 1 times

  formazioneQI 1 year ago


Selected Answer: B

I agree with qaz09. To avoid the influence of demographic variables, the model should be built on the words.
upvoted 2 times

  TNT87 1 year, 1 month ago


Selected Answer: A

Answer A
upvoted 1 times

  qaz09 1 year, 2 months ago


Selected Answer: B

For "ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model
development pipeline and results" I think the model should be built on the words rather than sentences
upvoted 3 times

  ares81 1 year, 3 months ago

Selected Answer: A

A makes sense, to me.


upvoted 1 times

  hiromi 1 year, 4 months ago


Selected Answer: A

A
Convert the speech to text and extract sentiments based on the sentences.
upvoted 1 times

  mil_spyro 1 year, 4 months ago

Selected Answer: D

vote D
upvoted 1 times

  Yajnas_arpohc 1 year, 1 month ago


Based only on words might be misleading; at a minimum you need to go with sentences
upvoted 2 times


Question #113 Topic 1

You need to analyze user activity data from your company’s mobile applications. Your team will use BigQuery for data analysis, transformation,

and experimentation with ML algorithms. You need to ensure real-time ingestion of the user activity data into BigQuery. What should you do?

A. Configure Pub/Sub to stream the data into BigQuery.

B. Run an Apache Spark streaming job on Dataproc to ingest the data into BigQuery.

C. Run a Dataflow streaming job to ingest the data into BigQuery.

D. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery,

Correct Answer: A

Community vote distribution


A (66%) D (34%)
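The A-versus-D debate below hinges on whether each message needs a transformation before landing in BigQuery. A minimal sketch of the kind of per-message transform a Dataflow streaming job (option D) would apply is shown here; the event schema (`user_id`, `event`, `ts`) is an assumption for illustration only. With a direct Pub/Sub-to-BigQuery subscription (option A), no such step exists and messages land as-is.

```python
import json
from datetime import datetime, timezone

# Sketch of a per-message transform that a Dataflow streaming job
# (option D) would run before writing rows to BigQuery. The field
# names are hypothetical; real payloads depend on the mobile apps.
def to_bigquery_row(message_data: bytes) -> dict:
    event = json.loads(message_data.decode("utf-8"))
    return {
        "user_id": event["user_id"],
        "event_type": event["event"],
        # Normalize the client epoch timestamp to UTC ISO-8601.
        "event_time": datetime.fromtimestamp(
            event["ts"], tz=timezone.utc).isoformat(),
    }

row = to_bigquery_row(b'{"user_id": "u1", "event": "open", "ts": 0}')
print(row["event_time"])  # → 1970-01-01T00:00:00+00:00
```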

  pshemol Highly Voted  1 year, 4 months ago

Selected Answer: A

Previously Google pattern was Pub/Sub -> Dataflow -> BQ


but now it looks as there is new Pub/Sub -> BQ
https://cloud.google.com/blog/products/data-analytics/pub-sub-launches-direct-path-to-bigquery-for-streaming-analytics
upvoted 17 times

  TNT87 1 year, 1 month ago


New pub sub??? heheheh
upvoted 1 times

  TNT87 1 year, 1 month ago


https://cloud.google.com/blog/products/data-analytics/pub-sub-launches-direct-path-to-bigquery-for-streaming-analytics
You should have said Pub/Sub has been upgraded to stream directly to BigQuery... not a new Pub/Sub
upvoted 1 times

  ludovikush Most Recent  1 month, 4 weeks ago

Selected Answer: D

Werner123 i agree
upvoted 1 times

  Werner123 2 months ago

Selected Answer: D

User data would most likely include PII, for that case it is still recommended to use Dataflow since you need to remove/anonymise sensitive data.
upvoted 2 times

  pico 5 months, 2 weeks ago


I would have added "with / without data transformation" to the question to choose the right answer between A or D
upvoted 1 times

  andresvelasco 7 months, 3 weeks ago

Selected Answer: A

I had my doubts between A and D.


But since the transformation will occur in bigquery I think Pubsub suffices.
upvoted 3 times

  M25 11 months, 3 weeks ago


Selected Answer: D

Agree with TNT87. From the same link: “For Pub/Sub messages where advanced preload transformations or data processing before landing data in
BigQuery (such as masking PII) is necessary, we still recommend going through Dataflow.” It’s “analyze user activity data”, not merely streaming IoT
into BigQuery so that concerns like privacy are per se n/a. One can deal with PII after landing in BigQuery as well, but apparently that’s not what
they recommend.
upvoted 3 times

  PHD_CHENG 1 year, 1 month ago

Selected Answer: D

Pub/Sub -> DataFlow -> BigQuery


upvoted 2 times

  TNT87 1 year, 1 month ago

Selected Answer: D


D. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery.

This solution involves using Google Cloud Pub/Sub as the messaging service to receive the data from the mobile application, and then using
Google Cloud Dataflow to transform and load the data into BigQuery in real time. Pub/Sub is a scalable and reliable messaging service that can
handle high-volume real-time data streaming, while Dataflow provides a unified programming model to develop and run data processing
pipelines. This solution is suitable for handling large volumes of user activity data from mobile applications and ingesting it into BigQuery in real-
time for analysis and ML experimentation.
upvoted 2 times

  TNT87 1 year, 1 month ago


Starting today, you no longer have to write or run your own pipelines for data ingestion from Pub/Sub into BigQuery. We are introducing a new
type of Pub/Sub subscription called a “BigQuery subscription” that writes directly from Cloud Pub/Sub to BigQuery. This new extract, load, and
transform (ELT) path will be able to simplify your event-driven architecture. For Pub/Sub messages where advanced preload transformations or
data processing before landing data in BigQuery (such as masking PII) is necessary, we still recommend going through Dataflow
upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: A

A
agree with pshemol
upvoted 3 times

  mymy9418 1 year, 4 months ago


Selected Answer: D

need dataflow
upvoted 2 times

  mil_spyro 1 year, 4 months ago


transformation will be handled in BQ hence I think A
upvoted 6 times

  mymy9418 1 year, 4 months ago


agree.
upvoted 1 times


Question #114 Topic 1

You work for a gaming company that manages a popular online multiplayer game where teams with 6 players play against each other in 5-minute

battles. There are many new players every day. You need to build a model that automatically assigns available players to teams in real time. User

research indicates that the game is more enjoyable when battles have players with similar skill levels. Which business metrics should you track to

measure your model’s performance?

A. Average time players wait before being assigned to a team

B. Precision and recall of assigning players to teams based on their predicted versus actual ability

C. User engagement as measured by the number of battles played daily per user

D. Rate of return as measured by additional revenue generated minus the cost of developing a new model

Correct Answer: C

Community vote distribution


C (64%) B (36%)
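The business metric in option C, battles played daily per user, is straightforward to compute from battle logs. The records below are invented for illustration; the metric is simply the number of battles per (user, day) pair, averaged over all such pairs.

```python
from collections import defaultdict

# Sketch of option C's engagement metric: average battles played
# per user per active day, from (user_id, date) battle records.
def daily_battles_per_user(events):
    """events: iterable of (user_id, date_str) pairs, one per battle."""
    counts = defaultdict(int)              # battles per (user, day)
    for user_id, day in events:
        counts[(user_id, day)] += 1
    return sum(counts.values()) / len(counts)

events = [("u1", "2024-04-01"), ("u1", "2024-04-01"),
          ("u2", "2024-04-01"), ("u1", "2024-04-02")]
# u1: 2 battles on 04-01 and 1 on 04-02; u2: 1 battle on 04-01.
print(daily_battles_per_user(events))  # → 1.3333333333333333
```

If the matchmaking model creates more balanced, enjoyable battles, this number should rise over time, which is what makes it a business metric rather than a model metric.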

  pshemol Highly Voted  1 year, 4 months ago

Selected Answer: C

The game is more enjoyable - the better and "business metrics" points me to user engagement as best metric
upvoted 8 times

  fitri001 Most Recent  3 days, 20 hours ago

Selected Answer: C

focusing on user engagement through the number of battles played daily provides a clearer indication of whether the model successfully creates
balanced and enjoyable matches, which is the core objective. If players find battles more engaging due to fairer competition, they're more likely to
keep playing. This can then translate to long-term benefits like increased retention and potential monetization opportunities.
upvoted 1 times

  fitri001 3 days, 20 hours ago


A. Average time players wait before being assigned to a team: While faster matchmaking is desirable, it shouldn't come at the expense of
balanced teams. If wait times are very low but battles are imbalanced due to poor matchmaking, user engagement might suffer.
B. Precision and recall of assigning players to skill level: These metrics are valuable for evaluating the model's ability to predict skill accurately.
However, they don't directly measure the impact on user experience and enjoyment.
D. Rate of return: This metric focuses on financial gain, which might not be the primary objective in this case. Prioritizing balanced teams for a
more enjoyable experience can indirectly lead to higher user retention and potentially more revenue in the long run.
upvoted 1 times

  edoo 1 month, 3 weeks ago

Selected Answer: C

Tempted by B but "user engagement" is the keyword.


upvoted 2 times

  edoo 1 month, 3 weeks ago


I meant "business metric".
upvoted 2 times

  guilhermebutzke 3 months, 1 week ago


Selected Answer: C

Looking for "business metrics to track," I think C could be the most important metric, although option B is also a good choice.
upvoted 2 times

  MCorsetti 6 months, 1 week ago


Selected Answer: C

C: Business metric i.e. outcome driven


upvoted 1 times

  tavva_prudhvi 9 months ago


"Business metrics" does suggest that the question is looking for metrics that are relevant to the business goals of the company, rather than purely
technical metrics. In that case, C.could be a good choice. User engagement is an important metric for any online service, as it reflects how much
users are enjoying and using the product. In the context of a multiplayer game, the number of battles played daily per user can indicate how well
the model is doing in creating balanced teams that are enjoyable to play against. If the model is successful in creating balanced teams, then users
are likely to play more games, which would increase user engagement.

Therefore, C could be a suitable choice to track the performance of the model.


upvoted 3 times


  Nxtgen 9 months, 4 weeks ago

Selected Answer: C

The focus is to obtain a model that assigns players to teams with players with similar level of skill (or average team 1 skill == average team 2 skill)

A: A fast queue assignment may not focus on pairing players with the same levels of skill. A random assignment would work.

B: This would be an option, but it is more difficult to measure than C, since we don’t know if we have a measure of skill level. Also, for new players this metric would not be available at the beginning. I think “There are many new players every day.” is a key point for discarding answer B.

C: Players play more games daily ← players enjoy the game more frequently and the other way round should also apply. Easy to measure also for
new players.

D:This focus on costs and revenue not on players matchmaking.

I would go with C.
upvoted 2 times

  Antmal 11 months, 3 weeks ago

Selected Answer: C

C because "user engagement" is a business metric https://support.google.com/analytics/answer/11109416?hl=en


upvoted 3 times

  M25 11 months, 3 weeks ago

Selected Answer: C

Went with C
upvoted 1 times

  [Removed] 1 year ago

Selected Answer: B

This is B, as it directly relates to our model's ability to predict player ability. There are many factors beyond our model which will impact user
engagement (e.g. whether the game is actually enjoyable) so it's not a good measurement of the model performance
upvoted 3 times

  TNT87 1 year ago


Selected Answer: C

Answer C
upvoted 1 times

  PHD_CHENG 1 year, 1 month ago

Selected Answer: C

The question is asking about "available players". Therefore, the business metric is the user engagement.
upvoted 4 times

  JamesDoe 1 year, 1 month ago

Selected Answer: C

Asks for >business metric<, and problem states "user research indicates that the game is more enjoyable when battles have players with similar
skill levels.", which means more battles per user if your model is performing well.
upvoted 1 times

  dfdrin 1 year, 1 month ago


Selected Answer: C

It's C. The question specifically asks for a business metric. Precision and recall are not business metrics, but user engagement is
upvoted 4 times

  guilhermebutzke 1 year, 1 month ago


Selected Answer: B

The model uses the 'ability' to create teams. From this, we can conclude that the system measures the player's skill. So, nothing is better than comparing the predicted ability with the actual ability to understand the performance of the model.
upvoted 3 times

  TNT87 1 year, 1 month ago

Selected Answer: B

A. Average time players wait before being assigned to a team


B. Precision and recall of assigning players to teams based on their predicted versus actual ability

These two metrics are the most relevant for measuring the performance of the model in assigning players to teams based on skill level. The
average wait time can indicate whether the model is making efficient and quick team assignments, while precision and recall can measure the
accuracy of the model's predictions. It's important to balance precision and recall since assigning players to a team with a large difference in skill
level could have a negative impact on the players' gaming experience.

C and D are also important metrics to track, but they may not be as directly tied to the performance of the team assignment model. User
engagement can indicate the success of the overall gaming experience, but it can be influenced by other factors beyond team assignments. The
rate of return is also an important metric, but it may not be a direct measure of the success of the team assignment model.


upvoted 4 times

  TNT87 1 year ago


Answer C , user engagement
upvoted 2 times

  shankalman717 1 year, 2 months ago

Selected Answer: B

To measure the performance of a model that assigns available players to teams in real time, the business metrics that should be tracked should
reflect the ability of the model to effectively balance the skill levels of players in battles. Therefore, the best answer is B, precision and recall of
assigning players to teams based on their predicted versus actual ability.
upvoted 3 times


Question #115 Topic 1

You are building an ML model to predict trends in the stock market based on a wide range of factors. While exploring the data, you notice that

some features have a large range. You want to ensure that the features with the largest magnitude don’t overfit the model. What should you do?

A. Standardize the data by transforming it with a logarithmic function.

B. Apply a principal component analysis (PCA) to minimize the effect of any particular feature.

C. Use a binning strategy to replace the magnitude of each feature with the appropriate bin number.

D. Normalize the data by scaling it to have values between 0 and 1.

Correct Answer: D

Community vote distribution


D (58%) A (33%) 8%
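The min-max normalization in option D can be sketched in a few lines of plain Python; libraries such as scikit-learn's MinMaxScaler implement the same formula. The two example features below are invented for illustration: after scaling, a large-magnitude feature and a small-magnitude one contribute on the same [0, 1] scale.

```python
# Min-max normalization (option D): rescale each feature to [0, 1]
# so that features with a large raw magnitude cannot dominate.
def min_max_scale(values):
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: map to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

volumes = [1_000_000, 5_000_000, 9_000_000]   # large-magnitude feature
returns = [-0.5, 0.25, 1.0]                   # small-magnitude feature
print(min_max_scale(volumes))  # → [0.0, 0.5, 1.0]
print(min_max_scale(returns))  # → [0.0, 0.5, 1.0]
```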

  fitri001 3 days, 20 hours ago

Selected Answer: D

D. Normalize the data by scaling it to have values between 0 and 1 (Min-Max scaling): This technique ensures all features contribute proportionally
to the model's learning process.

It prevents features with a larger magnitude from dominating the model and reduces the risk of overfitting.
upvoted 2 times

  fitri001 3 days, 20 hours ago


A. Standardize the data by transforming it with a logarithmic function: While logarithmic transformation can help compress the range of skewed
features, it might not be suitable for all features, and it can introduce non-linear relationships that might not be ideal for all machine learning
algorithms.

B. Apply a principal component analysis (PCA) to minimize the effect of any particular feature: PCA is a dimensionality reduction technique that
can be useful, but its primary function is to reduce the number of features, not specifically address differences in feature scales.

C. Use a binning strategy to replace the magnitude of each feature with the appropriate bin number: Binning can introduce information loss and might not capture the nuances within each bin, potentially affecting the model's accuracy.
upvoted 1 times

  gscharly 1 week, 1 day ago

Selected Answer: D

agree with pico


upvoted 1 times

  pico 5 months, 2 weeks ago

Selected Answer: D

Not A because a logarithmic transformation may be appropriate for data with a skewed distribution, but it doesn't necessarily address the issue of
features having different scales.
upvoted 4 times

  Krish6488 5 months, 2 weeks ago


Selected Answer: D

Features with a larger magnitude might still dominate after a log transformation if the range of values is significantly different from other features.
Scaling is better, will go with Option D
upvoted 1 times

  envest 8 months, 3 weeks ago


by abylead: Min-Max scaling is a popular technique for normalizing stock price data. Logs are commonly used in finance to normalize relative data
such as returns. https://itadviser.dev/stock-market-data-normalization-for-time-series/
upvoted 1 times

  [Removed] 9 months, 1 week ago

Selected Answer: D

The correct answer is D. Min-max scaling will render all variables comparable by bringing them to a common ground.

A is wrong for the following reasons:


1. It is never mentioned that all variables are positive. If some columns have negative values, log transformation is not applicable.
2. Log transformation of variables having small positive values (close to 0) will increase their magnitude. For example, ln(0.0001) = -9.2, which will
increase this variable's effect considerably.
upvoted 2 times


  djo06 9 months, 3 weeks ago

Selected Answer: D

D is the right answer


upvoted 1 times

  NickHapton 9 months, 3 weeks ago


go for D, z-score. This question doesn't mention outlier, just large range.
reason why not log transformation:
log transformation is more suitable for addressing skewed distributions and reducing the impact of outliers. It compresses the range of values,
especially for features with a large dynamic range. While it can help normalize the distribution, it doesn't directly address the issue of feature
magnitude overpowering the model.
upvoted 1 times

  SamuelTsch 9 months, 3 weeks ago

Selected Answer: A

From my point of view, log transformation is more tolerant to outliers. Thus, went to A.
upvoted 1 times

  tavva_prudhvi 9 months, 3 weeks ago


In cases where the data has significant skewness or a large number of outliers, option A (log transformation) might be more suitable. However, if the primary concern is to equalize the influence of features with different magnitudes and the data is not heavily skewed or has few outliers, option D (normalizing the data) would be more appropriate.
upvoted 1 times

  coolmenthol 10 months, 1 week ago

Selected Answer: A

See https://developers.google.com/machine-learning/data-prep/transform/normalization
upvoted 2 times

  Antmal 11 months, 3 weeks ago

Selected Answer: A

A is a better option because a log transform is used when we want a heavily skewed feature to be transformed into something as close to a normal distribution as possible. When you normalize data using a min-max scaler, it doesn't work well with many outliers and is prone to unexpected behaviour if values go outside the given range in the test set. It is a less popular alternative to scaling.
upvoted 1 times

  tavva_prudhvi 9 months, 3 weeks ago


If your data is heavily skewed and has a significant number of outliers, log transformation (option A) might be a better choice. However, if your
primary concern is to ensure that the features with the largest magnitudes don't overfit the model and the data does not have a significant skew
or too many outliers, normalizing the data (option D) would be more appropriate.
upvoted 1 times

  M25 11 months, 3 weeks ago

Selected Answer: D

The challenge is the “scale” (significant variations in magnitude and spread): https://stats.stackexchange.com/questions/462380/does-data-normalization-reduce-over-fitting-when-training-a-model,
apparently largely used anyhow: https://itadviser.dev/stock-market-data-normalization-for-time-series/.
upvoted 1 times

  M25 11 months, 3 weeks ago


Even if binning “prevents overfitting and increases the robustness of the model”: https://www.analyticsvidhya.com/blog/2020/10/getting-started-with-feature-engineering,
the disadvantage is that information is lost, particularly on features sharper than the binning: https://www.kaggle.com/questions-and-answers/171942,
and then you need to reasonably re-adjust the binning to spot the moving target “trends” [excluding C]:
https://stats.stackexchange.com/questions/230750/when-should-we-discretize-bin-continuous-independent-variables-features-and-when.
upvoted 1 times

  M25 11 months, 3 weeks ago


“(…) some features have a large range”, possible presence of outliers exclude standardization [excluding A]:
https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/.
“(…) a wide range of factors”, PCA transform the data so that it can be described with fewer dimensions / features:
https://en.wikipedia.org/wiki/Principal_component_analysis, but [excluding B]: it asks to “ensure that the features with largest magnitude don’t
overfit the model”.
upvoted 1 times

  niketd 1 year, 1 month ago

Selected Answer: D

The question doesn't talk about the skewness within each feature. It talks about normalizing the effect of features with large range. So scaling each
feature within (0,1) range will solve the problem
upvoted 1 times

  JamesDoe 1 year, 1 month ago


Really need more info to answer this: what does "large range" mean? Distribution follows a power law --> use log(). Or are they more
evenly/linearly distributed --> use (0,1) scaling.

upvoted 1 times

  guilhermebutzke 1 year, 1 month ago

Selected Answer: C

I think C could be a better choice. By bucketizing the data, we can fix the distribution problem with bins.

In option A, standardization by log may not be effective if the data range includes both negative and positive values.

In option D, normalization definitely does not resolve the skew problem; data normalization assumes that the data has a roughly normal distribution.

https://medium.com/analytics-vidhya/data-transformation-for-numeric-features-fb16757382c0
upvoted 3 times

  TNT87 1 year, 1 month ago


Selected Answer: D

D. Normalize the data by scaling it to have values between 0 and 1.

Standardization and normalization are common techniques to preprocess the data to be more suitable for machine learning models. Normalization
scales the data to be within a specific range (commonly between 0 and 1 or -1 and 1), which can help prevent features with large magnitudes from
dominating the model. This approach is especially useful when using models that are sensitive to the magnitude of features, such as distance-
based models or neural networks.
upvoted 1 times

  FherRO 1 year, 2 months ago


Selected Answer: A

https://developers.google.com/machine-learning/data-prep/transform/normalization#log-scaling
upvoted 1 times


Question #116 Topic 1

You work for a biotech startup that is experimenting with deep learning ML models based on properties of biological organisms. Your team

frequently works on early-stage experiments with new architectures of ML models, and writes custom TensorFlow ops in C++. You train your

models on large datasets and large batch sizes. Your typical batch size has 1024 examples, and each example is about 1 MB in size. The average

size of a network with all weights and embeddings is 20 GB. What hardware should you choose for your models?

A. A cluster with 2 n1-highcpu-64 machines, each with 8 NVIDIA Tesla V100 GPUs (128 GB GPU memory in total), and a n1-highcpu-64

machine with 64 vCPUs and 58 GB RAM

B. A cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU memory in total), 96 vCPUs, and 1.4 TB

RAM

C. A cluster with an n1-highcpu-64 machine with a v2-8 TPU and 64 GB RAM

D. A cluster with 4 n1-highcpu-96 machines, each with 96 vCPUs and 86 GB RAM

Correct Answer: B

Community vote distribution


D (71%) B (29%)
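A quick back-of-the-envelope check of the numbers in the question helps compare the options. This sketch counts only the batch and the weights; real training memory also includes activations, gradients, and optimizer state, which are ignored here.

```python
# Rough memory arithmetic from the question's stated sizes.
MB, GB = 1_000_000, 1_000_000_000

batch_bytes = 1024 * 1 * MB          # 1024 examples x 1 MB each
model_bytes = 20 * GB                # weights + embeddings

print(batch_bytes / GB)   # → 1.024  (about 1 GB per batch)
print(model_bytes / GB)   # → 20.0

# A single copy of model + batch (~21 GB) fits in the 86 GB of host
# RAM per n1-highcpu-96 machine, which is one way to read option D:
# CPUs handle the custom C++ TensorFlow ops that rule out TPUs.
```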

  aw_49 Highly Voted  11 months, 2 weeks ago

Selected Answer: D

D: use CPUs for models that contain many custom TensorFlow operations written in C++
https://cloud.google.com/tpu/docs/intro-to-tpu#cpus
upvoted 5 times

  edoo Most Recent  1 month, 3 weeks ago

Selected Answer: D

B looks like unleashing a rocket launcher to swat a fly ("early-stage experiments"). D is enough (c++).
upvoted 2 times

  tavva_prudhvi 9 months ago


While it is true that using CPUs can be more efficient when dealing with custom TensorFlow operations written in C++, it is important to consider the specific requirements of your models. In this case, the question mentions large batch sizes (1024 examples), large example sizes (1 MB each), and a large network size (20 GB). Option D proposes 4 n1-highcpu-96 machines, each with 96 vCPUs and 86 GB RAM. While this configuration would provide a high number of vCPUs for custom TensorFlow operations, it lacks the GPU memory and overall RAM necessary to handle the large batch sizes and network size of your models.
upvoted 1 times

  ciro_li 9 months, 1 week ago


B: https://cloud.google.com/tpu/docs/intro-to-tpu#cpus
upvoted 1 times

  pinimichele01 2 weeks, 2 days ago


so D, not B...
upvoted 1 times

  Voyager2 10 months, 3 weeks ago

Selected Answer: D

D: use CPU when models that contain many custom TensorFlow operations written in C++
https://cloud.google.com/tpu/docs/intro-to-tpu#cpus
upvoted 3 times

  LoveExams 11 months, 1 week ago


Wouldn't all PCs work here? I could run this model on my own home PC just fine.
upvoted 2 times

  M25 11 months, 3 weeks ago


Selected Answer: D

“writes custom TensorFlow ops in C++” -> use CPUs when “Models that contain many custom TensorFlow operations written in C++”:
https://cloud.google.com/tpu/docs/intro-to-tpu#when_to_use_tpus
upvoted 2 times

  Antmal 1 year ago

Selected Answer: B

The best hardware for your models would be a cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU
memory in total), 96 vCPUs, and 1.4 TB RAM.
https://www.examtopics.com/exams/google/professional-machine-learning-engineer/custom-view/ 247/505
29/04/2024, 21:23 Professional Machine Learning Engineer Exam - Free Actual Q&As, Page 1 | ExamTopics

This hardware will give you the following benefits:

High GPU memory: Each A100 GPU has 40 GB of memory, which is more than enough to store the weights and embeddings of your models.
Large batch sizes: With 16 GPUs per machine, you can train your models with large batch sizes, which will improve training speed.
Fast CPUs: The 96 vCPUs on each machine will provide the processing power you need to run your custom TensorFlow ops in C++.
Adequate RAM: The 1.4 TB of RAM on each machine will ensure that your models have enough memory to train and run.
The other options are not as suitable for your needs. Option A has less GPU memory, which will slow down training. Option B has more GPU
memory, but it is also more expensive. Option C has a TPU, which is a good option for some deep learning tasks, but it is not as well-suited for
your needs as a GPU cluster. Option D has more vCPUs and RAM, but it does not have enough GPU memory to train your models.

Therefore, the best hardware for your models is a cluster with 2 a2-megagpu-16g machines.
upvoted 3 times

  TNT87 1 year, 1 month ago


Selected Answer: B

To determine the appropriate hardware for training the models, we need to calculate the required memory and processing power based on the size
of the model and the size of the input data.

Given that the batch size is 1024 and each example is 1 MB, the total size of each batch is 1024 * 1 MB = 1024 MB = 1 GB. Therefore, we need to
load 1 GB of data into memory for each batch.

The total size of the network is 20 GB, which means that it can fit in the memory of most modern GPUs.
upvoted 3 times
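The back-of-the-envelope arithmetic in the comment above can be written out in a few lines of Python. The batch and model sizes come from the question; the 40 GB and 16 GB per-GPU figures are the published memory sizes of the A100 and V100 cards named in options B and A, used here only as an illustrative sanity check:

```python
# Back-of-the-envelope memory estimate for the training job in Question #116.
batch_examples = 1024        # examples per batch (from the question)
example_size_mb = 1          # each example is about 1 MB
model_size_gb = 20           # weights + embeddings

# Memory needed just to hold one batch of input data
batch_size_gb = batch_examples * example_size_mb / 1024
print(f"One batch of inputs: {batch_size_gb:.0f} GB")
print(f"Model parameters:    {model_size_gb} GB")

# A single 40 GB A100 (option B) can hold the 20 GB model plus one batch;
# a single 16 GB V100 (option A) cannot hold the full model on one device.
fits_on_a100 = model_size_gb + batch_size_gb <= 40
fits_on_v100 = model_size_gb + batch_size_gb <= 16
print(f"Fits on one A100 (40 GB): {fits_on_a100}")
print(f"Fits on one V100 (16 GB): {fits_on_v100}")
```

This only checks raw capacity; it says nothing about the C++-ops argument for CPUs, which is the crux of the D-versus-B debate above.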

  JeanEl 1 year, 3 months ago

Selected Answer: D

It's D
upvoted 1 times

  JeanEl 1 year, 3 months ago


https://cloud.google.com/tpu/docs/tpus
upvoted 2 times

  hiromi 1 year, 4 months ago


Selected Answer: D

D
CPUs are recommended for TensorFlow ops written in C++
- https://cloud.google.com/tpu/docs/tensorflow-ops (Cloud TPU only supports Python)
upvoted 2 times

  John_Pongthorn 1 year, 3 months ago


GPUs can be used from C++ implementations, but option C is ruled out for sure.
upvoted 3 times


Question #117 Topic 1

You are an ML engineer at an ecommerce company and have been tasked with building a model that predicts how much inventory the logistics

team should order each month. Which approach should you take?

A. Use a clustering algorithm to group popular items together. Give the list to the logistics team so they can increase inventory of the popular

items.

B. Use a regression model to predict how much additional inventory should be purchased each month. Give the results to the logistics team at

the beginning of the month so they can increase inventory by the amount predicted by the model.

C. Use a time series forecasting model to predict each item's monthly sales. Give the results to the logistics team so they can base inventory

on the amount predicted by the model.

D. Use a classification model to classify inventory levels as UNDER_STOCKED, OVER_STOCKED, and CORRECTLY_STOCKED. Give the report to the

logistics team each month so they can fine-tune inventory levels.

Correct Answer: B

Community vote distribution


C (100%)

  mil_spyro Highly Voted  1 year, 4 months ago

Selected Answer: C

This type of model is well-suited to predicting inventory levels because it can take into account trends and patterns in the data over time, such as
seasonal fluctuations in demand or changes in customer behavior.
upvoted 8 times
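As a minimal illustration of option C, a seasonal-naive baseline predicts that next month's sales for an item will match the same month one season earlier; a real deployment would use something like BigQuery ML's ARIMA_PLUS or Vertex AI's forecasting tooling, but the idea is the same. The sales figures below are made up for the example:

```python
# Seasonal-naive forecast: predict a future month's sales from the same
# month one full season (12 months) earlier. Illustrative numbers only.
monthly_sales = [120, 95, 130, 110, 150, 170,   # Jan-Jun last year
                 160, 155, 140, 135, 180, 210]  # Jul-Dec last year

def seasonal_naive_forecast(history, months_ahead=1, season_length=12):
    """Forecast by repeating the value from one full season earlier."""
    return [history[-season_length + (m - 1) % season_length]
            for m in range(1, months_ahead + 1)]

# Forecast January-March of the coming year for this item
forecast = seasonal_naive_forecast(monthly_sales, months_ahead=3)
print(forecast)  # [120, 95, 130]
```

A proper forecasting model would additionally capture trend and promotions, but even this baseline shows why per-item monthly forecasts (option C) give the logistics team something directly actionable, unlike clustering or stock-level classification.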

  M25 Most Recent  11 months, 3 weeks ago

Selected Answer: C

https://cloud.google.com/learn/what-is-time-series
"For example, a large retail store may have millions of items to forecast so that inventory is available when demand is high, and not overstocked
when demand is low."
upvoted 1 times

  TNT87 1 year, 1 month ago


Selected Answer: C

Answer C
upvoted 1 times

  JeanEl 1 year, 3 months ago


Selected Answer: C

Yup it's C (Time series forecasting)


upvoted 1 times

  ares81 1 year, 3 months ago

Selected Answer: C

Time-series forecasting model is the key expression, for me.


upvoted 1 times

  hiromi 1 year, 4 months ago

Selected Answer: C

C (by experience)
Use a time series forecasting model to predict each item's monthly sales. Give the results to the logistics team so they can base inventory on the
amount predicted by the model.
upvoted 2 times


Question #118 Topic 1

You are building a TensorFlow model for a financial institution that predicts the impact of consumer spending on inflation globally. Due to the size

and nature of the data, your model is long-running across all types of hardware, and you have built frequent checkpointing into the training

process. Your organization has asked you to minimize cost. What hardware should you choose?

A. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with 4 NVIDIA P100 GPUs

B. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with an NVIDIA P100 GPU

C. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a non-preemptible v3-8 TPU

D. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a preemptible v3-8 TPU

Correct Answer: B

Community vote distribution


D (100%)

  hiromi Highly Voted  1 year, 4 months ago

Selected Answer: D

D
you have built frequent checkpointing into the training process / minimize cost -> preemptible
upvoted 7 times

  M25 Most Recent  11 months, 3 weeks ago

Selected Answer: D

Follows same principle as #70


upvoted 2 times

  Antmal 1 year ago


Selected Answer: D

Preemptible v3-8 TPUs are the most cost-effective option for training large TensorFlow models. They are up to 80% cheaper than non-preemptible
v3-8 TPUs, and they are only preempted if Google Cloud needs the resources for other workloads.

In this case, the model is long-running and checkpointing is used. This means that the training process can be interrupted and resumed without
losing any progress. Therefore, preemptible TPUs are a safe choice, as the training process will not be interrupted if the TPU is preempted.

The other options are not as cost-effective.


upvoted 1 times
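The checkpoint-and-resume pattern that makes preemptible hardware safe can be sketched in framework-agnostic Python; in a real TensorFlow job, `tf.train.Checkpoint` and a Cloud Storage path would play the roles of the JSON file here. The file name, step counts, and loss stand-in are all invented for the sketch:

```python
import json
import os
import tempfile

# Minimal checkpoint/resume loop: if the worker is preempted, a restarted
# job picks up from the last saved step instead of starting from scratch.
ckpt_path = os.path.join(tempfile.gettempdir(), "demo_train_state.json")
if os.path.exists(ckpt_path):
    os.remove(ckpt_path)  # start the demo from a clean state

def save_checkpoint(step):
    with open(ckpt_path, "w") as f:
        json.dump({"step": step}, f)

def load_checkpoint():
    if not os.path.exists(ckpt_path):
        return {"step": 0}
    with open(ckpt_path) as f:
        return json.load(f)

def run_training(preempt_at=None, total_steps=10, every=2):
    """Run (or resume) training, optionally simulating a preemption."""
    step = load_checkpoint()["step"]
    while step < total_steps:
        if step == preempt_at:
            return step              # VM reclaimed mid-run
        step += 1                    # one (simulated) training step
        if step % every == 0:
            save_checkpoint(step)    # frequent checkpointing
    return step

run_training(preempt_at=5)           # first attempt is preempted at step 5
resumed_from = load_checkpoint()["step"]
finished_at = run_training()         # restarted job resumes from checkpoint
print(resumed_from, finished_at)     # 4 10
```

Only the work since the last checkpoint (here, one step) is repeated after preemption, which is why frequent checkpointing plus cheap preemptible TPUs (option D) minimizes cost for long-running jobs.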

  TNT87 1 year, 1 month ago

Selected Answer: D

Answer D
upvoted 1 times

  ares81 1 year, 3 months ago


Selected Answer: D

Frequent checkpoints --> Preemptible --> D


upvoted 4 times

  mymy9418 1 year, 4 months ago


Selected Answer: D

preemptible is the keyword to me


upvoted 1 times
