Question 1 of 50
1 point(s)
For exploration and analysis, a machine learning specialist must be able to consume streaming data and store it in Apache Parquet files. Which service would correctly ingest and store this data?
Question 2 of 50
1 point(s)
A data scientist has investigated and cleaned a dataset in preparation for the modeling stage of a supervised learning task. The statistical dispersion can vary significantly between features, often by many orders of magnitude. Before the modeling phase, the data scientist wants to ensure that prediction performance on production data is as accurate as possible. How should the data scientist proceed to fulfill these requirements?
Randomly sample the dataset. Create training, validation, and test sets from the sample.
Create training, validation, and test sets from the dataset. Rescale the training set, and apply the same scaling to the validation and test sets.
Create training, validation, and test sets from the dataset.
Create training, validation, and test sets from the dataset. Then rescale the training set, the validation set, and the test set independently.
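To illustrate the scaling idea these options differ on, here is a minimal sketch (using made-up data) of fitting standardization statistics on the training split only and reusing them for the held-out splits, so no information leaks from validation or test data into preprocessing:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical features whose dispersions differ by orders of magnitude.
data = np.column_stack([rng.normal(0, 1, 1000),        # ~unit scale
                        rng.normal(0, 10_000, 1000)])  # ~10^4 scale

# Split first, then compute scaling statistics on the TRAINING set only...
train, val, test = data[:600], data[600:800], data[800:]
mean, std = train.mean(axis=0), train.std(axis=0)

# ...and apply those same statistics to the validation and test sets.
train_s = (train - mean) / std
val_s = (val - mean) / std
test_s = (test - mean) / std

print(train_s.std(axis=0).round(2))  # ~[1. 1.]
```

Rescaling each split independently (as in the last option) would give each split a slightly different transformation, so the model would see production data scaled differently from the data it was trained on.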
Question 3 of 50
1 point(s)
A machine learning specialist is given an Amazon SageMaker project using TensorFlow and must work on it for a long time without Wi-Fi access. What method should the specialist employ to continue working?
Question 4 of 50
1 point(s)
A data scientist seeks real-time insights into a data stream of GZIP files. Which solution would allow querying the stream with SQL with the LEAST latency?
Question 5 of 50
1 point(s)
A retail business plans to use machine learning to classify new products. The data science team was given a labeled dataset of current products. The dataset contains 1,200 products. Each product in the labeled dataset has 15 features, including title, dimensions, weight, and price. Each product falls into one of six categories, such as movies, video games, books, and gadgets. Which model should be used to classify new products using the provided training dataset?
Question 6 of 50
1 point(s)
A data scientist is working on an application that performs sentiment analysis. The data scientist believes the low validation accuracy may be due to the dataset's extensive vocabulary and low average word frequency. Which tool should be employed to improve the validation accuracy?
Question 7 of 50
1 point(s)
A machine learning specialist is developing a model to forecast future employment rates based on various economic variables. While examining the data, the specialist discovers that the magnitude of the input features varies substantially. The specialist does not want the model to be dominated by features of greater magnitude. What steps should the specialist take to prepare the data for model training?
Question 8 of 50
1 point(s)
A machine learning specialist must develop a process for querying a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as unencrypted CSV files. Each record is around 1.5 MB in size and contains 200 columns. Most queries will touch only 5–10 columns. How should the machine learning specialist transform the dataset to reduce query runtime?
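A back-of-envelope calculation shows why a columnar format matters here: Athena bills and performs by data scanned, and a row format like CSV must read every column, while a columnar format such as Parquet reads only the queried ones. This sketch assumes, for simplicity, roughly equal storage per column:

```python
# Scenario numbers from the question above.
total_columns = 200
queried_columns = 10       # upper end of the 5-10 columns most queries touch
record_size_mb = 1.5
records = 800_000

# A row format (CSV) scans every column of every record.
csv_scan_mb = records * record_size_mb

# A columnar format can prune to just the queried columns
# (assuming roughly equal width per column -- a simplification).
columnar_scan_mb = csv_scan_mb * queried_columns / total_columns

print(csv_scan_mb)       # 1200000.0 MB scanned with CSV
print(columnar_scan_mb)  # 60000.0 MB with column pruning (~20x less)
```

Real savings also depend on compression and predicate pushdown, but the column-pruning factor alone is roughly 20–40x for 5–10 of 200 columns.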
Question 9 of 50
1 point(s)
An organization gathers census data nationwide to identify healthcare and social program needs by province and city. Each person must respond to over 500 questions on the census form. Which combination of algorithms would provide the right insights? (Select two.) Candidate algorithms include the Latent Dirichlet Allocation (LDA) algorithm, the factorization machines (FM) algorithm, and the principal component analysis (PCA) algorithm.
Question 10 of 50
1 point(s)
A machine learning specialist uploads a dataset to an Amazon S3 bucket with server-side encryption using AWS KMS. How should the ML specialist configure the Amazon SageMaker notebook instance so that it can read the same dataset from Amazon S3?
Question 11 of 50
1 point(s)
A manufacturer of aviation engines is tracking 200 performance parameters over time. Engineers strive to find serious production flaws during testing as quickly as possible. All of the data must be saved for offline analysis. What method would be the MOST efficient for performing fault detection in near-real time?
Use AWS IoT Analytics for data ingestion, storage, and further analysis. Use Jupyter notebooks within AWS IoT Analytics to detect anomalies.
Use Amazon S3 for data ingestion, storage, and further analysis. Use an Amazon EMR cluster with Apache Spark ML k-means clustering to detect anomalies.
Use Amazon S3 for data ingestion, storage, and further analysis. Use the Amazon SageMaker Random Cut Forest (RCF) algorithm to detect anomalies.
Use Amazon Kinesis Data Firehose for data ingestion and Amazon Kinesis Data Analytics Random Cut Forest (RCF) to perform anomaly detection. Use Kinesis Data Firehose to store data in Amazon S3 for further analysis.
Question 12 of 50
1 point(s)
A machine learning team runs its training algorithm on Amazon SageMaker. The training algorithm requires external assets. The team must submit both its algorithm code and algorithm-specific parameters to Amazon SageMaker. Which combination of services should the team use to build a custom algorithm in Amazon SageMaker? (Select two.)
Question 13 of 50
1 point(s)
A machine learning specialist needs to determine the right SageMakerVariantInvocationsPerInstance setting for an endpoint's automatic scaling configuration. A load test on a single instance found that the maximum requests per second (RPS) the service can handle without degrading is about 20 RPS. Because this is the initial deployment, the specialist plans to use an invocation safety factor of 0.5. Based on these parameters, and given that invocations per instance are measured per minute, what should the specialist specify as the SageMakerVariantInvocationsPerInstance setting?
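As a worked check on the arithmetic this question calls for (a common sizing heuristic is peak RPS times safety factor, converted to a per-minute figure, since the setting is expressed in invocations per minute):

```python
# Numbers from the scenario above.
max_rps = 20          # sustained requests/second one instance handled in the load test
safety_factor = 0.5   # conservative headroom for an initial deployment

# SageMakerVariantInvocationsPerInstance is measured per minute,
# so apply the safety factor and convert seconds to minutes.
invocations_per_instance = int(max_rps * safety_factor * 60)
print(invocations_per_instance)  # 600
```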
Question 14 of 50
1 point(s)
A corporation assesses the risk characteristics of a specific energy sector using a long short-term memory (LSTM) model. The program reads through multi-page texts and analyzes each sentence, classifying it as either posing a risk or not. The model is not performing adequately, even though the data scientist has experimented with various network designs and tuned the associated hyperparameters. Which approach will improve performance the MOST?
Question 15 of 50
1 point(s)
A machine learning specialist must transfer and transform data in preparation for training. Some data must be handled in near-real time, while other data may be transferred hourly. The data must be cleaned and feature-engineered using existing Amazon EMR MapReduce jobs. Which services can feed data to the MapReduce jobs? (Select two.)
Question 16 of 50
1 point(s)
A machine learning expert previously trained a logistic regression model using scikit-learn on a local machine and now wants to deploy the model to production for inference only. What steps should be taken to ensure that Amazon SageMaker can host the locally trained model?
Question 17 of 50
1 point(s)
An international trucking firm is gathering real-time visual data from its fleet of vehicles. Data is growing quickly, with 100 GB of fresh data produced daily. The business wants to explore potential machine learning applications while ensuring that only specific IAM users can access the data. Which storage option offers the most processing flexibility and supports IAM access control?
Question 18 of 50
1 point(s)
To determine whether a new credit card applicant is likely to miss a payment, a credit card firm wishes to develop a credit scoring model. The business has gathered data from several sources with thousands of raw attributes. Early classification model training trials showed that many attributes are highly correlated, that the enormous number of features considerably slows down training, and that there are some overfitting issues. The data scientist wants to shorten the model training time without losing much of the information in the original dataset. Which feature engineering technique should the data scientist employ to accomplish these goals?
Question 19 of 50
1 point(s)
A data scientist trains a multilayer perceptron (MLP) on a dataset with several classes. Despite being distinct from the other classes in the dataset, the target class of interest does not achieve an acceptable recall score. The data scientist has tried varying the number and size of the MLP's hidden layers, but this did not appreciably improve the results. The recall must be improved as quickly as feasible. Which technique should be applied to fulfill these requirements?
Question 20 of 50
1 point(s)
A machine learning specialist at a credit card processing business must identify potentially fraudulent transactions in near-real time. Specifically, the specialist must train a model that returns the likelihood that a given transaction is fraudulent. How should the specialist approach this business problem?
Question 21 of 50
1 point(s)
A real estate business wishes to build a machine learning model that forecasts home values from a historical dataset. The dataset contains 32 features. Which model will fulfill the business's needs?
Question 22 of 50
1 point(s)
A machine learning expert is training a linear least squares regression model on a dataset with 1,000 records and 50 features. Before training, the expert observes that two features are perfectly linearly dependent. Why could this be a problem for the linear least squares regression model?
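The issue can be demonstrated numerically. In this sketch (with a toy, hypothetical design matrix) the third column is an exact multiple of the first, which makes the normal-equations matrix X^T X singular, so the least squares solution is not unique:

```python
import numpy as np

# Toy design matrix where the third column equals 2x the first column,
# i.e. two features are perfectly linearly dependent.
X = np.array([[1.0, 2.0, 2.0],
              [2.0, 1.0, 4.0],
              [3.0, 5.0, 6.0],
              [4.0, 3.0, 8.0]])

gram = X.T @ X  # the normal-equations matrix used by least squares

# Perfect collinearity makes X rank-deficient, so X^T X is singular
# and (X^T X)^-1 does not exist.
print(np.linalg.matrix_rank(X))          # 2, not 3
print(abs(np.linalg.det(gram)) < 1e-6)   # True: X^T X is singular
```

In practice, solvers either fail to invert X^T X or return unstable coefficient estimates for the dependent features.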
Question 23 of 50
1 point(s)
A machine learning specialist wants to bring a custom algorithm to Amazon SageMaker. The specialist implements the algorithm in a Docker container compatible with Amazon SageMaker. How should the specialist package the Docker container so that Amazon SageMaker correctly launches the training?
Question 24 of 50
1 point(s)
A data scientist must analyze employment data. The dataset contains approximately 10 million observations of people across ten different features. During the preliminary analysis, the data scientist observes that the income and age distributions are not normal. The income distribution is right-skewed, as would be expected, with fewer people earning higher incomes; the age distribution is also right-skewed, with fewer older people in the workforce.
Which feature transformations can the data scientist apply to fix the skewed data?
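One standard fix for right skew is a logarithmic transformation. This sketch uses a synthetic lognormal income sample (a hypothetical stand-in for real incomes) and a simple moment-based skewness estimate to show the effect:

```python
import numpy as np

# Hypothetical right-skewed income sample (lognormal, like real incomes).
rng = np.random.default_rng(0)
income = rng.lognormal(mean=10, sigma=1, size=10_000)

# log1p compresses the long right tail and handles zero values safely.
log_income = np.log1p(income)

def skew(x):
    """Simple moment-based skewness estimate."""
    x = np.asarray(x)
    return float(((x - x.mean()) ** 3).mean() / x.std() ** 3)

print(skew(income))      # strongly positive (right skew)
print(skew(log_income))  # close to 0 after the transform
```

Similar effects can be had with square-root or Box-Cox transformations, depending on how heavy the tail is.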
Question 25 of 50
1 point(s)
A machine learning specialist will build a long-running Amazon EMR cluster. The EMR cluster will have one primary node, ten core nodes, and twenty task nodes. To cut expenses, the specialist will use Spot Instances in the EMR cluster. Which nodes should the specialist launch on Spot Instances?
Question 26 of 50
1 point(s)
Based on the historical sales data that is currently accessible, a corporation wishes to forecast the sale values of homes. The sale price is the critical variable in the company’s dataset. The characteristics include the lot size, measures of the living and non-living areas, the number of bedrooms and bathrooms, the year the house was built, and the postal code. The organization plans to apply multi-variable linear regression to forecast home selling prices. What action should a machine learning expert take to exclude unimportant features for the analysis and simplify the model?
Question 27 of 50
1 point(s)
A healthcare business plans to use neural networks to classify X-ray images into normal and abnormal classes. The labeled data is split into a training set of 1,000 images and a test set of 200 images. An initially trained neural network model with 50 hidden layers achieved 99% accuracy on the training set but only 55% on the test set. What adjustments should the specialist make to address this problem? (Select three.)
Question 28 of 50
1 point(s)
A large corporation has built a BI application that produces reports and dashboards from data gathered across various operational metrics. The business wants to give executives a richer experience so they can use natural language to request data from reports. The business wants the executives to be able to ask questions both verbally and in writing. Which combination of services can be used to build this conversational interface? (Select three.)
Question 29 of 50
1 point(s)
A fruit processing firm requires a machine learning expert to develop a system that sorts apples into three varieties. The expert has applied transfer learning, using a neural network pre-trained on ImageNet, to a dataset comprising 150 photos of each apple variety. The firm requires at least 85% accuracy from the model.
Following a thorough grid search, the best hyperparameters produced the following results:
• 68% accuracy on the training set
• 67% accuracy on the validation set
What can the machine learning expert do to improve the system's accuracy?
Question 30 of 50
1 point(s)
To determine which items were removed and which remained, a corporation uses a camera to photograph the tops of items on store shelves. After many hours of data labeling, the business has 1,000 hand-labeled images covering ten categories. The training results were poor. Which machine learning approach best meets the company's long-term requirements?
Question 31 of 50
1 point(s)
A data scientist builds a binary classifier to determine, from a series of test results, whether a patient has a particular disease. The data scientist has information on 400 patients randomly selected from the population. 3% of the population has the disease. Which cross-validation strategy should the data scientist use?
Question 32 of 50
1 point(s)
A media firm wants to index its assets to enable quick identification of pertinent information by the Research team from a vast collection of unlabeled photographs, text, audio, and video recordings. The business wants to employ machine learning to speed up the work of its researchers, who have little experience with the technology. Which method of indexing the assets is the FASTEST?
Question 33 of 50
1 point(s)
An online store is hiring a machine learning specialist to run analytics on each customer visit, processed through a machine learning pipeline. The JSON data blob is 100 KB in size and needs to be consumed by Amazon Kinesis Data Streams at up to 100 transactions per second. What is the minimum number of Kinesis Data Streams shards the specialist should use to ingest this data successfully?
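The shard count follows from Kinesis Data Streams' per-shard ingest limits of 1 MB/second and 1,000 records/second; the minimum is the larger of the two requirements. A worked calculation with the scenario's numbers:

```python
import math

# Scenario numbers: 100 KB records arriving at up to 100 records/second.
record_kb = 100
records_per_sec = 100

# Each shard supports 1 MB/s (1,000 KB/s) ingest and 1,000 records/s.
shards_by_throughput = math.ceil(record_kb * records_per_sec / 1000)
shards_by_record_rate = math.ceil(records_per_sec / 1000)

min_shards = max(shards_by_throughput, shards_by_record_rate)
print(min_shards)  # 10: throughput (10 MB/s) is the binding constraint
```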
Question 34 of 50
1 point(s)
A machine learning expert is deciding between building a full Bayesian network and a naive Bayes model for a classification problem. When the expert computes the Pearson correlation coefficients between each pair of features, the absolute values range from 0.1 to 0.95. Which model, in this case, best describes the underlying data?
Question 35 of 50
1 point(s)
A data scientist has been given a collection of insurance records, each with a record ID, the date of final settlement, and the outcome, drawn from 200 different categories. Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records distributed over the past three years. The data scientist wants to predict, a few months in advance, how many claims to expect in each category from month to month. Which type of machine learning model should be used?
Question 36 of 50
1 point(s)
A producer with complex supply chain relationships can see output cease at numerous companies when a machine unexpectedly breaks down. A data scientist wants to examine factory sensor data to identify machinery needing preventive maintenance, then dispatch a service crew to minimize unscheduled downtime. A single machine's sensor readings can include up to 200 data points, including temperatures, voltages, vibrations, RPMs, and pressure readings. The manufacturer equipped the factories with Wi-Fi and LANs to gather this sensor data. The company wants to maintain near-real-time inference capabilities even though many production locations lack dependable or fast internet access. Which model deployment architecture addresses these business needs?
Question 37 of 50
1 point(s)
A machine learning specialist is developing a scalable data storage solution for Amazon SageMaker. An existing TensorFlow-based model, implemented as a train.py script, uses static training data currently stored as TFRecords. Which approach would meet the business needs with the least development overhead for delivering training data to Amazon SageMaker?
Question 38 of 50
1 point(s)
The chief editor of a product catalog has tasked the research and development team with creating a machine learning algorithm that determines whether people in a series of photographs are wearing the company's retail brand. The team has a collection of training data. Which machine learning algorithm best fits the researchers' needs?
Question 39 of 50
1 point(s)
A retail firm uses Amazon Personalize to provide its consumers with personalized product recommendations during a marketing campaign. Immediately after deploying a new solution version, the business observes a considerable rise in sales of recommended products to existing customers; however, these sales fall off shortly after the deployment. Only historical data from before the marketing campaign is available for training. How should a data scientist adjust the solution?
Question 40 of 50
1 point(s)
A machine learning (ML) expert needs to secure calls to the Amazon SageMaker Service API. The expert has set up an Amazon VPC with a VPC interface endpoint for the Amazon SageMaker Service API and seeks to encrypt traffic from specific sets of instances and IAM users. The VPC has only one public subnet defined. Which combination of steps should the ML expert take to secure the traffic? (Select two.)
Question 41 of 50
1 point(s)
A logistics firm needs a forecast model to predict next month's inventory requirements for a single item across ten warehouses. A machine learning expert uses Amazon Forecast to build a forecast model from three years of monthly data. There are no gaps in the data. The expert selects the DeepAR+ algorithm to train a predictor. The predictor's MAPE is significantly higher than the MAPE produced by the current human forecasters.
Which modifications to the CreatePredictor API call could improve the MAPE? (Select two.)
Question 42 of 50
1 point(s)
A machine learning expert runs an Amazon SageMaker endpoint using the built-in object detection algorithm on a P3 instance for real-time predictions in a business's production application. While reviewing the model's resource utilization, the expert discovers that the model uses only a small fraction of the GPU. Which architectural change would ensure efficient use of the provisioned resources?
Question 43 of 50
1 point(s)
A data scientist uses an Amazon SageMaker notebook instance to explore and analyze data. This necessitates installing specific Python packages on the notebook instance that Amazon SageMaker does not natively support. How can a machine learning expert ensure the data scientist’s necessary packages are always accessible on the notebook instance?
Question 44 of 50
1 point(s)
A data scientist must identify fraudulent user accounts for an organization's e-commerce platform. The business needs the ability to determine whether a newly created account is associated with a previously identified fraudulent user. The data scientist used AWS Glue to clean the company's application logs during ingestion.
Which strategy will enable the data scientist to identify fraudulent accounts?
Question 45 of 50
1 point(s)
A data scientist used 500,000 aligned sentence pairs and Amazon SageMaker's built-in seq2seq algorithm to build a machine translation model from English to Japanese. While testing with sample sentences, the data scientist finds that the translation quality is reasonable for sentences as short as five words, but degrades to an unacceptable level once a sentence exceeds 100 words. Which course of action will fix the problem?
Question 46 of 50
1 point(s)
A financial institution is looking for signs of credit card fraud. The business found that, on average, 2% of credit card transactions are fraudulent. A data scientist built a classifier using a year's worth of credit card transaction data. The model must distinguish fraudulent transactions (positives) from legitimate ones (negatives). The company wants to capture as many positives as possible. Which metrics should the data scientist use to optimize the model? (Select two.)
Question 47 of 50
1 point(s)
A machine learning expert is building a proof of concept for government users whose top priority is security. Using Amazon SageMaker, the expert trains a convolutional neural network (CNN) model for an image classification application. The expert wants to protect the data so that malicious code accidentally left in the training container cannot access it or transmit it to a remote host. Which course of action will provide the MOST secure protection?
Question 48 of 50
1 point(s)
An organization loads machine learning (ML) data from web advertising clicks into an Amazon S3 data lake. The Kinesis Producer Library (KPL) writes click data to an Amazon Kinesis data stream, and an Amazon Kinesis Data Firehose delivery stream loads the data from the stream into the S3 data lake. An ML expert observes that the rate of data ingested into Amazon S3 remains relatively constant even as the volume of incoming data grows, and that the amount of data waiting to be ingested by Kinesis Data Streams and Kinesis Data Firehose keeps growing.
Which action is MOST likely to increase the rate at which data is ingested into Amazon S3?
Question 49 of 50
1 point(s)
A financial services organization wants to make Amazon SageMaker its standard data science environment. The firm's data scientists run machine learning (ML) models on confidential financial data. Because the business is concerned about data leakage, an ML engineer must secure the environment. Which controls can the ML engineer use to prevent data from leaving SageMaker? (Select three.)
Question 50 of 50
1 point(s)
A data scientist has been using an Amazon SageMaker notebook instance for a few weeks. During this time, a new version of Jupyter Notebook and other software updates were released. The security team requires all running SageMaker notebook instances to use the latest security and software updates.
How can the data scientist meet these requirements?