Microsoft DP-201 Designing an Azure Data Solution Online Training

Question #1

Topic 1, Trey Research

Case study

This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.

To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.

At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study

To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Background

Trey Research is a technology innovator. The company partners with regional transportation department offices to build solutions that improve traffic flow and safety.

The company is developing the following solutions:

Regional transportation departments installed traffic sensor systems on major highways across North America.

Sensors record the following information each time a vehicle passes in front of a sensor:

– Time

– Location in latitude and longitude

– Speed in kilometers per second (kmps)

– License plate number

– Length of vehicle in meters

Sensors provide data by using the following structure:

Traffic sensors will occasionally capture an image of a vehicle for debugging purposes.

You must optimize performance of saving/storing vehicle images.

Traffic sensor data

– Sensors must have permission only to add items to the SensorData collection.

– Traffic data insertion rate must be maximized.

– Once every three months all traffic sensor data must be analyzed to look for data patterns that indicate sensor malfunctions.

– Sensor data must be stored in a Cosmos DB named treydata in a collection named SensorData

– The impact of vehicle images on sensor data throughput must be minimized.

Backtrack

This solution reports on all data related to a specific vehicle license plate. The report must use data from the SensorData collection.

Users must be able to filter vehicle data in the following ways:

– vehicles on a specific road

– vehicles driving above the speed limit

Planning Assistance

Data used for Planning Assistance must be stored in a sharded Azure SQL Database.

Data from the Sensor Data collection will automatically be loaded into the Planning Assistance database once a week by using Azure Data Factory. You must be able to manually trigger the data load process.

Privacy and security policy

– Azure Active Directory must be used for all services where it is available.

– For privacy reasons, license plate number information must not be accessible in Planning Assistance.

– Unauthorized usage of the Planning Assistance data must be detected as quickly as possible. Unauthorized usage is determined by looking for an unusual pattern of usage.

– Data must only be stored for seven years.

Performance and availability

– The report for Backtrack must execute as quickly as possible.

– The SLA for Planning Assistance is 70 percent, and multiday outages are permitted.

– All data must be replicated to multiple geographic regions to prevent data loss.

– You must maximize the performance of the Real Time Response system.

Financial requirements

Azure resource costs must be minimized where possible.

You need to design a sharding strategy for the Planning Assistance database.

What should you recommend?

  • A . a list mapping shard map on the binary representation of the License Plate column
  • B . a range mapping shard map on the binary representation of the speed column
  • C . a list mapping shard map on the location column
  • D . a range mapping shard map on the time column

Correct Answer: A

Explanation:

Data used for Planning Assistance must be stored in a sharded Azure SQL Database.

A shard typically contains items that fall within a specified range determined by one or more attributes of the data. These attributes form the shard key (sometimes referred to as the partition key). The shard key should be static. It shouldn’t be based on data that might change.

References: https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
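To make the shard-map idea concrete, here is a minimal sketch of a range-mapping lookup. It is a hypothetical illustration only (the production equivalent would be the Elastic Database client library's shard map manager); the boundary values and shard names are invented for the example.

```python
# Hypothetical sketch of a range-mapping shard map: each entry maps a
# contiguous range of sharding-key values to one Azure SQL Database shard.
# Ranges and shard names are illustrative, not a real library API.
from bisect import bisect_right

# (inclusive lower bound, shard name), sorted by lower bound
RANGE_SHARD_MAP = [
    (0,         "planning-shard-00"),
    (1_000_000, "planning-shard-01"),
    (2_000_000, "planning-shard-02"),
]

def resolve_shard(shard_key: int) -> str:
    """Return the shard that owns a given sharding-key value."""
    bounds = [low for low, _ in RANGE_SHARD_MAP]
    index = bisect_right(bounds, shard_key) - 1
    if index < 0:
        raise ValueError("key is below the lowest mapped range")
    return RANGE_SHARD_MAP[index][1]

print(resolve_shard(1_500_000))   # -> planning-shard-01
```

The same lookup shape applies to a list mapping; only the key-to-shard table changes from ranges to discrete values.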

Question #2

You need to design the vehicle images storage solution.

What should you recommend?

  • A . Azure Media Services
  • B . Azure Premium Storage account
  • C . Azure Redis Cache
  • D . Azure Cosmos DB

Correct Answer: B

Explanation:

Premium Storage stores data on the latest technology Solid State Drives (SSDs) whereas Standard Storage stores data on Hard Disk Drives (HDDs). Premium Storage is designed for Azure Virtual Machine workloads which require consistent high IO performance and low latency in order to host IO intensive workloads like OLTP, Big Data, and Data Warehousing on platforms like SQL Server, MongoDB, Cassandra, and others. With Premium Storage, more customers will be able to lift-and-shift demanding enterprise applications to the cloud.

Scenario: Traffic sensors will occasionally capture an image of a vehicle for debugging purposes.

You must optimize performance of saving/storing vehicle images.

The impact of vehicle images on sensor data throughput must be minimized.

References: https://azure.microsoft.com/es-es/blog/introducing-premium-storage-high-performance-storage-for-azure-virtual-machine-workloads/
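As a rough sketch of keeping images out of the SensorData collection, the snippet below uploads a captured image to a container in a separate (Premium) storage account with the azure-storage-blob SDK. The connection string, container name, and blob path are placeholders.

```python
# Sketch: store occasional debug images in a dedicated (Premium) storage
# account instead of the Cosmos DB SensorData collection, so image writes
# do not consume SensorData throughput. All identifiers are placeholders.
from azure.storage.blob import BlobServiceClient

conn_str = "<premium-storage-account-connection-string>"   # placeholder
service = BlobServiceClient.from_connection_string(conn_str)

blob = service.get_blob_client(container="vehicle-images",
                               blob="sensor-042/2020-05-01T10-15-00.jpg")

with open("capture.jpg", "rb") as image:
    blob.upload_blob(image, overwrite=True)
```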

Question #3

HOTSPOT

You need to ensure that security policies for the unauthorized usage detection system are met.

What should you recommend? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Box 1: Blob storage

Configure blob storage for audit logs.

Scenario: Unauthorized usage of the Planning Assistance data must be detected as quickly as possible. Unauthorized usage is determined by looking for an unusual pattern of usage.

Data used for Planning Assistance must be stored in a sharded Azure SQL Database.

Box 2: Web Apps

SQL Advanced Threat Protection (ATP) is to be used.

One of Azure’s most popular services is App Service, which enables customers to build and host web applications in the programming language of their choice without managing infrastructure. App Service offers auto-scaling and high availability, and supports both Windows and Linux. It also supports automated deployments from GitHub, Visual Studio Team Services, or any Git repository. At RSA, we announced that Azure Security Center leverages the scale of the cloud to identify attacks targeting App Service applications.

Reference: https://azure.microsoft.com/sv-se/blog/azure-security-center-can-identify-attacks-targeting-azure-app-service-applications/


Question #4

HOTSPOT

You need to design the authentication and authorization methods for sensors.

What should you recommend? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Sensor data must be stored in a Cosmos DB named treydata in a collection named SensorData. Sensors must have permission only to add items to the SensorData collection.

Box 1: Resource Token

Resource tokens provide access to the application resources within a Cosmos DB database. They enable clients to read, write, and delete resources in the Cosmos DB account according to the permissions they have been granted.

Box 2: Cosmos DB user

You can use a resource token (by creating Cosmos DB users and permissions) when you want to provide access to resources in your Cosmos DB account to a client that cannot be trusted with the master key.

Reference: https://docs.microsoft.com/en-us/azure/cosmos-db/secure-access-to-data
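A minimal sketch of the user/permission flow with the azure-cosmos Python SDK follows. The endpoint, key, and identifiers are placeholders, and how the returned resource token is distributed to a sensor is out of scope here.

```python
# Sketch: create a Cosmos DB user and grant it a permission scoped to the
# SensorData container. The service returns a time-limited resource token
# with the permission, which the sensor uses instead of the master key.
# Endpoint, key, and identifiers are placeholders.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", "<master-key>")
database = client.get_database_client("treydata")

sensor_user = database.create_user({"id": "sensor-device-001"})

# Permission on the SensorData container (resource link format:
# dbs/<database>/colls/<container>). "All" permits writes such as inserts.
permission = sensor_user.create_permission({
    "id": "sensordata-insert",
    "permissionMode": "All",
    "resource": "dbs/treydata/colls/SensorData",
})
# The permission resource returned by the service carries the resource token.
```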


Question #5

HOTSPOT

You need to ensure that emergency road response vehicles are dispatched automatically.

How should you design the processing system? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Box 1: API App

✑ Events generated from the IoT data sources are sent to the stream ingestion layer through Azure HDInsight Kafka as a stream of messages. HDInsight Kafka stores streams of data in topics for a configurable period of time.

✑ A Kafka consumer, Azure Databricks, picks up the messages in real time from the Kafka topic, processes the data based on the business logic, and can then send them to the serving layer for storage.

✑ Downstream storage services, like Azure Cosmos DB, Azure SQL Data Warehouse, or Azure SQL Database, will then be a data source for the presentation and action layer.

✑ Business analysts can use Microsoft Power BI to analyze warehoused data. Other applications can be built upon the serving layer as well. For example, we can expose APIs based on the serving layer data for third-party use.

Box 2: Cosmos DB Change Feed

Change feed support in Azure Cosmos DB works by listening to an Azure Cosmos DB container for any changes. It then outputs the sorted list of documents that were changed in the order in which they were modified.

The change feed in Azure Cosmos DB enables you to build efficient and scalable solutions for each of these patterns, as shown in the following image:

References: https://docs.microsoft.com/bs-cyrl-ba/azure/architecture/example-scenario/data/realtime-analytics-vehicle-iot?view=azurermps-4.4.1
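As a hedged sketch of consuming the change feed with the azure-cosmos Python SDK (v4 method names assumed; endpoint, key, property names, and the speed threshold are placeholders):

```python
# Sketch: poll the SensorData change feed so downstream logic (for example,
# dispatching emergency response) can react to newly inserted documents.
# Endpoint, key, property names, and the threshold are placeholders.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")
container = client.get_database_client("treydata").get_container_client("SensorData")

# Read changes from the beginning of the feed, in modification order.
for item in container.query_items_change_feed(is_start_from_beginning=True):
    if item.get("speed", 0) > 120:          # illustrative business rule
        print("dispatch check for", item.get("licensePlate"))
```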


Question #6

You need to recommend an Azure SQL Database pricing tier for Planning Assistance.

Which pricing tier should you recommend?

  • A . Business critical Azure SQL Database single database
  • B . General purpose Azure SQL Database Managed Instance
  • C . Business critical Azure SQL Database Managed Instance
  • D . General purpose Azure SQL Database single database

Correct Answer: B

Explanation:

Azure resource costs must be minimized where possible.

Data used for Planning Assistance must be stored in a sharded Azure SQL Database.

The SLA for Planning Assistance is 70 percent, and multiday outages are permitted.

Question #7

HOTSPOT

You need to design the SensorData collection.

What should you recommend? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Box 1: Eventual

Traffic data insertion rate must be maximized.

Sensor data must be stored in a Cosmos DB named treydata in a collection named SensorData

With Azure Cosmos DB, developers can choose from five well-defined consistency models on the consistency spectrum. From strongest to most relaxed, the models include strong, bounded staleness, session, consistent prefix, and eventual consistency.

Box 2: License plate

This solution reports on all data related to a specific vehicle license plate. The report must use data from the SensorData collection.

References: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
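To make the two selections concrete, here is a hedged sketch using the azure-cosmos Python SDK: the client is created with eventual consistency, and the SensorData container is partitioned on the license plate property. The endpoint, key, property path, and throughput value are assumptions.

```python
# Sketch: eventual consistency for maximum insertion throughput, and a
# partition key on the license plate property so Backtrack queries by plate
# stay within one logical partition. Names and RU/s value are assumptions.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    "<key>",
    consistency_level="Eventual",
)

database = client.create_database_if_not_exists("treydata")
container = database.create_container_if_not_exists(
    id="SensorData",
    partition_key=PartitionKey(path="/licensePlate"),
    offer_throughput=10000,
)
```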


Question #8

HOTSPOT

You need to design the data loading pipeline for Planning Assistance.

What should you recommend? To answer, drag the appropriate technologies to the correct locations. Each technology may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Box 1: SqlSink Table

Sensor data must be stored in a Cosmos DB named treydata in a collection named SensorData

Box 2: Cosmos Bulk Loading

Use Copy Activity in Azure Data Factory to copy data from and to Azure Cosmos DB (SQL API).

Scenario: Data from the Sensor Data collection will automatically be loaded into the Planning Assistance database once a week by using Azure Data Factory. You must be able to manually trigger the data load process.

Data used for Planning Assistance must be stored in a sharded Azure SQL Database.

References: https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db
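Because the load must also be manually triggerable, below is a hedged sketch that starts an existing pipeline run with the azure-mgmt-datafactory SDK. The subscription, resource group, factory, and pipeline names are placeholders, and the copy pipeline (Cosmos DB source, SQL sink) is assumed to already exist.

```python
# Sketch: manually trigger the weekly Cosmos DB -> sharded Azure SQL Database
# copy pipeline. All identifiers are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="trey-research-rg",
    factory_name="trey-adf",
    pipeline_name="LoadPlanningAssistance",
    parameters={},
)
print("started run:", run.run_id)
```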


Question #9

DRAG DROP

You need to ensure that performance requirements for Backtrack reports are met.

What should you recommend? To answer, drag the appropriate technologies to the correct locations. Each technology may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Box 1: Cosmos DB indexes

The report for Backtrack must execute as quickly as possible.

You can override the default indexing policy on an Azure Cosmos container; this can be useful if you want to tune the indexing precision to improve query performance or to reduce the consumed storage.

Box 2: Cosmos DB TTL

This solution reports on all data related to a specific vehicle license plate.

The report must use data from the SensorData collection. Users must be able to filter vehicle data in the following ways:

✑ vehicles on a specific road

✑ vehicles driving above the speed limit

Note: With Time to Live or TTL, Azure Cosmos DB provides the ability to delete items automatically from a container after a certain time period. By default, you can set time to live at the container level and override the value on a per-item basis. After you set the TTL at a container or at an item level, Azure Cosmos DB will automatically remove these items once that time period has elapsed since they were last modified.
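A hedged sketch of both settings with the azure-cosmos Python SDK is shown below; the indexed property paths and the seven-year TTL value are assumptions derived from the scenario, and the endpoint and key are placeholders.

```python
# Sketch: tune the SensorData container for the Backtrack report by indexing
# only the queried paths, and let Cosmos DB expire documents after seven
# years (TTL is expressed in seconds). Paths and values are assumptions.
from azure.cosmos import CosmosClient, PartitionKey

SEVEN_YEARS_SECONDS = 7 * 365 * 24 * 3600

indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [
        {"path": "/licensePlate/?"},
        {"path": "/roadId/?"},
        {"path": "/speed/?"},
    ],
    "excludedPaths": [{"path": "/*"}],
}

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")
database = client.get_database_client("treydata")

database.create_container_if_not_exists(
    id="SensorData",
    partition_key=PartitionKey(path="/licensePlate"),
    indexing_policy=indexing_policy,
    default_ttl=SEVEN_YEARS_SECONDS,
)
```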


Question #10

You need to design the runtime environment for the Real Time Response system.

What should you recommend?

  • A . General Purpose nodes without the Enterprise Security package
  • B . Memory Optimized Nodes without the Enterprise Security package
  • C . Memory Optimized nodes with the Enterprise Security package
  • D . General Purpose nodes with the Enterprise Security package


Correct Answer: B

Question #11

HOTSPOT

You need to design the Planning Assistance database.

For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Box 1: No

Data used for Planning Assistance must be stored in a sharded Azure SQL Database.

Box 2: Yes

Box 3: Yes

The Planning Assistance database will include reports tracking the travel of a single vehicle.


Question #12

Topic 2, Case study 1

Case study

This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.

To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.

At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study

To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Overview

You develop data engineering solutions for Graphics Design Institute, a global media company with offices in New York City, Manchester, Singapore, and Melbourne.

The New York office hosts SQL Server databases that store massive amounts of customer data. The company also stores millions of images on a physical server located in the New York office. More than 2 TB of image data is added each day. The images are transferred from customer devices to the server in New York.

Many images have been placed on this server in an unorganized manner, making it difficult for editors to search images. Images should automatically have object and color tags generated. The tags must be stored in a document database and be queried by using SQL.

You are hired to design a solution that can store, transform, and visualize customer data.

Requirements

Business

The company identifies the following business requirements:

– You must transfer all images and customer data to cloud storage and remove on-premises servers.

– You must develop an analytical processing solution for transforming customer data.

– You must develop an image object and color tagging solution.

– Capital expenditures must be minimized.

– Cloud resource costs must be minimized.

Technical

The solution has the following technical requirements:

– Tagging data must be uploaded to the cloud from the New York office location.

– Tagging data must be replicated to regions that are geographically close to company office locations.

– Image data must be stored in a single data store at minimum cost.

– Customer data must be analyzed using managed Spark clusters.

– Power BI must be used to visualize transformed customer data.

– All data must be backed up in case disaster recovery is required.

Security and optimization

All cloud data must be encrypted at rest and in transit.

The solution must support:

– parallel processing of customer data

– hyper-scale storage of images

– global region data replication of processed image data

You need to recommend a solution for storing customer data.

What should you recommend?

  • A . Azure SQL Data Warehouse
  • B . Azure Stream Analytics
  • C . Azure Databricks
  • D . Azure SQL Database

Correct Answer: C

Explanation:

From the scenario:

Customer data must be analyzed using managed Spark clusters.

All cloud data must be encrypted at rest and in transit. The solution must support: parallel processing of customer data.

References: https://www.microsoft.com/developerblog/2019/01/18/running-parallel-apache-spark-notebook-workloads-on-azure-databricks/
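As a small illustration of the managed-Spark approach, the PySpark sketch below reads customer data and aggregates it in parallel across the cluster. The storage path and column names are placeholders.

```python
# Sketch: parallel transformation of customer data on a managed Spark
# (Azure Databricks) cluster. Input path and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()   # provided automatically in Databricks

customers = (spark.read
             .option("header", "true")
             .csv("abfss://customers@<account>.dfs.core.windows.net/raw/"))

# The aggregation runs in parallel across the cluster's worker nodes.
summary = (customers
           .groupBy("country")
           .agg(F.countDistinct("customer_id").alias("distinct_customers")))

summary.write.mode("overwrite").parquet(
    "abfss://customers@<account>.dfs.core.windows.net/curated/summary/")
```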

Question #13

HOTSPOT

You need to design storage for the solution.

Which storage services should you recommend? To answer, select the appropriate configuration in the answer area. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Images: Azure Data Lake Storage

Scenario: Image data must be stored in a single data store at minimum cost.

Customer data: Azure Blob Storage

Scenario: Customer data must be analyzed using managed Spark clusters.

Spark clusters in HDInsight are compatible with Azure Storage and Azure Data Lake Storage.

Azure Storage includes these data services: Azure Blob, Azure Files, Azure Queues, and Azure Tables.

References: https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-overview


Question #14

HOTSPOT

You need to design the image processing and storage solutions.

What should you recommend? To answer, select the appropriate configuration in the answer area. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

References:

https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tier-hyperscale


Question #15

What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?

  • A . a server-level virtual network rule
  • B . a database-level virtual network rule
  • C . a database-level firewall IP rule
  • D . a server-level firewall IP rule

Correct Answer: A

Explanation:

Virtual network rules are one firewall security feature that controls whether the server for your single databases and elastic pools in Azure SQL Database, or for your databases in SQL Data Warehouse, accepts communications that are sent from particular subnets in virtual networks.

Server-level, not database-level: Each virtual network rule applies to your whole Azure SQL Database server, not just to one particular database on the server. In other words, a virtual network rule applies at the server level, not at the database level.

References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-vnet-service-endpoint-rule-overview

Question #16

You need to design the solution for analyzing customer data.

What should you recommend?

  • A . Azure Databricks
  • B . Azure Data Lake Storage
  • C . Azure SQL Data Warehouse
  • D . Azure Cognitive Services
  • E . Azure Batch

Correct Answer: A

Explanation:

Customer data must be analyzed using managed Spark clusters.

You create Spark clusters through Azure Databricks.

References: https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal

Question #17

You need to design a backup solution for the processed customer data.

What should you include in the design?

  • A . AzCopy
  • B . AdlCopy
  • C . Geo-Redundancy
  • D . Geo-Replication

Correct Answer: C

Explanation:

Scenario: All data must be backed up in case disaster recovery is required.

Geo-redundant storage (GRS) is designed to provide at least 99.99999999999999% (16 9’s) durability of objects over a given year by replicating your data to a secondary region that is hundreds of miles away from the primary region. If your storage account has GRS enabled, then your data is durable even in the case of a complete regional outage or a disaster in which the primary region isn’t recoverable.

References: https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs
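For illustration, a storage account can be provisioned with geo-redundant replication through the azure-mgmt-storage SDK. This is a hedged sketch: method names vary slightly across SDK versions, and the subscription, resource group, account name, and location are placeholders.

```python
# Sketch: create a storage account with geo-redundant storage (GRS) so the
# backed-up data survives a regional outage. All identifiers are placeholders;
# SDK method names are version-dependent.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountCreateParameters, Sku

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.storage_accounts.begin_create(
    "media-rg",
    "graphicsbackupstore",
    StorageAccountCreateParameters(
        location="eastus",
        kind="StorageV2",
        sku=Sku(name="Standard_GRS"),   # geo-redundant replication
    ),
)
account = poller.result()
print(account.name, account.sku.name)
```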

Question #18

DRAG DROP

You discover that the highest chance of corruption or bad data occurs during nightly inventory loads.

You need to ensure that you can quickly restore the data to its state before the nightly load and avoid missing any streaming data.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.


Correct Answer:

Explanation:

Step 1: Before the nightly load, create a user-defined restore point.

SQL Data Warehouse performs a geo-backup once per day to a paired data center. The RPO for a geo-restore is 24 hours. If you require a shorter RPO for geo-backups, you can create a user-defined restore point and restore from the newly created restore point to a new data warehouse in a different region.

Step 2: Restore the data warehouse to a new name on the same server.

Step 3: Swap the restored database warehouse name.

References: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/backup-and-restore


Question #19

DRAG DROP

You need to design the image processing solution to meet the optimization requirements for image tag data.

What should you configure? To answer, drag the appropriate setting to the correct drop targets.

Each source may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Tagging data must be uploaded to the cloud from the New York office location.

Tagging data must be replicated to regions that are geographically close to company office locations.


Question #20

What should you recommend using to secure sensitive customer contact information?

  • A . data labels
  • B . column-level security
  • C . row-level security
  • D . Transparent Data Encryption (TDE)

Correct Answer: B

Explanation:

Scenario: All cloud data must be encrypted at rest and in transit.

Always Encrypted is a feature designed to protect sensitive data stored in specific database columns from access (for example, credit card numbers, national identification numbers, or data on a need to know basis). This includes database administrators or other privileged users who are authorized to access the database to perform management tasks, but have no business need to access the particular data in the encrypted columns. The data is always encrypted, which means the encrypted data is decrypted only for processing by client applications with access to the encryption key.

References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-security-overview
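Following the explanation above, here is a hedged sketch of how a client opens a connection with Always Encrypted enabled, using pyodbc and the Microsoft ODBC driver. Server, database, credentials, table, and column names are placeholders, and column master key access (for example, through Azure Key Vault) must be configured separately.

```python
# Sketch: client-side connection with Always Encrypted enabled. With
# ColumnEncryption=Enabled the ODBC driver transparently encrypts parameters
# for encrypted columns and decrypts result sets, so plaintext never reaches
# the server. All identifiers and credentials are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=customers;"
    "Uid=<user>;Pwd=<password>;"
    "Encrypt=yes;ColumnEncryption=Enabled;"
)

cursor = conn.cursor()
cursor.execute("SELECT TOP 10 ContactName, ContactEmail FROM dbo.CustomerContacts")
for row in cursor.fetchall():
    print(row.ContactName, row.ContactEmail)
```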

Question #21

You plan to use an Azure SQL data warehouse to store the customer data. You need to recommend a disaster recovery solution for the data warehouse.

What should you include in the recommendation?

  • A . AzCopy
  • B . Read-only replicas
  • C . AdlCopy
  • D . Geo-Redundant backups

Correct Answer: D

Explanation:

References: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/backup-and-restore

Question #22

DRAG DROP

You need to design the encryption strategy for the tagging data and customer data.

What should you recommend? To answer, drag the appropriate setting to the correct drop targets. Each source may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

All cloud data must be encrypted at rest and in transit.

Box 1: Transparent data encryption

Encryption of the database file is performed at the page level. The pages in an encrypted database are encrypted before they are written to disk and decrypted when read into memory.

Box 2: Encryption at rest

Encryption at Rest is the encoding (encryption) of data when it is persisted.

References:

https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption?view=sql-server-2017

https://docs.microsoft.com/en-us/azure/security/azure-security-encryption-atrest
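For reference, transparent data encryption can be enabled and verified per database with T-SQL. The sketch below runs the statements through pyodbc; the connection string and database name are placeholders, and on Azure SQL Database TDE is typically on by default for new databases.

```python
# Sketch: enable TDE on a database and verify its encryption state via T-SQL.
# Connection string and database name are placeholders.
import pyodbc

conn = pyodbc.connect("<odbc-connection-string-to-logical-server>", autocommit=True)
cursor = conn.cursor()

cursor.execute("ALTER DATABASE [tagging] SET ENCRYPTION ON;")

cursor.execute("""
    SELECT db_name(database_id) AS database_name, encryption_state
    FROM sys.dm_database_encryption_keys;
""")
for name, state in cursor.fetchall():
    print(name, "encryption_state =", state)   # 3 = encrypted
```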


Question #23

You need to recommend a solution for storing the image tagging data.

What should you recommend?

  • A . Azure File Storage
  • B . Azure Cosmos DB
  • C . Azure Blob Storage
  • D . Azure SQL Database
  • E . Azure SQL Data Warehouse

Correct Answer: C

Explanation:

Image data must be stored in a single data store at minimum cost.

Note: Azure Blob storage is Microsoft’s object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that does not adhere to a particular data model or definition, such as text or binary data.

Blob storage is designed for:

✑ Serving images or documents directly to a browser.

✑ Storing files for distributed access.

✑ Streaming video and audio.

✑ Writing to log files.

✑ Storing data for backup and restore, disaster recovery, and archiving.

✑ Storing data for analysis by an on-premises or Azure-hosted service.

References: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

Question #24

Topic 3, Case study 2

Case study

This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.

To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.

At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study

To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Background

Current environment

The company has the following virtual machines (VMs):

Requirements

Storage and processing

You must be able to use a file system view of data stored in a blob.

You must build an architecture that will allow Contoso to use the DBFS filesystem layer over a blob store. The architecture will need to support data files, libraries, and images. Additionally, it must provide a web-based interface to documents that contain runnable commands, visualizations, and narrative text, such as a notebook.

CONT_SQL3 requires an initial scale of 35000 IOPS.

CONT_SQL1 and CONT_SQL2 must use the vCore model and should include replicas. The solution must support 8000 IOPS.

The storage should be configured to optimize storage for database OLTP workloads.

Migration

– You must be able to independently scale compute and storage resources.

– You must migrate all SQL Server workloads to Azure. You must identify related machines in the on-premises environment and get disk size and data usage information.

– Data from SQL Server must include zone redundant storage.

– You need to ensure that app components can reside on-premises while interacting with components that run in the Azure public cloud.

– SAP data must remain on-premises.

– The Azure Site Recovery (ASR) results should contain per-machine data.

Business requirements

– You must design a regional disaster recovery topology.

– The database backups have regulatory purposes and must be retained for seven years.

– CONT_SQL1 stores customer sales data that requires ETL operations for data analysis. A solution is required that reads data from SQL, performs ETL, and outputs to Power BI. The solution should use managed clusters to minimize costs. To optimize logistics, Contoso needs to analyze customer sales data to see if certain products are tied to specific times in the year.

– The analytics solution for customer sales data must be available during a regional outage.

Security and auditing

– Contoso requires all corporate computers to enable Windows Firewall.

– Azure servers should be able to ping other Contoso Azure servers.

– Employee PII must be encrypted in memory, in motion, and at rest. Any data encrypted by SQL Server must support equality searches, grouping, indexing, and joining on the encrypted data.

– Keys must be secured by using hardware security modules (HSMs).

– CONT_SQL3 must not communicate over the default ports.

Cost

– All solutions must minimize cost and resources.

– The organization does not want any unexpected charges.

– The data engineers must set the SQL Data Warehouse compute resources to consume 300 DWUs.

– CONT_SQL2 is not fully utilized during non-peak hours. You must minimize resource costs during non-peak hours.

You plan to use Azure SQL Database to support a line of business app.

You need to identify sensitive data that is stored in the database and monitor access to the data.

Which three actions should you recommend? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

  • A . Enable Data Discovery and Classification.
  • B . Implement Transparent Data Encryption (TDE).
  • C . Enable Auditing.
  • D . Run Vulnerability Assessment.
  • E . Use Advanced Threat Protection.


Correct Answer: C,D,E
Question #25

You need to design a solution to meet the SQL Server storage requirements for CONT_SQL3.

Which type of disk should you recommend?

  • A . Standard SSD Managed Disk
  • B . Premium SSD Managed Disk
  • C . Ultra SSD Managed Disk

Correct Answer: C

Explanation:

CONT_SQL3 requires an initial scale of 35000 IOPS.

Ultra SSD Managed Disk offerings:

The following table provides a comparison of ultra solid-state-drives (SSD) (preview), premium SSD, standard SSD, and standard hard disk drives (HDD) for managed disks to help you decide what to use.

References: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/disks-types


Question #26

DRAG DROP

You are designing an Azure SQL Data Warehouse for a financial services company. Azure Active Directory will be used to authenticate the users.

You need to ensure that the following security requirements are met:

✑ Department managers must be able to create new databases.

✑ The IT department must assign users to databases.

✑ Permissions granted must be minimized.

Which role memberships should you recommend? To answer, drag the appropriate roles to the correct groups. Each role may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Box 1: dbmanager

Members of the dbmanager role can create new databases.

Box 2: db_accessadmin

Members of the db_accessadmin fixed database role can add or remove access to the database for Windows logins, Windows groups, and SQL Server logins.

References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-manage-logins
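A hedged T-SQL sketch of those role assignments, executed through pyodbc, is shown below. The Azure AD group names and connection strings are placeholders; the first batch runs in the master database of the logical server, the second in the user database.

```python
# Sketch: grant the department managers group the dbmanager role (in master)
# and the IT department group db_accessadmin (in each user database).
# Group names and connection strings are placeholders.
import pyodbc

# 1) In the master database of the logical server: allow creating databases.
master = pyodbc.connect("<odbc-connection-string-to-master-db>", autocommit=True)
master.cursor().execute("""
    CREATE USER [DeptManagers] FROM EXTERNAL PROVIDER;
    ALTER ROLE dbmanager ADD MEMBER [DeptManagers];
""")

# 2) In each user database: allow managing database access only.
userdb = pyodbc.connect("<odbc-connection-string-to-user-db>", autocommit=True)
userdb.cursor().execute("""
    CREATE USER [ITDepartment] FROM EXTERNAL PROVIDER;
    ALTER ROLE db_accessadmin ADD MEMBER [ITDepartment];
""")
```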


Question #27

You need to recommend the appropriate storage and processing solution.

What should you recommend?

  • A . Enable auto-shrink on the database.
  • B . Flush the blob cache using Windows PowerShell.
  • C . Enable Apache Spark RDD (RDD) caching.
  • D . Enable Databricks IO (DBIO) caching.
  • E . Configure the reading speed using Azure Data Studio.

Correct Answer: C

Explanation:

Scenario: You must be able to use a file system view of data stored in a blob. You must build an architecture that will allow Contoso to use the DBFS filesystem layer over a blob store.

Databricks File System (DBFS) is a distributed file system installed on Azure Databricks clusters. Files in DBFS persist to Azure Blob storage, so you won’t lose data even after you terminate a cluster.

The Databricks Delta cache, previously named Databricks IO (DBIO) caching, accelerates data reads by creating copies of remote files in nodes’ local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote location. Successive reads of the same data are then performed locally, which results in significantly improved reading speed.

References: https://docs.databricks.com/delta/delta-cache.html#delta-cache
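For illustration, the Databricks IO (delta) cache can be toggled per cluster or per session; a hedged notebook-style sketch follows, with a placeholder DBFS path.

```python
# Sketch: enable the Databricks IO (delta) cache for the current Spark session
# so repeated reads of the same files are served from fast local SSD copies.
# In a Databricks notebook `spark` already exists; getOrCreate() keeps the
# snippet self-contained elsewhere. The DBFS path is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# The first read populates the cache; subsequent reads of the same data are local.
df = spark.read.parquet("dbfs:/mnt/contoso/customer-sales/")
df.count()
```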

Question #28

You need to optimize storage for CONT_SQL3.

What should you recommend?

  • A . AlwaysOn
  • B . Transactional processing
  • C . General
  • D . Data warehousing

Correct Answer: B

Explanation:

CONT_SQL3 has the SQL Server role, a 100 GB database size, and is a Hyper-V VM to be migrated to an Azure VM.

The storage should be configured to optimized storage for database OLTP workloads.

Azure SQL Database provides three basic in-memory based capabilities (built into the underlying database engine) that can contribute in a meaningful way to performance improvements:

In-Memory Online Transactional Processing (OLTP)

Clustered columnstore indexes intended primarily for Online Analytical Processing (OLAP) workloads

Nonclustered columnstore indexes geared towards Hybrid Transactional/Analytical Processing (HTAP) workloads

Reference: https://www.databasejournal.com/features/mssql/overview-of-in-memory-technologies-of-azure-sql-database.html
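A hedged T-SQL sketch of a memory-optimized table for an OLTP workload, executed through pyodbc, is shown below. The table and column names are illustrative, and the target database must be on a tier that supports In-Memory OLTP (for example, Premium or Business Critical).

```python
# Sketch: create a memory-optimized table for OLTP-style inserts and lookups.
# Table and column names are illustrative; connection string is a placeholder.
import pyodbc

conn = pyodbc.connect("<odbc-connection-string>", autocommit=True)
conn.cursor().execute("""
    CREATE TABLE dbo.SalesOrderQueue
    (
        OrderId     BIGINT IDENTITY PRIMARY KEY NONCLUSTERED,
        CustomerId  INT           NOT NULL,
        OrderTotal  DECIMAL(10,2) NOT NULL,
        CreatedAt   DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
""")
```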

Question #29

A company stores sensitive information about customers and employees in Azure SQL Database.

You need to ensure that the sensitive data remains encrypted in transit and at rest.

What should you recommend?

  • A . Transparent Data Encryption
  • B . Always Encrypted with secure enclaves
  • C . Azure Disk Encryption
  • D . SQL Server AlwaysOn

Correct Answer: B

Explanation:

References: https://cloudblogs.microsoft.com/sqlserver/2018/12/17/confidential-computing-using-always-encrypted-with-secure-enclaves-in-sql-server-2019-preview/

Question #30

You need to recommend a backup strategy for CONT_SQL1 and CONT_SQL2.

What should you recommend?

  • A . Use AzCopy and store the data in Azure.
  • B . Configure Azure SQL Database long-term retention for all databases.
  • C . Configure Accelerated Database Recovery.
  • D . Use DWLoader.

Correct Answer: B

Explanation:

Scenario: The database backups have regulatory purposes and must be retained for seven years.

Question #31

HOTSPOT

You need to design network access to the SQL Server data.

What should you recommend? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Box 1: 8080

1433 is the default port, but we must change it as CONT_SQL3 must not communicate over the default ports. Because port 1433 is the known standard for SQL Server, some organizations specify that the SQL Server port number should be changed to enhance security.

Box 2: SQL Server Configuration Manager

You can configure an instance of the SQL Server Database Engine to listen on a specific fixed port by using the SQL Server Configuration Manager.

References: https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/configure-a-server-to-listen-on-a-specific-tcp-port?view=sql-server-2017


Question #32

You need to recommend an Azure SQL Database service tier.

What should you recommend?

  • A . Business Critical
  • B . General Purpose
  • C . Premium
  • D . Standard
  • E . Basic

Correct Answer: C

Explanation:

The data engineers must set the SQL Data Warehouse compute resources to consume 300 DWUs.

Note: There are three architectural models that are used in Azure SQL Database:

✑ General Purpose/Standard

✑ Business Critical/Premium

✑ Hyperscale

Question #33

You need to design the disaster recovery solution for customer sales data analytics.

Which three actions should you recommend? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

  • A . Provision multiple Azure Databricks workspaces in separate Azure regions.
  • B . Migrate users, notebooks, and cluster configurations from one workspace to another in the same region.
  • C . Use zone redundant storage.
  • D . Migrate users, notebooks, and cluster configurations from one region to another.
  • E . Use Geo-redundant storage.
  • F . Provision a second Azure Databricks workspace in the same region.

Correct Answer: A,D,E

Explanation:

Scenario: The analytics solution for customer sales data must be available during a regional outage.

To create your own regional disaster recovery topology for databricks, follow these requirements:

Question #37

Topic 4, ADatum Corporation

Case study

This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.

To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.

At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study

To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Overview

General Overview

ADatum Corporation is a medical company that has 5,000 physicians located in more than 300 hospitals across the US. The company has a medical department, a sales department, a marketing department, a medical research department, and a human resources department.

You are redesigning the application environment of ADatum.

Physical Locations

ADatum has three main offices in New York, Dallas, and Los Angeles. The offices connect to each other by using a WAN link. Each office connects directly to the Internet. The Los Angeles office also has a datacenter that hosts all the company’s applications.

Existing Environment

Health Review

ADatum has a critical OLTP web application named Health Review that physicians use to track billing, patient care, and overall physician best practices.

Health Interface

ADatum has a critical application named Health Interface that receives hospital messages related to patient care and status updates. The messages are sent in batches by each hospital’s enterprise relationship management (ERM) system by using a VPN. The data sent from each hospital can have varying columns and formats.

Currently, a custom C# application is used to send the data to Health Interface. The application uses deprecated libraries and a new solution must be designed for this functionality.

Health Insights

ADatum has a web-based reporting system named Health Insights that shows hospital and patient insights to physicians and business users. The data is created from the data in Health Review and Health Interface, as well as manual entries.

Database Platform

Currently, the databases for all three applications are hosted on an out-of-date VMware cluster that has a single instance of Microsoft SQL Server 2012.

Problem Statements

ADatum identifies the following issues in its current environment:

– Over time, the data received by Health Interface from the hospitals has slowed, and the number of messages has increased.

– When a new hospital joins ADatum, Health Interface requires a schema modification due to the lack of data standardization.

– The speed of batch data processing is inconsistent.

Business Requirements

Business Goals

ADatum identifies the following business goals:

– Migrate the applications to Azure whenever possible.

– Minimize the development effort required to perform data movement.

– Provide continuous integration and deployment for development, test, and production environments.

– Provide faster access to the applications and the data and provide more consistent application performance.

– Minimize the number of services required to perform data processing, development, scheduling, monitoring, and the operationalizing of pipelines.

Health Review Requirements

ADatum identifies the following requirements for the Health Review application:

– Ensure that sensitive health data is encrypted at rest and in transit.

– Tag all the sensitive health data in Health Review. The data will be used for auditing.

Health Interface Requirements

ADatum identifies the following requirements for the Health Interface application:

– Upgrade to a data storage solution that will provide flexible schemas and increased throughput for writing data. Data must be regionally located close to each hospital, and reads must display the most recent committed version of an item.

– Reduce the amount of time it takes to add data from new hospitals to Health Interface.

– Support a more scalable batch processing solution in Azure.

– Reduce the amount of development effort to rewrite existing SQL queries.

Health Insights Requirements

ADatum identifies the following requirements for the Health Insights application:

– The analysis of events must be performed over time by using an organizational date dimension table.

– The data from Health Interface and Health Review must be available in Health Insights within 15 minutes of being committed.

– The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables.

HOTSPOT

Which Azure data storage solution should you recommend for each application? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.


Correct Answer:

Explanation:

Health Review: Azure SQL Database

Scenario: ADatum identifies the following requirements for the Health Review application:

– Ensure that sensitive health data is encrypted at rest and in transit.

– Tag all the sensitive health data in Health Review. The data will be used for auditing.

Health Interface: Azure Cosmos DB

ADatum identifies the following requirements for the Health Interface application:

– Upgrade to a data storage solution that will provide flexible schemas and increased throughput for writing data. Data must be regionally located close to each hospital, and reads must display the most recent committed version of an item.

– Reduce the amount of time it takes to add data from new hospitals to Health Interface.

– Support a more scalable batch processing solution in Azure.

– Reduce the amount of development effort to rewrite existing SQL queries.

Health Insights: Azure SQL Data Warehouse

Azure SQL Data Warehouse is a cloud-based enterprise data warehouse that leverages massively parallel processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse as a key component of a big data solution.

You can access Azure SQL Data Warehouse (SQL DW) from Databricks using the SQL Data Warehouse connector (referred to as the SQL DW connector), a data source implementation for Apache Spark that uses Azure Blob Storage, and PolyBase in SQL DW to transfer large volumes of data efficiently between a Databricks cluster and a SQL DW instance.

Scenario: ADatum identifies the following requirements for the Health Insights application:

– The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables

Reference: https://docs.databricks.com/data/data-sources/azure/sql-data-warehouse.html
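As a hedged sketch of the SQL DW connector path described above, the snippet below writes a small DataFrame from Azure Databricks to Azure SQL Data Warehouse using PolyBase staging through Blob storage. The JDBC URL, storage container, credentials, and table name are placeholders.

```python
# Sketch: write a transformed DataFrame from Azure Databricks to Azure SQL
# Data Warehouse with the SQL DW connector (PolyBase staging via Blob storage).
# All identifiers and credentials are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.createDataFrame(
    [("2020-05-01", "hospital-17", 42)],
    ["event_date", "hospital_id", "message_count"],
)

(events.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;"
                   "database=HealthInsightsDW;user=<user>;password=<password>;encrypt=true")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.FactHospitalEvents")
    .option("tempDir", "wasbs://staging@<storageaccount>.blob.core.windows.net/sqldw")
    .mode("append")
    .save())
```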


Question #38

You need to recommend a security solution that meets the requirements of Health Review.

What should you include in the recommendation?

  • A . dynamic data masking
  • B . Transport Layer Security (TLS)
  • C . Always Encrypted
  • D . row-level security

Correct Answer: C

Explanation:

Must ensure that sensitive health data is encrypted at rest and in transit.

Always Encrypted is a feature designed to protect sensitive data stored in Azure SQL Database or SQL Server databases. Always Encrypted allows clients to encrypt sensitive data inside client applications and never reveal the encryption keys to the database engine (SQL Database or SQL Server).

References:

https://docs.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest

https://docs.microsoft.com/en-us/azure/security/fundamentals/database-security-overview

Question #39

You need to recommend a solution that meets the data platform requirements of Health Interface. The solution must minimize redevelopment efforts for the application.

What should you include in the recommendation?

  • A . Azure SQL Data Warehouse
  • B . Azure SQL Database
  • C . Azure Cosmos DB that uses the SQL API
  • D . Azure Cosmos DB that uses the Table API

Correct Answer: C

Explanation:

Scenario: ADatum identifies the following requirements for the Health Interface application:

– Reduce the amount of development effort to rewrite existing SQL queries.

– Upgrade to a data storage solution that will provide flexible schemas and increased throughput for writing data. Data must be regionally located close to each hospital, and reads must display the most recent committed version of an item.

– Reduce the amount of time it takes to add data from new hospitals to Health Interface.

– Support a more scalable batch processing solution in Azure.
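The first of the requirements above, reducing the effort to rewrite existing SQL queries, can be illustrated with a hedged sketch against the Cosmos DB SQL API using the azure-cosmos SDK: familiar SQL-style queries translate almost directly. The account, database, container, and property names are assumptions for illustration.

```python
# Sketch: existing SQL-style queries carry over almost unchanged to the
# Cosmos DB SQL API, limiting redevelopment effort. Account, container, and
# property names are assumptions.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")
container = client.get_database_client("healthdata").get_container_client("HealthInterface")

messages = container.query_items(
    query="SELECT c.patientId, c.status FROM c WHERE c.hospitalId = @hospital",
    parameters=[{"name": "@hospital", "value": "hospital-17"}],
    enable_cross_partition_query=True,
)
for message in messages:
    print(message["patientId"], message["status"])
```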

Question #40

Which consistency level should you use for Health Interface?

  • A . Consistent Prefix
  • B . Session
  • C . Bounded Staleness
  • D . Strong

Correct Answer: D

Explanation:

Scenario: ADatum identifies the following requirements for the Health Interface application:

Reads must display the most recent committed version of an item.

Azure Cosmos DB consistency levels include:

Strong: Strong consistency offers a linearizability guarantee. Linearizability refers to serving requests concurrently. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.

References: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels

Question #41

You need to design a solution that meets the business requirements of Health Insights.

What should you include in the recommendation?

  • A . Azure Cosmos DB that uses the Gremlin
  • B . Azure Data Factory
  • C . Azure Cosmos DB that uses the SQL API
  • D . Azure Databricks

Correct Answer: D

Explanation:

Azure SQL Data Warehouse is a cloud-based enterprise data warehouse that leverages massively parallel processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse as a key component of a big data solution.

You can access Azure SQL Data Warehouse (SQL DW) from Databricks using the SQL Data Warehouse connector (referred to as the SQL DW connector), a data source implementation for Apache Spark that uses Azure Blob Storage, and PolyBase in SQL DW to transfer large volumes of data efficiently between a Databricks cluster and a SQL DW instance.

Scenario: ADatum identifies the following requirements for the Health Insights application:

✑ The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables

References: https://docs.databricks.com/data/data-sources/azure/sql-data-warehouse.html

Question #42

You need to recommend a solution to quickly identify all the columns in Health Review that contain sensitive health data.

What should you include in the recommendation?

  • A . classifications
  • B . data masking
  • C . SQL Server auditing
  • D . Azure tags

Correct Answer: A

Explanation:

Data Discovery & Classification introduces a set of advanced capabilities aimed at protecting the data and not just the data warehouse itself. Classification/Labeling – sensitivity classification labels tagged on the columns can be persisted in the data warehouse itself.

References: https://azure.microsoft.com/sv-se/blog/announcing-public-preview-of-data-discovery-classification-for-microsoft-azure-sql-data-warehouse/
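A hedged T-SQL sketch of applying and inspecting such classifications, run through pyodbc, is shown below. The table, column, label, and connection string are placeholders.

```python
# Sketch: classify a sensitive column and then list existing classifications.
# Table, column, label, and connection string are placeholders.
import pyodbc

conn = pyodbc.connect("<odbc-connection-string>", autocommit=True)
cursor = conn.cursor()

cursor.execute("""
    ADD SENSITIVITY CLASSIFICATION TO dbo.Patients.Email
    WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Contact Info');
""")

cursor.execute("""
    SELECT schema_name(o.schema_id) AS schema_name, o.name AS table_name,
           c.name AS column_name, sc.label, sc.information_type
    FROM sys.sensitivity_classifications AS sc
    JOIN sys.objects AS o ON o.object_id = sc.major_id
    JOIN sys.columns AS c ON c.object_id = sc.major_id AND c.column_id = sc.minor_id;
""")
for row in cursor.fetchall():
    print(row)
```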

Question #43

What should you recommend as a batch processing solution for Health Interface?

  • A . Azure CycleCloud
  • B . Azure Stream Analytics
  • C . Azure Data Factory
  • D . Azure Databricks

Correct Answer: B

Explanation:

Scenario: ADatum identifies the following requirements for the Health Interface application:

Support a more scalable batch processing solution in Azure.

Reduce the amount of time it takes to add data from new hospitals to Health Interface.

Data Factory integrates with the Azure Cosmos DB bulk executor library to provide the best performance when you write to Azure Cosmos DB.

References: https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db
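A pipeline built on this connector can also be triggered on demand, for example when a new hospital's data becomes available. A sketch using the azure-mgmt-datafactory Python SDK; the pipeline name and resource identifiers are hypothetical:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Start a run of an existing copy pipeline that loads a new hospital's data
# into the Health Interface Cosmos DB account.
run = adf_client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<data-factory>",
    pipeline_name="CopyHospitalDataToCosmosDb",  # hypothetical pipeline name
)
print(run.run_id)
```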

Question #44

HOTSPOT

You need to design the storage for the Health Insights data platform.

Which types of tables should you include in the design? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

Box 1: Hash-distributed tables

The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables.

Hash-distributed tables improve query performance on large fact tables.

Box 2: Round-robin distributed tables

A round-robin distributed table distributes table rows evenly across all distributions. The assignment of rows to distributions is random.

Scenario:

ADatum identifies the following requirements for the Health Insights application:

✑ The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables.

References: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute
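As a rough illustration of the two distribution choices, the following pyodbc sketch creates a hash-distributed fact table and a round-robin staging table; the connection string, schemas, and column definitions are hypothetical:

```python
import pyodbc

# Hypothetical Synapse SQL pool / SQL DW connection.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=<healthinsights>;UID=<user>;PWD=<password>;",
    autocommit=True,
)
cur = conn.cursor()

# Large fact table: hash-distribute on a high-cardinality join key.
cur.execute("""
CREATE TABLE dbo.FactPatientVisit
(
    VisitId     bigint NOT NULL,
    PatientKey  int    NOT NULL,
    HospitalKey int    NOT NULL,
    VisitDate   date   NOT NULL
)
WITH (DISTRIBUTION = HASH(PatientKey), CLUSTERED COLUMNSTORE INDEX);
""")

# Staging table with no obvious distribution key: round-robin.
# Assumes the stg schema already exists.
cur.execute("""
CREATE TABLE stg.PatientVisitLoad
(
    VisitId     bigint,
    PatientKey  int,
    HospitalKey int,
    VisitDate   date
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);
""")
```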


Question #45

Topic 5, Data Engineer for Trey Research

Overview

You are a data engineer for Trey Research. The company is close to completing a joint project with the government to build smart highways infrastructure across North America. This involves the placement of sensors and cameras to measure traffic flow, car speed, and vehicle details.

You have been asked to design a cloud solution that will meet the business and technical requirements of the smart highway.

Solution components

Telemetry Capture

The telemetry capture system records each time a vehicle passes in front of a sensor.

The sensors run on a custom embedded operating system and record the following telemetry data:

– Time

– Location in latitude and longitude

– Speed in kilometers per hour (kmph)

– Length of vehicle in meters

Visual Monitoring

The visual monitoring system is a network of approximately 1,000 cameras placed near highways that capture images of vehicle traffic every 2 seconds. The cameras record high resolution images. Each image is approximately 3 MB in size.

Requirements: Business

The company identifies the following business requirements:

– External vendors must be able to perform custom analysis of data using machine learning technologies.

– You must display a dashboard on the operations status page that displays the following metrics: telemetry, volume, and processing latency.

– Traffic data must be made available to the Government Planning Department for the purpose of modeling changes to the highway system. The traffic data will be used in conjunction with other data such as information about events such as sporting events, weather conditions, and population statistics. External data used during the modeling is stored in on-premises SQL Server 2016 databases and CSV files stored in an Azure Data Lake Storage Gen2 storage account.

– Information about vehicles that have been detected as going over the speed limit during the last 30 minutes must be available to law enforcement officers. Several law enforcement organizations may respond to speeding vehicles.

– The solution must allow for searches of vehicle images by license plate to support law enforcement investigations. Searches must be able to be performed using a query language and must support fuzzy searches to compensate for license plate detection errors.

Requirements: Security

The solution must meet the following security requirements:

– External vendors must not have direct access to sensor data or images.

– Images produced by the vehicle monitoring solution must be deleted after one month. You must minimize costs associated with deleting images from the data store.

– Unauthorized usage of data must be detected in real time. Unauthorized usage is determined by looking for unusual usage patterns.

– All changes to Azure resources used by the solution must be recorded and stored. Data must be provided to the security team for incident response purposes.

Requirements: Sensor data

You must write all telemetry data to the closest Azure region. The sensors used for the telemetry capture system have a small amount of memory available and so must write data as quickly as possible to avoid losing telemetry data.

You need to design the storage for the telemetry capture system.

What storage solution should you use in the design?

  • A . Azure SQL Data Warehouse
  • B . Azure Databricks
  • C . Azure Cosmos DB

Correct Answer: C

Explanation:

Azure Cosmos DB is a globally distributed database service. You can associate any number of Azure regions with your Azure Cosmos account and your data is automatically and transparently replicated.

Scenario:

Telemetry Capture

The telemetry capture system records each time a vehicle passes in front of a sensor. The sensors run on a custom embedded operating system and record the following telemetry data:

✑ Time

✑ Location in latitude and longitude

✑ Speed in kilometers per hour (kmph)

✑ Length of vehicle in meters

You must write all telemetry data to the closest Azure region. The sensors used for the telemetry capture system have a small amount of memory available and so must write data as quickly as possible to avoid losing telemetry data.

Reference: https://docs.microsoft.com/en-us/azure/cosmos-db/regional-presence
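A minimal telemetry-write sketch with the azure-cosmos Python SDK; the endpoint, key, container, and preferred region are placeholders, and multi-region writes are assumed to be enabled on the account so each sensor gateway can write to its nearest region:

```python
import uuid
from azure.cosmos import CosmosClient

# preferred_locations steers requests to the closest configured region.
client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    credential="<account-key>",
    preferred_locations=["<closest-region>"],
)
container = client.get_database_client("telemetry").get_container_client("SensorData")

reading = {
    "id": str(uuid.uuid4()),
    "sensorId": "sensor-0042",          # hypothetical partition key value
    "time": "2021-06-15T14:35:22Z",
    "location": {"lat": 47.61, "lon": -122.33},
    "speedKmph": 97.4,
    "vehicleLengthMeters": 4.3,
}
container.create_item(body=reading)
```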

Question #46

You need to design the storage for the visual monitoring system.

Which storage solution should you recommend?

  • A . Azure Blob storage
  • B . Azure Table storage
  • C . Azure SQL database
  • D . Azure Media Services

Correct Answer: A

Explanation:

Azure Blobs: A massively scalable object store for text and binary data.

Scenario:

✑ The visual monitoring system is a network of approximately 1,000 cameras placed near highways that capture images of vehicle traffic every 2 seconds. The cameras record high resolution images. Each image is approximately 3 MB in size.

✑ The solution must allow for searches of vehicle images by license plate to support law enforcement investigations. Searches must be able to be performed using a query language and must support fuzzy searches to compensate for license plate detection errors.

Reference:

https://docs.microsoft.com/en-us/azure/storage/common/storage-introduction
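An upload sketch using the azure-storage-blob Python SDK; the connection string, container, blob naming convention, and the idea of stamping the detected plate as blob metadata (to feed a downstream search index) are all assumptions:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("vehicle-images")

# Hypothetical naming convention: camera id + capture timestamp.
blob_name = "cam-0042/2021/06/15/143522.jpg"

with open("frame.jpg", "rb") as image:
    container.upload_blob(
        name=blob_name,
        data=image,
        overwrite=True,
        metadata={"licensePlate": "ABC1234"},  # detected plate, indexed later for fuzzy search
    )
```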

Question #47

You need to design the unauthorized data usage detection system.

What Azure service should you include in the design?

  • A . Azure Databricks
  • B . Azure SQL Data Warehouse
  • C . Azure Analysis Services
  • D . Azure Data Factory

Correct Answer: B

Explanation:

SQL Database and SQL Data Warehouse

Advanced Threat Protection for Azure SQL Database and SQL Data Warehouse detects anomalous activities indicating unusual and potentially harmful attempts to access or exploit databases.

Scenario:

Requirements. Security

The solution must meet the following security requirements:

Unauthorized usage of data must be detected in real time. Unauthorized usage is determined by looking for unusual usage patterns.

Reference:

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-threat-detection-overview

Question #48

DRAG DROP

You need to design the system for notifying law enforcement officers about speeding vehicles.

How should you design the pipeline? To answer, drag the appropriate services to the correct locations. Each service may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

Scenario:

Information about vehicles that have been detected as going over the speed limit during the last 30 minutes must be available to law enforcement officers. Several law enforcement organizations may respond to speeding vehicles.

Telemetry Capture

The telemetry capture system records each time a vehicle passes in front of a sensor.

The sensors run on a custom embedded operating system and record the following telemetry data:

– Time

– Location in latitude and longitude

– Speed in kilometers per hour (kmph)

– Length of vehicle in meters


Question #49

You need to design the solution for the government planning department.

Which services should you include in the design?

  • A . Azure SQL Data Warehouse and Elastic Queries
  • B . Azure SQL Database and Polybase
  • C . Azure SQL Data Warehouse and Polybase
  • D . Azure SQL Database and Elastic Queries

Correct Answer: C

Explanation:

PolyBase, introduced in SQL Server 2016, is used to query relational and non-relational data, such as CSV files, alongside data in the database.

Scenario: Traffic data must be made available to the Government Planning Department for the purpose of modeling changes to the highway system. The traffic data will be used in conjunction with other data such as information about events such as sporting events, weather conditions, and population statistics. External data used during the modeling is stored in on-premises SQL Server 2016 databases and CSV files stored in an Azure Data Lake Storage Gen2 storage account.

Reference: https://www.sqlshack.com/sql-server-2016-polybase-tutorial/
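A sketch of the PolyBase objects that expose the Data Lake Storage Gen2 CSV files to the data warehouse, executed here through pyodbc; the data source location, file format, and table definition are hypothetical, and a database-scoped credential named LakeCredential plus an ext schema are assumed to already exist:

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=<planningdw>;UID=<user>;PWD=<password>;",
    autocommit=True,
)
cur = conn.cursor()

cur.execute("""
CREATE EXTERNAL DATA SOURCE PlanningLake
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net',
    CREDENTIAL = LakeCredential  -- database-scoped credential assumed to exist
);
""")

cur.execute("""
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)
);
""")

cur.execute("""
CREATE EXTERNAL TABLE ext.SportingEvents
(
    EventDate date,
    EventName nvarchar(200),
    ExpectedAttendance int
)
WITH (
    LOCATION = '/events/',
    DATA_SOURCE = PlanningLake,
    FILE_FORMAT = CsvFormat
);
""")
```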

Question #50

Topic 6, Litware Case

Case study

This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.

To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.

At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study

To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Overview

Litware, Inc. owns and operates 300 convenience stores across the US. The company sells a variety of packaged foods and drinks, as well as a variety of prepared foods, such as sandwiches and pizzas.

Litware has a loyalty club whereby members can get daily discounts on specific items by providing their membership number at checkout.

Litware employs business analysts who prefer to analyze data by using Microsoft Power BI, and data scientists who prefer analyzing data in Azure Databricks notebooks.

Requirements. Business Goals

Litware wants to create a new analytics environment in Azure to meet the following requirements:

– See inventory levels across the stores. Data must be updated as close to real time as possible.

– Execute ad hoc analytical queries on historical data to identify whether the loyalty club discounts increase sales of the discounted products.

– Every four hours, notify store employees about how many prepared food items to produce based on historical demand from the sales data.

Requirements. Technical Requirements

Litware identifies the following technical requirements:

– Minimize the number of different Azure services needed to achieve the business goals

– Use platform as a service (PaaS) offerings whenever possible and avoid having to provision virtual machines that must be managed by Litware.

– Ensure that the analytical data store is accessible only to the company’s on-premises network and Azure services.

– Use Azure Active Directory (Azure AD) authentication whenever possible.

– Use the principle of least privilege when designing security.

– Stage inventory data in Azure Data Lake Storage Gen2 before loading the data into the analytical data store. Litware wants to remove transient data from Data Lake Storage once the data is no longer in use. Files that have a modified date that is older than 14 days must be removed.

– Limit the business analysts’ access to customer contact information, such as phone numbers, because this type of data is not analytically relevant.

– Ensure that you can quickly restore a copy of the analytical data store within one hour in the event of corruption or accidental deletion.

Requirements. Planned Environment

Litware plans to implement the following environment:

– The application development team will create an Azure event hub to receive real-time sales data, including store number, date, time, product ID, customer loyalty number, price, and discount amount, from the point of sale (POS) system and output the data to data storage in Azure.

– Customer data, including name, contact information, and loyalty number, comes from Salesforce and can be imported into Azure once every eight hours. Row modified dates are not trusted in the source table.

– Product data, including product ID, name, and category, comes from Salesforce and can be imported into Azure once every eight hours. Row modified dates are not trusted in the source table.

– Daily inventory data comes from a Microsoft SQL server located on a private network.

– Litware currently has 5 TB of historical sales data and 100 GB of customer data. The company expects approximately 100 GB of new data per month for the next year.

– Litware will build a custom application named FoodPrep to provide store employees with the calculation results of how many prepared food items to produce every four hours.

– Litware does not plan to implement Azure ExpressRoute or a VPN between the on-premises network and Azure.

What should you do to improve high availability of the real-time data processing solution?

  • A . Deploy identical Azure Stream Analytics jobs to paired regions in Azure.
  • B . Deploy a High Concurrency Databricks cluster.
  • C . Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.
  • D . Set Data Lake Storage to use geo-redundant storage (GRS).

Correct Answer: A

Explanation:

Guarantee Stream Analytics job reliability during service updates

Part of being a fully managed service is the capability to introduce new service functionality and improvements at a rapid pace. As a result, Stream Analytics can have a service update deploy on a weekly (or more frequent) basis. No matter how much testing is done there is still a risk that an existing, running job may break due to the introduction of a bug. If you are running mission critical jobs, these risks need to be avoided. You can reduce this risk by following Azure’s paired region model.

Scenario: The application development team will create an Azure event hub to receive real-time sales data, including store number, date, time, product ID, customer loyalty number, price, and discount amount, from the point of sale (POS) system and output the data to data storage in Azure

Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-job-reliability

Question #51

Inventory levels must be calculated by subtracting the current day’s sales from the previous day’s final inventory.

Which two options provide Litware with the ability to quickly calculate the current inventory levels by store and product? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.

  • A . Consume the output of the event hub by using Azure Stream Analytics and aggregate the data by store and product. Output the resulting data directly to Azure Synapse Analytics. Use Transact-SQL to calculate the inventory levels.
  • B . Output Event Hubs Avro files to Azure Blob storage. Use Transact-SQL to calculate the inventory levels by using PolyBase in Azure Synapse Analytics.
  • C . Consume the output of the event hub by using Databricks. Use Databricks to calculate the inventory levels and output the data to Azure Synapse Analytics.
  • D . Consume the output of the event hub by using Azure Stream Analytics and aggregate the data by store and product. Output the resulting data into Databricks. Calculate the inventory levels in Databricks and output the data to Azure Blob storage.
  • E . Output Event Hubs Avro files to Azure Blob storage. Trigger an Azure Data Factory copy activity to run every 10 minutes to load the data into Azure Synapse Analytics. Use Transact-SQL to aggregate the data by store and product.

Correct Answer: A,E

Explanation:

A: Azure Stream Analytics is a fully managed service providing low-latency, highly available, scalable complex event processing over streaming data in the cloud. You can use your Azure SQL Data Warehouse database as an output sink for your Stream Analytics jobs.

E: Event Hubs Capture is the easiest way to get data into Azure. Using Azure Data Lake, Azure Data Factory, and Azure HDInsight, you can perform batch processing and other analytics using familiar tools and platforms of your choosing, at any scale you need.

Note: Event Hubs Capture creates files in Avro format.

Captured data is written in Apache Avro format: a compact, fast, binary format that provides rich data structures with inline schema. This format is widely used in the Hadoop ecosystem, Stream Analytics, and Azure Data Factory.

Scenario: The application development team will create an Azure event hub to receive real-time sales data, including store number, date, time, product ID, customer loyalty number, price, and discount amount, from the point of sale (POS) system and output the data to data storage in Azure.

Reference:

https://docs.microsoft.com/bs-latn-ba/azure/sql-data-warehouse/sql-data-warehouse-integrate-azure-stream-analytics

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-overview
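The inventory calculation itself (current inventory = previous day's final inventory minus the current day's sales) reduces to a simple join once the aggregated sales land in the warehouse. A minimal sketch via pyodbc, assuming hypothetical dbo.DailyInventory and dbo.SalesByStoreProduct tables:

```python
import pyodbc

# Hypothetical Synapse SQL pool connection and table names.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=<litwaredw>;UID=<user>;PWD=<password>;",
    autocommit=True,
)

sql = """
SELECT i.StoreId,
       i.ProductId,
       i.FinalInventory - ISNULL(s.UnitsSold, 0) AS CurrentInventory
FROM dbo.DailyInventory AS i                       -- previous day's final inventory
LEFT JOIN (
    SELECT StoreId, ProductId, SUM(Quantity) AS UnitsSold
    FROM dbo.SalesByStoreProduct                   -- aggregated sales streamed in today
    WHERE SaleDate = CAST(GETDATE() AS date)
    GROUP BY StoreId, ProductId
) AS s
    ON s.StoreId = i.StoreId AND s.ProductId = i.ProductId
WHERE i.InventoryDate = CAST(DATEADD(day, -1, GETDATE()) AS date);
"""

for row in conn.cursor().execute(sql):
    print(row.StoreId, row.ProductId, row.CurrentInventory)
```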

Question #52

HOTSPOT

Which Azure service and feature should you recommend using to manage the transient data for Data Lake Storage? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

Scenario: Stage inventory data in Azure Data Lake Storage Gen2 before loading the data into the analytical data store. Litware wants to remove transient data from Data Lake Storage once the data is no longer in use. Files that have a modified date that is older than 14 days must be removed.

Service: Azure Data Factory

Clean up files by built-in delete activity in Azure Data Factory (ADF).

The ADF built-in Delete activity can be part of your ETL workflow and deletes undesired files without writing code. You can use ADF to delete folders or files from Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, File System, FTP Server, sFTP Server, and Amazon S3.

You can delete expired files only rather than deleting all the files in one folder. For example, you may want to only delete the files which were last modified more than 13 days ago.

Feature: Delete Activity
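The ADF Delete activity needs no code, but the equivalent cleanup logic can be sketched with the azure-storage-file-datalake Python SDK to make the 14-day rule concrete; the account, file system, and folder names are placeholders:

```python
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("staging")  # hypothetical file system name

# Delete staged inventory files whose modified date is older than 14 days.
cutoff = datetime.now(timezone.utc) - timedelta(days=14)
for path in fs.get_paths(path="inventory", recursive=True):
    if not path.is_directory and path.last_modified < cutoff:
        fs.get_file_client(path.name).delete_file()
```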


Question #53

Which Azure service should you recommend for the analytical data store so that the business analysts and data scientists can execute ad hoc queries as quickly as possible?

  • A . Azure Data Lake Storage Gen2
  • B . Azure Cosmos DB
  • C . Azure SQL Database
  • D . Azure SQL Data Warehouse

Correct Answer: A

Explanation:

There are several differences between a data lake and a data warehouse. Data structure, ideal users, processing methods, and the overall purpose of the data are the key differentiators.

Scenario: Litware employs business analysts who prefer to analyze data by using Microsoft Power BI, and data scientists who prefer analyzing data in Azure Databricks notebooks.


Question #54

HOTSPOT

Which Azure Data Factory components should you recommend using together to import the customer data from Salesforce to Data Lake Storage? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

Box 1: Self-hosted integration runtime

A self-hosted IR is capable of running a copy activity between a cloud data store and a data store in a private network.

Box 2: Schedule trigger

Schedule every 8 hours

Box 3: Copy activity

Scenario:

✑ Customer data, including name, contact information, and loyalty number, comes from Salesforce and can be imported into Azure once every eight hours. Row modified dates are not trusted in the source table.

✑ Product data, including product ID, name, and category, comes from Salesforce and can be imported into Azure once every eight hours. Row modified dates are not trusted in the source table.


Question #55

HOTSPOT

Which Azure Data Factory components should you recommend using together to import the daily inventory data from SQL to Data Lake Storage? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

Box 1: Self-hosted integration runtime

A self-hosted IR is capable of running a copy activity between a cloud data store and a data store in a private network.

Scenario: Daily inventory data comes from a Microsoft SQL server located on a private network.

Box 2: Schedule trigger

Daily schedule

Box 3: Copy activity

Scenario:

Stage inventory data in Azure Data Lake Storage Gen2 before loading the data into the analytical data store. Litware wants to remove transient data from Data Lake Storage once the data is no longer in use. Files that have a modified date that is older than 14 days must be removed.


Question #56

Topic 7, Misc. Questions

HOTSPOT

You manage an on-premises server named Server1 that has a database named Database1. The company purchases a new application that can access data from Azure SQL Database.

You recommend a solution to migrate Database1 to an Azure SQL Database instance.

What should you recommend? To answer, select the appropriate configuration in the answer area. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-import


Question #57

You are designing a statistical analysis solution that will use custom proprietary Python functions on near real-time data from Azure Event Hubs.

You need to recommend which Azure service to use to perform the statistical analysis. The solution must minimize latency.

What should you recommend?

  • A . Azure Synapse Analytics
  • B . Azure Stream Analytics
  • C . Azure Databricks
  • D . Azure SQL Database

Correct Answer: B

Explanation:

Reference: https://docs.microsoft.com/en-us/azure/event-hubs/process-data-azure-stream-analytics

Question #58

You plan to create an Azure Synapse Analytics dedicated SQL pool.

You need to minimize the time it takes to identify queries that return confidential information as defined by the company’s data privacy regulations and the users who executed the queries.

Which two components should you include in the solution? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

  • A . dynamic data masking for columns that contain confidential information
  • B . sensitivity-classification labels applied to columns that contain confidential information
  • C . resource tags for databases that contain confidential information
  • D . audit logs sent to a Log Analytics workspace

Correct Answer: B,D

Explanation:

Reference: https://www.sqlshack.com/understanding-azure-synapse-analytics-formerly-sql-dw/

Question #59

HOTSPOT

You have an Azure Data Lake Storage Gen2 account named account1 that stores logs as shown in the following table.

You do not expect that the logs will be accessed during the retention periods.

You need to recommend a solution for account1 that meets the following requirements:

✑ Automatically deletes the logs at the end of each retention period

✑ Minimizes storage costs

What should you include in the recommendation? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

Box 1: Store the infrastructure logs in the Cool access tier and the application logs in the Archive access tier.

Cool – Optimized for storing data that is infrequently accessed and stored for at least 30 days.

Archive – Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements, on the order of hours.

Box 2: Azure Blob storage lifecycle management rules

Blob storage lifecycle management offers a rich, rule-based policy that you can use to transition your data to the best access tier and to expire data at the end of its lifecycle.
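A sketch of such a lifecycle policy applied through the azure-mgmt-storage Python SDK; the prefixes and day thresholds are placeholders, since the actual retention periods come from the table in the question (not reproduced here):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Hypothetical rules: one per log type, each deleting blobs once its retention
# period (daysAfterModificationGreaterThan) has elapsed.
policy = {
    "policy": {
        "rules": [
            {
                "name": "delete-infrastructure-logs",
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/infrastructure"]},
                    "actions": {"baseBlob": {"delete": {"daysAfterModificationGreaterThan": 60}}},
                },
            },
            {
                "name": "delete-application-logs",
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/application"]},
                    "actions": {"baseBlob": {"delete": {"daysAfterModificationGreaterThan": 360}}},
                },
            },
        ]
    }
}

client.management_policies.create_or_update("<resource-group>", "account1", "default", policy)
```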


Question #60

HOTSPOT

You are designing a solution for a company. You plan to use Azure Databricks.

You need to recommend workloads and tiers to meet the following requirements:

✑ Provide managed clusters for running production jobs.

✑ Provide persistent clusters that support auto-scaling for analytics processes.

✑ Provide role-based access control (RBAC) support for Notebooks.

What should you recommend? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

Box 1: Data Engineering Only

Box 2: Data Engineering and Data Analytics

Box 3: Standard

Box 4: Data Analytics only

Box 5: Premium

The Premium tier is required for RBAC. The Data Analytics workload in the Premium tier provides interactive workloads to analyze data collaboratively with notebooks.

References: https://azure.microsoft.com/en-us/pricing/details/databricks/


Question #61

HOTSPOT

You have an Azure subscription that contains a logical Microsoft SQL server named Server1. Server1 hosts an Azure Synapse Analytics dedicated SQL pool named Pool1. You need to recommend a Transparent Data Encryption (TDE) solution for Server1.

The solution must meet the following requirements:

• Track the usage of encryption keys.

• Maintain the access of client apps to Pool1 in the event of an Azure datacenter outage that affects the availability of the encryption keys.

What should you include in the recommendation? To answer, select the appropriate options in the answer area.

Correct Answer:


Question #62

HOTSPOT

A company stores large datasets in Azure, including sales transactions and customer account information.

You must design a solution to analyze the data.

You plan to create the following HDInsight clusters:

You need to ensure that the clusters support the query requirements.

Which cluster types should you recommend? To answer, select the appropriate configuration in the answer area. NOTE: Each correct selection is worth one point.

Correct Answer:

Explanation:

Box 1: Interactive Query

Choose Interactive Query cluster type to optimize for ad hoc, interactive queries.

Box 2: Hadoop

Choose Apache Hadoop cluster type to optimize for Hive queries used as a batch process.

Note: In Azure HDInsight, there are several cluster types and technologies that can run Apache Hive queries. When you create your HDInsight cluster, choose the appropriate cluster type to help optimize performance for your workload needs.

For example, choose Interactive Query cluster type to optimize for ad hoc, interactive queries. Choose Apache Hadoop cluster type to optimize for Hive queries used as a batch process. Spark and HBase cluster types can also run Hive queries.

References: https://docs.microsoft.com/bs-latn-ba/azure/hdinsight/hdinsight-hadoop-optimize-hive-query?toc=%2Fko-kr%2Fazure%2Fhdinsight%2Finteractive-query%2FTOC.json&bc=%2Fbs-latn-ba%2Fazure%2Fbread%2Ftoc.json


Question #63

A company is developing a mission-critical line of business app that uses Azure SQL Database Managed Instance. You must design a disaster recovery strategy for the solution.

You need to ensure that the database automatically recovers when full or partial loss of the Azure SQL Database service occurs in the primary region.

What should you recommend?

  • A . Failover-group
  • B . Azure SQL Data Sync
  • C . SQL Replication
  • D . Active geo-replication

Correct Answer: A

Explanation:

Auto-failover groups is a SQL Database feature that allows you to manage replication and failover of a group of databases on a SQL Database server or all databases in a Managed Instance to another region (currently in public preview for Managed Instance). It uses the same underlying technology as active geo-replication. You can initiate failover manually or you can delegate it to the SQL Database service based on a user-defined policy.

References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group
