A medical company needs to copy sensitive data from a relational database to BigQuery. The total size of the database is 15 TB. They need to design a solution that is secure and time-efficient. What should you advise them?
The latest changes and updates from the administration for this exam.
Latest Update: Jun 08 2026
Your team wants to integrate an internal company application with BigQuery so that employees can run queries directly from the application interface. You need to access BigQuery securely from the application, without requiring individual employees to authenticate to BigQuery and without giving them access to the dataset. What should you recommend?
Your company is building a new voice recognition system for its call centers. The system must process large amounts of audio data from customer calls to transcribe speech, identify speaker changes, and detect specific keywords in real time. The system must meet the following requirements:
scalable and performant
low latency and high throughput
real-time audio processing and analysis
secure storage and processing of sensitive audio data
ability to update voice recognition models in real time
ability to store and analyze audio data over long periods of time
ability to handle large amounts of audio data
Which Google Cloud API would you recommend the most for this system?
Solarex is a Software as a Service (SaaS) company specializing in renewable energy sources, mainly photovoltaics. It collects streaming time series data from tens of thousands of solar panels around the world. The solar panels are owned and operated by 150 different companies, which are Solarex's main customers. The data will be stored in Bigtable using a multitenant database: all customer data will be stored in the same database. The data sent from each solar panel includes a solar panel ID (globally unique), a timestamp, and several metrics about performance. Each client will only query their own data. What row key should Solarex use?
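To make the row key trade-offs in the Solarex scenario concrete, here is a minimal sketch of one possible layout, not a confirmed answer key. It assumes a key of the form `customer_id#panel_id#timestamp`, so that each customer's rows share a prefix and can be read with a prefix scan. The function names, the `#` delimiter, and the zero-padded millisecond timestamp are illustrative assumptions; the sketch builds keys as plain bytes and does not call the Bigtable client library.

```python
def make_row_key(customer_id: str, panel_id: str, timestamp_ms: int) -> bytes:
    """Build a hypothetical row key: customer_id#panel_id#timestamp.

    Leading with the customer ID keeps each tenant's rows contiguous.
    The timestamp is zero-padded so byte order matches numeric order.
    """
    for part in (customer_id, panel_id):
        if "#" in part:
            raise ValueError("IDs must not contain the '#' delimiter")
    if timestamp_ms < 0:
        raise ValueError("timestamp_ms must be non-negative")
    return f"{customer_id}#{panel_id}#{timestamp_ms:013d}".encode("utf-8")


def customer_prefix(customer_id: str) -> bytes:
    """Prefix that a per-customer scan would use (assumed layout)."""
    return f"{customer_id}#".encode("utf-8")


# Example: keys from two customers sort into per-customer groups.
keys = sorted(
    [
        make_row_key("acme", "panel-7", 1_700_000_000_500),
        make_row_key("zenith", "panel-1", 1_700_000_000_000),
        make_row_key("acme", "panel-7", 1_700_000_000_000),
    ]
)
acme_keys = [k for k in keys if k.startswith(customer_prefix("acme"))]
```

Putting the timestamp last, rather than first, avoids concentrating all current writes on one key range, which is the usual concern with timestamp-leading keys.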
Your company processes and stores large volumes of heterogeneous data from various global sources, including structured data, unstructured data, and real-time streaming data. The data analysis team requires real-time access to this data for quick decision making and also demands the ability to run complex SQL queries. At the same time, the company expects data to be highly durable and expects optimized costs for long-term storage. As a data engineer, which data storage strategy would you recommend for this scenario?
When you create a Cloud Spanner instance, you specify its compute capacity as a number of processing units or as a number of nodes (1000 processing units is equal to 1 node). You want to deploy a Cloud Spanner database that can have up to 10 TB of data. What is the minimum number of nodes or processing units required?
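The sizing arithmetic behind the Cloud Spanner question can be sketched as follows. The only fact taken from the question is that 1000 processing units equal 1 node. The per-node storage limit is not given in the question and has changed over time, so it is a parameter here, and the values used below are illustrative assumptions to be checked against the current Cloud Spanner quotas documentation. The sketch also rounds up to whole nodes for simplicity.

```python
import math

PROCESSING_UNITS_PER_NODE = 1000  # stated in the question


def min_compute_for_storage(data_tb: float, storage_tb_per_node: float) -> tuple[int, int]:
    """Return (nodes, processing_units) needed to hold data_tb of data.

    storage_tb_per_node is an assumed limit supplied by the caller;
    the result is rounded up to a whole number of nodes.
    """
    if data_tb <= 0 or storage_tb_per_node <= 0:
        raise ValueError("sizes must be positive")
    nodes = math.ceil(data_tb / storage_tb_per_node)
    return nodes, nodes * PROCESSING_UNITS_PER_NODE


# Illustrative limits only: the answer depends on which limit applies.
with_4_tb_limit = min_compute_for_storage(10, 4)    # (3, 3000)
with_10_tb_limit = min_compute_for_storage(10, 10)  # (1, 1000)
```

Storage is only a lower bound on capacity; CPU utilization targets may call for more compute than this calculation suggests.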
Your organization is developing a recommendation engine that leverages Machine Learning models. The data volume is high and the models need to be retrained periodically. The data primarily includes user interactions and product details. Given that you need to store this data, allowing for efficient analysis and model retraining, which data storage and model should you adopt?
As a data engineer, you are responsible for designing the storage layer for an analytics Hadoop cluster in a region with high data access within your organization. Your goal is to execute multiple jobs on a nightly basis, and you have identified Cloud Storage as the preferred solution. What is the most economical choice for this scenario?
A company is using Bigtable in an IoT application. As a data engineer, you are investigating long latencies in query response time. The Key Visualizer heatmap shows two areas with hotspots. What could be the cause of hotspots?
Your company has developed a complex ML model to predict stock prices in real time. The model needs to serve predictions with minimal latency to thousands of concurrent users. Considering these requirements, which of the following serving infrastructure options would be the most appropriate?