You work on a data science team at a bank and are creating an ML model to predict loan default risk. You have collected and cleaned hundreds of millions of records of training data in a BigQuery table, and you now want to develop and compare multiple models on this data using TensorFlow and Vertex AI. You want to minimize any bottlenecks during the data ingestion stage while considering scalability.

What should you do?
A. Use the BigQuery client library to load data into a dataframe, and use tf.data.Dataset.from_tensor_slices() to read it.
B. Export data to CSV files in Cloud Storage, and use tf.data.TextLineDataset() to read them.
C. Convert the data into TFRecords, and use tf.data.TFRecordDataset() to read them.
D. Use TensorFlow I/O’s BigQuery Reader to directly read the data.

Answer: D

Explanation:

TensorFlow I/O’s BigQuery Reader reads data directly from BigQuery tables into your TensorFlow input pipeline, with no intermediate export to another file format. This removes the export and conversion steps as potential bottlenecks during data ingestion, and it scales well because the reader streams rows in parallel through the BigQuery Storage API. By contrast, loading hundreds of millions of rows into an in-memory dataframe (option A) does not scale, and exporting to CSV (option B) or converting to TFRecords (option C) adds an extra storage and conversion step before training can begin.

Concretely, you use tfio.bigquery.BigQueryClient to create a read session on the table; calling parallel_read_rows() on the session returns a tf.data.Dataset whose elements are dictionaries, where each key is a selected column name and each value is that column’s value for the row.
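A minimal sketch of such a pipeline is shown below, following the pattern from the tensorflow-io BigQuery tutorial. The project, dataset, table, column names, and dtypes here are hypothetical placeholders for illustration, not part of the question.

```python
import tensorflow as tf
from tensorflow_io.bigquery import BigQueryClient

# Hypothetical identifiers and schema, assumed for this sketch.
PROJECT_ID = "my-project"
DATASET_ID = "lending"
TABLE_ID = "loan_training_data"
SELECTED_FIELDS = ["loan_amount", "credit_score", "defaulted"]
OUTPUT_TYPES = [tf.float64, tf.int64, tf.int64]

# Create a read session that streams rows directly from BigQuery.
client = BigQueryClient()
read_session = client.read_session(
    "projects/" + PROJECT_ID,
    PROJECT_ID,
    TABLE_ID,
    DATASET_ID,
    SELECTED_FIELDS,
    OUTPUT_TYPES,
    requested_streams=4,  # parallel read streams for throughput
)

# Each element is a dict mapping column name -> scalar tensor.
dataset = read_session.parallel_read_rows()

def to_features_and_label(row):
    # Split the assumed label column off from the feature columns.
    label = row.pop("defaulted")
    return row, label

train_ds = (
    dataset
    .map(to_features_and_label, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(10_000)
    .batch(1024)
    .prefetch(tf.data.AUTOTUNE)
)
```

Because the reader streams rows in parallel straight from BigQuery, the same pipeline works unchanged whether the table holds thousands or hundreds of millions of rows.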
