IBM C2090-101 IBM Big Data Engineer Online Training

5 years ago

Question #1

Which statement is TRUE concerning optimizing the load performance?

A . You can improve the performance by increasing the number of map tasks assigned to the load
B . When loading large files, the number of files that you load does not impact the performance of the LOAD HADOOP statement
C . You can improve the performance by decreasing the number of map tasks that are assigned to the load and adjusting the heap size
D . It is advantageous to run the LOAD HADOOP statement directly pointing to large files located in the host file system as opposed to copying the files to the DFS prior to load

Correct Answer: A

Explanation:

Reference: https://www.ibm.com/support/knowledgecenter/en/SSCRJT_5.0.3/com.ibm.swg.im.bigsql.doc/doc/bigsql_loadperf.html

Question #2

Which of the following statements are TRUE regarding the use of Data Click to load data into BigInsights? (Choose two.)

A . Big SQL cannot be used to access the data moved in by Data Click because the data is in Hive
B . You must import metadata for all sources and targets that you want to make available for Data Click activities
C . Connections from the relational database source to HDFS are discovered automatically from within Data Click
D . Hive tables are automatically created every time you run an activity that moves data from a relational database into HDFS
E . HBase tables are automatically created every time you run an activity that moves data from a relational database into HDFS

Correct Answer: CE

Explanation:

Reference: https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.3.0/com.ibm.swg.im.iis.dataclick.doc/topics/hivetables.html

Question #3

Which of the following statements regarding importing streaming data from InfoSphere Streams into Hadoop is TRUE?

A . InfoSphere Streams can both read from and write data to HDFS
B . The Streams Big Data toolkit operators that interface with HDFS use Apache Flume to integrate with Hadoop
C . Streams applications never need to be concerned with making the data schemas consistent with those on Hadoop
D . Big SQL can be used to preprocess the data as it flows through InfoSphere Streams before the data lands in HDFS

Correct Answer: D

Question #4

Which of the following is TRUE about storing an Apache Spark object in serialized form?

A . It is advised to use Java serialization over Kryo serialization
B . Storing the object in serialized form will lead to faster access times
C . Storing the object in serialized form will lead to slower access times
D . All of the above

Correct Answer: C

Explanation:

Reference: https://spark.apache.org/docs/latest/rdd-programming-guide.html

Question #5

Which ONE of the following statements regarding Sqoop is TRUE?

A . HBase is not supported as an import target
B . Data imported using Sqoop is always written to a single Hive partition
C . Sqoop can be used to retrieve rows newer than some previously imported set of rows
D . Sqoop can only append new rows to a database table when exporting back to a database

Correct Answer: C

Explanation:

Reference: https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html
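
The idea behind option C (retrieving only rows newer than a previously imported set) can be sketched in plain Python. This is a toy model of the selection logic only, not Sqoop itself: it assumes an append-style import that keeps rows whose check column is greater than the last value seen. The function name `incremental_append` and the sample table are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def incremental_append(
    rows: List[Row], check_column: str, last_value: int
) -> Tuple[List[Row], int]:
    """Select rows newer than last_value and return the new high-water mark."""
    # Keep only rows whose check column exceeds the previously imported maximum.
    new_rows = [r for r in rows if r[check_column] > last_value]
    # The next run should start from the largest value imported so far.
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last


# Hypothetical source table; rows up to id 1 were imported by an earlier run.
source_table = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}, {"id": 3, "qty": 2}]
imported, last_seen = incremental_append(source_table, "id", 1)
print(imported)
print(last_seen)
```

Running the function again with the returned `last_seen` selects nothing until new rows arrive, which is the property that makes repeated imports cheap.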

Question #6

Which one of the following statements is TRUE?

A . Spark SQL does not support HiveQL
B . Spark SQL does not support ANSI SQL
C . To use Spark with Hive, HiveQL queries have to be rewritten in Scala
D . Spark SQL allows relational queries expressed in SQL, HiveQL, or Scala

Correct Answer: D

Explanation:

Reference: https://spark.apache.org/docs/1.2.1/sql-programming-guide.html (overview)

Question #7

Which of the following statements regarding Big SQL is TRUE?

A . Big SQL doesn’t support stored procedures
B . Big SQL can be deployed on a subset of data nodes in the BigInsights cluster
C . Big SQL provides a SQL-on-Hadoop environment based on MapReduce
D . Only tables created or loaded via Big SQL can be accessed via Big SQL

Correct Answer: B

Explanation:

Reference: https://books.google.com.pk/books?id=t13nCQAAQBAJ&pg=PA3&lpg=PA3&dq=Big+SQL+can+be+deployed+on+a+subset+of+data+nodes+in+the+BigInsights+cluster&source=bl&ots=RBbad0Xkel&sig=pMgmgDNLGUrkvOSXoVBj64xTMgk&hl=en&sa=X&redir_esc=y#v=onepage&q=Big%20SQL%20can%20be%20deployed%20on%20a%20subset%20of%20data%20nodes%20in%20the%20BigInsights%20cluster&f=false

Question #8

The number of partitions created by DynamicPartitions in Hive can be controlled by which of the following?

A . hive.exec.max.dynamic.partitions.pernode
B . hive.exec.max.dynamic.partitions
C . hive.exec.max.created.files
D . All of the above

Correct Answer: A

Explanation:

Reference: https://resources.zaloni.com/blog/partitioning-in-hive

Question #9

Which of the following Jaql operators groups one or more arrays based on key values and applies an aggregate expression?

A . join
B . group
C . expand
D . transform

Correct Answer: B

Explanation:

Reference: https://books.google.com.pk/books?id=Qj-5BQAAQBAJ&pg=PA174&lpg=PA174&dq=Jaq+operators+groups+one+or+more+arrays+based+on+key+values+and+applies+an+aggregate+expression&source=bl&ots=zobr8AZzWy&sig=ZRCIH9ee4Un3Aam1hX8TzxfrfQI&hl=en&sa=X&redir_esc=y#v=onepage&q=Jaq%20operators%20groups%20one%20or%20more%20arrays%20based%20on%20key%20values%20and%20applies%20an%20aggregate%20expression&f=false

Question #10

Which of the following are CRUD operations available in HBase? (Choose two.)

A . HTable.Put
B . HTable.Read
C . HTable.Delete
D . HTable.Update
E . HTable.Remove

Correct Answer: AC

Explanation:

Reference: https://www.tutorialspoint.com/hbase/hbase_client_api.htm

Question #11

Which statement is TRUE about Big SQL?

A . The table definition can include other attributes such as the primary key or check constraints
B . When using Big SQL, the CREATE TABLE statement cannot be embedded in an application program
C . If a sub-table is being defined, the authorization ID can be either the same as the owner of the root table or an equivalent
D . When defining a staging table associated with a materialized query table, the privileges held by the authorization ID of the statement only work with DBADM authority

Correct Answer: D

Explanation:

Reference: https://www.ibm.com/support/knowledgecenter/fi/SSCRJT_5.0.2/com.ibm.swg.im.bigsql.commsql.doc/doc/r0000927.html

Question #12

Which of the following statements is TRUE regarding search visualization with Apache Hue?

A . Hue submits MapReduce jobs to Oozie
B . No additional setup is required to secure your session cookies
C . Hue applications require some code to be installed on the client
D . The File Browser application allows you to perform keyword searches across your Hadoop data

Correct Answer: A

Question #13

A Resilient Distributed Dataset supports which of the following?

A . Creating a new dataset from an old one
B . Returning a computed value to the driver program
C . Both “Creating a new dataset from an old one” and “Returning a computed value to the driver program”
D . Neither “Creating a new dataset from an old one” nor “Returning a computed value to the driver program”

Correct Answer: C

Explanation:

Reference: https://spark.apache.org/docs/latest/rdd-programming-guide.html (RDD operations)
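
The two kinds of RDD operation named in the options (transformations that create a new dataset from an old one, and actions that return a computed value to the driver program) can be illustrated with a small pure-Python model. `MiniRDD` is a hypothetical toy class written for this sketch; it is not the Spark API and runs on a single machine.

```python
from functools import reduce
from typing import Any, Callable, Iterable, List


class MiniRDD:
    """Toy stand-in for an RDD; a sketch of the idea, not Spark."""

    def __init__(self, source: Callable[[], Iterable[Any]]):
        # The source is a function, so nothing is computed until an action runs.
        self._source = source

    # Transformations: each returns a NEW dataset and leaves the old one intact.
    def map(self, fn: Callable[[Any], Any]) -> "MiniRDD":
        return MiniRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "MiniRDD":
        return MiniRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the pipeline and hand a value back to the caller ("driver").
    def collect(self) -> List[Any]:
        return list(self._source())

    def reduce(self, fn: Callable[[Any, Any], Any]) -> Any:
        return reduce(fn, self._source())


numbers = MiniRDD(lambda: range(1, 6))
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
total = squares_of_evens.reduce(lambda a, b: a + b)
print(squares_of_evens.collect())
print(total)
```

In this model `filter` and `map` only build up a description of the work; the values are produced when `collect` or `reduce` is called.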

Question #14

In order for an SPSS Modeler stream to be incorporated for use in an InfoSphere Streams application leveraging SPSS Modeler Solution Publisher, you need to:

A . add a Type node
B . insert any Output node
C . add a Table node as the terminal node
D . Make the terminal node a scoring branch

Correct Answer: D

Question #15

Which of the following Hive data types is directly supported in Big SQL without any changes?

A . INT
B . STRING
C . STRUCT
D . BOOLEAN

Correct Answer: B

Explanation:

Reference: https://www.ibm.com/support/knowledgecenter/en/SSCRJT_5.0.1/com.ibm.swg.im.bigsql.dev.doc/doc/biga_numbers.html

Question #16

Which parameters are considered when configuring the Big Match algorithm?

A . Search and custom requirements
B . Accuracy, search, and performance
C . Adaptive weighting and standardization
D . Empirical components, accuracy, and performance

Correct Answer: B

Question #17

The GPFS implementation of the Data Management API is compliant with which Open Group storage management standard?

A . XSH
B . XBD
C . XDSM
D . X/Open

Correct Answer: C

Explanation:

Reference: https://www.ibm.com/support/knowledgecenter/en/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs400.doc/bl1dmp_intro.htm

Question #18

Which file formats support columnar data compression? (Choose two.)

A . Text
B . Avro
C . RCFile
D . Parquet
E . Sequence_text

Correct Answer: CD

Question #19

Which statement about the Jaql programming language is TRUE?

A . Jaql always produces a MapReduce job, but Combiner functionality is optional
B . Jaql includes the following operators: filter, extend, groupby, combine, and transform
C . Data that is read from multiple blocks (splits) is always processed in parallel by MapReduce
D . The read operator loads data from different sources and formats, and then converts this data into JSON format for internal processing by the Jaql interpreter

Correct Answer: C

Question #20

When we create a new table in Hive, which clause can be used in HiveQL to indicate the storage file format?

A . SAVE AS
B . MAKE AS
C . FORMAT AS
D . STORED AS

Correct Answer: D
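
As a rough illustration of where the STORED AS clause sits in a table definition, the sketch below assembles a HiveQL CREATE TABLE statement as a string. The helper `create_table_ddl`, the table name, and the columns are hypothetical, and the statement is only printed, not sent to Hive.

```python
from typing import List, Tuple


def create_table_ddl(
    table: str, columns: List[Tuple[str, str]], file_format: str
) -> str:
    """Build a CREATE TABLE statement that names the storage file format."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    # STORED AS comes after the column list and selects the file format.
    return f"CREATE TABLE {table} ({cols}) STORED AS {file_format}"


# Hypothetical table used only for illustration.
ddl = create_table_ddl("sales", [("id", "INT"), ("region", "STRING")], "PARQUET")
print(ddl)
```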

Question #21

Which of the following is not a capability of Pig?

A . Low-latency queries
B . Schemas are optional
C . Nested relational data model
D . A high level abstraction on top of MapReduce

Correct Answer: A

Explanation:

Reference: http://hadooptutorial.info/apache-pig-overview/

Question #22

Given a file named readme.txt, which command will copy the readme.txt file to the <user> directory on the HDFS?

A . hadoop fs -cp readme.txt hdfs://test.ibm.com:9000/<user>
B . hadoop fs -cp hdfs://test.ibm.com:9000/<user> readme.txt
C . hadoop fs -put readme.txt hdfs://test.ibm.com:9000/<user>
D . hadoop fs -put hdfs://test.ibm.com:9000/<user> readme.text

Correct Answer: C
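
The `hadoop fs -put` command takes the local source path first and the destination second. The sketch below only assembles that argument list; the helper name `build_put_command` is hypothetical, and nothing is executed, so it runs without a Hadoop installation.

```python
from typing import List


def build_put_command(local_path: str, hdfs_dest: str) -> List[str]:
    """Return the argv for copying a local file into HDFS with -put."""
    # Order matters: local source first, HDFS destination second.
    return ["hadoop", "fs", "-put", local_path, hdfs_dest]


# The <user> placeholder is kept exactly as it appears in the question.
cmd = build_put_command("readme.txt", "hdfs://test.ibm.com:9000/<user>")
print(" ".join(cmd))
```

On a machine with a Hadoop client installed, a list like this could be passed to `subprocess.run`; building it as a list rather than a single string avoids shell quoting problems.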

Question #23

Which of the following is the most effective method for improving query performance on large Hive tables?

A . Indexing
B . Bucketing
C . Partitioning
D . De-normalizing data

Correct Answer: B

Explanation:

Reference: https://dzone.com/articles/how-to-improve-hive-query-performance-with-hadoop
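
To make the bucketing option concrete, here is a minimal sketch of how rows might be assigned to buckets by hashing a key and taking the remainder by the bucket count. It assumes integer keys whose hash is the value itself; the functions `bucket_for` and `bucketize` are hypothetical and this is not Hive's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_for(key: int, num_buckets: int) -> int:
    """Map a key to a bucket index (assumption: hash of an int is the int)."""
    # Mask to a non-negative 31-bit value before taking the remainder.
    return (key & 0x7FFFFFFF) % num_buckets


def bucketize(keys: Iterable[int], num_buckets: int) -> Dict[int, List[int]]:
    """Group keys by the bucket they fall into."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for k in keys:
        buckets[bucket_for(k, num_buckets)].append(k)
    return dict(buckets)


layout = bucketize([10, 11, 12, 13, 14, 15], 4)
print(layout)
```

Because a given key always lands in the same bucket, a lookup or join on that key only needs to read one bucket instead of the whole table.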

Question #24

Which one of the following is NOT provided by the SerDe interface?

A . SerDe interface has to be built using C or C++ language
B . Allows SQL-style queries across data that is often not appropriate for a relational database
C . Serializer takes a Java object that Big SQL has been working with, and turns it into a format that BigSQL can write to HDFS
D . Deserializer interface takes a string or binary representation of a record, and translates it into a Java object that Big SQL can manipulate

Correct Answer: A

Question #25

Which of the following are capabilities of the Apache Spark project?

A . Large scale machine learning
B . Large scale graph processing
C . Live data stream processing
D . All of the above

Correct Answer: D

Explanation:

Reference: https://spark.apache.org/

Question #26

Which of the following Big SQL statements is valid?

A . CREATE TABLE t1 WITH CS;
B . WITH t1 AS (…)
(SELECT * FROM t1 WITH RR USE AND KEEP SHARE LOCKS)
UNION ALL
(SELECT * FROM t1 WITH UR);
C . SELECT deptno, deptname, mgrno FROM t1
WHERE admrdept = 'A00'
FOR READ ONLY WITH RS USE AND KEEP EXCLUSIVE LOCKS
D . ALTER TABLE t1 ALTER COLUMN deptname SET DATA TYPE VARCHAR(100) USE AND KEEP UPDATE LOCKS

Correct Answer: C

Question #27

Which of the following techniques is NOT employed by Big SQL to improve performance?

A . Query Optimization
B . Predicate pushdown
C . Compression efficiency
D . Load data into DB2 and return the data

Correct Answer: A

Explanation:

Reference: https://www.ibm.com/support/knowledgecenter/en/SSZLC2_7.0.0/com.ibm.commerce.developer.soa.doc/refs/rsdperformanceworkspaces.htm

Question #28

When embedding SPSS models within InfoSphere Streams, what SPSS product must be installed on the same machine with InfoSphere Streams?

A . SPSS Modeler
B . SPSS Solution Publisher
C . SPSS Accelerator for InfoSphere Streams
D . None, the SPSS software runs remotely to the Streams machine

Correct Answer: B

Question #29

Which of the following statements regarding Sqoop is TRUE? (Choose two.)

A . All columns in a table must be imported
B . Sqoop bypasses MapReduce for enhanced performance
C . Each row from a source table is represented as a separate record in HDFS
D . When using a password file, the file containing the password must reside in HDFS
E . Multiple options files can be specified when invoking Sqoop from the command line

Correct Answer: CE

Explanation:

Reference: https://data-flair.training/blogs/apache-sqoop-tutorial/

Question #30

Using Bulk Load in HBase to load large volumes of data will result in which of the following?

A . It will use less CPU but will use more network resource
B . It will use less network resource but more CPU
C . It will behave the same way as using the HBase API for loading large volumes of data
D . None of the above

Correct Answer: C