Cloudera CCA175 CCA Spark and Hadoop Developer Exam Online Training

Question #70

Import the departments table from MySQL into HDFS as a Parquet file in the departments_parquet directory.
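One possible way to approach this is a single Sqoop import with the --as-parquetfile option. This is a minimal sketch, not an official answer: the question gives no connection details, so the JDBC URL and user name below are assumptions, and -P simply prompts for the password.

```shell
# Hypothetical connection details: the question does not provide them.
DB_URL="jdbc:mysql://localhost:3306/retail_db"   # assumed host, port and database
DB_USER="retail_user"                             # assumed MySQL user

# Build the argument list for the Sqoop import.
set -- import \
  --connect "$DB_URL" \
  --username "$DB_USER" \
  -P \
  --table departments \
  --as-parquetfile \
  --target-dir departments_parquet

# Run the import where Sqoop is installed; otherwise print the command for review.
if command -v sqoop >/dev/null 2>&1; then
  sqoop "$@"
else
  echo "sqoop $*"
fi
```

After a successful run, hdfs dfs -ls departments_parquet should list the Parquet files written by the import.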

Question #72

Write a Sqoop job that imports the "retaildb.categories" table into HDFS, in a directory named "categories_targetJob".
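A saved job for this could be sketched as below, using sqoop job --create followed by the import definition. The job name, host, port and user name are assumptions made for illustration; only the database, table and target directory come from the question.

```shell
# Hypothetical values: only the database, table and target directory are given.
DB_URL="jdbc:mysql://localhost:3306/retaildb"   # assumed host and port
DB_USER="retail_user"                            # assumed MySQL user
JOB_NAME="categories_import_job"                 # hypothetical job name

# Arguments that define the saved job; note the separate "--" before "import".
set -- job --create "$JOB_NAME" \
  -- import \
  --connect "$DB_URL" \
  --username "$DB_USER" \
  -P \
  --table categories \
  --target-dir categories_targetJob

if command -v sqoop >/dev/null 2>&1; then
  sqoop "$@"                     # store the job definition
  sqoop job --list               # confirm that the job was saved
  sqoop job --exec "$JOB_NAME"   # run the saved import
else
  echo "sqoop $*"                # Sqoop not installed: just show the command
fi
```

Creating the job only stores its definition; the data is imported when the job is executed with --exec.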

Question #73

Problem Scenario 21: You have been given a log-generating service, as below.

start_logs (It will generate continuous logs)

tail_logs (You can check which logs are being generated)

stop_logs (It will stop the log service)

Path where logs are generated by the above service: /opt/gen_logs/logs/access.log

Now write a Flume configuration file named flume1.conf that dumps these logs into HDFS in a directory called flume1. The Flume channel should also have the following properties: it should commit after every 100 messages, use a non-durable (faster) channel, and hold a maximum of 1000 events.
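The requirements above map onto a memory channel with capacity 1000 and transactionCapacity 100. The following is a minimal sketch that writes such a file, reading the file and directory names as flume1; the agent and component names (agent1, source1, sink1, channel1) and the use of an exec source running tail -F are assumptions, not part of the question.

```shell
# Write a candidate flume1.conf into the current directory.
cat > flume1.conf <<'EOF'
# Name the components of the agent (names are assumed)
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Source: follow the generated access log
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /opt/gen_logs/logs/access.log

# Sink: write events as plain text into the HDFS directory flume1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = flume1
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text

# Channel: in-memory (non-durable, faster), at most 1000 events,
# committed in transactions of 100 events
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
EOF
```

The agent could then be started with flume-ng agent --conf-file flume1.conf --name agent1 after start_logs is running.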

Question #76

Write a Hive query to compute the average salary of all employees.

Question #77

Problem Scenario 23: You have been given a log-generating service, as below.

Start_logs (It will generate continuous logs)

Tail_logs (You can check which logs are being generated)

Stop_logs (It will stop the log service)

Path where logs are generated by the above service: /opt/gen_logs/logs/access.log

Now write a Flume configuration file named flume3.conf that dumps these logs into HDFS in a directory called flumeflume3/%Y/%m/%d/%H/%M (meaning a new directory should be created every minute). Please use interceptors to provide timestamp information if the message header does not already contain it.

Also note that you must preserve the existing timestamp if the message contains one. The Flume channel should also have the following properties: it should commit after every 100 messages, use a non-durable (faster) channel, and hold a maximum of 1000 events.
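The timestamp requirement can be met with Flume's timestamp interceptor and its preserveExisting property. The sketch below writes one possible flume3.conf; the agent and component names and the exec source are assumptions, and the HDFS path is copied verbatim from the question.

```shell
# Write a candidate flume3.conf into the current directory.
cat > flume3.conf <<'EOF'
# Name the components of the agent (names are assumed)
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Source: follow the generated access log
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /opt/gen_logs/logs/access.log

# Interceptor: add a timestamp header, but keep one that already exists
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = timestamp
agent1.sources.source1.interceptors.i1.preserveExisting = true

# Sink: the escape sequences are resolved from the timestamp header,
# so a new directory is used for every minute
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = flumeflume3/%Y/%m/%d/%H/%M
agent1.sinks.sink1.hdfs.fileType = DataStream

# Channel: in-memory, at most 1000 events, 100 events per transaction
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
EOF
```

Without a timestamp header the HDFS sink cannot resolve %Y/%m/%d/%H/%M, which is why the interceptor is attached to the source.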

Question #80

While importing, make sure only male employee data is stored.

Correct Answer: Step 1: Create the Hive table flumemaleemployee.

CREATE TABLE flumemaleemployee
(
name string,
salary int,
sex string,
age int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Step 2: Create a Flume configuration file with the configuration below for the source, sink, and channel, and save it as flume4.conf.

# Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444

# Define interceptors: drop every event that matches "female"
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = regex_filter
agent1.sources.source1.interceptors.i1.regex = female
agent1.sources.source1.interceptors.i1.excludeEvents = true

# Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumemaleemployee
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream

# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

Step 3: Run the command below, which uses this configuration file and appends data to HDFS.

Start the Flume service:

flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume4.conf --name agent1

Step 4: Open another terminal and connect to the source with netcat: nc localhost 44444

Step 5: Enter data line by line.

alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29

Step 6: Open Hue and check whether the data is available in the Hive table.

Step 7: Stop the Flume service by pressing Ctrl+C.

Step 8: Calculate the average salary on the Hive table using the query below. You can use either the Hive command-line tool or Hue.

select avg(salary) from flumemaleemployee;

The questions for CCA175 were last updated on Mar 28, 2026.

Latest CCA175 Dumps Valid Version with 96 Q&As