Hive interview questions - 10

57. What will be the result when you do cast('abc' as INT)?
Hive will return NULL, because the string 'abc' cannot be converted to an integer.
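A minimal sketch of this behaviour, assuming a Hive version that allows SELECT without a FROM clause (0.13 or later). The COALESCE fallback value of 0 is only an illustrative choice.

```sql
-- A string that is not a valid integer cannot be converted, so the result is NULL.
SELECT CAST('abc' AS INT);

-- A numeric string converts as expected and yields the integer 123.
SELECT CAST('123' AS INT);

-- One way to guard against the NULL: substitute a default value.
SELECT COALESCE(CAST('abc' AS INT), 0);
```

Because the failed cast produces NULL rather than an error, bad values in a column can pass silently unless the query checks for NULL explicitly.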

58. Can the name of a view be the same as the name of a Hive table?
No. The name of a view must be unique among all tables and views in the same database.
The syntax for creating a view is:
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT table_comment]
AS SELECT ...
For example:
hive> CREATE VIEW emp_30000 AS
SELECT * FROM employee
WHERE salary>30000;

59. Can we LOAD data into a view?
No. A view cannot be the target of an INSERT or LOAD statement.

60. What types of costs are associated with creating an index on Hive tables?
Indexes occupy storage space, and there is a processing cost in arranging the values of the column on which the index is created.

61. Give the command to see the indexes on a table.
SHOW INDEX ON table_name;
This lists all the indexes created on any of the columns of the table table_name.
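The index lifecycle can be sketched as follows. This assumes a Hive version earlier than 3.0, since index support was removed in Hive 3.0. The index name idx_employee_salary is hypothetical; the employee table and its salary column are borrowed from the view example above.

```sql
-- Create a compact index on the salary column; it stays empty until rebuilt.
CREATE INDEX idx_employee_salary
ON TABLE employee (salary)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- Building the index is where the processing cost is paid.
ALTER INDEX idx_employee_salary ON employee REBUILD;

-- List the indexes on the table; FORMATTED adds column headers.
SHOW FORMATTED INDEX ON employee;

-- Drop the index to reclaim the space it occupies.
DROP INDEX IF EXISTS idx_employee_salary ON employee;
```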

62. What is bucketing?
The values of a column are hashed into a number of buckets that is defined by the user. It is a way to avoid too many partitions or nested partitions while still keeping queries efficient.
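A minimal sketch of a bucketed table. The table name employee_bucketed, the columns id and name, the bucket count of 8, and the ORC storage format are all assumptions made for illustration.

```sql
-- Rows are assigned to one of 8 buckets based on a hash of the id column.
CREATE TABLE employee_bucketed (
  id     INT,
  name   STRING,
  salary DOUBLE
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC;

-- Needed before Hive 2.0 so that inserts honour the bucket definition;
-- from Hive 2.0 onward bucketing is always enforced.
SET hive.enforce.bucketing = true;

-- Populate the bucketed table from an existing, unbucketed table.
INSERT OVERWRITE TABLE employee_bucketed
SELECT id, name, salary FROM employee;

-- Sample a single bucket instead of scanning the whole table.
SELECT * FROM employee_bucketed TABLESAMPLE (BUCKET 1 OUT OF 8 ON id);
```

Unlike partitioning, the number of buckets is fixed when the table is created, so a high-cardinality column such as id does not produce an unbounded number of directories.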

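63. What does /*+ STREAMTABLE(table_name) */ do?
It is a query hint that tells Hive which table in a join to stream through the reducers while the other tables are buffered in memory. It is a query optimization technique.
By default Hive streams the last table in the join, so the largest table would normally have to come last. Fortunately, you don’t have to put the largest table last in the query. Hive also provides a “hint” mechanism to tell the query optimizer which table should be streamed: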
SELECT /*+ STREAMTABLE(s) */ s.ymd, s.symbol, s.price_close, d.dividend
FROM stocks s JOIN dividends d ON s.ymd = d.ymd AND s.symbol = d.symbol
WHERE s.symbol = 'AAPL';
Now Hive will attempt to stream the stocks table, even though it’s not the last table in the query.

The STREAMTABLE hint is no longer required in recent Hive versions. If you enable the cost-based optimizer (CBO) and run ANALYZE commands to collect table and column statistics, Hive running on the Tez execution engine can optimize the plan and determine on its own which table to stream.

64. Can a partition be archived? What are the advantages and disadvantages?
Yes, a partition can be archived. The advantage is that it reduces the number of files the NameNode has to track, and the archived partition can still be queried through Hive. The disadvantage is that queries against an archived partition are less efficient, and archiving offers no space savings because the files are not compressed.
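The archiving commands can be sketched as below. The table page_views and its partition column ds are hypothetical names chosen for illustration; the table is assumed to be partitioned by ds.

```sql
-- Archiving is disabled by default and has to be switched on first.
SET hive.archive.enabled = true;

-- Pack the partition's files into a single Hadoop Archive (HAR).
ALTER TABLE page_views ARCHIVE PARTITION (ds = '2008-04-08');

-- The partition remains queryable, though reads go through the HAR layer.
SELECT COUNT(*) FROM page_views WHERE ds = '2008-04-08';

-- Restore the partition to its original, unarchived files.
ALTER TABLE page_views UNARCHIVE PARTITION (ds = '2008-04-08');
```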

65. What is a generic UDF in Hive?
It is a UDF written in Java, by extending the GenericUDF class, to serve a specific need not covered by the existing functions in Hive. It can detect the types of its input arguments programmatically and provide an appropriate response.

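66. The following statement failed to execute. What can be the cause?
LOAD DATA LOCAL INPATH '${env:HOME}/country/state/'
OVERWRITE INTO TABLE address;
The local inpath must resolve to one or more data files. LOAD DATA accepts either a single file or a directory of files, but the directory must not contain subdirectories, so the statement fails if country/state/ holds nested directories rather than plain files. ${env:HOME} is a valid variable in the Hive environment, so the variable reference itself is not the problem.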
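One way to rule out the directory as the cause is to point the statement at a single data file. The file name states.csv is hypothetical and only stands in for whatever file actually holds the data.

```sql
-- Load one specific local file instead of the whole directory.
-- OVERWRITE replaces any data already in the address table.
LOAD DATA LOCAL INPATH '${env:HOME}/country/state/states.csv'
OVERWRITE INTO TABLE address;
```

Note that the path must be wrapped in plain single quotes; typographic quotes copied from a web page or document will also make the statement fail to parse.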

67. How do you specify the table creator name when creating a table in Hive?
The TBLPROPERTIES clause is used to record the creator name when creating a table.
It is added like this:
TBLPROPERTIES('creator' = 'Joan')
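In context, the clause sits at the end of the CREATE TABLE statement. The sketch below assumes hypothetical columns for the address table; 'creator' is an ordinary user-defined key, not a reserved property.

```sql
-- Attach the creator name as a table property at creation time.
CREATE TABLE IF NOT EXISTS address (
  street STRING,
  city   STRING,
  state  STRING
)
TBLPROPERTIES ('creator' = 'Joan');

-- List every property of the table, including the one set above.
SHOW TBLPROPERTIES address;

-- Or read back a single property by key.
SHOW TBLPROPERTIES address ('creator');
```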
