Hive interview questions - 9

50. Is it possible to create a Cartesian join between 2 tables, using Hive?
Yes. Hive supports it through the CROSS JOIN clause (Hive 0.10 and later), or a JOIN written without an ON condition. It is rarely advisable, however: with no join key, the entire cross product must be produced, and Hive routes all the records through a single reducer, which makes the query very expensive on large tables. In strict mode (hive.mapred.mode=strict) Hive rejects Cartesian products outright.
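A minimal sketch of the syntax, assuming two hypothetical tables t1 and t2 with an id column each:

```sql
-- Explicit Cartesian product: every row of t1 is paired with
-- every row of t2, so the result has |t1| * |t2| rows.
SELECT t1.id, t2.id
FROM t1 CROSS JOIN t2;
```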

51. As part of optimizing queries in Hive, what should be the order of table sizes in a join?
In a join query, the tables should be listed from smallest to largest. Hive buffers the rows of every table except the last one in memory and streams the last (rightmost) table through the reducers, so placing the largest table last keeps the in-memory footprint small. The /*+ STREAMTABLE(alias) */ hint can be used to override which table is streamed.
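A short sketch of both forms, assuming hypothetical tables small_t and big_t joined on a key column k:

```sql
-- Largest table last: big_t is streamed, small_t is buffered.
SELECT s.k, b.v
FROM small_t s JOIN big_t b ON s.k = b.k;

-- The same effect via a hint, forcing big_t to be streamed
-- regardless of its position in the FROM clause.
SELECT /*+ STREAMTABLE(b) */ s.k, b.v
FROM big_t b JOIN small_t s ON s.k = b.k;
```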

52. Write a query to insert a new column (new_col INT) into a hive table (htab) at a position before an existing column (x_col)?
Hive's ALTER TABLE ... CHANGE syntax supports only FIRST and AFTER (there is no BEFORE keyword), so this takes two steps: add the column, then move it after the column that immediately precedes x_col (or use FIRST if x_col is the first column):

ALTER TABLE htab ADD COLUMNS (new_col INT);
ALTER TABLE htab CHANGE COLUMN new_col new_col INT AFTER prev_col;

Here prev_col stands for whichever column comes directly before x_col. Note that CHANGE only rewrites the column order in the table's metadata; the underlying data files are not rearranged.

53. While loading data into a hive table using the LOAD DATA clause, how do you specify that it is an HDFS file and not a local file?
By omitting the LOCAL clause in the LOAD DATA statement.
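A brief illustration of the two forms, assuming a table htab and example paths:

```sql
-- No LOCAL: the path is resolved on HDFS, and the file is
-- moved into the table's warehouse directory.
LOAD DATA INPATH '/user/hive/staging/data.txt' INTO TABLE htab;

-- With LOCAL: the path is resolved on the local filesystem,
-- and the file is copied rather than moved.
LOAD DATA LOCAL INPATH '/tmp/data.txt' INTO TABLE htab;
```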

54. If you omit the OVERWRITE clause while loading data into a hive table, what happens to files which are new and files which already exist?
The new incoming files are added to the target directory, and any existing files whose names clash with incoming files are replaced. Other files, whose base names do not match any of the incoming files, continue to exist. If you add the OVERWRITE clause, then all the existing data in the directory is deleted before the new data is written.
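A minimal contrast of the two behaviours, assuming a table htab and an example staging path:

```sql
-- Appends: existing files in the table directory are kept,
-- unless an incoming file has the same name.
LOAD DATA INPATH '/user/hive/staging/part-00000' INTO TABLE htab;

-- Replaces: the table's directory is cleared before loading.
LOAD DATA INPATH '/user/hive/staging/part-00000' OVERWRITE INTO TABLE htab;
```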

55. What is the usefulness of the DISTRIBUTE BY clause in Hive?
DISTRIBUTE BY controls how map output is divided among reducers. By default, MapReduce computes a hash on the keys output by the mappers and uses the hash values to distribute the key-value pairs evenly among the available reducers. Say we want all the rows sharing the same value in a column to be processed together; we can use DISTRIBUTE BY to ensure that the records for each value go to the same reducer. DISTRIBUTE BY works similarly to GROUP BY in the sense that it controls how reducers receive rows for processing. Note that Hive requires the DISTRIBUTE BY clause to come before the SORT BY clause when both appear in the same query. DISTRIBUTE BY is also a useful workaround for memory-intensive jobs, since it forces Hadoop to use reducers instead of running a map-only job.

SELECT col1, col2 FROM t1 DISTRIBUTE BY col1;
SELECT col1, col2 FROM t1 DISTRIBUTE BY col1 SORT BY col1 ASC, col2 DESC;

56. How will you convert the string '51.2' in the price column to a float value?
SELECT CAST(price AS FLOAT) FROM htab;
