Hive interview questions - 2

7. What are the metastore and the embedded metastore?
The metastore is the central repository of Hive metadata. It is divided into two pieces: a service and the backing store for the data. By default, the metastore service runs in the same JVM as the Hive service and contains an embedded Derby database instance backed by the local disk. This is called the embedded metastore configuration. However, only one embedded Derby database can access the database files on disk at any one time, which means you can have only one Hive session open at a time against the same metastore.

8. Explain schema on read versus schema on write.
In a traditional database, a table’s schema is enforced at data load time. If the data being loaded doesn’t conform to the schema, it is rejected. This design is sometimes called schema on write because the data is checked against the schema when it is written into the database. Hive, on the other hand, doesn’t verify the data when it is loaded, but rather when a query is issued. This is called schema on read. Schema on read makes for a very fast initial load, since the data does not have to be read, parsed, and serialized to disk in the database’s internal format. Schema on write makes query-time performance faster because the database can index columns and perform compression on the data.
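
The difference can be sketched in a few lines of Python. This is only an illustration, not how Hive or any database is implemented: the two-column SCHEMA, the RAW sample data, and all function names are made up for the example. The schema-on-write loader rejects the whole load on the first bad row, while the schema-on-read path stores the text untouched and turns fields that cannot be parsed into None at query time, much as Hive returns NULL for them.

```python
import csv
import io

# Hypothetical two-column table: (column name, parser for the column type).
SCHEMA = [("id", int), ("name", str)]

# Hypothetical input; the second row has a malformed id.
RAW = "1,alice\nnot_a_number,bob\n3,carol\n"


def load_schema_on_write(raw):
    """Parse and validate every row at load time; a bad row aborts the load."""
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        rows.append(tuple(cast(value) for (_, cast), value in zip(SCHEMA, record)))
    return rows


def load_schema_on_read(raw):
    """Loading only stores the raw text; nothing is parsed or checked."""
    return raw


def query_schema_on_read(stored):
    """Apply the schema at query time; unparseable fields become None."""
    for record in csv.reader(io.StringIO(stored)):
        row = []
        for (_, cast), value in zip(SCHEMA, record):
            try:
                row.append(cast(value))
            except ValueError:
                row.append(None)
        yield tuple(row)
```

With this sample, load_schema_on_write(RAW) raises ValueError on the second row, while the schema-on-read load always succeeds and the bad id only shows up as None when the data is queried.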

9. What are Hive locks?
Hive supports table-level and partition-level locking. Locks prevent, for example, one process from dropping a table while another is reading from it. Locks are managed transparently using ZooKeeper.
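
The read-versus-drop example can be modelled with a toy lock table in Python. This is a sketch under assumptions: it runs in a single process and does not talk to ZooKeeper, and the TableLocks class and its method names are hypothetical. It only shows the rule that many readers may hold a shared lock at once, while an operation such as a drop needs an exclusive lock that is refused while any reader is active.

```python
class TableLocks:
    """Toy lock table: many shared (read) locks or one exclusive lock per table."""

    def __init__(self):
        self._shared = {}        # table name -> number of active readers
        self._exclusive = set()  # tables currently locked exclusively

    def acquire_shared(self, table):
        """A reader may lock the table unless it is exclusively locked."""
        if table in self._exclusive:
            return False
        self._shared[table] = self._shared.get(table, 0) + 1
        return True

    def release_shared(self, table):
        """Drop one reader; forget the table once no readers remain."""
        self._shared[table] -= 1
        if self._shared[table] == 0:
            del self._shared[table]

    def acquire_exclusive(self, table):
        """A writer (for example a drop) needs the table free of all locks."""
        if table in self._exclusive or self._shared.get(table, 0) > 0:
            return False
        self._exclusive.add(table)
        return True

    def release_exclusive(self, table):
        """Release the exclusive lock if it is held."""
        self._exclusive.discard(table)
```

For example, after acquire_shared("logs") succeeds, acquire_exclusive("logs") returns False until the reader calls release_shared("logs").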

10. What are indexes in Hive?
Hive versions that support indexes provide two index types: compact and bitmap. (Index support was removed in Hive 3.0.)
Compact indexes store the HDFS block numbers of each value, rather than each file offset.
Bitmap indexes use compressed bitsets to efficiently store the rows that a particular value appears in, and they are usually appropriate for low-cardinality columns.
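
The two layouts can be illustrated with a small Python sketch. It is not Hive's implementation: the COLUMN data, the tiny BLOCK_SIZE, and the function names are invented for the example, and the bitsets here are plain uncompressed integers, whereas real bitmap indexes compress them.

```python
from collections import defaultdict

BLOCK_SIZE = 4  # hypothetical number of rows per block, tiny for illustration

# Hypothetical low-cardinality column: one value per row.
COLUMN = ["US", "DE", "US", "FR", "DE", "US", "US", "FR", "US"]


def build_compact_index(column, block_size=BLOCK_SIZE):
    """Map each value to the sorted block numbers that contain it."""
    index = defaultdict(set)
    for row, value in enumerate(column):
        index[value].add(row // block_size)
    return {value: sorted(blocks) for value, blocks in index.items()}


def build_bitmap_index(column):
    """Map each value to an int used as a bitset: bit i is set if row i matches."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_from_bitmap(bitmap):
    """Decode a bitset back into the list of matching row numbers."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]
```

The compact index only says which blocks to scan for a value, while the bitmap index identifies exact rows and lets conditions be combined with bitwise operations: OR-ing the bitmaps for "DE" and "FR" gives the rows that match either value.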

11. What are the Hive data types?
Hive supports both primitive and complex data types. Primitives include numeric, Boolean, string, and timestamp types. The complex data types include arrays, maps, and structs.
The BOOLEAN type stores true and false values. There are four signed integral types: TINYINT, SMALLINT, INT, and BIGINT, which are equivalent to Java’s byte, short, int, and long primitive types, respectively; they are 1-byte, 2-byte, 4-byte, and 8-byte signed integers.
Hive’s floating-point types, FLOAT and DOUBLE, correspond to Java’s float and double, which are 32-bit and 64-bit floating-point numbers. The DECIMAL data type is used to represent arbitrary-precision decimals.
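
The byte widths above determine the value ranges, which a short Python sketch can compute. The helper names are hypothetical, and Python's decimal module is used only as a stand-in to show why an exact decimal type is useful next to binary floating point; it is not Hive's DECIMAL implementation.

```python
import struct
from decimal import Decimal

# Hive integral type -> width in bytes, as listed above.
INTEGRAL_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(width_bytes):
    """Return (min, max) for a two's-complement signed integer of this width."""
    bits = 8 * width_bytes
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def as_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float, like FLOAT."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    for name, width in INTEGRAL_WIDTHS.items():
        low, high = signed_range(width)
        print(f"{name}: {low} to {high}")
    # Binary floating point cannot represent 0.1 exactly; exact decimals can.
    print(0.1 + 0.2 == 0.3)                                   # False
    print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

So TINYINT holds -128 to 127 and INT holds -2147483648 to 2147483647, and a value such as 0.1 changes slightly when it is narrowed from 64 bits to 32 bits.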

12. What are the data types for storing text?
There are three Hive data types for storing text. STRING is a variable-length character string with no declared maximum length. (The theoretical maximum size of a STRING is 2GB, although in practice it may be inefficient to materialize such large values.)
VARCHAR types are similar except they are declared with a maximum length between 1 and 65535; for example, VARCHAR(100).
CHAR types are fixed-length strings that are padded with trailing spaces if necessary.
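
A rough Python model of the two length-limited types is shown below. The function names are hypothetical, and the truncation of over-long values is an assumption about Hive's behavior that the answer above does not state; the padding of CHAR values follows the description above.

```python
def to_varchar(value, max_length):
    """Model VARCHAR(n): keep the string as-is, truncating beyond max_length."""
    if max_length < 1:
        raise ValueError("VARCHAR length must be at least 1")
    # Assumption: values longer than the declared length are cut off.
    return value[:max_length]


def to_char(value, length):
    """Model CHAR(n): truncate if too long, then pad with trailing spaces."""
    if length < 1:
        raise ValueError("CHAR length must be at least 1")
    return value[:length].ljust(length)
```

Under this model, to_varchar("hi", 5) stays "hi", while to_char("hi", 5) becomes "hi" followed by three spaces, so a CHAR value always has exactly the declared length.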
