. Advertisement .
..3..
. Advertisement .
..4..
Partitioning is a great way to prevent Hive from scanning the whole table for each query. But this may require you to check out what partitions are available in a table. Read on to learn more about Hive – How to show all partitions of a table.
Hive – How To Show All Partitions Of A Table
SHOW PARTITIONS Statement
You can use the SHOW PARTITIONS statement to get the job done. It can list all the partitions of a given table. Those partitions will be shown in alphabetical order.
The basic syntax of this statement:
SHOW PARTITIONS [database.]table [PARTITION(partition)];
Where:
- table: the name of the table you want to list all its existing partitions.
- [database.]: this is an optional clause. You can use it to access partitions of a table located on a given database.
- [PARTITION(partition)]: this optional clause can be used to show only a certain partition of a Hive table.
If your Hive version is 0.6 or newer, you can filter the result as well. Additionally, Hive 0.13.0 supports showing partitions of a table from a specific database.
The incoming version 4.0.0 brings big changes, including the ability to optionally include LIMIT, ORDER BY, and WHERE clauses to limit, order, or filter the returned list. They have the same effects as in SELECT statements.
Examples
To demonstrate the abilities of SHOW PARTITIONS, we need to create a table with partitions first. This statement loads the homes.csv file into a local Hive table without uploading it to HDFS. It contains information about several home sales in an area, such as listing and selling prices, as well as numbers of rooms and taxes.
LOAD DATA LOCAL INPATH'/home/hive/homes.csv' INTO TABLE Homes;
......... ADVERTISEMENT .........
..8..
Note: You can learn more about the basics of Hadoop (which is required to run Hive) with this guide.
The following statement will list all the partitions in the Homes table we have just created:
SHOW PARTITIONS Homes;
Use this command if you want to list the partition that contains the homes with eight rooms:
SHOW PARTITIONS Homes PARTITION(Rooms = '8');
This result can be displayed in order of your choice when used with the ORDER BY clause. For instance, this command sorts the partition above based on the number of selling prices in descending order:
SHOW PARTITIONS Homes PARTITION(Rooms = '8') ORDER BY Sell DESC;
The LIMIT clause can be added to the statement to limit the number of partitions you want to show. This command only shows two home sales:
SHOW PARTITIONS Homes PARTITION(Rooms = '8') LIMIT 2;
Meanwhile, the WHERE clause will fetch a specific partition that matches your additional condition. Using this command will get only homes with eight rooms, three of which are bathrooms:
SHOW PARTITIONS Homes PARTITION(Rooms = '8') WHERE Baths = '3';
Partitioning Considerations
Making use of partitions in Hive can improve the performance of your work. It eliminates the need to scan the whole table. However, it can actually degrade the performance when used incorrectly.
The first principle is to avoid over-partitioning. A large number of partitions can increase the resource required for data retrieval and loading.
You will end up with a whopping number of small files when you put just small chunks of data inside those partitions. This can lead to much slower performance.
Avoid partitions smaller than 1GB. The bigger, the better. If you insist on a large number of partitions for your database, remember to tune Hive Metastore and Hiveserver2 memory.
Conclusion
Thanks to the SHOW PARTITIONS statement, it is easy to learn Hive – How to show all partitions of a table. You can also modify the query with optional clauses to fine-tune your result.
Leave a comment