Tag: hive and duplicate data
Hive and RDBMS Sync Issues
Using Hive, perform analysis on data stored in HDFS. The data is regularly imported from an RDBMS that is frequently updated, so each import leaves duplicate rows in HDFS. How would you overcome this issue? Hive transactional (ACID) tables, which store their data in the ORC file format, provide update functionality on HDFS. Need a hands-on…
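
One common way to handle the duplicates is to keep only the newest version of each row, identified by its primary key. The following is a minimal sketch of that idea, not the approach from the post itself: it runs the query against an in-memory SQLite database from Python so it can be tried without a cluster. The table `customer_staging`, its columns, and the sample rows are all hypothetical, and it assumes every imported row carries a key (`id`) and a last-modified value (`updated_at`). The `ROW_NUMBER() OVER (PARTITION BY ...)` pattern is also available in HiveQL.

```python
import sqlite3

# Hypothetical staging data: each RDBMS import appends rows, so an updated
# record shows up more than once (table and column names are illustrative).
rows = [
    (1, "alice", "2023-01-01"),
    (2, "bob", "2023-01-01"),
    (1, "alice_updated", "2023-01-05"),  # later version of id 1
    (2, "bob", "2023-01-01"),            # exact duplicate from a re-import
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_staging (id INTEGER, name TEXT, updated_at TEXT)"
)
conn.executemany("INSERT INTO customer_staging VALUES (?, ?, ?)", rows)

# Rank the versions of each id from newest to oldest and keep only rank 1.
dedup_sql = """
SELECT id, name, updated_at
FROM (
    SELECT id, name, updated_at,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM customer_staging
) ranked
WHERE rn = 1
ORDER BY id
"""
latest = conn.execute(dedup_sql).fetchall()
print(latest)
```

In Hive, the result of such a query could be written to a clean table for analysis. With a transactional ORC table, a `MERGE` statement keyed on the same identifier is an alternative that applies each import as updates and inserts instead of appending new copies.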