Hive and RDBMS Sync Issues

You use Hive to analyze data stored in HDFS. The data is regularly retrieved from an RDBMS, and because the RDBMS is updated frequently, each import leaves a lot of duplicate data in HDFS.

How would you overcome this issue?

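Store the data in Hive transactional (ACID) tables backed by the ORC file format. Transactional tables support UPDATE, DELETE and MERGE, so rows that changed in the RDBMS can be applied to the existing rows in Hive instead of being appended to HDFS as duplicates.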
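
To make the idea concrete, here is a minimal Python sketch of the upsert logic that a merge into a transactional table performs. It is not Hive code and does not use any Hive API. The function name `merge_changes`, the `id` key and the `op` values are assumptions made up for this illustration.

```python
def merge_changes(target, changes):
    """Apply change records to a keyed table and return the merged result.

    target  : dict mapping primary key -> row (the rows already in Hive)
    changes : list of change records pulled from the RDBMS (hypothetical shape)
    """
    merged = dict(target)  # copy so the input table is left untouched
    for change in changes:
        key = change["id"]
        if change["op"] == "delete":
            # Row was removed in the source system: drop it if present.
            merged.pop(key, None)
        else:
            # Insert a new row or overwrite the existing one (no duplicate).
            merged[key] = change["row"]
    return merged


# Rows currently stored, keyed by primary key.
current = {
    1: {"name": "alice", "city": "Pune"},
    2: {"name": "bob", "city": "Oslo"},
}

# Changes captured from the RDBMS since the last import.
changes = [
    {"op": "upsert", "id": 2, "row": {"name": "bob", "city": "Rome"}},
    {"op": "upsert", "id": 3, "row": {"name": "carol", "city": "Lima"}},
    {"op": "delete", "id": 1},
]

result = merge_changes(current, changes)
print(result)
```

Each primary key ends up with exactly one row after the merge, which is the property that removes the duplicates. A plain append of every import would instead keep both the old and the new version of row 2.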
