You use Hive to analyze data stored in HDFS. The data is regularly retrieved from an RDBMS that is frequently updated, so rows that changed at the source are imported again and HDFS accumulates a lot of duplicate data.
How would you overcome this issue?
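Use Hive transactional (ACID) tables stored in the ORC file format. They support row-level UPDATE, DELETE and MERGE on data in HDFS, so changed rows from the RDBMS can update the existing records instead of being appended as duplicates.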
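As a rough illustration of this approach, the HiveQL below sketches an upsert from a staging table into a transactional ORC table. The table and column names (`customers`, `customers_staging`, `id`, `name`, `updated_at`) are hypothetical and not taken from the scenario. It also assumes that ACID support is enabled on the cluster (for example `hive.support.concurrency=true` and `hive.txn.manager` set to `DbTxnManager`) and that the Hive version supports `MERGE` (2.2 or later). Hive versions before 3.0 additionally require the transactional table to be bucketed.

```sql
-- Hypothetical target table: a transactional (ACID) table must be stored as ORC.
CREATE TABLE customers (
  id         BIGINT,
  name       STRING,
  updated_at TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Hypothetical staging table: holds the rows from the latest RDBMS import.
-- Keep only the newest version of each id, because MERGE fails if several
-- source rows match the same target row.
MERGE INTO customers AS t
USING (
  SELECT id, name, updated_at
  FROM (
    SELECT id, name, updated_at,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM customers_staging
  ) ranked
  WHERE rn = 1
) AS s
ON t.id = s.id
-- Row already exists: overwrite it instead of appending a duplicate.
WHEN MATCHED THEN
  UPDATE SET name = s.name, updated_at = s.updated_at
-- Row is new: insert it.
WHEN NOT MATCHED THEN
  INSERT VALUES (s.id, s.name, s.updated_at);
```

Each import then lands in the staging table first, and the `MERGE` statement applies it to the main table, so the table that analysts query keeps one row per key.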