Category: Hadoop DataLake
-
Hive and RDBMS Sync Issues
You use Hive to analyze data stored in HDFS. The data is regularly imported from an RDBMS, and because the RDBMS is frequently updated, repeated imports leave many duplicate records in HDFS. How would you overcome this issue? The ORC file format supports updates on HDFS through Hive transactional tables. Need a hands-on…
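The deduplication step can be sketched as a "keep the latest version per key" reconciliation. The Python below is a minimal in-memory illustration of that logic. It assumes each imported row carries a primary key (`id`) and a last-modified timestamp (`modified_at`). The `Row` type, its fields, and `reconcile` are hypothetical names and not part of Hive. In Hive itself the same idea is commonly expressed with a `ROW_NUMBER()` window partitioned by the key and ordered by the timestamp descending, or with a `MERGE` into a transactional ORC table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Row:
    """One imported record; `id` and `modified_at` are hypothetical column names."""
    id: int
    name: str
    modified_at: datetime


def reconcile(*imports: Iterable[Row]) -> list[Row]:
    """Keep only the most recently modified version of each primary key.

    Rows from every import batch (the base load plus incremental loads) are
    compared by `modified_at`; on a tie, the row seen later (from the later
    batch) wins.
    """
    latest: dict[int, Row] = {}
    for batch in imports:
        for row in batch:
            current = latest.get(row.id)
            if current is None or row.modified_at >= current.modified_at:
                latest[row.id] = row
    # Sort by key so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r.id)


if __name__ == "__main__":
    base = [
        Row(1, "alice", datetime(2024, 1, 1)),
        Row(2, "bob", datetime(2024, 1, 1)),
    ]
    incremental = [
        Row(2, "bob smith", datetime(2024, 1, 2)),  # updated in the RDBMS
        Row(3, "carol", datetime(2024, 1, 2)),      # newly inserted
    ]
    for row in reconcile(base, incremental):
        print(row.id, row.name)
```

Running it prints one line per key, with key 2 appearing only once and carrying the updated name "bob smith". Rows deleted in the RDBMS are not handled by this sketch; they would need a soft-delete flag or a separate delete feed.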
-
OLTP and OLAP Bottlenecks
Say you have an Online Transaction Processing (OLTP) database. Each night, the day's transaction data is extracted, transformed, and loaded from this OLTP system into a Teradata database (OLAP) in a batch job. As the customer base grows, the nightly ETL loads take longer and longer, and business users can no longer view the latest report the next morning. What solution…