Machine Learning Architect: For all your AI and GenAI needs
-
OLTP and OLAP Bottlenecks
Say you have an Online Transaction Processing (OLTP) database. Daily transaction data is extracted, transformed and loaded from this OLTP system into a Teradata database (OLAP) during a nightly batch. As the customer base grows, the nightly ETL loads take longer and longer, and business users cannot view the latest report the next morning. What solution…
-
Dataproc Cost Saving Options
Use preemptible VMs to add capacity, and configure them for graceful shutdown. If your apps are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on preemptible instances. If some of those instances terminate during processing, the job may slow…
-
BigQuery Cost Saving Options
Use the Preview option for data exploration (zero cost). Use the dry-run option from the CLI (zero cost). Partitioning a table lets queries that filter on a particular dimension (e.g. calendar date) avoid a full table scan, for example when retrieving last month's data. Using the LIMIT clause doesn't help reduce the cost.…
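The partitioning and LIMIT points above can be illustrated with a toy cost model. This is only a sketch: the partition names and byte counts are made up, and `bytes_scanned` is a hypothetical helper, not a BigQuery API. It assumes cost is driven by the bytes a query has to read, which is the premise of the excerpt.

```python
# Toy model: scan cost is the total bytes in every partition the query must read.
# Partition names and sizes are hypothetical, for illustration only.
PARTITIONS = {
    "2024-01": 400,
    "2024-02": 350,
    "2024-03": 500,
}


def bytes_scanned(partitions, month_filter=None, limit=None):
    """Return the bytes read by a query in this toy model.

    `limit` is accepted but deliberately ignored: it trims the rows
    returned, not the data that has to be scanned.
    """
    if month_filter is None:
        # No partition filter: every partition is read (full table scan).
        return sum(partitions.values())
    # Partition filter: only the matching partition is read.
    return partitions.get(month_filter, 0)


full = bytes_scanned(PARTITIONS)
limited = bytes_scanned(PARTITIONS, limit=10)
pruned = bytes_scanned(PARTITIONS, month_filter="2024-03")

print(full, limited, pruned)
```

In this model the query with a limit reads exactly as much as the full scan, while the query filtered on one month reads only that month's partition.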
-
Data Studio Caching
Data Studio reports may not refresh, so you may be unable to see data updated within the last hour. Data is cached by Data Studio: Data Studio caches data for 1 hour if you are using BigQuery as your data source. This can be configured through the Data freshness setting. For more information,…
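The stale-report effect described above is what any time-based cache produces. The following is a minimal sketch of that idea, not Data Studio's actual implementation: `TTLCache`, its parameters, and the fake clock are all hypothetical names introduced for illustration.

```python
class TTLCache:
    """Cache a single fetched value and reuse it until it is `ttl_seconds` old."""

    def __init__(self, fetch, ttl_seconds, clock):
        self._fetch = fetch          # callable that reads from the data source
        self._ttl = ttl_seconds      # how long a cached value stays valid
        self._clock = clock          # callable returning the current time in seconds
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            # Cache is empty or too old: go back to the source.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Hypothetical data source and a controllable clock so the example is deterministic.
source = {"rows": 10}
now = [0]
cache = TTLCache(lambda: source["rows"], ttl_seconds=3600, clock=lambda: now[0])

first = cache.get()      # fetched from the source
source["rows"] = 25      # the source is updated
now[0] = 1800            # 30 minutes later
stale = cache.get()      # still served from the cache
now[0] = 3600            # one hour after the first fetch
fresh = cache.get()      # cache expired, source is read again

print(first, stale, fresh)
```

Shortening the freshness window corresponds to lowering `ttl_seconds` here: updates show up sooner, at the price of more reads against the source.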
-
What are data streams?
Streaming Ingestion (SI): To use data, a system needs to be able to discover, integrate, and ingest all available data from the machines that produce it, as fast as it’s being produced, in any format, and at any quality. A streaming data ingestion framework doesn’t simply move data from source to destination like traditional ETL solutions.…
-
Classification Models vs. Regression Models
A simple example: say you have gender (M, F), height, and weight data for a group of individuals. Say you want to predict the gender of the next individual. This would be a classification problem (logistic regression is one common model for it). Say you want to predict the height of the next individual. This would be a Linear…
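The difference between the two problem types shows up in what the model returns: a label for classification, a number for regression. The sketch below uses made-up data. To stay dependency-free, it swaps in a nearest-centroid rule for the classifier instead of logistic regression, and fits the regression with ordinary least squares on weight alone. All names and values are hypothetical.

```python
# Hypothetical sample: (gender, height_cm, weight_kg)
PEOPLE = [
    ("M", 180, 80), ("M", 175, 75), ("M", 185, 85),
    ("F", 160, 55), ("F", 165, 60), ("F", 155, 50),
]


def centroids(rows):
    """Average (height, weight) per gender label."""
    groups = {}
    for label, height, weight in rows:
        groups.setdefault(label, []).append((height, weight))
    return {
        label: (sum(h for h, _ in pts) / len(pts), sum(w for _, w in pts) / len(pts))
        for label, pts in groups.items()
    }


def classify_gender(height, weight, rows=PEOPLE):
    """Classification: the output is one of a fixed set of labels."""
    cents = centroids(rows)
    return min(
        cents,
        key=lambda label: (height - cents[label][0]) ** 2 + (weight - cents[label][1]) ** 2,
    )


def fit_height_from_weight(rows=PEOPLE):
    """Regression: least-squares line height = intercept + slope * weight."""
    weights = [w for _, _, w in rows]
    heights = [h for _, h, _ in rows]
    mean_w = sum(weights) / len(weights)
    mean_h = sum(heights) / len(heights)
    sxy = sum((w - mean_w) * (h - mean_h) for w, h in zip(weights, heights))
    sxx = sum((w - mean_w) ** 2 for w in weights)
    slope = sxy / sxx
    return mean_h - slope * mean_w, slope


def predict_height(weight, rows=PEOPLE):
    """Regression: the output is a continuous number."""
    intercept, slope = fit_height_from_weight(rows)
    return intercept + slope * weight


print(classify_gender(178, 77))   # a label
print(predict_height(70))         # a number
```

Logistic regression would replace `classify_gender` with a model that outputs a probability for each label; the shape of the problem stays the same.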