Category: BigQuery ML
-
Clustering Use Cases – Unsupervised Machine Learning on GCP
Clustering is one of the most common patterns in unsupervised machine learning. Some use cases where we can apply clustering include: market segmentation, social network analysis, search result grouping, medical imaging, image segmentation, and anomaly detection. BigQuery ML is well suited to clustering use cases (and SQL-based machine learning use cases). Apart from BQML, GCP…
-
DataProc on GCP – Job Scoped Cluster Model
If your landscape is primarily ETL and batch jobs, the job-per-cluster paradigm works. Dataproc pricing: pricing depends on cluster size and duration of run. The pricing formula is: $0.010 * # of vCPUs * hourly duration. Dataproc clusters are billed in one-second clock-time increments. Scaling and autoscaling clusters: when VMs are added…
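The pricing formula above can be worked through as a small calculation. This is a minimal sketch, assuming the $0.010 per vCPU per hour rate quoted in the excerpt; the function name `dataproc_fee` and its parameters are hypothetical, and the result covers only the Dataproc fee from the formula, not any charges for the underlying VMs.

```python
# Rate taken from the formula in the excerpt: $0.010 per vCPU per hour.
DATAPROC_RATE_PER_VCPU_HOUR = 0.010


def dataproc_fee(total_vcpus: int, duration_seconds: int) -> float:
    """Estimate the Dataproc fee in USD for one cluster run.

    total_vcpus: sum of vCPUs across all VMs in the cluster (hypothetical input).
    duration_seconds: wall-clock run time; billing is per second, so the
    duration is converted to fractional hours rather than rounded up.
    """
    if total_vcpus < 0 or duration_seconds < 0:
        raise ValueError("vCPU count and duration must be non-negative")
    hours = duration_seconds / 3600
    return DATAPROC_RATE_PER_VCPU_HOUR * total_vcpus * hours


if __name__ == "__main__":
    # A job-scoped cluster with 24 vCPUs in total that runs for 2 hours.
    print(f"${dataproc_fee(24, 2 * 3600):.2f}")  # prints "$0.48"
```

Because the fee scales with both vCPU count and run time, a job-scoped cluster that is deleted as soon as its job finishes only accrues the fee for the seconds it was actually up.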
-
BigQuery Data Loading, Data Sources and Data Formats
Also read: basic data processing pipeline in GCP. Federated data sources for BigQuery: a federated source (external source) is a source that allows BigQuery to query data directly without importing it into BigQuery. Datastore is not a valid federated source. Cloud Storage, Cloud SQL, Bigtable, and Google Drive are valid data sources for federating data…
-
BigQuery Cost Saving Options
Use the Preview option for data exploration (zero cost). Use the dry-run option from the CLI (zero cost). Partitioning a table lets us avoid a full table scan for queries that filter on a particular (e.g. calendar) dimension, for example retrieving last month's data. Using the LIMIT clause doesn't help save cost.…