Category: Uncategorized

  • Data catalogs, data lineage, data quality, and data observability – for big data workflows

    Data catalogs, data lineage, data quality, and data observability as they apply to big data workflows: 1. Data Catalogs Definition: A data catalogue is an organized inventory of all the data assets within an organization, which includes metadata that describes the data. In big data workflows, it serves as a centralized repository where users can…

  • DataProc Cost Saving Options

    Use Pre-emptible VMs for adding capacity and configure them for graceful shutdown If your apps are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on preemptible instances. If some of those instances terminate during processing, the job may slow…