BigQuery Data Loading, Data Sources and Data Formats

Federated Data Sources for BigQuery

A federated source (external source) is a data source that BigQuery can query directly, without first importing the data into BigQuery.

Datastore is not a valid federated source.

Cloud Storage, Cloud SQL, Bigtable, and Google Drive are valid data sources for federated queries.
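
As a rough illustration of what "query without importing" looks like in practice, the sketch below builds the kind of table definition that points BigQuery at files in Cloud Storage. It only constructs a dictionary and sends nothing. The field names follow the BigQuery REST table resource as I understand it (externalDataConfiguration, sourceUris, sourceFormat, autodetect); the function name, project, dataset, table, and bucket are all hypothetical.

```python
def external_table_resource(project_id, dataset_id, table_id, source_uris,
                            source_format="CSV"):
    """Build a table resource whose data stays in the external source.

    Hypothetical helper: it only assembles a dict shaped like the REST
    table resource; it does not call any BigQuery API.
    """
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        # The data is read from these URIs at query time, not copied in.
        "externalDataConfiguration": {
            "sourceUris": list(source_uris),
            "sourceFormat": source_format,
            "autodetect": True,  # assumption: let BigQuery infer the schema
        },
    }


# Hypothetical project, dataset, table, and bucket names.
resource = external_table_resource(
    "my-project", "my_dataset", "sales_external",
    ["gs://my-bucket/sales/*.csv"],
)
print(resource["externalDataConfiguration"]["sourceUris"])
```

The point of the sketch is that the definition carries only a pointer to the files and a format, which is what distinguishes a federated table from a loaded one.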

Data Loading into BigQuery

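Suppose you are loading about 1 million records into BigQuery and want to tolerate a set percentage (x%) of invalid ones.

Use MaxBadRecords to set the maximum number of bad records the load job may skip; if more bad records than that are found, the job fails. The setting takes an absolute count rather than a percentage, so convert first: 1% of 1 million records is 10,000.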
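
Because MaxBadRecords is a count, a tolerated percentage has to be converted before the job is submitted. The sketch below shows one way to do that and where the value would sit in a load configuration. The helper names, bucket, and the choice to round down are assumptions; the maxBadRecords, sourceUris, and sourceFormat keys mirror the REST load configuration, and nothing here talks to BigQuery.

```python
import math


def max_bad_records_for(total_records, allowed_percent):
    """Convert a tolerated percentage into an absolute bad-record count.

    Hypothetical helper. Rounds down so the tolerance is never exceeded.
    """
    if total_records < 0:
        raise ValueError("total_records must be non-negative")
    if not 0 <= allowed_percent <= 100:
        raise ValueError("allowed_percent must be between 0 and 100")
    return math.floor(total_records * allowed_percent / 100)


def load_configuration(source_uri, total_records, allowed_percent):
    """Assemble a dict shaped like the load section of a BigQuery job."""
    return {
        "sourceUris": [source_uri],
        "sourceFormat": "CSV",
        "maxBadRecords": max_bad_records_for(total_records, allowed_percent),
    }


# Hypothetical bucket and file; 1% of 1 million records.
config = load_configuration("gs://my-bucket/records.csv", 1_000_000, 1)
print(config["maxBadRecords"])
```

With the Python client library, the same number would be assigned to the max_bad_records property of the load job configuration.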

Data Formats Supported for BigQuery Loads

  • Batch load a set of data records from Cloud Storage or from a local file.
  • The records can be in Avro, CSV, JSON (newline delimited only), ORC, or Parquet format.
  • Protocol Buffers is not a supported format.
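
The supported formats listed above can be captured in a small lookup that rejects anything else before a load is attempted. This is a minimal sketch: the mapping from file extension to format is an assumption made for illustration (BigQuery does not decide the format from the extension), while the values on the right are the sourceFormat names used by load jobs.

```python
import os

# Assumed extension-to-format mapping, for illustration only.
SOURCE_FORMATS = {
    ".avro": "AVRO",
    ".csv": "CSV",
    ".json": "NEWLINE_DELIMITED_JSON",  # JSON must be newline delimited
    ".orc": "ORC",
    ".parquet": "PARQUET",
}


def source_format_for(path):
    """Return the load sourceFormat for a file, or raise if unsupported."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in SOURCE_FORMATS:
        raise ValueError(f"unsupported load format: {extension or path!r}")
    return SOURCE_FORMATS[extension]


# Hypothetical Cloud Storage object.
print(source_format_for("gs://my-bucket/events.parquet"))
```

A Protocol Buffers file such as a hypothetical events.pb would raise ValueError here, matching the last bullet.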
