Federated Data Sources for BigQuery
A federated (external) data source lets BigQuery query data where it lives, without first loading it into BigQuery storage.
Datastore is not a valid federated source.
Cloud Storage, Cloud SQL, Bigtable and Google Drive are valid sources for federated queries.
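As a rough illustration of what "querying without importing" looks like in practice, the sketch below builds the body of a table resource whose data stays in Cloud Storage. The field names follow the BigQuery REST table resource (`tableReference`, `externalDataConfiguration`, `sourceUris`, `sourceFormat`, `autodetect`); the helper name `external_table_resource` and the project, dataset, table and bucket names are hypothetical, and the sketch only builds the dictionary without calling the API.

```python
def external_table_resource(project_id, dataset_id, table_id, gcs_uris, source_format="CSV"):
    """Build a table resource body for an external table over Cloud Storage files.

    Hypothetical helper: it only assembles the dictionary and sends nothing.
    """
    uris = list(gcs_uris)
    if not uris or not all(u.startswith("gs://") for u in uris):
        raise ValueError("expected one or more gs:// URIs")
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        # The data is read from these files at query time; nothing is copied
        # into BigQuery storage.
        "externalDataConfiguration": {
            "sourceUris": uris,
            "sourceFormat": source_format,
            "autodetect": True,  # let BigQuery infer the schema
        },
    }


# Placeholder names, for illustration only.
resource = external_table_resource(
    "my-project", "my_dataset", "events_ext", ["gs://my-bucket/events/*.csv"]
)
```

Once a table like this exists, it can be queried with ordinary SQL, and each query reads the files directly from the bucket.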
Data Loading into BigQuery
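When loading data into BigQuery, you may want to tolerate a certain percentage (x%) of invalid records out of, say, 1 million records.
Use MaxBadRecords to specify the maximum number of bad records the load job may ignore. The setting takes an absolute count, not a percentage, so convert x% of the expected row count into a number first. If the job encounters more bad records than that, it fails.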
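Because the limit is a count, the percentage has to be converted before it is passed to the load job. The sketch below does that arithmetic and places the result in a dictionary shaped like the REST `configuration.load` section, where the field is spelled `maxBadRecords`. The helper names `max_bad_records` and `load_configuration` are hypothetical, and rounding down is an assumption chosen so the tolerance never exceeds the stated percentage.

```python
import math


def max_bad_records(total_rows, allowed_percent):
    """Convert a tolerated percentage of bad rows into an absolute count."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not 0 <= allowed_percent <= 100:
        raise ValueError("allowed_percent must be between 0 and 100")
    # Round down so the limit never exceeds the stated percentage.
    return math.floor(total_rows * allowed_percent / 100)


def load_configuration(source_uri, total_rows, allowed_percent, source_format="CSV"):
    """Assemble a load configuration fragment (hypothetical helper, no API call)."""
    return {
        "sourceUris": [source_uri],
        "sourceFormat": source_format,
        "maxBadRecords": max_bad_records(total_rows, allowed_percent),
    }


# Tolerating 0.5% of 1 million records allows 5000 bad records.
config = load_configuration("gs://my-bucket/data.csv", 1_000_000, 0.5)
```

With the Python client library, the same count would go into `LoadJobConfig(max_bad_records=...)`, and with the `bq` command-line tool into the `--max_bad_records` flag.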
Data Formats Supported for BigQuery Loads
- Batch load a set of data records from Cloud Storage or from a local file.
- The records can be in Avro, CSV, JSON (newline delimited only), ORC, or Parquet format.
- Protocol Buffers is not a supported format for batch loads.
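The list above can be turned into a small pre-flight check that rejects unsupported files before a load job is submitted. This is a minimal sketch: the `sourceFormat` strings are the ones used by the BigQuery load API, but the extension mapping and the helper name `source_format_for` are assumptions made for illustration, since real files do not always carry a telling extension.

```python
import os

# Assumed extension-to-sourceFormat mapping for the batch-load formats listed above.
SOURCE_FORMATS = {
    ".avro": "AVRO",
    ".csv": "CSV",
    ".json": "NEWLINE_DELIMITED_JSON",
    ".jsonl": "NEWLINE_DELIMITED_JSON",
    ".orc": "ORC",
    ".parquet": "PARQUET",
}


def source_format_for(path):
    """Return the sourceFormat value for a file, or raise if it cannot be batch loaded."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in SOURCE_FORMATS:
        raise ValueError(f"{path!r} is not in a supported batch-load format")
    return SOURCE_FORMATS[extension]


fmt = source_format_for("gs://my-bucket/exports/events.parquet")
```

A serialized Protocol Buffers file such as `events.pb` would raise `ValueError` here, which matches the last point in the list.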