Associate Data Practitioner

Unlock the power of your data in the cloud! Get hands-on with Google Cloud's core data services like BigQuery and Looker to validate your practical skills in data ingestion, analysis, and management, and earn your Associate Data Practitioner certification!

Practice Test

Fundamental

Practice Test

Fundamental

Differentiate between different data manipulation methodologies (e.g., ETL, ELT, ETLT)

1.1 Prepare and process data

Differentiate between different data manipulation methodologies (e.g., ETL, ELT, ETLT)

Understand the ETL Process

The ETL process is a basic data integration method that stands for Extract, Transform, and Load. It helps combine data from multiple sources into a single database. It’s widely used in traditional data warehousing to prepare data for reporting and analysis. In GCP, ETL tasks often use services like Dataflow, Cloud Data Fusion, and BigQuery Data Transfer Service.

In the Extract stage, data is pulled from on-premises systems, cloud storage, or streaming services. GCP tools can handle both batch and real-time extractions. Extracted data is usually stored in Cloud Storage or directly staged in BigQuery. This approach keeps raw data available for audit and reprocessing if needed.

The Transform stage refines the extracted data to make it suitable for analysis. You can use features in BigQuery such as:

Materialized views: precomputed views that update with base table changes
Continuous queries: ongoing SQL queries that process streaming data
Dataform: a development environment for managing ELT workflows with SQLX and JavaScript
These tools help clean, enrich, and reshape data before loading.

In the Load stage, the transformed data is written into a target warehouse like BigQuery. Loading can be done in bulk or incrementally, depending on workload and latency needs. GCP automatically scales storage and compute, which can handle large data volumes. This stage ensures that the data is organized and ready for use by analysts.

ETL on GCP is well-suited for traditional data warehousing, operational reporting, and business intelligence. Using BigQuery offers benefits like:

Automated scaling to manage large datasets
Cost efficiency with pay-as-you-go pricing
Improved data quality through integrated testing and validation
These features make it easier to centralize data and keep it accurate.

Conclusion

The ETL method gathers data, reshapes it, and loads it into a warehouse, making it ready for analysis. GCP services—Dataflow, Cloud Data Fusion, and BigQuery—provide a complete toolset for each stage. By separating extraction, transformation, and loading, ETL ensures data is accurate and consistent.

While ETL performs transformations before loading, the ELT approach loads data first and then transforms it inside the warehouse. This can simplify pipelines for large datasets by using the warehouse’s compute power. The ETLT method combines both strategies, transforming data both before and after loading to meet complex processing needs.

Choosing between ETL, ELT, and ETLT depends on factors like data volume, processing latency, and tooling capabilities. Understanding these methodologies helps you pick the right GCP services and design efficient data workflows. This foundation prepares you for building robust solutions on Google Cloud.

Choose the appropriate data transfer tool (e.g., Storage Transfer Service, Transfer Appliance)