Growth teams need more than dashboards built on pre aggregated data. They need the flexibility to combine GA4 event data with other business sources, run custom funnel analysis, and create reporting that adapts as fast as their experiments. The standard GA4 interface provides many prebuilt reports, but it limits how deep you can go with segmentation, joins, and historical data retention. Connecting GA4 to BigQuery unlocks raw event level data that you can query with SQL and keep for as long as you want.
This article covers the complete workflow from enabling the export to writing production grade queries that growth teams can use daily. It assumes you have basic familiarity with GA4 and Google Cloud Platform but no advanced data engineering background.
Why Growth Teams Need BigQuery with GA4
GA4 stores data in a privacy aware way with event parameters and user properties, but its built in reports have limitations. You can only apply a certain number of segments, you cannot join with internal CRM or ad spend data, and the free retention for event data is 14 months. Growth teams often need to look further back or combine traffic data with offline conversion data to understand true incrementality.
BigQuery is a serverless data warehouse optimized for large scale SQL queries. When you export GA4 events to BigQuery, every event sent from your website or app becomes a row in a table. You can run arbitrary SQL on that table, aggregate by any parameter, create user level session tables, and join with other datasets. This gives growth teams the freedom to build bespoke attribution models, cohort analyses, and product adoption metrics that are not available out of the box.
Prerequisites for the GA4 to BigQuery Export
Before you set up the export, confirm you have the following:
You need a Google Cloud Project with billing enabled. The BigQuery sandbox is free up to certain limits, but real growth team volumes will likely incur costs for storage and query processing. You also need the GA4 account to have at least Editor or higher permissions to create the export link.
The export itself is configured inside GA4 under Admin, then BigQuery Linking. You must follow the setup wizard to select the Cloud Project, choose a dataset region, and decide whether to export daily events and streaming events. For most growth teams, enabling both daily and streaming provides the best balance between cost and freshness.
Understanding the Exported Data Structure
Once the export runs, BigQuery creates a dataset named analytics_. Inside that dataset you will find tables named events_YYYYMMDD for daily exports and a materialized view for intraday streaming data. The schema is nested and repeated, meaning many columns are of type RECORD containing arrays.
The most important columns include event_name, event_timestamp, event_params, user_pseudo_id, user_id (if set), device, geo, traffic_source, and ecommerce fields. The event_params column is an array of key value pairs. To extract a specific parameter you use UNNEST in SQL. For example, to get the value of a custom parameter named campaign_id, you write a subquery with UNNEST(event_params) and filter on key.
A note on timestamps: event_timestamp is in microseconds since epoch. You can convert it to a human readable date using TIMESTAMP_MICROS(event_timestamp).
Setting Up the Export with Minimal Data Loss
When you configure the export in GA4, you have two options: export daily events and stream events. Daily exports run once per day and include all events from the previous day. Streaming exports push events as they occur, but they may have a slight delay and are available as an intraday table. For critical real time monitoring, most growth teams rely on the streaming table. However, the streaming table is not partitioned by date and can be more expensive to query.
To avoid missing data, ensure that the export is active before you start collecting events. There is no backfill option. If you notice gaps, check the GA4 Admin for any permission issues. Also, note that GA4 filters internal traffic and any events that are blocked by data deletion requests. Those events will not appear in BigQuery.
Querying GA4 Data Efficiently for Growth Use Cases
Writing SQL against GA4 event data requires some adjustment if you are used to flat tables. The nested structure allows GA4 to store many parameters without creating new columns, but it forces you to unpivot data in queries. Below are common patterns.
Daily Active Users by Source
SELECT date, traffic_source.source, COUNT(DISTINCT user_pseudo_id) AS daily_users FROM `your_project.analytics_123456789.events_*` WHERE event_name = ‘session_start’ AND PARSE_DATE(‘%Y%m%d’, _TABLE_SUFFIX) BETWEEN ‘2024-01-01’ AND ‘2024-12-31’ GROUP BY date, traffic_source.source;
This query uses the wildcard table events_* to scan multiple days. The _TABLE_SUFFIX pseudo column contains the date portion of the table name.
Funnel Steps with Multiple Events
To build a funnel for registration, you can use a left join approach or window functions. A simple method is to aggregate events per user and then check the sequence of timestamps. For example, to count users who saw a product page and then reached checkout, you can write a query that first collects the earliest timestamp of each event per user and then counts those where the product page event occurred before the checkout event.
SELECT COUNT(DISTINCT user_pseudo_id) AS users_in_funnel FROM ( SELECT user_pseudo_id, MIN(CASE WHEN event_name = ‘view_item’ THEN event_timestamp END) AS view_time, MIN(CASE WHEN event_name = ‘start_checkout’ THEN event_timestamp END) AS checkout_time FROM `your_project.analytics_123456789.events_*` GROUP BY user_pseudo_id ) WHERE view_time IS NOT NULL AND checkout_time IS NOT NULL AND view_time < checkout_time;
User Sessionization for Growth Metrics
GA4 sessions are already aggregated inside the export through the session_id event parameter. You can create a session level table by grouping by user_pseudo_id and session_id. For each session, you can compute total events, session duration, and first source. This is useful for growth metrics like session conversion rate or average session value.
CREATE OR REPLACE TABLE `your_project.growth_analytics.sessions_2024` AS SELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = ‘ga_session_id’) AS session_id, MIN(event_timestamp) AS session_start, MAX(event_timestamp) AS session_end, COUNT(*) AS total_events, ANY_VALUE(traffic_source.source) AS first_source FROM `your_project.analytics_123456789.events_*` WHERE event_name IS NOT NULL GROUP BY user_pseudo_id, session_id;
Be careful with the session_id extraction: it is stored as an integer parameter. Use UNNEST to fetch it.
Cost Management and Query Optimization
BigQuery charges by the amount of data processed per query, not by the amount stored (though storage also costs). To keep costs predictable, follow these practices. Use partitioned tables. The daily GA4 tables are partitioned by ingestion date, so always filter on _TABLE_SUFFIX or use the DATE column from event_date if available. This limits the scan to only the days you need.
Avoid SELECT * on the events table. The table contains many columns and nested fields. Instead, select only the columns you need. Use the preview feature in the BigQuery console to understand the schema before writing full queries.
If your growth team runs many repeated queries, consider creating aggregated materialized views or scheduled query snapshots. BigQuery allows you to schedule queries that run periodically and write results to a smaller table. This reduces the amount of raw data scanned for future reports.
Monitor your billing in the GCP console. Set custom cost controls and alerts. For teams on a tight budget, restrict query access to only read the last 30 days of data for most users, and grant full history access only to analysts who need it.
Building a Growth Analytics View on Top of GA4 in BigQuery
Once the raw data is flowing, you can create a dedicated view layer for growth team members who are not comfortable writing complex SQL. A view is a saved query that acts like a table. For example, you can create a view that flattens the event parameters into columns for key events like purchase, signup, or trial start.
This view can join the GA4 data with other tables such as a product catalog from your internal database, or a table of ad spend from Google Ads or Facebook. The joined table gives you a single source of truth for growth metrics like cost per acquisition and return on ad spend.
To join with ad spend data, you first need to import that spend data into BigQuery. Many platforms allow direct exports, or you can use a tool like Supermetrics or Stitch to land the data in a staging dataset. Then write a query that joins the GA4 purchase event by transaction ID with the corresponding ad click data.
Example join for last click attribution: SELECT s.user_pseudo_id, s.transaction_id, s.revenue, a.campaign_name FROM ( SELECT user_pseudo_id, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘transaction_id’) AS transaction_id, (SELECT value.double_value FROM UNNEST(event_params) WHERE key = ‘value’) AS revenue FROM `your_project.analytics_123456789.events_*` WHERE event_name = ‘purchase’ ) s LEFT JOIN `your_project.ad_spend.google_ads_clicks` a ON s.user_pseudo_id = a.user_id;
This is a simplified example. In practice you need a mapping table between GA4 user identifiers and your ad platform user identifiers, which often requires a user ID scheme.
Scheduling Reports and Alerts
BigQuery can schedule queries via the Scheduled Queries feature in the console or through the BigQuery Data Transfer Service. You can set a scheduled query to run every morning and populate a table with key growth metrics such as daily active users, conversion rate by channel, and average revenue per user. Then you connect that table to a visualization tool like Looker Studio to build a live dashboard.
You can also use BigQuery to create alerts. For example, if sessions drop by more than 20 percent compared to the same day last week, you can run a scheduled query that checks the condition and sends a notification through Cloud Functions or a third party service.
Growth teams that rely on frequent experiment analysis can set up a daily dataset that precomputes experiment groups based on user_id or custom dimension. This eliminates the need to join the raw GA4 table each time you want to measure an A/B test.
Common Pitfalls and How to Avoid Them
One common mistake is forgetting that GA4 exports event data, not session data. To get session level metrics you need to aggregate events using the session_id parameter. Another issue is the latency of the streaming export. The intraday table is updated every few minutes but not in real time. For sub minute decisions, consider other solutions like Google Tag Manager server side with a firehose.
Data type mismatches can cause query errors. GA4 event parameters can be stored as string, integer, or float depending on how they are defined in the GA4 interface. Always check the value fields: .int_value, .double_value, .string_value, or .float_value. Using the wrong one returns NULL.
Finally, growth teams often underestimate the cost of querying the full event table repeatedly. Establish a culture of writing optimized SQL and using previews before running large queries. Set query quotas per user to avoid surprise bills.
Moving from Ad Hoc Queries to Embedded Analytics
As the team matures, you can embed BigQuery insights directly into your product or internal tools. For example, you can use the BigQuery API to fetch growth metrics on demand and surface them in a growth dashboard built inside your app. This eliminates the need for manual SQL runs and gives every team member access to the same data.
Tools like Looker Studio and Tableau can connect directly to BigQuery. For more flexible embedding, you can write a Cloud Function that runs a query and returns a JSON response. This approach works well for internal growth tools that need to display metrics like weekly active users or experiment results.
The move to BigQuery based reporting is not just about technical setup. It is about changing how the team thinks about data. Instead of working with pre aggregated reports, growth teams can ask any question that can be expressed in SQL. This unlocks deeper insights into user behavior and faster iteration cycles.
Leave a Reply