TL;DR Ecommerce data looks simple — you have orders, you have sessions, you have ad spend. But Shopify, GA4, and Meta all count conversions differently, and none of them agree on how much revenue you made yesterday. Data engineering is the work of resolving that and building a single source of truth. This guide explains what that work involves, what you’re paying for, and how long it takes. Book a 20-min call to scope it for your store.
What “data engineering” means for an ecommerce brand
You already have a lot of data. Shopify has every order, every refund, every customer. GA4 has every session, every event, every funnel drop-off. Meta Ads Manager has your ad spend and its own version of your conversions. Google Ads has another version. Your email platform has open rates and click-throughs.
The problem is not a shortage of data. The problem is that these tools don’t talk to each other, they define the same things differently, and extracting useful answers from any combination of them requires either expensive analysts or a lot of your own time.
Data engineering is the work of building reliable pipelines from all of these sources into a single warehouse — and writing the transformation logic that makes them comparable.
Shopify webhooks and the order event problem. Shopify fires webhook events when orders are created, updated, fulfilled, cancelled, and refunded. A naive pipeline captures the order at creation time and never looks at it again. A well-engineered pipeline captures the full order lifecycle: cancellations, partial refunds, exchanges, and subscription renewals. If your return rate is meaningful (and for most brands it is), a dashboard that doesn’t account for refunds can overstate revenue by a double-digit percentage.
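To make the difference concrete, here is a minimal sketch of gross versus net revenue once refunds and cancellations are applied back to the original order day. The event shape (order_id, type, day, amount) is a simplified assumption for illustration, not Shopify's actual webhook payload.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical, simplified lifecycle events. Real Shopify webhook payloads
# carry many more fields and use different names.
events = [
    {"order_id": "1001", "type": "created",   "day": date(2024, 3, 1), "amount": Decimal("120.00")},
    {"order_id": "1002", "type": "created",   "day": date(2024, 3, 1), "amount": Decimal("80.00")},
    {"order_id": "1001", "type": "refunded",  "day": date(2024, 3, 4), "amount": Decimal("40.00")},
    {"order_id": "1002", "type": "cancelled", "day": date(2024, 3, 2), "amount": Decimal("80.00")},
]

def daily_revenue(events):
    """Return {order_day: (gross, net)}, charging refunds back to the order's day."""
    order_day = {e["order_id"]: e["day"] for e in events if e["type"] == "created"}
    gross = defaultdict(Decimal)
    net = defaultdict(Decimal)
    for e in events:
        day = order_day.get(e["order_id"])
        if day is None:
            continue  # refund for an order we never saw; a real pipeline would flag this
        if e["type"] == "created":
            gross[day] += e["amount"]
            net[day] += e["amount"]
        elif e["type"] in ("refunded", "cancelled"):
            net[day] -= e["amount"]
    return {day: (gross[day], net[day]) for day in gross}

report = daily_revenue(events)
gross, net = report[date(2024, 3, 1)]
print(f"gross={gross} net={net}")
```

Whether a refund counts against the day of the original order or the day it was issued is a modelling choice. This sketch uses the order day; either is defensible as long as it is applied consistently.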
GA4’s BigQuery export. GA4 has a native export to BigQuery that is genuinely powerful but requires significant transformation before it’s useful. The raw export is event-level — one row per user event, with session and user IDs that need to be stitched together to reconstruct sessions, funnels, and conversion paths. Writing the SQL to turn raw GA4 events into meaningful session and conversion data is non-trivial and is one of the more common things we do for ecommerce clients.
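As a rough illustration of that stitching, the sketch below groups event rows into sessions and records how far each session got in the funnel. The field names user_pseudo_id, ga_session_id, event_timestamp and event_name come from the GA4 export, but the flat dictionary shape is an assumption: in the real export, ga_session_id sits inside the nested event_params and has to be unpacked first. In production this logic would normally live in warehouse SQL; Python is used here only to show the shape of the transformation.

```python
from collections import defaultdict

# Rows as they might look after flattening the export's nested event_params.
# The flat shape is an assumption made for this example.
rows = [
    {"user_pseudo_id": "u1", "ga_session_id": 111, "event_timestamp": 1_000, "event_name": "session_start"},
    {"user_pseudo_id": "u1", "ga_session_id": 111, "event_timestamp": 2_000, "event_name": "view_item"},
    {"user_pseudo_id": "u1", "ga_session_id": 111, "event_timestamp": 3_000, "event_name": "add_to_cart"},
    {"user_pseudo_id": "u1", "ga_session_id": 222, "event_timestamp": 9_000, "event_name": "session_start"},
    {"user_pseudo_id": "u1", "ga_session_id": 222, "event_timestamp": 9_500, "event_name": "begin_checkout"},
    {"user_pseudo_id": "u1", "ga_session_id": 222, "event_timestamp": 9_900, "event_name": "purchase"},
    {"user_pseudo_id": "u2", "ga_session_id": 111, "event_timestamp": 4_000, "event_name": "view_item"},
]

# Funnel steps in order, using GA4's recommended ecommerce event names.
FUNNEL = ["view_item", "add_to_cart", "begin_checkout", "purchase"]

def build_sessions(rows):
    """Collapse event rows into one summary record per (user, session)."""
    grouped = defaultdict(list)
    for r in rows:
        # ga_session_id is only unique per user, so the key needs both IDs.
        grouped[(r["user_pseudo_id"], r["ga_session_id"])].append(r)

    sessions = {}
    for key, evs in grouped.items():
        evs.sort(key=lambda r: r["event_timestamp"])
        names = {r["event_name"] for r in evs}
        reached = [step for step in FUNNEL if step in names]
        sessions[key] = {
            "events": len(evs),
            "started_at": evs[0]["event_timestamp"],
            "deepest_step": reached[-1] if reached else None,
            "converted": "purchase" in names,
        }
    return sessions

sessions = build_sessions(rows)
for key, summary in sessions.items():
    print(key, summary["deepest_step"], summary["converted"])
```

Keying on the session ID alone would merge u1's first session with u2's, which is one of the easier mistakes to make with the raw export.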
Cross-platform attribution. Meta reports your conversions using 7-day click or 1-day view by default. Google Ads defaults to data-driven attribution on newer conversion actions, though many older accounts still run last-click. GA4 uses its own data-driven model. Shopify shows orders tagged with the last UTM source in the checkout flow. These will never agree. What your own reporting can be is consistent and clearly labelled, so when you’re deciding where to allocate next month’s budget, you’re working from a defined model, not a number that changes depending on which platform you looked at last.
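A small sketch shows why the numbers diverge. It applies a plain last-touch rule to the same order twice, once counting only clicks and once counting clicks and views with shorter windows. The touchpoints, channel names and the last_touch helper are all hypothetical, and the rule is a deliberate simplification rather than how any ad platform actually computes credit.

```python
from datetime import datetime, timedelta

# Hypothetical touchpoints for one customer ahead of a single order.
touches = [
    {"channel": "meta",   "kind": "click", "at": datetime(2024, 3, 1, 9, 0)},
    {"channel": "google", "kind": "click", "at": datetime(2024, 3, 5, 18, 0)},
    {"channel": "meta",   "kind": "view",  "at": datetime(2024, 3, 6, 8, 0)},
]
order_at = datetime(2024, 3, 6, 12, 0)

def last_touch(touches, order_at, windows):
    """Credit the most recent eligible touch.

    `windows` maps a touch kind to its lookback window; kinds that are
    not listed are ignored entirely.
    """
    eligible = [
        t for t in touches
        if t["kind"] in windows
        and timedelta(0) <= order_at - t["at"] <= windows[t["kind"]]
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda t: t["at"])["channel"]

# Same order, two rule sets.
click_only = last_touch(touches, order_at, {"click": timedelta(days=30)})
click_and_view = last_touch(
    touches, order_at, {"click": timedelta(days=7), "view": timedelta(days=1)}
)
print(click_only, click_and_view)
```

Under the click-only rule the order goes to google; once views count, it goes to meta. Neither answer is wrong. They are answers to different questions, which is why the model has to be agreed and written down.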
The 4-week build journey
What we actually do behind the scenes
- Daily revenue net of refunds
- ROAS by channel, agreed attribution model
- CAC and LTV trends over time
- Funnel drop-off from session to checkout
- BigQuery warehouse, configured per store
- Fivetran connectors with monitoring
- GA4 raw event transformation SQL
- 2am alerts when a sync fails
- Weekly iteration as campaigns and products change
Your data pipeline
Where the time and money goes
Ecommerce has a lower data engineering cost than field marketing or multi-client agency work, because Shopify and GA4 are well-documented platforms with mature connectors. The engineering complexity is concentrated in the transformation layer, specifically attribution logic and GA4 event reconstruction.
Once Shopify and GA4 are correctly wired and the attribution model is agreed, the ongoing engineering cost drops. The steady-state retainer is dominated by iteration — new dashboards for new product lines, BFCM performance tracking, cohort analysis as your customer base grows.
What you get vs what we manage
You get a dashboard that tells you where your revenue is coming from, what it’s costing you to acquire customers, and where people are dropping out of your funnel. You can open it every morning and make better decisions about where to spend today’s budget.
We manage the full data stack underneath that — including the parts that break when Shopify releases a new API version or Meta changes how it reports conversion events.
Frequently asked questions
Shopify has its own analytics. Why do I need BigQuery?
Shopify analytics is fine for basic order reporting. It breaks down when you want to combine order data with GA4 session data (funnel analysis), ad platform data (attribution), or email platform data (LTV by acquisition channel). BigQuery is the neutral warehouse that holds all of it in one place, so you can ask questions that span multiple platforms.
GA4 is complicated. Do you handle the BigQuery export setup?
Yes, completely. GA4’s BigQuery export is one of the more powerful tools available to ecommerce brands, but the raw data is extremely verbose and needs significant transformation before it’s dashboardable. We set up the export, write the transformation SQL, and maintain it as your GA4 event taxonomy evolves.
What attribution model do you use?
We use whatever model you agree to, applied consistently. Most ecommerce brands we work with end up on a blended view: Shopify-tagged last-UTM revenue as the transaction record, with platform-reported data shown alongside as context. We document the model and the decision rationale so your team and any future analyst can understand exactly what a number means.
Can you track subscription and repeat purchase behaviour?
Yes. Shopify has good subscription data if you’re using Recharge or Shopify Subscriptions. We build cohort analysis and repeat purchase rates into the warehouse model from the start, because customer LTV and repeat rate are usually the metrics that matter most at a strategic level — and they’re never available out of the box from any single platform.
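For a sense of what that cohort logic looks like, here is a minimal sketch that assigns each customer to the month of their first order and reports the share of each cohort that ordered again. The order data and the repeat_rate_by_cohort helper are hypothetical; a production model would also account for refunds and cancelled orders.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (customer_id, order_date) pairs.
orders = [
    ("c1", date(2024, 1, 5)), ("c1", date(2024, 2, 10)),
    ("c2", date(2024, 1, 20)),
    ("c3", date(2024, 2, 3)), ("c3", date(2024, 2, 25)), ("c3", date(2024, 4, 1)),
    ("c4", date(2024, 2, 14)),
    ("c5", date(2024, 2, 18)),
    ("c6", date(2024, 2, 27)),
]

def repeat_rate_by_cohort(orders):
    """Return {(year, month) of first order: share of customers with 2+ orders}."""
    order_count = defaultdict(int)
    first_order = {}
    for customer, day in orders:
        order_count[customer] += 1
        if customer not in first_order or day < first_order[customer]:
            first_order[customer] = day

    cohorts = defaultdict(list)
    for customer, day in first_order.items():
        cohorts[(day.year, day.month)].append(order_count[customer] > 1)

    return {
        cohort: sum(flags) / len(flags)
        for cohort, flags in sorted(cohorts.items())
    }

rates = repeat_rate_by_cohort(orders)
for cohort, rate in rates.items():
    print(cohort, f"{rate:.0%}")
```

Newer cohorts have had less time to reorder, so their rates will look lower until they mature. Comparing cohorts at the same age is the usual next refinement.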
What happens during peak periods like BFCM?
We build BFCM-specific views in the run-up so you can track day-by-day performance against targets. Pipeline monitoring is heightened during peak — if something breaks at 2am on Black Friday, we want to know before you do. We’ve run ecommerce data infrastructure through enough peak periods to know where the failure points are.
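The core of that monitoring is a freshness check: compare each source's last successful sync against how stale it is allowed to be. The sketch below assumes the last-sync timestamps have already been collected from wherever your connectors record them; the source names, thresholds and the stale_sources helper are illustrative, not a specific vendor's API.

```python
from datetime import datetime, timedelta, timezone

def stale_sources(last_synced, now, max_age, default_max_age=timedelta(hours=24)):
    """Return the names of sources whose last sync is older than allowed."""
    stale = []
    for name, synced_at in last_synced.items():
        allowed = max_age.get(name, default_max_age)
        if now - synced_at > allowed:
            stale.append(name)
    return sorted(stale)

# 2am on Black Friday 2024, in UTC for simplicity.
now = datetime(2024, 11, 29, 2, 0, tzinfo=timezone.utc)

# Hypothetical last-successful-sync times per source.
last_synced = {
    "shopify":  datetime(2024, 11, 29, 1, 45, tzinfo=timezone.utc),
    "ga4":      datetime(2024, 11, 28, 3, 0, tzinfo=timezone.utc),
    "meta_ads": datetime(2024, 11, 28, 20, 0, tzinfo=timezone.utc),
}

# Tighter thresholds during peak; GA4 gets more slack for a daily export.
max_age = {
    "shopify":  timedelta(hours=1),
    "ga4":      timedelta(hours=26),
    "meta_ads": timedelta(hours=3),
}

alerts = stale_sources(last_synced, now, max_age)
print(alerts)
```

Here only meta_ads is flagged: its last sync is six hours old against a three-hour limit. Tightening the thresholds for peak week is a one-line change to max_age.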
The ecommerce brands that win over time are the ones making faster, better-informed decisions about product, channel, and customer. The dashboard is what enables that. The engineering underneath is what makes the dashboard trustworthy.
Book a 20-min call and we’ll show you what a revenue dashboard could look like for your specific stack.