Understanding Incrementality in Retargeting
Retargeting aims to bring back users who have already shown interest in a brand. The core question for marketers is whether the ads actually cause additional conversions or simply remind users who would convert later on their own. Incrementality testing answers that question by comparing the behaviour of a treated audience with a comparable group that does not see the retargeting ads.
Why Holdout Groups Are the Preferred Method
A holdout group is a randomly selected segment of the retargeting audience that is deliberately excluded from seeing any retargeting creative. Because the selection is random, the holdout mirrors the treated group in all observable characteristics. Any difference in conversion rates between the two groups can be attributed to the presence of the ads, assuming the experiment is properly controlled.
Key Benefits
First, holdout testing provides a clear causal estimate of lift, which is more reliable than proxy metrics such as click‑through rate. Second, it reveals the true cost per incremental conversion, allowing marketers to reallocate spend toward the most efficient tactics. Third, it uncovers creative or audience segments that generate little or no lift, helping to prune waste.
Designing a Robust Holdout Experiment
Effective experiments start with clear objectives, a well‑defined audience, and a plan for statistical analysis. The following steps guide the process.
- Define the conversion window that matches the buying cycle of the product or service. Typical windows range from 7 to 30 days for e‑commerce and up to 90 days for high‑ticket items.
- Identify the retargeting audience based on source pixels, website visits, or app events. Segment the audience by value tiers if the brand wants to test lift for high‑value users separately.
- Determine the holdout size. A common rule is to withhold 5 to 10 percent of the total retargeting audience as the holdout, which usually provides enough statistical power while limiting the opportunity cost of unserved users.
- Apply randomization at the user level. Most ad platforms offer built‑in split‑testing or audience‑exclusion tools that produce a random user‑level split.
- Set up identical campaign settings for the treated group, including bidding strategy, creative rotation and frequency caps. The only difference is that the holdout does not receive any ad impressions.
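For teams that implement the split themselves, for example server‑side, a hash‑based assignment is a common pattern: it is deterministic per user, effectively random across users, and needs no stored assignment table. The sketch below is illustrative only — the salt, percentage, and function name are placeholders, not part of any platform's API:

```python
import hashlib

def assign_group(user_id: str, holdout_pct: float = 0.10,
                 salt: str = "rt-test-q3") -> str:
    """Deterministically assign a user to 'holdout' or 'treated'.

    Hashing the user ID with a per-experiment salt yields a stable,
    effectively random split; the same user always lands in the same group.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "holdout" if bucket < holdout_pct else "treated"
```

Changing the salt starts a fresh experiment with a new random split, which is useful when rotating holdouts between tests.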
Sample Size Considerations
Statistical power depends on three factors: the baseline conversion rate, the expected lift, and the confidence level. Online calculators can estimate the required sample size. For example, if the baseline conversion rate is 2 percent and the expected lift is 15 percent, a 95 percent confidence level with 80 percent power requires roughly 37 000 users in each group — often more than intuition suggests.
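The arithmetic behind those calculators is easy to reproduce. The following sketch implements the standard two‑proportion sample‑size formula using only the Python standard library; the function name and defaults are illustrative:

```python
import math
from statistics import NormalDist

def holdout_sample_size(p_base: float, rel_lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided two-proportion test."""
    p_treat = p_base * (1 + rel_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for 95% confidence
    z_b = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    var = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    n = (z_a + z_b) ** 2 * var / (p_base - p_treat) ** 2
    return math.ceil(n)
```

Note how sensitive the result is to the expected lift: halving the detectable lift roughly quadruples the required holdout.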
Measuring Incremental Lift
After the experiment runs for the predefined window, collect conversion data for both groups. The primary metric is the incremental conversion rate, calculated as the difference between treated and holdout conversion rates. To express lift as a percentage, divide the incremental rate by the holdout rate and multiply by 100.
Statistical significance should be tested using a two‑sample proportion test or a chi‑square test. Many analytics platforms provide built‑in significance testing for A/B experiments; otherwise, a simple spreadsheet formula can produce the p‑value.
Example Calculation
Suppose the treated group records 1 200 conversions from 100 000 users, a 1.2 percent rate. The holdout group records 950 conversions from an equal 100 000 users, a 0.95 percent rate. The incremental rate is 0.25 percentage points, and the lift is approximately 26 percent. If the p‑value is below 0.05, the result is statistically significant, indicating that the retargeting ads generated a measurable boost.
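The example can be checked with a few lines implementing the two‑sample proportion test mentioned above (the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def lift_and_pvalue(conv_t: int, n_t: int, conv_h: int, n_h: int):
    """Relative lift and two-sided p-value from a two-proportion z-test."""
    p_t, p_h = conv_t / n_t, conv_h / n_h
    lift = (p_t - p_h) / p_h                      # relative lift vs. holdout
    p_pool = (conv_t + conv_h) / (n_t + n_h)      # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_h))
    z = (p_t - p_h) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return lift, p_value
```

For the numbers in the example, the lift comes out near 26 percent and the p‑value is far below 0.05, so the result would be declared significant.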
Common Pitfalls and How to Avoid Them
Even well‑designed experiments can produce misleading results if certain traps are ignored.
- Leakage: Users in the holdout may still see ads through other channels or placements. Use frequency caps and platform‑level exclusions to keep the holdout truly unexposed.
- Overlap with Prospecting: If prospecting campaigns target the same audience, the lift attributed to retargeting will be overstated. Separate prospecting and retargeting budgets during the test.
- Insufficient Duration: Short experiments may not capture conversions that occur later in the buying cycle. Align the experiment length with the chosen conversion window.
- Non‑random Allocation: Manual list creation can introduce bias. Rely on platform‑generated random splits.
Practical Implementation Tips
Many ad platforms support holdout testing natively. On Meta, create a custom audience for the retargeting pool, duplicate it, label one copy as “Holdout,” and exclude that copy from the ad set using the audience exclusion feature. On Google Ads, build the remarketing list and add the holdout segment as an audience exclusion on the campaign.
For brands that use a tag manager or a server‑side solution, pass a holdout flag to the ad server so that the user is never served a retargeting impression, regardless of the platform. This approach reduces leakage caused by cross‑platform bidding.
Automation Opportunities
Once the experiment framework is in place, automation can rotate holdout percentages, refresh audience segments weekly, and push results to a dashboard. Scripts can also pause or scale the campaign automatically if lift falls below a predefined threshold.
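A minimal sketch of such a guardrail is shown below. The thresholds and function name are placeholders, and an actual pause or scale action would go through the ad platform's API rather than a return value:

```python
def campaign_action(lift: float, p_value: float,
                    min_lift: float = 0.05, alpha: float = 0.05) -> str:
    """Decide whether to keep, pause, or scale a retargeting campaign."""
    if p_value >= alpha:
        return "keep"    # not yet significant; keep collecting data
    if lift < min_lift:
        return "pause"   # significant but below the lift threshold
    return "scale"       # significant and above threshold
```

Requiring significance before acting prevents the automation from reacting to early noise.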
Interpreting Results for Decision Making
When the lift is positive and statistically significant, calculate the incremental cost per acquisition (iCPA) by dividing the spend on the treated group by the number of incremental conversions. Compare iCPA to the overall profit margin to determine whether the retargeting effort is profitable.
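The iCPA arithmetic is straightforward; the sketch below (function name illustrative) scales the holdout's conversion count to the treated group's size before subtracting, so unequal group sizes are handled correctly:

```python
def incremental_cpa(spend: float, conv_t: int, n_t: int,
                    conv_h: int, n_h: int) -> float:
    """Incremental cost per acquisition: spend over incremental conversions."""
    incremental = conv_t - conv_h * (n_t / n_h)  # holdout scaled to treated size
    if incremental <= 0:
        raise ValueError("no incremental conversions; iCPA is undefined")
    return spend / incremental
```

Using the earlier example with a hypothetical 10 000 in treated‑group spend, the 250 incremental conversions imply an iCPA of 40.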
If the lift is negligible or negative, consider testing alternative creative, adjusting frequency caps, or narrowing the audience to higher‑value users. The holdout framework allows rapid iteration because each new variant can be measured against the same baseline.
Scaling Incrementality Testing Across Channels
Holdout testing is not limited to display retargeting. The same principles apply to video retargeting, dynamic product ads, and even email remarketing. By establishing a unified measurement architecture, marketers can compare incremental lift across channels and allocate budget to the highest‑return tactics.
When multiple channels are tested simultaneously, use multi‑armed bandit logic to allocate spend dynamically based on early lift signals. This hybrid approach blends rigorous holdout measurement with real‑time optimisation.
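One illustrative way to implement that allocation logic is Thompson sampling over a Beta posterior of each channel's incremental conversion rate. Everything below — the input shape, defaults, and function name — is a simplified sketch, not a production allocator:

```python
import random

def allocate_budget(results: dict, total_budget: float,
                    draws: int = 2000, seed: int = 7) -> dict:
    """Thompson-sampling split of budget across channels.

    `results` maps channel -> (incremental_conversions, users_tested).
    Each channel's rate gets a Beta posterior; its budget share is the
    probability that its sampled rate is the highest across channels.
    """
    rng = random.Random(seed)
    wins = {c: 0 for c in results}
    for _ in range(draws):
        samples = {c: rng.betavariate(1 + conv, 1 + n - conv)
                   for c, (conv, n) in results.items()}
        wins[max(samples, key=samples.get)] += 1
    return {c: total_budget * w / draws for c, w in wins.items()}
```

Early in a test the posteriors are wide, so budget stays spread out; as lift signals firm up, spend concentrates on the winning channel while the holdout measurement keeps running underneath.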
Future Directions in Incrementality Measurement
Privacy‑focused changes such as Apple’s ATT framework reduce the availability of deterministic identifiers. Emerging solutions include probabilistic matching, aggregated conversion modeling, and the use of privacy‑preserving measurement APIs offered by major platforms. While these methods add complexity, the core concept of comparing a treated group with a control remains the gold standard for causal insight.
Investing in a robust holdout testing regimen equips performance marketers with the evidence needed to justify spend, refine creative, and ultimately drive more efficient growth.