Title: | Bayesian Inference for A|B and Bandit Marketing Tests |
---|---|
Description: | Uses simple Bayesian conjugate prior update rules to calculate the win probability of each option, value remaining in the test, and percent lift over the baseline for various marketing objectives. References: Fink, Daniel (1997) "A Compendium of Conjugate Priors" <https://www.johndcook.com/CompendiumOfConjugatePriors.pdf>. Stucchio, Chris (2015) "Bayesian A/B Testing at VWO" <https://vwo.com/downloads/VWO_SmartStats_technical_whitepaper.pdf>. |
Authors: | Ryan Angi |
Maintainer: | Ryan Angi <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.5 |
Built: | 2025-03-10 04:20:52 UTC |
Source: | https://github.com/r-angi/grizbayr |
Calculate Multi Rev Per Session
calculate_multi_rev_per_session(conv_rates, inverse_rev_A, inverse_rev_B)
conv_rates |
Dirichlet samples containing a tibble with columns alpha_1, alpha_2, and alpha_0 |
inverse_rev_A |
Vector of inverse revenue samples from A conversion type |
inverse_rev_B |
Vector of inverse revenue samples from B conversion type |
Vector of samples (dbl)
Calculate Total CM
calculate_total_cm(rev_per_click, cost_per_click, expected_clicks)
rev_per_click |
vector of rev per click samples |
cost_per_click |
vector of cost per click (cpc) samples |
expected_clicks |
vector of expected clicks (expected CTR * fixed impressions) |
vector of CM estimates (dbl)
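The arithmetic behind this calculation can be sketched in base R. This is a minimal illustration, assuming CM denotes contribution margin (revenue minus cost); `total_cm_sketch` is a hypothetical helper and the input values are made up, so it should not be read as the package's implementation.

```r
# Hypothetical helper: per-click margin times the expected number of clicks.
total_cm_sketch <- function(rev_per_click, cost_per_click, expected_clicks) {
  (rev_per_click - cost_per_click) * expected_clicks
}

# Expected clicks = expected CTR * fixed impressions (illustrative values).
expected_ctr <- c(0.010, 0.012, 0.011)
fixed_impressions <- 10000
expected_clicks <- expected_ctr * fixed_impressions

cm <- total_cm_sketch(
  rev_per_click = c(1.50, 1.40, 1.60),
  cost_per_click = c(1.00, 1.10, 0.90),
  expected_clicks = expected_clicks
)
cm
```

Each element of the three input vectors is treated as one posterior draw, so the result is a vector of CM draws of the same length.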
Efficiently estimates all values at once so the posterior only need to be sampled one time. This function will return as a list win probability, value remaining, estimated percent lift with respect to the provided option, and the win probability of the best option vs the provided option.
estimate_all_values( input_df, distribution, wrt_option_lift, priors = list(), wrt_option_vr = NULL, loss_threshold = 0.95, lift_threshold = 0.7, metric = "lift" )
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
wrt_option_lift |
String: the option that lift and win probability are calculated with respect to (wrt). Required. |
priors |
Optional list of priors. Defaults will be used otherwise. |
wrt_option_vr |
String: the option against which loss (value remaining) is calculated. If NULL the best option will be used. (optional) |
loss_threshold |
The confidence interval specifying what the "worst case scenario" should be. Defaults to 95%. (optional) |
lift_threshold |
The confidence interval specifying how likely the lift is to be true. Defaults to 70%. (optional) |
metric |
String: the type of loss. 'absolute' is the difference on the outcome scale (0 when best = wrt_option); 'lift' is (best - wrt_option) / wrt_option (0 when best = wrt_option); 'relative_risk' is the ratio best / wrt_option (1 when best = wrt_option). |
TODO: Add high density credible intervals to this output for each option.
A list with 4 named items: Win Probability, Value Remaining, Lift vs Baseline, and Win Probability vs Baseline.
## Not run:
input_df <- data.frame(option_name = c("A", "B", "C"),
                       sum_clicks = c(1000, 1000, 1000),
                       sum_conversions = c(100, 120, 110),
                       stringsAsFactors = FALSE)
estimate_all_values(input_df, distribution = "conversion_rate", wrt_option_lift = "A")
## End(Not run)
Estimates lift distribution vector from posterior samples.
estimate_lift(posterior_samples, distribution, wrt_option, metric = "lift")
posterior_samples |
Tibble returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
distribution |
String of the distribution name |
wrt_option |
String: the option that lift is calculated with respect to (wrt). Required. |
metric |
String: the type of lift. 'absolute' is the difference on the outcome scale (0 when best = wrt_option); 'lift' is (best - wrt_option) / wrt_option (0 when best = wrt_option); 'relative_risk' is the ratio best / wrt_option (1 when best = wrt_option). |
numeric, the lift distribution
# Requires posterior_samples dataframe. See `sample_from_posterior()`
# for an example.
## Not run:
estimate_lift(posterior_samples = posterior_samples,
              distribution = "conversion_rate",
              wrt_option = "A",
              metric = "lift")
## End(Not run)
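The three metric definitions can be illustrated with a short base R sketch. It assumes two options with Beta posteriors under a uniform prior; `lift_sketch` and the sample vectors are hypothetical names for illustration and are not part of the package.

```r
set.seed(42)
n <- 10000

# Illustrative Beta posteriors for a baseline (A) and a challenger (B).
samples_a <- rbeta(n, 1 + 100, 1 + 900)
samples_b <- rbeta(n, 1 + 120, 1 + 880)

# Hypothetical helper mirroring the metric definitions in the argument table.
lift_sketch <- function(best, wrt, metric = c("lift", "absolute", "relative_risk")) {
  metric <- match.arg(metric)
  switch(metric,
    absolute = best - wrt,
    lift = (best - wrt) / wrt,
    relative_risk = best / wrt
  )
}

lift_dist <- lift_sketch(samples_b, samples_a, "lift")
abs_dist <- lift_sketch(samples_b, samples_a, "absolute")
rr_dist <- lift_sketch(samples_b, samples_a, "relative_risk")
```

Note that 'relative_risk' is always 'lift' plus 1, which is why the two metrics are 1 and 0 respectively when the best option equals the reference option.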
Estimate Lift vs Baseline
estimate_lift_vs_baseline( input_df, distribution, priors = list(), wrt_option, metric = "lift", threshold = 0.7 )
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be used otherwise. |
wrt_option |
String: the option that lift is calculated with respect to (wrt). Required. |
metric |
String: the type of lift. 'absolute' is the difference on the outcome scale (0 when best = wrt_option); 'lift' is (best - wrt_option) / wrt_option (0 when best = wrt_option); 'relative_risk' is the ratio best / wrt_option (1 when best = wrt_option). |
threshold |
Lift percentage threshold between 0 and 1. (0.7 threshold is "at least 70% lift"). Defaults to 0.7. |
numeric, the lift over the baseline at the specified threshold
input_df <- tibble::tibble(option_name = c("A", "B", "C"),
                           sum_clicks = c(1000, 1000, 1000),
                           sum_conversions = c(100, 120, 110))
estimate_lift_vs_baseline(input_df, distribution = "conversion_rate", wrt_option = "A")
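One plausible reading of the threshold is "the lift that is exceeded with probability `threshold`", which corresponds to the `1 - threshold` quantile of the lift distribution. The base R sketch below works under that assumption; it is an illustration with made-up posteriors, not a statement of how the package computes the value.

```r
set.seed(1)
n <- 20000

# Illustrative Beta posteriors (uniform prior assumed).
baseline <- rbeta(n, 1 + 100, 1 + 900)
best <- rbeta(n, 1 + 120, 1 + 880)

# Percent lift of the best option over the baseline, one value per draw.
lift_dist <- (best - baseline) / baseline

# Lift exceeded in roughly `threshold` of the draws (assumed interpretation).
threshold <- 0.7
lift_at_threshold <- unname(quantile(lift_dist, probs = 1 - threshold))

# Sanity check: share of draws at or above that lift.
share_above <- mean(lift_dist >= lift_at_threshold)
```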
Estimate Loss
estimate_loss( posterior_samples, distribution, wrt_option = NULL, metric = c("absolute", "lift", "relative_risk") )
posterior_samples |
Tibble: returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
distribution |
String: the name of the distribution |
wrt_option |
String: the option loss is calculated with respect to (wrt). If NULL, the best option will be chosen. |
metric |
String: the type of loss. 'absolute' is the difference on the outcome scale (0 when best = wrt_option); 'lift' is (best - wrt_option) / wrt_option (0 when best = wrt_option); 'relative_risk' is the ratio best / wrt_option (1 when best = wrt_option). |
numeric, the loss distribution
# Requires posterior_samples dataframe. See `sample_from_posterior()`
# for an example.
## Not run:
estimate_loss(posterior_samples = posterior_samples, distribution = "conversion_rate")
## End(Not run)
Estimates value remaining or loss (in terms of percent lift, absolute, or relative).
estimate_value_remaining( input_df, distribution, priors = list(), wrt_option = NULL, metric = "lift", threshold = 0.95 )
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be used otherwise. |
wrt_option |
String: the option that loss is calculated with respect to (wrt). If NULL, the best option will be chosen. |
metric |
String: the type of loss. 'absolute' is the difference on the outcome scale (0 when best = wrt_option); 'lift' is (best - wrt_option) / wrt_option (0 when best = wrt_option); 'relative_risk' is the ratio best / wrt_option (1 when best = wrt_option). |
threshold |
The confidence interval specifying what the "worst case scenario" should be. Defaults to 95%. (optional) |
numeric value remaining at the specified threshold
input_df <- tibble::tibble(option_name = c("A", "B", "C"),
                           sum_clicks = c(1000, 1000, 1000),
                           sum_conversions = c(100, 120, 110))
estimate_value_remaining(input_df, distribution = "conversion_rate")
estimate_value_remaining(input_df, distribution = "conversion_rate", threshold = 0.99)
estimate_value_remaining(input_df, distribution = "conversion_rate",
                         wrt_option = "A", metric = "absolute")
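Value remaining can be sketched in base R as a high quantile of the per-draw loss. The example assumes Beta posteriors with a uniform prior and picks the reference option by highest posterior mean as a stand-in for "the best option"; both choices are assumptions for illustration rather than the package's exact procedure.

```r
set.seed(7)
n <- 20000

# One column of posterior draws per option (illustrative data).
post <- cbind(
  A = rbeta(n, 1 + 100, 1 + 900),
  B = rbeta(n, 1 + 120, 1 + 880),
  C = rbeta(n, 1 + 110, 1 + 890)
)

# Reference option: highest posterior mean (assumed stand-in for "best").
wrt <- names(which.max(colMeans(post)))

# Per-draw loss in percent-lift terms: how much better the best option
# in that draw is than the reference option.
row_max <- apply(post, 1, max)
loss_dist <- (row_max - post[, wrt]) / post[, wrt]

# "Worst case" loss at the 95% threshold.
value_remaining <- unname(quantile(loss_dist, probs = 0.95))
```

The loss is 0 in every draw where the reference option is itself the best, so the distribution is non-negative and usually has a large point mass at 0.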
Creates a tibble of win probabilities for each option based on the data observed.
estimate_win_prob(input_df, distribution, priors = list())
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be used otherwise. |
tibble object with 2 columns: 'option_name' and 'win_probability' formatted as a percent
input_df <- tibble::tibble(
  option_name = c("A", "B"),
  sum_clicks = c(1000, 1000),
  sum_conversions = c(100, 120)
)
estimate_win_prob(input_df, "conversion_rate")
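The idea behind a win probability can be shown in a few lines of base R: draw from each option's posterior, then count how often each option has the largest draw. This sketch assumes a uniform Beta(1, 1) prior and is only an illustration of the technique, not the package's code.

```r
set.seed(123)
n <- 20000

# Posterior draws for each option: Beta(1 + conversions, 1 + non-conversions).
post <- cbind(
  A = rbeta(n, 1 + 100, 1 + 900),
  B = rbeta(n, 1 + 120, 1 + 880)
)

# Column index of the winning option in each joint draw.
winner <- max.col(post, ties.method = "first")

# Share of draws won by each option.
win_prob <- tabulate(winner, nbins = ncol(post)) / n
names(win_prob) <- colnames(post)
win_prob
```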
Estimate Win Probability Given Posterior Distribution
estimate_win_prob_given_posterior(posterior_samples, winner_is_max = TRUE)
posterior_samples |
Tibble of data in long form with 2 columns 'option_name' and 'samples' |
winner_is_max |
Boolean. This should almost always be TRUE. If a larger number is better then this should be TRUE. This should be FALSE for metrics such as CPA or CPC where a higher cost is not necessarily better. |
Tibble of each option_name and the win probability expressed as a percentage and a decimal 'raw'
# Requires posterior_samples dataframe. See `sample_from_posterior()`
# for an example.
## Not run:
estimate_win_prob_given_posterior(posterior_samples = posterior_samples)
estimate_win_prob_given_posterior(
  posterior_samples = posterior_samples,
  winner_is_max = TRUE
)
## End(Not run)
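The effect of `winner_is_max` can be seen on a tiny hand-made example. `win_prob_sketch` is a hypothetical base R helper that works on a wide matrix of draws (one column per option) rather than the long-form tibble described above; it is a sketch of the idea only.

```r
# Hypothetical helper: share of draws in which each option is the winner.
win_prob_sketch <- function(post, winner_is_max = TRUE) {
  # For "lower is better" metrics, flip the sign so the smallest value wins.
  signed <- if (winner_is_max) post else -post
  winner <- max.col(signed, ties.method = "first")
  out <- tabulate(winner, nbins = ncol(post)) / nrow(post)
  names(out) <- colnames(post)
  out
}

# Four joint draws for two options (made-up numbers).
post <- cbind(A = c(1, 2, 3, 4), B = c(2, 1, 4, 5))

wp_max <- win_prob_sketch(post, winner_is_max = TRUE)   # higher is better
wp_min <- win_prob_sketch(post, winner_is_max = FALSE)  # lower is better, e.g. a cost
```

Here B has the larger value in three of four draws, so it wins 75% of the time when higher is better and 25% of the time when lower is better.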
Calculates the win probability of the best option compared to a single other option given an input_df
estimate_win_prob_vs_baseline( input_df, distribution, priors = list(), wrt_option )
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be used otherwise. |
wrt_option |
String: the option that win probability is calculated with respect to (wrt). Required. |
Tibble of each option_name and the win probability expressed as a percentage and a decimal 'raw'
input_df <- tibble::tibble(
  option_name = c("A", "B", "C"),
  sum_clicks = c(1000, 1000, 1000),
  sum_conversions = c(100, 120, 110)
)
estimate_win_prob_vs_baseline(input_df = input_df,
                              distribution = "conversion_rate",
                              wrt_option = "B")
Calculates the win probability of the best option compared to a single other option given a posterior distribution.
estimate_win_prob_vs_baseline_given_posterior( posterior_samples, distribution, wrt_option )
posterior_samples |
Tibble returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
distribution |
String: the distribution name |
wrt_option |
String: the option to compare against the best option. |
Tibble of each option_name and the win probability expressed as a percentage and a decimal 'raw'
# Requires posterior_samples dataframe. See `sample_from_posterior()`
# for an example.
## Not run:
estimate_win_prob_vs_baseline_given_posterior(
  posterior_samples = posterior_samples,
  distribution = "conversion_rate",
  wrt_option = "A")
## End(Not run)
Samples from posterior, calculates win probability, and selects the best option. Note: this can be inefficient if you already have the win probability dataframe. Only use this if that has not already been calculated.
find_best_option(posterior_samples, distribution)
posterior_samples |
Tibble returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
distribution |
String: name of the distribution |
String: the best option name
# Requires posterior distribution
## Not run:
find_best_option(posterior_samples = posterior_samples, distribution = "conversion_rate")
## End(Not run)
When win probability is calculated
impute_missing_options(posterior_samples, wp_raw)
posterior_samples |
Tibble of data in long form with 2 columns 'option_name' and 'samples' |
wp_raw |
Tibble of win probabilities with the columns: 'option_name' and 'win_prob_raw' |
wp_raw table with new rows if option names were missing.
Checks if a single valid prior name is in the list of prior values and if that prior value from the list is greater than 0.
is_prior_valid(priors_list, valid_prior)
priors_list |
A list of valid priors |
valid_prior |
A character string |
Boolean (TRUE/FALSE)
Determines if the max or min function should be used for win probability. If CPA or CPC distribution, lower is better, else higher number is better.
is_winner_max(distribution)
distribution |
String: the name of the distribution |
Boolean TRUE/FALSE
Randomly samples n draws from a Dirichlet distribution parameterized by a vector of alphas. PDF of the Gamma distribution with scale = 1: f(x) = x^(a-1) e^(-x) / Gamma(a)
rdirichlet(n, alphas_list)
n |
integer, the number of samples |
alphas_list |
Named list of integers: parameters of the Dirichlet, interpreted as the number of successes of each outcome |
n x length(alphas) named tibble representing the probability of observing each outcome
rdirichlet(100, list(a = 20, b = 15, c = 60))
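A standard way to sample from a Dirichlet distribution, consistent with the Gamma PDF mentioned above, is to draw independent Gamma(alpha_j, scale = 1) variables and normalize each row to sum to 1. The base R sketch below uses a hypothetical helper `rdirichlet_sketch` that returns a matrix rather than a tibble; it is an assumption about the general technique, not a copy of the package function.

```r
# Hypothetical helper: Dirichlet draws via normalized Gamma variables.
rdirichlet_sketch <- function(n, alphas) {
  alphas <- unlist(alphas)
  k <- length(alphas)
  # Column j holds n draws from Gamma(shape = alpha_j, rate = 1).
  g <- matrix(
    rgamma(n * k, shape = rep(alphas, each = n), rate = 1),
    nrow = n, ncol = k
  )
  # Normalize each row so the k probabilities sum to 1.
  out <- g / rowSums(g)
  colnames(out) <- names(alphas)
  out
}

set.seed(2024)
draws <- rdirichlet_sketch(5000, list(a = 20, b = 15, c = 60))
colMeans(draws)  # close to alphas / sum(alphas)
```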
Adds 4 new nested columns to the input_df: 'beta_params', 'gamma_params_rev', 'gamma_params_cost', and 'samples'
sample_cm_per_click(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_revenue (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors: alpha0 and beta0 for Beta; k0 and theta0 for Gamma Inverse Revenue; and k01 and theta01 for Gamma Cost (uses alternate priors so they can be different from Revenue). Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
'beta_params' and 'gamma_params_rev' in each row should be a tibble of length 2 (alpha and beta parameters and k and theta parameters)
'samples' in each row should be a tibble of length 'n_samples'
See update_rules vignette for a mathematical representation.
input_df with 4 new nested columns 'beta_params', 'gamma_params_rev', 'gamma_params_cost', and 'samples'
Adds 2 new nested columns to the input_df: 'beta_params' and 'samples'
'beta_params' in each row should be a tibble of length 2 (alpha and beta parameters)
'samples' in each row should be a tibble of length 'n_samples'
sample_conv_rate(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors alpha0 and beta0. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation.
Conversion Rate is sampled from a Beta distribution with a Binomial likelihood of an individual converting.
input_df with 2 new nested columns 'beta_params' and 'samples'
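The Beta-Binomial update described above can be sketched directly in base R. The example assumes a uniform Beta(1, 1) prior and uses plain named vectors instead of nested tibble columns; variable names are illustrative only.

```r
# Observed data per option (illustrative).
sum_clicks <- c(A = 1000, B = 1000)
sum_conversions <- c(A = 100, B = 120)

# Prior: Beta(alpha0, beta0), assumed uniform here.
alpha0 <- 1
beta0 <- 1

# Conjugate update: add successes to alpha and failures to beta.
alpha_post <- alpha0 + sum_conversions
beta_post <- beta0 + (sum_clicks - sum_conversions)

# Draw posterior samples of the conversion rate for each option.
set.seed(10)
n_samples <- 50000
samples <- lapply(names(sum_clicks), function(opt) {
  rbeta(n_samples, alpha_post[[opt]], beta_post[[opt]])
})
names(samples) <- names(sum_clicks)

sapply(samples, mean)  # close to alpha_post / (alpha_post + beta_post)
```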
Adds 3 new nested columns to the input_df: 'beta_params', 'gamma_params', and 'samples'
'beta_params' and 'gamma_params' in each row should be a tibble of length 2 (alpha and beta parameters and k and theta parameters)
'samples' in each row should be a tibble of length 'n_samples'
sample_cpa(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_cost (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors alpha0, beta0 for Beta and k0, theta0 for Gamma. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation. This is a combination of a Beta-Bernoulli update and a Gamma-Exponential update.
Conversion Rate is sampled from a Beta distribution with a Binomial likelihood of an individual converting.
Average CPC is sampled from a Gamma distribution with an Exponential likelihood of an individual cost.
input_df with 3 new nested columns 'beta_params', 'gamma_params', and 'samples'
Adds 2 new nested columns to the input_df: 'gamma_params' and 'samples'
'gamma_params' in each row should be a tibble of length 2 (k and theta parameters)
'samples' in each row should be a tibble of length 'n_samples'
sample_cpc(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_clicks (dbl), sum_cost (dbl). |
priors |
Optional list of priors k0, theta0 for Gamma. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation.
Average CPC is sampled from a Gamma distribution with an Exponential likelihood of an individual cost.
input_df with 2 new nested columns 'gamma_params' and 'samples'
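A Gamma-Exponential update for average CPC can be sketched in base R as follows. The sketch assumes the Gamma prior is placed on the exponential rate (the inverse of the average cost) with shape k0 and scale theta0, so average CPC draws are the reciprocals of the rate draws. That parameterization and the prior values are assumptions for illustration; see the update_rules vignette for the package's own definition.

```r
# Observed data for one option (illustrative).
sum_clicks <- 1000  # number of observed per-click costs
sum_cost <- 500     # total cost, so the observed average CPC is 0.50

# Illustrative prior on the exponential rate (not the package defaults).
k0 <- 1
theta0 <- 1

# Conjugate update in shape/scale form:
# shape grows by the number of observations,
# 1 / scale grows by the sum of the observations.
k_post <- k0 + sum_clicks
theta_post <- theta0 / (1 + theta0 * sum_cost)

# Sample the rate, then invert to get average cost per click.
set.seed(3)
rate_samples <- rgamma(50000, shape = k_post, scale = theta_post)
cpc_samples <- 1 / rate_samples

mean(cpc_samples)  # close to the observed average CPC
```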
This is an alias for sample_conv_rate with 2 different input columns. This function calculates posterior samples of CTR = clicks / impressions. Adds 2 new nested columns to the input_df: 'beta_params' and 'samples'.
'beta_params' in each row should be a tibble of length 2 (alpha and beta parameters)
'samples' in each row should be a tibble of length 'n_samples'
sample_ctr(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_clicks (dbl), and sum_impressions (dbl). |
priors |
Optional list of priors alpha0 and beta0. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation.
Click Through Rate is sampled from a Beta distribution with a Binomial likelihood of an individual clicking.
input_df with 2 new nested columns 'beta_params' and 'samples'
Selects which function to use to sample from the posterior distribution
sample_from_posterior( input_df, distribution, priors = list(), n_samples = 50000 )
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
A tibble with 2 columns: option_name (chr) and samples (dbl) [long form data].
input_df <- tibble::tibble(
  option_name = c("A", "B"),
  sum_clicks = c(1000, 1000),
  sum_conversions = c(100, 120),
  sum_sessions = c(1000, 1000),
  sum_revenue = c(1000, 1500)
)
sample_from_posterior(input_df, "conversion_rate")
sample_from_posterior(input_df, "rev_per_session")
Adds 4 new nested columns to the input_df: 'dirichlet_params', 'gamma_params_A', 'gamma_params_B', and 'samples'. This samples from multiple revenue per session distributions at once.
sample_multi_rev_per_session(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_sessions (dbl), sum_revenue (dbl), sum_conversion_2 (dbl), sum_sessions_2 (dbl), sum_revenue_2 (dbl). |
priors |
Optional list of priors alpha0 and beta0. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation.
Conversion Rate is sampled from a Dirichlet distribution with a Multinomial likelihood of an individual converting.
input_df with 4 new nested columns 'dirichlet_params', 'gamma_params_A', 'gamma_params_B', and 'samples'. 'samples' in each row should be a tibble of length 'n_samples'.
Adds 2 new nested columns to the input_df: 'gamma_params' and 'samples'
'gamma_params' in each row should be a tibble of length 2 (k and theta parameters)
'samples' in each row should be a tibble of length 'n_samples'
sample_page_views_per_session(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_sessions (dbl), and sum_page_views_per_session (dbl). |
priors |
Optional list of priors k0 and theta0. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation.
Page Views Per Visit is sampled from a Gamma distribution with a Poisson likelihood of an individual having n page views by the end of their session.
This is not always the case, so verify your data follows the shape of a Poisson distribution before using this.
input_df with 2 new nested columns 'gamma_params' and 'samples'
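The Gamma-Poisson update can be sketched in base R. The example assumes a Gamma prior on the Poisson rate (average page views per session) with shape k0 and scale theta0; the prior values and variable names are illustrative assumptions, and the update_rules vignette remains the reference for the package's definition.

```r
# Observed data for one option (illustrative).
sum_sessions <- 1000
sum_page_views <- 6000  # observed average is 6 page views per session

# Illustrative prior on the Poisson rate (not the package defaults).
k0 <- 1
theta0 <- 1

# Conjugate update in shape/scale form:
# shape grows by the total event count,
# 1 / scale grows by the number of sessions observed.
k_post <- k0 + sum_page_views
theta_post <- theta0 / (1 + theta0 * sum_sessions)

# Posterior draws of average page views per session.
set.seed(11)
samples <- rgamma(50000, shape = k_post, scale = theta_post)

mean(samples)  # close to k_post * theta_post
```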
This is an alias for sample_conv_rate with a different input column.
Adds 2 new nested columns to the input_df: 'beta_params' and 'samples'
'beta_params' in each row should be a tibble of length 2 (alpha and beta parameters)
'samples' in each row should be a tibble of length 'n_samples'
sample_response_rate(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), and sum_sessions (dbl). |
priors |
Optional list of priors alpha0 and beta0. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation.
Response Rate is sampled from a Beta distribution with a Binomial likelihood of an individual converting.
input_df with 2 new nested columns 'beta_params' and 'samples'
Adds 3 new nested columns to the input_df: 'beta_params', 'gamma_params', and 'samples'
'beta_params' and 'gamma_params' in each row should be a tibble of length 2 (alpha and beta parameters and k and theta parameters)
'samples' in each row should be a tibble of length 'n_samples'
sample_rev_per_session(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_revenue (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors alpha0, beta0 for Beta and k0, theta0 for Gamma. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation.
This is a combination of a Beta-Bernoulli update and a Gamma-Exponential update.
Conversion Rate is sampled from a Beta distribution with a Binomial likelihood of an individual converting.
Average Rev Per Order is sampled from a Gamma distribution with an Exponential likelihood of Revenue from an individual order. This function makes sense to use if there is a distribution of possible revenue values that can be produced from a single order or conversion.
input_df with 3 new nested columns 'beta_params', 'gamma_params', and 'samples'
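How the two updates combine can be sketched in base R: revenue per session is taken as conversion rate times average revenue per order, with each factor sampled from its own posterior. The sketch assumes independent posteriors, a Beta(1, 1) prior, and a Gamma prior (shape 1, scale 1) on the exponential rate of order revenue; these choices and all variable names are illustrative assumptions rather than the package's defaults.

```r
set.seed(5)
n <- 50000

# Observed data for one option (illustrative).
sum_sessions <- 1000
sum_conversions <- 100
sum_revenue <- 1000  # observed average revenue per order is 10

# Beta-Binomial part: posterior draws of the conversion rate.
conv_rate <- rbeta(n, 1 + sum_conversions, 1 + sum_sessions - sum_conversions)

# Gamma-Exponential part: posterior on the rate, inverted to a mean revenue.
k_post <- 1 + sum_conversions
theta_post <- 1 / (1 + 1 * sum_revenue)
avg_rev_per_order <- 1 / rgamma(n, shape = k_post, scale = theta_post)

# Combine draw by draw.
rev_per_session <- conv_rate * avg_rev_per_order

mean(rev_per_session)  # near sum_revenue / sum_sessions
```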
Adds 2 new nested columns to the input_df: 'gamma_params' and 'samples'
'gamma_params' in each row should be a tibble of length 2 (k and theta parameters)
'samples' in each row should be a tibble of length 'n_samples'
sample_session_duration(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_sessions (dbl), and sum_duration (dbl). |
priors |
Optional list of priors k0 and theta0. Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
See update_rules vignette for a mathematical representation.
Session Duration is sampled from a Gamma distribution with an Exponential likelihood of an individual leaving the site or ending a session at time t.
This is not always the case, so verify your data follows the shape of an exponential distribution before using this.
input_df with 2 new nested columns 'gamma_params' and 'samples'
Adds 5 new nested columns to the input_df: 'beta_params_ctr', 'beta_params_conv', 'gamma_params_rev', 'gamma_params_cost', and 'samples'.
sample_total_cm(input_df, priors, n_samples = 50000)
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_revenue (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors: alpha0 and beta0 for Beta; k0 and theta0 for Gamma Inverse Revenue; and k01 and theta01 for Gamma Cost (uses alternate priors so they can be different from Revenue). Default priors are used otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
'beta_params' and 'gamma_params' in each row should be a tibble of length 2 (alpha and beta params and k and theta params).
'samples' in each row should be a tibble of length 'n_samples'.
One assumption in this model is that sum_impressions is not stochastic. This assumes that clicks are stochastically generated from a set number of impressions. It does not require that the number of impressions is equal on either side. Generally this assumption holds true in marketing tests where traffic is split 50/50 and very little variance is observed in the number of impressions on either side.
See update_rules vignette for a mathematical representation.
input_df with 5 new nested columns 'beta_params_conv', 'beta_params_ctr', 'gamma_params_rev', 'gamma_params_cost', and 'samples'
Updates Beta Distribution with the Beta-Bernoulli conjugate prior update rule
update_beta(alpha, beta, priors = list())
alpha |
Double value for alpha (count of successes). Must be 0 or greater. |
beta |
Double value for beta (count of failures). Must be 0 or greater. |
priors |
An optional list object that contains alpha0 and beta0. Otherwise the function will use Beta(1,1) as the prior distribution. |
A tibble object that contains 'alpha' and 'beta'
update_beta(alpha = 1, beta = 5, priors = list(alpha0 = 2, beta0 = 2))
update_beta(alpha = 20000, beta = 50000)
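The Beta-Bernoulli conjugate rule itself is short enough to write out: the posterior parameters are the prior parameters plus the observed counts. `update_beta_sketch` below is a hypothetical base R stand-in that returns a named vector instead of a tibble; it illustrates the rule rather than reproducing the package function.

```r
# Hypothetical helper: Beta(alpha0, beta0) prior plus observed counts.
update_beta_sketch <- function(alpha, beta, alpha0 = 1, beta0 = 1) {
  stopifnot(alpha >= 0, beta >= 0, alpha0 > 0, beta0 > 0)
  c(alpha = alpha0 + alpha, beta = beta0 + beta)
}

# 1 success and 5 failures observed, with a Beta(2, 2) prior.
post <- update_beta_sketch(alpha = 1, beta = 5, alpha0 = 2, beta0 = 2)

# Posterior mean of the success probability.
post_mean <- post[["alpha"]] / (post[["alpha"]] + post[["beta"]])
```

With these inputs the posterior is Beta(3, 7), whose mean is 0.3.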
This function updates the Dirichlet distribution with the Dirichlet-Multinomial conjugate prior update rule.
update_dirichlet(alpha_0, alpha_1, alpha_2, priors = list())
alpha_0 |
Double value for alpha_0 (count of failures). Must be 0 or greater. |
alpha_1 |
Double value for alpha_1 (count of successes side 1). Must be 0 or greater. |
alpha_2 |
Double value for alpha_2 (count of successes side 2). Must be 0 or greater. |
priors |
An optional list object that contains alpha00, alpha01, and alpha02. Otherwise the function will use default priors. |
TODO: This function currently only works in 3 dimensions. Should be extended into N dimensions in the future. Can use ... notation.
tibble with columns alpha_0, alpha_1, and alpha_2
update_dirichlet(alpha_0 = 20, alpha_1 = 5, alpha_2 = 2)
sample_priors_list <- list(alpha00 = 2, alpha01 = 3, alpha02 = 5)
update_dirichlet(alpha_0 = 20, alpha_1 = 5, alpha_2 = 2, priors = sample_priors_list)
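The Dirichlet-Multinomial rule adds each observed count to the matching prior parameter, and written with vectors it works for any number of outcomes. `update_dirichlet_sketch` is a hypothetical base R helper with an assumed uniform default prior of 1 per outcome; it is a sketch of the rule, not the package function.

```r
# Hypothetical helper: element-wise prior + counts, for any number of outcomes.
update_dirichlet_sketch <- function(counts,
                                    priors = c(alpha_0 = 1, alpha_1 = 1, alpha_2 = 1)) {
  stopifnot(all(counts >= 0), all(priors > 0))
  stopifnot(identical(names(counts), names(priors)))
  counts + priors
}

# 20 failures, 5 successes of type 1, 2 successes of type 2.
post <- update_dirichlet_sketch(
  counts = c(alpha_0 = 20, alpha_1 = 5, alpha_2 = 2),
  priors = c(alpha_0 = 2, alpha_1 = 3, alpha_2 = 5)
)

# Posterior mean probability of each outcome.
post_mean <- post / sum(post)
```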
Updates Gamma Distribution with the Gamma-Exponential conjugate prior update rule. Parameterized by k and theta (not alpha and beta)
update_gamma(k, theta, priors = list(), alternate_priors = FALSE)
k |
Double value for k |
theta |
Double value for theta |
priors |
An optional list object that contains k0 and theta0. Otherwise the function will use default priors. |
alternate_priors |
Boolean. Defaults to FALSE. Allows a user to specify alternate prior names so the same prior isn't required when multiple gamma distributions are used. |
A list object that contains 'k' and 'theta'
update_gamma(k = 1, theta = 100, priors = list(k0 = 2, theta0 = 1000))
update_gamma(k = 10, theta = 200)
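The textbook Gamma-Exponential rule in shape/scale form can be written as a small base R function. The sketch assumes `k` is the number of observations, `theta` is the sum of the observed values, and the prior scale theta0 is the inverse of a prior rate; `update_gamma_sketch` is hypothetical and may differ from the package's implementation in its details.

```r
# Hypothetical helper: shape/scale Gamma posterior for an exponential rate.
update_gamma_sketch <- function(k, theta, k0, theta0) {
  stopifnot(k >= 0, theta >= 0, k0 > 0, theta0 > 0)
  list(
    k = k0 + k,                             # shape: prior shape + observations
    theta = theta0 / (1 + theta0 * theta)   # scale: 1 / (1 / theta0 + sum of values)
  )
}

# 1 observation with value 100, prior shape 2 and prior scale 1000.
post <- update_gamma_sketch(k = 1, theta = 100, k0 = 2, theta0 = 1000)

# Equivalent rate form: prior rate plus the sum of the observed values.
post_rate <- 1 / post$theta
```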
Validates data values are all greater than 0.
validate_data_values(data_values)
data_values |
List of named data values |
None
Validates the input column exists in the dataframe, is of the correct type, and that all values are greater than or equal to 0.
validate_input_column(column_name, input_df, greater_than_zero = TRUE)
column_name |
String value of the column name |
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
greater_than_zero |
Boolean: Do all values in the column have to be greater than zero? |
None
Validates the input dataframe has the correct type, correct required column names, that the distribution is valid, that the column types are correct, and that the column values are greater than or equal to 0 when they are numeric.
validate_input_df(input_df, distribution)
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
Bool TRUE if all checks pass.
input_df <- tibble::tibble(
  option_name = c("A", "B"),
  sum_clicks = c(1000, 1000),
  sum_conversions = c(100, 120)
)
validate_input_df(input_df, "conversion_rate")
Function fails if posterior is not shaped correctly.
validate_posterior_samples(posterior_samples)
posterior_samples |
Tibble of data in long form with 2 columns 'option_name' and 'samples' |
None
Validates list of priors against a vector of valid priors and if the values are not valid, default priors are returned.
validate_priors(priors, valid_priors, default_priors)
priors |
List of named priors with double values. |
valid_priors |
A character vector of valid prior names. |
default_priors |
A list of default priors for the distribution. |
A named list of valid priors for the distribution.
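The fallback behavior described here can be sketched in base R. `validate_priors_sketch` is a hypothetical helper that assumes a prior list is valid only when every expected name is present with a single positive numeric value; the exact checks in the package may differ.

```r
# Hypothetical helper: return the supplied priors if valid, else the defaults.
validate_priors_sketch <- function(priors, valid_priors, default_priors) {
  # Keep only entries whose names are expected for this distribution.
  supplied <- priors[names(priors) %in% valid_priors]
  is_positive_scalar <- function(x) is.numeric(x) && length(x) == 1 && x > 0
  ok <- length(supplied) == length(valid_priors) &&
    all(vapply(supplied, is_positive_scalar, logical(1)))
  if (ok) supplied[valid_priors] else default_priors
}

defaults <- list(alpha0 = 1, beta0 = 1)
valid <- c("alpha0", "beta0")

p_empty <- validate_priors_sketch(list(), valid, defaults)                     # defaults
p_good <- validate_priors_sketch(list(alpha0 = 2, beta0 = 3), valid, defaults) # kept
p_bad <- validate_priors_sketch(list(alpha0 = -1, beta0 = 3), valid, defaults) # defaults
```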
Verifies that the option provided is in the posterior_samples dataframe 'option_name' column. Raises an error if it is not.
validate_wrt_option(wrt_option, posterior_samples)
wrt_option |
string name of the option |
posterior_samples |
Tibble returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
None