Statistical tests in BioRender Graphing: Methods, assumption checks, and R packages

18 May 2026 16:24

Below, you'll find the specific packages, functions, and methods used for each test. For transparency, all computations are powered by R (version 4.5.1)

List of R packages
Statistical assumption checks
One-sample t-test (and nonparametric tests)
Two-Sample t-test (and nonparametric tests)
One-way ANOVA (and nonparametric tests) and multiple comparisons tests
Two-way ANOVA and multiple comparisons tests
Correlation
Logistic regression
Dose-response curves
Survival analysis

All R packages

R version 4.5.1
afex version 1.3-1
dplyr version 1.0.10
effsize version 0.8.1
emmeans version 1.10.1
httr2 version 1.0.0
jsonlite version 1.8.4
lme4 version 1.1-37
logger version 0.2.2
multcomp version 1.4-20
nlraa version 1.9.3
OptimModel version 2.0-3
PMCMRplus version 1.9.6
readr version 2.1.5
rjson version 0.2.21
rstatix version 0.7.1
survival version 3.8-3
tictoc version 1.1
tidyr version 1.3.0
uuid version 1.2-0

Statistical assumption checks

Our platform automatically runs tests to verify assumptions essential for each analysis type. We suggest preselected analysis options based on these results. Hovering over 'suggested' pills in the app reveals details about these recommendations.

Shapiro-Wilk test of normality

Parametric t-test and ANOVA have the underlying assumption that your dataset is normally distributed (follows a standard bell-curve).

We run the Shapiro-Wilk test of normality in the background to check if your dataset violates this assumption.

If the Shapiro-Wilk test (p > 0.05) cannot reject the hypothesis of normality, we recommend parametric tests (eg. t-test).

If the Shapiro-Wilk test (p ≤ 0.05) indicates that the data is not normally distributed, we log-transform the data internally and test again. If the subsequent test suggests the data is normally distributed in log-space, we recommend parametric tests on the log-transformed data (eg. lognormal t-test).

If both Shapiro-Wilk tests reject the hypothesis of normality, we suggest nonparametric alternatives (eg. Mann-Whitney U test).

This check uses the shapiro_test function from rstatix.

Levene’s test of equal variances

Parametric t-test and ANOVA have the underlying assumption that the variance in observations in different groups in your data is approximately the same.

We run Levene’s test of equal variances to check if your dataset violates this assumption.

If variances differ, we recommend Welch’s correction to accommodate unequal variances in t-tests or ANOVA.

This check uses the levene_test function from rstatix.

We use the default setting, which applies the median as the center of each group for a more robust test.

One-sample t-test (and nonparametric tests)

A one-sample t-test determines whether the mean of a single sample differs significantly from a known or hypothesized value. It assumes that the data are drawn from a normally distributed population.

One-sample t-test

We use a custom base R implementation that computes the t-statistic directly as (sample_mean − μ) / (SD / √n) using pt and qt from the stats package for p-values and critical values respectively. The degrees of freedom are n − 1. Confidence intervals are computed on the difference (sample_mean − μ): for a two-sided test, diff ± qt(1 − α/2, df) × SE; for one-sided tests, the CI is [diff − qt(conf_level, df) × SE, Inf] or [−Inf, diff + qt(conf_level, df) × SE]. Cohen's d effect size is reported as (sample_mean − μ) / SD.

One-sample ratio t-test (lognormal)

When data are expected to follow a lognormal distribution, the ratio t-test tests whether the geometric mean differs from a hypothesized positive value. The test log-transforms both the data and the hypothesized value, performs the identical custom t-test on the log scale, and back-transforms the results. The geometric mean, geometric SD, and ratio to the theoretical value are returned on the original scale, along with confidence intervals for both the ratio and the geometric mean.

One-sample Wilcoxon signed-rank test

The nonparametric alternative for one-sample comparisons. We use wilcox.test from the stats package with exact = NULL, letting base R select the exact or asymptotic method. The p_value_exact flag is set to TRUE when the number of non-zero differences n_nz < 50 (strictly) and there are no tied differences, reflecting the expected behavior under R's internal logic. A normal approximation with continuity correction is applied otherwise (correct = TRUE).

The Hodges–Lehmann pseudo-median is reported as the location estimate: it is the median of all pairwise Walsh averages (x_i + x_j) / 2 for i ≤ j (including self-pairs i = j) computed on the original values vector — this is the H-L estimator of the population median. The V statistic (sum of positive signed ranks) and separate sums of positive and negative ranks are also returned.

Two-sample t-test (and nonparametric tests)

Two-sample tests compare a continuous outcome between two groups. BioRender offers parametric t-tests (assuming normality) and nonparametric alternatives that make no distributional assumptions, along with lognormal variants for data where ratios, rather than differences, are the relevant effect.

All parametric two-sample tests use t_test from the rstatix package, and all nonparametric tests use wilcox_test from the same package. Effect sizes for parametric tests are computed with cohen.d from the effsize package. Note that rstatix's t_test computes the difference as level1 − level2 (alphabetical order); BioRender negates the t-statistic and CIs to report level2 − level1 (matching GraphPad Prism's convention), and one-sided p-values are recomputed from the negated statistic using pt to ensure the correct tail is used.

Unpaired two-sample t-test (Student’s)

t_test(..., var.equal = TRUE). Assumes equal variances; uses a pooled standard deviation across both groups. The pooled SEM is pooled_SD × √(1/n₁ + 1/n₂). Cohen's d is computed with cohen.d(y, x, paired = FALSE).

Unpaired two-sample t-test with Welch’s correction (aka Welch’s t-test)

t_test(..., var.equal = FALSE). Does not assume equal variances; uses Welch's approximation to degrees of freedom (Satterthwaite). The SEM is √(s₁²/n₁ + s₂²/n₂). Cohen's d is computed with cohen.d(y, x, paired = paired) from effsize, where paired is the function argument (default FALSE).

Mann-Whitney U test (Wilcoxon rank-sum test)

wilcox_test(..., paired = FALSE). Nonparametric comparison of two independent samples. U₁ and U₂ statistics are computed from the rank sums: U₁ = R₁ − n₁(n₁+1)/2 and U₂ = R₂ − n₂(n₂+1)/2, where R₁ and R₂ are the sums of ranks for each group. The reported U statistic is min(U₁, U₂).

Paired t-test

t_test(..., paired = TRUE). Subjects are matched across the two conditions; the test is performed on within-pair differences. Cohen's d for paired designs uses the within-subjects formula: |mean(x − y)| / √(SD₁² + SD₂² − 2r·SD₁·SD₂), where r is the Pearson correlation between the two conditions. The Pearson correlation coefficient and its p-value are also returned.

Wilcoxon matched-pairs signed ranks test

wilcox_test(..., paired = TRUE). Nonparametric equivalent of the paired t-test. Sum of positive signed ranks, sum of negative signed ranks, and the overall sum of signed ranks W are returned. The p_value_exact flag is set to TRUE when the number of pairs is less than 50 and there are no tied differences, reflecting the expected behavior under rstatix's internal logic. The Spearman rank correlation coefficient and its p-value between the two conditions are also returned.

Lognormal (ratio) variants

When data follow a lognormal distribution, interest lies in the ratio of geometric means rather than the difference of arithmetic means. For unpaired and Welch tests, all values are natural-log-transformed and the corresponding t_test is run on the log scale. Geometric means are back-transformed as exp(mean(log(values))). Following the level2 − level1 convention, geometric_mean1 corresponds to the second alphabetical group and geometric_mean2 to the first; the ratio reported is geometric_mean1 / geometric_mean2. CIs on the ratio are obtained by inverting and swapping the log-scale CI bounds: conf_low_ratio = exp(−conf.high_log), conf_high_ratio = exp(−conf.low_log). For the paired lognormal test, Cohen's d is also computed on the log scale; the mean, SD, and SEM of the log-differences, and the Pearson correlation on the log-scale data, are additionally returned.

One-way ANOVA (and nonparametric tests) and multiple comparisons tests

One-way ANOVA tests whether the means of three or more groups differ significantly. BioRender supports independent measures, repeated measures, and mixed-effects variants, as well as nonparametric alternatives and lognormal ANOVA.

Classic one-way ANOVA

Implemented via aov_car from the afex package. Contrasts are set to sum-to-zero (contr.sum) for the duration of the analysis and restored on exit. Sum of squares type: Type II SS is used only for one-way independent balanced designs (all groups have equal n and no repeated measures factor). Type III SS is used for all other cases: unbalanced designs, repeated measures, and all multi-factor designs regardless of balance. Effect size is reported as generalized eta-squared (ges) from the afex ANOVA table.

One-way ANOVA with Welch’s correction (aka Welch’s one-way ANOVA)

welch_anova_test from rstatix. An F-test that does not assume equal variances across groups, using a Welch–Satterthwaite adjustment to the denominator degrees of freedom. Reports the W statistic and numerator/denominator DFs.

Lognormal ANOVA

Values are log₁₀-transformed and classic_anova is run on the transformed data. F statistics, p-values, and degrees of freedom are invariant to the base of the logarithm; SS and MS are reported on the log₁₀ scale to match GraphPad Prism.

Kruskal-Wallis test

kruskal_test from rstatix. Nonparametric analogue of one-way ANOVA. Reports the H statistic and p-value.

Post-hoc multiple comparisons tests

Post-hoc tests are run after a significant ANOVA to identify which group pairs differ.

Tukey's multiple comparisons test uses estimated marginal means from emmeans and Tukey-adjusted pairwise contrasts via pairs(..., adjust = "tukey"). Confidence intervals and t-statistics for each pairwise comparison are returned.
Bonferroni multiple comparisons test uses the same emmeans framework with pairs(..., adjust = "bonferroni"). The critical t-value for the CI is qt(0.05 / (2 × k), df), where k is the number of comparisons.
Dunnett's test (one control vs. all others) uses glht from multcomp with mcp(categories = "Dunnet"). The control group is releveled to the reference position before fitting; confidence intervals are extracted via confint(glht_result).
Games–Howell test (post-hoc for unequal variances) uses games_howell_test from rstatix. The rstatix function applies a √0.5 factor internally to the pooled SE; BioRender removes this by dividing by √0.5 (equivalent to multiplying by √2) to match the conventional formulation.
Dunnett's T3 test (for unequal variances) uses dunnettT3Test from PMCMRplus. Pairwise degrees of freedom are computed with the Welch–Satterthwaite formula applied to each individual pair. Critical values are derived from the multivariate t-distribution via qmvt.
Dunn's test (nonparametric, follows Kruskal–Wallis) uses dunn_test from rstatix with Bonferroni p-value adjustment for all-pairwise comparisons. When a control group is specified, only comparisons involving the control are returned, and the Bonferroni correction uses the Dunnett family size (k − 1 comparisons) rather than k(k−1)/2.

Two-way ANOVA and multiple comparisons tests

Two-way ANOVA tests the effects of two categorical factors and their interaction on a continuous outcome. BioRender uses the same classic_anova function as one-way ANOVA, extended to include a second factor column in the data frame.

Sum of squares: Type III is always used for two-way ANOVA (whether balanced or not), which assesses each effect after accounting for all other effects. Contrasts are set to contr.sum (sum-to-zero) as required for Type III SS.
Effect size is reported as generalized eta-squared (ges) for each main effect and the interaction term.
Repeated measures: Two-way repeated measures ANOVA is supported via the same aov_car / afex infrastructure. Mauchly's sphericity test is computed for within-subjects factors. Greenhouse–Geisser correction can be applied to effects that violate sphericity.
Mixed-effects fallback: When repeated measures data have missing observations, a mixed-effects model (via lme4::lmer) is used, either always or only when missingness is detected, depending on the missing_value_handling option.

Post-hoc tests for two-way ANOVA use the same functions as one-way ANOVA (Tukey, Bonferroni, Dunnett, Dunnett T3, Games–Howell) but operate on the interaction cell means via emmeans. Factor-level prefixes that emmeans appends to numeric-valued factor levels are stripped from contrast labels before returning results, and within-subject factor column names prepended by afex (e.g. Xfactor1) are corrected to their original names.

Correlation

Correlation analysis quantifies the linear (Pearson) or monotonic (Spearman) association between two continuous variables and tests whether that association differs from zero.

We use cor.test from the stats package. Missing values are handled by pairwise deletion. A minimum of four complete observations is required.

Pearson correlation

cor.test(x, y, method = "pearson") returns the Pearson correlation coefficient r, a t-statistic on n − 2 degrees of freedom, a two-sided p-value, and exact 95% confidence intervals (via Fisher's z-transformation, which R applies internally).

Spearman correlation

cor.test(x, y, method = "spearman") returns the Spearman rank correlation coefficient ρ (rho). The exact permutation distribution is used when there are no tied values and n ≤ 50 (exact = TRUE); otherwise the asymptotic normal approximation is used. Because R's cor.test does not return confidence intervals for Spearman correlation, 95% CIs are approximated via Fisher's z-transformation: z = 0.5 × ln((1 + ρ) / (1 − ρ)), SE = 1 / √(n − 3), and the CI bounds are back-transformed to the correlation scale.

Visualization

Regardless of whether Pearson or Spearman correlation is selected, the prediction line and 95% confidence band displayed on the scatter plot are derived from a simple linear regression (lm) fit to the data, over 100 evenly spaced points across the observed x range, via predict(model, interval = "confidence", level = 0.95).

Simple linear regression

Simple linear regression models the relationship between a single continuous predictor and a continuous outcome as a straight line.

We use lm from the stats package: lm(dependent ~ independent). An option to constrain the y-intercept to zero is available via lm(dependent ~ 0 + independent).

The slope and intercept coefficients, their standard errors, t-statistics, and p-values are taken directly from summary(model)$coefficients. The 95% CI on each coefficient is estimate ± qt(0.975, df.residual(model)) × SE. The x-intercept is computed as −β₀ / β₁.

The standard error of the estimate (Sy.x) is √(SS_residuals / (n − 2)) for the unconstrained model, or √(SS_residuals / (n − 1)) when the intercept is forced through zero, matching GraphPad Prism's convention.

Overall model significance is assessed with an F-test: pf(F, df1, df2, lower.tail = FALSE) using the F-statistic from summary(model)$fstatistic.

A fitted curve with 95% confidence band is returned over 250 evenly spaced points spanning the observed predictor range, via predict(model, interval = "confidence").

Logistic regression

Logistic regression models the log-odds of a binary outcome as a linear function of one or more predictor variables. BioRender offers three variants: simple (one predictor), multiple (several predictors), and Firth penalized logistic regression for small or imbalanced samples.

All logistic regression models use glm from the stats package with family = binomial(link = "logit") and maxit = 100 (the default of 25 can silently return a non-converged iterate on near-separation data). Before fitting, the service checks for quasi-complete separation (fitted probabilities numerically 0 or 1) and evaluates whether Firth penalization is recommended based on separation detection, low events-per-variable (EPV), or rare event rates.

Simple logistic regression

Fits a single continuous or binary predictor: glm(dependent ~ independent, family = binomial(link = "logit")).

Confidence intervals for coefficients are computed on the log-odds scale using profile likelihood via confint, with automatic fallback to Wald CIs if profiling fails. Odds ratios and their 95% CIs are obtained by exponentiating the coefficient estimates and confidence bounds.

The X value at 50% predicted probability (where the log-odds equal zero, i.e. X = −β₀/β₁) is reported with a 95% CI derived by the delta method.

Model fit is evaluated via a likelihood ratio test against the null (intercept-only) model. Goodness-of-fit statistics reported include Tjur's R², Cox–Snell R², the AUC of the ROC curve (computed via a rank-based, vectorized algorithm), and AICc. Predicted probabilities and a confidence band across the observed predictor range (250 grid points) are returned for plotting; the band is computed with Wald CIs on the logit scale and back-transformed via plogis.

Firth penalized logistic regression

When perfect or quasi-complete separation is detected, or when EPV is low (fewer than ~10 events per predictor), standard maximum likelihood estimates are infinite or severely biased. In these cases, BioRender fits Firth's penalized likelihood model (Firth 1993; Heinze & Schemper 2002), which adds a Jeffreys prior penalty to the log-likelihood to produce finite, less-biased estimates.

The Firth model is fit using a custom IRLS implementation. Convergence is assessed jointly on three criteria: max|Δβ| < 1e-8 (L∞ parameter change), max|U*(β)| < 1e-5 (maximum absolute value of the modified score vector, i.e. L∞ score norm), and |ΔL*| < 1e-5 (penalized log-likelihood change).

Confidence intervals are computed via profile penalized likelihood using a custom firth_profile_ci() function. For each parameter, the remaining parameters are re-optimized at each candidate β value via constrained IRLS. A doubling-step bracket expansion (starting at max(SE×3, 1.5), doubling up to a hard cap of 50 on the log-odds scale) locates a sign-change interval, then uniroot() (Brent's method) finds the root. If no bracket is found, the bound is reported as NA and ci_fallback_used = TRUE.

Goodness-of-fit statistics include Tjur's R², Cox–Snell R², McFadden's R², penalized deviance (-2 × L*), null penalized deviance, and AICc (computed from the unpenalized log-likelihood).

Dose-response curves

Dose-response curves model the sigmoidal relationship between a stimulus (concentration, dose) and a response (inhibition, stimulation). BioRender fits the four-parameter logistic (4PL) model by default; the three-parameter and five-parameter variants are special cases.

The 4-parameter model (4PL)

The four-parameter logistic equation is:

Inhibition (log x): y = bottom + (top − bottom) / (1 + 10^((log_IC50 − x) × hill_slope))
Inhibition (linear x): y = bottom + (top − bottom) / (1 + (IC50 / x)^hill_slope)
Stimulation (linear x): y = bottom + (top − bottom) × x^hill_slope / (x^hill_slope + EC50^hill_slope)

The four free parameters are: bottom (minimum response), top (maximum response), hill_slope (Hill coefficient), and EC50/IC50 (concentration at the midpoint between bottom and top).

The 3-parameter model (3PL)

Equivalent to the 4PL with the Hill slope constrained: hill_slope = −1 for inhibition, hill_slope = +1 for stimulation. All other parameters remain free.

The 5-parameter logistic (5PL)

Extends the 4PL by adding an asymmetry factor S:

Log x: y = bottom + (top − bottom) / (1 + 10^((log_EC50 − x) × hill_slope))^S
Linear x (inhibition): y = bottom + (top − bottom) / (1 + (EC50/x)^hill_slope)^S

When S = 1 the 5PL reduces to the 4PL. S is initialized to 1 and estimated freely. Note: the parameter log_EC50 in the 5PL formula is the inflection point of the curve. When S ≠ 1, the true EC50 differs and is computed by applying the correction: log_EC50_actual = log_inflection − log10(2^(1/S) − 1) / hill_slope.

Fitting procedure

All models are fit using nls from the stats package. Starting values are derived from the data: bottom and top are initialized to the observed minimum and maximum group means; EC50/IC50 is initialized by linear interpolation to the y-midpoint; hill_slope is initialized to ±1 based on the direction of the data.

The PORT (nl2sol) algorithm is tried first (algorithm = "port"), as it supports bound constraints and avoids the "minFactor" failure that the default Gauss-Newton algorithm encounters on the EC50 profile axis. If PORT fails and no user-supplied bounds are active, a fallback nls() call without an explicit algorithm (defaulting to Gauss-Newton) is attempted. Any parameter constraints specified by the user are passed as lower / upper vectors to nls; the Hill slope is additionally capped at ±50 to prevent divergence, with its sign fixed to match the data direction.

Confidence intervals for fit parameters are computed via profile likelihood using confint(fit). If profiling fails (e.g. singular gradient), a Wald approximation is used instead: estimate ± qt(0.975, df_residual) × SE. EC50/IC50 and its CI are reported on both the log₁₀ scale and the original concentration scale.

A custom concentration (EC_x or IC_x at a user-specified response percentage) can also be computed. For inhibition, EC_x = EC50 × ((1/fraction − 1))^(1/hill_slope); for stimulation, EC_x = EC50 × (fraction / (1 − fraction))^(1/hill_slope). The 95% CI on the custom concentration is approximated by propagating the relative error from the EC50 CI.

Goodness-of-fit statistics reported are R² (1 − SS_res / SS_tot), residual sum of squares, Sy.x (residual standard error), and degrees of freedom. Confidence bands across both linear and log-spaced concentration grids (250 points each) are computed via the numerical delta method: the model formula is differentiated numerically with respect to each parameter to build the Jacobian J, then SE(ŷ) = √(J' × vcov(fit) × J) and the band is ŷ ± t_{0.975, df} × SE(ŷ). If the delta method fails, the band falls back to point estimates only.

Survival analysis

Survival analysis models the time until an event of interest (e.g., death, relapse) while accounting for censored observations—subjects who have not yet experienced the event by the end of follow-up.

All survival analyses use functions from the survival package.

Kaplan–Meier curves

Kaplan–Meier survival curves are estimated with survfit using conf.type = "log-log" (complementary log-log transformation of the survival function), which is the method used by GraphPad Prism and is better suited to survival probabilities near 0 or 1 than the default plain-log method. Groups are factored in order of first appearance in the data to ensure curve ordering matches the input.

Median survival time and its 95% CI are extracted with quantile(km_fit, probs = 0.5). The curve is extended to the maximum observed time in each group (even if that time carries only censored observations) so that the plotted step function does not truncate early.

Log-rank test

The log-rank test compares survival distributions across groups under the null hypothesis of proportional hazards. We use survdiff with rho = 0. The chi-square statistic is evaluated on (number of groups − 1) degrees of freedom.

Gehan–Breslow–Wilcoxon test

An alternative to the log-rank test that gives greater weight to early time points. This test is implemented as a custom weighted chi-square statistic (not via survdiff). At each event time t, weight w_t = N_t (number at risk) is applied. The score vector is Z_j = Σ_t N_t (d_jt − e_jt) and the variance-covariance matrix uses hypergeometric variance with the N_t² terms applied. The test statistic χ² = Z' V⁻¹ Z is evaluated on K−1 degrees of freedom (Cholesky decomposition; QR pseudoinverse as fallback).

Hazard ratios

Two hazard ratio estimation methods are provided for pairwise group comparisons.

Logrank method: Uses the observed-to-expected ratio HR = (O₁/E₁) / (O₂/E₂), where O and E are obtained from the log-rank survdiff output. The standard error of ln(HR) is √(1/E₁ + 1/E₂), giving 95% CIs of exp(ln(HR) ± 1.96 × SE). This matches GraphPad Prism's "logrank" hazard ratio.

Mantel–Haenszel method: Uses the formula L = (O₁ − E₁) / V, HR = exp(L), with 95% CIs of exp(L ± 1.96 / √V), where V is the variance of (O₁ − E₁) under the null hypothesis, taken from survdiff$var[1,1] of a pairwise log-rank test. This matches GraphPad Prism's "Mantel–Haenszel" hazard ratio (Machin, Cheung & Parmar; Vaeth).

Cox proportional hazards regression

Cox PH regression is fit with coxph. In auto-detect mode, the ties handling method is selected automatically: "exact" when there are no tied event times, and "efron" when ties are present; the ties method can also be specified explicitly. Confidence intervals for hazard ratios are computed via a custom profile likelihood function (compute_profile_ci_cox()), which traces the profile likelihood surface to find the CI boundary rather than relying on the Wald approximation.

The proportional hazards assumption is tested using cox.zph, which fits scaled Schoenfeld residuals against time and reports a global and per-covariate p-value; a small p-value suggests the hazard ratio changes over time, violating the PH assumption.

Multiple comparison corrections

When three or more groups are compared and a correction method is selected, p-values from pairwise tests are corrected for multiple comparisons. Available methods are Bonferroni and Benjamini–Hochberg (false discovery rate), both applied via p.adjust, and Holm–Šídák, which is implemented with a custom stepwise procedure.

Need help? Reach out to our support team at support@biorender.com or start a live chat by clicking the "Chat With Us" bubble in the bottom right corner.

7 5

Was this article helpful?

6 out of 7 found this helpful

Articles in this section

Hey, how can we help?