- February 14, 2024
- By Paula Villasante
- Data Processing, Reproducibility

Have you ever wondered how to test whether your functions are working correctly? Unit testing can be very helpful in such situations. By validating individual components, or units, of your code one at a time, it helps ensure the reliability and accuracy of the code as a whole.

## Unit Testing

Unit testing is a software testing method where individual components of a program are tested in isolation to ensure they work correctly. It means writing and running tests to exercise a single module or an even smaller unit, such as a function. The main goal of unit testing is to verify that each code unit functions correctly and produces the expected results. Key features of unit tests include:

- **Isolation:** Unit tests are designed to test a single code unit in isolation, without relying on other parts of the system. This makes it easier to identify and correct errors specific to a portion of the code.
- **Automation:** Unit tests should be automatic and easily executable without human intervention. This allows them to be run frequently, ensuring code integrity as changes or updates are made.
- **Repeatability:** Unit tests should be consistent and provide predictable results. If run under the same conditions, they should produce the same results, facilitating the quick and efficient detection of issues.
- **Speed:** Unit tests should execute quickly so that developers can rapidly receive feedback on code quality. This is particularly important in agile development practices.
- **Independence:** Unit tests should be independent of each other. The result of one unit test should not depend on the result of another. This simplifies the identification and resolution of issues without additional complications.

In R, the `testthat` package provides a robust framework for writing and running unit tests like these.

## Unit Testing in R using the testthat package

To illustrate how `testthat` works, consider a simple function, `square`, that calculates the square of a given number.

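```r
# pacman::p_load() installs any missing packages and then loads them;
# library(testthat) and library(tidyverse) also work if both are installed
pacman::p_load(testthat, tidyverse)

# Function to be tested
square <- function(x) {
  return(x^2)
}

# Unit test for the square function
test_that("square function works correctly", {
  # Test case 1
  expect_equal(square(2), 4)

  # Test case 2
  expect_equal(square(5), 25)
})
```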

`Test passed 😀`

The output of this test indicates whether it has passed or not. In this case, the test has passed successfully. Great!
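The two test cases above only cover small positive integers. A slightly broader sketch follows; the extra inputs (a negative number, a vector, a decimal, `NA` and a string) are illustrative choices, not part of the original example, and `square` is repeated so the block runs on its own:

```r
library(testthat)

# Same function as above, repeated so this block is self-contained
square <- function(x) {
  x^2
}

test_that("square handles negatives, vectors, decimals and bad input", {
  # A negative input should give a positive square
  expect_equal(square(-3), 9)

  # R arithmetic is vectorised, so square() should work element-wise
  expect_equal(square(c(1, 2, 3)), c(1, 4, 9))

  # 0.1^2 is not exactly 0.01 in floating point; expect_equal() compares
  # numbers with a small tolerance, so this check still passes
  expect_equal(square(0.1), 0.01)

  # Missing values should propagate rather than silently become numbers
  expect_true(is.na(square(NA)))

  # Non-numeric input should raise an error instead of returning a value
  expect_error(square("a"))
})
```

Using `expect_equal()` rather than `expect_identical()` for the decimal case matters, because only `expect_equal()` tolerates tiny floating-point differences.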

Now, let’s explore how this unit testing approach can be applied to other functions, ranging from basic calculations like recoding a variable to more complex scenarios such as regression analyses.

## Unit Testing applied to age recoding

We can explore the practical applications of unit testing by considering the recoding of a variable, such as age. In the following example, we have a `recodify_age` function designed to categorize age values into specific age groups.

By creating specific test cases with predefined age values, we can ensure that the recoding process works as expected and assigns the correct age categories. This verifies the functionality of the code and provides a safety net against future errors.

```r
# Recoding function for age variable
recodify_age <- function(df) {
  df$Age_Category <- case_when(
    df$Age < 18 ~ "Under 18",
    between(df$Age, 18, 30) ~ "18-30",
    between(df$Age, 31, 45) ~ "31-45",
    between(df$Age, 46, 60) ~ "46-60",
    df$Age > 60 ~ "Over 60",
    TRUE ~ "Unknown"
  )
  return(df)
}

# Unit tests for age recoding
test_that("recodify_age function works correctly", {
  # Create a test data frame with age values
  df_test <- data.frame(Age = c(10, 25, 40, 55, 70))

  # Call the recoding function
  df_result <- recodify_age(df_test)

  # Verify expected results
  expect_equal(df_result$Age_Category[1], "Under 18")
  expect_equal(df_result$Age_Category[2], "18-30")
  expect_equal(df_result$Age_Category[3], "31-45")
  expect_equal(df_result$Age_Category[4], "46-60")
  expect_equal(df_result$Age_Category[5], "Over 60")
})
```

`Test passed 🥳`

The unit test shows that the recoding works as expected. Great!
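Tests with one value per group can miss mistakes at the edges of each range. The sketch below adds boundary ages, plus two inputs the original test never exercises: a missing age and a non-integer age. The specific values are assumptions chosen for illustration, and `recodify_age` is repeated so the block runs on its own:

```r
library(testthat)
library(dplyr)

# Same recoding function as above, repeated so this block is self-contained
recodify_age <- function(df) {
  df$Age_Category <- case_when(
    df$Age < 18 ~ "Under 18",
    between(df$Age, 18, 30) ~ "18-30",
    between(df$Age, 31, 45) ~ "31-45",
    between(df$Age, 46, 60) ~ "46-60",
    df$Age > 60 ~ "Over 60",
    TRUE ~ "Unknown"
  )
  df
}

test_that("recodify_age assigns boundary ages to the intended groups", {
  ages <- c(17, 18, 30, 31, 45, 46, 60, 61)
  expected <- c("Under 18", "18-30", "18-30", "31-45",
                "31-45", "46-60", "46-60", "Over 60")
  expect_equal(recodify_age(data.frame(Age = ages))$Age_Category, expected)
})

test_that("recodify_age labels missing and in-between ages as Unknown", {
  # case_when() treats NA conditions as FALSE, so NA reaches TRUE ~ "Unknown"
  expect_equal(recodify_age(data.frame(Age = NA_real_))$Age_Category, "Unknown")

  # between() includes both endpoints, so 30.5 matches none of the ranges
  # and also falls through to "Unknown"
  expect_equal(recodify_age(data.frame(Age = 30.5))$Age_Category, "Unknown")
})
```

The second test records the current behaviour rather than endorsing it. If an age like 30.5 should count as `"18-30"`, updating that expectation first and then the function (for example, with ordered conditions such as `df$Age <= 30`) makes the change visible in the test suite.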

## Unit testing regression models

Unit testing isn’t limited to simple functions; it can also be applied to more complex functions and analytic pipelines like regression models. Let’s explore how we can unit test regression models in the following section.

### Unit testing a data pipeline

To ensure the reliability of our linear regression model, we’ll define two functions. First, the `load_data` function generates a synthetic dataset with a linear relationship between `x` and `y`. Second, the `perform_regression` function conducts a linear regression analysis on a given dataset.

We’ll use the `test_that` function from `testthat` to create unit tests for these functions. The test for `load_data` ensures the generated dataset has the correct dimensions and column names, while the test for `perform_regression` checks whether the regression output includes expected components such as coefficients and residuals.

```r
# Function to generate a sample dataset
load_data <- function() {
  set.seed(123)
  data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
  return(data)
}

# Function to perform linear regression analysis
perform_regression <- function(data) {
  model <- lm(y ~ x, data = data)
  return(summary(model))
}

# Test for load_data()
test_that("load_data generates a sample dataset", {
  data <- load_data()
  expect_equal(nrow(data), 10)
  expect_equal(ncol(data), 2)
  expect_true(all(c("x", "y") %in% colnames(data)))
})

# Test for perform_regression()
test_that("perform_regression performs linear regression analysis", {
  # Create a sample dataset for testing
  mock_data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))

  # Perform regression and check if it produces expected results
  result <- perform_regression(mock_data)
  expect_true("coefficients" %in% names(result))
  expect_true("residuals" %in% names(result))

  # Add more specific checks based on your regression output
})
```

`Test passed 😀`

`Test passed 🥳`

As expected, both functions have successfully passed the unit tests. Now, let’s explore how our code handles the addition of new functionality in the following section.
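Because `load_data` calls `set.seed(123)`, it should also satisfy the repeatability property described at the start of this post. A minimal sketch of that check, repeating `load_data` so the block runs on its own:

```r
library(testthat)

# Same data generator as above, repeated so this block is self-contained
load_data <- function() {
  set.seed(123)
  data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
}

test_that("load_data returns the same dataset on every call", {
  # set.seed(123) inside load_data() should make its output reproducible
  expect_identical(load_data(), load_data())
})
```

Note that calling `set.seed()` inside a function also resets the global random number stream for any code that runs afterwards.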

### Unit testing added functionality

Now, let’s consider a scenario where we decide to enhance the `perform_regression` function by adding a new feature. This added functionality extracts the R-squared value and returns it along with the original summary. Let’s explore how we can update both the code and the tests to add this new feature.

```r
perform_regression <- function(data) {
  # Function to perform linear regression analysis
  model <- lm(y ~ x, data = data)
  summary_result <- summary(model)
  r_squared <- summary_result$r.squared

  # Return both the original summary and the calculated R-squared
  return(list(summary_result = summary_result, r_squared = r_squared))
}

test_that("perform_regression performs linear regression analysis", {
  # Create a sample dataset for testing
  mock_data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))

  # Perform regression and check if it produces expected results
  result <- perform_regression(mock_data)

  # Original checks
  expect_true("coefficients" %in% names(result$summary_result))
  expect_true("residuals" %in% names(result$summary_result))

  # Additional check for the new feature
  expect_true("r_squared" %in% names(result))
  expect_type(result$r_squared, "double")

  # Add more specific checks based on your regression output
})
```

`Test passed 🎊`

In this example, the `perform_regression` function is updated to include the R-squared value in the returned result, and the corresponding test is expanded to check the new feature. This way, when you rerun the tests after the change (regression testing in the software sense, unrelated to regression models), you can confirm that the original functionality still works and that the new feature doesn’t introduce unexpected issues.

Feel free to explore and add more specific checks based on your regression output as needed!
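One way to follow that suggestion is to check the estimates against the values used to simulate the data. The sketch below repeats the updated `perform_regression`; the seed (42), the noise level (`sd = 0.1`) and the 0.1 tolerance are illustrative assumptions, not values from the original example:

```r
library(testthat)

# Updated function from above, repeated so this block is self-contained
perform_regression <- function(data) {
  model <- lm(y ~ x, data = data)
  summary_result <- summary(model)
  list(summary_result = summary_result, r_squared = summary_result$r.squared)
}

test_that("perform_regression recovers the simulated slope", {
  # Fixing the seed makes the mock data, and therefore the test, repeatable
  set.seed(42)
  mock_data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10, sd = 0.1))

  result <- perform_regression(mock_data)

  # The returned list should have exactly these two elements, in this order
  expect_named(result, c("summary_result", "r_squared"))

  # The data were simulated with slope 2; with little noise, the estimate
  # should fall within an (arbitrary) absolute distance of 0.1
  slope <- result$summary_result$coefficients["x", "Estimate"]
  expect_lt(abs(slope - 2), 0.1)

  # R-squared is a proportion, so it must lie between 0 and 1
  expect_gte(result$r_squared, 0)
  expect_lte(result$r_squared, 1)
})
```

Fixing the seed inside the test also keeps `mock_data` repeatable, which the earlier `perform_regression` tests do not do.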

### Unit testing predictive performance

Unit testing can also be applied to models with several predictors, with a focus on validating the integrity of the model. In this example, we check the predictions of a binomial (logistic) regression model with two predictors on new data.

The model is trained on a dataset (`X_train` and `y_train`), and its predictions on a separate dataset (`X_test`) are compared against the expected outcomes (`y_expected`). `expect_equal` verifies that the predicted and expected classes match. With only two test observations, this is a sanity check of the prediction step rather than a robust assessment of the model’s predictive accuracy.

```r
test_that("model works correctly", {
  # Create the training data
  X_train <- data.frame(c1 = c(50, 30, 40, 60, 20), c2 = c(80, 90, 75, 70, 60))
  # Factor response: the first level (0) is treated as failure, the second (1) as success
  y_train <- factor(c(1, 0, 1, 0, 1), levels = c(0, 1))

  # Train the model
  model <- glm(y_train ~ ., data = X_train, family = "binomial")

  # Test the model with some new data
  X_test <- data.frame(c1 = c(35, 45), c2 = c(85, 60))
  y_pred <- round(predict(model, newdata = X_test, type = "response")) # Get predicted probabilities and round to get classes
  y_pred <- factor(y_pred, levels = c(0, 1))
  names(y_pred) <- NULL # predict() returns a named vector; drop the names so it can match y_expected
  y_expected <- factor(c(0, 1), levels = c(0, 1))

  # Verify that the model's predictions are correct
  expect_equal(y_pred, y_expected)
})
```

`Test passed 🥇`

Again, the test has been passed, just as expected!
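Checking exact class labels on two points is a narrow test. A complementary sketch, reusing the same toy data, checks properties that any fitted logistic model should satisfy whatever its coefficients: one prediction per new row, each a probability between 0 and 1. The choice of these particular checks is an illustration, not part of the original example:

```r
library(testthat)

test_that("logistic model returns one valid probability per new row", {
  # Same toy training data as in the previous test
  X_train <- data.frame(c1 = c(50, 30, 40, 60, 20), c2 = c(80, 90, 75, 70, 60))
  y_train <- factor(c(1, 0, 1, 0, 1), levels = c(0, 1))
  model <- glm(y_train ~ ., data = X_train, family = "binomial")

  # type = "response" returns probabilities instead of log-odds
  X_test <- data.frame(c1 = c(35, 45), c2 = c(85, 60))
  p <- predict(model, newdata = X_test, type = "response")

  # One prediction per test row, each a valid probability
  expect_length(p, nrow(X_test))
  expect_true(all(p >= 0 & p <= 1))
})
```

Property checks like these keep passing if the training data change, whereas exact class expectations have to be revisited whenever the model is refit on new data.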

These are just a few examples of how you can use unit testing to ensure the robustness of your code. Please feel free to reach out with any comments or suggestions 😊

**About Rosan International**

ROSAN is a technology company specialized in the development of Data Science and Artificial Intelligence solutions with the aim of solving the most challenging global projects. Contact us to discover how we can help you gain valuable insights from your data and optimize your processes.