- February 14, 2024
- By Paula Villasante
- Data Processing, Reproducibility
Have you ever wondered how to test if your functions are working correctly? Unit testing can be very helpful in such situations, ensuring the reliability and accuracy of your code. This practice is particularly useful when validating individual components or units of code, with the primary goal of verifying that each code unit functions correctly and produces the expected results.
Unit Testing
Unit testing is a software testing method where individual components of a program are tested in isolation to ensure they work correctly. In practice, this means writing and running tests that exercise a single module or an even smaller unit, such as a function. Key features of unit tests include:
Isolation: Unit tests are designed to test a single code unit in isolation, without relying on other parts of the system. This makes it easier to identify and correct errors specific to a portion of the code.
Automation: Unit tests should be automatic and easily executable without human intervention. This allows them to be run frequently, ensuring code integrity as changes or updates are made.
Repeatability: Unit tests should be consistent and provide predictable results. If run under the same conditions, they should produce the same results, facilitating the quick and efficient detection of issues.
Speed: Unit tests should execute quickly so that developers can rapidly receive feedback on code quality. This is particularly important in agile development practices.
Independence: Unit tests should be independent of each other. The result of one unit test should not depend on the result of another. This simplifies the identification and resolution of issues without additional complications.
In R, the ‘testthat’ package provides a robust framework for implementing unit tests. With these key features in mind, let’s look at how to apply unit testing with ‘testthat’.
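To make two of these features concrete, here is a minimal sketch of repeatability and independence. The make_sample helper and its seed argument are hypothetical names invented for this illustration; they are not part of ‘testthat’.

```r
library(testthat)

# Hypothetical helper used only for this illustration:
# builds a small, reproducible data set from a fixed seed
make_sample <- function(seed = 1) {
  set.seed(seed)
  data.frame(id = 1:5, value = rnorm(5))
}

# Repeatability: the same seed always yields the same data
test_that("make_sample is repeatable", {
  expect_identical(make_sample(seed = 1), make_sample(seed = 1))
})

# Independence: this test builds its own input instead of
# reusing objects created inside another test
test_that("make_sample returns five rows and the expected columns", {
  df <- make_sample(seed = 2)
  expect_equal(nrow(df), 5)
  expect_named(df, c("id", "value"))
})
```

For automation, tests like these are usually saved in a file and run with testthat::test_file() or, for a whole folder, testthat::test_dir(). Where you keep those files depends on your project layout.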
Unit Testing in R using the testthat package
The ‘testthat’ package provides a powerful tool for implementing unit testing in R. To illustrate its use, consider a simple function ‘square’ that calculates the square of a given number.
pacman::p_load(testthat, tidyverse)
# Function to be tested
square <- function(x) {
  return(x^2)
}
# Unit test for the square function
test_that("square function works correctly", {
  # Test case 1
  expect_equal(square(2), 4)
  # Test case 2
  expect_equal(square(5), 25)
})
Test passed 😀
The output of this test indicates whether it has passed or not. In this case, the test has passed successfully. Great!
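Two positive integers only cover the easy cases. As a possible extension, the sketch below adds a few edge cases: negative numbers, zero, vectors, decimals, and invalid input. The choice of cases is ours and is only a starting point.

```r
library(testthat)

# Same function as above, repeated so the block runs on its own
square <- function(x) {
  return(x^2)
}

test_that("square handles edge cases", {
  # Negative input: the result is positive
  expect_equal(square(-3), 9)
  # Zero stays zero
  expect_equal(square(0), 0)
  # Vectorized input is squared element by element
  expect_equal(square(c(1, 2, 3)), c(1, 4, 9))
  # Floating point: expect_equal() compares with a small tolerance,
  # so this passes even though 0.1^2 is not stored exactly as 0.01
  expect_equal(square(0.1), 0.01)
  # Non-numeric input raises an error
  expect_error(square("a"))
})
```

The decimal case is the reason to prefer expect_equal() over an exact comparison when working with floating point numbers.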
Now, let’s explore how this unit testing approach can be applied to other functions, ranging from basic calculations like recoding a variable to more complex scenarios such as regression analyses.
Unit Testing applied to variable age recoding
We can explore the practical applications of unit testing by considering the recoding of a variable, such as age. In the following example, we have a recodify_age function designed to categorize age values into specific age groups.
By creating specific test cases with predefined age values, we can ensure that the recoding process works as expected and assigns the correct age categories. This verifies the functionality of the code, and provides a safety net against potential errors.
# Recoding function for age variable
recodify_age <- function(df) {
  df$Age_Category <- case_when(
    df$Age < 18 ~ "Under 18",
    between(df$Age, 18, 30) ~ "18-30",
    between(df$Age, 31, 45) ~ "31-45",
    between(df$Age, 46, 60) ~ "46-60",
    df$Age > 60 ~ "Over 60",
    TRUE ~ "Unknown"
  )
  return(df)
}
# Unit tests for age recoding
test_that("recodify_age function works correctly", {
  # Create a test data frame with age values
  df_test <- data.frame(Age = c(10, 25, 40, 55, 70))
  # Call the recoding function
  df_result <- recodify_age(df_test)
  # Verify expected results
  expect_equal(df_result$Age_Category[1], "Under 18")
  expect_equal(df_result$Age_Category[2], "18-30")
  expect_equal(df_result$Age_Category[3], "31-45")
  expect_equal(df_result$Age_Category[4], "46-60")
  expect_equal(df_result$Age_Category[5], "Over 60")
})
Test passed 🥳
The unit test shows that the recoding works as expected. Great!
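The test above uses one value from the middle of each group. Recoding errors tend to hide at the edges, so a possible follow-up is to test the boundary values and missing data too. The sketch below assumes ages are normally whole numbers. Its second test documents what the function currently does with a non-integer age such as 30.5, which falls between two groups.

```r
library(testthat)
library(dplyr)

# Same recoding function as above, repeated so the block runs on its own
recodify_age <- function(df) {
  df$Age_Category <- case_when(
    df$Age < 18 ~ "Under 18",
    between(df$Age, 18, 30) ~ "18-30",
    between(df$Age, 31, 45) ~ "31-45",
    between(df$Age, 46, 60) ~ "46-60",
    df$Age > 60 ~ "Over 60",
    TRUE ~ "Unknown"
  )
  return(df)
}

test_that("recodify_age handles boundaries and missing values", {
  # Values on both sides of every cut-off, plus a missing age
  df_test <- data.frame(Age = c(17, 18, 30, 31, 45, 46, 60, 61, NA))
  df_result <- recodify_age(df_test)
  expect_equal(
    df_result$Age_Category,
    c("Under 18", "18-30", "18-30", "31-45", "31-45",
      "46-60", "46-60", "Over 60", "Unknown")
  )
})

test_that("non-integer ages between groups fall through to Unknown", {
  # 30.5 is above 30 and below 31, so no group matches
  df_result <- recodify_age(data.frame(Age = 30.5))
  expect_equal(df_result$Age_Category, "Unknown")
})
```

If decimal ages can occur in your data, the second test flags a behavior you may want to change in the function itself.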
Unit testing regression models
Unit testing isn’t limited to simple functions; it can also be applied to more complex functions and analytic pipelines like regression models. Let’s explore how we can unit test regression models in the following section.
Unit testing data pipeline
To ensure the reliability of our linear regression model, we’ll define two functions. First, the load_data function generates a synthetic dataset with a linear relationship between x and y. Second, the perform_regression function conducts a linear regression analysis on a given dataset.
We’ll use the test_that function from ‘testthat’ to create unit tests for these functions. The test for load_data ensures the generated dataset has the correct dimensions and column names, while the test for perform_regression checks if the regression output includes expected components such as coefficients and residuals.
load_data <- function() {
  # Function to generate a sample dataset
  set.seed(123)
  data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
  return(data)
}
perform_regression <- function(data) {
  # Function to perform linear regression analysis
  model <- lm(y ~ x, data = data)
  return(summary(model))
}
# Test for load_data()
test_that("load_data generates a sample dataset", {
  data <- load_data()
  expect_equal(nrow(data), 10)
  expect_equal(ncol(data), 2)
  expect_true(all(c("x", "y") %in% colnames(data)))
})
# Test for perform_regression()
test_that("perform_regression performs linear regression analysis", {
  # Create a sample dataset for testing
  mock_data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
  # Perform regression and check if it produces expected results
  result <- perform_regression(mock_data)
  expect_true("coefficients" %in% names(result))
  expect_true("residuals" %in% names(result))
  # Add more specific checks based on your regression output
})
Test passed 😀 Test passed 🥳
As expected, both functions have successfully passed the unit tests. Now, let’s explore how our code handles the addition of new functionality in the following section.
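One way to add "more specific checks" is to feed the function data with a known relationship and verify that the estimates come out close to it. The sketch below is only an illustration. The mock data (y = 1 + 2x plus a small alternating offset) and the tolerances are our own assumptions, chosen so that the test involves no random numbers.

```r
library(testthat)

# Same function as above, repeated so the block runs on its own
perform_regression <- function(data) {
  model <- lm(y ~ x, data = data)
  return(summary(model))
}

test_that("perform_regression recovers a known slope and intercept", {
  # Deterministic mock data: y = 1 + 2x plus a small alternating offset
  # (hypothetical values; no random numbers, so results are repeatable)
  x <- 1:10
  offset <- rep(c(0.1, -0.1), times = 5)
  mock_data <- data.frame(x = x, y = 1 + 2 * x + offset)

  result <- perform_regression(mock_data)
  estimates <- result$coefficients[, "Estimate"]

  # Estimates should be close to the values used to build the data
  expect_equal(unname(estimates["x"]), 2, tolerance = 0.1)
  expect_equal(unname(estimates["(Intercept)"]), 1, tolerance = 0.1)
  # With so little noise, the fit should be almost perfect
  expect_gt(result$r.squared, 0.99)
})
```

The small offset is there on purpose: a perfectly linear y makes summary() warn about an "essentially perfect fit", which would add noise to the test output.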
Unit testing added functionality
Now, let’s consider a scenario where we decide to enhance the perform_regression function by adding a new feature. This added functionality calculates and returns the R-squared value along with the original summary. Let’s explore how we can update both the code and tests to add this new feature.
perform_regression <- function(data) {
  # Function to perform linear regression analysis
  model <- lm(y ~ x, data = data)
  summary_result <- summary(model)
  r_squared <- summary_result$r.squared
  # Return both the original summary and the calculated R-squared
  return(list(summary_result = summary_result, r_squared = r_squared))
}
test_that("perform_regression performs linear regression analysis", {
  # Create a sample dataset for testing
  mock_data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
  # Perform regression and check if it produces expected results
  result <- perform_regression(mock_data)
  # Original checks
  expect_true("coefficients" %in% names(result$summary_result))
  expect_true("residuals" %in% names(result$summary_result))
  # Additional check for the new feature
  expect_true("r_squared" %in% names(result))
  expect_type(result$r_squared, "double")
  # Add more specific checks based on your regression output
})
Test passed 🎊
In this example, the perform_regression function is updated to include the R-squared value in the returned result. The corresponding test is expanded to include checks for the new feature. This way, when you re-run the tests after a change (regression testing in the software sense, not the statistical one), you can ensure that the original functionality is still valid and that the new feature doesn’t introduce unexpected issues.
Feel free to explore and add more specific checks based on your regression output as needed!
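As one possible set of extra checks for the new feature, the sketch below verifies the shape of the returned list, the valid range of R-squared, and its agreement with the value stored in the summary. The seed is our addition to keep the mock data repeatable.

```r
library(testthat)

# Same function as above, repeated so the block runs on its own
perform_regression <- function(data) {
  model <- lm(y ~ x, data = data)
  summary_result <- summary(model)
  r_squared <- summary_result$r.squared
  return(list(summary_result = summary_result, r_squared = r_squared))
}

test_that("perform_regression returns a consistent R-squared", {
  # Fixed seed (our assumption) so the mock data is the same on every run
  set.seed(123)
  mock_data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
  result <- perform_regression(mock_data)

  # The returned list has exactly the two expected elements
  expect_named(result, c("summary_result", "r_squared"))
  # R-squared is a single number between 0 and 1
  expect_length(result$r_squared, 1)
  expect_gte(result$r_squared, 0)
  expect_lte(result$r_squared, 1)
  # It matches the value stored in the summary object
  expect_equal(result$r_squared, result$summary_result$r.squared)
})
```

Checking the names of the returned list is a cheap way to catch accidental changes to the function’s output format later on.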
Unit testing predictive performance
Unit testing can also be applied to models with multiple predictors, with a focus on validating the model’s behavior. In this example, we validate a binomial (logistic) regression model by testing its predictions on new data.
The model is trained on a dataset (X_train and y_train), and its predictions on a separate dataset (X_test) are compared against the expected outcomes (y_expected). The use of expect_equal ensures that the predicted and expected classes match exactly, which works as a basic sanity check on the model’s predictions.
test_that("model works correctly", {
  # Load the training data
  X_train <- data.frame(c1 = c(50, 30, 40, 60, 20), c2 = c(80, 90, 75, 70, 60))
  # Ensure y_train is a factor for the binomial family
  y_train <- factor(c(1, 0, 1, 0, 1), levels = c(0, 1))
  # Train the model
  model <- glm(y_train ~ ., data = X_train, family = "binomial")
  # Test the model with some new data
  X_test <- data.frame(c1 = c(35, 45), c2 = c(85, 60))
  # Get predicted probabilities and round to get classes
  y_pred <- round(predict(model, newdata = X_test, type = "response"))
  y_pred <- factor(y_pred, levels = c(0, 1))
  names(y_pred) <- NULL
  y_expected <- factor(c(0, 1), levels = c(0, 1))
  # Verify that the model's predictions are correct
  expect_equal(y_pred, y_expected)
})
Test passed 🥇
Again, the test has passed, just as expected!
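Exact class predictions can change when the training data changes, so it can help to also test properties that should hold for any fitted model. The sketch below reuses the toy data from the example above and checks that the predicted probabilities are well formed. The choice of properties is our suggestion, not a fixed recipe.

```r
library(testthat)

test_that("predicted probabilities are well formed", {
  # Same toy training data as in the example above
  X_train <- data.frame(c1 = c(50, 30, 40, 60, 20), c2 = c(80, 90, 75, 70, 60))
  y_train <- factor(c(1, 0, 1, 0, 1), levels = c(0, 1))
  model <- glm(y_train ~ ., data = X_train, family = "binomial")

  # Predicted probabilities for the new observations
  X_test <- data.frame(c1 = c(35, 45), c2 = c(85, 60))
  p_hat <- predict(model, newdata = X_test, type = "response")

  # One prediction per test row
  expect_length(p_hat, nrow(X_test))
  # No missing predictions
  expect_false(anyNA(p_hat))
  # Probabilities stay within [0, 1]
  expect_true(all(p_hat >= 0 & p_hat <= 1))
})
```

Checks like these complement the exact comparison: they keep passing when the model is retrained, yet still catch problems such as missing predictions or a wrong prediction type.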
These are just a few examples of how you can use unit testing to ensure the robustness of your code. Please feel free to reach out with any comments or suggestions 😊