Video classification models, especially deep learning models, can be computationally intensive. Running such models on mobile devices, particularly for real-time applications on the edge like our award-winning seizure detection model, may lead to overheating and suboptimal performance. In fact, in recent tests our typically zippy Samsung Galaxy A30 crashed after a few hours due to overheating. Since we are interested primarily in detecting clonic motion at night, most of the video feed we expect is still. If we can focus compute on just the video fragments showing movement, we expect to dramatically reduce device energy consumption and overheating. In this blog post, we’ll walk you through a code snippet implementing a simple motion detection algorithm for Android. Let’s get right to it!
 
Everyone stay still!
  

The Challenges of Inference on a Video Stream

Real-time video inference on edge devices is plagued by challenges including limited computational resources, power consumption, heat generation, latency, model size, data transfer constraints, environmental variability, update and maintenance difficulties, and scalability issues. So we need to be super-judicious with computational resources and focus compute on the most important parts of the stream. Currently, our algorithm takes a continuous video stream and performs inference on every frame that comes in, even if nothing is happening at all. This is extremely inefficient, leading to unnecessary battery consumption and device overheating.

Warming the planet, one frame at a time

We must surely be able to do better. We want to warm the planet with our smile, not our carbon footprint! So let’s see if we can find a motion detection algorithm that can be executed in real-time with minimal resources while still filtering out static video segments.

Motion Detection Algorithms

The field of motion detection has a long history and a crowded space of algorithms to choose from. Sehairi, Chouireb, & Meunier (2017) conducted one of the best comparisons we have found. Their review evaluated the following algorithms.

  1. Frame Differencing (FD): This technique involves calculating pixel-wise differences between consecutive frames.
  2. Three-Frame Difference (3FD): Extends frame differencing by comparing the current frame with the previous two frames, which helps reduce false positives (see the sketch after this list).
  3. Running Average Filter (RAF): Adaptive background subtraction maintains a dynamic background model that adapts over time.
  4. Forgetting Morphological Temporal Gradient (FMTG): Morphological operation-based method that calculates temporal gradients while incorporating a forgetting mechanism to handle background changes.
  5. Sigma-Delta Background Estimation (ΣΔ): Employs a method inspired by sigma-delta modulation to differentiate between moving objects and the background.
  6. Spatio-Temporal Markov Random Field (MRF): Considers the temporal and spatial relationships between pixels to identify moving objects.
  7. Running Gaussian Average (RGA): Maintains a running average of pixel values over time and detects motion by comparing current pixel values with the running average.
  8. Mixture of Gaussians (MoG): Probabilistic approach that models pixel values using a mixture of Gaussian distributions.
  9. Spatio-Temporal Entropy Image (STEI): Represents motion by calculating the entropy of pixel values over time and space, helping identify regions with changing content.
  10. Difference-Based Spatio-Temporal Entropy Image (DSTEI): Enhances STEI by considering differences in pixel values between consecutive frames.
  11. Eigen-Background (Eig-Bg): Employs eigenvalues and eigenvectors to represent background information and separate it from moving objects in the scene.
  12. Simplified Self-Organized Map (Simp-SOBS): Simplified version of the self-organized map to cluster pixel values and distinguish moving objects from the background in a self-adaptive manner.
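
To make this concrete, here is a minimal sketch of the three-frame difference idea from item 2, operating on plain grayscale intensity arrays rather than Android Bitmaps. The function name, the array representation and the threshold value are illustrative assumptions, not code from our app.

      // Minimal sketch of three-frame differencing on grayscale intensities (0..255).
      // A pixel counts as moving only if it differs from BOTH of the two previous frames,
      // which suppresses single-frame flicker and similar false positives.
      fun threeFrameDifference(
        current: IntArray,
        previous: IntArray,
        beforePrevious: IntArray,
        pixelThreshold: Int = 25 // Illustrative value; calibrate per camera.
      ): BooleanArray {
        return BooleanArray(current.size) { i ->
          val d1 = kotlin.math.abs(current[i] - previous[i])
          val d2 = kotlin.math.abs(current[i] - beforePrevious[i])
          d1 > pixelThreshold && d2 > pixelThreshold
        }
      }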

Choosing an Algorithm

Sehairi, Chouireb, & Meunier (2017) evaluated the performance of these algorithms using the CDnet video dataset, a benchmark spanning problems from simple scenes to complex scenes affected by bad weather and dynamic backgrounds. Their results showed that there is no perfect method for all challenging cases; each method performs well in some cases and fails in others. For our use case, however, they found that Frame Differencing, with its simple computational approach, was the fastest algorithm, offering particularly good performance on night video.

Computational time for each method (presented in frames per second) (Sehairi, Chouireb & Meunier, 2017)

Understanding the Code

The snippet below contains the logic for motion detection:

      // Requires android.graphics imports: Bitmap, Canvas, Color, ColorMatrix,
      // ColorMatrixColorFilter and Paint.
      private var prevFrame: Bitmap? = null
      private var motionThresholdDevice = 10_000_000 // Calibrate threshold to the device camera.
      private var motionThresholdExternal = 500_000 // Calibrate threshold to an external camera.

      private fun detectMotion(frame: Bitmap): Boolean {
        if (prevFrame == null) {
          // Initialize the previous frame on the first call.
          prevFrame = frame.copy(frame.config, true)
          return false
        }

        // Convert the current frame to grayscale for simplicity (you can use more advanced methods)
        val currentFrameGray = convertToGrayscale(frame)
        val prevFrameGray = convertToGrayscale(prevFrame!!)

        // Compute the absolute difference between the current and previous frames
        val frameDiff = calculateFrameDifference(currentFrameGray, prevFrameGray)

        // Update the previous frame with the current frame
        prevFrame = frame.copy(frame.config, true)

        // Compare against the threshold for the active input source.
        // (inputSource and the SOURCE_* constants are defined elsewhere in the class.)
        return when (inputSource) {
          SOURCE_DEVICE_CAMERA -> frameDiff > motionThresholdDevice
          SOURCE_EXTERNAL_CAMERA -> frameDiff > motionThresholdExternal
          else -> throw IllegalStateException("Please set a proper input source")
        }
      }

      private fun convertToGrayscale(frame: Bitmap): Bitmap {
        val grayFrame = Bitmap.createBitmap(frame.width, frame.height, Bitmap.Config.RGB_565)
        val canvas = Canvas(grayFrame)
        val paint = Paint()
        val colorMatrix = ColorMatrix()
        colorMatrix.setSaturation(0f) // Convert to grayscale
        val filter = ColorMatrixColorFilter(colorMatrix)
        paint.colorFilter = filter
        canvas.drawBitmap(frame, 0f, 0f, paint)
        return grayFrame
      }

      private fun calculateFrameDifference(frame1: Bitmap, frame2: Bitmap): Long {
        // Assumes both frames share the same dimensions.
        val width = frame1.width
        val height = frame1.height
        val pixels1 = IntArray(width * height)
        val pixels2 = IntArray(width * height)

        frame1.getPixels(pixels1, 0, width, 0, 0, width, height)
        frame2.getPixels(pixels2, 0, width, 0, 0, width, height)

        var diff = 0L // Use Long to avoid Int overflow when summing over high-resolution frames.

        for (i in pixels1.indices) {
          val pixel1 = pixels1[i]
          val pixel2 = pixels2[i]

          val red1 = Color.red(pixel1)
          val green1 = Color.green(pixel1)
          val blue1 = Color.blue(pixel1)

          val red2 = Color.red(pixel2)
          val green2 = Color.green(pixel2)
          val blue2 = Color.blue(pixel2)

          // Calculate the absolute difference in RGB values
          val deltaRed = Math.abs(red1 - red2)
          val deltaGreen = Math.abs(green1 - green2)
          val deltaBlue = Math.abs(blue1 - blue2)

          // Calculate the overall difference for the pixel
          val pixelDiff = deltaRed + deltaGreen + deltaBlue

          diff += pixelDiff
        }

        return diff
      }

Let’s break down the key components of this code:

  1. detectMotion(frame: Bitmap): This function takes a frame (as a Bitmap) as input and determines whether motion is detected. It does so by comparing the current frame with the previous frame using the calculateFrameDifference function. If the frame difference exceeds a predefined threshold (motionThresholdDevice or motionThresholdExternal based on the input source), it returns true, indicating motion is detected.
  2. convertToGrayscale(frame: Bitmap): To simplify the motion detection process, both the current and previous frames are converted to grayscale. Grayscale images contain only shades of gray (from black to white), which simplifies the comparison of pixel values.
  3. calculateFrameDifference(frame1: Bitmap, frame2: Bitmap): This function calculates the absolute difference in RGB values between two frames. It calculates the difference for each pixel in the frames and sums up these differences to get an overall difference score.
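
As a quick sanity check, the sketch below shows the expected behavior of calculateFrameDifference on two tiny hand-built bitmaps: identical frames should give a difference of zero, while changing a single pixel from black to white should add 255 per channel, i.e. 765. This is an illustrative, instrumented-test-style snippet (Bitmap needs an Android runtime), and the function name is ours, not part of the app.

      // Illustrative sanity check; run inside the same class (calculateFrameDifference is private)
      // and on an Android runtime, e.g. in an instrumented test.
      fun sanityCheckFrameDifference() {
        val frameA = Bitmap.createBitmap(4, 4, Bitmap.Config.ARGB_8888)
        val frameB = frameA.copy(frameA.config, true)

        // Identical frames: no difference at all.
        check(calculateFrameDifference(frameA, frameB) == 0L)

        // One pixel changed from (transparent) black to white: 255 per RGB channel = 765.
        frameB.setPixel(0, 0, Color.WHITE)
        check(calculateFrameDifference(frameA, frameB) == 765L)
      }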

Frame Differencing in Action

Next we apply the motion detection filter to our video classification model. As you can see in the gif below, the model does not receive inputs while the image is still, so the inference does not change. As soon as the algorithm detects a significant amount of movement, frames are fed to the model to update the predictions for each of the three states.
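
The wiring itself is simple: every incoming frame first passes through detectMotion, and only frames that show motion reach the classifier. The sketch below illustrates this gating inside a generic per-frame callback; classifyFrame and updatePredictions are hypothetical stand-ins for the model call and UI update, not actual app code.

      // Illustrative gating of inference on motion. classifyFrame and updatePredictions
      // are hypothetical stand-ins, not part of the snippet above.
      private fun onNewFrame(frame: Bitmap) {
        if (!detectMotion(frame)) {
          // Static scene: skip inference entirely and keep the previous predictions.
          return
        }
        // Motion detected: run the (expensive) video classification model.
        val predictions = classifyFrame(frame)
        updatePredictions(predictions)
      }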

Saving the planet, one frame at a time

One aspect that requires careful fine-tuning is the motion detection threshold. As you can see in the code above, the threshold is quite different for an external camera than for the device camera. We even saw major differences between the device camera on the Android Studio emulator and an actual device. Careful calibration is needed here: if the threshold is too low, we lose all our efficiency gains, but if it is too high, the motion detection algorithm skips too many frames and inference becomes useless. The optimally efficient band is usually razor thin, so calibrate carefully!
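
One way to make that calibration less device-dependent, which we have not done in the snippet above, is to normalize the raw difference by the number of pixels and threshold the mean per-pixel difference instead; logging that value on a known-still and a known-moving scene quickly shows where the cut-off should sit. The helper below is a hypothetical sketch of that idea (it uses android.util.Log).

      // Hypothetical helper: threshold on the mean per-pixel difference so the cut-off
      // does not depend on frame resolution. Log the value to calibrate per device.
      private fun isMotion(frameDiff: Long, width: Int, height: Int, perPixelThreshold: Double): Boolean {
        val meanDiff = frameDiff.toDouble() / (width * height)
        Log.d("MotionDetection", "mean per-pixel diff = $meanDiff")
        return meanDiff > perPixelThreshold
      }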

Evaluating Energy Consumption

To check the impact of the motion detection filter on energy usage, we turn to the profiling tool provided by Android Studio, which you can access as follows:

  • Open your project in Android Studio.
  • Connect your Android device to your computer.
  • Run your app on the device in debug mode.
  • In Android Studio, go to “View” -> “Tool Windows” -> “Profiler” to open the Profiler tab.

We then run the app with and without the motion detection algorithm on the same static video stream. As you can see below, frame differencing reduces energy consumption on the static stream to about 25% of the baseline.

Android Studio Profiler of Energy Consumption

 

In conclusion, implementing an efficient motion detection algorithm like Frame Differencing is a significant step toward reducing energy consumption and optimizing real-time video inference on edge devices. Besides our light-hearted claims to be saving the planet, the main benefit of this algorithm is to make the app lighter on the edge device, which in turn reduces battery depletion and overheating.

About Rosan International

ROSAN is a technology company specializing in the development of Data Science and Artificial Intelligence solutions, with the aim of solving the most challenging global projects. Contact us to discover how we can help you gain valuable insights from your data and optimize your processes.