Hi there! As you may know, we have been working on video recognition algorithms since the spring. One of the more challenging aspects of video recognition is getting enough data to train a model, and one way to get more out of the data we already have is data augmentation. In this post I will show how I have been doing data augmentation using the ‘vidaug’ library in Python.

The data

For illustration purposes, let us assume a training dataset ‘train_ds’ with videos of me waving my hand. First, let us check out the content and format of the dataset by defining a function ‘to_gif’ that animates the frames of a video in train_ds:

import imageio
import numpy as np
from tensorflow_docs.vis import embed

def to_gif(images):
  # Frames are assumed to be floats in [0, 1]; convert them to 8-bit for the GIF
  converted_images = np.clip(images * 255, 0, 255).astype(np.uint8)
  imageio.mimsave('./animation.gif', converted_images, duration=2)
  return embed.embed_file('./animation.gif')

frames, label = list(train_ds.take(5))[0]
to_gif(frames)


Note that I am usually less square-shaped: This video was taken using a phone camera with a 4:3 aspect ratio, which gets distorted when reshaped into a square format!
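If the squashed look bothers you, one way around it is to crop each frame to a centered square before resizing. This is only a minimal sketch and not part of my pipeline above: the function name ‘center_crop_square’ is made up for illustration, and it assumes the frames are a NumPy array of shape (frames, height, width, channels).

```python
import numpy as np

def center_crop_square(frames):
    """Crop every frame to its largest centered square.

    `frames` is assumed to have shape (num_frames, height, width, channels).
    """
    _, height, width, _ = frames.shape
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return frames[:, top:top + side, left:left + side, :]

# A dummy 4:3 clip: 8 RGB frames of 480 x 640 pixels.
clip = np.zeros((8, 480, 640, 3), dtype=np.float32)
square_clip = center_crop_square(clip)
print(square_clip.shape)  # (8, 480, 480, 3)
```

The trade-off is that cropping throws away the left and right edges of the frame, whereas plain resizing keeps everything but distorts it.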

Vidaug requirements and installation

Required packages:

  • numpy
  • PIL
  • scipy
  • skimage
  • OpenCV (i.e. cv2)
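Before installing, it can help to check which of these are already available. The sketch below uses only the standard library; the helper name ‘missing_packages’ is hypothetical. Note that three of the packages are imported under one name but installed with pip under another (PIL as Pillow, skimage as scikit-image, cv2 as opencv-python).

```python
import importlib.util

# Import name -> pip package name.
REQUIRED = {
    "numpy": "numpy",
    "PIL": "Pillow",
    "scipy": "scipy",
    "skimage": "scikit-image",
    "cv2": "opencv-python",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]

print("Missing:", missing_packages() or "nothing")
```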

For installation, simply use:

sudo pip install git+

Sample Augmentation

The vidaug library has 18 different augmentation types, including random cropping, rotation, vertical and horizontal flips, and a variety of image distortions. For this example we are going to chain 5 different augmentation types, each applied with a probability of 0.5.

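import tensorflow as tf
from vidaug import augmentors as va

sometimes = lambda aug: va.Sometimes(0.5, aug) # Used to apply each augmentor with 50% probability
seq1 = va.Sequential([
    sometimes(va.RandomCrop(size=(resolution, resolution))), # resolution: side length of the square frames, assumed to be defined earlier
    ], random_order=True)
# vidaug expects NumPy arrays (or PIL images), not tensors, so convert before augmenting and back afterwards
tensor_np = tf.constant(frames).numpy()
video_aug_tensor = tf.convert_to_tensor(seq1(tensor_np))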
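To make the ‘sometimes’ wrapper less mysterious, here is a minimal sketch of the idea behind it: flip a biased coin, and either run the augmentor or hand the clip back untouched. The class ‘SometimesSketch’ and the ‘vertical_flip’ helper are hypothetical stand-ins written for this post, not vidaug's actual code.

```python
import random
import numpy as np

class SometimesSketch:
    """Apply `augmentor` to a clip with probability `p`, else return it unchanged.

    Hypothetical stand-in to illustrate the idea; not vidaug's implementation.
    """
    def __init__(self, p, augmentor, rng=None):
        self.p = p
        self.augmentor = augmentor
        self.rng = rng or random.Random()

    def __call__(self, clip):
        if self.rng.random() < self.p:
            return self.augmentor(clip)
        return clip

def vertical_flip(clip):
    # Flip every frame upside down; clip shape is (frames, height, width, channels).
    return clip[:, ::-1, :, :]

clip = np.arange(2 * 2 * 2 * 1).reshape(2, 2, 2, 1)
maybe_flip = SometimesSketch(0.5, vertical_flip, rng=random.Random(0))
flips = sum(not np.array_equal(maybe_flip(clip), clip) for _ in range(1000))
print(f"Flipped {flips} of 1000 times")
```

Because every augmentor in the chain gets its own coin flip, running the same sequence twice on the same video will usually give two different results.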


As luck would have it, I happened to get a vertical flip with a Gaussian blur.

Batch Augmentation

We will now define an ‘augmenter’ function that applies the augmentation to the whole dataset. It keeps each original video and adds ‘num_augmentations’ augmented copies of it.

def augmenter(dataset, num_augmentations=1):
  new_datasets = []
  sometimes = lambda aug: va.Sometimes(0.5, aug) # Used to apply each augmentor with 50% probability
  seq1 = va.Sequential([
      sometimes(va.RandomCrop(size=(resolution, resolution))),
      ], random_order=True)
  for i, (frames, labels) in enumerate(dataset):
    print("Processing video #", i)
    # Keep the original video
    new_datasets.append((frames, labels))
    tensor_np = tf.constant(frames).numpy()
    for j in range(num_augmentations):
      # Each call to seq1 draws fresh random augmentations
      video_aug_tensor = tf.convert_to_tensor(seq1(tensor_np))
      new_datasets.append((video_aug_tensor, labels))
  return new_datasets

Now we apply 30 augmentations to each video in train_ds:

num_augmentations = 30
aug_train_ds = augmenter(dataset=train_ds, num_augmentations=num_augmentations)

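A quick sanity check on the size of the result: assuming each original video is kept alongside its augmented copies, N videos with k augmentations each should give N × (1 + k) entries. The framework-free sketch below shows the same loop with plain lists; ‘augment_all’ and the toy reversal "augmentation" are made up for illustration.

```python
def augment_all(videos, augment_fn, num_augmentations=1):
    """Return each original (video, label) pair followed by its augmented copies."""
    out = []
    for video, label in videos:
        out.append((video, label))
        for _ in range(num_augmentations):
            out.append((augment_fn(video), label))
    return out

# Two toy "videos" (lists of frame ids) and a toy augmentation that reverses them.
videos = [([0, 1, 2], "wave"), ([3, 4, 5], "wave")]
result = augment_all(videos, augment_fn=lambda v: v[::-1], num_augmentations=30)
print(len(result))  # 2 * (1 + 30) = 62
```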
To see the resulting augmentations in a reasonable way, we create a grid plotting function:

import math
import matplotlib.pyplot as plt

def plot_videos_grid(dataset):
    num_videos = sum(1 for _ in dataset)
    print('There are %s videos in this dataset' % num_videos)
    print('Plotting below a frame from the middle of each video')
    grid_size = (math.ceil(num_videos/5), 5)
    # squeeze=False keeps axes 2-D even when there is only one row
    fig, axes = plt.subplots(*grid_size, figsize=(12, 12), squeeze=False)
    for ax in axes.flat:
        ax.axis('off') # Hide ticks, including on unused cells
    for i, (frames, labels) in enumerate(dataset):
        frame_np = tf.constant(frames).numpy()[len(frames) // 2] # Middle frame
        row = i // grid_size[1]
        col = i % grid_size[1]
        axes[row, col].imshow(frame_np)
    plt.show()

And apply it to the augmented dataset:

plot_videos_grid(aug_train_ds)

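The row and column arithmetic in the grid is easy to get wrong, so here it is in isolation. This small sketch (the name ‘grid_positions’ is hypothetical) computes where each video lands in a 5-column grid and which cells of the last row stay empty.

```python
import math

def grid_positions(num_items, ncols=5):
    """Return the grid shape, the (row, col) of each item, and the unused cells."""
    nrows = math.ceil(num_items / ncols)
    positions = [(i // ncols, i % ncols) for i in range(num_items)]
    remainder = num_items % ncols
    # Only the last row can have empty cells, and only if it is not full.
    unused = [(nrows - 1, c) for c in range(remainder, ncols)] if remainder else []
    return (nrows, ncols), positions, unused

shape, positions, unused = grid_positions(7)
print(shape)          # (2, 5)
print(positions[-1])  # (1, 1)
print(unused)         # [(1, 2), (1, 3), (1, 4)]
```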
Et voilà, out come a bunch of augmented Pablos for you!

You can of course play with the parameters for more extreme augmentations. For example, with a rotation of up to 90 degrees, ‘va.RandomRotate(degrees=90)’, you can get Pablos in all orientations.
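One thing worth keeping in mind with rotations on video: the random angle should be drawn once per clip and then applied to every frame, otherwise the video jitters from frame to frame. As far as I understand, that is how clip-level augmentors are meant to behave, but the sketch below is my own simplified illustration and not vidaug's code: it only handles quarter turns, and ‘rotate_clip_quarter_turns’ is a hypothetical name.

```python
import random
import numpy as np

def rotate_clip_quarter_turns(clip, rng=random):
    """Rotate all frames of `clip` by the same random multiple of 90 degrees.

    `clip` is assumed to have shape (frames, height, width, channels).
    Returns the rotated clip and the number of quarter turns used.
    """
    k = rng.randrange(4)  # 0, 1, 2 or 3 quarter turns, drawn once per clip
    return np.rot90(clip, k=k, axes=(1, 2)), k

clip = np.random.rand(16, 64, 64, 3)
rotated, k = rotate_clip_quarter_turns(clip)
print(f"{k} quarter turn(s), shape {rotated.shape}")
```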

OK, that’s enough Pablos for today. Hope you found that useful!

