By Rosan International | Data Processing

 

Hi there! As you may know, we have been working on video recognition algorithms since the spring. One of the more challenging aspects of video recognition is getting enough data to train a model. Data augmentation lets us get more out of the training data we have by generating modified copies of existing videos. In this post I will show how I have been doing data augmentation using the ‘vidaug’ library in Python.

The data

For illustration purposes, let us assume a training dataset ‘train_ds’ with videos of me waving my hand. First, let us check the content and format of the dataset by defining a function ‘to_gif’ that animates the frames of a video in train_ds:

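import imageio
import numpy as np
from tensorflow_docs.vis import embed  # helper for displaying files in notebooks

def to_gif(images):
  # Frames are assumed to be floats in [0, 1]; convert to 8-bit for the GIF.
  converted_images = np.clip(images * 255, 0, 255).astype(np.uint8)
  imageio.mimsave('./animation.gif', converted_images, duration=2)
  return embed.embed_file('./animation.gif')
frames, label = next(iter(train_ds))  # first (frames, label) pair in the dataset
to_gif(frames.numpy())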

 

Note that I am usually less square-shaped: this video was recorded on a phone camera with a 4:3 aspect ratio, so it gets stretched when resized to a square format!
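One way to avoid this stretching is to center-crop every frame to a square before resizing. The sketch below assumes each clip is a NumPy array of shape (num_frames, height, width, channels); `center_crop_square` is a hypothetical helper, not part of vidaug or of the pipeline that built train_ds.

```python
import numpy as np

def center_crop_square(frames: np.ndarray) -> np.ndarray:
    """Crop every frame of a clip to its central square region.

    Assumes `frames` has shape (num_frames, height, width, channels).
    The same crop window is used for all frames so the clip stays consistent.
    """
    _, height, width, _ = frames.shape
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return frames[:, top:top + side, left:left + side, :]

# Example: a 4:3 landscape clip of 10 frames, 480x640 pixels.
clip = np.zeros((10, 480, 640, 3), dtype=np.float32)
square = center_crop_square(clip)
print(square.shape)  # (10, 480, 480, 3)
```

After cropping, resizing to a square resolution no longer changes the aspect ratio, at the cost of discarding the edges of each frame.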

Vidaug requirements and installation

Required packages:

  • numpy
  • PIL
  • scipy
  • skimage
  • OpenCV (i.e. cv2)
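Before installing, it can help to confirm which of these dependencies are already importable. Below is a minimal sketch using only the standard library; `check_dependencies` is a hypothetical helper, and the names checked are import names (PIL for Pillow, skimage for scikit-image, cv2 for OpenCV) rather than pip package names.

```python
import importlib.util

# Import names for the packages listed above.
REQUIRED_MODULES = ["numpy", "PIL", "scipy", "skimage", "cv2"]

def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```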

For installation, simply use:

sudo pip install git+https://github.com/okankop/vidaug

Sample Augmentation

The vidaug library has 18 different augmentation types, including random cropping, rotation, vertical and horizontal flips, and a variety of image distortions. For this example we are going to chain 5 different augmentation types, each applied with a probability of 0.5.

import tensorflow as tf
from vidaug import augmentors as va

# `resolution` is assumed to be defined earlier (the square frame size of train_ds).
sometimes = lambda aug: va.Sometimes(0.5, aug)  # apply each augmenter with 50% probability
seq1 = va.Sequential([
    sometimes(va.RandomCrop(size=(resolution, resolution))),
    sometimes(va.RandomRotate(degrees=10)),
    sometimes(va.VerticalFlip()),
    sometimes(va.HorizontalFlip()),
    sometimes(va.GaussianBlur(1))
    ], random_order=True)  # apply the augmenters in random order
# vidaug operates on NumPy frames, so convert the tensor before augmenting.
video_aug_tensor = tf.convert_to_tensor(seq1(frames.numpy()))
to_gif(video_aug_tensor.numpy())

 

As luck would have it, I happened to get a vertical flip with a Gaussian blur.
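How likely was that draw? With five independent augmenters, each switched on with probability 0.5, every one of the 2^5 = 32 on/off combinations is equally likely, so any one specific combination (for example, only a vertical flip plus a blur) has probability 1/32, no augmenter fires in about 1 case out of 32, and on average 2.5 augmenters are applied. The sketch below is a simplified, hypothetical re-implementation of this `Sometimes` plus `Sequential` logic in plain NumPy, meant only to illustrate the behavior; it is not vidaug's source code, and the two flip functions stand in for vidaug's augmenters.

```python
import random
import numpy as np

def sometimes(prob, aug):
    """Wrap `aug` so it is applied to a clip with probability `prob`."""
    def wrapped(clip, rng):
        return aug(clip) if rng.random() < prob else clip
    return wrapped

def sequential(augs, random_order=False):
    """Chain augmenters; optionally shuffle their order on every call."""
    def apply(clip, rng):
        order = list(augs)
        if random_order:
            rng.shuffle(order)
        for aug in order:
            clip = aug(clip, rng)
        return clip
    return apply

# Stand-ins for vidaug augmenters, acting on a (frames, H, W, C) array.
# Each transform is applied identically to every frame of the clip.
def vertical_flip(clip):
    return clip[:, ::-1, :, :]

def horizontal_flip(clip):
    return clip[:, :, ::-1, :]

pipeline = sequential([sometimes(0.5, vertical_flip),
                       sometimes(0.5, horizontal_flip)], random_order=True)

rng = random.Random(0)
clip = np.random.default_rng(0).random((8, 32, 32, 3))
augmented = pipeline(clip, rng)
print(augmented.shape)  # (8, 32, 32, 3)

# Probability that none of five 50% augmenters fires, and expected number applied.
print(0.5 ** 5, 5 * 0.5)  # 0.03125 2.5
```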

Batch Augmentation

We will now define an ‘augmenter’ function that applies augmentation to the whole dataset, keeping each original video and adding ‘num_augmentations’ augmented copies of it.
def augmenter(dataset, num_augmentations=1):
  """Return a list with each original (frames, labels) pair followed by
  `num_augmentations` randomly augmented copies of it."""
  # Build the augmentation pipeline once rather than once per video.
  sometimes = lambda aug: va.Sometimes(0.5, aug)  # apply each augmenter with 50% probability
  seq1 = va.Sequential([
      sometimes(va.RandomCrop(size=(resolution, resolution))),
      sometimes(va.RandomRotate(degrees=10)),
      sometimes(va.VerticalFlip()),
      sometimes(va.HorizontalFlip()),
      sometimes(va.GaussianBlur(1))
      ])
  new_datasets = []
  for i, (frames, labels) in enumerate(dataset):
    print("Processing video #", i)
    new_datasets.append((frames, labels))  # keep the original video
    frames_np = frames.numpy()
    for j in range(num_augmentations):
      video_aug_tensor = tf.convert_to_tensor(seq1(frames_np))
      new_datasets.append((video_aug_tensor, labels))
  return new_datasets
Now we create 30 augmented copies of each video in train_ds:
num_augmentations = 30
aug_train_ds = augmenter(train_ds, num_augmentations=num_augmentations)
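Because ‘augmenter’ keeps every original and augmented clip in a Python list, memory grows with the dataset: N input videos produce N × (1 + num_augmentations) clips, so with 30 augmentations each video becomes 31 entries. If that becomes a problem, one option is a generator that yields clips on demand. The sketch below is hypothetical: `augment_lazily` and its `augment` argument are illustrative names, and `augment` could be a vidaug pipeline such as seq1 applied to a NumPy clip.

```python
from typing import Callable, Iterable, Iterator, Tuple
import numpy as np

Clip = np.ndarray  # assumed shape: (num_frames, height, width, channels)

def augment_lazily(
    dataset: Iterable[Tuple[Clip, object]],
    augment: Callable[[Clip], Clip],
    num_augmentations: int = 1,
) -> Iterator[Tuple[Clip, object]]:
    """Yield each original (clip, label) pair followed by its augmented copies.

    Produces the same sequence as building a list, but only one clip is
    augmented at a time instead of holding everything in memory.
    """
    for clip, label in dataset:
        yield clip, label
        for _ in range(num_augmentations):
            # np.asarray turns a list of frames into a single array.
            yield np.asarray(augment(clip)), label

# Usage with a toy dataset of 2 clips and a deterministic horizontal flip.
toy = [(np.zeros((4, 8, 8, 3)), 0), (np.ones((4, 8, 8, 3)), 1)]
flip = lambda clip: [frame[:, ::-1, :] for frame in clip]
out = list(augment_lazily(toy, flip, num_augmentations=3))
print(len(out))  # 8 == 2 * (1 + 3)
```

In TensorFlow, a generator like this could be wrapped with `tf.data.Dataset.from_generator` (supplying an `output_signature`) to feed training directly.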
To inspect the resulting augmentations at a glance, we create a function that plots them in a grid:
import math
import matplotlib.pyplot as plt

def plot_videos_grid(dataset, cols=5):
    num_videos = len(dataset)  # `dataset` is the list returned by augmenter()
    print('There are %s videos in this dataset' % num_videos)
    print('Plotting below a frame from the middle of each video')
    rows = math.ceil(num_videos / cols)
    # squeeze=False keeps `axes` 2-D even when there is a single row.
    fig, axes = plt.subplots(rows, cols, figsize=(12, 12), squeeze=False)
    for ax in axes.flat:
        ax.axis('off')  # also hides the empty cells in the last row
    for i, (frames, labels) in enumerate(dataset):
        frame_np = frames.numpy()[len(frames) // 2]  # middle frame
        ax = axes[i // cols, i % cols]
        ax.imshow(frame_np)
    plt.tight_layout()
    plt.show()
And apply it to the augmented dataset:
plot_videos_grid(aug_train_ds)
Et voilà: a bunch of augmented Pablos for you!
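If you want a single image to save or log rather than a Matplotlib figure, the same idea as ‘plot_videos_grid’ (take the middle frame of each clip and arrange them five per row) can be expressed in plain NumPy. This is a hypothetical helper, `middle_frame_montage`, assuming all clips share the same frame size; empty grid cells are left black.

```python
import math
import numpy as np

def middle_frame_montage(clips, cols=5):
    """Tile the middle frame of each clip into one (rows*H, cols*W, C) image."""
    middles = [np.asarray(clip)[len(clip) // 2] for clip in clips]
    height, width, channels = middles[0].shape
    rows = math.ceil(len(middles) / cols)
    montage = np.zeros((rows * height, cols * width, channels), dtype=middles[0].dtype)
    for i, frame in enumerate(middles):
        row, col = divmod(i, cols)
        montage[row * height:(row + 1) * height, col * width:(col + 1) * width] = frame
    return montage

# 7 toy clips of 6 frames each, 16x16 RGB -> a 2 x 5 grid.
clips = [np.full((6, 16, 16, 3), k / 10) for k in range(7)]
print(middle_frame_montage(clips).shape)  # (32, 80, 3)
```

The result can be shown with `plt.imshow` or written to disk, for example with `imageio.imwrite` after converting it to uint8.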
You can, of course, play with the parameters to get more extreme augmentations. For example, with a rotation range of 90 degrees, ‘va.RandomRotate(degrees=90)’, you can get Pablos in all orientations.
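One property worth keeping when rotating videos is that every frame of a clip is rotated by the same angle; rotating frames independently would make the content jitter from frame to frame. The sketch below illustrates this with `np.rot90`, so it only covers multiples of 90 degrees. That is a simplification and not how vidaug's ‘RandomRotate’ works (as we understand it, it samples an arbitrary angle within the range set by `degrees`). `random_rot90_clip` is a hypothetical name.

```python
import numpy as np

def random_rot90_clip(clip: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Rotate all frames of a (frames, H, W, C) clip by the same random multiple of 90 degrees."""
    k = int(rng.integers(0, 4))  # 0, 90, 180 or 270 degrees, drawn once per clip
    # axes=(1, 2) rotates the height/width plane, so every frame gets the same rotation.
    return np.rot90(clip, k=k, axes=(1, 2))

rng = np.random.default_rng(42)
clip = np.random.default_rng(0).random((5, 24, 32, 3))
rotated = random_rot90_clip(clip, rng)
print(rotated.shape)
```

For square frames, as in train_ds, the output shape is the same whatever rotation is drawn.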
OK, that’s enough Pablos for today. I hope you found this useful!

About Rosan International

ROSAN is a technology company specialized in the development of Data Science and Artificial Intelligence solutions with the aim of solving the most challenging global projects. Contact us to discover how we can help you gain valuable insights from your data and optimize your processes.