Today’s general-purpose deep convolutional neural networks (CNNs) for image classification and object detection are trained offline on large static datasets. As a result, these CNNs are poorly equipped to adapt to new situations and remain vulnerable to exploits that target their pre-trained behaviors. Adaptable, rapidly trained CNNs will require training in real time on live video streams, such as training an image classifier onboard an unmanned aerial system (UAS) while in flight. We refer to this class of problem as Time-ordered Online Training (ToOT). Such streams deliver an immense volume of incoming data, and labeling every frame for training would quickly overwhelm a human operator.
We demonstrate and evaluate a system tailored to performing ToOT in the field, capable of training an image classifier on a live video stream with minimal input from a human operator. We show that by exploiting the time-ordered nature of the video stream through optical flow-based object tracking, we can increase the effectiveness, i.e., the training benefit, of each human action by more than 8 times.
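To make the idea concrete, the sketch below illustrates one way such label propagation could work; it is not the system described here, only a minimal example using OpenCV's Lucas-Kanade optical flow. A single human-provided bounding box is tracked forward through subsequent frames, and each tracked frame yields an additional labeled training crop. The function name, box format, and parameters are illustrative assumptions.

```python
import cv2
import numpy as np

def propagate_label(video_path, init_box, label, max_frames=100):
    """Track init_box = (x, y, w, h) from the first frame and yield
    (crop, label) training examples for subsequent frames."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    x, y, w, h = init_box
    # Seed feature points inside the human-labeled region only.
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=5, mask=mask)
    if pts is None:
        return

    for _ in range(max_frames):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        keep = status.flatten() == 1
        good = new_pts[keep]
        if len(good) < 5:
            break  # track lost; stop propagating this label
        # Shift the box by the median displacement of the tracked points.
        dx, dy = np.median(good - pts[keep], axis=0).ravel()
        x, y = int(x + dx), int(y + dy)
        crop = frame[max(y, 0):y + h, max(x, 0):x + w]
        if crop.size:
            yield crop, label  # one extra training example per frame
        prev_gray, pts = gray, good.reshape(-1, 1, 2)

    cap.release()
```

In a sketch like this, a single labeling action produces many training examples, one per successfully tracked frame, which is the mechanism by which the per-action training benefit is multiplied.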
Of course, optical flow-based object tracking is just one way to reduce human effort when training neural networks. This work sets the stage for future investigation into how we both extract information from and select training images so as to obtain the best resulting model with the least human input.