Practical Transfer Learning for Intelligent Robot Programming

I worked on this project during the summer of 2020 (my third year) under the guidance of Prof. David Touretzky at Carnegie Mellon University. The internship was conducted remotely because of COVID-19.

In this project we develop a Transfer Learning Tool for intelligent robot programming. Robots need to do more than classify images; they need to perceive objects situated in their environment. This requires both recognizing the object and estimating its pose (location and orientation) so that the robot can approach or avoid it.
The project uses the Cozmo robot along with the cozmo-tools programming framework created by the PI. Cozmo’s low-resolution camera provides 320×240 grayscale or 160×240 color images. We also expect to support additional robots as part of the effort to port cozmo-tools to new platforms.
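
As a concrete illustration, here is a minimal sketch (an assumption about the pipeline, not the project's actual code) of preparing one of these low-resolution camera frames for a backbone pretrained on ImageNet; the file name is hypothetical:

```python
# Minimal sketch: replicate the grayscale channel to RGB, resize to the
# backbone's expected input, and normalize with ImageNet statistics.
from PIL import Image
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # gray frame -> 3 channels
    transforms.Resize((224, 224)),                # upsample to backbone input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

frame = Image.open("cozmo_frame.png")   # hypothetical saved camera frame
batch = preprocess(frame).unsqueeze(0)  # shape: (1, 3, 224, 224)
```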

The Transfer Learning Tool has several components. The first is a training module that allows novice robot programmers to train an image classifier via transfer learning. The second is an analysis component, similar to the Personal Image Classifier, that lets users see how each of their training and test examples is being classified, and add or remove examples to improve classifier performance. The third is a “wizard” that guides users in constructing effective training sets. The novel idea here is that robots interact differently with different types of objects, so a wizard that understands the nature of an object can coach the user to collect training images that vary appropriately for that use case.
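
The transfer-learning step performed by the training module could look like the following sketch, under the assumption of a torchvision ResNet-18 backbone (the project's actual architecture may differ). Only the new final layer is trained, which is what makes learning from a handful of samples feasible:

```python
# Freeze a pretrained backbone and train only a new classification head.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 4  # hypothetical: number of object classes the user defined

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False          # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimization step on a small batch of user-collected examples."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```
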
For my task, I considered “placard” objects that a robot may need to interact with. Placards are two-dimensional patterns appearing on walls that can serve as landmarks for localization and navigation. They are visible from only one side of the wall, and their orientation in the environment is fixed. Future work will consider “flat manipulables” and “containers”.
The wizard can guide the user to collect an effective training set by explaining how to vary the robot’s position, or perhaps by moving the robot in cooperation with the user to systematically vary its distance and bearing from the object as images are collected.
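
Systematic variation could be implemented along the lines of this hypothetical sketch, which enumerates a grid of distances and bearings around a placard and converts each to a robot pose facing the object; drive_to_pose and capture_image stand in for whatever motion and camera primitives the robot framework provides, and are not actual cozmo-tools APIs:

```python
# Enumerate capture poses around an object placed at the origin.
import math

def capture_poses(distances_mm=(150, 250, 400), bearings_deg=(-45, -20, 0, 20, 45)):
    """Yield (x, y, heading) poses at each distance/bearing combination."""
    for d in distances_mm:
        for b in bearings_deg:
            theta = math.radians(b)
            x, y = d * math.cos(theta), d * math.sin(theta)
            heading = math.degrees(math.atan2(-y, -x))  # turn to face the object
            yield x, y, heading

for pose in capture_poses():
    print("capture image at pose:", pose)  # replace with drive_to_pose + capture_image
```
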
Robustness is a major concern. Teachable Machine is a “proof of concept” demonstration and is not expected to produce robust classifiers, but for robot programming we want classifiers that can handle variations in backgrounds, lighting conditions, and other factors. We investigated the robustness of our approach and considered techniques for improving accuracy by artificially expanding the training set.
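
One common way to do this expansion, shown here as a sketch rather than the project's exact transformation set, is random augmentation at training time. Horizontal flips are deliberately omitted because a placard's orientation in the environment is fixed:

```python
# Augmentations approximating viewpoint, lighting, and camera-tilt variation.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.3, p=0.5),  # viewpoint change
    transforms.ColorJitter(brightness=0.4, contrast=0.4),       # lighting change
    transforms.RandomRotation(degrees=10),                      # slight camera tilt
    transforms.ToTensor(),
])
```
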
A deep network developed for image classification is not necessarily well suited for pose estimation, but it may be possible to extract crude pose estimates with enough information to accomplish the user’s intent. This is a question we investigated. An alternative approach we considered is to train specialized pose estimators.
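
A minimal sketch of the specialized-pose-estimator alternative, under the assumption of a shared pretrained backbone with two heads: one classifying the object and one regressing a crude (distance, bearing) pose. The parameterization is illustrative, not the project's final design:

```python
# Shared backbone with classification and pose-regression heads.
import torch
import torch.nn as nn
from torchvision import models

class ClassifyAndPose(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = models.resnet18(pretrained=True)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.classify = nn.Linear(512, num_classes)  # object identity
        self.pose = nn.Linear(512, 2)                # (distance, bearing)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.classify(f), self.pose(f)

model = ClassifyAndPose(num_classes=4)
logits, pose = model(torch.randn(1, 3, 224, 224))
```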

Summary of my tasks:

- Employed transfer learning to enable a mobile robot with a low-resolution camera to recognize objects from a small number of samples, and guided users in constructing effective training sets
- Applied an extensive set of image transformations to achieve reasonable recognition rates for 2D patterns
- Studied the minimal number and type of training samples needed for good performance on a highly varied test set spanning multiple orientations and viewing angles

You can find my PyTorch code attached here.