In this paper, we propose a command processing mechanism for an autonomous arm manipulator that uses real-time speech and images. We present a novel two-stage speech recognition algorithm that applies two-fold Dynamic Time Warping (DTW) in each stage: real-time wake-up word recognition is followed by offline command recognition using k-means. Because high precision is paramount in any mechanism that activates a control system, a restrictive threshold is set to achieve a precision of 1, which alleviates the problem of accidental triggering. Object recognition and classification are performed by matching features produced by a local feature detector and descriptor. The algorithm controls an arm manipulator with 5 degrees of freedom.
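To illustrate how a DTW score combined with a restrictive threshold can gate activation, the sketch below computes classic DTW between two sequences of feature frames. It accepts an utterance as the wake-up word only if its distance to the closest stored template falls under a tight threshold. This is a minimal sketch under stated assumptions. It shows a single plain DTW comparison, not the paper's two-fold variant or its k-means command stage, whose details the abstract does not give. The frame features, the templates, and the helper names `dtw_distance` and `is_wake_word` are hypothetical.

```python
import math
from typing import Sequence

Frame = Sequence[float]  # one feature vector per audio frame (assumed representation)


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Classic DTW: minimal cumulative Euclidean frame distance over monotonic alignments."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def is_wake_word(utterance: Sequence[Frame],
                 templates: Sequence[Sequence[Frame]],
                 threshold: float) -> bool:
    """Accept only if the closest template lies within a deliberately tight threshold.

    A smaller threshold trades recall for precision, which reduces accidental triggering.
    """
    best = min(dtw_distance(utterance, t) for t in templates)
    return best <= threshold


if __name__ == "__main__":
    templates = [[[0.0], [1.0], [2.0]]]  # hypothetical 1-D wake-word template
    print(is_wake_word([[0.0], [0.0], [1.0], [2.0]], templates, threshold=0.5))  # time-stretched match
    print(is_wake_word([[5.0], [5.0]], templates, threshold=0.5))  # far from the template
```

The first call prints `True` because DTW absorbs the repeated initial frame at zero cost. The second prints `False` because its distance (12.0) far exceeds the threshold. Choosing the threshold strictly enough that no non-wake-word utterance passes is one way to obtain the precision-of-1 behaviour the abstract describes.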