Deep learning with neural networks depends on large amounts of annotated training data. For developing robotic visuomotor skills in complex environments, generating suitable training data is time-consuming and requires accurate robot models. Deep reinforcement learning alleviates this challenge by letting robots learn through trial and error without supervision, but at the cost of long training times. In contrast, we present an approach for acquiring visuomotor grasping skills through fast self-learning: the robot generates suitable training data by interacting with the environment using its initial motor abilities. Supervised end-to-end learning of visuomotor skills is realized with a deep convolutional neural architecture that combines two key subtasks of grasping: object localization and inverse kinematics.