We propose a weakly supervised approach to object localization based on top-down attention that can use both attributes and object classes as attentional cues. This makes it possible to search not only for objects of a given class but also for objects with specific attributes, such as a color or shape. Our approach consists of two streams: an attribute stream and an object stream. By tracing backward through these two streams and localizing activated neurons in hidden layers, we generate two top-down attention maps, one for attributes and one for objects. Fusing these maps yields a joint attention map, which highlights regions that match both the specified attribute and the specified object class. We show experimentally that our method can localize objects in cluttered images when only their attributes are specified, and that instances of the same class can be discriminated based on their attributes.
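The fusion step can be illustrated with a minimal sketch. The abstract does not state which fusion operator is used, so the element-wise product of min-max normalized maps below is an assumption chosen for illustration, and the names `normalize`, `fuse_attention`, and `peak_location` are hypothetical. The two input arrays are hand-made toy values standing in for the attribute and object attention maps, not outputs of the two streams.

```python
import numpy as np


def normalize(att):
    """Min-max scale an attention map to [0, 1]; a constant map becomes all zeros."""
    att = np.asarray(att, dtype=np.float64)
    span = att.max() - att.min()
    if span == 0:
        return np.zeros_like(att)
    return (att - att.min()) / span


def fuse_attention(attr_map, obj_map):
    """Assumed fusion rule: element-wise product of the two normalized maps.

    A location scores high only if both the attribute map and the
    object map respond there.
    """
    a = normalize(attr_map)
    o = normalize(obj_map)
    if a.shape != o.shape:
        raise ValueError("attention maps must have the same spatial size")
    return a * o


def peak_location(joint_map):
    """Return (row, col) of the strongest response in the joint map."""
    return np.unravel_index(np.argmax(joint_map), joint_map.shape)


# Toy 3x3 maps (illustrative values only).
attr_map = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.0, 0.8],
                     [0.0, 0.0, 0.1]])
obj_map = np.array([[0.0, 0.1, 0.0],
                    [0.7, 0.0, 0.9],
                    [0.0, 0.0, 0.2]])

joint = fuse_attention(attr_map, obj_map)
row, col = peak_location(joint)
print(joint.round(3))
print("peak at", (int(row), int(col)))
```

In this toy case the attribute map responds most strongly at the top-left cell and the object map responds at two cells in the middle row, but only the cell where both respond keeps a high value after fusion. Other fusion rules, such as a pixel-wise minimum or a weighted sum, would fit the same interface.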