An apparatus is provided for audibly reading text retrieved from a captured image. In one implementation, the apparatus comprises an image sensor configured to capture image data from an environment of a user, and at least one processor. The processor is configured to detect a pointing trigger in the image data, the trigger being associated with the user's desire to hear text read aloud and identifying an intermediate portion of the text located a distance from a level break in the text. The processor is further configured to perform a layout analysis on the text to identify the level break associated with the trigger, and to cause the text to be read aloud beginning from that level break.
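
By way of illustration only, the step of locating the level break associated with the trigger may be sketched as follows. The sketch assumes the text has already been recognized as a plain string and that the pointing trigger has already been mapped to a character index in that string; image capture, trigger detection, and speech output are not shown. The function names, the `level` parameter, and the choice of sentence and paragraph boundaries as level breaks are hypothetical and are not taken from the description above.

```python
import re

# Assumed definitions of a "level break" (illustrative only):
# a paragraph break is a blank line; a sentence break is terminal
# punctuation followed by whitespace.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")
SENTENCE_BREAK = re.compile(r"[.!?]\s+")


def find_level_break(text: str, pointed_index: int, level: str = "sentence") -> int:
    """Return the index of the nearest level break at or before pointed_index.

    pointed_index is the character offset that the pointing trigger is
    assumed to identify, somewhere in the middle of a sentence or paragraph.
    """
    if not 0 <= pointed_index < len(text):
        raise ValueError("pointed_index must fall inside the text")
    pattern = PARAGRAPH_BREAK if level == "paragraph" else SENTENCE_BREAK
    start = 0  # the beginning of the text is treated as a level break
    for match in pattern.finditer(text):
        if match.end() > pointed_index:
            break  # this break lies after the pointed position
        start = match.end()
    return start


def text_to_read(text: str, pointed_index: int, level: str = "sentence") -> str:
    """Return the text to be read aloud, starting from the level break."""
    return text[find_level_break(text, pointed_index, level):]


sample = (
    "First sentence. Second sentence continues here. Third one.\n\n"
    "New paragraph starts. It goes on."
)
# The user is assumed to point at the word "continues".
pointed = sample.index("continues")
reading = text_to_read(sample, pointed, level="sentence")
```

In this sketch, pointing at a word in the middle of the second sentence causes reading to begin at the start of that sentence rather than at the pointed word. Passing `level="paragraph"` moves the starting point back to the start of the enclosing paragraph instead.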