Systems, methods, and non-transitory computer-readable media can initiate a video capture mode that provides a camera view. A touch gesture can be detected via a touch display. A drawing can be rendered based on the touch gesture such that the drawing appears to overlay the camera view. A first video image frame can be acquired based on the camera view. At least a portion of the first video image frame can be combined with the drawing to produce a first combined frame in which the drawing appears to overlay the first video image frame. The first combined frame can be stored in a video buffer.
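
The rendering, combining, and buffering steps described above can be illustrated with a minimal sketch. It is not taken from the disclosure: the names `render_drawing`, `blend`, `combine_frame`, and `capture_step` are hypothetical, frames are modeled as nested lists of RGB tuples, the drawing is modeled as a transparent RGBA layer with one painted pixel per touch sample, and ordinary "source-over" alpha blending is assumed as one possible way to make the drawing appear to overlay the frame.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]
Frame = List[List[RGB]]      # rows of camera pixels
Drawing = List[List[RGBA]]   # rows of drawing pixels; alpha 0 means transparent


def render_drawing(width: int, height: int,
                   touch_points: Iterable[Tuple[int, int]],
                   color: RGBA = (255, 0, 0, 255)) -> Drawing:
    """Paint one pixel per (x, y) touch sample onto a transparent layer."""
    layer: Drawing = [[(0, 0, 0, 0)] * width for _ in range(height)]
    for x, y in touch_points:
        if 0 <= x < width and 0 <= y < height:  # ignore samples off the display
            layer[y][x] = color
    return layer


def blend(frame_px: RGB, draw_px: RGBA) -> RGB:
    """Source-over blend of one drawing pixel onto one frame pixel (assumed method)."""
    r, g, b, a = draw_px
    return tuple((d * a + f * (255 - a) + 127) // 255
                 for d, f in zip((r, g, b), frame_px))


def combine_frame(frame: Frame, drawing: Drawing) -> Frame:
    """Produce a combined frame in which the drawing overlays the camera frame."""
    if len(frame) != len(drawing) or any(
            len(fr) != len(dr) for fr, dr in zip(frame, drawing)):
        raise ValueError("frame and drawing must have the same dimensions")
    return [[blend(f, d) for f, d in zip(frame_row, drawing_row)]
            for frame_row, drawing_row in zip(frame, drawing)]


def capture_step(video_buffer: Deque[Frame], frame: Frame,
                 drawing: Drawing) -> Frame:
    """Combine one acquired frame with the drawing and store the result."""
    combined = combine_frame(frame, drawing)
    video_buffer.append(combined)
    return combined
```

In this sketch the video buffer could be a bounded `deque`, for example `deque(maxlen=300)`, so that the oldest combined frames are discarded once the capacity is reached. The capacity value is an arbitrary assumption, and a real device would more likely composite on the GPU or hand frames to a hardware encoder.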