An apparatus of a multimedia telephony services over IMS (MTSI) based user equipment (UE) configurable for video Region-of-Interest (ROI) signaling and operation as an MTSI sender, the apparatus comprising:
    memory; and
    processing circuitry configured to:
        signal video ROI information for a first ROI of an MTSI receiver in Real-time Transport Protocol (RTP) packets, the RTP packets to include at least a zoom command to capture the first ROI, the first ROI being a requested ROI of the MTSI receiver;
        decode received RTP payload packets from the MTSI receiver, the RTP payload packets comprising video corresponding to the first ROI;
        receive signaling in RTP Control Protocol (RTCP) feedback reports from the MTSI receiver requesting a second ROI, the second ROI provided during a Session Description Protocol (SDP) capability negotiation, the second ROI being a predefined ROI of the MTSI sender; and
        encode video corresponding to the second ROI in RTP payload packets for transmission to the MTSI receiver.
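
As a rough illustration of the two kinds of ROI named in the claim (a requested ROI chosen by the MTSI receiver, and a predefined ROI offered by the MTSI sender during SDP capability negotiation), the following Python sketch shows how a sender might resolve either kind of request to a capture region. All names here (`RoiRequest`, `resolve_roi`, the `(x, y, width, height)` tuple layout, the integer ROI identifiers) are assumptions made for illustration only; they are not taken from the claim or from any MTSI specification, and no RTP, RTCP, or SDP wire format is modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical region layout: (x, y, width, height) in pixels.
Region = Tuple[int, int, int, int]


@dataclass(frozen=True)
class RoiRequest:
    """Hypothetical ROI request; exactly one field is expected to be set."""
    arbitrary: Optional[Region] = None     # region requested by the receiver
    predefined_id: Optional[int] = None    # identifier of a sender-defined ROI


def resolve_roi(request: RoiRequest, predefined: Dict[int, Region]) -> Region:
    """Map a request to the region the sender would capture and encode.

    `predefined` stands in for the ROIs the sender offered during
    capability negotiation (an assumption of this sketch).
    """
    if request.predefined_id is not None:
        if request.predefined_id not in predefined:
            raise ValueError("predefined ROI was not negotiated")
        return predefined[request.predefined_id]
    if request.arbitrary is not None:
        return request.arbitrary
    raise ValueError("request carries no ROI")


# Example: two predefined ROIs offered by the sender (illustrative values).
offered = {1: (0, 0, 640, 360), 2: (320, 180, 640, 360)}
first_roi = resolve_roi(RoiRequest(arbitrary=(100, 50, 200, 150)), offered)
second_roi = resolve_roi(RoiRequest(predefined_id=2), offered)
```

In this sketch the sender keeps the negotiated ROIs in a dictionary so that a later request only needs to carry an identifier, while a requested ROI carries its own coordinates.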