The present disclosure provides a system and a method for inputting a gesture in a 3D scene. The system comprises: a gesture acquiring unit configured to simultaneously acquire, in real time and from different angles, at least two channels of video stream data of a user's gesture; a gesture recognizing unit configured to recognize, from the at least two channels of video stream data, a gesture shape that varies in real time; a gesture analyzing unit configured to analyze the gesture shape varying in real time to obtain a corresponding gesture motion; and a gesture displaying unit configured to process the gesture motion into a 3D image and display the 3D image in the 3D scene in real time. The technical solutions of the present disclosure can display the user's real gesture in the 3D scene, thereby enhancing the realism of the system and improving the user experience.
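By way of illustration only, the data flow among the four units described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: every name in it (`Frame`, `GestureShape`, `GestureMotion`, `acquire`, `recognize`, `analyze`, `run_pipeline`) is hypothetical, a frame is reduced to a short tuple of numbers, and the recognition and analysis steps are placeholders standing in for whatever algorithms an actual embodiment uses. The displaying unit is represented by a caller-supplied callback.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# Hypothetical stand-in for one video frame: a short tuple of numbers.
Frame = Tuple[float, ...]


@dataclass
class GestureShape:
    """Hypothetical gesture shape recognized at one instant."""
    features: Tuple[float, ...]


@dataclass
class GestureMotion:
    """Hypothetical gesture motion: change between two successive shapes."""
    delta: Tuple[float, ...]


def acquire(channels: Sequence[Iterable[Frame]]) -> Iterator[Tuple[Frame, ...]]:
    """Acquiring unit: pair up frames so each step has one frame per channel."""
    if len(channels) < 2:
        raise ValueError("at least two channels of video stream data are required")
    return zip(*channels)


def recognize(frames: Tuple[Frame, ...]) -> GestureShape:
    """Recognizing unit (placeholder): merge the per-channel data into one shape."""
    return GestureShape(tuple(value for frame in frames for value in frame))


def analyze(previous: GestureShape, current: GestureShape) -> GestureMotion:
    """Analyzing unit (placeholder): motion as the element-wise shape difference."""
    return GestureMotion(
        tuple(c - p for p, c in zip(previous.features, current.features))
    )


def run_pipeline(
    channels: Sequence[Iterable[Frame]],
    display: Callable[[GestureMotion], None],
) -> None:
    """Pass each set of simultaneous frames through all four units in order."""
    previous = None
    for frames in acquire(channels):
        shape = recognize(frames)
        if previous is not None:
            # Displaying unit: the callback would render the motion in the 3D scene.
            display(analyze(previous, shape))
        previous = shape


if __name__ == "__main__":
    # Two hypothetical channels, two frames each.
    left = [(0.0, 0.0), (1.0, 0.0)]
    right = [(0.0, 1.0), (1.0, 2.0)]
    run_pipeline([left, right], print)
```

In this sketch the units are kept as separate functions so that any one of them could be replaced independently, mirroring the separation of units in the system described above.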