The invention relates to a method and apparatus for generating and displaying a 3D representation of at least a portion of an intraoral scene. The method includes determining 3D point cloud data representing a part of an intraoral scene in a point cloud coordinate space. A colour image of the same part of the intraoral scene is acquired in a camera coordinate space. Image elements of the colour image are labelled if they lie within a region of the image representing a surface of said intraoral scene that should preferably not be included in said 3D representation. Typically, the labelled image elements lie within a region having a colour or colour pattern corresponding to a surface colour or surface colour pattern of a utensil used intraorally when acquiring said 3D data and colour image. Alternatively, or in addition, image elements are labelled that lie within a region having a colour pattern corresponding to that of a tooth surface area comprising undesired stains or particles. Either before or after said labelling of the image elements, the colour image is, if necessary, transformed from the camera coordinate space to the point cloud coordinate space. The labelled and, where applicable, transformed colour image is then mapped onto the 3D point cloud data, and the 3D point cloud data points that map onto labelled colour image elements are removed or filtered out. Finally, a 3D representation is generated from said filtered 3D point cloud data, which does not include any of the surfaces represented by the labelled colour image elements.
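The label, map and filter steps described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that the point cloud and the colour image are already expressed in the same coordinate frame, that the mapping is a simple pinhole projection, and that labelling is a per-channel colour tolerance test against a single utensil colour. The function names (`label_image`, `project`, `filter_point_cloud`) and the parameters `focal`, `cx`, `cy` and `tol` are hypothetical and do not appear in the source.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in the shared coordinate frame
Pixel = Tuple[int, int, int]         # (R, G, B), 0..255


def label_image(image: List[List[Pixel]], target: Pixel, tol: int) -> List[List[bool]]:
    """Label image elements whose colour is within `tol` of `target` on every channel.

    Assumption: a single reference colour stands in for the utensil surface colour.
    """
    return [
        [all(abs(c - t) <= tol for c, t in zip(px, target)) for px in row]
        for row in image
    ]


def project(point: Point, focal: float, cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection to integer (column, row); None if the point is behind the camera.

    Assumption: an ideal pinhole model with no lens distortion.
    """
    x, y, z = point
    if z <= 0:
        return None
    return int(round(focal * x / z + cx)), int(round(focal * y / z + cy))


def filter_point_cloud(points: List[Point], mask: List[List[bool]],
                       focal: float, cx: float, cy: float) -> List[Point]:
    """Drop every point that maps onto a labelled image element; keep all others."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    kept = []
    for p in points:
        uv = project(p, focal, cx, cy)
        if uv is not None:
            u, v = uv
            if 0 <= v < height and 0 <= u < width and mask[v][u]:
                continue  # point lies on a labelled surface, so it is filtered out
        kept.append(p)
    return kept


if __name__ == "__main__":
    # 2x2 image: the top-left element has a hypothetical blue utensil colour.
    image = [[(0, 0, 255), (255, 255, 255)],
             [(255, 255, 255), (250, 250, 250)]]
    mask = label_image(image, target=(0, 0, 255), tol=30)
    cloud = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    print(filter_point_cloud(cloud, mask, focal=1.0, cx=0.0, cy=0.0))
```

In this sketch, points that project outside the image or lie behind the camera are kept, since no labelled image element applies to them. A 3D representation would then be generated from the returned, filtered points only.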