An image processing device comprises: a rough region extraction unit that extracts a rough region of a human bone portion from a cross-sectional image; an inner voxel extraction unit that extracts, for each cross-sectional image, the voxels lying just inside the outermost-periphery voxels of the rough region extracted by the rough region extraction unit; a density calculation unit that calculates a density value of a target object from the density values of the voxels extracted by the inner voxel extraction unit; and a threshold determination unit that determines, from the calculated density value of the target object, a threshold value used for extracting the contour of the target object. The image processing device can thereby extract the contour of the target object precisely from each cross-sectional image.
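
The four units described above can be illustrated with a minimal sketch for a single cross-sectional image. It is not the claimed implementation. Several details are assumptions made only for illustration: the rough region is obtained with a loose fixed threshold (`rough_threshold`), "outermost periphery" is interpreted with 4-connectivity, the density of the target object is taken as the median of the inner voxels, and the contour threshold is placed at a fixed `ratio` between a `background` density and that object density. All function and parameter names are hypothetical.

```python
from statistics import median

# 4-connected neighbour offsets within one cross-sectional image (assumption).
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def rough_region(slice_, rough_threshold):
    """Rough region: every voxel at or above a deliberately loose threshold."""
    return {
        (r, c)
        for r, row in enumerate(slice_)
        for c, value in enumerate(row)
        if value >= rough_threshold
    }


def outermost(region):
    """Voxels of `region` that have at least one 4-neighbour outside it."""
    return {
        (r, c)
        for (r, c) in region
        if any((r + dr, c + dc) not in region for dr, dc in NEIGHBOURS)
    }


def inner_voxels(region):
    """Voxels just inside the outermost periphery of `region`."""
    peeled = region - outermost(region)  # drop the outermost layer
    return outermost(peeled)             # periphery of what remains


def contour_threshold(slice_, rough_threshold, background=0.0, ratio=0.5):
    """Threshold placed between the background and the object's density."""
    inner = inner_voxels(rough_region(slice_, rough_threshold))
    if not inner:
        return None  # region too thin to have an inner layer
    density = median(slice_[r][c] for r, c in inner)
    return background + ratio * (density - background)


# Synthetic 7x7 slice: a bright object whose edge voxels are dimmer.
example_slice = [
    [0,  0,   0,   0,   0,  0, 0],
    [0, 40,  60,  60,  60, 40, 0],
    [0, 60, 100, 100, 100, 60, 0],
    [0, 60, 100, 100, 100, 60, 0],
    [0, 60, 100, 100, 100, 60, 0],
    [0, 40,  60,  60,  60, 40, 0],
    [0,  0,   0,   0,   0,  0, 0],
]

example_threshold = contour_threshold(example_slice, rough_threshold=30)
print(example_threshold)  # 50.0
```

In this synthetic slice the outermost layer of the rough region holds the dimmer edge values (40 and 60), so sampling only the layer just inside it yields a density of 100 and a threshold of 50. Calling `contour_threshold` once per slice gives a separate threshold for each cross-sectional image.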