An image processing apparatus, and an operation method and a program therefor, extract a target region having a predictable shape more accurately and more robustly. The image processing apparatus, which is configured to assign to each pixel in an image a binary label indicating whether or not the pixel belongs to a target region, includes: a shape setting unit configured to set a predicted shape of the target region; an energy function setting unit configured to select a pixel group including N pixels in the image, where N is a natural number of 4 or more, that have a positional relationship representing the set predicted shape, and to set an energy function including an N-th order term whose variables are the labels of the N pixels of the selected pixel group, such that a value of the N-th order term takes a minimum value when a combination of the labels assigned to the N pixels of the selected pixel group is a pattern matching the set predicted shape of the target region, and increases in stages from the minimum value as the number of pixels assigned a label different from the pattern increases; and a labeling unit configured to perform the labeling by minimizing the set energy function.
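The N-th order term and the labeling step can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the abstract: the function names, the per-pixel `unary` costs, the stage heights in `stage_costs`, the toy 6-pixel image, and the exhaustive search, which stands in for whatever minimization the labeling unit actually uses and is only feasible for very small images.

```python
from itertools import product

def nth_order_term(labels, pattern, stage_costs):
    """Cost of one pixel group: minimal when labels equal the pattern,
    rising in stages with the number of mismatching pixels."""
    mismatches = sum(1 for lab, pat in zip(labels, pattern) if lab != pat)
    # stage_costs[k] is the (hypothetical) cost for k mismatches;
    # counts beyond the table are clamped to the last stage.
    return stage_costs[min(mismatches, len(stage_costs) - 1)]

def energy(labeling, unary, groups, pattern, stage_costs):
    """Total energy: per-pixel data costs plus one N-th order term per group."""
    data = sum(unary[i][labeling[i]] for i in range(len(labeling)))
    shape = sum(
        nth_order_term([labeling[i] for i in group], pattern, stage_costs)
        for group in groups
    )
    return data + shape

def minimize_brute_force(unary, groups, pattern, stage_costs):
    """Exhaustively try every binary labeling (illustration only)."""
    return min(
        product((0, 1), repeat=len(unary)),
        key=lambda lab: energy(lab, unary, groups, pattern, stage_costs),
    )

# Toy 1-D "image" of 6 pixels: unary[i] = (cost of label 0, cost of label 1).
# Pixel 3 is noisy and slightly prefers the background label 0.
unary = [(0, 3), (3, 0), (3, 0), (0, 1), (3, 0), (0, 3)]
groups = [(1, 2, 3, 4)]          # one group of N = 4 pixels along the predicted shape
pattern = (1, 1, 1, 1)           # labels that match the predicted shape
stage_costs = [0, 2, 4, 6, 8]    # non-decreasing cost per number of mismatches

best = minimize_brute_force(unary, groups, pattern, stage_costs)
print(best, energy(best, unary, groups, pattern, stage_costs))
```

In this toy case the data costs alone would label pixel 3 as background, but one mismatch in the group costs more than overriding that pixel's weak preference, so the minimizer assigns it to the target region.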