Graph self-supervised representation learning has gained considerable attention and demonstrated remarkable efficacy in extracting meaningful representations from graphs, particularly in the absence of labeled data. Two representative families of methods in this domain are graph auto-encoding and graph contrastive learning. However, the former primarily focuses on global structures and may overlook fine-grained information during reconstruction, while the latter emphasizes node similarity across correlated views in the embedding space and may neglect the inherent global graph information in the original input space. Moreover, incomplete graphs in real-world scenarios, where original features are unavailable for certain nodes, pose challenges for both families. To alleviate these limitations, we integrate masked graph auto-encoding and prototype-aware graph contrastive learning into a unified model for learning node representations. In our method, we first mask a portion of node features and employ a specific decoding strategy to reconstruct the masked information. This process recovers graphs at a global or macro level and naturally accommodates incomplete graphs. We then treat the masked graph and the original one as a pair of contrasting views, enforcing alignment and uniformity between their corresponding node representations at a local or micro level. Finally, to capture cluster structures at a meso level and learn more discriminative representations, we introduce a prototype-aware clustering consistency loss that is jointly optimized with the two preceding complementary objectives. Extensive experiments conducted on several datasets demonstrate that the proposed method achieves significantly better or competitive performance on downstream tasks, especially graph clustering, compared with state-of-the-art methods, showcasing its superiority in enhancing graph representation learning.
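One natural way to combine the three complementary objectives, sketched here only as an illustration since the abstract does not state the formula, is a weighted sum over the macro, micro, and meso levels, where the trade-off coefficients $\lambda_1$ and $\lambda_2$ are hypothetical:

\[
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{rec}} \;+\; \lambda_1\,\mathcal{L}_{\mathrm{con}} \;+\; \lambda_2\,\mathcal{L}_{\mathrm{proto}},
\]

where $\mathcal{L}_{\mathrm{rec}}$ denotes the masked-feature reconstruction loss (macro level), $\mathcal{L}_{\mathrm{con}}$ the contrastive alignment-and-uniformity loss between the masked and original views (micro level), and $\mathcal{L}_{\mathrm{proto}}$ the prototype-aware clustering consistency loss (meso level).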