Embodiments for generating appropriate data sets for learning to identify user actions. A user uses one or more applications over a suitable period of time. As the user uses the applications, a monitoring device, acting as a “man-in-the-middle,” intermediates the exchange of encrypted communication between the applications and the servers that serve the applications. The monitoring device obtains, for each action performed by the user, two corresponding (bidirectional) flows of communication: an encrypted flow, and an unencrypted flow. Since the unencrypted flow indicates the type of action that was performed by the user, the correspondence between the encrypted flow and the unencrypted flow may be used to automatically label the encrypted flow, without decrypting the encrypted flow. Features of the encrypted communication may then be stored in association with the label to automatically generate appropriately-sized learning set for each application of interest.