Measuring informative content of words in documents in a document collection relative to a probability function including a concavity control parameter
Processing methods and systems are provided for representing documents according to the importance of the words they contain. A processor includes a weighting model that measures the importance of a word in a document of a collection relative to the importance of that word in the other documents of the collection. The processor computes the deviation of the word's distribution in the document from the probability distribution of the word in the other documents of the collection, and the deviation is weighted in accordance with a concavity control function. A concavity control parameter is adjustable relative to word frequency.
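The abstract does not state the weighting formula, so the following is only an illustrative sketch of how such a weighting might be computed, not the claimed method. It makes several assumptions that are not in the text above: the word's behavior in the rest of the collection is modeled by a log-logistic tail probability driven by document frequency, the deviation is measured as the negative logarithm of that probability, and the concavity control function is taken to be a generalized q-logarithm whose parameter `q` plays the role of the concavity control parameter. The names `q_log`, `word_weight`, `q`, and `c` are hypothetical.

```python
import math


def q_log(x, q):
    """Generalized logarithm, assumed here as the concavity control function.

    Reduces to the natural logarithm when q == 1.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def word_weight(tf, doc_len, avg_doc_len, doc_freq, n_docs, q=1.0, c=1.0):
    """Illustrative informative-content weight of one word in one document.

    tf          -- occurrences of the word in the document
    doc_len     -- length of the document in words
    avg_doc_len -- average document length in the collection
    doc_freq    -- number of documents containing the word
    n_docs      -- number of documents in the collection
    q           -- assumed concavity control parameter
    c           -- assumed length-normalization constant
    """
    # Length-normalized frequency of the word in this document.
    t = tf * math.log(1.0 + c * avg_doc_len / doc_len)
    # Collection-level parameter: the fraction of documents with the word.
    lam = doc_freq / n_docs
    # Assumed log-logistic tail: probability of a frequency at least t.
    p_tail = lam / (lam + t)
    # Deviation from the collection behavior, shaped by the q-logarithm.
    return -q_log(p_tail, q)


if __name__ == "__main__":
    rare = word_weight(tf=3, doc_len=100, avg_doc_len=100, doc_freq=5, n_docs=1000)
    common = word_weight(tf=3, doc_len=100, avg_doc_len=100, doc_freq=500, n_docs=1000)
    print(rare > common)
```

Under these assumptions, a word that is rare in the collection receives a larger weight than a common word at the same in-document frequency, and values of `q` below 1 flatten the growth of the weight. Making `q` depend on word frequency, as the abstract suggests for the concavity control parameter, would amount to passing a per-word value of `q`.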