IDFModel
class pyspark.mllib.feature.IDFModel(java_model: py4j.java_gateway.JavaObject)
Represents an IDF model that can transform term frequency vectors.
New in version 1.2.0.
Methods
call(name, *a)    Call method of java_model.
docFreq()         Returns the document frequency.
idf()             Returns the current IDF vector.
numDocs()         Returns the number of documents evaluated to compute the IDF.
transform(x)      Transforms term frequency (TF) vectors to TF-IDF vectors.
Methods Documentation
call(name: str, *a: Any) → Any
Call method of java_model.
idf() → pyspark.mllib.linalg.Vector
Returns the current IDF vector.
New in version 1.4.0.
transform(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]]
Transforms term frequency (TF) vectors to TF-IDF vectors.
If minDocFreq was set for the IDF calculation, terms that occur in fewer than minDocFreq documents have an entry of 0.
New in version 1.2.0.
Parameters
x : pyspark.mllib.linalg.Vector or pyspark.RDD
    an RDD of term frequency vectors or a term frequency vector
Returns
pyspark.mllib.linalg.Vector or pyspark.RDD
    an RDD of TF-IDF vectors or a TF-IDF vector
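The effect of minDocFreq on the transformed vector can be sketched in plain Python, without Spark. This is an illustration only, not the library's implementation: it assumes the smoothed formula log((numDocs + 1) / (docFreq + 1)) given in the Spark MLlib feature extraction guide, and the helper names idf_vector and tf_idf are hypothetical.

```python
import math


def idf_vector(doc_freq, num_docs, min_doc_freq=0):
    """Hypothetical helper: per-term IDF weights.

    Assumes the smoothed formula log((num_docs + 1) / (df + 1)).
    Terms seen in fewer than min_doc_freq documents get weight 0.
    """
    return [
        math.log((num_docs + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]


def tf_idf(tf, idf):
    """Hypothetical helper: element-wise product of a TF vector and the IDF weights."""
    return [t * w for t, w in zip(tf, idf)]


# Made-up corpus statistics: 3 documents, 4 terms.
doc_freq = [0, 3, 1, 2]
num_docs = 3

idf = idf_vector(doc_freq, num_docs, min_doc_freq=2)
weights = tf_idf([0.0, 1.0, 2.0, 3.0], idf)
print(weights)
```

Term 2 appears in only one document, so with min_doc_freq=2 its entry is 0 regardless of its term frequency. Term 1 appears in every document, so its weight is also 0 because log(4 / 4) = 0.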
Notes
In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.