public class Imputer extends Estimator<ImputerModel> implements ImputerParams, DefaultParamsWritable
Note that when an input column is of an integer type, the imputed value is cast (truncated) to that integer type. For example, if the input column is IntegerType (1, 2, 4, null), the output will be IntegerType (1, 2, 4, 2) after mean imputation, because the mean 7/3 ≈ 2.33 is truncated to 2.
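The truncation described in this note can be reproduced with a few lines of plain Java. This is only an illustrative sketch and not Spark's implementation: the class name MeanImputeSketch and the method imputeMean are hypothetical, and a boxed Integer array stands in for a nullable IntegerType column.

```java
import java.util.Arrays;
import java.util.Objects;

// Illustrative sketch only: plain Java, not Spark's implementation.
final class MeanImputeSketch {

    // Replaces null entries with the mean of the non-null entries.
    // The (int) cast truncates toward zero, mirroring the note above.
    static Integer[] imputeMean(Integer[] column) {
        double mean = Arrays.stream(column)
                .filter(Objects::nonNull)          // missing values are excluded from the mean
                .mapToInt(Integer::intValue)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no non-null values"));
        int fill = (int) mean;                     // 7 / 3 = 2.33... becomes 2
        Integer[] out = new Integer[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = (column[i] == null) ? fill : column[i];
        }
        return out;
    }

    public static void main(String[] args) {
        Integer[] result = imputeMean(new Integer[] {1, 2, 4, null});
        System.out.println(Arrays.toString(result)); // prints [1, 2, 4, 2]
    }
}
```

The fill value is computed once from the non-null entries and then reused for every missing slot, which is why the existing values 1, 2 and 4 are left untouched.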
Note that the mean/median value is computed after filtering out missing values. All null values in the input columns are treated as missing, and so are also imputed. To compute the median, DataFrameStatFunctions.approxQuantile is used with a relative error of 0.001 (the default value of relativeError).
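The "filter first, then compute the statistic" order can be sketched in plain Java as follows. This is a simplified stand-in and not Spark's code: it uses an exact sort instead of approxQuantile, it picks the lower middle element for even-length inputs (an assumption of this sketch; the approximate algorithm may return a different element), and the names MedianImputeSketch, isMissing and imputeMedian are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only: exact median in plain Java, not Spark's approxQuantile.
final class MedianImputeSketch {

    // A value is missing if it is null or equal to the placeholder.
    // Double.compare treats NaN as equal to NaN, unlike the == operator.
    static boolean isMissing(Double v, double missingValue) {
        return v == null || Double.compare(v, missingValue) == 0;
    }

    static double[] imputeMedian(Double[] column, double missingValue) {
        // Step 1: drop missing values before computing the statistic.
        double[] present = Arrays.stream(column)
                .filter(v -> !isMissing(v, missingValue))
                .mapToDouble(Double::doubleValue)
                .sorted()
                .toArray();
        if (present.length == 0) {
            throw new IllegalArgumentException("no non-missing values");
        }
        // Step 2: lower middle element of the sorted values (sketch assumption).
        double median = present[(present.length - 1) / 2];
        // Step 3: fill every missing slot with that value.
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = isMissing(column[i], missingValue) ? median : column[i];
        }
        return out;
    }

    public static void main(String[] args) {
        Double[] column = {1.0, null, 5.0, Double.NaN, 3.0};
        double[] result = imputeMedian(column, Double.NaN);
        System.out.println(Arrays.toString(result)); // prints [1.0, 3.0, 5.0, 3.0, 3.0]
    }
}
```

Here both the null entry and the NaN placeholder are excluded from the median and then replaced by it, matching the statement that null values are treated as missing alongside the missingValue placeholder.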
| Modifier and Type | Method and Description |
|---|---|
| Imputer | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
| ImputerModel | fit(Dataset<?> dataset): Fits a model to the input data. |
| Param<String> | inputCol(): Param for input column name. |
| StringArrayParam | inputCols(): Param for input column names. |
| static Imputer | load(String path) |
| DoubleParam | missingValue(): The placeholder for the missing values. |
| Param<String> | outputCol(): Param for output column name. |
| StringArrayParam | outputCols(): Param for output column names. |
| static MLReader<T> | read() |
| DoubleParam | relativeError(): Param for the relative target precision for the approximate quantile algorithm. |
| Imputer | setInputCol(String value) |
| Imputer | setInputCols(String[] value) |
| Imputer | setMissingValue(double value) |
| Imputer | setOutputCol(String value) |
| Imputer | setOutputCols(String[] value) |
| Imputer | setRelativeError(double value) |
| Imputer | setStrategy(String value): Imputation strategy. |
| Param<String> | strategy(): The imputation strategy. |
| StructType | transformSchema(StructType schema): Check transform validity and derive the output schema from the input schema. |
| String | uid(): An immutable unique ID for the object and its derivatives. |
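Methods inherited from class PipelineStage: params

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface ImputerParams: getInOutCols, getMissingValue, getStrategy, validateAndTransformSchema

Methods inherited from interface HasInputCol: getInputCol

Methods inherited from interface HasInputCols: getInputCols

Methods inherited from interface HasOutputCol: getOutputCol

Methods inherited from interface HasOutputCols: getOutputCols

Methods inherited from interface HasRelativeError: getRelativeError

Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface Identifiable: toString

Methods inherited from interface DefaultParamsWritable: write

Methods inherited from interface MLWritable: save

Methods inherited from interface Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize

public static Imputer load(String path)

public static MLReader<T> read()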

public final Param<String> strategy()
Specified by: strategy in interface ImputerParams

public final DoubleParam missingValue()
Specified by: missingValue in interface ImputerParams

public final DoubleParam relativeError()
Specified by: relativeError in interface HasRelativeError

public final StringArrayParam outputCols()
Specified by: outputCols in interface HasOutputCols

public final Param<String> outputCol()
Specified by: outputCol in interface HasOutputCol

public final StringArrayParam inputCols()
Specified by: inputCols in interface HasInputCols

public final Param<String> inputCol()
Specified by: inputCol in interface HasInputCol

public String uid()
Specified by: uid in interface Identifiable

public Imputer setInputCol(String value)

public Imputer setOutputCol(String value)

public Imputer setInputCols(String[] value)

public Imputer setOutputCols(String[] value)

public Imputer setStrategy(String value)
Parameters: value - (undocumented)

public Imputer setMissingValue(double value)

public Imputer setRelativeError(double value)

public ImputerModel fit(Dataset<?> dataset)
Specified by: fit in class Estimator<ImputerModel>
Parameters: dataset - (undocumented)

public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
Validity of interactions between parameters is checked during transformSchema, and an exception is raised if any parameter value is invalid. Parameter value checks that do not depend on other parameters are handled by Param.validate().
A typical implementation should first verify schema changes and parameter validity, including complex parameter interaction checks.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)

public Imputer copy(ParamMap extra)
Description copied from interface: Params
See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Estimator<ImputerModel>
Parameters: extra - (undocumented)
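
The transformSchema description separates checks on a single parameter value from checks on how parameters interact. The plain-Java sketch below illustrates that split under stated assumptions. It is not Spark's validation code: the class ParamInteractionSketch and both methods are hypothetical, the [0, 1] range for the relative error is assumed for illustration, and the specific cross-parameter rules (equal array lengths, distinct output names) are examples chosen for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: hypothetical checks, not Spark's validation logic.
final class ParamInteractionSketch {

    // Single-parameter check: depends on no other parameter, so it is the
    // kind of rule the text assigns to Param.validate().
    // The [0, 1] range is an assumption made for this sketch.
    static void checkRelativeError(double relativeError) {
        if (!(relativeError >= 0.0 && relativeError <= 1.0)) {
            throw new IllegalArgumentException("relativeError out of range: " + relativeError);
        }
    }

    // Cross-parameter check: compares two parameters against each other, so it
    // is the kind of rule the text assigns to transformSchema.
    static void checkColumnPairs(String[] inputCols, String[] outputCols) {
        if (inputCols.length != outputCols.length) {
            throw new IllegalArgumentException(
                    "inputCols has " + inputCols.length
                            + " entries but outputCols has " + outputCols.length);
        }
        Set<String> seen = new HashSet<>();
        for (String name : outputCols) {
            if (!seen.add(name)) {  // add returns false when the name was already present
                throw new IllegalArgumentException("duplicate output column: " + name);
            }
        }
    }

    public static void main(String[] args) {
        checkRelativeError(0.001);
        checkColumnPairs(new String[] {"a", "b"}, new String[] {"a_out", "b_out"});
        try {
            checkColumnPairs(new String[] {"a", "b"}, new String[] {"a_out"});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running both kinds of check before any data is touched means an invalid configuration fails early with a descriptive exception instead of partway through fitting.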