Class Utils
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors -
Method Summary
All members are static. Full signatures and descriptions appear under Method Details below; the methods fall broadly into the following groups:
- String and unit conversions: bytesToString, byteStringAsBytes, byteStringAsKb, byteStringAsMb, byteStringAsGb, memoryStringToMb, megabytesToString, timeStringAsMs, timeStringAsSeconds, msDurationToString, stringHalfWidth, weakIntern, stringToSeq, stackTraceToString
- File and directory utilities: createDirectory, createTempDir, deleteRecursively, doesDirectoryContainAnyNewFiles, fetchFile, doFetchFile, unpack, unzipFilesFromFile, getFileLength, offsetBytes, tempFileWith, recursiveList, symlink, isInDirectory, chmod700, getLocalDir, getConfiguredLocalDirs
- Stream and serialization helpers: copyStream, copyStreamUpTo, copyFileStreamNIO, serialize, deserialize, serializeViaNestedStream, deserializeViaNestedStream, deserializeLongValue, clone, writeByteBuffer
- Process and command execution: executeCommand, executeAndGetOutput, processStreamByLine, terminateProcess, getStderr, checkCommandAvailable, getProcessName
- Error handling and resource management: tryOrExit, tryOrStopSparkContext, tryLog, tryLogNonFatalError, tryOrIOException, tryWithResource, tryInitializeResource, tryWithSafeFinally, tryWithSafeFinallyAndFailureCallbacks, doTryWithCallerStacktrace, getTryWithCallerStacktrace, logUncaughtExceptions, createResourceUninterruptiblyIfInTaskThread, isFatalError, exceptionString
- Networking and services: startServiceOnPort, userPort, portMaxRetries, isBindCollision, localHostName, localCanonicalHostName, localHostNameForURI, setCustomHostname, checkHost, checkHostPort, parseHostPort, extractHostPortFromSparkUrl, parseStandaloneMasterUrls, responseFromBackup, BACKUP_STANDALONE_MASTER_PREFIX, preferIPv6, validateURL, getUriBuilder, isLocalUri
- Configuration and environment: getSparkOrYarnConfig, loadDefaultSparkProperties, getPropertiesFromFile, getDefaultPropertiesFile, getSystemProperties, sparkJavaOpts, getUserJars, getLocalUserJarsForShell, isDynamicAllocationEnabled, isStreamingDynamicAllocationEnabled, getDynamicAllocationInitialExecutors, executorOffHeapMemorySizeAsMb, checkOffHeapEnabled, isClientMode, isLocalMaster, isPushBasedShuffleEnabled, checkAndGetK8sMasterUrl, DEFAULT_DRIVER_MEM_MB
- Platform and runtime checks: isWindows, isMac, isMacOnAppleSilicon, isJavaVersionAtLeast21, isG1GC, isTesting, isInRunningSparkTask, windowsDrive, libraryPathEnvName, libraryPathEnvPrefix
- Logging and diagnostics: setLogLevel, setLogLevelIfNeeded, getLogLevel, resetStructuredLogging, initDaemon, getThreadDump, getThreadDumpForThread, getHeapHistogram, getCallSite, withDummyCallSite, getUsedTimeNs, timeTakenMs, timeIt, times, createFailedToGetTokenMessage
- Security and redaction: redact, redactCommandLineArgs, createSecret, getCurrentUserName, getCurrentUserGroups, EMPTY_USER_GROUPS
- Class loading and reflection: classForName, classIsLoadable, classIsLoadableAndAssignableFrom, getContextOrSparkClassLoader, getSparkClassLoader, withContextClassLoader, loadExtensions, instantiateSerializerFromConf, instantiateSerializerOrShuffleManager, getSimpleName, getFormattedClassName, stripDollars
- URI and path handling: encodeFileNameToURIRawPath, encodeRelativeUnixPathToURIRawPath, decodeFileNameInURI, resolveURI, resolveURIs, nonLocalPaths, LOCAL_SCHEME, nameForAppAndAttempt, sanitizeDirName
- Collections and miscellaneous: randomize, randomizeInPlace, random, nonNegativeMod, nonNegativeHash, median, getIteratorSize, getIteratorZipWithIndex, cloneProperties, substituteAppId, substituteAppNExecIds, buildLocationMetadata
-
Constructor Details
-
Utils
public Utils()
-
-
Method Details
-
random
public static java.util.Random random()
-
DEFAULT_DRIVER_MEM_MB
public static int DEFAULT_DRIVER_MEM_MB() Define a default value for driver memory here, since this value is referenced across the code base and nearly all files already use Utils.scala.- Returns:
- (undocumented)
-
MAX_DIR_CREATION_ATTEMPTS
public static int MAX_DIR_CREATION_ATTEMPTS() -
LOCAL_SCHEME
Scheme used for files that are locally available on worker nodes in the cluster. -
deserializeLongValue
public static long deserializeLongValue(byte[] bytes) Deserialize a Long value (used for PythonPartitioner) -
serializeViaNestedStream
public static void serializeViaNestedStream(OutputStream os, SerializerInstance ser, scala.Function1<SerializationStream, scala.runtime.BoxedUnit> f) Serialize via nested stream using specific serializer -
deserializeViaNestedStream
public static void deserializeViaNestedStream(InputStream is, SerializerInstance ser, scala.Function1<DeserializationStream, scala.runtime.BoxedUnit> f) Deserialize via nested stream using specific serializer -
weakIntern
String interning to reduce the memory usage. -
withContextClassLoader
Run a segment of code using a different context class loader in the current thread- Parameters:
ctxClassLoader- (undocumented)fn- (undocumented)- Returns:
- (undocumented)
-
writeByteBuffer
Primitive often used when writing ByteBuffer to DataOutput.- Parameters:
bb- (undocumented)out- (undocumented)
-
writeByteBuffer
Primitive often used when writing ByteBuffer to OutputStream.- Parameters:
bb- (undocumented)out- (undocumented)
-
chmod700
JDK equivalent of chmod 700 file.- Parameters:
file- the file whose permissions will be modified- Returns:
- true if the permissions were successfully changed, false otherwise.
-
createTempDir
Create a temporary directory inside the given parent directory. The directory will be automatically deleted when the VM shuts down.- Parameters:
root- (undocumented)namePrefix- (undocumented)- Returns:
- (undocumented)
-
copyStreamUpTo
Copy the first maxSize bytes of data from the InputStream to an in-memory buffer, primarily to check for corruption. This returns a new InputStream which contains the same data as the original input stream. It may be entirely an in-memory buffer, or it may be a combination of in-memory data followed by continued reads from the original stream. The only real use of this is if the original input stream will potentially detect corruption while the data is being read (e.g. from compression). This allows for an eager corruption check of the first maxSize bytes of data.
- Parameters:
in- (undocumented)maxSize- (undocumented)- Returns:
- An InputStream which includes all data from the original stream (combining buffered data and remaining data in the original stream)
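For illustration, a minimal Scala sketch of how this pre-buffering might be used (the payload, the variable names and the 1024-byte limit are hypothetical; Utils is assumed to be importable from org.apache.spark.util):

  import java.io.ByteArrayInputStream
  import org.apache.spark.util.Utils

  // Hypothetical payload; in practice this would be a compressed block read from disk or the network.
  val payload: Array[Byte] = Array.fill(2048)(42.toByte)
  val original = new ByteArrayInputStream(payload)

  // Eagerly buffer (and thereby validate) the first 1024 bytes; the returned stream
  // still yields all 2048 bytes, combining the buffer with the rest of the original stream.
  val checked = Utils.copyStreamUpTo(original, 1024L)
  val all = checked.readAllBytes()   // same content as payload
  checked.close()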
-
encodeFileNameToURIRawPath
A file name may contain some invalid URI characters, such as " ". This method will convert the file name to a raw path accepted by java.net.URI(String). Note: the file name must not contain "/" or "\"
- Parameters:
fileName- (undocumented)- Returns:
- (undocumented)
-
encodeRelativeUnixPathToURIRawPath
Same as encodeFileNameToURIRawPath(java.lang.String) but returns the relative UNIX path.- Parameters:
path- (undocumented)- Returns:
- (undocumented)
-
decodeFileNameInURI
Get the file name from uri's raw path and decode it. If the raw path of uri ends with "/", return the name before the last "/".- Parameters:
uri- (undocumented)- Returns:
- (undocumented)
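A short Scala sketch of the encode/decode round trip (the file name and URI are hypothetical; the expected values in the comments follow from the descriptions above rather than from a documented contract):

  import java.net.URI
  import org.apache.spark.util.Utils

  // " " is not a valid raw-path character, so encode the name before building the URI.
  val raw = Utils.encodeFileNameToURIRawPath("my app file.jar")   // expected: "my%20app%20file.jar"
  val uri = new URI("file:///tmp/" + raw)

  // Recover the original, decoded file name from the URI's raw path.
  val name = Utils.decodeFileNameInURI(uri)                       // expected: "my app file.jar"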
-
fetchFile
public static File fetchFile(String url, File targetDir, SparkConf conf, org.apache.hadoop.conf.Configuration hadoopConf, long timestamp, boolean useCache, boolean shouldUntar) Download a file or directory to the target directory. Supports fetching the file in a variety of ways, including HTTP, Hadoop-compatible filesystems, and files on a standard filesystem, based on the URL parameter. Fetching directories is only supported from Hadoop-compatible filesystems.
If useCache is true, first attempts to fetch the file to a local cache that is shared across executors running the same application. useCache is used mainly for the executors, not in local mode. Throws SparkException if the target file already exists and has different contents than the requested file.
If shouldUntar is true, the given URL is untarred into targetDir if it is a tar.gz or tgz archive. This is legacy behavior; users should prefer the spark.archives configuration or SparkContext.addArchive instead.- Parameters:
url- (undocumented)targetDir- (undocumented)conf- (undocumented)hadoopConf- (undocumented)timestamp- (undocumented)useCache- (undocumented)shouldUntar- (undocumented)- Returns:
- (undocumented)
-
unpack
Unpacks an archive file into the specified directory. It expects .jar, .zip, .tar.gz, .tgz and .tar files. This behaves the same as Hadoop's archive handling in the distributed cache. This method is basically copied from org.apache.hadoop.yarn.util.FSDownload.unpack.- Parameters:
source- (undocumented)dest- (undocumented)
-
timeTakenMs
Records the duration of running `body`. -
doFetchFile
public static File doFetchFile(String url, File targetDir, String filename, SparkConf conf, org.apache.hadoop.conf.Configuration hadoopConf) Download a file or directory to the target directory. Supports fetching the file in a variety of ways, including HTTP, Hadoop-compatible filesystems, and files on a standard filesystem, based on the URL parameter. Fetching directories is only supported from Hadoop-compatible filesystems. Throws SparkException if the target file already exists and has different contents than the requested file.
- Parameters:
url- (undocumented)targetDir- (undocumented)filename- (undocumented)conf- (undocumented)hadoopConf- (undocumented)- Returns:
- (undocumented)
-
validateURL
Validate that a given URI is actually a valid URL as well.- Parameters:
uri- The URI to validate- Throws:
MalformedURLException
-
getLocalDir
Get the path of a temporary directory. Spark's local directories can be configured through multiple settings, which are used with the following precedence:
- If called from inside of a YARN container, this will return a directory chosen by YARN.
- If the SPARK_LOCAL_DIRS environment variable is set, this will return a directory from it.
- Otherwise, if spark.local.dir is set, this will return a directory from it.
- Otherwise, this will return java.io.tmpdir.
Some of these configuration options might be lists of multiple paths, but this method will always return a single directory. The returned directory is chosen randomly from the array of directories it gets from getOrCreateLocalRootDirs.
- Parameters:
conf- (undocumented)- Returns:
- (undocumented)
-
isInRunningSparkTask
public static boolean isInRunningSparkTask() Returns whether the current code is running in a Spark task, e.g., in an executor.- Returns:
- (undocumented)
-
getConfiguredLocalDirs
Return the configured local directories where Spark can write files. This method does not create any directories on its own, it only encapsulates the logic of locating the local directories according to deployment mode.- Parameters:
conf- (undocumented)- Returns:
- (undocumented)
-
randomize
public static <T> scala.collection.immutable.Seq<T> randomize(scala.collection.IterableOnce<T> seq, scala.reflect.ClassTag<T> evidence$1) Shuffle the elements of a collection into a random order, returning the result in a new collection. Unlike scala.util.Random.shuffle, this method uses a local random number generator, avoiding inter-thread contention.- Parameters:
seq- (undocumented)evidence$1- (undocumented)- Returns:
- (undocumented)
-
randomizeInPlace
Shuffle the elements of an array into a random order, modifying the original array. Returns the original array.- Parameters:
arr- (undocumented)rand- (undocumented)- Returns:
- (undocumented)
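A small Scala sketch contrasting the two shuffles (the values and the fixed seed are illustrative):

  import java.util.Random
  import org.apache.spark.util.Utils

  // Returns a new, shuffled collection; the input is left untouched.
  val shuffled: Seq[Int] = Utils.randomize(Seq(1, 2, 3, 4, 5))

  // Shuffles the array in place and returns the same array instance.
  val arr = Array("a", "b", "c", "d")
  Utils.randomizeInPlace(arr, new Random(42L))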
-
setCustomHostname
Allow setting a custom host name- Parameters:
hostname- (undocumented)
-
localCanonicalHostName
Get the local machine's FQDN.- Returns:
- (undocumented)
-
localHostName
Get the local machine's hostname. In case of IPv6, getHostAddress may return '0:0:0:0:0:0:0:1'.- Returns:
- (undocumented)
-
localHostNameForURI
Get the local machine's URI.- Returns:
- (undocumented)
-
checkHost
Checks that the host contains only a valid hostname or IP address, without a port. NOTE: an IPv6 address should be enclosed in [].- Parameters:
host- (undocumented)
-
checkHostPort
-
parseHostPort
-
getUsedTimeNs
Return a string describing how much time has passed since startTimeNs, in milliseconds.- Parameters:
startTimeNs- - a timestamp in nanoseconds returned by System.nanoTime.- Returns:
- (undocumented)
-
deleteRecursively
Delete a file or directory and its contents recursively. Don't follow directories if they are symlinks. Throws an exception if deletion is unsuccessful.- Parameters:
file- (undocumented)
-
doesDirectoryContainAnyNewFiles
Determines if a directory contains any files newer than cutoff seconds.- Parameters:
dir- must be the path to a directory, or IllegalArgumentException is throwncutoff- measured in seconds. Returns true if there are any files or directories in the given directory whose last modified time is later than this many seconds ago- Returns:
- (undocumented)
-
timeStringAsMs
Convert a time parameter such as (50s, 100ms, or 250us) to milliseconds for internal use. If no suffix is provided, the passed number is assumed to be in ms.- Parameters:
str- (undocumented)- Returns:
- (undocumented)
-
timeStringAsSeconds
Convert a time parameter such as (50s, 100ms, or 250us) to seconds for internal use. If no suffix is provided, the passed number is assumed to be in seconds.- Parameters:
str- (undocumented)- Returns:
- (undocumented)
-
byteStringAsBytes
Convert a passed byte string (e.g. 50b, 100k, or 250m) to bytes for internal use. If no suffix is provided, the passed number is assumed to be in bytes.
- Parameters:
str- (undocumented)- Returns:
- (undocumented)
-
byteStringAsKb
Convert a passed byte string (e.g. 50b, 100k, or 250m) to kibibytes for internal use. If no suffix is provided, the passed number is assumed to be in kibibytes.
- Parameters:
str- (undocumented)- Returns:
- (undocumented)
-
byteStringAsMb
Convert a passed byte string (e.g. 50b, 100k, or 250m) to mebibytes for internal use. If no suffix is provided, the passed number is assumed to be in mebibytes.
- Parameters:
str- (undocumented)- Returns:
- (undocumented)
-
byteStringAsGb
Convert a passed byte string (e.g. 50b, 100k, or 250m, 500g) to gibibytes for internal use. If no suffix is provided, the passed number is assumed to be in gibibytes.
- Parameters:
str- (undocumented)- Returns:
- (undocumented)
-
memoryStringToMb
Convert a Java memory parameter passed to -Xmx (such as 300m or 1g) to a number of mebibytes.- Parameters:
str- (undocumented)- Returns:
- (undocumented)
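A few illustrative conversions in Scala (the expected results in the comments are based on the descriptions above):

  import org.apache.spark.util.Utils

  // Time strings: with no suffix, timeStringAsMs assumes ms and timeStringAsSeconds assumes seconds.
  Utils.timeStringAsMs("50s")         // 50000
  Utils.timeStringAsSeconds("100ms")  // 0

  // Byte strings: each variant assumes its own unit when no suffix is given.
  Utils.byteStringAsBytes("1k")       // 1024
  Utils.byteStringAsMb("1g")          // 1024
  Utils.byteStringAsGb("2048m")       // 2

  // JVM-style memory sizes, as passed to -Xmx.
  Utils.memoryStringToMb("300m")      // 300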
-
bytesToString
Convert a quantity in bytes to a human-readable string such as "4.0 MiB".- Parameters:
size- (undocumented)- Returns:
- (undocumented)
-
bytesToString
-
msDurationToString
Returns a human-readable string representing a duration such as "35ms"- Parameters:
ms- (undocumented)- Returns:
- (undocumented)
-
megabytesToString
Convert a quantity in megabytes to a human-readable string such as "4.0 MiB".- Parameters:
megabytes- (undocumented)- Returns:
- (undocumented)
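And the formatting direction (the comments reflect the examples given in the descriptions above; the exact rendering may differ slightly):

  import org.apache.spark.util.Utils

  Utils.bytesToString(4L * 1024 * 1024)   // "4.0 MiB"
  Utils.megabytesToString(4)              // "4.0 MiB"
  Utils.msDurationToString(35)            // e.g. "35ms"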
-
executeCommand
public static Process executeCommand(scala.collection.immutable.Seq<String> command, File workingDir, scala.collection.Map<String, String> extraEnvironment, boolean redirectStderr) Execute a command and return the process running the command.- Parameters:
command- (undocumented)workingDir- (undocumented)extraEnvironment- (undocumented)redirectStderr- (undocumented)- Returns:
- (undocumented)
-
executeAndGetOutput
public static String executeAndGetOutput(scala.collection.immutable.Seq<String> command, File workingDir, scala.collection.Map<String, String> extraEnvironment, boolean redirectStderr) Execute a command and get its output, throwing an exception if it yields a code other than 0.- Parameters:
command- (undocumented)workingDir- (undocumented)extraEnvironment- (undocumented)redirectStderr- (undocumented)- Returns:
- (undocumented)
-
processStreamByLine
public static Thread processStreamByLine(String threadName, InputStream inputStream, scala.Function1<String, scala.runtime.BoxedUnit> processLine) Return and start a daemon thread that processes the content of the input stream line by line.- Parameters:
threadName- (undocumented)inputStream- (undocumented)processLine- (undocumented)- Returns:
- (undocumented)
-
tryOrExit
public static void tryOrExit(scala.Function0<scala.runtime.BoxedUnit> block) Execute a block of code that evaluates to Unit, forwarding any uncaught exceptions to the default UncaughtExceptionHandler. NOTE: This method is to be called by the spark-started JVM process.
- Parameters:
block- (undocumented)
-
tryOrStopSparkContext
public static void tryOrStopSparkContext(SparkContext sc, scala.Function0<scala.runtime.BoxedUnit> block) Execute a block of code that evaluates to Unit, stopping the SparkContext if there is any uncaught exception. NOTE: This method is to be called by driver-side components to avoid stopping the user-started JVM process completely; in contrast, tryOrExit is to be called in the spark-started JVM process.
- Parameters:
sc- (undocumented)block- (undocumented)
-
tryLogNonFatalError
public static void tryLogNonFatalError(scala.Function0<scala.runtime.BoxedUnit> block) Executes the given block. Log non-fatal errors if any, and only throw fatal errors -
tryWithSafeFinallyAndFailureCallbacks
public static <T> T tryWithSafeFinallyAndFailureCallbacks(scala.Function0<T> block, scala.Function0<scala.runtime.BoxedUnit> catchBlock, scala.Function0<scala.runtime.BoxedUnit> finallyBlock) Execute a block of code and call the failure callbacks in the catch block. If exceptions occur in either the catch or the finally block, they are appended to the list of suppressed exceptions in the original exception, which is then rethrown.
This is primarily an issue with catch { abort() } or finally { out.close() } blocks, where the abort/close needs to be called to clean up out, but if an exception happened in out.write, it is likely that out is corrupted and abort or out.close will fail as well. That would then suppress the original, likely more meaningful, exception from the original out.write call.- Parameters:
block- (undocumented)catchBlock- (undocumented)finallyBlock- (undocumented)- Returns:
- (undocumented)
-
TRY_WITH_CALLER_STACKTRACE_FULL_STACKTRACE
-
doTryWithCallerStacktrace
public static <T> scala.util.Try<T> doTryWithCallerStacktrace(scala.Function0<T> f) Use Try with stacktrace substitution for the caller retrieving the error. Normally in case of failure, the exception would have the stacktrace of the caller that originally called doTryWithCallerStacktrace. However, we want to replace the part above this function with the stacktrace of the caller who calls getTryWithCallerStacktrace. So here we save the part of the stacktrace below doTryWithCallerStacktrace, and getTryWithCallerStacktrace will stitch it with the new stack trace of the caller. The full original stack trace is kept in ex.getSuppressed.
- Parameters:
f- Code block to be wrapped in Try- Returns:
- Try with Success or Failure of the code block. Use with getTryWithCallerStacktrace.
-
getTryWithCallerStacktrace
public static <T> T getTryWithCallerStacktrace(scala.util.Try<T> t) Retrieve the result of Try that was created by doTryWithCallerStacktrace. In case of failure, the resulting exception has a stack trace that combines the stack trace below the original doTryWithCallerStacktrace which triggered it, with the caller stack trace of the current caller of getTryWithCallerStacktrace.
Full stack trace of the original doTryWithCallerStacktrace caller can be retrieved with
ex.getSuppressed.find { e => e.isInstanceOf[Utils.OriginalTryStackTraceException] }- Parameters:
t- Try from doTryWithCallerStacktrace- Returns:
- Result of the Try or rethrows the failure exception with modified stacktrace.
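A Scala sketch of how the two calls pair up (assuming the Scala API takes the block by name, as suggested by the Function0 parameter in the Java signature above; the exception is hypothetical):

  import scala.util.Try
  import org.apache.spark.util.Utils

  // Run the work where it actually happens and capture any failure, together with the
  // part of the stack trace below doTryWithCallerStacktrace.
  val result: Try[String] = Utils.doTryWithCallerStacktrace {
    throw new IllegalStateException("boom")   // hypothetical failure
  }

  // At the call site that consumes the result: rethrows the failure, with the upper part
  // of the stack trace replaced by this caller's stack trace.
  def consume(): String = Utils.getTryWithCallerStacktrace(result)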
-
getCallSite
When called inside a class in the spark package, returns the name of the user code class (outside the spark package) that called into Spark, as well as which Spark method they called. This is used, for example, to tell users where in their code each RDD got created.- Parameters:
skipClass- Function that is used to exclude non-user-code classes.- Returns:
- (undocumented)
-
getFileLength
Return the file length, if the file is compressed it returns the uncompressed file length. It also caches the uncompressed file size to avoid repeated decompression. The cache size is read from workerConf.- Parameters:
file- (undocumented)workConf- (undocumented)- Returns:
- (undocumented)
-
offsetBytes
Return a string containing part of a file from byte 'start' to 'end'. -
offsetBytes
public static String offsetBytes(scala.collection.immutable.Seq<File> files, scala.collection.immutable.Seq<Object> fileLengths, long start, long end) Return a string containing data across a set of files. The start and end offsets are based on the cumulative size of all the files, taken in the given order.- Parameters:
files- (undocumented)fileLengths- (undocumented)start- (undocumented)end- (undocumented)- Returns:
- (undocumented)
-
clone
public static <T> T clone(T value, SerializerInstance serializer, scala.reflect.ClassTag<T> evidence$2) Clone an object using a Spark serializer.- Parameters:
value- (undocumented)serializer- (undocumented)evidence$2- (undocumented)- Returns:
- (undocumented)
-
splitCommandString
Split a string of potentially quoted arguments from the command line the way that a shell would do it to determine arguments to a command. For example, if the string is 'a "b c" d', then it would be parsed as three arguments: 'a', 'b c' and 'd'.- Parameters:
s- (undocumented)- Returns:
- (undocumented)
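In Scala, mirroring the example in the description:

  import org.apache.spark.util.Utils

  // Quoted segments are kept together as single arguments.
  Utils.splitCommandString("""a "b c" d""")   // expected: Seq("a", "b c", "d")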
-
nonNegativeMod
public static int nonNegativeMod(int x, int mod) -
nonNegativeHash
-
getSystemProperties
Returns the system properties map that is thread-safe to iterate over. It gets the properties which have been set explicitly, as well as those for which only a default value has been defined.- Returns:
- (undocumented)
-
times
public static void times(int numIters, scala.Function0<scala.runtime.BoxedUnit> f) Method for repeating a task for its side effects. Unlike a for comprehension, it permits JVM JIT optimization.- Parameters:
numIters- (undocumented)f- (undocumented)
-
timeIt
public static long timeIt(int numIters, scala.Function0<scala.runtime.BoxedUnit> f, scala.Option<scala.Function0<scala.runtime.BoxedUnit>> prepare) Timing method based on iterations that permit JVM JIT optimization.- Parameters:
numIters- number of iterationsf- function to be executed. If prepare is not None, the running time of each call to f must be an order of magnitude longer than one nanosecond for accurate timing.prepare- function to be executed before each call to f. Its running time doesn't count.- Returns:
- the total time across all iterations (not counting preparation time) in nanoseconds.
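A small Scala sketch of both methods (assuming the Scala API takes the block in a second parameter list, which is how the Function0 parameters above are typically surfaced; the iteration count and the work done are illustrative):

  import org.apache.spark.util.Utils

  var sink = 0L

  // Run the block 1000 times; unlike a for comprehension this stays JIT-friendly.
  Utils.times(1000) { sink += 1 }

  // Total time in nanoseconds over 1000 iterations, with no per-iteration prepare step.
  val nanos: Long = Utils.timeIt(1000)({ sink += 1 }, None)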
-
getIteratorSize
Counts the number of elements of an iterator.- Parameters:
iterator- (undocumented)- Returns:
- (undocumented)
-
getIteratorZipWithIndex
public static <T> scala.collection.Iterator<scala.Tuple2<T,Object>> getIteratorZipWithIndex(scala.collection.Iterator<T> iter, long startIndex) Generate a zipWithIndex iterator, avoiding the index overflow problem of Scala's zipWithIndex.- Parameters:
iter- (undocumented)startIndex- (undocumented)- Returns:
- (undocumented)
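For example, in Scala (the iterators and the start index are illustrative):

  import org.apache.spark.util.Utils

  // Consumes the iterator and returns the element count as a Long.
  val n: Long = Utils.getIteratorSize(Iterator(1, 2, 3))            // 3

  // Like zipWithIndex, but with Long indices starting at an arbitrary offset,
  // so the index cannot overflow Int on very large inputs.
  val indexed = Utils.getIteratorZipWithIndex(Iterator("a", "b"), 10L)
  indexed.toList                                                    // List(("a", 10), ("b", 11))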
-
symlink
Creates a symlink.- Parameters:
src- absolute path to the sourcedst- relative path for the destination
-
getHadoopFileSystem
public static org.apache.hadoop.fs.FileSystem getHadoopFileSystem(URI path, org.apache.hadoop.conf.Configuration conf) Return a Hadoop FileSystem with the scheme encoded in the given path.- Parameters:
path- (undocumented)conf- (undocumented)- Returns:
- (undocumented)
-
getHadoopFileSystem
public static org.apache.hadoop.fs.FileSystem getHadoopFileSystem(String path, org.apache.hadoop.conf.Configuration conf) Return a Hadoop FileSystem with the scheme encoded in the given path.- Parameters:
path- (undocumented)conf- (undocumented)- Returns:
- (undocumented)
-
isWindows
public static boolean isWindows()Whether the underlying operating system is Windows.- Returns:
- (undocumented)
-
isMac
public static boolean isMac()Whether the underlying operating system is Mac OS X.- Returns:
- (undocumented)
-
isJavaVersionAtLeast21
public static boolean isJavaVersionAtLeast21()Whether the underlying Java version is at least 21.- Returns:
- (undocumented)
-
isMacOnAppleSilicon
public static boolean isMacOnAppleSilicon()Whether the underlying operating system is Mac OS X and processor is Apple Silicon.- Returns:
- (undocumented)
-
preferIPv6
public static boolean preferIPv6()Whether the underlying JVM prefer IPv6 addresses.- Returns:
- (undocumented)
-
windowsDrive
public static scala.util.matching.Regex windowsDrive()Pattern for matching a Windows drive, which contains only a single alphabet character.- Returns:
- (undocumented)
-
terminateProcess
Terminates a process waiting for at most the specified duration.- Parameters:
process- (undocumented)timeoutMs- (undocumented)- Returns:
- the process exit value if it was successfully terminated, else None
-
getStderr
Return the stderr of a process after waiting for the process to terminate. If the process does not terminate within the specified timeout, return None.- Parameters:
process- (undocumented)timeoutMs- (undocumented)- Returns:
- (undocumented)
-
logUncaughtExceptions
public static <T> T logUncaughtExceptions(scala.Function0<T> f) Execute the given block, logging and re-throwing any uncaught exception. This is particularly useful for wrapping code that runs in a thread, to ensure that exceptions are printed, and to avoid having to catch Throwable.- Parameters:
f- (undocumented)- Returns:
- (undocumented)
-
tryLog
public static <T> scala.util.Try<T> tryLog(scala.Function0<T> f) Executes the given block in a Try, logging any uncaught exceptions. -
isFatalError
Returns true if the given exception was fatal. See docs for scala.util.control.NonFatal. -
resolveURIs
Resolve a comma-separated list of paths. -
nonLocalPaths
Return all non-local paths from a comma-separated list of paths. -
loadDefaultSparkProperties
Load default Spark properties from the given file. If no file is provided, use the common defaults file. This mutates state in the given SparkConf and in this JVM's system properties if the config specified in the file is not already set. Return the path of the properties file used.- Parameters:
conf- (undocumented)filePath- (undocumented)- Returns:
- (undocumented)
-
getPropertiesFromFile
Load properties present in the given file. -
getDefaultPropertiesFile
Return the path of the default Spark properties file. -
exceptionString
Return a nice string representation of the exception. It will call "printStackTrace" to recursively generate the stack trace including the exception and its causes.- Parameters:
e- (undocumented)- Returns:
- (undocumented)
-
getThreadDump
Return a thread dump of all threads' stacktraces. Used to capture dumps for the web UI -
getHeapHistogram
Return a heap dump. Used to capture dumps for the web UI -
getThreadDumpForThread
-
sparkJavaOpts
public static scala.collection.immutable.Seq<String> sparkJavaOpts(SparkConf conf, scala.Function1<String, Object> filterKey) Convert all spark properties set in the given SparkConf to a sequence of java options.- Parameters:
conf- (undocumented)filterKey- (undocumented)- Returns:
- (undocumented)
-
portMaxRetries
Maximum number of retries when binding to a port before giving up.- Parameters:
conf- (undocumented)- Returns:
- (undocumented)
-
userPort
public static int userPort(int base, int offset) Returns the user port to try when trying to bind a service. Handles wrapping and skipping privileged ports.- Parameters:
base- (undocumented)offset- (undocumented)- Returns:
- (undocumented)
-
startServiceOnPort
public static <T> scala.Tuple2<T,Object> startServiceOnPort(int startPort, scala.Function1<Object, scala.Tuple2<T, Object>> startService, SparkConf conf, String serviceName) Attempt to start a service on the given port, or fail after a number of attempts. Use a shared configuration for the maximum number of port retries.- Parameters:
startPort- (undocumented)startService- (undocumented)conf- (undocumented)serviceName- (undocumented)- Returns:
- (undocumented)
-
startServiceOnPort
public static <T> scala.Tuple2<T,Object> startServiceOnPort(int startPort, scala.Function1<Object, scala.Tuple2<T, Object>> startService, int maxRetries, String serviceName) Attempt to start a service on the given port, or fail after a number of attempts. Each subsequent attempt uses 1 + the port used in the previous attempt (unless the port is 0).- Parameters:
startPort- The initial port to start the service on.startService- Function to start service on a given port. This is expected to throw java.net.BindException on port collision.maxRetries- The maximum number of retries when binding to a port.serviceName- Name of the service.- Returns:
- (service: T, port: Int)
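A Scala sketch using a plain ServerSocket as the service (the port numbers, retry count and service name are illustrative):

  import java.net.ServerSocket
  import org.apache.spark.util.Utils

  // The start function receives a candidate port and must return (service, boundPort);
  // it is expected to throw java.net.BindException on a collision, which triggers a retry
  // on the next port.
  def start(port: Int): (ServerSocket, Int) = {
    val socket = new ServerSocket(port)
    (socket, socket.getLocalPort)
  }

  val (server, boundPort) = Utils.startServiceOnPort(9999, start, 3, "demo-service")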
-
isBindCollision
Return whether the exception is caused by an address-port collision when binding.- Parameters:
exception- (undocumented)- Returns:
- (undocumented)
-
setLogLevel
public static void setLogLevel(org.apache.logging.log4j.Level l) configure a new log4j level- Parameters:
l- (undocumented)
-
setLogLevelIfNeeded
-
getLogLevel
Get current log level- Returns:
- (undocumented)
-
libraryPathEnvName
Return the current system LD_LIBRARY_PATH name- Returns:
- (undocumented)
-
libraryPathEnvPrefix
Return the prefix of a command that appends the given library paths to the system-specific library path environment variable. On Unix, for instance, this returns the string LD_LIBRARY_PATH="path1:path2:$LD_LIBRARY_PATH".- Parameters:
libraryPaths- (undocumented)- Returns:
- (undocumented)
-
getSparkOrYarnConfig
Return the value of a config either through the SparkConf or the Hadoop configuration. The key is checked in the SparkConf before looking at any Hadoop configuration. If the key is set in the SparkConf, its value is returned regardless of whether the application is running on YARN. Only when the key is not set in the SparkConf and the application is running on YARN is the value taken from the Hadoop configuration.- Parameters:
conf- (undocumented)key- (undocumented)default_- (undocumented)- Returns:
- (undocumented)
-
extractHostPortFromSparkUrl
public static scala.Tuple2<String,Object> extractHostPortFromSparkUrl(String sparkUrl) throws SparkException Return a pair of host and port extracted from the sparkUrl.
A Spark URL (spark://host:port) is a special URI whose scheme is spark and which contains only a host and a port.- Parameters:
sparkUrl- (undocumented)- Returns:
- (undocumented)
- Throws:
SparkException- if sparkUrl is invalid.
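For example, in Scala (the host and port are illustrative):

  import org.apache.spark.util.Utils

  // A well-formed standalone master URL yields the host and port as a pair.
  val (host, port) = Utils.extractHostPortFromSparkUrl("spark://master.example.com:7077")
  // host == "master.example.com", port == 7077
  // An invalid URL throws SparkException.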
-
getCurrentUserName
Returns the current user name. This is the currently logged in user, unless that's been overridden by the SPARK_USER environment variable.- Returns:
- (undocumented)
-
EMPTY_USER_GROUPS
-
getCurrentUserGroups
-
parseStandaloneMasterUrls
Split the comma delimited string of master URLs into a list. For instance, "spark://abc,def" becomes [spark://abc, spark://def].- Parameters:
masterUrls- (undocumented)- Returns:
- (undocumented)
-
BACKUP_STANDALONE_MASTER_PREFIX
An identifier that backup masters use in their responses. -
responseFromBackup
Return true if the response message is sent from a backup Master on standby. -
withDummyCallSite
To avoid calling Utils.getCallSite for every single RDD we create in the body, set a dummy call site that RDDs use instead. This is a performance optimization.- Parameters:
sc- (undocumented)body- (undocumented)- Returns:
- (undocumented)
-
isInDirectory
Return whether the specified file is a parent directory of the child file.- Parameters:
parent- (undocumented)child- (undocumented)- Returns:
- (undocumented)
-
isLocalMaster
- Parameters:
conf- (undocumented)- Returns:
- whether it is local mode
-
isPushBasedShuffleEnabled
public static boolean isPushBasedShuffleEnabled(SparkConf conf, boolean isDriver, boolean checkSerializer) Push-based shuffle can only be enabled when the conditions below are met:
- the application is submitted to run in YARN mode
- the external shuffle service is enabled
- IO encryption is disabled
- the serializer (such as KryoSerializer) supports relocation of serialized objects
- Parameters:
conf- (undocumented)isDriver- (undocumented)checkSerializer- (undocumented)- Returns:
- (undocumented)
-
instantiateSerializerOrShuffleManager
-
instantiateSerializerFromConf
-
isDynamicAllocationEnabled
Return whether dynamic allocation is enabled in the given conf.- Parameters:
conf- (undocumented)- Returns:
- (undocumented)
-
isStreamingDynamicAllocationEnabled
-
getDynamicAllocationInitialExecutors
Return the initial number of executors for dynamic allocation.- Parameters:
conf- (undocumented)- Returns:
- (undocumented)
-
tempFileWith
Returns the path of a temporary file in the same directory as path.- Parameters:
path- (undocumented)- Returns:
- (undocumented)
-
getProcessName
Returns the name of this JVM process. This is OS dependent but typically (OSX, Linux, Windows), this is formatted as PID@hostname.- Returns:
- (undocumented)
-
initDaemon
public static void initDaemon(org.slf4j.Logger log) Utility function that should be called early in main() for daemons to set up some common diagnostic state.- Parameters:
log- (undocumented)
-
resetStructuredLogging
public static void resetStructuredLogging() Utility function to enable or disable structured logging based on system properties. This is designed for code paths where SparkConf cannot be used yet, and should be called before the first invocation of Logging.log(). For example, this should be used before initDaemon. -
resetStructuredLogging
Utility function to enable or disable structured logging based on SparkConf. This is designed for code paths where the logging system may be initialized before SparkConf is loaded.- Parameters:
sparkConf- (undocumented)
-
getUserJars
Return the jar files pointed to by the "spark.jars" property. Spark internally distributes these jars through the file server. In YARN mode, this returns an empty list, since YARN has its own mechanism to distribute jars.- Parameters:
conf- (undocumented)- Returns:
- (undocumented)
-
getLocalUserJarsForShell
Return the local jar files which will be added to the REPL's classpath. These jar files are specified by --jars (spark.jars) or --packages; remote jars are first downloaded locally by SparkSubmit.- Parameters:
conf- (undocumented)- Returns:
- (undocumented)
-
redact
public static scala.collection.Seq<scala.Tuple2<String,String>> redact(SparkConf conf, scala.collection.Seq<scala.Tuple2<String, String>> kvs) Redact the sensitive values in the given map. If a map key matches the redaction pattern then its value is replaced with a dummy text.- Parameters:
conf- (undocumented)kvs- (undocumented)- Returns:
- (undocumented)
-
redact
public static <K,V> scala.collection.Seq<scala.Tuple2<K,V>> redact(scala.Option<scala.util.matching.Regex> regex, scala.collection.Seq<scala.Tuple2<K, V>> kvs) Redact the sensitive values in the given map. If a map key matches the redaction pattern then its value is replaced with a dummy text.- Parameters:
regex- (undocumented)kvs- (undocumented)- Returns:
- (undocumented)
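A Scala sketch of the regex-based overload (the pattern, the keys and the exact replacement text are illustrative assumptions):

  import scala.util.matching.Regex
  import org.apache.spark.util.Utils

  // Keys matching the pattern have their values replaced with a dummy text.
  val pattern: Option[Regex] = Some("(?i)secret|password".r)
  val kvs = Seq("spark.app.name" -> "demo", "my.service.password" -> "hunter2")

  val cleaned = Utils.redact(pattern, kvs)
  // e.g. Seq(("spark.app.name", "demo"), ("my.service.password", "<redacted dummy text>"))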
-
redact
Redact the sensitive information in the given string.- Parameters:
regex- (undocumented)text- (undocumented)- Returns:
- (undocumented)
-
redact
public static scala.collection.Seq<scala.Tuple2<String,String>> redact(scala.collection.Map<String, String> kvs) Looks up the redaction regex from within the key value pairs and uses it to redact the rest of the key value pairs. No care is taken to make sure the redaction property itself is not redacted. So theoretically, the property itself could be configured to redact its own value when printing.- Parameters:
kvs- (undocumented)- Returns:
- (undocumented)
-
redactCommandLineArgs
-
loadExtensions
public static <T> scala.collection.immutable.Seq<T> loadExtensions(Class<T> extClass, scala.collection.immutable.Seq<String> classes, SparkConf conf) Create instances of extension classes. The classes in the given list must:
- Be sub-classes of the given base class.
- Provide either a no-arg constructor, or a 1-arg constructor that takes a SparkConf.
The constructors are allowed to throw "UnsupportedOperationException" if the extension does not want to be registered; this allows the implementations to check the Spark configuration (or other state) and decide they do not need to be added. A log message is printed in that case. Other exceptions are bubbled up.
- Parameters:
extClass- (undocumented)classes- (undocumented)conf- (undocumented)- Returns:
- (undocumented)
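A Scala sketch with a hypothetical extension point (MyListener and LoggingListener are not Spark classes; they only illustrate the constructor contract described above):

  import org.apache.spark.SparkConf
  import org.apache.spark.util.Utils

  trait MyListener { def onEvent(name: String): Unit }

  // The 1-arg constructor receives the SparkConf; a no-arg constructor would also be accepted.
  class LoggingListener(conf: SparkConf) extends MyListener {
    override def onEvent(name: String): Unit = println(s"event: $name")
  }

  val conf = new SparkConf()
  val listeners: Seq[MyListener] =
    Utils.loadExtensions(classOf[MyListener], Seq(classOf[LoggingListener].getName), conf)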
-
checkAndGetK8sMasterUrl
Check the validity of the given Kubernetes master URL and return the resolved URL. Prefix "k8s://" is appended to the resolved URL as the prefix is used by KubernetesClusterManager in canCreate to determine if the KubernetesClusterManager should be used.- Parameters:
rawMasterURL- (undocumented)- Returns:
- (undocumented)
-
substituteAppNExecIds
Replaces all the {{EXECUTOR_ID}} occurrences with the Executor Id and {{APP_ID}} occurrences with the App Id.- Parameters:
opt- (undocumented)appId- (undocumented)execId- (undocumented)- Returns:
- (undocumented)
-
substituteAppId
Replaces all the {{APP_ID}} occurrences with the App Id.- Parameters:
opt- (undocumented)appId- (undocumented)- Returns:
- (undocumented)
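For example, in Scala (the option string and IDs are illustrative):

  import org.apache.spark.util.Utils

  val opt = "-Dlog.dir=/logs/{{APP_ID}}/{{EXECUTOR_ID}}"

  Utils.substituteAppNExecIds(opt, "app-20240101000000-0001", "3")
  // "-Dlog.dir=/logs/app-20240101000000-0001/3"

  Utils.substituteAppId("-Dapp={{APP_ID}}", "app-20240101000000-0001")
  // "-Dapp=app-20240101000000-0001"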
-
createSecret
-
stringHalfWidth
Return the number of half widths in a given string. Note that a full width character occupies two half widths. For a string consisting of 1 million characters, the execution of this method requires about 50ms.
- Parameters:
str- (undocumented)- Returns:
- (undocumented)
-
sanitizeDirName
-
nameForAppAndAttempt
-
isClientMode
-
isLocalUri
Returns whether the URI is a "local:" URI. -
getUriBuilder
Create a UriBuilder from URI object. -
getUriBuilder
Create a UriBuilder from URI string. -
isFileSplittable
public static boolean isFileSplittable(org.apache.hadoop.fs.Path path, org.apache.hadoop.io.compress.CompressionCodecFactory codecFactory) Check whether the file of the path is splittable. -
cloneProperties
Create a new properties object with the same values as `props` -
buildLocationMetadata
public static String buildLocationMetadata(scala.collection.immutable.Seq<org.apache.hadoop.fs.Path> paths, int stopAppendingThreshold) Convert a sequence of Paths to a metadata string. When the length of the metadata string exceeds stopAppendingThreshold, stop appending paths, to save memory.- Parameters:
paths- (undocumented)stopAppendingThreshold- (undocumented)- Returns:
- (undocumented)
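A Scala sketch (the paths and threshold are illustrative; with a small threshold, later paths are dropped to bound the size of the resulting string):

  import org.apache.hadoop.fs.Path
  import org.apache.spark.util.Utils

  val paths = Seq(
    new Path("/data/part-0000"),
    new Path("/data/part-0001"),
    new Path("/data/part-0002"))

  // Stops appending once the accumulated metadata string exceeds the threshold.
  val meta = Utils.buildLocationMetadata(paths, 20)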
-
executorOffHeapMemorySizeAsMb
Convert MEMORY_OFFHEAP_SIZE to MB Unit, return 0 if MEMORY_OFFHEAP_ENABLED is false.- Parameters:
sparkConf- (undocumented)- Returns:
- (undocumented)
-
checkOffHeapEnabled
return 0 if MEMORY_OFFHEAP_ENABLED is false.- Parameters:
sparkConf- (undocumented)offHeapSize- (undocumented)- Returns:
- (undocumented)
-
createFailedToGetTokenMessage
public static org.apache.spark.internal.MessageWithContext createFailedToGetTokenMessage(String serviceName, Throwable e) Returns a string message about delegation token generation failure -
unzipFilesFromFile
public static scala.collection.immutable.Seq<File> unzipFilesFromFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path dfsZipFile, File localDir) Decompress a zip file into a local dir. File names are read from the zip file. Note, we skip addressing the directory here. Also, we rely on the caller side to address any exceptions.- Parameters:
fs- (undocumented)dfsZipFile- (undocumented)localDir- (undocumented)- Returns:
- (undocumented)
-
createResourceUninterruptiblyIfInTaskThread
public static <R extends Closeable> R createResourceUninterruptiblyIfInTaskThread(scala.Function0<R> createResource) Create a resource uninterruptibly if we are in a task thread (i.e., TaskContext.get() != null). Otherwise, create the resource normally. This is mainly used in the situation where we want to create a multi-layer resource in a task thread. The uninterruptible behavior ensures we don't leak the underlying resources when there is a task cancellation request.- Parameters:
createResource- (undocumented)- Returns:
- (undocumented)
-
median
public static long median(long[] sizes, boolean alreadySorted) Return the median value of a long array.- Parameters:
sizes-alreadySorted-- Returns:
-
checkCommandAvailable
Check if a command is available.- Parameters:
command- (undocumented)- Returns:
- (undocumented)
-
isG1GC
public static boolean isG1GC() -
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_() -
org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) -
LogStringContext
public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc) -
getSparkClassLoader
-
getContextOrSparkClassLoader
-
classForName
-
classForName$default$2
public static <C> boolean classForName$default$2() -
classForName$default$3
public static <C> boolean classForName$default$3() -
classIsLoadable
-
classIsLoadableAndAssignableFrom
-
getFormattedClassName
-
getSimpleName
-
stripDollars
-
isTesting
public static boolean isTesting() -
tryOrIOException
public static <T> T tryOrIOException(scala.Function0<T> block) -
tryWithResource
public static <R extends Closeable,T> T tryWithResource(scala.Function0<R> createResource, scala.Function1<R, T> f) -
tryInitializeResource
public static <R extends Closeable,T> T tryInitializeResource(scala.Function0<R> createResource, scala.Function1<R, T> initialize) -
tryWithSafeFinally
public static <T> T tryWithSafeFinally(scala.Function0<T> block, scala.Function0<scala.runtime.BoxedUnit> finallyBlock) -
stackTraceToString
-
resolveURI
-
recursiveList
-
createDirectory
-
createDirectory
-
createDirectory$default$2
-
createTempDir$default$1
-
createTempDir$default$2
-
serialize
public static <T> byte[] serialize(T o) -
deserialize
public static <T> T deserialize(byte[] bytes) -
deserialize
-
copyStream
public static long copyStream(InputStream in, OutputStream out, boolean closeStreams, boolean transferToEnabled) -
copyStream$default$3
public static boolean copyStream$default$3() -
copyStream$default$4
public static boolean copyStream$default$4() -
copyFileStreamNIO
public static void copyFileStreamNIO(FileChannel input, WritableByteChannel output, long startPosition, long bytesToCopy) -
stringToSeq
-