Spark Utilities
The Spark utilities module provides PySpark functions for mathematical operations, array manipulation, aggregation, date arithmetic, and encryption.
Module Overview
Functions by Category
Mathematical Functions
Array Functions
Aggregation Functions
Cryptographic Functions
Usage Examples
Basic mathematical operations:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
import siege_utilities
spark = SparkSession.builder.appName("MathExample").getOrCreate()
# Create sample data
data = [("A", 1.5), ("B", -2.3), ("C", 0.0)]
df = spark.createDataFrame(data, ["id", "value"])
# Apply mathematical functions
df = df.withColumn("abs_value", siege_utilities.abs(col("value")))
# acos is defined only for inputs in [-1, 1]
df = df.withColumn("acos_value", siege_utilities.acos(col("value")))
df.show()
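Two of the sample values, 1.5 and -2.3, fall outside the domain of arccosine. Spark's built-in acos returns NaN for such inputs instead of raising an error. Assuming siege_utilities.acos behaves like that built-in (the source does not confirm this), only row C would yield a number. The plain-Python sketch below models that per-row behaviour; acos_or_nan is a hypothetical helper written for illustration and is not part of siege_utilities.

```python
import math


def acos_or_nan(x: float) -> float:
    """Hypothetical helper: arccosine that yields NaN outside [-1, 1].

    Models how Spark's built-in acos treats out-of-domain input;
    Python's math.acos would raise ValueError instead.
    """
    if -1.0 <= x <= 1.0:
        return math.acos(x)
    return math.nan


# Same values as the sample DataFrame
for row_id, value in [("A", 1.5), ("B", -2.3), ("C", 0.0)]:
    print(row_id, abs(value), acos_or_nan(value))
```

If NaN results are unwanted, one option is to filter or clamp the input column to [-1, 1] before applying acos.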
Array operations:
# Array manipulation
df = df.withColumn("array_col", siege_utilities.array(col("id"), col("value")))
df = df.withColumn("distinct_array", siege_utilities.array_distinct(col("array_col")))
df = df.withColumn("array_contains", siege_utilities.array_contains(col("array_col"), "A"))
# Array aggregation
df = df.groupBy("id").agg(
    siege_utilities.array_agg(col("value")).alias("all_values")
)
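The array helpers used here share their names with Spark's built-in array functions. Assuming they also share the semantics (the source does not say), the following plain-Python sketch models what each one computes. The functions are illustrative stand-ins, not siege_utilities code; the sketch ignores Spark's NULL handling, and keeping the first occurrence in array_distinct is an assumption about ordering.

```python
from collections import defaultdict


def array_distinct(values):
    """Drop duplicate elements, keeping the first occurrence of each."""
    return list(dict.fromkeys(values))


def array_contains(values, target):
    """Return True if target is an element of values."""
    return target in values


def array_agg_by_key(rows):
    """Group (key, value) pairs and collect each key's values into a list."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


rows = [("A", 1.5), ("A", 0.5), ("B", -2.3)]
print(array_distinct(["A", "1.5", "A"]))   # ['A', '1.5']
print(array_contains(["A", "1.5"], "A"))   # True
print(array_agg_by_key(rows))              # {'A': [1.5, 0.5], 'B': [-2.3]}
```

The last call mirrors the groupBy("id").agg(array_agg(...)) pattern: one output entry per key, holding every value seen for that key.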
Date operations:
from pyspark.sql.functions import current_date
# Add months to current date
df = df.withColumn("future_date",
                   siege_utilities.add_months(current_date(), 3))
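Month arithmetic has one subtlety: the target month may be shorter than the start day allows. Spark's built-in add_months (3.0 and later) resolves this by clamping to the last day of the target month, so 2016-08-31 plus one month gives 2016-09-30. Assuming siege_utilities.add_months follows the same rule, which the source does not state, the logic can be sketched in plain Python. The function below is a hypothetical stand-in for illustration only.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    # Work in months since year 0 so negative offsets are handled too
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


print(add_months(date(2016, 8, 31), 1))    # 2016-09-30
print(add_months(date(2024, 11, 30), 3))   # 2025-02-28
```

Under this rule the day is only adjusted when it does not exist in the target month, so 2025-02-28 plus one month stays on the 28th.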
Unit Tests
The Spark utilities module is covered by unit tests:
✅ test_spark_utils.py - All Spark utility tests pass
Test Coverage:
- Mathematical functions (abs, acos, acosh)
- Array operations (creation, manipulation, aggregation)
- Date functions (add_months)
- Aggregation functions (aggregate, any_value)
- Cryptographic functions (AES encryption/decryption)
- Edge cases and error handling
Test Results: All Spark utility tests pass.