Spark Utilities

The Spark utilities module provides PySpark functions for data manipulation, including mathematical, array, date, aggregation, and cryptographic operations.

Module Overview

Functions by Category

Mathematical Functions

Array Functions

Aggregation Functions

Cryptographic Functions

Usage Examples

Basic mathematical operations:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col
import siege_utilities

spark = SparkSession.builder.appName("MathExample").getOrCreate()

# Create sample data
data = [("A", 1.5), ("B", -2.3), ("C", 0.0)]
df = spark.createDataFrame(data, ["id", "value"])

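# Apply mathematical functions to the "value" column
df = df.withColumn("abs_value", siege_utilities.abs(col("value")))
# acos is only defined on [-1, 1]; if this wraps Spark's acos, rows A and B yield NaN
df = df.withColumn("acos_value", siege_utilities.acos(col("value")))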

df.show()
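Rows A and B fall outside the domain of acos. The plain-Python sketch below models the per-row results you would expect from the example, assuming siege_utilities.abs and siege_utilities.acos behave like pyspark.sql.functions.abs and acos. Spark documents acos as computed by java.lang.Math.acos, which returns NaN rather than raising an error outside [-1, 1]. The helper name spark_like_acos is hypothetical.

```python
import math


def spark_like_acos(x):
    """Hypothetical reference for Spark's acos: NULL in, NULL out; NaN outside [-1, 1]."""
    if x is None:
        return None
    if -1.0 <= x <= 1.0:
        return math.acos(x)
    # Python's math.acos raises ValueError here; Java's Math.acos returns NaN.
    return float("nan")


rows = [("A", 1.5), ("B", -2.3), ("C", 0.0)]
expected = {rid: (abs(v), spark_like_acos(v)) for rid, v in rows}

for rid, (abs_value, acos_value) in expected.items():
    print(rid, abs_value, acos_value)
```

Only row C gets a finite acos_value (π/2). If NaN is unwanted, filter or clamp value to [-1, 1] before calling acos.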

Array operations:

# Array manipulation (continues with the DataFrame built above).
# Both elements are doubles, so the array has a single element type.
df = df.withColumn("array_col", siege_utilities.array(col("value"), col("abs_value")))
df = df.withColumn("distinct_array", siege_utilities.array_distinct(col("array_col")))
df = df.withColumn("has_zero", siege_utilities.array_contains(col("array_col"), 0.0))

# Array aggregation
df = df.groupBy("id").agg(
    siege_utilities.array_agg(col("value")).alias("all_values")
)
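array_distinct and array_agg are easiest to reason about, and to write test expectations for, with a plain-Python model. The sketch below assumes the siege_utilities functions follow the documented pyspark.sql.functions behavior. array_distinct keeps one copy of each element, including a single null, matching Spark's documentation example where array(1, 2, 3, null, 3) becomes [1, 2, 3, null]. array_agg, like collect_list, skips nulls. Both Python functions here are hypothetical stand-ins, not the library's code.

```python
from collections import defaultdict


def array_distinct(values):
    """Hypothetical model: keep the first occurrence of each element, in order."""
    out = []
    for v in values:
        if v not in out:  # None is treated like any other value, so it is kept once
            out.append(v)
    return out


def array_agg(rows, key, value):
    """Hypothetical model of groupBy(key).agg(array_agg(value)): nulls are skipped."""
    groups = defaultdict(list)
    for row in rows:
        bucket = groups[row[key]]  # create the group even if every value is null
        if row[value] is not None:
            bucket.append(row[value])
    return dict(groups)


print(array_distinct([1, 2, 3, None, 3]))
print(array_agg(
    [{"id": "A", "value": 1.5}, {"id": "A", "value": None}, {"id": "B", "value": -2.3}],
    "id",
    "value",
))
```

Spark does not guarantee the order in which array_agg collects values after a shuffle, so tests should compare sorted arrays or sets rather than exact lists.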

Date operations:

from pyspark.sql.functions import current_date

# Add months to current date
df = df.withColumn("future_date",
                   siege_utilities.add_months(current_date(), 3))
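add_months shifts the month and, when the target month is shorter, clamps the day to that month's last day. The sketch below models the Spark 3.x rule for pyspark.sql.functions.add_months and assumes siege_utilities.add_months wraps that function. Spark 2.4 and earlier also moved a month-end start date to the end of the target month, which this model does not do. The Python function is a hypothetical illustration that reuses the name for readability.

```python
import calendar
import datetime


def add_months(start, months):
    """Hypothetical model of Spark 3.x add_months on a datetime.date."""
    # Convert to a single month count so offsets carry across year boundaries.
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    # Keep the day of month unless the target month is too short.
    return datetime.date(year, month, min(start.day, last_day))


print(add_months(datetime.date(2016, 8, 31), 1))
print(add_months(datetime.date(2019, 2, 28), 1))
print(add_months(datetime.date(2020, 3, 31), -1))
```

Negative offsets move backward, and the clamp handles leap years through calendar.monthrange.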

Unit Tests

Test status for the Spark utilities module:

✅ test_spark_utils.py - all Spark utility tests pass

Test Coverage:
- Mathematical functions (abs, acos, acosh)
- Array operations (creation, manipulation, aggregation)
- Date functions (add_months)
- Aggregation functions (aggregate, any_value)
- Cryptographic functions (AES encryption/decryption)
- Edge cases and error handling
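Edge-case tests for the mathematical functions must compare doubles that may be NULL or NaN, and NaN never equals itself in Python. Below is a minimal sketch of a comparison helper. It assumes actual values arrive as Python floats or None, for example from the Row objects returned by df.collect(). same_double is a hypothetical name, not part of test_spark_utils.py.

```python
import math


def same_double(actual, expected, rel_tol=1e-12):
    """Hypothetical test helper: NULL matches NULL, NaN matches NaN, else approximate equality."""
    if actual is None or expected is None:
        return actual is None and expected is None
    if math.isnan(expected):
        return math.isnan(actual)
    # isclose treats equal infinities and exactly equal values as close.
    return math.isclose(actual, expected, rel_tol=rel_tol)


cases = [
    (float("nan"), float("nan")),       # acos outside [-1, 1]
    (None, None),                       # NULL input propagates
    (1.5707963267948966, math.pi / 2),  # acos(0.0)
]
print(all(same_double(a, e) for a, e in cases))
```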
