Complete Function Reference
Siege Utilities contains 521 auto-discovered functions.
siege_utilities.core.logging
- siege_utilities.get_logger(name=None)[source]
Return a logger instance.
- Parameters:
name (str, optional) – Logger name. If None, returns/creates the default logger.
- Returns:
Logger instance.
- Return type:
logging.Logger
Examples
>>> logger = get_logger()  # Gets default logger
>>> db_logger = get_logger("database")  # Gets database logger
>>> api_logger = get_logger("api")  # Gets API logger
- siege_utilities.init_logger(name='siege_utilities', log_to_file=False, log_dir='logs', level='INFO', max_bytes=5000000, backup_count=5, shared_log_file=None)[source]
Initialize and configure a named logger.
- Parameters:
name (str) – Logger name. Each component can have its own logger.
log_to_file (bool) – If True, creates individual log file (unless shared_log_file specified).
log_dir (str) – Directory for individual log files.
level (str|int) – Logging level for this logger.
max_bytes (int) – Max size for rotating file handler.
backup_count (int) – How many backup logs to keep.
shared_log_file (str) – If provided, this logger writes to shared file instead of individual file.
- Returns:
Configured logger instance.
- Return type:
logging.Logger
Examples
>>> # Individual loggers with separate files
>>> db_logger = init_logger("database", log_to_file=True, level="DEBUG")
>>> api_logger = init_logger("api", log_to_file=True, level="INFO")
>>> # Multiple loggers sharing one file (great for Spark!)
>>> worker1 = init_logger("worker_1", shared_log_file="spark_workers.log")
>>> worker2 = init_logger("worker_2", shared_log_file="spark_workers.log")
>>> # Use global shared configuration
>>> configure_shared_logging("/shared/logs/app.log")
>>> logger1 = init_logger("component_1")  # Automatically uses shared file
>>> logger2 = init_logger("component_2")  # Automatically uses shared file
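The max_bytes and backup_count parameters suggest a size-based rotating file handler. The following is a minimal sketch of that pattern using only the standard library; it is not the siege_utilities implementation. The function name make_rotating_logger, the "<name>.log" file naming, and the log format are assumptions made for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_rotating_logger(name, log_dir="logs", level="INFO",
                         max_bytes=5_000_000, backup_count=5):
    """Hypothetical stand-in for init_logger(name, log_to_file=True)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)  # accepts a level name ("INFO") or an int

    # Assumption: one file per logger, named after the logger.
    log_path = Path(log_dir) / f"{name}.log"
    log_path.parent.mkdir(parents=True, exist_ok=True)

    # Do not stack a second file handler if the logger was already set up.
    if not any(isinstance(h, RotatingFileHandler) for h in logger.handlers):
        handler = RotatingFileHandler(log_path, maxBytes=max_bytes,
                                      backupCount=backup_count)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Once the active file reaches max_bytes, the handler renames it with a numeric suffix and starts a new file, keeping at most backup_count old files.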
- siege_utilities.log_critical(message, logger_name=None)[source]
Log a critical message.
- Parameters:
message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_critical("Critical system error")
>>> log_critical("Spark cluster failure", logger_name="spark_master")
- siege_utilities.log_debug(message, logger_name=None)[source]
Log a debug message.
- Parameters:
message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_debug("Debug information")
>>> log_debug("Database query details", logger_name="database")
- siege_utilities.log_error(message, logger_name=None)[source]
Log an error message.
- Parameters:
message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_error("An error occurred")
>>> log_error("Database connection failed", logger_name="database")
- siege_utilities.log_info(message: str, logger_name=None) → None [source]
Log an info message.
- Parameters:
message (str) – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_info("Application started")
>>> log_info("Worker processing task", logger_name="worker_1")
- siege_utilities.log_warning(message, logger_name=None)[source]
Log a warning message.
- Parameters:
message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_warning("This is a warning")
>>> log_warning("Cache miss detected", logger_name="cache")
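The log_* helpers all share the same shape: a message plus an optional logger_name that falls back to a default. A sketch of how such helpers could dispatch to the standard logging module follows. The names log_at and log_warning_sketch are hypothetical, and the default logger name is assumed from the default of init_logger; the library's own code may differ.

```python
import logging

# Assumption: the default logger carries init_logger's default name.
DEFAULT_LOGGER_NAME = "siege_utilities"


def log_at(level, message, logger_name=None):
    """Hypothetical dispatcher shared by the per-level helpers."""
    logger = logging.getLogger(logger_name or DEFAULT_LOGGER_NAME)
    logger.log(level, message)


def log_warning_sketch(message, logger_name=None):
    """Illustrative equivalent of a warning-level helper."""
    log_at(logging.WARNING, message, logger_name)
```

Routing every helper through one dispatcher keeps the logger lookup in a single place, so named loggers such as "cache" or "database" pick up their own handlers and levels.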
siege_utilities.core.string_utils
siege_utilities.distributed.hdfs_config
- siege_utilities.create_cluster_config(data_path: str, **kwargs) → HDFSConfig [source]
Create a config optimized for cluster deployment.
- siege_utilities.create_geocoding_config(data_path: str, **kwargs) → HDFSConfig [source]
Create a config optimized for geocoding workloads.
- siege_utilities.create_hdfs_config(**kwargs) → HDFSConfig [source]
Factory function to create an HDFS configuration.
- siege_utilities.create_local_config(data_path: str, **kwargs) → HDFSConfig [source]
Create a config optimized for local development.
- siege_utilities.dataclass(cls=None, /, *, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)[source]
Add dunder methods based on the fields defined in the class.
Examines PEP 526 __annotations__ to determine fields.
If init is true, an __init__() method is added to the class. If repr is true, a __repr__() method is added. If order is true, rich comparison dunder methods are added. If unsafe_hash is true, a __hash__() method is added. If frozen is true, fields may not be assigned to after instance creation. If match_args is true, the __match_args__ tuple is added. If kw_only is true, then by default all fields are keyword-only. If slots is true, a new class with a __slots__ attribute is returned.
siege_utilities.distributed.hdfs_legacy
- siege_utilities.get_quick_file_signature(file_path)[source]
Perform file operations: get quick file signature.
Part of Siege Utilities File Operations module. Auto-discovered and available at package level.
- Returns:
Description needed
Example
>>> import siege_utilities
>>> result = siege_utilities.get_quick_file_signature("path/to/file")  # placeholder path
>>> print(result)
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
siege_utilities.distributed.hdfs_operations
siege_utilities.distributed.spark_utils
siege_utilities.files.hashing
- siege_utilities.calculate_file_hash(file_path) → str | None [source]
Alias for get_file_hash with SHA256 - for backward compatibility
- siege_utilities.generate_sha256_hash_for_file(file_path) → str | None [source]
Generate SHA256 hash for a file - chunked reading for large files
- Parameters:
file_path – Path to the file (str or Path object)
- Returns:
SHA256 hash as hexadecimal string, or None if error
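Chunked reading keeps memory use constant regardless of file size. The sketch below shows the general technique with hashlib; the function name sha256_of_file and the 64 KiB chunk size are illustrative choices, not taken from the library.

```python
import hashlib
from pathlib import Path


def sha256_of_file(file_path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, or None if it cannot be read."""
    digest = hashlib.sha256()
    try:
        with Path(file_path).open("rb") as fh:
            # Feed the hash one chunk at a time so a large file is never
            # loaded into memory in full.
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()
```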
- siege_utilities.get_file_hash(file_path, algorithm='sha256') → str | None [source]
Generate hash for a file using specified algorithm
- Parameters:
file_path – Path to the file (str or Path object)
algorithm – Hash algorithm to use (‘sha256’, ‘md5’, ‘sha1’, etc.)
- Returns:
Hash as hexadecimal string, or None if error
- siege_utilities.verify_file_integrity(file_path, expected_hash, algorithm='sha256') → bool [source]
Verify file integrity by comparing with expected hash
- Parameters:
file_path – Path to the file
expected_hash – Expected hash value
algorithm – Hash algorithm used
- Returns:
True if file matches expected hash, False otherwise
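Integrity verification amounts to recomputing the digest and comparing it with the expected value. A minimal standard-library sketch follows; the name file_matches_hash is hypothetical, and normalizing the expected hash by case and surrounding whitespace is an assumption rather than documented behavior.

```python
import hashlib
from pathlib import Path


def file_matches_hash(file_path, expected_hash, algorithm="sha256"):
    """Return True if the file's digest equals expected_hash, else False."""
    try:
        digest = hashlib.new(algorithm)  # ValueError on an unknown algorithm
        with Path(file_path).open("rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
    except (OSError, ValueError):
        return False
    # Assumption: hex digests are compared case-insensitively.
    return digest.hexdigest() == expected_hash.strip().lower()
```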
siege_utilities.files.operations
- siege_utilities.check_for_file_type_in_directory(target_file_path: Path, file_type: str) → bool [source]
- Parameters:
target_file_path
file_type
- Returns:
bool
- siege_utilities.check_if_file_exists_at_path(target_file_path: Path) → bool [source]
- Parameters:
target_file_path – This is the path we are going to check to see if a file exists
- Returns:
True if file exists, False otherwise
- siege_utilities.count_duplicate_rows_in_file_using_awk(target_file_path: Path) → int [source]
Count duplicate rows in a file using an awk pattern from Justin Hernandez.
- Parameters:
target_file_path – pathlib.Path object that we are going to count the duplicate rows of
- Returns:
count of duplicate rows in file
- siege_utilities.count_empty_rows_in_file_pythonically(target_file_path: Path) → int [source]
- Parameters:
target_file_path – pathlib.Path object that we are going to count the empty rows of
- Returns:
count of empty rows in file
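A pure-Python empty-row count can stream the file line by line. The sketch below treats a row as empty when it contains only whitespace; that definition, the UTF-8 decoding, and the name count_empty_rows are assumptions, since the reference does not specify them.

```python
from pathlib import Path


def count_empty_rows(target_file_path: Path) -> int:
    """Count rows that are blank or contain only whitespace."""
    with Path(target_file_path).open("r", encoding="utf-8",
                                     errors="replace") as fh:
        # Iterating the file object reads one line at a time.
        return sum(1 for line in fh if not line.strip())
```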
- siege_utilities.count_empty_rows_in_file_using_awk(target_file_path: Path) → int [source]
- Parameters:
target_file_path – pathlib.Path object that we are going to count the empty rows of
- Returns:
count of empty rows in file
- siege_utilities.count_total_rows_in_file_pythonically(target_file_path: Path) → int [source]
- Parameters:
target_file_path – pathlib.Path object that we are going to count the rows of
- Returns:
count of total rows in file
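Counting total rows in Python can be done without decoding the file at all. This is a sketch under the assumption that a final line without a trailing newline still counts as a row (tools that count newline characters would report one fewer in that case); count_total_rows is a hypothetical name.

```python
from pathlib import Path


def count_total_rows(target_file_path: Path) -> int:
    """Count rows by iterating the file in binary mode."""
    with Path(target_file_path).open("rb") as fh:
        # Binary iteration splits on b"\n" and skips text decoding.
        return sum(1 for _ in fh)
```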
- siege_utilities.count_total_rows_in_file_using_sed(target_file_path: Path) → int [source]
- Parameters:
target_file_path – pathlib.Path object that we are going to count the total rows of
- Returns:
count of total rows in file
- siege_utilities.delete_existing_file_and_replace_it_with_an_empty_file(target_file_path: Path) → Path [source]
Delete the existing file and replace it with an empty file.
- Parameters:
target_file_path – pathlib.Path object to interact with
- Returns:
pathlib.Path object to interact with
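The delete-then-recreate behavior described above can be sketched with pathlib alone. The name reset_to_empty_file is hypothetical, and tolerating a missing file is an assumption; missing_ok requires Python 3.8 or later.

```python
from pathlib import Path


def reset_to_empty_file(target_file_path: Path) -> Path:
    """Remove the file if present, create an empty one, return its path."""
    target_file_path = Path(target_file_path)
    target_file_path.unlink(missing_ok=True)  # assumption: absence is not an error
    target_file_path.touch()
    return target_file_path
```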
- siege_utilities.remove_empty_rows_in_file_using_sed(target_file_path: Path, fixed_file_path: Path = None)[source]
- Parameters:
target_file_path – pathlib.Path object that we are going to remove the empty rows of
fixed_file_path – pathlib.Path object giving the path where the fixed file is saved
- Returns:
- siege_utilities.rmtree(f: Path)[source]
Utility function: rmtree.
Part of Siege Utilities Utilities module. Auto-discovered and available at package level.
- Returns:
Description needed
Example
>>> import siege_utilities
>>> from pathlib import Path
>>> result = siege_utilities.rmtree(Path("path/to/directory"))  # placeholder path
>>> print(result)
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
siege_utilities.files.paths
- siege_utilities.ensure_path_exists(desired_path: Path) → Path [source]
Perform file operations: ensure path exists.
Part of Siege Utilities File Operations module. Auto-discovered and available at package level.
- Returns:
Description needed
Example
>>> import siege_utilities
>>> from pathlib import Path
>>> result = siege_utilities.ensure_path_exists(Path("path/to/directory"))  # placeholder path
>>> print(result)
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
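The reference gives no description for this function, so the following is only a guess at the behavior its name and signature suggest: create the directory, including missing parents, and hand the path back. The name ensure_dir is hypothetical.

```python
from pathlib import Path


def ensure_dir(desired_path: Path) -> Path:
    """Create desired_path (and parents) if absent, then return it."""
    desired_path = Path(desired_path)
    # exist_ok=True makes the call safe to repeat.
    desired_path.mkdir(parents=True, exist_ok=True)
    return desired_path
```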
- siege_utilities.unzip_file_to_its_own_directory(path_to_zipfile: Path, new_dir_name=None, new_dir_parent=None)[source]
Perform file operations: unzip file to its own directory.
Part of Siege Utilities File Operations module. Auto-discovered and available at package level.
- Returns:
Description needed
Example
>>> import siege_utilities
>>> from pathlib import Path
>>> result = siege_utilities.unzip_file_to_its_own_directory(Path("path/to/archive.zip"))  # placeholder path
>>> print(result)
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
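Since the return value and defaults are undocumented, this sketch only illustrates one plausible reading of the signature using the standard zipfile module. It assumes the directory defaults to the archive's stem and is created next to the archive; unzip_to_own_directory is a hypothetical name.

```python
import zipfile
from pathlib import Path


def unzip_to_own_directory(path_to_zipfile, new_dir_name=None,
                           new_dir_parent=None):
    """Extract an archive into a dedicated directory and return that directory."""
    path_to_zipfile = Path(path_to_zipfile)
    # Assumed defaults: name the directory after the archive, place it beside it.
    dir_name = new_dir_name or path_to_zipfile.stem
    parent = Path(new_dir_parent) if new_dir_parent else path_to_zipfile.parent
    target_dir = parent / dir_name
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(path_to_zipfile) as archive:
        archive.extractall(target_dir)
    return target_dir
```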
siege_utilities.files.remote
- siege_utilities.download_file(url, local_filename)[source]
Download a file from a URL to a local file with progress bar
- Parameters:
url – The URL to download from
local_filename – The local path where the file should be saved
- Returns:
The local filename if successful, False otherwise
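The return contract above (the filename on success, False on failure) can be sketched with urllib from the standard library. This version streams the response to disk and omits the progress bar; download_to is a hypothetical name and the error handling shown is an assumption.

```python
import shutil
import urllib.request


def download_to(url, local_filename, chunk_size=65536):
    """Stream url to local_filename; return the filename, or False on error."""
    try:
        with urllib.request.urlopen(url) as response, \
                open(local_filename, "wb") as out:
            # Copy in chunks so the download is never held in memory in full.
            shutil.copyfileobj(response, out, chunk_size)
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; a malformed URL
        # raises ValueError.
        return False
    return local_filename
```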
- siege_utilities.generate_local_path_from_url(url: str, directory_path: Path, as_string: bool = True)[source]
Perform file operations: generate local path from url.
Part of Siege Utilities File Operations module. Auto-discovered and available at package level.
- Returns:
Description needed
Example
>>> import siege_utilities
>>> from pathlib import Path
>>> result = siege_utilities.generate_local_path_from_url("https://example.com/data.zip", Path("downloads"))  # placeholder URL and directory
>>> print(result)
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
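The behavior is undocumented, so this sketch shows one plausible interpretation of the signature: take the final path segment of the URL as the filename and join it to directory_path, returning a string or a Path depending on as_string. The name local_path_from_url is hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def local_path_from_url(url: str, directory_path: Path, as_string: bool = True):
    """Join the URL's final path segment onto directory_path."""
    # urlparse separates the path from any query string or fragment.
    file_name = PurePosixPath(unquote(urlparse(url).path)).name
    local_path = Path(directory_path) / file_name
    return str(local_path) if as_string else local_path
```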