Complete Function Reference

Siege Utilities contains 521 auto-discovered functions.

siege_utilities.core.logging

siege_utilities.get_logger(name=None)[source]

Return a logger instance.

Parameters: name (str, optional) – Logger name. If None, returns (creating it if needed) the default logger.
Returns: Logger instance.
Return type: logging.Logger

Examples

>>> logger = get_logger()  # Gets default logger
>>> db_logger = get_logger("database")  # Gets database logger
>>> api_logger = get_logger("api")  # Gets API logger

siege_utilities.init_logger(name='siege_utilities', log_to_file=False, log_dir='logs', level='INFO', max_bytes=5000000, backup_count=5, shared_log_file=None)[source]

Initialize and configure a named logger.

Parameters:

name (str) – Logger name. Each component can have its own logger.
log_to_file (bool) – If True, creates an individual log file (unless shared_log_file is specified).
log_dir (str) – Directory for individual log files.
level (str|int) – Logging level for this logger.
max_bytes (int) – Maximum size in bytes of a log file before the rotating file handler rolls it over.
backup_count (int) – Number of rotated backup logs to keep.
shared_log_file (str) – If provided, this logger writes to this shared file instead of an individual file.

Returns:

Configured logger instance.

Return type:

logging.Logger

Examples

>>> # Individual loggers with separate files
>>> db_logger = init_logger("database", log_to_file=True, level="DEBUG")
>>> api_logger = init_logger("api", log_to_file=True, level="INFO")

>>> # Multiple loggers sharing one file (great for Spark!)
>>> worker1 = init_logger("worker_1", shared_log_file="spark_workers.log")
>>> worker2 = init_logger("worker_2", shared_log_file="spark_workers.log")

>>> # Use global shared configuration
>>> configure_shared_logging("/shared/logs/app.log")
>>> logger1 = init_logger("component_1")  # Automatically uses shared file
>>> logger2 = init_logger("component_2")  # Automatically uses shared file
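The shared-file pattern above can be approximated with the standard library alone. This is a minimal sketch, not siege_utilities' implementation: `init_shared_logger` and the module-level `_SHARED_HANDLERS` cache are hypothetical names, and the defaults simply mirror `init_logger`'s `max_bytes` and `backup_count`. Reusing one handler object per file matters because separate `RotatingFileHandler` instances on the same path can conflict when the file rotates.

```python
import logging
import logging.handlers
import os
from typing import Dict

# Hypothetical cache: one handler per shared file, reused by every logger
# that writes to it (sketch only, not the library's internals).
_SHARED_HANDLERS: Dict[str, logging.Handler] = {}


def init_shared_logger(name: str, shared_log_file: str, level: str = "INFO",
                       max_bytes: int = 5_000_000,
                       backup_count: int = 5) -> logging.Logger:
    """Attach a shared rotating file handler to the named logger (sketch)."""
    path = os.path.abspath(shared_log_file)
    handler = _SHARED_HANDLERS.get(path)
    if handler is None:
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backup_count
        )
        # %(name)s keeps each component identifiable inside the shared file.
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        _SHARED_HANDLERS[path] = handler
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if handler not in logger.handlers:  # avoid duplicate lines on repeat calls
        logger.addHandler(handler)
    return logger
```

Within a single process this is safe. Writing one file from several processes (for example, separate Spark executors) is not something `RotatingFileHandler` coordinates, so a queue- or socket-based handler is usually preferred there.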

siege_utilities.log_critical(message, logger_name=None)[source]

Log a critical message.

Parameters:

message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_critical("Critical system error")
>>> log_critical("Spark cluster failure", logger_name="spark_master")

siege_utilities.log_debug(message, logger_name=None)[source]

Log a debug message.

Parameters:

message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_debug("Debug information")
>>> log_debug("Database query details", logger_name="database")

siege_utilities.log_error(message, logger_name=None)[source]

Log an error message.

Parameters:

message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_error("An error occurred")
>>> log_error("Database connection failed", logger_name="database")

siege_utilities.log_info(message: str, logger_name=None) → None[source]

Log an info message.

Parameters:

message (str) – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_info("Application started")
>>> log_info("Worker processing task", logger_name="worker_1")

siege_utilities.log_warning(message, logger_name=None)[source]

Log a warning message.

Parameters:

message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_warning("This is a warning")
>>> log_warning("Cache miss detected", logger_name="cache")

siege_utilities.parse_log_level(level)[source]: Convert a string or numeric level into a logging level constant.
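A conversion like `parse_log_level` can be sketched with the standard `logging` module. This is an illustrative version under assumptions, not the library's code: it accepts ints, digit strings, and case-insensitive level names, and raising `ValueError` for unknown names is a choice made here.

```python
import logging


def parse_log_level(level) -> int:
    """Convert 'debug', 'INFO', 10 or '20' into a logging level constant (sketch)."""
    if isinstance(level, int):
        return level  # already numeric, e.g. logging.DEBUG
    text = str(level).strip()
    if text.isdigit():
        return int(text)
    # getLevelName maps a registered name such as "WARNING" to its number;
    # for unknown names it returns a string like "Level LOUD".
    value = logging.getLevelName(text.upper())
    if not isinstance(value, int):
        raise ValueError(f"Unknown log level: {level!r}")
    return value
```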

siege_utilities.core.string_utils

siege_utilities.distributed.hdfs_config

siege_utilities.create_cluster_config(data_path: str, **kwargs) → HDFSConfig[source]: Create config optimized for cluster deployment

siege_utilities.create_geocoding_config(data_path: str, **kwargs) → HDFSConfig[source]: Create config optimized for geocoding workloads

siege_utilities.create_hdfs_config(**kwargs) → HDFSConfig[source]: Factory function to create HDFS configuration

siege_utilities.create_local_config(data_path: str, **kwargs) → HDFSConfig[source]: Create config optimized for local development

siege_utilities.dataclass(cls=None, /, *, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)[source]

Re-export of the standard library's dataclasses.dataclass decorator, picked up by auto-discovery. Adds dunder methods based on the fields defined in the class.

Examines PEP 526 __annotations__ to determine fields.

If init is true, an __init__() method is added to the class. If repr is true, a __repr__() method is added. If order is true, rich comparison dunder methods are added. If unsafe_hash is true, a __hash__() method is added. If frozen is true, fields may not be assigned to after instance creation. If match_args is true, the __match_args__ tuple is added. If kw_only is true, then by default all fields are keyword-only. If slots is true, a new class with a __slots__ attribute is returned.

siege_utilities.distributed.hdfs_legacy

siege_utilities.check_hdfs_status()[source]: Check if HDFS is accessible

siege_utilities.get_quick_file_signature(file_path)[source]

Perform file operations: get a quick signature for a file.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Returns: Not yet documented.

Example

>>> import siege_utilities
>>> result = siege_utilities.get_quick_file_signature("data/input.csv")  # hypothetical path
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.

siege_utilities.distributed.hdfs_operations

siege_utilities.create_hdfs_operations(config)[source]: Factory function to create HDFS operations instance

siege_utilities.setup_distributed_environment(config, data_path: str | None = None, dependency_paths: List[str] | None = None)[source]: Convenience function to set up distributed environment

siege_utilities.distributed.spark_utils

siege_utilities.files.hashing

siege_utilities.calculate_file_hash(file_path) → str | None[source]: Alias for get_file_hash with SHA256 - for backward compatibility

siege_utilities.generate_sha256_hash_for_file(file_path) → str | None[source]

Generate a SHA256 hash for a file, reading it in chunks so large files are not loaded into memory at once.

Parameters: file_path – Path to the file (str or Path object)
Returns: SHA256 hash as a hexadecimal string, or None on error

siege_utilities.get_file_hash(file_path, algorithm='sha256') → str | None[source]

Generate a hash for a file using the specified algorithm.

Parameters:

file_path – Path to the file (str or Path object)
algorithm – Hash algorithm to use (‘sha256’, ‘md5’, ‘sha1’, etc.)

Returns:

Hash as hexadecimal string, or None if error
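Chunked file hashing of the kind described for `get_file_hash` and `generate_sha256_hash_for_file` can be sketched with `hashlib.new`, which accepts algorithm names such as 'sha256', 'md5' or 'sha1'. This is a sketch, not the library's source: the name `hash_file` and the `chunk_size` parameter are hypothetical, while returning None on error follows the documented behavior.

```python
import hashlib
from pathlib import Path
from typing import Optional, Union


def hash_file(file_path: Union[str, Path], algorithm: str = "sha256",
              chunk_size: int = 1 << 16) -> Optional[str]:
    """Hash a file in fixed-size chunks; return None on I/O or algorithm errors (sketch)."""
    try:
        digest = hashlib.new(algorithm)  # raises ValueError for unknown names
        with Path(file_path).open("rb") as handle:
            # iter() with a b"" sentinel stops at the first empty read (EOF).
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()
    except (OSError, ValueError):
        return None
```

Reading in chunks keeps memory use constant regardless of file size; the digest is identical to hashing the whole file in one call.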

siege_utilities.test_hash_functions()[source]: Test the hash functions with a temporary file

siege_utilities.verify_file_integrity(file_path, expected_hash, algorithm='sha256') → bool[source]

Verify file integrity by comparing the file's hash with an expected hash.

Parameters:

file_path – Path to the file
expected_hash – Expected hash value
algorithm – Hash algorithm used

Returns:

True if file matches expected hash, False otherwise
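The integrity check reduces to hashing the file and comparing hex digests. A minimal sketch, assuming a missing or unreadable file should count as a mismatch and that the expected hash may arrive in either letter case; `file_matches_hash` is a hypothetical name, not the library function.

```python
import hashlib
from pathlib import Path


def file_matches_hash(file_path, expected_hash: str, algorithm: str = "sha256") -> bool:
    """Return True only if the file can be read and its digest equals expected_hash (sketch)."""
    digest = hashlib.new(algorithm)
    try:
        with Path(file_path).open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 16), b""):
                digest.update(chunk)
    except OSError:
        return False  # assumption: an unreadable or missing file cannot match
    # hexdigest() is lowercase, so normalise the expected value before comparing.
    return digest.hexdigest() == expected_hash.strip().lower()
```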

siege_utilities.files.operations

siege_utilities.check_for_file_type_in_directory(target_file_path: Path, file_type: str) → bool[source]

Parameters:

target_file_path
file_type

Returns:

bool

siege_utilities.check_if_file_exists_at_path(target_file_path: Path) → bool[source]

Parameters: target_file_path – Path to check for an existing file
Returns: True if the file exists, False otherwise

siege_utilities.count_duplicate_rows_in_file_using_awk(target_file_path: Path) → int[source]

Count duplicate rows in a file using an awk pattern from Justin Hernandez.

Parameters: target_file_path – pathlib.Path object of the file whose duplicate rows are counted
Returns: count of duplicate rows in the file
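The awk pattern itself is not reproduced in this reference. As an assumption, the sketch below treats a "duplicate" as every repeat of a line after its first occurrence, which is what the common `awk 'seen[$0]++' file | wc -l` idiom counts. `count_duplicate_rows` is a hypothetical pure-Python analogue, not the library function.

```python
from collections import Counter
from pathlib import Path


def count_duplicate_rows(target_file_path: Path) -> int:
    """Count repeats of each line beyond its first occurrence (sketch)."""
    with Path(target_file_path).open(encoding="utf-8") as handle:
        # rstrip("\n") so a final line without a newline still matches its twins.
        counts = Counter(line.rstrip("\n") for line in handle)
    return sum(n - 1 for n in counts.values())
```

For example, a file with the lines a, b, a, a, c, b has three duplicate rows: two extra copies of "a" and one extra copy of "b".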

siege_utilities.count_empty_rows_in_file_pythonically(target_file_path: Path) → int[source]

Parameters: target_file_path – pathlib.Path object of the file whose empty rows are counted
Returns: count of empty rows in the file

siege_utilities.count_empty_rows_in_file_using_awk(target_file_path: Path) → int[source]

Parameters: target_file_path – pathlib.Path object of the file whose empty rows are counted
Returns: count of empty rows in the file

siege_utilities.count_total_rows_in_file_pythonically(target_file_path: Path) → int[source]

Parameters: target_file_path – pathlib.Path object of the file whose rows are counted
Returns: count of total rows in the file

siege_utilities.count_total_rows_in_file_using_sed(target_file_path: Path) → int[source]

Parameters: target_file_path – pathlib.Path object of the file whose total rows are counted
Returns: count of total rows in the file
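The "pythonically" counters can be pictured as a single pass over the file. This sketch is hypothetical (`count_total_and_empty_rows` is not a library function) and assumes a whitespace-only line counts as empty; the awk and sed variants may apply a stricter test, such as matching only truly blank lines.

```python
from pathlib import Path
from typing import Tuple


def count_total_and_empty_rows(target_file_path: Path) -> Tuple[int, int]:
    """Return (total_rows, empty_rows) in one pass over the file (sketch)."""
    total = empty = 0
    with Path(target_file_path).open(encoding="utf-8") as handle:
        for line in handle:
            total += 1
            if not line.strip():  # assumption: whitespace-only lines are empty
                empty += 1
    return total, empty
```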

siege_utilities.delete_existing_file_and_replace_it_with_an_empty_file(target_file_path: Path) → Path[source]

Delete the existing file and replace it with an empty file.

Parameters: target_file_path – pathlib.Path object of the file to replace
Returns: pathlib.Path object of the new, empty file
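The delete-then-recreate behavior maps onto two `pathlib` calls. A minimal sketch; `reset_to_empty_file` is a hypothetical name, and tolerating a file that does not exist yet (`missing_ok=True`, Python 3.8+) is an assumption made here.

```python
from pathlib import Path


def reset_to_empty_file(target_file_path: Path) -> Path:
    """Delete the file if present, then create an empty one in its place (sketch)."""
    path = Path(target_file_path)
    path.unlink(missing_ok=True)  # assumption: a missing file is not an error
    path.touch()                  # creates a zero-byte file
    return path
```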

siege_utilities.remove_empty_rows_in_file_using_sed(target_file_path: Path, fixed_file_path: Path = None)[source]

Parameters:

target_file_path – pathlib.Path object of the file whose empty rows are removed
fixed_file_path – pathlib.Path object where the fixed file is saved (optional)
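A pure-Python counterpart to the sed-based cleanup is sketched below. The name `remove_empty_rows` is hypothetical, and two behaviors are assumptions rather than documented facts: whitespace-only lines are dropped, and the file is rewritten in place when no `fixed_file_path` is given.

```python
from pathlib import Path
from typing import Optional


def remove_empty_rows(target_file_path: Path,
                      fixed_file_path: Optional[Path] = None) -> Path:
    """Write the file without empty lines; return the path written (sketch)."""
    source = Path(target_file_path)
    # Assumption: with no fixed_file_path, overwrite the source in place.
    destination = Path(fixed_file_path) if fixed_file_path is not None else source
    # Read everything first so an in-place rewrite never truncates unread data.
    with source.open(encoding="utf-8") as handle:
        kept = [line for line in handle if line.strip()]
    destination.write_text("".join(kept), encoding="utf-8")
    return destination
```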

siege_utilities.rmtree(f: Path)[source]

Utility function: rmtree.

Part of the Siege Utilities Utilities module. Auto-discovered and available at package level.

Returns: Not yet documented.

Example

>>> from pathlib import Path
>>> import siege_utilities
>>> result = siege_utilities.rmtree(Path("scratch/old_run"))  # hypothetical directory
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.

siege_utilities.write_data_to_a_new_empty_file(target_file_path: Path, data: str) → Path[source]

Parameters:

target_file_path – file path to write data to
data – string data to write

Returns:

the path to the file

siege_utilities.write_data_to_an_existing_file(target_file_path: Path, data: str) → Path[source]

Parameters:

target_file_path – file path to write data to
data – string data to write

Returns:

the path to the file
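The two writers differ mainly in file mode. This sketch assumes "new empty file" means write mode, which truncates any existing content, and "existing file" means append mode; both interpretations, and the names `write_new_file` and `append_to_file`, are hypothetical.

```python
from pathlib import Path


def write_new_file(target_file_path: Path, data: str) -> Path:
    """Write data to a fresh file, truncating any previous contents (sketch)."""
    path = Path(target_file_path)
    path.write_text(data, encoding="utf-8")  # opens in "w" mode
    return path


def append_to_file(target_file_path: Path, data: str) -> Path:
    """Append data to the end of a file, creating it if needed (sketch)."""
    path = Path(target_file_path)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(data)
    return path
```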

siege_utilities.files.paths

siege_utilities.ensure_path_exists(desired_path: Path) → Path[source]

Perform file operations: ensure a path exists.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Returns: Not yet documented.

Example

>>> from pathlib import Path
>>> import siege_utilities
>>> result = siege_utilities.ensure_path_exists(Path("output/reports"))  # hypothetical path
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.
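Assuming `desired_path` names a directory, "ensure a path exists" is typically an idempotent `mkdir`. The sketch below (`ensure_directory` is a hypothetical name) creates missing parents and does nothing if the directory is already there.

```python
from pathlib import Path


def ensure_directory(desired_path) -> Path:
    """Create the directory and any missing parents; safe to call repeatedly (sketch)."""
    path = Path(desired_path)
    path.mkdir(parents=True, exist_ok=True)
    return path
```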

siege_utilities.unzip_file_to_its_own_directory(path_to_zipfile: Path, new_dir_name=None, new_dir_parent=None)[source]

Perform file operations: unzip a file into its own directory.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Returns: Not yet documented.

Example

>>> from pathlib import Path
>>> import siege_utilities
>>> result = siege_utilities.unzip_file_to_its_own_directory(Path("downloads/archive.zip"))  # hypothetical archive
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.
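One plausible reading of `unzip_file_to_its_own_directory`, sketched with the standard `zipfile` module: by default the archive is extracted into a folder named after the zip's stem, next to the zip. Those defaults and the name `unzip_to_own_directory` are assumptions for illustration; the library's actual behavior is undocumented here.

```python
import zipfile
from pathlib import Path
from typing import Optional


def unzip_to_own_directory(path_to_zipfile, new_dir_name: Optional[str] = None,
                           new_dir_parent=None) -> Path:
    """Extract an archive into its own folder and return that folder (sketch)."""
    zip_path = Path(path_to_zipfile)
    # Assumed defaults: folder named after the archive, beside the archive.
    parent = Path(new_dir_parent) if new_dir_parent is not None else zip_path.parent
    target = parent / (new_dir_name or zip_path.stem)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```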

siege_utilities.files.remote

siege_utilities.download_file(url, local_filename)[source]

Download a file from a URL to a local path, displaying a progress bar.

Parameters:

url – The URL to download from
local_filename – The local path where the file should be saved

Returns:

The local filename if successful, False otherwise

siege_utilities.generate_local_path_from_url(url: str, directory_path: Path, as_string: bool = True)[source]

Perform file operations: generate a local path from a URL.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Returns: Not yet documented.

Example

>>> from pathlib import Path
>>> import siege_utilities
>>> result = siege_utilities.generate_local_path_from_url("https://example.com/data/file.zip", Path("downloads"))  # hypothetical URL and directory
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.
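Mapping a URL to a local file path usually means taking the last segment of the URL path and joining it onto a directory. The sketch below is a guess at that behavior, not the library's code: `local_path_from_url` is hypothetical, query strings are ignored, percent-escapes are decoded, and a URL without a file name raises `ValueError`. The `as_string` flag mirrors the documented parameter.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def local_path_from_url(url: str, directory_path, as_string: bool = True):
    """Join the URL's final path segment onto directory_path (sketch)."""
    # urlparse drops the query string; unquote turns %20 back into a space.
    filename = Path(unquote(urlparse(url).path)).name
    if not filename:
        raise ValueError(f"URL has no file name component: {url!r}")
    local_path = Path(directory_path) / filename
    return str(local_path) if as_string else local_path
```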

siege_utilities.files.shell

siege_utilities.run_subprocess(command_list)[source]

Run a shell command as a subprocess and handle the output.

Parameters: command_list – The command to run, as a list or string
Returns: The command output (stdout if successful, stderr if failed)
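The documented contract (list or string in; stdout on success, stderr on failure) can be sketched with `subprocess.run`. This is an illustration under assumptions, not the library's implementation: `run_command` is a hypothetical name, string commands are split with `shlex.split` rather than run through a shell, and a missing executable raises `FileNotFoundError` instead of returning output.

```python
import shlex
import subprocess


def run_command(command_list):
    """Run a command; return stdout on exit code 0, otherwise stderr (sketch)."""
    # Assumption: strings are tokenised POSIX-style instead of using shell=True.
    args = shlex.split(command_list) if isinstance(command_list, str) else list(command_list)
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.stdout if completed.returncode == 0 else completed.stderr
```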
