Complete Function Reference

Siege Utilities contains 521 auto-discovered functions.

siege_utilities.core.logging

siege_utilities.get_logger(name=None)

Return a logger instance.

Parameters:

name (str, optional) – Logger name. If None, returns/creates the default logger.

Returns:

Logger instance.

Return type:

logging.Logger

Examples

>>> logger = get_logger()  # Gets default logger
>>> db_logger = get_logger("database")  # Gets database logger
>>> api_logger = get_logger("api")  # Gets API logger

siege_utilities.init_logger(name='siege_utilities', log_to_file=False, log_dir='logs', level='INFO', max_bytes=5000000, backup_count=5, shared_log_file=None)

Initialize and configure a named logger.

Parameters:
  • name (str) – Logger name. Each component can have its own logger.

  • log_to_file (bool) – If True, creates an individual log file (unless shared_log_file is specified).

  • log_dir (str) – Directory for individual log files.

  • level (str|int) – Logging level for this logger.

  • max_bytes (int) – Max size for rotating file handler.

  • backup_count (int) – How many backup logs to keep.

  • shared_log_file (str) – If provided, this logger writes to the shared file instead of an individual file.

Returns:

Configured logger instance.

Return type:

logging.Logger

Examples

>>> # Individual loggers with separate files
>>> db_logger = init_logger("database", log_to_file=True, level="DEBUG")
>>> api_logger = init_logger("api", log_to_file=True, level="INFO")
>>> # Multiple loggers sharing one file (great for Spark!)
>>> worker1 = init_logger("worker_1", shared_log_file="spark_workers.log")
>>> worker2 = init_logger("worker_2", shared_log_file="spark_workers.log")
>>> # Use global shared configuration
>>> configure_shared_logging("/shared/logs/app.log")
>>> logger1 = init_logger("component_1")  # Automatically uses shared file
>>> logger2 = init_logger("component_2")  # Automatically uses shared file
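
The max_bytes and backup_count parameters correspond naturally to the standard library's logging.handlers.RotatingFileHandler. The sketch below is a hypothetical stand-in (sketch_init_logger is not part of siege_utilities, and the log file naming and shared-file precedence are assumptions) showing how such a logger could be wired up; the real init_logger may differ, for example in how it honours configure_shared_logging, which this sketch ignores.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def sketch_init_logger(name="siege_utilities", log_to_file=False, log_dir="logs",
                       level="INFO", max_bytes=5_000_000, backup_count=5,
                       shared_log_file=None):
    """Hypothetical stand-in for init_logger; not the library's code."""
    logger = logging.getLogger(name)
    logger.setLevel(level)  # accepts upper-case names such as "INFO" or ints such as 20

    # Assumption: a shared file takes precedence over an individual file.
    if shared_log_file is not None:
        path = shared_log_file
    elif log_to_file:
        os.makedirs(log_dir, exist_ok=True)
        path = os.path.join(log_dir, f"{name}.log")  # assumed file naming
    else:
        path = None

    if path is not None:
        # Rotate once the file reaches max_bytes, keeping backup_count old files
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))

    # Avoid stacking duplicate handlers if the same logger is initialised twice
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    logger.addHandler(handler)
    return logger
```

Reusing logging.getLogger(name) means every call with the same name returns the same logger object, which is why the sketch clears old handlers before adding a new one.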

siege_utilities.log_critical(message, logger_name=None)

Log a critical message.

Parameters:
  • message – Message to log

  • logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_critical("Critical system error")
>>> log_critical("Spark cluster failure", logger_name="spark_master")

siege_utilities.log_debug(message, logger_name=None)

Log a debug message.

Parameters:
  • message – Message to log

  • logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_debug("Debug information")
>>> log_debug("Database query details", logger_name="database")

siege_utilities.log_error(message, logger_name=None)

Log an error message.

Parameters:
  • message – Message to log

  • logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_error("An error occurred")
>>> log_error("Database connection failed", logger_name="database")

siege_utilities.log_info(message: str, logger_name=None) → None

Log an info message.

Parameters:
  • message (str) – Message to log

  • logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_info("Application started")
>>> log_info("Worker processing task", logger_name="worker_1")

siege_utilities.log_warning(message, logger_name=None)

Log a warning message.

Parameters:
  • message – Message to log

  • logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_warning("This is a warning")
>>> log_warning("Cache miss detected", logger_name="cache")

siege_utilities.parse_log_level(level)

Convert a string or numeric level into a logging level constant.
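
A minimal sketch of the conversion parse_log_level describes, assuming it accepts level names in any case, numeric strings, and integers. sketch_parse_log_level and its name table are hypothetical, not the library's code.

```python
import logging

# Assumed name table; the real function may accept more or fewer spellings.
_LEVEL_NAMES = {
    "CRITICAL": logging.CRITICAL,
    "FATAL": logging.FATAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "WARN": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
    "NOTSET": logging.NOTSET,
}


def sketch_parse_log_level(level):
    """Hypothetical sketch: map "info", "INFO", "20", or 20 to logging.INFO."""
    if isinstance(level, int):
        return level  # already a numeric level
    text = str(level).strip().upper()
    if text.isdigit():
        return int(text)  # numeric string such as "20"
    try:
        return _LEVEL_NAMES[text]
    except KeyError:
        raise ValueError(f"Unknown log level: {level!r}") from None
```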

siege_utilities.core.string_utils

siege_utilities.distributed.hdfs_config

siege_utilities.create_cluster_config(data_path: str, **kwargs) → HDFSConfig

Create an HDFS configuration optimized for cluster deployment.

siege_utilities.create_geocoding_config(data_path: str, **kwargs) → HDFSConfig

Create an HDFS configuration optimized for geocoding workloads.

siege_utilities.create_hdfs_config(**kwargs) → HDFSConfig

Factory function that creates an HDFS configuration.

siege_utilities.create_local_config(data_path: str, **kwargs) → HDFSConfig

Create an HDFS configuration optimized for local development.

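siege_utilities.dataclass(cls=None, /, *, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

Add dunder methods based on the fields defined in the class. This is the standard library's dataclasses.dataclass decorator, exposed at package level because auto-discovery picks up names imported into hdfs_config.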

Examines PEP 526 __annotations__ to determine fields.

If init is true, an __init__() method is added to the class. If repr is true, a __repr__() method is added. If order is true, rich comparison dunder methods are added. If unsafe_hash is true, a __hash__() method is added. If frozen is true, fields may not be assigned to after instance creation. If match_args is true, the __match_args__ tuple is added. If kw_only is true, then by default all fields are keyword-only. If slots is true, a new class with a __slots__ attribute is returned.
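
The signature and docstring above match the standard library's dataclasses.dataclass (Python 3.11 or later, which added weakref_slot). Assuming siege_utilities.dataclass is that decorator, the example below imports it from dataclasses so it runs standalone and shows the effect of frozen and order; ClusterSettings and its fields are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True, order=True)
class ClusterSettings:
    # Hypothetical fields for illustration; not HDFSConfig's real attributes.
    data_path: str
    num_executors: int = 4


a = ClusterSettings("/data/a")
b = ClusterSettings("/data/b", num_executors=8)
print(a)      # generated __repr__
print(a < b)  # order=True compares fields in declaration order, like tuples
try:
    a.num_executors = 16  # frozen=True forbids assignment after creation
except FrozenInstanceError as exc:
    print("cannot modify:", exc)
```

Because eq and frozen are both true, the decorator also generates __hash__, so instances can be used as dictionary keys.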

siege_utilities.distributed.hdfs_legacy

siege_utilities.check_hdfs_status()

Check whether HDFS is accessible.

siege_utilities.get_quick_file_signature(file_path)

Perform file operations: get a quick signature of a file.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Parameters:

file_path – Path to the file

Returns:

Not yet documented.

Example

>>> import siege_utilities
>>> result = siege_utilities.get_quick_file_signature("data.csv")  # hypothetical path
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.

siege_utilities.distributed.hdfs_operations

siege_utilities.create_hdfs_operations(config)

Factory function that creates an HDFS operations instance.

siege_utilities.setup_distributed_environment(config, data_path: str | None = None, dependency_paths: List[str] | None = None)

Convenience function that sets up a distributed environment.

siege_utilities.distributed.spark_utils

siege_utilities.files.hashing

siege_utilities.calculate_file_hash(file_path) → str | None

Alias for get_file_hash with algorithm='sha256', kept for backward compatibility.

siege_utilities.generate_sha256_hash_for_file(file_path) → str | None

Generate the SHA-256 hash of a file, reading it in chunks so that large files are never loaded into memory all at once.

Parameters:

file_path – Path to the file (str or Path object)

Returns:

SHA-256 hash as a hexadecimal string, or None if an error occurs
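
Chunked hashing keeps memory use flat regardless of file size. The sketch below shows the technique with hashlib; sketch_sha256_of_file and its 64 KiB chunk_size are assumptions rather than the library's implementation, though the None-on-error behaviour follows the description above.

```python
import hashlib
from pathlib import Path


def sketch_sha256_of_file(file_path, chunk_size=65536):
    """Hypothetical sketch: chunked SHA-256, returning None on I/O errors."""
    digest = hashlib.sha256()
    try:
        with open(Path(file_path), "rb") as handle:
            # iter() with a sentinel keeps calling read() until it returns b""
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None  # missing or unreadable file
    return digest.hexdigest()
```

The digest is identical to hashing the whole file at once; only the peak memory use changes with chunk_size.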

siege_utilities.get_file_hash(file_path, algorithm='sha256') → str | None

Generate a hash of a file using the specified algorithm.

Parameters:
  • file_path – Path to the file (str or Path object)

  • algorithm – Hash algorithm to use ('sha256', 'md5', 'sha1', etc.)

Returns:

Hash as a hexadecimal string, or None if an error occurs

siege_utilities.test_hash_functions()

Test the hash functions with a temporary file

siege_utilities.verify_file_integrity(file_path, expected_hash, algorithm='sha256') → bool

Verify a file's integrity by comparing its hash with an expected value.

Parameters:
  • file_path – Path to the file

  • expected_hash – Expected hash value

  • algorithm – Hash algorithm that produced expected_hash

Returns:

True if file matches expected hash, False otherwise
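
verify_file_integrity can be thought of as recomputing the digest and comparing it with expected_hash. A minimal sketch, assuming the comparison ignores hex case and that any read or algorithm error counts as a mismatch (sketch_verify_file_integrity is hypothetical); hashlib.new accepts the same algorithm names listed for get_file_hash.

```python
import hashlib
import hmac


def sketch_verify_file_integrity(file_path, expected_hash, algorithm="sha256"):
    """Hypothetical sketch: recompute the file's hash and compare it to expected_hash."""
    try:
        digest = hashlib.new(algorithm)  # e.g. "sha256", "md5", "sha1"
        with open(file_path, "rb") as handle:
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
    except (OSError, ValueError):
        # OSError: unreadable file; ValueError: unsupported algorithm name
        return False
    # Normalise case so upper- and lower-case hex strings compare equal
    return hmac.compare_digest(digest.hexdigest(), expected_hash.strip().lower())
```

hmac.compare_digest is used instead of == so the comparison time does not depend on where the strings first differ, which matters mainly when the expected value is secret.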

siege_utilities.files.operations

siege_utilities.check_for_file_type_in_directory(target_file_path: Path, file_type: str) → bool

Parameters:
  • target_file_path

  • file_type

Returns:

bool

siege_utilities.check_if_file_exists_at_path(target_file_path: Path) → bool

Parameters:

target_file_path – Path to check for an existing file

Returns:

True if a file exists at the path, False otherwise

siege_utilities.count_duplicate_rows_in_file_using_awk(target_file_path: Path) → int

Count duplicate rows in a file using an awk pattern from Justin Hernandez.

Parameters:

target_file_path – pathlib.Path of the file whose duplicate rows are counted

Returns:

count of duplicate rows in the file
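
The reference does not show the awk pattern used. A common idiom, awk 'seen[$0]++' file | wc -l, counts every row that repeats an earlier row; the pure-Python sketch below uses that same assumed definition (a row appearing n times contributes n - 1 duplicates) and may not match the library's exact count. sketch_count_duplicate_rows is hypothetical.

```python
from collections import Counter
from pathlib import Path


def sketch_count_duplicate_rows(target_file_path: Path) -> int:
    """Hypothetical sketch: count rows that repeat an earlier row.

    Assumed definition: a row appearing n times contributes n - 1 duplicates.
    """
    with open(target_file_path, "r", encoding="utf-8") as handle:
        # Strip the newline so a final row without one still matches its twins
        counts = Counter(line.rstrip("\n") for line in handle)
    return sum(n - 1 for n in counts.values())
```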

siege_utilities.count_empty_rows_in_file_pythonically(target_file_path: Path) → int

Parameters:

target_file_path – pathlib.Path of the file whose empty rows are counted

Returns:

count of empty rows in file
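
A minimal pure-Python sketch of counting empty rows, assuming whitespace-only rows count as empty; the library's definition may be stricter. sketch_count_empty_rows is hypothetical.

```python
from pathlib import Path


def sketch_count_empty_rows(target_file_path: Path) -> int:
    """Hypothetical sketch: count rows that are empty or whitespace-only."""
    with open(target_file_path, "r", encoding="utf-8") as handle:
        # Iterating the file object streams one line at a time
        return sum(1 for line in handle if not line.strip())
```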

siege_utilities.count_empty_rows_in_file_using_awk(target_file_path: Path) → int

Parameters:

target_file_path – pathlib.Path of the file whose empty rows are counted

Returns:

count of empty rows in file

siege_utilities.count_total_rows_in_file_pythonically(target_file_path: Path) → int

Parameters:

target_file_path – pathlib.Path of the file whose rows are counted

Returns:

count of total rows in file

siege_utilities.count_total_rows_in_file_using_sed(target_file_path: Path) → int

Parameters:

target_file_path – pathlib.Path of the file whose rows are counted

Returns:

count of total rows in file

siege_utilities.delete_existing_file_and_replace_it_with_an_empty_file(target_file_path: Path) → Path

Delete the existing file and replace it with an empty file.

Parameters:

target_file_path – pathlib.Path of the file to replace

Returns:

pathlib.Path of the (now empty) file

siege_utilities.remove_empty_rows_in_file_using_sed(target_file_path: Path, fixed_file_path: Path = None)

Parameters:
  • target_file_path – pathlib.Path of the file whose empty rows are removed

  • fixed_file_path – pathlib.Path where the fixed file is saved (optional)
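
A pure-Python sketch of what the sed-based function plausibly does, in the spirit of sed '/^$/d'. Assumptions: only truly empty lines are removed (whitespace-only lines survive, as with that sed expression), the file is rewritten in place when fixed_file_path is omitted, and the output path is returned even though the documented function lists no return value. sketch_remove_empty_rows is hypothetical.

```python
from pathlib import Path


def sketch_remove_empty_rows(target_file_path: Path, fixed_file_path: Path = None) -> Path:
    """Hypothetical sketch of the sed-based function in pure Python."""
    target = Path(target_file_path)
    # Assumption: without fixed_file_path, rewrite the original file in place
    output = Path(fixed_file_path) if fixed_file_path is not None else target
    lines = target.read_text(encoding="utf-8").splitlines(keepends=True)
    # Keep lines that have any content besides the line terminator
    kept = [line for line in lines if line.rstrip("\r\n")]
    output.write_text("".join(kept), encoding="utf-8")
    return output
```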

siege_utilities.rmtree(f: Path)

Utility function: rmtree.

Part of the Siege Utilities Utilities module. Auto-discovered and available at package level.

Returns:

Not yet documented.

Example

>>> import siege_utilities
>>> from pathlib import Path
>>> result = siege_utilities.rmtree(Path("build/tmp"))  # hypothetical path
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.

siege_utilities.write_data_to_a_new_empty_file(target_file_path: Path, data: str) → Path

Parameters:
  • target_file_path – file path to write data to

  • data – string data to write

Returns:

the path to the file

siege_utilities.write_data_to_an_existing_file(target_file_path: Path, data: str) → Path

Parameters:
  • target_file_path – file path to write data to

  • data – string data to write

Returns:

the path to the file

siege_utilities.files.paths

siege_utilities.ensure_path_exists(desired_path: Path) → Path

Perform file operations: ensure that a path exists.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Parameters:

desired_path – Path that should exist

Return type:

Path

Example

>>> import siege_utilities
>>> from pathlib import Path
>>> result = siege_utilities.ensure_path_exists(Path("output/reports"))  # hypothetical path
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.
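
Assuming ensure_path_exists creates the directory (and any missing parents) when it is absent, pathlib does this in one call. sketch_ensure_path_exists is a hypothetical illustration; the real function may, for instance, treat a path that points at a file differently.

```python
from pathlib import Path


def sketch_ensure_path_exists(desired_path: Path) -> Path:
    """Hypothetical sketch: create the directory and its parents if missing."""
    path = Path(desired_path)
    path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return path
```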

siege_utilities.unzip_file_to_its_own_directory(path_to_zipfile: Path, new_dir_name=None, new_dir_parent=None)

Perform file operations: unzip a file into its own directory.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Parameters:
  • path_to_zipfile – Path to the zip archive

  • new_dir_name – Name for the new directory (optional)

  • new_dir_parent – Parent directory for the new directory (optional)

Returns:

Not yet documented.

Example

>>> import siege_utilities
>>> from pathlib import Path
>>> result = siege_utilities.unzip_file_to_its_own_directory(Path("downloads/data.zip"))  # hypothetical path
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.
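
A sketch of extracting an archive into a directory of its own with the standard zipfile module. The defaults (a directory named after the archive's stem, created next to the archive) are assumptions about how new_dir_name and new_dir_parent behave, and sketch_unzip_to_own_directory is hypothetical.

```python
import zipfile
from pathlib import Path


def sketch_unzip_to_own_directory(path_to_zipfile, new_dir_name=None, new_dir_parent=None) -> Path:
    """Hypothetical sketch: extract an archive into a directory of its own.

    Assumed defaults: directory named after the archive stem, placed next to it.
    """
    archive = Path(path_to_zipfile)
    parent = Path(new_dir_parent) if new_dir_parent is not None else archive.parent
    target = parent / (new_dir_name or archive.stem)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

The zipfile documentation advises against extracting archives from untrusted sources without inspecting them first.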

siege_utilities.files.remote

siege_utilities.download_file(url, local_filename)

Download a file from a URL to a local file with progress bar

Parameters:
  • url – The URL to download from

  • local_filename – The local path where the file should be saved

Returns:

The local filename if successful, False otherwise
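
A standard-library sketch of streaming a download in chunks with the documented return contract (the filename on success, False on failure). The progress bar is replaced by a plain byte counter on stderr, and sketch_download_file and chunk_size are hypothetical; the real function's progress display and error handling may differ. Because urllib.request also handles file:// URLs, the sketch can be tried without network access.

```python
import sys
import urllib.request


def sketch_download_file(url, local_filename, chunk_size=8192):
    """Hypothetical sketch: stream a URL to disk, reporting bytes written."""
    try:
        # urlopen runs first, so a failed request never creates the local file
        with urllib.request.urlopen(url) as response, open(local_filename, "wb") as out:
            total = response.headers.get("Content-Length")
            done = 0
            for chunk in iter(lambda: response.read(chunk_size), b""):
                out.write(chunk)
                done += len(chunk)
                if total:
                    sys.stderr.write(f"\r{done}/{total} bytes")
            sys.stderr.write("\n")
    except (OSError, ValueError):
        # URLError subclasses OSError; ValueError covers malformed URLs
        return False
    return local_filename
```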

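siege_utilities.generate_local_path_from_url(url: str, directory_path: Path, as_string: bool = True)

Perform file operations: generate a local path from a URL.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Parameters:
  • url – URL to derive the local path from

  • directory_path – Directory in which the local path is placed

  • as_string – If True (the default), the path is returned as a string

Returns:

Not yet documented.

Example

>>> import siege_utilities
>>> from pathlib import Path
>>> result = siege_utilities.generate_local_path_from_url("https://example.com/data.zip", Path("downloads"))  # hypothetical URL
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.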
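
A sketch of one plausible mapping from URL to local path: take the last segment of the URL's path, percent-decoded, and join it to directory_path, ignoring any query string. This naming rule is an assumption, and sketch_local_path_from_url is hypothetical.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def sketch_local_path_from_url(url: str, directory_path: Path, as_string: bool = True):
    """Hypothetical sketch: directory_path joined with the URL's last path segment."""
    # urlparse separates the path from the query, so "?x=1" is dropped
    filename = Path(unquote(urlparse(url).path)).name
    local_path = Path(directory_path) / filename
    return str(local_path) if as_string else local_path
```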

siege_utilities.files.shell

siege_utilities.run_subprocess(command_list)

Run a shell command as a subprocess and handle the output.

Parameters:

command_list – The command to run, as a list or string

Returns:

The command output (stdout if successful, stderr if failed)
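
A minimal sketch of the documented behaviour with subprocess.run: return stdout when the command exits with status 0, otherwise stderr. Splitting a string command with shlex.split (rather than handing it to a shell) is an assumption, as is sketch_run_subprocess itself.

```python
import shlex
import subprocess


def sketch_run_subprocess(command_list):
    """Hypothetical sketch: run a command; return stdout on success, stderr on failure."""
    # Accept a single string by splitting it the way a POSIX shell would
    if isinstance(command_list, str):
        command_list = shlex.split(command_list)
    result = subprocess.run(command_list, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else result.stderr
```

Avoiding shell=True means each argument reaches the program verbatim, so user-supplied strings cannot inject extra shell commands.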

siege_utilities.geo.geocoding