File Utilities

Comprehensive file operations, hashing, path management, and remote file handling.

siege_utilities.files.configure_shared_logging(log_file_path, level='INFO', max_bytes=5000000, backup_count=5)[source]

Configure all loggers to write to the same shared log file. Perfect for distributed computing where all workers should log to one file.

Parameters:

log_file_path (str) – Path to shared log file
level (str) – Log level for the shared file
max_bytes (int) – Max file size before rotation
backup_count (int) – Number of backup files to keep

Example

>>> # In Spark job - all workers write to same file
>>> configure_shared_logging("/shared/logs/spark_job.log", level="INFO")
>>> log_info("Worker started", logger_name="worker_1")
>>> log_info("Processing data", logger_name="worker_2")
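For readers who want to see the idea behind a shared log file, here is a minimal sketch built only on the standard library. It is not the siege_utilities implementation; the helper name `attach_shared_handler` and the log format are assumptions made for illustration. It attaches one `RotatingFileHandler` to several named loggers so their records land in the same file.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def attach_shared_handler(logger_names, log_file_path, level="INFO",
                          max_bytes=5_000_000, backup_count=5):
    """Hypothetical helper: point several named loggers at one rotating file."""
    handler = RotatingFileHandler(log_file_path, maxBytes=max_bytes,
                                  backupCount=backup_count)
    # The format string is an assumption; the library's format is not documented here.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    for name in logger_names:
        logger = logging.getLogger(name)
        logger.setLevel(level)
        logger.addHandler(handler)
    return handler


# Demo: two loggers write to one temporary file.
log_path = os.path.join(tempfile.mkdtemp(), "spark_job.log")
handler = attach_shared_handler(["worker_1", "worker_2"], log_path)
logging.getLogger("worker_1").info("Worker started")
logging.getLogger("worker_2").info("Processing data")
handler.flush()

with open(log_path) as fh:
    shared_log_text = fh.read()
```

This sketch covers loggers inside a single process. How the library coordinates writes from separate worker processes is not described in this reference.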

siege_utilities.files.disable_shared_logging()[source]: Disable shared logging configuration. Loggers will revert to individual file handling.

siege_utilities.files.init_logger(name='siege_utilities', log_to_file=False, log_dir='logs', level='INFO', max_bytes=5000000, backup_count=5, shared_log_file=None)[source]

Initialize and configure a named logger.

Parameters:

name (str) – Logger name. Each component can have its own logger.
log_to_file (bool) – If True, creates individual log file (unless shared_log_file specified).
log_dir (str) – Directory for individual log files.
level (str|int) – Logging level for this logger.
max_bytes (int) – Max size for rotating file handler.
backup_count (int) – How many backup logs to keep.
shared_log_file (str) – If provided, this logger writes to shared file instead of individual file.

Returns:

Configured logger instance.

Return type:

logging.Logger

Examples

>>> # Individual loggers with separate files
>>> db_logger = init_logger("database", log_to_file=True, level="DEBUG")
>>> api_logger = init_logger("api", log_to_file=True, level="INFO")

>>> # Multiple loggers sharing one file (great for Spark!)
>>> worker1 = init_logger("worker_1", shared_log_file="spark_workers.log")
>>> worker2 = init_logger("worker_2", shared_log_file="spark_workers.log")

>>> # Use global shared configuration
>>> configure_shared_logging("/shared/logs/app.log")
>>> logger1 = init_logger("component_1")  # Automatically uses shared file
>>> logger2 = init_logger("component_2")  # Automatically uses shared file

siege_utilities.files.get_logger(name=None)[source]

Return a logger instance.

Parameters: name (str, optional) – Logger name. If None, returns/creates the default logger.
Returns: Logger instance.
Return type: logging.Logger

Examples

>>> logger = get_logger()  # Gets default logger
>>> db_logger = get_logger("database")  # Gets database logger
>>> api_logger = get_logger("api")  # Gets API logger

siege_utilities.files.get_all_loggers()[source]

Get all initialized loggers.

Returns: Dictionary of logger_name -> logger_instance
Return type: dict

Example

>>> loggers = get_all_loggers()
>>> print(f"Active loggers: {list(loggers.keys())}")

siege_utilities.files.set_default_logger_name(name)[source]

Set the default logger name used by convenience functions.

Parameters: name (str) – New default logger name

Example

>>> set_default_logger_name("spark_master")
>>> log_info("This will use 'spark_master' logger")

siege_utilities.files.cleanup_logger(name)[source]

Remove a logger and clean up its handlers.

Parameters: name (str) – Logger name to remove
Returns: True if logger was removed, False if it didn’t exist
Return type: bool

siege_utilities.files.cleanup_all_loggers()[source]: Clean up all loggers and their handlers. Useful for testing or application shutdown.

siege_utilities.files.log_debug(message, logger_name=None)[source]

Log a debug message.

Parameters:

message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_debug("Debug information")
>>> log_debug("Database query details", logger_name="database")

siege_utilities.files.log_info(message: str, logger_name=None) → None[source]

Log an info message.

Parameters:

message (str) – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_info("Application started")
>>> log_info("Worker processing task", logger_name="worker_1")

siege_utilities.files.log_warning(message, logger_name=None)[source]

Log a warning message.

Parameters:

message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_warning("This is a warning")
>>> log_warning("Cache miss detected", logger_name="cache")

siege_utilities.files.log_error(message, logger_name=None)[source]

Log an error message.

Parameters:

message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_error("An error occurred")
>>> log_error("Database connection failed", logger_name="database")

siege_utilities.files.log_critical(message, logger_name=None)[source]

Log a critical message.

Parameters:

message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.

Examples

>>> log_critical("Critical system error")
>>> log_critical("Spark cluster failure", logger_name="spark_master")

siege_utilities.files.parse_log_level(level)[source]: Convert a string or numeric level into a logging level constant.
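The conversion described above can be sketched as follows. This is an illustrative stand-in, not the library's code: the name `parse_level_sketch`, the accepted spellings, and the choice to raise `ValueError` on unknown input are all assumptions.

```python
import logging

# Level names recognised by this sketch (the standard logging levels).
_LEVELS = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
    "NOTSET": logging.NOTSET,
}


def parse_level_sketch(level):
    """Hypothetical helper: map 'debug', 'INFO', 20 or '30' to a numeric level."""
    if isinstance(level, int):
        return level
    text = str(level).strip()
    if text.isdigit():
        return int(text)
    try:
        return _LEVELS[text.upper()]
    except KeyError:
        raise ValueError(f"Unknown log level: {level!r}") from None
```

For example, `parse_level_sketch("debug")` yields `logging.DEBUG`, and a numeric level such as `20` passes through unchanged.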

Hashing

Hash management functions: standardized helpers for hashing files and verifying their integrity.

siege_utilities.files.hashing.generate_sha256_hash_for_file(file_path) → str | None[source]

Generate the SHA256 hash of a file, reading it in chunks so that large files can be handled.

Parameters: file_path – Path to the file (str or Path object)
Returns: SHA256 hash as hexadecimal string, or None if error
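Chunked hashing can be sketched with `hashlib` alone. The function name `sha256_of_file_sketch` and the 64 KiB chunk size are assumptions for illustration; the library's actual chunk size and error handling are not documented here.

```python
import hashlib
import os
import tempfile


def sha256_of_file_sketch(file_path, chunk_size=65536):
    """Hypothetical helper: hash a file chunk by chunk, returning None on I/O errors."""
    digest = hashlib.sha256()
    try:
        with open(file_path, "rb") as fh:
            # iter() with a sentinel keeps reading until read() returns b"".
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


# Demo on a temporary file larger than one chunk.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_path, "wb") as fh:
    fh.write(b"siege" * 100_000)

file_digest = sha256_of_file_sketch(demo_path)
missing_digest = sha256_of_file_sketch(demo_path + ".missing")
```

Only one chunk is held in memory at a time, which is the usual reason for hashing large files this way.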

siege_utilities.files.hashing.get_file_hash(file_path, algorithm='sha256') → str | None[source]

Generate hash for a file using specified algorithm

Parameters:

file_path – Path to the file (str or Path object)
algorithm – Hash algorithm to use (‘sha256’, ‘md5’, ‘sha1’, etc.)

Returns:

Hash as hexadecimal string, or None if error

siege_utilities.files.hashing.calculate_file_hash(file_path) → str | None[source]: Alias for get_file_hash with SHA256 - for backward compatibility

siege_utilities.files.hashing.get_quick_file_signature(file_path) → str[source]

Generate a quick file signature using file stats plus a partial hash. Faster for change detection, but not cryptographically secure.

Parameters: file_path – Path to the file
Returns: Quick signature string
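One way to combine file stats with a partial hash is sketched below. The signature layout (size, modification time, and a truncated hash of the first 8 KiB) is an assumption chosen for illustration; the format the library actually returns is not specified in this reference.

```python
import hashlib
import os
import tempfile


def quick_signature_sketch(file_path, sample_bytes=8192):
    """Hypothetical helper: cheap change-detection signature, not a secure hash."""
    stat = os.stat(file_path)
    digest = hashlib.sha256()
    with open(file_path, "rb") as fh:
        # Only the first sample_bytes are hashed, so later edits can go unnoticed
        # unless the size or modification time also changes.
        digest.update(fh.read(sample_bytes))
    return f"{stat.st_size}-{stat.st_mtime_ns}-{digest.hexdigest()[:16]}"


# Demo: the signature changes when the file grows.
sig_path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(sig_path, "wb") as fh:
    fh.write(b"hello")
sig_before = quick_signature_sketch(sig_path)

with open(sig_path, "ab") as fh:
    fh.write(b" world")
sig_after = quick_signature_sketch(sig_path)
```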

siege_utilities.files.hashing.verify_file_integrity(file_path, expected_hash, algorithm='sha256') → bool[source]

Verify file integrity by comparing with expected hash

Parameters:

file_path – Path to the file
expected_hash – Expected hash value
algorithm – Hash algorithm used

Returns:

True if file matches expected hash, False otherwise
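The comparison can be sketched as below, assuming the expected hash is a hexadecimal string. The name `verify_integrity_sketch` is hypothetical, and treating read errors or unknown algorithms as a failed check is an assumption rather than documented behavior.

```python
import hashlib
import hmac
import os
import tempfile


def verify_integrity_sketch(file_path, expected_hash, algorithm="sha256"):
    """Hypothetical helper: True only if the file's digest equals expected_hash."""
    try:
        digest = hashlib.new(algorithm)  # raises ValueError for unknown names
        with open(file_path, "rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
    except (OSError, ValueError):
        return False
    # Normalise case and whitespace, then compare in constant time.
    return hmac.compare_digest(digest.hexdigest(), expected_hash.strip().lower())


# Demo with a temporary file and a known-good digest.
payload = b"integrity check"
verify_path = os.path.join(tempfile.mkdtemp(), "payload.bin")
with open(verify_path, "wb") as fh:
    fh.write(payload)

good_hash = hashlib.sha256(payload).hexdigest()
matches = verify_integrity_sketch(verify_path, good_hash.upper())
mismatch = verify_integrity_sketch(verify_path, "0" * 64)
bad_algorithm = verify_integrity_sketch(verify_path, good_hash, algorithm="not-a-hash")
```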

siege_utilities.files.hashing.test_hash_functions()[source]: Test the hash functions with a temporary file

Operations

siege_utilities.files.operations.rmtree(f: Path)[source]

Utility function: rmtree.

Part of the Siege Utilities utilities module. Auto-discovered and available at package level.

Returns: Description needed

Example

>>> from pathlib import Path
>>> import siege_utilities
>>> # "scratch_dir" is a placeholder path
>>> result = siege_utilities.rmtree(Path("scratch_dir"))
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.

siege_utilities.files.operations.check_if_file_exists_at_path(target_file_path: Path) → bool[source]

Parameters: target_file_path – This is the path we are going to check to see if a file exists
Returns: True if file exists, False otherwise

siege_utilities.files.operations.delete_existing_file_and_replace_it_with_an_empty_file(target_file_path: Path) → Path[source]

Delete the existing file and replace it with an empty file.

Parameters: target_file_path – pathlib.Path object to interact with
Returns: pathlib.Path object to interact with

siege_utilities.files.operations.count_total_rows_in_file_pythonically(target_file_path: Path) → int[source]

Parameters: target_file_path – pathlib.Path object that we are going to count the rows of
Returns: count of total rows in file

siege_utilities.files.operations.count_empty_rows_in_file_pythonically(target_file_path: Path) → int[source]

Parameters: target_file_path – pathlib.Path object that we are going to count the empty rows of
Returns: count of empty rows in file
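The two pure-Python counters can be sketched as follows. The function names are hypothetical, and treating whitespace-only lines as empty is an assumption; the library may count only zero-length lines.

```python
import tempfile
from pathlib import Path


def count_rows_sketch(target_file_path: Path) -> int:
    """Hypothetical helper: count lines by streaming the file."""
    with target_file_path.open("r", encoding="utf-8") as fh:
        return sum(1 for _ in fh)


def count_empty_rows_sketch(target_file_path: Path) -> int:
    """Hypothetical helper: count lines that are blank after stripping whitespace."""
    with target_file_path.open("r", encoding="utf-8") as fh:
        return sum(1 for line in fh if not line.strip())


# Demo: five lines, two of which are blank.
rows_file = Path(tempfile.mkdtemp()) / "rows.txt"
rows_file.write_text("a\n\nb\n   \nc\n", encoding="utf-8")
total_rows = count_rows_sketch(rows_file)
empty_rows = count_empty_rows_sketch(rows_file)
```

Iterating over the file object reads one line at a time, so neither function loads the whole file into memory.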

siege_utilities.files.operations.count_duplicate_rows_in_file_using_awk(target_file_path: Path) → int[source]

Count duplicate rows in a file using an awk pattern from Justin Hernandez.

Parameters: target_file_path – pathlib.Path object that we are going to count the duplicate rows of
Returns: count of duplicate rows in file
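The awk pattern itself is not reproduced in this reference, so the following is only a pure-Python analogue of duplicate counting. It assumes "duplicate rows" means every occurrence of a row beyond its first; the library's awk pattern may define the count differently. The function name is hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_duplicate_rows_sketch(target_file_path: Path) -> int:
    """Hypothetical helper: count rows that repeat an earlier row exactly."""
    with target_file_path.open("r", encoding="utf-8") as fh:
        counts = Counter(line.rstrip("\n") for line in fh)
    # Each row seen n times contributes n - 1 duplicates.
    return sum(n - 1 for n in counts.values() if n > 1)


# Demo: "x" appears three times and "y" twice, giving 2 + 1 duplicates.
dup_file = Path(tempfile.mkdtemp()) / "dups.txt"
dup_file.write_text("x\ny\nx\nx\nz\ny\n", encoding="utf-8")
duplicate_rows = count_duplicate_rows_sketch(dup_file)
```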

siege_utilities.files.operations.count_total_rows_in_file_using_sed(target_file_path: Path) → int[source]

Parameters: target_file_path – pathlib.Path object that we are going to count the total rows of
Returns: count of total rows in file

siege_utilities.files.operations.count_empty_rows_in_file_using_awk(target_file_path: Path) → int[source]

Parameters: target_file_path – pathlib.Path object that we are going to count the empty rows of
Returns: count of empty rows in file

siege_utilities.files.operations.remove_empty_rows_in_file_using_sed(target_file_path: Path, fixed_file_path: Path = None)[source]

Parameters:

target_file_path – pathlib.Path object that we are going to remove the empty rows of
fixed_file_path – pathlib.Path object for the path where the fixed file is saved

Returns:
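As a rough pure-Python equivalent of stripping blank rows, consider the sketch below. It is not the sed-based implementation: the function name is hypothetical, `fixed_file_path` is required here because the behavior of the library's `None` default is not documented, and whitespace-only lines are assumed to count as empty.

```python
import tempfile
from pathlib import Path


def remove_empty_rows_sketch(target_file_path: Path, fixed_file_path: Path) -> Path:
    """Hypothetical helper: copy non-blank lines from one file to another."""
    with target_file_path.open("r", encoding="utf-8") as src, \
            fixed_file_path.open("w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(line)
    return fixed_file_path


# Demo: three blank lines are dropped; the original file is left untouched.
rows_dir = Path(tempfile.mkdtemp())
raw_file = rows_dir / "raw.txt"
raw_file.write_text("a\n\nb\n\n\nc\n", encoding="utf-8")
cleaned_file = remove_empty_rows_sketch(raw_file, rows_dir / "cleaned.txt")
cleaned_text = cleaned_file.read_text(encoding="utf-8")
raw_text = raw_file.read_text(encoding="utf-8")
```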

siege_utilities.files.operations.write_data_to_a_new_empty_file(target_file_path: Path, data: str) → Path[source]

Parameters:

target_file_path – file path to write data to
data – what to write

Returns:

the path to the file

siege_utilities.files.operations.write_data_to_an_existing_file(target_file_path: Path, data: str) → Path[source]

Parameters:

target_file_path – file path to write data to
data – what to write

Returns:

the path to the file
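The two write functions share a signature, and this reference does not say how they differ. The sketch below assumes the "new empty file" variant truncates and the "existing file" variant appends; both helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_new_sketch(target_file_path: Path, data: str) -> Path:
    """Hypothetical helper: create or truncate the file, then write data."""
    with target_file_path.open("w", encoding="utf-8") as fh:
        fh.write(data)
    return target_file_path


def append_existing_sketch(target_file_path: Path, data: str) -> Path:
    """Hypothetical helper: append data to the end of the file (assumed behavior)."""
    with target_file_path.open("a", encoding="utf-8") as fh:
        fh.write(data)
    return target_file_path


# Demo: write a header, append a row, then overwrite everything.
out_file = Path(tempfile.mkdtemp()) / "out.txt"
write_new_sketch(out_file, "header\n")
append_existing_sketch(out_file, "row 1\n")
after_append = out_file.read_text(encoding="utf-8")

returned_path = write_new_sketch(out_file, "fresh\n")
after_rewrite = out_file.read_text(encoding="utf-8")
```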

siege_utilities.files.operations.check_for_file_type_in_directory(target_file_path: Path, file_type: str) → bool[source]

Parameters:

target_file_path
file_type

Returns:

bool

Paths

siege_utilities.files.paths.ensure_path_exists(desired_path: Path) → Path[source]

Perform file operations: ensure path exists.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Returns: Description needed

Example

>>> from pathlib import Path
>>> import siege_utilities
>>> # "output/reports" is a placeholder path
>>> result = siege_utilities.ensure_path_exists(Path("output/reports"))
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.
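Since the return value is not described, the following sketch shows one plausible reading of "ensure path exists": create the directory and any missing parents, then hand the path back. The function name is hypothetical and treating `desired_path` as a directory is an assumption.

```python
import tempfile
from pathlib import Path


def ensure_path_exists_sketch(desired_path: Path) -> Path:
    """Hypothetical helper: create the directory tree if needed and return it."""
    # exist_ok=True makes the call safe to repeat.
    desired_path.mkdir(parents=True, exist_ok=True)
    return desired_path


# Demo: nested directories are created in one call, and a second call is harmless.
nested_dir = Path(tempfile.mkdtemp()) / "a" / "b" / "c"
first_result = ensure_path_exists_sketch(nested_dir)
second_result = ensure_path_exists_sketch(nested_dir)
```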

siege_utilities.files.paths.unzip_file_to_its_own_directory(path_to_zipfile: Path, new_dir_name=None, new_dir_parent=None)[source]

Perform file operations: unzip file to its own directory.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Returns: Description needed

Example

>>> from pathlib import Path
>>> import siege_utilities
>>> # "archive.zip" is a placeholder path
>>> result = siege_utilities.unzip_file_to_its_own_directory(Path("archive.zip"))
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.
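A standard-library sketch of extracting an archive into its own directory is shown below. The defaults are assumptions: the target directory takes the archive's stem as its name and sits beside the archive unless `new_dir_name` or `new_dir_parent` say otherwise. The function name is hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_to_own_directory_sketch(path_to_zipfile, new_dir_name=None,
                                  new_dir_parent=None) -> Path:
    """Hypothetical helper: extract a zip archive into a dedicated directory."""
    path_to_zipfile = Path(path_to_zipfile)
    parent = Path(new_dir_parent) if new_dir_parent else path_to_zipfile.parent
    target = parent / (new_dir_name or path_to_zipfile.stem)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(path_to_zipfile) as archive:
        archive.extractall(target)
    return target


# Demo: build a small archive, then extract it next to itself.
zip_dir = Path(tempfile.mkdtemp())
archive_path = zip_dir / "reports.zip"
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("summary.txt", "hello")

extracted_dir = unzip_to_own_directory_sketch(archive_path)
renamed_dir = unzip_to_own_directory_sketch(archive_path, new_dir_name="q1")
```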

Remote

siege_utilities.files.remote.download_file(url, local_filename)[source]

Download a file from a URL to a local file with progress bar

Parameters:

url – The URL to download from
local_filename – The local path where the file should be saved

Returns:

The local filename if successful, False otherwise

siege_utilities.files.remote.generate_local_path_from_url(url: str, directory_path: Path, as_string: bool = True)[source]

Perform file operations: generate local path from url.

Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.

Returns: Description needed

Example

>>> from pathlib import Path
>>> import siege_utilities
>>> # The URL and directory are placeholders
>>> result = siege_utilities.generate_local_path_from_url("https://example.com/data.zip", Path("downloads"))
>>> print(result)

Note

This function is auto-discovered and available without imports across all siege_utilities modules.
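One plausible way to derive a local path from a URL is sketched here: take the final component of the URL path and join it to the target directory. The function name is hypothetical, and ignoring the query string is an assumption about the intended behavior.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_path_from_url_sketch(url: str, directory_path: Path,
                               as_string: bool = True):
    """Hypothetical helper: map a URL's file name into a local directory."""
    # urlparse separates the path from the query string and fragment.
    filename = PurePosixPath(urlparse(url).path).name
    local_path = Path(directory_path) / filename
    return str(local_path) if as_string else local_path


# Demo with a placeholder URL.
as_path = local_path_from_url_sketch(
    "https://example.com/data/file.zip?token=abc", Path("downloads"),
    as_string=False)
as_text = local_path_from_url_sketch(
    "https://example.com/data/file.zip", Path("downloads"))
```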

Shell

siege_utilities.files.shell.run_subprocess(command_list)[source]

Run a shell command as a subprocess and handle the output.

Parameters: command_list – The command to run, as a list or string
Returns: The command output (stdout if successful, stderr if failed)
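The return contract (stdout on success, stderr on failure) can be sketched with `subprocess.run`. The function name is hypothetical, and running string commands through the shell is an assumption about how the library treats the "list or string" input.

```python
import subprocess
import sys


def run_subprocess_sketch(command_list):
    """Hypothetical helper: return stdout on exit code 0, otherwise stderr."""
    # Assumption: a plain string is handed to the shell, a list is run directly.
    use_shell = isinstance(command_list, str)
    completed = subprocess.run(command_list, shell=use_shell,
                               capture_output=True, text=True)
    if completed.returncode == 0:
        return completed.stdout
    return completed.stderr


# Demo using the current Python interpreter so no external tools are needed.
ok_output = run_subprocess_sketch([sys.executable, "-c", "print('ok')"])
err_output = run_subprocess_sketch(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"])
```

Passing a list avoids shell quoting issues, which is why the demo uses that form.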