File Utilities
Comprehensive file operations, hashing, path management, and remote file handling.
Configure all loggers to write to the same shared log file. This is useful in distributed computing, where all workers should log to one file.
- Parameters:
log_file_path (str) – Path to shared log file
level (str) – Log level for the shared file
max_bytes (int) – Maximum file size, in bytes, before rotation
backup_count (int) – Number of backup files to keep
Example
>>> # In a Spark job, all workers write to the same file
>>> configure_shared_logging("/shared/logs/spark_job.log", level="INFO")
>>> log_info("Worker started", logger_name="worker_1")
>>> log_info("Processing data", logger_name="worker_2")
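The shared-file setup described above can be sketched with the standard library alone. This is a minimal sketch, not siege_utilities' implementation: `attach_shared_file_handler` is a hypothetical helper that attaches one `logging.handlers.RotatingFileHandler` to several named loggers. Its `max_bytes` and `backup_count` defaults are borrowed from `init_logger`'s signature.

```python
import logging
from logging.handlers import RotatingFileHandler


def attach_shared_file_handler(log_file_path, logger_names, level="INFO",
                               max_bytes=5_000_000, backup_count=5):
    """Hypothetical helper: point several named loggers at one rotating log file."""
    # A single handler object is shared, so every logger appends to the same file
    # and rotation is managed in one place.
    handler = RotatingFileHandler(log_file_path, maxBytes=max_bytes,
                                  backupCount=backup_count)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    for name in logger_names:
        logger = logging.getLogger(name)
        logger.setLevel(level)  # setLevel accepts level names such as "INFO"
        logger.addHandler(handler)
    return handler
```

Python's Logging Cookbook notes that the standard handlers do not support logging to a single file from multiple processes. A sketch like this is therefore only safe for loggers inside one process. Separate Spark executors writing to one shared path would need to route records through a single process, for example with `SocketHandler` or `QueueHandler`.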
Disable shared logging configuration. Loggers will revert to individual file handling.
- siege_utilities.files.init_logger(name='siege_utilities', log_to_file=False, log_dir='logs', level='INFO', max_bytes=5000000, backup_count=5, shared_log_file=None)
Initialize and configure a named logger.
- Parameters:
name (str) – Logger name. Each component can have its own logger.
log_to_file (bool) – If True, create an individual log file (unless shared_log_file is specified).
log_dir (str) – Directory for individual log files.
level (str|int) – Logging level for this logger.
max_bytes (int) – Maximum size, in bytes, of a log file before the rotating file handler rolls it over.
backup_count (int) – Number of rotated backup logs to keep.
shared_log_file (str) – If provided, this logger writes to this shared file instead of an individual file.
- Returns:
Configured logger instance.
- Return type:
logging.Logger
Examples
>>> # Individual loggers with separate files
>>> db_logger = init_logger("database", log_to_file=True, level="DEBUG")
>>> api_logger = init_logger("api", log_to_file=True, level="INFO")
>>> # Multiple loggers sharing one file (useful for Spark)
>>> worker1 = init_logger("worker_1", shared_log_file="spark_workers.log")
>>> worker2 = init_logger("worker_2", shared_log_file="spark_workers.log")
>>> # Use global shared configuration
>>> configure_shared_logging("/shared/logs/app.log")
>>> logger1 = init_logger("component_1")  # Automatically uses shared file
>>> logger2 = init_logger("component_2")  # Automatically uses shared file
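This is a minimal sketch of how the three modes in these examples (console only, individual file, shared file) could be wired up with the standard `logging` module. `init_logger_sketch`, its per-path handler cache, and the `<log_dir>/<name>.log` naming scheme are assumptions, not the library's actual behavior. Detaching old handlers on re-initialization is a common guard against duplicate log lines.

```python
import logging
import os
from logging.handlers import RotatingFileHandler

_FORMAT = logging.Formatter("%(name)s %(levelname)s %(message)s")
_handlers_by_path = {}  # one handler per file, reused by every logger writing there


def _file_handler(path, max_bytes, backup_count):
    key = os.path.abspath(path)
    if key not in _handlers_by_path:
        handler = RotatingFileHandler(key, maxBytes=max_bytes, backupCount=backup_count)
        handler.setFormatter(_FORMAT)
        _handlers_by_path[key] = handler
    return _handlers_by_path[key]


def init_logger_sketch(name="siege_utilities", log_to_file=False, log_dir="logs",
                       level="INFO", max_bytes=5_000_000, backup_count=5,
                       shared_log_file=None):
    """Hypothetical stand-in that mirrors init_logger's documented parameters."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Detach handlers from an earlier call so each message is written once.
    # They are not closed here because a file handler may be shared.
    for old in list(logger.handlers):
        logger.removeHandler(old)
    if shared_log_file is not None:
        handler = _file_handler(shared_log_file, max_bytes, backup_count)
    elif log_to_file:
        os.makedirs(log_dir, exist_ok=True)
        # Assumed naming scheme: <log_dir>/<name>.log
        handler = _file_handler(os.path.join(log_dir, f"{name}.log"),
                                max_bytes, backup_count)
    else:
        handler = logging.StreamHandler()  # console only
        handler.setFormatter(_FORMAT)
    logger.addHandler(handler)
    return logger
```

Caching handlers by absolute path means loggers that share a file also share one handler, so only one object ever rotates that file.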
- siege_utilities.files.get_logger(name=None)
Return a logger instance.
- Parameters:
name (str, optional) – Logger name. If None, returns/creates the default logger.
- Returns:
Logger instance.
- Return type:
logging.Logger
Examples
>>> logger = get_logger()  # Gets default logger
>>> db_logger = get_logger("database")  # Gets database logger
>>> api_logger = get_logger("api")  # Gets API logger
- siege_utilities.files.get_all_loggers()
Get all initialized loggers.
- Returns:
Dictionary of logger_name -> logger_instance
- Return type:
dict
Example
>>> loggers = get_all_loggers()
>>> print(f"Active loggers: {list(loggers.keys())}")
- siege_utilities.files.set_default_logger_name(name)
Set the default logger name used by convenience functions.
- Parameters:
name (str) – New default logger name
Example
>>> set_default_logger_name("spark_master")
>>> log_info("This will use 'spark_master' logger")
- siege_utilities.files.cleanup_logger(name)
Remove a logger and clean up its handlers.
- Parameters:
name (str) – Logger name to remove
- Returns:
True if logger was removed, False if it didn’t exist
- Return type:
bool
- siege_utilities.files.cleanup_all_loggers()
Clean up all loggers and their handlers. Useful for testing or application shutdown.
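The registry behind `get_logger`, `get_all_loggers`, `cleanup_logger`, and `cleanup_all_loggers` could look roughly like this. It is a hypothetical sketch: `_registry` and the `*_sketch` names are invented, and the default name `siege_utilities` is taken from `init_logger`'s signature.

```python
import logging

_DEFAULT_NAME = "siege_utilities"  # init_logger's documented default name
_registry: dict = {}               # hypothetical registry: name -> logging.Logger


def get_logger_sketch(name=None):
    """Return the named logger (or the default one), registering it on first use."""
    name = name or _DEFAULT_NAME
    if name not in _registry:
        _registry[name] = logging.getLogger(name)
    return _registry[name]


def get_all_loggers_sketch():
    # Hand out a copy so callers cannot mutate the registry itself.
    return dict(_registry)


def cleanup_logger_sketch(name):
    logger = _registry.pop(name, None)
    if logger is None:
        return False
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()  # releases any open file descriptors
    return True


def cleanup_all_loggers_sketch():
    for name in list(_registry):
        cleanup_logger_sketch(name)
```

`logging.getLogger` caches every logger for the lifetime of the process. Cleanup here therefore detaches and closes handlers, which releases file handles, rather than destroying the logger object itself.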
- siege_utilities.files.log_debug(message, logger_name=None)
Log a debug message.
- Parameters:
message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_debug("Debug information")
>>> log_debug("Database query details", logger_name="database")
- siege_utilities.files.log_info(message: str, logger_name=None) -> None
Log an info message.
- Parameters:
message (str) – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_info("Application started")
>>> log_info("Worker processing task", logger_name="worker_1")
- siege_utilities.files.log_warning(message, logger_name=None)
Log a warning message.
- Parameters:
message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_warning("This is a warning")
>>> log_warning("Cache miss detected", logger_name="cache")
- siege_utilities.files.log_error(message, logger_name=None)
Log an error message.
- Parameters:
message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_error("An error occurred")
>>> log_error("Database connection failed", logger_name="database")
- siege_utilities.files.log_critical(message, logger_name=None)
Log a critical message.
- Parameters:
message – Message to log
logger_name (str, optional) – Specific logger to use. Uses default if None.
Examples
>>> log_critical("Critical system error")
>>> log_critical("Spark cluster failure", logger_name="spark_master")
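The convenience functions above all route a message either to a named logger or, when `logger_name` is None, to the default set by `set_default_logger_name`. This is a minimal sketch of that dispatch. The names are hypothetical and only two of the five levels are shown. `Logger.log(level, msg)` is the standard-library call that does the work.

```python
import logging

# Assumed to start out equal to init_logger's default name.
_default_logger_name = "siege_utilities"


def set_default_logger_name_sketch(name):
    global _default_logger_name
    _default_logger_name = name


def _log_sketch(level, message, logger_name=None):
    # Fall back to the module-wide default when no logger is named.
    logging.getLogger(logger_name or _default_logger_name).log(level, message)


def log_info_sketch(message, logger_name=None):
    _log_sketch(logging.INFO, message, logger_name)


def log_error_sketch(message, logger_name=None):
    _log_sketch(logging.ERROR, message, logger_name)
```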
- siege_utilities.files.parse_log_level(level)
Convert a string or numeric level into a logging level constant.
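One plausible way to implement this conversion uses only the standard library. The accepted inputs in this sketch (level names in any case, integers, and numeric strings) are an assumption. It relies on `logging.getLevelName`, which maps a registered name such as `"INFO"` back to its number and returns the string `"Level X"` for unknown names.

```python
import logging


def parse_log_level_sketch(level):
    """Map inputs like 'debug', 'INFO', 20, or '20' to a logging level constant."""
    if isinstance(level, int):
        return level
    text = str(level).strip()
    if text.isdigit():
        return int(text)
    value = logging.getLevelName(text.upper())
    # For unregistered names getLevelName returns a string such as "Level LOUD".
    if not isinstance(value, int):
        raise ValueError(f"Unknown log level: {level!r}")
    return value
```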
Hashing
Hash management functions. Provides standardized functions for hashing files and verifying their integrity.
- siege_utilities.files.hashing.generate_sha256_hash_for_file(file_path) -> str | None
Generate the SHA-256 hash of a file, reading it in chunks so that large files are not loaded into memory all at once.
- Parameters:
file_path – Path to the file (str or Path object)
- Returns:
SHA-256 hash as a hexadecimal string, or None if an error occurs
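Chunked hashing keeps memory use constant regardless of file size. This is a minimal sketch with `hashlib`. The function name and the 64 KiB chunk size are illustrative choices, and returning None on `OSError` mirrors the documented "None if error" contract.

```python
import hashlib
from pathlib import Path


def sha256_of_file_sketch(file_path, chunk_size=65536):
    """SHA-256 hex digest of a file, or None if the file cannot be read."""
    digest = hashlib.sha256()
    try:
        with Path(file_path).open("rb") as fh:
            # Fixed-size reads keep memory use flat even for very large files.
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()
```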
- siege_utilities.files.hashing.get_file_hash(file_path, algorithm='sha256') -> str | None
Generate a hash of a file using the specified algorithm.
- Parameters:
file_path – Path to the file (str or Path object)
algorithm – Hash algorithm to use (‘sha256’, ‘md5’, ‘sha1’, etc.)
- Returns:
Hash as a hexadecimal string, or None if an error occurs
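`get_file_hash` takes the algorithm by name, which maps naturally onto `hashlib.new`. This is a sketch under that assumption, with hypothetical `*_sketch` names. It also includes the SHA-256 alias described for `calculate_file_hash`.

```python
import hashlib


def file_hash_sketch(file_path, algorithm="sha256", chunk_size=65536):
    """Hex digest of file_path with a fixed-length hashlib algorithm, else None."""
    try:
        digest = hashlib.new(algorithm)  # raises ValueError for unsupported names
    except ValueError:
        return None
    try:
        with open(file_path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    # Variable-length algorithms (shake_128, shake_256) need a length argument
    # to hexdigest() and are not handled by this sketch.
    return digest.hexdigest()


def calculate_file_hash_sketch(file_path):
    # Backward-compatible alias pinned to SHA-256.
    return file_hash_sketch(file_path, "sha256")
```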
- siege_utilities.files.hashing.calculate_file_hash(file_path) -> str | None
Alias for get_file_hash with algorithm='sha256', kept for backward compatibility.
- siege_utilities.files.hashing.get_quick_file_signature(file_path) -> str
Generate a quick file signature from file stats plus a partial hash. It is faster than a full hash for change detection but is not cryptographically secure.
- Parameters:
file_path – Path to the file
- Returns:
Quick signature string
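The exact signature format of `get_quick_file_signature` is not documented. The sketch below assumes one plausible scheme: file size, modification time in nanoseconds, and a BLAKE2b hash of the first and last `sample_size` bytes. Bytes in the middle of the file are never read, so an edit there that preserves size and timestamp would go undetected. That is why such a signature suits change detection but not integrity checks.

```python
import hashlib
import os


def quick_signature_sketch(file_path, sample_size=8192):
    """Hypothetical format: '<size>-<mtime_ns>-<hash of head and tail bytes>'."""
    st = os.stat(file_path)
    digest = hashlib.blake2b(digest_size=16)  # fast and always available in hashlib
    with open(file_path, "rb") as fh:
        digest.update(fh.read(sample_size))  # first sample_size bytes
        if st.st_size > sample_size:
            # Jump to the tail without re-reading bytes already hashed.
            fh.seek(max(st.st_size - sample_size, sample_size))
            digest.update(fh.read(sample_size))
    return f"{st.st_size}-{st.st_mtime_ns}-{digest.hexdigest()}"
```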
- siege_utilities.files.hashing.verify_file_integrity(file_path, expected_hash, algorithm='sha256') -> bool
Verify file integrity by comparing the file's hash with an expected hash.
- Parameters:
file_path – Path to the file
expected_hash – Expected hash value
algorithm – Hash algorithm used
- Returns:
True if file matches expected hash, False otherwise
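Integrity verification reduces to recomputing the digest and comparing it with the expected value. In this hedged sketch (the name is hypothetical), the expected hash is normalized to lowercase and compared with `hmac.compare_digest`, a constant-time comparison from the standard library. An unreadable file yields False, matching the documented return contract.

```python
import hashlib
import hmac


def verify_file_hash_sketch(file_path, expected_hash, algorithm="sha256"):
    """True when the file's digest equals expected_hash (case-insensitive)."""
    digest = hashlib.new(algorithm)
    try:
        with open(file_path, "rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
    except OSError:
        return False  # a missing or unreadable file cannot match
    # Normalize the expected value, then compare in constant time.
    return hmac.compare_digest(digest.hexdigest(), expected_hash.strip().lower())
```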
Operations
- siege_utilities.files.operations.rmtree(f: Path)
Recursively remove the directory tree at f.
Part of the Siege Utilities utilities module. Auto-discovered and available at package level.
Example
>>> from pathlib import Path
>>> import siege_utilities
>>> siege_utilities.rmtree(Path("build"))
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
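The name and `Path` parameter suggest a recursive directory delete. The following pathlib-only sketch of that technique assumes this is what `rmtree` does. The standard library's `shutil.rmtree` is the well-tested alternative.

```python
from pathlib import Path


def rmtree_sketch(f: Path) -> None:
    """Recursively delete directory f and everything beneath it."""
    for child in f.iterdir():
        # Symlinks are unlinked rather than followed, so nothing outside f is removed.
        if child.is_dir() and not child.is_symlink():
            rmtree_sketch(child)
        else:
            child.unlink()
    f.rmdir()  # f is empty by now
```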
- siege_utilities.files.operations.check_if_file_exists_at_path(target_file_path: Path) -> bool
Check whether a file exists at the given path.
- Parameters:
target_file_path – Path to check for an existing file
- Returns:
True if file exists, False otherwise
- siege_utilities.files.operations.delete_existing_file_and_replace_it_with_an_empty_file(target_file_path: Path) -> Path
Delete the existing file and replace it with an empty file.
- Parameters:
target_file_path – pathlib.Path object for the file to replace
- Returns:
pathlib.Path object for the new, empty file
- siege_utilities.files.operations.count_total_rows_in_file_pythonically(target_file_path: Path) -> int
- Parameters:
target_file_path – pathlib.Path of the file whose rows are counted
- Returns:
count of total rows in file
- siege_utilities.files.operations.count_empty_rows_in_file_pythonically(target_file_path: Path) -> int
- Parameters:
target_file_path – pathlib.Path of the file whose empty rows are counted
- Returns:
count of empty rows in file
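Counting rows "pythonically" presumably means iterating over the file in Python rather than calling out to a shell tool. This is a minimal sketch under that assumption. The function names are hypothetical, and a row is treated as empty when it contains nothing but its line terminator.

```python
from pathlib import Path


def count_rows_sketch(path: Path) -> int:
    with path.open("rb") as fh:
        return sum(1 for _ in fh)


def count_empty_rows_sketch(path: Path) -> int:
    # Empty means nothing but the line terminator ("\n" or "\r\n").
    with path.open("rb") as fh:
        return sum(1 for line in fh if line.rstrip(b"\r\n") == b"")
```

Iterating a binary file yields a final line even when it lacks a trailing newline. For `b"a\n\nb\n\nc"` this sketch therefore counts 5 rows, whereas `wc -l`, which counts newline characters, reports 4.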
- siege_utilities.files.operations.count_duplicate_rows_in_file_using_awk(target_file_path: Path) -> int
Count duplicate rows in a file using an awk pattern from Justin Hernandez.
- Parameters:
target_file_path – pathlib.Path of the file whose duplicate rows are counted
- Returns:
count of duplicate rows in file
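The awk pattern credited to Justin Hernandez is not reproduced here, so the definition of a "duplicate row" below is an assumption: each occurrence of a row after its first. That is what the common idiom `awk 'seen[$0]++' file | wc -l` counts. The Python sketch below (hypothetical name) computes the same number as total rows minus distinct rows.

```python
from collections import Counter
from pathlib import Path


def count_duplicate_rows_sketch(path: Path) -> int:
    """Occurrences of a row after its first: total rows minus distinct rows."""
    with path.open("rb") as fh:
        # awk's $0 excludes the newline, so strip it here too; a final row
        # without a trailing newline then still matches its earlier copies.
        counts = Counter(line.rstrip(b"\n") for line in fh)
    return sum(n - 1 for n in counts.values())
```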
- siege_utilities.files.operations.count_total_rows_in_file_using_sed(target_file_path: Path) -> int
- Parameters:
target_file_path – pathlib.Path of the file whose rows are counted
- Returns:
count of total rows in file
- siege_utilities.files.operations.count_empty_rows_in_file_using_awk(target_file_path: Path) -> int
- Parameters:
target_file_path – pathlib.Path of the file whose empty rows are counted
- Returns:
count of empty rows in file
- siege_utilities.files.operations.remove_empty_rows_in_file_using_sed(target_file_path: Path, fixed_file_path: Path = None)
- Parameters:
target_file_path – pathlib.Path of the file whose empty rows are removed
fixed_file_path – pathlib.Path where the fixed file is saved
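This is a pure-Python counterpart to `sed '/^$/d'`, sketched under two assumptions: the function writes to `fixed_file_path` when given and otherwise rewrites the target in place, and it returns the output path. Only rows with no characters before the newline are dropped, as with sed's `^$` address. Whitespace-only rows are kept.

```python
from pathlib import Path
from typing import Optional


def remove_empty_rows_sketch(target_file_path: Path,
                             fixed_file_path: Optional[Path] = None) -> Path:
    """Drop empty rows; write to fixed_file_path, or rewrite the target in place."""
    out = fixed_file_path if fixed_file_path is not None else target_file_path
    with target_file_path.open("rb") as src:
        # Like sed's /^$/ address: a row is empty only if nothing precedes "\n".
        kept = [line for line in src if line.rstrip(b"\n") != b""]
    # The whole input is read before writing, so in-place rewriting is safe.
    out.write_bytes(b"".join(kept))
    return out
```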
- siege_utilities.files.operations.write_data_to_a_new_empty_file(target_file_path: Path, data: str) -> Path
- Parameters:
target_file_path – file path to write data to
data – string to write to the file
- Returns:
the path to the file
Paths
- siege_utilities.files.paths.ensure_path_exists(desired_path: Path) -> Path
Ensure that desired_path exists, creating it if necessary.
Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.
Example
>>> from pathlib import Path
>>> import siege_utilities
>>> output_dir = siege_utilities.ensure_path_exists(Path("data/output"))
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
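If `ensure_path_exists` treats `desired_path` as a directory (an assumption, since the docstring does not say), the usual pathlib idiom is a single call. The sketch below (hypothetical name) also shows a pitfall worth knowing: `exist_ok=True` does not help when a regular file already occupies the path.

```python
from pathlib import Path


def ensure_dir_sketch(desired_path: Path) -> Path:
    """Create desired_path as a directory (with parents) if it is missing; return it."""
    # exist_ok=True makes repeated calls safe, but mkdir still raises
    # FileExistsError when desired_path exists as a regular file.
    desired_path.mkdir(parents=True, exist_ok=True)
    return desired_path
```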
- siege_utilities.files.paths.unzip_file_to_its_own_directory(path_to_zipfile: Path, new_dir_name=None, new_dir_parent=None)
Extract a zip file into a directory of its own. new_dir_name and new_dir_parent optionally set the name and the parent location of that directory.
Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.
Example
>>> from pathlib import Path
>>> import siege_utilities
>>> result = siege_utilities.unzip_file_to_its_own_directory(Path("downloads/archive.zip"))
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
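This sketch extracts an archive into a dedicated directory with the standard `zipfile` module. The defaults are assumptions inferred from the parameter names: the directory is named after the archive's stem and created next to the archive unless `new_dir_name` or `new_dir_parent` say otherwise. The function name is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Optional


def unzip_to_own_dir_sketch(path_to_zipfile: Path, new_dir_name: Optional[str] = None,
                            new_dir_parent: Optional[Path] = None) -> Path:
    """Extract an archive into its own directory and return that directory."""
    # Assumed defaults: name the directory after the archive, place it beside it.
    name = new_dir_name or path_to_zipfile.stem
    parent = Path(new_dir_parent) if new_dir_parent is not None else path_to_zipfile.parent
    target = parent / name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(path_to_zipfile) as zf:
        zf.extractall(target)
    return target
```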
Remote
- siege_utilities.files.remote.download_file(url, local_filename)
Download a file from a URL to a local file, showing a progress bar.
- Parameters:
url – The URL to download from
local_filename – The local path where the file should be saved
- Returns:
The local filename if successful, False otherwise
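This is a streaming download with a simple progress indicator, sketched with `urllib.request` so it needs no third-party packages. The function name is hypothetical. The plain percentage written to stderr stands in for whatever progress-bar library `download_file` actually uses. Returning the filename or False mirrors the documented contract.

```python
import sys
import urllib.request


def download_file_sketch(url, local_filename, chunk_size=65536):
    """Stream url into local_filename; return local_filename, or False on failure."""
    try:
        with urllib.request.urlopen(url) as resp, open(local_filename, "wb") as out:
            total = int(resp.headers.get("Content-Length") or 0)
            done = 0
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
                done += len(chunk)
                if total:
                    # "\r" redraws one terminal line instead of printing many.
                    sys.stderr.write(f"\r{local_filename}: {100 * done // total}%")
        sys.stderr.write("\n")
        return local_filename
    except (OSError, ValueError):
        # URLError is a subclass of OSError; ValueError covers malformed URLs.
        return False
```

Reading fixed-size chunks keeps memory use flat for large downloads. When the server sends no `Content-Length`, the sketch skips the percentage.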
- siege_utilities.files.remote.generate_local_path_from_url(url: str, directory_path: Path, as_string: bool = True)
Generate a local file path inside directory_path for the file at url. When as_string is True (the default), the path is returned as a string.
Part of the Siege Utilities File Operations module. Auto-discovered and available at package level.
Example
>>> from pathlib import Path
>>> import siege_utilities
>>> local_path = siege_utilities.generate_local_path_from_url("https://example.com/data.csv", Path("downloads"))
Note
This function is auto-discovered and available without imports across all siege_utilities modules.
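One plausible reading of `generate_local_path_from_url` is "take the last segment of the URL path and place it in `directory_path`". The sketch below implements that assumption with `urllib.parse`. The name is hypothetical, and the query string is ignored.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def local_path_from_url_sketch(url: str, directory_path: Path, as_string: bool = True):
    """Place the URL's last path segment inside directory_path."""
    # urlparse separates the query string and fragment from the path.
    name = PurePosixPath(unquote(urlparse(url).path)).name
    if name in ("", ".."):
        raise ValueError(f"URL has no usable filename: {url!r}")
    local = Path(directory_path) / name
    return str(local) if as_string else local
```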