Automation-first Python library for local file / directory / zip operations, HTTP downloads, and remote storage (Google Drive, S3, Azure Blob, Dropbox, SFTP). Actions are defined as JSON and dispatched through a central registry so they can be executed in-process, from disk, over a TCP socket, or over HTTP.
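The JSON action dispatch described above can be illustrated with a minimal, self-contained sketch. `MiniRegistry` and `run_actions` are hypothetical stand-ins written for this example, not the library's own classes; only the three action shapes (`[name]`, `[name, {kwargs}]`, `[name, [args]]`) come from this document.

```python
from __future__ import annotations

import json
from typing import Any, Callable


class MiniRegistry:
    """Hypothetical stand-in for a name -> callable registry."""

    def __init__(self) -> None:
        self._actions: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._actions[name] = func

    def resolve(self, name: str) -> Callable[..., Any]:
        try:
            return self._actions[name]
        except KeyError as err:
            raise LookupError(f"unknown action: {name}") from err


def run_actions(registry: MiniRegistry, actions: list[list[Any]]) -> list[Any]:
    """Dispatch each [name], [name, {kwargs}] or [name, [args]] entry in order."""
    results: list[Any] = []
    for action in actions:
        name, *rest = action
        func = registry.resolve(name)
        if not rest:
            results.append(func())
        elif isinstance(rest[0], dict):
            results.append(func(**rest[0]))
        else:
            results.append(func(*rest[0]))
    return results


registry = MiniRegistry()
registry.register("join", lambda left, right: f"{left}/{right}")
registry.register("upper", str.upper)
registry.register("ping", lambda: "pong")

# The action list arrives as JSON text, e.g. read from disk or a socket.
payload = json.loads('[["join", {"left": "a", "right": "b"}], ["upper", ["abc"]], ["ping"]]')
results = run_actions(registry, payload)
```

Because the payload is plain JSON, the same list can be executed in-process, loaded from a file, or received over a network transport without changing the dispatch code.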
Layered architecture with Facade + Registry + Command + Strategy patterns:
automation_file/
├── __init__.py  # Public API facade (every name users import)
├── __main__.py  # CLI entry (argparse dispatcher, subcommands + legacy flags)
├── exceptions.py  # Exception hierarchy (FileAutomationException base)
├── logging_config.py  # file_automation_logger (file + stderr handlers)
├── core/
│   ├── action_registry.py  # ActionRegistry: name -> callable (Registry + Command)
│   ├── action_executor.py  # ActionExecutor: runs JSON action lists (Facade + Template Method)
│   ├── callback_executor.py  # CallbackExecutor: trigger then callback composition
│   ├── package_loader.py  # PackageLoader: dynamically registers package members
│   ├── json_store.py  # Thread-safe read/write of JSON action files
│   ├── retry.py  # retry_on_transient: capped exponential back-off decorator
│   └── quota.py  # Quota: size + time budget guards
├── local/  # Strategy modules: each file is a batch of pure operations
│   ├── file_ops.py
│   ├── dir_ops.py
│   ├── zip_ops.py
│   └── safe_paths.py  # safe_join / is_within: path traversal guard
├── remote/
│   ├── url_validator.py  # SSRF guard for outbound URLs
│   ├── http_download.py  # SSRF-validated HTTP download with size/timeout caps + retry
│   ├── google_drive/
│   │   ├── client.py  # GoogleDriveClient (Singleton Facade)
│   │   ├── delete_ops.py
│   │   ├── download_ops.py
│   │   ├── folder_ops.py
│   │   ├── search_ops.py
│   │   ├── share_ops.py
│   │   └── upload_ops.py
│   ├── s3/  # S3 (boto3), auto-registered in build_default_registry()
│   │   ├── client.py  # S3Client
│   │   ├── upload_ops.py
│   │   ├── download_ops.py
│   │   ├── delete_ops.py
│   │   └── list_ops.py
│   ├── azure_blob/  # Azure Blob, auto-registered in build_default_registry()
│   │   └── {client,upload,download,delete,list}_ops.py
│   ├── dropbox_api/  # Dropbox, auto-registered in build_default_registry()
│   │   └── {client,upload,download,delete,list}_ops.py
│   └── sftp/  # SFTP (paramiko + RejectPolicy), auto-registered in build_default_registry()
│       └── {client,upload,download,delete,list}_ops.py
├── server/
│   ├── tcp_server.py  # Loopback-only TCP server executing JSON actions (optional shared-secret auth)
│   └── http_server.py  # Loopback-only HTTP server (POST /actions, optional Bearer auth)
├── project/
│   ├── project_builder.py  # ProjectBuilder (Builder pattern)
│   └── templates.py  # Scaffolding templates
├── ui/  # PySide6 GUI (required dep)
│   ├── launcher.py  # launch_ui(argv): boots QApplication + MainWindow
│   ├── main_window.py  # MainWindow: tabbed control surface over every feature
│   ├── worker.py  # ActionWorker(QRunnable) + _WorkerSignals
│   ├── log_widget.py  # LogPanel: timestamped, read-only log stream
│   └── tabs/  # One tab per domain: local / http / drive / s3 /
│              # azure / dropbox / sftp /
│              # JSON actions / servers
└── utils/
    └── file_discovery.py  # Recursive file listing by extension
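The tree lists local/safe_paths.py as the path traversal guard (safe_join / is_within). A containment check of that kind might look like the sketch below. This is an illustration under assumptions, not the library's implementation: `PathTraversalError` is a hypothetical stand-in for `PathTraversalException`, and the function bodies are written from scratch.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class PathTraversalError(ValueError):
    """Hypothetical stand-in for the library's PathTraversalException."""


def is_within(root: Path, path: Path) -> bool:
    """True when the resolved path is root itself or lies underneath it."""
    root_resolved = root.resolve()
    path_resolved = path.resolve()
    return path_resolved == root_resolved or root_resolved in path_resolved.parents


def safe_join(root: Path, user_path: str) -> Path:
    """Join user input onto root and reject anything that escapes root."""
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if not is_within(root, candidate):
        raise PathTraversalError(f"{user_path!r} escapes {root}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    inside = safe_join(root, "reports/2024.txt")
    inside_ok = is_within(root, inside)
    try:
        safe_join(root, "../outside.txt")
        escaped = False
    except PathTraversalError:
        escaped = True
```

Comparing resolved paths rather than raw strings is what lets the check catch both `..` segments and symlinks that point outside the root.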
Key design patterns in use:
- Facade: automation_file/__init__.py re-exports every supported name (execute_action, driver_instance, start_autocontrol_socket_server, …).
- Registry + Command: ActionRegistry maps action name → callable. JSON action lists are command objects ([name, kwargs] / [name, [args]] / [name]) dispatched through the registry.
- Template Method: ActionExecutor._execute_event defines the single-action lifecycle (resolve → call → wrap result); execute_action is the outer iteration template.
- Strategy: each local/*_ops.py and remote/google_drive/*_ops.py module is an independent strategy that plugs into the registry.
- Shared instances: driver_instance, executor, callback_executor, and package_manager are wired in __init__.py so that callback_executor.registry is executor.registry.
- Builder: ProjectBuilder assembles the keyword/ + executor/ skeleton.

- ActionRegistry: mutable name → callable mapping. Provides register, register_many, resolve, unregister, and event_dict (live view for legacy callers).
- ActionExecutor: holds a registry and runs JSON action lists. Provides execute_action(list|dict, validate_first=False, dry_run=False), execute_action_parallel(list, max_workers=None), validate(list) -> list[str], execute_files(paths), and add_command_to_executor(mapping).
- CallbackExecutor: runs a registered trigger, then a user callback, sharing the executor’s registry.
- PackageLoader: imports a package by name and registers its top-level functions / classes / builtins as <package>_<member>.
- GoogleDriveClient: wraps OAuth2 credential loading and exposes service lazily. later_init(token_path, credentials_path) bootstraps; require_service() raises if not initialised.
- S3Client / AzureBlobClient / DropboxClient / SFTPClient: singleton wrappers around the required SDKs. Each exposes later_init(...) plus close() where relevant. Their ops are auto-registered by build_default_registry(); register_<backend>_ops(registry) is still exported so callers can populate custom registries.
- MainWindow: PySide6 tabbed control surface (ui/main_window.py). Nine tabs (Local, HTTP, Google Drive, S3, Azure Blob, Dropbox, SFTP, JSON actions, Servers) share a LogPanel and dispatch work through ActionWorker(QRunnable) on the global QThreadPool.
- launch_ui(argv=None): boots or reuses a QApplication, shows MainWindow, and returns the exec code. It is exposed lazily on the facade via __getattr__ so that non-UI importers do not pay for the Qt runtime.
- TCPActionServer: threaded TCP server that deserialises a JSON action list per connection. Defaults to loopback; an optional shared_secret enforces an AUTH <secret>\n prefix.
- HTTPActionServer: ThreadingHTTPServer exposing POST /actions. Defaults to loopback; an optional shared_secret enforces Authorization: Bearer <secret>.
- Quota: frozen dataclass capping bytes and wall-clock seconds per action or block (check_size, time_budget context manager, wraps decorator). A value of 0 disables the corresponding cap.
- retry_on_transient(max_attempts, backoff_base, backoff_cap, retriable): decorator that retries with capped exponential back-off and raises RetryExhaustedException chained to the last error.
- safe_join(root, user_path) / is_within(root, path): path traversal guard; safe_join raises PathTraversalException when the resolved path escapes root.

- main branch: stable releases, publishes automation_file to PyPI (version in stable.toml).
- dev branch: development, publishes automation_file_dev to PyPI (version in dev.toml).
- Keep dependencies and [project.optional-dependencies] (dev) in sync across both TOMLs. Backends (boto3, azure-storage-blob, dropbox, paramiko) and PySide6 are first-class runtime deps; do not move them back under extras.
- The publish workflow bumps the version in stable.toml and dev.toml, builds, uploads to PyPI, then commits the bump back to main tagged as vX.Y.Z. Do not hand-bump before merging to main. The next publish run is skipped via a commit-message guard (chore: bump version), so the bump itself never re-triggers publishing.
- CI runs from .github/workflows/ci-dev.yml and .github/workflows/ci-stable.yml: lint (ruff check + ruff format --check + mypy) → pytest with coverage → upload of coverage.xml as an artifact.
- The publish workflow (.github/workflows/publish.yml) runs on push to main: it bumps both TOMLs, copies stable.toml to pyproject.toml, builds the sdist + wheel, uploads with twine via PYPI_API_TOKEN, then commits, tags, and pushes, and creates the release with gh release create v<version> --generate-notes.
- pre-commit is configured (.pre-commit-config.yaml) with these hooks: trailing-whitespace, eof-fixer, check-yaml, check-toml, check-added-large-files, ruff, ruff-format, mypy. Install with pre-commit install after cloning.

python -m pip install -r dev_requirements.txt pytest pytest-cov
python -m pip install -e ".[dev]" # ruff, mypy, pre-commit
python -m pytest tests/ -v --tb=short
ruff check automation_file/ tests/
ruff format --check automation_file/ tests/
mypy automation_file/
python -m automation_file --help
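The retry_on_transient decorator is described earlier as retrying with capped exponential back-off and raising RetryExhaustedException chained to the last error. A rough sketch of that behaviour follows. It keeps the documented parameter names, but the default values, the injectable `sleep` argument, and the `RetryExhaustedError` class are assumptions made so the example runs on its own without the library.

```python
from __future__ import annotations

import functools
import time
from typing import Any, Callable


class RetryExhaustedError(Exception):
    """Hypothetical stand-in for the library's RetryExhaustedException."""


def retry_on_transient(
    max_attempts: int = 3,
    backoff_base: float = 0.5,
    backoff_cap: float = 8.0,
    retriable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,  # assumption: injectable for tests
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            last_error: BaseException | None = None
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retriable as err:  # only the listed types are retried
                    last_error = err
                    if attempt < max_attempts - 1:
                        # Delay doubles each attempt, never exceeding the cap.
                        sleep(min(backoff_cap, backoff_base * 2**attempt))
            raise RetryExhaustedError(f"gave up after {max_attempts} attempts") from last_error

        return wrapper

    return decorator


delays: list[float] = []
calls = {"count": 0}


@retry_on_transient(max_attempts=4, backoff_base=1.0, backoff_cap=3.0, sleep=delays.append)
def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 4:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = flaky()


@retry_on_transient(max_attempts=2, backoff_base=0.0, sleep=lambda _seconds: None)
def always_down() -> None:
    raise TimeoutError("still down")


try:
    always_down()
except RetryExhaustedError as err:
    cause = err.__cause__
```

Any exception type not listed in `retriable` propagates immediately, which keeps logic bugs from being retried as if they were transient.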
Testing:
- Tests live in tests/ (pytest). Fixtures are in tests/conftest.py (sample_file, sample_dir).
- Tests cover core/, local/, remote/url_validator, project/, server/, utils/, plus a facade smoke test, retry/quota/safe_paths, HTTP+TCP auth, and optional-backend registration.
- Run everything with python -m pytest tests/ -v.

- Use X | Y union syntax, not Union[X, Y].
- Put from __future__ import annotations at the top of every module for deferred type evaluation.
- Raise exceptions derived from FileAutomationException; never raise Exception(...) directly.
- Log through file_automation_logger from automation_file.logging_config. Never print() for diagnostics.
- Action shapes are [name], [name, {kwargs}], or [name, [args]]; nothing else.
- Do not keep _old_-prefixed names. Git history is the archive.
- Register additional actions with add_command_to_executor({name: callable}).

All code must follow secure-by-default principles. Review every change against the checklist below.
- Never use eval(), exec(), or pickle.loads() on untrusted data.
- Never use subprocess.Popen(..., shell=True); always pass argument lists.
- Credentials for GoogleDriveClient are kept on disk only at the caller-supplied token_path.
- Use json.loads() / json.dumps() for serialisation, never pickle.
- All outbound HTTP URLs must pass through automation_file.remote.url_validator.validate_http_url:
  - Only http:// and https:// schemes are accepted; file://, ftp://, data:, and gopher:// are rejected.
- http_download.download_file calls the validator, uses allow_redirects=False, enforces a default 20 MB response cap and 15 s connection timeout, and never downgrades TLS verification.
- Never call urlopen() / requests.* without the validator.
- Never pass verify=False.
- InteractiveHostKeyPolicy pattern.
- When running subprocesses, pass argument lists (never shell=True), set an explicit timeout, and never interpolate user input into a command string.
- TCPActionServer binds to localhost by default. start_autocontrol_socket_server(host=…) raises ValueError if the resolved address is not loopback unless allow_non_loopback=True is passed explicitly.
- The payload read is capped (recv(8192)). Do not raise that limit without also adding a length-framed protocol.
- quit_server triggers an orderly shutdown; do not add an administrative bypass that skips the loopback check.
- shared_secret= enforces an AUTH <secret>\n prefix; the comparison uses hmac.compare_digest (constant time). Never log the secret or the raw payload.
- HTTPActionServer / start_http_action_server mirror the TCP server’s posture: loopback-only by default, allow_non_loopback=True required to bind elsewhere, optional shared_secret enforced as Authorization: Bearer <secret> using hmac.compare_digest.
- Only POST /actions is handled. The request body is capped at 1 MB; do not raise the cap without also switching to a streaming parser.
- Failed authentication returns 401; malformed JSON returns 400; unknown paths return 404.
- Resolve user-supplied paths through automation_file.local.safe_paths.safe_join (raises PathTraversalException) or the is_within check. Never concatenate and call Path.resolve() yourself while skipping the containment check: symlinks and .. segments bypass naive string checks.
- SFTPClient uses paramiko.RejectPolicy(): unknown hosts are rejected, never auto-added. Callers pass known_hosts= explicitly or rely on ~/.ssh/known_hosts. Do not swap in AutoAddPolicy for convenience.
- retry_on_transient only retries the exception types passed via retriable=(…). Never widen to bare Exception; that masks logic bugs as transient failures. It always exhausts to RetryExhaustedException, chained with raise ... from err.
- Quota(max_bytes=…, max_seconds=…): prefer Quota.wraps(...) over inline checks when guarding a whole operation. A value of 0 disables the corresponding cap.
- Tokens are written to token_path with encoding="utf-8". Never log or print the token contents.
- GoogleDriveClient.require_service() raises rather than silently operating with a None service; do not paper over it by catching RuntimeError at the call site.
- Use pathlib.Path for path manipulation; never string-concatenate paths with user input.
- Use with open(...) as f: for every file operation, so files are closed by the context manager.
- Always pass encoding="utf-8" when reading or writing text.
- Write JSON action files through automation_file.core.json_store.write_action_json, which holds a module-level lock.
- PackageLoader.add_package_to_executor(package) registers every function / class / builtin of a package under <package>_<member>. Treat it as eval-grade power: never expose it to arbitrary clients (e.g. via the TCP server). If you add a remote plugin-load command, gate it behind an explicit admin flag and authenticated transport.
- requirements.txt / dev_requirements.txt.

All code must satisfy common static-analysis rules. Review every change against the checklist below.
- Flatten nested if/for/try chains with early returns.
- No bare except: clauses; always specify exception types.
- Do not catch Exception / BaseException unless immediately logging and re-raising, or running at a top-level dispatcher boundary (the ActionExecutor.execute_action loop is one of these: it intentionally records per-action failures without aborting the batch).
- Never pass silently inside except; log via file_automation_logger at minimum.
- No return / break / continue inside a finally block; it swallows exceptions.
- Custom exceptions inherit from FileAutomationException.
- Use raise ... from err (or raise ... from None) when re-raising, to preserve or suppress the chain explicitly.
- Compare with None using is / is not, never == / !=.
- Use isinstance(obj, T), never type(obj) == T.
- Never use mutable default arguments; default to None and initialise inside.
- Prefer f-strings over % formatting or str.format() (except inside lazy log calls: logger.info("x=%s", x)).
- Use enumerate() instead of range(len(...)) when the index is needed.
- Prefer dict.get(key, default) over key in dict and dict[key].
- snake_case for functions, methods, variables, module names.
- PascalCase for classes.
- UPPER_SNAKE_CASE for module-level constants.
- _leading_underscore for protected / internal members.
- Do not shadow built-ins (id, type, list, dict, input, file, open, etc.).
- No unreachable code after return / raise.
- No TODO / FIXME / XXX without an issue reference (# TODO(#123): …).
- No print() for diagnostics in library code; use file_automation_logger.
- Use lazy log formatting (logger.debug("x=%s", x)) to avoid eager f-string formatting on hot paths.
- No assert for runtime validation; assert is for tests only.
- localhost / loopback defaults.
- Write return bool(cond) or return cond, not if cond: return True else: return False.
- Write if x / if not x, not if x == True / if x == False.
- from x import a, b is fine.
- Group imports as standard library, third-party, first-party (automation_file.*), separated by blank lines.
- __init__.py re-exports.
- Run ruff check automation_file/ tests/ locally.
- If you must add # noqa: RULE, justify it in the comment; never blanket-disable.
- No Co-Authored-By headers referencing any AI.
- Use dev for development work, main for stable releases.
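The security checklist requires that the TCP server's optional shared secret be sent as an AUTH <secret>\n prefix and compared with hmac.compare_digest. One way such a check could be written is sketched below. The function name `split_authenticated_payload` and its return convention are hypothetical; only the prefix format and the constant-time comparison come from this document.

```python
from __future__ import annotations

import hmac


def split_authenticated_payload(raw: bytes, shared_secret: str | None) -> bytes | None:
    """Return the JSON payload when authentication passes, otherwise None."""
    if shared_secret is None:
        return raw  # authentication is optional and not configured
    header, separator, body = raw.partition(b"\n")
    if not separator or not header.startswith(b"AUTH "):
        return None
    supplied = header[len(b"AUTH "):]
    # Constant-time comparison avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(supplied, shared_secret.encode("utf-8")):
        return None
    return body


accepted = split_authenticated_payload(b'AUTH s3cret\n[["ping"]]', "s3cret")
rejected = split_authenticated_payload(b'AUTH wrong\n[["ping"]]', "s3cret")
missing = split_authenticated_payload(b'[["ping"]]', "s3cret")
unprotected = split_authenticated_payload(b'[["ping"]]', None)
```

The sketch returns None for every failure mode rather than explaining which part failed, and it never logs the header or body, in line with the rule against logging the secret or the raw payload.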