TFGrid Compose Architecture¶
Version: 0.9.0
Status: Active
Audience: Developers, Contributors, Advanced Users
Table of Contents¶
- System Overview
- Core Components
- Data Flow
- Pattern System
- State Management
- Extension Guide
- Design Decisions
- Performance Considerations
System Overview¶
High-Level Architecture¶
┌─────────────────────────────────────────────────────────────┐
│ tfgrid-compose CLI │
│ (User Entry Point) │
└──────────────────┬──────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Core Orchestrator │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Validation → Pattern Loading → App Loading │ │
│ │ ↓ ↓ ↓ │ │
│ │ Infrastructure → Network → Platform → App Deploy │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────┬────────────────────────────┬──────────────────┘
│ │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────┐
│ Pattern System │ │ Task Executors │
│ ┌────────────────────┐ │ │ ┌────────────────────┐ │
│ │ Infrastructure/ │ │ │ │ terraform.sh │ │
│ │ Platform/ │ │ │ │ wireguard.sh │ │
│ │ App/ │ │ │ │ ansible.sh │ │
│ └────────────────────┘ │ │ └────────────────────┘ │
└──────────────────────────┘ └──────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ ThreeFold Grid │
│ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ Nodes │ │ Network │ │ Storage/ZDB │ │
│ └────────────┘ └────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Design Philosophy¶
1. Pattern-Based Architecture
- Patterns encapsulate infrastructure, platform, and application concerns
- Orchestrator is pattern-agnostic (knows nothing about pattern internals)
- New patterns can be added without modifying core code
2. Contract-Driven Integration
- Patterns must implement a standard contract
- Orchestrator consumes contract outputs (primary_ip, deployment_name, etc.)
- Loose coupling enables extensibility
3. Shell-Based Implementation
- Bash scripts for simplicity and transparency
- Easy to understand, debug, and extend
- Minimal dependencies (bash, common Unix tools)
- Works on any Unix-like system (Linux, macOS)
Core Components¶
1. CLI Entry Point (cli/tfgrid-compose)¶
Purpose: User-facing command interface
Responsibilities:
- Parse command-line arguments
- Load context file (.tfgrid-compose.yaml)
- Route commands to appropriate handlers
- Display help and version information
Key Features:
- Context file support for simplified commands
- Command aliases (e.g., agent subcommand)
- Global flags (--debug, --version)
Example Flow:
tfgrid-compose up /path/to/app
↓
Load context (optional)
↓
Validate prerequisites
↓
Call orchestrator.deploy_app()
2. Common Utilities (core/common.sh)¶
Purpose: Shared functions used across all modules
Key Functions:
# Logging
log_info() # Informational messages
log_success() # Success indicators
log_warning() # Warnings
log_error() # Errors
log_step() # Major steps
# Utilities
command_exists() # Check if command is available
yaml_get() # Extract values from YAML files
state_save() # Save to state file
state_get() # Retrieve from state file
state_clear() # Clear state directory
Color-Coded Output:
- 🔵 Blue: Informational
- ✅ Green: Success
- ⚠️ Yellow: Warnings
- ❌ Red: Errors
- ▶ Purple: Major steps
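The helpers listed above could be sketched as follows. This is an illustration rather than the contents of core/common.sh: the function names come from the list above, while the ANSI escape codes, the exact symbols, and sending warnings and errors to stderr are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of color-coded log helpers. Escape codes and output streams are
# assumptions; only the function names come from the list above.
BLUE='\033[0;34m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
RED='\033[0;31m'; PURPLE='\033[0;35m'; RESET='\033[0m'

# %b expands the escape sequences stored in the color variables.
log_info()    { printf '%b%s%b\n' "$BLUE"   "🔵 $*" "$RESET"; }
log_success() { printf '%b%s%b\n' "$GREEN"  "✅ $*" "$RESET"; }
log_step()    { printf '%b%s%b\n' "$PURPLE" "▶ $*"  "$RESET"; }

# Assumption: warnings and errors go to stderr so stdout stays clean.
log_warning() { printf '%b%s%b\n' "$YELLOW" "⚠️ $*" "$RESET" >&2; }
log_error()   { printf '%b%s%b\n' "$RED"    "❌ $*" "$RESET" >&2; }

# Succeeds when the named command can be found on PATH.
command_exists() { command -v "$1" >/dev/null 2>&1; }
```

A caller might combine them as `command_exists ansible || log_error "Ansible not installed"`.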
3. Validation Module (core/validation.sh)¶
Purpose: Validate prerequisites and configurations
Validation Stages:
validate_prerequisites()
├── Check OpenTofu/Terraform (prefer OpenTofu)
├── Check Ansible
├── Check SSH client
├── Check WireGuard (optional warning)
├── Load ThreeFold mnemonic
└── Export TF_CMD environment variable
validate_app_path()
├── Check directory exists
└── Check tfgrid-compose.yaml exists
validate_deployment_exists()
├── Check .tfgrid-compose/ directory
└── Check state.yaml file
validate_no_deployment()
└── Prevent duplicate deployments
validate_network_prerequisites()
├── Check WireGuard (if needed)
├── Check Mycelium (if needed)
└── Test connectivity
Mnemonic Loading Priority:
1. Environment variable: $TF_VAR_mnemonic
2. Standard location: ~/.config/threefold/mnemonic
3. Project-specific: ./.tfgrid-mnemonic
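The priority order above could be implemented roughly as follows. This is a minimal sketch, not the project's actual code: the function name `load_mnemonic` and the error message are hypothetical, and only the three locations come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the mnemonic using the priority order above.
load_mnemonic() {
    local candidate

    # 1. An already-set environment variable wins.
    if [ -n "${TF_VAR_mnemonic:-}" ]; then
        export TF_VAR_mnemonic
        return 0
    fi

    # 2. Standard location, then 3. project-specific file.
    for candidate in "$HOME/.config/threefold/mnemonic" "./.tfgrid-mnemonic"; do
        if [ -r "$candidate" ] && [ -s "$candidate" ]; then
            TF_VAR_mnemonic="$(cat "$candidate")"
            export TF_VAR_mnemonic
            return 0
        fi
    done

    echo "No ThreeFold mnemonic found" >&2
    return 1
}
```

Checking the environment variable first lets CI/CD pipelines inject the mnemonic without writing it to disk.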
OpenTofu Priority:
# Check tofu first (open source); discard command -v output
if command -v tofu >/dev/null 2>&1; then
    export TF_CMD="tofu"
elif command -v terraform >/dev/null 2>&1; then
    export TF_CMD="terraform"
else
    log_error "Neither OpenTofu nor Terraform found"
fi
4. Pattern Loader (core/pattern-loader.sh)¶
Purpose: Load and validate deployment patterns
Pattern Structure:
patterns/
└── single-vm/
├── pattern.yaml # Pattern metadata
├── infrastructure/ # Terraform/OpenTofu files
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf # MUST implement contract
├── platform/ # Ansible playbooks
│ ├── site.yml
│ └── roles/
└── README.md
Pattern Loading Flow:
load_pattern()
├── Read pattern name from app manifest
├── Validate pattern directory exists
├── Load pattern.yaml metadata
├── Verify infrastructure/ (Terraform files)
├── Verify platform/ (Ansible playbooks)
└── Export pattern variables
Pattern Metadata (pattern.yaml):
name: single-vm
version: 1.0.0
description: Single VM deployment
requires:
- terraform # or opentofu
- ansible
- wireguard # optional
5. App Loader (core/app-loader.sh)¶
Purpose: Load and validate application manifests
App Manifest (tfgrid-compose.yaml):
name: my-app
version: 1.0.0
pattern: single-vm
# Pattern configuration
config:
node: 8
cpu: 4
memory: 8192
disk: 102400
# Optional app-specific hooks
hooks:
setup: ./deployment/hooks/setup.sh
configure: ./deployment/hooks/configure.sh
healthcheck: ./deployment/hooks/healthcheck.sh
App Loading Flow:
load_app()
├── Parse tfgrid-compose.yaml
├── Extract app metadata
├── Load pattern configuration
├── Validate hooks (if present)
└── Export app variables
6. Orchestrator (core/orchestrator.sh)¶
Purpose: Main deployment orchestration logic
Deployment Flow:
deploy_app()
│
├─ 1. VALIDATION PHASE
│ ├── Validate system prerequisites
│ ├── Validate no existing deployment
│ ├── Validate app path and manifest
│ └── Load pattern and app config
│
├─ 2. INFRASTRUCTURE PHASE
│ ├── Generate Terraform config
│ ├── Run: terraform init
│ ├── Run: terraform plan
│ ├── Run: terraform apply
│ └── Extract outputs (primary_ip, node_ids, etc.)
│
├─ 3. NETWORK PHASE
│ ├── If primary_ip_type == "wireguard":
│ │ ├── Extract WireGuard config
│ │ ├── Deploy to /etc/wireguard/
│ │ └── Start wg-quick up
│ └── Test connectivity
│
├─ 4. WAIT PHASE
│ ├── Wait for SSH (up to 5 minutes)
│ ├── Retry connection every 10 seconds
│ └── Verify SSH access
│
├─ 5. PLATFORM PHASE
│ ├── Generate Ansible inventory
│ ├── Run: ansible-playbook site.yml
│ └── Configure base system
│
├─ 6. APPLICATION PHASE
│ ├── Copy app source to VM
│ ├── Run setup hook (if present)
│ ├── Run configure hook (if present)
│ └── Run healthcheck hook (if present)
│
├─ 7. VERIFICATION PHASE
│ ├── Test SSH connectivity
│ ├── Check application service
│ └── Validate deployment
│
└─ 8. FINALIZATION
├── Save deployment metadata
├── Display connection info
└── Show next steps
Destroy Flow:
destroy_deployment()
├── Validate deployment exists
├── Stop WireGuard interface (if active)
├── Run: terraform destroy
└── Clear state directory
7. Task Executors (core/tasks/)¶
Purpose: Execute specific deployment tasks
terraform.sh¶
# Detects OpenTofu or Terraform
TF_CMD detection (tofu → terraform → error)
├── terraform init
├── terraform plan
├── terraform apply
└── Extract outputs to state.yaml
wireguard.sh¶
# Sets up WireGuard VPN
├── Extract wg_config from Terraform
├── Generate interface name (wg0, wg1, ...)
├── Deploy to /etc/wireguard/
├── Handle conflicts with existing interfaces
└── Start interface with wg-quick
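The interface-name step (wg0, wg1, ...) could be sketched like this. The function name `next_wg_interface` is hypothetical, and the sketch assumes a name is "taken" when a matching .conf file exists; the real script may also inspect active interfaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first wgN name that has no config file
# in the given directory (default: /etc/wireguard).
next_wg_interface() {
    local conf_dir="${1:-/etc/wireguard}"
    local n=0
    # Skip every index that already has a config file.
    while [ -e "$conf_dir/wg$n.conf" ]; do
        n=$(( n + 1 ))
    done
    echo "wg$n"
}
```

Reading /etc/wireguard usually requires root, so an unprivileged caller may see the directory as empty and always get wg0.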
ansible.sh¶
# Configures platform
├── Generate inventory from state
├── Test connectivity
├── Run playbook
└── Capture logs
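Generating the inventory from state might look like the sketch below. The function name `generate_inventory` and the group name `tfgrid` are hypothetical; `ansible_user` and `ansible_ssh_private_key_file` are standard Ansible inventory variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a one-host INI inventory.
# Usage: generate_inventory <ip> <ssh_user> <ssh_key_path> <output_file>
generate_inventory() {
    local ip="$1" user="$2" key="$3" out="$4"
    # The group name "tfgrid" is an assumption for illustration.
    cat > "$out" <<EOF
[tfgrid]
$ip ansible_user=$user ansible_ssh_private_key_file=$key
EOF
}
```

With the sample state shown later in this document, the call would be `generate_inventory 10.1.3.2 root /home/user/.ssh/id_ed25519 .tfgrid-compose/inventory.ini`.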
wait-ssh.sh¶
# Waits for SSH to be ready
├── Timeout: 300 seconds (5 minutes)
├── Retry: Every 10 seconds
├── Test: ssh -o BatchMode=yes <ip> 'echo ok'
└── Early exit on first success
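The retry behavior described above can be sketched as a generic polling loop. The names `wait_for` and `wait_for_ssh` are hypothetical and the real wait-ssh.sh may be structured differently; the timeout, interval, and probe command are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a probe command until it succeeds or the
# timeout elapses. Usage: wait_for <timeout_s> <interval_s> <command...>
wait_for() {
    local timeout="$1" interval="$2" start now
    shift 2
    start="$(date +%s)"
    while true; do
        # Early exit on first success.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Give up when the next attempt would start after the deadline.
        now="$(date +%s)"
        if [ $(( now - start + interval )) -gt "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
    done
}

# Hypothetical wrapper using the values above: 300 s total, every 10 s.
wait_for_ssh() {
    wait_for 300 10 ssh -o BatchMode=yes "$1" 'echo ok'
}
```

Keeping the loop separate from the SSH probe means the same helper can wait on other conditions, such as a service port or a health endpoint.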
Data Flow¶
State Directory Structure¶
.tfgrid-compose/
├── state.yaml # Deployment metadata
├── terraform/ # Generated Terraform config
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── .terraform/ # Provider plugins and module cache
│ └── terraform.tfstate # Terraform state
├── inventory.ini # Generated Ansible inventory
├── wg0.conf # WireGuard config (if used)
├── terraform-init.log # Terraform init log
├── terraform-plan.log # Terraform plan log
├── terraform-apply.log # Terraform apply log
├── ansible.log # Ansible playbook log
├── hook-setup.log # Setup hook log
├── hook-configure.log # Configure hook log
└── hook-healthcheck.log # Healthcheck hook log
State File (state.yaml)¶
# Metadata
app_name: my-app
app_version: 1.0.0
pattern: single-vm
deployment_id: abc123xyz
created_at: 2025-10-14T12:00:00Z
# Infrastructure outputs
vm_ip: 10.1.3.2
primary_ip: 10.1.3.2
primary_ip_type: wireguard
deployment_name: vm_abc123xyz
node_ids: [8]
mycelium_ip: 543:7233:7534:51c4:ff0f:f38b:d69b:8f19
# Network
wg_interface: wg0
network_name: net_abc123xyz
# Platform
ssh_user: root
ssh_key: /home/user/.ssh/id_ed25519
# Status
status: running
last_updated: 2025-10-14T12:05:00Z
Data Flow Diagram¶
┌──────────────────┐
│ User Command │
│ tfgrid-compose │
│ up │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ App Manifest │
│ .yaml file │───┐
└────────┬─────────┘ │
│ │
▼ │
┌──────────────────┐ │
│ Pattern Files │◄──┘
│ Infrastructure/ │
│ Platform/ │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Terraform Apply │
│ Creates VMs │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Extract Outputs │
│ Save to state │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ WireGuard Setup │
│ Network Access │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Ansible Config │
│ Platform Setup │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ App Deployment │
│ Hooks Execution │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ State Saved │
│ Deployment Done │
└──────────────────┘
Pattern System¶
Pattern Contract¶
Every pattern MUST implement the standard contract.
Required Terraform Outputs:
output "primary_ip" {
value = "10.1.3.2"
description = "Primary IP for SSH"
}
output "primary_ip_type" {
value = "wireguard" # or "public" or "mycelium"
description = "Type of primary IP"
}
output "deployment_name" {
value = "vm_abc123"
description = "Deployment name"
}
output "node_ids" {
value = [8]
description = "Node IDs used"
}
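A pattern author could check the contract with a small script such as the sketch below. The function name `check_contract` is hypothetical, and the sketch assumes the outputs have already been written as top-level keys to a flat YAML file such as state.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm every required contract output appears as a
# top-level "key:" entry in a flat YAML file such as state.yaml.
REQUIRED_OUTPUTS="primary_ip primary_ip_type deployment_name node_ids"

check_contract() {
    local file="$1" key missing=0
    for key in $REQUIRED_OUTPUTS; do
        # The trailing colon keeps primary_ip from matching primary_ip_type.
        if ! grep -q "^${key}:" "$file"; then
            echo "Missing contract output: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Reporting every missing key before failing, rather than stopping at the first, saves a round trip when a new pattern omits several outputs.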
Pattern Types¶
1. Single-VM Pattern¶
Use Case: Development, databases, internal services, AI agents
Infrastructure:
- 1 VM on TFGrid
- WireGuard private network
- Optional Mycelium IPv6
Network:
- Primary: WireGuard IP (10.x.x.x)
- Secondary: Mycelium IP (IPv6)
Platform:
- Ubuntu 24.04
- Base packages (git, curl, build-essential)
- Docker (optional)
2. Gateway Pattern¶
Use Case: Production web apps with public IPv4
Infrastructure:
- 1 Gateway VM (public IPv4)
- N Backend VMs (private network)
- WireGuard + Mycelium
Network:
- Primary: Public IPv4 (gateway)
- Secondary: WireGuard IPs (backends)
Platform:
- Gateway: Nginx/HAProxy reverse proxy
- Backends: Application servers
- SSL/TLS: Let's Encrypt (automated)
3. K3s Pattern¶
Use Case: Kubernetes clusters
Infrastructure:
- 1+ Control plane nodes
- N Worker nodes
- 1 Management node (kubectl, helm, k9s)
Network:
- Primary: Control plane WireGuard IP
- Secondary: Worker node IPs
Platform:
- K3s lightweight Kubernetes
- MetalLB load balancer
- Nginx Ingress Controller
- Local-path storage provisioner
State Management¶
State Lifecycle¶
┌──────────────┐
│ No State │ Initial state
└──────┬───────┘
│
│ deploy_app()
▼
┌──────────────┐
│ Deploying │ Terraform running
└──────┬───────┘
│
│ Success
▼
┌──────────────┐
│ Running │ Deployment active
└──────┬───────┘
│
│ destroy_deployment()
▼
┌──────────────┐
│ Destroying │ Terraform destroy
└──────┬───────┘
│
│ Success
▼
┌──────────────┐
│ Cleaned │ State removed
└──────────────┘
State Operations¶
Create State:
Save to State:
Read from State:
Clear State:
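The four operations named above could look roughly like this. The names `state_save`, `state_get`, and `state_clear` come from core/common.sh as listed earlier; `state_init`, the `STATE_DIR` variable, and the flat `key: value` storage with grep-based lookup are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: flat "key: value" storage; the real helpers may differ.
STATE_DIR="${STATE_DIR:-.tfgrid-compose}"

# Create state: make the directory and an empty state file (hypothetical).
state_init() {
    mkdir -p "$STATE_DIR"
    : > "$STATE_DIR/state.yaml"
}

# Save to state: drop any existing entry for the key, then append it.
state_save() {
    local key="$1" value="$2" tmp
    mkdir -p "$STATE_DIR"
    touch "$STATE_DIR/state.yaml"
    tmp="$(mktemp)"
    grep -v "^${key}: " "$STATE_DIR/state.yaml" > "$tmp" || true
    printf '%s: %s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$STATE_DIR/state.yaml"
}

# Read from state: print the value for a key, or fail if it is absent.
state_get() {
    local key="$1" line
    line="$(grep -m1 "^${key}: " "$STATE_DIR/state.yaml" 2>/dev/null)" || return 1
    printf '%s\n' "${line#"$key": }"
}

# Clear state: remove the whole state directory.
state_clear() {
    rm -rf "${STATE_DIR:?}"
}
```

For example, `state_save status running` followed by `state_get status` prints `running`. The sketch assumes keys contain no regular-expression metacharacters.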
State Validation¶
Check Deployment Exists:
Prevent Duplicate Deployments:
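validate_no_deployment() {
    # A state file means a deployment already exists in this directory
    if [ -f ".tfgrid-compose/state.yaml" ]; then
        # Show current deployment info
        cat ".tfgrid-compose/state.yaml"
        log_error "Cannot deploy while another deployment exists"
        return 1
    fi
}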
Extension Guide¶
Adding a New Pattern¶
1. Create Pattern Directory:
patterns/my-pattern/
├── pattern.yaml
├── infrastructure/
│ └── main.tf # Implement pattern contract!
├── platform/
│ └── site.yml
└── README.md
2. Implement Pattern Contract:
# infrastructure/outputs.tf
output "primary_ip" {
value = your_resource.primary_ip
}
output "primary_ip_type" {
value = "wireguard"
}
output "deployment_name" {
value = your_resource.name
}
output "node_ids" {
value = [var.node_id]
}
3. Test Pattern:
# Test with example app
tfgrid-compose up /path/to/test-app
# Verify outputs
cd .tfgrid-compose/terraform
terraform output primary_ip
terraform output primary_ip_type
4. Document Pattern:
# My Pattern
## Use Cases
- What problems does this solve?
## Configuration
- What options are available?
## Example
- Show complete example
Adding a New Task¶
1. Create Task Script:
2. Implement Task:
#!/usr/bin/env bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../common.sh"
log_step "Running my task..."
# Task implementation
# ...
log_success "Task complete"
3. Call from Orchestrator:
# In core/orchestrator.sh
if ! bash "$DEPLOYER_ROOT/core/tasks/my-task.sh"; then
log_error "My task failed"
return 1
fi
Adding New Commands¶
1. Add Command Handler:
2. Update Help Text:
Design Decisions¶
Why Bash?¶
Pros:
- ✅ Universal availability (every Unix system)
- ✅ Easy to understand and debug
- ✅ Transparent execution (no compilation)
- ✅ Excellent for orchestration
- ✅ Direct integration with CLI tools
- ✅ Minimal dependencies
Cons:
- ❌ Error handling can be tricky
- ❌ No type safety
- ❌ Testing is harder
Mitigation:
- Use set -e (exit on error)
- Extensive validation
- Logging at each step
- Test scripts provided
Why Pattern-Based?¶
Benefits:
- Different use cases need different infrastructure
- Patterns encapsulate best practices
- Easy to add new patterns without modifying core
- Users can create custom patterns
Alternatives Considered:
- Single monolithic template (too rigid)
- Full config DSL (too complex)
- GUI builder (not CLI-friendly)
Why OpenTofu Priority?¶
Rationale:
- Open source (MPL 2.0 license)
- Compatible with Terraform
- Community-driven development
- No license restrictions
- Same user experience
Fallback:
- Terraform still supported
- Auto-detection at runtime
- No breaking changes
Why State in .tfgrid-compose/?¶
Benefits:
- Co-located with deployment
- Easy to find and inspect
- Git-ignored by default
- Self-contained
Alternatives:
- ~/.tfgrid/ (harder to track per-project)
- Database (added complexity)
- Cloud storage (requires connectivity)
Performance Considerations¶
Deployment Speed¶
Typical Timeline:
- Infrastructure (Terraform): 30-60 seconds
- Network setup (WireGuard): 5-10 seconds
- Wait for SSH: 30-90 seconds
- Platform config (Ansible): 60-120 seconds
- App deployment: 10-30 seconds
Total: 2-5 minutes
Optimization Opportunities¶
1. Parallel Execution:
- Multiple Ansible hosts configured in parallel
- Background tasks where possible
2. Caching:
- Terraform state cached locally
- Ansible facts cached between runs
3. Incremental Updates:
- Only run changed playbooks
- Terraform plan before apply
Resource Usage¶
Local Machine:
- Minimal CPU usage (mostly waiting)
- Small disk footprint (<50MB for state)
- Network: Depends on Terraform operations
Remote VMs:
- Configured per pattern requirements
- Single-VM: 2-8 CPU, 4-16GB RAM typical
- Gateway: 1-2 CPU per VM, 2-4GB RAM
- K3s: 2+ CPU per node, 4+GB RAM
Security Considerations¶
Mnemonic Security¶
Storage:
- File permissions: 600 (read/write owner only)
- Standard location: ~/.config/threefold/mnemonic
- Warning if permissions are incorrect
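The permission warning could be implemented along these lines. This is a sketch under stated assumptions: the function name `check_mnemonic_perms`, its return codes, and the message text are hypothetical. The `stat` fallback covers the differing GNU and BSD/macOS flag syntax.

```shell
#!/usr/bin/env bash
# Hypothetical check: warn when the mnemonic file is not mode 600.
# Returns 0 if the mode is 600, 1 if it differs, 2 if the file is missing.
check_mnemonic_perms() {
    local file="$1" mode
    [ -f "$file" ] || return 2
    # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
    mode="$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file" 2>/dev/null)" || return 2
    if [ "$mode" != "600" ]; then
        echo "Warning: $file has mode $mode, expected 600" >&2
        return 1
    fi
    return 0
}
```

A caller would typically run `check_mnemonic_perms ~/.config/threefold/mnemonic` and suggest `chmod 600` when it returns 1.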
Best Practices:
- Never commit to version control
- Use environment variable in CI/CD
- Rotate regularly
SSH Key Management¶
Default Behavior:
- Uses system SSH keys (~/.ssh/id_*.pub)
- Injected into VMs during deployment
- No passwords (key-based auth only)
WireGuard Security¶
Private Keys:
- Generated by Terraform provider
- Stored in Terraform state (encrypted at rest)
- Deployed to /etc/wireguard/ with 600 permissions
Network Isolation:
- Private networks per deployment
- No default internet routing
- Explicit rules required for external access
Troubleshooting Architecture¶
Debug Mode¶
Enable:
Effects:
- Verbose logging (set -x)
- Keep temporary files
- Show all command output
- Network diagnostics
Log Files¶
All logs are saved to .tfgrid-compose/:
terraform-init.log
terraform-plan.log
terraform-apply.log
ansible.log
hook-*.log
State Inspection¶
Check Deployment:
Check Terraform State:
Check WireGuard:
Future Architecture¶
Planned Enhancements¶
1. Multi-Deployment Support
~/.tfgrid-compose/
├── deployments/
│ ├── abc123/ # Deployment 1
│ └── def456/ # Deployment 2
└── index.yaml # Registry
2. Plugin System
3. Remote State
References¶
Document Status: Active
Last Updated: 2025-10-14
Next Review: After 1.0.0 release