Phase 5: High-Performance Optimizations
Step 25: Apply Ceph Performance Optimizations
Configure Ceph for high-performance hardware (64GB RAM, 13th Gen Intel, NVMe drives):
# Memory optimizations (8GB per OSD for 64GB system):
ssh n2 "ceph config set osd osd_memory_target 8589934592"
# BlueStore optimizations:
ssh n2 "ceph config set osd bluestore_cache_size_ssd 4294967296"
ssh n2 "ceph config set osd bluestore_cache_kv_max 2147483648"
ssh n2 "ceph config set osd bluestore_cache_meta_ratio 0.4"
ssh n2 "ceph config set osd bluestore_compression_algorithm lz4"
ssh n2 "ceph config set osd bluestore_compression_mode aggressive"
# Network optimizations:
ssh n2 "ceph config set osd osd_heartbeat_grace 20"
ssh n2 "ceph config set osd osd_heartbeat_interval 5"
# CPU optimizations:
ssh n2 "ceph config set osd osd_op_num_threads_per_shard 2"
ssh n2 "ceph config set osd osd_op_num_shards 12"
# Scrubbing optimizations:
ssh n2 "ceph config set osd osd_scrub_during_recovery false"
ssh n2 "ceph config set osd osd_scrub_begin_hour 1"
ssh n2 "ceph config set osd osd_scrub_end_hour 5"
Restart the OSD services so the new settings take effect (some BlueStore options are only read at OSD startup):
# Restart OSDs on all nodes:
for node in n2 n3 n4; do
  echo "=== Restarting OSDs on $node ==="
  ssh $node "systemctl restart ceph-osd@*.service"
  sleep 10  # Wait between nodes
done
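If you would rather not rely on a fixed 10-second pause, the same loop can instead wait for the cluster to report HEALTH_OK before moving on to the next node. A minimal sketch that polls for up to two minutes (the polling budget is an arbitrary choice):
# Alternative: wait for HEALTH_OK between nodes instead of a fixed sleep
for node in n2 n3 n4; do
  echo "=== Restarting OSDs on $node ==="
  ssh $node "systemctl restart ceph-osd@*.service"
  for i in $(seq 1 24); do
    ssh n2 "ceph health" | grep -q HEALTH_OK && break
    sleep 5
  done
done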
Step 26: Apply System-Level Optimizations
OS-level tuning for maximum performance:
# Apply on all mesh nodes:
for node in n2 n3 n4; do
  ssh $node "
    # Network tuning:
    echo 'net.core.rmem_max = 268435456' >> /etc/sysctl.conf
    echo 'net.core.wmem_max = 268435456' >> /etc/sysctl.conf
    echo 'net.core.netdev_max_backlog = 30000' >> /etc/sysctl.conf
    # Memory tuning:
    echo 'vm.swappiness = 1' >> /etc/sysctl.conf
    echo 'vm.min_free_kbytes = 4194304' >> /etc/sysctl.conf
    # Apply settings:
    sysctl -p
  "
done
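Note that appending to /etc/sysctl.conf adds duplicate entries if the loop is re-run. An idempotent alternative is a drop-in file under /etc/sysctl.d/; the file name below is illustrative:
# Idempotent variant: write a sysctl drop-in and reload all settings
for node in n2 n3 n4; do
  ssh $node "cat > /etc/sysctl.d/90-tb4-ceph.conf <<'EOF'
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.core.netdev_max_backlog = 30000
vm.swappiness = 1
vm.min_free_kbytes = 4194304
EOF
sysctl --system"
done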
Step 27: Verify Optimizations Are Active
Check that all optimizations are applied:
# Verify key Ceph optimizations:
ssh n2 "ceph config dump | grep -E '(memory_target|cache_size|compression|heartbeat)'"
Expected output (sample):
osd/osd_memory_target 8589934592
osd/bluestore_cache_size_ssd 4294967296
osd/bluestore_compression_algorithm lz4
osd/osd_heartbeat_grace 20
osd/osd_heartbeat_interval 5
Check system-level optimizations:
# Verify network and memory settings:
for node in n2 n3 n4; do
  echo "=== System tuning on $node ==="
  ssh $node "sysctl net.core.rmem_max net.core.wmem_max net.core.netdev_max_backlog vm.swappiness vm.min_free_kbytes"
done
Performance Testing
Step 28: Run Performance Benchmarks
Test optimized cluster performance:
# Test write performance with optimized cluster:
ssh n2 "rados -p cephtb4 bench 10 write --no-cleanup -b 4M -t 16"
# Test read performance:
ssh n2 "rados -p cephtb4 bench 10 rand -t 16"
# Clean up test data:
ssh n2 "rados -p cephtb4 cleanup"
Expected Performance Results
Write Performance:
- Average Bandwidth: 1,294 MB/s
- Peak Bandwidth: 2,076 MB/s
- Average IOPS: 323
- Average Latency: ~48ms
Read Performance:
- Average Bandwidth: 1,762 MB/s
- Peak Bandwidth: 2,448 MB/s
- Average IOPS: 440
- Average Latency: ~36ms
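These figures are internally consistent: rados bench reports IOPS against the 4 MB object size used above, so 323 IOPS × 4 MB ≈ 1,292 MB/s write and 440 IOPS × 4 MB ≈ 1,760 MB/s read.
Large 4 MB objects favour throughput; if small-block behaviour also matters, one option is fio's rbd engine against a throwaway image. A sketch, assuming fio is installed on n2 and using an arbitrary image name (fio-scratch):
# Optional small-block test with fio's rbd engine (scratch image removed afterwards):
ssh n2 "rbd create cephtb4/fio-scratch --size 10G"
ssh n2 "fio --name=randwrite-4k --ioengine=rbd --clientname=admin --pool=cephtb4 --rbdname=fio-scratch --rw=randwrite --bs=4k --iodepth=32 --runtime=30 --time_based"
ssh n2 "rbd rm cephtb4/fio-scratch"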
Step 29: Verify Proxmox Integration
Check that all optimizations are visible in the Proxmox GUI:
- Navigate: Ceph → Configuration Database
- Verify: All optimization settings visible and applied
- Check: No configuration errors or warnings
- Navigate: Ceph → OSDs → Verify all 6 OSDs are visible and up
Key optimizations to verify in the GUI:
osd_memory_target: 8589934592 (8GB per OSD)
bluestore_cache_size_ssd: 4294967296 (4GB cache)
bluestore_compression_algorithm: lz4
cluster_network: 10.100.0.0/24 (TB4 mesh)
public_network: 10.11.12.0/24
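The same checks can be made from the CLI if the GUI is not at hand:
# CLI equivalents of the GUI checks:
ssh n2 "ceph config dump"        # full configuration database
ssh n2 "ceph osd tree"           # all 6 OSDs should show as up
ssh n2 "ceph health detail"      # should report HEALTH_OK with no warnings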
Final Cluster Status
Your optimized TB4 + Ceph cluster should now provide:
- High Performance: 1,300+ MB/s write, 1,760+ MB/s read
- Low Latency: Sub-millisecond mesh, ~40ms storage
- High Availability: 3 monitors, 3 managers, automatic failover
- Reliability: 0% packet loss, persistent configuration
- Full Integration: Proxmox GUI visibility and management
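As a closing spot-check, these points can be confirmed from any node. The mesh peer address below is a placeholder; substitute the TB4 addresses assigned in the earlier phases:
# Final verification: health, quorum, and mesh reachability
ssh n2 "ceph -s"                         # HEALTH_OK, 3 mons, 3 mgrs, 6 OSDs up/in
ssh n2 "ceph mon stat && ceph mgr stat"  # quorum members and active/standby managers
ssh n2 "ping -c 50 -q 10.100.0.X"        # placeholder peer address on the TB4 mesh; expect 0% packet loss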