Glasshouse model workload / May 17, 2026 / updated June 1

Glasshouse measured 1.62% Qwen 7B H100 throughput overhead on one 60-minute same-allocation run.

The previous Glasshouse overhead work used a matched CUDA/PyTorch benchmark. This follow-up moves the same-instance method onto a real transformer workload: Qwen 0.5B LoRA training on RunPod RTX 4090 and RTX 3090 allocations, then Qwen 7B LoRA training on RunPod RTX A4000 and H100 allocations. The clean 0.5B 15-minute, 30-minute, and stabilized 60-minute rows measured 0.86%, -0.45%, and 0.81% overhead. The new 7B smoke, 15-minute, and 30-minute rows measured 0.59%, 1.11%, and 0.26% on A4000, and the 30-minute H100 wall-clock repeat measured 0.77%. The latest publishable H100 row time-boxed both segments to 60 minutes on the same allocation and measured 1.62% throughput overhead with verified attestation, finite losses, distinct adapter digests, raw callback completion, direct provider cleanup verification, and zeroized cleanup.

Why this matters

Synthetic CUDA benchmarks are useful for repeatability, but buyers eventually ask whether the protected path survives a real model workload. This run answers that next question first for a small Qwen LoRA job, then for Qwen 7B LoRA, without changing GPU allocations between raw and protected execution.

The clean rows show that Glasshouse can attest, release, execute, record, and zeroize around a real LoRA training loop, while the latest time-boxed H100 row keeps measured throughput overhead at 1.62%.

Result table

The clean 0.5B 15-minute, 30-minute, and 60-minute rows stayed near zero measured overhead.

Each row used a fixed wall-clock target. Raw ran first, then the protected Glasshouse path ran on the same RunPod allocation.
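A fixed wall-clock target can be sketched as a loop that keeps taking training steps until a deadline passes and then reports how many steps completed. This is a minimal illustration, not the Glasshouse harness: `run_time_boxed` and `step_fn` are hypothetical names, and the placeholder step does no real training. One consequence of this shape is that elapsed time lands slightly past the target, because the last step finishes after the deadline check.

```python
import time
from typing import Callable, Tuple


def run_time_boxed(step_fn: Callable[[], None], target_s: float) -> Tuple[int, float]:
    """Call step_fn repeatedly until target_s seconds have elapsed.

    Returns (completed_steps, elapsed_seconds). Elapsed is measured after
    the final step, so it is always at least target_s.
    """
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock adjustments
    steps = 0
    while time.monotonic() - start < target_s:
        step_fn()
        steps += 1
    return steps, time.monotonic() - start


def placeholder_step() -> None:
    """Stand-in for one LoRA optimizer step (assumption: real work goes here)."""
    sum(range(1000))


# Short demo target so the sketch finishes quickly; the article's rows used 900s to 3,600s.
steps, elapsed = run_time_boxed(placeholder_step, target_s=0.05)
print(f"{steps} steps in {elapsed:.3f}s")
```

Running the raw and protected segments through the same loop on the same allocation yields the step counts and elapsed times that the result rows report.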

| Row | Raw elapsed | Protected elapsed | Raw steps | Protected steps | Measured overhead | Scope |
| --- | --- | --- | --- | --- | --- | --- |
| 15m | 900.351s | 900.449s | 9,143 | 9,065 | 0.86% | clean model artifact |
| 30m | 1,800.423s | 1,800.405s | 18,121 | 18,202 | -0.45% | clean model artifact |
| 60m | 3,600.629s | 3,600.377s | 39,661 | 39,336 | 0.81% | clean model artifact |

7B follow-up

The 7B smoke, duration, and time-boxed rows passed on RunPod A4000 and H100 allocations.

The 7B rows are the important step beyond the 0.5B duration sweep. Qwen/Qwen2.5-7B-Instruct loaded in BF16, trained LoRA adapters with finite losses, produced distinct raw/protected adapter digests, and finished with verified attestation and zeroized cleanup. The newest H100 repeat used one allocation: the protected segment ran for 60 minutes, then the raw segment ran for 60 minutes on the same provider instance.

| Row | Raw elapsed | Protected elapsed | Raw steps | Protected steps | Measured overhead | Scope |
| --- | --- | --- | --- | --- | --- | --- |
| 7B smoke | 39.689s | 39.924s | 256 | 256 | 0.59% | same-instance 7B fit and lifecycle check |
| 7B 15m | 900.373s | 900.387s | 8,452 | 8,358 | 1.11% | sustained 15-minute 7B LoRA row |
| 7B 30m | 1,800.435s | 1,800.379s | 16,851 | 16,807 | 0.26% | sustained 30-minute 7B LoRA row |
| 7B 30m H100 | 1,800.923s | 1,801.216s | 15,305 | 15,190 | 0.77% | H100 repeat on a higher-end RunPod Secure allocation |
| 7B 30k-step H100 protected-first diagnostic | 2,161.061s | 2,145.464s | 30,000 | 30,000 | -0.73% | same H100 allocation; variance diagnostic, not the published overhead row |
| 7B 60m H100 time-boxed | 3,600.212s | 3,600.221s | 49,678 | 48,872 | 1.62% | same H100 allocation, protected first, raw second |

Methodology

Same allocation, real model, explicit duration or step target.

This run intentionally fixes the biggest measurement problem from early GPU marketplace tests: provider allocation variance. Raw and protected segments used the same rented GPU allocation instead of comparing two different provider nodes. The latest H100 row is time-boxed and reports throughput overhead, so the published number is based on completed steps per second.
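The throughput-overhead figure can be reproduced from the step counts and elapsed times in the result tables. The sketch below assumes overhead is the fraction of raw steps per second lost on the protected path; that formula matches the published percentages when applied to the table rows, but the article does not state the exact calculation, so treat it as an inference. `Segment` and `throughput_overhead_pct` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One timed training segment (field names are illustrative)."""
    steps: int
    elapsed_s: float

    @property
    def steps_per_second(self) -> float:
        return self.steps / self.elapsed_s


def throughput_overhead_pct(raw: Segment, protected: Segment) -> float:
    """Percent of raw throughput lost on the protected path.

    A negative result means the protected segment was faster than raw.
    """
    return (1.0 - protected.steps_per_second / raw.steps_per_second) * 100.0


# 7B 60m H100 time-boxed row from the result table.
raw = Segment(steps=49_678, elapsed_s=3600.212)
protected = Segment(steps=48_872, elapsed_s=3600.221)
print(f"{throughput_overhead_pct(raw, protected):.2f}%")  # prints 1.62%
```

Because both segments run for nearly identical wall-clock time, the result is dominated by the ratio of completed steps. For the fixed-step 30k diagnostic row, where step counts are equal, the same formula reduces to a comparison of elapsed times.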

Provider: RunPod Secure Cloud
GPU: RTX 4090 for 0.5B 15m/30m; RTX 3090 for 0.5B 60m; RTX A4000 and H100 for 7B rows
Comparison: raw and protected ran sequentially on the same provider allocation
Orders: 0.5B rows raw-first; sustained 7B duration rows protected-first; latest H100 time-boxed row protected-first
Models: Qwen/Qwen2.5-0.5B-Instruct and Qwen/Qwen2.5-7B-Instruct
Workload: LoRA training under fixed wall-clock targets plus the latest H100 time-boxed repeat
60m stability fix: bfloat16, 5e-5 learning rate, gradient clip 1.0, fail-on-nonfinite enabled
7B settings: bfloat16, batch size 1, max length 64, 1e-5 learning rate, gradient clip 1.0
Protected path: attestation, gated key release, protected execution evidence, zeroized cleanup
TEE scope: software-enforced protected execution on rented GPU infrastructure, not a hardware TEE claim
Cleanup: RunPod reported zero active pods after the latest H100 repeat

Model artifacts

The clean rows emitted different raw and protected adapter digests.

Different adapter hashes are expected here: the raw and protected segments are separate training runs. The important part is that the public artifact records the digest and loss for each clean row, while the protected path also records the attested lifecycle.
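A reader who wants to check an adapter digest independently could hash the adapter file with SHA-256 and compare the raw and protected results. The sketch below is a generic illustration using Python's standard `hashlib`; the file names and placeholder contents are assumptions, and the article does not describe how Glasshouse itself computes or serializes the digests.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large adapter files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests_are_distinct(raw_adapter: Path, protected_adapter: Path) -> bool:
    """True when the two adapter files hash to different SHA-256 values."""
    return sha256_file(raw_adapter) != sha256_file(protected_adapter)


# Demo with stand-in files (hypothetical names); real adapter weights would replace them.
with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "raw_adapter.bin"
    protected_path = Path(tmp) / "protected_adapter.bin"
    raw_path.write_bytes(b"raw adapter weights (placeholder)")
    protected_path.write_bytes(b"protected adapter weights (placeholder)")
    print(digests_are_distinct(raw_path, protected_path))  # prints True
```

Matching digests between raw and protected segments would be a warning sign rather than a success, since two separate training runs are expected to produce different weights.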

| Artifact | Final loss | Adapter SHA-256 |
| --- | --- | --- |
| 15m protected | 3.720355e-05 | 520660edb6460102da5fea8625c9ff3808e55555365eb047c5438be9dff54bd3 |
| 15m raw | 3.965292e-05 | b4d36a0834745df14cb1795144dbe724e39f053d256edc040df3d79b2be1f003 |
| 30m protected | 0.0001837593 | 0881acc7e59014485d36130e8923cb88baafea96e264a664c623956d5f7e28cf |
| 30m raw | 0.0001055617 | 567df32f71298a40c8e423668c7c74e7f581ec9a16a679cc96c163370107fec1 |
| 60m protected | 4.541306e-08 | 279216e323ba3e7ab17f4e08018ff2f99ca032f86f9d2475d6cef426eee8c9b0 |
| 60m raw | 3.405979e-08 | 64c46cc6cd8b0a99b3332bad3849efe544df0b0c92186eea137c6751e8776b0c |
| 7B smoke protected | 3.5975997 | a7d0ec4093f47758b9158a0bebf69e68b765f660013c51ea6a73badece449d7e |
| 7B smoke raw | 3.6913116 | eaa486847c69e2530aa989150dbca628631df450854f99ca6fbf300931019e47 |
| 7B 15m protected | 1.299379e-06 | 3f8fea702cd6c3aeb860b0f158fd3c8d5609f8b27a78029a5382af77467cd1cb |
| 7B 15m raw | 1.573558e-06 | 12632d35fe001a8ce719c6b5ea417a61429a73b310bd1de4163c92a34d6f6753 |
| 7B 30m protected | 5.960464e-08 | 6c3a1525aef8cd23c52f9fc274d4c8df1966aaec565505261810d00505d1a709 |
| 7B 30m raw | 1.430511e-07 | 7bbd583188c858cc4608cb98df05e6f5b18431f750925abf689e15f1126cc18b |
| 7B 30m H100 protected | 7.152557e-08 | 47c0cc72e80599bedaa29097b992572dc104a280604a83ca05eb01e230dc56b1 |
| 7B 30m H100 raw | 7.152557e-08 | d8063f11a54340f81bb204abd1b78e2fa40bfcd47438b0ce3641a47e6e3797f9 |
| 7B 60m H100 protected | 2.384185648907078e-08 | 69f2b90f5830b4783afdec5003c4f5f82e4d25cad5c6e7074e1a67eb01c2f32a |
| 7B 60m H100 raw | 2.384185648907078e-08 | 14524115d46ecc134cd1ba9511cf05951639d9d6e570e0ff3a40656ed5a81c06 |

Caveat fixed

The stabilized 60-minute row is now clean model-workload evidence.

The original 60-minute row stayed public because the lifecycle proof passed, but the model artifact was not clean. The rerun fixed that caveat by lowering training aggressiveness and failing immediately on non-finite loss.
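Failing immediately on non-finite loss can be illustrated with a small guard that inspects each step's loss and aborts on the first NaN or infinity. This is a framework-free sketch under stated assumptions: `NonFiniteLossError`, `check_loss`, and `run_steps` are hypothetical names, and the losses are plain floats rather than tensors from the actual training loop.

```python
import math
from typing import Iterable, Optional


class NonFiniteLossError(RuntimeError):
    """Raised when a training step reports a NaN or infinite loss."""


def check_loss(step: int, loss: float) -> float:
    """Return the loss unchanged, or raise if it is NaN or infinite."""
    if not math.isfinite(loss):
        raise NonFiniteLossError(f"non-finite loss {loss!r} at step {step}")
    return loss


def run_steps(losses: Iterable[float]) -> Optional[float]:
    """Consume per-step losses and abort on the first non-finite value.

    Returns the final loss, or None if no steps ran.
    """
    last = None
    for step, loss in enumerate(losses, start=1):
        last = check_loss(step, loss)
    return last


print(run_steps([3.6, 1.2e-3, 4.5e-8]))  # finite throughout, returns the last loss

try:
    run_steps([3.6, float("nan"), 0.1])
except NonFiniteLossError as exc:
    print(f"aborted: {exc}")
```

Aborting at the first bad step prevents a run from finishing its lifecycle with a null final loss, which is the failure the original 60-minute row exhibited.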

Original issue: The first 60-minute row completed lifecycle proof, but final loss was null and raw/protected adapter digests matched.
Fix: The rerun used bfloat16, a lower learning rate, gradient clipping, and fail-on-nonfinite checks.
New 60m artifact: The stabilized row emitted finite losses, distinct adapter SHA-256 digests, verified attestation, and zeroized cleanup.
Still not claimed: This includes one clean 60-minute 7B H100 time-boxed repeat, not a multi-sample 7B distribution or production SLA.
7B follow-up: The May 20 H100 repeat used the same allocation for raw and protected segments, verified attestation, zeroized cleanup, finite losses, and distinct adapter digests.
Protected-first repeat: The May 21 protected-first H100 repeat measured -0.73%, which is best read as runtime variance and is not used as the published overhead row.
Time-boxed H100 row: The June 1 protected-first H100 repeat ran 60-minute protected and raw Qwen 7B LoRA segments on the same allocation. The current published throughput overhead number is 1.62%.

Public artifact

The JSON keeps measurement fields and excludes secrets.

The public JSON keeps provider, GPU, model, timing, step counts, digest, attestation, runtime state, stability notes, and cleanup status. It does not include API keys, local tunnel URLs, encrypted payload paths, or provider credentials.

sanitized excerpt
{
  "provider": "runpod",
  "sameAllocation": true,
  "rows": [
    { "model": "Qwen 0.5B", "label": "60m", "overheadPct": 0.81, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "smoke", "overheadPct": 0.59, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "15m", "overheadPct": 1.11, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30m A4000", "overheadPct": 0.26, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30m H100", "overheadPct": 0.77, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "60m H100 time-boxed", "overheadPct": 1.62, "runtimeState": "zeroized" }
  ],
  "cleanup": { "providerDeploymentDeleted": true, "directProviderInspection": "pod null" }
}
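A consumer of the public JSON could run a few basic checks before trusting a row. The sketch below parses a shortened copy of the excerpt and reports any row that is not zeroized, along with the allocation and cleanup flags. It uses only the field names visible in the excerpt; `check_artifact` is a hypothetical helper, and the full artifact may contain fields not shown here.

```python
import json

# Shortened copy of the sanitized excerpt (two of the six rows).
EXCERPT = """
{
  "provider": "runpod",
  "sameAllocation": true,
  "rows": [
    { "model": "Qwen 0.5B", "label": "60m", "overheadPct": 0.81, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "60m H100 time-boxed", "overheadPct": 1.62, "runtimeState": "zeroized" }
  ],
  "cleanup": { "providerDeploymentDeleted": true, "directProviderInspection": "pod null" }
}
"""


def check_artifact(artifact: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not artifact.get("sameAllocation"):
        problems.append("raw and protected did not share an allocation")
    for row in artifact.get("rows", []):
        if row.get("runtimeState") != "zeroized":
            problems.append(f"{row.get('model')} {row.get('label')}: not zeroized")
    if not artifact.get("cleanup", {}).get("providerDeploymentDeleted"):
        problems.append("provider deployment not confirmed deleted")
    return problems


artifact = json.loads(EXCERPT)
print(check_artifact(artifact))  # prints []
```

Returning a list of problems rather than raising on the first one lets a reviewer see every failed check in a single pass.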