Why this matters

Synthetic CUDA benchmarks are useful for repeatability, but buyers eventually ask whether the protected path survives a real model workload. This run answers that question, first for a small Qwen 0.5B LoRA job and then for Qwen 7B LoRA, without changing GPU allocations between raw and protected execution.

The clean rows show that Glasshouse can attest, release keys, execute, record evidence, and zeroize around a real LoRA training loop, while the latest time-boxed H100 row measured 1.62% throughput overhead.

Result table

The clean 0.5B 15-minute, 30-minute, and 60-minute rows stayed within 1% measured overhead (0.86%, -0.45%, and 0.81%).

Each row used a fixed wall-clock target. Raw ran first, then the protected Glasshouse path ran on the same RunPod allocation.

Row	Raw elapsed	Protected elapsed	Raw steps	Protected steps	Measured overhead	Scope
`15m`	`900.351s`	`900.449s`	`9,143`	`9,065`	`0.86%`	clean model artifact
`30m`	`1,800.423s`	`1,800.405s`	`18,121`	`18,202`	`-0.45%`	clean model artifact
`60m`	`3,600.629s`	`3,600.377s`	`39,661`	`39,336`	`0.81%`	clean model artifact
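
The rows above each ran against a fixed wall-clock target and report the steps completed in that window. The following is a minimal sketch of how such a time-boxed segment could be counted, not the harness used for these runs. The names `run_time_boxed`, `step_fn`, and `clock` are hypothetical. Because the loop only checks the deadline between steps, elapsed time lands slightly past the target, which matches the pattern in the table (for example 900.351s against a 900s target).

```python
import time


def run_time_boxed(step_fn, target_seconds, clock=time.monotonic):
    """Run step_fn repeatedly until target_seconds have elapsed.

    Returns (completed_steps, elapsed_seconds). The deadline is checked
    only between steps, so elapsed can slightly exceed the target.
    `clock` is injectable so the loop can be tested deterministically.
    """
    start = clock()
    steps = 0
    while clock() - start < target_seconds:
        step_fn()  # one training step (hypothetical callable)
        steps += 1
    elapsed = clock() - start
    return steps, elapsed
```

Running the raw and protected segments through the same loop on the same allocation keeps the comparison down to one variable: the number of steps each path completes in the window.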

7B follow-up

The 7B smoke, duration, and time-boxed rows passed on RunPod A4000 and H100 allocations.

The 7B rows are the important step beyond the 0.5B duration sweep. Qwen/Qwen2.5-7B-Instruct loaded in BF16, trained LoRA adapters with finite losses, produced distinct raw and protected adapter digests, and finished with verified attestation and zeroized cleanup. The newest H100 repeat used one allocation: the protected segment ran for 60 minutes, then the raw segment ran for 60 minutes on the same provider instance.

Row	Raw elapsed	Protected elapsed	Raw steps	Protected steps	Measured overhead	Scope
`7B smoke`	`39.689s`	`39.924s`	`256`	`256`	`0.59%`	same-instance 7B fit and lifecycle check
`7B 15m`	`900.373s`	`900.387s`	`8,452`	`8,358`	`1.11%`	sustained 15-minute 7B LoRA row
`7B 30m`	`1,800.435s`	`1,800.379s`	`16,851`	`16,807`	`0.26%`	sustained 30-minute 7B LoRA row
`7B 30m H100`	`1,800.923s`	`1,801.216s`	`15,305`	`15,190`	`0.77%`	H100 repeat on a higher-end RunPod Secure allocation
`7B 30k-step H100 protected-first diagnostic`	`2,161.061s`	`2,145.464s`	`30,000`	`30,000`	`-0.73%`	same H100 allocation; variance diagnostic, not the published overhead row
`7B 60m H100 time-boxed`	`3,600.212s`	`3,600.221s`	`49,678`	`48,872`	`1.62%`	same H100 allocation, protected first, raw second
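
The overhead column can be rechecked from the step counts and elapsed times in the tables above. The sketch below assumes overhead is defined as the relative drop in completed steps per second from raw to protected. The source does not spell out the formula, so treat this definition and the function names as assumptions, although it reproduces the published figures for the rows shown.

```python
def throughput(steps, elapsed_s):
    """Completed training steps per second."""
    return steps / elapsed_s


def overhead_pct(raw_steps, raw_elapsed_s, prot_steps, prot_elapsed_s):
    """Relative throughput loss of the protected path, in percent.

    Assumed definition: (1 - protected_throughput / raw_throughput) * 100.
    A negative value means the protected segment was faster.
    """
    raw_tp = throughput(raw_steps, raw_elapsed_s)
    prot_tp = throughput(prot_steps, prot_elapsed_s)
    return (1.0 - prot_tp / raw_tp) * 100.0


# Values copied from the 7B table: (raw steps, raw s, protected steps, protected s)
rows = {
    "7B smoke": (256, 39.689, 256, 39.924),
    "7B 30k-step H100 diagnostic": (30_000, 2161.061, 30_000, 2145.464),
    "7B 60m H100 time-boxed": (49_678, 3600.212, 48_872, 3600.221),
}

for label, args in rows.items():
    print(f"{label}: {overhead_pct(*args):.2f}%")
```

Under this definition the three rows come out to 0.59%, -0.73%, and 1.62%, matching the table. The same function covers both time-boxed rows, where step counts differ, and fixed-step rows, where elapsed time differs.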

Methodology

Same allocation, real model, explicit duration or step target.

This run intentionally fixes the biggest measurement problem from early GPU marketplace tests: provider allocation variance. Raw and protected segments used the same rented GPU allocation instead of comparing two different provider nodes. The latest H100 row is time-boxed and reports throughput overhead, so the published number is based on completed steps per second.

Provider	`RunPod Secure Cloud`
GPU	`RTX 4090 for 0.5B 15m/30m; RTX 3090 for 0.5B 60m; RTX A4000 and H100 for 7B rows`
Comparison	`raw and protected ran sequentially on the same provider allocation`
Run order	`0.5B rows raw-first; sustained 7B duration rows protected-first; latest H100 time-boxed row protected-first`
Models	`Qwen/Qwen2.5-0.5B-Instruct and Qwen/Qwen2.5-7B-Instruct`
Workload	`LoRA training under fixed wall-clock targets, including the latest H100 time-boxed repeat, plus one fixed 30,000-step H100 diagnostic`
60m stability fix	`bfloat16, 5e-5 learning rate, gradient clip 1.0, fail-on-nonfinite enabled`
7B settings	`bfloat16, batch size 1, max length 64, 1e-5 learning rate, gradient clip 1.0`
Protected path	`attestation, gated key release, protected execution evidence, zeroized cleanup`
TEE scope	`software-enforced protected execution on rented GPU infrastructure, not a hardware TEE claim`
Cleanup	`RunPod reported zero active pods after the latest H100 repeat`

Model artifacts

The clean rows emitted different raw and protected adapter digests.

Different adapter hashes are expected here: the raw and protected segments are separate training runs. The important part is that the public artifact records the digest and loss for each clean row, while the protected path also records the attested lifecycle.

Artifact	Final loss	Adapter SHA-256
15m protected	`3.720355e-05`	`520660edb6460102da5fea8625c9ff3808e55555365eb047c5438be9dff54bd3`
15m raw	`3.965292e-05`	`b4d36a0834745df14cb1795144dbe724e39f053d256edc040df3d79b2be1f003`
30m protected	`0.0001837593`	`0881acc7e59014485d36130e8923cb88baafea96e264a664c623956d5f7e28cf`
30m raw	`0.0001055617`	`567df32f71298a40c8e423668c7c74e7f581ec9a16a679cc96c163370107fec1`
60m protected	`4.541306e-08`	`279216e323ba3e7ab17f4e08018ff2f99ca032f86f9d2475d6cef426eee8c9b0`
60m raw	`3.405979e-08`	`64c46cc6cd8b0a99b3332bad3849efe544df0b0c92186eea137c6751e8776b0c`
7B smoke protected	`3.5975997`	`a7d0ec4093f47758b9158a0bebf69e68b765f660013c51ea6a73badece449d7e`
7B smoke raw	`3.6913116`	`eaa486847c69e2530aa989150dbca628631df450854f99ca6fbf300931019e47`
7B 15m protected	`1.299379e-06`	`3f8fea702cd6c3aeb860b0f158fd3c8d5609f8b27a78029a5382af77467cd1cb`
7B 15m raw	`1.573558e-06`	`12632d35fe001a8ce719c6b5ea417a61429a73b310bd1de4163c92a34d6f6753`
7B 30m protected	`5.960464e-08`	`6c3a1525aef8cd23c52f9fc274d4c8df1966aaec565505261810d00505d1a709`
7B 30m raw	`1.430511e-07`	`7bbd583188c858cc4608cb98df05e6f5b18431f750925abf689e15f1126cc18b`
7B 30m H100 protected	`7.152557e-08`	`47c0cc72e80599bedaa29097b992572dc104a280604a83ca05eb01e230dc56b1`
7B 30m H100 raw	`7.152557e-08`	`d8063f11a54340f81bb204abd1b78e2fa40bfcd47438b0ce3641a47e6e3797f9`
7B 60m H100 protected	`2.384185648907078e-08`	`69f2b90f5830b4783afdec5003c4f5f82e4d25cad5c6e7074e1a67eb01c2f32a`
7B 60m H100 raw	`2.384185648907078e-08`	`14524115d46ecc134cd1ba9511cf05951639d9d6e570e0ff3a40656ed5a81c06`
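
The adapter digests listed above are SHA-256 hashes, so anyone holding an adapter file can recompute and compare them. The following is a minimal sketch using Python's standard `hashlib`. The file names in the usage note are hypothetical, since the source does not say how the adapter files are named or packaged.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Stream the file so large adapter weights never load fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests_are_distinct(raw_path, protected_path):
    """True when the raw and protected adapters hash to different values."""
    return sha256_file(raw_path) != sha256_file(protected_path)
```

For example, `digests_are_distinct("raw/adapter.bin", "protected/adapter.bin")` (hypothetical paths) would be expected to return `True` for a clean row, since the two segments are separate training runs. Matching digests would point to the kind of problem seen in the original 60-minute row.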

Caveat fixed

The stabilized 60-minute row is now clean model-workload evidence.

The original 60-minute row stayed public because the lifecycle proof passed, but the model artifact was not clean. The rerun fixed that caveat by lowering training aggressiveness and failing immediately on non-finite loss.

Original issue	The first 60-minute row completed lifecycle proof, but final loss was null and raw/protected adapter digests matched.
Fix	The rerun used bfloat16, a lower learning rate, gradient clipping, and fail-on-nonfinite checks.
New 60m artifact	The stabilized row emitted finite losses, distinct adapter SHA-256 digests, verified attestation, and zeroized cleanup.
Still not claimed	The evidence includes one clean 60-minute 7B H100 time-boxed repeat. It is not a multi-sample 7B distribution or a production SLA.
7B follow-up	The May 20 H100 repeat used the same allocation for raw and protected segments and finished with verified attestation, zeroized cleanup, finite losses, and distinct adapter digests.
Protected-first repeat	The May 21 protected-first H100 repeat measured -0.73%, which is best read as runtime variance and is not used as the published overhead row.
Time-boxed H100 row	The June 1 protected-first H100 repeat ran 60-minute protected and raw Qwen 7B LoRA segments on the same allocation. The current published throughput overhead number is 1.62%.
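
The rerun fix described above combines two guards: clipping gradients to a norm of 1.0 and aborting as soon as a loss is non-finite. The sketch below shows both ideas in plain Python on lists of floats. It is an illustration under stated assumptions, not the training code used for these rows. The source does not name the training framework, and the names `NonFiniteLossError`, `require_finite_loss`, and `clip_by_global_norm` are hypothetical.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when a training step produces a NaN or infinite loss."""


def require_finite_loss(step, loss):
    """Fail immediately on a non-finite loss instead of training on."""
    if not math.isfinite(loss):
        raise NonFiniteLossError(f"non-finite loss {loss!r} at step {step}")
    return loss


def clip_by_global_norm(grads, max_norm=1.0):
    """Scale gradients so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # already within bounds, leave unchanged
    scale = max_norm / total_norm
    return [g * scale for g in grads]
```

Failing fast matters for the artifact: a run that keeps stepping after a NaN loss can still complete its lifecycle proof while producing a null final loss, which is the failure mode the original 60-minute row exhibited. In a PyTorch training loop the clipping step would normally use `torch.nn.utils.clip_grad_norm_` rather than a hand-written helper.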

Public artifact

The JSON keeps measurement fields and excludes secrets.

The public JSON keeps provider, GPU, model, timing, step counts, digest, attestation, runtime state, stability notes, and cleanup status. It does not include API keys, local tunnel URLs, encrypted payload paths, or provider credentials.

{
  "provider": "runpod",
  "sameAllocation": true,
  "rows": [
    { "model": "Qwen 0.5B", "label": "60m", "overheadPct": 0.81, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "smoke", "overheadPct": 0.59, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "15m", "overheadPct": 1.11, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30m A4000", "overheadPct": 0.26, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30m H100", "overheadPct": 0.77, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "60m H100 time-boxed", "overheadPct": 1.62, "runtimeState": "zeroized" }
  ],
  "cleanup": { "providerDeploymentDeleted": true, "directProviderInspection": "pod null" }
}
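
A reader can run simple checks against the public JSON to confirm the properties claimed for it. The sketch below assumes the field names shown in the excerpt above (`sameAllocation`, `rows`, `runtimeState`). The list of secret-like key fragments and the function names are hypothetical, and a key-name scan is only a coarse check, not proof that no secret is present.

```python
import json

# Hypothetical fragments that would suggest a secret-bearing key name.
SECRET_HINTS = ("apikey", "api_key", "token", "password", "credential", "secret", "tunnel")


def iter_key_paths(node, prefix=""):
    """Yield a dotted path for every dictionary key in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from iter_key_paths(value, path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_key_paths(item, f"{prefix}[{index}]")


def audit_artifact(artifact):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for path in iter_key_paths(artifact):
        leaf = path.rsplit(".", 1)[-1].lower()
        if any(hint in leaf for hint in SECRET_HINTS):
            problems.append(f"secret-like key: {path}")
    if artifact.get("sameAllocation") is not True:
        problems.append("sameAllocation is not true")
    for row in artifact.get("rows", []):
        if row.get("runtimeState") != "zeroized":
            problems.append(f"row {row.get('label')!r} is not zeroized")
    return problems


def audit_json_text(text):
    """Parse JSON text and audit it."""
    return audit_artifact(json.loads(text))
```

Calling `audit_json_text` on the excerpt above should return an empty list, since every row reports `"runtimeState": "zeroized"` and no key name resembles a credential.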


Glasshouse measured 1.62% Qwen 7B H100 throughput overhead on one 60-minute same-allocation run.
