Baseline vs Iris
This is an operational comparison, not a controlled A/B test, and it does not isolate Iris (our OpenClaw AI agent) as the only variable. The baseline window is the April 22-25, 2026 planner-offline run already documented in the public outage story. The comparison window is April 26-May 2, 2026, when normal Iris planning resumed.
The comparison is still useful because it answers the launch question a skeptical reader will ask first: when the planning loop is online, do the public scorecards look different from the period where the ESP32 had to keep running without normal AI plans?
For the exact parameters Iris can change when it is online, see AI-Writable Tunables.
For the broader caveat language, see the Launch FAQ. For the live receipts behind this page, use Planning Quality, Operations, and the planning archive.
Periods Compared
Planner offline window: 4 days, 0.0 Iris plans/day.
Iris online window: 7 days, 2.9 Iris plans/day.
Summary Table
Values show the change from the planner-offline window to the Iris-online window.
+2.9
+34.1 pts
+25.1 pts
+43.8 pts
17.2h lower
101.8 gal lower
1.4 kWh lower
USD 0.43 lower
+27.8
Confounders To Keep In View
This comparison is useful, but it is not weather-normalized proof that Iris caused every improvement. The Iris-online window was cooler, more humid, and lower-solar on average, which likely made VPD and heat stress easier to avoid. The table below makes those confounders explicit instead of burying them in caveats.
The Iris-online window was cooler, reducing heat-load pressure.
The Iris-online window had less dry-air pressure, so VPD compliance was easier to recover.
Lower solar load reduces overheating and evaporative demand.
Logged event counts look balanced, but operator activity is not controlled like a lab experiment.
The comparison uses the same greenhouse and controller boundary, but it is not a locked hardware trial.
Crop targets are comparable enough for an operational receipt, not for yield attribution or agronomic proof.
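The VPD confounder follows from how VPD is usually computed. Cooler air has a lower saturation vapor pressure, and higher humidity closes more of the remaining gap, so both push VPD down. The sketch below uses the common Tetens approximation for air VPD. The page does not say which formula the controller uses, and the temperatures and humidities are illustrative assumptions, not readings from either window.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Tetens approximation of saturation vapor pressure over water, in kPa."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_c: float, rh_pct: float) -> float:
    """Air vapor pressure deficit: saturation pressure minus actual vapor pressure."""
    return saturation_vapor_pressure_kpa(temp_c) * (1.0 - rh_pct / 100.0)


# Illustrative conditions only (assumed), not measurements from either window.
warm_dry = vpd_kpa(30.0, 40.0)
cool_humid = vpd_kpa(24.0, 65.0)
print(f"warm/dry: {warm_dry:.2f} kPa, cool/humid: {cool_humid:.2f} kPa")
```

Under these assumed conditions the cooler, more humid case lands at well under half the VPD of the warm, dry case, which is why a weather shift alone can make VPD compliance easier to recover.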
Visual Evidence
The two graphs below use the exact comparison span, April 22 through May 2, 2026. They are the clearest visual read on the claim: during the planner-offline days, compliance was lower and stress-category hours were higher; after Iris resumed, the scorecard generally moved in the right direction. The charts still do not prove causality because weather and operator activity were not held constant.
Resource Tradeoffs
The comparison is more useful when stress is shown beside what the greenhouse spent trying to reduce it. These existing planning-quality panels use the public 30-day scorecard context, which includes both the April 22-25 outage window and the April 26-May 2 Iris-online window.
Cost, water, and misting are not success metrics by themselves. They matter because a higher headline score only counts if the planner reduced plant stress without hiding the resource bill it ran up to get there.
Daily Rows
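April 22 (planner offline): 26.7h stress, 12.9h VPD-high, 9.7h heat, USD 4.23
April 23 (planner offline): 27.3h stress, 15.5h VPD-high, 2.0h heat, USD 6.93
April 24 (planner offline): 40.3h stress, 22.5h VPD-high, 6.9h heat, USD 5.52
April 25 (planner offline): 24.9h stress, 12.7h VPD-high, 9.0h heat, USD 4.68
April 26 (Iris online): 16.4h stress, 2.9h VPD-high, 4.3h heat, USD 6.38
April 27 (Iris online): 23.6h stress, 4.6h VPD-high, 1.9h heat, USD 3.17
April 28 (Iris online): 14.8h stress, 4.5h VPD-high, 3.6h heat, USD 3.03
April 29 (Iris online): 6.9h stress, 3.3h VPD-high, 0.9h heat, USD 4.42
April 30 (Iris online): 1.7h stress, 0.0h VPD-high, 0.0h heat, USD 6.96
May 1 (Iris online): 13.1h stress, 4.4h VPD-high, 4.9h heat, USD 3.80
May 2 (Iris online): 12.1h stress, 6.4h VPD-high, 3.2h heat, USD 6.60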
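A minimal sketch of how the stress and cost deltas in the summary table can be recomputed from these rows, assuming they run chronologically from April 22 through May 2 (four planner-offline days, then seven Iris-online days). It is not the page generator script, and the DailyRow field names are hypothetical. With these rounded rows it gives about 17.1h lower stress and USD 0.43 lower cost per day; the small gap from the published 17.2h likely comes from rounding the rows to 0.1h.

```python
from statistics import mean
from typing import NamedTuple


class DailyRow(NamedTuple):
    stress_h: float    # summed stress hours (hot-dry hours can count twice)
    vpd_high_h: float  # hours with VPD above the crop band
    heat_h: float      # hours above the temperature band
    cost_usd: float    # estimated daily resource spend


# The 11 published rows, assumed chronological: April 22 .. May 2, 2026.
DAILY_ROWS = [
    DailyRow(26.7, 12.9, 9.7, 4.23),
    DailyRow(27.3, 15.5, 2.0, 6.93),
    DailyRow(40.3, 22.5, 6.9, 5.52),
    DailyRow(24.9, 12.7, 9.0, 4.68),
    DailyRow(16.4, 2.9, 4.3, 6.38),
    DailyRow(23.6, 4.6, 1.9, 3.17),
    DailyRow(14.8, 4.5, 3.6, 3.03),
    DailyRow(6.9, 3.3, 0.9, 4.42),
    DailyRow(1.7, 0.0, 0.0, 6.96),
    DailyRow(13.1, 4.4, 4.9, 3.80),
    DailyRow(12.1, 6.4, 3.2, 6.60),
]

OFFLINE_DAYS = 4  # April 22-25: planner offline
offline, online = DAILY_ROWS[:OFFLINE_DAYS], DAILY_ROWS[OFFLINE_DAYS:]


def delta(field: str) -> float:
    """Iris-online per-day mean minus planner-offline per-day mean (negative = lower)."""
    online_mean = mean(getattr(row, field) for row in online)
    offline_mean = mean(getattr(row, field) for row in offline)
    return online_mean - offline_mean


for field in DailyRow._fields:
    print(f"{field}: {delta(field):+.2f} per day")
```

Comparing per-day means rather than window totals matters here because the two windows have different lengths (4 vs 7 days).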
Definitions
daily_summary.compliance_pct: Percent of samples where temperature and VPD were both inside the active crop band.
Summed daily stress duration from corrected daily summary fields. This is not capped at one stress type; a hot-dry hour can count on more than one axis.
v_planner_performance.planner_score: Composite score: 80% compliance and 20% cost efficiency. It is useful as an operational KPI, not as a yield claim.
daily_summary.kwh_total: Electric energy from the greenhouse power meter where available, with runtime estimates kept as a separate diagnostic.
Resource spend comes from estimated daily summary fields unless marked measured. The greenhouse is solar-aligned but still uses grid electricity and gas heat.
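A minimal sketch of the compliance, stress-duration, and score definitions above, under stated assumptions: samples are (temperature °C, VPD kPa) pairs taken at a fixed interval, bands are inclusive, and cost efficiency is already on a 0-100 scale. Band, compliance_pct, stress_hours, and planner_score are hypothetical names, not the project's code. Only the heat and VPD-high axes are modeled; the published stress totals exceed heat plus VPD-high, so the real summary includes other axes. The stress function shows why summed stress can exceed wall-clock time: two hot-and-dry 15-minute samples (0.5h of clock time) add 0.5h of heat and 0.5h of VPD-high stress, 1.0h in total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """Hypothetical inclusive crop band for one variable."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def compliance_pct(samples, temp_band: Band, vpd_band: Band) -> float:
    """Percent of (temp_c, vpd_kpa) samples where BOTH values sit inside their bands."""
    if not samples:
        raise ValueError("no samples")
    inside = sum(
        1 for temp_c, vpd_kpa in samples
        if temp_band.contains(temp_c) and vpd_band.contains(vpd_kpa)
    )
    return 100.0 * inside / len(samples)


def stress_hours(samples, temp_band: Band, vpd_band: Band, sample_minutes: float) -> float:
    """Summed per-axis stress time; a hot-and-dry sample counts on both axes.

    Simplification: only heat and VPD-high are modeled here.
    """
    hours_per_sample = sample_minutes / 60.0
    heat = sum(hours_per_sample for temp_c, _ in samples if temp_c > temp_band.high)
    vpd_high = sum(hours_per_sample for _, vpd_kpa in samples if vpd_kpa > vpd_band.high)
    return heat + vpd_high


def planner_score(compliance: float, cost_efficiency: float) -> float:
    """Composite KPI: 80% compliance plus 20% cost efficiency (0-100 scale assumed)."""
    return 0.8 * compliance + 0.2 * cost_efficiency


# Illustrative 15-minute samples and bands (assumed values).
samples = [(24.0, 1.0), (31.0, 1.8), (22.0, 0.6), (33.0, 2.1)]
temp_band, vpd_band = Band(18.0, 28.0), Band(0.8, 1.4)
print(compliance_pct(samples, temp_band, vpd_band))
print(stress_hours(samples, temp_band, vpd_band, sample_minutes=15))
print(planner_score(compliance_pct(samples, temp_band, vpd_band), cost_efficiency=90.0))
```

Because compliance carries 80% of the weight, a large compliance swing dominates the score even when cost efficiency barely moves.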
Caveats
- Weather, crop load, hardware state, and operator activity were not identical across the two windows.
- The baseline is a real outage window, not a hand-picked fixed-rule controller experiment.
- The strongest claim is not that Iris guarantees better outcomes every day. The useful claim is that the system makes planner availability, physical stress, cost, and score visible enough to audit.
- This is not a yield, profit, or controlled-trial claim. It is a launch-safe operational receipt; see the FAQ for the claim boundary.
- Known physical and instrumentation limits still apply, including weather, sensor coverage, water attribution, and firmware-change risk. See Known Limits and Firmware Change Protocol.
Reproducibility
This page is generated by scripts/generate-baseline-vs-iris-page.py from daily_summary, plan_journal, and v_planner_performance.
For raw launch-safe data, use the 7-day climate CSV, 30-day plan outcomes CSV, and dataset notes. The current public snapshot is available from the evidence snapshot API.
Where To Go Next
- Why the AI Does Not Control Relays explains the safety split behind the outage window.
- Planning Loop shows how Iris writes hypotheses and waypoints.
- AI-Writable Tunables lists the bounded control surface behind those waypoints.
- Planning Quality shows the live scorecard and forecast-plan-outcome panels.
- Generated Lessons shows what the planner reads before future plans.
- Data Model explains the tables, views, and sample exports behind this comparison.