Baseline vs Iris

This is an operational comparison, not a controlled A/B test, and it does not isolate Iris (our OpenClaw AI agent) as the only variable. The baseline window is the April 22-25, 2026 planner-offline run already documented in the public outage story. The comparison window is April 26-May 2, 2026, when normal Iris planning resumed.

The comparison is still useful because it answers the launch question a skeptical reader will ask first: when the planning loop is online, do the public scorecards look different from the period when the ESP32 had to keep running without normal AI plans?

For the exact parameters Iris can change when it is online, see AI-Writable Tunables.

For the broader caveat language, see the Launch FAQ. For the live receipts behind this page, use Planning Quality, Operations, and the planning archive.

Periods Compared

Planner offline window: 2026-04-22 to 2026-04-25 (4 days, 0.0 Iris plans/day).
Iris online window: 2026-04-26 to 2026-05-02 (7 days, 2.9 Iris plans/day).

Summary Table

| Metric | Planner offline (Apr 22-25) | Iris online (Apr 26-May 2) | Change |
| --- | --- | --- | --- |
| Average Iris plans/day | 0.0 | 2.9 | +2.9 |
| Both-axis compliance | 20.1% | 54.2% | +34.1 pts |
| Temperature compliance | 45.3% | 70.4% | +25.1 pts |
| VPD compliance | 30.1% | 73.9% | +43.8 pts |
| Cumulative stress-axis hours/day | 29.8h | 12.6h | 17.2h lower |
| Water/day | 308.8 gal | 207.0 gal | 101.8 gal lower |
| Estimated electric energy/day | 2.6 kWh | 1.2 kWh | 1.4 kWh lower |
| Cost/day | USD 5.34 | USD 4.91 | USD 0.43 lower |
| Planner score | 29.0 | 56.8 | +27.8 |

Confounders To Keep In View

This comparison is useful, but it is not weather-normalized proof that Iris caused every improvement. The Iris-online window was cooler, more humid, and lower-solar on average, which likely made both VPD and heat stress easier to control. The table below makes those confounders explicit instead of burying them in caveats.

| Factor | Observation (planner offline → Iris online) | Why it matters |
| --- | --- | --- |
| Outdoor temperature | 55.9°F avg / 83.6°F max → 47.8°F avg / 73.5°F max | The Iris-online window was cooler, reducing heat-load pressure. |
| Outdoor VPD / humidity | 1.19 kPa avg → 0.55 kPa avg | The Iris-online window had less dry-air pressure, so VPD compliance was easier to recover. |
| Solar irradiance | 262 W/m² avg → 205 W/m² avg | Lower solar load reduces overheating and evaporative demand. |
| Manual interventions | 1 logged crop event and 0 manual/override irrigation rows in each window | Logged event counts look balanced, but operator activity is not controlled like a lab experiment. |
| Hardware changes | No major hardware change is asserted here | The comparison uses the same greenhouse and controller boundary, but it is not a locked hardware trial. |
| Crop mix / active bands | Same public crop-control model, plants still aging | Crop targets are comparable enough for an operational receipt, not for yield attribution or agronomic proof. |
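To see why cooler, more humid air eases VPD pressure, it can help to compute vapor pressure deficit directly. The sketch below uses the widely used Tetens approximation for saturation vapor pressure; it is an illustration only, since this page does not say how the project computes VPD. The function names and the two sample conditions are hypothetical and are not measurements from either window.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure over water (Tetens approximation), in kPa."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_f: float, rh_pct: float) -> float:
    """Vapor pressure deficit of air at temp_f (deg F) and rh_pct (% relative humidity)."""
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    return saturation_vapor_pressure_kpa(temp_c) * (1.0 - rh_pct / 100.0)


# Hypothetical conditions chosen for contrast, not data from this page.
warm_dry = vpd_kpa(temp_f=80.0, rh_pct=30.0)
cool_humid = vpd_kpa(temp_f=60.0, rh_pct=70.0)
print(f"warm, dry air:   {warm_dry:.2f} kPa")
print(f"cool, humid air: {cool_humid:.2f} kPa")
```

Because saturation vapor pressure rises steeply with temperature, a drop in temperature and a rise in humidity both pull VPD down, which is the direction the outdoor averages moved between the two windows.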

Visual Evidence

The two graphs below use the exact comparison span, April 22 through May 2, 2026. They are the clearest visual read on the claim: during the planner-offline days, compliance was lower and stress-category hours were higher; after Iris resumed, the scorecard generally moved in the right direction. The charts still do not prove causality because weather and operator activity were not held constant.

Resource Tradeoffs

The comparison is more useful when stress is shown beside what the greenhouse spent trying to reduce it. These existing planning-quality panels use the public 30-day scorecard context, which includes both the April 22-25 outage window and the April 26-May 2 Iris-online window.

Cost, water, and misting are not success metrics by themselves. They matter because an AI planner can improve the headline score only if it reduces plant stress without hiding the resource bill.

Daily Rows

| Date | Iris plans | Both-axis compliance | Planner score | Stress-axis hours | VPD-high hours | Heat hours | Cost |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-04-22 | 0 | 30.2% | 38.5 | 26.7h | 12.9h | 9.7h | USD 4.23 |
| 2026-04-23 | 0 | 19.3% | 26.2 | 27.3h | 15.5h | 2.0h | USD 6.93 |
| 2026-04-24 | 0 | 2.9% | 15.0 | 40.3h | 22.5h | 6.9h | USD 5.52 |
| 2026-04-25 | 0 | 28.0% | 36.2 | 24.9h | 12.7h | 9.0h | USD 4.68 |
| 2026-04-26 | 4 | 39.8% | 43.3 | 16.4h | 2.9h | 4.3h | USD 6.38 |
| 2026-04-27 | 2 | 24.9% | 35.7 | 23.6h | 4.6h | 1.9h | USD 3.17 |
| 2026-04-28 | 3 | 52.4% | 57.9 | 14.8h | 4.5h | 3.6h | USD 3.03 |
| 2026-04-29 | 3 | 74.3% | 73.5 | 6.9h | 3.3h | 0.9h | USD 4.42 |
| 2026-04-30 | 2 | 69.3% | 66.2 | 1.7h | 0.0h | 0.0h | USD 6.96 |
| 2026-05-01 | 2 | 65.5% | 67.3 | 13.1h | 4.4h | 4.9h | USD 3.80 |
| 2026-05-02 | 4 | 53.4% | 53.9 | 12.1h | 6.4h | 3.2h | USD 6.60 |
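The window averages in the Summary Table can be checked against the Daily Rows with a few lines of arithmetic. The sketch below copies the plan counts, both-axis compliance, and cost from the rows above and averages them per window. The names `DAILY_ROWS`, `BASELINE_END`, and `window_means` are illustrative; this is not the generator script's actual code, and small rounding differences are possible because the published rows are already rounded.

```python
from statistics import mean

# Daily values copied from the Daily Rows section:
# (date, Iris plans, both-axis compliance %, cost in USD)
DAILY_ROWS = [
    ("2026-04-22", 0, 30.2, 4.23),
    ("2026-04-23", 0, 19.3, 6.93),
    ("2026-04-24", 0, 2.9, 5.52),
    ("2026-04-25", 0, 28.0, 4.68),
    ("2026-04-26", 4, 39.8, 6.38),
    ("2026-04-27", 2, 24.9, 3.17),
    ("2026-04-28", 3, 52.4, 3.03),
    ("2026-04-29", 3, 74.3, 4.42),
    ("2026-04-30", 2, 69.3, 6.96),
    ("2026-05-01", 2, 65.5, 3.80),
    ("2026-05-02", 4, 53.4, 6.60),
]

BASELINE_END = "2026-04-25"  # last planner-offline day; ISO dates sort as strings


def window_means(rows):
    """Return (plans/day, compliance %, cost/day) averaged over the given rows."""
    plans, compliance, cost = zip(*[(r[1], r[2], r[3]) for r in rows])
    return mean(plans), mean(compliance), mean(cost)


baseline = window_means([r for r in DAILY_ROWS if r[0] <= BASELINE_END])
iris = window_means([r for r in DAILY_ROWS if r[0] > BASELINE_END])

labels = ("plans/day", "compliance %", "cost USD/day")
for label, b, i in zip(labels, baseline, iris):
    print(f"{label}: {b:.2f} -> {i:.2f} (change {i - b:+.2f})")
```

Compliance changes are reported in percentage points (the simple difference of the two averages), not as a relative percent change.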

Definitions

| Term | Source field or formula | Definition |
| --- | --- | --- |
| Both-axis compliance | daily_summary.compliance_pct | Percent of samples where temperature and VPD were both inside the active crop band. |
| Cumulative stress-axis hours/day | Heat + cold + VPD-high + VPD-low | Summed daily stress duration from corrected daily summary fields. This is not capped at one stress type; a hot-dry hour can count on more than one axis. |
| Planner score | v_planner_performance.planner_score | Composite score: 80% compliance and 20% cost efficiency. It is useful as an operational KPI, not as a yield claim. |
| Metered electric energy/day | daily_summary.kwh_total | Electric energy from the greenhouse power meter where available, with runtime estimates kept as a separate diagnostic. |
| Cost/day | Electric + gas + water | Resource spend comes from estimated daily summary fields unless marked measured. The greenhouse is solar-aligned but still uses grid electricity and gas heat. |
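Two of the definitions above are simple formulas, and a short sketch makes their behavior concrete. It assumes the cost-efficiency term is on the same 0-100 scale as compliance, which this page does not state; the function names and the sample inputs are hypothetical and do not come from the database.

```python
def cumulative_stress_hours(heat_h, cold_h, vpd_high_h, vpd_low_h):
    """Sum the four stress axes; an hour that is both hot and dry counts twice."""
    return heat_h + cold_h + vpd_high_h + vpd_low_h


def planner_score(compliance_pct, cost_efficiency_pct):
    """80/20 weighted composite, assuming both inputs are on a 0-100 scale."""
    return 0.8 * compliance_pct + 0.2 * cost_efficiency_pct


# Hypothetical day: 6 hot hours, 4 of which were also too dry, plus 3 cold hours.
stress = cumulative_stress_hours(heat_h=6.0, cold_h=3.0, vpd_high_h=4.0, vpd_low_h=0.0)

# Hypothetical inputs: 50% compliance with a cost-efficiency term of 70.
score = planner_score(compliance_pct=50.0, cost_efficiency_pct=70.0)

print(f"stress-axis hours: {stress:.1f}")
print(f"planner score: {score:.1f}")
```

The double counting is why a single day in the Daily Rows can show more than 24 stress-axis hours, and the 80/20 weighting is why the planner score tracks compliance closely while still moving with cost.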

Caveats

Weather, crop load, hardware state, and operator activity were not identical across the two windows.
The baseline is a real outage window, not a hand-picked fixed-rule controller experiment.
The strongest claim is not that Iris guarantees better outcomes every day. The useful claim is that the system makes planner availability, physical stress, cost, and score visible enough to audit.
This is not a yield, profit, or controlled-trial claim. It is a launch-safe operational receipt; see the FAQ for the claim boundary.
Known physical and instrumentation limits still apply, including weather, sensor coverage, water attribution, and firmware-change risk. See Known Limits and Firmware Change Protocol.

Reproducibility

This page is generated by scripts/generate-baseline-vs-iris-page.py from daily_summary, plan_journal, and v_planner_performance.

For raw launch-safe data, use the 7-day climate CSV, 30-day plan outcomes CSV, and dataset notes. The current public snapshot is available from the evidence snapshot API.
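As a rough illustration of how the window split could be expressed as a query over daily_summary, the sketch below loads the published both-axis compliance values into an in-memory SQLite table and groups them by period. SQLite is only a stand-in because this page does not name the database engine, the `date` column name is an assumption, and the query is not taken from scripts/generate-baseline-vs-iris-page.py.

```python
import sqlite3

# In-memory SQLite stands in for the project database (engine not named on this page).
conn = sqlite3.connect(":memory:")
# The `date` column name is assumed; compliance_pct is the field named in Definitions.
conn.execute("CREATE TABLE daily_summary (date TEXT PRIMARY KEY, compliance_pct REAL)")
conn.executemany(
    "INSERT INTO daily_summary VALUES (?, ?)",
    [
        ("2026-04-22", 30.2), ("2026-04-23", 19.3), ("2026-04-24", 2.9),
        ("2026-04-25", 28.0), ("2026-04-26", 39.8), ("2026-04-27", 24.9),
        ("2026-04-28", 52.4), ("2026-04-29", 74.3), ("2026-04-30", 69.3),
        ("2026-05-01", 65.5), ("2026-05-02", 53.4),
    ],
)

# Label each day by window, then average compliance within each window.
QUERY = """
SELECT CASE WHEN date <= '2026-04-25' THEN 'planner_offline'
            ELSE 'iris_online' END AS period,
       COUNT(*) AS days,
       ROUND(AVG(compliance_pct), 1) AS avg_compliance_pct
FROM daily_summary
WHERE date BETWEEN '2026-04-22' AND '2026-05-02'
GROUP BY period
ORDER BY MIN(date)
"""

for period, days, avg_compliance in conn.execute(QUERY):
    print(period, days, avg_compliance)
```

Keeping the window boundaries as explicit dates in the query makes the comparison span easy to audit against the Periods Compared section.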

Where To Go Next

Why the AI Does Not Control Relays explains the safety split behind the outage window.
Planning Loop shows how Iris writes hypotheses and waypoints.
AI-Writable Tunables lists the bounded control surface behind those waypoints.
Planning Quality shows the live scorecard and forecast-plan-outcome panels.
Generated Lessons shows what the planner reads before future plans.
Data Model explains the tables, views, and sample exports behind this comparison.

🌱 Verdify