Reliable Decision Agents
under Imperfect Evidence

When should an AI agent trust historical data—and when should it refuse to recommend?

01 / Input Signal

Imperfect Data

02 / Verification

Reliable Evidence

03 / Core Layer

Decision Agent

04 / Execution

Business Action

Traditional Analytics Engine

Forces a response from observational features without validating underlying structural assumptions.

                # Standard Analytics Paradigm

                User Input: "Should we allocate more budget to Vendor A?"

                System Output:

                > ROI looks positive (+14.2%)

                > RECOMMENDATION: INVEST

Our Solution: Evidence-Aware Refusal

Evaluates identification bounds and data sufficiency. Explicitly triggers a refusal path if data cannot support the action.

                # Proposed Framework

                User Input: "Should we allocate more budget to Vendor A?"

                System Output:

                > Overlap Score: CRITICAL (0.21)

                > Selection Bias: HIGH REGIME

                > STATUS: REFUSE TO RECOMMEND

💡 Booth Takeaway: Statistical reliability becomes an active runtime agent capability, not a static post-hoc check.
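The slides report an overlap score but do not define it. As a minimal sketch, assume it is the share of units whose estimated treatment propensity falls inside `[eps, 1 - eps]`. The function name `overlap_score`, the cutoff `eps`, and the sample propensities below are illustrative assumptions, not the framework's actual definition.

```python
from typing import Sequence


def overlap_score(propensities: Sequence[float], eps: float = 0.1) -> float:
    """Share of units whose treatment propensity lies in [eps, 1 - eps].

    Assumed definition for illustration only; the source does not specify one.
    """
    if not propensities:
        raise ValueError("need at least one propensity")
    inside = sum(1 for p in propensities if eps <= p <= 1.0 - eps)
    return inside / len(propensities)


# Hypothetical propensities: most units were almost never or almost always treated.
skewed = [0.01, 0.02, 0.03, 0.97, 0.98, 0.99, 0.02, 0.5, 0.4, 0.01]
# Hypothetical propensities: every unit had a realistic chance of either arm.
balanced = [0.3, 0.45, 0.5, 0.55, 0.6, 0.4, 0.35, 0.65, 0.5, 0.7]

print(overlap_score(skewed))    # 0.2
print(overlap_score(balanced))  # 1.0
```

Under this reading, a low score means few comparable units exist across arms, which is the situation in which the agent is meant to refuse.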

Capability	Standard LLM Agent	Our Evidence-Aware Agent
Summarize text & graph data	✓ Yes	✓ Yes
Recommend downstream actions	✓ Yes	✓ Yes
Quantify finite-sample uncertainty	⨯ No	✓ Yes
Detect selection bias & truncation	⨯ No	✓ Yes
REFUSE unsupported decisions	⨯ No (Hallucinates)	★ CRITICAL FEATURE
Suggest optimal next data to collect	⨯ No	✓ Yes

Standard agents are engineered to optimize context retrieval and conversational flow. Our framework transforms mathematical limits into active routing primitives.

1. Missing Data

Missing blocks follow structured patterns rather than occurring at random.

2. Selection Bias

Observed subset sits completely above the selection rule line.

3. Truncation

Data cut off below threshold; full distribution tail hidden.

4. Distribution Shift

Training evidence patterns diverge from deployment settings.
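To make the truncation anomaly concrete, the following small simulation logs only outcomes above a cutoff and compares the naive mean of the logged data with the mean of the full population. The cutoff, sample size, and standard-normal population are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)

TRUE_MEAN = 0.0
THRESHOLD = 0.5  # hypothetical cutoff: only outcomes above it are logged

# Full population, which the analyst never sees in practice.
population = [random.gauss(TRUE_MEAN, 1.0) for _ in range(20_000)]

# Truncated log: everything at or below the threshold is silently dropped.
observed = [y for y in population if y > THRESHOLD]

full_mean = statistics.fmean(population)
naive_mean = statistics.fmean(observed)

print(f"full mean  : {full_mean:+.3f}")
print(f"naive mean : {naive_mean:+.3f}")
print(f"kept share : {len(observed) / len(population):.2%}")
```

The naive mean lands well above the true mean because the lower part of the distribution never reaches the log; an estimator that ignores the truncation rule inherits that bias.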

"The risk is not uncertainty — the risk is acting as if uncertainty does not exist."

A single operational loop connects asymptotic statistical theory directly to autonomous agent execution:

01

Imperfect Data

• Truncated tails
• Censored responses
• Active selection bias
• Covariate shift

02

Reliable Evidence

• Structural identifiability
• Minimax estimation bounds
• Sharp confidence envelopes
• Non-parametric tests

03

Decision Agents

• Structural verification
• Sensitivity analysis
• Refusal strategies
• Counterfactual bounds

04

Business Action

• Risk-aware investment
• Optimal budget scaling
• Targeted exploration
• Root-cause alerts

Central Agenda: What conclusions and actions are mathematically justified under imperfect offline evidence?

Before optimizing downstream policy interventions, the core interface calculates strict non-parametric identifiability and finite-sample error metrics:

• Strict Identifiability Verification: Evaluates whether target densities $P(Y \mid do(X))$ can be uniquely recovered from the truncation region $S$.

• Partial Identification & Confidence Envelopes: If point identification fails, the framework outputs explicit lower and upper bounds on the target, obtained via localized optimization.

• Minimax Guarantees: Establishes finite-sample lower bounds on the worst-case squared error:

$$\inf_{\hat{\mu}} \sup_{P \in \mathcal{P}} \mathbb{E}_{P}\left[\lVert \hat{\mu} - \mu \rVert^{2}\right] \geq \frac{d \cdot \psi(S)}{n}$$
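The partial-identification idea can be illustrated with classical worst-case bounds for a bounded outcome: when part of the sample is unobserved, the missing outcomes are set to their smallest and largest possible values in turn. This is a textbook technique shown as a sketch, not necessarily the optimization the framework uses; `worst_case_mean_bounds` and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def worst_case_mean_bounds(
    observed: Sequence[float],
    n_total: int,
    lo: float = 0.0,
    hi: float = 1.0,
) -> Tuple[float, float]:
    """Bounds on the population mean when n_total - len(observed) outcomes are missing.

    Assumes every outcome lies in [lo, hi] and nothing else about the missing units.
    """
    if not observed or n_total < len(observed):
        raise ValueError("observed must be non-empty and no larger than n_total")
    share = len(observed) / n_total
    obs_mean = sum(observed) / len(observed)
    lower = share * obs_mean + (1.0 - share) * lo  # missing outcomes all at lo
    upper = share * obs_mean + (1.0 - share) * hi  # missing outcomes all at hi
    return lower, upper


# Hypothetical log: 4 of 10 units observed, binary outcome.
lower, upper = worst_case_mean_bounds([1, 1, 0, 1], n_total=10)
print(f"[{lower:.2f}, {upper:.2f}]")  # [0.30, 0.90]
```

The width of the interval equals the missing share times the outcome range, so it shrinks only when more units are observed or stronger assumptions are added.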

✓ AISTATS 2025 (Truncation) · ✓ UAI 2025 (Outliers) · ✓ ICML 2026 Spotlight (Structure Learning)

Truncation Hides the Tail

Evidence is not static data: it is the observed data matrices together with the localized assumptions and bounds under which they are evaluated.

Step 01

Business Query

Unstructured textual target or programmatic policy requirement input.

Step 02

Evidence Builder

Retrieves target historical logs and isolates the tracking window.

Step 03

Reliability Engine

Runs selection checks and estimates identification bounds.

Step 04

Causal Reasoner

Continuously evaluates the evidence and models counterfactual bounds.

Step 05

Decision Matrix

                + Recommendation

                + Confidence Metrics

                + System Caveats

                + Refusal Branches

            
            agent> Processing: "Reallocate budget to high-performing vendor cohort?"

            [evidence_builder] Retrieved 12,400 sessions  |  Window: Q2–Q3

            [reliability_engine] Overlap Score: 0.18  |  Selection Regime: HIGH

            [decision_matrix] REFUSE TO RECOMMEND

            ↳ Caveat: Point estimate unreliable; targeted exploration suggested.
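The five steps and the trace above can be sketched as a small pipeline whose final stage returns a structured decision record. All class names, field names, and the `MIN_OVERLAP` threshold are hypothetical; the numbers simply reproduce the trace shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List

MIN_OVERLAP = 0.5  # hypothetical refusal threshold; the slides do not state one


@dataclass
class Evidence:
    """Output of the evidence builder and reliability engine (Steps 02 and 03)."""
    n_sessions: int
    window: str
    overlap_score: float
    selection_regime: str


@dataclass
class DecisionMatrix:
    """Step 05 output: recommendation, caveats, and whether the refusal branch fired."""
    recommendation: str
    caveats: List[str] = field(default_factory=list)
    refused: bool = False


def decide(evidence: Evidence) -> DecisionMatrix:
    """Refuse when overlap is too low or selection is severe; otherwise recommend."""
    if evidence.overlap_score < MIN_OVERLAP or evidence.selection_regime == "HIGH":
        return DecisionMatrix(
            recommendation="REFUSE TO RECOMMEND",
            caveats=["Point estimate unreliable; targeted exploration suggested."],
            refused=True,
        )
    return DecisionMatrix(recommendation="RECOMMEND")


def render_trace(query: str, evidence: Evidence, decision: DecisionMatrix) -> List[str]:
    """Format the stage outputs as log lines in the style of the slide."""
    lines = [
        f'agent> Processing: "{query}"',
        f"[evidence_builder] Retrieved {evidence.n_sessions:,} sessions  |  Window: {evidence.window}",
        f"[reliability_engine] Overlap Score: {evidence.overlap_score:.2f}  |  "
        f"Selection Regime: {evidence.selection_regime}",
        f"[decision_matrix] {decision.recommendation}",
    ]
    lines += [f"↳ Caveat: {caveat}" for caveat in decision.caveats]
    return lines


evidence = Evidence(12_400, "Q2–Q3", 0.18, "HIGH")
decision = decide(evidence)
for line in render_trace("Reallocate budget to high-performing vendor cohort?", evidence, decision):
    print(line)
```

Keeping the refusal as a field on the decision record, rather than raising an error, lets downstream consumers treat "refuse" as a first-class outcome alongside a recommendation.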

Scenario A — Sufficient Evidence

"Should we expand inventory for Product Line B?"

                > Overlap Score: 0.82

                > Identification: POINT ID

                > Effect Bound: [+3.1%, +5.8%]

                > RECOMMEND: EXPAND

Scenario B — Insufficient Evidence

"Should we shift spend to Vendor A's new channel?"

                > Overlap Score: 0.19

                > Identification: PARTIAL ONLY

                > Effect Bound: [−12%, +28%] (too wide)

                > REFUSE TO RECOMMEND

Booth visitors can walk through both paths and inspect how evidence quality gates the final action.

Offline Evidence

Historical logs, truncated cohorts, censored outcomes, selection artifacts

→

Online Action

Budget allocation, inventory scaling, policy rollout, exploration triggers

Verify — Can the causal quantity be identified from available data?

Bound — What is the sharpest defensible interval for the effect?

Decide — Act, explore, or refuse based on evidence sufficiency.
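A minimal sketch of the Verify, Bound, Decide sequence as a three-way gate follows. The rule used here (act only when the effect is point-identified, its interval excludes zero, and the interval is narrow; explore when only one of the last two holds) and the `max_width` threshold are assumptions for illustration, since the source does not specify the decision rule. The two sample inputs reuse the figures from Scenario A and Scenario B, in percentage points.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    ACT = "act"
    EXPLORE = "explore"
    REFUSE = "refuse"


def gate(identified: bool, bound: Tuple[float, float], max_width: float = 10.0) -> Action:
    """Map identification status and an effect interval to act / explore / refuse."""
    lower, upper = bound
    if lower > upper:
        raise ValueError("bound must satisfy lower <= upper")
    excludes_zero = lower > 0 or upper < 0    # sign of the effect is determined
    narrow = (upper - lower) <= max_width     # interval is tight enough to act on
    if identified and excludes_zero and narrow:
        return Action.ACT
    if excludes_zero or narrow:
        return Action.EXPLORE                 # partial signal: collect targeted data
    return Action.REFUSE


print(gate(True, (3.1, 5.8)))      # Action.ACT     (Scenario A figures)
print(gate(False, (-12.0, 28.0)))  # Action.REFUSE  (Scenario B figures)
print(gate(False, (1.0, 25.0)))    # Action.EXPLORE (hypothetical middle case)
```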

ICML 2026 Spotlight

Optimal Structure Learning and Conditional Independence Testing

Structural guarantees for evidence graph discovery under imperfect observations.

AISTATS 2025

Gaussian Mean Testing under Truncation

Finite-sample testing when tails are systematically hidden.

UAI 2025

Toward Universal Laws of Outlier Propagation

How anomalies propagate through causal structures and corrupt inference.

Booth Theme

Reliable Decision Agents under Imperfect Evidence

Unifying theory, systems, and business decision-making in one operational loop.

Experience Evidence-Aware
Decision Agents Live

01

Walk Through Scenarios

Compare recommend vs. refuse paths on real business queries.

02

Inspect Evidence Layers

See how overlap, bounds, and identification gates drive decisions.

03

Discuss Research

Connect with our team on causal inference and decision systems.

Building enterprise AI systems that thoroughly verify the limits of evidence before committing to runtime actions:

Phase 1: Imperfect Data

Isolating foundational limits under statistical truncation, missing data blocks, and selection bias anomalies.

Phase 2: Reliable Evidence

Developing asymptotic point and partial identification envelopes alongside exact minimax risk floors.

Phase 3: Decision Agents

Integrating active verification protocols, algorithmic refusal metrics, and target exploration path routing.

Phase 4: Platform Vision

                Deploying a unified, reusable decision intelligence layer across multi-variable enterprise surfaces.
            

One scalable evidence-aware foundation abstracting risk verification across independent business verticals:

Theory Layer

                + Minimax Bounds

                + Identification Checks

                + Tail Reconstructions

Evidence Layer

                + Core Diagnostics

                + Sensitivity Testing

                + Uncertainty Wrappers

Supported Surfaces

• Promotion Optimization
• Vendor Strategies
• Root Cause Analysis
• Budget Risk Allocation

Let's Discuss:

"When should an AI system refuse to make a recommendation?"

Reliable Evidence Foundations

Mean Testing under Truncation beyond Gaussian (AISTATS 2025)
Learning High-Dimensional Gaussians from Censored Data (AISTATS 2025)

Structure & Agent Routing

Optimal Structure Learning and CI Testing (ICML 2026 Spotlight)
PAC Guarantees for Doubly Robust Front-Door Estimators (Causal@UAI Oral)


Yuhao Wang

Applied Scientist, Amazon Japan · yohannawang.com


Reliable Decision Agents
under Imperfect Evidence

The Core Problem: Standard Agents Force a Choice

Traditional Analytics Engine

Our Solution: Evidence-Aware Refusal

Why Existing AI Agents Fail

Real-World Anomalies: Why Causal Signals Collapse

Research Agenda: Learn, Infer, and Decide

Layer 1: Rigorous Identification Envelopes

Layer 2: Decision Intelligence Processing Architecture

Live Demo: Two Decision Paths

Closing the Offline-to-Online Gap

Connected Research at ICML 2026

Experience Evidence-Aware
Decision Agents Live

Long-Term Research Vision

A Reusable Decision Intelligence Layer

"When should an AI system refuse to make a recommendation?"
