The Analytics dashboard provides real-time visibility into agent performance across key metrics—completion rates, evaluation scores, runtime, and user feedback—enabling data-driven optimization decisions.
Screenshot: Analytics dashboard showing a 98.95% completion rate, a 98.41% average evaluation score, and 100% positive feedback for the date range September 9 - October 9, 2025.

Understanding the Analytics Dashboard

Access comprehensive performance insights for any agent by navigating to Analytics in the agent sidebar.
Date Range Selector - Filter metrics by Last 7 days, Last 30 days, or Last 3 months to track performance trends.
Key Performance Metrics - Six primary indicators displayed at the top of the dashboard:
  • Tasks completed
  • Tasks failed
  • Tasks approval rate (for HITL workflows)
  • Average runtime per task
  • Total runtime across all tasks
  • Completion rate percentage
Evaluation Metrics - Visual gauges showing:
  • Completion rate (percentage of tasks finishing successfully)
  • Average evaluation score (mean accuracy across all evaluated nodes)
  • Feedback score (positive vs negative user ratings)

Key Metrics Explained

Tasks Completed

What it measures: Total number of tasks that executed successfully and reached completion without errors.
Dashboard display: Numeric count with percentage change from the prior period (e.g., “+118.60% from prior period”); the calculation is sketched after the list below.
What to monitor:
  • Steady growth indicates healthy agent adoption
  • Sudden drops may signal workflow issues or reduced triggering
  • Compare against tasks failed to calculate success rate
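The period-over-period figures shown next to each count follow the standard percent-change formula. A minimal sketch, using illustrative task counts rather than real dashboard data:

```python
def percent_change(current: float, prior: float) -> float | None:
    """Percent change from the prior period to the current period.

    Returns None when the prior period had no activity, since the
    change is undefined in that case.
    """
    if prior == 0:
        return None
    return (current - prior) / prior * 100

# Illustrative counts: 188 tasks this period vs 86 in the prior period
print(f"{percent_change(188, 86):+.2f}% from prior period")  # +118.60% from prior period
```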

Tasks Failed

What it measures: Total number of tasks that encountered errors and did not complete successfully.
Dashboard display: Numeric count with percentage change from the prior period (e.g., “-100.00% from prior period” when there were zero failures).
What to monitor:
  • Target: 0 failures or <5% failure rate
  • Investigate any non-zero values immediately
  • Use Debug Tools to diagnose failures
Common failure causes:
  • Integration authentication errors
  • Missing required input data
  • Timeout errors on complex workflows
  • API rate limiting

Tasks Approval Rate

What it measures: Percentage of tasks requiring human approval that were approved rather than rejected in HITL workflows.
Dashboard display: Percentage with change from the prior period (e.g., “0% from prior period”); a minimal calculation is sketched below.
What to monitor:
  • High rejection rates (>20%) indicate agent output quality issues
  • Use rejected task feedback to improve prompts
  • Consider adding Evaluation Criteria to catch issues before human review
When this appears:
  • Only visible for agents with Automation Modes configured for human-in-the-loop (HITL)
  • Shows 0% if no approval checkpoints configured
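How the approval rate relates to the 20% rejection threshold above can be illustrated with a short sketch; the counts and the threshold check are assumptions for illustration, not platform behavior:

```python
def approval_rate(approved: int, rejected: int) -> float:
    """Percentage of HITL-reviewed tasks that were approved."""
    reviewed = approved + rejected
    if reviewed == 0:
        return 0.0  # mirrors the dashboard showing 0% with no approval checkpoints
    return approved / reviewed * 100

rate = approval_rate(approved=45, rejected=12)  # illustrative counts
if 100 - rate > 20:  # rejection rate above 20% signals output quality issues
    print(f"Approval rate {rate:.1f}%: review rejected-task feedback and refine prompts")
```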

Average Runtime

What it measures: Mean execution time per task from trigger to completion.
Dashboard display: Duration in minutes and seconds (e.g., “4m”) with percentage change from the prior period; see the runtime sketch after the list below.
What to monitor:
  • Baseline your typical runtime for the agent’s workflow complexity
  • Sudden increases may indicate:
    • Integration slowdowns
    • Increased prompt complexity
    • Model performance degradation
    • Network latency issues
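For reference, average and total runtime relate to per-task durations as in the minimal sketch below; the durations are illustrative and assumed to be measured in seconds from trigger to completion:

```python
from datetime import timedelta

# Illustrative per-task durations in seconds (trigger to completion)
task_durations = [238, 241, 252, 230, 245]

total_runtime = sum(task_durations)
average_runtime = total_runtime / len(task_durations)

print(f"Average runtime: {timedelta(seconds=round(average_runtime))}")  # 0:04:01
print(f"Total runtime:   {timedelta(seconds=total_runtime)}")           # 0:20:06
```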
Optimization strategies:
  • Review slow nodes using execution logs
  • Simplify prompts where possible
  • Use faster LLM models for non-critical steps
  • Implement parallel execution for independent tasks

Total Runtime

What it measures: Cumulative execution time across all completed tasks in the selected date range.
Dashboard display: Duration in hours and minutes (e.g., “21h 17m”) with percentage change from the prior period.
What this indicates:
  • Overall agent workload and resource consumption
  • High values with high task counts = good adoption
  • High values with low task counts = workflow inefficiency

Completion Rate

What it measures: Percentage of tasks that finished successfully out of total tasks attempted.
Dashboard display: Large circular gauge showing a percentage (e.g., “98.95%”).
Target benchmarks:
  • 95-100%: Excellent - Agent highly reliable
  • 90-94%: Good - Minor optimization opportunities
  • 85-89%: Acceptable - Investigate frequent failure patterns
  • <85%: Needs attention - Significant reliability issues
Calculation: (Tasks Completed / (Tasks Completed + Tasks Failed)) × 100; see the sketch after the list below.
How to improve:
  • Identify and fix common failure patterns using Debug Tools
  • Add error handling and retry logic to workflow nodes
  • Validate integrations are properly authenticated
  • Use Test Datasets to catch issues before production
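A minimal sketch of the calculation above, with illustrative counts:

```python
def completion_rate(completed: int, failed: int) -> float:
    """Completion rate = Tasks Completed / (Tasks Completed + Tasks Failed) × 100."""
    attempted = completed + failed
    if attempted == 0:
        return 0.0
    return completed / attempted * 100

print(f"{completion_rate(completed=188, failed=2):.2f}%")  # 98.95%
```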

Average Evaluation Score

What it measures: Mean accuracy percentage across all nodes with evaluation criteria configured; a sketch of the aggregation appears at the end of this section.
Dashboard display: Large circular gauge showing a percentage (e.g., “98.41%”).
Target benchmarks:
  • 95-100%: Excellent - Evaluation criteria well-calibrated
  • 90-94%: Good - Minor prompt optimization opportunities
  • 85-89%: Acceptable - Review criteria strictness and prompt quality
  • <85%: Needs improvement - Systematic quality issues
What this indicates:
  • How well agent outputs match defined quality standards
  • Effectiveness of evaluation criteria configuration
  • Need for prompt optimization
How to improve:
  • Use Optimize Outputs to automatically improve underperforming nodes
  • Review and refine evaluation criteria for balance between strictness and practicality
  • Enable auto-run on low-scoring nodes for self-healing
  • Analyze failed evaluations to identify patterns
When this appears:
  • Only shows data for agents with Evaluation Framework criteria configured
  • Empty if no evaluation criteria defined on any workflow nodes
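When criteria are defined, the gauge aggregates node-level results. A rough sketch, assuming an unweighted mean and hypothetical node names (the platform may aggregate differently):

```python
# Hypothetical per-node accuracy from evaluation runs, as percentages
node_scores = {
    "extract_invoice_fields": 99.1,
    "classify_vendor": 97.4,
    "draft_summary": 98.7,
}

# Unweighted mean across all nodes that have evaluation criteria configured
average_evaluation_score = sum(node_scores.values()) / len(node_scores)
print(f"Average evaluation score: {average_evaluation_score:.2f}%")  # 98.40%
```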

Feedback Score

What it measures: User satisfaction with agent outputs based on thumbs up/down ratings.
Dashboard display: Large circular gauge showing the percentage positive (e.g., “100% Positive feedback”) with a breakdown of positive (👍) vs negative (👎) counts; see the sketch below.
Target benchmarks:
  • 90-100%: Excellent - Users highly satisfied with outputs
  • 80-89%: Good - Minor quality improvements needed
  • 70-79%: Acceptable - Address common user complaints
  • <70%: Needs attention - Systematic output quality issues
How users provide feedback:
  • Thumbs up/down buttons on task execution results
  • Feedback captured per task or per workflow step
  • Comments can accompany ratings for qualitative insights
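A minimal sketch of how thumbs up/down counts roll up into the positive-feedback percentage; the counts are illustrative:

```python
def feedback_score(thumbs_up: int, thumbs_down: int) -> float | None:
    """Percentage of ratings that were positive; None if no feedback has been given yet."""
    total = thumbs_up + thumbs_down
    if total == 0:
        return None
    return thumbs_up / total * 100

score = feedback_score(thumbs_up=37, thumbs_down=0)  # illustrative counts
print(f"{score:.0f}% Positive feedback")  # 100% Positive feedback
```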
How to improve:
  • Review negative feedback comments to identify common issues
  • Use feedback to refine prompts and evaluation criteria
  • Implement feedback-driven optimization via Optimize Outputs
  • Consider if evaluation criteria align with user expectations

Using Analytics for Optimization

Identifying Performance Issues

Low Completion Rate + High Failures:
  • Issue: Workflow reliability problems
  • Action: Use Debug Tools to diagnose common failure patterns
  • Validation: Create Test Datasets covering failure scenarios
Low Evaluation Score + High Completion Rate (see the combined check sketched after this list):
  • Issue: Tasks finish reliably, but outputs fall short of defined quality standards
  • Action: Refine prompts on low-scoring nodes or apply Optimize Outputs
  • Validation: Re-run Test Datasets and confirm evaluation scores improve
Low Feedback Score + High Evaluation Score:
  • Issue: Evaluation criteria don’t match user expectations
  • Action: Review negative feedback comments and adjust evaluation criteria
  • Validation: Incorporate user feedback patterns into evaluation rules
High Average Runtime + Low Task Count:
  • Issue: Workflow inefficiency limiting adoption
  • Action: Identify slow nodes in execution logs and optimize prompts or use faster models
  • Validation: Monitor runtime trends after optimization
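The scenarios above can be expressed as a rough rule-of-thumb check. The thresholds below are assumptions for illustration (not platform defaults), and the ordering simply mirrors the troubleshooting priority described in this section:

```python
def diagnose(completion_rate: float, evaluation_score: float,
             feedback_score: float) -> str:
    """Map metric combinations to the likely issues described above (assumed thresholds)."""
    if completion_rate < 85:
        return "Workflow reliability problems: diagnose failure patterns with Debug Tools"
    if evaluation_score < 85:
        return "Tasks finish but outputs miss quality standards: refine prompts or run Optimize Outputs"
    if feedback_score < 80:
        return "Evaluation criteria may not match user expectations: review negative feedback"
    return "Metrics look healthy: keep monitoring weekly trends"

print(diagnose(completion_rate=98.9, evaluation_score=98.4, feedback_score=72.0))
```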
After Prompt Optimization:
  1. Note baseline evaluation score and completion rate
  2. Apply optimization via Optimize Outputs
  3. Monitor analytics for 7-14 days
  4. Expect 10-40% improvement in evaluation scores
  5. Document successful optimization patterns
After Adding Evaluation Criteria:
  1. Baseline period shows no evaluation score
  2. After criteria deployment, evaluation score appears
  3. Initial scores typically 70-85% as criteria are calibrated
  4. Use auto-run to self-heal low scores
  5. Scores stabilize at 90-95% after 2-4 weeks
After HITL Implementation:
  1. Approval rate metric appears
  2. Initial rejection rates often 15-30% as agents learn
  3. Use rejection feedback to refine prompts
  4. Target 5-10% rejection rate for mature agents
  5. High approval rates indicate agents ready for full automation

Best Practices

Daily (for new agents):
  • Check completion rate and failure count
  • Review any failed tasks immediately
  • Monitor evaluation scores for instability
Weekly (for stable agents):
  • Review all key metrics for trends
  • Compare current week vs prior week performance
  • Investigate any metric degradation >10%
  • Celebrate improvements with stakeholders
Monthly (for mature agents):
  • Analyze trends across 30-day and 3-month views
  • Identify seasonal patterns or usage changes
  • Plan optimization initiatives based on data
  • Review and update evaluation criteria if needed
New Agent Baseline (First 30 Days):
  • Completion rate: 85-90% acceptable as agent stabilizes
  • Evaluation score: 75-85% during calibration
  • Feedback score: 80-90% as users learn agent capabilities
  • Average runtime: Establish typical duration for workflow complexity
Mature Agent Targets (After 30 Days):
  • Completion rate: 95%+
  • Evaluation score: 90%+
  • Feedback score: 90%+
  • Average runtime: Within 10% of baseline
Document Baselines:
  • Record initial metrics when agent goes live
  • Note any major workflow changes affecting comparability
  • Use baselines to calculate ROI and improvement percentages
Sudden Drops (>20% decrease overnight):
  • Likely causes: Integration outage, authentication failure, upstream system change
  • Action: Check recent workflow changes, verify integrations, review execution logs
  • Urgency: High - investigate within 1 hour
Gradual Decline (10-20% decrease over 1-2 weeks):
  • Likely causes: Data drift, prompt degradation, evaluation criteria misalignment
  • Action: Analyze recent task executions, run test datasets, consider re-optimization
  • Urgency: Medium - investigate within 1 day
Unexpected Spike (task count increases >50%):
  • Likely causes: New trigger source, increased adoption, duplicate triggering
  • Action: Verify expected behavior, check for duplicate task creation, validate trigger configuration
  • Urgency: Medium - investigate within 1 day
Metric Stagnation (no change for 2+ weeks):
  • Likely causes: Stable agent performance OR lack of usage
  • Action: Verify task triggering is occurring, check if usage patterns changed
  • Urgency: Low - review during weekly check-in
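These thresholds can be folded into a simple period-over-period check. A minimal sketch, assuming you compare equivalent windows (for example, this week against last week) for a single metric:

```python
def classify_trend(current: float, prior: float, weeks_flat: int = 0) -> str:
    """Classify a metric change using the thresholds described above."""
    if prior == 0:
        return "No prior data: establish a baseline first"
    change = (current - prior) / prior * 100
    if change <= -20:
        return "Sudden drop (>20%): investigate within 1 hour"
    if change <= -10:
        return "Gradual decline (10-20%): investigate within 1 day"
    if change >= 50:
        return "Unexpected spike (>50%): verify trigger behavior within 1 day"
    if abs(change) < 1 and weeks_flat >= 2:
        return "Stagnation (2+ weeks flat): confirm tasks are still triggering"
    return "Within normal variation"

print(classify_trend(current=150, prior=210))  # Sudden drop (>20%): investigate within 1 hour
```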
Benchmarking Similar Agents:
  • Compare completion rates for agents handling similar complexity
  • Identify highest-performing agents and analyze their prompts/configuration
  • Use top performers as templates for new agents
Workflow Complexity Tiers:
  • Simple (1-3 nodes): Target 98%+ completion, 95%+ evaluation
  • Medium (4-8 nodes): Target 95%+ completion, 90%+ evaluation
  • Complex (9+ nodes): Target 92%+ completion, 88%+ evaluation
Industry Standards:
  • Invoice processing: 95%+ completion, 95%+ evaluation
  • Email triage: 97%+ completion, 90%+ evaluation
  • Data extraction: 90%+ completion, 93%+ evaluation
  • Customer inquiry: 93%+ completion, 88%+ evaluation

Next Steps

Evaluation Framework

Set up evaluation criteria to measure and track output quality

Optimize Outputs

Use AI to improve agent accuracy when evaluation scores are low

Task Executions

Drill into individual task details and execution logs

Debug Tools

Diagnose and resolve failures affecting completion rate