
Understanding the Analytics Dashboard
Access comprehensive performance insights for any agent by navigating to Analytics in the agent sidebar.

Date Range Selector - Filter metrics by Last 7 days, Last 30 days, or Last 3 months to track performance trends.

Key Performance Metrics - The primary indicators displayed at the top of the dashboard:
- Tasks completed
- Tasks failed
- Tasks approval rate (for HITL workflows)
- Average runtime per task
- Total runtime across all tasks
- Completion rate (percentage of tasks finishing successfully)
- Average evaluation score (mean accuracy across all evaluated nodes)
- Feedback score (positive vs. negative user ratings)
Key Metrics Explained
Tasks Completed
What it measures: Total number of tasks that executed successfully and reached completion without errors.

Dashboard display: Numeric count with percentage change from the prior period (e.g., “+118.60% from prior period”).

What to monitor:
- Steady growth indicates healthy agent adoption
- Sudden drops may signal workflow issues or reduced triggering
- Compare against tasks failed to calculate success rate

Related: Task Executions - View individual task details and execution logs
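The “change from prior period” figures follow standard period-over-period arithmetic. A minimal sketch in Python (the function name is illustrative, not part of the product; the zero-prior convention is an assumption):

```python
def pct_change(current: float, prior: float) -> str:
    """Format a period-over-period change the way the dashboard renders it, e.g. '+118.60%'."""
    if prior == 0:
        # Assumption: with no prior-period activity, show a full increase (or no change).
        return "+100.00%" if current > 0 else "0.00%"
    return f"{(current - prior) / prior * 100:+.2f}%"

# 94 tasks completed this period vs. 43 in the prior period
print(pct_change(94, 43))  # +118.60%
```

The same arithmetic explains why a period with zero failures shows “-100.00% from prior period”.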
Tasks Failed
What it measures: Total number of tasks that encountered errors and did not complete successfully.

Dashboard display: Numeric count with percentage change from the prior period (e.g., “-100.00% from prior period” when there are zero failures).

What to monitor:
- Target: 0 failures or <5% failure rate
- Investigate any non-zero values immediately
- Use Debug Tools to diagnose failures

Common failure causes:
- Integration authentication errors
- Missing required input data
- Timeout errors on complex workflows
- API rate limiting

Related:
- Debug Tools - Diagnose and resolve execution errors
- Rerunning Tasks - Retry failed tasks after fixes
Tasks Approval Rate
What it measures: Percentage of tasks requiring human approval that were approved vs. rejected in HITL workflows.

Dashboard display: Percentage with change from the prior period (e.g., “0% from prior period”).

What to monitor:
- High rejection rates (>20%) indicate agent output quality issues
- Use rejected task feedback to improve prompts
- Consider adding Evaluation Criteria to catch issues before human review

Notes:
- Only visible for agents with Automation Modes configured for human-in-the-loop (HITL)
- Shows 0% if no approval checkpoints are configured

Related: Automation Modes - Configure HITL approval checkpoints
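The approval rate and the >20% rejection threshold are simple to compute from approved/rejected counts. A hedged sketch (names are illustrative, not the product’s API):

```python
def approval_rate(approved: int, rejected: int) -> float:
    """Percentage of human-reviewed tasks that were approved; 0% when nothing was reviewed."""
    reviewed = approved + rejected
    return approved / reviewed * 100 if reviewed else 0.0  # dashboard shows 0% with no checkpoints

rate = approval_rate(approved=45, rejected=5)
print(f"{rate:.0f}% approved")                            # 90% approved
print("output quality review needed:", 100 - rate > 20)   # rejection rate above the 20% line?
```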
Average Runtime
What it measures: Mean execution time per task, from trigger to completion.

Dashboard display: Duration in minutes and seconds (e.g., “4m”) with percentage change from the prior period.

What to monitor:
- Baseline your typical runtime for the agent’s workflow complexity
- Sudden increases may indicate:
  - Integration slowdowns
  - Increased prompt complexity
  - Model performance degradation
  - Network latency issues

How to improve:
- Review slow nodes using execution logs
- Simplify prompts where possible
- Use faster LLM models for non-critical steps
- Implement parallel execution for independent tasks
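One way to catch a sudden runtime increase is to compare the current average against the baseline you established. A minimal sketch (the 25% cutoff is an illustrative choice, not a product default):

```python
def runtime_regressed(current_avg_s: float, baseline_avg_s: float,
                      threshold: float = 0.25) -> bool:
    """True when the current average runtime exceeds the baseline by more than `threshold`."""
    return current_avg_s > baseline_avg_s * (1 + threshold)

print(runtime_regressed(330, 240))  # True: 5m30s vs. a 4m baseline is a 37.5% jump
print(runtime_regressed(250, 240))  # False: within normal variation
```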
Total Runtime
What it measures: Cumulative execution time across all completed tasks in the selected date range.

Dashboard display: Duration in hours and minutes (e.g., “21h 17m”) with percentage change.

What this indicates:
- Overall agent workload and resource consumption
- High values with high task counts = good adoption
- High values with low task counts = workflow inefficiency
Completion Rate
What it measures: Percentage of tasks that finished successfully out of total tasks attempted.

Dashboard display: Large circular gauge showing a percentage (e.g., “98.95%”).

Target benchmarks:
- 95-100%: Excellent - Agent highly reliable
- 90-94%: Good - Minor optimization opportunities
- 85-89%: Acceptable - Investigate frequent failure patterns
- <85%: Needs attention - Significant reliability issues

Formula:
(Tasks Completed / (Tasks Completed + Tasks Failed)) × 100
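The formula and the benchmark bands above can be sketched together (function names are illustrative):

```python
def completion_rate(completed: int, failed: int) -> float:
    """(Tasks Completed / (Tasks Completed + Tasks Failed)) × 100."""
    total = completed + failed
    return completed / total * 100 if total else 0.0

def benchmark(rate: float) -> str:
    """Map a completion rate to the target bands listed above."""
    if rate >= 95:
        return "Excellent"
    if rate >= 90:
        return "Good"
    if rate >= 85:
        return "Acceptable"
    return "Needs attention"

rate = completion_rate(completed=94, failed=1)
print(f"{rate:.2f}% -> {benchmark(rate)}")  # 98.95% -> Excellent
```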
How to improve:
- Identify and fix common failure patterns using Debug Tools
- Add error handling and retry logic to workflow nodes
- Validate integrations are properly authenticated
- Use Test Datasets to catch issues before production
Related:
- Debug Tools - Systematic error diagnosis
- Rerunning Tasks - Retry and validate fixes
Average Evaluation Score
What it measures: Mean accuracy percentage across all nodes with evaluation criteria configured.

Dashboard display: Large circular gauge showing a percentage (e.g., “98.41%”).

Target benchmarks:
- 95-100%: Excellent - Evaluation criteria well-calibrated
- 90-94%: Good - Minor prompt optimization opportunities
- 85-89%: Acceptable - Review criteria strictness and prompt quality
- <85%: Needs improvement - Systematic quality issues

What this indicates:
- How well agent outputs match defined quality standards
- Effectiveness of the evaluation criteria configuration
- Need for prompt optimization

How to improve:
- Use Optimize Outputs to automatically improve underperforming nodes
- Review and refine evaluation criteria to balance strictness and practicality
- Enable auto-run on low-scoring nodes for self-healing
- Analyze failed evaluations to identify patterns

Notes:
- Only shows data for agents with Evaluation Framework criteria configured
- Empty if no evaluation criteria are defined on any workflow nodes

Related:
- Evaluation Framework - Configure validation criteria and auto-run
- Optimize Outputs - AI-powered prompt optimization
Feedback Score
What it measures: User satisfaction with agent outputs, based on thumbs up/down ratings.

Dashboard display: Large circular gauge showing the percentage of positive ratings (e.g., “100% Positive feedback”), with a breakdown of positive (👍) vs. negative (👎) counts.

Target benchmarks:
- 90-100%: Excellent - Users highly satisfied with outputs
- 80-89%: Good - Minor quality improvements needed
- 70-79%: Acceptable - Address common user complaints
- <70%: Needs attention - Systematic output quality issues

How feedback is collected:
- Thumbs up/down buttons on task execution results
- Feedback captured per task or per workflow step
- Comments can accompany ratings for qualitative insights

How to act on feedback:
- Review negative feedback comments to identify common issues
- Use feedback to refine prompts and evaluation criteria
- Implement feedback-driven optimization via Optimize Outputs
- Consider whether evaluation criteria align with user expectations

Related: Optimize Outputs - Learn from user feedback to improve prompts
Using Analytics for Optimization
Identifying Performance Issues
Low Completion Rate + High Failures:
- Issue: Workflow reliability problems
- Action: Use Debug Tools to diagnose common failure patterns
- Validation: Create Test Datasets covering failure scenarios

High Completion Rate + Low Evaluation Score:
- Issue: Agent completing tasks, but with poor quality
- Action: Use Optimize Outputs to improve underperforming nodes
- Validation: Review Evaluation Framework criteria for balance

High Evaluation Score + Low Feedback Score:
- Issue: Evaluation criteria don’t match user expectations
- Action: Review negative feedback comments and adjust evaluation criteria
- Validation: Incorporate user feedback patterns into evaluation rules

High Average Runtime:
- Issue: Workflow inefficiency limiting adoption
- Action: Identify slow nodes in execution logs and optimize prompts or use faster models
- Validation: Monitor runtime trends after optimization
Tracking Improvement Trends
After Prompt Optimization:
- Note the baseline evaluation score and completion rate
- Apply the optimization via Optimize Outputs
- Monitor analytics for 7-14 days
- Expect a 10-40% improvement in evaluation scores
- Document successful optimization patterns

After Adding Evaluation Criteria:
- Baseline period shows no evaluation score
- After criteria deployment, the evaluation score appears
- Initial scores are typically 70-85% while criteria are calibrated
- Use auto-run to self-heal low scores
- Scores stabilize at 90-95% after 2-4 weeks

After Enabling HITL Approvals:
- The approval rate metric appears
- Initial rejection rates are often 15-30% as agents learn
- Use rejection feedback to refine prompts
- Target a 5-10% rejection rate for mature agents
- High approval rates indicate agents are ready for full automation
Best Practices
Regular Monitoring Schedule
Daily (for new agents):
- Check completion rate and failure count
- Review any failed tasks immediately
- Monitor evaluation scores for instability
Weekly:
- Review all key metrics for trends
- Compare current week vs. prior week performance
- Investigate any metric degradation >10%
- Celebrate improvements with stakeholders
Monthly:
- Analyze trends across 30-day and 3-month views
- Identify seasonal patterns or usage changes
- Plan optimization initiatives based on data
- Review and update evaluation criteria if needed
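The weekly “investigate any metric degradation >10%” check is easy to automate against exported metrics. A sketch, assuming you track metrics as simple name-to-value mappings (the function name is illustrative):

```python
def degraded_metrics(this_week: dict[str, float], last_week: dict[str, float],
                     threshold: float = 10.0) -> list[str]:
    """Names of metrics that dropped by more than `threshold` percent week over week."""
    flagged = []
    for name, current in this_week.items():
        prior = last_week.get(name)
        if prior and (prior - current) / prior * 100 > threshold:
            flagged.append(name)
    return flagged

print(degraded_metrics(
    {"completion_rate": 86.0, "evaluation_score": 95.0},
    {"completion_rate": 98.0, "evaluation_score": 96.0},
))  # ['completion_rate'] - a ~12% drop crosses the 10% line
```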
Setting Baseline Metrics
New Agent Baseline (First 30 Days):
- Completion rate: 85-90% acceptable as agent stabilizes
- Evaluation score: 75-85% during calibration
- Feedback score: 80-90% as users learn agent capabilities
- Average runtime: Establish typical duration for workflow complexity
Mature Agent Targets:
- Completion rate: 95%+
- Evaluation score: 90%+
- Feedback score: 90%+
- Average runtime: Within 10% of baseline
Documenting baselines:
- Record initial metrics when the agent goes live
- Note any major workflow changes affecting comparability
- Use baselines to calculate ROI and improvement percentages
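Improvement percentages for ROI reporting follow directly from the recorded baseline. A minimal sketch (the figures are illustrative):

```python
def improvement(baseline: float, current: float) -> float:
    """Percentage change relative to the recorded go-live baseline."""
    return (current - baseline) / baseline * 100

# Baseline completion rate recorded at launch vs. the current 30-day value
print(f"completion rate: {improvement(88.0, 96.8):+.1f}%")  # +10.0%
```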
Responding to Metric Changes
Sudden Drops (>20% decrease overnight):
- Likely causes: Integration outage, authentication failure, upstream system change
- Action: Check recent workflow changes, verify integrations, review execution logs
- Urgency: High - investigate within 1 hour
Gradual Declines:
- Likely causes: Data drift, prompt degradation, evaluation criteria misalignment
- Action: Analyze recent task executions, run test datasets, consider re-optimization
- Urgency: Medium - investigate within 1 day
Sudden Increases in Task Volume:
- Likely causes: New trigger source, increased adoption, duplicate triggering
- Action: Verify expected behavior, check for duplicate task creation, validate trigger configuration
- Urgency: Medium - investigate within 1 day
Flat or Unchanged Metrics:
- Likely causes: Stable agent performance OR lack of usage
- Action: Verify task triggering is occurring, check if usage patterns changed
- Urgency: Low - review during weekly check-in
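The urgency tiers above can be expressed as a small triage helper (the thresholds mirror the guidance; the function itself is illustrative):

```python
def triage(drop_pct: float, window_days: int) -> str:
    """Map a metric decline to a response urgency, per the guidance above."""
    if drop_pct > 20 and window_days <= 1:
        return "High - investigate within 1 hour"
    if drop_pct > 0:
        return "Medium - investigate within 1 day"
    return "Low - review during weekly check-in"

print(triage(drop_pct=35, window_days=1))   # High - investigate within 1 hour
print(triage(drop_pct=8, window_days=14))   # Medium - investigate within 1 day
print(triage(drop_pct=0, window_days=7))    # Low - review during weekly check-in
```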
Comparing Across Agents
Benchmarking Similar Agents:
- Compare completion rates for agents handling similar complexity
- Identify highest-performing agents and analyze their prompts/configuration
- Use top performers as templates for new agents
Targets by workflow complexity:
- Simple (1-3 nodes): Target 98%+ completion, 95%+ evaluation
- Medium (4-8 nodes): Target 95%+ completion, 90%+ evaluation
- Complex (9+ nodes): Target 92%+ completion, 88%+ evaluation
Typical targets by use case:
- Invoice processing: 95%+ completion, 95%+ evaluation
- Email triage: 97%+ completion, 90%+ evaluation
- Data extraction: 90%+ completion, 93%+ evaluation
- Customer inquiry: 93%+ completion, 88%+ evaluation
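The complexity tiers can serve as a lookup when setting targets for a new agent. A sketch keyed on node count, with tier boundaries taken from the list above:

```python
def targets_for(node_count: int) -> tuple[float, float]:
    """(completion %, evaluation %) targets by workflow complexity tier."""
    if node_count <= 3:
        return (98.0, 95.0)  # Simple (1-3 nodes)
    if node_count <= 8:
        return (95.0, 90.0)  # Medium (4-8 nodes)
    return (92.0, 88.0)      # Complex (9+ nodes)

print(targets_for(5))  # (95.0, 90.0) - a medium-complexity workflow
```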