Understanding Task Reruns
Every completed task can be re-executed to verify behavior, test changes, or debug issues. Beam provides multiple rerun strategies:
- Full Task Rerun - Re-execute the entire workflow from the start with the original trigger data
- Step-Level Rerun - Restart from a specific node for targeted debugging
- Auto-Rerun - Automatically retry steps that fail evaluation criteria
- Batch Rerun - Re-process multiple tasks to backtest prompt or workflow improvements
Manual Task Rerun
Re-execute any completed or failed task to test changes or debug issues.
- Navigate to task execution details in Tasks page
- Scroll to bottom of execution timeline
- Click “Re-run task” button below workflow steps
- Workflow re-executes with identical trigger input (task_query)
- All file attachments from original task preserved
- New execution creates separate task record
- Original task remains unchanged for comparison
When to Use Full Rerun
Testing Workflow Changes:
- Modified node configurations or prompts
- Updated evaluation criteria
- Changed tool selections
- Added or removed nodes
Recovering from Failures:
- Task failed due to transient error (API timeout, network issue)
- Integration temporarily unavailable
- Want to verify fix worked
Demonstrating Improvements:
- Show before/after results to stakeholders
- Validate optimization impact
- Compare agent performance over time
Rerun Behavior
Preserved Elements:
- Original trigger input data (task_query)
- File attachments uploaded with task
- Variable configurations
Updated Elements:
- New timestamps and task ID
- Current workflow configuration (reflects any edits made)
- Latest tool versions and integrations
- Updated evaluation criteria
Step-Level Rerun
Restart the workflow from a specific node instead of from the beginning, which is useful for debugging failed steps.
- Click on any workflow step in execution timeline
- Locate “Re-run” button in step detail panel
- Click to re-execute from this node forward
Debugging Failed Steps:
- Step failed validation or returned error
- Made changes to node configuration
- Want to test fix without re-running earlier steps
Testing Configuration Changes:
- Updated prompt for specific node
- Changed tool selection
- Modified evaluation criteria for this step
Validating Prompt Optimization:
- Used “Optimise your prompt” feature
- Want to compare improved vs original prompt
- Validate AI-suggested improvements
Step Rerun Execution Flow
What Gets Preserved:
- All outputs from steps before the rerun point
- Original trigger data (task_query)
- File attachments
What Gets Re-executed:
- Selected step and all subsequent nodes
- Branch decisions after rerun point
- Evaluation criteria for re-executed steps
Example: rerun from step 4 of a 6-step workflow (modeled in the sketch below):
- Steps 1-3: Use outputs from original execution
- Steps 4-6: Re-execute with updated configuration
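For intuition, a step-level rerun behaves as if the outputs of earlier steps were cached and only the remaining steps execute again. The sketch below is a toy model under that assumption; the step functions and caching are illustrative and do not reflect Beam's internal execution engine.

```python
# Toy model of a step-level rerun (illustrative; not Beam's execution engine).
def rerun_from(step_fns, cached_outputs, rerun_index, trigger_data):
    """Reuse cached outputs for steps before rerun_index; re-execute the rest."""
    outputs = list(cached_outputs[:rerun_index])        # earlier steps: preserved outputs
    context = outputs[-1] if outputs else trigger_data  # original trigger data feeds step 1
    for step in step_fns[rerun_index:]:                 # selected step and all later steps re-run
        context = step(context)
        outputs.append(context)
    return outputs

# Rerun from step 4 of a 6-step workflow: steps 1-3 come from the original
# execution, steps 4-6 execute again with the current configuration.
steps = [lambda ctx, i=i: f"{ctx} -> step{i}" for i in range(1, 7)]
original_outputs = ["q -> step1", "q -> step1 -> step2", "q -> step1 -> step2 -> step3"]
print(rerun_from(steps, original_outputs, rerun_index=3, trigger_data="q"))
```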
Prompt Optimization Workflow
After clicking “Optimise your prompt”:
AI Analysis:
- Reviews failed task execution
- Analyzes evaluation criteria not met
- Identifies prompt weaknesses
- Suggests specific improvements
Optimization Rerun:
- Applies AI-suggested prompt changes
- Automatically reruns step with new prompt
- Compares results before/after
- Shows improvement in evaluation scores
Auto-Rerun Configuration
Automatically retry steps that don’t meet evaluation thresholds without manual intervention.
- Open workflow in Flow builder
- Click on node to configure
- Scroll to “Auto-run” toggle in right panel
- Enable the toggle

How Auto-Rerun Works
Evaluation-Based Triggering:
- Node executes and generates output
- Evaluation criteria assess accuracy
- If score below threshold → Auto-rerun triggered
- Step re-executes with same input
- Repeat until passing score or max retries reached
Example (90% threshold, sketched in code below):
- Evaluation threshold: 90%
- First execution: 75% (fails)
- Auto-rerun 1: 85% (fails)
- Auto-rerun 2: 92% (passes)
- Workflow continues with passing output
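The retry loop can be pictured roughly as follows. This is a minimal sketch, not Beam's implementation: `run_step` and `evaluate_output` are hypothetical stand-ins for the node execution and its evaluation criteria, and the threshold and retry cap mirror the example above.

```python
# Minimal sketch of the auto-rerun loop; run_step and evaluate_output are
# hypothetical stand-ins for the node execution and its evaluation criteria.
MAX_RETRIES = 2        # e.g. a 2-3 retry cap, as recommended below
PASS_THRESHOLD = 0.90  # e.g. a 90% evaluation threshold

def run_with_auto_rerun(step_input, run_step, evaluate_output):
    """Re-execute a step with the same input until it passes or retries run out."""
    output = run_step(step_input)
    score = evaluate_output(output)
    retries = 0
    while score < PASS_THRESHOLD and retries < MAX_RETRIES:
        retries += 1
        output = run_step(step_input)   # same input, fresh non-deterministic run
        score = evaluate_output(output)
    return output, score, retries

# Reproducing the example above: scores 75% -> 85% -> 92%.
scores = iter([0.75, 0.85, 0.92])
_, final_score, retries = run_with_auto_rerun(
    {"task_query": "example input"},
    run_step=lambda inp: inp,
    evaluate_output=lambda out: next(scores),
)
print(f"passed with {final_score:.0%} after {retries} auto-reruns")  # 92% after 2
```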
Auto-Rerun Best Practices
When to Enable:
- Steps with non-deterministic outputs (GPT-based extraction)
- Classification tasks requiring high confidence
- Data extraction from inconsistent formats
- Steps where retry often improves results
When Not to Enable:
- Deterministic operations (API calls with fixed responses)
- Steps failing due to missing data (retries won’t help)
- Integration errors requiring manual fix
- Final output steps (may need human review instead)
Configuration Guidelines:
- Max 2-3 retries (more rarely helps)
- Clear evaluation criteria (specific, measurable)
- Monitor retry frequency (high retries indicate prompt issues)
Auto-Rerun vs Manual Rerun
Auto-Rerun:
- Happens during task execution automatically
- Triggered by evaluation scores
- No human intervention required
- Limited to configured max retries
- Single step only, not full workflow
Manual Rerun:
- Initiated by user after task completes
- Can rerun full task or from specific step
- Unlimited reruns available
- Useful for testing changes made after execution
- Demonstrates improvements to stakeholders
Workflow Context for Reruns
Auto-rerun configuration appears in flow builder alongside evaluation criteria.
- Left: Visual workflow with nodes and branches
- Right: Node configuration panel showing:
  - Evaluation criteria (Criteria 8, Criteria 9)
  - “Add criteria” and “Re-generate criteria” buttons
  - Auto-run toggle and settings
  - Settings dropdown for advanced options
- Tool used displayed in node (e.g., “PO Database Lookup Tool”)
- Accuracy percentage shown (e.g., “92.59%”)
- Branch paths labeled (e.g., “PO Not Found Handling”, “PO Found Proceed”)
Backtesting Prompt Changes
Re-execute multiple tasks to validate prompt improvements across a representative data set.
Backtesting Workflow:
Save Representative Tasks
Identify 10-20 tasks representing common scenarios, edge cases, and failure patterns. Mark or note task IDs for batch rerun.
Modify Prompt or Configuration
Update node prompts, evaluation criteria, or tool configurations based on identified improvements.
Rerun Saved Tasks
Execute rerun on each saved task individually. Beam creates new execution records for comparison.
Compare Results
Review evaluation scores before/after changes. Calculate the improvement rate: tasks that now pass vs. previously failed (a minimal sketch follows).
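As a rough sketch of that improvement-rate calculation (the record format is hypothetical; you would fill it in from your own before/after runs, not from any Beam export):

```python
# Hypothetical backtest records: pass/fail per task before and after the change.
backtest = [
    {"task_id": "task-001", "passed_before": False, "passed_after": True},
    {"task_id": "task-002", "passed_before": True,  "passed_after": True},
    {"task_id": "task-003", "passed_before": False, "passed_after": False},
    {"task_id": "task-004", "passed_before": False, "passed_after": True},
]

previously_failed = [t for t in backtest if not t["passed_before"]]
now_fixed = [t for t in previously_failed if t["passed_after"]]

improvement_rate = len(now_fixed) / len(previously_failed)
print(f"{len(now_fixed)}/{len(previously_failed)} previously failing tasks now pass "
      f"({improvement_rate:.0%})")
```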
Selecting Backtest Tasks
Criteria for Good Backtest Set:
- Variety: Cover all workflow branches and scenarios
- Failures: Include tasks that previously failed
- Edge Cases: Unusual data formats or inputs
- Success Cases: Verify changes don’t break working scenarios
- Recent Data: Reflects current data patterns
Recommended Set Size:
- Minimum: 10 tasks for basic validation
- Optimal: 20-30 tasks for comprehensive testing
- Large Changes: 50+ tasks for major overhauls
Measuring Improvement
Key Metrics (computed in the sketch after this list):
Accuracy Improvement:
- Before: Average evaluation score across backtest set
- After: Average evaluation score after prompt changes
- Target: 10-20% improvement in scores
Failure Rate Reduction:
- Before: Number of tasks failing evaluation
- After: Number of tasks failing after changes
- Target: 50%+ reduction in failures
Consistency:
- Standard deviation of evaluation scores
- Lower = more consistent performance
- Target: Reduced variance in results
Regression Check:
- Previously passing tasks still pass
- No new failures introduced
- Target: Zero regression on working cases
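All four metrics can be computed directly from the before/after evaluation scores of your backtest set. The sketch below assumes you have collected those scores yourself; the score lists and the 90 threshold are illustrative values, not Beam data.

```python
from statistics import mean, stdev

# Hypothetical before/after evaluation scores (0-100) for a 10-task backtest set.
before = [78, 92, 65, 88, 70, 95, 60, 85, 74, 90]
after  = [91, 93, 82, 90, 88, 96, 79, 92, 87, 94]
threshold = 90  # evaluation pass threshold

# Accuracy improvement: change in the average evaluation score.
accuracy_gain = mean(after) - mean(before)

# Failure rate reduction: tasks below the threshold before vs after.
fails_before = sum(s < threshold for s in before)
fails_after = sum(s < threshold for s in after)

# Consistency: lower standard deviation means more stable performance.
spread_before, spread_after = stdev(before), stdev(after)

# Regression check: no task that passed before should fail now.
regressions = sum(b >= threshold and a < threshold for b, a in zip(before, after))

print(f"Avg score: {mean(before):.1f} -> {mean(after):.1f} (+{accuracy_gain:.1f})")
print(f"Failures: {fails_before} -> {fails_after}")
print(f"Std dev: {spread_before:.1f} -> {spread_after:.1f}")
print(f"Regressions: {regressions}")
```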
Common Backtest Scenarios
Prompt Optimization:
- Tested new extraction prompts on 15 invoices
- Accuracy improved from 78% to 93%
- Reduced “amount” field extraction errors by 60%
Classification Tuning:
- Adjusted confidence thresholds
- Retested on 25 classification tasks
- Improved precision without sacrificing recall
Integration Configuration:
- Modified API parameters for data lookup
- Reran 20 validation workflows
- Reduced timeout errors from 15% to 2%
Best Practices
Maintain Rerun Test Sets
Create Task Libraries:
- Save 10-20 representative tasks per agent
- Cover all workflow branches
- Include both successes and failures
- Update quarterly with new patterns
Organize the Library (a record schema is sketched after this list):
- Label tasks by scenario type
- Note which branch/node they test
- Document expected outcomes
- Track when last used for backtesting
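One possible record schema for such a library, sketched as a dataclass. All field names are illustrative assumptions (this is not a Beam schema); the branch labels are borrowed from the example workflow above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class BacktestTask:
    """One saved task in an agent's rerun test library (illustrative schema)."""
    task_id: str                       # Beam task ID to rerun
    scenario: str                      # scenario type label
    branch_tested: str                 # which workflow branch/node the task exercises
    expected_outcome: str              # what a passing run should produce
    originally_passed: bool            # success or failure case
    last_backtested: Optional[date] = None

library = [
    BacktestTask("task-101", "standard purchase order", "PO Found Proceed",
                 "PO matched and workflow proceeds", originally_passed=True),
    BacktestTask("task-214", "missing PO number", "PO Not Found Handling",
                 "escalated for handling", originally_passed=False),
]
print(f"{len(library)} saved tasks, "
      f"{sum(not t.originally_passed for t in library)} failure cases")
```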
Compare Before/After Results
Systematic Comparison:
- Keep original execution visible
- Note evaluation score changes
- Review output quality differences
- Document unexpected behavior
What to Compare:
- Execution time (faster/slower?)
- Evaluation scores (improved/degraded?)
- Branch selections (changed logic?)
- Tool errors (more/fewer issues?)
Use Step Reruns for Efficiency
When to Use:
- Early steps succeeded, later step failed
- Testing changes to specific node
- Debugging isolated step issues
- Validating prompt optimization
Benefits:
- Faster than full workflow rerun
- Preserves earlier step outputs
- Saves API calls and execution time
- Focuses testing on changed components
Monitor Auto-Rerun Frequency
Warning Signs (quantified in the sketch after these lists):
- Step frequently uses all 3 retries
- Auto-reruns happen on >30% of tasks
- Retries rarely improve scores
- Execution time significantly increased
What to Do:
- Review and improve evaluation criteria
- Optimize prompts causing frequent retries
- Consider if data quality is issue
- Disable auto-rerun if not helping
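To quantify the warning signs, tally retries from whatever execution records you keep. The record shape and the 10% / 0.05 cut-offs below are illustrative assumptions; the 30% rerun-rate threshold comes from the list above.

```python
# Hypothetical per-execution records for one node: auto-rerun attempts used,
# plus the first and final evaluation scores.
executions = [
    {"retries": 0, "first_score": 0.94, "final_score": 0.94},
    {"retries": 2, "first_score": 0.72, "final_score": 0.91},
    {"retries": 3, "first_score": 0.70, "final_score": 0.74},
    {"retries": 1, "first_score": 0.84, "final_score": 0.92},
]

total = len(executions)
rerun_rate = sum(e["retries"] > 0 for e in executions) / total  # share of tasks that auto-rerun
exhausted = sum(e["retries"] >= 3 for e in executions) / total  # share hitting the retry cap
avg_gain = sum(e["final_score"] - e["first_score"] for e in executions) / total

print(f"rerun rate {rerun_rate:.0%}, retries exhausted {exhausted:.0%}, "
      f"avg score gain {avg_gain:+.2f}")
if rerun_rate > 0.30 or exhausted > 0.10 or avg_gain < 0.05:
    print("Warning: auto-rerun may be masking a prompt or data-quality issue")
```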
Document Rerun Results
What to Track:
- Which tasks were rerun and why
- Changes made before rerun
- Before/after evaluation scores
- Whether change solved the issue
Why It Matters:
- Proves ROI of optimization work
- Identifies patterns in failures
- Guides future improvements
- Demonstrates value to stakeholders
Next Steps
Task Executions
Monitor task execution results before rerunning
Evaluation Framework
Configure the evaluation criteria that trigger auto-reruns
Optimize Outputs
Use AI-powered prompt optimization before rerunning
Debug Tools
Leverage debugging features alongside reruns