Understanding Task Reruns
Every completed task can be re-executed to verify behavior, test changes, or debug issues. Beam provides multiple rerun strategies:
- Full Task Rerun - Re-execute the entire workflow from the start with original trigger data
- Step-Level Rerun - Restart from a specific node for targeted debugging
- Auto-Rerun - Automatically retry steps that fail evaluation criteria
- Batch Rerun - Re-process multiple tasks to backtest prompt or workflow improvements
Manual Task Rerun
Re-execute any completed or failed task to test changes or debug issues.
- Navigate to task execution details in Tasks page
- Scroll to bottom of execution timeline
- Click “Re-run task” button below workflow steps
- Workflow re-executes with identical trigger input (task_query)
- All file attachments from original task preserved
- New execution creates separate task record
- Original task remains unchanged for comparison
When to Use Full Rerun
Testing Workflow Changes:
- Modified node configurations or prompts
- Updated evaluation criteria
- Changed tool selections
- Added or removed nodes
Recovering from Failures:
- Task failed due to transient error (API timeout, network issue)
- Integration temporarily unavailable
- Want to verify fix worked
Demonstrating Improvements:
- Show before/after results to stakeholders
- Validate optimization impact
- Compare agent performance over time
Rerun Behavior
Preserved Elements:
- Original trigger input data (task_query)
- File attachments uploaded with task
- Variable configurations
Updated Elements:
- New timestamps and task ID
- Current workflow configuration (reflects any edits made)
- Latest tool versions and integrations
- Updated evaluation criteria
Step-Level Rerun
Restart the workflow from a specific node instead of from the beginning, which is useful for debugging failed steps.
- Click on any workflow step in execution timeline
- Locate “Re-run” button in step detail panel
- Click to re-execute from this node forward
When to Use Step Rerun:
- Step failed validation or returned error
- Made changes to node configuration
- Want to test fix without re-running earlier steps
- Updated prompt for specific node
- Changed tool selection
- Modified evaluation criteria for this step
After Prompt Optimization:
- Used “Optimise your prompt” feature
- Want to compare improved vs original prompt
- Validate AI-suggested improvements
Step Rerun Execution Flow
What Gets Preserved:
- All outputs from steps before the rerun point
- Original trigger data (task_query)
- File attachments
What Gets Re-executed:
- Selected step and all subsequent nodes
- Branch decisions after rerun point
- Evaluation criteria for re-executed steps
Example (rerun from step 4 of a 6-step workflow):
- Steps 1-3: Use outputs from original execution
- Steps 4-6: Re-execute with updated configuration
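The preserve/re-execute split above can be sketched as a short loop. This is a minimal illustration, not Beam's internal implementation; `Step` and `rerun_from_step` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable  # (trigger_input, prior_outputs) -> output

def rerun_from_step(steps, cached_outputs, rerun_index, trigger_input):
    """Re-execute from `rerun_index` onward, reusing earlier cached outputs."""
    outputs = {}
    for i, step in enumerate(steps):
        if i < rerun_index:
            # Steps before the rerun point: reuse the original execution's output
            outputs[step.name] = cached_outputs[step.name]
        else:
            # Selected step and all subsequent nodes: re-execute
            outputs[step.name] = step.run(trigger_input, outputs)
    return outputs
```

Because later steps receive the cached earlier outputs as input, a rerun from step 4 tests only the changed portion of the workflow.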
Prompt Optimization Workflow
After clicking “Optimise your prompt”:
AI Analysis:
- Reviews failed task execution
- Analyzes evaluation criteria not met
- Identifies prompt weaknesses
- Suggests specific improvements
Apply and Retest:
- Applies AI-suggested prompt changes
- Automatically reruns step with new prompt
- Compares results before/after
- Shows improvement in evaluation scores
Auto-Rerun Configuration
Automatically retry steps that don’t meet evaluation thresholds without manual intervention.
- Open workflow in Flow builder
- Click on node to configure
- Scroll to “Auto-run” toggle in right panel
- Enable the toggle

How Auto-Rerun Works
Evaluation-Based Triggering:
- Node executes and generates output
- Evaluation criteria assess accuracy
- If score below threshold → Auto-rerun triggered
- Step re-executes with same input
- Repeat until passing score or max retries reached
Example:
- Evaluation threshold: 90%
- First execution: 75% (fails)
- Auto-rerun 1: 85% (fails)
- Auto-rerun 2: 92% (passes)
- Workflow continues with passing output
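The evaluation-based retry loop above can be expressed in a few lines. This is a sketch of the behavior, assuming hypothetical `run_step` and `evaluate` callables that stand in for the node execution and its evaluation criteria; neither is part of Beam's actual API.

```python
def auto_rerun(run_step, evaluate, step_input, threshold=0.90, max_retries=3):
    """Re-execute a step until its evaluation score passes or retries run out."""
    scores = []
    output = None
    for attempt in range(max_retries + 1):  # first execution + up to max_retries reruns
        output = run_step(step_input)       # same input on every attempt
        score = evaluate(output)
        scores.append(score)
        if score >= threshold:
            break                           # passing score: workflow continues
    return output, scores
```

With the example scores above (75%, 85%, 92%) the loop stops on the third attempt, and the workflow continues with the passing output.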
Auto-Rerun Best Practices
When to Enable:
- Steps with non-deterministic outputs (GPT-based extraction)
- Classification tasks requiring high confidence
- Data extraction from inconsistent formats
- Steps where retry often improves results
When Not to Enable:
- Deterministic operations (API calls with fixed responses)
- Steps failing due to missing data (retries won’t help)
- Integration errors requiring manual fix
- Final output steps (may need human review instead)
Configuration Tips:
- Max 2-3 retries (more rarely helps)
- Clear evaluation criteria (specific, measurable)
- Monitor retry frequency (high retries indicate prompt issues)
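The tips above might be captured in a settings object like the following. The field names here are illustrative, not Beam's actual configuration schema.

```python
# Hypothetical auto-rerun settings reflecting the tips above
auto_rerun_config = {
    "enabled": True,
    "max_retries": 3,              # 2-3 is usually enough; more rarely helps
    "evaluation_threshold": 0.90,  # a specific, measurable passing score
    "alert_retry_rate": 0.30,      # flag steps that retry on >30% of tasks
}
```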
Auto-Rerun vs Manual Rerun
Auto-Rerun:
- Happens during task execution automatically
- Triggered by evaluation scores
- No human intervention required
- Limited to configured max retries
- Single step only, not full workflow
Manual Rerun:
- Initiated by user after task completes
- Can rerun full task or from specific step
- Unlimited reruns available
- Useful for testing changes made after execution
- Demonstrates improvements to stakeholders
Workflow Context for Reruns
Auto-rerun configuration appears in flow builder alongside evaluation criteria.
- Left: Visual workflow with nodes and branches
- Right: Node configuration panel showing:
- Evaluation criteria (Criteria 8, Criteria 9)
- “Add criteria” and “Re-generate criteria” buttons
- Auto-run toggle and settings
- Settings dropdown for advanced options
- Tool used displayed in node (e.g., “PO Database Lookup Tool”)
- Accuracy percentage shown (e.g., “92.59%”)
- Branch paths labeled (e.g., “PO Not Found Handling”, “PO Found Proceed”)
Backtesting Prompt Changes
Re-execute multiple tasks to validate prompt improvements across a representative data set.
Backtesting Workflow:
1. Save Representative Tasks - Identify 10-20 tasks representing common scenarios, edge cases, and failure patterns. Mark or note task IDs for batch rerun.
2. Modify Prompt or Configuration - Update node prompts, evaluation criteria, or tool configurations based on identified improvements.
3. Rerun Saved Tasks - Execute a rerun on each saved task individually. Beam creates new execution records for comparison.
4. Compare Results - Review evaluation scores before/after changes. Calculate the improvement rate: tasks that now pass vs previously failed.
5. Validate and Publish - If the improvement meets targets (e.g., 90%+ success rate), publish workflow changes to production.
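The rerun-and-compare loop in steps 3-4 can be sketched as follows. `rerun_task` and `get_score` are hypothetical placeholders for triggering a rerun and reading its evaluation score, not real Beam API calls.

```python
def backtest(task_ids, rerun_task, get_score, threshold=0.90):
    """Rerun each saved task and compare before/after evaluation scores."""
    results = []
    for task_id in task_ids:
        before = get_score(task_id)      # original execution's score
        new_id = rerun_task(task_id)     # creates a separate task record
        after = get_score(new_id)
        results.append((task_id, before, after))
    pass_rate = sum(after >= threshold for _, _, after in results) / len(results)
    return results, pass_rate
```

The returned pass rate is what step 5 checks against the publish target (e.g., 90%+).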
Selecting Backtest Tasks
Criteria for Good Backtest Set:
- Variety: Cover all workflow branches and scenarios
- Failures: Include tasks that previously failed
- Edge Cases: Unusual data formats or inputs
- Success Cases: Verify changes don’t break working scenarios
- Recent Data: Reflects current data patterns
Recommended Set Size:
- Minimum: 10 tasks for basic validation
- Optimal: 20-30 tasks for comprehensive testing
- Large Changes: 50+ tasks for major overhauls
Measuring Improvement
Key Metrics:
Accuracy Improvement:
- Before: Average evaluation score across backtest set
- After: Average evaluation score after prompt changes
- Target: 10-20% improvement in scores
Failure Reduction:
- Before: Number of tasks failing evaluation
- After: Number of tasks failing after changes
- Target: 50%+ reduction in failures
Consistency:
- Standard deviation of evaluation scores
- Lower = more consistent performance
- Target: Reduced variance in results
Regression Check:
- Previously passing tasks still pass
- No new failures introduced
- Target: Zero regression on working cases
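All four metrics can be computed from two lists of evaluation scores. This sketch uses Python's standard library; the score values in the usage note are illustrative.

```python
from statistics import mean, stdev

def improvement_metrics(before, after, threshold=0.90):
    """Compute the key backtest metrics from before/after score lists."""
    return {
        "accuracy_gain": mean(after) - mean(before),
        "failures_before": sum(s < threshold for s in before),
        "failures_after": sum(s < threshold for s in after),
        # Lower spread = more consistent performance
        "variance_change": stdev(after) - stdev(before),
        # Regressions: tasks that passed before but fail after the change
        "regressions": sum(b >= threshold > a for b, a in zip(before, after)),
    }
```

For example, scores of [0.80, 0.95, 0.70, 0.92] before and [0.93, 0.96, 0.91, 0.94] after give two fewer failures, zero regressions, and roughly a 9-point accuracy gain.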
Common Backtest Scenarios
Prompt Optimization:
- Tested new extraction prompts on 15 invoices
- Accuracy improved from 78% to 93%
- Reduced “amount” field extraction errors by 60%
Threshold Tuning:
- Adjusted confidence thresholds
- Retested on 25 classification tasks
- Improved precision without sacrificing recall
Configuration Changes:
- Modified API parameters for data lookup
- Reran 20 validation workflows
- Reduced timeout errors from 15% to 2%
Best Practices
Maintain Rerun Test Sets
Create Task Libraries:
- Save 10-20 representative tasks per agent
- Cover all workflow branches
- Include both successes and failures
- Update quarterly with new patterns
Organize Test Sets:
- Label tasks by scenario type
- Note which branch/node they test
- Document expected outcomes
- Track when last used for backtesting
Compare Before/After Results
Systematic Comparison:
- Keep original execution visible
- Note evaluation score changes
- Review output quality differences
- Document unexpected behavior
What to Compare:
- Execution time (faster/slower?)
- Evaluation scores (improved/degraded?)
- Branch selections (changed logic?)
- Tool errors (more/fewer issues?)
Use Step Reruns for Efficiency
When to Use:
- Early steps succeeded, later step failed
- Testing changes to specific node
- Debugging isolated step issues
- Validating prompt optimization
Benefits:
- Faster than full workflow rerun
- Preserves earlier step outputs
- Saves API calls and execution time
- Focuses testing on changed components
Monitor Auto-Rerun Frequency
Warning Signs:
- Step frequently uses all 3 retries
- Auto-reruns happen on >30% of tasks
- Retries rarely improve scores
- Execution time significantly increased
Corrective Actions:
- Review and improve evaluation criteria
- Optimize prompts causing frequent retries
- Consider if data quality is issue
- Disable auto-rerun if not helping
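The warning signs above are easy to check from per-task retry counts for a step. This is a sketch over a hypothetical data shape (a list of how many auto-reruns each task needed); the 10% exhaustion cutoff is an illustrative assumption, while the 30% rate comes from the list above.

```python
def retry_warnings(retry_counts, max_retries=3):
    """Flag warning signs from per-task auto-rerun counts for one step."""
    n = len(retry_counts)
    return {
        # Step frequently uses all of its retries (assumed cutoff: >10% of tasks)
        "exhausts_retries": sum(c >= max_retries for c in retry_counts) / n > 0.10,
        # Auto-reruns happen on more than 30% of tasks
        "high_retry_rate": sum(c > 0 for c in retry_counts) / n > 0.30,
    }
```

A step that trips either flag is a candidate for prompt or criteria review rather than more retries.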
Document Rerun Results
What to Track:
- Which tasks were rerun and why
- Changes made before rerun
- Before/after evaluation scores
- Whether change solved the issue
Why Document:
- Proves ROI of optimization work
- Identifies patterns in failures
- Guides future improvements
- Demonstrates value to stakeholders