Rerunning tasks allows you to re-execute completed or failed workflows with the same inputs, test modifications, or start from a specific step. This is essential for debugging, validating changes, and demonstrating improvements.

Understanding Task Reruns

Every completed task can be re-executed to verify behavior, test changes, or debug issues. Beam provides multiple rerun strategies:
  • Full Task Rerun - Re-execute the entire workflow from the start with the original trigger data
  • Step-Level Rerun - Restart from a specific node for targeted debugging
  • Auto-Rerun - Automatically retry steps that fail evaluation criteria
  • Batch Rerun - Re-process multiple tasks to backtest prompt or workflow improvements

Manual Task Rerun

Re-execute any completed or failed task to test changes or debug issues.
Accessing Rerun:
  1. Navigate to task execution details in Tasks page
  2. Scroll to bottom of execution timeline
  3. Click “Re-run task” button below workflow steps
What Happens:
  • Workflow re-executes with identical trigger input (task_query)
  • All file attachments from original task preserved
  • New execution creates separate task record
  • Original task remains unchanged for comparison
Testing Workflow Changes:
  • Modified node configurations or prompts
  • Updated evaluation criteria
  • Changed tool selections
  • Added or removed nodes
Debugging Failures:
  • Task failed due to transient error (API timeout, network issue)
  • Integration temporarily unavailable
  • Want to verify fix worked
Demonstrating Improvements:
  • Show before/after results to stakeholders
  • Validate optimization impact
  • Compare agent performance over time
Preserved Elements:
  • Original trigger input data (task_query)
  • File attachments uploaded with task
  • Variable configurations
Fresh Execution:
  • New timestamps and task ID
  • Current workflow configuration (reflects any edits made)
  • Latest tool versions and integrations
  • Updated evaluation criteria
Important: Rerun uses current published workflow, not the version from original execution.

Step-Level Rerun

Restart the workflow from a specific node instead of from the beginning, which is useful for debugging failed steps.
Accessing Step Rerun:
  1. Click on any workflow step in execution timeline
  2. Locate “Re-run” button in step detail panel
  3. Click to re-execute from this node forward
Use Cases:
Debugging Failed Step:
  • Step failed validation or returned error
  • Made changes to node configuration
  • Want to test fix without re-running earlier steps
Testing Step Modifications:
  • Updated prompt for specific node
  • Changed tool selection
  • Modified evaluation criteria for this step
Prompt Optimization:
  • Used “Optimise your prompt” feature
  • Want to compare improved vs original prompt
  • Validate AI-suggested improvements
What Gets Preserved:
  • All outputs from steps before the rerun point
  • Original trigger data (task_query)
  • File attachments
What Gets Re-Executed:
  • Selected step and all subsequent nodes
  • Branch decisions after rerun point
  • Evaluation criteria for re-executed steps
Example: Workflow has 6 steps. Step 4 failed validation. After fixing step 4 configuration:
  • Steps 1-3: Use outputs from original execution
  • Steps 4-6: Re-execute with updated configuration
After clicking “Optimise your prompt”:
AI Analysis:
  • Reviews failed task execution
  • Analyzes evaluation criteria not met
  • Identifies prompt weaknesses
  • Suggests specific improvements
Optimise Button:
  • Applies AI-suggested prompt changes
  • Automatically reruns step with new prompt
  • Compares results before/after
  • Shows improvement in evaluation scores

Auto-Rerun Configuration

Automatically retry steps that don’t meet evaluation thresholds without manual intervention.
Accessing Auto-Rerun:
  1. Open workflow in Flow builder
  2. Click on node to configure
  3. Scroll to “Auto-run” toggle in right panel
  4. Enable the toggle
Configuration Options:
  • Auto-run Toggle: Enable automatic retry when the accuracy score is low
  • Number of Re-runs: Set maximum retry attempts (max 3)
  • Trigger Condition: “Automatically re-run the step if the accuracy score is low”
Evaluation-Based Triggering:
  1. Node executes and generates output
  2. Evaluation criteria assess accuracy
  3. If score below threshold → Auto-rerun triggered
  4. Step re-executes with same input
  5. Repeat until passing score or max retries reached
Example:
  • Evaluation threshold: 90%
  • First execution: 75% (fails)
  • Auto-rerun 1: 85% (fails)
  • Auto-rerun 2: 92% (passes)
  • Workflow continues with passing output
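The evaluation-triggered retry loop above can be sketched in a few lines. This is an illustrative sketch only: `run_step` and `evaluate` are hypothetical stand-ins for a node's execution and its evaluation criteria, not part of any Beam API.

```python
def run_with_auto_rerun(run_step, evaluate, step_input,
                        threshold=0.90, max_retries=3):
    """Re-execute a step until its evaluation score passes or retries run out.

    `run_step` and `evaluate` are hypothetical placeholders for the node's
    execution and its evaluation criteria.
    """
    attempts = []
    output = None
    for attempt in range(1 + max_retries):   # first run + up to max_retries reruns
        output = run_step(step_input)        # same input on every attempt
        score = evaluate(output)
        attempts.append(score)
        if score >= threshold:               # passing score: workflow continues
            return output, attempts
    return output, attempts                  # retries exhausted; surface last output
```

With the example above (threshold 90%, scores of 75%, 85%, then 92%), the loop stops after the second auto-rerun and returns the passing output.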
When to Enable:
  • Steps with non-deterministic outputs (GPT-based extraction)
  • Classification tasks requiring high confidence
  • Data extraction from inconsistent formats
  • Steps where retry often improves results
When NOT to Enable:
  • Deterministic operations (API calls with fixed responses)
  • Steps failing due to missing data (retries won’t help)
  • Integration errors requiring manual fix
  • Final output steps (may need human review instead)
Optimal Configuration:
  • Max 2-3 retries (more rarely helps)
  • Clear evaluation criteria (specific, measurable)
  • Monitor retry frequency (high retries indicate prompt issues)
Auto-Rerun vs Manual Rerun:
Auto-Rerun:
  • Happens during task execution automatically
  • Triggered by evaluation scores
  • No human intervention required
  • Limited to configured max retries
  • Single step only, not full workflow
Manual Rerun:
  • Initiated by user after task completes
  • Can rerun full task or from specific step
  • Unlimited reruns available
  • Useful for testing changes made after execution
  • Demonstrates improvements to stakeholders

Workflow Context for Reruns

Auto-rerun configuration appears in flow builder alongside evaluation criteria.
Flow Builder Integration:
  • Left: Visual workflow with nodes and branches
  • Right: Node configuration panel showing:
    • Evaluation criteria (Criteria 8, Criteria 9)
    • “Add criteria” and “Re-generate criteria” buttons
    • Auto-run toggle and settings
    • Settings dropdown for advanced options
Visual Indicators:
  • Tool used displayed in node (e.g., “PO Database Lookup Tool”)
  • Accuracy percentage shown (e.g., “92.59%”)
  • Branch paths labeled (e.g., “PO Not Found Handling”, “PO Found Proceed”)

Backtesting Prompt Changes

Re-execute multiple tasks to validate prompt improvements across a representative data set.
Backtesting Workflow:
  1. Save Representative Tasks - Identify 10-20 tasks representing common scenarios, edge cases, and failure patterns. Mark or note task IDs for batch rerun.
  2. Modify Prompt or Configuration - Update node prompts, evaluation criteria, or tool configurations based on identified improvements.
  3. Rerun Saved Tasks - Execute a rerun on each saved task individually. Beam creates new execution records for comparison.
  4. Compare Results - Review evaluation scores before/after changes. Calculate the improvement rate: tasks that now pass vs previously failed.
  5. Validate and Publish - If the improvement meets targets (e.g., 90%+ success rate), publish workflow changes to production.
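The rerun-and-compare steps can be sketched as a small driver loop. This is a hypothetical sketch: `rerun_task` stands in for whatever triggers the rerun (the UI button or an API call) and returns the new execution's evaluation score, and `baseline_scores` holds the original scores; neither is a documented Beam function.

```python
def backtest(task_ids, rerun_task, baseline_scores, pass_threshold=0.90):
    """Rerun each saved task and pair its new score with its baseline.

    `rerun_task` is a hypothetical callable: task_id -> new evaluation score.
    Returns per-task before/after scores and the count of tasks that now
    pass the threshold but previously failed it.
    """
    results = {}
    for task_id in task_ids:
        results[task_id] = {
            "before": baseline_scores[task_id],
            "after": rerun_task(task_id),
        }
    newly_passing = sum(
        1 for r in results.values()
        if r["after"] >= pass_threshold > r["before"]   # failed before, passes now
    )
    return results, newly_passing
```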
Criteria for Good Backtest Set:
  • Variety: Cover all workflow branches and scenarios
  • Failures: Include tasks that previously failed
  • Edge Cases: Unusual data formats or inputs
  • Success Cases: Verify changes don’t break working scenarios
  • Recent Data: Reflects current data patterns
Recommended Size:
  • Minimum: 10 tasks for basic validation
  • Optimal: 20-30 tasks for comprehensive testing
  • Large Changes: 50+ tasks for major overhauls
Key Metrics:
Accuracy Improvement:
  • Before: Average evaluation score across backtest set
  • After: Average evaluation score after prompt changes
  • Target: 10-20% improvement in scores
Failure Reduction:
  • Before: Number of tasks failing evaluation
  • After: Number of tasks failing after changes
  • Target: 50%+ reduction in failures
Consistency:
  • Standard deviation of evaluation scores
  • Lower = more consistent performance
  • Target: Reduced variance in results
Regression Check:
  • Previously passing tasks still pass
  • No new failures introduced
  • Target: Zero regression on working cases
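The four metrics above reduce to simple arithmetic over two score lists. A minimal sketch, assuming `before` and `after` are evaluation scores (0-1) for the same tasks in the same order:

```python
import statistics

def backtest_metrics(before, after, threshold=0.90):
    """Compute accuracy improvement, failure reduction, consistency, and
    regressions for a backtest set. Illustrative only; score lists must be
    aligned by task."""
    failures_before = sum(s < threshold for s in before)
    failures_after = sum(s < threshold for s in after)
    return {
        # Accuracy Improvement: change in average evaluation score
        "accuracy_improvement": statistics.mean(after) - statistics.mean(before),
        # Failure Reduction: fraction of previously failing tasks fixed
        "failure_reduction": ((failures_before - failures_after) / failures_before
                              if failures_before else 0.0),
        # Consistency: lower standard deviation = more consistent performance
        "stdev_before": statistics.stdev(before),
        "stdev_after": statistics.stdev(after),
        # Regression Check: previously passing tasks that now fail
        "regressions": sum(b >= threshold > a for b, a in zip(before, after)),
    }
```

A target of "zero regression on working cases" means `regressions` should be 0 before publishing the change.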
Example Results:
Prompt Optimization:
  • Tested new extraction prompts on 15 invoices
  • Accuracy improved from 78% to 93%
  • Reduced “amount” field extraction errors by 60%
Evaluation Criteria Tuning:
  • Adjusted confidence thresholds
  • Retested on 25 classification tasks
  • Improved precision without sacrificing recall
Tool Configuration Changes:
  • Modified API parameters for data lookup
  • Reran 20 validation workflows
  • Reduced timeout errors from 15% to 2%

Best Practices

Create Task Libraries:
  • Save 10-20 representative tasks per agent
  • Cover all workflow branches
  • Include both successes and failures
  • Update quarterly with new patterns
Organization:
  • Label tasks by scenario type
  • Note which branch/node they test
  • Document expected outcomes
  • Track when last used for backtesting
Systematic Comparison:
  • Keep original execution visible
  • Note evaluation score changes
  • Review output quality differences
  • Document unexpected behavior
Metrics to Track:
  • Execution time (faster/slower?)
  • Evaluation scores (improved/degraded?)
  • Branch selections (changed logic?)
  • Tool errors (more/fewer issues?)
When to Use Step-Level Rerun:
  • Early steps succeeded, later step failed
  • Testing changes to specific node
  • Debugging isolated step issues
  • Validating prompt optimization
Efficiency Gains:
  • Faster than full workflow rerun
  • Preserves earlier step outputs
  • Saves API calls and execution time
  • Focuses testing on changed components
Warning Signs:
  • Step frequently uses all 3 retries
  • Auto-reruns happen on >30% of tasks
  • Retries rarely improve scores
  • Execution time significantly increased
Action Items:
  • Review and improve evaluation criteria
  • Optimize prompts causing frequent retries
  • Consider if data quality is issue
  • Disable auto-rerun if not helping
What to Track:
  • Which tasks were rerun and why
  • Changes made before rerun
  • Before/after evaluation scores
  • Whether change solved the issue
Benefits:
  • Proves ROI of optimization work
  • Identifies patterns in failures
  • Guides future improvements
  • Demonstrates value to stakeholders

Next Steps