-
Notifications
You must be signed in to change notification settings - Fork 346
Open
Labels
Description
Problem
The E2E tests for calculator_multiply and calculator_divide fail all 10 retry attempts due to non-deterministic receiver method instantiation patterns generated by qwen2.5-coder:0.5b.
Details
Even with temperature=0 and seed=42, the LLM randomly chooses between two valid receiver instantiation patterns:
Pattern 1 (in golden files):
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
c := &Calculator{}
if got := c.Multiply(tt.args.n, tt.args.d); got != tt.want {
t.Errorf("Calculator.Multiply() = %v, want %v", got, tt.want)
}
})
}Pattern 2 (sometimes generated):
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := tt.c.Multiply(tt.args.n, tt.args.d); got != tt.want {
t.Errorf("Calculator.Multiply() = %v, want %v", got, tt.want)
}
})
}Both patterns are syntactically valid but produce different output strings, causing E2E test failures.
Current Status
- Temporarily disabled
calculator_multiplyandcalculator_divideE2E tests in internal/ai/e2e_test.go - 9/11 E2E tests passing consistently on first attempt
- 2/11 tests disabled with TODO comment referencing this issue
Possible Solutions
- Add normalization logic: Convert Pattern 2 → Pattern 1 before comparison
- Strengthen prompt: Add explicit instruction to prefer Pattern 1
- Try different LLM: Test with larger/different models (e.g., qwen2.5-coder:1.5b)
- Relax matching: Use AST comparison instead of exact string matching (loses determinism validation)
- Accept both patterns: Update golden files to include both valid patterns (complex to implement)
References
- PR feat: AI-powered test case generation #194: Add AI-powered test generation
- Test failure logs: /tmp/full_e2e.txt
- E2E test code: internal/ai/e2e_test.go:116-258
- Golden files: testdata/goldens/calculator_{multiply,divide}_ai.go