Conversation

@DajanaV (Contributor) commented Nov 7, 2025

Mirrored from ggml-org/llama.cpp#17070

This is useful for debugging: it shows which samplers are active for a given slot:

0.06.413.534 I slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.687
0.06.413.669 I slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
0.06.413.690 I slot launch_slot_: id  3 | task 38 | processing task

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Analysis of PR #113 reveals minimal performance impact from the addition of sampler chain logging functionality. The change introduces a single debug logging statement in the server's slot initialization process without affecting core inference operations.

Key Findings

Performance Metrics:

  • Largest response-time change: llama_supports_rpc showed a +0.08% increase (+0.024 ns, from 29.66 ns to 29.68 ns)
  • Largest throughput change: llm_ffn_exps_block_regex showed a -0.13% improvement (-0.16 ns, from 120.85 ns to 120.69 ns)
  • Changes are within compiler optimization variance and measurement noise

Core Function Impact:

  • No modifications to critical inference functions (llama_decode, llama_encode, llama_tokenize)
  • Token processing throughput remains unaffected
  • No impact on tokens per second performance for the reference model (ollama://smollm:135m on 12th Gen Intel i7-1255U)

Power Consumption Analysis:

  • All binaries maintain stable energy profiles with negligible variations (< 0.001%)
  • libllama.so: 280,779.72 nJ (stable)
  • llama-cvector-generator: 314,115.69 nJ (stable)
  • No energy efficiency regressions identified

Flame Graph and CFG Analysis:

  • llama_supports_rpc shows simple two-level execution structure with linear progression
  • CFG comparison reveals identical assembly code and control flow between versions
  • 0.02 ns timing variance attributed to compiler or binary layout differences rather than functional changes

Code Review Findings:

  • Well-implemented debugging enhancement using SLT_INF logging
  • Positioned correctly after sampler initialization validation
  • Maintains backward compatibility with no API changes
  • Adds operational visibility for sampler chain configuration

Conclusion:
The change is a pure debugging enhancement with no measurable performance impact on inference operations. All observed timing variations fall within normal compiler-optimization variance, so the implementation adds useful debugging visibility while leaving system performance intact.

@DajanaV force-pushed the main branch 27 times, most recently from 6aa5dc2 to 81cedf2 on November 10, 2025 at 16:10
@DajanaV force-pushed the main branch 30 times, most recently from 9ea0205 to 1308d3f on November 14, 2025 at 08:11