fix(llmobs): properly parse "newer" anthropic models, cohere models from bedrock-runtime calls #6383
Overall package size
Self size: 12 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB |
| @datadog/native-appsec | 10.2.1 | 20.64 MB | 20.65 MB |
| @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB |
| @datadog/pprof | 5.10.0 | 9.91 MB | 10.3 MB |
| @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB |
| protobufjs | 7.5.4 | 2.95 MB | 5.6 MB |
| @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB |
| @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB |
| @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB |
| jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB |
| import-in-the-middle | 1.14.2 | 122.36 kB | 850.93 kB |
| lru-cache | 10.4.3 | 804.3 kB | 804.3 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| source-map | 0.7.6 | 185.63 kB | 185.63 kB |
| pprof-format | 2.2.1 | 163.06 kB | 163.06 kB |
| @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 7.0.5 | 63.38 kB | 63.38 kB |
| istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB |
| rfdc | 1.4.1 | 27.15 kB | 27.15 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| shell-quote | 1.8.3 | 23.74 kB | 23.74 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| semifies | 1.0.0 | 15.84 kB | 15.84 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| mutexify | 1.4.0 | 5.71 kB | 8.74 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| koalas | 1.0.2 | 6.47 kB | 6.47 kB |
| module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff

| | master | #6383 | +/- |
|------|---------|-----------|------------|
| Coverage | 84.31% | 84.32% | |
| Files | 477 | 477 | |
| Lines | 20086 | 20086 | |
| Hits | 16936 | 16937 | +1 |
| Misses | 3150 | 3149 | -1 |
…r/bedrock-runtime-tests-use-cassettes
Benchmarks
Benchmark execution time: 2025-09-09 17:00:18
Comparing candidate commit 380a448 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 1683 metrics, 81 unstable metrics.
…b.com:DataDog/dd-trace-js into sabrenner/bedrock-runtime-tests-use-cassettes
…rom bedrock-runtime calls (#6383)
* utils fixes
* update testagent version
* update fixtures
* generate cassettes
* test changes
* use fixed testagent version
* add in break line
* add in nullish coalesce for anthropic input messages
What does this PR do?
Makes sure that bedrock-runtime calls that invoke Anthropic models properly capture the last user message on the LLM span's input, and properly capture the text blocks of the assistant's response.
Additionally, does the same for responses from newer Cohere models.
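As a rough illustration of the Anthropic half of this fix, here is a minimal sketch of pulling the last user message out of a request body and the text blocks out of a response body. It assumes the Anthropic Messages shape (`messages` entries with a `role` and a `content` that is either a string or an array of typed blocks). The helper names `textFromContent`, `lastUserMessage`, and `assistantText` are hypothetical and are not the functions changed in this PR.

```javascript
// Hypothetical helpers for illustration only; not dd-trace-js internals.

// Flatten message content (a string or an array of content blocks)
// into plain text, keeping only blocks whose type is "text".
function textFromContent (content) {
  if (typeof content === 'string') return content
  if (!Array.isArray(content)) return ''
  return content
    .filter(block => block?.type === 'text' && typeof block.text === 'string')
    .map(block => block.text)
    .join('')
}

// Return the text of the last message with role "user", or '' if there is none.
// The nullish coalesce guards against a request body without `messages`.
function lastUserMessage (requestBody) {
  const messages = requestBody?.messages ?? []
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === 'user') return textFromContent(messages[i].content)
  }
  return ''
}

// Return the concatenated text blocks of an assistant response,
// skipping non-text blocks such as tool_use.
function assistantText (responseBody) {
  return textFromContent(responseBody?.content)
}

// Example usage with made-up payloads.
const request = {
  messages: [
    { role: 'user', content: 'Hi' },
    { role: 'assistant', content: 'Hello' },
    { role: 'user', content: [{ type: 'text', text: 'What is 2+2?' }] }
  ]
}
const response = {
  content: [
    { type: 'text', text: '4' },
    { type: 'tool_use', id: 't1', name: 'calc', input: {} }
  ]
}
console.log(lastUserMessage(request)) // What is 2+2?
console.log(assistantText(response)) // 4
```

Treating string and block-array content through one helper keeps the request and response paths symmetric, which is one plausible way to handle both older and newer payload shapes.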
Lastly, replaces the `nock` mocking in these tests with cassettes. The mocks could be flaky and were hard to get right against the actual return values from the bedrock-runtime AWS service.
Motivation
Drive-by fixes while generating/using cassettes for real-world, accurate response shapes from bedrock-runtime. Evidently, it paid off 😆