@iceljc (Collaborator) commented Dec 17, 2025

PR Type

Enhancement


Description

  • Rename Tokenizer abstraction to NER (Named Entity Recognition)

  • Update all interfaces from ITokenizer to INERAnalyzer

  • Rename TokenizeOptions/TokenizeResult to NEROptions/NERResult

  • Update API endpoints and controller methods for NER

  • Update FuzzySharp plugin implementation to use NER naming


Diagram Walkthrough

flowchart LR
  A["Tokenizer Abstraction"] -->|Rename| B["NER Abstraction"]
  C["ITokenizer Interface"] -->|Rename| D["INERAnalyzer Interface"]
  E["TokenizeOptions/Result"] -->|Rename| F["NEROptions/Result"]
  G["Tokenizer Endpoints"] -->|Update| H["NER Endpoints"]
  I["FuzzySharp Plugin"] -->|Update| J["NER Implementation"]

File Walkthrough

Relevant files
Enhancement (16 files)

  • INERAnalyzer.cs: Create new NER analyzer interface (+11/-0)
  • INERDataLoader.cs: Rename ITokenDataLoader to INERDataLoader (+2/-2)
  • NEROptions.cs: Rename TokenizeOptions to NEROptions (+2/-2)
  • NERResult.cs: Rename TokenizeResult to NERResult (+2/-2)
  • NERResponse.cs: Create new NER response class (+8/-0)
  • ITokenizer.cs: Remove deprecated ITokenizer interface (+0/-11)
  • TokenizeResponse.cs: Remove deprecated TokenizeResponse class (+0/-8)
  • KnowledgeBaseController.Document.cs: Update knowledge base document processor endpoint (+2/-2)
  • KnowledgeBaseController.NER.cs: Create new NER controller with analysis endpoints (+46/-0)
  • KnowledgeBaseController.Tokenizer.cs: Remove deprecated tokenizer controller (+0/-48)
  • Using.cs: Add NER namespace global usings (+4/-0)
  • NERAnalysisRequest.cs: Rename TokenizeRequest to NERAnalysisRequest (+2/-4)
  • FuzzySharpPlugin.cs: Update plugin DI registration for NER (+2/-2)
  • CsvNERDataLoader.cs: Rename CsvTokenDataLoader to CsvNERDataLoader (+4/-4)
  • FuzzySharpNERAnalyzer.cs: Rename FuzzySharpTokenizer to FuzzySharpNERAnalyzer (+12/-13)
  • Using.cs: Update FuzzySharp plugin NER namespace usings (+3/-3)


qodo-code-review bot commented Dec 17, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Null returned on error: When no matching INERAnalyzer is found, the endpoint returns null rather than a clear HTTP
error response (e.g., 400/404) with actionable context, which can lead to ambiguous client
behavior.

Referred Code
var analyzer = _services.GetServices<INERAnalyzer>()
                        .FirstOrDefault(x => x.Provider.IsEqualTo(request.Provider));

if (analyzer == null)
{
    return null;
}
return await analyzer.AnalyzeAsync(request.Text, request.Options);


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Exception message exposed: The analyzer assigns ex.Message to response.ErrorMsg, which can expose internal
implementation details to callers if returned by the API instead of using a generic
user-facing message and logging details internally.

Referred Code
response.ErrorMsg = ex.Message;
return response;
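
One way to address this finding is sketched below: pass the exception to internal logging and return only a generic message to the caller. This is a minimal illustration, not the PR's code. The `NERResponse` class here is a stripped-down stand-in that keeps just the `Success` and `ErrorMsg` fields named in the finding, and `SafeErrors`, `Run`, `GenericMessage`, and the `logError` callback are hypothetical names introduced for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the PR's response type; only the fields used here.
public class NERResponse
{
    public bool Success { get; set; }
    public string? ErrorMsg { get; set; }
}

public static class SafeErrors
{
    // Generic text returned to callers; the wording is an assumption.
    public const string GenericMessage = "Entity analysis failed. Please try again later.";

    // Runs the analysis delegate. On failure, hands the exception to the
    // supplied logger and returns a response carrying only the generic message.
    public static NERResponse Run(Func<NERResponse> analyze, Action<Exception> logError)
    {
        try
        {
            return analyze();
        }
        catch (Exception ex)
        {
            logError(ex); // full details stay in internal logs
            return new NERResponse { Success = false, ErrorMsg = GenericMessage };
        }
    }
}
```

In a real service the `logError` callback would likely be replaced by whatever logger the project already injects; the point of the sketch is only that `ex.Message` never reaches the response body.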


Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logging: The new NER endpoints perform request-driven analysis but do not emit any audit trail
context (e.g., caller identity, action, outcome), which may be required depending on
whether this operation is considered critical in your environment.

Referred Code
[HttpPost("knowledge/NER/analyze")]
public async Task<NERResponse?> NERAnalyze([FromBody] NERAnalysisRequest request)
{
    var analyzer = _services.GetServices<INERAnalyzer>()
                            .FirstOrDefault(x => x.Provider.IsEqualTo(request.Provider));

    if (analyzer == null)
    {
        return null;
    }
    return await analyzer.AnalyzeAsync(request.Text, request.Options);
}


Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Nonstandard route casing: The routes use mixed/uppercase segments (e.g., knowledge/NER/...), which may reduce
readability and consistency with typical lowercase REST conventions unless this is an
established API standard in the codebase.

Referred Code
[HttpPost("knowledge/NER/analyze")]
public async Task<NERResponse?> NERAnalyze([FromBody] NERAnalysisRequest request)
{
    var analyzer = _services.GetServices<INERAnalyzer>()
                            .FirstOrDefault(x => x.Provider.IsEqualTo(request.Provider));

    if (analyzer == null)
    {
        return null;
    }
    return await analyzer.AnalyzeAsync(request.Text, request.Options);
}

/// <summary>
/// Get NER analyzers
/// </summary>
/// <returns></returns>
[HttpGet("knowledge/NER/analyzers")]
public IEnumerable<string> GetNERAnalyzers()
{
    var analyzers = _services.GetServices<INERAnalyzer>();


 ... (clipped 9 lines)


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Missing request validation: The new endpoint accepts external input (request.Text, request.Provider, request.Options)
without explicit validation/guardrails in the controller, relying on unseen model
validation or downstream handling to prevent empty/invalid values.

Referred Code
[HttpPost("knowledge/NER/analyze")]
public async Task<NERResponse?> NERAnalyze([FromBody] NERAnalysisRequest request)
{
    var analyzer = _services.GetServices<INERAnalyzer>()
                            .FirstOrDefault(x => x.Provider.IsEqualTo(request.Provider));

    if (analyzer == null)
    {
        return null;
    }
    return await analyzer.AnalyzeAsync(request.Text, request.Options);
}
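
A possible way to make the validation explicit is to annotate the request model with `System.ComponentModel.DataAnnotations` attributes. The sketch below is an assumption about how that could look, not the PR's actual `NERAnalysisRequest`: the class is a simplified stand-in that omits `Options`, the 10000-character limit is an arbitrary illustrative value, and `RequestValidation.Validate` is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical request shape mirroring the fields named in the finding.
public class NERAnalysisRequest
{
    [Required] // rejects null, empty, and whitespace-only strings by default
    public string Provider { get; set; } = string.Empty;

    [Required]
    [StringLength(10000)] // illustrative upper bound, not taken from the PR
    public string Text { get; set; } = string.Empty;
}

public static class RequestValidation
{
    // Returns the list of validation failures; an empty list means the request is valid.
    public static IList<ValidationResult> Validate(NERAnalysisRequest request)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(
            request,
            new ValidationContext(request),
            results,
            validateAllProperties: true);
        return results;
    }
}
```

If the controller is decorated with `[ApiController]` (not visible in the referred code, so this is an assumption), ASP.NET Core runs these attributes during model binding and returns a 400 response automatically; otherwise the action would need to check `ModelState.IsValid` or call a helper like the one above.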


Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label


qodo-code-review bot commented Dec 17, 2025

PR Code Suggestions ✨

Latest suggestions up to 6bfc3f4

Category | Suggestion | Impact
Incremental [*]
Return proper HTTP status codes

Modify the EntityAnalyze action to return ActionResult<EntityAnalysisResponse>,
using NotFound or BadRequest for error cases instead of null to provide more
descriptive HTTP responses.

src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.Entity.cs [12-23]

 [HttpPost("knowledge/entity/analyze")]
-public async Task<EntityAnalysisResponse?> EntityAnalyze([FromBody] EntityAnalysisRequest request)
+public async Task<ActionResult<EntityAnalysisResponse>> EntityAnalyze([FromBody] EntityAnalysisRequest request)
 {
+    if (string.IsNullOrWhiteSpace(request.Text))
+    {
+        return BadRequest("Text is required.");
+    }
+
     var analyzer = _services.GetServices<IEntityAnalyzer>()
                             .FirstOrDefault(x => x.Provider.IsEqualTo(request.Provider));
 
     if (analyzer == null)
     {
-        return null;
+        return NotFound($"Entity analyzer provider '{request.Provider}' was not found.");
     }
-    return await analyzer.AnalyzeAsync(request.Text, request.Options);
+
+    var result = await analyzer.AnalyzeAsync(request.Text, request.Options);
+    return Ok(result);
 }
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that returning null is poor API design and proposes using ActionResult<T> to return specific HTTP status codes, which significantly improves API robustness and client-side error handling.

Medium
De-duplicate provider name lists

Apply Distinct() to the provider lists in GetEntityAnalyzers and
GetEntityDataProviders to prevent returning duplicate names.

src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.Entity.cs [29-45]

 [HttpGet("knowledge/entity/analyzers")]
 public IEnumerable<string> GetEntityAnalyzers()
 {
     var analyzers = _services.GetServices<IEntityAnalyzer>();
-    return analyzers.Select(x => x.Provider);
+    return analyzers.Select(x => x.Provider).Distinct(StringComparer.OrdinalIgnoreCase);
 }
 
 [HttpGet("knowledge/entity/data-providers")]
 public IEnumerable<string> GetEntityDataProviders()
 {
     var dataLoaders = _services.GetServices<IEntityDataLoader>();
-    return dataLoaders.Select(x => x.Provider);
+    return dataLoaders.Select(x => x.Provider).Distinct(StringComparer.OrdinalIgnoreCase);
 }
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a potential issue where duplicate provider names could be returned and proposes using Distinct() to prevent it, which makes the API response more reliable and predictable.

Low
Possible issue
Make routes absolute and stable

Add a leading slash (/) to the new API endpoint routes in
KnowledgeBaseController.Entity.cs to make them absolute. This ensures
consistency with other routes in the controller and prevents potential routing
issues.

src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.Entity.cs [12-45]

-[HttpPost("knowledge/entity/analyze")]
+[HttpPost("/knowledge/entity/analyze")]
 public async Task<EntityAnalysisResponse?> EntityAnalyze([FromBody] EntityAnalysisRequest request)
 {
     var analyzer = _services.GetServices<IEntityAnalyzer>()
                             .FirstOrDefault(x => x.Provider.IsEqualTo(request.Provider));
 
     if (analyzer == null)
     {
         return null;
     }
     return await analyzer.AnalyzeAsync(request.Text, request.Options);
 }
 
 ...
 
-[HttpGet("knowledge/entity/analyzers")]
+[HttpGet("/knowledge/entity/analyzers")]
 public IEnumerable<string> GetEntityAnalyzers()
 {
     var analyzers = _services.GetServices<IEntityAnalyzer>();
     return analyzers.Select(x => x.Provider);
 }
 
 ...
 
-[HttpGet("knowledge/entity/data-providers")]
+[HttpGet("/knowledge/entity/data-providers")]
 public IEnumerable<string> GetEntityDataProviders()
 {
     var dataLoaders = _services.GetServices<IEntityDataLoader>();
     return dataLoaders.Select(x => x.Provider);
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

Why: The suggestion correctly points out an inconsistency in routing style within the KnowledgeBaseController partial class, as another route in the PR was changed to be absolute. Adopting absolute paths improves consistency and makes the API endpoints more robust against future changes to controller-level routes.

Low

Previous suggestions

Suggestions up to commit bd2a66a
Category | Suggestion | Impact
High-level
Rename is too specific

The term "NER" is too specific for the current implementation, which is closer
to dictionary-based entity extraction. A more general name like IEntityAnalyzer
would be more accurate and extensible.

Examples:

src/Infrastructure/BotSharp.Abstraction/NER/INERAnalyzer.cs [6-11]
public interface INERAnalyzer
{
    string Provider { get; }

    Task<NERResponse> AnalyzeAsync(string text, NEROptions? options = null);
}
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/FuzzySharpNERAnalyzer.cs [6-7]
public class FuzzySharpNERAnalyzer : INERAnalyzer
{

Solution Walkthrough:

Before:

// Abstraction
public interface INERAnalyzer
{
    Task<NERResponse> AnalyzeAsync(string text, NEROptions? options = null);
}

// Implementation
public class FuzzySharpNERAnalyzer : INERAnalyzer
{
    public async Task<NERResponse> AnalyzeAsync(string text, NEROptions? options = null)
    {
        // ... logic for fuzzy matching against a vocabulary
    }
}

// Controller Endpoint
[HttpPost("knowledge/NER/analyze")]
public async Task<NERResponse?> NERAnalyze(...) { ... }

After:

// Abstraction
public interface IEntityAnalyzer
{
    Task<EntityAnalysisResponse> AnalyzeAsync(string text, EntityAnalysisOptions? options = null);
}

// Implementation
public class FuzzySharpEntityAnalyzer : IEntityAnalyzer
{
    public async Task<EntityAnalysisResponse> AnalyzeAsync(string text, EntityAnalysisOptions? options = null)
    {
        // ... logic for fuzzy matching against a vocabulary
    }
}

// Controller Endpoint
[HttpPost("knowledge/entity/analyze")]
public async Task<EntityAnalysisResponse?> EntityAnalyze(...) { ... }
Suggestion importance[1-10]: 9

Why: This is a critical design suggestion that correctly points out the potential for confusion by using the term "NER" for what is essentially dictionary-based entity extraction, impacting the entire PR's naming convention.

High
General
Return appropriate HTTP error codes

Modify the NERAnalyze method to return ActionResult<NERResponse> instead of
NERResponse?. This allows for returning specific HTTP status codes, such as
400 Bad Request if the provider is missing and 404 Not Found if the analyzer is
not found, providing clearer client feedback.

src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.NER.cs [13-23]

-public async Task<NERResponse?> NERAnalyze([FromBody] NERAnalysisRequest request)
+public async Task<ActionResult<NERResponse>> NERAnalyze([FromBody] NERAnalysisRequest request)
 {
+    if (string.IsNullOrEmpty(request.Provider))
+    {
+        return BadRequest("Provider is required.");
+    }
+
     var analyzer = _services.GetServices<INERAnalyzer>()
                             .FirstOrDefault(x => x.Provider.IsEqualTo(request.Provider));
 
     if (analyzer == null)
     {
-        return null;
+        return NotFound($"NER analyzer for provider '{request.Provider}' not found.");
     }
     return await analyzer.AnalyzeAsync(request.Text, request.Options);
 }
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that returning null results in an ambiguous 204 No Content response and proposes using specific HTTP status codes (400, 404), which significantly improves the API's usability and error handling for clients.

Medium
Possible issue
Fix incorrect exception handling logic

Refactor the AnalyzeAsync method to consistently use the response variable.
Populate its properties within the try block on success, and modify it in the
catch block on failure, then return it at the end of the method.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/FuzzySharpNERAnalyzer.cs [27-55]

 public async Task<NERResponse> AnalyzeAsync(string text, NEROptions? options = null)
 {
     var response = new NERResponse();
 
     try
     {
         var result = await AnalyzeTextAsync(text, options);
 
-        return new NERResponse
+        response.Success = true;
+        response.Results = result?.FlaggedItems?.Select(f => new NERResult
         {
-            Success = true,
-            Results = result?.FlaggedItems?.Select(f => new NERResult
+            Token = f.Token,
+            CanonicalText = f.CanonicalForm,
+            Data = new Dictionary<string, object>
             {
-                Token = f.Token,
-                CanonicalText = f.CanonicalForm,
-                Data = new Dictionary<string, object>
-                {
-                    { "data_source", f.DataSource },
-                    { "score", f.Score },
-                    { "matched_by", f.MatchedBy }
-                }
-            }).ToList() ?? new List<NERResult>()
-        };
+                { "data_source", f.DataSource },
+                { "score", f.Score },
+                { "matched_by", f.MatchedBy }
+            }
+        }).ToList() ?? new List<NERResult>();
     }
     catch (Exception ex)
     {
         response.Success = false;
         response.Message = ex.Message;
-        return response;
     }
+    return response;
 }
Suggestion importance[1-10]: 6

Why: This suggestion correctly identifies a logic flaw in the try-catch block where the initialized response object is not used in the success path. The proposed refactoring improves code clarity and ensures consistent object handling for both success and failure scenarios.

Low

@iceljc iceljc merged commit abbdb67 into SciSharp:master Dec 18, 2025
4 checks passed