@ywang1110 ywang1110 commented Nov 6, 2025

PR Type

Enhancement


Description

  • Add FuzzySharp plugin for text analysis with synonym detection, typo correction, and entity extraction

  • Implement phrase collection abstraction with CSV-based vocabulary and synonym mapping loading

  • Create token matching system with priority-based matchers (synonym, exact, fuzzy)

  • Add n-gram processing and result deduplication for intelligent text analysis

  • Integrate plugin into solution with API endpoint and dependency injection
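
The priority-based matching described above can be illustrated with a minimal sketch. All type names here (`ISketchMatcher`, `MatchResult`, `MatcherPipeline`) are hypothetical and are not the PR's actual `ITokenMatcher` contract; the fuzzy matcher is left out, but under this design it would be one more implementation with the lowest priority.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result shape; the PR's FlaggedItem may carry different fields.
public record MatchResult(string CanonicalForm, string MatchType, double Confidence);

// Hypothetical matcher contract: a higher Priority is consulted first.
public interface ISketchMatcher
{
    int Priority { get; }
    MatchResult? TryMatch(string token);
}

public sealed class SynonymSketchMatcher : ISketchMatcher
{
    private readonly IReadOnlyDictionary<string, string> _synonyms;
    public SynonymSketchMatcher(IReadOnlyDictionary<string, string> synonyms) => _synonyms = synonyms;
    public int Priority => 3;

    // Maps a known synonym to its canonical form.
    public MatchResult? TryMatch(string token) =>
        _synonyms.TryGetValue(token, out var canonical)
            ? new MatchResult(canonical, "synonym", 1.0)
            : null;
}

public sealed class ExactSketchMatcher : ISketchMatcher
{
    private readonly ISet<string> _vocabulary;
    public ExactSketchMatcher(ISet<string> vocabulary) => _vocabulary = vocabulary;
    public int Priority => 2;

    // Succeeds only when the token is already a vocabulary term.
    public MatchResult? TryMatch(string token) =>
        _vocabulary.Contains(token) ? new MatchResult(token, "exact", 1.0) : null;
}

public static class MatcherPipeline
{
    // Tries matchers in descending priority and returns the first hit, or null.
    public static MatchResult? Match(string token, IEnumerable<ISketchMatcher> matchers)
    {
        foreach (var matcher in matchers.OrderByDescending(m => m.Priority))
        {
            var result = matcher.TryMatch(token);
            if (result is not null) return result;
        }
        return null;
    }
}
```

Because the first hit wins, a token that is both a synonym and a vocabulary term is reported as a synonym match in this sketch, regardless of the order in which the matchers were registered.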


Diagram Walkthrough

flowchart LR
  A["Text Input"] --> B["TextTokenizer"]
  B --> C["NgramProcessor"]
  C --> D["Token Matchers"]
  D --> E["SynonymMatcher"]
  D --> F["ExactMatcher"]
  D --> G["FuzzyMatcher"]
  E --> H["ResultProcessor"]
  F --> H
  G --> H
  H --> I["SearchPhrasesResult"]
  J["CsvPhraseCollectionLoader"] --> K["Vocabulary & Synonyms"]
  K --> C

File Walkthrough

Relevant files
Enhancement
22 files
IPhraseCollection.cs
Define phrase collection interface for vocabulary loading
+7/-0     
IPhraseService.cs
Define phrase service interface for text search                   
+6/-0     
SearchPhrasesResult.cs
Define search result model with match metadata                     
+11/-0   
MatchReason.cs
Define match type constants for analysis results                 
+20/-0   
TextConstants.cs
Define separator and tokenization character constants       
+29/-0   
FuzzySharpController.cs
Add API endpoint for text analysis                                             
+59/-0   
TextAnalysisRequest.cs
Define text analysis request parameters                                   
+13/-0   
INgramProcessor.cs
Define n-gram processing interface                                             
+26/-0   
IResultProcessor.cs
Define result processing interface for deduplication         
+17/-0   
ITokenMatcher.cs
Define token matcher interface and context models               
+39/-0   
FlaggedItem.cs
Define flagged item model for matches                                       
+13/-0   
TextAnalysisResponse.cs
Define text analysis response model                                           
+10/-0   
FuzzySharpPlugin.cs
Implement plugin registration and dependency injection     
+29/-0   
CsvPhraseCollectionLoader.cs
Implement CSV-based vocabulary and synonym loading             
+187/-0 
ExactMatcher.cs
Implement exact match token matcher                                           
+23/-0   
FuzzyMatcher.cs
Implement fuzzy matching for typo correction                         
+81/-0   
SynonymMatcher.cs
Implement synonym matching with highest priority                 
+23/-0   
PhraseService.cs
Implement phrase service orchestrating analysis pipeline 
+199/-0 
NgramProcessor.cs
Implement n-gram processing with matcher priority               
+131/-0 
ResultProcessor.cs
Implement result deduplication and sorting logic                 
+102/-0 
Using.cs
Add global using statements for plugin                                     
+5/-0     
TextTokenizer.cs
Implement text preprocessing and tokenization utilities   
+63/-0   
Configuration changes
4 files
BotSharp.sln
Add FuzzySharp plugin project to solution                               
+11/-0   
BotSharp.Plugin.FuzzySharp.csproj
Create FuzzySharp plugin project file                                       
+21/-0   
WebStarter.csproj
Add FuzzySharp plugin project reference                                   
+1/-0     
appsettings.json
Register FuzzySharp plugin in application settings             
+2/-1     
Dependencies
1 file
Directory.Packages.props
Add CsvHelper and FuzzySharp package dependencies               
+3/-0     

qodo-code-review bot commented Nov 6, 2025

PR Compliance Guide 🔍

(Compliance updated until commit 3162be4)

Below is a summary of compliance checks for this PR:

Security Compliance
🔴
Detailed error disclosure

Description: The AnalyzeText endpoint returns detailed exception messages to clients on failures
(StatusCode 500 with ex.Message), which can leak sensitive internal information and aid
attackers in reconnaissance.
FuzzySharpController.cs [41-58]

Referred Code
public async Task<IActionResult> AnalyzeText([FromBody] string text)
{
    try
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return BadRequest(new { error = "Text is required" });
        }

        var result = await _phraseService.SearchPhrasesAsync(text);
        return Ok(result);
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error analyzing and searching entities");
        return StatusCode(500, new { error = $"Error analyzing and searching entities: {ex.Message}" });
    }
}
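
One common way to address this finding is to log the full exception server-side and return only a generic message plus a correlation id. The sketch below is framework-free so it can run on its own; `SafeErrors`, `ToClientError`, and the `traceId` field are hypothetical names, not part of the PR. In the controller, the returned tuple would presumably feed `StatusCode(500, new { error, traceId })`.

```csharp
using System;

public static class SafeErrors
{
    // Builds the client-facing payload. The exception text goes only to the
    // log callback; the client receives a fixed message and an opaque id.
    public static (int StatusCode, string Error, string TraceId) ToClientError(
        Exception ex, Action<string> log)
    {
        var traceId = Guid.NewGuid().ToString("N");
        log($"[{traceId}] {ex}"); // full details stay server-side
        return (500, "Error analyzing text", traceId);
    }
}
```

The id lets support staff find the matching log entry without exposing `ex.Message` to the caller.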
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Leaky error message: The controller returns a 500 with the raw exception message in the response body, exposing
internal details and lacking safe, actionable context for clients.

Referred Code
catch (Exception ex)
{
    _logger.LogError(ex, "Error analyzing and searching entities");
    return StatusCode(500, new { error = $"Error analyzing and searching entities: {ex.Message}" });
}


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Internal details exposed: The 500 response includes the exception message string in the user-facing payload,
potentially exposing internal system details.

Referred Code
    _logger.LogError(ex, "Error analyzing and searching entities");
    return StatusCode(500, new { error = $"Error analyzing and searching entities: {ex.Message}" });
}


Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: The new analyze-text endpoint performs a critical text analysis action but does not emit
structured audit logs capturing who invoked it, the action, and outcome; only an internal
error log exists on failure.

Referred Code
public async Task<IActionResult> AnalyzeText([FromBody] string text)
{
    try
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return BadRequest(new { error = "Text is required" });
        }

        var result = await _phraseService.SearchPhrasesAsync(text);
        return Ok(result);
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error analyzing and searching entities");
        return StatusCode(500, new { error = $"Error analyzing and searching entities: {ex.Message}" });
    }
}


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Log data sensitivity: Informational logs include counts, file paths, and potentially source identifiers; while
no PII is evident, the sensitivity of logged source names and paths requires verification
against policy.

Referred Code
        var terms = await LoadCsvFileAsync(filePath);
        vocabulary[source] = terms;
        _logger.LogInformation($"Loaded {terms.Count} terms for source '{source}' from {filePath}");
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, $"Error loading CSV file for source '{source}': {filePath}");
    }
}


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Minimal input validation: The API only checks for empty text and forwards raw content to processing without limits
or sanitization, lacking validation for size, encoding, or malicious payloads.

Referred Code
if (string.IsNullOrWhiteSpace(text))
{
    return BadRequest(new { error = "Text is required" });
}

var result = await _phraseService.SearchPhrasesAsync(text);
return Ok(result);
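
A size check of the kind this finding asks for could look like the following sketch. `TextInputValidator` and the 10,000-character cap are assumptions chosen for illustration; neither comes from the PR.

```csharp
#nullable enable
using System;

public static class TextInputValidator
{
    // Assumed cap, not a value from the PR; tune it to the real workload.
    public const int MaxChars = 10_000;

    // Returns false with a client-safe message when the input is empty or too long.
    public static bool TryValidate(string? text, out string error)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            error = "Text is required";
            return false;
        }
        if (text.Length > MaxChars)
        {
            error = $"Text exceeds {MaxChars} characters";
            return false;
        }
        error = string.Empty;
        return true;
    }
}
```

Rejecting oversized input before tokenization bounds the cost of n-gram generation, which grows with the number of tokens.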


Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

Previous compliance checks

Compliance check up to commit f3a0101
Security Compliance
Path traversal

Description: The service loads all .csv files from a folder name supplied via request-controlled input
and constructs file paths under AppContext.BaseDirectory without explicit sanitization,
creating a possible path traversal or arbitrary file read risk if an attacker can
influence vocabulary_folder_name.
VocabularyService.cs [147-166]

Referred Code
private async Task<Dictionary<string, string>> LoadCsvFilesFromFolderAsync(string folderName)
{
    var csvFileDict = new Dictionary<string, string>();
    var searchFolder = Path.Combine(AppContext.BaseDirectory, "data", "plugins", "fuzzySharp", folderName);
    if (!Directory.Exists(searchFolder))
    {
        _logger.LogWarning($"Folder does not exist: {searchFolder}");
        return csvFileDict;
    }

    var csvFiles = Directory.GetFiles(searchFolder, "*.csv");
    foreach (var file in csvFiles)
    {
        var fileName = Path.GetFileNameWithoutExtension(file);
        csvFileDict[fileName] = file;
    }

    _logger.LogInformation($"Loaded {csvFileDict.Count} CSV files from {searchFolder}");
    return await Task.FromResult(csvFileDict);
}
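
A typical mitigation for this class of issue is to resolve the combined path and confirm it still lies under the intended root before touching the file system. The sketch below assumes a hypothetical helper named `SafePaths.TryResolveUnderRoot`; it is not code from the PR.

```csharp
using System;
using System.IO;

public static class SafePaths
{
    // Resolves userSegment under root and rejects anything that escapes it,
    // such as "../" sequences or an absolute path.
    public static bool TryResolveUnderRoot(string root, string userSegment, out string fullPath)
    {
        fullPath = string.Empty;
        if (string.IsNullOrWhiteSpace(userSegment)) return false;

        // Normalize the root and force a trailing separator so that
        // "/data/root-evil" is not accepted as a child of "/data/root".
        var rootFull = Path.GetFullPath(root);
        if (!rootFull.EndsWith(Path.DirectorySeparatorChar))
            rootFull += Path.DirectorySeparatorChar;

        // GetFullPath collapses ".." segments; Combine returns userSegment
        // unchanged when it is rooted, so the prefix check catches both cases.
        var candidate = Path.GetFullPath(Path.Combine(rootFull, userSegment));

        // Ordinal comparison is deliberately strict; on case-insensitive file
        // systems it may reject some paths that are in fact under the root.
        if (!candidate.StartsWith(rootFull, StringComparison.Ordinal)) return false;

        fullPath = candidate;
        return true;
    }
}
```

An allowlist of known folder names would be stricter still; the containment check is the minimum needed to keep user-influenced segments inside the plugin's data directory.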
Arbitrary file read

Description: The domain term mapping file path is built from a user-controlled filename appended to a
base folder; without strict allowlisting or sanitization, this may allow reading
unintended files via crafted domain_term_mapping_file.
VocabularyService.cs [56-115]

Referred Code
public async Task<Dictionary<string, (string DbPath, string CanonicalForm)>> LoadDomainTermMappingAsync(string? filename)
{
    var result = new Dictionary<string, (string DbPath, string CanonicalForm)>();
    if (string.IsNullOrWhiteSpace(filename))
    {
        return result;
    }

    var searchFolder = Path.Combine(AppContext.BaseDirectory, "data", "plugins", "fuzzySharp");
    var filePath = Path.Combine(searchFolder, filename);

    if (string.IsNullOrEmpty(filePath) || !File.Exists(filePath))
    {
        return result;
    }

    try
    {
        using var reader = new StreamReader(filePath); 
        using var csv = new CsvReader(reader, CreateCsvConfig());



 ... (clipped 39 lines)
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

🔴
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Leaks exception info: The API returns internal exception messages to the client in the 500 response body (Error
analyzing text: {ex.Message}), potentially exposing internal details.

Referred Code
catch (Exception ex)
{
    _logger.LogError(ex, "Error analyzing text");
    return StatusCode(500, new { error = $"Error analyzing text: {ex.Message}" });
}
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: The new POST endpoint performs text analysis without recording structured audit logs for
the request, user identity, parameters used, or outcome beyond a generic info log, which
may be required for auditing critical actions.

Referred Code
[ProducesResponseType(typeof(TextAnalysisResponse), StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status400BadRequest)]
[ProducesResponseType(StatusCodes.Status500InternalServerError)]
public async Task<IActionResult> AnalyzeText([FromBody] TextAnalysisRequest request)
{
    try
    {
        if (string.IsNullOrWhiteSpace(request.Text))
        {
            return BadRequest(new { error = "Text is required" });
        }

        var result = await _textAnalysisService.AnalyzeTextAsync(request);
        return Ok(result);
    }
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Generic 500 message: The controller returns a 500 with exception message text, and service methods rely on
broad try/catch with rethrow, lacking explicit validation of external CSV inputs and
boundary cases across the pipeline.

Referred Code
catch (Exception ex)
{
    _logger.LogError(ex, "Error analyzing text");
    return StatusCode(500, new { error = $"Error analyzing text: {ex.Message}" });
}
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Potential PII logging: Informational logs include text length and flagged item counts, and error logs may capture
exception context tied to user-provided text, which could risk sensitive data exposure
depending on inputs.

Referred Code
    _logger.LogInformation(
        $"Text analysis completed in {response.ProcessingTimeMs}ms | " +
        $"Text length: {request.Text.Length} chars | " +
        $"Flagged items: {flagged.Count}");

    return response;
}
catch (Exception ex)
{
    stopwatch.Stop();
    _logger.LogError(ex, $"Error analyzing text after {stopwatch.Elapsed.TotalMilliseconds}ms");
    throw;
}
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path handling risk: CSV file and folder inputs are combined with base directories without explicit
normalization or whitelisting, which may allow unintended file access if user-controlled
values are passed.

Referred Code
private async Task<Dictionary<string, string>> LoadCsvFilesFromFolderAsync(string folderName)
{
    var csvFileDict = new Dictionary<string, string>();
    var searchFolder = Path.Combine(AppContext.BaseDirectory, "data", "plugins", "fuzzySharp", folderName);
    if (!Directory.Exists(searchFolder))
    {
        _logger.LogWarning($"Folder does not exist: {searchFolder}");
        return csvFileDict;
    }

    var csvFiles = Directory.GetFiles(searchFolder, "*.csv");
    foreach (var file in csvFiles)
    {
        var fileName = Path.GetFileNameWithoutExtension(file);
        csvFileDict[fileName] = file;
    }

    _logger.LogInformation($"Loaded {csvFileDict.Count} CSV files from {searchFolder}");
    return await Task.FromResult(csvFileDict);
}

qodo-code-review bot commented Nov 6, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Integrate data sources with the architecture

The current implementation uses local CSV files for data, which is unsuitable
for production. Refactor the data loading to integrate with the platform's
architecture by fetching data from a database or a dedicated service.

Examples:

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/VocabularyService.cs [21-115]
        public async Task<Dictionary<string, HashSet<string>>> LoadVocabularyAsync(string? foldername)
        {
            var vocabulary = new Dictionary<string, HashSet<string>>();

            if (string.IsNullOrEmpty(foldername))
            {
                return vocabulary;
            }

            // Load CSV files from the folder

 ... (clipped 85 lines)

Solution Walkthrough:

Before:

// In VocabularyService.cs
public class VocabularyService : IVocabularyService
{
    public async Task<Dictionary<string, HashSet<string>>> LoadVocabularyAsync(string? foldername)
    {
        // ...
        var searchFolder = Path.Combine(AppContext.BaseDirectory, "data", "plugins", "fuzzySharp", foldername);
        var csvFiles = Directory.GetFiles(searchFolder, "*.csv");
        // ... load from files
    }

    public async Task<Dictionary<string, (string, string)>> LoadDomainTermMappingAsync(string? filename)
    {
        // ...
        var searchFolder = Path.Combine(AppContext.BaseDirectory, "data", "plugins", "fuzzySharp");
        var filePath = Path.Combine(searchFolder, filename);
        // ... load from file
    }
}

After:

// In VocabularyService.cs
public class VocabularyService : IVocabularyService
{
    private readonly IMyDatabaseService _dbService; // or a DbContext

    public VocabularyService(IMyDatabaseService dbService)
    {
        _dbService = dbService;
    }

    public async Task<Dictionary<string, HashSet<string>>> LoadVocabularyAsync(string? vocabularySourceId)
    {
        // Fetch vocabulary from a database or central service
        var vocabularyData = await _dbService.GetVocabularyByIdAsync(vocabularySourceId);
        // ... process data into the required dictionary format
        return processedVocabulary;
    }

    public async Task<Dictionary<string, (string, string)>> LoadDomainTermMappingAsync(string? mappingSourceId)
    {
        // Fetch domain terms from a database or central service
        var mappingData = await _dbService.GetDomainTermMappingAsync(mappingSourceId);
        // ... process data into the required dictionary format
        return processedMapping;
    }
}
Suggestion importance[1-10]: 9

Why: This suggestion correctly identifies the most critical architectural issue in the PR—the reliance on local file-based data sources—which is explicitly mentioned by the author as a reason this is a proof-of-concept.

High
General
Optimize regex normalization for performance

Optimize the Normalize method for performance by pre-compiling the regular
expression and reordering the string manipulation operations.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/Matching/FuzzyMatcher.cs [74-80]

+private static readonly Regex NormalizationRegex = new Regex(@"[^\w']+", RegexOptions.Compiled);
+
 private string Normalize(string text)
 {
-    // Replace non-word characters (except apostrophes) with spaces
-    var normalized = Regex.Replace(text, @"[^\w']+", " ", RegexOptions.IgnoreCase);
-    // Convert to lowercase, collapse multiple spaces, and trim
-    return Regex.Replace(normalized.ToLowerInvariant(), @"\s+", " ").Trim();
+    // Convert to lowercase first
+    var lowerText = text.ToLowerInvariant();
+    // Replace non-word characters (except apostrophes) with a single space
+    var normalized = NormalizationRegex.Replace(lowerText, " ");
+    // Trim and collapse multiple spaces that might be introduced
+    return Regex.Replace(normalized, @"\s+", " ").Trim();
 }
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a performance bottleneck and proposes a valid optimization by pre-compiling a regex and reordering operations, which is beneficial as this method is in a hot path.

Low
Improve performance of n-gram extraction

Improve the performance of ExtractContentSpan by replacing LINQ's Skip and Take
methods with the more efficient List.GetRange.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/Processors/NgramProcessor.cs [124-132]

 private (string ContentSpan, List<string> Tokens, List<int> ContentIndices) ExtractContentSpan(
     List<string> tokens, 
     int startIdx, 
     int n)
 {
-    var span = tokens.Skip(startIdx).Take(n).ToList();
+    var span = tokens.GetRange(startIdx, n);
     var indices = Enumerable.Range(startIdx, n).ToList();
     return (string.Join(" ", span), span, indices);
 }
Suggestion importance[1-10]: 5

Why: The suggestion correctly points out that using GetRange is more performant than Skip and Take for List<T>, which is a valid optimization for code inside nested loops.
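
To show where `GetRange` sits in the nested loops mentioned above, here is a minimal sliding-window sketch. `NgramSketch.Windows` is a hypothetical helper, and the longest-first ordering is an assumption rather than the PR's confirmed behavior.

```csharp
using System.Collections.Generic;

public static class NgramSketch
{
    // Yields every contiguous window of 1..maxN tokens, longest windows first,
    // together with the index at which each window starts.
    public static IEnumerable<(int Start, string Span)> Windows(List<string> tokens, int maxN)
    {
        for (var n = maxN; n >= 1; n--)
        {
            for (var start = 0; start + n <= tokens.Count; start++)
            {
                // GetRange copies exactly n items from the backing array,
                // avoiding the enumerator walk that Skip/Take performs.
                yield return (start, string.Join(" ", tokens.GetRange(start, n)));
            }
        }
    }
}
```

For a token list of length `t`, this produces `t - n + 1` windows for each `n` that fits, so the slice call runs often enough for the cheaper copy to matter.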

Low
Learned best practice
Guard nulls and hide exception details

Add a null check for the request object and avoid returning raw exception
messages; instead return a generic error. This prevents null-reference issues
and leaking internal details.

src/Plugins/BotSharp.Plugin.FuzzySharp/Controllers/FuzzySharpController.cs [42-59]

-public async Task<IActionResult> AnalyzeText([FromBody] TextAnalysisRequest request)
+public async Task<IActionResult> AnalyzeText([FromBody] TextAnalysisRequest? request)
 {
     try
     {
-        if (string.IsNullOrWhiteSpace(request.Text))
+        if (request == null || string.IsNullOrWhiteSpace(request.Text))
         {
             return BadRequest(new { error = "Text is required" });
         }
 
         var result = await _textAnalysisService.AnalyzeTextAsync(request);
         return Ok(result);
     }
     catch (Exception ex)
     {
         _logger.LogError(ex, "Error analyzing text");
-        return StatusCode(500, new { error = $"Error analyzing text: {ex.Message}" });
+        return StatusCode(500, new { error = "Error analyzing text" });
     }
 }
Suggestion importance[1-10]: 6

Why:
Relevant best practice - Ensure nullability guards and safe fallbacks before property access and when returning error details to avoid null reference issues or leaking sensitive exception messages.

Low

@iceljc iceljc marked this pull request as draft November 6, 2025 22:04

iceljc commented Nov 6, 2025

Please remove any business related documents.

Yanan Wang and others added 9 commits November 10, 2025 09:20
@ywang1110 ywang1110 changed the title from "Add FuzzySharp-based text analysis plugin for domain-specific typo detection and entity extraction" to "Add FuzzySharp-based text analysis plugin for typo/synonym detection and entity extraction" Nov 13, 2025
@ywang1110 ywang1110 changed the title from "Add FuzzySharp-based text analysis plugin for typo/synonym detection and entity extraction" to "Add FuzzySharp-based text analysis plugin for synonym detection, typo correction and entity extraction" Nov 13, 2025
@ywang1110 ywang1110 marked this pull request as ready for review November 13, 2025 20:42
@qodo-code-review

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed


🔴
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Leaky error detail: The endpoint returns the internal exception message to the client in the 500 response,
exposing internal details.

Referred Code
    _logger.LogError(ex, "Error analyzing and searching entities");
    return StatusCode(500, new { error = $"Error analyzing and searching entities: {ex.Message}" });
}


Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing auditing: The new analyze-text endpoint performs analysis without emitting audit logs for the
critical API action (who called, what input length, outcome), making event reconstruction
unclear.

Referred Code
[HttpPost("fuzzy-sharp/analyze-text")]
[ProducesResponseType(typeof(List<SearchPhrasesResult>), StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status400BadRequest)]
[ProducesResponseType(StatusCodes.Status500InternalServerError)]
public async Task<IActionResult> AnalyzeText([FromBody] string text)
{
    try
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return BadRequest(new { error = "Text is required" });
        }

        var result = await _phraseService.SearchPhrasesAsync(text);
        return Ok(result);
    }


Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Generic 500 error: The controller returns a 500 with exception message concatenated into the user response
and lacks validation of large inputs or bounds for parameters, indicating incomplete
edge-case handling.

Referred Code
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error analyzing and searching entities");
        return StatusCode(500, new { error = $"Error analyzing and searching entities: {ex.Message}" });
    }
}


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Potential PII logs: Logging statements may include file paths and counts, and the controller logs errors with
exceptions; without redaction policies, vocabulary contents or terms from external CSV may
inadvertently surface in logs.

Referred Code
            var terms = await LoadCsvFileAsync(filePath);
            vocabulary[source] = terms;
            _logger.LogInformation($"Loaded {terms.Count} terms for source '{source}' from {filePath}");
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, $"Error loading CSV file for source '{source}': {filePath}");
        }
    }

    return vocabulary;
}

[SharpCache(60)]
public async Task<Dictionary<string, (string DbPath, string CanonicalForm)>> LoadSynonymMappingAsync()
{
    string filename = "";
    var result = new Dictionary<string, (string DbPath, string CanonicalForm)>();
    if (string.IsNullOrWhiteSpace(filename))
    {
        return result;


 ... (clipped 47 lines)


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Minimal validation: The analyze-text API only checks for empty text and does not enforce length limits, rate
limits, or validate CSV-driven vocabulary inputs, which could allow resource abuse or
unsafe data handling.

Referred Code
if (string.IsNullOrWhiteSpace(text))
{
    return BadRequest(new { error = "Text is required" });
}

var result = await _phraseService.SearchPhrasesAsync(text);
return Ok(result);



@qodo-code-review

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Plugin configuration is missing

The plugin is non-functional as it fails to load vocabulary and synonym data due
to hardcoded empty file paths in CsvPhraseCollectionLoader. Implement a
configuration mechanism, such as using appsettings.json, to provide the
necessary file paths.

Examples:

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/CsvPhraseCollectionLoader.cs [23-28]
        string foldername = "";
        var vocabulary = new Dictionary<string, HashSet<string>>();

        if (string.IsNullOrEmpty(foldername))
        {
            return vocabulary;
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/CsvPhraseCollectionLoader.cs [59-63]
        string filename = "";
        var result = new Dictionary<string, (string DbPath, string CanonicalForm)>();
        if (string.IsNullOrWhiteSpace(filename))
        {
            return result;

Solution Walkthrough:

Before:

// In CsvPhraseCollectionLoader.cs
public class CsvPhraseCollectionLoader : IPhraseCollection
{
    public async Task<Dictionary<string, HashSet<string>>> LoadVocabularyAsync()
    {
        string foldername = ""; // Hardcoded empty string
        var vocabulary = new Dictionary<string, HashSet<string>>();

        if (string.IsNullOrEmpty(foldername))
        {
            return vocabulary; // Always returns empty
        }
        // ... logic to load vocabulary ...
    }

    public async Task<Dictionary<string, (string, string)>> LoadSynonymMappingAsync()
    {
        string filename = ""; // Hardcoded empty string
        if (string.IsNullOrWhiteSpace(filename))
        {
            return new Dictionary<string, (string, string)>(); // Always returns empty
        }
        // ... logic to load synonyms ...
    }
}

After:

// In CsvPhraseCollectionLoader.cs
public class CsvPhraseCollectionLoader : IPhraseCollection
{
    private readonly MyPluginSettings _settings;

    public CsvPhraseCollectionLoader(MyPluginSettings settings, ...)
    {
        _settings = settings;
    }

    public async Task<Dictionary<string, HashSet<string>>> LoadVocabularyAsync()
    {
        string foldername = _settings.VocabularyFolderName; // From config
        if (string.IsNullOrEmpty(foldername))
        {
            return new Dictionary<string, HashSet<string>>();
        }
        // ... logic to load vocabulary ...
    }

    public async Task<Dictionary<string, (string, string)>> LoadSynonymMappingAsync()
    {
        string filename = _settings.SynonymMappingFile; // From config
        if (string.IsNullOrWhiteSpace(filename))
        {
            return new Dictionary<string, (string, string)>();
        }
        // ... logic to load synonyms ...
    }
}
Suggestion importance[1-10]: 9

Why: The suggestion correctly identifies a critical flaw where hardcoded empty file paths in CsvPhraseCollectionLoader prevent the plugin from loading any data, rendering it non-functional.

High
Learned best practice
Replace ContinueWith with async/await

Prefer async/await over ContinueWith to avoid sync waits, improve readability,
and ensure proper exception/stack propagation. Await the analysis task and then
map the results.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/PhraseService.cs [31-47]

-public Task<List<SearchPhrasesResult>> SearchPhrasesAsync(string term)
+public async Task<List<SearchPhrasesResult>> SearchPhrasesAsync(string term)
 {
     var request = BuildTextAnalysisRequest(term);
-    var response = AnalyzeTextAsync(request);
-    return response.ContinueWith(t =>
+    var response = await AnalyzeTextAsync(request);
+    var results = response.Flagged.Select(f => new SearchPhrasesResult
     {
-        var results = t.Result.Flagged.Select(f => new SearchPhrasesResult
-        {
-            Token = f.Token,
-            Sources = f.Sources,
-            CanonicalForm = f.CanonicalForm,
-            MatchType = f.MatchType,
-            Confidence = f.Confidence
-        }).ToList();
-        return results;
-    });
+        Token = f.Token,
+        Sources = f.Sources,
+        CanonicalForm = f.CanonicalForm,
+        MatchType = f.MatchType,
+        Confidence = f.Confidence
+    }).ToList();
+    return results;
 }
Suggestion importance[1-10]: 6

Why:
Relevant best practice - Use appropriate async patterns: avoid synchronous waits or ContinueWith on tasks in async flows; prefer async/await for clarity and correct exception propagation.
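
The exception-propagation difference can be seen in a small standalone sketch. `AsyncPatterns` and its members are hypothetical and only mirror the shape of the two variants in the diff above; `FailAsync` stands in for a failing analysis call.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncPatterns
{
    // Stand-in for an analysis call that fails asynchronously.
    private static async Task<int> FailAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("boom");
    }

    // ContinueWith variant: reading t.Result on a faulted task throws an
    // AggregateException that wraps the original error.
    public static Task<int> ViaContinueWith() =>
        FailAsync().ContinueWith(t => t.Result + 1);

    // async/await variant: the original exception is rethrown unwrapped.
    public static async Task<int> ViaAwait() =>
        await FailAsync() + 1;

    // Reports the type name of the exception a caller would catch.
    public static async Task<string> ObservedExceptionAsync(Func<Task<int>> call)
    {
        try
        {
            await call();
            return "none";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }
}
```

A caller with a `catch (InvalidOperationException)` block would handle the failure from `ViaAwait` but miss the one from `ViaContinueWith`, which is the practical reason to prefer `await` here.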

Low

@Oceania2018 Oceania2018 merged commit 1d28e8e into SciSharp:master Nov 13, 2025
4 checks passed