
Optimize str.count method and do analysis against cpython #2355

Open
@OculusMode

Description

The current implementation of str.count is essentially the naive algorithm, with a time complexity of O(n*m), where n is the length of the string and m is the length of the substring to be found. It is most likely slower than CPython's implementation, which has a much lower average-case time complexity.
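For reference, here is a minimal sketch of what a naive count looks like. It is written in plain Python for illustration only and is not the project's actual code; the name `naive_count` is hypothetical. It assumes the same semantics as `str.count` (non-overlapping matches, and an empty substring matching at every position).

```python
def naive_count(haystack: str, needle: str) -> int:
    """Count non-overlapping occurrences of needle, mirroring str.count."""
    n, m = len(haystack), len(needle)
    if m == 0:
        # str.count reports len(haystack) + 1 matches for an empty needle.
        return n + 1
    count = 0
    i = 0
    while i <= n - m:
        # Compare the needle against the window starting at i, one
        # character at a time. This inner loop is the O(m) factor.
        j = 0
        while j < m and haystack[i + j] == needle[j]:
            j += 1
        if j == m:
            count += 1
            i += m  # jump past the match so occurrences never overlap
        else:
            i += 1  # naive step: always advance by exactly one position
    return count
```

The outer loop visits up to n positions and the inner loop can run up to m steps at each one, which is where the O(n*m) bound comes from. An input such as `"a" * n` searched for `"a" * (m - 1) + "b"` hits that bound.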

Implement a better algorithm and include an analysis of it to confirm that the new implementation is faster than CPython's.
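One possible shape for that analysis is a small timing harness. The sketch below is an assumption about how such a comparison could be set up, not an agreed benchmark: `find_loop_count` and `compare_with_builtin` are hypothetical names, and the candidate shown is only a stand-in for whatever implementation ends up being tested.

```python
import timeit


def find_loop_count(haystack: str, needle: str) -> int:
    """Hypothetical stand-in candidate: count via repeated str.find calls."""
    if not needle:
        return len(haystack) + 1
    count = 0
    i = haystack.find(needle)
    while i != -1:
        count += 1
        i = haystack.find(needle, i + len(needle))
    return count


def compare_with_builtin(candidate, haystack: str, needle: str, number: int = 100):
    """Time candidate(haystack, needle) against the built-in haystack.count(needle)."""
    # Refuse to benchmark a candidate that returns the wrong answer.
    if candidate(haystack, needle) != haystack.count(needle):
        raise ValueError("candidate disagrees with str.count")
    t_candidate = timeit.timeit(lambda: candidate(haystack, needle), number=number)
    t_builtin = timeit.timeit(lambda: haystack.count(needle), number=number)
    # Total seconds over `number` runs for each implementation.
    return {"candidate": t_candidate, "builtin": t_builtin}
```

A call such as `compare_with_builtin(find_loop_count, "ab" * 100_000, "ba")` returns both timings. A meaningful analysis would repeat this over several input shapes (short and long needles, many matches and none, adversarial repeats), since the algorithms differ most in their worst cases.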

References for algorithms: https://en.wikipedia.org/wiki/String-searching_algorithm
Implementation in CPython itself: https://github.com/python/cpython/blob/main/Objects/stringlib/fastsearch.h

The crux of what CPython does, taken from the file above:

/* fast search/count implementation, based on a mix between boyer-
moore and horspool, with a few more bells and whistles on the top.
for some more background, see:
https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm */

/* note: fastsearch may access s[n], which isn't a problem when using
Python's ordinary string types, but may cause problems if you're
using this code in other contexts. also, the count mode returns -1
if there cannot possibly be a match in the target string, and 0 if
it has actually checked for matches, but didn't find any. callers
beware! */

/* If the strings are long enough, use Crochemore and Perrin's Two-Way
algorithm, which has worst-case O(n) runtime and best-case O(n/k).
Also compute a table of shifts to achieve O(n/k) in more cases,
and often (data dependent) deduce larger shifts than pure C&P can
deduce. See stringlib_find_two_way_notes.txt in this folder for a
detailed explanation. */
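To make the Horspool idea mentioned in those comments concrete, here is a simplified sketch of a Boyer-Moore-Horspool count in plain Python. It is not CPython's code: fastsearch.h adds further refinements and switches to the Two-Way algorithm for long inputs, none of which is reproduced here. The name `horspool_count` is hypothetical, and the dictionary-based shift table is an illustrative choice.

```python
def horspool_count(haystack: str, needle: str) -> int:
    """Count non-overlapping occurrences using Horspool's bad-character shift."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return n + 1  # same convention as str.count
    if m > n:
        return 0
    # For each character in needle[:-1], record the distance from its last
    # occurrence to the end of the needle. Characters not in the table
    # allow a full shift of m.
    shift = {}
    for k in range(m - 1):
        shift[needle[k]] = m - 1 - k
    count = 0
    i = 0
    while i <= n - m:
        # Slicing copies the window; a real implementation would compare in place.
        if haystack[i:i + m] == needle:
            count += 1
            i += m  # non-overlapping: resume right after the match
        else:
            # Shift according to the character aligned with the needle's last position.
            i += shift.get(haystack[i + m - 1], m)
    return count
```

When the window's last character does not appear in the needle at all, the search skips m positions at once, which is the source of the O(n/k)-style best case. The worst case of plain Horspool is still O(n*m), which is why the quoted comments describe falling back to Two-Way for long strings.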
