Description
The current implementation of `str.count` is essentially the naive algorithm, with O(n*m) time complexity in all cases, where n is the length of the string and m is the length of the substring being searched for. This is almost certainly slower than CPython's implementation, which has a much lower average-case time complexity.
Implement a better algorithm, with an analysis demonstrating that the implementation is competitive with (or faster than) CPython's.
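For reference, a minimal sketch of the naive algorithm described above (the function name and Python phrasing are illustrative, not the actual implementation): it compares the pattern against every window of the string, doing up to m character comparisons at each of roughly n positions, hence O(n*m).

```python
def naive_count(s, p):
    """Naive count of non-overlapping occurrences of p in s,
    matching Python's str.count semantics. Worst case O(n*m)."""
    n, m = len(s), len(p)
    if m == 0:
        return n + 1  # str.count("") counts n + 1 empty matches
    count = i = 0
    while i + m <= n:
        if s[i:i + m] == p:  # up to m comparisons at every position
            count += 1
            i += m  # non-overlapping: skip past the match
        else:
            i += 1
    return count
```

A pattern like `"aaab"` searched in `"aaaa...a"` hits the worst case: nearly every window matches m-1 characters before failing.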
References for candidate algorithms: https://en.wikipedia.org/wiki/String-searching_algorithm
CPython's own implementation: https://github.com/python/cpython/blob/main/Objects/stringlib/fastsearch.h
Key excerpts from that file summarizing what CPython does:
```c
/* fast search/count implementation, based on a mix between boyer-
   moore and horspool, with a few more bells and whistles on the top.
   for some more background, see:
   https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm */

/* note: fastsearch may access s[n], which isn't a problem when using
   Python's ordinary string types, but may cause problems if you're
   using this code in other contexts. also, the count mode returns -1
   if there cannot possibly be a match in the target string, and 0 if
   it has actually checked for matches, but didn't find any. callers
   beware! */

/* If the strings are long enough, use Crochemore and Perrin's Two-Way
   algorithm, which has worst-case O(n) runtime and best-case O(n/k).
   Also compute a table of shifts to achieve O(n/k) in more cases,
   and often (data dependent) deduce larger shifts than pure C&P can
   deduce. See stringlib_find_two_way_notes.txt in this folder for a
   detailed explanation. */
```
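To illustrate the "bad character shift" idea behind the Boyer-Moore/Horspool mix mentioned above, here is a hedged Python sketch of a Horspool-style count (function name and structure are assumptions for illustration; CPython's actual code is more elaborate, and the Two-Way algorithm it uses for long strings is not shown). When the last character of the current window does not appear near the end of the pattern, the window can skip ahead several positions at once, giving sublinear behavior on typical inputs:

```python
def horspool_count(s, p):
    """Count non-overlapping occurrences of p in s using a
    Horspool-style bad-character shift table. Sketch only:
    not CPython's actual implementation."""
    n, m = len(s), len(p)
    if m == 0:
        return n + 1  # str.count("") semantics
    if m > n:
        return 0
    # shift[c]: how far the window may slide when its last character is c.
    # Later occurrences in p[:-1] overwrite earlier ones, keeping the
    # rightmost position, as Horspool requires.
    shift = {c: m - 1 - i for i, c in enumerate(p[:-1])}
    count = 0
    i = 0
    while i + m <= n:
        if s[i:i + m] == p:
            count += 1
            i += m  # non-overlapping: jump past the match
        else:
            # slide by the precomputed shift for the window's last char,
            # or by the full pattern length if that char is not in p[:-1]
            i += shift.get(s[i + m - 1], m)
    return count
```

Average-case this skips large chunks of the string (best case roughly O(n/m)), though the worst case remains O(n*m); that is why CPython switches to Two-Way, with its O(n) worst-case guarantee, for longer inputs.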