ErdaradunGaztea

In many languages (including Polish), a list of all words with the number of syllables for each is superfluous. A few rules are enough to compute the number of syllables correctly for >99% of words:

  • In general, one vowel = one syllable.
  • If "i" directly precedes any other vowel, it is not counted.
  • A repeated vowel also counts as a single syllable (e.g. "czeeeemuuuu", translated as "whyyyy").
  • "au" and "eu" are the complicated ones (though, luckily, they are quite rare):
    • In most cases "u" is pronounced as "ł", so it's a single syllable (this is what I implemented; I omitted the other cases because of their difficulty).
    • However, if "au" or "eu" falls on the border between the root and a prefix/suffix, then these vowels are pronounced separately.
    • And there are a few exceptions (Wiktionary lists them as: nauka, nauczka, nauczanie, nauczenie, laurka, Zeus, Dzeus, Seul, neutron, neutralny -- not counting non-lemma forms) where, again, "au" or "eu" are pronounced as separate vowels.
  • All other vowel sequences are pronounced separately (so back to point one).
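
The rules above can be sketched in a few lines. This is only an illustration in Python, not the R code from this PR; the function name count_syllables_pl and the order of the three steps are assumptions made for the example. Like the implementation described above, it treats every "au" and "eu" as a single syllable and ignores the exceptions.

```python
import re
import unicodedata

VOWELS = "aąeęioóuy"  # Polish vowel letters


def count_syllables_pl(word: str) -> int:
    """Rule-based Polish syllable count (hypothetical helper, no exception handling)."""
    # Normalise so that letters such as "ą" are single code points.
    w = unicodedata.normalize("NFC", word).lower()
    # A repeated vowel counts once: "czeeeemuuuu" -> "czemu".
    w = re.sub(rf"([{VOWELS}])\1+", r"\1", w)
    # "i" directly before another vowel is not counted: "pies" -> "pes".
    w = re.sub(r"i(?=[aąeęoóuy])", "", w)
    # "au" and "eu" form one syllable; every other vowel is its own syllable.
    return len(re.findall(rf"au|eu|[{VOWELS}]", w))


print(count_syllables_pl("czeeeemuuuu"))  # 2
print(count_syllables_pl("pies"))         # 1
print(count_syllables_pl("auto"))         # 2
print(count_syllables_pl("nauka"))        # 2, although "nauka" is one of the listed exceptions
```

The last call shows the known limitation: "nauka" is on the exception list, so a pure regex approach undercounts it.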

I had to rewrite the code a bit to allow a different set of syllable rules to be used (i.e. a different regex). Moreover:

  • Replaced sapply() with vapply() (the former's return type depends on its input, so it simplifies unpredictably).
  • Set the default dictionary (used when the language is not "en") to an empty named integer vector, so that the code doesn't try to subset NULL later on (which returned complete garbage).
  • Covered Polish examples with tests. Some are skipped because the full "au" and "eu" rules can't be implemented with just a regex and will require a dictionary of exceptions.
  • Updated the README to reflect the new rules for Polish.
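
For the skipped tests, one possible shape for the exception handling is sketched below. It is not part of this PR and is written in Python rather than R; the names SPLIT_EXCEPTIONS, count_by_rules and count_syllables are hypothetical. It assumes the exceptions are stored as a set of lowercase lemmas (the ones Wiktionary lists) and that for those words "au" and "eu" are simply counted as two vowels.

```python
import re

VOWELS = "aąeęioóuy"  # Polish vowel letters

# Lemmas listed in the PR description where "au"/"eu" are separate vowels.
# Inflected (non-lemma) forms are NOT covered by this sketch.
SPLIT_EXCEPTIONS = {
    "nauka", "nauczka", "nauczanie", "nauczenie", "laurka",
    "zeus", "dzeus", "seul", "neutron", "neutralny",
}


def count_by_rules(word: str, merge_diphthongs: bool = True) -> int:
    """Count syllables by rule; optionally treat "au"/"eu" as one syllable."""
    w = word.lower()
    w = re.sub(rf"([{VOWELS}])\1+", r"\1", w)   # repeated vowel counts once
    w = re.sub(r"i(?=[aąeęoóuy])", "", w)       # "i" before another vowel is dropped
    pattern = rf"au|eu|[{VOWELS}]" if merge_diphthongs else rf"[{VOWELS}]"
    return len(re.findall(pattern, w))


def count_syllables(word: str) -> int:
    """Consult the exception set first, then fall back to the regex rules."""
    is_exception = word.lower() in SPLIT_EXCEPTIONS
    return count_by_rules(word, merge_diphthongs=not is_exception)


print(count_syllables("nauka"))   # 3
print(count_syllables("auto"))    # 2
print(count_syllables("Zeus"))    # 2
```

A set lookup like this would still miss the prefix/suffix boundary case, which needs morphological information rather than a word list.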

I tried to stick to your code style, and I believe I succeeded. Should I add myself as a contributor?
Hopefully this will make it easier to implement other languages too!

@codecov

codecov bot commented May 16, 2022

Codecov Report

Merging #12 (7d7bc65) into master (124d2cf) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 7d7bc65 differs from pull request most recent head 7735a6e. Consider uploading reports for the commit 7735a6e to get more accurate results

@@            Coverage Diff            @@
##            master       #12   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            1         2    +1     
  Lines           22        35   +13     
=========================================
+ Hits            22        35   +13     
Impacted Files        Coverage Δ
R/nsyllable.R         100.00% <100.00%> (ø)
R/syllable_rules.R    100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 124d2cf...7735a6e. Read the comment docs.

@ErdaradunGaztea
Author

I don't have access to a computer right now, but there seems to be a solution to the failing R CMD check on Ubuntu devel: https://community.rstudio.com/t/github-action-failure-with-rcmd-check-on-ubuntu-devel/129727. I'll try to implement it later today.

@ErdaradunGaztea
Author

I'm not sure how many differences there are between v2 check-standard and the check you used earlier, but I hope they are similar enough not to break your workflow. And, hopefully, this one works on Ubuntu devel too.
