This tool provides an API to phoneticize Japanese paragraphs and novels, via the Japanese morphological analyzer kuromoji.
The project is named after Yumina Ernier Belfast in light novel In Another World With My Smartphone.
You may also download novels from 小説家になろう and save them to a local database, where you could update them on a timely basis.
In addition, downloaded novels can also be phoneticized and exported to EPUB documents. However, the EPUB templates ought to be modified before utilization.
-
kuromoji.jarDownload at https://github.com/atilika/kuromoji/downloads.
-
TokenizerCaller.classCompile with command
javac -encoding utf-8 -classpath kuromoji.jar TokenizerCaller.java. Must havekuromoji.jarin advance. -
7z / 7z.exeYou must have a valid 7-zip distribution in order to create EPUB documents. You can download them here: http://7-zip.org/download.html
-
/templatesThe default templates are provided in the project.
These files must be present under the working directory.
The entire library is guranteed to be thread-safe.
-
Phoneticized Article Interchange Format (PAIF)
[ ('line', [ ('regular', '「というわけで、お'), ('phonogram', '前', 'マエ'), ('regular', 'さんは'), ('phonogram', '死', ' シ'), ('regular', 'んでしまった。'), ('phonogram', '本当', 'ホントウ'), ('regular', 'に'), ('phonogram', '申', 'モウ'), ('regular', 'し'), ('phonogram', '訳', 'ワケ'), ('regular', 'ない」') ]), ('break',), ('line', [ ('regular', '「はあ」') ]) ] -
renderer.PhonogramRendererClass initializes a kuromoji library and communicates with it in order to add phonograms to the sentence. This class is thread-safe.
-
.convert(content, hiragana=False)content: A string to phoneticize, must not contain line breaks.
hiragana: Set to True to represent phonograms with hiragana, otherwise would use katakana.
-
.close()Terminate the kuromoji library and discard this renderer instance.
-
-
renderer.get_phonogram_renderer()
Returns a shared renderer.PhonogramRenderer instance.
Alias: get_phonogram_renderer(...)
renderer.phoneticize(article, phonogram_renderer=None, hiragana=False)
Phoneticize an article and returns a PAIF object.
-
articleA PAIF object, must not be processed in advance.
-
phonogram_rendererIf not given, would assign one of its own.
-
hiraganaUse hiragana instead of katakana while phoneticizing.
Alias: phoneticize(...)
syosetu.get_chapter_list(web_id)
Returns a list of chapters, taking the following forms:
'chapter_title', chapter_id, title'subtitle', subtitle_id, subtitle
For example:
[('chapter_title', 1, '異世界来訪。'), ('subtitle', 1, '死亡、そして復活。'), ('subtitle', 2, '目覚め、そして異世界。'), ...]
syosetu.get_chapter(web_id, chap_id)
Returns a PAIF object, unprocessed.
-
web_idWeb novel ID, e.g.
n1443bp. -
chap_idChapter ID, an integer.
-
syosetu.SyosetuDatabaseClass manages novel chapters stored in a SQLite database.
-
.__init__(filename, syosetu_id, force_clear=False)-
filenameDatabase filename.
-
syosetu_idWeb novel ID, e.g.
n1443bp. -
force_clearIf set to True, would reset the database.
-
-
.get_contents()Functions like
syosetu.get_chapter_list(), but offline. -
.get_chapter_title(typ, num)Finds the title of the entry in TOC matching
typandnum. -
.get_contents_chapters_id()Get all subtitle IDs in a list.
-
.update_contents()Update offline database TOC.
-
.get_chapter(chap_id)Returns two PAIF objects, one is the original data and the other is processed.
-
.has_chapter(chap_id)Returns if this chapter exists in database.
-
.update_chapter(chap_id, phonogram_renderer=None)Downloads chapter, processes and stores into database.
-
.update_all(phonogram_renderer=None, display_progress_bar=False)Downloads entire novel to local database.
-
.commit()Flush changes to file.
-
.close()Close database.
-
-
epub.export_epub(database, display_progress_bar=False)Export EPUB file with given templates and database.
- can't convert raw string to PAIF object
- easier customizations of EPUB templates
-
don't have a morphological analyzer of my own -
this light novel doesn't make any sense...