The file main.py implements the process of finding metaphors in a text, in three steps:
- Text Segmentation
- Finding candidates for the metaphors: a candidate is a pair of words, either adjective-noun or verb-noun
- Labeling the metaphors: Is a candidate metaphorical or literal?
Arguments:
- -ml or --labeler followed by an ID: choose the metaphor labeling method:
    - darkthoughts
    - cluster
    - kmeans
    - Default value: darkthoughts
- -cf or --finder followed by an ID: choose the candidate finding method:
    - adjNoun
    - verbNoun
    - Default value: adjNoun
- -v or --verbose
    - Print the different steps of the process
    - Default value: False
- -f or --file followed by a file name: look for metaphors in a text file
- -s or --string followed by a string: look for metaphors in a specified string
- -cg or --cgenerator:
    - Useful when combined with an Excel or CSV file. Uses the word pairs in the file as candidates instead of looking for candidates in the annotated text
    - Default value: False
If no string or text file is specified in the command line then a default text is used.
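The arguments above could be parsed with the standard argparse module. This is a minimal sketch, not the code in main.py: the flag names and defaults come from the list above, while the function name build_parser, the restriction of IDs through choices, and the treatment of -v and -cg as on/off switches are assumptions.

```python
import argparse


def build_parser():
    # Flag names and defaults follow the argument list; the rest is assumed.
    parser = argparse.ArgumentParser(description="Find metaphors in a text")
    parser.add_argument("-ml", "--labeler", default="darkthoughts",
                        choices=["darkthoughts", "cluster", "kmeans"],
                        help="metaphor labeling method")
    parser.add_argument("-cf", "--finder", default="adjNoun",
                        choices=["adjNoun", "verbNoun"],
                        help="candidate finding method")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print the different steps of the process")
    parser.add_argument("-f", "--file", help="text file to analyse")
    parser.add_argument("-s", "--string", help="string to analyse")
    parser.add_argument("-cg", "--cgenerator", action="store_true",
                        help="use the word pairs in the file as candidates")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-s", "She had dark thoughts", "-v"])
print(args.labeler, args.finder, args.verbose, args.string)
```

When neither args.file nor args.string is set, the caller would fall back to the default text.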
- Parsing the command line
- Creating a hash-table
    - Adding the candidate finder functions
    - Adding the metaphor labeler functions
- Initializing the text either from:
    - Default text - defined in modules/utils.py
    - A string - written in the command line
    - A file - path in the command line
- Creating the object MetaphorIdentification
The AnnotatedText is created from the raw text using the nltk.word_tokenize() function. The part of speech and the lemma of each word are also determined with NLTK functions: nltk.pos_tag and nltk.WordNetLemmatizer.
Call the procedure MetaphorIdentification.findCandidates()
Call the procedure MetaphorIdentification.labelMetaphors()
Defined in /new_structure/modules/datastructs/registry.py.
Identifying metaphors in a text takes at least two steps: candidate identification and labeling. Each step can be performed in several ways, and each method must be registered in the metaphorRegistry defined in /sample/modules/registry.py
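As an illustration of the registration idea, a registry can be as small as two dictionaries that map an ID to a function. The class below is a sketch under assumptions: the class name MetaphorRegistry, its method names, and the placeholder functions are hypothetical and may not match registry.py.

```python
class MetaphorRegistry:
    """Hypothetical registry: two hash-tables (dicts) mapping an ID to a function."""

    def __init__(self):
        self.candidateFinders = {}
        self.metaphorLabelers = {}

    def addCandidateFinder(self, name, function):
        self.candidateFinders[name] = function

    def addMetaphorLabeler(self, name, function):
        self.metaphorLabelers[name] = function

    def getCandidateFinder(self, name):
        return self.candidateFinders[name]

    def getMetaphorLabeler(self, name):
        return self.metaphorLabelers[name]


# Placeholder functions standing in for the real methods.
def adjNounFinder(annotatedText):
    return []


def darkthoughtsLabeler(candidates, cand_type, verbose=False):
    return []


metaphorRegistry = MetaphorRegistry()
metaphorRegistry.addCandidateFinder("adjNoun", adjNounFinder)
metaphorRegistry.addMetaphorLabeler("darkthoughts", darkthoughtsLabeler)

# Look up the functions by the IDs given on the command line.
finder = metaphorRegistry.getCandidateFinder("adjNoun")
labeler = metaphorRegistry.getMetaphorLabeler("darkthoughts")
```

With this layout, an unregistered ID raises a KeyError at lookup time, before any text is processed.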
Defined in /new_structure/modules/datastructs/MetaphorIdentification.py.
It has four fields:
- rawText: string
- annotatedText: class AnnotatedText from modules/datastructs/annotated_text.py
- candidates: class CandidateGroup from modules/datastructs/candidate_group.py
- metaphors: class MetaphorGroup from modules/datastructs/labeled_metaphor_list.py
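The sketch below shows how the four fields could be filled in over the life of the object. Only the field names and the two procedure names come from this document; the constructor arguments, the method parameters, and the toy stand-in functions (which replace NLTK and the real finder and labeler) are assumptions made so the example runs on its own.

```python
class MetaphorIdentification:
    """Sketch of the four fields; the method signatures are assumptions."""

    def __init__(self, rawText, annotate):
        self.rawText = rawText
        self.annotatedText = annotate(rawText)  # AnnotatedText in the real code
        self.candidates = None                  # CandidateGroup once found
        self.metaphors = None                   # MetaphorGroup once labeled

    def findCandidates(self, finder):
        self.candidates = finder(self.annotatedText)

    def labelMetaphors(self, labeler, cand_type, verbose=False):
        self.metaphors = labeler(self.candidates, cand_type, verbose)


# Toy stand-ins so the sketch runs without NLTK or the real modules.
def toyAnnotate(text):
    return text.split()


def toyFinder(words):
    # Pair each word with the next one; a real finder would use POS tags.
    return list(zip(words, words[1:]))


def toyLabeler(candidates, cand_type, verbose=False):
    # Mark every pair as literal (False).
    return [(pair, False) for pair in candidates]


mi = MetaphorIdentification("She had dark thoughts", toyAnnotate)
mi.findCandidates(toyFinder)
mi.labelMetaphors(toyLabeler, "adjNoun")
print(mi.metaphors)
```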
##How to Add a New Metaphor-Labeling Function

Your function must be defined in a new file in the modules folder.
The input of the function must be:
- candidates
    - Type: Object of class CandidateGroup
- cand_type:
    - Type: string
    - Value: "adjNoun" or "verbNoun"
    - Usage: Corresponds to a database
- verbose:
    - Type: Boolean
    - Usage: Display some information if its value is True
The output of the function must be an object of class MetaphorGroup
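A labeling function with this input and output could look like the sketch below. The function name baselineLabeler and its behaviour (marking every candidate as literal with a placeholder confidence) are hypothetical, and the two classes are simplified stand-ins for the real Metaphor and MetaphorGroup so that the example is self-contained.

```python
class Metaphor:
    """Stand-in for the real class: a candidate, a boolean result, a confidence."""

    def __init__(self, candidate, result, confidence):
        self.candidate = candidate
        self.result = result
        self.confidence = confidence


class MetaphorGroup:
    """Stand-in for the real class: a list of metaphors and its size."""

    def __init__(self):
        self.metaphors = []
        self.size = 0

    def addMetaphor(self, metaphor):
        self.metaphors.append(metaphor)
        self.size += 1


def baselineLabeler(candidates, cand_type, verbose=False):
    """Hypothetical labeler: marks every candidate as literal."""
    if cand_type not in ("adjNoun", "verbNoun"):
        raise ValueError("cand_type must be 'adjNoun' or 'verbNoun'")
    results = MetaphorGroup()
    for candidate in candidates:
        # False means literal; 0.0 is a placeholder confidence.
        results.addMetaphor(Metaphor(candidate, False, 0.0))
    if verbose:
        print("Labeled", results.size, cand_type, "candidates")
    return results
```

The function only iterates over candidates, so any object that supports iteration works here.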
Class CandidateGroup:
- Variables
    - candidates: list of objects of class Candidate
    - size: number of elements in the list above
- Methods
    - addCandidate(candidate): Add the element candidate to the list candidates and increment the variable size by 1
    - getCandidate(index): Return the candidate at position index in the list candidates
    - __iter__()
    - __str__()
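A minimal sketch of a container with these variables and methods follows. The variable and method names are taken from the list above; the bodies of __iter__ and __str__ are assumptions, since their behaviour is not described here.

```python
class CandidateGroup:
    """Sketch of the container described above."""

    def __init__(self):
        self.candidates = []
        self.size = 0

    def addCandidate(self, candidate):
        self.candidates.append(candidate)
        self.size += 1

    def getCandidate(self, index):
        return self.candidates[index]

    def __iter__(self):
        # Assumed: iterate over the candidates in insertion order.
        return iter(self.candidates)

    def __str__(self):
        # Assumed format: one candidate per line.
        return "\n".join(str(c) for c in self.candidates)


group = CandidateGroup()
group.addCandidate("dark thoughts")
group.addCandidate("bright idea")
for candidate in group:
    print(candidate)
```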
Class MetaphorGroup:
- Variables
    - metaphors: list of objects of class Metaphor
    - size: number of elements in the list above
- Methods
    - addMetaphor(metaphor): Add the element metaphor to the list metaphors and increment the variable size by 1
    - getMetaphor(index): Return the metaphor at position index in the list metaphors
    - writeToCSV()
    - __iter__()
    - __str__()
Class Candidate:
- Variables
    - annotatedText: object of class AnnotatedText
    - sourceIndex: index of the source in the annotatedText
    - sourceSpan: 2-tuple = (index of the first word in the source, index of the last word in the source)
    - targetIndex: index of the target in the annotatedText
    - targetSpan: 2-tuple = (index of the first word in the target, index of the last word in the target)
- Methods
    - getSource(): return the first word of the source
    - getTarget(): return the first word of the target
    - getFullSource()
    - getFullTarget()
    - __stringAdder(): used in the getFull... functions
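To make the index and span fields concrete, here is a sketch of a candidate over a plain list of words. It rests on several assumptions: the annotated text is replaced by a list of strings, spans are treated as inclusive on both ends, getSource() and getTarget() read the first index of the span, and __stringAdder takes a span and joins its words with spaces. The real class may differ on each point.

```python
class Candidate:
    """Sketch of a candidate; annotatedText is a plain list of words here."""

    def __init__(self, annotatedText, sourceIndex, sourceSpan,
                 targetIndex, targetSpan):
        self.annotatedText = annotatedText
        self.sourceIndex = sourceIndex
        self.sourceSpan = sourceSpan
        self.targetIndex = targetIndex
        self.targetSpan = targetSpan

    def __stringAdder(self, span):
        # Assumed helper: join the words from span[0] to span[1], inclusive.
        first, last = span
        return " ".join(self.annotatedText[first:last + 1])

    def getSource(self):
        return self.annotatedText[self.sourceSpan[0]]

    def getTarget(self):
        return self.annotatedText[self.targetSpan[0]]

    def getFullSource(self):
        return self.__stringAdder(self.sourceSpan)

    def getFullTarget(self):
        return self.__stringAdder(self.targetSpan)


words = ["He", "devoured", "the", "history", "book"]
# Verb-noun pair: source "devoured", target "history book".
candidate = Candidate(words, 1, (1, 1), 3, (3, 4))
print(candidate.getSource())      # devoured
print(candidate.getTarget())      # history
print(candidate.getFullTarget())  # history book
```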
Class Metaphor:
- Variables
    - candidate: object of class Candidate
    - result: boolean
    - confidence: number between 0 and 1
- Methods
    - getSource(): return candidate.getFullSource()
    - getTarget(): return candidate.getFullTarget()
    - getResult()
    - getConfidence()
    - __str__()