Topic Map v2

The document discusses developing tools and techniques for automatically collecting and categorizing Philippine language resources from the web. This includes crawling the web, storing, organizing, and annotating collected texts based on language. Related work involves clustering languages.

Uploaded by

Sharell Gwen Monreal Dineros

eParticipation 2.0 Topic Map

Collection

Pre-processing

Analysis: Social computing

• Sentiment analysis
• Classification
• Topic discovery
• Information Extraction

Analysis: Multi-dimensional analysis

• Temporal
• Location
• Trends
• Manual analysis
• Comparison of different approaches
• Archival
Collection
Because research is now largely data-driven, there is a need for a databank of Philippine language resources.
To address this need, interested students will develop tools and techniques that aid the automatic
collection and categorization of texts. This includes crawling the web for language resources and
automatically storing and organizing them by language. Related work includes clustering the languages
and annotating each collected text.
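
As a rough illustration of organizing collected texts by language, the sketch below compares character-trigram profiles, in the spirit of the trigram-based similarity work listed in the starting references. It is a minimal sketch, not the method of any cited paper: the function names, the cosine scoring, and the two toy training sentences are all assumptions made for illustration.

```python
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sum(c * c for c in a.values()) ** 0.5
    norm_b = sum(c * c for c in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_language(text: str, profiles: dict[str, Counter]) -> str:
    """Return the label whose trigram profile is most similar to the text."""
    query = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine_similarity(query, profiles[lang]))


# Hypothetical toy samples; a usable profile needs far more text per language.
SAMPLES = {
    "tagalog": "ang mga bata ay naglalaro sa labas ng bahay tuwing hapon",
    "english": "the children are playing outside the house every afternoon",
}
PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}
```

A crawler could call `identify_language(page_text, PROFILES)` on each downloaded page and file the text under the returned label. Closely related languages share many trigrams, so such a score is better treated as a similarity signal than as a definitive label.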

Possible Resource Person(s):

• Mr. Nathaniel Oco
• Prof. Rachel Edita Roxas

Target Venue(s):

• Local: PCSC, NNLPRS
• International (SCOPUS): TENCON, IALP

Starting Reference(s):

Authors Oco, Nathaniel; Syliongka, Leif Romeritch; Allman, Tod; Roxas, Rachel Edita
Title Resources for Philippine Languages: Collection, Annotation, and Modeling
Publication The 30th Pacific Asia Conference on Language, Information and Computation
Pages 433-438
Year 2016
Publisher Institute for the Study of Language and Information at Kyung Hee University

Authors Dita, Shirley N; Roxas, Rachel EO; Inventado, Paul
Title Building Online Corpora of Philippine Languages
Publication The 23rd Pacific Asia Conference on Language, Information and Computation
Pages 646-653
Year 2009
Publisher City University of Hong Kong

Authors Oco, Nathaniel; Ilao, Joel; Roxas, Rachel Edita; Syliongka, Leif Romeritch;
Title Measuring language similarity using trigrams: Limitations of language identification
Publication 2013 International Conference on Recent Trends in Information Technology (ICRTIT)
Pages 478-481
Year 2013
Publisher IEEE
Pre-Processing
Textual data is the main resource for numerous software programs, and one integral consideration is
the proper representation and use of high-quality data. To achieve such quality, text pre-processors
are needed: subprograms that modify the raw data to fit a given system or to provide it with new data
features. Numerous pre-processors are currently available. However, there is no compilation of tools
that is lightweight and flexible enough for different kinds of systems or language domains. Students
are to develop pre-processing tools for textual data. These may consist of the following:

• Tokenization
  o Cleaning
  o URLs
  o Special Characters
  o Length Limit
  o Duplicates
  o Stop words
• True-casing (e.g. john -> John)
• Feature Extraction (Affixes)
• Stemming (Root words)
• Text Transformation
  o Standard text normalization (e.g. resume -> résumé, canonicalization)
  o Unicode normalization (e.g. ñ -> U+00F1, Å -> U+00C5)
  o Shortcut text normalization (e.g. LOL -> Laughing Out Loud, gr8 -> great)
  o Spell / grammar check
  o Translation
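
Several of the cleaning steps listed above can be chained into one small function. The sketch below covers Unicode normalization, URL and special-character removal, shortcut expansion, stop-word filtering, a length limit, and duplicate removal. The shortcut and stop-word lexicons, the function names, and the default limit of 50 tokens are hypothetical placeholders chosen for illustration.

```python
import re
import unicodedata

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Hypothetical toy lexicons; real ones would be far larger.
SHORTCUTS = {"lol": "laughing out loud", "gr8": "great"}
STOP_WORDS = {"the", "a", "is", "ang", "ng", "sa"}


def clean_text(text: str, max_tokens: int = 50) -> list[str]:
    """Normalize Unicode, strip URLs and special characters, then tokenize."""
    # NFC composes sequences such as n + combining tilde into U+00F1
    text = unicodedata.normalize("NFC", text)
    text = URL_PATTERN.sub(" ", text)
    # Keep word characters and whitespace; \w is Unicode-aware in Python 3
    text = re.sub(r"[^\w\s]", " ", text).replace("_", " ")
    tokens = []
    for token in text.lower().split():
        # Expand shortcuts into their full, possibly multi-word, form
        tokens.extend(SHORTCUTS.get(token, token).split())
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    return tokens[:max_tokens]


def deduplicate(texts: list[str]) -> list[str]:
    """Drop texts whose cleaned form was already seen, keeping first occurrences."""
    seen, unique = set(), []
    for text in texts:
        key = " ".join(clean_text(text))
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Keeping each step as a plain function call makes it easy to switch steps on or off per system or language domain, which is the flexibility the paragraph above asks for.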

Possible Resource Person(s):

• Mr. Nicco Nocon
• Mr. Matthew Phillip Go
• Mr. Nathaniel Oco
• Prof. Rachel Edita Roxas

Target Venue(s):

• Local: PCSC, NNLPRS
• International (SCOPUS): TENCON, IALP, PACLIC

Starting Reference(s):

Authors Nocon, Nicco; Borra, Allan
Title SMTPOST: Using Statistical Machine Translation Approach in Filipino Part-of-Speech Tagging
Publication Proceedings of the 30th Pacific Asia Conference on Language, Information, and Computation
Pages 391-396
Year 2016
Publisher Institute for the Study of Language and Information at Kyung Hee University

Authors Nocon, Nicco; Oco, Nathaniel; Ilao, Joel; Roxas, Rachel Edita
Title Philippine component of the network-based ASEAN language translation public service
Publication 2014 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM)
Year 2014
Publisher IEEE

Authors Oco, Nathaniel; Roxas, Rachel Edita
Title Pattern matching refinements to dictionary-based code-switching point detection
Publication Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation
Pages 07-10
Year 2012

Authors Oco, Nathaniel; Borra, Allan
Title A grammar checker for Tagalog using LanguageTool
Publication Asian Language Resources collocated with IJCNLP 2011
Pages 2-9
Year 2011
Sentiment Analysis
Data collection • Tweets; Game chat; News articles; Facebook posts; Blogs

Data filtering • Language identification; Geolocation

Data annotation • POS tagging; Code-switching detection; Named entity recognition

Data processing • Text normalization; Grammar checking; Machine translation; Language modeling

Data analysis • Classification; WordNet

Result Evaluation • Accuracy; F-score; Kappa Statistics
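
The three evaluation measures named in the last row can be computed directly from a list of gold labels and a list of predicted labels. The sketch below is one plain-Python way to do so; it assumes the kappa statistic meant here is Cohen's kappa, and the function names are illustrative.

```python
from collections import Counter


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of items on which the two label sequences agree."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_score(gold: list[str], pred: list[str], positive: str) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Agreement between two label sequences, corrected for chance agreement."""
    n = len(a)
    observed = accuracy(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both sequences pick the same label at random
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)
```

For a gold sequence `["pos", "pos", "neg", "neg"]` and predictions `["pos", "neg", "neg", "neg"]`, accuracy is 0.75, the F-score for `"pos"` is 2/3, and kappa is 0.5, since chance agreement is 0.5.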

Possible Resource Person(s):

• Mr. Alron Lam
• Mr. Nathaniel Oco
• Mr. Leif Romeritch Syliongka
• Prof. Rachel Edita Roxas

Target Venue(s):

• Local: PCSC, NNLPRS
• International (SCOPUS): TENCON, IALP
• Journal (SCOPUS): Philippine Political Science Journal, ACM Transactions on Asian Language Information Processing, Literary and Linguistic Computing

Starting Reference(s):

Authors Lam, Alron Jan
Title Improving Twitter Community Detection through Contextual Sentiment Analysis of Tweets
Publication 54th Annual Meeting of the Association for Computational Linguistics
Pages 30-36
Year 2016
Publisher ACL

Authors Regalado, Ralph Vincent J; Chua, Jenina L; Co, Justin L; Tiam-Lee, Thomas James Z
Title Subjectivity Classification of Filipino Text with Features Based on Term Frequency-Inverse Document Frequency
Publication 2013 International Conference on Asian Language Processing (IALP)
Pages 113-116
Year 2013
Publisher IEEE

Authors Regalado, Ralph Vincent J; Cheng, Charibeth K
Title Feature-Based Subjectivity Classification of Filipino Text
Publication 2012 International Conference on Asian Language Processing (IALP)
Pages 57-60
Year 2012
Publisher IEEE
Classification
Data collection • Tweets; Game chat; News articles; Facebook posts; Blogs

Data filtering • Language identification; Geolocation

Data annotation • POS tagging; Code-switching detection; Named entity recognition

Data processing • Text normalization; Grammar checking; Machine translation; Language modeling

Data analysis • Probabilistic classifiers; Decision trees; Convolutional Neural Nets

Result Evaluation • Accuracy; F-score; Kappa Statistics

Possible Topic: Classification of Typhoon-related Tweets

Twitter has been found to be a potentially useful source of information in times of disaster. Because it
is a microblogging platform, users tend to use it for near-real-time updates. In the context of disasters
specifically, some use it to report damage, request assistance, or find missing persons. These reports
could be useful to concerned entities such as government agencies that conduct disaster response.
However, given the sheer volume of tweets, it is hard for people to scour through them manually; the
task is sometimes likened to finding a needle in a haystack. Automatic classification of relevant tweets
would therefore be useful in situations like these. Students interested in this area will experiment with
different features (such as word embeddings) and classification algorithms to achieve this goal.
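
To make the task concrete, here is a minimal sketch of one probabilistic classifier: a multinomial Naive Bayes model over bag-of-words features with add-one smoothing. It is only a baseline illustration, not the approach of any listed reference. The class name, the two labels, and the six training tweets are invented for the example; features such as word embeddings would replace the simple word counts.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over bag-of-words with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus summed log likelihoods of the known words
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training tweets, invented for illustration only.
TRAIN = [
    ("need rescue flooded house in tacloban", "relevant"),
    ("roads damaged and bridge collapsed after typhoon", "relevant"),
    ("missing person last seen near the evacuation center", "relevant"),
    ("watching a movie with friends tonight", "irrelevant"),
    ("great basketball game last night", "irrelevant"),
    ("new phone arrived today so happy", "irrelevant"),
]
classifier = NaiveBayesClassifier().fit(*map(list, zip(*TRAIN)))
```

Calling `classifier.predict("bridge collapsed need rescue")` returns `"relevant"` on this toy data. A realistic experiment would train on annotated tweets and report accuracy, F-score, and kappa on held-out data.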

Possible Resource Person(s) / Mentor(s):

• Mr. Alron Lam
• Mr. Nathaniel Oco
• Mr. Leif Romeritch Syliongka
• Prof. Rachel Edita Roxas

Target Venue(s):

• Local: PCSC, NNLPRS
• International (SCOPUS): TENCON, IALP
• Journal (SCOPUS): Philippine Political Science Journal, ACM Transactions on Asian Language Information Processing, Literary and Linguistic Computing

Starting Reference(s):

10NNLPRS Proceedings and 11NNLPRS Proceedings (https://sites.google.com/site/11nnlprs/past-symposia)
Topic Discovery
Data collection • Tweets; Game chat; News articles; Facebook posts; Blogs

Data filtering • Language identification; Geolocation; Sampling

Data annotation • Coding

Data processing • Text normalization; Grammar checking; Machine translation; Language modeling

Data analysis • Unsupervised clustering; Topic modeling

Result Evaluation • Silhouette Index; Purity Index
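
Of the two evaluation measures, purity is the simpler to compute when manually coded labels are available: each cluster is credited with its most frequent gold label, and the credited counts are divided by the number of items. The sketch below assumes cluster assignments and gold codes are given as parallel lists; the function name and the example labels are illustrative.

```python
from collections import Counter, defaultdict


def purity(cluster_ids: list[int], gold_labels: list[str]) -> float:
    """Share of items that carry the majority gold label of their cluster."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, gold_labels):
        members[cluster].append(label)
    # For each cluster, count how many items carry its most common label
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(gold_labels)
```

For clusters `[0, 0, 0, 1, 1, 1]` and codes `["relief", "relief", "damage", "damage", "damage", "relief"]`, each cluster has a majority of two items, so purity is 4/6. Purity rises trivially as the number of clusters grows, which is one reason to report it alongside a measure such as the silhouette index.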

Possible Resource Person(s) / Mentor(s):

• Mr. Nathaniel Oco
• Mr. Leif Romeritch Syliongka
• Prof. Rachel Edita Roxas

Target Venue(s):

• Local: PCSC, NNLPRS
• International (SCOPUS): TENCON, IALP
• Journal (SCOPUS): Philippine Political Science Journal, ACM Transactions on Asian Language Information Processing, Literary and Linguistic Computing

Starting Reference(s):

Authors Ligutom III, Cerino; Orio, Jay Vincent; Ramacho, Dyannah Alexa Marie; Montenegro, Chuchi; Roxas, Rachel Edita; Oco, Nathaniel
Title Using Topic Modelling to make sense of typhoon-related tweets
Publication 2016 International Conference on Asian Language Processing (IALP)
Pages 362-365
Year 2017
Publisher IEEE

Authors Soriano, Cheryll Ruth; Roldan, Ma Divina Gracia; Cheng, Charibeth; Oco, Nathaniel
Title Social media and civic engagement during calamities: the case of Twitter use during typhoon Yolanda
Publication Philippine Political Science Journal
Volume 37
Number 1
Pages 06-25
Year 2016
Publisher Routledge

Authors Syliongka, Leif Romeritch; Oco, Nathaniel; Lam, Alron Jan; Soriano, Cheryll Ruth; Roldan, Ma Divina Gracia; Magno, Francisco; Cheng, Charibeth
Title Combining Automatic and Manual Approaches: Towards a Framework for Discovering Themes in Disaster-related Tweets
Publication Proceedings of the 24th International Conference on World Wide Web
Pages 1239-1244
Year 2015
Publisher ACM
Information Extraction

Data collection • Tweets; News articles; Facebook posts; Blogs

Data filtering • Language identification; Geolocation

Data annotation • POS tagging; Named entity recognition

Data processing • Text normalization; Grammar checking;

Data analysis • Rule-based IE; Deep Learning

Result Evaluation • Accuracy; Word Error Rate
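
Word Error Rate is the word-level edit distance between a reference and a system output, divided by the number of reference words. The sketch below computes it with the standard dynamic-programming recurrence; the function name and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

With the reference "typhoon yolanda hit tacloban city" and the output "typhoon yolanda hit tacloban", one word is deleted out of five, giving a rate of 0.2. The value can exceed 1.0 when the output contains many inserted words.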

Possible Topic: Visualizing Disaster Information Extracted from Philippine News Articles / Tweets

News articles and tweets contain a great deal of information on disasters before, during, and after they
happen. These sources contain typhoon names, dates of occurrence, locations hit, casualties, the
financial and material needs of the victims, and more. They also contain information about donations
(and their type) provided to the victims by countries, organizations, and individuals. In this research,
students will create an automated way of extracting this information from these sources and displaying
it visually, showing the series of events related to each typhoon.
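
A rule-based starting point for this topic could use regular expressions to pull typhoon names and casualty counts out of a sentence. The two patterns, the function name, and the sample sentence in the usage note are hypothetical and cover only simple English phrasings; real news text and tweets would need many more rules or a learned model.

```python
import re

# Hypothetical patterns for illustration; they match only simple English phrasings.
TYPHOON_PATTERN = re.compile(r"\b(?:Super\s+)?Typhoon\s+([A-Z][a-z]+)")
CASUALTY_PATTERN = re.compile(
    r"\b(\d[\d,]*)\s+(?:people\s+)?(dead|killed|injured|missing)\b",
    re.IGNORECASE,
)


def extract_disaster_facts(text: str) -> dict:
    """Return typhoon names and casualty counts found by the patterns."""
    names = sorted(set(TYPHOON_PATTERN.findall(text)))
    casualties = {}
    for number, status in CASUALTY_PATTERN.findall(text):
        # Remove thousands separators before converting to an integer
        casualties[status.lower()] = int(number.replace(",", ""))
    return {"typhoons": names, "casualties": casualties}
```

For the made-up sentence "Typhoon Yolanda left 120 people dead and 45 missing.", the function returns `{"typhoons": ["Yolanda"], "casualties": {"dead": 120, "missing": 45}}`. Records like this, tagged with the date of each source, could feed the timeline visualization described above.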

Possible Resource Person(s):

• Mr. Matthew Phillip Go
• Mr. Nicco Nocon
• Mr. Nathaniel Oco
• Prof. Rachel Edita Roxas

Target Venue(s):

• Local: PCSC, NNLPRS
• International (SCOPUS): TENCON, IALP, PACLIC, IJCNLP
• Journal (SCOPUS): Philippine Political Science Journal, ACM Transactions on Asian Language Information Processing, Literary and Linguistic Computing

Starting References:

• https://www.aclweb.org/anthology/W/W14/W14-2905.pdf
• https://www.aclweb.org/anthology/W/W16/W16-3906.pdf
• https://www.aclweb.org/anthology/C/C08/C08-3001.pdf
Resources
http://bit.ly/1MpcFoT

• Tweets – From 2013 to present (filtered tweets available upon request)
• WordNets – Filipino WordNet
• Dictionaries – Filipino dictionary
• Tagged data – Tagged
• Language models – Religious text in different languages
• Multilingual corpora – Religious articles; Parallel corpus
• English and Filipino monolingual corpora – Wikipedia articles

Projects
LanguageTool: https://languagetool.org/

ASEANMT: http://aseanmt.org/

California Report Card: http://californiareportcard.org/

QuakeCAFE: http://quakecafe.org/

Opinion Space: http://opinion.berkeley.edu/

Online Tools
Twitter 4J: http://twitter4j.org/en/

Moses SMT Engine: http://www.statmt.org/moses/

SentiWordNet: http://sentiwordnet.isti.cnr.it/

Weka: http://www.cs.waikato.ac.nz/ml/weka/

eParticipation 2.0 PCARI-funded Project

https://www.dropbox.com/sh/avvs2qxo0f0qe92/AACkXSGBnSt2Urlgd41lOTD3a?dl=0
