Topic Models from Twitter Hashtags
Authors December 2, 2013
Introduction
Topic modelling consists of extracting a mathematical model from a set of semantically related documents. Ideally, the model must characterize the semantic knowledge a set of training examples has in common, that is, the information that makes the set relevant to the topic. Identifying the information that relates documents to topics is hard due to the complexity of natural language, and the difficulty tends to grow when documents are generated in colloquial, informal environments such as social networks or microtexts like SMS messages or tweets. The main difficulties stem from the informal style of documents from such domains and the ephemeral nature of the topics, events and opinions discussed. While traditional topic modelling deals with more formal documents in almost static domains (such as news or scientific articles, where the semantics of a given topic does not change considerably over short periods), a set of brief user-generated texts is noisy and its semantic components change rapidly because multiple points of view refer to the same facts or events. Most successful topic modelling techniques rely on statistical information about the presence of words in certain contexts, and their applicability rests on assumptions that may not be satisfied in other domains.

In this work, we address the problem of capturing topic models from Twitter messages. We follow an approach based on the assumption that implicit relations exist between topic-related words even when those words do not appear in the same contexts with high frequency. These latent associations are supported by the idea that the relations connecting words can be broader than those considered by other techniques. By broader relations we mean that a pair of words may be related contextually through others in a transitive manner: if w_i relates to w_j and w_j to w_k, then w_i relates to w_k.
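The transitive intuition above can be illustrated with a minimal sketch. This is not the paper's actual model: it simply treats each directly co-occurring word pair as an edge in a graph and takes the transitive closure, so that words never seen together in the same context still become associated through intermediaries. The word pairs below are invented examples.

```python
from collections import defaultdict

def transitive_associations(pairs):
    """For each word, return all words reachable through chains of co-occurrence."""
    graph = defaultdict(set)
    for a, b in pairs:          # co-occurrence is symmetric
        graph[a].add(b)
        graph[b].add(a)
    closure = {}
    for start in graph:
        seen, stack = set(), [start]
        while stack:            # depth-first search from each word
            w = stack.pop()
            for nxt in graph[w]:
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        closure[start] = seen
    return closure

# "goal" and "stadium" never co-occur directly, yet both co-occur with "match",
# so the transitive closure relates them.
direct = [("goal", "match"), ("match", "stadium")]
assoc = transitive_associations(direct)
print("stadium" in assoc["goal"])  # True: related only transitively
```

In a realistic setting the edges would of course be weighted by co-occurrence statistics rather than treated as binary, and chains would be damped with length; this sketch only captures the transitivity idea itself.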
The transitive property of latent associations also allows us to attack the problem of the temporal decay of models. After some time, the set of words related to an event tends to change: words that keep weak (infrequent) relations with the topic are discarded, while words whose relation with it has recently grown stronger are included as new elements. This temporal variation in the vocabulary associated with a given event is a consequence of the temporal evolution of events and opinions; it is hard to capture with traditional static modelling, whereas the proposed method allows us to better identify relations over larger time spans.

To test our ideas, we developed an experimental setup consisting of a stream filtering environment in which socially-labelled messages produced before a given time t become positive training examples. One-class classifiers are trained with these examples, representing them with different types of features; these classifiers are our models. In turn, the classifiers are used to filter (or classify) an incoming stream of user-generated messages created after t and represented in the same feature space, labelling them as positive if relevant to the topic and negative otherwise. The experiments are also intended to show the temporal decay of models that use traditional features: after some time, such classifiers become unable to identify relevant messages with the accuracy they achieve when filtering documents generated shortly after training, due to the evolution of topics. We show the advantage of the proposed method through plots in which this behaviour is observable and through a decay index. In the last part of our work, we relate the observed results to a proposed measure that quantifies the dispersion or broadness of topics over time, the Stream Broadness Index.
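The filtering protocol described above can be sketched with a toy one-class filter. This is not the paper's classifier: it merely learns a bag-of-words centroid from positive pre-t messages and accepts a later message when its cosine similarity to that centroid exceeds a threshold. The training messages, test messages and the 0.2 threshold are all invented for illustration.

```python
import math
from collections import Counter

def centroid(docs):
    """Sum bag-of-words counts over all positive training documents."""
    c = Counter()
    for d in docs:
        c.update(d.lower().split())
    return c

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def make_filter(train_docs, threshold=0.2):
    """One-class decision rule: accept if similarity to the centroid >= threshold."""
    model = centroid(train_docs)
    return lambda doc: cosine(model, Counter(doc.lower().split())) >= threshold

# Hypothetical positive messages collected before time t for one hashtag.
train = ["great match tonight",
         "what a goal in that match",
         "match goes to penalties"]
is_relevant = make_filter(train)
print(is_relevant("another goal in the match"))    # True: on-topic
print(is_relevant("buy cheap phones online now"))  # False: off-topic
```

Under the protocol above, the messages after t would arrive as a stream and each predicted label would be compared against the social (hashtag) label to measure how accuracy decays as the stream moves further from t.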
The Stream Broadness Index is intended to quantify the semantic variation over time of a given topic corpus, and it shows that variations in the performance of a classifier are clearly affected by time and by intrinsic characteristics of the corpus. The results of our experiments show that our proposed representation outperforms other classic approaches in the proposed evaluation framework, exhibiting more persistent behaviour over time, which reduces decay and improves precision over longer periods. Moreover, the intrinsic analysis confirms that our method is applicable in scenarios where the semantic component of a topic-labelled data stream changes over time.

The remainder of this article is organized as follows. Section 2 reviews related work, emphasizing approaches that attempt to tackle filtering, recommendation and classification problems in highly dynamic domains. Section 3 contains a formalization of the problem, while Section 4 presents our proposed approach, including its theoretical foundations in stochastic neural networks, and reviews the training algorithms; the proposed evaluation framework is also presented there. In Section 5 the experimental setup and results are discussed in two stages: the filtering test related to the temporal decay, and the intrinsic corpus analysis. Finally, in Section 6 we present conclusions and some future directions for this work.