
Conceptual Clustering of Text Clusters

Andreas Hotho, Gerd Stumme


Institute of Applied Informatics and Formal Description Methods (AIFB), University of Karlsruhe, D-76128 Karlsruhe, Germany; http://www.aifb.uni-karlsruhe.de/WBS; {hotho, stumme}@aifb.uni-karlsruhe.de

Abstract. Common clustering techniques have the disadvantage that they do not provide intensional descriptions of the clusters obtained. Conceptual Clustering techniques, on the other hand, provide such descriptions, but are known to be rather slow. In this paper, we discuss a way of combining both techniques. We first cluster the documents by a variant of k-Means, using a thesaurus as background knowledge. This clustering reduces the large number of documents to a relatively small number of clusters, which can then be clustered conceptually in the second step.

Keywords. Text Clustering, Conceptual Clustering, k-Means, Formal Concept Analysis

1 Introduction
Common clustering techniques have the disadvantage that they do not provide intensional descriptions of the clusters obtained. Conceptual Clustering techniques, on the other hand, provide such descriptions, but are known to be rather slow. In this paper, we discuss a way of combining both techniques. Our approach consists of two steps. First, we apply a common (non-conceptual) clustering algorithm, in our case a variant of the well-known k-Means algorithm, in order to decrease the size of the problem. Then we cluster the resulting clusters using a conceptual clustering technique, in our case Formal Concept Analysis. The latter provides intensional descriptions of the resulting clusters, and it is efficient enough if the number of clusters chosen in the first clustering step is not too high. The resulting concept lattice can then be accessed using existing techniques from Formal Concept Analysis.

In this paper, we focus on the problem of text clustering. In order to improve the quality and understandability of the clusters, we additionally make use of background knowledge in the form of a thesaurus; in our application, we used WordNet. The problem addressed can thus be described as follows: given a set of documents and a thesaurus, provide a clustering of the documents with reasonable performance, which comes along with intensional descriptions of the clusters. In this paper, we discuss our approach along the Reuters-21578 text collection. The remainder of the

paper is organized as follows. In the next section, we describe the Reuters data, the preprocessing we performed, the non-conceptual clustering step, and the extraction of cluster descriptions. In Section 3, we recall the basic notions of Formal Concept Analysis, explain how the document clusters are clustered conceptually, and discuss the results. Section 4 provides an overview of related work. At the end of the paper, we discuss some future research issues.

2 Clustering the Documents

In this section we describe the dataset we used for the evaluation and the non-conceptual clustering part. This part consists of the preprocessing of the documents, the mapping of words to WordNet synsets, the non-conceptual clustering itself, and the extraction of cluster descriptions. The purpose of this first clustering step is to reduce the number of objects (and the number of describing attributes) so that the problem can be treated in reasonable time by Formal Concept Analysis. The final output of this part will be a set of clusters, together with a list of terms for each cluster that describes it best. For the sake of simplicity, we will use the expression "term" both for words and for terms (synsets) of the thesaurus; if we talk about one of them specifically, we will mention it explicitly.

2.1 The Reuters-21578 Dataset

We selected the Reuters-21578 text collection (http://www.daviddlewis.com/resources/testcollections/reuters21578/) for our experiments. The corpus consists of 21578 documents. This corpus is especially interesting for evaluation, as part of it comes along with a (hand-crafted) classification. It contains 135 so-called topics; to be more general, we will refer to them as classes in the sequel. To allow for evaluation, we restrict ourselves to the 12344 documents which have been classified manually by Reuters. Some of them could not be assigned by the experts to one of the predefined classes; we collect these in an additional class "defnoclass". In order to make the problem more homogeneous, we drop all classes with less than 25 documents and randomly prune the documents in each class to at most 30 documents. Reuters assigns some of its documents to multiple classes, but we consider only the first assignment. After these steps, we obtain our final corpus for evaluation. It consists of 1015 documents, distributed over the following 34 Reuters topics: acq, alum, bop, carcass, cocoa, coffee, copper, cotton, cpi, crude, defnoclass, dlr, earn, gas, gnp, gold, grain, interest, ipi, iron-steel, jobs, livestock, money-fx, money-supply, nat-gas, oilseed, pet-chem, reserves, rubber, ship, sugar, tin, trade, veg-oil.

2.2 Preprocessing the Document Set

For the preprocessing of the documents, we used the text mining system developed at AIFB within the KAON framework (http://kaon.semanticweb.org). We performed the following steps on the selected corpus: First, we converted all words to lower case and removed stopwords. We used a stopword list with 571 entries, which removed 374 stopwords from the documents. We also dropped all words with less than five occurrences over the whole corpus; 4257 words were removed in total. After these steps, 2311 different words remained in our list, with a total occurrence count of 85284.

2.3 WordNet as Background Knowledge

Instead of using a bag-of-words model directly, we additionally enriched it with background knowledge. The idea was to replace the words by terms of a given thesaurus and their broader terms, in order to also capture similarities on a higher conceptual level. For this purpose we needed a resource suitable for the Reuters corpus. We chose WordNet (http://www.cogsci.princeton.edu/~wn/) as our background knowledge. WordNet consists of so-called synsets, together with a hypernym/hyponym hierarchy (see http://www.cogsci.princeton.edu/~wn/man1.7.1/wngloss.7WN.html).
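The steps of Sections 2.2 and 2.3 can be approximated with off-the-shelf tools. The following sketch is our illustration, not the authors' KAON-based implementation: it uses NLTK's stopword list as a stand-in for the 571-entry list mentioned above, takes NLTK's frequency-based synset ordering as the "most probable synset" heuristic, and walks up the first hypernym chain to collect the four most specific hypernyms.

```python
# Requires: pip install nltk; nltk.download('stopwords'); nltk.download('wordnet')
from collections import Counter
from nltk.corpus import stopwords, wordnet as wn

def preprocess(docs, min_freq=5):
    """Lowercase, remove stopwords, and drop rare words (cf. Sec. 2.2)."""
    stop = set(stopwords.words("english"))  # stand-in for the 571-entry list
    tokenized = [[w for w in doc.lower().split() if w.isalpha() and w not in stop]
                 for doc in docs]
    counts = Counter(w for doc in tokenized for w in doc)
    return [[w for w in doc if counts[w] >= min_freq] for doc in tokenized]

def enrich_with_synsets(tokens, n_hypernyms=4):
    """Map each noun to its most probable WordNet synset and add its
    four most specific hypernyms (cf. Sec. 2.3); non-nouns are omitted."""
    terms = []
    for w in tokens:
        synsets = wn.synsets(w, pos=wn.NOUN)
        if not synsets:
            continue  # word is not known as a noun in WordNet
        s = synsets[0]  # NLTK orders synsets by estimated frequency
        terms.append(s.name())
        hyper = s.hypernyms()
        for _ in range(n_hypernyms):  # most specific hypernyms first
            if not hyper:
                break
            terms.append(hyper[0].name())
            hyper = hyper[0].hypernyms()
    return terms
```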

First we replaced all nouns appearing in the documents by synsets from WordNet (and omitted the rest). As the assignment of words to synsets is ambiguous, we implemented several strategies; the strategy we finally used was to assign to each word the synset that WordNet suggests as the most probable. Then we used the hypernym/hyponym hierarchy on the synsets of WordNet to add more general terms, which later help to identify related topics that are addressed by (seemingly) different words. We added to each synset its four most specific hypernyms. The number four was chosen so as not to obtain too general (and hence non-distinguishing) terms. The synsets that were assigned to at least one document then formed the set of terms used for describing the documents.

We also performed our approach without the thesaurus. In order to capture the declension of nouns (which was implicitly done by the mapping to synsets in the approach described above), we applied a Porter stemmer [14]. It turned out that the thesaurus-based approach performed better in terms of accuracy, hence we dropped the stemming approach.

2.4 Building the Term Vectors

Based on the work done so far, we built a term vector t_d for each document d. For each document, the terms are weighted by tfidf (term frequency x inverse document frequency) [15], which is defined as follows:

    tfidf(d, t) = tf(d, t) · log( |D| / |D_t| )        (1)

where tf(d, t) is the frequency of term t in document d, D is the set of all documents, and D_t is the set of all documents containing term t. The term vector for document d is then the tuple t_d := (tfidf(d, t_1), ..., tfidf(d, t_m)). Tfidf weighs the frequency of a term in a document with a factor that discounts its importance when it appears in almost all documents. Therefore, terms that appear too rarely or too frequently are ranked lower than terms that hold the balance and, hence, are expected to better contribute to the clustering results.
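As a minimal sketch, Equation (1) can be transcribed directly into code (the function and variable names are ours):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tfidf term vectors t_d = (tfidf(d, t1), ..., tfidf(d, tm)).

    `docs` is a list of term lists, as produced by the preprocessing step.
    """
    vocab = sorted(set(t for d in docs for t in d))
    # |D_t|: number of documents containing term t
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] * math.log(len(docs) / df[t]) if tf[t] else 0.0
                        for t in vocab])
    return vocab, vectors
```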

2.5 Clustering the Documents with Bi-Sec-k-Means

On the preprocessed data (as described in Section 2.2) we applied a variant of k-Means, the bisecting k-Means (in the following called Bi-Sec-k-Means), using the so-called cosine similarity: We calculate the similarity between two documents d_1 and d_2 as the cosine of the angle between their term vectors t_{d_1} and t_{d_2}, which can be computed as follows:

    cos(t_{d_1}, t_{d_2}) = ( Σ_{t∈T} tfidf(d_1, t) · tfidf(d_2, t) )
                            / ( √(Σ_{t∈T} tfidf(d_1, t)²) · √(Σ_{t∈T} tfidf(d_2, t)²) )        (2)
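Equation (2) in code (again a minimal sketch; it operates on the dense vectors produced by the tfidf sketch above):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term vectors (Equation (2))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # empty documents are treated as dissimilar
    return dot / (norm_u * norm_v)
```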

For the non-conceptual clustering step we need a fast algorithm (such as k-Means) that can deal with large datasets and still provides reasonable accuracy. Instead of a slow agglomerative clustering technique with good accuracy, we chose Bi-Sec-k-Means, which tends to give better results than k-Means and is sometimes also better than agglomerative clustering, while being as fast as k-Means (cf. [16]). Bi-Sec-k-Means is based on the k-Means algorithm, which works as follows. Let k be the number of desired clusters:

1. Choose randomly k points as starting centroids.
2. Assign each point to the closest centroid (with respect to a given similarity measure).
3. (Re-)calculate all cluster centroids.
4. Repeat steps 2 and 3 until the centroids do not change any more.

Bi-Sec-k-Means applies k-Means repeatedly, k - 1 times, where k is the predefined number of clusters for Bi-Sec-k-Means. The first time, k-Means is performed with k = 2. Then the cluster with the highest cardinality is selected and split into two new clusters, using again k-Means with k = 2. This procedure is repeated until the requested k clusters are built. The set of clusters is denoted by C := {C_1, ..., C_k}; the centroid of a cluster C_i is denoted by c_i.

This clustering reduces the large number of documents to a relatively small number of clusters, which can then be clustered conceptually in the second step. The idea is, however, to keep that number as large as possible, since the more of the clustering is done conceptually, the better the results will be interpretable.
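The procedure can be sketched as follows. This is our illustration of the description above, not the authors' implementation; it reuses cosine_similarity from the previous sketch, represents clusters as lists of vector indices, and assumes the largest cluster can always be split.

```python
import random

def centroid(vectors, cluster):
    """Component-wise mean of the term vectors in a cluster (list of indices)."""
    return [sum(vectors[i][j] for i in cluster) / len(cluster)
            for j in range(len(vectors[0]))]

def two_means(vectors, cluster, iterations=20):
    """Split one cluster into two by k-Means with k = 2 and cosine similarity."""
    cents = [vectors[i] for i in random.sample(cluster, 2)]
    halves = [cluster, []]
    for _ in range(iterations):
        halves = [[], []]
        for i in cluster:  # assign each point to the most similar centroid
            sims = [cosine_similarity(vectors[i], c) for c in cents]
            halves[sims.index(max(sims))].append(i)
        if halves[0] and halves[1]:  # re-calculate centroids
            cents = [centroid(vectors, h) for h in halves]
    return [h for h in halves if h]

def bisecting_kmeans(vectors, k):
    """Repeatedly split the largest cluster until k clusters are built."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        largest = max(clusters, key=len)
        clusters.remove(largest)
        clusters.extend(two_means(vectors, largest))
    return clusters
```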

For clustering the obtained document clusters conceptually, we need a description for each of the clusters. We describe next how these descriptions are extracted.

2.6 Extracting Cluster Descriptions

For applying a conceptual clustering approach like Formal Concept Analysis (FCA), we need intensional descriptions of the objects to be clustered. In our scenario this means that we have to decide, for each thesaurus term and each cluster, whether the term shall be considered important for the cluster or not. For performance reasons, we would also like to keep the total number of selected terms small. Therefore we need a method which points us to the most important terms for each cluster. We introduce a threshold δ to decide whether a term is important or not; this way we are also able to control how many terms remain to describe the clusters. In our application, we used a threshold of 25% of the maximal value. We used the centroid vectors of the clusters for extracting the cluster descriptions: for each cluster, the description is the set of all terms having a value in the centroid vector above the threshold δ. This ensures that those terms are selected which are most important for the cluster. All terms which were not assigned to at least one cluster were finally dropped; the resulting set of terms is denoted by T. The assignment of the terms to the clusters is the basis for the next step, the conceptual clustering part.
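A sketch of this extraction step (our code; we read "25% of the maximal value" as the per-cluster maximum of the centroid vector, which is one possible interpretation, and we reuse centroid from the sketch above):

```python
def cluster_descriptions(vectors, clusters, vocab, fraction=0.25):
    """For each cluster, keep the terms whose centroid weight exceeds
    delta = fraction * (maximal centroid value)."""
    descriptions = []
    for cluster in clusters:
        c = centroid(vectors, cluster)
        delta = fraction * max(c)
        descriptions.append({vocab[j] for j, w in enumerate(c) if w > delta})
    # drop terms not assigned to at least one cluster (the set T)
    terms = set().union(*descriptions)
    return descriptions, terms
```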
3 Conceptual Clustering of the Document Clusters

Now we consider the clusters of documents as atomic objects which will be clustered conceptually. As the number of objects is thus reduced to a reasonable size, we are able to apply a conceptual clustering technique. We will obtain a clustering of document clusters, where each cluster of document clusters comes along with an intensional description. This description then also serves as a description of the documents themselves.

3.1 Conceptual Clustering by Formal Concept Analysis

As conceptual clustering technique, we make use of Formal Concept Analysis (FCA), which was introduced as a mathematical theory modeling the concept of "concept" in terms of lattice theory. We recall the basics of FCA as far as they are needed for this paper; an extensive overview is given in [5]. To allow a mathematical description of concepts as being composed of extensions and intensions, Formal Concept Analysis starts with a formal context:

Definition: A formal context is a triple K := (G, M, I), where G is a set of objects, M is a set of attributes, and I is a binary relation between G and M (i.e., I ⊆ G × M). (g, m) ∈ I is read "object g has attribute m".

In our application, the set G of objects consists of all clusters determined in the previous step, i.e., G := C. The set M of attributes consists of all terms which remain from the step described in Section 2.6, i.e., M := T; and the relation I indicates whether a term is related to a cluster, i.e., whether its value in the centroid vector of the cluster is above the threshold δ. In the sequel, "attribute" and "term" are thus used synonymously, and "object" is used synonymously with "Bi-Sec-k-Means cluster" unless otherwise stated.

From a formal context, a concept hierarchy, called concept lattice, can be derived:

Definition: For A ⊆ G, we define A′ := {m ∈ M | (g, m) ∈ I for all g ∈ A}, and, for B ⊆ M, we define B′ := {g ∈ G | (g, m) ∈ I for all m ∈ B}. A formal concept of a formal context (G, M, I) is a pair (A, B) with A ⊆ G, B ⊆ M, A′ = B, and B′ = A. The sets A and B are called the extent and the intent of the formal concept (A, B). The subconcept-superconcept relation is formalized by (A_1, B_1) ≤ (A_2, B_2) :⇔ A_1 ⊆ A_2 (which is equivalent to B_2 ⊆ B_1).

The set of all formal concepts of a context K together with the partial order ≤ is always a complete lattice (i.e., for each set of formal concepts, there always exists a unique greatest common subconcept and a unique least common superconcept), called the concept lattice of K and denoted by B(K).
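These definitions translate directly into code. The following naive sketch (our illustration; real FCA tools use far more efficient algorithms such as Titanic [18]) enumerates all formal concepts of a small context by closing every subset of G, which is feasible for the few dozen clusters considered here:

```python
from itertools import combinations

def prime_objects(A, incidence, attributes):
    """A' : the attributes shared by all objects in A."""
    return {m for m in attributes if all((g, m) in incidence for g in A)}

def prime_attributes(B, incidence, objects):
    """B' : the objects having all attributes in B."""
    return {g for g in objects if all((g, m) in incidence for m in B)}

def concepts(objects, attributes, incidence):
    """All formal concepts (A, B) with A' = B and B' = A."""
    found, seen = [], set()
    for r in range(len(objects) + 1):
        for subset in combinations(sorted(objects), r):
            B = prime_objects(set(subset), incidence, attributes)
            A = prime_attributes(B, incidence, objects)  # A = subset''
            if frozenset(A) not in seen:
                seen.add(frozenset(A))
                found.append((A, B))
    return found

# Toy context inspired by the refiner example below (data is illustrative)
G = {"CL 1", "CL 3"}
M = {"oil", "refiner", "crude"}
I = {("CL 1", "oil"), ("CL 1", "refiner"),
     ("CL 3", "oil"), ("CL 3", "refiner"), ("CL 3", "crude")}
for A, B in concepts(G, M, I):
    print(sorted(A), sorted(B))
```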

3.2 Visualizing the Concept Hierarchy

Figure 1 highlights a part of the concept lattice of our context by a line diagram; it will be explained in detail below. The lattice was computed and visualized using the Cernato software of NaviCon GmbH (www.navicon.de). Line diagrams follow the conventions for the visualization of hierarchical concept systems as established in the international standard ISO 704. In a line diagram, each node represents a formal concept. For technical reasons, we reverse the usual reading order: a concept c_1 is a subconcept of a concept c_2 if and only if there is a path of ascending(!) edges from the node representing c_1 to the node representing c_2. The name of an object g is always attached to the node representing the most specific concept (i.e., the smallest concept with respect to ≤) with g in its extent (i.e., in our figure, the highest such node); dually, the name of an attribute m is always attached to the node representing the most general concept with m in its intent (i.e., the lowest such node in the diagram). We can always read the context relation from the diagram, since an object g has an attribute m if and only if the concept labeled by g is a subconcept of the one labeled by m. The extent of a concept consists of all objects whose labels are attached to subconcepts, and, dually, the intent consists of all attributes attached to superconcepts.

For example, the concept labeled by "refiner" has {CL 1, CL 3} as extent and {(h)refiner, (h)oil, ..., (h)compound, chemical compound} as intent, where "(h)" indicates WordNet synsets. In the diagram, we can for instance see that there is a chain of concepts with increasing specificity. The most general of them (beside the top concept) contains in its extent clusters of documents addressing chemical compounds: CL 1, CL 3, CL 11, CL 17, and CL 33. In the next concept, these are restricted to document clusters related to oil: CL 1, CL 3, CL 11. The following concept considers only two of these clusters, namely Clusters 1 and 3. These are the only clusters talking about refining oil. When we finally have a look at the attribute labels of the two concepts labeled by CL 1 and CL 3, respectively, we see that they in fact address different aspects of refining oil: the documents in Cluster 1 deal with the refinement of plant oil, while the documents in Cluster 3 have crude oil as subject.

The resulting concept lattice can also be interpreted as a concept hierarchy directly on the documents, as it is isomorphic to the concept lattice of the context (D, T, J), where D is the set of documents, T the set of terms, and (d, t) ∈ J iff d ∈ c and (c, t) ∈ I for some cluster c. This context is in fact an approximation of the descriptions of the documents by term vectors, with the property that all documents in one cluster obtain exactly the same description. This loss of information is the price we pay for improving the efficiency. These observations show that we are indeed able to derive clusters of objects together with intensional descriptions in reasonable time, and still with a reasonable degree of detail. Furthermore, the technique is robust with regard to upcoming documents: a new document is first assigned to the cluster with the closest centroid, and then finds its place within the concept lattice. If, on the contrary, the document were considered directly for computing the concept lattice, it could not be guaranteed that the structure of the lattice remains unchanged.

Figure 1 The resulting conceptual clustering of the text clusters (highlighting the concepts related to (oil) refinement).

3.3 Analyzing the Document Clusters

Let us show another example of analyzing the documents by our method. In order to get a first hint where to discover interesting structures, we first applied a magnetic spring algorithm for graph visualization (cf. the Java demo at http://java.sun.com/applets/jdk/1.0/demo/GraphLayout/) for recognizing which clusters are related. A part of the resulting graph is shown in Figure 2. Based on the cosine similarity, it tries to map the clusters into the Euclidean plane such that clusters with similar centroids attract each other, and clusters with different centroids repel each other. Strong similarity (with respect to a given threshold, in our example 75% of the maximal similarity) is indicated by a line between the clusters.
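Constructing such a similarity graph over the cluster centroids is straightforward (a minimal sketch with the 75% threshold from the text; the spring layout itself is left to the cited applet):

```python
def similarity_graph(centroids, fraction=0.75):
    """Return the edges (i, j) between clusters whose centroid similarity
    exceeds fraction * (maximal pairwise similarity)."""
    pairs = [(i, j) for i in range(len(centroids))
             for j in range(i + 1, len(centroids))]
    sims = {p: cosine_similarity(centroids[p[0]], centroids[p[1]])
            for p in pairs}
    threshold = fraction * max(sims.values())
    return [p for p in pairs if sims[p] > threshold]
```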

In the diagram, we see for instance that Clusters 8, 17, 33, 34, 44 and 49 have similar centroids. The term in parentheses behind a cluster number in the diagram indicates to which Reuters topic the majority of the documents in the cluster were assigned by the Reuters experts. Of course one does not have this additional information when clustering documents in an unsupervised way; we added it to simplify the evaluation. In an unsupervised setting, one could instead display the most important term(s) describing the cluster. In order to analyze the similarity of Clusters 8, 17, 33, 34, 44 and 49 conceptually, we restrict the object set of the formal context to just those clusters and recompute the concept lattice. The result is shown in Figure 3. The lattice provides a lot of details which can be explored interactively using Cernato (as in Figure 1).
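With the naive FCA sketch from Section 3.1, this restriction amounts to recomputing the concepts of a sub-context (the cluster identifiers are from the text; G, M, I here stand for the full context built earlier):

```python
focus = {"CL 8", "CL 17", "CL 33", "CL 34", "CL 44", "CL 49"}
G_sub = {g for g in G if g in focus}
I_sub = {(g, m) for (g, m) in I if g in focus}
sub_lattice = concepts(G_sub, M, I_sub)
```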

Figure 2 Graph showing (distance-based) similarities between the text clusters

In the diagram, we first observe that the concepts labeled by "beverage" and "latex" cover the set of the six clusters under consideration: the extent of the former is {CL 8, CL 17, CL 33, CL 44}, while the extent of the latter is {CL 33, CL 34, CL 49}. The extent of the beverage concept is further split into two disjoint sets: the extent {CL 8, CL 44} of the concept labeled by "coffee", and the extent {CL 17, CL 33} of the concept labeled by "cocoa". Checking this observation against the hand-crafted Reuters topics, one observes that most of the documents contained in these clusters are indeed about coffee and cocoa, respectively. The documents related to latex, on the other hand, are grouped together in the extent of the right-most concept. As we can see, a part of them (namely the documents contained in Cluster 49) also addresses topics like buffer stocks and international [organizations].⁹

In fact, when looking at selected documents, one observes that they address for instance negotiations about the regulation of rubber prices depending on the volume of buffer stocks. The topics buffer stocks and international also provide a bridge to the cocoa-related documents: all¹⁰ the Reuters documents talking about cocoa also address buffer stock issues, while those contained in Cluster 33 additionally have international organizations as topic. When checking the concept intent of the concept labeled by CL 8, one observes a large diversity of topics: pork, ..., music, coffee, food, beverage.

⁹ The labels non-market economy, socialism, etc. are an artifact of our mapping of words to synsets, as "International" was interpreted as a noun. We plan to add a part-of-speech tagger to overcome this problem.
¹⁰ Here, "all" means more precisely "all, up to the precision reached by Bi-Sec-k-Means".

Figure 3 Concept lattice focusing on the clusters 8, 17, 33, 34, 44 and 49.

In fact, when reading the documents in Cluster 8, one can observe that many of them are about livestock, and only four of them are about coffee. Thus, in this case, Bi-Sec-k-Means has performed badly and put unrelated documents together in one cluster. This example shows how one can identify inconsistencies in the results of non-conceptual clustering by using Formal Concept Analysis.

4 Related work

In [13], Pantel and Lin introduce an algorithm called CBC (Clustering by Committee). Committees are disjoint subsets of the object set which are distributed as homogeneously as possible over the object space. Iteratively, documents are assigned to the closest committee, introduce a new committee if the existing committees are too far away, or are ignored if they lie just between existing committees. CBC provides more precise descriptions for the clusters, but does not cover all objects. It could be used instead of Bi-Sec-k-Means in our approach.

In [7], Karypis and Han show that cluster centroids can be used to summarize the content of a cluster. They state that the most important terms in a cluster centroid are the terms with the highest weight. This observation underlies our approach in Section 2.6, where we use only the highly weighted terms to describe the content of a cluster. We differ from their approach in that we additionally make use of WordNet.

Buenaga Rodríguez et al. [3] and Ureña López et al. [9] show a successful integration of the WordNet resource for a document categorization task. They use the Reuters corpus for evaluation and improve the classification results of the Rocchio and Widrow-Hoff algorithms by 20 points. In contrast to our approach, they manually select synsets for each category and add the terms contained in the synsets with certain weights to the term vectors.

In [6], WordNet is used for word sense disambiguation. Gonzalo et al. manually build a synset vector.

They show, in an information retrieval setting, the improvement of the disambiguated synset model over the word vector model. In contrast to our approach, they (as well as [3] and [9]) do not make use of WordNet relations other than hypernyms.

Conceptual clustering with Formal Concept Analysis has been discussed in [17, 1, 11, 18]; another approach to conceptual clustering is discussed, for instance, in [10]. Formal Concept Analysis differs from these in that it does not make use of any heuristics (including arbitrary start settings) and allows for overlapping clusters. Compared to non-conceptual clustering approaches, all conceptual clustering approaches have in common a lower computational efficiency. Our paper is an approach to overcome this drawback.

5 Conclusion and Future work

In this paper, we discussed a way of combining the efficiency of a common (non-conceptual) clustering technique with the intensional descriptions provided by a conceptual clustering approach, and we showed how this approach can be applied to a corpus of documents. We first clustered the documents using Bi-Sec-k-Means, with a thesaurus as background knowledge. This clustering reduces the number of documents, so that they can be clustered conceptually in the second step using Formal Concept Analysis.

As the work presented here is a first attempt to combine conceptual and non-conceptual clustering, many interesting research topics remain. For instance, it seems promising to check whether non-disjoint clustering techniques (instead of Bi-Sec-k-Means) might give better results together with FCA for describing the documents, since FCA explicitly allows a one-to-many assignment between objects and attributes. We will also study alternatives to tfidf and cosine for measuring similarity.

Another important question is how the resulting concept lattice shall be presented to the user. We will check which approaches best fit this purpose. One way is, for instance, to use Cernato as we did for this paper. Another way is to derive conceptual scales [4] by grouping together the most related sets of terms; these conceptual scales can then be visualized using TOSCANA [8, 19, 12]. The resulting concept lattice may also be accessed based on iceberg concept lattices [18], or as discussed in [1] or [2]. We will also test these approaches on domain-specific ontologies other than WordNet.

Interesting from a structural point of view is how the tree structure from the non-conceptual clustering by Bi-Sec-k-Means fits with the concept lattice. If it is (more or less) embedded in the concept lattice, this fact can be exploited for navigation and retrieval tasks. Another interesting question is whether Formal Concept Analysis can be used for automatically computing intensional descriptions of the clusters generated by the non-conceptual clustering algorithm. These descriptions would consist of conjunctions of terms. It has to be defined what a globally optimal description for the clusters is; then an algorithm for computing such a description has to be developed (compare also with [10]).

From our experience with the application described in this paper, we believe that it is promising to combine the advantages of the intensional descriptions of conceptual clustering with the efficiency of non-conceptual clustering. But further work has to be done to bring this combination to its full potential.

References
1. C. Carpineto and G. Romano. GALOIS: An order-theoretic approach to conceptual clustering. In Machine Learning, Proc. ICML 1993, pages 33–40. Morgan Kaufmann Publishers, 1993.
2. R. Cole and G. Stumme. CEM: a conceptual email manager. In B. Ganter and G. W. Mineau, editors, Conceptual Structures: Logical, Linguistic, and Computational Issues. Proc. ICCS 00, volume 1867 of LNAI, pages 438–452, Heidelberg, 2000. Springer.
3. M. de Buenaga Rodríguez, J. M. Gómez Hidalgo, and B. Díaz-Agudo. Using WordNet to complement training information in text categorization. In N. Nicolov and R. Mitkov, editors, Recent Advances in Natural Language Processing II: Selected Papers from RANLP 97, Amsterdam/Philadelphia, 2000. John Benjamins.
4. B. Ganter and R. Wille. Conceptual scaling. In F. Roberts, editor, Applications of Combinatorics and Graph Theory to the Biological and Social Sciences, pages 139–167, New York, 1989. Springer.
5. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999.
6. J. Gonzalo, F. Verdejo, I. Chugur, and J. Cigarrán. Indexing with WordNet synsets can improve text retrieval. In Proceedings of the ACL/COLING Workshop on Usage of WordNet for Natural Language Processing, 1998.
7. G. Karypis and E.-H. Han. Fast supervised dimensionality reduction algorithm with applications to document categorization and retrieval. In A. Agah, J. Callan, and E. Rundensteiner, editors, Proceedings of CIKM-00, 9th ACM International Conference on Information and Knowledge Management, pages 12–19, McLean, US, 2000. ACM Press.
8. W. Kollewe, M. Skorsky, F. Vogt, and R. Wille. TOSCANA: ein Werkzeug zur begrifflichen Analyse und Erkundung von Daten. pages 267–288, 1994.
9. L. A. Ureña López, M. de Buenaga Rodríguez, and J. M. Gómez Hidalgo. Integrating linguistic resources in TC through WSD. Computers and the Humanities, 35(2):215–230, 2001.
10. R. S. Michalski and R. Stepp. Learning from observation: Conceptual clustering. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, volume I, pages 331–363, Palo Alto, 1983. Tioga Publishing Co.
11. G. Mineau and R. Godin. Automatic structuring of knowledge bases by conceptual clustering. IEEE Transactions on Knowledge and Data Engineering, 7(5):824–829, 1995.
12. The ToscanaJ Project: An Open-Source Reimplementation of TOSCANA. http://toscanaj.sourceforge.net.
13. P. Pantel and D. Lin. Document clustering with committees. In Proceedings of SIGIR 02, Tampere, Finland, 2002.
14. M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
15. G. Salton. Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer. Addison-Wesley, 1989.
16. M. Steinbach, G. Karypis, and V. Kumar. A comparison of document clustering techniques. In KDD Workshop on Text Mining, 2000.
17. S. Strahringer and R. Wille. Conceptual clustering via convex-ordinal structures. In O. Opitz, B. Lausen, and R. Klar, editors, Information and Classification, pages 85–98, Berlin/Heidelberg, 1993. Springer.
18. G. Stumme, R. Taouil, Y. Bastide, N. Pasquier, and L. Lakhal. Computing iceberg concept lattices with Titanic. Data & Knowledge Engineering, 42(2):189–222, 2002.
19. F. Vogt and R. Wille. TOSCANA: a graphical tool for analyzing and exploring data. In R. Tamassia and I. G. Tollis, editors, Graph Drawing 94, volume 894 of LNCS, pages 226–233, Heidelberg, 1995. Springer.
