Text data is increasingly abundant across domains and media, both print and online. As the volume of text documents grows, the information and knowledge they contain become harder to access, and also harder to interpret and understand comprehensively. Against this background, the purpose of the research is to extract knowledge from abundant text data through the processing of unstructured data (text mining), by developing an ontology-based text interpretation method to gain new knowledge, as the state of the art of this work. Four techniques were developed in this research. The first is a set of preprocessing techniques for text data (corpus) together with key phrase extraction using an AST (Annotated Suffix Tree) to obtain key phrases and their frequencies of occurrence. The second is ontology modeling as a knowledge base for a domain, in the form of relationships between key phrases using a Bayesian Network. The third is a clustering method for sparse data, namely is-FADDIS (iterative scaling-Additive Fuzzy Spectral Clustering), used for the text grouping process; it extends the FADDIS (Additive Fuzzy Spectral Clustering) clustering method. The fourth is a matching and correlating method used to interpret an input text with the ontology. Taken together, ontology development for text in the news domain proceeds through key phrase extraction, clustering (is-FADDIS, optional), and structure learning to form an ontology tree. Each key phrase, treated as a concept, becomes a node in the ontology, which serves as the domain knowledge base. The next step is to interpret an input text, consisting of a key phrase or a cluster, using the ontology to gain new knowledge. Interpretation was carried out with ontologies built from text covering two domains and from text covering one domain.
The text interpretation results obtained with the Fuzzy Spectral Clustering (is-FADDIS) based ontology were evaluated using relevancy scores. For input texts consisting of one key phrase, five inputs were interpreted: 40% of the results were relevant, 40% less relevant, and 20% irrelevant. For input texts consisting of one cluster, two inputs were interpreted, and the results were relevant. Empirically, a result is considered relevant when its relevancy score exceeds 0.3 on a scale of 1; some of the scores obtained reached 0.33. A comparison of interpretation results across the different ontology development techniques showed that, within this research, the FADDIS-based ontology has not yet provided optimal results for text interpretation. Used with the developed techniques, the method provides text interpretation output that can help process text information quickly, provided the quantity is not too large.
Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2018