
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/273038150

Text Mining : Techniques and its Application

Article · December 2014


IJETI International Journal of Engineering & Technology Innovations, Vol. 1 Issue 4, November 2014 22
ISSN (Online): 2348-0866
www.IJETI.com

Text Mining:
Techniques and its Application
Shilpa Dang (1), Peerzada Hamid Ahmad (2)
(1) Assistant Professor, (2) Research Scholar
M.M Institute of Computer Technology & Business Management
Maharishi Markandeshwar University, Haryana, India.

Abstract
Text mining has become an exciting research field because it tries to discover valuable information in unstructured text. Unstructured texts, which contain vast amounts of information, cannot simply be used for further processing by computers. Specific processing methods, algorithms and techniques are therefore vital for extracting this valuable information, and text mining provides them. In this paper, we discuss the general idea of text mining and compare its techniques. In addition, we briefly discuss a number of text mining applications that are in use at present and are likely to be used in the future.

KEYWORDS: Retrieval, Extraction, Categorization, Clustering, Summarization.

INTRODUCTION
Text mining has become an important research area. A very large amount of information is stored in different places in unstructured form. Approximately 80% of the world's data is in unstructured text [1]. This unstructured text cannot easily be used by computers for further processing, so there is a need for techniques that can extract valuable information from it. This information is then stored in a text database format that contains structured and a few unstructured fields. Text can be found in mails, chats, SMS, newspaper articles, journals, product reviews, and organization records [2]. In almost all institutions, government sectors, organizations and industries, information is stored in electronic form.
Text mining goes by a variety of names, such as text data mining, knowledge discovery from textual databases [4], and intelligent text analysis; all of them refer to extracting or retrieving valuable information from unstructured text. It can be viewed as an extension of data mining or knowledge discovery from (structured) databases. Text mining discovers new, previously unknown or hidden pieces of information in textual data by extracting them with different techniques. Text mining is a multidisciplinary field involving information retrieval, text analysis, information extraction, categorization, clustering, visualization, data mining, and machine learning.
The five basic text mining steps are as follows:
a) Collect information from unstructured data.
b) Convert this information into structured data.
c) Identify patterns in the structured data.
d) Analyze the patterns.
e) Extract the valuable information and store it in the database.

[Figure: flow diagram with boxes labelled Unstructured Text, Structured Data, Identified Data, Analysis of Data, Extraction, Database]

Fig 1: Processing of Text Mining
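
The five steps listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the sample documents, the stopword list, the function names and the choice of "a term that occurs in at least two documents" as the pattern are all hypothetical, and an in-memory SQLite table stands in for the database.

```python
import re
import sqlite3
from collections import Counter

# Step (a): hypothetical sample documents standing in for collected unstructured data.
RAW_DOCUMENTS = [
    "The delivery was late and the package was damaged.",
    "Late delivery again. Support did not answer my mail.",
    "Great product, fast delivery, helpful support.",
]


def to_structured(doc_id, text):
    """Step (b): turn free text into (doc_id, term, count) rows."""
    terms = re.findall(r"[a-z]+", text.lower())
    return [(doc_id, term, n) for term, n in Counter(terms).items()]


def identify_patterns(rows, min_docs=2):
    """Step (c): here a pattern is a term that occurs in at least min_docs documents."""
    doc_freq = Counter(term for _, term, _ in rows)
    return {term: df for term, df in doc_freq.items() if df >= min_docs}


def analyze(patterns, stopwords=frozenset({"the", "and", "was", "did", "not", "my"})):
    """Step (d): drop uninformative terms and rank the rest by document frequency."""
    kept = [(term, df) for term, df in patterns.items() if term not in stopwords]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def store(findings):
    """Step (e): store the extracted information in a database (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE findings (term TEXT PRIMARY KEY, doc_freq INTEGER)")
    conn.executemany("INSERT INTO findings VALUES (?, ?)", findings)
    conn.commit()
    return conn


rows = [row for i, doc in enumerate(RAW_DOCUMENTS) for row in to_structured(i, doc)]
findings = analyze(identify_patterns(rows))
conn = store(findings)
stored = conn.execute(
    "SELECT term, doc_freq FROM findings ORDER BY doc_freq DESC, term"
).fetchall()
print(stored)
```

Each function corresponds to one step, so any stage can be replaced (for example, a richer pattern finder in step c) without touching the others.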

Basic Text Mining Technologies

Information Retrieval:
The best-known information retrieval (IR) systems are search engines such as Google, which identify those documents on the World Wide Web that are associated with a set of given words. IR is regarded as an extension of document retrieval in which the returned documents are processed to extract the information that is crucial for the user [3]. Document retrieval is thus followed by a text summarization stage that focuses on the query posed by the user, or by an information extraction stage. In the broader sense, IR deals with the whole range of information processing, from information retrieval to knowledge retrieval [8]. It is a relatively old research area in which the first attempts at automatic indexing were made in 1975. It gained increased attention with the growth of the World Wide Web and the need for sophisticated search engines.

Information Extraction:
The goal of information extraction (IE) methods is to extract useful information from text. IE identifies entities, events and relationships in semi-structured or unstructured text. The most useful information, such as the names of persons, locations and organizations, is extracted without a full understanding of the text [4]. IE is concerned with extracting semantic information from the text. It can be described as the construction of a structured representation of selected relevant pieces of information drawn from texts.

Categorization:
Text categorization is a kind of "supervised" learning in which the categories are known in advance and are fixed beforehand for each training document. Its main intended use was originally the indexing of scientific literature by means of a controlled vocabulary. It was only in the 1990s that the field fully developed, with the availability of ever-increasing numbers of text documents in digital form and the need to organize them for easier use [5]. Categorization is the assignment of natural language documents to a predefined set of topics according to their content: given a collection of text documents, it is the process of finding the correct topic or topics for each document. Nowadays automated text categorization is applied in a variety of contexts, from the classical automatic or semiautomatic indexing of texts to personalized delivery of commercials, spam filtering, categorization of Web pages under hierarchical catalogues, automatic metadata generation, detection of text genre, topic tracking and many others [6]. The study of automated text categorization started in the early 1960s, and it remains a hot topic in machine learning research today.

Clustering:
Clustering is one of the most interesting and important topics in text mining. Its aim is to find intrinsic structures in information and arrange them into significant subgroups for further study and analysis. It is an unsupervised process through which objects are classified into groups called clusters. The problem is to group a given unlabeled collection into meaningful clusters without any prior information; any labels associated with objects are obtained solely from the data. For example, document clustering assists retrieval by creating links between related documents, which in turn allows related documents to be retrieved once one of them has been deemed relevant to a query [8].
Clustering is useful in many application areas, such as biology, data mining, pattern recognition, document retrieval, image segmentation, pattern classification, security, business intelligence and Web search. Cluster analysis can be used as a standalone text mining tool to gain insight into the data distribution, or as a pre-processing step for other text mining algorithms that operate on the detected clusters.

Summarization:
Text summarization is an old challenge in text mining, but one that still needs researchers' attention in the areas of computational intelligence, machine learning and natural language processing. Text summarization is the process of automatically creating a compressed version of a given text that provides useful information for the user. In a large organization or company, researchers do not have time to read every document, so they summarize documents and highlight the main points [4]. A summary is a text produced from one or more texts that contains a significant portion of their information, is reduced in length, and keeps the overall meaning of the original texts. Text summarization involves various methods that employ text categorization, such as neural networks, decision trees, semantic graphs, regression models, fuzzy logic and swarm intelligence. However, all of these methods share a common problem: the quality of the resulting classifiers is variable and highly dependent on the type of text being summarized.

Comparison of Text Mining Techniques:
Text mining uses a number of techniques, each of which plays an important role, and the techniques differ from each other. The information retrieval technique works on unstructured text, from which it retrieves valuable information, whereas information extraction extracts information from text into a structured database. The summarization technique summarizes a document, reducing its length while keeping its meaning.
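
To make the idea of reducing length while keeping meaning concrete, the following is a minimal sketch of one simple form of summarization: extractive summarization by word frequency. It is not one of the methods named in the paper (neural networks, decision trees and so on); the stopword list, the sentence splitter, the scoring rule and the sample text are all assumptions chosen for brevity.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = frozenset(
    {"a", "an", "the", "is", "are", "of", "in", "to", "and", "it", "that", "from"}
)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lower-cased content words of a sentence (stopwords removed)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the max_sentences highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        # Average corpus frequency of the content words, so that long
        # sentences are not favoured automatically.
        ws = words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


text = (
    "Text mining extracts information from unstructured text. "
    "The weather was pleasant yesterday. "
    "Clustering and categorization are text mining techniques."
)
summary = summarize(text, max_sentences=2)
print(summary)
```

In this toy input the off-topic middle sentence shares no frequent words with the rest, so it scores lowest and is dropped, while the two retained sentences keep their original order.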

Categorization is a supervised process that assigns documents to a predefined set of categories according to their contents. The responsiveness and flexibility of the post-coordinate system effectively prohibit the establishment of meaningful relationships, because a category is created by an individual, not by the system. Clustering, in contrast, is used to find intrinsic structures in information and arrange them into related subgroups for further study and analysis. It is an unsupervised process through which objects are classified into groups called clusters. Clustering deals with high-dimensional data and finds interesting patterns associated with the data. Another feature is that a cluster is a group of similar data items together with the relationships between them.
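
The contrast between supervised categorization and unsupervised clustering can be illustrated with a small sketch. It assumes bag-of-words vectors and cosine similarity; `categorize` uses a nearest-centroid rule over hypothetical labeled training documents, and `cluster` uses single-pass "leader" clustering with an arbitrary similarity threshold. Neither is claimed to be the method used by the tools the paper mentions.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def categorize(text, training):
    """Supervised: pick the predefined category whose training documents are most similar."""
    centroids = {}
    for label, examples in training.items():
        centroid = Counter()
        for example in examples:
            centroid.update(vectorize(example))
        centroids[label] = centroid
    vec = vectorize(text)
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))


def cluster(texts, threshold=0.2):
    """Unsupervised: single-pass leader clustering; groups emerge from the data alone."""
    clusters = []  # each cluster is a list of indices; its first member is the leader
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for members in clusters:
            if cosine(vec, vectorize(texts[members[0]])) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical labeled training data (needed only by the supervised step).
TRAINING = {
    "sports": ["the team won the football match", "the player scored a goal in the match"],
    "finance": ["the bank raised interest rates", "stock market prices fell sharply"],
}
DOCS = [
    "football team scored a late goal",
    "interest rates and stock prices",
    "the goal decided the football match",
    "bank stock prices fell",
]

labels = [categorize(doc, TRAINING) for doc in DOCS]
clusters = cluster(DOCS)
print(labels)
print(clusters)
```

Note that `categorize` can only return one of the category names supplied in `TRAINING`, whereas `cluster` returns unnamed groups of document indices; any label for those groups would have to be derived from the data afterwards.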

Table 1: Comparison of text mining techniques

Technique        Characteristics                              Tools

Retrieval        Retrieves valuable information from          Intelligent Miner,
                 unstructured text                            Text Analyst

Extraction       Extracts information from text into a        Text Finder,
                 structured database                          Clear Forest Text

Summarization    Reduces length while keeping the main        Topic Tracking Tool,
                 points and overall meaning                   Sentence Ext Tool

Categorization   Document-based categorization                Intelligent Miner

Clustering       Clusters collections of documents;           Carrot,
                 clustering, classification and analysis      Rapid Miner
                 of text documents
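
As a rough illustration of the Retrieval row in Table 1, the sketch below ranks a handful of documents against a query using TF-IDF weights. It is a toy example under stated assumptions: the documents, the function names and the plain `log(N / df)` weighting are hypothetical, and it does not describe how the listed tools work.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Return per-document term counts and the inverse document frequency of each term."""
    term_counts = [Counter(tokenize(doc)) for doc in documents]
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(documents)
    # A term that occurs in every document gets weight log(1) = 0.
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return term_counts, idf


def search(query, documents):
    """Rank documents by the summed TF-IDF weight of the query terms they contain."""
    term_counts, idf = build_index(documents)
    scores = []
    for doc_id, counts in enumerate(term_counts):
        score = sum(counts[t] * idf.get(t, 0.0) for t in tokenize(query))
        if score > 0:
            scores.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical mini-collection.
DOCS = [
    "text mining finds patterns in text",
    "spam filters classify unwanted mail",
    "clustering groups similar text documents",
]
print(search("text mining", DOCS))
```

Documents that contain none of the query terms are left out of the result, and rarer query terms (here "mining") contribute more to the score than common ones (here "text").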

Applications of Text Mining

Academic applications
Discovering patterns and trends in journals and proceedings from a huge volume of papers is an essential task in research [1]. It is a matter of importance to publishers who hold large databases of information that need indexing for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within written text. Text mining tools are applied to discover trends in the different topics that appear in the proceedings and to show how they change over time; this is also known as topic tracking. Initiatives have therefore been taken, such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD), that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access.

Bioinformatics
Research work has grown in the bioinformatics field, where biomedical literature has become an important application area for text mining. The first textbook on biomedical text mining appeared in 2005; it reported an industry estimate that 90% of drug targets are derived from the literature. The motivation for this work comes primarily from biologists, who find themselves faced with a massive increase in the number of publications in their field; keeping up with the related literature is nearly impossible for many scientists [7]. The goal of text mining in this area is to allow biomedical researchers to extract knowledge from the biomedical literature and so facilitate new discoveries in a more efficient manner. One online text mining application for the biomedical literature combines biomedical text mining with network visualization as an Internet service. Bio-entity recognition aims to identify and classify technical terms in the domain of molecular biology that correspond to instances of concepts of interest to biologists. Entity recognition is becoming increasingly important with the massive increase in reported results due to high-throughput experimental methods. It can be used in several higher-level information access tasks such as relation extraction, summarization and question answering [10].

Copyright and Customer Profile Analysis
Copyright analysis has developed into a large application area in recent years because of the increased number of copyright applications. Supervised and unsupervised techniques are applied to analyze copyright documents and to support companies, as well as the copyright offices in some countries, in their work. The challenges in copyright analysis are the length of the documents, which are longer than the documents usually used in text classification, and the large number of available documents in a corpus [6].
Companies use text mining to draw out the occurrences and instances of key terms in large blocks of text such as articles, Web pages and complaint forums. The software converts the unstructured data formats into topic structures and semantic networks, which are important tools for drilling into the information. By studying the semantic network, one can learn the general nature of the complaints and the reasons for complaining. It also finds common words used in complaints and their relationships to other words in the text via semantic weight [9, 10].

Internet Security
The use of text mining tools in the security field has become an important matter. Many text mining software packages are marketed for security applications, particularly for monitoring and analyzing online plain-text sources such as Internet news, blogs and mail for security purposes [7]. Text mining is also involved in the study of text encryption and decryption. Government agencies are investing considerable resources in the surveillance of all kinds of communication, such as email and online chats. Email is used in many legitimate activities, such as exchanging messages and documents. Unfortunately, it can also be misused, for example to distribute unwanted junk mail or to send offensive or bullying material. The explosive growth of unsolicited e-mail, more commonly known as spam, over recent years has constantly undermined the usability of e-mail. One solution is offered by anti-spam filters. Most commercially available filters use blacklists and hand-crafted rules. Since time is crucial and given the scale of the problem, it is infeasible to monitor emails or online chats manually. Automatic text mining tools therefore offer considerable promise in this area [10].

Conclusion
Text mining generally refers to the process of extracting valuable information from unstructured text. In this survey of text mining, several text mining techniques and their applications in various fields have been discussed. A comparison of the different text mining techniques has been presented, which can be enhanced further. Text mining algorithms give us useful, structured data, which can reduce time and cost. Identifying hidden information in social network sites, bioinformatics, internet security and similar fields using text mining is a major challenge in these fields. The advancement of web technologies has led to tremendous interest in the classification of text documents containing links or other information.

References:

[1] Vallikannu Ramanathan, T. Meyyappan, "Survey of Text Mining", International Conference on Technology and Business and Management, March 2013, pp. 508-514.
[2] Vidya K A, G Aghila, "Text Mining Process, Techniques and Tools: an Overview", International Journal of Information Technology and Knowledge Management, July-December 2010, Volume 2, No 2, pp. 613-622.
[3] R. Sagayam, S. Srinivasan, S. Roshini, "A Survey of Text Mining: Retrieval, Extraction and Indexing Techniques", International Journal of Computational Engineering Research (ijceronline.com), Vol. 2, Issue 5.
[4] Vishal Gupta and Guruprit Lehal, "A Survey of Text Mining Techniques and Applications", Journal of Emerging Technologies in Web Intelligence, Vol. 1, No. 1, August 2009.
[5] M. A. Hearst, "Text data mining: Issues, techniques, and the relationship to information access", presentation notes for UW/MS workshop on data mining, July 1997.
[6] Rashmi Agrawal, Mridula Batra, "A Detailed Study on Text Mining Techniques", IJSCE, ISSN: 2231-2307, Vol. 2, Issue-6, January 2013.
[7] Falguni N. Patel, Neha R. Soni, "Text mining: A Brief survey", International Journal of Advanced Computer Research, ISSN (Online): 2277-7970, Vol. 2, No. 4, Issue-6, Dec 2012.
[8] Rahul Patel, Gaurav Sharma, "A survey on text mining techniques", International Journal of Engineering and Computer Science, ISSN: 2319-7242, Vol. 3, Issue 5, May 2014, pp. 5621-5625.
[9] Seth Grimes, "The developing text mining market", white paper, Text Mining Summit, Alta Plana Corporation, Boston, 2005, pp. 1-12.
[10] Shaidah Jusoh and Hejab M. Alfawareh, "Techniques, Applications and Challenging Issue in Text Mining", IJCSI, ISSN (Online): 1694-0814, Vol. 9, Issue-6, No. 2, November 2012.
