posted on 2017-12-06, 00:00 authored by Deepani Guruge, Russel Stonier
Current major search engines on the web retrieve too many documents, of which only a small fraction are relevant to the user query. We propose a new fuzzy document-filtering algorithm that removes documents irrelevant to the user query from the output of Internet search engines. The algorithm takes the output of the 'Google' search engine as its basic input and processes it to retain the documents most relevant to the query. The clustering algorithm is based on fuzzy c-means, with simple modifications to the membership function formulation and the cluster prototype initialisation. It classifies the input documents into 3 predefined clusters. Finally, the clustered and context-based ranked URLs are presented to the user. The effectiveness of the algorithm has been tested using data provided by the eighth Text REtrieval Conference (TREC8) [25] and also with on-line data. Experimental results were evaluated using the error matrix method, precision, recall and clustering validity measures.
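For orientation, the textbook fuzzy c-means procedure on which the abstract says the method is based can be sketched as follows. This is a minimal illustration of standard fuzzy c-means only, not the authors' modified algorithm: the abstract does not specify the modified membership function or the prototype initialisation, so neither is reproduced here. The function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the random initial memberships, and the representation of documents as numeric feature vectors are all assumptions made for the example.

```python
import random


def fuzzy_c_means(points, c=3, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (hypothetical helper, not the paper's variant).

    points: list of equal-length numeric feature vectors (assumed document features).
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Assumed initialisation: random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Prototype update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )

        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dists = [
                sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim)) ** 0.5
                for j in range(c)
            ]
            zeros = [j for j, dist in enumerate(dists) if dist == 0.0]
            if zeros:
                # The point coincides with one or more prototypes: split membership among them.
                new_row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                inv = [dist ** (-2.0 / (m - 1.0)) for dist in dists]
                inv_sum = sum(inv)
                new_row = [v / inv_sum for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break

    return centers, u
```

Calling `fuzzy_c_means(vectors, c=3)` on a list of feature vectors returns three prototypes and, for each vector, three membership degrees that sum to 1. A filtering step could then keep the documents whose strongest membership falls in the cluster judged relevant, although how the paper assigns meaning to its 3 predefined clusters is not stated in the abstract.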
Funding
Category 1 - Australian Competitive Grants (this includes ARC, NHMRC)
History
Start Page
39
End Page
53
Number of Pages
15
Start Date
2004-01-01
ISBN-13
9780646443799
Location
Cairns, Qld.
Publisher
University of Technology Sydney
Place of Publication
Sydney, N.S.W.
Peer Reviewed
Yes
Open Access
No
External Author Affiliations
Faculty of Informatics and Communication; TBA Research Institute