Building a Search Engine for the ICS Domain

Using one of the crawling libraries listed below, we started from the seed http://www.ics.uci.edu, then crawled and indexed approximately 136,604 pages within the ics.uci.edu domain.

Crawling library, Java: http://code.google.com/p/crawler4j/
Crawling library, Python: https://github.com/Mondego/crawler4py
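
Keeping the crawl inside ics.uci.edu requires a check on every discovered link. The snippet below is a minimal sketch of such a check using only the Python standard library. The names ALLOWED_SUFFIX and in_ics_domain are hypothetical and are not part of crawler4j or crawler4py; the actual filter used in this project may differ.

```python
from urllib.parse import urlparse

# Assumption: the crawl is limited to ics.uci.edu and its subdomains.
ALLOWED_SUFFIX = "ics.uci.edu"


def in_ics_domain(url: str) -> bool:
    """Return True if the URL's host is ics.uci.edu or one of its subdomains."""
    host = urlparse(url).hostname or ""  # hostname is None for URLs without a host
    # Require an exact match or a leading dot, so that hosts such as
    # physics.uci.edu are not accepted just because they end in "ics.uci.edu".
    return host == ALLOWED_SUFFIX or host.endswith("." + ALLOWED_SUFFIX)
```

A crawler would call in_ics_domain on each extracted link and enqueue only those for which it returns True.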

We then built a keyword-based search engine over the indexed pages. Its ranking combines TF-IDF, PageRank, and cosine similarity to improve the relevance of search results.

Base algorithm - rank by TF-IDF
Main algorithm - rank by Cosine Similarity and PageRank
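
The two rankings above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual code: the log-scaled term frequency, the damping factor of 0.85, the treatment of dangling pages, and the linear blend controlled by alpha are all assumptions, since the description does not say how cosine similarity and PageRank were combined. All function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs maps doc_id -> list of tokens. Returns (tf-idf vectors, idf table)."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per doc
    idf = {t: math.log(n / d) for t, d in df.items()}
    vectors = {
        doc_id: {t: (1 + math.log(c)) * idf[t] for t, c in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }
    return vectors, idf


def rank_by_tfidf(query, docs):
    """Base ranking: sum the TF-IDF weights of the query terms in each document."""
    vectors, _ = tf_idf_vectors(docs)
    scores = {d: sum(v.get(t, 0.0) for t in query) for d, v in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank(links, damping=0.85, iterations=50):
    """links maps page -> list of outgoing links. Power-iteration PageRank."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            # Assumption: a page with no known out-links spreads its rank evenly.
            targets = [q for q in outs if q in links] or pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new_rank[q] += share
        rank = new_rank
    return rank


def rank_by_cosine_and_pagerank(query, docs, links, alpha=0.7):
    """Main ranking: blend cosine similarity with normalized PageRank."""
    vectors, idf = tf_idf_vectors(docs)
    q_counts = Counter(t for t in query if t in idf)  # ignore unseen terms
    q_vec = {t: (1 + math.log(c)) * idf[t] for t, c in q_counts.items()}
    pr = pagerank(links)
    max_pr = max(pr.values())
    scores = {
        d: alpha * cosine(q_vec, v) + (1 - alpha) * pr.get(d, 0.0) / max_pr
        for d, v in vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Both ranking functions take a tokenized query and return document ids ordered from most to least relevant. A larger alpha favors textual similarity, while a smaller one favors link-based authority.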

Search Engine

Yoon Kyung Shon
