An Information Retrieval engine for scientific papers – Lucene-powered with synonyms, wildcards, and smart query expansion.

spiliossp/Information-Retrieval

Scientific Article Search Engine

Overview

This project implements a search engine for scientific research papers using Apache Lucene.
The system allows users to perform advanced searches with customizable queries, synonym expansion, wildcard search, sorting options, and search history management.

The engine ranks research papers by relevance to the user's query. Results can be paged through in the console or saved as text or HTML files.


Dataset

  • Source: Kaggle - NIPS Papers 1987–2019
  • Preprocessing:
    • Random sample of 500 papers selected
    • Removed empty fields (e.g., abstract)
    • Cleaned line breaks in full_context
    • Final dataset stored in papers_cleaned.csv
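
The preprocessing steps above can be sketched in plain Java. This is a minimal illustration, not the repository's code: the class `PaperCleaner`, its method names, and the fixed-seed sampling are all assumptions, and CSV parsing is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper; the class and method names are not from the repository.
final class PaperCleaner {

    // Collapse line breaks (and the whitespace around them) into single spaces.
    static String cleanLineBreaks(String text) {
        return text.replaceAll("\\s*[\\r\\n]+\\s*", " ").trim();
    }

    // A row is kept only if every field is non-null and non-blank.
    static boolean isComplete(List<String> fields) {
        for (String f : fields) {
            if (f == null || f.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Draw up to n rows uniformly at random; a fixed seed keeps the sample reproducible.
    static <T> List<T> sample(List<T> rows, int n, long seed) {
        List<T> copy = new ArrayList<>(rows);
        Collections.shuffle(copy, new Random(seed));
        return new ArrayList<>(copy.subList(0, Math.min(n, copy.size())));
    }
}
```

Under these assumptions, a pipeline would filter rows with `isComplete`, apply `cleanLineBreaks` to the full-text field, and call `sample(rows, 500, seed)` before writing papers_cleaned.csv.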

Project Structure

  • Main → Handles the user interface and search flow
  • CSVReader → Loads and parses the dataset
  • SearchHistory → Stores and retrieves search history; provides query suggestions
  • SearchResultsWriter → Saves search results to .txt
  • SearchResultsWriterHTML → Saves search results to .html with highlighted terms
  • LuceneSearch → Builds the index, performs searches, ranking, highlighting, and result pagination
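
As one example of how a component in this structure might work, here is a rough sketch of query suggestions from search history. It is not the repository's `SearchHistory` class: the name `SearchHistorySketch`, the in-memory storage, and the prefix-matching rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the repository's SearchHistory class.
final class SearchHistorySketch {
    // LinkedHashSet keeps insertion order and drops duplicate queries.
    private final Set<String> queries = new LinkedHashSet<>();

    // Store a query in normalized (trimmed, lowercase) form; blank input is ignored.
    void record(String query) {
        String normalized = query.trim().toLowerCase(Locale.ROOT);
        if (!normalized.isEmpty()) {
            queries.add(normalized);
        }
    }

    // Suggest earlier queries that start with what the user has typed so far.
    List<String> suggest(String prefix) {
        String p = prefix.trim().toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String q : queries) {
            if (q.startsWith(p)) {
                matches.add(q);
            }
        }
        return matches;
    }
}
```

A persistent version would additionally write the set to disk between sessions; the source does not say how the project stores its history.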

Features

  • Field-based search (title, year, full text, etc.)
  • Synonym expansion and improved query suggestion
  • Wildcard search (*, ?) support
  • Result highlighting in console and HTML
  • Sorting results by publication year (ascending/descending)
  • Search history with query reuse suggestions
  • Results presentation:
    • Console (paginated, 10 per page)
    • Text file (.txt)
    • HTML file (.html)
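
To make the wildcard feature concrete: in Lucene's query syntax, `*` matches zero or more characters and `?` matches exactly one. The sketch below illustrates those semantics with `java.util.regex`. It is only an illustration; Lucene itself handles wildcards through its own `WildcardQuery` rather than regular expressions, and the class name `WildcardSketch` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of wildcard semantics; not how Lucene implements them.
final class WildcardSketch {

    // Translate a wildcard pattern into a regex:
    // '*' becomes ".*", '?' becomes ".", and every other character is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```

For instance, `neur*` would match both "neural" and "neuron", while `te?t` would match "text" and "test" but not "tet".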

Example Usage

Enter the field you want to search: title
Enter the query: nlp
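
A query such as `nlp` is a natural candidate for synonym expansion. The sketch below shows one way the expansion could be expressed as a field-scoped query string in Lucene's classic query parser syntax. The synonym table, the class name `SynonymExpansionSketch`, and the choice to build a query string are all assumptions; the source does not say how the project implements expansion.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; not the repository's expansion logic.
final class SynonymExpansionSketch {
    // Illustrative synonym table; the entries are assumptions, not the project's data.
    private static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("nlp", Arrays.asList("natural language processing"));
    }

    // Build a field-scoped query string that ORs the term with its synonyms.
    static String expand(String field, String term) {
        String key = term.trim().toLowerCase(Locale.ROOT);
        StringJoiner alternatives = new StringJoiner(" OR ", field + ":(", ")");
        alternatives.add(quoteIfPhrase(key));
        for (String synonym : SYNONYMS.getOrDefault(key, Collections.emptyList())) {
            alternatives.add(quoteIfPhrase(synonym));
        }
        return alternatives.toString();
    }

    // Multi-word alternatives are quoted so they are treated as phrases.
    private static String quoteIfPhrase(String s) {
        return s.contains(" ") ? "\"" + s + "\"" : s;
    }
}
```

With this table, `expand("title", "nlp")` produces `title:(nlp OR "natural language processing")`, and a term without synonyms is passed through unchanged, e.g. `title:(graph)`.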
