Scientific Article Search Engine

Overview

This project implements a search engine for scientific research papers using Apache Lucene.
The system allows users to perform advanced searches with customizable queries, synonym expansion, wildcard search, sorting options, and search history management.

The engine retrieves the research papers most relevant to a user-defined query and can present the results in the console or save them as text or HTML files.

Dataset

Source: Kaggle - NIPS Papers 1987–2019
Preprocessing:
- Selected a random sample of 500 papers
- Removed empty fields (e.g., abstract)
- Cleaned line breaks in full_context
- Stored the final dataset in papers_cleaned.csv
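
The cleaning steps above can be illustrated with a minimal sketch. It is not the project's preprocessing code: the class name PaperCleaner and its methods are hypothetical, rows are held in memory instead of being read from the CSV, and "removed empty fields" is assumed here to mean dropping any row that has a blank field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project: shows the cleaning steps on in-memory rows.
class PaperCleaner {

    // Collapse each run of line breaks (plus surrounding whitespace) into a single space.
    static String cleanLineBreaks(String text) {
        return text.replaceAll("\\s*[\\r\\n]+\\s*", " ").trim();
    }

    // True if any field in the row is null or blank.
    static boolean hasEmptyField(String[] row) {
        for (String field : row) {
            if (field == null || field.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // Assumption: a row with any empty field is dropped entirely.
    static List<String[]> clean(List<String[]> rows) {
        List<String[]> cleaned = new ArrayList<>();
        for (String[] row : rows) {
            if (hasEmptyField(row)) {
                continue;
            }
            String[] out = new String[row.length];
            for (int i = 0; i < row.length; i++) {
                out[i] = cleanLineBreaks(row[i]);
            }
            cleaned.add(out);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1987", "A title", "An abstract", "First line\nsecond line"});
        rows.add(new String[] {"1990", "No abstract", "", "Some text"});
        for (String[] row : clean(rows)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Only the first sample row survives, with its line break replaced by a space.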

Project Structure

Main → Handles the user interface and search flow
CSVReader → Loads and parses the dataset
SearchHistory → Stores and retrieves search history; provides query suggestions
SearchResultsWriter → Saves search results to .txt
SearchResultsWriterHTML → Saves search results to .html with highlighted terms
LuceneSearch → Builds the index, runs searches, and handles ranking, highlighting, and result pagination
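
As an illustration of the SearchHistory role described above, the sketch below keeps past queries in memory and suggests those that start with what the user has typed. The class name SearchHistorySketch, the prefix-matching rule, and the most-recent-first ordering are all assumptions made for this example; the project's actual SearchHistory class may work differently.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for SearchHistory: in-memory only, prefix-based suggestions.
class SearchHistorySketch {
    // Most recent query first.
    private final LinkedList<String> queries = new LinkedList<>();

    // Record a query; a repeated query (ignoring case) moves to the front.
    void add(String query) {
        String q = query.trim();
        if (q.isEmpty()) {
            return;
        }
        queries.removeIf(existing -> existing.equalsIgnoreCase(q));
        queries.addFirst(q);
    }

    // Return up to `limit` past queries that start with `prefix`, most recent first.
    List<String> suggest(String prefix, int limit) {
        String p = prefix.trim().toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String q : queries) {
            if (out.size() >= limit) {
                break;
            }
            if (q.toLowerCase(Locale.ROOT).startsWith(p)) {
                out.add(q);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SearchHistorySketch history = new SearchHistorySketch();
        history.add("neural networks");
        history.add("nlp");
        history.add("neural machine translation");
        // prints [neural machine translation, neural networks]
        System.out.println(history.suggest("neu", 5));
    }
}
```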

Features

Field-based search (title, year, full text, etc.)
Synonym expansion and improved query suggestion
Wildcard search support (*, ?)
Result highlighting in console and HTML
Sorting results by publication year (ascending/descending)
Search history with query reuse suggestions
Results presentation:
- Console (paginated, 10 per page)
- Text file (.txt)
- HTML file (.html)
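
The year sorting and 10-per-page console output listed above can be sketched in plain Java. This is only an illustration under stated assumptions: the Paper class, sortByYear, page, and pageCount are hypothetical names, and the sort is done in memory, whereas the project may sort inside Lucene instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: in-memory sorting by year and fixed-size pagination.
class ResultPagingSketch {
    static final int PAGE_SIZE = 10; // matches the "10 per page" console output

    // Minimal stand-in for a search hit.
    static final class Paper {
        final String title;
        final int year;

        Paper(String title, int year) {
            this.title = title;
            this.year = year;
        }
    }

    // Return a sorted copy; the input list is left unchanged.
    static List<Paper> sortByYear(List<Paper> papers, boolean ascending) {
        Comparator<Paper> byYear = Comparator.comparingInt(p -> p.year);
        List<Paper> sorted = new ArrayList<>(papers);
        sorted.sort(ascending ? byYear : byYear.reversed());
        return sorted;
    }

    // Return the items on the given 1-based page, or an empty list if out of range.
    static <T> List<T> page(List<T> items, int pageNumber) {
        int from = (pageNumber - 1) * PAGE_SIZE;
        if (pageNumber < 1 || from >= items.size()) {
            return new ArrayList<>();
        }
        int to = Math.min(from + PAGE_SIZE, items.size());
        return new ArrayList<>(items.subList(from, to));
    }

    // Number of pages needed for totalItems results (rounds up).
    static int pageCount(int totalItems) {
        return (totalItems + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    public static void main(String[] args) {
        List<Paper> hits = new ArrayList<>();
        for (int i = 0; i < 23; i++) {
            hits.add(new Paper("Paper " + i, 1987 + i));
        }
        List<Paper> newestFirst = sortByYear(hits, false);
        for (Paper p : page(newestFirst, 1)) {
            System.out.println(p.year + "  " + p.title);
        }
        System.out.println("Pages: " + pageCount(hits.size()));
    }
}
```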

Example Usage

Enter the field you want to search: title
Enter the query: nlp
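
In a session like the one above, the query may also contain the wildcards listed under Features: * stands for zero or more characters and ? for exactly one. The sketch below only illustrates those matching rules by translating a pattern into a java.util.regex expression. It is not how Lucene evaluates wildcard queries, the class name WildcardSketch is hypothetical, and terms are assumed to be lowercased already.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of wildcard semantics; not Lucene's implementation.
class WildcardSketch {

    // Translate a wildcard pattern to a regex: '*' -> ".*", '?' -> ".", everything else literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("neur*", "neural")); // true: '*' matches "al"
        System.out.println(matches("mode?", "model"));  // true: '?' matches "l"
        System.out.println(matches("mode?", "mode"));   // false: '?' needs one character
    }
}
```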

Repository: spiliossp/Information-Retrieval
