Company Data Extractor

A full-stack application that fetches, verifies, and manages company information using AI-powered data extraction and web scraping.

🌟 Overview

The Company Data Extractor helps businesses and researchers collect and verify company information at scale. It combines the OpenAI API with web scraping to return verified company data, including operational status, location details, and official websites.

Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│                 │    │                 │    │                 │
│   React Frontend│◄──►│ Spring Boot API │◄──►│   MySQL Database│
│                 │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                               │
                               ▼
                       ┌─────────────────┐
                       │   OpenAI API    │
                       │   Integration   │
                       └─────────────────┘
```

✨ Features

- 🔍 Intelligent Data Extraction: Uses the OpenAI API to retrieve company information
- 🌍 Location-Based Search: Filter companies by city and industry
- ✅ Verification System: Verify company operational status (Active, Acquired, Closed)
- 🚫 Duplicate Prevention: Built-in exclusion lists to avoid duplicate entries
- 📊 Excel Export: Export company data to Excel files for further analysis
- 📁 File Upload: Drag-and-drop interface for batch processing
- ⚡ Real-time Processing: Interactive table display with real-time updates
- 🔒 Secure: Environment-based configuration for sensitive data

🛠 Tech Stack

| Component | Technology |
| --- | --- |
| Backend | Java 17, Spring Boot 3.5.3, Maven |
| Database | MySQL 8.0 |
| Frontend | React 18.x, JavaScript ES6+, CSS3 |
| API Integration | OpenAI API (GPT Models) |
| File Processing | Apache POI (Excel export/import) |
| Build Tools | Maven (Backend), npm (Frontend) |
| Security | Environment Variables, Spring Security |

📋 Prerequisites

Before you begin, ensure you have the following installed:

- Java Development Kit (JDK) 17+
- Node.js 16+ and npm
- MySQL 8.0+
- Maven 3.8+
- OpenAI API Key (obtain from OpenAI Platform)

🚀 Installation

Backend Setup

Clone the repository

```
git clone https://github.com/yourusername/company-data-extractor-backend.git
cd company-data-extractor-backend
```

Set up the database

```
CREATE DATABASE company_extractor;
CREATE USER 'your_username'@'localhost' IDENTIFIED BY 'your_password';
GRANT ALL PRIVILEGES ON company_extractor.* TO 'your_username'@'localhost';
FLUSH PRIVILEGES;
```

Configure environment variables

```
export DB_USERNAME=your_username
export DB_PASSWORD=your_password
export OPENAI_API_KEY=your_openai_api_key
```
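
The backend configuration reads `DB_USERNAME`, `DB_PASSWORD`, and `OPENAI_API_KEY` from the environment, so it can help to confirm they are set before starting the application. The following is a minimal pre-flight sketch; the class `EnvCheck` and its method are hypothetical and not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the project): reports which of the
// required environment variables are missing or blank.
final class EnvCheck {
    // Variable names taken from the setup step above.
    static final List<String> REQUIRED =
            List.of("DB_USERNAME", "DB_PASSWORD", "OPENAI_API_KEY");

    // Returns the required names that are absent or blank in the given map.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> absent = missing(System.getenv());
        if (absent.isEmpty()) {
            System.out.println("All required variables are set.");
        } else {
            System.out.println("Missing variables: " + absent);
        }
    }
}
```

Running it in the same shell that will start the backend shows whether the `export` commands took effect.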

Build and run the application
```
mvn clean install
mvn spring-boot:run
```
The backend will be available at http://localhost:8081

Frontend Setup

Clone the frontend repository

```
git clone https://github.com/yourusername/company-data-extractor-frontend.git
cd company-data-extractor-frontend
```

Install dependencies
```
npm install
```
Start the development server
```
npm start
```
The frontend will be available at http://localhost:3000

⚙️ Configuration

Backend Configuration (`application.properties`)

```
# Application
spring.application.name=company-data-extractor
server.port=8081

# Database Configuration
spring.datasource.url=jdbc:mysql://localhost:3306/company_extractor?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC
spring.datasource.username=${DB_USERNAME}
spring.datasource.password=${DB_PASSWORD}
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

# JPA Configuration
spring.jpa.database-platform=org.hibernate.dialect.MySQLDialect
spring.jpa.hibernate.ddl-auto=update
spring.jpa.show-sql=false
spring.jpa.open-in-view=false

# OpenAI API Configuration
openai.api.key=${OPENAI_API_KEY}
openai.api.timeout=30000

# File Upload Configuration
spring.servlet.multipart.max-file-size=10MB
spring.servlet.multipart.max-request-size=10MB
```

Environment Variables

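For development, create a `.env` file in the project root. Spring Boot does not load `.env` files by itself, so export these values into the environment (or use a dotenv loader) before starting the backend:

```
DB_USERNAME=your_db_username
DB_PASSWORD=your_db_password
OPENAI_API_KEY=sk-your-openai-api-key
```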

📖 Usage

Basic Workflow

Start the Application
- Ensure both backend and frontend are running
- Navigate to http://localhost:3000

Search for Companies
- Enter the target city (e.g., "San Francisco")
- Specify the industry (e.g., "Software")
- Set the number of companies to fetch
- Click "Fetch Companies"

View Results
- Browse companies in the interactive table
- View operational status (Active, Acquired, Closed)
- Check company websites and locations

Export Data
- Click "Export to Excel" to download results
- Use the exported file for further analysis

File Upload Feature

- Drag and Drop: Drag Excel files onto the upload area
- Batch Processing: Upload multiple company lists for processing
- Format Support: Supports .xlsx and .xls formats
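
The supported formats and the 10MB multipart limit from the backend configuration can be checked before a file is sent. The following is a minimal sketch of such a check; `UploadValidator` is a hypothetical class, and the project's actual validation may differ.

```java
import java.util.Locale;

// Hypothetical pre-upload check (not part of the project).
final class UploadValidator {
    // Mirrors spring.servlet.multipart.max-file-size=10MB (10 * 1024 * 1024 bytes).
    static final long MAX_BYTES = 10L * 1024 * 1024;

    // Accepts only non-empty .xlsx or .xls files within the size limit.
    static boolean isAccepted(String fileName, long sizeBytes) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        boolean excel = lower.endsWith(".xlsx") || lower.endsWith(".xls");
        return excel && sizeBytes > 0 && sizeBytes <= MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("companies.xlsx", 2_048));
        System.out.println(isAccepted("companies.csv", 2_048));
    }
}
```

The check looks only at the file name and size; it does not verify that the content is a valid workbook.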

🔌 API Documentation

Main Endpoints

Fetch Companies

```
POST /api/companies/fetch
Content-Type: application/json

{
  "city": "San Francisco",
  "industry": "Software",
  "count": 50
}
```
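
As an illustration, this request can be sent from Java 17 with the built-in `java.net.http` client. This is a sketch under stated assumptions: the base URL comes from the backend setup, while the class name, the 60-second timeout, and the hand-built JSON string are choices made for this example rather than part of the project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client for POST /api/companies/fetch (not part of the project).
final class FetchCompaniesExample {
    // Matches server.port=8081 from the backend configuration.
    static final String BASE_URL = "http://localhost:8081";

    // Builds the JSON body by hand. Values are assumed to contain no quotes
    // or backslashes; a real client would use a JSON library to escape them.
    static String jsonBody(String city, String industry, int count) {
        return "{\"city\":\"" + city + "\",\"industry\":\"" + industry
                + "\",\"count\":" + count + "}";
    }

    static HttpRequest buildRequest(String city, String industry, int count) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/api/companies/fetch"))
                .header("Content-Type", "application/json")
                .timeout(Duration.ofSeconds(60)) // assumed value for this sketch
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(city, industry, count)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        try {
            HttpResponse<String> response = client.send(
                    buildRequest("San Francisco", "Software", 50),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Reached when the backend is not running or the request times out.
            System.out.println("Backend not reachable: " + e.getMessage());
        }
    }
}
```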

Get All Companies

```
GET /api/companies
```

Export to Excel

```
GET /api/companies/export?city=San%20Francisco&industry=Software
```
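
Query values such as `San Francisco` contain characters that must be percent-encoded before they are placed in the URL. The following is a minimal sketch of building the export URL in Java; the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building the export URL (not part of the project).
final class ExportUrlExample {
    static String exportUrl(String baseUrl, String city, String industry) {
        return baseUrl + "/api/companies/export"
                + "?city=" + encode(city)
                + "&industry=" + encode(industry);
    }

    // URLEncoder applies form encoding, where a space becomes "+".
    // A literal "+" in the input is already encoded as "%2B", so replacing
    // the remaining "+" characters with "%20" is safe.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(exportUrl("http://localhost:8081", "San Francisco", "Software"));
    }
}
```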

Upload File

```
POST /api/companies/upload
Content-Type: multipart/form-data
```

The request body carries the Excel file in a form field named `file`.
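
Java's built-in HTTP client has no multipart helper, so an upload request has to assemble the body itself. The following is a sketch of one way to do that for the `file` field; the class name, the boundary format, and the `application/octet-stream` part type are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical client for POST /api/companies/upload (not part of the project).
final class UploadExample {
    // Assembles a single-part multipart/form-data body for the "file" field.
    // The file name is assumed to contain no quotes or line breaks.
    static byte[] multipartBody(String boundary, String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\""
                + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    static HttpRequest buildRequest(String fileName, byte[] content) {
        String boundary = "upload-" + UUID.randomUUID();
        return HttpRequest.newBuilder(URI.create("http://localhost:8081/api/companies/upload"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(
                        multipartBody(boundary, fileName, content)))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for a real workbook; nothing is sent here.
        byte[] sample = "placeholder".getBytes(StandardCharsets.UTF_8);
        HttpRequest request = buildRequest("companies.xlsx", sample);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Content-Type").orElse(""));
    }
}
```

Passing the built request to `HttpClient.send` performs the upload against a running backend.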

Response Format

```
{
  "success": true,
  "data": [
    {
      "id": 1,
      "name": "Example Corp",
      "website": "https://example.com",
      "status": "Active",
      "city": "San Francisco",
      "industry": "Software",
      "createdAt": "2025-09-24T19:30:00Z"
    }
  ],
  "total": 1,
  "message": "Companies fetched successfully"
}
```
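
On the client side, this response shape could be modelled with Java records. The types below are a sketch inferred from the sample response and the three documented status values; the names `ResponseModel`, `Company`, `ApiResponse`, and `Status` are hypothetical and may not match the backend's own classes.

```java
import java.time.Instant;
import java.util.List;
import java.util.Locale;

// Hypothetical client-side model of the response (not part of the project).
final class ResponseModel {
    // The three operational statuses named in this README.
    enum Status {
        ACTIVE, ACQUIRED, CLOSED;

        // Maps a label such as "Active" to the matching constant.
        static Status fromLabel(String label) {
            return valueOf(label.trim().toUpperCase(Locale.ROOT));
        }
    }

    record Company(long id, String name, String website, Status status,
                   String city, String industry, Instant createdAt) {}

    record ApiResponse(boolean success, List<Company> data, int total, String message) {}

    // Rebuilds the sample response shown above as typed objects.
    static ApiResponse sample() {
        Company company = new Company(1, "Example Corp", "https://example.com",
                Status.fromLabel("Active"), "San Francisco", "Software",
                Instant.parse("2025-09-24T19:30:00Z"));
        return new ApiResponse(true, List.of(company), 1,
                "Companies fetched successfully");
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

A JSON library such as Jackson, which ships with Spring Boot, could bind the response body to types like these.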

🤝 Contributing

We welcome contributions! Please follow these steps:

Fork the repository
Create a feature branch
```
git checkout -b feature/amazing-feature
```

Commit your changes

```
git commit -m 'Add some amazing feature'
```

Push to the branch
```
git push origin feature/amazing-feature
```
Open a Pull Request

Code Style Guidelines

- Backend: Follow Java naming conventions and Spring Boot best practices
- Frontend: Use ESLint and Prettier for consistent formatting
- Database: Use meaningful table and column names
- Comments: Write clear, concise comments for complex logic

🔒 Security

- Environment Variables: All sensitive data is stored in environment variables
- API Key Protection: OpenAI API keys are never exposed in client-side code
- Database Security: Use strong passwords and limit database access
- Input Validation: All user inputs are validated and sanitized
- CORS Configuration: Properly configured for production deployments

🚀 Future Enhancements

Planned Features

- User Authentication: JWT-based authentication system
- Role-Based Access Control: Different permission levels
- Advanced Analytics: Company data insights and trends
- Bulk Operations: Process thousands of companies simultaneously
- API Rate Limiting: Prevent abuse and ensure fair usage
- Caching System: Redis integration for better performance
- Monitoring: Application performance monitoring
- Docker Support: Containerization for easy deployment

Cloud Deployment Options

- Backend: AWS EC2, Heroku, Google Cloud Platform
- Database: AWS RDS, Google Cloud SQL
- Frontend: Vercel, Netlify, AWS S3 + CloudFront

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2025 Company Data Extractor

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

💬 Support

Getting Help

- Documentation: Check our Wiki for detailed guides
- Issues: Report bugs and request features via GitHub Issues
- Discussions: Join the community in GitHub Discussions

Contact

Email: [email protected]
Twitter: @CompanyDataExt
LinkedIn: Company Data Extractor

Made with ❤️ by the Company Data Extractor Team
