
This repository contains Terraform code to deploy a serverless batch processing architecture on AWS, designed to replace an on-premises system with a scalable, reliable, and maintainable cloud solution.


HakeemSalaudeen/salesproject-batch-processing-on-AWS


salesproject: Batch processing on AWS

Project Overview

This project migrates an on-premises batch processing system to AWS, addressing challenges in reliability, scalability, and maintainability. The architecture leverages AWS services like S3, Glue, and Redshift Serverless to create a robust, serverless data pipeline. All infrastructure is provisioned using Terraform, ensuring consistency and reproducibility.


Project Structure

The repository contains the following Terraform configuration files:

  1. backend.tf: Configures the S3 backend for storing the Terraform state file.
  2. vpc.tf: Defines the VPC, subnets, Internet Gateway, NAT Gateway, Route Tables, and Security Groups.
  3. iamrole.tf: Creates IAM roles and policies for secure service interactions.
  4. redshift.tf: Provisions Redshift Serverless Workgroup, Namespace, and associated configurations.
  5. glue.tf: Configures Glue jobs, connections, and crawlers for data processing.
  6. providers.tf: Specifies the required Terraform providers and versions.
  7. s3.tf: Creates S3 buckets for raw data, processed data, and backups.
  8. variable.tf: Defines input variables for reusable configurations.
  9. sns.tf: Sets up SNS topics for error notifications.
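As a rough orientation, here is the kind of content backend.tf and providers.tf usually hold. The bucket name, state key, region, and version constraints are hypothetical placeholders, not the repository's actual values. The S3 backend does not create its bucket, so the state bucket must already exist before terraform init runs. Backend blocks also cannot reference variables, so their values are written literally.

```hcl
# backend.tf (sketch): keep Terraform state in S3 instead of on a local disk.
# All values below are hypothetical placeholders.
terraform {
  backend "s3" {
    bucket  = "example-salesproject-tfstate"   # hypothetical; must already exist
    key     = "salesproject/terraform.tfstate" # object key for the state file
    region  = "us-east-1"                      # assumed region
    encrypt = true                             # server-side encryption of the state object
  }
}

# providers.tf (sketch): pin the AWS provider so every run resolves the same major version.
terraform {
  required_version = ">= 1.3.0" # assumed minimum Terraform version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint
    }
  }
}

provider "aws" {
  region = "us-east-1" # assumed; region where the pipeline resources are created
}
```

Keeping encrypt = true matters because the state file can hold sensitive values, such as database passwords, in plain text.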

Key Features

  • Serverless Architecture: Uses managed services such as Redshift Serverless and Glue for scalability, eliminating infrastructure management overhead.
  • Automated Data Pipeline: Glue jobs are scheduled to run hourly to process new data files.
  • Error Handling: SNS sends email notifications for Glue job failures.
  • Infrastructure as Code: All resources are deployed using Terraform, ensuring consistency and reproducibility.
  • Security: Redshift runs in a private subnet of the VPC, IAM roles follow least-privilege access, and credentials are stored in AWS Secrets Manager.
  • Cost Optimization: Pay-for-use pricing with no upfront infrastructure costs.
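One possible way to wire up the hourly schedule and the failure emails described above is a scheduled Glue trigger plus an Amazon EventBridge rule that forwards "Glue Job State Change" events to SNS. This is a sketch under assumptions. The resource names, the Glue version, and the input variables glue_role_arn, etl_script_s3_path, and alert_email are hypothetical, and the repository's glue.tf and sns.tf may implement this differently.

```hcl
# Hypothetical inputs; in this project the role would come from iamrole.tf.
variable "glue_role_arn" {
  type = string
}

variable "etl_script_s3_path" {
  type = string # e.g. an s3:// path to the ETL script (hypothetical)
}

variable "alert_email" {
  type = string
}

resource "aws_glue_job" "sales_etl" {
  name         = "sales-etl" # hypothetical job name
  role_arn     = var.glue_role_arn
  glue_version = "4.0" # assumed Glue version

  command {
    name            = "glueetl"
    script_location = var.etl_script_s3_path
    python_version  = "3"
  }
}

# Start the job at minute 0 of every hour (Glue cron has six fields).
resource "aws_glue_trigger" "hourly" {
  name     = "sales-etl-hourly"
  type     = "SCHEDULED"
  schedule = "cron(0 * * * ? *)"

  actions {
    job_name = aws_glue_job.sales_etl.name
  }
}

resource "aws_sns_topic" "glue_alerts" {
  name = "glue-job-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.glue_alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

# Match runs of this job that end in FAILED or TIMEOUT.
resource "aws_cloudwatch_event_rule" "glue_failed" {
  name = "sales-etl-failed"
  event_pattern = jsonencode({
    source        = ["aws.glue"]
    "detail-type" = ["Glue Job State Change"]
    detail = {
      jobName = [aws_glue_job.sales_etl.name]
      state   = ["FAILED", "TIMEOUT"]
    }
  })
}

resource "aws_cloudwatch_event_target" "to_sns" {
  rule = aws_cloudwatch_event_rule.glue_failed.name
  arn  = aws_sns_topic.glue_alerts.arn
}

# EventBridge needs explicit permission to publish to the topic.
data "aws_iam_policy_document" "events_publish" {
  statement {
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.glue_alerts.arn]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }
  }
}

resource "aws_sns_topic_policy" "glue_alerts" {
  arn    = aws_sns_topic.glue_alerts.arn
  policy = data.aws_iam_policy_document.events_publish.json
}
```

The six cron fields are minutes, hours, day-of-month, month, day-of-week, and year, so cron(0 * * * ? *) fires at the top of every hour. The email subscription stays pending until the recipient clicks the confirmation link that SNS sends.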

Prerequisites

  1. AWS Account: Ensure you have an active AWS account.
  2. Terraform: Install Terraform on your local machine.
  3. AWS CLI: Configure the AWS CLI with your credentials (access key and secret key).

Setup Instructions

  1. Clone the Repository:

    git clone https://github.com/HakeemSalaudeen/salesproject-batch-processing-on-AWS.git
    cd salesproject-batch-processing-on-AWS
  2. Initialize Terraform:

    terraform init
  3. Review Variables: Update variable.tf with your environment-specific values, such as the Redshift credentials.

  4. Deploy Infrastructure:

    terraform apply
  5. Verify Deployment:

    • Check the AWS Management Console to ensure all resources are created.
    • Test the data pipeline by uploading a file to the raw data S3 bucket.
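To keep the Redshift credentials from step 3 out of version control, one option is to declare the password as a sensitive variable, copy it into Secrets Manager, and place the Redshift Serverless workgroup in private subnets. This sketch assumes hypothetical names (sales-namespace, sales-workgroup, the secret path, the admin user name, and the private_subnet_ids and redshift_sg_id inputs); it is not taken from the repository's redshift.tf.

```hcl
# variable.tf (sketch): "sensitive" redacts the value in plan/apply output.
variable "redshift_admin_password" {
  type      = string
  sensitive = true
}

variable "private_subnet_ids" {
  type = list(string) # hypothetical; private subnets defined in vpc.tf
}

variable "redshift_sg_id" {
  type = string # hypothetical; security group defined in vpc.tf
}

locals {
  redshift_admin_user = "admin_user" # hypothetical admin user name
}

# Keep a copy of the credentials in Secrets Manager for consumers such as Glue.
resource "aws_secretsmanager_secret" "redshift_admin" {
  name = "salesproject/redshift-admin" # hypothetical secret name
}

resource "aws_secretsmanager_secret_version" "redshift_admin" {
  secret_id = aws_secretsmanager_secret.redshift_admin.id
  secret_string = jsonencode({
    username = local.redshift_admin_user
    password = var.redshift_admin_password
  })
}

resource "aws_redshiftserverless_namespace" "sales" {
  namespace_name      = "sales-namespace"
  db_name             = "sales"
  admin_username      = local.redshift_admin_user
  admin_user_password = var.redshift_admin_password
}

# Private placement: no public endpoint, only the given subnets and security group.
resource "aws_redshiftserverless_workgroup" "sales" {
  workgroup_name      = "sales-workgroup"
  namespace_name      = aws_redshiftserverless_namespace.sales.namespace_name
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [var.redshift_sg_id]
  publicly_accessible = false
}
```

The password can be supplied at apply time through the environment variable TF_VAR_redshift_admin_password instead of being written into variable.tf. Marking a variable sensitive only hides it from command output; the value is still stored in the state file, which is another reason to keep the S3 backend encrypted and access-restricted.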

Code Quality

  • Code Formatting: All Terraform files are formatted with terraform fmt for consistency.
  • Best Practices: Follows Terraform best practices for modularity and readability.

Monitoring and Maintenance

  • CloudWatch: Use CloudWatch to monitor Glue jobs, Redshift performance, and system logs.
  • SNS Alerts: Subscribe an email address to the SNS topic and confirm the subscription to receive failure notifications.
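For CloudWatch monitoring of Glue jobs, a metric alarm can push task failures to an SNS topic. This sketch assumes the Glue job publishes job metrics (job argument "--enable-metrics" set to "true") and takes the job name and topic ARN as hypothetical input variables. Check the metric name and dimensions against the AWS Glue job-metrics documentation before relying on it.

```hcl
# Hypothetical inputs: the Glue job to watch and the SNS topic from sns.tf.
variable "glue_job_name" {
  type = string
}

variable "alert_topic_arn" {
  type = string
}

# Fire when any task in any run of the job fails within a 5-minute window.
resource "aws_cloudwatch_metric_alarm" "glue_failed_tasks" {
  alarm_name          = "${var.glue_job_name}-failed-tasks"
  namespace           = "Glue"
  metric_name         = "glue.driver.aggregate.numFailedTasks"
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  treat_missing_data  = "notBreaching" # no data (job idle) is not an alarm

  dimensions = {
    JobName  = var.glue_job_name
    JobRunId = "ALL" # aggregate across all runs
    Type     = "count"
  }

  alarm_actions = [var.alert_topic_arn]
}
```

numFailedTasks counts failed Spark tasks, which Spark may retry successfully, so this alarm complements rather than replaces a notification on the job's final state.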

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Submit a pull request with a detailed description of your changes.

Happy Coding! 🚀
