Welcome to the Telecom Network Inventory Management (TNI) Python Flask application for duplicate data detection. This repository contains the Flask component of our Telecom Network Inventory Management project, which complements the Java Spring Boot application. This Flask application is responsible for identifying duplicate records in CSV files sent from the Java side. It uses the Dedupe library to perform the duplicate detection.
The core functionality of this Flask app involves processing CSV files, creating training data for Dedupe, and performing duplicate record identification. It communicates with the Java application via HTTP requests and responds with the identified duplicate data.
The Flask application identifies duplicate records in CSV files. Here's how it works:
- Receiving Data: The Flask app receives a POST request from the Java application containing the path to a CSV file generated on the Java side.
- Data Processing: It reads the CSV file and checks whether a training file already exists. If not, it interactively asks the user questions to build a training file for Dedupe.
- Duplicate Detection: The app uses Dedupe to compare records in the CSV file and identifies duplicates based on learned attributes and values.
- User Interaction: While training is ongoing, the app may prompt the user to confirm whether the attributes of certain records are the same or different.
- Training Data Update: After identifying duplicates, it updates the training data for future use.
- Response to Java App: The identified duplicate data is sent back to the Java application.
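The flow above can be sketched as a minimal Flask endpoint. This is an illustration under stated assumptions, not the repository's actual code: the route `/dedupe`, the JSON key `csv_path`, the file name `training.json`, and the helpers `load_records` and `find_duplicates` are all hypothetical names, and the Dedupe training and clustering step is left as a stub.

```python
import csv
import os

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical file name; the README does not say what the training file is called.
TRAINING_FILE = "training.json"


def clean(value):
    """Strip whitespace and turn empty cells into None."""
    text = (value or "").strip()
    return text or None


def load_records(csv_path):
    """Read a CSV into {row_index: {column: value}}, a dict of record dicts."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return {
            index: {key: clean(value) for key, value in row.items() if key is not None}
            for index, row in enumerate(reader)
        }


def find_duplicates(records, training_exists):
    """Stub standing in for the Dedupe step (training, then clustering).

    It always returns an empty list; the real logic is not shown in this sketch.
    """
    return []


@app.route("/dedupe", methods=["POST"])  # route name is an assumption
def dedupe_csv():
    payload = request.get_json(silent=True) or {}
    csv_path = payload.get("csv_path")  # JSON key is an assumption
    if not csv_path or not os.path.isfile(csv_path):
        return jsonify(error="csv_path is missing or does not point to a file"), 400

    records = load_records(csv_path)
    training_exists = os.path.exists(TRAINING_FILE)
    duplicates = find_duplicates(records, training_exists)
    return jsonify(
        record_count=len(records),
        training_file_found=training_exists,
        duplicates=duplicates,
    )


if __name__ == "__main__":
    app.run()
```

Keeping the CSV loading and the duplicate-detection step in separate functions lets each part be tested without a running Java application.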
- Duplicate data detection using the Dedupe library.
- Interactive user prompts for attribute similarity confirmation.
- Training data generation for Dedupe.
- Seamless communication with the Java Spring Boot application.
- HTTP-based data exchange.
To run the Flask application, you need the following:
- Python 3.x
- Flask
- Dedupe
- Clone this repository to your local machine.
- Install the required dependencies using pip install (for example, pip install flask dedupe). If you are using PyCharm, you can instead click each unresolved import in the imports section and press 'Alt+Enter' to install it.
- Run the Flask application, and ensure that the training files and the CSV file are located in the same directory as the Flask app.
- Ensure the Java Spring Boot application is running and can send POST requests to this Flask app.
- Start sending POST requests from the Java side with the path to the CSV file for duplicate detection.
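To try the Flask app without the Java application running, a request like the one the Java side sends can be built from Python. This is a hedged sketch: the endpoint path `/dedupe` and the JSON key `csv_path` are hypothetical, so replace them with whatever the Flask route actually expects. Port 5000 is Flask's default for the development server.

```python
import json
import urllib.request


def build_dedupe_request(csv_path, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request carrying the CSV path as JSON."""
    # "csv_path" and "/dedupe" are assumed names, not taken from the repository.
    body = json.dumps({"csv_path": csv_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/dedupe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Sending the request requires the Flask app to be running locally.
    req = build_dedupe_request("inventory.csv")
    with urllib.request.urlopen(req, timeout=30) as response:
        print(json.load(response))
```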
This project is licensed under the MIT License.
I would like to express my sincere gratitude to Sincera Consultancy, the company where I completed my internship. The guidance, support, and real-world experience gained during this internship have been invaluable to me in the development of the Telecom Network Inventory Management (TNI) project.
Special thanks to the mentors who provided invaluable insights and guidance throughout the project:
- Srivastsa G: Srivastsa G's guidance was instrumental in shaping the architecture and design of the TNI project.
- Kamal: Kamal's contributions to the Flask side of this project were significant, particularly in AI and machine learning integration, and Kamal's guidance on Flask API integration significantly enhanced the application's functionality.
- Anand GP: Anand GP's dedication and attention to detail in inventory management have greatly improved the project's data accuracy and efficiency.