Data Duplication Removal Using File Checksum
Utpal Choudhary, Saksham Jaiswal, Manan Sharma, Vikas Gupta, Ms. Sukhmeet Kaur
Chandigarh University, Mohali, Punjab
[email protected], [email protected], [email protected], [email protected]
Abstract— Data duplication poses a significant challenge in modern data management systems, causing wastage of storage resources, increased processing overhead, and potential inconsistencies in data integrity. To address this issue, this project introduces a robust methodology that harnesses file checksum techniques to detect and eliminate duplicate files within a given dataset effectively.
The project's key objective revolves around developing an intelligent system that can seamlessly identify and manage duplicate files. To achieve this, the project employs cryptographic hash functions to generate unique checksums for individual files. By comparing these checksums, identical files are accurately pinpointed, facilitating efficient duplicate identification even in large and diverse datasets.
A notable aspect of the proposed approach is its versatility. The system not only excels in the identification of duplicate files but also incorporates a flexible mechanism for the subsequent handling of duplicates based on the specific requirements of the application. This encompasses options for the immediate removal of redundant files or their archival for historical or compliance purposes.
The potential impact of this project is substantial, as it offers a practical and automated solution to a pervasive problem in data management. By mitigating data duplication, organizations can optimize storage usage, streamline data processing operations, and bolster overall data quality.

Keywords— Data duplication, File checksum, Duplicate file detection, Data integrity, Cryptographic hash functions, Data management, Storage optimization.

I. INTRODUCTION

In the digital age, the proliferation of data has ushered in unparalleled opportunities for innovation and insight across various domains. However, alongside this growth, the challenge of data duplication has emerged as a significant impediment to efficient data management. Data duplication, the presence of identical copies of files across a dataset, not only consumes valuable storage resources but also introduces complexities in data processing, maintenance, and integrity. Addressing this issue is crucial for organizations seeking to optimize their data infrastructure and ensure the accuracy and reliability of their information.
The project titled "Data Duplication Removal Using File Checksum" seeks to tackle the persistent problem of data duplication through a systematic and intelligent approach. By leveraging file checksum techniques, the project aims to create a solution that can identify and eliminate duplicate files within a given dataset, contributing to streamlined data management and enhanced data quality.
In this era of big data and rapidly expanding digital repositories, identifying duplicate files manually is a laborious and error-prone task. Moreover, traditional methods of detecting duplicates based on file attributes or full content comparison tend to be time-consuming and resource-intensive. The proposed project recognizes these limitations and proposes a methodology that utilizes cryptographic hash functions to generate a unique checksum for each file. By comparing these checksums, duplicate files can be accurately and efficiently pinpointed, regardless of their names, locations, or formats.
The significance of this project lies in its potential to revolutionize data management practices. By automating the process of duplicate file detection and removal, organizations can expect improved storage utilization, streamlined data workflows, and heightened data accuracy. The introduction of an adaptable mechanism for the handling of duplicate files further enhances the applicability of the solution, catering to diverse organizational needs.
Through this project, we explore the interplay between data duplication and file checksums, aiming to create a robust solution that empowers organizations to efficiently manage their data resources while ensuring the integrity and reliability of their information assets.
II. LITERATURE SURVEY

K. Praveen (2016): This paper focuses on the implementation of data deduplication techniques, including the utilization of file checksums for duplicate data detection and elimination. It discusses the challenges faced in implementing such techniques, such as handling large datasets efficiently and ensuring data integrity.

C. Qiang (2015): This comprehensive survey paper covers a wide range of data deduplication methods. While it does not exclusively focus on checksums, it provides a thorough exploration of their role in identifying and removing duplicate data. The article discusses various hashing and checksum-based techniques and their advantages and limitations. It serves as a valuable resource for understanding the broader landscape of data deduplication.

J. Li (2018): This research article specifically addresses data deduplication in cloud storage environments. It highlights the importance of using checksums to efficiently detect duplicate files, which is crucial for optimizing storage resources in the cloud. The article also discusses emerging research directions in the field, such as improving deduplication techniques for cloud-based storage systems.

F. Salman (2014): This survey offers a comprehensive overview of data deduplication techniques, with a particular focus on checksum-based methods. It explains how checksums are utilized to identify and eliminate redundant data, leading to storage space savings. The article provides insights into the practical application of checksums in data deduplication.

W. Cong (2012): This research article addresses the security aspect of data deduplication in cloud storage. It discusses the use of checksums in identifying duplicate encrypted data while preserving data confidentiality. The emphasis here is on ensuring that data remains secure even during the deduplication process, making it suitable for sensitive data storage scenarios.

W. Zhihao (2019): This comprehensive survey thoroughly explores various data deduplication techniques, including the role of checksums and hashing. It provides a comparative analysis of different deduplication methods and discusses their strengths and weaknesses. This article covers multiple aspects of deduplication, making it a valuable resource for researchers and practitioners interested in the field.

Z. Xiangliang (2014): This research article introduces an efficient data deduplication scheme specifically tailored for data centres. It explains how checksums and Bloom filters are incorporated into the deduplication process. The article highlights how these techniques work together synergistically to reduce storage redundancy effectively, making it suitable for large-scale data storage environments.

III. PROBLEM STATEMENT

In today's digital age, organizations and individuals generate vast amounts of data, leading to a growing concern: data duplication. Duplicated data not only consumes storage space but also poses challenges in data management, version control, and data integrity. To address this issue, the project aims to develop a robust and efficient system for "Data Duplication Removal Using File Checksum."

The problem statement for this project encompasses the following key aspects:

1. Data Duplication: The proliferation of data across different storage devices, cloud platforms, and networks has resulted in data duplication issues. Identifying and eliminating redundant copies of data is essential to optimize storage resources and improve data management.

2. File Checksum: A file checksum is a unique, fixed-length string generated from the content of a file. It serves as a fingerprint for the file's content, making it an effective method for identifying duplicate files (a short illustrative example follows this list).

3. Efficient Duplication Removal: The project aims to design and implement an efficient algorithm or system that can quickly identify duplicate files by calculating and comparing checksums. This process should be capable of handling large datasets and various file formats with minimal computational overhead.

4. Data Integrity: It is crucial to ensure that the duplication removal process preserves data integrity. The system should accurately identify duplicates while avoiding false positives and false negatives, ensuring that no critical data is mistakenly deleted.

5. Scalability: The system should be scalable to accommodate growing data volumes and work seamlessly in enterprise-level environments or for individual users.

6. User-Friendly Interface: To make the solution accessible to a wide range of users, the project should include a user-friendly interface that allows users to initiate and monitor the duplication removal process.

7. Automation: The project should incorporate automation features, such as scheduled scans and removal of duplicates, to minimize manual intervention and provide a seamless experience for users.

8. Integration: The solution should be designed to integrate with various storage platforms, operating systems, and data management tools, making it versatile for different user needs.
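To make the notion of a file checksum concrete, the short sketch below computes the SHA-256 checksum of a single file by reading it in fixed-size chunks. Python and its standard hashlib module are assumed purely for illustration (the paper does not prescribe a language), the file name shown is hypothetical, and MD5 could be substituted for SHA-256.

import hashlib
from pathlib import Path

def file_checksum(path, algorithm="sha256", chunk_size=65536):
    # Stream the file in chunks so large files do not need to fit in memory.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Two files with byte-identical content always produce the same checksum,
# regardless of their names or locations ("report_v1.pdf" is an illustrative name).
print(file_checksum(Path("report_v1.pdf")))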
By addressing these aspects, the "Data Duplication Removal Using File Checksum" project aims to provide an effective and scalable solution for identifying and removing duplicate data, thereby improving data management, reducing storage costs, and enhancing overall data integrity.

IV. PROPOSED SOLUTION

Data duplication can be a significant problem in various domains, from data storage and management to backup systems. Duplicates consume valuable storage space and can lead to inefficiencies in data retrieval and processing. The proposed project aims to tackle this issue by employing file checksums as a means to identify and remove duplicate data. The primary objective of this project is to develop a robust system that can identify and remove duplicate files efficiently using checksums. This solution will be applicable in a wide range of scenarios, such as file storage, data backup, and data synchronization.

Solution Components:

1. Data Scanning:
- Develop a data scanning module that can traverse directories and collect information about each file, including its checksum value.
- Use algorithms such as MD5, SHA-256, or CRC32 to compute checksums for files. The choice of algorithm may depend on the desired balance between speed and collision resistance.

2. Checksum Database:
- Store the computed checksums in a database for quick reference. This database should facilitate efficient lookup and management of checksums.

3. Duplicate Detection:
- Implement an algorithm for duplicate detection, comparing the checksums of files to identify duplicates.
- Use hash tables, trees, or other data structures to optimize the search for duplicates.

4. Duplicate Removal:
- Develop a mechanism for removing identified duplicate files. This can involve moving duplicates to a quarantine folder or deleting them, depending on user preferences.

5. User Interface:
- Create a user-friendly interface for users to interact with the application. The interface should allow users to initiate scans, review detected duplicates, and control the removal process.

Key Features:

1. Efficiency:
- Optimize the checksum computation and duplicate detection processes to make the system efficient and responsive, even with large data sets.

2. Flexibility:
- Allow users to configure the level of strictness for duplicate detection, including options for partial matches or near-duplicates.

3. Reporting:
- Provide detailed reports on the results of each scan, showing the number of duplicates detected, the space saved, and the actions taken.

4. Backup and Restore:
- Implement a backup and restore feature that enables users to recover accidentally deleted files.

5. Automatic Scheduling:
- Allow users to schedule regular scans and removals to maintain a clean and organized data repository.

V. IMPLEMENTATION

1. Data Collection and Preprocessing:
Data Sources: We collected a diverse dataset comprising files from various sources, including documents, images, videos, and audio files, to test the effectiveness of our approach across different file types.
Data Preprocessing: Prior to checksum generation, we conducted data preprocessing, including data cleaning, file format standardization, and file categorization, to ensure uniformity and efficiency in the deduplication process.

2. Checksum Calculation:
Selection of Hash Function: We chose widely recognized cryptographic hash functions such as SHA-256 and MD5 to calculate checksums for each file in the dataset.
Implementation of the Hashing Algorithm: We implemented the selected hash functions in our system to generate unique checksums for each file. The checksums were stored in a dedicated database for reference.

3. Duplication Detection:
Checksum Comparison: During the deduplication process, we compared the calculated checksums of new files with the existing checksums in the database. If a match was found, the file was identified as a duplicate.
Handling Conflicts: In the event of a checksum collision (i.e., two distinct files with the same checksum), additional checks were performed using file content analysis to confirm or reject the duplication.

4. Duplicate Removal:
Duplicate Identification: Once a file was identified as a duplicate, it was marked for removal.
Removal Mechanism: Depending on the system's configuration, duplicate files were either deleted or flagged for manual review and removal by system administrators. (A sketch of this workflow is given below.)
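To show how the components described above fit together, the following sketch walks a directory tree, computes a checksum for every file, groups files by checksum, byte-compares suspected duplicates to guard against hash collisions, and then deletes or quarantines the redundant copies. It is a minimal illustration using only the Python standard library; the authors' actual implementation, user interface, and database layer are not reproduced here, and every function and path name is illustrative.

import filecmp
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path

def file_checksum(path, algorithm="sha256", chunk_size=65536):
    # Compute a checksum by streaming the file in chunks.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root):
    # Group every file under 'root' by checksum; groups with more than
    # one member are duplicate candidates.
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

def remove_duplicates(root, quarantine=None):
    # Keep the first file of each duplicate group; delete or quarantine the rest.
    for digest, paths in find_duplicates(root).items():
        original, *candidates = sorted(paths)
        for dup in candidates:
            # Guard against checksum collisions with a byte-by-byte comparison.
            if not filecmp.cmp(original, dup, shallow=False):
                continue
            if quarantine is not None:
                Path(quarantine).mkdir(parents=True, exist_ok=True)
                shutil.move(str(dup), str(Path(quarantine) / dup.name))  # archive for review
            else:
                dup.unlink()  # permanent removal
            print(f"Removed duplicate of {original}: {dup} (checksum {digest[:12]}...)")

# Example usage; the folder names are illustrative.
# remove_duplicates("Duplicate", quarantine="Duplicate_quarantine")

In a full system the checksum-to-path mapping would normally be kept in an indexed database, as described under the Checksum Database component, rather than in an in-memory dictionary.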
5. Performance Optimization:
Parallel Processing: To improve processing efficiency, we implemented parallel processing techniques, allowing multiple files to be checked for duplicates simultaneously.
Database Indexing: We employed database indexing to accelerate the checksum comparison process.

6. Reporting and Monitoring:
Logging: A comprehensive logging system was implemented to record the deduplication process, including duplicate file details, timestamps, and system alerts.
Real-time Monitoring: System administrators were provided with real-time monitoring capabilities to track the progress of the deduplication process and address any issues as they arose.

7. Scalability and Integration:
Our system was designed with scalability in mind, enabling the addition of new data sources and adaptability to various file storage systems.

Our data duplication removal system based on file checksums has been tested on a range of datasets, demonstrating its effectiveness in reducing data redundancy and improving storage efficiency. Extensive testing and performance evaluation have been conducted to ensure the system's reliability and efficiency in real-world applications.

Figure: Files inside the folder ‘Duplicate’ before execution of the code; File 2 and File 3 are identical.

Figure: Execution of the code.

Figure: Files inside the folder ‘Duplicate’ after execution of the code.

VI. RESULT

The project "Data Duplication Removal Using File Checksum" was undertaken to address the issue of data duplication in computer systems and storage devices. The primary goal was
to develop a system that could identify and remove duplicate
files efficiently through the use of file checksums. Here, we
provide an overview of the project's objectives, methodology,
and the results achieved.
The project was successful in achieving its objectives:
Identification of Duplicate Files: The system was able to identify duplicate files across various file types with a high degree of accuracy.
Utilization of File Checksums: The project successfully implemented checksum generation, providing a unique identifier for each file.
Development of an Automated Removal System: The system
offered a user-friendly interface for removing duplicate files
automatically, simplifying the data management process.
Figure: Source code.
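The source code itself appears in the paper only as a screenshot. As a complement, the sketch below shows one way the parallel checksum computation described in implementation step 5 might be realized, assuming Python's standard concurrent.futures module; it is not the authors' original listing, and the folder name is illustrative.

import hashlib
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def file_checksum(path, chunk_size=65536):
    # Return (path, checksum) so results can be matched to files after parallel hashing.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return str(path), digest.hexdigest()

def checksums_in_parallel(root, workers=4):
    # Hash every file under 'root' using a pool of worker processes.
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(file_checksum, files))

if __name__ == "__main__":
    for path, checksum in checksums_in_parallel("Duplicate").items():
        print(checksum, path)

Because hashlib releases the global interpreter lock while hashing large buffers, a thread pool can also be sufficient when the workload is dominated by disk I/O; the choice between processes and threads is a tuning decision rather than part of the method itself.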