Data deduplication systems discover and remove redundancies between data blocks. The search for redundant data blocks is often based on hashing the content of a block and comparing the resulting hash value with entries already stored in an index. If the index does not fit into main memory, the low random IO performance of hard disks limits the overall throughput of such systems. This paper presents the architecture of the dedupv1 deduplication system, which uses solid-state drives (SSDs) to improve its throughput compared to disk-based systems. dedupv1 is designed to exploit the sweet spots of SSD technology (random reads and sequential operations) while avoiding random writes in the data path. This is achieved by a hybrid deduplication design: dedupv1 is an inline deduplication system, as it performs chunking and fingerprinting online and stores only new data, but it is able to delay much of the processing as well as IO operations. An important advantage of dedupv1 is that it does not rely on temporal or spatial locality to achieve high performance. However, using the filter chain abstraction, the system can easily be extended to exploit locality and improve throughput.
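The following is a minimal sketch of the general scheme the abstract describes: chunk the data, fingerprint each chunk, run the fingerprint through a chain of filters that ends in an index lookup, and store only new chunks. It is not dedupv1's implementation. Fixed-size 8 KiB chunks, SHA-1 fingerprints, and an in-memory dict standing in for the SSD-resident index are all assumptions. The class and function names (`ChunkIndex`, `RecentCacheFilter`, `run_chain`, `deduplicate`) are hypothetical. The `pending` buffer only illustrates the idea of delaying index writes so that they can later be applied in one batch instead of as random writes. The optional cache filter is a hypothetical example of how a filter chain could be extended to exploit locality.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Dict, List, Optional

CHUNK_SIZE = 8 * 1024  # assumption: fixed-size chunks for simplicity
DUPLICATE, NEW = "duplicate", "new"
Filter = Callable[[bytes], Optional[str]]  # None means "undecided, ask next filter"


def chunk(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk_data: bytes) -> bytes:
    # Equal fingerprints are treated as equal content (collisions ignored).
    return hashlib.sha1(chunk_data).digest()


class ChunkIndex:
    """Hypothetical stand-in for an SSD-resident chunk index."""

    def __init__(self) -> None:
        self.persistent: Dict[bytes, int] = {}  # fingerprint -> chunk location
        self.pending: Dict[bytes, int] = {}     # new entries, written later

    def lookup(self, fp: bytes) -> Optional[str]:
        found = fp in self.persistent or fp in self.pending
        return DUPLICATE if found else NEW

    def add(self, fp: bytes, location: int) -> None:
        self.pending[fp] = location  # delay the index write

    def flush(self) -> None:
        # Apply all delayed updates as one batch (stand-in for a sequential write).
        self.persistent.update(self.pending)
        self.pending.clear()


class RecentCacheFilter:
    """Hypothetical LRU filter that exploits temporal locality."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self.recent: "OrderedDict[bytes, None]" = OrderedDict()

    def __call__(self, fp: bytes) -> Optional[str]:
        if fp in self.recent:
            self.recent.move_to_end(fp)
            return DUPLICATE
        return None  # unknown here; let the next filter decide

    def remember(self, fp: bytes) -> None:
        self.recent[fp] = None
        self.recent.move_to_end(fp)
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)  # evict least recently used


def run_chain(fp: bytes, filters: List[Filter]) -> str:
    for f in filters:  # the first definitive verdict wins
        verdict = f(fp)
        if verdict is not None:
            return verdict
    return NEW


def deduplicate(data: bytes, index: ChunkIndex, store: List[bytes],
                cache: Optional[RecentCacheFilter] = None) -> int:
    """Store only previously unseen chunks; return how many were stored."""
    filters: List[Filter] = ([cache] if cache is not None else []) + [index.lookup]
    stored = 0
    for c in chunk(data):
        fp = fingerprint(c)
        if run_chain(fp, filters) == NEW:
            store.append(c)
            index.add(fp, len(store) - 1)
            stored += 1
        if cache is not None:
            cache.remember(fp)
    return stored


if __name__ == "__main__":
    index, store = ChunkIndex(), []
    data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
    print(deduplicate(data, index, store, RecentCacheFilter()))  # -> 2
    index.flush()
    print(deduplicate(data, index, store))  # -> 0
```

In this sketch the index lookup is always the last filter, so it always returns a definitive answer. Placing cheaper filters, such as the cache, in front of it lets repeated chunks be recognized without touching the index at all. Removing those filters does not change which chunks are stored.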