File, Block, and Object Storage in Distributed Storage Systems

At the bottom of any storage system sits a large amount of data. A single server can hold only a limited number of physical storage media, and its I/O performance is limited as well. A distributed storage system addresses both limits. It is infrastructure that stores data on multiple physical servers, which behave as one storage system even though the data is distributed among them. It typically takes the form of a cluster of storage servers, with a mechanism for data synchronization and coordination between cluster nodes.

A distributed storage system can provide three types of storage: file, block, and object. The essential difference is the "user" of the data. The user of block storage is a software system that reads and writes a block device, such as a traditional file system or a database. The user of file storage is a natural person. The user of object storage is other computer software.

File storage

The user of file storage is a natural person. In computer hardware, all data is represented as 0s and 1s, which people cannot recognize or manage directly. So we use the concept of a "file" to organize data. Files come in different types (usually indicated by different suffixes) according to the structure each application requires, and each file is given a name that is easy to understand and remember. When there are many files, we group them in some way and put them into a directory (or folder). These directories need names too, and a directory may contain subdirectories in addition to files. Together, the files and directories form a tree structure.
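The tree structure described above can be sketched in a few lines of Python. This is only an illustration, not how any real file system stores its data: the nested dictionary, the sample names, and the helpers `lookup` and `list_files` are all assumptions made for this example.

```python
# A toy directory tree: a dict stands for a directory, bytes for a file's contents.
# All names here are made up for illustration.
tree = {
    "home": {
        "alice": {
            "notes.txt": b"hello",
            "photos": {"cat.jpg": b"jpeg bytes"},
        }
    },
    "etc": {"hosts": b"127.0.0.1 localhost\n"},
}


def lookup(tree, path):
    """Walk the tree one path component at a time and return the node found."""
    node = tree
    for part in (p for p in path.split("/") if p):
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node


def list_files(tree, prefix=""):
    """Yield the full path of every file, visiting directories recursively."""
    for name, node in sorted(tree.items()):
        full = f"{prefix}/{name}"
        if isinstance(node, dict):
            yield from list_files(node, full)
        else:
            yield full


print(lookup(tree, "/home/alice/notes.txt"))
print(list(list_files(tree)))
```

Looking up a path means descending from the root one name at a time, which is why every directory and file needs a name that is unique within its parent.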

The data on the storage medium is organized into a data structure of the form directory - subdirectory - file. A file system is the program used to find, add, modify, and delete files in this structure, and to maintain the structure itself. There are many types of file systems: FAT/FAT32/NTFS on Windows, ext2/ext3/ext4/XFS/Btrfs on Linux, and so on. In network storage, however, the underlying data is not stored on a local storage medium but on another server, and different clients can access the files on that server in much the same way they would use a local file system. Such a system is called a network file system. Common network file systems include SMB (also called CIFS) for Windows networks and NFS for networks of UNIX-like systems. Besides network file systems, FTP and HTTP can also be regarded as special forms of file storage, since they let a client access a file through a URL.

Block storage

A traditional file system directly accesses the hardware medium on which the data is stored. The medium does not and cannot care how the data is organized and structured, so the simplest way to organize it is to divide all the data into blocks of a fixed size and assign each block an addressable number. Take the mechanical hard disk as an example: a block is a sector, which is 512 bytes on older hard disks and 4 KB (4,096 bytes) on newer ones. Older hard disks address a sector by its Cylinder-Head-Sector (CHS) numbers, while newer hard disks use Logical Block Addressing (LBA), in which every block has a single sequential number. A hard disk is therefore often called a block device. There are other block devices besides hard disks, such as floppy disks and optical discs. Magnetic tape also stores data in blocks, although it is accessed sequentially.
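The relationship between the two addressing schemes can be made concrete with the conventional CHS-to-LBA conversion, LBA = (C × heads per cylinder + H) × sectors per track + (S − 1). The sketch below assumes a geometry of 16 heads and 63 sectors per track purely as an example. The function names are made up for illustration.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a Cylinder-Head-Sector address to a logical block address.

    Cylinders and heads are numbered from 0, sectors from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_byte_offset(lba, block_size=512):
    """Return the byte offset at which a block starts on the device."""
    return lba * block_size


# Example geometry (an assumption): 16 heads, 63 sectors per track.
lba = chs_to_lba(cylinder=1, head=0, sector=1,
                 heads_per_cylinder=16, sectors_per_track=63)
print(lba)                      # 1008
print(lba_to_byte_offset(lba))  # 516096
```

With LBA the client no longer needs to know the disk geometry at all: a block number and a block size are enough to locate the data.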

It is up to the file system to decide which blocks make up a file and which blocks record directory and subdirectory information. For management purposes, a block device such as a hard disk can be divided into several logical block devices, known as partitions. Conversely, because the capacity and performance of a single medium are limited, multiple physical block devices can be combined into one logical block device through techniques such as the various RAID levels or the volume manager of an operating system (dynamic disks on Windows, LVM on Linux).
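To make the idea of combining devices concrete, here is a minimal sketch of two common ways a logical block number can be mapped onto several physical devices: striping, as in RAID 0, and simple concatenation, similar in spirit to a linear volume. The function names and parameters are assumptions for this example, and real RAID and volume-manager implementations handle much more (metadata, redundancy, failures).

```python
def striped_location(lba, num_devices, stripe_blocks=1):
    """Map a logical block to (device index, block on that device), RAID 0 style.

    Consecutive stripes of `stripe_blocks` blocks rotate across the devices.
    """
    stripe_index, within_stripe = divmod(lba, stripe_blocks)
    device = stripe_index % num_devices
    physical = (stripe_index // num_devices) * stripe_blocks + within_stripe
    return device, physical


def linear_location(lba, device_sizes):
    """Map a logical block to (device index, block) when devices are concatenated."""
    for device, size in enumerate(device_sizes):
        if lba < size:
            return device, lba
        lba -= size
    raise ValueError("logical block is beyond the end of the combined device")


# Two devices, one block per stripe: blocks alternate between the devices.
print([striped_location(n, num_devices=2) for n in range(4)])
# Two concatenated devices of 100 and 200 blocks.
print(linear_location(150, [100, 200]))
```

Striping spreads neighbouring blocks over different devices so that they can be read in parallel, while concatenation only adds capacity.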

In network storage, a server can use a protocol to present a local logical block device to the network as a block device. The local device may be part of a physical block device, a combination of multiple physical block devices, part of such a combination, or even a file on a local file system. A remote client (a physical server or a virtual machine) uses the same protocol to attach the block device as if it were a local storage medium, and can then partition and format it. This is block storage, and a common block storage protocol is iSCSI.
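The last case mentioned above, a block device backed by a file on a local file system, can be sketched as follows. The class `FileBackedBlockDevice` and its methods are hypothetical names for this example, and the sketch leaves out the network protocol entirely. It only shows how fixed-size blocks map onto offsets in the backing file.

```python
import os
import tempfile


class FileBackedBlockDevice:
    """A toy block device whose blocks live in an ordinary file."""

    def __init__(self, path, num_blocks, block_size=512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._file = open(path, "w+b")
        # Extend the file to its full size; unwritten blocks read back as zeros.
        self._file.truncate(num_blocks * block_size)

    def _check(self, lba):
        if not 0 <= lba < self.num_blocks:
            raise ValueError("block number out of range")

    def read_block(self, lba):
        self._check(lba)
        self._file.seek(lba * self.block_size)
        return self._file.read(self.block_size)

    def write_block(self, lba, data):
        self._check(lba)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._file.seek(lba * self.block_size)
        self._file.write(data)

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    dev = FileBackedBlockDevice(os.path.join(tmp, "disk.img"), num_blocks=8)
    dev.write_block(3, b"A" * 512)
    print(dev.read_block(3)[:4])
    dev.close()
```

A block storage server would sit in front of an interface like this and translate read and write requests arriving over the network into `read_block` and `write_block` calls.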

Object storage

The user of object storage is other computer software. Object storage uses a unified underlying storage system that manages both the organization of the stored files and the underlying media, and gives each file a unique identifier. Other systems use that identifier to access the file. Because it controls both sides, the storage system can manage the identifiers and the corresponding blocks on the storage media more efficiently. For different software systems, a single access may not return a file in the traditional sense: it may be a value, a group of values, part of a file, a combination of multiple files, or even a block device. We call whatever is returned an object.
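The identifier-based access described above can be sketched with a small in-memory store. This is only an illustration under stated assumptions: `ToyObjectStore` and its `put`, `get`, and `delete` methods are hypothetical, the identifiers are random UUIDs, and a real object storage system would also distribute and replicate the data across servers.

```python
import uuid


class ToyObjectStore:
    """A toy object store: the store hands out identifiers, callers keep them."""

    def __init__(self):
        self._objects = {}

    def put(self, data, metadata=None):
        """Store an object and return the unique identifier assigned to it."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (bytes(data), dict(metadata or {}))
        return object_id

    def get(self, object_id):
        """Return the stored bytes for an identifier."""
        return self._objects[object_id][0]

    def metadata(self, object_id):
        """Return a copy of the metadata stored alongside the object."""
        return dict(self._objects[object_id][1])

    def delete(self, object_id):
        del self._objects[object_id]


store = ToyObjectStore()
oid = store.put(b"hello object", {"content-type": "text/plain"})
print(store.get(oid))
```

The caller never sees directories or block numbers. It holds only the identifier, and the store is free to place the data wherever it likes.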
