-
Codes for Distributed Storage
Authors:
Vinayak Ramkumar,
Myna Vajha,
S. B. Balaji,
M. Nikhil Krishnan,
Birenjith Sasidharan,
P. Vijay Kumar
Abstract:
This chapter deals with the design of reliable and efficient codes for the storage and retrieval of large quantities of data over storage devices that are prone to failure. For a long time, the traditional objective has been to ensure reliability against data loss while minimizing storage overhead. More recently, a third concern has surfaced, namely the need to recover efficiently from the failure of a single storage unit, corresponding to recovery from the erasure of a single code symbol. We explain here how coding theory has evolved to tackle this fresh challenge.
Submitted 3 October, 2020;
originally announced October 2020.
-
A Tight Rate Bound and Matching Construction for Locally Recoverable Codes with Sequential Recovery From Any Number of Multiple Erasures
Authors:
S. B. Balaji,
Ganesh R. Kini,
P. Vijay Kumar
Abstract:
By a locally recoverable code (LRC), we mean in this paper a linear code in which a given code symbol can be recovered by taking a linear combination of at most $r$ other code symbols, with $r \ll k$. A natural extension is to the local recovery of a set of $t$ erased symbols. Several approaches have been proposed for handling multiple erasures. The approach considered here is that of sequential recovery, meaning that the $t$ erased symbols are recovered in succession, each time contacting at most $r$ other symbols for assistance in recovery. Under the constraint that each erased symbol be recoverable by contacting at most $r$ other code symbols, this approach is the most general and hence offers the maximum possible code rate. We characterize the maximum possible rate of an LRC with sequential recovery for any $r \geq 3$ and any $t$. We do this by first deriving an upper bound on code rate and then constructing a binary code that achieves this optimal rate. The upper bound derived here proves a conjecture made earlier relating to the structure (but not the exact form) of the rate bound. Our approach also permits us to deduce the structure of the parity-check matrix of a rate-optimal LRC with sequential recovery.
The parity-check matrix, in turn, leads to a graphical description of the code. The construction of a binary code whose rate achieves the upper bound derived here makes use of this description. Interestingly, it turns out that a subclass of binary codes that are both rate and block-length optimal corresponds to graphs known as Moore graphs, which are regular graphs having the smallest number of vertices for a given girth. A connection with Tornado codes is also made in the paper.
Submitted 6 December, 2018;
originally announced December 2018.
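The abstract above describes Moore graphs as regular graphs with the smallest number of vertices for a given girth. As a rough illustration, the standard Moore bound from graph theory gives that smallest possible vertex count. The sketch below is not taken from the paper; the function name `moore_bound` is hypothetical, and how the degree and girth map onto the code parameters $r$ and $t$ is left to the paper itself.

```python
def moore_bound(degree: int, girth: int) -> int:
    """Moore bound: least possible number of vertices of a regular graph
    with the given degree and girth. A graph meeting it is a Moore graph."""
    if degree < 2 or girth < 3:
        raise ValueError("need degree >= 2 and girth >= 3")
    half = girth // 2
    # Geometric sum 1 + (d-1) + ... + (d-1)^(half-1).
    geometric = sum((degree - 1) ** i for i in range(half))
    if girth % 2 == 1:
        # Odd girth: count vertices level by level from a root vertex.
        return 1 + degree * geometric
    # Even girth: count vertices level by level from a root edge.
    return 2 * geometric


if __name__ == "__main__":
    # Degree 3, girth 5: the bound equals the size of the Petersen graph.
    print(moore_bound(3, 5))
    # Degree 3, girth 6: the bound equals the size of the Heawood graph.
    print(moore_bound(3, 6))
```

The bound is only a necessary count; a graph attaining it exists for a restricted set of degree and girth pairs.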
-
Erasure Codes for Distributed Storage: Tight Bounds and Matching Constructions
Authors:
S. B. Balaji,
P. Vijay Kumar
Abstract:
This thesis makes several significant contributions to the theory of both Regenerating (RG) and Locally Recoverable (LR) codes. The two principal contributions are the characterization of the optimal rate of an LR code designed to recover from $t$ erased symbols sequentially, for any $t$, and the development of a tight bound on the sub-packetization level (length of a vector code symbol) of a sub-class of RG codes called optimal-access RG codes. There are, however, several other notable contributions as well, such as the derivation of the tightest-known bounds on performance metrics, such as minimum distance and rate, of a sub-class of LR codes known as availability codes. The thesis also presents some constructions of Maximally Recoverable codes over small fields.
Submitted 12 June, 2018;
originally announced June 2018.
-
Erasure Coding for Distributed Storage: An Overview
Authors:
S. B. Balaji,
M. Nikhil Krishnan,
Myna Vajha,
Vinayak Ramkumar,
Birenjith Sasidharan,
P. Vijay Kumar
Abstract:
In a distributed storage system, code symbols are dispersed across space, in nodes or storage units, as opposed to time. In settings such as that of a large data center, an important consideration is the efficient repair of a failed node. Efficient repair calls for erasure codes that, in the face of node failure, minimize the amount of repair data transferred over the network, the amount of data accessed at a helper node, and the number of helper nodes contacted. Coding theory has evolved to handle these challenges by introducing two new classes of erasure codes, namely regenerating codes and locally recoverable codes, as well as by coming up with novel ways to repair the ubiquitous Reed-Solomon code. This survey provides an overview of the efforts in this direction that have taken place over the past decade.
Submitted 12 June, 2018;
originally announced June 2018.
-
Small-d MSR Codes with Optimal Access, Optimal Sub-Packetization and Linear Field Size
Authors:
Myna Vajha,
S. B. Balaji,
P. Vijay Kumar
Abstract:
This paper presents an explicit construction of a class of optimal-access, minimum storage regenerating (MSR) codes, for small values of the number $d$ of helper nodes. The construction is valid for any parameter set $(n,k,d)$ with $d \in \{k+1, k+2, k+3\}$ and employs a finite field $\mathbb{F}_q$ of size $q=O(n)$. We will refer to the constructed codes as Small-d MSR codes. The sub-packetization level $\alpha$ is given by $\alpha = s^{\lceil \frac{n}{s} \rceil}$, where $s=d-k+1$. By an earlier result on the sub-packetization level for optimal-access MSR codes, this is the smallest value possible.
Submitted 22 September, 2021; v1 submitted 2 April, 2018;
originally announced April 2018.
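To make the sub-packetization formula in the abstract above concrete, here is a minimal sketch that evaluates $s^{\lceil n/s \rceil}$ with $s = d-k+1$ for a given parameter set. The function name `small_d_subpacketization` is hypothetical, and the only validity check applied is the one stated in the abstract, $d \in \{k+1, k+2, k+3\}$; any further constraints on $(n,k,d)$ are left to the paper.

```python
def small_d_subpacketization(n: int, k: int, d: int) -> int:
    """Sub-packetization level s**ceil(n/s) with s = d - k + 1,
    as stated in the abstract for Small-d MSR codes."""
    if d not in (k + 1, k + 2, k + 3):
        raise ValueError("the abstract covers only d in {k+1, k+2, k+3}")
    s = d - k + 1
    # Integer ceiling of n / s without floating-point rounding.
    exponent = -(-n // s)
    return s ** exponent


if __name__ == "__main__":
    # Example parameters chosen purely for illustration.
    for n, k, d in [(10, 6, 7), (12, 8, 10), (7, 3, 6)]:
        print((n, k, d), small_d_subpacketization(n, k, d))
```

The exponent grows like $n/s$, so a larger number of helper nodes within the allowed range gives a larger base $s$ but a smaller exponent.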
-
A Rate-Optimal Construction of Codes with Sequential Recovery with Low Block Length
Authors:
Balaji Srinivasan Babu,
Ganesh R. Kini,
P. Vijay Kumar
Abstract:
An erasure code is said to be a code with sequential recovery with parameters $r$ and $t$ if, for any $s \leq t$ erased code symbols, there is an $s$-step recovery process in which at each step we recover exactly one erased code symbol by contacting at most $r$ other code symbols. In earlier work by the same authors, presented at ISIT 2017, we gave a construction for binary codes with sequential recovery from $t$ erasures, with locality parameter $r$, which were optimal in terms of code rate for given $r,t$, but where the block length was large, on the order of $r^{c^t}$, for some constant $c > 1$. In the present paper, we present an alternative construction of a rate-optimal code for any value of $t$ and any $r \geq 3$, where the block length is significantly smaller, on the order of $r^{\frac{5t}{4}+\frac{7}{4}}$ (in some instances of order $r^{\frac{3t}{2}+2}$). Our construction is based on the construction of a certain kind of tree-like graph with girth $t+1$. We construct these graphs, and hence the codes, recursively.
Submitted 21 January, 2018;
originally announced January 2018.
-
On Lower Bounds on Sub-Packetization Level of MSR codes and On The Structure of Optimal-Access MSR Codes Achieving The Bound
Authors:
S. B. Balaji,
Myna Vajha,
P. Vijay Kumar
Abstract:
We present two lower bounds on the sub-packetization level $\alpha$ of MSR codes with parameters $(n, k, d=n-1, \alpha)$, where $n$ is the block length, $k$ the dimension, $d$ the number of helper nodes contacted during single-node repair and $\alpha$ the sub-packetization level. The first bound we present is for any MSR code and is given by $\alpha \ge e^{\frac{(k-1)(r-1)}{2r^2}}$.
The second bound we present is for the case of optimal-access MSR codes and is given by $\alpha \ge \min \{ r^{\frac{n-1}{r}}, r^{k-1} \}$. There exist optimal-access MSR constructions that achieve the second sub-packetization level bound with equality, making this bound tight.
We also prove that for an optimal-access MSR code to have optimal sub-packetization level under the constraint that the indices of helper symbols depend only on the failed node, the support of the parity-check matrix must be the same as the support structure of several other optimal constructions in the literature.
Submitted 18 September, 2021; v1 submitted 16 October, 2017;
originally announced October 2017.
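The two bounds quoted in the abstract above can be evaluated numerically. The sketch below does so under one loud assumption: the abstract does not define $r$, and the code takes $r = n-k$ (the number of parity nodes), which is a common convention but is not confirmed by the listing. The function name `msr_subpacketization_bounds` is hypothetical.

```python
import math


def msr_subpacketization_bounds(n: int, k: int) -> tuple[float, float]:
    """Evaluate the two lower bounds on alpha quoted in the abstract
    for an (n, k, d = n - 1) MSR code.

    ASSUMPTION: r = n - k. The abstract leaves r undefined.
    """
    r = n - k  # assumed meaning of r, not stated in the abstract
    if k < 2 or r < 2:
        raise ValueError("illustration expects k >= 2 and n - k >= 2")
    # Bound for any MSR code: exp((k-1)(r-1) / (2 r^2)).
    general = math.exp((k - 1) * (r - 1) / (2 * r * r))
    # Bound for optimal-access MSR codes: min(r^((n-1)/r), r^(k-1)).
    optimal_access = min(r ** ((n - 1) / r), r ** (k - 1))
    return general, optimal_access


if __name__ == "__main__":
    general, optimal_access = msr_subpacketization_bounds(12, 8)
    print(f"general bound:        alpha >= {general:.4f}")
    print(f"optimal-access bound: alpha >= {optimal_access:.4f}")
```

Since $\alpha$ is an integer, any real-valued lower bound can be rounded up to the next integer when applied.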
-
A Tight Rate Bound and a Matching Construction for Locally Recoverable Codes with Sequential Recovery From Any Number of Multiple Erasures
Authors:
S. B. Balaji,
Ganesh R. Kini,
P. Vijay Kumar
Abstract:
An $[n,k]$ code $\mathcal{C}$ is said to be locally recoverable in the presence of a single erasure, and with locality parameter $r$, if each of the $n$ code symbols of $\mathcal{C}$ can be recovered by accessing at most $r$ other code symbols. An $[n,k]$ code is said to be a locally recoverable code with sequential recovery from $t$ erasures if, for any set of $s \leq t$ erasures, there is an $s$-step sequential recovery process in which, at each step, a single erased symbol is recovered by accessing at most $r$ other code symbols. This is equivalent to the requirement that for any set of $s \leq t$ erasures, the dual code contain a codeword whose support contains the coordinate of precisely one of the $s$ erased symbols. In this paper, a tight upper bound on the rate of such a code is derived, for any number of erasures $t$ and any value $r \geq 3$ of the locality parameter. This bound proves an earlier conjecture due to Song, Cai and Yuen. While the bound is valid irrespective of the field over which the code is defined, a matching construction of binary codes that are rate-optimal is also provided, again for any value of $t$ and any value $r \geq 3$.
Submitted 17 February, 2017; v1 submitted 25 November, 2016;
originally announced November 2016.
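The dual-code condition in the abstract above suggests a direct brute-force test. The sketch below checks, for every erasure set of size at most $t$, whether some parity check of weight at most $r+1$ touches exactly one erased coordinate. It is an illustration only: the function name `sequentially_recoverable` is hypothetical, the search is restricted to a supplied list of parity-check supports rather than the whole dual code (so a `False` result is inconclusive), and the example code (checks given by the vertices of the complete graph $K_4$, symbols by its edges) is chosen here for convenience and is not the paper's construction.

```python
from itertools import combinations


def sequentially_recoverable(checks, n, r, t):
    """Return True if, using only the given parity-check supports, every
    set of 1..t erased coordinates has a check of weight <= r + 1 that
    contains exactly one erased coordinate.

    If this holds for all sets of size <= t, erasures can be peeled off
    one at a time, since each remaining set is again of size <= t.
    """
    local = [frozenset(c) for c in checks if len(c) <= r + 1]
    for s in range(1, t + 1):
        for erased in combinations(range(n), s):
            erased_set = set(erased)
            if not any(len(c & erased_set) == 1 for c in local):
                return False
    return True


# Illustrative example: edges of K4 are the 6 code symbols,
# and each vertex gives a parity check on its 3 incident edges.
# Edge indices: (0,1)=0, (0,2)=1, (0,3)=2, (1,2)=3, (1,3)=4, (2,3)=5.
K4_CHECKS = [{0, 1, 2}, {0, 3, 4}, {1, 3, 5}, {2, 4, 5}]

if __name__ == "__main__":
    print(sequentially_recoverable(K4_CHECKS, n=6, r=2, t=2))
    print(sequentially_recoverable(K4_CHECKS, n=6, r=2, t=3))
```

With these four checks the test passes for $t=2$ and fails for $t=3$: erasing the three edges of a triangle leaves every vertex check with either zero or two erased symbols.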
-
Bounds on Codes with Locality and Availability
Authors:
S. B. Balaji,
P. Vijay Kumar
Abstract:
In this paper we investigate bounds on the rate and minimum distance of codes with $t$ availability. We present bounds on the minimum distance of a code with $t$ availability that are tighter than existing bounds. For bounds on the rate of a code with $t$ availability, we restrict ourselves to a sub-class of codes with $t$ availability called codes with strict $t$ availability and derive a tighter rate bound. Codes with strict $t$ availability can be defined as the null space of an $(m \times n)$ parity-check matrix $H$, where each row has weight $(r+1)$ and each column has weight $t$, with the supports of any two rows intersecting in at most one coordinate. We also present two general constructions for codes with $t$ availability.
Submitted 28 February, 2017; v1 submitted 1 November, 2016;
originally announced November 2016.
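The three structural conditions on $H$ stated in the abstract above (row weight $r+1$, column weight $t$, pairwise row-support intersection at most one) are easy to verify mechanically. The sketch below is an illustration under assumptions: the function name `is_strict_availability` is hypothetical, and the example matrix, the point-line incidence matrix of the Fano plane, is chosen here because it satisfies the conditions with $r=2$, $t=3$; it is not claimed to be one of the paper's constructions.

```python
from itertools import combinations


def is_strict_availability(H, r, t):
    """Check the conditions quoted in the abstract on a 0/1 matrix H:
    every row has weight r + 1, every column has weight t, and the
    supports of any two rows share at most one coordinate."""
    n = len(H[0])
    supports = [{j for j, bit in enumerate(row) if bit} for row in H]
    rows_ok = all(len(s) == r + 1 for s in supports)
    cols_ok = all(sum(row[j] for row in H) == t for j in range(n))
    overlap_ok = all(len(a & b) <= 1 for a, b in combinations(supports, 2))
    return rows_ok and cols_ok and overlap_ok


# Lines of the Fano plane on points 0..6 (illustrative example).
FANO_LINES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]
H_FANO = [[1 if j in line else 0 for j in range(7)] for line in FANO_LINES]

if __name__ == "__main__":
    print(is_strict_availability(H_FANO, r=2, t=3))
```

The check only confirms the structure of $H$; it says nothing about the rate or minimum distance of the resulting code.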
-
Binary Codes with Locality for Four Erasures
Authors:
S. B. Balaji,
K. P. Prasanth,
P. Vijay Kumar
Abstract:
In this paper, codes with locality for four erasures are considered. An upper bound on the rate of codes with locality with sequential recovery from four erasures is derived. The rate bound derived here is field independent. An optimal construction for binary codes meeting this rate bound is also provided. The construction is based on regular graphs of girth $6$ and employs the sequential approach of locally recovering from multiple erasures. An extension of this construction that generates codes which can sequentially recover from five erasures is also presented.
Submitted 3 November, 2016; v1 submitted 11 July, 2016;
originally announced July 2016.
-
Binary Codes with Locality for Multiple Erasures Having Short Block Length
Authors:
S. B. Balaji,
K. P. Prasanth,
P. Vijay Kumar
Abstract:
The focus of this paper is on linear, binary codes with locality, having locality parameter $r$, that are capable of recovering from $t \geq 2$ erasures and that, moreover, have short block length. Both sequential and parallel (through orthogonal parity checks) recovery are considered here. In the case of parallel repair, minimum-block-length constructions for general $t$ are discussed. In the case of sequential repair, the results include (a) extending and characterizing minimum-block-length constructions for $t=2$, (b) providing improved bounds on block length for $t=3$ as well as a general construction for $t=3$ having short block length, (c) providing short-block-length constructions for general $r,t$ and (d) providing high-rate constructions for $r=2$ and $t$ in the range $4 \leq t \leq 7$. Most of the constructions provided are of binary codes.
Submitted 2 February, 2016; v1 submitted 26 January, 2016;
originally announced January 2016.
-
On Partial Maximally-Recoverable and Maximally-Recoverable Codes
Authors:
S. B. Balaji,
P. Vijay Kumar
Abstract:
An [n, k] linear code C that is subject to locality constraints imposed by a parity check matrix H0 is said to be a maximally recoverable (MR) code if it can recover from any erasure pattern that some k-dimensional subcode of the null space of H0 can recover from. The focus in this paper is on MR codes constrained to have all-symbol locality r. Given that it is challenging to construct MR codes having small field size, we present results in two directions. In the first, we relax the MR constraint and require only that, apart from being an optimum all-symbol locality code, the code must yield an MDS code when punctured in a single, specific pattern which ensures that each local code is punctured in precisely one coordinate and that no two local codes share the same punctured coordinate. We term these codes partially maximally recoverable (PMR) codes. We provide a simple construction for high-rate PMR codes and then provide a general, promising approach that needs further investigation. In the second direction, we present three constructions of MR codes with improved parameters, primarily the size of the finite field employed in the construction.
Submitted 28 January, 2015;
originally announced January 2015.