Big Data and Distributed Storage

Shreeraj Redgaonkar
Sep 17, 2020

We all live in the age of the internet, where every interaction with your computer or phone creates data. Every interaction on social media creates data. Every time you walk down the street with a phone in your pocket, it’s tracking your location through GPS sensors: more data. Every time you buy something with your debit card? Data. Every time you read an article online? Data. Every time you stream a song, movie or podcast? Data, data, data…

According to statistics, in 2018, in just one minute, Twitter users sent 473,400 tweets, Snapchat users shared 2 million photos, Instagram users posted 49,380 pictures and LinkedIn gained 120 new users. As for Facebook, the social media giant alone generates 510,000 comments, 293,000 statuses, 4 million likes, and 136,000 new photos, all within the span of just one minute.

Reports say that Facebook generates 4 petabytes of data per day; that’s about 4 million gigabytes. Google processes more than 40,000 searches every second, or roughly 3.5 billion searches a day. Twelve years back, in 2008, Google published that its MapReduce jobs were processing more than 20 petabytes of data every single day! That’s 5 times the data Facebook produces today… Unfortunately, Google doesn’t disclose how much data it stores and processes nowadays, but judging by the 2008 figure, it must be enormous.
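If you want to check these numbers yourself, here is a quick back-of-the-envelope sketch in Python. It assumes decimal units (1 PB = 1,000,000 GB) and uses only the figures quoted above; the variable names are just illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# Assumption: decimal (SI) units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60

facebook_pb_per_day = 4            # reported Facebook volume per day
google_pb_per_day_2008 = 20        # Google MapReduce volume per day in 2008
google_searches_per_second = 40_000

facebook_gb_per_day = facebook_pb_per_day * GB_PER_PB
ratio = google_pb_per_day_2008 / facebook_pb_per_day
searches_per_day = google_searches_per_second * SECONDS_PER_DAY

print(f"Facebook: {facebook_gb_per_day:,} GB per day")
print(f"Google (2008) vs Facebook: {ratio:.0f}x")
print(f"Google searches per day: {searches_per_day:,}")
```

So 4 petabytes works out to 4 million gigabytes, 20 petabytes is 5 times 4 petabytes, and 40,000 searches per second comes to about 3.46 billion searches a day.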

All of this data that is stored, processed and analyzed by MNCs (multinational companies) is very large, as we discussed above. That’s why it’s called big data. Big data is a term that describes such large volumes of data.

According to market intelligence company IDC, the total data stored in the whole world is 18 zettabytes (one zettabyte is 10²¹ bytes). The same company predicts that there will be 175 zettabytes of data in 2025.

The rise of mobile phones has also sent data creation skyrocketing. According to Facebook, more than 95% of its traffic comes from mobile phones. This suggests that mobile phones dominate other devices when it comes to data creation all around the world! Facebook’s mobile users alone tripled within the span of just 2 years.

Mobile users of Facebook

Naturally, a question arises in our minds: where is all this data stored? Of course, the data is stored in different data centers, but has anyone thought about exactly how?

I know most of us don’t think about it, but if you are curious about how this data is stored, hang on tight, because that’s what I am going to discuss here.

Okay, so let’s get started… Now answer some of my questions. Where do you store your local data on a computer? Hard disks, right? But what is the maximum capacity of a hard disk? It’s around 12 TB. So how can we store petabytes of data on those hard disks? Can we fit it on a single 12 TB hard disk? Obviously not. Then how?
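To get a feel for the scale, here is a small Python sketch that counts how many 12 TB disks it would take just to hold the daily volumes mentioned earlier. It assumes decimal units (1 PB = 1,000 TB) and ignores real-world overhead such as backup copies and file system space, so treat the results as a lower bound. The function name `disks_needed` is my own.

```python
import math

TB_PER_PB = 1_000  # assumption: decimal units


def disks_needed(data_tb: float, disk_tb: float = 12) -> int:
    """Smallest number of disks whose combined capacity holds data_tb."""
    return math.ceil(data_tb / disk_tb)


# One day of Facebook data (4 PB) and one day of Google's 2008 volume (20 PB)
print(disks_needed(4 * TB_PER_PB))
print(disks_needed(20 * TB_PER_PB))
```

Even a single day of data needs hundreds or thousands of disks, so one disk is never going to be enough.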

This is where the concept of distributed storage comes in. So what does distributed storage mean? Let’s find out. Suppose I have 100 GB of data and I want to store it on my two 64 GB pen drives. Neither drive can hold all of it, so how can I store the data? You will probably tell me to divide the data into 2 pieces of 50 GB each. Easy, isn’t it? Now I can store my 100 GB of data on my two 64 GB pen drives. What I did here is distribute my 100 GB by dividing it into smaller pieces that each fit on one drive. In the same way, large data centers divide their data into many small pieces and store them across many hard disks with the help of technologies like Hadoop. So easy, right?
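The pen drive idea can be sketched in a few lines of Python. This is only a toy illustration, not how Hadoop is implemented: the "data" is 100 bytes standing in for 100 GB, the two "nodes" stand in for the two pen drives, and the function names are made up for this example. Real systems such as Hadoop's HDFS also keep extra copies of every block and track where each block lives, which is not modeled here.

```python
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def distribute(blocks: list[bytes], node_count: int) -> dict[int, list[tuple[int, bytes]]]:
    """Assign blocks to nodes in round-robin order, remembering each block's index."""
    nodes: dict[int, list[tuple[int, bytes]]] = {n: [] for n in range(node_count)}
    for index, block in enumerate(blocks):
        nodes[index % node_count].append((index, block))
    return nodes


def reassemble(nodes: dict[int, list[tuple[int, bytes]]]) -> bytes:
    """Collect the blocks from every node and join them in their original order."""
    indexed = [pair for stored in nodes.values() for pair in stored]
    return b"".join(block for _, block in sorted(indexed))


data = bytes(range(100))                      # stand-in for 100 GB of data
blocks = split_into_blocks(data, block_size=10)
nodes = distribute(blocks, node_count=2)      # stand-in for two pen drives

for node_id, stored in nodes.items():
    size = sum(len(block) for _, block in stored)
    print(f"node {node_id} holds {len(stored)} blocks, {size} bytes")

print(reassemble(nodes) == data)
```

Each node ends up with half of the data, and because every block keeps its index, the original data can be put back together in the right order.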

I hope that I have satisfied your curiosity as well as helped you gain some additional knowledge. Thanks for reading till the end.
