What is a Data Lake?

A data lake is home to all of your data from all your departments in its purest, raw form.

If you’ve ever tried swimming in a data lake, you know what disappointment feels like.

 

Not only is there no water (it’s bad for the computers), but those sharp things byting (see what we did there??) at your ankles are probably spreadsheets, graphs, and emails that you tried to ignore while you were still back on dry land. 

 

So, the first big lesson is: don’t go swimming in a data lake.

 

But what is a data lake anyway… and why is it so incredibly dry?

 

We’ve written about data warehouses, which are giant repositories of formatted data that is categorized and sorted for easy access. Data from different teams, departments, or any other subset can be grouped and labeled so that it’s clear which group it belongs to. The data is neat and clean, just like a well-organized warehouse.

 

Yet when you have massive amounts of data coming at you all at once, sometimes that processing and organization is just too laborious. You may intentionally want to keep your data in its raw form, so that you’re not wasting any time manipulating it or cleaning it up. That raw data includes everything from emails to spreadsheets to images to PDF files.

 

To store all of your organization’s data in its natural state, you’ll need what is known as a data lake.

 

A data lake is called that because it’s home to just about everything in its purest, raw form. This can be anything from spreadsheets and databases to tables, websites, images, audio, and even video. Using a data lake means that it can all live together in the same place, and while it can be curated from there, it doesn’t need to be curated before it enters the lake.

 

The benefit of having a data lake is that you can access what you need, when you need it, without the unnecessary time or expense of grouping and manipulating data that you’re not using. For example, if you wanted to pull several years of the organization’s financial data, or listen to customer service calls from a specific period, you could get to both easily from the same central repository.
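
For the curious, here is a minimal Python sketch of that idea, not a production design. It assumes the "lake" is just a folder on disk, organized by department and date received. The function names ingest and find, the folder layout, and the sample file names are all made up for illustration. The point is that files go in byte-for-byte with no cleanup, and you only go looking for the ones you need.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def ingest(lake: Path, source: Path, department: str, received: date) -> Path:
    """Copy a file into the lake as-is: no cleaning, no conversion."""
    # Hypothetical layout: <lake>/<department>/<YYYY-MM-DD>/<original file name>
    target_dir = lake / department / received.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)
    return target


def find(lake: Path, department: str, start: date, end: date) -> list[Path]:
    """Return one department's raw files received between start and end, inclusive."""
    dept_dir = lake / department
    if not dept_dir.is_dir():
        return []
    matches: list[Path] = []
    for day_dir in sorted(dept_dir.iterdir()):
        if not day_dir.is_dir():
            continue
        if start <= date.fromisoformat(day_dir.name) <= end:
            matches.extend(sorted(p for p in day_dir.iterdir() if p.is_file()))
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Two very different kinds of data: a spreadsheet export and a call recording.
        ledger = root / "ledger_2021.csv"
        ledger.write_text("account,amount\n1001,250.00\n")
        call = root / "call_0042.wav"
        call.write_bytes(b"\x00\x01raw-audio")

        lake = root / "lake"
        ingest(lake, ledger, "finance", date(2021, 3, 31))
        ingest(lake, call, "support", date(2021, 4, 2))

        # Both live in the same repository; each is fetched only when needed.
        print(find(lake, "finance", date(2021, 1, 1), date(2021, 12, 31)))
        print(find(lake, "support", date(2021, 4, 1), date(2021, 4, 30)))
```

Notice that nothing in ingest cares whether the file is a spreadsheet or an audio recording. That indifference is what makes it a lake rather than a warehouse.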

 

Depending on the size and needs of your financial organization, you may have a data lake as just one of your many data storage tools. You may also be using a data warehouse, which stores your data in a crisp and organized format as opposed to the massive pool in a data lake. You may even be using a data lakehouse, which is a combination of the two: it stores the raw data but overlays formatting to make it more readable.
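
To make the lakehouse idea a little more concrete, here is a toy Python sketch, assuming the raw data is a CSV export with account and amount columns. The LedgerRow type, the read_ledger function, and the column names are invented for this example. Real lakehouse platforms do far more than this, but the core move is the same: the raw bytes stay exactly as they arrived, and structure is applied only when someone reads them.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerRow:
    """The 'formatting' overlaid on the raw data (hypothetical columns)."""
    account: str
    amount: Decimal


def read_ledger(raw_bytes: bytes) -> list[LedgerRow]:
    """Parse raw CSV bytes into typed rows at read time; the input is not modified."""
    reader = csv.DictReader(io.StringIO(raw_bytes.decode("utf-8")))
    return [
        LedgerRow(account=row["account"].strip(), amount=Decimal(row["amount"].strip()))
        for row in reader
    ]


if __name__ == "__main__":
    # Raw bytes as they might sit in the lake, untouched since they arrived.
    raw = b"account,amount\n1001,250.00\n1002,-75.50\n"
    rows = read_ledger(raw)
    print(rows)
    print(sum(row.amount for row in rows))
```

Because the structure lives in the reader rather than in the stored file, a different team could overlay a different view on the same raw bytes without anyone re-ingesting anything.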