Sometimes, financial institutions mistake their data warehouses for nothing more than massive data dumps. But in reality, a data warehouse should be your business powerhouse, not an operational data store.
The Gemineye Team Explains Why Data Warehouses are More than Data Dumps
Gemineye’s Brewster Knowlton, CEO, and Maggie Chopp, Director of Business Development, discuss why true data warehouses are about much more than storage. A proper data warehouse should model, normalize, and unify data from multiple sources to create a single source of truth that provides the foundation for all decision-making and AI initiatives.
Key Takeaways in this Video Include:
- Data warehouses are about the quality of the data process, not just storage
- The true power of a data warehouse lies in its ability to model, clean, and define data consistently
- Organizations often mistake staging layers or data lakes for warehouses
- Effective data warehouses implement strict modeling, business logic, and normalization to enable scalable, insightful analysis
- The unsexy groundwork is the real differentiator in data maturity
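To make the modeling takeaway concrete, here is a minimal Python sketch of the idea discussed in the transcript: several source systems each hold records about the same member, and the warehouse layer resolves them into one consistent definition. Everything in it is an assumption for illustration: the source names (CORE, LOANS, CARDS), the field names, and the ID-matching rule are hypothetical and do not come from the video.

```python
# Hypothetical raw extracts from three source systems. Each one names
# and formats the member identifier differently.
CORE = [{"member_id": "1001", "name": "  ana lopez ", "email": "ANA@EXAMPLE.COM"}]
LOANS = [{"member_no": 1001, "full_name": "Ana Lopez", "loan_balance": 12000.0}]
CARDS = [{"memberId": "01001", "card_status": "ACTIVE"}]


def normalize_member_id(raw) -> str:
    """Assumed matching rule: compare IDs as strings without padding zeros."""
    return str(raw).strip().lstrip("0")


def build_member_dimension(core, loans, cards) -> dict:
    """Return one cleaned record per member, keyed by normalized ID."""
    members = {}
    for row in core:
        key = normalize_member_id(row["member_id"])
        members[key] = {
            "member_id": key,
            # Cleaning: collapse stray whitespace and standardize casing.
            "name": " ".join(row["name"].split()).title(),
            "email": row["email"].strip().lower(),
            "loan_balance": 0.0,
            "has_active_card": False,
        }
    for row in loans:
        key = normalize_member_id(row["member_no"])
        if key in members:
            members[key]["loan_balance"] += row["loan_balance"]
    for row in cards:
        key = normalize_member_id(row["memberId"])
        # Business logic lives here once, not in every downstream report.
        if key in members and row["card_status"] == "ACTIVE":
            members[key]["has_active_card"] = True
    return members


member_dim = build_member_dimension(CORE, LOANS, CARDS)
print(member_dim["1001"])
```

The point of the sketch is that an analyst asks one place for "a member" instead of reconciling three extracts by hand. A real warehouse would do this in its modeling layer rather than in an ad hoc script.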
Full Transcript
Alicia Disantis: Maggie, aren’t data warehouses just big data dumps?
Maggie Chopp: No, they’re not, but I can understand why some people might think that way.
When data warehousing originally became a topic, people were treating it that way: just plopping everything in one spot, checking off the box, and hoping that got them some level of result. We know better now. It’s been a long time since those days.
Your data doesn’t do anything for you when it’s just sitting there. So, theoretically, yeah, you could have one of these at your organization. But if data warehousing is done well and right, it’s got a lot of things that are happening from the ingestion to the modeling, the centralizing, the logic, data cleaning, if you have it going on, out to the actual analysis.
So, no, a good data warehouse is actually producing an effect that the teams are then using to drive business outcomes.
So, no, they’re not just big data dumps, but we understand why. That’s a really common misconception.
Brewster Knowlton: The reality is, if you think your data warehouse is just a big data dump, congratulations. You don’t have a data warehouse. You have an operational data store, like what you’ve just described, and in a lot of cases it’s a staging layer or an ODS where you’re accumulating all this information. All of it is more or less raw, maybe with some minor date and ID tagging and metadata, but it’s generally just a large swath of data.
When you get into the warehouse, that’s when you actually get into dimensionalization and modeling. I might have four different subject areas, or different sources, excuse me, that have records about a member. Well, that needs to be in one spot so that I don’t have to go to four separate places to get my definition of a member. If I still have to do that, I’ve just created a fancy version of the isolated and disparate systems that I already have today.
So the data warehouse, and this comes up a lot more now in the context of AI, is where that context, that awareness, that curation of data, not just from a normalization perspective but from an actual business definition standpoint, has to be stored. Because if I have to go to seven different places and know all of these rules intuitively, there’s no scalability, and there’s no leveraging the idea of what a data warehouse or lakehouse or whatever you want to call it is: a centralized, consolidated, mastered, normalized place where definitions and logic are applied.
That has to be there. Everyone wants to talk about all the stuff that they want to do with AI. That’s like trying to design your bathroom and your kitchen before you’ve figured out how big of a house you want to have. Do I need a foundation?
It’s like you can’t just go shopping for all the cool stuff until you’ve done the unsexy stuff, but those are the important pieces that lay the groundwork, the foundation, literally and metaphorically, for what you want to accomplish with AI in the future.
And, of course, for all of your other natural business-focused data outcomes.
Maggie Chopp: And the last thing I’d add, Alicia, because this is a really interesting question, is that we talk to lots of credit unions that are going through a self-assessment process. What we find is that they may say, hey, we have ten data sources in our data warehouse, but when you really look under the hood, they have two that are maybe, you know, being adjusted and modeled and used, and they have maybe eight others that are just being kind of dumped in.
So we really try to look beyond the surface-level question of “is the data there and present?” to “is the data being used?”, which is kind of a different question.
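As a rough illustration of that distinction between data that is present and data that is used, the following Python sketch splits a source inventory into "modeled and used" versus "just loaded". It is only a sketch under stated assumptions: the inventory fields (`is_modeled`, `queries_last_90_days`), the source names, and the usage threshold are hypothetical and are not Gemineye’s assessment method.

```python
from dataclasses import dataclass


@dataclass
class SourceStatus:
    name: str                   # hypothetical source system name
    is_modeled: bool            # has it been modeled into the warehouse layer?
    queries_last_90_days: int   # assumed usage signal from query logs


def summarize_sources(sources, min_queries=1):
    """Split sources into those modeled AND queried versus merely loaded."""
    summary = {"modeled_and_used": [], "just_loaded": []}
    for src in sources:
        in_use = src.is_modeled and src.queries_last_90_days >= min_queries
        bucket = "modeled_and_used" if in_use else "just_loaded"
        summary[bucket].append(src.name)
    return summary


# Made-up inventory for illustration only.
INVENTORY = [
    SourceStatus("core_banking", True, 420),
    SourceStatus("loan_origination", True, 85),
    SourceStatus("card_processor", False, 0),
    SourceStatus("online_banking", True, 0),  # modeled, but nobody queries it
]

report = summarize_sources(INVENTORY)
print(report)
```

In this made-up inventory, four sources are "in the warehouse", but only two pass both checks, which mirrors the gap between a surface-level count and actual use.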
