From the course: Top 10 Skills for BI Developers

ETL process

"

- [Instructor] The next skill that you should consider learning more about in terms of being a good BI developer is understanding the ETL process, which, again, stands for extract, transform, and load. Now, this is the process that takes what you learned in the previous step of data mapping and applies it to the real world where you pull the data out, you extract it, then you transform it, meaning you clean up certain data elements and you match it up with your other data sources. And then, you load it into a new database, whether that's a traditional database or some other kind of data store. And then at the end of that, you have this beautiful data set that is ready for you to ask questions of. And if you mapped it out properly, you should be able to reuse it for many other projects within a given business process. Another way that people do this nowadays is kind of the opposite, where you extract the data, you just pull it out; you load it into usually what we call a data lake, sort of just like a dumping ground; and then we transform it and build new tables on top of that. So you may hear ELT in addition to ETL, but essentially the concept is the same thing, where we pull the data out from the source systems, we land it into a repository, and then we transform it into something that is easier to query and ask questions of. Now, the benefits of loading the data before transforming it is that if something happens during that transformation step, maybe a new data type is uncovered or something that wasn't there is there now, there can be issues. And if you didn't have that data living somewhere that you could go back to, you could actually lose data. So landing it into a staging area or loading it into a data lake, if you want to use those terms, is kind of the preferred way. Even before we started calling it ELT, this was essentially a common set that we would use in the typical ETL process. So again, if we have similar data from multiple places, we are going to want to match that data up. Let's say we have the state field for a customer in one system, and the same field in another, and one it's the abbreviation, the ISO code, and the other one it's fully spelled out. We may have to have these mapping tables we build in between that allow us to join them so that in our final table that we use or our final set of tables, if we're building something more robust, we have a standardized way of looking at state codes. So there's a lot that happens in the transformation step and you can go really deep into making your data really crisp and clean. So that way, when you go to present that data, it is even better and it is just even easier to ask questions of. And in terms of diving deeper into the ETL process, the Kimball Group did a great job explaining this with all the books that they've put out there. So you can go check those out. They're definitely something to read deeper into this nine or 10-step process we summarize as ETL. But in the end, the idea is to have a database that has all the data we need in a really standardized format, it's not missing values or anything like that, and we can query it using SQL or we can connect something, like a business intelligence tool on top of it.

Inhalt