Posts

Showing posts from November, 2020

Python in ETL - Data Analytics world

ETL data pipeline concept flow in architecture design:

Use case: ETL is the process of fetching data from one or more source systems and loading it into a target data warehouse or database after some intermediate transformations. Python is a common choice for data processing, data analytics, and data science, especially with the powerful Pandas library.

Extract: This is the process of extracting data from various data sources. It should handle file formats such as csv, xls, xml and json. The script performs all operations on the source directory.

Transform: The extracted data is cleansed and transformed into a meaningful form suitable for storing in a database. Typical transformations include row operations, joins, sorting and aggregations.

Load: In the load process, the transformed data is loaded into target databases such as MS SQL, Snowflake and Oracle, as well as local systems. If ...
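The extract, transform and load stages described above can be sketched end to end in a few functions. This is only a minimal illustration under stated assumptions, not the post's actual script: it uses the Python standard library rather than Pandas so that it runs without extra installs, it reads only csv and json files, and it uses an SQLite database as a stand-in for a target such as MS SQL, Snowflake or Oracle. The column names `region` and `amount`, the table name `sales_by_region`, and all function names are hypothetical.

```python
import csv
import json
import sqlite3
from collections import defaultdict
from pathlib import Path


def extract(source_dir):
    """Extract: read every .csv and .json file in source_dir into dict rows."""
    rows = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.suffix == ".csv":
            with path.open(newline="") as f:
                rows.extend(csv.DictReader(f))
        elif path.suffix == ".json":
            with path.open() as f:
                # Assumption: each JSON file holds an array of flat objects.
                rows.extend(json.load(f))
    return rows


def transform(rows):
    """Transform: cleanse rows, aggregate amount per region, sort by region."""
    totals = defaultdict(float)
    for row in rows:
        # Cleansing step: normalise the (hypothetical) region column.
        region = str(row.get("region", "")).strip().lower()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # drop rows whose amount is missing or not numeric
        if region:
            totals[region] += amount
    # Aggregated (region, total) pairs in sorted order.
    return sorted(totals.items())


def load(records, conn):
    """Load: write the aggregated records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", records
    )
    conn.commit()


def run_pipeline(source_dir, conn):
    """Run extract, transform and load in order."""
    load(transform(extract(source_dir)), conn)
```

To try it, point `run_pipeline` at a directory of input files and pass a connection such as `sqlite3.connect("warehouse.db")`. Keeping the three stages as separate functions means the load step could later be swapped for a different database driver without touching extraction or transformation.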