Posts

Python in ETL - Data Analytics World

ETL data pipeline concept flow in architecture design.

Use case: ETL is the process of fetching data from one or more source systems and loading it into a target data warehouse/database after applying some intermediate transformations. Python is well suited to this kind of data processing, data analytics, and data science work, especially with the powerful Pandas library.

Extract: This is the process of extracting data from various data sources, including file formats like CSV, XLS, XML, and JSON. The script performs all operations on the source directory.

Transform: The extracted data is cleansed and transformed into a meaningful form for storing in a database, using transformations such as row operations, joins, sorting, and aggregations.

Load: In the load process, the transformed data is loaded into target databases like MS SQL, Snowflake, or Oracle, as well as local systems. If ...
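The preview describes the three stages in prose; below is a minimal Pandas sketch of the same Extract/Transform/Load flow. The source/invoices.csv file, its column names, the daily_invoice_totals table, and the SQL Server connection string are all hypothetical placeholders.

```python
import pandas as pd
from sqlalchemy import create_engine

# Extract: pull data from a source file in the source directory
# (the file "source/invoices.csv" and its columns are hypothetical).
df = pd.read_csv("source/invoices.csv")

# Transform: cleanse and reshape the data into a meaningful form.
df = df.dropna(subset=["invoice_id"])              # row operation: drop incomplete rows
df["amount"] = df["amount"].astype(float)
daily_totals = (
    df.sort_values("invoice_date")                 # sorting
      .groupby("invoice_date", as_index=False)     # aggregation
      .agg(total_amount=("amount", "sum"))
)

# Load: write the result to a target database (placeholder connection
# string; swap in the Snowflake/Oracle equivalent as needed).
engine = create_engine(
    "mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server"
)
daily_totals.to_sql("daily_invoice_totals", engine, if_exists="replace", index=False)
```

The same pattern extends to the other formats the post lists: pd.read_excel, pd.read_xml, and pd.read_json cover XLS, XML, and JSON sources, while the load step only changes in the connection string.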

Automation using AWS Lambda when loading a JSON file from an S3 bucket to DynamoDB

Description: The requirement here is to process a JSON file from an S3 bucket into DynamoDB. We need an automated process to load the S3 bucket's contents into DynamoDB, and we use a Lambda function with Python boto3 to achieve it. Whenever new data is inserted into the S3 bucket, a trigger fires automatically and the data is moved to DynamoDB.

Use case: Assume a scenario in which there is a new entry for an invoice and the data must be moved to a destination database. Example: payment transactions.

Step 1:
1. Sign in to the AWS Management Console and open the Amazon S3 console.
2. Choose Create Bucket. The Create bucket wizard opens.
3. In Region, choose the AWS Region where you want the bucket to reside, and upload the JSON file. For previewing the data, click on "sel...
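The preview cuts off at the bucket setup, but the flow it describes (S3 put event, Lambda trigger, DynamoDB write) can be sketched as below. The PaymentTransactions table name and the assumption that the uploaded file holds a list of JSON objects keyed like the table are both hypothetical.

```python
import json
import urllib.parse
from decimal import Decimal

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
# Hypothetical table name, matching the payment-transactions example above.
table = dynamodb.Table("PaymentTransactions")

def lambda_handler(event, context):
    # The S3 put event that triggered this function names the bucket and key.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Fetch and parse the newly uploaded JSON file. DynamoDB rejects Python
    # floats, so numeric values are parsed as Decimal.
    obj = s3.get_object(Bucket=bucket, Key=key)
    items = json.loads(obj["Body"].read(), parse_float=Decimal)

    # Write each record to DynamoDB; assumes the file is a list of objects,
    # each containing the table's partition key.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)

    return {"statusCode": 200, "body": f"Loaded {len(items)} items from {key}"}
```

The function is wired to the bucket with an S3 event notification on s3:ObjectCreated:*, so every new upload invokes the Lambda and the load happens automatically.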