SENIOR DATA ENGINEER
Name: Pavan Kumar
Email: mpkr523@gmail.com
Phone: 608-501-1257
LinkedIn: www.linkedin.com/in/reddy885
PROFESSIONAL SUMMARY:
Over 10 years of experience as a Senior Data Engineer, specializing in the design and
implementation of large-scale data solutions across cloud platforms such as AWS, Azure, and
GCP.
Proficient in big data management and analysis using technologies like Hadoop YARN, Spark,
and Spark Streaming, significantly enhancing data processing capabilities for diverse projects.
Advanced programming skills in Scala, Java, and Python, allowing for the development of robust
data processing scripts and automation tools that increase efficiency and accuracy.
Expertise in real-time data processing and analytics with Kafka, AWS Kinesis, and Spark
Streaming, enabling dynamic data insights and informed decision-making.
Demonstrated experience in data warehousing using AWS Redshift and SQL Server, optimizing
data storage and retrieval processes for improved performance.
Skilled in designing and maintaining scalable DBT ETL pipelines and Agentforce/Force.com integrations for data
ingestion and transformation. Proficient in leveraging Git-based CI/CD for streamlined deployment and versioning,
and in optimizing end-to-end data management with automated workflows.
Skilled in data integration and ETL processes with tools such as Informatica, Sqoop, AWS Data
Pipelines, and Matillion ETL, ensuring seamless data flow between disparate sources.
Proficient in developing and maintaining scalable data pipelines with AWS Glue and Talend,
aligning data architecture with evolving business requirements.
Strong background in database management with technologies such as Cassandra, Oracle 12c,
and DynamoDB, maintaining high standards of data integrity and availability.
Experienced in cloud technologies and services across Azure, GCP, and AWS, including Azure
Data Factory, ADLS, GCP BigQuery, and AWS EMR.
Comprehensive knowledge of data visualization and business intelligence tools such as Tableau,
Power BI, AWS QuickSight, and GCP Data Studio, transforming complex datasets into
actionable insights.
Proficient in Linux system administration and scripting, supporting data operations and platform
management effectively.
Utilized Control-M and Oozie for job scheduling and workflow management, enhancing process
automation and operational efficiency.
Advanced knowledge of GCP services including Cloud Functions, Cloud Dataflow, and Cloud
Spanner, optimizing cloud-based data solutions.
In-depth experience with Azure Synapse Analytics, Snowflake, Azure DevOps, and Event Hubs,
facilitating data warehousing, collaboration, and CI/CD pipelines.
Proficiency in developing data strategies and architectures using tools like Erwin and Visio,
ensuring alignment with business goals and regulatory standards.
Strong foundation in SQL, PL/SQL, and data manipulation, coupled with extensive experience in
cross-platform data migrations and upgrades.
Knowledge of SDLC methodologies, including Agile and Waterfall, ensuring successful
project delivery and alignment with stakeholder requirements.
TECHNICAL SKILLS:
Cloud Platforms: AWS (Redshift, S3, Data Pipelines, Glue, EMR, EC2, RDS, DynamoDB, QuickSight, CDN, VPC, SQS), GCP (BigQuery, GCS Buckets, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil, bq command-line utilities, Dataproc, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Data Catalog, Databricks, VM Instances), Azure (Data Factory, Data Lake, ADLS, Synapse Analytics, DevOps, Event Hubs, Databricks, Service Bus, Azure SQL, Cosmos DB, Log Analytics, AKS, Key Vault, App Insights, VM provisioning)
Programming Languages: Python, Scala, Java, SQL, PL/SQL
Big Data Technologies: Hadoop YARN, Spark, Spark Streaming, Hive, MapReduce, Sqoop, Cassandra, Kafka
ETL/Integration Tools: Informatica, Talend, Fivetran, Oozie, Control-M, Snowflake, Airflow
Databases: Oracle 12c, SQL Server, MySQL, PostgreSQL, Teradata
Visualization Tools: Tableau, Power BI, BI Business Objects
Development Tools: Erwin, Visio, TOAD for Oracle, SharePoint, Windows 10
Management & QA: HP QC/ALM, Agile, Excel, Access, MS PowerPoint
PROFESSIONAL EXPERIENCE
LEAD DATA ENGINEER || Salesforce, Dallas, TX || August 2024 to Present
Responsibilities:
Designed and implemented scalable data pipelines to extract, transform, and load data from
diverse sources, including Salesforce APIs and external databases, using DBT for efficient data
ingestion and transformation.
Developed and maintained DBT ETL pipelines to automate data workflows, ensuring repeatability
and reusability while optimizing data transformation processes.
Leveraged Git-based CI/CD for deploying and versioning ETL processes, maintaining high code quality
and streamlined DevOps practices.
Created end-to-end data management solutions, including identifying key fields, establishing data
lineage, and performing data quality checks for consistent and accurate data integration.
Built metrics from unstructured data sources, such as chat transcripts, AI agent product logs,
and Force.com data, to develop advanced conversational analytics, enhancing customer experience
insights.
Designed and developed metrics from structured data, including Salesforce Data Cloud,
Snowflake tables, and Salesforce Sales Cloud objects, enabling comprehensive business
analytics.
Collaborated with cross-functional teams, including data analysts, software engineers, and product
managers, to translate business requirements into data-driven solutions and actionable insights.
Optimized data storage, retrieval, and processing for high-performance analytics, ensuring
scalability with increasing data volumes.
Utilized Airflow to orchestrate and monitor data pipeline tasks, ensuring high availability and
minimal latency (an illustrative orchestration sketch follows this section).
Innovated methods for managing, transforming, and validating data using Matillion ETL, Spark,
Kafka, and Hive for big data processing.
Automated data operations using Unix, Shell Scripting, and Bash to enhance system
performance and workflow management.
Applied quality assurance best practices, including testing, debugging, and validation, ensuring
reliability and accuracy in all data products.
Conducted statistical analysis on large-scale datasets, identifying patterns and actionable insights
to support strategic decision-making.
Defined and upheld data engineering best practices, including code quality, version control, and
testing standards to maintain high development standards.
Supported financial services and commercial banking operations through robust data engineering
solutions, ensuring compliance with industry standards.
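A minimal, illustrative sketch of the Airflow-and-DBT orchestration pattern described above; the DAG name, schedule, and project path are hypothetical assumptions rather than the actual production configuration.

# Airflow DAG (Python) that runs and tests a DBT project on a daily schedule.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily_refresh",              # hypothetical DAG name
    start_date=datetime(2024, 8, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_models = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/project && dbt run",   # hypothetical project path
    )
    test_models = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/project && dbt test",
    )
    run_models >> test_models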
SENIOR DATA ENGINEER || AIG, Charlotte, NC || August 2023 to July 2024
Responsibilities:
Managed Azure Data Lake, Azure Synapse Analytics, Kafka streaming, Azure SQL
Database, and Azure Blob Storage to store and process large-scale insurance data, ensuring
efficient data management and scalability.
Developed and maintained ETL/ELT pipelines using Azure Data Factory and Azure Databricks
for data ingestion, transformation, and integration, optimizing data workflows for real-time
insurance analytics.
Ensured data quality and security by leveraging Azure Key Vault, Azure Active Directory, and
Azure Security Center, safeguarding sensitive client information.
Designed and implemented scalable data models in Azure SQL Database and Azure Synapse
Analytics to support complex insurance data queries and reporting.
Utilized Terraform and Azure Resource Manager (ARM) templates to automate infrastructure
deployment, improving consistency and reducing manual effort in cloud environments.
Led a team of engineers to develop solutions using Python, SQL, and Spark for data processing
and analysis, streamlining claims processing and fraud detection.
Employed Apache Kafka and Databricks to handle real-time streaming data, ensuring near real-time
updates and processing for insurance transactions and customer data (an illustrative streaming sketch follows this section).
Utilized Power BI to create interactive dashboards and visualizations for key insurance
stakeholders, enhancing decision-making through actionable insights.
Collaborated with cross-functional teams using Azure DevOps, Git, and Docker for version
control, CI/CD pipelines, and containerization, streamlining development processes.
Deployed and managed Kubernetes clusters for scalable microservices architectures, ensuring
high availability and reliability for insurance applications.
Developed and managed large-scale data solutions using Azure Data Lake Storage
(ADLS) to store, process, and analyze massive datasets for insurance claims, policies, and
customer data, ensuring high availability and scalability of the data infrastructure.
Designed and implemented secure, cost-effective data pipelines using Azure Data
Factory and ADLS to ingest, transform, and store sensitive insurance data, adhering to regulatory
compliance requirements like HIPAA and ensuring data governance with Azure Purview.
Developed and integrated REST APIs to enable seamless communication between systems,
facilitating data exchange and operational efficiency across insurance platforms.
Proactively monitored data pipeline performance using Azure Monitor and conducted root-cause
analysis to resolve bottlenecks and maintain high system performance.
Implemented Azure Purview for data cataloging and data governance, ensuring compliance
with regulatory requirements in the insurance industry.
Utilized Apache Hadoop for large-scale data processing and storage, optimizing historical
insurance data analysis.
Leveraged Linux environments for managing and automating batch jobs, enhancing system
reliability and operational efficiency.
Created automated data workflows using Azure Functions and Event Hub to trigger specific
tasks, streamlining insurance data processing operations.
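A minimal, illustrative PySpark Structured Streaming sketch of the Kafka-to-ADLS pattern described above; the broker address, topic, schema, and storage paths are hypothetical assumptions.

# Read insurance events from Kafka and land them in ADLS as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

schema = (StructType()
          .add("claim_id", StringType())
          .add("policy_id", StringType())
          .add("amount", DoubleType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
          .option("subscribe", "claims-events")                # hypothetical topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "abfss://raw@account.dfs.core.windows.net/claims/")          # hypothetical ADLS path
         .option("checkpointLocation", "abfss://raw@account.dfs.core.windows.net/_chk/claims/")
         .start())
query.awaitTermination()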
SENIOR DATA ENGINEER || MOLINA HEALTHCARE, BOTHELL, WA || November 2021 to July 2023
Responsibilities:
Managed and optimized healthcare data storage using AWS Redshift, AWS S3, and AWS RDS,
ensuring secure and scalable storage solutions for patient data and health records.
Developed and maintained data pipelines using AWS Data Pipelines and AWS Glue to automate
the ingestion, transformation, and loading (ETL) of healthcare data from multiple sources (an illustrative Glue job sketch follows this section).
Processed large volumes of healthcare data using PySpark, Spark Streaming, and Hadoop
YARN, enabling real-time processing for patient monitoring and operational analytics.
Designed and implemented real-time data streaming solutions using Kinesis and Scala, ensuring
continuous flow and processing of health data for real-time decision-making in patient care.
Leveraged DynamoDB and Cassandra for scalable and high-availability NoSQL database
solutions, enabling fast access to patient data and records.
Ensured seamless data integration across healthcare systems by utilizing Sqoop, Informatica,
and Talend to migrate data from relational databases such as SQL Server into cloud platforms.
Built and maintained CI/CD pipelines to streamline development and deployment of healthcare
applications, ensuring rapid and reliable delivery of new features and updates.
Developed data visualization dashboards using Tableau to provide actionable insights on patient
data, clinical outcomes, and operational metrics to healthcare providers and stakeholders.
Scheduled and monitored batch jobs using Oozie and Control-M, ensuring timely execution of
critical data processing workflows in healthcare operations.
Deployed and managed AWS EC2 and EMR clusters for scalable data processing, enabling the
analysis of massive healthcare datasets and predictive modeling.
Integrated AWS Lambda and EKS to support microservices-based architectures for healthcare
applications, improving scalability and reliability.
Utilized Kinesis and Simple Queue Service (SQS) for handling high-throughput messaging
between healthcare systems, ensuring reliable data exchange and event-driven architectures.
Monitored system performance and application health using CloudWatch, identifying and resolving
bottlenecks in real-time to maintain the availability of healthcare applications.
Secured healthcare applications and data using IAM roles, policies, and best practices, ensuring
compliance with healthcare regulations such as HIPAA.
Created and deployed infrastructure as code (IaC) using CloudFormation, streamlining the
provisioning and management of cloud resources for healthcare environments, and built interactive
dynamic dashboards in AWS QuickSight.
Implemented Simple Notification Service (SNS) for sending real-time notifications and alerts to
healthcare teams regarding critical events and system updates.
Utilized the AWS CDN (Amazon CloudFront) to improve the performance and availability of healthcare
applications by caching and delivering content globally.
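A minimal, illustrative AWS Glue (PySpark) job sketch of the ETL pattern described above; the catalog database, table, dropped field, and S3 path are hypothetical assumptions.

# Read claims data from the Glue Data Catalog, drop a sensitive field, and write curated Parquet to S3.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

claims = glue_context.create_dynamic_frame.from_catalog(
    database="healthcare_raw",                 # hypothetical catalog database
    table_name="claims")                       # hypothetical table

glue_context.write_dynamic_frame.from_options(
    frame=claims.drop_fields(["ssn"]),         # example of removing a sensitive field
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/claims/"},
    format="parquet")

job.commit()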
DATA ENGINEER || AgFirst Farm Credit Bank, Columbia, SC || August 2019 to October 2021
Responsibilities:
Developed a scalable data warehouse using BigQuery, optimizing query execution for real-time
analysis of large-scale banking transactions, enabling faster fraud detection and operational
insights.
Managed secure data storage using GCS Buckets, ensuring efficient and organized storage of
sensitive financial data, facilitating smooth data access and retrieval for analytics teams.
Automated data processing workflows using Google Cloud Functions, reducing manual intervention
and improving data accuracy in daily financial transaction processing.
Built robust data pipelines using Apache Beam and Cloud Dataflow, enabling near real-time data
ingestion and processing and significantly enhancing the bank’s ability to monitor transaction patterns
and identify anomalies (an illustrative Beam pipeline sketch follows this section).
Deployed microservices on GKE (Google Kubernetes Engine) to handle large-scale real-time
transaction processing, ensuring high availability, scalability, and fault tolerance for critical banking
applications.
Configured and maintained VM Instances and Cloud SQL databases, providing the infrastructure
needed to support high-volume banking applications, ensuring consistent performance for
transaction monitoring.
Orchestrated data workflows with Cloud Composer, automating the scheduling and monitoring of
complex data pipelines, ensuring timely data availability for reporting and analytics.
Leveraged Cloud Pub/Sub for real-time messaging between banking systems, enabling seamless
integration across distributed applications for real-time updates on transaction statuses.
Implemented Cloud Storage Transfer Service to migrate legacy banking data from on-premise
systems to the cloud, ensuring minimal disruption and reliable data synchronization during
migration.
Administered Cloud Spanner and MySQL databases to manage high-volume transactional data,
ensuring strong consistency and performance for mission-critical banking applications.
Utilized Data Catalog for metadata management, improving data governance and discoverability,
ensuring the accuracy and relevance of banking datasets used by the analytics teams.
Engineered advanced analytics and machine learning pipelines using GCP Databricks, enhancing
real-time fraud detection models and enabling predictive analytics for transaction trends.
Managed Dataproc clusters for processing large datasets with Spark and Hive, optimizing resource
allocation and enhancing processing speed for large-scale banking data analytics.
Optimized SQL and PostgreSQL databases for better performance and scalability, supporting
banking applications that handle high-frequency transactions and data-heavy operations.
Programmed in Python and Scala to automate complex data processing workflows, increasing the
efficiency of data ingestion, transformation, and reporting pipelines.
Employed Spark-SQL and Sqoop to facilitate seamless data transformation and migration
between Hadoop ecosystems and relational databases, ensuring data integrity during transitions.
Used Cloud Shell and gsutil commands to automate routine administrative tasks, streamlining
cloud operations and improving the efficiency of the cloud infrastructure management.
Monitored data processing jobs using the bq command-line utility, ensuring operational stability and
cost efficiency for all data workflows in BigQuery.
Collaborated with financial analysts to align data solutions with regulatory requirements and
business objectives, improving decision-making processes with actionable insights.
Documented all workflows and development processes, ensuring compliance with data governance
policies and making development artifacts available for future reference.
Provided technical leadership in the implementation of GCP solutions, facilitating smooth cloud
adoption and enabling the bank’s digital transformation initiatives.
Ensured compliance with data security and privacy regulations, implementing best practices for
protecting sensitive customer information and adhering to industry standards.
Regularly analyzed and reported on pipeline performance, proposing improvements to enhance
scalability, reduce latency, and optimize resource usage for real-time transaction monitoring.
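A minimal, illustrative Apache Beam / Cloud Dataflow sketch of the streaming ingestion pattern described above; the project, topic, and BigQuery table names are hypothetical assumptions, and the target table is assumed to already exist.

# Stream transaction messages from Pub/Sub into BigQuery.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="example-bank-project",            # hypothetical project
    region="us-east1",
    temp_location="gs://example-bucket/tmp")

with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
           topic="projects/example-bank-project/topics/transactions")
     | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
     | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
           "example-bank-project:analytics.transactions",
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))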
DATA ENGINEER || Homesite Insurance, Boston, MA || September 2017 to July 2019
Responsibilities:
Created interactive dashboards and reports using Power BI, enabling stakeholders to visualize and
understand business metrics effectively.
Administered SharePoint for document management and collaboration across the data engineering
team.
Maintained and configured Windows 10 systems used for developing and testing data solutions
within Azure environments.
Utilized Hadoop, Hive, MapReduce, and ADLS to process large datasets, enhancing data
processing capabilities for insurance analytics.
Managed Teradata systems to execute complex SQL queries, supporting critical data warehousing
operations.
Orchestrated data integration and streaming solutions using Azure Event Hubs and Service Bus,
facilitating real-time data feeds.
Implemented and maintained Azure Databricks for advanced data analytics and machine
learning workflows.
Configured and maintained Azure SQL and Cosmos DB for high-performance and scalable
database solutions.
Utilized Log Analytics to monitor and analyze system performance, ensuring optimal operation of
data processes.
Managed Azure Kubernetes Service (AKS) to orchestrate containerized applications, improving
deployment speed and scalability.
Designed and implemented security measures using Azure Key Vault to manage and protect
encryption keys and secrets.
Set up and monitored Azure VMs for various data processing tasks, ensuring efficient resource
utilization.
Enhanced system insights through Application Insights, tracking live applications to detect and
solve performance anomalies.
Developed and enforced data governance and compliance protocols using Azure's security and
compliance frameworks.
Automated data backup and disaster recovery procedures, ensuring data integrity and availability
across systems.
Collaborated with insurance analysts to define data requirements and build tailored data models
that support specific business needs.
Documented all data engineering processes and system configurations to ensure consistent
operations and ease of maintenance.
Provided technical support and training to team members on Azure platform tools and best
practices.
Monitored data pipelines and databases regularly, proactively addressing any issues to ensure
smooth data operations.
Engaged in ongoing professional development to stay current with the latest Azure features and
data engineering practices.
Developed and maintained Azure Data Factory pipelines for automated data ingestion and
transformation, enhancing data availability for analysis (an illustrative pipeline-trigger sketch follows this section).
Managed Azure Data Lake storage solutions, ensuring secure and scalable data storage for
insurance claim data.
Leveraged Azure Synapse Analytics to perform complex data queries and generate insightful
analytics reports for risk assessment.
Configured and utilized Azure DevOps for CI/CD pipelines, improving deployment processes and
collaboration across teams.
Integrated Snowflake with Azure services to support dynamic data warehousing needs and real-
time data analytics.
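A minimal, illustrative Python sketch of triggering an Azure Data Factory pipeline run, as referenced above; the subscription, resource group, factory, pipeline name, and parameter are hypothetical assumptions.

# Trigger an ADF pipeline run with the Azure SDK for Python.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="rg-data-platform",    # hypothetical resource group
    factory_name="adf-claims-ingestion",       # hypothetical factory
    pipeline_name="pl_copy_claims_to_adls",    # hypothetical pipeline
    parameters={"run_date": "2019-07-01"})     # hypothetical parameter

print("Triggered pipeline run:", run.run_id)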
BIGDATA ENGINEER || GREEN APEX SOLUTIONS LIMITED, India || August 2014 to June 2017
Responsibilities:
Designed, built, and maintained large-scale data processing systems for both structured
and unstructured data using technologies such as Hadoop, MapReduce, and Spark, ensuring high
performance and reliability (an illustrative Spark sketch follows this section).
Utilized programming languages such as Java and Scala to develop efficient code for data
processing tasks, enhancing system capabilities and optimizing resource usage.
Worked with data storage solutions such as Hadoop Distributed File System (HDFS) and NoSQL
databases, including Cassandra, MongoDB, and HBase, to meet diverse data storage needs.
Created automation scripts to streamline data workflows and optimize data processing
performance, reducing manual intervention and increasing operational efficiency.
Designed and implemented data ingestion processes from various data sources, including APIs,
databases, and real-time streaming platforms, ensuring seamless data integration.
Developed robust ETL (Extract, Transform, Load) processes using Informatica to transform
raw data into meaningful insights, facilitating informed decision-making for clients.
Employed the Waterfall model for project management, ensuring structured phases of
development, from requirements gathering to implementation and maintenance.
Collaborated with data scientists, data analysts, product managers, and other
stakeholders to gather business requirements, delivering tailored data solutions that aligned with
organizational goals.
Utilized data modeling tools like Erwin to design and visualize data structures, improving data
architecture and governance.
Created dynamic dashboards and reports using Tableau and Power BI, providing clients
with actionable insights and visualizations to support data-driven decision-making.
Monitored and troubleshot data systems and pipelines to ensure high availability and
reliability, promptly addressing any issues that arose.
Optimized the performance of data pipelines and processing tasks using MapReduce to
efficiently handle large data volumes, ensuring timely data delivery for analytics and reporting.
Developed comprehensive documentation for data workflows, ETL processes, and data
pipelines using Visio to promote transparency and facilitate knowledge transfer within teams.
Managed data stored in Teradata databases, ensuring high performance and consistency for
transactional data.
Engaged in continuous learning and adoption of new tools and techniques to enhance
data engineering practices, ensuring the team remained at the forefront of industry trends and
technologies.
Utilized SharePoint and Windows for project collaboration and documentation management,
streamlining communication among team members.
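A minimal, illustrative PySpark sketch of the kind of batch aggregation described above; the HDFS paths and column names are hypothetical assumptions.

# Read transaction records from HDFS and aggregate them by customer.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-aggregation").getOrCreate()

txns = (spark.read.option("header", True)
        .csv("hdfs:///data/raw/transactions/")             # hypothetical path
        .withColumn("amount", F.col("amount").cast("double")))

summary = (txns
           .groupBy("customer_id")
           .agg(F.count("*").alias("txn_count"),
                F.sum("amount").alias("total_amount")))

summary.write.mode("overwrite").parquet("hdfs:///data/curated/customer_summary/")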
EDUCATION:
JNTUH || B.Tech CSE || Aug 2010 to Aug 2014