
Sanjana Reddy Arepally

Sr. Data Engineer


781-901-8610 | sanjana.re.ar@gmail.com

TECHNICAL SKILLS

Programming Languages and Tools: Python, Scala, SQL, PL/SQL, PowerShell, Pandas, NumPy, Scikit-Learn, TensorFlow
Cloud Computing Platforms: AWS (S3, EC2, Lambda, Glue, Data Pipeline, SQS, SNS, CloudFormation, Redshift, EMR, Kinesis, AWS KMS, CloudTrail, IAM roles and policies, CloudFront), Azure (Azure Data Lake Storage, Azure Event Hubs, Azure Data Factory, Azure DevOps, Azure AD), Google Cloud Platform (GCP) (Dataprep, Dataflow, Pub/Sub, BigQuery)
Big Data and Data Processing: Spark, SparkSQL, PySpark, MapReduce, Hive, HBase, Pig, Databricks, Kafka, Hadoop, CDH, Sqoop
Data Integration and ETL: AWS Glue, Data Pipeline, ADF, Informatica, Sqoop, REST APIs, data vault modeling, data ingestion techniques
Databases: MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, Cassandra
Data Warehousing: Redshift, Snowflake, BigQuery
Streaming and Messaging Systems: Kinesis, Azure Event Hubs, Pub/Sub, Kafka
Visualization and Reporting Tools: Tableau, Power BI, Google Data Studio
Infrastructure and Automation: Ansible, Terraform, Docker, Kubernetes, Jenkins
Version Control: Git, GitHub, BitBucket
Project Management Methodologies: Agile, Scrum, Kanban, JIRA
Security and Access Control: AWS IAM, Azure AD, OAuth

PROFESSIONAL SUMMARY

 Experienced Sr. Data Engineer with more than 10 years of hands-on expertise in designing, developing, and maintaining data pipelines and analytics solutions.
 Proficient in various programming languages, including Python, Scala, SQL, PL/SQL, and PowerShell, and in data science libraries such as Pandas, NumPy, Scikit-Learn, and TensorFlow.
 Extensive cloud platform experience with AWS (S3, EC2, Lambda, Glue, Data Pipeline, SQS, SNS, CloudFormation, Redshift, EMR, Kinesis, AWS KMS, CloudTrail, IAM roles and policies, CloudFront), Azure (Azure Data Lake Storage, Azure Event Hubs, Azure Data Factory, Azure DevOps, Azure AD), and Google Cloud Platform (GCP) (Dataprep, Dataflow, Pub/Sub, BigQuery).
 Strong knowledge of real-time data streaming technologies, including Kinesis, Azure Event Hubs, Pub/Sub, and Kafka, coupled with expertise in big data processing frameworks such as Spark, SparkSQL, PySpark, MapReduce, Hive, HBase, Pig, Databricks, Hadoop, CDH, and Sqoop.
 Skilled in data integration tools such as AWS Glue, Data Pipeline, ADF, Informatica, Sqoop, and REST APIs, focusing on data vault modeling and efficient data ingestion techniques.
 Proficient in managing various database systems, including MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, Cassandra, Redshift, Snowflake, and BigQuery, ensuring optimal data storage and retrieval.
 Experienced in data visualization tools such as Tableau, Power BI, and Google Data Studio, providing insightful analytics and reports to stakeholders.
 Strong understanding of cloud security protocols, including AWS IAM, Azure AD, and OAuth, with expertise in infrastructure automation using Ansible, Terraform, Docker, Kubernetes, and Jenkins, and in version control systems such as Git, GitHub, and BitBucket.
 Proven track record of working in Agile, Scrum, and Kanban environments, utilizing tools like JIRA for effective project management and collaboration.
 Able to analyze complex data problems, identify critical issues, and develop practical solutions.
 Able to critically evaluate data sources, methodologies, and outcomes to ensure accuracy and relevance.
 Clear, concise communicator, adept at collaborating with diverse teams and stakeholders, articulating technical concepts to non-technical audiences, facilitating effective discussions, and presenting findings convincingly.

EDUCATION

WORK EXPERIENCE

Lowe's, Mooresville, NC | Nov 2022 - Present


Sr. Data Engineer
 Designed and implemented SDLC processes for AWS services, including S3 for data storage, EC2 for virtual
servers, and Lambda for serverless computing, ensuring efficient data processing and scalability.
 Utilized Glue for ETL processes and Data Pipeline for data integration, optimizing data workflows and
pipeline efficiency.
 Implemented SQS for message queuing and SNS for notification systems, ensuring reliable communication
and event-driven architectures.
 Utilized AWS CloudFormation for infrastructure as code (IaC), automating resource provisioning and streamlining the deployment of AWS resources for consistency and reproducibility.
 Leveraged Spark, SparkSQL, and PySpark for big data processing and machine learning tasks, enhancing
data analytics capabilities.
 Utilized TensorFlow for machine learning model development, enabling scalable and efficient training
workflows.
 Managed Redshift for data warehousing, optimizing query performance and data storage.
 Implemented Amazon EMR (Elastic MapReduce) to process large-scale data sets efficiently.
 Utilized Kinesis for streaming data processing, enabling real-time analytics and insights.
 Implemented AWS Key Management Service (KMS) for data encryption, ensuring data security and
compliance with regulations.
 Developed Python scripts to parse XML and JSON files and load the data into the Snowflake data warehouse (see the sketch after this section).
 Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.
 Designed and developed data warehouse models in Snowflake using a star schema.
 Used efficient data modelling techniques to reduce the size and improve the performance of many existing production reports.
 Configured ELK stack for log analysis and monitoring, optimizing system performance and troubleshooting.
 Managed CI/CD pipelines using Jenkins, automating build, test, and deployment processes for data
solutions.
 Utilized CloudFront for content delivery, optimizing data distribution and user experience.
 Implemented CloudTrail for auditing and compliance, ensuring traceability and governance of AWS
resources.
 Oversaw IAM roles and policies for access management, ensuring secure and controlled data and resource
access.
 Implemented data encryption techniques for sensitive data protection, ensuring data privacy and security.
 Utilized Erwin for data modeling, ensuring data accuracy, consistency, and efficient database design.
 Automated infrastructure provisioning and configuration using Ansible, improving deployment speed and
efficiency.
 Created interactive visualizations and dashboards using Tableau, providing actionable insights to
stakeholders.
 Utilized MapReduce, Hive, HBase, and Pig for data processing tasks, optimizing data workflows and
analytics.
 Implemented Agile and Scrum methodologies, managing project tasks and workflows efficiently using JIRA.
Environment: AWS, S3, EC2, Lambda, AWS CloudFormation, Spark, SparkSQL, PySpark, Redshift, EMR, Kinesis, AWS KMS, ELK, Jenkins, CloudFront, CloudTrail, IAM, Erwin, Ansible, Tableau, MapReduce, Hive, HBase, Pig, Agile, Scrum, JIRA.
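
As a rough illustration of the Python-to-Snowflake load mentioned above, the sketch below parses a JSON file and batch-inserts the records with the snowflake-connector-python package. The account, credentials, file, table, and column names are hypothetical placeholders, not details from this role.

import json
import snowflake.connector

# Parse a local JSON file into a list of records (file name is a placeholder).
with open("orders.json") as fh:
    records = json.load(fh)

# Connect to Snowflake; all connection parameters are hypothetical placeholders.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="ETL_USER",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # executemany sends the parsed rows as one batched INSERT.
    cur.executemany(
        "INSERT INTO STG_ORDERS (ORDER_ID, CUSTOMER_ID, AMOUNT) VALUES (%s, %s, %s)",
        [(r["order_id"], r["customer_id"], r["amount"]) for r in records],
    )
    conn.commit()
finally:
    conn.close()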

BNY Mellon, NYC, NY | Nov 2021 - Oct 2022


Data Engineer
 Implemented Azure services, including Azure Data Lake Storage for data ingestion, Azure Event Hubs for
real-time data streaming, and Azure Data Factory (ADF) for building data pipelines.
 Utilized Databricks for big data processing with Apache Spark, optimizing data workflows and analytics.
 Developed REST APIs for data integration and system interactions, ensuring seamless data communication
and integration across platforms.
 Utilized Erwin for data modeling and dimensional modeling techniques, ensuring data accuracy and
consistency.
 Managed SQL Server and Cassandra databases, optimizing data storage and retrieval for efficient data
processing.
 Implemented data vault modeling for data warehouse architecture, ensuring data traceability and
scalability.
 Leveraged Snowflake Data Warehouse for cloud-based data storage and analytics, optimizing query
performance and data processing speed.
 Implemented Kafka for real-time data streaming and message processing, ensuring data reliability and low latency (see the streaming sketch after this section).
 Utilized Terraform for infrastructure as code (IaC) management, automating data infrastructure
deployment and scaling.
 Managed CI/CD pipelines using Azure DevOps, enabling continuous integration and deployment of data
solutions.
 Implemented Azure AD for authentication and access control, ensuring data security and compliance.
 Worked with Hadoop, Spark, and MapReduce for large-scale data processing, parallel computing, and
analytics tasks.
 Implemented Snowflake for data warehousing and applied Hadoop, Spark, Hive, Pig, and Sqoop for diverse
data processing solutions.
 Managed Cloudera Distribution of Hadoop (CDH) clusters, providing a reliable and scalable large-scale data processing infrastructure.
 Employed Pandas and NumPy for data manipulation and analysis tasks, enhancing data processing
efficiency and accuracy.
 Orchestrated containerized applications using Docker and managed containerized environments using
Kubernetes for portability and scalability of data processing workloads.
 Processed JSON data structures for efficient data handling and integration within data pipelines.
 Used efficient data modelling techniques to reduce the size and improve the performance of many existing production reports.
 Automated administrative tasks and system configurations using PowerShell scripting, enhancing
operational efficiency.
 Developed machine learning models using TensorFlow, integrating predictive analytics into data solutions.
 Managed code versioning and collaboration using GitHub, ensuring code quality and team productivity.
 Developed interactive dashboards and reports using Power BI, delivering actionable insights to
stakeholders.
 Implemented Agile and Scrum methodologies, managing project tasks and workflows efficiently using JIRA.
Environment: Azure, ADF, Databricks, Apache Spark, REST APIs, Erwin, PowerShell, SQL Server, Cassandra,
Snowflake, Kafka, Terraform, Azure DevOps, Azure AD, Hadoop, Spark, MapReduce, Pandas, NumPy, TensorFlow,
GitHub, Docker, Kubernetes, Power BI, Agile, Scrum, JIRA
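
As a rough illustration of the Kafka streaming work in this role, the sketch below reads a Kafka topic with PySpark Structured Streaming (it assumes the spark-sql-kafka connector package is available to the cluster). The broker address, topic, message schema, and output paths are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Expected shape of each Kafka message (schema is a hypothetical example).
schema = StructType([
    StructField("trade_id", StringType()),
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
])

# Read from Kafka; broker and topic names are placeholders.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "trades")
       .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON body.
parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("msg"))
          .select("msg.*"))

# Write the parsed stream to Parquet with checkpointing (paths are placeholders).
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/mnt/data/trades")
         .option("checkpointLocation", "/mnt/checkpoints/trades")
         .start())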

Kretoss Technology, Gujarat, India | March 2018 - Aug 2021


Data Engineer
 Utilized Python, Scala, and Informatica for data processing and transformation, ensuring high data quality and consistency.
 Implemented Google Cloud Platform (GCP) services such as Dataprep for data preprocessing, Dataflow for
batch and stream processing, and Pub/Sub for real-time messaging.
 Leveraged NumPy, Pandas, and Scikit-Learn for data analysis, manipulation, and machine learning model
development.
 Created interactive visualizations and dashboards using Tableau for data insights and decision-making.
 Conducted data analysis and reporting using Google Analytics and Google Data Studio, providing actionable
insights to stakeholders.
 Implemented OAuth authentication for secure data access and integration across platforms.
 Worked with JSON format for data interchange and storage, ensuring compatibility and efficiency in data
processing.
 Utilized Cloudera Distribution for Hadoop (CDH) to manage and query large-scale datasets in HDFS.
 Designed and optimized a BigQuery data warehouse for efficient data storage and retrieval (see the sketch after this section).
 Managed PostgreSQL databases, ensuring data integrity, availability, and performance.
 Developed and optimized Hive queries for querying and analyzing large-scale datasets in HDFS.
 Used Sqoop for data ingestion from relational databases into HDFS, ensuring data consistency and
reliability.
 Implemented Apache Spark for large-scale data processing, parallel computing, and machine learning tasks.
 Designed and maintained CI/CD pipelines using Jenkins and Maven for automated deployment and testing
of data pipelines.
 Monitored and optimized system performance using New Relic, ensuring data infrastructure scalability and
reliability.
 Worked in Agile and Kanban methodologies, managing tasks and workflows efficiently using ServiceNow.
Environment: Python, Scala, Informatica, GCP, NumPy, Pandas, Scikit-Learn, Tableau, Google Analytics, OAuth, JSON, CDH, BigQuery, PostgreSQL, Hive, Sqoop, HDFS, Apache Spark, Jenkins, Maven, New Relic, Agile, Kanban, ServiceNow
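
As a rough illustration of the BigQuery warehouse design mentioned above, the sketch below creates a partitioned, clustered table with the google-cloud-bigquery client. The project, dataset, table, and column names are hypothetical placeholders.

from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # project id is a placeholder

# Partitioning by date and clustering by a frequently filtered column keeps
# query scans (and cost) proportional to the data actually read.
ddl = """
CREATE TABLE IF NOT EXISTS analytics.page_events (
    event_date DATE,
    user_id STRING,
    page STRING,
    duration_ms INT64
)
PARTITION BY event_date
CLUSTER BY user_id
"""
client.query(ddl).result()  # blocks until the DDL job completes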

Marolix Technology Solutions Pvt. Ltd., Hyderabad, India | Nov 2014 - Feb 2018
Data Engineer
 Developed and optimized SQL queries, PL/SQL procedures, functions, and triggers for efficient data
retrieval and manipulation.
 Utilized Python for data extraction, transformation, and loading tasks, ensuring data accuracy and integrity.
 Managed AWS services, including S3 for data storage, EC2 for computing, RDS for relational databases, and
Lambda functions for serverless computing.
 Utilized Apache Sqoop and AWS (S3) to efficiently transfer large datasets between relational databases and
Hadoop Distributed File System (HDFS).
 Developed Python scripts to automate data extraction, transformation, and loading (ETL) processes (see the sketch after this section).
 Designed, developed, and maintained complex database structures using Oracle.
 Leveraged stored procedures to enhance database performance and data integrity.
 Implemented and managed data warehouses using tools like AWS (RDS).
 Implemented and maintained Oracle databases, ensuring high availability and performance through
indexing and query optimization.
 Worked with Big Data solutions in the Hadoop ecosystem, including HDFS for storage, Spark for data
processing, and Sqoop for data integration.
 Conducted version control using Git, ensuring code consistency and collaboration within the development
team.
 Utilized Splunk for log analysis, monitoring, and troubleshooting to ensure system reliability and
performance.
 Performed performance tuning and optimization on databases and queries to improve overall system
efficiency.
 Optimized Hive queries for Big Data analytics, enhancing query performance and data processing speed.
 Managed and tracked issues using Bugzilla, ensuring timely resolution and effective communication within
the team.
Environment: SQL, PL/SQL, Python, AWS (S3, EC2, RDS, Lambda functions), Oracle, Hadoop, HDFS, Spark, Sqoop,
Git, Splunk, Hive, Bugzilla.
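
As a rough illustration of the Python ETL automation described above, the sketch below reads a CSV, applies a simple transformation with pandas, and writes the result to S3 with boto3. The file, bucket, key, and column names are hypothetical placeholders.

import io
import boto3
import pandas as pd

# Extract: read a source CSV (file name is a placeholder).
df = pd.read_csv("daily_sales.csv")

# Transform: drop incomplete rows and add a derived column as an example.
df = df.dropna(subset=["order_id"])
df["total"] = df["quantity"] * df["unit_price"]

# Load: write the result to S3 as CSV (bucket and key are placeholders).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
s3 = boto3.client("s3")
s3.put_object(Bucket="example-etl-bucket", Key="curated/daily_sales.csv", Body=buffer.getvalue())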
