Going Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda
Gandhi Jay
Why talk about Serverless?
● Web scraping: an actual use case from Tribelocal
○ The goal is to find out whether a business is listed correctly, listed incorrectly, or not listed at all in a directory.
○ Process
■ Scrape the directory’s web page with the search parameters
■ Parse the businesses and their related information
■ Find the matching record in the parsed data
■ If any information is incorrect or the business is not listed, notify the user.
○ Each directory may structure and expose its data differently.
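The "find matching record" and "notify if incorrect or not listed" steps can be sketched as follows. This is a minimal illustration, not Tribelocal's code: it assumes each parsed listing is a dict with hypothetical name, address, and phone keys, and that a business is identified by its normalized name. Real matching would likely need fuzzier comparisons (abbreviations, differently formatted phone numbers).

```python
import re
from typing import Dict, List

# Outcome labels from the use case; the string values are illustrative.
CORRECT, INCORRECT, NOT_LISTED = "correct", "incorrect", "not_listed"


def normalize(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so cosmetic differences don't count."""
    value = re.sub(r"[^a-z0-9\s]", "", value.lower())
    return " ".join(value.split())


def check_listing(business: Dict[str, str], listings: List[Dict[str, str]],
                  fields=("address", "phone")) -> str:
    """Classify how `business` appears among the listings parsed from one directory page."""
    target = normalize(business["name"])
    matches = [item for item in listings if normalize(item.get("name", "")) == target]
    if not matches:
        return NOT_LISTED
    listing = matches[0]
    # Every compared field must agree (after normalization) for the listing to be correct.
    if all(normalize(listing.get(f, "")) == normalize(business.get(f, "")) for f in fields):
        return CORRECT
    return INCORRECT
```

A NOT_LISTED or INCORRECT result is what would trigger the user notification.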
Traditional approach
● In a monolith:
○ Create a directory DataLoader
○ Write XPath/CSS selectors
○ Write code to scrape the directory’s web page
○ Write code to parse businesses
○ Write code to match the business and determine whether its listing is correct, incorrect, duplicated, or missing
Repeat the process for every other directory.
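To make the "repeat for every directory" cost concrete, here is a hypothetical per-directory DataLoader sketch in which only the selectors change from one directory to the next. It uses the limited XPath support in Python's xml.etree.ElementTree, so it assumes well-formed (XHTML-like) markup; real directory pages usually need a tolerant HTML parser. The directory name, markup, and selectors are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


class DirectoryDataLoader:
    """One loader per directory: the parsing logic is shared, the XPath selectors are not."""

    def __init__(self, name: str, item_xpath: str, field_xpaths: Dict[str, str]):
        self.name = name
        self.item_xpath = item_xpath      # selects one node per business listing
        self.field_xpaths = field_xpaths  # field name -> XPath relative to the listing node

    def parse(self, page: str) -> List[Dict[str, str]]:
        root = ET.fromstring(page)
        results = []
        for item in root.iterfind(self.item_xpath):
            record = {}
            for field, xpath in self.field_xpaths.items():
                node = item.find(xpath)
                # Missing fields become empty strings rather than errors.
                record[field] = (node.text or "").strip() if node is not None else ""
            results.append(record)
        return results


# Hypothetical directory: markup and selectors are illustrative only.
example_loader = DirectoryDataLoader(
    name="example-directory",
    item_xpath=".//div[@class='listing']",
    field_xpaths={"name": "./h2", "phone": "./span[@class='phone']"},
)
```

Each new directory means another loader like example_loader, plus its own fetching and matching code, which is the maintenance burden the next slide questions.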
Traditional approach: is it good?
● As the number of locations grows, so does the number of requests.
● A simple web scraping problem suddenly becomes a distributed computing nightmare.
What is Serverless?
● A service that takes independent functions and runs them in parallel "containers"
○ can be monitored separately
○ can be scaled separately and automatically
● Pay per execution (execution time is measured in ms).
● Lets you focus on the application, not the infrastructure.
There are actual servers located somewhere, but you don’t have to worry about them.
Someone said, use Serverless:
Backend as a Service + Function as a Service = Serverless
Serverless Approach
● Choose your runtime.
● Write code for AWS Lambda.
● Upload the package to S3.
● Configure AWS Lambda.
● Configure services such as API Gateway, SNS, etc.
● Test it.
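For the "write code for AWS Lambda" step, a minimal Python handler looks like the sketch below. Lambda calls a function with the signature (event, context); the name lambda_handler is only the conventional default and is configured on the function. The response shape is the one an API Gateway Lambda proxy integration expects; the name query parameter and greeting are illustrative assumptions.

```python
import json


def lambda_handler(event, context):
    """Entry point, configured in Lambda as <module>.lambda_handler."""
    # Anything printed here ends up in CloudWatch Logs.
    print("received event keys:", sorted(event))
    # API Gateway sends None (not {}) when there are no query parameters.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    # Response shape expected by an API Gateway Lambda proxy integration.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Packaged as a zip, this is the artifact that goes to S3 in the next step.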
Benefits of Serverless
● Increases flexibility to scale
○ Provisioning based on usage, not instances
○ Automatic scaling and provisioning
● Reduced time to write code
● Minimal risk
○ No long-lived hosts or instances
● Decreased time to market
● Encourages a modular, well encapsulated, loosely coupled architecture
● Reduced resource cost
○ Costs based on precise usage (no usage = no cost)
Service Providers - Serverless
● AWS Lambda
● Google Cloud Functions
● Azure Functions
● Auth0 Webtask
● IBM OpenWhisk
At ReportGarden - AWS Lambda
● Function as a Service platform
○ Billed in 100 ms increments
○ Runtimes: Node.js, Python, C#, and Java (OpenJDK)
● Event-driven
● Asynchronous invocation
● Can be integrated with other AWS services
● AWS Serverless Ecosystem
○ Lambda
○ API Gateway
○ SNS
○ SQS
○ DynamoDB
○ S3
○ Kinesis
○ CloudWatch
○ Step Functions
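Because billing is per execution in 100 ms increments and scales with configured memory, a rough cost estimate is simple arithmetic. The sketch below is an illustration, not AWS's billing formula verbatim: it rounds each execution up to the increment, converts to GB-seconds, and takes both prices as inputs (check the current AWS pricing page for real values; the free tier is ignored).

```python
import math


def billed_ms(duration_ms: float, increment_ms: int = 100) -> int:
    """Round an execution time up to the billing increment (100 ms per the slide)."""
    return max(1, math.ceil(duration_ms / increment_ms)) * increment_ms


def gb_seconds(memory_mb: int, duration_ms: float, invocations: int) -> float:
    """Usage = configured memory in GB x billed seconds x number of invocations."""
    return (memory_mb / 1024) * (billed_ms(duration_ms) / 1000) * invocations


def estimate_cost(memory_mb: int, duration_ms: float, invocations: int,
                  price_per_gb_second: float, price_per_million_requests: float) -> float:
    """Rough bill: compute charge plus request charge; prices are inputs, not hard-coded."""
    compute = gb_seconds(memory_mb, duration_ms, invocations) * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests
```

For example, 1,000 invocations at 512 MB that each run 250 ms are billed as 300 ms each: 0.5 GB x 0.3 s x 1,000 = 150 GB-seconds.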
AWS Lambda Runtime Environment
● Memory available from 128 MB to 1.5 GB
○ CPU power and I/O throughput scale with the configured memory
● 2 virtual CPUs
● 512 MB of /tmp storage
● Deployment package: 50 MB compressed (jar/zip), 250 MB uncompressed
● STDOUT and STDERR go to CloudWatch Logs
AWS Lambda - http://docs.aws.amazon.com/lambda/latest/dg/welcome.html
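The package limits above are easy to check before deploying. This sketch uses the 50 MB compressed and 250 MB uncompressed figures from the slide as defaults. Those are assumptions to re-check against the current AWS documentation, since limits change over time.

```python
import os
import zipfile

MB = 1024 * 1024
# Limits as quoted on the slide; verify against current AWS docs before relying on them.
MAX_ZIPPED_BYTES = 50 * MB
MAX_UNZIPPED_BYTES = 250 * MB


def check_package(path: str, max_zipped: int = MAX_ZIPPED_BYTES,
                  max_unzipped: int = MAX_UNZIPPED_BYTES) -> list:
    """Return a list of limit violations for a deployment zip (empty list = it fits)."""
    problems = []
    zipped = os.path.getsize(path)
    if zipped > max_zipped:
        problems.append(f"compressed size {zipped} bytes exceeds {max_zipped} bytes")
    with zipfile.ZipFile(path) as zf:
        # file_size is each entry's uncompressed size.
        unzipped = sum(info.file_size for info in zf.infolist())
    if unzipped > max_unzipped:
        problems.append(f"uncompressed size {unzipped} bytes exceeds {max_unzipped} bytes")
    return problems
```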
Functions run when events are triggered, for example when:
● A POST request is sent to API Gateway
● A scheduled cron expression fires
● A new object is added to an S3 bucket
● An infrastructure event matches a rule
● A SaaS webhook fires
● A link in an email is clicked
● A new item is added to a database
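Each of these triggers delivers a differently shaped event to the same handler signature. The sketch below guesses the trigger from the event's shape using fields these AWS event formats carry (Records[].eventSource for S3, SQS, and DynamoDB Streams; Records[].EventSource for SNS; source and detail-type for CloudWatch Events rules; httpMethod for API Gateway proxy requests). The returned labels are invented, and it assumes webhooks and email links reach Lambda through API Gateway.

```python
def event_source(event: dict) -> str:
    """Guess which kind of trigger produced a Lambda event from the event's shape."""
    records = event.get("Records")
    if records:
        first = records[0]
        # S3, SQS and DynamoDB Streams records use "eventSource"; SNS records use "EventSource".
        source = first.get("eventSource") or first.get("EventSource")
        known = {"aws:s3": "s3", "aws:sns": "sns", "aws:sqs": "sqs", "aws:dynamodb": "dynamodb"}
        return known.get(source, "unknown")
    if "detail-type" in event and "source" in event:
        # CloudWatch Events: scheduled (cron/rate) rules vs. rules matching other AWS events.
        return "schedule" if event["detail-type"] == "Scheduled Event" else "event-rule"
    if "httpMethod" in event:
        # API Gateway proxy integration; webhooks and email links usually arrive this way too.
        return "api-gateway"
    return "unknown"
```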
Async, Latency-Tolerant Web Scraping with AWS Lambda (Framework: Scrapie)
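[Architecture diagram]
● Client System sends a request with a JSON spec to AWS API Gateway, which triggers a Lambda function.
● Jobs pass through the AWS SNS topic scrape-job and the AWS Lambda functions ScraperNotifier and ScraperJob.
● The result is returned by hitting the client's callback URL with the message ID and content.
● Supporting services: Helium Scraping Service, Proxy
● Monitoring & logging: AWS CloudWatch & Rollbar
● Orchestration: AWS CloudFormation
● Build & deployment: Gradle & Serverless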
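A function in the ScraperJob role of this flow might look roughly like the sketch below. It is not Scrapie's code. It assumes the SNS message body is the JSON spec, with hypothetical url and callback_url fields, and that the callback payload carries hypothetical messageId and content keys. The SNS record fields (Sns.Message, Sns.MessageId) are the real event format. Fetching and posting are injected so the job logic can be tested without a network. The lambda_handler wiring uses plain urllib, whereas the original system routes scraping through a scraping service and proxy.

```python
import json
import urllib.request
from typing import Callable, Dict, List


def handle_scrape_jobs(event: dict,
                       fetch: Callable[[str], str],
                       post_callback: Callable[[str, Dict[str, str]], None]) -> List[str]:
    """Process SNS-delivered scrape jobs and report each result to the job's callback URL."""
    handled = []
    for record in event.get("Records", []):
        sns = record["Sns"]
        # Hypothetical JSON spec: {"url": "...", "callback_url": "..."}
        spec = json.loads(sns["Message"])
        content = fetch(spec["url"])
        post_callback(spec["callback_url"], {"messageId": sns["MessageId"], "content": content})
        handled.append(sns["MessageId"])
    return handled


def _http_get(url: str) -> str:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def _http_post_json(url: str, payload: Dict[str, str]) -> None:
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=30):
        pass


def lambda_handler(event, context):
    # Direct HTTP calls here; a production setup would go through the scraping service/proxy.
    return {"handled": handle_scrape_jobs(event, _http_get, _http_post_json)}
```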
Configure Lambda, SNS, API Gateway, …
Still a lot of work, and we are lazy.
DEMO
● Prerequisites
○ Install NVM (Node Version Manager)
○ Install the Serverless Framework (serverless)
○ Install the AWS CLI
○ Run aws configure with your access key ID and secret access key
○ Good to go!
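A small sanity check for the prerequisites above, written as a sketch. It only looks for executables on PATH and for the credentials file that aws configure writes (or the standard AWS_ACCESS_KEY_ID environment variable). nvm itself is a shell function rather than an executable, so the sketch checks for the node and npm it installs instead.

```python
import os
import shutil


def check_prerequisites() -> dict:
    """Report which demo prerequisites appear to be in place on this machine."""
    return {
        # nvm is a shell function, so check for the node/npm it installs instead.
        "node": shutil.which("node") is not None,
        "npm": shutil.which("npm") is not None,
        # The Serverless Framework CLI is installed as "serverless" with the alias "sls".
        "serverless": shutil.which("serverless") is not None or shutil.which("sls") is not None,
        "aws_cli": shutil.which("aws") is not None,
        # `aws configure` stores keys in ~/.aws/credentials; environment variables also work.
        "aws_credentials": os.path.exists(os.path.expanduser("~/.aws/credentials"))
        or "AWS_ACCESS_KEY_ID" in os.environ,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:16} {'ok' if ok else 'missing'}")
```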
Challenges - Serverless
● State
● Latency
● Loss of control
● Testing/Tooling
○ serverless-offline
○ localstack/localstack
○ But there are not many options yet.
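The Testing/Tooling gap can be narrowed without extra tools: a handler is just a function, so a unit test can call it directly with a hand-built event and a fake context. FakeContext and handler below are hypothetical. The attribute names function_name and aws_request_id and the method get_remaining_time_in_millis() mirror the real Lambda context object, but this stand-in does not reproduce anything else about the runtime (permissions, limits, cold starts).

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Stand-in for the Lambda context object, exposing a few of its real attribute names."""
    function_name: str = "local-test"
    aws_request_id: str = "local-request-1"
    timeout_s: float = 3.0
    _start: float = field(default_factory=time.monotonic)

    def get_remaining_time_in_millis(self) -> int:
        # Counts down from the configured timeout, like the real method.
        elapsed = time.monotonic() - self._start
        return max(0, int((self.timeout_s - elapsed) * 1000))


def handler(event, context):
    # Hypothetical handler under test: echoes the JSON body and the request id.
    body = json.loads(event.get("body") or "{}")
    return {"statusCode": 200, "body": json.dumps({"echo": body, "request": context.aws_request_id})}


def invoke_locally(fn, event, **ctx_kwargs):
    """Call a handler the way Lambda would, but in-process."""
    return fn(event, FakeContext(**ctx_kwargs))
```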
Terrible use cases - Serverless
● Very low latency
○ High-frequency trading
● Large-scale, in-memory, stateful
○ MFTs of TBs of data.
● Long-running, stateful
○ Synchronous external transactions
Awesome Serverless use cases.
● Async, latency-tolerant
○ Data pipelines, automatic thumbnail generation, new-user welcome emails, PDF generation
● Sync, latency-tolerant
○ Web apps, APIs
● Glue
○ Infrastructure automation, orchestration
● Analytics Stack
○ AWS Athena, AWS Glue
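Automatic thumbnail generation is a typical async, latency-tolerant case: an S3 upload event triggers a function that works out what to resize and where to write it. The sketch covers only that planning step; the resize itself (for example, with an imaging library) is omitted. It relies on S3 event notifications URL-encoding object keys (spaces arrive as '+'). Writing thumbnails into the same bucket under a hypothetical thumbnails/ prefix is an assumption. A separate output bucket is the safer design, because it avoids the function retriggering on its own output.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def thumbnail_targets(event: dict, prefix: str = "thumbnails/") -> List[Tuple[str, str, str]]:
    """For each S3 upload record, return (bucket, source_key, thumbnail_key)."""
    targets = []
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            continue  # skip our own output so the function doesn't retrigger itself
        targets.append((bucket, key, prefix + key))
    return targets
```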
Thank You.
Gandhi Jay

Editor's Notes

  • #5 Need to ask why? Bar