Getting Started with Apache Spark on AWS in 5 Minutes
Uploaded by
Noritaka Sekiyama
A lightning talk on Spark on AWS, presented in the "AWS SA/Expert Rapid-Fire LT Challenge" session at JAWS DAYS 2022.
1.
JAWS DAYS 2022 © 2022, Amazon Web Services, Inc. or its affiliates.
Getting Started with Spark on AWS in 5 Minutes
Noritaka Sekiyama, Principal Big Data Architect, AWS Glue
2.
Noritaka Sekiyama (関山 宜孝), Principal Big Data Architect, AWS Glue
• Provided technical support at AWS Support for 5 years
• Joined the AWS Glue development team in 2019
@moomindani / moomindani
3.
Does any of this sound familiar?
• I want to convert CSV files to JSON
• I want to search for and aggregate strings in files
• I want to extract data from a database and write it out to files
• I want to sort a CSV file by a specific column
• I want to move data on Amazon S3 to Amazon DynamoDB
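At small sizes, tasks like these fit comfortably in a single Python process. As a minimal sketch (not taken from the talk), the following uses only the standard library to sort a tab-separated file by one column and count a word in another. The sample data, the column names `review_id`, `star_rating`, and `review_body`, and the helper names `sort_by_column` and `count_word` are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical tiny TSV standing in for a small local file
SAMPLE_TSV = (
    "review_id\tstar_rating\treview_body\n"
    "R1\t5\tGreat product, works great\n"
    "R2\t2\tBroke after a week\n"
    "R3\t4\tGood value\n"
)


def sort_by_column(text: str, column: str, delimiter: str = "\t") -> list:
    """Return all rows sorted by one column (plain string comparison)."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    return sorted(rows, key=lambda row: row[column])


def count_word(text: str, column: str, word: str, delimiter: str = "\t") -> int:
    """Count case-insensitive occurrences of `word` in one column."""
    counter = Counter()
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        # Treat commas as separators so "product," counts as "product"
        counter.update(row[column].lower().replace(",", " ").split())
    return counter[word.lower()]


if __name__ == "__main__":
    for row in sort_by_column(SAMPLE_TSV, "star_rating"):
        print(row["review_id"], row["star_rating"])
    print(count_word(SAMPLE_TSV, "review_body", "great"))
```

Both helpers load everything into one process, which is exactly the approach that stops scaling once files reach tens or hundreds of gigabytes.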
4.
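I want to convert CSV files to JSON

Using pandas:

```python
import pandas as pd

# The sample file is tab-separated; reading s3:// paths requires s3fs.
# anon=True allows anonymous access to the public bucket.
df = pd.read_csv(
    "s3://amazon-reviews-pds/tsv/sample_us.tsv",
    sep="\t",
    storage_options={"anon": True},
)
df.to_json("sample_us.json")
```

Using the standard library and s3fs:

```python
import csv
import json

import s3fs

json_list = []
json_data = {}

# Anonymous access to the public bucket
fs = s3fs.S3FileSystem(anon=True)
with fs.open("amazon-reviews-pds/tsv/sample_us.tsv", "r") as f:
    # Each TSV row becomes a dict keyed by the header columns
    for line in csv.DictReader(f, delimiter="\t"):
        json_list.append(line)

json_data["data"] = json_list
with open("sample_us.json", "w") as f:
    json.dump(json_data, f)
```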
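The second variant collects every row in `json_list` before writing, so memory use grows with the input. A hedged alternative, assuming JSON Lines output (one object per line) is acceptable instead of the single `{"data": [...]}` document above, streams rows straight through. `tsv_to_json_lines` is a hypothetical helper name; any readable text stream, such as the handle returned by `fs.open(...)`, can be passed as `src`.

```python
import csv
import io
import json
from typing import TextIO


def tsv_to_json_lines(src: TextIO, dst: TextIO) -> int:
    """Convert TSV rows to JSON Lines one row at a time; return the row count."""
    count = 0
    for row in csv.DictReader(src, delimiter="\t"):
        # Write each row immediately; nothing is accumulated in memory,
        # so memory use stays flat regardless of input size.
        dst.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    src = io.StringIO("id\tname\n1\talice\n2\tbob\n")
    dst = io.StringIO()
    print(tsv_to_json_lines(src, dst))
    print(dst.getvalue(), end="")
```

Streaming fixes memory growth but not elapsed time: a single process still reads and writes every byte sequentially. With pandas, `df.to_json(path, orient="records", lines=True)` produces the same JSON Lines layout.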
5.
I want to convert CSV files to JSON

| Data size | Compression  | Processing time |
|-----------|--------------|-----------------|
| 15 KB     | Uncompressed | 2 s             |
| 442 MB    | gzip         | 719 s           |
| 2.7 GB    | gzip         | 5,336 s         |

• MacBook Pro 2019, Python 3.7.2
• CSV-to-JSON conversion with pandas
• Uses an S3 bucket in us-east-1 (Public Dataset)
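To put these timings in perspective, here is a small worked calculation of throughput from the table, followed by a naive extrapolation. Two assumptions are built in and labeled in the code: decimal units (1 GB = 1,000 MB), and that throughput stays constant as size grows, which real workloads need not satisfy. The sizes are taken as listed; for the gzip rows they are presumably compressed sizes.

```python
# Measurements from the table: label -> (size in MB, seconds).
# Assumption: decimal units, 1 KB = 0.001 MB and 1 GB = 1000 MB.
MEASUREMENTS = {
    "15 KB": (0.015, 2),
    "442 MB": (442, 719),
    "2.7 GB": (2700, 5336),
}


def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average processing rate in MB per second."""
    return size_mb / seconds


def naive_hours_for(size_mb: float, mb_per_s: float) -> float:
    """Linear extrapolation: assumes the rate stays constant as size grows."""
    return size_mb / mb_per_s / 3600


if __name__ == "__main__":
    for label, (size_mb, seconds) in MEASUREMENTS.items():
        print(f"{label}: {throughput_mb_per_s(size_mb, seconds):.3f} MB/s")
    rate = throughput_mb_per_s(*MEASUREMENTS["2.7 GB"])
    print(f"100 GB at that rate: about {naive_hours_for(100_000, rate):.0f} hours")
```

The gzip rows work out to roughly 0.5 to 0.6 MB/s, and at the 2.7 GB rate a 100 GB file would take on the order of 55 hours on one machine under these assumptions.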
6.
What happens when the data is large?
• I want to convert a 100 GB CSV file to JSON
• I want to search for and aggregate strings in a 1 TB file
• I want to extract 1 TB of data from a database and write it out to files
• I want to sort a 100 GB CSV file by a specific column
• I want to move 1 TB of data on Amazon S3 to Amazon DynamoDB
7.
Distributed processing on AWS!
AWS Glue, Amazon Athena, Amazon EMR, Amazon Redshift
8.
Distributed processing on AWS!
Amazon Athena, Amazon EMR, Amazon Redshift, AWS Glue
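The idea these services share is to split the data into partitions, process each partition independently, and merge the partial results. As a single-machine illustration of that split, map, and reduce shape (not how Spark or AWS Glue is implemented), the sketch below counts words across partitions. The helper names are hypothetical, and a thread pool is used only to keep the sketch portable: in CPython, threads do not speed up CPU-bound work, which is why engines like Spark instead spread partitions across executor processes on many machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items: list, n: int) -> list:
    """Split the input into n roughly equal partitions (round-robin)."""
    return [items[i::n] for i in range(n)]


def count_words(lines: list) -> Counter:
    """Map step: count words within a single partition."""
    counter = Counter()
    for line in lines:
        counter.update(line.lower().split())
    return counter


def parallel_word_count(lines: list, workers: int = 4) -> Counter:
    """Run the map step per partition, then reduce by merging the counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, partition(lines, workers)):
            # Reduce step: merge each partition's counts into the result
            total.update(partial)
    return total


if __name__ == "__main__":
    data = ["spark on aws", "glue runs spark", "athena emr redshift glue"]
    print(parallel_word_count(data, workers=2).most_common(2))
```

Because each partition is counted independently and `Counter` merging is associative, the result does not depend on how the input is split; that property is what lets a cluster engine process each partition on a different node.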
15.
Thank you!
Noritaka Sekiyama @moomindani moomindani
Editor's Notes
#3
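I'm Sekiyama from the AWS Glue and AWS Lake Formation team. I work on the product team as a big data architect, where I handle service-side development related to data lakes and provide technical support to customers worldwide. I also recently published a book, 「AWSではじめるデータレイク」 (Getting Started with Data Lakes on AWS), and provide libraries and tools on GitHub under awslabs, so please take a look if you're interested.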