DataStax Sandbox Tutorial

The DataStax Enterprise Sandbox is a virtual machine designed to educate users on Apache Cassandra and DataStax Enterprise through a guided tutorial. It includes various tools for database management and querying, such as DataStax OpsCenter and DevCenter, and covers topics like creating database objects, monitoring clusters, and running analytics. The tutorial is structured into seven sessions, each focusing on different aspects of using the DataStax Sandbox and its components.


DataStax Enterprise Sandbox

A Guided Tutorial
June 2016
Table of Contents
 
WELCOME
WHAT IS APACHE CASSANDRA?
WHAT IS DATASTAX ENTERPRISE?
ABOUT THIS TUTORIAL
SESSION 1: GETTING STARTED WITH THE DATASTAX SANDBOX
To Learn More
SESSION 2: CREATING AND QUERYING DATABASE OBJECTS WITH DATASTAX DEVCENTER
To Learn More
SESSION 3: QUERYING CASSANDRA OBJECTS FROM THE COMMAND LINE
To Learn More
SESSION 4: MONITORING CASSANDRA AND DATASTAX ENTERPRISE WITH DATASTAX OPSCENTER
To Learn More
SESSION 5: RUNNING ANALYTICS ON CASSANDRA DATA
To Learn More
SESSION 6: RUNNING SEARCH OPERATIONS ON CASSANDRA DATA
To Learn More
SESSION 7: GETTING STARTED WITH DSE GRAPH
To Learn More
WRAP UP
CONCLUSION
ABOUT DATASTAX

DataStax Enterprise Sandbox Tutorial 2


Welcome
The DataStax Sandbox is a self-contained virtual machine (VM) designed to introduce and educate you
on the use of Apache Cassandra™ and DataStax Enterprise (DSE). It includes the following components:

• The DataStax Enterprise Server – a production-certified NoSQL database platform powered by
Apache Cassandra, architected for today's online applications and designed to securely manage
real-time, analytic, and search data all in the same database cluster.
• DataStax OpsCenter – a visual, web-based management and monitoring solution for Cassandra
and DataStax Enterprise.
• DataStax DevCenter – a free visual query tool that allows you to easily create and run
Cassandra Query Language (CQL) queries and commands against Apache Cassandra and
DataStax Enterprise.
• DataStax Studio – a web-based visual development tool for interacting with DSE Graph.
• Database Utilities – various utilities for performing administration and command line query
functions in the DataStax Sandbox.

The DataStax Sandbox contains a single DSE node that runs Cassandra as the default node type. You can
easily switch the node type to analytics or search to explore how those workloads behave.

The DataStax Sandbox runs on either Oracle VM VirtualBox or VMware Fusion and requires at least
20GB of disk space and a 64-bit operating system. It needs 8GB of RAM for all non-DSE Graph
functionality and 16GB of RAM when DSE Graph is enabled.

NOTE: The DataStax Sandbox is neither intended nor configured for production deployments or
performance testing.

Suggestions for improving DataStax Sandbox can be sent to sandbox@datastax.com.

What is Apache Cassandra?


Apache Cassandra is a massively scalable, open source NoSQL database that provides continuous
availability, fast performance with linear scalability, and operational simplicity for modern online
applications. Many RDBMSs and some NoSQL databases use master-slave architectures, which often
make a sharded design difficult to maintain and scale by hand. Cassandra instead has a masterless
distributed architecture (i.e. all nodes are the same) that is easier to set up, scale, and maintain.

Cassandra provides automatic data distribution across all nodes that participate in a “ring” or database
cluster. There is no additional work, programmatic or operational, that a developer or administrator needs
to do to distribute data across a cluster.

Cassandra also provides built-in and customizable replication, which stores redundant copies of data
across nodes that participate in a Cassandra ring, whether that cluster is on-premise, in the cloud, or
spans multiple data centers and cloud providers. This means that if any node in a cluster goes down, one
or more copies of that node’s data are available on other machines in the cluster, and the database stays
online and operational.
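
As a rough illustration of the distribution and replication ideas above, the following Python sketch places a row on a small hash ring and stores it on several consecutive nodes. The node names, the MD5-based placement, and the `Ring` and `replicas_for` names are assumptions made for this example only; Cassandra's real partitioners and replication strategies are more sophisticated.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (illustrative hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: each node owns one token; a row lives on the next `rf` nodes clockwise."""

    def __init__(self, nodes, rf):
        self.rf = rf
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas_for(self, key):
        # Find the first node whose token follows the key's token, wrapping around.
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.entries)
        count = min(self.rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring(["node1", "node2", "node3", "node4"], rf=3)
owners = ring.replicas_for("user:alice")
down = owners[0]                                  # pretend the first replica fails
survivors = [n for n in owners if n != down]      # copies that are still reachable
print(owners, "->", survivors)
```

With a replication factor of 3, losing one replica in this toy model still leaves two nodes holding the row, which is the property the paragraph above describes.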

What is DataStax Enterprise?
DataStax Enterprise delivers a comprehensive data management layer with an always-on
architecture that helps enterprises, government agencies, and systems integrators
power their growing number of cloud applications. DSE meets the requirements of these applications,
including easy distribution of data across datacenters and clouds, through a secure,
operationally simple platform built on Apache Cassandra.

Like Cassandra, DSE scales out across multiple nodes and provides full workload isolation so that nodes
designated for online operations do not compete with nodes specified as analytic or search nodes where
resources or data are concerned.

About This Tutorial


This tutorial guides you through the various parts of the DataStax Sandbox and teaches
you the basics of Cassandra, DSE, and other DataStax software. The guide is divided
into seven sessions:

• Session 1: Getting started with the DataStax Sandbox.
• Session 2: Creating and querying database objects with DataStax DevCenter.
• Session 3: Querying Cassandra objects from the command line.
• Session 4: Monitoring Cassandra and DataStax Enterprise with DataStax OpsCenter.
• Session 5: Running analytics on Cassandra data.
• Session 6: Running search operations on Cassandra data.
• Session 7: Getting started with DSE Graph.

Session 1: Getting Started with the DataStax Sandbox


In this session, you will become acquainted with the DataStax Sandbox and how it is organized.

Open your VirtualBox or VMware Fusion software and import the Sandbox image (the VM image is
preconfigured with appropriate RAM and CPU settings). The image should take just a couple of minutes
to boot, after which a login screen is shown.

Log in to the Sandbox with the ID/password combination ‘datastax/datastax’.

The VM image will present a Firefox browser with a couple of tabs open. The second tab contains an
introductory welcome message with links at the bottom for DataStax OpsCenter (a visual management
and monitoring solution for Cassandra and DataStax Enterprise) and a copy of this tutorial.

Minimizing the browser shows the VM image’s desktop. The VM’s desktop contains a number of folders
and icons that enable you to easily try out various parts of the sandbox. For example, to check that DSE
is running and is ready for database operations, perform the following:

1. Locate the utilities folder on the desktop and double-click to open it.
2. Double-click on the Check Node Status icon.

The nodetool utility of Cassandra is executed; you should see a window with output that resembles Fig 1.

Fig 1 – Output from the Cassandra nodetool utility

Note: the line starting with UN confirms that the Cassandra node is running.
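
The two-letter code at the start of a node line combines a status (U for Up, D for Down) with a state (N for Normal, and so on). The Python sketch below shows one way to read that code from nodetool-style text. The sample output, including the address, load, and host ID, is invented for illustration and will differ from what your Sandbox prints, and the `node_states` helper is hypothetical.

```python
# Invented sample in the general shape of `nodetool status` output.
SAMPLE_STATUS = """\
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID                               Rack
UN  127.0.0.1  120.5 KB   1       100.0%  00000000-0000-0000-0000-000000000000  rack1
"""

STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def node_states(status_text):
    """Return (address, status, state) for each node line in nodetool-style output."""
    nodes = []
    for line in status_text.splitlines():
        fields = line.split()
        # A node line starts with a two-letter code such as UN or DN.
        if (len(fields) >= 2 and len(fields[0]) == 2
                and fields[0][0] in STATUS and fields[0][1] in STATE):
            code = fields[0]
            nodes.append((fields[1], STATUS[code[0]], STATE[code[1]]))
    return nodes


nodes = node_states(SAMPLE_STATUS)
print(nodes)
```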

To Learn More
For more introductory information on Cassandra and DataStax Enterprise, please refer to the following
resources:

• Introduction to Apache Cassandra White Paper
• Introduction to DataStax Enterprise White Paper
• DataStax documentation for Apache Cassandra and DataStax Enterprise
• Free online/virtual training for Cassandra and DataStax Enterprise

Session 2: Creating and Querying Database Objects with DataStax DevCenter
In this session, you will learn about the various database objects available in Cassandra, and understand
how to create, insert data into, and query objects.

The basic database objects that you will routinely interact with are:

• Keyspace – Serves as a container for database objects such as tables and indexes, and is
where the level of replication is set. It is analogous to a Microsoft SQL Server or MySQL
database.
• Table – Sometimes referred to in Cassandra literature as a column family, it is the primary object
used to store data. A Cassandra table looks a lot like an RDBMS table on the surface, but
actually it is a sparse data object that provides much more flexibility.
• Index – Akin to an index in an RDBMS, it is a mechanism used to improve the performance of
some queries.

There are other objects in Cassandra, but the above three are the most common with which you will work.
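
To make the three object types concrete, here is a small Python sketch that only assembles CQL text for one keyspace, one table, and one index; it does not connect to a cluster. The names `demo_ks`, `users`, and `users_email_idx` are hypothetical and are not the Sandbox's own schema. A replication factor of 1 is assumed because the Sandbox runs a single node.

```python
# Hypothetical names for illustration only; not the Sandbox's own schema.
KEYSPACE = "demo_ks"

statements = [
    # Keyspace: the container for tables, and where replication is configured.
    f"CREATE KEYSPACE IF NOT EXISTS {KEYSPACE} "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};",
    # Table: the primary object used to store rows.
    f"CREATE TABLE IF NOT EXISTS {KEYSPACE}.users ("
    "username text PRIMARY KEY, email text, country text);",
    # Index: lets some queries filter on a non-key column.
    f"CREATE INDEX IF NOT EXISTS users_email_idx ON {KEYSPACE}.users (email);",
]

for cql in statements:
    print(cql)
```

The printed statements could be pasted into a CQL tool to try them out; the sample scripts shipped with the Sandbox remain the authoritative examples.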

Creating database objects in Cassandra is accomplished via the Cassandra Query Language (CQL),
which looks much like SQL in the relational database world. To get a feel for how to create, insert data
into, and query tables, you will use DataStax DevCenter, which is a GUI development tool designed to
create and query database objects with ease. First, perform the following:

1. Locate the Launch DataStax DevCenter icon on the Sandbox desktop.
2. Double-click the icon to start DevCenter.

DevCenter will open and present an interface like the following:

Fig 2 – DataStax DevCenter (callouts: Connection Manager, Tabbed Query Interface, Keyspace/Schema Navigator)

DataStax DevCenter works much like GUI tools for RDBMSs (e.g. TOAD for Oracle,
SQL Server Query Analyzer, MySQL Workbench). DevCenter automatically connects you to the running
DSE instance in the VM. For this exercise, you will create a new keyspace, insert data into a number of
tables, and run a query against a table.

1. Locate the first tab in DevCenter’s query interface (labeled ‘Sample Data Modeling’) and click on it to
give it focus. Alternatively, you can double-click the “1-Sample Data
Modelling.cql” script displayed in the CQL Scripts panel. This script creates a new keyspace
and a number of tables and indexes.
2. Notice how closely the CQL in the interface resembles DDL in SQL.
3. Click on the green arrow icon to execute the script.
4. In the status bar of the Results pane, you will see the message “10 statement(s)
successfully executed.”
5. Notice there is now a new keyspace labeled “videodb” in the Schema Navigator pane (right-hand
side).
6. Click on the arrow in the Schema Navigator to view the new tables you have just created.

Now you will insert data into your new tables:

1. Click on the second tab in DevCenter’s query interface (labeled ‘Sample Inserts’).
2. Click on the green arrow icon to execute the script.
3. At the bottom of the Results pane, you will see the message “51 statement(s) successfully
executed.”

Lastly, you can now query your new tables:

1. Click on the third tab in DevCenter’s query interface (labeled ‘Sample Queries’).
2. This tab contains a variety of sample queries you can run against your new tables.
3. Open the File menu and choose New CQL Script, which opens a new query tab.
4. Type the following into the interface: select * from videodb.users; Pressing
Ctrl+Space while writing the query brings up the code completion pop-up, which can help you
write queries faster.
5. Click the green arrow icon to execute your query.
6. Observe the rows returned in the Results portion of the interface.

To Learn More
For more information on Cassandra’s data model, designing NoSQL applications, the Cassandra Query
Language (CQL) and DataStax DevCenter, please visit:

• Guided tutorials on learning the Cassandra data model
• Documentation for CQL
• CQL Reference Cards
• DataStax DevCenter Info Sheet

Session 3: Querying Cassandra Objects From the Command Line


In this session, you will learn how to use the main command line query tool for Cassandra - cqlsh.

In addition to the graphical DataStax DevCenter tool, you can create, manage, and query Cassandra
objects from a command line tool – the CQL shell or cqlsh. To open the cqlsh tool in your VM:

1. Go to the VM desktop and locate the Utilities folder.
2. Open the Utilities folder and locate the Start cqlsh icon.
3. Double-click the Start cqlsh icon, which opens the cqlsh tool.

Fig 3 – the command line cqlsh utility

You will see some informational messages at the top of the utility regarding the version of Cassandra and
CQL to which you are connected.

Now type help; and press the Enter key. You will see a list of CQL commands that you can use inside the
utility. To get more information about a particular command, type help followed by the command name and
press the Enter key.

Now, let’s use the cqlsh tool to get some information about a certain table and then query that table:

1. Type use videodb; inside the utility and hit enter. You have now switched the context of the tool to
use the videodb keyspace.
2. Type desc table users; and hit enter. This command will show you the DDL used to create the
table.
3. Type select * from users; and hit enter. This query will pull back all rows for the users table.

Fig 4 – Example CQL command output

Now type exit; and hit the enter key. This will disconnect you from Cassandra and the cqlsh tool and
return you to a terminal prompt. You can type exit again at the prompt to close the terminal window.
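
The same three commands used in this session can also be saved to a file and replayed non-interactively. The Python sketch below writes such a file and prints a command line for it. The file name `session3.cql` is hypothetical, and the sketch assumes that the cqlsh version in your Sandbox supports the `-f` (file) option; the command is only printed, not executed, so the sketch runs without a Cassandra node.

```python
import os
import tempfile

# The commands typed interactively earlier in this session.
commands = [
    "use videodb;",
    "desc table users;",
    "select * from users;",
]

# Hypothetical file name; any writable path works.
script_path = os.path.join(tempfile.gettempdir(), "session3.cql")
with open(script_path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(commands) + "\n")

# Printed rather than executed, so no running node is required.
print(f"cqlsh -f {script_path}")
```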

To Learn More
There is much more you can do with CQL and the cqlsh tool. For more information on CQL and the cqlsh
tool, please refer to the following:

• Documentation for CQL and cqlsh
• CQL Reference Cards

Session 4: Monitoring Cassandra and DataStax Enterprise with DataStax OpsCenter
In this session you will learn the basics of how to use OpsCenter to monitor and manage a Cassandra /
DataStax Enterprise cluster.

DataStax OpsCenter is a visual management and monitoring solution for Cassandra and DataStax
Enterprise. DataStax OpsCenter can be installed on any server – on premise or in the cloud – that has
connectivity to clusters running Cassandra or DataStax Enterprise.

Each node in a Cassandra or DataStax Enterprise cluster contains a DataStax agent, which
communicates with the central OpsCenter service. The DataStax agent and OpsCenter service work
together to monitor and handle tasks on every managed cluster.

OpsCenter provides a Web-based console from which everything can be centrally managed. The
OpsCenter interface provides a visual point-and-click environment for quickly carrying out many
administration and performance monitoring activities.

Fig 5 – Overview of DataStax OpsCenter architecture

On your VM, you have a version of OpsCenter running. To invoke OpsCenter:

1. Go to your VM’s desktop and locate the Launch DataStax OpsCenter icon.
2. Double click the icon. Doing so will invoke the Firefox browser and present you with the
OpsCenter dashboard:

Fig 6 – The main DataStax OpsCenter interface (callouts: Cluster Navigation Pane, Management Navigation Pane, Monitoring Pane)

The main OpsCenter dashboard will provide you with an overview of your VM’s DSE node. To find out
more about your cluster, do the following:

1. Click on the Nodes icon in the left-hand management navigation pane (it looks like a four-leaf
clover).

This shows an alternative graphical view of your cluster: a ring graphic with
one green circle, which represents a database node. If you hover your mouse pointer over the
green circle, OpsCenter presents summary information about that node.

You can explore all the various core OpsCenter features by using the functions listed on the left hand
management navigation pane:

• Nodes (Ring or List view) – lets you navigate a cluster’s nodes and perform various actions on
them (e.g. start, stop, etc.).
• Activities – lets you check out activities being carried out on the cluster as well as the event log
that lists all actions that have occurred.
• Data – allows you to run backup/restore operations and view/create data objects in the cluster.
• Services – lets you graphically manage the various DataStax server services running on the
cluster as well as utilize the Best Practice service that helps those new to DataStax automatically
tune and optimize their database clusters.

There are also functions listed across the top of OpsCenter that are used to visually create new database
clusters and perform other actions.

To Learn More
For more information on using DataStax OpsCenter, please refer to the following:

• OpsCenter White Paper
• Video overview of OpsCenter
• OpsCenter documentation

Session 5: Running Analytics on Cassandra Data


In this session you will learn how to run analytics on Cassandra data.

DataStax Enterprise provides built-in integration with Spark to run near real-time analytics on Cassandra
data, as well as a number of Hadoop components (MapReduce, Hive, Pig, Mahout, Sqoop) that allow
you to run batch analytics on Cassandra data. DSE provides complete workload isolation for analytics
operations, so nodes designated as analytics nodes do not compete with online/Cassandra
nodes (or enterprise search/Solr nodes) for compute resources or data.

To run analytics on Cassandra data in your VM, you can use the weather sensor demo that is bundled
with DataStax Enterprise. The demo simulates a weather sensor collection and analytics application. To
use the demo, perform the following:

1. Locate the folder on the VM desktop labeled “Weather Sensor Demo” and open it.
2. Double-click the “Start with Spark Analytic Node” icon, which stops your existing Cassandra
instance and restarts the node as an analytics (Spark-enabled) node. Minimize the window that
remains open.
3. Double-click the “Load Weather Sensor Demo Data” icon, which loads sample data into your
new analytics node. This takes a few minutes to complete. You can type ‘exit’ to leave the
command shell once the data loading process completes.
4. Double-click the “Start Spark Service” icon. Minimize the window afterwards.

5. Double click on the “Start Hive Service” icon. Minimize the window afterwards.
6. Double click on the “Start Webserver” icon. Minimize the window afterwards.
7. Double click on “View Weather Sensor Console”, which will bring up the visual HTML interface
used to view the application and demo data.
8. At the top of the Web interface is an HTML toolbar that allows you to interact with the demo. The
Near Time Reports option lets you view visual analytics reports on weather data for various
regions. The Sample Live Queries option lets you select various queries to run against the
database and choose whether to run the queries through Spark or through Apache Hive (allowing
you to see the response time differences between the two). The Custom Live Queries option
allows you to visually select query options (e.g. day of week) and visually alter the analytic query
that runs, as well as view the Spark query itself.
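
To give a feel for what such a report computes, the plain-Python sketch below groups sensor readings by station and averages them, optionally filtering by day of week. The station names, field layout, and values are invented for illustration and are not the demo's data or queries. In DSE this kind of aggregation would be run by Spark or Hive across the cluster rather than in a single process.

```python
from collections import defaultdict

# Invented sample readings: (station, day_of_week, temperature_c)
readings = [
    ("station_a", "Mon", 10.0),
    ("station_a", "Tue", 14.0),
    ("station_b", "Mon", 20.0),
    ("station_b", "Tue", 22.0),
    ("station_b", "Wed", 24.0),
]


def average_by_station(rows, day=None):
    """Average temperature per station, optionally restricted to one day of the week."""
    totals = defaultdict(lambda: [0.0, 0])   # station -> [sum, count]
    for station, dow, temp in rows:
        if day is None or dow == day:
            totals[station][0] += temp
            totals[station][1] += 1
    return {station: total / count for station, (total, count) in totals.items()}


print(average_by_station(readings))
print(average_by_station(readings, day="Mon"))
```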

To Learn More
For more information on running analytics on Cassandra data in DSE using Spark, please refer to the
following:

• DSE documentation (see section of DSE docs entitled “Analyzing Data Using Spark” under the
“DSE Analytics” link).

Session 6: Running Search Operations on Cassandra Data


In this session, you will learn how DSE’s built-in enterprise search support works.

DataStax Enterprise lets you run enterprise search operations on Cassandra data
through its built-in Solr integration. DSE provides complete workload isolation for search tasks, so nodes
designated as search nodes do not compete with online/Cassandra nodes (or analytics nodes)
for compute resources or data.

Your VM comes with a demo of enterprise search functionality. To run through the demo, perform the
following:

1. Locate the folder on the VM desktop labeled “Wikipedia Demo Showing Solr” and open it.
2. Double-click the “Start Solr Node” icon, which restarts your VM’s node as a search node.
Minimize the window after it opens.
3. Double-click the “Create Schema and Index” icon, which creates a sample schema with data
that can be searched. You can close the window once it finishes loading its 3,000 sample records
from Wikipedia.
4. Double-click the “View Sample Search Screen” icon, which brings up a simple browser window
designed to act as a front-end search application.
5. Type “north” into the Search widget provided and press Enter. On the right-hand side, DSE/Solr
returns the Wikipedia articles that contain the word “north”. You can
click on the “wikipedia article” link to see the article in Wikipedia if you are connected to the Web.
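
Solr is built on Apache Lucene, which answers term searches using an inverted index: a map from each word to the documents that contain it. The toy Python sketch below illustrates that idea for the “north” search. The sample documents and the `build_index` and `search` helpers are invented for this example and say nothing about how the demo's schema or index is actually defined.

```python
import re
from collections import defaultdict

# Invented sample documents standing in for the Wikipedia articles.
articles = {
    1: "North America is a continent",
    2: "The North Sea lies between Great Britain and Scandinavia",
    3: "Antarctica surrounds the South Pole",
}


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


index = build_index(articles)
print(search(index, "north"))
```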

To Learn More
There is much more to DSE’s built-in enterprise search capabilities than the simple demo above has
shown. For more information on running enterprise search on Cassandra data in DSE using Solr, please
refer to the following:

• DSE documentation (see section of DSE docs entitled “DSE Search/Solr” under the “Integrated
Solutions” link).

Session 7: Getting Started with DSE Graph
In this session, you will be introduced to creating graph databases with DSE Graph. DSE
Graph is part of DSE’s multi-model platform, which includes support for key-value, tabular,
JSON/document, and graph data models, each of which has its data persisted to DSE Cassandra.

DSE Graph is a graph database built for cloud applications that need to manage complex data and its
many relationships. DSE Graph delivers continuous uptime along with predictable performance and
scale, while remaining operationally simple to manage.

For this tutorial, you will use DataStax Studio, which is a web-based tool for visually interacting with DSE
Graph. With DataStax Studio, you will create a sample graph schema with data, query data from the
graph, and visually transform the data into a variety of charts. To get started:

1. Locate the Launch Graph Node icon on the Sandbox desktop and double-click it. This will start a
DSE node that is graph enabled on your VM.
2. Locate the DataStax Studio icon on the Sandbox desktop and double-click it. This will invoke a
tab in the VM’s browser, which will run DataStax Studio.

DataStax Studio will present an interface in your browser like the following:

Fig 7 – The DataStax Studio web interface (callouts: Navigation menu, DataStax Studio and DSE Graph Tutorial)

DataStax Studio allows you to create and save multiple ‘notebooks’ that contain code and queries that
you run against DSE Graph. DataStax Studio automatically connects you to the graph-enabled DSE node
running on your VM.

For this exercise, you will use the “Welcome to DataStax Studio” notebook. Click on that notebook and
you will be presented with the following:

Fig 8 – the DataStax Studio tutorial notebook

First, read through the two introductory paragraphs in the notebook.

Notebooks in DataStax Studio are broken up into “cells” that typically contain code and queries executed
against DSE Graph, along with any result sets from them. You interact with DSE Graph using Gremlin,
the open source graph traversal language from Apache TinkerPop.

The third cell contains a set of Gremlin statements used to create a sample graph. To create the graph,
move your mouse pointer down into the cell and notice that a set of options appears on the right-hand
side of the cell. Click on the arrow or “execute” icon, which will run the statements needed to create your
sample graph:

Fig 9 – The execute/run query notebook icon

The Gremlin script will create a set of vertexes (much like an entity in an RDBMS), edges (connections or
relationships between vertexes) and properties for the vertexes and edges.
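
The vertex, edge, and property concepts can be sketched without a graph database at all. The Python model below is not Gremlin and is not the notebook's schema; the labels, names, and the `out_neighbors` and `in_neighbors` helpers are invented purely to show how vertices, edges, and their properties relate and how a simple traversal follows edges.

```python
# Vertices: id -> (label, properties). All names are invented for illustration.
vertices = {
    1: ("person", {"name": "alice"}),
    2: ("person", {"name": "bob"}),
    3: ("movie", {"title": "Example Film"}),
}

# Edges: (from_vertex, label, to_vertex, properties)
edges = [
    (1, "knows", 2, {"since": 2015}),
    (1, "rated", 3, {"stars": 5}),
    (2, "rated", 3, {"stars": 4}),
]


def out_neighbors(vertex_id, edge_label):
    """Follow outgoing edges with the given label and return the target vertex ids."""
    return [dst for src, label, dst, _ in edges if src == vertex_id and label == edge_label]


def in_neighbors(vertex_id, edge_label):
    """Follow incoming edges with the given label and return the source vertex ids."""
    return [src for src, label, dst, _ in edges if dst == vertex_id and label == edge_label]


# "Who rated the movie?" is a one-step traversal over incoming "rated" edges.
raters = [vertices[v][1]["name"] for v in in_neighbors(3, "rated")]
print(raters)
```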

Once DataStax Studio has finished creating your sample graph, you can get a visual picture of the small
graph by clicking on the “Schema” icon in the upper right hand corner of DataStax Studio, which will
display the following:

Fig 10 – Visualizing the sample graph in DataStax Studio

If you move your mouse pointer over the objects displayed in the schema graphic, DataStax Studio will
provide you with a pop-up box that contains metadata about the object. You can click on the Schema icon
again to remove the schema graphic from view.

Now, scroll down to the following set of cells. The next step in the tutorial selects all the data
from the graph using a simple Gremlin query and presents the results in a grid format:

Fig 11 – Querying all the data from the newly created graph

You can continue to scroll down and work as far into the tutorial as you would like, and find out how to
visually explore graphs built in DSE Graph, create charts from graph data, and much more.

To Learn More
For more introductory information on DSE Graph, please refer to the following resources:

• The Multi-Model Database White Paper
• Why Graph White Paper
• DataStax Academy video introducing graph databases
• DataStax Academy video on learning Gremlin
• DSE Online documentation

Wrap Up
Once you have completed all the exercises, you can shut down your VM by choosing System->Shut
down… from the main menu.

To return the VM to its original state, you can open the Utilities folder on the desktop and double-click on
the “Clear All Data” icon.

Suggestions for improving DataStax Sandbox can be sent to sandbox@datastax.com.

Conclusion
The DataStax Sandbox provides a basic hands-on overview of DataStax software. The recommended
next steps are to (1) enroll in the free DataStax online training (DataStax Academy), which provides
self-paced instruction and exercises designed to ground you in creating applications for DSE and
Cassandra, and (2) follow up with the recommended resources in each of the above sections and visit the
DataStax website for additional materials.

About DataStax
DataStax, the leading provider of database software for cloud applications, accelerates the ability of
enterprises, government agencies, and systems integrators to power the exploding number of cloud
applications that require data distribution across datacenters and clouds, by using our secure,
operationally simple platform built on Apache Cassandra™.

With more than 500 customers in over 50 countries, DataStax is the database technology of choice for
the world’s most innovative companies, such as Netflix, Safeway, ING, Adobe, Intuit, Target and eBay.
Based in Santa Clara, Calif., DataStax is backed by industry-leading investors including Comcast
Ventures, Crosslink Capital, Lightspeed Venture Partners, Kleiner Perkins Caufield & Byers, Meritech
Capital, Premji Invest and Scale Venture Partners. For more information, visit DataStax.com or follow us
@DataStax. 06.27.16
