RapidAnalytics Manual

This document provides instructions for installing and launching RapidAnalytics 1.0. It describes three installation methods: using an installer, extracting a JBoss bundle, or a manual installation. The installer is recommended for most users as it is easiest. The document outlines prerequisites, database configuration, installing additional files, and launching RapidAnalytics on port 8080. It also provides information on further configuration, connecting RapidMiner to RapidAnalytics, and using features like the repository, remote process execution, and accessing processes as services.

RapidAnalytics 1.0
User and Installation Manual
Simon Fischer, Rapid-I GmbH
November 7, 2011

Contents

1 Installation
  1.1 Common Prerequisites
  1.2 RapidAnalytics Installer
    1.2.1 Headless Installation
  1.3 RapidAnalytics/JBoss Bundle
    1.3.1 Extracting the RapidAnalytics Archive
    1.3.2 Configuring the Database
    1.3.3 Additional Configuration
  1.4 Manual Installation
    1.4.1 Prerequisites
    1.4.2 Configuring the Database
    1.4.3 Copy Additional Files
    1.4.4 Configuring a Security Domain
    1.4.5 Additional Configuration
2 Launching RapidAnalytics
3 Initial Web-based Configuration
4 Migration from Earlier Versions of RapidAnalytics
5 Further Configuration
  5.1 Setting up Database Connections
  5.2 Creating a User
6 Connecting RapidMiner to RapidAnalytics
7 Working with RapidAnalytics and RapidMiner
  7.1 Using the Repository
    7.1.1 Storing and Accessing Data in the Repository
    7.1.2 Managing Access Rights
    7.1.3 Accessing Data in Processes
  7.2 Remote Process Execution
    7.2.1 Running a Process Remotely
    7.2.2 Scheduled Process Execution
    7.2.3 Monitoring Job Execution
  7.3 Accessing Processes as Services

1 Installation

There are three ways to download and install RapidAnalytics: the installer, the JBoss bundle, and the manual installation. The installer is the easiest way and is recommended for most users. The other two methods are recommended only for users who are experienced in configuring application servers.

1.1 Common Prerequisites

Before you proceed with any of the three installation methods, make sure the following prerequisites are in place:

- Download and install a Java Runtime Environment (JRE), version 1.6 or later, e.g. from http://www.java.com.
- Install any SQL database. RapidAnalytics stores data and administrative information there. The download bundle already contains JDBC drivers for MySQL, Ingres, Postgres, and Microsoft SQL Server. If you are using a different database, make sure you also download an appropriate JDBC driver (jar file) for it.
- In your SQL database server, create a new database and call it rapidanalytics. Also create a user rapidanalytics for this database and assign it a password. (You can choose other names, but then you must also change the default values in the corresponding configuration steps below.)
- Install RapidMiner, version 5.1 or above. Get it from www.rapid-i.com. (Alternatively, the Web Start version of RapidMiner can be used.)
- If another JBoss instance is installed on the same host, make sure it does not conflict with the RapidAnalytics installation. To avoid such a conflict, make sure the environment variable JBOSS_HOME is not set.
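The last prerequisite can be checked before starting an installation. The following is a minimal sketch, not part of RapidAnalytics; the function name jboss_home_conflict is hypothetical, and the check only looks at the environment of the process that runs it.

```python
import os


def jboss_home_conflict(environ=None):
    """Return the value of JBOSS_HOME if it is set to a non-empty string, else None.

    `environ` defaults to the environment of the current process; passing a
    plain dict makes the check easy to try out.
    """
    env = os.environ if environ is None else environ
    value = env.get("JBOSS_HOME", "")
    return value or None


if __name__ == "__main__":
    conflict = jboss_home_conflict()
    if conflict is None:
        print("JBOSS_HOME is not set.")
    else:
        print("JBOSS_HOME is set to", conflict, "and should be unset first.")
```

Run it in the same shell from which you intend to start the installer or the server, since environment variables are inherited per shell.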

1.2 RapidAnalytics Installer

Unzip the downloaded file RapidAnalytics-Installer-1.1.xxx.zip to a directory of your choice and run the installer. If your java executable is on the path, open a command shell and type:

java -jar RapidAnalytics-Installer-1.1.012.jar

On many systems you can also double-click the jar file to execute it. Note, however, that on Windows a system-wide installation or the installation of a Windows service requires administrator privileges. To obtain these, use Run as administrator when opening the command shell.

In the installer you can make various settings, which are explained in the individual steps. The most important one is the configuration of the database and user that you just created. Don't forget to check the connection once all settings are made.

Note: The installer does not start RapidAnalytics. Please continue reading in Section 2.

1.2.1 Headless Installation

If you want to install RapidAnalytics on a headless server, you can run the installer on any other machine to generate an installation configuration file without actually installing RapidAnalytics. Check the respective option in the last step of the installer. Specify all settings, especially directories, as if you were installing on the target server. Finally, copy the configuration file to the server and run the installer with a single command-line parameter that points to this file. The installer then runs without bringing up a window, using the settings in the file you created.

1.3 RapidAnalytics/JBoss Bundle

1.3.1 Extracting the RapidAnalytics Archive

First unzip RapidAnalytics-JBoss-bundle-1.1.xxx.zip to a location of your choice. Make sure no blank character appears in the parent path. (Beware of the Program Files directory!) In the following, ${RapidAnalytics} denotes the top-level directory that is thus created. If you are using any of the databases listed in Section 1.1 for which we already provide a JDBC driver, you are done. If not, place the JDBC driver jar file in ${RapidAnalytics}/server/default/lib.

1.3.2 Configuring the Database

Change to the directory ${RapidAnalytics}/server/default/deploy/ and look for files with names of the form rapidanalytics-XXX-ds.xml.template. Choose the one where XXX matches your database system, copy it to rapidanalytics-ds.xml, and edit it. Search for the string XXX_PASSWORD and replace it with the database password of the user you created for RapidAnalytics. You can also change the user name and database name here in case you did not name them rapidanalytics as recommended. If your database server does not run on the same host as RapidAnalytics, you must also change the host name from localhost to the appropriate host name.

If you run MySQL, Oracle, Microsoft SQL Server, or PostgreSQL, RapidAnalytics will create the necessary database schema for you. Otherwise, you must create some tables manually. Go to ${RapidAnalytics}/config/quartz/, select the file tables_XXX where XXX matches the name of your database system, and run this script in your database to create the necessary tables.
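The copy-and-edit step for the data source file can also be scripted. The sketch below is an illustration only: the function name write_datasource is hypothetical, the placeholder is assumed to be spelled XXX_PASSWORD in your template (pass a different value if it is not), and the password is XML-escaped on the assumption that it ends up inside an XML element.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def write_datasource(template_path, target_path, password,
                     placeholder="XXX_PASSWORD"):
    """Copy a *-ds.xml.template file to target_path with the password filled in.

    The placeholder spelling is an assumption; the function fails loudly if
    the placeholder does not occur in the template.
    """
    text = Path(template_path).read_text(encoding="utf-8")
    if placeholder not in text:
        raise ValueError(f"placeholder {placeholder!r} not found in template")
    # Escape &, < and > so that the resulting file stays well-formed XML.
    text = text.replace(placeholder, escape(password))
    Path(target_path).write_text(text, encoding="utf-8")
    return text
```

A call such as write_datasource("rapidanalytics-mysql-ds.xml.template", "rapidanalytics-ds.xml", "secret") would then produce the file JBoss deploys; the template file name here is only an example of the rapidanalytics-XXX-ds.xml.template pattern.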

1.3.3 Additional Configuration

The default installation of RapidAnalytics is configured to use 1024 MB of main memory. To change this, edit bin/run.conf or bin/run.bat.conf, search for -Xmx1024m, and replace the number with the desired value. Please continue reading in Section 2.
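For repeated installations, the memory setting can be changed with a small text substitution. This is a sketch under the assumption that the configuration file contains a single Java option of the form -Xmx followed by a number and an optional unit; the function name set_max_heap and the sample line in the usage note are hypothetical.

```python
import re


def set_max_heap(conf_text, megabytes):
    """Return conf_text with every -Xmx option replaced by -Xmx<megabytes>m."""
    if megabytes <= 0:
        raise ValueError("megabytes must be positive")
    new_text, count = re.subn(r"-Xmx\d+[kKmMgG]?", f"-Xmx{megabytes}m", conf_text)
    if count == 0:
        raise ValueError("no -Xmx option found in the configuration text")
    return new_text
```

Applied to a line such as JAVA_OPTS="-Xms128m -Xmx1024m", calling set_max_heap(line, 2048) changes only the -Xmx part and leaves the other options untouched. Read the file, pass its content through the function, and write it back.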

1.4 Manual Installation

This method is recommended only for experienced users and developers who know how to configure an application server according to their needs, including defining a data source and a security domain.

1.4.1 Prerequisites

RapidAnalytics will run on several recent application servers, but it is tested only on JBoss 6.0.0 Final. Download it and follow the vendor's installation instructions. Also install Spring and Spring Security, version 3.1.

1.4.2 Configuring the Database

Follow the steps described in Section 1.3.2. In addition, copy the JDBC driver jar file for your database to a place where the application server can find it.

1.4.3 Copy Additional Files

Create a folder for extensions and a folder for temporary files at a place of your choice. Preferred names are plugins and tmp. Go to the folder into which RapidMiner was installed. Copy all jar files from lib and lib/freehep except rapidminer.jar and launcher.jar to a place where your application server can find them. If you want to enable Web Start, copy the same files, including rapidminer.jar and launcher.jar, to a place that is served by your application server under the context root webstart. Sign the jars with jarsigner, or edit the security settings of your Web browser's Java plugin to accept unsigned classes. To make the server redirect you directly to RapidAnalytics, place a file index.xhtml so that your application server serves it in the root directory:

<html>
  <head>
    <meta HTTP-EQUIV="REFRESH" content="0; url=RA/faces/restricted/index.xhtml">
  </head>
  <body></body>
</html>

Finally, copy the file RapidAnalytics-1.1.xxx.ear to the deploy directory of your application server.

1.4.4 Configuring a Security Domain

In your application server, define a security domain RapidAnalyticsEJBDomain. For JBoss, edit server/default/conf/login-config.xml and copy the application policy entry named client-login to the new name RapidAnalyticsEJBDomain.

1.4.5 Additional Configuration

Configure your application server according to your needs: set memory consumption, port numbers, etc.

2 Launching RapidAnalytics

Change to the bin directory inside your installation directory. To start RapidAnalytics, run run.bat -b 0.0.0.0 on Windows or run.sh -b 0.0.0.0 on Unix-like systems (you may have to make the shell script executable by typing chmod 755 run.sh). This launches RapidAnalytics listening on port 8080 on the local host, or on the port and host name you configured. You will see a lot of messages. Check whether anything unusual appears in them. The message WARNING [com.sun.xml.bind.v2.runtime.reflect.opt.Injector] (HDScanner) duplicate class definition bug occured? Please report this... can be ignored. Please don't report it.
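Since startup can take a while, it may be convenient to wait until the server port accepts connections before opening the Web interface. The following sketch only tests whether a TCP connection can be established; it does not verify that RapidAnalytics itself has finished deploying. The function name wait_for_port and the default timeout values are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout=120.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or timeout seconds pass.

    Returns True as soon as a connection could be opened, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The connection is closed again immediately; we only probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port():
        print("Port 8080 is accepting connections.")
    else:
        print("Nothing is listening on port 8080 yet; check the server messages.")
```

If you configured a different port or host name, pass them as arguments.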

3 Initial Web-based Configuration

Point your favourite Web browser to http://localhost:8080 (assuming your application server listens on port 8080, which it does for the bundled download). You will be presented with a login screen (Figure 1). The initial user name and password are admin and changeit. After logging in, you will be presented with a setup screen (Figure 2). Check whether RapidAnalytics detects your database system correctly so that it can create the necessary tables. If this is not the case, create the tables manually as described in Section 1.3.2, and check again. Specify an absolute directory on your file system where RapidAnalytics searches for extensions, and a directory in which RapidAnalytics places temporary files (Upload directory). If you chose the installer or the bundled installation, these are the directories ${RapidAnalytics}/plugins and ${RapidAnalytics}/tmp. For the manual installation, these are the folders you created in Section 1.4.3. Click Start installation now, and check for potential error messages. If everything looks as in Figure 3, you can click on Complete installation. You should then see the RapidAnalytics welcome screen.

Figure 1: RapidAnalytics login screen. The initial user name is admin, password changeit.

That's it. You now see the Web interface of RapidAnalytics. In most views, there is a navigation bar on the left and a box with possible actions and online help on the right. The first thing you should do is go to Administration, Preferences and change your administrator password (Figure 4).

4 Migration from Earlier Versions of RapidAnalytics

To migrate from RapidAnalytics 1.0 to 1.1 (or any other version), follow these steps:

- Back up your database.
- Run the installer or follow one of the other installation methods, using the same settings as in your previous installation.
- Start RapidAnalytics.
- Go to the login screen. As of RapidAnalytics 1.1, passwords are MD5 hashed by default, so the old password no longer works. If you have configured a mail server, reset your password by clicking on Forgot password. If you haven't, use your database administration tool and edit the table ra_ent_user. Reset the password of the admin user (but of no other user). This statement may help:

UPDATE ra_ent_user SET passwd = MD5(passwd) WHERE userName = 'admin';

Figure 2: The RapidAnalytics installation procedure.

Figure 3: The RapidAnalytics installation procedure is complete.

Figure 4: Changing the password is one of the first things you should do.

Once you can log in again, do so. You will be presented with a migration screen, which performs additional steps, including the hashing of the remaining passwords.
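If your database has no MD5 function, the hash for the admin password can be computed outside the database and written into the passwd column by hand. The sketch below assumes that the stored value is the lowercase hexadecimal MD5 digest of the plain-text password, as produced by the MD5() function used in the statement above, and that the password is encoded as UTF-8; neither assumption is confirmed by this manual. The function name md5_hex is hypothetical.

```python
import hashlib


def md5_hex(password):
    """Return the lowercase hexadecimal MD5 digest of a plain-text password.

    UTF-8 encoding is an assumption; for pure ASCII passwords the choice of
    encoding makes no difference.
    """
    return hashlib.md5(password.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Prints a 32-character hexadecimal string for the initial admin password.
    print(md5_hex("changeit"))
```

As in the statement above, apply this to the admin user only; the migration screen takes care of the remaining users.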

5 Further Configuration

RapidAnalytics is now ready to use. Before you start, you probably want to set up a few more things, including users and database connections.

5.1 Setting up Database Connections

One of the first things you probably want to configure in RapidAnalytics is your database connections. Here, the term database connection refers to connections to databases that contain data to be analysed by RapidMiner and RapidAnalytics. It does not refer to the database connection you created exclusively for RapidAnalytics to store administrative information. You can create multiple database connections in RapidAnalytics. To do so, click on Administration, Database Connections in the menu on the left. You will see the screen depicted in Figure 5. Now, choose Create new connection entry from the box on the right and enter the data for your database connection as seen in Figure 6: database system, host, port, user name, password, and a name under which it will be known in RapidMiner and RapidAnalytics. Press Submit and then Test in the box on the right-hand side. You should see Ping succeeded as in the figure. Otherwise, check your settings and network connection.

Figure 5: Creating a database connection in RapidAnalytics.

5.2 Creating a User

In everyday work, you should not work in RapidAnalytics as administrator. Instead, create a regular user by going to Administration, User management and selecting Create new user from the box on the right-hand side (see Figure 7). You can create user groups in a very similar fashion. A list of users and groups is available from the main User management view.

6 Connecting RapidMiner to RapidAnalytics

From the RapidAnalytics Web interface, you can launch RapidMiner via Web Start using the Launch RapidMiner button. If you do this, RapidMiner is automatically connected to RapidAnalytics: In your Repositories view, you will see a repository named Home. This repository is actually your RapidAnalytics instance. Furthermore, database connections and other settings are automatically shared with RapidMiner.

If for some reason you do not like the Web Start solution, you can configure the connection to RapidAnalytics manually. Start RapidMiner and open the Repositories view. Click the Add Repository button (the first button in the toolbar of the Repositories view), select Remote repository, and enter the URL of your server, e.g. http://localhost:8080. Also fill in the user name and password of the RapidAnalytics user you created.

Figure 6: Entering connection details. The database connection test succeeded as indicated by the message box.

Figure 7: Managing users with RapidAnalytics.

Figure 8: Connecting RapidMiner to RapidAnalytics.


Note: A common mistake that becomes apparent at this stage is that the host that runs RapidAnalytics does not know its own name. To check this, go to http://localhost:8080/RAWS/RepositoryService?wsdl. Scroll to the bottom and look for something like this:

<port binding="tns:RepositoryServiceBinding" name="RepositoryServicePort">
  <soap:address location="http://HOSTNAME:8080/RAWS/RepositoryService"/>
</port>

Check whether HOSTNAME is actually a name under which the host is known in the local network. If it is not, you will get confusing error messages when connecting to it. This is mainly because ISPs nowadays tend to redirect HTTP requests for unknown hosts to a search engine when they can't resolve a DNS entry, rather than letting the request fail.
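The host name check can be partly automated by extracting the advertised host from the WSDL text. This is a rough sketch that works on text you have already downloaded, for example by saving the page from your browser. It uses a simple regular expression rather than a full XML parser, and the function name advertised_hosts is hypothetical.

```python
import re
from urllib.parse import urlparse


def advertised_hosts(wsdl_text):
    """Return the host names found in all soap:address location attributes.

    urlparse normalises host names to lower case.
    """
    locations = re.findall(r'<soap:address\s+location="([^"]+)"', wsdl_text)
    return [urlparse(location).hostname for location in locations]


if __name__ == "__main__":
    sample = (
        '<port binding="tns:RepositoryServiceBinding" name="RepositoryServicePort">'
        '<soap:address location="http://analytics-host:8080/RAWS/RepositoryService"/>'
        '</port>'
    )
    print(advertised_hosts(sample))
```

The sample host name analytics-host is made up. Compare the names the function returns with the name under which your colleagues' machines actually reach the server.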

Once you are connected to RapidAnalytics, settings made in RapidAnalytics, such as database connections, are also shared with RapidMiner. To check this, go to Tools, Manage database connections and check whether the database connections you defined in Section 5.1 have been published to RapidMiner.

7 Working with RapidAnalytics and RapidMiner

7.1 Using the Repository

Using RapidAnalytics as a server repository is straightforward if you know how to use repositories in RapidMiner: In the Repositories view you see a tree of folders, data, and RapidMiner processes. When you use RapidMiner without RapidAnalytics, each of these entries is stored on the local file system. With RapidAnalytics, the behaviour of RapidMiner stays the same, but the entries are stored on the server and can be accessed by a group of people.

7.1.1 Storing and Accessing Data in the Repository

We will walk you through some common steps in everyday work with RapidAnalytics. We assume that you connected RapidMiner to RapidAnalytics as described in Section 6 and that you assigned the alias RapidAnalytics to the RapidAnalytics repository in RapidMiner.

To use the repository, we first create a few folders and copy some data. As a first step, locate the Repositories view in RapidMiner. If that view is currently not showing on screen, go to View, Show View, Repositories. The top-level elements of the Repositories view show the defined repositories. You should see at least a Samples repository and your RapidAnalytics repository.

Now, open your home folder in the RapidAnalytics repository. RapidAnalytics automatically created a folder /home/username, where username is replaced by your user name. First create two folders named data and processes: Right-click the folder corresponding to your user name, select Create Folder, and enter data. Repeat the same steps to create the processes folder. Next, open the Samples repository and navigate to the data folder. Right-click the entry Labor-Negotiations to open the context menu and select Copy. Then right-click the data folder you just created and select Paste in the same way.

Finally, create a new process. In RapidMiner, you typically specify the place at which a process is saved even before you create the process. Although this behaviour may seem unusual, you will soon see why saving the process first is a useful practice. Click File, New process (or use the first button in the tool bar), select the processes folder you created, and enter Cleanse Data as the name. Your (as yet empty) process will then be saved at this location. Your repository should now look as in Figure 9.

Figure 9: The repository populated with demo data.

You can use the repository just like any local repository in RapidMiner. However, you can also inspect it using the RapidAnalytics Web interface. To that end, either go to the Web interface and select Repository, Browse Repository from the navigation bar, or right-click a repository entry in RapidMiner's repository tree and select Browse. Each type of repository entry has an individual representation in the Web interface, but all have certain parts in common:

- Actions available for an entry are at the top of the box on the right. Here you can, e.g., rename and delete entries.
- Access rights can be defined for each entry. See Section 7.1.2 for details.
- Entries can be downloaded in a format appropriate for the type of entry.
- Entries can be navigated using the breadcrumbs at the top.

Folders. Folders can contain other items (including sub-folders). Figure 10 shows an example of our folders. Folders can be downloaded as a zip dump. In the box on the right, you can create subfolders or upload new entries.

Figure 10: A folder stored in the RapidAnalytics repository.

Data and Tables. This covers all kinds of objects that RapidMiner understands, including, e.g., example sets (i.e., tables) and models. Figure 11 shows the Labor-Negotiations data set we just copied to RapidAnalytics. The preview in the Web interface displays the meta data of the table, i.e., the types and possible values of the columns. You can also download the table in various formats, e.g., as an HTML table or as an Excel spreadsheet. If you click on Dependencies, you will see which processes read or generate this data set.

Figure 11: A table stored in the RapidAnalytics repository.

Processes. RapidMiner processes can also be stored on the server. They can be downloaded as an XML file. Furthermore, in the Dependencies panel, RapidAnalytics shows the input and output files of the processes, so you can navigate between linked objects with a click.

Other Objects (Blobs). Finally, you can store objects like images and HTML files on the server, in case you want to use them for reporting or other functionality. RapidAnalytics does not interpret them, but just provides them for download exactly as they were uploaded. Furthermore, you can use these blobs in processes by using operators like Open File and Read CSV.

7.1.2 Managing Access Rights

You can define individual access rights on a per-entry basis. For an example, look at the box on the right in Figure 12. In the Permissions panel you see a list of three groups for which we have assigned access rights for this folder: the groups users, simon, and rapid-i. The group users contains all users that are created. The group simon contains only the user simon, and this cannot be changed. It is the user's so-called singleton group. Finally, the group rapid-i is a custom group we created that contains the user simon, among others. To edit the access rights for this entry, first click the small edit link. For each group you can grant (green check mark) or revoke (red cross) the rights to read, write, and execute, respectively. You can remove the specifications for a particular group entirely by pressing the delete button in the rightmost column, and you can add a new group to the list of permission specifications by selecting a group from the menu and pressing the plus sign.

Figure 12: A RapidMiner process stored in the RapidAnalytics repository.

In this case, the user group simon has full access, whereas the group users is denied access. The group rapid-i has only the right to read from this folder. All other permissions are inherited from the parent folder.

7.1.3 Accessing Data in Processes

Now that we know how to access entries in our repository, let's get back to designing our first process in RapidMiner. You should still have the empty process named Cleanse Data open. The first thing you probably want to do in almost every process is to load some data. To that end, you have two choices:

- Drag the data set Labor-Negotiations from the repository tree right into the process. RapidMiner will create a Retrieve operator and set the appropriate parameter referencing this entry.
- Drag the data set onto the input port in the upper left corner of the process. RapidMiner will connect it in the so-called process context. To show the process context, select View, Show view, Context. This option has two advantages over the Retrieve operator: First, it can save space in the process view, since processes typically start with data loading operators. Second, the entries referenced in the process context are those that are displayed by the Web interface as links, as outlined in Section 7.1.1.

Note that RapidMiner automatically inserted ../data/Labor-Negotiations as the repository entry parameter of the Retrieve operator or into the process context. This is a relative address of the repository entry: the sequence .. navigates one folder up (from the processes folder). You should always use relative addresses, for two reasons:

- You can move folders around without destroying functionality.
- RapidAnalytics can resolve them properly. Do not use the absolute repository name in the repository location (e.g. as //RapidAnalytics/home/data), because RapidAnalytics is an alias that only exists on your client. RapidAnalytics does not know that you are referencing it under this name in RapidMiner (you could, e.g., have several RapidAnalytics instances connected), and hence cannot resolve this name. You can, however, use absolute locations without the leading repository reference //RapidAnalytics, i.e., only the part /home/simon/data/Labor-Negotiations.

Before we execute our first RapidMiner process, we add a bit of functionality. You may have noticed while looking at the Labor-Negotiations data set that it contains a lot of missing values, indicated as question marks in RapidMiner. We replace these values with more useful ones. Since we do not know what the correct values are, we just replace them with the average of the respective attribute (column). This is exactly what the Replace Missing Values operator does. In the Operators view, open Data Transformation, Data Cleansing, and drag the Replace Missing Values operator into the process. Connect its input port to the process input port on the left of the process view.

We must now tell RapidMiner to store the result of this process. As with retrieving data from the RapidAnalytics repository, we have two choices:

- Choose the Store operator from the Repository Access group in the Operators view. Drag it into the process, and connect its input to the topmost output of the Replace Missing Values operator. Enter ../data/Cleansed Data as the repository entry parameter or select it using the repository location chooser available from the folder button next to this parameter. Again, RapidMiner will resolve the relative location for you.
- Alternatively, we can again use the process context as above: Just connect the topmost output of the Replace Missing Values operator to the process output port on the right and enter ../data/Cleansed Data as the first entry in the output port list in the Context view. Here, too, you can use the folder button to bring up a repository location chooser dialog.

For the same reasons as above, it is recommended to use the process context rather than the Store operator. Your process should now look as depicted in Figure 13.
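To illustrate how a relative repository location relates to an absolute one, the following sketch resolves an entry location against the location of the process that references it. It mimics ordinary path semantics with Python's posixpath module and is not RapidMiner's actual resolver; the function name resolve_entry is hypothetical, and the rejection of locations starting with // simply mirrors the advice about client-side aliases.

```python
import posixpath


def resolve_entry(process_location, entry_location):
    """Resolve entry_location relative to the folder that contains the process.

    Locations starting with // carry a repository alias that exists only on
    the client, so they are rejected here.
    """
    if entry_location.startswith("//"):
        raise ValueError("repository aliases cannot be resolved on the server")
    if entry_location.startswith("/"):
        # Absolute location without a repository alias: only normalise it.
        return posixpath.normpath(entry_location)
    folder = posixpath.dirname(process_location)
    return posixpath.normpath(posixpath.join(folder, entry_location))


if __name__ == "__main__":
    print(resolve_entry("/home/simon/processes/Cleanse Data",
                        "../data/Labor-Negotiations"))
```

For a process stored at /home/simon/processes/Cleanse Data, the relative location ../data/Labor-Negotiations resolves to /home/simon/data/Labor-Negotiations. If the whole folder /home/simon is moved elsewhere, the relative location still points to the right entry, while an absolute one would not.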

7.2 Remote Process Execution

You could now run your process locally on your desktop as usual in RapidMiner by pressing the blue Play button. With RapidAnalytics, you have a more powerful option: you can run the RapidMiner process on the server, consuming no resources on the desktop, and you can run multiple processes simultaneously.

Figure 13: Your first RapidAnalytics process.

7.2.1 Running a Process Remotely

Open the Remote Processes view using View, Show View. This view shows one top-level entry for each RapidAnalytics installation you are connected to. To execute your process on the RapidAnalytics instance rather than on your local client machine, use the first button in the toolbar at the top of the Remote Processes view. This item is also available directly from the Process menu. RapidMiner will show the dialog presented in Figure 14. For now, leave all options unchanged and press Ok. After a few seconds, you will be able to expand the RapidAnalytics node in the Remote Processes view. You will see an entry for the process you just started, together with information about when it started, when it completed, etc. If the process were still running, you would see which stage it had reached, but for such a small process this is unlikely. Furthermore, you should see the output produced by the process: the Cleansed Data table. The fact that this output (which is now stored on RapidAnalytics) is listed here is another advantage of using the process context. You can open this data in RapidMiner by selecting it and clicking the open folder icon in the toolbar of the Remote Processes view.

7.2.2 Scheduled Process Execution

In case you have long-running processes that you do not want to execute immediately, the remote execution dialog shown in Figure 14 provides the option to run a process once, but later. In that case, you can choose a date and time using a date picker. Apart from that, the behaviour is the same. For regular execution, choose the option to schedule the process using a so-called cron expression. Cron expressions are a compact yet powerful way to describe repeating events. In general, they take the form:

seconds minutes hours dayOfMonth month dayOfWeek [year]

Figure 14: Executing a process on a remote RapidAnalytics instance.

For each entry you can specify a number, an asterisk (*), meaning any, or a question mark, meaning don't care. Use SUN-SAT for dayOfWeek and JAN-DEC for month. E.g., the expression 0 0 1 * * ? * means every day at 1:00 am: on every day of the month, in every month, no matter what day of the week it is. Note that you can use the asterisk for only one of dayOfMonth and dayOfWeek; use the question mark for the other. For dayOfMonth, you can use L to specify the last day of the month, or use MON#2 for dayOfWeek to specify the second Monday in a month, and FRI#L to specify the last Friday. Furthermore, k/n means every n units of this interval, starting with k, so 5/20 in the minutes field means every twenty minutes, starting at 5, i.e., at 5, 25, and 45 minutes after the hour. The complete cron expression would then be 0 5/20 * * * ? *.

7.2.3 Monitoring Job Execution

As mentioned in Section 7.2.1, you can monitor the running and completed processes in the Remote Processes view of RapidMiner. You can filter the displayed list of processes by showing only the ones executed in this RapidMiner session (make sure you use the same system time as the server does), by showing only the processes of today, or by showing all processes (with a cap on the number of displayed processes). You can also monitor the running and scheduled processes in the Web interface. To that end, select Processes, Process Scheduler from the navigation bar. You will see a screen similar to Figure 15.

Figure 15: The Web interface to the process scheduler.

At the bottom, you can see a list of running and completed processes, together with error messages in case they terminated abnormally. E.g., the first process in the figure was aborted because the user entered a wrong name for the input data (the dash is missing). If a process is complete, you can directly click on the process output to navigate to the corresponding repository entry and browse the data. Using the icon in the rightmost column of the table, you can also access the log file. The log file is also accessible from the Remote Processes view of RapidMiner. The current state of long-running processes is displayed in this view, similar to the familiar RapidMiner status bar. Processes can also be stopped here.

The list of processes scheduled for future execution is at the top of the page. You see a list of processes together with their last and next execution time. Each entry can be removed using the leftmost icon or temporarily disabled using the icon next to it. The entire scheduler can be paused using the link in the box on the right. This can be useful for system maintenance or before a system restart.
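To make the k/n step notation from Section 7.2.2 concrete, the following sketch expands such a field into the values it matches and splits an expression into named fields. It is a teaching aid, not the scheduler's parser: it handles only the k/n form, and the function names expand_step and split_cron are hypothetical.

```python
FIELD_NAMES = ["seconds", "minutes", "hours", "dayOfMonth",
               "month", "dayOfWeek", "year"]


def expand_step(field, lower, upper):
    """Expand a 'k/n' field into the matching values within [lower, upper]."""
    start_text, step_text = field.split("/")
    start = lower if start_text == "*" else int(start_text)
    step = int(step_text)
    if step <= 0 or not lower <= start <= upper:
        raise ValueError(f"invalid step field: {field!r}")
    return list(range(start, upper + 1, step))


def split_cron(expression):
    """Map the six or seven fields of a cron expression to their names."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected six fields plus an optional year")
    return dict(zip(FIELD_NAMES, parts))


if __name__ == "__main__":
    print(expand_step("5/20", 0, 59))
    print(split_cron("0 5/20 * * * ?"))
```

For the minutes field, expand_step("5/20", 0, 59) yields 5, 25, and 45, matching the example in the text. Splitting an expression into named fields is a quick way to spot a missing or surplus field before submitting a schedule.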

7.3 Accessing Processes as Services

One of the strengths of RapidAnalytics is that you can access processes (or rather, their results) from outside, even without RapidMiner. To that end, we have introduced the concept of so-called services: you can expose RapidAnalytics processes as Web services and easily define their input parameters and output format. To understand this, you must first understand the concept of macros. In RapidMiner, a process can use macros in place of any operator parameter. You can think of macros as variables that take on different values. To illustrate this concept, we re-use the process designed earlier. Recall that the Replace Missing Values operator by default replaces missing values by the respective average of the attribute. For the sake of simplicity, let us assume that we want to specify the replacement value explicitly, but we want to make this particular value a configurable number.

First, tell RapidMiner that the value replacement should only be applied to numerical attributes: Select the Replace Missing Values operator, set the parameter attribute filter type to value type, value type to numeric, and default to value. For the actual replacement value we can now specify the parameter replenishment value. This is a regular parameter and we could fill in a regular number here, but we use a macro instead: simply enter %{replacementValue}. If you ran the process now, RapidMiner would complain since the macro is not yet defined. Besides defining input and output, defining macros is the third and last function of the Context view: In the bottom third of the Context view, press the Add macro button to add a new (the first) line to the macro table, enter replacementValue as the macro name and a number, say 2, as the value. Your screen should look as depicted in Figure 16.
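To make the idea of macros as variables concrete, the following Python sketch mimics the substitution of %{name} placeholders by values from a macro table, and fails when a macro is undefined, much as RapidMiner complains in that case. This is an illustration under assumptions only: the function resolve_macros and the regular expression are hypothetical and say nothing about how RapidMiner implements macro resolution internally.

```python
import re

# Matches placeholders of the form %{macroName}.
MACRO_PATTERN = re.compile(r"%\{([^}]+)\}")


def resolve_macros(text, macros):
    """Replace every %{name} in text by the value stored in macros.

    Raises KeyError if a referenced macro has no entry in the table,
    analogous to running a process whose macro is not yet defined.
    """
    def lookup(match):
        name = match.group(1)
        if name not in macros:
            raise KeyError("macro not defined: " + name)
        return str(macros[name])

    return MACRO_PATTERN.sub(lookup, text)


# The macro table from the example: replacementValue is set to 2.
macro_table = {"replacementValue": 2}
print(resolve_macros("%{replacementValue}", macro_table))
```

With the macro table from the example, the parameter value %{replacementValue} resolves to 2; with an empty table, the call raises an error instead of silently leaving the placeholder in place.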

Figure 16: A service process configurable through macros.

If you run this process now, you will see that all missing values were replaced by the number 2. Defining the macro in the process context is convenient in RapidMiner, because we can edit parameters we change frequently in a single place, but it has an additional advantage. Save the process and open it in the Web interface of RapidAnalytics (Figure 12). In the box on the right, you have an action Export as service. If you click it, you will see a screen similar to Figure 17. As you see, RapidAnalytics displays a list of macros defined in the process. In our case, there is only one such macro, replacementValue. RapidAnalytics proposes to bind this macro to a service parameter of the same name. In this view, you can also make settings that affect the output of the service. We select HTML as the output format. For now, we leave the remaining settings unchanged.

Figure 17: Exporting a process as a service in the RapidAnalytics Web interface.

Click Submit, and then choose Test from the box on the right. You will see the screen in Figure 18. You are presented with a form into which you can enter a value for the replacementValue parameter. In our example we have filled in 5. At the bottom you see the output of the service: the example set in HTML format, where all missing values were replaced by 5. Although this is a somewhat artificial toy example, it shows that RapidAnalytics services are a powerful tool for embedding your processes into other IT environments: In Figure 18 you also see that there is a direct link to the process and embeddable HTML code. You can use this link to embed the process into any other page, simply supplying the process macro replacementValue as a query parameter. In addition to the representation as an HTML table, you can also generate interactive charts, images, or machine-readable formats like XML or JSON files.
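Supplying the macro as a query parameter can be sketched in Python as follows. The base URL and the service name below are placeholders invented for illustration; the real link is the one RapidAnalytics shows on the test page of your service (Figure 18). The helper names build_service_url and fetch_service_output are likewise hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_service_url(base_url, service_name, parameters):
    """Append the service name and the query parameters to base_url."""
    url = base_url.rstrip("/") + "/" + quote(service_name)
    query = urlencode(parameters)
    return url + "?" + query if query else url


def fetch_service_output(url):
    """Fetch the service output (e.g. the HTML table) as text.

    Not called here, since it needs a running server to succeed.
    """
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Placeholder base URL and service name; copy the real link from the
# test page of your exported service instead.
url = build_service_url(
    "http://localhost:8080/path/to/services",
    "myService",
    {"replacementValue": 5},
)
print(url)
```

The same URL can be used in a browser or embedded in another page; changing the value of replacementValue in the query string changes the replacement value used by the process.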

Figure 18: Applying a service process in the RapidAnalytics Web interface.
