Install RapidMiner Radoop in RapidMiner Server 

Prerequisites

The following requirements must be met before installing the RapidMiner Radoop extension on RapidMiner Server:

    The RapidMiner Radoop extension is installed and tested in RapidMiner Studio. If necessary, see Configuring RapidMiner Radoop Connections to ensure that you have a valid connection to a Hadoop cluster in RapidMiner Studio.

Installing RapidMiner Radoop on RapidMiner Server

Installing the RapidMiner Radoop client on RapidMiner Server requires that you copy files from your RapidMiner Studio configuration into your RapidMiner Server installation. To install the RapidMiner Radoop Extension:

    1) Stop the server.

    2) Download the RapidMiner Radoop extension from the Marketplace. Upload the file to the extensions directory of RapidMiner Server.

    To determine the location of your RapidMiner Server extensions directory, from the RapidMiner Server home page open Administration and then System Settings. The value of the com.rapidanalytics.plugindir system setting indicates the location of the directory.

    3) From your local .RapidMiner configuration directory (created by RapidMiner Studio), copy the files cipher.key and radoop_connections.xml to the server machine. Identify the user running RapidMiner Server and place these files in the .RapidMiner subfolder of that user's home folder.

    4) Restart RapidMiner Server.
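Step 3 can be scripted. The following is a minimal Python sketch, not an official tool. It assumes that the Studio .RapidMiner directory is readable from the machine where the script runs (for example, after the two files were transferred to the server machine) and that the script has write access to the home folder of the user running RapidMiner Server. The function name copy_radoop_config and both directory arguments are hypothetical; only the two file names and the .RapidMiner folder name come from the steps above.

```python
import shutil
from pathlib import Path

# File names from step 3; the directory arguments below are placeholders.
CONFIG_FILES = ("cipher.key", "radoop_connections.xml")


def copy_radoop_config(studio_config_dir, server_user_home):
    """Copy the Radoop files from a Studio .RapidMiner directory into the
    .RapidMiner subfolder of the home folder of the user running the Server."""
    source = Path(studio_config_dir)
    target = Path(server_user_home) / ".RapidMiner"
    target.mkdir(parents=True, exist_ok=True)

    # Fail before copying anything if one of the files is missing.
    missing = [name for name in CONFIG_FILES if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {source}: {', '.join(missing)}")

    copied = []
    for name in CONFIG_FILES:
        destination = target / name
        shutil.copy2(source / name, destination)
        copied.append(destination)
    return copied
```

Run it while the server is stopped (between steps 1 and 4), so that the Server reads the new files on its next start.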

Changing Hadoop connections

If you change your Hadoop connection in RapidMiner Studio, you must do the following for the new connection settings to be reflected on RapidMiner Server.

    1) Stop RapidMiner Server.

    2) Re-upload the radoop_connections.xml file.

    3) Restart RapidMiner Server.

Configuring and securing multiple connections

In a multi-user RapidMiner Server environment, the Server administrator needs to manually edit the radoop_connections.xml file on the Server to make sure that all connections are included. The radoop_connections.xml file can list an arbitrary number of connections. These connections may point to the same Hadoop cluster or may point to different clusters. They may define connections for the same user or for different users (e.g., with different Hadoop username fields).

The connection file on the Server should list all connections that may be used by any process submitted to this Server. The connection names must be the same on the Server and in the RapidMiner Studio instance that submits the process. However, RapidMiner Studio users only need to have their own connection(s) in their connection file on their client machine. One possible naming convention for connections is <cluster_name>_<username>, where <cluster_name> is an identifier for the Hadoop cluster and <username> is an identifier for the user (which may be the same as the value of the Hadoop username field).

Starting from RapidMiner Radoop 2.3.1, the usage of a Hadoop connection on the Server can be limited to a user or a group of users. This means that a RapidMiner Server user who is not on the optionally specified whitelist of a connection cannot use it when submitting Radoop processes. This way, the Server administrator can make sure that users cannot use connections that they are not permitted to use, and that they cannot evade this restriction by manipulating their connection identifiers in submitted processes. To define a user whitelist for a connection, add the accesswhitelist tag to the corresponding radoop-connection-entry in the radoop_connections.xml file. The value of this property is an arbitrary regular expression. Only RapidMiner Server users whose username matches this expression are allowed to use the connection in a submitted process. If this optional accesswhitelist is not specified for a connection, then any user can use it in a process.

    <radoop-connection-entry>
        ...
        <accesswhitelist>john|scott|allen</accesswhitelist>
    </radoop-connection-entry>
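To see how such an expression separates usernames, the following Python sketch evaluates a whitelist value against a few names. It is only an illustration: the helper is_allowed is hypothetical, and it assumes that the whole username must match the expression. How RapidMiner Server applies the expression internally is not confirmed here, and regular expression syntax can differ slightly between Python and Java.

```python
import re


def is_allowed(username, whitelist=None):
    """Return True if username may use a connection with the given whitelist.

    Assumption: the whole username must match the expression (full match).
    """
    if whitelist is None:
        # No accesswhitelist specified: any user can use the connection.
        return True
    return re.fullmatch(whitelist, username) is not None


# Try the example whitelist against a few sample usernames.
for user in ("john", "scott", "johnny", "alice"):
    print(user, is_allowed(user, "john|scott|allen"))
```

Under the full-match assumption, john and scott are accepted, while johnny and alice are rejected. If a whitelist is meant to cover a family of usernames, the expression has to say so explicitly, for example analyst_.* for all names that start with analyst_.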
                    

Connection to Hadoop clusters with Kerberos authentication

To configure a connection to a cluster with Kerberos authentication, see Hadoop security. Note the following when using these connections through RapidMiner Server.

Note: A RapidMiner Server instance can only talk to a single Kerberized Hadoop cluster, or more precisely, to a single Kerberos realm. This limitation comes from the architecture of the Java Kerberos implementation. However, multiple users can use this Kerberized Hadoop cluster concurrently through this RapidMiner Server instance.

All connections to a Kerberized cluster must specify the path to the user keytab file. This means that the keytab file must be accessible on the local file system of the Server. The path usually differs from the path on the local file system of the user using RapidMiner Studio with RapidMiner Radoop. The RapidMiner Server administrator must make sure that the keytabFile field of the radoop_connections.xml file on the Server points to the appropriate path on the Server. The keytab file itself should be accessible only to the user running RapidMiner Server.
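The access restriction on the keytab file can be checked with a short script. The following Python sketch assumes a POSIX file system and is meant to be run as the user running RapidMiner Server. The function name keytab_problems is hypothetical, and the path passed to it is whatever the keytabFile field points to on the Server.

```python
import os
import stat


def keytab_problems(keytab_path):
    """Return a list of problems with the keytab file; empty means none found."""
    if not os.path.isfile(keytab_path):
        return [f"{keytab_path} does not exist or is not a regular file"]

    info = os.stat(keytab_path)
    problems = []
    # Any group or other permission bit means other accounts may reach the file.
    if info.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(
            f"mode {stat.filemode(info.st_mode)} grants access to group or others"
        )
    # The file should belong to the account running this check.
    if info.st_uid != os.geteuid():
        problems.append("file is not owned by the current user")
    return problems
```

An empty result means the file exists, is owned by the current user, and carries no group or other permissions (for example mode 600). The check covers permission bits only; access control lists or other platform-specific mechanisms are outside its scope.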