The D2K Toolkit includes support for using distributed computing to execute itineraries. The distributed computing feature enables the D2K Toolkit to execute modules on remote machines. This allows for the greatest use of resources in a highly parallel process. Each remote machine must be running a copy of the D2K Server. The D2K Toolkit interacts directly with D2K Servers to support the distributed execution of modules.
The D2K Server is implemented in Java. A Java Virtual Machine (JVM), version J2SE (Java 2 Standard Edition) 1.3.1 or later, is required by the D2K Server. If you do not already have a proper JVM installed, first download one from http://java.sun.com.
There are two different modes in which to run the D2K Server: standalone and Jini-enabled.
The standalone server is the easiest of the two to configure, but will not automatically make itself available to the D2K Toolkit. To use standalone servers with D2K Toolkit, select the Machines tab in the Proximity Editor window (Tools -> Edit Proximities) to enter the names of machines running D2K Servers.
The Jini-enabled server is more complex to configure, but allows for greater flexibility in using distributed resources. In this mode, a D2K Server registers itself with a Jini lookup service. The lookup service maintains a dynamic list of available resources. A D2K Toolkit that has also been Jini-enabled will contact the Jini lookup service to find all the remote machines available for processing modules. These machines are then presented to the user in the D2K Toolkit's Proximity Editor.
Jini must be installed and configured separately from the D2K Server. More information and downloads for Jini technology can be found at the Jini web site, http://www.jini.org.
D2K Servers must be properly installed and configured on each machine that is to be used as a distributed computational resource. The installation package for the D2K Server can be downloaded from http://alg.ncsa.uiuc.edu. Like the D2K Toolkit and D2K Server, the installer is a Java application requiring a JVM to run. The installer will guide you through the installation process.
D2K Server administrators should be aware of several command line arguments used by the Java Virtual Machine. Below is an example of an extremely simple command to run the D2K Server, followed by a description of the different options:
java -Xms512M -Xmx1024M -server -cp log4j-1.2.5.jar:classes ncsa.d2k.core.proximities.remote.D2K
-Xms512M This option specifies how much space to allocate for the Java VM's heap when the virtual machine first starts up. The initial size is given as a number of megabytes followed by an "M". The option here is identified by -Xms; the text following specifies that 512 megabytes should be assigned.
-Xmx1024M This option specifies how large the Java VM's heap may grow before running out of memory. The maximum size, like the initial size, is given as a number of megabytes followed by an "M". The option here is identified by -Xmx; the text following specifies that 1024 megabytes should be assigned.
-server This option tells the Java VM that it should run in a mode that improves performance at the cost of memory efficiency. This option is often used when running a server application, and is recommended when running the D2K Server, although it is not required.
-cp This is the classpath option that tells the Java VM where to find the Java class files required to run the Java application, in this case the D2K Server. For the purposes of this example, one jar file and one directory were included. Jar files are a convenient way to archive java class files. They can also be stored in a directory, as indicated by the second entry in the class path. Entries in the class path are separated by ":" on Unix and Macintosh systems, and by a ";" on Windows systems.
The last parameter in this command, "ncsa.d2k.core.proximities.remote.D2K", is the name of the Java class that will start the server. For a complete list of Java command line options, refer to the J2SE documentation, or the man pages for Java on Unix systems. There are many more options than those listed above; the ones described here are the most useful when running the D2K Server.
The provided command line example is not necessarily the command that should be used to execute a D2K Server. It is intended only as an example of how to use the Java command line. The D2K Server application has its own set of command line options, described later.
Java applications often have configuration files called properties files. This is true of the D2K Server as well. The server will read a properties file called ".d2kV4.props" that is located in the user's home directory. This file will be created the first time a user runs the D2K Server or D2K Toolkit on a machine. The entries in this properties file are shared between the D2K Toolkit and D2K Server, so a change in the properties file affects both applications.
Java properties files are text files where named properties are set by an expression of the form <property_name>=<property_value>, where property_name is the name of the property to set, and the property_value is the textual representation of the value for that property. An example properties file may look like this:
#These are the properties of the D2K system
#Wed Mar 26 12:49:10 CST 2003
made.samplerate=2000
made.debug=false
num.threads=1
made.jiniUrl=test
made.debugMsg=false
Although most of the properties in the .d2kV4.props file can be ignored, it may be useful to change some of them when running the D2K Server:
made.samplerate Defines the frequency at which the server will send status update information to the client application (usually the D2K Toolkit). This value is specified in milliseconds. Therefore, a value of 2000 is two seconds. The smaller the value, the more frequently the server will send status information. A smaller value degrades server performance but improves the accuracy of the status information on the client.
made.debug When this property is set to true, the server will output cryptic status information. This option should typically be set to false.
made.debugMsg Also a debug output switch. If this is set to true, debugging logs related to protocol messages between client and server will be displayed. As with the made.debug flag, this should usually be set to false.
made.jiniUrl Specifies the Jini URL. If this field is set to "auto" the D2K Server will find any available Jini servers that support the D2K protocol. Then, the D2K server will register itself with the Jini service. If Jini is not to be used, this field should be set to an empty string or the word "test".
num.threads Specifies the number of threads to be allocated. Typically, the best server performance is achieved if this value is set to the number of processors on the machine.
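As an illustration of the property format described above, the following sketch reads a few of these entries with the standard java.util.Properties class. The class name D2KPropsSketch, its helper methods, and the fallback values are assumptions made for this example; they are not part of D2K, and the real server may interpret the file differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative helper, not part of D2K: reads settings in the .d2kV4.props format.
class D2KPropsSketch {

    // Parse properties text from any Reader (a FileReader on the
    // .d2kV4.props file in the user's home directory would also work).
    static Properties load(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Status update interval in milliseconds; the fallback of 2000 is an assumption.
    static int sampleRateMillis(Properties props) {
        return Integer.parseInt(props.getProperty("made.samplerate", "2000").trim());
    }

    // Jini is treated as unused when made.jiniUrl is empty or "test".
    static boolean jiniDisabled(Properties props) {
        String url = props.getProperty("made.jiniUrl", "").trim();
        return url.isEmpty() || url.equals("test");
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "#These are the properties of the D2K system",
            "made.samplerate=2000",
            "made.debug=false",
            "num.threads=1",
            "made.jiniUrl=test",
            "made.debugMsg=false");
        Properties props = load(new StringReader(sample));
        System.out.println("sample rate (ms): " + sampleRateMillis(props));
        System.out.println("threads: " + props.getProperty("num.threads"));
        System.out.println("Jini disabled: " + jiniDisabled(props));
    }
}
```

Lines beginning with "#" are treated as comments by Properties.load, so the header lines of the sample file are skipped automatically.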
There are two operating modes for the D2K Server. In standalone mode, the D2K Toolkit ascertains the names of machines running D2K Servers from a user-created configuration file. In Jini-enabled mode, Jini, a system developed by Sun Microsystems, is used to provide a service by which the D2K Toolkit can automatically discover the names of available machines running D2K Servers.
There are several command line options associated with the D2K Server. These should come at the end of the Java command line so they will be parsed by the D2K Server, not the Java VM:
-jini <jini_url> Overrides the "made.jiniUrl" entry in the D2K properties file.
-port <listener_port_number> Specifies the port on which to listen for service requests from D2K Toolkits. The default is 7021, and should rarely need to be changed.
-threads <thread_count> Specifies the number of threads to allocate for processing D2K Modules. The default number of threads is specified in the d2k properties file in an entry like the following: "num.threads=1." In the D2K properties file, this number should be set to the number of processors on the machine.
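The relationship between these options and the properties file can be sketched as follows: values from the properties file act as defaults, and a command line option, when present, takes precedence. This is only a hypothetical model of that precedence. The class ServerOptionsSketch and its resolve method are invented for illustration, and the actual D2K Server parser may behave differently.

```java
import java.util.Properties;

// Hypothetical sketch of option precedence; the real D2K Server parser may differ.
class ServerOptionsSketch {
    static final int DEFAULT_PORT = 7021; // default port stated in this document

    final String jiniUrl;
    final int port;
    final int threads;

    ServerOptionsSketch(String jiniUrl, int port, int threads) {
        this.jiniUrl = jiniUrl;
        this.port = port;
        this.threads = threads;
    }

    // Start from the properties file values, then let command line options override them.
    static ServerOptionsSketch resolve(Properties props, String[] args) {
        String jini = props.getProperty("made.jiniUrl", "");
        int threads = Integer.parseInt(props.getProperty("num.threads", "1").trim());
        int port = DEFAULT_PORT;
        for (int i = 0; i + 1 < args.length; i++) {
            switch (args[i]) {
                case "-jini":
                    jini = args[++i];
                    break;
                case "-port":
                    port = Integer.parseInt(args[++i]);
                    break;
                case "-threads":
                    threads = Integer.parseInt(args[++i]);
                    break;
                default:
                    break; // this sketch ignores anything it does not recognize
            }
        }
        return new ServerOptionsSketch(jini, port, threads);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("made.jiniUrl", "auto");
        props.setProperty("num.threads", "1");
        ServerOptionsSketch o = resolve(props, new String[] {"-jini", "test", "-threads", "4"});
        System.out.println("jini=" + o.jiniUrl + " port=" + o.port + " threads=" + o.threads);
    }
}
```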
A csh script, <D2K-SERVER-INSTALL-DIRECTORY>/run_standalone.sh, is included in the D2K Server installation package to run the server in standalone mode. This file will launch the D2K Server in the most basic way:
#!/bin/csh
set CLASSPATH=log4j-1.2.5.jar:infrastructure.jar
java -server -Xms1024M -Xmx1024M -cp $CLASSPATH ncsa.d2k.core.proximities.remote.D2K
In the classpath, there are two jar files. The first, log4j-1.2.5.jar, contains support classes the D2K Server needs to run. The second, infrastructure.jar, contains the D2K Server itself.
To configure the D2K Toolkit to use standalone servers, access the Preferences in the Edit menu. In the Preferences window, click on the Environment tab. The last button in this panel will allow you to enter the Jini URL. For use in the standalone mode, this field should be set to "test". Next, create a text file named "machines.txt" in your home directory. This text file needs to contain, on each line, the name of the machine running a D2K Server, and its number of processors. Below is an example of the contents of a machines.txt file:
sausage.ncsa.uiuc.edu, 8
cantaloupe.ncsa.uiuc.edu, 4
This indicates two machines running D2K Servers, one
named sausage.ncsa.uiuc.edu, and another named
cantaloupe.ncsa.uiuc.edu.
The first machine has 8 processors, the second has 4 processors.
When the D2K Toolkit Proximity Editor is displayed,
these two machines will be displayed in addition to the local
machine.
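To make the machines.txt layout concrete, here is a minimal sketch that parses lines of the form "<machine name>, <processors>". The class MachinesFileSketch is hypothetical and is not the parser the D2K Toolkit uses; in particular, skipping blank lines and rejecting malformed lines are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for the machines.txt layout; not the Toolkit's own code.
class MachinesFileSketch {

    static final class Machine {
        final String host;
        final int processors;

        Machine(String host, int processors) {
            this.host = host;
            this.processors = processors;
        }
    }

    // Each non-blank line must look like "<machine name>, <processors>".
    static List<Machine> parse(List<String> lines) {
        List<Machine> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skipping blank lines is an assumption of this sketch
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected '<machine name>, <processors>': " + line);
            }
            result.add(new Machine(parts[0].trim(), Integer.parseInt(parts[1].trim())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "sausage.ncsa.uiuc.edu, 8",
            "cantaloupe.ncsa.uiuc.edu, 4");
        for (Machine m : parse(lines)) {
            System.out.println(m.host + " (" + m.processors + ")");
        }
    }
}
```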
On the server side, the entry in the .d2kV4.props file specifying the Jini URL (made.jiniUrl) must also be set to "test". Alternatively, you could add the option to the command line:
java -server -Xms1024M -Xmx1024M -cp $CLASSPATH ncsa.d2k.core.proximities.remote.D2K -jini test
The D2K Toolkit discovers D2K Servers available for module processing based on your distributed computing configuration. If you have configured your Toolkit for standalone distributed computing, servers will be read from the machines.txt file in your home directory. On the other hand, if you have configured your Toolkit for Jini-enabled distributed computing, servers will be retrieved from the Jini registry service specified in the Jini URL Environment preference.
The Proximity Editor is used to assign modules of an itinerary to D2K Servers for processing. When an itinerary is run, the D2K Toolkit will automatically handle the distribution of module execution across the specified machines. The Proximity Editor is displayed using a table of machines and modules. The column labels across the top are the names of machines. In parentheses, next to the machine name, is the number of processors available on that machine. The row labels along the left hand side are module names of the currently loaded itinerary. The checkboxes in the table cells allow the user to associate a module with a machine for processing.
When assigning modules to machines for processing, some
important limitations do exist. User interface modules
cannot be assigned to remote machines. Since remote machines
are not capable of displaying user interfaces on the local
machine, this is a logical limitation. Although most modules
can only be assigned to one machine, modules that extend
ReentrantComputeModule or
OrderedReentrantModule can be assigned to
multiple machines. Reentrant modules can clone themselves
and run in tandem on different servers, operating
simultaneously on different pieces of data.
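The assignment limitations described above can be summarized as a small rule check. The sketch below is purely illustrative: the class ProximityRulesSketch, the Kind enum, and the check method are invented for this example and do not correspond to D2K classes, and the sketch assumes that a module with no remote machines assigned simply runs locally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative encoding of the assignment limitations; not D2K code.
class ProximityRulesSketch {

    // REENTRANT stands for modules extending ReentrantComputeModule or OrderedReentrantModule.
    enum Kind { USER_INTERFACE, STANDARD, REENTRANT }

    // Returns null when the assignment is allowed, otherwise the reason it is rejected.
    static String check(Kind kind, List<String> remoteMachines) {
        if (remoteMachines.isEmpty()) {
            return null; // assumption: nothing assigned means the module runs locally
        }
        if (kind == Kind.USER_INTERFACE) {
            return "user interface modules cannot be assigned to remote machines";
        }
        if (kind == Kind.STANDARD && remoteMachines.size() > 1) {
            return "only reentrant modules can be assigned to multiple machines";
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> one = Collections.singletonList("machine1.my.domain");
        List<String> two = Arrays.asList("machine1.my.domain", "machine2.my.domain");
        System.out.println(check(Kind.USER_INTERFACE, one));
        System.out.println(check(Kind.STANDARD, two));
        System.out.println(check(Kind.REENTRANT, two));
    }
}
```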
By default, distributed computing is disabled for all itineraries. The default port number for D2K Servers is 7021. The D2K Toolkit will expect to find all proximities running on the port specified in the Proximity Editor.
Although the D2K Toolkit provides a convenient environment for creating and running itineraries, you will occasionally need to modify and execute itineraries without using the Toolkit's graphical user interface (GUI). For example, a computationally intensive itinerary may require a very powerful machine to finish execution in a reasonable amount of time. Such a machine may have limited video capabilities and thus would be unable to display the Toolkit's user interface. Or, in another example, an itinerary may need to be run 1000 times, each run with a slightly different property value. Rather than set the property after each run using the Toolkit interface, it would be more practical to automate the process by "scripting" the change in value. Both scenarios would benefit from a "GUI-less" mode of using D2K. Using D2K without the GUI is referred to as running headless. Each execution mode (GUI and Headless) is activated by passing the java command line the appropriate class name. More on this in the next section.
To run D2K via the command line the user may use one of the following:
To run the Toolkit with the GUI, use the following syntax:
java [JAVA OPTIONS] ncsa.d2k.gui.ToolKit [D2K OPTIONS]
To execute D2K headless, use the following syntax:
java [JAVA OPTIONS] ncsa.d2k.batch.CommandLineD2K [D2K OPTIONS]
Where D2K OPTIONS are:
-noremoteclassloader disable remote class loading.
-load <file name> This option specifies an itinerary to load. If a full path name is not specified, D2K will look in the itineraries directory to find the itinerary. If this is a headless execution, the itinerary will be loaded, any script specified will be applied, and the itinerary will execute. Otherwise, the itinerary will simply be loaded into the D2K Toolkit Workspace.
-jini <jiniurl> Identifies the Jini URL to use when searching for Jini enabled D2K services. This option overrides the setting in the D2K properties file.
-script <filename> If this option is included, the script in the given filename will be applied to the loaded itinerary, or if no itinerary is loaded, the script can be used to create an itinerary. This option is ignored when running the Toolkit with GUI.
-threads <number threads> Use this option to specify the number of threads D2K should create and employ for the execution of the itinerary. This value is typically equal to the number of processors on the machine running D2K. This option also overrides the setting in the D2K properties file.
-vis <vis file name> This option is used to display a previously saved visualization. Naturally it is ignored when running headless.
When D2K is run in headless mode, models and visualizations are automatically saved to the model and visualizations directories specified in the preferences.
D2K supports a number of scripting commands. These commands can be stored in a text file and applied to an existing itinerary or they can create a completely new itinerary. Supported commands are add, remove, link, unlink, set, assign, unassign, machine and port.
add <module name> <class name> Add an instance of a module of the given class name to the itinerary, and name it "module name".
set <module name> <property name> <value> Set the property named "property name" of the module named "module name" to the "value". The property name here is the name as determined by the name of the setter/getter methods.
remove <module name> Remove the module named "module name" from the itinerary.
link <parent module name> <output port index> <child module name> <input port index> Connect the module named "parent module name" to the module named "child module name". Parent's output port is indicated by "output port index" and the input port is indicated by "input port index".
unlink <parent module name> <output port index> Disconnect the port at "output port index" from the module with the name "parent module name".
assign <module name> <machine> Assign the module named "module name" to the machine named "machine".
unassign <module name> <machine> Remove the module from the machine named "machine".
machine <machine name> <processors> Identify a machine to the system. The machine's name, "machine name", is used as its network address, and the last integer argument specifies the number of processors on that machine.
port <remote port> This command declares the port where remote d2k servers will listen for connections.
The following script illustrates how the above commands might be used:
set 'Apriori' minimumSupport '40.0'
set 'Compute Confidence' confidence '90.0'
remove 'Rule Visualization'
remove 'RuleAssocReport'
remove 'FanOut1'
add 'Headless Rule Assoc Report' 'ncsa.d2k.modules.core.discovery.ruleassociation.HeadlessRuleAssocReport'
link 'Compute Confidence' 0 'Headless Rule Assoc Report' 0
machine machine1.my.domain 4
port 7070
assign 'Apriori' machine1.my.domain
The first command of this script sets a property named
"minimumSupport" to the value 40 in the module
named Apriori. Next we set the confidence property in a
module named Compute Confidence to 90. We then remove
"Rule Visualization". We also remove "RuleAssocReport"
and "FanOut1". We add a new module of class
ncsa.d2k.modules.core.discovery.ruleassociation.HeadlessRuleAssocReport
and name it "Headless Rule Assoc Report". Then we link the
first output port of "Compute Confidence" to the first
input port of "Headless Rule Assoc Report".
The last three lines assign proximities to the itinerary.
A machine named machine1.my.domain with 4 processors is identified to
the toolkit.
Then the toolkit is informed that the D2K Servers on remote machines are listening
on port 7070.
Finally, the Apriori module is set to run on the remote machine "machine1.my.domain".
Comment lines can also be included. These lines must start with a "#".
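To illustrate how a script line in the style shown above breaks down into a command and its arguments, the following sketch splits a line on whitespace while keeping single-quoted text together, and ignores comment lines. It is a guess at the syntax based only on the example script; the class ScriptLineSketch is hypothetical, and D2K's own script reader may handle quoting differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer for script lines; D2K's actual script reader may differ.
class ScriptLineSketch {

    // Splits on whitespace, keeping single-quoted text together.
    // Blank lines and lines starting with "#" yield an empty list.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) {
            return tokens;
        }
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (char c : s.toCharArray()) {
            if (c == '\'') {
                inQuotes = !inQuotes; // quotes delimit a token but are not part of it
                hasToken = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("set 'Compute Confidence' confidence '90.0'"));
        System.out.println(tokenize("# a comment line"));
    }
}
```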
When running D2K headless, standard user interface modules that subclass
UIModule will not run. If such a module is enabled during a headless D2K
run, an error message will be displayed.
To circumvent this problem, wherever possible user interface modules
should subclass HeadlessUIModule rather than UIModule.
Even though a default implementation of the doit() method is
provided in this superclass, if a user interface can perform a
useful operation without a gui, it should provide the code to do
so in the doit() method.
For more information about developing user interface modules for headless mode, please refer to the Principles of Module Development at http://alg.ncsa.uiuc.edu/tools/docs/d2k/principles/.
Here is an example batch file that runs the Apriori itinerary three times with different support values.
#!/bin/csh
# the classpath includes dom4j which is used to parse XML,
# antlr to parse the source codes and the jini stuff, log4j for logging.
set CP=/Users/foobar/projects/modules3/core/lib/dom4j-full.jar:/Users/foobar/tmp/anto/antlr-2.7.1:/Users/foobar/projects/lib/jini-core.jar:/Users/foobar/projects/lib/jini-ext.jar:/Users/foobar/projects/lib/log4j-1.2.5.jar:classes
# create the script file.
echo "set 'Apriori' minimumSupport '35.0'" > script
java -server -Xmx256M -Xms256M -cp $CP ncsa.d2k.batch.CommandLineD2K -load headless.itn -script script
# create the script file.
echo "set 'Apriori' minimumSupport '30.0'" > script
java -server -Xmx256M -Xms256M -cp $CP ncsa.d2k.batch.CommandLineD2K -load headless.itn -script script
# create the script file.
echo "set 'Apriori' minimumSupport '25.0'" > script
java -server -Xmx256M -Xms256M -cp $CP ncsa.d2k.batch.CommandLineD2K -load headless.itn -script script
In this batch file (Unix csh script), the Java classpath is first configured. For each run, a script file is generated by simply echoing the command that will change the support value property in the Apriori module. This command is described in the previous section. With this single command in the script file, D2K is run specifying the -load option including the itinerary to load, and the -script option that runs the generated script.
The minimumSupport property is changed to 35 for the first run, 30 for the second run, and 25 for the last run. None of these scripted modifications alter the saved itinerary. They only operate on the loaded itinerary.
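For a larger number of runs, the same pattern can be generated programmatically rather than written out by hand. The sketch below builds the one-line script and the headless command line for each support value. The class HeadlessBatchSketch and its methods are hypothetical, the classpath value "classes" is a placeholder, and the sketch only prints what it would run so that it works without a D2K installation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver that mirrors the batch file; it prints commands instead of running them.
class HeadlessBatchSketch {

    // The single scripting command that changes the support value for one run.
    static String scriptFor(double minimumSupport) {
        return "set 'Apriori' minimumSupport '" + minimumSupport + "'";
    }

    // The headless command line, using the same options as the batch file.
    static List<String> commandFor(String classpath, String itinerary, String scriptFile) {
        return new ArrayList<>(Arrays.asList(
            "java", "-server", "-Xmx256M", "-Xms256M", "-cp", classpath,
            "ncsa.d2k.batch.CommandLineD2K", "-load", itinerary, "-script", scriptFile));
    }

    public static void main(String[] args) {
        double[] supports = {35.0, 30.0, 25.0};
        for (double support : supports) {
            // A real driver would write this line to the script file and then start
            // the command, for example with java.lang.ProcessBuilder.
            System.out.println("script: " + scriptFor(support));
            System.out.println("command: " + String.join(" ", commandFor("classes", "headless.itn", "script")));
        }
    }
}
```

Because each run loads the saved itinerary afresh, the generated scripts stay independent of one another, just as in the batch file.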
Following is the output of the runs:
Setting minimumSupport in Apriori to 35.0
Apriori: Identified 131 frequent itemsets with 2 items that met the support criteria.
Apriori: Identified 310 frequent itemsets with 3 items that met the support criteria.
Apriori: Identified 379 frequent itemsets with 4 items that met the support criteria.
Apriori: Identified 250 frequent itemsets with 5 items that met the support criteria.
Apriori: Identified 84 frequent itemsets with 6 items that met the support criteria.
Apriori: A total of 1154 frequent itemsets were found that met the specified Minimum Support of 35.0%.
Apriori: Elapsed wallclock time was 6.331 seconds
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 2 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 3 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 4 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 5 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 6 attributes.
Compute Confidence: A total of 59 rules were found that met the specified Minimum Confidence of 80.0%.
Error: Can't run headless Can't run RuleAssocReport without a gui, it must subclass HeadlessUIModule.
Compute Confidence: Elapsed Wallclock time was 0.704 Seconds
Unused inputs in RuleAssocReport on input #0
Apriori: Total Elapsed Wallclock Time was 7.17 Seconds
Agenda execution complete, elapsed time : 7184
Setting minimumSupport in Apriori to 30.0
Apriori: Identified 163 frequent itemsets with 2 items that met the support criteria.
Apriori: Identified 455 frequent itemsets with 3 items that met the support criteria.
Apriori: Identified 725 frequent itemsets with 4 items that met the support criteria.
Apriori: Identified 712 frequent itemsets with 5 items that met the support criteria.
Apriori: Identified 441 frequent itemsets with 6 items that met the support criteria.
Apriori: A total of 2496 frequent itemsets were found that met the specified Minimum Support of 30.0%.
Apriori: Elapsed wallclock time was 10.323 seconds
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 2 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 3 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 4 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 6 attributes.
Compute Confidence: A total of 477 rules were found that met the specified Minimum Confidence of 80.0%.
Error: Can't run headless Can't run RuleAssocReport without a gui, it must subclass HeadlessUIModule.
Compute Confidence: Elapsed Wallclock time was 1.835 Seconds
Unused inputs in RuleAssocReport on input #0
Apriori: Total Elapsed Wallclock Time was 12.182 Seconds
Agenda execution complete, elapsed time : 12231
Setting minimumSupport in Apriori to 25.0
Apriori: Identified 241 frequent itemsets with 2 items that met the support criteria.
Apriori: Identified 749 frequent itemsets with 3 items that met the support criteria.
Apriori: Identified 1323 frequent itemsets with 4 items that met the support criteria.
Apriori: Identified 1433 frequent itemsets with 5 items that met the support criteria.
Apriori: Identified 1005 frequent itemsets with 6 items that met the support criteria.
Apriori: Elapsed wallclock time was 19.583 seconds
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 2 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 3 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 4 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 5 attributes.
Compute Confidence: Beginning to compute confidence for frequent itemsets containing 6 attributes.
Compute Confidence: A total of 967 rules were found that met the specified Minimum Confidence of 80.0%.
Error: Can't run headless Can't run RuleAssocReport without a gui, it must subclass HeadlessUIModule.
Compute Confidence: Elapsed Wallclock time was 2.966 Seconds
Unused inputs in RuleAssocReport on input #0
Apriori: Total Elapsed Wallclock Time was 22.689 Seconds
Agenda execution complete, elapsed time : 22701
Notice that errors were generated. The itinerary that was run included a UIModule
that did not support headless execution. UIModules that do not subclass
HeadlessUIModule are ignored during itinerary execution. In this case, it
did not affect itinerary execution.