The D2K Toolkit


Installing the D2K Toolkit

D2K requires a Java Virtual Machine (JVM), J2SE (Java 2 Standard Edition) version 1.3.1 or later. If you do not already have a suitable JVM installed, first download one from http://java.sun.com. Remove any earlier JVM installations to avoid conflicts; if you prefer to keep an earlier JVM, a workaround is described below. Users planning to develop D2K Modules or other D2K components should download a Java Software Development Kit (SDK), which includes Java development tools as well as a Java Runtime Environment (JRE). Other users need only the JRE.

Next, navigate to http://alg.ncsa.uiuc.edu and download a D2K installer. Site registration may be required. If you already have an installer, locate it on your hard disk. Quit all running programs, then double click on the installer icon. The installer wizard will guide you through the rest of the installation process.

If you have an earlier JVM installed that you do not want to remove, you will need to edit the <D2K-INSTALL-DIRECTORY>/d2k.lax file. Find and change the lax.nl.current.vm property to point to the correct JVM.

The d2k.lax file contains other important properties that all users should be aware of when installing. For example, the Java classpath and the maximum heap size can be changed by editing the lax.class.path and lax.nl.java.option.additional properties respectively. To make the D2K Toolkit aware of already installed .jar files, modify the Java classpath property.
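As a rough illustration of the properties named above, the following Java sketch parses .lax-style text and appends a jar to the classpath value. It assumes the file can be read as plain key=value lines by java.util.Properties, which may not hold for every installation (for example, unescaped Windows backslashes would be altered on load). The LaxSettings class, the sample paths, and the jar names are hypothetical and are not part of D2K.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch only: inspect d2k.lax-style settings as key=value properties. */
final class LaxSettings {

    /** Parses .lax-style text, assuming plain key=value lines (an assumption). */
    static Properties parse(String laxText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(laxText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Appends one entry to a classpath value using the given separator. */
    static String appendToClasspath(String current, String entry, String separator) {
        if (current == null || current.isEmpty()) {
            return entry;
        }
        return current + separator + entry;
    }

    public static void main(String[] args) {
        // Hypothetical values; real paths depend on the installation.
        String sample =
              "lax.nl.current.vm=/opt/java/bin/java\n"
            + "lax.class.path=lib/d2k.jar\n"
            + "lax.nl.java.option.additional=-Xmx512m\n";

        Properties props = parse(sample);

        // Make an already installed jar visible by extending the classpath.
        String classpath = appendToClasspath(
            props.getProperty("lax.class.path"), "lib/extra.jar", ":");
        props.setProperty("lax.class.path", classpath);

        System.out.println("JVM:       " + props.getProperty("lax.nl.current.vm"));
        System.out.println("Classpath: " + props.getProperty("lax.class.path"));
        System.out.println("Options:   " + props.getProperty("lax.nl.java.option.additional"));
    }
}
```

In practice, editing d2k.lax by hand in a text editor is the simpler route; the sketch is only meant to show how the three properties relate to one another.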

The D2K Toolkit does not include any database drivers. These will need to be downloaded separately. Place any database drivers in <D2K-INSTALL-DIRECTORY>/modules or add the drivers to the classpath in the d2k.lax file as described above.
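One way to confirm that a driver jar is visible is to try loading its driver class. The sketch below is a generic Java check, not a D2K facility; the class name com.example.jdbc.Driver is a placeholder for whatever class your database vendor documents. It only reports on the classpath of the JVM that runs the check, so it is meaningful only when run with the same classpath that D2K uses.

```java
/** Sketch only: check whether a driver class is visible on the current classpath. */
final class DriverCheck {

    /** Returns true if the named class can be loaded by this JVM. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder class name; pass the real driver class as the first argument.
        String driver = args.length > 0 ? args[0] : "com.example.jdbc.Driver";
        System.out.println(driver + (isOnClasspath(driver) ? " found" : " not found"));
    }
}
```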

You are now ready to launch the D2K Toolkit.

D2K Toolkit Window

The D2K Toolkit, shown in figure 6, provides the most flexible and feature-rich user interface for composing itineraries and controlling knowledge discovery tasks. It supports the creation of data flow graphs where different data mining methods can build and save models, and where results can be visualized. The D2K Toolkit was designed to effectively use screen real estate, with panes that can be expanded on the side and bottom when needed.

Figure 6. D2K Toolkit Window

1  Workspace

The Workspace is the large region on the right side of the Toolkit window where data mining applications are constructed. Modules are connected to create itineraries in this space.

2  Resource Panel

The Resource Panel is the area to the left of the Workspace that contains software components required to build a complete itinerary or data mining application. These components are persistent, and are available to every session. Tabs are provided to access each type of component — Modules, Models, Itineraries, and Visualizations. Expand the Resource Panel by clicking on a tab. The Workspace will slide to the right and the contents of the selected tab will be revealed. To close the Resource Panel, click the active tab.

3  Modules

A Module is a computational unit that follows the D2K Module API. Click on the Modules tab in the Resource Panel to view all the modules in your toolkit. The modules listed are from the modules directory specified in the Preferences. Modules may be viewed by Java package hierarchy or by D2K Module type. When a module is selected here, its associated documentation is shown in the Component Info Pane. Modules are incorporated into the Workspace using drag and drop. Once in the workspace, they can be connected to other modules.

4  Models

A model is a structure that summarizes or partially summarizes a set of data. D2K was created to facilitate building and applying models. Click on the Models tab in the Resource Panel to view all the saved models in your toolkit. The models listed are from the models directory specified in the Preferences. A model is incorporated into the Workspace using drag and drop where it can be connected to other modules. Models appear in the Resource Panel after they have been saved from the Generated Models Session Pane.

5  Itineraries

An Itinerary is essentially an application — a group of modules connected together to perform a certain task. Click on the Itineraries tab in the Resource Panel to view the itineraries in your toolkit. The itineraries listed are from the itineraries directory specified in the Preferences. When an itinerary is selected in the Resource Panel, the annotation associated with this itinerary is shown in the Component Info Pane. Double click on an itinerary icon to load that itinerary into the Workspace. Drag and drop an itinerary icon to place an itinerary as a single module. This module is called a "nested itinerary." Any inputs and outputs not satisfied in the itinerary become inputs and outputs of the nested itinerary.
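The rule for nested itineraries can be pictured as a set difference: every port that takes part in a connection inside the itinerary is satisfied, and whatever remains is exposed on the nested module. The Java sketch below illustrates that idea with plain strings. It does not use the D2K Module API, and the module and port names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: which ports of an itinerary stay open when it is nested. */
final class NestedItineraryPorts {

    /** Returns the ports that take part in no connection, in their original order. */
    static List<String> unsatisfied(List<String> allPorts, Set<String> connectedPorts) {
        List<String> open = new ArrayList<>();
        for (String port : allPorts) {
            if (!connectedPorts.contains(port)) {
                open.add(port);
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Invented itinerary: reader -> filter -> trainer.
        List<String> inputs = Arrays.asList("filter.in", "trainer.data", "trainer.params");
        List<String> outputs = Arrays.asList("reader.table", "filter.out", "trainer.model");

        // Connections: reader.table -> filter.in, filter.out -> trainer.data.
        Set<String> connectedInputs = new HashSet<>(Arrays.asList("filter.in", "trainer.data"));
        Set<String> connectedOutputs = new HashSet<>(Arrays.asList("reader.table", "filter.out"));

        // prints "Nested inputs:  [trainer.params]"
        System.out.println("Nested inputs:  " + unsatisfied(inputs, connectedInputs));
        // prints "Nested outputs: [trainer.model]"
        System.out.println("Nested outputs: " + unsatisfied(outputs, connectedOutputs));
    }
}
```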

6  Visualizations

Visualizations are graphical representations of data. In the D2K Toolkit, visualization objects are saved with the data necessary to recreate the visualization. Click on the Visualizations tab in the Resource Panel to view the saved visualizations in your toolkit. The visualizations listed are from the visualizations directory specified in the Preferences. Double click on a visualization icon to launch that visualization application. Visualizations appear in the Resource Panel after they have been saved from the Generated Visualizations Session Pane.

7  Generated Visualizations Session Pane

Visualizations generated during the current session appear in this pane. The pane can be opened by clicking on the icon in the top right corner, or by dragging the title bar up. Generated visualizations must be saved or they will be lost at the end of the session. Right-click on a visualization icon to launch, save or discard the visualization. Saved visualizations appear in the Visualizations Pane of the Resource Panel. Double click a visualization icon to launch the visualization. The text on the title bar of the Generated Visualizations Session Pane changes to red when visualizations are present.

8  Generated Models Session Pane

Models generated during the current session appear in this pane. The pane can be opened by clicking on the icon in the top right corner, or by dragging the title bar up. Generated models must be saved or they will be lost at the end of the session. Right-click on a model icon to save or discard the model. Saved models appear in the Models Pane of the Resource Panel. A model can be dragged directly from the Generated Models Session Pane into the Workspace. You will be prompted to save the model if it is not already saved. Once in the Workspace, the model can be connected just like any other module. The text on the title bar of the Generated Models Session Pane changes to red when models are present.

9  Component Info Pane

This pane shows information about the currently selected module or itinerary. When a module is selected in the Resource Panel or in the Workspace, its associated information is displayed. Displayed information includes descriptions of the module function, inputs, outputs and properties. When an itinerary is selected in the Resource Panel, its annotation is displayed.

10  Toolbar

The Toolbar contains a row of icons representing some of the most frequently used functions in D2K. A more detailed description of these icons is given in the next section.

11  Console

The Console displays output and error messages generated by the Toolkit. This feature is enabled and disabled in the Toolkit Preferences.

D2K Toolbar

The D2K Toolbar, shown in figure 7, allows rapid access to common D2K functions. The Toolbar is a tear-off region that may be positioned anywhere on the screen. The first set of buttons deals with loading and saving components. The second set of buttons provides tools for connecting modules in the Workspace. One of these connection buttons is always selected. The buttons to the far right control itinerary execution.

Figure 7. D2K Toolbar

1  Load itinerary

Opens a File Browser for selecting an itinerary to load into the Workspace. If the current itinerary has not been saved, the user will be prompted to save the itinerary.

2  Save itinerary

Allows the user to save the current itinerary. If the current itinerary has never been saved, a File Browser will be displayed requesting a filename.

3  Reload module classes

Reloads the module classes and current itinerary. This feature is used by module developers after module sources have been edited and recompiled. If the current itinerary has not been saved, the user will be prompted to save the itinerary.

4  Edit annotation

Opens a window where a description and notes about the current itinerary can be entered.

5  Connect

Tool for connecting modules in the Workspace. Clicking and dragging from the output port of one module to the input port of another module creates the connection.

6  Queue multiple inputs

Tool for creating a special module that allows the user to connect multiple output ports to one input port. By selecting this tool, and clicking and dragging from an output port, a Queue multiple inputs module is placed in the Workspace. Once this module is placed, control will be returned to the Connect Tool. Other outputs can then be connected to the input of this special module. With the Connect Tool selected, pressing and holding the "i" key and clicking and dragging from an output port also places this module in the Workspace.

7  Generate multiple outputs

Tool for creating a special module that allows the user to connect a single output port to multiple input ports. By selecting this tool and clicking and dragging from an output port, a Generate multiple outputs module is placed in the Workspace. Once the module is placed, control will be returned to the Connect Tool. This special module can now be connected to multiple input ports. With the Connect Tool selected, pressing and holding the "o" key and clicking and dragging from an output port also places this module in the Workspace.

8  Outer Itinerary

Changes the itinerary being displayed to the containing itinerary. This tool becomes active only after you have drilled down into a nested itinerary.

9  Run

Executes the current itinerary.

10  Pause

Temporarily halts the currently executing itinerary. When the Pause operation has completed, the Checkpoint button is enabled.

11  Checkpoint

Opens a Save Checkpoint dialog for saving progress made during itinerary execution. The itinerary, serialized modules, and pending inputs are saved to a file in a location specified by the user. The checkpoint file is essentially an itinerary that can be restarted. However, checkpoints saved in the itineraries directory will not be displayed in the Resource Panel. Use Load itinerary in the File menu to load a checkpoint file.

Automatic checkpointing can be enabled using the "Configure Checkpointing" tool found in the "Tools" menu. For more information, see the "Tools" menu item descriptions below.
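The Checkpoint description mentions serialized modules and pending inputs. The actual checkpoint file format is not described here, so the following Java sketch only illustrates the general idea of saving and restoring state with standard Java serialization. CheckpointSketch and ModuleState are hypothetical names and are not D2K classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Sketch only: save and restore state with standard Java serialization. */
final class CheckpointSketch {

    /** Hypothetical stand-in for a module's state plus one pending input. */
    static final class ModuleState implements Serializable {
        private static final long serialVersionUID = 1L;
        final int rowsProcessed;
        final String pendingInput;

        ModuleState(int rowsProcessed, String pendingInput) {
            this.rowsProcessed = rowsProcessed;
            this.pendingInput = pendingInput;
        }
    }

    /** Serializes the state to a byte array (a file stream would work the same way). */
    static byte[] save(Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back an object previously written by save. */
    static Object restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] checkpoint = save(new ModuleState(42, "batch-7"));
        ModuleState resumed = (ModuleState) restore(checkpoint);
        System.out.println(resumed.rowsProcessed + " rows done, pending " + resumed.pendingInput);
    }
}
```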

12  Abort

Completely stops the currently executing itinerary. The abort has completed once Run is the only execution button that is enabled.

13  Process Busy Bar

Indicates, with a bar moving back and forth, that the D2K Toolkit is running the itinerary.

D2K Menubar

File

Load Itinerary...   Opens a File Browser for selecting an itinerary to load into the Workspace. If the current itinerary has not been saved, the user will be prompted to save the itinerary.

Reload Module Classes  Reloads the module classes and current itinerary. Used by module developers after module sources have been edited and recompiled. If the current itinerary has not been saved, the user will be prompted to save the itinerary.

New Itinerary  Clears the current itinerary from the Workspace. If changes have been made to the itinerary, the user is prompted to save the itinerary before the Workspace is cleared.

New Window  Launches a new D2K Toolkit window.

Save Itinerary  Allows the user to save the current itinerary. If the current itinerary has never been saved, a File Browser will be displayed requesting a filename.

Save Itinerary As...   Opens a File Browser for saving the current itinerary to a new file.

Print  Prints an image of the itinerary currently in the Workspace.

Quit  Exits the D2K Toolkit.



Edit

Cut  Cut selected modules from the Workspace.

Copy  Copy selected modules in the Workspace.

Paste  Paste previously cut or copied modules into the Workspace.

Clear  Deletes the currently selected modules from the Workspace.

Preferences...  Opens the Preferences dialog window. See the Setting Preferences section for details.

Undo  Undo the last mouse operation performed in the Workspace.

Redo  Redo the last undone mouse operation in the Workspace.



Tools

Itinerary Annotation...   Opens a window where a description and notes about the current itinerary can be entered.

Configure Checkpointing...   Opens a window where automatic checkpointing can be enabled and configured. If automatic checkpointing is enabled, the system will periodically pause an itinerary, create a checkpoint, and then continue without further user interaction. This setting is disabled by default and is saved with the itinerary file.

Edit Proximities...   Opens the Proximities Editor. Proximities are the mechanism for distributing itinerary execution across multiple systems. Proximities are described in a later section.

Clear Generated Models   Deletes all models currently in the Generated Models Session Pane.

Clear Generated Visualizations   Deletes all visualizations currently in the Generated Visualizations Session Pane.

Save Console Contents...   Saves the contents of the console to a file that is specified by the user in the File Browser.

Clear Console   Clears the console.

Publish Modules   Publishes the module information for all modules available to the D2K environment in the directory specified in the Preferences.



Views

Snap to Grid   If this option is checked, the modules in the Workspace are moved in grid increments. Otherwise, modules are moved continuously with mouse movements.

Show Grid   If this option is checked, grid lines are shown in the Workspace. Grid size can be set in Preferences.

Legend   If this option is checked, a legend window displays the data types used in the current itinerary. When a module’s output port is brushed, the corresponding data type is highlighted in the legend window.

Show Machine Usage   If this option is checked, a translucent pane will overlay the Workspace to display information on machine usage.
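To make the Snap to Grid behavior concrete, the Java sketch below rounds a coordinate to the nearest multiple of the grid size. Whether D2K rounds to the nearest grid line or always rounds down is not stated in this guide, so nearest-line rounding is an assumption, and SnapToGrid is a hypothetical class rather than part of the Toolkit.

```java
/** Sketch only: snap a pixel coordinate to the nearest grid line. */
final class SnapToGrid {

    /** Rounds coordinate to the nearest multiple of gridSize; exact halves round up. */
    static int snap(int coordinate, int gridSize) {
        if (gridSize <= 0) {
            throw new IllegalArgumentException("gridSize must be positive");
        }
        // floorDiv keeps the rounding consistent for negative coordinates too.
        return Math.floorDiv(coordinate + gridSize / 2, gridSize) * gridSize;
    }

    public static void main(String[] args) {
        int grid = 24; // grid size in pixels
        System.out.println(snap(30, grid)); // prints 24
        System.out.println(snap(37, grid)); // prints 48
    }
}
```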



RAD

New Module...   Opens the Rapid Application Development editor tool for the creation of new D2K modules.

Edit Module...   Opens a File Browser for choosing a module, then opens the Rapid Application Development editor tool for editing the information methods of the selected module.

Edit Selected Module   Opens the Rapid Application Development editor tool for editing the information methods of the module currently selected in the Modules Pane of the Resource Panel. The source directory must be specified in the Preferences.



Help

Help...   Displays the online help tool.

FAQ...   Displays the online FAQ.

About D2K...   Displays the splash screen, D2K version number, current settings, and memory usage. Also contains a button to force garbage collection.


Setting Preferences

To set preferences for D2K, use Edit>Preferences. A tabbed pane will appear for setting the Controller, Environment, Machine and Logging preferences.


Controller

The Controller preferences, shown in figure 8, allow for customization of the D2K Workspace.

Figure 8. Controller Preferences

Grid Size   Grid Size is measured in pixels and can be adjusted by the user. The default value is 24 pixels.

Tree Type   The Tree Type setting allows the user to view D2K modules by either module type (e.g. Compute Modules, Input Modules, Output Modules, etc.) or by the Java package hierarchy.

Connector Lines   Module connections can be drawn as horizontal and vertical lines, or they can be drawn as angled lines. Select the layout you prefer using the radio buttons.

Sampling Frequency   Indicates how often system execution is sampled to update the module progress bars during itinerary execution. The value is an interval in milliseconds, so a setting of 500 updates progress twice a second.

Component Info Font Size   Specifies the font size to use in the Component Info Pane. Larger numbers result in larger text.

Use Console   If this option is checked, a Console is displayed as a panel in the D2K Toolkit and will display all output.

Console Buffer Size   Sets the maximum number of characters to show in the console. When the number of characters exceeds this limit, the console purges the first half of its contents from memory.
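The Console Buffer Size rule can be sketched in a few lines of Java. This is an illustration of the described behavior, not the Toolkit's implementation: ConsoleBuffer is a hypothetical class, and repeating the purge when one large append leaves the buffer over the limit is a choice made for the sketch.

```java
/** Sketch only: a text buffer that drops its oldest half when it grows too large. */
final class ConsoleBuffer {
    private final int maxChars;
    private final StringBuilder text = new StringBuilder();

    ConsoleBuffer(int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        this.maxChars = maxChars;
    }

    /** Appends output, then purges the first half while the limit is exceeded. */
    void append(String output) {
        text.append(output);
        while (text.length() > maxChars) {
            text.delete(0, text.length() / 2);
        }
    }

    String contents() {
        return text.toString();
    }

    public static void main(String[] args) {
        ConsoleBuffer console = new ConsoleBuffer(10);
        console.append("abcdefgh");   // 8 characters, under the limit
        console.append("ijkl");       // 12 characters, so the first 6 are purged
        System.out.println(console.contents()); // prints ghijkl
    }
}
```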



Environment

The Environment settings, shown in figure 9, tell the application where to find various files and components, and where to publish module descriptions. The values shown indicate the current settings. It is necessary to have these preferences set correctly so that modules, itineraries, models and visualizations can be found.

Figure 9. Environment Preferences

Set Source Directory   Specifies the directory where Java source files for modules are stored.

Set Modules Directory   Specifies the directory where Java class files for modules are stored.

Set Itinerary Directory   Specifies the directory where itineraries are saved.

Set Model Directory   Specifies the directory where models are saved.

Set Visualization Directory   Specifies the directory where visualizations are saved.

Set Publish Directory   Specifies the local directory where module descriptions can be saved using Tools>Publish Modules.

Set Jini URL   Specifies the URL for the Jini service that is used to discover which machines are available for remote computation.



Machine

The Machine preferences, shown in figure 10, help configure D2K to run well on a variety of systems.

Figure 10. Machine Preferences

Slow Machine   If this option is checked, the D2K window is redrawn less frequently.

Number of Threads   This setting tells D2K the number of threads to use for executing itineraries on the local machine.



Logging

The Logging preferences, shown in figure 11, allow the user to select how much debugging output is displayed or saved. There are five logging levels: debug, info, warn, error, and fatal. Debug provides the most detail, and fatal the least. Logging features are now more fully supported than in previous D2K releases.

Figure 11. Logging Preferences

D2K Core Logging Level   Controls the level of output produced by the D2K Toolkit.

D2K Modules Default Logging Level   Controls the level of output produced by modules that support this feature.

Logging Targets   Allows the user to choose where logging output should be directed.
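The five logging levels are ordered from most to least detail. Assuming the usual threshold behavior of logging libraries, where a configured level lets through messages at that level and at every more severe level, the ordering can be sketched in Java as follows. LoggingThreshold and its Level enum are hypothetical and do not represent the D2K logging classes.

```java
/** Sketch only: threshold filtering over the five logging levels. */
final class LoggingThreshold {

    /** Levels in order of decreasing detail (increasing severity). */
    enum Level {
        DEBUG, INFO, WARN, ERROR, FATAL;

        /** True if a message at the given level passes this configured threshold. */
        boolean allows(Level message) {
            return message.ordinal() >= this.ordinal();
        }
    }

    public static void main(String[] args) {
        Level configured = Level.WARN;
        for (Level message : Level.values()) {
            System.out.println(message + " shown at " + configured + ": "
                + configured.allows(message));
        }
    }
}
```

With the threshold set to WARN, the sketch reports that warn, error, and fatal messages are shown while debug and info messages are suppressed.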