D2K - Data to Knowledge FAQs



General

1. How do I download D2K?

2. Which operating system do I need to run D2K?

3. Which Java virtual machine do I need to run D2K?

4. Launching a newly installed D2K Toolkit is failing, what could be wrong?

5. Where should I put jar files used by my modules?

6. How can I view the Java console when running D2K?

7. How can I change the Java virtual machine that runs D2K?

8. How can I add extra Jar files to the classpath for D2K?

9. What should be the classpath for a headless execution?

10. How can I increase the maximum heap size for D2K?

11. How can I save the output of D2K to a file?

12. What is a module?

13. What is an itinerary?

14. What is the modules directory?

15. How do I set the modules directory?

16. Where do I put my compiled modules?

17. What is the itineraries directory?

18. How do I set the itineraries directory?

19. What can I do when D2K runs out of memory?

20. I see Java 3D is required for some D2K Modules. Where do I get Java 3D?


D2K Toolkit

1. What is the D2K Toolkit?

2. Why doesn't the D2K Toolkit compile my modules?

3. How do I use the D2K Toolkit?

4. Can my module code throw an Exception?

5. How do I edit properties via a D2K script for a headless run?

6. Are objects passed by value or by reference via the itinerary pipes?

7. Out of Memory Error


Writing Modules

1. What are the different types of modules, and what do they do?

2. What is a ReentrantComputeModule?

3. What Jar files need to be in my classpath to compile modules?

4. How do I create a loop in the work flow?




General

1. How do I download D2K?

1) Register as a new user by following the provided link on this page: https://alg.ncsa.uiuc.edu/do/home/login/edit
2) Based on your stated academic affiliation, your account will either be automatically validated, or an ALG staff member will validate your account by hand and send you an email stating that you have been granted download privileges.
3) Upon receipt of the email, log into the ALG website with the username and password you chose when you registered. Proceed to download any product.
4) When you attempt to download, the ALG system will check for the presence of a download ticket. If you do not have a download ticket, one will be emailed to you.
5) Click on the link provided in the ticket email. This will bring you back to the ALG website and validate your ticket. Validated tickets are only active for 60 minutes. After that time, you will need to get a new ticket.
6) Follow the links on the ticket validation confirmation page to download any product you have access to.

2. Which operating system do I need to run D2K?

D2K runs on any operating system with the proper Java virtual machine installed. This includes but is not limited to: Windows 95/98/NT/2000, Unix (including Solaris, IRIX, and Linux), and Mac OS X.

3. Which Java virtual machine do I need to run D2K?

J2SE 1.3 or later is required. Because D2K has not been tested extensively with the latest 1.5 JVM, the latest installer of D2K 4.1.1 is packaged with a 1.4.x JVM.

4. Launching a newly installed D2K Toolkit is failing, what could be wrong?

The first time D2K is run, it creates several directories in the location where the D2K Toolkit is installed. If the user does not have permission to write to that directory, the launch will fail. Creating the directories manually or changing the permissions on this directory solves the problem. The directories to be created are: itineraries, models, modules, visualizations. They should all be subdirectories of the directory where the D2K Toolkit is installed.
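
As a rough sketch of the manual fix, the four subdirectories can be created ahead of time by a user who has write permission. The class below is a hypothetical helper, not part of D2K, and it assumes the install directory is passed as the first command-line argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of D2K: pre-creates the subdirectories
// that D2K would otherwise create on its first launch.
final class D2KDirectorySetup {
    // Directory names taken from the answer above.
    static final String[] SUBDIRECTORIES = {
        "itineraries", "models", "modules", "visualizations"
    };

    // Creates each subdirectory under installDir; existing directories are left untouched.
    static void createSubdirectories(Path installDir) throws IOException {
        for (String name : SUBDIRECTORIES) {
            Files.createDirectories(installDir.resolve(name));
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption of this sketch: the install directory is args[0], else the current directory.
        Path installDir = Paths.get(args.length > 0 ? args[0] : ".");
        createSubdirectories(installDir);
        System.out.println("Subdirectories ready under " + installDir.toAbsolutePath());
    }
}
```

Run it as a user with write access to the install directory; users without write access will still need the permissions changed before D2K can save into these directories.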

5. Where should I put jar files used by my modules?

Such jar files should be located under the modules directory. However, if a NoClassDefFoundError or ClassNotFoundException is still thrown, adding the jar to the lib directory usually solves the problem. Wherever you put the jar files, be sure they are added to the classpath D2K is using.

6. How can I view the Java console when running D2K?

The D2K console can be enabled or disabled through the Controller tab in the Preferences window. When enabled, the D2K console is displayed in a scrollable pane at the bottom of the D2K Toolkit window. This is especially useful on Windows 9x, which does not offer scrolling. The output can be saved to a file at any time by using the Save Console Contents option in the Tools menu. The standard console will direct output to the system console from which D2K was invoked. Output can be sent directly to a file from the console preferences as well.

7. How can I change the Java virtual machine that runs D2K?

D2K is installed using the InstallAnywhere product. On Windows and Unix platforms, the runtime parameters of D2K are customizable by editing the <INSTALL-DIRECTORY>/d2k.lax file. If you want to change the Java Virtual Machine, modify the lax.nl.current.vm item. On the Mac OS X platform, you should edit the <INSTALL-DIRECTORY>/D2KToolkit/D2KToolkit.app/Contents/Info.plist file. This file is only visible when using the Terminal application. Open the file, then search for "JVMVersion". Change the contents of the "string" element.

8. How can I add extra Jar files to the classpath for D2K?

On Windows and Unix platforms, the classpath for D2K can be modified by editing the d2k.lax file. Add any additional Jar files to the lax.class.path item. On the Mac OS X platform, you should edit the <INSTALL-DIRECTORY>/D2KToolkit/D2KToolkit.app/Contents/Info.plist file. This file is only visible when using the Terminal application. Open the file, then search for "ClassPath". Add an entry to the "array" element. For example, "<string>lib/my.jar</string>".

9. What should be the classpath for a headless execution?

The classpath for headless execution should consist of the modules directory and all jar files under the lib and the modules directory.
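
A minimal sketch of assembling such a classpath string is shown below. The class name is hypothetical, and it assumes that lib and modules sit directly under the install directory and that "under" includes jar files in nested subdirectories.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of D2K: builds a classpath string made of the
// modules directory plus every .jar file found under lib and modules.
final class HeadlessClasspath {
    static String build(File installDir) {
        File modules = new File(installDir, "modules");
        File lib = new File(installDir, "lib");
        List<String> entries = new ArrayList<>();
        entries.add(modules.getPath()); // the modules directory itself comes first
        addJars(lib, entries);
        addJars(modules, entries);
        return String.join(File.pathSeparator, entries);
    }

    // Recursively collects .jar files in sorted order; missing directories are skipped.
    private static void addJars(File dir, List<String> entries) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        Arrays.sort(children);
        for (File child : children) {
            if (child.isDirectory()) {
                addJars(child, entries);
            } else if (child.getName().endsWith(".jar")) {
                entries.add(child.getPath());
            }
        }
    }

    public static void main(String[] args) {
        // Assumption of this sketch: the install directory is args[0], else the current directory.
        File installDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(build(installDir));
    }
}
```

The printed string can then be passed to the java launcher with the -cp option when starting a headless run.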

10. How can I increase the maximum heap size for D2K?

The heap size can be changed by editing the d2k.lax file. The maximum heap size is set to 100 MB by default. This can be changed by editing the lax.nl.java.option.additional item. On the Mac OS X platform, you should edit the <INSTALL-DIRECTORY>/D2KToolkit/D2KToolkit.app/Contents/Info.plist file. This file is only visible when using the Terminal application. Open the file and search for "VMOptions". Within the "array" element, change the text "-Xmx250M" to a higher value. For example, "-Xmx512M".

11. How can I save the output of D2K to a file?

The contents of the D2K console can be saved to a file using the Save Console Contents option, or the InstallAnywhere product can direct the output to a file. To do this, disable the D2K console and edit the lax.stderr.redirect and lax.stdout.redirect items for standard error and standard out, respectively.
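
Before editing d2k.lax by hand, it can help to see which of the items mentioned in this section are currently set. The sketch below is a hypothetical helper, not part of D2K or InstallAnywhere, and it assumes the .lax file can be read as a standard Java properties file. The sample content in main is invented for illustration only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: reports the d2k.lax items named in this FAQ.
// Assumes the .lax file uses Java properties syntax (key=value lines).
final class LaxSettingsSketch {
    // Item names as given in the answers above.
    static final String[] ITEMS = {
        "lax.nl.current.vm",
        "lax.class.path",
        "lax.nl.java.option.additional",
        "lax.stdout.redirect",
        "lax.stderr.redirect"
    };

    // Returns each item with its current value, or "(not set)" when absent.
    static Map<String, String> read(Reader laxFile) throws IOException {
        Properties props = new Properties();
        props.load(laxFile);
        Map<String, String> found = new LinkedHashMap<>();
        for (String item : ITEMS) {
            found.put(item, props.getProperty(item, "(not set)"));
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample content; a real run would pass a FileReader for d2k.lax instead.
        String sample = "lax.class.path=lib/my.jar\n"
                      + "lax.nl.java.option.additional=-Xmx512M\n";
        for (Map.Entry<String, String> e : read(new StringReader(sample)).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```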

12. What is a module?

A module is a compiled Java class that performs some function. A module has 0 or more inputs and generates 0 or more outputs.

13. What is an itinerary?

An itinerary is a collection of modules connected together to perform a certain task.

14. What is the modules directory?

The modules directory is the directory that D2K looks in for your modules. Modules should go in here according to their package hierarchy. For example, if you have a module named Foo with the package org.bar, you would put this class in <MODULES-DIRECTORY>/org/bar.

15. How do I set the modules directory?

The modules directory is set in the user preferences dialog. Under the Edit menu, choose Preferences... Next, click the Environment tab. Now click on the Module Directory button. A file dialog box appears. Navigate to your modules directory, and click Choose Directory. By default, this is set to the directory named modules in the directory that D2K was installed in.

16. Where do I put my compiled modules?

Compiled modules go into your modules directory. Take care to preserve the package hierarchy of your modules. For example, if your module's class is in package foo.bar.modules, then it should be placed under the directory <modules_directory>/foo/bar/modules/
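
The mapping from package name to directory can be sketched as follows. The helper class and the module name MyModule are hypothetical and serve only to illustrate the layout rule.

```java
import java.io.File;

// Hypothetical helper, not part of D2K: shows where a compiled module's
// .class file belongs under the modules directory.
final class ModulePathResolver {
    // Converts a fully qualified class name into a path below modulesDir.
    static File classFileFor(File modulesDir, String className) {
        String relative = className.replace('.', File.separatorChar) + ".class";
        return new File(modulesDir, relative);
    }

    public static void main(String[] args) {
        File modulesDir = new File("modules");
        // MyModule is an invented name used for illustration.
        System.out.println(classFileFor(modulesDir, "foo.bar.modules.MyModule"));
    }
}
```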

17. What is the itineraries directory?

The itineraries directory is the directory that D2K looks in for your itineraries.

18. How do I set the itineraries directory?

The itineraries directory is set in the user preferences dialog. Under the Edit menu, choose Preferences... Next, click the Environment tab. Now click on the Itinerary Directory button. A file dialog box appears. Navigate to your itineraries directory, and click Choose Directory. By default, this is set to the directory named itineraries in the directory that D2K was installed in.

19. What can I do when D2K runs out of memory?

If you are loading data from a delimited file, you can reduce memory use by adding a data type line to your dataset. Data represented as text requires many more memory allocations, because each entry is associated with its own object. If the data for a column can be an int, float, or double, there is a single memory allocation for the entire column: one array of the primitive data type.
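
The difference can be illustrated in plain Java. This is only a sketch of the general idea, not D2K's loader: a text column keeps one String object per entry, while a numeric column fits in one primitive array.

```java
// Illustration only, not D2K code: converts a text column into a primitive array.
final class ColumnStorageSketch {
    // The String[] input holds one object per entry; the int[] result is a single allocation.
    static int[] toIntColumn(String[] textColumn) {
        int[] column = new int[textColumn.length];
        for (int i = 0; i < textColumn.length; i++) {
            column[i] = Integer.parseInt(textColumn[i].trim());
        }
        return column;
    }

    public static void main(String[] args) {
        String[] asText = {"10", "20", "30"};  // array plus three String objects
        int[] asInts = toIntColumn(asText);    // one array, no per-entry objects
        System.out.println(asInts.length + " values stored in one primitive array");
    }
}
```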

20. I see Java 3D is required for some D2K Modules. Where do I get Java 3D?

Java 3D is available from https://java3d.dev.java.net/


D2K Toolkit

1. What is the D2K Toolkit?

The D2K Toolkit provides the most flexible and feature-rich interface for creating itineraries and controlling knowledge discovery tasks. Very complex data flow graphs can be built to compare the accuracy of different data mining methods, to visualize results, and to save models for later use.

2. Why doesn't the D2K Toolkit compile my modules?

The Toolkit does not currently compile modules. This feature will probably be included at some point in the future.

3. How do I use the D2K Toolkit?

Please read the User Manual.

4. Can my module code throw an Exception?

Yes. If an Exception is thrown from the module's code, it is caught by the toolkit and shown in a message box detailing the type of the exception and its message, if any. It is therefore highly recommended to add messages to Exceptions thrown in the module's code.
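
A small sketch of the recommendation follows. The class is a stand-in written for this example and does not extend any D2K base class; the ratio method merely plays the role of a module's computation, and main imitates a report of exception type plus message.

```java
// Stand-in for module code; hypothetical, not a D2K class.
final class ModuleExceptionSketch {
    // Throws with a descriptive message so that the reported error is actionable.
    static double ratio(double numerator, double denominator) throws Exception {
        if (denominator == 0.0) {
            throw new Exception("ratio: denominator is zero; check the upstream module's output");
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        try {
            ratio(1.0, 0.0);
        } catch (Exception e) {
            // Imitates a report that shows the exception type and its message.
            System.out.println(e.getClass().getName() + ": " + e.getMessage());
        }
    }
}
```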

5. How do I edit properties via a D2K script for a headless run?

Please refer to "Advanced Topics" chapter, "Scripting Itinerary Modifications" section in the User Manual.

6. Are objects passed by value or by reference via the itinerary pipes?

Objects are passed by reference in an itinerary. A fan out does not create copies of the input object; it only duplicates the reference to that object. If two modules take their input from the same fan out and one of them alters the input object, the behavior of the other is unpredictable, because the order in which the two modules execute is not defined.
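
The hazard can be reproduced in plain Java without any D2K classes. In this sketch, two static methods stand in for the two modules fed by the fan out, and copyOf shows one possible safeguard (a defensive copy made by the module that mutates); all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain Java illustration of shared references; not D2K code.
final class FanOutSketch {
    // Stand-in for a module that alters its input in place.
    static void appendMarker(List<String> data) {
        data.add("modified");
    }

    // Stand-in for a module that only reads its input.
    static int countEntries(List<String> data) {
        return data.size();
    }

    // One possible safeguard: work on a private copy instead of the shared object.
    static List<String> copyOf(List<String> data) {
        return new ArrayList<>(data);
    }

    public static void main(String[] args) {
        List<String> shared = new ArrayList<>();
        shared.add("original");

        // Both "modules" hold the same reference, so the reader sees the writer's change
        // whenever the writer happens to run first.
        appendMarker(shared);
        System.out.println("reader sees " + countEntries(shared) + " entries");

        // With a defensive copy, the shared object is left untouched.
        List<String> untouched = new ArrayList<>();
        untouched.add("original");
        appendMarker(copyOf(untouched));
        System.out.println("reader sees " + countEntries(untouched) + " entry");
    }
}
```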

7. Out of Memory Error

One possible cause is running the Toolkit GUI with the console enabled, which may use significant amounts of resources. In that case, try either turning off the console (via Edit menu -> Preferences) or, better yet, running in nogui mode and redirecting the output to a file:

java ncsa.d2k.batch.CommandLineD2K -load itin > outputfile

Another possible cause is the "Sampling Frequency" that is set via the Preferences window. The default is to take readings every second. Try increasing the interval to every 20 minutes.


Writing Modules

1. What are the different types of modules, and what do they do?

There are six basic module types. ComputeModules perform a computation on some data. InputModules read data in from an input source. OutputModules output processed data in some format (to the console, to a file, etc). DataPrepModules prepare data in some way. UIModules show a graphical user interface to the user. VisModules show a visualization of results to the user. Modules should be serializable, in order to be able to run on a remote machine.

2. What is a ReentrantComputeModule?

Subclasses of this module can be cloned and run in parallel on multiple threads. When data is passed to this module, it will check to see if it has a clone that is not busy. If so, the data is passed to the clone. If there are no clones available, but there are more threads available in the thread pool, it will create another clone and pass the data to it. ReentrantComputeModules do not guarantee that the outputs will be produced in the same order as the inputs that produced them.
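
The ordering caveat can be illustrated with a generic thread pool. The sketch below does not use ReentrantComputeModule; a java.util.concurrent pool stands in for the clones, and tagging each result with its input position is one possible way (an assumption of this example, not a D2K feature) for downstream code to restore the original order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Generic illustration of out-of-order parallel results; not D2K code.
final class ReentrantOrderSketch {
    // A result tagged with the position of the input that produced it.
    static final class Tagged implements Comparable<Tagged> {
        final int index;
        final int value;

        Tagged(int index, int value) {
            this.index = index;
            this.value = value;
        }

        @Override
        public int compareTo(Tagged other) {
            return Integer.compare(index, other.index);
        }
    }

    // Squares each input on a worker thread; results are collected in completion order.
    static List<Tagged> squareInParallel(int[] inputs, int threads) throws InterruptedException {
        final List<Tagged> results = Collections.synchronizedList(new ArrayList<Tagged>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < inputs.length; i++) {
            final int index = i;
            final int value = inputs[i];
            pool.execute(() -> results.add(new Tagged(index, value * value)));
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Tagged> results = squareInParallel(new int[] {1, 2, 3, 4, 5}, 3);
        Collections.sort(results); // restore input order using the tags
        for (Tagged t : results) {
            System.out.println("input #" + t.index + " -> " + t.value);
        }
    }
}
```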

3. What Jar files need to be in my classpath to compile modules?

d2k.jar contains all the base classes from D2K. If you import any packages outside of the standard Java SDK, you will need to include those items in your classpath. Those Jar files will also need to be added to the classpath for D2K in order to use the modules within the D2K Toolkit. See "How can I add extra Jar files to the classpath for D2K?" in the General section for details.

4. How do I create a loop in the work flow?

In general, add another output pipe to the "last" module in the work flow and connect it to an additional input pipe on the "first" module in the itinerary. This connection can pass information encapsulated in a Java object, or simply send a trigger that tells the first module to start a new iteration. Be sure to override the isReady() method of the "first" module: it should report ready if this is the first call to doit() or if there is an object waiting in the input pipe that closes the loop.
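
The readiness rule can be sketched with a self-contained stand-in class. It does not extend any D2K base class, and apart from the isReady() and doit() names taken from the answer above, every field and method here (the feedback queue, pushFeedback, iterations) is hypothetical. A real module would use the input pipes that D2K provides.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Stand-in for the "first" module of a loop; hypothetical, not a D2K class.
final class LoopHeadSketch {
    // Plays the role of the extra input pipe that closes the loop.
    private final Queue<Object> feedbackPipe = new ArrayDeque<>();
    private boolean firstCall = true;
    private int iterations = 0;

    // Called by the "last" module in this sketch to request another iteration.
    void pushFeedback(Object trigger) {
        feedbackPipe.add(trigger);
    }

    // Ready on the very first call, or whenever the loop-closing pipe holds an object.
    boolean isReady() {
        return firstCall || !feedbackPipe.isEmpty();
    }

    // Runs one iteration; after the first call it consumes the waiting trigger.
    void doit() {
        if (firstCall) {
            firstCall = false;
        } else {
            feedbackPipe.remove();
        }
        iterations++;
    }

    int iterations() {
        return iterations;
    }

    public static void main(String[] args) {
        LoopHeadSketch head = new LoopHeadSketch();
        for (int i = 0; i < 3 && head.isReady(); i++) {
            head.doit();
            head.pushFeedback("again"); // the "last" module sends a trigger back
        }
        System.out.println("iterations run: " + head.iterations());
    }
}
```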






    Copyright © 2004 The Board of Trustees of the University of Illinois, All Rights Reserved