D2K - Data to Knowledge Overview



To support its research, the Automated Learning Group (ALG) has spent the last few years developing D2K, an application environment for data mining. D2K - Data to Knowledge is a rapid, flexible data mining and machine learning system that integrates analytical data mining methods for prediction, discovery, and deviation detection with data and information visualization tools. It offers a visual programming environment in which users connect software modules to build data mining applications, and it supplies a core set of modules, application templates, and a standard API for software component development. All D2K components are written in Java for maximum flexibility and portability.
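The idea of connecting modules to form an application can be illustrated with a minimal Java sketch. Every name here (PipelineModule, CleanModule, ParseModule, MeanModule, buildPipeline) is hypothetical and chosen only for illustration; none of them is part of the D2K API, which is described in the D2K documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are assumptions, NOT the D2K API.
public class ModulePipelineSketch {

    /** A module consumes one input and produces one output. */
    public interface PipelineModule<I, O> {
        O execute(I input);

        /** Connects this module's output to the next module's input. */
        default <R> PipelineModule<I, R> connect(PipelineModule<O, R> next) {
            return input -> next.execute(execute(input));
        }
    }

    /** Cleaning module: drops null or blank records and trims the rest. */
    static final class CleanModule implements PipelineModule<List<String>, List<String>> {
        @Override
        public List<String> execute(List<String> rows) {
            List<String> cleaned = new ArrayList<>();
            for (String row : rows) {
                if (row != null && !row.trim().isEmpty()) {
                    cleaned.add(row.trim());
                }
            }
            return cleaned;
        }
    }

    /** Transformation module: parses each record into a number. */
    static final class ParseModule implements PipelineModule<List<String>, List<Double>> {
        @Override
        public List<Double> execute(List<String> rows) {
            List<Double> values = new ArrayList<>();
            for (String row : rows) {
                values.add(Double.valueOf(row));
            }
            return values;
        }
    }

    /** Analysis module: computes the mean (NaN for an empty input). */
    static final class MeanModule implements PipelineModule<List<Double>, Double> {
        @Override
        public Double execute(List<Double> values) {
            double sum = 0.0;
            for (double v : values) {
                sum += v;
            }
            return values.isEmpty() ? Double.NaN : sum / values.size();
        }
    }

    /** Wires the three modules together, as a user would by drawing connections. */
    public static PipelineModule<List<String>, Double> buildPipeline() {
        return new CleanModule().connect(new ParseModule()).connect(new MeanModule());
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(" 1.0", "", null, "3.0 ", "5.0");
        System.out.println("Mean of cleaned values: " + buildPipeline().execute(raw));
    }
}
```

Because each module exposes only a typed input and output, the cleaning and parsing steps in this sketch could be reused in front of a different analysis module without modification, which is the kind of reuse a module-based environment is meant to encourage.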



Features and Functionality

Major features that D2K provides to an application developer include:

Visual Programming System Employing a Scalable Framework

Robust Computational Infrastructure
- Enables processor-intensive applications
- Supports distributed computing
- Enables data-intensive applications
- Provides low overhead for module execution

Flexible and Extensible Architecture
- Provides plug-and-play subsystem architectures and standard APIs
- Promotes code reuse and sharing
- Expedites custom software development
- Relieves the burden of distributed computing

Rapid Application Development (RAD) Environment

Integrated Environment for Models and Visualization

[Figure: D2K components]

[Figure: D2K screen shot]

D2K Module Development

The Automated Learning Group has developed hundreds of modules that address every part of the knowledge discovery in databases (KDD) process. Data mining algorithms implemented so far include Naive Bayes, decision trees, and Apriori, along with visualizations for the results of each approach. In addition, we have developed modules for cleaning and transforming data sets and a number of visualization modules for deviation detection problems. Modules have also been created for specific projects and collaborations.
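For readers unfamiliar with Apriori, one of the algorithms named above, the following is a minimal textbook-style sketch of frequent itemset mining in Java. It is not the D2K module's implementation; the class name AprioriSketch, its method names, and the sample transactions are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Textbook-style sketch of Apriori; NOT the D2K implementation.
public class AprioriSketch {

    /** Returns every itemset that appears in at least minSupport transactions. */
    public static Map<Set<String>, Integer> frequentItemsets(
            List<Set<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> frequent = new LinkedHashMap<>();

        // Level 1 candidates: every distinct single item.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                candidates.add(Collections.singleton(item));
            }
        }

        while (!candidates.isEmpty()) {
            // Count the support of each candidate and keep the frequent ones.
            Map<Set<String>, Integer> level = new LinkedHashMap<>();
            for (Set<String> c : candidates) {
                int count = 0;
                for (Set<String> t : transactions) {
                    if (t.containsAll(c)) {
                        count++;
                    }
                }
                if (count >= minSupport) {
                    level.put(c, count);
                }
            }
            frequent.putAll(level);
            candidates = nextCandidates(level.keySet());
        }
        return frequent;
    }

    /** Joins frequent k-itemsets into (k+1)-candidates, then prunes them. */
    private static Set<Set<String>> nextCandidates(Set<Set<String>> frequentK) {
        Set<Set<String>> next = new HashSet<>();
        for (Set<String> a : frequentK) {
            for (Set<String> b : frequentK) {
                Set<String> union = new TreeSet<>(a);
                union.addAll(b);
                // Keep unions that grew by exactly one item and whose
                // k-item subsets are all frequent (the Apriori property).
                if (union.size() == a.size() + 1 && allSubsetsFrequent(union, frequentK)) {
                    next.add(union);
                }
            }
        }
        return next;
    }

    private static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> frequentK) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!frequentK.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample transactions for demonstration only.
        List<Set<String>> transactions = Arrays.asList(
                new HashSet<>(Arrays.asList("a", "b", "c")),
                new HashSet<>(Arrays.asList("a", "b")),
                new HashSet<>(Arrays.asList("a", "c")),
                new HashSet<>(Arrays.asList("b", "c")));
        frequentItemsets(transactions, 2)
                .forEach((items, count) -> System.out.println(items + " -> " + count));
    }
}
```

The pruning step relies on the fact that no itemset can be frequent unless all of its subsets are, so the sketch never counts the support of a larger candidate that has an infrequent subset.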

We are continuing module development. Our short-term goals are to enhance the cleaning and transformation modules, improve the data mining algorithms, and continue work on feature subset selection modules. In the long term, we plan to continue developing modules for predictive modeling, image analysis, and text analysis, with particular emphasis on enabling them for distributed and parallel computing. This work lets us make the latest research developments available for real-world applications.


D2K-driven Applications

D2K can be used as a standalone application for developing data mining applications, or developers can take advantage of the D2K infrastructure and D2K modules to build D2K-driven applications such as the ALG applications ThemeWeaver and I2K - Image to Knowledge. These applications employ D2K functionality in the background, using modules dynamically to construct applications, and present their own specialized user interfaces specific to the tasks being performed. More information on ThemeWeaver and I2K can be found in the Tools section of this website. The advantages of building on D2K include reduced development time through module reuse and sharing, and access to D2K's distributed computing and parallel processing capabilities.






    Copyright © 2004 The Board of Trustees of the University of Illinois, All Rights Reserved