Module Documentation

The largest obstacle to code reuse is the lack of accurate documentation. Although the code itself is often self-documenting, and therefore accurate, it is not practical to expect developers to read the code of each module they use.

Because of the simple nature of a D2K module, we can easily identify the documentation requirements.

Module Information

Module Name

There must be a display name for the module. This name should be similar to the module class name, and should make sense to the user.

Coding guidelines:

Code sample:

  /** 
   *  Return the name of this module.
   *
   *  @return The display name for this module.
   */
  public String getModuleName() {
    return "Create Weighted Example Set";
  }

Module Description

The module description should be written as multiple paragraphs, each introduced by a keyphrase, so that the modules are documented uniformly and so that the information can later be extracted and formatted for separate documents and enhanced presentation. Follow the outline given here. Some keyphrases will not be relevant for all modules, but every module must have the Overview and Detailed Description sections.

If you refer to a property anywhere in the module description section, italicize the name of the property to make it stand out. For example: "The example table is divided into Number of Folds train/test sets.", where Number of Folds is the property display name that is fully documented later in the Property Information. Note: <em> can be used instead of <i> for italics.
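As a minimal sketch of this convention, the fragment below italicizes a property display name inside the description text. The class name CrossValidationInfoSketch and the wording of the description are hypothetical; only the use of <i> around the property display name Number of Folds comes from the guideline above. In a real module the method would belong to the module class itself.

```java
// Hypothetical stand-in for a module class; a real module would extend
// the D2K module base class instead of standing alone like this.
class CrossValidationInfoSketch {

  /**
   *  Return information about the module.
   *
   *  @return A description in which property display names are italicized.
   */
  public String getModuleInfo() {
    String s = "<p>Overview: ";
    // The property display name is wrapped in <i> so that it stands out.
    s += "The example table is divided into <i>Number of Folds</i> ";
    s += "train/test sets. ";
    s += "</p>";
    return s;
  }
}
```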

The module description should begin with an overview of the module functionality that is no more than a couple of sentences long. This is intended to give users an idea of what the module does so that they can make an initial determination as to whether it applies to their needs. The overview will likely be extracted and included in a separate document listing all modules. The first paragraph should contain the overview and should be tagged with the keyphrase <p>Overview:

Following the overview, the module functionality must be fully described to the extent that the user knows what to expect from the module and the testers know how to test correctness of the module. Include comments on how missing values are handled and similar "boundary" information. Use a new paragraph and keyphrase to start this section: </p><p>Detailed Description: For long descriptions, text in this section should be broken into multiple paragraphs by inserting </p><p> directives into the text.

For some modules, for example those implementing well-known model-building algorithms, references to relevant papers, texts, technical reports, or other descriptive resources should be provided in the next section. Use a new paragraph and keyphrase to introduce the reference information: </p><p>References:

Document cases where the module is not applicable. For example, if the model can only use nominal input types or predict continuous values, note that in this section. If the module fails when nominal attributes have a large number of distinct values, document it. These are the kinds of constraints that are not caught by the port compatibility checks. Put each type of constraint listed in this section in its own paragraph. Use a new paragraph and keyphrase to introduce this section: </p><p>Data Type Restrictions:

Discuss how the module treats the table data on its input ports. In particular, note any modifications it makes to the data. Modification is expected behavior for modules that take mutable tables, for example, but it is still worth highlighting so that users do not create an itinerary in which a mutable table is directed to a splitter and the output of the splitter then goes to two modules that both modify the data. Some modules instead make a copy of the input data and modify the copy, in which case the input data remains intact. Put the information about treatment of input data in a section with the keyphrase </p><p>Data Handling:

Document scalability issues. These may involve memory, compute time, or, in the case of visualizations, navigation time. If the module does not scale well for a large number of examples or for a large number of attributes, document that in this section. This section should begin in a new paragraph with the keyphrase </p><p>Scalability:

Non-standard module enabling conditions must be clearly described. The default behavior is that a module must have all its inputs before it will fire. Document any deviation from this standard. If the module overrides isReady(), the behavior must be documented. This section begins with the keyphrase </p><p>Enabling Conditions:

Coding guidelines:

Code sample:

  /** 
   *  Return information about the module.
   *
   *  @return A detailed description of the module.
   */
  public String getModuleInfo() {
    String s = "<p>Overview: ";
    s += "This module creates a new example set where each example in the ";
    s += "set appears in proportion to a weighting factor determined by the ";
    s += "value of a feature in the example. ";
    s += "</p><p>Detailed Description: ";
    s += "This module can be used to sub-sample or to amplify the original ";
    s += "example set. It is useful when you want to give more weight ";
    s += "to some examples than to others, based on a feature that is in ";
    s += "the data. For instance, you may want to take years of experience ";
    s += "into account when processing patient diagnostic recommendations. ";
    s += "</p><p>";
    s += "Properties are used to select the feature that will ";
    s += "be used as the weighting factor, the multiplier factor which ";
    s += "controls whether sub-sampling or amplification will occur, ";
    s += "the maximum number of examples in the output, and other options. ";
    s += "</p><p>Data Type Restrictions: ";
    s += "The feature used as the weighting factor must be a numeric value. ";
    s += "</p><p>Data Handling: ";
    s += "The module does not destroy or modify the input data. ";
    s += "</p><p>Scalability: ";
    s += "The module requires enough memory to make a copy of the input data. ";
    s += "If the examples are being amplified, additional memory is required. ";
    s += "The module does not make a complete copy of each example that ";
    s += "appears multiple times, instead using multiple pointers to the same ";
    s += "copy of the example, so the memory requirements are less than ";
    s += "the total number of output examples * the size of an example when ";
    s += "examples appear multiple times. ";
    s += "</p>";
    return s;
  }
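The sample above does not use the References or Enabling Conditions keyphrases. A sketch of how those two sections might be laid out follows. The class name, the module behavior described, and the bracketed citation placeholder are all hypothetical and serve only to illustrate the layout.

```java
// Hypothetical stand-in for a module class; the module and its behavior
// are invented for illustration only.
class OptionalSectionsSketch {

  /**
   *  Return information about the module.
   *
   *  @return A description that includes the optional sections.
   */
  public String getModuleInfo() {
    String s = "<p>Overview: ";
    s += "This module applies a previously built model to each table it receives. ";
    s += "</p><p>Detailed Description: ";
    s += "The model is read once and then reused for every table that arrives. ";
    s += "</p><p>References: ";
    // Placeholder only: a real module would cite the relevant paper or text here.
    s += "[Citation for the algorithm implemented by the module goes here.] ";
    s += "</p><p>Enabling Conditions: ";
    // Hypothetical deviation from the default "all inputs present" rule.
    s += "The first time, the module fires when both the model and a table are ";
    s += "available. After that, it fires whenever a new table arrives. ";
    s += "</p>";
    return s;
  }
}
```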

Input & Output Port Information

Input and Output Display Name:

This information is displayed when the user moves the mouse over an input or output port. It is also the port name used in the Component Info display under the Input/Output section.

Coding guidelines:

Code sample:

  /** 
   *  Return the name of a specific output.
   *
   *  @param i The index of the output.
   *  @return The name of the output.
   */
  public String getOutputName(int i) {
    switch (i) {
      case 0:
        return "Prediction Table";
      case 1:
        return "Decision Tree Model";
      default:
        return "No such output";
    }
  }

Input and Output type information:

This information is used for port compatibility checks. It appears in the Legend and under the Input/Output Display Name in the Component Info Pane.

Coding guidelines:

Code sample:

  /**
   * Return a String array containing datatypes of the outputs to this module.
   *
   * @return The datatypes of the module outputs.
   */
  public String[] getOutputTypes() {
    String[] out = { "ncsa.d2k.modules.core.datatype.table.PredictionTable",
        "ncsa.d2k.modules.core.prediction.decisiontree.DecisionTreeModel" };
    return out;
  }

Input and Output information:

There must be descriptive text for all inputs and outputs. This information appears under the Input/Output type in the Component Info Pane.

Coding guidelines:

Code sample:

  /** 
   *  Return a description of a specific output.
   *
   *  @param i The index of the output.
   *  @return The description of the output.
   */
  public String getOutputInfo(int i) {
    switch (i) {
      case 0:
        String s = "The original example table with an extra column ";
        s += "containing predictions for each row in the data set. ";
        return s;
      case 1:
        return "A reference to the decision tree model that was created.";
      default:
        return "No such output";
    }
  }
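The three samples above cover only the output side. A sketch of the corresponding input-side methods follows, assuming they mirror the output methods in name and shape (getInputName, getInputTypes, getInputInfo). The class name, the port name, and the description text are hypothetical, and the type string is only an illustrative value.

```java
// Hypothetical stand-in for a module class with a single input port.
class InputPortInfoSketch {

  /**
   *  Return the name of a specific input.
   *
   *  @param i The index of the input.
   *  @return The name of the input.
   */
  public String getInputName(int i) {
    switch (i) {
      case 0:
        return "Example Table";
      default:
        return "No such input";
    }
  }

  /**
   *  Return a String array containing datatypes of the inputs to this module.
   *
   *  @return The datatypes of the module inputs.
   */
  public String[] getInputTypes() {
    // Illustrative type name; use the fully qualified type the port accepts.
    String[] in = { "ncsa.d2k.modules.core.datatype.table.ExampleTable" };
    return in;
  }

  /**
   *  Return a description of a specific input.
   *
   *  @param i The index of the input.
   *  @return The description of the input.
   */
  public String getInputInfo(int i) {
    switch (i) {
      case 0:
        return "The example table for which predictions will be made.";
      default:
        return "No such input";
    }
  }
}
```

Keeping the three methods indexed the same way means that the name, type, and description shown for a port all refer to the same port.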

Property Information

In this section we only consider properties that are to be seen and possibly updated by the user. Every module with properties must implement the getPropertiesDescriptions() method, which returns an array of PropertyDescription instances. Each property that is to be seen by the user has a corresponding PropertyDescription instance. The first field in a PropertyDescription is the property name. It must agree with names used in the getter/setter methods in the module.

Property Display Order:

Properties must be presented in an order that is sensible. Properties that are not to be modified by the user should not be displayed.

Coding guidelines:

Property Display Names:

Properties should be given display names that will make sense to the user. These names appear next to the property input boxes in the property dialog. They also appear in the Component Info Pane under the Properties section.

Coding guidelines:

Property Description:

Every property must have a description. The description should give the user an idea of the valid ranges or selections, and enough information to understand the implications of choosing one setting over another.

These descriptions will be included in the Component Info Pane under the (not yet created) Properties section.

Properties that appear in many modules (label name for example) should be identified and documented consistently in every occurrence across all the modules that have that property.

Coding guidelines:

Code sample:

  /** 
   *  Return an array with information on the properties the user may update.
   *
   *  @return The PropertyDescriptions for properties the user may update.
   */
  public PropertyDescription [] getPropertiesDescriptions() {
    PropertyDescription [] pds = new PropertyDescription [2];

    pds[0] = new PropertyDescription( "scalarTypeIsBoolean",
             "Use Boolean type for new scalar columns",
             "Controls whether converted nominal columns will have scalar type " +
             "boolean (true) or type int (false)." );
  
    pds[1] = new PropertyDescription( "verboseOutput",
             "Generate verbose diagnostic output",
             "Controls whether verbose output will be generated for debugging." );

    return pds;
  }

  private boolean scalarTypeIsBoolean = true;
  public void setScalarTypeIsBoolean( boolean value ) {
    scalarTypeIsBoolean = value;
  }
  public boolean getScalarTypeIsBoolean() {
    return scalarTypeIsBoolean;
  }

  private boolean verboseOutput = false;
  public void setVerboseOutput( boolean value ) {
    verboseOutput = value;
  }
  public boolean getVerboseOutput() {
    return verboseOutput;
  }
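The display order and visibility guidelines from this section can be sketched as follows. This sketch assumes that properties are displayed in the order in which they appear in the returned array, which the text above does not confirm. Leaving a property out of the array to keep it from the user follows from the rule that each user-visible property has a PropertyDescription. PropertyDescriptionStandIn is a stand-in so that the sketch compiles on its own, not the D2K class, and the properties and their descriptions are hypothetical.

```java
// Stand-in for D2K's PropertyDescription so that this sketch is self-contained.
class PropertyDescriptionStandIn {
  final String name;         // must agree with the getter/setter names
  final String displayName;  // shown next to the input box
  final String description;  // valid ranges and implications of settings

  PropertyDescriptionStandIn(String name, String displayName, String description) {
    this.name = name;
    this.displayName = displayName;
    this.description = description;
  }
}

// Hypothetical module with two user-visible properties and one internal one.
class PropertyOrderSketch {
  private int numberOfFolds = 10;
  private long seed = 0L;
  private int runCount = 0;  // internal bookkeeping, deliberately not described

  public PropertyDescriptionStandIn[] getPropertiesDescriptions() {
    // The most important property is listed first; runCount is omitted.
    return new PropertyDescriptionStandIn[] {
      new PropertyDescriptionStandIn("numberOfFolds", "Number of Folds",
          "The number of train/test sets to create. Must be at least 2."),
      new PropertyDescriptionStandIn("seed", "Random Seed",
          "The seed used when shuffling examples. Use the same seed to " +
          "reproduce a previous split.")
    };
  }

  public void setNumberOfFolds(int value) { numberOfFolds = value; }
  public int getNumberOfFolds() { return numberOfFolds; }

  public void setSeed(long value) { seed = value; }
  public long getSeed() { return seed; }

  public int getRunCount() { return runCount; }
}
```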

User Dialogs

User dialogs are the dialog boxes that pop up while an itinerary is running and accept user input. Filename selection dialogs and input/output feature selection dialogs are examples. Follow the guidelines for display order and display names from the properties section when writing user dialogs. Currently there is no way to provide and display further information for the fields of a user dialog, although that capability is in the works. As you work through the modules, please consider adding a comment that describes each field in the user dialog. If the comments are there, they can later be used to "fill in the blanks" as appropriate to support the setting and retrieval of detailed descriptions.
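One way such field comments might look is sketched below. The class, its fields, and the comment wording are hypothetical. The point is only that each dialog field carries a comment giving its display name and a description that could later be lifted into the dialog.

```java
// Hypothetical holder for the values collected by a file selection dialog.
class FileSelectionDialogSketch {

  /**
   *  Display name: File Name.
   *  The path of the file to read. It must refer to an existing, readable file.
   */
  private String fileName = "";

  /**
   *  Display name: Has Header Row.
   *  If true, the first row of the file is treated as column labels
   *  rather than as data.
   */
  private boolean hasHeaderRow = true;

  public void setFileName(String value) { fileName = value; }
  public String getFileName() { return fileName; }

  public void setHasHeaderRow(boolean value) { hasHeaderRow = value; }
  public boolean getHasHeaderRow() { return hasHeaderRow; }
}
```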