JGI'02 - Papers Accepted
Keynote #1: Pratap Pattnaik, IBM, on Autonomic Computing
Slides available as PDF.
Abstract:
The goal of autonomic computing is the reduction of complexity in the
management of large computing systems. The evolution of computing systems
faces a continuous growth in the number of degrees of freedom that the
system must manage in order to be efficient. Two major factors contribute
to the increase in the number of degrees of freedom. First, computing
elements such as CPUs, memory, disks, and networks have historically
advanced at non-uniform rates. The disparity between the capabilities and
speeds of the various elements lets each element adopt a number of
different strategies depending on the environment it encounters. This
rules out a tightly managed global state and instead calls for a dynamic
strategy that makes judicious choices to achieve the targeted
efficiency. Secondly,
the systems tend to have a global scope in terms of the demand for their
services and the resources they employ for rendering the services. Changes
in the demands/resources in one part of the system can have a significant
effect on other parts of the system. Recent experiences with web servers
(related to popular events like the Olympics) emphasize the variability and
unpredictability of demands and the need to rapidly react to the changes.
A system must perceive the changes in the environment and must be ready to
react with a variety of choices, so that suitable strategies can be quickly
selected for the new environment.
The autonomic computing approach is to orchestrate the management of the
functionalities, efficiencies, and the qualities of services of large
computing systems through logically distributed, autonomous controlling
elements, and to achieve a harmonious functioning of the global system
within the confines of its stipulated behavior, while individual elements
make locally autonomous decisions. In this approach, one moves from a
resource/entitlement model to a goal-oriented model. In order to
significantly reduce system management complexity, one must clearly
delineate the boundaries of these controlling elements. The reduction in
complexity is achieved mainly by making a significant number of decisions
locally in these elements. This enables the separation of subsystems into
various ranges of space and time.
In this talk we will review some of the major examples of autonomic
computing at various levels of the computing hierarchy, and will discuss a
framework to study them. We will also discuss some of the major challenges
and areas of autonomic computing that require significant theoretical
inventions.
Keynote #2: Alexander Stepanov, The Future of Abstraction
Abstract:
Abstraction in programming languages is a mechanism that allows us to
group together similar sections of code, and, by doing so, better
organize our programs. C++ provides two different language constructs
to deal with abstraction: inheritance and templates. Both of these
mechanisms, however, are flawed. After presenting the respective flaws
of inheritance and templates, the talk proposes a new, unifying
mechanism called concepts.
Title: Aggressive Object Combining
Authors: Ronald Veldema, Vrije Universiteit
Ceriel Jacobs, Vrije Universiteit
Rutger Hofman, Vrije Universiteit
Henri Bal, Vrije Universiteit
Title: Ravenscar-Java: A High Integrity Profile for Real-Time Java
Authors: Jagun Kwon, University of York
Andy Wellings, University of York
Steve King, University of York
Title: JavaSymphony: New Directives to Control and Synchronize Locality,
Parallelism, and Load Balancing for Cluster and GRID-Computing
Authors: Thomas Fahringer, Institute for Software Science -
University of Vienna
Alexandru Jugravu, Institute for Software Science -
University of Vienna
Title: Run-time Evaluation of Opportunities for Object Inlining in Java
Authors: Ondrej Lhotak, School of Computer Science, McGill University
Laurie Hendren, School of Computer Science, McGill University
Title: Elimination of Java Array Bounds Checks in the Presence of
Indirection
Authors: Mikel Lujan, University of Manchester
John R. Gurd, University of Manchester
T.L. Freeman, University of Manchester
Jose Miguel, UPV-EHU, Spain
Title: Ibis: an Efficient Java-based Grid Programming Environment
Authors: Rob V. van Nieuwpoort, Vrije Universiteit
Jason Maassen, Vrije Universiteit
Rutger Hofman, Vrije Universiteit
Thilo Kielmann, Vrije Universiteit
Henri E. Bal, Vrije Universiteit
Title: Abstracting Remote Object Interaction in a Peer-2-Peer Environment
Authors: Patrick Eugster, Chalmers University of Technology, Göteborg, Sweden
Sébastien Baehni, Swiss Federal Institute of Technology in Lausanne,
Switzerland
Title: Efficient, Flexible and Typed Group Communications for Java
Authors: Laurent Baduel, INRIA Sophia Antipolis,
CNRS - I3S - University of Nice Sophia Antipolis
Françoise Baude, INRIA Sophia Antipolis,
CNRS - I3S - University of Nice Sophia Antipolis
Denis Caromel, INRIA Sophia Antipolis,
CNRS - I3S - University of Nice Sophia Antipolis
Title: Parsek: Object Oriented Particle in Cell. Implementation and
Performance Issues
Authors: Stefano Markidis, Politecnico di Torino
Giovanni Lapenta, LANL
Brian VanderHeyden, LANL
Title: Adding tuples to Java: a study in lightweight data structures
Authors: C. van Reeuwijk, Delft University of Technology
H.J. Sips, Delft University of Technology
Title: Almost-whole-program compilation
Authors: Zoran Budimlic, Rice University
Ken Kennedy, Rice University
Title: The Ninf Portal: An Automatic Generation Tool for Computing Portals
Authors: Toyotaro Suzumura, Tokyo Institute of Technology
Hidemoto Nakada, National Institute of Advanced Industrial Science
and Technology (AIST)
Masayuki Saito, IBM Japan
Satoshi Matsuoka, Tokyo Institute of Technology/National Institute
of Informatics
Yoshio Tanaka, National Institute of Advanced Industrial Science and
Technology (AIST)
Satoshi Sekiguchi, National Institute of Advanced Industrial Science
and Technology (AIST)
Title: Fast Subtype Checking in the HotSpot JVM
Authors: Cliff Click, Sun Microsystems
John Rose, Sun Microsystems
Title: Advanced Eager Scheduling for Java-Based Adaptively Parallel
Computing
Authors: Michael O. Neary, Dep. of Computer Science,
University of California, Santa Barbara
Peter Cappello, Dep. of Computer Science,
University of California, Santa Barbara
Title: Higher-Order Functions and Partial Applications for a C++
Skeleton Library
Authors: Herbert Kuchen, University of Münster
Jörg Striegnitz, Research Center Julich, Central Institute
for Applied Mathematics, D-52425 Julich, Germany
Title: Open Runtime Platform: Flexibility with Performance using Interfaces
Authors: Michal Cierniak, Intel
Brian Lewis, Intel
James Stichnoth, Intel
Title: Specifying Java Thread Semantics Using a Uniform Memory Model
Authors: Yue Yang, University of Utah
Ganesh Gopalakrishnan, University of Utah
Gary Lindstrom, University of Utah
Title: JOPI: A Java Object-Passing Interface
Authors: Nader Mohamed, University of Nebraska-Lincoln
Jameela Al-Jaroodi, University of Nebraska-Lincoln
Hong Jiang, University of Nebraska-Lincoln
David Swanson, University of Nebraska-Lincoln
Title: inAspect - Interfacing Java and VSIPL
Authors: Torey Alford, MPI Software Technology Inc.
Vijay Shah, Mississippi State University
David Whitcomb, MPI Software Technology Inc.
Anthony Skjellum, MPI Software Technology Inc.
Andrew Watkins, MPI Software Technology Inc.
Nick Younan, Mississippi State University
Title: Jeeg: A Programming Language for Concurrent Objects Synchronization
Authors: Giuseppe Milicia, BRICS - University of Aarhus
Vladimiro Sassone, University of Sussex
Title: Simple and Effective Array Prefetching in Java
Authors: Brendon Cahoon, University of Massachusetts
Kathryn McKinley, University of Texas
Title: Generic Programming for High Performance Scientific Applications
Authors: Lie-Quan Lee, Indiana University
Andrew Lumsdaine, Indiana University
Title: A Scaleable Event Infrastructure for Peer to Peer Grids
Authors: Geoffrey Fox, Community Grid Computing Laboratory,
Dept. of Computer Science, Indiana University
Shrideep Pallickara, Community Grid Computing Laboratory,
Dept. of Computer Science, Indiana University
Xi Rao, Community Grid Computing Laboratory,
Dept. of Computer Science, Indiana University
Qinglin Pei, Community Grid Computing Laboratory,
Dept. of Computer Science, Indiana University
Title: Immutability Specification and its Applications
Authors: Igor Pechtchanski, New York University
Vivek Sarkar, IBM T.J. Watson Research Center
Aggressive Object Combining
Object combining tries to put together objects that have roughly the
same lifetimes, to reduce strain on the memory manager and to reduce
the number of pointer indirections during a program's execution.
Object combining works by appending the fields of one object to
another, allowing allocation and freeing of multiple objects with a
single heap (de)allocation. Unlike object inlining, which will only
optimize objects where one has a (unique) pointer to another, our
optimization also works if there is no such relation. Object inlining
is thus treated as a special case of object combining. Object inlining
also directly replaces the pointer by the inlined object's fields;
object combining leaves the pointer in place to allow more combining.
Elimination of the pointer accesses is implemented in a separate
compiler optimization pass.
This paper describes and evaluates techniques to implement object
combining for Java. Unlike previous object inlining systems, reference
field overwrites are allowed and handled, resulting in much more
aggressive optimization. Our object combining heuristics also allow
unrelated objects to be combined, for example those allocated inside a
loop: recursive data structures (linked lists, trees) can be allocated
several elements at a time, and objects that are always used together
can be combined.
The main focus of object combining in this paper is on reducing object
(de)allocation overhead, by reducing both garbage collection work and
the number of object allocations. The reduction in memory management
overhead cuts execution time by up to 35%; indirection removal reduces
execution time by up to a further 6%.
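The core transformation is easy to picture in Java. The sketch below is our own, with invented class names: two objects with roughly the same lifetime are combined into a single allocation, and the former cross-object pointer disappears.

```java
// Before combining: two heap objects with roughly the same lifetime,
// linked by a pointer that costs an indirection on every access.
class Node {
    Data data = new Data();   // separate allocation
}
class Data {
    int value;
}

// After combining: the fields of Data are appended to Node, so a single
// heap allocation (and a single collection) covers both. A later pass
// rewrites accesses like n.data.value into n.dataValue.
class CombinedNode {
    int dataValue;            // field appended from Data
}

public class ObjectCombiningSketch {
    static int touch(CombinedNode n) {
        n.dataValue = 7;      // no pointer indirection any more
        return n.dataValue;
    }
    public static void main(String[] args) {
        System.out.println(touch(new CombinedNode())); // one allocation, prints 7
    }
}
```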
Ravenscar-Java: A High Integrity Profile for Real-Time Java
For many, Java is the antithesis of a high integrity programming
language. Its combination of object-oriented programming
features, its automatic garbage collection, and its poor support for
real-time multi-threading are all seen as particular impediments.
The Real-Time Specification for Java has introduced many new
features that help in the real-time domain. However, the
expressive power of these features means that very complex
programming models can be created, necessitating complexity in
the supporting real-time virtual machine. Consequently, Java, with
the real-time extensions as they stand, seems too complex for
confident use in high integrity systems. This paper presents a Java
profile for the development of software-intensive high integrity
real-time systems. This restricted programming model removes
language features with high overheads and complex semantics, on
which it is hard to perform timing and functional analyses. The
profile fits within the J2ME framework and is consistent with
well-known guidelines for high integrity software development,
such as those defined by the U.S. Nuclear Regulatory Commission.
JavaSymphony: New Directives to Control and Synchronize Locality,
Parallelism, and Load Balancing for Cluster and GRID-Computing
There has been an increasing research interest in extending the use of Java
towards performance-oriented programming for distributed and parallel
applications. Numerous research projects have introduced class libraries
or language extensions for Java to support automatic management of
locality, parallelism, and load balancing; such management is almost
entirely under the control of a runtime system and frequently results in
critical performance problems. In previous work we introduced
JavaSymphony to substantially alleviate this problem. JavaSymphony is a
Java class library that allows the programmer to control parallelism,
load balancing, and locality at a high level. Objects can be explicitly
distributed and
migrated based on a high-level API to static and dynamic system parameters
and dynamic virtual distributed architectures which impose a virtual
hierarchy on a distributed system of physical computing nodes.
In this paper we describe several important extensions to the original
JavaSymphony API, which include a generalization of virtual architectures
that can be used to specify and request arbitrary heterogeneous
distributed architectures. The number of threads that execute an object's
methods can be controlled dynamically through single- and multi-threaded
objects. Conventional Java objects can be dynamically converted to
JavaSymphony objects. A lock/unlock mechanism has been introduced to
avoid inconsistent modification of objects or virtual architectures. A
sophisticated event mechanism for asynchronous
communication and interaction is provided. Moreover, we included several
synchronization constructs including barrier synchronization and
synchronization for asynchronous method invocations.
We conducted an experiment to compare the performance of JavaSymphony
against two related Java-based systems. We also demonstrate how to
simplify the programming effort to synchronize and to coordinate
distributed objects through the JavaSymphony event and synchronization
mechanism.
Run-time Evaluation of Opportunities for Object Inlining in Java
Object-oriented languages, like Java, encourage the use of many small
objects linked together by field references, instead of a few monolithic
structures. While this practice is beneficial from a program design
perspective, it can slow down program execution by incurring many
pointer indirections. One solution to this problem is object inlining:
when the compiler can safely do so, it fuses small objects together,
thus removing the reads/writes to the removed field, saving the memory
needed to store the field and object header, and reducing the number of
object allocations.
The objective of this paper is to measure the potential for object
inlining by studying the run-time behavior of a comprehensive set of
Java programs. We study the traces of program executions in order to
determine which fields behave like inlinable fields. Since we are using
dynamic information instead of a static analysis, our results give
an upper bound on what could be achieved via a static compiler-based
approach. Our experimental results measure the potential improvements
attainable with object inlining, including reductions in the numbers of
field reads and writes, and reduced memory usage.
Our study shows that some Java programs can benefit significantly from
object inlining, with close to a 10% speedup. Somewhat to our surprise,
our study found one case, the db benchmark, where the most important
inlinable field was the result of unusual program design, and fixing
this small flaw led to both better performance and clearer program
design. However, the opportunities for object inlining are highly
dependent on the individual program being considered, and are in many
cases very limited. Furthermore, fields that are inlinable also have
properties that make them potential candidates for other optimizations
such as removing redundant memory accesses. The memory savings possible
through object inlining are moderate.
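The transformation whose potential the paper measures can be sketched directly (our illustration, with invented names): an inlinable field's object is fused into its container, eliminating an allocation, an object header, and a pointer dereference per access.

```java
// The usual style: a small object reached through a field reference.
class Point {
    double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}
class Circle {
    Point center;   // inlinable field: uniquely refers to one Point
    double r;
    Circle(Point c, double r) { this.center = c; this.r = r; }
    double top() { return center.y + r; }   // field read + indirection
}

// After inlining: the Point's fields are fused into the container,
// saving an allocation, an object header, and the dereference.
class InlinedCircle {
    double centerX, centerY, r;
    InlinedCircle(double cx, double cy, double r) {
        centerX = cx; centerY = cy; this.r = r;
    }
    double top() { return centerY + r; }    // direct field access
}

public class InliningSketch {
    public static void main(String[] args) {
        System.out.println(new Circle(new Point(0, 1), 2).top());  // 3.0
        System.out.println(new InlinedCircle(0, 1, 2).top());      // 3.0, one allocation fewer
    }
}
```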
Elimination of Java Array Bounds Checks in the Presence of
Indirection
The Java language specification states that every access to an array
must be within the bounds of that array, i.e. between 0 and
array length - 1. Different techniques for different programming
languages have been proposed to eliminate explicit bounds checks. Some
of these techniques are implemented in off-the-shelf Java Virtual
Machines (JVMs). The underlying principle of these techniques is that
bounds checks can be removed when a JVM/compiler has enough information
to guarantee that a sequence of accesses (e.g. inside a for-loop) is
safe (within the bounds).
Most of the techniques for the elimination of array bounds checks have
been developed for programming languages that do not support
multi-threading and/or dynamic class loading. These two characteristics
make most of these techniques unsuitable for Java. Those techniques
developed specifically for Java have not addressed the elimination of
array bounds checks in the presence of indirection, i.e. when the index
is stored in another array (an indirection array).
With the objective of optimising applications with array indirection,
this paper proposes and evaluates three implementation strategies, each
implemented as a Java class. The classes provide the functionality of
Java arrays of type int so that objects of the classes can be used
instead of indirection arrays. Each strategy enables JVMs, when
examining only one of these classes at a time, to obtain enough
information to remove array bounds checks.
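One plausible shape for such a class is sketched below (our own illustration, not necessarily one of the paper's three strategies): the stored indices are validated once, against a fixed bound, at construction time and are immutable afterwards, so a JVM analyzing only this class can prove every indirect access safe.

```java
// An indirection array wrapped so that every stored index is checked
// once, against a fixed bound, at construction time. Because the
// entries are immutable afterwards, a JVM examining only this class
// can prove that data[wrapped.get(i)] stays within the bounds of any
// data array of length >= bound, and drop the per-access check.
final class CheckedIndexArray {
    private final int[] idx;
    private final int bound;   // exclusive upper bound respected by all entries

    CheckedIndexArray(int[] entries, int bound) {
        for (int v : entries)
            if (v < 0 || v >= bound)
                throw new IllegalArgumentException("index " + v + " out of [0," + bound + ")");
        this.idx = entries.clone();   // private copy: entries can never change
        this.bound = bound;
    }
    int length() { return idx.length; }
    int get(int i) { return idx[i]; }
}

public class IndirectionSketch {
    public static void main(String[] args) {
        double[] data = { 10.0, 20.0, 30.0 };
        CheckedIndexArray ind = new CheckedIndexArray(new int[] { 2, 0, 2 }, data.length);
        double sum = 0;
        for (int i = 0; i < ind.length(); i++)
            sum += data[ind.get(i)];   // the check on data[] is provably redundant
        System.out.println(sum);       // 70.0
    }
}
```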
Ibis: an Efficient Java-based Grid Programming Environment
In computational grids, performance-hungry applications need to
simultaneously tap the computational power of multiple, dynamically
available sites. The crux of designing grid programming environments
stems exactly from this dynamic availability of compute cycles: grid
programming environments (a) need to be portable to run on as many sites
as possible, (b) need to be flexible enough to cope with different
network protocols and dynamically changing groups of compute nodes, and
(c) need to provide efficient (local) communication that enables
high-performance computing in the first place.
Existing programming environments are either portable (Java), flexible
(Jini, Java RMI), or highly efficient (MPI). No system combines all
three properties necessary for grid computing. In this paper, we present
Ibis, a new programming environment that combines Java's "run
everywhere" portability with both flexible treatment of dynamically
available networks and processor pools and highly efficient,
object-based communication. Ibis can transfer Java objects very
efficiently by combining streaming object serialization with a zero-copy
protocol. Using RMI as a simple test case, we show that Ibis outperforms
existing RMI implementations, achieving up to 9 times higher throughput
with trees of objects.
Abstracting Remote Object Interaction in a Peer-2-Peer Environment
Propelled by the success of applications aimed at the "free" sharing of
data over the Internet, the paradigm of peer-2-peer (P2P) computing has
received substantial consideration recently.
This paper presents an abstraction for object interaction in a P2P
environment, called query/share (QS). We present the principles
underlying our QS abstraction and its implementation in Java. We
contrast our abstraction with established abstractions for distributed
programming, such as remote method invocation and the tuple space,
illustrating how the QS abstraction, obviously influenced by these
predating abstractions, unifies flavors of them, but also how it
captures the constraints specific to P2P environments.
Efficient, Flexible and Typed Group Communications for Java
Group communication is a crucial feature for high-performance and Grid
computing. While previous work and libraries have offered such a feature
(e.g. MPI, or object-oriented frameworks), the use of groups imposed
specific constraints on programmers -- for instance, the use of
dedicated interfaces to trigger group communications.
We aimed at a more flexible mechanism. More specifically, this paper
proposes a scheme where, given a Java class, one can initiate group
communications using the standard public methods of the class together
with the classical dot notation; in that way, group communications
remain typed. Furthermore, groups are automatically constructed to
handle the results of collective operations, providing an elegant and
effective way to program gather operations.
The flexibility also makes it possible to handle results that are groups
of remotely accessible objects, and to use a group as a means to
dispatch different parameters to different group members (for instance
in a cyclic manner). Moreover, hierarchical groups can be easily and
dynamically constructed -- an important feature for exploiting several
clusters in Grid computing.
Performance measurements demonstrate the viability of the approach.
The challenge is to provide easy-to-use, efficient, and dynamic group
management for objects dynamically distributed on the Grid.
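In plain Java, one way to approximate typed group calls through the ordinary dot notation is a dynamic proxy implementing the members' type (a JDK proxy needs an interface, whereas the paper works directly on classes). The sketch below is our own, not the paper's implementation: each call is broadcast to all members and the results are gathered.

```java
import java.lang.reflect.*;
import java.util.*;

// A group is exposed behind the members' own interface, so calls use the
// classical dot notation and stay statically typed.
interface Worker { int step(int x); }

final class Group {
    @SuppressWarnings("unchecked")
    static <T> T of(Class<T> iface, List<? extends T> members, List<Object> results) {
        InvocationHandler h = (proxy, method, args) -> {
            Object last = null;
            for (T m : members) {            // broadcast to every member
                last = method.invoke(m, args);
                results.add(last);           // gather the collective results
            }
            return last;
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, h);
    }
}

public class GroupDemo {
    public static void main(String[] args) {
        List<Object> results = new ArrayList<>();
        Worker w1 = x -> x + 1, w2 = x -> x * 2;
        Worker group = Group.of(Worker.class, List.of(w1, w2), results);
        group.step(10);                      // typed call, ordinary dot notation
        System.out.println(results);         // [11, 20]
    }
}
```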
Parsek: Object Oriented Particle in Cell. Implementation and
Performance Issues
The paper describes a plasma physics simulation package written entirely
in Java using a pure object-oriented design. Plasma simulation is an
ideal candidate for object-oriented programming, since the physics of
plasmas and the numerical schemes used lend themselves to a natural
object-oriented interpretation. In the present paper we primarily
discuss three issues.
First, we briefly review the numerical scheme used, based on an implicit
formulation that results in a cost-effective simulation method. With the
use of implicit plasma simulation, a gain of eight orders of magnitude
in computational cost can be demonstrated compared with standard
explicit methods.
Second, we discuss the issue of object-oriented implementation. We show
that the use of standard JIT compilation results in a penalty for the
fully object-oriented approach. Although the observed penalty is much
less than previously reported, we still have to gain a factor of two to
make the fully object-oriented version competitive. To overcome this
difficulty we try the JaMake compiler developed at Rice University.
Finally, the implicit approach proposed here requires the most modern
numerical techniques for the solution of non-linear equations
(Newton-Krylov) and linear systems (generalized minimum residual with
multigrid preconditioning). The complexity of the numerical methods
further complicates the design of parallelizable algorithms. We approach
the problem using a package recently developed to handle multiple
physics: CartaBlanca. CartaBlanca provides all the necessary classes to
perform the advanced numerical operations required and provides a fully
parallel environment based on native Java multithreading.
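To fix ideas about the object-oriented style being evaluated, here is the innermost kernel of any particle code: a particle object that advances itself. This is a purely illustrative explicit 1D mover (our sketch), not Parsek's implicit scheme.

```java
// A one-particle kick-drift mover in the pure OO style the paper
// advocates: each particle is an object that pushes itself through a
// given acceleration field. (Illustrative explicit baseline only.)
final class Particle {
    double x, v;                 // position and velocity (1D for the sketch)
    Particle(double x, double v) { this.x = x; this.v = v; }
    void push(double accel, double dt) {
        v += accel * dt;         // kick: update velocity
        x += v * dt;             // drift: update position
    }
}

public class PicSketch {
    public static void main(String[] args) {
        Particle p = new Particle(0.0, 1.0);
        for (int i = 0; i < 100; i++) p.push(0.0, 0.01); // free streaming
        System.out.println(p.x); // ~1.0 after unit time at unit velocity
    }
}
```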
Adding tuples to Java: a study in lightweight data structures
Java classes are very flexible, but this comes at a price. The main cost
is that every class instance must be allocated. Their access by
reference introduces pointer dereferences and complicates program
analysis. These costs are particularly burdensome for small, ubiquitous
data structures such as coordinates and state vectors. For such data
structures a lightweight representation would be desirable, allowing
such data to be handled directly, similar to primitive types. A number
of proposals introduce restricted or mutated variants of standard Java
classes that could serve as lightweight representations, but the impact
of these proposals has never been studied.
Since we have implemented a Java compiler with lightweight data
structures, we are in a good position to do this evaluation. Our
lightweight data structures are tuples. As we will show, their use can
result in significant performance gains: for a number of existing
benchmark programs using tuples, we gain more than 50% in performance
relative to our own compiler, and more than 20% relative to Sun's
HotSpot 1.4 compiler. We expect similar performance gains for other
implementations of lightweight data structures.
With respect to the expressiveness of Java, lightweight variants of
standard Java classes have little impact. In contrast, tuples add a
different language construct that, as we will show, can lead to
substantially more concise program code.
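The cost being attacked is easy to demonstrate. In the sketch below (our own, with invented names), returning two values forces either a heap-allocated helper class or a manual, error-prone flattening; a tuple would give the flattened, allocation-free representation without the contortions.

```java
// Without tuples, returning two values forces a heap-allocated helper:
final class MinMax {
    final int min, max;
    MinMax(int min, int max) { this.min = min; this.max = max; }
}

public class TupleSketch {
    static MinMax minMax(int[] a) {
        int mn = a[0], mx = a[0];
        for (int v : a) { if (v < mn) mn = v; if (v > mx) mx = v; }
        return new MinMax(mn, mx);     // one allocation per call
    }
    // What a tuple-aware compiler does under the hood: flatten the pair
    // into locals/registers. At the source level we can only simulate it:
    static long minMaxPacked(int[] a) {
        int mn = a[0], mx = a[0];
        for (int v : a) { if (v < mn) mn = v; if (v > mx) mx = v; }
        return (((long) mn) << 32) | (mx & 0xffffffffL);   // no allocation
    }
    public static void main(String[] args) {
        int[] data = { 3, 1, 4, 1, 5 };
        MinMax r = minMax(data);
        long packed = minMaxPacked(data);
        System.out.println(r.min + " " + r.max);                       // 1 5
        System.out.println((int) (packed >> 32) + " " + (int) packed); // 1 5
    }
}
```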
Almost-whole-program compilation
This paper presents a strategy, called almost-whole-program
compilation, for extending the benefits of whole-program
optimization to large collections of Java components that are
packaged as a group after the development phase. This
strategy has been implemented in a framework that uses Java
visibility and scoping rules to transform a collection of classes
into a package that is amenable to whole-program optimizations,
without precluding extensions to the optimized and compiled code.
Thus, it enables the Java developer to balance performance against
flexibility of the program after the development phase,
without compromising the design process. The transformation is
shown to incur only modest performance penalties, which are more
than compensated for by the interprocedural optimizations it
enables. The paper concludes with experimental results showing the
benefits that can be achieved using this approach.
The Ninf Portal: An Automatic Generation Tool for Computing Portals
As the Grid proliferates as the next-generation computing
infrastructure, a user interface in the form of "Grid Portals" is
becoming increasingly important, especially for computational scientists
and engineers. Although several Grid Portal toolkits have been proposed,
the portal developer still must build and deploy both the user interface
and the application, which requires considerable programming effort. We
aim to ease this burden by generating the portal frontend (consisting of
JSP and Java Servlets) from an XML document, by using a GridRPC system,
Ninf-G, to easily "gridify" existing applications, and by realizing
their seamless integration. The resulting system, which we call the Ninf
Portal, allowed concise description and easy deployment of a real Grid
application with very little programming effort.
Fast Subtype Checking in the HotSpot JVM
We present the fast subtype checking implemented in Sun's HotSpot JVM.
Subtype checks occur when a program wishes to know if class S implements
class T, where S and T are not known at compile time. Large Java
programs will make millions or even billions of such checks, hence a
fast check is essential. In actual benchmark runs our technique performs
complete subtype checks in 3 instructions (and only 1 memory reference)
essentially all the time. In rare instances it reverts to a slower array
scan. Memory usage is moderate (6 words per class) and can be traded off
for time. Class loading does not require recomputing any data structures
associated with subtype checking.
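The constant-time path of such a check can be sketched in plain Java (a simplification of the paper's technique, which also handles interfaces and other secondary supertypes): each type stores a "display" of its ancestors indexed by depth, so a subtype test is one bounds check, one load, and one compare.

```java
// Each type records its ancestors in a display indexed by depth, so the
// subtype test is a bounds check plus a single load-and-compare.
final class Type {
    final Type[] display;  // display[i] = ancestor at depth i; last entry is this
    final int depth;

    Type(Type parent) {
        depth = (parent == null) ? 0 : parent.depth + 1;
        display = new Type[depth + 1];
        if (parent != null) System.arraycopy(parent.display, 0, display, 0, depth);
        display[depth] = this;
    }
    // The constant-time fast path: "this" is a subtype of t iff t appears
    // at its own depth in this type's display.
    boolean subtypeOf(Type t) {
        return t.depth < display.length && display[t.depth] == t;
    }
}

public class SubtypeDemo {
    public static void main(String[] args) {
        Type object = new Type(null);
        Type number = new Type(object);
        Type integer = new Type(number);
        System.out.println(integer.subtypeOf(object)); // true
        System.out.println(object.subtypeOf(integer)); // false
    }
}
```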
Advanced Eager Scheduling for Java-Based Adaptively Parallel
Computing
Javelin 3 is a software system for developing large-scale, fault
tolerant, adaptively parallel applications. When all or part of their
application can be cast as a master-worker or as a branch-and-bound
computation, Javelin 3 frees application developers from concerns
about inter-processor communication and fault tolerance among
networked hosts, allowing them to focus on the underlying application.
The paper describes an advanced fault tolerant task scheduler and its
performance analysis. The task scheduler integrates work stealing
with an advanced form of eager scheduling. This scheduler enables
dynamic task decomposition, which improves host load-balancing in the
presence of tasks whose non-uniform computational load is evident only
at execution time. Speedup measurements are presented of actual
performance when using up to 1,000 hosts. We analyze the
expected performance degradation due to unresponsive hosts, and
measure the actual performance degradation due to unresponsive hosts.
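The eager-scheduling idea itself fits in a few lines. The single-process sketch below is our own miniature, not Javelin 3: a task stays eligible until some host reports its result, so it may be handed to several hosts, the first completion wins, and unresponsive hosts cannot stall the computation.

```java
import java.util.*;

// Eager scheduling in miniature: an unfinished task is re-queued when
// assigned, so it remains eligible for other hosts until completed.
public class EagerSketch {
    private final ArrayDeque<Integer> queue = new ArrayDeque<>();
    private final Map<Integer, Integer> results = new HashMap<>();

    EagerSketch(int tasks) { for (int t = 0; t < tasks; t++) queue.add(t); }

    /** Hand out the oldest unfinished task, keeping it eligible for others. */
    Integer assign() {
        Integer t;
        while ((t = queue.poll()) != null && results.containsKey(t)) { }
        if (t != null) queue.add(t);    // eager: not removed until completed
        return t;
    }
    /** Record the first result for a task; later duplicates are ignored. */
    void complete(int task, int result) { results.putIfAbsent(task, result); }

    Map<Integer, Integer> results() { return results; }

    public static void main(String[] args) {
        EagerSketch s = new EagerSketch(2);
        int slow = s.assign();           // handed to a host that never answers
        int ok = s.assign();
        s.complete(ok, ok * ok);
        int retry = s.assign();          // the stalled task is re-assigned
        s.complete(retry, retry * retry);
        System.out.println(s.results()); // {0=0, 1=1}
    }
}
```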
Higher-Order Functions and Partial Applications for a C++
Skeleton Library
Message passing based on libraries such as MPI is typically
used to program parallel machines with distributed memory.
This is efficient but error prone.
Algorithmic skeletons intend to simplify
parallel programming by increasing the expressive power. The
idea is to offer typical parallel programming patterns as
polymorphic higher-order functions which are efficiently
implemented in parallel. The present paper describes how
C++ templates and operator overloading can be used to
provide the main features needed for algorithmic
skeletons, namely higher-order functions, partial applications,
and parametric polymorphism. Experimental results based on a
draft implementation of our C++ skeleton library show that the
higher expressive power can be gained without a significant
performance penalty.
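The same ingredients can be transposed into Java with functional interfaces (our illustration, not the authors' C++ library): a polymorphic map skeleton as a higher-order function, and partial application built by closing over one argument.

```java
import java.util.*;
import java.util.function.*;

// A data-parallel "map" skeleton as a polymorphic higher-order function,
// plus partial application by fixing one argument of a binary function.
public class SkeletonSketch {
    static <A, B> List<B> map(Function<A, B> f, List<A> xs) {
        List<B> out = new ArrayList<>(xs.size());
        for (A x : xs) out.add(f.apply(x));  // a real skeleton splits xs across processors
        return out;
    }
    // Partial application: fix the first argument of a two-argument function.
    static <A, B, C> Function<B, C> partial(BiFunction<A, B, C> f, A a) {
        return b -> f.apply(a, b);
    }
    public static void main(String[] args) {
        BiFunction<Integer, Integer, Integer> add = (x, y) -> x + y;
        Function<Integer, Integer> add3 = partial(add, 3);
        System.out.println(map(add3, List.of(1, 2, 3)));  // [4, 5, 6]
    }
}
```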
Open Runtime Platform: Flexibility with Performance using Interfaces
According to conventional wisdom, interfaces provide flexibility at the cost of performance. Most
high-performance Java virtual machines today tightly integrate their core virtual machines with their
just-in-time compilers and garbage collectors to get the best performance. The Open Runtime Platform (ORP) is
unusual in that it reconciles high performance with the extensive use of well-defined interfaces between its
components. ORP was developed to support experiments in dynamic compilation, garbage collection,
synchronization, and other technologies. To achieve this, two key interfaces were designed: one for garbage
collection and another for just-in-time compilation. This paper describes some interesting features of these
interfaces and discusses lessons learned in their use.
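The flavor of such component boundaries can be sketched as a small Java interface (names invented; ORP's real GC and JIT interfaces are far richer): the core runtime codes only against the contract, so implementations can be swapped without touching it.

```java
// ORP's design idea in miniature: the runtime sees the collector only
// through a narrow interface, so different GCs can be plugged in.
interface Collector {
    int allocate(int size);   // returns an offset into the managed space
    void collect();           // reclaim unused space
}

// A trivial implementation: bump allocation, reclaim everything on collect.
final class BumpCollector implements Collector {
    private int top = 0;
    public int allocate(int size) { int at = top; top += size; return at; }
    public void collect() { top = 0; }   // toy policy for the sketch
}

public class OrpSketch {
    public static void main(String[] args) {
        Collector gc = new BumpCollector();  // swappable behind the interface
        int a = gc.allocate(16), b = gc.allocate(8);
        System.out.println(a + " " + b);     // 0 16
        gc.collect();
        System.out.println(gc.allocate(4));  // 0 again after collection
    }
}
```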
Specifying Java Thread Semantics Using a Uniform Memory Model
Standardized language level support for threads is one of
the most important features of Java. However, defining and
understanding the Java Memory Model (JMM) has turned
out to be a major challenge. Several models produced to
date are not as easily comparable as first thought. Given
the growing interest in multithreaded Java programming,
it is essential to have a sound framework that would allow
formal specification and reasoning about the JMM.
This paper presents the Uniform Memory Model (UMM),
a formal memory model specification framework. With a
flexible architecture, it can be easily configured to capture
different memory consistency requirements including both
architectural and language level memory models. Based on
guarded commands, UMM is integrated with a model checking
utility, providing strong built-in support for formal verification
and program analysis. A formal specification of the
JMM following the semantics proposed by Manson and Pugh
is implemented in UMM. Systematic analysis has revealed
interesting properties of the proposed semantics. Mistakes
from the original specification have also been uncovered.
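A guarded-command system is simple to sketch (our miniature, with invented state names): a transition may fire only while its guard holds on the shared state. A model checker such as the one UMM integrates adds, on top of this, the exploration of every interleaving of enabled transitions.

```java
import java.util.*;
import java.util.function.*;

// Guarded commands in miniature: each rule pairs a guard over the state
// with an action; rules fire only while their guard holds.
public class GuardedCommands {
    static final class Rule {
        final Predicate<Map<String, Integer>> guard;
        final Consumer<Map<String, Integer>> action;
        Rule(Predicate<Map<String, Integer>> g, Consumer<Map<String, Integer>> a) {
            guard = g; action = a;
        }
    }
    /** Repeatedly fire the first enabled rule until none is enabled. */
    static void run(Map<String, Integer> state, List<Rule> rules) {
        boolean fired = true;
        while (fired) {
            fired = false;
            for (Rule r : rules)
                if (r.guard.test(state)) { r.action.accept(state); fired = true; break; }
        }
    }
    public static void main(String[] args) {
        Map<String, Integer> s = new HashMap<>();
        s.put("x", 0);
        run(s, List.of(new Rule(m -> m.get("x") < 3,
                                m -> m.put("x", m.get("x") + 1))));
        System.out.println(s); // {x=3}
    }
}
```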
JOPI: A Java Object-Passing Interface
Recently there has been an increasing interest in developing
parallel programming capabilities in Java to harness the vast
resources available in clusters, grids and heterogeneous
networked systems. In this paper, we introduce a Java object-passing
interface (JOPI) library. JOPI provides Java programmers
with the necessary functionality to write object-passing parallel
programs in distributed heterogeneous systems. JOPI provides an
MPI-like interface that can be used to exchange objects among
processes. In addition to the well-known benefits of the
object-oriented development model, using objects to exchange
information in JOPI is advantageous because it facilitates passing
complex structures and enables the programmer to isolate the
problem space from the parallelization problem. The run-time
environment for JOPI is portable, efficient and provides the
necessary functionality to deploy and execute parallel Java
programs. A number of experiments were conducted to measure
JOPI's performance and compare it with MPI. The experiments
were conducted on a cluster system and a collection of
heterogeneous platforms, and the results show good performance
gains using JOPI.
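JOPI's own API is not shown in the abstract, but the mechanism that makes object passing convenient can be sketched with standard Java serialization: a complex object graph is packed to bytes, shipped, and reconstructed whole, instead of being flattened into primitive buffers as in classic MPI. The `Task` class and helper names below are illustrative assumptions, not JOPI's interface.

```java
import java.io.*;

// A complex application object; passing it whole is the convenience that
// object-passing libraries like JOPI target.
class Task implements Serializable {
    final int id;
    final double[] data;
    Task(int id, double[] data) { this.id = id; this.data = data; }
}

public class ObjectPassingSketch {
    // Serialize an object to bytes, as an object-passing layer would
    // before sending it to a remote process.
    static byte[] pack(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Reconstruct the object on the receiving side.
    static Object unpack(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Task sent = new Task(7, new double[]{1.0, 2.0, 3.0});
        Task received = (Task) unpack(pack(sent));  // local round trip
        System.out.println(received.id + " " + received.data.length);  // 7 3
    }
}
```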
inAspect - Interfacing Java and VSIPL
In this paper, we discuss the origin, design, performance, and
directions of the inAspect high-performance signal and image
processing package for Java. The Vector Signal and Image
Processing Library (VSIPL) community provides a standardized API
(Application Programmer Interface) for high-performance signal
and image processing plus linear algebra, with a C emphasis and
object-based design framework. Java programmers need high
performance and/or portable APIs for this broad base of
functionality as well. InAspect addresses PDAs, embedded Java
boards, workstations, and servers, with emphasis on embedded
systems at present. Efforts include supporting integer precisions
and utilizing CORDIC algorithms, both aimed at added relevance
for limited-performance environments, such as present-day PDAs.
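CORDIC computes trigonometric functions with only shift-and-add style operations, which is why it suits limited-precision devices. The sketch below, an assumption of the general technique rather than inAspect's code, uses doubles for clarity; the paper's point is that the same iteration works in integer fixed-point.

```java
public class CordicSketch {
    static final int N = 24;                  // iterations ~ bits of precision
    static final double[] ATAN = new double[N];
    static final double K;                    // inverse of the CORDIC gain
    static {
        double k = 1.0;
        for (int i = 0; i < N; i++) {
            ATAN[i] = Math.atan(Math.pow(2, -i));
            k *= 1.0 / Math.sqrt(1 + Math.pow(2, -2 * i));
        }
        K = k;
    }

    // Rotation mode: returns {cos(theta), sin(theta)} for |theta| < pi/2.
    // Each step rotates by +/- atan(2^-i); in fixed-point the 2^-i factors
    // become right shifts, so no multiplier is needed.
    static double[] cossin(double theta) {
        double x = K, y = 0.0, z = theta;
        for (int i = 0; i < N; i++) {
            double d = (z >= 0) ? 1.0 : -1.0;
            double xn = x - d * y * Math.pow(2, -i);
            double yn = y + d * x * Math.pow(2, -i);
            z -= d * ATAN[i];
            x = xn;
            y = yn;
        }
        return new double[]{x, y};
    }

    public static void main(String[] args) {
        double[] cs = cossin(0.5);
        System.out.println(cs[0] + " " + cs[1]);
    }
}
```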
Jeeg: A Programming Language for Concurrent Objects Synchronization
We introduce Jeeg, a dialect of Java based on a declarative
replacement of the synchronization mechanisms of Java that results
in a complete decoupling of the 'business' and the 'synchronization'
code of classes. Synchronization constraints in Jeeg are expressed
in a linear temporal logic, which makes it possible to effectively limit
the occurrence of the inheritance anomaly that commonly affects
concurrent object-oriented languages.
Jeeg is inspired by the current trend in aspect-oriented
languages. In a Jeeg program the sequential
and concurrent aspects of object behaviors are decoupled: specified
separately by the programmer, they are then woven together by the
Jeeg compiler.
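The tangling that Jeeg removes is visible in an ordinary Java monitor: the guard conditions (synchronization concern) are interleaved with the buffer logic (business concern), and a subclass that changes one must usually reimplement both, which is the inheritance anomaly. This is plain Java shown for contrast; Jeeg's own syntax is not reproduced here.

```java
// Plain Java: wait/notify guards are tangled with the put/get logic.
// Jeeg factors such constraints out into declarative temporal-logic
// annotations, leaving the method bodies purely sequential.
public class BoundedBuffer {
    private final Object[] items;
    private int count = 0, in = 0, out = 0;

    public BoundedBuffer(int capacity) { items = new Object[capacity]; }

    public synchronized void put(Object x) throws InterruptedException {
        while (count == items.length) wait();        // synchronization concern
        items[in] = x;                               // business concern
        in = (in + 1) % items.length;
        count++;
        notifyAll();                                 // synchronization concern
    }

    public synchronized Object get() throws InterruptedException {
        while (count == 0) wait();                   // synchronization concern
        Object x = items[out];                       // business concern
        out = (out + 1) % items.length;
        count--;
        notifyAll();                                 // synchronization concern
        return x;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedBuffer buf = new BoundedBuffer(2);
        buf.put("a");
        buf.put("b");
        System.out.println(buf.get() + " " + buf.get());  // a b
    }
}
```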
Simple and Effective Array Prefetching in Java
Java is becoming a viable choice for numerical algorithms due to
the software engineering benefits of object-oriented programming.
Because these programs still use large arrays heavily, they continue
to suffer poor memory performance. To hide memory latency,
we describe a new unified compile-time analysis for software
prefetching arrays and pointers in Java. Our previous work uses data
flow analysis to discover linked data structure accesses, and here
we present a more general version that also identifies loop induction
variables used in array accesses. Our algorithm schedules prefetches
for all array references that contain induction variables. We
evaluate our technique on a set of array-based Java programs, and
we report improvements greater than 15% in 6 of the 12 programs.
Across all our programs, prefetching reduces execution time by an
average of 23.5%, and the largest improvement is 57.5%. Traditional
software prefetching algorithms for C and Fortran use locality
analysis and sophisticated loop transformations. Because our
analysis is much simpler and quicker, it is suitable for inclusion in
a just-in-time compiler. We further show that the additional loop
transformations and careful scheduling of prefetches used in previous
work are not always necessary for modern architectures and
Java programs.
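The kind of loop the analysis targets looks like the following. Prefetch instructions are emitted by the JIT in generated code, not written in source, so they appear below only as comments; the prefetch distance `d` is a tuning parameter assumed for illustration.

```java
public class PrefetchTarget {
    // The paper's analysis finds loop induction variables (here, i) and
    // schedules a prefetch for every array reference containing one.
    // A JIT would emit prefetches for roughly a[i + d] and b[i + d]
    // each iteration, where d is the prefetch distance.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {   // i: induction variable
            // prefetch a[i + d], b[i + d]     (inserted by the JIT)
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 5, 6};
        System.out.println(dot(a, b));  // 32.0
    }
}
```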
Generic Programming for High Performance Scientific Applications
We present case studies that apply generic programming to
the development of high-performance parallel codes for solving
two archetypal PDEs. We examine the overall structure
of the example scientific codes and consider their generic
implementation. With a generic approach it is a straightforward
matter to reuse software components from different sources;
implementations with components from ITL, MTL, Blitz++,
A++/P++, and Fortran BLAS are presented. Our newly-developed
Generic Message Passing library is used for communication.
We compare the generic implementation to equivalent
implementations developed with alternative libraries and
languages and discuss not only performance but software
engineering issues as well.
A Scaleable Event Infrastructure for Peer to Peer Grids
In this paper we propose a peer-to-peer (P2P) grid comprising
resources such as relatively static clients, high-end resources and
a dynamic collection of multiple P2P subsystems. We investigate
the architecture of the messaging and event service that will
support such a hybrid environment. We have designed a distributed
publish-subscribe system, NaradaBrokering, for XML-specified
messages. NaradaBrokering interpolates between centralized
systems like JMS (Java Message Service) and P2P environments.
Here we investigate and present our strategy for the integration of
JXTA into NaradaBrokering. The resultant system naturally
scales with multiple JXTA Peer Groups linked by
NaradaBrokering.
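The core pattern NaradaBrokering generalizes across JMS-style brokers and JXTA peer groups is topic-based publish-subscribe. The in-process broker below is a minimal sketch of that pattern only; the topic name and delivery model are illustrative assumptions, not NaradaBrokering's API.

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal topic-based publish-subscribe: subscribers register handlers
// for a topic, and published messages are delivered to every handler.
// A real broker adds distribution, persistence, and routing between
// broker nodes, which is NaradaBrokering's contribution.
public class MiniBroker {
    private final Map<String, List<Consumer<String>>> subs = new HashMap<>();

    public void subscribe(String topic, Consumer<String> handler) {
        subs.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    public void publish(String topic, String message) {
        for (Consumer<String> h : subs.getOrDefault(topic, List.of())) {
            h.accept(message);
        }
    }

    public static void main(String[] args) {
        MiniBroker broker = new MiniBroker();
        List<String> received = new ArrayList<>();
        broker.subscribe("grid/events", received::add);
        broker.publish("grid/events", "peer joined");
        broker.publish("grid/other", "ignored");   // no subscriber
        System.out.println(received);  // [peer joined]
    }
}
```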
Immutability Specification and its Applications
In this paper, we introduce a framework for immutability specification, and
discuss its application to code optimization. A location is said to be immutable
if its value and the values of selected locations reachable from it are guaranteed
to remain unchanged during a specified time interval. Compared to a final
declaration, an immutability assertion in our framework can express a richer set
of immutability properties along three important dimensions -- lifetime, reachability
and context. We present a framework for processing and verifying immutability
annotations in Java, as well as extending optimizations so as to exploit
immutability information. Preliminary experimental results show that a significant
number (82%) of read accesses could potentially be classified as immutable in
our framework. Further, use of immutability information yields measurable
speedups in the range of 6% to 11% for certain benchmark programs.
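Why `final` is too weak along the reachability dimension can be seen in a few lines of ordinary Java: a `final` field fixes only the reference, not the state reachable through it. The class and field names below are illustrative, not from the paper.

```java
import java.util.Arrays;

public class FinalIsShallow {
    // 'final' pins the reference, but the array cells stay mutable.
    // The paper's immutability assertions can additionally cover locations
    // reachable from config (reachability) over a stated interval (lifetime)
    // and in a given calling context (context).
    static final int[] config = {1, 2, 3};

    public static void main(String[] args) {
        config[0] = 99;  // legal: final does not protect reachable state
        System.out.println(Arrays.toString(config));  // [99, 2, 3]
    }
}
```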