Explore the motivations behind Java's limitations and challenges in extending the language for generic programming technology, including proposed solutions, syntax examples, and the concept of type erasure.
Motivation Java represents a quantum leap forward in mainstream programming technology; it provides: • Safe execution • Truly portable, standardized binary representation (“Write Once, Run Anywhere …”) • Clean support for object-oriented design • Higher-order data (closures) in the form of anonymous inner classes • Multithreading • Comprehensive libraries built into the run-time environment
But … it has significant limitations including: • Absence of type parameterization; critical type constraints cannot be asserted and checked, e.g. Vector<Integer> • Weak support for decomposition of programs into independent components; packages use “hard-coded” imports and lack “interfaces” • Failure to include primitive types in object type hierarchy
Extending Java Is Challenging Because of Compatibility Constraints • Old Java binaries should not break on new releases of the Java Virtual Machine (reflection interface?) • Old source code should compile under new editions of the Java compiler • Existing Java applications should be able to gracefully incorporate language extensions
Constraints on Java Extensions • Backward compatibility: old programs (source & binary) work in the extended language (compiler & JVM) • Simple model for interoperability between old and new code • Extensions should be seamless
Blueprint for Extending Java • No change to JVM including run-time libraries (except additions) • Extensions supported entirely by the source language compiler (javac) • Class files may be augmented by new attributes • New programs may incorporate a customized class loader
Focus: Extending Java to Accommodate Genericity (Parametric Types) • Coherence is a challenging problem: • array types are already parametric with run-time types and co-variant subtyping (Integer[] is a subtype of Number[]) • co-variant subtyping conflicts with static type checking • supporting parametric run-time types requires significant new execution machinery • Container classes should be re-interpreted or re-implemented as generic classes, e.g. Vector → Vector<T>
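The conflict between co-variant array subtyping and static type checking can be seen in a few lines. The following is a minimal sketch (the class name ArrayCovarianceDemo and the method storeFails are made up for illustration): the assignment compiles, and the JVM has to reject the store at run time using the array's run-time element type.

```java
// Sketch: covariant arrays move a type check from compile time to run time.
// ArrayCovarianceDemo and storeFails are illustrative names, not from the slides.
class ArrayCovarianceDemo {
    // Returns true if storing a Double through a Number[] view
    // of an Integer[] is rejected at run time.
    static boolean storeFails() {
        Integer[] ints = new Integer[1];
        Number[] nums = ints;                  // legal: Integer[] is a subtype of Number[]
        try {
            nums[0] = Double.valueOf(3.14);    // type-checks statically, fails dynamically
            return false;
        } catch (ArrayStoreException e) {
            return true;                       // the JVM checked the run-time element type
        }
    }

    public static void main(String[] args) {
        System.out.println("store rejected at run time: " + storeFails());
    }
}
```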
Interim Solution: JSR14 version of Generic Java • Any class or method can be parameterized by type • Each type parameter has an upper bound (typically Object) • Type parameters are non-variantly subtyped, e.g. Vector<Number> is unrelated to Vector<Integer> • Parametric classes and methods are implemented using “type erasure”; every reference to a type variable is replaced by its bound. This approach is called a “homogeneous” implementation.
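As an illustration of bounds and non-variant subtyping, here is a small sketch written against the generics that later shipped in Java 5; the JSR14 prototype may have differed in details. The names BoundsDemo and sum are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch, assuming a Java 5 or later compiler. BoundsDemo and sum are illustrative names.
class BoundsDemo {
    // T has the upper bound Number, so the body may call Number's methods on elements.
    static <T extends Number> double sum(List<T> xs) {
        double total = 0.0;
        for (T x : xs) {
            total += x.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<Integer>();
        ints.add(1);
        ints.add(2);
        // List<Number> nums = ints;   // rejected by the compiler: List<Integer> is
        //                             // unrelated to List<Number> (non-variant subtyping)
        System.out.println(sum(ints));
    }
}
```

Because sum is itself parameterized by T, it accepts List<Integer> and List<Double> alike without needing any subtype relation between the two list types.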
Syntax of Generic Java • Type parameterization added to class declarations: class Foo<type-parms> … { … } where type-parms ::= id extends class-type | id implements interface-type Note: type-parms may appear in their own bounds • Generic (parametric) types can be used almost everywhere that conventional types can. The exceptions correspond to places where run-time parametric type information is required to support the generalized notation (e.g., casts to parametric types).
Syntax of Generic Java (cont.) • Method declarations may be parameterized by type: <T> return-type m(parm-list) { … } where return-type, parm-list, and the method body can mention T. The scope of T is the method declaration only.
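A minimal sketch of a method parameterized by type, in the form the syntax above describes. The names GenericMethodDemo and firstNonNull are made up for illustration; the type argument is inferred at each call site.

```java
// Sketch of a polymorphic method; names are illustrative, not from the slides.
class GenericMethodDemo {
    // <T> is in scope for the return type, the parameter list, and the body only.
    static <T> T firstNonNull(T a, T b) {
        return (a != null) ? a : b;
    }

    public static void main(String[] args) {
        String s = firstNonNull(null, "fallback");   // T inferred as String
        Integer n = firstNonNull(7, 9);              // T inferred as Integer
        System.out.println(s + " " + n);
    }
}
```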
Simple Example abstract class Tree<T> {} class Cons<T> extends Tree<T> { Tree<T> left,right; Cons(Tree<T> l, Tree<T> r) { left = l; right = r; } } class Leaf<T> extends Tree<T> { T value; Leaf(T v) { value = v; } }
Example Continued interface TreeVisitor<T,U> { U forCons(Cons<T> c); U forLeaf(Leaf<T> l); } Must add the following polymorphic method to Tree<T> public abstract <U> U accept(TreeVisitor<T,U> v); and define it as follows in Leaf<T> and Cons<T>
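The definitions of accept did not survive in this transcript. The sketch below fills them in with the conventional visitor dispatch (each class calls the visitor method named after itself), which is an assumption rather than the slide's own text. LeafCounter and VisitorDemo are hypothetical names added to make the example runnable, and everything is nested in one class only to keep the file self-contained.

```java
// Sketch of the Tree visitor with accept filled in; assumes standard visitor dispatch.
class VisitorDemo {
    interface TreeVisitor<T, U> {
        U forCons(Cons<T> c);
        U forLeaf(Leaf<T> l);
    }

    static abstract class Tree<T> {
        // Polymorphic method: U is chosen per call, independently of T.
        public abstract <U> U accept(TreeVisitor<T, U> v);
    }

    static class Cons<T> extends Tree<T> {
        Tree<T> left, right;
        Cons(Tree<T> l, Tree<T> r) { left = l; right = r; }
        public <U> U accept(TreeVisitor<T, U> v) { return v.forCons(this); }
    }

    static class Leaf<T> extends Tree<T> {
        T value;
        Leaf(T v) { value = v; }
        public <U> U accept(TreeVisitor<T, U> v) { return v.forLeaf(this); }
    }

    // Hypothetical visitor: counts the leaves of a tree.
    static class LeafCounter<T> implements TreeVisitor<T, Integer> {
        public Integer forCons(Cons<T> c) {
            return c.left.accept(this) + c.right.accept(this);
        }
        public Integer forLeaf(Leaf<T> l) { return 1; }
    }

    static int countLeaves() {
        Tree<String> t = new Cons<String>(
            new Leaf<String>("a"),
            new Cons<String>(new Leaf<String>("b"), new Leaf<String>("c")));
        return t.accept(new LeafCounter<String>());
    }

    public static void main(String[] args) {
        System.out.println(countLeaves());
    }
}
```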
Fine points of Syntax • Static inner classes create holes in scope of any containing generic (parametric) class, e.g. class Foo<T> { static class Bar1 { … // T is not in scope } class Bar2 { … // T is in scope } }
Type Erasure • In essence, type erasure translates parameterized code to the standard universal Object type idiom required to simulate genericity in ordinary Java, e.g., Vector<Integer> → Vector augmented by (Integer) casts where required; these generated casts never fail. • Type erasure is a well-established part of the functional programming language folklore; most ML compilers rely on this process in generating code. • Technical complications: bridge methods • Near production quality compiler available for downloading on the web (JSR14 at JDC)
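To make the translation concrete, the sketch below places a generic class next to a hand-written approximation of its erasure, and uses reflection to show that the compiler emits a bridge method. Box, ErasedBox, Name, and countBridgeMethods are hypothetical names; the erased class is written by hand and only approximates compiler output.

```java
import java.lang.reflect.Method;

// Sketch of type erasure and bridge methods; all names are illustrative.
class ErasureDemo {
    // Generic source form.
    static class Box<T> {
        private T item;
        Box(T item) { this.item = item; }
        T get() { return item; }
    }

    // Roughly what erasure yields: T is replaced by its bound, Object.
    static class ErasedBox {
        private Object item;
        ErasedBox(Object item) { this.item = item; }
        Object get() { return item; }
    }

    // Comparable<Name>.compareTo(T) erases to compareTo(Object), so the compiler
    // adds a synthetic bridge compareTo(Object) that forwards to compareTo(Name).
    static class Name implements Comparable<Name> {
        final String text;
        Name(String text) { this.text = text; }
        public int compareTo(Name other) { return text.compareTo(other.text); }
    }

    static int countBridgeMethods(Class<?> c) {
        int n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.isBridge()) n++;
        }
        return n;
    }

    static boolean erasedCastAgrees() {
        Box<Integer> b = new Box<Integer>(42);
        Integer viaGeneric = b.get();             // compiler inserts the cast
        ErasedBox e = new ErasedBox(42);
        Integer viaErased = (Integer) e.get();    // same cast, written by hand
        return viaGeneric.equals(viaErased);
    }

    public static void main(String[] args) {
        System.out.println(erasedCastAgrees());
        System.out.println(countBridgeMethods(Name.class));
    }
}
```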
Where (Interim) Generic Java Fails • Absence of run-time types is inconsistent with the built-in array type: new T[], new T[][], … are all illegal • Absence of run-time types is inconsistent with the run-time type tests provided by Java: instanceof Vector<T> is illegal; the casts (Vector<T>) and (T) are illegal; exception types cannot be parametric • Absence of co-variant subtyping forces copying in some cases; given A <: B, a List<A> object does not have type List<B>
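Two of these limitations can be observed directly in erased generics as they later shipped in Java. The sketch below is illustrative only: ErasureLimitsDemo and its methods are made-up names, and the Class-token workaround for array creation is a common idiom in erased Java, not something the slides propose.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Sketch of what the absence of run-time type arguments looks like in practice.
class ErasureLimitsDemo {
    // Both lists share one run-time class, so no test can tell them apart.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strs = new ArrayList<String>();
        return ints.getClass() == strs.getClass();
    }

    // new T[n] is illegal; a common workaround passes the element class explicitly.
    @SuppressWarnings("unchecked")
    static <T> T[] newArray(Class<T> elementType, int n) {
        return (T[]) Array.newInstance(elementType, n);
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        String[] a = newArray(String.class, 3);
        System.out.println(a.getClass().getComponentType().getName());
    }
}
```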
Importance of Run-time Types Required to support: • isolated parametric allocation: new T(), new T[], new T[][], ... • parametric casts (isolated and non-isolated): (T) ... , (T[]) … , (Vector<T>) … , ... • instantiated casts (cloning!): (Vector<Integer>) ... , (List<Number>) … , … • accurate getClass() semantics
Comprehensive Solution: NextGen (Cartwright and Steele) • Supports exactly the same extension syntax as GJ, less the restrictions. • All types are available at run-time for casting and instanceof tests • Lightweight homogeneous (code shared across parametric instantiations) implementation • Performance of prototype compiler has been very encouraging.
NextGen Implementation Strategy Augment GJ homogeneous implementation relying on type-erasure • Use lightweight “instantiation” classes (generated on demand) to specify run-time types • Replace type dependent operations in base classes by abstract methods (snippets) and override them in instantiation classes
Simple Example (no snippets) abstract class Tree<T> {} class Cons<T> extends Tree<T> { Tree<T> left,right; Cons(Tree<T> l, Tree<T> r) { left = l; right = r; } } class Leaf<T> extends Tree<T> { T value; Leaf(T v) { value = v; } }
Corresponding Base Classes abstract class Tree {} abstract class Cons extends Tree { Tree left, right; Cons(Tree l, Tree r) { left = l; right = r; } } abstract class Leaf extends Tree { Object value; Leaf(Object v) { value = v; } }
Naïve Instantiation Classes Leaf<Integer> generates: abstract class Tree$_Integer_$ extends Tree {} class Leaf$_Integer_$ extends Leaf { Leaf$_Integer_$(Integer v) { super(v); } } Problem: Leaf$_Integer_$ !<: Tree$_Integer_$ Leaf$_Integer_$ needs to be a subclass of two different classes: Tree$_Integer_$ and Leaf
Naïve Instantiation Class Hierarchy [Diagram; nodes: Tree, Cons, Leaf, Tree$_Integer_$, Cons$_Integer_$, Leaf$_Integer_$]
Solution Use interfaces to represent type inclusions Each class instantiation C<T> generates both • a wrapper class C$_T_$ and • a wrapper interface $C$_T_$ where C$_T_$ implements $C$_T_$ Use class inheritance only for code inheritance
Correct Instantiation Class Hierarchy [Diagram; nodes: T, L, C; $T$_I_$, $C$_I_$, $L$_I_$; T$_I_$, C$_I_$, L$_I_$]
Correct Instantiation Classes interface $Tree$_Integer_$ {} abstract class Tree$_Integer_$ extends Tree implements $Tree$_Integer_$ {} interface $Leaf$_Integer_$ extends $Tree$_Integer_$ {} class Leaf$_Integer_$ extends Leaf implements $Leaf$_Integer_$ { Leaf$_Integer_$(Integer v) { super(v); } }
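The wrapper scheme can be simulated by hand in ordinary Java, since `$` is legal in identifiers. The sketch below follows the naming on the slide above (wrapper class C$_T_$, wrapper interface $C$_T_$); the String instantiation and the names InstantiationDemo and isIntegerTree are hypothetical additions for illustration. A real NextGen compiler would generate these classes on demand rather than have them written out.

```java
// Hand-written simulation of NextGen instantiation classes; illustrative only.
class InstantiationDemo {
    // Base classes: the erased code shared by every instantiation.
    static abstract class Tree {}
    static abstract class Leaf extends Tree {
        Object value;
        Leaf(Object v) { value = v; }
    }

    // Wrapper interfaces carry the subtyping between instantiated types.
    interface $Tree$_Integer_$ {}
    interface $Leaf$_Integer_$ extends $Tree$_Integer_$ {}

    // Wrapper classes inherit code from the base class and implement the interface.
    static abstract class Tree$_Integer_$ extends Tree implements $Tree$_Integer_$ {}
    static class Leaf$_Integer_$ extends Leaf implements $Leaf$_Integer_$ {
        Leaf$_Integer_$(Integer v) { super(v); }
    }

    // Hypothetical second instantiation, Leaf<String>, for comparison.
    interface $Tree$_String_$ {}
    interface $Leaf$_String_$ extends $Tree$_String_$ {}
    static class Leaf$_String_$ extends Leaf implements $Leaf$_String_$ {
        Leaf$_String_$(String v) { super(v); }
    }

    // Plays the role of "x instanceof Tree<Integer>" in the source language.
    static boolean isIntegerTree(Object x) {
        return x instanceof $Tree$_Integer_$;
    }

    public static void main(String[] args) {
        System.out.println(isIntegerTree(new Leaf$_Integer_$(1)));
        System.out.println(isIntegerTree(new Leaf$_String_$("a")));
    }
}
```

Both leaf wrappers still extend the shared base class Leaf, so code inheritance goes through classes while type inclusion goes through interfaces.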
Snippet Example abstract class Tree<T> {} class Cons<T> extends Tree<T> { ... public boolean equals(Object other) { if (getClass() == other.getClass()) { Cons<T> o = (Cons<T>) other; return left.equals(o.left) && right.equals(o.right); } else return false; } } class Leaf<T> extends Tree<T> { ... public boolean equals(Object other) { if (getClass() == other.getClass()) { Leaf<T> o = (Leaf<T>) other; return value.equals(o.value); } else return false; } }
Corresponding Base Classes abstract class Tree {} abstract class Cons extends Tree { ... public boolean equals(Object other) { if (getClass() == other.getClass()) { Cons o = $snip1(other); return left.equals(o.left) && right.equals(o.right); } else return false; } abstract Cons $snip1(Object o); } abstract class Leaf extends Tree { ... public boolean equals(Object other) { if (getClass() == other.getClass()) { Leaf o = $snip2(other); return value.equals(o.value); } else return false; } abstract Leaf $snip2(Object o); }
Snippet Instantiation Classes Leaf<Integer> generates: interface Tree$_Integer_$ {} abstract class $Tree$_Integer_$ implements Tree$_Integer_$ {} interface Leaf$_Integer_$ {} class $Leaf$_Integer_$ extends Leaf implements Leaf$_Integer_$ { … Leaf $snip2(Object o) { return (Leaf)((Leaf$_Integer_$) o); } }
Parametric Casting • x instanceof Leaf<Integer> → x instanceof Leaf$_Integer_$ • (Leaf<Integer>) x → (Leaf) (Leaf$_Integer_$) x
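The snippet mechanism and the cast translation can be exercised together in a hand-written simulation. This sketch follows the naming of the two slides immediately above (interface Leaf$_Integer_$, wrapper class $Leaf$_Integer_$). The String instantiation, the null check in equals, the hashCode override, the subtype link between the two Integer interfaces, and the names SnippetDemo and isIntegerLeaf are assumptions added so the example is complete and runnable.

```java
// Hand-written simulation of NextGen snippets; illustrative, not compiler output.
class SnippetDemo {
    static abstract class Tree {}

    // Base class: equals is shared; the type-dependent cast is an abstract snippet.
    static abstract class Leaf extends Tree {
        Object value;
        Leaf(Object v) { value = v; }

        @Override public boolean equals(Object other) {
            // Null check added here as a defensive assumption.
            if (other != null && getClass() == other.getClass()) {
                Leaf o = $snip2(other);
                return value.equals(o.value);
            }
            return false;
        }

        @Override public int hashCode() { return value.hashCode(); }

        abstract Leaf $snip2(Object o);
    }

    interface Tree$_Integer_$ {}
    interface Leaf$_Integer_$ extends Tree$_Integer_$ {}

    static class $Leaf$_Integer_$ extends Leaf implements Leaf$_Integer_$ {
        $Leaf$_Integer_$(Integer v) { super(v); }
        // Translation of the cast (Leaf<Integer>) o.
        Leaf $snip2(Object o) { return (Leaf) ((Leaf$_Integer_$) o); }
    }

    // Hypothetical second instantiation, Leaf<String>.
    interface Leaf$_String_$ {}
    static class $Leaf$_String_$ extends Leaf implements Leaf$_String_$ {
        $Leaf$_String_$(String v) { super(v); }
        Leaf $snip2(Object o) { return (Leaf) ((Leaf$_String_$) o); }
    }

    // Translation of "x instanceof Leaf<Integer>".
    static boolean isIntegerLeaf(Object x) {
        return x instanceof Leaf$_Integer_$;
    }

    public static void main(String[] args) {
        Leaf a = new $Leaf$_Integer_$(5);
        Leaf b = new $Leaf$_Integer_$(5);
        Leaf c = new $Leaf$_String_$("5");
        System.out.println(a.equals(b));
        System.out.println(a.equals(c));
        System.out.println(isIntegerLeaf(c));
    }
}
```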
Other Subtle Issues • Circular type application chains prevent compile-time generation of instantiation classes and interfaces • Package private type parameters • Polymorphic methods (inner class translation fails in general) • Parametric generalization of library classes, e.g., Vector • Interoperability with old code
Observations • Performance difference between different JVMs (e.g., Sun vs. IBM JDK 1.3 on Linux) is much greater than the difference between GJ and NextGen • Implementation tuning of the JIT can eliminate much of the performance penalty
Work in Progress • Full support for polymorphic methods • Covariant type parameterization of classes • Per-class-instantiation static fields • Generic types as parameters • NextGen cognizant IDE: DrJava (teachjava.org) • Designing related extensions: primitive types as object types; hygienic mixins; modular components
Conclusion Java has lots of room to grow because the JVM is a standardized virtual machine with a rich set of primitives, most notably custom class loaders. Primary costs: • challenging design problems • more complex compiler • reliance on customized class loader • extra class file attributes • less transparent debugging