Data Types and the Type System

Data Types and the Type System James Brucker

Important Topics • what is a "type system"? • what are common data types? • how are numeric types stored and operated on? • compound types: arrays, struct, records • enumerated types • character and string types • strong type checking versus not-so-strong • enumerations, type compatibility, and type safety • advantages & disadvantages of compile-time checking

Important Topics • type compatibility • what are compatibility rules in C, C++, and Java? • when are user-defined types compatible? • type conversion • what conversions are automatic in C, C++, Java? • what conversions are allowed using a cast?

Importance of knowing data types • Need to know the valid range of data values. for(int k=0; k<9999999; k++) /*do something*/; int k = Integer.MAX_VALUE; // = 2,147,483,647 • Need to know rules for operations. int k = Integer.MAX_VALUE; // = 2,147,483,647 k = k + 1; // overflow? int m = 7, n = 4; float x = m / n; // 1.75, 2.0 or 1.0 ? • Need to know what assignments are valid and how the compiler will convert from one type to another. • Need to know what variables represent: value of data or a reference to a storage location.

Data Type • A data type is a set of possible values and operations on those values Example: int set of values: -2,147,483,648 ..., 0, 1, 2,147,483,647 ( -231 to 231 - 1 ) operations: +: intint  int * : intint  int etc. internal representation: 32-bit 2's complement

Data Types define meaning • To the computer, a stored value is just bits. • The data type assigns meaning to those bits. • Example C function: /* "An int is a unsigned is a float" --the cpu */ void rawdata( ) { union { int i; unsigned int u; float f; } x; while( 1 ) { printf("Input a value: "); scanf("%d", &x.i); // read as an int printf("int %d is unsigned %u is float %g\n", x.i, x.u, x.f); } }

Type System The type system is • the collection of all data types • rules for type equivalence, type compatibility, and type conversion between data types

Type System Example: int n = 0.5 * 99; • cannot directly multiply a float times int • type conversion rule: "int" can be automatically converted to "float". • type system says float * float is float (49.5) • in C, assignment compatibility rule says that you can convert float back to int by truncation. • in Java or C#, result is double and it is not assignment compatible with int (assignment error)

Memory Concepts • We will cover memory management later, but first... • When OS runs a program it allocates at least 2 memory segments: • text segment for program instructions (Read Only) • data segment for data (variables, constants, ...) • Data segment is divided into 3 parts: • static area - static data • stack area - stack oriented data • heap - dynamic, non-stack data

Memory Concepts a[0], a[1], ... int count = 0; const int MAXSIZE = 4000; int *getarray(int n) { int *a = (int *)malloc( n*sizeof(int) ); return a; } int main( ) { int size; scanf("%d",&n); int *a = getarray(n); scanf("%f",a); a++; } Heap Area unused space Stack Frame for getarray Stack Area Stack Frame for main size a (pointer only) count MAXSIZE Static Area

Virtual Memory • Most OS use virtual memory. • The actual location of memory pages varies. • Accessing memory efficiently affects program speed. Program virtual memory Real memory page n page n+1 page n Memory manager page n+1

Integer Data Types C/C++ support both “unsigned” and “signed” integer types. Type # Bytes Range of values short int 2 -32,768 (-215) to 32,767 (215 - 1) unsigned short 2 0 to 65,535 (216 - 1) int 4 -2,147,483,648 (-231) to 2,147,483,647 (231 - 1) unsigned int 4 0 to 4,294,967,295 (232 - 1) long int same as "int" on Pentium and Athlon CPU C permits “char” type for integer values, too… char 1 -128 to 127 unsigned char 1 0 to 255

Example use of unsigned int • To display the address of a variable in C: printf("address of %s is %d\n", "x", (unsigned int)&x);

IEEE 754 Floating Point Standard • Problem: some numerical algorithms would run on one computer, but fail on another computer. with arithmetic overflow/underflow error on another. • Worse problem: results from different computers could differ greatly! • This reduced trust in the answer from computer. • In fact, when numerical results differ greatly it usually indicates a problem in the algorithm! • Solution: IEEE 754 (1985) defines a standard for computer storage of floating point numbers.

IEEE Floating Point Data Types -1.011100 x 211 = 1 1 0 0 0 1 0 1 0 0 1 1 1 0 0 0 0 . . . Sign bit Biased Exponent Mantissa Float: 1 8 bits bias= 127 23 bits Double: 1 11 bits bias=1023 52 bits Range Precision Float: 10-38 - 10+38 24 bits =~ 7 dec. digits Double: 10-308 - 10+308 53 bits =~ 15 dec. digits Stored exponent = actual exponent + bias

Implicit Leading "1" • Floating point numbers are stored in normalizedform: 13.2525 =1101.01010 = 1.1010100 x 23 3/16 =0.00110000 = 1.1000000 x 2-3 • Normalized form: the leading digit is always one. So, IEEE 752 doesn't store it. • Rule: if the stored value has exp. 2-bias to 2+bias then the floating point value is stored in normalized format: 1011.01110 = 1.011011100 x 23 mantissa: 011011100... exponent: 3+bias = 130 (single prec)

Gradual underflow • To extend the precision for small numbers, very small numbers are not stored in normalized form. • In this case the leading "1" is also stored and the biased exponent has value 0 (smallest exponent) Value MantissaBiased Exp. 1.01101110x 2-126 01101110000000000000000 -126+bias = 1 1.01101110x 2-127 10110111000000000000000 -127+bias = 0 1.01101110x 2-128 01011011100000000000000 -127+bias = 0 1.01101110x 2-129 00101101110000000000000 -127+bias = 0 1.01101110x 2-130 00010110111000000000000 -127+bias = 0 ... as number gets smaller, leading significant digits shift right 1.01101110x 2-147 00000000000000000000101 -127+bias = 0 1.01101110x 2-148 00000000000000000000010 -127+bias = 0 1.01101110x 2-149 00000000000000000000001 -127+bias = 0

IEEE 754 Floating Point Values The standard defines special values: +/-Infinity: 1/0 = +Infinity, -3/0 = -Infinity, exp(5000)= +Infinity, Infinity+Infinity = Infinity NaN (Not-a-Number). 0/0 = NaN, Infinity*0 = NaN, ... Value Sign Bit Exponent Mantissa Normalized f.p. 0, 1 1 to 2*bias any Denormalized 0, 1 00000000 any Zero 0, 1000000000 +Infinity 0 11111111 0 -Infinity 1 11111111 0 NaN 0, 1 11111111 any non-0

Floating Point Questions Question: How do you store 2.50 as a "float"? 2.50 = 1.25 x 2 = 1.01000000000 x 21 Implicit leading 1 rule: mantissa = 010000000000000 Exponent: 1 + bias = 128 Stored value: 0 10000000 010000000000000000000 Question: What is the decimal value of: 1 10000000 100000000000000000000 0 00000000 000000000000000000000 0 11111111 000000000000000000000

Floating Point Questions (cont'd) Question: How do you store 0.1 as a "float"? 0.1 = 0.0011001100110011001100 ... Normalized mantissa = 10011001100110011001100 Exponent: -3 + bias = 124 Stored val: 0 01111100 10011001100110011001100 • 0.1 has no exact representation in binary! Question: what decimal values have an exact binary representation (no truncation error)???

Consequence of inexact conversion • 0.1 does not have exact binary representation. • Therefore, we may have: 10 * 0.1 != 1.0 • 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1!= 1.0 • Don't use "==" as test criteria in loops with floats.This loop never terminates: double x = 0.1; while( x != 1.0 ) { // better: ( x <= 1.0 ) System.out.println( x ); x = x + 0.1; }

Type Compatibility for built-in types • Operations in most languages will automatically convert ("promote") some data types: 2 * 1.75 convert 2 (int) to floating point • Assignment compatibility: what automatic type conversions are allowed on assignment? int n = 1234567890; float x = n; // OK is C or Java n = x; // allowed in C? Java? • char -> short -> int -> long -> double short -> int -> float -> double • What about long -> float ? • Rules for C/C++ not same as Java.

C/C++ Arithmetic Type Conversion • For +, -, *, /, both operands must be the same type • C/C++ compiler "promotes" mixed type operands to make all operands same using the following rules: Operand Types Promote Result short op int short => int int long op int int => long long int op float int => float float int op double int => double double float op double float => double double etc... "op" is any arithmetic operation: + - * /

Assignment Type Conversion is not Arithmetic Type Conversion (1) • What is the result of this calculation? int m = 15; int n = 16; double x = m / n;

Forcing Type Conversion • Since arguments are integer, integer division is used: double x = 15 / 16; // = 0 ! • you must coerce "int" values to floating point. There are two ways: int m = 15; int n = 16; /** Efficient way: cast as a double */ double x = (double)m / (double)n ; /** Clumsy way: multiply by a float (ala Fortran) */ double x = 1.0*m / n;

Assignment Type Conversion is not Arithmetic Type Conversion (2) • Many students wrote this in Fraction program: public class Fraction { int numerator; // numerator of the fraction int denominator; // denominator of the fraction ...etc... /** compare this fraction to another. */ public int compareTo( Fraction frac ) { double r1 = this.numerator / this.denominator; double r2 = frac.numerator / frac.denominator; if ( r1 > r2 ) return 1; else if ( r1 == r2 ) return 0; else return -1; }

Arrays An array is a series of elements of the same type, with an index, which occupy consecutive memory locations. float x[10]; // C: array of 10 “float” vars char [] c = new char[40]; // Java: array of 40 "char" Array x[ ] in memory: x[0] x[1] x[2] . . . x[9] 4 Bytes = sizeof(float) Array c[ ] in memory : c[0] c[1] . . . c[39]

Array "dope vector" • In C or Fortran an array is just a set on continuous elements. No type or length information is stored. • Some languages store a "dope vector" (aka array descriptor) describing the array. /* C language */ double x[10]; /* Language with dope */ double x[10]; x 01E4820 x[0] x[1] x[2] x[3] ... x[0] x[1] x[2] x[3] ... x[9] x double 0 10 01E4820

Array as Object • In Java, arrays are objects: double [ ] x = new double[10]; • x is an Object; x[10] is a double (primitive type). double[ ] +length = 10 x[0] x[1] x[2] ... x x.getClass( ).toString( ) returns "[D"

1-Dimensional Arrays • Element of 1-D array computed as offset from start: float f[20]; address of f[n] = address(f) + n*sizeof(float) • Some languages permit arbitrary index bounds: • Pascal: var a: array [ 2..5 ] of real; • FORTRAN REAL (100) X array is X(1) ... X(100) REAL (2:5) Y array is Y(2) ... Y(5) • In any case, array element can be computed as offset: address of a[n] = address(a) + (n-start)*sizeof(real)

2-Dimensional Arrays There are different organizations of 2-D arrays: • Rectangular array in row major order: float r[4,3]; In memory (row major order): r[0,0] r[0,1] r[0,2] r[1,0] r[1,1] r[1,2] r[2,0] • Rectangular array in column major order (Fortran): real(4,3) x in memory (column major order) x(1,1) x(2,1) x(3,1) x(4,1) x(1,2) x(2,2) x(3,2) x(4,2) x(1,3)...

2-Dimensional Arrays • Computing address of array elements • Rectangular arrays in row major order: float x[ROWS][COLS]; address of x[j][k] = address(x) + (j*COLS + k) * sizeof(float) • Three dimensional array: float y[J][K][L]; address of y[j][k][l] = address(y) + j*K*L + k*L + l • 2-D and 3-D arrays require more time to access due to this calculation. • Compiler can optimize when you access consecutive items for(k = 0; k<COLS; k++) sum += x[j][k];

Arrays of Pointers: ragged arrays • Each element of a vector is a pointer to a vector char *days[7] = { "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday" }; days = days[0] days[1] days[2] days[3] days[4] days[5] days[6] days[7] S u n d a y 0 M o n d a y 0 T u e s d a y 0 W e d n e s d a y 0 T h u r s d a y 0 F r i d a y 0 S a t u r d a y 0 Vector of pointers: 7 x 4 bytes = 28 bytes Array of characters: = 57 bytes

Arrays of Pointers: ragged arrays (2) • Compare previous slide with 2-D array: char days[ ][10] = { "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday" }; days = S u n d a y 0 M o n d a y 0 T u e s d a y 0 W e d n e s d a y 0 T h u r s d a y 0 F r i d a y 0 S a t u r d a y 0 2-D array = 7 x 10 bytes = 70 bytes

What is sizeof( ) for 2-D arrays? • Rectangular array in C: char days[7][10] = { "Sunday", "Monday", ... }; int m = sizeof( days ); int n = sizeof( days[0] ); • Array of pointers in C: char *days[7] = { "Sunday", "Monday", ... }; int m = sizeof( days ); int n = sizeof( days[0] );

Java: always uses array of pointers • 2-D arrays in Java are always treated as array of pointers final int N = 10; double [][] a; a = new double[N][ ]; // create row pointers for(int k=0; k<N; k++) a[k] = new double[k+1]; // create columns // array dimensions determined by initial values int [][] m = { { 1, 2, 3, 4}, { 5, 6}, { 8, 9, 10}, { 11 } }; What are the sizes of each row of m ?

C#: rectangular and ragged arrays • A rectangular array in C# (one set of brackets) const int N1 = 10, N2 = 20, N3=25; // 2-dimensional array double [,] a = new double[N1,N2]; // 3-dimensional array double [,,] a = new double[N1,N2,N3]; • A ragged array in C# or Java uses multiple brackets: // create array of row pointers double [][] b = new double[N1][ ]; // allocate space for each row (can differ) for (k=0; k<N1; k++) b[k] = new double[N2]; • In Java (but not C#) can write: b = new double[N1][N2]

Accessing Array Elements • Ragged Arrays require multiple levels of dereferencing result = b[i][j]; • get address of b. • get b[i]. _addr = valueat( address(b) + i*sizeof( b[ ][ ] ) ) • result = valueat( _addr + j*sizeof( b[ ][ ] ) ) • Rectangular array computes address as offset: double [ , ] b = new double[N1,N2]; result = b[i,j]; • get address of b. • result = valueat( address(b) +i*N2 + j ) • In Java and C#, arrays are objects, so address is not this simple.

Efficiency and multi-dimensional array • Multi-dimensional array access is muchslower than 1-D array. • Access in row order is more efficient, and can minimize paging. // search a[ROWS,COLS] in row major order for(int r=0; r<ROWS; r++) for (int c=0; c<COLS; c++) if ( a[r,c] > max ) max = a[r,c]; r[0,0] r[0,1] ... r[0,ROWS-1] r[1,0] r[1,1] // search a[ROWS,COLS] in column major order for(int c=0; c<COLS; c++) for (int r=0; r<ROWS; r++) if ( a[r,c] > max ) max = a[r,c]; r[0,0] r[0,1] ... r[0,ROWS-1] r[1,0] r[1,1]

Type Checking • Verifying that the actual value of an expression is valid for the type to which it is assigned. • A strongly typed language is one in which all type errors are detected at compile time or run time. Example: • Java is strongly typed: most type errors are detected by compiler. Others, like casts, are checked at runtime and generate exceptions: Object obj = new Double(2.5); String s = obj; // compile time error String s = (String) obj; // run-time ClassCastException

Type Compatibility for user types typedef int type_a; typedef int type_b; int main( ) { type_a a; type_b b; b = 5; // assign integer to "type_b" variable OK? a = b; // assign "type_b" to "type_a" variable OK? • In C, "typedef" defines an alias for a type -- it doesn't create a new type.

Type Compatibility for user types (2) struct A { float x; char c; }; struct B { float x; char c; }; typedef C { float z; char c; }; int main( ) { struct A a; struct B b; struct C c; a.x = 0.5; a.c = 'a'; b = a; // OK c = a; // Error if (b == a) // OK?

Type Compatibility for classes (3) public class A { public float x; public char c; } public class B { public float x; public char c; } public class C { public float z; public char c; } public static void main(...) { A a = new A(); B b; C c; a.x = 0.5; a.c = 'a'; b = a; // Error c = a; // Error

Type Conversion and Polymorphism int max( int a, int b) { if ( a > b ) return a; else return b; } float max( float a, float b) { if ( a > b ) return a; else return b; } int main( ) { int m, n; float x, y, z; x = 5.5; m = x; // OK to convert float to int z = max(x, y); // EASY! call max(float,float) n = max(m, x); // which max function? y = max(x, m); // which max function?

Explicit Polymorphism in C++ /* This template generates "max" functions of * any parameter type that the program needs. */ template <typename T> T max( T a, T b ) { if ( a > b ) return a; else return b; } int main( ) { int m = 4, n = 9; float x = 0.5, y = 2.7; n = max(m, m); // generate max(int, int) y = max(x, y); // generate max(float, float)

Data Types and the Type System

Data Types and the Type System

Presentation Transcript

Types, Type Inference and Unification

Chapter 7: User-Defined Simple Data Types, Namespaces, and the string Type

Data Types and Data Structures

Types and Type Inference

Methods, Data and Data Types

Data Type and Operator

The String Data Type

Data Type and Display

The fundamental data type

Data Type and Variable

Data Types and Data Sources

CHAPTER 8 USER-DEFINED SIMPLE DATA TYPES, NAMESPACES, AND THE string TYPE

CHAPTER 8 USER-DEFINED SIMPLE DATA TYPES, NAMESPACES, AND THE string TYPE

Spatial Data Type Definitions in ORDB Abstract Data Types (ADT)

Data Type

The Type System

Data and Data Types