Data Types and the Type System James Brucker
Important Topics • what is a "type system"? • what are common data types? • how are numeric types stored and operated on? • compound types: arrays, struct, records • enumerated types • character and string types • strong type checking versus not-so-strong • enumerations, type compatibility, and type safety • advantages & disadvantages of compile-time checking
Important Topics • type compatibility • what are compatibility rules in C, C++, and Java? • when are user-defined types compatible? • type conversion • what conversions are automatic in C, C++, Java? • what conversions are allowed using a cast?
Importance of knowing data types • Need to know the valid range of data values. for(int k=0; k<9999999; k++) /*do something*/; int k = Integer.MAX_VALUE; // = 2,147,483,647 • Need to know rules for operations. int k = Integer.MAX_VALUE; // = 2,147,483,647 k = k + 1; // overflow? int m = 7, n = 4; float x = m / n; // 1.75, 2.0 or 1.0 ? • Need to know what assignments are valid and how the compiler will convert from one type to another. • Need to know what variables represent: value of data or a reference to a storage location.
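The range and division questions above can be checked in C. This is a minimal sketch; the helper names `checked_increment` and `int_quotient_as_float` are made up for illustration. Signed overflow is undefined behavior in C (Java wraps around instead), so the sketch tests against INT_MAX before adding.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store k + 1 in *out only if it fits in an int.
 * Returns false (and leaves *out untouched) when k is already INT_MAX. */
bool checked_increment(int k, int *out)
{
    if (k == INT_MAX) {
        return false;           /* k + 1 would overflow */
    }
    *out = k + 1;
    return true;
}

/* m / n is evaluated with integer division first; only the integer
 * quotient is then converted to float. The caller must pass n != 0. */
float int_quotient_as_float(int m, int n)
{
    float x = m / n;
    return x;
}
```

Calling `int_quotient_as_float(7, 4)` shows which of the three candidate answers on the slide the compiler actually produces.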
Data Type • A data type is a set of possible values and the operations on those values. Example: int • set of values: -2,147,483,648, ..., -1, 0, 1, ..., 2,147,483,647 ( -2^31 to 2^31 - 1 ) • operations: + : int x int -> int, * : int x int -> int, etc. • internal representation: 32-bit 2's complement
Data Types define meaning
• To the computer, a stored value is just bits.
• The data type assigns meaning to those bits.
• Example C function:
    #include <stdio.h>

    /* "An int is an unsigned is a float" --the cpu */
    void rawdata( ) {
        union { int i; unsigned int u; float f; } x;
        while( 1 ) {
            printf("Input a value: ");
            if ( scanf("%d", &x.i) != 1 ) break;   // read as an int; stop on bad input
            printf("int %d is unsigned %u is float %g\n", x.i, x.u, x.f);
        }
    }
Type System The type system is • the collection of all data types • rules for type equivalence, type compatibility, and type conversion between data types
Type System Example: int n = 0.5 * 99; • cannot directly multiply a floating point value times an int • type conversion rule: "int" can be automatically converted to floating point (0.5 is a double literal in C, Java, and C#, so 99 becomes 99.0). • type system says double * double is double (49.5) • in C, assignment compatibility rule says that you can convert the floating point result back to int by truncation (n = 49). • in Java or C#, result is double and it is not assignment compatible with int (assignment error)
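The C side of this example can be sketched as follows. The function names are illustrative only; the behavior shown (conversion of 99 to double, then truncation toward zero on assignment) is what the C conversion rules specify.

```c
/* Returns the value C stores for `int n = 0.5 * 99;`.
 * 99 is converted to double, the product is 49.5, and the
 * assignment to int discards the fractional part. */
int half_of_99(void)
{
    int n = 0.5 * 99;
    return n;
}

/* Truncation is toward zero, so a negative value moves up toward 0
 * rather than down to the next lower integer. */
int truncate_to_int(double d)
{
    return (int)d;
}
```

In Java or C# the body of `half_of_99` would be rejected at compile time unless the product is cast explicitly.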
Memory Concepts • We will cover memory management later, but first... • When OS runs a program it allocates at least 2 memory segments: • text segment for program instructions (Read Only) • data segment for data (variables, constants, ...) • Data segment is divided into 3 parts: • static area - static data • stack area - stack oriented data • heap - dynamic, non-stack data
Memory Concepts
    #include <stdio.h>
    #include <stdlib.h>

    int count = 0;
    const int MAXSIZE = 4000;
    int *getarray(int n) {
        int *a = (int *)malloc( n*sizeof(int) );
        return a;
    }
    int main( ) {
        int size;
        scanf("%d", &size);
        int *a = getarray(size);
        scanf("%d", a);      // a points to an int, so read with %d
        a++;
    }
Where each item is stored:
• Heap Area: a[0], a[1], ... (followed by unused space)
• Stack Area: stack frame for getarray; stack frame for main, holding size and a (pointer only)
• Static Area: count, MAXSIZE
Virtual Memory • Most operating systems use virtual memory. • The actual location of memory pages varies. • Accessing memory efficiently affects program speed. [Figure: the memory manager maps pages n and n+1 of the program's virtual memory to pages in real memory.]
Integer Data Types
C/C++ support both “unsigned” and “signed” integer types.
Type             # Bytes   Range of values
short int        2         -32,768 (-2^15) to 32,767 (2^15 - 1)
unsigned short   2         0 to 65,535 (2^16 - 1)
int              4         -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1)
unsigned int     4         0 to 4,294,967,295 (2^32 - 1)
long int         same as "int" on Pentium and Athlon CPU
C permits “char” type for integer values, too…
char             1         -128 to 127
unsigned char    1         0 to 255
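Example use of unsigned int • To display the address of a variable in C: printf("address of %s is %u\n", "x", (unsigned int)&x); • This assumes an address fits in an unsigned int, which holds on 32-bit systems. The portable form is printf("address of %s is %p\n", "x", (void *)&x);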
IEEE 754 Floating Point Standard • Problem: some numerical algorithms would run on one computer, but fail with an arithmetic overflow/underflow error on another. • Worse problem: results from different computers could differ greatly! • This reduced trust in the answers from computers. • In fact, when numerical results differ greatly it usually indicates a problem in the algorithm! • Solution: IEEE 754 (1985) defines a standard for computer storage of floating point numbers.
IEEE Floating Point Data Types
Example: -1.011100 x 2^11 is stored as
    1 | 1 0 0 0 1 0 1 0 | 0 1 1 1 0 0 0 0 . . .
    Sign bit | Biased Exponent | Mantissa
          Sign bit   Biased Exponent         Mantissa
Float:    1          8 bits, bias = 127      23 bits
Double:   1          11 bits, bias = 1023    52 bits
          Range                   Precision
Float:    10^-38 to 10^+38        24 bits =~ 7 dec. digits
Double:   10^-308 to 10^+308      53 bits =~ 15 dec. digits
Stored exponent = actual exponent + bias
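The three fields of a single-precision value can be pulled apart in C. This is a sketch under the assumption that `float` is a 32-bit IEEE 754 single (true on common platforms, but not guaranteed by the C standard); the struct and function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a single-precision value (illustrative names). */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with bias 127 */
    uint32_t mantissa;  /* 23 bits, without the implicit leading 1 */
};

/* Assumes float is a 32-bit IEEE 754 single. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    memcpy(&bits, &f, sizeof bits);   /* copy the raw bits without aliasing problems */
    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For the slide's example, -1.011100 x 2^11 is -2944.0, so `split_float(-2944.0f)` can be compared field by field with the bit pattern shown above.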
Implicit Leading "1"
• Floating point numbers are stored in normalized form:
    13.3125 = 1101.0101 = 1.1010101 x 2^3
    3/16    = 0.0011    = 1.1 x 2^-3
• Normalized form: the leading digit is always one. So, IEEE 754 doesn't store it.
• Rule: if the actual exponent is between 1-bias and +bias (-126 to +127 for single precision), then the floating point value is stored in normalized format:
    1011.0111 = 1.0110111 x 2^3
    mantissa: 0110111000...
    exponent: 3+bias = 130 (single prec)
Gradual underflow
• To extend the range of small numbers, very small numbers are not stored in normalized form.
• In this case the leading "1" is also stored (as part of the mantissa) and the biased exponent has value 0 (smallest exponent).
Value                  Mantissa                   Biased Exp.
1.01101110 x 2^-126    01101110000000000000000    -126+bias = 1
1.01101110 x 2^-127    10110111000000000000000    -127+bias = 0
1.01101110 x 2^-128    01011011100000000000000    -127+bias = 0
1.01101110 x 2^-129    00101101110000000000000    -127+bias = 0
1.01101110 x 2^-130    00010110111000000000000    -127+bias = 0
... as number gets smaller, leading significant digits shift right
1.01101110 x 2^-147    00000000000000000000101    -127+bias = 0
1.01101110 x 2^-148    00000000000000000000010    -127+bias = 0
1.01101110 x 2^-149    00000000000000000000001    -127+bias = 0
IEEE 754 Floating Point Values
The standard defines special values:
• +/-Infinity: 1/0 = +Infinity, -3/0 = -Infinity, exp(5000) = +Infinity, Infinity+Infinity = Infinity
• NaN (Not-a-Number): 0/0 = NaN, Infinity*0 = NaN, ...
Value              Sign Bit   Exponent       Mantissa
Normalized f.p.    0, 1       1 to 2*bias    any
Denormalized       0, 1       00000000       any non-0
Zero               0, 1       00000000       0
+Infinity          0          11111111       0
-Infinity          1          11111111       0
NaN                0, 1       11111111       any non-0
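The categories in this table correspond to the classification macros in the standard header `<math.h>`. The sketch below assumes IEEE 754 arithmetic without flush-to-zero or fast-math compiler options; `float_category` and `divide` are hypothetical helper names.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: name the IEEE 754 category of a float. */
const char *float_category(float f)
{
    switch (fpclassify(f)) {
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "denormalized";
    case FP_NORMAL:    return "normalized";
    case FP_INFINITE:  return "infinity";
    case FP_NAN:       return "NaN";
    default:           return "unknown";
    }
}

/* Dividing parameters (not literals) keeps the division at run time. */
float divide(float a, float b)
{
    return a / b;
}
```

For instance, `float_category(divide(FLT_MIN, 2.0f))` classifies half of the smallest normalized float, which falls in the gradual underflow range.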
Floating Point Questions
Question: How do you store 2.50 as a "float"?
    2.50 = 1.25 x 2 = 1.0100000 x 2^1
    Implicit leading 1 rule: mantissa = 01000000000000000000000
    Exponent: 1 + bias = 128
    Stored value: 0 10000000 01000000000000000000000
Question: What is the decimal value of:
    1 10000000 10000000000000000000000
    0 00000000 00000000000000000000000
    0 11111111 00000000000000000000000
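One way to check answers to questions like these is to assemble the bit fields into a float and print it. This is a sketch assuming a 32-bit IEEE 754 `float`; the function name `float_from_fields` is invented here.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from its sign bit, 8-bit biased exponent and
 * 23-bit mantissa. Assumes float is a 32-bit IEEE 754 single. */
float float_from_fields(uint32_t sign, uint32_t exponent, uint32_t mantissa)
{
    uint32_t bits = ((sign & 1u) << 31)
                  | ((exponent & 0xFFu) << 23)
                  | (mantissa & 0x7FFFFFu);
    float f;

    memcpy(&f, &bits, sizeof f);   /* reinterpret the bits as a float */
    return f;
}
```

The stored value for 2.50 above has sign 0, exponent 128 and mantissa bits 0100...0 (0x200000 as a 23-bit number), so `float_from_fields(0, 128, 0x200000)` should print as 2.5 with `printf("%g", ...)`.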
Floating Point Questions (cont'd)
Question: How do you store 0.1 as a "float"?
    0.1 = 0.0001100110011001100110011 ... = 1.100110011001100... x 2^-4
    Normalized mantissa = 10011001100110011001100 (truncated to 23 bits; IEEE rounding to nearest makes the final bits ...101)
    Exponent: -4 + bias = 123
    Stored val: 0 01111011 10011001100110011001101
• 0.1 has no exact representation in binary!
Question: what decimal values have an exact binary representation (no truncation error)???
Consequence of inexact conversion
• 0.1 does not have exact binary representation.
• Therefore, we may have: 10 * 0.1 != 1.0
• 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 != 1.0
• Don't use "==" as test criteria in loops with floats. This loop never terminates:
    double x = 0.1;
    while( x != 1.0 ) {   // better: ( x <= 1.0 )
        System.out.println( x );
        x = x + 0.1;
    }
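Two common alternatives to an `==` test are an integer loop counter and a comparison with a tolerance. The C sketch below illustrates both; the function names are made up, and the tolerance passed to `nearly_equal` is an assumption that depends on the problem at hand.

```c
/* Adds 0.1 ten times. The loop is controlled by an integer counter,
 * so it always runs exactly ten times even though x is inexact. */
double sum_of_tenths(void)
{
    double x = 0.0;
    for (int i = 0; i < 10; i++) {
        x += 0.1;
    }
    return x;
}

/* Compare two doubles with an absolute tolerance instead of ==.
 * Returns 1 if |a - b| <= tolerance, otherwise 0. */
int nearly_equal(double a, double b, double tolerance)
{
    double diff = a - b;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= tolerance;
}
```

With IEEE double arithmetic, `sum_of_tenths() == 1.0` is false, while `nearly_equal(sum_of_tenths(), 1.0, 1e-9)` is true.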
Type Compatibility for built-in types • Operations in most languages will automatically convert ("promote") some data types: 2 * 1.75 convert 2 (int) to floating point • Assignment compatibility: what automatic type conversions are allowed on assignment? int n = 1234567890; float x = n; // OK in C or Java n = x; // allowed in C? Java? • Allowed automatic conversions: char -> short -> int -> long -> double and short -> int -> float -> double • What about long -> float ? • Rules for C/C++ not same as Java.
C/C++ Arithmetic Type Conversion
• For +, -, *, /, both operands must be the same type
• C/C++ compiler "promotes" mixed type operands to make all operands same using the following rules:
Operand Types      Promote            Result
short op int       short => int       int
long op int        int => long        long
int op float       int => float       float
int op double      int => double      double
float op double    float => double    double
etc...
"op" is any arithmetic operation: + - * /
Assignment Type Conversion is not Arithmetic Type Conversion (1) • What is the result of this calculation? int m = 15; int n = 16; double x = m / n;
Forcing Type Conversion
• Since the operands are integers, integer division is used:
    double x = 15 / 16;   // = 0 !
• You must coerce "int" values to floating point. There are two ways:
    int m = 15;
    int n = 16;
    /** Efficient way: cast as a double */
    double x = (double)m / (double)n ;
    /** Clumsy way: multiply by a floating point constant (ala Fortran) */
    double y = 1.0*m / n;
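The difference between the two conversions can be seen side by side in C. A minimal sketch with illustrative function names:

```c
/* Integer division happens first; the int result (0 for 15 / 16)
 * is converted to double only when it is assigned. */
double quotient_int_division(int m, int n)
{
    double x = m / n;
    return x;
}

/* Casting one operand forces the other to be promoted to double,
 * so floating point division is used. */
double quotient_with_cast(int m, int n)
{
    return (double)m / n;
}
```

`quotient_int_division(15, 16)` yields 0.0 and `quotient_with_cast(15, 16)` yields 0.9375; both require a nonzero `n`.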
Assignment Type Conversion is not Arithmetic Type Conversion (2)
• Many students wrote this in Fraction program:
    public class Fraction {
        int numerator;     // numerator of the fraction
        int denominator;   // denominator of the fraction
        ...etc...
        /** compare this fraction to another. */
        public int compareTo( Fraction frac ) {
            double r1 = this.numerator / this.denominator;   // integer division, then conversion to double
            double r2 = frac.numerator / frac.denominator;   // same problem
            if ( r1 > r2 ) return 1;
            else if ( r1 == r2 ) return 0;
            else return -1;
        }
    }
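One way to avoid the division altogether is to compare fractions by cross-multiplication. This C sketch is not taken from the slides; it is an alternative technique, it assumes both denominators are positive and that `int` is 32 bits (so the products fit in `long long`), and the struct and function names are hypothetical.

```c
/* A fraction with integer numerator and denominator. */
struct fraction {
    int numerator;
    int denominator;   /* assumed > 0 in this sketch */
};

/* Returns 1 if a > b, 0 if a == b, -1 if a < b.
 * a/b_den compared with c/d_den is the same as a*d_den compared with
 * c*b_den when both denominators are positive, so no division is needed. */
int fraction_compare(struct fraction a, struct fraction b)
{
    long long left  = (long long)a.numerator * b.denominator;
    long long right = (long long)b.numerator * a.denominator;

    return (left > right) - (left < right);
}
```

With the student version, 1/2 and 2/3 both become 0 and compare as equal; here `fraction_compare` orders them correctly.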
Arrays An array is a series of elements of the same type, with an index, which occupy consecutive memory locations. float x[10]; // C: array of 10 “float” vars char [] c = new char[40]; // Java: array of 40 "char" Array x[ ] in memory: x[0] x[1] x[2] . . . x[9] 4 Bytes = sizeof(float) Array c[ ] in memory : c[0] c[1] . . . c[39]
Array "dope vector"
• In C or Fortran an array is just a sequence of contiguous elements. No type or length information is stored.
• Some languages store a "dope vector" (aka array descriptor) describing the array.
    /* C language */
    double x[10];
  x is only the address (01E4820) of the elements x[0] x[1] x[2] x[3] ...
    /* Language with dope */
    double x[10];
  x refers to a descriptor (element type double, lower bound 0, length 10, address 01E4820) for the elements x[0] x[1] x[2] x[3] ... x[9]
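A dope vector can be imitated in C with a struct that carries the bounds alongside the data pointer. This is only a sketch of the idea; the field names and the bounds-checking helper `dope_element` are assumptions, not the layout used by any particular language.

```c
#include <stddef.h>

/* Hypothetical array descriptor for an array of double. */
struct dope_vector {
    size_t  element_size;  /* stands in for the element type */
    long    lower_bound;   /* index of the first element */
    long    length;        /* number of elements */
    double *data;          /* address of the first element */
};

/* Returns a pointer to element `index`, or NULL when the index is
 * outside lower_bound .. lower_bound + length - 1. */
double *dope_element(const struct dope_vector *dv, long index)
{
    if (index < dv->lower_bound || index >= dv->lower_bound + dv->length) {
        return NULL;
    }
    return dv->data + (index - dv->lower_bound);
}
```

Because the descriptor stores the length, an out-of-range index can be detected at run time, which a bare C array cannot do.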
Array as Object • In Java, arrays are objects: double [ ] x = new double[10]; • x is an Object; each element x[0] ... x[9] is a double (primitive type). [Diagram: x refers to a double[ ] object with field length = 10 and elements x[0] x[1] x[2] ...] x.getClass( ).getName( ) returns "[D"
1-Dimensional Arrays • Element of 1-D array computed as offset from start: float f[20]; address of f[n] = address(f) + n*sizeof(float) • Some languages permit arbitrary index bounds: • Pascal: var a: array [ 2..5 ] of real; • FORTRAN: REAL X(100) array is X(1) ... X(100); REAL Y(2:5) array is Y(2) ... Y(5) • In any case, array element can be computed as offset: address of a[n] = address(a) + (n-start)*sizeof(real)
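The offset formula with an arbitrary lower bound can be written directly in C. A small sketch; `element_offset` is an invented name, and the caller is assumed to pass `n >= start`.

```c
#include <stddef.h>

/* Byte offset of element n in a 1-D array whose first valid index
 * is `start`. For C arrays start is 0; for Pascal's array [2..5]
 * it would be 2. Requires n >= start. */
size_t element_offset(long n, long start, size_t element_size)
{
    return (size_t)(n - start) * element_size;
}
```

For the Pascal array `a: array [2..5] of real` with 4-byte reals, `element_offset(5, 2, 4)` gives 12, the offset of the fourth and last element.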
2-Dimensional Arrays There are different organizations of 2-D arrays: • Rectangular array in row major order: float r[4,3]; In memory (row major order): r[0,0] r[0,1] r[0,2] r[1,0] r[1,1] r[1,2] r[2,0] ... • Rectangular array in column major order (Fortran): REAL X(4,3) in memory (column major order): x(1,1) x(2,1) x(3,1) x(4,1) x(1,2) x(2,2) x(3,2) x(4,2) x(1,3) ...
2-Dimensional Arrays • Computing address of array elements • Rectangular arrays in row major order: float x[ROWS][COLS]; address of x[j][k] = address(x) + (j*COLS + k) * sizeof(float) • Three dimensional array: float y[J][K][L]; address of y[j][k][l] = address(y) + (j*K*L + k*L + l) * sizeof(float) • 2-D and 3-D arrays require more time to access due to this calculation. • Compiler can optimize when you access consecutive items: for(k = 0; k<COLS; k++) sum += x[j][k];
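The row-major address formulas can be checked against what a C compiler actually does. In this sketch the dimensions ROWS = 4 and COLS = 3 and the function names are chosen only for illustration.

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 3 };

/* Byte offset of x[j][k] from the start of float x[ROWS][COLS]. */
size_t offset_2d(size_t j, size_t k)
{
    return (j * COLS + k) * sizeof(float);
}

/* Byte offset of y[j][k][l] from the start of float y[J][K][L].
 * Only the two inner dimensions (k_dim = K, l_dim = L) are needed. */
size_t offset_3d(size_t j, size_t k, size_t l, size_t k_dim, size_t l_dim)
{
    return (j * k_dim * l_dim + k * l_dim + l) * sizeof(float);
}
```

Given `float x[ROWS][COLS];`, the difference `(char *)&x[2][1] - (char *)x` equals `offset_2d(2, 1)`. Note that the outermost dimension never appears in the formula.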
Arrays of Pointers: ragged arrays • Each element of a vector is a pointer to a vector char *days[7] = { "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday" }; days = days[0] days[1] days[2] days[3] days[4] days[5] days[6], each pointing to one string: S u n d a y \0 M o n d a y \0 T u e s d a y \0 W e d n e s d a y \0 T h u r s d a y \0 F r i d a y \0 S a t u r d a y \0 Vector of pointers: 7 x 4 bytes = 28 bytes (with 4-byte pointers) Array of characters: = 57 bytes
Arrays of Pointers: ragged arrays (2) • Compare previous slide with 2-D array: char days[ ][10] = { "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday" }; days = S u n d a y 0 M o n d a y 0 T u e s d a y 0 W e d n e s d a y 0 T h u r s d a y 0 F r i d a y 0 S a t u r d a y 0 2-D array = 7 x 10 bytes = 70 bytes
What is sizeof( ) for 2-D arrays? • Rectangular array in C: char days[7][10] = { "Sunday", "Monday", ... }; int m = sizeof( days ); int n = sizeof( days[0] ); • Array of pointers in C: char *days[7] = { "Sunday", "Monday", ... }; int m = sizeof( days ); int n = sizeof( days[0] );
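The two cases can be tried out with a short C sketch. The variable and function names are invented for the example; the pointer sizes depend on the platform, so they are expressed with `sizeof(char *)` rather than a fixed number.

```c
#include <stddef.h>

/* Rectangular array: 7 rows of 10 chars each, stored contiguously. */
char days_rect[7][10] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};

/* Array of pointers: 7 pointers; the strings are stored elsewhere. */
const char *days_ragged[7] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};

size_t rect_total(void)   { return sizeof days_rect; }      /* whole 2-D array */
size_t rect_row(void)     { return sizeof days_rect[0]; }   /* one row of chars */
size_t ragged_total(void) { return sizeof days_ragged; }    /* the 7 pointers only */
size_t ragged_row(void)   { return sizeof days_ragged[0]; } /* one pointer */
```

For the rectangular array the sizes are 70 and 10 bytes. For the array of pointers, `sizeof` counts only the pointers (7 times the pointer size, then one pointer), never the characters they point to.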
Java: always uses array of pointers • 2-D arrays in Java are always treated as array of pointers final int N = 10; double [][] a; a = new double[N][ ]; // create row pointers for(int k=0; k<N; k++) a[k] = new double[k+1]; // create columns // array dimensions determined by initial values int [][] m = { { 1, 2, 3, 4}, { 5, 6}, { 8, 9, 10}, { 11 } }; What are the sizes of each row of m ?
C#: rectangular and ragged arrays
• A rectangular array in C# (one set of brackets)
    const int N1 = 10, N2 = 20, N3 = 25;
    // 2-dimensional array
    double [,] a = new double[N1,N2];
    // 3-dimensional array
    double [,,] a3 = new double[N1,N2,N3];
• A ragged array in C# or Java uses multiple brackets:
    // create array of row pointers
    double [][] b = new double[N1][ ];
    // allocate space for each row (can differ)
    for (int k=0; k<N1; k++) b[k] = new double[N2];
• In Java (but not C#) can write: b = new double[N1][N2]
Accessing Array Elements • Ragged Arrays require multiple levels of dereferencing result = b[i][j]; • get address of b. • get b[i]. _addr = valueat( address(b) + i*sizeof(row pointer) ) • result = valueat( _addr + j*sizeof(double) ) • Rectangular array computes address as offset: double [ , ] b = new double[N1,N2]; result = b[i,j]; • get address of b. • result = valueat( address(b) + (i*N2 + j)*sizeof(double) ) • In Java and C#, arrays are objects, so address is not this simple.
Efficiency and multi-dimensional array
• Multi-dimensional array access can be much slower than 1-D array access.
• Access in row order is more efficient, and can minimize paging.
    // search a[ROWS,COLS] in row major order
    for(int r=0; r<ROWS; r++)
        for (int c=0; c<COLS; c++)
            if ( a[r,c] > max ) max = a[r,c];
  Order of access: a[0,0] a[0,1] ... a[0,COLS-1] a[1,0] a[1,1] ... (consecutive memory locations)
    // search a[ROWS,COLS] in column major order
    for(int c=0; c<COLS; c++)
        for (int r=0; r<ROWS; r++)
            if ( a[r,c] > max ) max = a[r,c];
  Order of access: a[0,0] a[1,0] ... a[ROWS-1,0] a[0,1] a[1,1] ... (each step skips COLS elements)
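The same two traversals in C, where 2-D arrays are stored in row major order. A sketch with made-up dimensions (N_ROWS = 3, N_COLS = 4) and function names; both functions return the same maximum and differ only in the order in which memory is touched.

```c
enum { N_ROWS = 3, N_COLS = 4 };

/* Row order: the inner loop walks consecutive memory locations. */
int max_row_order(int a[N_ROWS][N_COLS])
{
    int max = a[0][0];
    for (int r = 0; r < N_ROWS; r++) {
        for (int c = 0; c < N_COLS; c++) {
            if (a[r][c] > max) max = a[r][c];
        }
    }
    return max;
}

/* Column order: each inner step jumps N_COLS elements ahead. */
int max_column_order(int a[N_ROWS][N_COLS])
{
    int max = a[0][0];
    for (int c = 0; c < N_COLS; c++) {
        for (int r = 0; r < N_ROWS; r++) {
            if (a[r][c] > max) max = a[r][c];
        }
    }
    return max;
}
```

On an array this small the difference is not measurable; the effect on caching and paging shows up only when a row is large compared with a cache line or memory page.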
Type Checking • Verifying that the actual value of an expression is valid for the type to which it is assigned. • A strongly typed language is one in which all type errors are detected at compile time or run time. • Example: Java is strongly typed: most type errors are detected by compiler. Others, like casts, are checked at runtime and generate exceptions: Object obj = new Double(2.5); String s = obj; // compile time error String t = (String) obj; // run-time ClassCastException
Type Compatibility for user types typedef int type_a; typedef int type_b; int main( ) { type_a a; type_b b; b = 5; // assign integer to "type_b" variable OK? a = b; // assign "type_b" to "type_a" variable OK? • In C, "typedef" defines an alias for a type -- it doesn't create a new type.
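Type Compatibility for user types (2)
    struct A { float x; char c; };
    struct B { float x; char c; };
    struct C { float z; char c; };
    int main( ) {
        struct A a; struct B b; struct C c;
        a.x = 0.5; a.c = 'a';
        b = a;        // OK under structural equivalence; an error in C, where struct A and struct B are distinct types
        c = a;        // Error
        if (b == a)   // OK? (C does not define == for structs)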
Type Compatibility for classes (3)
    public class A { public float x; public char c; }
    public class B { public float x; public char c; }
    public class C { public float z; public char c; }
    public static void main(...) {
        A a = new A(); B b; C c;
        a.x = 0.5f;   // 0.5 alone is a double and cannot be assigned to a float in Java
        a.c = 'a';
        b = a;        // Error
        c = a;        // Error
Type Conversion and Polymorphism int max( int a, int b) { if ( a > b ) return a; else return b; } float max( float a, float b) { if ( a > b ) return a; else return b; } int main( ) { int m, n; float x, y, z; x = 5.5; m = x; // OK to convert float to int z = max(x, y); // EASY! call max(float,float) n = max(m, x); // which max function? y = max(x, m); // which max function?
Explicit Polymorphism in C++ /* This template generates "max" functions of * any parameter type that the program needs. */ template <typename T> T max( T a, T b ) { if ( a > b ) return a; else return b; } int main( ) { int m = 4, n = 9; float x = 0.5, y = 2.7; n = max(m, m); // generate max(int, int) y = max(x, y); // generate max(float, float)