320 likes | 541 Views
PHP Bible. Chapter 10: Strings and String Functions. Summary. Creating and manipulating strings Inspecting, comparing, and searching strings Advanced string functions. Strings in PHP. Strings are sequences of characters that can be treated as a unit Assigned to variables
E N D
PHP Bible Chapter 10: Strings and String Functions
Summary • Creating and manipulating strings • Inspecting, comparing, and searching strings • Advanced string functions
Strings in PHP • Strings are sequences of characters that can be treated as a unit • Assigned to variables • Given as input to functions • Returned from functions • Sent as output to appear on your user's web page • The simplest way to specify a string in PHP is to enclose it in single or double quotes
Interpolation with curly braces • In most situations, you can simply include a variable in a doubly-quoted string and the variable's value will be spliced into the string when it is interpreted except when: • When text follows the variable name without a space separating them • When the expression you want interpolated is not a simple variable • If you have the string "I will play $sport1ball today", PHP will attempt to find a value for the variable named $sport1ball. If the variable name you intended to use is just $sport1, PHP will replace the text $sport1ball with the empty string. To tell PHP that you want to just interpolate the variable name $sport1, you can write the string as "I will play {$sport1}ball today" • This construct is also useful for including array elements or object attributes in a string as in "I will play {$sport[1]}ball"
Characters and string indices • In most programming languages, strings are handled as arrays of single characters • To access a character at a particular position in a string, you would normally just access the index of the position you wanted (e.g. $stringval[1] would retrieve the 2nd character in the $stringval variable – note that strings and most arrays in PHP begin indexing at 0) • You can access individual characters in strings this way in PHP, although this usage has been deprecated and is not suggested • The recommended method, for no apparent reason, is to place the index of the character you wish to retrieve in curly brackets (e.g. $stringval{1})
String operators • PHP offers only one real operator on strings: the dot (.) or concatenation operator • This operator, when placed between two string arguments, produces a new string that is the result of putting the two strings together in sequence $my_two_cents = 'You\'re a moron'; $third_cent = 'And another thing'; print ($my_two_cents.'...'.$third_cent.'...'); • In this case, we are not passing multiple string arguments to the print statement, we are passing a single string created by concatenating the 4 strings together • Unlike Java, you cannot use the + operator to concatenate 2 strings together. If you try, it will attempt to interpret the strings as numbers and return the addition of the 2 numbers.
Concatenation and assignment • Just as with arithmetic operators, PHP has a shorthand operator that combines concatenation with assignment (.=) $string_var = 'one plus one = '; $string_var .= 'two'; print ($string_var); • Would print the string "one plus one = two" • The string to the right of the concatenation and assignment operator will always be appended to the end (right side) of the string value of the variable on the left side of the operator and assign the new string to the variable on the left side of the operator
String functions • PHP provides a huge variety of functions for the munching and crunching of strings • This section presents the basic functions for inspecting, comparing, modifying, and printing strings • Note for C programmers: many of the PHP string function names are the same or similar to the corresponding C/C++ function names and perform the same tasks and take the same arguments
Inspecting strings • int strlen (string str) returns the number of characters in the string • Knowing the string's length is particularly useful for situations in which we'd like to loop through a string character by character
Finding characters and substrings • int strpos (string haystack, string needle [,int offset]) finds the numerical position of a particular character or string within a string, if it exists $test = 'Where is Carmen Santiago?'; $carmen_loc = strpos($test,'Carmen'); $where_loc = strpos($test,'Where'); $bob_loc = strpos($test,'Bob'); $last_i = strrpos($test,'i'); • Would assign the value 9 to $carmen_loc • Would assign the value 0 to $where_loc • Would assign the value FALSE to $bob_loc • Would assign the value 20 to $last_i (strrpos starts looking for the needle at the end of the string) • Note: be careful when using strpos & strrpos in boolean evaluations since if the needle is located at the beginning of the string, the functions would return a value of 0 which is evaluated as FALSE in booleans. Instead, if you are wanting to test if a string appears within another string, use the equality operator === (e.g. if ($strpos($test,'Where') === FALSE))
Comparison • If you are wanting to determine if one string is the same as another, you can use the equality operator (==) if you want to determine if the strings are exactly equal (including case) and you are sure that the values on both sides are strings (e.g. if one of the values is an integer, the other one may be converted to an integer to perform the comparison) • int strcmp (string str1, string str2) returns < 0 if str1 is less than str2; > 0 if str1 is greater than str2, and 0 if they are equal • int strcasecmp ( string str1, string str2) is the same as strcmp except that the comparisons are done case-insensitive
Searching • To find out if one string is contained within another, you can use the strpos function or the strstr function • string strstr (string haystack, string needle) returns part of haystack string from the first occurrence of needle to the end of haystack. If needle is not found, returns FALSE • Note that one difference between strpos and strstr is that strpos returns an integer based upon the location of the needle, and strstr returns a substring starting at the position of the needle $test = 'Where is Carmen Santiago?'; $carmen_loc = strpos($test,'Carmen'); • Would assign the string "Carmen Santiago?" to $carmen_loc • You can perform a case-insensitive search of the haystack using the stristr() function
Substring selection • Many of PHP's string functions have to do with slicing (choosing a portion of a string) and dicing (selectively modifying a string) • Most dicing functions do not change the string you started out with, they return the modified string which can then be stored or printed, etc. • string substr (string string, int start [,int length]) returns the portion of string specified by the start and length parameters • If start is non-negative, the returned string will start at the start'th position in string, counting from zero. For instance, in the string 'abcdef', the character at position 0 is 'a', the character at position 2 is 'c', and so forth. • If start is negative, the returned string will start at the start'th character from the end of string • If length is given and is positive, the string returned will contain at most length characters beginning from start (depending on the length of string). If string is less than start characters long, FALSE will be returned. • If length is given and is negative, then that many characters will be omitted from the end of string (after the start position has been calculated when a start is negative). If start denotes a position beyond this truncation, an empty string will be returned.
Substring selection (cont.) $alphabet_test = 'abcdefghijklmnop'; print('3: '.substr($alphabet_test,3).'<BR>'); print('-3: '.substr($alphabet_test,-3).'<BR>'); print('3,5: '.substr($alphabet_test,3,5).'<BR>'); print('3,-5: '.substr($alphabet_test,3,-5).'<BR>'); print('-3,-5: '.substr($alphabet_test,-3,-5).'<BR>'); print('-3,5: '.substr($alphabet_test,-3,5).'<BR>'); • Would give us the output 3: defghijklmnop -3: nop 3,5: defgh 3,-5: defghijk -3, -5: -3,5: nop
String cleanup functions • string rtrim (string str [,string charlist]) returns a string with whitespace stripped from the end of str. The charlist parameter. Allows you to list characters that you want stripped • string ltrim (string str [,string charlist]) returns a string with whitespace stripped from the beginning of str • string trim (string str [,string charlist]) returns a string with whitespace stripped from the beginning and end of str • Whitespace characters include • " " (ASCII 32 (0x20)), an ordinary space. • "\t" (ASCII 9 (0x09)), a tab. • "\n" (ASCII 10 (0x0A)), a new line (line feed). • "\r" (ASCII 13 (0x0D)), a carriage return. • "\0" (ASCII 0 (0x00)), the NUL-byte. • "\x0B" (ASCII 11 (0x0B)), a vertical tab. • Note: trim should be called on almost all user input
String replacement • mixed str_replace (mixed search, mixed replace, mixed subject) returns a string or an array with all occurrences of search in subject replaced with the given replace value • string substr_replace (string string, string replacement, int start [,int length]) replaces a copy of string delimited by the start and (optionally) length parameters with the string given in replacement • string strrev (string string) returns string, reversed • string str_repeat (string input, int multiplier) returns input_str repeated multiplier times
Tokenizing and parsing functions • string strtok (string arg1, string arg2) splits a string (arg1) into smaller strings (tokens), with each token being delimited by any character from arg2. • That is, if you have a string like "This is an example string" you could tokenize this string into its individual words by using the space character as the token. • Note that only the first call to strtok uses the string argument. Every subsequent call to strtok only needs the token to use, as it keeps track of where it is in the current string. To start over, or to tokenize a new string you simply call strtok with the string argument again to initialize it. • You may put multiple tokens in the token parameter. $string = "This is\tan example\nstring"; $tok = strtok($string, " \n\t"); while ($tok) {echo "Word=$tok "; $tok = strtok(" \n\t"); } • Would output "Word=This Word=is Word=an Word=example Word=string"
Tokenizing and parsing functions (cont.) • array explode (string separator, string string [, int limit]) returns an array of strings, each of which is a substring of string formed by splitting it on boundaries formed by the string separator. If limit is set, the returned array will contain a maximum of limit elements with the last element containing the rest of string • explode is similar to strtok except that the results are returned in an array and all of the "tokenizing" is done in a single step
Case functions • string strtolower ( string str) returns string with all alphabetic characters converted to lowercase • string strtoupper ( string string) returns string with all alphabetic characters converted to uppercase • string ucfirst (string str) returns a string with the first character of str capitalized, if that character is alphabetic • string ucwords (string str) returns a string with the first character of each word in str capitalized, if that character is alphabetic. • The definition of a word is any string of characters that is immediately followed by a whitespace • ucfirst and ucwords will not convert any characters to lower case. If you want to convert the string to all lowercase and have the first letter of each word uppercased, use ucwords(strtolower($str))
Escaping functions • One of the virtues of PHP is that it can talk to almost any application or protocol • Frequently, the input to the functions that communicate with these applications is passed in as a string • Since many applications attach special meanings to certain characters which if they appear in your string may cause unintentional behavior • To prevent this, PHP provides several functions which will escape these special characters so they are handled normally
Escaping functions (cont.) • If you're going to be passing text into an SQL query, you need to ensure that the values have been properly escaped $name = "D'Amato"; $query = "INSERT INTO test SET name_field='$name' SET id='1'"; $result = mysql_query($query) or die(mysql_error()); • Or even worse if $name = "Aken'; DROP TABLE test "; $name = mysql_escape_string("D'Amato"); • The following functions are also used to escape strings: • string addslashes ( string str ) returns a string with backslashes before characters that need to be quoted in database queries etc. These characters are single quote ('), double quote ("), backslash (\) and NUL (the NULL byte) • htmlspecialchars() & quotemeta() also escape strings • stripslashes() reverses the action of addslashes()
Printing and output • If you need more tightly formatted output than print or echo, PHP also offers: • void printf ( string format [, mixed args]) which produces output according to format • string sprintf ( string format [, mixed args]) returns a string produced according to the formatting string format • The format string takes every character that you put in the string and outputs it literally except for the % character which signals the beginning of a conversion specification, indicating how to print one of the arguments that follow the format string • Each conversion specification consists of a percent sign (%), followed by one or more of these elements, in order: • An optional padding specifier that says what character will be used for padding the results to the right string size. This may be a space character or a 0 (zero character). The default is to pad with spaces. An alternate padding character can be specified by prefixing it with a single quote ('). See the examples below. • An optional alignment specifier that says if the result should be left-justified or right-justified. The default is right-justified; a - character here will make it left-justified. • An optional number, a width specifier that says how many characters (minimum) this conversion should result in. • An optional precision specifier that says how many decimal digits should be displayed for floating-point numbers. This option has no effect for other types than float. (Another function useful for formatting numbers is number_format().) • A type specifier that says what type the argument data should be treated as
Printing & output (cont.) • Possible types used in printf statements include: • % - a literal percent character. No argument is required. • b - the argument is treated as an integer, and presented as a binary number. • c - the argument is treated as an integer, and presented as the character with that ASCII value. • d - the argument is treated as an integer, and presented as a (signed) decimal number. • u - the argument is treated as an integer, and presented as an unsigned decimal number. • f - the argument is treated as a float, and presented as a floating-point number. • o - the argument is treated as an integer, and presented as an octal number. • s - the argument is treated as and presented as a string. • x - the argument is treated as an integer and presented as a hexadecimal number (with lowercase letters). • X - the argument is treated as an integer and presented as a hexadecimal number (with uppercase letters)
Printing & output (cont.) • Here is a way to get a random background color for your website. printf is needed to generate a valid color code, because zero padding is needed (4 has to become 04) <HTML><HEAD> <TITLE>Random backgrounds</TITLE></HEAD><?php function getRandomColor() { $ROOD = dechex(rand(0,255)); $GROEN= dechex(rand(0,255)); $BLAUW= dechex(rand(0,255)); return sprintf('%02s%02s%02s',$ROOD,$GROEN,$BLAUW); }?><BODY BGCOLOR="#<?php echo getRandomColor();?>"> ...</BODY></HTML>
HTML-Specific string functions • string htmlspecialchars (string string [,int quote_style [,string charset]]) returns a string with some characters converted to their HTML equivalent; the translations made are those most useful for everyday web programming. • string htmlentities (string string [,int quote_style [,string charset]]) identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities • string nl2br ( string string) Returns string with '<br />' inserted before all newlines • string strip_tags ( string str [, string allowable_tags]) tries to return a string with all HTML and PHP tags stripped from a given str
Hashing using MD5 • MD5 is a string-processing algorithm that's used to produce a digest or signature of whatever string it is given. The algorithm boils its input string down into a fixed-length string of 32 hexadecimal values that: • Always produces the same output string for any given input string • Results are evenly spread over the range of possible values • No known way to efficiently produce an input string corresponding to a given MD5 output string or produce 2 inputs that yield the same output • string md5 (string str [,bool raw_output]) calculates the MD5 hash of str using the RSA Data Security, Inc. MD5 Message-Digest Algorithm, and returns that hash. If the optional raw_output is set to TRUE, then the md5 digest is instead returned in raw binary format with a length of 16
Hashing using MD5 (cont.) print ('md5 of "Tim" is '.md5('Tim')."\n"); print ('md5 of "tim" is '.md5('tim')."\n"); print ('md5 of "time" is '.md5('time')."\n"); print ('md5 of "Tim" is '.md5('Tim')."\n"); • Gives us the browser output: md5 of "Tim" is dc2054afd537ddc98afd9347136494ac md5 of "tim" is b15d47e99831ee63e3f47cf3d4478e9a md5 of "time" is 07cc694b9b3fc636710fa08b6922c42b md5 of "Tim" is dc2054afd537ddc98afd9347136494ac
Hashing using MD5 (cont.) • MD5 is useful for: • Checksumming a message or file: If you are worried about errors that might happen in the transfer, you can transmit an MD5 digest along with the message and run the message through MD5 on receipt and check it against the digest. If it's not equal, the message was modified or corrupted in the transfer • Checking if a file's contents have changed: If you want to check on whether a file (e.g. a web page) has changed, you can store the MD5 digest of the file (which is much smaller than the original file) and then when you are checking for modifications, compare the MD5 of the new file against the one you stored. • Encrypting passwords: If you store an MD5'd password in your database, you can compare the MD5 of what was entered to what was stored in the database and no one will be able to look at what was stored in the database to see what that person's password was
String similarity functions • If you want to test 2 strings to see how similar they are to one another • int levenshtein ( string str1, string str2) returns the Levenshtein-Distance between the two argument strings or -1, if one of the argument strings is longer than the limit of 255 characters. The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2 • string soundex ( string str) Calculates the soundex key of str. Soundex keys have the property that words pronounced similarly produce the same soundex key, and can thus be used to simplify searches in databases where you know the pronunciation but not the spelling. This soundex function returns a string 4 characters long, starting with a letter. • string metaphone ( string str) Calculates the metaphone key of str. Similar to soundex() metaphone creates the same key for similar sounding words. It's more accurate than soundex() as it knows the basic rules of English pronunciation. The metaphone generated keys are of variable length.