Regular Languages and Regular Grammars

Regular Languages and Regular Grammars Chapter 3

Regular Languages Regular Language Describes Regular Expression Accepts Finite State Machine

Operators on Regular Expressions In order of precedence: () Parentheses * Star Closure Concatenation + Union Example: Over  = {a, b, c}, (a + (b . c))* produces: {λ, a, bc, aa, abc, bcbc, … } . Note: The concatenation symbol is often omitted.

Regular Expressions Let  be a given alphabet. Then , λ, anda  are all primitive regular expressions. 2. If r1and r2 are regular expressions, so are r1 + r2, r1.r2, r1*, and (r1) 3. A string is a regular expression, iff it can be derived from the primitive regular expressions by a finite number of application of the rules in (2).

Languages Associated with Regular Expressions If r is a regular expression L(r) is a language associated with r. Rules to simplify languages associated with r: L() =  L(λ) = λL(a) = {a} L(r1 + r2) = L(r1) U L(r2) L(r1.r2) = L(r1).L(r2) L((r1)) = L(r1) L(r1*) = (L(r1))*

Analyzing a Regular Expression L((a+ b)*b) = L((a+b)*) L(b) = (L(a+b))* L(b) = (L(a) UL(b))* L(b) = ({a} U{b})* {b} = {a, b}* {b}. A string of a’s and b’s that end with b

Analyzing a Regular Expression L(a*b*) = L(a*)L(b*) = {a}*{b}* A string of zero or more a’s followed by a string of zero or more b’s.

Given a Language, find a rex L = {w {a, b}* : w = |w| is even} ((a + b)(a + b))* or (aa + ab + ba + bb)*

Examples L = {w {a, b}* : wcontains an odd number of a’s} b*(ab*ab*)*ab* or b*ab*(ab*ab*)* Both expressions require that there be a single a somewhere. There can also be other a’s, but they must occur in pairs.

More Regular Expression Examples Try these: L= {w {a, b}*: there is no more than one b in w} L(r) = {a2nb2m+1 : n  0, m  0}

More Regular Expression Examples Try these: L= {w {a, b}*: there is no more than one b in w} a*(λ+b)a* or a* + a*ba* L(r) = {a2n b2m+1 : n  0, m  0} (aa)*(bb)*b

The Details Matter a* +b*  (a+b)* (ab)* a*b*

Rex to NFA Finite state machines and regular expressions define the same class of languages. Theorem: Any language that can be defined with a regular expression can be accepted by some NFA and so is regular. Proof by Construction: Must show that an NFA can be constructed using rules for: , λ, any symbol in , union, and concatenation.

For Every Regular Expression There is a Corresponding FSM We’ll show this by construction. An FSM for: :

For Every Regular Expression There is a Corresponding FSM We’ll show this by construction. An FSM for: : A single element of :

For Every Regular Expression There is a Corresponding FSM We’ll show this by construction. An FSM for: : A single element of : λ:

Union M1 (recognizes string s) ;;; … λ λ λ λ … M2(recognizes string t) FSA that recognizes s+t

Concatenation M1 (recognizes string s) M2(recognizes string t) λ ;;; λ λ … … FSA that recognizes st

Star Closure λ M1 (recognizes string s) ;;; λ λ … λ λ FSA that recognizes s*

An Example (b +ab)* An FSM for a An FSM for b An FSM for ab: λ

An Example (b +ab)* An FSM for (b+ab): λ λ λ

An Example An FSM for (b+ab)*: λ λ λ λ λ λ λ λ

An Example A Simplified FSM for (b+ab)*: λ b a b λ

For Every FSM There is a Corresponding Regular Expression Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression. Proof by Construction: Use generalized transition graphs (GTGs) to convert FSM to REX. A GTG is a transition graph whose edges are labeled with regular expressions.

A Simple Example Let M be: Suppose we rip out state 2:

The Algorithm fsmtoregexheuristic • fsmtoregexheuristic(M: FSM) = • 1. Remove unreachable states from M. • 2. If M has no accepting states then return . • 3. If the start state of M is part of a loop, create a new start state s • and connect s to M’s start state via an λ-transition. • 4. If there is more than one accepting state of M or there are any • transitions out of any of them, create a new accepting state and • connect each of M’s accepting states to it via an λ-transition. The • old accepting states no longer accept. • 5. If M has only one state then return λ. • 6. Until only the start state and the accepting state remain do: • 6.1 Select rip (not s or an accepting state). • 6.2 Remove rip from M. • 6.3 *Modify the transitions among the remaining states so M • accepts the same strings. • 7. Return the regular expression that labels the one remaining • transition from the start state to the accepting state.

Example 1 Create a new initial state and a new, unique accepting state, neither of which is part of a loop. Note:  λ

Example 1, Continued 2. Remove states and arcs and replace with arcs labeled with larger and larger regular expressions.

Example 1, Continued Remove state 3:

Example 1, Continued + Remove state 2:

Example 1, Continued + Remove state 1: + +

Example 2 a*(a + b)c*

Example 3 a* + a*(a + b)c*

Simplifying Regular Expressions Regex’s describe sets: ● Union is commutative: + = +. ● Union is associative: (+) + = +(+). ●  is the identity for union: += + = . ● Union is idempotent: + = . Concatenation: ● Concatenation is associative: () = (). ●λis the identity for concatenation: λ= λ = . ●  is a zero for concatenation:  =  = . Concatenation distributes over union: ● (+)  = () +(). ●  (+) = () +(). Kleene star: ● * = λ. ●λ* = λ. ●(*)* = *. ● ** = *. ●(+)* = (**)*.

Applications of regular expressions: Pattern Matching • Many applications allow pattern matches • unix • perl • Excel • Access • … • Pattern matching programs use automata • pattern  rex nfa  dfa  transition table  driver

A Biology Example – BLAST Given a protein or DNA sequence, find others that are likely to be evolutionarily close to it. ESGHDTTTYYNKNRYPAGWNNHHDQMFFWV Build a DFSM that can examine thousands of other sequences and find those that match any of the selected patterns.

Regular Expressions in Perl

Using Regular Expressions in the Real World Matching numbers: -? ([0-9]+(\.[0-9]*)? | \.[0-9]+) Matching ip addresses: S !<emphasis> ([0-9]{1,3} (\ . [0-9] {1,3}){3}) </emphasis> !<inet> $1 </inet>! Finding doubled words: \< ([A-Za-z]+) \s+ \1 \> From Friedl, J., Mastering Regular Expressions, O’Reilly,1997.

More Regular Expressions Identifying spam: \badv$?ert$?\b Trawl for email addresses: \b[A-Za-z0-9_%-]+@[A-Za-z0-9_%-]+ (\.[A-Za-z]+){1,4}\b

Using Substitution Building a chatbot: On input: <phrase1> is <phrase2> the chatbot will reply: Why is <phrase1> <phrase2>?

Chatbot Example <user> The food there is awful <chatbot> Why is the food there awful? Assume that the input text is stored in the variable $text: $text =~ s/^([A-Za-z]+)\sis\s([A-Za-z]+)\.?$/ Why is \1 \2?/ ;

Regular Grammars A regular grammar G is a quadruple (V, T, S, P) that is either consistently right-linear or consistently left-linear. ● V - Variables ●T – Terminals ● S - Start variable, S V ● P - Productions

Right-Linear Grammar All production rules are of the form: A  xB or A  x A,B  V A and B are variables x  T* x is a string in the alphabet Example: G = ({S}, {a, b}, S, P) P: S abS | a Corresponding Regular Expression: (ab)*a

Left-Linear Grammar All production rules are of the form: A Bx or A  x A,B  V A and B are variables x  T* x is a string in the alphabet Example: G = ({S, S1, S2}, {a, b}, S, P) P: S  S1ab S1 S1ab | S2 S2 a Corresponding Regular Expression: aab(ab)*

Focus on Right-Linear Grammars A language generated by a right-linear grammar is always regular. Proof by construction of FA on page 91 of text. Example: Construct an FA that accepts the language generated by the grammar: V0 aV1 V1  abV0| b

Focus on Right-Linear Grammars V0 aV1 V1  b V1  abV0 Complete FA:

Regular Languages and Regular Grammars

Regular Languages and Regular Grammars

Presentation Transcript

Regular Languages

Languages, grammars, and regular expressions

Regular Languages

Regular Grammars

Languages, Grammars, and Regular Expressions

Regular expressions Regular languages

REGULAR LANGUAGES

Regular Languages and Regular Expressions

Regular Languages and Grammars

Regular Languages

Regular Grammars

Regular Grammars

Regular Grammars

Regular Languages

Regular Expressions and Regular Languages

Regular Languages

Regular Grammars

Regular Grammars

Chapter 3 Regular Languages and Regular Grammars

Regular Grammars

Regular Languages

Regular Languages