Finite Automata & Regular Languages

Finite Automata & Regular Languages • Sipser, Chapter 1

Deterministic Finite Automata • A DFA or deterministic finite automaton M is a 5-tuple, M = (Q, Σ, δ, q0, F), where: • Q is a finite set of states of M • Σ is the finite input alphabet of M • δ: Q × Σ → Q is the state transition function • q0 ∈ Q is the start state of M • F ⊆ Q is the set of accepting states or final states of M

DFA Example M • State diagram: [figure: states q0 and q1; each state loops on input 0; input 1 moves q0 to q1 and q1 to q0] • Q = { q0, q1 } • Σ = { 0, 1 } • F = { q1 }

State table & state transition function • State table (current state: next state on 0, next state on 1): q0: q0, q1; q1: q1, q0 • State transition function: δ(q0, 0) = q0, δ(q0, 1) = q1, δ(q1, 0) = q1, δ(q1, 1) = q0

State transitions • If q, q' ∈ Q, s ∈ Σ, and δ(q, s) = q', then we say that q' is an s-successor of q, or that there is a transition from q to q' on input s, and we write q --s--> q'. • Example: since δ(q0, 1) = q1, there is a transition from q0 to q1 on input 1, and we write q0 --1--> q1.

State sequences • If a string of input symbols w = s0 s1 s2 … s(k-1) takes M from initial state q0 to state qk, namely q0 --s0--> q1 --s1--> q2 --s2--> q3 … --s(k-1)--> qk, then we say that qk is a w-successor of q0, and write q0 --w--> qk. The sequence q0 q1 q2 … qk is called an admissible state sequence for w.

Strings accepted by a DFA • Let M = (Q, Σ, δ, q0, F) be a DFA, and let w = s0 s1 s2 … s(k-1) ∈ Σ* be a string over the alphabet Σ. Then M accepts w if there exists an admissible state sequence q0 q1 q2 … qk for w, starting at the initial state q0 and ending with state qk, where qk ∈ F. That is, M accepts the input string w if M ends up in one of the final states after reading w.

Language recognized by a DFA • The language L(M) that is recognized by a DFA, M = (Q, Σ, δ, q0, F), is the set of all strings accepted by M. That is, L(M) = { w ∈ Σ* | M accepts w } = { w ∈ Σ* | q0 --w--> qk, qk ∈ F }. • Example: For the previous DFA, L(M) is the set of all strings of 0s and 1s with odd parity, that is, with an odd number of 1s.
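The definitions of admissible state sequence and acceptance can be tried out on the two-state parity example. The following is a minimal sketch; the names `DELTA`, `state_sequence`, and `accepts` are illustrative and do not come from the slides.

```python
# Transition function of the two-state example DFA, as listed on the
# "State table & state transition function" slide.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}
START = "q0"
FINAL = {"q1"}


def state_sequence(w, delta=DELTA, start=START):
    """Return the admissible state sequence q0 q1 ... qk for input w."""
    states = [start]
    for symbol in w:
        states.append(delta[(states[-1], symbol)])
    return states


def accepts(w, delta=DELTA, start=START, final=FINAL):
    """M accepts w if the last state of the sequence is a final state."""
    return state_sequence(w, delta, start)[-1] in final


print(state_sequence("1011"))  # ['q0', 'q1', 'q1', 'q0', 'q1']
print(accepts("1011"))         # True: three 1s, odd parity
print(accepts("1001"))         # False: two 1s, even parity
```

Because δ is a total function, the state sequence for any input is unique, so "there exists an admissible state sequence" reduces to following the single possible path.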

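DFA Example 2 • Recognizer for 11*01* • [State diagram: start state A, states B and C, trap state D; edges labeled 0 and 1, with a self-loop labeled 0,1 on the trap state]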

DFA Example 2 • M = (Q, Σ, δ, q0, F), L(M) = 11*01* • Q = { q0 = A, B, C, D } • Σ = { 0, 1 } • F = { C }
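The state diagram for this example did not survive, so the transition table below is an assumption: it is the four-state machine one would naturally draw for 11*01* with D as the trap state. The sketch compares that assumed machine against Python's `re` module on all short strings.

```python
import re
from itertools import product

# ASSUMED transitions (reconstructed from the language 11*01*, not copied
# from the slide): A is the start state, C accepts, D is the trap.
DELTA = {
    ("A", "1"): "B", ("A", "0"): "D",
    ("B", "1"): "B", ("B", "0"): "C",
    ("C", "1"): "C", ("C", "0"): "D",
    ("D", "0"): "D", ("D", "1"): "D",
}


def accepts(w):
    """Run the DFA from A and accept if it stops in C."""
    state = "A"
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state == "C"


PATTERN = re.compile(r"11*01*")


def agrees_up_to(n):
    """Check the DFA against the regular expression on all strings of length <= n."""
    for k in range(n + 1):
        for letters in product("01", repeat=k):
            w = "".join(letters)
            if accepts(w) != (PATTERN.fullmatch(w) is not None):
                return False
    return True


print(accepts("1101"))   # True
print(accepts("0"))      # False: the first 0 leads to the trap state
print(agrees_up_to(8))   # True
```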

DFA Example 3 • Modulo 3 counter • [State diagram: states A, B, C; edge labels 0, 1, 2, and the combined labels 0,R, 1,R and 2,R]

DFA Example 3 • M = (Q, Σ, δ, q0, F) • Q = { q0 = A, B, C } • Σ = { 0, 1, 2, R } • F = { A }
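One plausible reading of the modulo 3 counter, given as a sketch: the states A, B, C stand for the running sum of the digits modulo 3, and R resets the counter to A. This interpretation is an assumption inferred from the edge labels, since the diagram itself is not recoverable.

```python
STATES = "ABC"  # ASSUMED meaning: A = remainder 0, B = remainder 1, C = remainder 2


def delta(state, symbol):
    """Assumed transition function: digits add modulo 3, R resets to A."""
    if symbol == "R":
        return "A"
    return STATES[(STATES.index(state) + int(symbol)) % 3]


def accepts(w):
    """Accept if the counter is back in state A (F = { A })."""
    state = "A"
    for symbol in w:
        state = delta(state, symbol)
    return state == "A"


print(accepts("12"))     # True: 1 + 2 = 3, remainder 0
print(accepts("21R11"))  # False: after the reset, 1 + 1 = 2
print(accepts(""))       # True: the start state is final
```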

Regular Languages • A language L ⊆ Σ* is called regular if there exists a DFA M such that L(M) = L. • Earlier, we defined a language L ⊆ Σ* as regular if there exists a T3, or regular (left-linear or right-linear), grammar G such that L(G) = L. We shall prove that these two definitions are equivalent.

Operations on Regular Languages • Let A and B be regular languages. • Union: A ∪ B = { x | x ∈ A or x ∈ B } • Concatenation: AB = { xy | x ∈ A and y ∈ B } • Kleene closure (A-star): A* = { x1 x2 x3 ... xk | k ≥ 0 and each xi ∈ A }

Examples of regular operations • A = { good, bad }, B = { boy, girl } • A ∪ B = { good, bad, boy, girl } • AB = { goodboy, goodgirl, badboy, badgirl } • A* = { λ, good, bad, goodgood, goodbad, badgood, badbad, … }
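For finite languages the three operations can be computed directly with Python sets. This is only a sketch: A* is infinite, so the hypothetical helper `star_up_to` returns the finite slice built from at most `max_k` strings of A, and the empty Python string plays the role of λ.

```python
def union(A, B):
    """A ∪ B: strings that belong to A or to B."""
    return A | B


def concat(A, B):
    """AB: every string of A followed by every string of B."""
    return {x + y for x in A for y in B}


def star_up_to(A, max_k):
    """Finite slice of A*: concatenations of k strings from A, 0 <= k <= max_k."""
    result = {""}   # k = 0 contributes the empty string (lambda)
    layer = {""}
    for _ in range(max_k):
        layer = concat(layer, A)
        result |= layer
    return result


A = {"good", "bad"}
B = {"boy", "girl"}

print(sorted(union(A, B)))       # ['bad', 'boy', 'girl', 'good']
print(sorted(concat(A, B)))      # ['badboy', 'badgirl', 'goodboy', 'goodgirl']
print(sorted(star_up_to(A, 2)))  # ['', 'bad', 'badbad', 'badgood', 'good', 'goodbad', 'goodgood']
```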

Closure under Union • If A and B are regular languages, then their union, A ∪ B, is a regular language.

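Union Machine M(A ∪ B) • [Diagram: a new start state r0 has λ transitions to q0, the start state of M(A), and to p0, the start state of M(B); the final states q1, q2 of M(A) and p1, p2 of M(B) remain final]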
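The union machine can be sketched in code: a new start state r0 gets lambda transitions to the start states of M(A) and M(B). The machine representation (a triple of transition relation, initial states, final states, with "" standing for lambda) and the two example machines for 0*1 and 1*0 are assumptions made for illustration.

```python
def closure(states, R):
    """All states reachable from `states` using only lambda ("") transitions."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for (src, a, dst) in R:
            if src == p and a == "" and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(machine, w):
    """Simulate a machine (R, I, F) by tracking the set of reachable states."""
    R, I, F = machine
    current = closure(I, R)
    for symbol in w:
        moved = {dst for (src, a, dst) in R if src in current and a == symbol}
        current = closure(moved, R)
    return bool(current & F)


def union_machine(m_a, m_b, new_start="r0"):
    """New start state with lambda transitions into both machines.

    Assumes the two machines use disjoint state names.
    """
    (Ra, Ia, Fa), (Rb, Ib, Fb) = m_a, m_b
    R = Ra | Rb | {(new_start, "", q) for q in Ia | Ib}
    return (R, {new_start}, Fa | Fb)


# Hypothetical example machines: M(A) recognizes 0*1, M(B) recognizes 1*0.
M_A = ({("q0", "0", "q0"), ("q0", "1", "q1")}, {"q0"}, {"q1"})
M_B = ({("p0", "1", "p0"), ("p0", "0", "p1")}, {"p0"}, {"p1"})
M_UNION = union_machine(M_A, M_B)

print(accepts(M_UNION, "001"))  # True: in 0*1
print(accepts(M_UNION, "110"))  # True: in 1*0
print(accepts(M_UNION, "010"))  # False: in neither language
```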

Closure under Concatenation • If A and B are regular languages, then their concatenation, AB, is a regular language.

Concatenation Machine M(AB)
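The diagram for this machine is not available, so the sketch below uses the standard textbook construction as an assumption: every final state of M(A) gets a lambda transition to every initial state of M(B), the initial states are those of M(A), and the final states are those of M(B). The representation (transition relation, initial states, final states, with "" for lambda) and the example machines are illustrative.

```python
def closure(states, R):
    """All states reachable from `states` using only lambda ("") transitions."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for (src, a, dst) in R:
            if src == p and a == "" and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(machine, w):
    """Simulate a machine (R, I, F) by tracking the set of reachable states."""
    R, I, F = machine
    current = closure(I, R)
    for symbol in w:
        moved = {dst for (src, a, dst) in R if src in current and a == symbol}
        current = closure(moved, R)
    return bool(current & F)


def concat_machine(m_a, m_b):
    """Link the final states of M(A) to the initial states of M(B) by lambda.

    Assumes the two machines use disjoint state names.
    """
    (Ra, Ia, Fa), (Rb, Ib, Fb) = m_a, m_b
    R = Ra | Rb | {(f, "", i) for f in Fa for i in Ib}
    return (R, set(Ia), set(Fb))


# Hypothetical example machines: M(A) recognizes 0*1, M(B) recognizes 1*0,
# so the concatenation machine should recognize 0*11*0.
M_A = ({("q0", "0", "q0"), ("q0", "1", "q1")}, {"q0"}, {"q1"})
M_B = ({("p0", "1", "p0"), ("p0", "0", "p1")}, {"p0"}, {"p1"})
M_AB = concat_machine(M_A, M_B)

print(accepts(M_AB, "0110"))  # True: "01" followed by "10"
print(accepts(M_AB, "10"))    # True: "1" followed by "0"
print(accepts(M_AB, "01"))    # False: nothing left for the B part
```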

Closure under Kleene Star • If A is a regular language, then the Kleene closure of A, A*, is also a regular language

Kleene Closure Machine M(A*)
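The diagram for this machine is not available either. As an assumption, the sketch uses the standard construction: a new start state that is also final (so λ is accepted), a lambda transition from it to each old initial state, and a lambda transition from each old final state back to the new start state. The machine representation and the example machine are illustrative.

```python
def closure(states, R):
    """All states reachable from `states` using only lambda ("") transitions."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for (src, a, dst) in R:
            if src == p and a == "" and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(machine, w):
    """Simulate a machine (R, I, F) by tracking the set of reachable states."""
    R, I, F = machine
    current = closure(I, R)
    for symbol in w:
        moved = {dst for (src, a, dst) in R if src in current and a == symbol}
        current = closure(moved, R)
    return bool(current & F)


def star_machine(m_a, new_start="s0"):
    """Build a machine for A* from a machine for A.

    Assumes `new_start` is not already a state name of M(A).
    """
    Ra, Ia, Fa = m_a
    R = (set(Ra)
         | {(new_start, "", i) for i in Ia}    # enter M(A)
         | {(f, "", new_start) for f in Fa})   # loop back after each piece
    return (R, {new_start}, set(Fa) | {new_start})


# Hypothetical example machine: M(A) recognizes 0*1, so M(A*) should accept
# the empty string and every string of 0s and 1s that ends in 1.
M_A = ({("q0", "0", "q0"), ("q0", "1", "q1")}, {"q0"}, {"q1"})
M_STAR = star_machine(M_A)

print(accepts(M_STAR, ""))      # True: zero pieces
print(accepts(M_STAR, "0101"))  # True: "01" twice
print(accepts(M_STAR, "010"))   # False: the trailing 0 is not completed
```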

NFAs: Nondeterministic Finite Automata • Lambda transitions may be present. • May have more than one initial state. • On input a, state q may have no transition out. • On input a, state q may have more than one transition out.

NFAs • A nondeterministic finite automaton M is a five-tuple M = (Q, Σ, R, I, F), where • Q is a finite set of states • Σ is the (finite) input alphabet • R is the transition relation, R ⊆ Q × Σ × Q • I ⊆ Q is the set of initial states • F ⊆ Q is the set of final states

Example NFAs • NFA that recognizes the language 0*1 ∪ 1*0 • NFA that recognizes the language (0 ∪ 1)*11(0 ∪ 1)*
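An NFA for the second language can be simulated by tracking the set of states the machine could be in. The three-state machine below, with hypothetical state names A, B, C, is one possible NFA for (0 ∪ 1)*11(0 ∪ 1)* and is not necessarily the one drawn on the slide; it has no lambda transitions.

```python
# Transition relation R as a set of (state, symbol, state) triples.
R = {
    ("A", "0", "A"), ("A", "1", "A"),  # stay in A while reading the prefix
    ("A", "1", "B"),                   # guess that this 1 starts the "11"
    ("B", "1", "C"),                   # the next 1 confirms the guess
    ("C", "0", "C"), ("C", "1", "C"),  # read the rest of the input
}
I = {"A"}   # initial states
F = {"C"}   # final states


def accepts(w):
    """Accept if at least one sequence of choices ends in a final state."""
    current = set(I)
    for symbol in w:
        # Branches with no transition on `symbol` simply disappear.
        current = {q for (p, a, q) in R if p in current and a == symbol}
    return bool(current & F)


print(accepts("0110"))  # True: contains 11
print(accepts("0101"))  # False: no two consecutive 1s
```

State B has no transition on 0, which illustrates the bullet "on input a, state q may have no transition out": that branch of the computation dies while others continue.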

Converting NFAs to DFAs • Given an NFA, M = (Q, Σ, R, I, F), build a DFA, M' = (Q', Σ, δ, S0, F'), as follows. • The states S0, S1, S2, … of M' are sets of states of M. • The initial state of M' is obtained by putting together all the initial states of M and all states reachable from those by λ transitions, and calling this set S0, the initial state of M'.

Converting NFAs to DFAs • For each state Sk already in Q' in M', and for each input symbol a ∈ Σ, put together into a set Sj all states of M reachable from each state in Sk on input a (followed by any λ transitions). This set Sj may or may not already be in Q'. It may also be the empty set ∅. Add to δ the transition from Sk to Sj on input a. • A state Sk of M' is final (belongs to F') if it contains at least one final state of M. • Since M has only finitely many subsets of states (at most 2^|Q|), this procedure stops after a finite number of steps.

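Example conversions • Convert the NFA for the language (0 ∪ 1)*00 ∪ (0 ∪ 1)*11 to a DFA • [State diagram of the NFA: states A, B, C, D, E, F; two self-loops labeled 0,1; two edges labeled 0 and two edges labeled 1]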

State transition table of NFA

State table of DFA

State diagram of DFA • [State diagram: states AD, ABD, ABCD, ADE, ADEF; edges labeled 0 and 1]
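The conversion procedure can be sketched as a worklist algorithm and run on this example. The NFA transitions below are an assumption reconstructed from the diagram labels (A and D as initial states with 0,1 self-loops, A to B to C on 0, D to E to F on 1, C and F final); the function and variable names are illustrative.

```python
from collections import deque


def closure(states, R):
    """All states reachable from `states` using only lambda ("") transitions."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for (src, a, dst) in R:
            if src == p and a == "" and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def subset_construction(R, I, F, alphabet):
    """Build the DFA whose states are sets of NFA states."""
    start = frozenset(closure(I, R))
    delta, seen, queue = {}, {start}, deque([start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            moved = {dst for (src, sym, dst) in R if src in S and sym == a}
            T = frozenset(closure(moved, R))
            delta[(S, a)] = T
            if T not in seen:        # a new set of states: process it later
                seen.add(T)
                queue.append(T)
    finals = {S for S in seen if S & F}
    return seen, delta, start, finals


# ASSUMED NFA for (0 ∪ 1)*00 ∪ (0 ∪ 1)*11, reconstructed from the diagram labels.
R = {
    ("A", "0", "A"), ("A", "1", "A"), ("A", "0", "B"), ("B", "0", "C"),
    ("D", "0", "D"), ("D", "1", "D"), ("D", "1", "E"), ("E", "1", "F"),
}
states, delta, start, finals = subset_construction(R, {"A", "D"}, {"C", "F"}, "01")


def name(S):
    """Render a set of NFA states as a label such as 'ABD'."""
    return "".join(sorted(S))


print(name(start))                       # AD
print(sorted(name(S) for S in states))   # ['ABCD', 'ABD', 'AD', 'ADE', 'ADEF']
print(sorted(name(S) for S in finals))   # ['ABCD', 'ADEF']
```

Under these assumed transitions the construction yields exactly the five state names that appear in the DFA diagram, and the empty set never arises because A and D have a transition on every symbol.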

Regular Expressions (r.e.) • If a ∈ Σ, then the set a = {a} is a r.e. • The set λ = {λ} is a r.e. • The set ∅ = { } is a r.e. • If R and S are r.e., then (R ∪ S) is a r.e. • If R and S are r.e., then (RS) is a r.e. • If R is a r.e., then (R)* is a r.e. • Any r.e. is obtained by a finite application of the above rules.
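The inductive rules above translate directly into a recursive evaluator. In this sketch a regular expression is a nested tuple (an encoding chosen here for illustration, not taken from the slides), and the hypothetical function `lang(r, n)` returns the strings of length at most n in the language that r denotes.

```python
def lang(r, n):
    """Strings of length <= n denoted by the regular expression r."""
    kind = r[0]
    if kind == "sym":                       # a single symbol a
        return {r[1]} if n >= 1 else set()
    if kind == "lambda":                    # the empty string
        return {""}
    if kind == "empty":                     # the empty set
        return set()
    if kind == "union":                     # (R ∪ S)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":                    # (RS)
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "star":                      # (R)*
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:                     # stop once no new strings appear
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError("unknown regular expression node: %r" % (kind,))


# (0 ∪ 1)*11(0 ∪ 1)* written with the tuple encoding.
ANY = ("star", ("union", ("sym", "0"), ("sym", "1")))
CONTAINS_11 = ("concat", ANY, ("concat", ("sym", "1"), ("concat", ("sym", "1"), ANY)))

print(sorted(lang(CONTAINS_11, 3)))  # ['011', '11', '110', '111']
```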

REs and Regular Languages • R.E.s are shorthand notation for regular languages.

Regex: REs in Unix • [a-f], [^a-f] • R*, R+, R? • {R} • RS • R|S
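Several of these operators (character classes, `*`, `+`, `?`, concatenation, and `|`) are also available in Python's `re` module, which makes them easy to try out. The patterns and the helper name `matches` below are illustrative choices; details of Unix tools such as grep or lex may differ.

```python
import re


def matches(pattern, s):
    """True if the whole string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None


print(matches(r"[a-f]+", "cafe"))          # True: every letter is in a-f
print(matches(r"[^a-f]", "z"))             # True: z is outside a-f
print(matches(r"ab?c", "ac"))              # True: the b is optional
print(matches(r"(0|1)*11(0|1)*", "0110"))  # True: contains 11
print(matches(r"0*1|1*0", "0010"))         # False: in neither alternative
```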

Minimization of DFAs • Subset construction (Myhill-Nerode Theorem)

NFAs, DFAs, & Lexical Analyzer Generators • Sec 3.6: Finite Automata, Aho, Sethi, Ullman, “Compilers: Principles, Techniques, and Tools” • Sec 3.7: From REs to NFAs (Thompson’s Construction) • Sec 3.8: Design of a Lexical Analyzer Generator • Sec 3.9: Optimization of DFA-based Lexical Analyzers
