KEMBAR78
Lexical Analysis Sample | PDF | Regular Expression | Theory Of Computation
100% found this document useful (1 vote)
390 views13 pages

Lexical Analysis Sample

This document contains sample exercises and solutions for a compiler design course on lexical analysis. It includes 6 problems dealing with constructing finite automata from regular expressions, converting between NFAs and DFAs using subset construction and minimization, and deriving regular expressions from DFAs. For each problem, it provides the relevant regular expression, describes the steps to solve the problem, and includes any resulting finite automata diagrams. The problems cover topics like Thompson construction, subset construction, DFA minimization, and deriving regular expressions from DFAs.

Uploaded by

Majd Abu Rakty
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
390 views13 pages

Lexical Analysis Sample

This document contains sample exercises and solutions for a compiler design course on lexical analysis. It includes 6 problems dealing with constructing finite automata from regular expressions, converting between NFAs and DFAs using subset construction and minimization, and deriving regular expressions from DFAs. For each problem, it provides the relevant regular expression, describes the steps to solve the problem, and includes any resulting finite automata diagrams. The problems cover topics like Thompson construction, subset construction, DFA minimization, and deriving regular expressions from DFAs.

Uploaded by

Majd Abu Rakty
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

University of Southern California Computer Science Department

Compiler Design
Spring 2011

Lexical Analysis

Sample Exercises and Solutions

Prof. Pedro C. Diniz


USC / Information Sciences Institute
4676 Admiralty Way, Suite 1001
Marina del Rey, California 90292
pedro@isi.edu

Lexical Analysis Sample Exercises 1 Spring 2011


University of Southern California Computer Science Department

Problem 1: Considering the alphabet Σ = {a,b} derive a Non-Deterministic-Finite Automaton (NFA)


using the Thompson construction that is able to recognize the sentences generated by the regular
expression b*(ab|ba)b*. Does the sentence w = “abbb” belong to the language generated by this regular
expression? Justify.

Solution: Considering a simplifcation of the combination of the NFA during the Thompson construction
(e.g., sequence of two ε-transitions can be considered as a single ε-transition) a possible NFA is as shown
below and where the start state is the state labeled 0.

The word w = “abbb” belongs to the language generated by this RE because there is a path from the start
state 0 to the accepting state 13 that spells out the word, respectively the path 0,3,4,6,8,10,11,12,11,12,13.

Problem 2: Starting from the NFA you derived in problem 1, convert it to a DFA using the sub-set
construction described in class. Verify that the DFA yields the same output as the original NFA for the
same input string w.

Solution: Using the ε-closure and DFAedge computation as described in class, we have the following
mapping of set of states of the NFA to sets of states of the DFA:

I0 = ε-closure (0) = {0, 1, 3, 4, 5}


I1 = DFAedge(I0,a) = ε-closure ({0, 1, 3, 4, 5}, a) = {6}
I2 = DFAedge(I0,b) = ε-closure ({0, 1, 3, 4, 5}, b) = {1, 2, 3, 4, 5, 7}
I3 = DFAedge(I1,a) = ε-closure ({6}, a) = Ierr
I4 = DFAedge(I1,b) = ε-closure ({6}, b) = {8, 10, 11, 13}
I5 = DFAedge(I2,a) = ε-closure ({1, 2, 3, 4, 5, 7}, a) = {6, 9, 10, 11, 13}
I6 = DFAedge(I2,b) = ε-closure ({1, 2, 3, 4, 5, 7}, b) = {1, 2, 3, 4, 5, 7} = I2
I7 = DFAedge(I4,a) = ε-closure ({8, 10, 11, 13}, a) = Ierr
I8 = DFAedge(I4,b) = ε-closure ({8, 10, 11, 13}, b) = {11, 12, 13}
I9 = DFAedge(I5,a) = ε-closure ({6, 9, 10, 11, 13}, a) = Ierr
I10 = DFAedge(I5,b) = ε-closure ({6, 9, 10, 11, 13}, b) = {8, 10, 11, 12, 13}
I11 = DFAedge(I8,a) = ε-closure ({11, 12, 13}, a) = Ierr
I12 = DFAedge(I8,b) = ε-closure ({11, 12, 13}, b) = {11, 12, 13} = I8
I13 = DFAedge(I10,a) = ε-closure ({8, 10, 11, 12, 13}, a) = Ierr
I14 = DFAedge(I10,b) = ε-closure ({8, 10, 11, 12, 13}, b) = I8

Effectively there are only 8 states as shown in the DFA below and this is clearly not the minimal DFA
that can recognize this regular expression.

Lexical Analysis Sample Exercises 2 Spring 2011


University of Southern California Computer Science Department

For the input sentence w = “abbb” in his DFA we would reach the state I8, through states I1, I4 and I8
and thus accepting this string.

Problem 3: Starting from the DFA you constructed in the previous problem, convert it to a minimal DFA
using the minimization algorithm described in class. Verify that the DFA yields the same output as the
original NFA for the same input string w.

Solution: The first partition would have the two sets of states P1 = {I0, I1, I2, Ierr} and P2 = {I4, I5, I8,
I10}. A second partition would maintain P2 and partition P1 into {I0, I1, Ierr} and generate P3 = {I2}
since I2 goes to itself on b and to P2 on a. Next the algorithm would generate P4 = {I1) and next generate
P5 = {Ierr} leaving P1 = {I0}. The final DFA would have therefore 5 states has shown below.

Again on the input string w = “abbb” the DFA would traverse P1, P4, P2 and loop in P2 therefore
accepting the string.

Important Note: If you try to apply the same construction, i.e. partitioning the set of states starting with
an incomplete DFA where the “trap” state is missing you might be surprised to find that you reach at an
incorrect minimal DFA.

Lexical Analysis Sample Exercises 3 Spring 2011


University of Southern California Computer Science Department

Problem 4: Considering the alphabet Σ = {0,1} construct a Non-Deterministic-Finite Automaton (NFA)


using the Thompson construction that is able to recognize the sentences generated by the regular
expression (1*01*0)*1*. Does the sentence w = “1010” belong to the language generated by this regular
expression? Justify.

Solution: The NFA is as shown below and where the start state is state labeled 0.

S0 = ε-closure (0) = {0, 1, 2, 4, 10, 11, 13} – this is a final state because of 13
S2 = DFAedge(S0,0) = ε-closure (S0, 0) = {5, 6, 8}
S1 = DFAedge(S0,1) = ε-closure (S0, 1) = {2, 3, 4, 11, 12, 13} – final state
DFAedge(S1,0) = ε-closure (S1, 0) = {5, 6, 8} = S2
DFAedge(S1,1) = ε-closure (S1, 0) = {2, 3, 4, 11, 12, 13} = S1
S3 = DFAedge(S2,0) = ε-closure (S2, 0) = {1, 2, 4, 9, 10, 11, 13} – final state
S4 = DFAedge(S2,1) = ε-closure (S2, 1) = {6, 7, 8}
DFAedge(S4,0) = ε-closure (S4, 0) = {1, 2, 4, 9, 10, 11, 13} = S3
DFAedge(S4,1) = ε-closure (S4, 1) = {6, 7, 8} = S4
DFAedge(S3,0) = ε-closure (S3, 0) = {5, 6, 8} = S2
DFAedge(S3,1) = ε-closure (S3, 1) = {2, 3, 4, 11, 12, 13} = S1

This results in the DFA shown below with starting state S0.

This DFA can be further minimize by recognizing that S1 and S3 form a self-contained partiton and S2
and S4 another partition resulting in the DFA on the right-hand-side.

Lexical Analysis Sample Exercises 4 Spring 2011


University of Southern California Computer Science Department

Problem 5: Consider the alphabet Σ = {a,b}.

a) Construct a Non-Deterministic-Finite Automaton (NFA) using the Thompson construction that is


able to recognize the sentences generated by the regular expression RE = (ab)*.(a)*.
b) Do the sentences w1 = “abaa” and w2 = “aaa” belong to the language generated by this regular
expression? Justify.
c) Convert the NFA in part a) to a DFA using the subset construction. Show the mapping between the
states in the NFA and the resulting DFA.
d) Minimize the DFA using the iterative refinement algorithm discussed in class. Show your
intermediate partition results and double check the DFA using the sentences w1 and w2.

Solution:
a) A possible construction (already simplified to have only a single ε-transition between states)
would result in the NFA shown below and where the start state is labeled 0.

b) Both words are recognized by this NFA. Regarding the word “abaa” there is a path from state
0 to the accepting state 6, namely: 0,1,2,3,4,5,4,5,6. Regarding the word “aaa” the automaton
may reach state 6 following the path 0,1,3,4,5,4,5,4,5,6.

c) Using the subset construction we arrive at the following subsets and transitions.

S0 = ε-closure (0) = {0, 1, 3, 4, 5, 6} – this is a final state because of state 6


S1 = DFAedge(S0,a) = ε-closure (goto(S0, a)) = {2, 4, 5, 6} – final state
SE = DFAedge(S0,b) = ε-closure (goto(S0, b)) = { }
S2 = DFAedge(S1,a) = ε-closure (goto(S1, a)) = {4, 5, 6} – final state
S3 = DFAedge(S1,b) = ε-closure (goto(S1, b)) = {3, 4, 5, 6} – final state

DFAedge(S2,a) = ε-closure (goto(S2, a)) = {4, 5, 6} = S2


DFAedge(S2,b) = ε-closure (goto(S2, b)) = { } = SE
DFAedge(S3,a) = ε-closure (goto(S3, a)) = {2, 4, 5, 6} = S1
DFAedge(S3,b) = ε-closure (goto(S3, b)) = { } = SE

This results in the DFA shown below with starting state S0.

d) This DFA can be further minimize by using the iterative refinement partitioning yeilding the
sequence of partitions indicated below.

Lexical Analysis Sample Exercises 5 Spring 2011


University of Southern California Computer Science Department

Lexical Analysis Sample Exercises 6 Spring 2011


University of Southern California Computer Science Department

Problem 6: Consider the DFA below with starting state 1 and accepting state 2:

a) Using the Kleene construction algorithm derive a regular expression.


b) Symplify the expression found in a) above.

Solution: Before engaging in a very long sequence of concatenation and symbolic operations we make
the observation that state 4 is a trap-state. As such we can safely ignore it since it is not an accepting state.
What this means is that we can use the algorithm simply with k = 3 and need to use it to compute R312 as
there are paths from the start state 1 to the final state 2 via state 3.

Expressions for k = 0
R011 = (ε)
R012 = (a|b)
R023 = (a)
R032 = (a)
R022 = (ε|b)

Expressions for k = 1
R111 = R011 (R011)*R011 | R011 = (ε)
R112 = R011 (R011)*R012 | R012 = (a|b)
R123 = R021 (R011)*R013 | R023 = (a)
R132 = R031 (R011)*R012 | R032 = (a)
R122 = R021 (R011)*R012 | R022 = (ε|b)

Expressions for k = 2
R211 = R112 (R122)*R121 | R111 = R111 = (ε)
R212 = R112 (R122)*R122 | R112 = (a|b) (ε|b)* (ε|b) | (a|b) = (a|b) b*
R223 = R122 (R122)*R123 | R123 = (ε|b) (ε|b)* (a) | (a) = b*(a)
R232 = R132 (R122)*R122 | R132 = (a) (ε|b)* (ε|b) | (a) = (a) b*
R222 = R122 (R122)*R122 | R122 = (ε|b) (ε|b)* (ε|b) | (ε|b) = b*
R213 = R112 (R122)*R123 | R113 = (a|b) (ε|b)* (a) = (a|b) b* (a)
R233 = R132 (R122)*R123 | R133 = (a) (ε|b)* (a) = (a) b*(a)

Expressions for k = 3
R311 = R213 (R233)*R231 | R211 = R211 = (ε)
R312 = R213 (R233)*R232 | R212 = (a|b) b* (a) ((a) b*(a))* (a) b* | (a|b)b*
R323 = R223 (R233)*R233 | R223 = b*(a) ((a) b*(a))* (a) b*(a) | b* (a)
R332 = R233 (R233)*R232 | R232 = (a) b*(a) ((a) b*(a))* (a) b* | (a) b*
R322 = R223 (R233)*R232 | R222 = b*(a) ((a) b*(a))* (a) b* | b*
R313 = R213 (R233)*R233 | R213 = (a|b) b* (a) ((a) b*(a))* (a) b*(a) | (a|b) b* (a)
R333 = R233 (R233)*R233 | R233 = (a) b*(a) ((a) b*(a))* (a) b*(a) | (a) b*(a)

Lexical Analysis Sample Exercises 7 Spring 2011


University of Southern California Computer Science Department

Clearly these expressions have a lot of redundant terms. In fact we are only interested in R312 = (a|b) b* (a)
((a) b*(a))* (a) b* | (a|b)b* since the starting state is state 1 and the only accepting state is state 2.

b) To simplify the expression found above is very hard, as there are a lot of partial terms in this
expression. Instead you follow the edges in the DFA and try to compose the regular expressions (in some
sense ignoring what you have done in the previous step).

The simplest way to look at this problem is to see that there is a prefix (a|b) and then you return to state 2
either by (b*) or (aa), so a compact regular expressions would be

R = (a|b)((b)|(aa))*

a much more compact regular expressions that R312 above.

Problem 7: Consider the alphabet Σ = {a; b}. Define a short, possibly the shortest, regular
expression that generates strings over Σ that contain exactly one “a” and at least one “b”.

Solution: The string must either start with one “b” or an “a”, so we can use a “b*” preffix to account for
any possible number of leading “b’s” before the first “a”. The single “a” must have at least one “b” either
preceeding it or just before it. After this single “a” there can be any number of trailing “b’s”. As possible
(although not necessarily unique) regular expression denoting the sought language is R = b*(ab|ba)b*.

Problem 8: Given an NFA with ε-transitions how do you derive an equivalent NFA without those ε-
transitions? Justify.

Solution: For those states with ε-transitions merge the edges of the states at the end of those ε transitions
into the current state. Iterate until there are no ε-transitions left. You still have a NFA but there will be at
more multiple edges with the label label out of a given state.

Problem 9: Given an NFA with several accept states, show how to convert it onto a NFA with exactly
one start state and one accepting state.

Solution: The inital state is the same as in the original NFA. As to the accepting states just add ε-
transitions from the orignal accepting states to a new accepting state and label that accepting state as the
only accepting state of the new NFA.

Lexical Analysis Sample Exercises 8 Spring 2011


University of Southern California Computer Science Department

Problem 10: Given the Finite Automaton below with initial state 0 and alphabet {a,b} answer the
following questions:

(a) Why is this FA a Non-Deterministic Finite Automaton (NFA)?


(b) Convert this NFA to a DFA using the closure computation.
(c) Minimize the resulting DFA.
(d) What is the Regular Expression matched by this NFA? You are advised not to use the automatic
Kleene construction or try to look at the input NFA but rather the correctly minimized DFA.

Solution:
(a) This FA is already a DFA, as the only transitions that are missing all go to the trap state 3.
(b) The subset construction, which in this case yields the same original DFA on the left where we have
noted the subset each state stands for.

(c) Using the iterative partition refinement we have the following partitions (below left) and the resulting
minimized DFA on the right again with the states relabeled. Notice that this is a special case where
the refinement algorithm cannot refine any sub-sets of states given that the original DFA was already
minimal.

(d) Based on the last DFA the regular expression identified by this FA must have exactly two consecutive
“b” symbol possibly preceded by a single “a” symbol. For instance, the string “abb” is in the language
but the string “abba” or “ba” are not. The regular expression can be denoted by a*bb.

Lexical Analysis Sample Exercises 9 Spring 2011


University of Southern California Computer Science Department

Problem 11: Draw the smallest DFA that accepts the language derived by the regular expression
(0*1*|1*0*).

Solution: A possible translation guided by the idea of the Thompson construction is shown below.

Using the iterative refinements algorithm for DFA minimization indeed confirms that this is the smallest
possible DFA. The sequence of refinements is shown below.

Problem 12: Given the regular expression RE = (ab|b*)*(ba*) over the alphabet Σ = {a,b} do the
following transformations:

a. Derive a NFA using the Thompson construction capabable of identifying the strings generated by
this regular expression.
b. Convert the NFA obtained in a) to a DFA.
c. Minimize the resulting DFA using the iterative refinement algorithm discussed in class.
d. Determine in how many steps is the sequence “ababaaa’ processed. Justify.

Solution: Regarding a) the NFA is given below. Here we made some very simple simplification that two
or more sequences of ε-transitions can be converted to a single ε-transitions. This substantially reduces
the number of states in the NFA and thius facilitates the construction of the equivalent DFA.

Lexical Analysis Sample Exercises 10 Spring 2011


University of Southern California Computer Science Department

Applying the subset construction we have the following states

S0 = ε-closure (0) = {0, 1, 4, 7, 8, 10, 2, 3, 11, 14, 15} – this is a final state because of 15
S1 = DFAedge(S0,a) = ε-closure (goto(S0, a)) = {5}
S2 = DFAedge(S0,b) = ε-closure (goto(S0, b)) = {1, 2, 3, 4, 7, 8, 9, 10, 11, 14, 15} – final
S3 = DFAedge(S1,a) = ε-closure (goto(S1, a)) = {}
S4 = DFAedge(S1,b) = ε-closure (goto(S1, b)) = {1, 2, 3, 4, 6, 7, 8, 10, 11, 14, 15} – final
S5 = DFAedge(S2,a) = ε-closure (goto(S2, a)) = {5} = S1
S6 = DFAedge(S2,b) = ε-closure (goto(S2, b)) = {1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 14, 15} – final
S7 = DFAedge(S3,a) = ε-closure (goto(S3, a)) = {} = S3
S8 = DFAedge(S3,b) = ε-closure (goto(S3, b)) = {} = S3
S9 = DFAedge(S4,a) = ε-closure (goto(S4, a)) = {5} = S1
S10 = DFAedge(S4,b) = ε-closure (goto(S4, b)) = {1, 2, 3, 4, 5, 7, 8, 8, 10, 11, 12, 14, 15} = S6
S11 = DFAedge(S6,a) = ε-closure (goto(S6, a)) = {5, 11, 13, 14, 15} – final
S12 = DFAedge(S6,b) = ε-closure (goto(S6, b)) = {1, 2, 3, 4, 7, 8, 9, 10, 11, 14, 15} = S2
S13 = DFAedge(S11,a) = ε-closure (goto(S11, a)) = {} = S3
S14 = DFAedge(S11,b) = ε-closure (goto(S11, b)) = {1, 2, 3, 4, 6, 7, 8, 10, 11, 14, 15}= S4

So there are a total of 7 states, S0, S1, S2, S3, S4, S6, S11 and the corresponding DFA is as shown by the
table below (bold states are final states) and then in the figure below the table.

Next State
Current State a b
S0 S1 S2
S1 S3 S4
S2 S1 S6
S3 S3 S3
S4 S1 S6
S6 S11 S2
S11 S3 S4

Lexical Analysis Sample Exercises 11 Spring 2011


University of Southern California Computer Science Department

Problem 13: Given the DFA below describe in English the set of strings accepted by it.

Solution: Any string over the alphabet that contains any number of consecutive 1s, including none with a
preffix of zero or more consecutive 0s.

Lexical Analysis Sample Exercises 12 Spring 2011


University of Southern California Computer Science Department

Problem 14: Given a regular language L. i.e., a language described by a regular expression, prove that
the reverse of L is also a regular language (Note: the reverse of a language L is LR where for each word w
in L, wR is in LR. Given a word w over the given alphabet, wR is constructed by spelling w backwards).

Solution: If L is a regular language then there exists a DFA M1 that recognizes it. Now given M1 we can
construct M2 that recognizes the reverse of L with respect to the input alphabet. We now describe how to
construct M2. M2 is a replica of M1 but reversing all the edges. The final state of M2 is the state that used
to be the start state of M1. The start state of M2 is a new state with ε-transitions to the states in M2 that
used to be the final states of M1. Now because M1 might have multiple final states M2 is by constructiona
NFA. Given the equivalence of NFA and regular expressions we have shown that if L is regular so is LR.

Problem 15: Draw the DFA capable of recognizing the set of all strings beginning with a 1 which
interpreted as the binary representation of an integer (assuming the last digit to be processed is the least
significant) is congruent to zero modulo 3 i.e., the numeric value of this binary representation is a
multiple of 3.

Solution: The hard part about this problem is that you need to keep track with the already observed bits
what the remainder of the division by 3 is. Given that you have a reminder you would need no more that 2
states, one for each of the remainder values 1 through 2 being the state that represents a remainder of zero
the accepting state, in this case state S3. The DFA below accomplishes this. You can verify this DFA by
trying the number 12 in binary 1100 or 21 in binary 10101. Notice that in the last state S3 any additional
0 means you are shifting the bits by one bit, i.e., multiplying by 2, hence staying in the same state.

Lexical Analysis Sample Exercises 13 Spring 2011

You might also like