October 11, 2019

Lecture 12 – Set Operations on Strings Over an Alphabet

Today we shall consider set operations on sigma star, where sigma is an alphabet. Let us
take an alphabet sigma, for example {a, b, c}. The set of all strings over sigma is denoted
by sigma star, and we want to consider operations on sets of strings, that is, on subsets
of sigma star.

What is a string over sigma? It is a finite sequence of symbols taken from sigma. For
example, ababbac is a string over sigma, and its length is 7. The length of a string w is
usually written |w|. The empty string is denoted by lambda; usually lambda is used, but
sometimes epsilon is used as well. The length of the empty string is 0, the length of a is
1, the length of ab is 2, and so on.

Now we want to define some operations on strings. The first is concatenation. Let sigma be
an alphabet and let x and y be elements of sigma star. As I told you, sigma star is the set
of all strings over sigma, including the empty string. If x = a1 a2 ... am and
y = b1 b2 ... bn, where all these symbols are from sigma, then the concatenation of x with
y, denoted by x dot y or simply xy, is the string xy = a1 a2 ... am b1 b2 ... bn. If x is
lambda, the empty string, then xy = y for every y. Similarly, if y is lambda then xy = x.

Let us take an example. Suppose x = abb and y = cba. What is the concatenation of x with y?
You can write it as x dot y, but usually it is written simply as xy. You take the symbols
of x, that is abb, and place the symbols of y, that is cba, right after them. So the
concatenation of x with y is abbcba. If x is lambda, then xy is lambda y, which is y, and
if y is lambda, then xy is x lambda, which is x. So if you concatenate with the empty
string on the right or on the left, you get the same string back.

What can you say about the length of xy? The length of xy is the length of x plus the
length of y. In this example the length of abbcba is 6, which is the length of abb plus the
length of cba, that is 3 plus 3. The empty string conforms to this rule as well: the length
of x lambda is the length of x plus the length of lambda, which is 0, so it equals the
length of x.

Now in the same example, x = abb and y = cba, so xy = abbcba and yx = cbaabb. We can very
easily see that xy is not equal to yx. Therefore concatenation of strings is not
commutative, but it is associative.
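The properties of concatenation stated so far can be checked directly on Python strings. This is only an illustrative sketch: it assumes that Python's + on str stands in for concatenation and that the empty string "" stands in for lambda. The variable names, and the extra string z, are chosen here and are not part of the lecture.

```python
# Illustrative sketch: Python str concatenation models concatenation in sigma star.
LAMBDA = ""                      # the empty string, written lambda in the lecture

x, y, z = "abb", "cba", "ac"     # x and y are the lecture's example; z is an extra string

xy = x + y                       # concatenation xy
yx = y + x                       # concatenation yx

identity_holds = (LAMBDA + y == y) and (x + LAMBDA == x)   # lambda is a two-sided identity
lengths_add = len(xy) == len(x) + len(y)                   # |xy| = |x| + |y|
commutes = xy == yx                                        # False for this x and y
associates = x + (y + z) == (x + y) + z                    # True for any three strings

print(xy, yx)
print(identity_holds, lengths_add, commutes, associates)
```

A single pair x, y with xy different from yx is enough to show that concatenation is not commutative, whereas the associativity check here is only one instance of a law that holds for all strings.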
That is, x concatenated with yz is equal to xy concatenated with z. This you can very
easily see. Again let me take an example: x = ab, y = bc and z = ac. What is x concatenated
with yz? yz is bcac, so ab concatenated with bcac gives abbcac. Now what is xy concatenated
with z? xy is abbc, and abbc concatenated with ac again gives abbcac. You can see that
these two are equal. So concatenation is an associative operation; it is not commutative,
but it is associative.

Now you can also define powers of a string, such as x power 0. Let x be an element of sigma
star. For each n belonging to N, the string x power n is defined as follows: x power 0 is
lambda, and x power n plus 1 is x power n concatenated with x. Because of the associative
property of concatenation, no brackets are needed when we write out such a product.

Take any string x, say x = abc. Then x power 0 is lambda, the empty string; x power 1 is x
itself, that is abc; x squared is x concatenated with x, that is abcabc; and x cubed is x
squared concatenated with x, that is abcabcabc. You can write this without any ambiguity
because of the associative property. In general, x power n is abc written n times.
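The inductive definition of x power n translates almost word for word into a loop. The sketch below is an illustration only; the function name power is chosen here, and Python's empty string is assumed to play the role of lambda.

```python
def power(x: str, n: int) -> str:
    """Return x power n, following x^0 = lambda and x^(n+1) = x^n x."""
    if n < 0:
        raise ValueError("n must be a natural number (n >= 0)")
    result = ""                  # x power 0 is the empty string lambda
    for _ in range(n):
        result = result + x      # x power (k+1) = (x power k) concatenated with x
    return result

print([power("abc", n) for n in range(4)])
```

For non-negative n this agrees with Python's built-in string repetition, "abc" * n, which is what one would normally use in practice.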
So for a string we can define the power in this manner.

Let sigma be a finite alphabet. Having defined concatenation and the power of a string, let
us see what a language is, since we usually talk about languages over an alphabet. A
language over sigma is a subset of sigma star.

Consider, for example, L1 = {abc, bc, bca}, where the alphabet is {a, b, c}. This is a
subset of sigma star: you are taking some strings from sigma star and putting them in L1.
It is a finite set, so L1 is a finite language over sigma, as it consists of a finite
subset of sigma star.
L1 has three strings. You can define infinite subsets as well. For example, consider
L2 = {a power n b power m : n, m greater than or equal to 1}. What sort of strings belong
to L2? Strings consisting of n a’s followed by m b’s, where n and m are integers greater
than or equal to 1. So L2 is an infinite subset of sigma star, and it consists of strings
of the form a sequence of a’s followed by a sequence of b’s. No other string is in this
language. So this is another example.
Take another example, L3 = {a power n b power n : n greater than or equal to 1}. Again this
is an infinite subset of sigma star. This language consists of strings of n a’s followed by
an equal number of b’s. L2 has the same general form, but there n and m can be different;
here the sequence of a’s must be followed by an equal number of b’s, and n can be 1, 2, 3
and so on, any positive integer. So this is another infinite subset of sigma star, another
language.

Later on, if you study automata and formal language theory, you will learn that L1 is a
finite set, L2 is a regular set, and L3 is a context-free language that is not a regular
set; L2 and L3 are both infinite sets.
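The three example languages L1, L2 and L3 can be sketched as membership tests. This is a hedged illustration rather than anything from the lecture: the function names are chosen here, the finite language is stored as a Python set, and the infinite ones are represented by predicates because they cannot be listed in full.

```python
import re

SIGMA = {"a", "b", "c"}
L1 = {"abc", "bc", "bca"}                      # a finite language: just a set of strings

def is_string_over_sigma(w: str) -> bool:
    """True if every symbol of w comes from SIGMA (so w is in sigma star)."""
    return all(symbol in SIGMA for symbol in w)

def in_L2(w: str) -> bool:
    """Membership in L2 = { a^n b^m : n, m >= 1 }."""
    return re.fullmatch(r"a+b+", w) is not None

def in_L3(w: str) -> bool:
    """Membership in L3 = { a^n b^n : n >= 1 }; the two counts must agree."""
    n = len(w) // 2
    return n >= 1 and w == "a" * n + "b" * n

for w in ["ab", "aab", "aabb", "ba", "abc", ""]:
    print(repr(w), w in L1, in_L2(w), in_L3(w))
```

The test for L2 fits in a regular expression, while the test for L3 has to compare the two counts explicitly.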
Next we have to define the concatenation of two languages, that is, of two sets of strings.
How do we define this? Let A and B be languages over sigma. The set product of A with B, or
the concatenation of A with B, denoted by A dot B or simply AB, is the language
AB = {xy : x belongs to A and y belongs to B}. That is, AB consists of all strings formed
by concatenating an element of A with an element of B.

Let us take an example. Let A consist of the two strings a and ab, and let B consist of the
two strings bc and ca. By definition, A dot B is the set of strings of the form xy where x
belongs to A and y belongs to B. In this example, a concatenated with bc gives abc, a
concatenated with ca gives aca, ab concatenated with bc gives abbc, and ab concatenated
with ca gives abca. So A dot B = {abc, aca, abbc, abca}. Here A has two strings and B has
two strings, and AB has four strings.

Let us also consider an infinite example. Let L1 = {a power n : n greater than or equal to
1} and L2 = {b power m c power m : m greater than or equal to 1}. What is the concatenation
of L1 and L2? It has strings of the form xy where x belongs to L1 and y belongs to L2. L1
has strings of the form aaa...a, and L2 has strings of the form bbb...b followed by an
equal number of c’s. So L1 L2 has strings consisting of n a’s, followed by m b’s, followed
by m c’s, where n and m can be any integers greater than or equal to 1. The number of b’s
and c’s must be equal, while the number of a’s is independent of them. So
L1 L2 = {a power n b power m c power m : n, m greater than or equal to 1}.

The concatenation of two languages is defined
in this manner. Now we have certain properties of concatenation of languages. Let us see
what they are. Let A, B, C and D be arbitrary languages over sigma. Then the following
relations hold.

The first one says that A phi = phi A = phi, where phi is the empty set. A concatenated
with the empty set would consist of strings of the form xy with x in A and y in the empty
set, but there is no such y, so the result is the empty set.

The second says that A {lambda} = {lambda} A = A. A concatenated with {lambda} has strings
of the form xy where x belongs to A and y is lambda, and x lambda is x, so you get exactly
the strings of A. Note the difference between these two: if you concatenate with the empty
set you get the empty set, but if you concatenate with the set consisting of the empty
string alone you get the same set.

Because of the associative property of concatenation of strings, we can very easily verify
that (AB)C = A(BC). Both sides consist of the strings of the form xyz where x belongs to A,
y belongs to B and z belongs to C.

Also, if A is contained in B and C is contained in D, then AC is contained in BD. To see
this, let z belong to AC. Then z can be written as xy where x belongs to A and y belongs to
C. Since A is contained in B, x belongs to B, and since C is contained in D, y belongs to
D. So xy belongs to BD, that is, z belongs to BD. So z belongs to AC implies z belongs to
BD, which means AC is contained in BD.

Then concatenation is distributive with respect to union: A(B union C) = AB union AC, and
(B union C)A = BA union CA. With respect to intersection, only containment holds in
general: A(B intersection C) is contained in AB intersection AC, and (B intersection C)A is
contained in BA intersection CA. For instance, if A = {a, ab}, B = {b} and C = {lambda},
then B intersection C is empty, so A(B intersection C) is empty, yet ab belongs to both AB
and AC.

Let us prove the first one, A(B union C) = AB union AC. Suppose a string z belongs to the
left-hand side, A(B union C). This is equivalent to saying z is of the form xy where x
belongs to A and y belongs to B union C, that is, y belongs to B or y belongs to C.
Distributing, this is equivalent to saying z = xy where either x belongs to A and y belongs
to B, or x belongs to A and y belongs to C. The first case says xy belongs to AB, and the
second says xy belongs to AC. So this is equivalent to saying z belongs to AB union AC.
Hence the two sets are equal. The second equality is proved in the same manner. For the
intersection rules the same argument gives only the containment, because a string that is
in both AB and AC may split as xy in two different ways.

Now, we have defined the power of a
string: for a string x we know what x power n is, what x power 0 is, and so on. If you have
a language A, how do you define the power of the language? We have defined the
concatenation of two languages A and B, and we have also seen that concatenation of
languages is associative, that is (AB)C = A(BC). Because of this we can define A power n
without any problem.

Let A be a language over sigma. The language A power n is defined inductively as follows:
A power 0 is {lambda}, the set containing just the empty string, and A power n plus 1 is
A power n concatenated with A, where n is a natural number.

The language A power n is the set product of A with itself n times. Therefore, if z belongs
to A power n, then z can be written in the form w1 w2 w3 ... wn where wi belongs to A for
each i from 1 to n. So A power 0 consists of just one string, lambda; A power 1 is A
itself; A power 2 is A A; and A cubed is A squared concatenated with A, which you can write
as A A A without brackets because of associativity. In general A power n plus 1 is
A power n concatenated with A.

Let us take an example. Say A = {ab, bc}. Then A power 0 is {lambda} and A power 1 is A
itself. What is A squared? It is A concatenated with A, and it has the strings abab, abbc,
bcab and bcbc: ab concatenated with itself gives abab, ab concatenated with bc gives abbc,
bc concatenated with ab gives bcab, and bc concatenated with bc gives bcbc.

A cubed has the strings of A squared concatenated with ab and with bc. How many strings can
you expect? Eight. Concatenating each string of A squared with ab gives ababab, abbcab,
bcabab and bcbcab, and concatenating each with bc gives ababbc, abbcbc, bcabbc and bcbcbc.
These are the strings that belong to A cubed. Similarly you can define A power 4,
A power 5 and so on, up to A power n.

So in general A power n has strings of the form x1 x2 ... xn where each xi, for i from 1 to
n, belongs to A. Here A consists of ab and bc, so each xi is ab or bc. This is about the
power of a language. We shall see some simple properties of it.

Let A and B be subsets of sigma star and m, n
be arbitrary elements of N, that is, m and n are natural numbers. Then you can very easily
check the following. A power m concatenated with A power n is A power m plus n.
(A power m) power n is A power mn. And A contained in B implies A power n is contained in
B power n. These follow directly from the definition.

Next we define what is meant by the Kleene closure of a language. Let A be a subset of
sigma star. Then A star, the Kleene closure of A, is defined to be the union of A power n
over all n belonging to N, that is, the union from i = 0 to infinity of A power i. So
A star is A power 0 union A power 1 union A power 2 union A power 3 and so on; this is an
infinite union. And of course A power 0 is {lambda}, so it is {lambda} union A union
A squared and so on. The set A star is often called the star closure, or Kleene closure, or
simply the closure of A.

Now let us take the same example. A = {ab, bc}, so A power 0 is {lambda}, A power 1 is A,
A squared is {abab, abbc, bcab, bcbc}, and so on. What is A star here? It is A power 0
union A union A squared union A cubed and so on, an infinite union. In this particular
example it contains lambda, then the strings of A, namely ab and bc, then whatever strings
are in A squared, and so on. So A star has the strings lambda, ab, bc, abab, abbc, bcab,
bcbc and so on. We have taken an example where A is a finite set; the definition works in
the same way for infinite sets.

For example, if we take A = {a power n b power n : n greater than or equal to 1}, then
A star consists of strings of the form a power k1 b power k1 a power k2 b power k2 ...
a power kn b power kn, where n is greater than or equal to 0 and k1, k2, ..., kn are
greater than or equal to 1. It contains the empty string (the case n = 0), strings of the
form a power n b power n, strings of the form a power n b power n a power m b power m, and
so on like that.
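The set product, the powers of a language and the Kleene closure can be tried out on small finite languages. The following is a minimal sketch under stated assumptions: languages are Python sets of str, the function names are chosen here, and since A star is infinite whenever A contains a non-empty string, star_up_to only collects the members of A star up to a given length.

```python
def concat(A: set, B: set) -> set:
    """Set product AB = { xy : x in A, y in B }."""
    return {x + y for x in A for y in B}

def lang_power(A: set, n: int) -> set:
    """A power n, with A power 0 = {lambda} and A power (n+1) = (A power n) A."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result

def star_up_to(A: set, max_len: int) -> set:
    """Strings of A star whose length is at most max_len (A must be finite)."""
    result = {""}
    frontier = {""}
    while frontier:
        longer = {w for w in concat(frontier, A) if len(w) <= max_len}
        frontier = longer - result     # keep only strings not seen before
        result |= frontier
    return result

A = {"ab", "bc"}
print(sorted(concat({"a", "ab"}, {"bc", "ca"})))
print(sorted(lang_power(A, 2)))
print(len(lang_power(A, 3)))
print(sorted(star_up_to(A, 4), key=lambda w: (len(w), w)))
```

With these definitions concat(A, set()) is the empty set while concat(A, {""}) is A again, which is the difference between concatenating with the empty set and with the set containing only the empty string.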
So you can take the Kleene closure of a finite language or of an infinite language; the
definition applies in both cases.

You also have what is known as A plus, the positive Kleene closure or epsilon-free Kleene
closure. It is defined as A plus = union of A power n where n is greater than or equal to
1. That is, the infinite union starts with A power 1 and not with A power 0. In the earlier
one we started with A power 0, A power 1 and so on; here we start with A power 1,
A power 2, A power 3 and so on. This is called the positive closure or epsilon-free Kleene
closure.
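The only difference between the two closures is where the union of powers starts, and a length-bounded sketch makes that visible. The helper names below are chosen for this illustration; closure_up_to follows the definition as a union of powers but keeps only strings up to max_len so that the result stays finite.

```python
def concat(A: set, B: set) -> set:
    """Set product AB = { xy : x in A, y in B }."""
    return {x + y for x in A for y in B}

def closure_up_to(A: set, max_len: int, start: int) -> set:
    """Union of A power i for i >= start, restricted to strings of length <= max_len."""
    power = {""}                               # A power 0
    result = set()
    # Powers beyond max(max_len, 1) add no new strings of length <= max_len.
    for i in range(max(max_len, 1) + 1):
        if i >= start:
            result |= power
        power = {w for w in concat(power, A) if len(w) <= max_len}
    return result

def star_up_to(A: set, max_len: int) -> set:
    return closure_up_to(A, max_len, 0)        # union starts at A power 0

def plus_up_to(A: set, max_len: int) -> set:
    return closure_up_to(A, max_len, 1)        # union starts at A power 1

A = {"ab", "bc"}
print(sorted(star_up_to(A, 4), key=lambda w: (len(w), w)))
print(sorted(plus_up_to(A, 4), key=lambda w: (len(w), w)))
print(star_up_to(A, 4) - plus_up_to(A, 4))
```

For this A, which does not contain the empty string, the two bounded closures differ in exactly one string, the empty string.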
The set A plus is often called the positive closure of A.

The difference between A plus and A star is this. A star is defined as the union from
i = 0 to infinity of A power i, and A plus is defined as the union from i = 1 to infinity
of A power i, where A is a language. So A star is A power 0 union A power 1 union A power 2
and so on, while A plus is A power 1 union A power 2 and so on. So you can very easily see
that A star = A power 0 union A plus, which is {lambda} union A plus, because A power 0 is
always {lambda}.

So let A and B be languages over sigma and let n be a natural number. Then the following
relationships hold.

The first one you have just seen: A star = {lambda} union A plus.

A power n is contained in A star for all n greater than or equal to 0. Why? A star is
A power 0 union A power 1 union A power 2 and so on, so every A power n is part of A star.
This is just by definition. In a similar manner, A power n is contained in A plus for n
greater than or equal to 1.

Then A is contained in A B star, meaning A concatenated with B star. How can you prove
that? Expand it: A B star is A concatenated with (B power 0 union B power 1 union B power 2
and so on), which is A B power 0 union A B power 1 union A B power 2 and so on, that is,
A {lambda} union AB union A B squared and so on. Please remember that A {lambda} is just A.
So A B star is A union AB union A B squared and so on, and A is part of this. So A is
contained in A B star. In a similar manner you can prove that A is contained in B star A.

A contained in B implies A star contained in B star. Why? A star is A power 0 union
A power 1 union A power 2 and so on, and B star is B power 0 union B power 1 union
B power 2 and so on. If A is contained in B, then A power 0 and B power 0 are both
{lambda}, A power 1 is contained in B power 1, A squared is contained in B squared, and in
general A power n is contained in B power n. As a result A star is contained in B star. A
similar result holds for plus: A contained in B implies A plus is contained in B plus.

A A star = A star A = A plus. This you can very easily see: A A star is A concatenated with
(A power 0 union A power 1 union A power 2 and so on), which gives A union A squared union
A cubed and so on, and this is nothing but A plus. In a similar manner you can prove that
A star A is also equal to A plus.

If lambda belongs to A, then A plus and A star are equal.
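Some of these relationships can be spot-checked on small examples. The sketch below is a sanity check under assumptions, not a proof: it works only with strings up to a fixed length k, the helper names are chosen here, and the two sample languages A and B are arbitrary choices (B contains the empty string, A does not).

```python
def concat(A: set, B: set) -> set:
    """Set product AB = { xy : x in A, y in B }."""
    return {x + y for x in A for y in B}

def truncate(L: set, k: int) -> set:
    """Keep only the strings of L whose length is at most k."""
    return {w for w in L if len(w) <= k}

def closure_up_to(A: set, k: int, start: int) -> set:
    """Union of A power i for i >= start, restricted to strings of length <= k."""
    power, result = {""}, set()
    for i in range(max(k, 1) + 1):
        if i >= start:
            result |= power
        power = truncate(concat(power, A), k)
    return result

k = 6
A = {"ab", "bc"}          # lambda is not in A
B = {"", "a"}             # lambda is in B

star_A, plus_A = closure_up_to(A, k, 0), closure_up_to(A, k, 1)
star_B, plus_B = closure_up_to(B, k, 0), closure_up_to(B, k, 1)

print(star_A == {""} | plus_A)                      # A star = {lambda} union A plus
print(truncate(concat(A, star_A), k) == plus_A)     # A A star = A plus (up to length k)
print(truncate(concat(star_A, A), k) == plus_A)     # A star A = A plus (up to length k)
print(A <= truncate(concat(A, star_B), k))          # A is contained in A B star
print(star_B == plus_B, star_A == plus_A)           # equal only when lambda is in the set
```

Truncating at length k is safe for these checks because every string of length at most k in a product is built from factors of length at most k.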
Why? A star is A power 0 union A power 1 union A power 2 and so on, and A plus is A union
A squared and so on. So A star is {lambda} union A plus; this we have already seen. Now if
lambda belongs to A, then A plus also has lambda. When A plus already contains lambda,
taking the union with {lambda} again does not give anything different; it gives the same
thing. So if lambda belongs to A then lambda belongs to A plus. In that case adding lambda
additionally does not make any difference. Therefore A star is equal to A plus.

These are some more results, again based on the definitions of A star and A plus.
(A star) star = A star A star = A star. (A star) plus = (A plus) star = A star.
A star A plus = A plus = A plus A star. And
(A star B star) star = (A union B) star = (A star union B star) star. You can verify these
very easily.

There is a small result which is very useful in automata theory. Later on it will be used
when finding the regular expression for a deterministic finite state automaton. Let us
consider this theorem. Let A and B be arbitrary subsets of sigma star such that lambda does
not belong to A. Then the equation X = AX union B has the unique solution X = A star B.
This is a very useful lemma, and let us try to prove it. A and B are arbitrary languages
over sigma, that is, sets of strings, and you have the equation X = AX union B. I shall
write it as AX + B, where + denotes union.
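Before the proof, the statement can be explored by brute force on a tiny instance. What follows is a finite analogue chosen for this illustration, not the theorem itself: every language is cut off at strings of length at most k, the alphabet is taken to be {a, b}, and all function names are assumptions made here. The search enumerates every subset X of the short strings and keeps those with X = AX union B after the cut-off.

```python
from itertools import combinations, product

def concat(A, B):
    """Set product AB = { xy : x in A, y in B }."""
    return {x + y for x in A for y in B}

def truncate(L, k):
    """Keep only the strings of L whose length is at most k."""
    return {w for w in L if len(w) <= k}

def star_up_to(A, k):
    """Strings of A star of length <= k (A finite)."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = truncate(concat(frontier, A), k) - result
        result |= frontier
    return result

def bounded_solutions(A, B, sigma, k):
    """All X within strings of length <= k with X = (AX union B) cut off at length k."""
    universe = ["".join(p) for n in range(k + 1) for p in product(sigma, repeat=n)]
    found = []
    for size in range(len(universe) + 1):
        for subset in combinations(universe, size):
            X = set(subset)
            if X == truncate(concat(A, X) | B, k):
                found.append(X)
    return found

A, B, k = {"a"}, {"b"}, 2
candidate = truncate(concat(star_up_to(A, k), B), k)     # A star B, cut off at length k
print(sorted(candidate))
print(bounded_solutions(A, B, "ab", k) == [candidate])    # exactly one solution
print(len(bounded_solutions({"", "a"}, B, "ab", k)) > 1)  # lambda in A: several solutions
```

With A = {a} the search finds a single solution, the cut-off of A star B, while replacing A by {lambda, a} yields more than one, which is why the condition that lambda does not belong to A appears in the statement.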
This equation has a unique solution. What you want to show is that the unique solution is
X = A star B, and the condition is that lambda does not belong to A. Let us try to prove
this result.

Start with X = AX + B. Now I can replace the X on the right by AX + B. That gives
X = A(AX + B) + B = A squared X + AB + B. Again I replace X by AX + B, which gives
A squared (AX + B) + AB + B = A cubed X + A squared B + AB + B. Proceeding in this manner,
for every n you get

X = A power n plus 1 X + A power n B + ... + A squared B + AB + B.

Now you want to show that X = A star B. You have to prove it in two parts. The first part
is: if w belongs to X then w belongs to A star B, which means X is contained in A star B.
The second part is: if w belongs to A star B then w belongs to X, which means A star B is
contained in X. From these two you get X = A star B.

Let us take the first part. Let w belong to X and let the length of w be n. Expand X as
above with this n. Since lambda does not belong to A, every string in A has length at least
1, so every string in A power n plus 1 X has length at least n plus 1. Since w has length
n, it cannot belong to that portion, so it must belong to the remaining portion,
A power n B + ... + AB + B. That means w belongs to A power i B for some i less than or
equal to n, and so w belongs to A star B.

The second part is: if w belongs to A star B then w belongs to X. If w belongs to A star B,
then w belongs to A power j B for some j. Take n greater than or equal to j. We know that X
can be written in the form above, and A power j B is one of the terms in it, so w belongs
to X. So X is equal to A star B, and since X was an arbitrary solution, this is the only
possible solution. That A star B is indeed a solution follows from A A star = A plus:
A(A star B) + B = A plus B + B = A star B.

We are able to prove this because lambda does not belong to A; that is why we can say that
in the portion A power n plus 1 X every string has length at least n plus 1. If that
condition is dropped, that is, if lambda belongs to A, then what happens? In this case
X = A star B is still one of the solutions, but the solution is not unique: for any C
containing B, A star C is also a solution. This lemma is very useful when you want to find
a regular expression from a deterministic finite state automaton.

We shall learn about finite state automata later, towards the end of this course. Now this
study of sets of strings is very useful in formal languages and automata theory, which in
turn finds application in compiler design.
The reason is that any program, for example a Pascal program, is taken as a sequence of
symbols, and which programs are syntactically correct and which are not is defined by means
of a grammar for that programming language. For each programming language, such as Pascal,
or earlier languages such as ALGOL 60 and Fortran, you consider a grammar which generates
all syntactically correct programs in that language. The study of formal languages and
automata theory was really motivated by its use in compiler design. So you talk about a
grammar, which is a generative device, and you also talk about an automaton, which is an
acceptance device. Thus the study of strings and of operations on sets of strings is very
useful in formal languages and automata theory, which itself is very useful in compiler
design.

6 Replies to “Lecture 12 – Set Operations on Strings Over an Alphabet”

  • Josh Nielsen says:

    There are many applications, primarily in the areas of the study of languages and grammars in computer science. It becomes useful when one wants to generate or recognize a grammar or transform one grammar into another. This is useful in parsers for computer programs and recognizing keywords and elements (called lexemes) in a language that can be defined by regular expressions (formal descriptions of string patterns).

    (continued)

  • Josh Nielsen says:

    Knowing such relations about strings is also useful in specifying if a language is a subset of another language, such as with embedded Domain Specific Languages (for example a smaller subset of the C language being used for a domain specific programming application). It is just a very general way to represent the expression of formal languages in terms of mathematical logic (set theory). Its applications are varied. I just gave a few specific instances in which I have seen it used in CIS studies

  • daviangel says:

    You will be expected to know this stuff if you take any computer theory course and if you take that course you will see how this is used everyday in computer like with regular expressions, compilers, etc…

  • Rachit Choudhary says:

    IITAN ROXXXX…………………………

  • Anne Fatihah says:

    i have my final exam paper on this tomorrow…

  • BrofaBloke says:

    The intro is too funny

© Copyright 2019. Amrab Angladeshi. Designed by Space-Themes.com.