Regular expressions are a tiny language for pattern matching. They let you represent complex patterns in very few characters. They are used for finding patterns in text, replacing some patterns with others, and checking that a given text conforms to a known pattern.
Let us see a small example.
import re
m = re.match("ab+", "abbbbcd")
m
m.group()
The standard library module re provides regular expression support in Python.
The pattern ab+ matches the character a followed by one or more b characters.
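One detail that is easy to miss: re.match only tries the pattern at the start of the string, while re.search looks for it anywhere. The short sketch below illustrates the difference; the variable names are only for illustration.

```python
import re

# re.match only tries the pattern at the start of the string.
at_start = re.match("ab+", "abbbbcd")
print(at_start.group())        # abbbb

# There is no match at position 0, so re.match returns None.
not_at_start = re.match("ab+", "xxabbb")
print(not_at_start)            # None

# re.search scans the string for the first place the pattern matches.
anywhere = re.search("ab+", "xxabbb")
print(anywhere.group())        # abbb
```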
Let us look at common patterns supported by regular expressions.
Patterns:
c - one character
. - any character
[abcd] - one of the characters specified in the set
[^abcd] - any character other than the ones in the set
x* - zero or more occurrences of x (x could be any of the above patterns)
x+ - one or more occurrences of x
x? - zero or one occurrence of x
(x) - match x and also remember it for use in substitution
\d - any digit
\s - any whitespace
^ - beginning of a string
$ - end of a string
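To make the list above concrete, here is a small sketch that tries several of these patterns with re.findall, re.search and re.sub. The sample strings and variable names are made up for illustration.

```python
import re

# . matches any single character
any_char = re.findall("c.t", "cat cot cut")

# [^abcd] matches any character outside the set
outside_set = re.findall("[^abcd]", "abxcdy")

# x* matches zero or more occurrences, x? matches zero or one
zero_or_more = re.findall("ab*", "a ab abb")
optional = re.findall("colou?r", "color colour")

# \d is a digit, \s is whitespace (raw strings keep the backslashes intact)
digit_space = re.findall(r"\d\s", "1 2\t3")

# ^ and $ anchor the pattern to the beginning and end of the string
whole = re.search("^abc$", "abc")
partial = re.search("^abc$", "xabc")

# (x) remembers what it matched; \1 and \2 refer back to it in a substitution
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")

print(any_char, outside_set, zero_or_more, optional)
print(digit_space, whole is not None, partial, swapped)
```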
Let us look at a simple example.
text = "10 apples and 20 mangos"
Extract all numbers from the text.
re.findall("[0-9]+", text)
re.findall("[0-9]+", "1 apple, 2 oranges and 3 mangos")
re.sub("[0-9]+", "xx", text)
Q: Can regular expressions support OR?
Yes.
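re.findall(r"\d+ (?:apple|mango)s?", "1 apple, 2 oranges and 3 mangos")
The (?:xxx) is a non-capturing group: it groups the alternatives without remembering what they matched. The pattern is written as a raw string (r"...") so that Python passes \d to the re module unchanged.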
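The choice between (xxx) and (?:xxx) matters with re.findall, which returns the contents of the group when the pattern contains a capturing group. A minimal sketch comparing the two on the same text (the variable names are only for illustration):

```python
import re

text = "1 apple, 2 oranges and 3 mangos"

# Non-capturing group: findall returns the whole match.
whole_matches = re.findall(r"\d+ (?:apple|mango)s?", text)
print(whole_matches)     # ['1 apple', '3 mangos']

# Capturing group: findall returns only what the group matched.
group_matches = re.findall(r"\d+ (apple|mango)s?", text)
print(group_matches)     # ['apple', 'mango']
```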
Problem: Write a function squeeze to replace multiple consecutive space characters with a single space.
>>> squeeze("a  b   c    d")
'a b c d'
# your code here
def squeeze(text):
    return re.sub(" +", " ", text)
squeeze("a  b   c    d")
Here is some text I've copied from our Slack window. Let us see if we can parse it.
%%file slack.txt
anandology [4:14 PM]
@sibiraja you have a typo in urlopen
[4:14]
it should be urlopen(url).read()
sibiraja [4:14 PM]
yeah, sorted it
anandology [4:15 PM]
urlopen(url) gives a http response object and that works like a file. you can read, read lines etc.
anandology [4:32 PM]
you may have to import requests
[4:32]
I've done that already in the previous example
senthilphython [4:33 PM]
ok
anandology [4:34 PM]
@ashok_m you need to import requests
[4:34]
@ashok_karuppaiyan you seems have got an error
[4:35]
try print requests.get(...).text
anandology [4:49 PM]
all the messages posted by the program will be available at #hack
sibiraja [4:50 PM]
yes
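One possible way to parse this format is sketched below. It assumes that a line like "name [4:14 PM]" starts a new speaker, that a bare "[4:14]" line only updates the time, and that every other non-empty line is a message from the current speaker. The function name parse_chat and the inline sample (the first few lines of the log above) are only for illustration.

```python
import re

# First few lines of the chat log, inlined so the sketch is self-contained.
sample = """\
anandology [4:14 PM]
@sibiraja you have a typo in urlopen
[4:14]
it should be urlopen(url).read()
sibiraja [4:14 PM]
yeah, sorted it
"""

# "name [4:14 PM]" starts a new speaker; "[4:14]" only updates the time.
header = re.compile(r"^(\w+) \[(\d+:\d+) [AP]M\]$")
timestamp = re.compile(r"^\[(\d+:\d+)\]$")

def parse_chat(text):
    """Return a list of (user, time, message) tuples."""
    messages = []
    user = time = None
    for line in text.splitlines():
        m = header.match(line)
        if m:
            user, time = m.groups()
            continue
        m = timestamp.match(line)
        if m:
            time = m.group(1)
            continue
        if line.strip():
            messages.append((user, time, line))
    return messages

parsed = parse_chat(sample)
for entry in parsed:
    print(entry)
```

Since \w also matches underscores and digits, the header pattern would accept names such as ashok_m as well.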