
Domain Specific Languages in
         Python




    Siddharta Govindaraj
siddharta@silverstripesoftware.com
What are DSLs?

Specialized mini-languages for specific problem
domains that make it easier to work in that
domain
Example: SQL

SQL is a mini language specialized to retrieve data
from a relational database
Example: Regular Expressions

Regular Expressions are mini languages
specialized to express string patterns to match
Life Without Regular Expressions
def is_ip_address(ip_address_string):
    components = ip_address_string.split(".")
    if len(components) != 4: return False
    try:
        int_components = [int(component) for component in components]
    except ValueError:
        return False
    for component in int_components:
        if component < 0 or component > 255:
            return False
    return True
Life With Regular Expressions
import re

def is_ip(ip_address_string):
    match = re.match(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$",
                     ip_address_string)
    if not match: return False
    for component in match.groups():
        if int(component) < 0 or int(component) > 255:
            return False
    return True
The DSL that simplifies our life


 ^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$
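As a quick check of the pattern, the sketch below compiles it with Python's re module and tries it on a few sample strings. The function name is_ip follows the slides; the constant name IP_PATTERN and the sample strings are made up for illustration. Note that the pattern only checks the shape of the string, so the 0-255 range still needs ordinary code.

```python
import re

# Dotted-quad pattern: four groups of 1-3 digits separated by literal dots.
IP_PATTERN = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")


def is_ip(ip_address_string):
    match = IP_PATTERN.match(ip_address_string)
    if not match:
        return False
    # The regex validates the shape; the numeric range is checked here.
    return all(0 <= int(component) <= 255 for component in match.groups())


# Illustrative inputs: one valid address and three invalid ones.
for candidate in ["192.168.0.1", "256.1.1.1", "1.2.3", "a.b.c.d"]:
    print(candidate, is_ip(candidate))
```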
Why DSL - Answered

When working in a particular domain, write your
code in a syntax that fits the domain.

             When working with patterns, use RegEx
              When working with RDBMS, use SQL
       When working in your domain – create your own DSL
The two types of DSLs

External DSL – The code is written in an external
file or as a string, which is read and parsed by the
application
The two types of DSLs

Internal DSL – Use features of the language (like
metaclasses) to enable people to write code in
python that resembles the domain syntax
Creating Forms – No DSL
<form>
<label>Name:</label><input type="text" name="name"/>
<label>Email:</label><input type="text" name="email"/>
<label>Password:</label><input type="password" name="password"/>
</form>
Creating Forms – No DSL

– Requires HTML knowledge to maintain
– Therefore it is not possible for the end user to
change the structure of the form by themselves
Creating Forms – External DSL
UserForm
name->CharField label:Username
email->EmailField label:Email Address
password->PasswordField




This text file is parsed and rendered by the app
Creating Forms – External DSL

+ Easy to understand form structure
+ Can be easily edited by end users
– Requires you to read and parse the file
Creating Forms – Internal DSL
class UserForm(forms.Form):
    username = forms.RegexField(regex=r'^\w+$',
          max_length=30)
    email = forms.EmailField(max_length=75)
    password = forms.CharField(widget=forms.PasswordInput())




Django uses metaclass magic to convert this
syntax to an easily manipulated python class
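How a metaclass can turn class attributes into form structure can be sketched in a few lines. This is not Django's implementation, only a minimal illustration of the idea; Field, FormMeta, and the fields attribute are hypothetical names chosen for this example.

```python
class Field:
    """Base class for declarative field descriptions (hypothetical)."""

    def __init__(self, **options):
        self.options = options


class CharField(Field):
    pass


class EmailField(Field):
    pass


class FormMeta(type):
    """Collects the Field attributes of a class body into `fields`."""

    def __new__(mcls, name, bases, namespace):
        # Pick out every attribute that is a Field, in declaration order.
        fields = {key: value for key, value in namespace.items()
                  if isinstance(value, Field)}
        cls = super().__new__(mcls, name, bases, namespace)
        cls.fields = fields
        return cls


class Form(metaclass=FormMeta):
    pass


class UserForm(Form):
    username = CharField(max_length=30)
    email = EmailField(max_length=75)


# The metaclass ran when UserForm was defined, so the structure is ready.
print(list(UserForm.fields))
```

The person writing UserForm only declares fields; the collection step happens behind the scenes, which is also why this style can be hard to debug.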
Creating Forms – Internal DSL

+ Easy to understand form structure
+ Easy to work with the form as it is regular python
+ No need to read and parse the file
– Cannot be used by non-programmers
– Can sometimes be complicated to implement
– Behind the scenes magic → debugging hell
Creating an External DSL
UserForm
name:CharField -> label:Username size:25
email:EmailField -> size:32
password:PasswordField




Let's write code to parse and render this form
Options for Parsing

Using string functions → You have to be crazy
Using regular expressions →
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski


Writing a parser →         ✓   (we will use PyParsing)
Step 1: Get PyParsing
   pip install pyparsing
Step 2: Design the Grammar
form ::= form_name newline field+
field ::= field_name colon field_type [arrow property+]
property ::= key colon value
form_name ::= word
field_name ::= word
field_type ::= CharField | EmailField | PasswordField
key ::= word
value ::= alphanumeric+
word ::= alpha+
newline ::= \n
colon ::= :
arrow ::= ->
Quick Note

Backus-Naur Form (BNF) is a syntax for
specifying grammars
Step 3: Implement the Grammar
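from pyparsing import (Word, alphas, alphanums, oneOf, Optional,
                       OneOrMore, ParserElement)

# Newlines are significant in this grammar, so stop pyparsing
# from skipping them as whitespace.
ParserElement.setDefaultWhitespaceChars(" \t")

newline = "\n"
colon = ":"
arrow = "->"
word = Word(alphas)
key = word
value = Word(alphanums)
field_type = oneOf("CharField EmailField PasswordField")
field_name = word
form_name = word
field_property = key + colon + value
field = (field_name + colon + field_type +
         Optional(arrow + OneOrMore(field_property)) + newline)
form = form_name + newline + OneOrMore(field)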
Quick Note

PyParsing itself implements a neat little internal
DSL for you to describe the parser grammar


Notice how the PyParsing code almost perfectly
reflects the BNF grammar
Output
> print(form.parseString(input_form))


['UserForm', '\n', 'name', ':', 'CharField', '->',
'label', ':', 'Username', 'size', ':', '25', '\n',
'email', ':', 'EmailField', '->', 'size', ':', '32', '\n',
'password', ':', 'PasswordField', '\n']


PyParsing has neatly parsed our form input into
tokens. That's nice, but we can do more.
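The slides never show how input_form is defined, so here is a runnable sketch of the grammar so far with a sample input. Two things are assumptions: the input text is copied from the "Creating an External DSL" slide with a trailing newline on every line, and the call to setDefaultWhitespaceChars is added because pyparsing skips newlines as whitespace by default, which would stop the newline token from ever matching.

```python
from pyparsing import (OneOrMore, Optional, ParserElement, Word,
                       alphanums, alphas, oneOf)

# Assumption: keep newlines out of the skipped whitespace so the grammar
# can see them. This has to run before the grammar elements are created.
ParserElement.setDefaultWhitespaceChars(" \t")

newline = "\n"
colon = ":"
arrow = "->"
word = Word(alphas)
key = word
value = Word(alphanums)
field_type = oneOf("CharField EmailField PasswordField")
field_name = word
form_name = word
field_property = key + colon + value
field = (field_name + colon + field_type +
         Optional(arrow + OneOrMore(field_property)) + newline)
form = form_name + newline + OneOrMore(field)

# Sample input; every line, including the last, ends with a newline.
input_form = (
    "UserForm\n"
    "name:CharField -> label:Username size:25\n"
    "email:EmailField -> size:32\n"
    "password:PasswordField\n"
)

tokens = form.parseString(input_form).asList()
print(tokens)
```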
Step 4: Suppressing Noise Tokens
newline = Suppress("\n")
colon = Suppress(":")
arrow = Suppress("->")
Output
> print(form.parseString(input_form))


['UserForm', 'name', 'CharField', 'label', 'Username',
'size', '25', 'email', 'EmailField', 'size', '32',
'password', 'PasswordField']




All the noise tokens are now removed from the
parsed output
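The effect of Suppress is easiest to see on a single property. A small sketch, using a cut-down key:value grammar rather than the full form grammar; the names noisy and quiet are made up for this example.

```python
from pyparsing import Suppress, Word, alphanums, alphas

key = Word(alphas)
value = Word(alphanums)

# Same grammar twice: once keeping the colon, once suppressing it.
noisy = key + ":" + value
quiet = key + Suppress(":") + value

noisy_tokens = noisy.parseString("size:25").asList()
quiet_tokens = quiet.parseString("size:25").asList()

print(noisy_tokens)  # ['size', ':', '25']
print(quiet_tokens)  # ['size', '25']
```

The colon still has to be present in the input for the parse to succeed; Suppress only drops it from the results.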
Step 5: Grouping Tokens
field_property = Group(key + colon + value)
field = Group(field_name + colon + field_type +
Group(Optional(arrow + OneOrMore(field_property))) +
newline)
Output
> print(form.parseString(input_form))


['UserForm',
  ['name', 'CharField',
    [['label', 'Username'], ['size', '25']]],
  ['email', 'EmailField',
    [['size', '32']]],
  ['password', 'PasswordField',[]]]

Related tokens are now grouped together in a list
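To see what Group changes, the sketch below parses the same property list with and without it. It uses a cut-down grammar for a single line of properties; the names flat and grouped are made up for this example.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, alphas

colon = Suppress(":")
key = Word(alphas)
value = Word(alphanums)

# Without Group every token lands in one flat list.
flat = OneOrMore(key + colon + value)
# With Group each key/value pair becomes its own sub-list.
grouped = OneOrMore(Group(key + colon + value))

text = "label:Username size:25"
flat_tokens = flat.parseString(text).asList()
grouped_tokens = grouped.parseString(text).asList()

print(flat_tokens)     # ['label', 'Username', 'size', '25']
print(grouped_tokens)  # [['label', 'Username'], ['size', '25']]
```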
Step 6: Give Names to Tokens
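form_name = word.setResultsName("form_name")
field_name = word.setResultsName("field_name")
field_type = oneOf("CharField EmailField PasswordField").setResultsName("field_type")
field = Group(field_name + colon + field_type +
              Group(Optional(arrow + OneOrMore(field_property))) +
              newline)
form = form_name + newline + OneOrMore(field).setResultsName("fields")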
Output
> parsed_form = form.parseString(input_form)
> print(parsed_form.form_name)


UserForm


> print(parsed_form.fields[1].field_type)


EmailField




Now we can refer to parsed tokens by name
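Named access relies on each result name being attached to the right element. A minimal sketch, assuming the names field_name, field_type, and fields that the output above uses, and a one-line input instead of the full form:

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, oneOf

colon = Suppress(":")
field_name = Word(alphas).setResultsName("field_name")
field_type = oneOf("CharField EmailField PasswordField").setResultsName("field_type")

# Group keeps each field's names separate from its neighbours' names.
field = Group(field_name + colon + field_type)
# Naming the repetition collects all the grouped fields under one name.
fields = OneOrMore(field).setResultsName("fields")

parsed = fields.parseString("name:CharField email:EmailField")

print(parsed.fields[0].field_name)  # name
print(parsed.fields[1].field_type)  # EmailField
```

Without the Group, the second field's names would overwrite the first field's, since all the names would live in the same result.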
Step 7: Convert Properties to Dict
def convert_prop_to_dict(tokens):
    prop_dict = {}
    for token in tokens:
        prop_dict[token.property_key] = token.property_value
    return prop_dict


# Name the two halves of each property so the parse action can read them.
field_property = Group(key.setResultsName("property_key") + colon +
                       value.setResultsName("property_value"))
field = Group(field_name + colon + field_type +
              Optional(arrow + OneOrMore(field_property))
                  .setParseAction(convert_prop_to_dict) +
              newline)
Output
> print(form.parseString(input_form))


['UserForm',
    ['name', 'CharField',
      {'size': '25', 'label': 'Username'}],
    ['email', 'EmailField',
      {'size': '32'}],
    ['password', 'PasswordField', {}]
]


Sweet! The field properties are parsed into a dict
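The parse action can be tried on its own with a one-line property list. A sketch assuming the result names property_key and property_value used by convert_prop_to_dict; the expression name properties is made up for this example.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, alphas

colon = Suppress(":")
key = Word(alphas).setResultsName("property_key")
value = Word(alphanums).setResultsName("property_value")
field_property = Group(key + colon + value)


def convert_prop_to_dict(tokens):
    # Each token is one grouped key/value pair with named parts.
    return {token.property_key: token.property_value for token in tokens}


# The parse action replaces the matched groups with a single dict token.
properties = OneOrMore(field_property).setParseAction(convert_prop_to_dict)

parsed = properties.parseString("label:Username size:25")
print(parsed[0])
```

Because the function returns a dict rather than a list, pyparsing stores it as one token, which is why it is read back with parsed[0].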
Step 7: Generate HTML Output

We need to walk through the parsed form and
generate an HTML string out of it
def get_field_html(field):
    properties = field[2]
    label = properties["label"] if "label" in properties else field.field_name
    label_html = "<label>" + label + "</label>"
    attributes = {"name": field.field_name}
    attributes.update(properties)
    if field.field_type == "CharField" or field.field_type == "EmailField":
        attributes["type"] = "text"
    else:
        attributes["type"] = "password"
    if "label" in attributes:
        del attributes["label"]
    attributes_html = " ".join([name + "='" + value + "'"
                                for name, value in attributes.items()])
    field_html = "<input " + attributes_html + "/>"
    return label_html + field_html + "<br/>"


def render(form):
    fields_html = "".join([get_field_html(field) for field in form.fields])
    return "<form id='" + form.form_name.lower() + "'>" + fields_html + "</form>"
Output
> print(render(form.parseString(input_form)))


<form id='userform'>
<label>Username</label>
<input type='text' name='name' size='25'/><br/>
<label>email</label>
<input type='text' name='email' size='32'/><br/>
<label>password</label>
<input type='password' name='password'/><br/>
</form>
It works, but....


                 Yuck!


The output rendering code is an UGLY MESS
Wish we could do this...
> print(Form(CharField(name="user", size="25", label="ID"),
             id="myform"))


<form id='myform'>
<label>ID</label>
<input type='text' name='user' size='25'/><br/>
</form>




Neat, clean syntax that matches the output domain
well. But how do we create this kind of syntax?
Let's create an Internal DSL
class HtmlElement(object):
    default_attributes = {}
    tag = "unknown_tag"

    def __init__(self, *args, **kwargs):
        self.attributes = kwargs
        self.attributes.update(self.default_attributes)
        self.children = args

    def __str__(self):
        attribute_html = " ".join(["{}='{}'".format(name, value)
                                   for name, value in self.attributes.items()])
        if not self.children:
            return "<{} {}/>".format(self.tag, attribute_html)
        else:
            children_html = "".join([str(child) for child in self.children])
            return "<{} {}>{}</{}>".format(self.tag, attribute_html,
                                           children_html, self.tag)
> print(HtmlElement(id="test"))

<unknown_tag id='test'/>

> print(HtmlElement(HtmlElement(name="test"), id="id"))

<unknown_tag id='id'><unknown_tag name='test'/></unknown_tag>
class Input(HtmlElement):
    tag = "input"

    def __init__(self, *args, **kwargs):
        HtmlElement.__init__(self, *args, **kwargs)
        # Fall back to the field name when no label is given.
        self.label = (self.attributes["label"] if "label" in self.attributes
                      else self.attributes["name"])
        # The label is rendered separately, so drop it from the attributes.
        if "label" in self.attributes:
            del self.attributes["label"]

    def __str__(self):
        label_html = "<label>{}</label>".format(self.label)
        return label_html + HtmlElement.__str__(self) + "<br/>"
> print(Input(name="username"))

<label>username</label><input name='username'/><br/>

> print(Input(name="username", label="User ID"))

<label>User ID</label><input name='username'/><br/>
class Form(HtmlElement):
    tag = "form"


class CharField(Input):
    default_attributes = {"type": "text"}


class EmailField(CharField):
    pass


class PasswordField(Input):
    default_attributes = {"type": "password"}
Now...
> print(Form(CharField(name="user", size="25", label="ID"),
             id="myform"))


<form id='myform'>
<label>ID</label>
<input type='text' name='user' size='25'/><br/>
</form>




                            Nice!
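Pulled together into one file, the classes from the preceding slides can be run as below. This is a condensed sketch in Python 3 syntax, not the author's full code. Two details differ from the slide output: the real result is a single line (the slides wrap it for readability), and on current Python versions the attributes appear in keyword-argument order, with defaults such as type added last.

```python
class HtmlElement:
    default_attributes = {}
    tag = "unknown_tag"

    def __init__(self, *children, **attributes):
        self.attributes = attributes
        self.attributes.update(self.default_attributes)
        self.children = children

    def __str__(self):
        attribute_html = " ".join(
            "{}='{}'".format(name, value)
            for name, value in self.attributes.items())
        if not self.children:
            return "<{} {}/>".format(self.tag, attribute_html)
        children_html = "".join(str(child) for child in self.children)
        return "<{} {}>{}</{}>".format(
            self.tag, attribute_html, children_html, self.tag)


class Input(HtmlElement):
    tag = "input"

    def __init__(self, *children, **attributes):
        super().__init__(*children, **attributes)
        # Use the label if one was given, otherwise fall back to the name.
        self.label = (self.attributes["label"] if "label" in self.attributes
                      else self.attributes["name"])
        # The label is rendered as its own element, not as an attribute.
        self.attributes.pop("label", None)

    def __str__(self):
        label_html = "<label>{}</label>".format(self.label)
        return label_html + super().__str__() + "<br/>"


class Form(HtmlElement):
    tag = "form"


class CharField(Input):
    default_attributes = {"type": "text"}


html = str(Form(CharField(name="user", size="25", label="ID"), id="myform"))
print(html)
```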
Step 7 Revisited: Output HTML
def render(form):
    field_dict = {"CharField": CharField, "EmailField": EmailField,
                  "PasswordField": PasswordField}
    fields = [field_dict[field.field_type](name=field.field_name, **field[2])
              for field in form.fields]
    return Form(*fields, id=form.form_name.lower())




Now our output code uses our Internal DSL!
INPUT
UserForm
name:CharField -> label:Username size:25
email:EmailField -> size:32
password:PasswordField
                          OUTPUT
<form id='userform'>
<label>Username</label>
<input type='text' name='name' size='25'/><br/>
<label>email</label>
<input type='text' name='email' size='32'/><br/>
<label>password</label>
<input type='password' name='password'/><br/>
</form>
Get the whole code

http://bit.ly/pyconindia_dsl
Summary

+ DSLs make your code easier to read
+ DSLs make your code easier to write
+ DSLs make it easy for non-programmers to
maintain code
+ PyParsing makes it easy to write External DSLs
+ Python makes it easy to write Internal DSLs
