This presentation is an HTML5 website
Press → key to advance.
Austin Seipp
Hac-φ, July 30th 2011
Security is hard to get right (c.f. LulzSec.)
At least, it seems so.
A nasty world of strings, strings, and more strings.
SQL Injections were rated the most dangerous application vulnerability of 2011. http://cwe.mitre.org/top25/?2011
That’s bad.
It’s 2011, and we still don’t statically isolate and model different kinds of data (namely, the rich languages that are HTML/SQL vs strings), and by association, their properties, and as a result people suffer. The web is not a nice place.
A popular technique in mainstream languages for web programming (a la Rails/Django) is metaprogramming - creating reusable components based on generating code at runtime.
Studies (2008) have shown that SQL Injection and XSS attacks are two of the most prevalant vulnerabilities plauging web applications today. Here’s the TL;DR:
The statistics includes data about 12186 web applications with 97554
detected vulnerabilities of different risk levels. The analysis shows
that more than 13%* of all reviewed sites can be compromised
completely automatically. About 49% of web applications contain
vulnerabilities of high risk level (Urgent and Critical) detected
during automatic scanning (T. 1). However, detailed manual and
automated assessment by white box method allows to detect these high
risk level vulnerabilities with probability up to 80-96%.
Preventing injections manually can already be tough, but ensuring metaprograms that create code that is immune to vulnerability can be even more difficult.
Core idea: rich static types that prevent these classes of errors, and can make sure generated programs also play by the rules.
Ur is a variant of ML, with Haskell-isms. It features, amongst other things:
- Modules, Functors
- Type classes, Monads
- Strict evaluation, pure by default (monads control effects)
- Impredicative polymorphism/kind polymorphism
- Polymorphic variants
- ML syntax (grumble*)
- Sports a form of metaprogramming based on row types (see Mitchell Wand’s paper)
- Takes ideas from theorem provers like Coq, and has type level computation, but not dependence.
- Based on System F-omega
* Haskell syntax is the best
- Special library and compiler modifications to support building web applications with Ur.
- Typed XML fragments & SQL are part of the language (postgres/sqlite/mysql backends)
- Sports a form of functional reactive programming for client side. No javascript - everything is written in Ur.
- Aims to support high assurance and reusable web development through statically typed metaprogramming.
- Very efficient: implementation does not use garbage collection (a degenerate form of region inference is used,) and the compiler completely eliminates all polymorphism through whole program optimization, like MLton. Main Ur/Web demo server uses less RAM than the
bashshell.
Ur’s main selling point is the metaprogramming facilities, and type level computation which allow you to generate type-safe code - which still has all the prior safety guarantees - for any given specialization of a metaprogram.
Note: This isn’t done by type-checking individual instantiations of a metaprogram. Rather, the metaprogram itself is typed in such a way as to guarantee correct operation at any possible instantiated type.
Ask yourself: can template haskell generate code which is not well-typed, thus causing compilation to fail?
Remember Curry-Howard: types are theorems, and this also goes for Ur/Web. The type of a metaprogram is a static theory stating any program that all generated code plays by the rules.
All polymorphic type variables in Ur are explicitly declared, but inferred at use sites. Most polymorphic declarations require full annotations of this form, as type inference is undecidable.
Haskell:
reverse :: [a] -> [a] reverse l = rev l [] where rev [] a = a rev (x:xs) a = rev xs (x:a)
Ur/Web:
fun rev [a] (x : list a) = let fun loop (xs : list a) (acc : list a) = case xs of [] => acc | (x' :: xs') => loop xs' (x' :: acc) in loop x [] end
Algebraic data types are just how you would expect them:
datatype tree a = Leaf of a | Node of tree a * tree a fun size [a] (t : tree a) : int = case t of Leaf _ => 1 | Node (t1, t2) => size t1 + size t2 fun flatten [a] (t : tree a) : list a = let fun loop t = case t of Leaf x => x :: [] | Node (t1, t2) => List.append (flatten t1) (flatten t2) in loop t end
Anonymous records are still there like they are in OCaml or Standard ML, but they’re a bit more powerful
in Ur as we’ll show later. Here we can see two special record operators, -- which is ‘field removal’, and
++ which is ‘record concatenation’.
val x = { A = "hello", B = Some 1 } val x2 = (x -- #A ++ {A = "goodbye!", C = False }) val x3 = (x -- #A ++ {A = 1}) val x4 = x2 ++ { D = 1.23 } val x5 = { D = 1.23 } ++ x2 type record = { A : int, B : string } val (y : record) = { A = 1, B = "test" }
Note the syntax for projection is merely record.FieldName.
signature STACK = sig con t :: Type -> Type val empty : a ::: Type -> t a val push : a ::: Type -> t a -> a -> t a val pop : a ::: Type -> t a -> option a end structure Stack : STACK = struct con t = list val empty [a] = [] fun push [a] (t : t a) (x : a) = x :: t fun pop [a] (t : t a) = case t of [] => None | x :: _ => Some x end
Usage:
Stack.pop (Stack.push (Stack.push Stack.empty "A") "B") == Some("B")
fun max [a] (_ : ord a) (x : a) (y : a) : a = if x < y then y else x
Usage:
max 1 2 == 2
Critical thing to note: the ‘ord a’ constraint is actually a value parameter to the function, but we never explicitly use it when calling the function. This is because, like Haskell, type class constraints are inferred by default. But unlike Haskell, instances are actually values.
Type class instances are more like value-level witnesses of some instance, that the compiler infers automatically. We can also provide an explicit witness:
@max ord_int 1 2
signature DOUBLE = sig class double val double : a ::: Type -> double a -> a -> a val mkDouble : a ::: Type -> (a -> a) -> double a val double_int : double int val double_string : double string end structure Double : DOUBLE = struct class double a = a -> a fun double [a] (f : double a) (x : a) : a = f x fun mkDouble [a] (f : a -> a) : double a = f val double_int = mkDouble (times 2) val double_string = mkDouble (fn s => s ^ s) end
Usage:
double 13 == 26 double "ho" == "hoho"
Thanks to the fact that type class instances are really more like values, it’s possible to create closed type classes in Ur by simply not exporting a function to construct a new instance. So if we take the last example and change it:
signature DOUBLE = sig class double val double : a ::: Type -> double a -> a -> a val double_int : double int val double_string : double string end
Now, ‘double’ is a closed type class with only two possible instances:
one for string and one for int.
XML is a first-class part of the language like we said earlier. Here’s an example:
test action value = return <xml> Action result: {action ()}<br/> Safe value: {[value]}<br/> </xml> testAction () = return <xml>Hi!</xml> val main () = test testAction (<xml>test</xml>)
Note there are two ways of injecting a value into the page: we either use an antiquote, or we use a safe means of injecting it into the page which escapes the values.
SQL is first-class too! And the fields of a SQL table are in fact represented as a record.
table t : { Id : int, User : string, Pass : string } PRIMARY KEY Id getOneRowFromT id = oneRow1 (SELECT * FROM t where t.Id = {[id]});
So, if we had a row in the table ‘t’ with the values ‘1, admin, admin’, then:
v <- getOneRowFromT 1' return <xml> Id: {[v.Id]} Username: {[v.User]} </xml>
table users : { ID : int, Username : string, Password : string } PRIMARY KEY ID fun login name pass = oneOrNoRowsE1 (SELECT (users.Username) FROM users WHERE users.Username = {[name]} AND users.Password = {[pass]}) fun main () = s <- source "Not logged in."; u <- source ""; p <- source ""; return <xml><body> <form> Username: <ctextbox source={u}/><br/> Password: <ctextbox source={p}/><br/> <button value="Submit" onclick={loginButton s u p}/> </form><br/> Status: <dyn signal={v <- signal s; return <xml>{[v]}</xml>} /> </body></xml> and loginButton s u p = uu <- get u; pp <- get p; n <- rpc (login uu pp); case n of None => return () | Some r => set s ("Logged in as " ^ r)
In Haskell, you normally have a few basic kinds which quantify types. In Ur, there are several fundamental kinds, and they’re important because you end up dealing with them a lot when doing metaprogramming. The kinds are:
Type- kind of regular typesUnit- the kind inhabited only byunit, which is only inhabited by(){K}- the kind of type level records of kindK- example:[A = int, B = string] :: {Type}Name- the kind of a name of a type level record - example:#A :: Name
Also:
[t :: K]is an explicitly given type argument[t ::: K]is an inferred type argument$rflattens the type level record of kind{Type}into a regular kindType
Q: can we type a function which can project an arbitrary field of an arbitrary record?
A: Yes.
Scary, scary definition:
fun proj [nm :: Name] [t ::: Type] [r ::: {Type}] [[nm] ~ r] (x : $([nm = t] ++ r)) = x.nm
Easy easy usage:
proj [#A] r == 1 proj [#B] r == "hello!"
Once again, scary definition:
fun sum [fs ::: {Unit}] (fl : folder fs) (x : $(mapU int fs)) = @foldUR [int] [fn _ => int] (fn [nm :: Name] [rest :: {Unit}] [[nm] ~ rest] n acc => n + acc) 0 fl x
Easy usage:
sum {A = 0, B = 1} == 1
sum {C = 3, D = 10, E = 1} == 14
http://www.haskell.org/pipermail/haskell-cafe/2011-July/094096.html
TL;DR: Given is an example of a CRUD demo which allows you to automatically generate a Rails like ‘scaffold’ - particularly an administrative CRUD style interface - for any given database table.
The metaprogram is itself typed in such a way to guarantee that generated programs behave correctly and are immune to generic kinds of attacks, for any given SQL table. These guarantees apply to both server side (type-checking queries) and client side (type checking HTML,) for any and all generated programs.
Challenge: can equivalent functionality be implemented in Haskell? What’s it look like? Remember, it’s a ‘challenge’ - nobody says it’s impossible, just that it’s going to be challenging.
- C FFI
- JavaScript FFI
- Deployment/static file handling
- Kind polymorphism
- Impredicative polymorphism
- Mostly designed for apps backed by relational data stores. No built in key-value/nosql/non-relational support. You could bind this yourself however.
- No package manager. Not many packages at any rate. Neither of these are good.
- Did I mention Haskell-like syntax is the best?
- Hard to understand compiler errors.
- I’m going to say that again. Hard to understand compiler errors.
- Arguably, XML fragments as first-class literals in the language is not the “right way” because programmers don’t write that. You can abstract over this with modules at the very least. Not pretty but it does work.
- Arguably, literal SQL as part of the language is not the “right way” because you may inevitably want some other type of data store. It may also be less convenient than, say, an ORM for abstraction. You can write your own type-safe ORM however. On the other hand, who knows when or if non-relational backends will be supported.
- Needs more love and community!
- Typed functional programming is great for web dev. Also, lots of options!
- Types help in the construction of correct programs in any domain, but…
- … the web is particularly suited for it. For every SQL injection or XSS vulnerability, someone cries because you don’t know the difference between strings and real data.
- Metaprogramming is useful and can significantly help productivity.
- But ensuring generated programs play by the rules can be tricky.
- Domain specific knowledge helps capture invariants and gives high power to weight ratio in Ur’s case: no GC, useful type inference.
Questions, complaints, hatred, XOXOXOs: as@hacks.yi.org
Ur/Web main website: http://impredicative.com/ur
Tutorial: http://impredicative.com/ur/tutorial
Slides & code fragments: http://github.com/thoughtpolice/ur-hac-phi-talk/