Program Safety Via Lexical Constraints
Owen Lynch
Wed May 08 2024
I’m going to give 10 examples of something I’m calling “Program Safety Via Lexical Constraints,” and I want someone to tell me how I should be thinking about it in a more principled manner. This was loosely inspired by this tweet and the accompanying presentation, but is not terribly related to it
-
Classic Java object-oriented programming revolves around making classes with private variables (called encapsulation). From a type system perspective, there exist values of those private variables for which the program would produce bad results. However those bad values are not reachable as long as you only mutate an object via the public API of the class! You have safety via the lexical guarantee that the private variables are only accessible via the public API.
-
One way of “sandboxing” programs is by putting them in a chroot environment, which essentially means that the root directory
/
is bound to a subdirectory/program-a/jail
or something. We don’t need to prove that a chrooted program doesn’t read files outside of/program-a/jail
, because/..
is not a valid file name! The namespace of available files intrinsically constrains what the program can do. -
OCaml and Standard ML use their module systems to hide implementation details of modules by making the types opaque. For instance, if the signature of a module is
type t
val empty : t
val push : t -> int -> t
val pop : t -> int option * t
then we can guarantee that the only way that external code will use the type t
is by starting with an empty
and then pushing and popping. This allows us to do induction proofs for correctness; instead of having to prove correctness of push
and pop
for all values of t
, we need only prove correctness for values of t
produced by empty
, push
, and pop
.
-
Rust uses lexical scope to prevent memory issues. Specifically, when a variable goes out of scope, the memory associated with that variable is freed! Actually, most programming languages do this, just not in the same way as Rust; in C stack-allocated memory gets freed when a function exits, and in garbage-collected languages, memory may be freed when a variable goes out of all scopes. It would be much more of a pain if one had to track memory in the type system.
-
The
ST
monad in Haskell uses scope to make sure that no references to mutable memory allocated within the monad can escape the monad. -
The theorems for free paper uses parametricity to prove that, for instance, all terms of type
forall a. f a -> g a
forf
andg
functors must be natural transformations. This is also an example of what I’m trying to get at; the way this works is by showing that, essentially, “you don’t have the affordance to do unnatural things.” That is, the semantics of the type theory don’t prevent there from being a primitive that is typed asforall a. f a -> g a
but is in fact not natural. So “theorems for free” depends crucially on no such primitive being in scope. -
The proof that the DOT calculus (the core type theory of Scala 3) is sound rests on the following:
The main contribution of this paper is to demonstrate how, perhaps surprisingly, even though these properties are lost in their full generality, a rich DOT calculus that includes recursive type refinement and a subtyping lattice with intersection types can still be proved sound. The key insight is that subtyping transitivity only needs to be invertible in code paths executed at runtime, with contexts consisting entirely of valid runtime objects, whereas inconsistent subtyping contexts can be permitted for code that is never executed.
I can’t tie this directly into scoping, but it sure sounds very similar to what I was talking about with encapsulation in OCaml and Java; in theory the type system permits bad values, but you will never reach bad values because the way in which you interact with them is constrained.
-
Capabilities based security uses scoping to restrict access to system resources. Instead of having a function
readDir : String -> [String]
, we have a functionreadDir : FileSystem -> String -> [String]
. You can only read what’s in a directory if you have access to something of typeFileSystem
, and we can simply prevent there from being a global value of typeFileSystem
by instead having the runtime pass a value of typeFileSystem
into your main function. Then functions only get access to the file system when that object is passed down to them frommain
. -
Scala’s new effect system proposal is based on similar principles to capabilities based security: Effects as Capabilities: Effect Handlers and Lightweight Effect Polymorphism.
-
The concurrency library ox achieves “structured concurrency” via scoping threads; all threads started within a
scoped {}
block must exit before the scoped block finishes. Thus one cannot write a function which starts a thread and then returns before the thread finishes without having that function take in an extra argument which provides a handle to some threadscope, in which case cleanup responsibility is delegated to that threadscope. Ox is based on loom, which is the same idea, but in Java so it’s more annoying.
My point is: we have lots of tools within math/category theory for thinking about type theory. And so when we want to think about safety, we often gravitate towards the type system to structure our computation. But where is the math/category theory for thinking about lexical constraints that can achieve safety goals? Categorical semantics tends to be very “globally scoped”; maybe we need to work fibered/indexed over lexical scopes? I’m sure people have thought about this before, but I am not familiar with it.
Matteo Capucci
Fri May 10 2024
(Warning: I might have misunderstood the meaning of ‘lexical constraint’)
All these examples remind me of what modalities in type theory usually do: they turn a constraint into a lexical constraint For instance by only defining weakening for assumptions guarded by ! one makes linearity lexically guarded. More generally, type theory seems to be all about encoding semantical constraints (and affordances too) in syntax, and then category theory turns these things in universal properties.
Owen Lynch
Fri May 10 2024
OK, this is a different understanding of modality than I previously had; this is giving me motivation to try and take a look at modalities again and see if I can understand them from this POV.
Sam Staton
Sat May 11 2024
Nice discussion. Your slogan reminds me of “syntactic control of interference” for which this paper has a proposal and a categorical analysis. At the time this was done in the context of categorical work on local state (which is 1, 5?), parametric polymorphism (=6?), etc., so I think was all viewed as connected at the time.
For me, I have been looking at “parameterized algebraic theories” for algebraic effects that include private/local identifiers (e.g.). These are presentations of presheaf-enriched Lawvere theories. I looked at these to try to drill down into the uses of abstraction over identifiers.
The main framework for parameterized algebraic theories is in the spirit of garbage collected identifiers. As Pedro mentioned, avoiding garbage collection (=4?) might lead to substructural logic, which is also in the SCI paper above. We have looked at substructural versions of parameterized algebraic theories. For quantum programming there’s one here: even though one can “discard” qubits I think it should be done explicitly (like 4) because whether they are still lurking around changes things dramatically. Recently we saw that a version without even exchange captures “scoped effects” here (which might also be related to 10).
That’s me so far, there may be fruitful new perspectives, too!