Jump to page: 1 2
Thread overview
March 17

This proposal introduces a new semantic pass, that uses an IR to analyse memory transfers and the different states it can be in.

It is the bleading edge on program security, that would enable other memory analysis solutions such as isolated (immutable references ala Midori).

It enables:

  • To prevent reads to uninitialized variables
  • Prevent reads, writes, method calls to null objects
  • Logic errors such as trying to read from an unopened file
  • Is a framework to analyse unreachable vs reachable variables

It does not affect the type system.
Additional information must be provided by the user if they want to go beyond the defaults.
The default type state of each variable is meant to be as close to provable by the compiler, with user assistance like null checks.

A key premise is that newer edition @safe function will not be able to call the older @safe function.
This drastically simplifies decisions.

The end goal of this is to enable D to become temporally safe, which is critical to D's future (see the recent US government report on memory safety with specific mention towards temporal safety).
Without this, the framework required to do temporal analysis does not exist, and what would otherwise be very clear decisions become arguable at best, at worst complete unknowns.

For each variable declaration I am proposing a default type state, along with manual specification of the input and output states it may be in by using the UAX31 Medial character identifier'input'output.

This proposal was born out of necessaity after failure to solve issues surrounding isolated, and it too required a similar IR to be built.

I recognize that this could be a bit costly, but due to the potential performance wins as well as memory and logic safety, this is an absolute must if we want to be competitive.

Latest: https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40
Current: https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40/a8fffb5725904c6f5d74052d9c974a8f5d453fb0

March 17

On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew Cattermole wrote:

>

This proposal introduces a new semantic pass, that uses an IR to analyse memory transfers and the different states it can be in.

Any examples of languages that use this and what it provides?

March 18
On 18/03/2024 12:25 AM, claptrap wrote:
> On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> This proposal introduces a new semantic pass, that uses an IR to analyse memory transfers and the different states it can be in.
> 
> Any examples of languages that use this and what it provides?

From Wikipedia:

> Typestate is an experimental concept that has not yet crossed over into mainstream programming languages. However, many academic projects actively investigate how to make it more useful as an everyday programming technique. Two examples are the Plaid and Obsidian languages, which are being developed by Jonathan Aldrich's group at Carnegie Mellon University.[16][17] Other examples include the Clara[18] language research framework, earlier versions of the Rust language, and the >> keyword in ATS.[19]

This is the bleeding edge of research.

https://en.wikipedia.org/wiki/Typestate_analysis

If we get this, we'll be writing the literature.
March 17

On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew Cattermole wrote:

>

Latest: https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40
Current: https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40/a8fffb5725904c6f5d74052d9c974a8f5d453fb0

Yet another proposal for a humongous language feature to solve a problem that can already be solved without it.

It is possible to achieve temporal safety in D already with scope (DIP 1000) and @system variables (DIP 1035). The ergonomics are not great, but they can be improved (e.g., with built-in move-on-last-use).

This proposal specifically has the same problem as @live: it splits @safe code into separate "new @safe" and "old @safe" dialects, which are mutually incompatible.

The ideas themselves are not terrible. I would be interested to see what this looks like implemented in a research language. But I do not think it has any place in D.

March 29
The @live functions do type state analysis - data flow analysis is used to determine if a variable is 'live' or not.

It is indeed costly to do dfa in the front end, that's one reason why it's restricted to @live functions.

The dfa could be extended for null checking, but in practice, null checking is not that effective:

```
class C { void xx(); }

struct S { C c; }

C mars(C c) { return null; }

void phobos(ref S s)
{
    C c;
    c.xx();       // detected
    mars(c).xx(); // needs whole program DFA to detect
    s.c.xx();     // cannot be detected
}
```

This is why @live functions won't work without scope, ref, and return annotations on the functions it interfaces with. @live functions, like Rust, also severely limit the kinds of data structures possible.

Other type state analysis currently done in D is:

1. Value Range Propagation

2. whether a field is initialized or not in a constructor is tracked
March 30
On 30/03/2024 6:10 AM, Walter Bright wrote:
> The @live functions do type state analysis - data flow analysis is used to determine if a variable is 'live' or not.
> 
> It is indeed costly to do dfa in the front end, that's one reason why it's restricted to @live functions.
> 
> The dfa could be extended for null checking, but in practice, null checking is not that effective:
> 
> ```
> class C { void xx(); }
> 
> struct S { C c; }
> 
> C mars(C c) { return null; }
> 
> void phobos(ref S s)
> {
>      C c;
>      c.xx();       // detected
>      mars(c).xx(); // needs whole program DFA to detect
>      s.c.xx();     // cannot be detected
> }
> ```
> 
> This is why @live functions won't work without scope, ref, and return annotations on the functions it interfaces with. @live functions, like Rust, also severely limit the kinds of data structures possible.
> 
> Other type state analysis currently done in D is:
> 
> 1. Value Range Propagation
> 
> 2. whether a field is initialized or not in a constructor is tracked

Yes, typical applications of DFA need to be run on the whole program to work as far as I'm aware. ML family compilers do this almost exclusively, they cannot do multi-step builds. D however is the opposite.

Given that what I'm suggesting is not the norminal use case for DFA (quite the opposite), and we already have an approach thanks to DIP1000 I am proposing to go function by function.

You must annotate the type states input and output for functions parameters and return value if the defaults are less than what you are wanting guaranteed.

It acts purely in the form of verification against this.
Inferring only happens within the body of a function, it does not affect the signature including for templates (not part of type system).

Because of these decisions it can be parallelized without concern (unless you need to run semantic on newly inserted statements for cleanup).

See my recently added example that Rust cannot check since it hasn't got type state analysis in production.

https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40

```d
T* makeNull(T)() @safe {
    return null;
}

void useNull() @safe {
    int* var = makeNull!int();
    // var is in type state initialized as per makeNull return state

    *var = 42;
    // segfault due to var being null
}
```

What we want to happen instead:

```d
T* makeNull(T)(/* return'initialized */) @safe {
    return null;
    // type state default is more than the type state initialized
    // so it is accepted
}

void useNull() @safe {
    int* var = makeNull!int();
    // var is in type state initialized as per MakeNull return state

    // perform load via var variable
    // this will error due to initialized is less than the nonnull type state
    // Error: Variable var is in type state initialized which could be null, cannot write to it
    *var = 42;
}
```

To fix, simply check for null!

```d
void useNull() @safe {
    int* var = makeNull!int();
    // var is in type state initialized as per MakeNull return state

    if (var !is null) {
        // in scope, assume var is in type state nonnull
        *var = 42;
    }
}
```
March 29
In order to make this work, it will need typestate annotations on all:

1. function returns
2. aggregate fields
3. array element types
4. pointed to types
5. function parameters

I suspect that would make it bulletproof. Isn't it a big ask to get people to add all those annotations?
March 30
On 30/03/2024 3:30 PM, Walter Bright wrote:
> Isn't it a big ask to get people to add all those annotations?

If people had to write annotations yes.

The key will be making sure that they don't have to, the defaults should be good enough, or at bare minimum match what we assume today.

What I am unsure about currently is related to pointers (that are not the this pointer).

Should we require them to be nonnull by default and allow people to opt out like in other languages, or do we go in the inverse and go with what we have now and stick to initialized and must be opt-in to the stronger type state of nonnull.

Slices are quite often null on purpose, what about them?

What about consistency between @safe and @system?

There are some unanswered questions that may be better answered after experience with it. Although I'm in favor of doing things differently than other languages and make people think about their assumptions for their code, rather than assume stronger guarantees that may not have been considered.
April 08

On Sunday, 17 March 2024 at 15:35:40 UTC, Paul Backus wrote:

>

On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew Cattermole wrote:

>

[...]

Yet another proposal for a humongous language feature to solve a problem that can already be solved without it.

It is possible to achieve temporal safety in D already with scope (DIP 1000) and @system variables (DIP 1035). The ergonomics are not great, but they can be improved (e.g., with built-in move-on-last-use).

This proposal specifically has the same problem as @live: it splits @safe code into separate "new @safe" and "old @safe" dialects, which are mutually incompatible.

The ideas themselves are not terrible. I would be interested to see what this looks like implemented in a research language. But I do not think it has any place in D.

I second all of the above.

April 09
On 09/04/2024 6:18 AM, Atila Neves wrote:
> On Sunday, 17 March 2024 at 15:35:40 UTC, Paul Backus wrote:
>> On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> [...]
>>
>> Yet another proposal for a humongous language feature to solve a problem that can already be solved without it.
>>
>> It is possible to achieve temporal safety in D already with `scope` (DIP 1000) and `@system` variables (DIP 1035). The ergonomics are not great, but they can be improved (e.g., with built-in move-on-last-use).
>>
>> This proposal specifically has the same problem as `@live`: it splits `@safe` code into separate "new `@safe`" and "old `@safe`" dialects, which are mutually incompatible.
>>
>> The ideas themselves are not terrible. I would be interested to see what this looks like implemented in a research language. But I do not think it has any place in D.
> 
> I second all of the above.

Just so we are clear, DIP1000 is not able to provide memory safety on a single thread, let alone multiple.

It is missing the child/parent relationship to prevent:

```d
Parent* parent;
Field* child = &parent.field;
parent.destroy;
Field value = *child;
```

This happens to be one of the things that the DFA is capable of tracking, that is not currently possible to implement (Dennis has tried).

It also does not  provide any way of tracking memory from creation to ensure it can be transferred safely to another thread in its entirety to prevent needing things like locks.
« First   ‹ Prev
1 2