std.string.assumeUTF() silently casting mutable to immutable?
February 13

I may have found a bug in assumeUTF(), but being new to D, I'm not sure.

The description:

> Assume the given array of integers arr is a well-formed UTF string and return it typed as a UTF string.
> ubyte becomes char, ushort becomes wchar and uint becomes dchar. Type qualifiers are preserved.

The declaration:

auto assumeUTF(T)(T[] arr)
if (staticIndexOf!(immutable T, immutable ubyte, immutable ushort, immutable uint) != -1)

Shouldn't that precondition's immutable T be simply T?

As it stands, I can do this with no complaints from the compiler...

string test(ubyte[] arr)
{
    import std.string;
    return arr.assumeUTF;
}

...and accidentally end up with a "string" pointing at mutable data.

Am I missing something?

February 13
On Tuesday, February 13, 2024 12:40:57 AM MST Forest via Digitalmars-d-learn wrote:
> I may have found a bug in assumeUTF(), but being new to D, I'm
> not sure.
>
> The description:
> > Assume the given array of integers arr is a well-formed UTF
> > string and return it typed as a UTF string.
> > ubyte becomes char, ushort becomes wchar and uint becomes
> > dchar. Type qualifiers are preserved.
>
> The declaration:
>
> ```d
> auto assumeUTF(T)(T[] arr)
> if (staticIndexOf!(immutable T, immutable ubyte, immutable
> ushort, immutable uint) != -1)
> ```
>
> Shouldn't that precondition's `immutable T` be simply `T`?
>
> As it stands, I can do this with no complaints from the compiler...
>
> ```d
> string test(ubyte[] arr)
> {
>      import std.string;
>      return arr.assumeUTF;
> }
>
> ```
>
> ...and accidentally end up with a "string" pointing at mutable data.
>
> Am I missing something?

It's not a bug in assumeUTF. If you change your code to

string test(ubyte[] arr)
{
     import std.string;
     pragma(msg, typeof(arr.assumeUTF));
     return arr.assumeUTF;
}

then the compiler will output

char[]

because assumeUTF retains the type qualifier of the original type (as its documentation explains). Rather, it looks like the problem here is that dmd will implicitly change the constness of a return value when it thinks that it can do so to make the code work. Essentially, that means that the function has to be pure and that the return value can't have come from any of the function's arguments. At a glance, that would be true here, because no char[] was passed into assumeUTF. However, casting from ubyte[] to char[] is @safe, so dmd should be taking that possibility into account, and it's apparently not.

So, there's definitely a bug here, but it's a dmd bug. Its checks for whether it can safely change the constness of the return type apparently aren't sophisticated enough to catch this case.

- Jonathan M Davis
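
For readers new to this rule, here is a minimal sketch of the conversion described above. The helper names `makeSquares` and `identity` are hypothetical and exist only for illustration. A pure function whose result cannot alias any of its arguments may have that result implicitly converted to immutable, while a pure function that could simply hand back its argument may not.

```d
// Hypothetical helper: builds a fresh array, so the result cannot alias caller data.
int[] makeSquares(size_t n) pure @safe
{
    auto result = new int[](n);
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

// Hypothetical helper: returns its argument, so the result aliases caller data.
int[] identity(int[] a) pure @safe
{
    return a;
}

void main() @safe
{
    // Accepted: the only parameter is a size_t, which cannot alias the returned array.
    immutable(int)[] squares = makeSquares(4);
    assert(squares == [0, 1, 4, 9]);

    int[] arr = [1, 2, 3];
    // Rejected by the compiler if uncommented: the int[] parameter may alias the result.
    // immutable(int)[] bad = identity(arr);
    assert(identity(arr) is arr);
}
```

The case reported in this thread differs from `identity` only in that the parameter and return element types differ (ubyte versus char), which is what the compiler's check fails to account for.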



February 13

On Tuesday, 13 February 2024 at 08:10:20 UTC, Jonathan M Davis wrote:
> So, there's definitely a bug here, but it's a dmd bug. Its checks for whether it can safely change the constness of the return type apparently aren't sophisticated enough to catch this case.

This is a pretty severe bug.
Some test cases: https://d.godbolt.org/z/K1fjdj76M

ubyte[] pure_ubyte(ubyte[] arr) pure @safe;
ubyte[] pure_void(void[] arr) pure @safe;
ubyte[] pure_int(int[] arr) pure @safe;
int[] pure_ubyte_to_int(ubyte[] arr) pure @safe;

// All cases below should not compile, yet some do.

immutable(ubyte)[] test(ubyte[] arr) @safe
{
    // return pure_ubyte(arr); // ERROR: OK
    return pure_void(arr); // No error: NOK!
}

immutable(ubyte)[] test(int[] arr) @safe
{
    return pure_int(arr); // No error: NOK!
}

immutable(int)[] test2(ubyte[] arr) @safe
{
    return pure_ubyte_to_int(arr); // No error: NOK!
}

-Johan

February 14

On Tuesday, 13 February 2024 at 14:05:03 UTC, Johan wrote:
> On Tuesday, 13 February 2024 at 08:10:20 UTC, Jonathan M Davis wrote:
> > So, there's definitely a bug here, but it's a dmd bug. Its checks for whether it can safely change the constness of the return type apparently aren't sophisticated enough to catch this case.
>
> This is a pretty severe bug.

Thanks, gents.

Reported on the tracker:

https://issues.dlang.org/show_bug.cgi?id=24394

February 14

On Wednesday, 14 February 2024 at 02:13:08 UTC, Forest wrote:
> On Tuesday, 13 February 2024 at 14:05:03 UTC, Johan wrote:
> > On Tuesday, 13 February 2024 at 08:10:20 UTC, Jonathan M Davis wrote:
> > > So, there's definitely a bug here, but it's a dmd bug. Its checks for whether it can safely change the constness of the return type apparently aren't sophisticated enough to catch this case.
> >
> > This is a pretty severe bug.
>
> Thanks, gents.
>
> Reported on the tracker:
>
> https://issues.dlang.org/show_bug.cgi?id=24394

This has already been fixed, you just need to use -preview=fixImmutableConv. This was put behind a preview flag as it introduces a breaking change.

February 14

On Wednesday, 14 February 2024 at 10:57:42 UTC, RazvanN wrote:
> This has already been fixed, you just need to use -preview=fixImmutableConv. This was put behind a preview flag as it introduces a breaking change.

I just tried that flag on run.dlang.org, and although it fixes the case I posted earlier, it doesn't fix this one:

string test(const(ubyte)[] arr)
{
    import std.string;
    return arr.assumeUTF;
}

Shouldn't this be rejected as well?

February 14

On Wednesday, 14 February 2024 at 11:56:29 UTC, Forest wrote:
> On Wednesday, 14 February 2024 at 10:57:42 UTC, RazvanN wrote:
> > This has already been fixed, you just need to use -preview=fixImmutableConv. This was put behind a preview flag as it introduces a breaking change.
>
> I just tried that flag on run.dlang.org, and although it fixes the case I posted earlier, it doesn't fix this one:
>
> string test(const(ubyte)[] arr)
> {
>     import std.string;
>     return arr.assumeUTF;
> }
>
> Shouldn't this be rejected as well?

Indeed, that should be rejected as well; otherwise you can modify immutable data. This code currently happily compiles:

string test(const(ubyte)[] arr)
{
    import std.string;
    return arr.assumeUTF;
}

void main()
{
    import std.stdio;
    ubyte[] arr = ['a', 'b', 'c'];
    auto t = test(arr);
    writeln(t);
    arr[0] = 'x';
    writeln(t);
}

And prints:

abc
xbc

However, this seems to be a different issue.
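
Until the compiler rejects such code, one way to sidestep the aliasing shown above is to copy the data before treating it as a string. The following is a minimal sketch, assuming the extra allocation is acceptable; the function name `toOwnedString` is hypothetical. It uses `.idup` to create an immutable copy, so later writes to the source array cannot show through.

```d
// Hypothetical helper: returns an immutable copy instead of a view of the caller's array.
string toOwnedString(const(ubyte)[] arr)
{
    import std.string : assumeUTF;
    // assumeUTF yields const(char)[] here; idup allocates a fresh immutable copy.
    return arr.assumeUTF.idup;
}

void main()
{
    ubyte[] arr = ['a', 'b', 'c'];
    string t = toOwnedString(arr);
    arr[0] = 'x';
    // The copy is unaffected by the later write to arr.
    assert(t == "abc");
}
```

Note that assumeUTF still only assumes the bytes are valid UTF, so input from an untrusted source would need separate validation.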