April 09
On Tuesday, 9 April 2024 at 17:19:41 UTC, Walter Bright wrote:
> On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
>> My 2 cents: there will be NO advantages in compilation time.
>
> Unfortunately, some things cannot be benchmarked until they are built.

Exactly, it's a bet on my part ... but hey, I'll be happy to lose it, of course!
April 09
On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
> Address randomization, Windows remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.

Code generation has changed to be PIC (Position Independent Code) so this is workable.
April 10
On 10/04/2024 7:04 AM, Walter Bright wrote:
> On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
>> Address randomization, Windows remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.
> 
> Code generation has changed to be PIC (Position Independent Code) so this is workable.

From Windows Internals 5 and the WinAPI docs, it seems as though Windows does memory-map the image initially. But patching activates CoW, so in effect the mapping isn't shared if you need to patch.

Which is quite useful information for me while working with Unicode tables.

If you need to patch? That won't be shared.

If you don't need to patch? Who cares how much ROM is used! Don't be afraid to use 256 KB in a single table. Just don't use pointers, no matter what, and it'll be shared.
April 09
On 4/9/2024 10:58 AM, ryuukk_ wrote:
> Who managed to convince you to spend time working on this?

Nobody. I've wanted to do it for decades, just never got around to it. What triggered it was my proposal to Adam to split Phobos modules into a much more granular structure, which would increase the number of files in it by a factor of 5 or more. (The current structure has each module being a grab bag of marginally related functions.) A more granular structure would hopefully reduce the "every module imports every other module" problem Phobos has.

But having lots more modules increases aggregate file lookup times.

One cannot really tell how well it works without trying it.

April 09
On Tuesday, 9 April 2024 at 18:49:21 UTC, Paolo Invernizzi wrote:
> On Tuesday, 9 April 2024 at 17:19:41 UTC, Walter Bright wrote:
>> On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
>>> My 2 cents: there will be NO advantages in compilation time.
>>
>> Unfortunately, some things cannot be benchmarked until they are built.
>
> Exactly, it's a bet on my part ... but hey, I'll be happy to lose it, of course!

I will also bet that any difference in compile time will be extremely insignificant. I don't bet against decades of filesystem read optimizations. Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.

I did reduce stats semi-recently for DMD and eliminated a significant percentage of them, but I don't really think it saved insane amounts of time. It was more of a "oh, I thought of a better way to do this". I think at the time, there was some resistance to adding more stats to the compiler due to the same misguided optimization beliefs, and so I started looking at it. If reducing stats by 90% wasn't significant, reducing them again likely isn't going to be noticed.

See https://github.com/dlang/dmd/pull/14582

The only benefit I might see in this is to *manage* the source as one item. But I don't really know that we need a new custom format. `tar` is pretty simple. ARSD has a tar implementation that I lifted for my raylib-d installer which allows reading tar files with about [100 lines of code](https://github.com/schveiguy/raylib-d/blob/9906279494f1f83b2c4c9550779d46962af7c342/install/source/app.d#L22-L132).

-Steve

April 09
On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
> I will also bet that any difference in compile time will be extremely insignificant. I don't bet against decades of filesystem read optimizations. Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.

On my timing of compiling hello world, a 1.412s build becomes 1.375s, 37 milliseconds faster. Most of the savings appear to come from the fact that when the archive is first accessed, its table of contents is loaded into the path cache and file cache that you developed. Then, no stats are done on the filesystem.


> I did reduce stats semi-recently for DMD and eliminated a significant percentage of them, but I don't really think it saved insane amounts of time. It was more of a "oh, I thought of a better way to do this". I think at the time, there was some resistance to adding more stats to the compiler due to the same misguided optimization beliefs, and so I started looking at it. If reducing stats by 90% wasn't significant, reducing them again likely isn't going to be noticed.
> 
> See https://github.com/dlang/dmd/pull/14582

Nice. I extended it so files in an archive are tracked.


> The only benefit I might see in this is to *manage* the source as one item.

The convenience of being able to distribute a "header only" library as one file may be significant. I've always liked things that didn't need an installation program. An install should be "copy the file onto your system" and uninstall should be "delete the file"!

Back in the days of CD software, my compiler was set up so no install was necessary, just put the CD in the drive and run it. You didn't even have to set the environment variables, as the compiler would look for its files relative to where the executable file was (argv[0]). You can see vestiges of that still in today's dmd.

Of course, to get it to run faster you'd XCOPY it onto the hard drive. Though some users were flummoxed by the absence of INSTALL.EXE and I'd have to explain how to use XCOPY.


> But I don't really know that we need a new custom format. `tar` is pretty simple. ARSD has a tar implementation that I lifted for my raylib-d installer which allows reading tar files with about [100 lines of code](https://github.com/schveiguy/raylib-d/blob/9906279494f1f83b2c4c9550779d46962af7c342/install/source/app.d#L22-L132).

Thanks for the code.

A tar file is serial, meaning one has to read the entire file to see what is in it (because it was designed for tape systems where data is simply appended).

The tar file doesn't have a table of contents, the filename is limited to 100 characters, and the path is limited to 155 characters.

Sar files have a table of contents at the beginning, and unlimited filespec sizes.

P.S. the code that actually reads the .sar file is about 20 lines! (Excluding checking for corrupt files, and the header structure definition.) The archive reader and writer can be encapsulated in a separate module, so anyone can replace it with a different format.
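
For illustration only, since the actual .sar header layout isn't reproduced in this post, here is a sketch of the general shape of reading a table-of-contents-at-the-front archive. The field names, widths and encoding below are placeholders, not the real .sar format:

```d
// Placeholder sketch of a TOC-at-the-front archive reader. The real .sar
// layout is not shown here; field names, widths and encoding are invented.
import std.bitmanip : littleEndianToNative;

struct Entry
{
    string name;    // full module path, no length limit
    ulong offset;   // where the file's bytes start in the archive
    ulong length;   // size of the file in bytes
}

Entry[] readToc(const(ubyte)[] data)
{
    size_t pos = 0;

    ulong take64()
    {
        ubyte[8] raw = data[pos .. pos + 8];
        pos += 8;
        return littleEndianToNative!ulong(raw);
    }

    immutable count = take64();                 // number of entries
    auto toc = new Entry[cast(size_t) count];
    foreach (ref e; toc)
    {
        immutable nameLen = cast(size_t) take64();
        e.name = cast(string) data[pos .. pos + nameLen].idup;
        pos += nameLen;
        e.offset = take64();
        e.length = take64();
    }
    return toc;   // each file's bytes are data[e.offset .. e.offset + e.length]
}
```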
April 09
On 4/9/2024 12:11 PM, Richard (Rikki) Andrew Cattermole wrote:
> On 10/04/2024 7:04 AM, Walter Bright wrote:
>> On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
>>> Address randomization, Windows remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.
>>
>> Code generation has changed to be PIC (Position Independent Code) so this is workable.
> 
> From Windows Internals 5 and the WinAPI docs, it seems as though Windows does memory-map the image initially. But patching activates CoW, so in effect the mapping isn't shared if you need to patch.

Right, so the executable is designed to not need patching.

> If you don't need to patch? Who cares how much ROM is used! Don't be afraid to use 256 KB in a single table. Just don't use pointers, no matter what, and it'll be shared.

Instead of using pointers, use offsets from the beginning of the file.
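
For example (a hypothetical sketch, not the actual Unicode-table code), a trie packed into one read-only blob whose nodes refer to each other by byte offsets from the start of the blob; with no pointers there is nothing to relocate, so the pages stay shared:

```d
// Hypothetical sketch: a read-only trie blob that uses byte offsets from the
// start of the blob instead of pointers. No pointers means no relocations,
// so the memory-mapped pages stay shared between processes.
struct Node
{
    dchar ch;           // code point stored at this node
    uint firstChild;    // byte offset of the first child, 0 if none
    uint nextSibling;   // byte offset of the next sibling, 0 if none
}
// (0 works as "none" because the root sits at offset 0 and is never a child.)

const(Node)* nodeAt(const(ubyte)[] blob, uint offset)
{
    assert(offset + Node.sizeof <= blob.length);
    return cast(const(Node)*) (blob.ptr + offset);  // offsets are kept Node-aligned
}

bool contains(const(ubyte)[] blob, dstring word)
{
    uint off = 0;                                   // start at the root
    foreach (dchar c; word)
    {
        uint child = nodeAt(blob, off).firstChild;  // walk the child list for `c`
        while (child != 0 && nodeAt(blob, child).ch != c)
            child = nodeAt(blob, child).nextSibling;
        if (child == 0)
            return false;
        off = child;
    }
    return true;
}
```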
April 10
On Tuesday, 9 April 2024 at 19:11:28 UTC, Walter Bright wrote:
> Nobody. I've wanted to do it for decades, just never got around to it. What triggered it was my proposal to Adam to split Phobos modules into a much more granular structure, which would increase the number of files in it by a factor of 5 or more. (The current structure has each module being a grab bag of marginally related functions.) A more granular structure would hopefully reduce the "every module imports every other module" problem Phobos has.

Great.

Now everybody is going to think that I started this.

For the record I did **not** start this.

Walter sent me this idea out of the blue after I pointed out that working with hundreds (or thousands) of files in Phobos was going to be just as messy as it is with Java or C#.

This wasn't the problem I was thinking of because frankly, nobody cares about file access times in C#/Java, but this does have certain advantages from a distribution standpoint. Although honestly, we're going to end up unpacking the files for other tools to use anyways.
April 10
On Wednesday, 10 April 2024 at 10:17:53 UTC, Adam Wilson wrote:
> On Tuesday, 9 April 2024 at 19:11:28 UTC, Walter Bright wrote:
>> [...]
>
> Great.
>
> Now everybody is going to think that I started this.
>
> For the record I did **not** start this.
>
> Walter sent me this idea out of the blue after I pointed out that working with hundreds (or thousands) of files in Phobos was going to be just as messy as it is with Java or C#.
>
> This wasn't the problem I was thinking of because frankly, nobody cares about file access times in C#/Java, but this does have certain advantages from a distribution standpoint. Although honestly, we're going to end up unpacking the files for other tools to use anyways.

Not only do we not care about file access times for JARs/WARs/EARs and DLLs, we happily ship binary libraries instead of parsing source code all the time.

This looks to me like yet another distraction.
April 10
On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright wrote:
> On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
>> I will also bet that any difference in compile time will be extremely insignificant. I don't bet against decades of filesystem read optimizations. Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.
>
> On my timing of compiling hello world, a 1.412s build becomes 1.375s, 37 milliseconds faster. Most of the savings appear to come from the fact that when the archive is first accessed, its table of contents is loaded into the path cache and file cache that you developed. Then, no stats are done on the filesystem.

Yes, the nice thing is knowing you will not have to ask the filesystem for something you know doesn't exist. Pre-loading the directory structure could do the same thing, but I think that's definitely not as efficient.
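
A minimal sketch of the pre-loading idea (the names are invented, and this is not what dmd actually does): slurp the import directories once, and a miss becomes a hash lookup instead of a stat call.

```d
// Hypothetical sketch: pre-load the import directories once so that
// "does this module file exist?" is answered from memory, not via stat().
import std.file : dirEntries, SpanMode;
import std.path : buildPath;

bool[string] preload(string[] importDirs)
{
    bool[string] known;
    foreach (dir; importDirs)
        foreach (entry; dirEntries(dir, SpanMode.depth))
            known[entry.name] = true;   // entry.name is the full path
    return known;
}

bool moduleExists(const bool[string] known, string importDir, string relPath)
{
    // a miss is now an in-memory lookup, not a filesystem round trip
    return (buildPath(importDir, relPath) in known) !is null;
}
```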

>> The only benefit I might see in this is to *manage* the source as one item.
>
> The convenience of being able to distribute a "header only" library as one file may be significant. I've always liked things that didn't need an installation program. An install should be "copy the file onto your system" and uninstall should be "delete the file"!
>
> Back in the days of CD software, my compiler was set up so no install was necessary, just put the CD in the drive and run it. You didn't even have to set the environment variables, as the compiler would look for its files relative to where the executable file was (argv[0]). You can see vestiges of that still in today's dmd.
>
> Of course, to get it to run faster you'd XCOPY it onto the hard drive. Though some users were flummoxed by the absence of INSTALL.EXE and I'd have to explain how to use XCOPY.

Consider that Java archives (.jar files) are distributed as a package instead of as individual .class files.

And Microsoft (and other C compilers) can produce "pre-compiled headers" that take away some of the initial steps of compilation.

I think there would be enthusiastic support for D archive files that reduce some of the compilation steps, or provide extra features (e.g. predetermined inference or matching compile-time switches). Especially since you aren't going to directly edit these archive files, and will be mechanically generating them anyway, why not do more inside there?

> A tar file is serial, meaning one has to read the entire file to see what is in it (because it was designed for tape systems where data is simply appended).

You can index a tar file easily. Each file is preceded by a header with information about the file (including its size), so you can determine the catalog by seeking from header to header.

Note also that we can work with tar files to add indexes that are backwards compatible with existing tools. Remember, we are generating this from a tool that we control. Prepending an index "file" is trivial.
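
For example, a rough sketch that assumes the plain ustar layout (file name in the first 100 bytes of each 512-byte header, octal size at byte 124) and skips long-name extensions and checksum verification:

```d
// Sketch only: build a name -> (offset, size) index of a plain ustar archive
// by hopping from header to header. Real code would also handle the GNU/pax
// long-name extensions and verify header checksums.
import std.conv : to;
import std.string : strip;

struct IndexEntry
{
    string name;
    size_t offset;   // where the file's data starts inside the archive
    size_t size;
}

IndexEntry[] indexTar(const(ubyte)[] data)
{
    // read a NUL-padded fixed-width text field
    static string field(const(ubyte)[] bytes)
    {
        size_t end = 0;
        while (end < bytes.length && bytes[end] != 0)
            ++end;
        return (cast(const(char)[]) bytes[0 .. end]).idup;
    }

    IndexEntry[] index;
    size_t pos = 0;
    while (pos + 512 <= data.length)
    {
        auto header = data[pos .. pos + 512];
        if (header[0] == 0)                 // all-zero block marks the end
            break;

        auto name = field(header[0 .. 100]);                       // 100-byte name field
        auto size = field(header[124 .. 136]).strip.to!size_t(8);  // 12-byte octal size

        index ~= IndexEntry(name, pos + 512, size);

        // skip the header block plus the data rounded up to a 512-byte boundary
        pos += 512 + ((size + 511) / 512) * 512;
    }
    return index;
}
```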

> The tar file doesn't have a table of contents, the filename is limited to 100 characters, and the path is limited to 155 characters.

I'm not too worried about such things. I've never run into filename length problems with tar. But also, most modern tar formats do not have these limitations:

https://www.gnu.org/software/tar/manual/html_section/Formats.html

> Sar files have a table of contents at the beginning, and unlimited filespec sizes.
>
> P.S. the code that actually reads the .sar file is about 20 lines! (Excluding checking for corrupt files, and the header structure definition.) The archive reader and writer can be encapsulated in a separate module, so anyone can replace it with a different format.

I would suggest we replace it with a modern tar format for maximum compatibility with existing tools. We already have seen the drawbacks of using the abandoned sdl format for dub packages. We should not repeat that mistake.

-Steve