On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright wrote:
> On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
>> I will also bet that any difference in compile time will be extremely insignificant. I don't bet against decades of filesystem read optimizations. Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.
> In my timing of compiling hello world, a 1.412s build becomes 1.375s, 37 milliseconds faster. Most of the savings appear to come from the archive's first access: its table of contents is loaded into the path cache and file cache that you developed, and after that no stats are done on the filesystem.
Yes, the nice thing is knowing you will never have to ask the filesystem for something you already know doesn't exist. Pre-loading the directory structure could achieve the same thing, but I suspect it would not be as efficient.
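The idea is simple enough to sketch. Here is an illustrative Python sketch (using the stdlib `tarfile` on a tar-style archive — not dmd's actual .sar reader or its cache): load the catalog once, and every subsequent lookup is an in-memory set probe rather than a filesystem stat.

```python
import io
import tarfile

def load_toc(fileobj):
    """Read the archive's table of contents once into a set.
    After this, every import lookup is an in-memory probe: the
    compiler never has to stat() a path it already knows is
    absent from the archive."""
    with tarfile.open(fileobj=fileobj) as tf:
        return set(tf.getnames())

def have_module(toc, path):
    # No filesystem call at all; just a hash lookup.
    return path in toc
```

The second function is the whole point: a negative answer costs a set probe instead of a failed stat against the filesystem.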
> > The only benefit I might see in this is to manage the source as one item.
> The convenience of being able to distribute a "header only" library as one file may be significant. I've always liked things that didn't need an installation program. An install should be "copy the file onto your system" and uninstall should be "delete the file"!
> Back in the days of CD software, my compiler was set up so no install was necessary: just put the CD in the drive and run it. You didn't even have to set the environment variables, as the compiler would look for its files relative to where the executable file was (argv[0]). You can see vestiges of that still in today's dmd.
> Of course, to get it to run faster you'd XCOPY it onto the hard drive. Though some users were flummoxed by the absence of INSTALL.EXE and I'd have to explain how to use XCOPY.
Consider that Java archives (.jar files) are distributed as a package instead of as individual .class files.
And Microsoft (along with other C compiler vendors) can produce "pre-compiled headers", which take away some of the initial steps of compilation.
I think there would be enthusiastic support for D archive files that skip some of the compilation steps, or provide extra features (e.g. predetermined inference, or matching compile-time switches). Especially since these archive files will be mechanically generated rather than edited by hand, why not do more inside them?
> A tar file is serial, meaning one has to read the entire file to see what is in it (because it was designed for tape systems, where data is simply appended).
You can index a tar file easily. Each file is preceded by a header with information about the file (including its size), so you can build the catalog by seeking from header to header.
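To make that concrete, here is a minimal Python sketch that builds a catalog of a ustar archive purely by hopping between 512-byte headers, never reading any file data. (It is deliberately simplified: it ignores the ustar name-prefix field, typeflags, and extended headers.)

```python
def index_tar(f):
    """Build a {name: (data_offset, size)} catalog by seeking from
    header to header. Each 512-byte ustar header stores the member's
    name (bytes 0-99) and octal size (bytes 124-135), so the catalog
    can be built without reading a single data block."""
    catalog = {}
    offset = 0
    while True:
        f.seek(offset)
        header = f.read(512)
        if len(header) < 512 or header == b'\0' * 512:
            break  # end-of-archive marker (zero block) or truncated file
        name = header[0:100].rstrip(b'\0').decode()
        size = int(header[124:136].rstrip(b'\0 ') or b'0', 8)
        catalog[name] = (offset + 512, size)
        # file data is padded out to the next 512-byte boundary
        offset += 512 + ((size + 511) // 512) * 512
    return catalog
```

The catalog gives you random access: seek to `data_offset`, read `size` bytes, done.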
Note also that we can extend tar files with indexes that remain backwards compatible with existing tools. Remember, we are generating this from a tool that we control; prepending an index "file" is trivial.
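One way to do that: make the index itself an ordinary first member of the archive. Existing tar tools just see one extra file, while a tool that knows the convention reads only the first member and has the whole catalog. A Python sketch (the `.index.json` member name and JSON payload are invented conventions for illustration, not any proposed D format):

```python
import io
import json
import tarfile

def write_with_index(members):
    """Write a tar whose FIRST member is a plain file '.index.json'
    mapping every other member's name to its size. The archive stays
    fully readable by ordinary tar tools."""
    index = json.dumps({name: len(data) for name, data in members}).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w', format=tarfile.USTAR_FORMAT) as tf:
        for name, data in [('.index.json', index)] + members:
            ti = tarfile.TarInfo(name)
            ti.size = len(data)
            tf.addfile(ti, io.BytesIO(data))
    buf.seek(0)
    return buf
```

A reader that understands the convention parses the first member and stops; everything else only ever reads the index.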
> The tar file doesn't have a table of contents, the filename is limited to 100 characters, and the path is limited to 155 characters.
I'm not too worried about such things. I've never run into filename length problems with tar. But also, most modern tar formats do not have these limitations:
https://www.gnu.org/software/tar/manual/html_section/Formats.html
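For instance, the POSIX pax format stores an over-long name in an extended header record, so a path far past ustar's 100-character name field round-trips without loss. A quick check with Python's stdlib `tarfile` (the path here is just an arbitrary long example):

```python
import io
import tarfile

# A path far beyond ustar's 100-character name field.
long_name = 'very/deeply/nested/' + 'x' * 180 + '.d'

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w', format=tarfile.PAX_FORMAT) as tf:
    ti = tarfile.TarInfo(long_name)
    data = b'module m;'
    ti.size = len(data)
    tf.addfile(ti, io.BytesIO(data))

# Read it back: the pax extended header is transparent to the reader.
buf.seek(0)
with tarfile.open(fileobj=buf) as tf:
    names = tf.getnames()
```

GNU tar's own `gnu` format handles long names similarly (via special `L`-type entries), so the 100/155-character limits are strictly a legacy-ustar concern.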
> Sar files have a table of contents at the beginning, and unlimited filespec sizes.
> P.S. the code that actually reads the .sar file is about 20 lines! (Excluding checking for corrupt files, and the header structure definition.) The archive reader and writer can be encapsulated in a separate module, so anyone can replace it with a different format.
I would suggest we replace it with a modern tar format, for maximum compatibility with existing tools. We have already seen the drawbacks of using the abandoned sdl format for dub packages; we should not repeat that mistake.
-Steve