April 10

On Wednesday, 10 April 2024 at 16:42:53 UTC, Steven Schveighoffer wrote:

> On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright wrote:
>
> > [...]
>
> Yes, the nice thing is knowing you will not have to ask the filesystem for something you know doesn't exist. Pre-loading the directory structure could do the same thing, but I think that's definitely not as efficient.
>
> [...]

C++ compilers are already on the next level, past PCH, with C++ modules.

VC++ uses a database format for the BMI (Binary Module Interface), has open-sourced it, and some people are trying to champion it as a means to give C++ tooling similar to what Java and .NET IDEs can do with JVM/CLR metadata.

https://devblogs.microsoft.com/cppblog/open-sourcing-ifc-sdk-for-cpp-modules/

April 10
We certainly could do more with .sar files, we just have to start somewhere.

If we're going to add features to a .tar file, like an index, aren't we then creating our own format and won't be able to use existing .tar programs?

Yes, one can skip through a .tar archive indexing as one goes. The problem is one winds up reading the .tar archive. With the .sar format, the index is at the beginning and none of the rest of the file is read in, unless actually needed. .tar is the only archive format I'm aware of that does not have an index section, and that's because it's designed for append-only magtapes. (Talk about ancient obsolete technology!)
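For illustration, here's what that skip-as-you-go indexing looks like (a sketch in Python rather than D, for brevity; it reads only each 512-byte header and seeks past the member data):

```python
import io
import tarfile

# Build a small in-memory tar with two "source files" to index.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name, text in [("a.d", b"module a;"), ("b.d", b"module b;")]:
        info = tarfile.TarInfo(name)
        info.size = len(text)
        tf.addfile(info, io.BytesIO(text))
buf.seek(0)

def index_tar(f):
    """Record (offset, size) per member by reading headers only,
    seeking past each member's data instead of reading it."""
    index = {}
    while True:
        header = f.read(512)
        if len(header) < 512 or header == b"\x00" * 512:
            break  # end-of-archive marker (zero-filled block)
        name = header[:100].rstrip(b"\x00").decode()
        size = int(header[124:136].rstrip(b"\x00 ") or b"0", 8)
        index[name] = (f.tell(), size)
        f.seek((size + 511) // 512 * 512, 1)  # data is 512-byte aligned
    return index

idx = index_tar(buf)
```

Even with seeking, this still touches one header block per member, whereas an up-front index needs only a single read.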

Many archive formats also include optional compression, and various compression methods at that. All that support would have to be added to the compiler, as otherwise I'll get the bug reports "dmd failed with my .zip file!"

Still, the concept of presenting things as a single file is completely distinct from the file format used. The archive format being pluggable is certainly an option.
April 10
On 4/10/2024 9:54 AM, Paulo Pinto wrote:
> C++ compilers are already on the next level, past PCH, with C++ modules.
> 
> VC++ uses a database format for the BMI (Binary Module Interface), has open-sourced it, and some people are trying to champion it as a means to give C++ tooling similar to what Java and .NET IDEs can do with JVM/CLR metadata.
> 
> https://devblogs.microsoft.com/cppblog/open-sourcing-ifc-sdk-for-cpp-modules/

That's more or less what my C++ compiler did back in the 1990s. The symbol table and AST were created in a memory-mapped file, which could be read back in to jump-start the next compilation.

Yes, it was faster.

But the problem C++ has is that compiling it is inherently slow due to the design of the language. My experience with that led to D being fast to compile, because I knew what to get rid of. With a language that compiles fast, it isn't worthwhile to have a binary precompiled module.
April 11

On Thursday, 11 April 2024 at 01:36:57 UTC, Walter Bright wrote:

> We certainly could do more with .sar files, we just have to start somewhere.
>
> If we're going to add features to a .tar file, like an index, aren't we then creating our own format and won't be able to use existing .tar programs?

No. tar programs would work fine with it. We could indicate they are normal files, and normal tar programs would just extract an "index" file when expanding; or we could indicate they are vendor-specific extensions, which should be ignored or processed as normal files by other tar programs. We are not the first to think of these things; it is in the spec.
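Sketching Steve's suggestion with Python's `tarfile` module (the `__index__` name and JSON payload here are invented conventions, purely for illustration — the thread doesn't specify either):

```python
import io
import json
import tarfile

sources = {"foo.d": b"module foo;", "bar.d": b"module bar;"}

# The index is just a normal file stored under a reserved name,
# written as the first member so a reader can stop after one entry.
index_bytes = json.dumps({n: len(c) for n, c in sources.items()}).encode()

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    info = tarfile.TarInfo("__index__")
    info.size = len(index_bytes)
    tf.addfile(info, io.BytesIO(index_bytes))  # index goes first
    for name, data in sources.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

# A stock tar reader sees the index as just another file to extract.
buf.seek(0)
with tarfile.open(fileobj=buf) as tf:
    names = tf.getnames()
```

Any off-the-shelf tar tool would list and extract `__index__` like a regular file, while a D-aware reader could stop after the first member.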

> Yes, one can skip through a .tar archive indexing as one goes. The problem is one winds up reading the .tar archive. With the .sar format, the index is at the beginning and none of the rest of the file is read in, unless actually needed. .tar is the only archive format I'm aware of that does not have an index section, and that's because it's designed for append-only magtapes. (Talk about ancient obsolete technology!)

This would be a fallback, when an index isn't provided as the first file. So normal tar source files could be supported.

> Many archive formats also include optional compression, and various compression methods at that. All that support would have to be added to the compiler, as otherwise I'll get the bug reports "dmd failed with my .zip file!"

The tar format itself doesn't have compression, though the tar executable supports it. I wouldn't recommend zip files as a supported archive format, and using compressed tarballs would definitely result in reading the whole file (you can't skip N bytes when you don't know the compressed size).
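The seek-vs-decompress point is easy to demonstrate (a Python sketch, not tied to any particular archive layout; `gzip.GzipFile` can only "seek" forward by inflating everything up to the target):

```python
import gzip
import io

payload = b"A" * 4096 + b"NEEDLE"

# Uncompressed: skipping a member's data is a genuine O(1) seek.
raw = io.BytesIO(payload)
raw.seek(4096)
assert raw.read(6) == b"NEEDLE"

# Gzip stream: seek() still decompresses the 4096 bytes it "skips",
# because positions in the uncompressed data aren't knowable without
# inflating the stream up to that point.
gz = gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(payload)))
gz.seek(4096)
assert gz.read(6) == b"NEEDLE"
```

Both reads succeed, but the compressed version has done all the decompression work an index was supposed to avoid.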

> Still, the concept of presenting things as a single file is completely distinct from the file format used. The archive format being pluggable is certainly an option.

I stress again, we should not introduce esoteric formats that are mostly equivalent to existing formats without a good reason. The first option should be to use existing formats, seeing if we can fit our use case into them. If that is impossible or prevents certain features, then we can consider using a new format. It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use.

Think of why we use standard object formats instead of our own format (which would allow much tighter integration with the language).

-Steve

April 11

On Thursday, 11 April 2024 at 15:28:34 UTC, Steven Schveighoffer wrote:

> On Thursday, 11 April 2024 at 01:36:57 UTC, Walter Bright wrote:
>
> > If we're going to add features to a .tar file, like an index, aren't we then creating our own format and won't be able to use existing .tar programs?
>
> No. tar programs would work fine with it. We could indicate they are normal files, and normal tar programs would just extract an "index" file when expanding, or we could indicate they are vendor-specific extensions, which should be ignored or processed as normal files by other tar programs. We are not the first ones to think of these things, it is in the spec.

Sounds like a good solution. Users would be able to use e.g. any GUI program that supports tar to extract a file from the archive. The advantage is for reading; D-specific tools should be used to write the file. If there is any concern about this, it could even use a different extension so long as the file format is standard tar; users who know this can still benefit from tar readers. There is precedent for this: .jar files are .zip files.

> > Yes, one can skip through a .tar archive indexing as one goes. The problem is one winds up reading the .tar archive. With the .sar format, the index is at the beginning and none of the rest of the file is read in, unless actually needed. .tar is the only archive format I'm aware of that does not have an index section, and that's because it's designed for append-only magtapes. (Talk about ancient obsolete technology!)
>
> This would be a fallback, when an index isn't provided as the first file. So normal tar source files could be supported.

Or just error if a tar file doesn't have the expected index file.

April 13
On 4/11/2024 8:28 AM, Steven Schveighoffer wrote:
> Think of why we use standard object formats instead of our own format (which would allow much more tight integration with the language).

We use standard object formats because we don't have a linker. I've spent a lot of time trying to understand their byzantine structure. It's not fun work.

I mentioned that the archive support can be pluggable. It's only two functions with a generic interface to them. If we aren't going to move forward with source archives, it would be a giant waste of time to learn .tar and all its variations.

I chose to invent the .sar format because it's 20 lines of code to read them, and about the same to write them. Even doing a survey of the top 10 archive formats would have taken more time than the entire PR, let alone the time spent debating them.
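As a rough idea of the scale involved, a hypothetical index-first format can indeed be written and read in about 20 lines each (a Python sketch; the "SAR1" magic and field layout are invented here, not the actual .sar format):

```python
import io
import struct

def write_archive(files):
    """Write magic, entry count, an index of (name, offset, size),
    then the concatenated file contents."""
    entries, data = [], b""
    for name, content in files.items():
        entries.append((name.encode(), len(data), len(content)))
        data += content
    out = io.BytesIO()
    out.write(b"SAR1" + struct.pack("<I", len(entries)))
    for name, off, size in entries:
        out.write(struct.pack("<H", len(name)) + name)
        out.write(struct.pack("<II", off, size))
    out.write(data)
    return out.getvalue()

def read_index(f):
    """Read only the up-front index; file contents stay untouched."""
    assert f.read(4) == b"SAR1"
    (count,) = struct.unpack("<I", f.read(4))
    index = {}
    for _ in range(count):
        (nlen,) = struct.unpack("<H", f.read(2))
        name = f.read(nlen).decode()
        index[name] = struct.unpack("<II", f.read(8))
    return index

f = io.BytesIO(write_archive({"a.d": b"module a;", "b.d": b"module b;"}))
idx = read_index(f)
base = f.tell()                # data section starts right after the index
off, size = idx["b.d"]
f.seek(base + off)
content = f.read(size)         # only this one file's bytes are read
```

Only the index and the one requested file are ever read; everything else is seeked over, which is the property the up-front index buys.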

The source archive PR is a proof of concept. The actual archive format is irrelevant.


> or we could indicate they are vendor-specific extensions

Wouldn't that defeat the purpose of being a .tar format?

> It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use.

Using a .tar format would affect all the dlang source code tools just as much as using the .sar format would.
April 14

On Sunday, 14 April 2024 at 06:04:02 UTC, Walter Bright wrote:

> On 4/11/2024 8:28 AM, Steven Schveighoffer wrote:
>
> > Think of why we use standard object formats instead of our own format (which would allow much more tight integration with the language).
>
> We use standard object formats because we don't have a linker. I've spent a lot of time trying to understand their byzantine structure. It's not fun work.

Exactly, we don't need to be responsible for everything. Using standard object formats means we don't have to write our own linker.

> I mentioned that the archive support can be pluggable. It's only two functions with a generic interface to them. If we aren't going to move forward with source archives, it would be a giant waste of time to learn .tar and all its variations.

Fair point. If this doesn't fly, then learning all the variations of tar might not be applicable (though I can say I personally "learned" tar in about 15 minutes; it's really simple).

> I chose to invent the .sar format because it's 20 lines of code to read them, and about the same to write them. Even doing a survey of the top 10 archive formats would have taken more time than the entire PR, let alone the time spent debating them.

This misses the point. It's not that it's easy to add to the compiler. Both are easy, both are straightforward; one might be easier than the other, but it's probably a wash (maybe 2 hours vs 4 hours?).

The problem is all the other tools that people might want to use. And specifically, I'm talking about IDEs. You have a 20-line solution in D; how does that help an IDE written in Java? However, Java has tar support that is tried and tested, and probably already in the IDE codebase itself.

Writing 20 lines of code isn't "mission accomplished". We now have to ask all IDE providers to support this for symbol lookup. That's what I'm talking about.

> The source archive PR is a proof of concept. The actual archive format is irrelevant.

This is good, and I understand what you are trying to say. As long as it remains a PoC, with the expectation that if it turns out to be useful we address these ecosystem issues, then I have no objections.

> > or we could indicate they are vendor-specific extensions
>
> Wouldn't that defeat the purpose of being a .tar format?

No, vendor-specific sections are in the spec. Existing tar programs would still read these just fine.

But even if we wanted to avoid that, adding an index can be done by including a specific filename that the D compiler recognizes as the index.

> > It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use.
>
> Using a .tar format would affect all the dlang source code tools just as much as using the .sar format would.

Yes, of course. It's just a question of whether a ready-made library will be available for whatever language and libraries the IDEs are built with. With .sar, the answer is no (it hasn't been invented yet). With .tar, it's likely yes.

-Steve
