December 22, 2011
On 22/12/2011 04:41, Jonathan M Davis wrote:
> 
> So, what do you think?
> 
> - Jonathan M Davis

Honestly? The simpler, the better.

I've hardly ever seen anyone complain about the primitive formatting
functions available in other languages. It's not like anyone wants
something super sophisticated when doing date formatting.
Experience tells me that for such mundane tasks, what you want is:
- something that does what you want simply
- something fast

And that's about it. In fact, I am not even sure that custom formats are useful, when standard ones are perfectly suited for the task.
December 22, 2011
On Thursday, December 22, 2011 11:51:55 Somedude wrote:
> And that's about it. In fact, I am not even sure that custom formats are useful, when standard ones are perfectly suited for the task.

The standard ones are there, and I'd definitely recommend using them in the general case, but some people do require custom formatting, so the functions are needed. But yes, if you don't need them, then don't use them. The standard formats are standard for a reason.
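
For reference, a minimal sketch of the standard string functions on SysTime, assuming a current Phobos. The helper `sampleTime` and the timestamp it returns are made up for illustration:

```d
import std.datetime;
import std.stdio : writeln;

// Hypothetical helper: a fixed UTC timestamp used only for this example.
SysTime sampleTime()
{
    return SysTime(DateTime(2011, 12, 22, 11, 51, 55), UTC());
}

void main()
{
    auto st = sampleTime();
    writeln(st.toISOString());    // 20111222T115155Z
    writeln(st.toISOExtString()); // 2011-12-22T11:51:55Z
    writeln(st.toSimpleString()); // 2011-Dec-22 11:51:55Z
}
```

The trailing Z appears because the time zone of this SysTime is UTC.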

- Jonathan M Davis
December 22, 2011
On 22/12/2011 11:40, Jacob Carlborg wrote:
> 
> Yeah, I don't get this. Most modules in Phobos are too large, in my opinion.
> 

It largely is a matter of taste, I think. There are advantages in
minimizing the size of files but there are also advantages in minimizing
the number of files.
But for datetime.d, it has largely gone beyond my own point of
acceptability (which is about 5,000 lines, if that means anything).
December 22, 2011
On Thursday, December 22, 2011 02:12:31 Walter Bright wrote:
> Timezone information is not necessary to measure elapsed time or relative time, for example.

The type requires it though. No, comparison doesn't require the time zone, but many (most?) of the other operations do. And the type can't be separated from the time zone. That's part of the whole point of how SysTime is designed. It holds its time internally in UTC and then uses the time zone to adjust the time whenever a property or other function is used which requires the time in that time zone. That way, you avoid all of the issues and bugs that result from converting the time. The cost of that is that you can't not have a time zone and use SysTime. So, if someone cares about saving that little bit of extra size in their executable by not using the time zone, they're going to have to use the C functions or design their own time code.
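
A rough sketch of what that means in practice, assuming a current Phobos (where SimpleTimeZone takes a Duration). The helper `plusOneHour` and the sample timestamp are invented for the example:

```d
import std.datetime;
import std.stdio : writeln;

// Hypothetical helper: the same instant, viewed in a fixed UTC+1 time zone.
SysTime plusOneHour(SysTime t)
{
    return t.toOtherTZ(new immutable SimpleTimeZone(dur!"hours"(1)));
}

void main()
{
    auto utcTime = SysTime(DateTime(2011, 12, 22, 2, 12, 31), UTC());
    auto shifted = plusOneHour(utcTime);

    // Comparison only looks at the internal UTC value, so these are equal.
    writeln(utcTime == shifted);                 // true
    writeln(utcTime.stdTime == shifted.stdTime); // true

    // Properties are adjusted by the time zone when they are read.
    writeln(utcTime.hour); // 2
    writeln(shifted.hour); // 3
}
```

Nothing is converted when the time zone changes; only the zone used to interpret the stored UTC value differs.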

> Can it be added with PIMPL? PIMPL is good for more than just information hiding, it can also be a fine way to avoid pulling in things unless they are actually dynamically used (as opposed to pulling them in if they are statically referenced).

I don't follow you. You mean use PIMPL for the time zone? I haven't a clue how you're going to do PIMPL without .di files, and Phobos doesn't use .di files (and arguably _can't_, because it would destroy inlining and CTFE). Not to mention, PIMPL would make SysTime less efficient, because in order to avoid needing to know what the functions are on a TimeZone, you'd have to hide the bodies of a number of SysTime's functions, which would disallow inlining and CTFE (not that CTFE is terribly likely to be used with SysTime, but inlining could be very important). You'd be losing efficiency of execution just to save a few KB in the executable.

Rearranging stuff to save some size in the executable without costing efficiency is one thing, but if it's going to cost efficiency, then I'm generally going to be against it.

> Please check and see if additional symbols are pulled in or not. I've seen a lot of template code where the author of that code never checked and was surprised at the instantiations that were happening that he overlooked.

Nothing would pull in toCustomString or fromCustomString unless the user decided to call them, because they're templated and no other functions in Phobos are going to use them at this point (if ever). What exactly will be pulled in when they _are_ called, I don't know, because they're not completed yet. It probably wouldn't be much though, since toCustomString is basically a fancy toString. But there's a good chance that it's stuff that's already being pulled in for the standard string functions (e.g. toISOExtString).

- Jonathan M Davis
December 22, 2011
On 22/12/2011 03:41, Jonathan M Davis wrote:
<snip>
> Stewart Gordon has a library that takes a different approach (
> http://pr.stewartsplace.org.uk/d/sutil/datetime_format.html ). It does away
> with % flags and uses maximal munch, with each of the flags being named such that
> they don't overlap in a way that would make certain combinations of flags
> impossible.

If you mean such things as writing a datum twice consecutively in two different formats, it can be done using an empty literal.  For example, "Mmm''m" would today generate "Dec12".  Not that I can see any use for such a format, just showing that it can be done.
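
To make the idea concrete, here is a toy maximal-munch formatter. It is only a sketch and is not the API of my library: the function `toyFormat` is hypothetical, it handles months only, and the "mm" (two-digit month) flag is an assumption added to show where an empty literal changes the result.

```d
import std.algorithm.searching : startsWith;
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

// Toy flags (assumed for illustration):
//   Mmm = abbreviated month name, mm = two-digit month, m = month number.
// Text between single quotes is copied literally; '' is an empty literal.
string toyFormat(string fmt, int month)
{
    static immutable string[12] names = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
    string result;
    size_t i = 0;
    while (i < fmt.length)
    {
        auto rest = fmt[i .. $];
        if (rest[0] == '\'')
        {
            // Copy everything up to the closing quote (or the end of input).
            size_t j = 1;
            while (j < rest.length && rest[j] != '\'') ++j;
            result ~= rest[1 .. j];
            i += (j < rest.length) ? j + 1 : j;
        }
        // Longest flags are tried first: this is the maximal munch.
        else if (rest.startsWith("Mmm")) { result ~= names[month - 1]; i += 3; }
        else if (rest.startsWith("mm"))  { result ~= format("%02d", month); i += 2; }
        else if (rest[0] == 'm')         { result ~= to!string(month); i += 1; }
        else                             { result ~= rest[0]; i += 1; }
    }
    return result;
}

void main()
{
    writeln(toyFormat("Mmm''m", 12)); // Dec12
    writeln(toyFormat("mm", 3));      // 03
    writeln(toyFormat("m''m", 3));    // 33
}
```

The empty literal stops the scanner from munching "mm" as a single flag, so the month number is written twice.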

> It then requires that characters which are not part of the flags be
> surrounded by single quotes.

Wrong.  It requires _letters_ that aren't flags to be literalised, and that can be done either by surrounding with '...' or by prefixing with `.  Other characters that have no meaning are automatically literal.

> It's an interesting approach, but it isn't as
> flexible as it could be because of its use of maximal munch instead of % flags.

How do you mean?

> So, I've come up with something new which tries to take the best of both. On
> the whole, I think that it's fairly straightforward, and the flags are
> generally recognizable and memorable (though there are a lot). It's also
> definitely extremely flexible (e.g. you can pass it functions to generate
> portions of the string if the existing flags don't get you quite what you
> need). But I'd like some feedback on it before I spend a lot of time on the
> implementation.
>
> This page has the docs for std.datetime with everything else but the proposed
> custom formatting functions for SysTime stripped out of it:
>
> http://jmdavis.github.com/d-programming-language.org/std_datetime.html

Looks complicated compared to mine at first sight.  Maybe I just need to spend a bit of time looking at it in more detail.

Stewart.
December 22, 2011
On Thursday, December 22, 2011 12:01:31 Somedude wrote:
> On 22/12/2011 11:40, Jacob Carlborg wrote:
> > Yeah, I don't get this. Most modules in Phobos are too large, in my opinion.
> 
> It largely is a matter of taste, I think. There are advantages in
> minimizing the size of files but there are also advantages in minimizing
> the number of files.
> But for datetime.d, it has largely gone beyond my own point of
> acceptability (which is about 5,000 lines, if that means anything).

Well, a large portion of the file is documentation and unit tests, and the number of lines that the unit tests take up should go down as I refactor them (which I've done some of, but I've still got a long way to go), but it's never going to be anywhere near as small as 5,000 lines. SysTime alone is over 5,000 lines (though again, much of that is documentation and unit tests). But ultimately, I think that whether a module is too large or not is a function of its API rather than the amount of source code. It's a question of how digestible the documentation is. And by that count, std.datetime is still quite large, but it's a very different measurement.

- Jonathan M Davis
December 22, 2011
On 2011-12-22 12:01, Somedude wrote:
> On 22/12/2011 11:40, Jacob Carlborg wrote:
>>
>> Yeah, I don't get this. Most modules in Phobos are too large, in my
>> opinion.
>>
>
> It largely is a matter of taste, I think. There are advantages in
> minimizing the size of files but there are also advantages in minimizing
> the number of files.
> But for datetime.d, it has largely gone beyond my own point of
> acceptability (which is about 5,000 lines, if that means anything).

If there are too many files you divide them up in several packages. If there are too many packages you divide them up in several sub-packages or libraries/projects.

This approach should be taken throughout the whole code base: from statements, via functions, classes, modules and packages, up to libraries/projects.

-- 
/Jacob Carlborg
December 22, 2011
On 2011-12-22 12:27, Jonathan M Davis wrote:
> On Thursday, December 22, 2011 12:01:31 Somedude wrote:
>> On 22/12/2011 11:40, Jacob Carlborg wrote:
>>> Yeah, I don't get this. Most modules in Phobos are too large, in my
>>> opinion.
>>
>> It largely is a matter of taste, I think. There are advantages in
>> minimizing the size of files but there are also advantages in minimizing
>> the number of files.
>> But for datetime.d, it has largely gone beyond my own point of
>> acceptability (which is about 5,000 lines, if that means anything).
>
> Well, a large portion of the file is documentation and unit tests, and the
> number of lines that the unit tests take up should go down as I refactor them
> (which I've done some of, but I've still got a long way to go), but it's never
> going to be anywhere near as small as 5,000 lines. SysTime alone is over 5,000
> lines (though again, much of that is documentation and unit tests). But
> ultimately, I think that whether a module is too large or not is a function of
> its API rather than the amount of source code. It's a question of how
> digestible the documentation is. And by that count, std.datetime is still
> quite large, but it's a very different measurement.
>
> - Jonathan M Davis

Even if you cut it in half I think it's way too large. I think 5,000 lines is too large as well.

I don't agree with what you're saying about the API. If a module has 5+k lines and only one public function and the rest are private functions I will still think it's too large.

About the unit tests. If they take up so much of the module, then move them into their own module(s). And now everyone will say that it's very useful to have the unit tests next to the function. I don't agree with that when the unit test is more than around five lines.

I will have the same problem with std.serialization/Orange if that ends up in Phobos. In Orange I'm testing one feature in one module, and all test modules are located in a specific directory. The shortest testing module is 54 lines. The average is probably around 70 lines. I'm not particularly happy about putting all those tests in one file, and even less happy about putting them next to all the regular code, making those modules EVEN larger.

You should treat your testing code just as you treat your "regular" code. Just as well designed, just as modularized, just as effective, just as clean. The testing code is in fact just as much part of the "regular" code as the rest of the code.

-- 
/Jacob Carlborg
December 22, 2011
On 22/12/2011 03:41, Jonathan M Davis wrote:
<snip>
> http://jmdavis.github.com/d-programming-language.org/std_datetime.html

Have I got all this right?

- a flag goes from % up to another %, a {, a whitespace or punctuation character
- flags with [...] portions are exceptions to this rule, extending to the closing ]
- %{ or %} is a literal { or }
- all literal characters that aren't classed as whitespace or punctuation must be enclosed in {...}
- {%} is just a literal %
- in %C2* and %Cn* flags, C must be either _ (space) or 0


To look at the detail:

- In %nY, does "up to n" mean that it truncates years longer than that to that many characters?  Does it strip any leading 0s that result from this truncation?  (Under %3Y, does 2011 become 11 or 011?)

- But the approach to formatting the year is nicely systematic.  (I've thought of possibly adding to my scheme a means of formatting dates to arbitrary length with sign, in order to support the ISO format for years that may be outside the 1BC-9999 range.)

- If you're going to have the ISO week number in the system, it seems to me you should also have the week-numbering year.  (I've thought about possibly adding these to my scheme.)

- In %F, what does "as many digits as necessary" mean?  In particular, why in the example code does it give 12/100000 and not 3/25000 despite the latter being shorter?

- Can only literal stuff be included within %BC[...] and %AD[...]?

- %cond and %func - I'm made to wonder to what extent this would be used and to what extent it would just be simpler to use if / ?: / ~.  Anyway, I assume that if there's more than one, they will reference the template arguments in order.  Can %cond contain other %*[...] flags?  Indeed, can %cond's be nested to arbitrary depth?

- Does %localeDate give the short or the long date format?  (This and %localeTime are handled by separate functions in my library - I didn't think there was any real use case for including such a thing within a longer formatted date/time string.)
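
On the %F question above, the arithmetic behind the shorter form is just a reduction by the greatest common divisor. A small sketch using std.numeric.gcd (the helper `reduceFraction` is hypothetical and says nothing about how %F is or should be implemented):

```d
import std.numeric : gcd;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Hypothetical helper: reduce num/den to lowest terms.
Tuple!(long, long) reduceFraction(long num, long den)
{
    immutable g = gcd(num, den);
    return tuple(num / g, den / g);
}

void main()
{
    auto r = reduceFraction(12, 100_000);
    writeln(r[0], "/", r[1]); // 3/25000
}
```

The unreduced form keeps a power-of-ten denominator, which may be the intent of "as many digits as necessary".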


On the whole, it seems a powerful system.  The format strings can get quite obfuscated, but at least the flags that are likely to be commonly used aren't too bad.

It seems strange to require explicit notations both for flags and for alphanumeric literals.  But I suppose it makes the literals easier to read than having to follow where all those % signs are.

One thing I noticed that doesn't appear in your scheme is an ordinal suffix.  OK, so arbitrary alignment fields and collapsible portions aren't to be seen either, but those are quite rare in format string schemes anyway.

Stewart.
December 22, 2011
Walter Bright wrote:
> My first thought is that std.datetime is already very large. Few will
> need a custom date formatter, so it should be in a separate module to:
>
> 1. reduce cognitive load on the programmer
>
> 2. reduce the overhead pulled in for every program that may want to use
> an std.datetime function, but not need custom formatting

Why not just extract the unittest code into a separate module?