Do we need faster regex?

The tables are stored compressed. We do that because otherwise the binary size of Phobos would grow by multiple megabytes (think 10, not 1). This is why triggering use of regex at compile time (CT) slows things down a whole lot. One of my optimizations was to prevent this unless it was needed. However, each of the tables uses a lot of templates, and these could all be swapped out for something else; I think that might shave off something close to 100ms. The tradeoffs for the tables are good ones, but they do bite us a bit here.

On Monday, 18 December 2023 at 18:09:16 UTC, H. S. Teoh wrote:
> On Tue, Dec 19, 2023 at 06:47:00AM +1300, Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
>> Yeah basically std.regex is no longer the cause for importing std.regex slowdown.
>>
>> It's stuff like std.conv and std.uni.
>
> I haven't noticed too much horrible slowdown from std.conv, but std.uni could use some fixing. I'm tempted to suggest that those internal tables in std.uni should be pre-generated rather than done at compile-time. There comes a point where repeatedly doing something at every compile just isn't worth it when the desired output could be autogenerated beforehand and saved as a straight .d file with hard-coded values.
>
>
> T

The std.conv slowdown comes from `std.conv.to!float` specifically. The other ones look fine, I guess.

About Rikki's statement: it can help, but whenever you import the values, it will still cost a lot of compilation time. You can look at `core.sys.windows.uuid` for reference: most of the compilation time spent on any Windows module goes to this one, and when you look at it, it is only a long list of definitions.

So, yes, I think it could be a lot better. One example I've done was to separate the complete generation in Metal and never import the file which actually does all the CTFE and mixins. Think of this file as important only to the linker, not something that needs to be imported. A .di file with only definitions could help even more, letting something else implement it so that only the symbols would be imported.
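
The pre-generation idea quoted above (produce the table once, save it as a plain .d file with hard-coded values) could be sketched roughly like this. It is only an illustration: `renderTableModule`, the module name and the toy data are all made up here, and this is not how std.uni builds its tables.

```d
import std.array : appender;
import std.format : formattedWrite;

/// Hypothetical generator: renders a lookup table as plain D source text,
/// so the importing module contains only literals and needs no CTFE.
string renderTableModule(string moduleName, string tableName, const uint[] values)
{
    auto src = appender!string();
    src.formattedWrite("// Generated file, do not edit by hand.\nmodule %s;\n\n", moduleName);
    src.formattedWrite("immutable uint[%s] %s = [\n", values.length, tableName);
    foreach (v; values)
        src.formattedWrite("    0x%08X,\n", v);
    src.put("];\n");
    return src.data;
}

void main()
{
    import std.stdio : write;

    // Toy data standing in for a real table (assumption, not std.uni data).
    immutable uint[] demo = [0x41, 0x5A, 0x61, 0x7A];

    // A build step would redirect this output into generated_tables.d.
    write(renderTableModule("generated_tables", "asciiLetterBounds", demo));
}
```

The generator runs once as part of the build; everything that imports the generated module then sees only a static array. As noted above, a very large file of plain definitions still has a compile cost of its own.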

On Monday, 18 December 2023 at 17:16:40 UTC, H. S. Teoh wrote:
> On Sun, Dec 17, 2023 at 03:43:22PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
>> So I’ve been working on rewind-regex trying to correct all of the decisions in the original engine that slowed it down, dropping some features that I knew I cannot implement efficiently (backreferences have to go).
>>
>> So while I’m obsessed with simplicity and speed I thought I’d ask people if it was an issue and what they really want from gen2 regex library.
> [...]
>
> What I really want:
>
> - Reduce compile-time cost of `import std.regex;` to zero, or at least close enough it's no longer noticeable.
>
> - Automatic caching of fixed-string regexes, i.e., the equivalent of:
>
>     struct Re(string ctKnownRe) {
>         Regex!char re;
>         shared static this() {
>             re = regex(ctKnownRe);
>         }
>         Regex!char Re() {
>             return re;
>         }
>     }

A runtime cache should work, btw std.regex caches regexes (at least those passed as strings to match* family of functions).

>     void main() {
>         string s;
>         if (s.matchFirst(Re!`some\+pattern`)) {
>             ...
>         }
>
>         // This should reuse the Regex instance from before:
>         if (s.matchFirst(Re!`some\+pattern`)) {
>             ...
>         }
>     }

I'm thinking if it's worth it to intern patterns like that.

> - Reasonably fast runtime performance. I don't really care if it's the top-of-the-line superfast regex matcher, even though that would be really nice. The primary pain points are the cost of import, and the need to manually write code for automatic caching of fixed runtime regexen.

> - Get rid of ctRegex -- it adds a huge compile-time cost with questionable runtime benefit. Unless there's a way to do this at compile-time that *doesn't* add like 5 seconds per regex to compile times.

Yup it's dropped, to be eventually replaced by JIT which is both better at compile-time and much more flexible at run-time.

---
Dmitry Olshansky
CEO @ Glowlabs
https://olshansky.me
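
For anyone who wants the per-pattern caching from the quoted wish list today, here is a minimal sketch that compiles against the current std.regex. The name `cachedRegex` is hypothetical and not part of Phobos; the sketch leans on the fact that a static local in a function template exists once per instantiation, and that such statics are thread-local in D, so each thread builds the regex once per distinct pattern.

```d
import std.regex : Regex, matchFirst, regex;

/// Hypothetical helper (not in Phobos): one lazily built Regex per
/// distinct compile-time pattern string, per thread.
ref Regex!char cachedRegex(string pattern)()
{
    static Regex!char instance; // one per template instantiation, thread-local
    static bool built;
    if (!built)
    {
        instance = regex(pattern); // built at run time, on first use only
        built = true;
    }
    return instance;
}

void main()
{
    auto s = "a some+pattern b";

    auto m = s.matchFirst(cachedRegex!`some\+pattern`());
    assert(!m.empty);
    assert(m.hit == "some+pattern");

    // The second use hands back the same instance instead of rebuilding it.
    assert(&cachedRegex!`some\+pattern`() is &cachedRegex!`some\+pattern`());
}
```

Since the pattern is only used as a run-time argument to `regex`, this avoids the ctRegex compile-time cost; whether it beats the cache std.regex already keeps for string patterns would need measuring.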

On Mon, Dec 18, 2023 at 06:34:51PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
[...]
> A runtime cache should work, btw std.regex caches regexes (at least those passed as strings to match* family of functions).

Cool, didn't know that. :-)

[...]
> > - Get rid of ctRegex -- it adds a huge compile-time cost with questionable runtime benefit. Unless there's a way to do this at compile-time that *doesn't* add like 5 seconds per regex to compile times.
>
> Yup it's dropped, to be eventually replaced by JIT which is both better at compile-time and much more flexible at run-time.
[...]

Awesome stuff!

T

--
A mathematician is a device for turning coffee into theorems. -- P. Erdos

December 30

Re: Do we need faster regex?

Posted by Dmitry Olshansky
in reply to BoQsc

On Tuesday, 26 December 2023 at 15:58:03 UTC, BoQsc wrote:

> The focus should always be on slower but more maintainable and simple implementation.

Agreed on simple, though humbly disagree on speed.

> The efficiency and speed should be left for specific cases where it is needed and, again with as much clarity as possible.
>
> Support and Features over efficiency and speed.

Well we already have std.regex which is quite fast and feature rich, sadly simplicity is not an option if we are to support full ECMAScript regex language and basic level 1 unicode regex.

> The question of "is anyone even using that?" is a wrong question.
> If it's a common behaviour among implementations, it should exist.

RE2 specifically avoids complicated features that block design of a fast engine.

> If behaviour makes sense, it should exist and not avoided to be implemented just to gain some fraction of speed that can be gained in other ways or again, using more specific implementation for the job.
>
> If you think that your implementation can be grown into supporting all crucial features, then it should be a good draft for other people to explore and complete implementation.

There are very few folks who want to develop regex engine I think. Encouraging collaboration is an interesting angle I did not account for.

> Else it should be a specific case implementation that could be selected if there is a need for speed but with the sacrifice of features.

Okay, I understand your points. For the most part I could summarize my point of view as follows.
Building a regex engine without regard for speed is not challenging for me, nor do I think that a simple slow engine could be gradually improved into a simple fast engine. Speed is something you have to think of when laying the first brick of whatever you are building, iff speed is desired.

---
Dmitry Olshansky
CEO @ Glowlabs
https://olshansky.me
