This is implemented now using the "skip" parser field, indicating
to skip the first N characters. This also avoids a recursive parse
in one more case (more efficient). This simplifies the state machine
a little bit, while the rest of the code needs to properly account
for the value of the skip field.
Also allow whitespace prefix without penalty.
Modify the test suite to pseudo-randomly add a weekday prefix
to the formats that allow it (all except the purely numeric ones).
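A minimal sketch of the skip idea, for illustration only. The parser struct, the weekdayPrefixes list and newParser are hypothetical stand-ins, not the library's actual code; the only point is that the prefix length is recorded once and later steps slice past it instead of re-parsing a trimmed copy.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parser is a hypothetical, heavily reduced stand-in for the real parser struct.
type parser struct {
	datestr string
	skip    int // number of leading bytes the rest of the code must ignore
}

// weekdayPrefixes lists full names before abbreviations so that "Monday"
// is matched whole rather than as "Mon" followed by "day".
var weekdayPrefixes = []string{
	"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
	"mon", "tue", "wed", "thu", "fri", "sat", "sun",
}

// newParser records how many leading bytes (spaces, an optional weekday
// name, and any comma or spaces after it) should be skipped.
func newParser(datestr string) *parser {
	p := &parser{datestr: datestr}
	i := 0
	for i < len(datestr) && datestr[i] == ' ' {
		i++
	}
	for _, name := range weekdayPrefixes {
		end := i + len(name)
		if end <= len(datestr) && strings.EqualFold(datestr[i:end], name) {
			i = end
			for i < len(datestr) && (datestr[i] == ',' || datestr[i] == ' ') {
				i++
			}
			break
		}
	}
	p.skip = i
	return p
}

// parse hands only the non-skipped part of the input to time.Parse.
func (p *parser) parse(layout string) (time.Time, error) {
	return time.Parse(layout, p.datestr[p.skip:])
}

func main() {
	p := newParser("  Monday, January 4, 2017")
	parsed, err := p.parse("January 2, 2006")
	fmt.Println(p.skip, parsed, err)
}
```

Because the prefix is only an offset, no substring copy or second parse pass is needed.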
New option SimpleErrorMessages that avoids allocation in the error path. It's off by default to preserve backwards compatibility.
Added benchmark BenchmarkBigParseAnyErrors that takes the big set of test cases, and injects errors to make them fail at pseudo-random places.
This optimization speeds up the error path runtime by 4x and reduces error path allocation bytes by 13x!
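A rough sketch of why a simple-message mode can avoid allocation: a preallocated sentinel error is returned as-is, while the detailed path formats a new error on every failure. The options struct, errUnknownFormat and unknownFormatError are assumed names for illustration and are not taken from the library.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownFormat is a hypothetical sentinel, allocated once at startup.
var errUnknownFormat = errors.New("could not find format")

// options is a hypothetical stand-in for the parser options.
type options struct {
	simpleErrorMessages bool
}

// unknownFormatError returns the shared sentinel in simple mode, or a
// detailed (and freshly allocated) error that wraps it otherwise.
func unknownFormatError(o options, datestr string, pos int) error {
	if o.simpleErrorMessages {
		return errUnknownFormat
	}
	return fmt.Errorf("%w for %q at offset %d", errUnknownFormat, datestr, pos)
}

// isUnknownFormat works for both modes because the detailed error wraps
// the sentinel.
func isUnknownFormat(err error) bool {
	return errors.Is(err, errUnknownFormat)
}

func main() {
	simple := unknownFormatError(options{simpleErrorMessages: true}, "2014-13-x", 8)
	detailed := unknownFormatError(options{}, "2014-13-x", 8)
	fmt.Println(simple)
	fmt.Println(detailed)
	fmt.Println(isUnknownFormat(simple), isUnknownFormat(detailed))
}
```

Keeping the detailed path as the default matches the backwards-compatibility note above: callers that inspect the error text see no change unless they opt in.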
Audit every stateDate so every unexpected alternative will fail.
In the process, fixed some newly found bugs:
* Extend format yyyy-mon-dd to allow times to follow it. Also allow full month name.
* Allow full day name before month (e.g., Monday January 4th, 2017)
Relevant confirmatory test cases were added.
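To illustrate what "every unexpected alternative will fail" can look like, here is a toy state machine in which each state rejects any character it does not explicitly handle. The state names and the classify function are invented for this sketch and are far simpler than the real stateDate handling.

```go
package main

import (
	"fmt"
)

// dateState is a hypothetical, tiny version of a date state enum.
type dateState int

const (
	dateStart dateState = iota
	dateDigit
	dateDigitDash
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// classify walks the input once. Every state has an explicit default
// branch that fails, so unexpected input cannot silently fall through.
func classify(s string) (dateState, error) {
	state := dateStart
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch state {
		case dateStart:
			switch {
			case isDigit(c):
				state = dateDigit
			default:
				return state, fmt.Errorf("unexpected %q at offset %d", c, i)
			}
		case dateDigit:
			switch {
			case isDigit(c):
				// stay in dateDigit
			case c == '-':
				state = dateDigitDash
			default:
				return state, fmt.Errorf("unexpected %q at offset %d", c, i)
			}
		case dateDigitDash:
			switch {
			case isDigit(c) || c == '-':
				// stay in dateDigitDash
			default:
				return state, fmt.Errorf("unexpected %q at offset %d", c, i)
			}
		}
	}
	return state, nil
}

func main() {
	fmt.Println(classify("2014-04-26"))
	fmt.Println(classify("2014x04"))
}
```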
Optimize the common special case where mm and dd are the same length by swapping them in place. This avoids having to reparse the entire string.
For this case, it's about 30% faster and reduces allocations by about 15%.
This format is especially common, hence the reason to optimize for this case.
Also fix the case for ambiguous date/time in the mm:dd:yyyy format.
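The in-place swap can be sketched as follows, assuming the layout is held as a []byte and the offsets of the month and day tokens are already known. swapMonthDay and its parameters are hypothetical names, not the library's.

```go
package main

import (
	"fmt"
	"time"
)

// swapMonthDay exchanges two equal-length tokens inside format without
// allocating. moi and dayi are the byte offsets of the month and day
// tokens, and n is their shared length.
func swapMonthDay(format []byte, moi, dayi, n int) {
	for k := 0; k < n; k++ {
		format[moi+k], format[dayi+k] = format[dayi+k], format[moi+k]
	}
}

func main() {
	// Layout first guessed as mm/dd/yyyy for the input "13/02/2014".
	format := []byte("01/02/2006")
	swapMonthDay(format, 0, 3, 2)
	fmt.Println(string(format))

	parsed, err := time.Parse(string(format), "13/02/2014")
	fmt.Println(parsed, err)
}
```

When the two tokens differ in length (for example a one-digit month and a two-digit day), the byte offsets shift, which is presumably why only the equal-length case gets this shortcut.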
Previously, for ambiguous date strings, it was always calling parse twice even when the first parse would have been successful.
Refactor so that parsing isn't re-attempted unless the first parse fails ambiguously.
Benchmark results show that with RetryAmbiguousDateWithSwap(true), it's now about 6.5% faster (ns/op) and reduces allocated bytes by 3.4%.
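A simplified sketch of the retry logic, using plain time.Parse with two fixed layouts in place of the real parser. parseWithRetry and its attempt counter are illustrative assumptions; the second attempt only happens after the first one fails.

```go
package main

import (
	"fmt"
	"time"
)

// parseWithRetry tries mm/dd/yyyy first and only falls back to dd/mm/yyyy
// when that fails. It also reports how many parse attempts were made.
func parseWithRetry(s string) (time.Time, int, error) {
	attempts := 1
	parsed, err := time.Parse("01/02/2006", s)
	if err == nil {
		return parsed, attempts, nil
	}
	attempts++
	if swapped, err2 := time.Parse("02/01/2006", s); err2 == nil {
		return swapped, attempts, nil
	}
	// Report the original error if the swapped layout fails as well.
	return time.Time{}, attempts, err
}

func main() {
	fmt.Println(parseWithRetry("01/02/2014")) // first layout succeeds
	fmt.Println(parseWithRetry("13/02/2014")) // month 13 fails, so the swap is tried
}
```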
Uses a memory pool for the parser struct and the format []byte.
Uses a new Go 1.20 feature to avoid allocations for []byte to string conversions in allowable cases.
Go 1.20 also fixes a Go bug for parsing fractional seconds after a comma, so we can eliminate a workaround.
The remaining allocations are mostly unavoidable (e.g., time.Parse constructing a FixedZone location, or calls to strings.ToLower).
Results show an 89% reduction in allocated bytes for the big benchmark cases, and for some formats an allocation can be avoided entirely.
There is also a resulting 26% speedup in ns/op.
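A hedged sketch of the two techniques: a sync.Pool that reuses the parser struct together with its format buffer, and a zero-copy []byte to string conversion. The Go 1.20 feature is assumed here to be unsafe.String with unsafe.SliceData; the parser type and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
	"unsafe"
)

// parser is a hypothetical stand-in holding a reusable layout buffer.
type parser struct {
	format []byte
}

var parserPool = sync.Pool{
	New: func() any { return &parser{format: make([]byte, 0, 64)} },
}

// getParser fetches a parser from the pool and resets its buffer while
// keeping the buffer's capacity.
func getParser() *parser {
	p := parserPool.Get().(*parser)
	p.format = p.format[:0]
	return p
}

func putParser(p *parser) { parserPool.Put(p) }

// bytesToString returns a string that shares b's memory (Go 1.20+).
// It is only safe while b is not modified and not reused.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	p := getParser()
	p.format = append(p.format, "2006-01-02"...)

	// The layout string aliases p.format, so it must not be used once p
	// has gone back to the pool.
	parsed, err := time.Parse(bytesToString(p.format), "2023-12-09")
	putParser(p)

	fmt.Println(parsed, err)
}
```

One caveat with this approach: on failure, time.Parse returns a *time.ParseError that keeps the layout string, so an error path that outlives the pooled buffer would need a real copy instead of the aliased string.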
Details:
BEFORE:
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkShotgunParse-12              19448 B/op    474 allocs/op
BenchmarkParseAny-12                   4736 B/op     42 allocs/op
BenchmarkBigShotgunParse-12         1075049 B/op  24106 allocs/op
BenchmarkBigParseAny-12              241422 B/op   2916 allocs/op
BenchmarkBigParseIn-12               244195 B/op   2984 allocs/op
BenchmarkBigParseRetryAmbiguous-12   260751 B/op   3715 allocs/op
BenchmarkShotgunParseErrors-12        67080 B/op   1679 allocs/op
BenchmarkParseAnyErrors-12            15903 B/op    200 allocs/op
AFTER:
BenchmarkShotgunParse-12              19448 B/op    474 allocs/op
BenchmarkParseAny-12                     48 B/op      2 allocs/op
BenchmarkBigShotgunParse-12         1075049 B/op  24106 allocs/op
BenchmarkBigParseAny-12               25394 B/op    824 allocs/op
BenchmarkBigParseIn-12                28165 B/op    892 allocs/op
BenchmarkBigParseRetryAmbiguous-12    37880 B/op   1502 allocs/op
BenchmarkShotgunParseErrors-12        67080 B/op   1679 allocs/op
BenchmarkParseAnyErrors-12             3851 B/op    117 allocs/op
Fix for the bug mentioned in https://github.com/araddon/dateparse/pull/134
Also, the other cases mentioned in this PR are not valid formats, so add them to the TestParseErrors test, to document that this is expected.