close
Skip to content

Update timing parsing regex to account for scientific notation.#856

Merged
WardBrian merged 1 commit into
stan-dev:developfrom
chinandrew:timing_scientific_notation
May 26, 2026
Merged

Update timing parsing regex to account for scientific notation.#856
WardBrian merged 1 commit into
stan-dev:developfrom
chinandrew:timing_scientific_notation

Conversation

@chinandrew
Copy link
Copy Markdown
Contributor

Submission Checklist

  • Run unit tests
  • Declare copyright holder and open-source license: see below

Summary

The regex in parse_timing_lines() has a bug where if scientific notation is present, only the exponent is captured:

In [1]: from cmdstanpy.utils.stancsv import parse_timing_lines

In [2]: lines = [
   ...:     b'# \n',
   ...:     b'#  Elapsed Time: 630099 seconds (Warm-up)\n',
   ...:     b'#                435856 seconds (Sampling)\n',
   ...:     b'#                1.06595e+06 seconds (Total)\n',
   ...:     b'# \n'
   ...: ]

In [3]: parse_timing_lines(lines)
Out[3]: {'Warm-up': 630099.0, 'Sampling': 435856.0, 'Total': 6.0}

This change updates the regex for the first capture group to include e, +, and -, so the output becomes

In [3]: parse_timing_lines(lines)
Out[3]: {'Warm-up': 630099.0, 'Sampling': 435856.0, 'Total': 1065950.0}

Not sure if support for - is really needed in practice, but I added it for completeness. In keeping with the existing escaped . in the regex, I also escape + and -, though I believe no escapes are actually needed in the character class, i.e. ([\d.e+-]+) is valid.

Note that this can lead to the case where the Warm-Up + Sampling != Total due to truncation from the scientific notation. In this example, 1.06595e+06 gets parsed to 1065950.0 but Warm-Up + Sampling is 630099.0 + 435856.0 = 1065955.0. Arguably this is not a parsing issue since it's correctly reading the comment, but perhaps some additional logic to set Total = Warm-Up + Sampling might be useful.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

Myself

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@WardBrian
Copy link
Copy Markdown
Member

Thanks for the contribution @chinandrew. Is this a hypothetical issue or did you actually get the new test output from a run of cmdstan?

@chinandrew
Copy link
Copy Markdown
Contributor Author

The example above is from an actual cmdstanpy run:

b'#  Elapsed Time: 630099 seconds (Warm-up)\n',
b'#                435856 seconds (Sampling)\n',
b'#                1.06595e+06 seconds (Total)\n',

For the unit test, I modified 435856 to 4358560.0e-1 to capture the - case, though I just ran an extremely short run and small values appear to get rounded to 0:

#  Elapsed Time: 0.015 seconds (Warm-up)
#                0 seconds (Sampling)
#                0.015 seconds (Total)

so perhaps the - case isn't even needed.

Copy link
Copy Markdown
Member

@WardBrian WardBrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@WardBrian WardBrian merged commit c213d8e into stan-dev:develop May 26, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants