Tutorial#
How do I use logpyle?#
Consider this example of a ‘simulation’ logging various quantities:
#!/usr/bin/env python3
import logging
from random import uniform
from time import sleep
from warnings import warn
from logpyle import (GCStats, IntervalTimer, LogManager, LogQuantity,
add_general_quantities, add_run_info, add_simulation_quantities,
set_dt)
class Fifteen(LogQuantity):
def __call__(self) -> int:
return 15
logger = logging.getLogger(__name__)
def main() -> None:
logmgr = LogManager("log.sqlite", "wo")
logmgr.enable_save_on_sigterm()
# set a run property
logmgr.set_constant("myconst", uniform(0, 1))
# Generic run metadata, such as command line, host, and time
add_run_info(logmgr)
# Time step duration, wall time, ...
add_general_quantities(logmgr)
# Simulation time, time step
add_simulation_quantities(logmgr)
# Additional quantities to log
vis_timer = IntervalTimer("t_vis", "Time spent visualizing")
logmgr.add_quantity(vis_timer)
logmgr.add_quantity(Fifteen("fifteen"))
logmgr.add_quantity(GCStats())
# Watches are printed periodically during execution
logmgr.add_watches(["step.max", "t_sim.max", "t_step.max", "fifteen",
"t_vis", "t_log", "memory_usage_hwm"])
for istep in range(200):
logmgr.tick_before()
dt = uniform(0.01, 0.05)
set_dt(logmgr, dt)
sleep(dt)
# Illustrate custom timers
if istep % 10 == 0:
with vis_timer.start_sub_timer():
sleep(0.05)
# Illustrate warnings capture
if uniform(0, 1) < 0.05:
warn("Oof. Something went awry.")
if istep == 50:
logger.warning("test logging")
print("FYI: Setting watch interval to 5 seconds.")
logmgr.set_watch_interval(5)
if istep == 150:
logger.error("another logging test")
print("FYI: Setting watch interval back to 1 second.")
logmgr.set_watch_interval(1)
logmgr.tick_after()
logmgr.close()
if __name__ == "__main__":
main()
(You can find this example as
examples/log.py
in the logpyle
source distribution.)
Running this script will give you a file log.sqlite
. Note how log quantities
given as “watches” are also periodically logged to the console.
To analyze this data, we need some tools that are distributed as part of this package.
You may use runalyzer
to analyze the log file:
$ runalyzer log.sqlite
Creating an in memory database from provided files
Scanning... [########################################] ETA ?
Importing... [########################################] ETA ?
Runalyzer running on Python 3.8.5 (default, Aug 2 2020, 15:09:07)
[GCC 10.2.0]
Run .help to see help for 'magic' commands
>>>
runalyzer
accepts multiple input files. Behind the scenes, these files are
gathered into a single database for analysis.
You can optionally do this manually by running runalyzer-gather
as a script:
$ runalyzer-gather summary.sqlite log.sqlite
Scanning... [########################################] ETA ?
Importing... [########################################] ETA ?
Specifying multiple data files serves two important functions:
If you would like to compare data from multiple runs, you can simply add data files from multiple runs and analyze them side-by-side.
For distributed-memory runs, you can likewise give data sets from multiple ranks to be gathered in a single file for centralized analysis.
As well as, naturally, any combination of the above.
This is a normal Python console, but it has a few extra tricks up its
sleeve. .help
gives an overview. Here, we’ll focus on demonstrating
two interesting features. First, lets get an overview of all the
logged quantities:
>>> .quantities
id | name | unit | description | rank_aggregator
---+---------+------+-----------------------------+-----------------
7 | dt | s | Simulation Timestep | None
9 | fifteen | None | None | None
5 | step | 1 | Timesteps | None
2 | t_2step | s | Step-to-step duration | None
3 | t_wall | s | Wall time | None
4 | t_log | s | Time spent updating the log | None
6 | t_sim | s | Simulation Time | None
1 | t_step | s | Time step duration | None
8 | t_vis | s | Time spent visualizing | None
Next, we could plot, e.g. time step duration against simulation time:
>>> .plot select $t_sim, $t_2step
Next, a fact that the select
may have given away: The argument to
.plot
is a SQL query that is passed on to sqlite3
, with support
for some additional syntax. You can see what this query expanded to internally:
>>> mangle_sql("select $t_sim, $t_2step")
'select t_sim.value, t_2step.value from runs inner join t_sim on (t_sim.run_id = runs.id) inner join t_2step on (t_2step.run_id = runs.id and t_sim.step = t_2step.step and t_sim.rank=t_2step.rank) '
As you can see, by default these queries return data from all runs. You can filter which runs you are interested in through run properties:
>>> .runprops
cmdline
date
dirname
filename
id
is_parallel
machine
myconst
rank_count
schema_version
unique_run_id
unixtime
simply by adding a SQL where
clause referring to these properties.
As you can see from the demo, you can of course add your own, application-focused run properties.
This means that you have the full power of SQL available to analyze your time series data, for example:
Plot the solution 2-norms of all runs that were run on a Tuesday between time step 2000 and 3000
Make a scatterplot correlating memory transfer amounts and simulation time
In some cases (for example, when logging is optional), it may be convenient to retrieve sub-timers once and reuse them. Adapting the example above, sub-timer initialization becomes:
# Add a timer quantity and retrieve a sub-timer for later use
if logmgr:
vis_timer = IntervalTimer("t_vis", "Time spent visualizing")
logmgr.add_quantity(vis_timer)
time_vis = vis_timer.get_sub_timer()
else:
from contextlib import nullcontext
time_vis = nullcontext()
and usage becomes:
if istep % 10 == 0:
# Use the sub-timer
with time_vis:
sleep(0.05)
(Full example code can be found in
examples/optional-log.py
in the logpyle
source distribution.)
Check out the Analysis section for further information.