Fine-tuning with Python#
The recommended way to run training is with the main.py script in ocp. One reason is that training often takes a long time, which makes it better suited to queue systems like Slurm. However, you can also submit Python scripts, and it is possible to run notebooks in Slurm too. Here we work out a proof of concept for training from Python in a Jupyter notebook.
We start by running the tutorial notebook, which defines the helper functions used below (get_checkpoint, train_test_val_split, and generate_yml_config).
%run ../ocp-tutorial.ipynb
import logging
from ocpmodels.common.utils import SeverityLevelBetween

root = logging.getLogger()
root.setLevel(logging.INFO)

log_formatter = logging.Formatter(
    "%(asctime)s (%(levelname)s): %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Send INFO (but not WARNING or higher) to out.txt
handler_out = logging.FileHandler('out.txt', 'w')
handler_out.addFilter(
    SeverityLevelBetween(logging.INFO, logging.WARNING)
)
handler_out.setFormatter(log_formatter)
root.addHandler(handler_out)

# Send WARNING (and higher) to the same file; append so this handler
# does not truncate what the first one wrote
handler_err = logging.FileHandler('out.txt', 'a')
handler_err.setLevel(logging.WARNING)
handler_err.setFormatter(log_formatter)
root.addHandler(handler_err)
We will fine-tune on the oxide database created in the fine-tuning tutorial. First, a look at what is in it:
! ase db ../fine-tuning/oxides.db
id| age|formula|calculator| energy|natoms| fmax|pbc| volume|charge| mass
1|224m|Sn2O4 |unknown |-41.359| 6|0.045|TTT| 64.258| 0.000|301.416
2|224m|Sn2O4 |unknown |-41.853| 6|0.025|TTT| 66.526| 0.000|301.416
3|224m|Sn2O4 |unknown |-42.199| 6|0.010|TTT| 68.794| 0.000|301.416
4|224m|Sn2O4 |unknown |-42.419| 6|0.006|TTT| 71.062| 0.000|301.416
5|224m|Sn2O4 |unknown |-42.534| 6|0.011|TTT| 73.330| 0.000|301.416
6|224m|Sn2O4 |unknown |-42.562| 6|0.029|TTT| 75.598| 0.000|301.416
7|224m|Sn2O4 |unknown |-42.518| 6|0.033|TTT| 77.866| 0.000|301.416
8|224m|Sn2O4 |unknown |-42.415| 6|0.010|TTT| 80.134| 0.000|301.416
9|224m|Sn2O4 |unknown |-42.266| 6|0.006|TTT| 82.402| 0.000|301.416
10|224m|Sn2O4 |unknown |-42.083| 6|0.017|TTT| 84.670| 0.000|301.416
11|224m|Sn4O8 |unknown |-81.424| 12|0.012|TTT|117.473| 0.000|602.832
12|224m|Sn4O8 |unknown |-82.437| 12|0.005|TTT|121.620| 0.000|602.832
13|224m|Sn4O8 |unknown |-83.147| 12|0.015|TTT|125.766| 0.000|602.832
14|224m|Sn4O8 |unknown |-83.599| 12|0.047|TTT|129.912| 0.000|602.832
15|224m|Sn4O8 |unknown |-83.831| 12|0.081|TTT|134.058| 0.000|602.832
16|224m|Sn4O8 |unknown |-83.898| 12|0.001|TTT|138.204| 0.000|602.832
17|224m|Sn4O8 |unknown |-83.805| 12|0.001|TTT|142.350| 0.000|602.832
18|224m|Sn4O8 |unknown |-83.586| 12|0.002|TTT|146.496| 0.000|602.832
19|224m|Sn4O8 |unknown |-83.262| 12|0.002|TTT|150.642| 0.000|602.832
20|224m|Sn4O8 |unknown |-82.851| 12|0.013|TTT|154.788| 0.000|602.832
Rows: 295 (showing first 20)
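You can also inspect the database directly from Python with ase.db. A minimal sketch using only the standard ase API (nothing tutorial-specific):

from ase.db import connect

db = connect('../fine-tuning/oxides.db')
# each row stores a relaxed structure along with its DFT energy and cell volume
for row in db.select(limit=5):
    print(row.formula, row.energy, row.volume)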
Next, we get the pretrained GemNet-OC OC20+OC22 checkpoint that we will fine-tune:
checkpoint = get_checkpoint('GemNet-OC OC20+OC22')
Loading the checkpoint into an OCPCalculator prints the configuration stored in it:
from ocpmodels.common.relaxation.ase_utils import OCPCalculator
calc = OCPCalculator(checkpoint=checkpoint, trainer='forces', cpu=False)
amp: true
cmd:
  checkpoint_dir: /home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/checkpoints/2023-08-01-17-18-56
  commit: 3973c79
  identifier: ''
  logs_dir: /home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/logs/tensorboard/2023-08-01-17-18-56
  print_every: 100
  results_dir: /home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/results/2023-08-01-17-18-56
  seed: null
  timestamp_id: 2023-08-01-17-18-56
dataset: null
gpus: 1
logger: tensorboard
model: gemnet_oc
model_attributes:
  activation: silu
  atom_edge_interaction: true
  atom_interaction: true
  cbf:
    name: spherical_harmonics
  cutoff: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  cutoff_qint: 12.0
  direct_forces: true
  edge_atom_interaction: true
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_atom: 256
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_rbf: 16
  emb_size_sbf: 32
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  forces_coupled: false
  max_neighbors: 30
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  max_neighbors_qint: 8
  num_after_skip: 2
  num_atom: 3
  num_atom_emb_layers: 2
  num_before_skip: 2
  num_blocks: 4
  num_concat: 1
  num_global_out_layers: 2
  num_output_afteratom: 3
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  qint_tags:
  - 1
  - 2
  quad_interaction: true
  rbf:
    name: gaussian
  regress_forces: true
  sbf:
    name: legendre_outer
  symmetric_edge_symmetrization: false
noddp: false
optim:
  batch_size: 16
  clip_grad_norm: 10
  ema_decay: 0.999
  energy_coefficient: 1
  eval_batch_size: 16
  eval_every: 5000
  factor: 0.8
  force_coefficient: 1
  load_balancing: atoms
  loss_energy: mae
  loss_force: atomwisel2
  lr_initial: 0.0005
  max_epochs: 80
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
  weight_decay: 0
slurm:
  additional_parameters:
    constraint: volta32gb
  cpus_per_task: 3
  folder: /checkpoint/abhshkdz/ocp_oct1_logs/57632342
  gpus_per_node: 8
  job_id: '57632342'
  job_name: gnoc_oc22_oc20_all_s2ef
  mem: 480GB
  nodes: 8
  ntasks_per_node: 8
  partition: ocp,learnaccel
  time: 4320
task:
  dataset: oc22_lmdb
  description: Regressing to energies and forces for DFT trajectories from OCP
  eval_on_free_atoms: true
  grad_input: atomic forces
  labels:
  - potential energy
  metric: mae
  primary_metric: forces_mae
  train_on_free_atoms: true
  type: regression
trainer: forces
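Before fine-tuning, it is worth a quick sanity check that the pretrained calculator runs. This sketch pulls one structure from the database above and predicts its energy (not part of the original workflow):

from ase.db import connect

atoms = connect('../fine-tuning/oxides.db').get_atoms(1)  # row with id=1
atoms.calc = calc
print(atoms.get_potential_energy())  # pretrained prediction, in eV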
Split the data into train, test, val sets#
We remove any previously generated splits, then split the database with the train_test_val_split helper:
! rm -fr train.db test.db val.db
train, test, val = train_test_val_split('../fine-tuning/oxides.db')
train, test, val
(PosixPath('/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/train.db'),
PosixPath('/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/test.db'),
PosixPath('/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/val.db'))
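train_test_val_split is a helper defined in the tutorial notebook we ran at the top. For reference, here is a minimal sketch of the same idea, with a hypothetical helper name and split fractions, using only ase and the standard library:

import random
from ase.db import connect

def split_db(src, fractions=(0.8, 0.1, 0.1), seed=42):
    """Randomly partition the rows of an ase db into train/test/val dbs."""
    rows = list(connect(src).select())
    random.Random(seed).shuffle(rows)
    n_train = int(fractions[0] * len(rows))
    n_test = int(fractions[1] * len(rows))
    chunks = {'train.db': rows[:n_train],
              'test.db': rows[n_train:n_train + n_test],
              'val.db': rows[n_train + n_test:]}
    for fname, chunk in chunks.items():
        with connect(fname) as dst:
            for row in chunk:
                # toatoms() keeps the stored energy/forces on the Atoms object
                dst.write(row.toatoms())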
Set up the training code#
We start by making config.yml, which we build from the calculator checkpoint. We delete settings that do not apply to fine-tuning (or that cause errors) and update the dataset and optimizer settings.
yml = generate_yml_config(checkpoint, 'config.yml',
                          delete=['slurm', 'cmd', 'logger', 'task', 'model_attributes',
                                  'optim.loss_force',  # the checkpoint setting causes an error
                                  'dataset', 'test_dataset', 'val_dataset'],
                          update={'gpus': 1,
                                  'task.dataset': 'ase_db',
                                  'optim.eval_every': 1,
                                  'optim.max_epochs': 5,
                                  # Train data
                                  'dataset.train.src': 'train.db',
                                  'dataset.train.a2g_args.r_energy': True,
                                  'dataset.train.a2g_args.r_forces': True,
                                  # Test data - prediction only so no regression
                                  'dataset.test.src': 'test.db',
                                  'dataset.test.a2g_args.r_energy': False,
                                  'dataset.test.a2g_args.r_forces': False,
                                  # val data
                                  'dataset.val.src': 'val.db',
                                  'dataset.val.a2g_args.r_energy': True,
                                  'dataset.val.a2g_args.r_forces': True,
                                  })
yml
PosixPath('/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/config.yml')
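If you want to double-check what was written, the file is ordinary YAML. A sketch, assuming pyyaml (already a dependency of ocp):

import yaml

with open('config.yml') as f:
    cfg = yaml.safe_load(f)

# confirm the fine-tuning overrides took effect
print(cfg['task'], cfg['optim']['max_epochs'])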
Set up the training task#
This gives several opportunities to define and override the config: you start with the base config.yml, and then specify the changes you want via “command-line” arguments.
The code is built around submitit, which is often used with Slurm, but it also works locally.
We have to mimic the main.py setup to get the arguments and config set up. Here is a minimal way to do this.
from ocpmodels.common.flags import flags

parser = flags.get_parser()
args, args_override = parser.parse_known_args(["--mode=train",
                                               "--config-yml=config.yml",
                                               f"--checkpoint={checkpoint}",
                                               "--amp"])
args, args_override
(Namespace(mode='train', config_yml=PosixPath('config.yml'), identifier='', debug=False, run_dir='./', print_every=10, seed=0, amp=True, checkpoint='gnoc_oc22_oc20_all_s2ef.pt', timestamp_id=None, sweep_yml=None, submit=False, summit=False, logdir=PosixPath('logs'), slurm_partition='ocp', slurm_mem=80, slurm_timeout=72, num_gpus=1, distributed=False, cpu=False, num_nodes=1, distributed_port=13356, distributed_backend='nccl', local_rank=0, no_ddp=False, gp_gpus=None),
[])
Next, we build the first stage of our config. This starts with the file config.yml, then updates it with the args.
from ocpmodels.common.utils import build_config, new_trainer_context
config = build_config(args=args, args_override={})
config
{'amp': True,
 'checkpoint': 'gnoc_oc22_oc20_all_s2ef.pt',
 'dataset': {'test': {'a2g_args': {'r_energy': False, 'r_forces': False},
                      'src': 'test.db'},
             'train': {'a2g_args': {'r_energy': True, 'r_forces': True},
                       'src': 'train.db'},
             'val': {'a2g_args': {'r_energy': True, 'r_forces': True}, 'src': 'val.db'}},
 'gpus': 1,
 'model': {'activation': 'silu',
           'atom_edge_interaction': True,
           'atom_interaction': True,
           'cbf': {'name': 'spherical_harmonics'},
           'cutoff': 12.0,
           'cutoff_aeaint': 12.0,
           'cutoff_aint': 12.0,
           'cutoff_qint': 12.0,
           'direct_forces': True,
           'edge_atom_interaction': True,
           'emb_size_aint_in': 64,
           'emb_size_aint_out': 64,
           'emb_size_atom': 256,
           'emb_size_cbf': 16,
           'emb_size_edge': 512,
           'emb_size_quad_in': 32,
           'emb_size_quad_out': 32,
           'emb_size_rbf': 16,
           'emb_size_sbf': 32,
           'emb_size_trip_in': 64,
           'emb_size_trip_out': 64,
           'envelope': {'exponent': 5, 'name': 'polynomial'},
           'extensive': True,
           'forces_coupled': False,
           'max_neighbors': 30,
           'max_neighbors_aeaint': 20,
           'max_neighbors_aint': 1000,
           'max_neighbors_qint': 8,
           'name': 'gemnet_oc',
           'num_after_skip': 2,
           'num_atom': 3,
           'num_atom_emb_layers': 2,
           'num_before_skip': 2,
           'num_blocks': 4,
           'num_concat': 1,
           'num_global_out_layers': 2,
           'num_output_afteratom': 3,
           'num_radial': 128,
           'num_spherical': 7,
           'otf_graph': True,
           'output_init': 'HeOrthogonal',
           'qint_tags': [1, 2],
           'quad_interaction': True,
           'rbf': {'name': 'gaussian'},
           'regress_forces': True,
           'sbf': {'name': 'legendre_outer'},
           'symmetric_edge_symmetrization': False},
 'noddp': False,
 'optim': {'batch_size': 16,
           'clip_grad_norm': 10,
           'ema_decay': 0.999,
           'energy_coefficient': 1,
           'eval_batch_size': 16,
           'eval_every': 1,
           'factor': 0.8,
           'force_coefficient': 1,
           'load_balancing': 'atoms',
           'loss_energy': 'mae',
           'lr_initial': 0.0005,
           'max_epochs': 5,
           'mode': 'min',
           'num_workers': 2,
           'optimizer': 'AdamW',
           'optimizer_params': {'amsgrad': True},
           'patience': 3,
           'scheduler': 'ReduceLROnPlateau',
           'weight_decay': 0},
 'task': {'dataset': 'ase_db'},
 'trainer': 'forces',
 'mode': 'train',
 'identifier': '',
 'timestamp_id': None,
 'seed': 0,
 'is_debug': False,
 'run_dir': './',
 'print_every': 10,
 'cpu': False,
 'submit': False,
 'summit': False,
 'local_rank': 0,
 'distributed_port': 13356,
 'world_size': 1,
 'distributed_backend': 'nccl',
 'gp_gpus': None}
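At this point config is an ordinary nested dictionary, so you can still adjust values programmatically before launching. For example, a hypothetical last-minute override (not part of this run):

# shrink the batch size, e.g. if you hit GPU out-of-memory errors
config['optim']['batch_size'] = 8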
Run the training task#
It is still annoying that the notebook cannot be saved if your output is too large, but it is also annoying to simply capture and lose the output.
We were able to redirect most logging to a file above, but not all of it. The link below will open that file in a browser, and the subsequent cell captures all residual output. We do not need any of that output, so it is ultimately discarded.
Alternatively, you can open a terminal and use tail -f out.txt to follow the progress.
from IPython.display import display, FileLink
display(FileLink('out.txt'))
with new_trainer_context(config=config, args=args) as ctx:
    config = ctx.config
    task = ctx.task
    trainer = ctx.trainer
    task.setup(trainer)
    task.run()
amp: true
cmd:
  checkpoint_dir: ./checkpoints/2023-08-01-17-42-24
  commit: 3973c79
  identifier: ''
  logs_dir: ./logs/tensorboard/2023-08-01-17-42-24
  print_every: 10
  results_dir: ./results/2023-08-01-17-42-24
  seed: 0
  timestamp_id: 2023-08-01-17-42-24
dataset:
  a2g_args:
    r_energy: true
    r_forces: true
  src: train.db
gpus: 1
logger: tensorboard
model: gemnet_oc
model_attributes:
  activation: silu
  atom_edge_interaction: true
  atom_interaction: true
  cbf:
    name: spherical_harmonics
  cutoff: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  cutoff_qint: 12.0
  direct_forces: true
  edge_atom_interaction: true
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_atom: 256
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_rbf: 16
  emb_size_sbf: 32
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  forces_coupled: false
  max_neighbors: 30
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  max_neighbors_qint: 8
  num_after_skip: 2
  num_atom: 3
  num_atom_emb_layers: 2
  num_before_skip: 2
  num_blocks: 4
  num_concat: 1
  num_global_out_layers: 2
  num_output_afteratom: 3
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  qint_tags:
  - 1
  - 2
  quad_interaction: true
  rbf:
    name: gaussian
  regress_forces: true
  sbf:
    name: legendre_outer
  symmetric_edge_symmetrization: false
noddp: false
optim:
  batch_size: 16
  clip_grad_norm: 10
  ema_decay: 0.999
  energy_coefficient: 1
  eval_batch_size: 16
  eval_every: 1
  factor: 0.8
  force_coefficient: 1
  load_balancing: atoms
  loss_energy: mae
  lr_initial: 0.0005
  max_epochs: 5
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
  weight_decay: 0
slurm: {}
task:
  dataset: ase_db
test_dataset:
  a2g_args:
    r_energy: false
    r_forces: false
  src: test.db
trainer: forces
val_dataset:
  a2g_args:
    r_energy: true
    r_forces: true
  src: val.db
/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/fine-tuning/ocp/ocpmodels/datasets/ase_datasets.py:108: UserWarning: Supplied sid is not numeric (or missing). Using dataset indices instead.
warnings.warn(
device 0: 100%|██████████| 2/2 [00:00<00:00, 4.65it/s]
device 0: 100%|██████████| 2/2 [00:00<00:00, 5.31it/s]
...
device 0: 100%|██████████| 2/2 [00:00<00:00, 5.95it/s]
! head out.txt
2023-08-01 17:18:26 (WARNING): Unrecognized arguments: ['symmetric_edge_symmetrization']
2023-08-01 17:18:34 (WARNING): Unable to identify OCP trainer, defaulting to `forces`. Specify the `trainer` argument into OCPCalculator if otherwise.
2023-08-01 17:18:34 (WARNING): Unrecognized arguments: ['symmetric_edge_symmetrization']
2023-08-01 17:18:38 (WARNING): Unrecognized arguments: ['symmetric_edge_symmetrization']
2023-08-01 17:18:41 (WARNING): Model gradient logging to tensorboard not yet supported.
pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
2023-08-01 17:18:34 (INFO): Loading dataset: oc22_lmdb
2023-08-01 17:18:34 (INFO): Loading model: gemnet_oc
2023-08-01 17:18:37 (INFO): Loaded GemNetOC with 38864438 parameters.
2023-08-01 17:18:37 (INFO): Loading checkpoint from: gnoc_oc22_oc20_all_s2ef.pt
! tail out.txt
2023-08-01 17:20:21 (INFO): forcesx_mae: 0.0091, forcesy_mae: 0.0103, forcesz_mae: 0.0080, forces_mae: 0.0091, forces_cos: 0.0461, forces_magnitude: 0.0147, energy_mae: 0.8628, energy_force_within_threshold: 0.0000, loss: 0.8812, epoch: 4.7333
2023-08-01 17:20:22 (INFO): Evaluating on val.
2023-08-01 17:20:22 (INFO): forcesx_mae: 0.0091, forcesy_mae: 0.0103, forcesz_mae: 0.0079, forces_mae: 0.0091, forces_cos: 0.0441, forces_magnitude: 0.0146, energy_mae: 0.8308, energy_force_within_threshold: 0.0000, loss: 0.8492, epoch: 4.8000
2023-08-01 17:20:23 (INFO): Evaluating on val.
2023-08-01 17:20:24 (INFO): forcesx_mae: 0.0090, forcesy_mae: 0.0103, forcesz_mae: 0.0079, forces_mae: 0.0090, forces_cos: 0.0411, forces_magnitude: 0.0146, energy_mae: 0.7813, energy_force_within_threshold: 0.0333, loss: 0.7999, epoch: 4.8667
2023-08-01 17:20:25 (INFO): Evaluating on val.
2023-08-01 17:20:25 (INFO): forcesx_mae: 0.0090, forcesy_mae: 0.0102, forcesz_mae: 0.0077, forces_mae: 0.0090, forces_cos: 0.0638, forces_magnitude: 0.0145, energy_mae: 0.7215, energy_force_within_threshold: 0.0000, loss: 0.7410, epoch: 4.9333
2023-08-01 17:20:26 (INFO): Evaluating on val.
2023-08-01 17:20:26 (INFO): forcesx_mae: 0.0089, forcesy_mae: 0.0101, forcesz_mae: 0.0076, forces_mae: 0.0089, forces_cos: 0.0619, forces_magnitude: 0.0145, energy_mae: 0.6652, energy_force_within_threshold: 0.0000, loss: 0.6855, epoch: 5.0000
2023-08-01 17:20:27 (INFO): Total time taken: 105.35927224159241
Now you are all set to carry on with whatever subsequent analysis you want to do.
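For example, you can load the fine-tuned weights back into a calculator. A sketch: the directory comes from the checkpoint_dir printed in the run config above, and we assume the checkpoint.pt filename the trainer normally writes there:

from ocpmodels.common.relaxation.ase_utils import OCPCalculator

# adjust the timestamped path for your own run
newcalc = OCPCalculator(checkpoint='./checkpoints/2023-08-01-17-42-24/checkpoint.pt',
                        cpu=False)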