Fine-tuning with Python#

The recommended way to do training is with the main.py script in ocp, mainly because training often takes a long time and is better suited to queue systems like Slurm. However, you can also submit Python scripts, and it is even possible to run notebooks under Slurm. Here we work out a proof of concept for training from Python and a Jupyter notebook.

%run ../ocp-tutorial.ipynb
import logging
from ocpmodels.common.utils import SeverityLevelBetween

root = logging.getLogger()
root.setLevel(logging.INFO)

log_formatter = logging.Formatter(
    "%(asctime)s (%(levelname)s): %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Send INFO (and anything below WARNING) to out.txt
handler_out = logging.FileHandler('out.txt', 'w')
handler_out.addFilter(
    SeverityLevelBetween(logging.INFO, logging.WARNING)
)
handler_out.setFormatter(log_formatter)
root.addHandler(handler_out)

# Send WARNING and higher to out.txt as well. Use append mode so this
# handler does not truncate the file the first handler just opened.
handler_err = logging.FileHandler('out.txt', 'a')
handler_err.setLevel(logging.WARNING)
handler_err.setFormatter(log_formatter)
root.addHandler(handler_err)
! ase db ../fine-tuning/oxides.db
id| age|formula|calculator| energy|natoms| fmax|pbc| volume|charge|   mass
 1|224m|Sn2O4  |unknown   |-41.359|     6|0.045|TTT| 64.258| 0.000|301.416
 2|224m|Sn2O4  |unknown   |-41.853|     6|0.025|TTT| 66.526| 0.000|301.416
 3|224m|Sn2O4  |unknown   |-42.199|     6|0.010|TTT| 68.794| 0.000|301.416
 4|224m|Sn2O4  |unknown   |-42.419|     6|0.006|TTT| 71.062| 0.000|301.416
 5|224m|Sn2O4  |unknown   |-42.534|     6|0.011|TTT| 73.330| 0.000|301.416
 6|224m|Sn2O4  |unknown   |-42.562|     6|0.029|TTT| 75.598| 0.000|301.416
 7|224m|Sn2O4  |unknown   |-42.518|     6|0.033|TTT| 77.866| 0.000|301.416
 8|224m|Sn2O4  |unknown   |-42.415|     6|0.010|TTT| 80.134| 0.000|301.416
 9|224m|Sn2O4  |unknown   |-42.266|     6|0.006|TTT| 82.402| 0.000|301.416
10|224m|Sn2O4  |unknown   |-42.083|     6|0.017|TTT| 84.670| 0.000|301.416
11|224m|Sn4O8  |unknown   |-81.424|    12|0.012|TTT|117.473| 0.000|602.832
12|224m|Sn4O8  |unknown   |-82.437|    12|0.005|TTT|121.620| 0.000|602.832
13|224m|Sn4O8  |unknown   |-83.147|    12|0.015|TTT|125.766| 0.000|602.832
14|224m|Sn4O8  |unknown   |-83.599|    12|0.047|TTT|129.912| 0.000|602.832
15|224m|Sn4O8  |unknown   |-83.831|    12|0.081|TTT|134.058| 0.000|602.832
16|224m|Sn4O8  |unknown   |-83.898|    12|0.001|TTT|138.204| 0.000|602.832
17|224m|Sn4O8  |unknown   |-83.805|    12|0.001|TTT|142.350| 0.000|602.832
18|224m|Sn4O8  |unknown   |-83.586|    12|0.002|TTT|146.496| 0.000|602.832
19|224m|Sn4O8  |unknown   |-83.262|    12|0.002|TTT|150.642| 0.000|602.832
20|224m|Sn4O8  |unknown   |-82.851|    12|0.013|TTT|154.788| 0.000|602.832
Rows: 295 (showing first 20)
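
If you want to inspect the data from Python instead, here is a minimal sketch using the standard ase.db API (connect, select, and toatoms are plain ASE calls):

from ase.db import connect

# Open the database and reconstruct a few rows as Atoms objects.
db = connect('../fine-tuning/oxides.db')
for row in db.select(limit=3):
    atoms = row.toatoms()  # rebuild the Atoms object stored in this row
    print(row.formula, row.energy, atoms.get_volume())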
checkpoint = get_checkpoint('GemNet-OC OC20+OC22')
from ocpmodels.common.relaxation.ase_utils import OCPCalculator
calc = OCPCalculator(checkpoint=checkpoint, trainer='forces', cpu=False)
amp: true
cmd:
  checkpoint_dir: /home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/checkpoints/2023-08-01-17-18-56
  commit: 3973c79
  identifier: ''
  logs_dir: /home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/logs/tensorboard/2023-08-01-17-18-56
  print_every: 100
  results_dir: /home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/results/2023-08-01-17-18-56
  seed: null
  timestamp_id: 2023-08-01-17-18-56
dataset: null
gpus: 1
logger: tensorboard
model: gemnet_oc
model_attributes:
  activation: silu
  atom_edge_interaction: true
  atom_interaction: true
  cbf:
    name: spherical_harmonics
  cutoff: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  cutoff_qint: 12.0
  direct_forces: true
  edge_atom_interaction: true
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_atom: 256
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_rbf: 16
  emb_size_sbf: 32
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  forces_coupled: false
  max_neighbors: 30
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  max_neighbors_qint: 8
  num_after_skip: 2
  num_atom: 3
  num_atom_emb_layers: 2
  num_before_skip: 2
  num_blocks: 4
  num_concat: 1
  num_global_out_layers: 2
  num_output_afteratom: 3
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  qint_tags:
  - 1
  - 2
  quad_interaction: true
  rbf:
    name: gaussian
  regress_forces: true
  sbf:
    name: legendre_outer
  symmetric_edge_symmetrization: false
noddp: false
optim:
  batch_size: 16
  clip_grad_norm: 10
  ema_decay: 0.999
  energy_coefficient: 1
  eval_batch_size: 16
  eval_every: 5000
  factor: 0.8
  force_coefficient: 1
  load_balancing: atoms
  loss_energy: mae
  loss_force: atomwisel2
  lr_initial: 0.0005
  max_epochs: 80
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
  weight_decay: 0
slurm:
  additional_parameters:
    constraint: volta32gb
  cpus_per_task: 3
  folder: /checkpoint/abhshkdz/ocp_oct1_logs/57632342
  gpus_per_node: 8
  job_id: '57632342'
  job_name: gnoc_oc22_oc20_all_s2ef
  mem: 480GB
  nodes: 8
  ntasks_per_node: 8
  partition: ocp,learnaccel
  time: 4320
task:
  dataset: oc22_lmdb
  description: Regressing to energies and forces for DFT trajectories from OCP
  eval_on_free_atoms: true
  grad_input: atomic forces
  labels:
  - potential energy
  metric: mae
  primary_metric: forces_mae
  train_on_free_atoms: true
  type: regression
trainer: forces
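
As a quick check that the pretrained calculator loaded correctly, you can attach it to a structure from the database and evaluate it. This is a sketch using standard ASE calls; the predicted energy will not match the DFT values above until the model has been fine-tuned.

from ase.db import connect

# Evaluate one structure from the database with the pretrained OCP calculator.
atoms = connect('../fine-tuning/oxides.db').get_atoms(id=1)
atoms.calc = calc
print(atoms.get_potential_energy())  # predicted energy in eV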

Split the data into train, test, val sets#

! rm -fr train.db test.db val.db
train, test, val = train_test_val_split('../fine-tuning/oxides.db')
train, test, val
(PosixPath('/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/train.db'),
 PosixPath('/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/test.db'),
 PosixPath('/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/val.db'))
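
The splits should together contain all 295 rows we saw above. A quick sanity check (count is the standard ase.db row-count method):

from ase.db import connect

# Count the rows in each split database.
for db in (train, test, val):
    print(db.name, connect(str(db)).count())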

Set up the training code#

We start by making the config.yml. We build this from the calculator checkpoint.

yml = generate_yml_config(checkpoint, 'config.yml',
                   delete=['slurm', 'cmd', 'logger', 'task', 'model_attributes',
                           'optim.loss_force', # the checkpoint setting causes an error
                           'dataset', 'test_dataset', 'val_dataset'],
                   update={'gpus': 1,
                           'task.dataset': 'ase_db',
                           'optim.eval_every': 1,
                           'optim.max_epochs': 5,
                           # Train data
                           'dataset.train.src': 'train.db',
                           'dataset.train.a2g_args.r_energy': True,
                           'dataset.train.a2g_args.r_forces': True,
                           # Test data - prediction only, so no regression targets
                           'dataset.test.src': 'test.db',
                           'dataset.test.a2g_args.r_energy': False,
                           'dataset.test.a2g_args.r_forces': False,
                           # Val data
                           'dataset.val.src': 'val.db',
                           'dataset.val.a2g_args.r_energy': True,
                           'dataset.val.a2g_args.r_forces': True,
                          })

yml
PosixPath('/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/advanced/config.yml')
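
Since the config is plain YAML, you can load it back in to verify that the updates took effect. A sketch using PyYAML, assuming generate_yml_config nests the dotted keys the way the build_config output below shows:

import yaml

# Read the generated config back in and spot-check fields we set above.
with open(yml) as f:
    cfg = yaml.safe_load(f)
print(cfg['optim']['max_epochs'], cfg['dataset']['train']['src'])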

Set up the training task#

This gives you several opportunities to define and override the config. You start with the base config.yml, and then via “command-line” arguments you specify the changes you want to make.

The code is built around submitit, which is often used with Slurm but also works locally.

We have to mimic the main.py setup to get the arguments and config set up. Here is a minimal way to do this.

from ocpmodels.common.flags import flags
parser = flags.get_parser()
args, args_override = parser.parse_known_args(["--mode=train",                                            
                                               "--config-yml=config.yml", 
                                               f"--checkpoint={checkpoint}",
                                               "--amp"])
args, args_override
(Namespace(mode='train', config_yml=PosixPath('config.yml'), identifier='', debug=False, run_dir='./', print_every=10, seed=0, amp=True, checkpoint='gnoc_oc22_oc20_all_s2ef.pt', timestamp_id=None, sweep_yml=None, submit=False, summit=False, logdir=PosixPath('logs'), slurm_partition='ocp', slurm_mem=80, slurm_timeout=72, num_gpus=1, distributed=False, cpu=False, num_nodes=1, distributed_port=13356, distributed_backend='nccl', local_rank=0, no_ddp=False, gp_gpus=None),
 [])

Next, we build the first stage of our config. This starts with the file config.yml and then updates it with the args.

from ocpmodels.common.utils import build_config, new_trainer_context

config = build_config(args=args, args_override={})
config
{'amp': True,
 'checkpoint': 'gnoc_oc22_oc20_all_s2ef.pt',
 'dataset': {'test': {'a2g_args': {'r_energy': False, 'r_forces': False},
   'src': 'test.db'},
  'train': {'a2g_args': {'r_energy': True, 'r_forces': True},
   'src': 'train.db'},
  'val': {'a2g_args': {'r_energy': True, 'r_forces': True}, 'src': 'val.db'}},
 'gpus': 1,
 'model': {'activation': 'silu',
  'atom_edge_interaction': True,
  'atom_interaction': True,
  'cbf': {'name': 'spherical_harmonics'},
  'cutoff': 12.0,
  'cutoff_aeaint': 12.0,
  'cutoff_aint': 12.0,
  'cutoff_qint': 12.0,
  'direct_forces': True,
  'edge_atom_interaction': True,
  'emb_size_aint_in': 64,
  'emb_size_aint_out': 64,
  'emb_size_atom': 256,
  'emb_size_cbf': 16,
  'emb_size_edge': 512,
  'emb_size_quad_in': 32,
  'emb_size_quad_out': 32,
  'emb_size_rbf': 16,
  'emb_size_sbf': 32,
  'emb_size_trip_in': 64,
  'emb_size_trip_out': 64,
  'envelope': {'exponent': 5, 'name': 'polynomial'},
  'extensive': True,
  'forces_coupled': False,
  'max_neighbors': 30,
  'max_neighbors_aeaint': 20,
  'max_neighbors_aint': 1000,
  'max_neighbors_qint': 8,
  'name': 'gemnet_oc',
  'num_after_skip': 2,
  'num_atom': 3,
  'num_atom_emb_layers': 2,
  'num_before_skip': 2,
  'num_blocks': 4,
  'num_concat': 1,
  'num_global_out_layers': 2,
  'num_output_afteratom': 3,
  'num_radial': 128,
  'num_spherical': 7,
  'otf_graph': True,
  'output_init': 'HeOrthogonal',
  'qint_tags': [1, 2],
  'quad_interaction': True,
  'rbf': {'name': 'gaussian'},
  'regress_forces': True,
  'sbf': {'name': 'legendre_outer'},
  'symmetric_edge_symmetrization': False},
 'noddp': False,
 'optim': {'batch_size': 16,
  'clip_grad_norm': 10,
  'ema_decay': 0.999,
  'energy_coefficient': 1,
  'eval_batch_size': 16,
  'eval_every': 1,
  'factor': 0.8,
  'force_coefficient': 1,
  'load_balancing': 'atoms',
  'loss_energy': 'mae',
  'lr_initial': 0.0005,
  'max_epochs': 5,
  'mode': 'min',
  'num_workers': 2,
  'optimizer': 'AdamW',
  'optimizer_params': {'amsgrad': True},
  'patience': 3,
  'scheduler': 'ReduceLROnPlateau',
  'weight_decay': 0},
 'task': {'dataset': 'ase_db'},
 'trainer': 'forces',
 'mode': 'train',
 'identifier': '',
 'timestamp_id': None,
 'seed': 0,
 'is_debug': False,
 'run_dir': './',
 'print_every': 10,
 'cpu': False,
 'submit': False,
 'summit': False,
 'local_rank': 0,
 'distributed_port': 13356,
 'world_size': 1,
 'distributed_backend': 'nccl',
 'gp_gpus': None}

Run the training task#

It is still annoying that if your output is too large, the notebook cannot be saved. On the other hand, it is also inconvenient to simply capture and discard all the output.

We were able to redirect most of the logging to a file above, but not all of it. The link below will open that file in a browser, and the subsequent cell captures the residual output. We do not need any of it, so it is ultimately discarded.

Alternatively, you can open a Terminal and use tail -f out.txt to see the progress.

from IPython.display import display, FileLink
display(FileLink('out.txt'))
with new_trainer_context(config=config, args=args) as ctx:
    config = ctx.config
    task = ctx.task
    trainer = ctx.trainer
    task.setup(trainer)  # attach the trainer to the training task
    task.run()           # run the fine-tuning
amp: true
cmd:
  checkpoint_dir: ./checkpoints/2023-08-01-17-42-24
  commit: 3973c79
  identifier: ''
  logs_dir: ./logs/tensorboard/2023-08-01-17-42-24
  print_every: 10
  results_dir: ./results/2023-08-01-17-42-24
  seed: 0
  timestamp_id: 2023-08-01-17-42-24
dataset:
  a2g_args:
    r_energy: true
    r_forces: true
  src: train.db
gpus: 1
logger: tensorboard
model: gemnet_oc
model_attributes:
  activation: silu
  atom_edge_interaction: true
  atom_interaction: true
  cbf:
    name: spherical_harmonics
  cutoff: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  cutoff_qint: 12.0
  direct_forces: true
  edge_atom_interaction: true
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_atom: 256
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_rbf: 16
  emb_size_sbf: 32
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  forces_coupled: false
  max_neighbors: 30
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  max_neighbors_qint: 8
  num_after_skip: 2
  num_atom: 3
  num_atom_emb_layers: 2
  num_before_skip: 2
  num_blocks: 4
  num_concat: 1
  num_global_out_layers: 2
  num_output_afteratom: 3
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  qint_tags:
  - 1
  - 2
  quad_interaction: true
  rbf:
    name: gaussian
  regress_forces: true
  sbf:
    name: legendre_outer
  symmetric_edge_symmetrization: false
noddp: false
optim:
  batch_size: 16
  clip_grad_norm: 10
  ema_decay: 0.999
  energy_coefficient: 1
  eval_batch_size: 16
  eval_every: 1
  factor: 0.8
  force_coefficient: 1
  load_balancing: atoms
  loss_energy: mae
  lr_initial: 0.0005
  max_epochs: 5
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
  weight_decay: 0
slurm: {}
task:
  dataset: ase_db
test_dataset:
  a2g_args:
    r_energy: false
    r_forces: false
  src: test.db
trainer: forces
val_dataset:
  a2g_args:
    r_energy: true
    r_forces: true
  src: val.db
/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/fine-tuning/ocp/ocpmodels/datasets/ase_datasets.py:108: UserWarning: Supplied sid is not numeric (or missing). Using dataset indices instead.
  warnings.warn(
device 0: 100%|██████████| 2/2 [00:00<00:00,  4.65it/s]
device 0: 100%|██████████| 2/2 [00:00<00:00,  5.31it/s]
... (similar progress bars repeated for the remaining evaluation passes) ...
device 0: 100%|██████████| 2/2 [00:00<00:00,  5.95it/s]
! head out.txt
2023-08-01 17:18:26 (WARNING): Unrecognized arguments: ['symmetric_edge_symmetrization']
2023-08-01 17:18:34 (WARNING): Unable to identify OCP trainer, defaulting to `forces`. Specify the `trainer` argument into OCPCalculator if otherwise.
2023-08-01 17:18:34 (WARNING): Unrecognized arguments: ['symmetric_edge_symmetrization']
2023-08-01 17:18:38 (WARNING): Unrecognized arguments: ['symmetric_edge_symmetrization']
2023-08-01 17:18:41 (WARNING): Model gradient logging to tensorboard not yet supported.
pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
2023-08-01 17:18:34 (INFO): Loading dataset: oc22_lmdb
2023-08-01 17:18:34 (INFO): Loading model: gemnet_oc
2023-08-01 17:18:37 (INFO): Loaded GemNetOC with 38864438 parameters.
2023-08-01 17:18:37 (INFO): Loading checkpoint from: gnoc_oc22_oc20_all_s2ef.pt
! tail out.txt
2023-08-01 17:20:21 (INFO): forcesx_mae: 0.0091, forcesy_mae: 0.0103, forcesz_mae: 0.0080, forces_mae: 0.0091, forces_cos: 0.0461, forces_magnitude: 0.0147, energy_mae: 0.8628, energy_force_within_threshold: 0.0000, loss: 0.8812, epoch: 4.7333
2023-08-01 17:20:22 (INFO): Evaluating on val.
2023-08-01 17:20:22 (INFO): forcesx_mae: 0.0091, forcesy_mae: 0.0103, forcesz_mae: 0.0079, forces_mae: 0.0091, forces_cos: 0.0441, forces_magnitude: 0.0146, energy_mae: 0.8308, energy_force_within_threshold: 0.0000, loss: 0.8492, epoch: 4.8000
2023-08-01 17:20:23 (INFO): Evaluating on val.
2023-08-01 17:20:24 (INFO): forcesx_mae: 0.0090, forcesy_mae: 0.0103, forcesz_mae: 0.0079, forces_mae: 0.0090, forces_cos: 0.0411, forces_magnitude: 0.0146, energy_mae: 0.7813, energy_force_within_threshold: 0.0333, loss: 0.7999, epoch: 4.8667
2023-08-01 17:20:25 (INFO): Evaluating on val.
2023-08-01 17:20:25 (INFO): forcesx_mae: 0.0090, forcesy_mae: 0.0102, forcesz_mae: 0.0077, forces_mae: 0.0090, forces_cos: 0.0638, forces_magnitude: 0.0145, energy_mae: 0.7215, energy_force_within_threshold: 0.0000, loss: 0.7410, epoch: 4.9333
2023-08-01 17:20:26 (INFO): Evaluating on val.
2023-08-01 17:20:26 (INFO): forcesx_mae: 0.0089, forcesy_mae: 0.0101, forcesz_mae: 0.0076, forces_mae: 0.0089, forces_cos: 0.0619, forces_magnitude: 0.0145, energy_mae: 0.6652, energy_force_within_threshold: 0.0000, loss: 0.6855, epoch: 5.0000
2023-08-01 17:20:27 (INFO): Total time taken: 105.35927224159241

Now, you are all set to carry on with whatever subsequent analysis you want to do.
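
For example, you can load the fine-tuned checkpoint into a new calculator and use it like we did above. This is a sketch; the checkpoint.pt file name inside the timestamped checkpoint_dir is an assumption about the trainer's default naming, so adjust the path to match your run.

# Load the fine-tuned model; the path comes from the checkpoint_dir printed
# in the training output above (the file name is an assumed default).
newcalc = OCPCalculator(checkpoint='./checkpoints/2023-08-01-17-42-24/checkpoint.pt',
                        trainer='forces', cpu=False)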