



This repository hosts the adsorbate-catalyst input generation workflow used in the Open Catalyst Project.


The easiest way to install prerequisites is via conda. After installing conda, run the following commands:


The codebase supports the following workflow to generate adsorbate-catalyst input configurations.

  1. Initialize a bulk:
    • By providing an atoms object, or
    • By bulk_id (e.g. mp-30), or
    • By its index in the database, or
    • By selecting randomly.
  2. Initialize an adsorbate:
    • By providing an atoms object, or
    • By its SMILES string (e.g. *H), or
    • By its index in the database, or
    • By selecting randomly.
  3. Enumerate slabs from the Bulk class.
    This internally uses pymatgen.core.surface.SlabGenerator and supports the following:
    • All slabs up to a specified miller index, or
    • A random slab among those enumerated by the previous method, or
    • A specific miller index.
  4. Place the adsorbate on the slab.
    This broadly has two steps: identifying a binding site on the surface of the slab, and orienting the adsorbate before placing it at that site. We use custom code inspired by pymatgen to do this. There are three modes: heuristic, random, and random_site_heuristic_placement.
    • Identifying a binding site: First, a Delaunay triangulation is constructed with surface atoms as nodes. For heuristic, the sites considered are on a node (atop), between two nodes (bridge), and in the center of a triangle (hollow). For random and random_site_heuristic_placement, site positions are sampled uniformly at random over the Delaunay triangles.
    • Adsorbate orientation: For heuristic and random_site_heuristic_placement, the adsorbate is rotated uniformly at random around the z direction and given a slight wobble around x and y, which amounts to a randomized tilt within a certain cone around the north pole. For random, the adsorbate is rotated uniformly at random about its center of mass in all directions.
    • Binding atom: The adsorbate database records which atoms are expected to bind. For heuristic and random_site_heuristic_placement, the binding atom of the adsorbate is placed at the site, whereas for random the center of mass of the adsorbate is placed at the site.
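
The site-identification step in item 4 can be sketched for a single Delaunay triangle. This is a minimal illustration, not the ocdata implementation: the helper names `heuristic_sites` and `random_site` are hypothetical, the coordinates are plain 2D tuples, and the hollow site is assumed to be the triangle centroid.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def heuristic_sites(a: Point, b: Point, c: Point) -> Dict[str, List[Point]]:
    """Atop, bridge, and hollow sites for one triangle of surface atoms."""

    def mid(p: Point, q: Point) -> Point:
        return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

    # Assumption: "center of the triangle" is taken to be the centroid.
    centroid = ((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)
    return {
        "atop": [a, b, c],                            # on each node
        "bridge": [mid(a, b), mid(b, c), mid(c, a)],  # between two nodes
        "hollow": [centroid],                         # center of the triangle
    }


def random_site(a: Point, b: Point, c: Point, rng: random.Random) -> Point:
    """Sample a point uniformly at random inside the triangle (a, b, c)."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        # Reflect back into the triangle so the density stays uniform.
        u, v = 1.0 - u, 1.0 - v
    return (
        a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
        a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]),
    )
```

Calling `heuristic_sites` on every triangle of the triangulation and removing duplicates would give the candidate sites for the heuristic mode; repeated calls to `random_site` correspond to the random modes.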

Workflow image
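
The orientation rule for the heuristic and random_site_heuristic_placement modes (free rotation about z, then a small wobble about x and y) can be sketched as follows. This is an illustration under assumptions rather than the code used in ocdata: the function names are hypothetical, and the 10 degree wobble limit is a placeholder value, not one taken from the codebase.

```python
import math
import random
from typing import Tuple

Vec = Tuple[float, float, float]


def rotate(v: Vec, axis: str, angle_deg: float) -> Vec:
    """Rotate v about the x, y, or z axis by angle_deg (right-handed)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    x, y, z = v
    if axis == "x":
        return (x, c * y - s * z, s * y + c * z)
    if axis == "y":
        return (c * x + s * z, y, -s * x + c * z)
    if axis == "z":
        return (c * x - s * y, s * x + c * y, z)
    raise ValueError(f"unknown axis: {axis}")


def heuristic_orientation(
    v: Vec, rng: random.Random, max_wobble_deg: float = 10.0
) -> Vec:
    """Apply a uniform rotation about z, then small wobbles about x and y.

    max_wobble_deg is a hypothetical parameter chosen for illustration.
    """
    v = rotate(v, "z", rng.uniform(0.0, 360.0))
    v = rotate(v, "x", rng.uniform(-max_wobble_deg, max_wobble_deg))
    v = rotate(v, "y", rng.uniform(-max_wobble_deg, max_wobble_deg))
    return v
```

Applied to the adsorbate's "up" vector (0, 0, 1), the result stays within a cone around the z axis, which is the randomized tilt described in item 4. Applying the same three rotations to every atom position relative to the binding atom would orient the whole adsorbate.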


Here is a simple example using the ocdata workflow to place CO on Cu (1,1,1):

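# Import path assumed from the package layout (ocdata.core); adjust if your install differs.
from ocdata.core import Adsorbate, AdsorbateSlabConfig, Bulk, Slab

bulk_src_id = "mp-30"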
adsorbate_smiles = "*CO"

bulk = Bulk(bulk_src_id_from_db=bulk_src_id, bulk_db_path="your-path-here.pkl")
adsorbate = Adsorbate(adsorbate_smiles_from_db=adsorbate_smiles, adsorbate_db_path="your-path-here.pkl")
slabs = Slab.from_bulk_get_specific_millers(bulk=bulk, specific_millers=(1,1,1))

# Perform heuristic placements
heuristic_adslabs = AdsorbateSlabConfig(slabs[0], adsorbate, mode="heuristic")

# Perform random site, heuristic placements
random_adslabs = AdsorbateSlabConfig(slabs[0], adsorbate, mode="random_site_heuristic_placement", num_sites=100)

If you want to use a bulk and/or adsorbate that is not in the database, you may supply your own ase.Atoms object:

bulk = Bulk(bulk_atoms=your_bulk_atoms)
adsorbate = Adsorbate(adsorbate_atoms=your_adsorbate_atoms)
slabs = Slab.from_bulk_get_all_slabs(bulk)

# Perform fully random placements
random_adslabs = AdsorbateSlabConfig(slabs[0], adsorbate, mode="random", num_sites=100)

If you would like to randomly choose a bulk, adsorbate, and slab:

bulk = Bulk()
adsorbate = Adsorbate()
slab = Slab.from_bulk_get_random_slab(bulk)

# Perform fully random placements
random_adslabs = AdsorbateSlabConfig(slab, adsorbate, mode="random", num_sites=100)

StructureGenerator API

We also provide a StructureGenerator helper class that wraps the core functionality described above for creating bulk/slab/adsorbate objects, and for writing VASP input files and metadata for multiple placements of the adsorbate on the slab. There are a number of options to configure input generation to suit different use cases. We list a few examples here.

Command Line Args

Input files:

Bulk / Slab / Adsorbate specification

Option 1: provide indices. All three must be provided to generate adsorbate-slab configurations, otherwise only slab enumeration will be performed.

Option 2: provide a set of indices (one of the following)

Slab enumeration

Adsorbate Placement

Multiprocessing, when given a file of indices



python \
  --bulk_db databases/pkls/bulks.pkl \
  --adsorbate_db databases/pkls/adsorbates.pkl \
  --output_dir outputs/ \
  --adsorbate_index 0 \
  --bulk_index 0 \
  --surface_index 0

python \
  --bulk_db databases/pkls/bulks.pkl \
  --adsorbate_db databases/pkls/adsorbates.pkl \
  --indices_file your_index_file.txt \
  --seed 0 \
  --random_placements \
  --random_sites 100
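
The rule stated under Option 1 (all three indices are needed to generate adsorbate-slab configurations, otherwise only slabs are enumerated) can be sketched with argparse. This is an illustration under assumptions, not the project's actual script: only the flag names are taken from the commands above, while the argument types, the defaults, the `build_parser` and `task_for` helpers, and the returned labels are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser that mirrors a subset of the flags shown above."""
    parser = argparse.ArgumentParser(description="Sketch only; not the ocdata CLI.")
    parser.add_argument("--bulk_db")
    parser.add_argument("--adsorbate_db")
    parser.add_argument("--output_dir")
    # Assumption: indices are integers and default to "not provided".
    parser.add_argument("--bulk_index", type=int, default=None)
    parser.add_argument("--surface_index", type=int, default=None)
    parser.add_argument("--adsorbate_index", type=int, default=None)
    return parser


def task_for(args: argparse.Namespace) -> str:
    """Return a hypothetical label for the work implied by the given indices."""
    indices = (args.bulk_index, args.surface_index, args.adsorbate_index)
    if all(i is not None for i in indices):
        return "adslab_generation"
    return "slab_enumeration"
```

With `--bulk_index 0 --surface_index 0 --adsorbate_index 0`, `task_for` returns `"adslab_generation"`; dropping any of the three falls back to `"slab_enumeration"`.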

Databases for bulks and adsorbates


A database of bulk materials taken from existing databases (e.g. Materials Project) and relaxed with consistent RPBE settings may be found in ocdata/databases/pkls/bulks.pkl. To preview what bulks are available, view the corresponding mapping between indices and bulks (bulk id and composition):


A database of adsorbates may be found in ocdata/databases/pkls/adsorbates.pkl. Alternatively, it may be downloaded using the following link: The latest version is (MD5 checksum: 975e00a62c7b634b245102e42167b3fb). To preview what adsorbates are available, view the corresponding mapping between indices and adsorbates (SMILES):
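
If you download the adsorbate database, you may want to check it against the MD5 checksum quoted above before use. The following is a minimal sketch using only the Python standard library; the helper names `md5_of` and `is_valid` and the local filename in the usage note are assumptions.

```python
import hashlib

# Checksum quoted in this README for the latest adsorbate database.
EXPECTED_MD5 = "975e00a62c7b634b245102e42167b3fb"


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return md5_of(path) == expected.lower()
```

For example, `is_valid("adsorbates.pkl")` returns `True` only if the file at that (assumed) local path matches the published checksum.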

Previous snapshots of the codebase


ocdata is released under the MIT license.


If you use this codebase in your work, please consider citing:

    author = {Chanussot*, Lowik and Das*, Abhishek and Goyal*, Siddharth and Lavril*, Thibaut and Shuaibi*, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary},
    title = {Open Catalyst 2020 (OC20) Dataset and Community Challenges},
    journal = {ACS Catalysis},
    year = {2021},
    doi = {10.1021/acscatal.0c04525},

The Open Catalyst 2020 (OC20) and Open Catalyst 2022 (OC22) datasets are licensed under a Creative Commons Attribution 4.0 License.