Matgen toolkit

Latest revision as of 13:34, 14 February 2020

Descriptions

Matgen toolkit is a collection of Material Gene Engineering (MGE) software written in C++. The core modules are the following:

  1. Remove Solvents
  2. Find Space Groups
  3. In-Cell
  4. ICSD Classify And Unique
  5. CSD Classify
  6. Format
  7. Splice Molecule

Remove Solvents

Description

This program removes solvent molecules from a metal-organic framework (MOF) cif file.

Usage

usage: ./bin/rm_mof_solvents --cif_in=string [options] ...
options:
  -i, --cif_in         input MOF cif file (string)
  -o, --output_path    output filepath (string [=])
  -f, --force          remove solvent molecules anyway
  -?, --help           print this message

Example

$ ./bin/rm_mof_solvents -i ./examples/cod/ABAGAO.MOF_subset.cif -o ./examples/result
Parsing the cif file - ABAGAO.MOF_subset.cif
Getting some known resources...
Building base cell...
The number of bonded atom pairs is 80
Looking for solvent in ABAGAO.MOF_subset.cif
The calculated solvent molecule to be screened is [ H2O<known>  ]
The MOF framework is  [ C14CuH13N3O4  ]
Exporting result...
Export file ./examples/result/ABAGAO_clean.cif successfully!
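
To clean many files in one pass, the command shown above can be assembled from a script. The following Python sketch only builds argument lists from the options listed under Usage. The helper names build_rm_solvents_cmd and build_batch, and the idea of looping over a directory, are illustrative assumptions and not part of the toolkit.

```python
from pathlib import Path


def build_rm_solvents_cmd(cif_in, output_path, force=False,
                          binary="./bin/rm_mof_solvents"):
    """Return the argument list for one rm_mof_solvents call.

    Only the documented options are used: -i (input cif file),
    -o (output path) and -f (remove solvent molecules anyway).
    """
    cmd = [binary, "-i", str(cif_in), "-o", str(output_path)]
    if force:
        cmd.append("-f")
    return cmd


def build_batch(cif_dir, output_path, force=False):
    """Build one command per *.cif file in cif_dir, sorted by file name."""
    return [build_rm_solvents_cmd(path, output_path, force)
            for path in sorted(Path(cif_dir).glob("*.cif"))]
```

Each returned list can be passed to subprocess.run(cmd, check=True) so that a failing file stops the batch.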

Find Space Groups

Description

This program reports the space group information of a cif file. It is based on spglib.

Usage

usage: ./bin/find_space_groups --input=string [options] ...
options:
  -i, --input         input cif file name (string)
  -v, --version       return the version of spglib
  -w, --why           this method is used to see roughly why spglib failed
  -s, --spacegroup    international space group short symbol and number are obtained as a string
  -m, --symmetry      symmetry operations are obtained as a dictionary
  -r, --refine        standardized crystal structure is obtained as a tuple of lattice (a 3x3 numpy array), atomic scaled positions (a numpy array of [number_of_atoms,3]), and atomic numbers (a 1D numpy array) that are symmetrized following space group type.
  -p, --primitive     when a primitive cell is found, lattice parameters (a 3x3 numpy array), scaled positions (a numpy array of [number_of_atoms,3]), and atomic numbers (a 1D numpy array) are returned.
  -d, --dataset       dataset,cell and symprec;angle_tolerance;hall_number;number;choice;transformation_matrix;origin shift;wyckoffs;site_symmetry_symbols;equivalent_atoms;mapping_to_primitive;rotations and translations;pointgroup;std_lattice;std_positions;std_types;std_rotation_matrix;std_mapping_to_primitive
  -c, --symmfdset     A set of crystallographic symmetry operations corresponding to hall_number is returned by a dictionary where rotation parts and translation parts are accessed by the keys rotations and translations, respectively.
  -f, --spgfdset      This function gives direct access to the space-group-type database in spglib (spg_database.c). A dictionary is returned. To specify the space group type with a specific choice, hall_number is used.
  -n, --niggli        Niggli reduction is achieved using this method.
  -l, --delaunay      Delaunay reduction is achieved using this method.
  -k, --irrkpoints    Irreducible k-points are obtained from a sampling mesh of k-points
  -?, --help          print this message

Example

$ ./bin/find_space_groups -i ./examples/cod/WAJZUE.cif -s
Parsing the cif file - ABETIN_clean.cif
Getting some known resources...
The space group is: I4_1/acd 142
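
When the tool is called from a script, the symbol and number can be pulled out of the text output. This is a minimal sketch that assumes the result line always has the form shown in the example above ("The space group is: SYMBOL NUMBER"). The function name parse_space_group is hypothetical.

```python
import re

# Pattern for the line printed with -s in the example above, e.g.
# "The space group is: I4_1/acd 142". The wording is taken from that
# single example and is an assumption about other inputs.
_SPACE_GROUP_LINE = re.compile(r"^The space group is:\s*(\S+)\s+(\d+)\s*$")


def parse_space_group(output):
    """Return (symbol, number) from the tool's text output, or None."""
    for line in output.splitlines():
        match = _SPACE_GROUP_LINE.match(line.strip())
        if match:
            return match.group(1), int(match.group(2))
    return None
```

Returning None instead of raising lets the caller decide how to handle files for which no space group line was printed.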

Problem and Solution

The program may fail with the message "error while loading shared libraries: libsymspg.so.1: cannot open shared object file: No such file or directory". This happens because the program looks for libsymspg.so in /lib64/ by default, and the library is not installed there. Add the directory that contains the library to LD_LIBRARY_PATH so that the program can find it:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./include/spglib/_build

Change the spglib path to match the actual location of the library on your system.
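
If the tools are started from a script instead of an interactive shell, the same setting can be passed through the environment of the child process. The sketch below is one way to do this. The helper name env_with_library_dir is an assumption, and the library directory is whatever path applies on your system.

```python
import os


def env_with_library_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir appended to
    LD_LIBRARY_PATH. The input mapping is not modified."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Avoid a leading ':' when the variable was previously unset or empty.
    env["LD_LIBRARY_PATH"] = f"{current}:{lib_dir}" if current else lib_dir
    return env
```

The result can be given to subprocess.run(cmd, env=env_with_library_dir("./include/spglib/_build")). A relative directory is resolved against the working directory of the started program, so an absolute path is the safer choice.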

In-Cell

Description

This program generates the in-cell structure of a crystal from the space group information of its unit cell.

Usage

usage: ./bin/in_cell --input_path=string --output_path=string [options] ...
options:
  -i, --input_path     input MOF cif file (string)
  -o, --output_path    output file path (string)
  -?, --help           print this message

Example

$ ./bin/in_cell -i ./examples/cod/WAJZUE.cif -o ./examples/result
Getting some known resources...
Parsing the cif file - WAJZUE.cif
Exporting in-cell result...
Export file ./examples/result/WAJZUE_in_cell.cif successfully!
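
The page does not describe the algorithm behind in_cell. One common reading of "in-cell" is that every symmetry operation of the space group is applied to each atom and the resulting fractional coordinates are wrapped back into the unit cell. The Python sketch below illustrates that reading only. It is an assumption, not the toolkit's implementation, and the function names are hypothetical.

```python
def apply_symmetry(rotation, translation, frac):
    """Apply x' = R x + t to a fractional coordinate and wrap each
    component back into the unit cell with modulo 1."""
    moved = [
        sum(rotation[i][j] * frac[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    return [c % 1.0 for c in moved]


def expand_in_cell(operations, frac, decimals=6):
    """Images of one atom under all operations, duplicates removed.

    operations: iterable of (rotation, translation) pairs, where
    rotation is a 3x3 nested list and translation has 3 entries.
    """
    seen = set()
    images = []
    for rotation, translation in operations:
        image = apply_symmetry(rotation, translation, frac)
        # Round before comparing so that 0.9999999 and 0.0 coincide.
        key = tuple(round(c, decimals) % 1.0 for c in image)
        if key not in seen:
            seen.add(key)
            images.append(image)
    return images
```

With the identity and the inversion as the only operations, an atom at a general position yields two images, while an atom on the inversion centre yields one.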

ICSD Classify And Unique

Description

The program classifies ICSD cif files and removes duplicate files.

Classification rules: component / element type / space group

Usage

usage: ./bin/ICSD_classify --input_dir=string --output_dir=string [options] ...
options:
  -i, --input_dir     icsd folder location (string)
  -o, --output_dir    classification result export location (string)
  -l, --log           print the detail log, no log by default
  -?, --help          print this message

Example

$ ./bin/ICSD_classify -i ./examples/icsd -o ./examples/icsd_classify

CSD Classify

Description

This program removes files that contain metal elements, disordered molecules, or known solvents from the CSD database. The exclusion can be restricted to specific metal elements. The result may contain two folders: csd_warning holds structures in which atoms are bonded to two or more parts, and csd_normal holds structures in which atoms are bonded to only one part.

Usage

usage: ./bin/csd_classify --input_dir=string --output_dir=string [options] ...
options:
  -i, --input_dir     csd folder location (string)
  -o, --output_dir    classification result export location (string)
  -r, --remove        only remove the cif files which contain the specified elements or specified bonds (the input form is specified metals/specified bonds, e.g. Fe|Cu/Fe-O|C-O&C-H, or only one of the two parts; please use '/' as the separator between elements and bonds) (string [=])
  -k, --keep          only keep the cif files which contain the specified elements and specified bonds (the input form is specified metals/specified bonds, e.g. Fe|Cu/Fe-O|C-O&C-H, or only one of the two parts; please use '/' as the separator between elements and bonds) (string [=])
  -l, --log           print the detail log, no log by default
  -u, --unique        remove duplicate files
  -?, --help          print this message

Example

$ ./bin/CSD_classify -i ./examples/csd -o ./examples/csd_classify
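
The value for --remove or --keep follows the elements/bonds form given in the help text. The Python sketch below assembles such a value. The helper name build_filter is hypothetical. The bond expression is passed through unchanged because the help text does not document the meaning or precedence of '|' and '&', and whether a bonds-only value needs a leading '/' is also not stated, so that case is an assumption.

```python
def build_filter(elements=(), bonds=""):
    """Build the argument for --remove / --keep.

    elements: element symbols, joined with '|' as in the help text.
    bonds:    a bond expression already written in the tool's syntax,
              e.g. "Fe-O|C-O&C-H". It is not parsed or reordered here.
    """
    element_part = "|".join(elements)
    if element_part and bonds:
        # '/' separates the element part from the bond part.
        return f"{element_part}/{bonds}"
    # Only one part given: return it alone (assumed to be accepted).
    return element_part or bonds
```

Because '|' and '&' are shell metacharacters, the value has to be quoted when typed on a command line, for example -r 'Fe|Cu/Fe-O|C-O&C-H'. Passing it as one element of an argument list to subprocess.run avoids the shell entirely.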

Format

Description

This program converts a cif file into a file of another format. The atomic coordinates in the result can be written as fractional or Cartesian coordinates, and the structure can be converted in in-cell or asymmetric mode.

Usage

usage: ./bin/format --input=string --output=string --type=string [options] ...
options:
  -i, --input         input file (string)
  -o, --output        output path of the conversion result  (string)
  -m, --mode          the mode of the format conversion(in-cell/asymmetric) (string [=asymmetric])
  -t, --type          the type of the result format(gjf/vasp), convert format to VASP file format or Gaussian format (string)
  -c, --coord_type    the type of the coordinate(fract/cart), the coordinates of the atom in the conversion result are fractional coordinates or cartesian coordinates (string [=fract])
  -?, --help          print this message

Example

Convert to a Gaussian file:

$ ./bin/format -i ./examples/cod/WAJZUE.cif -o ./examples/result -t gjf

Convert to a VASP file:

$ ./bin/format -i ./examples/cod/WAJZUE.cif -o ./examples/result -t vasp
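
The --coord_type option switches between fractional and Cartesian coordinates. The relation between the two can be sketched in Python as follows. The cell orientation used here (a along x, b in the xy plane) is a common convention and an assumption, since the page does not say which one the format program uses. The function names are hypothetical.

```python
import math


def lattice_vectors(a, b, c, alpha, beta, gamma):
    """Cell vectors for edge lengths a, b, c and angles in degrees.

    Convention (an assumption): a lies along x, b lies in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    cx = c * cb
    cy = c * (ca - cb * cg) / sg
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return [(a, 0.0, 0.0), (b * cg, b * sg, 0.0), (cx, cy, cz)]


def frac_to_cart(frac, vectors):
    """Cartesian position = u * a_vec + v * b_vec + w * c_vec."""
    return tuple(
        sum(frac[i] * vectors[i][k] for i in range(3)) for k in range(3)
    )
```

For a cubic cell with a = 10, the fractional position (0.5, 0.5, 0.5) maps to approximately (5, 5, 5) in the same length unit as a.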

Splice Molecule

Description

This program splices molecule A and molecule B together at the specified atoms.

Usage

usage: ./bin/splice_molecule --molecule_a=string --molecule_b=string --output=string --type=string --connect_a=int --connect_b=int [options] ...
options:
  -a, --molecule_a    path of the molecule A (string)
  -b, --molecule_b    path of the molecule B (string)
  -o, --output        the output path (string)
  -t, --type          the type of the result format(gjf/xyz), convert format to Gaussian format or xyz format (string)
  -i, --connect_a     the serial number of the connection site in molecule A (int)
  -j, --connect_b     the serial number of the connection site in molecule B (int)
  -?, --help          print this message

Example

$ ./bin/splice_molecule -a ./examples/mol/molecule-A-label.mol -b ./examples/mol/molecule-B-label.mol -i 31 -j 7 -t gjf -o ./examples/mol