This shows you the differences between two versions of the page.


x-ray_diffraction_pattern_classification_using_convolutional_neural_networks_and_some_brutal_force [2016/08/05 09:05] zhiyuan1 created |
x-ray_diffraction_pattern_classification_using_convolutional_neural_networks_and_some_brutal_force [2016/08/05 09:58] zhiyuan1 |
||
---|---|---|---|

Line 1: | Line 1: | ||

This article discusses a simple experiment on classifying experimental X-ray diffraction patterns. Note that the methods described here are not polished, so anyone following the steps below should double-check their results. | This article discusses a simple experiment on classifying experimental X-ray diffraction patterns. Note that the methods described here are not polished, so anyone following the steps below should double-check their results. | ||

+ | //Working Environment: CentOS 6.7; Python 3.5.2 in Anaconda3; Theano 0.8.2; \\ | ||

+ | Author: Zhiyuan Yang zyang96@ucla.edu // | ||

==== Data Preparation and Templates Creation ==== | ==== Data Preparation and Templates Creation ==== | ||

Suppose the initial experimental data are stored in an HDF5 file, say ''data.h5''. Within it lies a single 3-dimensional dataset ''patterns''. | Suppose the initial experimental data are stored in an HDF5 file, say ''data.h5''. Within it lies a single 3-dimensional dataset ''patterns''. | ||

<code> | <code> | ||

- | In [2]: f = h5py.File('data.h5', 'r') \\ | + | In [2]: f = h5py.File('data.h5', 'r') |

- | In [3]: list(f.items()) \\ | + | In [3]: list(f.items()) |

- | Out[3]: [('sel_p', <HDF5 dataset "patterns": shape (4000, 260, 257), type "<f4">)]\\ | + | Out[3]: [('sel_p', <HDF5 dataset "patterns": shape (30000, 260, 257), type "<f4">)] |

</code> | </code> | ||

Here the 1st dimension (or dimension 0) is the number of diffraction patterns in the dataset, and the other two dimensions are | Here the 1st dimension (or dimension 0) is the number of diffraction patterns in the dataset, and the other two dimensions are | ||

- | height and width, respectively. | + | height and width, respectively. The first and most tedious step is to hand-pick a few hundred clean single-particle hits from the dataset. \\ |

+ | (An unsupervised clustering approach can be adopted here instead, as described in this article: https://www.osapublishing.org/oe/abstract.cfm?uri=oe-19-17-16542 . However, manual selection is much more accurate, which should in turn improve the accuracy of the deep learning approach described later.) | ||

+ | | ||

+ | Suppose about 200 single-hit patterns are selected and stored in an HDF5 dataset ''clean_single_hits''. The next step is to generate more single-hit templates to compare against the rest of the dataset. This can be achieved with some basic Python routines (note again that the code here is not optimized): | ||

+ | <code> | ||

+ | import h5py | ||

+ | import numpy as np | ||

+ | import scipy.ndimage.interpolation as sci | ||

+ | import numpy.linalg as linalg | ||

+ | from sklearn.preprocessing import scale | ||

+ | import csv | ||

+ | from operator import itemgetter | ||

+ | | ||

+ | def generate_templates(pattern): # for each pattern, generate 36 templates by rotating | ||

+ |     l = np.zeros([36,260,257]) # the initial image in 5-degree steps (0 to 175 degrees) | ||

+ |     for i in range(36): | ||

+ |         rotated = sci.rotate(pattern, angle=(5*i), reshape=False) | ||

+ |         l[i] = rotated | ||

+ |     return l | ||

+ | | ||

+ | def generate_all_templates(patterns): | ||

+ |     num = len(patterns) | ||

+ |     total = np.zeros([num, 36, 260, 257]) | ||

+ |     for i in range(num): | ||

+ |         total[i,:,:,:] = generate_templates(patterns[i]) | ||

+ |     total = total.reshape([num * 36, 260, 257]) | ||

+ |     return total | ||

+ | | ||

+ | def smallest_value(image, templates): # smallest Frobenius-norm distance from image to any template | ||

+ |     diffs = [] | ||

+ |     num = len(templates) | ||

+ |     for i in range(num): | ||

+ |         diff = np.subtract(image, templates[i]) | ||

+ |         diff = linalg.norm(diff) | ||

+ |         diffs.append(diff) | ||

+ |     return float(min(diffs)) | ||

+ | </code> |
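The routines above can be tied together roughly as follows. This is only a minimal sketch under stated assumptions, not the author's pipeline: the names `rotated_templates` and `min_template_distance` are hypothetical, small random arrays stand in for ''clean_single_hits'' and the unlabeled patterns, and `rotate` is imported from `scipy.ndimage` directly because the `scipy.ndimage.interpolation` namespace is deprecated in newer SciPy releases. The distance is the same Frobenius norm of the pixel-wise difference, computed with broadcasting instead of a Python loop.

```python
import numpy as np
from scipy.ndimage import rotate


def rotated_templates(patterns, n_angles=36, step_deg=5.0):
    """Stack n_angles rotated copies of every pattern: (N * n_angles, H, W)."""
    stack = [
        rotate(p, angle=step_deg * k, reshape=False)
        for p in patterns
        for k in range(n_angles)
    ]
    return np.asarray(stack)


def min_template_distance(image, templates):
    """Smallest Frobenius norm of (image - template) over all templates."""
    diffs = templates - image  # broadcasts image over the template axis
    dists = np.sqrt((diffs ** 2).sum(axis=(1, 2)))
    return float(dists.min())


# Synthetic stand-ins for the real data (assumption: small random images).
rng = np.random.default_rng(0)
hits = rng.random((2, 16, 16))     # plays the role of clean_single_hits
unknown = rng.random((3, 16, 16))  # plays the role of unlabeled patterns
candidates = np.concatenate([hits[:1], unknown])

# Only 4 angles here to keep the sketch fast; the article uses 36.
templates = rotated_templates(hits, n_angles=4)
scores = np.array([min_template_distance(c, templates) for c in candidates])
ranking = np.argsort(scores)  # most template-like candidates come first
```

A low score means the candidate closely matches at least one rotated template. At full scale (about 200 hits × 36 rotations of 260 × 257 images in float64) the template stack alone takes roughly 3.8 GB, and the broadcasted difference needs a temporary of the same size, so processing templates in chunks or casting to float32 may be necessary.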
