In a method of operating a neural network device, a plurality of consecutive input data are received by an input layer. Delta data are generated by the input layer based on a difference between current input data and previous input data. A first current feature is generated by a first linear layer based on a first previous feature and a first delta feature, the first delta feature being generated by performing a first linear operation on the delta data. A second delta feature is generated by a first nonlinear layer based on a second previous feature and a second current feature, the second current feature being generated by performing a first nonlinear operation on the first current feature. A third current feature is generated by a second linear layer based on a third previous feature and a third delta feature, the third delta feature being generated by performing a second linear operation on the second delta feature.
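
The data flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class name DeltaNetwork, the helper functions, the use of bias-free matrix multiplication as the linear operation, ReLU as the nonlinear operation, and zero-initialized previous values are all assumptions chosen for illustration. Under those assumptions, each linear layer only multiplies the delta and adds the result to its stored previous feature, while the nonlinear layer is applied to the full current feature and then differenced against its stored previous output.

```python
def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v):
    """Element-wise ReLU, standing in for the nonlinear operation (assumption)."""
    return [max(0.0, a) for a in v]


class DeltaNetwork:
    """Hypothetical linear -> nonlinear -> linear network driven by deltas."""

    def __init__(self, w1, w2, n_in):
        self.w1, self.w2 = w1, w2
        # All previous values start at zero (assumption). This is consistent
        # because the linear operations have no bias and relu(0) == 0.
        self.prev_input = [0.0] * n_in
        self.prev_f1 = [0.0] * len(w1)
        self.prev_f2 = [0.0] * len(w1)
        self.prev_f3 = [0.0] * len(w2)

    def step(self, x):
        # Input layer: delta data = current input - previous input.
        delta = [a - b for a, b in zip(x, self.prev_input)]
        self.prev_input = list(x)

        # First linear layer: first delta feature = W1 * delta data;
        # first current feature = first previous feature + first delta feature.
        d1 = matvec(self.w1, delta)
        f1 = [p + d for p, d in zip(self.prev_f1, d1)]
        self.prev_f1 = f1

        # First nonlinear layer: second current feature = relu(first current
        # feature); second delta feature = second current - second previous.
        f2 = relu(f1)
        d2 = [c - p for c, p in zip(f2, self.prev_f2)]
        self.prev_f2 = f2

        # Second linear layer: third delta feature = W2 * second delta feature;
        # third current feature = third previous feature + third delta feature.
        d3 = matvec(self.w2, d2)
        f3 = [p + d for p, d in zip(self.prev_f3, d3)]
        self.prev_f3 = f3
        return f3


def dense_forward(w1, w2, x):
    """Reference: the same network evaluated directly on the full input."""
    return matvec(w2, relu(matvec(w1, x)))
```

Calling step once per input, in order, should give the same third current feature as dense_forward gives for that input, since the linear operations distribute over the difference. When consecutive inputs are similar, most delta entries are zero, which is what a real device could exploit to skip work; this sketch does not implement any such skipping.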