Linear Regression Modelling on Epigallocatechin-3-gallate Sensor Data for Green Tea
- Source:
- ICRCICN conference
- Type:
- Conference paper
- Language:
- English
- Original publication date:
- 2018-11-23
- Abstract:
- In this paper, linear regression techniques from machine learning are applied to determine the quality of green tea samples. The dataset is obtained by applying Differential Pulse Voltammetry (DPV) to green tea samples using an Epigallocatechin-3-gallate (EGCG) specific sensor based on the Molecularly Imprinted Polymer (MIP) technique. Multiple linear regression models are developed on this dataset to gain more insight into it and to identify the importance of the input features. Regularization techniques, namely Ridge regression (L2 penalty), Lasso regression (L1 penalty) and ElasticNet regression (a combination of the L1 and L2 penalties), are applied to reduce overfitting and improve prediction. The variation of the cross-validation score with the regularization parameter is examined for each regularized technique, and the best value of the regularization parameter is selected to build a model with high prediction accuracy. The model metrics show that Lasso regression performs better than Ridge regression on this dataset and eliminates the less important features; such sparsity is useful in practice for high-dimensional datasets in which many features are not effective for modelling. The ElasticNet regression model is also highlighted, showing how the L1 and L2 penalties work together to give predictions with high accuracy.
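The workflow summarized in the abstract (fit Ridge, Lasso and ElasticNet models, track the cross-validation score against the regularization parameter, keep the best value, then inspect which features survive) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' code: it assumes scikit-learn, and because the DPV sensor dataset is not reproduced here it uses a synthetic dataset from `make_regression` as a stand-in. The alpha grid, the `l1_ratio=0.5` setting, the 5-fold split and the R² scoring are likewise assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sensor data (assumption: many features, few informative).
X, y = make_regression(
    n_samples=120, n_features=30, n_informative=5, noise=10.0, random_state=0
)

# Hypothetical grid of regularization strengths to scan.
alphas = np.logspace(-3, 2, 12)

# Factories so that a fresh estimator is built for every alpha.
models = {
    "ridge": lambda a: Ridge(alpha=a),                                    # L2 penalty
    "lasso": lambda a: Lasso(alpha=a, max_iter=50_000),                   # L1 penalty
    "elasticnet": lambda a: ElasticNet(alpha=a, l1_ratio=0.5, max_iter=50_000),  # L1 + L2
}

results = {}
for name, make_model in models.items():
    # Mean 5-fold cross-validated R^2 for each alpha (the "score vs. alpha" curve).
    scores = [
        cross_val_score(
            make_pipeline(StandardScaler(), make_model(a)), X, y, cv=5, scoring="r2"
        ).mean()
        for a in alphas
    ]
    best = int(np.argmax(scores))

    # Refit on all data at the best alpha and count the surviving coefficients.
    fitted = make_pipeline(StandardScaler(), make_model(alphas[best])).fit(X, y)
    coef = fitted.steps[-1][1].coef_
    results[name] = {
        "best_alpha": float(alphas[best]),
        "cv_r2": float(scores[best]),
        "n_nonzero": int(np.count_nonzero(coef)),
    }

for name, r in results.items():
    print(f"{name}: alpha={r['best_alpha']:.4g}, "
          f"CV R^2={r['cv_r2']:.3f}, nonzero coefficients={r['n_nonzero']}")
```

The `n_nonzero` count is where the sparsity mentioned in the abstract becomes visible: an L1 penalty can drive coefficients exactly to zero, while an L2 penalty only shrinks them. Which model scores best depends on the data, so results on this synthetic set say nothing about the paper's reported findings.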