An example of a data mining project
Problem • Detect and explain faults of a continuous pulp digester • Faults: drops in the output quality of the digester
Solution • A report which consists of • description of analyzed data, • analysis methods, • results, • conclusions, and • process improvement recommendations.
Problem understanding • Several sources of information: • description of process instrumentation, • documentation of the digester control system, • ISO 9000 documents, • interviews with operation personnel, process engineers, researchers, and automation system vendor engineers.
Data acquisition • About 200 on-line measurements • Sampling rate: 1 sample per 10 minutes • Data stored in an SQL database at the mill
Data acquisition • Data acquisition procedure • a shell script run on the SQL host twice a month • FTP transfer of the data to HUT through the firewall by a mill computer operator • appending of the new data files to the existing ones at HUT using shell scripts
Data acquisition • Data file format:
value1 checkbits1 timelabel1
value2 checkbits2 timelabel2
. . .
valueN checkbitsN timelabelN
Basic data preparation • For each measurement channel: • use the checkbits to check that the measurements are valid • use the timelabels to check whether samples are missing, and fill any gaps with NaNs
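The two preparation steps can be sketched in a few lines of Python. This is only an illustration, not the scripts used in the project: the slides do not say how the checkbits encode validity or how the timelabels are formatted, so the sketch assumes that a checkbits value of 0 means "valid" and that timelabels have already been parsed into `datetime` objects lying exactly on the 10-minute grid. The name `prepare_channel` is hypothetical.

```python
import math
from datetime import datetime, timedelta

SAMPLE_PERIOD = timedelta(minutes=10)  # from the slides: 1 sample per 10 minutes
VALID_CHECKBITS = 0                    # assumption: 0 marks a valid sample


def prepare_channel(records):
    """Put one measurement channel on a regular 10-minute grid.

    records: (value, checkbits, timestamp) tuples sorted by timestamp.
    Returns (timestamps, values); invalid and missing samples become NaN.
    """
    records = list(records)
    if not records:
        return [], []

    # Step 1: replace samples whose checkbits flag a problem with NaN.
    by_time = {
        t: (value if checkbits == VALID_CHECKBITS else math.nan)
        for value, checkbits, t in records
    }

    # Step 2: walk the regular grid and fill missing time labels with NaN.
    timestamps, values = [], []
    t, end = records[0][2], records[-1][2]
    while t <= end:
        timestamps.append(t)
        values.append(by_time.get(t, math.nan))
        t += SAMPLE_PERIOD
    return timestamps, values
```

Keeping the gaps as NaN, rather than dropping them, preserves the regular time axis that later steps such as delay compensation rely on.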
Data survey • Visual data inspection (time series plots) revealed some problems: • some measurements didn’t work at all, • some measurements worked properly, but not all the time, • changes in production speed could be seen in most measurements, and • process tuning altered the behavior of some measurements.
Data survey • Computation of material balances provides a way to roughly estimate the reliability of some sensors • The process delay from input to output of the digester is about three hours • Delays between measurements in different parts of the process had to be compensated
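One common way to compensate such delays is to shift the upstream series by a whole number of samples before comparing it with a downstream one; at 1 sample per 10 minutes, three hours corresponds to 18 samples. The slides do not state how the delays were determined, so the cross-correlation search below is an assumed technique, and the names `pearson`, `align`, and `estimate_delay` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the pairs in which neither value is NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def align(upstream, downstream, delay):
    """Pair upstream[i] with downstream[i + delay] (delay in samples, >= 0)."""
    return upstream[:len(upstream) - delay], downstream[delay:]


def estimate_delay(upstream, downstream, max_delay):
    """Return the delay (in samples) with the largest absolute correlation."""
    best_delay, best_r = 0, -math.inf
    for delay in range(max_delay + 1):
        r = pearson(*align(upstream, downstream, delay))
        if not math.isnan(r) and abs(r) > best_r:
            best_delay, best_r = delay, abs(r)
    return best_delay
```

With a known delay, `align(upstream, downstream, 18)` alone is enough; `estimate_delay` is only useful when the delay between two measurement points is not known in advance.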
Data survey • In order to get reliable results, only periods with constant production speed should be analyzed
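A simple way to pick out such periods is to scan the production-speed signal for runs that stay inside a narrow band. The sketch below is an assumption about how this could be done, not the procedure used in the project; the function name `constant_periods` and its `tolerance` and `min_length` parameters are hypothetical and would have to be chosen from process knowledge.

```python
import math


def constant_periods(rate, tolerance, min_length):
    """Find runs where the production rate is approximately constant.

    A run is kept while (max - min) of its samples stays within `tolerance`;
    a NaN sample ends the current run. Returns (start, end) index pairs with
    `end` exclusive, for runs of at least `min_length` samples.
    """
    periods = []
    start, lo, hi = None, 0.0, 0.0

    def close(end):
        # Store the current run if it is long enough.
        if start is not None and end - start >= min_length:
            periods.append((start, end))

    for i, r in enumerate(rate):
        if math.isnan(r):
            close(i)
            start = None
            continue
        if start is not None and max(hi, r) - min(lo, r) <= tolerance:
            lo, hi = min(lo, r), max(hi, r)  # sample extends the current run
            continue
        close(i)                             # band exceeded: start a new run
        start, lo, hi = i, r, r
    close(len(rate))
    return periods
```

The returned index ranges can then be used to slice every other measurement channel, provided all channels share the same time grid.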
Data modeling • First, only temperature measurements in the digester sides were used • Basic idea: to estimate the movement of chips using correlations between neighboring measurements • This approach failed
Data modeling • Next, all available measurements were used • The measurements were reduced to the ones best depicting the state of the digester • The reduction was carried out using • process knowledge, • data visualization, and • correlation analysis.
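The correlation-analysis part of such a reduction can be illustrated with a greedy filter that drops a channel when it is almost perfectly correlated with one already kept. The slides do not describe the actual procedure, so this is an assumed approach; `reduce_channels` and the 0.95 threshold are hypothetical. Process knowledge enters through the order of the channels: the ones listed first are preferred.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the pairs in which neither value is NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def reduce_channels(channels, threshold=0.95):
    """Keep channels that are not near-duplicates of an earlier kept channel.

    channels: dict mapping channel name -> list of values, in priority order.
    A channel whose correlation cannot be computed (NaN) is kept, because
    the comparison with the threshold is then False.
    """
    kept = []
    for name, values in channels.items():
        redundant = any(
            abs(pearson(values, channels[other])) >= threshold
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept
```

A filter like this only removes linear redundancy, so the result would still need to be reviewed against the time series plots and the process description.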
Data modeling • During the project, a digester modeling expert was consulted • A model depicting the fault sensitivity of the digester was created