CRISP WP18, High-speed data recording
Krzysztof Wrona, European XFEL
CERN, 23 Sep 2013

High-speed Data Recording
Objectives:
• “High-speed recording of data to permanent storage and archive”
• “Optimized and secure access to data using standard protocols”
Partners: DESY, ESRF, ESS, European XFEL, GANIL, ILL, Univ. Cambridge

General status
• Milestone MS21 was originally planned for month 24
• Architecture document as the basis for implementation of the prototype system
• Preparation of the document is ongoing
• Current estimate: delayed by 1-2 months

Proposed architecture
• The proposed architecture consists of multiple layers
• Actual implementations may vary between facilities due to specific requirements and restrictions
• Additional layers can be added
• Some layers may be skipped

Proposed architecture
1. Detectors or dedicated electronic devices
2. Real-time data processing, data aggregation from different sources, and formatting
3. Local buffer as temporary data storage
4. Central storage system
5. Data archive
6. Data and experiment monitoring
7. Data pre-processing
8. Data analysis, data export services
9. Online shared scratch disk space
10. Offline shared scratch disk space
[Figure: proposed architecture. Boxes: 1. Detectors/Electronics, 2. PC layer, 3. Online data cache, 4. Central storage, 5. Archive, 6. Monitoring cluster, 7. Online computing cluster, 8. Offline computing cluster, 9. and 10. Shared disk space; links labelled A to K. Annotation: requires further discussion between WP18 partners.]

A. Data is sent from the detectors (1) and received on the PC layer (2)
• Data is sent over a high-speed network, e.g. using multiple 10GE links
• Received data is then aggregated, processed, and formatted on the PC layer
• At this stage, data processing may alter the data content before the data becomes persistent

10GE data transfer
• 10GE network transfer for data acquired by detectors
• UDP and TCP protocols have been successfully implemented and tested for high-throughput parallel data streams
• The HTTP protocol has been investigated for commercial devices that dump data to files and where interoperability between different operating systems is needed

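To illustrate what receiving detector data over UDP involves, the sketch below reassembles one train from datagrams that may arrive out of order. The 12-byte header (train id, packet index, packet count) is a hypothetical layout chosen for this example; it is not the format of any detector or of the WP18 software.

```python
import struct

# Hypothetical header: train id (uint64), packet index (uint16), packet count (uint16).
HEADER = struct.Struct("!QHH")


def make_packets(train_id, payload, chunk_size):
    """Split one train payload into datagrams carrying the hypothetical header."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    return [HEADER.pack(train_id, idx, len(chunks)) + chunk
            for idx, chunk in enumerate(chunks)]


class TrainAssembler:
    """Collects datagrams per train and returns the payload once all have arrived."""

    def __init__(self):
        self._pending = {}  # train id -> {packet index: chunk}

    def add(self, datagram):
        train_id, idx, count = HEADER.unpack_from(datagram)
        parts = self._pending.setdefault(train_id, {})
        parts[idx] = datagram[HEADER.size:]
        if len(parts) < count:
            return None  # still waiting; UDP gives no ordering or delivery guarantee
        del self._pending[train_id]
        return train_id, b"".join(parts[i] for i in range(count))
```

A real receiver would also need a timeout for trains whose packets never arrive, which this sketch leaves out.
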
Design of data processing layer at European XFEL
• Data receivers
  • Read train data and store it in a local memory buffer
  • UDP for big train data (2D detectors)
  • TCP for slow and small data
• Processing pipeline
  • Users’ algorithms
  • Perform data monitoring, rejection, and analysis
• Aggregator, formatter, writer
  • Filter data and merge results
  • Format data into HDF5 files
  • Send files to the data cache
[Figure: online data processing. Labels: slow data, fast data, data receivers, primary process, shared memory, secondary process, monitoring (1:1), rejection (1:1), analysis (1:N), check (1:10), scheduler, synchronization, data aggregator & formatter, HDF5 files, network writer, multicast, online data cache & scientific computing]

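The chain of stages listed on this slide (rejection, analysis, aggregation) can be sketched as a chain of Python generators. Everything here is illustrative: the `Train` record, the rejection threshold, and the dictionary records stand in for the users' algorithms and the HDF5 formatter, which the slide does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Train:
    train_id: int
    pixels: List[int]  # stand-in for 2D detector data


def reject(trains: Iterable[Train], min_signal: int) -> Iterator[Train]:
    """Rejection stage: drop trains whose summed signal is below a threshold."""
    for train in trains:
        if sum(train.pixels) >= min_signal:
            yield train


def analyse(trains: Iterable[Train]) -> Iterator[dict]:
    """Analysis stage: attach a derived quantity to each surviving train."""
    for train in trains:
        yield {"train_id": train.train_id,
               "pixels": train.pixels,
               "peak": max(train.pixels, default=0)}


def aggregate(records: Iterable[dict], per_file: int) -> Iterator[List[dict]]:
    """Aggregator: group records into batches, one batch per output file."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == per_file:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch
```

Chaining the stages as `aggregate(analyse(reject(trains, min_signal=5)), per_file=2)` processes trains lazily, one at a time, so no stage has to hold the full data stream in memory.
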
• PC layer node software is divided into a primary process and one or more secondary processes
• Primary process
  • Performs critical tasks such as data receiving, storing, and scheduling
  • Requires super-user mode
• Secondary processes
  • Run users’ algorithms (pipeline)
  • Run in normal user mode
• Data exchange is done through inter-process shared memory
• Scheduler
  • Monitors task and data status
  • Coordinates thread activities
[Figure: online data processing diagram with the same labels as on the previous slide, except that "Multicast" is shown as "Multicast?"]

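The exchange between primary and secondary processes through shared memory can be illustrated with Python's `multiprocessing.shared_memory` module. This is a sketch only: the segment layout (an 8-byte train id, a 4-byte payload length, then the payload) is an assumption made for the example, and both roles run in one process here for brevity, whereas in the design described on this slide they would be separate processes attaching to the same segment by name.

```python
import struct
from multiprocessing import shared_memory

# Assumed layout: train id (uint64), payload length (uint32), then payload bytes.
HEADER = struct.Struct("<QI")


def primary_write(shm, train_id, payload):
    """Primary-process role: place one train into the shared segment."""
    if HEADER.size + len(payload) > shm.size:
        raise ValueError("payload does not fit into the shared segment")
    HEADER.pack_into(shm.buf, 0, train_id, len(payload))
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload


def secondary_read(segment_name):
    """Secondary-process role: attach by name and copy the train out."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        train_id, length = HEADER.unpack_from(shm.buf, 0)
        payload = bytes(shm.buf[HEADER.size:HEADER.size + length])
        return train_id, payload
    finally:
        shm.close()


def demo():
    """Create a segment, write one train, read it back, and clean up."""
    shm = shared_memory.SharedMemory(create=True, size=1024)
    try:
        primary_write(shm, 42, b"train data")
        return secondary_read(shm.name)
    finally:
        shm.close()
        shm.unlink()
```

The sketch omits synchronization; a real implementation needs a mechanism (the scheduler in this design) to tell readers when a buffer is complete and when it may be reused.
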
B. Store data in the local buffer
• From the PC layer (2), data is sent to the online data cache (3)
• If the PC layer (2) is not implemented, data is sent directly from the detectors (1) to the data cache (3)
• The capacity of the online cache should be sufficient to store data for up to several days

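As a rough sizing aid for the capacity requirement, the needed cache size is the sustained input rate multiplied by the retention time. The `duty_cycle` parameter and the figures in the example call are illustrative assumptions, not numbers from the presentation.

```python
def cache_capacity_tb(channels, rate_gb_per_s, days, duty_cycle=1.0):
    """Capacity in TB (decimal units) needed to retain `days` of data.

    duty_cycle is the assumed fraction of time during which data is actually taken.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    seconds = days * 24 * 3600
    return channels * rate_gb_per_s * seconds * duty_cycle / 1000.0
```

For example, `cache_capacity_tb(16, 1.0, 3, duty_cycle=0.25)` evaluates to about 1037 TB: 16 channels at 1 GB/s each, taking data a quarter of the time over three days.
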
Online data cache
• Sending single-channel formatted data through a 10GE interface using TCP
• Storage performance: cached vs. direct IO
• Tests with two types of storage systems:
  • 14 x 900 GB, 10K rpm, SAS, 2 x 6 Gbps, RAID6
  • 12 x 3 TB, 7.2K rpm, NL-SAS, 2 x 6 Gbps, RAID6
• Results
  • Achieved data rate per channel: 1.1 GB/s and 0.97 GB/s, respectively
  • Direct IO improves performance and stability

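To put the two test configurations into perspective, the sketch below estimates their usable capacity and the time needed to fill them at the measured per-channel rates. It assumes that each system is a single RAID6 group (two disks' worth of parity) and uses decimal units; the slide states neither, so treat the results as rough estimates.

```python
def raid6_usable_tb(disks, disk_tb):
    """Usable capacity of one RAID6 group: two disks' worth of space holds parity."""
    if disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disks - 2) * disk_tb


def hours_to_fill(capacity_tb, rate_gb_per_s):
    """Hours of continuous writing at the given rate until the array is full."""
    return capacity_tb * 1000.0 / rate_gb_per_s / 3600.0


sas = raid6_usable_tb(14, 0.9)      # 14 x 900 GB, about 10.8 TB usable
nl_sas = raid6_usable_tb(12, 3.0)   # 12 x 3 TB, 30 TB usable
```

Under these assumptions, `hours_to_fill(sas, 1.1)` gives about 2.7 hours and `hours_to_fill(nl_sas, 0.97)` about 8.6 hours, so a cache meant to hold several days of data would need many such units or a much lower average rate.
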
C. Send data required for online data monitoring
• Results of real-time data processing are sent to the monitoring cluster
• The monitoring system prepares data for visualization (e.g. histograms) and provides slow feedback for experimentalists

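A monitoring node that prepares histograms for visualization typically accumulates them incrementally as results arrive. A minimal sketch, with the class name and bin range chosen purely for illustration:

```python
class RunningHistogram:
    """Fixed-range histogram that is updated as values arrive."""

    def __init__(self, low, high, bins):
        if high <= low or bins < 1:
            raise ValueError("need high > low and at least one bin")
        self.low, self.high, self.bins = low, high, bins
        self.counts = [0] * bins
        self.underflow = 0
        self.overflow = 0

    def fill(self, value):
        """Add one value; out-of-range values are counted separately."""
        if value < self.low:
            self.underflow += 1
        elif value >= self.high:
            self.overflow += 1
        else:
            index = int((value - self.low) / (self.high - self.low) * self.bins)
            # Guard against floating-point rounding at the upper edge.
            self.counts[min(index, self.bins - 1)] += 1
```

Because only the counts are kept, the memory footprint stays constant however many values are processed, which suits a long-running monitoring task.
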
D. Send data for online processing
• Multicast can be used to send data from the PC layer to both the online data cache and the processing cluster
• A subset of the data is sent to the online computing cluster, where additional algorithms can be run without strict control of execution time
• Data received on the online cluster can be stored on the shared disk space and inspected with the standard tools used by the experimentalists

E. Data is pre-processed before being stored in the central storage system (4)
• Additional filtering, data reduction, and merging may be performed before the dataset is registered and stored in the central storage system
• The results of the pre-processing determine whether the data is useful for further analysis or should be discarded

F. Output data from the pre-processing is stored in the online data cache
• This data can be stored in addition to, or instead of, the original raw data

G. Send data to central data storage
• The entire dataset, or selected good-quality data, is stored in the central system
• At this point all datasets are registered in the metadata catalogue

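Registering a dataset in a metadata catalogue amounts to recording where the data lives together with searchable attributes. The sketch below uses an in-memory SQLite database; the table layout, column names, and any paths passed in are hypothetical and only illustrate the idea, not the catalogue used by the WP18 partners.

```python
import sqlite3


def open_catalogue():
    """Create an in-memory catalogue with a hypothetical `datasets` table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE datasets ("
        " id INTEGER PRIMARY KEY,"
        " proposal TEXT NOT NULL,"
        " run INTEGER NOT NULL,"
        " path TEXT NOT NULL UNIQUE,"
        " size_bytes INTEGER NOT NULL)"
    )
    return db


def register(db, proposal, run, path, size_bytes):
    """Register one dataset and return its catalogue id."""
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "INSERT INTO datasets (proposal, run, path, size_bytes) VALUES (?, ?, ?, ?)",
            (proposal, run, path, size_bytes),
        )
    return cursor.lastrowid


def runs_for(db, proposal):
    """List the runs registered for a proposal, in ascending order."""
    rows = db.execute(
        "SELECT run FROM datasets WHERE proposal = ? ORDER BY run", (proposal,)
    )
    return [row[0] for row in rows]
```

The `UNIQUE` constraint on `path` makes a second registration of the same file fail, which is one simple way to keep the catalogue consistent with the storage.
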
Data storage systems
At DESY and XFEL
• Testing the dCache system as the candidate for the central data storage
• dCache presents data distributed over multiple servers as a single namespace
• Data is accessible using the NFSv4.1/pNFS protocol
• dCache implements access control lists according to the NFS4 specification
• dCache manages tape data archiving and restoring
At SKA
• Initial tests of the Lustre distributed filesystem

H. Archive data
• Data received in the central storage system needs to be secured for long-term storage
• Implementation of a long-term data archive is recommended

I. Read data required for offline analysis
• Access to data needs to be protected according to the adopted data access policy
• Data should be readable using standard protocols, e.g. NFSv4.1/pNFS
• Offline analysis is performed on a cluster of computing nodes

J. Write processed data to the central data storage
• Results of user data analysis are stored in the central storage system, archived, and kept according to the data policy adopted at the facility

K. Scratch shared disk space for data processing
• Fast, short-term data storage
• May be required for accessing intermediate data within an application or between execution runs

Data storage systems
• Performed initial tests of the Fraunhofer filesystem (FhGFS)
• Plan to use it as scratch space for the fast access required by demanding analysis applications
[Figure: FhGFS setup at DESY]

Summary
• A preliminary architecture design exists
• Further feedback from WP18 participants is required
• The architecture document needs to be prepared soon
• Implementation of the prototype system will follow at XFEL, DESY, and possibly SKA