This presentation delves into the development experiences with Black Cloud OS, an experimental operating system derived from Singularity OS. Emphasizing a non-cache coherent environment, the talk outlines challenges related to debugging, porting, and performance on the SCC hardware. Highlighting the unique design for message passing and the implications of cache coherency, it presents insights into configuring the system, managing resources, and achieving effective inter-component communication. Ideal for researchers and practitioners interested in operating system innovations and experimental computing architectures.
SCC Development Experiences Alexey Pakhunov /XCG, Microsoft Research/ alexeypa@microsoft.com March 30th, 2011
Overview • Black Cloud OS: • A fork of Singularity OS • Our playground for experimenting with message passing in a non-cache-coherent environment • This presentation covers only our development experiences on the SCC • Submission of the paper is on its way
What is Singularity? • A quote from Singularity home page: “A research operating system prototype, extending programming languages, and developing new techniques and tools for specifying and verifying program behavior” • Written in managed code • Some Assembler and C++ in the boot loader and kernel • IPC and inter-component communications are based on passing messages
Our setup • [Diagram] SCC: a 6×4 mesh of tiles, each tile with a router (R), plus four DDR3 memory controllers (DDR3 MC), the VRC and the System Interface • The SCC connects over PCI-E to the Management Console (Linux), which runs sccTcpServer/mceGui • The Management Console connects over TCP/IP to a Desktop PC (Windows), which runs RcLoader.Net, KdProxy, WinDbg, etc.
RcLoader.Net • Configuration • Generates the system memory map • Configures the SCC registers • Uploads the boot loader and OS images • Supports manual editing of the SCC configuration • Debugging • Allows inspecting the memory and configuration registers
The memory map (from the top of the address space down) • 0xFC000000 – 0xFFFFFFFF: Shared memory (OS image, the initial jmp) • Unused • 0xC0000000 – 0xC3FFFFFF: Shared memory buffers (256 KB per core) • 0xA0000000 – 0xB7FFFFFF: Configuration space • 0x80000000 – 0x97FFFFFF: MPB (16 KB per tile) • Unused • 0x00000000 – up to 0x54FFFFFF: Private memory (336 MB to 1360 MB)
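The layout above can be sketched as a small lookup table. This is only an illustration built from the ranges on the slide; `REGIONS` and `classify` are hypothetical names, not part of RcLoader.Net, and the private-memory range uses the largest (1360 MB) configuration.

```python
# Regions copied from the memory-map slide: (start, end inclusive, label).
# The private-memory end assumes the maximum configuration (1360 MB).
REGIONS = [
    (0x00000000, 0x54FFFFFF, "private memory"),
    (0x80000000, 0x97FFFFFF, "MPB"),
    (0xA0000000, 0xB7FFFFFF, "configuration space"),
    (0xC0000000, 0xC3FFFFFF, "shared memory buffers"),
    (0xFC000000, 0xFFFFFFFF, "shared memory (OS image)"),
]


def classify(addr):
    """Return the region label for a 32-bit address (hypothetical helper)."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    for start, end, label in REGIONS:
        if start <= addr <= end:
            return label
    return "unused"


if __name__ == "__main__":
    for addr in (0x000B8000, 0x80000000, 0x60000000, 0xFFFFFFF0):
        print(f"0x{addr:08X}: {classify(addr)}")
```

Note that the x86 reset vector, 0xFFFFFFF0, falls inside the shared OS image region, which is consistent with the slide's mention of "the initial jmp" living there.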
Debugging challenges • No serial port or console • Memory at 0xb8000 is the console buffer • I/O redirection doesn’t work as expected • Executing an IN or OUT instruction effectively halts both the core and sccTcpServer • Serial KD transport is emulated • A couple of ring buffers on the SCC side • KdProxy.exe exposes a named pipe interface for the debugger
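To make the "couple of ring buffers" idea concrete, here is a minimal single-producer, single-consumer byte ring. It is a sketch under assumptions: the slides do not describe the actual buffer layout, sizes, or how KdProxy.exe polls them, so the class, its method names, and the 8-byte capacity are all hypothetical.

```python
class RingBuffer:
    """Single-producer, single-consumer byte ring (illustrative only)."""

    def __init__(self, capacity):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        # One slot stays empty so that head == tail always means "empty".
        self.buf = bytearray(capacity)
        self.head = 0  # next index to read; advanced only by the consumer
        self.tail = 0  # next index to write; advanced only by the producer

    def put(self, data):
        """Copy in as many bytes as fit and return how many were written."""
        written = 0
        for byte in data:
            nxt = (self.tail + 1) % len(self.buf)
            if nxt == self.head:  # buffer is full
                break
            self.buf[self.tail] = byte
            self.tail = nxt
            written += 1
        return written

    def get(self, limit):
        """Remove and return up to `limit` bytes."""
        out = bytearray()
        while len(out) < limit and self.head != self.tail:
            out.append(self.buf[self.head])
            self.head = (self.head + 1) % len(self.buf)
        return bytes(out)


# One ring per direction, mirroring a two-way serial line (assumed sizes).
to_debugger = RingBuffer(8)
from_debugger = RingBuffer(8)
```

Because each index is written by only one side, the two ends never need a shared lock; on non-coherent hardware the real implementation would additionally have to make the index and data updates visible to the other side.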
Porting challenges • No BIOS • The system memory map is patched directly in the boot loader • No standard devices • Local APIC is used instead of the i8254 timer and PIC • No RTC • No modern instructions supported • Context handling code was updated due to lack of MMX • The 32-bit flavor of Singularity uses only x87 for floating point calculations • Bartok compiler was patched due to lack of CMOV instructions
Experimental hardware • Turning on the MPB bypass bit triggers a race that corrupts memory • Cost us three days of debugging :-) • We couldn’t take advantage of fast MPB access • Large pages cannot be used together with MPB • Singularity uses large pages to create the identity mapping spanning 4GB
Interface • A telnet connection to each core • The same serial transport emulation via KdProxy.exe was used
Cache coherency matters • A read-only OS image is shared among all cores • Message passing code uses MPB-mapped buffers and CL1FLUSH-aware memcpy() • Large shared memory storage is accessible via dynamically remapped LUTs • R/W access is possible with proper cache flushing and/or caching settings in PTEs
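The need for explicit flushing can be illustrated with a toy model. This is not SCC code: `Core`, `invalidate`, and `receive` are hypothetical names, the cache is a plain dictionary, and writes are modelled as write-through purely to keep the sketch short. It only shows why a receiver must drop its cached copy before reading a shared buffer.

```python
class Core:
    """Toy core with a private cache in front of a shared buffer."""

    def __init__(self, shared):
        self.shared = shared  # memory visible to every core
        self.cache = {}       # private copies; nothing keeps them coherent

    def read(self, index):
        # A cached entry is returned even if shared memory has since changed.
        if index not in self.cache:
            self.cache[index] = self.shared[index]
        return self.cache[index]

    def write(self, index, value):
        # Write-through keeps the toy small; other cores' caches stay untouched.
        self.cache[index] = value
        self.shared[index] = value

    def invalidate(self):
        # Stand-in for an explicit cache flush/invalidate step.
        self.cache.clear()


def receive(core, length):
    """Invalidate first, then copy the message out of the shared buffer."""
    core.invalidate()
    return [core.read(i) for i in range(length)]


if __name__ == "__main__":
    shared = [0, 0]
    sender, receiver = Core(shared), Core(shared)
    receiver.read(0)            # receiver now caches the old value
    sender.write(0, 7)          # sender publishes a new value
    print(receiver.read(0))     # stale read from the private cache
    print(receive(receiver, 2)) # fresh read after invalidation
```

A plain read after the sender's write still returns the old value, while `receive` sees the update because it invalidates first.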
Performance • Core’s memory interface bandwidth is limited • One outstanding memory operation
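With only one memory operation in flight, a core cannot move more than one transfer per round trip, so its peak bandwidth is bounded by transfer size divided by latency. The helper below is a back-of-the-envelope sketch of that bound; `peak_bandwidth` is a hypothetical name, and the 32-byte transfer and 100 ns latency in the usage lines are made-up inputs, not measurements from the talk.

```python
def peak_bandwidth(bytes_per_op, latency_ns, outstanding_ops=1):
    """Upper bound in bytes/second when at most `outstanding_ops`
    memory operations can be in flight at once."""
    if bytes_per_op <= 0 or latency_ns <= 0 or outstanding_ops <= 0:
        raise ValueError("all arguments must be positive")
    return outstanding_ops * bytes_per_op * 1_000_000_000 / latency_ns


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the shape of the bound.
    single = peak_bandwidth(32, 100)
    quad = peak_bandwidth(32, 100, outstanding_ops=4)
    print(f"1 outstanding op : {single / 1e6:.0f} MB/s")
    print(f"4 outstanding ops: {quad / 1e6:.0f} MB/s")
```

The bound scales linearly with the number of operations in flight, which is why a limit of one outstanding operation caps per-core bandwidth regardless of how fast the memory controller is.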
Performance • Memory controller bandwidth is limited
Conclusions • The SCC is an experimental platform tailored for message passing • Lack of cache coherency makes us think hard about message passing • The chip has enough cores to play with scalability • Compare apples to apples • The cache and memory subsystems are significantly different • The SCC is super parallel, not super fast