
OpenCL Ch. 5~7

Jongeun Lee, Fall 2013



  1. OpenCL Ch. 5~7 • Jongeun Lee • Fall 2013

  2. Ch. 5: OpenCL C built-in functions • work-item functions • math functions • integer functions • common functions • geometric functions • relational functions • synchronization functions • async copy and prefetch functions • vector data load and store functions • atomic functions • miscellaneous vector functions • image functions

  3. Work-item functions • example (1-D) • global work size = 16 items • work-group size = 8 items/group • note: • the mapping from global & local IDs to work-items is implementation-dependent
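
The 1-D example (16 work-items in groups of 8, hence two work-groups) can be modeled in plain C. This is a host-side sketch, not OpenCL code: the names `work_item_ids`, `decompose_global_id`, and `compose_global_id` are hypothetical, and it assumes a zero global offset. It only shows the arithmetic relation the work-item functions satisfy, global ID = group ID × local size + local ID; which hardware thread actually runs a given ID remains up to the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of IDs that one 1-D global ID decomposes into,
 * assuming a zero global offset. */
typedef struct {
    size_t group_id; /* what get_group_id(0) would return */
    size_t local_id; /* what get_local_id(0) would return */
} work_item_ids;

work_item_ids decompose_global_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    work_item_ids ids;
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    return ids;
}

/* Inverse mapping: global ID = group ID * local size + local ID. */
size_t compose_global_id(work_item_ids ids, size_t local_size)
{
    return ids.group_id * local_size + ids.local_id;
}
```

With a local size of 8, global ID 11 falls in group 1 at local ID 3.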

  4. cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *global_work_offset, const size_t *global_work_size, const size_t *local_work_size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) • Note: the context associated with the events in event_wait_list and the context of command_queue must be the same
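
In OpenCL 1.x, when local_work_size is given, each global_work_size[d] must be an exact multiple of local_work_size[d], or the call fails with CL_INVALID_WORK_GROUP_SIZE (OpenCL 2.0 relaxes this with non-uniform work-groups). The plain-C sketch below mirrors only that 1.x check and the resulting work-group count per dimension; `count_work_groups` is a hypothetical helper, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check mirroring the OpenCL 1.x rule for
 * clEnqueueNDRangeKernel: in every dimension, global_work_size must be a
 * multiple of local_work_size. On success, writes the number of
 * work-groups per dimension (what get_num_groups(d) would return) and
 * returns 0; otherwise returns -1. */
int count_work_groups(unsigned work_dim,
                      const size_t *global_work_size,
                      const size_t *local_work_size,
                      size_t *num_groups)
{
    assert(work_dim >= 1 && work_dim <= 3);
    for (unsigned d = 0; d < work_dim; ++d) {
        if (local_work_size[d] == 0)
            return -1; /* a zero-sized work-group is never valid */
        if (global_work_size[d] % local_work_size[d] != 0)
            return -1; /* the 1.x API reports CL_INVALID_WORK_GROUP_SIZE */
        num_groups[d] = global_work_size[d] / local_work_size[d];
    }
    return 0;
}
```

For the slide's example (16 items, 8 per group) this yields two work-groups.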

  5. Functions • uint get_work_dim() • work_dim • size_t get_global_size(uint dimindx) • size_t get_global_id(uint dimindx) • size_t get_local_size(uint dimindx) • size_t get_local_id(uint dimindx) • local_work_size • size_t get_num_groups(uint dimindx) • size_t get_group_id(uint dimindx) • size_t get_global_offset(uint dimindx) • the values above are meaningful only for kernels enqueued with the clEnqueueNDRangeKernel function
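
How global_work_offset interacts with the other queries can be sketched in plain C for one dimension: the offset shifts get_global_id but leaves get_group_id and get_local_id unchanged. The struct `ndrange_1d` and the functions `model_num_groups` and `model_global_id` are hypothetical names for this model, which also assumes the global size is a multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of dimension 0 of an NDRange launch. */
typedef struct {
    size_t global_work_offset; /* get_global_offset(0) */
    size_t global_work_size;   /* get_global_size(0)   */
    size_t local_work_size;    /* get_local_size(0)    */
} ndrange_1d;

/* get_num_groups(0), assuming the global size divides evenly. */
size_t model_num_groups(ndrange_1d r)
{
    assert(r.local_work_size > 0);
    assert(r.global_work_size % r.local_work_size == 0);
    return r.global_work_size / r.local_work_size;
}

/* get_global_id(0) for the work-item with the given group and local IDs:
 * global ID = offset + group ID * local size + local ID. */
size_t model_global_id(ndrange_1d r, size_t group_id, size_t local_id)
{
    assert(group_id < model_num_groups(r));
    assert(local_id < r.local_work_size);
    return r.global_work_offset + group_id * r.local_work_size + local_id;
}
```

With an offset of 4, 16 items, and 8 items per group, the global IDs run from 4 to 19 while group IDs stay 0 and 1.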

  6. Synchronization functions • void barrier(cl_mem_fence_flags flags) • may be placed in a conditional or loop, but then every work-item in the work-group must reach it (in a loop, on every iteration) • also queues a memory fence (reads and writes) to ensure correct ordering of memory operations to local/global memory • flags: CLK_LOCAL_MEM_FENCE, CLK_GLOBAL_MEM_FENCE, or both OR-ed together
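
One way to picture barrier(CLK_LOCAL_MEM_FENCE) is the serialized execution a CPU implementation may use: run every work-item of the group up to the barrier, then every work-item past it. The plain-C sketch below models a hypothetical kernel that reverses each work-group's slice through local memory; `reverse_within_groups` and `MAX_LOCAL` are illustrative assumptions, and the gap between the two loops stands in for the barrier.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL 64 /* hypothetical upper bound on the work-group size */

/* Serial model of a kernel that reverses each work-group's slice of `in`
 * through local memory. Each loop runs one phase for every work-item in
 * the group; the point between the loops plays the role of
 * barrier(CLK_LOCAL_MEM_FENCE). */
void reverse_within_groups(const int *in, int *out,
                           size_t global_size, size_t local_size)
{
    assert(local_size > 0 && local_size <= MAX_LOCAL);
    assert(global_size % local_size == 0);
    for (size_t group = 0; group < global_size / local_size; ++group) {
        int tmp[MAX_LOCAL]; /* stands in for a __local array */
        size_t base = group * local_size;

        /* Phase 1: every work-item writes its own element to local memory. */
        for (size_t lid = 0; lid < local_size; ++lid)
            tmp[lid] = in[base + lid];

        /* barrier(CLK_LOCAL_MEM_FENCE): all local writes above are
         * complete and visible before any work-item continues. */

        /* Phase 2: every work-item reads an element another one wrote. */
        for (size_t lid = 0; lid < local_size; ++lid)
            out[base + lid] = tmp[local_size - 1 - lid];
    }
}
```

Without the barrier, phase 2 of one work-item could run before phase 1 of another, so a read could see a stale value.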

  7. memory consistency is enforced only among work-items in the same work-group, not across work-groups • example • global work size: 16 items • work-group size: 8 items/group • the values of id[7] & id[8] are undetermined!
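
The slide's figure is not preserved; assuming the example has each work-item read a value written by its right-hand neighbor, only the pair of items 7 and 8 straddles the boundary between the two work-groups, so no barrier can order that exchange. The plain-C sketch below (hypothetical helpers `same_work_group` and `unordered_neighbor_pairs`, zero global offset assumed) identifies such pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* barrier() only orders memory accesses among work-items of the same
 * work-group. With a zero global offset, two global IDs share a group
 * exactly when they have the same quotient by the local size. */
bool same_work_group(size_t gid_a, size_t gid_b, size_t local_size)
{
    assert(local_size > 0);
    return gid_a / local_size == gid_b / local_size;
}

/* Counts neighbor pairs (i, i + 1) whose communication a barrier cannot
 * order because the two work-items land in different work-groups. */
size_t unordered_neighbor_pairs(size_t global_size, size_t local_size)
{
    size_t count = 0;
    for (size_t i = 0; i + 1 < global_size; ++i)
        if (!same_work_group(i, i + 1, local_size))
            ++count;
    return count;
}
```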

  8. Async copy & prefetch functions • event_t async_work_group_copy(local T *dst, const global T *src, size_t n, event_t event) • the other direction too (dst <-> src) • must be encountered by all work-items in a work-group • if event != 0, the supplied event object is returned, so several async copies can share one event • event_t async_work_group_strided_copy • with an additional parameter: size_t stride • the stride applies to the global address space • void wait_group_events(int num_events, event_t *event_list) • must be encountered by all work-items in a work-group • void prefetch(const global T *p, size_t n) • prefetches into the global cache • no effect on the functional behavior of the kernel
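
The element mapping of async_work_group_strided_copy can be modeled serially in plain C. The real built-in runs asynchronously and cooperatively across the work-group; this sketch only shows which elements move where, with the stride counted in elements and applied to the global side. The function names are hypothetical, and this model simply rejects a zero stride.

```c
#include <assert.h>
#include <stddef.h>

/* Global -> local direction: element i of dst (local) comes from
 * element i * src_stride of src (global). */
void strided_copy_global_to_local(int *dst, const int *src,
                                  size_t num_elements, size_t src_stride)
{
    assert(src_stride > 0); /* this model rejects a zero stride */
    for (size_t i = 0; i < num_elements; ++i)
        dst[i] = src[i * src_stride];
}

/* Local -> global direction: element i of src (local) goes to
 * element i * dst_stride of dst (global). */
void strided_copy_local_to_global(int *dst, const int *src,
                                  size_t num_elements, size_t dst_stride)
{
    assert(dst_stride > 0); /* this model rejects a zero stride */
    for (size_t i = 0; i < num_elements; ++i)
        dst[i * dst_stride] = src[i];
}
```

A typical use is gathering one column of a row-major matrix into local memory by passing the row width as the stride.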

  9. the kernel must wait for all async copies (wait_group_events) before exiting!

  10. Ch. 7: Buffers & sub-buffers • memory objects • buffer: 1-D array of bytes • sub-buffer: 1-D view into a buffer • image: 2-D or 3-D structured data array • memory objects are • allocated against a context (which may have multiple devices) • globally visible to all devices within the context • writes to a memory object may not be visible to a following read without synchronization • read/written by enqueuing commands to a particular device • may be blocking/non-blocking

  11. Creating (sub-)buffers • cl_mem clCreateBuffer(…, cl_mem_flags flags, size_t size, void *host_ptr, …) • flags (bit-field) • CL_MEM_READ_WRITE • CL_MEM_WRITE_ONLY • CL_MEM_READ_ONLY • CL_MEM_USE_HOST_PTR • CL_MEM_ALLOC_HOST_PTR • CL_MEM_COPY_HOST_PTR • use: use the memory at host_ptr as the buffer's storage • alloc: allocate the buffer in host-accessible memory • copy: allocate the buffer and copy its initial contents from host_ptr • alloc | copy: allocate in host-accessible memory and copy from host_ptr
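
The flag rules of clCreateBuffer can be summarized as a small check: at most one access qualifier; CL_MEM_USE_HOST_PTR excludes both CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR; and host_ptr must be non-NULL exactly when USE or COPY is set (otherwise the API reports CL_INVALID_HOST_PTR). The plain-C sketch below uses hypothetical `MEM_*` constants rather than the real values from CL/cl.h, and `buffer_flags_valid` is an illustrative helper, not an OpenCL call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for cl_mem_flags values. */
enum {
    MEM_READ_WRITE     = 1u << 0,
    MEM_WRITE_ONLY     = 1u << 1,
    MEM_READ_ONLY      = 1u << 2,
    MEM_USE_HOST_PTR   = 1u << 3,
    MEM_ALLOC_HOST_PTR = 1u << 4,
    MEM_COPY_HOST_PTR  = 1u << 5,
};

/* Returns true if the flag combination and host_ptr would be accepted. */
bool buffer_flags_valid(unsigned flags, const void *host_ptr)
{
    unsigned access = flags & (MEM_READ_WRITE | MEM_WRITE_ONLY | MEM_READ_ONLY);
    if (access & (access - 1)) /* more than one access bit set */
        return false;
    if ((flags & MEM_USE_HOST_PTR) &&
        (flags & (MEM_ALLOC_HOST_PTR | MEM_COPY_HOST_PTR)))
        return false;          /* USE excludes ALLOC and COPY */
    bool needs_ptr = (flags & (MEM_USE_HOST_PTR | MEM_COPY_HOST_PTR)) != 0;
    return needs_ptr == (host_ptr != NULL);
}
```

Note that ALLOC | COPY passes the check, matching the "alloc | copy" case on the slide.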

  12. alternatively, a sub-buffer can be used
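
A sub-buffer is created with clCreateSubBuffer from a region given by origin and size in bytes (the two fields of cl_buffer_region) and shares the parent's storage rather than copying it. The plain-C sketch below models only that view and the bounds check; `byte_region`, `sub_buffer_view`, `make_sub_buffer`, and `sub_buffer_at` are hypothetical names. The real call additionally requires the origin to be aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN for some device in the context, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the two fields of cl_buffer_region (bytes). */
typedef struct {
    size_t origin;
    size_t size;
} byte_region;

/* Hypothetical sub-buffer: a window into the parent's bytes. */
typedef struct {
    unsigned char *parent_bytes;
    byte_region region;
} sub_buffer_view;

/* Returns 0 and fills *view if the region is non-empty and lies inside
 * the parent buffer; returns -1 otherwise. */
int make_sub_buffer(unsigned char *parent, size_t parent_size,
                    byte_region region, sub_buffer_view *view)
{
    if (region.size == 0 || region.origin > parent_size ||
        region.size > parent_size - region.origin)
        return -1;
    view->parent_bytes = parent;
    view->region = region;
    return 0;
}

/* Byte i of the sub-buffer is byte origin + i of the parent. */
unsigned char *sub_buffer_at(sub_buffer_view view, size_t i)
{
    assert(i < view.region.size);
    return view.parent_bytes + view.region.origin + i;
}
```

Because storage is shared, a write through the view is visible in the parent at the shifted offset.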

  13. Read/write/copy buffers • clEnqueueWriteBuffer • host memory -> buffer • clEnqueueReadBuffer • buffer -> host memory • clEnqueueCopyBuffer • buffer1 -> buffer2 • rectangular segments of a buffer can also be copied using the …Rect versions (clEnqueueReadBufferRect, clEnqueueWriteBufferRect, clEnqueueCopyBufferRect)
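
The …Rect functions address memory row by row: a byte offset is origin_x + origin_y × row_pitch (plus a slice term for 3-D regions), with the region width given in bytes. The plain-C sketch below models the 2-D case only; `rect_layout` and `copy_rect_2d` are hypothetical names, not the real API parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 2-D layout of one side of a rectangular copy. */
typedef struct {
    size_t origin_x;  /* byte offset within a row */
    size_t origin_y;  /* row index */
    size_t row_pitch; /* bytes from the start of one row to the next */
} rect_layout;

/* Copies a width_bytes x height_rows rectangle from src to dst, each
 * side using its own origin and row pitch. */
void copy_rect_2d(unsigned char *dst, rect_layout d,
                  const unsigned char *src, rect_layout s,
                  size_t width_bytes, size_t height_rows)
{
    assert(d.origin_x + width_bytes <= d.row_pitch);
    assert(s.origin_x + width_bytes <= s.row_pitch);
    for (size_t row = 0; row < height_rows; ++row) {
        const unsigned char *from =
            src + (s.origin_y + row) * s.row_pitch + s.origin_x;
        unsigned char *to =
            dst + (d.origin_y + row) * d.row_pitch + d.origin_x;
        memcpy(to, from, width_bytes);
    }
}
```

For example, extracting the 2 × 2 block at (1, 1) of a 4 × 4 byte grid (row pitch 4) into a dense tile (row pitch 2) copies bytes 5, 6, 9, and 10.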

  14. Map/unmap buffers • void* clEnqueueMapBuffer • maps a region of a buffer into host memory • returns a host pointer, which can be passed to libraries (and other functions) • clEnqueueUnmapMemObject • called when the buffer mapping is no longer needed

  15. Ch. 6: Programs & Kernels • program vs. kernel objects? • arguments are passed to a kernel with • cl_int clSetKernelArg(cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void *arg_value)
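
clSetKernelArg copies the bytes at arg_value, so the application may reuse or change that variable as soon as the call returns. For a buffer argument, arg_value points to the cl_mem handle and arg_size is sizeof(cl_mem); for a __local argument, arg_value is NULL and arg_size is the number of bytes of local memory to allocate. The plain-C sketch below models only these copy semantics; `kernel_args`, `set_kernel_arg`, and the fixed limits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 8       /* hypothetical limit on kernel arguments */
#define MAX_ARG_BYTES 16 /* hypothetical limit on one argument's size */

/* Hypothetical argument table for one kernel object. */
typedef struct {
    unsigned char value[MAX_ARGS][MAX_ARG_BYTES];
    size_t size[MAX_ARGS];
} kernel_args;

/* Copies arg_size bytes from arg_value into slot arg_index. A NULL
 * arg_value models a __local argument: only the size is recorded. */
int set_kernel_arg(kernel_args *k, unsigned arg_index,
                   size_t arg_size, const void *arg_value)
{
    assert(k != NULL);
    if (arg_index >= MAX_ARGS)
        return -1;
    if (arg_value != NULL) {
        if (arg_size > MAX_ARG_BYTES)
            return -1;
        memcpy(k->value[arg_index], arg_value, arg_size);
    }
    k->size[arg_index] = arg_size;
    return 0;
}
```

Changing the host variable after the call does not affect the stored argument, which mirrors why it is safe to pass the address of a local variable to clSetKernelArg.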
