File pocl.changes of Package pocl

-------------------------------------------------------------------
Mon Oct 26 21:18:54 UTC 2015 - mardnh@gmx.de

- Update to version 0.12.git1445885653.bfc42b3:
  + Update Vecmathlib
  + Create branch release-0.12
  + Add a bunch of files to dist tarball.
  + Update HSA documentation with a note on performance
  + Clarified the HSA thanking sentence.
  + Do not set HWLOC_TOPOLOGY_FLAG_WHOLE_IO/IO_DEVICES to avoid initializing the OpenCL plugin, which causes all sorts of problems (see issue #256).
  + [cmake] When cmake supports it, use USES_TERMINAL on the check target.
  + Found an ugly way to disable the hwloc OpenCL plugin.
  + The -fno-rtti flag is apparently NOT fixed in 3.7
  + Clean up libpocl.so also in 'make clean'.
  + Require TCEMC for the fp16 test as it fails with TCE 1.12 and LLVM 3.7 for an unknown reason (see issue #259).
  + Bump the version of master branch to 0.13-pre
  + Refuse LLVM versions older than 3.6.
  + Fix issue #255 CMake wants to install pocl.icd with ICD disabled
  + Fix a missing return in test_clCreateSubDevices.c
  + Fix issue #227 - The device-advertised extensions should get a macro
  + Update test to use cl2.hpp.
  + Remove unused variables.
  + Fix several compiler warnings. Note: removed the initializer for pocl_device_ops[] because static vars should be initialized to zero automagically.
  + Revert parts of cf495ed, that caused compiler warning(error) on the ARM bot.
  + TAS: the LLVM globals should be fixed only after the instructions have been fixed.
  + Revert parts of cf495ed, that caused compiler warning(error) on the ARM bot.
  + Skip the 'local struct arrays' test case in case built with HSA because the HSA LLVM branch fails it for some reason.
  + HSA: Fix data race on the packet header write.
  + Need to add -pthread because of an issue with std::call_once that the getDefault() methods of cl2.hpp call:
  + Need to define CL_HPP_CL_1_2_DEFAULT_BUILD, otherwise cl2.hpp adds -cl-std=CL2.0 which pocl doesn't yet support.
  + Added a README.FreeBSD to address issue #263
  + Change the signature of device->ops->free()
  + Move pocl_basic_set_buffer_image_limits() to common.h ... and rename it to pocl_set_buffer_image_limits()
  + Split pocl_set_buffer_image_limits()
  + Remove custom buffer allocator.
  + Remove IBM Cell support.
  + Added a mention of MIPSel, which is now a usable target in 0.12.
  + Remove old LLVM version support in various places
  + Partially revert 7d16ba2fe2c6b04bb619c0228dffe92be9f23dd5
  + Fix some bugs/typos in common.c
  + basic.c / pthread.c : use new mem accounting alloc()/free()
  + pocl_cl.h : Forgot an update in struct pocl_device_ops
  + Update runtime API funcs for the new device->ops->free()
  + Fix a forgotten BUILD_SPU in automake files
  + Updated the HSA credits as requested.
  + Fix syntax error.
  + Use 64bits to store (7<<30)
  + Another attempt to fix shift overflowing int
  + Catch allocated buffer in pthreads. Fixes segfaults in tests.
  + Fix comparison operator in pocl_memalign_alloc_global_mem()
  + Fix issue #267 : Kernel cache is not usable after first run
  + test_kernel_cache_includes.c: a test should not write anything into the source directory.
  + Add a note regarding issue #264
  + Buildbot: mkdir the kernel cache. Add execution timeout as parameter to poclfactory.
  + Fix a bug in the clCreateSubDevices test.
  + Fix a bug in "example1" test.
  + Rename LLC to LLVM_LLC to make it consistent with the rest of variables
  + Various small clean ups
  + Fix a typo in pocl-hsa.c
  + Import upstream changes
  + Release pocl 0.12
  + Update the documentation some more.

-------------------------------------------------------------------
Tue Oct 13 17:24:42 UTC 2015 - mardnh@gmx.de

- Update to version 0.11.git1444330480.b367773:
  + README.HSA:
    * Fix a dead link to a patch
    * Add -DLLVM_ENABLE_EH=ON to cmake command, because libHSAIL headers use exceptions
  + HSA: complain & abort if we can't fill the OpenCL device attributes
  + HSA: Assign each HSA device its own HSA agent
  + HSA: Add regions for each agent into struct pocl_hsa_device_data
  + HSA: Fix a bug in hsa_agent_get_info() call
  + HSA: Fix arguments to hsa_queue_create()
  + HSA: Check that a kernel symbol is in fact a kernel symbol :)
  + HSA: Setup kernel_packet's group/private sizes
  + HSA: dynamic local memory calculation
  + HSA: slightly tune configure.ac - change --with-hsa-runtime-headers
  + HSA: Fix minor glitches from previous commits
  + HSA: cosmetic changes (update messages etc)
  + HSA: fix enough-space check in setup_kernel_args
  + HSA: add HSAIL address space to lib/llvmopencl/TargetAddressSpaces.cc
  + pocl_llvm_api.cc: fix a debug message
  + HSA: Make HSA support buildable with CMake
  + cmake/bitcode_rules: Fixes for two bugs discovered by getting HSA support
  + CMake: Add "libLLVM.so" library name into LLVM shared lib search (unlike autotools-built LLVM which has "libLLVM-3.7.so", cmake-built LLVM has "libLLVM.so.3.7")
  + CMake: Make some paths absolute (was causing issues with out-of-tree builds that went unnoticed until now)
  + scalarwave.c: get rid of compiler deprecation warning
  + HSA: Add find_program(HSAIL_ASM) to CMake, and move the reporting of HSA variables to proper place.
  + HSA: workaround a small bug in current HSAIL LLVM
  + pocl_file_util.h: Fix a misleading comment
  + HSA: make compile() working with LLVM built without libHSAIL.a library
  + HSA: Add search for HSAILasm to configure.ac, and clean it up a bit.
  + pocl-hsa.c: Forgot an #include for WEXITSTATUS()
  + CMake/autotools: don't add -fno-rtti for LLVM 3.7+ according to bug report, this has been fixed in 3.7
  + CMake: fix a bug in kernel/hsail64 building the wrong(host) files
  + HSA: just a few cleanups & clarifications
  + HSA: add a queue callback
  + HSA: Fix a comparison/assignment bug
  + HSA: Add some caching of compiled program / kernel dispatch data.
  + HSA: instead of using hardcoded values for some OpenCL device attributes, use HSA runtime to get the actual values.
  + HSA: Remove redundant code
  + HSA: Fix some compiler warnings and clean up debug messages a bit
  + HSA: lib/kernel/hsail: Fix get_global_id() to use LLVM builtins, and add a new get_global_size & get_num_groups using builtins too.
  + HSA: Fix some bugs introduced by previous commits
  + HSA: pocl_cache.c: put the kernel cache file parallel.bc into the same directory (independent of local sizes) for SPMD devices.
  + CMake: Create a new config.h only if it differs from the old config.h. This helps avoid the situation:
    1) you change some CMakeLists.txt somewhere
    2) you run "make" which runs cmake automagically
    3) cmake rewrites config.h and the following make will pick it up as a change of dependency for a lot of files which #include "config.h"
  + CMake: Do not recreate cl.hpp every time the default target is run.
  + HSA: workaround for a LLVM bug (gridsize intrinsic crash) in lib/kernel/hsail
  + HSA: Add HSAIL implementation for get_work_dim.c
  + CMake: Move POCL_BUILD_TIMESTAMP into a separate file (it's making unnecessary updates to config.h, which triggers a lot of rebuilding)
  + HSA: remove done items from the TODO list, and add a few new ones
  + CMake: only modify install-paths.h when required (avoids unnecessary rebuilds of pocl_llvm_api.cc)
  + HSA: Add an additional check to pocl_hsa_get_agents_callback() and remove some code from init_device_infos() - there's no point in testing device type if we throw away everything but GPU in the get_agents_callback()
  + HSA: Fix cl_device->max_work_item_sizes (CL_DEVICE_MAX_WORK_ITEM_SIZES). This device info actually refers to local (workgroup) size limits, not global grid size limits
  + HSA: Add device profile info - will be required for proper memory allocation routine. For now, puts HSA Base == OpenCL Embedded even though they're not exactly equal.
  + HSA: minor fixes
    1) destroy HSA queue on device->uninit()
    2) add newlines to some debug messages
  + HSA: Correct some device info for clGetDeviceInfo()
  + autotest: remove testsuite-hsa.at, instead add "hsa" keyword to tests, plus a helper: tools/scripts/run_hsa_tests
  + HSA: give caching structs a pocl_ prefix (to distinguish from HSA runtime structs)
  + HSA: fix a copypaste error
  + HSA: fix comment sections in lib/kernel/hsail64 for workitem functions
  + HSA: shorten some long lines to fit into ~80 columns
  + HSA: update CMake&autotools to allow specifying path to HSAILasm
  + HSA: small fixes
    1) remove testsuite-hsa.at from testsuite.at
    2) correct tools/scripts/run_hsa_tests header
  + HSA: Fix a bug in configure.ac (don't fail, just disable HSA, if HSAILasm cannot be found)
  + HSA: Add proper error handling to all HSA runtime calls.
  + HSA: Add ops->alloc_mem_obj callback, use HSA runtime mem alloc routines
  + CMake: fix a typo that is causing compilation warnings in pocl_llvm_api.cc
  + HSA: Report atomic extensions in clGetDeviceInfo() properly.
  + HSA: CMake: disable compilation of amdgcn kernel library.
  + HSA: separate native_*() OpenCL builtins into their own files.
  + HSA: replace several native_*() OpenCL builtins from host lib
  + HSA: llvm.hsail.<name>.<vector-type> intrinsics do not seem to work, this replaces them with simpler versions (not using vectors).
  + HSA: Add native_exp() & native_log() for bases other than 2.
  + HSA: Some of the native_*() only have FP32 intrinsics, so for FP64 we have to fall back on the non-native function.
  + HSA: Forgot to update lib/kernel/hsail64/CMakeLists.txt
  + HSA: hsail kernel lib: Fix template include
  + include/_kernel.h: native_*() may be also available for double/half vtypes
  + HSAIL kernel library: more template macros & natives
  + HSAIL kernel library: add fabs() and floor() builtins
  + HSAIL kernel library: add fma() and mad() builtins
  + HSAIL kernel library: add mul_hi() and mad_hi() builtins
  + Recreate llvm.mem* intrinsics after address space conversion.
  + HSA: the kernel argument for local buffers is 32bit, not 64bit
  + HSA: The program/kernel cache was wrong..
  + HSAIL kernel library: add implementations of mul24/mad24 that use llvm intrinsics.
  + HSAIL kernel library: forgot to update CMakeLists.txt
  + hsail_templates.h: Remove IMPLEMENT_EXPR_V_VPV. This seems to be only useful for fract()
  + hsail_templates.h: Add integer argument to IMPLEMENT_BUILTIN_TYPE_ALL_VECS. This will be required for ldexp()
  + hsail_templates.h: the IMPL_*_ was not defining the scalar; the <type> NAME(*) __asm() declaration, which was used by vectors, did not actually define a function.
  + hsail_templates.h: split up IMPLEMENT_EXPR_TYPE_ALL_VECS .. sometimes we need to only define vectors, not scalars.
  + hsail_templates.h: convert the SMALLINT and EXPR_VVV_ALL_INTS macros to use IMPLEMENT_EXPR_VECS_AND_SCALAR.
  + hsail_templates.h: clean up the macros a bit...
    * Rename IMPL_*_ALL to IMPL_V_*_ALL to make it consistent
    * Remove a bunch of DEFINE_BUILTIN_VV_I32 -style macros, just clutter
    * Rename the remaining DEFINE_BUILTIN_ macros to DEFINE_LLVM_INTRIN_, to make it clear we're using LLVM intrinsics here, not GCC builtins
  + hsail_templates.h: IMPLEMENT_EXPR_ALL() For vectorizing GCC builtins (__builtin_fabs() etc)
  + hsail_templates.h: small fixes
    * Rename EXPR_*_ALL_SMALLINTS to be consistent with the rest
    * Change DEFINE_LLVM_INTRIN_SU_INT32_ONLY (used by mul_hi) to use conversions (EXPR_*_ALL_SMALLINTS) instead of directly calling hsail.smulhi.{i8, i16} which seems to crash llvm.
  + HSAIL kernel library: change fma() & mad() to use the new macros. TODO: implementation/fallback for half types.
  + HSAIL kernel lib: move mul24() & mad24() to new macros
  + HSAIL kernel lib: move native_*() to new macros
  + HSAIL kernel lib: move fabs() & floor() to new macros
  + HSAIL kernel lib: move mul_hi() to new macros
  + HSAIL kernel lib: add implementation of sqrt()
  + HSAIL kernel lib: add implementation of exp()
  + HSAIL kernel library: update CMake & autotools with recently added files.
  + hsail_templates.h: Add a few comments
  + hsail_templates.h: fix a bug in one macro
  + hsail_templates.h: replace __attribute__() with _CL_OVERLOADABLE
  + hsail_templates.h: rename IMPLEMENT_BUILTIN_* macros .. to something more resembling reality.
  + hsail_templates.h: rename IMPL_V_* macros .. to more accurately reflect their purpose.
  + hsail_templates.h: fix a bug in IMPLEMENT_VECWITHSCALARS_V_VI
  + HSAIL kernel library: update mad24() & mul24()
  + HSAIL kernel library: update sqrt_default.ll
    * add target so llvm-link doesn't complain
    * change default rounding mode to near (const arg to instruction)
  + HSAIL kernel library: update sqrt.cl with new macros
  + HSAIL kernel library: implement trunc() and rint()
  + HSAIL kernel library: add copysign() and exp*() family
  + HSAIL kernel library: add ldexp() and ilogb()
  + HSAIL kernel library: add log*() family
  + HSAIL kernel library: add frexp() fract() atan2() pow() implementations.
  + HSAIL kernel library: add sin/cos/tan and their hyperbolic variants.
  + HSAIL kernel library: implement inverse trigonometric functions .. and their hyp. variants
  + HSA: always align kernel arguments to their natural alignment while setting up the kernel's arguments.
  + HSA: add a missed free()
  + HSAIL kernel library: add a simple hypot() implementation
  + HSAIL kernel library: add a simple cbrt() implementation
  + HSA: disable HALF support for now
  + HSA: fix automake to find new #includes
  + HSA: Ihh, completely forgot to add the vml_constants.h file.
  + HSA: Add -D_CL_DISABLE_HALF also for autotools
  + hsail_templates.h: Add some macros that I forgot to commit
  + Initial release notes for 0.12 and some CHANGES items added.
  + [mips] Update readme
  + Added a note about MIPS32.
  + Added mention of dropping older LLVM version support.
  + Docs updates:
  + Add POCL_BBVECTORIZE env to control the use of BBVectorizer.
  + HSA: native_recip.cl: there is no recip() builtin.
  + HSA: Add a few working tests from AMDSDK2.9 to hsa testsuite
  + HSA: Add the clang 3.7 patch required for HSA into tools/patches/
  + Update documentation
  + HSAIL kernel library: implement remainder()
  + HSA: remove image* builtins from the kernel library for now.
  + HSAIL kernel library: replace atomics implementations
  + Update docs of 0.12 release with some notes on HSA
  + Fix for issue #232 - "built with assertions" message
  + Fix issue #212: Invalid build options leave cl_program in invalid state
  + Clean up the atomics extensions being reported in clGetDeviceInfo()
  + Autoconf: we no longer use libHSAIL directly.
  + Fix sub-devices to work again. Fixes the AMD SDK's DeviceFission test.
  + Fix a warning message to not contain extra spaces.
  + pocl-cache: refactor pocl_cache_work_group_function_so_path
  + Fix documentation to reflect that we no longer build LLVM with libHSAIL.a
  + pocl_util.c: add a few helpers for dealing with subdevices
  + pthread.c: fix wrong C func prototype
  + pthread.c: Make pocl_pthread_run use device-max-CU-count threads
  + clBuildProgram.c: Use the new subdevice macros
  + Fix clCreateCommandQueue & clCreateContext for subdevices
  + Fix clCreateProgramWithBinary for subdevices
  + Fix clEnqueueNDRangeKernel for subdevices
  + Fix a bunch of clEnqueue* runtime APIs for subdevices.
  + Passes now.
  + Removed the failing tests from HSA suite to make it usable for regression testing.
  + [buildbot] Allow extra options to cmake.
  + [buildbot] Allow alternatives to 'make' such as 'ninja'.
  + Fix & update buildbot scripts.
  + Fix some compiler warnings
    * unused variable "mcc" in pocl_llvm_api.cc
    * const missing in clCreateProgramWithBinary
    * const char* in clGetProgramBuildInfo
  + Thank HSAFoundation.
  + Fix the previous commit's fix... a bit too much removing.
  + CMake: Add 3.7 variants of llvm-config to searched program names.
  + Autotools: add missing files discovered by `make distcheck`

-------------------------------------------------------------------
Sun Sep 20 10:47:03 UTC 2015 - mardnh@gmx.de

- Update to version 0.11.git1442312445.522717b:
  + Query HSA info by hsa_get_agent_info & clinfo
  + Start to support local arguments
  + Start to fix "Unsupported Sufficient Local Memory" Error
  + Support device type by hsa_agent_get_info
  + Implement CL_PROGRAM_NUM_KERNELS
  + Workaround and a regression test for #195:
  + Break constant expressions with address space casts
  + The LLVM Asserts still seem to catch the illegal AS bitcast. Thus, the workaround works only for Release built LLVM.
  + LLVM older than 3.4 does not support AddrSpaceCast.
  + Another white list attempt. Of course 3.2 or 3.3 do not care about the issue at all.
  + Fixes #198 by declaring the frexp and modf properly.
  + Correct frexp
  + frexp not defined if VML is not enabled.
  + Correct frexp implementation
  + Correct syntax error in frexp
  + *.ll: support barrier and fix get_global_size function pattern problem; pocl_llvm_api.cc: use the spmd boolean to determine whether the automatic-local pass is used; pocl-hsa.c: support local argument issues (TODO: not finished yet)
  + Correct frexp implementation and test results
  + Use double underscores for __aligned__ to avoid possible name clashes
  + Avoid nonstandard "-n" option for echo
  + Add missing include files
  + Avoid non-standard <malloc.h>
  + Correct frexp implementation, once more
  + Locally declare called function
  + Correct return values
  + Correct include statements
  + Correct frexp implementation: Convert vector size explicitly
  + use XDG_CACHE_HOME or ~/.cache/pocl for the kernel cache
  + check return value of snprintf in pocl_cache_init_topdir
  + frexp: Correct conversion function call
  + We now kcache kernels with #includes. Fix the docs.
  + Added Lars some CREDITS
  + remove outdated LAUNDRY file
  + Extend/clean up gitignore
  + Don't install utlist.h: not needed, may conflict with existing installation
  + Use const and static if possible
  + Disable crude mkdtemp replacement for Windows
  + Remove _cl_kernel.function_name in favor of .name
  + Don't cast to or from (void *), especially not malloc returns
  + Restore (void*)-to-uintptr_t cast, committed by mistake
  + Workaround for a build log crash due to a lack of build dir hash (Issue #212). Patch by Andreas Klöckner.
  + Using HSA device list to query HSA API unsupported device features
  + Free allocated sampler argument.
  + Remove unused instructions
  + Initialize cl_sampler kernel arguments for basic and pthread.
  + No point verifying the dominator trees as they should be outdated after the function is modified in any case, and are updated by the pass manager framework when needed the next time.
  + Type changes in LLVM 3.7.
  + LLVM 3.7 API changes + style cleanups
  + Slight rework of the (hacky) passing of the local size
  + Fixes to issues revealed by LLVM 3.7
  + Protect half and double versions with the respective #ifdefs in kernel-vecmathlib.h
  + Add a (starting point for a) testsuite for cases that work with the HSA driver. The suite is enabled in case HSA support is enabled.
  + Modify the indentation style to follow GNU style. Fix the "unable to get group segment size" bug
  + Initialize hsa device list by compile-time initialization
  + Added Chen to CREDITS.
  + Remove unneeded function.
  + Comment improvement.
  + Updated the hsa driver to the 1.0 API. Doesn't currently work due to work in progress related to AMD's new binary format. At the moment fails when trying to import the kernel symbol from the built binary.
  + Updated README note for HSA in the master branch.
  + Fixes a bug in clGetDeviceIDs when device type isn't found
  + Add missing private header file to the install directory
  + Started with HSAIL64.
  + Added Pavan to CREDITS.
  + Detect cmake built LLVM dynlib.
  + HSA updates
  + HSAIL: example1 now passes via HSAIL compilation chain. Cleaned up the code a bit.
  + Fall back to static LLVM lib in case cmake built .so detected.
  + Disable the native AMDGCN target for now.
  + Fix for older than 3.7 LLVM.
  + Added cl2.hpp.
  + Updated tests to use new cl2.hpp.
  + Removed checks for OpenGL headers now that we're using cl2.hpp.
  + Added cl2.hpp to library_include_HEADERS.
  + Added cl2.hpp modification to CHANGES.
  + Correct implementation of "length"
  + Add test case
  + Add test case
  + Correct normalize
  + Update and activate test case
  + User events do not have queues.
  + Handle infinity in length
  + HSA:
    * fixed an off-by-one error in arg space calculation
    * other accidentally committed issues fixed
  + TAS is not needed again for x86_64 in LLVM 3.7. The printf() issue didn't reproduce. So, disable it again as it seems to cause new issues. This should make the issue #230 go away. At least it did for me.
  + Ensure LLVM_3_7 is defined in cmake build.
  + [cmake] Make it possible to specify the host CPU.
  + [cmake] Make it possible to override CL_DISABLE_HALF.
  + Option -Werror added for clBuildProgram
  + Add Romaric to CREDITS
  + Correct usage of CreateConstGEP2_32().
  + hwloc probes OpenCL device info at its initialization in case the OpenCL extension is enabled. This causes an unimplemented property error to be printed, because hwloc is used to initialize global_mem_size, which is not yet set at that point. Just put a nonzero value there for now.
  + Initial AMD HSA support in CMake. Not yet in a usable state.
  + Add CMake test part for commit 8daf46f8778 (Workaround and a regression test for #195)
  + CMake part for commit 1af342b313973a (test local struct array only on LLVM without assertions)
  + Add CMake part for commit 6cad56867 (Fixes #198 by declaring the frexp and modf properly.)
  + CMake: Fix a test for bug in LLVM that incorrectly stated it's checking for assertions; it's checking for LLVM's debugging support.
  + CMake: get rid of some debugging clutter
  + Revert "Don't cast to or from (void *), especially not malloc returns"
  + Fix devel-envs.sh for cmake.
  + CMake: cmake part for 091c9de8923d208a324daf87ac40a6240c5c5dd2
  + Silence compiler warnings on ARM. No functional change intended.
  + Update buildbot scripts. Kernel cache environment variable changed.
  + SPIR modules should not specify the target.
  + Documented the TargetAddressSpaces pass.
  + Revert "SPIR modules should not specify the target."
  + Flatten address spaces also with x86_64 (again).
  + Added a regression test for issue #231 from Andreas Klöckner and James Price.
  + Added some example dirs to the .gitignore.

-------------------------------------------------------------------
Wed Jun 24 19:16:35 UTC 2015 - mardnh@gmx.de

- Update to version 0.11.git1434115277.56749df:
  + Added a FAQ about the pocl source's language usage.
  + Only add -unroll-threshold option once
  + Infrastructure for more credible device partitioning
  + Test for clCreateSubDevices
  + Test-passing implementation of clCreateSubDevices
  + Document the new feature in CHANGES
  + Patch the AMD SDK's DeviceFission examples
  + cl{Release,Retain}Device is a no-op for root devices
  + Improve sub-devices testing
  + Add some TODO notes about sub-devices
  + Test creating kernels from empty program
  + clCreateKernelsInProgram: don't segfault on empty program
  + clCreateProgramWithSource: allocate based on program, not context properties
  + clCreateProgramWithBinary: fix NULL pointer deref
  + clBuildProgram: check input parameter validity earlier
  + Warning fix.
  + Fixed test_clGetEventInfo test indentation and removed event reference count check.
  + Command queue management utility
  + Fix qeue => queue typo
  + clEnqueueNDRangeKernel: allocator size mismatch
  + clCreateSubBuffer: allocator size mismatch
  + Replace conditional checks with asserts
  + pocl_llvm_codegen: avoid NULL pointer dereference
  + clFinish: fix event deref
  + clGetSupportedImageFormats: plug memory leak
  + GenerateHeader: initialize is_* arrays
  + clBuildProgram: refactor token append code
  + Filter out yet another flaky piglit test case.
  + global_variables-test-opencl of ViennaCL now passes
  + Fixes #183
  + Tweaked the test a little
  + Pass strings by const ref
  + Buffer read/copy/write test: check number of devices
  + event cycle test: use pocl_tests.h
  + Test memory violation errors for failing commands
  + Fix memory violation errors on failing enqueues
  + Tune event_free test
  + test_event_free: also test image commands
  + Fix memory access violation in image enqueue commands
  + test_event_free: also test EnqueueMarkerWithWaitList
  + clEnqueueMarkerWithWaitList: fix mem access violations
  + Tests: check for NULL returned on failures
  + pocl_hash: avoid dead stores
  + regression tests: return AFTER unmapping
  + test_id_dependent_computation: return AFTER unmapping
  + image test: free imageData
  + Autonomous POCL_UPDATE_EVENT_* macros
  + Standard compliance for user events
  + Add context member to _cl_event
  + Sanitize pocl_create_command
  + clCreateContextFromType: early CL_INVALID_VALUE bail-out
  + clEnqueueNativeKernel: improve memory management
  + clFinish: add check for command_queue pointer to be non null
  + clWaitForEvents: add check for parameters
  + This is a workaround to a nasty problem with libhwloc: When initializing basic, it calls libhwloc to query device info. In case libhwloc has the OpenCL plugin installed, it initializes it and it leads to initializing pocl again which leads to an infinite loop.
  + A pointer to store device-specific data in the run command
  + cl_khr_fp64 might be defined by Clang in the later versions
  + Device-setting for whether the target is an SPMD target
  + Helper to query the file size
  + Default SPMD settings for old drivers
  + Preliminary HSA kernel agent support
  + Forgotten file
  + Further forgotten files. Sorry!
  + No pocl_hsa_init_device_ops ref when not building HSA support.
  + Cloned an older version of the HSA Runtime Reference so it can be used with the current R600 LLVM backend for the initial HSA-based GPU support.
  + The example command line modified to point to the cloned repo.
  + Fix out-of-source-tree build for the amdgcn. It didn't override the get_global_id.c correctly in that case.
  + Return an empty build log if there's none.
  + Fix to the fix. The string is owned by the caller.
  + Don't OR ExecuteAction return value with previous success
  + Make program cachedir creation more robust
  + Proper log handling in case of early failure in program build
  + Plug memleak in clGetProgramBuildInfo
  + test_clBuildProgram: const-ify kernel sources
  + clBuildProgram: test preprocessing failures
  + clCreateProgramWithSource: plug memleaks on error
  + clReleaseProgram: only free program binaries if present
  + Also test getting the program build info on failure
  + Fix wrong NULL termination for buildlog
  + pocl_llvm_api: use get_build_log everywhere
  + get_build_log: only dump to std::cerr on VERBOSE or DEBUG
  + Unlink dumped source file in case of successful program build
  + Warning fix.
  + Set llvm::GlobalValue::LinkOnceAnyLinkage to the barrier function
  + Central location for the debug macros
  + Propagate uniform info to the alloca in phi2alloca conversion
  + Ensure AA is initialized
  + Fixed issue with WGfast with vector types passed by value
  + Debug output improvements
  + The kernel compiler updates fixed these two cases

-------------------------------------------------------------------
Mon Apr 13 21:21:21 UTC 2015 - mardnh@gmx.de

- Update to version 0.11.git1428871075.01e951d:
  + CMake: clock_gettime() requires -lrt on systems with older glibc
  + CMake: adjust cmake/LLVM.cmake for upcoming LLVM 3.6
  + CMake: if condition for NDEBUG test was reversed
  + CMake: Add LLVM 3.6 stuff to config.h.in.cmake
  + Add missing define of POCL_MSG_PRINT_INFO
  + CMake: lib/CL/CMakeLists.txt: move LLVM_SYSLIBS to PUBLIC link list, move Hwloc_LIBRARIES back to devices/CMakeLists.txt
  + CMake: tests/kernel/CMakeLists.txt: fix tests to catch up with autotest
  + CMake: tests/regression/CMakeLists.txt: include & fix-include directories are already in the flags, from toplevel CMakeLists.txt, restore to previous
  + CMake: Reorder the search for llvm-config to search from latest version
  + CMake: clean up the clang library list
  + CMake: clean up the mess that is `llvm-config --lib*`. We use llvm-config to get the library names only, and use cmake's find_library to handle the rest (like library suffixes, spaces in paths, etc).
  + CMake: The try-compile test that used to find out if clang uses libc++ now stopped working on Mac OS X, so this updates it a bit. Also adds -stdlib=.. to LLVM C/LD flags.
  + CMake: minor fixes for Windows
  + CMake: Hwloc changes.
    1) update the Windows part of FindHwloc to properly set Hwloc_LDFLAGS
    2) settle on using Hwloc_{C,LD}FLAGS in cmake files
  + CMake: on Windows, make sure we search for patch.exe in PATH env var as well.
  + CMake: simplify the HAVE_CLOCK_GETTIME / -lrt logic a bit
  + CMake: move the include_directories(LLVM) to proper places
  + CMake: clean up the POCL_LLVM_LIBS logic a bit
    1) always set it to either LLVM_SHARED_LIB_FILE (dyn) or LLVM_LIBFILES (static)
    2) move LLVM_SYSLIBS in libpocl linking to PRIVATE linking
  + CMake: clean up cmake summary messages WRT recent changes in LLVM_LIB* vars
  + CMake: clean up #define/#cmakedefine in config.h.in.cmake
  + configure.ac: fix LLVM_3_6 version check logic
  + autotools: Add AM_CONDITIONAL and an if around -lrt, Mac OS X doesn't have -lrt or clock_gettime()
  + CMake: add -std=c++11 to the libc++ try-compile test
  + CMake: add LLVM_LDFLAGS to the DNDEBUG try-compile test
  + CMake: add a check for official LLVM 3.5 of Ubuntu which is broken (really not a 3.5 but some 3.4+patches svn checkout)
  + Fix a race condition in pocl_llvm_get_kernel_metadata
  + Fix a few race conditions in clBuildProgram / pocl_llvm_api
  + Fix a few unused variable warnings
  + autotools: fix missing CloverLeaf in SUBDIRS
  + Release version tagging
  + Update the release notes
  + autotools: Add missing files to dist
  + autotest: replace abs_top_builddir with abs_top_srcdir for expected test outputs (they are not copied to the build dir)
  + Updated some CHANGES.
  + No direct funding from Nokia in this release.
  + Rename config.h to pocl_config.h as installed headers need to include it and there can be multiple packages with config.h.
  + Bump version in master branch to 0.12-pre.
  + Add cmake files to the autotools distribution.
  + Reverted accidental ViennaCL Makefile mod
  + Pthread: Fixed offset to be actually added to device mem pointer
  + TCE: the half loopvec case has magically fixed itself.
  + Libversion to llvmopencl.so
  + testsuites referred to config.h which is now renamed to pocl_config.h
  + Revert "Rename config.h to pocl_config.h as installed headers need to include it and there can be multiple packages with config.h."
  + Revert "testsuites referred to config.h which is now renamed to pocl_config.h"
  + The convert type saturated float to int test cases stored the min and max possible value to a float which is subject to rounding errors in comparisons and conversions. LLVM 3.6 failed these but LLVM 3.5 did not. Stored them to integers always and now LLVM 3.6 works also.
  + Don't pass an enum value as a bool.
  + Added Hugo to the CREDITS
  + Avoid cleaning up CMakeLists.txt.
  + Add the differences to the test title (loopvec) vs. (loop).
  + Don't check for ARC support in Clang - pocl doesn't do Objective-C yet.
  + Added testsuite for OpenCV 3.0 beta
  + Removed an accidentally copy-pasted untrue sentence ;)
  + Added proper download command and a known issue.
  + Version the master as 0.12-pre (I thought this was already done, perhaps some merge issue?).
  + Fix pocl llvm api for LLVM 3.3, which doesn't have ToolOutputFile(char*, int)
  + Fix CMakeLists.txt for autotest changes from b4a74691e56bedb
  + The newlib headers of TCE expect to see valid long and double (which in 32bit TCE are defined to be 32bit).
  + Add LLVM_VERSION to the kcache hash.
  + Fix build on LLVM 3.2 and 3.3
  + Add LLVM_VERSION also to the config.h for Cmake builds
  + Corrected WORDS_BIGENDIAN for cmake build.
  + Fix Ctest to get 100% passed in a successful make check run.
  + TCE Cmake build fixes
  + Fix the .so version (this is 0.11 still)
  + Added FAQ: Why pocl is slow?
  + is pocl or pocl is?
  + clReleaseProgram object release order crash fix by James Price.
  + Fixed image for clCreateImage3D
  + Added Mateusz some CREDITS :)
  + Fix WORDS_BIGENDIAN macro usage
  + Revert "Fix the .so version (this is 0.11 still)"
  + Revert "Bump version in master branch to 0.12-pre."
  + Fix autotools & WORDS_BIGENDIAN
  + Added 0.11 release.
  + Revert versions to 0.12-pre again.
  + Michal managed 0.11, good job!
  + Fix CHANGES
  + Fix .so version in master
  + CMake: find_library() on libllvm.so should only look in LLVM_LIBDIR, otherwise it could find a .so in system dirs which belongs to a different llvm installation.
  + Added implementation for CL_PROGRAM_KERNEL_NAMES
  + Fix to prev commit
  + ./configure to allow LLVM 3.7
  + LLVM 3.7svn fixes
  + 0.11 was released in March.
  + Forgotten CompilerWarnings.h
  + Drop 'override' to support older gccs.
  + CMake: make commands lowercase to match the rest
  + Fix a few warnings about using uninitialized variables
  + Make the timestamp in pocl debug messages have exactly 9 digits for nanoseconds
  + Clip max_work_item_sizes to max_work_group_size
  + Minor cosmetic fixes (unused vars etc)
  + CMake: Enable POCL_DEBUG by default (already set in autotools)
  + CMake: Fix HAVE_GLEW to always have a value
  + Cleanup: fix signed-unsigned mismatches and type sizes (size_t/int)
  + Slightly clean cpuinfo.c of unsigned-signed mismatch
  + pocl_hash.h: don't export the SHA functions in the .so library
  + clCreateUserEvent.c: Fix warning: implicit declaration of function 'pocl_create_event'
  + Track LLVM 3.7 API changes. Mark regressions as XFAIL in autotools.
  + Unbreak build on LLVM <3.7
  + Fix LLVM 3.2 build
  + CMake: On Windows, the math library seems to be included in the standard libraries, so no linking is required
  + CMake: Fix the "/" "\" mess of llvm-config on Windows
  + CMake: various small fixes (message texts etc)
  + CMake: MSVC doesn't like -L<path> library paths, and it's useless on Windows because of static linking.
  + CMake: Prepare for adding timestamps in pocl-debug functions on Windows
  + Move debugging helpers into separate pocl_debug.h
  + Uncrustify pocl_debug.h
  + Add timestamp functions for pocl-debug also on Windows
  + Explicitly cast integers to floats (gets rid of MSVC barking)
  + Wrap GCC visibility pragmas in ifdef GNUC; MSVC doesn't understand them
  + Fix memcpy() calls: can't add int to typeless/sizeless void*
  + Add pocl_debug.{h,c} to autotools as well
  + pocl_debug: Tune the debug messages a bit
  + Remove pocl_debug_init_time(), ended up unused.
  + Unmark one 3.7 regression - fixed already in latest LLVM.
  + Use cxxflags from TCE if building the ttasim driver.
  + Make a few functions from pocl_util more portable & secure by using LLVM sys::fs API calls.
  + Make the pocl_util file read/write/append functions more portable by using LLVM's sys::fs API. Also make them race-free by using locks.
  + Add file-level locking using llvm::LockFileManager to several pocl operations like file read/write/touch. This should make things more reliable when running multiple pocl processes over the same program.
  + Simplify & unify a bunch of file writing to pocl_write_file{_cpp} functions.
  + Add a replacement for llvm::tool_output_file to write out llvm::Module objects to disk.
  + Add PoclLockFileManager.{cc,h} - a class that extends llvm::LockFileManager class (portable file-system level lock) a bit. Mostly enhanced with methods to atomically read/write/remove/test a file. Used by kernel cache.
  + Add pocl_cache.{c,h} - move all (C callable) cache related functions to a single place.
  + Add LLVMFileUtils.cc: a bunch of file utility functions, callable in C, but written in C++, using llvm::sys::fs heavily, for portability reasons.
  + Add previously added files to CMake/autotools
  + lib/CL/pocl_llvm_api.cc: remove old code that was moved to pocl_cache.c
  + clBuildProgram.c: move to new kernel cache functions
  + clCreateKernel.c: move to new kernel cache functions
  + clEnqueueNDRangeKernel.c: move to new kernel cache functions
  + clGetProgramBuildInfo.c: move to new kernel cache functions
  + clReleaseProgram.c: move to new kernel cache functions
  + pocl_llvm_api.cc: update pocl_llvm_build_program() to use the new kernel cache
  + pocl_llvm_api.cc: update pocl_llvm_get_kernel_metadata() to use the new kernel cache
  + pocl_llvm_api.cc: update pocl_llvm_generate_workgroup_function() to use the new kernel cache
  + pocl_llvm_api.cc: update pocl_update_program_llvm_irs() and pocl_llvm_update_binaries() to use the new kernel cache
  + lib/CL/devices/common.c: move to new kernel cache functions
  + piglit: Updated the reference file for LLVM 3.5 and latest piglit. Filter out white space from the piglit output as it seems to randomly generate spaces here and there, making comparison difficult.
  + Mark XFAILs for LLVM 3.7 in amd and viennacl tests.
  + Test the possibility to call a kernel 'init()'
  + Change kernel launcher mangling to _pocl_launcher_
  + Add better checks to clCreateKernel(sInProgram) calls - check for the program's build status before trying to get the kernel.
  + A few more unsigned-signed / size_t-int fixes
  + Make pocl_llvm_build_program also fill in program->binaries[device], in addition to writing program.bc in the cache. This lets us avoid reading program.bc later on just to get the binaries.
  + Test for linking errors during clBuildProgram
  + Missing char ;)
  + Added opencv.patch
  + clCreateKernel: return CL_INVALID_PROGRAM instead of invalid value
  + clCreateKernelsInProgram: fix a mistake in error handling....
  + Change program->binaries[] to an array of separately malloc'ed buffers, instead of a single buffer at [0] with the rest being pointers into it. This will allow later to re-allocate binaries by one, as needed.
  + clBuildProgram: Do not try to reallocate program->binaries and program->llvm_irs every time we rebuild - reallocating to smaller arrays would lead to memleaks. Instead, walk the entire program->devices and see which devices on the supplied list need building.
  + clBuildProgram(): minor cleanup
  + lib/CL/clGetProgramInfo.c: simplify the logic a bit
  + lib/CL/pocl_cache.c: Add a bunch of asserts to make debugging easier
  + Get rid of PoclLockFileManager class - it's overcomplicated and brings too much overhead. Will do a simpler locking based on just a single lock per program.
  + Add ifdefs to GCC visibility pragmas in a few more headers
  + Clean up getting pocl_verbose from environment, this seems to have been unreliable for some reason.
  + Change the locking functions back to a simple llvm::LockFileManager
  + pocl_file_util.h: Just a bit of cleanup in the header
  + Get rid of PoclLockFileManager and use the simpler locking in the runtime library. This involves getting a lock when entering a cl* call that works with the cache, and releasing the lock at return, so the lock granularity is per-call instead of per-file-operation.
  + Get rid of PoclLockFileManager in LLVMFileUtils.cc, using mostly the same code, except not locking for each file operation.
  + pocl_cache.c: fix pocl_cache_write_descriptor() to "mkdir -p" the required directory first.
  + pocl_cache.c: do not "rm -rf" the program's cache directory based solely on POCL_LEAVE_KERNEL_COMPILER_TEMP_FILES in pocl_cache_cleanup_cachedir()
  + Change device_i to unsigned in pocl_llvm_api.cc
  + pocl_llvm_api.cc: fix memleaks in pocl_llvm_update_binaries()
  + Expand image query test
  + Implement get_image_dim
  + CHANGES: 0.11 was released already.
  + CompilerWarnings.h, not .hh
  + Group kcache dirs by first two digits
  + Unmark XFAILs on ARM-LLVM3.7
  + Buildbot script: get the testsuite logfile correctly for cmake builds.
  + CMake: Add get_image_dim.cl to sources-with-vml list as well.
  + Buildbot updates, add fileIsImportant filtering.
  + Track LLVM 3.7 API change.
  + Buildbot, don't use kernel cache if requested not to.
  + Fix build against <LLVM 3.7, broken in 10793bfd.
  + Clean up some whitespace and typos in comments
  + Always clone event list when creating command
  + Add a simple read/copy/write buffer test
  + Add test to check for neverending loop in event wait lists
  + AMD ICD loader supports OPENCL_VENDOR_PATH for overriding the ICD dir. Set it also so AMD ICD loader can be used with the pocl test suite.
  + GenerateHeader::ProcessPointers: use existing image/sampler classifier
  + De-anonymize image concrete types
  + CMake: add code that computes the SHA1 of kernel-<machine>.bc and a few files from pocl/include (like _kernel.h) and use these SHA1s in generating the build hash. This makes sure the kernel cache is invalidated also at kernel library code changes.
  + Make pocl_hash.h includable in C++ code
  + Make the kernel cache depend on preprocessed CL source, part 1
  + Make the kernel cache depend on preprocessed CL source, part 2
  + Make the kernel cache depend on preprocessed CL source, part 3
  + Make the kernel cache depend on preprocessed CL source, part 4
  + Make the kernel cache depend on preprocessed CL source, part 5
  + Fix get_image* functions on LLVM 3.2. Fix by tango.
  + Add kernellib SHA1 generation rule for autotools
  + pocl_llvm_api.cc: remove redundant argument from kernel_library()
  + Reintroduce "Group kcache dirs by first two digits" commit in new place
  + TCE: Do not use the kcache temp dir when generating the vendor extension header. Use the TCE's backend temp dir instead.
  + Warning fixes.
  + LLVMFileUtils.cc: remove an unused function (lock_is_owned)
  + Rename pocl_cache_kernel_so_path to pocl_cache_work_group_function_so_path
  + Make POCL_PARALLEL_BC_FILENAME consistent with other cache filenames
  + Use pocl runtime config, instead of getenv(), to get POCL_CACHE_DIR
  + LLVMFileUtils.cc: adapt to older LLVM releases (up to 3.3). Needs more testing, but kernel cache locks will be disabled with llvm < 3.5
  + Update documentation on POCL_KERNEL_CACHE env variable.
  + Remove the POCL_KERNEL_CACHE_IGNORE_INCLUDES switch, doesn't make sense anymore with preprocessed sources
  + Add a new helper poclu_write_file in libpoclu
  + pocl_llvm_build_program: separate build log printing into get_build_log() and make sure we properly return failure if the preprocessing step fails.
  + Add a test for kernel cache returning correct (different) binary for CL programs that #include sources, when those included sources change.
  + Add recent tests by G. Bilotta to CMake tests
  + TCE: add a (now) missing include.
  + Close cpuinfo after reading it
  + Get CPU vendor from cpuinfo too
  + Try to find a meaningful CL_DEVICE_VENDOR_ID
  + Set spec-minimum values for image_max_{buffer,array}_size too
  + Correct handling of subdevices in clBuildProgram()
  + Compute maximum image dimensions from max_mem_alloc_size
  + Don't put arbitrary limit on max_mem_alloc_size
  + mem_base_addr_align is in bits, not in bytes
  + Remove unnecessary repeated init
  + printf_buffer_size: use spec min value
  + Move buffer and image limits out of topology handling
  + Kernel cache: move top cache directory creation to pocl_init_devices(). This is required if we want to use the cache early (before regular program cachedirs are available) for temporary files.
  + pocl_cache_write_program_source() doesn't need a device_i argument, since at that point the cachedir is not available yet and we have to use a temporary file.
  + Fix tmpnam() usage. The warnings were annoying, so let's try to use mkstemp() on Linux and _tempnam on Windows. Since this gets a bit involved, moved it into pocl_cache_mk_temp_name().
  + Just two more fixes 1) remove redundant llvm includes 2) add explicitly O_RDWR to open(), otherwise it does O_RDONLY
  + Find cache information via hwloc
  + Missing pocl_launcher prefix.
  + There seem to be nondeterministically passing cases in piglit :I
  + Filter out a nondeterministically passing piglit test case.
  + Find cache information via hwloc
  + Update CHANGES
  + Track LLVM 3.7 API change.
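The saturated float-to-int conversion fix in this update comes down to single-precision rounding. A minimal Python sketch of the effect, assuming 32-bit int limits and IEEE-754 single precision; the helper name to_float32 is made up for illustration and is not part of pocl:

```python
import struct

# Limits of a 32-bit signed integer (assumed target type for this sketch).
INT_MAX = 2**31 - 1
INT_MIN = -(2**31)


def to_float32(value):
    """Round a number to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# A float has a 24-bit significand, so INT_MAX cannot be stored exactly:
# it rounds up to 2**31, which is already outside the int range.
rounded_max = to_float32(INT_MAX)   # 2147483648.0
rounded_min = to_float32(INT_MIN)   # exactly -2**31, a power of two

# Comparing against the rounded float limit therefore misjudges
# values near INT_MAX, while integer limits stay exact.
max_is_exact = rounded_max == INT_MAX   # False
min_is_exact = rounded_min == INT_MIN   # True
```

Keeping the limits in integer variables, as the entry describes, avoids comparing against the rounded value.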

-------------------------------------------------------------------
Fri Feb 13 13:27:23 UTC 2015 - mardnh@gmx.de

- Update to version 0.10.git1423815643.a0ccdfd:
  + Mark hadd test as XFAIL on all x86_64 variants.
  + Mark AMD montecarlo unexpected pass on 3.4
  + Fixed the barrier duplication problem by using the attribute `noduplicate`
  + The noduplicate for barrier fixed a couple of previously failing cases.
  + XFAILs for tests that fail without the noduplicate (introduced in LLVM 3.5?).
  + The XFAIL to the correct test this time.
  + Change the kernel cache directory to ~/.pocl/kcache to prepare for having a configuration file or similar other data under ~/.pocl.
  + fixed incorrect lock initializations
  + fixed minor clang warnings
  + Release managing tips updated.
  + Updated Windows readme with complete build script.
  + Basic read support for CL_BGRA image format.
  + Different rounding modes to cl_half from float conversions
  + Enabled tce write_rect()
  + Enabled also tce_read_rect()
  + Transferred buffer offset calculation to device driver side
  + Fixed tce read/write_rect
  + Fixed again tce read/write_rect
  + updated driver api modifications to CHANGES + code cleanup

-------------------------------------------------------------------
Tue Dec 30 20:33:07 UTC 2014 - mardnh@gmx.de

- Update to version 0.10.git1419678929.c2f9267:
  + Added clCreateSubDevices implementation for AMD SDK DeviceFission
  + Changed DeviceFission test to expected pass
  + Added DeviceFission fixes to AMD SDK patches
  + Adding sub devices increases parent device's reference count
  + Put the info printout to a single line so there's only one info printout per kernel enqueue
  + testsuite-halide:
    - Advise to use the pocl's branch of Halide until the necessary changes are upstreamed
    - Add local_laplacian, bilateral_grid and interpolate sample apps
  + Enable the debug message printout capability by default as it should not incur significant overheads unless enabled by POCL_DEBUG=1.
  + Fix the issue with wrong format string.
  + Fix LLVM 3.6svn by changing the order of two clang libraries
  + Fixes to Halide example build
  + Add -lclangAST to the end of the linkage list also to unbreak LLVM 3.5.
  + The POCL_DEBUG output requires clock_gettime, and older glibc versions have it in a separate lib.
  + Maintain separate directories for heterogeneous cpus in multi-node clusters
  + Capture build timestamp in cmake. This is a parameter for the kernel cache.
  + Rename function to something meaningful
  + Fixed buffer allocation problem for memories bigger than 2GB
    - Updated `chunk_slack` function to return only the result of the comparison and set `last_chunk_size` via a pointer
    - Updated the other functions which use that function
  + Halide: force linking to the LLVM shared lib as it will be required in the upstream.
  + Changed coding style
  + Fix a race condition in the lock initialization.
  + Parent device's information copied to sub devices.
  + Instruct to use the upstream Halide now that the LLVM shared lib patch was merged in.
  + benchmark.py: Add Halide cases.
  + benchmark: Add a 4K image Halide benchmark. The sample is in public domain:
  + Disable the custom bufalloc by default for pthread for now.
  + Fix an issue where pthread malloc allocator freed host pointers.
  + Missing backslash.
  + Download the 4k test image on demand instead of having it in the git repo.
  + Typo in the img name.
  + Include PACKAGE_VERSION and POCL_BUILD_TIMESTAMP in the cache key hash, do not have a separate file for POCL_BUILD_TIMESTAMP.
  + Add Volkan Keleş to CREDITS.
  + Cleanups.
  + Added CloverLeaf OpenCL to the test suites.
  + Ensure the Halide cases are run with opencl. At least interpolate does not run with opencl by default.
  + Use the 4K image.
  + Add POCL_KERNEL_CACHE_IGNORE_INCLUDES env
  + Clean indentation.
  + The upstream clover.in works now
  + Add debug printout to clBuildProgram
  + POCL_RETAIN_OBJECT used properly.
  + Add pocl include to the build paths as the first entry if using pocl's CL headers.
  + Fix broken offset computation in read rect.
  + Fix a broken copy rect in 'pthread' by using a working one from 'basic'.
  + GNU style indentation
  + Added CREDITS to Lassi and updated CHANGES.
  + Return CL_SUCCESS instead of NULL.
  + Fix for AMD APP SDK MatrixMulImage test
  + Attempt to fix Cmake
  + Warning fix.
  + Attempt to black list a CPU in an XFAIL.
  + LLVM 3.6 fixes, mostly due to the changes in the Metadata API.
  + Kernel functions abs, abs_diff, add_sat, hadd, mad_hi, mad_sat, mul_hi, rhadd and sub_sat fail with (u)short3 and loopvec also with LLVM 3.6. Mark XFAIL for now.
  + Use POCL_ANDROID macro and move HOST_LD_FLAGS to configure script
  + Prepare cmake files for android and make it cross-compile friendly
  + Link to libopencl-stub statically instead of directly including its sources in pocl-android-example
  + 1. Move to NDK gcc 4.9. LTO compilation is much faster in 4.9.
    2. Use cmake based cross-compilation for Android.
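The cache key change in this update (PACKAGE_VERSION and POCL_BUILD_TIMESTAMP included in the hash) can be illustrated with a small sketch. This is not pocl's code: the function name, the choice of SHA-1, the field order and the separator are all assumptions made for illustration.

```python
import hashlib


def cache_key(source, package_version, build_timestamp):
    """Hypothetical cache key: hash the program source together with
    the library version and the build timestamp."""
    digest = hashlib.sha1()
    for part in (package_version, build_timestamp, source):
        digest.update(part.encode("utf-8"))
        # Separator so that ("ab", "c") and ("a", "bc") hash differently.
        digest.update(b"\0")
    return digest.hexdigest()


kernel_src = "__kernel void k(__global int *a) { a[0] = 1; }"
key_old = cache_key(kernel_src, "0.10", "1419678929")
key_new = cache_key(kernel_src, "0.11", "1419678929")
```

Because the version and timestamp feed the hash, a rebuilt or upgraded library never reuses binaries cached by an older build, and no separate timestamp file is needed.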

-------------------------------------------------------------------
Thu Nov 27 19:23:18 UTC 2014 - mardnh@gmx.de

- Update to version 0.10.git1417094681.37fd2f6:
  + Set default global_offset to 0 instead of 1 to avoid confusion.
  + CMake parse error fix.
  + 1. Avoid POCL_MEM_FREE when there is no pointer variable in l-value
    2. Set pointer to null after pocl_aligned_free()
  + Android: avoid null pointer dereference
  + Link for AMD-APP-SDK-v.2.8
  + Clean it up a bit.
  + AMD SDK examples need libSDL, mention it in the README files.
  + AMD examples: Added info of the 'cmake' requirement and the 'prepare-examples' step.
  + Fix clCreateKernel with multiple devices
  + Initialize and use device’s dev_id
  + New driver API func. build_hash() for devices to add whatever to kcache hash
  + Added Visual Studio compatible alignment macros.
  + Removed ltdl.h when compiling with Visual Studio.
  + Added Visual Studio implementation for __restrict__
  + Added Visual Studio implementations for libltdl functions.
  + Undeclared a define made in Windows.h which conflicts with a goto label.
  + Fixed preprocessor or/and/not operators to be vc++ compatible.
  + Added VC++ implementation for lt_dlinit and removed hidden visibility pragma from GET_PROTOTYPES macro.
  + Removed header from build list to prevent triggering #error in Visual Studio.
  + Visual Studio does not have include_next.
  + Added include to get cl_device_id typedef.
  + Fixed places where unistd.h has been included to include vccompat.hpp on Visual Studio.
  + Fixed last missing Visual Studio header.
  + Added snprintf compatibility define.
  + Fixed allocating dynamic size array to use malloc.
  + Added strtok_r implementation for Visual Studio.
  + Fixed casting malloc return type for pthread.c and added alloca compatibility code for Visual Studio.
  + Make building of pocl-devices library to pass on Visual Studio.
  + Added many explicit void* casting to actual pointer for Visual Studio and implemented couple of missing functions.
  + Fixed couple of casting errors more.
  + Fixed -I not to fail even if there are spaces in CMAKE_CURRENT_SOURCE_DIR
  + Added resolving correct llvm / clang shared / static library names for Windows
  + Added parsing library strings to lists before trying to iterate over them.
  + Removed one unused variable that should not be in LLVM.cmake in current form.
  + Fixed generation of vc++ lib names one more time and the way how hwloc is included to linking so that it works in windows and *nix
  + Fixed typo from linking with pthreads with Visual Studio and prevented generation of duplicate compatibility functions.
  + Fixed rest of poclu Visual Studio compile errors and exported .dll symbols from OpenCL.dll
  + Exported API of poclu.dll
  + Fixed some examples to use only required API of host vector types (only .s[]).
  + Fixed couple of Visual Studio errors of invalid compiler flags and some more compatibility issues.
  + Reverted dynamic array to malloc change, which caused segfault on OSX (maybe bug elsewhere) and made llvmopencl lib to link on Visual Studio.
  + Added implicit pointer cast for Visual Studio
  + Fixed last Visual Studio breakages after kernel cache feature.
  + Reverted cmakefile fix, which caused pocl-standalone test to fail on ubuntu and added missing llvm system libs to be linked with POCL_LLVM_LIBS
  + Added exception for Visual Studio.
  + Fixed to call patch executable found by cmake instead of hard coded exe that is expected to be found from path.
  + Fixed libclang link list to be just libnames, which are then given to POCL_PRIVATE_LINK_LIST
  + Removed duplicate link flags and fixed LLVM_SYSLIBS to be linked with target_link_libraries() which works also with windows build.
  + Fixed compiling examples c++ compiler.
  + Added hardcoded target type for windows.
  + Fixed cast to be explicit for Visual Studio
  + Added initial instruction for building in windows.
  + Updated the logic to choose the local size automatically for embarrassingly parallel kernel launches:
    - Split first at the different dimensions while trying to retain the optimal WG width (for WG vectorization).
    - Try not to create too big a local size due to the WI context data overheads it causes.
    - Fixed a couple of test cases which erroneously left the WG size to be automatically set and broke down when it was set to something unexpected.
  + Do not add implicit barriers to kernels without WG barriers to avoid WI context data overheads with these "embarrassingly parallel" cases.
  + Fix build without --enable-debug-messages
  + Improvements to the implicit vectorization
  + Re-entrance and a comment fix
  + POCL_VECTORIZER_REMARKS option
  + Add the WG method to the kernel compiler hash key
  + Xfail only if core-avx-i is used and LLVM 3.5.
  + Penryn fails this case also.
  + Testsuite for Halide examples (currently only one)
  + The Mandelbrot case of AMD SDK 2.9 seems to be built in a different directory and it prints out some warnings during build time. Fix the former and ignore the latter.
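The automatic local size selection mentioned in this update can be sketched in one dimension. The function below is a simplified illustration, not pocl's algorithm: its name, parameters and tie-breaking are assumptions. It only shows the general constraints, namely that the local size must divide the global size, should stay under an upper bound, and preferably is a multiple of the preferred work-group size multiple.

```python
def pick_local_size(global_size, preferred_multiple, max_local):
    """Hypothetical 1D heuristic: return the largest divisor of
    global_size that is <= max_local, preferring divisors that are
    multiples of preferred_multiple (better SIMD utilization)."""
    best_any = 1        # largest divisor seen so far
    best_multiple = 0   # largest divisor that is also a preferred multiple
    for candidate in range(1, min(global_size, max_local) + 1):
        if global_size % candidate != 0:
            continue
        best_any = candidate
        if candidate % preferred_multiple == 0:
            best_multiple = candidate
    # Fall back to any divisor when no preferred multiple divides evenly.
    return best_multiple or best_any


size_pow2 = pick_local_size(1024, 8, 256)
size_odd = pick_local_size(100, 8, 64)
```

Capping the result with max_local reflects the entry's point about keeping the local size small enough to limit per-work-item context data.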

-------------------------------------------------------------------
Tue Oct 21 06:04:06 UTC 2014 - mardnh@gmx.de

- Update to version 0.10.git1413819612.143952b:
  + Add a compile time switch to enable pocl debug messages.
  + Add debug message macros to pocl_cl.h
  + Fix includes of pocl_util.c
    - pocl_cl.h is included by pocl_util.h already
    - pocl_runtime_config.h is required for pocl_get_bool_option()
  + Change POCL_ABORT_UNIMPLEMENTED macro to take a string argument (a message to print before aborting).
  + Add debug messages for clBuildProgram
  + Add debug messages for clCreateBuffer
  + Add debug messages for clCreateCommandQueue
  + Add debug messages for clCreateContext
  + Add debug messages for clCreateContextFromType
  + Add abort messages to clCreateFromGLTexture*
  + Add debug messages for clCreateImage
  + Add debug messages for clCreateKernel
  + Add debug messages for clCreateKernelsInProgram
  + Add debug messages for clCreateProgramWithBinary
  + Add debug messages for clCreateProgramWithSource
  + Add debug messages for clCreateSubBuffer
  + Add debug & abort messages for clCreateSampler
  + Add abort message for clCreateUserEvent
  + Add debug messages for clEnqueueBarrier
  + Add buffer bound and overlap checking functions to pocl_util.c/.h
  + Add debug messages for clEnqueueCopyBuffer
  + Add abort message for clEnqueueCopyImageToBuffer
  + Add debug messages for clEnqueueMapBuffer
  + Add debug messages for clEnqueueMarker
  + Add debug messages for clEnqueueMarkerWithWaitList
  + Add debug messages for clEnqueueNativeKernel
  + Add debug messages for clEnqueueNDRangeKernel
  + Add debug messages for clEnqueueReadBuffer
  + Add debug messages for clEnqueueUnmapMemObject
  + Add abort message for clEnqueueWaitForEvents
  + Add debug messages for clEnqueueWriteBuffer
  + Add abort messages to clFinish
  + Add debug messages for a bunch of clGet*Info API calls
  + Add debug messages for a bunch of clRelease* API calls
  + Add debug messages for clRetain* API calls
  + Add debug messages for clSetKernelArg
  + Add abort messages to clSetUserEventStatus and clSetMemObjectDestructorCallback
  + Add debug messages for clSetEventCallback
  + Add pocl_check_device_supports_image() to simplify checking image support by a device
  + Add debug messages to pocl_check_image_origin_region()
  + Add debug messages for clGetSupportedImageFormats
  + Add debug messages for clGetDeviceIDs
  + Add two bound check / overlap methods, and add debug messages to clEnqueueCopyBufferRect
    - pocl_buffer_boundcheck_3d for bound checks with regions
    - check_copy_overlap for overlap check on rectangular buffer regions, copied from Khronos OpenCL specification
  + Add debug messages for clEnqueueCopyBufferToImage
  + Add debug messages for clEnqueueCopyImage
  + Add debug messages for clEnqueueFillImage
    - plus removed a bunch of code which now lives in pocl_image_util.h, pocl_check_device_supports_image() etc
  + Change pocl_buffer_boundcheck_3d() first argument to buffer->size, not buffer
    - the checks in clEnqueue{Read/Write}BufferRect do not have a buffer for host memory, only a pointer
  + Add debug messages for clEnqueueReadBufferRect
  + Add debug messages for clEnqueueWriteBufferRect
  + Small fixes in clEnqueueCopyImage
  + Add debug messages for clEnqueueMapImage
  + Add debug messages for clEnqueueReadImage
  + Add debug messages for clEnqueueWriteImage
  + Change comments to C-style
  + Fix TCE device's use of the POCL_ABORT_UNIMPLEMENTED which now required a msg arg.
  + Fixed wrong error code returned from clCreateCommandQueue when a device was not found.
  + Fix for building on ARM/openSUSE by Martin Hauke.

-------------------------------------------------------------------
Tue Oct 14 17:31:36 UTC 2014 - mardnh@gmx.de

- Update to version 0.10+git.1413194287.339085e:
  + Revert "pthread device: increased minimum buffer allocation size to 10MB"
  + Add a minimum memory region size to avoid ending up with a bunch of memory regions in case of applications with very small buffers.
  + [mips] Add support for mipsel* triples.
  + [mips] Update status for big-endian
  + Fix a typo in clGetPlatformInfo.c

-------------------------------------------------------------------
Tue Oct 07 20:06:39 UTC 2014 - mardnh@gmx.de

- Update to version 0.10+git.1412677395.48b3e32:
  + remove clconfig.h.in and lib/kernel/clang.mk
  + Initial CMake buildsystem commit
  + fix include paths for dev_image.h
  + Add CMakeLists.txt for include/*
  + Initial version of cmake/bitcode_rules.cmake
  + Initial version of lib/CMakeLists.txt
  + Add the initial version of lib/llvmopencl/CMakeLists.txt
  + Initial version of CMakeLists for lib/CL{/devices,}
  + Initial version of CMakeLists for lib/kernel/*
  + Initial version of CMakeLists.txt for lib/poclu
  + Initial version of CMakeLists for scripts/*
  + Initial version of CMakeLists.txt for examples/*
  + Initial version of CMakeLists for tests/{kernel,regression,runtime}
  + Initial version of examples/standalone/CMakeLists.txt
  + Initial versions of CMakeLists for examples/{trig,scalarwave}
  + Remove POCL_BUILDING=1 from test environment
  + Initial version of tests/workgroup/CMakeLists.txt
  + Separate lib/CL/devices build system
  + Fix skip/will-fail of tests in examples/example*
  + Finish tests/kernel
  + Add skip-if/fail-if conditions to tests/regression
  + Fix the runtime/clBuildProgram test
  + Fix a bunch of typos in cmake/LLVM.cmake
  + Fix detection of enabled assertions in LLVM
  + Fix LD_FLAGS for Mac OS X
  + Make status messages more consistent
  + Fix some status messages
  + LibLTDL fix for Mac OS X
  + Fix hwloc for Mac OS X
  + Fix CMAKE_C(XX)_FLAGS
  + Fix LD_FLAGS for libllvmpocl in Mac OS X
  + Fix SOVERSION on cmake MODULE for Mac OS X
  + Fix DIRECT_LINKAGE cmake option
  + Add CPU type variables for tests
  + Autotools -> cmake for commit ee258ec2629024f966526e5ba0c44b18ad8bbaf1
  + Fix a few target properties on llvmopencl
  + Get rid of LIBPOCL_LOCATION and POCL_LIB_SUFFIX
  + Fix llvm search for PCBSD
  + Make pocl more friendly with other ICD loaders than ocl-icd
  + Fix the uppercase of some commands to match the rest
  + Make a status message more clear
  + Fix the search for libOpenCL
  + Fix up cl.hpp + OpenGL checks
  + Fix CL_DISABLE_LONG config.h variable
  + Revert a few changes to be compatible with autotools
  + Change pthread device's support of doubles to depend on CL_DISABLE_LONG
  + Fix ARM compilation
  + Introduce CL_DISABLE_HALF cmake variable
  + Another ARM fix: define HOST_FLOAT_SOFT_ABI
  + Bring cmake up2date after master merge
  + Adjust required CMake version to 2.8.12
  + Autotools -> cmake for commit 996fcf04e660e9762d7e442747a0a77dacd5782e
  + bring cmake branch up2date with master
  + Search for LLVM tools in more locations (suggested by llvm-config etc)
  + Add license information to cmake files
  + bring cmake branch up2date with master
  + Fix variables that need to be cached
  + Fix cmake for changes in llvm-config (llvm 3.5)
  + CMake fix for commit a037d2148c1c564c33
  + Make workgroup tests check their output against the expected output
  + Default find_program(llvm-config) should also check for LLVM 3.5 now
  + Fix issue 113 in cmake
  + Add cmake check for which C++ stdlib clang is using
  + Fix checking for any OpenCL library
  + Clear up messages on static linking
  + One more C++ stdlib fix
  + Fix statically linking LLVM into libpocl.so
  + Make test_shuffle_half only run if we have half support
  + Add STATIC_LLVM to variables printed in summary
  + Add information on using cmake to INSTALL
  + Add developer information on using cmake
  + Add CL_DISABLE_{HALF,LONG} to variables printed in summary.
  + Mark block test as pass on ARM.
  + Added CMake support to the CHANGES.
  + Disabled the external_linkage test as it's buggy.
  + The Clang patch was applied upstream. Not expecting a FAIL with the blocks test anymore with Clang 3.5.
  + Use clang compilation instead of lli-interpreting bitcode in cmake tests
  + Append pthread to LDFLAGS instead of overwriting LDFLAGS
  + Fix to issue #117
  + Fix issue #119
  + Tweaks to CHANGES.
  + CHANGES: Added the new bitcode linker.
  + Release notes.
  + Release version tagging. Disallow LLVM 3.6.
  + Several packaging issues revealed by 'make distcheck' fixed.
  + Release notes.
  + Fix CMake build on ARM. Issue #119.
  + Yet more distcheck fixes.
  + Some more fixes for issues found by distcheck
  + Started 0.11 development.
  + clBuildProgram fix: length of compiler options is not limited anymore
  + Clean the CMake version suffix to set the version string to 0.10.
  + Update device prototypes
  + Add myself to CREDITS
  + Fix TCE build after the declaration of the old mem alloc function was removed from prototypes.inc
  + Track LLVM 3.6 API changes. Passes with LLVM 216613.
  + Skip ViennaCL global-variables test on LLVM 3.6 - regression.
  + Disable the flaky ABI-dependent test cases.
  + 0.10 released
  + Fix falsely detecting operations with side-effects (especially atomic operations) as uniform. This caused deadlock/race situations due to illegal implicit barrier injection.
  + Do not consider volatile loads always 'varying'.
  + Added xarg support to prevent failure when compiling kernel bc library.
  + Fixed 2 tests that compile host code to use the host compiler instead of the kernel code compiler.
  + Fixed the check for whether the kernel clangxx compiler works for compiling vecmathlib, and added resolution of the correct -stdlib= switch.
  + Added paths to cmake to find llvm headers and libs.
  + Added clangxx stdlib flag for compiling kernel library.
  + Update buildbot scripts from buildmaster.
  + LLVM was not initialized (especially the targets were not added) in case of hitting the "kernel compiler cache" and thus skipping the work-group generation. This caused failures at least when statically linking LLVM.
  + Update buildbot scripts:
    - pass git_mode property from scheduler.
    - cmake build option.
  + www: Added a publications section.
  + Add 'make check' pseudo-target to CMake
  + LLVM 3.6: DataLayoutPass constructor arg change.
  + Fix missing arg to AC_SUBST.
  + Mark test regressions on CMake. Issue #127
  + Fixed review comments.
  + Fixed an issue in test that worked by chance with little endian, but failed on MIPS big endian.
  + Remove the old WIVectorize.
  + Make pedantic build pass.
  + Remove traces of WIVectorize
  + Remove some remnants from WIVectorize.
  + Add initial support for MIPS.
  + Make config/xclang respect --host and --build.
  + Make clang++ checks in configure.ac respect --host
  + Lower address spaces for Mips.
  + Added cross platform compatible script for finding hwloc library.
  + Added support to FindHwloc.cmake for Hwloc_CFLAGS and Hwloc_LDFLAGS
  + Fixed setting Hwloc include directories as list instead of string to prevent error with empty list.
  + Enable CPU and frequency detection for Mips.
  + CMake commit for starting 0.11 release
  + Add a bit more debugging info in cmake try-compile messages
  + /bin/bash is not always available (BSDs etc)
  + CMake: LOCATION target property has been deprecated in CMake 3.0.0, so replace it with file(GENERATE ..) and generator expressions.
  + CMake: Remove redundant messages from cmake/LLVM.cmake
  + CMake: the block test is no longer failing with clang 3.5 (see commit 152943e5d6a for autotools part)
  + Add cmake counterpart to commit d3c5b6a0 (test clGetKernelArgInfo failing on LLVM 3.2)
  + Modify test_clGetKernelArgInfo to also test for issue #100
  + CMake: make add_test_custom usable in other than workgroup tests too
  + CMake/tests: Move kernel/test_sizeof and test_block's expected outputs to external files.
  + CMake/tests: move tests/runtime/clSetEventCallback expected output to an external file.
  + Fix end-of-lines in expected output of tests
  + CMake/tests: move some tests (printf, examples, scalarwave) expected output to an external file.
  + Added macro for finding pthreads-win32 lib when compiling with Visual Studio.
  + Made the cmake configuration pass complete with the Visual Studio 12 generator.
  + Added llvm-config --bindir to hints where to find clang etc.
  + Removed /dev/null reference.
  + Removed cmake libltdl dependency on windows.
  + Disabled using ICD on Windows for now.
  + Omitted libGlew finding test for Visual Studio because it requires pkg_check_modules() which is not available in windows.
  + Removed libc dependency from sizeof and alignment tests.
  + Changed alignment/typesize tests to be done with lli to avoid the need to call the Visual Studio linker separately to build an exe.
  + Added installation path setup for windows.
  + Fixed cl.hpp patch to be visual studio friendly.
  + Removed showing error if Hwloc_INCLUDE_DIRS is empty. This can be the case if it is installed in system includes.
  + Added necessary libs to make tests to link with poclu on ubuntu.
  + Cmake/test: change add_test_custom/_workgroup macros to functions
  + Cmake/test: remove ${CMAKE_CURRENT_SOURCE_DIR} from arguments to add_test_custom (it's redundant)
  + pthread device: increased minimum buffer allocation size to 10MB
  + Added -force-interpreter to lli call, to prevent JIT problems on ARM/mips
  + Set all sources to be compiled with c++ compiler when using Visual Studio.
  + Fixed one more cmakelist to compile sources with c++ when compiled with Visual Studio.
  + Suggestions for resolving memory leaks
  + Remove comments etc.
  + Clean up multiple definitions of POCL_RETURN_*_GETINFO macros for returning information in clGet*Info calls.
  + TTA: fixed FU operand width in fp16 ADF
  + Fix issue #127 - workgroup/workgroup_sizes tests don't work in cmake
  + Pruned the changes a bit to make it more like an overview.
  + Revert a change that caused a double free in runtime/clFinish test.
  + Reverted another part of the accidental Lee Ki-ju merge as it caused two AMD SDK 2.9 tests to fail:
  + Initial commit of android port:
    1. Add android flags in configure
    2. Use memalign instead of posix_memalign for android
    3. Place tmp files in /sdcard/pocl/tmp for android
    4. Rename parallel.so to [kernel_name].so. Bionic probably has a bug that returns the same handle for .so files with the same name in different paths
  + Second patch set for android:
    1. Put tmp files to POCL_TEMP_DIR & clear it at next run
    2. Patch cl.hpp for android
  + Rebase with upstream master
  + Fix cl.hpp for android
  + Remove examples/tests for android
  + Adding android build script & dependent prebuilt libs in separate repo
  + Implement kernel cache feature [experimental & incomplete]
  + kernel cache improvements
  + Improve kernel cache for clCreateProgramWithBinary. Even binary programs, including SPIR, are cached now.
  + Additional checks
  + Update pocl_util.c
  + Fix one more error
  + Fix number of cores in cpuinfo for android
  + Adding android example: a simple vector-addition sample
  + Add more space for cmdline of exe
  + Avoid calling OsLib setenv. This is not implemented in 4.2.2. Use jni wrapper to call setenv. Set minSdkVer to 17.
  + Take the risk of mfloat-abi hard. Anyway, only android versions from 4.2.2 are supported.
  + Modify cpuinfo for android. Use sysfs system cpu node to get consistent info across devices
  + Revert "Take the risk of mfloat-abi hard"
  + Freeze for beta release
  + Set HOST_FLOAT_SOFT_ABI for android. Need to support most of the devices
  + Code review fixes
  + Return number of bytes read from file util functions
  + Fix unattended conflict during rebase
  + Hash: add hash string for program and kernels
  + Transform program cache to use Clementleger's SHA. Use POCL_CACHE_DIR instead of POCL_TEMP_DIR. Program sources/binaries will be globally cached in ~/.pocl/ by default based on their SHA. POCL_CACHE_DIR overrides the location for cache storage. Android build is not tested yet!
  + Search for '#   include's in program source and invalidate cache, allowing any number of spaces between # and include. Could be done using clang pre-processor too, but can't afford to get clang inside android since it's bulky.
  + Fix unattended conflict during rebase
  + Enable kernel-cache by default. Provide a disable option in the configure script: --disable-kernel-cache
  + Build error fix: Insert space between string literal and macro
  + Fixes for android after rebase
  + Revamp clBuildProgram to support kernel cache feature
  + Use POCL_KERNEL_CACHE env variable
  + POCL_MEM_FREE: null-check for free() and avoid double-free
  + Implement missing clGetProgramBuildInfo features. Build log is now stored in a file.
  + Use inbuilt macros for clGetProgramBuildInfo
  + Remove references to kernel.bc. It's no longer required since llv_irs[] is populated in the build step itself. Apparently kernel.bc was a copy of program.bc.
  + Add last_accessed file to track least recent access to program's cache directory
  + Added kernel compiler cache and Android support to the feature highlights. Added Krishnaraj to the CREDITS.
  + Marked bandwidth-reduction of ViennaCL as 'long'.
  + Use fputs() instead of fprintf() as there's no format string.
  + Update buildbot scripts due to kcache
  + Fix out-of-tree CMake build
  + Fix CMake build to work with kernel cache. TIMESTAMP still undone
  + Fixup previous commit - forgot to add pocl_hash.c to cmakelist.txt
  + Add CREDITS to EXTRA_DIST as instructed by Vincent.
  + Fix LLVM 3.6 regression on ARM - address spaces cast is not a no-op.

-------------------------------------------------------------------
Mon Aug 18 18:07:13 UTC 2014 -  mardnh@gmx.de

- Update to version 0.9+git.1408177758.80c7775:
  + Warning fix.
  + Created a new build command 'prepare-examples' that should be called explicitly (usually only once) to initialize the external examples. This is to avoid useless building of them all the time when 'make check' is called.
  + Choose the SPIR test from the POCL_DEVICE_ADDRESS_BITS. Fixes #89.
  + Fix building without the custom memory allocator. Fixes #75.
  + Instruct to use LLVM 3.4 for OS X as it's the latest known working version. Issue #108.
  + Fine-tune uncrustify script.
  + Re-format code, no functionality change.
  + Fix linker to handle the case where kernel library functions call other kernel library functions.
  + Fix linker some more. This is a second part of commit 6d983a, which got lost by mistake.
  + Corrected parameter value types in clGetDeviceInfo.
  + Added James Price to the list of CREDITS.
  + Create a separate clean-examples rule to cleanup the examples that need preparing.
  + Do not call prepare-examples recursively for examples that are not enabled.
  + Avoid using a dynamic array as it causes problems in VC++
  + Pointer arithmetic with void* is undefined.
  + Fix casting Value->Barrier
  + Fix typo in setting pointer size.
  + Fix cpu core count detection to use hwloc library
  + Fix autotools for changes in llvm-config (llvm 3.5)
  + basic device driver: Advertise fp64 support to make all ViennaCL tests pass.
  + basic device:
    * Initialize some device properties to fix some AMD SDK cases.
    * Disable the 'basicdebug' case of AMD SDK 2.9 as it tests undefined behavior via a kernel with an out-of-bounds array access.
  + Make the first device in the device list the default OpenCL device no matter what it is.  Do not force 'pthread' to be the only possibility.
  + Only announce double/half extensions in DeviceInfo if they are really supported
  + Fix issue 113 in autotools

-------------------------------------------------------------------
Sun Aug 10 17:27:06 UTC 2014 -  mardnh@gmx.de

- Update to version 0.9+git.1407683486.cc03c88:
  + LLVM 3.5: added a separate piglit reference file for LLVM 3.5 as it has an additional passing test
  + Fix the image memory leak (and another one) in the basic device driver as well.
  + Check for 'basic' device instead of 'pthread' as that should be always working on any platform.
  + test_shuffle:

-------------------------------------------------------------------
Sun Aug 10 10:55:19 UTC 2014 -  mardnh@gmx.de

- Update to version 0.9+git.1407625066.02a5564:
  + pthread: Fixed a mem leak with images.

-------------------------------------------------------------------
Fri Aug  8 17:14:21 UTC 2014 -  mardnh@gmx.de

- Update to version 0.9+git.1407506873.62715ce:
  + LLVM 3.5: XFAIL the block case until the proposed Clang patch has been applied upstream.
  + TargetRegistry used to fall back to the 'cpp' target and this is not the case anymore in LLVM 3.5. Unify the behavior by catching this and assuming no target info is used in this case.
  + The pointer size should be 4B for a 32bit pointer.
  + Fix issue #100

-------------------------------------------------------------------
Thu Aug  7 21:08:38 UTC 2014 -  mardnh@gmx.de

- Update to version v0.9+git.1407424358.6b7e5cb:
  + Update piglit reference to match the latest master. Store the sorted version.
  + Updated piglit reference file to the latest commit in piglit. Made the sorting of the reference more stable by fixing the locale used.
  + Some of the AMD tests fail to build with a 'wrong' version of GL (happens in Ubuntu 14.04). Build the ones that can be built by passing the -k switch to make.
  + Remove references to the old WIVectorizer as it's deprecated and probably breaks with LLVM 3.5.
  + Use size_t to avoid ambiguous Buffer ctor error with Clang 3.4.
  + Skip AMD 2.9's NBody, Mandelbrot, and BinomialOptionMultiGPU if they were not built (due to incompatible GL, for example).
  + LLVM 3.4: compiling blocks crashes Clang with multiple address spaces. The patch fixes this. Will be submitting upstream.
  + Move PHINodes from the entries of the replicated parallel regions to the entry of the first of the chain.
  + XFAIL the vector kernel args test also if _DEBUG is set.
  + Updated some interesting CHANGES from git log.
  + Add -pthread to LDFLAGS otherwise some tests fail to link when compiling with some compilers such as Clang++ 3.4 or gcc 4.8.

-------------------------------------------------------------------
Thu Jul 31 19:42:34 UTC 2014 - mardnh@gmx.de

- update to version 0.9
- minor specfile cleanup 

-------------------------------------------------------------------
Wed Jan 29 15:15:23 UTC 2014 - guillaume@opensuse.org

- Add boost support

-------------------------------------------------------------------
Wed Jan 29 10:15:22 UTC 2014 - guillaume@opensuse.org

- Initial release 0.9rc3 (based on Fedora SPEC file)