Parallel array_map() with HipHop
Many computational operations, such as iterating over a big list, can be sped up if they are parallelized. Parallelization means that multiple items in the iterator can be processed at the same time. This is accomplished by using multiple threads, which run the processing on multiple CPU cores simultaneously.
One condition has to be met before it is possible to use multiple threads to process an iterator: the processing of each item in the iterator must be independent of the other items. The typical PHP function to use in such cases is array_map(), which allows you to run a function over each item in an array. The callback passed to array_map() only receives the current item and has no access to the other items in the iterator, so the order of execution is not important.
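To illustrate the independence requirement, here is a minimal sketch. The function name square() and the sample data are made up for this example. Because the callback only looks at its own argument, processing the items in a different order gives the same result.

```php
<?php
// Hypothetical callback: its result depends only on its own argument.
function square($n)
{
    return $n * $n;
}

$items = array(1, 2, 3, 4);

// Process the items in their normal order.
$forward = array_map('square', $items);

// Process the same items in reverse order, then restore the original order.
$backward = array_reverse(array_map('square', array_reverse($items)));

// Both approaches produce the same array.
var_dump($forward === $backward);
```

A callback that relied on shared state, such as a running total, would not pass this check and would not be safe to split over threads.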
In the past it has been very hard to use parallelization effectively in PHP. However, with HipHop, Facebook engineers have added the option to create threads in PHP using the call_user_func_async() function.
Some caveats apply:
- You can only create two extra threads, next to the main thread that is already running. Creating more threads has no effect.
- Creating additional threads is expensive, so you have to carefully consider if it is worthwhile to use them.
- Unlike all other functions in PHP that accept functions as arguments, call_user_func_async() does not accept anonymous functions. You need to pass in a string or array callback.
- The call_user_func_async() function is currently deprecated in HipHop and will issue a warning when called.
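The string and array callback forms mentioned in the caveats look like this. This sketch uses the standard call_user_func() as a stand-in so that it runs on stock PHP. The function double_value() and the class Multiplier are hypothetical names invented for the example.

```php
<?php
// Hypothetical named function, referenced by a string callback.
function double_value($n)
{
    return $n * 2;
}

// Hypothetical class, referenced by array callbacks.
class Multiplier
{
    private $factor;

    public function __construct($factor)
    {
        $this->factor = $factor;
    }

    public function apply($n)
    {
        return $n * $this->factor;
    }

    public static function triple($n)
    {
        return $n * 3;
    }
}

// String callback: the name of a function.
$asString = call_user_func('double_value', 21);

// Array callback: an object and a method name.
$asInstanceMethod = call_user_func(array(new Multiplier(2), 'apply'), 21);

// Array callback: a class name and a static method name.
$asStaticMethod = call_user_func(array('Multiplier', 'triple'), 14);
```

A closure such as function ($n) { return $n * 2; } works with call_user_func(), but according to the caveat above it cannot be passed to call_user_func_async().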
call_user_func_async() and array_map() can be combined into a parallel version of array_map(). This can be done by splitting the array into equal parts and then running array_map() over each part, each part in a separate thread. Finally, you need to put all the pieces back together (synchronize) by calling end_user_func_async() for each thread.
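The split, run, and synchronize steps can be sketched roughly as follows. This is not the implementation linked below, only an illustration under some assumptions: the names map_chunk() and parallel_array_map() are made up, and the code assumes that call_user_func_async() takes a callback followed by its arguments and returns a handle that end_user_func_async() turns into the return value. When those functions are not available (on stock PHP, for instance), the sketch falls back to processing the chunks one after another.

```php
<?php
// Worker that is run once per chunk. It has to be a named function,
// because call_user_func_async() does not accept closures.
function map_chunk($callback, array $chunk)
{
    return array_map($callback, $chunk);
}

// Hypothetical helper: a sketch, not the author's implementation.
function parallel_array_map($callback, array $items, $threads = 2)
{
    if (count($items) === 0) {
        return array();
    }

    // Split the array into at most $threads roughly equal parts,
    // preserving the original keys.
    $size = (int) ceil(count($items) / max(1, $threads));
    $chunks = array_chunk($items, $size, true);

    $results = array();
    if (function_exists('call_user_func_async')) {
        // Assumed HipHop API: start one thread per chunk...
        $handles = array();
        foreach ($chunks as $chunk) {
            $handles[] = call_user_func_async('map_chunk', $callback, $chunk);
        }
        // ...then wait for each thread and collect its result.
        foreach ($handles as $handle) {
            $results[] = end_user_func_async($handle);
        }
    } else {
        // Fallback without threads: process the chunks sequentially.
        foreach ($chunks as $chunk) {
            $results[] = map_chunk($callback, $chunk);
        }
    }

    // Put the pieces back together. The + operator keeps the original keys.
    $merged = array();
    foreach ($results as $part) {
        $merged += $part;
    }
    return $merged;
}

$upper = parallel_array_map('strtoupper', array('a' => 'x', 'b' => 'y', 'c' => 'z'));
```

The default of two threads follows the limit of two extra threads mentioned in the caveats. The callback passed in should itself be a string or array callback, since it is handed on to the worker thread.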
You can find my implementation at GitHub.
In practice, the usefulness of this function is limited: the maximum number of threads is three (the main thread plus two extra threads), and creating threads is quite expensive. However, for complex processing, or for processing that blocks on external services such as databases, it might be worth the extra effort.
About this entry
You’re currently reading “Parallel array_map() with HipHop,” an entry on Willem Stuursma
- Published:
- September 8, 2011 / 21:55
- Category:
- Hiphop for PHP, php
- Tags:
- hiphop, linkedin, Parallel computing, performance, php, Threads
