This paper studies the effects of overheads in massively parallel processing. The execution times for tasks on individual processors are modeled as independent and identically but arbitrarily distributed random variables. The time to execute a process fork is assumed to be distributed exponentially. The main result bounds (in expectation) the overhead time in forking a large number of tasks across n machines and then waiting for the join event. The model used is appropriate for massive parallelism (when n is large): in fact, the bound serves as a heavy traffic limit approached as n → ∞ and for task times that are large in comparison to the time to execute a fork. In this model, the expected total time to reach the final join consists of a forking overhead that grows linearly with the number of processors n, a time for parallel execution of tasks that decreases in n, and finally a synchronization delay that is a concave sublinear function of ρ = EX/EA, which is the ratio of expected task time to the expected time needed to fork a new process. This overhead function is typically no worse than o(ρ). An interesting aspect of the analysis is that the original problem reduces to a previously studied problem in queueing theory: estimating total end-to-end delay in an infinite-server resequencing system. The new results thus provide new bounds and heavy-traffic approximations (in distribution and in expectation) for the theory of M/GI/∞ resequencing queues.