
Readings

These readings supplement the course materials. The list will be updated throughout the semester. Readings are optional unless otherwise specified.


Volunteer and distributed computing.
  • [Nov 13] A. Park, R. Fujimoto. "A scalable framework for parallel discrete event simulations on desktop grids." In Proc. IEEE/ACM Int'l. Conf. on Grid Computing, pp. 185--192, Sept. 2007. [PDF]
  • [Nov 13] M.R. Shirts, V. Pande. "Mathematical foundations of ensemble dynamics." Phys. Rev. Lett., 86(22):4893--4897, 2001. [PDF]
Parallel I/O and file systems.
  • [Nov 11] J. May. Parallel I/O for high-performance computing. Morgan Kaufmann, 2000. [Amazon]
Compilers: Auto-{vector,parallel}ization.
  • [Oct 29] C. Bastoul. "Code generation in the polyhedral model is easier than you think." In Proc. PACT, pp. 7--16, 2004. [PDF]
  • [Oct 29] U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan. "A practical automatic polyhedral parallelizer and locality optimizer." In Proc. PLDI, pp. 101--113, 2008. [PDF]
  • [Oct 27--29] R. Allen, K. Kennedy. Optimizing compilers for modern architectures. Morgan Kaufmann Publishers, San Francisco, CA, USA, 2002. [WWW]
  • [Oct 27--29] D.F. Bacon, S.L. Graham, O.J. Sharp. "Compiler transformations for high-performance computing." ACM Computing Surveys, 26(4):345--420, 1994. [PDF]
  • [Oct 27--29] S. Chatterjee, E. Parker, P.J. Hanlon, A.R. Lebeck. "Exact analysis of the cache behavior of loops." ACM SIGPLAN Notices, 36(5):286--297, May 2001. [PDF]
Forward-looking programming models.
  • [Oct 21] E. Lusk, K. Yelick. "Languages for high-productivity computing: The DARPA HPCS language project." Parallel Processing Letters, 17(1):89--102, 2007. [PDF]
  • [Oct 21] R.F. Barrett, S.R. Alam, V.F. d'Almeida, D.E. Bernholdt, W.R. Elwasif, J.A. Kuehn, S.W. Poole, A.G. Shet. "Exploring HPCS languages in scientific computing." In Proc. SciDAC, Journal of Physics: Conference Series, 125, 2008. [PDF]
  • [Oct 21] R. Murphy (ed). "Workshop on programming languages for high performance computing (HPCWPL): Final Report." Technical report SAND2007-2047, Sandia National Laboratories, Albuquerque, NM, USA, Dec. 2006. [PDF]
  • [Oct 21] H. Zima (ed). "Workshop on high productivity programming languages and models." Santa Monica, CA, USA, May 2004. [PDF]
  • [Oct 16] K. Knobe, C.D. Offner. "TStreams: How to write a parallel program." Technical report HPL-2004-193, Hewlett-Packard Laboratories, Cambridge, MA, USA, Nov. 2004. [PDF]
Concrete programming models.
  • [Oct 9] LLNL's MPI tutorial. [WWW]
  • [Oct 9] W.W. Carlson, J.M. Draper, D.E. Culler, K. Yelick, E. Brooks, K. Warren. "Introduction to UPC and language specification." Technical report CCS-TR-99-157, IDA Center for Computing Sciences, 1999. [PDF] Also: Tutorial slides from K. Yelick's CS 267, Spring 2006. [PDF]
  • [Oct 9] R.W. Numrich, J. Reid. "Co-array Fortran for parallel programming." ACM SIGPLAN Fortran Forum, 17(2):1--31, 1998. [PDF] Also: Community website [WWW]
  • [Oct 7] R.D. Blumofe, C.E. Leiserson. "Scheduling multithreaded computations by work stealing." J. ACM, 46(5):720--748, 1999. [PDF]
  • [Oct 7] Intel Threading Building Blocks tutorial. [PDF]
  • [Oct 4] LLNL's Pthreads tutorial. [WWW]
  • [Oct 4] LLNL's OpenMP tutorial. [WWW]
  • [Oct 4] J. Reinders. Intel Threading Building Blocks, 1st ed., O'Reilly, 2007. [WWW]
  • [Oct 4] P. Tang, P.-C. Yew. "Processor self-scheduling for multiple-nested parallel loops." In Proc. ICPP, 1986. [PDF]
  • [Oct 4] C.P. Kruskal, A. Weiss. "Allocating independent subtasks on parallel processors." IEEE Trans. Soft. Eng., 11(10):1001--1016, Oct. 1985. [PDF]
  • C.P. Polychronopoulos, D.A. Kuck. "Guided self-scheduling: A practical scheduling scheme for parallel supercomputers." IEEE Trans. Computers, 36(12):1425--1439, 1987. [WWW]
  • S.E. Lucco. "Adaptive parallel programs." Technical report UCB/CSD-95-864, EECS Dept., University of California, Berkeley, USA, 1994. [PDF]
  • S.F. Hummel, J. Schmidt, R.N. Uma, J. Wein. "Load-sharing in heterogeneous systems via weighted factoring." In Proc. SPAA, pp. 318--328, 1996. [PDF]
Sources of locality in simulation.
  • [Sep 30] K. Asanovic, B.C. Catanzaro, J.J. Gebis, P. Husbands, K. Keutzer, D.A. Patterson, W.L. Plishker, J. Shalf, S.W. Williams, K.A. Yelick. "The landscape of parallel computing research: A view from Berkeley." Technical report UCB/EECS-2006-183, Dept. of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA, Dec. 2006. [PDF]
  • [Sep 25] D. Jefferson. "Virtual time." ACM TOPLAS, 7(3):404--425, 1985. [PDF]
  • [Sep 25] M. Warren, J. Salmon. "A parallel hashed oct-tree n-body algorithm." In Proc. Supercomputing, Dec. 1993. [PDF]
  • [Sep 25] A. Gray, A. Moore. "N-body problems in statistical learning." In Proc. NIPS, 2000. [PDF]
  • [Sep 25] A.K. Dewdney. "Computer Recreations: Sharks and fish wage an ecological war on the toroidal planet Wa-Tor." Scientific American, Dec. 1984.
Models and metrics.
  • [Sep 23] D. Culler, R. Karp, D. Patterson, A. Sahay, K.E. Schauser, E. Santos, R. Subramonian, T. von Eicken. "LogP: Towards a realistic model of parallel computation." In Proc. PPoPP, May 1993. [PDF]
  • [Sep 23] A. Grama, A. Gupta, V. Kumar. "Isoefficiency: Measuring the scalability of parallel algorithms and architectures." IEEE Parallel and Distributed Technology: Systems and Applications, 1(3):12--21, 1993. [WWW]
  • [Sep 23] C. Bell, D. Bonachea, Y. Cote, J. Duell, P. Hargrove, P. Husbands, C. Iancu, M. Welcome, K. Yelick. "An evaluation of current high-performance networks." In Proc. IPDPS, 2003. [PDF]
  • [Sep 23] L. Snyder. "Type architectures, shared memory, and the corollary of modest potential." Ann. Rev. Comput. Sci., 1:289--317, 1986. [PDF]
Parallel architectures.
  • [Sep 16] A. Grama, A. Gupta, G. Karypis, V. Kumar. Introduction to Parallel Computing, 2nd ed. Addison-Wesley, 2003. [WWW]
  • [Sep 16] N.R. Adiga, et al. "An overview of the BlueGene/L supercomputer." In Proc. ACM/IEEE Conf. on Supercomputing, 2002. [PDF]
  • [Sep 16] J. Brooks. "Keeping computers cool and performance hot." Presentation on the Cray XT5. [PDF slides]
  • [Sep 9] S. Adve, K. Gharachorloo. "Shared memory consistency models: A tutorial." DEC WRL Tech. rep. 95/7, Sep. 1995. [PDF]
  • [Sep 9] L. Lamport. "How to make a multiprocessor computer that correctly executes multiprocess programs." IEEE Trans. Computers, 28(9):690--691, Sep. 1979. [PDF]
  • [Sep 4] M.D. Hill, M.R. Marty. "Amdahl's Law in the multicore era." IEEE Computer, July 2008. [WWW; PDF]
Single-core architecture and tuning.
  • [Sep 2] C. Whaley's slides on performance optimization, University of Texas at San Antonio course CS 6463: Fundamentals of High Performance Optimization, Spring 2007. [PDF]
  • [Sep 2] M. Püschel's notes on SSE, Carnegie Mellon University course 18-645: How to Write Fast Code, Lectures 13--14, Spring 2008. [PDF]
  • [Sep 2] K. Yotov, T. Roeder, K. Pingali, J. Gunnels, F. Gustavson. "An experimental comparison of cache-oblivious and cache-conscious programs." In Proc. SPAA, 2007. [PDF]
  • [Sep 2] M. Lam. "Software pipelining: An effective scheduling technique for VLIW machines." In Proc. PLDI, Atlanta, GA, USA, 1988. [PDF]
  • [Aug 28] K. Yotov, X. Li, G. Ren, M.J.S. Garzaran, D. Padua, K. Pingali, P. Stodghill. "Is search really necessary to generate high-performance BLAS?" Proc. IEEE, 93(2):358--386, 2005. [PDF]
  • [Aug 28] K. Goto, R. van de Geijn. "Anatomy of high-performance matrix multiplication." ACM TOMS, 34(3), 2008. [PDF]
  • [Aug 26] M.S. Lam, E.E. Rothberg, M.E. Wolf. "The cache performance and optimizations of blocked algorithms." In Proc. ASPLOS, 1991. [PDF]
  • [Aug 26] P.J. Denning. "The locality principle." CACM, 48(7):19--24, 2005. [PDF] Describes the intellectual history of the notion of locality.
  • [Aug 21] D. Wall. "Limits of instruction-level parallelism." WRL Research Report 93/6. [PDF]