<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="pt-BR">
	<id>http://wiki.nosdigitais.teia.org.br/index.php?action=history&amp;feed=atom&amp;title=Introduction_to_Parallel_Computing_2014</id>
	<title>Introduction to Parallel Computing 2014 - Histórico de revisão</title>
	<link rel="self" type="application/atom+xml" href="http://wiki.nosdigitais.teia.org.br/index.php?action=history&amp;feed=atom&amp;title=Introduction_to_Parallel_Computing_2014"/>
	<link rel="alternate" type="text/html" href="http://wiki.nosdigitais.teia.org.br/index.php?title=Introduction_to_Parallel_Computing_2014&amp;action=history"/>
	<updated>2026-04-21T13:12:47Z</updated>
	<subtitle>Histórico de revisões para esta página neste wiki</subtitle>
	<generator>MediaWiki 1.39.0</generator>
	<entry>
		<id>http://wiki.nosdigitais.teia.org.br/index.php?title=Introduction_to_Parallel_Computing_2014&amp;diff=34205&amp;oldid=prev</id>
		<title>V1z: Desfeita a edição 34204 de V1z (Discussão)</title>
		<link rel="alternate" type="text/html" href="http://wiki.nosdigitais.teia.org.br/index.php?title=Introduction_to_Parallel_Computing_2014&amp;diff=34205&amp;oldid=prev"/>
		<updated>2015-03-26T11:21:04Z</updated>

		<summary type="html">&lt;p&gt;Desfeita a edição 34204 de &lt;a href=&quot;/Especial:Contribui%C3%A7%C3%B5es/V1z&quot; title=&quot;Especial:Contribuições/V1z&quot;&gt;V1z&lt;/a&gt; (&lt;a href=&quot;/index.php?title=Usu%C3%A1rio_Discuss%C3%A3o:V1z&amp;amp;action=edit&amp;amp;redlink=1&quot; class=&quot;new&quot; title=&quot;Usuário Discussão:V1z (página inexistente)&quot;&gt;Discussão&lt;/a&gt;)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;pt-BR&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Edição anterior&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Edição das 08h21min de 26 de março de 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l3&quot;&gt;Linha 3:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linha 3:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== General Info ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== General Info ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Instructor: prof. [http://www.lems.brown.edu/~rfabbri Ricardo Fabbri], Ph.D. Brown University&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Instructor: prof. [http://www.lems.brown.edu/~rfabbri Ricardo Fabbri], Ph.D. Brown University&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Meeting times: &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Thursdays 8&lt;/del&gt;:&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;50am&lt;/del&gt;-&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;12&lt;/del&gt;:20pm&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;, room 210&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Meeting times: &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Tues 12&lt;/ins&gt;:&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;30pm&lt;/ins&gt;-&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;2pm, Thursdays 2&lt;/ins&gt;:20pm &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;- 4pm&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Evaluation criteria: 1 quizz at the end of the term (60%), plus practical projects (40%).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Evaluation criteria: 1 quizz at the end of the term (60%), plus practical projects (40%).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Forum for file exchange and discussion: [http://uerj.tk uerj.tk]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Forum for file exchange and discussion: [http://uerj.tk uerj.tk]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>V1z</name></author>
	</entry>
	<entry>
		<id>http://wiki.nosdigitais.teia.org.br/index.php?title=Introduction_to_Parallel_Computing_2014&amp;diff=34204&amp;oldid=prev</id>
		<title>V1z: /* General Info */</title>
		<link rel="alternate" type="text/html" href="http://wiki.nosdigitais.teia.org.br/index.php?title=Introduction_to_Parallel_Computing_2014&amp;diff=34204&amp;oldid=prev"/>
		<updated>2015-03-26T11:07:49Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;General Info&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;pt-BR&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Edição anterior&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Edição das 08h07min de 26 de março de 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l3&quot;&gt;Linha 3:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linha 3:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== General Info ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== General Info ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Instructor: prof. [http://www.lems.brown.edu/~rfabbri Ricardo Fabbri], Ph.D. Brown University&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Instructor: prof. [http://www.lems.brown.edu/~rfabbri Ricardo Fabbri], Ph.D. Brown University&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Meeting times: &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Tues 12&lt;/del&gt;:&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;30pm&lt;/del&gt;-&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;2pm, Thursdays 2&lt;/del&gt;:20pm &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;- 4pm&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Meeting times: &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Thursdays 8&lt;/ins&gt;:&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;50am&lt;/ins&gt;-&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;12&lt;/ins&gt;:20pm&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;, room 210&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Evaluation criteria: 1 quizz at the end of the term (60%), plus practical projects (40%).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Evaluation criteria: 1 quizz at the end of the term (60%), plus practical projects (40%).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Forum for file exchange and discussion: [http://uerj.tk uerj.tk]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Forum for file exchange and discussion: [http://uerj.tk uerj.tk]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>V1z</name></author>
	</entry>
	<entry>
		<id>http://wiki.nosdigitais.teia.org.br/index.php?title=Introduction_to_Parallel_Computing_2014&amp;diff=34201&amp;oldid=prev</id>
		<title>V1z: 2014 archive of PP</title>
		<link rel="alternate" type="text/html" href="http://wiki.nosdigitais.teia.org.br/index.php?title=Introduction_to_Parallel_Computing_2014&amp;diff=34201&amp;oldid=prev"/>
		<updated>2015-03-26T10:54:31Z</updated>

		<summary type="html">&lt;p&gt;2014 archive of &lt;a href=&quot;/PP&quot; class=&quot;mw-redirect&quot; title=&quot;PP&quot;&gt;PP&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Página nova&lt;/b&gt;&lt;/p&gt;&lt;div&gt;This is the main page of a graduate-level course in parallel computing taught in 2012/2 at the Polytechnic Institute [http://pt.wikipedia.org/wiki/IPRJ IPRJ]/UERJ. It is generally useful for advanced programmers in the fields of scientific and multimedia programming.&lt;br /&gt;
[[Imagem:Matrix6.png|right|550px]]&lt;br /&gt;
== General Info ==&lt;br /&gt;
* Instructor: prof. [http://www.lems.brown.edu/~rfabbri Ricardo Fabbri], Ph.D. Brown University&lt;br /&gt;
* Meeting times: Tues 12:30pm-2pm, Thursdays 2:20pm - 4pm&lt;br /&gt;
* Evaluation criteria: one quiz at the end of the term (60%), plus practical projects (40%).&lt;br /&gt;
* Forum for file exchange and discussion: [http://uerj.tk uerj.tk]&lt;br /&gt;
&lt;br /&gt;
=== Prerequisites ===&lt;br /&gt;
* Linux - intermediate to advanced (will be reviewed as needed) - read [[Literatura recomendada pela equipe|Recommended Reading]]&lt;br /&gt;
* C/C++ - intermediate to advanced (will be reviewed) - read [[Literatura recomendada pela equipe|Recommended Reading]]&lt;br /&gt;
** [[Configuring Ubuntu for Programming]]&lt;br /&gt;
* Basic understanding of computer architecture - read &amp;quot;Computer Systems: A Programmer's Perspective&amp;quot; listed at [[Literatura recomendada pela equipe|Recommended Reading]].&lt;br /&gt;
&lt;br /&gt;
== Approximate Content ==&lt;br /&gt;
&lt;br /&gt;
The course focuses on software techniques for parallel computing. We aim at a comprehensive treatment of several types of practical parallel programming techniques:&lt;br /&gt;
* process-oriented parallel programming&lt;br /&gt;
* thread programming/thread safety&lt;br /&gt;
* single-core vector instructions&lt;br /&gt;
* multi-processor and multi-core programming&lt;br /&gt;
* mapreduce/hadoop&lt;br /&gt;
* MPI&lt;br /&gt;
* Cuda/OpenCL&lt;br /&gt;
* Dataflow techniques&lt;br /&gt;
&lt;br /&gt;
Each of the above techniques would merit a course of its own, as is done at many of the best universities. We therefore aim at attaining practical familiarity with each in the first half of the course, and will specialize in the later part of the course so as to support the students' graduate research.&lt;br /&gt;
&lt;br /&gt;
== Main Resources ==&lt;br /&gt;
&lt;br /&gt;
* Textbooks&lt;br /&gt;
** 1st part of the course: [http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html &amp;quot;Is Parallel Programming Hard, and, if so, what can you do about it?&amp;quot; - Paul E. McKenney / IBM (editor)].&lt;br /&gt;
** For MPI: &amp;quot;An Introduction to Parallel Programming&amp;quot;, by Peter Pacheco [http://books.google.com.br/books?id=SEmfraJjvfwC&amp;amp;printsec=frontcover&amp;amp;hl=pt-BR&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false]&lt;br /&gt;
** For Cuda/OpenCL: Programming Massively Parallel Processors: A Hands-On Approach[http://books.google.com.br/books?id=qW1mncii_6EC&amp;amp;printsec=frontcover&amp;amp;hl=pt-BR#v=onepage&amp;amp;q&amp;amp;f=false]&lt;br /&gt;
** General textbook: A. Grama, A. Gupta, G. Karypis, V. Kumar, Introduction to Parallel Computing, Second Edition, Addison-Wesley, 2003.&lt;br /&gt;
** Algorithms-oriented textbook: Algorithms: sequential, parallel, and distributed, Kenneth A. Berman, Jerome L. Paul&lt;br /&gt;
*** For more theoretical background, see the chapter on parallel algorithms in Cormen's classic algorithms book.&lt;br /&gt;
* Mapreduce, GFS, and Bigtable:&lt;br /&gt;
** Papers from Google&lt;br /&gt;
** http://code.google.com/edu/parallel/mapreduce-tutorial.html&lt;br /&gt;
** Wikipedia &lt;br /&gt;
** Hadoop documentation&lt;br /&gt;
* Presentations from IBM and Intel: Cilk, etc.&lt;br /&gt;
** http://software.intel.com/en-us/videos/channel/parallel-programming&lt;br /&gt;
* Wikipedia pages&lt;br /&gt;
** http://en.wikipedia.org/wiki/Vectorization_%28parallel_computing%29&lt;br /&gt;
** http://en.wikipedia.org/wiki/Mapreduce&lt;br /&gt;
** http://en.wikipedia.org/wiki/Hadoop&lt;br /&gt;
* Rice lecture notes on Parallel Computing [http://www.clear.rice.edu/comp422/lecture-notes/index.html]&lt;br /&gt;
* Other Resources&lt;br /&gt;
** http://www-users.cs.umn.edu/~karypis/parbook/&lt;br /&gt;
** http://www.cse.iitd.ernet.in/~subodh/courses/CSL860/&lt;br /&gt;
** Running Hadoop On Ubuntu Linux Multi and Single Node Cluster [http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/]&lt;br /&gt;
** Short course on CUDA at [http://webmail.iprj.uerj.br/xvemc/ emc 2012] conference: http://josericardojunior.com/?p=219&lt;br /&gt;
&lt;br /&gt;
=== Lectures ===&lt;br /&gt;
&lt;br /&gt;
''' NOTE: There will be no lectures from Oct 4 - Oct 19 2012'''&lt;br /&gt;
&lt;br /&gt;
==== Partial listing &amp;amp; Tentative Outline ====&lt;br /&gt;
# Overview of parallel computing: https://computing.llnl.gov/tutorials/parallel_comp/&lt;br /&gt;
# Review of Linux: &lt;br /&gt;
## Do the [[Learning the Unix Programming Environment]] tutorial to brush up on your skills&lt;br /&gt;
## See the book Running Linux from the [[Literatura_recomendada_pela_equipe| Recommended Reading]]&lt;br /&gt;
# Review of C/C++ (finished 28/Aug) &lt;br /&gt;
# Fundamental programming techniques: processes and threads&lt;br /&gt;
## Read The Unix Programming Environment for some classic multi-process programming[http://wiki.nosdigitais.teia.org.br/Literatura_recomendada_pela_equipe]&lt;br /&gt;
## What is a process: http://linuxgazette.net/133/saha.html&lt;br /&gt;
## 30/Aug: We'll be following ch. 3 of McKenney's book at the lab.&lt;br /&gt;
## Up to (and including) 18/Sept: We followed McKenney's code samples for counting algorithms and his Threading API in detail&lt;br /&gt;
### What is variable alignment: [http://www.songho.ca/misc/alignment/dataalign.html] [http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Type-Attributes.html]&lt;br /&gt;
### How to determine the cache line size for your machine: [http://stackoverflow.com/questions/794632/programmatically-get-the-cache-line-size]. This will give you better results for threading algorithms.&lt;br /&gt;
### Install some documentation useful for POSIX threading:  sudo apt-get install glibc-doc manpages-posix-dev&lt;br /&gt;
## 20/Sept: Finished statistic counters and showed how to read assembly language generated by GCC, and how to disassemble binary objects&lt;br /&gt;
## 25/Sept: Finished theoretical thread content for the course: up until ch. 5 of McKenney's Book + Semaphores + Thread Safety. &lt;br /&gt;
### Homework 2 (see below)&lt;br /&gt;
# Cuda/OpenCL (02/Oct)&lt;br /&gt;
## Project 1 handout (see below) due Oct. 23&lt;br /&gt;
# Mapreduce/Hadoop&lt;br /&gt;
# MPI&lt;br /&gt;
&lt;br /&gt;
== Homework ==&lt;br /&gt;
=== Homework 1 ===&lt;br /&gt;
Our institute is planning to buy a supercomputer, so our first homework will consist of studying these systems in detail.&lt;br /&gt;
&lt;br /&gt;
* The class shall be divided into groups of 2 people.&lt;br /&gt;
* Each group must pick one top 500 (http://www.top500.org) supercomputer and write on this wiki:&lt;br /&gt;
** Team number:&lt;br /&gt;
** Supercomputer:&lt;br /&gt;
&lt;br /&gt;
* Each class will start with a 20min presentation from a '''random''' group (or a volunteer group).&lt;br /&gt;
* A handout must also be provided on Tue August 28&lt;br /&gt;
* Your project must not be a plain copy from Wikipedia!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Contents of the presentation and report:&lt;br /&gt;
* Overview of the system&lt;br /&gt;
* Glossary of terms&lt;br /&gt;
* '''How to program for the system'''&lt;br /&gt;
* Software infrastructure&lt;br /&gt;
* How the system differs from a conventional cluster&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Groups'''&lt;br /&gt;
&lt;br /&gt;
* Team 1: Julio Stutz &amp;amp; Joel Sanchez  &lt;br /&gt;
** Supercomputer: '''Tupã''' (Inpe/Brazil) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/julio-sanchez/tupa4-supercomputer-report.pdf report]) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/julio-sanchez/tupa4-supercomputer-presentation.pdf slides])&lt;br /&gt;
* Team 2: Claudir Oliver &amp;amp; Luiz Rosalba&lt;br /&gt;
** Supercomputer: '''Tianhe-1A''' (China) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/claudir-luiz/tian-he-report.pdf report]) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/claudir-luiz/tian-he-presentation.pdf slides])&lt;br /&gt;
* Team 3: Mateus Guida&lt;br /&gt;
** Supercomputer: '''Sequoia''' (Lawrence Livermore National Laboratory - California - USA) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/mateus-guida/sequoia-supercomputer-report.pdf report]) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/mateus-guida/sequoia-supercomputer-presentation-v2.pdf slides])&lt;br /&gt;
* Team 4: Lyvia Aloquio &amp;amp; Stella Oggioni &lt;br /&gt;
** Supercomputer: '''Grifo04''' (PETROBRAS/Brazil)&lt;br /&gt;
* Team 5: Ricardo Dias&lt;br /&gt;
** Supercomputer: '''Mole-8.5''' (Institute of Process Engineering, Chinese Academy of Sciences/China) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/ricardo/mole-supercomputer-report.pdf report]) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/ricardo/mole-supercomputer-presentation.pdf slides])&lt;br /&gt;
* Team 6: Frederico Santos&lt;br /&gt;
** Supercomputer: '''Helios''' (International Fusion Energy Research Centre (IFERC), EU(F4E) - Japan Broader Approach collaboration) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/supercomputers/frederico-santos/helios-report.pdf report])&lt;br /&gt;
* Team 7: Thiago Luiz&lt;br /&gt;
** Supercomputer: '''Pleiades''' - NASA/Ames Research Center/ NAS - United States.&lt;br /&gt;
&lt;br /&gt;
=== Homework 2: limit counters ===&lt;br /&gt;
Describe the limit counter implementations of sec. 4.3 in McKenney's Book, answering: &lt;br /&gt;
# list and explain the advantages of each counter&lt;br /&gt;
# explain the main concepts of each method in your own words&lt;br /&gt;
Due date: Oct 1 (was: Sept 27)&lt;br /&gt;
&lt;br /&gt;
=== Project 1 ===&lt;br /&gt;
* The class will be divided into interest groups&lt;br /&gt;
* The interest groups will each pick one of the following technologies or else propose another one of their liking. &lt;br /&gt;
** mapreduce/hadoop&lt;br /&gt;
** MPI&lt;br /&gt;
** Cuda/OpenCL&lt;br /&gt;
** Hybrid technologies (Distributed + Cuda)&lt;br /&gt;
** Languages designed for concurrency: [http://golang.org Google Go], Erlang, etc.&lt;br /&gt;
** Programming for the Playstation 3 parallel architecture[http://groups.csail.mit.edu/cag/ps3/index.shtml]&lt;br /&gt;
** OpenGL/GLSL &lt;br /&gt;
** [http://dl.acm.org/citation.cfm?id=1807184 Google Pregel]&lt;br /&gt;
* The project will consist of a series of presentations by the group members individually.&lt;br /&gt;
* A grade will be assigned to each presentation individually&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Groups to Project 1====&lt;br /&gt;
&lt;br /&gt;
* '''MPI''': Claudir Oliveira - 25/10/2012 ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/oliveira/apresentacao.pdf slides]) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/oliveira/codigo/ code])&lt;br /&gt;
* '''Hybrid technologies (Distributed + Cuda)''': Ricardo Dias  ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/dias/apresentacao.pdf slides]) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/dias/codigo/ code])&lt;br /&gt;
* '''Cuda/OpenCL''': Joel Sanchez ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/sanchez/gpu_cuda.pdf slides]) ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/sanchez/codigo/ code])&lt;br /&gt;
* '''OPENGL/GLSL''' : Luiz Rosalba - 08/11/2012 ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/rosalba/glsl_opengl_apresenta.pdf slides])&lt;br /&gt;
* '''mapreduce/hadoop''' : Julio Stutz ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/stutz/mapreduce-hadoop.pdf slides])&lt;br /&gt;
* '''Google Go''' : Mateus Guida ([http://www.lems.brown.edu/~rfabbri/stuff/parallel-computing/technologies/guida/google_go.pdf slides])&lt;br /&gt;
* '''Programming for the Playstation 3''' parallel architecture: Thiago Luiz&lt;br /&gt;
&lt;br /&gt;
=== Project 2 ===&lt;br /&gt;
* This project will consist of a practical programming problem from the student's research. The student is required to describe a problem from his or her research and present a parallel implementation of some aspect of it&lt;br /&gt;
* This will be evaluated through actual code and an accompanying monograph&lt;br /&gt;
* Grade: up to 4 extra points in the exam grade&lt;br /&gt;
&lt;br /&gt;
==== Topic Suggestions ====&lt;br /&gt;
For those who haven't found a suitable research topic with their own advisors,&lt;br /&gt;
here is a list of possible ideas. I can provide some guidance on these ideas&lt;br /&gt;
during the project, but please let your advisors know about this. Also, not&lt;br /&gt;
every idea is easy to develop a project on; some may require thought on&lt;br /&gt;
your part in terms of goals and objectives.&lt;br /&gt;
&lt;br /&gt;
Try to email the original authors of the relevant papers in order to obtain their code.&lt;br /&gt;
You have to understand the paper, understand the code, and run the code. You will also&lt;br /&gt;
have to write new code, or modify the existing code, although you have limited time for this.&lt;br /&gt;
&lt;br /&gt;
If you need any of the files below but can't access them from home, let me know&lt;br /&gt;
and I will email you a copy.&lt;br /&gt;
&lt;br /&gt;
* [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=5543612&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F5521877%2F5543135%2F05543612.pdf%3Farnumber%3D5543612 &amp;quot;Design and Implementation of a Wide Area, Large-Scale Camera Network&amp;quot;, Kuo et al. 2010]&lt;br /&gt;
** Technology: distributed computing / hadoop&lt;br /&gt;
** Abstract&lt;br /&gt;
   We describe a wide area camera network on a campus&lt;br /&gt;
  setting, the SCALLOPSNet (Scalable Large Optical Sensor&lt;br /&gt;
  Network). It covers with about 100 stationary cameras an&lt;br /&gt;
  expansive area that can be divided into three distinct regions:&lt;br /&gt;
  inside a building, along urban paths, and in a remote natural&lt;br /&gt;
  reserve. Some of these regions lack connections for power and&lt;br /&gt;
  communications, and, therefore, necessitate wireless,&lt;br /&gt;
  battery-powered camera nodes. In our exploration of available&lt;br /&gt;
  solutions, we found existing smart cameras to be insufficient&lt;br /&gt;
  for this task, and instead designed our own battery-powered&lt;br /&gt;
  camera nodes that communicate using 802.11b. The camera network&lt;br /&gt;
  uses the Internet Protocol on either wired or wireless networks&lt;br /&gt;
  to communicate with our central cluster, which runs cluster and&lt;br /&gt;
  cloud computing infrastructure. These frameworks like Apache&lt;br /&gt;
  Hadoop are well suited for large distributed and parallel&lt;br /&gt;
  tasks such as many computer vision algorithms. We discuss&lt;br /&gt;
  the design and implementation details of this network,&lt;br /&gt;
  together with the challenges faced in deploying such a large&lt;br /&gt;
  scale network on a research campus. We plan to make the&lt;br /&gt;
  datasets available for researchers in the computer vision&lt;br /&gt;
  community in the near future.&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Web-Scale Computer Vision using MapReduce for Multimedia Data Mining&amp;quot;, White, Yeh, Lin and Davis 2010&lt;br /&gt;
** Technology: Mapreduce&lt;br /&gt;
** Abstract: This work explores computer vision applications of the&lt;br /&gt;
  MapReduce framework that are relevant to the data mining community.&lt;br /&gt;
  An overview of MapReduce and common design patterns are provided&lt;br /&gt;
  for those with limited MapReduce background. We discuss both the&lt;br /&gt;
  high level theory and the low level implementation for several&lt;br /&gt;
  computer vision algorithms: classifier training, sliding windows,&lt;br /&gt;
  clustering, bag-of-features, background subtraction, and image&lt;br /&gt;
  registration. Experimental results for the k-means clustering and&lt;br /&gt;
  single Gaussian background subtraction algorithms are performed&lt;br /&gt;
  on a 410 node Hadoop cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Building Rome on a Cloudless Day&amp;quot;, Frahm et al, 2012&lt;br /&gt;
** Technology: Cuda&lt;br /&gt;
** Abstract:&lt;br /&gt;
  This paper introduces an approach for dense 3D reconstruction&lt;br /&gt;
  from unregistered Internet-scale photo collections with about&lt;br /&gt;
  3 million images within the span of a day on a single PC&lt;br /&gt;
  (“cloudless”). Our method advances image clustering, stereo,&lt;br /&gt;
  stereo fusion and structure from motion to achieve high&lt;br /&gt;
  computational performance. We leverage geometric and appearance&lt;br /&gt;
  constraints to obtain a highly parallel implementation on modern&lt;br /&gt;
  graphics processors and multi-core architectures. This leads to&lt;br /&gt;
  two orders of magnitude higher performance on an order of&lt;br /&gt;
  magnitude larger dataset than competing state-of-the-art approaches.&lt;br /&gt;
&lt;br /&gt;
===== Other interesting works =====&lt;br /&gt;
* &amp;quot;GraphLab: A Distributed Framework for Machine Learning in the Cloud&amp;quot;, Low et al., 2011&lt;br /&gt;
   Machine Learning (ML) techniques are indispensable in&lt;br /&gt;
  a wide range of fields. Unfortunately, the exponential&lt;br /&gt;
  increase of dataset sizes is rapidly extending the runtime&lt;br /&gt;
  of sequential algorithms and threatening to slow future&lt;br /&gt;
  progress in ML. With the promise of affordable large-scale&lt;br /&gt;
  parallel computing, Cloud systems offer a viable platform&lt;br /&gt;
  to resolve the computational challenges in ML. However,&lt;br /&gt;
  designing and implementing efficient, provably correct&lt;br /&gt;
  distributed ML algorithms is often prohibitively challenging.&lt;br /&gt;
  To enable ML researchers to easily and efficiently use&lt;br /&gt;
  parallel systems, we introduced the GraphLab abstraction,&lt;br /&gt;
  which is designed to represent the computational patterns&lt;br /&gt;
  in ML algorithms while permitting efficient parallel and&lt;br /&gt;
  distributed implementations.&lt;br /&gt;
  In this paper we provide a formal description of the&lt;br /&gt;
  GraphLab parallel abstraction and present an efficient&lt;br /&gt;
  distributed implementation. We conduct a comprehensive&lt;br /&gt;
  evaluation of GraphLab on three state-of-the-art ML&lt;br /&gt;
  algorithms using real large-scale data and a 64 node&lt;br /&gt;
  EC2 cluster of 512 processors. We find that GraphLab&lt;br /&gt;
  achieves orders of magnitude performance gains over&lt;br /&gt;
  Hadoop while performing comparably or superior to&lt;br /&gt;
  hand-tuned MPI implementations.&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Parallel Data Mining from Multicore to Cloudy Grids&amp;quot;, Fox et al.&lt;br /&gt;
 &lt;br /&gt;
   We describe a suite of data mining tools that cover clustering,&lt;br /&gt;
  information retrieval and the mapping of high dimensional data to low dimensions&lt;br /&gt;
  for visualization. Preliminary applications are given to particle physics,&lt;br /&gt;
  bioinformatics and medical informatics. The data vary in dimension from low (2-&lt;br /&gt;
  20), high (thousands) to undefined (sequences with dissimilarities but not vectors&lt;br /&gt;
  defined). We use deterministic annealing to provide more robust algorithms that&lt;br /&gt;
  are relatively insensitive to local minima. We discuss the structure of&lt;br /&gt;
  the algorithms and their mapping to parallel architectures of different&lt;br /&gt;
  types, and look at the performance of the algorithms on three classes of&lt;br /&gt;
  system: multicore, cluster and Grid, using a MapReduce-style algorithm.&lt;br /&gt;
  Each approach is suitable in different&lt;br /&gt;
  application scenarios. We stress that data analysis/mining of large datasets can be&lt;br /&gt;
  a supercomputer application.&lt;br /&gt;
 &lt;br /&gt;
* &amp;quot;Fast face tracking using parallel particle filter algorithm&amp;quot;, Liu et al., 2009&lt;br /&gt;
&lt;br /&gt;
  This paper proposes a multi-cue-based face tracking&lt;br /&gt;
  algorithm with the help of parallel multi-core processing.&lt;br /&gt;
  Due to illumination and occlusion problems, face tracking&lt;br /&gt;
  usually does not work stably based on a single cue. Three&lt;br /&gt;
  different visual cues, color histogram, edge orientation&lt;br /&gt;
  histogram and wavelet feature, are integrated under the&lt;br /&gt;
  framework of particle filter to improve the tracking&lt;br /&gt;
  performance considerably. To handle the huge computational&lt;br /&gt;
  cost resulting from the introduced multi-cue strategy,&lt;br /&gt;
  a map-reduce thread model is designed to parallelize&lt;br /&gt;
  and speed up the observation steps. In addition, an online&lt;br /&gt;
  updating strategy makes our algorithm adaptable to some&lt;br /&gt;
  slight face rotations. The experimental results demonstrate&lt;br /&gt;
  that our proposed face tracking algorithm works robustly for&lt;br /&gt;
  cluttered backgrounds and different illuminations. The&lt;br /&gt;
  multi-core parallel scheme achieves a good linear speedup&lt;br /&gt;
  compared to the corresponding sequential algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;CudaGIS: Report on the Design and Realization of a Massive Data Parallel GIS on GPUs&amp;quot;, Zhang and You&lt;br /&gt;
  We report the design and realization of a high-performance&lt;br /&gt;
  parallel GIS, i.e., CudaGIS, based on General-Purpose&lt;br /&gt;
  computing on Graphics Processing Units (GPGPU)&lt;br /&gt;
  technologies. Still under active development, CudaGIS currently&lt;br /&gt;
  supports major types of geospatial data (point, polyline, polygon&lt;br /&gt;
  and raster) and provides modules for spatial indexing, spatial join&lt;br /&gt;
  and other types of geospatial operations on such geospatial data&lt;br /&gt;
  types. Experiments have demonstrated 20-40X and 1000-10000X&lt;br /&gt;
  speedups over serial CPU implementations on main-memory and&lt;br /&gt;
  disk-resident systems, respectively.&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;A Parallel Clustering Method Study Based on MapReduce&amp;quot;, Zhanquan and Fox&lt;br /&gt;
&lt;br /&gt;
  Clustering is considered the most important task in data mining. The goal&lt;br /&gt;
  of clustering is to determine the intrinsic grouping in a set of unlabeled&lt;br /&gt;
  data. Many practical application problems can be solved with clustering&lt;br /&gt;
  methods, which have been widely applied in all kinds of areas, such as&lt;br /&gt;
  marketing, biology, library science, insurance, earthquake study, and the&lt;br /&gt;
  World Wide Web. Many clustering methods have been studied, such as k-means,&lt;br /&gt;
  Fisher clustering, and Kohonen clustering. In many areas, the scale of the&lt;br /&gt;
  data set becomes larger and larger, and classical clustering methods do&lt;br /&gt;
  not scale to such large data sets, so the study of clustering methods for&lt;br /&gt;
  large-scale data is considered an important task. MapReduce is taken as&lt;br /&gt;
  the most efficient model to deal with data-intensive problems, and many&lt;br /&gt;
  data mining methods based on MapReduce have been studied. In this paper,&lt;br /&gt;
  a parallel clustering method based on MapReduce is studied. The research&lt;br /&gt;
  mainly contributes the following aspects. Firstly, it determines the&lt;br /&gt;
  initial centers objectively. Secondly, information loss is taken as the&lt;br /&gt;
  distance metric between two samples. Thirdly, the clustering results are&lt;br /&gt;
  visualized with the interpolation MDS method. The efficiency of the method&lt;br /&gt;
  is illustrated with a practical DNA clustering problem.&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Sequential and mapreduce-based algorithms for constructing an in-place multidimensional quad-tree index for answering fixed-radius nearest neighbor queries&amp;quot;, Andreica and Tapus&lt;br /&gt;
&lt;br /&gt;
  Answering fixed-radius nearest neighbor queries constitutes an&lt;br /&gt;
  important problem in many areas, ranging from geographic systems to similarity&lt;br /&gt;
  searching in object databases (e.g. image and video databases). The usual&lt;br /&gt;
  approach to efficiently answer such queries is to construct an index. In&lt;br /&gt;
  this paper we present algorithms for constructing a multidimensional&lt;br /&gt;
  quad-tree index. We start with well-known sequential algorithms and then&lt;br /&gt;
  adapt them to the MapReduce computation model, in order to be able to&lt;br /&gt;
  handle large amounts of data. In all&lt;br /&gt;
  the algorithms the objects are indexed in association with quad-tree cells (or nodes)&lt;br /&gt;
  which they intersect (plus possibly a few other nearby cells). When processing a&lt;br /&gt;
  query, multiple quad-tree cells may be searched in order to find the answer.&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Fast Homing Techniques for Autonomous Robots using Sparse Image Waypoints and Design of Vision-based Indoor Localization as Cloud Computing Services&amp;quot;, PhD Thesis, 2011&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Building and Using a Database of One Trillion Natural-Image Patches&amp;quot;, Arietta and Lawrence, IEEECGA 2011&lt;br /&gt;
&lt;br /&gt;
=== Class exercises ===&lt;br /&gt;
# In-class assignment (extra 0.1 on the final test): Compare 3 different implementations of statistical counters: an array of per-thread variables, an eventually consistent counter, and GCC __thread variables&lt;br /&gt;
## Only '''Julio Stutz''' scored the extra +0.1 on the final exam.&lt;br /&gt;
&lt;br /&gt;
=== Keywords ===&lt;br /&gt;
Portuguese: Programação Paralela, Introdução à Computação Paralela&lt;br /&gt;
&lt;br /&gt;
[[Category:Lab Macambira]] [[Category:IPRJ]] [[Category:Redes]]&lt;/div&gt;</summary>
		<author><name>V1z</name></author>
	</entry>
</feed>