The advent of supercomputers like Blue Gene/L, XT3, XT4 and Blue Gene/P has given rise to new challenges in parallel programming. Interconnect topology of machines is one such factor which has become crucial in optimizing the performance of applications on clusters with thousands of processors. If the topology of an interconnect is clearly defined, it is possible to minimize the communication volume and balance it evenly across processors. It can be minimized by co-locating communicating entities on nearby objects. This can be done at two stages in the program: 1. during program start-up where we define the initial mapping for migratable entities and static mapping for non-migratable entities, and 2. during load balancing when migratable entities are moved across processors to keep the load evenly balanced. In this thesis, we examine how topological considerations in mapping and load balancing algorithms can help communication and make applications faster. To present concrete results, several applications written in Charm++, 7-point 3- dimensional Stencil, NAMD and LeanCP are considered. Results on 7-point Stencil are presented as a proof of principle because it is a relatively simple to analyze and map. NAMD and LeanCP on the other hand are production codes with thousands of users. NAMD is a classical molecular dynamics application and heavily depends on load balancing for optimal performance. Load balancers deployed in NAMD have been modified to place communicating objects in proximity to one another by considering the communication in multicasts. LeanCP is a quantum chemistry application based on the Car-Parrinello ab-initio Molecular Dynamics (CPAIMD) method. In LeanCP, the default mapping of its several object arrays to processors by the Charm++ runtime has been modified to optimize communication. Results of improvements from these strategies are demonstrated for all these applications on IBM's Blue Gene/L and Cray's XT3.