Optimización de modelos hidrodinámicos 3d del transporte y mezcla aplicados al conocimiento y predicción de masas de agua continental
- Acosta Cobos, Mario César
- Francisco Rueda Valdivia Codirector
- Mancia Anguita López Codirectora
Universidad de defensa: Universidad de Granada
Fecha de defensa: 28 de mayo de 2015
- Julio Ortega Lopera Presidente
- Pedro González Rodelas Secretario
- Ester Martín Garzón Vocal
- Joaquín Ortega Casanova Vocal
- Mario Piotti Vocal
Tipo: Tesis
Resumen
Abstract Water motions in lakes and reservoirs are initiated as a result of thermal and mechanical energy flowing through the lake boundaries, cascading down from the large- basin scale to the micro-scales, and ultimately leading to mixing and dissipation. Mixing patterns induced by basin-scale motions and high-frequency waves in stratified lakes are inherently patchy, setting up horizontal density variations. Spatially complex horizontal flow patterns develop in response to horizontal density gradients, when they are modulated by direct wind forcing and the Earth¿s rotation. It is through these horizontal motions that the system readjusts, returning to its equilibrium state with lines of equal density coinciding with geopotential surfaces. Horizontal density gradients have been shown to develop in response to a wide range of mechanisms with different spatial and temporal scales, including river inflows, thermal spatial variations response to surface heat fluxes, uneven mixing due to basin scale motions, spatially varying surface wind mixing, upwelling or mixing in the benthic boundary layer. Given the number of mechanisms leading to the development of baroclinic pressure gradients, and the wide range of spatio-temporal scales of these processes the problem of describing horizontal circulation in lakes and reservoirs has been elusive. Only in the last few years, advances in remote sensing and quantitative imaging capable of retrieving surface velocity and temperature fields, and the use of three-dimensional (3D) numerical models solving the equations of motion applied to simulate lake motions with sufficient temporal and spatial resolution has allowed aquatic scientists to describe and understand the complex horizontal circulation patterns that develop in lakes and reservoirs, leading to horizontal transport and heterogeneity. Most of these models are based on the solution of three-dimensional form of the shallow-water equations 3D-SWE, a simplified form of the Reynolds averaged Navier-Stokes (RANS) equations, subject to the appropriate boundary conditions. Practical computational limits and a priori scaling analyses justify the use of the 3D-SWE in the description of these large-scale flows. But, even if using 3-D SWE models, engineers and scientists still face a serious challenge when trying to simulate horizontal transport and circulation in the near-shore environments of large water bodies. These are hot-spots in the aquatic ecosystems with large biodiversity and critical habitats for many organisms in lakes. But, being the nexus of human interactions with lakes, littoral habitats are highly modified by human uses. Humans build structures, recreate, fish, extract water, or dispose sewage at lake edges. Significant inputs of nutrients also arrive to the coastal zone through rivers, as a result of agricultural practices or cattle-raising in the contributing watershed. Hence, there is an increasing need to understand near-shore environments where human uses and natural wild-live compete for resources. Simulating these environments, however, is not simple. Littoral habitats can be substantially heterogeneous in both vertical and horizontal dimensions. Moreover, physical conditions exhibit continuous and very dynamic changes, at short-time scales, as a result of strong hydrodynamic forcing and the weak inertia of shallow layers, and also as a result of the time varying nature of human activities. Furthermore, near-shore regions, though, cannot be understood in isolation from the pelagic. Hence, in trying to simulate the near-shore physical conditions both local and basin-scale circulation features need to be resolved simultaneously during long periods of time, and the computational cost of these simulations can be formidable. Parallel computation platforms are being increasingly demanded and used in the area of Civil and Environmental Engineering, and others research areas, to reduce the computational time required to conduct numerical simulations of real systems, or deal with even larger problems. In this dissertation a series of solutions are proposed and tested to reduce the computational cost of 3D hydrodynamic models, so that simulations of water motion and transport in near-shore regions of large geophysical systems, during extended periods of time and using high resolution grids, can be conducted with acceptable execution time and using accessible resources. Several approaches are introduced and studied. First, we explore nesting-procedures in which only localized regions of the littoral zone are simulated using a very high-resolution or inner-model, with boundary conditions which are provided by an outer-model that solves the large-scale processes in the rest of the water body. Second, we use parallel computation techniques so that use can be made of mid-range or low-cost platforms to run the inner and outer models simultaneously. We also optimize and parallelize the model computations used in small commodity clusters, dividing the calculations in the near-shore regions among a large number of processors, as they become available in parallel platforms. The resulting optimized and parallelized model of near-shore regions scales almost linearly, so that the computational model is run faster as more resources are used. We have additionally taken additional steps to adapt the hydrodynamic model to the architecture and tools available. With this work, we demonstrate that the adaptation and optimization of the model to the available resources can also be used to reduce significantly the computational cost, with very good scalability results even using High performance platforms. The implementation was carried out on a particular 3D-SWE, which was originally implemented for serial architectures by P. E. Smith (2006) (Figure a.1). Most improvements proposed here, though, can be applicable to other similar models. The governing equations in Smith¿s model are solved using a semi-implicit, three-level, iterative leapfrog-trapezoidal finite difference algorithm on a staggered Cartesian grid. The semi-implicit approach is based on treating the surface gravity wave and vertical diffusion terms implicitly to avoid time-step limitations as strict as in an explicit case due to gravity wave Courant¿Friedrich¿Levy (CFL) conditions, and to guarantee the stability of the method. All other terms including advection are treated explicitly. Although this semi-implicit approach avoids strict time-step limitations, it also has the disadvantage compared to completely explicit approach that long system of equations must be formed and solved, usually using iterative methods which are very difficult to parallelize. Semi-implicit 3D-SWE models form a pentadiagonal system of equations which is symmetric positive definite. The system of equations is solved by an iterative method widely used known as Conjugate Gradient (CG). Additionally, the CG is usually applied using a preconditioner (PCG), which reduces the number of iterations of the CG to converge to a correct solution within a tolerance. The semi-implicit model used in this dissertation will be referred to as Si3D (Figure a.1). Highlight that the original Si3D model was modified with several basic optimizations (Basic Si3D, Figure a.1). These optimizations are used in all the implementations made (although this implementation is the first step, it is covered in detail in the last chapter, where it is explained all the steps involved in the complete optimization of a hydrodynamic model). The approaches proposed were first applied to simulate synthetic scenarios, and then used to conduct realistic simulations in two systems: a 30 km reach along the Sacramento River, and, Lake Tahoe, both of them in California (USA). Furthermore, some of the implementations developed in this thesis are being used in other study cases, including Beznar Reservoir (Spain), the confluence between Ebro and Segre Rivers in the upstream end of Ribarroja Reservoir (Spain), Lake Cayuga (USA) and Lake Tanganyika (Africa). The Sacramento River model is used to understand the influence of tidal river dynamics on the migration of juvenile salmon towards the ocean and to reproduce the lateral and secondary circulations in the area of channel meanders. The Lake Tahoe model is used to study near-shore transport processes controlling the fate of contaminants released by humans in beaches, and the migration pathways of planktonic larvae of invasive species from bays where they have been able to settle to other beaches and bays which are free. The nesting procedure developed and implemented in Si3D (N-Si3D, Figure a.1), is a one-way nesting method, in which all the necessary information is transferred from the outer- (or low resolution) basin scale model to the inner- (or high resolution) model, to solve the dependencies in the discretized equations of the latter. It optionally adds a 3D relaxation area, which allows a smooth transition from the inflow in the nested model and prevents reflections at the border in the outflow due to possible differences in the solution of both models when the simulation progresses over time. Furthermore, the inner and outer models are executed simultaneously in parallel in an online mode, following a pipeline structure. In the parallel one-way implementation all the information required is passed from the outer to the inner model without storing any information in files, both the nested open boundary and the 3D relaxation area, and allows the transfer of information even each time-step, eliminating possible errors by temporal interpolation. In addition, communications are overlapped with calculation, so it does not represent an overhead in the implementation. The nested approach was tested using both synthetic and realistic simulations. The nested implementation was in all cases validated by comparing the results of simulations in a small region (sub-domain) of a lake or river model. These differences in results between the nested and the complete model (error) are very small and even 0, using the same grid resolution both in the nested- or inner-model and the complete or outer-model and when the pentadiagonal matrix for water surface elevation built in the semi-implicit model was solved using a direct method, which is computationally demanding but exact. Through these case examples, we demonstrate that the tangential velocities need to be transferred from the low resolution to the high resolution model. If not they can affect significantly the quality of the nested solution, in particular when currents parallel to the inner-outer boundary are strong. This was the case, for example, of the high-resolution simulations of a river bend (Claksburg bend) along the Sacramento River, where strong lateral circulation develops upstream the inner-domain. The nested model results agree well with observations previously reported in the literature. Furthermore, the nested-model results compare well with the results from the high-resolution model of the whole reach, with differences (Normal Root Square Error, NRMSE) that are less than 4%. In other environments, with weaker currents, though, the need for passing tangential velocity information is not that strong. This was the case when simulating the local-scale circulation in a small bay (Marla Bay) of Lake Tahoe. In our realistic simulations of both the river bend in Sacramento River and in Marla Bay, Lake Tahoe, the use of a high-resolution grid in the inner-model reveals flow features which cannot be simulated with the low-resolution basin-scale model. Using the nesting procedure developed in Chapter 2 we have explored approaches to conduct high-resolution simulations of extended near-shore regions. In particular, we are interested in simulating the littoral perimeter of Lake Tahoe. As reviewed by Rao and Schwab (2007), currents in the near-shore are largely aligned along isobaths, hence, creating strong physical links among beaches and bays existing along lake perimeters. The extension of the inner-domain (the littoral perimeter of a large lake) in these simulations can be large, and, high-resolution simulations in this domain can be very computationally demanding. The solution proposed consists of dividing the high-resolution computations of the littoral fringe among several cores/computers. Modifying a semi-implicit model to conduct parallel computations, though, is not free of difficulties. Probably the computational stage in a semi-implicit 3D model which poses the largest difficulties to parallelize is the solution with iterative methods of the pentadiagonal matrix problem for the free surface elevation. In Si3D the matrix problem is solved using a Preconditioned Conjugate Gradient method. Its implementation in parallel requires numerous communications, both one-to-many/many-to-one where all sub-domains are involved and communications between neighbor sub-domains. In addition, the need of using reordering (as red-black ordering) or other techniques typically used to parallelize iterative methods produces an important overhead which may reduce significantly the scalability of the algorithm and increases the complexity of the implementation. A first and plausible approach to parallelize the model computations consists of avoiding the parallelization of the pentadiagonal matrix solution. This can be achieved by (1) creating the pentadiagonal matrix in parallel by splitting the workload, through domain decomposition, among several threads/processes of the operating system, (2) then solving the matrix problem sequentially in one of the threads/processes; and, (3) distributing the solution of the matrix problem among all the threads/processes. Communications among processes in this stage occur twice: just before (many-to-one) and after (one-to-many) the matrix solution. This parallel implementation of Si3D (P-Si3D, Figure a.1) only works properly when the computational cost of the matrix solution represents only a small fraction (up to 2%) of the total runtime. This approach, however, by itself, scales poorly as a result of the overhead introduced by the communications involved in constructing the matrix and distributing the solution among processes and the sequential execution of the pentadiagonal matrix solution. Still one can use the parallel implementation of Si3D, conjunctively with the nesting procedures, to conduct high-resolution simulations of the littoral fringe. This implementation is here referred to as P/N-Si3D (Figure a.1) and can be used to conduct high-resolution near-shore simulations in small clusters. The cost of P/N-Si3D is significantly reduced, compared to the sequential version of the model, as a result of two strategies (1) only using a high-resolution grid in the nested near-shore model; and (2) simulating the littoral fringe using the parallel version of the model P-Si3D. This implementation was used to simulate the dispersion of passive tracers released at several locations along the perimeter of Lake Tahoe. These tracer simulations were intended to represent the fate of water constituents entering the lake through a total of 51 outfalls existing around Lake Tahoe, discharging storm-water directly into the lake. The results of the model are presented and analyzed for a particular bay in the southeast of Lake Tahoe (Marla Bay). Tracer concentrations within the bay are partly explained as a result of tracer being released locally within the bay, but also as a result of long-shore currents carrying exogenous tracer released outside the bay, along the southern coast of Lake Tahoe. The concentration of exogenous tracer peaks during periods of strong winds, when water from the South is rapidly transported and trapped in the bay as a result of the development of local bay scale eddies. Given that P-Si3D does not scale correctly, P/N-Si3D does not either. A scalable parallel and nested implementation, though, can be constructed if the free surface solution of the outer- low-resolution model is used as boundary conditions so that the high resolution nested model is divided into subdomains of the littoral fringe where each subdomain can solve a pentadiagonal matrix sub-problem independently. With this implementation, we can avoid collective communications many-to-one/one-to-many which limit scalability. The resulting implementation uses an adapted pipeline structure in addition to the domain decomposition of P-Si3D. The pipeline structure allows the runtime of the low resolution model added has no effect on the total execution time. We will refer to this implementation as SP-Si3D. It has been evaluated and validated in a cluster of 9 nodes using a high-resolution nested model and a low resolution model of Lake Tahoe. The results show a linear scalability when the number of subdomains used to distribute the high resolution littoral zone of Lake Tahoe increases and an error (NRMSE, comparing with the results of a complete high resolution model), very small. Finally, several approaches are proposed and tested to optimize P-Si3D by adapting the parallel version of the code to the available architecture, for both distributed machines and shared memory platforms. The resulting implementation is called OP-Si3D (Figure a.1). Our goal was to reduce the overhead introduced by a parallel implementation. Among other approaches tested, we parallelized the pentadiagonal matrix solution using a new Modified Incomplete Cholensky preconditioner (MIC) which we propose that does not add any communications or reordering. Using this matrix solution method one gets better time execution results compared to other implementations used in the literature, both sequential and parallel methods. We proved that a parallel method scales poorly if either a sequential implementation or a non-optimized parallel implementation of the matrix solution is used, even though other stages are fully optimized. The resulting implementation presented here has good scalability in high performance plataforms with more than 10 nodes.