With PyWren, AWS Lambda Finds an Unexpected Market in Scientific Computing – InApps Technology 2022



In a Pickle

AMPLab supports research across a number of disciplines, many of which involve examining data at multiple scales. One is the study of solar flares, which will help scientists better predict the solar storms that can disrupt power lines and satellite operations. The Solar Dynamics Observatory alone produces 1.5TB of observations a day.


From the very large to the very small, researchers are also studying how large numbers of neurons combine to create behaviors, diseases and even cognition.

Most graduate students, Jonas noted, have never run a Spark or Hadoop job. Their investigations are carried out on laptops or workstations, which sadly limits the breadth of their testing.

One obvious way to scale is to use cloud services such as Amazon Web Services, which has given Berkeley a generous usage grant. To the average researcher, however, getting AWS to do something useful is a formidable task. It is easier to just let the laptop churn a little longer.

Enter PyWren.

PyWren is well suited to jobs such as "parameter tuning." Say a job takes about 5 minutes to run and needs to be run 1,000 times. PyWren's aim is to run that entire workload, all 1,000 instances, in roughly those same 5 minutes.
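A minimal local sketch of that pattern follows, assuming a hypothetical `run_trial` function that stands in for the 5-minute job. Because each parameter value is an independent call, every call can be dispatched at once. Python's standard `concurrent.futures` thread pool stands in for Lambda here; this code does not call PyWren itself.

```python
from concurrent.futures import ThreadPoolExecutor


def run_trial(learning_rate: float) -> tuple[float, float]:
    """Hypothetical stand-in for a ~5-minute job: score one parameter value.

    A real job would train and evaluate a model; here the score is a toy
    quadratic that peaks at learning_rate = 0.1.
    """
    score = -(learning_rate - 0.1) ** 2
    return learning_rate, score


def sweep(values: list[float], max_workers: int = 32) -> list[tuple[float, float]]:
    # Each trial is independent, so all of them can run concurrently.
    # Under PyWren, each call would instead become one Lambda invocation.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_trial, values))


if __name__ == "__main__":
    grid = [i / 1000 for i in range(1, 1001)]  # 1,000 parameter values
    results = sweep(grid)
    # Picking the best score is a small "reduce" step done on the client.
    best_lr, best_score = max(results, key=lambda r: r[1])
    print(f"best learning rate: {best_lr}")
```

With enough parallel workers, total wall-clock time approaches the duration of a single trial rather than 1,000 times it.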

Lambda is a stateless function-as-a-service offering. At the time of writing, each instance gives the user a single process that can run Python, JavaScript (Node.js) or Java code for up to 300 seconds, with 512MB of temporary disk space and up to 1.5GB of RAM.
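Those limits decide which jobs can be farmed out at all. The sketch below checks a job against the figures quoted above; the `JobProfile` type and the `lambda_violations` helper are hypothetical, and the constants reflect this article rather than current AWS quotas.

```python
from dataclasses import dataclass

# Lambda limits as quoted in this article; AWS has changed these since,
# so treat them as illustrative rather than current.
MAX_RUNTIME_S = 300
MAX_TMP_MB = 512
MAX_RAM_MB = 1536  # 1.5GB


@dataclass
class JobProfile:
    """Hypothetical description of one unit of work to send to Lambda."""
    runtime_s: float
    tmp_mb: float
    ram_mb: float


def lambda_violations(job: JobProfile) -> list[str]:
    """Return the limits a job would exceed; an empty list means it fits."""
    problems = []
    if job.runtime_s > MAX_RUNTIME_S:
        problems.append(f"runtime {job.runtime_s}s > {MAX_RUNTIME_S}s")
    if job.tmp_mb > MAX_TMP_MB:
        problems.append(f"temp space {job.tmp_mb}MB > {MAX_TMP_MB}MB")
    if job.ram_mb > MAX_RAM_MB:
        problems.append(f"memory {job.ram_mb}MB > {MAX_RAM_MB}MB")
    return problems


if __name__ == "__main__":
    # A 5-minute job sits exactly at the runtime cap, so it still fits.
    print(lambda_violations(JobProfile(runtime_s=300, tmp_mb=200, ram_mb=1024)))
```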

Massively distributed computing did not appear to be one of the uses AWS originally intended for Lambda, though AWS has acknowledged Jonas's work.

The standard use case was a Lambda function triggered when someone uploads an object to S3: Lambda performs a small operation on the object and writes the result to a database. But there is no reason you couldn't map a single Python function across 2,000 Lambda instances, Jonas noted.
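For contrast, here is a sketch of that "standard" event-driven pattern: a Python Lambda handler that reacts to an S3 upload notification. The per-object work (recording the bucket, key and size) is a hypothetical placeholder, and the database write is omitted. PyWren inverts this model by invoking the same function many times directly, each with different input, instead of waiting for one event per object.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Sketch of an S3-triggered Lambda function.

    A real function would typically fetch the object (e.g. with boto3) and
    write its result to a database; here it only summarizes the event.
    """
    summaries = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        summaries.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
            "size_bytes": s3["object"].get("size", 0),
        })
    return summaries


if __name__ == "__main__":
    # A hand-built event in the shape S3 sends; the values are made up.
    fake_event = {"Records": [{"s3": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "uploads/solar+frame.fits", "size": 1024},
    }}]}
    print(handler(fake_event, None))
```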

Behind the scenes, PyWren serializes the function together with its data, using Python's pickle serialization module and a bit of technology borrowed from the PySpark project. PyWren places the serialized function and data in S3, then invokes Lambda, which runs them using a slimmed-down version of Anaconda, the Python distribution and supporting tools offered by Continuum Analytics. The results are written back to S3, then unpickled and returned to the user.
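The round trip can be sketched with the standard library alone, assuming a hypothetical in-memory dictionary `fake_s3` in place of S3 and plain function calls in place of Lambda invocations. Note that standard pickle records a function as a reference to its importable name rather than shipping its code, which is why it is demonstrated here with `math.factorial`; sending functions defined only on the user's machine is the gap that the serialization technology borrowed from PySpark fills.

```python
import math
import pickle

# Hypothetical in-memory stand-in for S3: object key -> bytes.
fake_s3: dict[str, bytes] = {}


def submit(func, arg, job_id: str) -> None:
    """Client side: serialize (function, data) and 'upload' it."""
    fake_s3[f"{job_id}/input"] = pickle.dumps((func, arg))


def run_worker(job_id: str) -> None:
    """Worker side (a Lambda invocation in PyWren): load, run, store result."""
    func, arg = pickle.loads(fake_s3[f"{job_id}/input"])
    fake_s3[f"{job_id}/output"] = pickle.dumps(func(arg))


def fetch_result(job_id: str):
    """Client side: 'download' and unpickle the result."""
    return pickle.loads(fake_s3[f"{job_id}/output"])


if __name__ == "__main__":
    submit(math.factorial, 5, "job-0")
    run_worker("job-0")
    print(fetch_result("job-0"))  # prints 120
```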

To be sure, there are drawbacks, or engineering tradeoffs, if you will. Network overhead and possible throttling by AWS can slow job submission, so results can straggle in. Also, setup takes up a good 20 percent of most jobs' run time, even though most jobs come nowhere near the 300-second maximum.

"There is some transactional overhead," Jonas admitted, though he added, "Most of our users don't care if their serial job takes twice as long as otherwise, because now they can run 3,000 of them at once."
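The arithmetic behind that tradeoff is easy to sketch. The numbers below (3,000 jobs of 60 seconds each, with each job running twice as slowly on Lambda) are hypothetical, and the parallel estimate ignores the submission delays and throttling mentioned above.

```python
def serial_hours(n_jobs: int, job_s: float) -> float:
    """Wall-clock hours to run n_jobs one after another on one machine."""
    return n_jobs * job_s / 3600


def parallel_seconds(job_s: float, slowdown: float) -> float:
    """Wall-clock seconds if every job runs at once, each `slowdown` times slower.

    Ignores submission latency and throttling, which can stagger completions.
    """
    return job_s * slowdown


if __name__ == "__main__":
    # Hypothetical workload: 3,000 jobs of 60 s each, 2x per-job overhead.
    print(serial_hours(3000, 60))      # 50.0 hours on a laptop
    print(parallel_seconds(60, 2.0))   # 120.0 seconds on Lambda
```

Doubling each job's duration costs far less than running the jobs one at a time.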

Over time, Jonas envisions Lambda taking on even more complex "stateless" tasks, such as full MapReduce jobs, which split data analysis across many nodes (the "map" step) and then reassemble the partial outputs into a meaningful result (the "reduce" step). As such, it could simplify a lot of "big data"-style analysis, he suggested, noting a 2015 Yahoo study that found the average Hadoop data set was 15GB, something that could easily fit on a laptop.
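A word count is the classic illustration of those two steps. In this local sketch, a thread pool again stands in for many Lambda invocations running the map step, and the reduce step merges the partial counts on the client; the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk: str) -> Counter:
    """Map step: count words in one chunk (one stateless task per chunk)."""
    return Counter(chunk.split())


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial counts."""
    return a + b


def word_count(chunks: list[str]) -> Counter:
    # The map tasks are independent, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_count, chunks))
    # Combine all partial results into a single answer.
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count(["solar flare data", "flare data", "neuron data"]))
```

Running the reduce step on the client works for small partial results; pushing it onto Lambda as well is the harder part.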

Today, PyWren can handle the map portion of MapReduce jobs, but its reduce capability, needed for parameter tuning, is still experimental.

Jonas was less sure, however, that this microservices-style approach would work for the very large-scale scientific workloads run on today's supercomputers. High-performance computing (HPC) workloads, such as computational fluid dynamics, tend to use many more threads and larger data sets, and they rely on GPUs, which Lambda doesn't yet support.

Although PyWren was designed to work with Lambda specifically, the idea could be applied to other serverless platforms, such as Platform9's just-launched Fission. Jonas said that writing PyWren took him only about a weekend.

Source: InApps.net


Phu Nguyen
