This space has been made publicly available. No login is required to access this information. Only place information classified as UNRESTRICTED in this space.
AA2 processing scaling - it is not yet scaling. Current tests failed at the distributed-processing stage; further measures are being investigated.
I/O is an expected bottleneck in scaling.
The current tests are not detailed enough to determine where the bottlenecks are.
XRADIO is a Python-based project; the actual reading of zarr files uses C++ libraries. Reimplementing everything in C++ is probably not feasible, but reading WSClean output from C++ should be simple.
DMS at NRAO plans: continuation of XRADIO; starting the next round of prototyping ("pilot"), looking into different kinds of workflow orchestration (for example, combining Prefect with Dask). Have started writing domain (pure-science) functions that are not themselves parallel: gridflag and fringefit based on XRADIO.
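The shape of a framework-agnostic "pure-science" function can be sketched as below. This is a minimal illustration, not XRADIO code: the function name, signature, and the simple delay-phase model are all assumptions made for the example.

```python
import math

def fringe_delay_phase(delay_s, freqs_hz, ref_freq_hz):
    """Hypothetical pure-science function: phase (radians) induced by a
    residual delay at each frequency, relative to a reference frequency.
    It carries no parallel framework; orchestration is layered on top."""
    return [2.0 * math.pi * delay_s * (f - ref_freq_hz) for f in freqs_hz]
```

Because the function is pure (no I/O, no shared state), it can be wrapped unchanged by Prefect tasks, Dask delayed calls, or any other orchestration layer being prototyped.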
ARDG - the algorithm architecture is being tested for scaling. Testing on 100+ GPUs across the US; the architecture was deployed there to process 2 TB of VLA wide-band data. It scaled quite well, as expected. There were some data-distribution issues not related to the algorithm architecture. Another run is planned in a few weeks; throughput is expected to improve by a factor of 2. The focus is on throughput metrics, not FLOPS.
Current throughput is about 1 TB/hour; expect to reach 2 TB/hour. Using HTCondor.
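As a back-of-the-envelope check on these figures (the 2 TB dataset size and both throughput numbers come from the notes above; the helper function is just for illustration):

```python
def wall_time_hours(data_tb, throughput_tb_per_hour):
    """Wall-clock hours to push a dataset through at a given throughput."""
    return data_tb / throughput_tb_per_hour

print(wall_time_hours(2.0, 1.0))  # current run: 2 TB at ~1 TB/hour
print(wall_time_hours(2.0, 2.0))  # after the expected factor-of-2 improvement
```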
Distribute along an axis that the data is stored in - time or frequency; exploring other axes.
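The distribution pattern above can be sketched with the standard library; this is a toy illustration only (the chunk worker and the thread-pool executor are stand-ins, not the HTCondor deployment described above):

```python
from concurrent.futures import ThreadPoolExecutor

def partition(axis_values, n_chunks):
    """Split a stored axis (e.g. channel frequencies) into contiguous chunks."""
    size = -(-len(axis_values) // n_chunks)  # ceiling division
    return [axis_values[i:i + size] for i in range(0, len(axis_values), size)]

def process_chunk(chunk):
    """Hypothetical per-chunk worker standing in for real per-partition science."""
    return sum(chunk)

freqs = list(range(16))  # stand-in for a 16-channel frequency axis
chunks = partition(freqs, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_chunk, chunks))
print(results)  # one partial result per frequency chunk
```

Each chunk is processed independently; in the real pipeline the per-chunk results would then be combined, as with the imaging step noted below.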
Imaging is done separately and then brought together.
Does not yet include calibration. Next step is self-calibration (direction-independent and pointing). Working on deploying that and measuring its scaling.
Is the implementation sufficiently decoupled from the parallelization framework? So far, yes. Tested on HTCondor, AWS, and ...?
Should build up a list of tools/software using XRADIO
For the immediate future - are we progressing sufficiently through the prototyping phase?
Is the amount of effort right?
DMS (Jeff) is happy with the way things are going
As we go forward, we will need more formalism around milestones and deliverables.
Eventually we must transition out of exploratory mode and set some deadlines.
Should have a review of the schemas and demonstrate that, within a given schema, we can deliver on the goals for scalability and performance. An exercise on these activities is starting on the SKA side, but we need a milestone for the schemas. This should be the basis for v1.0 of the schema.
SKA - AA2 pipeline scaling tests by the end of the year; this is likely the next milestone. NRAO - also doing prototyping tests around the same time: determining what scaling looks like, exploring different architectures using the data schema, and identifying bottlenecks.
Timeline
The schema is largely complete and awaiting feedback from testing; that feedback will need to be incorporated.
Sept 2024 - schema documentation and prototyping documentation complete.
Review by end of the year.
Action
Nick/Jeff - determine who and what are being reviewed (Jeff visiting in March).
Contributing institutes should be involved in the review process so that they effectively agree they are willing to use the schema (does this become an IAU standard?).
Compile a list of organisations interested in contributing to or participating in the review; Jan-Willem has a starting list of people involved.
April - Management steering committee meeting.
Revisit potential for non-CALIM
Post-review, perhaps hold a smaller meeting to discuss what has been done with the testing.
Describe what kind of more algorithm-focussed meeting we would want - gather an SOC for this
Meetings moving forward - another F2F, a CALIM reboot with a focus on algorithms, leveraging existing conferences?