Large foundation models (FMs) are transforming Earth science by integrating heterogeneous multimodal data, such as multi-platform imagery, gridded reanalysis data, diverse geophysical and geochemical observations, and domain-specific text, to support tasks ranging from basic perception to advanced scientific discovery. This paper provides a unified review of Earth science foundation models (Earth FMs) through two complementary dimensions: depth, which traces the evolution of model capabilities from perception to multimodal reasoning and agentic scientific workflows, and breadth, which summarizes their expanding applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, and cryosphere, as well as coupled Earth system processes. Using this framework, we review representative multimodal Earth foundation models and compile more than 200 datasets and benchmarks spanning diverse Earth science tasks and modalities. We further discuss key challenges in multimodal data heterogeneity, scientific reliability and continual updating, scalability and sustainability, and the transition from foundation models to agentic and embodied Earth intelligence, and outline future directions toward more integrated, trustworthy, and actionable AI Earth scientists. Overall, this paper offers a structured roadmap for understanding the development of Earth foundation models from both capability depth and application breadth.
The origin of the large-scale poloidal magnetic field required to power relativistic jets in collapsars remains uncertain. While such a field may be inherited during proto-neutron star (PNS) collapse, the efficiency of this process is unclear, motivating an in situ mechanism to generate poloidal fields out of the predominantly toroidal fields produced by stellar differential rotation. We present the first 3D general-relativistic magnetohydrodynamic collapsar simulations initialized with toroidal magnetic field profiles that closely follow those of pre-collapse stellar models. As the toroidal field in the disk becomes dynamically important, it seeds the dynamo, producing coherent poloidal magnetic loops that appear at $\sim \mathcal{O}(100)$ gravitational radii and are then advected inward along paths that may deviate from the disk midplane. The resulting poloidal fields thread the black hole (BH) and launch highly variable, wobbling relativistic jets on timescales of order seconds, with the onset depending on the initial magnetic field and the plasma circularization radius. Although the jets are highly variable and misaligned with the BH spin axis, they sustain powers of $\gtrsim 10^{50}$ erg s$^{-1}$, comparable to those inferred for long gamma-ray bursts (LGRBs). We identify magnetic-flux inversions driven by the stochastic dynamo, leading to the formation of striped jets that could be imprinted in LGRB light curves. These results demonstrate that the accretion-disk dynamo provides a robust pathway for jet production in collapsars across a broad range of progenitors.
High-precision ground-based observations of the inner corona (1.05-2.0 $R_\odot$) are fundamentally constrained by instrumental stray light, particularly the additive background from dynamic dust accumulation on the objective lens. To address this issue, we propose a correction method for the Spectral Imaging Coronagraph (SICG) based on dual-path real-time monitoring and forward physical modeling. By simultaneously imaging the objective lens surface, we obtain deterministic prior information on dust distribution. We construct a physical point-spread function using optical defocus parameters and reconstruct the nonuniform scattering background via convolution. Model parameters are retrieved through data-driven inversion constrained by polar coronal holes. The method demonstrates excellent robustness under varying contamination conditions. After correction, the rms noise in the polar background is reduced by approximately 67% on average, and the signal-to-background ratio improves by a factor of up to 3.7 under heavy contamination conditions. Comparisons with space-based Solar Dynamics Observatory/Atmospheric Imaging Assembly observations indicate that the corrected images recover the morphological structures of streamers with high fidelity. Further radial intensity analysis reveals that the correction process successfully restores the hydrostatic exponential decay characteristic of inner coronal radiation. The fitted decay coefficient corresponds to a plasma temperature of approximately 2.0 MK, consistent with the characteristic formation temperature of the Fe XIV 530.3 nm line. These results demonstrate that the method effectively eliminates the dominant systematic bias in ground-based observations, providing a reliable data foundation for high-precision coronal thermodynamic and dynamic research with the SICG.
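The link between an exponential decay length and a plasma temperature can be illustrated with a rough sketch. It is not the fitting procedure of the abstract above; it assumes an isothermal, plane-parallel hydrostatic atmosphere with constant solar surface gravity (a coarse approximation over 1.05-2.0 $R_\odot$), a mean molecular weight of 0.6, and line intensity scaling as the square of the electron density. The function names and the `density_power` parameter are hypothetical.

```python
# Physical constants in SI units.
K_B = 1.380649e-23     # Boltzmann constant, J/K
M_P = 1.67262192e-27   # proton mass, kg
G_SUN = 274.0          # solar surface gravity, m/s^2 (assumed constant with height)
R_SUN = 6.957e8        # nominal solar radius, m


def scale_height(temperature_k, mu=0.6, g=G_SUN):
    """Isothermal hydrostatic density scale height H = k_B T / (mu m_p g), in metres."""
    return K_B * temperature_k / (mu * M_P * g)


def temperature_from_decay_length(decay_length_m, mu=0.6, g=G_SUN, density_power=2.0):
    """Temperature implied by an exponential intensity decay length.

    Assumes intensity is proportional to density**density_power, so the
    density scale height equals density_power times the intensity decay length.
    """
    density_scale_height = density_power * decay_length_m
    return mu * M_P * g * density_scale_height / K_B


if __name__ == "__main__":
    h = scale_height(2.0e6)
    print(f"Density scale height at 2.0 MK: {h / 1e6:.1f} Mm ({h / R_SUN:.3f} R_sun)")
    print(f"Implied intensity decay length: {h / 2.0 / 1e6:.1f} Mm")
```

Under these assumptions, a 2.0 MK plasma has a density scale height of roughly 100 Mm. A different excitation mechanism or a height-dependent gravity would change the mapping between the fitted coefficient and the temperature.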
Improving the accuracy of photometric redshifts (photo-$z$) is essential for reliable statistical studies of cosmology and galaxy evolution. However, missing photometric bands are a common observational challenge that can significantly degrade photo-$z$ estimation accuracy. In this work, we present a systematic evaluation of data imputation methods aimed at improving photo-$z$ performance. We benchmark a range of representative machine learning (ML) and deep learning (DL) architectures, identifying k-nearest neighbors (KNN) and the attention-based SAITS model as the leading performers. These models are then applied to China Space Station Survey Telescope (CSST) mock data to assess their performance under realistic observational conditions. Our results show that KNN yields the highest accuracy under idealized missing completely at random (MCAR) conditions with complete training sets, whereas robustness tests reveal that SAITS significantly outperforms KNN when training data is incomplete or when applied to realistic mixed-mechanism scenarios. We find that domain consistency between training and testing missingness patterns is a prerequisite for optimal performance, highlighting the risks of domain shift in supervised regression tasks. Furthermore, our analysis demonstrates that while general imputation models are highly effective for MCAR and missing at random (MAR) data, they are detrimental when applied to missing not at random (MNAR) data arising from flux limits, as statistical models fail to capture the physical information inherent in these non-detections. Consequently, we advocate for more sophisticated architectures capable of disentangling stochastic missingness from physical non-detections to address these distinct mechanisms individually.
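A minimal sketch of KNN imputation under MCAR missingness is given here for illustration. It is not the benchmark pipeline of the abstract above: the four-band mock catalog, the 20% missing fraction, and all function names are assumptions, and in practice a library implementation such as scikit-learn's `KNNImputer` would be the more usual choice.

```python
import math
import random


def mask_mcar(rows, missing_fraction, rng):
    """Copy rows, setting entries to None independently of their values (MCAR)."""
    return [[None if rng.random() < missing_fraction else v for v in row]
            for row in rows]


def _distance(a, b):
    """RMS difference over bands observed in both rows; inf if they share none."""
    sq = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(sq) / len(sq)) if sq else math.inf


def knn_impute(rows, k=5):
    """Fill each missing band with the mean of that band over the k nearest donors."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which band j was actually observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if math.isfinite(d)]
            if nearest:  # stays None if no donor shares an observed band
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical mock catalog: four correlated magnitudes per galaxy.
rng = random.Random(0)
truth = []
for _ in range(200):
    base = rng.uniform(20.0, 24.0)   # overall brightness
    color = rng.gauss(0.5, 0.2)      # band-to-band slope
    truth.append([base + b * color + rng.gauss(0.0, 0.05) for b in range(4)])

observed = mask_mcar(truth, 0.2, rng)
imputed = knn_impute(observed, k=5)

errors = [
    imputed[i][j] - truth[i][j]
    for i in range(len(truth))
    for j in range(4)
    if observed[i][j] is None and imputed[i][j] is not None
]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(f"Imputed {len(errors)} entries, RMSE = {rmse:.3f} mag")
```

Because the mask here is independent of the magnitudes, neighbours remain representative of the masked entries. A flux-limited (MNAR) mask would remove the faintest values preferentially, and this kind of neighbour averaging would then tend to fill non-detections with values that are too bright.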