Wednesday, June 17, 2026

Optical metasurfaces for general vision processing on the edge

  • Shanahan, M., McDonell, K. & Reynolds, L. Role play with large language models. Nature 623, 493–498 (2023).

  • Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

  • Shastri, B. J. et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photon. 15, 102–114 (2021).

  • Bernstein, L. et al. Single-shot optical neural network. Sci. Adv. 9, eadg7904 (2023).

  • Zheng, H. et al. Multichannel meta-imagers for accelerating machine vision. Nat. Nanotechnol. 19, 471–478 (2024).

  • Zheng, H. et al. Meta-optic accelerators for object classifiers. Sci. Adv. 8, eabo6410 (2022).

  • Luo, M. et al. Meta-optics based parallel convolutional processing for neural network accelerator. Laser Photonics Rev. 18, 2300984 (2024).

  • Liu, C. et al. A programmable diffractive deep neural network based on a digital-coding metasurface array. Nat. Electron. 5, 113–122 (2022).

  • Shen, Y. et al. Deep learning with coherent nanophotonic circuits. Nat. Photon. 11, 441–446 (2017).

  • Ashtiani, F., Geers, A. J. & Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature 606, 501–506 (2022).

  • Feldmann, J. et al. Parallel convolutional processing using an integrated photonic tensor core. Nature 589, 52–58 (2021).

  • Lin, X. et al. All-optical machine learning using diffractive deep neural networks. Science 361, 1004–1008 (2018).

  • Zhou, T. et al. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photon. 15, 367–373 (2021).

  • Antonik, P., Marsal, N., Brunner, D. & Rontani, D. Human action recognition with a large-scale brain-inspired photonic computer. Nat. Mach. Intell. 1, 530–537 (2019).

  • Wang, T. et al. Image sensing with multilayer nonlinear optical neural networks. Nat. Photon. 17, 408–415 (2023).

  • Xia, F. et al. Nonlinear optical encoding enabled by recurrent linear scattering. Nat. Photon. 18, 1067–1075 (2024).

  • Luo, X. et al. Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible. Light Sci. Appl. 11, 158 (2022).

  • Huang, C. et al. A silicon photonic–electronic neural network for fibre nonlinearity compensation. Nat. Electron. 4, 837–844 (2021).

  • Fu, T. et al. Photonic machine learning with on-chip diffractive optics. Nat. Commun. 14, 70 (2023).

  • Dong, B. et al. Partial coherence enhances parallelized photonic computing. Nature 632, 55–62 (2024).

  • Xu, Z. et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science 384, 202–209 (2024).

  • McMahon, P. L. The physics of optical computing. Nat. Rev. Phys. 5, 717–734 (2023).

  • Yildirim, M., Dinc, N. U., Oguz, I., Psaltis, D. & Moser, C. Nonlinear processing with linear optics. Nat. Photon. 18, 1076–1082 (2024).

  • Goi, E. et al. Nanoprinted high-neuron-density optical linear perceptrons performing near-infrared inference on a CMOS chip. Light Sci. Appl. 10, 40 (2021).

  • Chen, Y. et al. All-analog photoelectronic chip for high-speed vision tasks. Nature 623, 48–57 (2023).

  • Wetzstein, G. et al. Inference in artificial intelligence with deep optics and photonics. Nature 588, 39–47 (2020).

  • Feng, H. et al. Integrated lithium niobate microwave photonic processing engine. Nature 627, 80–87 (2024).

  • Xu, X. et al. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature 589, 44–51 (2021).

  • Liu, Z. et al. Swin Transformer: hierarchical vision transformer using shifted windows. In Proc. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 10012–10022 (IEEE, 2021).

  • Cui, K. et al. Spectral convolutional neural network chip for in-sensor edge computing of incoherent natural light. Nat. Commun. 16, 81 (2025).

  • Wei, K. et al. Spatially varying nanophotonic neural networks. Sci. Adv. 10, eadp0391 (2024).

  • Qu, G. et al. All-dielectric metasurface empowered optical-electronic hybrid neural networks. Laser Photonics Rev. 16, 2100732 (2022).

  • Rahimi, A. & Recht, B. Random features for large-scale kernel machines. In Proc. 21st International Conference on Neural Information Processing Systems (NIPS’07) 1177–1184 (Curran Associates, 2007).

  • Choromanski, K. M. et al. Rethinking attention with performers. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

  • Zhang, Y. et al. Image super-resolution using very deep residual channel attention networks. In Proc. European Conference on Computer Vision (ECCV) 286–301 (CVF, 2018).

  • Wang, Q. et al. ECA-net: efficient channel attention for deep convolutional neural networks. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11534–11542 (CVF, 2020).

  • Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (NIPS’17) 6000–6010 (Curran Associates, 2017).

  • Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

  • Cordts, M. et al. The Cityscapes dataset for semantic urban scene understanding. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3213–3223 (CVF, 2016).

  • Perazzi, F. et al. A benchmark dataset and evaluation methodology for video object segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 724–732 (CVF, 2016).

  • Jocher, G. Ultralytics YOLOv5. https://github.com/ultralytics/yolov5 (2020).

  • Zhu, X. et al. Deformable DETR: deformable transformers for end-to-end object detection. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

  • Cheng, B., Misra, I., Schwing, A. G., Kirillov, A. & Girdhar, R. Masked-attention Mask Transformer for universal image segmentation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1290–1299 (CVF, 2022).

  • Pan, H., Hong, Y., Sun, W. & Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes. IEEE Trans. Intell. Transp. Syst. 24, 3448–3460 (2022).

  • Xie, E. et al. SegFormer: simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 34, 12077–12090 (2021).

  • Ranftl, R., Bochkovskiy, A. & Koltun, V. Vision transformers for dense prediction. In Proc. IEEE/CVF International Conference on Computer Vision (ICCV) 12179–12188 (CVF, 2021).

  • Bhat, S. F., Alhashim, I. & Wonka, P. AdaBins: depth estimation using adaptive bins. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 4009–4018 (CVF, 2021).

  • Yang, L. et al. Depth anything: unleashing the power of large-scale unlabeled data. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10371–10381 (CVF, 2024).

  • Ranftl, R., Lasinger, K., Hafner, D., Schindler, K. & Koltun, V. Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1623–1637 (2020).

  • Zitova, B. & Flusser, J. Image registration methods: a survey. Image Vis. Comput. 21, 977–1000 (2003).

  • Bergevin, R., Soucy, M., Gagnon, H. & Laurendeau, D. Towards a general multi-view registration technique. IEEE Trans. Pattern Anal. Mach. Intell. 18, 540–547 (1996).

  • Ravi, N. et al. SAM 2: segment anything in images and videos. In Proc. International Conference on Learning Representations (ICLR 2025) (ICLR, 2025).

  • LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

  • Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at https://arxiv.org/abs/1708.07747 (2017).

  • Schüldt, C., Laptev, I. & Caputo, B. Recognizing human actions: a local SVM approach. In Proc. 17th International Conference on Pattern Recognition (ICPR 2004) Vol. 3, 32–36 (IEEE, 2004).

  • Zheng, Z., Wei, Y. & Yang, Y. University-1652: a multi-view multi-source benchmark for drone-based geo-localization. In Proc. 28th ACM International Conference on Multimedia 1395–1403 (ACM, 2020).

  • Berman, M., Triki, A. R. & Blaschko, M. B. The Lovász-softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4413–4421 (CVF, 2018).

  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: inverted residuals and linear bottlenecks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4510–4520 (CVF, 2018).

  • Han, K. et al. GhostNet: more features from cheap operations. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1580–1589 (CVF, 2020).

  • Han, K. et al. Model Rubik’s cube: twisting resolution, depth and width for tinynets. Adv. Neural Inf. Process. Syst. 33, 19353–19364 (2020).

  • Tan, M. & Le, Q. EfficientNet: rethinking model scaling for convolutional neural networks. In Proc. 36th International Conference on Machine Learning 6105–6114 (PMLR, 2019).

  • Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2016).

  • He, K., Gkioxari, G., Dollár, P. & Girshick, R. B. Mask R-CNN. In Proc. IEEE International Conference on Computer Vision (ICCV) 2961–2969 (CVF, 2017).

  • Lin, T.-Y., Goyal, P., Girshick, R. B., He, K. & Dollár, P. Focal loss for dense object detection. In Proc. IEEE International Conference on Computer Vision (ICCV) 2980–2988 (CVF, 2017).

  • Tan, M., Pang, R. & Le, Q. V. EfficientDet: scalable and efficient object detection. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10781–10790 (2020).

  • Liu, S. et al. Grounding DINO: marrying DINO with grounded pre-training for open-set object detection. In Proc. European Conference on Computer Vision (ECCV 2024) 38–55 (Springer, 2025).

  • Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In Proc. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015) 234–241 (Springer, 2015).

  • Zhao, H., Shi, J., Qi, X., Wang, X. & Jia, J. Pyramid scene parsing network. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2881–2890 (CVF, 2017).

  • Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proc. European Conference on Computer Vision (ECCV) 801–818 (CVF, 2018).

  • Eigen, D., Puhrsch, C. & Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Proc. 28th International Conference on Neural Information Processing Systems (NIPS’14) 2366–2374 (MIT Press, 2014).

  • Wofk, D., Ma, F., Yang, T.-J., Karaman, S. & Sze, V. FastDepth: fast monocular depth estimation on embedded systems. In Proc. 2019 International Conference on Robotics and Automation (ICRA) 6101–6108 (IEEE, 2019).

  • Hazirbas, C., Ma, L., Domokos, C. & Cremers, D. FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture. In Proc. Asian Conference on Computer Vision (ACCV 2016) 213–228 (Springer, 2017).

  • Peng, J. Code for optical metasurfaces for general vision processing on the edge. Zenodo https://doi.org/10.5281/zenodo.19382032 (2026).
