Publications
Publications by category, in reverse chronological order.
2026
- Preprint
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention
arXiv preprint arXiv:2605.13473, 2026
Linear attention and state-space models offer constant-memory alternatives to softmax attention, but often struggle with in-context associative recall. The Delta Rule mitigates this by writing each token via one step of online gradient descent. However, its step size relies on a single scalar gate that ignores the feature-wise curvature of the inner objective. We propose Online Scaled DeltaNet (OSDN), which augments the scalar gate with a diagonal preconditioner updated online via hypergradient feedback. Crucially, this right-preconditioning is algebraically equivalent to a per-feature scaling of the write-side key. This equivalence allows OSDN to strictly preserve the hardware-friendly chunkwise parallel pipeline of DeltaNet without incurring high-dimensional state overhead. Theoretically, by exploiting the exact-quadratic structure of the inner regression loss, we establish super-geometric convergence against a right-Newton comparator and prove an algorithm-aligned token-local residual contraction bound. To handle non-stationary contexts, we further introduce Adaptive Preconditioner Forgetting (APF) to dynamically refresh stale calibration. Empirically, OSDN demonstrates strong performance across scales. At the 340M-parameter scale, OSDN improves JRT-style in-context recall by 32% over DeltaNet. Scaling to 1.3B parameters, it achieves a 39% reduction in the recall residual ratio while maintaining parity on general downstream tasks, demonstrating that our online-preconditioning mechanism effectively transfers and amplifies at the billion-parameter scale.
Formatted citation: Zhou, C., Li, H., Liu, Y., Lin, J., Ge, D., and Ye, Y. OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention. arXiv preprint arXiv:2605.13473, 2026.
BibTeX:
@misc{zhou2026osdnimprovingdeltarule,
  title={OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention},
  author={Chenyu Zhou and Hongpei Li and Yuerou Liu and Jianghao Lin and Dongdong Ge and Yinyu Ye},
  year={2026},
  eprint={2605.13473},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.13473}
}
- ICML 2026
MemDecoder: Enhancing Test-Time Compute for LLM Agents via Reinforced Memory Decoding
Haoran Yin, Chenyu Zhou, Wei Zhu, and Yuhua Jin
In The Forty-Third International Conference on Machine Learning, 2026
Agentic memory - conditioning large language and vision-language models on past cases, external knowledge, or meta-experiences - has become a key mechanism for improving inference-time reasoning. However, existing approaches largely rely on heuristic retrieval or expensive LLM-based reranking, and do not explicitly learn how to compose memory for a given query. To address these limitations, we propose MemDecoder, a learned framework for adaptive agentic memory selection. MemDecoder formulates memory composition as an autoregressive index decoding problem over a retrieved candidate set, using a lightweight Transformer encoder-decoder to generate an ordered sequence of memory elements. This design enables efficient, task-aware few-shot reasoning without generating textual demonstrations. MemDecoder can be trained via supervised fine-tuning and reinforcement learning with verifiable rewards. We further introduce a ranking-aware variant of Group Relative Policy Optimization that exploits pairwise comparisons within response groups to provide richer learning signals. Experiments across visual question answering, mathematical reasoning, and scientific question answering benchmarks show that MemDecoder consistently outperforms prior agentic memory selection methods, demonstrating the benefits of the architectural design and learning algorithm of MemDecoder.
Formatted citation: Yin, H., Zhou, C., Zhu, W., and Jin, Y. MemDecoder: Enhancing Test-Time Compute for LLM Agents via Reinforced Memory Decoding. In The Forty-Third International Conference on Machine Learning (ICML 2026), 2026.
BibTeX:
@inproceedings{yin2026memdecoder,
  title={MemDecoder: Enhancing Test-Time Compute for LLM Agents via Reinforced Memory Decoding},
  author={Yin, Haoran and Zhou, Chenyu and Zhu, Wei and Jin, Yuhua},
  booktitle={The Forty-Third International Conference on Machine Learning},
  year={2026},
  url={https://icml.cc/virtual/2026/poster/65523}
}
- Preprint
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
Submitted to Operations Research, 2026
Optimization modeling underpins real-world decision-making in logistics, manufacturing, energy, and public services, but reliably solving such problems from natural-language requirements remains challenging for current large language models (LLMs). In this paper, we propose Agora-Opt, a modular agentic framework for optimization modeling that combines decentralized debate with a read-write memory bank. Agora-Opt allows multiple agent teams to independently produce end-to-end solutions and reconcile them through an outcome-grounded debate protocol, while memory stores solver-verified artifacts and past disagreement resolutions to support training-free improvement over time. This design is flexible across both backbones and methods: it reduces base-model lock-in, transfers across different LLM families, and can be layered onto existing pipelines with minimal coupling. Across public benchmarks, Agora-Opt achieves the strongest overall performance among all compared methods, outperforming strong zero-shot LLMs, training-centric approaches, and prior agentic baselines. Further analyses show robust gains across backbone choices and component variants, and demonstrate that decentralized debate offers a structural advantage over centralized selection by enabling agents to refine candidate solutions through interaction and even recover correct formulations when all initial candidates are flawed. These results suggest that reliable optimization modeling benefits from combining collaborative cross-checking with reusable experience, and position Agora-Opt as a practical and extensible foundation for trustworthy optimization modeling assistance.
Formatted citation: Lin, J., Ling, Z., Zhou, C., Xu, T., Jiang, R., Wang, Z., and Ge, D. From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling. arXiv preprint arXiv:2604.25847, 2026.
BibTeX:
@misc{lin2026soliloquyagoramemoryenhancedllm,
  title={From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling},
  author={Jianghao Lin and Zi Ling and Chenyu Zhou and Tianyi Xu and Ruoqing Jiang and Zizhuo Wang and Dongdong Ge},
  year={2026},
  eprint={2604.25847},
  archivePrefix={arXiv},
  primaryClass={math.OC},
  url={https://arxiv.org/abs/2604.25847}
}
- Preprint
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
Chenyu Zhou, Huacan Chai, Wenteng Chen, Zihan Guo, Rong Shan, Yuanyi Song, Tianyi Xu, Yingxuan Yang, Aofan Yu, Weiming Zhang, Congming Zheng, Jiachen Zhu, Zeyu Zheng, Zhuosheng Zhang, Xingyu Lou, Changwang Zhang, Zhihui Fu, Jun Wang, Weiwen Liu, Jianghao Lin, and Weinan Zhang
arXiv preprint arXiv:2604.08224, 2026
LLM agent development increasingly relies on reorganizing runtime infrastructure rather than modifying model weights. We present a framework analyzing four externalization dimensions: memory stores persist state over time, skills encapsulate procedural knowledge, protocols structure interactions, and harness engineering coordinates these elements into reliable execution. We trace progress from weights to context to harness design, examining trade-offs between parametric and externalized capabilities while discussing evaluation challenges and the co-evolution of models with external infrastructure.
Formatted citation: Zhou, C., Chai, H., Chen, W., Guo, Z., Shan, R., Song, Y., Xu, T., Yang, Y., Yu, A., Zhang, W., Zheng, C., Zhu, J., Zheng, Z., Zhang, Z., Lou, X., Zhang, C., Fu, Z., Wang, J., Liu, W., Lin, J., and Zhang, W. Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering. arXiv preprint arXiv:2604.08224, 2026.
BibTeX:
@misc{zhou2026externalizationllmagentsunified,
  title={Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering},
  author={Chenyu Zhou and Huacan Chai and Wenteng Chen and Zihan Guo and Rong Shan and Yuanyi Song and Tianyi Xu and Yingxuan Yang and Aofan Yu and Weiming Zhang and Congming Zheng and Jiachen Zhu and Zeyu Zheng and Zhuosheng Zhang and Xingyu Lou and Changwang Zhang and Zhihui Fu and Jun Wang and Weiwen Liu and Jianghao Lin and Weinan Zhang},
  year={2026},
  eprint={2604.08224},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2604.08224}
}
- ICLR 2026
StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
Chenyu Zhou, Tianyi Xu, Jianghao Lin, and Dongdong Ge
The Fourteenth International Conference on Learning Representations, 2026
Large Language Models (LLMs) have shown promising capabilities for solving Operations Research (OR) problems. While reinforcement learning serves as a powerful paradigm for LLM training on OR problems, existing works generally face two key limitations. First, outcome reward suffers from the credit assignment problem, where correct final answers can reinforce flawed reasoning. Second, conventional discriminative process supervision is myopic, failing to evaluate the interdependent steps of OR modeling holistically. To this end, we introduce StepORLM, a novel self-evolving framework with generative process supervision. At its core, StepORLM features a co-evolutionary loop where a policy model and a generative process reward model (GenPRM) iteratively improve on each other. This loop is driven by a dual-feedback mechanism: definitive, outcome-based verification from an external solver, and nuanced, holistic process evaluation from the GenPRM. The combined signal is used to align the policy via Weighted Direct Preference Optimization (W-DPO) and simultaneously refine the GenPRM. Our resulting 8B-parameter StepORLM establishes a new state-of-the-art across six benchmarks, significantly outperforming vastly larger generalist models, agentic methods, and specialized baselines. Moreover, the co-evolved GenPRM is able to act as a powerful and universally applicable process verifier, substantially boosting the inference scaling performance of both our own model and other existing LLMs.
Formatted citation: Zhou, C., Xu, T., Lin, J., and Ge, D. StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models. In The Fourteenth International Conference on Learning Representations, 2026.
BibTeX:
@inproceedings{zhou2026steporlm,
  title={Step{ORLM}: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models},
  author={Chenyu Zhou and Tianyi Xu and Jianghao Lin and Dongdong Ge},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=ZrgxU8WMmG}
}
2025
- Preprint
Auto-Formulating Dynamic Programming Problems with Large Language Models
Chenyu Zhou, Jingyuan Yang, Linwei Xin, Yitian Chen, Ziyan He, and Dongdong Ge
Major Revision at Manufacturing & Service Operations Management, 2025
Dynamic programming (DP) is a fundamental method in operations research, but formulating DP models has traditionally required expert knowledge of both the problem context and DP techniques. Large Language Models (LLMs) offer the potential to automate this process. However, DP problems pose unique challenges due to their inherently stochastic transitions and the limited availability of training data. These factors make it difficult to directly apply existing LLM-based models or frameworks developed for other optimization problems, such as linear or integer programming. We introduce DP-Bench, the first benchmark covering a wide range of textbook-level DP problems to enable systematic evaluation. We present Dynamic Programming Language Model (DPLM), a 7B-parameter specialized model that achieves performance comparable to state-of-the-art LLMs like OpenAI’s o1 and DeepSeek-R1, and surpasses them on hard problems. Central to DPLM’s effectiveness is DualReflect, our novel synthetic data generation pipeline, designed to scale up training data from a limited set of initial examples. DualReflect combines forward generation for diversity and backward generation for reliability. Our results reveal a key insight: backward generation is favored in low-data regimes for its strong correctness guarantees, while forward generation, though lacking such guarantees, becomes increasingly valuable at scale for introducing diverse formulations. This trade-off highlights the complementary strengths of both approaches and the importance of combining them.
Formatted citation: Zhou, C., Yang, J., Xin, L., Chen, Y., He, Z., and Ge, D. Auto-Formulating Dynamic Programming Problems with Large Language Models. arXiv preprint arXiv:2507.11737, 2026.
BibTeX:
@misc{zhou2026autoformulatingdynamicprogrammingproblems,
  title={Auto-Formulating Dynamic Programming Problems with Large Language Models},
  author={Chenyu Zhou and Jingyuan Yang and Linwei Xin and Yitian Chen and Ziyan He and Dongdong Ge},
  year={2026},
  eprint={2507.11737},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.11737}
}
- WebSci 2025
Decentralization Funded by Centralization: Understanding Crypto Joint-Investment Network
Junyu Zhang, Chenyu Zhou, and Wei Cai
In The 17th ACM Web Science Conference, 2025
In the crypto industry, venture capital investors provide funding to drive the domain towards its decentralized vision, while many individual investors follow online investment news to aid their decision-making. By scraping investment event data from crypto data analytics websites, we construct and analyze the joint investment network of venture capital investors in the crypto industry. Our study reveals the centralized nature of the investor network in this supposedly decentralized domain, by identifying central nodes such as Coinbase Venture and disclosing their persistence of dominance, despite Coinbase’s relatively small market share as an exchange. Based on node features, we divide the network into different communities, hinting at investor clustering. To measure the robustness of this network, we simulate various attack strategies to model bankruptcy and risk propagation, confirming its vulnerability as seen in historical events. Additionally, using graph neural network approaches, we fill in unknown structural information of investors, mitigating information asymmetry in investment disclosures and achieving classification accuracy above 70% for tier ratings and over 65% for investor types. Results are validated using data from another platform, exhibiting consistency. This study sheds light on the dynamics of joint investment networks in this emerging tech domain, offering insights for both potential and existing investors.
Formatted citation: Zhang, J., Zhou, C., and Cai, W. Decentralization Funded by Centralization: Understanding Crypto Joint-Investment Network. In Proceedings of the 17th ACM Web Science Conference 2025 (WebSci '25), pages 86--95, 2025. DOI: 10.1145/3717867.3717897.
BibTeX:
@inproceedings{zhang2025decentralization,
  title={Decentralization Funded by Centralization: Understanding Crypto Joint-Investment Network},
  author={Zhang, Junyu and Zhou, Chenyu and Cai, Wei},
  booktitle={Proceedings of the 17th ACM Web Science Conference 2025},
  series={WebSci '25},
  pages={86--95},
  year={2025},
  publisher={ACM},
  doi={10.1145/3717867.3717897},
  url={https://doi.org/10.1145/3717867.3717897}
}
2024
- CSCW 2024
Decentralized Web3 Non-Fungible Token Community for Societal Prosperity? A Social Capital Perspective
In The 27th ACM SIGCHI Conference on Computer-Supported Cooperative Work & Social Computing, 2024
In the rapidly evolving Web3 world, non-fungible token (NFT) communities are reshaping the formation, distribution, and activation of social capital in ways distinct from traditional models. However, despite their growing impact on societal prosperity, a comprehensive understanding of social capital dynamics within Web3 NFT communities remains limited. This study explores the Mfers community, a key example within Web3 NFT ecosystems. By analyzing social media and blockchain data and using a Delphi method-based human-large language model (LLM) collaboration, we uncovered unique social capital patterns across six dimensions. Our findings highlight a compelling blend of decentralization, inclusion, trust, and empowerment but also raise critical questions about wealth inequality, content quality, and ethical challenges. Based on the findings, we discussed the uniqueness of social capital in Web3 NFT communities, the tension between technical and power decentralization, and the multidimensional nature of societal prosperity. We also suggested directions for future research on decentralized online communities in the CSCW field. This study provides a systematic perspective on social capital in Web3 NFT communities and introduces an innovative human-LLM collaborative analysis, offering insights into the design and governance of benign decentralized online communities.
Formatted citation: Chen, H., Zhou, C., El Saddik, A., and Cai, W. Decentralized Web3 Non-Fungible Token Community for Societal Prosperity? A Social Capital Perspective. Proceedings of the ACM on Human-Computer Interaction, 9(2):1--36, 2025. DOI: 10.1145/3710956.
BibTeX:
@article{chen2025decentralized,
  title={Decentralized Web3 Non-Fungible Token Community for Societal Prosperity? A Social Capital Perspective},
  author={Chen, Hongzhou and Zhou, Chenyu and El Saddik, Abdulmotaleb and Cai, Wei},
  journal={Proceedings of the ACM on Human-Computer Interaction},
  volume={9},
  number={2},
  pages={1--36},
  year={2025},
  publisher={Association for Computing Machinery (ACM)},
  doi={10.1145/3710956},
  url={https://doi.org/10.1145/3710956}
}
- WWW 2024
ARTEMIS: Detecting Airdrop Hunters in NFT Markets with a Graph Learning System
Chenyu Zhou, Hongzhou Chen, Hao Wu, Junyu Zhang, and Wei Cai
In The 2024 ACM Web Conference (Oral), 2024
As Web3 projects leverage airdrops to incentivize participation, airdrop hunters tactically amass wallet addresses to capitalize on token giveaways. This poses challenges to the decentralization goal. Current detection approaches tailored for cryptocurrencies overlook non-fungible tokens (NFTs) nuances. We introduce ARTEMIS, an optimized graph neural network system for identifying airdrop hunters in NFT transactions. ARTEMIS captures NFT airdrop hunters through: (1) a multimodal module extracting visual and textual insights from NFT metadata using Transformer models; (2) a tailored node aggregation function chaining NFT transaction sequences, retaining behavioral insights; (3) engineered features based on market manipulation theories detecting anomalous trading. Evaluated on decentralized exchange Blur’s data, ARTEMIS significantly outperforms baselines in pinpointing hunters. This pioneering computational solution for an emergent Web3 phenomenon has broad applicability for blockchain anomaly detection. The data and code for the paper are accessible at the following link: doi.org/10.5281/zenodo.10676801.
Formatted citation: Zhou, C., Chen, H., Wu, H., Zhang, J., and Cai, W. ARTEMIS: Detecting Airdrop Hunters in NFT Markets with a Graph Learning System. In Proceedings of the ACM Web Conference 2024 (WWW '24), pages 1824--1834, 2024. DOI: 10.1145/3589334.3645597.
BibTeX:
@inproceedings{zhou2024artemis,
  title={ARTEMIS: Detecting Airdrop Hunters in NFT Markets with a Graph Learning System},
  author={Zhou, Chenyu and Chen, Hongzhou and Wu, Hao and Zhang, Junyu and Cai, Wei},
  booktitle={Proceedings of the ACM Web Conference 2024},
  series={WWW '24},
  pages={1824--1834},
  year={2024},
  publisher={ACM},
  doi={10.1145/3589334.3645597},
  url={https://doi.org/10.1145/3589334.3645597}
}
2023
- IEEE Wirel Commun
Harnessing Web3 on Carbon Offset Market for Sustainability: Framework and A Case Study
IEEE Wireless Communications, 2023
Blockchain, pivotal in shaping the metaverse and Web3, often draws criticism for high energy consumption and carbon emission. The rise of sustainability-focused blockchains, especially when intersecting with innovative wireless technologies, revises this predicament. To understand blockchain’s role in sustainability, we propose a three-layers structure encapsulating four green utilities: Recording and Tracking, Wide Verification, Value Trading, and Concept Disseminating. Nori, a decentralized voluntary carbon offset project, serves as our case, illuminating these utilities. Our research unveils unique insights into the on-chain carbon market participants, affect factors of the market, value propositions of NFT-based carbon credits, and the role of social media to spread the concept of carbon offset. We argue that blockchain’s contribution to sustainability is significant, with carbon offsetting potentially evolving as a new standard within the blockchain sector.
Formatted citation: Zhou, C., Chen, H., Wang, S., Sun, X., El Saddik, A., and Cai, W. Harnessing Web3 on Carbon Offset Market for Sustainability: Framework and A Case Study. IEEE Wireless Communications, 30(5):104--111, 2023. DOI: 10.1109/MWC.010.2300038.
BibTeX:
@article{zhou2023harnessing,
  title={Harnessing Web3 on Carbon Offset Market for Sustainability: Framework and A Case Study},
  author={Zhou, Chenyu and Chen, Hongzhou and Wang, Shiman and Sun, Xinyao and El Saddik, Abdulmotaleb and Cai, Wei},
  journal={IEEE Wireless Communications},
  volume={30},
  number={5},
  pages={104--111},
  year={2023},
  publisher={Institute of Electrical and Electronics Engineers (IEEE)},
  doi={10.1109/MWC.010.2300038},
  url={https://doi.org/10.1109/MWC.010.2300038}
}