Star Leap Program (星跃计划) | The MSR Asia-MSR Redmond Joint Research Program Reopens!

September 7, 2021 | Microsoft Research AI Headlines (微软研究院AI头条)

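The second round of applications for the Star Leap Program, jointly launched by Microsoft Research Asia and Microsoft Research Redmond, is now open. Recruitment is ongoing, so apply now!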

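The program gives outstanding students the opportunity to work with research teams from two of Microsoft's global research labs on real, cutting-edge problems. You will do impactful research in an international, diverse, and inclusive environment, under the guidance of top researchers!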

The first batch of cross-lab joint research projects covers natural language processing, data intelligence, computer systems and networking, and intelligent cloud. The projects are: High-performance Distributed Deep Learning, Intelligent Data Cleansing, Intelligent Power-Aware Virtual Machine Allocation, Learning Bandwidth Estimation for Real-Time Video, Neuro-Symbolic Semantic Parsing for Data Science, and Next-Gen Large Pretrained Language Models.

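When the first round of the Star Leap Program opened in January 2021, it drew enthusiastic applications and attention from students in China and abroad. The second round is now open, so what are you waiting for? Join the Star Leap Program and explore more possibilities in research with us, across the ocean!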

Program Highlights

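Conduct research under the joint guidance of top researchers from both Microsoft Research Asia and Microsoft Research Redmond, and engage in in-depth exchanges with researchers from diverse backgrounds
Focus on real, cutting-edge problems from industry, aiming for results that have impact on both academia and industry
Through offline and online collaboration, experience the international, open research atmosphere and the diverse, inclusive culture of Microsoft's two major research labs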

Eligibility

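Currently enrolled undergraduate, master's, or PhD students; students who have deferred enrollment or are taking a gap year
Able to work full-time in China for 6-12 months
See the project descriptions below for each project's specific requirements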


What are you waiting for? Find the project that suits you!

High-performance Distributed Deep Learning


Parallel and distributed systems are the key to addressing the ever-increasing complexity of deep learning training. However, existing solutions still leave efficiency and scalability on the table because they miss optimization opportunities across the varied environments found at industrial scale.

In this project, you will work with scientists at the forefront of systems and networking research, leveraging world-leading platforms to solve systems and networking problems in parallel and distributed deep learning. The current team members, from both MSR Asia and MSR Redmond, have extensive experience contributing to industry and the academic community, both by transferring innovations into production systems and by publishing at top conferences.

Research Areas

Systems and Networking, MSR Asia

https://www.microsoft.com/en-us/research/group/systems-and-networking-research-group-asia/

Research in Software Engineering, MSR Redmond

https://www.microsoft.com/en-us/research/group/research-software-engineering-rise/

Qualifications

Major in computer science, electrical engineering, or equivalent field
Solid knowledge of data structures and algorithms
Familiarity with Python, C/C++, and other programming languages; familiarity with Linux and development on the Linux platform
Good communication and presentation skills
Good English reading and writing skills; able to implement systems based on academic papers in English and to write documents in English

Preferred qualifications:

Familiarity with deep learning systems (e.g., PyTorch, TensorFlow), GPU programming, and networking
Familiarity with communication libraries such as NCCL and MPI implementations such as OpenMPI and MVAPICH
Rich knowledge of machine learning and machine learning models
Familiarity with software engineering processes is a strong plus
Active on GitHub; has used or contributed to well-known open-source projects

Intelligent Data Cleansing


Tabular data, such as Excel spreadsheets and database tables, is one of the most important assets in large enterprises today, yet it is often plagued with data quality issues. Intelligent data cleansing focuses on novel ways to detect and fix data quality issues in tabular data, helping the large population of less-technical and non-technical users in enterprises.

We are interested in a variety of topics in this area, including data-driven, intelligent techniques that detect data quality issues and suggest possible fixes by leveraging constraints and statistical properties inferred from existing data assets and software artifacts.
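As a minimal illustration of detecting issues through a constraint inferred from existing data, the sketch below abstracts each value in a column into a coarse "shape" (letters become A, digits become 9), treats the dominant shape as an inferred constraint, and flags values that violate it. This is an assumed, simplified technique chosen for illustration, not the team's method; the function names and the 0.8 support threshold are hypothetical.

```python
from collections import Counter


def value_shape(value: str) -> str:
    """Abstract a value into a coarse pattern: letters -> 'A', digits -> '9'."""
    shape = []
    for ch in value:
        if ch.isalpha():
            shape.append("A")
        elif ch.isdigit():
            shape.append("9")
        else:
            shape.append(ch)  # keep punctuation such as '-' or '/'
    return "".join(shape)


def flag_pattern_violations(column, min_support=0.8):
    """Infer a column's dominant pattern and flag values that deviate from it.

    Returns (dominant_pattern, [(row_index, value), ...]). If no single pattern
    covers at least `min_support` of the values, no constraint is inferred.
    """
    if not column:
        return None, []
    shapes = [value_shape(v) for v in column]
    dominant, count = Counter(shapes).most_common(1)[0]
    if count / len(column) < min_support:
        return None, []
    suspects = [(i, v) for i, (v, s) in enumerate(zip(column, shapes)) if s != dominant]
    return dominant, suspects


dates = ["2021-09-07", "2021-01-15", "2021-03-02", "2021/04/09", "2021-05-20"]
pattern, suspects = flag_pattern_violations(dates)
print(pattern, suspects)
```

Flagged rows are only candidates: a cleansing tool would show them to the user together with a suggested fix, such as normalizing the separator to match the dominant pattern.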

Research Areas

Data, Knowledge, and Intelligence (DKI), MSR Asia

https://www.microsoft.com/en-us/research/group/data-knowledge-intelligence/

Data Management, Exploration and Mining (DMX), MSR Redmond

https://www.microsoft.com/en-us/research/group/data-management-exploration-and-mining-dmx

Qualifications

Graduate students in computer science or related STEM fields; PhD students are preferred
Students with a research background in databases, data mining, statistics, software engineering, or visualization are preferred

Intelligent Power-Aware Virtual Machine Allocation


As one of the world's leading cloud service providers, Microsoft Azure manages tens of millions of virtual machines every day. In a cloud system of this scale, allocating virtual machines to servers efficiently is critical and has been a hot research topic for years. Teams from MSR Asia and MSR Redmond have previously made significant contributions in this area, resulting in production impact and papers at top-tier conferences (e.g., IJCAI, AAAI, OSDI, NSDI). In this project, we intend to combine the strengths of MSR Asia and MSR Redmond to perform forward-looking, collaborative research on power management in datacenters, including power-aware virtual machine allocation. The project involves developing power prediction models that leverage state-of-the-art machine learning methods, as well as building efficient and reliable allocation systems in large-scale distributed environments.
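To make "power-aware allocation" concrete, here is a minimal sketch assuming a simple best-fit heuristic: a VM goes to the server that leaves the fewest spare cores, but only if that server's predicted power stays under its power cap. The `Server` fields, the linear `predict_vm_power` model (standing in for a learned power predictor), and the 10 W-per-core figure are all hypothetical; Azure's actual allocator is not described in the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    cores: int
    power_cap_w: float           # hypothetical per-server power budget
    used_cores: int = 0
    predicted_power_w: float = 0.0


def predict_vm_power(vm_cores: int, watts_per_core: float = 10.0) -> float:
    """Placeholder power model (assumption): linear in the VM's core count.

    A learned power prediction model would replace this in a real system.
    """
    return vm_cores * watts_per_core


def allocate(servers: List[Server], vm_cores: int) -> Optional[str]:
    """Best-fit placement subject to both core and predicted-power constraints."""
    extra_power = predict_vm_power(vm_cores)
    best: Optional[Server] = None
    best_leftover = 0
    for s in servers:
        fits_cores = s.used_cores + vm_cores <= s.cores
        fits_power = s.predicted_power_w + extra_power <= s.power_cap_w
        if not (fits_cores and fits_power):
            continue
        leftover = s.cores - s.used_cores - vm_cores
        if best is None or leftover < best_leftover:
            best, best_leftover = s, leftover
    if best is None:
        return None  # no server can host the VM without exceeding a limit
    best.used_cores += vm_cores
    best.predicted_power_w += extra_power
    return best.name


servers = [Server("a", cores=32, power_cap_w=200.0),
           Server("b", cores=16, power_cap_w=300.0)]
print(allocate(servers, 8))   # -> b (tightest core fit)
print(allocate(servers, 12))  # -> a ('b' lacks cores)
print(allocate(servers, 10))  # -> None ('a' would exceed its power cap)
```

The third request shows why power awareness matters: server "a" still has free cores, but placing the VM there would push its predicted power past the cap, so the request is rejected rather than overloading the server.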

Research Areas

Data, Knowledge, and Intelligence (DKI), MSR Asia

https://www.microsoft.com/en-us/research/group/data-knowledge-intelligence/

Systems, MSR Redmond

https://www.microsoft.com/en-us/research/group/systems-research-group-redmond/

Qualifications

Currently enrolled in a graduate program in computer science or equivalent field
Good research track record in related areas
Able to carry out research tasks with high quality
Good communication and presentation skills in written and oral English
Knowledge and experience in machine learning, data mining and data analytics are preferred
Familiarity with AIOps or AI for systems is a strong plus

Learning Bandwidth Estimation for Real-Time Video


In today's real-time video applications, bandwidth estimation and rate control are key components for optimizing the user's quality of experience. The estimator infers the network capacity from congestion signals observed on the path, and rate control adapts the video bitrate accordingly through the codec. However, existing handcrafted bandwidth estimators fail to accommodate the wide range of complex network conditions, calling for a data-driven approach.
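The estimate-then-adapt loop described above can be sketched with a simplified, handcrafted AIMD-style estimator, loosely in the spirit of delay- and loss-based controllers such as Google Congestion Control (GCC). This is the kind of hand-tuned logic a learned estimator would replace, not the project's model: the congestion signals, the 12.5 ms threshold, the 0.85/1.08 multipliers, and the bitrate bounds are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def update_estimate(estimate_kbps: float, delay_gradient_ms: float,
                    loss_fraction: float, received_kbps: float,
                    overuse_threshold_ms: float = 12.5,
                    min_kbps: float = 100.0, max_kbps: float = 4000.0) -> float:
    """One step of a simplified AIMD-style bandwidth estimator (illustrative)."""
    congested = delay_gradient_ms > overuse_threshold_ms or loss_fraction > 0.10
    if congested:
        # Multiplicative decrease below the throughput actually delivered.
        new_estimate = 0.85 * received_kbps
    else:
        # No congestion signal: probe for more bandwidth.
        new_estimate = 1.08 * estimate_kbps
    return max(min_kbps, min(max_kbps, new_estimate))


def choose_bitrate(estimate_kbps: float, codec_max_kbps: float = 2500.0) -> float:
    """Rate control: request a codec bitrate no higher than the estimate."""
    return min(estimate_kbps, codec_max_kbps)


def run(trace: Iterable[Tuple[float, float, float]],
        start_kbps: float = 1000.0) -> List[float]:
    """Feed (delay_gradient_ms, loss_fraction, received_kbps) samples through the loop."""
    estimate = start_kbps
    bitrates = []
    for delay_gradient_ms, loss_fraction, received_kbps in trace:
        estimate = update_estimate(estimate, delay_gradient_ms,
                                   loss_fraction, received_kbps)
        bitrates.append(choose_bitrate(estimate))
    return bitrates


# Two clean intervals, then a delay spike, then heavy packet loss.
trace = [(0.0, 0.0, 1000.0), (0.0, 0.0, 1080.0), (20.0, 0.0, 1100.0), (0.0, 0.2, 950.0)]
print(run(trace))
```

Every constant here is a hand-picked rule of thumb; the limitation the project targets is that no single set of such constants suits the wide range of real network conditions.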

Motivated by the recent success of applying reinforcement learning (RL) to video streaming and congestion control, we have made an initial attempt at designing an RL-based bandwidth estimator for one-on-one video calls. Going forward, we are working to optimize the performance of our current neural network model and to extend the study of bandwidth estimation and rate control to multiparty videoconferencing.

Research Areas

Systems and Networking, MSR Asia

https://www.microsoft.com/en-us/research/group/systems-and-networking-research-group-asia/

Mobility and Networking, MSR Redmond

https://www.microsoft.com/en-us/research/group/mobility-and-networking-research/

Qualifications

Major in computer science or a related field
Strong programming skills in Python or C++
Excellent English communication skills
Experience with deep reinforcement learning or related areas is preferred
Knowledge of computer networks is preferred
Background in AI for systems and networking is a strong plus
Track record of publications in top systems, networking, or AI conferences is strongly preferred

Neuro-Symbolic Semantic Parsing for Data Science


Our cross-lab, interdisciplinary research team develops AI technology for interactive coding assistance in data science, data analytics, and business process automation. It lets users specify their data-processing intent in the middle of their workflow, using a combination of natural language, input-output examples, and multi-modal UX, and translates that intent into the desired source code. The underlying AI technology integrates our state-of-the-art research in program synthesis, semantic parsing, and structure-grounded natural language understanding. It has the potential to improve the productivity of millions of data scientists and software developers, and to establish new scientific milestones for deep learning over structured data, grounded language understanding, and neuro-symbolic AI.

The research project involves collecting and establishing a novel benchmark dataset for data science program generation, developing novel neuro-symbolic semantic parsing models to tackle this challenge, adapting large-scale pretrained language models to new domains and knowledge bases, as well as publishing in top-tier AI/NLP conferences. We expect the benchmark dataset and the new models to be used in academia as well as in Microsoft products.
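The input-output-example side of this idea can be illustrated with classic enumerative programming-by-example: search a small space of transformation programs for one that is consistent with every example. This is a minimal, symbolic-only sketch; the tiny `PRIMITIVES` DSL and the `synthesize` function are hypothetical, and it is not the team's neuro-symbolic model. In a neuro-symbolic system, a learned model could, for instance, use the natural-language description to guide or rank this search instead of enumerating blindly.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def first_token(s: str) -> str:
    """Return the first whitespace-separated token, or the input if empty."""
    tokens = s.split()
    return tokens[0] if tokens else s


# Hypothetical tiny DSL of string transformations.
PRIMITIVES: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lower": str.lower,
    "upper": str.upper,
    "title": str.title,
    "first_token": first_token,
}


def run_program(names: Sequence[str], s: str) -> str:
    """Apply the named primitives left to right."""
    for name in names:
        s = PRIMITIVES[name](s)
    return s


def synthesize(examples: List[Tuple[str, str]], max_depth: int = 2) -> Optional[List[str]]:
    """Return the shortest primitive sequence consistent with all examples."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(run_program(names, inp) == out for inp, out in examples):
                return list(names)
    return None  # no program of length <= max_depth fits every example


examples = [("  alice smith ", "Alice"), ("BOB jones", "Bob")]
print(synthesize(examples))  # -> ['title', 'first_token']
```

Enumerating shortest programs first is a simple bias toward the most general explanation of the examples; realistic DSLs grow too large for blind enumeration, which is where learned guidance becomes useful.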

Research Areas

Natural Language Computing, MSR Asia

https://www.microsoft.com/en-us/research/group/natural-language-computing

Neuro-Symbolic Learning, MSR Redmond

Qualifications

Master's or PhD students majoring in computer science or an equivalent area
Background in deep NLP, semantic parsing, sequence-to-sequence learning, and Transformers is required
Experience with PyTorch and Hugging Face Transformers
Fluent English speaking, listening, and writing skills
Background in deep learning over structured data (graphs/trees/programs) and program synthesis is preferred
Students with papers published at top-tier AI/NLP conferences are preferred

Next-Gen Large Pretrained Language Models


The goal of this project is to develop game-changing techniques for next-gen large pre-trained language models, including

(1) Beyond UniLM/InfoXLM: novel pre-training frameworks and self-supervised tasks for monolingual and multilingual pre-training to support language understanding, generation and translation tasks;

(2) Beyond Transformers: new model architectures and optimization algorithms for improving training effectiveness and efficiency of extremely large language models;

(3) Knowledge Fusion: new modeling frameworks to fuse massive pre-compiled knowledge into pre-trained models;

(4) Lifelong Self-supervised Learning: mechanisms and algorithms for lifelong (incremental) pre-training.

This project extends our existing research and aims to advance the state of the art in NLP and AI in general.

Research Areas

Natural Language Computing, MSR Asia

https://www.microsoft.com/en-us/research/group/natural-language-computing

Deep Learning, MSR Redmond

https://www.microsoft.com/en-us/research/group/deep-learning-group

Qualifications

Major in computer science or equivalent areas
At least one year of research experience in deep learning for NLP, CV, or related areas
Experience with open-source tools such as PyTorch, TensorFlow, etc.
Background knowledge of language model pre-training is preferred
Track record of publications in related top conferences (e.g., ACL, EMNLP, NAACL, ICML, NeurIPS, ICLR) is preferred
Excellent communication and writing skills