Star Leap Program (星跃计划) | The MSR Asia-MSR Redmond Joint Research Program Is Open Again!

September 7, 2021 | Microsoft Research AI Headlines (微软研究院AI头条)



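The second round of applications for the Star Leap Program (星跃计划), jointly launched by Microsoft Research Asia and Microsoft Research Redmond, is now open. Recruitment is ongoing, so apply now!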


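The program aims to give outstanding talent the opportunity to work with research teams at Microsoft's two major research labs worldwide on real, cutting-edge problems. You will do impactful research in an international research environment, in a diverse and inclusive research culture, and under the guidance of top researchers!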


The first batch of cross-lab joint research projects covers natural language processing, data intelligence, computer systems and networking, and intelligent cloud. The research projects are: High-performance Distributed Deep Learning, Intelligent Data Cleansing, Intelligent Power-Aware Virtual Machine Allocation, Learning Bandwidth Estimation for Real-Time Video, Neuro-Symbolic Semantic Parsing for Data Science, and Next-Gen Large Pretrained Language Models.


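In January 2021, the first round of the Star Leap Program drew enthusiastic interest and applications from students in China and abroad as soon as it launched. The second round is now open for applications, so what are you waiting for? Join the Star Leap Program, cross the ocean with us, and explore more possibilities in research!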


Program Highlights


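  • Conduct research under the joint guidance of top researchers at both Microsoft Research Asia and Microsoft Research Redmond, and engage in in-depth exchange with researchers from different research backgrounds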

  • Focus on real, cutting-edge problems from industry, and strive for results that have impact in both academia and industry

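  • Through in-person and online exchange and collaboration, experience the international, open research atmosphere and the diverse, inclusive culture of Microsoft's two research labs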


Eligibility


  • Currently enrolled undergraduate, master's, or PhD students; deferred or gap-year students

  • Able to work full-time in China for 6-12 months

  • See the project descriptions below for each project's detailed requirements


What are you waiting for?

Find the project that suits you!


High-performance Distributed Deep Learning


Parallel and distributed systems are the answer to the ever-increasing complexity of deep learning training. However, existing solutions still leave efficiency and scalability on the table by missing optimization opportunities across the varied environments found at industrial scale.


In this project, we’ll work with scientists who are at the forefront of systems and networking research, leveraging world-leading platforms to solve systems and networking problems in parallel and distributed deep learning. The current project team members, from both the MSR Asia and MSR Redmond labs, have rich experience contributing to both industry and the academic community by transferring innovations that support production systems and by publishing at top conferences.


Research Areas 


Systems and Networking, MSR Asia

https://www.microsoft.com/en-us/research/group/systems-and-networking-research-group-asia/


Research in Software Engineering, MSR Redmond

https://www.microsoft.com/en-us/research/group/research-software-engineering-rise/


Qualifications


  • Major in computer science, electrical engineering, or equivalent field

  • Solid knowledge of data structures and algorithms

  • Familiarity with Python, C/C++, and other programming languages, and with Linux and development on the Linux platform

  • Good communication and presentation skills

  • Good English reading and writing ability; able to implement systems based on academic papers written in English and to write documentation in English

Candidates with the following are preferred:

  • Familiarity with deep learning systems (e.g., PyTorch, TensorFlow), GPU programming, and networking

  • Familiarity with communication libraries such as NCCL and with MPI implementations such as Open MPI and MVAPICH

  • Rich knowledge of machine learning and machine learning models

  • Familiarity with engineering processes is a strong plus

  • Active on GitHub, with experience using or contributing to well-known open source projects

Intelligent Data Cleansing


Tabular data, such as Excel spreadsheets and databases, is one of the most important assets in large enterprises today, yet it is often plagued with data quality issues. Intelligent data cleansing focuses on novel ways to detect and fix data quality issues in tabular data, which can assist the large class of less-technical and non-technical users in enterprises.


We are interested in a variety of topics in this area, including data-driven and intelligent techniques to detect data quality issues and suggest possible fixes, leveraging inferred constraints and statistical properties based on existing data assets and software artifacts.
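As a rough illustration of what "inferred constraints and statistical properties" could mean in practice, the sketch below flags suspect cells in a single column. It is not the project's method: the function names, the 80% support threshold for inferring that a column is numeric, and the modified z-score cutoff of 3.5 are all assumptions chosen for the example.

```python
from statistics import median


def parse_number(cell):
    """Return the cell as a float, or None if it does not parse."""
    try:
        return float(str(cell).replace(",", ""))
    except ValueError:
        return None


def find_suspect_cells(column, min_support=0.8, threshold=3.5):
    """Flag cells that break an inferred numeric constraint or are robust outliers.

    Returns a list of (row_index, cell, reason) tuples.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    parsed = [parse_number(c) for c in column]
    numbers = [x for x in parsed if x is not None]
    # Inferred constraint: treat the column as numeric only if most cells parse.
    if not column or len(numbers) / len(column) < min_support:
        return []
    med = median(numbers)
    mad = median(abs(x - med) for x in numbers)  # median absolute deviation
    suspects = []
    for i, (cell, x) in enumerate(zip(column, parsed)):
        if x is None:
            suspects.append((i, cell, "not numeric"))
        elif mad > 0 and 0.6745 * abs(x - med) / mad > threshold:
            # Modified z-score based on the median, robust to the outlier itself.
            suspects.append((i, cell, "outlier"))
    return suspects


if __name__ == "__main__":
    prices = ["12.5", "13.1", "12.9", "N/A", "13.0", "1300", "12.7", "12.8", "13.2", "12.6"]
    for row, cell, reason in find_suspect_cells(prices):
        print(row, cell, reason)
```

The median-based score is used here because a mean and standard deviation would be distorted by the very values the check is trying to find. Suggesting a fix for each flagged cell is a separate and harder step that this sketch does not attempt.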


Research Areas 


Data, Knowledge, and Intelligence (DKI), MSR Asia

https://www.microsoft.com/en-us/research/group/data-knowledge-intelligence/


Data Management, Exploration and Mining (DMX), MSR Redmond

https://www.microsoft.com/en-us/research/group/data-management-exploration-and-mining-dmx


Qualifications


  • Graduate-level students in Computer Science or related STEM fields. PhD students are preferred

  • Students with research background in database, data mining, statistics, software engineering, and visualization are preferred

Intelligent Power-Aware Virtual Machine Allocation


As one of the world's leading cloud service providers, Microsoft Azure manages tens of millions of virtual machines every day. In a cloud system of this scale, efficiently allocating virtual machines to servers is critical and has been a hot research topic for years. Teams from MSR Asia and MSR Redmond have previously made significant contributions in this area, resulting in production impact and papers at top-tier conferences (e.g., IJCAI, AAAI, OSDI, NSDI). In this project we intend to combine the strengths of MSR Asia and MSR Redmond to carry out forward-looking, collaborative research on power management in datacenters, including power-aware virtual machine allocation. The project involves developing power prediction models using state-of-the-art machine learning methods, as well as building efficient and reliable allocation systems for large-scale distributed environments.


Research Areas 


Data, Knowledge, and Intelligence (DKI), MSR Asia

https://www.microsoft.com/en-us/research/group/data-knowledge-intelligence/


Systems, MSR Redmond

https://www.microsoft.com/en-us/research/group/systems-research-group-redmond/


Qualifications


  • Currently enrolled in a graduate program in computer science or equivalent field

  • Good research track record in related areas

  • Able to carry out research tasks with high quality

  • Good communication and presentation skills in written and oral English

  • Knowledge and experience in machine learning, data mining and data analytics are preferred

  • Familiarity with AIOps or AI for systems is a strong plus

Learning Bandwidth Estimation for Real-Time Video


In today’s real-time video applications, a key component for optimizing the user’s quality of experience is bandwidth estimation and rate control. It estimates the network capacity based on congestion signals observed on the path and adapts the video bitrate accordingly through the codec. However, existing handcrafted bandwidth estimators have failed to accommodate a wide range of complex network conditions, calling for a data-driven approach.


Motivated by the recent success in applying reinforcement learning (RL) to video streaming and congestion control, we have made an initial attempt at designing an RL-based bandwidth estimator for one-on-one video calls. Going forward, we are working to optimize the performance of our current neural network model and to extend the study of bandwidth estimation and rate control to multiparty videoconferencing.
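For readers unfamiliar with the handcrafted baseline that a learned estimator would replace, here is a minimal sketch of a loss-based rate controller of that general kind. It is an illustration only, not the project's estimator: the function name, the 2% and 10% loss thresholds, the 5% probing step, and the bitrate bounds are all assumptions made for the example.

```python
def update_bitrate(bitrate_kbps, loss_rate, min_kbps=100.0, max_kbps=5000.0):
    """One step of a handcrafted loss-based rate controller.

    bitrate_kbps: current target bitrate handed to the video codec.
    loss_rate: fraction of packets reported lost since the last update (0.0 to 1.0).
    All thresholds and bounds are illustrative assumptions.
    """
    if loss_rate > 0.10:
        # Heavy loss is read as congestion: back off in proportion to the loss.
        bitrate_kbps *= 1.0 - 0.5 * loss_rate
    elif loss_rate < 0.02:
        # Little or no loss: probe gently for more capacity.
        bitrate_kbps *= 1.05
    # Between the two thresholds the estimate is held steady.
    return max(min_kbps, min(max_kbps, bitrate_kbps))


if __name__ == "__main__":
    rate = 1000.0
    # Hypothetical sequence of loss reports from the receiver.
    for loss in [0.0, 0.0, 0.01, 0.15, 0.05, 0.0]:
        rate = update_bitrate(rate, loss)
        print(f"loss={loss:.2f} -> target bitrate {rate:.1f} kbps")
```

Fixed rules like these react to a single congestion signal with constants chosen by hand, which is the kind of rigidity that motivates learning the mapping from observed network statistics to a bitrate instead.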


Research Areas 


Systems and Networking, MSR Asia

https://www.microsoft.com/en-us/research/group/systems-and-networking-research-group-asia/


Mobility and Networking, MSR Redmond

https://www.microsoft.com/en-us/research/group/mobility-and-networking-research/


Qualifications


  • Major in computer science or a related field

  • Strong programming skills in Python or C++

  • Excellent English communication skills

  • Experience with deep reinforcement learning or related areas is preferred

  • Knowledge of computer networks is preferred

  • Background in AI for systems and networking is a strong plus

  • Track record of publications in top systems, networking, or AI conferences is strongly preferred

Neuro-Symbolic Semantic Parsing for Data Science


Our cross-lab, inter-disciplinary research team develops AI technology for interactive coding assistance for data science, data analytics, and business process automation. It allows the user to specify their data processing intent in the middle of their workflow using a combination of natural language, input-output examples, and multi-modal UX – and translates that intent into the desired source code. The underlying AI technology integrates our state-of-the-art research in program synthesis, semantic parsing, and structure-grounded natural language understanding. It has the potential to improve productivity of millions of data scientists and software developers, as well as establish new scientific milestones for deep learning over structured data, grounded language understanding, and neuro-symbolic AI. 


The research project involves collecting and establishing a novel benchmark dataset for data science program generation, developing novel neuro-symbolic semantic parsing models to tackle this challenge, adapting large-scale pretrained language models to new domains and knowledge bases, as well as publishing in top-tier AI/NLP conferences. We expect the benchmark dataset and the new models to be used in academia as well as in Microsoft products. 


Research Areas 


Natural Language Computing, MSR Asia

https://www.microsoft.com/en-us/research/group/natural-language-computing


Neuro-Symbolic Learning, MSR Redmond


Qualifications


  • Master's or Ph.D. students majoring in computer science or equivalent areas

  • Background in deep NLP, semantic parsing, sequence-to-sequence learning, Transformers required

  • Experience with PyTorch and HuggingFace Transformers 

  • Fluent English speaking, listening, and writing skills 

  • Background in deep learning over structured data (graphs/trees/programs) and program synthesis preferred 

  • Students with papers published at top-tier AI/NLP conferences are preferred 

Next-Gen Large Pretrained Language Models


The goal of this project is to develop game-changing techniques for next-gen large pre-trained language models, including:

(1) Beyond UniLM/InfoXLM: novel pre-training frameworks and self-supervised tasks for monolingual and multilingual pre-training to support language understanding, generation and translation tasks;

(2) Beyond Transformers: new model architectures and optimization algorithms for improving training effectiveness and efficiency of extremely large language models;

(3) Knowledge Fusion: new modeling frameworks to fuse massive pre-compiled knowledge into pre-trained models;

(4) Lifelong Self-supervised Learning: mechanisms and algorithms for lifelong (incremental) pre-training.

This project extends our existing research and aims to advance SOTA on NLP and AI in general.


Research Areas 


Natural Language Computing, MSR Asia

https://www.microsoft.com/en-us/research/group/natural-language-computing


Deep Learning, MSR Redmond

https://www.microsoft.com/en-us/research/group/deep-learning-group


Qualifications


  • Major in computer science or equivalent areas

  • One+ year research experience in deep learning for NLP, CV or related areas

  • Experience with open-source tools such as PyTorch, TensorFlow, etc.

  • Background knowledge of language model pre-training is preferred

  • Track record of publications in related top conferences (e.g., ACL, EMNLP, NAACL, ICML, NeurIPS, ICLR) is preferred 

  • Excellent communication and writing skills


How to Apply


Eligible applicants, please fill out the application form below:

https://jinshuju.net/f/LadoJK









