Software is the outcome of active and effective communication between members of an organization. This is captured by Conway's law, which states that ``organizations design systems that mirror their own communication structure.'' However, software developers are often members of multiple organizational groups (e.g., corporate, regional), and it is unclear how association with groups beyond one's company influences the development process. In this paper, we study the social effects of country by measuring differences between software repositories associated with different countries. Using a novel dataset obtained from GitHub, we identify key properties that differentiate software repositories based on the country of their developers. We propose a novel approach that models repositories by their sequence of development activities, framed as a sequence embedding task; coupled with repository profile features, this approach achieves 79.2% accuracy in identifying the country of a repository. Finally, we conduct a case study on repositories from well-known corporations and find that country can describe differences in development better than company affiliation itself. These results have broader implications for software development and indicate the importance of accounting for the multiple groups developers are associated with when forming and structuring teams.
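To make the modeling idea concrete, the following is a minimal sketch of the general pipeline shape: embed a repository's activity sequence, append profile features, and classify by country. It is not the method used in this paper. The abstract does not specify the embedding model or classifier, so the sketch substitutes normalized bigram frequencies for a learned sequence embedding and a nearest-centroid rule for the classifier. The event names, the single profile feature, and the country labels "A" and "B" are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import dist

# Hypothetical activity vocabulary (assumption; not taken from the paper).
EVENTS = ["push", "issue", "pull_request"]
BIGRAMS = list(product(EVENTS, repeat=2))


def embed_sequence(events):
    """Stand-in sequence embedding: normalized frequencies of adjacent event pairs."""
    counts = Counter(zip(events, events[1:]))
    total = sum(counts.values()) or 1  # avoid division by zero on short sequences
    return [counts[b] / total for b in BIGRAMS]


def featurize(events, profile):
    """Concatenate the sequence embedding with numeric repo profile features."""
    return embed_sequence(events) + list(profile)


def fit_centroids(samples):
    """samples: list of (feature_vector, country). Returns country -> mean vector."""
    grouped = {}
    for vec, country in samples:
        grouped.setdefault(country, []).append(vec)
    return {
        country: [sum(col) / len(vecs) for col in zip(*vecs)]
        for country, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Assign the country whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda country: dist(centroids[country], vec))


# Toy data: the profile feature is a made-up scalar in [0, 1].
train = [
    (featurize(["push", "push", "push", "issue"], [0.9]), "A"),
    (featurize(["push", "push", "issue", "push"], [0.8]), "A"),
    (featurize(["issue", "pull_request", "issue", "pull_request"], [0.1]), "B"),
    (featurize(["pull_request", "issue", "pull_request", "push"], [0.2]), "B"),
]
centroids = fit_centroids(train)
query = featurize(["push", "push", "push", "push"], [0.85])
print(predict(centroids, query))
```

In a realistic setting the bigram vector would be replaced by a learned embedding of the activity sequence, and the profile vector would hold several repository-level attributes rather than one scalar.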