
{"id":65208,"date":"2021-06-30T05:03:36","date_gmt":"2021-06-30T12:03:36","guid":{"rendered":"https:\/\/github.blog\/?p=65208"},"modified":"2022-08-16T10:59:33","modified_gmt":"2022-08-16T17:59:33","slug":"github-copilot-research-recitation","status":"publish","type":"post","link":"https:\/\/github.blog\/ai-and-ml\/github-copilot\/github-copilot-research-recitation\/","title":{"rendered":"GitHub Copilot research recitation"},"content":{"rendered":"<h2 id=\"introduction\"><a class=\"heading-link\" href=\"#introduction\">Introduction<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h2>\n<p>GitHub Copilot is trained on billions of lines of public code. The suggestions it makes to you are adapted to your code, but the processing behind it is ultimately informed by code written by others.<\/p>\n<p>How direct is the relationship between the suggested code and the code that informed it? In a recent thought-provoking paper<sup id=\"anchor1\"><a href=\"#footnote1\">1<\/a><\/sup>, Bender, Gebru et al. coined the phrase \u201cstochastic parrots\u201d for artificial intelligence systems, like the ones that power GitHub Copilot. Or, as a fellow machine learning engineer at GitHub<sup id=\"anchor2\"><a href=\"#footnote2\">2<\/a><\/sup> remarked during a water cooler chat: these systems can feel like &#8220;a toddler with a photographic memory.\u201d<\/p>\n<p>These are deliberate oversimplifications. Many GitHub Copilot suggestions feel specifically tailored to the particular code base the user is working on. Often, it looks less like a parrot and more like a crow building novel tools out of small blocks<sup id=\"anchor3\"><a href=\"#footnote3\">3<\/a><\/sup>. 
Yet there\u2019s no denying that GitHub Copilot has an impressive memory:<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165667874-2f04b14c-909e-4bf5-9639-ff346da960f1.gif?ssl=1\" alt=\"A movie demonstration of GitHub Copilot\" \/><\/p>\n<p>Here, I intentionally directed<sup id=\"anchor4\"><a href=\"#footnote4\">4<\/a><\/sup> GitHub Copilot to recite a well-known text it obviously knows by heart. I, too, know a couple of texts by heart. For example, I still remember some poems I learned in school. Yet no matter the topic, not once have I been tempted to derail a conversation by falling into iambic tetrameter and waxing lyrical about daffodils.<\/p>\n<p>So, is that (or rather the coding equivalent of it) something GitHub Copilot is prone to doing? How many of its suggestions are unique, and how often does it just parrot some likely-looking code it has seen during training?<\/p>\n<h2 id=\"the-experiment\"><a class=\"heading-link\" href=\"#the-experiment\">The experiment<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h2>\n<p>During GitHub Copilot\u2019s early development, nearly 300 employees used it in their daily work as part of an internal trial. This trial provided a good dataset to test for recitation. I wanted to find out how often GitHub Copilot gave them a suggestion that was quoted from something it had seen before.<\/p>\n<p>I limited the investigation to Python suggestions with a cutoff on May 7, 2021 (the day we started extracting that data). 
That left 453,780 suggestions spread out over 396 \u201cuser weeks\u201d, that is, calendar weeks during which a user actively used GitHub Copilot on Python code.<\/p>\n<h3 id=\"automatic-filtering\"><a class=\"heading-link\" href=\"#automatic-filtering\">Automatic filtering<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h3>\n<p>Though 453,780 suggestions are a lot, many of them can be dismissed immediately. To get to the interesting cases, consider sequences of \u201cwords\u201d that occur in the suggestion in the same order as in the code GitHub Copilot has been trained on. In this context, punctuation, brackets, or other special characters all count as \u201cwords,\u201d while tabs, spaces, or even line breaks are ignored completely. After all, a quote is still a quote, whether it\u2019s indented by one tab or eight spaces.<\/p>\n<p>For example, one of GitHub Copilot\u2019s suggestions was the following regex for numbers separated by whitespace:<\/p>\n<pre><code>r'^\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+'\n<\/code><\/pre>\n<p>This would be exactly 100 \u201cwords\u201d in the sense above, but it\u2019s a particularly dense example. The average non-empty line of code has only 10 \u201cwords.\u201d I\u2019ve restricted this investigation to cases where the overlap with the code GitHub Copilot was trained on contains at least 60 such \u201cwords\u201d. The cutoff had to be set somewhere, and I think it\u2019s rather rare for shorter sequences to be of great interest. In fact, most of the interesting cases identified later are well clear of that threshold of 60.<\/p>\n<p>If the overlap extends to what the user has already written, that also counts for the length. 
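As an illustration, the \u201cword\u201d counting described above can be sketched in a few lines of Python. This is only a sketch of the idea, not the actual filter implementation; the function names are mine.

```python
import re

def words(code: str) -> list[str]:
    """Split code into "words": runs of letters/digits/underscores count as
    one word each, and every punctuation or bracket character counts as a
    word of its own. Tabs, spaces, and line breaks are ignored entirely."""
    return re.findall(r"\w+|[^\w\s]", code)

def clears_threshold(matched_overlap: str, threshold: int = 60) -> bool:
    """Keep a suggestion for manual review only if its overlap with the
    training data spans at least `threshold` "words"."""
    return len(words(matched_overlap)) >= threshold

# Indentation does not matter: a quote is a quote, tabs or spaces.
assert words("if x:\n\treturn 1") == words("if x:\n        return 1")
```

Counting this way, a dense regex like the one above racks up \u201cwords\u201d quickly, while whitespace-only differences between two copies of a snippet are invisible to the filter.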
After all, the user may have written that context with the help of GitHub Copilot as well!<\/p>\n<p>In the following example, the user has started writing a very common snippet. GitHub Copilot completes it. Even though the completion itself is rather short, together with the already existing code, it clears the threshold and is retained.<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668070-da55664a-a6a2-40bd-87dd-e29db77de58f.png?ssl=1\" alt=\"Example code\" \/><\/p>\n<p>This procedure is permissive enough to let many relatively \u201cboring\u201d examples through, like the two above. But it\u2019s still effective at focusing the human analysis on the interesting cases, filtering out more than 99% of GitHub Copilot\u2019s suggestions.<\/p>\n<h3 id=\"manual-bucketing\"><a class=\"heading-link\" href=\"#manual-bucketing\">Manual bucketing<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h3>\n<p>After filtering, there were 473 suggestions left. However, they came in very different forms:<\/p>\n<ol>\n<li>Some were basically just repeats of another case that passed filtering. For example, sometimes GitHub Copilot makes a suggestion, the developer types a comment line, and GitHub Copilot offers a very similar suggestion again. I removed these cases from the analysis as duplicates.<\/li>\n<li>Some were long, repetitive sequences. Take the following example, where the repeated blocks of <code>\u2018&lt;p&gt;\u2019<\/code> are, of course, found somewhere in the training set: <br \/><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668349-f2dc170c-1956-47cd-a8c2-c6a64f15701c.png?ssl=1\" alt=\"Example repetitions\" \/><br \/> Such suggestions can be helpful (test cases, regular expressions) or not helpful (like this case, I suspect). 
In any case, they do not fit the idea of rote learning I had in mind when I started this investigation.<\/li>\n<li>Some were standard inventories, like the natural numbers, the prime numbers, stock market tickers, or the Greek alphabet: <br \/><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668475-4d2658f7-71f3-4c84-9975-ed8f6da10bd4.png?ssl=1\" alt=\"Example of Greek alphabet\" \/><\/li>\n<li>Some were common, straightforward ways, perhaps even universal ways, of doing things with very few natural degrees of freedom. For example, the middle part of the following strikes me as very much the standard way of using the BeautifulSoup package to parse a Wikipedia list. In fact, the best matching snippet found in GitHub Copilot&#8217;s training data<sup id=\"anchor5\"><a href=\"#footnote5\">5<\/a><\/sup> uses such code to parse a different article and goes on to do different things with the results. <br \/><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668596-c4ec8bf8-7aeb-4bae-97f2-e0ce382ed4e9.png?ssl=1\" alt=\"Example of Beautiful Soup\" \/> <br \/>This doesn\u2019t fit my idea of a quote either. It\u2019s a bit like when someone says \u201cI\u2019m taking out the trash. I\u2019ll be back soon.\u201d That\u2019s a matter-of-fact statement, not a quote, even though that particular phrase has been uttered many times before.<\/li>\n<li>Then there are all other cases. Those with at least some specific overlap in either code or comments. These are what interest me the most, and what I\u2019m going to concentrate on moving forward.<\/li>\n<\/ol>\n<p>This bucketing necessarily has some edge cases<sup id=\"anchor6\"><a href=\"#footnote6\">6<\/a><\/sup>, and your mileage may vary in how you think they should be classified. 
Maybe you even disagree with the whole set of buckets in the first place.<\/p>\n<p>That\u2019s why we\u2019ve open sourced that dataset<sup id=\"anchor7\"><a href=\"#footnote7\">7<\/a><\/sup>. So, if you feel a bit differently about the bucketing, or if you\u2019re interested in other aspects of GitHub Copilot parroting its training set, you\u2019re very welcome to ignore my next section and draw your own conclusions.<\/p>\n<h2 id=\"results\"><a class=\"heading-link\" href=\"#results\">Results<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h2>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668688-b9cbbaf3-42e9-44cf-86c0-c186721c4eef.png?ssl=1\" alt=\"Overview Plot\" \/><\/p>\n<p>For most of GitHub Copilot&#8217;s suggestions, our automatic filter didn\u2019t find any significant overlap with the code used for training. Yet it did bring 473 cases to our attention. Removing the first bucket (cases that look very similar to other cases) left me with 185 suggestions. Of these suggestions, 144 were sorted into buckets 2&#8211;4. This left 41 cases in the last bucket, the \u201crecitations,\u201d in the sense of the term I have in mind.<\/p>\n<p>That corresponds to <strong>one recitation event every 10 user weeks<\/strong> (95% confidence interval: 7 &#8211; 13 weeks, using a Poisson test).<\/p>\n<p>Naturally, this was measured on the GitHub and Microsoft developers who tried out GitHub Copilot. If your coding behavior is very different from theirs, your results might differ. Some of these developers work on Python projects only part of the time. I could not distinguish this, and therefore counted everyone who wrote some Python in a given week as a user.<\/p>\n<p>One event in 10 weeks doesn\u2019t sound like a lot, but it\u2019s not 0 either. 
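The interval above can be checked with a standard exact Poisson construction. I\u2019m not claiming this is exactly the test behind the published numbers, but the classic chi-square (\u201cGarwood\u201d) interval, sketched here with scipy, lands in the same range:

```python
from scipy.stats import chi2

events, user_weeks = 41, 396  # recitation events over Python "user weeks"

# Exact (Garwood) 95% confidence interval for a Poisson-distributed count
count_lo = chi2.ppf(0.025, 2 * events) / 2
count_hi = chi2.ppf(0.975, 2 * (events + 1)) / 2

# Re-express as "one recitation event every N user weeks"
print(f"point estimate: one event every {user_weeks / events:.1f} user weeks")
print(f"95% CI: one event every {user_weeks / count_hi:.0f}"
      f" to {user_weeks / count_lo:.0f} user weeks")
```

Note that a confidence interval on the *count* turns into an interval on *weeks per event* by dividing the 396 user weeks by its upper and lower bounds, which is why the higher count bound maps to the shorter interval between events.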
Also, three things struck me.<\/p>\n<h3 id=\"github-copilot-quotes-when-it-lacks-specific-context\"><a class=\"heading-link\" href=\"#github-copilot-quotes-when-it-lacks-specific-context\">GitHub Copilot quotes when it lacks specific context<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h3>\n<p>If I want to learn the lyrics to a song, I must listen to it many times. GitHub Copilot is no different: To learn a snippet of code by heart, it must see that snippet a lot. Each file is only shown to GitHub Copilot once, so the snippet needs to exist in many different files in public code.<\/p>\n<p>Of the 41 main cases we singled out during manual labelling, none appear in fewer than 10 different files. Most (35 cases) appear more than a hundred times. In one instance, GitHub Copilot suggested starting an empty file with something it had seen more than a whopping 700,000 times during training: the GNU General Public License.<\/p>\n<p>The following plot shows the number of matched files for the results in bucket 5 (one red mark on the bottom for each result), compared to buckets 2&#8211;4. I left out bucket 1, which is really just a mix of duplicates of bucket 2&#8211;4 cases and duplicates of bucket 5 cases. The inferred distribution is displayed as a red line. It peaks between 100 and 1000 matches.<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668729-591dbf5a-627a-4d6f-a93b-fb801cf7b8b8.png?ssl=1\" alt=\"Number of Matches Plot\" \/><\/p>\n<h3 id=\"github-copilot-mostly-quotes-in-generic-contexts\"><a class=\"heading-link\" href=\"#github-copilot-mostly-quotes-in-generic-contexts\">GitHub Copilot mostly quotes in generic contexts<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h3>\n<p>As time goes on, each file becomes unique. 
Yet GitHub Copilot doesn\u2019t wait for that<sup id=\"anchor8\"><a href=\"#footnote8\">8<\/a><\/sup>. It will offer its solutions while your file is still extremely generic. And in the absence of anything specific to go on, it\u2019s much more likely to quote from somewhere else than it would be otherwise.<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668841-d3e4f909-6afe-4019-a5f3-0602b82f4e42.png?ssl=1\" alt=\"Context Length Plot\" \/><\/p>\n<p>Of course, software developers spend most of their time deep inside the files, where the context is unique enough that GitHub Copilot will offer unique suggestions. In contrast, the suggestions at the beginning are rather hit-and-miss, since GitHub Copilot cannot know what the program will be. Yet sometimes, especially in toy projects or standalone scripts, a modest amount of context can be enough to hazard a reasonable guess of what the user wanted to do. Sometimes it&#8217;s also still generic enough so that GitHub Copilot thinks one of the solutions it knows by heart looks promising:<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668869-3e1f8d88-a3fd-468d-af23-061c7c065ded.png?ssl=1\" alt=\"Example code\" \/><\/p>\n<p>This is all but directly taken from coursework for a robotics class uploaded in different variations<sup id=\"anchor9\"><a href=\"#footnote9\">9<\/a><\/sup>.<\/p>\n<h3 id=\"detection-is-only-as-good-as-the-tool-that-does-the-detecting\"><a class=\"heading-link\" href=\"#detection-is-only-as-good-as-the-tool-that-does-the-detecting\">Detection is only as good as the tool that does the detecting<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h3>\n<p>In its current form, the filter will turn up a good number of uninteresting cases when applied broadly. 
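The proportion estimates in this section come with exact binomial (\u201cClopper-Pearson\u201d) intervals, which can be computed with scipy\u2019s `binomtest`. A minimal sketch; the counts below are illustrative stand-ins, not the exact tallies behind the published figures:

```python
from scipy.stats import binomtest

def fraction_with_ci(hits: int, total: int, confidence: float = 0.95):
    """Point estimate and exact (Clopper-Pearson) confidence interval for a
    binomial share, e.g. the share of filter finds landing in bucket 5."""
    result = binomtest(k=hits, n=total)
    ci = result.proportion_ci(confidence_level=confidence, method="exact")
    return result.statistic, ci.low, ci.high

# Illustrative counts only: 41 bucket-5 cases out of a hypothetical 240 finds
point, low, high = fraction_with_ci(hits=41, total=240)
print(f"{point:.0%} (95% CI: {low:.0%} to {high:.0%})")
```

The `method="exact"` choice inverts the binomial CDF rather than relying on a normal approximation, which matters for small counts like these.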
Yet it should not produce too much noise. For the internal users in the experiment, it would have been a bit more than one find per week on average (albeit likely in bursts!). Of these finds, roughly 17% (95% confidence interval using a binomial test: 14%-21%) would be in the fifth bucket.<\/p>\n<p>Nothing is ever foolproof, of course, so this too can be tricked. Some cases are rather hard for the tool we\u2019re building to detect, but still have an obvious source. To return to the Zen of Python:<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/user-images.githubusercontent.com\/4434330\/165668905-519012d3-ccca-452a-bd63-744cafcffc4f.gif?ssl=1\" alt=\"Zen Variation\" \/><\/p>\n<h2 id=\"conclusion-and-next-steps\"><a class=\"heading-link\" href=\"#conclusion-and-next-steps\">Conclusion and next steps<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h2>\n<p>This investigation demonstrates that GitHub Copilot <em>can<\/em> quote a body of code verbatim, yet it rarely does so, and when it does, it mostly quotes code that everybody quotes, typically at the beginning of a file, as if to break the ice.<\/p>\n<p>However, there\u2019s still one big difference between GitHub Copilot reciting code and me reciting a poem: I <em>know<\/em> when I\u2019m quoting. I would also like to know when Copilot is echoing existing code rather than coming up with its own ideas. That way, I can look up background information about that code, and I can include credit where credit is due.<\/p>\n<p>The obvious answer is to share the prefiltering solution we used in this analysis to detect overlap with the training set. When a suggestion contains snippets copied from the training set, the UI should simply tell you where it\u2019s quoted from. 
You can then either include proper attribution or decide against using that code altogether.<\/p>\n<p>This duplication search is not yet integrated into the technical preview, but we plan to integrate it. We will continue to work both on decreasing rates of recitation and on making its detection more precise.<\/p>\n<h3 id=\"footnotes\"><a class=\"heading-link\" href=\"#footnotes\">Footnotes<span class=\"heading-hash pl-2 text-italic text-bold\" aria-hidden=\"true\"><\/span><\/a><\/h3>\n<p><a name=\"footnote1\">1<\/a>: <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3442188.3445922\">On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?<\/a> <a href=\"#anchor1\">^<\/a><\/p>\n<p><a name=\"footnote2\">2<\/a>: <a href=\"https:\/\/github.com\/tiferet\">Tiferet Gazit<\/a> <a href=\"#anchor2\">^<\/a><\/p>\n<p><a name=\"footnote3\">3<\/a>: see von Bayern et al. about the creative wisdom of crows: <a href=\"https:\/\/www.nature.com\/articles\/s41598-018-33458-z\">Compound tool construction by New Caledonian crows<\/a> <a href=\"#anchor3\">^<\/a><\/p>\n<p><a name=\"footnote4\">4<\/a>: see Carlini et al. about deliberately triggering the recall of training data: <a href=\"https:\/\/arxiv.org\/pdf\/2012.07805.pdf\">Extracting Training Data from Large Language Models<\/a> <a href=\"#anchor4\">^<\/a><\/p>\n<p><a name=\"footnote5\">5<\/a>: jaeteekae: <a href=\"https:\/\/github.com\/jaeteekae\/DelayedTwitter\/blob\/0a0b03de74c03cfbf36877ffded0cb1312d59642\/get_top_twitter_accounts.py#L21\">DelayedTwitter<\/a> <a href=\"#anchor5\">^<\/a><\/p>\n<p><a name=\"footnote6\">6<\/a>: Probably not <em>too<\/em> many though. I asked some developers to help me label the cases, and everyone was prompted to flag any uncertainty with their judgement. That happened in only 34 cases, i.e. less than 10%. 
<a href=\"#anchor6\">^<\/a><\/p>\n<p><a name=\"footnote7\">7<\/a>: In the <a href=\"https:\/\/github.com\/github\/docs\/blob\/f817678adb6c490ec846e4fc15eb2236e4b2050c\/assets\/images\/help\/copilot\/matched_snippets.csv\">public dataset<\/a>, I list the part of Copilot&#8217;s suggestion that was also found in the training set, how often it was found, and a link to an example where it occurs in public code. For privacy reasons, I don&#8217;t include the not-matched part of the completion or the code context the user had typed (only an indication of its length). <a href=\"#anchor7\">^<\/a><\/p>\n<p><a name=\"footnote8\">8<\/a>: In fact, since this experiment has been made, GitHub Copilot <em>has<\/em> changed to require a minimum file content. So some of the suggestions flagged here would not have been shown by the current version. <a href=\"#anchor8\">^<\/a><\/p>\n<p><a name=\"footnote9\">9<\/a>: For example jenevans33: <a href=\"https:\/\/github.com\/jenevans33\/CS8803-1\/blob\/eca1bbc27ca6f7355dbc806b2f95964b59381605\/src\/Final\/ekfcode.py#L23\">CS8803-1<\/a> <a href=\"#anchor9\">^<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>GitHub Copilot: Parrot or Crow? 
A first look at rote learning in GitHub Copilot suggestions.<\/p>\n","protected":false},"author":2003,"featured_media":63782,"categories":[3293,3295],"tags":[2535],"yoast_head_json":{"title":"GitHub Copilot research recitation - The GitHub Blog","description":"GitHub Copilot: Parrot or Crow? A first look at rote learning in GitHub Copilot suggestions.","author":"Albert Ziegler","canonical":"https:\/\/github.blog\/ai-and-ml\/github-copilot\/github-copilot-research-recitation\/"}}