<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Benjamin Morel on Medium]]></title>
        <description><![CDATA[Stories by Benjamin Morel on Medium]]></description>
        <link>https://medium.com/@benmorel?source=rss-a8ebc8eeeb2f------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*yDS5NmGknWxTVRvbAmP6FQ.png</url>
            <title>Stories by Benjamin Morel on Medium</title>
            <link>https://medium.com/@benmorel?source=rss-a8ebc8eeeb2f------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 30 May 2026 06:44:03 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@benmorel/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[To JOIN or not to JOIN]]></title>
            <link>https://medium.com/@benmorel/to-join-or-not-to-join-bba9c1377c10?source=rss-a8ebc8eeeb2f------2</link>
            <guid isPermaLink="false">https://medium.com/p/bba9c1377c10</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[database]]></category>
            <category><![CDATA[join]]></category>
            <category><![CDATA[performance]]></category>
            <dc:creator><![CDATA[Benjamin Morel]]></dc:creator>
            <pubDate>Tue, 08 Jan 2019 23:08:15 GMT</pubDate>
            <atom:updated>2019-01-09T17:53:20.505Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*y6lhjvtw9zvIYpaEC5rvQg.jpeg" /></figure><p>The question crops up regularly on StackOverflow:</p><ul><li><a href="https://stackoverflow.com/q/1067016/759866">Are JOIN queries faster than several queries?</a></li><li><a href="https://stackoverflow.com/q/1932019/759866">Which provides better performance: one big join or multiple queries?</a></li><li><a href="https://dba.stackexchange.com/q/42998/5068">Are individual queries faster than joins?</a></li></ul><p>And many, many more. The question pretty much always boils down to: given a table A with a foreign key to table B, is it faster to perform a single query to load all A’s together with their B’s using a JOIN:</p><pre>SELECT A.*, B.* FROM A JOIN B ON B.id = A.b_id;</pre><p>Or to load the A’s alone, then query each B individually depending on the values returned by the first query:</p><pre>SELECT * FROM A;<br>SELECT * FROM B WHERE id = …;<br>SELECT * FROM B WHERE id = …;<br>SELECT * FROM B WHERE id = …;<br>…</pre><p>The latter approach is commonly referred to as the <em>N+1 problem</em> (we’ll see why below): you’re executing at most N + 1 queries, where N is the number of records returned from the first table.</p><p>Yet another way is to execute 2 queries, one to load the A’s, then another one to load all the B’s in a single query, depending on the result of the first query:</p><pre>SELECT * FROM A;<br>SELECT * FROM B WHERE id IN(…, …, …);</pre><p>Most of the answers you’ll find to the question “which one is faster?” will be along the lines of “JOIN will always be faster”, “it depends”, and <strong>“you should benchmark it”</strong>.</p><p>That’s exactly what we’re here for today.</p><h3>The benchmark</h3><p>I’m using a <a href="https://github.com/datacharmer/test_db">sample employee database</a> readily available on GitHub. I’m loading entries from the salaries table, each salary record referencing a distinct record from the employees table:</p><iframe src="https://cdn.embedly.com/widgets/media.html?url=https%3A%2F%2Fplot.ly%2F%7EBenMorel%2F60%2F&amp;src=https%3A%2F%2Fplot.ly%2F%7EBenMorel%2F60.embed%3Fautosize%3Dtrue&amp;type=text%2Fhtml&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;schema=plot" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/183290dd23c778170d2a8dc1f2f3809c/href">https://medium.com/media/183290dd23c778170d2a8dc1f2f3809c/href</a></iframe><p>No surprise here, <strong>JOIN is indeed, by far, the fastest way</strong>, followed by the WHERE IN approach.</p><p>The N+1 query performance, on the other hand, drops drastically as soon as you’re selecting more than 1 record: <strong>we’re talking about a 10x decrease in performance when loading just 15 records!</strong></p><p>The reason is that <em>each query to the database incurs a fixed cost</em>, even in the perfect conditions: connecting through a local socket, using real, non-emulated prepared statements.</p><h3>When does it matter?</h3><p>When you write your database queries by hand, you usually know what you’re doing, and should already be using JOINs as appropriate.</p><p>A common situation where the N+1 problem typically occurs though, is when using an ORM that abstracts these queries for you, such as Doctrine 2 for PHP. If you’re not careful enough, you may load a collection of entities and traverse associations (Salary -&gt; Employee in this case), and <strong>the ORM will lazy-load each record as you access it</strong>, effectively issuing N+1 queries — or more, if you have nested relationships.</p><p>Given the huge performance gap, you should be on the lookout for every N+1 your application may trigger:</p><ul><li>always eager load associations that you know you’re going to traverse</li><li>keep an eye on your logs to ensure that you’re not running N+1 queries without knowing it!</li></ul><h3>The exception to the rule</h3><p>If a lot of your referenced records are the same (a lot of A’s pointing to the same B), then JOINing records will lead to a lot of duplicated data in the result set, while a N+1 or WHERE IN approach would load the referenced records only once.</p><p>How does JOIN stack up against multiple queries, in this case?</p><p>Here is a benchmark of the best case scenario, where all salaries point to the same employee:</p><iframe src="https://cdn.embedly.com/widgets/media.html?url=https%3A%2F%2Fplot.ly%2F%7EBenMorel%2F62%2F&amp;src=https%3A%2F%2Fplot.ly%2F%7EBenMorel%2F62.embed%3Fautosize%3Dtrue&amp;type=text%2Fhtml&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;schema=plot" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/95c6a96ddc1ee4b64eeabea46beac034/href">https://medium.com/media/95c6a96ddc1ee4b64eeabea46beac034/href</a></iframe><p><em>(Note that I did not benchmark WHERE IN here, as for a single record, it will be the same as N+1.)</em></p><p>JOIN still wins here, but we see a trend as the curves seem to be close to bumping into each other. Let’s push the number of records further:</p><iframe src="https://cdn.embedly.com/widgets/media.html?url=https%3A%2F%2Fplot.ly%2F%7EBenMorel%2F64%2F&amp;src=https%3A%2F%2Fplot.ly%2F%7EBenMorel%2F64.embed%3Fautosize%3Dtrue&amp;type=text%2Fhtml&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;schema=plot" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/1daa5de04799ed295b766af3d0ef436e/href">https://medium.com/media/1daa5de04799ed295b766af3d0ef436e/href</a></iframe><p>Ah, now you have it. N+1 is faster than JOIN. For this to happen however, you need to select <em>a lot</em> of records in A, that point to a ridiculously low number of distinct records in B. I’d argue that this is rarely, if ever, the case in any application, so you can safely always use JOIN and be happy!</p><p>This is, of course, if the foreign key is on A (-to-one relationship); if the foreign key is on B (-to-many relationship), you may end up with rows from A duplicated many times, and are probably better off with a WHERE IN approach.</p><p>The benchmarks above have been run on PHP 7 (PDO) and MySQL 8 on localhost, using prepared statements. Your mileage <em>will</em> vary if you use another programming language / connector / database / dataset, but I expect the overall trend to be very similar. Drop a comment if you find out that it’s not!</p><p>You can find the benchmark code in <a href="https://gist.github.com/BenMorel/13bb06f084c9e4d066d4c0e5d98d233c">this Gist</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=bba9c1377c10" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Removing the MySQL root password]]></title>
            <link>https://medium.com/@benmorel/remove-the-mysql-root-password-ba3fcbe29870?source=rss-a8ebc8eeeb2f------2</link>
            <guid isPermaLink="false">https://medium.com/p/ba3fcbe29870</guid>
            <category><![CDATA[mysql]]></category>
            <dc:creator><![CDATA[Benjamin Morel]]></dc:creator>
            <pubDate>Thu, 07 Sep 2017 11:58:40 GMT</pubDate>
            <atom:updated>2020-04-30T15:00:47.793Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*42MAW4UcPH1l9AXmsySvWQ.jpeg" /></figure><p>There you go again. You just spinned up a virtual machine to do some testing, installed MySQL using your favourite package manager, started the server, and failed to connect:</p><pre>$ mysql --user=root<br>ERROR 1045 (28000): Access denied for user &#39;root&#39;@&#39;localhost&#39; (using password: NO)</pre><p>Since version 5.7, MySQL is secure-by-default:</p><ul><li>a random root password is generated upon installation; you need to read this password from the server log</li><li>you have to change this password the first time you connect</li><li>you cannot use a blank password because of the <a href="https://dev.mysql.com/doc/refman/5.7/en/validate-password-plugin.html">validate_password</a> plugin</li></ul><p>This is all good security-wise. But if you’re just installing MySQL on a local VM for your own testing, this can become really annoying.</p><p>To remove the MySQL root password, just run the following script right after installing <strong>and starting</strong> the MySQL server:</p><p>On MySQL 5.7:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ff791199c08f7ee599cfde8c0e1a3c50/href">https://medium.com/media/ff791199c08f7ee599cfde8c0e1a3c50/href</a></iframe><p>On MySQL 8.0:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cfa451f50d126b84edbceac6d7582e2f/href">https://medium.com/media/cfa451f50d126b84edbceac6d7582e2f/href</a></iframe><p>Note: <strong>you must execute this script as root</strong>.</p><p>The script performs the following actions:</p><ul><li>reads the temporary password from the log file</li><li>changes this password to another temporary password that passes the validate_password checks</li><li>uninstalls the validate_password plugin (or <em>component </em>in MySQL 8)</li><li>sets a blank password</li></ul><p>You can now connect without a password:</p><pre>$ mysql --user=root<br>Welcome to the MySQL monitor.  Commands end with ; or \g.<br>Your MySQL connection id is 7~<br>Server version: 5.7.19 MySQL Community Server (GPL)</pre><pre>Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.</pre><pre>Oracle is a registered trademark of Oracle Corporation and/or its<br>affiliates. Other names may be trademarks of their respective<br>owners.</pre><pre>Type &#39;help;&#39; or &#39;\h&#39; for help. Type &#39;\c&#39; to clear the current input statement.</pre><pre>mysql&gt;</pre><p>Be careful that this leaves your MySQL installation unsecured, <em>you should not use this for anything serious</em>!</p><h4>A secure alternative</h4><p>If you’re mainly using MySQL from the command line, you can keep the root account protected by a password, while still avoiding the inconvenience of having to provide the password on the command line.</p><p>Just create a ~/.my.cnf file:</p><pre>[client]<br>user = root<br>password = xxx</pre><p>You can now just type mysql, and the MySQL client will automatically log in with these credentials.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ba3fcbe29870" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Creating a Linux service with systemd]]></title>
            <link>https://medium.com/@benmorel/creating-a-linux-service-with-systemd-611b5c8b91d6?source=rss-a8ebc8eeeb2f------2</link>
            <guid isPermaLink="false">https://medium.com/p/611b5c8b91d6</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[linux]]></category>
            <category><![CDATA[devops]]></category>
            <dc:creator><![CDATA[Benjamin Morel]]></dc:creator>
            <pubDate>Tue, 05 Sep 2017 10:34:21 GMT</pubDate>
            <atom:updated>2019-05-09T09:01:42.213Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FRVJrLHTXN3S3VhpnWYmug.jpeg" /><figcaption>Crafting your own services — Photo by <a href="https://unsplash.com/photos/8z2Q6XWLYa4?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Jeff Sheldon</a> on <a href="https://unsplash.com/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure><p>While writing web applications, I often need to offload compute-heavy tasks to an asynchronous worker script, schedule tasks for later, or even write a daemon that listens to a socket to communicate with clients directly.</p><p>While there might sometimes be better tools for the job — always consider using existing software first, such as a task queue server —writing your own service can give you a level of flexibility you’ll never get when bound by the constraints of third-party software.</p><p>The cool thing is that it’s fairly easy to create a Linux service: use your favourite programming language to write a long-running program, and turn it into a service using systemd.</p><h3>The program</h3><p>Let’s create a small server using PHP. I can see your eyebrows rising, but it works surprisingly well. We’ll listen to UDP port 10000, and return any message received with a <a href="https://en.wikipedia.org/wiki/ROT13">ROT13</a> transformation:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c5940ba07d3c126c852e8530332258fc/href">https://medium.com/media/c5940ba07d3c126c852e8530332258fc/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/674e2c2adc4160cb6bf575a674c8dc1c/href">https://medium.com/media/674e2c2adc4160cb6bf575a674c8dc1c/href</a></iframe><p>Let’s start it:</p><pre>$ php server.php</pre><p>And test it in another terminal:</p><pre>$ nc -u 127.0.0.1 10000<br>Hello, world!<br>Uryyb, jbeyq!</pre><p>Cool, it works. Now we want this script to run at all times, be restarted in case of a failure (unexpected exit), and even survive server restarts. That’s where systemd comes into play.</p><h3>Turning it into a service</h3><p>Let’s create a file called /etc/systemd/system/rot13.service:</p><pre>[Unit]<br>Description=ROT13 demo service<br>After=network.target<br>StartLimitIntervalSec=0</pre><pre>[Service]<br>Type=simple<br>Restart=always<br>RestartSec=1<br>User=centos<br>ExecStart=/usr/bin/env php /path/to/server.php<br><br>[Install]<br>WantedBy=multi-user.target</pre><p>You’ll need to:</p><ul><li>set your actual username after User=</li><li>set the proper path to your script in ExecStart=</li></ul><p>That’s it. We can now start the service:</p><pre>$ systemctl start rot13</pre><p>And automatically get it to start on boot:</p><pre>$ systemctl enable rot13</pre><h3>Going further</h3><p>Now that your service (hopefully) works, it may be important to dive a bit deeper into the configuration options, and ensure that it will always work as you expect it to.</p><h4>Starting in the right order</h4><p>You may have wondered what the After= directive did. It simply means that your service must be started <em>after </em>the network is ready. If your program expects the MySQL server to be up and running, you should add:</p><pre>After=mysqld.service</pre><h4>Restarting on exit</h4><p>By default, systemd does not restart your service if the program exits for whatever reason. This is usually not what you want for a service that must be always available, so we’re instructing it to always restart on exit:</p><pre>Restart=always</pre><p>You could also use on-failure to only restart if the exit status is not 0.</p><p>By default, systemd attempts a restart after 100ms. You can specify the number of seconds to wait before attempting a restart, using:</p><pre>RestartSec=1</pre><h4>Avoiding the trap: the start limit</h4><p>I personally fell into this one more than once. By default, when you configure Restart=always as we did, <strong>systemd gives up restarting your service if it fails to start more than 5 times within a 10 seconds interval</strong>. Forever.</p><p>There are two [Unit] configuration <a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html#StartLimitIntervalSec=">options</a> responsible for this:</p><pre>StartLimitBurst=5<br>StartLimitIntervalSec=10</pre><p>The RestartSec directive also has an impact on the outcome: if you set it to restart after 3 seconds, then you can never reach 5 failed retries within 10 seconds.</p><p><strong>The simple fix that always works is to set </strong><strong>StartLimitIntervalSec=0. </strong>This way, systemd will attempt to restart your service forever.</p><p>It’s a good idea to set RestartSec to at least 1 second though, to avoid putting too much stress on your server when things start going wrong.</p><p>As an alternative, you can leave the default settings, and ask systemd to restart your server if the start limit is reached, using StartLimitAction=reboot.</p><h3>Is that really it?</h3><p>That’s all it takes to create a Linux service with systemd: writing a small configuration file that references your long-running program.</p><p>Systemd has been the default init system in RHEL/CentOS, Fedora, Ubuntu, Debian and others for several years now, so chances are that your server is ready to host your homebrew services!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=611b5c8b91d6" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[High-speed inserts with MySQL]]></title>
            <link>https://medium.com/@benmorel/high-speed-inserts-with-mysql-9d3dcd76f723?source=rss-a8ebc8eeeb2f------2</link>
            <guid isPermaLink="false">https://medium.com/p/9d3dcd76f723</guid>
            <category><![CDATA[database]]></category>
            <category><![CDATA[mysql]]></category>
            <dc:creator><![CDATA[Benjamin Morel]]></dc:creator>
            <pubDate>Mon, 04 Sep 2017 18:03:25 GMT</pubDate>
            <atom:updated>2019-06-21T14:10:57.785Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9Ihf50zErzTg4KR4JnodzA.jpeg" /><figcaption>Get the dolphin up to speed — Photo by <a href="https://unsplash.com/photos/5Xm_LIystCg?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">JIMMY ZHANG</a> on <a href="https://unsplash.com/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure><p>When you need to bulk-insert many million records in a MySQL database, you soon realize that sending INSERT statements one by one is not a viable solution.</p><p>The MySQL documentation has some <a href="https://dev.mysql.com/doc/refman/5.7/en/insert-optimization.html">INSERT optimization tips</a> that are worth reading to start with.</p><p>I will try to summarize here the two main techniques to efficiently load data into a MySQL database.</p><h3>LOAD DATA INFILE</h3><p>If you’re looking for raw performance, this is indubitably your solution of choice. LOAD DATA INFILE is a highly optimized, MySQL-specific statement that directly inserts data into a table from a CSV / TSV file.</p><p>There are two ways to use LOAD DATA INFILE. You can copy the data file to the server&#39;s data directory (typically /var/lib/mysql-files/) and run:</p><pre>LOAD DATA INFILE &#39;/path/to/products.csv&#39; INTO TABLE products;</pre><p>This is quite cumbersome as it requires you to have access to the server’s filesystem, set the proper permissions, etc.</p><p>The good news is, you can also store the data file <em>on the client side</em>, and use the LOCAL keyword:</p><pre>LOAD DATA <strong>LOCAL </strong>INFILE &#39;/path/to/products.csv&#39; INTO TABLE products;</pre><p>In this case, the file is read from the client’s filesystem, transparently copied to the server’s temp directory, and imported from there. All in all, <strong>it’s almost as fast as loading from the server’s filesystem directly</strong>. You do need to ensure that this <a href="https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_local_infile">option</a> is enabled on your server, though.</p><p>There are many options to LOAD DATA INFILE, mostly related to how your data file is structured (field delimiter, enclosure, etc.). Have a look at the <a href="https://dev.mysql.com/doc/refman/5.7/en/load-data.html">documentation</a> to see them all.</p><p>While LOAD DATA INFILE is your best option performance-wise, it requires you to have your data ready as delimiter-separated text files. If you don’t have such files, you’ll need to spend additional resources to create them, and will likely add a level of complexity to your application. Fortunately, there’s an alternative.</p><h3>Extended inserts</h3><p>A typical SQL INSERT statement looks like:</p><pre>INSERT INTO user (id, name) VALUES (1, &#39;Ben&#39;);</pre><p>An extended INSERT groups several records into a single query:</p><pre>INSERT INTO user (id, name) VALUES (1, &#39;Ben&#39;), (2, &#39;Bob&#39;);</pre><p>The key here is to find the optimal number of inserts per query to send. There is no one-size-fits-all number, so you need to benchmark a sample of your data to find out the value that yields the maximum performance, or the best tradeoff in terms of memory usage / performance.</p><p>To get the most out of extended inserts, it is also advised to:</p><ul><li>use prepared statements</li><li>run the statements in a transaction</li></ul><h3>The benchmark</h3><p>I’m inserting 1.2 million rows, 6 columns of mixed types, ~26 bytes per row on average. I tested two common configurations:</p><ul><li>Client and server on the same machine, communicating through a UNIX socket</li><li>Client and server on separate machines, on a very low latency (&lt; 0.1 ms) Gigabit network</li></ul><p>As a basis for comparison, I copied the table using INSERT … SELECT, yielding a performance of <strong>313,000 inserts / second</strong>.</p><h4>LOAD DATA INFILE</h4><p>To my surprise, LOAD DATA INFILE proves <em>faster </em>than a table copy:</p><ul><li>LOAD DATA INFILE: <strong>377,000</strong> inserts / second</li><li>LOAD DATA LOCAL INFILE over the network: <strong>322,000</strong> inserts / second</li></ul><p>The difference between the two numbers seems to be directly related to the time it takes to transfer the data from the client to the server: the data file is 53 MiB in size, and the timing difference between the 2 benchmarks is 543 ms, which would represent a transfer speed of 780 mbps, close to the Gigabit speed.</p><p>This means that, in all likelihood, <em>the MySQL server does not start processing the file until it is fully transferred</em>: your insert speed is therefore directly related to the bandwidth between the client and the server, which is important to take into account if they are not located on the same machine.</p><h4>Extended inserts</h4><p>I measured the insert speed using BulkInserter, a PHP class part of <a href="https://github.com/brick/db">an open-source library</a> that I wrote, with up to 10,000 inserts per query:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/1*k_QS1qtgN5-UyrDkjSRg_w.png" /></figure><p>As we can see, the insert speed raises quickly as the number of inserts per query increases. We got a 6× increase in performance on localhost and a 17× increase over the network, compared to the sequential INSERT speed:</p><ul><li>40,000 → 247,000 inserts / second on localhost</li><li>12,000 → 201,000 inserts / second over the network</li></ul><p>It takes around 1,000 inserts per query to reach the maximum throughput in both cases, but <strong>40 inserts per query are enough to achieve 90% of this throughput</strong> on localhost, which could be a good tradeoff here. It’s also important to note that <em>after a peak, the performance actually decreases</em> as you throw in more inserts per query.</p><p>The benefit of extended inserts is higher over the network, because sequential insert speed becomes a function of your latency:</p><pre>max sequential inserts per second ~= 1000 / ping in milliseconds</pre><p>The higher the latency between the client and the server, the more you’ll benefit from using extended inserts.</p><h3>Conclusion</h3><p>As expected, <strong>LOAD DATA INFILE is the preferred solution when looking for raw performance on a single connection</strong>. It requires you to prepare a properly formatted file, so if you have to generate this file first, and/or transfer it to the database server, be sure to take that into account when measuring insert speed.</p><p>Extended inserts on the other hand, do not require a temporary text file, and can give you around 65% of the LOAD DATA INFILE throughput, which is a very reasonable insert speed. It’s interesting to note that it doesn’t matter whether you’re on localhost or over the network, <strong>grouping several inserts in a single query always yields better performance</strong>.</p><p>If you decide to go with extended inserts, be sure to <strong>test your environment with a sample of your real-life data</strong> and a few different inserts-per-query configurations before deciding upon which value works best for you.</p><p>Be careful when increasing the number of inserts per query, as it may require you to:</p><ul><li>allocate more memory on the client side</li><li>increase the <a href="https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_allowed_packet">max_allowed_packet</a> setting on the MySQL server</li></ul><p>As a final note, it’s worth mentioning that according to Percona, you can achieve even better performance using concurrent connections, partitioning, and multiple buffer pools. See <a href="http://www.percona.com/blog/2011/01/07/high-rate-insertion-with-mysql-and-innodb/">this post on their blog</a> for more information.</p><p><em>The benchmarks have been run on a bare metal server running Centos 7 and MySQL 5.7, Xeon E3 @ 3.8 GHz, 32 GB RAM and NVMe SSD drives. The MySQL benchmark table uses the InnoDB storage engine.</em></p><p><em>The benchmark source code can be found in </em><a href="https://gist.github.com/BenMorel/78f742356391d41c91d1d733f47dcb13"><em>this gist</em></a><em>. The benchmark result graph is available on </em><a href="https://plot.ly/~BenMorel/52"><em>plot.ly</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9d3dcd76f723" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>