How to fix your local trackbacks ?

Monday, 16 March, Year 7 d.Tr. | Author: Mircea Popescu

If you're anything like me, you keep a blog like the sum total of human knowledge, and use newer articles to build upon older articles. In this approach to writing, the fact that newer articles send trackbacks to older articles is very useful, because the list of trackbacks on an older article can for instance indicate the more important nodes to the reader, among other benefits. As the blog grows this accumulated set of notations becomes quite invaluable.

Obviously Wordpress does a shit job of this task as it does a shit job of any other task. In this case, it will simply omit, for no apparent reason, to send some trackbacks. Notwithstanding that they're going from your own blog on your own server to your own blog on your own server, a good chunk get lost en route (imagine what happens when there's an actual internet between source and destination!). Pretty much the only guarantee Automattic offers is that if you write an article F with links to articles A, B, C, D and E, only some of A, B, C, D and E will receive the trackback. Certainly not all, and there's no good way to tell which. Talk about code that's poetry!

To fix this, you have to delve into the PHP :

  • Step 1. Traverse the database to produce a list of pingbacks that weren't properly sent.
    • <?php
      
      // Index of post at which script last ran. Script won't look
      // through earlier posts. You'll have to update the value yourself.
      $last_run = 0; 
      
      // Db connect data.
      
      $db_name = '';
      $db_user = '';
      $db_pass = '';
      
      $table_prefix  = '';
      
      $nconnection = mysql_connect("localhost", $db_user, $db_pass );
      mysql_select_db($db_name, $nconnection);
      
      // Part one : select all the posts that contain a link. Note the query
      // matches any link, not only links to your own blog.
      // Replace the url with your own.
      $local = "http://trilema.com/";
      
      $query = 'SELECT YEAR(post_date), post_name, post_content FROM '.
      $table_prefix.'posts WHERE post_type ="post" AND post_content LIKE
      "%<a href=%" AND ID > '.$last_run;
      $record = mysql_query($query);
      
      while ($row = mysql_fetch_array($record, MYSQL_NUM)) {
        $post_url = $local.$row[0]."/".$row[1];
      
        $dom = new DOMDocument();
        @$dom->loadHTML($row[2]);
      
        $xpath = new DOMXPath($dom);
        $hrefs = $xpath->evaluate("/html/body//a");
      
        for ($i = 0; $i < $hrefs->length; $i++) {
          $href = $hrefs->item($i);
          $url = $href->getAttribute('href');
      
          $parse = parse_url($url);
          echo 'curl -A "Mozilla/5.0" -r 0-4096 --connect-timeout 30 ';
          echo '--max-time 10 "http://';
          echo $parse['host'];
          echo '/xmlrpc.php" --header "Content-Type: text/xml" --data ';
          echo '"<?xml version=\"1.0\"?>';
          echo '<methodCall><methodName>pingback.ping</methodName>';
          echo '<params><param><value><string>';
          echo $post_url;
          echo '</string></value></param><param><value><string>';
          echo $url;
          echo '</string></value></param></params></methodCall>"'."\n";
        }
      }
      ?>

    This will output a lengthy list of curl commands. You probably want to save the script as a file on your server, say fix_trackbacks.php, after which you can call it from the command line, perhaps with something like

      $ curl http://your.domain/fix_trackbacks.php >> fix_trackbacks.sh

    (Make sure you edit the values of $db_name, $db_user, $db_pass and $local.)

  • Step 2. Connect to the server (via ssh), set the executable bit on the .sh script you just created ($ chmod +x fix_trackbacks.sh) and then execute it (./fix_trackbacks.sh >> result.txt).
  • Step 3. Once you're done, don't forget to set the $last_run variable to the ID of your latest post, so you don't have to go hunting for the correct value the next time you remember you need to run this. Or, alternatively, you could simply automate its function.
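
The curl lines printed in Step 1 are assembled by hand, so a link without a host (a relative url, a mailto:) or a url containing quotes or ampersands can yield a broken command. A minimal sketch of a more defensive way to build each line follows, assuming a Unix shell on the receiving end. The helper name build_ping_command is hypothetical and not part of the original script; the curl flags are the ones the script already uses.

```php
<?php
// Hypothetical helper, not part of the original script: builds one curl
// line for a pingback from $source (your post) to $target (the linked url).
// Returns null when $target has no host (relative links, mailto:, etc).
function build_ping_command($source, $target) {
    $host = parse_url($target, PHP_URL_HOST);
    if (!$host) {
        return null;
    }

    // XML-escape both urls so that an & in a query string stays valid XML.
    $xml = '<?xml version="1.0"?>'
         . '<methodCall><methodName>pingback.ping</methodName><params>'
         . '<param><value><string>' . htmlspecialchars($source, ENT_QUOTES)
         . '</string></value></param>'
         . '<param><value><string>' . htmlspecialchars($target, ENT_QUOTES)
         . '</string></value></param></params></methodCall>';

    // escapeshellarg() quotes each argument so the shell passes it verbatim.
    return 'curl -A "Mozilla/5.0" -r 0-4096 --connect-timeout 30 --max-time 10 '
         . escapeshellarg('http://' . $host . '/xmlrpc.php')
         . ' --header "Content-Type: text/xml" --data '
         . escapeshellarg($xml);
}
```

Inside the inner loop one would then call build_ping_command($post_url, $url) and echo the result followed by "\n" whenever it is not null.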

And in this manner, a very short half hour to an hour later, you'll be exactly in the position you should have been from the very beginning. If only Computer Science weren't afflicted with the terrible curse of playing social security net for the kids that aren't smart enough to sling dope.
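
The bookkeeping from Step 3 could be automated by keeping the last processed ID in a small state file rather than editing $last_run by hand. What follows is only a sketch under that assumption: the function names and the idea of a state file are hypothetical, not something the original script does.

```php
<?php
// Hypothetical helpers: persist the ID of the last processed post in a file.
// Any writable path outside the web root will do for $file.

// Returns the stored ID, or 0 when the file does not exist yet.
function read_last_run($file) {
    return is_readable($file) ? (int) trim(file_get_contents($file)) : 0;
}

// Stores $id ; LOCK_EX guards against two overlapping runs.
function write_last_run($file, $id) {
    return file_put_contents($file, (int) $id . "\n", LOCK_EX) !== false;
}
```

To use it, the script would set $last_run = read_last_run($file) at the top, add ID to the SELECT so the largest value seen can be tracked, and call write_last_run() with that value once the loop finishes.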

PS. There's probably a less... kludgy way to do this same thing, but I'll be damned before I'm paying any "Wordpress expert" a bent nickel, and I'll be triple damned and twice hexed before I'm paying any actual engineer to delve into the depths of that pile of encoded idiocy and figure out how the data structures work. It's a whole world of duct tape and chewing gum out there!

———
  1. Note that if your url scheme isn't domain/year/article like Trilema's, you'll have to fiddle with the $post_url string a little.
Category: Meta psihoza

5 Responses

  1. Somewhat related to

    > It's a whole world of duct tape and chewing gum out there!

    Is "$parse['host']/xmlrpc.php" the canonical link for XML-RPC services? The way I did it so far was to (manually) lift $xml_rpc_server from ... . I previously tried to automate this with various sh-isms (grep and awk), but this method is ugly and doesn't really work.

    So I guess this is a good time to ask: what's the correct way to grab the XML-RPC link that handles pingbacks?

  2. "..." above being: <head> <link rel="pingback" href="$xml_rpc_server" /> ... </head>.

  3. Mircea Popescu
    Wednesday, 6 February 2019

    > Is "$parse['host']/xmlrpc.php" the canonical link for XML-RPC services?

    It's the canonical link for mp-wp.

    There's an ad-hocism to provide [some semblance of] generality, if you curl,

    $ curl -v trilema.com

    [data not shown]

    < Date: Wed, 06 Feb 2019 14:22:59 GMT
    < Server: Apache/2.4.16 (Unix) OpenSSL/1.0.1e-fips mod_bwlimited/1.4
    < X-Powered-By: PHP/5.5.30
    < X-Pingback: http://trilema.com/xmlrpc.php
    < Cache-Control: max-age=0
    < Expires: Wed, 06 Feb 2019 14:22:59 GMT
    < Connection: close
    < Transfer-Encoding: chunked
    < Content-Type: text/html; charset=UTF-8

    [data not shown]

    So at least in principle you could look for that X-Pingback ; but I utterly do not see the point.
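
For readers who do want the lookup, here is a minimal sketch of it: check the X-Pingback response header first, then fall back to the <link rel="pingback"> element quoted in the second comment. The function name find_pingback_endpoint is hypothetical, and the regular expressions are a rough approximation rather than a full HTML parser.

```php
<?php
// Hypothetical helper: given raw response header lines (the shape that
// get_headers($url) returns by default) and optionally the page body,
// return the advertised pingback endpoint, or null if none is found.
function find_pingback_endpoint(array $header_lines, $html = '') {
    foreach ($header_lines as $line) {
        // Header names are case-insensitive, hence the /i flag.
        if (preg_match('/^X-Pingback:\s*(\S+)/i', $line, $m)) {
            return $m[1];
        }
    }

    // Fallback: <link rel="pingback" href="..."> somewhere in the page.
    if (preg_match('/<link\s[^>]*rel=["\']pingback["\'][^>]*>/i', $html, $tag)
        && preg_match('/href=["\']([^"\']+)["\']/i', $tag[0], $m)) {
        return html_entity_decode($m[1]);
    }

    return null;
}
```

A caller would pass in the result of get_headers() for the target url and, only if that yields nothing, the first few kilobytes of the page itself.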

    In a similar situation 20 years ago, robots.txt was similarily imposed by google (back when google was more or less the MP of the internet). They did it because it was the right thing to do, rather than passive-aggressive white cuckboy nonsense with invented headers & other "generalities" in this peculiar sense of "i did not want to assume" trans-friendly bs.

  4. Mircea Popescu
    Saturday, 16 March 2019

    For the curious, just ran this script, producing a 1.8mb (!!) bash script which recuperates 3912 possible pingbacks (though many of them are to places like btcbase, which doesn't pingback). It hadn't run since January last year.

  1. [...] of most recent accepted comments. ...awstats as per discussion. ...force missing pingbacks tool, as described. ———One-click shortcut to inserting lengthier fixed forms such as [...]
