Fixing the Trilema reference base.
The title originally read "fixing the Trilema trackbase". Then I altered it to "pingbase". Then I realised I don't know why the other should be excluded, so I changed it again, to "trackbase & pingbase". Then it finally dawned on me : there's two poorly defined terms in use, let's fucking define them!
So : a trackback is a type of comment, like a spam comment or an interesting comment. Its top node is comment, and like all comments it is published at the bottom of an article, it has a specific url pointing back to it and it gets a count in the comment stream etcetera. Pinging back, meanwhile, is the act of sending the information that'll be processed such that a trackback is perhaps eventually produced[i], and therefore a pingback is that evanescent set of packets, and therefore a blog will contain a machine for turning pingbacks into trackbacks, much like it has a machine for turning comments (in the sense of -- strings people sent) into comments (in the sense of -- strings published on the blog). Don't ask me why there's need for two terms -- ask instead yourself, how the fuck did you live before I showed up to disambiguate your world for you.
But anyways -- these can collectively be referred to as the reference base, and it occurs to me that while Trilema pretty much invented this in any practical sense, it also could stand some improvement. Such is the life of those who "invent" things in the practical sense : they can always improve on them. Everyone else's stuck with perfections.
Specifically : some trackbacks substantially repeat, because at some early point as the trackback url format changed to permit selection, the p-t engine failed to identify two substantially identical but formally different urls as identical. Some trackbacks are missing[ii], also. Finally, and most importantly : none of the 14`537[iii] trackbacks currently extant actually use the new selection scheme.
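For illustration only, the duplicates problem can be sketched as a normalisation step : reduce each referring url to a canonical form before comparing. The sketch assumes the formally different variants differ only by fragment, query string or trailing slash, which the text does not confirm ; `canonical_url`, the example urls and their query parameters are all made up for the purpose.

```shell
#!/bin/sh
# Hypothetical helper: reduce a url to a canonical form so that formally
# different variants of the same reference compare equal.
# Assumption (not from the article): variants differ only by fragment,
# query string or a single trailing slash.
canonical_url() {
    url=$1
    url=${url%%#*}    # drop the fragment, if any
    url=${url%%\?*}   # drop the query string, if any
    url=${url%/}      # drop a single trailing slash
    printf '%s\n' "$url"
}

# Two formally different urls that collapse to the same canonical form.
a=$(canonical_url "http://example.com/2016/some-article/?from=foo&to=bar#sel")
b=$(canonical_url "http://example.com/2016/some-article")
[ "$a" = "$b" ] && echo "duplicate"
```

Whether throwing away the query string is acceptable depends on what the selection scheme encodes in it ; a real comparison might have to keep some parameters and drop others.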
What they do have, of course, is the original date when they were sent. I do not believe this carries much value in self-referential terms, however. For one thing, trackbacks are sorted at the end of the comment section anyway, meaning a trackback however old will still come after a comment however new. For the other thing, all trackbacks link back to an article, which has a publishing date -- people confused by the situation wherein an article from May 2016 was apparently pingbacked by an article from June 2017 in 2020 have more pressing matters to resolve than any involvement with Trilema at all.
So therefore : I am contemplating this measure whereby I will delete all trackbacks sent from Trilema to Trilema ; and then reinstate them with the novel machinery. The list is all ready[iv] ; it comes off a reviewed version[v] of ye venerable trackback fixer.
On the downside, this is expected to generate, as you can see, upwards of 50`000 "new" comments on Trilema, as far as the RSS readers are concerned, that aren't however new in any substantial sense. Sorry for the inconvenience ; the fix is intended to go in sometime tonight, so with apologies for the noise,
I remain your most faithful,
first,
&foremost
etcetera.
- Tracking back, counterdistinctly, is the possibility of following the pinging blogs from the pinged blog, on the basis of the trackbacks it has published. [↩]
- Sadly, I still don't have a good solution to this problem. What's even sadder is that... well... I don't expect a good solution will be possible on the current TCP/IP infrastructure. [↩]
- Here, for the minutiously curious :
mysql> select count(*) from tril_comments where comment_type="pingback";
+----------+
| count(*) |
+----------+
|    14537 |
+----------+
1 row in set (0.07 sec)
And since we started :
mysql> select count(*) from tril_comments where comment_type="pingback" and comment_author_url like "%trilema.com%";
+----------+
| count(*) |
+----------+
|    12630 |
+----------+
1 row in set (0.08 sec)
That'd be about 86.9%, 2009 - 2020. [↩]
- cat pings.sh | wc -l
86133
cat pings.sh | grep -cv "trilema.com/xmlrpc"
67963
Look at that, 78.9%. I wonder what this means!
What say you, Ringo, what does it mean ? Does it mean that I'm the righteous man, with Mr .45 here protecting my righteous ass in the valley of darkness ? Or does it mean it's the world that's evil and shellfish ?
Or is the truth simply that you're weak, and so imagine tyrannies of "evil men" ? How is it that after well over a decade, the outbound still beats the inbound, how is it that your entire world is still so much smaller than my hand, my writ and my deed ? [↩]
- Here you go :
<?
// Db connect data, fill in your own.
$db_name = '';
$db_user = '';
$db_pass = '';
$table_prefix = '';
$nconnection = mysql_connect("localhost", $db_user, $db_pass );
mysql_select_db($db_name, $nconnection);
// Index of post at which script last ran. Script won't look through earlier posts.
// You have to update the value manually.
$last_run = 90949;
// Part one : select all the articles that contain a link.
$query = 'SELECT YEAR(post_date), post_name, post_content FROM '.$table_prefix.'posts WHERE post_type ="post" AND post_content LIKE "%<a href=%" AND ID > '.$last_run;
$record = mysql_query($query);
while ( $row = mysql_fetch_array($record, MYSQL_NUM)) {
// Construct the pinging url from that data
$post_url = "http://trilema.com/".$row[0]."/".$row[1];
// Parse the dom of the article pinging, to extract links out
$dom = new DOMDocument();
@$dom->loadHTML($row[2]);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
// For each such link, output the corresponding magic strings that will allow curl to send the needed pingbacks
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
// Eliminate article links to itself (these pingbacks don't ever yield trackbacks)
// Also eliminate links out to a .jpg image -- the list can be readily extended.
if ((parse_url($url, PHP_URL_PATH) != parse_url($post_url, PHP_URL_PATH)) && (substr(parse_url($url, PHP_URL_PATH),-4,4) != ".jpg")) {
echo 'curl -A "Mozilla/5.0" -r 0-4096 --connect-timeout 30 --max-time 10 "http://';
echo parse_url($url, PHP_URL_HOST);
echo '/xmlrpc.php" --header "Content-Type: text/xml" --data "<?xmlversion="1.0"?><methodCall><methodName>pingback.ping</methodName><params><param><value><string>';
echo $post_url;
echo '</string></value></param><param><value><string>';
echo str_replace("'","'",$url);
echo '</string></value></param></params></methodCall>"'."\n";
}
}
}
?>[↩]
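For the curious, a single line of the sort the script emits can be approximated by hand. What follows is a minimal sketch, not the machinery actually in use : `pingback_payload` and `send_pingback` are hypothetical names, the urls are placeholders, and the endpoint is assumed to sit at /xmlrpc.php exactly as the script above assumes. The `pingback.ping` call taking the source first and the target second follows the Pingback 1.0 specification.

```shell
#!/bin/sh
# Hypothetical helper: print the XML-RPC body of a pingback.ping call.
# $1 is the source (the article containing the link), $2 the target.
# Urls containing & or < would need XML escaping, which is not handled here.
pingback_payload() {
    printf '<?xml version="1.0"?><methodCall><methodName>pingback.ping</methodName><params>'
    # printf reuses the format once per argument, so this emits two params.
    printf '<param><value><string>%s</string></value></param>' "$1" "$2"
    printf '</params></methodCall>\n'
}

# Hypothetical helper: send one pingback to a blog's xmlrpc.php endpoint.
# Defined for illustration only; it is never called in this sketch.
send_pingback() {
    host=$1
    source_url=$2
    target_url=$3
    pingback_payload "$source_url" "$target_url" |
        curl -A "Mozilla/5.0" --connect-timeout 30 --max-time 10 \
             --header "Content-Type: text/xml" --data @- \
             "http://$host/xmlrpc.php"
}

# Print the payload only; nothing is sent.
pingback_payload "http://example.com/2017/pinging-article" \
                 "http://example.org/2016/pinged-article"
```

Feeding the body to curl on stdin sidesteps the nested quoting that a generated one-liner has to contend with.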
Wednesday, 8 January 2020
This is now completed. The new count stands at 14`319 trackbacks as of today, meaning that
- only about 2k trackbacks were in fact missing (the figure is likely larger than 14319-12630 due to the duplicates issue discussed).
- the difference between 80k links detected and 14k local trackbacks delivered would have to be no less than 66k links to other blogs, which other blogs are either so very broken they don't know what to do with a pingback, or else run by such intellectually nil operators they don't know what to do with a pingback -- I expect less than 1% of the outreached base fails to qualify for one or the other of those foregoing categories.
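The figures quoted above and in the footnotes can be rechecked with a few lines of shell. This is merely a sketch of the arithmetic : the variable names and the `pct` helper are made up, the numbers are the ones given in the text.

```shell
#!/bin/sh
# Figures as quoted in the article and its footnotes.
total_before=14537    # all trackbacks before the fix
local_before=12630    # of which the author url matches trilema.com
lines_total=86133     # lines in pings.sh
lines_foreign=67963   # lines not aimed at trilema.com/xmlrpc
count_after=14319     # new count quoted after the fix

# Hypothetical helper: n as a percentage of d, to one decimal place.
pct() { awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'; }

echo "local share before: $(pct "$local_before" "$total_before")%"       # 86.9%
echo "foreign share of pings: $(pct "$lines_foreign" "$lines_total")%"   # 78.9%
echo "new count minus old local count: $((count_after - local_before))"  # 1689
```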