PHP script to clean up table data

You just found a great source of data through your favorite search engine. The data is formatted as a table, and you'd love to grab a copy to use in your application or display on your own page. You pull up the HTML source, and the table markup makes your stomach uneasy. It renders nicely in the browser, but the HTML behind it is a mess. How will you get this great data out?

There are many ways to clean up or scrape this sort of data. I am going to discuss a quick and dirty way to get it into PHP arrays; from there you can use it in your app or generate better HTML for the table. The idea is to copy each column into its own list and have PHP combine the lists into rows, reading the same index from every list to pick the right value for each row and column. In Firefox you can quickly copy a single column by holding down the control (Ctrl) key and dragging a selection down that column of the table. Once the cells you want are highlighted, copy them and paste them into your script, one heredoc block per column, following the example.


<?php

 $cols = array(
  <<<ENDTEXT
75
103
60
13
ENDTEXT
  ,
  <<<ENDTEXT
132
132
132
15
ENDTEXT
  ,
  <<<ENDTEXT
12/11/2007
12/11/2007
12/11/2007
03/05/2010
ENDTEXT
  ,
  <<<ENDTEXT
18,018
21,585
8,844
3,025
ENDTEXT
  ,
 );

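 // Split each pasted column into one cell per line; trim() strips stray "\r" and spaces.
 $col1 = array_map('trim', explode("\n", array_shift($cols)));
 $col1_count = count($col1);

 $arrs = array();
 foreach($cols as $c) {
  $n = array_map('trim', explode("\n", $c));
  // Every column must have the same number of cells as the first one.
  if ( count($n) != $col1_count ) {
   die('column length mismatch');
  }
  array_push($arrs,$n);
 }

 // Build one array per table row by reading the same index from every column.
 $data = array();
 foreach($col1 as $i => $v) {
  $n = array($v);
  foreach($arrs as $a) {
   array_push($n,$a[$i]);
  }
  array_push($data,$n);
 }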

 var_dump($data);

 echo '<table border="1">';
 foreach($data as $tr) {
  echo '<tr>';
  foreach($tr as $td) {
   echo '<td>'. htmlspecialchars($td) .'</td>';
  }
  echo '</tr>';
 }
 echo '</table>';

?>

Make sure you copy the same number of cells for every column; the script stops with 'column length mismatch' if the counts differ. When it finishes, the $data array holds one array per table row. From there you can take it a step further and generate a new, more concise table with much cleaner markup.
It's as easy as that.
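
If you find yourself doing this more than once, the same steps can be wrapped in a pair of small functions. This is only a rough sketch: columns_to_rows() and rows_to_table() are names I made up for illustration, and it assumes PHP 5.5 or later for array_column(). It splits on any line ending, checks that all columns have the same length, and escapes each cell before printing.

```php
<?php
// Sketch only: columns_to_rows() and rows_to_table() are hypothetical helper
// names, not part of the script above. Assumes PHP 5.5+ (array_column).

function columns_to_rows(array $columns)
{
    $lists = array();
    foreach ($columns as $text) {
        // \R matches \n, \r\n and \r, so pastes from any platform split cleanly.
        $lists[] = array_map('trim', preg_split('/\R/', trim($text)));
    }
    if (!$lists) {
        return array();
    }
    // Every column must hold the same number of cells.
    if (count(array_unique(array_map('count', $lists))) > 1) {
        throw new InvalidArgumentException('column length mismatch');
    }
    // Row $i is made of cell $i from each column.
    $rows = array();
    foreach (array_keys($lists[0]) as $i) {
        $rows[] = array_column($lists, $i);
    }
    return $rows;
}

function rows_to_table(array $rows)
{
    $html = "<table>\n";
    foreach ($rows as $row) {
        // Escape each cell so pasted text cannot break the markup.
        $cells = array_map('htmlspecialchars', $row);
        $html .= ' <tr><td>' . implode('</td><td>', $cells) . "</td></tr>\n";
    }
    return $html . "</table>\n";
}

// Example data: the first two rows of the table used earlier in this post.
$rows = columns_to_rows(array(
    "75\n103",
    "132\n132",
    "12/11/2007\n12/11/2007",
    "18,018\n21,585",
));
echo rows_to_table($rows);
```

Throwing an exception instead of calling die() is a design choice for this sketch: it lets the calling code decide what to do when a column was copied short.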
