help depositing file data into database using PHP

@TheDragonReborn, Oct 09.2008

Hello to all,

I am having trouble inserting part of an uploaded file's contents into a MySQL database. By "part" I mean that there are only certain lines in the input file that I want to extract. My problem isn't with connecting to the database or uploading the file; it is with the code that operates on the file. Here is the code that I have:

[CODE]<?php

// Eliminate Error Notice
error_reporting(E_ALL & ~E_NOTICE) ;

// This will get an input file from an html page and deposit it
// into a mySQL database
// Gregory Koenig
// Last modified – 10/9/08

// Test if file was uploaded
if(! $_FILES['dataFile']['tmp_name'])
{
    echo "<html><body><h1>No file uploaded</h1></body></html>" ;
    exit ;
}

// Test connection to mySQL
// Don't forget to change the password to '******' !!!!!
$link = mysql_connect( 'localhost', '*******', '*******' )
    or die( 'Could not connect to mySQL:'.mysql_error() ) ;

echo "Successful connection !! \n\n" ;

// Test connection to database 'gkoenig'
mysql_select_db( 'gkoenig') or die('Could not connect to database!') ;

echo "Successful selection of database !! \n\n" ;

// This is where we input the file into the database
// using mySQL commands

$fh = fopen( $_FILES['dataFile']['tmp_name'], 'r') ;

// Parse through Swiss-Prot file and extract the seq_id (Accession),
// seq_type (PRT, DNA, RNA), and seq_data

while($text = fgets($fh))
{
    if(feof($fh)) { exit ; }

    // Grab the seq_type from first line
    if(strstr($text, "ID"))
    {
        $token = strtok($text, "\t") ;
        $pre_seq_type = $token[3] ;
        // Knock off semi-colon
        $seq_id = $token[0].$token[1].$token[2] ;
    }

    // Grab the seq_id from the second line
    if(strstr($text, "AC"))
    {
        $token2 = strtok($text, "\t") ;
        $seq_id = $token2[1] ;
    }

    // Jump to sequences and collect
    if(strstr($text, "SQ"))
    {
        fgets($fh) ; // Jump to the next line

        // Collect sequence
        while( ($c = fgetc($fh)) != "/")
        {
            $seq_data[] = $c ;
        }
    }

    // Put data into Database
    $query = "INSERT into sequences VALUES ('".$seq_id."', '".$seq_type."', '".$seq_data."') " ;

    $result = mysql_query($query) or die('Data insertion failed:'.mysql_error() ) ;

    // Empty sequence array so that it can accept new data
    while ($seq_data)
    {
        array_pop($seq_data) ;
    }

}

?>

<html>
<head>
<title>Sequences in Database</title>
</head>

<body>
<h1>Sequences in Database</h1>

<?php
/*
// Show data that was input into 'sequences' Database

$query2 = 'SELECT * from sequences' ;

$result2 = mysql_query($query2) or die('Query failed:'.mysql_error() ) ;

echo "<table border=1> \n" ;

while( $row = mysql_fetch_assoc($result2) )
{
    echo "\t<tr>\t" ;

    foreach ($row as $col)
    {
        echo "\t\t<td>$col</td>\n" ;
    }

    echo "\t</tr>\n" ;
}

echo "</table>\n" ;
*/
?>

</body>

</html>[/CODE]
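Two further problems in the INSERT step, independent of the parsing: `$seq_data` is an array at that point, so concatenating it into the query inserts the literal text `Array`; and the `mysql_*` functions were removed in PHP 7, while SQL built by string concatenation breaks as soon as a value contains a quote. A minimal sketch of a single-row insert with a PDO prepared statement follows. It assumes the `sequences` table has columns named `seq_id`, `seq_type` and `seq_data` (a guess from the variable names, since the original `INSERT` lists no columns); `insert_sequence()` is a hypothetical helper, and an in-memory SQLite database stands in for MySQL so the sketch runs on its own.

```php
<?php
/**
 * Hypothetical helper: insert one parsed entry as a single row.
 * Assumes a table sequences(seq_id, seq_type, seq_data).
 */
function insert_sequence(PDO $pdo, string $seqId, string $seqType, string $seqData): void
{
    // Placeholders keep quotes or other odd characters in the data from breaking the SQL.
    $stmt = $pdo->prepare(
        'INSERT INTO sequences (seq_id, seq_type, seq_data) VALUES (?, ?, ?)'
    );
    $stmt->execute([$seqId, $seqType, $seqData]);
}

// For MySQL the connection would look something like (placeholder credentials):
// $pdo = new PDO('mysql:host=localhost;dbname=gkoenig', 'user', 'password');
// Here an in-memory SQLite database is used so the sketch runs without a server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE sequences (seq_id TEXT, seq_type TEXT, seq_data TEXT)');

// The sequence must already be a plain string, not an array of characters.
insert_sequence($pdo, 'Q04917', 'PRT', 'MGDREQLLQRARLAEQAERY');

$rows = $pdo->query('SELECT seq_id, seq_type, seq_data FROM sequences')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```

Calling `insert_sequence()` once per entry, rather than once per line, is what keeps the table at one row per sequence.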

The file that it will work on looks like this (I will only include the relevant portions):

ID 1433F_HUMAN Reviewed; 246 AA.
AC Q04917;
DT 01-OCT-1993, integrated into UniProtKB/Swiss-Prot.
DT 23-JAN-2007, sequence version 4.
.
.
.   (this represents a bunch of info that will be skipped)
.
.
FT TURN 213 215
FT HELIX 216 234
SQ SEQUENCE 246 AA; 28219 MW; D70FBC100C45D6E5 CRC64;
MGDREQLLQR ARLAEQAERY DDMASAMKAV TELNEPLSNE DRNLLSVAYK NVVGARRSSW
RVISSIEQKT MADGNEKKLE KVKAYREKIE KELETVCNDV LSLLDKFLIK NCNDFQYESK
VFYLKMKGDY YRYLAEVASG EKKNSVVEAS EAAYKEAFEI SKEQMQPTHP IRLGLALNFS
VFYYEIQNAP EQACLLAKQA FDDAIAELDT LNEDSYKDST LIMQLLRDNL TLWTSDQQDE
EAGEGN
//
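One likely reason the `ID` and `AC` branches of the script produce the wrong values: `strtok()` returns only the first token, as a string, so `$token[3]` is the fourth *character* of that token, not the fourth field. Swiss-Prot lines are also separated by runs of spaces rather than tabs, so `"\t"` never splits anything. A minimal sketch that splits the `ID` line on whitespace with `preg_split()` instead; `parse_id_line()` is a hypothetical helper, and mapping the `AA` length unit to `PRT` is an assumption based on the `seq_type` values (PRT, DNA, RNA) named in the script's comments.

```php
<?php
// Demonstrates why strtok() + indexing fails, and a whitespace split that works.
$line = "ID   1433F_HUMAN             Reviewed;         246 AA.\n";

// strtok() returns only the FIRST token, as a string.
$first = strtok($line, " ");   // "ID"
// Indexing a string gives single characters, not fields.
$char = $first[1];             // "D"

/**
 * Hypothetical helper: split a Swiss-Prot ID line into named fields.
 */
function parse_id_line(string $line): array
{
    // Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    // $fields is ["ID", "1433F_HUMAN", "Reviewed;", "246", "AA."]
    $unit = rtrim($fields[4], '.');                  // knock off the trailing period
    return [
        'entry_name' => $fields[1],
        'status'     => rtrim($fields[2], ';'),      // knock off the semicolon
        'length'     => (int) $fields[3],
        // Assumption: "AA" (amino acids) is stored as seq_type "PRT".
        'seq_type'   => $unit === 'AA' ? 'PRT' : $unit,
    ];
}

$id = parse_id_line($line);
echo $id['entry_name'], ' ', $id['seq_type'], "\n";
```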

When I run the script, it does not place what I want into the database.

Any help would be great. If I haven't been clear enough in this post, I will post any other info I can.

Thanks


@TheDragonReborn (author), Oct 09.2008:

Perhaps to help in fixing my problem, here is the HTML code:

The PHP script is in my other submission. I would show the output, but the page is too long. Essentially, I want a single row in my MySQL database to contain:

seq_id, seq_type, seq_data

example:

A12345 PRT MASDFJDFDKAHDSJD....

[CODE]<!-- An HTML page that uploads an input file to a PHP action
 file so that it can be added to a MySQL database
 Last modified 10/9/08 -->
 
 <html>
 
 <head>
 <title>Depository for Swiss-Prot sequences</title>
 </head>
 
 <body bgcolor="grey">
 
  <h1>Add file to Database</h1>
 
  <form enctype="multipart/form-data"
  action="<ip address>" method="post">
 
  Data input file: 
  <input type="file" name="dataFile" size="50">
  <br/><br/>
 
 
  <input type="submit" value="Add sequence to Database">
  </form>
 
 </body>
 
 </html>[/CODE]
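On the PHP side, the form's `name="dataFile"` field arrives as `$_FILES['dataFile']`. A sketch of a slightly stricter check than testing `tmp_name` alone: it looks at the `error` code PHP records for every upload. `uploaded_tmp_path()` is a hypothetical helper that takes the `$_FILES` array as a parameter so it can be exercised without a real request; in the live script you would also call `is_uploaded_file()` on the returned path.

```php
<?php
/**
 * Hypothetical helper: return the temp path of an uploaded file, or null.
 * $files is normally $_FILES; passing it in keeps the check testable.
 */
function uploaded_tmp_path(array $files, string $field): ?string
{
    if (!isset($files[$field]['error'])) {
        return null;                       // field missing from the form post
    }
    if ($files[$field]['error'] !== UPLOAD_ERR_OK) {
        return null;                       // e.g. UPLOAD_ERR_NO_FILE, UPLOAD_ERR_INI_SIZE
    }
    return $files[$field]['tmp_name'];
}

// In the real script:
// $path = uploaded_tmp_path($_FILES, 'dataFile');
// if ($path === null || !is_uploaded_file($path)) {
//     echo "<html><body><h1>No file uploaded</h1></body></html>";
//     exit;
// }
```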

Any help would be great. I am new to PHP and only a novice to programming so I could really use the help.

thanks

@opifex, Oct 10.2008:

Verify that your "file" is constructed correctly.

The way I see things, "SQ SEQUENCE 246 AA; " would write "SEQUENCE 246 AA" in the table. Is that what you want?

You might be better off putting your "file" in a better order (CSV maybe) to make it easier to process.

@TheDragonReborn (author), Oct 10.2008:

The file format is predetermined; it's a flat file downloaded from a website. I think my use of strstr() isn't working. When I check the contents of my variable $seq_id, it does not contain what it's supposed to (a six-character unique identifier).

@opifex, Oct 10.2008:

Can you download a copy of the file, exactly as it downloads, and zip it so we can take a look at the structure? There is probably something really simple that's been overlooked.

@TheDragonReborn (author), Oct 10.2008:

Here is an example file (it's not too long):

[CODE]ID   1433F_HUMAN             Reviewed;         246 AA.
 AC   Q04917;
 DT   01-OCT-1993, integrated into UniProtKB/Swiss-Prot.
 DT   23-JAN-2007, sequence version 4.
 DT   22-JUL-2008, entry version 93.
 DE   RecName: Full=14-3-3 protein eta;
 DE   AltName: Full=Protein AS1;
 GN   Name=YWHAH; Synonyms=YWHA1;
 OS   Homo sapiens (Human).
 OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
 OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
 OC   Catarrhini; Hominidae; Homo.
 OX   NCBI_TaxID=9606;
 RN   [1]
 RP   NUCLEOTIDE SEQUENCE [MRNA].
 RC   TISSUE=Brain;
 RX   MEDLINE=94032477; PubMed=8218406; DOI=10.1016/0167-4781(93)90053-G;
 RA   Swanson K.D., Dhar M.S., Joshi J.G.;
 RT   "The human and bovine 14-3-3 eta protein mRNAs are highly conserved in
 RT   both their translated and untranslated regions.";
 RL   Biochim. Biophys. Acta 1216:145-148(1993).
 RN   [2]
 RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
 RC   TISSUE=Brain;
 RX   MEDLINE=92251832; PubMed=1578511;
 RA   Ichimura-Ohshima Y., Morii K., Ichimura T., Araki K., Takahashi Y.,
 RA   Isobe T., Minoshima S., Fukuyama R., Shimizu N., Kuwano R.;
 RT   "cDNA cloning and chromosome assignment of the gene for human brain
 RT   14-3-3 protein eta chain.";
 RL   J. Neurosci. Res. 31:600-605(1992).
 RN   [3]
 RP   NUCLEOTIDE SEQUENCE [MRNA].
 RA   Leffers H., Tommerup N., Celis J.E.;
 RL   Submitted (MAR-1994) to the EMBL/GenBank/DDBJ databases.
 RN   [4]
 RP   NUCLEOTIDE SEQUENCE [MRNA].
 RX   MEDLINE=96123461; PubMed=8561965;
 RA   Muratake T., Hayashi S., Ichimura Y., Morii K., Kuwano R.,
 RA   Ichikawa T., Kumanishi T., Isobe T., Watanabe M., Kondo H.;
 RT   "The effect on methamphetamine on the mRNA level for 14.3.3 eta chain
 RT   in the human cultured cells.";
 RL   Mol. Neurobiol. 11:223-230(1995).
 RN   [5]
 RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
 RX   MEDLINE=96411648; PubMed=8812417; DOI=10.1006/geno.1996.0426;
 RA   Muratake T., Hayashi S., Ichikawa T., Kumanishi T., Ichimura Y.,
 RA   Kuwano R., Isobe T., Wang Y., Minoshima S., Shimizu N., Takahashi Y.;
 RT   "Structural organization and chromosomal assignment of the human 14-3-
 RT   3 eta chain gene (YWHAH).";
 RL   Genomics 36:63-69(1996).
 RN   [6]
 RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
 RX   PubMed=15461802; DOI=10.1186/gb-2004-5-10-r84;
 RA   Collins J.E., Wright C.L., Edwards C.A., Davis M.P., Grinham J.A.,
 RA   Cole C.G., Goward M.E., Aguado B., Mallya M., Mokrab Y., Huckle E.J.,
 RA   Beare D.M., Dunham I.;
 RT   "A genome annotation-driven approach to cloning the human ORFeome.";
 RL   Genome Biol. 5:RESEARCH84.1-RESEARCH84.11(2004).
 RN   [7]
 RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
 RX   MEDLINE=20057165; PubMed=10591208; DOI=10.1038/990031;
 RA   Dunham I., Hunt A.R., Collins J.E., Bruskiewich R., Beare D.M.,
 RA   Khan A.S., Lane L., Tilahun Y., Wright H.;
 RT   "The DNA sequence of human chromosome 22.";
 RL   Nature 402:489-495(1999).
 RN   [8]
 RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
 RC   TISSUE=Lymph;
 RX   PubMed=15489334; DOI=10.1101/gr.2596504;
 RG   The MGC Project Team;
 RT   "The status, quality, and expansion of the NIH full-length cDNA
 RT   project: the Mammalian Gene Collection (MGC).";
 RL   Genome Res. 14:2121-2127(2004).
 RN   [9]
 RP   NUCLEOTIDE SEQUENCE [MRNA] OF 27-225.
 RC   TISSUE=Keratinocyte;
 RX   MEDLINE=93294871; PubMed=8515476; DOI=10.1006/jmbi.1993.1346;
 RA   Leffers H., Madsen P., Rasmussen H.H., Honore B., Andersen A.H.,
 RA   Walbum E., Vandekerckhove J., Celis J.E.;
 RT   "Molecular cloning and expression of the transformation sensitive
 RT   epithelial marker stratifin. A member of a protein family that has
 RT   been involved in the protein kinase C signalling pathway.";
 RL   J. Mol. Biol. 231:982-998(1993).
 RN   [10]
 RP   PROTEIN SEQUENCE OF 2-10.
 RC   TISSUE=Platelet;
 RX   MEDLINE=22608298; PubMed=12665801; DOI=10.1038/nbt810;
 RA   Gevaert K., Goethals M., Martens L., Van Damme J., Staes A.,
 RA   Thomas G.R., Vandekerckhove J.;
 RT   "Exploring proteomes and analyzing protein processing by mass
 RT   spectrometric identification of sorted N-terminal peptides.";
 RL   Nat. Biotechnol. 21:566-569(2003).
 RN   [11]
 RP   PROTEIN SEQUENCE OF 2-10; 29-50; 62-69; 126-132; 144-155; 163-172 AND
 RP   218-227, CLEAVAGE OF INITIATOR METHIONINE, ACETYLATION AT GLY-2, AND
 RP   MASS SPECTROMETRY.
 RC   TISSUE=Platelet;
 RA   Bienvenut W.V.;
 RL   Submitted (AUG-2005) to UniProtKB.
 RN   [12]
 RP   INTERACTION WITH AR; ESR1; ESR2; MC2R; NRIP1; NR3C1; PPARBP AND THRA.
 RX   MEDLINE=21168078; PubMed=11266503; DOI=10.1210/me.15.4.501;
 RA   Zilliacus J., Holter E., Wakui H., Tazawa H., Treuter E.,
 RA   Gustafsson J.-A.;
 RT   "Regulation of glucocorticoid receptor activity by 14-3-3-dependent
 RT   intracellular relocalization of the corepressor RIP140.";
 RL   Mol. Endocrinol. 15:501-511(2001).
 CC   -!- FUNCTION: Adapter protein implicated in the regulation of a large
 CC       spectrum of both general and specialized signaling pathway. Binds
 CC   -!- SIMILARITY: Belongs to the 14-3-3 family.
 CC   -----------------------------------------------------------------------
 CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
 CC   Distributed under the Creative Commons Attribution-NoDerivs License
 CC   -----------------------------------------------------------------------
 DR   EMBL; L20422; AAA35483.1; -; mRNA.
 DR   PROSITE; PS00796; 1433_1; 1.
 DR   PROSITE; PS00797; 1433_2; 1.
 PE   1: Evidence at protein level;
 KW   3D-structure; Acetylation; Direct protein sequencing; Phosphoprotein.
 FT   INIT_MET      1      1       Removed.
 FT   CHAIN         2    246       14-3-3 protein eta.
 FT                                /FTId=PRO_0000058623.
 FT   HELIX       140    164
 FT   HELIX       170    185
 FT   HELIX       190    206
 FT   HELIX       207    210
 FT   TURN        213    215
 FT   HELIX       216    234
 SQ   SEQUENCE   246 AA;  28219 MW;  D70FBC100C45D6E5 CRC64;
 MGDREQLLQR ARLAEQAERY DDMASAMKAV TELNEPLSNE DRNLLSVAYK NVVGARRSSW
 RVISSIEQKT MADGNEKKLE KVKAYREKIE KELETVCNDV LSLLDKFLIK NCNDFQYESK
 VFYLKMKGDY YRYLAEVASG EKKNSVVEAS EAAYKEAFEI SKEQMQPTHP IRLGLALNFS
 VFYYEIQNAP EQACLLAKQA FDDAIAELDT LNEDSYKDST LIMQLLRDNL TLWTSDQQDE
 EAGEGN
 //
 [/CODE]

I need to pull data from the lines tagged "AC" and "SQ". I've been able to extract and test the sequence with this code:

[CODE]// Jump to sequences and collect
 if(strchr("$text", "SQ"))
 {
 //fgets($fh) ; // Jump to the next line
 
   // Collect sequence and add to array one at a time
   while( ($c = fgetc($fh)) != "/")
   {
   array_push($sequence,$c) ;
   }
 
   // Turn array into long string
 
   $seq_data = implode($sequence) ;
 
   }
 
 echo " Here is the sequence: " ;
 
   foreach($sequence as $letter)
   {
   echo $letter ;
   }
 
 
 [/CODE]
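The `fgetc()` loop above stops at the first `/`, but it also keeps the spaces and line breaks between the 10-letter blocks, and `$sequence` has to be initialised as an array before `array_push()` will add to it. A sketch that instead reads whole lines after `SQ` until the `//` terminator and strips all whitespace; `read_sequence()` is a hypothetical name, and a `php://memory` stream stands in for the uploaded file.

```php
<?php
/**
 * Hypothetical helper: read sequence lines until the "//" terminator
 * and return them as one string with all whitespace removed.
 * Assumes the file pointer is positioned just after the SQ line.
 */
function read_sequence($fh): string
{
    $seq = '';
    while (($line = fgets($fh)) !== false) {
        if (strncmp($line, '//', 2) === 0) {
            break;                                   // end of this entry
        }
        $seq .= preg_replace('/\s+/', '', $line);    // drop spaces and newlines
    }
    return $seq;
}

// Stand-in for the uploaded file: an in-memory stream with a shortened entry.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "SQ   SEQUENCE   246 AA;  28219 MW;  D70FBC100C45D6E5 CRC64;\n"
          . "     MGDREQLLQR ARLAEQAERY\n"
          . "     EAGEGN\n"
          . "//\n");
rewind($fh);

while (($text = fgets($fh)) !== false) {
    if (strncmp($text, 'SQ', 2) === 0) {
        $seq_data = read_sequence($fh);
        echo $seq_data, "\n";
    }
}
```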

However, the code I am using for the AC part doesn't work:

[CODE] // Grab the seq_id from the second line 

 if(strchr("$text", "AC"))
 {
 $token2 = strtok($text, "   ") ;
 $seq_id = $token2[2] ; 
 }[/CODE]

Also, when I put this into my database, it's picking up a lot of other nonsense and it's creating more than one row. The goal with this example is to have only one row. (I know that the last field, which contains the sequence, will be very long.)
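The extra rows are consistent with the `INSERT` sitting inside the `while` loop in the original script, so it runs once for every line read. One way to get one row per entry is to collect the fields while reading and only emit a row when the `//` line that ends each Swiss-Prot entry is reached. A sketch, with `parse_entries()` as a hypothetical name; the two-entry input below is a made-up, shortened sample, and in the real script each returned row would be handed to an insert instead of printed.

```php
<?php
/**
 * Hypothetical helper: read Swiss-Prot entries from a stream and
 * return one [seq_id, seq_type, seq_data] row per entry.
 */
function parse_entries($fh): array
{
    $rows = [];
    $seqId = null;
    $seqType = null;
    $seqData = '';
    $inSequence = false;

    while (($line = fgets($fh)) !== false) {
        $tag = substr($line, 0, 2);                  // line code in columns 1-2
        if ($tag === '//') {
            // End of entry: the only place a row is produced.
            $rows[] = [$seqId, $seqType, $seqData];
            $seqId = $seqType = null;
            $seqData = '';
            $inSequence = false;
        } elseif ($inSequence) {
            $seqData .= preg_replace('/\s+/', '', $line);
        } elseif ($tag === 'ID') {
            // Assumption: protein entries end the ID line with "AA.".
            $seqType = preg_match('/\sAA\.\s*$/', $line) ? 'PRT' : null;
        } elseif ($tag === 'AC' && $seqId === null) {
            // First accession on the first AC line, without its semicolon.
            $seqId = rtrim(strtok(substr($line, 5), ';'));
        } elseif ($tag === 'SQ') {
            $inSequence = true;                      // following lines are sequence
        }
    }
    return $rows;
}

// Made-up two-entry sample, shortened for illustration.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "ID   TEST1_HUMAN   Reviewed;   6 AA.\n"
          . "AC   Q04917;\n"
          . "SQ   SEQUENCE   6 AA;\n"
          . "     EAGEGN\n"
          . "//\n"
          . "ID   TEST2_HUMAN   Reviewed;   4 AA.\n"
          . "AC   P00001; P00002;\n"
          . "SQ   SEQUENCE   4 AA;\n"
          . "     MKAV\n"
          . "//\n");
rewind($fh);
$rows = parse_entries($fh);
print_r($rows);
```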

Hopefully this helps,

Thank you so much for the help

@opifex, Oct 10.2008:

Check here for some tips that the author of the data has to offer. One of the formats available is CSV, and that would avoid a lot of aggravation. There are also several pointers to other options on the UniProt site that should help.

@TheDragonReborn (author), Oct 10.2008:

The program has to act on a text file. This is an assignment for school.

@opifex, Oct 10.2008:

Well, CSV is a text file...

But...

You can treat the text document as a fixed-width data file. That way you can separate the first column from the second, use the content of the first column as your identifier, and then process what you want from the second column.

It will be messy, but it will work.
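opifex's fixed-width suggestion, sketched: in a Swiss-Prot flat file the two-letter line code occupies columns 1-2 and the data starts at column 6, so `substr()` can separate them without searching the line at all. `split_line()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: split a fixed-width Swiss-Prot line into
 * [line code, data]. Columns 1-2 hold the code; data starts at column 6.
 */
function split_line(string $line): array
{
    $code = substr($line, 0, 2);
    // substr() past the end returns "" in PHP 8 (false in PHP 7), so cast to string.
    $data = rtrim((string) substr($line, 5));
    return [$code, $data];
}

foreach (["ID   1433F_HUMAN             Reviewed;         246 AA.\n",
          "AC   Q04917;\n",
          "//\n"] as $line) {
    [$code, $data] = split_line($line);
    echo "[$code] [$data]\n";
}
```

With the code in hand, each branch can compare `$code === 'AC'` exactly instead of searching the whole line.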

@TheDragonReborn (author), Oct 10.2008:

I'm sorry, you're right about the CSV. What I meant is that I have to work on the text file format that I showed.

Also, I've been working on my code and I've narrowed it down to only having a problem with the "AC" part. I can get the sequence, but I can't get the seq_id. Here is the code I'm using:

[CODE] while($text = fgets($fh)) 
 {
 
   // Grab the seq_id from the second line 
   if(strstr("$text", 'AC'))
   {
   //$token2 = strtok($text, " nt") ;
   //$seq_id = $token2[1] ; 
 
   // just a sloppy alternative that did not work either
   $line = str_split($text) ;
   $seq_id = $line[5].$line[6].$line[7].$line[8].$line[9].$line[10] ;
 
 
   }
 [/CODE]

For some reason, it is jumping down the file to a "CC" tag. I know this because the value of my $seq_id variable is "-!- IN". When I search for that string in a text editor, it shows up way down in the text file after a "CC" tag.
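A likely explanation for `-!- IN`: `strstr($text, 'AC')` is true for any line that contains the letters `AC` anywhere, not only lines that start with the `AC` code, and each later match overwrites `$seq_id`. A comment line beginning `CC   -!- INTERACTION` (presumably present in the full entry, though not in the excerpt above) contains `AC` inside `INTERACTION`, and characters 5 to 10 of that line are exactly `-!- IN`. A sketch reproducing this, plus a prefix check that avoids it; the `CC` line is made up for illustration and `has_tag()` is a hypothetical name.

```php
<?php
// Made-up CC line of the kind found further down a Swiss-Prot entry.
$cc = "CC   -!- INTERACTION:\n";
$ac = "AC   Q04917;\n";

// strstr() searches the whole line, so "INTERACTION" counts as a match.
$loose = strstr($cc, 'AC') !== false;

// The column slice from the script then yields "-!- IN".
$line = str_split($cc);
$wrongId = $line[5].$line[6].$line[7].$line[8].$line[9].$line[10];

// Comparing only the first two characters avoids the false match.
function has_tag(string $text, string $tag): bool
{
    return strncmp($text, $tag, 2) === 0;
}

var_dump($loose, $wrongId, has_tag($cc, 'AC'), has_tag($ac, 'AC'));
```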

Any suggestions....

@opifex, Oct 11.2008:

You can't rely on line numbers. The first 2 letters identify the "element", and an element may be repeated or span multiple lines for some identifiers. The "-!-" can be looked at as defining an "attribute".

Look at the XML file for the same "AC" identifier; it will be a lot clearer.
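Following the point that the two-letter code identifies the element and that an element can span several lines, one option is to group an entry's lines by code first and pick fields out afterwards. A sketch; `group_by_code()` is a hypothetical name, and it assumes sequence lines (which start with blanks) and the `//` terminator can simply be skipped for this purpose. Per the UniProtKB user manual, an entry can have more than one `AC` line, and the first accession on the first `AC` line is the primary accession.

```php
<?php
/**
 * Hypothetical helper: group the lines of one entry by their two-letter
 * line code. Lines with a blank code (sequence data) or "//" are skipped.
 */
function group_by_code(array $lines): array
{
    $groups = [];
    foreach ($lines as $line) {
        $code = substr($line, 0, 2);
        if (trim($code) === '' || $code === '//') {
            continue;
        }
        $groups[$code][] = rtrim((string) substr($line, 5));
    }
    return $groups;
}

$entry = [
    "AC   Q04917;\n",
    "DT   01-OCT-1993, integrated into UniProtKB/Swiss-Prot.\n",
    "DT   23-JAN-2007, sequence version 4.\n",
    "CC   -!- SIMILARITY: Belongs to the 14-3-3 family.\n",
    "     EAGEGN\n",
    "//\n",
];
$groups = group_by_code($entry);

// Primary accession: first token of the first AC line, semicolon removed.
$seq_id = rtrim(explode(' ', $groups['AC'][0])[0], ';');
echo $seq_id, ' has ', count($groups['DT']), " DT lines\n";
```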

@TheDragonReborn (author), Oct 11.2008:

Is there a way to stop collecting after I get the first occurrence of AC?

I want to stop with that particular if-condition and go to the next one.

I tried this, but it breaks me out of the entire while loop:

[CODE]// Grab the seq_id from the second line 

 if(strstr("$text", 'AC'))
 {
 
   $line = str_split($text) ;
   $seq_id = $line[5].$line[6].$line[7].$line[8].$line[9].$line[10] ;
   break ;
 
   }
 [/CODE]

I just want to next jump to my next if-condition that gets the sequence.
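`break` leaves the whole loop, which is why reading stops. `continue` only skips the rest of the current iteration and moves on to the next line; combined with a check on whether `$seq_id` is already set and a test for lines that *start* with `AC`, the `AC` branch runs once and the loop carries on to the `SQ` branch. A sketch in which an array of lines stands in for the `fgets()` loop; the `CC` line is made up to show that a second "AC" match is ignored.

```php
<?php
$lines = [
    "ID   1433F_HUMAN             Reviewed;         246 AA.\n",
    "AC   Q04917;\n",
    "CC   -!- INTERACTION:\n",          // made-up line that contains "AC"
    "SQ   SEQUENCE   246 AA;  28219 MW;  D70FBC100C45D6E5 CRC64;\n",
];

$seq_id = null;
$reached_sq = false;
foreach ($lines as $text) {
    // Only lines that START with AC, and only the first one.
    if ($seq_id === null && strncmp($text, 'AC', 2) === 0) {
        $line = str_split($text);
        $seq_id = $line[5].$line[6].$line[7].$line[8].$line[9].$line[10];
        continue;                       // go to the next line; the loop keeps running
    }
    if (strncmp($text, 'SQ', 2) === 0) {
        $reached_sq = true;             // sequence collection would start here
    }
}
echo $seq_id, "\n";
```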

@opifex, Oct 11.2008:

Use [B]for()[/B].

AC will exist only 1 time; it is the ID for the content.

SQ will exist only 1 time also.

Then you need to get the sequence of data that you want from the SQ content.
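A sketch of the `for()` suggestion: read the file into an array with `file()` and walk it by index, so that when the `SQ` line is found an inner loop can advance the same index through the sequence lines. `parse_first_entry()` is a hypothetical name, it handles only the first entry in the file, and the temporary file written at the end stands in for `$_FILES['dataFile']['tmp_name']`.

```php
<?php
/**
 * Hypothetical helper: parse the first entry of a Swiss-Prot file
 * with an index-based for() loop. Returns [seq_id, seq_data].
 */
function parse_first_entry(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    $seqId = null;
    $seqData = '';

    for ($i = 0; $i < count($lines); $i++) {
        $code = substr($lines[$i], 0, 2);
        if ($code === 'AC' && $seqId === null) {
            // "AC   Q04917;" -> "Q04917"
            $seqId = rtrim(explode(' ', trim(substr($lines[$i], 5)))[0], ';');
        } elseif ($code === 'SQ') {
            // Advance the same index through the sequence lines up to "//".
            for ($i++; $i < count($lines) && strncmp($lines[$i], '//', 2) !== 0; $i++) {
                $seqData .= preg_replace('/\s+/', '', $lines[$i]);
            }
            break;                      // the "//" line ends the entry
        }
    }
    return [$seqId, $seqData];
}

// Stand-in for the uploaded temp file (shortened entry).
$path = tempnam(sys_get_temp_dir(), 'sp');
file_put_contents($path, "ID   1433F_HUMAN             Reviewed;         246 AA.\n"
    . "AC   Q04917;\n"
    . "SQ   SEQUENCE   246 AA;  28219 MW;  D70FBC100C45D6E5 CRC64;\n"
    . "     MGDREQLLQR ARLAEQAERY\n"
    . "     EAGEGN\n"
    . "//\n");
[$seq_id, $seq_data] = parse_first_entry($path);
unlink($path);
echo "$seq_id $seq_data\n";
```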
