Help depositing file data into a database using PHP

Hello to all,

I am having problems getting an uploaded file to place a portion of its contents into a MySQL database. By "portion" I mean that I want to extract only certain lines from the input file. My problem isn't with connecting to the database or uploading the file; it is with the code that operates on the file. Here is the code that I have:

[CODE]<?php

// Eliminate Error Notice
error_reporting(E_ALL & ~E_NOTICE) ;

// This will get an input file from an html page and deposit it
// into a mySQL database
// Gregory Koenig
// Last modified – 10/9/08

// Test if file was uploaded
if(! $_FILES['dataFile']['tmp_name'])
{
echo "<html><body><h1>No file uploaded</h1></body></html>" ;
exit ;
}

// Test connection to mySQL
// Don't forget to change the password to '******' !!!!!
$link = mysql_connect( 'localhost', '*******', '*******' )
or die( 'Could not connect to mySQL:'.mysql_error() ) ;

echo "Successful connection !! \n\n" ;

// Test connection to database 'gkoenig'
mysql_select_db( 'gkoenig')
or die('Could not connect to database!') ;

echo "Successful selection of database !! \n\n" ;

// This is where we input the file into the database
// using mySQL commands

$fh = fopen( $_FILES['dataFile']['tmp_name'], 'r') ;

// Parse through Swiss-Prot file and extract the seq_id (Accession),
// seq_type (PRT,DNA, RNA), and seq_data

while($text = fgets($fh))
{
if(feof($fh))
{
exit ;
}

// Grab the seq_type from first line
if(strstr($text, "ID"))
{
$token = strtok($text, "\t") ;
$pre_seq_type = $token[3] ;
// Knock off semi-colon
$seq_id = $token[0].$token[1].$token[2] ;
}

// Grab the seq_id from the second line
if(strstr($text, "AC"))
{
$token2 = strtok($text, "\t") ;
$seq_id = $token2[1] ;
}

// Jump to sequences and collect
if(strstr($text, "SQ"))
{
fgets($fh) ; // Jump to the next line

// Collect sequence
while( ($c = fgetc($fh)) != "/")
{
$seq_data[] = $c ;
}
}

// Put data into Database
$query = "INSERT into sequences VALUES ('".$seq_id."', '".$seq_type."', '".$seq_data."') " ;

$result = mysql_query($query)
or die('Data insertion failed:'.mysql_error() ) ;

// Empty sequence array so that it can accept new data
while ($seq_data)
{
array_pop($seq_data) ;
}

}

?>

<html>
<head>
<title>Sequences in Database</title>
</head>

<body>
<h1>Sequences in Database</h1>

<?php
/*
// Show data that was input into 'sequences' Database

$query2 = 'SELECT * from sequences' ;

$result2 = mysql_query($query2)
or die('Query failed:'.mysql_error() ) ;

echo "<table border=1> \n" ;

while( $row = mysql_fetch_assoc($result2) )
{
echo "\t<tr>\t" ;

foreach ($row as $col)
{
echo "\t\t<td>$col</td>\n" ;

}

echo "\t</tr>\n" ;
}

echo "</table>\n" ;
*/
?>

</body>

</html>[/CODE]

The file that it will work on looks like this (I will only include the relevant portions):

ID 1433F_HUMAN Reviewed; 246 AA.
AC Q04917;
DT 01-OCT-1993, integrated into UniProtKB/Swiss-Prot.
DT 23-JAN-2007, sequence version 4.
.
.
. (this represents a bunch of info that will be skipped)
.
.
FT TURN 213 215
FT HELIX 216 234
SQ SEQUENCE 246 AA; 28219 MW; D70FBC100C45D6E5 CRC64;
MGDREQLLQR ARLAEQAERY DDMASAMKAV TELNEPLSNE DRNLLSVAYK NVVGARRSSW
RVISSIEQKT MADGNEKKLE KVKAYREKIE KELETVCNDV LSLLDKFLIK NCNDFQYESK
VFYLKMKGDY YRYLAEVASG EKKNSVVEAS EAAYKEAFEI SKEQMQPTHP IRLGLALNFS
VFYYEIQNAP EQACLLAKQA FDDAIAELDT LNEDSYKDST LIMQLLRDNL TLWTSDQQDE
EAGEGN
//

When I run the script, it does not place what I want into the database.

Any help would be great. If I am not clear in the post, I will post any other info I can.

Thanks

@TheDragonReborn (author), Oct 09, 2008: Perhaps to help in fixing my problem, here is the HTML code.

The PHP script is in my other submission. I would show the output, but the page is too long. Pretty much, in my output I want a single row in my MySQL database to contain:

seq_id, seq_type, seq_data

example:

A12345 PRT MASDFJDFDKAHDSJD....

[CODE]<!-- An html page that uploads an input file to a php action
file so that it can be added to a mySQL database
Last modified 10/9/08 -->

<html>

<head>
<title>Depository for Swiss-Prot sequences</title>
</head>

<body bgcolor="grey">

<h1>Add file to Database</h1>

<form enctype="multipart/form-data"
action="<ip address>" method="post">

Data input file:
<input type="file" name="dataFile" size="50">
<br/><br/>


<input type="submit" value="Add sequence to Database">
</form>

</body>

</html>[/CODE]


Any help would be great. I am new to PHP and only a novice to programming so I could really use the help.

thanks

@opifex, Oct 10, 2008: verify that your "file" is constructed correctly.

the way i see things "SQ SEQUENCE 246 AA; " would write "SEQUENCE 246 AA" in the table - is that what you want?

you might be better off putting your "file" in a better order - csv maybe - to make it easier to process.

@TheDragonReborn (author), Oct 10, 2008: The file format is predetermined. It's a flat file downloaded from a website. I think my use of strstr() isn't working. When I check the contents of my variable, $seq_id, it does not contain what it's supposed to (a six-character unique identifier).

@opifex, Oct 10, 2008: can you download a copy of the file, exactly as it downloads, and zip it so we can take a look at the structure? there is probably something really simple that's been overlooked.

@TheDragonReborn (author), Oct 10, 2008: Here is an example file (it's not too long):

[CODE]ID 1433F_HUMAN Reviewed; 246 AA.
AC Q04917;
DT 01-OCT-1993, integrated into UniProtKB/Swiss-Prot.
DT 23-JAN-2007, sequence version 4.
DT 22-JUL-2008, entry version 93.
DE RecName: Full=14-3-3 protein eta;
DE AltName: Full=Protein AS1;
GN Name=YWHAH; Synonyms=YWHA1;
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA].
RC TISSUE=Brain;
RX MEDLINE=94032477; PubMed=8218406; DOI=10.1016/0167-4781(93)90053-G;
RA Swanson K.D., Dhar M.S., Joshi J.G.;
RT "The human and bovine 14-3-3 eta protein mRNAs are highly conserved in
RT both their translated and untranslated regions.";
RL Biochim. Biophys. Acta 1216:145-148(1993).
RN [2]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RC TISSUE=Brain;
RX MEDLINE=92251832; PubMed=1578511;
RA Ichimura-Ohshima Y., Morii K., Ichimura T., Araki K., Takahashi Y.,
RA Isobe T., Minoshima S., Fukuyama R., Shimizu N., Kuwano R.;
RT "cDNA cloning and chromosome assignment of the gene for human brain
RT 14-3-3 protein eta chain.";
RL J. Neurosci. Res. 31:600-605(1992).
RN [3]
RP NUCLEOTIDE SEQUENCE [MRNA].
RA Leffers H., Tommerup N., Celis J.E.;
RL Submitted (MAR-1994) to the EMBL/GenBank/DDBJ databases.
RN [4]
RP NUCLEOTIDE SEQUENCE [MRNA].
RX MEDLINE=96123461; PubMed=8561965;
RA Muratake T., Hayashi S., Ichimura Y., Morii K., Kuwano R.,
RA Ichikawa T., Kumanishi T., Isobe T., Watanabe M., Kondo H.;
RT "The effect on methamphetamine on the mRNA level for 14.3.3 eta chain
RT in the human cultured cells.";
RL Mol. Neurobiol. 11:223-230(1995).
RN [5]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX MEDLINE=96411648; PubMed=8812417; DOI=10.1006/geno.1996.0426;
RA Muratake T., Hayashi S., Ichikawa T., Kumanishi T., Ichimura Y.,
RA Kuwano R., Isobe T., Wang Y., Minoshima S., Shimizu N., Takahashi Y.;
RT "Structural organization and chromosomal assignment of the human 14-3-
RT 3 eta chain gene (YWHAH).";
RL Genomics 36:63-69(1996).
RN [6]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RX PubMed=15461802; DOI=10.1186/gb-2004-5-10-r84;
RA Collins J.E., Wright C.L., Edwards C.A., Davis M.P., Grinham J.A.,
RA Cole C.G., Goward M.E., Aguado B., Mallya M., Mokrab Y., Huckle E.J.,
RA Beare D.M., Dunham I.;
RT "A genome annotation-driven approach to cloning the human ORFeome.";
RL Genome Biol. 5:RESEARCH84.1-RESEARCH84.11(2004).
RN [7]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX MEDLINE=20057165; PubMed=10591208; DOI=10.1038/990031;
RA Dunham I., Hunt A.R., Collins J.E., Bruskiewich R., Beare D.M.,
RA Khan A.S., Lane L., Tilahun Y., Wright H.;
RT "The DNA sequence of human chromosome 22.";
RL Nature 402:489-495(1999).
RN [8]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC TISSUE=Lymph;
RX PubMed=15489334; DOI=10.1101/gr.2596504;
RG The MGC Project Team;
RT "The status, quality, and expansion of the NIH full-length cDNA
RT project: the Mammalian Gene Collection (MGC).";
RL Genome Res. 14:2121-2127(2004).
RN [9]
RP NUCLEOTIDE SEQUENCE [MRNA] OF 27-225.
RC TISSUE=Keratinocyte;
RX MEDLINE=93294871; PubMed=8515476; DOI=10.1006/jmbi.1993.1346;
RA Leffers H., Madsen P., Rasmussen H.H., Honore B., Andersen A.H.,
RA Walbum E., Vandekerckhove J., Celis J.E.;
RT "Molecular cloning and expression of the transformation sensitive
RT epithelial marker stratifin. A member of a protein family that has
RT been involved in the protein kinase C signalling pathway.";
RL J. Mol. Biol. 231:982-998(1993).
RN [10]
RP PROTEIN SEQUENCE OF 2-10.
RC TISSUE=Platelet;
RX MEDLINE=22608298; PubMed=12665801; DOI=10.1038/nbt810;
RA Gevaert K., Goethals M., Martens L., Van Damme J., Staes A.,
RA Thomas G.R., Vandekerckhove J.;
RT "Exploring proteomes and analyzing protein processing by mass
RT spectrometric identification of sorted N-terminal peptides.";
RL Nat. Biotechnol. 21:566-569(2003).
RN [11]
RP PROTEIN SEQUENCE OF 2-10; 29-50; 62-69; 126-132; 144-155; 163-172 AND
RP 218-227, CLEAVAGE OF INITIATOR METHIONINE, ACETYLATION AT GLY-2, AND
RP MASS SPECTROMETRY.
RC TISSUE=Platelet;
RA Bienvenut W.V.;
RL Submitted (AUG-2005) to UniProtKB.
RN [12]
RP INTERACTION WITH AR; ESR1; ESR2; MC2R; NRIP1; NR3C1; PPARBP AND THRA.
RX MEDLINE=21168078; PubMed=11266503; DOI=10.1210/me.15.4.501;
RA Zilliacus J., Holter E., Wakui H., Tazawa H., Treuter E.,
RA Gustafsson J.-A.;
RT "Regulation of glucocorticoid receptor activity by 14-3-3-dependent
RT intracellular relocalization of the corepressor RIP140.";
RL Mol. Endocrinol. 15:501-511(2001).
CC -!- FUNCTION: Adapter protein implicated in the regulation of a large
CC spectrum of both general and specialized signaling pathway. Binds
CC -!- SIMILARITY: Belongs to the 14-3-3 family.
CC -----------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC -----------------------------------------------------------------------
DR EMBL; L20422; AAA35483.1; -; mRNA.
DR PROSITE; PS00796; 1433_1; 1.
DR PROSITE; PS00797; 1433_2; 1.
PE 1: Evidence at protein level;
KW 3D-structure; Acetylation; Direct protein sequencing; Phosphoprotein.
FT INIT_MET 1 1 Removed.
FT CHAIN 2 246 14-3-3 protein eta.
FT /FTId=PRO_0000058623.
FT HELIX 140 164
FT HELIX 170 185
FT HELIX 190 206
FT HELIX 207 210
FT TURN 213 215
FT HELIX 216 234
SQ SEQUENCE 246 AA; 28219 MW; D70FBC100C45D6E5 CRC64;
MGDREQLLQR ARLAEQAERY DDMASAMKAV TELNEPLSNE DRNLLSVAYK NVVGARRSSW
RVISSIEQKT MADGNEKKLE KVKAYREKIE KELETVCNDV LSLLDKFLIK NCNDFQYESK
VFYLKMKGDY YRYLAEVASG EKKNSVVEAS EAAYKEAFEI SKEQMQPTHP IRLGLALNFS
VFYYEIQNAP EQACLLAKQA FDDAIAELDT LNEDSYKDST LIMQLLRDNL TLWTSDQQDE
EAGEGN
//
[/CODE]


I have to stop at the tags "AC" and "SQ". I've been able to extract and test the sequence with this code:

[CODE]// Jump to sequences and collect
if(strchr("$text", "SQ"))
{
//fgets($fh) ; // Jump to the next line

// Collect sequence and add to array one at a time
while( ($c = fgetc($fh)) != "/")
{
array_push($sequence,$c) ;
}

// Turn array into long string

$seq_data = implode($sequence) ;

}

echo " Here is the sequence: " ;

foreach($sequence as $letter)
{
echo $letter ;
}


[/CODE]



However, the code I am using for the AC part doesn't work:

[CODE] // Grab the seq_id from the second line

if(strchr("$text", "AC"))
{
$token2 = strtok($text, " ") ;
$seq_id = $token2[2] ;
}[/CODE]
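
A possible reason the AC snippet fails, offered as a sketch and not a confirmed diagnosis: strtok() returns one token as a string, not an array, so $token2[2] is only the third character of the first token. Splitting the line on whitespace with preg_split() gives an array that can be indexed. The name extractAccession() is made up for illustration, and the sketch assumes the accession is the second whitespace-separated field and ends with a semicolon, as in the example file.

```php
<?php
// Hypothetical helper: returns the first accession on an "AC" line, or null.
function extractAccession(string $line): ?string
{
    // Split on runs of whitespace; strtok() would hand back a single string token.
    $fields = preg_split('/\s+/', trim($line));
    if (count($fields) < 2 || $fields[0] !== 'AC') {
        return null;
    }
    // Knock off the trailing semicolon.
    return rtrim($fields[1], ';');
}

echo extractAccession("AC   Q04917;\n"), "\n"; // prints Q04917
```

Because the first field is compared against 'AC', lines that merely contain those letters somewhere else return null.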


Also, when I put this into my database, it's picking up a lot of other nonsense and it's creating more than one row. The goal with this example is to have one row only. (I know that the last field that contains the sequences will be very long.)
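
The extra rows are probably a result of the INSERT in the original script sitting inside the while loop, so it runs once for every line read. A minimal sketch of one way around that, assuming each entry ends with a line containing only //: accumulate the fields while reading and produce a row only at the terminator. collectRows() and the array keys are hypothetical names, and the database call is left out; each returned row would be passed to a single INSERT.

```php
<?php
// Hypothetical helper: turns the lines of a flat file into one row per entry.
function collectRows(array $lines): array
{
    $rows = [];
    $row = ['seq_id' => null, 'seq_data' => ''];
    $inSequence = false;

    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        if ($line === '//') {
            // End of entry: the only place where a row is produced.
            $rows[] = $row;
            $row = ['seq_id' => null, 'seq_data' => ''];
            $inSequence = false;
        } elseif ($inSequence) {
            // Sequence lines: drop the spaces between blocks of residues.
            $row['seq_data'] .= preg_replace('/\s+/', '', $line);
        } elseif (strncmp($line, 'SQ', 2) === 0) {
            $inSequence = true;
        } elseif (strncmp($line, 'AC', 2) === 0 && $row['seq_id'] === null) {
            $fields = preg_split('/\s+/', $line);
            $row['seq_id'] = rtrim($fields[1] ?? '', ';');
        }
    }
    return $rows;
}

$rows = collectRows([
    "ID   1433F_HUMAN Reviewed; 246 AA.",
    "AC   Q04917;",
    "RP   INTERACTION WITH AR; ESR1; ESR2; MC2R; NRIP1; NR3C1; PPARBP AND THRA.",
    "SQ   SEQUENCE 246 AA; 28219 MW; D70FBC100C45D6E5 CRC64;",
    "     MGDREQLLQR ARLAEQAERY",
    "     EAGEGN",
    "//",
]);
echo count($rows), " row(s), id ", $rows[0]['seq_id'], "\n";
```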


Hopefully this helps,

Thank you so much for the help

@opifex, Oct 10, 2008: check here for some tips that the author of the data has to offer. one of the formats available is csv and that would avoid a lot of aggravation. but there are several other pointers for other options in the uniprot site that should help.

@TheDragonReborn (author), Oct 10, 2008: The program has to act on a text file. This is an assignment for school.

@opifex, Oct 10, 2008: well, csv is a text file......

but...

you can treat the text document as a fixed-width data file and that way you can separate the first column from the second and use the content of the first column as your identifier and then process what you want from the second column.

it will be messy, but will work.
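
A sketch of the fixed-width idea, assuming the downloaded file keeps the layout where the two-letter line code sits in columns 1-2 and the content starts at column 6 (the spacing in the pasted example may have been collapsed by the forum). splitLine() is a hypothetical name.

```php
<?php
// Hypothetical helper: split a flat-file line into [line code, content].
function splitLine(string $line): array
{
    $line = rtrim($line, "\r\n");
    $code = substr($line, 0, 2);                           // columns 1-2: line code such as "AC"
    $content = strlen($line) > 5 ? substr($line, 5) : '';  // column 6 onward
    return [$code, $content];
}

list($code, $content) = splitLine("AC   Q04917;\n");
if ($code === 'AC') {
    echo rtrim($content, ';'), "\n"; // prints Q04917
}
```

The first column then acts as the identifier to switch on, and the second column is whatever still needs processing for that line type.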

@TheDragonReborn (author), Oct 10, 2008: I'm sorry, you're right about the csv. What I meant is that I have to work on the text file format that I showed.

Also, I've been working on my code and I've narrowed it down to only having a problem with the "AC" part. I can get the sequence, but I can't get the seq_id. Here is the code I'm using.

[CODE] while($text = fgets($fh))
{

// Grab the seq_id from the second line
if(strstr("$text", 'AC'))
{
//$token2 = strtok($text, " \n\t") ;
//$seq_id = $token2[1] ;

// just a sloppy alternative that did not work either
$line = str_split($text) ;
$seq_id = $line[5].$line[6].$line[7].$line[8].$line[9].$line[10] ;


}
[/CODE]


For some reason, it is jumping down the file to a "CC" tag. I know this because my $seq_id variable contains "-!- IN". When I plug that into a Find command in a text editor, it shows up way down in the text file after a "CC" tag. ??????

Any suggestions....
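
A likely explanation, sketched under assumptions: strstr() looks for AC anywhere in the line, so any later line that happens to contain those two letters (inside a word such as INTERACTION, for instance) also passes the test and overwrites $seq_id. Comparing only the first two characters with strncmp() restricts the match to the line code. isLineCode() is a hypothetical helper, and the CC line below is invented for illustration.

```php
<?php
// Hypothetical helper: true only when the line starts with the given two-letter code.
function isLineCode(string $line, string $code): bool
{
    return strncmp($line, $code, 2) === 0;
}

$acLine = "AC   Q04917;";
// Made-up comment line; only the word containing "AC" matters here.
$ccLine = "CC   -!- INTERACTION: made-up comment text for illustration";

var_dump(strstr($ccLine, 'AC') !== false); // substring search also matches the CC line
var_dump(isLineCode($ccLine, 'AC'));       // prefix test rejects it
var_dump(isLineCode($acLine, 'AC'));       // prefix test accepts the real AC line

// Characters 5..10 of the CC line, the same positions used for $seq_id.
echo substr($ccLine, 5, 6), "\n"; // prints -!- IN
```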

@opifex, Oct 11, 2008: you can't rely on line numbers ... the first 2 letters id the "element" and may be repeated or have multiple lines for some identifiers ... the "-!-" can be looked at as defining an "attribute"

look at the xml file for the same "AC" identifier... it will be a lot clearer

@TheDragonReborn (author), Oct 11, 2008: Is there a way to stop collecting after I get the first occurrence of AC?

I want to stop with that particular if-condition and go the next.

I tried this, but it breaks me out of the entire while loop.

[CODE]// Grab the seq_id from the second line

if(strstr("$text", 'AC'))
{

$line = str_split($text) ;
$seq_id = $line[5].$line[6].$line[7].$line[8].$line[9].$line[10] ;
break ;

}
[/CODE]


I just want to jump to my next if-condition, the one that gets the sequence.
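
One way to keep the loop running while ignoring later matches, as a sketch: break leaves the whole while loop, so the block can instead be guarded by a check that $seq_id has not been set yet. The loop below reads from an array to stay self-contained; with a file handle the same conditions would sit inside while (($text = fgets($fh)) !== false). It assumes the accession is six characters starting at column 6, and $sawSequenceHeader is a hypothetical stand-in for the sequence-collecting block.

```php
<?php
$lines = [
    "ID   1433F_HUMAN Reviewed; 246 AA.",
    "AC   Q04917;",
    "RP   INTERACTION WITH AR; ESR1; ESR2; MC2R; NRIP1; NR3C1; PPARBP AND THRA.",
    "SQ   SEQUENCE 246 AA; 28219 MW; D70FBC100C45D6E5 CRC64;",
];

$seq_id = null;
$sawSequenceHeader = false;

foreach ($lines as $text) {
    // Runs only for the first line that starts with AC; later lines skip it.
    if ($seq_id === null && strncmp($text, 'AC', 2) === 0) {
        $seq_id = substr($text, 5, 6); // assumption: six characters from column 6
        continue; // move on to the next line; the loop itself keeps running
    }

    // This condition is still reached on later lines because the loop was not exited.
    if (strncmp($text, 'SQ', 2) === 0) {
        $sawSequenceHeader = true;
    }
}

echo $seq_id, "\n"; // prints Q04917
```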

@opifex, Oct 11, 2008: use [B]for()[/B]

AC will exist only 1 time ... it is the ID for the content

SQ will exist only 1 time also

then you need to get the sequence of data that you want from the SQ content
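
A sketch of pulling the sequence out of the SQ content, assuming the residues start on the line after the SQ header and stop at the // terminator. Reading whole lines and stripping whitespace avoids the spaces and newlines that a character-by-character fgetc() loop would also collect. readSequence() is a hypothetical name, and a php://memory stream stands in for the uploaded file.

```php
<?php
// Hypothetical helper: returns the sequence that follows the SQ line, or '' if none.
function readSequence($fh): string
{
    $sequence = '';
    $inSequence = false;
    while (($line = fgets($fh)) !== false) {
        if (strncmp($line, '//', 2) === 0) {
            break;                       // end of the entry
        }
        if ($inSequence) {
            // Remove spaces and the trailing newline before appending.
            $sequence .= preg_replace('/\s+/', '', $line);
        } elseif (strncmp($line, 'SQ', 2) === 0) {
            $inSequence = true;          // residues begin on the next line
        }
    }
    return $sequence;
}

// An in-memory stream stands in for the uploaded file in this sketch.
$fh = fopen('php://memory', 'w+');
fwrite($fh, "AC   Q04917;\nSQ   SEQUENCE 246 AA;\n     TLWTSDQQDE\n     EAGEGN\n//\n");
rewind($fh);
$seq = readSequence($fh);
fclose($fh);
echo $seq, "\n"; // prints TLWTSDQQDEEAGEGN
```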