pdbread

Read data from Protein Data Bank (PDB) file

Syntax

PDBStruct = pdbread(File)
PDBStruct = pdbread(File, 'ModelNum', ModelNumValue)
PDBStruct = pdbread(File,'TimeOut', TimeOutValue)

Input Arguments

File

Either of the following:

  • Character vector or string specifying a file name, a path and file name, or a URL pointing to a file. The referenced file is a Protein Data Bank (PDB)-formatted file (ASCII text file). If you specify only a file name, that file must be on the MATLAB® search path or in the MATLAB Current Folder.

  • Character array or column vector of strings that contains the text of a PDB-formatted file.

Tip

You can use the getpdb function with the 'ToFile' property to retrieve protein structure data from the PDB database and create a PDB-formatted file.

ModelNumValue

Positive integer specifying a model in a PDB-formatted file.

TimeOutValue

Connection timeout in seconds, specified as a positive scalar. The default value is 5.

Output Arguments

PDBStruct

MATLAB structure containing a field for each PDB record.

Description

The Protein Data Bank (PDB) database is an archive of experimentally determined 3-D biological macromolecular structure data. For more information about the PDB format, see the official PDB file format documentation.

PDBStruct = pdbread(File) reads the data from PDB-formatted text file File and stores the data in the MATLAB structure, PDBStruct, which contains a field for each PDB record. The following table summarizes the possible PDB records and the corresponding fields in the MATLAB structure PDBStruct:

PDB Database Record             Field in the MATLAB Structure
HEADER                          Header
OBSLTE                          Obsolete
TITLE                           Title
CAVEAT                          Caveat
COMPND                          Compound
SOURCE                          Source
KEYWDS                          Keywords
EXPDTA                          ExperimentData
AUTHOR                          Authors
REVDAT                          RevisionDate
SPRSDE                          Superseded
JRNL                            Journal
REMARK 1                        Remark1
REMARK N (N = 2 through 999)    Remarkn (n = 2 through 999)
DBREF                           DBReferences
SEQADV                          SequenceConflicts
SEQRES                          Sequence
FTNOTE                          Footnote
MODRES                          ModifiedResidues
HET                             Heterogen
HETNAM                          HeterogenName
HETSYN                          HeterogenSynonym
FORMUL                          Formula
HELIX                           Helix
SHEET                           Sheet
TURN                            Turn
SSBOND                          SSBond
LINK                            Link
HYDBND                          HydrogenBond
SLTBRG                          SaltBridge
CISPEP                          CISPeptides
SITE                            Site
CRYST1                          Cryst1
ORIGXn                          OriginX
SCALEn                          Scale
MTRIXn                          Matrix
TVECT                           TranslationVector
MODEL                           Model
ATOM                            Atom
SIGATM                          AtomSD
ANISOU                          AnisotropicTemp
SIGUIJ                          AnisotropicTempSD
TER                             Terminal
HETATM                          HeterogenAtom
CONECT                          Connectivity
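
Because a given PDB file typically holds only some of these record types, it can be useful to check which fields a structure has before reading them. The following is a minimal sketch, assuming that fields for records missing from the file may be absent from the structure. It uses a small mock structure with hypothetical values in place of real pdbread output so that it runs without a PDB file.

```matlab
% Mock structure standing in for the output of pdbread (hypothetical values).
% In practice: pdbStruct = pdbread('nicotinic_receptor.pdb');
pdbStruct = struct('Header', 'mock header', ...
                   'Title',  'mock title', ...
                   'Model',  struct('MDLSerNo', 1));

% Field names taken from the record table.
wanted  = {'Header', 'Title', 'Helix', 'Sheet'};
present = isfield(pdbStruct, wanted);   % logical array, one entry per name

% Report which of the wanted fields exist in the structure.
for k = 1:numel(wanted)
    fprintf('%-8s %s\n', wanted{k}, mat2str(present(k)));
end
```

Guarding field access with isfield avoids a "reference to non-existent field" error when a record type is not present.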

PDBStruct = pdbread(File, 'ModelNum', ModelNumValue) reads only the model specified by ModelNumValue from the PDB-formatted text file File and stores the data in the MATLAB structure PDBStruct. If ModelNumValue does not correspond to an existing model number in File, then pdbread reads the coordinate information of all the models.
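
Since pdbread falls back to reading all models when the requested one does not exist, a caller may want to confirm that the returned structure really holds the requested model. The sketch below shows one possible check. The helper gotRequested and the mock structures are hypothetical and stand in for real pdbread output; the MDLSerNo subfield is the one shown in the Examples section.

```matlab
% Hypothetical helper: true only if exactly one model came back and its
% serial number matches the requested model number n.
gotRequested = @(s, n) isscalar(s.Model) && s.Model.MDLSerNo == n;

% Mock results (hypothetical values). In practice:
%   s = pdbread('nicotinic_receptor.pdb', 'ModelNum', 2);
oneModel  = struct('Model', struct('MDLSerNo', 2));             % model found
allModels = struct('Model', struct('MDLSerNo', {1, 2, 3, 4}));  % fallback case

fprintf('Requested model 2 returned: %d\n', gotRequested(oneModel, 2));
fprintf('Requested model 5 returned: %d\n', gotRequested(allModels, 5));
```

The short-circuit && keeps the serial-number comparison from running on a structure array, where s.Model.MDLSerNo would expand to a comma-separated list.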

PDBStruct = pdbread(File,'TimeOut', TimeOutValue) sets the connection timeout (in seconds) to read data from the PDB database.

The Sequence Field

The Sequence field is also a structure containing sequence information in the following subfields:

  • NumOfResidues

  • ChainID

  • ResidueNames — Contains the three-letter codes for the sequence residues.

  • Sequence — Contains the single-letter codes for the sequence residues.

Note

If the sequence has modified residues, then the ResidueNames subfield might not correspond to the standard three-letter amino acid codes. In this case, the Sequence subfield will contain the modified residue code in the position corresponding to the modified residue. The modified residue code is provided in the ModifiedResidues field.
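
As an illustration of how these subfields might be read, the sketch below loops over a mock Sequence field. It assumes one structure element per chain and a space-separated ResidueNames value; both are assumptions for illustration, and all values are hypothetical rather than taken from a real PDB file.

```matlab
% Mock Sequence field (hypothetical values); in practice use pdbStruct.Sequence.
seqInfo = struct('NumOfResidues', {3, 2}, ...
                 'ChainID',       {'A', 'B'}, ...
                 'ResidueNames',  {'MET ALA GLY', 'SER LEU'}, ...
                 'Sequence',      {'MAG', 'SL'});

for k = 1:numel(seqInfo)
    s = seqInfo(k);
    % Compare the single-letter sequence length with the residue count.
    lengthsAgree = numel(s.Sequence) == s.NumOfResidues;
    fprintf('Chain %s: %d residues, sequence %s (lengths agree: %d)\n', ...
            s.ChainID, s.NumOfResidues, s.Sequence, lengthsAgree);
end
```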

The Model Field

The Model field is also a structure or an array of structures containing coordinate information. If the MATLAB structure contains one model, the Model field is a structure containing coordinate information for that model. If the MATLAB structure contains multiple models, the Model field is an array of structures containing coordinate information for each model. The Model field contains the following subfields:

  • Atom

  • AtomSD

  • AnisotropicTemp

  • AnisotropicTempSD

  • Terminal

  • HeterogenAtom
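
Code that consumes the Model field usually needs to work whether it is a single structure or an array of structures. One way to do this, sketched below with a mock Model field containing hypothetical values, is to rely on numel and arrayfun, which treat a single structure as a 1-by-1 array.

```matlab
% Mock Model field with two models (hypothetical values).
% In practice: models = pdbStruct.Model;
models = struct('MDLSerNo', {1, 2}, ...
                'Atom',     {struct('X', {0, 1, 2}), struct('X', {0, 1})});

% numel handles both cases: 1 for a single structure, N for an array.
nModels = numel(models);

% Count the atom records stored in each model.
nAtoms = arrayfun(@(m) numel(m.Atom), models);

for k = 1:nModels
    fprintf('Model %d: %d atoms\n', models(k).MDLSerNo, nAtoms(k));
end
```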

The Atom Field

The Atom field is also an array of structures containing the following subfields:

  • AtomSerNo

  • AtomName

  • altLoc

  • resName

  • chainID

  • resSeq

  • iCode

  • X

  • Y

  • Z

  • occupancy

  • tempFactor

  • segID

  • element

  • charge

  • AtomNameStruct — Contains three subfields: chemSymbol, remoteInd, and branch.
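
A common use of the Atom field is to collect coordinates into a numeric matrix. The sketch below builds a mock Atom array with hypothetical values, using only the AtomName, resName, X, Y, and Z subfields listed above, then gathers the coordinates and selects the alpha-carbon atoms.

```matlab
% Mock Atom array (hypothetical values).
% In practice: atoms = pdbStruct.Model(1).Atom;
atoms = struct('AtomName', {'N', 'CA', 'C', 'CA'}, ...
               'resName',  {'MET', 'MET', 'MET', 'ALA'}, ...
               'X',        {1.0, 2.0, 3.0, 4.0}, ...
               'Y',        {0.0, 0.0, 1.0, 1.0}, ...
               'Z',        {0.0, 1.0, 0.0, 1.0});

% Gather coordinates into an N-by-3 matrix, one row per atom.
xyz = [[atoms.X].', [atoms.Y].', [atoms.Z].'];

% Logical mask selecting the alpha-carbon atoms by name.
isCA  = strcmp({atoms.AtomName}, 'CA');
caXYZ = xyz(isCA, :);

% Geometric center of all atoms (unweighted mean of each column).
centroid = mean(xyz, 1);
```

The comma-separated-list syntax [atoms.X] concatenates one scalar per array element, which is why it works only when each X value is a numeric scalar.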

Examples

  1. Use the getpdb function to retrieve structure information from the Protein Data Bank (PDB) for the nicotinic receptor protein with identifier 1abt, and then save the data to the PDB-formatted file nicotinic_receptor.pdb in the MATLAB Current Folder.

    getpdb('1abt', 'ToFile', 'nicotinic_receptor.pdb');
  2. Read the data from the nicotinic_receptor.pdb file into a MATLAB structure pdbstruct.

    pdbstruct = pdbread('nicotinic_receptor.pdb');
  3. Read only the second model from the nicotinic_receptor.pdb file into a MATLAB structure pdbstruct_Model2.

    pdbstruct_Model2 = pdbread('nicotinic_receptor.pdb', 'ModelNum', 2);
  4. View the atomic coordinate information in the model fields of both MATLAB structures pdbstruct and pdbstruct_Model2.

    pdbstruct.Model
    
    ans = 
    
    1x4 struct array with fields:
        MDLSerNo
        Atom
        Terminal
    
    pdbstruct_Model2.Model
    
    ans = 
    
        MDLSerNo: 2
            Atom: [1x1205 struct]
        Terminal: [1x2 struct]
  5. Read the data from a URL into a MATLAB structure, gfl_pdbstruct.

    gfl_pdbstruct = pdbread('http://www.rcsb.org/pdb/files/1gfl.pdb');

Version History

Introduced before R2006a