How to download data from a website?

Dear all,
I am trying to download data from the following website.
My problem is that I cannot get the files; only the HTML ends up on my computer. (Below is what I used to locate the link on my machine.)
url='https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Daily/4km/sst/2019/'
filename='A2019.nc'
outfilename=websave(filename,url)
What I need is to get the files separately and read them.
Thanks for the help.

2 comments

Juan
Juan on 27 May 2023
To download data from a website, follow these steps:
Identify the data you want to download on the website.
Right-click on the data or the download link.
Select the "Save link as" or "Save target as" option from the context menu.
Choose the destination folder on your computer where you want to save the downloaded data.
Click "Save" to initiate the download.
Wait for the download to complete. The time taken will depend on the size of the data and your internet connection speed.
Once the download is finished, you can access the downloaded data from the specified destination folder on your computer.
It's important to note that downloading data from websites should be done in compliance with applicable laws, website terms of service, and copyright restrictions. Ensure that you have the necessary rights and permissions to download and use the data obtained from websites.
SASSA
SASSA on 15 August 2024
Edited: Walter Roberson on 5 December 2024
The website you linked provides access to oceanographic data, but directly downloading individual files through code might be tricky. Here's a breakdown of what you're encountering and alternative approaches:
Understanding the Download Challenge:
  • The website likely uses a different download method than websave expects. It might require authentication or interact with the server differently.
Alternative Approaches:
  1. Manual Download:
  2. Command-line tools (for advanced users):
  • Tools like wget can be used to download files from websites. However, it might require additional configuration for NASA's specific setup. Refer to the wget documentation and NASA's download instructions (if available) for this approach.
Additional Tips:
  • NASA's OceanColor website provides information on download methods.
  • Consider using Python libraries specifically designed for downloading scientific data like earthpy or pydap. These libraries might offer better compatibility with NASA's data access methods.
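As a concrete sketch of the scripted route mentioned above — hedged: the helper names below are made up for illustration, and the OB.DAAC has required an Earthdata login for some years now, so an anonymous request tends to come back as an HTML login page rather than data:

```python
import requests

def is_html(content_type):
    """True when the server returned a web page (e.g. a login form) instead of data."""
    return content_type.lower().startswith('text/html')

def download_granule(url, out_path, username, password):
    """Fetch one file with Earthdata credentials; return True on success."""
    # A Session carries the credentials across the redirect to urs.earthdata.nasa.gov.
    with requests.Session() as session:
        session.auth = (username, password)
        response = session.get(url, timeout=60)
    if response.ok and not is_html(response.headers.get('Content-Type', '')):
        with open(out_path, 'wb') as f:
            f.write(response.content)
        return True
    return False
```

A call would look like download_granule('https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/<granule>.nc', 'out.nc', user, pw); checking the Content-Type guards against silently saving a login page under a .nc name.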


 Accepted Answer

Akira Agata
Akira Agata on 26 March 2019
How about the following?
url = 'https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Daily/4km/sst/2019/';
str = webread(url);
links = regexp(str,'https://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?(\.nc)','match')';
data = cell(size(links));
for kk = 1:numel(links)
    data{kk} = webread(links{kk});
end
By running this, all the .nc files are downloaded and stored in the cell array data.

12 comments

Lilya
Lilya on 30 March 2019
Thank you so much!!!
I cut and pasted the code above to try it; however, I got the following error after this line of code:
links = regexp(str,'https://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?(\.nc)','match')';
Error using regexp
The 'STRING' input must be either a char row vector, a cell array of char row vectors, or a string array.
I am very new to MATLAB, sorry if I have missed something obvious.
Hi Belinda-san,
Seems strange. Could you tell us what is stored in the variable str after running the first two lines?
url = 'https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Daily/4km/sst/2019/';
str = webread(url);
Belinda Finlay
Belinda Finlay on 30 March 2020
Edited: Belinda Finlay on 30 March 2020
Hi Akira-San, Thanks for helping. This is the screenshot of the variable str
Hi Belinda-san,
OK. Then, please set the 'ContentType' option to 'text' explicitly so that webread returns plain HTML text.
The following shows how to set this option:
opt = weboptions('ContentType','text');
str = webread(url,opt);
Thanks, that removed the error; however, I still seem to be missing something.
I was expecting the .nc files to be downloaded and stored in cell array data; however, that didn't happen.
To simplify things, I tested this code:
url = 'https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Daily/4km/sst/2019/';
options = weboptions('ContentType','text');
data = webread(url,options);
However, I didn't get a .nc file; I got:
May I trouble you to ask what I have missed?
Hi Belinda-san,
No no... please combine my March 26th and March 31st answers, like:
url = 'https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Daily/4km/sst/2019/';
opt = weboptions('ContentType','text');
str = webread(url,opt);
links = regexp(str,'https://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?(\.nc)','match')';
data = cell(size(links));
for kk = 1:numel(links)
    data{kk} = webread(links{kk});
end
Belinda Finlay
Belinda Finlay on 14 April 2020
Thank you, Akira-san.
When I use the code above, the data file only contains the HTML, not the actual file (the same as my example above).
Akira Agata
Akira Agata on 15 April 2020
Hi Belinda-san,
Thank you for checking the code.
Investigating the process, I've found that the target website now requires user authentication before downloading the files, as shown below.
On March 26th (when I posted the first answer), this user authentication was not needed and the code worked as expected.
If you have a username/password for this site, you may be able to download files by setting the 'Username' and 'Password' options of the webread function.
Yubing Cheng
Yubing Cheng on 2 May 2020
Edited: Walter Roberson on 5 December 2024
Hello Sir,
I'm Yubing, a student from China. I have a question I want to ask you.
I know the link to download the data, and I can use
web('https://simba.srsl.com/admin/include/csvtemps.php?imb=fmi0401')
to download the data file, but the problem is: how can I put the data file in my own path and folder, and give it a name of my own choosing?
I appreciate your kind help.
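On the path-and-name question: in MATLAB, websave's first argument may itself include a folder path (e.g. websave(fullfile(myfolder, 'myname.csv'), url)). A Python sketch of the same idea, in the style of the other snippets in this thread — the folder and file names and the helper functions are made-up placeholders:

```python
import os
import requests

def target_path(folder, filename):
    """Build the output path, creating the folder if it does not exist."""
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, filename)

def save_as(url, folder, filename):
    """Download url and store it under the folder and name of your choosing."""
    out_path = target_path(folder, filename)
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    with open(out_path, 'wb') as f:
        f.write(response.content)
    return out_path
```

For example, save_as('https://simba.srsl.com/admin/include/csvtemps.php?imb=fmi0401', 'buoy_data', 'fmi0401.csv') would write the file as buoy_data/fmi0401.csv.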
Belinda Finlay
Belinda Finlay on 12 May 2020
Thank you for your assistance Akira-san.
Soo Mee Kim
Soo Mee Kim on 11 June 2020
Edited: Walter Roberson on 5 December 2024
Hi, I have the same problem trying to download .nc files from the website.
When I tried the following code, both versions gave an HTML file.
Please let me know how to download .nc files from the website.
[my own]
fname = 'A2019152.L3m_DAY_PAR_par_4km.nc';
downloadURL = 'https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2019152.L3m_DAY_PAR_par_4km.nc';
options = weboptions('Username', '<username>', 'Password', '<password>');
websave(fname, downloadURL, options);
[From Akira Agata's answer]
downloadURL = 'https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2019152.L3m_DAY_PAR_par_4km.nc';
options = weboptions('Username', '<username>', 'Password', '<password>');
data = webread(downloadURL, options);


More Answers (3)

Tracy
Tracy on 1 August 2023
It looks like you are using MATLAB's websave function to download the file from the URL. However, websave downloads a single file and saves it to a specified location. Since you are providing the URL of a directory rather than a direct link to a specific file, it won't work as you expect.
To download multiple files from the website, you will need to loop through the links in the directory listing and download each file individually. Alternatively, you can use Python with the requests library to achieve this. Below is a Python script that demonstrates how to download multiple files from the website using requests:
import requests
import os

url = 'https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Daily/4km/sst/2019/'
output_directory = './downloaded_files/'

# Create the output directory if it doesn't exist
os.makedirs(output_directory, exist_ok=True)

response = requests.get(url)
if response.status_code == 200:
    # Parse the HTML content to find links to files
    file_links = []
    lines = response.text.split('\n')
    for line in lines:
        if '<a href="' in line:
            start_index = line.find('<a href="') + len('<a href="')
            end_index = line.find('">', start_index)
            file_link = line[start_index:end_index]
            if file_link.endswith('.nc'):  # Only consider links that point to .nc files
                file_links.append(file_link)
    # Download each file and save it to the output directory
    for file_link in file_links:
        file_url = url + file_link
        out_file_path = os.path.join(output_directory, file_link)
        response = requests.get(file_url)
        if response.status_code == 200:
            with open(out_file_path, 'wb') as f:
                f.write(response.content)
            print(f"Downloaded: {file_link}")
        else:
            print(f"Failed to download: {file_link}")
    print("All files downloaded successfully.")
else:
    print("Failed to fetch the URL.")
Loria Smith
Loria Smith on 5 September 2023


There are several ways to scrape a website manually.
  1. Code a web scraper with Python. It is possible to quickly build software with any general-purpose programming language like Java, JavaScript, PHP, C, C#, and so on. ...
  2. Use a data service. ...
  3. Use Excel for data extraction. ...
  4. Web scraping tools
Hitesh
Hitesh on 3 October 2023
Edited: Walter Roberson on 5 December 2024
import requests

# Define the URL of the file you want to download
url = 'https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Daily/4km/sst/2019/A2019.nc'

# Specify the local filename where you want to save the downloaded file
filename = 'A2019.nc'

# Send an HTTP GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Open a local file in write-binary ('wb') mode and save the response content
    with open(filename, 'wb') as file:
        file.write(response.content)
    print(f"File '{filename}' downloaded successfully.")
else:
    print(f"Failed to download file. Status code: {response.status_code}")
There are various methods for manually collecting data from websites:
  1. Coding a Web Scraper with Python: You can create a web scraper using Python, a versatile programming language. Python offers libraries like BeautifulSoup and Scrapy that make web scraping easier.
  2. Utilizing a Data Service: Another option is to use a data service or API provided by the website, if available. This method allows you to access structured data without the need for web scraping.
  3. Leveraging Excel for Data Extraction: Microsoft Excel can also be used for data extraction. You can import data from web pages into Excel and then manipulate and analyze it as needed.
  4. Web Scraping Tools: There are various web scraping tools and software applications designed specifically for data extraction from websites. These tools often provide a user-friendly interface for collecting data without extensive coding.
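On the scraping point: rather than substring searches over each line of HTML (as in the script earlier in this thread), link extraction can be sketched with Python's standard-library html.parser, which needs no third-party install; the class and function names here are invented for illustration:

```python
from html.parser import HTMLParser

class NcLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at .nc files."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href', '')
            if href.endswith('.nc'):
                self.links.append(href)

def find_nc_links(html_text):
    """Return all .nc hrefs found in an HTML directory listing."""
    parser = NcLinkParser()
    parser.feed(html_text)
    return parser.links
```

The returned links (which may be relative, e.g. /cgi/getfile/...) can then be joined with the server's base URL and fetched one at a time.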
