by Paul Näger, 30.10.2024
The following Jupyter notebook contains code that automatically crawls the Stanford Encyclopedia of Philosophy (SEP), downloads articles in HTML format and converts them to markdown.
This provides convenient machine-readable access to the content of the SEP, since markdown files, with their lightweight syntax, are much easier to process for analysis than HTML.
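To give an impression of what the HTML-to-markdown step involves, here is a minimal sketch that uses only Python's standard library. It is not the notebook's actual conversion code: the class name `HeadingAndParagraphConverter`, the function `html_to_markdown` and the sample HTML are assumptions made for illustration, and only headings and paragraphs are handled.

```python
from html.parser import HTMLParser

BLOCK_TAGS = ("h1", "h2", "h3", "h4", "p")


class HeadingAndParagraphConverter(HTMLParser):
    """Hypothetical helper: collects headings and paragraphs as markdown blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished markdown blocks
        self._prefix = None   # markdown prefix of the block being read, or None
        self._text = []       # text fragments of the block being read

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            # "h2" becomes "## ", a paragraph gets no prefix.
            self._prefix = "" if tag == "p" else "#" * int(tag[1]) + " "
            self._text = []

    def handle_data(self, data):
        if self._prefix is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._prefix is not None:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._text).split())
            if text:
                self.blocks.append(self._prefix + text)
            self._prefix = None


def html_to_markdown(html):
    parser = HeadingAndParagraphConverter()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


# Made-up sample input, not actual SEP content.
sample = "<h1>Neoplatonism</h1><p>First   <em>paragraph</em>.</p><h2>Section title</h2>"
print(html_to_markdown(sample))
```

A real article also contains lists, footnotes, tables and links, which this sketch silently drops.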
The code was written for research purposes only. Note that the content of the SEP is protected by copyright and is subject to considerable restrictions on distribution; see the terms of use at https://plato.stanford.edu/info.html. When using the code, always respect these terms! In the crawl section, the parameter max_links sets the maximum number of links to follow, i.e. the number of articles to download. Before changing it to a value larger than 0, check the current terms of use.
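The effect of such a cap can be sketched as follows. This is an illustration under stated assumptions, not the notebook's crawl code: the function `crawl`, the stand-in `fake_fetch` (which replaces a real HTTP request so that nothing is downloaded) and the second URL are hypothetical.

```python
from typing import Callable, Dict, Iterable


def crawl(links: Iterable[str], fetch: Callable[[str], str], max_links: int = 0) -> Dict[str, str]:
    """Fetch at most max_links pages; with the default of 0 nothing is fetched."""
    pages: Dict[str, str] = {}
    for url in links:
        if len(pages) >= max_links:
            break  # cap reached, stop following links
        pages[url] = fetch(url)
    return pages


def fake_fetch(url: str) -> str:
    # Stand-in for a real download, so the sketch runs offline.
    return f"<html><body>content of {url}</body></html>"


entry_links = [
    "https://plato.stanford.edu/entries/neoplatonism/",
    "https://example.org/second-entry/",  # placeholder URL
]

print(len(crawl(entry_links, fake_fetch)))               # 0
print(len(crawl(entry_links, fake_fetch, max_links=1)))  # 1
```

With a default of 0, running the code unchanged downloads nothing; the user has to raise the limit deliberately.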
The following markdown file illustrates the result of applying the code to the SEP article on Neoplatonism (Fall 2024 version), https://plato.stanford.edu/entries/neoplatonism/. Note that what is shown here is a heavily abridged version of the crawling result, containing only approximately the first 50 words of each main section; it is only meant to give an impression of the structure of the output.