The Cracked Bassoon


My bibliography

Filed under jekyll, liquid, python, references.

Previously, I described how to automatically add references (citations and a bibliography) to Jekyll blog posts. I use a related method to automatically generate a list of my own publications, which can be seen here.

As I mentioned in that previous post, I have a collection of Python scripts that run when I build my site. This collection also contains the following script, called make_my_bib.py:

"""Create the bibliography file my_papers.yaml.

"""

from Bio import Entrez
import re


def make_my_bib():
    """Grab all my publications and format them into a YAML bibliography.

    """
    Entrez.email = "your.email@example.com"
    handle = Entrez.esearch(
        db="pubmed", sort="date", retmax="200", retmode="xml", term="mathias sr[author]"
    )
    pmids = Entrez.read(handle)["IdList"]
    extras = ["28480992", "28385874", "27744290", "32170019"]
    pmids += extras
    pmids = set(pmids)
    pmids = ",".join(pmids)
    handle = Entrez.efetch(db="pubmed", retmode="xml", id=pmids)
    papers = Entrez.read(handle)["PubmedArticle"]
    data = []

    for paper in papers:

        article = paper["MedlineCitation"]["Article"]
        journal = article["Journal"]
        date = journal["JournalIssue"]["PubDate"]
        year = date["Year"]
        month = "00" if "Month" not in date else date["Month"]
        if len(month) == 3:
            month = dict(
                zip(
                    [
                        "Jan",
                        "Feb",
                        "Mar",
                        "Apr",
                        "May",
                        "Jun",
                        "Jul",
                        "Aug",
                        "Sep",
                        "Oct",
                        "Nov",
                        "Dec",
                    ],
                    range(1, 13),
                )
            )[month]
        sort = year + "%02d" % int(month)
        ids = paper["PubmedData"]["ArticleIdList"]

        authors = []
        for _a in article["AuthorList"]:
            if "LastName" in _a:
                a = _a["LastName"] + ", " + ". ".join(_a["Initials"]) + "."
            else:
                a = _a["CollectiveName"].rstrip()
            authors.append(a)
        if len(authors) > 1:
            authors[-1] = "& " + authors[-1]

        jt = journal["Title"].title().replace("Of The", "of the").replace("And", "and")
        jt = jt.replace("Of", "of").replace(" (New York, N.Y. : 1991)", "")
        jt = jt.replace(" (New York, N.Y.)", "")
        jt = jt.split(" : ")[0]
        jt = jt.replace(". ", ": ").replace("In ", "in ").replace(": Cb", "")
        jt = jt.replace("Jama", "JAMA")

        ji = journal["JournalIssue"]
        volume = None if "Volume" not in ji else ji["Volume"]
        issue = None if "Issue" not in ji else ji["Issue"]

        title = article["ArticleTitle"]
        rtn = re.split("([:] *)", title)
        title = "".join([i.capitalize() for i in rtn])
        keep = ["BP1-BP2", "African", "American", "Americans", "MRI", "QTL", "ENIGMA"]
        title = " ".join(s.upper() if s.upper() in keep else s for s in title.split())

        k = "Pagination"
        first_page = None if k not in article else article[k]["MedlinePgn"]
        last_page = None
        if first_page:
            first_page, *last_page = first_page.split("-")

        dic = {
            "authors": ", ".join(authors),
            "title": title if title[-1] != "." else title[:-1],
            "journal": jt,
            "year": year,
            "sort": sort,
            "pmid": str(paper["MedlineCitation"]["PMID"]),
        }
        dic["id"] = dic["pmid"]

        # Not every PubMed record has a DOI; add the field only when one exists.
        dois = [str(i) for i in ids if str(i).startswith("10.")]
        if dois:
            dic["doi"] = dois[0].lower()

        if volume:
            dic["volume"] = volume
        if issue:
            dic["issue"] = issue
        if first_page:
            dic["first_page"] = first_page
        if last_page:
            dic["last_page"] = last_page[0]

        if int(year) >= 2010 and "Correction:" not in title:

            data.append(dic)

    with open("../../_data/my_papers.yaml", "w") as fw:

        for paper in data:

            # Key each entry as "p" + YYYYMM + the first two letters of the title.
            fw.write(f"p{paper['sort'] + paper['title'].split()[0][:2].lower()}:\n")
            del paper["id"]
            for k, v in paper.items():
                fw.write(f"""   {k}: "{v}"\n""")
            fw.write("\n")

        # Append the manually entered publications that are not on PubMed.
        with open("../../_data/my_papers_manual.yaml") as fr:
            fw.write(fr.read())


if __name__ == "__main__":
    make_my_bib()

This script uses the Biopython third-party package—specifically the Entrez subpackage—to grab a list of all my publications from PubMed. It collects metadata from these publications, performs a few hard-coded edits, and adds them to a YAML file called my_papers.yaml. It also appends data from another YAML file called my_papers_manual.yaml which, as the name suggests, contains manually entered publications that are not listed on PubMed. Here’s an example item from that file:

p201601re:
   authors: "Mathias, S. R., Knowles, E. E., Kent, J. W., McKay, D. R., Curran, J. E., de Almeida, M. A., Dyer, T. D., Göring, H. H., Olvera, R. L., Duggirala, R., Fox, P. T., Almasy, L., Blangero, J., & Glahn, D. C."
   title: "Recurrent major depression and right hippocampal volume: A bivariate linkage and association study"
   journal: "Human Brain Mapping"
   year: "2016"
   sort: "201601"
   pmid: "26485182"
   doi: "10.1002/hbm.23025"
   volume: "37"
   issue: "1"
   first_page: "191"
   last_page: "202"
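
The key, sort value, and author strings in an entry like this one come from a few small string operations in make_my_bib.py. The sketch below isolates them as standalone functions. The function names (sort_key, entry_key, format_author) are hypothetical and do not appear in the script, and the inputs are illustrative values rather than live PubMed data.

```python
# Month abbreviations as they can appear in PubMed dates, mapped to 1..12.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def sort_key(year, month=None):
    """Return a YYYYMM string; a missing month becomes "00"."""
    if month is None:
        number = 0
    elif month in MONTHS:
        number = MONTHS[month]
    else:
        number = int(month)  # the month may also arrive as digits
    return f"{year}{number:02d}"


def entry_key(sort, title):
    """Return the YAML key: "p", the sort string, then two letters of the title."""
    return "p" + sort + title.split()[0][:2].lower()


def format_author(last_name, initials):
    """Return a name such as "Mathias, S. R." from "Mathias" and "SR"."""
    return last_name + ", " + ". ".join(initials) + "."


sort = sort_key("2016", "Jan")
title = "Recurrent major depression and right hippocampal volume"
print(sort)                            # 201601
print(entry_key(sort, title))          # p201601re
print(format_author("Mathias", "SR"))  # Mathias, S. R.
```

Two papers published in the same month whose titles begin with the same two letters would produce the same key, so this scheme relies on such collisions being rare.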

These data are interpreted by the following Liquid code embedded within the static page publications.md:

{% assign papers = site.data.my_papers | sort %}
{% for paper in papers reversed %}
  {% include citation.html %}
{% endfor %}

The array papers is sorted in reverse chronological order so that the most recent publication appears first. See my previous post to understand how citation.html works.
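
The ordering works because every key starts with "p" followed by the zero-padded year and month, so alphabetical order on the keys matches chronological order. Here is a rough Python equivalent of the sort-then-reverse step, assuming that Liquid's sort filter orders the entries by key. The entries and the function name newest_first are hypothetical.

```python
# Hypothetical entries, keyed like those in my_papers.yaml:
# "p" + YYYYMM + the first two letters of the title.
papers = {
    "p201601re": {"year": "2016"},
    "p202003ex": {"year": "2020"},
    "p201811an": {"year": "2018"},
}


def newest_first(entries):
    """Sort entries by key, then reverse the result."""
    ordered = sorted(entries.items(), key=lambda item: item[0])
    return list(reversed(ordered))


for key, paper in newest_first(papers):
    print(key, paper["year"])
# p202003ex 2020
# p201811an 2018
# p201601re 2016
```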
