2022-04-19


c. Proposal Summary/Scope of Work
Provide a short summary of the work being proposed (maximum of 500 words)

The Unix shell is a central user interface and glue language in all kinds of
scientific computing, particularly in bioinformatics.

Oil is a new open source Unix shell. It's meant to be our upgrade path from GNU
bash, the most popular shell in the world. Home page: https://www.oilshell.org/

It runs your existing scripts, and allows you to upgrade them to the new Oil
language, which is designed to be familiar to Python and JavaScript users.

In the last few years, we've released a correct but slow implementation dozens
of times, and gotten regular feedback from users.

Now we need a COMPILER ENGINEER to finish semi-automatically translating the
code to fast C++. This plot is a quick way to see our progress:
https://www.oilshell.org/blog/2022/03/spec-test-history.png

Blog post with this plot: https://www.oilshell.org/blog/2022/03/middle-out.html

Roughly speaking, we'll have a competitive replacement for / upgrade to bash
when the red line meets the blue line!

So the work is already more than half done, and I would consider it low risk /
high reward. Addressing the speed issue will allow us to aggressively add new
features and polish the documentation.

Our FAQ has over 178K views, having been featured in many places like Hacker
News: https://www.oilshell.org/blog/2021/01/why-a-new-shell.html

-----

I've drafted the job requirements here:
https://github.com/oilshell/oil/wiki/Compiler-Engineer-Job

I will use my professional network (having worked at Google and EA) to find the
compiler engineer, who will be skilled in compilers, C++ and Python.

Python creator Guido van Rossum knows about Oil:

https://twitter.com/gvanrossum/status/995862193609551872

"Amazing. A bash implementation in Python, by my ex-coworker (at Google) Andy
Chu"

A few years ago he introduced me to 2 compiler engineers working at Dropbox,
who may be good candidates for the job. However, they are gainfully employed
and would need to be compensated.






d. Value to Biomedical Users
Describe the expected value of the proposed work to the biomedical research community (maximum of 250 words)

If batch computation on Unix systems is a bottleneck in your lab's "scientific
discovery loop", then a better Unix shell will make you more productive! You
can run more experiments with less staff.

Oil treats Unix shell like a real programming language, rather than a mystery
handed off from one researcher to the next.

Moreover, the software that underlies published experiments is heterogeneous: a
mix of programs written in different languages, at different times, by
different people.

The Unix shell glues it all together and provides an interactive interface.
It's also a powerful interface for using remote computers.

But shell is showing its age and has been neglected by industry and academia.
It has fundamental flaws like a lack of robust error handling, which lead to
productivity loss, expensive training, and even erroneous scientific results.

Oil fixes these problems, and adds much-needed features that will be familiar
to Python, JavaScript, and R users.

Four Features That Justify a New Unix Shell:
http://www.oilshell.org/blog/2020/10/osh-features.html

A Tour of the Oil Language:
https://www.oilshell.org/release/latest/doc/oil-language-tour.html

----

Similar sentiments from a third party at https://datacarpentry.org/2015-11-04-ACUNS/shell-intro/

- For most bioinformatics tools, you have to use the shell. There is no
  graphical interface. If you want to work in metagenomics or genomics you're
  going to need to use the shell.
- The shell gives you power ... When you need to do things tens to hundreds of
  times, knowing how to use the shell is transformative.
- To use remote computers or cloud computing, you need to use the shell.



f. Landscape Analysis

Briefly describe the other software tools (either proprietary or open source)
that the audience for this proposal primarily uses. How do the software
project(s) in this proposal compare to these other tools in terms of user base
size, usage, and maturity? How do existing tools and the project(s) in this
proposal interact? (maximum of 250 words)


I made a list of alternative shells:
https://github.com/oilshell/oil/wiki/Alternative-Shells

Oil is the ONLY shell that is compatible with bash. This effort took years,
and the work is largely DONE, and documented extensively on the blog. It runs
thousands of lines of unmodified bash scripts.

Compatibility is important because users (including scientific users) don't
have time to rewrite working shell scripts in a different language. It's
expensive, just as it's expensive to rewrite C code in another language.

But it's easy to run existing code under a new shell, and desirable if it
provides better error handling, debugging, and new features.

------

Scientific workflow languages like CWL, WDL, and Snakemake are increasingly
popular [1]. However, they generally wrap Unix shell rather than replace it.
So shell is complementary to these higher-level tools.

There are also many such languages, and each one may be especially suited for a
particular HPC problem domain.

In contrast, Unix shell is ubiquitous in all scientific computing domains, in
both academia and industry. For example, here are some organizations that are
teaching shell (found through Google):

https://curriculumfellows.hms.harvard.edu/classes/introduction-command-line-interface-shell-bash-unix-linux

http://chemlabs.princeton.edu/researchcomputing/wp-content/uploads/sites/21/2018/09/hpc-getting-started-chem-workshop.pdf

https://bioinformatics.uconn.edu/unix-basics/#

https://www.melbournebioinformatics.org.au/tutorials/tutorials/unix/unix/

http://williamslab.bscb.cornell.edu/?page_id=235

Shell is also widely used in machine learning. It has the same flavor of
gluing together disparate data sets and tools that you find in the natural
sciences.

-----

[1] "A review of bioinformatic pipeline frameworks" https://academic.oup.com/bib/article/18/3/530/2562749