Contributing to CPython is a great way to improve your programming skills, learn about the Python language, and help the Python community. Here are some steps to start contributing to CPython:
Set up your development environment:
    git clone https://github.com/YOUR_USERNAME/cpython.git
    cd cpython

    python3 -m venv venv
    source venv/bin/activate
    python -m pip install -r requirements.txt
Build CPython from source:
On Unix-like systems:
    ./configure --with-pydebug
    make -j
On Windows:
    PCbuild\build.bat -e -d
Run the test suite:
    ./python -m test
Find an issue to work on:
Fix the issue and add tests:
Add tests for your change in the Lib/test directory. For example, if you fixed a bug in the str.capitalize() method, you might add a test like this in Lib/test/test_string.py:
    def test_capitalize_bugfix(self):
        # Description of the bugfix and issue number
        s = "your test string"
        expected = "Your expected result"
        self.assertEqual(s.capitalize(), expected)
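To try a test of this shape outside the CPython tree, it can be wrapped in a self-contained unittest module. This is only a minimal sketch: the class name CapitalizeTest, the helper run_tests, and the sample strings are assumptions made for illustration, not names taken from CPython's test suite.

```python
import unittest


class CapitalizeTest(unittest.TestCase):  # hypothetical class name
    def test_capitalize_lowercases_the_rest(self):
        # str.capitalize() upper-cases the first character and lower-cases the rest.
        self.assertEqual("hello WORLD".capitalize(), "Hello world")

    def test_capitalize_empty_string(self):
        # An empty string is returned unchanged.
        self.assertEqual("".capitalize(), "")


def run_tests():
    """Run the tests above and return True if they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CapitalizeTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling run_tests() gives a quick pass/fail answer; inside CPython the same test methods would instead live in an existing test class and be run through ./python -m test.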
Commit your changes:
    git add FILE1 FILE2 ...
    git commit -m "bpo-12345: Fixed bug in str.capitalize()"
Run the test suite again to ensure everything still works:
    ./python -m test
Push your changes to your fork on GitHub:
    git push origin BRANCH_NAME
Create a pull request:
Open a pull request from your branch and make sure the base repository is python/cpython and the base branch is main.
Address any feedback from the core developers and maintainers. They may suggest changes or ask for clarification. Update your PR as needed and continue the discussion until your changes are approved and merged.
Happy coding!
As an AI language model, I cannot interact with live repositories or check the current status of a repository. However, I can suggest a hypothetical change to illustrate the process of making a contribution to CPython.
Hypothetical change: Improve the str.title() method to better handle apostrophes in words.
Let's assume the current implementation of str.title() doesn't handle words with apostrophes correctly. For example, when the string "o'clock" is passed to str.title(), it returns "O'Clock" instead of the expected "O'clock".
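The behavior is easy to observe from Python itself. The sketch below prints what str.title() returns and compares it with a regular-expression workaround along the lines of the one described in the Python documentation for str.title(). The helper name title_keep_apostrophes is an assumption made for illustration, and the pattern only covers ASCII letters.

```python
import re


def title_keep_apostrophes(text):
    """Title-case text, treating an apostrophe inside a word as part of that word."""
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda match: match.group(0).capitalize(),
        text,
    )


print("they're here".title())                  # They'Re Here
print(title_keep_apostrophes("they're here"))  # They're Here
print(title_keep_apostrophes("it's o'clock"))  # It's O'clock
```

A pure-Python model like this is a convenient way to pin down the intended results before touching any C code.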
Locate the implementation of str.title() in Objects/unicodeobject.c, which might be named something like unicode_title. Here's a hypothetical modification to the function:
    /* ... */
    for (i = 0; i < n; i++) {
        /* PyUnicode_READ yields a Py_UCS4, so do not narrow it to Py_UCS1. */
        Py_UCS4 ch = PyUnicode_READ(kind, data, i);
        if (ch == '\'' && previous_is_cased) {
            /* An apostrophe inside a word does not start a new word,
               so previous_is_cased stays set. */
            continue;
        }
        if (previous_is_cased) {
            /* Inside a word: force lower case. */
            Py_UCS4 t = Py_UNICODE_TOLOWER(ch);
            if (t != ch)
                SET_TITLE(i, t);  /* SET_TITLE is a hypothetical helper */
        }
        else {
            /* Start of a word: force upper case. */
            Py_UCS4 t = Py_UNICODE_TOUPPER(ch);
            if (t != ch)
                SET_TITLE(i, t);
        }
        previous_is_cased = Py_UNICODE_ISALPHA(ch);
    }
    /* ... */
Add a test to the StringTest class in Lib/test/test_string.py. Example of a test for this change:
    def test_title_with_apostrophe(self):
        s = "it's o'clock"
        expected = "It's O'clock"
        self.assertEqual(s.title(), expected)

        s = "they're here"
        expected = "They're Here"
        self.assertEqual(s.title(), expected)
Remember that this is just a hypothetical example. Always make sure to understand the issue you're working on, consult the documentation, and ask for help if needed.
Hypothetical change: Add a new method str.isnumeric_only() to check if a string contains only numeric characters (excluding decimal points, commas, and other special characters).
Add a new function in Objects/unicodeobject.c that implements the str.isnumeric_only() method, e.g., unicode_isnumeric_only. Here's a hypothetical implementation of the function:
    static PyObject *
    unicode_isnumeric_only(PyObject *self, PyObject *Py_UNUSED(ignored))
    {
        void *data;
        int kind;
        Py_ssize_t i, n;

        if (!PyUnicode_Check(self))
            return PyBool_FromLong(0);

        data = PyUnicode_DATA(self);
        kind = PyUnicode_KIND(self);
        n = PyUnicode_GET_LENGTH(self);

        /* Every character must be a digit; note that an empty string passes. */
        for (i = 0; i < n; i++) {
            Py_UCS4 ch = PyUnicode_READ(kind, data, i);
            if (!Py_UNICODE_ISDIGIT(ch))
                return PyBool_FromLong(0);
        }
        return PyBool_FromLong(1);
    }
Register the new method in the unicode_methods array in the same file:
    static PyMethodDef unicode_methods[] = {
        // ...
        {"isnumeric_only", (PyCFunction) unicode_isnumeric_only, METH_NOARGS,
         PyDoc_STR("Return True if the string contains only numeric characters, False otherwise.")},
        // ...
    };
Add tests to the TestUnicode class in Lib/test/test_unicode.py. Example of a test for this change:
    def test_isnumeric_only(self):
        self.assertTrue("12345".isnumeric_only())
        self.assertTrue("00000".isnumeric_only())
        self.assertFalse("3.14".isnumeric_only())
        self.assertFalse("1,000".isnumeric_only())
        self.assertFalse("abc123".isnumeric_only())
        self.assertFalse(" 123 ".isnumeric_only())
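Because str.isnumeric_only() does not exist, the expected results can be checked against a pure-Python model first. The function below is a sketch under that assumption: isnumeric_only is a hypothetical free function, and it relies on str.isdigit() for single characters as the Python-level counterpart of the per-character digit check in the C sketch.

```python
def isnumeric_only(text):
    """Pure-Python model of the hypothetical str.isnumeric_only() method."""
    # Each character is tested on its own, mirroring the C loop.
    # Like that loop, this returns True for an empty string.
    return all(ch.isdigit() for ch in text)


print(isnumeric_only("12345"))   # True
print(isnumeric_only("3.14"))    # False
print(isnumeric_only(" 123 "))   # False
```

One design point worth settling early is the empty string: the existing str.isdigit() returns False for it, while this model (like the C loop) returns True.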
Hypothetical bug fix: The str.split() method doesn't handle consecutive delimiters properly when a custom delimiter is provided.
Assume, for the sake of illustration, that returning empty strings for consecutive custom delimiters is a bug. For example, "a;;b;c;;d".split(';') returns ['a', '', 'b', 'c', '', 'd'], and the hypothetical fix would make it return ['a', 'b', 'c', 'd'] instead. (In real Python the first result is the documented behavior of str.split() with an explicit separator, so this is not a genuine bug.)
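Locate the implementation of str.split() in Objects/unicodeobject.c, which might be named something like unicode_split. Here's a hypothetical modification to the function:
    /* ... */
    if (maxcount < 0)
        maxcount = PY_SSIZE_T_MAX;
    if (maxcount == 0 || (sub_len == 0 && maxcount == 1)) {
        /* No split can happen: return a one-element list holding the string.
           str.split() returns a list, so build a list rather than a tuple. */
        result = PyList_New(1);
        if (!result)
            return NULL;
        Py_INCREF(self);
        PyList_SET_ITEM(result, 0, self);
        return result;
    }
    /* ... the splitting loop, where empty fields between consecutive
       delimiters would be skipped, is elided ... */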
Add a test to the StringTest class in Lib/test/test_string.py. Example of a test for this bug fix:
    def test_split_consecutive_delimiters(self):
        s = "a;;b;c;;d"
        expected = ['a', 'b', 'c', 'd']
        self.assertEqual(s.split(';'), expected)

        s = "a:::b:c::d"
        expected = ['a', 'b', 'c', 'd']
        self.assertEqual(s.split(':'), expected)
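The result these tests expect can already be produced in today's Python by filtering out the empty fields. The helper below is a small sketch of that idea; the name split_skip_empty is an assumption made for illustration.

```python
def split_skip_empty(text, sep):
    """Split text on sep and drop the empty fields produced by adjacent separators."""
    return [field for field in text.split(sep) if field]


print("a;;b;c;;d".split(";"))              # ['a', '', 'b', 'c', '', 'd']
print(split_skip_empty("a;;b;c;;d", ";"))  # ['a', 'b', 'c', 'd']
```

Comparing the two outputs side by side makes the proposed behavior change explicit, which is useful when describing it in an issue.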
Hypothetical speed-up: Optimize the sum() function for cases when the input is a list of integers.
Currently, the sum() function works for any iterable containing numbers. However, when the input is a list of integers, it could be optimized using a more efficient algorithm that takes advantage of the specific data type.
Locate the implementation of the sum() function in Python/bltinmodule.c, which might be named something like builtin_sum. Here's a hypothetical modification to the function:
    static PyObject *
    builtin_sum(PyObject *self, PyObject *args)
    {
        PyObject *seq;
        PyObject *result = NULL;
        PyObject *temp, *item, *iter;

        if (!PyArg_ParseTuple(args, "O|O:sum", &seq, &result))
            return NULL;

        if (result == NULL) {
            result = PyLong_FromLong(0);
            if (result == NULL)
                return NULL;
        }
        else
            Py_INCREF(result);

        /* Fast path: an exact list summed onto an exact int start value. */
        if (PyList_CheckExact(seq) && PyLong_CheckExact(result)) {
            Py_ssize_t i = 0, n = PyList_GET_SIZE(seq);
            int overflow;
            long total = PyLong_AsLongAndOverflow(result, &overflow);

            for (; !overflow && i < n; i++) {
                long value;
                item = PyList_GET_ITEM(seq, i);  /* borrowed reference */
                if (!PyLong_CheckExact(item))
                    break;
                value = PyLong_AsLongAndOverflow(item, &overflow);
                if (overflow)
                    break;
                /* Leave the fast path rather than overflow a C long. */
                if ((value > 0 && total > LONG_MAX - value) ||
                    (value < 0 && total < LONG_MIN - value))
                    break;
                total += value;
            }
            if (!overflow && i == n) {
                /* All items were small integers, return the result */
                Py_DECREF(result);
                return PyLong_FromLong(total);
            }
            /* Otherwise restart from the start value with the generic algorithm. */
        }

        // Fallback to the generic algorithm for other cases
        iter = PyObject_GetIter(seq);
        if (iter == NULL) {
            Py_DECREF(result);
            return NULL;
        }

        for (;;) {
            item = PyIter_Next(iter);
            if (item == NULL) {
                if (PyErr_Occurred())
                    goto error;
                break;
            }
            temp = PyNumber_Add(result, item);
            Py_DECREF(item);
            if (temp == NULL)
                goto error;
            Py_DECREF(result);
            result = temp;
        }
        Py_DECREF(iter);
        return result;

    error:
        Py_DECREF(iter);
        Py_DECREF(result);
        return NULL;
    }
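A fast path like this is easy to get subtly wrong, so the cases it must not break are worth writing down as tests. The sketch below lists a few of them using only the built-in sum(); the class name SumIntegerListTest and the helper run_sum_tests are assumptions made for illustration, not names from Lib/test/test_builtin.py.

```python
import sys
import unittest


class SumIntegerListTest(unittest.TestCase):  # hypothetical class name
    def test_small_integers(self):
        self.assertEqual(sum([1, 2, 3]), 6)
        self.assertEqual(sum([]), 0)

    def test_start_value(self):
        # The start value may be an int or any other number.
        self.assertEqual(sum([1, 2, 3], 10), 16)
        self.assertEqual(sum([1, 2], 0.5), 3.5)

    def test_beyond_machine_word(self):
        # Results larger than a C long must still be exact.
        big = sys.maxsize
        self.assertEqual(sum([big, big, 1]), 2 * big + 1)

    def test_mixed_types_use_generic_path(self):
        self.assertEqual(sum([1, 2.5, 3]), 6.5)


def run_sum_tests():
    """Run the tests above and return True if they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SumIntegerListTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

The large-integer and non-integer start cases are the ones most likely to expose a fast path that converts to a C long without checking for overflow or type.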
Add tests to verify that the sum() function still works correctly after the optimization. This can be done in the Lib/test/test_builtin.py file.
Benchmark the optimized sum() function for lists of integers. You can use the timeit module or a benchmarking library like pytest-benchmark for this purpose. Example of a benchmark using timeit:
    import timeit

    # Defined at module level so that globals=globals() makes them visible to timeit.
    integer_list = list(range(1000))
    float_list = [float(i) for i in range(1000)]

    def benchmark_sum():
        integer_time = timeit.timeit('sum(integer_list)', globals=globals(), number=1000)
        float_time = timeit.timeit('sum(float_list)', globals=globals(), number=100