Meeting retrospective
Review what worked well and what could be improved for attendees (Berlin + rest of world), audience, stakeholders, etc.
Scheduling note - this would fit best either at the end of the con or afterwards
A session to look at how users of OpenRefine with Wikibase, Wikidata, and Wikimedia Commons use OpenRefine: what are the key use cases, key functions, and key workflows?
Explore what needs improving and where the wiki* use cases overlap with and diverge from other types of OpenRefine usage.
I regularly deliver OpenRefine training, specifically to librarians. I also wrote the original Library Carpentry OpenRefine course and continue to maintain it.
I'm interested in both sharing my own experience and hearing from others about good approaches to offering training on a tool with so many possible functions and use cases.
Presenting the status of OpenRefine's grants and the funding and team structure plan for the next 12 to 18 months.
This session demonstrates the reconciliation workflow in OpenRefine along with its recent improvements (before vs. after), both in the UI and in the error display. It will also include a brief anecdote about my experience working on OpenRefine as an Outreachy intern and as a contractor, along with various improvements that could be made in the future for even better reconciliation.
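For context, here is a minimal sketch of what a single reconciliation call looks like over the Reconciliation Service API, the protocol OpenRefine speaks to reconciliation services. The endpoint URL, example value, and Wikidata type are assumptions for illustration:

```python
# Minimal sketch of one reconciliation query against a service implementing
# the Reconciliation Service API. The endpoint URL is an assumption; any
# compatible service should behave the same way.
import json
import requests

ENDPOINT = "https://wikidata.reconci.link/en/api"  # assumed Wikidata reconciliation service

queries = {
    "q0": {
        "query": "Berlin",   # the cell value to reconcile
        "type": "Q515",      # optional: restrict candidates to a type (here: city)
        "limit": 3,          # number of candidates to return
    }
}

# The API accepts a batch of queries as a JSON string in the "queries" form field.
response = requests.post(ENDPOINT, data={"queries": json.dumps(queries)})
response.raise_for_status()

for key, result in response.json().items():
    for candidate in result["result"]:
        # Each candidate carries an id, a label, a score, and a "match" flag.
        print(key, candidate["id"], candidate["name"], candidate["score"], candidate["match"])
```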
OpenRefine has developed robust capabilities for data import, wrangling, and ingestion into several platforms. However, different user segments have needs that its technical, complex, and often unintuitive interfaces cannot meet.
Coming from the Wikimedia / GLAM partnerships domain, I would love to see how different modalities could be used for reconciliation: geographic comparison, comparison of source materials, visual cues, exact matches via authority IDs, AI suggestions, use of related information, etc.
It is obvious that OpenRefine development cannot meet all the possible needs of all user segments, but what kind of architecture would allow building on top of OpenRefine, or connecting with its capabilities, to create tools for specific purposes while taking advantage of everything it already offers? What kind of production structure would be needed, and who should coordinate and fund it?
Even if this overlaps with other proposals, the point of this proposal is to focus on facilitating a variety of user experiences.
When you browse GitHub or GitLab, you can start a browser-based integrated development environment with just one click. You can then start hacking with only a web browser and afterwards commit your results back to the project.
What needs to be done to do the same with OpenRefine in the context of datasets?
As a user you browse a collection of datasets and with one click you have a dataset loaded as OpenRefine project in your browser. With another click the updated data is sent back to the data provider.
As a "data provider" you can start an OpenRefine instance for just a single user/session and populate it with a given dataset. After working on this dataset there is some callback to update the dataset.
More and more services document their REST endpoints using the OpenAPI specification (https://swagger.io/specification/). Using the OpenAPI specification, it is quite simple to programmatically create REST requests for the documented API.
Supporting OpenAPI would allow more people to use these "technical" services without the need to program customized GUIs, export, and/or import functionality.
Use-Cases:
- Create OpenRefine projects from REST endpoints.
- Create REST requests from columns similar to the "Add column by fetching URLs" feature in OpenRefine.
- Create or update items on a REST endpoint via an export dialog.
Challenges:
- Setting custom headers (User-Agents, ...)
- Consider API rate limits
- Support paging
- Support API authentication
- Map columns to request properties/payloads
- ...
There is some overlap with the session proposal "Supporting data upload to more platforms". The goal of this session would be to identify and define more use cases and challenges regarding support for this specific technology. Maybe even get technical and start some first experiments, like the sketch below.
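One such first experiment might look like the following sketch: read an OpenAPI document, pick a documented GET operation, and issue one request per cell value, much like "Add column by fetching URLs", with a custom User-Agent and naive rate limiting. The spec URL, path, and parameter names are made-up placeholders, not a real service:

```python
# Experiment: drive per-row REST requests from an OpenAPI document.
import time
import requests

SPEC_URL = "https://example.org/api/openapi.json"    # hypothetical OpenAPI document
spec = requests.get(SPEC_URL).json()

# The "servers" block tells us where the documented paths live.
base_url = spec.get("servers", [{"url": "https://example.org/api"}])[0]["url"]

# Pick a documented GET operation and inspect its declared query parameters.
path = "/items"                                       # hypothetical path from the spec
operation = spec["paths"][path]["get"]
query_params = [p["name"] for p in operation.get("parameters", []) if p["in"] == "query"]
param_name = query_params[0] if query_params else "q"

column_values = ["Berlin", "Hamburg", "Munich"]       # stand-in for an OpenRefine column
headers = {"User-Agent": "openrefine-openapi-experiment/0.1"}  # the custom-header challenge

for value in column_values:
    # Map each cell value onto the first documented query parameter,
    # much like "Add column by fetching URLs" does with a templated URL.
    response = requests.get(base_url + path, params={param_name: value}, headers=headers)
    print(value, response.status_code)
    time.sleep(1)  # naive rate limiting; real support would honour the API's declared limits
```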
Bocoup has interviewed several members of the OpenRefine community to draft initial versions of OpenRefine’s Mission, Vision, and Values.
In this session, we will share updates on the engagement thus far, reflect on the importance of a coherent Vision that guides Mission & Values, and workshop these drafts in small groups.
Who can contribute to OpenRefine's design framework? What workflows would enable not only broader participation, but also more actionable and achievable design contributions? Design practice in open-source products and communities is notoriously difficult to implement in equitable and effective ways, so in this session we can discuss together how this could work in the context of OpenRefine: what can we learn from design projects carried out with OpenRefine in the past, and what could a future design contribution framework for OpenRefine look like?
It'd be great to explore options for improving basic data visualization in OpenRefine. Using visualization tools as part of facets is great, IMO, but the visual side could be improved, especially if we better integrate GIS tools into OpenRefine. We could discuss the scope we're comfortable with when it comes to exploratory analyses and suggest types of visualizations/analyses we feel are lacking right now.
It would also be great to have improved tools to read/write common GIS formats like shapefiles. Perhaps a closer collaboration with OSM? ArcGIS is becoming fairly open to open source as well, and it basically dominates the academic world and multiple industries. Some governments are turning to QGIS, which might be the fastest way forward for now, but it'd be interesting to hear from anyone who has specific challenges with GIS and OpenRefine.
Adding geospatial data to non-geospatial datasets can also be a bit of a challenge for beginners when there are so many good geospatial sources out there.
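To make the discussion concrete, here is a minimal sketch of the kind of round-trip people currently do outside OpenRefine: joining a CSV exported from OpenRefine with a shapefile and writing the result to a common GIS format. File and column names are hypothetical, and it assumes geopandas and pandas are installed:

```python
# Join a table cleaned in OpenRefine with a shapefile and export a GIS layer.
import geopandas as gpd
import pandas as pd

# Table cleaned in OpenRefine, exported as CSV, with a shared key column.
table = pd.read_csv("cleaned_export.csv")            # hypothetical export
boundaries = gpd.read_file("admin_boundaries.shp")   # hypothetical shapefile

# Attach geometries to the cleaned records via the shared identifier.
joined = boundaries.merge(table, on="region_id", how="left")

# Write the enriched layer back to a common GIS format (GeoPackage here).
joined.to_file("enriched.gpkg", driver="GPKG")
```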
Our ongoing discussions throughout 2023 and early 2024 (https://forum.openrefine.org/t/improving-the-onboarding-process-for-new-contributors/882) have highlighted the need for a more systematic approach to managing our contributors. This session will serve as a platform to share our ideas and discuss how we can formalize the process. We will look into the following points:
1. Defining Contributor Roles: What are the various contributor roles we should recognize and encourage within our community? How can we ensure that each role is clear and meaningful?
2. Setting Permission Levels: What should the permission levels be for different stages of involvement (e.g., new contributor vs. long-term committer)? How can these levels help in managing community contributions more effectively?
3. Managing Permissions: What processes should we implement for granting and revoking permissions? What criteria should be used to ensure fairness and transparency in these decisions?
Quick demos of other tools that OpenRefine should take inspiration from
OpenRefine currently supports data upload to Wikibase, but also to RDF databases via the RDF Transform extension, or to SNAC, also via an extension, for instance. More such extensions (or forks) have been developed in the past, with varying maintenance status. A recent discussion with the GND community explored the idea of building a similar integration to populate the authority file of the German National Library.
How can we enable more of such integrations? Are there specific integrations we want to prioritize? Should those be developed inside the OpenRefine project or outside, as extensions?
The goal of this session is to explore those questions together: mapping the ecosystem of existing and potential integrations, reflecting on what their common needs are and how OpenRefine can meet them better.
It would be nice to have someone go over the general roadmap of OpenRefine at the start. A final session on the last day might be where we summarize the consensus and put the roadmap plan into a GitHub Project (perhaps with milestones). From my personal viewpoint, it seems we just need to get consensus on priorities, which currently appear to shift from our users' point of view. We do already have some forum posts with roadmap discussions, but for many users they feel disconnected: "what is possible" versus "what do users need" versus what actually gets realized, i.e. "what can we get funded to work on, regardless of our users' needs".
(A sub-component of this discussion is how best to present dependencies between roadmap items to users. Many other projects on GitHub use milestone partitioning instead of the Tracks and Tracked-by fields, for example "4.0-first", "4.0-second", "4.0-extensions-first", etc., so it is clear that the "first" issues need to be worked on first because the "second" issues depend on them.)
(Another sub-component of this discussion is adding new fields to the GitHub Project tables to enhance roadmap visibility.)