Thanks to visit codestin.com
Credit goes to huggingface.co

AI & ML interests

None defined yet.

Recent Activity

christopher 
posted an update 12 days ago
view post
Post
382
Something very cool is cooking at Lichess
  • 1 reply
·
meg 
posted an update about 1 month ago
view post
Post
2812
🤖 As AI-generated content is shared in movies/TV/across the web, there's one simple low-hanging fruit 🍇 to help know what's real: Visible watermarks. With the Gradio team, I've made sure it's trivially easy to add this disclosure to images, video, chatbot text. See how: https://huggingface.co/blog/watermarking-with-gradio
Thanks to the code collab in particular from @abidlabs and Yuvraj Sharma.
yjernite 
posted an update about 1 month ago
view post
Post
2335
Tremendous quality of life upgrade on the Hugging Face Hub - we now have auto-complete emojis 🤗 🥳 👏 🙌 🎉

Get ready for lots more very serious analysis on a whole range of topics from yours truly now that we have unlocked this full range of expression 😄 🤔 🗣 🙊
meg 
posted an update 2 months ago
meg 
posted an update 2 months ago
view post
Post
435
🤖 ICYMI: Yesterday, Hugging Face and OpenAI partnered to bring open source GPT to the public. This is a Big Deal in "AI world".

0. Common ground setting: OpenAI is the ChatGPT people. An “open source” model is one whose weights are available — that means the model can be “yours”.
1. You don’t have to interact with the company directly, nor give them your interactions, to use the system. The company can't "surveil" you.
2. You can evaluate the unique contributions of their SOTA model much more rigorously than you can when there are collections of models+code behind a closed API. You can find out specifically what the model can and can't do.
3. And you can directly customize it for whatever you'd like. Fine-tuning, wherein you give the model data that's tailored to your use cases and train it some more on that data, is trivial* when you have the model weights.
*Provided you have the compute.
4. You can directly benchmark whatever you'd like. Biases? Energy usage? Strengths/weaknesses? Go for it. You wants it you gots it--this transparency helps people understand SOTA *in general*, not just for this model, but points to, e.g., what's going on with closed Google models as well.
5. One of the most powerful things about "openness" that I've learned is that it cultivates ecosystems of collaborators building on top of one another's brilliance to make systems that are significantly better than they would be if created in isolation.
But, caveat wrt my own philosophy...
6. I do not take it as a given that advancing LLMs is good, and have a lot more to say wrt where I think innovation should focus more. For example, a focus on *data* -- curation, measurement, consent, credit, compensation, safety -- would deeply improve technology for everyone.
7. The transparency this release provides is massive for people who want to *learn* about LLMs. For the next generation of technologists to advance over the current, they MUST be able to learn about what's happening now. (cont...)
  • 1 reply
·
meg 
posted an update 3 months ago
view post
Post
492
🤖 👾 Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about US <—> China relationship in AI, and what it means for AI ethics.
yjernite 
posted an update 3 months ago
view post
Post
4177
𝗙𝗶𝗿𝘀𝘁 𝗚𝗣𝗔𝗜 𝗠𝗼𝗱𝗲𝗹 𝘄𝗶𝘁𝗵 𝗘𝗨 𝗗𝗮𝘁𝗮 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝗰𝘆 𝗧𝗲𝗺𝗽𝗹𝗮𝘁𝗲? 🇪🇺

With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)

The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀


In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for the SmolLM3 - which my colleagues at Hugging Face earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)

Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency 🔍

To learn more have a look at:

- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency
yjernite 
posted an update 4 months ago
meg 
posted an update 6 months ago
yjernite 
posted an update 6 months ago
view post
Post
3411
Today in Privacy & AI Tooling - introducing a nifty new tool to examine where data goes in open-source apps on 🤗

HF Spaces have tons (100Ks!) of cool demos leveraging or examining AI systems - and because most of them are OSS we can see exactly how they handle user data 📚🔍

That requires actually reading the code though, which isn't always easy or quick! Good news: code LMs have gotten pretty good at automatic review, so we can offload some of the work - here I'm using Qwen/Qwen2.5-Coder-32B-Instruct to generate reports and it works pretty OK 🙌

The app works in three stages:
1. Download all code files
2. Use the Code LM to generate a detailed report pointing to code where data is transferred/(AI-)processed (screen 1)
3. Summarize the app's main functionality and data journeys (screen 2)
4. Build a Privacy TLDR with those inputs

It comes with a bunch of pre-reviewed apps/Spaces, great to see how many process data locally or through (private) HF endpoints 🤗

Note that this is a POC, lots of exciting work to do to make it more robust, so:
- try it: yjernite/space-privacy
- reach out to collab: yjernite/space-privacy